Traditional supervised learning methods often struggle when applied to graph analysis because they require labeled data, which is costly and difficult to obtain for academic, social, and biological networks. Graph Self-supervised Pre-training (GSP) techniques, broadly classified as contrastive or generative, address this limitation by exploiting the inherent structure and features of graph data to learn meaningful representations without labeled examples.
Contrastive methods create multiple views of a graph through augmentation and learn representations by discriminating between positive and negative samples. Generative methods, in contrast, learn node representations through a reconstruction objective. Existing generative graph masked autoencoder (GMAE) models focus on reconstructing node features and therefore capture mostly node-level information; they neither accommodate the multi-scale structure inherent in many graphs nor capture higher-level structural information effectively.
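To make the generative objective concrete, here is a minimal sketch of a masked-feature reconstruction loss of the kind GMAE models optimize. The encoder and decoder below are linear stand-ins rather than real graph networks, the zero placeholder and MSE loss are illustrative choices, and nothing here reflects any specific paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_reconstruction_loss(x, encoder, decoder, mask_ratio=0.5):
    """GMAE-style objective: hide a subset of node features,
    encode the corrupted graph, and reconstruct the hidden nodes."""
    num_nodes = x.size(0)
    perm = torch.randperm(num_nodes)
    masked_idx = perm[: int(mask_ratio * num_nodes)]

    x_corrupt = x.clone()
    x_corrupt[masked_idx] = 0.0   # placeholder for masked features

    z = encoder(x_corrupt)        # node embeddings (a real GNN encoder would also use edges)
    x_hat = decoder(z)            # reconstructed node features

    # The loss is computed only on the masked nodes.
    return F.mse_loss(x_hat[masked_idx], x[masked_idx])

# Toy usage with linear stand-ins for the GNN encoder/decoder.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
decoder = nn.Linear(32, 16)
x = torch.randn(100, 16)
loss = masked_reconstruction_loss(x, encoder, decoder)
```

Because only node features drive the loss, a model trained this way tends to capture node-level information, which is exactly the limitation Hi-GMAE targets.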
Addressing these limitations, a team of researchers including members from Wuhan University developed the Hierarchical Graph Masked AutoEncoders (Hi-GMAE) framework. Hi-GMAE comprises three main components designed to harness hierarchical information in graphs: multi-scale coarsening, Coarse-to-Fine (CoFi) masking with recovery, and a Fine- and Coarse-Grained (Fi-Co) encoder and decoder.
In the multi-scale coarsening stage, coarse graphs are constructed at multiple scales using graph pooling methods that progressively cluster nodes into super-nodes. CoFi masking with recovery introduces a masking strategy that keeps the masked subgraphs consistent across all scales: the coarsest graph is masked at random, and the mask is then back-projected onto the finer scales (a sketch of this back-projection follows below). Finally, the Fi-Co encoder and decoder pair fine-grained graph convolution modules, which capture local information, with coarse-grained graph transformer (GT) modules, which focus on global information.
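As a rough illustration of the CoFi idea, the sketch below propagates a random mask from the coarsest scale back to the finer ones. It assumes each coarsening step is summarized by a cluster-assignment array mapping every node to its super-node; the pooling algorithm itself and the paper's gradual recovery of masked nodes are abstracted away, and all names are hypothetical.

```python
import numpy as np

def cofi_mask(assignments, coarse_mask_ratio=0.5, rng=None):
    """Sketch of Coarse-to-Fine (CoFi) masking.

    assignments: list of 1-D arrays, one per coarsening step;
        assignments[s][i] is the super-node at scale s+1 that
        node i at scale s belongs to (assignments[-1] maps the
        second-coarsest scale onto the coarsest one).
    Returns one boolean mask per scale, finest first, so that the
    masked subgraphs stay consistent across all scales.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 1) Randomly mask super-nodes at the coarsest scale.
    num_coarsest = assignments[-1].max() + 1
    coarse_mask = np.zeros(num_coarsest, dtype=bool)
    chosen = rng.choice(num_coarsest,
                        size=int(coarse_mask_ratio * num_coarsest),
                        replace=False)
    coarse_mask[chosen] = True

    # 2) Back-project the mask to each finer scale: a node is masked
    #    exactly when the super-node it was pooled into is masked.
    masks = [coarse_mask]
    for assign in reversed(assignments):
        masks.append(masks[-1][assign])
    masks.reverse()           # finest scale first
    return masks

# Toy usage: two coarsening steps, 8 nodes -> 4 super-nodes -> 2.
a0 = np.array([0, 0, 1, 1, 2, 2, 3, 3])
a1 = np.array([0, 0, 1, 1])
fine_mask, mid_mask, top_mask = cofi_mask([a0, a1], coarse_mask_ratio=0.5)
```

The key invariant is that a fine node is masked exactly when its super-node is, which is what keeps the masked subgraphs aligned across scales rather than scattering masked nodes independently at each level.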
The effectiveness of the Hi-GMAE framework was demonstrated through extensive experiments on widely used datasets for unsupervised and transfer learning tasks. The results show that Hi-GMAE outperforms existing models, underscoring the advantage of the multi-scale GMAE approach over traditional single-scale models, particularly in capturing and exploiting hierarchical graph information.
Hi-GMAE thus marks a significant advance in self-supervised graph pre-training, capturing the complexity of graph structures at multiple levels. Its strong experimental performance suggests it can serve as an effective tool for graph learning tasks and sets a new standard for graph analysis.