Standard Transformer models face a significant obstacle when applied to graph data: the computational cost of self-attention grows quadratically with the number of nodes in the graph. Previous efforts to work around this have tended either to sacrifice the key advantage of self-attention, its global receptive field, or to be incompatible with the relative structural encodings used in graph Transformers.
A team of researchers proposed a solution named AnchorGT, which addresses the scalability issue while preserving the power of Transformers. The central idea of AnchorGT is to designate a small set of strategically chosen “anchor” nodes that act as information hubs. Instead of every node attending to all others, each node attends only to its local neighbors and these anchors, substantially reducing the computational load while still propagating information across the graph.
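The restricted attention pattern can be pictured as a sparse mask over node pairs. Below is a minimal sketch of such a mask, assuming a NetworkX graph; the function name and data structures are illustrative assumptions, not the authors' implementation.

```python
# Sketch: each node may attend to its k-hop neighborhood and to all anchor nodes.
import networkx as nx
import numpy as np

def build_attention_mask(G: nx.Graph, anchors: set, k: int) -> np.ndarray:
    """Boolean mask M where M[i, j] is True iff node i may attend to node j,
    i.e. j lies within i's k-hop neighborhood or j is an anchor."""
    nodes = list(G.nodes())
    idx = {v: i for i, v in enumerate(nodes)}
    mask = np.zeros((len(nodes), len(nodes)), dtype=bool)
    for v in nodes:
        # nodes reachable from v within k hops (includes v itself)
        neighborhood = nx.single_source_shortest_path_length(G, v, cutoff=k)
        for u in neighborhood:
            mask[idx[v], idx[u]] = True
        # every node additionally attends to all anchors
        for a in anchors:
            mask[idx[v], idx[a]] = True
    return mask
```

With a small anchor set and bounded node degree, each row of this mask has far fewer active entries than the full n, which is what replaces the quadratic attention cost with a much lighter one.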
The team used a concept from graph theory called the “k-dominating set” to select these anchor nodes. This efficient selection process involves iteratively choosing high-degree nodes and removing their k-hop neighborhoods until all nodes are covered. Each node’s attention is then focused on its k-hop neighbors and the anchor set.
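A hedged sketch of that greedy selection is shown below: repeatedly pick an uncovered node of highest degree as an anchor and mark its k-hop neighborhood as covered, until every node is covered. The exact tie-breaking and data structures used by the authors may differ.

```python
# Sketch: greedy construction of a k-dominating set to serve as anchor nodes.
import networkx as nx

def greedy_k_dominating_set(G: nx.Graph, k: int) -> set:
    uncovered = set(G.nodes())
    anchors = set()
    while uncovered:
        # choose the uncovered node with the largest degree in G
        v = max(uncovered, key=G.degree)
        anchors.add(v)
        # everything within k hops of v (including v) is now covered
        covered = nx.single_source_shortest_path_length(G, v, cutoff=k)
        uncovered.difference_update(covered)
    return anchors
```

Because every node ends up within k hops of some anchor, combining this anchor set with k-hop local attention guarantees that no node is more than one anchor “hop” away from the rest of the graph.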
The researchers also showed theoretically that AnchorGT is more expressive than traditional graph neural networks: when equipped with a structural encoding satisfying certain conditions, such as one based on shortest-path distance, it is strictly more powerful than message-passing networks bounded by the Weisfeiler-Lehman test, a standard yardstick for analyzing how well models distinguish graph structures.
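For context, the 1-dimensional Weisfeiler-Lehman test repeatedly relabels each node by its own color together with the multiset of its neighbors' colors; graphs that end up with different color histograms cannot be isomorphic. The sketch below is a generic illustration of that procedure, not code from the paper.

```python
# Sketch: 1-WL color refinement, the expressiveness baseline mentioned above.
import networkx as nx

def wl_color_histogram(G: nx.Graph, rounds: int = 3) -> dict:
    colors = {v: 0 for v in G.nodes()}  # start with a uniform color
    for _ in range(rounds):
        new_colors = {}
        for v in G.nodes():
            # signature = own color plus sorted multiset of neighbor colors
            signature = (colors[v], tuple(sorted(colors[u] for u in G.neighbors(v))))
            new_colors[v] = hash(signature)
        colors = new_colors
    # histogram of final colors; differing histograms imply non-isomorphic graphs
    hist = {}
    for c in colors.values():
        hist[c] = hist.get(c, 0) + 1
    return hist
```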
In their experiments, the researchers tested AnchorGT variants of popular graph Transformer models on multiple graph learning tasks. These variants matched or exceeded the performance of the original models while being faster and more memory-efficient; for example, the AnchorGT version of Graphormer outperformed the original Graphormer while using 60% less GPU memory during training.
The success of AnchorGT comes down to the balance it strikes between computational efficiency and expressive power. By introducing anchor nodes and redesigning the attention mechanism around them, the researchers have made graph Transformers more practical for large-scale data, opening them up to a wider range of domains involving graph-structured data.