Improving Graph Neural Network Training with DiskGNN: A Significant Advancement towards Effective Large-Scale Learning

Graph Neural Networks (GNNs) are essential for processing complex data structures in domains such as e-commerce and social networks. However, as graph data volumes grow, existing systems struggle to efficiently handle data that exceeds main-memory capacity. This calls for out-of-core solutions in which the data resides on disk. Yet such systems have struggled to balance fast data access with model accuracy.

Existing systems either suffer from slow input/output due to many small, frequent disk reads, or compromise accuracy by processing the graph in disconnected segments. Notable methods such as Ginex and MariusGNN exhibit significant shortcomings in either training speed or accuracy because of these issues.

Researchers from several institutions, including Southern University of Science and Technology and New York University, have developed DiskGNN to optimize both the speed and the accuracy of GNN training on large datasets. Using an offline sampling technique, DiskGNN preprocesses and organizes graph data according to projected access patterns, eliminating needless disk reads and improving training efficiency.
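The idea behind offline sampling can be illustrated with a short sketch: run the neighbor sampler ahead of time, record which node features each mini-batch will touch, and tally per-node access frequencies to guide data placement. This is a minimal illustration in plain NumPy; the function and variable names are assumptions for exposition, not DiskGNN's actual API.

```python
import numpy as np

def offline_sample(sample_batch, num_batches, num_nodes, seed=0):
    """Pre-run the sampler to capture the access pattern of every mini-batch.

    sample_batch(rng) -> array of node IDs whose features one batch needs.
    """
    rng = np.random.default_rng(seed)
    batch_node_ids = []                          # per-batch access pattern
    freq = np.zeros(num_nodes, dtype=np.int64)   # how often each node is read
    for _ in range(num_batches):
        nids = np.unique(sample_batch(rng))
        batch_node_ids.append(nids)
        freq[nids] += 1
    # Frequently accessed ("hot") nodes are candidates for GPU/CPU caching;
    # the rest can be packed into contiguous per-batch blocks on disk.
    hot_order = np.argsort(-freq)
    return batch_node_ids, freq, hot_order
```

Because sampling happens once, offline, the training loop itself no longer issues unpredictable reads: every batch's data needs are known in advance and the storage layout can be tailored to them.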

The DiskGNN architecture adopts a multi-tiered storage hierarchy that combines GPU memory, CPU memory, and disk. This design keeps the most frequently accessed data closest to the computation, which greatly accelerates training. In benchmark tests, DiskGNN was over eight times faster than baseline methods, with training epochs averaging 76 seconds compared to 580 seconds for systems like Ginex.
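A minimal sketch of such a tiered feature store is shown below, assuming the hot-node ordering produced by offline sampling. The class name and the in-memory stand-ins for the GPU cache and the on-disk file are hypothetical simplifications, not DiskGNN's real components.

```python
import numpy as np

class TieredFeatureStore:
    """Serve node features from the fastest tier that holds them."""

    def __init__(self, features, hot_order, gpu_cap, cpu_cap):
        self.dim = features.shape[1]
        gpu_ids = hot_order[:gpu_cap]
        cpu_ids = hot_order[gpu_cap:gpu_cap + cpu_cap]
        # In a real system the first tier lives in GPU memory and the last
        # tier is a file; plain dicts/arrays keep this sketch self-contained.
        self.gpu = {int(i): features[i] for i in gpu_ids}
        self.cpu = {int(i): features[i] for i in cpu_ids}
        self.disk = features  # stand-in for the on-disk feature store

    def gather(self, node_ids):
        out = np.empty((len(node_ids), self.dim), dtype=self.disk.dtype)
        for row, nid in enumerate(node_ids):
            nid = int(nid)
            if nid in self.gpu:        # fastest tier: GPU cache
                out[row] = self.gpu[nid]
            elif nid in self.cpu:      # second tier: CPU cache
                out[row] = self.cpu[nid]
            else:                      # slowest tier: disk read
                out[row] = self.disk[nid]
        return out
```

The key property is that cache placement is driven by the access frequencies measured offline, so the hottest features never touch the disk during training.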

Evaluations highlight DiskGNN's effectiveness at accelerating GNN training while maintaining high model accuracy. Tests on the ogbn-papers100M graph dataset showed that DiskGNN matched or surpassed the best model accuracies of existing systems while significantly reducing both average epoch time and disk access time. With an average disk access time of only 51.2 seconds, compared to 412 seconds for previous systems, DiskGNN maintained an accuracy of about 65.9%.

DiskGNN's design mitigates the read amplification that typically plagues disk-based systems. By arranging node features into contiguous blocks on disk, it sidesteps the many small read operations otherwise needed at each training step. This lessens the burden on the storage system and reduces time spent waiting for data, streamlining the overall training pipeline.
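To make the contrast concrete, here is a sketch of packing each mini-batch's non-cached features into one contiguous region of a binary file, so a training step issues a single large sequential read instead of many small random ones. The file layout and names are illustrative assumptions, not DiskGNN's actual on-disk format.

```python
import numpy as np

def pack_features(features, batch_node_ids, path="packed_feats.bin"):
    """Write each batch's features back-to-back; return (offset, count) per batch."""
    offsets = []
    with open(path, "wb") as f:
        for nids in batch_node_ids:
            offsets.append((f.tell(), len(nids)))
            features[nids].tofile(f)   # one contiguous block per batch
    return offsets

def load_batch(path, offsets, batch_idx, dim, dtype=np.float32):
    """Fetch one batch's features with a single sequential read."""
    off, n = offsets[batch_idx]
    with open(path, "rb") as f:
        f.seek(off)
        buf = np.fromfile(f, dtype=dtype, count=n * dim)  # one large read
    return buf.reshape(n, dim)
```

Note the design trade-off: a feature needed by multiple batches is duplicated in the packed file, spending extra disk space to guarantee strictly sequential reads, which is usually a good bargain on modern SSDs.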

Overall, DiskGNN sets a new benchmark for out-of-core GNN training by addressing both data access speed and model accuracy. Its careful data placement and layered architecture allow it to outperform existing solutions, offering a faster and more accurate way to train graph neural networks. For practitioners working with large graph datasets where both speed and accuracy matter, DiskGNN is a valuable tool.
