A new study from Google aims to teach powerful large language models (LLMs) to reason better over graph information. In computer science, a ‘graph’ describes connections between entities, with nodes being the objects and edges being the links that signify their relationships. This kind of information, which is inherent to the structure of the internet, can organize data more effectively than ordinary text.
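To make the terminology concrete, here is a minimal sketch of one common way to store such a graph in code, as an adjacency list; this is a generic illustration, not tied to the study's own tooling.

```python
# A small undirected graph stored as an adjacency list:
# each node maps to the set of nodes it shares an edge with.
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}

# Recover the edge list by emitting each unordered pair once.
edges = {tuple(sorted((u, v))) for u, nbrs in graph.items() for v in nbrs}
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```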
To test the most effective methods for translating graphs into text, the researchers created a benchmark named GraphQA. Unlike benchmarks that rely on a single graph type, GraphQA uses a variety of graph structures to cover a wide range of connection patterns, making the test more representative of the real-world situations LLMs might come across.
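As an illustration of that variety, the sketch below uses the networkx library to build two well-known random-graph families; whether the benchmark uses these exact generators is an assumption here, not something the article confirms.

```python
import networkx as nx

# Erdős–Rényi graphs: each possible edge appears independently
# with probability p, giving fairly uniform connectivity.
er = nx.erdos_renyi_graph(n=20, p=0.2, seed=0)

# Barabási–Albert graphs: grown by preferential attachment,
# so a few hub nodes accumulate many edges.
ba = nx.barabasi_albert_graph(n=20, m=2, seed=0)

print(er.number_of_edges(), max(dict(er.degree()).values()))
print(ba.number_of_edges(), max(dict(ba.degree()).values()))
```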
The GraphQA benchmark includes elementary operations such as confirming the existence of an edge, counting edges or nodes, determining which nodes are linked to a given node, and detecting cycles in a graph. Simple as they seem, these tasks all require an understanding of the relationships between nodes and edges. GraphQA also encompasses generating random graphs through various methods and producing simpler graph structures, providing a diverse dataset for training.
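To make these operations concrete, the following is a hedged sketch of how each task could be answered programmatically over an adjacency list; the function names are illustrative and not drawn from the benchmark's code.

```python
def has_edge(graph, u, v):
    """Edge existence: does the graph contain an edge between u and v?"""
    return v in graph.get(u, set())

def edge_count(graph):
    # Each undirected edge is stored twice, once per endpoint.
    return sum(len(nbrs) for nbrs in graph.values()) // 2

def degree(graph, u):
    """Node degree: how many edges touch node u?"""
    return len(graph.get(u, set()))

def connected_nodes(graph, u):
    """Which nodes are linked to node u (its neighbors)?"""
    return sorted(graph.get(u, set()))

def has_cycle(graph):
    """Cycle check for an undirected graph via DFS: reaching an
    already-visited node that is not the DFS parent closes a cycle."""
    visited = set()

    def dfs(node, parent):
        visited.add(node)
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in visited or dfs(nxt, node):
                return True
        return False

    return any(n not in visited and dfs(n, None) for n in graph)

graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(has_edge(graph, 0, 3))      # False
print(edge_count(graph))          # 4
print(degree(graph, 2))           # 3
print(connected_nodes(graph, 2))  # [0, 1, 3]
print(has_cycle(graph))           # True: 0-1-2-0
```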
The team evaluated the performance of pre-trained LLMs on graph tasks such as cycle detection, node degree estimation, and connection identification. They found that the choice of graph-to-text encoding significantly influenced LLM performance, with ‘incident’ encoding generally performing well.
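An encoding turns the graph into plain text before it is handed to the model. The sketch below approximates an incident-style encoding, which lists each node together with the nodes incident to it; the template wording is an assumption, not the paper's exact phrasing.

```python
def incident_encoding(graph):
    """Render a graph as text, one sentence per node, in the spirit
    of the 'incident' encoding. The phrasing is an illustrative
    approximation of such a template, not the study's own."""
    nodes = ", ".join(map(str, sorted(graph)))
    lines = [f"G describes a graph among nodes {nodes}."]
    for node in sorted(graph):
        nbrs = ", ".join(map(str, sorted(graph[node])))
        lines.append(f"Node {node} is connected to nodes {nbrs}.")
    return "\n".join(lines)

graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(incident_encoding(graph))
```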
The researchers also investigated how LLM size and graph shape affect performance. Larger models often performed better on graph reasoning tasks, as their additional parameters let them learn more intricate patterns. However, the ‘edge existence’ task was less sensitive to model size, and even the largest LLM could not reliably outperform a simple baseline on the cycle-check task.
The study also showed that a graph's shape, that is, how its nodes are connected, considerably affects LLM performance. For instance, LLMs performed well on densely connected graphs but struggled with path graphs, where the nodes form a single chain.
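The contrast is easy to see in code. This sketch, again using networkx as an assumed tool, builds the two shapes mentioned: a path graph, where nodes form a single chain, and a complete graph, where every pair of nodes is linked.

```python
import networkx as nx

# A path graph: five nodes strung in a single chain, 0-1-2-3-4.
path = nx.path_graph(5)

# A complete graph: every pair of nodes shares an edge.
dense = nx.complete_graph(5)

print(sorted(path.edges()))     # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(dense.number_of_edges())  # 10, i.e. C(5, 2)
```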
The research offers insights into best practices for preparing graphs for LLMs and suggests that the right encoding methods can increase an LLM’s accuracy on graph tasks by a factor of five to more than sixty. The team hopes the new GraphQA benchmark will encourage further research in this field.