This article details a recent Google study aimed at teaching Large Language Models (LLMs) to better process information represented as graphs. LLMs are typically trained on text, but graphs offer an efficient way of organising information, representing entities as nodes and the relationships between them as edges. One hurdle is the complex task of converting graph data into text that LLMs can understand.
To test which method of graph-to-text translation works best, the researchers created a benchmark known as GraphQA. Using many different types of graphs ensures a wide variety of connection patterns, helping to reveal potential biases in an LLM's analysis. Various graph operations fall within the scope of GraphQA, including verifying the existence of an edge, counting nodes or edges, determining which nodes a given node is connected to, and detecting cycles within a graph. All these tasks require knowledge of the relationships between nodes and edges.
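To make these tasks concrete, here is a minimal sketch (my own illustration, not the GraphQA implementation) of the four operations listed above, using a plain adjacency-set representation of an undirected graph:

```python
def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def edge_exists(graph, u, v):
    """Task 1: verify the existence of an edge between u and v."""
    return v in graph.get(u, set())

def node_count(graph):
    """Task 2a: count the nodes."""
    return len(graph)

def edge_count(graph):
    """Task 2b: count the edges (each is stored twice, so halve the total)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2

def neighbours(graph, u):
    """Task 3: determine which nodes a given node is connected to."""
    return sorted(graph.get(u, set()))

def has_cycle(graph):
    """Task 4: detect a cycle with depth-first search, skipping the parent edge."""
    visited = set()
    def dfs(node, parent):
        visited.add(node)
        for nbr in graph[node]:
            if nbr not in visited:
                if dfs(nbr, node):
                    return True
            elif nbr != parent:
                return True
        return False
    return any(dfs(n, None) for n in graph if n not in visited)

g = build_graph([(0, 1), (1, 2), (2, 0), (3, 4)])
print(edge_exists(g, 0, 2))          # True
print(node_count(g), edge_count(g))  # 5 4
print(neighbours(g, 1))              # [0, 2]
print(has_cycle(g))                  # True (the triangle 0-1-2)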
GraphQA covers a broad spectrum of tasks, from identifying patterns to making new connections. It also includes graphs generated by various random-graph algorithms, as well as simpler structures such as paths, complete graphs, and star graphs. This diversity provides a wide-ranging data set with which to train LLMs.
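The graph families mentioned above can be sketched as simple edge-list generators. This is an illustrative sketch, not the benchmark's own generation code; the random generator shown is the classic Erdős–Rényi model, one plausible choice among the "various algorithms":

```python
import random

def path_graph(n):
    """Path graph: n nodes in a single chain 0-1-2-...-(n-1)."""
    return [(i, i + 1) for i in range(n - 1)]

def complete_graph(n):
    """Complete graph: every pair of nodes is connected."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def star_graph(n):
    """Star graph: one centre node (0) connected to n-1 leaves."""
    return [(0, i) for i in range(1, n)]

def random_graph(n, p, seed=None):
    """Erdős–Rényi random graph: each possible edge included with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

print(path_graph(4))      # [(0, 1), (1, 2), (2, 3)]
print(star_graph(4))      # [(0, 1), (0, 2), (0, 3)]
print(len(complete_graph(5)))  # 10
```

Mixing such structured and random graphs, as the benchmark does, prevents a model from succeeding by memorising one connectivity pattern.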
The research team conducted multiple experiments. One tested the performance of pre-trained LLMs on graph tasks such as cycle detection and node degree estimation. The results showed that encoding is crucial: the way a graph is represented in text significantly affects LLM performance.
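To illustrate what "encoding" means here, the sketch below renders the same graph as text in two different styles: a bare edge list, and a node-by-node neighbour description. The function names and prompt templates are my own illustrations; the study compares several such text encodings, and this is only meant to show why the choice could matter to a model:

```python
def edge_list_encoding(edges):
    """Render the graph as a bare list of edges."""
    pairs = ", ".join(f"({u}, {v})" for u, v in edges)
    return f"In an undirected graph, the edges are: {pairs}."

def neighbour_encoding(edges):
    """Render the graph node by node, listing each node's neighbours."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    return " ".join(f"Node {n} is connected to nodes {sorted(ns)}."
                    for n, ns in sorted(nbrs.items()))

edges = [(0, 1), (1, 2)]
print(edge_list_encoding(edges))
# In an undirected graph, the edges are: (0, 1), (1, 2).
print(neighbour_encoding(edges))
# Node 0 is connected to nodes [1]. Node 1 is connected to nodes [0, 2]. Node 2 is connected to nodes [1].
```

Both strings describe an identical graph, but they surface different information directly: the second makes each node's neighbourhood explicit, which may help with tasks like degree estimation, while the first is more compact.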
A second experiment tested whether LLM performance improves with larger model size (more parameters). Bigger models often performed better, seemingly able to learn more complex patterns. However, some tasks, such as cycle detection, were less affected by model size.
The team also investigated whether graph shape (the pattern of connections between nodes) affects the problem-solving abilities of LLMs. They found that a graph's structure does significantly affect LLM performance. For example, LLMs performed well on densely connected graphs but struggled with path graphs, where nodes are linked in a single chain.
This research suggests that proper encoding methods can boost LLM accuracy on graph tasks by a factor ranging from roughly five to more than sixty, depending on the task. The researchers hope that the new benchmark, GraphQA, will drive further studies in the field.