DeepMind researchers have presented TransNAR, a new hybrid architecture that pairs the language comprehension capabilities of Transformers with the robust algorithmic abilities of pre-trained graph neural networks (GNNs), known as neural algorithmic reasoners (NARs). This combination is designed to enhance the reasoning capabilities of language models while preserving the NAR's strong generalization.
A persistent limitation of purely GNN-based NARs is their rigid input format: they struggle with noisy inputs such as natural-language variations of a problem. Transformers, on the other hand, handle messy text well but fall short on algorithmic tasks, particularly those requiring out-of-distribution reasoning.
The TransNAR method builds on several research areas, including neural algorithmic reasoning, length generalization in language models, tool use, and multimodality. It takes a pre-trained multi-task NAR module and integrates it with a language model to overcome the language model's length-generalization limitations. The Transformer in TransNAR uses cross-attention to access the high-dimensional node embeddings computed by the NAR. The aim is to strengthen reasoning abilities and improve how algorithmic tasks posed in natural language are handled.
TransNAR accepts two inputs: a textual specification of an algorithmic problem and its corresponding graph representation. The model applies Transformer layers to the text input and NAR layers to the graph input. Cross-attention is used to condition the token embeddings on the node embeddings computed by the NAR, allowing the Transformer to augment its understanding with the robust algorithmic reasoning capacities of the pre-trained NAR module. The whole model is trained end-to-end with a next-token prediction objective.
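To make this concrete, here is a minimal, hypothetical sketch in PyTorch of how such a hybrid layer could look. It is not DeepMind's implementation: the module names, dimensions, the toy message-passing NAR, and the choice to freeze the NAR's weights are illustrative assumptions. What it shows is the mechanism described above: token embeddings from the Transformer stream cross-attend to node embeddings produced by a pre-trained NAR, and the combined model is trained with a next-token prediction objective.

```python
# Hypothetical TransNAR-style sketch; names and sizes are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyNAR(nn.Module):
    """Stand-in for a pre-trained graph NAR: one round of mean-style message passing."""
    def __init__(self, node_dim: int):
        super().__init__()
        self.msg = nn.Linear(node_dim, node_dim)
        self.upd = nn.Linear(2 * node_dim, node_dim)

    def forward(self, node_feats, adj):  # node_feats: (B, N, D), adj: (B, N, N)
        messages = torch.bmm(adj, self.msg(node_feats))  # aggregate neighbour messages
        return self.upd(torch.cat([node_feats, messages], dim=-1))


class TransNARBlock(nn.Module):
    """One hybrid layer: causal self-attention over tokens, then cross-attention into NAR node embeddings."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, tok, node):
        T = tok.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=tok.device), diagonal=1)
        x = self.n1(tok)
        tok = tok + self.self_attn(x, x, x, attn_mask=causal)[0]
        # Cross-attention: tokens are queries, NAR node embeddings are keys and values.
        tok = tok + self.cross_attn(self.n2(tok), node, node)[0]
        return tok + self.ffn(self.n3(tok))


class TransNARSketch(nn.Module):
    def __init__(self, vocab: int, dim: int = 256, layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.nar = ToyNAR(dim)  # would be a pre-trained multi-task NAR in practice
        for p in self.nar.parameters():  # assumption: the pre-trained NAR is kept frozen
            p.requires_grad = False
        self.blocks = nn.ModuleList([TransNARBlock(dim) for _ in range(layers)])
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, token_ids, node_feats, adj):
        tok = self.embed(token_ids)          # text view of the problem
        node = self.nar(node_feats, adj)     # graph view of the same problem
        for blk in self.blocks:
            tok = blk(tok, node)
        return self.lm_head(tok)             # logits for next-token prediction


# Toy usage: the text view is a token sequence, the graph view is node features plus adjacency.
model = TransNARSketch(vocab=1000)
tokens = torch.randint(0, 1000, (2, 32))
nodes, adj = torch.randn(2, 16, 256), torch.rand(2, 16, 16)
logits = model(tokens, nodes, adj)
# Standard next-token objective: predict token t+1 from tokens up to t.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, 1000), tokens[:, 1:].reshape(-1))
```

In this sketch the NAR is frozen, which mirrors the idea of reusing a pre-trained reasoner rather than retraining it alongside the language model; whether to freeze or fine-tune it is a design choice left open here.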
Compared with the baseline Transformer, TransNAR achieved significant improvements, outperforming it on most individual algorithms both in-distribution and out-of-distribution. Notably, TransNAR was found to confer out-of-distribution generalization even where the baseline showed none. The model also increased the proportion of inputs for which it produced outputs of the correct shape, suggesting more reliable output formatting. TransNAR did, however, struggle with algorithms that require locating a particular index in an input list, pointing to a shared failure mode in how both models handle index-based outputs.
Evaluations on the CLRS-Text benchmark confirmed TransNAR's advantage over Transformer-only models, both in-distribution and, most importantly, in out-of-distribution regimes with larger input sizes.
In conclusion, the development of TransNAR may lead to better handling and solving of algorithmic tasks posed in natural language. Combining the understanding capabilities of Transformers with the GNNs' robust algorithmic reasoning points towards a future where language models are better equipped to tackle complex reasoning and algorithmic tasks in a more robust manner.