Artificial intelligence research often asks whether next-token prediction, the standard training objective for AI language models, can replicate aspects of human intelligence such as planning and reasoning. Despite its ubiquity, this objective may have inherent limitations on tasks that require foresight and decision-making. Overcoming these limitations matters because it could enable AI systems with more complex, human-like reasoning abilities, in turn expanding their application in real-world settings.
Models typically rely on teacher-forcing during training and autoregressive inference at test time, with applications in language modeling and text generation. Both mechanisms have significant limitations. Autoregressive inference can suffer from cascading errors: minor inaccuracies compound, producing large deviations from the intended sequence over long outputs. Teacher-forcing, on the other hand, can let the model exploit shortcuts in the ground-truth prefix, so that it never properly learns next-token prediction for some tasks. Because the true sequence dependencies needed for meaningful planning and reasoning are never learned, these models underperform in areas requiring complex, long-term planning and decision-making.
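The contrast between the two failure modes can be illustrated with a deliberately toy sketch (the "model" below is a hypothetical stand-in, not any architecture from the paper): a predictor that makes a single local mistake looks nearly perfect under teacher-forced evaluation, yet derails completely when its own outputs are fed back autoregressively.

```python
def toy_model(prev_token):
    # Hypothetical next-token predictor: the true rule is "add 1",
    # but it mispredicts on the single input 3 (one local error).
    if prev_token == 3:
        return 7  # wrong; the ground truth would be 4
    return prev_token + 1

def decode_autoregressive(model, start, steps):
    # Inference: each prediction is fed back as the next input,
    # so one early slip corrupts every subsequent step.
    seq = [start]
    for _ in range(steps):
        seq.append(model(seq[-1]))
    return seq

def errors_teacher_forced(model, ground_truth):
    # Training-style evaluation: the model always conditions on the
    # correct prefix, so the same mistake stays confined to one step.
    return sum(model(ground_truth[i]) != ground_truth[i + 1]
               for i in range(len(ground_truth) - 1))

truth = list(range(8))  # ground truth: 0, 1, 2, ..., 7
autoregressive = decode_autoregressive(toy_model, 0, 7)
# Autoregressive decoding yields [0, 1, 2, 3, 7, 8, 9, 10]:
# every token after the slip is wrong (4 of 7 predictions).
# Teacher-forced evaluation counts only 1 wrong prediction of 7.
```

The gap between the two error counts is exactly the cascading-error phenomenon described above: teacher-forced metrics can hide a model that cannot actually roll out a correct sequence on its own.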
In response, a group of researchers has proposed a multi-token prediction objective to address these shortcomings. Predicting several tokens ahead, rather than focusing solely on the immediate next token, yields a more robust and accurate sequence prediction method. This not only improves the model's planning and reasoning abilities but also represents a sizable contribution to the field, potentially increasing the capability and reliability of AI models.
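At the level of training data, the change is simple to state: under the standard objective, each prefix is paired with one target token, whereas a multi-token objective pairs each prefix with the next k tokens jointly. The sketch below shows only this target construction, under the assumption of a fixed lookahead k; the actual loss and architecture used by the researchers are not reproduced here.

```python
def next_token_targets(seq):
    # Standard objective: each prefix predicts only the next token.
    return [(seq[:i + 1], seq[i + 1]) for i in range(len(seq) - 1)]

def multi_token_targets(seq, k):
    # Multi-token objective: each prefix predicts the next k tokens
    # jointly, forcing the model to commit to a short plan instead of
    # leaning on the teacher-forced ground truth one step at a time.
    return [(seq[:i + 1], tuple(seq[i + 1:i + 1 + k]))
            for i in range(len(seq) - k)]

seq = [1, 2, 3, 4, 5]
# next_token_targets(seq)  -> ([1], 2), ([1,2], 3), ...
# multi_token_targets(seq, 3) -> ([1], (2,3,4)), ([1,2], (3,4,5))
```

With k = 1 the two constructions coincide, which makes clear that multi-token prediction strictly generalizes the standard objective.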
The proposed method predicts several tokens at once during training, sidestepping the pitfalls of classic teacher-forcing and autoregressive decoding. To test it, the researchers designed a simple planning task: a path-finding problem on a graph. They experimented with both the Transformer and Mamba architectures and found that neither accurately learns the task with classical next-token prediction. The dataset comprised path-star graphs of varying degree and path length, and the models were trained to find the path from a starting node to a goal node.
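A path-star graph consists of a central start node from which several disjoint paths ("arms") radiate, with the goal placed at the end of one arm; the model must output the full path from start to goal. The generator below is a minimal sketch of such a dataset instance, with node labels and function names chosen for illustration rather than taken from the paper.

```python
import random

def make_path_star(degree, path_len, rng):
    # Build a path-star graph: node 0 is the start; `degree` disjoint
    # arms of `path_len` nodes each radiate from it. The goal is the
    # last node of one randomly chosen arm.
    arms, next_label = [], 1
    for _ in range(degree):
        arms.append(list(range(next_label, next_label + path_len)))
        next_label += path_len
    goal_arm = rng.randrange(degree)
    goal = arms[goal_arm][-1]
    # Training target: the token sequence from the start to the goal.
    answer = [0] + arms[goal_arm]
    return arms, 0, goal, answer

rng = random.Random(0)
arms, start, goal, answer = make_path_star(degree=3, path_len=4, rng=rng)
# `answer` is the start node followed by one complete arm; the final
# token is the goal. The model sees the graph plus (start, goal) and
# must emit `answer` token by token.
```

The planning difficulty is that the very first token after the start already commits the model to one arm: a pure next-token learner that cannot look ahead to the goal has no local signal for which arm to enter, which is exactly where teacher-forced training breaks down.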
The results show that with traditional training, both the Transformer and Mamba architectures failed to predict the correct tokens for the path-finding task; errors accumulated into significant inaccuracies over long sequences. The proposed multi-token prediction approach, in contrast, improved accuracy and successfully mitigated the issues with autoregressive inference and teacher-forcing.
In conclusion, the authors address the question of whether next-token prediction can emulate human intelligence on tasks requiring planning and reasoning. Using a path-finding task, they propose a novel multi-token prediction approach that mitigates the limitations of traditional methods. While underlining the shortcomings of current training objectives, this approach offers an alternative that strengthens AI models' planning and reasoning abilities, a significant advancement in AI research.