Generative models are trained to reproduce the patterns in their training data, which in most cases means imitating human actions and outputs. They can approach human proficiency on many tasks, but whether they can actually surpass the humans who produced their training data remains an open question. A new study from researchers at Harvard University, UC Santa Barbara, Apple, the Kempner Institute, Princeton University, and Google DeepMind explores this possibility, which the authors call “transcendence”: a generative model surpassing the abilities of the experts that supplied its training data.
The study demonstrates the concept with an autoregressive transformer trained on chess game transcripts. Through low-temperature sampling, the model was able to outperform the strongest players in its dataset. The effect is in line with the “wisdom of the crowd” idea, where the collective decision-making of a diverse group can surpass the performance of any individual member. The paper provides both a theoretical framework and empirical evidence that generative models can exceed the experts whose data they were trained on.
Chess has been woven into AI research since the field’s inception, and it continues to serve as a testbed for new advances. The study also connects to work on dataset diversity, showing that models trained on data from many different experts can outperform models trained on the output of any single expert. The idea echoes offline reinforcement learning, where training on a diverse collection of behaviors can yield policies that outperform the behavior policies that generated the data.
Formally, transcendence is defined as the model achieving a higher average reward on the test distribution than the best of the experts it was trained on. Low-temperature sampling plays a crucial role in achieving it: lowering the temperature concentrates probability mass on high-reward actions, effectively acting as a majority vote over the experts’ predictions. This denoising effect is most pronounced when the training data comes from multiple experts who excel in different areas.
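To make the majority-vote intuition concrete, here is a small toy sketch (not code from the paper): three hypothetical experts all favor the same strong move, but each leaks probability onto a different blunder. A model that matches their averaged distribution already prefers the strong move, and lowering the sampling temperature concentrates nearly all of the probability mass on it.

```python
import numpy as np

def sample_with_temperature(probs, temperature, rng):
    """Sample an index from a probability vector after temperature scaling.

    Temperatures below 1 sharpen the distribution toward its mode;
    as the temperature approaches 0, sampling approaches a deterministic argmax.
    """
    logits = np.log(probs) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return rng.choice(len(probs), p=scaled)

rng = np.random.default_rng(0)

# Three hypothetical experts scoring four candidate moves: each favors the
# strong move (index 0) but leaks probability onto a different blunder.
experts = np.array([
    [0.6, 0.4, 0.0, 0.0],
    [0.6, 0.0, 0.4, 0.0],
    [0.6, 0.0, 0.0, 0.4],
])

# A model fit to all three experts approximates their average distribution.
mixture = experts.mean(axis=0)  # -> [0.6, 0.133, 0.133, 0.133]

for temperature in (1.0, 0.5, 0.1):
    samples = [sample_with_temperature(mixture, temperature, rng) for _ in range(10_000)]
    rate = np.mean(np.array(samples) == 0)
    print(f"T={temperature}: strong move chosen {rate:.1%} of the time")
```

At temperature 1.0 the mixture picks the strong move about as often as each individual expert (roughly 60% of the time); at 0.1 it does so almost always, exceeding every expert it was averaged from.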
To test the theory empirically, the researchers trained several autoregressive transformer models on a dataset of one billion games from lichess.org. The models, which saw only move transcripts and had no direct access to the board state, were then evaluated against the Stockfish chess engine at different sampling temperatures. Results showed that low-temperature sampling significantly strengthened the models’ play, particularly their move choices in critical positions. Models trained on more diverse datasets were found to be more likely to transcend the players whose games they were trained on.
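A minimal sketch of such an evaluation loop is shown below, assuming the python-chess package for board handling and engine communication. The `model_move_probabilities` function is a hypothetical placeholder for the trained transformer (here it simply returns a uniform distribution over legal moves), and the temperature grid, game count, Stockfish path, and time limit are illustrative rather than the paper’s actual settings.

```python
import chess
import chess.engine
import numpy as np

def model_move_probabilities(transcript, board):
    """Hypothetical stand-in for the trained transformer: given the game
    transcript so far, return the legal moves and a probability for each.
    A real model would condition only on the transcript, with no direct
    access to the board state; the board is used here just to enumerate
    legal moves for the placeholder."""
    legal = list(board.legal_moves)
    probs = np.full(len(legal), 1.0 / len(legal))  # placeholder: uniform
    return legal, probs

def pick_move(board, transcript, temperature, rng):
    """Sample the model's next move after temperature-scaling its distribution."""
    legal, probs = model_move_probabilities(transcript, board)
    logits = np.log(probs) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return legal[rng.choice(len(legal), p=scaled)]

def play_one_game(engine, temperature, rng, move_time=0.05):
    """Play one game with the model as White against Stockfish as Black."""
    board = chess.Board()
    transcript = []
    while not board.is_game_over():
        if board.turn == chess.WHITE:
            move = pick_move(board, transcript, temperature, rng)
        else:
            move = engine.play(board, chess.engine.Limit(time=move_time)).move
        transcript.append(move.uci())
        board.push(move)
    return board.result()

rng = np.random.default_rng(0)
engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # path is illustrative
try:
    for temperature in (1.0, 0.75, 0.1, 0.001):
        results = [play_one_game(engine, temperature, rng) for _ in range(20)]
        print(f"T={temperature}: {results.count('1-0')}/20 model wins")
finally:
    engine.quit()
```

Comparing win rates across the temperature grid is the kind of measurement that would reveal the low-temperature advantage the study reports.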
In conclusion, the study introduces the concept of transcendence, in which generative models trained on expert data outperform the best individual experts. Low-temperature sampling helps achieve this by denoising individual experts’ biases and pooling their complementary knowledge, and dataset diversity is key to the effect. The authors suggest extending the work to other fields, such as natural language processing and computer vision, to assess how broadly it generalizes, and they also discuss the ethical considerations and other potential impacts of deploying generative models.