In recent years, advances in computing have enabled the development of computer-verifiable formal languages, pushing the field of mathematical reasoning forward. One such language, Lean, is used to verify mathematical theorems, ensuring the accuracy and consistency of mathematical results. Researchers are increasingly turning to Large Language Models (LLMs), trained on Natural Language (NL) proofs, to generate complete formal proofs, a notable step forward in formal theorem proving.
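To make this concrete, here is a minimal example of what a machine-checked Lean4 theorem looks like (the statement and the lemma `Nat.add_comm` come from Lean's standard library, not from the TheoremLlama paper); the Lean kernel accepts the proof only if every step is valid:

```lean
-- A machine-checkable Lean4 theorem: the kernel accepts this proof
-- because `Nat.add_comm` establishes commutativity of Nat addition.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```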
However, the scarcity of aligned NL and Formal Language (FL) theorem-proving data often keeps LLMs from operating at full capacity. This shortage hampers the development of effective training techniques and, by extension, limits the potential of LLMs for generating formal mathematical proofs.
In response to these obstacles, researchers from The Hong Kong University of Science and Technology and the University of Illinois Urbana-Champaign have developed TheoremLlama, an end-to-end framework designed to adapt a general-purpose LLM for Lean4 theorem proving.
The framework incorporates several key components. The first, NL-FL Aligned Dataset Generation, addresses the data shortage: TheoremLlama introduces a method for building an NL-FL aligned dataset, called Open Bootstrapped Theorems (OBT), which uses a bootstrapping technique to embed NL proofs into Lean4 code. Integrating NL reasoning into Lean4 contexts improves LLMs' understanding and execution of structured reasoning.
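The paper's exact OBT entry format is not reproduced here; the sketch below only illustrates the general idea of such bootstrapping, with the informal argument interleaved as comments next to the formal steps it explains:

```lean
-- Illustrative sketch of an NL-FL aligned entry (not the paper's exact
-- schema): the informal proof is embedded as comments alongside the
-- Lean4 tactic steps.
theorem zero_add_example (n : Nat) : 0 + n = n := by
  -- NL: we argue by induction on n.
  induction n with
  | zero =>
    -- NL: base case; 0 + 0 reduces to 0 by definition of addition.
    rfl
  | succ k ih =>
    -- NL: inductive step; rewrite 0 + (k + 1) as (0 + k) + 1,
    -- then apply the induction hypothesis 0 + k = k.
    rw [Nat.add_succ, ih]
```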
TheoremLlama also introduces training techniques that help LLMs become proficient Lean4 theorem provers. Methods such as block training and curriculum-based data sorting strengthen the LLM's in-context learning and make training on the OBT dataset more reliable.
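As a rough illustration of the curriculum idea (not the paper's exact recipe; the function name and the difficulty proxy are assumptions), the sketch below orders NL-FL examples from easy to hard using proof length as a stand-in for difficulty:

```python
# Hypothetical sketch of curriculum-style data sorting: order NL-FL
# training examples from "easy" to "hard" before fine-tuning. Proof
# length serves as a simple, assumed difficulty proxy.

def curriculum_sort(examples: list[dict]) -> list[dict]:
    """Sort training examples by estimated difficulty (shortest proof first)."""
    def difficulty(example: dict) -> int:
        # Fewer proof lines is treated here as an easier example.
        return len(example["lean4_proof"].splitlines())
    return sorted(examples, key=difficulty)

# Toy usage: two aligned examples, printed easy-to-hard.
examples = [
    {"nl_statement": "0 + n = n (by induction)",
     "lean4_proof": "induction n with\n| zero => rfl\n| succ k ih => rw [Nat.add_succ, ih]"},
    {"nl_statement": "a + b = b + a",
     "lean4_proof": "exact Nat.add_comm a b"},
]
for ex in curriculum_sort(examples):
    print(ex["nl_statement"])
```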
Moreover, the framework works to increase the LLM's capability to write formal Lean4 proofs independently, extending its formal reasoning by building on its own correctly generated proofs.
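One way to picture this step, under the assumption that verified generations are recycled as training data, is the loop sketched below; `generate_proof` and `lean_verifies` are hypothetical stand-ins for the model call and the Lean4 checker, not names from the paper:

```python
# Hypothetical sketch of an iterative proof-writing loop: proofs the model
# generates that pass Lean4 verification are added back to the training set.

def bootstrap_round(theorems, training_set, generate_proof, lean_verifies,
                    attempts_per_theorem=4):
    """Add model-generated, Lean4-verified proofs back to the training set."""
    for theorem in theorems:
        for _ in range(attempts_per_theorem):
            candidate = generate_proof(theorem)      # sample a proof attempt
            if lean_verifies(theorem, candidate):    # kernel-check the attempt
                # Only verified proofs become new aligned training examples.
                training_set.append({"statement": theorem, "proof": candidate})
                break
    return training_set
```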
TheoremLlama's bootstrapping approach is a notable innovation, effectively reconciling natural-language reasoning with the constraints of formal mathematical language. Experimental results substantiate the framework's efficacy: it achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and MiniF2F-Test datasets, respectively, surpassing the GPT-4 baselines of 22.95% and 25.41% on the same datasets.
To conclude, TheoremLlama represents a considerable stride toward leveraging LLMs' NL capabilities for formal theorem proving in Lean4. By tackling the key challenges of data alignment and training, it offers a path to stronger mathematical reasoning and marks a meaningful academic and technological advance.