Large language models (LLMs) are widely praised for their ability to understand and process human language. However, these models tend to struggle with mathematical reasoning, a skill that requires both logical deduction and numerical understanding. This shortcoming has sparked interest in developing methods that improve LLMs’ mathematical abilities without degrading their linguistic capabilities.
Current approaches to improving LLMs’ mathematical reasoning include Chain-of-Thought prompting, which helps structure a model’s reasoning process, and models such as WizardMath, which use supervised fine-tuning and reinforcement learning to enhance their math capabilities. In addition, Self-Consistency strategies and tools such as Math-Shepherd have shown promise in improving problem-solving performance. Other projects, such as MAmmoTH and ToRA, have incorporated code into the reasoning process to overcome the models’ computational limitations.
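Self-Consistency, in its common form, samples several chain-of-thought completions for the same problem and keeps the final answer that appears most often. The sketch below shows only that majority-vote step; it assumes the final answers have already been extracted from each sampled completion, and the sample answers are made up for illustration.

```python
from collections import Counter


def self_consistency_vote(final_answers):
    """Return the most common final answer among sampled reasoning paths."""
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) returns [(answer, count)] for the top answer;
    # ties go to the answer encountered first.
    answer, _count = Counter(final_answers).most_common(1)[0]
    return answer


# Hypothetical final answers extracted from five sampled completions.
samples = ["42", "41", "42", "42", "40"]
print(self_consistency_vote(samples))  # → 42
```

Voting over final answers rather than full reasoning texts lets different but equally valid derivations reinforce the same result.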
Recently, researchers from Zhipu.AI and Tsinghua University developed a new method called “Self-Critique”, which stands out for its feedback-driven approach. Unlike traditional methods that rely on external feedback, this method uses feedback derived from the model itself for self-improvement, aiming to advance mathematical reasoning and language-processing capabilities at the same time.
The process comprises two main phases, both guided by a model called Math-Critique, which evaluates the LLM’s mathematical outputs. In the first phase, Rejective Fine-tuning, responses that Math-Critique judges unsatisfactory are rejected, and only high-quality responses are kept for further fine-tuning. In the second phase, Direct Preference Optimization (DPO), the model learns to distinguish correct answers from incorrect ones, thereby improving its problem-solving skills. The pipeline was tested on the ChatGLM3-32B model using both academic datasets and a purpose-built MATH USER EVAL dataset, which provided insight into the model’s improved mathematical and linguistic abilities.
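The two phases can be sketched in plain Python under several assumptions. The critic score scale, the acceptance threshold, and the names `Sample`, `rejective_filter`, and `build_preference_pairs` are hypothetical; the pairing rule (accepted versus rejected answers to the same question) is an illustrative guess; and `dpo_loss` is the standard DPO objective from the general literature, not necessarily the exact variant used for ChatGLM3-32B. It takes sequence log-probabilities under the policy being trained and under a frozen reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    response: str
    critique_score: float  # hypothetical score from a Math-Critique-style judge


def rejective_filter(samples, threshold):
    """Rejection step: keep only responses the critic scores at or above threshold."""
    return [s for s in samples if s.critique_score >= threshold]


def build_preference_pairs(samples, threshold):
    """Pair each accepted response with each rejected one for the same question."""
    by_question = {}
    for s in samples:
        by_question.setdefault(s.question, []).append(s)
    pairs = []
    for question, group in by_question.items():
        chosen = [s for s in group if s.critique_score >= threshold]
        rejected = [s for s in group if s.critique_score < threshold]
        for c in chosen:
            for r in rejected:
                pairs.append((question, c.response, r.response))
    return pairs


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


samples = [
    Sample("2+2?", "4", 9.0),
    Sample("2+2?", "5", 2.0),
    Sample("3*3?", "9", 8.5),
]
kept = rejective_filter(samples, threshold=6.0)          # "4" and "9" survive
pairs = build_preference_pairs(samples, threshold=6.0)   # [("2+2?", "4", "5")]
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))                  # zero margin gives log(2)
```

The loss shrinks as the policy raises the likelihood of the accepted answer relative to the rejected one, measured against the reference model, which is how preference pairs teach the model to separate correct from incorrect solutions.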
Applying the Self-Critique pipeline to ChatGLM3-32B produced significant improvements. The enhanced model achieved a 17.5% increase in accuracy on the MATH USER EVAL dataset compared with its initial performance, placing it ahead of other leading models such as InternLM2-Chat-20B and DeepSeek-Chat-67B, which improved by 5.1% and 1.2%, respectively. The model’s language capabilities also improved by 6.8%, indicating that the pipeline maintained a balance between mathematical and language-processing skills.
In conclusion, the researchers have introduced the “Self-Critique” pipeline, an effective method that significantly improves the mathematical problem-solving abilities of LLMs while preserving their linguistic proficiency. It achieves this by using internal feedback for improvement, rejecting ineffective responses, and learning from pairs of correct and incorrect answers. The gains in mathematical accuracy and language processing exhibited by ChatGLM3-32B mark a significant step toward more versatile and intelligent AI systems and point to a promising future for AI research and applications.