
This AI Study from China Introduces MathScale: A Highly Scalable Machine Learning Method for Generating High-Quality Mathematical Reasoning Data with Advanced LLMs.

Large language models (LLMs) that excel at a wide range of problems often falter on complex mathematical reasoning tasks, which demand multi-step reasoning. Instruction Tuning can teach this capability, but its effectiveness is hampered by the limited supply of mathematical reasoning datasets. This scarcity underscores the need to scale up such data so that Instruction Tuning can be fully exploited to improve LLM performance in mathematical problem-solving.

ChatGPT-based Instruction Tuning approaches, exemplified by WizardMath and MetaMath, have shown promise in improving mathematical instruction. These methods use reinforced Evol-Instruct and bootstrapping strategies to evolve questions and expand datasets, but their scalability is curtailed by their reliance on manually designed operations.

To address this, researchers from The Chinese University of Hong Kong, Microsoft Research, and the Shenzhen Research Institute of Big Data have introduced a novel method called MathScale. It aims to resolve the scalability and quality issues of mathematical reasoning datasets by extracting high-level concepts from existing math questions, constructing a concept graph that captures the connections between them, and generating new questions from randomly sampled concepts. The effectiveness of MathScale is demonstrated through the resulting MathScaleQA dataset and its performance on the comprehensive benchmark MWPBENCH.

More specifically, MathScale generates its dataset in four steps. First, it uses GPT-3.5 to extract high-level concepts (topics and knowledge points) from existing math questions, decoupling later generation from the original questions. Second, it builds a concept graph from these extractions, representing the connections between different concepts. Third, it applies a random walk algorithm to sample concept subsets from the graph, ensuring broad coverage of the dataset. Finally, it prompts the LLM to generate fresh math problems from the sampled concepts; a minimal sketch of the pipeline follows.
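To make the four steps concrete, here is a minimal Python sketch of the pipeline. The `llm_complete` helper is a hypothetical stand-in for calls to a chat LLM such as GPT-3.5, and the prompts and adjacency-set graph representation are illustrative assumptions, not the authors' exact implementation.

```python
import random
from collections import defaultdict

def llm_complete(prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion API (e.g., GPT-3.5)."""
    raise NotImplementedError

# Step 1: extract high-level concepts (topics, knowledge points) from a seed question.
def extract_concepts(question: str) -> list[str]:
    reply = llm_complete(
        "List the high-level math topics and knowledge points needed to "
        f"solve this problem, one per line:\n{question}"
    )
    return [line.strip() for line in reply.splitlines() if line.strip()]

# Step 2: build a concept graph; concepts co-occurring in a question get connected.
def build_concept_graph(seed_questions: list[str]) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = defaultdict(set)
    for q in seed_questions:
        concepts = extract_concepts(q)
        for a in concepts:
            for b in concepts:
                if a != b:
                    graph[a].add(b)
    return graph

# Step 3: sample a small concept subset via a random walk over the graph.
def sample_concepts(graph: dict[str, set[str]], walk_len: int = 3) -> list[str]:
    node = random.choice(list(graph))
    walk = [node]
    for _ in range(walk_len - 1):
        neighbors = graph[node] - set(walk)  # avoid revisiting concepts
        if not neighbors:
            break
        node = random.choice(sorted(neighbors))
        walk.append(node)
    return walk

# Step 4: generate a fresh question and solution conditioned on the sampled concepts.
def generate_question(concepts: list[str]) -> str:
    return llm_complete(
        "Write a new math word problem with a step-by-step solution that "
        f"combines these concepts: {', '.join(concepts)}"
    )
```

Repeating the sample-and-generate loop is what lets the dataset grow with the available compute budget, producing the MathScaleQA dataset used for fine-tuning.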

MathScale-7B has been found to outperform equivalent-size models, including LLaMA-2 7B, LLaMA-2 13B, and Mistral 7B, on the MWPBENCH benchmark. It achieved a micro-average accuracy of 35.0% and a macro-average accuracy of 37.5%, surpassing the best peers of equivalent size by 42.9% and 43.7%, respectively. MathScale-Mistral demonstrated performance parity with GPT-3.5-Turbo on both the micro and macro averages, further attesting to the approach's effectiveness.

In conclusion, MathScale provides a simple and scalable approach to generating high-quality mathematical reasoning data with LLMs. Notably, MWPBENCH offers a comprehensive benchmark covering math word problems at various difficulty levels. By outperforming its equivalently sized counterparts on MWPBENCH, MathScale-7B advances mathematical reasoning while enabling consistent, fair model evaluation for the research community.

Credit goes to the researchers behind this project for their contributions to advancing LLM capabilities. For further details about this research, check out the published paper via the source link provided.
