Large Language Models (LLMs) and their multi-modal counterparts (MLLMs), which are crucial to advancing artificial general intelligence (AGI), struggle with visual mathematical problems, especially those involving geometric figures and spatial relationships. Although techniques for vision-language integration and text-based mathematical problem-solving have advanced, progress in the multi-modal mathematical domain has been limited.
A…
