Large Multimodal Models (LMMs) exhibit impressive problem-solving abilities across varied tasks such as zero-shot classification, retrieval, and multimodal question answering. However, a significant gap remains between even robust LMMs and expert-level AI, especially in complex perception and reasoning that demands domain-specific expertise. This study introduces CMMMU, a pioneering Chinese benchmark designed to evaluate LMMs on multi-disciplinary tasks and to advance the development of bilingual LMMs.
CMMMU, short for Chinese Massive Multi-discipline Multimodal Understanding, is a comprehensive benchmark of 12,000 manually collected Chinese multimodal questions drawn from college quizzes and textbooks spanning six core disciplines. Beyond testing LMMs on complex, expert-level tasks, the benchmark annotates each question with its detailed subfield and image type, enabling fine-grained analysis, as sketched below.
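To make the annotation scheme concrete, here is a minimal sketch of what a single benchmark record might look like; the field names and the example question are illustrative assumptions for this article, not CMMMU's actual schema or data.

```python
# Hypothetical CMMMU-style question record; field names and content
# are illustrative assumptions, not the benchmark's actual schema.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MultimodalQuestion:
    question_id: str
    subject: str                   # one of the six core disciplines
    subfield: str                  # fine-grained topic annotation
    img_type: str                  # e.g. "chart", "diagram", "photograph"
    question: str                  # question text in Chinese
    options: Optional[List[str]]   # multiple-choice options, if any
    answer: str                    # gold answer for rule-based scoring

example = MultimodalQuestion(
    question_id="art_design_0001",
    subject="Art & Design",
    subfield="Design History",
    img_type="diagram",
    question="图中所示的设计风格属于哪个流派？",  # "Which movement does the style shown belong to?"
    options=["A. 包豪斯", "B. 装饰艺术", "C. 极简主义", "D. 孟菲斯"],
    answer="A",
)
```

Annotating subfield and image type alongside each question is what allows accuracy to be broken down per discipline and per image category rather than reported as a single aggregate number.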
A systematic data collection process underpins CMMMU’s quality and diversity. The authors collect sources in compliance with license requirements, recruit crowdsourced annotators to annotate them, and supplement questions to ensure balanced representation across subjects. The authors then verify each question themselves, filtering out those that are too challenging or that fall short of college-level standards.
The evaluation covers both closed-source and open-source LMMs, using a zero-shot setting rather than fine-tuning to assess each model’s raw capabilities. A precise, rule-based evaluation pipeline scores the responses, with micro-average accuracy as the reported metric. The study also analyzes the errors that even advanced LMMs make, highlighting the long road toward expert-level proficiency.
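To illustrate the scoring just described, the sketch below computes micro-average accuracy over rule-based judgments; the regex-based answer extraction is a simplified assumption for illustration, not CMMMU's exact pipeline.

```python
import re
from typing import List, Tuple

def extract_choice(response: str) -> str:
    """Rule-based extraction of a multiple-choice answer letter.

    A simplified stand-in for a benchmark's extraction rules:
    take the first standalone option letter A-D in the response.
    """
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else ""

def micro_average_accuracy(preds_and_golds: List[Tuple[str, str]]) -> float:
    """Micro-average accuracy: pool all questions across subjects
    and divide the total correct by the total answered."""
    if not preds_and_golds:
        return 0.0
    correct = sum(1 for pred, gold in preds_and_golds if pred == gold)
    return correct / len(preds_and_golds)

# Example: score free-form model responses against gold answers.
responses = ["答案是 B。", "I think the answer is A.", "C"]
golds = ["B", "A", "D"]
pairs = [(extract_choice(r), g) for r, g in zip(responses, golds)]
print(f"micro-average accuracy = {micro_average_accuracy(pairs):.2f}")  # 0.67
```

Note that micro-averaging pools every question into a single accuracy figure, so disciplines with more questions weigh more heavily than they would under a per-subject macro-average.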
Intriguingly, the study reveals a smaller performance gap between open-source and closed-source LMMs in the Chinese setting, indicating considerable room for improvement. Even the most advanced LMMs, GPT-4V and Qwen-VL-Plus, attain only 42% and 36% accuracy, respectively.
The CMMMU benchmark is a significant step toward more capable AI. It is an effective evaluator of LMMs’ basic perceptual skills, intricate reasoning, and deep domain expertise, and the research offers insight into the reasoning capabilities of bilingual LMMs in Chinese and English. This work paves the way for AI that can rival professionals across various fields.
All credit for this research goes to the researchers of this project. Check out the Paper and Project.