Are We Evaluating Large Vision-Language Models Correctly? This AI Research from China Presents MMStar: An Elite Vision-Critical Multi-Modal Benchmark.

Researchers have identified gaps in the evaluation methods for Large Vision-Language Models (LVLMs). First, many evaluation samples do not actually require the visual content, so a text-only model can answer them; second, unintentional data leakage during training can inflate scores. They also point out that single-task benchmarks cannot accurately assess the full range of LVLMs' multi-modal capabilities.

To remedy these issues, they present MMStar, a multi-modal benchmark designed to offer a more thorough and accurate evaluation of LVLMs. MMStar comprises 1,500 samples meticulously selected by human reviewers, covering six core capabilities and 18 detailed axes.

The development of MMStar entailed three main stages. The first was data curation, in which every selected evaluation sample had to satisfy three criteria: visual dependency, minimal data leakage, and the need for advanced multi-modal capabilities to solve. An automated pipeline performed initial sample filtering, followed by a human review to ensure that each sample met the prescribed criteria.
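To make the automated filtering step concrete, below is a minimal sketch of how such a pre-filter could work: if text-only LLMs can answer a question without seeing the image, the sample is likely not visually dependent (or its answer has leaked into training data) and is dropped before human review. This is not the authors' actual pipeline; the `Sample` structure, the `ask_llm` helper, and the majority-vote threshold are hypothetical assumptions for illustration.

```python
# Hypothetical sketch: drop samples that text-only LLMs can already answer.
from dataclasses import dataclass

@dataclass
class Sample:
    question: str
    choices: list[str]
    answer: str        # ground-truth option letter, e.g. "B"
    image_path: str

def ask_llm(prompt: str) -> str:
    """Assumed stub for a text-only LLM client; replace with a real API call."""
    raise NotImplementedError

def needs_visual(sample: Sample, n_queries: int = 3) -> bool:
    """Treat a sample as visually dependent only if text-only models fail it.

    If an LLM answers correctly WITHOUT the image, the answer is likely
    recoverable from the text alone (or memorized from training data),
    so the sample should be filtered out before human review.
    """
    prompt = (
        f"Question: {sample.question}\n"
        + "\n".join(sample.choices)
        + "\nAnswer with the option letter only."
    )
    correct = sum(ask_llm(prompt).strip() == sample.answer
                  for _ in range(n_queries))
    return correct <= n_queries // 2   # majority of text-only attempts fail

candidates: list[Sample] = []   # populate from existing benchmark pools
visually_dependent = [s for s in candidates if needs_visual(s)]
# Human reviewers then vet `visually_dependent` against the full criteria.
```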

The second stage focused on defining the core capabilities used to thoroughly evaluate the LVLMs' diverse multi-modal skills. This was done by defining six core capability dimensions, subdivided into eighteen detailed axes, informed by existing benchmarks.

Finally, the evaluation metrics were designed. Two new metrics assess, respectively, the actual performance gain a model derives from multi-modal training and the degree of data leakage introduced during that training.
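As a rough illustration of what such metrics could look like, the sketch below computes a gain score (accuracy with images minus accuracy without) and a leakage score (how much a model's image-free accuracy exceeds that of its underlying text-only language model). The function names, the example numbers, and the clamp at zero are assumptions for illustration; consult the MMStar paper for the exact definitions.

```python
def multimodal_gain(acc_with_image: float, acc_without_image: float) -> float:
    """Performance actually attributable to the visual input: how much
    better the LVLM scores when the image is provided."""
    return acc_with_image - acc_without_image

def multimodal_leakage(acc_without_image: float, acc_text_base: float) -> float:
    """Suspected leakage from multi-modal training: how much the LVLM beats
    its own text-only base LLM even when NO image is shown; clamped at zero."""
    return max(0.0, acc_without_image - acc_text_base)

# Illustrative numbers only, not results from the paper:
mg = multimodal_gain(acc_with_image=0.55, acc_without_image=0.30)
ml = multimodal_leakage(acc_without_image=0.30, acc_text_base=0.24)
print(f"multi-modal gain: {mg:.2f}, multi-modal leakage: {ml:.2f}")
```

A high gain with low leakage suggests a model genuinely benefits from its visual training, whereas a high leakage score suggests its image-free performance is propped up by memorized evaluation data.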

When various LVLMs were tested on MMStar, the best average scores were only slightly above 50%, indicating considerable room for improvement. This underlines the pressing need for rigorous, vision-dependent evaluation methods such as MMStar to further advance the capabilities of LVLMs.

This research was undertaken by a team from the University of Science and Technology of China, The Chinese University of Hong Kong, and Shanghai AI Laboratory. Their work on MMStar is expected to contribute significantly to the study and advancement of LVLMs.
