Multi-modal generative models combine diverse data formats such as text, images, and videos to enhance artificial intelligence (AI) applications across various fields. However, the challenges in their optimization, particularly the discord between data and model development approaches, hinder progress. Current methodologies either focus on refining model architectures and algorithms or advancing data processing techniques, limiting…
