Meta’s Fundamental AI Research (FAIR) team has recently made significant contributions to AI research, models, and datasets that align with its principles of openness, collaboration, quality, and scalability. Through these, the team aims to encourage innovation and responsible development in AI.
Meta FAIR has released six key research artifacts publicly as part of its aim to promote openness and collaboration within the AI community. These artifacts include cutting-edge models for converting images to text and text to music, multi-token prediction models, and a new technique for detecting AI-generated speech. They are intended to inspire further research and accelerate progress in AI technologies.
One of the most significant releases is the Meta Chameleon model family. This collection of models uses a unified architecture that takes both text and images as input and produces both as output. In contrast to conventional models that rely on diffusion-based learning, Meta Chameleon represents both text and images as discrete tokens, making it more scalable and efficient. This approach opens the door to numerous applications, such as generating creative captions for images or combining text prompts and images to create new scenes. Key components of the Chameleon 7B and 34B models are available under a research-only license.
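The token-based, mixed-modal idea can be illustrated with a minimal sketch. This is a hypothetical toy, not Meta's code: the vocabulary sizes, the character-level "tokenizer", and the integer "image quantizer" are all stand-ins chosen for illustration. The point is only that text tokens and image tokens share one id space, so a single sequence model can consume and emit either modality.

```python
# Toy sketch of early-fusion tokenization (hypothetical, not Meta's code):
# text and images are both mapped to discrete tokens in one shared
# vocabulary, so one transformer can model mixed sequences of the two.

TEXT_VOCAB_SIZE = 50_000      # assumed size of the text vocabulary
IMAGE_CODEBOOK_SIZE = 8_192   # assumed size of a learned image codebook

def text_to_tokens(text):
    """Toy stand-in for a real subword tokenizer: one id per character."""
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def image_to_tokens(patches):
    """Toy stand-in for a vector-quantized image encoder: each patch
    (here just an int) maps to a code, offset past the text vocabulary."""
    return [TEXT_VOCAB_SIZE + (p % IMAGE_CODEBOOK_SIZE) for p in patches]

def interleave(text, patches):
    """Build one mixed-modal token sequence a model could be trained on."""
    return text_to_tokens(text) + image_to_tokens(patches)

seq = interleave("a cat", [3, 17, 255])
# every token, whether from text or image, lives in the same id space
assert all(0 <= t < TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE for t in seq)
```

Because generation is then ordinary next-token prediction over this shared vocabulary, the same model can caption an image or render tokens back into one, which is what distinguishes this design from separate encoder/decoder pipelines.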
Another notable contribution is a new multi-token prediction approach for language models. Unlike traditional language models that predict only the next word in a sequence, Meta FAIR’s approach predicts several future words at once, improving the model’s capabilities and training efficiency while also increasing inference speed. Models pre-trained for code completion using this method are available under a non-commercial, research-only license.
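The training-target side of this idea can be sketched in a few lines. This is a simplified illustration, not Meta's implementation: instead of a single next-token label per position, each position gets the n tokens that follow it, which would be predicted through n parallel output heads sharing one trunk.

```python
# Hypothetical sketch of multi-token prediction targets (not Meta's code):
# for each position t in the sequence, the labels are the next n_future
# tokens rather than just the single next token.

def multi_token_targets(tokens, n_future):
    """For each position t, collect the n_future tokens that follow it.
    Positions too close to the end of the sequence are dropped."""
    targets = []
    for t in range(len(tokens) - n_future):
        targets.append(tokens[t + 1 : t + 1 + n_future])
    return targets

tokens = [10, 11, 12, 13, 14]
print(multi_token_targets(tokens, 2))
# → [[11, 12], [12, 13], [13, 14]]
```

Supervising several future positions per step gives the model a denser training signal, and at inference the extra predicted tokens can be used to draft several tokens per forward pass, which is where the speedup comes from.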
Further, Meta FAIR has developed JASCO, a text-to-music generation model that offers finer control over the generated music by accepting various conditioning inputs. Meta FAIR has also introduced AudioSeal, an audio watermarking technique for detecting AI-generated speech that speeds up detection by up to 485 times compared to previous methods. AudioSeal is part of Meta FAIR’s broader effort to mitigate misuse of generative AI tools.
In collaboration with external partners, Meta FAIR also released the PRISM dataset, which maps diversity in dialogue and preference data and encourages more inclusive AI development.
Additional tools from Meta FAIR include the “DIG In” indicators, accompanied by a study of more than 65,000 annotations that examines how perceptions of geographic representation differ. A Vendi Score-based guidance method was also introduced to increase the diversity of representation while maintaining quality and consistency.
These contributions underscore Meta FAIR’s commitment to pioneering AI research while ensuring responsible and inclusive development. With these advancements, Meta FAIR hopes to drive innovation and foster collaboration in addressing AI’s challenges and opportunities.