In a significant reveal that has shaken the technology world, Kyutai introduced Moshi, a pioneering real-time native multimodal foundation model. The new AI model matches, and in some respects exceeds, capabilities previously demonstrated by OpenAI's GPT-4o. Moshi can understand and express emotion, speak in various accents (including French), and handle two audio streams simultaneously, allowing it to listen and speak at the same time. Fine-tuned on 100,000 synthetic "oral-style" conversations, Moshi is also designed to be accessible to a wide range of users: Kyutai offers a smaller variant that can run on a MacBook or a consumer-grade GPU.
Moshi is built around the principle of responsible AI. To that end, Kyutai is incorporating safeguards such as watermarking to identify AI-generated audio, an effort still in progress. The decision to release Moshi as an open-source project underscores Kyutai's commitment to transparency and collaborative development within the AI community.
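Kyutai has not described how its watermarking will work, but one classic technique for marking generated audio is spread-spectrum embedding: a low-amplitude pseudorandom signature, derived from a secret key, is added to the waveform and later detected by correlation. The sketch below is a toy illustration of that general idea only; the function names, constants, and approach are assumptions, not Kyutai's method.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.05) -> np.ndarray:
    """Hypothetical spread-spectrum embedder: add a low-amplitude
    pseudorandom signature derived from `key` to the waveform."""
    rng = np.random.default_rng(key)
    signature = rng.standard_normal(audio.shape)
    return audio + strength * signature

def detect_watermark(audio: np.ndarray, key: int, threshold: float = 0.025) -> bool:
    """Correlate the waveform against the keyed signature; marked audio
    scores near `strength`, unmarked audio scores near zero."""
    rng = np.random.default_rng(key)
    signature = rng.standard_normal(audio.shape)
    score = float(np.dot(audio, signature)) / audio.size
    return score > threshold

# Toy demo on one second of noise standing in for generated speech.
speech = np.random.default_rng(0).standard_normal(24_000)
marked = embed_watermark(speech, key=1234)
print(detect_watermark(marked, key=1234))   # True: signature present
print(detect_watermark(speech, key=1234))   # False: no signature
```

Real systems must also survive compression and re-recording, which is part of why Kyutai describes this feature as a work in progress.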
At its foundation, Moshi is powered by a 7-billion-parameter multimodal language model that processes speech as both input and output. The model works with dual-channel I/O, generating text tokens and audio codec tokens simultaneously. The base text language model, Helium 7B, was developed from scratch and then trained jointly on text and audio codec tokens. The speech codec, based on Kyutai's Mimi model, achieves a 300x compression factor while capturing both semantic and acoustic information.
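Kyutai has not yet published Moshi's code, but the dual-channel design can be pictured as a per-frame loop: at each timestep the model reads the user's incoming codec tokens and emits its own text token plus one codec token per codebook. The following is a minimal sketch of that interface under stated assumptions; the names and constants (decode_step, N_CODEBOOKS, the vocabulary sizes) are hypothetical, and a real implementation would replace the stand-in body with a transformer forward pass.

```python
import random
from dataclasses import dataclass

# Hypothetical constants; Kyutai has not published these exact values.
TEXT_VOCAB = 32_000    # Helium text vocabulary size (assumption)
AUDIO_VOCAB = 2_048    # codes per codec codebook (assumption)
N_CODEBOOKS = 8        # residual codebooks in the Mimi-style codec (assumption)

@dataclass
class Frame:
    """One timestep of dual-channel output: a text token plus
    one audio code per codec codebook (the model's own voice)."""
    text_token: int
    audio_tokens: list[int]

def decode_step(user_codes: list[int]) -> Frame:
    """Stand-in for the model: consumes the user's codec tokens for this
    frame and emits the model's text + audio tokens for the same frame.
    A real implementation would run a transformer forward pass here."""
    return Frame(
        text_token=random.randrange(TEXT_VOCAB),
        audio_tokens=[random.randrange(AUDIO_VOCAB) for _ in range(N_CODEBOOKS)],
    )

# Full-duplex loop: the user stream is read and the model stream is
# written on every frame, so Moshi listens and speaks concurrently.
for _ in range(5):
    incoming = [random.randrange(AUDIO_VOCAB) for _ in range(N_CODEBOOKS)]
    frame = decode_step(incoming)
    print(frame.text_token, frame.audio_tokens)
```

Because both streams advance frame by frame rather than turn by turn, there is no explicit "your turn/my turn" protocol, which is what enables the low-latency, interruptible conversations Kyutai demonstrated.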
Looking ahead, Kyutai has ambitious plans for Moshi. It intends to release a technical report and open model versions, including the inference codebase, the 7B model, the audio codec, and the full optimized stack. Future versions of Moshi (1.1, 1.2, and 2.0) will refine the model based on user feedback, and its permissive licensing is expected to encourage widespread adoption and innovation.
In conclusion, Moshi illustrates the extraordinary advances in AI a small, focused team can achieve. It demonstrates the power of AI to transform research assistance, brainstorming, language learning, and more. The open-source nature of the model invites collaboration and innovation, ensuring that the benefits of this revolutionary technology are accessible to all.
Lastly, Kyutai thanked its researchers for the breakthrough and directed followers to its Twitter and LinkedIn profiles. It also announced a forthcoming paper, code, and model release, and invited the audience to join its Telegram channel, subscribe to its newsletter, or follow its subreddit.