Mixture-of-experts (MoE) architectures, designed to scale model size while keeping training and inference efficient, are difficult to optimize because their routing decisions are discrete and non-differentiable. Traditional MoEs use a router network that directs input tokens to expert modules, a process that complicates training and can lead to instability and under-specialization of the experts. Recently, researchers from Princeton University and Meta AI introduced Lory, a fully differentiable MoE approach designed for autoregressive language model pre-training.
Lory relies on two major techniques: causal segment routing and similarity-based data batching. The first, causal segment routing, splits the input sequence of tokens into fixed-length segments and, for each segment, merges expert parameters using router weights computed from the preceding segment; this keeps the expert merging operation efficient while preserving the autoregressive nature of language models. Segment-level routing on its own, however, can leave experts insufficiently specialized, a challenge Lory addresses with its second technique: similarity-based data batching. This method constructs training sequences by grouping similar documents together, so that consecutive segments are topically coherent and give the router a meaningful signal for expert specialization.
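To make the causal segment routing idea concrete, here is a minimal, hedged sketch in a PyTorch style. It is not the authors' implementation: the class and function names (`SoftMergedMoE`, `causal_segment_forward`), the feed-forward expert structure, the mean-pooled router input, and the uniform merge used for the first segment are all illustrative assumptions. The sketch only shows the core mechanism described above: experts are combined by a differentiable weighted average of their parameters, and the router weights applied to segment *t* are computed from segment *t − 1*, so no token routes itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftMergedMoE(nn.Module):
    """Illustrative soft MoE layer: experts are combined by weighted-averaging
    their parameters, keeping the layer fully differentiable (no hard routing)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        # Experts are simple feed-forward blocks stored as stacked weights,
        # so a weighted average over experts is a single einsum.
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model, d_ff) * 0.02)
        self.w_out = nn.Parameter(torch.randn(n_experts, d_ff, d_model) * 0.02)

    def merge(self, gate: torch.Tensor):
        # gate: (batch, n_experts) -> one merged expert per sequence in the batch.
        w_in = torch.einsum("be,edf->bdf", gate, self.w_in)
        w_out = torch.einsum("be,efd->bfd", gate, self.w_out)
        return w_in, w_out

    def forward(self, x: torch.Tensor, gate: torch.Tensor):
        # x: (batch, seg_len, d_model); gate comes from the *previous* segment.
        w_in, w_out = self.merge(gate)
        h = F.gelu(torch.einsum("bsd,bdf->bsf", x, w_in))
        return torch.einsum("bsf,bfd->bsd", h, w_out)

    def gate_from_segment(self, segment: torch.Tensor):
        # Pool the segment's hidden states, then compute soft routing weights.
        return F.softmax(self.router(segment.mean(dim=1)), dim=-1)


def causal_segment_forward(layer: SoftMergedMoE, x: torch.Tensor, seg_len: int):
    """Process segment t with expert weights merged from the router's output on
    segment t-1, so routing never depends on the tokens it is applied to."""
    batch, seq_len, _ = x.shape
    n_experts = layer.router.out_features
    # Simplification: uniform merge for the first segment, which has no
    # preceding context to route on.
    gate = torch.full((batch, n_experts), 1.0 / n_experts, device=x.device)
    outputs = []
    for start in range(0, seq_len, seg_len):
        segment = x[:, start:start + seg_len]
        outputs.append(layer(segment, gate))
        # Router weights from the current segment are used for the NEXT one.
        gate = layer.gate_from_segment(segment)
    return torch.cat(outputs, dim=1)
```

Because the merge is a smooth weighted average rather than a discrete dispatch, gradients flow through the router and the experts end to end, which is what allows this style of MoE to be trained without the usual load-balancing tricks. Similarity-based data batching then ensures that the documents filling consecutive segments are related enough for these per-segment routing weights to drive genuine expert specialization.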
Lory's methods yielded marked improvements across several dimensions. In terms of training efficiency and convergence, Lory reached an equivalent loss level with less than half of the training tokens for the 0.3B and 1.5B models, indicating substantially more efficient use of training compute. On language modeling, Lory surpassed parameter-matched dense models, achieving lower perplexity. It also delivered gains on downstream tasks spanning common sense reasoning, reading comprehension, and text classification.
Overall, Lory demonstrates that better-optimized MoE architectures can yield significant advances in autoregressive language model pre-training. Building on the success of these two techniques, future work aims to scale the model further, combine token-level and segment-level routing, and develop efficient decoding methods for Lory. Such advances hold considerable promise for the field of MoEs and for deepening our understanding of language model pre-training.