
Reducing Costs without Sacrificing Performance: Implementing Structured FeedForward Networks (FFNs) in Transformer-Based Large Language Models (LLMs)

Improving the efficiency of Feedforward Neural Networks (FFNs) in Transformer architectures is a significant challenge, particularly when dealing with highly resource-intensive Large Language Models (LLMs). Optimizing these networks is essential for supporting more sustainable AI methods and broadening access to such technologies by lowering operation costs.

Existing techniques for boosting FFN efficiency commonly rely on low-rank approximations and structured matrices, such as LowRank and BlockDense decompositions. These reduce parameter counts and FLOPs but face practical limitations: low-rank approximations can suffer from poor optimization dynamics because the added symmetries introduce saddle points, while structured matrices can slow training and reduce online decoding efficiency owing to poor parallelism on GPUs. These issues limit the applicability of such methods in real-time applications or large-scale deployments.
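To make the two baseline parameterizations concrete, the following is a minimal PyTorch sketch of a low-rank layer and a block-diagonal layer as drop-in replacements for a dense linear layer. The class names, the choice of rank, and the number of blocks are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Approximates a d_in x d_out dense matrix as a product of two thin
    matrices, cutting parameters from d_in*d_out to rank*(d_in + d_out)."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)   # d_in -> rank
        self.up = nn.Linear(rank, d_out, bias=False)    # rank -> d_out

    def forward(self, x):
        return self.up(self.down(x))

class BlockDiagonalLinear(nn.Module):
    """Splits the feature dimension into n_blocks groups, each with its own
    small dense matrix; equivalent to one block-diagonal weight matrix."""
    def __init__(self, d_in: int, d_out: int, n_blocks: int):
        super().__init__()
        assert d_in % n_blocks == 0 and d_out % n_blocks == 0
        self.n_blocks = n_blocks
        self.blocks = nn.ModuleList(
            nn.Linear(d_in // n_blocks, d_out // n_blocks, bias=False)
            for _ in range(n_blocks)
        )

    def forward(self, x):
        chunks = x.chunk(self.n_blocks, dim=-1)
        return torch.cat([blk(c) for blk, c in zip(self.blocks, chunks)], dim=-1)
```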

To address these issues, a research team from Google DeepMind and EPFL proposed a hybrid structure that combines low-rank and block-diagonal matrices, trained with a technique named ‘self-guided training’. The premise is to introduce a dense matrix during the early stages of training and slowly phase it out, allowing the structured matrices to take over. This improves both training stability and convergence speed. It not only addresses computational efficiency but also yields smoother optimization dynamics, reducing loss spikes and instability, a notable improvement over current methodologies.
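The sketch below illustrates one plausible reading of self-guided training: the layer's output is a mix of a dense guide path and the structured path, with the mixing weight annealed from 1 to 0 so the dense matrix is eventually discarded. The linear schedule and the exact mixing rule are assumptions made for illustration; the paper's precise formulation may differ.

```python
import torch
import torch.nn as nn

class SelfGuidedLinear(nn.Module):
    """Blends a dense 'guide' layer with a structured layer during training.
    alpha starts at 1 (all dense) and is annealed to 0, after which only the
    cheap structured path remains and the guide can be dropped entirely."""
    def __init__(self, structured: nn.Module, d_in: int, d_out: int, total_steps: int):
        super().__init__()
        self.structured = structured
        self.dense_guide = nn.Linear(d_in, d_out, bias=False)  # phased out over training
        self.total_steps = total_steps
        self.register_buffer("step", torch.zeros((), dtype=torch.long))

    def forward(self, x):
        if self.training:
            alpha = max(0.0, 1.0 - self.step.item() / self.total_steps)
            self.step += 1
            return alpha * self.dense_guide(x) + (1.0 - alpha) * self.structured(x)
        # At inference only the structured path is evaluated.
        return self.structured(x)
```

Once alpha reaches zero, inference cost is determined by the structured matrices alone, which is where the reported FFN speed-ups come from.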

The research uses a structured linear parameterization in which the FFN layers are approximated by combinations of low-rank and block-diagonal matrices. The key contribution is the ‘self-guided training’ technique, in which a dense matrix assists during the early stages of training and gradually gives way to the efficient structured form. The proposed models were tested at scales from 110M to 1.3B parameters, demonstrating scalability and robustness.
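Putting the pieces together, a structured FFN block might look like the sketch below, which reuses the LowRankLinear and BlockDiagonalLinear classes defined above. The particular split (low-rank up-projection, block-diagonal down-projection) and the GELU activation are assumptions for illustration; the paper explores combinations of both structures rather than prescribing this exact layout.

```python
import torch.nn as nn

class StructuredFFN(nn.Module):
    """Transformer FFN block with structured projections in place of dense ones."""
    def __init__(self, d_model: int, d_hidden: int, rank: int, n_blocks: int):
        super().__init__()
        self.up = LowRankLinear(d_model, d_hidden, rank)              # from the sketch above
        self.act = nn.GELU()
        self.down = BlockDiagonalLinear(d_hidden, d_model, n_blocks)  # from the sketch above

    def forward(self, x):
        return self.down(self.act(self.up(x)))
```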

The research shows substantial improvements in training and inference efficiency. The structured FFN models achieved a 1.35x training speed-up and a 2.5x faster FFN at inference, with only a small increase in perplexity. Applying ‘self-guided training’ reduced perplexity by 0.4 on a 1.3B-parameter model with unchanged training FLOPs. Together, the lower perplexity and higher throughput demonstrate the approach's effectiveness over traditional dense FFNs.

In conclusion, this research makes a significant contribution to optimizing large language models through a hybrid structured FFN approach combined with self-guided training. The method overcomes critical limitations of existing techniques, yielding improved training efficiency and model performance. By making large-scale models more computationally efficient and accessible, it points toward more sustainable and democratized AI development, and underscores that performance can be maintained, or even improved, while cutting costs.
