
A Change in Perspective: MoRA’s Contribution to Parameter-Efficient Fine-Tuning Techniques

Large language models (LLMs) are adapted to specific tasks by fine-tuning their parameters. Full Fine-Tuning (FFT) updates all parameters, while Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) update only a small subset, reducing memory requirements. LoRA learns a pair of low-rank matrices whose product approximates the weight update; because these matrices can be merged back into the original model parameters, the adapted model incurs no additional inference cost. Several methods seek to improve LoRA’s performance in LLMs and typically validate their efficiency on the GLUE benchmark, either through better scores or through fewer trainable parameters.
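
To make the mechanism concrete, here is a minimal PyTorch sketch of a LoRA-adapted linear layer; the class name, rank, and scaling choices are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: a frozen weight W is augmented with a
    low-rank product B @ A, so only r * (d_in + d_out) parameters train."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)  # frozen W
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # zero-init: update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

    def merge(self) -> None:
        # Fold B @ A into W; after merging, inference costs nothing extra.
        self.weight.data += self.scale * (self.B @ self.A)
```

After `merge()`, the layer behaves as a plain linear layer with the adapted weight, which is why LoRA adds no inference overhead.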

LoRA’s performance has been improved by variants such as DoRA, which decomposes the weight into magnitude and direction components; LoRA+, which applies different learning rates to the two low-rank matrices; and ReLoRA, which merges the low-rank updates into the model during training. Most of these variants are evaluated on instruction tuning or GLUE tasks, which may not capture their full effectiveness. Reasoning tasks provide a stricter test, but they often require extensive training data, which makes it harder to attribute gains to the fine-tuning method itself.
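
As one illustration of these variants, LoRA+’s differential learning rates amount to separate optimizer parameter groups; the attribute names (`A`, `B`), base rate, and 16x ratio below are illustrative assumptions, not values from the LoRA+ paper.

```python
import torch

def loraplus_optimizer(model: torch.nn.Module, base_lr: float = 2e-4, ratio: float = 16.0):
    """Sketch of LoRA+: give the B matrices a larger learning rate than the A
    matrices. Assumes LoRA modules expose factors named '...A' and '...B'."""
    a_params = [p for n, p in model.named_parameters() if n.endswith(".A")]
    b_params = [p for n, p in model.named_parameters() if n.endswith(".B")]
    return torch.optim.AdamW([
        {"params": a_params, "lr": base_lr},
        {"params": b_params, "lr": base_lr * ratio},  # B learns faster than A
    ])
```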

Researchers from Beihang University and Microsoft proposed a method called MoRA, which replaces LoRA’s pair of low-rank matrices with a single square matrix to achieve a high-rank update with the same number of trainable parameters. Because the square matrix does not match the model’s input and output dimensions, MoRA uses non-parameterized operators to reduce the input dimension and increase the output dimension, so the learned weight can still be merged back into the LLM. These operators can be implemented in several ways, including truncating dimensions, sharing rows and columns, and reshaping inputs.
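
A rough sketch of this idea, using the reshaping operator and assuming the hidden dimension is divisible by the square matrix size, might look as follows; `MoRALinear` and the divisor heuristic are our illustrative choices, not the authors’ code.

```python
import torch
import torch.nn as nn

class MoRALinear(nn.Module):
    """Sketch of MoRA with reshape-based compress/decompress: one trainable
    square matrix M (r_hat x r_hat) is applied to chunks of the input, giving
    a high-rank (block-diagonal) update under roughly LoRA's parameter budget."""

    def __init__(self, d: int, r: int = 8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d, d), requires_grad=False)  # frozen W
        target = int((2 * d * r) ** 0.5)  # match LoRA's 2*d*r trainable parameters
        self.r_hat = max(g for g in range(1, target + 1) if d % g == 0)  # divisibility assumption
        self.M = nn.Parameter(torch.zeros(self.r_hat, self.r_hat))  # trainable square matrix

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, d = x.shape
        chunks = x.view(b, d // self.r_hat, self.r_hat)   # compress: reshape into chunks
        update = (chunks @ self.M.T).reshape(b, d)        # apply M, then decompress
        return x @ self.weight.T + update

    def merge(self) -> None:
        # The fixed reshape operators make the update a block-diagonal d x d
        # matrix, so it can be folded into W just like a LoRA update.
        blocks = [self.M] * (self.weight.shape[0] // self.r_hat)
        self.weight.data += torch.block_diag(*blocks)
```

Because M acts at full rank on each chunk, the effective update can reach rank d, whereas LoRA’s update is capped at r; that is the sense in which MoRA trades the same parameter count for a higher-rank update.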

The researchers evaluated MoRA across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. MoRA’s performance closely matched LoRA’s on instruction tuning and mathematical reasoning, but it surpassed LoRA on continual pretraining in the biomedical and financial domains, thanks to its high-rank updating. The study also showed that different tasks demand different update ranks: a low rank suffices for tasks such as instruction tuning but falls short on others, where the rank must be increased to reach parity with FFT.

To conclude, the researchers found that MoRA’s strength lies in its high-rank updating, which handles memory-intensive tasks efficiently and outperforms LoRA on continual pretraining and memory tasks. MoRA thus represents a meaningful advance in PEFT techniques.
