Multitask learning (MTL) trains a single model to perform several tasks at once, sharing information across tasks to boost performance. Despite its benefits, MTL poses challenges of its own, such as managing large models and balancing optimization across tasks.
Current solutions to under-optimization problems in MTL rely on gradient manipulation techniques, which can become computationally costly. Instead of following the gradient of the average loss, these techniques compute a new update direction at every step so that all task losses decrease in a balanced way. Doing so, however, requires computing and storing the gradient of every task at each iteration, which incurs significant space and time costs.
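To see where that cost comes from, here is a minimal PyTorch-style sketch (our own illustration, not the code of any particular method) of what gradient manipulation approaches must do at every step: one backward pass and one stored gradient copy per task.

```python
import torch

def per_task_gradients(model, task_losses):
    """Illustrative sketch of the per-step cost of gradient manipulation methods.

    `model` and `task_losses` (a list of K scalar losses) are placeholders;
    the point is the K backward passes and the K x |theta| gradient matrix.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for loss in task_losses:                       # K backward passes per iteration
        g = torch.autograd.grad(loss, params, retain_graph=True)
        grads.append(torch.cat([x.reshape(-1) for x in g]))  # store full gradient per task
    return torch.stack(grads)                      # K x |theta| matrix handed to the solver
```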
To overcome these limitations, a team of researchers from The University of Texas at Austin, Salesforce AI Research, and Sony AI introduced Fast Adaptive Multitask Optimization (FAMO). FAMO is designed to tackle the under-optimization issue in MTL without the computational burden of existing gradient manipulation techniques: it dynamically adjusts task weights so that losses decrease in a balanced way, relying on the loss history rather than on all task gradients.
FAMO aims to decrease all task losses at roughly equal rates. It defines each task's rate of improvement as the relative change in its loss over time, and formulates an optimization problem in which FAMO seeks the update direction that maximizes the worst-case improvement rate across tasks.
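In symbols (the notation here is ours, assuming per-task losses ℓ_i, parameters θ_t, and step size α), the improvement rate and the max-min problem can be written as:

$$
r_i(d) = \frac{\ell_i(\theta_t) - \ell_i(\theta_t - \alpha d)}{\alpha\,\ell_i(\theta_t)},
\qquad
d_t = \arg\max_{\|d\| \le 1} \; \min_i \; r_i(d).
$$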
Rather than solving this optimization problem exactly at every step, FAMO amortizes the computation over the course of training. It maintains a set of task weights, takes a single gradient step on the parameters that define them at each iteration, and uses the observed change in log losses to approximate the required gradient.
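The sketch below gives a minimal PyTorch-style reading of this amortized update; it is our interpretation of the description above, not the authors' implementation. Names such as `FAMOWeighter`, `num_tasks`, and `lr_weights` are our own, and details like the weight learning rate, the normalization of the update, and the exact sign convention are assumptions.

```python
import torch

class FAMOWeighter:
    """Sketch of a FAMO-style task weighter (assumption-laden, not official code)."""

    def __init__(self, num_tasks, lr_weights=0.025, eps=1e-8):
        self.xi = torch.zeros(num_tasks, requires_grad=True)  # task-weight logits
        self.opt = torch.optim.Adam([self.xi], lr=lr_weights)
        self.prev_log_losses = None
        self.eps = eps

    def combine(self, losses):
        """Return a single scalar to backpropagate (losses: 1-D tensor, assumed positive)."""
        w = torch.softmax(self.xi, dim=0).detach()
        # weighted sum of log-losses -> one backward pass, no per-task gradients stored
        return (w * torch.log(losses + self.eps)).sum()

    def update_weights(self, new_losses):
        """One gradient step on the logits, driven by the observed change in log losses."""
        new_log = torch.log(new_losses.detach() + self.eps)
        if self.prev_log_losses is not None:
            delta = self.prev_log_losses - new_log            # per-task improvement
            w = torch.softmax(self.xi, dim=0)
            self.opt.zero_grad()
            # shift weight toward tasks that improved least (sign convention is an assumption)
            (w * delta).sum().backward()
            self.opt.step()
        self.prev_log_losses = new_log
```

In a training loop, `combine(losses)` would produce the scalar backpropagated through the shared model, and `update_weights(new_losses)` would be called after the model's optimizer step, so the only extra work per iteration is a single gradient step on the small weight vector.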
To validate FAMO, empirical experiments were conducted in a variety of settings. A 2-task problem demonstrated FAMO's effectiveness in mitigating conflicting gradients. In comparisons with state-of-the-art methods on multitask supervised and reinforcement learning benchmarks, FAMO consistently delivered significant efficiency gains, especially in training time.
FAMO offers a promising solution to MTL's challenges by dynamically adjusting task weights and amortizing computation over time. This removes the need for extensive per-task gradient computation while improving performance. Its ability to consistently enhance performance across varied MTL scenarios reflects both its effectiveness and its efficiency, making it a valuable contribution to multitask learning and to the development of efficient machine learning models.