Language models (LMs), used in applications such as autocomplete and translation, are trained on vast amounts of text data. That data, however, raises significant privacy and copyright concerns: the inadvertent inclusion of private or copyrighted content in training sets can create legal and ethical liabilities. This has stimulated research into machine unlearning, which transforms a trained model so that it behaves as if it had never learned certain data, while preserving the model's performance and efficiency.
Machine unlearning methods fall into two categories: exact and approximate. Exact unlearning produces a model identical to one retrained from scratch without the 'forgotten' data, but this is computationally prohibitive for large LMs. Approximate unlearning instead relies on techniques such as gradient-ascent optimization, locally-informed unlearning, in-context unlearning, and task- or behavior-specific unlearning (a gradient-ascent sketch follows below). However, existing assessments of unlearning effectiveness tend to be task-specific, lack comprehensiveness, and overlook real-world deployment considerations such as scalability and the handling of multiple unlearning requests.
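As a rough illustration of the gradient-ascent approach mentioned above, the minimal sketch below takes optimizer steps that increase the loss on the forget set. The model and forget_loader are placeholders (a causal LM with a Hugging Face-style loss interface is assumed); this is an illustration of the idea, not the implementation evaluated in the paper.

```python
import torch

def gradient_ascent_unlearn(model, forget_loader, lr=1e-5, steps=100, device="cpu"):
    """Sketch of gradient-ascent unlearning: push the model *away* from the forget set."""
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    step = 0
    for batch in forget_loader:
        input_ids = batch["input_ids"].to(device)
        # Assumes a causal LM that returns a scalar cross-entropy loss when
        # labels are provided (e.g., a Hugging Face-style interface).
        loss = model(input_ids=input_ids, labels=input_ids).loss
        (-loss).backward()       # negating the loss turns gradient descent into ascent
        optimizer.step()
        optimizer.zero_grad()
        step += 1
        if step >= steps:
            break
    return model
```

In practice such methods are usually combined with a retain-set term or early stopping, since unbounded ascent quickly destroys overall model utility, which is one of the trade-offs MUSE is designed to surface.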
Addressing these challenges, researchers from the University of Washington, Princeton University, the University of Southern California, the University of Chicago, and Google Research have proposed a comprehensive evaluation framework termed MUSE (Machine Unlearning Six-Way Evaluation). MUSE evaluates six properties vital for practical unlearning: removal of verbatim memorization, removal of knowledge memorization, prevention of privacy leakage, preservation of utility on data not meant to be forgotten, scalability to large forget sets, and sustainability over successive unlearning requests. The assessment was conducted on two representative unlearning corpora, one of books and one of news articles.
MUSE thus prescribes six criteria that cater to both the data owner and the model deployer: no verbatim memorization, no knowledge memorization, no privacy leakage, utility preservation, scalability, and sustained performance over multiple unlearning requests. Together these measures expose a method's strengths and weaknesses and point to concrete avenues for improvement (a sketch of a verbatim-memorization check follows below).
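To make the first criterion concrete, the sketch below illustrates one way a verbatim-memorization check could look: prompt the model with a prefix from the forget set and score how closely its continuation matches the true continuation. The overlap score here is a simple ROUGE-L-style LCS ratio, not necessarily the exact metric used in MUSE, and generate_continuation is a hypothetical decoding routine standing in for whatever generation method is used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def verbatim_memorization_score(generate_continuation, forget_examples):
    """Average LCS ratio between generated and true continuations (higher = more memorized).

    forget_examples: iterable of (prefix, true_continuation) string pairs from the forget set.
    generate_continuation: callable mapping a prefix string to the model's continuation string.
    """
    scores = []
    for prefix, true_continuation in forget_examples:
        generated = generate_continuation(prefix).split()
        reference = true_continuation.split()
        scores.append(lcs_length(generated, reference) / max(len(reference), 1))
    return sum(scores) / max(len(scores), 1)
```

A successfully unlearned model should score close to a model that never saw the forget set, while the same prompting procedure on the retain set should show little change, which is how utility preservation is distinguished from over-unlearning.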
Applying MUSE to eight unlearning methods highlights significant challenges. While most methods effectively removed verbatim and knowledge memorization, they struggled to prevent privacy leakage, frequently over- or under-unlearning. Preserving model utility proved especially difficult, with some models becoming unusable after unlearning. Scalability degraded as forget sets grew, and continuous, repeated unlearning requests led to a steady decline in model performance.
These findings underscore the substantial trade-offs and limitations of current unlearning techniques and the need for robust, balanced machine unlearning methods that serve both data owners and model deployers. Such methods would ideally manage these trade-offs while keeping models effective, efficient, and, above all, ethically sound.
In summary, the research paper presents MUSE, a comprehensive framework for evaluating machine unlearning in LMs. Its results illuminate the difficulties and trade-offs inherent in current unlearning approaches and underline the urgent need for more advanced and balanced machine unlearning methods that can meet the demands of real-world applications built on language models.