Large language models (LLMs) are capable of memorizing and reproducing their training data, which can create substantial privacy and copyright issues, particularly in commercial environments. These concerns are especially acute for models that generate code, since they may unintentionally reproduce code snippets verbatim and thereby conflict with licensing terms that restrict commercial use. Models may also reveal personally identifiable information (PII) or other sensitive data.
To address these issues, researchers have developed various techniques, such as post-training “unlearning” and model editing. Arguably, though, the most effective strategy is to prevent memorization during the initial training phase rather than relying solely on such after-the-fact adjustments.
A team of researchers from the University of Maryland, the ELLIS Institute Tübingen, and the Max Planck Institute for Intelligent Systems has devised a new training technique dubbed “goldfish loss.” The method excludes a pseudorandom subset of tokens (the units of text a model processes) from the loss computation during training, which prevents the model from memorizing and later reproducing exact sequences from its training data.
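To make the mechanism concrete, the following is a minimal PyTorch sketch of a random-drop loss in the spirit of the goldfish loss; the function name, the default 1-in-k drop rate, and the assumption that labels are already shifted for next-token prediction are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def goldfish_style_loss(logits, labels, k=4):
    """Cross-entropy loss that excludes roughly one in k target tokens.

    logits: (batch, seq_len, vocab) next-token predictions
    labels: (batch, seq_len) target token ids, already shifted for causal LM
    k:      on average, one out of every k tokens is dropped from the loss
    """
    # Dropped positions never contribute a gradient, so the model is never
    # directly supervised to reproduce those exact tokens.
    drop = torch.rand(labels.shape, device=labels.device) < (1.0 / k)
    masked_labels = labels.masked_fill(drop, -100)  # -100 = ignore_index
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked_labels.reshape(-1),
        ignore_index=-100,
    )
```

Because the excluded tokens are simply ignored rather than replaced, the forward pass and the rest of the training loop are unchanged; only the supervision signal is thinned.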
In experiments with large Llama-2 models, the scientists found that goldfish loss significantly reduces memorization with minimal impact on performance. Although models trained with this method may require slightly longer training periods, they are less prone to verbatim reproduction and less vulnerable to data extraction attacks.
Memorization in LLMs has been studied and mitigated through various other approaches. To quantify it, researchers measure “extractable memorization” by prompting a model with a prefix from its training data and checking whether it completes the string verbatim; spontaneous reproduction of training data has also been observed in both text and image models. On the mitigation side, differentially private training and data deduplication are common strategies, although both tend to degrade model performance and are resource-intensive.
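As a rough illustration of such a prefix-completion probe (a generic sketch, not the exact protocol of any particular study), one might use the Hugging Face transformers API as follows; the checkpoint name and the 50-token horizon are arbitrary placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # placeholder for the model under audit
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def is_extractable(prefix: str, true_continuation: str, max_new_tokens: int = 50) -> bool:
    """Greedy-decode from a training-set prefix and check for a verbatim continuation."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    generated = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # The string counts as extractable if the greedy completion reproduces the
    # true continuation over the generated span.
    return generated.strip() != "" and true_continuation.strip().startswith(generated.strip())
```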
To combat verbatim reproduction, the goldfish loss technique modifies how LLMs are trained by selectively excluding tokens from the loss computation. A hashed masking variant strengthens this further: whether a token is masked is determined by a hash of the tokens immediately preceding it, so the same passage always receives the same mask. This consistency is crucial for duplicated passages on the web, which often differ slightly in attributions, headers, and surrounding content; if each copy were masked independently at random, the model could eventually see, and memorize, every token of the passage across its many copies.
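A minimal sketch of such a context-hashed mask is shown below; the hash function, the 13-token context width, and the 1-in-k drop rate are assumed values for illustration, not necessarily the paper's exact parameters.

```python
import hashlib

def hashed_drop_mask(token_ids, k=4, context_width=13):
    """Per-position drop decisions that depend only on the preceding tokens.

    Because each decision is a deterministic hash of the local context,
    identical passages receive identical masks even when they appear in
    different documents.
    """
    mask = []
    for i in range(len(token_ids)):
        context = tuple(token_ids[max(0, i - context_width):i])
        digest = hashlib.sha256(repr(context).encode("utf-8")).digest()
        # Drop roughly one token in k, deterministically per local context.
        mask.append(int.from_bytes(digest[:8], "big") % k == 0)
    return mask
```

Feeding this mask into a loss like the one sketched earlier (by setting masked positions' labels to the ignore index) yields a deterministic variant of the token-dropping scheme.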
Goldfish loss effectively prevents memorization across a range of training scenarios. For instance, when models were trained repeatedly on a small dataset, standard training led to significant memorization of exact sequences, whereas models trained with the goldfish loss showed minimal memorization, as measured by ROUGE-L scores and exact-match rates.
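For context, memorization of this kind is typically quantified by comparing a model's continuation of a training-set prefix with the ground-truth continuation; a minimal sketch using the rouge_score package (an assumed tool choice, not necessarily the authors') could look like this.

```python
from rouge_score import rouge_scorer

_scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

def memorization_metrics(generated: str, reference: str) -> dict:
    """Score a model continuation against the true continuation from the training data."""
    rouge_l = _scorer.score(reference, generated)["rougeL"].fmeasure
    return {
        "exact_match": generated.strip() == reference.strip(),
        "rougeL": rouge_l,  # values near 1.0 indicate near-verbatim reproduction
    }
```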
However, while the goldfish loss mitigates memorization risks in LLMs, it does not guarantee resistance to adversarial extraction: it offers limited protection against membership inference attacks (MIAs) and adaptive attacks such as beam search. Despite these shortcomings, goldfish loss presents a viable strategy for enhancing privacy in industrial applications, with the potential for selective use in high-risk scenarios or for specific document types.