A new paper from Cohere improves the stability of large language models (LLMs) by automatically identifying under-trained tokens in their vocabularies.

Large language models (LLMs) rely heavily on tokenization – breaking text into manageable pieces, or tokens – for training and inference. However, LLMs often suffer from a problem known as ‘glitch tokens’: tokens that exist in the model’s vocabulary but are underrepresented or entirely absent in the training data. Glitch tokens can destabilize a model, causing it to produce unpredictable outputs.
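For readers unfamiliar with tokenization, the sketch below shows what the tokenizer step looks like in practice; the Hugging Face transformers library and the "gpt2" tokenizer are illustrative choices, not details from the paper.

```python
# A minimal sketch of tokenization using the Hugging Face "transformers"
# library (the library and model name are illustrative assumptions).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization breaks text into manageable pieces."
tokens = tokenizer.tokenize(text)               # subword pieces, e.g. ['Token', 'ization', ...]
ids = tokenizer.convert_tokens_to_ids(tokens)   # integer IDs the model actually sees

print(tokens)
print(ids)
```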

The root cause of the glitch-token problem is a misalignment between tokenizer training and model training. Tokenizers and models are usually trained independently, often on different datasets. If the two datasets differ significantly, some tokens produced by the tokenizer are rarely or never seen by the model and end up under-trained.

A notorious example is the glitch token “_SolidGoldMagikarp”, which can trigger unwanted model behavior such as hallucinations or nonsensical outputs.

Traditionally, detecting under-trained tokens involved manually inspecting tokenizer behavior – how tokens encode and decode, and how often they appear in the training data. Given the growing size and complexity of LLM vocabularies, such manual methods no longer scale.
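As a rough illustration of what that manual-style inspection looks like, the sketch below walks a vocabulary and flags tokens whose decode/encode round trip is unstable; the library and model name are assumptions for illustration only.

```python
# Walk the vocabulary and flag tokens that never re-emerge from their own
# decoded string. The library and model name ("gpt2") are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

suspect = []
for token, token_id in tokenizer.get_vocab().items():
    decoded = tokenizer.decode([token_id])
    re_encoded = tokenizer.encode(decoded, add_special_tokens=False)
    # A token that does not survive its own decode/encode round trip is a
    # hint that it may be unreachable or rarely seen during training.
    if token_id not in re_encoded:
        suspect.append(token)

print(f"{len(suspect)} tokens failed the round-trip check")
```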

To address this, researchers from Cohere have proposed an automated approach based on the model’s embedding weights. The method analyzes these weights to detect anomalies indicative of insufficient training: a token is flagged as a likely glitch token when its embedding weights deviate significantly from those of well-represented tokens.
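The snippet below sketches this idea under simple assumptions: it pulls a model’s input embedding matrix and flags tokens whose embedding norms fall well below the vocabulary-wide average. The model name and the two-standard-deviation cutoff are illustrative, not values from the paper.

```python
# A minimal sketch of the embedding-weight idea, assuming a PyTorch model
# loaded via Hugging Face transformers; the model name and threshold are
# illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

embeddings = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)
norms = embeddings.norm(dim=1)

# Tokens whose embedding norm sits far below the bulk of the vocabulary are
# candidates for being under-trained.
threshold = norms.mean() - 2 * norms.std()
candidate_ids = torch.nonzero(norms < threshold).flatten().tolist()
print([tokenizer.convert_ids_to_tokens(i) for i in candidate_ids[:20]])
```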

By computing the variance and distribution of these weights and comparing them against a normative baseline, researchers can detect glitch tokens systematically. The method proved effective when applied to well-known models such as BERT and GPT, identifying up to 10% of the tokenizer’s vocabulary as under-trained. Most of the flagged tokens were specialized or infrequently used words whose embedding weights showed marked discrepancies.
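One hedged way to turn the “compare against a normative baseline” idea into code is shown below: a reference vector is built from the lowest-norm tokens (assumed here to be mostly untrained), and tokens whose embeddings point in nearly the same direction as that reference are flagged. The baseline construction and the 0.9 cutoff are assumptions for illustration, not the paper’s exact statistic.

```python
# Sketch: flag tokens whose embeddings cluster tightly around a baseline
# built from the lowest-norm (presumed untrained) tokens. Baseline size,
# cutoff, and model name are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

emb = model.get_input_embeddings().weight.detach()       # (vocab_size, dim)
norms = emb.norm(dim=1)

# Reference set: the 100 tokens with the smallest embedding norms.
reference_ids = torch.argsort(norms)[:100]
baseline = emb[reference_ids].mean(dim=0, keepdim=True)   # (1, dim)

sims = F.cosine_similarity(emb, baseline, dim=1)
flagged = torch.nonzero(sims > 0.9).flatten()

print(f"Flagged {flagged.numel()} of {emb.size(0)} tokens "
      f"({100.0 * flagged.numel() / emb.size(0):.1f}% of the vocabulary)")
print([tokenizer.convert_ids_to_tokens(i.item()) for i in flagged[:20]])
```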

This research directly aids the development and maintenance of LLMs. Automated methods for detecting and fixing under-trained tokens can improve model accuracy and robustness, which matters as LLMs see growing use in applications ranging from automated writing aids to complex conversational agents.

In conclusion, the research highlights a notable vulnerability in LLM training and offers a scalable remedy. Automated detection of under-trained tokens allows for more robust training pipelines, helping to ensure that every token in the model’s vocabulary is adequately trained before the model is deployed in real-world applications. The work is therefore a considerable step toward more reliable and efficient language models for Natural Language Processing tools.
