Researchers in the field of large language models (LLMs) are focused on training these models to respond more effectively to human-generated prompts. Doing so requires aligning the models with human preferences, reducing bias, and ensuring useful and safe responses, a task typically tackled through supervised fine-tuning followed by more complex pipelines such as reinforcement learning from human feedback (RLHF). While tools such as Hugging Face TRL and DeepSpeed-Chat have been useful, they often lack the scalability and performance needed for the largest models.
NVIDIA researchers introduced NeMo-Aligner, a toolkit built on NVIDIA’s NeMo framework that streamlines and optimizes the entire RLHF pipeline. The toolkit combines model-parallel and distributed-computing techniques to handle the complexity of training large-scale models, and it spreads the computing workload across clusters to use hardware resources effectively.
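To make the distributed-training idea concrete, here is a minimal sketch of a data-parallel worker setup using plain PyTorch. This is illustrative only, not the NeMo-Aligner API; NeMo-Aligner layers further parallelism strategies on top of this kind of foundation.

```python
# Illustrative sketch (not NeMo-Aligner code): one process per GPU, with
# gradients synchronized across data-parallel workers by DistributedDataParallel.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def init_worker(model: torch.nn.Module) -> DDP:
    # Rank and world size are provided by the launcher (e.g., torchrun).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # DDP replicates the model on each GPU and all-reduces gradients,
    # so every worker sees the same parameters after each step.
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```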
NeMo-Aligner’s architecture makes model alignment more efficient and accessible. It divides the training pipeline into three stages: supervised fine-tuning (SFT), reward model training, and proximal policy optimization (PPO). During PPO, the toolkit evenly distributes workloads among data-parallel workers, which improves training efficiency, as sketched below.
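The sketch below illustrates the even-distribution idea in its simplest form: sharding a batch of prompts across data-parallel workers so each one generates a similar number of responses per PPO step. The helper name and the round-robin scheme are assumptions for illustration, not NeMo-Aligner’s actual scheduling logic.

```python
# Hypothetical helper (not NeMo-Aligner code): evenly shard prompts across
# data-parallel workers so per-GPU rollout work stays balanced.
def shard_prompts(prompts: list[str], rank: int, world_size: int) -> list[str]:
    # Round-robin assignment keeps loads balanced even when the batch size
    # is not divisible by the number of workers.
    return prompts[rank::world_size]

# Example: 8 prompts spread across 4 data-parallel workers.
prompts = [f"prompt-{i}" for i in range(8)]
shards = [shard_prompts(prompts, r, 4) for r in range(4)]
assert all(len(s) == 2 for s in shards)  # each worker handles 2 prompts
```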
NeMo-Aligner shows clear efficiency gains, especially during the PPO stage. Integrating TensorRT-LLM to accelerate response generation reduces training times by up to seven times compared with more conventional methods. The toolkit also supports training models of up to 70 billion parameters without sacrificing this efficiency.
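A quick back-of-the-envelope calculation shows why accelerating generation matters so much: rollout generation typically dominates a PPO step, so speeding it up shrinks the whole step. The numbers below are made-up placeholders purely to illustrate the arithmetic, not measured benchmarks.

```python
# Hypothetical timings (not benchmarks): effect of accelerating generation
# on the end-to-end duration of one PPO step.
generation_s = 70.0   # assumed time spent generating rollouts
training_s = 30.0     # assumed time spent on the PPO update
speedup = 7.0         # assumed generation speedup from a fast inference engine

baseline = generation_s + training_s
optimized = generation_s / speedup + training_s
print(f"step time: {baseline:.0f}s -> {optimized:.0f}s "
      f"({baseline / optimized:.1f}x faster end to end)")
```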
NeMo-Aligner is a flexible toolkit that integrates multiple alignment algorithms, including supervised fine-tuning (SFT), direct preference optimization (DPO), and self-play fine-tuning (SPIN). It supports different optimization strategies and can align models with human preferences along attributes such as correctness and toxicity.
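As one example of these algorithms, here is a minimal PyTorch rendering of the DPO objective as published by Rafailov et al. (2023). This is a generic sketch of the loss, not NeMo-Aligner’s implementation; the function name and argument layout are assumptions.

```python
# Minimal sketch of the DPO loss (Rafailov et al., 2023), one of the
# algorithms NeMo-Aligner supports. Inputs are summed log-probabilities
# of chosen/rejected responses under the policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of the policy against the reference model.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```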
In summary, NeMo-Aligner is an effective toolkit for training large language models. It addresses the scalability and performance limitations of earlier tools, streamlines the process of aligning LLMs with human preferences, and enables models to be fine-tuned efficiently so their behavior better matches human expectations.