Training large language models (LLMs) hinges on the availability of diverse and abundant datasets, which can be created through synthetic data generation. The conventional methods of creating synthetic data - instance-driven and key-point-driven - have limitations in diversity and scalability, making them insufficient for training advanced LLMs.
Addressing these shortcomings, researchers at Tencent AI Lab have…
