Recent advances in Large Language Models (LLMs) have driven significant progress in language processing and many other fields. Despite these advances, LLMs have had little impact on molecule optimization, a crucial component of drug discovery. Traditional methods learn patterns from chemical structure data but do not incorporate expert feedback, which leaves a gap in the drug discovery process.
Researchers from Tencent AI Lab and Hunan University’s Department of Computer Science aim to bridge this gap with an approach centered on human-machine interaction and powered by LLMs. The team introduced MolOpt-Instructions, a large dataset for fine-tuning LLMs on molecule optimization tasks, and DrugAssist, an LLM-based molecule optimization model that supports interactive optimization through human-machine dialogue.
MolOpt-Instructions provides extensive data on molecule optimization tasks, built so that paired molecules are structurally similar yet differ substantially in their properties. DrugAssist, which is based on Llama-2-7B-Chat, uses human-machine dialogue to guide the refinement of its initial output.
The team evaluated DrugAssist against two other molecule optimization models and three LLMs on solubility and BP optimization, measuring success rate and validity. The results showed that DrugAssist consistently performed well in multi-property optimization and could keep optimized property values within a specified range.
The researchers also conducted a case study in a zero-shot setting, in which DrugAssist increased both the BP and QED values of a molecule by at least 0.1 at the same time. The model achieved this even though it had not been trained on this specific task, which suggests strong transferability in zero-shot and few-shot settings. Although one interaction produced an incorrect output, the model corrected its error after receiving human feedback.
In summary, DrugAssist performs well in real-time interaction with humans, achieves strong results in both single- and multi-property optimization, and shows notable transferability and iterative optimization abilities. The researchers plan to extend the model to handle multimodal data, which they expect to further improve its usefulness in drug discovery.