
Progress in Protein Sequence Design: Utilizing Reinforcement Learning and Language Models

Protein sequence design is a central part of protein engineering for drug discovery, and it involves searching a vast space of amino acid sequence combinations. To overcome the limitations of traditional methods such as evolutionary strategies, researchers have proposed using reinforcement learning (RL) to generate new protein sequences. This work builds on advances in protein language models (PLMs), which are used to predict how a protein folds and can score candidate sequences with biological measures such as the TM-score.

A team of researchers from McGill University, Mila–Quebec AI Institute, ÉTS Montréal, BRAC University, Bangladesh University of Engineering and Technology, University of Calgary, CIFAR AI Chair, and Dreamfold has proposed an approach that uses PLMs as reward functions to generate new protein sequences. Because PLMs are large, however, querying them for every reward is computationally expensive. The researchers therefore present an alternative that optimizes against scores from a smaller proxy model, which is fine-tuned while the mutation policies are being learned.

The researchers' experiments show that RL-based approaches yield promising results in terms of biological plausibility and sequence diversity across a range of sequence lengths. The team has open-sourced its implementation, which makes it easy to plug in different PLMs and exploration algorithms. They hope other researchers will build on this work to advance protein sequence design.

RL-based design treats sequence generation as a Markov Decision Process (MDP): the state is the current sequence, and each action chosen by the RL policy mutates it. The reward typically comes from a cheaper proxy model, while the more expensive oracle model is queried only periodically to evaluate the generated sequences. The researchers note that their evaluation criteria focus predominantly on biological plausibility and diversity.
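The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_oracle` is a hypothetical stand-in for an expensive PLM score, `ToyProxy` is a hypothetical cheap proxy, and a simple epsilon-greedy rule stands in for a learned RL policy.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_oracle(seq):
    """Hypothetical stand-in for an expensive PLM score (NOT a real pTM):
    the fraction of residues that are alanine."""
    return seq.count("A") / len(seq)

class ToyProxy:
    """Hypothetical cheap proxy: one weight per residue type, fit to oracle labels."""
    def __init__(self):
        self.weights = {aa: 0.0 for aa in AMINO_ACIDS}

    def score(self, seq):
        return sum(self.weights[aa] for aa in seq) / len(seq)

    def fine_tune(self, seqs, labels, lr=0.5):
        # One pass of gradient-style updates toward the oracle's labels.
        for seq, label in zip(seqs, labels):
            error = label - self.score(seq)
            for aa in seq:
                self.weights[aa] += lr * error / len(seq)

def step(seq, action):
    """MDP transition: an action is (position, new_residue)."""
    pos, aa = action
    return seq[:pos] + aa + seq[pos + 1:]

def run_episode(seq, proxy, horizon, rng, epsilon=0.2):
    """Epsilon-greedy mutation policy (an assumption; a real system would
    learn the policy). The proxy score plays the role of the reward."""
    for _ in range(horizon):
        pos = rng.randrange(len(seq))
        if rng.random() < epsilon:
            action = (pos, rng.choice(AMINO_ACIDS))  # explore
        else:
            # Exploit: pick the residue the proxy rates highest at this position.
            action = max(((pos, aa) for aa in AMINO_ACIDS),
                         key=lambda a: proxy.score(step(seq, a)))
        seq = step(seq, action)
    return seq

def design(rounds=20, episodes=8, length=12, horizon=10, seed=0):
    rng = random.Random(seed)
    proxy = ToyProxy()
    best_seq, best_score = None, float("-inf")
    for _ in range(rounds):
        batch = []
        for _ in range(episodes):
            start = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            batch.append(run_episode(start, proxy, horizon, rng))
        # Periodic oracle call: score the batch, then fine-tune the proxy on it.
        labels = [toy_oracle(s) for s in batch]
        proxy.fine_tune(batch, labels)
        for s, label in zip(batch, labels):
            if label > best_score:
                best_seq, best_score = s, label
    return best_seq, best_score
```

Calling `design()` returns the best sequence seen and its oracle score. The point of the structure is that the oracle is called once per batch rather than once per mutation, which is where the computational savings would come from.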

The researchers assessed various sequence design algorithms using ESMFold's pTM score as the primary metric. The results showed that while methods like Markov Chain Monte Carlo (MCMC) were best at directly optimizing pTM, RL techniques and GFlowNets were most effective when using a proxy model. These findings suggest that RL and GFlowNet methods maintain high pTM scores while significantly reducing computational costs. They also highlight how adaptable RL methods are for sequence generation tasks.
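For comparison, direct optimization with MCMC can be sketched as a Metropolis sampler over single-residue mutations. This is an illustrative sketch only: `toy_score` is a hypothetical placeholder for pTM, and the temperature and proposal scheme are assumptions rather than the settings used in the study.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_score(seq):
    """Hypothetical placeholder for a pTM-like score in [0, 1]."""
    return seq.count("A") / len(seq)

def mcmc_optimize(score_fn, length=12, steps=500, temperature=0.05, seed=0):
    """Metropolis sampling with symmetric single-site mutation proposals.
    Note that score_fn is called on every step, which is what makes direct
    optimization costly when score_fn is a large folding model."""
    rng = random.Random(seed)
    current = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    current_score = score_fn(current)
    best, best_score = current, current_score
    for _ in range(steps):
        pos = rng.randrange(length)
        proposal = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        proposal_score = score_fn(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score
```

A call such as `mcmc_optimize(toy_score)` returns the best sequence visited and its score. Swapping `toy_score` for a proxy model's score would give the cheaper variant, at the cost of optimizing an approximation.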

However, this research isn’t free from limitations. The findings are restricted by computational constraints for longer sequences and by the reliance on either the expensive ESMFold model or the proxy model for evaluation. As a next step, the researchers plan to explore other models such as AlphaFold2 or larger ESMFold variants. Scaling to larger proxy models could also improve the accuracy of predictions for longer sequences.

This research showcases the power and potential of RL in protein sequence design and the innovative use of PLMs in advancing drug discovery techniques. However, as the authors highlight, there is a need for further research and vigilance to prevent any potential misuse of PLMs.
