Protein engineering is a complicated process that typically involves randomly mutating a natural protein with a desirable function, then repeating the process until an optimal version of the protein emerges. This approach has worked well for some proteins, such as green fluorescent protein (GFP), but not for all. Researchers at MIT have developed a computational approach that predicts beneficial protein mutations from a small amount of data.
Using this approach, the MIT team generated proteins with mutations predicted to improve GFP and a protein from adeno-associated virus (AAV), a virus often used to deliver DNA for gene therapy. Because the model applies broadly, it could benefit both neuroscience research and medical procedures.
Mapping a DNA sequence to a protein’s structure and function is complex, and the model aims to simplify that task, according to Ila Fiete, a professor of brain and cognitive sciences at MIT.
The model’s potential was demonstrated by optimizing natural proteins for specific applications. One such application is ‘voltage indicators’: proteins produced by bacteria and algae that emit fluorescent light when they detect an electrical potential.
Traditionally, the engineering of these proteins to produce a stronger, faster fluorescent signal has taken decades, and they still aren’t effective enough for widespread use. The computational approach aims to significantly speed up the protein optimization process.
A central obstacle is that a protein often has to pass through less-fit mutations before it can reach peak fitness. The MIT researchers used computational modeling to smooth the path to that peak.
The researchers used an established computational technique to ‘smooth’ the protein’s ‘fitness landscape,’ making it easier to progress toward higher fitness peaks. The model then predicted optimized GFP sequences, some of which differed from the original sequence at seven amino acids and reached up to 2.5 times the fitness of the original protein.
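The idea of smoothing a fitness landscape can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the article does not name the smoothing technique, so the sketch uses simple neighbor averaging; the two-letter alphabet, the four-residue sequences, and the fitness values are invented; and `neighbors`, `smooth`, and `hill_climb` are hypothetical helper names. It is not the MIT team’s model.

```python
from itertools import product

ALPHABET = "AB"  # toy two-letter alphabet (assumption; real proteins use 20 amino acids)
LENGTH = 4       # toy sequence length (assumption)


def neighbors(seq):
    """Return all sequences that differ from `seq` at exactly one position."""
    out = []
    for i, ch in enumerate(seq):
        for alt in ALPHABET:
            if alt != ch:
                out.append(seq[:i] + alt + seq[i + 1:])
    return out


def smooth(fitness, alpha=0.5, steps=2):
    """Blend each sequence's fitness with the mean fitness of its neighbors.

    This is plain neighbor averaging, used as a stand-in for whatever
    smoothing technique the researchers actually applied.
    """
    current = dict(fitness)
    for _ in range(steps):
        updated = {}
        for seq, value in current.items():
            nbrs = neighbors(seq)
            nbr_mean = sum(current[n] for n in nbrs) / len(nbrs)
            updated[seq] = (1 - alpha) * value + alpha * nbr_mean
        current = updated
    return current


def hill_climb(start, fitness):
    """Greedy walk: move to the best neighbor until no neighbor is better."""
    seq = start
    while True:
        best = max(neighbors(seq), key=lambda s: fitness[s])
        if fitness[best] <= fitness[seq]:
            return seq
        seq = best


def toy_fitness(seq):
    """Invented landscape: a valley of single mutants separates AAAA from BBBB."""
    b = seq.count("B")
    if b == 0:
        return 1.0   # starting protein sits on a small local peak
    if b == 1:
        return 0.5   # every single mutant is less fit than the start
    return float(b)  # fitness rises again with further mutations


raw = {"".join(p): toy_fitness("".join(p)) for p in product(ALPHABET, repeat=LENGTH)}
smoothed = smooth(raw)

print(hill_climb("AAAA", raw))       # AAAA (stuck on the local peak)
print(hill_climb("AAAA", smoothed))  # BBBB (the valley has been smoothed away)
```

On the raw landscape the greedy search never leaves the starting sequence, because every single mutation lowers fitness. After smoothing, the single mutants inherit some of the fitness of their better neighbors, so the same search can walk through them to the highest peak.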
The researchers also showed that the method can predict favorable sequences for the AAV viral capsid, optimized for packaging a DNA payload.
These results suggest that the model could be applied to a variety of protein engineering problems.
The researchers plan to apply the computational technique to data on voltage indicator proteins, which numerous labs have been testing manually for two decades. The hope is that this approach can fast-track protein optimization using smaller datasets and generate predictions that outperform those manual testing efforts.
The study was supported in part by the U.S. National Science Foundation, the Machine Learning for Pharmaceutical Discovery and Synthesis consortium, the Abdul Latif Jameel Clinic for Machine Learning in Health, as well as other bodies.