Researchers at MIT have developed a computational method that speeds up the generation of optimized proteins using only a small amount of data. They have used it to generate proteins carrying mutations that could improve green fluorescent protein (GFP) and a protein from adeno-associated virus (AAV) that is used to deliver DNA for gene therapy.
Protein optimization typically starts from a natural protein with a desirable function and puts it through several rounds of random mutation to arrive at an optimized version. This is challenging because the mapping from DNA sequence to protein structure and function is complex. The researchers built a computational model to make this process more predictable and efficient.
The researchers used GFP and AAV as proofs of concept, drawing on several data sets for these proteins to develop better sequences with their method. The model operates like a “fitness landscape,” a three-dimensional map illustrating the fitness of a given protein and how it differs from the original sequence. The model predicts the path a protein needs to follow to reach peak fitness.
The researchers combined their model with an existing computational technique to “smooth” the fitness landscape, which lets the model reach peak fitness more easily by iteratively making small improvements. The optimized GFP sequences predicted by the model were about 2.5 times fitter than the original and differed from the original sequence by up to seven amino acids.
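To make the idea of smoothing concrete, here is a toy sketch in Python. It is not the researchers' model or data: the two-letter alphabet, the made-up fitness values, the neighbor-averaging smoother, and the greedy search are all illustrative assumptions. It shows how a search that makes one small improvement at a time can stall on a rugged landscape yet reach the peak once the landscape is smoothed.

```python
from itertools import product

ALPHABET = "AB"  # toy alphabet (assumption); real proteins use 20 amino acids


def neighbors(seq):
    """Return every sequence that differs from seq at exactly one position."""
    return [
        seq[:i] + letter + seq[i + 1:]
        for i, current in enumerate(seq)
        for letter in ALPHABET
        if letter != current
    ]


def smooth(fitness, weight=0.5):
    """Blend each fitness value with the mean over one-mutation neighbors.

    A stand-in for landscape smoothing, chosen for simplicity (assumption).
    """
    smoothed = {}
    for seq, value in fitness.items():
        nbrs = neighbors(seq)
        nbr_mean = sum(fitness[n] for n in nbrs) / len(nbrs)
        smoothed[seq] = (1 - weight) * value + weight * nbr_mean
    return smoothed


def hill_climb(start, fitness):
    """Greedily accept the best single mutation until none improves fitness."""
    current = start
    while True:
        best = max(neighbors(current), key=fitness.get)
        if fitness[best] <= fitness[current]:
            return current
        current = best


def toy_fitness(seq):
    """Made-up landscape: more B's is better, with a dip after the start."""
    b_count = seq.count("B")
    return {0: 1.0, 1: 0.5}.get(b_count, float(b_count))


# Enumerate all 16 length-4 sequences and score them.
raw = {"".join(p): toy_fitness("".join(p)) for p in product(ALPHABET, repeat=4)}

stuck = hill_climb("AAAA", raw)            # every first mutation looks worse
reached = hill_climb("AAAA", smooth(raw))  # smoothing removes the dip
print(stuck, reached)
```

On the raw landscape the search never leaves the starting sequence, because every single mutation lowers fitness. After smoothing, the same greedy search climbs step by step to the best sequence in this toy example.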
The researchers plan to apply this computational method to data on voltage indicator proteins, which, despite two decades of study, have not been perfected for widespread use. With access to a small data set, the researchers hope their model will make predictions that outperform two decades of manual testing. Their work could lead to new tools for neuroscience research and medical applications.