In an effort to create more effective proteins for research and medical applications, researchers at MIT have developed a new computational approach that predicts beneficial mutations from limited data. Using this technique, they produced modified versions of green fluorescent protein (GFP), a protein found in certain jellyfish, and explored the method’s potential for improving a protein used in gene therapy delivery.
The difficulty of protein design lies in the complex relationship between a DNA sequence and the structure and function of the protein it encodes. Current methods rely on tedious rounds of random mutation, repeated until an optimized form of the protein emerges. Professor Ila Fiete, a leader of this research, compares the process to trying to locate a river basin concealed by a mountain range.
The research team also includes Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science at MIT. Together, they have published an open-access paper describing the work, which will be presented at the International Conference on Learning Representations.
Working with Edward Boyden’s lab, the researchers hoped to develop proteins that could serve as voltage indicators in living cells, which would allow them to measure neuron activity without electrodes. Using a convolutional neural network (CNN), the team built an approximate graph of how changes to GFP affect its fitness, based on experimental data from about 1,000 variants of the protein.
This graph showed the relative fitness of different proteins, with valleys for unfit proteins and peaks for more optimal forms. Computational methods were then used to ‘smooth’ the graph, which made it easier for the model to predict the peaks representing optimized GFP sequences and produced smoother paths from starting points to peaks. The best generated proteins were estimated to be around 2.5 times fitter than the proteins they started from.
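To make the idea of smoothing a fitness graph more concrete, here is a minimal Python sketch. It is an illustration only and not the researchers' actual method: the two-letter alphabet, the six-letter sequences, the `TARGET` optimum, the noise level, and the neighbor-averaging rule are all assumptions chosen to keep the example small. Sequences are treated as nodes, single-letter mutations as edges, and a greedy climber walks uphill on either the noisy or the smoothed graph.

```python
import itertools
import random

ALPHABET = "AB"      # toy two-letter alphabet (real proteins use 20 amino acids)
LENGTH = 6           # toy sequence length
TARGET = "ABABAB"    # hypothetical optimum, chosen only for illustration


def neighbors(seq):
    """Return all sequences that differ from seq by exactly one point mutation."""
    out = []
    for i, old in enumerate(seq):
        for new in ALPHABET:
            if new != old:
                out.append(seq[:i] + new + seq[i + 1:])
    return out


def make_noisy_landscape(seed=0, noise=1.5):
    """Toy fitness: matches to TARGET plus random noise (a stand-in for measurements)."""
    rng = random.Random(seed)
    landscape = {}
    for letters in itertools.product(ALPHABET, repeat=LENGTH):
        seq = "".join(letters)
        matches = sum(a == b for a, b in zip(seq, TARGET))
        landscape[seq] = matches + rng.uniform(-noise, noise)
    return landscape


def smooth(landscape, weight=0.5, rounds=3):
    """Blend each sequence's fitness with the mean fitness of its neighbors."""
    current = dict(landscape)
    for _ in range(rounds):
        updated = {}
        for seq, value in current.items():
            nbrs = neighbors(seq)
            mean_nbr = sum(current[n] for n in nbrs) / len(nbrs)
            updated[seq] = (1 - weight) * value + weight * mean_nbr
        current = updated
    return current


def hill_climb(landscape, start):
    """Greedy ascent: move to the fittest neighbor until no neighbor is fitter."""
    path = [start]
    while True:
        here = path[-1]
        best = max(neighbors(here), key=landscape.__getitem__)
        if landscape[best] <= landscape[here]:
            return path
        path.append(best)


if __name__ == "__main__":
    noisy = make_noisy_landscape()
    smoothed = smooth(noisy)
    start = "BABABA"  # the sequence farthest from TARGET
    for name, land in [("noisy", noisy), ("smoothed", smoothed)]:
        end = hill_climb(land, start)[-1]
        print(name, "climb ends at", end)
```

Because each smoothing round replaces a value with a weighted average of itself and its neighbors, isolated spikes and dips shrink while broad trends survive. On a noisy graph the climber can stop at a spurious local peak; smoothing is intended to make such dead ends less likely, although this toy example does not guarantee it for every starting point.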
The researchers also applied the approach to the viral capsid of adeno-associated virus (AAV), which is used in gene therapies to deliver DNA. The capsid protein was optimized for its ability to package DNA payloads, which suggests that the new method is relevant to other protein engineering tasks.
A next step in this research is to apply the new computational model to data on voltage indicator proteins. The hope is that its predictions will prove more effective than the previous reliance on manual testing. The research received funding from a variety of sources, including the U.S. National Science Foundation, the Machine Learning for Pharmaceutical Discovery and Synthesis consortium, the Howard Hughes Medical Institute, the National Institutes of Health, and others.