Researchers have developed an AI system called ESM3 that can simulate hundreds of millions of years of protein evolution, and have used it to create a new fluorescent protein unlike any found in nature. The system, designed by a team led by Alexander Rives at EvolutionaryScale, can process and generate data about protein sequences, structures, and functions. The work marks a significant advance in AI-driven protein engineering, enabling researchers to explore protein design spaces far beyond those that natural evolution has sampled.
Trained on data from billions of natural proteins, the model has learned how proteins may evolve over time. According to the researchers, however, ESM3 isn’t just recombining existing protein information. Instead, they believe the model has developed an understanding of the fundamental principles underlying protein structure and function, which allows it to generate truly novel designs.
In the experiment, the model created its own version of green fluorescent protein (GFP), a protein widely used in biotechnology research. The generated protein, called esmGFP, shares only 58% of its sequence with the closest known fluorescent protein. Even so, its brightness is similar to that of naturally occurring GFPs, and it has the characteristic barrel-shaped structure needed for fluorescence.
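For readers unfamiliar with how a figure such as 58% is obtained, sequence identity is typically computed by aligning two sequences and counting the positions where the residues match. The following is a minimal sketch, not the researchers' method: the function name `percent_identity` is hypothetical, the inputs are assumed to be already aligned (with `-` marking gaps), and the toy sequences are invented for illustration. Definitions of identity differ in what they use as the denominator; this version divides by the number of alignment columns that contain at least one residue.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the two strings are already aligned to equal length, with '-'
    marking gaps. Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # empty column, carries no information
        compared += 1
        if a == b:  # a gap opposite a residue never counts as a match
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy, made-up sequences: 8 of 10 columns match.
print(percent_identity("MSKGEELFTG", "MSKGAELF-G"))
```

In practice the alignment itself would come from a dedicated tool; only the counting step is shown here.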
To create esmGFP, the researchers prompted ESM3 with limited structural data from a template GFP, which the model used to generate new protein sequences and structures over repeated rounds of prompting. Thousands of candidate designs were evaluated and filtered to find the most promising ones, which were then synthesized and tested for fluorescence. After the team identified a variant that was dim but distant in sequence from known GFPs, ESM3 was used again to optimize the design.
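The generate, filter, and refine cycle described above can be illustrated with a toy loop. This is a sketch under heavy assumptions and does not use ESM3 or reproduce the researchers' pipeline: `propose_variants` is a hypothetical stand-in for the generative model (it only makes random point mutations), `toy_score` is a hypothetical stand-in for the computational filters and laboratory tests (it only counts matches to an invented target string), and all parameter values are arbitrary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propose_variants(seed: str, n: int, rng: random.Random) -> list[str]:
    """Hypothetical stand-in for a generative model: n single-point mutants of seed."""
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(seed))
        residue = rng.choice(AMINO_ACIDS)
        variants.append(seed[:pos] + residue + seed[pos + 1:])
    return variants


def toy_score(candidate: str, target: str) -> int:
    """Hypothetical stand-in for filtering: number of positions matching a toy target."""
    return sum(a == b for a, b in zip(candidate, target))


def design_loop(seed: str, target: str, rounds: int = 20,
                per_round: int = 50, keep: int = 5, rng_seed: int = 0) -> str:
    """Repeatedly generate candidates, keep the best few, and refine from those."""
    rng = random.Random(rng_seed)
    pool = [seed]
    for _ in range(rounds):
        # Generate: each surviving design seeds a new batch of candidates.
        candidates = list(pool)
        for parent in pool:
            candidates.extend(propose_variants(parent, per_round, rng))
        # Filter: rank unique candidates by score (ties broken alphabetically).
        ranked = sorted(set(candidates), key=lambda s: (-toy_score(s, target), s))
        # Refine: only the top designs carry over to the next round.
        pool = ranked[:keep]
    return pool[0]


seed, target = "AAAAAAAA", "MSKGEELF"  # invented toy sequences
best = design_loop(seed, target)
print(best, toy_score(best, target))
```

Because each round keeps the previous survivors in the candidate set, the best score can never decrease from one round to the next. In the real experiment the scoring step involved evaluating thousands of designs and testing synthesized proteins, which a string comparison cannot capture.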
Commenting on the achievement, Dr. Tiffany Taylor, a professor not involved in the study, said it is a significant step forward for synthetic biology, and that AI models like ESM3 can enable the discovery of new proteins that natural evolution would not produce.
Despite their promise, these developments also present risks. For instance, AI-designed proteins could interact with natural ecosystems if they escape into the environment. They could also give rise to harmful biological agents or toxins through unexpected interactions within living organisms. These risks have prompted researchers to call for ethical standards in AI-driven protein design.
This research shows that AI is increasingly being used in protein research and design. For instance, DeepMind’s AlphaFold 3 has been hailed for its accurate predictions of how proteins fold. Nevertheless, given the rapid advances in AI and its growing intersection with biology, caution is needed to prevent potentially risky outcomes.