
Tokyo Institute of Technology Scientists Launch ProtHyena: A Fast and Efficient Foundational Protein Language Model with Single Amino Acid Resolution

Proteins and their functions are vital to human biology and health. Composed of chains of amino acids, they demand highly capable machine-learning models for sequence representation. Self-supervised pre-training has greatly improved protein sequence representation, but handling long sequences while maintaining contextual understanding remains a challenge. Strategies such as linearized and sparse attention approximations have been adopted, but they often sacrifice expressivity. Moreover, current models with over 100 million parameters struggle with longer inputs, and modeling proteins at the resolution of individual amino acids poses a further obstacle.

Researchers from the Tokyo Institute of Technology have created a model called ProtHyena to address these issues. ProtHyena is a fast and efficient model that uses the Hyena operator to analyze protein sequences. It captures both long-range context and single-amino-acid resolution in real protein sequences, outperforming existing models such as the TAPE Transformer and SPRoBERTa.

Traditional models based on the Transformer and BERT architectures exhibit impressive capabilities in many applications. However, their efficiency and the length of context they can process are limited by the quadratic computational complexity of the self-attention mechanism. Techniques for reducing this cost, such as the factorized self-attention of sparse Transformers and the kernel-based attention approximation of the Performer, often compromise model expressivity.
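
To make the complexity contrast concrete, the sketch below shows how a long convolution over a length-L sequence can be evaluated in O(L log L) time with the FFT, whereas self-attention materializes an L x L score matrix at O(L^2) cost. This is an illustrative NumPy sketch under our own assumptions, not ProtHyena's implementation; the function name `fft_long_conv` and the toy decaying filter are hypothetical.

```python
import numpy as np

def fft_long_conv(x, h):
    """Long convolution of x with filter h in O(L log L) via the FFT.

    x: (L,) input sequence of feature values
    h: (L,) implicit convolution filter, same length as the input
    Returns the convolution (x * h) truncated back to length L.
    """
    L = len(x)
    n = 2 * L  # zero-pad so the circular FFT convolution has no wrap-around
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
    return y[:L]

# Self-attention on the same input would build an L x L score matrix,
# i.e., O(L^2) time and memory; the FFT route above does O(L log L)
# work, which is what makes very long protein sequences tractable.
L = 8192
x = np.random.randn(L)
h = np.exp(-np.arange(L) / 512.0)  # toy exponentially decaying filter
y = fft_long_conv(x, h)
print(y.shape)  # (8192,)
```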

ProtHyena circumvents these limitations using the Hyena operator, which combines long convolutions with element-wise gating. The model processes each amino acid as an individual token and includes special tokens for padding, separation, and unknown characters. A variant, ProtHyena-bpe, uses byte pair encoding (BPE) to compress sequences and employs a larger vocabulary.
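
As an illustration of the character-level tokenization described above, here is a minimal sketch in which every amino acid maps to its own ID, alongside special tokens for padding, separation, and unknown residues. The token names ([PAD], [SEP], [UNK]) and the ID assignments are hypothetical, not ProtHyena's actual vocabulary.

```python
# The 20 standard amino acids, one token each, plus special tokens.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIALS = ["[PAD]", "[SEP]", "[UNK]"]  # illustrative names, not ProtHyena's

vocab = {tok: i for i, tok in enumerate(SPECIALS)}
vocab.update({aa: len(SPECIALS) + i for i, aa in enumerate(AMINO_ACIDS)})

def encode(seq, max_len=16):
    """Map a protein sequence to token IDs, one per residue."""
    ids = [vocab.get(aa, vocab["[UNK]"]) for aa in seq.upper()]
    ids = ids[:max_len]
    ids += [vocab["[PAD]"]] * (max_len - len(ids))  # right-pad to max_len
    return ids

print(encode("MKTAYIAKQR"))  # 10 residue IDs followed by 6 [PAD] IDs
```

The ProtHyena-bpe variant would instead merge frequent residue pairs into multi-character tokens, trading single-residue resolution for shorter input sequences.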

ProtHyena has demonstrated its effectiveness through state-of-the-art results in various tasks, including Remote Homology and Fluorescence prediction, where it achieved a Spearman's r of 0.678. The model also showed promise on the Secondary Structure Prediction (SSP) and Stability tasks, although specific metrics for those were not reported.

In summary, ProtHyena represents a significant step forward in protein sequence analysis. This protein language model uses the Hyena operator to address the computational challenges faced by attention-based models: it processes long protein sequences efficiently and delivers state-of-the-art performance with far fewer parameters. Its extensive pre-training on the Pfam dataset, evaluated across a variety of downstream tasks, illustrates its ability to capture complex biological signals accurately. Because the Hyena operator runs in subquadratic time, the approach marks a substantial advance in protein sequence modeling. The work is described in the researchers' recently published paper.

