Scientists from the McGovern Institute for Brain Research at MIT, the Broad Institute of MIT and Harvard, and the National Center for Biotechnology Information at the National Institutes of Health have developed a new algorithm that can sift through massive amounts of genomic data to identify unique CRISPR systems. Known as Fast Locality-Sensitive Hashing-based clustering (FLSHclust), the algorithm represents a significant step forward for the fields of biotechnology, disease diagnosis and gene editing. The results of the research were published recently in Science.
The FLSHclust algorithm swiftly processes the vast volumes of data in microbiome databases, which have expanded so rapidly in recent years that they have become difficult to navigate. Using FLSHclust, the researchers identified 188 new, rare CRISPR systems in bacterial genomes across three public databases, revealing unprecedented diversity and complexity.
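For readers curious about the idea behind the name, the general technique of locality-sensitive hashing can be illustrated in a few lines. The sketch below is a toy example, not the FLSHclust implementation: it assumes MinHash signatures over 3-letter fragments of each sequence, and the function names (`kmers`, `minhash_signature`, `lsh_clusters`) and parameters are hypothetical choices made for illustration. Sequences whose signatures agree on any band of hash values land in the same bucket and are merged, so similar sequences tend to be grouped without comparing every pair.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-letter fragments in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(text, seed):
    """Deterministic 64-bit hash of `text`, varied by `seed`."""
    digest = hashlib.sha1(f"{seed}:{text}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(seq, num_hashes=32, k=3):
    """MinHash signature: the smallest hash of any k-mer, per seed.

    Assumes the sequence has at least k letters.
    """
    fragments = kmers(seq, k)
    return [min(stable_hash(f, seed) for f in fragments)
            for seed in range(num_hashes)]


def lsh_clusters(seqs, num_hashes=32, bands=8, k=3):
    """Group sequences that share at least one identical signature band."""
    rows = num_hashes // bands
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    buckets = {}  # (band index, band values) -> first sequence seen there
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            if key in buckets:
                parent[find(idx)] = find(buckets[key])  # merge the two groups
            else:
                buckets[key] = idx

    groups = defaultdict(list)
    for idx in range(len(seqs)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Two identical toy sequences and one unrelated sequence (made-up data).
print(lsh_clusters(["MKTAYIAKQR", "MKTAYIAKQR", "WWWWWWWWWW"]))
```

In this toy run the two identical sequences fall into one group and the unrelated one stays alone. Because each sequence is hashed once and only bucket collisions trigger a merge, the approach avoids the all-against-all comparison that becomes impractical for databases containing billions of entries.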
CRISPR, short for clustered regularly interspaced short palindromic repeats, is a bacterial defense mechanism that scientists have repurposed for gene editing and disease diagnostics. The newly identified systems hold potential for editing mammalian cells with fewer off-target effects than current Cas9 systems, and could also be used in disease diagnostics or as molecular records of cellular activity.
Among the algorithm’s most notable finds were new variants of Type I CRISPR systems, which use a 32-base-pair guide RNA rather than the 20-nucleotide guide of Cas9. The longer guide has the potential to enhance the precision of gene editing technology and reduce unintended off-target edits. The Type I systems also demonstrated the ability to broadly degrade nucleic acids once the CRISPR protein binds its target, a feature that could be used to develop tools for infectious disease diagnostics.
The algorithm also revealed new mechanisms of action for Type IV CRISPR systems, as well as a Type VII system that precisely targets RNA. Such systems could be used in RNA editing, or as molecular records that track gene expression or sense specific activity within living cells.
The work underscores the importance and benefits of diverse sampling. The scientists believe their findings will pave the way for further discoveries of novel biochemical systems, since the FLSHclust algorithm can be used by anyone seeking to search large databases, discover new genes, or understand protein evolution.
The study also illuminates the diversity of CRISPR systems, indicating that most are rare and found only in atypical bacteria. As these databases continue to grow, many more rare systems are expected to be discovered. By mining the wealth of information in these databases, the researchers hope to uncover more of these “molecular gems” that could propel scientific discovery and innovation.