Researchers from Harvard and Stanford universities have developed SPRITE (Spatial Propagation and Reinforcement of Imputed Transcript Expression), a meta-algorithm that improves predictions of spatial gene expression. It addresses a limitation of current single-cell spatial transcriptomics technologies, which can measure only a limited number of genes.
SPRITE refines predictions from existing methods by propagating information across gene correlation networks and spatial neighborhood graphs. This two-step propagation increases the accuracy of spatial gene expression predictions and improves subsequent analyses, including cell clustering, visualization, and cell-type classification.
The research team tested SPRITE on eleven benchmark datasets that pair spatial transcriptomics with RNA-seq data from various species and tissue types, including humans, mice, fruit flies, and axolotls. They applied SPRITE to the predictions of three existing methods (SpaGE, Tangram, and Harmony-kNN) and compared the refined predictions with the originals. SPRITE consistently improved prediction accuracy: it reduced mean absolute error and often increased correlation with ground-truth data.
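As a rough illustration of the two accuracy measures mentioned here, the sketch below computes mean absolute error and Pearson correlation between predicted and measured expression for a single gene. The function names and the toy numbers are made up for this example and are not taken from the study.

```python
import numpy as np

def mean_absolute_error(predicted, measured):
    """Average absolute difference between predicted and measured expression."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(predicted - measured)))

def pearson_correlation(predicted, measured):
    """Pearson correlation between predicted and measured expression."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.corrcoef(predicted, measured)[0, 1])

# Toy values for one gene across four cells (illustrative only).
measured = np.array([0.0, 1.0, 2.0, 3.0])
baseline = np.array([1.0, 1.0, 1.0, 4.0])   # hypothetical unrefined prediction
refined = np.array([0.5, 1.0, 2.0, 3.5])    # hypothetical refined prediction

baseline_mae = mean_absolute_error(baseline, measured)
refined_mae = mean_absolute_error(refined, measured)
baseline_corr = pearson_correlation(baseline, measured)
refined_corr = pearson_correlation(refined, measured)
```

A refinement helps on these measures when it lowers the mean absolute error and raises the correlation relative to the unrefined prediction.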
SPRITE consists of two key steps, ‘Reinforce’ and ‘Smooth’. The ‘Reinforce’ step propagates prediction errors across a gene correlation network to refine the predictions for target genes. The ‘Smooth’ step then refines the predictions further by propagating them across a spatial neighborhood graph, which is built from the distances between cell centroids and adjusted for cell-type similarity.
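The two steps can be pictured with a minimal sketch of graph propagation. This is not the authors' implementation: the iterative update rule, the Gaussian distance kernel, and the parameters alpha, n_iter, sigma, and cross_type_weight are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def row_normalize(weights):
    """Scale each row to sum to 1 (rows of zeros are left as zeros)."""
    weights = np.asarray(weights, dtype=float)
    row_sums = weights.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    return weights / row_sums

def propagate(transition, initial, alpha=0.5, n_iter=20):
    """Blend each node's neighbors with its own initial value, repeatedly."""
    state = initial.copy()
    for _ in range(n_iter):
        state = alpha * (transition @ state) + (1 - alpha) * initial
    return state

def reinforce(predictions, errors, gene_weights, alpha=0.5, n_iter=20):
    """Spread known prediction errors over a gene graph and subtract them.

    predictions, errors: cells x genes. errors holds (predicted - measured)
    for genes with measurements and 0 for target genes (assumed convention).
    gene_weights: genes x genes non-negative similarity, e.g. |correlation|.
    """
    errors = np.asarray(errors, dtype=float)
    transition = row_normalize(gene_weights)
    estimated = propagate(transition, errors.T, alpha, n_iter).T
    return np.asarray(predictions, dtype=float) - estimated

def spatial_weights(centroids, cell_types, sigma=1.0, cross_type_weight=0.5):
    """Gaussian weights on centroid distance, down-weighted across cell types."""
    centroids = np.asarray(centroids, dtype=float)
    diffs = centroids[:, None, :] - centroids[None, :, :]
    weights = np.exp(-(diffs ** 2).sum(axis=-1) / (2 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    cell_types = np.asarray(cell_types)
    same_type = cell_types[:, None] == cell_types[None, :]
    return weights * np.where(same_type, 1.0, cross_type_weight)

def smooth(predictions, cell_weights, alpha=0.5, n_iter=20):
    """Spread predictions over the spatial neighborhood graph."""
    predictions = np.asarray(predictions, dtype=float)
    return propagate(row_normalize(cell_weights), predictions, alpha, n_iter)

# Toy data: 4 cells, 2 genes (gene 0 has a known error, gene 1 is the target).
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
cell_types = np.array(["A", "A", "B", "B"])
predictions = np.array([[1.0, 2.0], [1.5, 2.5], [0.5, 1.0], [3.0, 4.0]])
errors = np.array([[0.2, 0.0], [0.2, 0.0], [0.2, 0.0], [0.2, 0.0]])
gene_weights = np.array([[0.0, 1.0], [1.0, 0.0]])

reinforced = reinforce(predictions, errors, gene_weights)
smoothed = smooth(reinforced, spatial_weights(centroids, cell_types))
```

In this toy setup the over-prediction observed for gene 0 is carried over to the correlated target gene 1, so its prediction is adjusted downward, and the spatial step then pulls each cell toward nearby cells, more strongly so for cells of the same type.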
The team also found that SPRITE improved downstream tasks such as cell clustering, data visualization, and cell-type classification, often yielding better results than models trained on original measured data. They even found that results obtained with SPRITE occasionally surpassed those obtained with ground-truth data, suggesting that the algorithm may be capable of de-noising gene expression.
SPRITE's scalability, however, depends on the number of cross-validation folds used. Future research could explore integrating spatial and gene correlation information directly into prediction methods and extending SPRITE to other data types, such as spatial proteomics.
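One way to see why the fold count matters is the sketch below. It assumes that the folds partition the measured genes and that the underlying prediction method is rerun once per fold to obtain prediction errors for the held-out genes; neither detail is spelled out here, so both are assumptions. The names cross_validated_errors and mean_predictor are hypothetical, and mean_predictor is only a stand-in for a real prediction method.

```python
import numpy as np

def cross_validated_errors(measured, predict_fn, n_folds=5, seed=0):
    """Hold out each fold of genes, predict it, and record the errors.

    measured: cells x genes array of measured expression.
    predict_fn(train_expression, held_out) returns a cells x len(held_out)
    array of predictions (hypothetical interface).
    Returns the error matrix and the number of calls to predict_fn.
    """
    measured = np.asarray(measured, dtype=float)
    n_genes = measured.shape[1]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_genes), n_folds)
    errors = np.zeros_like(measured)
    n_calls = 0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n_genes), held_out)
        predicted = predict_fn(measured[:, train], held_out)
        errors[:, held_out] = predicted - measured[:, held_out]
        n_calls += 1  # one full run of the prediction method per fold
    return errors, n_calls

def mean_predictor(train_expression, held_out):
    """Placeholder method: predict each held-out gene as the per-cell mean."""
    per_cell_mean = train_expression.mean(axis=1, keepdims=True)
    return np.repeat(per_cell_mean, len(held_out), axis=1)

toy_expression = np.random.default_rng(1).random((6, 10))  # 6 cells, 10 genes
errors_5, calls_5 = cross_validated_errors(toy_expression, mean_predictor, n_folds=5)
errors_10, calls_10 = cross_validated_errors(toy_expression, mean_predictor, n_folds=10)
```

Under these assumptions the prediction method runs once per fold, so doubling the number of folds roughly doubles the work.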