Digital pathology is transforming the analysis of traditional glass slides by converting them into digital images, a transition accelerated by advances in imaging technology and software. This shift has important implications for medical diagnostics, research, and education. The ongoing AI revolution and the digitization of biomedicine have the potential to accelerate progress in precision health by an order of magnitude. Digital pathology can also be combined with other multimodal patient data to produce evidence at the population level.
Once digitized, a single gigapixel slide can be thousands of times wider and taller than a standard image, presenting unique computational challenges. Conventional vision transformers cannot manage this scale because the computational cost of self-attention grows quadratically with input length. As a result, prior digital pathology approaches often process slide image tiles in isolation, overlooking the intricate interdependencies among them and neglecting slide-level context that is important for applications like tumor microenvironment modeling.
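A back-of-the-envelope calculation makes the scale problem concrete. The slide and tile dimensions below are hypothetical, chosen only to illustrate the scaling; actual slide sizes vary.

```python
# Illustrative numbers only: a ~gigapixel slide tiled at a common tile size.
slide_side_px = 100_000          # hypothetical ~100k x 100k pixel slide
tile_side_px = 256               # a typical tile size in digital pathology

tiles_per_side = slide_side_px // tile_side_px   # 390
num_tiles = tiles_per_side ** 2                  # 152,100 tiles per slide

# Standard self-attention scores every tile against every other tile,
# so its cost grows quadratically with the sequence length.
attention_pairs = num_tiles ** 2                 # ~2.3e10 pairwise scores

print(f"{num_tiles=:,}")
print(f"{attention_pairs=:,}")
```

Even at this modest tile size, a single slide yields a sequence two to three orders of magnitude longer than what standard vision transformers are designed to attend over.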
To overcome this, Microsoft has introduced GigaPath, a new vision transformer that uses dilated self-attention to keep computation tractable. In collaboration with Providence Health System and the University of Washington, Microsoft developed Prov-GigaPath, the first open-access whole-slide pathology foundation model. It was pretrained on over one billion high-resolution pathology image tiles from more than 170,000 whole slides, with all computation performed, with approval, within Providence's private tenant.
GigaPath's two-stage learning curriculum consists of tile-level pretraining with DINOv2 and slide-level pretraining using a masked autoencoder and LongNet. To counterbalance the quadratic cost of self-attention, the researchers partition the tile sequence into segments of increasing size and apply sparse attention within the longer segments, with sparsity directly proportional to segment length, so that the cost per segment stays roughly constant.
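The segmenting scheme described above can be sketched in a few lines. This is a simplified illustration, not the actual LongNet implementation (which mixes several segment-length and dilation configurations in parallel); the helper name and the doubling schedule are assumptions made for clarity.

```python
def dilated_attention_indices(seq_len, base_segment=4):
    """Split a token sequence into segments of doubling size and, within
    each segment, keep every r-th token, where the dilation rate r grows
    in proportion to the segment length. Each segment then contributes a
    roughly constant number of tokens to dense attention."""
    groups = []
    start, segment, rate = 0, base_segment, 1
    while start < seq_len:
        end = min(start + segment, seq_len)
        # Sparsity proportional to segment length: longer segment, larger stride.
        groups.append(list(range(start, end, rate)))
        start, segment, rate = end, segment * 2, rate * 2
    return groups

for group in dilated_attention_indices(28):
    print(group)  # each segment keeps ~base_segment tokens
```

With a stride proportional to segment length, a segment of length `w` with dilation `r ∝ w` attends over only `w/r` tokens, so per-segment attention cost is bounded even as segments grow.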
Prov-GigaPath sets a new standard in digital pathology across nine cancer subtyping tasks and 17 pathomics tasks, drawing on data from Providence and The Cancer Genome Atlas (TCGA). The model outperformed the second-best model on 18 of the 26 tasks, showcasing its versatility across digital pathology applications.
Beyond this, the researchers found universal signals for gene mutation prediction across cancer types in a pan-cancer setting, where Prov-GigaPath achieved top performance on 17 of 18 tasks. They also demonstrated GigaPath's ability to perform vision-language tasks by aligning slide representations with the semantics of pathology reports for downstream prediction tasks.
In conclusion, the researchers have presented Prov-GigaPath, the first whole-slide digital pathology foundation model pretrained at scale on real-world data. The model excels at cancer classification, pathomics, and vision-language tasks, offering the potential to improve patient care and accelerate clinical discovery, and demonstrating the importance of whole-slide modeling on large datasets. The team acknowledges, however, that further work is needed to fully realize a conversational assistant for this field, especially in integrating multimodal frameworks.