3D medical image segmentation struggles to capture global context from high-resolution volumes, which often results in suboptimal segmentations. One possible remedy is depth-wise convolution with larger kernel sizes, which enlarges the receptive field to detect a wider array of features. However, even large kernels cannot fully capture relationships between distant voxels, so a complementary method is needed.
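To make the idea concrete, here is a minimal sketch of a large-kernel depth-wise 3D convolution block in PyTorch. The kernel size (7) and channel count are illustrative choices, not values taken from any specific paper:

```python
import torch
import torch.nn as nn

class LargeKernelDWConv3d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        # groups=channels makes the convolution depth-wise: each channel is
        # filtered independently, which keeps the parameter count manageable
        # even for large kernels.
        self.dw = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        # A 1x1x1 point-wise convolution then mixes information across channels.
        self.pw = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pw(self.dw(x))

# Example: a small 3D feature map (batch, channels, depth, height, width).
x = torch.randn(1, 32, 16, 16, 16)
y = LargeKernelDWConv3d(32)(x)
print(y.shape)  # torch.Size([1, 32, 16, 16, 16])
```

The receptive field grows with the kernel size, but it remains local: a voxel at one end of the volume never directly interacts with one at the other end, which is the limitation described above.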
Transformer architectures, designed specifically to model global context, have been explored intensively in recent years. In models such as TransBTS and UNETR, they are combined with 3D Convolutional Neural Networks (CNNs) to capture both local spatial features and global dependencies in high-level features. However, because self-attention scales poorly with sequence length, the high resolution of 3D images makes these models computationally expensive and slows down inference.
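The following sketch illustrates this hybrid pattern: a convolutional encoder extracts local features, and a transformer encoder models global dependencies over the flattened low-resolution feature map. All layer sizes are illustrative, not those of TransBTS or UNETR:

```python
import torch
import torch.nn as nn

class HybridCNNTransformer(nn.Module):
    def __init__(self, in_ch: int = 1, dim: int = 128):
        super().__init__()
        # Downsampling CNN stem: local spatial features, 8x resolution drop.
        self.cnn = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.cnn(x)                     # (B, C, D', H', W')
        b, c, d, h, w = f.shape
        seq = f.flatten(2).transpose(1, 2)  # (B, D'*H'*W', C): one token per voxel
        seq = self.transformer(seq)         # global self-attention over voxels
        return seq.transpose(1, 2).view(b, c, d, h, w)

x = torch.randn(1, 1, 32, 32, 32)
print(HybridCNNTransformer()(x).shape)  # torch.Size([1, 128, 4, 4, 4])
```

The bottleneck is visible in the flatten step: the token count grows cubically with volume resolution, and self-attention is quadratic in that token count, which is exactly the cost problem described above.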
To mitigate the cost of long-sequence modeling, Mamba, a state space model (SSM), was introduced. It models long-range dependencies efficiently through a selection mechanism and a hardware-aware algorithm. Mamba has since been applied to various computer vision tasks and integrated into models such as U-Mamba to enhance medical image segmentation. Building on Mamba, Vision Mamba proposes the Vim block, which combines bidirectional SSMs for global visual context modeling with positional embeddings for location-aware comprehension.
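A simplified sketch of the bidirectional-SSM idea follows: one Mamba layer scans the token sequence forward, another scans it in reverse, and the two outputs are summed. The real Vim block fuses the directions more tightly; this only illustrates the pattern, and it assumes the mamba-ssm package and a CUDA GPU:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm (requires a CUDA GPU)

class BidirectionalMambaBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fwd = Mamba(d_model=dim)  # scans the sequence left to right
        self.bwd = Mamba(d_model=dim)  # scans the flipped sequence

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim), with positional embeddings already added.
        h = self.norm(x)
        out = self.fwd(h) + self.bwd(h.flip(1)).flip(1)
        return x + out  # residual connection

tokens = torch.randn(1, 256, 128, device="cuda")
print(BidirectionalMambaBlock(128).cuda()(tokens).shape)  # (1, 256, 128)
```

Unlike self-attention, each scan runs in time linear in the sequence length, which is what makes SSMs attractive for the very long voxel sequences of 3D volumes.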
In response to the challenges traditional transformer blocks face with large-scale features, researchers at the Beijing Academy of Artificial Intelligence introduced SegMamba. This architecture fuses a U-shaped structure with Mamba to model global volumetric features at multiple scales. Designed specifically for 3D medical image segmentation, SegMamba models long-range dependencies effectively while maintaining impressive inference efficiency compared with traditional CNN- and transformer-based methods.
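To show how a U-shape and Mamba can be fused, here is a rough sketch of the pattern the article describes: at each encoder scale, the 3D feature map is flattened into a voxel sequence, passed through a Mamba layer for whole-volume modeling, then reshaped and downsampled, with skip connections feeding a lightweight decoder. This is an illustration of the idea, not SegMamba's actual implementation (see the official GitHub repo for that); it again assumes the mamba-ssm package and a CUDA GPU:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class MambaStage(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.mamba = Mamba(d_model=in_ch)
        self.down = nn.Conv3d(in_ch, out_ch, 3, stride=2, padding=1)

    def forward(self, x):
        b, c, d, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)           # (B, D*H*W, C)
        seq = self.mamba(seq)                        # global volumetric modeling
        x = seq.transpose(1, 2).view(b, c, d, h, w)
        return x, self.down(x)                       # skip feature, downsampled

class TinyMambaUNet(nn.Module):
    def __init__(self, in_ch: int = 1, num_classes: int = 3):
        super().__init__()
        self.stem = nn.Conv3d(in_ch, 32, 3, padding=1)
        self.enc1 = MambaStage(32, 64)
        self.enc2 = MambaStage(64, 128)
        self.up2 = nn.ConvTranspose3d(128, 64, 2, stride=2)
        self.up1 = nn.ConvTranspose3d(64, 32, 2, stride=2)
        self.head = nn.Conv3d(32, num_classes, 1)

    def forward(self, x):
        x = self.stem(x)
        s1, x = self.enc1(x)   # Mamba modeling at full resolution
        s2, x = self.enc2(x)   # and again at half resolution
        x = self.up2(x) + s2   # decoder with skip connections
        x = self.up1(x) + s1
        return self.head(x)    # per-voxel class logits

vol = torch.randn(1, 1, 32, 32, 32, device="cuda")
print(TinyMambaUNet().cuda()(vol).shape)  # (1, 3, 32, 32, 32)
```

Because the sequence modeling is linear in the number of voxels, the encoder can afford to run it at every scale rather than only on a heavily downsampled bottleneck, which is the key to whole-volume feature modeling at multiple scales.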
Experiments on the BraTS2023 dataset confirm SegMamba's efficacy and efficiency on 3D medical image segmentation tasks. It outperforms transformer-based methods by applying state-space-modeling principles to whole-volume feature modeling while maintaining superior processing speed.
The complete research is available in the paper and on GitHub; all credit goes to the researchers behind the project.