Medical image segmentation plays a pivotal role in identifying anatomically significant regions within medical images, facilitating crucial clinical tasks such as disease diagnosis [1], disease progression monitoring [2], [3], and effective therapy planning [4]. Precise segmentation is particularly indispensable in detecting anomalies and tumors. Brain tumors, for instance, pose a significant health concern worldwide, accounting for 85% to 90% of all primary central nervous system (CNS) tumors [5]. They are also the leading cause of cancer deaths among children and adolescents younger than 20 years.
Brain tumors are classified as either primary or secondary. Primary tumors originate from brain cells, whereas secondary tumors metastasize from other parts of the body. Among primary tumors, gliomas are the most prevalent; they originate from glial cells and are graded as high grade (HGG) or low grade (LGG), with high-grade gliomas being highly aggressive and requiring urgent medical attention. Delays in diagnosing and treating gliomas can lead to advanced-stage cancer and mortality. Accurate segmentation at an early stage is therefore required for effective diagnosis and treatment planning, given their aggressive nature and potential for metastasis. However, identifying and delineating tumor boundaries is challenging: gliomas exhibit notable heterogeneity not only in morphological attributes such as shape and size, but also in histological composition and genetic profile. Their complex morphology, ranging from compact masses to infiltrative patterns, further exacerbates the difficulty of accurately capturing their extent within medical images. Anatomical variations complicate glioma segmentation as well; differences in brain shape, size, and spatial orientation can obscure tumor boundaries and confound segmentation algorithms.
Traditionally, brain tumor segmentation has relied on manual annotation by medical experts, which is a time-consuming and subjective process. Thresholding-based methods [6], which segment tumor regions based on intensity thresholds, are relatively simple but struggle in the presence of noise and intensity variations. Region-growing algorithms [7], which iteratively group pixels with similar properties, are effective but sensitive to initialization and parameter selection. The widely used watershed transform [8] and active contour models [9], although accurate, suffer from over-segmentation. Atlas-based segmentation methods [10] register a pre-segmented atlas image to the target image and propagate the corresponding segmentation labels, but they are sensitive to anatomical variability across subjects. Traditional machine learning approaches, including Support Vector Machines (SVMs) and Random Forests, have advanced medical image segmentation [11], [12] but show limitations in capturing the complex spatial relationships required for segmenting brain tumors across multimodal images and images with varying structures.
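To make the simplicity of intensity thresholding concrete, the classical Otsu method (an automatic threshold selector, shown here only as an illustration of the family of methods in [6], not the specific algorithm cited) picks the cut that maximizes between-class variance of the intensity histogram:

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the intensity threshold maximizing between-class variance."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    hist = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # probability of class 0 (below cut)
    w1 = 1.0 - w0                           # probability of class 1 (above cut)
    mu_cum = np.cumsum(hist * centers)      # cumulative (unnormalized) mean
    mu_total = mu_cum[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu_cum) ** 2 / (w0 * w1)
    between[~np.isfinite(between)] = 0.0    # guard the empty-class endpoints
    return centers[np.argmax(between)]

# Synthetic "scan": dark background plus one bright blob.
rng = np.random.default_rng(0)
img = rng.normal(30, 5, size=(64, 64))
img[20:40, 20:40] = rng.normal(120, 5, size=(20, 20))
t = otsu_threshold(img)
mask = img > t          # binary "tumor" mask
```

On such a clean bimodal image the threshold falls neatly between the two modes; the fragility noted above appears as soon as noise, bias fields, or overlapping intensity distributions blur the histogram valley.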
In the past decade, deep learning techniques have emerged as powerful tools for medical image analysis, showing great potential in automating the segmentation task. In particular, Convolutional Neural Networks (CNNs) have become potent tools for automated segmentation by leveraging their ability to learn discriminative features from large-scale medical imaging datasets [13], [14], [15]. Despite these advancements, several key gaps remain in current models. CNNs face significant challenges in capturing long-range dependencies and the spatial relationships between objects and their surroundings [16], which are crucial for accurately delineating tumor boundaries. This gap in the integration of contextual information arises from the reliance of traditional CNN-based architectures on local receptive fields. These models struggle to incorporate global context, making it difficult to accurately segment complex tumor subregions that often have irregular shapes and vary significantly in intensity, shape, and appearance across patients. The UNet architecture, introduced by Ronneberger et al. [17], with its encoder–decoder structure and skip connections, has shown flexibility in handling these variations to some extent. Despite numerous adaptations aimed at enhancing its capabilities, UNet-based architectures still have difficulty capturing intricate details and attending to the most relevant features, which is crucial for distinguishing subtle differences in tumor regions. Another significant gap lies in the insufficient handling of multimodal data: current models fail to fully exploit the complementary information provided by different MRI modalities. The inability to address these gaps often leads to errors, especially in ambiguous or noisy areas of the images, emphasizing the need for models that better integrate contextual information and leverage multimodal data for more robust and accurate brain tumor segmentation.
This motivates the need to integrate advanced attention mechanisms to boost the effectiveness of UNet-based segmentation models for accurate representation and discrimination of tumor regions while maintaining computational efficiency. In this article, a deep learning architecture named Bias Corrected Twin Squeeze-and-Excitation Attention Enhanced UNet (BC-TSEA-UNet) is introduced. BC-TSEA-UNet merges the well-known UNet design with twin squeeze-and-excitation (SE) attention blocks, resulting in improved segmentation performance and heightened discriminative power. SE blocks selectively recalibrate feature maps by emphasizing informative features and suppressing less important ones, effectively capturing both local and global context. This approach enhances the model's ability to integrate contextual information and improves segmentation performance. The primary contributions of the paper are:
- A novel deep learning architecture, BC-TSEA-UNet, is proposed for brain tumor segmentation that integrates twin SE blocks to enhance feature recalibration and improve the model's ability to capture both local and global contextual information.
- A comprehensive bias field correction mechanism is incorporated to mitigate intensity variations across brain images, ensuring consistent and reliable data for segmentation.
- An extensive evaluation of the proposed BC-TSEA-UNet on the BraTS 2019, BraTS 2020, and BraTS 2023 datasets demonstrates its strong generalization capability across multiple brain tumor datasets.
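The twin SE attention blocks of BC-TSEA-UNet are detailed later in the paper; the standard SE recalibration they build on can be sketched concisely. The sketch below is a minimal NumPy illustration of a single SE block (squeeze, bottleneck excitation, channel-wise rescale); the weight shapes and reduction ratio `r` are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map.

    Squeeze:    global average pooling per channel          -> (C,)
    Excitation: FC bottleneck (C -> C//r -> C) with ReLU,
                then a sigmoid producing gates in (0, 1)
    Scale:      each channel is multiplied by its gate.
    """
    z = feature_map.mean(axis=(1, 2))                # squeeze -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)                 # ReLU bottleneck
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))         # sigmoid channel gates
    return feature_map * s[:, None, None]            # channel-wise rescale

# Illustrative shapes: 8 channels, 4x4 spatial map, reduction ratio r = 2.
rng = np.random.default_rng(1)
C, H, W, r = 8, 4, 4, 2
x = rng.normal(size=(C, H, W))
w1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
```

Because the gates lie strictly in (0, 1), the block can only attenuate channels relative to one another, which is exactly the "emphasize informative, suppress less important" recalibration described above.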
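The paper's bias field correction mechanism is described in the methodology section; to fix intuition, the underlying idea — that MRI bias is a slowly varying multiplicative field that can be estimated and divided out — can be sketched with a low-order polynomial fit in the log domain. This is a deliberately crude stand-in (not the paper's method, and far simpler than established tools such as N4ITK); the polynomial order and normalization are assumptions for illustration:

```python
import numpy as np

def correct_bias_field(image, order=2):
    """Crude bias-field correction: fit a low-order 2-D polynomial to the
    log image (capturing the slowly varying multiplicative field) and
    divide it out. Illustrative only; real pipelines use e.g. N4ITK."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    y = (yy / (h - 1)).ravel()
    x = (xx / (w - 1)).ravel()
    # Design matrix with all monomials x^i * y^j of total degree <= order.
    cols = [x**i * y**j for i in range(order + 1)
            for j in range(order + 1 - i)]
    A = np.stack(cols, axis=1)
    log_img = np.log(image + 1e-6).ravel()
    coef, *_ = np.linalg.lstsq(A, log_img, rcond=None)
    bias = (A @ coef).reshape(h, w)
    corrected = np.exp(np.log(image + 1e-6) - bias)
    return corrected * image.mean() / corrected.mean()  # keep mean intensity

# Flat "anatomy" corrupted by a smooth multiplicative ramp (0.5x to 1.5x).
ramp = np.linspace(0.5, 1.5, 64)
image = 100.0 * np.tile(ramp, (64, 1))
corrected = correct_bias_field(image)
```

After correction the intensity variation of the ramp-corrupted image collapses almost entirely, which is the "consistent and reliable data" property the contribution above targets.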
The paper is organized as follows: Section 2 provides an overview of related work in medical image segmentation. Section 3 presents the overall methodology of BC-TSEA-UNet. Section 4 outlines the experiments conducted, while Section 5 presents the analysis of results on the BraTS datasets.