Domain-Adaptive Organ Segmentation through SegFormer Architecture in Clinical Imaging

Abstract
This paper proposes an organ segmentation method based on the SegFormer architecture to address three challenges in medical imaging: complex organ structures, blurred boundaries, and significant cross-domain distribution shifts. A hierarchical encoder combines convolutional embedding modules with multi-head self-attention to accurately model multi-scale spatial structure. In the decoding stage, a lightweight multilayer-perceptron (MLP) module fuses multi-scale features, avoiding the information loss of conventional upsampling and sharpening boundary delineation. To validate the method, a comprehensive evaluation framework is constructed that covers changes in inference resolution, image-quality degradation, and cross-center distribution shift. Experiments on a public abdominal multi-organ CT dataset show that the proposed model outperforms representative existing methods in mIoU, mDice, and mAcc, demonstrating high segmentation accuracy and structural fidelity. Under these challenging test conditions, the model remains robust across data domains and degraded images, indicating good generalization. The study examines the model systematically in terms of structural design, fusion mechanism, and stability evaluation, confirming the adaptability and practical value of the SegFormer architecture for structural analysis of medical images.
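To make the decoder's fusion step concrete, the following is a minimal pure-Python sketch (not the paper's implementation) of the SegFormer-style all-MLP fusion: each stage's feature map is linearly projected to a shared channel width, upsampled to the finest resolution, concatenated along channels, and fused by one more linear layer. Nested lists stand in for tensors, nearest-neighbour upsampling stands in for bilinear interpolation, and all sizes and names are illustrative assumptions.

```python
def linear_project(feat, weight):
    """Per-pixel linear projection: feat is H x W x C_in, weight is C_in x C_out."""
    h, w = len(feat), len(feat[0])
    c_out = len(weight[0])
    return [[[sum(feat[i][j][k] * weight[k][c] for k in range(len(weight)))
              for c in range(c_out)]
             for j in range(w)]
            for i in range(h)]

def upsample_nearest(feat, target_h, target_w):
    """Nearest-neighbour upsampling of an H x W x C feature map."""
    h, w = len(feat), len(feat[0])
    return [[feat[i * h // target_h][j * w // target_w]
             for j in range(target_w)]
            for i in range(target_h)]

def fuse(features, proj_weights, fuse_weight):
    """Project each scale, upsample all to the finest scale, concat channels, fuse."""
    th, tw = len(features[0]), len(features[0][0])  # finest scale listed first
    upsampled = [upsample_nearest(linear_project(f, w), th, tw)
                 for f, w in zip(features, proj_weights)]
    concat = [[sum((u[i][j] for u in upsampled), [])  # channel-wise concat
               for j in range(tw)]
              for i in range(th)]
    return linear_project(concat, fuse_weight)

# Illustrative two-scale example: a 4x4x2 fine map and a 2x2x3 coarse map,
# both projected to 2 channels, then fused down to a single output channel.
fine = [[[1.0, 2.0] for _ in range(4)] for _ in range(4)]
coarse = [[[1.0, 0.0, 1.0] for _ in range(2)] for _ in range(2)]
w_fine = [[1.0, 0.0], [0.0, 1.0]]                 # 2 -> 2 (identity)
w_coarse = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # 3 -> 2
w_fuse = [[1.0], [1.0], [1.0], [1.0]]             # 4 -> 1
fused = fuse([fine, coarse], [w_fine, w_coarse], w_fuse)  # 4 x 4 x 1 output
```

Because the fusion is a per-pixel MLP over concatenated channels, the decoder stays lightweight while still mixing information from all encoder stages at the finest resolution, which is the property the abstract credits with improved boundary delineation.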