TY - JOUR
T1 - GH-UNet: group-wise hybrid convolution-VIT for robust medical image segmentation
AU - Wang, Shengxiang
AU - Li, Ge
AU - Gao, Min
AU - Zhuo, Linlin
AU - Liu, Mingzhe
AU - Ma, Zhizhong
AU - Zhao, Wei
AU - Fu, Xiangzheng
N1 - Publisher Copyright:
© The Author(s) 2025.
Funding Information:
The study was supported by the National Natural Science Foundation of China (Nos. 62476291, 62372158 and 62302339), the Hunan Provincial Natural Science Foundation for Distinguished Young Scholars (No. 2025JJ20097), the Research Foundation of Education Bureau of Hunan Province (No. 24B0003), and Wenzhou Key Scientific and Technological Projects (Nos. ZG2024007 and ZG2024012).
PY - 2025/7/10
Y1 - 2025/7/10
AB - Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates a hybrid convolution-Transformer encoder for both local detail and global context modeling, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and GDG are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public and one private dataset show GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former with 1.37% and 1.94% gains in DICE and IOU, respectively, using only 38% of the parameters and 49.61% of the FLOPs. The code is freely accessible via: https://github.com/xiachashuanghua/GH-UNet.
UR - http://www.scopus.com/inward/record.url?scp=105010390170&partnerID=8YFLogxK
U2 - 10.1038/s41746-025-01829-2
DO - 10.1038/s41746-025-01829-2
M3 - Journal article
SN - 2398-6352
VL - 8
JO - npj Digital Medicine
JF - npj Digital Medicine
IS - 1
M1 - 426
ER -