GH-UNet: group-wise hybrid convolution-VIT for robust medical image segmentation

Shengxiang Wang, Ge Li, Min Gao, Linlin Zhuo*, Mingzhe Liu*, Zhizhong Ma*, Wei Zhao*, Xiangzheng Fu*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review


Abstract

Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates a hybrid convolution-Transformer encoder for both local detail and global context modeling, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and GDG are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public and one private dataset show GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former with 1.37% and 1.94% gains in DICE and IOU, respectively, using only 38% of the parameters and 49.61% of the FLOPs. The code is freely accessible via: https://github.com/xiachashuanghua/GH-UNet.
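The ISIC2016 comparison above is reported in DICE and IOU. As a reminder of what those two overlap metrics measure, here is a minimal sketch using their standard definitions for binary masks. It is not taken from the GH-UNet repository: the function name `dice_and_iou`, the flattened-list input format, and the smoothing term `eps` are assumptions made for illustration, and the paper's exact evaluation protocol is not stated in this record.

```python
from typing import Sequence, Tuple


def dice_and_iou(
    pred: Sequence[int],
    target: Sequence[int],
    eps: float = 1e-7,  # assumed smoothing term; avoids 0/0 on empty masks
) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    # Pixels marked foreground in at least one mask.
    union = pred_area + target_area - intersection

    dice = (2 * intersection + eps) / (pred_area + target_area + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    dice, iou = dice_and_iou(predicted, reference)
    print(f"Dice={dice:.3f} IoU={iou:.3f}")
```

The two scores are monotonically related (Dice = 2·IoU / (1 + IoU)), so a model that improves one improves the other, though by different amounts, which is consistent with the abstract reporting separate gains for each.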
Original language: English
Article number: 426
Number of pages: 15
Journal: npj Digital Medicine
Volume: 8
Issue number: 1
Publication status: Published - 10 Jul 2025