TY - JOUR
T1 - LADDA
T2 - Latent Diffusion-based Domain-adaptive Feature Disentangling for Unsupervised Multi-modal Medical Image Registration
AU - Yuan, Peng
AU - Dong, Jianmin
AU - Zhao, Wei
AU - Lyu, Fei
AU - Xue, Cheng
AU - Zhang, Yudong
AU - Yang, Chunfeng
AU - Wu, Zhan
AU - Gao, Zhiqiang
AU - Lyu, Tianling
AU - Coatrieux, Jean-Louis
AU - Chen, Yang
N1 - Publisher Copyright:
© 2013 IEEE.
Funding Information:
This work was supported in part by the Science and Technology Project of Xizang Autonomous Region under Grant XZ202401JD0009, the State Key Project of Research and Development Plan under Grant 2022YFC2408500, the National Natural Science Foundation of China under Grants T2225025 and 62401141, and the Key Research and Development Programs in Jiangsu Province of China under Grants BE2021703 and BE2022768, and in part by the Natural Science Foundation of Zhejiang Province under Grant LZ23A050002, the National Natural Science Foundation of China under Grant 12175012, the Jiangsu Province Science Foundation for Youths under Grant BK20241305, and the China Postdoctoral Science Foundation under Grant 2023M740607.
PY - 2025/7/15
Y1 - 2025/7/15
AB - Deformable image registration (DIR) is critical for accurate clinical
diagnosis and effective treatment planning. However, patient movement,
significant intensity differences, and large breathing deformations
hinder accurate anatomical alignment in multi-modal image registration.
These factors exacerbate the entanglement of anatomical and
modality-specific style information, thereby severely limiting the
performance of multi-modal registration. To address this, we propose a
novel LAtent Diffusion-based Domain-Adaptive feature disentangling
(LADDA) framework for unsupervised multi-modal medical image
registration, which explicitly addresses representation
disentanglement. First, LADDA extracts reliable anatomical priors from
the Latent Diffusion Model (LDM), facilitating downstream content-style
disentangled learning. A Domain-Adaptive Feature Disentangling (DAFD)
module is proposed to further promote anatomical structure alignment.
This module disentangles image features into content and style
information, encouraging the network to focus on cross-modal content
information. Next, a Neighborhood-Preserving Hashing (NPH) module is
constructed to further perceive and integrate hierarchical content
information through local neighborhood encoding, thereby maintaining
cross-modal structural consistency. Furthermore, a
Unilateral-Query-Frozen Attention (UQFA) module is proposed to enhance
the coupling between upstream prior and downstream content information.
The feature interaction within intra-domain consistent structures
improves the fine recovery of detailed textures. The proposed framework
is extensively evaluated on large-scale multi-center datasets,
demonstrating superior performance across diverse clinical scenarios and
strong generalization on out-of-distribution (OOD) data.
KW - large deformation
KW - domain-adaptive feature disentangling
KW - latent diffusion model
KW - multi-modal
KW - image registration
UR - http://www.scopus.com/inward/record.url?scp=105012176228&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2025.3588511
DO - 10.1109/JBHI.2025.3588511
M3 - Journal article
C2 - 40663660
AN - SCOPUS:105012176228
SN - 2168-2194
VL - 14
SP - 1
EP - 14
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 8
ER -