TY - GEN
T1 - Gene-induced Multimodal Pre-training for Image-omic Classification
AU - Jin, Ting
AU - Xie, Xingran
AU - Wan, Renjie
AU - Li, Qingli
AU - Wang, Yan
N1 - Funding Information:
This work was supported by the National Natural Science Foundation of China (Grant No. 62101191), Shanghai Natural Science Foundation (Grant No. 21ZR1420800), and the Science and Technology Commission of Shanghai Municipality (Grant No. 22DZ2229004).
Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
PY - 2023/10/8
Y1 - 2023/10/8
AB - Histology analysis of the tumor micro-environment integrated with genomic assays is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work addresses two main challenges of multi-modality image-omic classification: (1) the difficulty of extracting patient-level features from gigapixel WSIs and tens of thousands of genes, and (2) effective fusion that models high-order relevance. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We then design a masked patch modeling (MPM) paradigm to capture the latent pathological characteristics of different tissues; the masking strategy randomly masks a fixed-length contiguous subsequence of the patch embeddings of a WSI. Finally, we combine the classification tokens of the paired modalities and propose a triplet learning module to learn high-order relevance and discriminative patient-level information. After pre-training, simple fine-tuning yields the classification results. Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% accuracy for image-omic classification. The code is publicly available at https://github.com/huangwudiduan/GIMP.
KW - Multimodal learning
KW - Whole slide image classification
UR - http://www.scopus.com/inward/record.url?scp=85174741864&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-43987-2_49
DO - 10.1007/978-3-031-43987-2_49
M3 - Conference proceeding
SN - 9783031439865
T3 - Lecture Notes in Computer Science
SP - 508
EP - 517
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2023
A2 - Greenspan, Hayit
A2 - Madabhushi, Anant
A2 - Mousavi, Parvin
A2 - Salcudean, Septimiu
A2 - Duncan, James
A2 - Syeda-Mahmood, Tanveer
A2 - Taylor, Russell
PB - Springer
CY - Cham
T2 - 26th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2023
Y2 - 8 October 2023 through 12 October 2023
ER -