Gene-induced Multimodal Pre-training for Image-omic Classification

Ting Jin, Xingran Xie, Renjie Wan, Qingli Li, Yan Wang*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review



Histology analysis of the tumor micro-environment, integrated with genomic assays, is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework that jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work addresses two main challenges of multi-modality image-omic classification: (1) the difficulty of extracting patient-level features from gigapixel WSIs and tens of thousands of genes, and (2) effective fusion with high-order relevance modeling. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We then design a masked patch modeling (MPM) paradigm to capture the latent pathological characteristics of different tissues; its masking strategy randomly masks a fixed-length contiguous subsequence of the patch embeddings of a WSI. Finally, we combine the classification tokens of the paired modalities and propose a triplet learning module to learn high-order relevance and discriminative patient-level information. After pre-training, a simple fine-tuning step can be applied to obtain the classification results. Experimental results on the TCGA dataset show the superiority of our network architectures and pre-training framework, which achieve 99.47% accuracy for image-omic classification. The code is publicly available at
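The masking step of MPM described above can be illustrated with a minimal Python sketch. It assumes that the patch embeddings of one WSI are stored as an ordered list of vectors and that masked positions are replaced by a shared mask token rather than dropped. The function names `contiguous_span_mask` and `apply_mask`, the `span_len` parameter, and the replace-with-token choice are hypothetical and chosen for illustration; this is not the GiMP implementation.

```python
import random
from typing import List, Optional, Sequence


def contiguous_span_mask(
    num_patches: int, span_len: int, rng: Optional[random.Random] = None
) -> List[bool]:
    """Mask one fixed-length run of consecutive patch positions.

    Returns a list of booleans; True marks a patch hidden from the encoder.
    (Hypothetical helper, not taken from the GiMP code.)
    """
    if not 0 < span_len <= num_patches:
        raise ValueError("span_len must satisfy 0 < span_len <= num_patches")
    if rng is None:
        rng = random.Random()
    # Draw the start uniformly so the whole span fits inside the sequence
    # (random.randint is inclusive on both ends).
    start = rng.randint(0, num_patches - span_len)
    return [start <= i < start + span_len for i in range(num_patches)]


def apply_mask(
    embeddings: Sequence[Sequence[float]],
    mask: Sequence[bool],
    mask_token: Sequence[float],
) -> List[List[float]]:
    """Replace each masked patch embedding with a shared mask token.

    Assumption: masked patches are substituted rather than dropped;
    the abstract does not say which variant GiMP uses.
    """
    if len(embeddings) != len(mask):
        raise ValueError("embeddings and mask must have the same length")
    return [list(mask_token) if hidden else list(vec)
            for vec, hidden in zip(embeddings, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D embeddings standing in for real WSI patch features.
    patches = [[float(i), float(i) * 0.5] for i in range(10)]
    mask = contiguous_span_mask(len(patches), span_len=4, rng=rng)
    masked = apply_mask(patches, mask, mask_token=[0.0, 0.0])
    print("masked positions:", [i for i, m in enumerate(mask) if m])
```

Here `contiguous_span_mask(10, 4, rng)` always hides exactly four consecutive patches out of ten. Sampling the start index from the range `[0, num_patches - span_len]` guarantees that the span never runs past the end of the sequence, so every call masks the same fixed number of patches.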
Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention – MICCAI 2023
Subtitle of host publication: 26th International Conference, Vancouver, BC, Canada, October 8–12, 2023, Proceedings, Part VI
Editors: Hayit Greenspan, Anant Madabhushi, Parvin Mousavi, Septimiu Salcudean, James Duncan, Tanveer Syeda-Mahmood, Russell Taylor
Place of Publication: Cham
Number of pages: 10
ISBN (Electronic): 9783031439872
ISBN (Print): 9783031439865
Publication status: Published - 8 Oct 2023
Event: 26th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2023 - Vancouver, Canada
Duration: 8 Oct 2023 – 12 Oct 2023 (Conference proceedings)

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: MICCAI: International Conference on Medical Image Computing and Computer-Assisted Intervention


Conference: 26th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2023

Scopus Subject Areas

  • Theoretical Computer Science
  • Computer Science (all)

User-Defined Keywords

  • Multimodal learning
  • Whole slide image classification

