TedPar: Temporally Dependent PARAFAC2 Factorization for Phenotype-based Disease Progression Modeling

Kejing Yin*, William K. Cheung, Benjamin C.M. Fung, Jonathan Poon

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, Conference contribution, peer-review

Abstract

PARAFAC2 factorization provides a practical solution to map the temporally irregular electronic health records (EHR) to clinically relevant and interpretable phenotypes. Existing methods ignore the effect of interdependency of diseases over clinical history. Consequently, the crucial temporal information contained in the EHR data cannot be fully utilized and the learned phenotypes can be sub-optimal to characterize patients with progressive conditions. To address this issue, we propose a novel temporally dependent PARAFAC2 (TedPar) factorization in which the temporal dependency among the phenotypes is explicitly modeled. TedPar learns a set of target phenotypes to capture the clinical features relevant to the diseases of interest and a set of background phenotypes to capture irrelevant but frequently co-occurring clinical features. By effectively modeling the temporal dependency and separating relevant and irrelevant features, the discovered target phenotypes can be used to model the progression of the diseases of interest. Empirical evaluations show that TedPar obtains up to 32.4% relative improvement in reconstruction accuracy over the test set, suggesting significantly better generalizability than the baselines for both noise-free and heavily noisy input data. Qualitative analysis also shows that TedPar is capable of discovering clinically meaningful phenotypes and capturing the temporal dependency between them.
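For readers unfamiliar with the model family named in the abstract, the following is a minimal sketch of the standard PARAFAC2 structure, not of TedPar itself. It assumes each patient k has a visits-by-features matrix X_k with its own number of visits, approximated as U_k diag(s_k) V^T, where V is shared across patients and U_k = Q_k H with Q_k having orthonormal columns. All sizes, variable names, and the synthetic data are hypothetical, and the temporal dependency and target/background split proposed in the paper are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

J, R = 6, 3                  # J clinical features, R phenotypes (hypothetical sizes)
visit_counts = [4, 7, 5]     # irregular: each patient has a different number of visits

V = rng.random((J, R))       # shared phenotype definitions (features x phenotypes)
H = rng.random((R, R))       # shared factor that ties the per-patient U_k together


def make_patient(n_visits):
    """Build one synthetic patient that follows the PARAFAC2 structure exactly."""
    # Q_k has orthonormal columns, so U_k^T U_k equals H^T H for every patient.
    Q, _ = np.linalg.qr(rng.standard_normal((n_visits, R)))
    U = Q @ H                        # per-visit expression of each phenotype
    s = rng.random(R)                # patient-specific phenotype weights
    X = U @ np.diag(s) @ V.T         # visits x features record matrix
    return X, U, s


def relative_error(X, U, s, V):
    """Relative Frobenius reconstruction error of one patient slice."""
    X_hat = U @ np.diag(s) @ V.T
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


patients = [make_patient(n) for n in visit_counts]
errors = [relative_error(X, U, s, V) for X, U, s in patients]
```

The constraint that U_k^T U_k is the same for every patient is what lets PARAFAC2 handle records of different lengths while keeping the phenotype definitions in V shared.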

Original language: English
Title of host publication: Proceedings of the 2021 SIAM International Conference on Data Mining (SDM)
Editors: Carlotta Domeniconi, Ian Davidson
Publisher: Society for Industrial and Applied Mathematics (SIAM)
Pages: 594-602
Number of pages: 9
ISBN (Electronic): 9781611976700
Publication status: Published - 29 Apr 2021
Event: 2021 SIAM International Conference on Data Mining, SDM 2021 - Virtual, Online
Duration: 29 Apr 2021 to 1 May 2021
https://www.siam.org/conferences/cm/conference/sdm21
https://epubs.siam.org/doi/book/10.1137/1.9781611976700

Publication series

Name: SIAM International Conference on Data Mining (SDM)

Conference

Conference: 2021 SIAM International Conference on Data Mining, SDM 2021
City: Virtual, Online
Period: 29/04/21 to 1/05/21

Scopus Subject Areas

  • Computer Science Applications
  • Software
