Abstract
PARAFAC2 factorization provides a practical way to map temporally irregular electronic health records (EHR) to clinically relevant and interpretable phenotypes. Existing methods, however, ignore the interdependency of diseases over a patient's clinical history. Consequently, the crucial temporal information contained in EHR data cannot be fully utilized, and the learned phenotypes can be sub-optimal for characterizing patients with progressive conditions. To address this issue, we propose a novel temporally dependent PARAFAC2 (TedPar) factorization that explicitly models the temporal dependency among phenotypes. TedPar learns a set of target phenotypes that capture the clinical features relevant to the diseases of interest and a set of background phenotypes that capture irrelevant but frequently co-occurring clinical features. By modeling the temporal dependency and separating relevant from irrelevant features, TedPar yields target phenotypes that can be used to model the progression of the diseases of interest. Empirical evaluations show that TedPar obtains up to a 32.4% relative improvement in reconstruction accuracy on the test set, suggesting significantly better generalizability than the baselines for both noise-free and heavily noisy input data. Qualitative analysis also shows that TedPar discovers clinically meaningful phenotypes and captures the temporal dependency between them.
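The abstract does not state the model equations. To illustrate why PARAFAC2 suits irregular EHR, the sketch below shows only the standard PARAFAC2 form that TedPar builds on, not TedPar itself. Each patient k has a slice X_k with its own number of visits I_k and J shared clinical features. The slice is approximated as Q_k H diag(s_k) V^T, where V holds the R phenotype definitions shared by all patients. Temporal dependency and the target/background split are not modeled here, and all sizes and random values are hypothetical.

```python
import numpy as np

def parafac2_reconstruct(Q_list, H, S_list, V):
    """Reconstruct each patient slice as X_k ~= Q_k @ H @ diag(s_k) @ V.T."""
    return [Q @ H @ np.diag(s) @ V.T for Q, s in zip(Q_list, S_list)]

rng = np.random.default_rng(0)
R, J = 3, 5                 # hypothetical: R phenotypes, J clinical features
visit_counts = [4, 7, 6]    # hypothetical: irregular number of visits per patient

H = rng.standard_normal((R, R))                  # shared R x R factor
V = np.abs(rng.standard_normal((J, R)))          # nonnegative phenotype definitions
S_list = [np.abs(rng.standard_normal(R)) for _ in visit_counts]  # per-patient phenotype weights
# Q_k has orthonormal columns (I_k x R), taken from a reduced QR decomposition
Q_list = [np.linalg.qr(rng.standard_normal((I_k, R)))[0] for I_k in visit_counts]

X_hat = parafac2_reconstruct(Q_list, H, S_list, V)
for Q, X in zip(Q_list, X_hat):
    U = Q @ H  # temporal factor for this patient
    # PARAFAC2 invariance: U_k^T U_k = H^T H for every patient
    print(X.shape, np.allclose(U.T @ U, H.T @ H))
```

Because Q_k^T Q_k = I, the cross-product U_k^T U_k equals H^T H for every patient even though the slices have different numbers of rows. This shared constraint lets PARAFAC2 handle patients with different visit counts while keeping one common set of phenotypes V.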
Original language | English |
---|---|
Title of host publication | Proceedings of the 2021 SIAM International Conference on Data Mining (SDM) |
Editors | Carlotta Demeniconi, Ian Davidson |
Publisher | Society for Industrial and Applied Mathematics (SIAM) |
Pages | 594-602 |
Number of pages | 9 |
ISBN (Electronic) | 9781611976700 |
Publication status | Published - 29 Apr 2021 |
Event | 2021 SIAM International Conference on Data Mining, SDM 2021 - Virtual, Online. Duration: 29 Apr 2021 → 1 May 2021. https://www.siam.org/conferences/cm/conference/sdm21 https://epubs.siam.org/doi/book/10.1137/1.9781611976700 |
Publication series
Name | SIAM International Conference on Data Mining (SDM) |
---|---|
Conference
Conference | 2021 SIAM International Conference on Data Mining, SDM 2021 |
---|---|
City | Virtual, Online |
Period | 29/04/21 → 1/05/21 |
Scopus Subject Areas
- Computer Science Applications
- Software