Abstract
Predictive analysis of electronic health records (EHR) is a fundamental task that can provide actionable insights to help clinicians improve the efficiency and quality of care. EHR data are commonly recorded in binary format and inevitably contain missing values. The nature of the missingness may vary by patient, clinical feature, and time, which introduces observation bias. Unless both the binary missingness and the observation bias are accounted for, predictive performance can be substantially compromised. In this paper, we develop a propensity-adjusted temporal network (PATNet) that performs data imputation and predictive analysis simultaneously. PATNet contains three subnetworks: 1) an imputation subnetwork that generates the initial imputation based on historical observations, 2) a propensity subnetwork that infers the patient-, feature-, and time-dependent propensity scores, and 3) a prediction subnetwork that produces the missing-informative prediction using the propensity-adjusted imputations and the missing probabilities. To allow the propensity scores to be inferred from data, we use the expectation-maximization (EM) algorithm to learn the imputation and propensity subnetworks and incorporate a low-rank constraint via PARAFAC2 approximation. Extensive evaluation on the MIMIC-III and eICU datasets demonstrates that PATNet outperforms state-of-the-art methods on binary data imputation, disease progression modeling, and mortality prediction tasks.
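To make the role of propensity scores concrete, the following is a minimal sketch of classical inverse-propensity weighting on synthetic binary data. It is not PATNet and does not reproduce any part of the paper's architecture, EM procedure, or PARAFAC2 constraint. The data-generating process, the single `group` covariate, and all function names are assumptions made purely for illustration. It shows how a binary feature that is observed more often for one patient group yields a biased naive estimate, and how weighting each observation by the inverse of its estimated observation probability can reduce that bias.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Toy data (hypothetical): a binary feature whose observation depends on a covariate."""
    rng = np.random.default_rng(seed)
    # Fully observed covariate, e.g. a severity flag (assumption for illustration).
    group = rng.integers(0, 2, size=n)
    # The binary feature is more likely to be 1 in group 1.
    x = (rng.random(n) < np.where(group == 1, 0.6, 0.1)).astype(float)
    # Group 1 is also measured far more often, so missingness depends on group.
    mask = rng.random(n) < np.where(group == 1, 0.9, 0.2)
    return group, x, mask


def estimate_propensity(group, mask):
    """Estimate P(observed | group) from the data and map it back to each record."""
    pi_by_group = np.array([mask[group == g].mean() for g in (0, 1)])
    return pi_by_group[group]


def naive_mean(x, mask):
    """Average over observed entries only, ignoring how they were selected."""
    return float(x[mask].mean())


def ipw_mean(x, mask, propensity):
    """Self-normalized inverse-propensity-weighted mean of the observed entries."""
    weights = mask / propensity          # unobserved entries get weight 0
    observed_x = np.where(mask, x, 0.0)  # never read values that are "missing"
    return float((weights * observed_x).sum() / weights.sum())


group, x, mask = simulate()
propensity = estimate_propensity(group, mask)

true_rate = float(x.mean())  # known only because the data are simulated
naive = naive_mean(x, mask)
adjusted = ipw_mean(x, mask, propensity)

print(f"true rate      : {true_rate:.3f}")
print(f"naive estimate : {naive:.3f}")
print(f"IPW estimate   : {adjusted:.3f}")
```

In this toy setting the propensity depends on one observed covariate, so it can be estimated by a simple group-wise frequency. The abstract describes propensities that vary by patient, feature, and time, which is a much harder estimation problem than the one sketched here.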
Original language | English |
---|---|
Pages (from-to) | 2600-2613 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 36 |
Issue number | 6 |
Early online date | 13 Oct 2023 |
Publication status | Published - Jun 2024 |
Scopus Subject Areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics
User-Defined Keywords
- Binary data imputation
- Data models
- Diseases
- Medical diagnostic imaging
- Predictive analytics
- Predictive models
- Task analysis
- Time series analysis
- Clinical risk prediction
- Disease progression modeling
- Electronic health records
- Missing at random
- Missing data
- Propensity score