A Temporal Tensor Factorization Framework for Phenotyping and Dynamic Patient Representation Learning Using Multi-Modal EHR Data

Project: Research project

Project Details


Leveraging data from electronic health records (EHR) for predictive analytics in healthcare has received increasing attention in recent years. High-throughput phenotyping refers to the use of machine learning to derive phenotypes (sets of clinical conditions) from EHR data to characterize patients' disease states. Most phenotyping algorithms are developed using only records of discrete clinical events, such as diagnoses and medications. For better phenotyping and patient characterization, additional data modalities (such as laboratory test results, progress notes and vital signs) should also be considered. Fusing multi-modal EHR data for analytics is not trivial because the data are of different types, contain noise and missing values, and are interrelated in complex ways.
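To make the discrete-event setting concrete, the following sketch shows one common heuristic for turning diagnosis and medication records into a third-order tensor: each encounter adds a count to every (diagnosis, medication) pair it contains for that patient. It is a minimal illustration, assuming events have already been mapped to integer indices; the function `build_count_tensor` and the `(patient_id, diagnosis_ids, medication_ids)` record layout are hypothetical and not part of the project.

```python
import numpy as np


def build_count_tensor(records, n_patients, n_dx, n_med):
    """Build a patient x diagnosis x medication count tensor.

    records: iterable of (patient_id, diagnosis_ids, medication_ids), one
    tuple per encounter, with all IDs already mapped to integer indices.
    Every (diagnosis, medication) pair that co-occurs in an encounter adds 1
    to that patient's cell. This co-occurrence rule is an illustrative
    heuristic, not a recorded interaction between the two events.
    """
    tensor = np.zeros((n_patients, n_dx, n_med))
    for patient_id, dx_ids, med_ids in records:
        for d in dx_ids:
            for m in med_ids:
                tensor[patient_id, d, m] += 1.0
    return tensor


# Usage: two encounters for patient 0, one encounter for patient 1.
records = [(0, [0, 1], [2]), (0, [1], [2]), (1, [2], [0, 1])]
T = build_count_tensor(records, n_patients=2, n_dx=3, n_med=3)
```

Because the counts reflect only co-occurrence within an encounter, a tensor built this way is an example of the heuristic constructions that arise when the true interactions between clinical events are not recorded.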

In this project, we propose a temporal tensor factorization framework to infer highly interpretable phenotypes and dynamic patient representations from multi-modal EHR data, and we will address a number of underlying research challenges. First, although tensor factorization is useful for phenotyping, the interactions among clinical events are not actually recorded in the EHR, so the tensor is often constructed using heuristics. We propose to infer the latent temporal tensor and perform the factorization within a unified framework. Second, choosing an appropriate time scale is crucial for obtaining robust phenotyping results with tensor factorization. We propose to extend the framework to a multi-scale version so that the clinical data can be factorized at multiple time scales simultaneously.

Third, because nursing progress notes contain additional health-related information about patients, we propose to incorporate them by using a character-aware neural model to embed each progress note into a vector space and a recurrent neural network (RNN) to model the sequence of progress notes. The RNN can then be coupled with the temporal tensor framework for joint learning. In addition, we propose to link the disease state of a patient, as captured by the tensor framework, with a physiological time series model, enabling phenotype-based predictive analytics on patient monitoring data.

The whole framework will be learned end-to-end and implemented on machines with graphics processing units (GPUs). For performance evaluation, we will apply the proposed methods to the MIMIC-III critical care database, which contains electronic health records with multi-modal data. The empirical results will also be evaluated qualitatively by clinicians.
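The factorization step that such frameworks build on can be illustrated with a nonnegative CP (PARAFAC) decomposition of a patient × diagnosis × medication tensor: each rank-one component pairs a weighted set of diagnoses with a weighted set of medications, and its patient loadings indicate how strongly each patient expresses that candidate phenotype. The sketch below is a minimal version using multiplicative updates on a fixed, given tensor. It does not infer the latent tensor, model time, use multiple scales, or include progress notes, as the proposed framework would; `nonneg_cp` and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def nonneg_cp(X, rank, n_iter=500, seed=0, eps=1e-12):
    """Nonnegative CP decomposition of a 3-way tensor via multiplicative updates.

    X: nonnegative array of shape (patients, diagnoses, medications).
    Returns factors A (I x R), B (J x R), C (K x R) and the relative
    reconstruction error recorded after each sweep. Column r of B and C
    weights diagnoses and medications for component r; A[:, r] gives each
    patient's loading on that component.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Nonnegative random initialization keeps every factor nonnegative.
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    norm_x = np.linalg.norm(X)
    errors = []
    for _ in range(n_iter):
        # Update = factor * (tensor times Khatri-Rao product of the other
        # factors) / (factor times Hadamard product of their Gram matrices).
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(np.linalg.norm(X - X_hat) / norm_x)
    return A, B, C, errors


# Usage on a synthetic rank-2 nonnegative tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C, errs = nonneg_cp(X, rank=2, n_iter=300)
```

Multiplicative updates keep all factors nonnegative when the data and initialization are nonnegative, which is what allows the components to be read as phenotypes. In practice, the factor columns would also be normalized and the rank selected by validation.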
Effective start/end date: 1/01/20 to 31/12/22