LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values

Kejing Yin, Ardavan Afshar, Joyce C. Ho, Kwok Wai CHEUNG, Chao Zhang, Jimeng Sun

Research output: Chapter in book/report/conference proceeding > Conference contribution > peer-review

Abstract

Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors with varying sizes in one dimension, where a value of one means the presence of a feature and zero means unknown (i.e., either presence or absence of the feature). Learning accurate low-rank approximations from such binary irregular tensors is a challenging task. However, none of the existing models for factorizing irregular tensors takes the missing values into account, and they assume Gaussian distributions, resulting in a distribution mismatch when applied to binary data. In this paper, we propose Logistic PARAFAC2 (LogPar), which models the binary irregular tensor with a Bernoulli distribution parameterized by an underlying real-valued tensor. We then approximate the underlying tensor with a positive-unlabeled learning loss function to account for the missing values. We also incorporate uniqueness and temporal smoothness regularization to enhance interpretability. Extensive experiments on large-scale real-world datasets show that LogPar outperforms all baselines in both irregular tensor completion and downstream predictive tasks. For irregular tensor completion, LogPar achieves up to 26% relative improvement over the best baseline. In addition, LogPar obtains relative improvements of 13.2% for heart failure prediction and 14% for mortality prediction on average, compared to the state-of-the-art PARAFAC2 models.
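To make the modeling idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard PARAFAC2 parameterization of one slice, Z_k = Q_k H diag(s_k) V^T with orthonormal Q_k, maps Z_k through a sigmoid to obtain Bernoulli parameters, and scores it with a generic non-negative positive-unlabeled risk using the logistic loss. The exact loss and regularizers used by LogPar are not given in this abstract, so the function names, the `prior` parameter (the assumed fraction of truly positive entries), and the toy data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued entries to Bernoulli parameters in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-z))

def parafac2_reconstruct(Q_k, H, s_k, V):
    """Real-valued slice Z_k = Q_k H diag(s_k) V^T (standard PARAFAC2 form)."""
    return Q_k @ H @ np.diag(s_k) @ V.T

def logistic_loss(z, y):
    """Logistic loss log(1 + exp(-y * z)) for a label y in {+1, -1}."""
    return np.logaddexp(0.0, -y * z)

def pu_loss(Z_k, X_k, prior):
    """Generic non-negative PU risk (an assumption, not the paper's exact loss).

    Entries with X_k == 1 are observed positives; entries with X_k == 0 are
    treated as unlabeled rather than as confirmed negatives.
    """
    pos = X_k == 1
    unl = ~pos
    risk_pos_as_pos = logistic_loss(Z_k[pos], +1).mean()
    risk_pos_as_neg = logistic_loss(Z_k[pos], -1).mean()
    risk_unl_as_neg = logistic_loss(Z_k[unl], -1).mean()
    # Estimated negative-class risk, clipped at zero to avoid a negative value.
    neg_part = max(0.0, risk_unl_as_neg - prior * risk_pos_as_neg)
    return prior * risk_pos_as_pos + neg_part

# Toy slice: I_k time steps, J features, rank R (all values hypothetical).
rng = np.random.default_rng(0)
I_k, J, R = 5, 4, 2
Q_k, _ = np.linalg.qr(rng.normal(size=(I_k, R)))  # orthonormal columns
H = rng.normal(size=(R, R))
s_k = rng.random(R)
V = rng.normal(size=(J, R))

X_k = np.zeros((I_k, J), dtype=int)
X_k[::2, ::2] = 1  # a few observed positives; every other entry is unknown

Z_k = parafac2_reconstruct(Q_k, H, s_k, V)
P_k = sigmoid(Z_k)                  # estimated probability of presence
loss = pu_loss(Z_k, X_k, prior=0.4)
```

In a full model, a loss of this kind would be summed over all slices k and minimized over the factors together with the uniqueness and temporal smoothness regularizers mentioned above; the sketch only evaluates it for one fixed set of random factors.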

Original language: English
Title of host publication: KDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1625-1635
Number of pages: 11
ISBN (Electronic): 9781450379984
Publication status: Published - 23 Aug 2020
Event: 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020 - Virtual, Online, United States
Duration: 23 Aug 2020 to 27 Aug 2020

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020
Country/Territory: United States
City: Virtual, Online
Period: 23/08/20 to 27/08/20

Scopus Subject Areas

  • Software
  • Information Systems

User-Defined Keywords

  • binary tensor completion
  • computational phenotyping
  • PARAFAC2 factorization
  • tensor factorization
