Learning Hierarchical Variational Autoencoders with Mutual Information Maximization for Autoregressive Sequence Modeling

Research output: Contribution to journal › Article › peer-review

Abstract

Variational autoencoders (VAEs) are a class of effective deep generative models whose objective is to approximate the true but unknown data distribution. VAEs use latent variables to capture high-level semantics, so that informative latent variables help reconstruct the data well. Yet training VAEs tends to suffer from posterior collapse when the decoder is parameterized by an autoregressive model for sequence generation. VAEs can also be extended to contain multiple layers of latent variables, but posterior collapse still occurs, which hinders the use of hierarchical VAEs in real-world applications. In this paper, we introduce InfoMaxHVAE, which integrates mutual information estimated via neural networks into hierarchical VAEs to alleviate posterior collapse when powerful autoregressive models are used for modeling sequences. Experimental results on a number of text and image datasets show that InfoMaxHVAE, in general, outperforms the state-of-the-art baselines and exhibits less posterior collapse. We further show that InfoMaxHVAE can shape a coarse-to-fine hierarchical organization of the latent space.
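The abstract does not state the training objective, so the following is only a minimal sketch of two generic quantities behind the ideas it names, not the InfoMaxHVAE method itself. It assumes a diagonal Gaussian posterior with a standard normal prior, and it uses the Donsker-Varadhan lower bound that is common in mutual information neural estimation; whether the paper uses this exact bound is an assumption. The function names `gaussian_kl` and `dv_lower_bound` are hypothetical, and the critic scores are taken as plain lists rather than outputs of a trained network.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector.

    A value near zero means the posterior matches the prior, i.e. the
    latent carries no information about the input (posterior collapse).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def dv_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan lower bound on mutual information I(x; z).

    joint_scores:    critic outputs T(x, z) on paired samples
    marginal_scores: critic outputs T(x, z') on mismatched pairs
    Returns mean(T_joint) - log(mean(exp(T_marginal))).
    """
    mean_joint = sum(joint_scores) / len(joint_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(marginal_scores)
    mean_exp = sum(math.exp(s - top) for s in marginal_scores) / len(marginal_scores)
    return mean_joint - (top + math.log(mean_exp))

# A collapsed posterior (mu = 0, log-variance = 0) has zero KL.
collapsed_kl = gaussian_kl([0.0, 0.0], [0.0, 0.0])
# A critic that cannot tell paired from mismatched samples gives a zero bound.
uninformative_mi = dv_lower_bound([0.0, 0.0], [0.0, 0.0])
```

In a training loop of this general kind, the mutual information estimate would be added to the evidence lower bound as a term to maximize, so that the encoder is rewarded for keeping the latent variables informative.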
Original language: English
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication status: E-pub ahead of print - 21 Mar 2022

Scopus Subject Areas

  • Software
  • Artificial Intelligence
  • Applied Mathematics
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics

User-Defined Keywords

  • Variational autoencoders (VAEs)
  • hierarchical variational autoencoders (HVAEs)
  • mutual information neural estimation
  • neural autoregressive sequence modeling
