Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series

  • Jun Ma
  • Jack C.P. Cheng
  • Yuexiong Ding
  • Changqing Lin
  • Feifeng Jiang
  • Mingzhu Wang
  • Chong Zhai*

  *Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

80 Citations (Scopus)

Abstract

Air pollution has become one of the world’s largest health and environmental problems, and studies on air quality prediction, influential factor analysis, and control policy evaluation are increasing. These studies require valid, high-quality air pollution data to produce reasonable results, so the missing data frequently found in collected raw data have become a significant barrier. Existing methods for handling missing data either cannot effectively capture the temporal and spatial mechanisms of air pollution or focus on sequences with low missing rates and random missing positions. To address this problem, this paper proposes a new imputation methodology, transferred long short-term memory-based iterative estimation (TLSTM-IE), to impute consecutive missing values with large missing rates. A case study is conducted in New York City to verify the effectiveness and superiority of the proposed methodology, in which long-interval consecutive missing PM2.5 concentration data are filled. Experimental results show that the proposed model can effectively learn long-term dependencies and transfer the learned knowledge. The imputation accuracy of the TLSTM-IE model is 25–50% higher than that of other commonly used methods. The novelty of this study lies in two aspects. First, it targets long-interval consecutive missing data, which existing studies in atmospheric research have not addressed. Second, it applies transfer learning to missing value imputation; to the best of our knowledge, no air quality research has applied this technique to this problem before.
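
The abstract does not spell out the TLSTM-IE procedure, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: a one-step predictor is fitted on a complete source series and reused on a target series (standing in for transfer learning), and a consecutive gap is filled one step at a time, with each estimate fed back as input for the next (standing in for iterative estimation). A simple AR(1) least-squares fit replaces the LSTM here, and the names `fit_ar1` and `impute_iteratively` are hypothetical, not taken from the paper.

```python
from typing import Callable, List, Optional, Sequence

Predictor = Callable[[Sequence[float]], float]


def fit_ar1(source: Sequence[float]) -> Predictor:
    """Fit x[t] ~ a * x[t-1] + b by least squares on a complete source series.

    Assumption: this AR(1) model is a stand-in for the LSTM in the abstract.
    """
    xs, ys = source[:-1], source[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x else 0.0
    b = mean_y - a * mean_x
    # The returned predictor only looks at the most recent value.
    return lambda history: a * history[-1] + b


def impute_iteratively(
    series: Sequence[Optional[float]], predict_next: Predictor
) -> List[float]:
    """Fill missing entries (None) left to right.

    Each estimate is written back into the series, so later steps inside a
    long consecutive gap are predicted from earlier estimates.
    """
    filled = list(series)
    for t in range(len(filled)):
        if filled[t] is None:
            if t == 0:
                raise ValueError("the series must start with an observed value")
            filled[t] = predict_next(filled[:t])
    return filled


# Hypothetical data: a complete source series and a target with a 3-step gap.
source_station = [1.0, 2.0, 3.0, 4.0, 5.0]
target_station = [10.0, None, None, None, 14.0]

model = fit_ar1(source_station)  # "learn" on the source series
print(impute_iteratively(target_station, model))  # reuse on the target series
```

Because every estimate inside the gap depends on the previous estimate, errors can accumulate as the gap grows, which is one reason long-interval consecutive gaps are harder to fill than isolated missing points.
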
Original language: English
Article number: 101092
Journal: Advanced Engineering Informatics
Volume: 44
Publication status: Published - Apr 2020

User-Defined Keywords

  • Air quality
  • Deep learning
  • Long short-term memory (LSTM)
  • Long-interval consecutive missing values
  • Neural network
  • Transfer learning
