TY - JOUR
T1 - Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series
AU - Ma, Jun
AU - Cheng, Jack C.P.
AU - Ding, Yuexiong
AU - Lin, Changqing
AU - Jiang, Feifeng
AU - Wang, Mingzhu
AU - Zhai, Chong
N1 - Publisher Copyright:
© Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/4
Y1 - 2020/4
N2 - Air pollution has become one of the world’s largest health and environmental problems. Studies on air quality prediction, influential factor analysis, and control policy evaluation are increasing. These studies require valid, high-quality air pollution data to generate reasonable results. Missing data, which frequently occur in collected raw data, have therefore become a significant barrier. Existing methods for handling missing data either cannot effectively capture the temporal and spatial mechanisms of air pollution or focus on sequences with low missing rates and random missing positions. To address this problem, this paper proposes a new imputation methodology, transferred long short-term memory-based iterative estimation (TLSTM-IE), to impute consecutive missing values with large missing rates. A case study is conducted in New York City to verify the effectiveness and superiority of the proposed methodology, in which long-interval consecutive missing PM2.5 concentration data are filled. Experimental results show that the proposed model can effectively learn long-term dependencies and transfer the learned knowledge. The imputation accuracy of the TLSTM-IE model is 25–50% higher than that of other commonly used methods. The novelty of this study lies in two aspects. First, we target long-interval consecutive missing data, which have not been addressed by existing studies in atmospheric research. Second, we apply transfer learning to missing value imputation. To the best of our knowledge, no air quality research has applied this technique to this problem before.
KW - Air quality
KW - Deep learning
KW - Long short-term memory (LSTM)
KW - Long-interval consecutive missing values
KW - Neural network
KW - Transfer learning
UR - https://www.sciencedirect.com/science/article/abs/pii/S1474034620300616?via%3Dihub
U2 - 10.1016/j.aei.2020.101092
DO - 10.1016/j.aei.2020.101092
M3 - Journal article
SN - 1474-0346
VL - 44
JO - Advanced Engineering Informatics
JF - Advanced Engineering Informatics
M1 - 101092
ER -