TY - JOUR
T1 - USN: A Robust Imitation Learning Method against Diverse Action Noise
AU - Yu, Xingrui
AU - Han, Bo
AU - Tsang, Ivor W.
N1 - XY was supported by China Scholarship Council No. 201806450045, Australian Artificial Intelligence Institute (AAII), University of Technology Sydney (UTS), Australia, and Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore (https://www.a-star.edu.sg/cfar). BH was supported by the NSFC General Program No. 62376235, Guangdong Basic and Applied Basic Research Foundation No. 2022A1515011652, HKBU Faculty Niche Research Areas No. RC-FNRA-IG/22-23/SCI/04, and HKBU CSD Departmental Incentive Scheme. IWT was supported by the Australian Artificial Intelligence Institute (AAII), University of Technology Sydney (UTS), Australia. This research was partially supported by the National Research Foundation, Singapore, and the Maritime and Port Authority of Singapore / Singapore Maritime Institute under the Maritime Transformation Programme (Maritime Artificial Intelligence (AI) Research Programme – Grant number SMI-2022-MTP-06). The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).
Publisher Copyright:
©2024 The Authors.
PY - 2024/4/21
Y1 - 2024/4/21
N2 - Learning from imperfect demonstrations is a crucial challenge in imitation learning (IL). Unlike existing works that still rely on the enormous effort of expert demonstrators, we consider a more cost-effective option for obtaining a large number of demonstrations: hiring annotators to label actions for existing image records in realistic scenarios. However, action noise can occur when annotators are not domain experts or encounter confusing states. In this work, we introduce two particular forms of action noise, i.e., state-independent and state-dependent action noise. Previous IL methods fail to achieve expert-level performance when the demonstrations contain action noise, especially state-dependent action noise. To mitigate the harmful effects of action noise, we propose a robust learning paradigm called USN (Uncertainty-aware Sample-selection with Negative learning). The model first estimates the predictive uncertainty for all demonstration data and then selects samples with high loss based on the uncertainty measures. Finally, it updates the model parameters with additional negative learning on the selected samples. Empirical results in Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various forms of action noise. The ratio of significant improvements is up to 94.44%. Moreover, our method scales to conditional imitation learning with real-world noisy commands in urban driving.
AB - Learning from imperfect demonstrations is a crucial challenge in imitation learning (IL). Unlike existing works that still rely on the enormous effort of expert demonstrators, we consider a more cost-effective option for obtaining a large number of demonstrations: hiring annotators to label actions for existing image records in realistic scenarios. However, action noise can occur when annotators are not domain experts or encounter confusing states. In this work, we introduce two particular forms of action noise, i.e., state-independent and state-dependent action noise. Previous IL methods fail to achieve expert-level performance when the demonstrations contain action noise, especially state-dependent action noise. To mitigate the harmful effects of action noise, we propose a robust learning paradigm called USN (Uncertainty-aware Sample-selection with Negative learning). The model first estimates the predictive uncertainty for all demonstration data and then selects samples with high loss based on the uncertainty measures. Finally, it updates the model parameters with additional negative learning on the selected samples. Empirical results in Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various forms of action noise. The ratio of significant improvements is up to 94.44%. Moreover, our method scales to conditional imitation learning with real-world noisy commands in urban driving.
KW - imitation learning
KW - noisy demonstrations
KW - soft negative learning
KW - uncertainty-aware sample-selection
UR - https://www.jair.org/index.php/jair/article/view/15819/27029
UR - http://www.scopus.com/inward/record.url?scp=85192802474&partnerID=8YFLogxK
U2 - 10.1613/jair.1.15819
DO - 10.1613/jair.1.15819
M3 - Journal article
SN - 1076-9757
VL - 79
SP - 1237
EP - 1280
JO - Journal of Artificial Intelligence Research
JF - Journal of Artificial Intelligence Research
ER -