Paper
5 July 2024 State-only imitation learning via generative adversarial and inverse dynamics model
Zhen Zhou, Xiaoming Wang, Yang Li, Xiangfeng Luo, Tao Wang
Proceedings Volume 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024); 1318460 (2024) https://doi.org/10.1117/12.3032879
Event: 3rd International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 2024, Kuala Lumpur, Malaysia
Abstract
Imitation learning aims to learn a policy from expert demonstrations. Unlike reinforcement learning, which learns by trial and error, imitation learning is not constrained by a reward function. As a result, a growing body of research uses imitation learning to help agents explore and learn, especially in reward-sparse environments. Most existing work in this area assumes that expert demonstrations include both state and action information. In many cases, however, only state-only demonstrations are available, which can degrade policy performance. In this paper, we use state-only demonstrations to guide agent learning in a reward-sparse environment. We propose a policy optimization from observation (POfO) method. First, we reshape the rewards by enforcing occupancy measure matching between the current policy and the demonstrations, which effectively guides agent learning. Second, we train an inverse dynamics model (IDM) to infer the missing actions in state-only demonstrations. Finally, we accelerate policy learning using the demonstrations completed by the IDM. Experimental results show that our method performs comparably to the method that uses complete demonstrations and significantly better than other methods of the same type.
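The second step of the abstract, completing state-only demonstrations with an inverse dynamics model, can be illustrated with a minimal sketch. This is not the authors' implementation. The toy one-dimensional environment, the action set, and the function names (`step`, `collect_transitions`, `fit_idm`, `complete_demo`) are all assumptions made for illustration, and a tabular lookup stands in for whatever learned model the paper actually uses.

```python
from collections import Counter, defaultdict

ACTIONS = (-1, +1)  # hypothetical toy action set: step left or step right


def step(state, action):
    """Toy deterministic dynamics on the integers 0..10 (assumed for illustration)."""
    return max(0, min(10, state + action))


def collect_transitions():
    """Gather (state, action, next_state) triples from the agent's own experience.

    Here every state-action pair is enumerated once so the example is
    deterministic; in practice these would come from policy rollouts.
    """
    return [(s, a, step(s, a)) for s in range(11) for a in ACTIONS]


def fit_idm(transitions):
    """Tabular inverse dynamics model: most frequent action for each (s, s') pair."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, s_next)][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def complete_demo(states, idm):
    """Infer the action for each consecutive state pair; None if the pair was never seen."""
    return [(s, idm.get((s, s_next)), s_next) for s, s_next in zip(states, states[1:])]


idm = fit_idm(collect_transitions())
expert_states = [5, 6, 7, 8, 9, 10]  # a state-only demonstration
completed = complete_demo(expert_states, idm)
print(completed)  # [(5, 1, 6), (6, 1, 7), (7, 1, 8), (8, 1, 9), (9, 1, 10)]
```

The completed triples have the same shape as a full state-action demonstration, so they could in principle be passed to any method that expects state-action pairs. Transitions the model has never observed are returned with `None` rather than a guessed action.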
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Zhen Zhou, Xiaoming Wang, Yang Li, Xiangfeng Luo, and Tao Wang "State-only imitation learning via generative adversarial and inverse dynamics model", Proc. SPIE 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 1318460 (5 July 2024); https://doi.org/10.1117/12.3032879
KEYWORDS
Education and training

Neural networks

Ablation

Unmanned ground vehicles

Data modeling

Detection and tracking algorithms

Mathematical optimization
