State-only imitation learning via generative adversarial and inverse dynamics model

Zhen Zhou; Xiaoming Wang; Yang Li; Xiangfeng Luo; Tao Wang

doi:10.1117/12.3032879

5 July 2024

Proceedings Volume 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024); 1318460 (2024) https://doi.org/10.1117/12.3032879
Event: 3rd International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 2024, Kuala Lumpur, Malaysia

Abstract

Imitation learning aims to learn a policy from expert demonstrations. Unlike reinforcement learning, which learns by trial and error, imitation learning does not depend on a reward function and is therefore not limited by one. As a result, a growing body of research uses imitation learning to help agents explore and learn, especially in sparse-reward environments. Most existing work in this area assumes that expert demonstrations contain both state and action information. In many cases, however, only state-only demonstrations are available, which can degrade policy performance. In this paper, we use state-only demonstrations to guide agent learning in a sparse-reward environment. We propose a policy optimization from observation (POfO) method. First, we reshape the rewards by enforcing occupancy-measure matching between the current policy and the demonstrations, which effectively guides agent learning. Second, we train an inverse dynamics model (IDM) to infer the actions missing from the state-only demonstrations and complete them. Finally, we accelerate policy learning using the demonstrations completed by the IDM. Experimental results show that the performance of our method is comparable to that of the method using complete demonstrations and significantly better than that of other methods of the same type.
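The reward-shaping and action-completion steps above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy environment (integer states, actions in {-1, 0, +1}, dynamics s' = s + a), the 1-nearest-neighbour IDM, the two-feature logistic-regression discriminator with a small L2 penalty, and the GAIfO-style reward -log(1 - D(s, s')) are all assumptions chosen for illustration. A discriminator over state transitions (s, s') is a common adversarial way to approximate occupancy-measure matching when actions are unobserved; the IDM, fitted on the agent's own labelled transitions, fills in the missing actions of a state-only demonstration.

```python
import math

# Hypothetical toy setting (illustration only): integer states,
# discrete actions, deterministic dynamics s' = s + a.
ACTIONS = (-1, 0, 1)


def step(s, a):
    """Assumed dynamics of the toy environment."""
    return s + a


class NearestNeighborIDM:
    """Inverse dynamics model a_hat = f(s, s'), approximated by a 1-nearest-
    neighbour lookup over the agent's own labelled transitions (a hypothetical
    stand-in for a trained neural IDM)."""

    def fit(self, transitions):
        # transitions: list of (s, a, s_next) collected by the agent itself
        self.data = [(s, s_next, a) for s, a, s_next in transitions]
        return self

    def predict(self, s, s_next):
        # Return the action stored with the closest (s, s') pair.
        _, _, a = min(self.data, key=lambda d: (d[0] - s) ** 2 + (d[1] - s_next) ** 2)
        return a


def complete_demo(idm, states):
    """Turn a state-only demo [s_0, ..., s_T] into (s, a_hat, s') triples."""
    return [(s, idm.predict(s, s_next), s_next) for s, s_next in zip(states, states[1:])]


def features(s, s_next):
    # Bias term plus the state change (a hypothetical feature choice).
    return (1.0, float(s_next - s))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_discriminator(expert_pairs, agent_pairs, lr=0.5, epochs=200, l2=1e-2):
    """Logistic-regression D(s, s') = P(expert | s, s'), fitted by full-batch
    gradient ascent on the L2-penalised log-likelihood (expert = 1, agent = 0)."""
    data = [(features(s, s2), 1.0) for s, s2 in expert_pairs]
    data += [(features(s, s2), 0.0) for s, s2 in agent_pairs]
    n = len(data)
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x, y in data:
            d = sigmoid(w[0] * x[0] + w[1] * x[1])
            grad[0] += (y - d) * x[0]
            grad[1] += (y - d) * x[1]
        w = [w[0] + lr * (grad[0] / n - l2 * w[0]),
             w[1] + lr * (grad[1] / n - l2 * w[1])]
    return w


def shaped_reward(w, s, s_next):
    """Reward -log(1 - D(s, s')) = softplus(z); larger for expert-like transitions."""
    x = features(s, s_next)
    z = w[0] * x[0] + w[1] * x[1]
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # numerically stable softplus


# Usage: exhaustive toy exploration data stands in for agent rollouts.
agent_data = [(s, a, step(s, a)) for s in range(-5, 16) for a in ACTIONS]
idm = NearestNeighborIDM().fit(agent_data)

expert_states = list(range(11))  # state-only demo: the expert walks right
demo = complete_demo(idm, expert_states)
print([a for _, a, _ in demo])  # -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

w = train_discriminator([(s, s2) for s, _, s2 in demo],
                        [(s, s2) for s, _, s2 in agent_data])
print(shaped_reward(w, 3, 4) > shaped_reward(w, 3, 2))  # -> True
```

In a full method, the completed (s, a_hat, s') triples would then feed the policy update (for example, as behaviour-cloning or replay data), and neural networks would replace both the nearest-neighbour IDM and the linear discriminator; the abstract does not specify how POfO combines these signals.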

(2024) Published by SPIE.

Citation

Zhen Zhou, Xiaoming Wang, Yang Li, Xiangfeng Luo, and Tao Wang "State-only imitation learning via generative adversarial and inverse dynamics model", Proc. SPIE 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 1318460 (5 July 2024); https://doi.org/10.1117/12.3032879

PROCEEDINGS
8 PAGES

KEYWORDS

Education and training

Neural networks

Ablation

Unmanned ground vehicles

Data modeling

Detection and tracking algorithms

Mathematical optimization