Zhancun Mu

Student, YuanPei College, Peking University

Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning


Conference paper


Haoqi Yuan, Zhancun Mu, Feiyang Xie, Zongqing Lu
The Twelfth International Conference on Learning Representations, 2024

Cite

APA
Yuan, H., Mu, Z., Xie, F., & Lu, Z. (2024). Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning. In The Twelfth International Conference on Learning Representations.


Chicago/Turabian
Yuan, Haoqi, Zhancun Mu, Feiyang Xie, and Zongqing Lu. “Pre-Training Goal-Based Models for Sample-Efficient Reinforcement Learning.” In The Twelfth International Conference on Learning Representations, 2024.


MLA
Yuan, Haoqi, et al. “Pre-Training Goal-Based Models for Sample-Efficient Reinforcement Learning.” The Twelfth International Conference on Learning Representations, 2024.


BibTeX

@inproceedings{haoqi2024a,
  title = {Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning},
  year = {2024},
  author = {Yuan, Haoqi and Mu, Zhancun and Xie, Feiyang and Lu, Zongqing},
  booktitle = {The Twelfth International Conference on Learning Representations}
}

Abstract

Pre-training on task-agnostic large datasets is a promising approach for enhancing the sample efficiency of reinforcement learning (RL) in solving complex tasks. We present PTGM, a novel method that pre-trains goal-based models to augment RL by providing temporal abstractions and behavior regularization. PTGM involves pre-training a low-level, goal-conditioned policy and training a high-level policy to generate goals for subsequent RL tasks. To address the challenges posed by the high-dimensional goal space, while simultaneously maintaining the agent’s capability to accomplish various skills, we propose clustering goals in the dataset to form a discrete high-level action space. Additionally, we introduce a pre-trained goal prior model to regularize the behavior of the high-level policy in RL, enhancing sample efficiency and learning stability. Experimental results in a robotic simulation environment and the challenging open-world environment of Minecraft demonstrate PTGM’s superiority in sample efficiency and task performance compared to baselines. Moreover, PTGM exemplifies enhanced interpretability and generalization of the acquired low-level skills.
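
To make the pipeline described above concrete, here is a minimal Python sketch of two of its ingredients: clustering dataset goals into a discrete high-level action space, and regularizing the high-level policy with a goal prior. All names and values in the sketch (goal_embeddings, N_CLUSTERS, the frequency-based prior, alpha) are illustrative assumptions for exposition, not the paper's implementation.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for goal embeddings extracted from the task-agnostic pre-training
# dataset (e.g. encodings of future states reached in the demonstrations).
goal_embeddings = rng.normal(size=(2000, 64))

# Step 1: cluster the goals so the high-level policy acts over a small discrete
# set of cluster centers instead of the high-dimensional goal space.
N_CLUSTERS = 100  # hypothetical value, not taken from the paper
kmeans = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0).fit(goal_embeddings)
goal_codebook = kmeans.cluster_centers_  # high-level action i -> goal vector

# Step 2: a goal prior used to regularize the high-level policy. PTGM learns a
# state-conditioned prior; a marginal frequency estimate stands in for it here.
counts = np.bincount(kmeans.labels_, minlength=N_CLUSTERS).astype(float)
goal_prior = counts / counts.sum()

def regularized_reward(env_reward, chosen_cluster, alpha=0.1):
    """Add a prior log-probability bonus so the high-level policy stays close
    to goals that were common in the pre-training data (illustrative form)."""
    return env_reward + alpha * np.log(goal_prior[chosen_cluster] + 1e-8)

# During downstream RL, the high-level policy picks a cluster index; the
# pre-trained goal-conditioned low-level policy then acts toward that goal.
idx = int(rng.integers(N_CLUSTERS))
goal_for_low_level_policy = goal_codebook[idx]
print(goal_for_low_level_policy.shape, regularized_reward(1.0, idx))

In the full method the goal prior is a learned, state-conditioned model and the regularization enters the RL objective directly; the additive log-probability bonus above is only meant to convey the shape of that term.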
