Zhancun Mu

Undergraduate student

Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning


Conference paper


Haoqi Yuan, Zhancun Mu, Feiyang Xie, Zongqing Lu
The Twelfth International Conference on Learning Representations, 2024

Cite

APA
Yuan, H., Mu, Z., Xie, F., & Lu, Z. (2024). Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning. In The Twelfth International Conference on Learning Representations.


Chicago/Turabian
Yuan, Haoqi, Zhancun Mu, Feiyang Xie, and Zongqing Lu. “Pre-Training Goal-Based Models for Sample-Efficient Reinforcement Learning.” In The Twelfth International Conference on Learning Representations, 2024.


MLA
Yuan, Haoqi, et al. “Pre-Training Goal-Based Models for Sample-Efficient Reinforcement Learning.” The Twelfth International Conference on Learning Representations, 2024.


BibTeX

@inproceedings{haoqi2024a,
  title = {Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning},
  year = {2024},
  author = {Yuan, Haoqi and Mu, Zhancun and Xie, Feiyang and Lu, Zongqing},
  booktitle = {The Twelfth International Conference on Learning Representations}
}

Abstract

Pre-training on task-agnostic large datasets is a promising approach for enhancing the sample efficiency of reinforcement learning (RL) in solving complex tasks. We present PTGM, a novel method that pre-trains goal-based models to augment RL by providing temporal abstractions and behavior regularization. PTGM involves pre-training a low-level, goal-conditioned policy and training a high-level policy to generate goals for subsequent RL tasks. To address the challenges posed by the high-dimensional goal space, while simultaneously maintaining the agent’s capability to accomplish various skills, we propose clustering goals in the dataset to form a discrete high-level action space. Additionally, we introduce a pre-trained goal prior model to regularize the behavior of the high-level policy in RL, enhancing sample efficiency and learning stability. Experimental results in a robotic simulation environment and the challenging open-world environment of Minecraft demonstrate PTGM’s superiority in sample efficiency and task performance compared to baselines. Moreover, PTGM exemplifies enhanced interpretability and generalization of the acquired low-level skills.
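The abstract describes two core mechanisms: clustering goals from the pre-training dataset to form a discrete high-level action space, and regularizing the high-level policy toward a pre-trained goal prior during RL. The following is a minimal, illustrative sketch of how those two ideas could look in code; it is not the paper's implementation, and the embedding dimension, number of clusters, coefficient, and function names are assumptions for illustration only.

import numpy as np
from sklearn.cluster import KMeans

# (1) Cluster goal embeddings from the offline dataset; each cluster center
#     becomes one discrete high-level action (a reusable goal/skill).
goal_embeddings = np.random.randn(10_000, 64)    # stand-in for real goal data
n_skills = 100                                   # size of the discrete action space (assumed)
kmeans = KMeans(n_clusters=n_skills, n_init=10, random_state=0).fit(goal_embeddings)
skill_goals = kmeans.cluster_centers_            # shape: (n_skills, 64)

def goal_of(action: int) -> np.ndarray:
    """Map a discrete high-level action to the goal fed to the low-level policy."""
    return skill_goals[action]

# (2) Regularize the high-level policy toward the pre-trained goal prior by
#     penalizing KL(pi(.|s) || prior(.|s)); the penalty is subtracted from the
#     RL objective. The coefficient alpha is illustrative, not from the paper.
def kl_penalty(log_pi: np.ndarray, log_prior: np.ndarray, alpha: float = 0.1) -> float:
    """log_pi, log_prior: log-probabilities over the n_skills discrete actions."""
    pi = np.exp(log_pi)
    return alpha * float(np.sum(pi * (log_pi - log_prior)))

In this sketch, the high-level policy picks one of n_skills discrete actions, goal_of translates that choice into a goal for the frozen goal-conditioned low-level policy, and kl_penalty keeps the high-level policy's action distribution close to the goal prior, which is the behavior-regularization role the abstract attributes to the pre-trained prior.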
