DrQ-v2: Improved Data-Augmented RL Agent
Method
DrQ-v2 is a model-free off-policy algorithm for image-based continuous control. DrQ-v2 builds on DrQ, an actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements (illustrated with short sketches at the end of this section), including:
- Switch the base RL learner from SAC to DDPG.
- Incorporate n-step returns to estimate TD error.
- Introduce a decaying schedule for exploration noise.
- Make the implementation 3.5 times faster.
- Find better hyper-parameters.
These changes allow us to significantly improve sample efficiency and wall-clock training time on a set of challenging tasks from the DeepMind Control Suite compared to prior methods. Furthermore, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, a result previously unattained by model-free RL.
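For intuition on the data-augmentation step, here is a minimal Python sketch of a pad-and-crop random shift applied to a batch of image observations. It is a simplified stand-in for the augmentation used in this repo (which may be implemented differently, e.g., via bilinear grid sampling); the function name and the 4-pixel pad are illustrative choices, not taken from the code.

```python
import torch
import torch.nn.functional as F

def random_shift(imgs, pad=4):
    """Randomly shift each image in a batch by up to `pad` pixels.

    imgs: float tensor of shape (B, C, H, W).
    Pads the borders by edge replication, then crops back to (H, W)
    at a random per-image offset, which shifts the image content.
    """
    b, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode='replicate')
    shifted = torch.empty_like(imgs)
    for i in range(b):
        top = int(torch.randint(0, 2 * pad + 1, (1,)))
        left = int(torch.randint(0, 2 * pad + 1, (1,)))
        shifted[i] = padded[i, :, top:top + h, left:left + w]
    return shifted
```

Applying such shifts to replay-buffer observations before the actor and critic updates is the core regularization idea DrQ-v2 inherits from DrQ.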
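Similarly, the decaying exploration schedule and the n-step TD targets can be sketched as below, assuming a linear anneal of the action-noise standard deviation. The helper names and the example numbers are placeholders; the actual schedule endpoints and the value of n are set in the training configuration.

```python
import numpy as np

def linear_schedule(init_std, final_std, duration, step):
    """Linearly anneal the exploration-noise stddev from init_std to
    final_std over `duration` environment steps, then hold it fixed."""
    mix = np.clip(step / duration, 0.0, 1.0)
    return (1.0 - mix) * init_std + mix * final_std

def n_step_target(rewards, bootstrap_value, discount=0.99):
    """n-step TD target: sum_k discount^k * r_{t+k} + discount^n * Q(s_{t+n}, a_{t+n}).

    rewards: the n rewards collected starting at step t.
    bootstrap_value: the target critic's estimate at the n-th next state.
    """
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + discount * target
    return target

# Placeholder usage (numbers are illustrative, not the repo's defaults):
stddev = linear_schedule(1.0, 0.1, 100_000, step=50_000)      # 0.55
target = n_step_target([1.0, 0.0, 1.0], bootstrap_value=5.0)  # 3-step target
```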
Citation
If you use this repo in your research, please consider citing the paper as follows:
@article{yarats2021drqv2,
  title={Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning},
  author={Denis Yarats and Rob Fergus and Alessandro Lazaric and Lerrel Pinto},
  journal={arXiv preprint arXiv:},
  year={2021}
}
Instructions
Install dependencies:
conda env create -f conda_env.yml
conda activate drqv2
Train the agent:
python train.py task=quadruped_walk
Monitor results:
tensorboard --logdir exp_local
License
The majority of DrQ-v2 is licensed under the MIT license; however, portions of the project are available under separate license terms: code from DeepMind is licensed under the Apache 2.0 license.