Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

Weirui Ye

Last update: Jan 3, 2023

Overview

EfficientZero (NeurIPS 2021)

Thank you for your attention. We will open-source the codebase later. Please leave your email address here, and we will email you once it is open-sourced.

Citation

If you find this repo useful, please cite our paper:

@inproceedings{ye2021mastering,
  title={Mastering Atari Games with Limited Data},
  author={Ye, Weirui and Liu, Shaohuai and Kurutach, Thanard and Abbeel, Pieter and Gao, Yang},
  booktitle={NeurIPS},
  year={2021}
}

Contact

If you have any questions or want to use the code, please contact [email protected].

Acknowledgement

We greatly appreciate the following GitHub repos for their valuable codebases and datasets:

https://github.com/koulanurag/muzero-pytorch

https://github.com/werner-duvaud/muzero-general

Comments

Add .gitignore for built ctree files

After the ctree files are built using make.sh, git treats the build files as changes to the repo.

This .gitignore file prevents people from accidentally committing these files.

opened by steventrouble 3
"bash make.sh" failed

Hi Weirui, I tried to build the dependency but failed. Is there a requirement on the GCC version? The log is below. I also tried changing ">>" to "> >", but the "nullptr" problem was still there. Do you have any suggestions? Thank you!

Best, Tao

running build_ext
building 'cytree' extension
gcc -pthread -B /home/v-ty/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/v-ty/anaconda3/lib/python3.8/site-packages/numpy/core/include -I/home/v-ty/anaconda3/include/python3.8 -c cytree.cpp -o build/temp.linux-x86_64-3.8/cytree.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from cytree.cpp:653:0:
cnode.cpp:31:9: warning: identifier ‘nullptr’ is a keyword in C++11 [-Wc++0x-compat]
  this->ptr_node_pool = nullptr;
  ^
In file included from /home/v-ty/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
  from /home/v-ty/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
  from /home/v-ty/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
  from cytree.cpp:659:
/home/v-ty/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
  #warning "Using deprecated NumPy API, disable it with "
  ^
In file included from cnode.cpp:2:0,
  from cytree.cpp:653:
cnode.h:47:42: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector> node_pools;
  ^
cnode.h:53:94: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void prepare(float root_exploration_fraction, const std::vector<std::vector> &noises, const std::vector &value_prefixs, const std::vector<std::vector> &policies);
  ^
cnode.h:53:182: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void prepare(float root_exploration_fraction, const std::vector<std::vector> &noises, const std::vector &value_prefixs, const std::vector<std::vector> &policies);
  ^
cnode.h:54:111: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void prepare_no_noise(const std::vector &value_prefixs, const std::vector<std::vector> &policies);
  ^
cnode.h:56:40: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector> get_trajectories();
  ^
cnode.h:57:40: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector> get_distributions();
  ^
cnode.h:67:43: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector<CNode*>> search_paths;
  ^
cnode.h:79:184: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void cbatch_back_propagate(int hidden_state_index_x, float discount, const std::vector &value_prefixs, const std::vector &values, const std::vector<std::vector> &policies, tools::CMinMaxStatsList min_max_s
  ^
In file included from cytree.cpp:653:0:
cnode.cpp: In constructor ‘tree::CNode::CNode()’:
cnode.cpp:31:31: error: ‘nullptr’ was not declared in this scope
  this->ptr_node_pool = nullptr;
  ^
In file included from cytree.cpp:653:0:
cnode.cpp: At global scope:
cnode.cpp:204:94: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void CRoots::prepare(float root_exploration_fraction, const std::vector<std::vector> &noises, const std::vector &value_prefixs, const std::vector<std::vector> &policies){
  ^
cnode.cpp:204:182: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void CRoots::prepare(float root_exploration_fraction, const std::vector<std::vector> &noises, const std::vector &value_prefixs, const std::vector<std::vector> &policies){
  ^
cnode.cpp:213:111: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void CRoots::prepare_no_noise(const std::vector &value_prefixs, const std::vector<std::vector> &policies){
  ^
cnode.cpp:226:32: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector> CRoots::get_trajectories(){
  ^
cnode.cpp: In member function ‘std::vector<std::vector > tree::CRoots::get_trajectories()’:
cnode.cpp:227:36: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector> trajs;
  ^
cnode.cpp: At global scope:
cnode.cpp:236:32: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector> CRoots::get_distributions(){
  ^
cnode.cpp: In member function ‘std::vector<std::vector > tree::CRoots::get_distributions()’:
cnode.cpp:237:36: error: ‘>>’ should be ‘> >’ within a nested template argument list
  std::vector<std::vector> distributions;
  ^
cnode.cpp: At global scope:
cnode.cpp:317:184: error: ‘>>’ should be ‘> >’ within a nested template argument list
  void cbatch_back_propagate(int hidden_state_index_x, float discount, const std::vector &value_prefixs, const std::vector &values, const std::vector<std::vector> &policies, tools::CMinMaxStatsList min_max_s
  ^
cytree.cpp:3382:12: warning: ‘int pyx_pw_6cytree_4Node_1__cinit(PyObject, PyObject, PyObject*)’ defined but not used [-Wunused-function]
  static int pyx_pw_6cytree_4Node_1__cinit(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  ^
error: command 'gcc' failed with exit status 1
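
Both error types in the log above (‘>>’ rejected inside nested templates, and ‘nullptr’ not declared) are what GCC reports when it compiles C++11 code in its older default dialect. GCC releases before version 6 default to C++98, so one possible workaround is to pass -std=c++11 explicitly. The sketch below only decides which flag to add; the helper name extra_cxx_flags is hypothetical, and whether the repo's build script accepts extra flags this way is an assumption, not something the log confirms.

```python
def extra_cxx_flags(dumpversion: str) -> list:
    """Return extra compiler flags for a given `gcc -dumpversion` string.

    GCC versions before 6 default to the C++98 dialect, which rejects
    `>>` in nested templates and does not know `nullptr`.
    """
    major = int(dumpversion.strip().split(".")[0])
    return ["-std=c++11"] if major < 6 else []


# Hypothetical usage: the returned list could be supplied as
# `extra_compile_args` to a setuptools `Extension` (assumption: the
# extension is built through setuptools, as "running build_ext" suggests).
print(extra_cxx_flags("4.8.5"))
print(extra_cxx_flags("9"))
```

Upgrading to a newer GCC would likely have the same effect without touching the build script.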

opened by geekyutao 3
In reanalyze_worker GPU worker, why prepare policy targets and value targets separately?

Why prepare policy targets and value targets separately? Both functions run an initial inference and an MCTS search to get either the root distributions or the root values. Why not obtain both from a single initial inference and MCTS search to save computation?

opened by desaixie 2
Slight discrepancy with implementation of value scaling
Hey, firstly just wanted to say thank you because this is an amazing repo for understanding how MuZero/EfficientZero work in detail!

I've been trying to dig into exactly how the value prediction is done as it seems like a pretty significant detail that is hidden away in an appendix and I think there seems to be a slight discrepancy (that probably doesn't make much difference but is maybe still worth highlighting).

In the original paper (https://arxiv.org/pdf/1805.11593.pdf) they define the scaling function as: $h(x) = \text{sgn}(x) (\sqrt{|x| + 1} - 1) + \epsilon x$

with the inverse function given by proposition A.2 (iii).

but in the MuZero appendix they have: $h(x) = \text{sgn}(x) (\sqrt{|x| + 1} - 1 + \epsilon x)$ (with the final term inside the bracket).

Unless I'm mistaken, in the code you've used the MuZero version of h(x), but for the inverse formula you've used the formula given in proposition A.2 (iii) of the first paper - which won't quite be correct anymore, right?

Just to show the discrepancy - if I look at the following code:

import torch

def scalar_transform(x, epsilon=0.001):
    sign = torch.ones(x.shape).float().to(x.device)
    sign[x < 0] = -1.0
    output = sign * (torch.sqrt(torch.abs(x) + 1) - 1 + epsilon * x)
    return output

def inverse_scalar_transform(value, epsilon=0.001):
    sign = torch.ones(value.shape).float().to(value.device)
    sign[value < 0] = -1.0
    output = (((torch.sqrt(1 + 4 * epsilon * (torch.abs(value) + 1 + epsilon)) - 1) / (2 * epsilon)) ** 2 - 1)
    output = sign * output
    return output

a = torch.randn(1000)
b = scalar_transform(a)
c = inverse_scalar_transform(b)
print(torch.sum(torch.abs(a-c)))

which is how the functions are implemented in this codebase, I get a value of ~2.4 printed, whereas if I change the scalar transform to match the first paper I get a value of ~0.04.
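
The mismatch can be reproduced on a single value without PyTorch. The sketch below is a standalone illustration, not the repository's code: the names h_paper, h_muzero, and h_inverse are hypothetical, and double-precision scalars stand in for tensors. The two forward transforms agree for x >= 0 and differ only for x < 0, where the MuZero-style form flips the sign of the epsilon term, so the proposition A.2 (iii) inverse round-trips exactly only with the first form.

```python
import math

EPS = 0.001

def h_paper(x, eps=EPS):
    # sgn(x) * (sqrt(|x| + 1) - 1) + eps * x  (epsilon term outside the bracket)
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1) - 1) + eps * x

def h_muzero(x, eps=EPS):
    # sgn(x) * (sqrt(|x| + 1) - 1 + eps * x)  (epsilon term inside the bracket)
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1) - 1 + eps * x)

def h_inverse(y, eps=EPS):
    # Closed-form inverse of h_paper: solve eps*s^2 + s - (1 + eps + |y|) = 0
    # for s = sqrt(|x| + 1), then recover x = sgn(y) * (s^2 - 1).
    s = (math.sqrt(1 + 4 * eps * (abs(y) + 1 + eps)) - 1) / (2 * eps)
    return math.copysign(1.0, y) * (s * s - 1)

x = -3.0
print("paper round-trip error: ", abs(h_inverse(h_paper(x)) - x))
print("muzero round-trip error:", abs(h_inverse(h_muzero(x)) - x))
```

For positive inputs both round-trip errors should be at floating-point level, which is consistent with the discrepancy in the post coming only from the negative samples.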
opened by henrycharlesworth 2
what does reward_hidden_c mean in mcts.py?

Hi, in mcts.py, lines 35-36, what do reward_hidden_c and reward_hidden_h mean (what are c and h short for)? Why is reward_hidden_c_pool = [reward_hidden_roots[0]] and reward_hidden_h_pool = [reward_hidden_roots[1]]? I find the code difficult to understand; could you add some comments? Many thanks!

opened by sekv 2

EfficientZero doesn't seem to be training

Hi, first of all congratulations on the great work!

I haven't managed to train an agent yet using the EfficientZero framework. The command I'm using to train is the following:

python3 main.py  --env BreakoutNoFrameskip-v4 \
                 --case atari \
                 --opr train \
                 --amp_type torch_amp \
                 --num_gpus 4 \
                 --num_cpus 32 \
                 --cpu_actor 12 \
                 --gpu_actor 28 \
                 --force \
                 --use_priority \
                 --use_max_priority \
                 --debug

In a cluster with the following architecture:

32 CPUs, each with 8 GB RAM.
4 Tesla V100 GPUs with 16 GB each:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1B.0 Off |                    0 |
| N/A   32C    P0    52W / 300W |  12836MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                    0 |
| N/A   31C    P0    51W / 300W |  11373MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                    0 |
| N/A   31C    P0    54W / 300W |  10004MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   33C    P0    55W / 300W |   8529MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

The problem I'm facing is that even after a while of training there's only the following log:

(pid=52926) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=52926) [Powered by Stella]
(pid=52926) Start evaluation at step 0.
(pid=52926) Step 0, test scores: 
(pid=52926) [5. 0. 5. 2. 0. 2. 0. 9. 0. 0. 0. 2. 2. 4. 0. 2. 0. 0. 0. 0. 2. 2. 0. 5.
(pid=52926)  0. 0. 0. 5. 2. 0. 2. 5.]

Also, the results folder of the experiment is mostly empty; I only have a train.log with the initial parameters.

I'm not sure whether this is just a matter of waiting for a long time or whether something in the inner workings is stuck (it looks like the batch_storage from the main train loop is always empty, since we haven't entered the train phase yet).

Something I think is really weird is that time passes but the GPU Memory-Usage stays exactly the same which makes me think something is off.

Would appreciate any advice in order to make this work. Thanks in advance!

opened by SergioArnaud 2

How to evaluate the model
Thanks for your great work!

When I run your code, I find that the scores from the test script are always a little higher than the scores from the evaluation stage of the training script (during training, the model is tested every 10,000 steps).

Here are some results I got from the scripts:

Game          Train script   Test script
CrazyClimber  7246           9603
BankHeist     419            454

I have glanced at the two bash scripts and the code. As I understand it, both scripts evaluate agents in exactly the same way: agents are evaluated with 32 seeds and the mean of the 32 scores is reported.

So I have two questions:

Why are the scores from the test script always a little higher than those from the evaluation stage of the training script?

Which scripts did you use to get the results in the paper?

Looking forward to your reply.
opened by yueyang130 1
Removing Baselines dependency
Hello, it would be nice to remove the baselines dependency (it requires TensorFlow, whereas the rest of the codebase is written in PyTorch). As it is apparently used for the Atari wrappers only, there are a few options:

use the ones that are now in Gym (but I'm not sure they are exactly the same)

use the ones from Stable-Baselines3 (which also depends on PyTorch, so fewer dependencies)

copy them into the repo
opened by araffin 1
Question about the effect of state encoding identity connection in dynamics network

Thank you very much for your open-sourced code.

I'm a little confused about the reason for the identity connection of state encoding in DynamicsNetwork in model.py:

Why do we add this state encoding identity connection, rather than using action encoding, and what is its empirical impact on Atari results?

Looking forward to your reply!

opened by puyuan1996 0
reproduce results for other environment

Hi, Nice work! Many thanks for the open-source code.

If I want to reproduce results for environments other than BreakoutNoFrameskip-v4, what env name (especially the version suffix, like -v4) should I pass in?

Thanks!

opened by yix081 0
Question about the effect of discount factor and done mask when calculating the target value?

Thank you very much for your open-sourced code.

This is a common definition of a target value in classical RL:

I'm a little confused about the way of calculating target value here in reanalyze_worker.py:

Why do we not multiply the bootstrap value (here value_lst) by discount_factor^td_steps, and why do we not mask the bootstrap value when the target obs is a done state?

Looking forward to your reply!
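
For reference, the textbook n-step target the question alludes to can be sketched as follows. This is a generic illustration under stated assumptions, not the code in reanalyze_worker.py: the function name n_step_target and its parameters are hypothetical, rewards holds the td_steps rewards collected from the current step, and done marks whether the episode ended before the bootstrap state.

```python
def n_step_target(rewards, bootstrap_value, discount, done):
    """Discounted n-step return with an optional bootstrap term.

    target = sum_i discount**i * rewards[i]
             + discount**len(rewards) * bootstrap_value   (only if not done)
    """
    target = 0.0
    for i, reward in enumerate(rewards):
        target += (discount ** i) * reward
    if not done:
        # Bootstrap from the value estimate n steps ahead, discounted by
        # discount**n; a terminal state contributes no future value.
        target += (discount ** len(rewards)) * bootstrap_value
    return target


print(n_step_target([1.0, 1.0], 10.0, 0.5, done=False))
print(n_step_target([1.0, 1.0], 10.0, 0.5, done=True))
```

Whether the repository applies the discount and the terminal mask elsewhere (for example, when value_lst is built) is exactly what the question asks, so this sketch takes no position on it.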

opened by puyuan1996 0
How to use with SLURM

Any guidance for using with SLURM? Certain actors are failing

When I run

srun -p compsci-gpu --gres=gpu:4 --cpus-per-gpu=5 --mem=24G --pty bash

Followed by:

python main.py --env BreakoutNoFrameskip-v4 --case atari --opr train --amp_type torch_amp --num_gpus 1 --num_cpus 10 --cpu_actor 1 --gpu_actor 1 --force

I get the following warning:

WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 135095644160 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.

Followed by the task failing:

2022-12-22 10:38:02,577 WARNING worker.py:1072 -- The node with node id 67f743d808b7bd16d45063d18dadf1b5cbb39e7d has been marked dead because the detector has missed too many heartbeats from it.

E1222 10:38:02.612172 8087 8433 task_manager.cc:323] Task failed: IOError: 14: Socket closed: Type=ACTOR_TASK, Language=PYTHON, Resources: {CPU: 1, }, function_descriptor={type=PythonFunctionDescriptor, module_name=core.reanalyze_worker, class_name=BatchWorker_CPU, function_name=run, function_hash=}, task_id=d251967856448ceb88866c7d01000000, task_name=BatchWorker_CPU.run(), job_id=01000000, num_args=0, num_returns=2, actor_task_spec={actor_id=88866c7d01000000, actor_caller_id=ffffffffffffffffffffffff01000000, actor_counter=0}

I am not sure how to parse the error. Any advice? What #SBATCH headings do you recommend using in the provided train.sh? Thank you!

opened by dillonmsandhu 0
WSL2 NVIDIA 3090 or M1 MBP correct environment
Hi, I had trouble identifying the right mix of python and packages to get this to run.

Could you please review/confirm the python version and requirements.txt for either one of these?

Dual RTX 3090 on WSL2

M1 MBP

Is there a docker container for EfficientZero?

Many thanks in advance!
opened by atalapan 0
Question about whether need to train multiple agents for different games

Thank you very much for your open-sourced code. I recently tried to apply the model trained on Breakout to other games, but different games have different action spaces, which causes errors at test time: the parameter dimensions of the Breakout model are inconsistent with those of other games. Does each game need its own separately trained agent? I really hope to get your answer. Thank you.

opened by QiGuLongDongQiang 1
Question about getting zero test score when I try to run EfficientZero on BabyAI grid environment
Hello, first of all, thanks for your amazing work on EfficientZero.

I tried to adapt EfficientZero to a BabyAI environment, "PutNextLocal", but it keeps giving me a test score of 0 throughout the 100k-step training process.

I made several modifications in order to adapt to BabyAI "PutNextLocal" env:

I created a directory for the env at config/babyai and implemented BabyAIConfig(BaseConfig). I left every parameter at its Atari default and only changed line 101 from (image_channel,96,96) to (image_channel,7,7) in config/babyai/__init__.py.

I changed the class name from AtariWrapper(Game) to BabyAIWrapper(Game) and left everything else at its default setting.

I commented out lines 103 to 111, since the grid game does not have ale.

I also commented out lines 235 to 237 in core/utils.py: https://github.com/YeWR/EfficientZero/blob/main/core/utils.py#L235

I also modified my bash file as follows:

After running the program with the default Atari parameter settings, the TensorBoard output looks like this:

Do you have any suggestions on how to modify the program correctly so that it produces reasonable results on BabyAI 'PutNextLocal'?

Thank you so much; I look forward to hearing from you.
opened by jiachengc 2
Question about the effect of state encoding identity connection in dynamics network

Thank you very much for your open-sourced code.

I'm a little confused about the reason for the identity connection of state encoding in DynamicsNetwork in model.py:

Why do we add this state encoding identity connection, rather than using action encoding, and what is its empirical impact on Atari results?

Looking forward to your reply!

opened by puyuan1996 1

Owner

Weirui Ye

