DeepMind Alchemy task environment: a meta-reinforcement learning benchmark

The DeepMind Alchemy environment is a meta-reinforcement learning benchmark that presents tasks sampled from a task distribution with deep underlying structure. It was created to test agents' ability to reason and plan via latent-state inference, as well as to explore and experiment usefully. The environment is Unity-based.

Overview

This environment is provided through pre-packaged Docker containers.

This package consists of support code to run these Docker containers. You interact with the task environment via a dm_env Python interface.

Please see the documentation for more detailed information on the available tasks, actions and observations.

Requirements

dm_alchemy requires Docker, Python 3.6.1 or later, and an x86-64 CPU with SSE4.2 support. We do not attempt to maintain a working version for Python 2.

Alchemy is intended to be run on Linux and is not officially supported on Mac or Windows. However, it can in principle be run on any platform, though installation may be more of a headache. In particular, on Windows you will need to install and run Alchemy with WSL.

Note: We recommend using a Python virtual environment to mitigate conflicts with your system's Python environment.

Download and install Docker, then ensure it is working correctly by running:

$ docker run -d gcr.io/deepmind-environments/alchemy:v1.0.0
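
If the container starts successfully, docker ps should list a running alchemy:v1.0.0 container, which you can stop afterwards with docker stop <container-id>.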

Installation

You can install dm_alchemy by cloning a local copy of our GitHub repository:

$ git clone https://github.com/deepmind/dm_alchemy.git
$ pip install wheel
$ pip install --upgrade setuptools
$ pip install ./dm_alchemy

To also install the dependencies for the examples/ directory, run:

$ pip install ./dm_alchemy[examples]
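
For example, with the extras installed you can launch the interactive human agent (the script path is assumed here from the repository layout; the flags match the usage shown in the issues below):

$ python examples/human_agent.py --seed 123 --level_name 'alchemy/perceptual_mapping_randomized_with_rotation_and_random_bottleneck'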

Usage

Once dm_alchemy is installed, run the following to instantiate a dm_env environment:

import dm_alchemy

LEVEL_NAME = ('alchemy/perceptual_mapping_'
              'randomized_with_rotation_and_random_bottleneck')
settings = dm_alchemy.EnvironmentSettings(seed=123, level_name=LEVEL_NAME)
env = dm_alchemy.load_from_docker(settings)
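
Once loaded, the environment follows the standard dm_env API. Continuing the snippet above, here is a minimal interaction sketch; it assumes a dictionary action spec (as with other dm_env_rpc-based environments), so inspect env.action_spec() and env.observation_spec() for the level you load, since specs vary by task.

# Sketch only: run one episode with placeholder actions.
timestep = env.reset()
while not timestep.last():
    # generate_value() returns an arbitrary value conforming to each
    # sub-spec; a real agent would select actions from its policy.
    action = {name: spec.generate_value()
              for name, spec in env.action_spec().items()}
    timestep = env.step(action)
env.close()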

For more details see the introductory colab.

Citing Alchemy

If you use Alchemy in your work, please cite the accompanying technical report:

@article{wang2021alchemy,
    title={Alchemy: A structured task distribution for meta-reinforcement learning},
    author={Jane Wang and Michael King and Nicolas Porcel and Zeb Kurth-Nelson
        and Tina Zhu and Charlie Deck and Peter Choy and Mary Cassin and
        Malcolm Reynolds and Francis Song and Gavin Buttimore and David Reichert
        and Neil Rabinowitz and Loic Matthey and Demis Hassabis and Alex Lerchner
        and Matthew Botvinick},
    year={2021},
    journal={arXiv preprint arXiv:2102.02926},
    url={https://arxiv.org/abs/2102.02926},
}

Notice

This is not an officially supported Google product.

Comments
  •     raise grpc.FutureTimeoutError() grpc.FutureTimeoutError

    I followed the instructions from the README below:

    $ git clone https://github.com/deepmind/dm_alchemy.git
    $ pip install wheel
    $ pip install --upgrade setuptools
    $ pip install ./dm_alchemy
    $ pip install pygame
    

    and then tried to run human_agent.py with the command below, which produced this error:

    python human_agent.py --seed 123 --level_name 'alchemy/perceptual_mapping_randomized_with_rotation_and_random_bottleneck'

    pygame 2.0.1 (SDL 2.0.14, Python 3.7.4)
    Hello from the pygame community. https://www.pygame.org/contribute.html
    I0212 16:02:42.783728 140172242384704 _load_environment.py:377] Downloading docker image "gcr.io/deepmind-environments/alchemy:v1.0.0"...
    I0212 16:03:59.853875 140172242384704 _load_environment.py:379] Download finished.
    Traceback (most recent call last):
      File "human_agent.py", line 186, in <module>
        app.run(main)
      File "/home/venv/lib/python3.7/site-packages/absl/app.py", line 303, in run
        _run_main(main, args)
      File "/home/venv/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "human_agent.py", line 117, in main
        environment_variables=environment_variables) as env:
      File "/home/venv/lib/python3.7/site-packages/dm_alchemy/_load_environment.py", line 389, in load_from_docker
        connection_details=_connect_to_environment(port, settings),
      File "/home/venv/lib/python3.7/site-packages/dm_alchemy/_load_environment.py", line 246, in _connect_to_environment
        channel, connection = _create_channel_and_connection(port)
      File "/home/venv/lib/python3.7/site-packages/dm_alchemy/_load_environment.py", line 204, in _create_channel_and_connection
        _check_grpc_channel_ready(channel)
      File "/home/venv/lib/python3.7/site-packages/dm_alchemy/_load_environment.py", line 183, in _check_grpc_channel_ready
        return grpc.channel_ready_future(channel).result(timeout=1)
      File "/home/venv/lib/python3.7/site-packages/grpc/_utilities.py", line 140, in result
        self._block(timeout)
      File "/home/venv/lib/python3.7/site-packages/grpc/_utilities.py", line 86, in _block
        raise grpc.FutureTimeoutError()
    grpc.FutureTimeoutError
    
    opened by vis7 · 5 comments
  • Add `frozendict` requirement

    frozendict seems to be used by the code in a few places [1] but is not declared as an explicit requirement; this just adds the missing line to fix the install.

    [1] https://github.com/deepmind/dm_alchemy/search?q=frozendict
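
    For illustration, here is a sketch of the kind of one-line fix being proposed; the surrounding setup() call is abbreviated and assumed, not the repository's actual setup.py:

    # setup.py (sketch): declare frozendict as an explicit dependency.
    from setuptools import setup

    setup(
        name='dm_alchemy',
        install_requires=[
            'frozendict',  # the missing requirement this change adds
        ],
    )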

    opened by tkukurin · 2 comments
  • grpc.FutureTimeoutError

    Unfortunately, on Windows I run into the same error trace as #1 when trying to run human_agent.py:

    python human_agent.py --seed 123 --level_name 'alchemy/perceptual_mapping_randomized_with_rotation_and_random_bottleneck' gives the error:

    pygame 2.0.1 (SDL 2.0.14, Python 3.7.9)
    Hello from the pygame community. https://www.pygame.org/contribute.html
    Traceback (most recent call last):
      File "human_agent.py", line 186, in <module>
        app.run(main)
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\app.py", line 303, in run
        _run_main(main, args)
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "human_agent.py", line 117, in main
        environment_variables=environment_variables) as env:
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\dm_alchemy\_load_environment.py", line 389, in load_from_docker
        connection_details=_connect_to_environment(port, settings),
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\dm_alchemy\_load_environment.py", line 246, in _connect_to_environment
        channel, connection = _create_channel_and_connection(port)
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\dm_alchemy\_load_environment.py", line 204, in _create_channel_and_connection
        _check_grpc_channel_ready(channel)
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\dm_alchemy\_load_environment.py", line 183, in _check_grpc_channel_ready
        return grpc.channel_ready_future(channel).result(timeout=1)
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\grpc\_utilities.py", line 140, in result
        self._block(timeout)
      File "C:\Users\jonat\AppData\Local\Programs\Python\Python37\lib\site-packages\grpc\_utilities.py", line 86, in _block
        raise grpc.FutureTimeoutError()
    grpc.FutureTimeoutError
    

    The command mentioned in #1 was already updated in the installation instructions, but that does not seem to help. Running

    # width and height are defined earlier in the notebook
    level_name = 'alchemy/perceptual_mapping_randomized_with_rotation_and_random_bottleneck'
    seed = 1023
    settings = dm_alchemy.EnvironmentSettings(
        seed=seed, level_name=level_name, width=width, height=height)
    env = dm_alchemy.load_from_docker(settings)
    

    in AlchemyGettingStarted.ipynb seems to give the same error. The setup is as described in the guide and uses Docker Desktop (except that I don't use a Python virtual environment); docker run -d gcr.io/deepmind-environments/alchemy:v1.0.0 gives no error.

    While you don't officially support Windows, maybe you can still help?

    opened by jthomy · 2 comments
  • Running IdealObserverBot

    Hi, I am trying to replicate the evaluation of the Ideal Observer agent on Symbolic Alchemy to get the ~284.4 ± 1.6 score reported in the paper. To that end, I wrote the following function:

    Side note: I know you made the ideal observer's evaluation trajectory available in the agent_events folder, but my aim is to then use the IdealObserverBot on a different set of events.

    # Imports assumed by this snippet; the module paths are my best guess
    # from reading the codebase, so adjust them to match your checkout.
    import numpy as np
    from tqdm import tqdm

    from dm_alchemy import symbolic_alchemy, symbolic_alchemy_trackers
    from dm_alchemy.encode import chemistries_proto_conversion
    from dm_alchemy.ideal_observer import precomputed_maps
    from dm_alchemy.symbolic_alchemy_bots import IdealObserverBot
    from dm_alchemy.types.utils import RewardWeights

    def evaluate():
        level_name = "perceptual_mapping_randomized_with_random_bottleneck"
        chems = chemistries_proto_conversion.load_chemistries_and_items(
            f'chemistries/{level_name}/chemistries')

        precomputed = precomputed_maps.load_from_level_name(level_name)
        reward_weights = RewardWeights(coefficients=[1, 1, 1], offset=0, bonus=12)

        N_EPISODES = 1_000
        rewards = np.zeros(N_EPISODES)

        for ep in tqdm(range(N_EPISODES)):
            chem, items = chems[ep]
            env = symbolic_alchemy.get_symbolic_alchemy_fixed(
                chemistry=chem, episode_items=items)

            # Track per-trial scores and the bot's belief state.
            env.add_trackers({
                symbolic_alchemy_trackers.ScoreTracker.NAME:
                    symbolic_alchemy_trackers.ScoreTracker(env._reward_weights)})
            env.add_trackers({
                symbolic_alchemy_trackers.BeliefStateTracker.NAME:
                    symbolic_alchemy_trackers.BeliefStateTracker(precomputed, env)})

            bot = IdealObserverBot(
                reward_weights, precomputed, env, minimise_world_states=False)
            episode_results = bot.run_episode()

            rewards[ep] = np.sum(episode_results['score']['per_trial'])

        return rewards.mean(), rewards.std() / np.sqrt(N_EPISODES)
    
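    If those assumptions hold, the function is driven directly; the returned pair is the mean score and its standard error, matching the form of the paper's 284.4 ± 1.6:

    mean, sem = evaluate()
    print(f'ideal observer: {mean:.1f} ± {sem:.1f}')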

    Since each episode understandably takes a long time to run, I just want to confirm the correctness of my code, as it is based on a few assumptions I made while reading the codebase.

    Thanks!

    opened by BKHMSI · 2 comments
  • Live animation of the environment

    Is there an easy way to have a live-updating render of what's happening in the environment as actions are taken? I'm hoping for something similar to the GIF you have in the README.

    opened by ilia10000 · 0 comments