# MO-Gym: Multi-Objective Reinforcement Learning Environments
Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard Gym API, but return vectorized rewards as numpy arrays.

For details on multi-objective MDPs (MOMDPs) and other MORL definitions, see *A practical guide to multi-objective reinforcement learning and planning*.
## Install

```bash
git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .
```
## Usage

```python
import gym
import numpy as np

import mo_gym

env = gym.make('minecart-v0')  # It follows the standard Gym API ...
obs = env.reset()
# ... but vector_reward is a numpy array, with one entry per objective!
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
```
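Conceptually, linear scalarization just takes the weighted sum of the vector reward under the given weights. A minimal stand-alone sketch of that operation (the function name `scalarize` is ours for illustration, not part of mo-gym's API, and this is not mo-gym's actual implementation):

```python
def scalarize(vector_reward, weight):
    """Linear scalarization: weighted sum over the objectives.

    Illustration of what a wrapper like LinearReward is expected to do:
    collapse a vector reward into a scalar via a dot product.
    """
    return sum(w * r for w, r in zip(weight, vector_reward))

# e.g. weights [0.8, 0.2, 0.2] applied to a 3-objective reward:
scalarize([1.0, 2.0, 3.0], [0.8, 0.2, 0.2])  # 0.8 + 0.4 + 0.6 = 1.8
```

Any other scalarization (e.g. a nonlinear utility function) can be plugged in the same way, which is why the environments return the raw reward vector by default.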
## Environments

| Env | Obs/Action spaces | Objectives | Description |
|---|---|---|---|
| `deep-sea-treasure-v0` | Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values taken from Yang et al. 2019. |
| `resource-gathering-v0` | Discrete / Discrete | `[enemy, gold, gem]` | Agent must collect gold or gems. Enemies have a 10% chance of killing the agent. From Barrett & Narayanan 2008. |
| `four-room-v0` | Discrete / Discrete | `[item1, item2, item3]` | Agent must collect three different types of items in the map and reach the goal. |
| `mo-mountaincar-v0` | Continuous / Discrete | `[time_penalty, reverse_penalty, forward_penalty]` | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011. |
| `mo-reacher-v0` | Continuous / Discrete | `[target_1, target_2, target_3, target_4]` | Reacher robot from PyBullet, but with four different target positions. |
| `minecart-v0` | Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ore and minimize fuel consumption. From Abels et al. 2019. |
| `mo-supermario-v0` | Image / Discrete | `[x_pos, time, death, coin, enemy]` | Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly to Yang et al. 2019. |
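Because every environment above returns a reward vector at each step, episode returns are accumulated component-wise rather than as a single scalar. A minimal sketch of a discounted vector return, assuming numpy and a made-up two-step episode of 3-objective rewards:

```python
import numpy as np

def vector_return(rewards, gamma=0.99):
    """Component-wise discounted return over an episode of vector rewards."""
    ret = np.zeros(len(rewards[0]), dtype=float)
    for t, r in enumerate(rewards):
        ret += (gamma ** t) * np.asarray(r, dtype=float)
    return ret

# Hypothetical episode: two steps of a 3-objective reward vector.
vector_return([[1.0, 0.0, -1.0], [0.0, 2.0, -1.0]])
# -> approximately [1.0, 1.98, -1.99]
```

Multi-policy MORL algorithms typically compare such vector returns under different weightings or Pareto dominance, rather than collapsing them to a scalar up front.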
## Citing
If you use this repository in your work, please cite:
```bibtex
@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}
```
## Acknowledgments

- The `minecart-v0` env is a refactor of https://github.com/axelabels/DynMORL.
- The `deep-sea-treasure-v0` and `mo-supermario-v0` envs are based on https://github.com/RunzheYang/MORL.
- The `four-room-v0` env is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.