Describe the bug
I'm enjoying the book a lot. It's the best book on the subject I've read, Sutton & Barto included, though I'm an empiricist and not an academic. Anyway, I can run all the examples in the book in 'dev' and 'train' modes, but not in 'search' mode: they all end with an error. I don't see anybody else complaining about this, so it must be a rookie mistake on my part. I hope you can help so I can continue enjoying the book to its fullest.
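For reference, these are the invocations I mean (same form as the run_lab.py command shown at the top of the error logs below, with only the lab mode argument changed):

python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole dev     # works
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole train   # works
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole search  # fails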
To Reproduce
- OS and environment: Ubuntu 18.04
- SLM Lab git SHA (run git rev-parse HEAD to get it): What?
- spec file used: benchmark/reinforce/reinforce_cartpole.json
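The search that fails is the grid search over center_return (two trials, center_return=True and center_return=False, as the trial names in the logs below show). Reconstructed from those trial names, the search block in the spec should look roughly like this; the exact JSON in the repo's file may differ:

"search": {
  "agent": [{
    "algorithm": {
      "center_return__grid_search": [true, false]
    }
  }]
}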
Additional context
I'm showing the error logs for Code 2.15 on page 50, but I get similar error logs for all the other code examples run in 'search' mode.
There are 32 files in the 'data' folder, but no plots. All the folders in the 'data' folder are empty except for 'log', which has a file containing this:
[2020-01-30 11:03:56,907 PID:3351 INFO search.py run_ray_search] Running ray search for spec reinforce_cartpole
NVIDIA driver version: 440.33.01
CUDA version: 10.2
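Happy to run any other checks. For example, to confirm that PyTorch inside the 'lab' conda env sees the GPU (a generic sanity check, not output from the failing run), something like:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"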
Error logs
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole search
[2020-01-30 11:38:57,177 PID:4355 INFO run_lab.py read_spec_and_run] Running lab spec_file:slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json spec_name:reinforce_baseline_cartpole in mode:search
[2020-01-30 11:38:57,183 PID:4355 INFO search.py run_ray_search] Running ray search for spec reinforce_baseline_cartpole
2020-01-30 11:38:57,183 WARNING worker.py:1341 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2020-01-30 11:38:57,183 INFO node.py:497 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-01-30_11-38-57_183527_4355/logs.
2020-01-30 11:38:57,288 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:59003 to respond...
2020-01-30 11:38:57,409 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:55931 to respond...
2020-01-30 11:38:57,414 INFO services.py:806 -- Starting Redis shard with 3.35 GB max memory.
2020-01-30 11:38:57,435 INFO node.py:511 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-01-30_11-38-57_183527_4355/logs.
2020-01-30 11:38:57,435 INFO services.py:1441 -- Starting the Plasma object store with 5.02 GB memory using /dev/shm.
2020-01-30 11:38:57,543 INFO tune.py:60 -- Tip: to resume incomplete experiments, pass resume='prompt' or resume=True to run()
2020-01-30 11:38:57,543 INFO tune.py:223 -- Starting a new experiment.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/1 GPUs
Memory usage on this node: 2.1/16.7 GB
2020-01-30 11:38:57,572 WARNING logger.py:130 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2020-01-30 11:38:57,573 WARNING logger.py:224 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 4/8 CPUs, 0/1 GPUs
Memory usage on this node: 2.2/16.7 GB
Result logdir: /home/joe/ray_results/reinforce_baseline_cartpole
Number of trials: 2 ({'RUNNING': 1, 'PENDING': 1})
PENDING trials:
- ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1: PENDING
RUNNING trials:
- ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0: RUNNING
2020-01-30 11:38:57,596 WARNING logger.py:130 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2020-01-30 11:38:57,607 WARNING logger.py:224 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
(pid=4389) [2020-01-30 11:38:58,297 PID:4389 INFO logger.py info] Running sessions
(pid=4388) [2020-01-30 11:38:58,292 PID:4388 INFO logger.py info] Running sessions
(pid=4388) terminate called after throwing an instance of 'c10::Error'
(pid=4388) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4388)
(pid=4388) Fatal Python error: Aborted
(pid=4388)
(pid=4388) Stack (most recent call first):
(pid=4389) [2020-01-30 11:38:58,326 PID:4456 INFO openai.py __init__] OpenAIEnv:
(pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4389) - eval_frequency = 2000
(pid=4389) - log_frequency = 10000
(pid=4389) - frame_op = None
(pid=4389) - frame_op_len = None
(pid=4389) - image_downsize = (84, 84)
(pid=4389) - normalize_state = False
(pid=4389) - reward_scale = None
(pid=4389) - num_envs = 1
(pid=4389) - name = CartPole-v0
(pid=4389) - max_t = 200
(pid=4389) - max_frame = 100000
(pid=4389) - to_render = False
(pid=4389) - is_venv = False
(pid=4389) - clock_speed = 1
(pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
(pid=4389) - done = False
(pid=4389) - total_reward = nan
(pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4389) - observation_space = Box(4,)
(pid=4389) - action_space = Discrete(2)
(pid=4389) - observable_dim = {'state': 4}
(pid=4389) - action_dim = 2
(pid=4389) - is_discrete = True
(pid=4389) [2020-01-30 11:38:58,327 PID:4453 INFO openai.py __init__] OpenAIEnv:
(pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4389) - eval_frequency = 2000
(pid=4389) - log_frequency = 10000
(pid=4389) - frame_op = None
(pid=4389) - frame_op_len = None
(pid=4389) - image_downsize = (84, 84)
(pid=4389) - normalize_state = False
(pid=4389) - reward_scale = None
(pid=4389) - num_envs = 1
(pid=4389) - name = CartPole-v0
(pid=4389) - max_t = 200
(pid=4389) - max_frame = 100000
(pid=4389) - to_render = False
(pid=4389) - is_venv = False
(pid=4389) - clock_speed = 1
(pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
(pid=4389) - done = False
(pid=4389) - total_reward = nan
(pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4389) - observation_space = Box(4,)
(pid=4389) - action_space = Discrete(2)
(pid=4389) - observable_dim = {'state': 4}
(pid=4389) - action_dim = 2
(pid=4389) - is_discrete = True
(pid=4389) [2020-01-30 11:38:58,328 PID:4450 INFO openai.py __init__] OpenAIEnv:
(pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4389) - eval_frequency = 2000
(pid=4389) - log_frequency = 10000
(pid=4389) - frame_op = None
(pid=4389) - frame_op_len = None
(pid=4389) - image_downsize = (84, 84)
(pid=4389) - normalize_state = False
(pid=4389) - reward_scale = None
(pid=4389) - num_envs = 1
(pid=4389) - name = CartPole-v0
(pid=4389) - max_t = 200
(pid=4389) - max_frame = 100000
(pid=4389) - to_render = False
(pid=4389) - is_venv = False
(pid=4389) - clock_speed = 1
(pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
(pid=4389) - done = False
(pid=4389) - total_reward = nan
(pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4389) - observation_space = Box(4,)
(pid=4389) - action_space = Discrete(2)
(pid=4389) - observable_dim = {'state': 4}
(pid=4389) - action_dim = 2
(pid=4389) - is_discrete = True
(pid=4389) [2020-01-30 11:38:58,335 PID:4458 INFO openai.py __init__] OpenAIEnv:
(pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4389) - eval_frequency = 2000
(pid=4389) - log_frequency = 10000
(pid=4389) - frame_op = None
(pid=4389) - frame_op_len = None
(pid=4389) - image_downsize = (84, 84)
(pid=4389) - normalize_state = False
(pid=4389) - reward_scale = None
(pid=4389) - num_envs = 1
(pid=4389) - name = CartPole-v0
(pid=4389) - max_t = 200
(pid=4389) - max_frame = 100000
(pid=4389) - to_render = False
(pid=4389) - is_venv = False
(pid=4389) - clock_speed = 1
(pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
(pid=4389) - done = False
(pid=4389) - total_reward = nan
(pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4389) - observation_space = Box(4,)
(pid=4389) - action_space = Discrete(2)
(pid=4389) - observable_dim = {'state': 4}
(pid=4389) - action_dim = 2
(pid=4389) - is_discrete = True
(pid=4388) [2020-01-30 11:38:58,313 PID:4440 INFO openai.py __init__] OpenAIEnv:
(pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4388) - eval_frequency = 2000
(pid=4388) - log_frequency = 10000
(pid=4388) - frame_op = None
(pid=4388) - frame_op_len = None
(pid=4388) - image_downsize = (84, 84)
(pid=4388) - normalize_state = False
(pid=4388) - reward_scale = None
(pid=4388) - num_envs = 1
(pid=4388) - name = CartPole-v0
(pid=4388) - max_t = 200
(pid=4388) - max_frame = 100000
(pid=4388) - to_render = False
(pid=4388) - is_venv = False
(pid=4388) - clock_speed = 1
(pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
(pid=4388) - done = False
(pid=4388) - total_reward = nan
(pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4388) - observation_space = Box(4,)
(pid=4388) - action_space = Discrete(2)
(pid=4388) - observable_dim = {'state': 4}
(pid=4388) - action_dim = 2
(pid=4388) - is_discrete = True
(pid=4388) [2020-01-30 11:38:58,318 PID:4445 INFO openai.py __init__] OpenAIEnv:
(pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4388) - eval_frequency = 2000
(pid=4388) - log_frequency = 10000
(pid=4388) - frame_op = None
(pid=4388) - frame_op_len = None
(pid=4388) - image_downsize = (84, 84)
(pid=4388) - normalize_state = False
(pid=4388) - reward_scale = None
(pid=4388) - num_envs = 1
(pid=4388) - name = CartPole-v0
(pid=4388) - max_t = 200
(pid=4388) - max_frame = 100000
(pid=4388) - to_render = False
(pid=4388) - is_venv = False
(pid=4388) - clock_speed = 1
(pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
(pid=4388) - done = False
(pid=4388) - total_reward = nan
(pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4388) - observation_space = Box(4,)
(pid=4388) - action_space = Discrete(2)
(pid=4388) - observable_dim = {'state': 4}
(pid=4388) - action_dim = 2
(pid=4388) - is_discrete = True
(pid=4388) [2020-01-30 11:38:58,319 PID:4449 INFO openai.py __init__] OpenAIEnv:
(pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4388) - eval_frequency = 2000
(pid=4388) - log_frequency = 10000
(pid=4388) - frame_op = None
(pid=4388) - frame_op_len = None
(pid=4388) - image_downsize = (84, 84)
(pid=4388) - normalize_state = False
(pid=4388) - reward_scale = None
(pid=4388) - num_envs = 1
(pid=4388) - name = CartPole-v0
(pid=4388) - max_t = 200
(pid=4388) - max_frame = 100000
(pid=4388) - to_render = False
(pid=4388) - is_venv = False
(pid=4388) - clock_speed = 1
(pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
(pid=4388) - done = False
(pid=4388) - total_reward = nan
(pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4388) - observation_space = Box(4,)
(pid=4388) - action_space = Discrete(2)
(pid=4388) - observable_dim = {'state': 4}
(pid=4388) - action_dim = 2
(pid=4388) - is_discrete = True
(pid=4388) [2020-01-30 11:38:58,323 PID:4452 INFO openai.py __init__] OpenAIEnv:
(pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
(pid=4388) - eval_frequency = 2000
(pid=4388) - log_frequency = 10000
(pid=4388) - frame_op = None
(pid=4388) - frame_op_len = None
(pid=4388) - image_downsize = (84, 84)
(pid=4388) - normalize_state = False
(pid=4388) - reward_scale = None
(pid=4388) - num_envs = 1
(pid=4388) - name = CartPole-v0
(pid=4388) - max_t = 200
(pid=4388) - max_frame = 100000
(pid=4388) - to_render = False
(pid=4388) - is_venv = False
(pid=4388) - clock_speed = 1
(pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
(pid=4388) - done = False
(pid=4388) - total_reward = nan
(pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
(pid=4388) - observation_space = Box(4,)
(pid=4388) - action_space = Discrete(2)
(pid=4388) - observable_dim = {'state': 4}
(pid=4388) - action_dim = 2
(pid=4388) - is_discrete = True
(pid=4389) [2020-01-30 11:38:58,339 PID:4453 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
(pid=4389) [2020-01-30 11:38:58,340 PID:4450 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
(pid=4389) [2020-01-30 11:38:58,343 PID:4456 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
(pid=4389) [2020-01-30 11:38:58,345 PID:4450 INFO base.py __init__][2020-01-30 11:38:58,345 PID:4453 INFO base.py __init__] Reinforce:
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddcc0>
(pid=4389) - algorithm_spec = {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1}
(pid=4389) - name = Reinforce
(pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4389) - net_spec = {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddcc0>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bcb710>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bddd68>"
(pid=4389) }
(pid=4389) - action_pdtype = default
(pid=4389) - action_policy = <function default at 0x7fcc21560620>
(pid=4389) - center_return = False
(pid=4389) - explore_var_spec = None
(pid=4389) - entropy_coef_spec = {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01}
(pid=4389) - policy_loss_coef = 1.0
(pid=4389) - gamma = 0.99
(pid=4389) - training_frequency = 1
(pid=4389) - to_train = 0
(pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bddd30>
(pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdda20>
(pid=4389) - net = MLPNet(
(pid=4389) (model): Sequential(
(pid=4389) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4389) (1): SELU()
(pid=4389) )
(pid=4389) (model_tail): Sequential(
(pid=4389) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4389) )
(pid=4389) (loss_fn): MSELoss()
(pid=4389) )
(pid=4389) - net_names = ['net']
(pid=4389) - optim = Adam (
(pid=4389) Parameter Group 0
(pid=4389) amsgrad: False
(pid=4389) betas: (0.9, 0.999)
(pid=4389) eps: 1e-08
(pid=4389) lr: 0.002
(pid=4389) weight_decay: 0
(pid=4389) )
(pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10ba20b8>
(pid=4389) - global_net = None
(pid=4389) Reinforce:
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdddd8>
(pid=4389) - algorithm_spec = {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) [2020-01-30 11:38:58,330 PID:4445 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
(pid=4388) [2020-01-30 11:38:58,330 PID:4449 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
(pid=4388) [2020-01-30 11:38:58,335 PID:4452 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
(pid=4388) [2020-01-30 11:38:58,335 PID:4449 INFO base.py __init__] Reinforce:
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e097f60>
(pid=4388) - algorithm_spec = {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1}
(pid=4388) - name = Reinforce
(pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4388) - net_spec = {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e097f60>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e097fd0>"
(pid=4388) }
(pid=4388) - action_pdtype = default
(pid=4388) - action_policy = <function default at 0x7fce304ad620>
(pid=4388) - center_return = True
(pid=4388) - explore_var_spec = None
(pid=4388) - entropy_coef_spec = {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01}
(pid=4388) - policy_loss_coef = 1.0
(pid=4388) - gamma = 0.99
(pid=4388) - training_frequency = 1
(pid=4388) - to_train = 0
(pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e097c88>
(pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e083940>
(pid=4388) - net = MLPNet(
(pid=4388) (model): Sequential(
(pid=4388) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4388) (1): SELU()
(pid=4388) )
(pid=4388) (model_tail): Sequential(
(pid=4388) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4388) )
(pid=4388) (loss_fn): MSELoss()
(pid=4388) )
(pid=4388) - net_names = ['net']
(pid=4388) - optim = Adam (
(pid=4388) Parameter Group 0
(pid=4388) amsgrad: False
(pid=4388) betas: (0.9, 0.999)
(pid=4388) eps: 1e-08
(pid=4388) lr: 0.002
(pid=4388) weight_decay: 0
(pid=4388) )
(pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e0562e8>
(pid=4388) - global_net = None
(pid=4388) [2020-01-30 11:38:58,335 PID:4445 INFO base.py __init__] Reinforce:
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e098da0>
(pid=4388) - algorithm_spec = {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1}
(pid=4389) - name = Reinforce
(pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4389) - net_spec = {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdddd8>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc5828>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdde80>"
(pid=4389) }
(pid=4389) - action_pdtype = default
(pid=4389) - action_policy = <function default at 0x7fcc21560620>
(pid=4389) - center_return = False
(pid=4389) - explore_var_spec = None
(pid=4389) - entropy_coef_spec = {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01}
(pid=4389) - policy_loss_coef = 1.0
(pid=4389) - gamma = 0.99
(pid=4389) - training_frequency = 1
(pid=4389) - to_train = 0
(pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdde48>
(pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bddb38>
(pid=4389) - net = MLPNet(
(pid=4389) (model): Sequential(
(pid=4389) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4389) (1): SELU()
(pid=4389) )
(pid=4389) (model_tail): Sequential(
(pid=4389) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4389) )
(pid=4389) (loss_fn): MSELoss()
(pid=4389) )
(pid=4389) - net_names = ['net']
(pid=4389) - optim = Adam (
(pid=4389) Parameter Group 0
(pid=4389) amsgrad: False
(pid=4389) betas: (0.9, 0.999)
(pid=4389) eps: 1e-08
(pid=4389) lr: 0.002
(pid=4389) weight_decay: 0
(pid=4389) )
(pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10ba11d0>
(pid=4389) - global_net = None
(pid=4389) [2020-01-30 11:38:58,347 PID:4453 INFO __init__.py __init__][2020-01-30 11:38:58,347 PID:4450 INFO __init__.py __init__] Agent:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1},
(pid=4389) 'memory': {'name': 'OnPolicyReplay'},
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1}
(pid=4388) - name = Reinforce
(pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4388) - net_spec = {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e098da0>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e098e48>"
(pid=4388) }
(pid=4388) - action_pdtype = default
(pid=4388) - action_policy = <function default at 0x7fce304ad620>
(pid=4388) - center_return = True
(pid=4388) - explore_var_spec = None
(pid=4388) - entropy_coef_spec = {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01}
(pid=4388) - policy_loss_coef = 1.0
(pid=4388) - gamma = 0.99
(pid=4388) - training_frequency = 1
(pid=4388) - to_train = 0
(pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e098e10>
(pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e098f28>
(pid=4388) - net = MLPNet(
(pid=4388) (model): Sequential(
(pid=4388) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4388) (1): SELU()
(pid=4388) )
(pid=4388) (model_tail): Sequential(
(pid=4388) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4388) )
(pid=4388) (loss_fn): MSELoss()
(pid=4388) )
(pid=4388) - net_names = ['net']
(pid=4388) - optim = Adam (
(pid=4388) Parameter Group 0
(pid=4388) amsgrad: False
(pid=4388) betas: (0.9, 0.999)
(pid=4388) eps: 1e-08
(pid=4388) lr: 0.002
(pid=4388) weight_decay: 0
(pid=4388) )
(pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e05b1d0>
(pid=4388) - global_net = None
(pid=4388) [2020-01-30 11:38:58,336 PID:4449 INFO __init__.py __init__] Agent:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1},
(pid=4388) 'memory': {'name': 'OnPolicyReplay'},
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'net': {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}}
(pid=4389) - name = Reinforce
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdddd8>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc5828>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdde80>"
(pid=4389) }
(pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10bdde10>
(pid=4389) Agent:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1},
(pid=4389) 'memory': {'name': 'OnPolicyReplay'},
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'net': {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}}
(pid=4389) - name = Reinforce
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddcc0>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bcb710>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bddd68>"
(pid=4389) }
(pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10bddcf8>
(pid=4389) [2020-01-30 11:38:58,347 PID:4458 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search[2020-01-30 11:38:58,347 PID:4450 INFO logger.py info][2020-01-30 11:38:58,347 PID:4453 INFO logger.py info]
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'net': {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}}
(pid=4388) - name = Reinforce
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e097f60>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e097fd0>"
(pid=4388) }
(pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e097f98>
(pid=4388) [2020-01-30 11:38:58,337 PID:4449 INFO logger.py info] Session:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - index = 2
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e097f60>
(pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>
(pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>
(pid=4388) [2020-01-30 11:38:58,337 PID:4449 INFO logger.py info] Running RL loop for trial 0 session 2
(pid=4388) [2020-01-30 11:38:58,337 PID:4445 INFO __init__.py __init__] Agent:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1},
(pid=4388) 'memory': {'name': 'OnPolicyReplay'},
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'net': {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}}
(pid=4388) - name = Reinforce
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e098da0>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4389) Session:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - index = 0
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddcc0>
(pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0>
(pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0> Session:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - index = 1
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdddd8>
(pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>
(pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>
(pid=4389)
(pid=4389) [2020-01-30 11:38:58,347 PID:4450 INFO logger.py info] Running RL loop for trial 1 session 0[2020-01-30 11:38:58,347 PID:4453 INFO logger.py info]
(pid=4389) Running RL loop for trial 1 session 1
(pid=4389) [2020-01-30 11:38:58,348 PID:4456 INFO base.py __init__] Reinforce:
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdcf28>
(pid=4389) - algorithm_spec = {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1}
(pid=4389) - name = Reinforce
(pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4389) - net_spec = {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdcf28>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc7940>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdcfd0>"
(pid=4389) }
(pid=4389) - action_pdtype = default
(pid=4389) - action_policy = <function default at 0x7fcc21560620>
(pid=4389) - center_return = False
(pid=4389) - explore_var_spec = None
(pid=4389) - entropy_coef_spec = {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01}
(pid=4389) - policy_loss_coef = 1.0
(pid=4389) - gamma = 0.99
(pid=4389) - training_frequency = 1
(pid=4389) - to_train = 0
(pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdcf98>
(pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdcc50>
(pid=4389) - net = MLPNet(
(pid=4389) (model): Sequential(
(pid=4389) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4389) (1): SELU()
(pid=4389) )
(pid=4389) (model_tail): Sequential(
(pid=4389) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4389) )
(pid=4389) (loss_fn): MSELoss()
(pid=4389) )
(pid=4389) - net_names = ['net']
(pid=4389) - optim = Adam (
(pid=4389) Parameter Group 0
(pid=4389) amsgrad: False
(pid=4389) betas: (0.9, 0.999)
(pid=4389) eps: 1e-08
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e098e48>"
(pid=4388) }
(pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e098dd8>
(pid=4388) [2020-01-30 11:38:58,338 PID:4445 INFO logger.py info] Session:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - index = 1
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e098da0>
(pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>
(pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>
(pid=4388) [2020-01-30 11:38:58,338 PID:4445 INFO logger.py info] Running RL loop for trial 0 session 1
(pid=4388) [2020-01-30 11:38:58,340 PID:4449 INFO __init__.py log_summary] Trial 0 session 2 reinforce_baseline_cartpole_t0_s2 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4388) [2020-01-30 11:38:58,340 PID:4452 INFO base.py __init__] Reinforce:
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e082a58>
(pid=4388) - algorithm_spec = {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1}
(pid=4388) - name = Reinforce
(pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4388) - net_spec = {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e082a58>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e0540b8>"
(pid=4388) }
(pid=4388) - action_pdtype = default
(pid=4388) - action_policy = <function default at 0x7fce304ad620>
(pid=4388) - center_return = True
(pid=4388) - explore_var_spec = None
(pid=4388) - entropy_coef_spec = {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01}
(pid=4388) - policy_loss_coef = 1.0
(pid=4388) - gamma = 0.99
(pid=4388) - training_frequency = 1
(pid=4388) - to_train = 0
(pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e054080>
(pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e054160>
(pid=4388) - net = MLPNet(
(pid=4388) (model): Sequential(
(pid=4388) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4388) (1): SELU()
(pid=4388) )
(pid=4388) (model_tail): Sequential(
(pid=4388) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4388) )
(pid=4388) (loss_fn): MSELoss()
(pid=4388) )
(pid=4388) - net_names = ['net']
(pid=4388) - optim = Adam (
(pid=4388) Parameter Group 0
(pid=4388) amsgrad: False
(pid=4388) betas: (0.9, 0.999)
(pid=4388) eps: 1e-08
(pid=4389) lr: 0.002
(pid=4389) weight_decay: 0
(pid=4389) )
(pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10b9a2e8>
(pid=4389) - global_net = None
(pid=4389) [2020-01-30 11:38:58,350 PID:4456 INFO __init__.py __init__] Agent:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1},
(pid=4389) 'memory': {'name': 'OnPolicyReplay'},
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'net': {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}}
(pid=4389) - name = Reinforce
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdcf28>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc7940>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdcfd0>"
(pid=4389) }
(pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10bdcf60>
(pid=4389) [2020-01-30 11:38:58,351 PID:4456 INFO logger.py info] Session:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - index = 2
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdcf28>
(pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>
(pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>
(pid=4389) [2020-01-30 11:38:58,351 PID:4456 INFO logger.py info] Running RL loop for trial 1 session 2
(pid=4389) [2020-01-30 11:38:58,351 PID:4450 INFO __init__.py log_summary] Trial 1 session 0 reinforce_baseline_cartpole_t1_s0 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4389) [2020-01-30 11:38:58,351 PID:4453 INFO __init__.py log_summary] Trial 1 session 1 reinforce_baseline_cartpole_t1_s1 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4389) [2020-01-30 11:38:58,352 PID:4458 INFO base.py __init__] Reinforce:
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddd68>
(pid=4389) - algorithm_spec = {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1}
(pid=4389) - name = Reinforce
(pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4389) - net_spec = {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddd68>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4388) lr: 0.002
(pid=4388) weight_decay: 0
(pid=4388) )
(pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e054400>
(pid=4388) - global_net = None
(pid=4388) [2020-01-30 11:38:58,342 PID:4445 INFO __init__.py log_summary] Trial 0 session 1 reinforce_baseline_cartpole_t0_s1 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4388) [2020-01-30 11:38:58,342 PID:4452 INFO __init__.py __init__] Agent:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1},
(pid=4388) 'memory': {'name': 'OnPolicyReplay'},
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'net': {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}}
(pid=4388) - name = Reinforce
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e082a58>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e0540b8>"
(pid=4388) }
(pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e054048>
(pid=4388) [2020-01-30 11:38:58,342 PID:4452 INFO logger.py info] Session:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - index = 3
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e082a58>
(pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>
(pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>
(pid=4388) [2020-01-30 11:38:58,342 PID:4452 INFO logger.py info] Running RL loop for trial 0 session 3
(pid=4388) [2020-01-30 11:38:58,343 PID:4440 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
(pid=4388) [2020-01-30 11:38:58,346 PID:4452 INFO __init__.py log_summary] Trial 0 session 3 reinforce_baseline_cartpole_t0_s3 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4388) [2020-01-30 11:38:58,348 PID:4440 INFO base.py __init__] Reinforce:
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e09ac88>
(pid=4388) - algorithm_spec = {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1}
(pid=4388) - name = Reinforce
(pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
(pid=4388) - net_spec = {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e09ac88>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) terminate called after throwing an instance of 'c10::Error'
(pid=4388) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4388)
(pid=4388) Fatal Python error: Aborted
(pid=4388)
(pid=4388) Stack (most recent call first):
(pid=4388) terminate called after throwing an instance of 'c10::Error'
(pid=4388) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4388)
(pid=4388) Fatal Python error: Aborted
(pid=4388)
(pid=4388) Stack (most recent call first):
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc6a58>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10b9a0b8>"
(pid=4389) }
(pid=4389) - action_pdtype = default
(pid=4389) - action_policy = <function default at 0x7fcc21560620>
(pid=4389) - center_return = False
(pid=4389) - explore_var_spec = None
(pid=4389) - entropy_coef_spec = {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01}
(pid=4389) - policy_loss_coef = 1.0
(pid=4389) - gamma = 0.99
(pid=4389) - training_frequency = 1
(pid=4389) - to_train = 0
(pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10b9a080>
(pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10b9a160>
(pid=4389) - net = MLPNet(
(pid=4389) (model): Sequential(
(pid=4389) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4389) (1): SELU()
(pid=4389) )
(pid=4389) (model_tail): Sequential(
(pid=4389) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4389) )
(pid=4389) (loss_fn): MSELoss()
(pid=4389) )
(pid=4389) - net_names = ['net']
(pid=4389) - optim = Adam (
(pid=4389) Parameter Group 0
(pid=4389) amsgrad: False
(pid=4389) betas: (0.9, 0.999)
(pid=4389) eps: 1e-08
(pid=4389) lr: 0.002
(pid=4389) weight_decay: 0
(pid=4389) )
(pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10b9a400>
(pid=4389) - global_net = None
(pid=4389) [2020-01-30 11:38:58,354 PID:4458 INFO __init__.py __init__] Agent:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4389) 'action_policy': 'default',
(pid=4389) 'center_return': False,
(pid=4389) 'entropy_coef_spec': {'end_step': 20000,
(pid=4389) 'end_val': 0.001,
(pid=4389) 'name': 'linear_decay',
(pid=4389) 'start_step': 0,
(pid=4389) 'start_val': 0.01},
(pid=4389) 'explore_var_spec': None,
(pid=4389) 'gamma': 0.99,
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'training_frequency': 1},
(pid=4389) 'memory': {'name': 'OnPolicyReplay'},
(pid=4389) 'name': 'Reinforce',
(pid=4389) 'net': {'clip_grad_val': None,
(pid=4389) 'hid_layers': [64],
(pid=4389) 'hid_layers_activation': 'selu',
(pid=4389) 'loss_spec': {'name': 'MSELoss'},
(pid=4389) 'lr_scheduler_spec': None,
(pid=4389) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4389) 'type': 'MLPNet'}}
(pid=4389) - name = Reinforce
(pid=4389) - body = body: {
(pid=4389) "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddd68>",
(pid=4389) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>",
(pid=4389) "a": 0,
(pid=4389) "e": 0,
(pid=4389) "b": 0,
(pid=4389) "aeb": "(0, 0, 0)",
(pid=4389) "explore_var": NaN,
(pid=4389) "entropy_coef": 0.01,
(pid=4389) "loss": NaN,
(pid=4389) "mean_entropy": NaN,
(pid=4389) "mean_grad_norm": NaN,
(pid=4389) "best_total_reward_ma": -Infinity,
(pid=4389) "total_reward_ma": NaN,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e09ad30>"
(pid=4388) }
(pid=4388) - action_pdtype = default
(pid=4388) - action_policy = <function default at 0x7fce304ad620>
(pid=4388) - center_return = True
(pid=4388) - explore_var_spec = None
(pid=4388) - entropy_coef_spec = {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01}
(pid=4388) - policy_loss_coef = 1.0
(pid=4388) - gamma = 0.99
(pid=4388) - training_frequency = 1
(pid=4388) - to_train = 0
(pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e09acf8>
(pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e09ae10>
(pid=4388) - net = MLPNet(
(pid=4388) (model): Sequential(
(pid=4388) (0): Linear(in_features=4, out_features=64, bias=True)
(pid=4388) (1): SELU()
(pid=4388) )
(pid=4388) (model_tail): Sequential(
(pid=4388) (0): Linear(in_features=64, out_features=2, bias=True)
(pid=4388) )
(pid=4388) (loss_fn): MSELoss()
(pid=4388) )
(pid=4388) - net_names = ['net']
(pid=4388) - optim = Adam (
(pid=4388) Parameter Group 0
(pid=4388) amsgrad: False
(pid=4388) betas: (0.9, 0.999)
(pid=4388) eps: 1e-08
(pid=4388) lr: 0.002
(pid=4388) weight_decay: 0
(pid=4388) )
(pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e05c0b8>
(pid=4388) - global_net = None
(pid=4388) [2020-01-30 11:38:58,350 PID:4440 INFO __init__.py __init__] Agent:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
(pid=4388) 'action_policy': 'default',
(pid=4388) 'center_return': True,
(pid=4388) 'entropy_coef_spec': {'end_step': 20000,
(pid=4388) 'end_val': 0.001,
(pid=4388) 'name': 'linear_decay',
(pid=4388) 'start_step': 0,
(pid=4388) 'start_val': 0.01},
(pid=4388) 'explore_var_spec': None,
(pid=4388) 'gamma': 0.99,
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'training_frequency': 1},
(pid=4388) 'memory': {'name': 'OnPolicyReplay'},
(pid=4388) 'name': 'Reinforce',
(pid=4388) 'net': {'clip_grad_val': None,
(pid=4388) 'hid_layers': [64],
(pid=4388) 'hid_layers_activation': 'selu',
(pid=4388) 'loss_spec': {'name': 'MSELoss'},
(pid=4388) 'lr_scheduler_spec': None,
(pid=4388) 'optim_spec': {'lr': 0.002, 'name': 'Adam'},
(pid=4388) 'type': 'MLPNet'}}
(pid=4388) - name = Reinforce
(pid=4388) - body = body: {
(pid=4388) "agent": "<slm_lab.agent.Agent object at 0x7fce0e09ac88>",
(pid=4388) "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>",
(pid=4388) "a": 0,
(pid=4388) "e": 0,
(pid=4388) "b": 0,
(pid=4388) "aeb": "(0, 0, 0)",
(pid=4388) "explore_var": NaN,
(pid=4388) "entropy_coef": 0.01,
(pid=4388) "loss": NaN,
(pid=4388) "mean_entropy": NaN,
(pid=4388) "mean_grad_norm": NaN,
(pid=4388) "best_total_reward_ma": -Infinity,
(pid=4389) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4389) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc6a58>",
(pid=4389) "tb_actions": [],
(pid=4389) "tb_tracker": {},
(pid=4389) "observation_space": "Box(4,)",
(pid=4389) "action_space": "Discrete(2)",
(pid=4389) "observable_dim": {
(pid=4389) "state": 4
(pid=4389) },
(pid=4389) "state_dim": 4,
(pid=4389) "action_dim": 2,
(pid=4389) "is_discrete": true,
(pid=4389) "action_type": "discrete",
(pid=4389) "action_pdtype": "Categorical",
(pid=4389) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4389) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10b9a0b8>"
(pid=4389) }
(pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10b9a048>
(pid=4389) [2020-01-30 11:38:58,354 PID:4458 INFO logger.py info] Session:
(pid=4389) - spec = reinforce_baseline_cartpole
(pid=4389) - index = 3
(pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddd68>
(pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>
(pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>
(pid=4389) [2020-01-30 11:38:58,354 PID:4458 INFO logger.py info] Running RL loop for trial 1 session 3
(pid=4389) [2020-01-30 11:38:58,355 PID:4456 INFO __init__.py log_summary] Trial 1 session 2 reinforce_baseline_cartpole_t1_s2 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4389) [2020-01-30 11:38:58,358 PID:4458 INFO __init__.py log_summary] Trial 1 session 3 reinforce_baseline_cartpole_t1_s3 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4388) "total_reward_ma": NaN,
(pid=4388) "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
(pid=4388) "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
(pid=4388) "tb_actions": [],
(pid=4388) "tb_tracker": {},
(pid=4388) "observation_space": "Box(4,)",
(pid=4388) "action_space": "Discrete(2)",
(pid=4388) "observable_dim": {
(pid=4388) "state": 4
(pid=4388) },
(pid=4388) "state_dim": 4,
(pid=4388) "action_dim": 2,
(pid=4388) "is_discrete": true,
(pid=4388) "action_type": "discrete",
(pid=4388) "action_pdtype": "Categorical",
(pid=4388) "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
(pid=4388) "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e09ad30>"
(pid=4388) }
(pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e09acc0>
(pid=4388) [2020-01-30 11:38:58,350 PID:4440 INFO logger.py info] Session:
(pid=4388) - spec = reinforce_baseline_cartpole
(pid=4388) - index = 0
(pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e09ac88>
(pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>
(pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>
(pid=4388) [2020-01-30 11:38:58,350 PID:4440 INFO logger.py info] Running RL loop for trial 0 session 0
(pid=4388) [2020-01-30 11:38:58,354 PID:4440 INFO __init__.py log_summary] Trial 0 session 0 reinforce_baseline_cartpole_t0_s0 [train_df] epi: 0 t: 0 wall_t: 0 opt_step: 0 frame: 0 fps: 0 total_reward: nan total_reward_ma: nan loss: nan lr: 0.002 explore_var: nan entropy_coef: 0.01 entropy: nan grad_norm: nan
(pid=4388) terminate called after throwing an instance of 'c10::Error'
(pid=4388) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4388)
(pid=4388) Fatal Python error: Aborted
(pid=4388)
(pid=4388) Stack (most recent call first):
(pid=4389) terminate called after throwing an instance of 'c10::Error'
(pid=4389) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4389)
(pid=4389) Fatal Python error: Aborted
(pid=4389)
(pid=4389) Stack (most recent call first):
(pid=4389) terminate called after throwing an instance of 'c10::Error'
(pid=4389) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4389)
(pid=4389) Fatal Python error: Aborted
(pid=4389)
(pid=4389) Stack (most recent call first):
(pid=4389) terminate called after throwing an instance of 'c10::Error'
(pid=4389) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4389)
(pid=4389) Fatal Python error: Aborted
(pid=4389)
(pid=4389) Stack (most recent call first):
(pid=4389) terminate called after throwing an instance of 'c10::Error'
(pid=4389) what(): CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
(pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
(pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
(pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
(pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
(pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
(pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
(pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
(pid=4389)
(pid=4389) Fatal Python error: Aborted
(pid=4389)
(pid=4389) Stack (most recent call first):
(pid=4388) 2020-01-30 11:38:58,550 ERROR function_runner.py:96 -- Runner Thread raised error.
(pid=4388) Traceback (most recent call last):
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
(pid=4388) self._entrypoint()
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
(pid=4388) return self._trainable_func(config, self._status_reporter)
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
(pid=4388) output = train_func(config, reporter)
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
(pid=4388) metrics = Trial(spec).run()
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
(pid=4388) metrics = analysis.analyze_trial(self.spec, session_metrics_list)
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
(pid=4388) trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
(pid=4388) frames = session_metrics_list[0]['local']['frames']
(pid=4388) IndexError: list index out of range
(pid=4388) Exception in thread Thread-1:
(pid=4388) Traceback (most recent call last):
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
(pid=4388) self._entrypoint()
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
(pid=4388) return self._trainable_func(config, self._status_reporter)
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
(pid=4388) output = train_func(config, reporter)
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
(pid=4388) metrics = Trial(spec).run()
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
(pid=4388) metrics = analysis.analyze_trial(self.spec, session_metrics_list)
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
(pid=4388) trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
(pid=4388) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
(pid=4388) frames = session_metrics_list[0]['local']['frames']
(pid=4388) IndexError: list index out of range
(pid=4388)
(pid=4388) During handling of the above exception, another exception occurred:
(pid=4388)
(pid=4388) Traceback (most recent call last):
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/threading.py", line 917, in _bootstrap_inner
(pid=4388) self.run()
(pid=4388) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 102, in run
(pid=4388) err_tb = err_tb.format_exc()
(pid=4388) AttributeError: 'traceback' object has no attribute 'format_exc'
(pid=4388)
(pid=4389) 2020-01-30 11:38:58,570 ERROR function_runner.py:96 -- Runner Thread raised error.
(pid=4389) Traceback (most recent call last):
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
(pid=4389) self._entrypoint()
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
(pid=4389) return self._trainable_func(config, self._status_reporter)
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
(pid=4389) output = train_func(config, reporter)
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
(pid=4389) metrics = Trial(spec).run()
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
(pid=4389) metrics = analysis.analyze_trial(self.spec, session_metrics_list)
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
(pid=4389) trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
(pid=4389) frames = session_metrics_list[0]['local']['frames']
(pid=4389) IndexError: list index out of range
(pid=4389) Exception in thread Thread-1:
(pid=4389) Traceback (most recent call last):
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
(pid=4389) self._entrypoint()
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
(pid=4389) return self._trainable_func(config, self._status_reporter)
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
(pid=4389) output = train_func(config, reporter)
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
(pid=4389) metrics = Trial(spec).run()
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
(pid=4389) metrics = analysis.analyze_trial(self.spec, session_metrics_list)
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
(pid=4389) trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
(pid=4389) File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
(pid=4389) frames = session_metrics_list[0]['local']['frames']
(pid=4389) IndexError: list index out of range
(pid=4389)
(pid=4389) During handling of the above exception, another exception occurred:
(pid=4389)
(pid=4389) Traceback (most recent call last):
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/threading.py", line 917, in _bootstrap_inner
(pid=4389) self.run()
(pid=4389) File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 102, in run
(pid=4389) err_tb = err_tb.format_exc()
(pid=4389) AttributeError: 'traceback' object has no attribute 'format_exc'
(pid=4389)
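(Annotation, not part of the log.) My reading of the two worker tracebacks above, in case it helps: once a session aborts on the CUDA error, no session metrics are collected, so `calc_trial_metrics` receives an empty list and the `IndexError` at analysis.py line 187 looks like a downstream symptom rather than the root cause. A two-line sketch of what I think happens at that point (my own illustration, not lab code):

```python
session_metrics_list = []  # empty because no session survived to report metrics
frames = session_metrics_list[0]['local']['frames']  # IndexError: list index out of range
```

The later `AttributeError: 'traceback' object has no attribute 'format_exc'` then seems to be ray's own error handler stumbling while reporting that failure (a traceback object has no `format_exc`; that is a function of the `traceback` module), which is why the underlying error is easy to miss.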
2020-01-30 11:38:59,690 ERROR trial_runner.py:497 -- Error processing event.
Traceback (most recent call last):
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 446, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 316, in fetch_result
result = ray.get(trial_future[0])
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/worker.py", line 2197, in get
raise value
ray.exceptions.RayTaskError: ray_worker (pid=4388, host=Gauss)
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trainable.py", line 151, in train
result = self._train()
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 203, in _train
("Wrapped function ran until completion without reporting "
ray.tune.error.TuneError: Wrapped function ran until completion without reporting results or raising an exception.
2020-01-30 11:38:59,694 INFO ray_trial_executor.py:180 -- Destroying actor for trial ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2020-01-30 11:38:59,705 ERROR trial_runner.py:497 -- Error processing event.
Traceback (most recent call last):
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 446, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 316, in fetch_result
result = ray.get(trial_future[0])
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/worker.py", line 2197, in get
raise value
ray.exceptions.RayTaskError: ray_worker (pid=4389, host=Gauss)
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trainable.py", line 151, in train
result = self._train()
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 203, in _train
("Wrapped function ran until completion without reporting "
ray.tune.error.TuneError: Wrapped function ran until completion without reporting results or raising an exception.
2020-01-30 11:38:59,707 INFO ray_trial_executor.py:180 -- Destroying actor for trial ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/1 GPUs
Memory usage on this node: 2.5/16.7 GB
Result logdir: /home/joe/ray_results/reinforce_baseline_cartpole
Number of trials: 2 ({'ERROR': 2})
ERROR trials:
- ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0: ERROR, 1 failures: /home/joe/ray_results/reinforce_baseline_cartpole/ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0_2020-01-30_11-38-57n2qc80ke/error_2020-01-30_11-38-59.txt
- ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1: ERROR, 1 failures: /home/joe/ray_results/reinforce_baseline_cartpole/ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1_2020-01-30_11-38-57unqmlqvg/error_2020-01-30_11-38-59.txt
Traceback (most recent call last):
File "run_lab.py", line 80, in <module>
main()
File "run_lab.py", line 72, in main
read_spec_and_run(*args)
File "run_lab.py", line 56, in read_spec_and_run
run_spec(spec, lab_mode)
File "run_lab.py", line 35, in run_spec
Experiment(spec).run()
File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 203, in run
trial_data_dict = search.run_ray_search(self.spec)
File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 124, in run_ray_search
server_port=util.get_port(),
File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/tune.py", line 265, in run
raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0, ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1])
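One guess, in case it narrows things down: the repeated `c10::Error: CUDA error: initialization error` is raised from `torch::autograd::Engine::thread_init`, which as far as I understand happens when CUDA has already been initialized in a parent process and a forked child then touches the GPU. That would also explain why 'dev' and 'train' work while only 'search' (which runs the sessions in subprocesses under ray, as the multiple PIDs per worker show) fails. Below is a minimal sketch of what I believe is the same failure mode outside SLM Lab; the script and its error text are mine, not taken from the lab, and a plain Python repro gives the friendlier "Cannot re-initialize CUDA in forked subprocess" message rather than the C++ abort in the logs:

```python
# cuda_fork_check.py -- hypothetical repro: use CUDA, fork, use CUDA again in the child.
# Assumes a CUDA-capable GPU and a CUDA build of PyTorch.
import torch
import torch.multiprocessing as mp

def child():
    try:
        torch.zeros(1, device='cuda')  # touching CUDA after the fork fails
    except RuntimeError as e:
        print('child CUDA call failed:', e)

if __name__ == '__main__':
    torch.zeros(1, device='cuda')      # parent initializes CUDA before forking
    ctx = mp.get_context('fork')       # forked children inherit the parent's CUDA state
    p = ctx.Process(target=child)
    p.start()
    p.join()
```

If that diagnosis is right, two things I could try are hiding the GPU from the search run (`CUDA_VISIBLE_DEVICES="" python run_lab.py ...`) or switching the session subprocesses to the 'spawn' start method, but I'd rather check with you before patching anything.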