Hi,

I've been running the MAgent Tiger-Deer environment with two different algorithms: a RandomLearner and RLlib's PPO. I'm also using RLlib's `PettingZooEnv`. Both algorithms run for some number of iterations, but then error out at this line: https://github.com/ray-project/ray/blob/master/rllib/env/pettingzoo_env.py#L161.

The issue is that the selected agent, `deer_92`, is not in the `action_dict`. I checked the `self.aec_env.dones` dict, however, and the agent is there. I've added a snippet of the output below; the relevant info (shown after each `== start step ==`) is printed when entering the `step()` function. Furthermore, it appears that all steps prior to this error only ever select `deer_0` as the agent. I've re-run the experiment several times and it always gives the same result (i.e., `deer_0` is always chosen, and the run errors out as soon as any other agent is chosen).

I'm not sure if this is an issue with RLlib, the Tiger-Deer env, or my own config.
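For context, the prints in the log below correspond roughly to this instrumentation at the top of the wrapper's `step()` (a paraphrase of my debug patch, not the wrapper's actual source; `stepped_agents` is the local set the wrapper's step loop maintains):

```python
# Paraphrase of the debug prints patched into PettingZooEnv.step().
# `stepped_agents` is the set of agents already stepped in this call.
def step(self, action_dict):
    stepped_agents = set()
    agent = self.aec_env.agent_selection
    print("=============== start step =====================")
    print("self.aec_env.agent_selection -->", agent)
    print("stepped_agents -->", stepped_agents)
    print("list(action_dict) -->", list(action_dict))
    print("agent in action_dict -->", agent in action_dict)
    print("agent in self.aec_env.dones -->", agent in self.aec_env.dones)
    # ... original wrapper logic continues unchanged ...
```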
(pid=34152) =============== start step =====================
(pid=34152) self.aec_env.agent_selection --> deer_0
(pid=34152) stepped_agents --> set()
(pid=34152) list(action_dict) --> ['deer_0', 'deer_1', 'deer_2', 'deer_3', 'deer_4', 'deer_5', 'deer_6', 'deer_7', 'deer_8', 'deer_9', 'deer_10', 'deer_11', 'deer_12', 'deer_13', 'deer_14', 'deer_15', 'deer_16', 'deer_17', 'deer_18', 'deer_19', 'deer_20', 'deer_21', 'deer_22', 'deer_23', 'deer_24', 'deer_25', 'deer_26', 'deer_27', 'deer_28', 'deer_29', 'deer_30', 'deer_31', 'deer_32', 'deer_33', 'deer_34', 'deer_35', 'deer_36', 'deer_37', 'deer_38', 'deer_39', 'deer_40', 'deer_41', 'deer_42', 'deer_43', 'deer_44', 'deer_45', 'deer_46', 'deer_47', 'deer_48', 'deer_49', 'deer_50', 'deer_51', 'deer_52', 'deer_53', 'deer_54', 'deer_55', 'deer_56', 'deer_57', 'deer_58', 'deer_59', 'deer_60', 'deer_61', 'deer_62', 'deer_63', 'deer_64', 'deer_65', 'deer_66', 'deer_67', 'deer_68', 'deer_69', 'deer_70', 'deer_71', 'deer_72', 'deer_73', 'deer_74', 'deer_75', 'deer_76', 'deer_77', 'deer_78', 'deer_79', 'deer_80', 'deer_81', 'deer_82', 'deer_83', 'deer_84', 'deer_85', 'deer_86', 'deer_87', 'deer_88', 'deer_89', 'deer_90', 'deer_91', 'deer_92', 'deer_93', 'deer_94', 'deer_95', 'deer_96', 'deer_97', 'deer_98', 'deer_99', 'deer_100', 'tiger_0', 'tiger_1', 'tiger_2', 'tiger_3', 'tiger_4', 'tiger_5', 'tiger_6', 'tiger_7', 'tiger_8', 'tiger_9', 'tiger_10', 'tiger_11', 'tiger_12', 'tiger_13', 'tiger_14', 'tiger_15', 'tiger_16', 'tiger_17', 'tiger_18', 'tiger_19']
(pid=34152) agent in action_dict --> True
(pid=34152) agent in self.aec_env.dones --> False
(pid=34152) =============== start step =====================
(pid=34152) self.aec_env.agent_selection --> deer_0
(pid=34152) stepped_agents --> set()
(pid=34152) list(action_dict) --> ['deer_0', 'deer_1', 'deer_2', 'deer_3', 'deer_4', 'deer_5', 'deer_6', 'deer_7', 'deer_8', 'deer_9', 'deer_10', 'deer_11', 'deer_12', 'deer_13', 'deer_14', 'deer_15', 'deer_16', 'deer_17', 'deer_18', 'deer_19', 'deer_20', 'deer_21', 'deer_22', 'deer_23', 'deer_24', 'deer_25', 'deer_26', 'deer_27', 'deer_28', 'deer_29', 'deer_30', 'deer_31', 'deer_32', 'deer_33', 'deer_34', 'deer_35', 'deer_36', 'deer_37', 'deer_38', 'deer_39', 'deer_40', 'deer_41', 'deer_42', 'deer_43', 'deer_44', 'deer_45', 'deer_46', 'deer_47', 'deer_48', 'deer_49', 'deer_50', 'deer_51', 'deer_52', 'deer_53', 'deer_54', 'deer_55', 'deer_56', 'deer_57', 'deer_58', 'deer_59', 'deer_60', 'deer_61', 'deer_62', 'deer_63', 'deer_64', 'deer_65', 'deer_66', 'deer_67', 'deer_68', 'deer_69', 'deer_70', 'deer_71', 'deer_72', 'deer_73', 'deer_74', 'deer_75', 'deer_76', 'deer_77', 'deer_78', 'deer_79', 'deer_80', 'deer_81', 'deer_82', 'deer_83', 'deer_84', 'deer_85', 'deer_86', 'deer_87', 'deer_88', 'deer_89', 'deer_90', 'deer_91', 'deer_92', 'deer_93', 'deer_94', 'deer_95', 'deer_96', 'deer_97', 'deer_98', 'deer_99', 'deer_100', 'tiger_0', 'tiger_1', 'tiger_2', 'tiger_3', 'tiger_4', 'tiger_5', 'tiger_6', 'tiger_7', 'tiger_8', 'tiger_9', 'tiger_10', 'tiger_11', 'tiger_12', 'tiger_13', 'tiger_14', 'tiger_15', 'tiger_16', 'tiger_17', 'tiger_18', 'tiger_19']
(pid=34152) agent in action_dict --> True
(pid=34152) agent in self.aec_env.dones --> False
(pid=34152) =============== start step =====================
(pid=34152) self.aec_env.agent_selection --> deer_92
(pid=34152) stepped_agents --> set()
(pid=34152) list(action_dict) --> ['deer_0', 'deer_1', 'deer_2', 'deer_3', 'deer_4', 'deer_5', 'deer_6', 'deer_7', 'deer_8', 'deer_9', 'deer_10', 'deer_11', 'deer_12', 'deer_13', 'deer_14', 'deer_15', 'deer_16', 'deer_17', 'deer_18', 'deer_19', 'deer_20', 'deer_21', 'deer_22', 'deer_23', 'deer_24', 'deer_25', 'deer_26', 'deer_27', 'deer_28', 'deer_29', 'deer_30', 'deer_31', 'deer_32', 'deer_33', 'deer_34', 'deer_35', 'deer_36', 'deer_37', 'deer_38', 'deer_39', 'deer_40', 'deer_41', 'deer_42', 'deer_43', 'deer_44', 'deer_45', 'deer_46', 'deer_47', 'deer_48', 'deer_49', 'deer_50', 'deer_51', 'deer_52', 'deer_53', 'deer_54', 'deer_55', 'deer_56', 'deer_57', 'deer_58', 'deer_59', 'deer_60', 'deer_61', 'deer_62', 'deer_63', 'deer_64', 'deer_65', 'deer_66', 'deer_67', 'deer_68', 'deer_69', 'deer_70', 'deer_71', 'deer_72', 'deer_73', 'deer_74', 'deer_75', 'deer_76', 'deer_77', 'deer_78', 'deer_79', 'deer_80', 'deer_81', 'deer_82', 'deer_83', 'deer_84', 'deer_85', 'deer_86', 'deer_87', 'deer_88', 'deer_89', 'deer_90', 'deer_91', 'deer_93', 'deer_94', 'deer_95', 'deer_96', 'deer_97', 'deer_98', 'deer_99', 'deer_100', 'tiger_0', 'tiger_1', 'tiger_2', 'tiger_3', 'tiger_4', 'tiger_5', 'tiger_6', 'tiger_7', 'tiger_8', 'tiger_9', 'tiger_10', 'tiger_11', 'tiger_12', 'tiger_13', 'tiger_14', 'tiger_15', 'tiger_16', 'tiger_17', 'tiger_18', 'tiger_19']
(pid=34152) agent in action_dict --> False
(pid=34152) agent in self.aec_env.dones --> True
== Status ==
Memory usage on this node: 24.8/377.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/80 CPUs, 0/2 GPUs, 0.0/252.88 GiB heap, 0.0/77.54 GiB objects (0/1.0 GPUType:V100)
Result logdir: /home/ray_results/Campaign_Tiger-Deer-v1
Number of trials: 1 (1 ERROR)
+------------------------------------------+----------+-------+--------+------------------+------+----------+
| Trial name                               | status   | loc   |   iter |   total time (s) |   ts |   reward |
|------------------------------------------+----------+-------+--------+------------------+------+----------|
| PS_PPO_Trainer_Tiger-Deer-v1_41b65_00000 | ERROR    |       |      3 |          3576.35 | 3672 | 0.166667 |
+------------------------------------------+----------+-------+--------+------------------+------+----------+
Number of errored trials: 1
If I use the `PettingZooEnv` version from ray==0.8.7, the error instead occurs at https://github.com/ray-project/ray/blob/releases/0.8.7/rllib/env/pettingzoo_env.py#L165.
Lastly, I also applied the following SuperSuit wrappers: `pad_observations_v0`, `pad_action_space_v0`, `agent_indicator_v0`, and `flatten_v0`. I'm running PettingZoo==1.3.3 and SuperSuit==2.1.0.
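For completeness, here is roughly how I build and register the env (a simplified sketch, not my exact config; the versioned MAgent module name depends on the PettingZoo release, and "Tiger-Deer-v1" is just the key I register with Tune):

```python
# Simplified sketch of the env setup (assumptions noted inline).
import supersuit
from pettingzoo.magent import tiger_deer_v0  # version suffix may differ per PettingZoo release
from ray.rllib.env.pettingzoo_env import PettingZooEnv
from ray.tune.registry import register_env

def env_creator(env_config):
    env = tiger_deer_v0.env()                 # raw AEC env
    env = supersuit.pad_observations_v0(env)  # pad obs to a common shape
    env = supersuit.pad_action_space_v0(env)  # pad action spaces to a common size
    env = supersuit.agent_indicator_v0(env)   # append agent-id indicator to obs
    env = supersuit.flatten_v0(env)           # flatten obs to 1-D
    return PettingZooEnv(env)                 # RLlib multi-agent wrapper

register_env("Tiger-Deer-v1", env_creator)
```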
Thanks.