Rethink action mask API
The current API for specifying actors is very confusing and poorly documented (I myself was very confused by it after not looking at it for a month after implementing it and forgetting how it works 😆). I think we can come up with something better.
Running example
Suppose we have an environment with two entity types, Worker
and Resource
, both with features x
and y
, a harvest
action, and a move
action. The observation and action spaces might look like this:
obs_space = {
"resource": Entity(["x", "y"]),
"worker": Entity(["x", "y"]),
}
action_space = {
"harvest": CategoricalAction(["harvest", "don't harvest"]),
"move": CategoricalAction(["up", "down", "left", "right"]),
}
Suppose that on a particular timestep, we have two resource entities r1
and r2
, and three worker entities w1
, w2
, and w3
. Only w1
and w3
can perform the harvest
action on this timestep. All resources and workers can perform the move
action, but w2
can't move down
or left
.
Observations
Observations are currently defined like this:
https://github.com/entity-neural-network/incubator/blob/c887f4a2f6601c67e8feb03ba9b16f27f74a223f/entity_gym/entity_gym/environment/environment.py#L82-L92
entities
The observation consists of a dictionary mapping each entity type to a numpy array with the features of all entities of that type:
entities = {
"resource": [
[10, 4], # r1
[11, 6], # r2
],
"worker": [
[10, 4], # w1
[0, 0], # w2
[11, 6], # w3
],
}
This seems fine, or at least I can't think of any obvious simplification or more ergonomic API.
ids
The ids
are meant to allow the environment to designate an opaque identifier for each entity, which then makes it easy to later lookup that entity when executing actions chosen by the policy. For example, we could use the following list of strings to identify our entities:
ids = ["r1", "r2", "w1", "w2", "w3"],
In practice, this mostly adds runtime overhead and API complexity, and it's unclear whether it is actually particularly useful for most environments. One issue is also that the correspondence between the ids
and entities is not obvious since it relies on the implicit ordering of entity types defined by the observation space.
action_masks
The environment must specify which entities can perform each action. This is done by returning an ActionMask
for each action that specifies an actors
list that specifies the indices of the entities that can perform the action:
action_masks = {
"harvest": DenseCategoricalActionMask(actors=[2, 4]),
"move": DenseCategoricalActionMask(
actors=[0, 1, 2, 3, 4],
mask=[
[True, True, True, True],
[True, True, True, True],
[True, True, True, True],
[True, False, False, True],
[True, True, True, True],
],
),
}
The actors
field is very unintuitive. In contrast to the observation, it's a single list rather than a list per entity, and it relies on the implicit ordering of entity types given by the observation space dictionary. The mask
is also inconsistent with the ids
, since it uses indices rather than the ids
to specify actors.
action
Once we have sampled actions from the policy, we call the step
method again and supply a action: Mapping[str, Action]
with the choices made by the policy for each action. For categorical actions, this is represented by a actions: List[Tuple[EntityID, int]]
which pairs each actor with the index of the chosen action:
actions = {
"move": CategoricalAction([
"r1": 0,
"r2": 0,
"w1": 2,
"w2", 0,
"w3", 3
]),
"harvest": CategoricalAction([
"w1": 0,
"w3": 3,
]),
}
Solutions
The current API is pretty bad, not sure what the best solution is but here are some ideas:
Proposal 1: Group action_masks
and ids
by entity type
The indices would now be per entity, corresponding to the features in entities
. This would make it a little more obvious what's going on. And in the common case of all entities of some type being able to execute an action, we wouldn't need to specify the actors
field at all.
action_masks = {
"worker": {
"harvest": DenseCategoricalActionMask(actors=[0, 2]),
"move": DenseCategoricalActionMask(
mask=[
[True, True, True, True],
[True, False, False, True],
[True, True, True, True],
],
),
},
"resource": {
"move": DenseCategoricalActionMask(
mask=[
[True, True, True, True],
[True, True, True, True],
],
)
},
}
ids = {
"worker": ["w1", "w2", "w3"],
"resource": ["r1", "r2"],
}
Proposal 2: Use EntityID
s to specify actors
Instead of indices, use EntiyID
s for actors
:
action_masks = {
"harvest": DenseCategoricalActionMask(actors=["w1", "w2"]),
"move": DenseCategoricalActionMask(
actors=["r1", "r2", "w1", "w2", "w3"],
mask=[
[True, True, True, True],
[True, True, True, True],
[True, True, True, True],
[True, False, False, True],
[True, True, True, True],
],
),
}
Proposal 3: Just get rid of ids
The EntityID
s haven't proved particularly useful so far. We could just get rid of the ids
field and remove the EntityID
from the actions passed to act
. Something similar to the old interface that works with environment-supplied ids rather than indices could still be achieved with an env wrapper or helper methods.
actions = {
"move": CategoricalAction([0, 0, 2, 0, 3]),
"harvest": CategoricalAction([0, 3]),
}
If combined with (1):
actions = {
"worker": {
"move": CategoricalAction([2, 0, 3]),
"harvest": CategoricalAction([0, 3]),
},
"resource": {
"move": CategoricalAction([0, 0]),
},
}