A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Costa Huang

Last update: Dec 24, 2022

Overview

This repo contains the source code to reproduce the results in the paper A Closer Look at Invalid Action Masking in Policy Gradient Algorithms.

Steps to reproduce the experiments

Our experiments run in Docker containers and log to Weights and Biases (https://www.wandb.com/), so the first step is to register a wandb account and get an API key, which we refer to as YOUR_WANDB_KEY.

# build the docker container
docker build -t invalid_action_masking:latest -f sharedmemory.Dockerfile .
# build docker run commands. replace `{YOUR_WANDB_KEY}` with your own
WANDB_KEY={YOUR_WANDB_KEY} python docker.py > docker.sh
# run experiments (96 in total)
# if you have limited computational resources, consider not running all of them at a time.
# in addition, notice the commands have --cpuset-cpus="0", --cpuset-cpus="1" for different runs
# to make sure each container is only using one core. By default I assume your machine has 40 cores,
# but feel free to modify the `cores` variable in `docker.py`
bash docker.sh
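
As a rough illustration of the core-pinning idea described in the comments above, the sketch below generates one `docker run` command per experiment and assigns cores round-robin. It is not the repository's `docker.py`: the helper names `build_docker_commands` and `batches`, the placeholder experiment commands, and passing the key to the container as `WANDB_API_KEY` are assumptions made for illustration.

```python
import shlex


def build_docker_commands(experiment_cmds, cores=40, wandb_key="YOUR_WANDB_KEY",
                          image="invalid_action_masking:latest"):
    """Return one `docker run` command string per experiment.

    Each container is pinned to a single core with --cpuset-cpus, cycling
    through `cores` cores (hypothetical helper, not the repo's docker.py).
    """
    commands = []
    for i, cmd in enumerate(experiment_cmds):
        core = i % cores  # round-robin core assignment
        commands.append(
            f'docker run -d --cpuset-cpus="{core}" '
            f"-e WANDB_API_KEY={shlex.quote(wandb_key)} "
            f"{image} bash -c {shlex.quote(cmd)}"
        )
    return commands


def batches(commands, size):
    """Split the command list into chunks so they need not all run at once."""
    return [commands[i:i + size] for i in range(0, len(commands), size)]


# Placeholder experiment commands; the real ones come from the repo's scripts.
experiments = [f"python experiment.py --seed {s}" for s in range(96)]
cmds = build_docker_commands(experiments, cores=40)
print(cmds[0])
print(len(batches(cmds, 40)), "batches")
```

On a machine with fewer cores, lowering `cores` and launching one batch at a time keeps each container on its own core.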

Steps to reproduce the figures

Record your wandb username, which we will refer to as YOUR_WANDB_ENTITY

cd plots
WANDB_ENTITY={YOUR_WANDB_ENTITY} python episode_reward.py
WANDB_ENTITY={YOUR_WANDB_ENTITY} python approx_kl.py

These commands should reproduce the PDFs in the plots directory that are attached to the repo.

Reproduction without WANDB

Although reproduction without wandb is possible, it would take a significant amount of effort to log the metrics properly and redo the plotting, so at this time we do not provide instructions for it. Note that it is possible to use wandb locally by following https://docs.wandb.com/self-hosted/local.

If you have an issue reproducing the results

We have tested these scripts, but there may still be a bug, or we may be assuming something specific about the environment. If you cannot reproduce our results, please file an issue and we will address it as soon as the double-blind review is over.

Comments

Bump pillow from 8.3.1 to 9.0.1
Bumps pillow from 8.3.1 to 9.0.1.

Release notes

Sourced from pillow's releases.

9.0.1

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.1.html

Changes

In show_file, use os.remove to remove temporary images. CVE-2022-24303 #6010 [@radarhere, @hugovk]

Restrict builtins within lambdas for ImageMath.eval. CVE-2022-22817 #6009 [radarhere]

9.0.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

Changes

Restrict builtins for ImageMath.eval() #5923 [@radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [@radarhere]

Fixed ImagePath.Path array handling #5920 [@radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [@radarhere]

Removed redundant part of condition #5915 [@radarhere]

Explicitly enable strip chopping for large uncompressed TIFFs #5517 [@kmilos]

Use the Windows method to get TCL functions on Cygwin #5807 [@DWesl]

Changed error type to allow for incremental WebP parsing #5404 [@radarhere]

Improved I;16 operations on big endian #5901 [@radarhere]

Ensure that BMP pixel data offset does not ignore palette #5899 [@radarhere]

Limit quantized palette to number of colors #5879 [@radarhere]

Use latin1 encoding to decode bytes #5870 [@radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [@radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [@radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [@radarhere]

Added rounding when converting P and PA #5824 [@radarhere]

Improved putdata() documentation and data handling #5910 [@radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [@radarhere]

Image.NONE is only used for resampling and dithers #5908 [@radarhere]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [@radarhere]

Add Tidelift alignment action and badge #5763 [@aclark4life]

Replaced further direct invocations of setup.py #5906 [@radarhere]

Added ImageShow support for xdg-open #5897 [@m-shinder]

Fixed typo #5902 [@radarhere]

Switched from deprecated "setup.py install" to "pip install ." #5896 [@radarhere]

Support 16-bit grayscale ImageQt conversion #5856 [@cmbruns]

Fixed raising OSError in _safe_read when size is greater than SAFEBLOCK #5872 [@radarhere]

Convert subsequent GIF frames to RGB or RGBA #5857 [@radarhere]

WebP: Fix memory leak during decoding on failure #5798 [@ilai-deutel]

Do not prematurely return in ImageFile when saving to stdout #5665 [@infmagic2047]

Added support for top right and bottom right TGA orientations #5829 [@radarhere]

Corrected ICNS file length in header #5845 [@radarhere]

Block tile TIFF tags when saving #5839 [@radarhere]

Added line width argument to ImageDraw polygon #5694 [@radarhere]

Do not redeclare class each time when converting to NumPy #5844 [@radarhere]

Only prevent repeated polygon pixels when drawing with transparency #5835 [@radarhere]

... (truncated)

Changelog

Sourced from pillow's changelog.

9.0.1 (2022-02-03)

In show_file, use os.remove to remove temporary images. CVE-2022-24303 #6010 [radarhere, hugovk]

Restrict builtins within lambdas for ImageMath.eval. CVE-2022-22817 #6009 [radarhere]

9.0.0 (2022-01-02)

Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

Improved I;16 operations on big endian #5901 [radarhere]

Limit quantized palette to number of colors #5879 [radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

Added rounding when converting P and PA #5824 [radarhere]

Improved putdata() documentation and data handling #5910 [radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

... (truncated)

Commits

6deac9e 9.0.1 version bump

c04d812 Update CHANGES.rst [ci skip]

4fabec3 Added release notes for 9.0.1

02affaa Added delay after opening image with xdg-open

ca0b585 Updated formatting

427221e In show_file, use os.remove to remove temporary images

c930be0 Restrict builtins within lambdas for ImageMath.eval

75b69dd Dont need to pin for GHA

cd938a7 Autolink CWE numbers with sphinx-issues

2e9c461 Add CVE IDs

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1
Bump pillow from 8.3.1 to 9.0.0
Bumps pillow from 8.3.1 to 9.0.0.

Release notes

Sourced from pillow's releases.

9.0.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

Changes

Restrict builtins for ImageMath.eval() #5923 [@radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [@radarhere]

Fixed ImagePath.Path array handling #5920 [@radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [@radarhere]

Removed redundant part of condition #5915 [@radarhere]

Explicitly enable strip chopping for large uncompressed TIFFs #5517 [@kmilos]

Use the Windows method to get TCL functions on Cygwin #5807 [@DWesl]

Changed error type to allow for incremental WebP parsing #5404 [@radarhere]

Improved I;16 operations on big endian #5901 [@radarhere]

Ensure that BMP pixel data offset does not ignore palette #5899 [@radarhere]

Limit quantized palette to number of colors #5879 [@radarhere]

Use latin1 encoding to decode bytes #5870 [@radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [@radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [@radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [@radarhere]

Added rounding when converting P and PA #5824 [@radarhere]

Improved putdata() documentation and data handling #5910 [@radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [@radarhere]

Image.NONE is only used for resampling and dithers #5908 [@radarhere]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [@radarhere]

Add Tidelift alignment action and badge #5763 [@aclark4life]

Replaced further direct invocations of setup.py #5906 [@radarhere]

Added ImageShow support for xdg-open #5897 [@m-shinder]

Fixed typo #5902 [@radarhere]

Switched from deprecated "setup.py install" to "pip install ." #5896 [@radarhere]

Support 16-bit grayscale ImageQt conversion #5856 [@cmbruns]

Fixed raising OSError in _safe_read when size is greater than SAFEBLOCK #5872 [@radarhere]

Convert subsequent GIF frames to RGB or RGBA #5857 [@radarhere]

WebP: Fix memory leak during decoding on failure #5798 [@ilai-deutel]

Do not prematurely return in ImageFile when saving to stdout #5665 [@infmagic2047]

Added support for top right and bottom right TGA orientations #5829 [@radarhere]

Corrected ICNS file length in header #5845 [@radarhere]

Block tile TIFF tags when saving #5839 [@radarhere]

Added line width argument to ImageDraw polygon #5694 [@radarhere]

Do not redeclare class each time when converting to NumPy #5844 [@radarhere]

Only prevent repeated polygon pixels when drawing with transparency #5835 [@radarhere]

Fix pushes_fd method signature #5833 [@hoodmane]

Add support for pickling TrueType fonts #5826 [@hugovk]

Only prefer command line tools SDK on macOS over default MacOSX SDK #5828 [@radarhere]

Fix compilation on 64-bit Termux #5793 [@landfillbaby]

Replace 'setup.py sdist' with '-m build --sdist' #5785 [@hugovk]

Use declarative package configuration #5784 [@hugovk]

Use title for display in ImageShow #5788 [@radarhere]

Fix for PyQt6 #5775 [@hugovk]

... (truncated)

Changelog

Sourced from pillow's changelog.

9.0.0 (2022-01-02)

Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

Improved I;16 operations on big endian #5901 [radarhere]

Limit quantized palette to number of colors #5879 [radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

Added rounding when converting P and PA #5824 [radarhere]

Improved putdata() documentation and data handling #5910 [radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

Added ImageShow support for xdg-open #5897 [m-shinder, radarhere]

Support 16-bit grayscale ImageQt conversion #5856 [cmbruns, radarhere]

Convert subsequent GIF frames to RGB or RGBA #5857 [radarhere]

... (truncated)

Commits

82541b6 9.0.0 version bump

cae5ac4 Merge pull request #5924 from radarhere/cves

ed4cf78 CVEs TBD

d7f60d1 Merge pull request #5923 from radarhere/imagemath_eval

8531b01 Restrict builtins for ImageMath.eval

1efb1d9 Merge pull request #5922 from radarhere/releasenotes

f6c7871 Added release notes for #5919, #5920 and #5921

032d2dc Update CHANGES.rst [ci skip]

baae9ec Merge pull request #5921 from radarhere/jpeg_eoi

1059eb5 If appended EOI did not work, do not keep trying

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
Bump pillow from 8.3.1 to 8.3.2
Bumps pillow from 8.3.1 to 8.3.2.

Release notes

Sourced from pillow's releases.

8.3.2

https://pillow.readthedocs.io/en/stable/releasenotes/8.3.2.html

Security

CVE-2021-23437 Raise ValueError if color specifier is too long [hugovk, radarhere]

Fix 6-byte OOB read in FliDecode [wiredfool]

Python 3.10 wheels

Add support for Python 3.10 #5569, #5570 [hugovk, radarhere]

Fixed regressions

Ensure TIFF RowsPerStrip is multiple of 8 for JPEG compression #5588 [kmilos, radarhere]

Updates for ImagePalette channel order #5599 [radarhere]

Hide FriBiDi shim symbols to avoid conflict with real FriBiDi library #5651 [nulano]

Changelog

Sourced from pillow's changelog.

8.3.2 (2021-09-02)

CVE-2021-23437 Raise ValueError if color specifier is too long [hugovk, radarhere]

Fix 6-byte OOB read in FliDecode [wiredfool]

Add support for Python 3.10 #5569, #5570 [hugovk, radarhere]

Ensure TIFF RowsPerStrip is multiple of 8 for JPEG compression #5588 [kmilos, radarhere]

Updates for ImagePalette channel order #5599 [radarhere]

Hide FriBiDi shim symbols to avoid conflict with real FriBiDi library #5651 [nulano]

Commits

8013f13 8.3.2 version bump

23c7ca8 Update CHANGES.rst

8450366 Update release notes

a0afe89 Update test case

9e08eb8 Raise ValueError if color specifier is too long

bd5cf7d FLI tests for Oss-fuzz crash.

94a0cf1 Fix 6-byte OOB read in FliDecode

cece64f Add 8.3.2 (2021-09-02) [CI skip]

e422386 Add release notes for Pillow 8.3.2

08dcbb8 Pillow 8.3.2 supports Python 3.10 [ci skip]

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
Bump oauthlib from 3.1.1 to 3.2.1
Bumps oauthlib from 3.1.1 to 3.2.1.

Release notes

Sourced from oauthlib's releases.

3.2.1

In short

OAuth2.0 Provider:

#803 : Metadata endpoint support of non-HTTPS

CVE-2022-36087

OAuth1.0:

#818 : Allow IPv6 being parsed by signature

General:

Improved and fixed documentation warnings.

Cosmetic changes based on isort

What's Changed

add missing slots to TokenBase by @ariebovenberg in oauthlib/oauthlib#804

Add CORS support for Refresh Token Grant. by @luhn in oauthlib/oauthlib#806

GitHub Action to lint Python code by @cclauss in oauthlib/oauthlib#797

Docs: fix Sphinx warnings for better ReadTheDocs generation by @JonathanHuot in oauthlib/oauthlib#807

Allow non-HTTPS issuer when OAUTHLIB_INSECURE_TRANSPORT. by @luhn in oauthlib/oauthlib#803

chore: fix typo in test by @tamanobi in oauthlib/oauthlib#816

Fix typo in server.rst by @NemanjaT in oauthlib/oauthlib#819

Fixed isort imports by @dasm in oauthlib/oauthlib#820

docs: Fix a few typos by @timgates42 in oauthlib/oauthlib#822

docs: fix typos by @kianmeng in oauthlib/oauthlib#823

New Contributors

@ariebovenberg made their first contribution in oauthlib/oauthlib#804

@tamanobi made their first contribution in oauthlib/oauthlib#816

@NemanjaT made their first contribution in oauthlib/oauthlib#819

@kianmeng made their first contribution in oauthlib/oauthlib#823

Full Changelog: https://github.com/oauthlib/oauthlib/compare/v3.2.0...v3.2.1

3.2.0

Changelog

OAuth2.0 Client:

#795: Add Device Authorization Flow for Web Application

#786: Add PKCE support for Client

#783: Fallback to none in case of wrong expires_at format.

OAuth2.0 Provider:

#790: Add support for CORS to metadata endpoint.

#791: Add support for CORS to token endpoint.

#787: Remove comma after Bearer in WWW-Authenticate

OAuth2.0 Provider - OIDC:

#755: Call save_token in Hybrid code flow

#751: OIDC add support of refreshing ID Tokens with refresh_id_token

#751: The RefreshTokenGrant modifiers now take the same arguments as the AuthorizationCodeGrant modifiers (token, token_handler, request).

... (truncated)

Changelog

Sourced from oauthlib's changelog.

3.2.1 (2022-09-09)

OAuth2.0 Provider:

#803: Metadata endpoint support of non-HTTPS

CVE-2022-36087

OAuth1.0:

#818: Allow IPv6 being parsed by signature

General:

Improved and fixed documentation warnings.

Cosmetic changes based on isort

3.2.0 (2022-01-29)

OAuth2.0 Client:

#795: Add Device Authorization Flow for Web Application

#786: Add PKCE support for Client

#783: Fallback to none in case of wrong expires_at format.

OAuth2.0 Provider:

#790: Add support for CORS to metadata endpoint.

#791: Add support for CORS to token endpoint.

#787: Remove comma after Bearer in WWW-Authenticate

OAuth2.0 Provider - OIDC:

#755: Call save_token in Hybrid code flow

#751: OIDC add support of refreshing ID Tokens with refresh_id_token

#751: The RefreshTokenGrant modifiers now take the same arguments as the AuthorizationCodeGrant modifiers (token, token_handler, request).

General:

Added Python 3.9, 3.10, 3.11

Improve Travis & Coverage

Commits

88bb156 Updated date and authors

1a45d97 Prepare 3.2.1 release

0adbbe1 docs: fix typos

6569ec3 docs: Fix a few typos

bdc486e Fixed isort imports

7db45bd Fix typo in server.rst

b14ad85 chore: s/bode_code_verifier/body_code_verifier/g

b123283 Allow non-HTTPS issuer when OAUTHLIB_INSECURE_TRANSPORT. (#803)

2f887b5 Docs: fix Sphinx warnings for better ReadTheDocs generation (#807)

d4bafd9 Merge pull request #797 from cclauss/patch-2

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

[Bug?] An error when installing gym_microrts

Hi,

Thanks for the great work. I am interested in this paper and am trying to reproduce the experiments to understand it better.

Installing gym_microrts fails with the error Multiple top-level packages discovered in a flat-layout: ['maps', 'docker', 'experiments', 'gym_microrts']. I guess this is caused by setuptools' automatic package discovery, since the packages are not explicitly specified in setup.py.

This workaround works for me: adding py_modules=['gym_microrts'] in the setup.py.

Specifically, I modified .../envs/act_mask/src/gym-microrts/setup.py to:

from setuptools import setup

setup(name='gym_microrts',
      version='0.1.0',
      install_requires=['gym', 'dacite', 'jPype1', 'hilbertcurve'],
      py_modules=['gym_microrts']
)

Previously, it was:

from setuptools import setup

setup(name='gym_microrts',
      version='0.1.0',
      install_requires=['gym', 'dacite', 'jPype1', 'hilbertcurve']
)

The environment specifications:

anaconda: v4.11.0
Python: 3.8

Error details:

$ poetry install
Installing dependencies from lock file

Package operations: 0 installs, 32 updates, 1 removal

  • Removing py (1.10.0)
  • Updating certifi (2021.10.8 -> 2021.5.30)
  • Updating charset-normalizer (2.0.7 -> 2.0.1)
  • Updating idna (3.3 -> 3.2)
  • Updating pytz (2021.3 -> 2021.1)
  • Updating urllib3 (1.26.7 -> 1.26.6)
  • Updating cachetools (4.2.4 -> 4.2.2)
  • Updating click (8.0.3 -> 8.0.1)
  • Updating jinja2 (3.0.2 -> 3.0.1)
  • Updating numpy (1.21.4 -> 1.21.0)
  • Updating packaging (21.2 -> 21.0)
  • Updating smmap (5.0.0 -> 4.0.0)
  • Updating typing-extensions (3.10.0.2 -> 3.10.0.0)
  • Updating cloudpickle (2.0.0 -> 1.6.0)
  • Updating cycler (0.11.0 -> 0.10.0)
  • Updating gitdb (4.0.9 -> 4.0.7)
  • Updating google-auth (2.3.3 -> 1.32.1)
  • Updating kiwisolver (1.3.2 -> 1.3.1)
  • Updating pillow (8.4.0 -> 8.3.1)
  • Updating absl-py (0.15.0 -> 0.13.0)
  • Updating configparser (5.1.0 -> 5.0.2)
  • Updating google-auth-oauthlib (0.4.6 -> 0.4.4)
  • Updating grpcio (1.41.1 -> 1.38.1)
  • Updating gym (0.21.0 -> 0.17.3)
  • Updating matplotlib (3.4.3 -> 3.4.2)
  • Updating pandas (1.3.4 -> 1.3.0)
  • Updating protobuf (3.19.1 -> 3.17.3)
  • Updating sentry-sdk (1.4.3 -> 1.4.2)
  • Updating werkzeug (2.0.2 -> 2.0.1)
  • Updating tensorboard (2.7.0 -> 2.5.0)
  • Updating wandb (0.12.6 -> 0.12.2)
  • Updating cleanrl (0.0.1 35694d2 -> 0.0.1 35694d2)
  • Updating gym-microrts (0.0.0 /home/***/gym-microrts -> 0.1.0 b0cabba): Failed

  EnvCommandError

  Command ['/home/***/anaconda3/envs/act_mask/bin/pip', 'install', '--no-deps', '-U', '-e', '/home/***/anaconda3/envs/act_mask/src/gym-microrts'] errored with the following return code 1, and output: 
  Obtaining file:///home/***/anaconda3/envs/act_mask/src/gym-microrts
    Preparing metadata (setup.py): started
    Preparing metadata (setup.py): finished with status 'error'
    error: subprocess-exited-with-error
    
    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [14 lines of output]
        error: Multiple top-level packages discovered in a flat-layout: ['maps', 'docker', 'experiments', 'gym_microrts'].
        
        To avoid accidental inclusion of unwanted files or directories,
        setuptools will not proceed with this build.
        
        If you are trying to create a single distribution with multiple packages
        on purpose, you should not rely on automatic discovery.
        Instead, consider the following options:
        
        1. set up custom discovery (`find` directive with `include` or `exclude`)
        2. use a `src-layout`
        3. explicitly set `py_modules` or `packages` with a list of names
        
        To find more information, look for "package discovery" on setuptools docs.
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed
  
  × Encountered error while generating package metadata.
  ╰─> See above for output.
  
  note: This is an issue with the package mentioned above, not pip.
  hint: See above for details.
  

  at ~/.poetry/lib/poetry/utils/env.py:1195 in _run
      1191│                 output = subprocess.check_output(
      1192│                     cmd, stderr=subprocess.STDOUT, **kwargs
      1193│                 )
      1194│         except CalledProcessError as e:
    → 1195│             raise EnvCommandError(e, input=input_)
      1196│ 
      1197│         return decode(output)
      1198│ 
      1199│     def execute(self, bin, *args, **kwargs):

opened by cameron-chen 0
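
The setuptools error above lists explicitly setting `packages` as one way out. A minimal sketch of that option follows. It assumes `gym_microrts` is a package directory (which is what `packages` is for, whereas `py_modules` names single-file modules) and that any subpackages would have to be added to the list as well. The helper name `build_setup_kwargs` is hypothetical and only keeps the example importable without triggering an install.

```python
def build_setup_kwargs():
    """Keyword arguments for setuptools.setup() with an explicit package list.

    Listing the package by name disables automatic discovery, so the sibling
    directories (maps, docker, experiments) are never considered.
    """
    return dict(
        name="gym_microrts",
        version="0.1.0",
        install_requires=["gym", "dacite", "jPype1", "hilbertcurve"],
        # Assumption: subpackages, if any, must be listed here too.
        packages=["gym_microrts"],
    )


# In setup.py this would be used as:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
kwargs = build_setup_kwargs()
print(kwargs["packages"])
```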

Masking removed still behaves to some extent

Hey,

I'm not sure if this is the proper way to ask you something, but I'll give it a try. According to your paper, invalid action masking still works quite well even when the mask is no longer used after training has finished. As far as I understood, you compared it against the method of giving a reward penalty for executing an invalid action.

In my case I am experiencing exactly the opposite. Training with invalid action masking produces unreasonable distributions, unlike training with the negative reward penalty. However, the negative reward penalty is still not satisfying. This doesn't bother me too much, since I have access to the mask at inference time as well. Question: do you have any other insights into what the issue could be? I ask because training with invalid action masking accelerates training by a factor of 10. The training pipeline looks great, and the number of discrete actions is not "too big" (8 actions). Another thing that puzzles me is that you prove that the policy gradient is a valid gradient, so it must be correct for backpropagation.

Thanks in advance if you have any ideas or hints. Greetings, Fabian

opened by sycz00 0
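
For readers following this thread, the masking under discussion can be sketched in a few lines. The example below assumes the common formulation in which the logits of invalid actions are replaced by a large negative constant before the softmax. The logits, the mask, and the constant `-1e8` are illustrative values, not taken from the poster's setup. It also evaluates the same logits without the mask for comparison.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(logits, mask, neg=-1e8):
    """Softmax after replacing logits of invalid actions (mask == 0) with `neg`."""
    masked = [x if valid else neg for x, valid in zip(logits, mask)]
    return softmax(masked)


# Eight discrete actions, as in the question; the values are made up.
logits = [1.0, 0.5, -0.2, 2.0, 0.0, 0.3, -1.0, 0.8]
mask = [1, 0, 1, 0, 1, 1, 0, 1]  # 1 = valid, 0 = invalid

with_mask = masked_softmax(logits, mask)
without_mask = softmax(logits)

print([round(p, 3) for p in with_mask])
print([round(p, 3) for p in without_mask])
```

In this made-up example the highest logit belongs to an invalid action, so removing the mask shifts most of the probability mass onto it. That is one way a policy trained with masking may look unreasonable once the mask is dropped.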
Bump numpy from 1.21.0 to 1.22.0
Bumps numpy from 1.21.0 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)
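
As a rough illustration of the migration described in the note above, the sketch below uses `pickle.loads` in place of `numpy.loads`, and `numpy.genfromtxt` with `usemask` in place of `mafromtxt` and `ndfromtxt`. The sample array and CSV text are hypothetical and are not part of the release notes.

```python
import io
import pickle

import numpy as np

arr = np.arange(6).reshape(2, 3)

# In place of the removed numpy.loads: call pickle.loads directly.
restored = pickle.loads(pickle.dumps(arr))

# Hypothetical CSV text with one missing field in the second row.
text = "1,2,3\n4,,6"

# In place of mafromtxt: genfromtxt with usemask=True returns a masked array.
masked = np.genfromtxt(io.StringIO(text), delimiter=",", usemask=True)

# In place of ndfromtxt: genfromtxt with usemask=False (the default);
# the missing float field is filled with nan.
plain = np.genfromtxt(io.StringIO(text), delimiter=",", usemask=False)

print(type(masked).__name__, masked.shape)
print(plain)
```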

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Bump ipython from 7.28.0 to 7.31.1
Bumps ipython from 7.28.0 to 7.31.1.

Commits

e321e76 release 7.31.1

67ca2b3 Merge pull request from GHSA-pq7m-3gw7-gq5x

2794330 back to dev

be343e7 release 7.31.0

0fcf2c4 Merge pull request #13428 from meeseeksmachine/auto-backport-of-pr-13427-on-7.x

b8db9b1 Backport PR #13427: wn 731

7f253dc Merge pull request #13412 from bnavigator/backport-inspect

4f26796 fix xxlimited_35 import name

77ca4a6 don't run nose-based iptest on py310, only pytest

533e509 back to decorator skip

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Bump ujson from 4.2.0 to 5.1.0
Bumps ujson from 4.2.0 to 5.1.0.

Release notes

Sourced from ujson's releases.

5.1.0

Changed

Strip debugging symbols from Linux binaries (#493) @bwoodsend

5.0.0

Added

Use cibuildwheel to build wheels (#491) @bwoodsend

Removed

Drop support for soon-EOL Python 3.6 (#490) @hugovk

Fixed

Install Twine to upload to PyPI (#492) @hugovk

4.3.0

Added

Enable Windows on ARM64 target (#488) @nsait-linaro

Commits

682c660 Merge pull request #493 from bwoodsend/strip-binaries

c1d5b6d [pre-commit.ci] auto fixes from pre-commit.com hooks

b9275f7 Strip debugging symbols from Linux binaries.

e3ccc5a Merge pull request #492 from hugovk/deploy-twine

243d49b Install Twine to upload to PyPI

269621b Merge pull request #490 from hugovk/rm-3.6

cccde3f Drop support for EOL Python 3.6

b55049f Merge pull request #491 from bwoodsend/switch-to-ci-build-wheels

04286a6 Drop wheels for Python 3.6. (#490)

ab32d48 CI/CD: Ensure that sdists are uploaded last.

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

Owner

Costa Huang

Computer Science Ph.D student at Drexel University researching Game Artificial Intelligence

GitHub

code for `Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation`

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation (CVPR 2021) Introduction PBR is a conceptually simple yet effective

143 Jan 5, 2023

A Closer Look at Structured Pruning for Neural Network Compression

A Closer Look at Structured Pruning for Neural Network Compression Code used to reproduce experiments in https://arxiv.org/abs/1810.04622. To prune, w

140 Dec 5, 2022

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation Official PyTorch implementation for the paper Look

20 Nov 24, 2022

Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

V-MPO Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pyt

9 Jun 6, 2022

This project provides a stock market environment using OpenGym with Deep Q-learning and Policy Gradient.

Stock Trading Market OpenAI Gym Environment with Deep Reinforcement Learning using Keras Overview This project provides a general environment for stoc

769 Dec 25, 2022

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

This is the original implementation of our paper, A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem (arXiv:1706.1

1.5k Dec 29, 2022

Trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI

Introduction This script trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI. In order to run this

0 Jan 2, 2022

A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Intro PyTorch implementation of Learning to learn by gradient descent by gradient descent. Run python main.py TODO Initial implementation Toy data LST

300 Dec 11, 2022

Official implementation of GraphMask as presented in our paper Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.

GraphMask This repository contains an implementation of GraphMask, the interpretability technique for graph neural networks presented in our ICLR 2021

29 Sep 2, 2022

Code & Data for the Paper "Time Masking for Temporal Language Models", WSDM 2022

Time Masking for Temporal Language Models This repository provides a reference implementation of the paper: Time Masking for Temporal Language Models

12 Jan 6, 2023

U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

U-Net Implementation By Christopher Ley This is my interpretation and implementation of the famous paper "U-Net: Convolutional Networks for Biomedical

1 Jan 6, 2022

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple

190 Dec 27, 2022

Allows including an action inside another action (by preprocessing the Yaml file). This is how composite actions should have worked.

actions-includes Allows including an action inside another action (by preprocessing the Yaml file). Instead of using uses or run in your action step,

70 Nov 4, 2022

Official implementation of ACTION-Net: Multipath Excitation for Action Recognition (CVPR'21).

ACTION-Net Official implementation of ACTION-Net: Multipath Excitation for Action Recognition (CVPR'21). Getting Started EgoGesture data folder struct

171 Dec 26, 2022

Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Learning-Action-Completeness-from-Points Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal A

67 Jan 3, 2023

Human Action Controller - A human action controller running on different platforms.

Human Action Controller (HAC) Goal A human action controller running on different platforms. Fun Easy-to-use Accurate Anywhere Fun Examples Mouse Cont

27 Jul 20, 2022

The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

20 Jan 3, 2023

Pytorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.

Off-Policy Multi-Agent Reinforcement Learning (MARL) Algorithms This repository contains implementations of various off-policy multi-agent reinforceme

183 Dec 28, 2022

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

14.5k Jan 8, 2023
