Train an RL agent to execute natural language instructions in a 3D Environment (PyTorch)

Overview

Gated-Attention Architectures for Task-Oriented Language Grounding

This is a PyTorch implementation of the AAAI-18 paper:

Gated-Attention Architectures for Task-Oriented Language Grounding
Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, Ruslan Salakhutdinov
Carnegie Mellon University

Project Website: https://sites.google.com/view/gated-attention

(example GIF)

This repository contains:

  • Code for training an A3C-LSTM agent using Gated-Attention (see the sketch below)
  • Code for Doom-based language grounding environment
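
In the Gated-Attention unit, the instruction embedding is turned into a sigmoid gating vector that scales the convolutional image features channel-wise before the fused representation is passed to the policy. Below is a minimal PyTorch sketch of that idea; the layer sizes and names are illustrative assumptions, not the exact modules in this repository's models.py.

import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    """Minimal sketch of a gated-attention unit (illustrative sizes)."""

    def __init__(self, instr_dim=256, num_channels=64):
        super().__init__()
        # Map the instruction embedding to one gate value per feature map.
        self.gate = nn.Linear(instr_dim, num_channels)

    def forward(self, image_feats, instr_embedding):
        # image_feats: (batch, channels, H, W) from the image CNN
        # instr_embedding: (batch, instr_dim) from the instruction GRU
        gates = torch.sigmoid(self.gate(instr_embedding))   # (batch, channels)
        gates = gates.unsqueeze(2).unsqueeze(3)              # (batch, channels, 1, 1)
        return image_feats * gates                           # channel-wise gating

# Example with random tensors:
ga = GatedAttention()
fused = ga(torch.randn(2, 64, 8, 17), torch.randn(2, 256))
print(fused.shape)  # torch.Size([2, 64, 8, 17])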

Dependencies

  • PyTorch
  • ViZDoom (see Acknowledgements)

(We recommend using Anaconda)

Usage

Using the Environment

For running a random agent:

python env_test.py

To play in the environment:

python env_test.py --interactive 1

To change the difficulty of the environment (easy/medium/hard):

python env_test.py -d easy

Training Gated-Attention A3C-LSTM agent

To train an A3C-LSTM agent with 32 threads:

python a3c_main.py --num-processes 32 --evaluate 0

The code will save the best model at ./saved/model_best.
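
The evaluation loop tracks the best average reward seen so far and checkpoints the model when it improves. A minimal sketch of that bookkeeping (assuming a plain torch.save of the state dict; the exact logic in a3c_main.py may differ):

import torch

def save_if_best(model, avg_reward, best_reward, path="./saved/model_best"):
    """Checkpoint the model whenever the evaluation reward improves (sketch)."""
    if avg_reward > best_reward:
        torch.save(model.state_dict(), path)
        return avg_reward          # new best reward
    return best_reward             # keep the previous best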

To test the pre-trained model for Multitask Generalization:

python a3c_main.py --evaluate 1 --load saved/pretrained_model

To test the pre-trained model for Zero-shot Task Generalization:

python a3c_main.py --evaluate 2 --load saved/pretrained_model

To visualize the model while testing, add '--visualize 1':

python a3c_main.py --evaluate 2 --load saved/pretrained_model --visualize 1

To test the trained model, use --load saved/model_best in the above commands.

All arguments for a3c_main.py:

  -h, --help            show this help message and exit
  -l MAX_EPISODE_LENGTH, --max-episode-length MAX_EPISODE_LENGTH
                        maximum length of an episode (default: 30)
  -d DIFFICULTY, --difficulty DIFFICULTY
                        Difficulty of the environment, "easy", "medium" or
                        "hard" (default: hard)
  --living-reward LIVING_REWARD
                        Default reward at each time step (default: 0, change
                        to -0.005 to encourage shorter paths)
  --frame-width FRAME_WIDTH
                        Frame width (default: 300)
  --frame-height FRAME_HEIGHT
                        Frame height (default: 168)
  -v VISUALIZE, --visualize VISUALIZE
                        Visualize the environment (default: 0, use 0 for
                        faster training)
  --sleep SLEEP         Sleep between frames for better visualization
                        (default: 0)
  --scenario-path SCENARIO_PATH
                        Doom scenario file to load (default: maps/room.wad)
  --interactive INTERACTIVE
                        Interactive mode enables human to play (default: 0)
  --all-instr-file ALL_INSTR_FILE
                        All instructions file (default:
                        data/instructions_all.json)
  --train-instr-file TRAIN_INSTR_FILE
                        Train instructions file (default:
                        data/instructions_train.json)
  --test-instr-file TEST_INSTR_FILE
                        Test instructions file (default:
                        data/instructions_test.json)
  --object-size-file OBJECT_SIZE_FILE
                        Object size file (default: data/object_sizes.txt)
  --lr LR               learning rate (default: 0.001)
  --gamma G             discount factor for rewards (default: 0.99)
  --tau T               parameter for GAE (default: 1.00)
  --seed S              random seed (default: 1)
  -n N, --num-processes N
                        how many training processes to use (default: 4)
  --num-steps NS        number of forward steps in A3C (default: 20)
  --load LOAD           model path to load, 0 to not reload (default: 0)
  -e EVALUATE, --evaluate EVALUATE
                        0:Train, 1:Evaluate MultiTask Generalization
                        2:Evaluate Zero-shot Generalization (default: 0)
  --dump-location DUMP_LOCATION
                        path to dump models and log (default: ./saved/)
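
The --gamma and --tau flags are the discount factor and the GAE parameter used when turning each --num-steps rollout into advantage estimates for the A3C update. A generic sketch of that computation (not the repository's exact code):

def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, tau=1.00):
    """Generalized Advantage Estimation over a short rollout (sketch).

    rewards: list of rewards r_t for t = 0..T-1
    values:  list of value estimates V(s_t) for t = 0..T-1
    bootstrap_value: V(s_T) used to bootstrap after the last step
    """
    values = list(values) + [bootstrap_value]
    gae = 0.0
    advantages = []
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * tau * gae
        advantages.insert(0, gae)
    return advantages

# With tau = 1.0 this reduces to discounted returns minus the value baseline.
print(gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.5], 0.0))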

Demonstration videos:

Multitask Generalization video: https://www.youtube.com/watch?v=YJG8fwkv7gA

Zero-shot Task Generalization video: https://www.youtube.com/watch?v=JziCKsLrudE

Different stages of training: https://www.youtube.com/watch?v=o_G6was03N0

Cite as

Chaplot, D.S., Sathyendra, K.M., Pasumarthi, R.K., Rajagopal, D. and Salakhutdinov, R., 2017. Gated-Attention Architectures for Task-Oriented Language Grounding. arXiv preprint arXiv:1706.07230. (PDF)

Bibtex:

@article{chaplot2017gated,
  title={Gated-Attention Architectures for Task-Oriented Language Grounding},
  author={Chaplot, Devendra Singh and Sathyendra, Kanthashree Mysore and Pasumarthi, Rama Kumar and Rajagopal, Dheeraj and Salakhutdinov, Ruslan},
  journal={arXiv preprint arXiv:1706.07230},
  year={2017}
}

Acknowledgements

This repository uses ViZDoom API (https://github.com/mwydmuch/ViZDoom) and parts of the code from the API. The implementation of A3C is borrowed from https://github.com/ikostrikov/pytorch-a3c. The poisson-disc code is borrowed from https://github.com/IHautaI/poisson-disc.

Comments
  • Issue in reproducing results


    Hi! Thank you for providing the code. I am facing issues reproducing the results from the paper. Starting training from the pre-trained model provided, avg accuracy scores are low. I am pasting the log:

    DeepRL-Grounding-master$ python a3c_main.py --num-processes 16 --evaluate 0 --load saved/pretrained_model --difficulty easy 2 Loading model ... saved/pretrained_model 9 Loading model ... saved/pretrained_model Loading model ... saved/pretrained_model 14 Loading model ... saved/pretrained_model 5 Loading model ... saved/pretrained_model 12 Loading model ... saved/pretrained_model 11 Loading model ... saved/pretrained_model 6 Loading model ... saved/pretrained_model 8 Loading model ... saved/pretrained_model 0 Loading model ... saved/pretrained_model 10 Loading model ... saved/pretrained_model 15 Loading model ... saved/pretrained_model 1 Loading model ... saved/pretrained_model 13 Loading model ... saved/pretrained_model 4 Loading model ... saved/pretrained_model 3 Loading model ... saved/pretrained_model 7 Loading model ... saved/pretrained_model Time 00h 39m 33s, Avg Reward 0.392, Avg Accuracy 0.42, Avg Ep length 19.9, Best Reward 0.0 Time 01h 21m 19s, Avg Reward 0.26, Avg Accuracy 0.3, Avg Ep length 21.66, Best Reward 0.392 Time 02h 04m 29s, Avg Reward 0.324, Avg Accuracy 0.34, Avg Ep length 21.9, Best Reward 0.392 Time 02h 48m 33s, Avg Reward 0.308, Avg Accuracy 0.32, Avg Ep length 22.92, Best Reward 0.392 Time 03h 29m 33s, Avg Reward 0.376, Avg Accuracy 0.38, Avg Ep length 21.22, Best Reward 0.392 Time 04h 14m 27s, Avg Reward 0.288, Avg Accuracy 0.3, Avg Ep length 23.64, Best Reward 0.392 Time 04h 58m 49s, Avg Reward 0.228, Avg Accuracy 0.24, Avg Ep length 23.74, Best Reward 0.392 Time 05h 41m 42s, Avg Reward 0.276, Avg Accuracy 0.28, Avg Ep length 23.64, Best Reward 0.392 Time 06h 25m 22s, Avg Reward 0.336, Avg Accuracy 0.34, Avg Ep length 23.26, Best Reward 0.392 Time 07h 06m 41s, Avg Reward 0.376, Avg Accuracy 0.38, Avg Ep length 21.96, Best Reward 0.392 Time 07h 47m 06s, Avg Reward 0.416, Avg Accuracy 0.42, Avg Ep length 21.44, Best Reward 0.392 Time 08h 27m 26s, Avg Reward 0.38, Avg Accuracy 0.38, Avg Ep length 22.06, Best Reward 0.416


    When training the model from scratch in easy mode with -0.005 living cost and 0 living cost, the log was:

    python a3c_main.py --num-processes 16 --evaluate 0 --difficulty easy Time 00h 11m 08s, Avg Reward 0.004, Avg Accuracy 0.16, Avg Ep length 7.98, Best Reward 0.0 Time 00h 19m 36s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.56, Best Reward 0.004 Time 00h 27m 11s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.68, Best Reward 0.004 Time 00h 33m 53s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.004 Time 00h 40m 24s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.04 Time 00h 47m 14s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.04 Time 00h 53m 58s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.04 Time 01h 00m 38s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 4.0, Best Reward 0.04 Time 01h 07m 26s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.04 Time 01h 14m 05s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.04 Time 01h 20m 46s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.0, Best Reward 0.04 Time 01h 28m 01s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.5, Best Reward 0.04 Time 01h 35m 43s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.94, Best Reward 0.04 Time 01h 43m 23s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.112 Time 01h 51m 28s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.0, Best Reward 0.112 Time 01h 58m 52s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.74, Best Reward 0.112 Time 02h 06m 37s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.98, Best Reward 0.112 Time 02h 14m 08s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.7, Best Reward 0.112 Time 02h 20m 59s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.04, Best Reward 0.112 Time 02h 28m 27s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.6, Best Reward 0.112 Time 02h 35m 23s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.26, Best Reward 0.112 Time 02h 42m 11s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.02, Best Reward 0.112 Time 02h 49m 04s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.16, Best Reward 0.112 Time 02h 57m 03s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.88, Best Reward 0.112
    Time 03h 04m 38s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.7, Best Reward 0.112 Time 03h 11m 32s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.18, Best Reward 0.112 Time 03h 18m 16s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.0, Best Reward 0.112 Time 03h 25m 05s, Avg Reward 0.16, Avg Accuracy 0.3, Avg Ep length 4.0, Best Reward 0.112 Time 03h 31m 42s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.16 Time 03h 38m 36s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.16 Time 03h 45m 30s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.0, Best Reward 0.16 Time 03h 52m 15s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.12, Best Reward 0.16 Time 03h 59m 01s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.16 Time 04h 05m 44s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.24, Best Reward 0.16 Time 04h 13m 43s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.0, Best Reward 0.16 Time 04h 21m 32s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.16 Time 04h 29m 13s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.16 Time 04h 37m 07s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 5.0, Best Reward 0.16 Time 04h 44m 51s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.86, Best Reward 0.16 Time 04h 52m 06s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.52, Best Reward 0.16 Time 04h 58m 46s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.06, Best Reward 0.16 Time 05h 05m 55s, Avg Reward -0.104, Avg Accuracy 0.08, Avg Ep length 4.22, Best Reward 0.16 Time 05h 16m 13s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.16 Time 05h 27m 28s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.16 Time 05h 38m 21s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.16 Time 05h 49m 23s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.0, Best Reward 0.16 Time 06h 00m 09s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.0, Best Reward 0.16 Time 06h 10m 35s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.16 Time 06h 21m 45s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.12, Best Reward 0.16 Time 06h 36m 23s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 5.88, Best Reward 0.16 Time 06h 51m 05s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 6.0, Best Reward 0.16 Time 07h 04m 16s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.12, Best Reward 0.16 Time 07h 18m 24s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 6.78, Best Reward 0.16 Time 07h 28m 26s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 6.74, Best Reward 0.16 Time 07h 36m 14s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.84, Best Reward 0.16 Time 07h 43m 10s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.34, Best Reward 0.16 Time 07h 51m 03s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.82, Best Reward 0.16 Time 07h 58m 41s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.64, Best Reward 0.16
    Time 08h 05m 15s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.16 Time 08h 12m 02s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 4.08, Best Reward 0.16 Time 08h 20m 20s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.36, Best Reward 0.184 Time 08h 28m 37s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.42, Best Reward 0.184 Time 08h 36m 42s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.26, Best Reward 0.184 Time 08h 45m 00s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.32, Best Reward 0.184 Time 08h 54m 55s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 6.58, Best Reward 0.184 Time 09h 05m 13s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 7.02, Best Reward 0.184 Time 09h 12m 35s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.66, Best Reward 0.184 Time 09h 20m 36s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 5.16, Best Reward 0.184 Time 09h 28m 52s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 5.42, Best Reward 0.184 Time 09h 37m 38s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.86, Best Reward 0.184 Time 09h 45m 54s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.42, Best Reward 0.184 Time 09h 52m 41s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.14, Best Reward 0.184 Time 09h 59m 17s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.0, Best Reward 0.184 Time 10h 05m 48s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.04, Best Reward 0.184 Time 10h 12m 19s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.184 Time 10h 18m 39s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.0, Best Reward 0.184 Time 10h 25m 00s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.0, Best Reward 0.184 Time 10h 31m 55s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 4.06, Best Reward 0.184 Time 10h 39m 39s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184 Time 10h 47m 22s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.9, Best Reward 0.184 Time 10h 55m 23s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184 Time 11h 03m 23s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.2, Best Reward 0.184 Time 11h 10m 32s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.4, Best Reward 0.184 Time 11h 17m 43s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.24, Best Reward 0.184 Time 11h 25m 51s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.26, Best Reward 0.184 Time 11h 33m 49s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.22, Best Reward 0.184 Time 11h 41m 35s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184 Time 11h 49m 08s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.184 Time 11h 56m 36s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.78, Best Reward 0.184 Time 12h 03m 25s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.04, Best Reward 0.184 Time 12h 13m 12s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.184 Time 12h 23m 39s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.184 Time 12h 30m 18s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.184
    Time 12h 37m 06s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.0, Best Reward 0.184 Time 12h 43m 50s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.184 Time 12h 50m 31s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.184 Time 12h 57m 05s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.0, Best Reward 0.184 Time 13h 06m 19s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.12, Best Reward 0.184 Time 13h 18m 38s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 6.12, Best Reward 0.184 Time 13h 31m 20s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 6.2, Best Reward 0.184 Time 13h 45m 22s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.84, Best Reward 0.184 Time 13h 56m 15s, Avg Reward -0.104, Avg Accuracy 0.08, Avg Ep length 4.36, Best Reward 0.184 Time 14h 07m 04s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.22, Best Reward 0.184 Time 14h 18m 43s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.82, Best Reward 0.184 Time 14h 30m 49s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.94, Best Reward 0.184 Time 14h 43m 07s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.86, Best Reward 0.184 Time 14h 55m 36s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.06, Best Reward 0.184 Training thread: 15 Num iters: 1K Avg policy loss: 0.117592454028 Avg value loss: 0.832445306242 Time 15h 07m 37s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.0, Best Reward 0.184 Training thread: 3 Num iters: 1K Avg policy loss: -0.130185781124 Avg value loss: 0.736817106047 Time 15h 20m 15s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.0, Best Reward 0.184 Training thread: 8 Num iters: 1K Avg policy loss: -0.115444350065 Avg value loss: 0.736987084803 Training thread: 12 Num iters: 1K Avg policy loss: -0.139137712868 Avg value loss: 0.745042469732 Training thread: 13 Num iters: 1K Avg policy loss: -0.087330975621 Avg value loss: 0.74735086819 Time 15h 32m 27s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 5.0, Best Reward 0.184 Training thread: 5 Num iters: 1K Avg policy loss: -0.109482607283 Avg value loss: 0.762932332613 Training thread: 7 Num iters: 1K Avg policy loss: -0.0333308469482 Avg value loss: 0.775354296297 Training thread: 11 Num iters: 1K Avg policy loss: -0.185568212742 Avg value loss: 0.713746232403 Training thread: 10 Num iters: 1K Avg policy loss: -0.0643703758532 Avg value loss: 0.743953702528 Training thread: 6 Num iters: 1K Avg policy loss: -0.260978381567 Avg value loss: 0.682364837718 Time 15h 45m 03s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.0, Best Reward 0.184 Training thread: 0 Num iters: 1K Avg policy loss: 0.00585684951555 Avg value loss: 0.793881461014 Training thread: 14 Num iters: 1K Avg policy loss: -0.193044243555 Avg value loss: 0.703806976835 Training thread: 1 Num iters: 1K Avg policy loss: -0.184739318009 Avg value loss: 0.707147910109 Time 15h 56m 53s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.84, Best Reward 0.184 Training thread: 9 Num iters: 1K Avg policy loss: -0.0644458006085 Avg value loss: 0.785048694798 Training thread: 4 Num iters: 1K Avg policy loss: 0.00307522571788 Avg value loss: 0.817228694934 Training thread: 2 Num iters: 1K Avg policy loss: -0.0580463716025 Avg value loss: 0.795366896593 Time 16h 09m 23s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 4.94, Best Reward 0.184 Time 16h 22m 41s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.64, Best Reward 0.184 Time 16h 38m 12s, Avg Reward 0.016, Avg 
Accuracy 0.18, Avg Ep length 6.64, Best Reward 0.184 Time 16h 54m 38s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 7.22, Best Reward 0.184 Time 17h 10m 50s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 7.0, Best Reward 0.184 Time 17h 24m 54s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 5.96, Best Reward 0.184 Time 17h 38m 40s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.6, Best Reward 0.184 Time 17h 50m 38s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184 Time 18h 02m 37s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.0, Best Reward 0.184 Time 18h 13m 26s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.4, Best Reward 0.184 Time 18h 23m 55s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.08, Best Reward 0.184 Time 18h 34m 18s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.184 Time 18h 45m 35s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.58, Best Reward 0.184 Time 18h 57m 34s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.86, Best Reward 0.184 Time 19h 09m 44s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184 Time 19h 22m 10s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.0, Best Reward 0.184 Time 19h 34m 37s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.184 Time 19h 47m 00s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.0, Best Reward 0.184 Time 19h 59m 30s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.184 Time 20h 11m 21s, Avg Reward 0.16, Avg Accuracy 0.3, Avg Ep length 5.0, Best Reward 0.184 Time 20h 23m 52s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184 Time 20h 36m 31s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.0, Best Reward 0.184 Time 20h 48m 41s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184 Time 21h 01m 03s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184 Time 21h 13m 27s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184 Time 21h 25m 45s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184 Time 21h 37m 54s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184 Time 21h 50m 15s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.04, Best Reward 0.184 Time 22h 02m 34s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184
    Time 22h 14m 52s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184 Time 22h 27m 06s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 5.0, Best Reward 0.184 Time 22h 39m 22s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184


    System config:

    lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                40
    Thread(s) per core:    2
    Core(s) per socket:    10
    Socket(s):             2
    NUMA node(s):          2
    CPU family:            6
    Model:                 62
    Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz

    It'll be really helpful if you could point me to what could be going wrong in my training procedure. Regards

    opened by soumikdasgupta 6
  • Hardware specifications


    I have been running the code on 32 threads for 3 days but still haven't been able to get an average accuracy of more than 0.5 on the 'easy' difficulty. Can you specify the hardware that was used to obtain the results reported in the paper?

    opened by a7b23 3
  • Weird Targets for train instructions


    We are looking into some of the training instructions and a few things seem strange. It would be great if someone could clarify. All the instructions mentioned below are from instructions_train.json.

    { "instruction": "Go to the red object", "targets": [ "ShortRedTorch", "ShortRedColumn", "TallRedColumn", "RedTorch", "RedSkull", "RedCard", "BlueArmor" ], "description": "red object" } How come "BlueArmor" be a red object?

    { "instruction": "Go to the smallest green object", "targets": [ "GreenArmor", "TallGreenColumn", "ShortGreenTorch", "ShortGreenColumn", "GreenTorch" ], "description": "smallest green object" }

    How come "TallGreenColumn" be a small object?

    There are several such cases in the instruction set.
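
    One way to enumerate such color mismatches, assuming data/instructions_train.json is a JSON list of entries shaped like the ones quoted above (the color keywords below are an illustrative guess):

    import json

    COLORS = ["red", "blue", "green", "yellow"]  # illustrative color keywords

    with open("data/instructions_train.json") as f:
        entries = json.load(f)

    for entry in entries:
        desc_colors = [c for c in COLORS if c in entry["description"].lower()]
        for target in entry["targets"]:
            # Flag targets whose name mentions none of the description's colors.
            if desc_colors and not any(c in target.lower() for c in desc_colors):
                print(entry["instruction"], "->", target)

    (This would not catch the size cases such as "smallest" vs. "Tall"; that would need a separate check.)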

    opened by y12uc231 2
  • About the Vizdoom problem


    Hi, I have a question about the ViZDoom environment in your project. It seems that in every episode loop you don't call game.close() but just reset the game; could this cause memory-leak problems?

    opened by pengzhi1998 1
  • How long to train to reproduce the paper results?


    Hi

    I am experimenting with the code, but I don't know how long to let it run to get the results mentioned in the paper. Could you please let me know?

    Thanks

    opened by Singularity42 1
  • Getting issue while running python env_test.py (Please help !)


    AL lib: (WW) alc_initconfig: Failed to initialize backend "pulse"

    *** Fatal Error *** Address not mapped to object (signal 11) Address: 0x8

    Generating vizdoom-crash.log and killing process 30193, please wait...
    29 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
    Traceback (most recent call last):
      File "env_test.py", line 53, in <module>
        env.game_init().add_args("+vid_forcesurface 1")
      File "/new_data/gpu/shivansh/visual-grounding/DeepRL-Grounding/env.py", line 48, in game_init
        game.init()
    vizdoom.vizdoom.ViZDoomErrorException: Unexpected ViZDoom instance crash.

    opened by rshivansh 1
  • Dimension Error in Inference


    Hi

    I was trying to run 'python a3c_main.py --evaluate 2 --load saved/pretrained_model' to run inference with the pre-trained model. However, I got the following dimension error without changing the code:

    File "/home/rparik/Downloads/Documents/CMU/Fall18/multi-modal/project/baselines/DeepRL-Grounding/models.py", line 88, in forward
      _, encoder_hidden = self.gru(word_embedding, encoder_hidden)
    File "/home/rparik/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
      result = self.forward(*input, **kwargs)
    File "/home/rparik/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 178, in forward
      self.check_forward_args(input, hx, batch_sizes)
    File "/home/rparik/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 126, in check_forward_args
      expected_input_dim, input.dim()))
    RuntimeError: input must have 3 dimensions, got 2
    
    

    Any leads would be appreciated
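
    For what it's worth, this error usually means a 2-D (seq_len, input_size) tensor is being passed to nn.GRU, which in newer PyTorch versions requires a 3-D (seq_len, batch, input_size) input. A self-contained illustration of the shape mismatch and one possible workaround (a guess; the actual cause in models.py may simply be a PyTorch version difference):

    import torch
    import torch.nn as nn

    gru = nn.GRU(input_size=32, hidden_size=64)
    word_embedding = torch.randn(5, 32)   # 2-D (seq_len, embed_size) would raise the error
    hidden = torch.zeros(1, 1, 64)

    # nn.GRU expects (seq_len, batch, input_size); add the missing batch dimension.
    if word_embedding.dim() == 2:
        word_embedding = word_embedding.unsqueeze(1)   # (seq_len, 1, embed_size)

    _, encoder_hidden = gru(word_embedding, hidden)
    print(encoder_hidden.shape)   # torch.Size([1, 1, 64])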

    opened by Mrs-Hudson 3