
ALFRED

A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk,
Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
CVPR 2020

ALFRED (Action Learning From Realistic Environments and Directives) is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. Long composition rollouts with non-reversible state changes are among the phenomena we include to shrink the gap between research benchmarks and real-world applications.

For the latest updates, see: askforalfred.com

Want more? Check out ALFWorld – interactive TextWorld environments for ALFRED scenes!

Quickstart

Clone repo:

$ git clone https://github.com/askforalfred/alfred.git alfred
$ export ALFRED_ROOT=$(pwd)/alfred

Install requirements:

$ virtualenv -p $(which python3) --system-site-packages alfred_env # or whichever virtual environment manager you prefer
$ source alfred_env/bin/activate

$ cd $ALFRED_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Download trajectory JSONs and ResNet features (~17GB):

$ cd $ALFRED_ROOT/data
$ sh download_data.sh json_feat
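
After extraction, the trajectory data should sit under data/json_feat_2.1.0, the path the training command below expects. An optional sanity check (paths taken from the training command; exact sizes will vary):

$ du -sh $ALFRED_ROOT/data/json_feat_2.1.0
$ ls $ALFRED_ROOT/data/splits/oct21.json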

Train models:

$ cd $ALFRED_ROOT
$ python models/train/train_seq2seq.py --data data/json_feat_2.1.0 --model seq2seq_im_mask --dout exp/model:{model},name:pm_and_subgoals_01 --splits data/splits/oct21.json --gpu --batch 8 --pm_aux_loss_wt 0.1 --subgoal_aux_loss_wt 0.1
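
One way to sanity-check a trained checkpoint locally is the evaluation script (a sketch assembled from the evaluation commands quoted elsewhere on this page; adjust --model_path to point at your checkpoint, e.g. best_seen.pth under the --dout folder):

$ cd $ALFRED_ROOT
$ python models/eval/eval_seq2seq.py --model_path exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1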

More Info

  • Dataset: Downloading full dataset, Folder structure, JSON structure.
  • Models: Training and Evaluation, File structure, Pre-trained models.
  • Data Generation: Generation, Replay Checks, Data Augmentation (high-res, depth, segmentation masks, etc.).
  • Errata: Updated numbers for Goto subgoal evaluation.
  • THOR 2.1.0 Docs: Deprecated documentation from the AI2-THOR 2.1.0 release.
  • FAQ: Frequently Asked Questions.

SOTA Models

Open-source models that outperform the Seq2Seq baselines from ALFRED:

Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, Chen Sun
Paper, Code

MOCA: A Modular Object-Centric Approach for Interactive Instruction Following
Kunal Pratap Singh*, Suvaansh Bhambri*, Byeonghwi Kim*, Roozbeh Mottaghi, Jonghyun Choi
Paper, Code

Contact Mohit to add your model here.

Prerequisites

  • Python 3
  • PyTorch 1.1.0
  • Torchvision 0.3.0
  • AI2THOR 2.1.0

See requirements.txt for the full list of prerequisites.
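
If you ever need to recreate the pinned environment by hand, the versions above translate to roughly the following (requirements.txt remains the authoritative list):

$ pip install torch==1.1.0 torchvision==0.3.0 ai2thor==2.1.0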

Hardware

Tested on:

  • GPU - GTX 1080 Ti (11GB)
  • CPU - Intel Xeon (Quad Core)
  • RAM - 16GB
  • OS - Ubuntu 16.04

Leaderboard

Run your model on test seen and unseen sets, and create an action-sequence dump of your agent:

$ cd $ALFRED_ROOT
$ python models/eval/leaderboard.py --model_path <model_path>/model.pth --model models.model.seq2seq_im_mask --data data/json_feat_2.1.0 --gpu --num_threads 5

This will create a JSON file, e.g. task_results_20191218_081448_662435.json, inside the <model_path> folder. Submit this JSON here: AI2 ALFRED Leaderboard. For rules and restrictions, see the getting started page.
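
Before uploading, it can help to confirm that the dump is valid, non-empty JSON (a minimal check; substitute your own timestamped filename):

$ cd <model_path>
$ python -c "import json; d = json.load(open('task_results_20191218_081448_662435.json')); print(type(d), len(d))"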

Rules:

  1. You are only allowed to use RGB and language instructions (goal & step-by-step) as input for your agents. You cannot use additional depth, mask, or metadata info, etc., from the simulator on Test Seen and Test Unseen scenes. However, during training you are allowed to use additional info for auxiliary losses, etc.
  2. During evaluation, agents are restricted to max_steps=1000 and max_fails=10. Do not change these settings in the leaderboard script; these modifications will not be reflected in the evaluation server.
  3. Pick a legible model name for the submission. Just "baseline" is not very descriptive.
  4. All submissions must be attempts to solve the ALFRED dataset.
  5. Answer the following questions in the description: a. Did you use additional sensory information from THOR as input, e.g. depth, segmentation masks, class masks, panoramic images, etc., at test time? If so, please report them. b. Did you use the alignments between step-by-step instructions and expert action sequences for training or testing? (No by default; the instructions are serialized into a single sentence.)
  6. Share who you are: provide a team name and affiliation.
  7. (Optional) Share how you solved it: if possible, share information about how the task was solved. Link an academic paper or code repository if public.
  8. Only submit your own work: you may evaluate any model on the validation set, but must only submit your own work for evaluation against the test set.

Docker Setup

Install Docker and NVIDIA Docker.

Modify docker_build.py and docker_run.py to your needs.

Build

Build the image:

$ python scripts/docker_build.py 

Run (Local)

For local machines:

$ python scripts/docker_run.py
 
  source ~/alfred_env/bin/activate
  cd $ALFRED_ROOT

Run (Headless)

For headless VMs and cloud instances:

$ python scripts/docker_run.py --headless 

  # inside docker
  tmux new -s startx  # start a new tmux session

  # start nvidia-xconfig (might have to run this twice)
  sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
  sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

  # start X server on DISPLAY 0
  # single X server should be sufficient for multiple instances of THOR
  sudo python ~/alfred/scripts/startx.py 0  # if this throws errors, e.g. "(EE) Server terminated with error (1)" or "(EE) already running ...", try a display > 0

  # detach from tmux shell
  # Ctrl+b then d

  # source env
  source ~/alfred_env/bin/activate
  
  # set DISPLAY variable to match X server
  export DISPLAY=:0

  # check THOR
  cd $ALFRED_ROOT
  python scripts/check_thor.py

  ###############
  ## (300, 300, 3)
  ## Everything works!!!

You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.
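
For example, if your X server ended up on display :1 instead of :0, keep the shell variable and the constant in sync (this assumes X_DISPLAY is stored as a plain string in gen/constants.py):

$ export DISPLAY=:1

# and in gen/constants.py, set the constant to match, e.g.:
# X_DISPLAY = '1'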

Cloud Instance

ALFRED can be set up on headless machines such as AWS or Google Cloud instances. The main requirement is access to a GPU machine that supports OpenGL rendering. Run startx.py in a tmux shell:

# start tmux session
$ tmux new -s startx 

# start X server on DISPLAY 0
# single X server should be sufficient for multiple instances of THOR
$ sudo python $ALFRED_ROOT/scripts/startx.py 0  # if this throws errors, e.g. "(EE) Server terminated with error (1)" or "(EE) already running ...", try a display > 0

# detach from tmux shell
# Ctrl+b then d

# set DISPLAY variable to match X server
$ export DISPLAY=:0

# check THOR
$ cd $ALFRED_ROOT
$ python scripts/check_thor.py

###############
## (300, 300, 3)
## Everything works!!!

You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.

Also, check out this guide: Setting up THOR on Google Cloud.

Citation

If you find the dataset or code useful, please cite:

@inproceedings{ALFRED20,
  title ={{ALFRED: A Benchmark for Interpreting Grounded
           Instructions for Everyday Tasks}},
  author={Mohit Shridhar and Jesse Thomason and Daniel Gordon and Yonatan Bisk and
          Winson Han and Roozbeh Mottaghi and Luke Zettlemoyer and Dieter Fox},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020},
  url  = {https://arxiv.org/abs/1912.01734}
}

License

MIT License

Change Log

28/10/2020:

  • Added --use_templated_goals option to train with templated goals instead of human-annotated goal descriptions.

26/10/2020:

  • Fixed missing stop-frame in Modeling Quickstart dataset (json_feat_2.1.0.zip).

14/10/2020:

  • Added errata for Goto subgoal evaluation.

07/04/2020:

  • Updated download links. Switched from Google Cloud to AWS. Old download links will be deactivated.

28/03/2020:

  • Updated the mask-interaction API to use IoU scores instead of max pixel count for selecting objects.
  • Results table in the paper will be updated with new numbers.

Contact

Questions or issues? Contact [email protected]

Comments
  • The process is being "Killed"

    Hi there,

    After preprocessing, the process is being killed after a couple of warnings as follows:

    warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

    Killed: 0%|▏ | 4/2628 [00:11<2:09:00, 2.95s/it]

    Also, all the evaluations are being killed for both best_seen and best_unseen checkpoints:

    {'tests_seen': 1533, 'tests_unseen': 1529, 'train': 21023, 'valid_seen': 820, 'valid_unseen': 821} Loading: best_checkpoints/best_unseen.pth Killed

    {'tests_seen': 1533, 'tests_unseen': 1529, 'train': 21023, 'valid_seen': 820, 'valid_unseen': 821} Loading: best_checkpoints/best_seen.pth Killed

    I was wondering if somebody who has faced this issue could help. I've already installed all the dependencies as mentioned, as well as CUDA 10.0 for the project.

    opened by NavidRajabi 29
  • How do I understand the npy file of the floorplan layout?

    Hi, thanks for sharing this amazing dataset! I am trying to get the floorplan from the npy file, and I have loaded it with NumPy in Python. However, I find it difficult to understand the resulting NumPy array. How should I interpret the content of the floorplan npy files? Thank you.

    Here's the code I use to load the file:

    import numpy as np
    npyData = np.load("example/fp2/FloorPlan2-layout.npy")
    print(npyData.shape)
    print(npyData)
    

    Here's the output of the shape: (115, 2) and the array:

    [[-0.75 -0.75]
     [-0.5  -0.75]
     [-0.25 -0.75]
     [ 0.   -0.75]
     [ 0.25 -0.75]
     [ 0.5  -0.75]
     [ 0.75 -0.75]
     [ 1.   -0.75]
     [-0.75 -0.5 ]
     [ 0.75 -0.5 ]
     [ 1.   -0.5 ]
     [-0.75 -0.25]
     [ 0.75 -0.25]
     [ 1.   -0.25]
     [-0.75  0.  ]
     [ 0.75  0.  ]
     [ 1.    0.  ]
     [ 1.25  0.  ]
     [-0.75  0.25]
     [ 0.75  0.25]
     [ 1.    0.25]
     [ 1.25  0.25]
     [-0.75  0.5 ]
     [ 0.75  0.5 ]
     [ 1.    0.5 ]
     [ 1.25  0.5 ]
     [-0.75  0.75]
     [ 0.75  0.75]
     [ 1.    0.75]
     [ 1.25  0.75]
     [-0.75  1.  ]
     [ 0.75  1.  ]
     [ 1.    1.  ]
     [ 1.25  1.  ]
     [-0.75  1.25]
     [ 0.75  1.25]
     [ 1.    1.25]
     [ 1.25  1.25]
     [-1.25  1.5 ]
     [-1.    1.5 ]
     [-0.75  1.5 ]
     [ 0.75  1.5 ]
     [ 1.    1.5 ]
     [-1.25  1.75]
     [-1.    1.75]
     [-0.75  1.75]
     [-0.5   1.75]
     [ 0.5   1.75]
     [ 0.75  1.75]
     [-1.5   2.  ]
     [-1.25  2.  ]
     [-1.    2.  ]
     [-0.75  2.  ]
     [-0.5   2.  ]
     [-0.25  2.  ]
     [ 0.    2.  ]
     [ 0.25  2.  ]
     [ 0.5   2.  ]
     [ 0.75  2.  ]
     [-1.75  2.25]
     [-1.5   2.25]
     [-1.25  2.25]
     [-1.    2.25]
     [-0.75  2.25]
     [-0.5   2.25]
     [-0.25  2.25]
     [ 0.    2.25]
     [ 0.25  2.25]
     [ 0.5   2.25]
     [ 0.75  2.25]
     [-2.5   2.5 ]
     [-2.25  2.5 ]
     [-2.    2.5 ]
     [-1.75  2.5 ]
     [-1.5   2.5 ]
     [-1.25  2.5 ]
     [-1.    2.5 ]
     [-0.75  2.5 ]
     [-0.5   2.5 ]
     [-0.25  2.5 ]
     [ 0.    2.5 ]
     [ 0.25  2.5 ]
     [ 0.5   2.5 ]
     [ 0.75  2.5 ]
     [-2.5   2.75]
     [-2.25  2.75]
     [-2.    2.75]
     [-1.75  2.75]
     [-1.5   2.75]
     [-1.25  2.75]
     [-1.    2.75]
     [-0.75  2.75]
     [-0.5   2.75]
     [-0.25  2.75]
     [ 0.    2.75]
     [ 0.25  2.75]
     [ 0.5   2.75]
     [ 0.75  2.75]
     [-2.5   3.  ]
     [-2.25  3.  ]
     [-2.    3.  ]
     [-1.75  3.  ]
     [-1.5   3.  ]
     [-1.25  3.  ]
     [-1.    3.  ]
     [-0.75  3.  ]
     [-0.5   3.  ]
     [-0.25  3.  ]
     [ 0.    3.  ]
     [ 0.25  3.  ]
     [ 0.5   3.  ]
     [ 0.75  3.  ]
     [-1.    3.25]
     [-0.75  3.25]
     [-0.5   3.25]]
    
    question 
    opened by wenda-gu 12
  • Get different feature vectors when loading from feat_conv.pt and from resnet.featurize()

    Hi,

    I am doing this example: full_2.1.0/train/pick_and_place_with_movable_recep-ButterKnife-Cup-SinkBasin-2/trial_T20190908_233322_447979/raw_images

    For image 000000000.jpg

    feat = resnet.featurize([Image.open(fname)], batch=1)
    print(feat[0][:5,:5])
    

    I get

    tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0188, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.6659, 0.2334, 0.4881, 0.0220],
             [0.0000, 0.0278, 0.0000, 0.1306, 0.0000, 0.0177, 0.0000],
             [0.2511, 0.9387, 0.5719, 0.2475, 0.1024, 0.3862, 0.1884],
             [1.4310, 1.5767, 0.8272, 0.0000, 0.0000, 0.0000, 0.3523]],
    
            [[0.0000, 0.0549, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.1023, 0.2866, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.4572, 0.0000, 0.0000, 0.0000, 0.4733, 0.8688, 0.6622],
             [0.9684, 0.0573, 0.0000, 0.0000, 0.0489, 0.0000, 0.0655]],
    
            [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],
    
            [[1.8689, 2.2256, 0.6508, 1.0258, 0.5759, 0.9021, 0.6726],
             [2.5713, 3.0914, 1.0797, 1.3719, 0.9788, 1.8322, 1.6944],
             [3.7268, 3.6742, 1.5358, 1.2200, 0.9661, 2.6235, 2.2480],
             [4.2898, 4.0467, 1.9082, 0.6326, 0.2264, 1.4761, 1.8504],
             [4.6841, 4.2882, 1.2279, 0.0133, 0.0000, 0.8532, 1.6886]],
    
            [[0.0000, 0.0000, 0.0000, 0.1095, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.3814, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.7656, 0.0000, 0.4508, 0.0000],
             [0.0000, 0.0000, 0.0000, 1.1697, 0.3166, 0.8373, 0.0000],
             [0.0000, 0.0000, 0.0052, 1.3592, 0.6858, 1.1441, 0.0000]]])
    

    When using

    x = torch.load("feat_conv.pt")
    print(x[0][:5,:5])
    

    I get

    tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0987, 0.0000, 0.4179, 0.0000, 0.3932, 0.0267],
             [0.1942, 0.3280, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.4623, 1.1134, 0.7551, 0.0720, 0.0000, 0.2512, 0.1772],
             [1.6237, 1.7352, 1.0370, 0.0000, 0.0000, 0.0000, 0.2772]],
    
            [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.1637, 0.0102, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.2434, 0.0000, 0.0000, 0.0000, 0.1376, 0.5788, 0.3524],
             [0.7411, 0.0000, 0.0000, 0.0000, 0.2674, 0.1301, 0.0218]],
    
            [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],
    
            [[1.7606, 1.9361, 0.6170, 1.0957, 0.7715, 1.0960, 0.9179],
             [2.4290, 2.7654, 0.9339, 1.2526, 1.0871, 1.8147, 1.7731],
             [3.6967, 3.4739, 1.3849, 1.0991, 1.0435, 2.3166, 1.9403],
             [4.4716, 4.3128, 2.0116, 0.7454, 0.2938, 1.2445, 1.4761],
             [5.1107, 4.7808, 1.3802, 0.2358, 0.0000, 1.0597, 1.6283]],
    
            [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.1443, 0.0000, 0.0000, 0.0000],
             [0.0000, 0.0000, 0.0000, 0.6003, 0.0000, 0.3137, 0.0000],
             [0.0000, 0.0000, 0.0000, 1.2218, 0.5109, 0.9440, 0.0120],
             [0.0000, 0.0000, 0.0878, 1.5063, 0.8553, 1.0575, 0.0000]]])
    
    opened by shuyanzhou 11
  • UPDATE: Unity process crashes with driver mismatch inside ai2thor-docker with startx.py, Ubuntu 18.04

    I've been following along with #48 since I'm also trying to run ALFRED evaluation with THOR on a headless machine where I don't have root access. So far, I've modified the ai2thor-docker repo so that it installs ai2thor==2.1.0 (I also had to add RUN pip3 install --upgrade torch torchvision to the Dockerfile because there were compatibility issues with pytorch being 1.1.0 instead of 1.6.0 and torchvision being 0.3.0 instead of 0.7.0, since I was getting errors like

    {'tests_seen': 1533,
     'tests_unseen': 1529,
     'train': 21023,
     'valid_seen': 820,
     'valid_unseen': 821}
    Loading:  exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth
    Traceback (most recent call last):
      File "/usr/lib/python3.6/tarfile.py", line 188, in nti
        s = nts(s, "ascii", "strict")
      File "/usr/lib/python3.6/tarfile.py", line 172, in nts
        return s.decode(encoding, errors)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 1: ordinal not in range(128)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3.6/tarfile.py", line 2299, in next
        tarinfo = self.tarinfo.fromtarfile(self)
      File "/usr/lib/python3.6/tarfile.py", line 1093, in fromtarfile
        obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
      File "/usr/lib/python3.6/tarfile.py", line 1035, in frombuf
        chksum = nti(buf[148:156])
      File "/usr/lib/python3.6/tarfile.py", line 191, in nti
        raise InvalidHeaderError("invalid header")
    tarfile.InvalidHeaderError: invalid header
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 556, in _load
        return legacy_load(f)
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 467, in legacy_load
        with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
      File "/usr/lib/python3.6/tarfile.py", line 1591, in open
        return func(name, filemode, fileobj, **kwargs)
      File "/usr/lib/python3.6/tarfile.py", line 1621, in taropen
        return cls(name, mode, fileobj, **kwargs)
      File "/usr/lib/python3.6/tarfile.py", line 1484, in __init__
        self.firstmember = self.next()
      File "/usr/lib/python3.6/tarfile.py", line 2311, in next
        raise ReadError(str(e))
    tarfile.ReadError: invalid header
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "models/eval/eval_seq2seq.py", line 54, in <module>
        eval = EvalTask(args, manager)
      File "/app/alfred/models/eval/eval.py", line 31, in __init__
        self.model, optimizer = M.Module.load(self.args.model_path)
      File "/app/alfred/models/model/seq2seq.py", line 318, in load
        save = torch.load(fsave)
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 387, in load
        return _load(f, map_location, pickle_module, **pickle_load_args)
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 560, in _load
        raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
    RuntimeError: exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth is a zip archive (did you mean to use torch.jit.load()?)
    

    ).

    I started with a pretty naive approach where I just moved my ALFRED repo, with the quickstart data and the model checkpoints I wanted to evaluate, into the Docker build context and copied all of it into the Docker image (which takes a while, but that's a "me" problem). Unfortunately, I get a bus error when attempting to run evaluation on my saved checkpoint, even if I generate the checkpoint by training inside the Docker container:

    {'tests_seen': 1533,
     'tests_unseen': 1529,
     'train': 21023,
     'valid_seen': 820,
     'valid_unseen': 821}
    Loading:  exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth
    ./test.sh: line 3:   117 Bus error               (core dumped) python3 models/eval/eval_seq2seq.py --model_path exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1
    

    Update: I tried cloning the alfred repo, downloading the data from inside the Docker container, and training from scratch, but hit the same issue.

    The reason I used torch==1.6.0 and torchvision==0.7.0 instead of torch==1.1.0 and torchvision==0.3.0 is that it silences the error

    Traceback (most recent call last):
      File "models/train/train_seq2seq.py", line 103, in <module>
        model = model.to(torch.device('cuda'))
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 386, in to
        return self._apply(convert)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
        module._apply(fn)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 127, in _apply
        self.flatten_parameters()
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
        self.batch_first, bool(self.bidirectional))
    RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
    

    I suspect the bus error has to do with the version differences, but I'm not quite sure yet.

    opened by jzhanson 11
  • Run alfred on headless servers without root account

    Hello there,

    I'm trying to deploy the code on headless servers where I don't have root access. Jobs are submitted via a scheduler, so I can't even ssh into the servers.

    I followed your guide in #29, but I got an error when running startx.py. It seems the script needs root privileges. Could you give me a hint on how to work around this problem?

    Thanks a lot!

    Below is the full output:

    python startx.py
    Starting X on DISPLAY=:0
    
    Section "Device"
        Identifier     "Device0"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
        BusID          "PCI:61:0:0"
    EndSection
    
    
    Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
        DefaultDepth    24
        Option         "AllowEmptyInitialConfiguration" "True"
        SubSection     "Display"
            Depth       24
            Virtual 1024 768
        EndSubSection
    EndSection
    
    
    Section "Device"
        Identifier     "Device1"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
        BusID          "PCI:62:0:0"
    EndSection
    
    
    Section "Screen"
        Identifier     "Screen1"
        Device         "Device1"
        DefaultDepth    24
        Option         "AllowEmptyInitialConfiguration" "True"
        SubSection     "Display"
            Depth       24
            Virtual 1024 768
        EndSubSection
    EndSection
    
    
    Section "Device"
        Identifier     "Device2"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
        BusID          "PCI:177:0:0"
    EndSection
    
    
    Section "Screen"
        Identifier     "Screen2"
        Device         "Device2"
        DefaultDepth    24
        Option         "AllowEmptyInitialConfiguration" "True"
        SubSection     "Display"
            Depth       24
            Virtual 1024 768
        EndSubSection
    EndSection
    
    
    Section "Device"
        Identifier     "Device3"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
        BusID          "PCI:178:0:0"
    EndSection
    
    
    Section "Screen"
        Identifier     "Screen3"
        Device         "Device3"
        DefaultDepth    24
        Option         "AllowEmptyInitialConfiguration" "True"
        SubSection     "Display"
            Depth       24
            Virtual 1024 768
        EndSubSection
    EndSection
    
    
    Section "ServerLayout"
        Identifier     "Layout0"
        Screen 0 "Screen0" 0 0
        Screen 1 "Screen1" 0 0
        Screen 2 "Screen2" 0 0
        Screen 3 "Screen3" 0 0
    EndSection
    
    (EE) 
    Fatal server error:
    (EE) PAM authentication failed, cannot start X server.
    	Perhaps you do not have console ownership?
    (EE) 
    (EE) 
    Please consult the The X.Org Foundation support 
    	 at http://wiki.x.org
     for help. 
    (EE) 
    
    opened by davidnvq 10
  • Leaderboard submission

    Hello,

    I am assuming the file that should be submitted is tests_actseqs_dump_{datetime}.json. When I submit the file to the leaderboard, the test status shows succeeded, but the numbers do not show up. May I ask if this is because my .json file is in the wrong format?

    Thanks so much!

    Best, Muqiao (screenshot attached)

    opened by muqiaoy 9
  • Why repeat the same frame to predict <<stop>>?

    Hi,

    I have a question about this line. https://github.com/askforalfred/alfred/blob/6d78d8085699da35371c5f65339a470ffcf89a3e/models/model/seq2seq_im_mask.py#L121

    If I understand correctly, frames contains an image which conditions the prediction of the action_low entry at the same index. I assume that im[-1] in this line is the image captured when executing the last actual action (e.g. slicing an object), but to predict the stop action it would seem more natural to use a new image taken after the last actual action. Why does it repeat the same frame, or is my understanding incorrect? Thanks!

    opened by Ryou0634 8
  • Evaluation Error, "No Protocol Specified"

    Hello, my environment is

    • Ubuntu 20.04
    • GTX 1060

    I downloaded the pre-trained model and set up a Docker environment to evaluate it. When I run the following command after running scripts/docker_run.py, an error occurs.

    (alfred_env) yuki@yuki-lab:~/alfred$ python models/eval/eval_seq2seq.py --model_path baseline/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 3
    {'tests_seen': 1533,
     'tests_unseen': 1529,
     'train': 21023,
     'valid_seen': 820,
     'valid_unseen': 821}
    Loading:  baseline/best_seen.pth
    Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /home/yuki/.cache/torch/checkpoints/resnet18-5c106cde.pth
    100%|######################################################################################################################################################| 46827520/46827520 [00:04<00:00, 11243319.04it/s]
    thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   4.3 MiB/s]  of 390.MB
    thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                               70%   3.0 MiB/s]  of 390.MBNo protocol specified
    Found path: /home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
    Mono path[0] = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
    Mono config path = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
    thor-201909061227-Linux64: [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                          72%   2.9 MiB/s]  of 390.MBPreloaded 'ScreenSelector.so'
    PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence
    PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor
    Logging to /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
    No protocol specified
    thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   3.4 MiB/s]  of 390.MB
    thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   3.3 MiB/s]  of 390.MB
    Process Process-4:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/yuki/alfred/models/eval/eval_task.py", line 20, in run
        env = ThorEnv()
      File "/home/yuki/alfred/env/thor_env.py", line 33, in __init__
        player_screen_width=player_screen_width)
      File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 858, in start
        self.download_binary()
      File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 796, in download_binary
        os.rename(extract_dir, os.path.join(self.releases_dir(), self.build_name()))
    OSError: [Errno 39] Directory not empty: '/home/yuki/.ai2thor/tmp/thor-201909061227-Linux64' -> '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64'
    Process Process-2:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/yuki/alfred/models/eval/eval_task.py", line 20, in run
        env = ThorEnv()
      File "/home/yuki/alfred/env/thor_env.py", line 33, in __init__
        player_screen_width=player_screen_width)
      File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 858, in start
        self.download_binary()
      File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 796, in download_binary
        os.rename(extract_dir, os.path.join(self.releases_dir(), self.build_name()))
    OSError: [Errno 39] Directory not empty: '/home/yuki/.ai2thor/tmp/thor-201909061227-Linux64' -> '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64'
    

    And when I set num_threads to 1, eval_seq2seq.py prints "No Protocol Specified".

    yuki@yuki-lab:~/alfred$ python models/eval/eval_seq2seq.py --model_path <model_path>/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 3 --subgoals all^C
    yuki@yuki-lab:~/alfred$ python models/eval/eval_seq2seq.py --model_path baseline/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1 
    {'tests_seen': 1533,
     'tests_unseen': 1529,
     'train': 21023,
     'valid_seen': 820,
     'valid_unseen': 821}
    Loading:  baseline/best_seen.pth
    Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /home/yuki/.cache/torch/checkpoints/resnet18-5c106cde.pth
    100%|######################################################################################################################################################| 46827520/46827520 [00:04<00:00, 11261491.30it/s]
    thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   7.6 MiB/s]  of 390.MB
    No protocol specified
    Found path: /home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
    Mono path[0] = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
    Mono config path = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
    Preloaded 'ScreenSelector.so'
    PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence
    PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor
    Logging to /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
    No protocol specified
    
    

    Is this message expected? And how can I deal with the first problem?

    Thank you.

    opened by yukiTakezawa 8
  • Why does `feat_conv.pt` have 10 more frames than the number of images?

    Hi. Thanks for the amazing repository.

    I find that feat_conv.pt has 10 more frames than the number of images. For example,

    for task=pick_cool_then_place_in_recep-LettuceSliced-None-DiningTable-17/trial_T20190909_070538_437648, there are 455 images in traj_data['images'], but feat_conv is of shape 465x512x7x7

    Similarly for task=pick_two_obj_and_place-Newspaper-None-GarbageCan-218/trial_T20190907_225356_202464, there are 530 images in traj_data['images'] but feat_conv is of shape 540x512x7x7

    Is there any particular reason why this is the case?

    opened by TheShadow29 8
  • It seems the evaluation might have some bugs

    [two screenshots attached]

    This is returning False; however, it should be True. This is from: pick_clean_then_place_in_recep-AppleSliced-None-DiningTable-27/trial_T20190907_151802_277016

    bug 
    opened by nikepupu 7
  • Converting object positions to planner target poses

    Hi,

    We were looking into writing reward functions based on ALFRED dataset-annotated skills that work independently of expert trajectories (so that we can do RL), and everything is complete except for skills corresponding to GotoLocation actions.

    Here, we're having an issue with using x/y/z distances + visibility checks to ensure the agent is near a specified object from a GotoLocation task (e.g. table), so we wanted to instead convert the object positions to target agent poses for the planner.

    We were thinking of converting object locations (x/y/z and rotation, for moveable objects within a scene) to the discrete pose used in the ground truth scene graph, then finding the nearest location in the graph that the agent can navigate to, and using the A* planning distance threshold.

    Given this, how do we actually convert the event metadata object poses to the target discrete locations as presented in the dataset? I'm not sure how to calculate a correct target agent pose from an object location. One issue is finding the right discrete orientation so that the agent is actually facing the object, and another is ensuring that the converted discrete object location is actually reachable.

    Thanks!

    opened by jesbu1 6
  • PDDL Trajectory Generation in ProcTHOR

    Hi, Thanks for the amazing dataset.

    I wanted to generate PDDL-based expert demonstrations in the ProcTHOR dataset, similar to the ALFRED-gen dataset.

    I modified the Layout generation script to support the ProcTHOR-10k dataset.

    However, I think the AI2THOR nightly build which supports ProcTHOR does not support receptacles. While applying the PutObject action in layout generation, I get this error:

      File "/home/anaconda3/envs/procthor/lib/python3.7/site-packages/ai2thor/controller.py", line 983, in step
        raise ValueError(self.last_event.metadata["errorMessage"])
    ValueError: 
            Action: "PutObject" called with invalid argument: 'receptacleObjectId'
            Expected arguments: String objectId, Boolean forceAction = False, Boolean placeStationary = True, Int32 randomSeed = 0
            Your arguments: 'objectId', 'receptacleObjectId', 'forceAction', 'placeStationary'
            Valid ways to call "PutObject" action:
                    Void PutObject(String objectId, Boolean forceAction = False, Boolean placeStationary = True, Int32 randomSeed = 0)
                    Void PutObject(Single x, Single y, Boolean forceAction = False, Boolean placeStationary = True, Int32 randomSeed = 0, Boolean putNearXY = False)
    

    Is there a way I can fix this issue for PDDL-based data generation in ProcTHOR?

    opened by pushkalkatara 9
  • Mismatch between high-level and low-level actions after preprocessing

    Hello!

    I'm new to ALFRED and trying to learn it. After reading #84, I find there is still a mismatch between high-level and low-level actions after preprocessing. For example, in pick_two_obj_and_place-AppleSliced-None-CounterTop-10/trial_T20190907_061009_396474, the generated low_to_high_idx is [..., 10, 11, 11, 13]. However, 13 is out of bounds for the instr list and also exceeds the maximum high_idx in action_high (which is 12).

    I'm not sure if it is a bug, because this phenomenon does not cause problems when training baseline models. I think the reason is this line: https://github.com/askforalfred/alfred/blob/1898b83547b589b8635737929e04ee4f2f404177/data/preprocess.py#L212 which might be better written like this:

    conv['num']['action_low'][-1][0]['high_idx'] = conv['plan']['high_pddl'][-1]['high_idx'] - 1
    
    bug 
    opened by RavenKiller 1
  • Is the configuration generated by `nvidia-xconfig` not used?

    Does startx.py take into account a config generated by nvidia-xconfig? It seems it doesn't since startx.py uses the following command:

    Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config %s :%s
    

    So Xorg uses a config generated by generate_xorg_conf(devices), not one produced by nvidia-xconfig.

    I also tried to start the X server with sudo Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config /etc/X11/xorg.conf, and the detected devices were better configured as far as I could tell from the Xorg logs.

    opened by TopCoder2K 1
  • augment_trajectories.py throws error regarding Original Image Count and New Image Count

    When cloning the repo from scratch and using either just the json feats or the full dataset, running augment_trajectories.py with no changes throws this error for me for all trajectories:

    Original Image Count 514, New Image Count 0
    Traceback (most recent call last):
      File "/home/jesse/alfred/gen/scripts/augment_trajectories.py", line 264, in run
        augment_traj(env, json_file)
      File "/home/jesse/alfred/gen/scripts/augment_trajectories.py", line 244, in augment_traj
        raise Exception("WARNING: the augmented sequence length doesn't match the original")
    Exception: WARNING: the augmented sequence length doesn't match the original
    Error: Exception("WARNING: the augmented sequence length doesn't match the original")
    
    opened by jesbu1 3
  • Package 'setuptools' requires a different Python: 3.5.2 not in '>=3.6'

    I'm trying to set up ALFRED using Docker. Steps to reproduce:

    git clone https://github.com/askforalfred/alfred.git alfred
    export ALFRED_ROOT=$(pwd)/alfred
    cd $ALFRED_ROOT
    python scripts/docker_build.py 
    

    The last command fails with

    ERROR: Package 'setuptools' requires a different Python: 3.5.2 not in '>=3.6'
    WARNING: You are using pip version 19.3.1; however, version 20.3.4 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    The command '/bin/sh -c pip install -U setuptools' returned a non-zero code: 1
    
    opened by TopCoder2K 15
  • KeyError: 'pddl_params' when running generate_trajectories.py

    Hi, I get the error KeyError: 'pddl_params' when running generate_trajectories.py, as shown in the attached screenshot.

    I was wondering what could be wrong? Thanks!

    bug 
    opened by wang-sj16 4