ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

ALFRED

Last update: Dec 15, 2022

Overview

ALFRED

A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk,
Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
CVPR 2020

ALFRED (Action Learning From Realistic Environments and Directives) is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. Long compositional rollouts with non-reversible state changes are among the phenomena we include to shrink the gap between research benchmarks and real-world applications.

For the latest updates, see: askforalfred.com

What's more? Check out ALFWorld – interactive TextWorld environments for ALFRED scenes!

Quickstart

Clone repo:

$ git clone https://github.com/askforalfred/alfred.git alfred
$ export ALFRED_ROOT=$(pwd)/alfred

Install requirements:

$ virtualenv -p $(which python3) --system-site-packages alfred_env # or whichever package manager you prefer
$ source alfred_env/bin/activate

$ cd $ALFRED_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Download Trajectory JSONs and ResNet features (~17GB):

$ cd $ALFRED_ROOT/data
$ sh download_data.sh json_feat

Train models:

$ cd $ALFRED_ROOT
$ python models/train/train_seq2seq.py --data data/json_feat_2.1.0 --model seq2seq_im_mask --dout exp/model:{model},name:pm_and_subgoals_01 --splits data/splits/oct21.json --gpu --batch 8 --pm_aux_loss_wt 0.1 --subgoal_aux_loss_wt 0.1

More Info

Dataset: Downloading full dataset, Folder structure, JSON structure.
Models: Training and Evaluation, File structure, Pre-trained models.
Data Generation: Generation, Replay Checks, Data Augmentation (high-res, depth, segmentation masks etc).
Errata: Updated numbers for Goto subgoal evaluation.
THOR 2.1.0 Docs: Deprecated documentation from Ai2-THOR 2.1.0 release.
FAQ: Frequently Asked Questions.

SOTA Models

Open-source models that outperform the Seq2Seq baselines from ALFRED:

Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, Chen Sun
Paper, Code

MOCA: A Modular Object-Centric Approach for Interactive Instruction Following
Kunal Pratap Singh*, Suvaansh Bhambri*, Byeonghwi Kim*, Roozbeh Mottaghi, Jonghyun Choi
Paper, Code

Contact Mohit to add your model here.

Prerequisites

Python 3
PyTorch 1.1.0
Torchvision 0.3.0
AI2THOR 2.1.0

See requirements.txt for all prerequisites

Hardware

Tested on:

GPU - GTX 1080 Ti (11GB)
CPU - Intel Xeon (Quad Core)
RAM - 16GB
OS - Ubuntu 16.04

Leaderboard

Run your model on test seen and unseen sets, and create an action-sequence dump of your agent:

$ cd $ALFRED_ROOT
$ python models/eval/leaderboard.py --model_path <model_path>/model.pth --model models.model.seq2seq_im_mask --data data/json_feat_2.1.0 --gpu --num_threads 5

This will create a JSON file, e.g. task_results_20191218_081448_662435.json, inside the <model_path> folder. Submit this JSON here: AI2 ALFRED Leaderboard. For rules and restrictions, see the getting started page.

Rules:

You are only allowed to use RGB and language instructions (goal & step-by-step) as input for your agents. You cannot use additional depth, mask, metadata info etc. from the simulator on Test Seen and Test Unseen scenes. However, during training you are allowed to use additional info for auxiliary losses etc.
During evaluation, agents are restricted to max_steps=1000 and max_fails=10. Do not change these settings in the leaderboard script; these modifications will not be reflected in the evaluation server.
Pick a legible model name for the submission. Just "baseline" is not very descriptive.
All submissions must be attempts to solve the ALFRED dataset.
Answer the following questions in the description: a. Did you use additional sensory information from THOR as input, e.g. depth, segmentation masks, class masks, panoramic images etc., during test-time? If so, please report them. b. Did you use the alignments between step-by-step instructions and expert action-sequences for training or testing? (no by default; the instructions are serialized into a single sentence)
Share who you are: provide a team name and affiliation.
(Optional) Share how you solved it: if possible, share information about how the task was solved. Link an academic paper or code repository if public.
Only submit your own work: you may evaluate any model on the validation set, but must only submit your own work for evaluation against the test set.
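
The step and failure limits in the rules above can be pictured as a simple episode loop. This is only an illustrative sketch: `run_episode`, `policy`, and `step_fn` are hypothetical names and not part of the ALFRED evaluation code, and the exact boundary handling (ending on the tenth failed action) is an assumption. Only the values max_steps=1000 and max_fails=10 come from the rules.

```python
MAX_STEPS = 1000  # from the leaderboard rules
MAX_FAILS = 10    # from the leaderboard rules


def run_episode(policy, step_fn, max_steps=MAX_STEPS, max_fails=MAX_FAILS):
    """Run one episode until the policy stops or a limit is hit.

    policy(t) returns an action string, or "stop" to end the episode.
    step_fn(action) returns True if the action succeeded.
    Both callables are hypothetical stand-ins for a real agent and simulator.
    """
    steps, fails = 0, 0
    while steps < max_steps:
        action = policy(steps)
        if action == "stop":
            return {"steps": steps, "fails": fails, "reason": "stop"}
        steps += 1
        if not step_fn(action):
            fails += 1
            if fails >= max_fails:
                return {"steps": steps, "fails": fails, "reason": "max_fails"}
    return {"steps": steps, "fails": fails, "reason": "max_steps"}
```

An agent that never emits a stop action is cut off by one of the two limits, so its reported episode length is bounded regardless of what the policy does.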

Docker Setup

Install Docker and NVIDIA Docker.

Modify docker_build.py and docker_run.py to your needs.

Build

Build the image:

$ python scripts/docker_build.py

Run (Local)

For local machines:

$ python scripts/docker_run.py
 
  source ~/alfred_env/bin/activate
  cd $ALFRED_ROOT

Run (Headless)

For headless VMs and Cloud-Instances:

$ python scripts/docker_run.py --headless 

  # inside docker
  tmux new -s startx  # start a new tmux session

  # start nvidia-xconfig (might have to run this twice)
  sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
  sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

  # start X server on DISPLAY 0
  # single X server should be sufficient for multiple instances of THOR
  sudo python ~/alfred/scripts/startx.py 0  # if this throws errors e.g "(EE) Server terminated with error (1)" or "(EE) already running ..." try a display > 0

  # detach from tmux shell
  # Ctrl+b then d

  # source env
  source ~/alfred_env/bin/activate
  
  # set DISPLAY variable to match X server
  export DISPLAY=:0

  # check THOR
  cd $ALFRED_ROOT
  python scripts/check_thor.py

  ###############
  ## (300, 300, 3)
  ## Everything works!!!

You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.

Cloud Instance

ALFRED can be set up on headless machines like AWS or Google Cloud instances. The main requirement is that you have access to a GPU machine that supports OpenGL rendering. Run startx.py in a tmux shell:

# start tmux session
$ tmux new -s startx 

# start X server on DISPLAY 0
# single X server should be sufficient for multiple instances of THOR
$ sudo python $ALFRED_ROOT/scripts/startx.py 0  # if this throws errors e.g "(EE) Server terminated with error (1)" or "(EE) already running ..." try a display > 0

# detach from tmux shell
# Ctrl+b then d

# set DISPLAY variable to match X server
$ export DISPLAY=:0

# check THOR
$ cd $ALFRED_ROOT
$ python scripts/check_thor.py

###############
## (300, 300, 3)
## Everything works!!!

You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.
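
To see which display number the current shell points at before editing X_DISPLAY, a small helper can parse the DISPLAY variable. This is a minimal sketch; `display_number` is a hypothetical name, and it assumes the usual X11 form `[host]:display[.screen]`.

```python
import os


def display_number(display=None):
    """Return the display number from an X11 DISPLAY string such as ':0' or 'localhost:1.0'."""
    if display is None:
        display = os.environ.get("DISPLAY", "")
    if ":" not in display:
        raise ValueError("DISPLAY is unset or malformed: %r" % display)
    # The text after the last colon is 'display[.screen]'.
    number = display.rsplit(":", 1)[1].split(".", 1)[0]
    if not number.isdigit():
        raise ValueError("DISPLAY is unset or malformed: %r" % display)
    return number
```

If the X server was started with `startx.py 1`, the shell should export `DISPLAY=:1`, and this helper would return "1" for it.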

Also, check out this guide: Setting up THOR on Google Cloud

Citation

If you find the dataset or code useful, please cite:

@inproceedings{ALFRED20,
  title ={{ALFRED: A Benchmark for Interpreting Grounded
           Instructions for Everyday Tasks}},
  author={Mohit Shridhar and Jesse Thomason and Daniel Gordon and Yonatan Bisk and
          Winson Han and Roozbeh Mottaghi and Luke Zettlemoyer and Dieter Fox},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020},
  url  = {https://arxiv.org/abs/1912.01734}
}

License

MIT License

Change Log

14/10/2020:

Added errata for Goto subgoal evaluation.

28/10/2020:

Added --use_templated_goals option to train with templated goals instead of human-annotated goal descriptions.

26/10/2020:

Fixed missing stop-frame in Modeling Quickstart dataset (json_feat_2.1.0.zip).

07/04/2020:

Updated download links. Switched from Google Cloud to AWS. Old download links will be deactivated.

28/03/2020:

Updated the mask-interaction API to use IoU scores instead of max pixel count for selecting objects.
Results table in the paper will be updated with new numbers.
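
The difference between the two selection criteria can be illustrated with a small sketch. It is not the repository's implementation: `iou` and `select_object` are hypothetical helpers, masks are plain lists of 0/1 rows, and the object ids are made up.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


def select_object(pred_mask, instance_masks):
    """Return the id of the instance whose mask has the highest IoU with pred_mask.

    instance_masks maps a (hypothetical) object id to its binary mask.
    Returns None if no instance overlaps the predicted mask.
    """
    best_id, best_score = None, 0.0
    for obj_id, mask in instance_masks.items():
        score = iou(pred_mask, mask)
        if score > best_score:
            best_id, best_score = obj_id, score
    return best_id
```

With a raw pixel-count criterion, a large object such as a table can win simply because it covers more of the predicted mask. IoU also penalizes the area that does not overlap, so a small, well-matched object is preferred.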

Contact

Questions or issues? Contact [email protected]

Comments

The process is being "Killed"

Hi there,

After preprocessing, the process is being killed after a couple of warnings, as follows:

warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

Killed: 0%|▏ | 4/2628 [00:11<2:09:00, 2.95s/it]

Also, all the evaluations are being killed for both best_seen and best_unseen checkpoints:

{'tests_seen': 1533, 'tests_unseen': 1529, 'train': 21023, 'valid_seen': 820, 'valid_unseen': 821} Loading: best_checkpoints/best_unseen.pth Killed

{'tests_seen': 1533, 'tests_unseen': 1529, 'train': 21023, 'valid_seen': 820, 'valid_unseen': 821} Loading: best_checkpoints/best_seen.pth Killed

I was wondering whether somebody who has faced this issue could help. I've already installed all the dependencies as mentioned, and added CUDA 10.0 for the project.

opened by NavidRajabi 29

How do I understand the npy file of the floorplan layout?

Hi, thanks for sharing this amazing dataset! I am trying to get the floorplan from the npy file, and I have used NumPy to load the file in Python. However, I find it difficult to understand the NumPy array I got from the npy file. How should I interpret the content of the floorplan npy files? Thank you.

Here's the code I use to load the file:

import numpy as np
npyData = np.load("example/fp2/FloorPlan2-layout.npy")
print(npyData.shape)
print(npyData)

Here's the output of the shape: (115, 2) and the array:

[[-0.75 -0.75]
 [-0.5  -0.75]
 [-0.25 -0.75]
 [ 0.   -0.75]
 [ 0.25 -0.75]
 [ 0.5  -0.75]
 [ 0.75 -0.75]
 [ 1.   -0.75]
 [-0.75 -0.5 ]
 [ 0.75 -0.5 ]
 [ 1.   -0.5 ]
 [-0.75 -0.25]
 [ 0.75 -0.25]
 [ 1.   -0.25]
 [-0.75  0.  ]
 [ 0.75  0.  ]
 [ 1.    0.  ]
 [ 1.25  0.  ]
 [-0.75  0.25]
 [ 0.75  0.25]
 [ 1.    0.25]
 [ 1.25  0.25]
 [-0.75  0.5 ]
 [ 0.75  0.5 ]
 [ 1.    0.5 ]
 [ 1.25  0.5 ]
 [-0.75  0.75]
 [ 0.75  0.75]
 [ 1.    0.75]
 [ 1.25  0.75]
 [-0.75  1.  ]
 [ 0.75  1.  ]
 [ 1.    1.  ]
 [ 1.25  1.  ]
 [-0.75  1.25]
 [ 0.75  1.25]
 [ 1.    1.25]
 [ 1.25  1.25]
 [-1.25  1.5 ]
 [-1.    1.5 ]
 [-0.75  1.5 ]
 [ 0.75  1.5 ]
 [ 1.    1.5 ]
 [-1.25  1.75]
 [-1.    1.75]
 [-0.75  1.75]
 [-0.5   1.75]
 [ 0.5   1.75]
 [ 0.75  1.75]
 [-1.5   2.  ]
 [-1.25  2.  ]
 [-1.    2.  ]
 [-0.75  2.  ]
 [-0.5   2.  ]
 [-0.25  2.  ]
 [ 0.    2.  ]
 [ 0.25  2.  ]
 [ 0.5   2.  ]
 [ 0.75  2.  ]
 [-1.75  2.25]
 [-1.5   2.25]
 [-1.25  2.25]
 [-1.    2.25]
 [-0.75  2.25]
 [-0.5   2.25]
 [-0.25  2.25]
 [ 0.    2.25]
 [ 0.25  2.25]
 [ 0.5   2.25]
 [ 0.75  2.25]
 [-2.5   2.5 ]
 [-2.25  2.5 ]
 [-2.    2.5 ]
 [-1.75  2.5 ]
 [-1.5   2.5 ]
 [-1.25  2.5 ]
 [-1.    2.5 ]
 [-0.75  2.5 ]
 [-0.5   2.5 ]
 [-0.25  2.5 ]
 [ 0.    2.5 ]
 [ 0.25  2.5 ]
 [ 0.5   2.5 ]
 [ 0.75  2.5 ]
 [-2.5   2.75]
 [-2.25  2.75]
 [-2.    2.75]
 [-1.75  2.75]
 [-1.5   2.75]
 [-1.25  2.75]
 [-1.    2.75]
 [-0.75  2.75]
 [-0.5   2.75]
 [-0.25  2.75]
 [ 0.    2.75]
 [ 0.25  2.75]
 [ 0.5   2.75]
 [ 0.75  2.75]
 [-2.5   3.  ]
 [-2.25  3.  ]
 [-2.    3.  ]
 [-1.75  3.  ]
 [-1.5   3.  ]
 [-1.25  3.  ]
 [-1.    3.  ]
 [-0.75  3.  ]
 [-0.5   3.  ]
 [-0.25  3.  ]
 [ 0.    3.  ]
 [ 0.25  3.  ]
 [ 0.5   3.  ]
 [ 0.75  3.  ]
 [-1.    3.25]
 [-0.75  3.25]
 [-0.5   3.25]]
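
One way to inspect such an array is to rasterize it into a small text map. The sketch below assumes each row is an (x, z) position in meters on a 0.25 grid (the spacing is visible in the values above) and that the listed points are positions the agent can stand on. That second reading is an assumption, not something confirmed in this thread. `render_layout` is a hypothetical helper.

```python
def render_layout(points, step=0.25):
    """Rasterize (x, z) points into rows of '#' (listed) and '.' (not listed)."""
    cols = [round(x / step) for x, _ in points]
    rows = [round(z / step) for _, z in points]
    min_c, max_c = min(cols), max(cols)
    min_r, max_r = min(rows), max(rows)
    occupied = set(zip(rows, cols))
    lines = []
    for r in range(max_r, min_r - 1, -1):  # largest z first, so "up" is +z
        lines.append("".join(
            "#" if (r, c) in occupied else "." for c in range(min_c, max_c + 1)
        ))
    return "\n".join(lines)
```

For the file above, something like `print(render_layout(npyData.tolist()))` would print the outline of the listed positions, with gaps where furniture or walls presumably block the floor.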

opened by wenda-gu 12

Get different feature vectors when loading from feat_conv.pt and from resnet.featurize()

Hi,

I am doing this example: full_2.1.0/train/pick_and_place_with_movable_recep-ButterKnife-Cup-SinkBasin-2/trial_T20190908_233322_447979/raw_images

For image 000000000.jpg

feat = resnet.featurize([Image.open(fname)], batch=1)
print(feat[0][:5,:5])

I get

tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0188, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.6659, 0.2334, 0.4881, 0.0220],
         [0.0000, 0.0278, 0.0000, 0.1306, 0.0000, 0.0177, 0.0000],
         [0.2511, 0.9387, 0.5719, 0.2475, 0.1024, 0.3862, 0.1884],
         [1.4310, 1.5767, 0.8272, 0.0000, 0.0000, 0.0000, 0.3523]],

        [[0.0000, 0.0549, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.1023, 0.2866, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.4572, 0.0000, 0.0000, 0.0000, 0.4733, 0.8688, 0.6622],
         [0.9684, 0.0573, 0.0000, 0.0000, 0.0489, 0.0000, 0.0655]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],

        [[1.8689, 2.2256, 0.6508, 1.0258, 0.5759, 0.9021, 0.6726],
         [2.5713, 3.0914, 1.0797, 1.3719, 0.9788, 1.8322, 1.6944],
         [3.7268, 3.6742, 1.5358, 1.2200, 0.9661, 2.6235, 2.2480],
         [4.2898, 4.0467, 1.9082, 0.6326, 0.2264, 1.4761, 1.8504],
         [4.6841, 4.2882, 1.2279, 0.0133, 0.0000, 0.8532, 1.6886]],

        [[0.0000, 0.0000, 0.0000, 0.1095, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.3814, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.7656, 0.0000, 0.4508, 0.0000],
         [0.0000, 0.0000, 0.0000, 1.1697, 0.3166, 0.8373, 0.0000],
         [0.0000, 0.0000, 0.0052, 1.3592, 0.6858, 1.1441, 0.0000]]])

When using

x = torch.load("feat_conv.pt")
print(x[0][:5,:5])

I get

tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0987, 0.0000, 0.4179, 0.0000, 0.3932, 0.0267],
         [0.1942, 0.3280, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.4623, 1.1134, 0.7551, 0.0720, 0.0000, 0.2512, 0.1772],
         [1.6237, 1.7352, 1.0370, 0.0000, 0.0000, 0.0000, 0.2772]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.1637, 0.0102, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.2434, 0.0000, 0.0000, 0.0000, 0.1376, 0.5788, 0.3524],
         [0.7411, 0.0000, 0.0000, 0.0000, 0.2674, 0.1301, 0.0218]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],

        [[1.7606, 1.9361, 0.6170, 1.0957, 0.7715, 1.0960, 0.9179],
         [2.4290, 2.7654, 0.9339, 1.2526, 1.0871, 1.8147, 1.7731],
         [3.6967, 3.4739, 1.3849, 1.0991, 1.0435, 2.3166, 1.9403],
         [4.4716, 4.3128, 2.0116, 0.7454, 0.2938, 1.2445, 1.4761],
         [5.1107, 4.7808, 1.3802, 0.2358, 0.0000, 1.0597, 1.6283]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.1443, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.6003, 0.0000, 0.3137, 0.0000],
         [0.0000, 0.0000, 0.0000, 1.2218, 0.5109, 0.9440, 0.0120],
         [0.0000, 0.0000, 0.0878, 1.5063, 0.8553, 1.0575, 0.0000]]])

opened by shuyanzhou 11

UPDATE: Unity process crashes with driver mismatch inside ai2thor-docker with startx.py, Ubuntu 18.04

I've been following along with #48 since I'm also trying to run ALFRED evaluation with THOR on a headless machine where I don't have root access. So far, I've modified the ai2thor-docker repo so that it installs ai2thor==2.1.0. I also had to add RUN pip3 install --upgrade torch torchvision to the Dockerfile because of compatibility issues with PyTorch being 1.1.0 instead of 1.6.0 and torchvision being 0.3.0 instead of 0.7.0; I was getting errors like

{'tests_seen': 1533,
 'tests_unseen': 1529,
 'train': 21023,
 'valid_seen': 820,
 'valid_unseen': 821}
Loading:  exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth
Traceback (most recent call last):
  File "/usr/lib/python3.6/tarfile.py", line 188, in nti
    s = nts(s, "ascii", "strict")
  File "/usr/lib/python3.6/tarfile.py", line 172, in nts
    return s.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 1: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/tarfile.py", line 2299, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.6/tarfile.py", line 1093, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/usr/lib/python3.6/tarfile.py", line 1035, in frombuf
    chksum = nti(buf[148:156])
  File "/usr/lib/python3.6/tarfile.py", line 191, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 556, in _load
    return legacy_load(f)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 467, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/usr/lib/python3.6/tarfile.py", line 1591, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1621, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1484, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.6/tarfile.py", line 2311, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "models/eval/eval_seq2seq.py", line 54, in <module>
    eval = EvalTask(args, manager)
  File "/app/alfred/models/eval/eval.py", line 31, in __init__
    self.model, optimizer = M.Module.load(self.args.model_path)
  File "/app/alfred/models/model/seq2seq.py", line 318, in load
    save = torch.load(fsave)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 560, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth is a zip archive (did you mean to use torch.jit.load()?)

I started with a pretty naive approach where I just moved my ALFRED repo with the quickstart data and the model checkpoints I wanted to evaluate into the Docker build context and copied all of it into the Docker image (which takes a while, but that's a "me" problem). Unfortunately, I get a bus error when attempting to run evaluation on my saved checkpoint, even if I generate the checkpoint by training inside the Docker container:

{'tests_seen': 1533,
 'tests_unseen': 1529,
 'train': 21023,
 'valid_seen': 820,
 'valid_unseen': 821}
Loading:  exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth
./test.sh: line 3:   117 Bus error               (core dumped) python3 models/eval/eval_seq2seq.py --model_path exp/model:seq2seq_im_mask,name:pm_and_subgoals_01/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1

Update: Tried cloning the alfred repo and downloading the data from inside the docker and training from scratch, but same issue.

The reason I used torch==1.6.0 and torchvision==0.7.0 instead of torch==1.1.0 and torchvision==0.3.0 is that it silences the error

Traceback (most recent call last):
  File "models/train/train_seq2seq.py", line 103, in <module>
    model = model.to(torch.device('cuda'))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 386, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 127, in _apply
    self.flatten_parameters()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I suspect the bus error has to do with the version differences, but I'm not quite sure yet.
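
The "is a zip archive" message in the first traceback is what older PyTorch versions report when given a checkpoint written by PyTorch 1.6 or later, which saves in a zipfile-based format by default. The stdlib-only sketch below checks which format a checkpoint file uses without importing torch. `checkpoint_format` is a hypothetical helper, and this check says nothing about the cause of the bus error.

```python
import zipfile


def checkpoint_format(path):
    """Return 'zip' for the zipfile-based checkpoint format, else 'legacy'."""
    return "zip" if zipfile.is_zipfile(path) else "legacy"
```

If a checkpoint has to be read by an older PyTorch, saving it under PyTorch 1.6+ with `torch.save(obj, path, _use_new_zipfile_serialization=False)` writes the legacy format instead.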

opened by jzhanson 11

Run alfred on headless servers without root account

Hello there,

I'm trying to deploy the code on headless servers on which I don't have root access. Jobs are submitted to the servers via a job scheduler, so I can't even ssh into them.

I followed your guide in #29, but I got an error when running startx.py. It seems that running it requires root privileges. Could you give me a hint on how to work around this problem?

Thank you a lot!

Below is the full output:

python startx.py
Starting X on DISPLAY=:0

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:61:0:0"
EndSection


Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection


Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:62:0:0"
EndSection


Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection


Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:177:0:0"
EndSection


Section "Screen"
    Identifier     "Screen2"
    Device         "Device2"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection


Section "Device"
    Identifier     "Device3"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:178:0:0"
EndSection


Section "Screen"
    Identifier     "Screen3"
    Device         "Device3"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection


Section "ServerLayout"
    Identifier     "Layout0"
    Screen 0 "Screen0" 0 0
    Screen 1 "Screen1" 0 0
    Screen 2 "Screen2" 0 0
    Screen 3 "Screen3" 0 0
EndSection

(EE) 
Fatal server error:
(EE) PAM authentication failed, cannot start X server.
	Perhaps you do not have console ownership?
(EE) 
(EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
(EE)

opened by davidnvq 10

Leaderboard submission

Hello,

I am assuming the file which should be submitted is tests_actseqs_dump_{datetime}.json. When I submit the file to the leaderboard, the test status is succeeded but the numbers do not show up. May I ask if this is because my .json file is in the wrong format?

Thanks so much!

Best, Muqiao

opened by muqiaoy 9

Why repeat the same frame to predict the stop action?

Hi,

I have a question about this line. https://github.com/askforalfred/alfred/blob/6d78d8085699da35371c5f65339a470ffcf89a3e/models/model/seq2seq_im_mask.py#L121

If I understand correctly, frames contains an image which conditions the prediction of action_low of the same index. I assume that the im[-1] in this line is the image when executing the last actual action (e.g. Slice an object), but to predict the stop action, it would be natural to use a new image after the last actual action. Why does it repeat the same frame? Or is my understanding incorrect? Thanks!

opened by Ryou0634 8

Evaluation Error, "No Protocol Specified"

Hello, my environment is

Ubuntu 20.04
GTX 1060

I downloaded the pre-trained model and set up a Docker environment to evaluate it. When I run the following command after running script/run_docker.py, an error occurs.

(alfred_env) yuki@yuki-lab:~/alfred$ python models/eval/eval_seq2seq.py --model_path baseline/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 3
{'tests_seen': 1533,
 'tests_unseen': 1529,
 'train': 21023,
 'valid_seen': 820,
 'valid_unseen': 821}
Loading:  baseline/best_seen.pth
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /home/yuki/.cache/torch/checkpoints/resnet18-5c106cde.pth
100%|######################################################################################################################################################| 46827520/46827520 [00:04<00:00, 11243319.04it/s]
thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   4.3 MiB/s]  of 390.MB
thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                               70%   3.0 MiB/s]  of 390.MBNo protocol specified
Found path: /home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
thor-201909061227-Linux64: [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                          72%   2.9 MiB/s]  of 390.MBPreloaded 'ScreenSelector.so'
PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence
PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor
Logging to /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
No protocol specified
thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   3.4 MiB/s]  of 390.MB
thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   3.3 MiB/s]  of 390.MB
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/yuki/alfred/models/eval/eval_task.py", line 20, in run
    env = ThorEnv()
  File "/home/yuki/alfred/env/thor_env.py", line 33, in __init__
    player_screen_width=player_screen_width)
  File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 858, in start
    self.download_binary()
  File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 796, in download_binary
    os.rename(extract_dir, os.path.join(self.releases_dir(), self.build_name()))
OSError: [Errno 39] Directory not empty: '/home/yuki/.ai2thor/tmp/thor-201909061227-Linux64' -> '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64'
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/yuki/alfred/models/eval/eval_task.py", line 20, in run
    env = ThorEnv()
  File "/home/yuki/alfred/env/thor_env.py", line 33, in __init__
    player_screen_width=player_screen_width)
  File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 858, in start
    self.download_binary()
  File "/home/yuki/alfred_env/lib/python3.5/site-packages/ai2thor/controller.py", line 796, in download_binary
    os.rename(extract_dir, os.path.join(self.releases_dir(), self.build_name()))
OSError: [Errno 39] Directory not empty: '/home/yuki/.ai2thor/tmp/thor-201909061227-Linux64' -> '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64'

When I set --num_threads 1, eval_seq2seq.py prints "No protocol specified".

yuki@yuki-lab:~/alfred$ python models/eval/eval_seq2seq.py --model_path <model_path>/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 3 --subgoals all^C
yuki@yuki-lab:~/alfred$ python models/eval/eval_seq2seq.py --model_path baseline/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1 
{'tests_seen': 1533,
 'tests_unseen': 1529,
 'train': 21023,
 'valid_seen': 820,
 'valid_unseen': 821}
Loading:  baseline/best_seen.pth
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /home/yuki/.cache/torch/checkpoints/resnet18-5c106cde.pth
100%|######################################################################################################################################################| 46827520/46827520 [00:04<00:00, 11261491.30it/s]
thor-201909061227-Linux64: [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%   7.6 MiB/s]  of 390.MB
No protocol specified
Found path: /home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64
Mono path[0] = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Managed'
Mono config path = '/home/yuki/.ai2thor/releases/thor-201909061227-Linux64/thor-201909061227-Linux64_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence
PlayerPrefs - Creating folder: /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor
Logging to /home/yuki/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
No protocol specified

Is this message expected? And how can I deal with the first problem?

Thank you.

opened by yukiTakezawa 8

Why does `feat_conv.pt` have 10 more frames than the number of images?

Hi. Thanks for the amazing repository.

I find that feat_conv.pt has 10 more frames than there are images. For example,

for task=pick_cool_then_place_in_recep-LettuceSliced-None-DiningTable-17/trial_T20190909_070538_437648, there are 455 images in traj_data['images'], but feat_conv is of shape 465x512x7x7

Similarly for task=pick_two_obj_and_place-Newspaper-None-GarbageCan-218/trial_T20190907_225356_202464, there are 530 images in traj_data['images'] but feat_conv is of shape 540x512x7x7

Is there any particular reason why this is the case?

opened by TheShadow29 8

It seems the evaluation might have some bugs

This is returning False. However, it should be True. This is from: pick_clean_then_place_in_recep-AppleSliced-None-DiningTable-27/trial_T20190907_151802_277016

opened by nikepupu 7

Converting object positions to planner target poses

Hi,

We were looking into writing reward functions, based on the ALFRED dataset's annotated skills, that work independently of expert trajectories (so that we can do RL). Everything is complete except for the skills corresponding to GotoLocation actions.

Here, we're having an issue with using x/y/z distances + visibility checks to ensure the agent is near a specified object from a GotoLocation task (e.g. table), so we wanted to instead convert the object positions to target agent poses for the planner.

We were thinking of converting object locations (x/y/z and rotation, for moveable objects within a scene) to the discrete pose used in the ground truth scene graph, then finding the nearest location in the graph that the agent can navigate to, and using the A* planning distance threshold.

Given this, how do we actually convert the event metadata object poses to the target discrete locations as presented in the dataset? I'm not sure how to calculate a correct target agent pose from an object location. One issue is finding the right discrete orientation so that the agent is actually facing the object, and another is ensuring that the converted discrete object location is actually reachable.
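For illustration, the conversion described above could be sketched as follows. This is not the method used to build the dataset. It assumes that the reachable positions are available as a list of (x, z) grid points, that the agent rotates in 90-degree steps, and that a yaw of 0 looks along +z and 90 along +x. All function names are hypothetical.

```python
import math


def nearest_reachable(obj_xz, reachable_xz):
    """Pick the reachable (x, z) grid point closest to the object."""
    return min(
        reachable_xz,
        key=lambda p: math.hypot(p[0] - obj_xz[0], p[1] - obj_xz[1]),
    )


def facing_rotation(agent_xz, obj_xz, step=90):
    """Snap the yaw that faces the object to the nearest multiple of `step`."""
    dx = obj_xz[0] - agent_xz[0]
    dz = obj_xz[1] - agent_xz[1]
    # Assumption: yaw 0 looks along +z and yaw 90 along +x.
    yaw = math.degrees(math.atan2(dx, dz)) % 360
    return int(round(yaw / step)) * step % 360


def target_pose(obj_xz, reachable_xz):
    """Return (x, z, yaw) for an agent standing near the object and facing it."""
    pos = nearest_reachable(obj_xz, reachable_xz)
    return pos[0], pos[1], facing_rotation(pos, obj_xz)


reachable = [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0)]
print(target_pose((1.0, 0.1), reachable))  # (0.5, 0.0, 90)
```

The nearest point is chosen by straight-line distance, so it ignores walls and furniture. A visibility check, or a planner distance in place of `math.hypot`, would still be needed to confirm that the agent can actually see the object from the chosen pose.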

Thanks!

opened by jesbu1 6

PDDL Trajectory Generation in ProcTHOR

Hi, Thanks for the amazing dataset.

I want to generate PDDL-based expert demonstrations in the ProcTHOR dataset, similar to the ALFRED-gen dataset.

I modified the Layout generation script to support the ProcTHOR-10k dataset.

However, I think the AI2THOR nightly build which supports ProcTHOR does not support receptacles. While applying the PutObject action in layout generation, I get this error:

  File "/home/anaconda3/envs/procthor/lib/python3.7/site-packages/ai2thor/controller.py", line 983, in step
    raise ValueError(self.last_event.metadata["errorMessage"])
ValueError: 
        Action: "PutObject" called with invalid argument: 'receptacleObjectId'
        Expected arguments: String objectId, Boolean forceAction = False, Boolean placeStationary = True, Int32 randomSeed = 0
        Your arguments: 'objectId', 'receptacleObjectId', 'forceAction', 'placeStationary'
        Valid ways to call "PutObject" action:
                Void PutObject(String objectId, Boolean forceAction = False, Boolean placeStationary = True, Int32 randomSeed = 0)
                Void PutObject(Single x, Single y, Boolean forceAction = False, Boolean placeStationary = True, Int32 randomSeed = 0, Boolean putNearXY = False)

Is there a way I can fix this issue for PDDL-based data generation in ProcTHOR?
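One possible workaround is to rewrite the action dictionary before it is passed to the controller. The sketch below keeps only the argument names listed under "Expected arguments" in the error. It also assumes that, in the newer build, the agent places whatever it is holding and the single `objectId` names the target receptacle. That assumption is not confirmed by the error message and should be checked against the documentation for the build in use. `adapt_put_object` is a hypothetical helper, not part of ai2thor.

```python
# Argument names copied from the "Expected arguments" line of the error above.
NEW_PUT_ARGS = {"objectId", "forceAction", "placeStationary", "randomSeed"}


def adapt_put_object(action):
    """Rewrite an old-style PutObject dict for the newer signature."""
    if action.get("action") != "PutObject" or "receptacleObjectId" not in action:
        return dict(action)  # nothing to adapt, return an unchanged copy
    adapted = {k: v for k, v in action.items() if k == "action" or k in NEW_PUT_ARGS}
    # Assumption: the remaining objectId refers to the receptacle, not the held object.
    adapted["objectId"] = action["receptacleObjectId"]
    return adapted


old_style = {
    "action": "PutObject",
    "objectId": "Apple|1",            # placeholder ids for illustration
    "receptacleObjectId": "Fridge|2",
    "forceAction": True,
    "placeStationary": True,
}
print(adapt_put_object(old_style))
```

The rewritten dictionary no longer contains `receptacleObjectId`, which is the argument the error rejects.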

opened by pushkalkatara 9

Mismatch between high-level and low-level actions after preprocessing

Hello!

I'm new to ALFRED and trying to learn it. After reading #84, I find there is still a mismatch between high-level and low-level actions after preprocessing. For example, in pick_two_obj_and_place-AppleSliced-None-CounterTop-10/trial_T20190907_061009_396474, the generated low_to_high_idx is [..., 10, 11, 11, 13]. However, 13 is out of bounds for the instr list and also exceeds the maximum high_idx in action_high (which is 12).

I'm not sure whether it is a bug, because it does not cause problems when training the baseline models. I think the cause is this line: https://github.com/askforalfred/alfred/blob/1898b83547b589b8635737929e04ee4f2f404177/data/preprocess.py#L212 which might be better written as:

conv['num']['action_low'][-1][0]['high_idx'] = conv['plan']['high_pddl'][-1]['high_idx'] - 1
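A small check along these lines could flag affected trajectories. It is a sketch only: `out_of_range_positions` is a hypothetical helper, and it assumes that valid high-level indices run from 0 to the number of high-level actions minus one.

```python
def out_of_range_positions(low_to_high_idx, num_high):
    """Return positions whose high-level index falls outside [0, num_high)."""
    return [i for i, h in enumerate(low_to_high_idx) if not 0 <= h < num_high]


# Tail of the example above; a maximum high_idx of 12 means 13 high-level entries.
tail = [10, 11, 11, 13]
print(out_of_range_positions(tail, 13))  # [3]
```

An empty result for every trajectory after applying the change would suggest that the indices are consistent.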
bug

opened by RavenKiller 1

Is the configuration generated by `nvidia-xconfig` not used?

Does startx.py take into account a config generated by nvidia-xconfig? It seems it doesn't since startx.py uses the following command:

Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config %s :%s

So, Xorg uses a config generated by generate_xorg_conf(devices) not by nvidia-xconfig.

I also tried to start X-server with sudo Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config /etc/X11/xorg.conf and the detected devices were better configured as far as I could see in the Xorg logs.
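To make the difference concrete, the command template quoted above can be filled in with either config path. `build_xorg_command` is a hypothetical helper written for this illustration. It is not code from startx.py, and only the command template itself comes from the script.

```python
def build_xorg_command(config_path, display):
    """Fill in the Xorg command template with a config path and display number."""
    return [
        "Xorg", "-noreset",
        "+extension", "GLX",
        "+extension", "RANDR",
        "+extension", "RENDER",
        "-config", config_path,
        ":%s" % display,
    ]


# Point Xorg at the nvidia-xconfig output instead of a generated config.
print(" ".join(build_xorg_command("/etc/X11/xorg.conf", 0)))
```

Passing the path as a parameter would let the same launcher use either the generated config or the one written by nvidia-xconfig.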

opened by TopCoder2K 1

augment_trajectories.py throws error regarding Original Image Count and New Image Count

When cloning the repo from scratch and using either just the json feats or the full dataset, running augment_trajectories.py with no changes throws this error for me for all trajectories:

Original Image Count 514, New Image Count 0
Traceback (most recent call last):
  File "/home/jesse/alfred/gen/scripts/augment_trajectories.py", line 264, in run
    augment_traj(env, json_file)
  File "/home/jesse/alfred/gen/scripts/augment_trajectories.py", line 244, in augment_traj
    raise Exception("WARNING: the augmented sequence length doesn't match the original")
Exception: WARNING: the augmented sequence length doesn't match the original
Error: Exception("WARNING: the augmented sequence length doesn't match the original")

opened by jesbu1 3

Package 'setuptools' requires a different Python: 3.5.2 not in '>=3.6'

I'm trying to set up ALFRED using Docker. Steps to reproduce:

git clone https://github.com/askforalfred/alfred.git alfred
export ALFRED_ROOT=$(pwd)/alfred
cd $ALFRED_ROOT
python scripts/docker_build.py

The last command fails with

ERROR: Package 'setuptools' requires a different Python: 3.5.2 not in '>=3.6'
WARNING: You are using pip version 19.3.1; however, version 20.3.4 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c pip install -U setuptools' returned a non-zero code: 1

opened by TopCoder2K 15

KeyError: 'pddl_params' when running generate_trajectories.py

Hi, I get the error KeyError: 'pddl_params' when running generate_trajectories.py.

What could be going wrong? Thanks!
bug

opened by wang-sj16 4


alfred-py: A deep learning utility library for human

Alfred Alfred is a command-line tool for deep learning usage. If you want to split a video into image frames or combine frames into a single video, then a

800 Jan 3, 2023

Train an RL agent to execute natural language instructions in a 3D Environment (PyTorch)

Gated-Attention Architectures for Task-Oriented Language Grounding This is a PyTorch implementation of the AAAI-18 paper: Gated-Attention Architecture

234 Nov 5, 2022

A Repository of Community-Driven Natural Instructions

A Repository of Community-Driven Natural Instructions TLDR; this repository maintains a community effort to create a large collection of tasks and the

244 Jan 4, 2023

git《Self-Attention Attribution: Interpreting Information Interactions Inside Transformer》(AAAI 2021) GitHub:

Self-Attention Attribution This repository contains the implementation for AAAI-2021 paper Self-Attention Attribution: Interpreting Information Intera

60 Dec 29, 2022

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

489 Jan 7, 2023

Official implementation of GraphMask as presented in our paper Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.

GraphMask This repository contains an implementation of GraphMask, the interpretability technique for graph neural networks presented in our ICLR 2021

29 Sep 2, 2022

[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing Figure: High-quality facial attributes editing results with InterFaceGA

GenForce: May Generative Force Be with You

1.3k Dec 29, 2022

[CVPR 2021] A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts

Visual-Reasoning-eXplanation [CVPR 2021 A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts] Project Page | Vid

54 Dec 21, 2022

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing Figure: High-quality facial attributes editing results with InterFaceGA

1.3k Jan 9, 2023

Code to reproduce the results in the paper "Tensor Component Analysis for Interpreting the Latent Space of GANs".

Tensor Component Analysis for Interpreting the Latent Space of GANs [ paper | project page ] Code to reproduce the results in the paper "Tensor Compon

4 Jun 17, 2022

Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ResDAVEnet-VQ Official PyTorch implementation of Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech What is in this repo? M

21 Aug 23, 2022

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations This repository contains the data, scripts and baseline co

51 Dec 17, 2022

Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Knover Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out eff

607 Dec 31, 2022

PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner [Li et al., 2020].

VGPL-Visual-Prior PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner (VGPL). Give

8 Dec 29, 2022

[CVPR'22] Official PyTorch Implementation of Collaborative Transformers for Grounded Situation Recognition

[CVPR'22] Collaborative Transformers for Grounded Situation Recognition Paper | Model Checkpoint This is the official PyTorch implementation of Collab

29 Dec 10, 2022

This is the official repository for evaluation on the NoW Benchmark Dataset. The goal of the NoW benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods from a single image under variations in viewing angle, lighting, and common occlusions.

NoW Evaluation This is the official repository for evaluation on the NoW Benchmark Dataset. The goal of the NoW benchmark is to introduce a standard e

71 Dec 30, 2022
