TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa

Last update: Dec 9, 2022


Overview

TEACh

Aishwarya Padmakumar*, Jesse Thomason*, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment. The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE). Please include appropriate licensing and attribution when using our data and code, and please cite our paper.

Prerequisites

python3 >=3.7,<=3.8
python3.x-dev, example: sudo apt install python3.8-dev
tmux, example: sudo apt install tmux
xorg, example: sudo apt install xorg openbox
ffmpeg, example: sudo apt install ffmpeg
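The prerequisites above can be checked before installing. The following is a minimal sketch, not part of TEACh: `check_prerequisites` is a hypothetical helper, and `Xorg` as the binary installed by the xorg package is an assumption. It checks the Python version range and whether the command-line tools are on `PATH`.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

# Tools named in the prerequisites list. "Xorg" is an ASSUMPTION about the
# binary name installed by the xorg package.
REQUIRED_TOOLS = ("tmux", "ffmpeg", "Xorg")


def check_prerequisites(
    version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The README pins Python to >=3.7,<=3.8, i.e. 3.7.x or 3.8.x.
    if not ((3, 7) <= tuple(version) <= (3, 8)):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.7-3.8")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The `which` parameter is injectable so the check can be exercised without the tools installed. It does not verify the python3.x-dev headers.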

Installation

pip install -r requirements.txt
pip install -e .

Downloading the dataset

Run the following script:

teach_download

This downloads and extracts the archive files (experiment_games.tar.gz, all_games.tar.gz, images_and_states.tar.gz, edh_instances.tar.gz and tfd_instances.tar.gz) into the default directory (/tmp/teach-dataset).
Optional arguments:

-d/--directory: The location to store the dataset in. Default=/tmp/teach-dataset.
-se/--skip-extract: If set, skip extracting archive files.
-sd/--skip-download: If set, skip downloading archive files.
-f/--file: Specify the name of the file to retrieve from the S3 bucket.
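After downloading, it can help to confirm the extracted layout before running inference. A minimal sketch, assuming (from the Evaluation section and the paths quoted in the issues on this page) that the data directory contains `games/<split>` and `edh_instances/<split>` subdirectories; `missing_dataset_paths` is a hypothetical helper.

```python
import os
from typing import List


def missing_dataset_paths(data_dir: str = "/tmp/teach-dataset", split: str = "valid_seen") -> List[str]:
    """Return the expected split directories that do not exist under data_dir."""
    # ASSUMPTION: these subdirectory names match what teach_download extracts.
    expected = [
        os.path.join(data_dir, "games", split),
        os.path.join(data_dir, "edh_instances", split),
    ]
    return [path for path in expected if not os.path.isdir(path)]
```

For example, `missing_dataset_paths("/tmp/teach-dataset", "valid_seen")` returns an empty list when both split directories are present.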

Remote Server Setup

If running on a remote server without a display, the following setup is needed to run episode replay, model inference, or any model training that invokes the simulator (student forcing / RL).

Start an X-server

tmux
sudo python ./bin/startx.py

Detach from the tmux session (Ctrl+B, then D). Run all other commands in the main terminal or in different sessions.

Replaying episodes

Most users should not need to do this since we provide this output in images_and_states.tar.gz.

The following steps can be used to read a .json file of a gameplay session, play it in the AI2-THOR simulator, and at each time step save egocentric observations of the Commander and Driver (Follower in the paper). It also saves the target object panel and mask seen by the Commander, and the difference between the current and initial states.

Replaying a single episode locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file.json

Note that --status-out-fn must end in .json. Also note that, by default, the script will not replay sessions for which an output subdirectory already exists under --write_frames_dir. Additionally, if the file passed to --status-out-fn already exists, the script will try to resume files not marked as replayed in that file; it will error out if the status file and the output directories disagree on which sessions have been previously played. We recommend using a new --write_frames_dir and a new --status-out-fn for additional runs that are not intended to resume from a previous one.
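The skip-and-resume behaviour described above can be pictured as follows. This is a hypothetical sketch, not the teach_replay implementation: the status file format (`{game_file: replayed_bool}`) and the output subdirectory naming (game file name minus `.game.json`) are both assumptions, and the mismatch check the real script performs is not modelled.

```python
import json
import os
from typing import Dict, List


def sessions_to_replay(game_files: List[str], write_frames_dir: str, status_out_fn: str) -> List[str]:
    """Return the game files that still need replaying (hypothetical helper)."""
    if not status_out_fn.endswith(".json"):
        raise ValueError("--status-out-fn must end in .json")
    status: Dict[str, bool] = {}
    if os.path.exists(status_out_fn):
        with open(status_out_fn) as f:
            status = json.load(f)  # ASSUMED format: {game_file: replayed_bool}
    pending = []
    for game_fn in game_files:
        if status.get(game_fn, False):
            continue  # already marked as replayed in the status file
        # ASSUMED naming: output subdirectory = game file name without ".game.json".
        session = os.path.basename(game_fn).replace(".game.json", "")
        if os.path.isdir(os.path.join(write_frames_dir, session)):
            continue  # an output subdirectory already exists, so skip by default
        pending.append(game_fn)
    return pending
```

This also shows why a fresh output directory and status file are the safe choice for an unrelated run: any leftover state from a previous run silently shrinks the pending list.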

Replay all episodes in a folder locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_dir /path/to/dir/containing/.game.json/files \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--num_processes 50 \
--status-out-fn /path/to/desired/output/status/file.json

To generate a video, additionally specify --create_video. Note that for images to be saved, --write_frames must be specified and --write_frames_dir must be provided. For state changes to be saved, --write_states must be specified and --write_frames_dir must be provided.
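The flag dependencies above can be checked before launching a long replay. A minimal sketch: `replay_option_errors` is a hypothetical pre-flight helper, not part of teach_replay. It treats the video as requiring saved frames, which is an assumption drawn from "additionally".

```python
from typing import List, Optional


def replay_option_errors(
    write_frames: bool,
    write_states: bool,
    create_video: bool,
    write_frames_dir: Optional[str],
) -> List[str]:
    """Return messages for documented flag combinations that would save nothing."""
    errors = []
    if write_frames and not write_frames_dir:
        errors.append("--write_frames requires --write_frames_dir")
    if write_states and not write_frames_dir:
        errors.append("--write_states requires --write_frames_dir")
    # ASSUMPTION: the video is assembled from saved frames, so it needs them too.
    if create_video and not (write_frames and write_frames_dir):
        errors.append("--create_video needs --write_frames and --write_frames_dir")
    return errors
```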

Evaluation

We include sample scripts for inference and metric calculation: teach_inference and teach_eval. teach_inference is a wrapper that loads EDH instances, interacts with the simulator, and writes the game file and predicted action sequence as JSON files after each inference run. It dynamically loads the model based on the --model_module and --model_class arguments. Your model has to implement teach.inference.teach_model.TeachModel. See teach.inference.sample_model.SampleModel for an example implementation that takes a random action at every time step.
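To illustrate the shape of a model that teach_inference can drive, here is a sketch of a random-action model. `TeachModelStandIn` is a hypothetical stand-in for teach.inference.teach_model.TeachModel. The method names and arguments reflect our reading of that class and should be checked against the real base class before use. The sketch covers navigation actions only, so it never returns an object coordinate.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Sequence, Tuple


class TeachModelStandIn(ABC):
    """Hypothetical stand-in for teach.inference.teach_model.TeachModel."""

    @abstractmethod
    def start_new_edh_instance(self, edh_instance: dict, edh_history_images: List[Any],
                               edh_name: Optional[str] = None) -> bool:
        ...

    @abstractmethod
    def get_next_action(self, img: Any, edh_instance: dict, prev_action: Any,
                        img_name: Optional[str] = None,
                        edh_name: Optional[str] = None) -> Tuple[str, Optional[List[float]]]:
        ...


class RandomNavigationModel(TeachModelStandIn):
    """Picks a random action from a caller-supplied list at every step."""

    def __init__(self, actions: Sequence[str] = ("Forward",), seed: int = 0):
        self.actions = list(actions)
        self.rng = random.Random(seed)  # seeded so runs are reproducible

    def start_new_edh_instance(self, edh_instance, edh_history_images, edh_name=None):
        # A random policy has nothing to precompute; report successful setup.
        return True

    def get_next_action(self, img, edh_instance, prev_action, img_name=None, edh_name=None):
        # Navigation actions need no target object, so no relative coordinate is returned.
        return self.rng.choice(self.actions), None
```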

After running teach_inference, use teach_eval to compute metrics from the output data produced by teach_inference.

Sample run:

export DATA_DIR=/path/to/data/with/games/and/edh_instances/as/subdirs  # teach_download default: /tmp/teach-dataset
export OUTPUT_DIR=/path/to/output/folder/for/split
export METRICS_FILE=/path/to/output/metrics/file_without_extension

teach_inference \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE \
    --model_module teach.inference.sample_model \
    --model_class SampleModel

teach_eval \
    --data_dir $DATA_DIR \
    --inference_output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE
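teach_eval aggregates per-instance results into success rate (SR), goal-condition success (GC) and path-length-weighted (PLW) variants. Its own code is `aggregate_metrics` in teach.eval.compute_metrics, as one traceback on this page shows. The sketch below is a hypothetical re-derivation, not that code. It assumes ALFRED-style path-length weighting, which is consistent with per-instance fields quoted in an issue on this page (gt_path_len 22, traj_len 40 gives success_spl 0.55 and path_len_weighted_success_spl 12.1). It also raises a clear error when no instances were found, instead of dividing by zero.

```python
from typing import Dict, List


def path_len_weight(gt_path_len: int, traj_len: int) -> float:
    """Efficiency factor: 1.0 if the trajectory is no longer than the reference path."""
    return gt_path_len / max(traj_len, gt_path_len)


def aggregate_metrics_sketch(records: List[dict]) -> Dict[str, float]:
    if not records:
        raise ValueError("no instances to aggregate; check the inference output directory")
    total_gc = sum(r["total_goal_conditions"] for r in records)
    total_gt = sum(r["gt_path_len"] for r in records)
    plw_sr = plw_gc = 0.0
    for r in records:
        w = path_len_weight(r["gt_path_len"], r["traj_len"])
        # Per-instance goal-condition success; 0.0 if an instance has no conditions.
        gc = (r["completed_goal_conditions"] / r["total_goal_conditions"]
              if r["total_goal_conditions"] else 0.0)
        # PLW scores are weighted by the reference path length, then normalised below.
        plw_sr += r["success"] * w * r["gt_path_len"]
        plw_gc += gc * w * r["gt_path_len"]
    return {
        "SR": sum(r["success"] for r in records) / len(records),
        "GC": sum(r["completed_goal_conditions"] for r in records) / total_gc if total_gc else 0.0,
        "PLW SR": plw_sr / total_gt,
        "PLW GC": plw_gc / total_gt,
    }
```

Under this definition, SR is a plain ratio over instances (e.g. 77 successes out of 608) while GC pools goal conditions across all instances (e.g. 487 out of 3526).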

Security

See CONTRIBUTING for more information.

License

The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE).

Comments

UnboundLocalError when evaluating the Episodic Transformer baselines

Hello,

I have been trying to evaluate the Episodic Transformer baselines for the TEACh Benchmark Challenge, and I keep getting the following error when I run the evaluation script provided in the ET directory. I have also tried running the evaluation via teach_inference, and the error is the same.

Traceback (most recent call last):
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 221, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment

I am running inference on an AWS instance. I have started the X-server and installed all requirements and prerequisites without errors.

Here's the script I used for evaluation.

#!/bin/sh

export AWS_ROOT=/home/ubuntu/workplace
export ET_DATA=$AWS_ROOT/data
export TEACH_ROOT_DIR=$AWS_ROOT/teach
export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
export SPLIT=valid_seen

cd $TEACH_ROOT_DIR
python src/teach/cli/inference.py \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --data_dir $ET_DATA \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
    --split $SPLIT \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
    --seed 4 \
    --model_dir $ET_DATA/baseline_models/et \
    --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
    --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
    --device "cpu" \
    --images_dir $INFERENCE_OUTPUT_PATH/images

Could you help me with this? Thanks!

opened by yingShen-ys 6

Possible bug in config file

Hi @aishwaryap

I was revisiting the code for the E.T. baseline and there seems to be a bug in the config file for training the model: https://github.com/alexa/teach/blob/5554f02f55c22abfe5c2a749dbb24c13377726c8/src/teach/modeling/ET/alfred/config.py#L183 I believe it should be detach_lang_emb = True since we do not want to propagate the gradients through the look-up table or the language encoder.

Please let me know your thoughts on this.

Thanks, Divyam

opened by dv-fenix 4
About the evaluation time spent

Could you let me know how long evaluation took for you? It took me about two days to evaluate with 4 processes, and I found that much of that time was spent initializing the state of each EDH instance and running episodes until they reached max_api_fails or max_traj_steps. Each agent step also takes a long time, and this depends heavily on the CPU clock speed. Could you share the hardware setup used for your experiments? Is there any other way to evaluate a trained model?

opened by RupertLuo 3
Bump flask-cors from 3.0.8 to 3.0.9
Bumps flask-cors from 3.0.8 to 3.0.9.

Release notes

Sourced from flask-cors's releases.

Release 3.0.9

Security

Escape path before evaluating resource rules (thanks @praetorian-colby-morgan). Prior to this, flask-cors incorrectly evaluated CORS resource matching before path expansion. E.g. "/api/../foo.txt" would incorrectly match resources for "/api/*" whereas the path actually expands simply to "/foo.txt"

Changelog

Sourced from flask-cors's changelog.

3.0.9

Security

Escape path before evaluating resource rules (thanks to Colby Morgan). Prior to this, flask-cors incorrectly evaluated CORS resource matching before path expansion. E.g. "/api/../foo.txt" would incorrectly match resources for "/api/*" whereas the path actually expands simply to "/foo.txt"

Commits

91babb9 Update Api docs for credentialed requests (#221)

522d989 Release version 3.0.9 (#273)

67c4b2c Fix request path normalization (#272)

5c6e05e docs: Fix simple typo, garaunteed -> guaranteed

566aef2 Fixed over-indentation

8a4e6e7 Update changelog to give proper kudos to @juanmaneo and @jdevera

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

opened by dependabot[bot] 2
How to setup docker to use multiple GPUs for inference

Hi,

I am trying to set up the evaluation environment on a multi-GPU AWS instance following the instructions in TEACh Benchmark Challenge. However, I have run into two problems: (1) The model uses only 1 GPU even though I have set API_GPUS to multiple GPUs. (2) When I start the inference runner, it launches multiple ai2thor instances when I specify --num_processes X, but the processes all run on one GPU instead of on X GPUs. I also have to manually pass multiple API ports to --model_api_host_and_port (e.g. "@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT" for --num_processes 3), which seems odd.

Also, I notice that this line says the model container will have access to only one GPU, while this line says the model can use all GPUs of a p3.16xlarge instance. Which is the case, and if multiple GPUs are allowed, how should the Docker container be set up?

Thanks!

opened by 594zyc 2
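For reference, spreading worker processes across GPUs is commonly done round-robin. The sketch below shows that scheme and how to build the comma-separated host:port value quoted in the issue. Both helpers are hypothetical and do not describe how the TEACh Docker setup actually assigns GPUs. The host name is taken from the issue and the port value is illustrative.

```python
from typing import Dict, List


def round_robin_gpus(num_processes: int, gpu_ids: List[int]) -> Dict[int, int]:
    """Map each worker process index to a GPU id, cycling through gpu_ids."""
    if not gpu_ids:
        raise ValueError("gpu_ids must not be empty")
    return {i: gpu_ids[i % len(gpu_ids)] for i in range(num_processes)}


def host_and_port_list(host: str, ports: List[int]) -> str:
    """Build a comma-separated host:port string with one entry per worker."""
    return ",".join(f"{host}:{port}" for port in ports)
```

For example, `round_robin_gpus(3, [0, 1])` maps processes 0, 1 and 2 to GPUs 0, 1 and 0. How a chosen GPU index is then applied to each ai2thor instance or model container is outside this sketch.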
Questions about the evaluation rules for the Alexa Simbot Challenge
I have three questions regarding the evaluation rules for the Alexa Simbot Challenge:

1. Can we use "dialog_history_cleaned" rather than "dialog_history" in the EDH instance?

2. Using only the action history and dialogue history from the driver loses a key piece of information: the time at which each user utterance was made during the interaction. We argue that using such causal information should be allowed.

3. Could you elaborate on the "should not use task definitions" rule? For example, are we allowed to integrate the task structures provided in the task definitions into our model, as long as we do not rely on any ground-truth task information during inference and the model has to figure out the task and its arguments from the dialog input by itself?

Thanks!
opened by 594zyc 2

User-friendly API for manipulating the data

Hi @aishwaryap,

Thanks for releasing your dataset. I was wondering whether you had a way to manipulate the low-level JSON data in a more user-friendly way. I can see from the codebase that there is a Dataset class that exposes a from_dict() method which is supposed to be used to create a Dataset object from a Dict. However, I'm currently having the following issue when doing so:

>>> import json
>>> from teach.dataset.dataset import Dataset
>>> with open("/tmp/teach/games/train/8cdf3d9a18cac7fe_6b02.game.json") as in_file:
...     game = json.load(in_file)
...
>>> dataset = Dataset.from_dict(game)

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/asuglia/workspace/teach/src/teach/dataset/dataset.py", line 47, in from_dict
    tasks = [
  File "/Users/asuglia/workspace/teach/src/teach/dataset/dataset.py", line 48, in <listcomp>
    Task.from_dict(task_dict, definitions, process_init_state) for task_dict in dataset_dict.get("tasks")
  File "/Users/asuglia/workspace/teach/src/teach/dataset/task.py", line 35, in from_dict
    episodes = [
  File "/Users/asuglia/workspace/teach/src/teach/dataset/task.py", line 36, in <listcomp>
    Episode.from_dict(episode_dict, definitions, process_init_state)
  File "/Users/asuglia/workspace/teach/src/teach/dataset/episode.py", line 60, in from_dict
    initial_state=Initialization.from_dict(episode_dict["initial_state"])
  File "/Users/asuglia/workspace/teach/src/teach/dataset/initialization.py", line 47, in from_dict
    agents = [Pose_With_ID.from_dict(x) for x in initialization_dict["agents"]]
  File "/Users/asuglia/workspace/teach/src/teach/dataset/initialization.py", line 47, in <listcomp>
    agents = [Pose_With_ID.from_dict(x) for x in initialization_dict["agents"]]
  File "/Users/asuglia/workspace/teach/src/teach/dataset/pose.py", line 60, in from_dict
    return cls(identity=identity, pose=Pose.from_array(pose_with_id_dict["pose"]), is_object=is_object)
KeyError: 'pose'

In general, I would love a more object-oriented way of handling the data. I could write my own parser for the JSON data, but I believe this logic must already exist somewhere in your codebase. An example script showing how to explore the dataset might also be useful to others. Any thoughts?

opened by aleSuglia 2
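Until there is an official exploration script, one way to see what a game file actually contains is to list every key path in the raw JSON. The `KeyError: 'pose'` above suggests that the agent entries under `initial_state` do not match what `Pose_With_ID.from_dict` expects. A minimal sketch, with `schema_paths` as a hypothetical helper that only uses the standard library:

```python
import json
from typing import Any, Set


def schema_paths(node: Any, prefix: str = "$") -> Set[str]:
    """Return "path: type" strings for every value in a nested JSON structure.

    List elements are collapsed to "[]" so each schema path appears once.
    """
    paths = {f"{prefix}: {type(node).__name__}"}
    if isinstance(node, dict):
        for key, value in node.items():
            paths |= schema_paths(value, f"{prefix}.{key}")
    elif isinstance(node, list):
        for item in node:
            paths |= schema_paths(item, f"{prefix}[]")
    return paths
```

For example, after `game = json.load(in_file)`, printing `sorted(p for p in schema_paths(game) if "agents" in p)` shows which keys the agent entries really have.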

No module named "alfred"

Hi,

When I run teach_inference on the sample model using the command below, inference and evaluation work fine.

teach_inference \
  --data_dir /Users/sakthi/Desktop/teach-main/data4/ \
  --output_dir /Users/sakthi/Desktop/teach-main/data4/outputs \
  --split valid_seen \
  --metrics_file /Users/sakthi/Desktop/teach-main/data4/outputs/metrics \
  --model_module teach.inference.sample_model \
  --model_class SampleModel \
  --images_dir /Users/sakthi/Desktop/teach-main/data4/images/

However, when I run it with the ET model using the command below, it throws the error "No module named 'alfred'".

teach_inference \
  --data_dir /Users/sakthi/Desktop/teach-main/data4/ \
  --output_dir /Users/sakthi/Desktop/teach-main/data4/outputs \
  --split valid_seen \
  --metrics_file /Users/sakthi/Desktop/teach-main/data4/outputs/metrics \
  --model_module teach.inference.et_model \
  --model_class ETModel \
  --images_dir /Users/sakthi/Desktop/teach-main/data4/images/ \
  --model_dir /Users/sakthi/Desktop/teach-main/data4/baseline_models/et \
  --object_predictor /Users/sakthi/Desktop/teach-main/data4/et_pretrained_models/maskrcnn_model.pth \
  --visual_checkpoint /Users/sakthi/Desktop/teach-main/data4/et_pretrained_models/fasterrcnn_model.pth

Error:

Traceback (most recent call last):
  File "/Users/sakthi/opt/anaconda3/envs/teach_edh/bin/teach_inference", line 8, in <module>
    sys.exit(main())
  File "/Users/sakthi/Desktop/teach-main/src/teach/cli/inference.py", line 150, in main
    model_class=dynamically_load_class(args.model_module, args.model_class),
  File "/Users/sakthi/Desktop/teach-main/src/teach/utils.py", line 390, in dynamically_load_class
    module = __import__(package_path, fromlist=[class_name])
  File "/Users/sakthi/Desktop/teach-main/src/teach/inference/et_model.py", line 11, in <module>
    from alfred import constants
ModuleNotFoundError: No module named 'alfred'

Could you let me know how to solve this, or add the required alfred package to the requirements?

Thanks!

opened by msakthiganesh 1
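The `alfred` package lives under src/teach/modeling/ET (see the config.py URL in an issue above). The evaluation script in the first issue makes it importable by adding that directory to PYTHONPATH (`export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH`). For your own Python entry points, a rough equivalent is sketched below. `ensure_importable` is a hypothetical helper, and it must run before teach.inference.et_model is imported.

```python
import importlib
import importlib.util
import os
import sys


def ensure_importable(module_name: str, search_dir: str) -> bool:
    """Prepend search_dir to sys.path if module_name cannot be found.

    Returns whether the module is importable afterwards.
    """
    if importlib.util.find_spec(module_name) is None:
        abs_dir = os.path.abspath(search_dir)
        if abs_dir not in sys.path:
            sys.path.insert(0, abs_dir)
        importlib.invalidate_caches()  # make newly added path entries visible
    return importlib.util.find_spec(module_name) is not None
```

For example, `ensure_importable("alfred", "src/teach/modeling/ET")` run from the repository root. For the teach_inference CLI itself, exporting PYTHONPATH as above is the simpler route.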

Error while running teach_eval - float division by zero

Hi,

I am trying to evaluate the model's EDH performance by running teach_eval with the following command:

teach_eval --data_dir /scratch/smahali6/teach_edh/teach/data/ --inference_output_dir /scratch/smahali6/teach_edh/teach/data/outputs/ --split valid_seen --metrics_file /scratch/smahali6/teach_edh/teach/data/outputs/metrics/

Initially, I got the error below:

[MainThread-71052-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
Traceback (most recent call last):
  File "/home/smahali6/.conda/envs/teach_edh/bin/teach_eval", line 8, in <module>
    sys.exit(main())
  File "/scratch/smahali6/teach_edh/teach/src/teach/cli/eval.py", line 79, in main
    logger.info("Evaluating split %s requiring %d files" % (args.split, len(edh_instance_files)))
NameError: name 'edh_instance_files' is not defined

I solved it by using the line below: edh_instance_files = set(os.listdir(os.path.join(args.data_dir, input_subdir, args.split)))

Now, I get the ZeroDivisionError below:

[MainThread-91566-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
[MainThread-91566-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
[MainThread-91566-INFO] teach.cli.eval: Found output files for 0 instances; treating remaining 608 as failed...
INFO:teach.cli.eval:Found output files for 0 instances; treating remaining 608 as failed...
Traceback (most recent call last):
  File "/home/smahali6/.conda/envs/teach_edh/bin/teach_eval", line 8, in <module>
    sys.exit(main())
  File "/scratch/smahali6/teach_edh/teach/src/teach/cli/eval.py", line 114, in main
    results = aggregate_metrics(traj_stats, args)
  File "/scratch/smahali6/teach_edh/teach/src/teach/eval/compute_metrics.py", line 87, in aggregate_metrics
    sr = float(num_successes) / num_evals
ZeroDivisionError: float division by zero

The log says "Found output files for 0 instances; treating remaining 608 as failed...". Does the pipeline require particular files in the output directory? How can I fix this and evaluate the model?

The directory tree for the project directory can be found in this link. https://drive.google.com/file/d/16DrPFl-dcxbPgtbIz0ryeFZXxDRUOcIO/view?usp=sharing

Thanks!

opened by msakthiganesh 1
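A quick way to see why teach_eval found no outputs is to compare the EDH instances in the split directory with the files in the inference output directory. A rough sketch, assuming instance files are named `<instance_id>.json`. Output naming is not documented here, so the sketch only checks whether an output file name contains the instance id. `instances_without_outputs` is a hypothetical helper.

```python
import os
from typing import List


def instances_without_outputs(edh_split_dir: str, inference_output_dir: str) -> List[str]:
    """Return instance ids with no output file whose name mentions them."""
    # ASSUMPTION: each EDH instance is stored as "<instance_id>.json".
    instance_ids = sorted(
        name[: -len(".json")] for name in os.listdir(edh_split_dir) if name.endswith(".json")
    )
    if not os.path.isdir(inference_output_dir):
        return instance_ids
    output_names = os.listdir(inference_output_dir)
    # Loose substring match; tighten it once the real output naming is known.
    return [iid for iid in instance_ids if not any(iid in name for name in output_names)]
```

If this returns every instance, --inference_output_dir is probably not the directory that teach_inference wrote to (its --output_dir).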

Are the test seen and test unseen splits no longer available?

Hello,

I think that around late May, the test seen and test unseen splits were available through teach_download. Are they no longer available? Is there a way to run one's model on them again?

opened by soyeonm 1
Bump protobuf from 3.20.0 to 3.20.2
Bumps protobuf from 3.20.0 to 3.20.2.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.20.2

C++

Reduce memory consumption of MessageSet parsing

This release addresses a Security Advisory for C++ and Python users

Protocol Buffers v3.20.1

PHP

Fix building packaged PHP extension (#9727)

Fixed composer.json to only advertise compatibility with PHP 7.0+. (#9819)

Ruby

Disable the aarch64 build on macOS until it can be fixed. (#9816)

Other

Fix versioning issues in 3.20.0

Protocol Buffers v3.20.1-rc1

PHP

Fix building packaged PHP extension (#9727)

Other

Fix versioning issues in 3.20.0

Commits

a20c65f Updating changelog

c49fe79 Updating version.json and repo version numbers to: 20.2

806d7e4 Merge pull request #10544 from deannagarcia/3.20.x

ae718b3 Add missing includes

b4c395a Apply patch

6439c5c Merge pull request #10531 from protocolbuffers/deannagarcia-patch-7

22c79e6 Update version.json

c1a2d2e Fix python release on macos (#10512)

a826282 Merge pull request #10505 from deannagarcia/3.20.x

7639a71 Add version file

Additional commits viewable in compare view

opened by dependabot[bot] 1
Different vocab size between `data.vocab` and `embs_ann`?

Hi, I noticed that the data.vocab stored with the baseline model has a different vocabulary size than the language embedding stored in the pretrained model.

For the baseline model "et_plus_h", the data.vocab file has Vocab(2554) for words while if I load the pretrained model from baseline_models/et_plus_h/latest.pth, the embedding layer model.embs_ann.lmdb_simbot_edh_vocab_none.weight has torch.Size([2788, 768]).

Did I miss something?

opened by yingShen-ys 2

The same action prediction gets different evaluation metrics in different runs

Hi,

I ran the baseline ET model and found that two different runs get significantly different evaluation metrics (this might be related to issue #10).

Run1:

SR: 77/608 = 0.127
GC: 487/3526 = 0.138
PLW SR: 0.026
PLW GC: 0.093

Run2:

SR: 52/608 = 0.086
GC: 321/3526 = 0.091
PLW SR: 0.007
PLW GC: 0.034

Looking closely at the output, I found that in some episodes the same sequence of predicted actions yields different evaluation metrics in different runs. For example, for 66957a984ae5a714_f28d.edh4, the inference output for the first run is:

"66957a984ae5a714_f28d.edh4": {
        "instance_id": "66957a984ae5a714_f28d.edh4",
        "game_id": "66957a984ae5a714_f28d",
        "completed_goal_conditions": 2,
        "total_goal_conditions": 2,
        "goal_condition_success": 1,
        "success_spl": 0.55,
        "path_len_weighted_success_spl": 12.100000000000001,
        "goal_condition_spl": 0.55,
        "path_len_weighted_goal_condition_spl": 12.100000000000001,
        "gt_path_len": 22,
        "reward": 0,
        "success": 1,
        "traj_len": 40,
        "predicted_stop": 0,
        "num_api_fails": 30,
        "error": 0,
        "init_success": true,
        "pred_actions": [
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ]
        ]
    }

While for the second run it is:

"66957a984ae5a714_f28d.edh4": {
        "instance_id": "66957a984ae5a714_f28d.edh4",
        "game_id": "66957a984ae5a714_f28d",
        "completed_goal_conditions": 0,
        "total_goal_conditions": 2,
        "goal_condition_success": 0.0,
        "success_spl": 0.0,
        "path_len_weighted_success_spl": 0.0,
        "goal_condition_spl": 0.0,
        "path_len_weighted_goal_condition_spl": 0.0,
        "gt_path_len": 22,
        "reward": 0.0,
        "success": 0,
        "traj_len": 40,
        "predicted_stop": 0,
        "num_api_fails": 30,
        "error": 0,
        "init_success": true,
        "pred_actions": [
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ]
        ]
    }

So the first evaluation result does not make sense, since the model should have no chance of succeeding without performing any manipulation actions.
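A quick way to check this kind of claim on a saved trajectory is to list every action that is not pure navigation. The sketch below is a hypothetical helper, not part of TEACh. It assumes each predicted step is an `[action_name, argument]` pair, as in the action list quoted above. The set of navigation action names is also an assumption and should be checked against the action list shipped with the dataset.

```python
from typing import Iterable, List, Optional, Sequence

# Assumed set of navigation-only action names; verify against the dataset's
# own action definitions before relying on it.
NAVIGATION_ACTIONS = frozenset({
    "Forward", "Backward", "Turn Left", "Turn Right",
    "Look Up", "Look Down", "Pan Left", "Pan Right", "Stop",
})


def manipulation_actions(predicted: Iterable[Sequence[Optional[object]]]) -> List[str]:
    """Return the names of predicted actions that are not pure navigation.

    Each element is assumed to be an [action_name, argument] pair,
    e.g. ["Forward", None] (JSON null becomes None in Python).
    """
    return [step[0] for step in predicted if step[0] not in NAVIGATION_ACTIONS]
```

If this returns an empty list for an episode that the metrics file marks as successful, that episode deserves a closer look.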

The first run was done on an AWS EC2 p3.8 instance and the second on a p3.16; all other settings are the same. The full evaluation logs are available here: [run 1] [run 2]

Do you have any idea what might cause this? Thanks!

opened by 594zyc

Support for Python 3.9?

README.md states that only Python versions >=3.7 and <=3.8 are supported, but setup.py only specifies >=3.7. Is Python 3.9 officially supported?
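One subtlety in translating the README's range into packaging metadata: under PEP 440, a literal `python_requires=">=3.7,<=3.8"` would reject 3.8.1 and later patch releases, because 3.8.1 compares greater than 3.8 (that is, 3.8.0). If the intent is "any 3.7.x or 3.8.x", then `>=3.7,<3.9` expresses it. The sketch below is a minimal runtime guard under that assumed reading. `is_supported_python` is a hypothetical name and is not part of TEACh.

```python
import sys
from typing import Optional, Sequence, Tuple

# Assumed reading of the README: "<=3.8" covers every 3.8.x patch release,
# so the effective range is >=3.7 and <3.9, compared on (major, minor).
MIN_SUPPORTED: Tuple[int, int] = (3, 7)
FIRST_UNSUPPORTED: Tuple[int, int] = (3, 9)


def is_supported_python(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the given (or running) interpreter is in the assumed range."""
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_SUPPORTED <= major_minor < FIRST_UNSUPPORTED


if __name__ == "__main__":
    if not is_supported_python():
        print(f"Warning: Python {sys.version.split()[0]} is outside the README's range",
              file=sys.stderr)
```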

opened by yukw777

Trajectories that raise an error are ignored

Hi @aishwaryap,

I was reading about the recent code changes reported in #10, and unfortunately we get results that differ substantially from yours. I started dissecting the code to understand the reason for these discrepancies. From my understanding of the inference_runner.py script, you spawn several processes, each handling a portion of the tasks. However, the exception-handling logic simply skips any instance that raises an error: https://github.com/emma-simbot/teach/blob/speaker_tokens/src/teach/inference/inference_runner.py#L130

This is detrimental: if a dataset instance errors for any reason, it contributes nothing to the overall metrics. Instead, the errored trajectory should be stopped but still counted in the metrics as unsuccessful. Ideally, such faulty trajectories would also be reported in the metrics file for later debugging.
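The proposed handling can be sketched as follows. This is a minimal illustration, not the code in inference_runner.py. `evaluate_all` and `run_instance` are hypothetical names, and `run_instance` stands in for whatever runs one episode and returns whether it succeeded. In the multi-process setup, each worker would apply the same pattern to its share of the instances before the results are merged.

```python
from typing import Callable, Dict, Iterable


def evaluate_all(
    instance_ids: Iterable[str],
    run_instance: Callable[[str], bool],
) -> Dict[str, object]:
    """Evaluate every instance, counting errors as failures instead of dropping them.

    `run_instance` is a hypothetical callable that returns True on task success.
    """
    results: Dict[str, bool] = {}
    errors: Dict[str, str] = {}
    for instance_id in instance_ids:
        try:
            results[instance_id] = bool(run_instance(instance_id))
        except Exception as exc:
            # Keep the instance in the denominator and mark it unsuccessful.
            results[instance_id] = False
            # Record the error so it can be reported in the metrics file.
            errors[instance_id] = repr(exc)
    n = len(results)
    return {
        "success_rate": sum(results.values()) / n if n else 0.0,
        "num_instances": n,
        "errored_instances": errors,
    }
```

With three instances where one succeeds, one fails and one raises, this reports 1 success out of 3. Skipping errored instances, as the issue describes, would instead report 1 out of 2 and inflate the success rate.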

Am I missing something?

opened by aleSuglia

Possible bugs in `get_state_changes`

Hi @aishwaryap,

Thank you for releasing the dataset. It seems that there is a bug in the get_state_changes function: https://github.com/alexa/teach/blob/5554f02f55c22abfe5c2a749dbb24c13377726c8/src/teach/utils.py#L92

I believe it should be `agent_final = final_state["agents"][idx]` instead. Because of this bug, the agents' state differences are empty in all `teach-dataset/images/$SPLIT/$REPLAYED_CODE/statediff.*.json` files.
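To see why reading the wrong state empties the diffs, consider the minimal sketch below. It assumes each state is a dict with an `agents` list of flat per-agent dicts. `diff_dict` and `diff_agents` are hypothetical helpers and do not reproduce TEACh's actual `get_state_changes`. They show two things: the final agent must be read from `final_state`, and diffing a state against itself always yields empty per-agent changes.

```python
from typing import Any, Dict, List


def diff_dict(init: Dict[str, Any], final: Dict[str, Any]) -> Dict[str, Any]:
    """Keys whose value changed (or was added) between two flat dicts."""
    return {key: value for key, value in final.items()
            if key not in init or init[key] != value}


def diff_agents(init_state: Dict[str, Any], final_state: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Per-agent changes, pairing agents by their index in the `agents` list."""
    changes = []
    for idx, agent_init in enumerate(init_state["agents"]):
        # Read the final agent from final_state, as the issue suggests.
        agent_final = final_state["agents"][idx]
        changes.append(diff_dict(agent_init, agent_final))
    return changes
```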

Thanks, Jiachen

opened by Ji4chenLi

Much higher scores when evaluating Episodic Transformer baselines for EDH instances

Hello,

I have finished evaluating the Episodic Transformer baselines for the TEACh Benchmark Challenge on the valid_seen split.

However, I found that our reproduced results are much higher than those reported in the paper. The results are shown below (all values are percentages). The metrics file contains a total of 608 EDH instances (valid_seen), which matches the number in the paper.

| | SR [TLW] | GC [TLW] |
| -- | -- | -- |
| Reproduced | 13.8 [3.2] | 14 [8.7] |
| Reported in the paper | 5.76 [0.90] | 7.99 [1.65] |
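The bracketed TLW values are trajectory-length-weighted versions of the success rate (SR) and goal-condition (GC) scores. The sketch below is a minimal version of that weighting under stated assumptions, and TEACh's evaluation code may differ in its details. It assumes the ALFRED-style per-episode score s · L*/max(L*, L̂), where L* is the expert trajectory length and L̂ the agent's. It also assumes ALFRED-style aggregation, weighted by L*. Both function names are hypothetical.

```python
from typing import Sequence, Tuple


def tlw_score(success: float, expert_len: int, agent_len: int) -> float:
    """Per-episode score s * L* / max(L*, L_hat); `success` may be 0/1 or a GC fraction."""
    denom = max(expert_len, agent_len)
    return success * expert_len / denom if denom else 0.0


def aggregate_tlw(episodes: Sequence[Tuple[float, int, int]]) -> float:
    """Expert-length-weighted mean of per-episode scores over (s, L*, L_hat) tuples (assumed)."""
    total_weight = sum(expert_len for _, expert_len, _ in episodes)
    weighted = sum(tlw_score(s, l_star, l_hat) * l_star for s, l_star, l_hat in episodes)
    return weighted / total_weight if total_weight else 0.0
```

Under this definition, an agent that succeeds only after a trajectory four times longer than the expert's gets a score of 0.25 for that episode. This is one reason TLW values sit well below the corresponding SR and GC values.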

I believe I am using the correct checkpoints, and the only change I made to the code is the one mentioned in #9.

I am running on an AWS instance. I started the X server and installed all requirements and prerequisites without errors, and the inference process also runs without errors.

Here is the script I used for evaluation.

#!/bin/sh

export AWS_ROOT=/home/ubuntu/workplace
export ET_DATA=$AWS_ROOT/data
export TEACH_ROOT_DIR=$AWS_ROOT/teach
export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
export SPLIT=valid_seen

cd $TEACH_ROOT_DIR
python src/teach/cli/inference.py \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --data_dir $ET_DATA \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
    --split $SPLIT \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
    --seed 4 \
    --model_dir $ET_DATA/baseline_models/et \
    --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
    --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
    --device "cpu" \
    --images_dir $INFERENCE_OUTPUT_PATH/images

I wonder if the data split provided in the dataset is the same as the paper. And if so, what would be the possible explanation for this?

Please let me know if someone else is getting similar results. Thank you!

opened by yingShen-ys
