TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Overview

TEACh

Task-driven Embodied Agents that Chat

Aishwarya Padmakumar*, Jesse Thomason*, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment. The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE). Please include appropriate licensing and attribution when using our data and code, and please cite our paper.

Prerequisites

  • python3 >=3.7,<=3.8
  • python3.x-dev, example: sudo apt install python3.8-dev
  • tmux, example: sudo apt install tmux
  • xorg, example: sudo apt install xorg openbox
  • ffmpeg, example: sudo apt install ffmpeg

Installation

pip install -r requirements.txt
pip install -e .

Downloading the dataset

Run the following script:

teach_download 

This will download and extract the archive files (experiment_games.tar.gz, all_games.tar.gz, images_and_states.tar.gz, edh_instances.tar.gz & tfd_instances.tar.gz) in the default directory (/tmp/teach-dataset).
Optional arguments:

  • -d/--directory: The location to store the dataset in. Default=/tmp/teach-dataset.
  • -se/--skip-extract: If set, skip extracting archive files.
  • -sd/--skip-download: If set, skip downloading archive files.
  • -f/--file: Specify the file name to be retrieved from S3 bucket.

Remote Server Setup

If running on a remote server without a display, the following setup is needed to run episode replay, model inference, or any model training that invokes the simulator (student forcing / RL).

Start an X-server

tmux
sudo python ./bin/startx.py

Exit the tmux session (CTRL+B, D). Any other commands should be run in the main terminal / different sessions.

Replaying episodes

Most users should not need to do this since we provide this output in images_and_states.tar.gz.

The following steps can be used to read a .json file of a gameplay session, play it in the AI2-THOR simulator, and at each time step save egocentric observations of the Commander and Driver (Follower in the paper). It also saves the target object panel and mask seen by the Commander, and the difference between current and initial state.

Replaying a single episode locally, or in a new tmux session / main terminal of a remote headless server:

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file.json

Note that --status-out-fn must end in .json. Also note that, by default, the script will not replay sessions for which an output subdirectory already exists under --write_frames_dir. Additionally, if the file passed to --status-out-fn already exists, the script will try to resume, replaying only the sessions not marked as replayed in that file; it will error out if the status file and the output directories disagree about which sessions have previously been replayed. It is recommended to use a new --write_frames_dir and a new --status-out-fn for additional runs that are not intended to resume from a previous one.

Replay all episodes in a folder locally, or in a new tmux session / main terminal of a remote headless server:

teach_replay \
--game_dir /path/to/dir/containing/.game.json/files \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--num_processes 50 \
--status-out-fn /path/to/desired/output/status/file.json

To generate a video, additionally specify --create_video. Note that for images to be saved, --write_frames must be specified and --write_frames_dir must be provided. For state changes to be saved, --write_states must be specified and --write_frames_dir must be provided.

Evaluation

We include sample scripts for inference and calculation of metrics: teach_inference and teach_eval. teach_inference is a wrapper that loads EDH instances, interacts with the simulator, and writes the game file and predicted action sequence as JSON files after each inference run. It dynamically loads the model based on the --model_module and --model_class arguments. Your model has to implement teach.inference.teach_model.TeachModel. See teach.inference.sample_model.SampleModel for an example implementation that takes random actions at every time step.
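
As a rough starting point, a custom model might look like the sketch below. The method names and signatures here follow SampleModel as I recall them; check teach.inference.teach_model.TeachModel for the authoritative interface before relying on them.

from typing import List

from teach.inference.teach_model import TeachModel


class ForwardOnlyModel(TeachModel):
    # A trivial model for teach_inference, mirroring SampleModel's structure.

    def __init__(self, process_index: int, num_processes: int, model_args: List[str]):
        # model_args carries any extra command-line arguments passed through teach_inference.
        self.num_steps = 0

    def start_new_edh_instance(self, edh_instance, edh_history_images, edh_name=None):
        # Called once before any actions are requested for a new EDH instance.
        self.num_steps = 0
        return True

    def get_next_action(self, img, edh_instance, prev_action, img_name=None, edh_name=None):
        # Return an action name and, for object interactions, a relative (x, y)
        # coordinate in the egocentric frame; None here because "Forward" needs no object.
        self.num_steps += 1
        return "Forward", None

Such a class would be run by pointing --model_module at the module containing it and --model_class at ForwardOnlyModel.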

After running teach_inference, use teach_eval to compute metrics from the output data produced by teach_inference.

Sample run:

export DATA_DIR=/path/to/data/with/games/and/edh_instances/as/subdirs  # default from teach_download is /tmp/teach-dataset
export OUTPUT_DIR=/path/to/output/folder/for/split
export METRICS_FILE=/path/to/output/metrics/file_without_extension

teach_inference \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE \
    --model_module teach.inference.sample_model \
    --model_class SampleModel

teach_eval \
    --data_dir $DATA_DIR \
    --inference_output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE
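
If you want to slice results yourself rather than rely on teach_eval's aggregate numbers, the per-instance records written during inference (see the example instance output quoted in the comments below) can be tallied with a few lines of Python. The field names come from that example; the path here is a placeholder, so point it at your own output.

import json

# Hedged sketch: recompute SR and GC from per-instance records keyed by instance id.
# Field names ("success", "completed_goal_conditions", "total_goal_conditions") match
# the example record shown in the comments below; adjust the path to your own run.
with open("/tmp/teach-dataset/outputs/metrics_valid_seen.json") as f:  # placeholder path
    per_instance = json.load(f)

num_evals = len(per_instance)
num_successes = sum(m["success"] for m in per_instance.values())
gc_done = sum(m["completed_goal_conditions"] for m in per_instance.values())
gc_total = sum(m["total_goal_conditions"] for m in per_instance.values())

print(f"SR: {num_successes}/{num_evals} = {num_successes / max(num_evals, 1):.3f}")
print(f"GC: {gc_done}/{gc_total} = {gc_done / max(gc_total, 1):.3f}")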

Security

See CONTRIBUTING for more information.

License

The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE).

Comments
  • UnboundLocalError when evaluating the Episodic Transformer baselines

    Hello,

    I have been trying to evaluate the Episodic Transformer baselines for the TEACh Benchmark Challenge, and I keep getting the following error message when running the evaluation script provided inside the ET directory. I have also tried running the evaluation via "teach_inference"; the error is the same.

    Traceback (most recent call last):
      File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 121, in _run
        instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
      File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 221, in _run_edh_instance
        traj_steps_taken,
    UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment
    

    I am doing the inference on an AWS instance. I have started the X-server and installed all requirements and prerequisites without issues.

    Here's the script I used for evaluation.

    #!/bin/sh

    export AWS_ROOT=/home/ubuntu/workplace
    export ET_DATA=$AWS_ROOT/data
    export TEACH_ROOT_DIR=$AWS_ROOT/teach
    export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
    export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
    export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
    export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
    export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
    export SPLIT=valid_seen

    cd $TEACH_ROOT_DIR
    python src/teach/cli/inference.py \
        --model_module teach.inference.et_model \
        --model_class ETModel \
        --data_dir $ET_DATA \
        --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
        --split $SPLIT \
        --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
        --seed 4 \
        --model_dir $ET_DATA/baseline_models/et \
        --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
        --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
        --device "cpu" \
        --images_dir $INFERENCE_OUTPUT_PATH/images

    Could you help me with this? Thanks!

    opened by yingShen-ys 6
  • Possible bug in config file

    Hi @aishwaryap

    I was revisiting the code for the E.T. baseline and there seems to be a bug in the config file for training the model: https://github.com/alexa/teach/blob/5554f02f55c22abfe5c2a749dbb24c13377726c8/src/teach/modeling/ET/alfred/config.py#L183 I believe it should be detach_lang_emb = True since we do not want to propagate the gradients through the look-up table or the language encoder.

    Please let me know your thoughts on this.

    Thanks, Divyam
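
    For context, detaching the language embedding just means cutting the gradient path back into the look-up table, as in this generic PyTorch illustration (not the E.T. code itself):

    import torch

    emb = torch.nn.Embedding(100, 8)     # stand-in for the language look-up table
    proj = torch.nn.Linear(8, 1)         # stand-in for downstream layers
    tokens = torch.tensor([3, 17, 42])

    lang_emb = emb(tokens).detach()      # gradients will not flow back into the embedding
    loss = proj(lang_emb).sum()
    loss.backward()

    print(emb.weight.grad)               # None: the look-up table receives no gradient
    print(proj.weight.grad is not None)  # True: downstream layers still get gradients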

    opened by dv-fenix 4
  • About the evaluation time spent

    Could you please let me know how much time the evaluation took for you? It took me about two days to evaluate with 4 processes, and I found that a large part of the time was spent in the state initialization of EDH instances, as well as in episodes that reach max_api_fails and max_traj_steps. The time for the agent to take a step is also very long, and it depends heavily on CPU frequency. Can you tell me the settings of your experimental equipment, and is there any other way to evaluate the trained model?

    opened by RupertLuo 3
  • Bump flask-cors from 3.0.8 to 3.0.9

    Bumps flask-cors from 3.0.8 to 3.0.9.

    Release notes

    Sourced from flask-cors's releases.

    Release 3.0.9

    Security

    • Escape path before evaluating resource rules (thanks @​praetorian-colby-morgan). Prior to this, flask-cors incorrectly evaluated CORS resource matching before path expansion. E.g. "/api/../foo.txt" would incorrectly match resources for "/api/*" whereas the path actually expands simply to "/foo.txt"
    Changelog

    Sourced from flask-cors's changelog.

    3.0.9

    Security

    • Escape path before evaluating resource rules (thanks to Colby Morgan). Prior to this, flask-cors incorrectly evaluated CORS resource matching before path expansion. E.g. "/api/../foo.txt" would incorrectly match resources for "/api/*" whereas the path actually expands simply to "/foo.txt"

    dependencies 
    opened by dependabot[bot] 2
  • How to setup docker to use multiple GPUs for inference

    Hi,

    I am trying to set up the evaluation environment on a multi-GPU AWS instance following the instructions in the TEACh Benchmark Challenge. However, I encounter two problems: (1) The model can use only 1 GPU even if I have set the value of API_GPUS to multiple GPUs. (2) When I start the inference runner, although it is able to launch multiple ai2thor instances by specifying --num_processes X, the processes are all on one GPU instead of on X GPUs. Also, I have to manually specify --model_api_host_and_port to include multiple API ports (e.g. "@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT" for --num_processes 3), which seems weird.

    Besides, I notice that in this line it mentions that the model container will have access to only one GPU, while this line says that the model can use all GPUs of a p3.16xlarge instance. I wonder which is the case, and if multiple GPUs are allowed, how to correctly set up the docker container.

    Thanks!

    opened by 594zyc 2
  • Questions about the evaluation rules for the Alexa Simbot Challenge

    I have three questions regarding the evaluation rules for the Alexa Simbot Challenge:

    1. Can we use "dialog_history_cleaned" rather than "dialog_history" in the edh instance?
    2. Using only the action history and dialogue history from the driver means a key piece of information is missing: the time of each user utterance made during the interaction. We argue that using such causal information should be allowed.
    3. Could you elaborate on the "should not use task definitions" rule? For example, are we allowed to integrate the task structures provided in the task definitions into our model, while not relying on any ground-truth task information during inference, since the model has to figure out the task and its arguments by itself from the dialogue input?

    Thanks!

    opened by 594zyc 2
  • User-friendly API for manipulating the data

    Hi @aishwaryap,

    Thanks for releasing your dataset. I was wondering whether you had a way to manipulate the low-level JSON data in a more user-friendly way. I can see from the codebase that there is a Dataset class that exposes a from_dict() method which is supposed to be used to create a Dataset object from a Dict. However, I'm currently having the following issue when doing so:

    >>> with open("/tmp/teach/games/train/8cdf3d9a18cac7fe_6b02.game.json") as in_file:
    ...     game = json.load(in_file)
    >>> dataset = Dataset.from_dict(game)
    
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "/Users/asuglia/workspace/teach/src/teach/dataset/dataset.py", line 47, in from_dict
        tasks = [
      File "/Users/asuglia/workspace/teach/src/teach/dataset/dataset.py", line 48, in <listcomp>
        Task.from_dict(task_dict, definitions, process_init_state) for task_dict in dataset_dict.get("tasks")
      File "/Users/asuglia/workspace/teach/src/teach/dataset/task.py", line 35, in from_dict
        episodes = [
      File "/Users/asuglia/workspace/teach/src/teach/dataset/task.py", line 36, in <listcomp>
        Episode.from_dict(episode_dict, definitions, process_init_state)
      File "/Users/asuglia/workspace/teach/src/teach/dataset/episode.py", line 60, in from_dict
        initial_state=Initialization.from_dict(episode_dict["initial_state"])
      File "/Users/asuglia/workspace/teach/src/teach/dataset/initialization.py", line 47, in from_dict
        agents = [Pose_With_ID.from_dict(x) for x in initialization_dict["agents"]]
      File "/Users/asuglia/workspace/teach/src/teach/dataset/initialization.py", line 47, in <listcomp>
        agents = [Pose_With_ID.from_dict(x) for x in initialization_dict["agents"]]
      File "/Users/asuglia/workspace/teach/src/teach/dataset/pose.py", line 60, in from_dict
        return cls(identity=identity, pose=Pose.from_array(pose_with_id_dict["pose"]), is_object=is_object)
    KeyError: 'pose'
    

    In general, I would love to have a more object-oriented way of handling the data. I could write my own parser of the JSON data, but I believe this logic must already exist somewhere in your codebase. An example script showing how to explore the dataset might be useful to others as well. Any thoughts?
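
    Until such a script exists, a minimal way to poke at a raw game file without going through Dataset.from_dict, assuming only the key names that appear in the traceback above ("tasks", "episodes", "initial_state"):

    import json

    # Walk a game file with plain json; only key names visible in the traceback are assumed.
    with open("/tmp/teach/games/train/8cdf3d9a18cac7fe_6b02.game.json") as in_file:
        game = json.load(in_file)

    print("top-level keys:", sorted(game.keys()))
    for task in game.get("tasks", []):
        for episode in task.get("episodes", []):
            print("episode keys:", sorted(episode.keys()))
            initial_state = episode.get("initial_state") or {}
            print("initial_state keys:", sorted(initial_state.keys()))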

    opened by aleSuglia 2
  • No module named "alfred"

    Hi,

    While running teach_inference on the SampleModel using the command below, the inference and the evaluation work fine.

    teach_inference \
      --data_dir /Users/sakthi/Desktop/teach-main/data4/ \
      --output_dir /Users/sakthi/Desktop/teach-main/data4/outputs \
      --split valid_seen \
      --metrics_file /Users/sakthi/Desktop/teach-main/data4/outputs/metrics \
      --model_module teach.inference.sample_model \
      --model_class SampleModel \
      --images_dir /Users/sakthi/Desktop/teach-main/data4/images/
    

    But, while trying to run it with the ET models using the code below, it throws the error "No module named 'alfred'".

    teach_inference \
      --data_dir /Users/sakthi/Desktop/teach-main/data4/ \
      --output_dir /Users/sakthi/Desktop/teach-main/data4/outputs \
      --split valid_seen \
      --metrics_file /Users/sakthi/Desktop/teach-main/data4/outputs/metrics \
      --model_module teach.inference.et_model \
      --model_class ETModel \
      --images_dir /Users/sakthi/Desktop/teach-main/data4/images/ \
      --model_dir /Users/sakthi/Desktop/teach-main/data4/baseline_models/et \
      --object_predictor /Users/sakthi/Desktop/teach-main/data4/et_pretrained_models/maskrcnn_model.pth \
      --visual_checkpoint /Users/sakthi/Desktop/teach-main/data4/et_pretrained_models/fasterrcnn_model.pth
    

    Error:

    Traceback (most recent call last):
      File "/Users/sakthi/opt/anaconda3/envs/teach_edh/bin/teach_inference", line 8, in <module>
        sys.exit(main())
      File "/Users/sakthi/Desktop/teach-main/src/teach/cli/inference.py", line 150, in main
        model_class=dynamically_load_class(args.model_module, args.model_class),
      File "/Users/sakthi/Desktop/teach-main/src/teach/utils.py", line 390, in dynamically_load_class
        module = __import__(package_path, fromlist=[class_name])
      File "/Users/sakthi/Desktop/teach-main/src/teach/inference/et_model.py", line 11, in <module>
        from alfred import constants
    ModuleNotFoundError: No module named 'alfred'
    

    Could you please let me know how to solve it? Or add the required alfred package to the requirements?

    Thanks!

    opened by msakthiganesh 1
  • Error while running teach_eval - float division by zero

    Hi,

    I am trying to run the teach_eval command to evaluate the performance of the model for EDH by running the following command.

    teach_eval --data_dir /scratch/smahali6/teach_edh/teach/data/ --inference_output_dir /scratch/smahali6/teach_edh/teach/data/outputs/ --split valid_seen --metrics_file /scratch/smahali6/teach_edh/teach/data/outputs/metrics/
    

    Initially, I got the error below:

    [MainThread-71052-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
    INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
    Traceback (most recent call last):
      File "/home/smahali6/.conda/envs/teach_edh/bin/teach_eval", line 8, in <module>
        sys.exit(main())
      File "/scratch/smahali6/teach_edh/teach/src/teach/cli/eval.py", line 79, in main
        logger.info("Evaluating split %s requiring %d files" % (args.split, len(edh_instance_files)))
    NameError: name 'edh_instance_files' is not defined
    

    I solved it by using the line below:

    edh_instance_files = set(os.listdir(os.path.join(args.data_dir, input_subdir, args.split)))

    Now, I get a ZeroDivisonError mentioned below:

    [MainThread-91566-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
    INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
    [MainThread-91566-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
    INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
    [MainThread-91566-INFO] teach.cli.eval: Found output files for 0 instances; treating remaining 608 as failed...
    INFO:teach.cli.eval:Found output files for 0 instances; treating remaining 608 as failed...
    Traceback (most recent call last):
      File "/home/smahali6/.conda/envs/teach_edh/bin/teach_eval", line 8, in <module>
        sys.exit(main())
      File "/scratch/smahali6/teach_edh/teach/src/teach/cli/eval.py", line 114, in main
        results = aggregate_metrics(traj_stats, args)
      File "/scratch/smahali6/teach_edh/teach/src/teach/eval/compute_metrics.py", line 87, in aggregate_metrics
        sr = float(num_successes) / num_evals
    ZeroDivisionError: float division by zero
    

    Does the pipeline require any files under the output directory, given the message "Found output files for 0 instances; treating remaining 608 as failed..."? How can I fix this and evaluate the model?

    The directory tree for the project directory can be found in this link. https://drive.google.com/file/d/16DrPFl-dcxbPgtbIz0ryeFZXxDRUOcIO/view?usp=sharing

    Thanks!

    opened by msakthiganesh 1
  • Are the test seen and test unseen splits not available again?

    Hello,

    I think around late May, the test seen and test unseen splits were available with teach_download. Are they not available anymore? Is there a way to run one's model on them again?

    opened by soyeonm 1
  • Bump protobuf from 3.20.0 to 3.20.2

    Bumps protobuf from 3.20.0 to 3.20.2.

    Release notes

    Sourced from protobuf's releases.

    Protocol Buffers v3.20.2

    C++

    Protocol Buffers v3.20.1

    PHP

    • Fix building packaged PHP extension (#9727)
    • Fixed composer.json to only advertise compatibility with PHP 7.0+. (#9819)

    Ruby

    • Disable the aarch64 build on macOS until it can be fixed. (#9816)

    Other

    • Fix versioning issues in 3.20.0

    Protocol Buffers v3.20.1-rc1

    PHP

    • Fix building packaged PHP extension (#9727)

    Other

    • Fix versioning issues in 3.20.0

    dependencies 
    opened by dependabot[bot] 1
  • Different vocab size between `data.vocab` and `embs_ann`?

    Hi, I noticed that the data.vocab stored in the baseline model has a different vocabulary length compared to the language embedding stored in the pretrained model.

    For the baseline model "et_plus_h", the data.vocab file has Vocab(2554) for words, but if I load the pretrained model from baseline_models/et_plus_h/latest.pth, the embedding layer model.embs_ann.lmdb_simbot_edh_vocab_none.weight has torch.Size([2788, 768]).

    Did I miss something?

    opened by yingShen-ys 2
  • The same action prediction gets different evaluation metrics in different runs

    Hi,

    I ran the baseline ET model and found that two different runs get significantly different evaluation metrics (this might be related to issue #10). Run 1:

    SR: 77/608 = 0.127
    GC: 487/3526 = 0.138
    PLW SR: 0.026
    PLW GC: 0.093
    

    Run 2:

    SR: 52/608 = 0.086
    GC: 321/3526 = 0.091
    PLW SR: 0.007
    PLW GC: 0.034
    

    After taking a close look at the output, I find that in some episodes the same set of predicted actions results in different evaluation metrics in different runs. For example, for the instance 66957a984ae5a714_f28d.edh4, the inference output for the first run is:

    "66957a984ae5a714_f28d.edh4": {
            "instance_id": "66957a984ae5a714_f28d.edh4",
            "game_id": "66957a984ae5a714_f28d",
            "completed_goal_conditions": 2,
            "total_goal_conditions": 2,
            "goal_condition_success": 1,
            "success_spl": 0.55,
            "path_len_weighted_success_spl": 12.100000000000001,
            "goal_condition_spl": 0.55,
            "path_len_weighted_goal_condition_spl": 12.100000000000001,
            "gt_path_len": 22,
            "reward": 0,
            "success": 1,
            "traj_len": 40,
            "predicted_stop": 0,
            "num_api_fails": 30,
            "error": 0,
            "init_success": true,
            "pred_actions": [
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ]
            ]
        }
    

    While for the second run it is:

    "66957a984ae5a714_f28d.edh4": {
            "instance_id": "66957a984ae5a714_f28d.edh4",
            "game_id": "66957a984ae5a714_f28d",
            "completed_goal_conditions": 0,
            "total_goal_conditions": 2,
            "goal_condition_success": 0.0,
            "success_spl": 0.0,
            "path_len_weighted_success_spl": 0.0,
            "goal_condition_spl": 0.0,
            "path_len_weighted_goal_condition_spl": 0.0,
            "gt_path_len": 22,
            "reward": 0.0,
            "success": 0,
            "traj_len": 40,
            "predicted_stop": 0,
            "num_api_fails": 30,
            "error": 0,
            "init_success": true,
            "pred_actions": [
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ],
                [
                    "Forward",
                    null
                ]
            ]
        }
    

    So basically the first evaluation result does not make sense since there should be no chance for the model to succeed without performing any manipulative actions.

    The first run was done using an AWS EC2 p3.8 instance and the second run using a p3.16; all the other settings are the same. The full evaluation logs are available here: [run 1] [run 2]

    Do you have any idea about the cause? Thanks

    opened by 594zyc 3
  • Support for Python 3.9?

    README.md states that only Python versions >=3.7 and <=3.8 are supported, but setup.py only specifies >=3.7. Is Python 3.9 officially supported?

    opened by yukw777 2
  • Trajectories that raise an error are ignored

    Hi @aishwaryap,

    I was reading about the recent changes to the code reported in #10 and we unfortunately get results that differ substantially from yours. I started dissecting the code to understand what's the reason for such discrepancies in the results. From my understanding of the inference_runner.py script, you spawn several processes, each with a given portion of the tasks. However, I can see that the exception handling logic simply ignores an instance that raises an error: https://github.com/emma-simbot/teach/blob/speaker_tokens/src/teach/inference/inference_runner.py#L130

    This is detrimental because if a dataset instance errors for whatever reason, its contribution to the overall metrics is ignored. Instead, the proper way of dealing with this should be to keep that trajectory in the results and record it in the metrics as unsuccessful. Potentially, such faulty trajectories should be reported in the metrics file for future debugging.

    Am I missing something?
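
    A sketch of what that handling could look like (names are illustrative, not the actual inference_runner.py internals; the failure record mirrors the per-instance metric fields quoted elsewhere on this page):

    import logging

    logger = logging.getLogger(__name__)

    def run_instance_or_record_failure(instance_file, run_edh_instance):
        # Hypothetical wrapper: a crashed EDH instance is recorded as a failure
        # instead of being silently dropped from the aggregate metrics.
        try:
            return run_edh_instance(instance_file)
        except Exception:
            logger.exception("Inference failed for %s", instance_file)
            return {
                "instance_id": instance_file,
                "success": 0,
                "goal_condition_success": 0,
                "error": 1,
            }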

    opened by aleSuglia 1
  • Possible bugs in `get_state_changes`

    Hi @aishwaryap,

    Thank you for releasing the dataset. It seems that there is a bug in the get_state_changes function: https://github.com/alexa/teach/blob/5554f02f55c22abfe5c2a749dbb24c13377726c8/src/teach/utils.py#L92

    I believe it should be agent_final = final_state["agents"][idx] instead. As a result, the state differences of the agents are empty in all teach-dataset/images/$SPLIT/$REPLAYED_CODE/statediff.*.json files.

    Thanks, Jiachen

    opened by Ji4chenLi 3
  • Much higher scores when evaluating Episodic Transformer baselines for EDH instances

    Hello,

    I have finished the evaluation of the Episodic Transformer baselines for the TEACh Benchmark Challenge on the valid_seen.

    However, one weird thing I found is that our reproduced result is much higher than what is reported in the paper. The result is shown below (All values are percentages). There is a total of 608 EDH instances (valid_seen) in the metric file which matches the number in the paper.

    |                       | SR [TLW]    | GC [TLW]    |
    | --------------------- | ----------- | ----------- |
    | Reproduced            | 13.8 [3.2]  | 14 [8.7]    |
    | Reported in the paper | 5.76 [0.90] | 7.99 [1.65] |

    I believe I am using the correct checkpoints. And the only change I made to the code is mentioned in #9.

    I am running on an AWS instance. I have started the X-server and installed all requirements and prerequisites without issues, and the inference process runs without errors.

    Here is the script I used for evaluation.

    #!/bin/sh

    export AWS_ROOT=/home/ubuntu/workplace
    export ET_DATA=$AWS_ROOT/data
    export TEACH_ROOT_DIR=$AWS_ROOT/teach
    export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
    export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
    export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
    export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
    export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
    export SPLIT=valid_seen

    cd $TEACH_ROOT_DIR
    python src/teach/cli/inference.py \
        --model_module teach.inference.et_model \
        --model_class ETModel \
        --data_dir $ET_DATA \
        --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
        --split $SPLIT \
        --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
        --seed 4 \
        --model_dir $ET_DATA/baseline_models/et \
        --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
        --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
        --device "cpu" \
        --images_dir $INFERENCE_OUTPUT_PATH/images

    I wonder if the data split provided in the dataset is the same as in the paper, and if so, what would be the possible explanation for this?

    Please let me know if someone else is getting similar results. Thank you!

    opened by yingShen-ys 4