Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

Facebook Research

Last update: Nov 22, 2022


Overview

The Second Situated Interactive MultiModal Conversations (SIMMC 2.0) Challenge 2021

Welcome to the Second Situated Interactive Multimodal Conversations (SIMMC 2.0) Track for DSTC10 2021.

The SIMMC challenge aims to lay the foundations for real-world assistant agents that can handle multimodal inputs and perform multimodal actions. Similar to the First SIMMC challenge (as part of DSTC9), we focus on task-oriented dialogs that encompass a situated multimodal user context in the form of a co-observed and immersive virtual reality (VR) environment. The conversational context is dynamically updated on each turn based on user actions (e.g., via verbal interactions or navigation within the scene). For this challenge, we release a new immersive SIMMC 2.0 dataset in two shopping domains: furniture and fashion.

Organizers: Seungwhan Moon, Satwik Kottur, Paul A. Crook, Ahmad Beirami, Babak Damavandi, Alborz Geramifard

Example from SIMMC-Furniture Dataset

Latest News

[June 14, 2021] Challenge announcement. Training / development datasets (SIMMC v2.0) are released.

Important Links

Task Description Paper
Challenge Registration
Data Formats
Baseline Details: Will be added soon!
Challenge Instructions
Submission Instructions

Timeline

Date	Milestone
June 14, 2021	Training & development data released
Sept 24, 2021	Test-Std data released, End of Challenge Phase 1
Oct 1, 2021	Entry submission deadline, End of Challenge Phase 2
Oct 8, 2021	Final results announced

Track Description

Tasks and Metrics

We present four sub-tasks primarily aimed at replicating human-assistant actions in order to enable rich and interactive shopping scenarios.

Sub-Task #1	Multimodal Disambiguation
Goal	To classify if the assistant should disambiguate in the next turn
Input	Current user utterance, Dialog context, Multimodal context
Output	Binary label
Metrics	Binary classification accuracy

Sub-Task #2	Multimodal Coreference Resolution
Goal	To resolve referent objects to their canonical ID(s) as defined by the catalog.
Input	Current user utterance with object mentions, Dialog context, Multimodal context
Output	Canonical object IDs
Metrics	Coref F1 / Precision / Recall
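Sub-Task #2 is scored with coreference precision, recall, and F1. The sketch below shows one common way to compute these over per-turn sets of object IDs. It assumes micro-averaging across turns. It is not the official evaluation script, and the helper name `coref_prf` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def coref_prf(
    gold: Iterable[Set[int]], pred: Iterable[Set[int]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-turn object-ID sets.

    Hypothetical helper for illustration; not the official SIMMC 2.0 scorer.
    """
    n_correct = n_pred = n_gold = 0
    for gold_ids, pred_ids in zip(gold, pred):
        n_correct += len(gold_ids & pred_ids)  # IDs both predicted and annotated
        n_pred += len(pred_ids)
        n_gold += len(gold_ids)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Two turns: one exact match, one partial match (2 of 3 IDs correct overall).
precision, recall, f1 = coref_prf([{36}, {47, 50}], [{36}, {47, 51}])
```

Micro-averaging pools counts across turns, so turns that mention many objects weigh more. A per-turn (macro) average is the obvious alternative.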

Sub-Task #3	Multimodal Dialog State Tracking (MM-DST)
Goal	To track user belief states across multiple turns
Input	Current user utterance, Dialog context, Multimodal context
Output	Belief state for current user utterance
Metrics	Slot F1, Intent F1

Sub-Task #4	Multimodal Dialog Response Generation & Retrieval
Goal	To generate Assistant responses or retrieve from a candidate pool
Input	Current user utterance, Dialog context, Multimodal context, (Ground-truth API Calls)
Output	Assistant response utterance
Metrics	Generation: BLEU-4, Retrieval: MRR, R@1, R@5, R@10, Mean Rank
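The retrieval metrics for Sub-Task #4 all follow from the rank of the ground-truth response among the scored candidates. The minimal sketch below rests on three assumptions:

- each turn supplies one score per candidate, and higher scores are better;
- the ground-truth candidate's index is known;
- ties are resolved in favour of the ground truth.

The helper names are hypothetical. This is not the official retrieval evaluation code.

```python
from typing import Dict, List, Sequence


def gt_rank(scores: Sequence[float], gt_index: int) -> int:
    """1-based rank of the ground-truth candidate (ties favour the ground truth)."""
    gt_score = scores[gt_index]
    return 1 + sum(score > gt_score for score in scores)


def retrieval_metrics(
    all_scores: List[Sequence[float]], gt_indices: List[int]
) -> Dict[str, float]:
    """MRR, mean rank, and R@1/5/10 over a list of scored turns."""
    ranks = [gt_rank(scores, gt) for scores, gt in zip(all_scores, gt_indices)]
    n = len(ranks)
    metrics = {
        "mrr": sum(1.0 / r for r in ranks) / n,
        "mean_rank": sum(ranks) / n,
    }
    for k in (1, 5, 10):
        metrics[f"r@{k}"] = sum(r <= k for r in ranks) / n  # fraction ranked in top k
    return metrics


print(retrieval_metrics([[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]], [0, 2]))
# {'mrr': 0.75, 'mean_rank': 1.5, 'r@1': 0.5, 'r@5': 1.0, 'r@10': 1.0}
```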

Please check the task input file for a full description of inputs for each subtask.

Evaluation

For the DSTC10 SIMMC Track, we will run a two-phase evaluation as follows.

Challenge Period 1: Participants will evaluate the model performance on the provided devtest set. At the end of Challenge Period 1 (Sept 24), we ask participants to submit their model prediction results and a link to their code repository.

Challenge Period 2: A test-std set will be released on Sept 28 for the participants who submitted the results for the Challenge Period 1. We ask participants to submit their model predictions on the test-std set by Oct 1. We will announce the final results and the winners on Oct 8.

Challenge Instructions

(1) Challenge Registration

Fill out this form to register at DSTC10. Check “Track 3: SIMMC 2.0: Situated Interactive Multimodal Conversational AI” along with other tracks you are participating in.

(2) Download Datasets and Code

Irrespective of participation in the challenge, we'd like to encourage those interested in this dataset to complete this optional survey. This will also help us communicate any future updates on the codebase, the datasets, and the challenge track.
Git clone our repository to download the datasets and the code. You may use the provided baselines as a starting point to develop your models.

$ git lfs install
$ git clone https://github.com/facebookresearch/simmc2.git

(3) Reporting Results for Challenge Phase 1

Submit your model prediction results on the devtest set, following the submission instructions.
We will release the test-std set (with ground-truth labels hidden) on Sept 24.

(4) Reporting Results for Challenge Phase 2

Submit your model prediction results on the test-std set, following the submission instructions.
We will evaluate the participants’ model predictions using the same evaluation script as in Phase 1, and announce the results.

Contact

Questions related to SIMMC Track, Data, and Baselines

Please contact [email protected], or leave comments in the GitHub repository.

DSTC Mailing List

If you want to get the latest updates about DSTC10, join the DSTC mailing list.

Citations

If you want to publish experimental results with our datasets or use the baseline models, please cite the following articles:

@article{kottur2021simmc,
  title={SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations},
  author={Kottur, Satwik and Moon, Seungwhan and Geramifard, Alborz and Damavandi, Babak},
  journal={arXiv preprint arXiv:2104.08667},
  year={2021}
}

NOTE: The paper above describes in detail the datasets, the collection process, and some of the baselines we provide in this challenge. The paper reports the results from an earlier version of the dataset and with different train-dev-test splits, hence the baseline performances on the challenge resources will be slightly different.

License

SIMMC 2.0 is released under CC-BY-NC-SA-4.0, see LICENSE for details.

Comments

I have a question about json data in public.

What does each of the four values in bbox represent?

How do I change the bbox+position to (left, top, right, bottom)?

Does relationships represent relationships between indices?

What is the difference between unique_id and index?

Thank you.
opened by rungjoo 4
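The conversion asked about in this issue can be sketched as a small helper. This rests on an explicit assumption that the thread does not confirm: that bbox stores [x, y, h, w], with (x, y) the top-left corner in image pixels. Check the repository's data format documentation before relying on this ordering. The helper name is hypothetical, and `position` is left out.

```python
from typing import Sequence, Tuple


def bbox_to_ltrb(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert an assumed [x, y, h, w] box to (left, top, right, bottom).

    ASSUMPTION: (x, y) is the top-left corner and the last two values are
    height then width. Swap h and w below if the data turns out otherwise.
    """
    x, y, h, w = bbox
    return (x, y, x + w, y + h)


print(bbox_to_ltrb([100, 50, 30, 20]))  # (100, 50, 120, 80)
```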
Correct code errors
@satwikkottur I added minor corrections to the code. Please let me know if they are incorrect.

Delete tokenizer.padding_side = "left". After removing this line, accuracy on the devtest set appears to improve from 72% to 92%.

Add tokenizer.truncation_side = "left" so that the oldest context tokens are truncated when needed.
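Left-side truncation keeps the end of the input and drops its beginning. For a dialog context, that means the oldest turns go first. The plain-Python sketch below shows only this effect on a list of token IDs. It does not use a tokenizer library, and the function name and token IDs are hypothetical.

```python
from typing import List


def truncate_left(token_ids: List[int], max_len: int) -> List[int]:
    """Keep the last `max_len` tokens, dropping the oldest context first."""
    if max_len <= 0:
        return []  # guard: token_ids[-0:] would return the whole list
    return token_ids[-max_len:]


context = [101, 7, 8, 9, 10, 11, 102]
print(truncate_left(context, 4))  # [9, 10, 11, 102]
```

With right-side truncation, the most recent user turn, which matters most for these sub-tasks, would be the first thing cut.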

opened by jianguoz 3
Questions about the teststd retrieval candidate file containing only one turn

Hi, you have released the simmc2_dials_dstc10_teststd_retrieval_candidates_public.json data for Challenge Phase 2. However, I found that the teststd data contains candidates for only one turn in each dialog. Since the original simmc2_dials_dstc10_devtest_retrieval_candidates.json file had candidates for multiple turns in each dialog, I just wanted to make sure that this change was intentional. Thank you!

opened by boychaboy 3
Question about subtask1

Hello.

While working on subtask1, I have a question.

subtask2 needs to find an object reference corresponding to the current utterance.

Is this multimodal information available in subtask1? That is, when predicting disambiguation, is it okay to use multimodal information (e.g., object bounding boxes and metadata) corresponding to the current utterance? Or should I use only information about the whole image corresponding to the dialog?

Thank you.

opened by rungjoo 3
About evaluate_response.py

Hi,

In evaluate_response.py, I see the following snippet

def parse_response_from_file(input_path):
    """Parses the response from a flattened file.

    Args:
        input_path: Path to read the responses from.
    """
    lines = []
    with open(input_path, "r") as file_id:
        for ii in file_id.readlines():
            split_line = ii.split("<SOR>", 1)
            lines.append(
                (split_line[0].strip("\n"), split_line[1].strip("\n").strip(""))
            )
    return lines

Here the code splits on <SOR>, but that token is only used in no-belief mode, while the baseline also uses belief states. Is it acceptable to modify the evaluation code slightly for cases like this, or should I conform to this evaluation script?

opened by heyzude 2
Missing Object IDs

Hi,

I am facing the issue that some object IDs do not appear in the scene file for that dialogue. For example, in m_cloth_store_1498649_woman_5_9, there is a reference to object ID 55, but there is no object ID 55 either as index or unique_id. Could you please clarify this?

This question was also asked by @tungngthanh (quoted below), but the issue was closed without an answer.

After a quick count, this happens ~615 times in the train split: the target object ID does not appear in the scene or bbox jsons for that dialogue.

Thank you.

EDIT: if I include the objects in system_transcript_annotated and transcript_annotated together, ~3183 entries (user utterance + dialogue history up to that point) make references to objects that do not appear in the respective scene jsons, around 8% of the train data. I am skipping these for now.

Thank you for your reply. Following your suggestions, I found that each scene idx corresponds to one json file and one image file. However, I now have a problem mapping the object_local_id to the canonical object_id. For example, the third dialogue (index 2) in the training set has the following scene_ids: {'0': 'm_cloth_store_1498649_woman_5_3', '5': 'm_cloth_store_1498649_woman_5_9'}. Its 7th utterance is:

{ 'turn_idx': 6, 'system_transcript': "Sure, I'll add that now.", 'system_transcript_annotated': {'act': 'CONFIRM:ADD_TO_CART', 'act_attributes': {'slot_values': {}, 'request_slots': [], 'objects': [55]}}, 'transcript': 'Actually, just add that brown jacket to my cart.', 'transcript_annotated': {'act': 'REQUEST:ADD_TO_CART', 'act_attributes': {'slot_values': {}, 'request_slots': [], 'objects': [55]}} }

According to the document, we should expect to see the local_id 55 object in m_cloth_store_1498649_woman_5_9_scene.json, right? However, when I load the file, I do not see 55 in index or unique_id. Can you clarify this for me?

Originally posted by @tungngthanh in https://github.com/facebookresearch/simmc2/issues/3#issuecomment-896868063

opened by JChiyah 2
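The lookup described in the quoted comment can be sketched as a small helper. It picks the scene that applies to a given turn: the one whose key in scene_ids is the largest turn index not exceeding the current turn. That reading of the keys as starting turn indices is inferred from the example above rather than confirmed by the organizers. The helper name is hypothetical.

```python
from typing import Dict


def active_scene(scene_ids: Dict[str, str], turn_idx: int) -> str:
    """Return the scene in effect at `turn_idx`.

    ASSUMPTION: each key of `scene_ids` is the turn index from which that
    scene applies, so the active scene has the largest key <= turn_idx.
    """
    started = [
        (int(start), name) for start, name in scene_ids.items() if int(start) <= turn_idx
    ]
    if not started:
        raise ValueError(f"no scene starts at or before turn {turn_idx}")
    return max(started)[1]


scene_ids = {"0": "m_cloth_store_1498649_woman_5_3", "5": "m_cloth_store_1498649_woman_5_9"}
scene = active_scene(scene_ids, 6)
print(f"{scene}_scene.json")  # m_cloth_store_1498649_woman_5_9_scene.json
```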
Sometimes it's impossible to predict the correct belief state

In simmc2_dials_dstc10_devtest_target.txt, the following line exists:

User : I'm interested in a hoodie. System : How do you feel about this brown one here on the front floor rack, and the brown one in front of it? They are both hoodies. 47, 50 User : What's the prive of the item? => Belief State : ASK:GET [ ] (price) < 36 > Which item do you mean?

But I think it's impossible to correctly predict the object in the belief state (36) when the model is given only the part before "Belief State", because the user utterance is ambiguous (which is why the system actually asks for disambiguation).

If I'm right, this is a problem. Please let me know.

opened by heyzude 2
Parse Error on model/mm_dst/gpt2_dst/utils/convert.py

There is an error in the parse_flattened_result function in model/mm_dst/gpt2_dst/utils/convert.py.

This function returns an empty list when it handles strings that contain nested square brackets.

For example: ".. INFORM:GET [ sleeveLength = short, availableSizes = ['XXL', 'S', 'L'], pattern = leafy design, type = blouse ] (availableSizes, pattern) < 86, 57 > ... "

opened by han0ah 2
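One way to handle nested brackets is to track bracket depth while scanning, so that the list inside availableSizes does not end the slot block early. The standalone sketch below works on a single frame string, meaning the text after "=> Belief State :" and before any response. It makes these assumptions:

- it is not a patch to parse_flattened_result, and the helper names are hypothetical;
- the frame layout "ACT [ slots ] (request slots) < object IDs >" is inferred from the examples on this page;
- list-valued slots are kept as raw text.

```python
from typing import Dict, List


def _split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split `text` on `sep`, ignoring separators nested inside [...]."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]  # drop empty pieces, e.g. from "[ ]"


def parse_frame(frame: str) -> Dict[str, object]:
    """Parse one flattened frame: 'ACT [ slots ] (request slots) < object ids >'.

    Hypothetical helper for illustration; not the repository's parser.
    """
    act, rest = frame.strip().split("[", 1)
    # Walk forward to the ']' that closes the first '[', counting nested lists.
    depth = 1
    for pos, ch in enumerate(rest):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced square brackets in frame")
    slots_text, tail = rest[:pos], rest[pos + 1:]

    slots: Dict[str, str] = {}
    for pair in _split_top_level(slots_text):
        key, value = pair.split("=", 1)
        slots[key.strip()] = value.strip()  # list values stay as raw text

    requests = tail[tail.index("(") + 1 : tail.index(")")]
    objects = tail[tail.index("<") + 1 : tail.index(">")]
    return {
        "act": act.strip(),
        "slots": slots,
        "request_slots": [r.strip() for r in requests.split(",") if r.strip()],
        "objects": [int(o) for o in objects.split(",") if o.strip()],
    }


frame = (
    "INFORM:GET [ sleeveLength = short, availableSizes = ['XXL', 'S', 'L'], "
    "pattern = leafy design, type = blouse ] (availableSizes, pattern) < 86, 57 >"
)
parsed = parse_frame(frame)
print(parsed["slots"]["availableSizes"])  # ['XXL', 'S', 'L']
print(parsed["objects"])  # [86, 57]
```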
Broken links

Hi,

I think some of the links in the README.md are broken or files are missing. For instance, the following:

Please check the [task input](./TASK_INPUTS.md) file for a full description of inputs for each subtask.

It references TASK_INPUTS.md but I cannot find the file anywhere in the repository and opening it takes me to a 404 Not Found webpage. Are there files missing from the repo by any chance? Thanks! :)

opened by JChiyah 2
Question about Data format

Hello. I am a DSTC10 participant.

Thanks for the data release.

After downloading and checking the data, it is currently stored as follows in simmc2_dials_dstc10_train.json.

It's different from what you described in the data introduction. Is there an update?

Thank you.

opened by rungjoo 2
What do "all" and "dress" mean in disambiguation_candidates_raw?

"disambiguation_candidates_raw" seems to narrow down the object subspace. Normally, discrete indices are listed, but sometimes the values are "all", "blouse", "dress", etc. What do these string values mean?

I guessed that "all" and "blouse" together mean all blouses in the given scene. I'm also curious whether "disambiguation_candidates_raw" can be used at test time.

opened by bambidz 1

Some of the coreference label/target sets do not exist in the object map in their corresponding scenes

Hello, I would like to raise a question regarding the SIMMC 2.1 dataset provided here.

While writing my own data preprocessing script, I noticed that some of the coreference labels (obtained from the dialogue JSON data under transcript_annotated > act_attributes > objects, following MM-DST's preprocessing script) are not a subset of the corresponding object maps for the relevant dialogue id and turn id. I extracted these object maps following this function from the preprocessing script for ambiguous candidate identification. For clarity, here are the mismatched label/target set and object map pairs from the devtest data.

dialog_id 10618 | turn_id 7 | image_name cloth_store_1416238_woman_3_9.png | scene_label m_cloth_store_1416238_woman_3_9 | target {57, 2} | object_map {85, 86, 87, 56, 57, 58, 59, 61, 62, 63}
dialog_id 10653 | turn_id 8 | image_name cloth_store_1416238_woman_4_6.png | scene_label m_cloth_store_1416238_woman_4_6 | target {2, 59} | object_map {1, 2, 3, 4, 5, 6, 7, 8, 12, 13, 14, 76, 77, 78, 79, 80, 81, 82, 83}
dialog_id 10677 | turn_id 7 | image_name cloth_store_1498649_woman_20_3.png | scene_label m_cloth_store_1498649_woman_20_3 | target {52, 53} | object_map {19, 20, 21, 22, 24, 25, 26, 27, 31, 32, 33, 34, 35, 36, 37, 44, 45, 46, 47}
dialog_id 10677 | turn_id 8 | image_name cloth_store_1498649_woman_20_3.png | scene_label m_cloth_store_1498649_woman_20_3 | target {52, 53} | object_map {19, 20, 21, 22, 24, 25, 26, 27, 31, 32, 33, 34, 35, 36, 37, 44, 45, 46, 47}
dialog_id 10743 | turn_id 8 | image_name cloth_store_1498649_woman_20_10.png | scene_label m_cloth_store_1498649_woman_20_10 | target {21} | object_map {0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 23, 28, 29, 38, 40, 43, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66}
dialog_id 10743 | turn_id 9 | image_name cloth_store_1498649_woman_20_10.png | scene_label m_cloth_store_1498649_woman_20_10 | target {21} | object_map {0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 23, 28, 29, 38, 40, 43, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66}
dialog_id 10769 | turn_id 8 | image_name cloth_store_1498649_woman_20_9.png | scene_label m_cloth_store_1498649_woman_20_9 | target {15} | object_map {0, 1, 2, 3, 4, 5}
dialog_id 10788 | turn_id 7 | image_name cloth_store_1498649_woman_20_6.png | scene_label m_cloth_store_1498649_woman_20_6 | target {71} | object_map {0, 1, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43}
dialog_id 10873 | turn_id 7 | image_name cloth_store_1498649_woman_20_5.png | scene_label m_cloth_store_1498649_woman_20_5 | target {9, 14} | object_map {48, 51, 52, 53, 54, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75}
dialog_id 10873 | turn_id 8 | image_name cloth_store_1498649_woman_20_5.png | scene_label m_cloth_store_1498649_woman_20_5 | target {9} | object_map {48, 51, 52, 53, 54, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75}
dialog_id 10896 | turn_id 8 | image_name cloth_store_1498649_woman_2_9.png | scene_label m_cloth_store_1498649_woman_2_9 | target {8, 18} | object_map {0, 1, 2, 40, 42, 44, 45, 14, 15, 46, 19, 20, 53, 21, 23, 24, 60}
dialog_id 10942 | turn_id 7 | image_name cloth_store_1416238_woman_3_11.png | scene_label m_cloth_store_1416238_woman_3_11 | target {40, 3} | object_map {39, 40, 41, 42, 43, 44, 45}
dialog_id 10966 | turn_id 8 | image_name cloth_store_1498649_woman_20_5.png | scene_label m_cloth_store_1498649_woman_20_5 | target {7} | object_map {48, 51, 52, 53, 54, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75}
dialog_id 10981 | turn_id 9 | image_name cloth_store_1416238_woman_4_5.png | scene_label m_cloth_store_1416238_woman_4_5 | target {33, 9} | object_map {15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36}
dialog_id 11030 | turn_id 9 | image_name cloth_store_1498649_woman_20_1.png | scene_label m_cloth_store_1498649_woman_20_1 | target {33, 36} | object_map {0, 1, 2, 3, 4, 5}
dialog_id 11092 | turn_id 6 | image_name cloth_store_1416238_woman_4_5.png | scene_label m_cloth_store_1416238_woman_4_5 | target {60, 37} | object_map {15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36}
dialog_id 11107 | turn_id 7 | image_name cloth_store_1498649_woman_2_9.png | scene_label m_cloth_store_1498649_woman_2_9 | target {34, 39} | object_map {0, 1, 2, 40, 42, 44, 45, 14, 15, 46, 19, 20, 53, 21, 23, 24, 60}
dialog_id 11150 | turn_id 9 | image_name cloth_store_1498649_woman_20_2.png | scene_label m_cloth_store_1498649_woman_20_2 | target {17, 18} | object_map {48, 51, 52, 53, 54, 55, 59, 60, 61, 62, 63, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78}
dialog_id 11179 | turn_id 9 | image_name cloth_store_1498649_woman_20_10.png | scene_label m_cloth_store_1498649_woman_20_10 | target {69} | object_map {0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 23, 28, 29, 38, 40, 43, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66}
dialog_id 11258 | turn_id 6 | image_name cloth_store_1416238_woman_4_3.png | scene_label m_cloth_store_1416238_woman_4_3 | target {63, 71} | object_map {97, 98, 99, 100, 84, 85, 86, 87, 88, 89, 90, 91, 92}
dialog_id 11325 | turn_id 8 | image_name cloth_store_1416238_woman_3_11.png | scene_label m_cloth_store_1416238_woman_3_11 | target {25} | object_map {39, 40, 41, 42, 43, 44, 45}
dialog_id 11509 | turn_id 7 | image_name cloth_store_1498649_woman_2_1.png | scene_label m_cloth_store_1498649_woman_2_1 | target {1, 45} | object_map {0, 1, 41, 43, 44, 14, 15, 51, 52, 53, 20, 23, 21, 24, 59, 61, 62, 63}
dialog_id 11584 | turn_id 7 | image_name cloth_store_1416238_woman_3_9.png | scene_label m_cloth_store_1416238_woman_3_9 | target {3, 6} | object_map {85, 86, 87, 56, 57, 58, 59, 61, 62, 63}
dialog_id 11630 | turn_id 9 | image_name cloth_store_1416238_woman_4_10.png | scene_label m_cloth_store_1416238_woman_4_10 | target {0, 6} | object_map {0, 9, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50}
dialog_id 11677 | turn_id 8 | image_name cloth_store_1416238_woman_4_3.png | scene_label m_cloth_store_1416238_woman_4_3 | target {59} | object_map {97, 98, 99, 100, 84, 85, 86, 87, 88, 89, 90, 91, 92}
dialog_id 11677 | turn_id 9 | image_name cloth_store_1416238_woman_4_3.png | scene_label m_cloth_store_1416238_woman_4_3 | target {59} | object_map {97, 98, 99, 100, 84, 85, 86, 87, 88, 89, 90, 91, 92}
dialog_id 11738 | turn_id 7 | image_name cloth_store_1416238_woman_3_9.png | scene_label m_cloth_store_1416238_woman_3_9 | target {72, 49} | object_map {85, 86, 87, 56, 57, 58, 59, 61, 62, 63}
dialog_id 11738 | turn_id 8 | image_name cloth_store_1416238_woman_3_9.png | scene_label m_cloth_store_1416238_woman_3_9 | target {49} | object_map {85, 86, 87, 56, 57, 58, 59, 61, 62, 63}
dialog_id 11792 | turn_id 8 | image_name cloth_store_1416238_woman_3_9.png | scene_label m_cloth_store_1416238_woman_3_9 | target {8} | object_map {85, 86, 87, 56, 57, 58, 59, 61, 62, 63}
dialog_id 11792 | turn_id 9 | image_name cloth_store_1416238_woman_3_9.png | scene_label m_cloth_store_1416238_woman_3_9 | target {8} | object_map {85, 86, 87, 56, 57, 58, 59, 61, 62, 63}
dialog_id 11839 | turn_id 6 | image_name cloth_store_1416238_woman_3_11.png | scene_label m_cloth_store_1416238_woman_3_11 | target {10, 46} | object_map {39, 40, 41, 42, 43, 44, 45}
dialog_id 11866 | turn_id 6 | image_name cloth_store_1498649_woman_20_2.png | scene_label m_cloth_store_1498649_woman_20_2 | target {57} | object_map {48, 51, 52, 53, 54, 55, 59, 60, 61, 62, 63, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78}
dialog_id 11869 | turn_id 8 | image_name cloth_store_1498649_woman_20_3.png | scene_label m_cloth_store_1498649_woman_20_3 | target {78, 55} | object_map {19, 20, 21, 22, 24, 25, 26, 27, 31, 32, 33, 34, 35, 36, 37, 44, 45, 46, 47}
dialog_id 12059 | turn_id 8 | image_name cloth_store_1498649_woman_20_5.png | scene_label m_cloth_store_1498649_woman_20_5 | target {6} | object_map {48, 51, 52, 53, 54, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75}
dialog_id 12128 | turn_id 6 | image_name cloth_store_1416238_woman_4_3.png | scene_label m_cloth_store_1416238_woman_4_3 | target {11, 12} | object_map {97, 98, 99, 100, 84, 85, 86, 87, 88, 89, 90, 91, 92}
dialog_id 12146 | turn_id 8 | image_name cloth_store_1416238_woman_3_1.png | scene_label m_cloth_store_1416238_woman_3_1 | target {56, 86} | object_map {60, 64, 69, 70, 71, 72, 73, 74, 75, 76, 81, 82, 83, 84, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100}
dialog_id 12167 | turn_id 9 | image_name cloth_store_1416238_woman_3_3.png | scene_label m_cloth_store_1416238_woman_3_3 | target {43} | object_map {32, 3, 4, 5, 6, 7, 8, 9, 10, 37, 76, 84, 29}
dialog_id 12170 | turn_id 6 | image_name cloth_store_1498649_woman_2_5.png | scene_label m_cloth_store_1498649_woman_2_5 | target {40, 2} | object_map {32, 33, 34, 35, 36, 37, 38, 39, 25, 26, 27, 28, 29, 30, 31}
dialog_id 12236 | turn_id 6 | image_name cloth_store_1416238_woman_4_0.png | scene_label m_cloth_store_1416238_woman_4_0 | target {98, 85} | object_map {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
dialog_id 12385 | turn_id 6 | image_name cloth_store_1498649_woman_20_9.png | scene_label m_cloth_store_1498649_woman_20_9 | target {60, 70} | object_map {0, 1, 2, 3, 4, 5}
dialog_id 12400 | turn_id 8 | image_name cloth_store_1498649_woman_20_10.png | scene_label m_cloth_store_1498649_woman_20_10 | target {34, 18} | object_map {0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 23, 28, 29, 38, 40, 43, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66}
dialog_id 12479 | turn_id 8 | image_name cloth_store_1498649_woman_20_5.png | scene_label m_cloth_store_1498649_woman_20_5 | target {9, 11} | object_map {48, 51, 52, 53, 54, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75}
dialog_id 12482 | turn_id 8 | image_name cloth_store_1498649_woman_20_0.png | scene_label m_cloth_store_1498649_woman_20_0 | target {4, 6} | object_map {6, 7, 10, 13, 48, 49, 50, 51, 52, 53, 54}
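The check behind the list above reduces to a set difference between each turn's target and its object map: any non-empty result is a mismatch. The sketch below applies it to two rows copied from the list. It assumes the object map has already been extracted as described in the issue, and the helper name is hypothetical.

```python
from typing import Iterable, Set


def unmatched_targets(target: Iterable[int], object_map: Iterable[int]) -> Set[int]:
    """Return target object IDs that are absent from the scene's object map."""
    return set(target) - set(object_map)


rows = [
    # (dialog_id, turn_id, target, object_map), copied from the devtest list above
    (10618, 7, {57, 2}, {85, 86, 87, 56, 57, 58, 59, 61, 62, 63}),
    (12482, 8, {4, 6}, {6, 7, 10, 13, 48, 49, 50, 51, 52, 53, 54}),
]
for dialog_id, turn_id, target, object_map in rows:
    print(dialog_id, turn_id, sorted(unmatched_targets(target, object_map)))
# 10618 7 [2]
# 12482 8 [4]
```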

I would like to know if these differences are intentional. If not, I wonder how the corresponding data instances should be handled during the evaluation process.

PS: If you can't reproduce these mismatched results, my preprocessing script could be incorrect. In that case, it would be great for me to know how the coreference labels or the object maps should be extracted to work as intended.

opened by holylovenia 0

Questions about SIMMC 2.1 Task1 Annotation

Hi, SIMMC Organizers!

When we analyze disambiguation annotations provided by SIMMC 2.1 Task1, we are a little confused about "disambiguation_candidates" field and "disambiguation_candidates_raw" field.

We find that the "disambiguation_candidates_raw" field is sometimes "all items" even when a concrete description, like “light blue jeans”, appears in the user utterance.

SIMMC 2.1 Dev dataset -- Dialogue ID 12216, Turn 3

Last User: Just go ahead and add the blue jeans to my cart.

Last System: Okay, they will be added.

Curr User: Now tell me the size and brand for the light blue jeans.

Curr System: Which ones?

Disam Raw: ['all', 'items']

Disam Candidates: [25, 26, 28, 29, 30, 31, 34, 35, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 61, 97, 98, 99]

We also find that the "disambiguation_candidates_raw" field contains object IDs that are not related to the object type mentioned in the user utterance. In the following dialogue, the user asks about a "brown jacket", but "disambiguation_candidates_raw" contains brown pants.

SIMMC 2.1 Dev dataset -- Dialogue ID 10858, Turn 7

Last User: Also, I need some large shoes.

Last System: Tell me what you think of the red shoes on the bottom of the right dresser?

Curr User: Actually, could you give me the available sizes of the brown jacket?

Curr System: Which one are you talking about?

Disam Raw: [60, 69, 81, 86, 92]

Disam Candidates: [60, 69, 81, 86, 92]

Besides, in some cases only one object ID exists in the "disambiguation_candidates_raw" and "disambiguation_candidates" fields, which seems to contradict the goal of the disambiguation task. For example,

SIMMC 2.1 Dev dataset -- Dialogue ID 11088, Turn 3

Last User: I need a solid color hoodie from 212 Local.

Last System: Here's a brown one on the far left, check it out.

Curr User: How much does that brown hoodie cost?

Curr System: Sorry, which one?

Disam Raw: [45]

Disam Candidates: [45]

Therefore, we would like to ask for your help. Could you describe the annotation process for SIMMC 2.1 Task 1, or how the scope of the disambiguation candidate object IDs was determined? Thank you very much!

opened by StarrySkyLYX 7
Subtask 4 b - length of the answer candidate list

Hello, I'm currently trying to implement a model for subtask 4b but I'm not able to find information about how long the list of answer candidates should be for the retrieval task. I would be really grateful for some clarification.

Thanks in advance and best regards Manuel

opened by Manuelvh44 1
Question about mm_dst model
There may be some mistakes here, but I am not sure.

1. As far as I understand, step '3. Generate prediction for devtest data' generates the predictions 'simmc2.1_dials_dstc11_devtest_predicted.txt' used by step '4. Evaluate predictions for devtest data', but the values of 'path_output' and 'input_path_predicted' are not consistent.

2. In the results, I am confused about whether the reported results are for SIMMC 2 or 2.1.
opened by XiaowenSun-Lab 1
Some inconsistencies in evaluation scripts
When I convert the flat text format into the submission format, I see some inconsistencies in the evaluation scripts. Can you clarify them for me?

Subtask 1: In ./model/utils/disambiguator_evaluation.py, line 46:

assert "disambiguation_label" in gt_datum, "Turn not to be evaluated!"

This line raises an error when the ground-truth data does not have a disambiguation_label. Many dialogue turns do not have this label, so evaluating the dataset fails.
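One workaround for the assertion described above is to score only the turns that actually carry a disambiguation_label. The sketch below assumes each dialog dict has a "dialogue_idx" and a "dialogue" list of turn dicts with "turn_idx". That layout is an assumption and is not taken from disambiguator_evaluation.py. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (dialogue_idx, turn_idx)


def labelled_turns(dialogs: List[dict]) -> Dict[Key, int]:
    """Collect disambiguation labels, skipping (not asserting on) unlabelled turns.

    ASSUMPTION: dialog dicts use the keys "dialogue_idx" and "dialogue".
    """
    labels: Dict[Key, int] = {}
    for dialog in dialogs:
        for turn in dialog["dialogue"]:
            if "disambiguation_label" in turn:
                labels[(dialog["dialogue_idx"], turn["turn_idx"])] = turn["disambiguation_label"]
    return labels


def accuracy(gold: Dict[Key, int], pred: Dict[Key, int]) -> float:
    """Binary accuracy over labelled turns; missing predictions count as wrong."""
    if not gold:
        return 0.0
    correct = sum(pred.get(key) == label for key, label in gold.items())
    return correct / len(gold)


dialogs = [{"dialogue_idx": 1, "dialogue": [
    {"turn_idx": 0},
    {"turn_idx": 1, "disambiguation_label": 1},
    {"turn_idx": 2, "disambiguation_label": 0},
]}]
gold = labelled_turns(dialogs)
print(gold)  # {(1, 1): 1, (1, 2): 0}
print(accuracy(gold, {(1, 1): 1, (1, 2): 1}))  # 0.5
```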

Subtask 3: In ./model/mm_dst/utils/evaluate_dst.py, line 272 (https://github.com/facebookresearch/simmc2/blob/master/model/mm_dst/utils/evaluate_dst.py#L272):

true_frame_object_values == pred_frame_object_values

I think n_correct_beliefs should not depend on frame_object_values, right?

Subtask 4: In ./model/utils/retrieval_evaluation.py, according to the code, the expected format should be:

[
    {
        "dialog_id": <dialog_id>,
        "candidate_scores": [
            {
                "turn_id": <turn_id>,
                "scores": [ <list of 100 floats> ]
            }
            ...
        ]
    }
    ...
]
opened by i2r-simmc 8
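Following the retrieval format described in this comment, a submission can be assembled from per-turn score lists. The sketch below assumes a nested {dialog_id: {turn_id: [scores]}} mapping with integer IDs and 100 scores per turn, as the comment states. The helper name is hypothetical, and the result is serialized to a string rather than written to any particular file.

```python
import json
from typing import Dict, List


def build_retrieval_submission(
    scores: Dict[int, Dict[int, List[float]]], n_candidates: int = 100
) -> List[dict]:
    """Convert {dialog_id: {turn_id: [scores]}} into the list-of-dialogs format."""
    submission = []
    for dialog_id, turns in sorted(scores.items()):
        candidate_scores = []
        for turn_id, turn_scores in sorted(turns.items()):
            if len(turn_scores) != n_candidates:
                raise ValueError(
                    f"dialog {dialog_id}, turn {turn_id}: expected {n_candidates} scores"
                )
            candidate_scores.append({"turn_id": turn_id, "scores": list(turn_scores)})
        submission.append({"dialog_id": dialog_id, "candidate_scores": candidate_scores})
    return submission


example = {12: {0: [0.0] * 100, 1: [0.5] * 100}}
submission = build_retrieval_submission(example)
print(len(submission[0]["candidate_scores"]))  # 2
payload = json.dumps(submission)  # serialized text, ready to be written to a file
```

Validating the candidate count up front surfaces malformed turns before the evaluation script sees them.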

Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

Related tags

Overview

The Second Situated Interactive MultiModal Conversations (SIMMC 2.0) Challenge 2021

Latest News

Important Links

Timeline

Track Description

Tasks and Metrics

Evaluation

Challenge Instructions

(1) Challenge Registration

(2) Download Datasets and Code

(3) Reporting Results for Challenge Phase 1

(4) Reporting Results for Challenge Phase 2

Contact

Questions related to SIMMC Track, Data, and Baselines

DSTC Mailing List

Citations

License

Comments

Owner

Facebook Research

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations

This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Source code for our paper "Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations"

Data & Code for ACCENTOR Adding Chit-Chat to Enhance Task-Oriented Dialogues

PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

This repo contains implementation of different architectures for emotion recognition in conversations.

PAthological QUpath Obsession - QuPath and Python conversations

Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021)

A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset.

VD-BERT: A Unified Vision and Dialog Transformer with BERT

Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI

🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"

Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI