# The Submission for SIMMC 2.0 Challenge 2021
## Requirements

- python 3.8.8
- pytorch 1.8.1
- transformers 4.8.2
- apex (for multi-GPU training)
- nltk
## Preprocessing

- Download data
  - Download the data provided by the challenge organizer and put it in the `data` folder.
  - Unzip the data files.
- Image saving
  - Preprocess the image files in advance. The preprocessed result is a pickle whose keys are the image names and whose values are the corresponding visual features (see the loading sketch below).

```
python3 image_preprocessor.py
python3 image_preprocessor_final.py
```

- The results (`.pickle`) are saved in the `res` folder.
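For reference, here is a minimal sketch of loading and inspecting the preprocessed result; the file name `res/image_features.pickle` is a placeholder, not the actual output name of the scripts:

```python
import pickle

# Placeholder path: use the actual .pickle file written to the res folder.
with open("res/image_features.pickle", "rb") as f:
    image_feats = pickle.load(f)  # dict: image name -> visual feature

# Peek at a few entries to confirm the key/value layout.
for name, feat in list(image_feats.items())[:3]:
    print(name, getattr(feat, "shape", type(feat)))
```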
## Step 1 (ITM)

First, the model is post-trained with image-to-text matching (ITM). Here, the image is each object and the text is the visual metadata of that object. The code is provided in the `ITM` folder.
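For illustration, below is a minimal sketch of a binary image-text matching objective, assuming the object's visual feature is already extracted (e.g., from the preprocessing pickles) and a BERT-style text encoder; the model name, feature dimension, and metadata string are assumptions, not the exact training code in the `ITM` folder:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class ITMHead(nn.Module):
    """Binary matching head between an object image feature and its metadata text."""

    def __init__(self, text_model="bert-base-uncased", img_dim=2048):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model)
        hidden = self.text_encoder.config.hidden_size
        self.img_proj = nn.Linear(img_dim, hidden)    # project the visual feature
        self.classifier = nn.Linear(hidden * 2, 1)    # match / no-match logit

    def forward(self, img_feat, input_ids, attention_mask):
        text_cls = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]                     # [CLS] representation
        img = self.img_proj(img_feat)
        return self.classifier(torch.cat([text_cls, img], dim=-1)).squeeze(-1)


# Positive pairs use an object's own metadata text; negatives use another object's.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ITMHead()
batch = tokenizer(["grey hoodie, brand X"], return_tensors="pt", padding=True)  # hypothetical metadata
img_feat = torch.randn(1, 2048)                       # placeholder pre-extracted visual feature
logit = model(img_feat, batch["input_ids"], batch["attention_mask"])
loss = nn.BCEWithLogitsLoss()(logit, torch.ones(1))   # label 1 = matching pair
```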
## Step 2 (BTM)

Second, the model is pre-trained with background-to-text matching (BTM) so that the background representation of the image can be used in the subtasks. As in ITM, the model is trained to match an image with a text, but here the image is the background of the dialog scene and the text is the entire dialog context. The code is provided in the `BTM` folder.
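As a rough illustration, the sketch below shows one way the (background image, dialog context) pairs could be assembled from a SIMMC 2.0 dialog file; the JSON field names (`dialogue_data`, `dialogue`, `transcript`, `system_transcript`, `scene_ids`) and the random negative sampling are assumptions, not the exact code in the `BTM` folder:

```python
import json
import random


def build_btm_pairs(dialog_json_path):
    """Assemble (scene image, dialog context, label) pairs for background-text matching.

    Positive pair: a dialog's own scene (background) image with its full text context.
    Negative pair: a randomly chosen other scene with the same text context.
    The JSON field names below are assumptions based on the SIMMC 2.0 data format.
    """
    with open(dialog_json_path) as f:
        dialogs = json.load(f)["dialogue_data"]

    all_scenes = [list(d["scene_ids"].values())[0] for d in dialogs]
    pairs = []
    for d, pos_scene in zip(dialogs, all_scenes):
        context = " ".join(
            turn["transcript"] + " " + turn.get("system_transcript", "")
            for turn in d["dialogue"]
        )
        negatives = [s for s in all_scenes if s != pos_scene] or all_scenes
        pairs.append((pos_scene, context, 1))                 # matching pair
        pairs.append((random.choice(negatives), context, 0))  # non-matching pair
    return pairs
```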
## Step 3

This is the training process for each subtask. You can train a model in each folder (`sub1`, `sub2_1`, `sub2_2`, `sub2_3`, `sub2_4`, `sub4`).
## Model

All models can be downloaded from the following link.

`model.pt` is the model used to evaluate devtest; its results are saved in the `dstc10-simmc-entry` folder. `model_final.pt` is the model used to evaluate teststd; its results are saved in the `dstc10-simmc-final-entry` folder. However, because training of `model_final.pt` was not completed within the challenge period, we ran inference with `model.pt` on the teststd data for subtask 2.
## Evaluation

We use the evaluation scripts provided by the challenge organizer:

- Evaluation script for subtask 1
- Evaluation script for subtask 2
- Evaluation script for subtask 4 (generation)

The SIMMC organizers provide the following scripts.
**Subtask 1**

```
$ python tools/disambiguator_evaluation.py \
    --pred_file="{PATH_TO_PRED_FILE}" \
    --test_file="{PATH_TO_TEST_FILE}"
```

**Subtask 2** (line-by-line evaluation)

```
$ python -m gpt2_dst.scripts.evaluate \
    --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
    --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
    --output_path_report={PATH_TO_REPORT}
```

(or, dialog-level evaluation)

```
$ python -m utils.evaluate_dst \
    --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
    --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
    --output_path_report={PATH_TO_REPORT}
```

**Subtask 4 (generation)**

```
$ python tools/response_evaluation.py \
    --data_json_path={PATH_TO_GOLD_RESPONSES} \
    --model_response_path={PATH_TO_MODEL_RESPONSES} \
    --single_round_evaluation
```

**Subtask 4 (retrieval)**

```
$ python tools/retrieval_evaluation.py \
    --retrieval_json_path={PATH_TO_GROUNDTRUTH_RETRIEVAL} \
    --model_score_path={PATH_TO_MODEL_CANDIDATE_SCORES} \
    --single_round_evaluation
```
## DevTest Results

### Subtask #1: Multimodal Disambiguation

| Test Method | Accuracy |
|---|---|
| GPT2 from CO (Challenge Organizer) | 73.9 |
| Ours | 92.28 |
### Subtask #2: Multimodal Coreference Resolution

| Test Method | Object F1 |
|---|---|
| GPT2 from CO | 0.366 |
| Ours-1 (sub2_1) | 0.595 |
| Ours-2 (sub2_2) | 0.604 |
| Ours-3 (sub2_3) | 0.607 |
| Ours-4 (sub2_4) | 0.608 |
### Subtask #3: Multimodal Dialog State Tracking
No Training/Testing
### Subtask #4: Multimodal Dialog Response Generation

#### Generation

| Test Method | BLEU |
|---|---|
| GPT2 from CO | 0.192 |
| MTN-SIMMC2 from CO | 0.217 |
| Ours | 0.285 |
#### Retrieval
No Training/Testing