ACCENTOR: Adding Chit-Chat to Enhance Task-Oriented Dialogues
Overview
ACCENTOR consists of human-annotated chit-chat additions to the 23.8K dialogues from Schema Guided Dialogue (SGD) and MultiWOZ 2.1, allowing researchers to study the contextual addition of chit-chat utterances for virtual assistants in order to make task-oriented dialogues more engaging and social.
We also provide three new models for ACCENTOR explicitly trained to predict user goals and to generate contextually relevant chit-chat responses.
Automatic and human evaluations show that, compared with the state-of-the-art task-oriented baseline, our models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.
For more details, please refer to the paper listed in the Citations section below.
Data
- `v1.0/candidates-{sgd,multiwoz}.json`: Annotated chit-chat candidates. The format is as follows (a short loading sketch appears at the end of this section).
```
{
  "dialogue 1 / id": [
    [
      dialogue 1 / candidate 1 / turn id,
      dialogue 1 / candidate 1 / position,
      dialogue 1 / candidate 1 / candidate,
      dialogue 1 / candidate 1 / label,
      dialogue 1 / candidate 1 / justification
    ],
    [
      dialogue 1 / candidate 2 / turn id,
      ...
    ],
    ...
  ],
  "dialogue 2 / id": [
    ...
  ],
  ...
}
```
- Folder `v1.0/accentor-sgd`: The augmented SGD dataset. The format follows the original SGD dataset, with two additional keys (i.e., `beginning` and `end`) that store lists of `(candidate, label, justification)` tuples.
  - The folder is generated by `v1.0/accentor-sgd.py` (with `v1.0/candidates-sgd.json` and the original SGD dataset as input). Usage: `python3 v1.0/accentor-sgd.py --help`.
- `v1.0/accentor-multiwoz-1k.json`: 1K augmented MultiWOZ 2.1 dialogues. The format follows the original MultiWOZ dataset, with two additional keys (i.e., `beginning` and `end`) that store lists of `(candidate, label, justification)` tuples.
  - The file is generated by `v1.0/accentor-multiwoz.py` (with `v1.0/candidates-multiwoz.json` and the original MultiWOZ 2.1 dataset as input). Usage: `python3 v1.0/accentor-multiwoz.py --help`.
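For a quick sanity check of both formats, here is a minimal loading sketch (not part of the released scripts). The file name `dialogues_001.json` and the assumption that the `beginning`/`end` lists sit on individual turns are inferred from the descriptions above, not guaranteed by them; adjust the paths to your local copy of the data.

```python
import json

# 1) Annotated chit-chat candidates: a dict mapping dialogue ids to lists of
#    [turn id, position, candidate, label, justification] entries.
with open("v1.0/candidates-sgd.json") as f:
    candidates = json.load(f)

dialogue_id, dialogue_candidates = next(iter(candidates.items()))
for turn_id, position, candidate, label, justification in dialogue_candidates[:3]:
    print(dialogue_id, turn_id, position, label, candidate)

# 2) Augmented SGD: same layout as the original SGD files (a list of dialogues,
#    each with a "turns" list). We assume here that the extra "beginning"/"end"
#    lists are attached to turns; the file name below is hypothetical.
with open("v1.0/accentor-sgd/train/dialogues_001.json") as f:
    dialogues = json.load(f)

for turn in dialogues[0]["turns"]:
    for position in ("beginning", "end"):
        for candidate, label, justification in turn.get(position, []):
            print(dialogues[0]["dialogue_id"], position, label, candidate)
```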
Baseline Models
Preparation
- Dependencies: ParlAI (af12799a) and Transformers (2.11.0)
- Run the following commands to prepare the data for model training and the off-the-shelf models (i.e., a task-oriented dialogue model and a chit-chat model) for Arranger and Rewriter.
```
cp -r ./v1.0/accentor-sgd .
python3 gen_delex.py
python3 gen_parlai_data.py
parlai train_model -t fromfile:parlaiformat --fromfile_datapath ./parlai --fromfile-datatype-extension true -m transformer/generator --init-model zoo:tutorial_transformer_generator/model --dict-file zoo:tutorial_transformer_generator/model.dict --embedding-size 512 --n-layers 8 --ffn-size 2048 --dropout 0.1 --n-heads 16 --learn-positional-embeddings True --n-positions 512 --variant xlm --activation gelu --skip-generation True --fp16 True --text-truncate 512 --label-truncate 128 --dict-tokenizer bpe --dict-lower True -lr 1e-06 --optimizer adamax --lr-scheduler reduceonplateau --gradient-clip 0.1 -veps 0.25 --betas 0.9,0.999 --update-freq 1 --attention-dropout 0.0 --relu-dropout 0.0 --skip-generation True -vp 15 -stim 60 -vme 20000 -bs 16 -vmt ppl -vmm min --save-after-valid True --model-file ./train_90M
parlai interactive -mf ./train_90M < lm.input.dev.cc.txt > lm.output.dev.cc.txt
parlai interactive -mf ./train_90M < lm.input.test.cc.txt > lm.output.test.cc.txt
python3 run_language_modeling.py --output_dir=output_gpt2_10epoch_1e-3_fp16 --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=lm.input.train.txt --do_eval --eval_data_file=lm.input.dev.txt --per_device_train_batch_size 2 --gradient_accumulation_steps 18 --num_train_epochs 10 --learning_rate 1e-3 --fp16 --overwrite_output_dir
python3 run_generation.py --input lm.input.dev.eval.txt --output dev.inference.gpt2_10epoch_1e-3_fp16.json --model_name_or_path ./output_gpt2_10epoch_1e-3_fp16 --eos_token_id 50262
python3 run_generation.py --input lm.input.test.eval.txt --output test.inference.gpt2_10epoch_1e-3_fp16.json --model_name_or_path ./output_gpt2_10epoch_1e-3_fp16 --eos_token_id 50262
```
SimpleTOD+
- Dependency: Transformers (2.11.0)
```
python3 run_language_modeling.py --output_dir=output_both_gpt2_10epoch_1e-3_fp16 --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=lm.input.train.both.txt --do_eval --eval_data_file=lm.input.dev.both.txt --per_device_train_batch_size 2 --gradient_accumulation_steps 18 --num_train_epochs 10 --learning_rate 1e-3 --fp16 --overwrite_output_dir
python3 run_generation.py --input lm.input.dev.eval.txt --output dev.inference.both_gpt2_10epoch_1e-3_fp16.json --model_name_or_path ./output_both_gpt2_10epoch_1e-3_fp16 --eos_token_id 50262
python3 run_generation.py --input lm.input.test.eval.txt --output test.inference.both_gpt2_10epoch_1e-3_fp16.json --model_name_or_path ./output_both_gpt2_10epoch_1e-3_fp16 --eos_token_id 50262
```
Arranger
- Dependency: Transformers (2.2.0)
```
python3 gen_arranger_input.py
python3 run_multiple_choice.py --model_type roberta --task_name acc --model_name_or_path roberta-base --do_train --do_eval --do_test --do_lower_case --data_dir . --learning_rate 2e-5 --num_train_epochs 3 --max_seq_length 512 --output_dir acc_arranger_roberta_base_3epoch --per_gpu_eval_batch_size=16 --per_gpu_train_batch_size=1 --gradient_accumulation_steps 24 --overwrite_output --save_steps 10000
python3 gen_arranger_output.py
```
Rewriter
- Dependency: Transformers (2.11.0)
```
python3 gen_rewriter_data.py
python3 run_language_modeling.py --output_dir=output_ff_gpt2_10epoch_1e-3_fp16 --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=lm.input.train.ff.txt --do_eval --eval_data_file=lm.input.dev.ff.txt --per_device_train_batch_size 2 --gradient_accumulation_steps 18 --num_train_epochs 10 --learning_rate 1e-3 --fp16 --overwrite_output_dir
python3 run_generation.py --input lm.input.dev.eval.ff.txt --output dev.inference.ff_gpt2_10epoch_1e-3_fp16.json --model_name_or_path ./output_ff_gpt2_10epoch_1e-3_fp16 --eos_token_id 50262
python3 run_generation.py --input lm.input.test.eval.ff.txt --output test.inference.ff_gpt2_10epoch_1e-3_fp16.json --model_name_or_path ./output_ff_gpt2_10epoch_1e-3_fp16 --eos_token_id 50262
```
Evaluation
- Dependency: the official evaluation script of SGD
- Pass the output inference files (i.e., `{dev,test}.inference*.json`) to `gen_predict.py` to obtain act-slot F1 and BLEU-4 scores. For example,
```
python3 gen_predict.py --inference test.inference.both_gpt2_10epoch_1e-3_fp16.json --split test
```
- The above command will also generate a folder (named `./prediction/` by default), which can be passed to the official evaluation script of SGD to obtain the joint goal accuracy and average accuracy, as sketched below. For example,
```
python3 -m schema_guided_dst.evaluate --dstc8_data_dir ./simpletod/ --prediction_dir ./prediction/test/ --eval_set test --output_metric_file simpletod+_test_result.json
```
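The metric file written by the SGD evaluation script can then be inspected directly. The snippet below is only a sketch: the aggregate entry and metric names (`#ALL_SERVICES`, `joint_goal_accuracy`, `average_goal_accuracy`) are assumptions based on the DSTC8 SGD evaluator, so check the actual file if your version differs.

```python
import json

# Read the metric file produced by schema_guided_dst.evaluate.
# Key names below are assumed from the DSTC8 evaluator, not documented here.
with open("simpletod+_test_result.json") as f:
    metrics = json.load(f)

overall = metrics.get("#ALL_SERVICES", {})
print("joint goal accuracy:", overall.get("joint_goal_accuracy"))
print("average (goal) accuracy:", overall.get("average_goal_accuracy"))
```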
Citations
If you want to publish experimental results with our datasets or use the baseline models, please cite the following paper:
```
@inproceedings{sun2020adding,
  title={Adding Chit-Chat to Enhance Task-Oriented Dialogues},
  author={Sun, Kai and Moon, Seungwhan and Crook, Paul and Roller, Stephen and Silvert, Becka and Liu, Bing and Wang, Zhiguang and Liu, Honglei and Cho, Eunjoon and Cardie, Claire},
  booktitle={Proceedings of the NAACL-HLT},
  year={2021},
  url={https://arxiv.org/abs/2010.12757}
}
```
License
ACCENTOR is released under CC-BY-SA-4.0; see the LICENSE file for details.