# Task-Oriented Dialogue as Dataflow Synthesis
This repository contains tools and instructions for reproducing the experiments in the paper [Task-Oriented Dialogue as Dataflow Synthesis](https://doi.org/10.1162/tacl_a_00333) (TACL 2020). If you use any source code or data included in this toolkit in your work, please cite the following paper.
```bibtex
@article{SMDataflow2020,
    author = {{Semantic Machines} and Andreas, Jacob and Bufe, John and Burkett, David and Chen, Charles and Clausman, Josh and Crawford, Jean and Crim, Kate and DeLoach, Jordan and Dorner, Leah and Eisner, Jason and Fang, Hao and Guo, Alan and Hall, David and Hayes, Kristin and Hill, Kellie and Ho, Diana and Iwaszuk, Wendy and Jha, Smriti and Klein, Dan and Krishnamurthy, Jayant and Lanman, Theo and Liang, Percy and Lin, Christopher H. and Lintsbakh, Ilya and McGovern, Andy and Nisnevich, Aleksandr and Pauls, Adam and Petters, Dmitrij and Read, Brent and Roth, Dan and Roy, Subhro and Rusak, Jesse and Short, Beth and Slomin, Div and Snyder, Ben and Striplin, Stephon and Su, Yu and Tellman, Zachary and Thomson, Sam and Vorobev, Andrei and Witoszko, Izabela and Wolfe, Jason and Wray, Abby and Zhang, Yuchen and Zotov, Alexander},
    title = {Task-Oriented Dialogue as Dataflow Synthesis},
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {8},
    pages = {556--571},
    year = {2020},
    month = sep,
    url = {https://doi.org/10.1162/tacl_a_00333},
    abstract = {We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people. Experiments show that dataflow graphs and metacomputation substantially improve representability and predictability in these natural dialogues. Additional experiments on the MultiWOZ dataset show that our dataflow representation enables an otherwise off-the-shelf sequence-to-sequence model to match the best existing task-specific state tracking model. The SMCalFlow dataset, code for replicating experiments, and a public leaderboard are available at \url{https://www.microsoft.com/en-us/research/project/dataflow-based-dialogue-semantic-machines}.},
}
```
## Understand SMCalFlow Programs
Please read this document to understand the syntax of SMCalFlow programs, and read this document to understand their semantics.
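For a quick feel of what these programs look like, here is a schematic two-turn example. The function names are invented placeholders for illustration only (the actual SMCalFlow library is much richer; see the documents above), but the `refer` metacomputation pattern is the one described in the paper:

```
;; Turn 1 -- user: "What time is my meeting with Adam?"
(Yield (Event.start (FindEvent (WithAttendee "Adam"))))

;; Turn 2 -- user: "And where is it?"
;; `refer` is a metacomputation operator: it retrieves the event node
;; produced in turn 1 from the dataflow graph instead of recomputing it.
(Yield (Event.location (refer (Event))))
```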
## Install
```bash
# (Recommended) Create a virtual environment
virtualenv --python=python3 env
source env/bin/activate

# Install the sm-dataflow package and its core dependencies
pip install git+https://github.com/microsoft/task_oriented_dialogue_as_dataflow_synthesis.git

# Download the spaCy model for tokenization
python -m spacy download en_core_web_md-2.2.0 --direct

# Install OpenNMT-py and PyTorch for training and running the models
pip install OpenNMT-py==1.0.0 torch==1.4.0
```
- Our experiments used OpenNMT-py 1.0.0 with PyTorch 1.4.0; other versions have not been tested. You can skip these two packages if you don't need to train or run the models.
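As a quick sanity check that the installation succeeded (assuming the package installs under the `dataflow` namespace, which the commands below rely on):

```bash
# should exit silently if the package and the spaCy model are both usable
python -c "import dataflow; import spacy; spacy.load('en_core_web_md')"
```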
## SMCalFlow Experiments
Follow the steps below to reproduce the results reported in the paper (Table 2).
NOTE: For consistency, we highly recommend following the leaderboard instructions when reporting your results. If you use your own evaluation script, please pay attention to the notes in Step 2 and Step 7 below.
1. Download and unzip the SMCalFlow 1.0 dataset.

    ```bash
    dataflow_dialogues_dir="output/dataflow_dialogues"
    mkdir -p "${dataflow_dialogues_dir}"
    cd "${dataflow_dialogues_dir}"
    # Download the dataset `smcalflow.full.data.tgz` or `smcalflow.inlined.data.tgz`.
    # `PATH_TO_DATA_TGZ` is the path to the tgz file of the corresponding dataset.
    tar -xvzf PATH_TO_DATA_TGZ
    ```
    - SMCalFlow 1.0 links
    - SMCalFlow 2.0 can be found under the datasets folder.
    - The dataset is distributed under the CC BY-SA 4.0 license.
2. Compute data statistics:

    ```bash
    dataflow_dialogues_stats_dir="output/dataflow_dialogues_stats"
    mkdir -p "${dataflow_dialogues_stats_dir}"
    python -m dataflow.analysis.compute_data_statistics \
        --dataflow_dialogues_dir ${dataflow_dialogues_dir} \
        --subset train valid \
        --outdir ${dataflow_dialogues_stats_dir}
    ```
    - Basic statistics:

      |       | num_dialogues | num_turns | num_kept_turns | num_skipped_turns | num_refer_turns | num_revise_turns |
      |-------|---------------|-----------|----------------|-------------------|-----------------|------------------|
      | train | 32,647        | 133,821   | 121,200        | 12,621            | 33,011          | 9,315            |
      | valid | 3,649         | 14,757    | 13,499         | 1,258             | 3,544           | 1,052            |
      | test  | 5,211         | 22,012    | 21,224         | 788               | 8,965           | 3,315            |
      | all   | 41,517        | 170,590   | 155,923        | 14,667            | 45,520          | 13,682           |

    - We currently do not release the test set, but we report its data statistics here.
    - NOTE: There are a small number of turns (`num_skipped_turns` in the table) whose sole purpose is to establish dialogue context; they should not be directly trained or tested on. The dataset statistics reported in the paper are based on non-skipped turns only (a quick way to tally these is sketched below).
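    To count kept vs. skipped turns yourself, assuming each dialogue line in the `.dataflow_dialogues.jsonl` files has a `turns` array whose items carry a boolean `skip` field (field names inferred from the statistics above):

    ```bash
    # tally skipped and kept turns in the training split
    jq -s '[.[].turns[].skip]
           | {skipped: (map(select(.)) | length), kept: (map(select(. == false)) | length)}' \
        ${dataflow_dialogues_dir}/train.dataflow_dialogues.jsonl
    ```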
3. Prepare text data for the OpenNMT toolkit.

    ```bash
    onmt_text_data_dir="output/onmt_text_data"
    mkdir -p "${onmt_text_data_dir}"
    for subset in "train" "valid"; do
        python -m dataflow.onmt_helpers.create_onmt_text_data \
            --dialogues_jsonl ${dataflow_dialogues_dir}/${subset}.dataflow_dialogues.jsonl \
            --num_context_turns 2 \
            --include_program \
            --include_described_entities \
            --onmt_text_data_outbase ${onmt_text_data_dir}/${subset}
    done
    ```
    - We use `--include_program` to add the gold program of the context turns.
    - We use `--include_described_entities` to add the entities (e.g., `entity@123456`) described in the generation outcome for the context turns. These entities can appear in the "inlined" programs for the current turn, and thus we include them in the source sequence so that the seq2seq model can produce such tokens via a copy mechanism.
    - You can vary the number of context turns by changing `--num_context_turns`.
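    To sanity-check the conversion, you can peek at one aligned source/target pair (these are the files consumed by `onmt_preprocess` in Step 5):

    ```bash
    head -n 1 ${onmt_text_data_dir}/train.src_tok
    head -n 1 ${onmt_text_data_dir}/train.tgt
    ```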
4. Compute statistics for the created OpenNMT text data.

    ```bash
    onmt_data_stats_dir="output/onmt_data_stats"
    mkdir -p "${onmt_data_stats_dir}"
    python -m dataflow.onmt_helpers.compute_onmt_data_stats \
        --text_data_dir ${onmt_text_data_dir} \
        --suffix src src_tok tgt \
        --subset train valid \
        --outdir ${onmt_data_stats_dir}
    ```
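    The `*.ntokens_stats.json` files written here drive the sequence-length limits in Steps 5 and 6. Judging by the `jq '."100"'` lookups there, the keys are percentiles, so `"100"` selects the maximum observed length (an assumption based on that usage):

    ```bash
    # inspect the token-length percentiles for the target side
    jq . ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json
    ```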
5. Train OpenNMT models. You can also skip this step and instead download the trained model from the table below.

    ```bash
    onmt_binarized_data_dir="output/onmt_binarized_data"
    mkdir -p "${onmt_binarized_data_dir}"

    src_tok_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.src_tok.ntokens_stats.json)
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)

    # create OpenNMT binarized data
    onmt_preprocess \
        --dynamic_dict \
        --train_src ${onmt_text_data_dir}/train.src_tok \
        --train_tgt ${onmt_text_data_dir}/train.tgt \
        --valid_src ${onmt_text_data_dir}/valid.src_tok \
        --valid_tgt ${onmt_text_data_dir}/valid.tgt \
        --src_seq_length ${src_tok_max_ntokens} \
        --tgt_seq_length ${tgt_max_ntokens} \
        --src_words_min_frequency 0 \
        --tgt_words_min_frequency 0 \
        --save_data ${onmt_binarized_data_dir}/data

    # extract pretrained GloVe 840B embeddings (https://nlp.stanford.edu/projects/glove/)
    glove_840b_dir="output/glove_840b"
    mkdir -p "${glove_840b_dir}"
    wget -O ${glove_840b_dir}/glove.840B.300d.zip http://nlp.stanford.edu/data/glove.840B.300d.zip
    unzip ${glove_840b_dir}/glove.840B.300d.zip -d ${glove_840b_dir}

    onmt_embeddings_dir="output/onmt_embeddings"
    mkdir -p "${onmt_embeddings_dir}"
    python -m dataflow.onmt_helpers.embeddings_to_torch \
        -emb_file_both ${glove_840b_dir}/glove.840B.300d.txt \
        -dict_file ${onmt_binarized_data_dir}/data.vocab.pt \
        -output_file ${onmt_embeddings_dir}/embeddings

    # train OpenNMT models
    onmt_models_dir="output/onmt_models"
    mkdir -p "${onmt_models_dir}"
    batch_size=64
    train_num_datapoints=$(jq '.train' ${onmt_data_stats_dir}/nexamples.json)
    # validate approximately at each epoch
    valid_steps=$(python3 -c "from math import ceil; print(ceil(${train_num_datapoints}/${batch_size}))")
    onmt_train \
        --encoder_type brnn \
        --decoder_type rnn \
        --rnn_type LSTM \
        --global_attention general \
        --global_attention_function softmax \
        --generator_function softmax \
        --copy_attn_type general \
        --copy_attn \
        --seed 1 \
        --optim adam \
        --learning_rate 0.001 \
        --early_stopping 2 \
        --batch_size ${batch_size} \
        --valid_batch_size 8 \
        --valid_steps ${valid_steps} \
        --save_checkpoint_steps ${valid_steps} \
        --data ${onmt_binarized_data_dir}/data \
        --pre_word_vecs_enc ${onmt_embeddings_dir}/embeddings.enc.pt \
        --pre_word_vecs_dec ${onmt_embeddings_dir}/embeddings.dec.pt \
        --word_vec_size 300 \
        --attention_dropout 0 \
        --dropout 0.5 \
        --layers ??? \
        --rnn_size ??? \
        --gpu_ranks 0 \
        --world_size 1 \
        --save_model ${onmt_models_dir}/checkpoint
    ```

    - Replace `???` with the `--layers` and `--rnn_size` values for your experiment.
6. Make predictions using a trained OpenNMT model. You need to replace `checkpoint_last.pt` in the following script with the final model you get from the previous step (one way to locate it is sketched after the script).

    ```bash
    onmt_translate_outdir="output/onmt_translate_output"
    mkdir -p "${onmt_translate_outdir}"

    onmt_model_pt="${onmt_models_dir}/checkpoint_last.pt"
    nbest=5
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)

    # predict programs using a trained OpenNMT model
    onmt_translate \
        --model ${onmt_model_pt} \
        --max_length ${tgt_max_ntokens} \
        --src ${onmt_text_data_dir}/valid.src_tok \
        --replace_unk \
        --n_best ${nbest} \
        --batch_size 8 \
        --beam_size 10 \
        --gpu 0 \
        --report_time \
        --output ${onmt_translate_outdir}/valid.nbest
    ```
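    OpenNMT-py 1.0 names checkpoints `<save_model>_step_<N>.pt`, so with the `--save_model` setting above the files look like `checkpoint_step_<N>.pt`. A minimal sketch for picking the most recent one, assuming that default naming scheme:

    ```bash
    # pick the checkpoint with the highest step number (version sort on <N>)
    onmt_model_pt=$(ls ${onmt_models_dir}/checkpoint_step_*.pt | sort -V | tail -n 1)
    ```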
7. Compute the exact-match accuracy (taking into account whether `program_execution_oracle.refer_are_correct` is `true`).

    ```bash
    evaluation_outdir="output/evaluation_output"
    mkdir -p "${evaluation_outdir}"

    # create the prediction report
    python -m dataflow.onmt_helpers.create_onmt_prediction_report \
        --dialogues_jsonl ${dataflow_dialogues_dir}/valid.dataflow_dialogues.jsonl \
        --datum_id_jsonl ${onmt_text_data_dir}/valid.datum_id \
        --src_txt ${onmt_text_data_dir}/valid.src_tok \
        --ref_txt ${onmt_text_data_dir}/valid.tgt \
        --nbest_txt ${onmt_translate_outdir}/valid.nbest \
        --nbest ${nbest} \
        --outbase ${evaluation_outdir}/valid

    # evaluate the predictions (all turns)
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/valid.prediction_report.tsv \
        --scores_json ${evaluation_outdir}/valid.all.scores.json

    # evaluate the predictions (refer turns)
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/valid.prediction_report.tsv \
        --datum_ids_json ${dataflow_dialogues_stats_dir}/valid.refer_turn_ids.jsonl \
        --scores_json ${evaluation_outdir}/valid.refer_turns.scores.json

    # evaluate the predictions (revise turns)
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/valid.prediction_report.tsv \
        --datum_ids_json ${dataflow_dialogues_stats_dir}/valid.revise_turn_ids.jsonl \
        --scores_json ${evaluation_outdir}/valid.revise_turns.scores.json
    ```
    - NOTE: The numbers reported using the scripts above should match those reported in Table 2 in the paper. The leaderboard uses a slightly different evaluation script that canonicalizes both the gold and predicted programs, and thus its accuracy is slightly higher (e.g., 0.665 vs. 0.668 on the test set). To obtain the leaderboard results, add `--use_leaderboard_metric` when running `python -m dataflow.onmt_helpers.create_onmt_prediction_report` to create the report.
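    The accuracies can be read off by pretty-printing the score files written above (no assumption is made here about their internal field names):

    ```bash
    jq . ${evaluation_outdir}/valid.all.scores.json
    jq . ${evaluation_outdir}/valid.refer_turns.scores.json
    jq . ${evaluation_outdir}/valid.revise_turns.scores.json
    ```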
8. Calculate the statistical significance for two different experiments.

    ```bash
    analysis_outdir="output/analysis_output"
    mkdir -p "${analysis_outdir}"
    python -m dataflow.analysis.calculate_statistical_significance \
        --exp0_prediction_report_tsv ${exp0_evaluation_outdir}/valid.prediction_report.tsv \
        --exp1_prediction_report_tsv ${exp1_evaluation_outdir}/valid.prediction_report.tsv \
        --scores_json ${analysis_outdir}/exp0_vs_exp1.valid.scores.json
    ```
    - The `exp0_evaluation_outdir` and `exp1_evaluation_outdir` are the `evaluation_outdir` in Step 7 for the corresponding experiments.
    - You can also provide `--datum_ids_jsonl` to carry out the significance test on a subset of turns (see the sketch below).
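    For example, a sketch restricting the test to refer turns, reusing the id list that Step 2 wrote:

    ```bash
    # significance test on refer turns only
    python -m dataflow.analysis.calculate_statistical_significance \
        --exp0_prediction_report_tsv ${exp0_evaluation_outdir}/valid.prediction_report.tsv \
        --exp1_prediction_report_tsv ${exp1_evaluation_outdir}/valid.prediction_report.tsv \
        --datum_ids_jsonl ${dataflow_dialogues_stats_dir}/valid.refer_turn_ids.jsonl \
        --scores_json ${analysis_outdir}/exp0_vs_exp1.valid.refer_turns.scores.json
    ```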
## MultiWOZ Experiments
1. Download the MultiWOZ dataset and convert it to dataflow programs.

    ```bash
    # create TRADE-processed dialogues
    raw_trade_dialogues_dir="output/trade_dialogues"
    mkdir -p "${raw_trade_dialogues_dir}"
    python -m dataflow.multiwoz.trade_dst.create_data \
        --use_multiwoz_2_1 \
        --output_dir ${raw_trade_dialogues_dir}

    # patch TRADE dialogues
    patched_trade_dialogues_dir="output/patched_trade_dialogues"
    mkdir -p "${patched_trade_dialogues_dir}"
    for subset in "train" "dev" "test"; do
        python -m dataflow.multiwoz.patch_trade_dialogues \
            --trade_data_file ${raw_trade_dialogues_dir}/${subset}_dials.json \
            --outbase ${patched_trade_dialogues_dir}/${subset}
    done
    ln -sr ${patched_trade_dialogues_dir}/dev_dials.json ${patched_trade_dialogues_dir}/valid_dials.json

    # create dataflow programs
    dataflow_dialogues_dir="output/dataflow_dialogues"
    mkdir -p "${dataflow_dialogues_dir}"
    for subset in "train" "valid" "test"; do
        python -m dataflow.multiwoz.create_programs \
            --trade_data_file ${patched_trade_dialogues_dir}/${subset}_dials.json \
            --outbase ${dataflow_dialogues_dir}/${subset}
    done
    ```
    - To create programs that inline `refer` calls, add `--no_refer` when running the `dataflow.multiwoz.create_programs` command (see the example below).
    - To create programs that inline both `refer` and `revise` calls, add `--no_refer --no_revise`.
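    For instance, a sketch producing the fully inlined variant, writing to a separate (hypothetical) output directory so the default programs are not overwritten:

    ```bash
    inlined_dialogues_dir="output/dataflow_dialogues_inlined"
    mkdir -p "${inlined_dialogues_dir}"
    for subset in "train" "valid" "test"; do
        python -m dataflow.multiwoz.create_programs \
            --trade_data_file ${patched_trade_dialogues_dir}/${subset}_dials.json \
            --no_refer --no_revise \
            --outbase ${inlined_dialogues_dir}/${subset}
    done
    ```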
2. Prepare text data for the OpenNMT toolkit.

    ```bash
    onmt_text_data_dir="output/onmt_text_data"
    mkdir -p "${onmt_text_data_dir}"
    for subset in "train" "valid" "test"; do
        python -m dataflow.onmt_helpers.create_onmt_text_data \
            --dialogues_jsonl ${dataflow_dialogues_dir}/${subset}.dataflow_dialogues.jsonl \
            --num_context_turns 2 \
            --include_agent_utterance \
            --onmt_text_data_outbase ${onmt_text_data_dir}/${subset}
    done
    ```
    - We use `--include_agent_utterance` following the setup in TRADE (Wu et al., 2019).
    - You can vary the number of context turns by changing `--num_context_turns`.
3. Compute statistics for the created OpenNMT text data.

    ```bash
    onmt_data_stats_dir="output/onmt_data_stats"
    mkdir -p "${onmt_data_stats_dir}"
    python -m dataflow.onmt_helpers.compute_onmt_data_stats \
        --text_data_dir ${onmt_text_data_dir} \
        --suffix src src_tok tgt \
        --subset train valid test \
        --outdir ${onmt_data_stats_dir}
    ```
4. Train OpenNMT models. You can also skip this step and instead download the trained models from the table below.

    ```bash
    onmt_binarized_data_dir="output/onmt_binarized_data"
    mkdir -p "${onmt_binarized_data_dir}"

    # create OpenNMT binarized data
    src_tok_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.src_tok.ntokens_stats.json)
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)
    onmt_preprocess \
        --dynamic_dict \
        --train_src ${onmt_text_data_dir}/train.src_tok \
        --train_tgt ${onmt_text_data_dir}/train.tgt \
        --valid_src ${onmt_text_data_dir}/valid.src_tok \
        --valid_tgt ${onmt_text_data_dir}/valid.tgt \
        --src_seq_length ${src_tok_max_ntokens} \
        --tgt_seq_length ${tgt_max_ntokens} \
        --src_words_min_frequency 0 \
        --tgt_words_min_frequency 0 \
        --save_data ${onmt_binarized_data_dir}/data

    # extract pretrained GloVe 6B embeddings
    glove_6b_dir="output/glove_6b"
    mkdir -p "${glove_6b_dir}"
    wget -O ${glove_6b_dir}/glove.6B.zip http://nlp.stanford.edu/data/glove.6B.zip
    unzip ${glove_6b_dir}/glove.6B.zip -d ${glove_6b_dir}

    onmt_embeddings_dir="output/onmt_embeddings"
    mkdir -p "${onmt_embeddings_dir}"
    python -m dataflow.onmt_helpers.embeddings_to_torch \
        -emb_file_both ${glove_6b_dir}/glove.6B.300d.txt \
        -dict_file ${onmt_binarized_data_dir}/data.vocab.pt \
        -output_file ${onmt_embeddings_dir}/embeddings

    # train OpenNMT models
    onmt_models_dir="output/onmt_models"
    mkdir -p "${onmt_models_dir}"
    batch_size=64
    train_num_datapoints=$(jq '.train' ${onmt_data_stats_dir}/nexamples.json)
    # approximately validate at each epoch
    valid_steps=$(python3 -c "from math import ceil; print(ceil(${train_num_datapoints}/${batch_size}))")
    onmt_train \
        --encoder_type brnn \
        --decoder_type rnn \
        --rnn_type LSTM \
        --global_attention general \
        --global_attention_function softmax \
        --generator_function softmax \
        --copy_attn_type general \
        --copy_attn \
        --seed 1 \
        --optim adam \
        --learning_rate 0.001 \
        --early_stopping 2 \
        --batch_size ${batch_size} \
        --valid_batch_size 8 \
        --valid_steps ${valid_steps} \
        --save_checkpoint_steps ${valid_steps} \
        --data ${onmt_binarized_data_dir}/data \
        --pre_word_vecs_enc ${onmt_embeddings_dir}/embeddings.enc.pt \
        --pre_word_vecs_dec ${onmt_embeddings_dir}/embeddings.dec.pt \
        --word_vec_size 300 \
        --attention_dropout 0 \
        --dropout ??? \
        --layers ??? \
        --rnn_size ??? \
        --gpu_ranks 0 \
        --world_size 1 \
        --save_model ${onmt_models_dir}/checkpoint
    ```

    - Replace `???` with the `--dropout`, `--layers`, and `--rnn_size` values for your experiment.
5. Make predictions using a trained OpenNMT model. You need to replace `checkpoint_last.pt` in the following script with the actual model you get from the previous step.

    ```bash
    onmt_translate_outdir="output/onmt_translate_output"
    mkdir -p "${onmt_translate_outdir}"

    onmt_model_pt="${onmt_models_dir}/checkpoint_last.pt"
    nbest=5
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)

    # predict programs on the test set using a trained OpenNMT model
    onmt_translate \
        --model ${onmt_model_pt} \
        --max_length ${tgt_max_ntokens} \
        --src ${onmt_text_data_dir}/test.src_tok \
        --replace_unk \
        --n_best ${nbest} \
        --batch_size 8 \
        --beam_size 10 \
        --gpu 0 \
        --report_time \
        --output ${onmt_translate_outdir}/test.nbest
    ```
6. Compute the exact-match accuracy of the program predictions.

    ```bash
    evaluation_outdir="output/evaluation_output"
    mkdir -p "${evaluation_outdir}"

    # create the prediction report
    python -m dataflow.onmt_helpers.create_onmt_prediction_report \
        --dialogues_jsonl ${dataflow_dialogues_dir}/test.dataflow_dialogues.jsonl \
        --datum_id_jsonl ${onmt_text_data_dir}/test.datum_id \
        --src_txt ${onmt_text_data_dir}/test.src_tok \
        --ref_txt ${onmt_text_data_dir}/test.tgt \
        --nbest_txt ${onmt_translate_outdir}/test.nbest \
        --nbest ${nbest} \
        --outbase ${evaluation_outdir}/test

    # evaluate the predictions
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/test.prediction_report.tsv \
        --scores_json ${evaluation_outdir}/test.scores.json
    ```
7. Evaluate the belief state predictions.

    ```bash
    belief_state_tracker_eval_dir="output/belief_state_tracker_eval"
    mkdir -p "${belief_state_tracker_eval_dir}"

    # create the gold file from TRADE-preprocessed dialogues (after patching)
    python -m dataflow.multiwoz.create_belief_state_tracker_data \
        --trade_data_file ${patched_trade_dialogues_dir}/test_dials.json \
        --belief_state_tracker_data_file ${belief_state_tracker_eval_dir}/test.belief_state_tracker_data.jsonl

    # create the hypo file from predicted programs
    python -m dataflow.multiwoz.execute_programs \
        --dialogues_file ${evaluation_outdir}/test.dataflow_dialogues.jsonl \
        --cheating_mode never \
        --outbase ${belief_state_tracker_eval_dir}/test.hypo

    python -m dataflow.multiwoz.create_belief_state_prediction_report \
        --input_data_file ${belief_state_tracker_eval_dir}/test.hypo.execution_results.jsonl \
        --format dataflow \
        --remove_none \
        --gold_data_file ${belief_state_tracker_eval_dir}/test.belief_state_tracker_data.jsonl \
        --outbase ${belief_state_tracker_eval_dir}/test

    # evaluate belief state predictions
    python -m dataflow.multiwoz.evaluate_belief_state_predictions \
        --prediction_report_jsonl ${belief_state_tracker_eval_dir}/test.prediction_report.jsonl \
        --outbase ${belief_state_tracker_eval_dir}/test
    ```
    - The scores are reported in `${belief_state_tracker_eval_dir}/test.scores.json`.
8. Calculate the statistical significance for two different experiments.

    ```bash
    analysis_outdir="output/analysis_output"
    mkdir -p "${analysis_outdir}"
    python -m dataflow.analysis.calculate_statistical_significance \
        --exp0_prediction_report_tsv ${exp0_evaluation_outdir}/test.prediction_report.tsv \
        --exp1_prediction_report_tsv ${exp1_evaluation_outdir}/test.prediction_report.tsv \
        --scores_json ${analysis_outdir}/exp0_vs_exp1.test.scores.json
    ```
    - The `exp0_evaluation_outdir` and `exp1_evaluation_outdir` are the `belief_state_tracker_eval_dir` in Step 7 for the corresponding experiments.