Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing

Overview

SPLASH: Semantic Parsing with Language Assistance from Humans

SPLASH is a dataset for the task of semantic parse correction with natural language feedback in the context of text-to-SQL parsing.

The task, the dataset, and baseline results are presented in
Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback.
Ahmed Elgohary, Saghar Hosseini and Ahmed Hassan Awadallah.
ACL 2020.

Release

The files train.json, dev.json and test.json contain the training, development and test examples of SPLASH. We also release the 179 examples that are based on the EditSQL parser (see Section 6.3 of the paper for details); these examples are in editsql.json. SPLASH is distributed under the CC BY-SA 4.0 license.
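As a starting point, a split can be read with Python's standard `json` module. The sketch below is not part of the release: `load_splash` and `missing_fields` are hypothetical helper names, and it assumes each file holds a single JSON list of example objects. The field names come from the Format section of this README.

```python
import json
from pathlib import Path

# Field names as documented in the Format section of this README.
EXPECTED_FIELDS = {
    "db_id", "question", "predicted_parse", "predicted_parse_with_values",
    "predicted_parse_explanation", "feedback", "gold_parse", "beam",
}


def load_splash(path):
    """Load one split. Assumption: the file is a JSON list of example objects."""
    with Path(path).open(encoding="utf-8") as f:
        examples = json.load(f)
    if not isinstance(examples, list):
        raise ValueError(f"expected a JSON list in {path}")
    return examples


def missing_fields(example):
    """Return the documented fields that are absent from one example."""
    return sorted(EXPECTED_FIELDS.difference(example))
```

Reporting missing fields rather than raising an error is deliberate: this README does not say whether every file (for instance editsql.json) carries all eight fields, so it seems safer to inspect than to assume.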

Format

Each example contains the following fields:

db_id: Name of Spider database.

question: Question (Utterance) as provided in Spider.

predicted_parse: The SQL parse predicted by the relevant model.

predicted_parse_with_values: The predicted SQL with the values (anonymized in predicted_parse) inferred by a rule-based post-processor. Note that we still use Spider's evaluation measure, which ignores the values, but inferring values for the predicted parse is essential for generating meaningful explanations.

predicted_parse_explanation: The generated natural language explanation of the predicted SQL.

feedback: Collected natural language feedback.

gold_parse: The gold parse of the given question as provided in Spider.

beam: The top 20 predictions with corresponding scores produced by Seq2Struct beam search.

Please refer to the paper for more details.
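To make the relationship between predicted_parse and predicted_parse_with_values concrete, the sketch below goes in the opposite direction from the release: it masks literals in a SQL string with the placeholder `value`. This is only an illustration. `mask_values` is a hypothetical helper, the regular expression is a rough approximation, and the dataset's own rule-based post-processor (which infers values rather than masking them) is not reproduced here.

```python
import re

# Matches double- or single-quoted strings and standalone numbers.
# The \b before the digits keeps alias names such as T1 and T2 intact.
_LITERAL = re.compile(r'"[^"]*"|\'[^\']*\'|\b\d+(?:\.\d+)?\b')


def mask_values(sql):
    """Replace every quoted string or standalone number with 'value'."""
    return _LITERAL.sub("value", sql)
```

Applied to the predicted_parse_with_values string of the example in this README, this yields that example's predicted_parse string. Other queries (for instance ones with LIMIT clauses) may be anonymized differently in the dataset, so treat the pattern as an assumption.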

Example

    {
        "db_id": "csu_1",
        "question": "Which university is in Los Angeles county and opened after 1950?",
        "predicted_parse": "SELECT T1.Campus FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.County = value AND T1.Year > value AND T2.Year > value",
        "predicted_parse_with_values": "SELECT T1.Campus FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.County = \"Los Angeles\" AND T1.Year > 1950 AND T2.Year > 2002",
        "predicted_parse_explanation": [
            "Step 1: For each row in Campuses table, find the corresponding rows in faculty table",
            "Step 2: find Campuses's Campus of the results of step 1 whose County equals Los Angeles and Campuses's Year greater than 1950 and faculty's Year greater than 2002"
        ],
        "feedback": "In step 2 Remove faculty 's year greater than 2002\".",
        "gold_parse": "SELECT campus FROM campuses WHERE county  =  \"Los Angeles\" AND YEAR  >  1950",
        "beam": [
            [
                "SELECT T1.Campus FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.County = value AND T2.Year > value AND T2.Year > value",
                -1.5820374488830566
            ],
            [
                "SELECT T1.County FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.Campus = value AND T2.Year > value AND T2.Year > value",
                -2.0078020095825195
            ],
            ..
        ]
    }
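The beam field is a list of [SQL string, score] pairs. A minimal sketch for ordering and truncating such a list follows. It assumes that a higher score (closer to zero) means a more likely candidate, which is how the two entries in the example above appear to be ordered; `rank_beam` and `top_k` are hypothetical helper names.

```python
def rank_beam(beam):
    """Return the [sql, score] pairs sorted from highest to lowest score."""
    return sorted(beam, key=lambda pair: pair[1], reverse=True)


def top_k(beam, k):
    """Return the SQL strings of the k highest-scoring candidates."""
    return [sql for sql, _score in rank_beam(beam)[:k]]
```

For example, `top_k(example["beam"], 5)` would give the five best-scoring alternative parses for one loaded example, which could then be compared against gold_parse.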

Please contact Ahmed Elgohary <[email protected]> with any questions or feedback.

Citation

@inproceedings{Elgohary20Speak,
Title = {Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback},
Author = {Ahmed Elgohary and Saghar Hosseini and Ahmed Hassan Awadallah},
Year = {2020},
Booktitle = {Association for Computational Linguistics},
}

Owner

Microsoft Research - Language and Information Technologies (MSR LIT)
