COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

Ana Brassard

Last update: Jul 31, 2022

Related tags

Deep Learning copa-sse

Overview

COPA-SSE

Repository for COPA-SSE: Semi-Structured Explanations for Commonsense Reasoning.

COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset, a variant of the Choice of Plausible Alternatives (COPA) benchmark. The explanations are formatted as a set of triple-like common sense statements with ConceptNet relations but freely written concepts.

Data format

dev-explained.jsonl and test-explained.jsonl each contain Balanced COPA samples with added explanations in .jsonl format. The question ids match the original questions of the development and test set, respectively.

Each entry contains:

the original question (matching format and ids)
human-explanations: a list of explanations each containing:
- expl-id: the explanation id
- text: the explanation in plain text (full sentences)
- worker-id: anonymized worker id (the author of the explanation)
- worker-avg: the average score the author got for their explanations
- all-ratings: all collected ratings for the explanation
- filtered-ratings: ratings excluding those that failed the control
- triples: the triple-form explanation (a list of ConceptNet-like triples)

Example entry:

id: 1, 
asks-for: cause, 
most-plausible-alternative: 1,
p: "My body cast a shadow over the grass.", 
a1: "The sun was rising.", 
a2: "The grass was cut.", 
human-explanations: [
    {expl-id: f4d9b407-681b-4340-9be1-ac044f1c2230, 
     text: "Sunrise causes casted shadows.", 
     worker-id: 3a71407b-9431-49f9-b3ca-1641f7c05f3b, 
     worker-avg: 3.5832864694635025, 
     all-ratings: [1, 3, 3, 4, 3], 
     filtered-ratings: [3, 3, 4, 3], 
     filtered-avg-rating: 3.25, 
     triples: [["sunrise", "Causes", "casted shadows"]]
     }, ...]

Aggregated versions

graphs.pkl contains aggregated versions of the triples for each question in a dictionary format with COPA question ids as the key.

Each entry contains a list of edges, each being a tuple of (u, v, {'rel': relation, 'weight': weight}). Similar nodes were connected or merged with relatedto, depending on the cosine similarity between their SentenceTransformer embeddings. The weight is the average score of the explanation the edge originated from (summed if multiple), or 1.0 if the edge was automatically generated.

Note: not all graphs are (weakly) connected.

Example entry:

1: [('sunrise', 'casted_shadows', {'rel': 'causes', 'weight': 3.25}),
  ('sunrise', 'sun', {'rel': 'relatedto', 'weight': 1.0}),
  ('casted_shadows', 'the_shadow', {'rel': 'relatedto', 'weight': 1.0}),
  ('sun_rising', 'bringing_light', {'rel': 'hasproperty', 'weight': 4.25}),
  ('sun_rising', 'a_sun_raising', {'rel': 'relatedto', 'weight': 1.0}),
 ...
]

Citation

Thank you for your interest in our dataset! If you use it in your research, please cite:

@misc{brassard2022copasse,
    title={COPA-SSE: Semi-structured Explanations for Commonsense Reasoning},
    author={Ana Brassard and Benjamin Heinzerling and Pride Kavumba and Kentaro Inui},
    year={2022},
    eprint={2201.06777},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

The Malware Open-source Threat Intelligence Family dataset contains 3,095 disarmed PE malware samples from 454 families

MOTIF Dataset The Malware Open-source Threat Intelligence Family (MOTIF) dataset contains 3,095 disarmed PE malware samples from 454 families, labeled

112 Dec 13, 2022

Official Implementation and Dataset of "PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency", CVPR 2021

Portrait Photo Retouching with PPR10K Paper | Supplementary Material PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask an

184 Dec 11, 2022

This is the dataset and code release of the OpenRooms Dataset.

95 Jan 8, 2023

A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

Larger Google Sat2Map dataset This dataset extends the aerial ⟷ Maps dataset used in pix2pix (Isola et al., CVPR17). The provide script download_sat2m

34 Dec 28, 2022

This is the official repo for TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations at CVPR'21. According to some product reasons, we are not planning to release the training/testing codes and models. However, we will release the dataset and the scripts to prepare the dataset.

TransFill-Reference-Inpainting This is the official repo for TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transf

80 Dec 8, 2022

Dataset used in "PlantDoc: A Dataset for Visual Plant Disease Detection" accepted in CODS-COMAD 2020

PlantDoc: A Dataset for Visual Plant Disease Detection This repository contains the Cropped-PlantDoc dataset used for benchmarking classification mode

109 Dec 29, 2022

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

MADE (Multi-Adapter Dataset Experts) This repository contains the implementation of MADE (Multi-adapter dataset experts), which is described in the pa

68 Jul 18, 2022

39 Oct 5, 2021

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation (NeurIPS2021 Benchmark and Dataset Track)

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation by Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, and Yanfei Zh

174 Dec 22, 2022

COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

Related tags

Overview

COPA-SSE

Data format

Example entry:

Aggregated versions

Example entry:

Citation

You might also like...

The Malware Open-source Threat Intelligence Family dataset contains 3,095 disarmed PE malware samples from 454 families

Official Implementation and Dataset of "PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency", CVPR 2021

This is the dataset and code release of the OpenRooms Dataset.

A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

Dataset used in "PlantDoc: A Dataset for Visual Plant Disease Detection" accepted in CODS-COMAD 2020

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation (NeurIPS2021 Benchmark and Dataset Track)

Owner

Ana Brassard

Official Pytorch implementation of "Unbiased Classification Through Bias-Contrastive and Bias-Balanced Learning (NeurIPS 2021)

BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalanced Tongue Data

audioLIME: Listenable Explanations Using Source Separation

Code for paper: Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

PyTorch implementation of Interpretable Explanations of Black Boxes by Meaningful Perturbation

Collection of NLP model explanations and accompanying analysis tools

📦 PyTorch based visualization package for generating layer-wise explanations for CNNs.

An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testingAn image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

This repository contains the code for using the H3DS dataset introduced in H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.