💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

Heidelberg-NLP

Last update: Nov 7, 2022


Overview

VALSE 💃

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena. https://arxiv.org/abs/2112.07566

Data Instructions

Please find the data in the data folder. The dataset is in JSON format and contains the following relevant fields:

- A reference to the image in the original dataset: dataset and image_file.
- The valid sentence, i.e. the caption for VALSE: caption.
- The altered caption: foil.
- The annotators' votes (3 annotators per sample): mturk.
  - The subentry caption counts how many annotators chose the caption, and not the foil, as the text describing the image.
  - The subentry foil counts how many of the three annotators chose the foil as (also) describing the image.
  - For more information, see Section 4.4 and Appendix E of the paper.

‼️ Please be aware that the JSON files contain both valid (i.e., validated by annotators) and non-validated samples. To work only with the valid set, filter them as follows:

We consider a foil valid if at least two of the three annotators identified the caption, but not the foil, as the text that accurately describes the image.

This means that the valid samples of the dataset are the ones where sample["mturk"]["caption"] >= 2.
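The filtering rule can be sketched in Python as follows. This is a minimal sketch and not code from the repository. The function names is_valid, filter_valid and load_valid_samples are hypothetical. It assumes each JSON file maps sample ids to sample dictionaries, as in the example instance below.

```python
import json
from typing import Any, Dict


def is_valid(sample: Dict[str, Any]) -> bool:
    """True when at least 2 of the 3 annotators chose the caption, but not the foil."""
    return sample["mturk"]["caption"] >= 2


def filter_valid(samples: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep only the annotator-validated samples, preserving their ids."""
    return {sample_id: s for sample_id, s in samples.items() if is_valid(s)}


def load_valid_samples(path: str) -> Dict[str, Dict[str, Any]]:
    """Load one VALSE JSON file and return only its valid samples."""
    with open(path, encoding="utf-8") as f:
        samples = json.load(f)  # sample id -> sample dict
    return filter_valid(samples)
```

For example, load_valid_samples("data/<file>.json") returns the valid subset of one file, where the path is a placeholder for one of the files in the data folder.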

Example instance:

{
    "actions_test_0": {
        "dataset": "SWiG",
        "original_split": "test",                 # the split of the original dataset the sample belongs to
        "dataset_idx": "exercising_255.jpg",      # the sample id in the original dataset
        "linguistic_phenomena": "actions",        # the linguistic phenomenon targeted
        "image_file": "exercising_255.jpg",
        "caption": "A man exercises his torso.",
        "classes": "man",                         # the word of the caption that was replaced
        "classes_foil": "torso",                  # the foil word / phrase
        "mturk": {
            "foil": 0,
            "caption": 3,
            "other": 0
        },
        "foil": "A torso exercises for a man."
    }, ...
}
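To feed the samples to a model, one option is to turn each sample into an (image path, caption, foil) triple. This is a minimal sketch under two assumptions. First, caption and foil are single strings, as in the example instance above. Second, you keep each source dataset's images in its own local directory. The image_dirs mapping and the to_triples name are hypothetical.

```python
import os
from typing import Any, Dict, List, Tuple


def to_triples(
    samples: Dict[str, Dict[str, Any]], image_dirs: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Turn samples into (image path, caption, foil) triples.

    image_dirs is a hypothetical mapping from the `dataset` field
    (e.g. "SWiG") to the local folder holding that dataset's images.
    """
    triples = []
    for sample in samples.values():
        image_path = os.path.join(image_dirs[sample["dataset"]], sample["image_file"])
        triples.append((image_path, sample["caption"], sample["foil"]))
    return triples
```

With the example instance above and image_dirs = {"SWiG": "images/swig"}, the result is a single triple. That triple pairs the SWiG image exercising_255.jpg with "A man exercises his torso." and "A torso exercises for a man."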

Images

For the images, please follow the download instructions of the respective original datasets. The provenance of each image is given in the dataset field of the JSON files.
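Images must be fetched from each original dataset, so it can help to first check which source datasets a JSON file draws on. This small sketch uses only the dataset field. The helper name datasets_needed is hypothetical.

```python
from collections import Counter
from typing import Any, Dict


def datasets_needed(samples: Dict[str, Dict[str, Any]]) -> Counter:
    """Count how many samples come from each original dataset (the `dataset` field)."""
    return Counter(sample["dataset"] for sample in samples.values())
```

Calling datasets_needed(samples).most_common() lists the source datasets, most frequent first, together with their sample counts.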

Reference

Please cite our 💃 VALSE paper if you are using this dataset.

@misc{parcalabescu2021valse,
      title={VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena}, 
      author={Letitia Parcalabescu and Michele Cafagna and Lilitta Muradjan and Anette Frank and Iacer Calixto and Albert Gatt},
      year={2021},
      eprint={2112.07566},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


Comments

random.choice() on a sentence generates a character

    for foil_id, foil in tqdm(foils_data.items()):
        caption_fits = foil['mturk']['caption']

        if caption_fits >= 2:  # valid examples only (validated by mturkers)

            test_img_path = os.path.join(images_path, foil["image_file"])

            if instrument == 'plurals':
                test_sentences = [foil["caption"][0], random.choice(foil["foils"])]
            else:
                test_sentences = [foil["caption"], random.choice(foil["foils"])]

Why is random.choice() used here? It creates single-character sentences.

opened by BiophysNinja 3
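The reported problem comes from calling random.choice() on a plain string, which returns one random character. The sketch below is defensive and accepts either a single sentence or a list of sentences. It assumes, based only on the snippet in this issue, that the foils and caption fields may hold either form. The helper names pick_foil and pick_caption are hypothetical. The plurals branch, foil["caption"][0], has the same problem when caption is a string, so the second helper guards that case too.

```python
import random
from typing import List, Union


def pick_foil(foils: Union[str, List[str]], rng: random.Random) -> str:
    """Return one foil sentence.

    random.choice() on a str picks a single character, so a plain string
    is returned unchanged and only a list of foils is sampled from.
    """
    if isinstance(foils, str):
        return foils
    return rng.choice(foils)


def pick_caption(caption: Union[str, List[str]]) -> str:
    """Return the caption; if it is stored as a list, take its first entry."""
    return caption if isinstance(caption, str) else caption[0]
```

Passing an explicit random.Random(seed) keeps the foil selection reproducible across runs.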

Release the scoring scripts?

Hi, thanks a lot for this great work, which has inspired me a lot! Would it be possible to release the scoring scripts to reproduce the results in your paper (e.g., Table 2)?

Looking forward to your reply!

opened by Wangt-CN 3
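A common way to compare a caption with its foil is pairwise accuracy. A sample counts as correct when a model gives the caption a higher image-text score than the foil. The sketch below assumes you already have one such score per caption and per foil from some model. It is an illustration, not the authors' evaluation code, and pairwise_accuracy is a hypothetical name.

```python
from typing import Sequence


def pairwise_accuracy(caption_scores: Sequence[float], foil_scores: Sequence[float]) -> float:
    """Fraction of samples whose caption scores strictly higher than its foil.

    Ties count as errors.
    """
    if len(caption_scores) != len(foil_scores):
        raise ValueError("need exactly one foil score per caption score")
    if not caption_scores:
        raise ValueError("no scores given")
    correct = sum(c > f for c, f in zip(caption_scores, foil_scores))
    return correct / len(caption_scores)
```

Counting ties as errors means a model that gives every sentence the same score gets 0 rather than an inflated accuracy.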
Inconsistencies in data labels and sources
Thank you for sharing this fantastic dataset. Here are some minor issues that I have observed:

1. Foil examples with "dataset": "VisDial_v1.0" and "original_split": "test" actually use images from COCO_train_2014.
2. Two different keys are used to refer to COCO 2017 Validation: some samples use "coco2017_val" and others use "coco_2017_val".
opened by BiophysNinja 1
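Until the naming is unified, the two spellings can be mapped to one name when loading the data. This is a minimal sketch that assumes both strings appear as values of the dataset field. The DATASET_ALIASES table is hypothetical. Choosing "coco2017_val" as the canonical spelling is an arbitrary decision.

```python
from typing import Any, Dict

# Hypothetical alias table: spelling seen in the data -> chosen canonical name.
DATASET_ALIASES = {"coco_2017_val": "coco2017_val"}


def normalize_dataset_names(samples: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a copy of samples with aliased `dataset` values replaced by the canonical name."""
    normalized = {}
    for sample_id, sample in samples.items():
        fixed = dict(sample)  # shallow copy so the input is not modified
        fixed["dataset"] = DATASET_ALIASES.get(sample["dataset"], sample["dataset"])
        normalized[sample_id] = fixed
    return normalized
```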
