IndoNLI: A Natural Language Inference Dataset for Indonesian

Last update: Feb 10, 2022

Overview

This is a repository for data and code accompanying our EMNLP 2021 paper "IndoNLI: A Natural Language Inference Dataset for Indonesian". The datasets used for our experiments can be found under the data directory:

indonli: human-annotated NLI data, split into train, val, and test (test_lay and test_expert)

diagnostic: subset of examples from test_expert that are annotated with linguistic and logical phenomena
translate_train.tar.gz: MNLI dataset translated to Indonesian (train and dev)
translate_train_small.tar.gz: a sample of translate_train, used for the translate_train_small experiment
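
As a rough illustration of working with the splits listed above, the sketch below reads one split and counts its labels. It assumes that each split is stored as a JSON Lines file (one example per line) and that each example has a "label" field; neither the file format nor the field name is stated in this README, so check the files under the data directory and adjust accordingly. The helper names load_nli_jsonl and label_counts are hypothetical and not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_nli_jsonl(path):
    """Read one JSON object per non-empty line (assumed JSON Lines layout)."""
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples


def label_counts(examples, label_key="label"):
    """Count examples per label; `label_key` is an assumed field name."""
    return Counter(example[label_key] for example in examples)
```

With a path such as data/indonli/train.jsonl (again an assumed file name), label_counts(load_nli_jsonl(path)) would give a quick view of the label distribution for that split.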

The experiment code can be found under the experiment directory; see the README file there for details.

License

We use premises taken from the Indonesian Wikipedia, news, and Web articles.

Wikipedia is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).

For the news genre, we use premise text from the Indonesian PUD and GSD treebanks provided by Universal Dependencies 2.5 (Zeman et al., 2019) and from IndoSum (Kurniawan and Louvan, 2018). The Indonesian PUD and GSD treebanks are licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA). IndoSum is licensed under Apache License, Version 2.0.

Citation

If you use our corpus in your work, please consider citing our paper:

@inproceedings{indonli,
    title = "IndoNLI: A Natural Language Inference Dataset for Indonesian",
    author = "Mahendra, Rahmad and Aji, Alham Fikri and Louvan, Samuel and Rahman, Fahrurrozi and Vania, Clara",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
