Source code for the paper: Variance-Aware Machine Translation Test Sets (NeurIPS 2021 Datasets and Benchmarks Track)

Variance-Aware Machine Translation Test Sets

License

See LICENSE. We follow the same data licensing plan as the WMT benchmark.

VAT Data

We release 70 lightweight and discriminative test sets for machine translation evaluation, covering 35 translation directions from the WMT16 to WMT20 competitions. See the VAT_data folder for details.

For each translation direction in a given year, both the source and the reference are provided to support different types of evaluation metrics. For example:

VAT_data/
├── wmt20
    ├── ...
    ├── vat_newstest2020-zhen-ref.en.txt
    └── vat_newstest2020-zhen-src.zh.txt
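The source and reference files can be read side by side. The following is a minimal sketch, assuming the two files are UTF-8 encoded and line-aligned (one sentence per line, same line count); the helper name `read_parallel` and the demo sentences are hypothetical and not part of the repository.

```python
from pathlib import Path
import tempfile


def read_parallel(src_path, ref_path):
    """Return a list of (source, reference) pairs, one pair per line."""
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    ref = Path(ref_path).read_text(encoding="utf-8").splitlines()
    # Assumption: the files are line-aligned, so the counts must match.
    if len(src) != len(ref):
        raise ValueError(
            f"line count mismatch: {len(src)} source vs {len(ref)} reference"
        )
    return list(zip(src, ref))


# Self-contained demo with made-up sentences written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src_file = Path(tmp) / "demo-src.zh.txt"
    ref_file = Path(tmp) / "demo-ref.en.txt"
    src_file.write_text("你好\n谢谢\n", encoding="utf-8")
    ref_file.write_text("Hello\nThank you\n", encoding="utf-8")
    pairs = read_parallel(src_file, ref_file)

print(pairs)
```

To use it on the released data, pass the paths of a matching `-src` and `-ref` file from the VAT_data folder.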

Meta-Information of VAT

We also provide the meta-information of the retained data. Each JSON file contains the IDs of the sentences retained from the original test set. For instance, the file wmt20/bert-r_filter-std60.json contains:

{
	...
	"en-de": [4, 6, 10, 13, 14, 15, ...],
	"de-en": [0, 3, 4, 5, 7, 9, ...],
	...
}
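A meta file of this shape can be used to rebuild a VAT subset from an original test set. The sketch below assumes the IDs are 0-based line indices into the original file (the lists shown start at 0, but the indexing convention is not stated here); the function name `select_by_ids`, the IDs, and the sentences are made up for illustration.

```python
import json


def select_by_ids(lines, ids):
    """Keep only the lines whose (assumed 0-based) index appears in ids."""
    return [lines[i] for i in ids]


# Made-up meta content in the same shape as the JSON shown above.
meta = json.loads('{"en-de": [0, 2, 3]}')

# Stand-in for the lines of an original test set file.
original = ["s0", "s1", "s2", "s3", "s4"]

subset = select_by_ids(original, meta["en-de"])
print(subset)
```

The same ID list should be applied to the source file, the reference file, and every system output so that all of them stay aligned.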

Reproduce & Create VAT

The results reported in the paper were produced on a single NVIDIA GeForce 1080Ti card.

We will keep updating the code and related documentation after the response.

Requirements

sacreBLEU version >= 1.4.14
BLEURT version >= 0.0.2
COMET version >= 0.1.0
BERTScore version >= 0.3.7 (hug_trans==4.2.1)
PyTorch version >= 1.5.1
Python version >= 3.8
CUDA & cudatoolkit >= 10.1

Note: the minimum versions listed above are the ones needed to reproduce the results.

Pipeline

Use score_xxx.py to generate the CSV files that store the sentence-level scores computed by the corresponding metric. For example, to evaluate all WMT20 submissions for all language pairs using BERTScore:

CUDA_VISIBLE_DEVICES=0 python score_bert.py -b 128 -s -r dummy -c dummy --rescale_with_baseline \
	--hypos-dir ${WMT_DATA_PATH}/system-outputs \
	--refs-dir ${WMT_DATA_PATH}/references \
	--scores-dir ${WMT_DATA_PATH}/results/system-level/scores_ALL \
	--testset-name newstest2020 --score-dump wmt20-bertscore.csv

(Alternative option) You can use your own implementation to dump the metric scores, but the CSV header must contain:

,TESTSET,LP,ID,METRIC,SYS,SCORE
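If you write the score dump yourself, a header of this form can be produced with the standard `csv` module. This is a minimal sketch, not the repository's code: it assumes the leading empty column is a running row index, and the record field names, the metric name `demo-metric`, the system names, and the scores are all hypothetical.

```python
import csv
import io

# The empty first name reproduces the leading comma of the expected header.
HEADER = ["", "TESTSET", "LP", "ID", "METRIC", "SYS", "SCORE"]


def dump_scores(records, handle):
    """Write one row per (sentence, system) score with the expected header."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(HEADER)
    for row_index, rec in enumerate(records):
        # Assumption: the unnamed first column holds a running row index.
        writer.writerow([
            row_index, rec["testset"], rec["lp"], rec["id"],
            rec["metric"], rec["sys"], rec["score"],
        ])


# Made-up records for two hypothetical systems on one sentence.
records = [
    {"testset": "newstest2020", "lp": "en-de", "id": 0,
     "metric": "demo-metric", "sys": "sysA", "score": 0.5},
    {"testset": "newstest2020", "lp": "en-de", "id": 0,
     "metric": "demo-metric", "sys": "sysB", "score": 0.75},
]

buffer = io.StringIO()
dump_scores(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Replace the `io.StringIO` buffer with a file opened in text mode (with `newline=""`) to write the dump to disk.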

Use cal_filtering.py to filter the test set based on the score dump produced in the previous step. For example:
```
python cal_filtering.py --score-dump wmt20-bertscore.csv --output VAT_meta/wmt20-test/ --filter-per 60
```
This produces JSON files that contain the IDs of the retained sentences.
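The idea behind this step can be illustrated with a small, self-contained sketch. It is not the code of cal_filtering.py: it assumes that each sentence is ranked by the standard deviation of its scores across systems, that the filter percentage is the share of lowest-variance sentences to discard, and that the population standard deviation is used. The function name `filter_by_std` and all scores are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def filter_by_std(rows, filter_per):
    """rows: iterable of (lp, sent_id, score), one entry per system.

    Returns {lp: sorted list of retained sentence IDs}.
    """
    # Collect the scores that all systems obtained on each sentence.
    scores = defaultdict(list)
    for lp, sent_id, score in rows:
        scores[(lp, sent_id)].append(score)

    # Compute the per-sentence standard deviation, grouped by language pair.
    by_lp = defaultdict(list)
    for (lp, sent_id), values in scores.items():
        by_lp[lp].append((pstdev(values), sent_id))

    kept = {}
    for lp, items in by_lp.items():
        # Assumption: filter_per percent of the sentences are discarded.
        n_keep = len(items) - int(len(items) * filter_per / 100)
        # Highest deviation first; ties are broken by sentence ID.
        items.sort(key=lambda pair: (-pair[0], pair[1]))
        kept[lp] = sorted(sent_id for _, sent_id in items[:n_keep])
    return kept


# Made-up scores of two systems on five sentences.
rows = [
    ("en-de", 0, 0.5), ("en-de", 0, 0.5),
    ("en-de", 1, 0.1), ("en-de", 1, 0.9),
    ("en-de", 2, 0.4), ("en-de", 2, 0.6),
    ("en-de", 3, 0.2), ("en-de", 3, 0.8),
    ("en-de", 4, 0.5), ("en-de", 4, 0.55),
]

kept = filter_by_std(rows, filter_per=60)
print(kept)
```

In this toy input, sentences 1 and 3 are the ones on which the two systems disagree most, so they are the two that survive a 60% filter.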

Statistics of VAT (References)

Benchmark	Translation Direction	# Sentences	# Words	# Vocabulary
wmt20	km-en	928	17170	3645
wmt20	cs-en	266	12568	3502
wmt20	en-de	567	21336	5945
wmt20	ja-en	397	10526	3063
wmt20	ps-en	1088	20296	4303
wmt20	en-zh	567	18224	5019
wmt20	en-ta	400	7809	4028
wmt20	de-en	314	16083	4046
wmt20	zh-en	800	35132	6457
wmt20	en-ja	400	12718	2969
wmt20	en-cs	567	16579	6391
wmt20	en-pl	400	8423	3834
wmt20	en-ru	801	17446	6877
wmt20	pl-en	400	7394	2399
wmt20	iu-en	1188	23494	3876
wmt20	ru-en	396	6966	2330
wmt20	ta-en	399	7427	2148
wmt19	zh-en	800	36739	6168
wmt19	en-cs	799	15433	6111
wmt19	de-en	800	15219	4222
wmt19	en-gu	399	8494	3548
wmt19	fr-de	680	12616	3698
wmt19	en-zh	799	20230	5547
wmt19	fi-en	798	13759	3555
wmt19	en-fi	799	13303	6149
wmt19	kk-en	400	9283	2584
wmt19	de-cs	799	15080	6166
wmt19	lt-en	400	10474	2874
wmt19	en-lt	399	7251	3364
wmt19	ru-en	800	14693	3817
wmt19	en-kk	399	6411	3252
wmt19	en-ru	799	16393	6125
wmt19	gu-en	406	8061	2434
wmt19	de-fr	680	16181	3517
wmt19	en-de	799	18946	5340
wmt18	en-cs	1193	19552	7926
wmt18	cs-en	1193	23439	5453
wmt18	en-fi	1200	16239	7696
wmt18	en-tr	1200	19621	8613
wmt18	en-et	800	13034	6001
wmt18	ru-en	1200	26747	6045
wmt18	et-en	800	20045	5045
wmt18	tr-en	1200	25689	5955
wmt18	fi-en	1200	24912	5834
wmt18	zh-en	1592	42983	7985
wmt18	en-zh	1592	34796	8579
wmt18	en-ru	1200	22830	8679
wmt18	de-en	1199	28275	6487
wmt18	en-de	1199	25473	7130
wmt17	en-lv	800	14453	6161
wmt17	zh-en	800	20590	5149
wmt17	en-tr	1203	17612	7714
wmt17	lv-en	800	18653	4747
wmt17	en-de	1202	22055	6463
wmt17	ru-en	1200	24807	5790
wmt17	en-fi	1201	17284	7763
wmt17	tr-en	1203	23037	5387
wmt17	en-zh	800	18001	5629
wmt17	en-ru	1200	22251	8761
wmt17	fi-en	1201	23791	5300
wmt17	en-cs	1202	21278	8256
wmt17	de-en	1202	23838	5487
wmt17	cs-en	1202	22707	5310
wmt16	tr-en	1200	19225	4823
wmt16	ru-en	1199	23010	5442
wmt16	ro-en	800	16200	3968
wmt16	de-en	1200	22612	5511
wmt16	en-ru	1199	20233	7872
wmt16	fi-en	1200	20744	5176
wmt16	cs-en	1200	23235	5324
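Counts of this kind can be computed from a reference file along the following lines. This is only a sketch: it assumes whitespace tokenization and case-sensitive vocabulary, whereas the tokenization used for the table above is not stated (and whitespace splitting would not suit unsegmented Chinese or Japanese text). The function name `reference_stats` and the demo sentences are hypothetical.

```python
def reference_stats(lines):
    """Return (# sentences, # words, # vocabulary) for a list of sentences."""
    # Assumption: words are whitespace-separated tokens.
    tokens = [token for line in lines for token in line.split()]
    return len(lines), len(tokens), len(set(tokens))


# Made-up sentences standing in for the lines of a reference file.
demo = ["the cat sat", "the dog sat down"]
stats = reference_stats(demo)
print(stats)
```

Numbers obtained this way may differ from the table if the authors applied a different tokenizer or normalization.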

Owner

NLP2CT Lab, University of Macau
