A benchmark for the task of translation suggestion

Overview

WeTS: A Benchmark for Translation Suggestion

Translation Suggestion (TS), which provides alternatives for specific words or phrases given the entire translation produced by machine translation (MT), has been proven to play a significant role in post-editing (PE). For instance, given a source sentence and an MT output in which one span has been replaced by a placeholder, a TS system should predict the words that belong in that span. WeTS is a benchmark dataset for TS, annotated by expert translators. It contains corpora (train/dev/test) for four translation directions: English2German, German2English, Chinese2English and English2Chinese.


Contents

Data


WeTS is a benchmark dataset for TS in which all examples are annotated by expert translators. To the best of our knowledge, this is the first gold-standard corpus for TS. The statistics of WeTS are listed in the following table:

Translation Direction   Train    Valid   Test
English2German          14,957   1,000   1,000
German2English          11,777   1,000   1,000
English2Chinese         15,769   1,000   1,000
Chinese2English         21,213   1,000   1,000

For each translation direction, the corpus is organized as follows (a small loading sketch is given after the list):
direction.split.src: the source-side sentences
direction.split.mask: the masked translations, in which the placeholder is "<MASK>"
direction.split.tgt: the reference suggestions; the English2Chinese test set has three references for each example

direction: En2De, De2En, Zh2En, En2Zh
split: train, dev, test
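
The sketch below illustrates how the three files of one direction and split line up line by line. It is a minimal illustration and not part of this repo; the relative paths and printed field names are assumptions based on the naming scheme above.

# Minimal sketch (not part of this repo): read one split of WeTS and pair up
# source, masked translation and suggestion line by line.

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

direction, split = "En2De", "train"              # En2De, De2En, Zh2En or En2Zh
src  = read_lines(f"{direction}.{split}.src")    # source-side sentences
mask = read_lines(f"{direction}.{split}.mask")   # translations with a "<MASK>" span
tgt  = read_lines(f"{direction}.{split}.tgt")    # reference suggestions
# Note: the English2Chinese test set has three references per example; the
# exact layout of those extra references is not shown here.

assert len(src) == len(mask) == len(tgt)
for s, m, t in zip(src[:3], mask[:3], tgt[:3]):
    print("source     :", s)
    print("masked MT  :", m)
    print("suggestion :", t)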

Models


We release the pre-trained NMT models that were used to generate the MT sentences. These NMT models can also be used to generate a synthetic corpus for TS, which improves the final performance dramatically. A detailed description of how the synthetic corpus is generated can be found in our paper.
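
The exact construction of the synthetic corpus is described in the paper. Purely as a generic illustration of the idea, and not the paper's recipe, the sketch below builds a TS-style example from a parallel sentence pair by masking a random span of the target side and treating the removed span as the suggestion; the function name and example sentences are hypothetical.

import random

def make_synthetic_example(src_sent, tgt_sent, rng=random):
    # Generic illustration only: mask a random span of the target side and
    # use the removed words as the suggestion. The procedure actually used
    # for WeTS (which involves the released NMT models) is in the paper.
    tokens = tgt_sent.split()
    if len(tokens) < 2:
        return None
    i = rng.randrange(len(tokens))                          # span start
    j = rng.randrange(i + 1, min(len(tokens), i + 5) + 1)   # span end (exclusive)
    masked = tokens[:i] + ["<MASK>"] + tokens[j:]
    return {"src": src_sent,
            "mask": " ".join(masked),
            "tgt": " ".join(tokens[i:j])}

print(make_synthetic_example("Das ist ein Beispiel .",   # hypothetical source
                             "This is an example ."))    # hypothetical target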

The released models can be downloaded at:

Download the models

and the password is "2iyk"

To run inference with a released model, run:

sh inference_*direction*.sh 

where direction is one of: en2de, de2en, en2zh, zh2en

Get Started


data preprocessing

sh process.sh 

pre-training

Code for the first-phase pre-training is not included in this repo, as we directly used the code of XLM (https://github.com/facebookresearch/XLM) with little modification, and we did not achieve significant gains from the first-phase pre-training.

The second-phase pre-training:

sh preptraining.sh

fine-tuning

sh finetuning.sh

The code in this repo is mainly forked from fairseq (https://github.com/pytorch/fairseq.git).

Citation


Please cite the following paper if you find the resources in this repository useful.

@article{yang2021wets,
  title={WeTS: A Benchmark for Translation Suggestion},
  author={Yang, Zhen and Zhang, Yingxue and Li, Ernan and Meng, Fandong and Zhou, Jie},
  journal={arXiv preprint arXiv:2110.05151},
  year={2021}
}

LICENCE


See LICENCE

Comments
  • Missing apply_bpe.py


Hi, I am trying to run your code following the instructions in README.md. I found that "apply_bpe.py", which is required to execute "process.sh", does not exist in the repository. Could you please provide more information about the subword tokenization method?

Additionally, I cannot find the sentencepiece model or BPE model that matches the released pre-trained models. Is there a released file that I have missed?

    Thank you

    opened by hyeonseokk 3
  • Some questions about process.py


Hi :) Thank you for providing the tutorial code. Some of the code in process.py seems to be wrong.

    I think the code below needs to be modified. Please check if this modification is correct.

    • original process.py
    # process.py
    ...
    # bpe
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.train.src > $data_dir/en2cn/en2cn.train.src.bpe.cn 
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.train.mask > $data_dir/en2cn/en2cn.train.src.bpe.en
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.train.tgt > $data_dir/en2cn/en2cn.train.tgt.en
    
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.valid.src > $data_dir/en2cn/en2cn.valid.src.bpe.cn 
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.valid.mask > $data_dir/en2cn/en2cn.valid.src.bpe.en
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.valid.tgt > $data_dir/en2cn/en2cn.valid.tgt.en
    
    
    # build vocab
    touch $src_vocab
    python $codes_dir/build_vocab.py $data_dir/en2cn/en2cn.train.src.bpe.cn $data_dir/en2cn.train.src.bpe.en $src_vocab 5 
    ...
    
    • After modification
    # process.py
    ...
    # bpe
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.train.src > $data_dir/en2cn/en2cn.train.src.bpe.en 
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.train.mask > $data_dir/en2cn/en2cn.train.src.bpe.cn
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.train.tgt > $data_dir/en2cn/en2cn.train.tgt.bpe.cn
    
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.valid.src > $data_dir/en2cn/en2cn.valid.src.bpe.en 
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.valid.mask > $data_dir/en2cn/en2cn.valid.src.bpe.cn
    python apply_bpe.py -c $bpe_codes <$data_dir/en2cn/en2cn.valid.tgt > $data_dir/en2cn/en2cn.valid.tgt.bpe.cn
    
    
    # build vocab
    touch $src_vocab
    python $codes_dir/build_vocab.py $data_dir/en2cn/en2cn.train.src.bpe.cn $data_dir/en2cn/en2cn.train.src.bpe.en $src_vocab 5 
    ...
    

Another question: the <unk> replacement ratio in the fairseq preprocessing results seems too high. Is this normal?

    Namespace(alignfile=None, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='../WMT22_TS/WeTS/data-bin', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=True, log_format=None, log_interval=1000, lr_scheduler='fixed', memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer='nag', padding_factor=8, seed=1, source_lang='en', srcdict='../WMT22_TS/WeTS/src.vocab', target_lang='cn', task='input_suggestion', tbmf_wrapper=False, tensorboard_logdir='', testpref=None, tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, trainpref='../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.train.src.bpe', user_dir=None, validpref='../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.dev.src.bpe', workers=10)
    | [en] Dictionary: 34751 types
    | [en] ../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.train.src.bpe.en: 14759 sents, 785715 tokens, 0.0% replaced by <unk>
    | [en] Dictionary: 34751 types
    | [en] ../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.dev.src.bpe.en: 2733 sents, 161151 tokens, 0.853% replaced by <unk>
    | [cn] Dictionary: 34751 types
    | [cn] ../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.train.src.bpe.cn: 14759 sents, 919988 tokens, 4.68% replaced by <unk>
    | [cn] Dictionary: 34751 types
    | [cn] ../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.dev.src.bpe.cn: 2733 sents, 191096 tokens, 4.47% replaced by <unk>
    | Wrote preprocessed data to ../WMT22_TS/WeTS/data-bin
    Namespace(alignfile=None, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='../WMT22_TS/WeTS/data-bin', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=1000, lr_scheduler='fixed', memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=True, optimizer='nag', padding_factor=8, seed=1, source_lang='cn', srcdict='../WMT22_TS/WeTS/tgt.vocab', target_lang=None, task='input_suggestion', tbmf_wrapper=False, tensorboard_logdir='', testpref=None, tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, trainpref='../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.train.tgt.bpe', user_dir=None, validpref='../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.dev.tgt.bpe', workers=10)
    | [cn] Dictionary: 11039 types
    | [cn] ../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.train.tgt.bpe.cn: 14759 sents, 103263 tokens, 0.0% replaced by <unk>
    | [cn] Dictionary: 11039 types
    | [cn] ../WMT22_TS/WeTS/train_and_dev_0425/NaiveTs/en2cn/en2cn.dev.tgt.bpe.cn: 2733 sents, 17299 tokens, 5.13% replaced by <unk>
    | Wrote preprocessed data to ../WMT22_TS/WeTS/data-bin
    
    opened by tmtmaj 1