Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation.

ZurichNLP

Last update: May 1, 2022

Related tags

Deep Learning understanding-mbr

Overview

Understanding Minimum Bayes Risk Decoding

This repo provides code and documentation for the following paper:

Müller and Sennrich (2021): Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation.

@inproceedings{muller2021understanding,
      title={Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation}, 
      author = {M{\"u}ller, Mathias  and
      Sennrich, Rico},
      year={2021},
      eprint={2105.08504},
      booktitle = "Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)"
}

Basic Setup

Clone this repo in the desired place:

git clone https://github.com/ZurichNLP/understanding-mbr
cd understanding-mbr

then proceed to install software before running any experiments.

Install required software

Create a new virtualenv that uses Python 3. Please make sure to run this command outside of any virtual Python environment:

./scripts/create_venv.sh

Important: Then activate the env by executing the source command that is output by the shell script above.

Download and install required software:

./scripts/download.sh

The download script makes several important assumptions, such as: your OS is Linux, you have CUDA 10.2 installed, you have access to a GPU for training and translation, your folder for temp files is /var/tmp. Edit the script before running it to fit to your needs.

Running experiments in general

Definition of "run"

We define a "run" as one complete experiment, in the sense that a run executes a pipeline of steps. Every run is completely self-contained: it does everything from downloading the data until evaluation of a trained model.

The series of steps executed in a run is defined in

scripts/tatoeba/run_tatoeba_generic.sh

This script is generic and will never be called on its own (many variables would be undefined), but all our scripts eventually call this script.

SLURM jobs

Individual steps in runs are submitted to a SLURM system. The generic run script:

scripts/tatoeba/run_tatoeba_generic.sh

will submit each individual step (such as translation, or model training) as a separate SLURM job. Depending on the nature of the task, the scripts submits to a different cluster, or asks for different resources.

IMPORTANT: if

you do not work on a cluster that uses SLURM for job management,
your cluster layout, resource naming etc. is different

you absolutely need to modify or replace the generic script scripts/tatoeba/run_tatoeba_generic.sh before running anything. If you do not use SLURM at all, it might be possible to just replace calls to scripts/tatoeba/run_tatoeba_generic.sh with scripts/tatoeba/run_tatoeba_generic_no_slurm.sh.

scripts/tatoeba/run_tatoeba_generic_no_slurm.sh is a script we provide for convenience, but have not tested it ourselves. We cannot guarantee that it runs without error.

Dry run

Before you run actual experiments, it can be useful to perform a dry run. Dry runs attempt to run all commands, create all files etc. but are finished within minutes and use CPU only. Dry runs help to catch some bugs (such as file permissions) early.

To dry-run a baseline system for the language pair DAN-EPO, run:

./scripts/tatoeba/dry_run_baseline.sh

Single (non-dry!) example run

To run the entire pipeline (downloading data until evaluation of trained model) for a single language pair from Tatoeba, run

./scripts/tatoeba/run_baseline.sh

This will train a model for the language pair DAN-EPO, but also execute all steps before and after model training.

Start a certain group of runs

It is possible to submit several runs at the same time, using the same shell script. For instance, to run all required steps for a number of medium-resource language pairs, run

./scripts/tatoeba/run_mediums.sh

Recovering partial runs

Steps within a run pipeline depend on each other (SLURM sbatch --afterok dependency in most cases). This means that if a job X fails, subsequent jobs that depend on X will never start. If you attempt to re-run completed steps they exit immediately -- so you can always re-run an entire pipeline if any step fails.

Reproducing the results presented in our paper in particular

Training and evaluating the models

To create all models and statistics necessary to compare MBR with different utility functions:

scripts/tatoeba/run_compare_risk_functions.sh

To reproduce experiments on domain robustness:

scripts/tatoeba/run_robustness_data.sh

To reproduce experiments on copy noise in the training data:

scripts/tatoeba/run_copy_noise.sh

Creating visualizations and result tables

To reproduce exactly the tables and figures we show in the paper, use our Google Colab here:

https://colab.research.google.com/drive/1GYZvxRB1aebOThGllgb0teY8A4suH5j-?usp=sharing

This is possible only because we have hosted the results of our experiments on our servers and Colab can retrieve files from there.

Browse MBR samples

We also provide examples for pools of MBR samples for your perusal, as HTML files that can be viewed in any browser. The example HTML files are created by running the following script:

./scripts/tatoeba/local_html.sh

and are available at the following URLs (Markdown does not support clickable links, sorry!):

Domain robustness

language pair	domain test set	link
DEU-ENG	it	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/deu-eng.domain_robustness.it.html
DEU-ENG	koran	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/deu-eng.domain_robustness.koran.html
DEU-ENG	law	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/deu-eng.domain_robustness.law.html
DEU-ENG	medical	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/deu-eng.domain_robustness.medical.html
DEU-ENG	subtitles	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/deu-eng.domain_robustness.subtitles.html

Copy noise in training data

language pair	amount of copy noise	link
ARA-DEU	0.001	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.001.slice-test.html
ARA-DEU	0.005	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.005.slice-test.html
ARA-DEU	0.01	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.01.slice-test.html
ARA-DEU	0.05	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.05.slice-test.html
ARA-DEU	0.075	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.075.slice-test.html
ARA-DEU	0.1	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.1.slice-test.html
ARA-DEU	0.25	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.25.slice-test.html
ARA-DEU	0.5	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/ara-deu.copy_noise.0.5.slice-test.html

language pair	amount of copy noise	link
ENG-MAR	0.001	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.001.slice-test.html
ENG-MAR	0.005	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.005.slice-test.html
ENG-MAR	0.01	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.01.slice-test.html
ENG-MAR	0.05	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.05.slice-test.html
ENG-MAR	0.075	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.075.slice-test.html
ENG-MAR	0.1	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.1.slice-test.html
ENG-MAR	0.25	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.25.slice-test.html
ENG-MAR	0.5	https://files.ifi.uzh.ch/cl/archiv/2020/clcontra/eng-mar.copy_noise.0.5.slice-test.html

You might also like...

Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER (WIP) Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEER is an e

6 Dec 17, 2022

Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

Balancing Training for Multilingual Neural Machine Translation Implementation of the paper Balancing Training for Multilingual Neural Machine Translat

21 May 18, 2022

Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

416 Jan 9, 2023

Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

GLAT Implementation for the ACL2021 paper "Glancing Transformer for Non-Autoregressive Neural Machine Translation" Requirements Python = 3.7 Pytorch

117 Jan 9, 2023

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation(mCOLT/mRASP2), ACL2021

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation(mCOLT/mRASP2), ACL2021 The code for training mCOLT/mRASP2, a multilingua

104 Jan 1, 2023

"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

This is code repo for our EMNLP 2017 paper "Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback", which implements the A2C algorithm on top of a neural encoder-decoder model and benchmarks the combination under simulated noisy rewards.

131 Oct 21, 2022

Comments

query on baseline translation quality

Hi, I'm working through your scripts (for which, thank you!) and would like to check that the Dan-Epo system that I produced is as expected. I ran the scripts more or less as expected, although from the command line, without SLURM. In computing BLEU score, I get:

cat evaluations/dan-epo/no_label_smoothing/test.beam.1.0.top.10.bleu BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.4.14 = 25.875 61.0/32.1/20.0/12.7 (BP = 0.974 ratio = 0.975 hyp_len = 33965 ref_len = 34844)

Does that look correct? In looking at Figure 9 of your appendix I expected a BLEU score closer to 30.0 or so.

Thanks, Bill

opened by wjbyrne 3

Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation.

Related tags

Overview

Understanding Minimum Bayes Risk Decoding

Basic Setup

Install required software

Running experiments in general

Definition of "run"

SLURM jobs

Dry run

Single (non-dry!) example run

Start a certain group of runs

Recovering partial runs

Reproducing the results presented in our paper in particular

Training and evaluating the models

Creating visualizations and result tables

Browse MBR samples

Domain robustness

Copy noise in training data

You might also like...

Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation(mCOLT/mRASP2), ACL2021

"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

Neural machine translation between the writings of Shakespeare and modern English using TensorFlow

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Tilted Empirical Risk Minimization (ICLR '21)

Comments

query on baseline translation quality

Owner

ZurichNLP

Code accompanying the paper "How Tight Can PAC-Bayes be in the Small Data Regime?"

Predict Breast Cancer Wisconsin (Diagnostic) using Naive Bayes

Clustering with variational Bayes and population Monte Carlo

Official repository for "Intriguing Properties of Vision Transformers" (2021)

Explainer for black box models that predict molecule properties

An end-to-end regression problem of predicting the price of properties in Bangalore.

Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors

TUPÃ was developed to analyze electric field properties in molecular simulations

PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

PyTorch implementation of D2C: Diffuison-Decoding Models for Few-shot Conditional Generation.