Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

Xinyi Wang

Last update: May 18, 2022

Related tags

Deep Learning multiDDS

Overview

Balancing Training for Multilingual Neural Machine Translation

Implementation of the paper

Balancing Training for Multilingual Neural Machine Translation

Xinyi Wang, Yulia Tsvetkov, Graham Neubig

Data:

The preprocessed and binarized data for fairseq can be downloaded here

To process data from scrach, see the script

util_scripts/prepare_multilingual_data.sh

Training Scripts:

The training scripts for many-to-one translation of the related language group (Related M2O) is under the directory job_scripts/related_ted8_m2o/.

Our methods:

MultiDDS-S:

job_scripts/related_ted8_m2o/multidds_s.sh

MultiDDS:

job_scripts/related_ted8_m2o/multidds.sh

Baselines:

Proportional:

job_scripts/related_ted8_m2o/proportional.sh

Temperature:

job_scripts/related_ted8_m2o/temperature.sh

The scripts for Related O2M is under the directory job_scripts/related_ted8_o2m/

The scripts for Diverse M2O is under the directory job_scripts/diverse_ted8_m2o/

The scripts for Diverse O2M is under the directory job_scripts/diverse_ted8_o2m/

Inference Scripts:

Each of the experiment script directory contains a trans.sh file to translate the test set. To translate the test set for the Related M2O MultiDDS-S

job_scripts/related_ted8_m2o/trans.sh checkpoints/related_ted8_m2o/multidds_s/

To translate other experiment, simply replace the argument with the experiment checkpoint directory.

Citation

Please cite as:

@inproceedings{wang2020multiDDS,
  title = {Balancing Training for Multilingual Neural Machine Translation},
  author = {Xinyi Wang, Yulia Tsvetkov, Graham Neubig},
  booktitle = {ACL},
  year = {2020},
}

Comments

Couldn't re-implement the result in diverse_ted8_o2m setting

Hi, I succeeded in re-implementing the result in diverse_ted8_m2o setting. However, I failed to re-implement the result in diverse_ted8_o2m. The score of each language is 4 BLEU lower than the result shown in the paper (Table-11).

The below is my result in diverse_ted8_o2m:

Have you met this situation?

opened by OrangeInSouth 8
Some code fix and questions about the modifications on fairseq source code
Hello, I'm Steven, from Johns Hopkins University. I'm currently working on a research project, studying different methods to denoise the training data for low resource languages. I came across your papers (DDS, TCS, and multiDDS) and I'm very interested in your implementation. I start checking this code repo very carefully and I found some issues (I sort of "fixed" them in my own way in a forked repo, if you think it's useful to incorporate in your repo, I can submit a pull request for you to review my changes). Here are the issues:

fairseq beamsearch is out of date:

the code in fairseq/seach.py (torch.div) is deprecated so I update them using the most recent fairseq's beamsearch code.

undefined variable in trainer.py/update_language_sampler()

I think this is the most important part of the code since you calculated the gradient similarity between the training set and dev set to get the similarity score to update the language distribution. There are some undefined or unused variables like self.optimizers, all_sim_list. I changed them so that the code only use one vector sim_list though theoretically there should be a N*N (N is number of language pairs) sim_list, and that's why you need all_sim_list to append different sim_list right? My change only helps me to run my own code since I'm using just 1pair of language instead of multilingual settings, but I think it shouldn't be hard to fix it, you might just leave those variables there by accident.

generator is not reporting score properly

It seems that if I use --sacrebleu to generate, the result is not a string but | Generate test with beam=5: <sacrebleu.metrics.bleu.BLEUScore object at 0x7fec308a75b0> I'm not what causes the object to be printed.

The code is not working with Ave type data_actor

Since I'm more interested in a one-pair setting instead of multilingual input, I want the scorer to directly work on src_tokens and trg_tokens, which is the method you proposed in the DDS paper. If I interpret your code correctly, this block should never be run right?

# data selection: reset epoch iter to filter out unselected data if epoch_itr.epoch == args.select_by_dds_epoch and args.select_by_dds_epoch > 0: epoch_itr, _ = trainer.get_filtered_train_iterator(epoch_itr.epoch, filtered_maxpos_indices=filtered_maxpos_indices)

Since I want to work with data-filtering, and I realize base data-actor is only seeing language IDs instead of real tokens, I have to useave type. To make it work, I changed your initialization steps (basically I added elif self.args.data_actor == 'ave': and adam optimizer for it in your trainer.py). I'm not sure if this modification is correct but select_by_dds_epoch works after this change. Therefore, I just want some confirmation/help from you that this is indeed the correct way to implement a data-filtering with ave data actor.

Last question

I'm just curious what is the usage --utility-type in the args. I didn't find where it's triggered when I debug through my console. Also, could you share with me the training script/hyper parameters you use for DDS (Optimizing Data Usage via Differentiable Rewards) since I want to train directly on 1 pair of languages and replicate your result.

I'm really impressed by how well you modified the fairseq toolkit and incorporated the reinforcement optimization to change the data loading. If I have any misunderstanding about your methods or code implementation, please let me know. Also, please let me know that if you want me to submit a pull request for you to better view my changes. Thank you for your help in advance!
opened by steventan0110 5
Dependency

Hi

Thank you for sharing all the codes.

I got an error `ImportError: cannot import name 'libbleu' from 'fairseq'

when I run the train.py

I think it might be due to the old version of fairseq.

Could you tell me all the dependencies for this repo?

Thank you

opened by seanie12 2

Trainer Gradient Update For Scorer in both train_step and update_language_sampler

Hi Cindy,

I was studying your code for in trainer.py and it seems like that you perform update for RL scorer (the data actor) in both update_language_sampler function and train_step function. Initially I thought you only update the RL in update_language_sampler() where you compute the cosine similarity of two gradients, but then I saw this block of code (which seems to only update ave_emb actor, so I wonder if you actually use this block of code?)

# optimize data actor
            for k in cached_loss.keys():
                reward = 1./eta * (cur_loss[k] - cached_loss[k])
                if self.args.out_score_type == 'sigmoid':
                    #loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
                    loss = -(data_actor_out[k] * reward.data)
                elif self.args.out_score_type == 'exp':
                    loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
                if cur_loss[k].size(0) > 0:
                    loss.div_(cur_loss[k].size(0))
                loss.sum().backward()
            if self.args.data_actor == 'ave_emb': 
                self.data_optimizer.step()
                self.data_optimizer.zero_grad()

Thank you for your help and clarification!

opened by steventan0110 1

Dataset (Ted-8-Related) is missing

Hi, I found the two files in databin are empty: ted_8_related/combined-train.spm8000.src ted_8_related/combined-train.spm8000.eng

Could you re-upload the above files?

opened by OrangeInSouth 0
bug fix in update-language-sampling and other files

Some of the files I added might not be necessary, like workflow.md (which I used to help myself understand the code) as well as the bash scripts for me to debug the code. For workflow.md though, I could work on it and make it a tutorial for this code repo after I fully understand your code. You can decide whether you want those files.

The major changes are in trainer.py (about sim_list), and search.py (update the deprecated code).

opened by steventan0110 0

Owner

Xinyi Wang

GitHub

Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

NeuralTextures This is repository with inference code for paper "StylePeople: A Generative Model of Fullbody Human Avatars" (CVPR21). This code is for

Visual Understanding Lab @ Samsung AI Center Moscow

18 Oct 6, 2022

This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

SLATE This is the official source code for SLATE. We provide the code for the model, the training code and a dataset loader for the 3D Shapes dataset.

66 Dec 26, 2022

Code for paper ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop.

Who Left the Dogs Out? Evaluation and demo code for our ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization

29 Dec 28, 2022

TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

SLM: Structural Language Models of Code This is an official implementation of the model described in: "Structural Language Models of Code" [PDF] To ap

73 Nov 6, 2022

Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

CoProtector Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

1 Oct 26, 2021

Code to use Augmented Shapiro Wilks Stopping, as well as code for the paper "Statistically Signifigant Stopping of Neural Network Training"

This codebase is being actively maintained, please create and issue if you have issues using it Basics All data files are included under losses and ea

32 Nov 9, 2021

Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Region Proportion Regularized Inference (RePRI) for Few-Shot Segmentation In this repo, we provide the code for our paper : "Few-Shot Segmentation Wit

138 Dec 12, 2022

Code for ACM MM 2020 paper "NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination"

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination The offical implementation for the "NOH-NMS: Improving Pedestrian Detection by

64 Nov 11, 2022

Official TensorFlow code for the forthcoming paper

~ Efficient-CapsNet ~ Are you tired of over inflated and overused convolutional neural networks? You're right! It's time for CAPSULES :)

203 Jan 8, 2023

This is the code for the paper "Contrastive Clustering" (AAAI 2021)

Contrastive Clustering (CC) This is the code for the paper "Contrastive Clustering" (AAAI 2021) Dependency python>=3.7 pytorch>=1.6.0 torchvision>=0.8

210 Dec 30, 2022

Code for the paper Learning the Predictability of the Future

Learning the Predictability of the Future Code from the paper Learning the Predictability of the Future. Website of the project in hyperfuture.cs.colu

Computer Vision Lab at Columbia University

139 Nov 18, 2022

PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning This is the PyTorch implementation of our paper: FeatMatch: Feature-Based Augmentat

43 Nov 19, 2022

Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

A Theoretical Analysis of the Repetition Problem in Text Generation This repository share the code for the paper "A Theoretical Analysis of the Repeti

37 Nov 21, 2022

Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

199 Jan 8, 2023

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

296 Dec 29, 2022

Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

Related tags

Overview

Balancing Training for Multilingual Neural Machine Translation

Data:

Training Scripts:

Inference Scripts:

Citation

Comments

Couldn't re-implement the result in diverse_ted8_o2m setting

Some code fix and questions about the modifications on fairseq source code

fairseq beamsearch is out of date:

undefined variable in trainer.py/update_language_sampler()

generator is not reporting score properly

The code is not working with Ave type data_actor

Last question

Dependency

Trainer Gradient Update For Scorer in both train_step and update_language_sampler

Dataset (Ted-8-Related) is missing

bug fix in update-language-sampling and other files

Owner

Xinyi Wang

Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

Code for paper ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop.

TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

Code to use Augmented Shapiro Wilks Stopping, as well as code for the paper "Statistically Signifigant Stopping of Neural Network Training"

Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Code for ACM MM 2020 paper "NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination"

Official TensorFlow code for the forthcoming paper

This is the code for the paper "Contrastive Clustering" (AAAI 2021)

Code for the paper Learning the Predictability of the Future

PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Code for the Shortformer model, from the paper by Ofir Press, Noah A. Smith and Mike Lewis.

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

Official code for paper "Optimization for Oriented Object Detection via Representation Invariance Loss".

Code for our CVPR 2021 paper "MetaCam+DSCE"

The code is not working with `Ave type data_actor`