Code for the paper "Balancing Training for Multilingual Neural Machine Translation" (ACL 2020)

Overview

Implementation of the paper

Balancing Training for Multilingual Neural Machine Translation

Xinyi Wang, Yulia Tsvetkov, Graham Neubig

Data:

The preprocessed and binarized data for fairseq can be downloaded here.

To process the data from scratch, see the script:

util_scripts/prepare_multilingual_data.sh

Training Scripts:

The training scripts for many-to-one translation of the related language group (Related M2O) are under the directory job_scripts/related_ted8_m2o/.

Our methods:

MultiDDS-S:

job_scripts/related_ted8_m2o/multidds_s.sh 

MultiDDS:

job_scripts/related_ted8_m2o/multidds.sh 

Baselines:

Proportional:

job_scripts/related_ted8_m2o/proportional.sh 

Temperature:

job_scripts/related_ted8_m2o/temperature.sh 
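
Both baselines sample training data according to corpus sizes. As a quick illustration only (this is not the code used by these scripts, and the sizes and temperature below are placeholders), the corresponding sampling distributions can be sketched as:

    import numpy as np

    def sampling_probs(sizes, temperature=1.0):
        """Sampling distribution over language pairs given corpus sizes.

        temperature=1.0 corresponds to proportional sampling; larger
        temperatures flatten the distribution toward uniform.
        """
        p = np.asarray(sizes, dtype=float) / float(sum(sizes))
        p = p ** (1.0 / temperature)
        return p / p.sum()

    # e.g. three corpora with 10k, 100k, and 1M sentence pairs
    print(sampling_probs([1e4, 1e5, 1e6]))                   # proportional
    print(sampling_probs([1e4, 1e5, 1e6], temperature=5.0))  # flatter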

The scripts for Related O2M are under the directory job_scripts/related_ted8_o2m/.

The scripts for Diverse M2O are under the directory job_scripts/diverse_ted8_m2o/.

The scripts for Diverse O2M are under the directory job_scripts/diverse_ted8_o2m/.

Inference Scripts:

Each experiment script directory contains a trans.sh file for translating the test set. For example, to translate the test set for the Related M2O MultiDDS-S model, run:

job_scripts/related_ted8_m2o/trans.sh checkpoints/related_ted8_m2o/multidds_s/ 

To translate the test set for another experiment, simply replace the argument with that experiment's checkpoint directory.

Citation

Please cite as:

@inproceedings{wang2020multiDDS,
  title = {Balancing Training for Multilingual Neural Machine Translation},
  author = {Wang, Xinyi and Tsvetkov, Yulia and Neubig, Graham},
  booktitle = {ACL},
  year = {2020},
}
Comments
  • Couldn't re-implement the result in diverse_ted8_o2m setting

    Hi, I succeeded in reproducing the result in the diverse_ted8_m2o setting. However, I failed to reproduce the result in diverse_ted8_o2m: the score for each language is 4 BLEU lower than the result shown in the paper (Table 11).

    Below are my results in diverse_ted8_o2m: [image attached]

    Have you encountered this situation?

    opened by OrangeInSouth 8
  • Some code fix and questions about the modifications on fairseq source code

    Hello, I'm Steven, from Johns Hopkins University. I'm currently working on a research project studying different methods to denoise the training data for low-resource languages. I came across your papers (DDS, TCS, and multiDDS) and I'm very interested in your implementation. I started checking this code repo very carefully and found some issues (I sort of "fixed" them in my own way in a forked repo; if you think it's useful to incorporate them in your repo, I can submit a pull request for you to review my changes). Here are the issues:

    fairseq beam search is out of date:

    The code in fairseq/search.py (torch.div) is deprecated, so I updated it using the most recent fairseq beam search code.

    undefined variables in trainer.py/update_language_sampler()

    I think this is the most important part of the code, since you calculate the gradient similarity between the training set and the dev set to get the similarity score that updates the language distribution. There are some undefined or unused variables, like self.optimizers and all_sim_list. I changed them so that the code only uses one vector sim_list, though theoretically there should be an N*N sim_list (N is the number of language pairs), and that's why you need all_sim_list to append the different sim_lists, right? My change only lets me run my own code, since I'm using just one pair of languages instead of a multilingual setting, but I think it shouldn't be hard to fix; you might have just left those variables there by accident.
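
    For concreteness, the quantity I'm referring to is a cosine similarity between a training-gradient vector and a dev-gradient vector, something like the self-contained sketch below (the names are mine, not the repository's exact variables):

        import torch

        def grad_cosine_similarity(train_grads, dev_grads, eps=1e-10):
            """Cosine similarity between two lists of per-parameter gradients."""
            t = torch.cat([g.reshape(-1) for g in train_grads])
            d = torch.cat([g.reshape(-1) for g in dev_grads])
            return torch.dot(t, d) / (t.norm() * d.norm() + eps)

        # In the multilingual case, one similarity would be computed per
        # training language (collected in something like sim_list), and a
        # list of such lists (all_sim_list) would cover every train/dev
        # language combination before renormalizing the sampling weights.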

    generator is not reporting score properly

    It seems that if I use --sacrebleu to generate, the result is not a string but | Generate test with beam=5: <sacrebleu.metrics.bleu.BLEUScore object at 0x7fec308a75b0>. I'm not sure what causes the object to be printed.
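
    (As a side note on the printing: recent sacrebleu versions return a BLEUScore object rather than a plain string, so a printable value has to be pulled out explicitly; a minimal sketch, independent of the generator code here:)

        import sacrebleu

        hyps = ["the cat sat on the mat"]
        refs = [["the cat is on the mat"]]  # one reference stream

        bleu = sacrebleu.corpus_bleu(hyps, refs)  # returns a BLEUScore object
        print(bleu.score)  # the BLEU value as a float
        print(str(bleu))   # a formatted, human-readable summary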

    The code is not working with the ave type data_actor

    Since I'm more interested in a one-pair setting instead of multilingual input, I want the scorer to work directly on src_tokens and trg_tokens, which is the method you proposed in the DDS paper. If I interpret your code correctly, this block should never be run, right?

        # data selection: reset epoch iter to filter out unselected data
        if epoch_itr.epoch == args.select_by_dds_epoch and args.select_by_dds_epoch > 0:
            epoch_itr, _ = trainer.get_filtered_train_iterator(epoch_itr.epoch, filtered_maxpos_indices=filtered_maxpos_indices)
    

    Since I want to work with data filtering, and I realize the base data actor only sees language IDs instead of real tokens, I have to use the ave type. To make it work, I changed your initialization steps (basically I added an elif self.args.data_actor == 'ave': branch and an Adam optimizer for it in your trainer.py). I'm not sure if this modification is correct, but select_by_dds_epoch works after this change. Therefore, I just want some confirmation/help from you that this is indeed the correct way to implement data filtering with the ave data actor.
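
    (Roughly, the change I made looks like the sketch below; the helper name and the learning rate are just my own placeholders, not the repository's actual API:)

        import torch

        def build_ave_data_actor_optimizer(data_actor, lr=1e-4):
            """Hypothetical helper: give the 'ave' data actor its own Adam
            optimizer, mirroring the existing 'ave_emb' handling in trainer.py."""
            return torch.optim.Adam(data_actor.parameters(), lr=lr)

        # sketch of the extra branch in trainer.py initialization:
        # elif self.args.data_actor == 'ave':
        #     self.data_optimizer = build_ave_data_actor_optimizer(self.data_actor)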

    Last question

    I'm just curious what the --utility-type arg is used for; I didn't find where it's triggered when I debugged through my console. Also, could you share the training script/hyperparameters you used for DDS (Optimizing Data Usage via Differentiable Rewards), since I want to train directly on one pair of languages and replicate your results?

    I'm really impressed by how well you modified the fairseq toolkit and incorporated the reinforcement optimization to change the data loading. If I have any misunderstanding about your methods or code implementation, please let me know. Also, please let me know if you want me to submit a pull request so you can better view my changes. Thank you for your help in advance!

    opened by steventan0110 5
  • Dependency

    Hi

    Thank you for sharing all the codes.

    I got the error `ImportError: cannot import name 'libbleu' from 'fairseq'` when I run train.py.

    I think it might be due to the old version of fairseq.

    Could you tell me all the dependencies for this repo?

    Thank you

    opened by seanie12 2
  • Trainer Gradient Update For Scorer in both train_step and update_language_sampler

    Hi Cindy,

    I was studying your code in trainer.py, and it seems that you update the RL scorer (the data actor) in both the update_language_sampler function and the train_step function. Initially I thought you only updated the RL scorer in update_language_sampler(), where you compute the cosine similarity of two gradients, but then I saw this block of code (which seems to only update the ave_emb actor), so I wonder if you actually use this block of code?

        # optimize data actor: the reward is the scaled change in loss,
        # (cur_loss - cached_loss) / eta, multiplied into the actor's output
        for k in cached_loss.keys():
            reward = 1. / eta * (cur_loss[k] - cached_loss[k])
            if self.args.out_score_type == 'sigmoid':
                # loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
                loss = -(data_actor_out[k] * reward.data)
            elif self.args.out_score_type == 'exp':
                loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
            if cur_loss[k].size(0) > 0:
                loss.div_(cur_loss[k].size(0))
            loss.sum().backward()
        # the optimizer step is only taken for the 'ave_emb' data actor
        if self.args.data_actor == 'ave_emb':
            self.data_optimizer.step()
            self.data_optimizer.zero_grad()

    Thank you for your help and clarification!

    opened by steventan0110 1
  • Dataset (Ted-8-Related) is missing

    Hi, I found that two files in the databin are empty: ted_8_related/combined-train.spm8000.src and ted_8_related/combined-train.spm8000.eng

    Could you re-upload the above files?

    opened by OrangeInSouth 0
  • bug fix in update-language-sampling and other files

    Some of the files I added might not be necessary, like workflow.md (which I used to help myself understand the code) as well as the bash scripts I used to debug the code. As for workflow.md, I could work on it and turn it into a tutorial for this code repo once I fully understand your code. You can decide whether you want those files.

    The major changes are in trainer.py (the sim_list fix) and search.py (updating the deprecated code).

    opened by steventan0110 0