Phrase-Based & Neural Unsupervised Machine Translation

Overview

Unsupervised Machine Translation

This repository contains the original implementation of the unsupervised PBSMT and NMT models presented in
Phrase-Based & Neural Unsupervised Machine Translation (EMNLP 2018).

Note: for the NMT approach, we recommend you have a look at Cross-lingual Language Model Pretraining and the associated GitHub repository https://github.com/facebookresearch/XLM which contains a better model and a more efficient implementation of unsupervised machine translation.

Model

The NMT implementation supports:

  • Three machine translation architectures (seq2seq, biLSTM + attention, Transformer)
  • Ability to share an arbitrary number of parameters across models / languages
  • Denoising auto-encoder training
  • Parallel data training
  • Back-parallel data training
  • On-the-fly multithreaded generation of back-parallel data

As well as other features not used in the original paper (and left for future work):

  • Arbitrary number of languages during training
  • Language model pre-training / co-training with shared parameters
  • Adversarial training

The PBSMT implementation supports:

  • Unsupervised phrase-table generation scripts
  • Automated Moses training

Dependencies

  • Python 3
  • NumPy
  • PyTorch (currently tested on version 0.5)
  • Moses (clean and tokenize text / train PBSMT model)
  • fastBPE (generate and apply BPE codes)
  • fastText (generate embeddings)
  • MUSE (generate cross-lingual embeddings)

For the NMT implementation, the NMT/get_data_enfr.sh script will take care of installing everything (except PyTorch). The same script is also provided for English-German: NMT/get_data_deen.sh. The NMT implementation only requires the Moses preprocessing scripts, which can be used without installing Moses.

The PBSMT implementation requires a working installation of Moses, which you will have to install yourself. Compiling Moses is not always straightforward; a good alternative is to download the binary executables.

Unsupervised NMT

Download / preprocess data

The first thing to do to run the NMT model is to download and preprocess data. To do so, just run:

git clone https://github.com/facebookresearch/UnsupervisedMT.git
cd UnsupervisedMT/NMT
./get_data_enfr.sh

The script will successively:

  • Install tools
    • Download Moses scripts
    • Download and compile fastBPE
    • Download and compile fastText
  • Download and prepare monolingual data
    • Download / extract / tokenize monolingual data
    • Generate and apply BPE codes on monolingual data
    • Extract training vocabulary
    • Binarize monolingual data
  • Download and prepare parallel data (for evaluation)
    • Download / extract / tokenize parallel data
    • Apply BPE codes on parallel data with training vocabulary
    • Binarize parallel data
  • Train cross-lingual embeddings

get_data_enfr.sh contains a few parameters defined at the beginning of the file:

  • N_MONO number of monolingual sentences for each language (default 10000000)
  • CODES number of BPE codes (default 60000)
  • N_THREADS number of threads in data preprocessing (default 48)
  • N_EPOCHS number of fastText epochs (default 10)

Adding more monolingual data will improve the performance, but will take longer to preprocess and train (10 million sentences is what was used in the paper for NMT). The script should output a data summary that contains the location of all files required to start experiments:

Monolingual training data:
    EN: ./data/mono/all.en.tok.60000.pth
    FR: ./data/mono/all.fr.tok.60000.pth
Parallel validation data:
    EN: ./data/para/dev/newstest2013-ref.en.60000.pth
    FR: ./data/para/dev/newstest2013-ref.fr.60000.pth
Parallel test data:
    EN: ./data/para/dev/newstest2014-fren-src.en.60000.pth
    FR: ./data/para/dev/newstest2014-fren-src.fr.60000.pth

Concatenated data in: ./data/mono/all.en-fr.60000
Cross-lingual embeddings in: ./data/mono/all.en-fr.60000.vec

Note that there are several ways to train cross-lingual embeddings:

  • Train monolingual embeddings separately for each language, and align them with MUSE (please refer to the original paper for more details).
  • Concatenate the source and target monolingual corpora in a single file, and train embeddings with fastText on that generated file (this is what is implemented in the get_data_enfr.sh script).

The second method works better when the source and target languages are similar and share a lot of common words (such as French and English). However, when the overlap between the source and target vocabulary is too small, the alignment will be very poor and you should opt for the first method using MUSE to generate your cross-lingual embeddings.
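As a rough way to judge which of the two methods applies, the overlap between the two vocabularies can be measured before training embeddings. The sketch below is only an illustration: the helper names (vocab, vocab_overlap) are hypothetical and not part of this repository, and the README does not specify how small an overlap is "too small", so no threshold is suggested here.

```python
from collections import Counter
from typing import Iterable, Set


def vocab(sentences: Iterable[str], min_count: int = 1) -> Set[str]:
    """Return the whitespace-separated tokens seen at least min_count times."""
    counts = Counter(tok for line in sentences for tok in line.split())
    return {tok for tok, c in counts.items() if c >= min_count}


def vocab_overlap(src_sentences: Iterable[str],
                  tgt_sentences: Iterable[str],
                  min_count: int = 1) -> float:
    """Fraction of the smaller vocabulary that also appears in the other one."""
    src = vocab(src_sentences, min_count)
    tgt = vocab(tgt_sentences, min_count)
    if not src or not tgt:
        return 0.0
    return len(src & tgt) / min(len(src), len(tgt))


# Toy example; in practice, pass open file handles over the tokenized
# (and BPE-encoded) monolingual corpora instead of these lists.
en = ["the president visited paris", "the table is red"]
fr = ["le president a visité paris", "la table est rouge"]
overlap = vocab_overlap(en, fr)
print(f"shared fraction of the smaller vocabulary: {overlap:.2f}")
```

A high value suggests that training fastText on the concatenated corpus is reasonable; a value close to zero points towards aligning separately trained embeddings with MUSE.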

Train the NMT model

Given binarized monolingual training data, parallel evaluation data, and pretrained cross-lingual embeddings, you can train the model using the following command:

python main.py 

## main parameters
--exp_name test                             # experiment name

## network architecture
--transformer True                          # use a transformer architecture
--n_enc_layers 4                            # use 4 layers in the encoder
--n_dec_layers 4                            # use 4 layers in the decoder

## parameters sharing
--share_enc 3                               # share 3 out of the 4 encoder layers
--share_dec 3                               # share 3 out of the 4 decoder layers
--share_lang_emb True                       # share lookup tables
--share_output_emb True                     # share projection output layers

## datasets location
--langs 'en,fr'                             # training languages (English, French)
--n_mono -1                                 # number of monolingual sentences (-1 for everything)
--mono_dataset $MONO_DATASET                # monolingual dataset
--para_dataset $PARA_DATASET                # parallel dataset

## denoising auto-encoder parameters
--mono_directions 'en,fr'                   # train the auto-encoder on English and French
--word_shuffle 3                            # shuffle words
--word_dropout 0.1                          # randomly remove words
--word_blank 0.2                            # randomly blank out words

## back-translation directions
--pivo_directions 'en-fr-en,fr-en-fr'       # back-translation directions (en->fr->en and fr->en->fr)

## pretrained embeddings
--pretrained_emb $PRETRAINED                # cross-lingual embeddings path
--pretrained_out True                       # also pretrain output layers

## dynamic loss coefficients
--lambda_xe_mono '0:1,100000:0.1,300000:0'  # auto-encoder loss coefficient
--lambda_xe_otfd 1                          # back-translation loss coefficient

## CPU on-the-fly generation
--otf_num_processes 30                      # number of CPU jobs for back-parallel data generation
--otf_sync_params_every 1000                # CPU parameters synchronization frequency

## optimization
--enc_optimizer adam,lr=0.0001              # model optimizer
--group_by_size True                        # group sentences by length inside batches
--batch_size 32                             # batch size
--epoch_size 500000                         # epoch size
--stopping_criterion bleu_en_fr_valid,10    # stopping criterion
--freeze_enc_emb False                      # freeze encoder embeddings
--freeze_dec_emb False                      # freeze decoder embeddings


## With
MONO_DATASET='en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,'
PARA_DATASET='en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth'
PRETRAINED='./data/mono/all.en-fr.60000.vec'
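The three noise parameters listed under "denoising auto-encoder parameters" above (--word_shuffle, --word_dropout, --word_blank) control how input sentences are corrupted before the auto-encoder reconstructs them. The following is a minimal sketch of that kind of noise, not the code used in this repository: the function name add_noise, the "<BLANK>" placeholder and the exact sampling details are assumptions made for illustration.

```python
import random


def add_noise(words, shuffle_k=3, p_drop=0.1, p_blank=0.2,
              blank="<BLANK>", rng=None):
    """Corrupt a tokenized sentence for denoising auto-encoder training (sketch)."""
    rng = rng or random.Random()

    # Local shuffle: sort words by "position + uniform noise in [0, shuffle_k]".
    # Words can only swap with close neighbours, so no word moves by more
    # than shuffle_k positions.
    if shuffle_k > 0:
        keys = [i + rng.uniform(0, shuffle_k) for i in range(len(words))]
        words = [w for _, w in sorted(zip(keys, words), key=lambda kw: kw[0])]

    # Word dropout: remove each word with probability p_drop,
    # but never return an empty sentence.
    kept = [w for w in words if rng.random() >= p_drop]
    if not kept and words:
        kept = [rng.choice(words)]

    # Word blanking: replace each remaining word by a placeholder
    # with probability p_blank.
    return [blank if rng.random() < p_blank else w for w in kept]


sentence = "the cat sat on the mat near the door".split()
print(add_noise(sentence, rng=random.Random(0)))
```

The auto-encoder is then trained to recover the original sentence from the noisy one, which prevents it from learning a trivial copy.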

Some parameters must respect a particular format:

  • langs
    • A list of languages, sorted by language ID.
    • en,fr for "English and French"
    • de,en,es,fr for "German, English, Spanish and French"
  • mono_dataset
    • A dictionary that maps a language to train, validation and test files.
    • Validation and test files are optional (monolingual data is usually only needed for training).
    • en:train.en,valid.en,test.en;fr:train.fr,valid.fr,test.fr
  • para_dataset
    • A dictionary that maps a language pair to train, validation and test files.
    • Training file is optional (in unsupervised MT we only use parallel data for evaluation).
    • en-fr:train.en-fr.XX,valid.en-fr.XX,test.en-fr.XX indicates the training, validation and test paths, where XX is a placeholder that is replaced by the language code of each side of the pair.
  • mono_directions
    • A list of languages on which we want to train the denoising auto-encoder.
    • en,fr to train the auto-encoder both on English and French.
  • para_directions
    • A list of tuples on which we want to train the MT system in a standard supervised way.
    • en-fr,fr-de will train the model in both the en->fr and fr->de directions.
    • Requires providing the model with parallel data.
  • pivo_directions
    • A list of triplets on which we want to perform back-translation.
    • fr-en-fr,en-fr-en will train the model on the fr->en->fr and en->fr->en directions.
    • en-fr-de,de-fr-en will train the model on the en->fr->de and de->fr->en directions (assuming that fr is the unknown language, and that English-German parallel data is provided).
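The formats above can be checked before launching a long run. The sketch below parses the langs, mono_dataset and para_dataset strings as they are described in this section; the helper names (parse_langs, parse_dataset, resolve_para) are hypothetical and the code is not taken from the repository, whose own parsing may differ in details.

```python
def parse_langs(langs: str) -> list:
    """Split 'en,fr' into a list and check that it is sorted."""
    result = langs.split(",")
    if result != sorted(result):
        expected = ",".join(sorted(result))
        raise ValueError(f"langs must be sorted: got {langs!r}, expected {expected!r}")
    return result


def parse_dataset(spec: str) -> dict:
    """Parse 'key:train,valid,test;key:...' into {key: (train, valid, test)}.

    Omitted files (empty fields) are returned as None.
    """
    datasets = {}
    for entry in spec.split(";"):
        key, _, paths = entry.partition(":")
        fields = paths.split(",")
        if len(fields) != 3:
            raise ValueError(f"expected 3 comma-separated paths for {key!r}, got {len(fields)}")
        datasets[key] = tuple(f or None for f in fields)
    return datasets


def resolve_para(path: str, lang: str) -> str:
    """Replace the XX placeholder of a parallel-data path by a language code."""
    return path.replace("XX", lang)


langs = parse_langs("en,fr")
mono = parse_dataset("en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,")
para = parse_dataset(
    "en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,"
    "./data/para/dev/newstest2014-fren-src.XX.60000.pth"
)
print(langs, mono["en"], resolve_para(para["en-fr"][1], "fr"))
```

With this check, a value such as 'zh,en' is rejected immediately with a message suggesting 'en,zh', instead of failing later during data loading.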

Other parameters:

  • --otf_num_processes 30 indicates that 30 CPU threads will be generating back-translation data on the fly, using the current model parameters
  • --otf_sync_params_every 1000 indicates that models on CPU threads will be synchronized every 1000 training steps
  • --lambda_xe_otfd 1 means that the coefficient associated with the back-translation loss is fixed to a constant of 1
  • --lambda_xe_mono '0:1,100000:0.1,300000:0' means that the coefficient associated with the denoising auto-encoder loss is initially set to 1, decreases linearly to 0.1 over the first 100000 steps, then to 0 over the following 200000 steps, and stays at 0 for the remainder of the experiment (i.e. we train with back-translation only)
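The piecewise-linear schedule described for --lambda_xe_mono can be sketched as follows. This is an illustration of the behaviour stated above rather than the repository's implementation; the function names parse_schedule and coefficient are hypothetical, and the step values in a schedule are assumed to be distinct.

```python
def parse_schedule(spec: str):
    """Parse '0:1,100000:0.1,300000:0' into sorted (step, value) pairs.

    A plain number such as '1' is treated as a constant coefficient.
    """
    if ":" not in spec:
        return [(0, float(spec))]
    points = [(int(s), float(v)) for s, v in (item.split(":") for item in spec.split(","))]
    return sorted(points)


def coefficient(points, step: int) -> float:
    """Linear interpolation between the given points, constant outside their range."""
    if step <= points[0][0]:
        return points[0][1]
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        if step <= s1:
            return v0 + (v1 - v0) * (step - s0) / (s1 - s0)
    return points[-1][1]


schedule = parse_schedule("0:1,100000:0.1,300000:0")
for step in (0, 50000, 100000, 200000, 300000, 1000000):
    print(step, round(coefficient(schedule, step), 4))
```

Halfway through the first segment (step 50000) the coefficient is midway between 1 and 0.1, and from step 300000 onwards it is 0, so only the back-translation loss remains.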

Putting all this together, the training command becomes:

python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

On newstest2014 en-fr, the above command should give above 23.0 BLEU after 25 epochs (i.e. after one day of training on a V100).
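Because the training command is long, it can be convenient to keep the parameters in a dictionary and assemble the command line from it. The sketch below only builds and prints the command shown above; the helper build_argv is hypothetical, and the resulting list could be passed to subprocess.run from inside the NMT directory.

```python
import shlex

# Parameter values copied from the training command in this section.
params = {
    "exp_name": "test",
    "transformer": True,
    "n_enc_layers": 4,
    "n_dec_layers": 4,
    "share_enc": 3,
    "share_dec": 3,
    "share_lang_emb": True,
    "share_output_emb": True,
    "langs": "en,fr",
    "n_mono": -1,
    "mono_dataset": "en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,",
    "para_dataset": "en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,"
                    "./data/para/dev/newstest2014-fren-src.XX.60000.pth",
    "mono_directions": "en,fr",
    "word_shuffle": 3,
    "word_dropout": 0.1,
    "word_blank": 0.2,
    "pivo_directions": "fr-en-fr,en-fr-en",
    "pretrained_emb": "./data/mono/all.en-fr.60000.vec",
    "pretrained_out": True,
    "lambda_xe_mono": "0:1,100000:0.1,300000:0",
    "lambda_xe_otfd": 1,
    "otf_num_processes": 30,
    "otf_sync_params_every": 1000,
    "enc_optimizer": "adam,lr=0.0001",
    "epoch_size": 500000,
    "stopping_criterion": "bleu_en_fr_valid,10",
}


def build_argv(params: dict) -> list:
    """Turn {'name': value} into ['python', 'main.py', '--name', 'value', ...]."""
    argv = ["python", "main.py"]
    for name, value in params.items():
        argv += [f"--{name}", str(value)]
    return argv


argv = build_argv(params)
# Print a copy-pastable shell command with proper quoting.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Changing the language pair then only requires editing a few dictionary entries (langs, the dataset paths, the directions and the stopping criterion).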

Unsupervised PBSMT

Running the PBSMT approach requires a working version of Moses. On some systems Moses is not straightforward to compile, and it is sometimes much simpler to download the binaries directly.

Once you have a working version of Moses, edit the MOSES_PATH variable inside the PBSMT/run.sh script to indicate the location of the Moses directory. Then, simply run:

cd PBSMT
./run.sh

The script will successively:

  • Install tools
    • Check Moses files
    • Download MUSE and download evaluation files
  • Download pretrained word embeddings
  • Download and prepare monolingual data
    • Download / extract / tokenize monolingual data
    • Learn truecasers and apply them on monolingual data
    • Learn and binarize language models for Moses decoding
  • Download and prepare parallel data (for evaluation):
    • Download / extract / tokenize parallel data
    • Truecase parallel data
  • Run MUSE to generate cross-lingual embeddings
  • Generate an unsupervised phrase-table using MUSE alignments
  • Run Moses
    • Create Moses configuration file
    • Run Moses on test sentences
    • Detruecase translations
  • Evaluate translations

run.sh contains a few parameters defined at the beginning of the file:

  • MOSES_PATH folder containing Moses installation
  • N_MONO number of monolingual sentences for each language (default 10000000)
  • N_THREADS number of threads in data preprocessing (default 48)
  • SRC source language (default English)
  • TGT target language (default French)

The script should return something like this:

BLEU = 13.49, 51.9/21.1/10.2/5.2 (BP=0.869, ratio=0.877, hyp_len=71143, ref_len=81098)
End of training. Experiment is stored in: ./UnsupervisedMT/PBSMT/moses_train_en-fr

If you use 50M instead of 10M sentences in your language model, you should get BLEU = 15.66, 52.9/23.2/12.3/7.0. Using a bigger language model, as well as phrases instead of words, will improve the results even further.
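To read the BLEU line printed by the script: the four numbers after the score are the 1-gram to 4-gram precisions (in percent), BP is the brevity penalty and ratio is hyp_len / ref_len. The sketch below recomputes the score from those printed values using the standard BLEU definition (geometric mean of the precisions times the brevity penalty). The function names are hypothetical, and because the printed precisions are rounded, the result only matches the reported score approximately.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty: 1 when the hypothesis is at least as long as the reference."""
    if hyp_len == 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def bleu_from_precisions(precisions, hyp_len: int, ref_len: int) -> float:
    """Combine n-gram precisions (in percent) with uniform weights and the brevity penalty."""
    if min(precisions) <= 0:
        return 0.0
    log_mean = sum(math.log(p / 100.0) for p in precisions) / len(precisions)
    return 100.0 * brevity_penalty(hyp_len, ref_len) * math.exp(log_mean)


# Values taken from the example output above.
bp = brevity_penalty(71143, 81098)
bleu = bleu_from_precisions([51.9, 21.1, 10.2, 5.2], 71143, 81098)
print(f"ratio={71143 / 81098:.3f}, BP={bp:.3f}, BLEU={bleu:.2f}")
```

In this example the translations are about 12% shorter than the references, so the brevity penalty removes roughly two BLEU points from the geometric mean of the precisions.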

References

Please cite [1] and [2] if you found the resources in this repository useful.

Phrase-Based & Neural Unsupervised Machine Translation

[1] G. Lample, M. Ott, A. Conneau, L. Denoyer, MA. Ranzato, Phrase-Based & Neural Unsupervised Machine Translation

@inproceedings{lample2018phrase,
  title={Phrase-Based \& Neural Unsupervised Machine Translation},
  author={Lample, Guillaume and Ott, Myle and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

Unsupervised Machine Translation Using Monolingual Corpora Only

[2] G. Lample, A. Conneau, L. Denoyer, MA. Ranzato, Unsupervised Machine Translation Using Monolingual Corpora Only

@inproceedings{lample2017unsupervised,
  title = {Unsupervised machine translation using monolingual corpora only},
  author = {Lample, Guillaume and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2018}
}

Word Translation Without Parallel Data

[3] A. Conneau*, G. Lample*, L. Denoyer, MA. Ranzato, H. Jégou, Word Translation Without Parallel Data

* Equal contribution. Order has been determined with a coin flip.

@inproceedings{conneau2017word,
  title = {Word Translation Without Parallel Data},
  author = {Conneau, Alexis and Lample, Guillaume and Ranzato, Marc'Aurelio and Denoyer, Ludovic and J\'egou, Herv\'e},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2018}
}

License

See the LICENSE file for more details.

Comments
  • Help me understand the Output/Parameters and inference.

    I am training a Spanish-to-English NMT model. It prints the log below after epoch 0 when I run NMT/main.py.

    INFO - 09/24/18 12:38:38 - 1:47:10 - 600 - 12.01 sent/s - 377.00 words/s - XE-es-es: 4.5986 || XE-en-en: 4.9864 || XE-en-es-en: 5.7000 || XE-es-en-es: 5.5308 || ENC-L2-en: 4.8750 || ENC-L2-es: 4.7380 - LR enc=1.0000e-04,dec=1.0000e-04 - Sentences generation time: 225.13s (42.23%)

    What confuses me is what XE-es-es and XE-en-en mean. Shouldn't it be XE-es-en and XE-en-es, if it's the loss? Also, if anyone could explain all the parameters printed during training, it would help me understand what is happening.

    opened by akanshajainn 15
  • Why I get BLEU 1.01 on zh-en of PBSMT ?

    Hi,

    I was confused for several days.

    I followed the steps of PBSMT/run.sh to do my work, and I think the most important step is "Running MUSE to generate cross-lingual embeddings". I aligned the 'zh' and 'en' pre-trained word vectors you provided on [https://fasttext.cc/docs/en/crawl-vectors.html] with MUSE, and got "Adv-NN P@1=21.3、Adv-CSLS P@1=26.9、Adv-Refine-NN P@1=18.5、Adv-Refine-CSLS P@1=24.0".

    Then, I used the aligned embeddings to generate the phrase-table, but finally I got BLEU of 1.01. I don't think the result is right. Something must have gone wrong.

    My MUSE command is:

    python unsupervised.py \
        --src_lang ch \
        --tgt_lang en \
        --src_emb /data/experiment/embeddings/wiki.ch.300.vec.20w \
        --tgt_emb /data/experiment/embeddings/wiki.en.300.vec.20w \
        --exp_name test \
        --exp_id 0 \
        --normalize_embeddings center \
        --emb_dim 300 \
        --dis_most_frequent 50000 \
        --epoch_size 500000 \
        --dico_eval /data/experiment/unsupervisedMT/fordict/zh-en.5000-6500.sim.txt \
        --n_refinement 5 \
        --export "pth"

    My command for generating the phrase table is:

    python create-phrase-table.py \
        --src_lang $SRC \
        --tgt_lang $TGT \
        --src_emb $ALIGNED_EMBEDDINGS_SRC \
        --tgt_emb $ALIGNED_EMBEDDINGS_TGT \
        --csls 1 \
        --max_rank 200 \
        --max_vocab 300000 \
        --inverse_score 1 \
        --temperature 45 \
        --phrase_table_path ${PHRASE_TABLE_PATH::-3}

    Does the problem lie in the word embeddings? Should I use word embeddings trained on my training data with fastText for MUSE? I have tried that (using word embeddings trained on my training data), but got "Adv-NN P@1=0.07、Adv-CSLS P@1=0.07、Adv-Refine-NN P@1=0.00、Adv-Refine-CSLS P@1=0.00". My command is: ./fasttext skipgram -epoch 10 -minCount 0 -dim 300 -thread 48 -ws 5 -neg 10 -input $SRC_TOK -output $EMB_SRC. So I didn't use the word embeddings generated on the training data, because I think I didn't align them well.

    So, where is the fault?

    opened by socaty 9
  • cannot reproduce the results of unsupervised NMT

    I use the command python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

    to run the code, but in the end I only get these BLEU scores: bleu_en_fr_test -> 18.400000 bleu_fr_en_test -> 18.610000

    I cannot get a score above 23.0 BLEU. This is the final log:

    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - BLEU ./dumped/test/2w3b5142k5/hyp107.en-fr-en.test.txt ./dumped/test/2w3b5142k5/ref.fr-en.test.txt : 45.160000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - epoch -> 107.000000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_valid -> 65.701571
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_valid -> 15.980000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_valid -> 91.976447
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_valid -> 15.940000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_test -> 41.084037
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_test -> 18.400000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_test -> 59.131132
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_test -> 18.610000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_fr_valid -> 3.330721
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_fr_valid -> 44.190000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_fr_test -> 2.995039
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_fr_test -> 44.170000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_en_valid -> 3.518063
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_en_valid -> 45.580000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_en_test -> 3.443367
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_en_test -> 45.160000
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - __log__:{"epoch": 107, "ppl_en_fr_valid": 65.70157069383522, "bleu_en_fr_valid": 15.98, "ppl_fr_en_valid": 91.9764473614227, "bleu_fr_en_valid": 15.94, "ppl_en_fr_test": 41.084036855000534, "bleu_en_fr_test": 18.4, "ppl_fr_en_test": 59.131132225283814, "bleu_fr_en_test": 18.61, "ppl_fr_en_fr_valid": 3.3307206599236614, "bleu_fr_en_fr_valid": 44.19, "ppl_fr_en_fr_test": 2.9950391883629113, "bleu_fr_en_fr_test": 44.17, "ppl_en_fr_en_valid": 3.518062580498395, "bleu_en_fr_en_valid": 45.58, "ppl_en_fr_en_test": 3.4433665669537183, "bleu_en_fr_en_test": 45.16}
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - Not a better validation score (10 / 10).
    INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - Stopping criterion has been below its best value more than 10 epochs. Ending the experiment...

    I follow the README step by step. So is there anything that I miss?

    opened by tobyyouup 9
  • Train zh-en:assert sorted(params.langs) == params.langs

    main.py --exp_name zhTest --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'zh,en' --n_mono -1 --mono_dataset 'zh:./data/mono/all.zh.tok.60000.pth,,;en:./data/mono/all.en.tok.60000.pth,,' --para_dataset 'zh-en:,./data/para/dev/newsdev2017-enzh-ref.XX.60000.pth,./data/para/dev/newsdev2017-zhen-src.XX.60000.pth' --mono_directions 'zh,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-zh-en,zh-en-zh' --pretrained_emb './data/mono/all.zh-en.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_zh_en_valid,10

    These are my params.

    opened by JxuHenry 8
  • ModuleNotFoundError: No module named 'fb' when running NMT training script

    While trying to run the training script python3 main.py I receive the following error:

    File "<my_project_directory>/NMT/src/data/loader.py", line 164, in load_para_data
      data1 = load_binarized(path.replace('XX', lang1), params)
    File "<my_project_directory>/NMT/src/data/loader.py", line 32, in load_binarized
      data = torch.load(path)
    File "<my_virtualenv_directory>/lib/python3.6/site-packages/torch/serialization.py", line 358, in load
      return _load(f, map_location, pickle_module)
    File "<my_virtualenv_directory>/lib/python3.6/site-packages/torch/serialization.py", line 542, in _load
      result = unpickler.load()
    ModuleNotFoundError: No module named 'fb'

    It seems that there is some dependency missing that I'm not aware of. I use Ubuntu 18.04.1, Python 3.6.6 and PyTorch 0.4.1

    opened by KonceptBlast 8
  • Running the model completely on CPU gives -1 BLEU

    Hi,

    I'm trying to run the model training on CPU. Changes made: I removed all the references to CUDA, i.e. the .cuda() calls, and also changed torch.cuda to just torch in the evaluator.py and trainer.py files.
    I changed SIGUSR1 to SIGTERM in multiprocessing_event_loop.py because Python multiprocessing on Windows is different from Linux. I had to put torch.cuda.is_available() in build_mt_model(params, data, torch.cuda.is_available()) at line 243 of main.py since I was getting a CUDA error.

    With all the above changes done, I'm getting very different results when training. There seems to be an error calculating the BLEU score, as shown below.

    image

    @glample

    Any help appreciated.

    Thanks !

    Mohammed Ayub

    opened by mohammedayub44 8
  • No language model pretraining in these results?

    Hi @glample, I was reading through the paper and the code and realized that although you mention in the paper that pretraining the language model is really important (otherwise back-translation wouldn't work well), you don't explicitly pretrain the LM in the code, especially in the snippet with the NMT training command: the --lm_before flag is not set, and by default it is 0 (no LM pretraining). So are the results you report in the paper with or without the LM? I would expect a pretrained LM to increase performance. If the model achieves this performance without it, isn't that a bit strange?

    Thanks for your time. Pranay

    opened by pranaymanocha 7
  • cannot reproduce the results of unsupervised NMT on ende/deen translation task

    I can reproduce the results of unsupervised nmt on enfr/fren translation task after 125 epochs:

    epoch -> 125.000000
    ppl_en_fr_valid -> 36.457879
    bleu_en_fr_valid -> 21.250000
    ppl_fr_en_valid -> 49.971469
    bleu_fr_en_valid -> 20.360000
    ppl_en_fr_test -> 20.280857
    bleu_en_fr_test -> 24.520000
    ppl_fr_en_test -> 28.081766
    bleu_fr_en_test -> 24.040000
    ppl_fr_en_fr_valid -> 1.869078
    bleu_fr_en_fr_valid -> 60.330000
    ppl_fr_en_fr_test -> 1.725349
    bleu_fr_en_fr_test -> 61.530000
    ppl_en_fr_en_valid -> 1.827019
    bleu_en_fr_en_valid -> 62.540000
    ppl_en_fr_en_test -> 1.849352
    bleu_en_fr_en_test -> 62.170000

    But I cannot reproduce the results of unsupervised NMT on ende/deen translation task.

    I followed the settings in the paper and extracted the en/de monolingual dataset from WMT14 to WMT17, using newstest2015 as the validation set and newstest2016 as the test set. The other settings are the same as for the enfr translation task.

    My command is

    python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'de,en' --n_mono -1 --mono_dataset 'en:./data_ende/mono/all.en.tok.60000.pth,,;de:./data_ende/mono/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data_ende/para/dev/newstest2015-ende-ref.XX.60000.pth,./data_ende/para/dev/newstest2016-ende-src.XX.60000.pth' --mono_directions 'de,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-de-en,de-en-de' --pretrained_emb './data_ende/mono/all.en-de.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_de_en_valid,100

    After 122 epochs, the BLEU on the test set is still lower than the reported results:

    epoch -> 122.000000
    ppl_de_en_valid -> 54.630438
    bleu_de_en_valid -> 16.020000
    ppl_en_de_valid -> 57.778759
    bleu_en_de_valid -> 12.210000
    ppl_de_en_test -> 41.127902
    bleu_de_en_test -> 17.890000
    ppl_en_de_test -> 42.743846
    bleu_en_de_test -> 13.520000
    ppl_de_en_de_valid -> 2.874830
    bleu_de_en_de_valid -> 38.840000
    ppl_de_en_de_test -> 2.799033
    bleu_de_en_de_test -> 38.160000
    ppl_en_de_en_valid -> 2.646685
    bleu_en_de_en_valid -> 46.750000
    ppl_en_de_en_test -> 2.542617
    bleu_en_de_en_test -> 47.100000

    I also tried to share all encoder layers and decoder layers following the settings in the paper:

    python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 4 --share_dec 4 --share_lang_emb True --share_output_emb True --langs 'de,en' --n_mono -1 --mono_dataset 'en:./data_ende/mono/all.en.tok.60000.pth,,;de:./data_ende/mono/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data_ende/para/dev/newstest2015-ende-ref.XX.60000.pth,./data_ende/para/dev/newstest2016-ende-src.XX.60000.pth' --mono_directions 'de,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-de-en,de-en-de' --pretrained_emb './data_ende/mono/all.en-de.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_de_en_valid,100

    But I got results similar to those obtained when sharing only 3 encoder layers and 3 decoder layers.

    Has anyone been able to reproduce the results on the ende/deen translation task? Is there anything that I missed?

    opened by tinyka 7
  • How to slove this problem load_mono_data     assert data['dico'][lang] == mono_data['dico'] AssertionError

    INFO - 04/06/19 15:17:42 - 0:00:04 - ============ Monolingual data (en)
    INFO - 04/06/19 15:17:42 - 0:00:04 - Loading data from ./data/mono/all.en.tok.60000.pth ...
    INFO - 04/06/19 15:17:45 - 0:00:08 - 1036379321 words (60536 unique) in 10000000 sentences. 0 unknown words (0 unique).
    en
    data['dico']: {'ch': <src.data.dictionary.Dictionary object at 0x7fba7c176b38>, 'en': <src.data.dictionary.Dictionary object at 0x7fba7bba6c50>}
    mono_data['dico']: <src.data.dictionary.Dictionary object at 0x7fba702dd978>
    data['dico'][lang]: <src.data.dictionary.Dictionary object at 0x7fba7bba6c50>
    Traceback (most recent call last):
      File "main.py", line 358, in <module>
        main(params)
      File "main.py", line 242, in main
        data = load_data(params)
      File "/data/xj/UNMT/UnsupervisedMT-master/NMT/src/data/loader.py", line 509, in load_data
        load_mono_data(params, data)
      File "/data/xj/UNMT/UnsupervisedMT-master/NMT/src/data/loader.py", line 292, in load_mono_data
        assert data['dico'][lang] == mono_data['dico']
    AssertionError

    opened by JxuHenry 6
  • Using this code (transformer) on Multi30k English French monolingual

    Does anyone have experience running this code on a smaller dataset such as Multi30k? The BLEU in the first paper https://arxiv.org/pdf/1711.00043.pdf was around 27.48/28.07. I am trying to get something close to that with a transformer-based encoder-decoder. Any suggestions? @glample Thanks!

    opened by ahmadrash 5
  • RuntimeError: CUDA error: out of memory

    When I run the unsupervised NMT codes, the following error is reported. My run command is as follows. Is there any parameter that is too large?

    python main.py --exp_name mnzh --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 8 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

    image

    GPU: GTX 1070
    Python version: 3.6.2
    Pytorch version: 0.4.1
    CUDA Version 8.0.44

    Look forward to your reply. Thanks!

    opened by Julisa-test 5
  • transformer multihead attention scaling layer error

    Hi. I think there's a problem in the transformer scaling layer. When I run UNMT, I get an exception in NMT/src/modules/multihead_attention.py at line 97.

    line 97 : q *= self.scaling
    line 30 : self.scaling = self.head_dim**-0.5

    I could not find the reason, so I just changed my code to

    line 97 : q = q / math.sqrt(self.head_dim)

    and it worked.

    opened by kimziwoo 0
  • About number of shared layers

    https://github.com/facebookresearch/UnsupervisedMT/blob/d5f2fc29246205abd8d62a4377c2bd4c01e086b8/NMT/src/model/attention.py#L76 In this line, If n_enc_layers is 4 and the share_enc is 3, I found that the shared layer is lstm_2 and lstm_3. That is, lstm_1 is not shared. Is it a mistake? Or am I wrong? Thank you very much!

    opened by pangjh3 0
  • Why is the codes file empty?

    I am facing this error:

    Applying BPE to valid and test files...
    Loading vocabulary from /home/UnsupervisedMT/NMT/data/mono/vocab.en.1500 ...
    Read 26726 words (93 unique) from vocabulary file.
    Loading codes from /home/UnsupervisedMT/NMT/data/mono/bpe_codes ...
    Read 0 codes from the codes file.
    Loading vocabulary from /home/UnsupervisedMT/NMT/data/para/vs11.txt ...
    Read 0 words (0 unique) from text file.
    Applying BPE to /home/UnsupervisedMT/NMT/data/para/vs11.txt ...
    Output memory map failed : 22.

    Where am I making a mistake?
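The log reports 0 codes read from the codes file and 0 words read from the input text file, so one thing worth ruling out is that those files are missing or empty on disk. The helper below is a minimal sketch of such a check. `find_empty_files` is a hypothetical name, not part of fastBPE or this repository, and the check does not by itself explain why the files ended up empty.

```python
import os

def find_empty_files(paths):
    """Return the paths that do not exist as files or contain zero bytes."""
    problems = []
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(path)
    return problems

# Example call, using the paths that appear in the log:
# find_empty_files([
#     "/home/UnsupervisedMT/NMT/data/mono/bpe_codes",
#     "/home/UnsupervisedMT/NMT/data/para/vs11.txt",
# ])
```

Any path returned by the helper would need to be regenerated or corrected before BPE is applied.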

    opened by ykkhan 4
  • How to train the model without para_dataset

    I am training UNMT using monolingual data only. Because there is no parallel data between the languages, I don't have para_dataset paths for the command. Here is my error:

    Traceback (most recent call last):
      File "/content/UnsupervisedMT/NMT/main.py", line 356, in <module>
        main(params)
      File "/content/UnsupervisedMT/NMT/main.py", line 241, in main
        data = load_data(params)
      File "/content/UnsupervisedMT/NMT/src/data/loader.py", line 496, in load_data
        load_para_data(params, data)
      File "/content/UnsupervisedMT/NMT/src/data/loader.py", line 147, in load_para_data
        assert len(params.para_dataset) > 0
    AssertionError

    @glample can you help me? Thank you.
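For comparison, the training command quoted in an earlier issue on this page passes a --para_dataset value whose first comma-separated field is empty. The sketch below parses that string under the assumption that the three fields are the train, validation and test paths in that order. `parse_para_dataset` is a hypothetical helper written for illustration, not the repository's loader.

```python
def parse_para_dataset(spec):
    """Parse 'l1-l2:train,valid,test;...' into {(l1, l2): (train, valid, test)}.

    The field order (train, valid, test) is an assumption made for this sketch.
    """
    result = {}
    for entry in spec.split(';'):
        if not entry:
            continue  # skip empty entries, e.g. an empty spec string
        pair, paths = entry.split(':', 1)
        lang1, lang2 = pair.split('-')
        fields = paths.split(',')
        if len(fields) != 3:
            raise ValueError("expected three comma-separated paths in %r" % entry)
        result[(lang1, lang2)] = tuple(fields)
    return result

# Value copied from the command quoted in the earlier issue.
spec = ('en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,'
        './data/para/dev/newstest2014-fren-src.XX.60000.pth')
train, valid, test = parse_para_dataset(spec)[('en', 'fr')]

# The first field is empty, while the other two point at dev/test files.
assert train == ''
```

Under that reading, an empty --para_dataset string yields no entries at all, which is consistent with the failing `len(params.para_dataset) > 0` assertion in the traceback.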

    opened by huyphan168 0
Owner
Facebook Research