Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

Overview

Sockeye

This package contains the Sockeye project, an open-source sequence-to-sequence framework for Neural Machine Translation based on Apache MXNet (Incubating). Sockeye powers several Machine Translation use cases, including Amazon Translate. The framework implements state-of-the-art machine translation models with Transformers (Vaswani et al., 2017). Recent developments and changes are tracked in our CHANGELOG.

If you have any questions or discover problems, please file an issue. You can also send questions to sockeye-dev-at-amazon-dot-com.

Version 2.0

With version 2.0, we have updated the usage of MXNet by moving to the Gluon API and adding support for several state-of-the-art features such as distributed training, low-precision training and decoding, as well as easier debugging of neural network architectures. In the context of this rewrite, we also trimmed down the large feature set of version 1.18.x to concentrate on the most important types of models and features, to provide a maintainable framework that is suitable for fast prototyping, research, and production. We welcome Pull Requests if you would like to help with adding back features when needed.

Installation

The easiest way to run Sockeye is with Docker or nvidia-docker. To build a Sockeye image with all features enabled, run the build script:

python3 sockeye_contrib/docker/build.py

See the Dockerfile documentation for more information.

Documentation

For information on how to use Sockeye, please visit our documentation.

Citation

For more information about Sockeye, see our papers (BibTeX).

Sockeye 2.x

Tobias Domhan, Michael Denkowski, David Vilar, Xing Niu, Felix Hieber, Kenneth Heafield. The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020. Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (AMTA'20).

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar. Sockeye 2: A Toolkit for Neural Machine Translation. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Project Track (EAMT'20).

Sockeye 1.x

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post. The Sockeye Neural Machine Translation Toolkit at AMTA 2018. Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA'18).

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton and Matt Post. 2017. Sockeye: A Toolkit for Neural Machine Translation. ArXiv e-prints.

Research with Sockeye

Sockeye has been used for both academic and industrial research. A list of known publications that use Sockeye is shown below. If you know more, please let us know or submit a pull request (last updated: October 2020).

2020

  • Dinu, Georgiana, Prashant Mathur, Marcello Federico, Stanislas Lauly, Yaser Al-Onaizan. "Joint translation and unit conversion for end-to-end localization." arXiv preprint arXiv:2004.05219 (2020)
  • Hisamoto, Sorami, Matt Post, Kevin Duh. "Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?" Transactions of the Association for Computational Linguistics, Volume 8 (2020)
  • Naradowsky, Jason, Xuan Zhan, Kevin Duh. "Machine Translation System Selection from Bandit Feedback." arXiv preprint arXiv:2002.09646 (2020)
  • Niu, Xing, Prashant Mathur, Georgiana Dinu, Yaser Al-Onaizan. "Evaluating Robustness to Input Perturbations for Neural Machine Translation". arXiv preprint arXiv:2005.00580 (2020)
  • Niu, Xing, Marine Carpuat. "Controlling Neural Machine Translation Formality with Synthetic Supervision." Proceedings of AAAI (2020)
  • Keung, Phillip, Julian Salazar, Yichao Liu, Noah A. Smith. "Unsupervised Bitext Mining and Translation via Self-Trained Contextual Embeddings." arXiv preprint arXiv:2010.07761 (2020).
  • Sokolov, Alex, Tracy Rohlin, Ariya Rastrow. "Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion." arXiv preprint arXiv:2006.14194 (2020)
  • Stafanovičs, Artūrs, Toms Bergmanis, Mārcis Pinnis. "Mitigating Gender Bias in Machine Translation with Target Gender Annotations." arXiv preprint arXiv:2010.06203 (2020)
  • Stojanovski, Dario, Alexander Fraser. "Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation." arXiv preprint arXiv:2004.14927 (2020)
  • Zhang, Xuan, Kevin Duh. "Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems." Transactions of the Association for Computational Linguistics, Volume 8 (2020)
  • Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant, Nandar Win Min, and Thepchai Supnithi, "Unsupervised Neural Machine Translation between Myanmar Sign Language and Myanmar Language", Journal of Intelligent Informatics and Smart Technology, April 1st Issue, 2020, pp. 53-61. (Submitted December 21, 2019; accepted March 6, 2020; revised March 16, 2020; published online April 30, 2020)
  • Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe and Thepchai Supnithi, "Neural Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)", In Proceedings of the 18th International Conference on Computer Applications (ICCA 2020), Feb 27-28, 2020, Yangon, Myanmar, pp. 219-227
  • Müller, Mathias, Annette Rios, Rico Sennrich. "Domain Robustness in Neural Machine Translation." Proceedings of AMTA (2020)
  • Rios, Annette, Mathias Müller, Rico Sennrich. "Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation." Proceedings of the 5th WMT: Research Papers (2020)

2019

  • Agrawal, Sweta, Marine Carpuat. "Controlling Text Complexity in Neural Machine Translation." Proceedings of EMNLP (2019)
  • Beck, Daniel, Trevor Cohn, Gholamreza Haffari. "Neural Speech Translation using Lattice Transformations and Graph Networks." Proceedings of TextGraphs-13 (EMNLP 2019)
  • Currey, Anna, Kenneth Heafield. "Zero-Resource Neural Machine Translation with Monolingual Pivot Data." Proceedings of EMNLP (2019)
  • Gupta, Prabhakar, Mayank Sharma. "Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles." IEEE International Journal of Semantic Computing (2019)
  • Hu, J. Edward, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. "Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting." Proceedings of NAACL-HLT (2019)
  • Rosendahl, Jan, Christian Herold, Yunsu Kim, Miguel Graça, Weiyue Wang, Parnia Bahar, Yingbo Gao and Hermann Ney. "The RWTH Aachen University Machine Translation Systems for WMT 2019." Proceedings of the 4th WMT: Research Papers (2019)
  • Thompson, Brian, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp Koehn. "Overcoming catastrophic forgetting during domain adaptation of neural machine translation." Proceedings of NAACL-HLT 2019 (2019)
  • Tättar, Andre, Elizaveta Korotkova, Mark Fishel “University of Tartu’s Multilingual Multi-domain WMT19 News Translation Shared Task Submission” Proceedings of 4th WMT: Research Papers (2019)
  • Thazin Myint Oo, Ye Kyaw Thu and Khin Mar Soe, "Neural Machine Translation between Myanmar (Burmese) and Rakhine (Arakanese)", In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, NAACL-2019, June 7th 2019, Minneapolis, United States, pp. 80-88

2018

  • Domhan, Tobias. "How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures". Proceedings of 56th ACL (2018)
  • Kim, Yunsu, Yingbo Gao, and Hermann Ney. "Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies." arXiv preprint arXiv:1905.05475 (2019)
  • Korotkova, Elizaveta, Maksym Del, and Mark Fishel. "Monolingual and Cross-lingual Zero-shot Style Transfer." arXiv preprint arXiv:1808.00179 (2018)
  • Niu, Xing, Michael Denkowski, and Marine Carpuat. "Bi-directional neural machine translation with synthetic parallel data." arXiv preprint arXiv:1805.11213 (2018)
  • Niu, Xing, Sudha Rao, and Marine Carpuat. "Multi-Task Neural Models for Translating Between Styles Within and Across Languages." COLING (2018)
  • Post, Matt and David Vilar. "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation." Proceedings of NAACL-HLT (2018)
  • Schamper, Julian, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. "The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018." Proceedings of the 3rd WMT: Shared Task Papers (2018)
  • Schulz, Philip, Wilker Aziz, and Trevor Cohn. "A stochastic decoder for neural machine translation." arXiv preprint arXiv:1805.10844 (2018)
  • Alkhouli, Tamer, Gabriel Bretschner, and Hermann Ney. "On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation." Proceedings of the 3rd WMT: Research Papers (2018)
  • Tang, Gongbo, Rico Sennrich, and Joakim Nivre. "An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation." Proceedings of 3rd WMT: Research Papers (2018)
  • Thompson, Brian, Huda Khayrallah, Antonios Anastasopoulos, Arya McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, and Philipp Koehn. "Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation." arXiv preprint arXiv:1809.05218 (2018)
  • Vilar, David. "Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models." Proceedings of NAACL-HLT (2018)
  • Vyas, Yogarshi, Xing Niu and Marine Carpuat “Identifying Semantic Divergences in Parallel Text without Annotations”. Proceedings of NAACL-HLT (2018)
  • Wang, Weiyue, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. "Neural Hidden Markov Model for Machine Translation". Proceedings of 56th ACL (2018)
  • Zhang, Xuan, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, and Marine Carpuat. "An Empirical Exploration of Curriculum Learning for Neural Machine Translation." arXiv preprint arXiv:1811.00739 (2018)
  • Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant and Nandar Win Min, "Neural Machine Translation between Myanmar Sign Language and Myanmar Written Text", In the second Regional Conference on Optical character recognition and Natural language processing technologies for ASEAN languages 2018 (ONA 2018), December 13-14, 2018, Phnom Penh, Cambodia.
  • Tang, Gongbo, Mathias Müller, Annette Rios and Rico Sennrich. "Why Self-attention? A Targeted Evaluation of Neural Machine Translation Architectures." Proceedings of EMNLP (2018)

2017

  • Domhan, Tobias and Felix Hieber. "Using target-side monolingual data for neural machine translation through multi-task learning." Proceedings of EMNLP (2017).

Comments
  • Unable to install the requirements

    Hello,

    I have installed Sockeye in an Anaconda (Conda 4.10.3 with Python 3.8.8) environment as explained here: https://awslabs.github.io/sockeye/setup.html

    But I can't install mxnet:

    Could not find a version that satisfies the requirement mxnet==1.8.0.post0 I tried it with conda install -c anaconda mxnet and with pip install mxnet==1.8.0.post0, but nothing could help.

    Do you know why I can't install mxnet?

    I want to train the model described here: https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/

    opened by RamoramaInteractive 39
  • Sockeye freezes at new validation start [v1.18.54]

    For the third time in a few days and on 2 independent trainings, I observed that Sockeye freezes after starting some new validation, i.e. it does not crash, does not send any warning, but stops going forward (0% on CPU/GPU). Here are the last lines of my log file before this issue occurs:

    [2018-09-24:21:45:33:INFO:sockeye.training:__call__] Epoch[3] Batch [270000]    Speed: 650.11 samples/sec 22445.47 tokens/sec 2.06 updates/sec  perplexity=3.546109
    [2018-09-24:21:45:34:INFO:root:save_params_to_file] Saved params to "/run/work/generic_fr2en/model_baseline/params.00007"
    [2018-09-24:21:45:34:INFO:sockeye.training:fit] Checkpoint [7]  Updates=270000 Epoch=3 Samples=81602144 Time-cost=4711.141 Updates/sec=2.123
    [2018-09-24:21:45:34:INFO:sockeye.training:fit] Checkpoint [7]  Train-perplexity=3.546109
    [2018-09-24:21:45:36:INFO:sockeye.training:fit] Checkpoint [7]  Validation-perplexity=3.752938
    [2018-09-24:21:45:36:INFO:sockeye.utils:log_gpu_memory_usage] GPU 0: 10093/11178 MB (90.29%) GPU 1: 9791/11178 MB (87.59%) GPU 2: 9795/11178 MB (87.63%) GPU 3: 9789/11178 MB (87.57%)
    [2018-09-24:21:45:36:INFO:sockeye.training:collect_results] Decoder-6 finished: {'rouge2-val': 0.4331754429258854, 'rouge1-val': 0.6335038896620699, 'decode-walltime-val': 3375.992604494095, 'rougel-val': 0.5947101830587342, 'avg-sec-per-sent-val': 1.794786073627908, 'chrf-val': 0.6585073715647153, 'bleu-val': 0.43439024563194745}
    [2018-09-24:21:45:36:INFO:sockeye.training:start_decoder] Starting process: Decoder-7
    

    So at this point, it has outputted params.00007. When I kill the Sockeye process and restart to continue training, it starts again after validation 6 (update 260000), then later overwrites params.00007, starts Decoder-7 and continues training successfully.

    I noted that the freezing occurs at the same moment as in #462, but I have no idea whether it is related to this case. I checked all parameters of the last param file after the issue with numpy.isnan() and no nans were reported.

    opened by franckbrl 30
  • How to measure the BLEU of training/translation

    Hi, I just trained an 8-layer RNN model and got the following result:

    python -m sockeye.train -s corpus.tc.BPE.de \
                            -t corpus.tc.BPE.en \
                            -vs newstest2016.tc.BPE.de \
                            -vt newstest2016.tc.BPE.en \
                            --num-embed 512 \
                            --rnn-num-hidden 512 \
                            --rnn-attention-type dot \
                            --embed-dropout=0.2 \
                            --rnn-decoder-hidden-dropout=0.2 \
                            --max-seq-len 50 \
                            --decode-and-evaluate 500 \
                            --batch-size 128 \
                            --batch-type sentence \
                            -o gnmt_model \
                            --optimized-metric bleu \
                            --initial-learning-rate=0.0001 \
                            --learning-rate-reduce-num-not-improved=8 \
                            --learning-rate-reduce-factor=0.7 \
                            --weight-init xavier --weight-init-scale 3.0 \
                            --weight-init-xavier-factor-type avg \
                            --lock-dir ~/.temp/ \
                            --num-layers 8:8 \
                            --device-ids 0 1
    
    [2018-07-18:00:56:34:INFO:sockeye.training:fit] Training finished. Best checkpoint: 60. Best validation bleu: 0.746082
    [2018-07-18:00:56:34:INFO:sockeye.utils:__exit__] Releasing GPU 1.
    [2018-07-18:00:56:34:INFO:sockeye.utils:__exit__] Releasing GPU 0.
    

    Evaluate translation:

    [INFO:__main__] bleu	(s_opt)	chrf	(s_opt)
    0.171	(-)	0.484	(-)
    

    Is this BLEU value correct? It is far from the BLEU value (20+) reported in the paper.

    By the way, how can I use Sockeye to build a GNMT model that matches TensorFlow's configuration?

    opened by xinyu-intel 26
  • Sampling chooses vocab index that does not exist with certain random seeds

    Running into the following error while sampling with certain seeds:

    Traceback (most recent call last):
      File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 269, in <module>
        main()
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 46, in main
        run_translate(args)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 155, in run_translate
        input_is_json=args.json_input)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 237, in read_and_translate
        chunk_time = translate(output_handler, chunk, translator)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 260, in translate
        trans_outputs = translator.translate(trans_inputs)
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 861, in translate
        results.append(self._make_result(trans_input, translation))
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 963, in _make_result
        target_tokens = [self.vocab_target_inv[target_id] for target_id in target_ids]
      File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 963, in <listcomp>
        target_tokens = [self.vocab_target_inv[target_id] for target_id in target_ids]
    KeyError: 7525
    

    I am calling Sockeye with a script such as

    OMP_NUM_THREADS=1 python -m sockeye.translate \
                    -i $data_sub/$corpus.pieces.src \
                    -o $samples_sub_sub/$corpus.pieces.$seed.trg \
                    -m $model_path \
                    --sample \
                    --seed $seed \
                    --length-penalty-alpha 1.0 \
                    --device-ids 0 \
                    --batch-size 64 \
                    --disable-device-locking
    

    Sockeye and MXNet versions:

    [2020-08-25:17:03:03:INFO:sockeye.utils:log_sockeye_version] Sockeye version 2.1.17, commit 92a020a25cbe75935c700ce2f29b286b31a87189, path /net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/__init__.py
    [2020-08-25:17:03:03:INFO:sockeye.utils:log_mxnet_version] MXNet version 1.6.0, path /net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/mxnet/__init__.py
    

    Details that may be relevant:

    • This only happens for certain random --seeds
    • Running on a Tesla V100
    • OS: Ubuntu 16.04.6 LTS
    • the MXNet version in the CUDA 10.2 requirements file (https://github.com/awslabs/sockeye/blob/master/requirements/requirements.gpu-cu102.txt) is no longer available on PyPI. I had to install mxnet-cu102mkl==1.6.0.post0.

    The vocabulary does not have this index:

    
    [INFO:sockeye.vocab] Vocabulary (7525 words) loaded from "/net/cephfs/scratch/mathmu/map-volatility/models/bel-eng/baseline/vocab.src.0.json"
    [INFO:sockeye.vocab] Vocabulary (7525 words) loaded from "/net/cephfs/scratch/mathmu/map-volatility/models/bel-eng/baseline/vocab.trg.0.json"
    

    I suspect that the sampling procedure somehow assumes 1-based indexing, whereas the vocabulary is 0-indexed. This would mean that there is a small chance that max_vocab_id+1 is picked as the next token.

    Looking at the inference code, I am not sure yet why this happens.

    sockeye_2 
    opened by bricksdont 21
  • Sockeye transformer has different total number of trainable parameters from T2T Transformer

    I read your arXiv paper and found that the total number of trainable parameters of the Sockeye transformer is 62,946,611 on the EN→DE task, while the number is 60,668,928 for the T2T transformer. I wonder what contributes to this difference?

    opened by szhengac 21
  • [WIP] Sockeye 2 Performance Optimizations

    Made changes to Sockeye 2 to improve the performance of the Transformer model in machine translation. The current changes only apply to inference; optimizations to training are planned before this pull request is completed.

    A list of changes detailed below:

    1. Replaced the batch_dot ops in multi-head attention with ops that do not require folding the heads into the batch dimension; one caveat is that the batch and sequence_length dimensions are swapped, requiring some adjustments to other parts of the code to account for the change
    2. Removed the take ops that were applied to the encoder states, as they do not change on different beams; this effectively cuts compute time for these takes in half
    3. Gathered the input token ids into a numpy array on CPU before sending them all to the GPU at the beginning of beam search, rather than sending each batch element to the GPU one at a time
    4. Set the data type of arrays during beam search computation to match the model's data type, rather than explicitly setting it to fp32

    Pull Request Checklist

    • [ ] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [ ] Unit tests pass (pytest)
    • [ ] Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
    • [ ] System tests pass (pytest test/system)
    • [ ] Passed code style checking (./style-check.sh)
    • [x] You have considered writing a test
    • [ ] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [ ] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    enhancement sockeye_2 
    opened by blchu 20
  • Provide multiple source vocabularies as argument

    Following issue #527, --source-vocab can now take multiple files for additional source factor vocabularies.

    We may want to consider changing a few variable/parameter names. For instance in train.py, now that args.source_vocab is a list, we may rename it to args.source_vocabs (parameter --source-vocabs), but it would probably not go well with the variable source_vocabs (produced by create_data_iters_and_vocabs()). This would also lead to a backwards-incompatible change.

    Unit tests output 11 failures on the current master branch. Since the content of this PR added no new failures, I'm assuming the tests pass.

    Pull Request Checklist

    • [x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [x] Unit tests pass (pytest)
    • [ ] Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
    • [x] System tests pass (pytest test/system)
    • [x] Passed code style checking (./style-check.sh)
    • [ ] You have considered writing a test
    • [x] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [x] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    enhancement 
    opened by franckbrl 20
  • Sockeye 2 Interleaved Multi-head Attention Operators

    Replaced the batched dot product in multi-head attention with interleaved_matmul attention operators to improve performance. This also changes the batch-major data to time-major format while in the model to comply with the new operator requirements.

    Pull Request Checklist

    • [x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [ ] Unit tests pass (pytest)
    • [x] Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
    • [ ] System tests pass (pytest test/system)
    • [ ] Passed code style checking (./style-check.sh)
    • [x] You have considered writing a test
    • [ ] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [ ] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    enhancement sockeye_2 
    opened by blchu 19
  • serve sockeye using mxnet-model-server

    Hello!

    mxnet-model-server seems to be a neat way to serve MXNet models. Does Sockeye have a plan to add a serving function using mxnet-model-server?

    Besides the MXNet model parameter file and the symbol.json, mxnet-model-server requires a customized data preprocessing and postprocessing pipeline. If I want to write the code myself, is it feasible to do it with Sockeye? Would you have some suggestions for that?

    opened by boliangz 18
  • Bug with beam-size=1?

    In trying to get tests passing with scoring (#538), I have turned up some weird behavior with scores output by Sockeye. Here are two commands using a transformer model built in the system tests. Notice:

    • The invocations differ only in the beam size (1 or 2)
    • --skip-topk is not enabled
    • With beam size of 1, the scores output should be impossible, since Sockeye outputs negative logprobs.

    Any ideas?

    CC: @bricksdont

    $ python3 -m sockeye.translate -i src --output-type translation_with_score --use-cpu -m model --beam-size 1 2> /dev/null | head
    -10.556	7 5 2 7 3 6 5 4 7 7
    -10.727	9 2 4 1 6 7 8 6 8
    -12.788	8 6 8 7
    -10.413	0 5 0 7 5 9 0 6 3 1
    -10.731	7 9 2 6 8 5 0 6 5
    -12.490	5 6 3 2
    -inf	
    -11.242	3 9 1 3 8 7
    -15.759	2 1
    -10.506	8 8 8 2 4 4 5 5 2 5
    $ python3 -m sockeye.translate -i src --output-type translation_with_score --use-cpu -m model --beam-size 2 2> /dev/null | head
    0.003	7 5 2 7 3 6 5 4 7 7
    0.001	9 2 4 1 6 7 8 6 8
    0.000	8 6 8 7
    0.002	0 5 0 7 5 9 0 6 3 1
    0.001	7 9 2 6 8 5 0 6 5
    0.001	5 6 3 2
    -inf	
    0.001	3 9 1 3 8 7
    0.001	2 1
    0.002	8 8 8 2 4 4 5 5 2 5
    
    opened by mjpost 18
  • Source factors

    Added source factors, as described in:

    Linguistic Input Features Improve Neural Machine Translation.
    Rico Sennrich & Barry Haddow
    In Proceedings of the First Conference on Machine Translation. Berlin, Germany, pp. 83-91.
    

    Source factors are enabled by passing --source-factors file1 [file2 ...] (-sf), where file1, etc. are token-parallel to the source (-s). This option can be passed either to sockeye.train or to the data preparation step, if data sharding is used. An analogous parameter, --validation-source-factors, is used to pass factors for validation data. The flag --source-factors-num-embed D1 [D2 ...] denotes the embedding dimensions. These are concatenated with the source word dimension (--num-embed), which can continue to be tied to the target (--weight-tying --weight-tying-type=src_trg).

    At test time, the input sentence and its factors can be passed via multiple parallel files (--input and --input-factors) or through stdin with token-level annotations, separated by |. Another way is to send a string-serialized JSON object to the CLI through stdin, which needs to have a top-level key called 'text' and optionally a key 'factors' of type List[str].
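
    For illustration, here is a sketch of how these options might be combined. The file names (train.src, train.src.pos, etc.) and the factor tags are hypothetical placeholders; the flags are the ones described above.

    # Hypothetical training call with one token-parallel source factor file (e.g. POS tags).
    python -m sockeye.train -s train.src -t train.trg \
                            -vs dev.src -vt dev.trg \
                            --source-factors train.src.pos \
                            --validation-source-factors dev.src.pos \
                            --source-factors-num-embed 16 \
                            -o factor_model

    # At test time, factors can come from a token-parallel file ...
    python -m sockeye.translate -m factor_model --input test.src --input-factors test.src.pos

    # ... or from a string-serialized JSON object on stdin with keys 'text' and 'factors'.
    echo '{"text": "das ist ein Test", "factors": ["DET VERB DET NOUN"]}' \
        | python -m sockeye.translate -m factor_model --json-input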

    Pull Request Checklist

    • [x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
    • [x] Unit tests pass (pytest)
    • [x] System tests pass (pytest test/system)
    • [x] Passed code style checking (./pre-commit.sh or manual run of pylint & mypy)
    • [x] You have considered writing a test
    • [x] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
    • [x] Updated CHANGELOG.md

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    backwards_incompatible 
    opened by mjpost 18
  • "Specify or omit --shared-vocab consistently when training and preparing the data" error

    Recently, when trying to use Sockeye 3, it always returns the error "Specify or omit --shared-vocab consistently when training and preparing the data". I used sockeye-prepare-data to prepare the data without specifying --shared-vocab, and when running sockeye-train, the error shows up. I've confirmed that neither the preparation nor the training procedure specified --shared-vocab. Is there anything I can do to fix this problem, or does Sockeye 3 only support a shared vocab?

    Best regards Peter

    opened by NLP-Peter 1

Releases (3.1.29)
  • 3.1.29(Dec 12, 2022)

    [3.1.29]

    Changed

    • Running sockeye-evaluate no longer applies text tokenization for TER (same behavior as other metrics).
    • Turned on type checking for all sockeye modules except test_utils and addressed resulting type issues.
    • Refactored code in various modules without changing user-level behavior.

    [3.1.28]

    Added

    • Added kNN-MT model from Khandelwal et al., 2021.
      • Installation: see the faiss documentation -- installation via conda is recommended.
      • Building a faiss index from a sockeye model takes two steps:
        • Generate decoder states: sockeye-generate-decoder-states -m [model] --source [src] --target [tgt] --output-dir [output dir]
        • Build index: sockeye-knn -i [input_dir] -o [output_dir] -t [faiss_index_signature] where input_dir is the same as output_dir from the sockeye-generate-decoder-states command.
        • Faiss index signature reference: see here
      • Running inference using the built index: sockeye-translate ... --knn-index [index_dir] --knn-lambda [interpolation_weight] where index_dir is the same as output_dir from the sockeye-knn command.
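
      Putting the steps above together, a sketch of the full kNN-MT pipeline. The model and data paths are placeholders, and the faiss index signature and interpolation weight are example values only:

      # 1) Generate decoder states from a trained model.
      sockeye-generate-decoder-states -m model_dir --source train.src --target train.trg --output-dir states_dir

      # 2) Build a faiss index over the generated states (index signature is an example).
      sockeye-knn -i states_dir -o index_dir -t OPQ16_64,IVF512,PQ16x4fs

      # 3) Translate with kNN interpolation enabled.
      sockeye-translate -m model_dir --input test.src --knn-index index_dir --knn-lambda 0.8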
  • 3.1.27(Nov 6, 2022)

    [3.1.27]

    Changed

    • allow torch 1.13 in requirements.txt
    • Replaced deprecated torch.testing.assert_allclose with torch.testing.assert_close for PyTorch 1.14 compatibility.

    [3.1.26]

    Added

    • Added --tf32 0|1 bool option (torch.backends.cuda.matmul.allow_tf32) enabling 10-bit-precision (19 bits total) transparent float32 acceleration. Defaults to true for backward compatibility with torch < 1.12. Continuing training with a different --tf32 setting is allowed.

    Changed

    • device.init_device() called by train, translate, and score
    • allow torch 1.12 in requirements.txt

    [3.1.25]

    Changed

    • Updated to sacrebleu==2.3.1. Changed default BLEU floor smoothing offset from 0.01 to 0.1.

    [3.1.24]

    Fixed

    • Updated DeepSpeed checkpoint conversion to support newer versions of DeepSpeed.

    [3.1.23]

    Changed

    • Change decoder softmax size logging level from info to debug.

    [3.1.22]

    Added

    • log beam search avg output vocab size

    Changed

    • common base Search for GreedySearch and BeamSearch
    • .pylintrc: suppress warnings about deprecated pylint warning suppressions

    [3.1.21]

    Fixed

    • Send skip_nvs and nvs_thresh args now to Translator constructor in sockeye-translate instead of ignoring them.

    [3.1.20]

    Added

    • Added training support for DeepSpeed.
      • Installation: pip install deepspeed
      • Usage: deepspeed --no_python ... sockeye-train ...
      • DeepSpeed mode uses Zero Redundancy Optimizer (ZeRO) stage 1 (Rajbhandari et al., 2019).
      • Run in FP16 mode with --deepspeed-fp16 or BF16 mode with --deepspeed-bf16.
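
      As an illustration of the usage line above, a minimal sketch; data and output paths are placeholders:

      # Install DeepSpeed, then launch sockeye-train through the deepspeed launcher.
      pip install deepspeed

      # --deepspeed-fp16 selects FP16 mode; --deepspeed-bf16 would select BF16 instead.
      deepspeed --no_python sockeye-train \
          -s train.src -t train.trg -vs dev.src -vt dev.trg \
          -o ds_model --deepspeed-fp16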

    [3.1.19]

    Added

    • Clean up GPU and CPU memory used during training initialization before starting the main training loop.

    Changed

    • Refactored training code in advance of adding DeepSpeed support:
      • Moved logic for flagging interleaved key-value parameters from layers.py to model.py.
      • Refactored LearningRateScheduler API to be compatible with PyTorch/DeepSpeed.
      • Refactored optimizer and learning rate scheduler creation to be modular.
      • Migrated to ModelWithLoss API, which wraps a Sockeye model and its losses in a single module.
      • Refactored primary and secondary worker logic to reduce redundant calculations.
      • Refactored code for saving/loading training states.
      • Added utility code for managing model/training configurations.

    Removed

    • Removed unused training option --learning-rate-t-scale.

    [3.1.18]

    Added

    • Added sockeye-train and sockeye-translate option --clamp-to-dtype that clamps outputs of transformer attention, feed-forward networks, and process blocks to the min/max finite values for the current dtype. This can prevent inf/nan values from overflow when running large models in float16 mode. See: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
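
    A sketch of how this might be used for float16 inference; model and input paths are placeholders, and this assumes --clamp-to-dtype is a boolean switch:

    # Clamp layer outputs to the finite range of float16 to avoid inf/nan from overflow.
    sockeye-translate -m model_dir --input test.src --dtype float16 --clamp-to-dtype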

    [3.1.17]

    Added

    • Added support for offline model quantization with sockeye-quantize.
      • Pre-quantizing a model avoids the load-time memory spike of runtime quantization. For example, a float16 model loads directly as float16 instead of loading as float32 then casting to float16.

    [3.1.16]

    Added

    • Added nbest list reranking options using isometric translation criteria as proposed in an ICASSP 2021 paper https://arxiv.org/abs/2110.03847. To use this feature pass a criterion (isometric-ratio, isometric-diff, isometric-lc) when specifying --metric.
    • Added --output-best-non-blank to output non-blank best hypothesis from the nbest list.

    [3.1.15]

    Fixed

    • Fix type of valid_length to be pt.Tensor instead of Optional[pt.Tensor] = None for jit tracing
  • 3.1.14(May 5, 2022)

    [3.1.14]

    Added

    • Added the implementation of Neural vocabulary selection to Sockeye as presented in our NAACL 2022 paper "The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation" (Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber).
      • To use NVS simply specify --neural-vocab-selection to sockeye-train. This will train a model with Neural Vocabulary Selection that is automatically used by sockeye-translate. If you want to look at translations without vocabulary selection, specify --skip-nvs as an argument to sockeye-translate.
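
      For illustration, a sketch of the NVS workflow described above; paths are placeholders, and the flag syntax follows the description and may accept additional options:

      # Train a model with Neural Vocabulary Selection.
      sockeye-train -s train.src -t train.trg -vs dev.src -vt dev.trg \
          -o nvs_model --neural-vocab-selection

      # NVS is applied automatically at inference time; --skip-nvs disables it for comparison.
      sockeye-translate -m nvs_model --input test.src --skip-nvs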

    [3.1.13]

    Added

    • Added sockeye-train argument --no-reload-on-learning-rate-reduce that disables reloading the best training checkpoint when reducing the learning rate. This currently only applies to the plateau-reduce learning rate scheduler since other schedulers do not reload checkpoints.
  • 3.1.12(Apr 26, 2022)

    [3.1.12]

    Fixed

    • Fix scoring with batches of size 1 (which may occur when |data| % batch_size == 1).

    [3.1.11]

    Fixed

    • When resuming training with a fully trained model, sockeye-train will correctly exit without creating a duplicate (but separately numbered) checkpoint.
  • 3.1.10(Apr 12, 2022)

    [3.1.10]

    Fixed

    • When loading parameters, SockeyeModel now ignores false positive missing parameters for traced modules. These modules use the same parameters as their original non-traced versions.
  • 3.1.9(Apr 11, 2022)

    [3.1.9]

    Changed

    • Clarified usage of batch_size in Translator code.

    [3.1.8]

    Fixed

    • When saving parameters, SockeyeModel now skips parameters for traced modules because these modules are created at runtime and use the same parameters as non-traced versions. When loading parameters, SockeyeModel ignores parameters for traced modules that may have been saved by earlier versions.
  • 3.1.7(Mar 23, 2022)

    [3.1.7]

    Changed

    • SockeyeModel components are now traced regardless of whether inference_only is set, including for the CheckpointDecoder during training.

    [3.1.6]

    Changed

    • Moved offsetting of topk scores out of the (traced) TopK module. This allows sending requests of variable batch size to the same Translator/Model/BeamSearch instance.

    [3.1.5]

    Changed

    • Allow PyTorch 1.11 in requirements
  • 3.1.4(Mar 10, 2022)

  • 3.1.3(Feb 28, 2022)

    [3.1.3]

    Added

    • Added support for adding source prefixes to the input in JSON format during inference.

    [3.1.2]

    Changed

    • Optimized creation of source length mask by using expand instead of repeat_interleave.

    [3.1.1]

    Changed

    • Updated torch dependency to 1.10.x (torch>=1.10.0,<1.11.0)
  • 3.1.0(Feb 11, 2022)

    [3.1.0]

    Sockeye is now exclusively based on PyTorch.

    Changed

    • Renamed x_pt modules to x. Updated entry points in setup.py.

    Removed

    • Removed MXNet from the codebase
    • Removed device locking / GPU acquisition logic. Removed dependency on portalocker.
    • Removed arguments --softmax-temperature, --weight-init-*, --mc-dropout, --horovod, --device-ids
    • Removed all MXNet-related tests
  • 3.0.15(Feb 9, 2022)

    [3.0.15]

    Fixed

    • Fixed GPU-based scoring by copying to cpu tensor first before converting to numpy.

    [3.0.14]

    Added

    • Added support for Translation Error Rate (TER) metric as implemented in sacrebleu==1.4.14. Checkpoint decoder metrics will now include TER scores and early stopping can be determined via TER improvements (--optimized-metric ter)
  • 3.0.13(Feb 3, 2022)

    [3.0.13]

    Changed

    • use expand instead of repeat for attention masks to not allocate additional memory
    • avoid repeated transpose for initializing cached encoder-attention states in the decoder.

    [3.0.12]

    Removed

    • Removed unused code for Weight Normalization. Minor code cleanups.

    [3.0.11]

    Fixed

    • Fixed training with a single, fixed learning rate instead of a rate scheduler (--learning-rate-scheduler none --initial-learning-rate ...).
  • 3.0.10(Jan 19, 2022)

    [3.0.10]

    Changed

    • End-to-end trace decode_step of the Sockeye model. This creates less overhead during decoding and yields a small speedup.

    [3.0.9]

    Fixed

    • Fixed not calling the traced target embedding module during inference.

    [3.0.8]

    Changed

    • Add support for JIT tracing source/target embeddings and JIT scripting the output layer during inference.
  • 3.0.7(Dec 20, 2021)

    [3.0.7]

    Changed

    • Improve training speed by using torch.nn.functional.multi_head_attention_forward for self- and encoder-attention during training. This requires reorganizing the parameter layout of the key-value input projections, as the current Sockeye attention interleaves them for faster inference. Attention masks (both source masks and autoregressive masks) need some shape adjustments, as requirements for the fused MHA op differ slightly.
      • Non-interleaved format for joint key-value input projection parameters: in_features=hidden, out_features=2*hidden -> Shape: (2*hidden, hidden)
      • Interleaved format for joint-key-value input projection stores key and value parameters, grouped by heads: Shape: ((num_heads * 2 * hidden_per_head), hidden)
      • Models save and load key-value projection parameters in interleaved format.
      • When model.training == True key-value projection parameters are put into non-interleaved format for torch.nn.functional.multi_head_attention_forward
      • When model.training == False, i.e. model.eval() is called, key-value projection parameters are again converted into interleaved format in place.

    [3.0.6]

    Fixed

    • Fixed checkpoint decoder issue that prevented using bleu as --optimized-metric for distributed training (#995).

    [3.0.5]

    Fixed

    • Fixed data download in multilingual tutorial.
  • 3.0.4(Dec 13, 2021)

    [3.0.4]

    • Make sure data permutation indices are in int64 format (doesn't seem to be the case by default on all platforms).

    [3.0.3]

    Fixed

    • Fixed ensemble decoding for models without target factors.

    [3.0.2]

    Changed

    • sockeye-translate: Beam search now computes and returns secondary target factor scores. Secondary target factors do not participate in beam search, but are greedily chosen at every time step. Accumulated scores for secondary factors are not normalized by length. Factor scores are included in JSON output (--output-type json).
    • sockeye-score now returns tab-separated scores for each target factor. Users can decide how to combine factor scores depending on the downstream application. Score for the first, primary factor (i.e. output words) are normalized, other factors are not.

    [3.0.1]

    Fixed

    • Parameter averaging (sockeye-average) now always uses the CPU, which enables averaging parameters from GPU-trained models on CPU-only hosts.
  • 3.0.0(Nov 30, 2021)

    [3.0.0] Sockeye 3: Fast Neural Machine Translation with PyTorch

    Sockeye is now based on PyTorch. We maintain backwards compatibility with MXNet models in version 2.3.x until 3.1.0. If MXNet 2.x is installed, Sockeye can run with either PyTorch or MXNet, but MXNet is no longer strictly required.

    Added

    • Added model converter CLI sockeye.mx_to_pt that converts MXNet models to PyTorch models.
    • Added --apex-amp training argument that runs entire model in FP16 mode, replaces --dtype float16 (requires Apex).
    • Training automatically uses Apex fused optimizers if available (requires Apex).
    • Added training argument --label-smoothing-impl to choose label smoothing implementation (default of mxnet uses the same logic as MXNet Sockeye 2).

    Changed

    • CLI names point to the PyTorch code base (e.g. sockeye-train etc.).
    • MXNet-based CLIs are now accessible via sockeye-<name>-mx.
    • MXNet code requires MXNet >= 2.0 since we adopted the new numpy interface.
    • sockeye-train now uses PyTorch's distributed data-parallel mode for multi-process (multi-GPU) training. Launch with: torchrun --no_python --nproc_per_node N sockeye-train --dist ...
    • Updated the quickstart tutorial to cover multi-device training with PyTorch Sockeye.
    • Changed --device-ids argument (plural) to --device-id (singular). For multi-GPU training, see distributed mode noted above.
    • Updated default value: --pad-vocab-to-multiple-of 8
    • Removed --horovod argument used with horovodrun (use --dist with torchrun).
    • Removed --optimizer-params argument (use --optimizer-betas, --optimizer-eps).
    • Removed --no-hybridization argument (use PYTORCH_JIT=0, see Disable JIT for Debugging).
    • Removed --omp-num-threads argument (use --env=OMP_NUM_THREADS=N).

    Removed

    • Removed support for constrained decoding (both positive and negative lexical constraints)
    • Removed support for beam histories
    • Removed --amp-scale-interval argument.
    • Removed --kvstore argument.
    • Removed arguments: --weight-init, --weight-init-scale --weight-init-xavier-factor-type, --weight-init-xavier-rand-type
    • Removed --decode-and-evaluate-device-id argument.
    • Removed arguments: --monitor-pattern', --monitor-stat-func
    • Removed CUDA-specific requirements files in requirements/
  • 2.3.24(Nov 5, 2021)

    [2.3.24]

    Added

    • Use of the safe yaml loader for the model configuration files.

    [2.3.23]

    Changed

    • Do not sort BIAS_STATE in beam search. It is constant across decoder steps.
  • 2.3.22(Sep 30, 2021)

    [2.3.22]

    Fixed

    • The previous commit introduced a regression for vocab creation. The result was that the vocabulary was created on the input characters rather than on tokens.

    [2.3.21]

    Added

    • Extended parallelization of data preparation to vocabulary and statistics creation while minimizing the overhead of sharding.

    [2.3.20]

    Added

    • Added debug logging for restrict_lexicon lookups

    [2.3.19]

    Changed

    • When training only the decoder (--fixed-param-strategy all_except_decoder), disable autograd for the encoder and embeddings to save memory.

    [2.3.18]

    Changed

  • 2.3.17(Jun 17, 2021)

    [2.3.17]

    Added

    • Added an alternative, faster implementation of greedy search. The '--greedy' flag to sockeye.translate will enable it. This implementation does not support hypothesis scores, batch decoding, or lexical constraints.
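
    For illustration, a sketch of enabling the faster greedy search; model and input paths are placeholders:

    python -m sockeye.translate -m model_dir --input test.src --greedy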

    [2.3.16]

    Added

    [2.3.15]

    Changed

    • Optimization: Decoder class is now a complete HybridBlock (no forward method).
  • 2.3.14(Apr 7, 2021)

    [2.3.14]

    Changed

    • Updated to MXNet 1.8.0
    • Removed dependency support for Cuda 9.2 (no longer supported by MXNet 1.8).
    • Added dependency support for Cuda 11.0 and 11.2.
    • Updated Python requirement to 3.7 and later. (Removed backporting dataclasses requirement)

    [2.3.13]

    Added

    • Target factors are now also collected for nbest translations (and stored in the JSON output handler).

    [2.3.12]

    Added

    • Added --config option to prepare_data CLI to allow setting commandline flags via a yaml config.
    • Flags for the prepare_data CLI are now stored in the output folder under args.yaml (equivalent to the behavior of sockeye_train)

    [2.3.11]

    Added

    • Added option prevent_unk to avoid generating <unk> token in beam search.
  • 2.3.10(Feb 8, 2021)

    [2.3.10]

    Changed

    • Make sure that the top N best params files are retained, even if N > --keep-last-params. This ensures that model averaging will not be crippled when keeping only a few params files during training. This can result in significant savings of disk space during training.

    [2.3.9]

    Added

  • 2.3.8(Jan 8, 2021)

    [2.3.8]

    Fixed

    • Fix problem identified in issue #925 that caused learning rate warmup to fail in some instances when doing continued training

    [2.3.7]

    Changed

    • Use dataclass module to simplify Config classes. No functional change.

    [2.3.6]

    Fixed

    • Fixes the problem identified in issue #890, where the lr_scheduler does not behave as expected when continuing training. The problem is that the lr_scheduler is kept as part of the optimizer, but the optimizer is not saved when saving state. Therefore, every time training is restarted, a new lr_scheduler is created with initial parameter settings. Fixed by saving and restoring the lr_scheduler separately.

    [2.3.5]

    Fixed

    • Fixed issue with LearningRateSchedulerPlateauReduce.repr printing out num_not_improved instead of reduce_num_not_improved.

    [2.3.4]

    Fixed

    • Fixed issue with dtype mismatch in beam search when translating with --dtype float16.

    [2.3.3]

    Changed

    • Upgraded SacreBLEU dependency of Sockeye to a newer version (1.4.14).
  • 2.3.2(Nov 18, 2020)

    [2.3.2]

    Fixed

    • Fixed edge case that unintentionally skips softmax for sampling if beam size is 1.

    [2.3.1]

    Fixed

    • Optimizing for BLEU/CHRF with horovod required the secondary workers to also create checkpoint decoders.

    [2.3.0]

    Added

    • Added support for target factors. If provided with additional target-side tokens/features (token-parallel to the regular target-side) at training time, the model can now learn to predict these in a multi-task setting. You can provide target factor data similar to source factors: --target-factors <factor_file1> [<factor_fileN>]. During training, Sockeye optimizes one loss per factor in a multi-task setting. The weight of the losses can be controlled by --target-factors-weight. At inference, target factors are decoded greedily, they do not participate in beam search. The predicted factor at each time step is the argmax over its separate output layer distribution. To receive the target factor predictions at inference time, use --output-type translation_with_factors.
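
    For illustration, a sketch of training and decoding with one target factor. The file names and loss weight are placeholders; the flags are the ones described above:

    # Train with a token-parallel target factor file and a per-factor loss weight.
    python -m sockeye.train -s train.src -t train.trg \
        -vs dev.src -vt dev.trg \
        --target-factors train.trg.case \
        --target-factors-weight 0.5 \
        -o tf_model

    # Request the greedily decoded factor predictions alongside the translation.
    python -m sockeye.translate -m tf_model --input test.src --output-type translation_with_factors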

    Changed

    • load_model(s) now returns a list of target vocabs.
    • Default source factor combination changed to sum (was concat before).
    • SockeyeModel class has three new properties: num_target_factors, target_factor_configs, and factor_output_layers.
  • 2.2.8(Nov 5, 2020)

    [2.2.8]

    Changed

    • Make source/target data parameters required for the scoring CLI to avoid cryptic error messages.

    [2.2.7]

    Added

    • Added an argument to specify the log level of secondary workers. Defaults to ERROR to hide any logs except for exceptions.

    [2.2.6]

    Fixed

    • Avoid a crash due to an edge case when no model improvement has been observed by the time the learning rate gets reduced for the first time.

    [2.2.5]

    Fixed

    • Enforce sentence batching for sockeye score tool, set default batch size to 56

    [2.2.4]

    Changed

    • Use softmax with length in DotAttentionCell.
    • Use contrib.arange_like in AutoRegressiveBias block to reduce number of ops.

    [2.2.3]

    Added

    • Log the absolute number of <unk> tokens in source and target data

    [2.2.2]

    Fixed

    • Fix: Guard against null division for small batch sizes.

    [2.2.1]

    Fixed

    • Fixes a corner case bug by which the beam decoder can wrongly return a best hypothesis with -infinite score.
  • 2.2.0(Oct 4, 2020)

    [2.2.0]

    Changed

    • Replaced multi-head attention with interleaved_matmul_encdec operators, which removes previously needed transposes and improves performance.

    • Beam search states and model layers now assume time-major format.

    [2.1.26]

    Fixed

    • Fixes a backwards incompatibility introduced in 2.1.17, which would prevent models trained with prior versions from being used for inference.

    [2.1.25]

    Changed

    • Reverting PR #772 as it causes issues with amp.

    [2.1.24]

    Changed

    • Make sure to write a final checkpoint when stopping with --max-updates, --max-samples or --max-num-epochs.

    [2.1.23]

    Changed

    • Updated to MXNet 1.7.0.
    • Re-introduced use of softmax with length parameter in DotAttentionCell (see PR #772).

    [2.1.22]

    Added

    • Re-introduced --softmax-temperature flag for sockeye.score and sockeye.translate.
  • 2.1.21(Aug 27, 2020)

    [2.1.21]

    Added

    • Added an optional ability to cache encoder outputs of model.

    [2.1.20]

    Fixed

    • Fixed a bug where the training state object was saved to disk before training metrics were added to it, leading to an inconsistency between the training state object and the metrics file (see #859).

    [2.1.19]

    Fixed

    • When loading a shard in Horovod mode, there is now a check that each non-empty bucket contains enough sentences to cover each worker's slice. If not, the bucket's sentences are replicated to guarantee coverage.

    [2.1.18]

    Fixed

    • Fixed a bug where sampling translation fails because an array is created in the wrong context.
  • 2.1.17(Aug 20, 2020)

    [2.1.17]

    Added

    • Added layers.SSRU, which implements a Simpler Simple Recurrent Unit as described in Kim et al, "From Research to Production and Back: Ludicrously Fast Neural Machine Translation" WNGT 2019.

    • Added ssru_transformer option to --decoder, which enables the usage of SSRUs as a replacement for the decoder-side self-attention layers.
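
    For illustration, a sketch of selecting the SSRU decoder at training time; paths are placeholders:

    python -m sockeye.train -s train.src -t train.trg -vs dev.src -vt dev.trg \
        --decoder ssru_transformer -o ssru_model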

    Changed

    • Reduced the number of arguments for MultiHeadSelfAttention.hybrid_forward(). previous_keys and previous_values should now be input together as previous_states, a list containing two symbols.
  • 2.1.16(Jul 31, 2020)

    [2.1.16]

    Fixed

    • Fixed batch sizing error introduced in version 2.1.12 (c00da52) that caused batch sizes to be multiplied by the number of devices. Batch sizing now works as documented (same as pre-2.1.12 versions).
    • Fixed max-word batching to properly size batches to a multiple of both --batch-sentences-multiple-of and the number of devices.

    [2.1.15]

    Added

    • Inference option --mc-dropout to use dropout during inference, leading to non-deterministic output. This option uses the same dropout parameters present in the model config file.

    [2.1.14]

    Added

    • Added sockeye.rerank option --output to specify output file.
    • Added sockeye.rerank option --output-reference-instead-of-blank to output reference line instead of best hypothesis when best hypothesis is blank.
  • 2.1.13(Jul 7, 2020)

    [2.1.13]

    Added

    • Training option --quiet-secondary-workers that suppresses console output for secondary workers when training with Horovod/MPI.
    • Set version of isort to <5.0.0 in requirements.dev.txt to avoid incompatibility between newer versions of isort and pylint.

    [2.1.12]

    Added

    • Batch type option max-word for max number of words including padding tokens (more predictable memory usage than word).
    • Batching option --batch-sentences-multiple-of that is similar to --round-batch-sizes-to-multiple-of but always rounds down (more predictable memory usage).
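
    For illustration, a sketch combining the two options above; paths and sizes are placeholders:

    # Size batches by max number of words (including padding) and round the number
    # of sentences down to a multiple of 8.
    python -m sockeye.train -s train.src -t train.trg -vs dev.src -vt dev.trg \
        --batch-type max-word --batch-size 4096 \
        --batch-sentences-multiple-of 8 -o model_dir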

    Changed

    • Default bucketing settings changed to width 8, max sequence length 95 (96 including BOS/EOS tokens), and no bucket scaling.
    • Argument --no-bucket-scaling replaced with --bucket-scaling which is False by default.

    [2.1.11]

    Changed

    • Updated sockeye.rerank module to use "add-k" smoothing for sentence-level BLEU.

    Fixed

    • Updated sockeye.rerank module to use current N-best format.
  • 2.1.10(Jun 23, 2020)

    [2.1.10]

    Changed

    • Changed to a cross-entropy loss implementation that avoids the use of SoftmaxOutput.

    [2.1.9]

    Added

    • Added training argument --ignore-extra-params to ignore extra parameters when loading models. The primary use case is continuing training with a model that has already been annotated with scaling factors (sockeye.quantize).

    Fixed

    • Properly pass allow_missing flag to model.load_parameters()

    [2.1.8]

    Changed

    • Update to sacrebleu=1.4.10