Neural machine translation between the writings of Shakespeare and modern English using TensorFlow

Overview

Shakespeare translations using TensorFlow

This is an example of using Google's new TensorFlow library for monolingual translation, going from modern English to Shakespearean English, based on research from Wei Xu.

Prepare

First, install the TensorFlow library for your platform:

pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl # for mac
pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl # for ubuntu
The preparation scripts below:

  1. Grab the parallel data.
  2. Create the train/dev split.
  3. Build the vocabulary.
  4. Convert the parallel data into token ids.

From the root directory:

python -m tensorshake.get_data
python -m tensorshake.prepare_corpus

Delete the cache/ directory to start anew.
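
Steps 3 and 4 amount to the usual seq2seq preprocessing: count tokens, keep the most frequent ones, and map each sentence to a list of integer ids. A minimal sketch of the idea (not this repo's exact code; the special symbols follow the standard TensorFlow seq2seq convention):

from collections import Counter

def build_vocab(sentences, max_size=10000):
    # Reserve the first ids for the standard special symbols, then take
    # the most frequent tokens until the vocabulary is full.
    specials = ["_PAD", "_GO", "_EOS", "_UNK"]
    counts = Counter(tok for sent in sentences for tok in sent.split())
    tokens = [tok for tok, _ in counts.most_common(max_size - len(specials))]
    return {tok: i for i, tok in enumerate(specials + tokens)}

def sentence_to_ids(sentence, vocab):
    # Unknown tokens map to _UNK.
    return [vocab.get(tok, vocab["_UNK"]) for tok in sentence.split()]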

Train

Use the example Bash script to train the model. It saves checkpoints in the --train_dir directory. If you run it again, training continues from the latest checkpoint. To restart with fresh parameters, simply delete or rename the checkpoints.

./run.sh
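
The resume-from-checkpoint behavior is the standard TensorFlow pattern; a sketch of how a model-creation function typically decides between restoring and starting fresh (TF 0.x-era API, assumed rather than copied from this repo):

import tensorflow as tf

def init_or_restore(session, saver, train_dir):
    # Resume from the newest checkpoint in train_dir if one exists;
    # otherwise initialize fresh parameters. Deleting the checkpoint
    # files therefore forces a restart, as noted above.
    ckpt = tf.train.get_checkpoint_state(train_dir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(session, ckpt.model_checkpoint_path)
    else:
        session.run(tf.initialize_all_variables())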

Results

Benchmarks from the original paper (Shakespeare -> Modern English):

Input | Output
i will bite thee by the ear for that jest . | i ’ ll bite you by the ear for that joke .
what further woe conspires against mine age ? | what ’ s true despair conspires against my old age ?
how doth my lady ? | how is my lady ?
hast thou slain tybalt ? | have you killed tybalt ?
an i might live to see thee married once , i have my wish . | if i could live to see you married, i ’ ve my wish .
benvolio , who began this bloody fray ? | benvolio , who started this bloody fight itself ?
what is your will ? | what do you want ?
call her forth to me . | bring her out to me .

Cherry-picked examples from this repo (Modern English -> Shakespeare):

Input | Output
but you’re not listening to me. | but you do not hear me .
Gregory, on my word, we will not be humiliated, like carrying coal. | regory , we 'll not carry coals .
but he got the promotion. | he is the friend .
i can hit quickly, if i'm motivated. | i strike , i am moved .
Did you just give us the finger, sir? | have you leave the thumb , sir ?
You don’t know what you’re doing! | you do not what you know you .
have you killed Tybalt? | hast thou slain tybalt ?
Why, Romeo, are you crazy? | why , art thou mad , mad ?

Pre-Trained Models

Here is a link to an example model: https://s3-us-west-2.amazonaws.com/foxtype-nlp/tensorshake/model_cache.zip

Possible improvements

  • Word embeddings
  • Beam search (see the sketch below)
  • Language model reranking
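
Of these, beam search is the most self-contained: instead of greedily emitting the single best token at each step, the decoder keeps the k highest-scoring partial translations. A minimal sketch, assuming a step_fn wrapper around the decoder (not provided by this repo) that returns (token, log-probability) continuations for a prefix:

import heapq

def beam_search(step_fn, start_token, end_token, beam_size=5, max_len=30):
    # Each beam is (cumulative log-probability, token sequence).
    beams = [(0.0, [start_token])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_token:
                finished.append((score, seq))  # keep completed hypotheses
                continue
            for token, logp in step_fn(seq):
                candidates.append((score + logp, seq + [token]))
        if not candidates:
            break
        # Keep only the beam_size best partial translations.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    finished.extend(beams)
    return max(finished, key=lambda c: c[0])[1]
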
Comments
  • Run.sh gives error after starting

    This is a trace of what happens:

    basti@n095:/home/basti/tensorflow/tensorflow-shakespeare# sh run.sh
    ROOT_DIR /home/basti/tensorflow/tensorflow-shakespeare
    CACHE_DIR /home/basti/tensorflow/tensorflow-shakespeare/cache
    I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
    I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 8
    Creating 2 layers of 256 units.

    Created model with fresh parameters.
    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/home/basti/tensorflow/tensorflow-shakespeare/tensorshake/translate/translate.py", line 295, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
        sys.exit(main(sys.argv))
      File "/home/basti/tensorflow/tensorflow-shakespeare/tensorshake/translate/translate.py", line 292, in main
        train()
      File "/home/basti/tensorflow/tensorflow-shakespeare/tensorshake/translate/translate.py", line 158, in train
        model = create_model(sess, False)
      File "/home/basti/tensorflow/tensorflow-shakespeare/tensorshake/translate/translate.py", line 139, in create_model
        session.run(tf.variables.initialize_all_variables())

    opened by ghost 3
  • How to train the parallel text?

    Thank you so much for sharing this. It's amazing. It's not clear to me, though, how to structure the data for training. Can you use TMX files? What exactly do I need to be able to train on my own data? Would tab-delimited files be OK? What about XLIFF?

    Thanks in advance!
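
    For reference, the cache file names (all_modern_train.ids / all_original_train.ids) suggest the pipeline trains from two aligned plain-text files, one sentence per line, where line i on each side is a translation pair; TMX or XLIFF would need converting into that layout first. A minimal sketch, assuming a tab-delimited modern<TAB>original file (file names here are hypothetical):

    with open("pairs.tsv") as pairs, \
         open("all_modern.snt", "w") as modern, \
         open("all_original.snt", "w") as original:
        for line in pairs:
            src, tgt = line.rstrip("\n").split("\t")
            modern.write(src + "\n")    # one modern sentence per line
            original.write(tgt + "\n")  # aligned original sentence on the same line number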

    opened by mzeidhassan 1
  • training data error

    for aligned_data in DATA_LINKS:
        for root, dirs, filenames in os.walk(get_dir(aligned_data)):
            # filenames must be sorted here, or the two sides won't match up
            for filename in sorted(filenames):
                with open(os.path.join(get_dir(aligned_data), filename), 'r') as f:
    
    opened by dogegg250 1
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file can perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks that all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
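
    A sketch of that style of check (the actual pull request may differ):

    import os
    import tarfile

    def is_within_directory(directory, target):
        # Resolve both paths; the member must stay inside the extraction directory.
        abs_directory = os.path.abspath(directory)
        abs_target = os.path.abspath(target)
        return os.path.commonprefix([abs_directory, abs_target]) == abs_directory

    def safe_extract(tar, path="."):
        # Reject any member whose resolved path escapes the extraction directory.
        # Usage: safe_extract(tarfile.open("archive.tar"))
        for member in tar.getmembers():
            if not is_within_directory(path, os.path.join(path, member.name)):
                raise Exception("Attempted path traversal in tar file")
        tar.extractall(path)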

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • ImportError: cannot import name rnn_cell

    I received the following error when trying to execute ./run.sh. The TensorFlow version is 1.0.1 on Python 2.

    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/home/drwael/shakespeare/tensorshake/translate/translate.py", line 35, in <module>
        from tensorshake.translate import seq2seq_model
      File "tensorshake/translate/seq2seq_model.py", line 13, in <module>
        from tensorflow.models.rnn import rnn_cell
    ImportError: No module named models.rnn
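
    The tensorflow.models package was dropped around TF 1.0, when the RNN cells moved into the main distribution. A possible compatibility shim (an assumption; exact module paths vary by TF version):

    try:
        from tensorflow.models.rnn import rnn_cell  # TF 0.x layout this repo was written against
    except ImportError:
        from tensorflow.contrib import rnn as rnn_cell  # TF 1.x layout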

    opened by drwael 0
  • weird minimum

    Fantastic project :-). Thank you for sharing your code.

    Have you got a working example yet? I'm having trouble replicating the results (even when using your pre-trained model), e.g.:

    IN:  but you’re not listening to me.
    OUT: succeeds tempered tempered tempered tempered drum drum drum drum drum drum drum drum drum drum

    Would you be able to share a working pre-trained model? (or the necessary parameters?)

    Many thanks.

    opened by RJBrooker 0
  • multiple values for keyword argument 'softmax_loss_function'

    trideep@trideep-HP-ENVY-x360-m6-Convertible:~/Downloads/tensorflow-shakespeare-master$ ./run.sh
    en_train /home/trideep/Downloads/tensorflow-shakespeare-master/cache/all_modern_train.ids
    fr_train /home/trideep/Downloads/tensorflow-shakespeare-master/cache/all_original_train.ids
    en_dev /home/trideep/Downloads/tensorflow-shakespeare-master/cache/all_modern_dev.ids
    fr_dev /home/trideep/Downloads/tensorflow-shakespeare-master/cache/all_original_dev.ids
    Creating 2 layers of 256 units.
    en_vocab_size 10000
    fr_vocab_size 10000
    <function sampled_loss at 0x7f3cea7839b0>
    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/home/trideep/Downloads/tensorflow-shakespeare-master/tensorshake/translate/translate.py", line 314, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
        sys.exit(main(sys.argv))
      File "/home/trideep/Downloads/tensorflow-shakespeare-master/tensorshake/translate/translate.py", line 311, in main
        train()
      File "/home/trideep/Downloads/tensorflow-shakespeare-master/tensorshake/translate/translate.py", line 165, in train
        model = create_model(sess, False)
      File "/home/trideep/Downloads/tensorflow-shakespeare-master/tensorshake/translate/translate.py", line 133, in create_model
        forward_only=forward_only)
      File "tensorshake/translate/seq2seq_model.py", line 138, in __init__
        softmax_loss_function=softmax_loss_function)
    TypeError: model_with_buckets() got multiple values for keyword argument 'softmax_loss_function'
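
    This TypeError means softmax_loss_function reached model_with_buckets both positionally and by keyword, which typically happens when the installed TensorFlow's signature has drifted from the one the call site was written against. A toy reproduction with hypothetical names:

    def model_with_buckets(inputs, targets, weights, softmax_loss_function=None):
        pass  # stand-in; the real function's positional parameters changed across versions

    loss_fn = lambda logits, labels: 0.0
    # The fourth positional argument already fills the softmax_loss_function
    # slot, so the explicit keyword supplies it a second time:
    model_with_buckets([], [], [], loss_fn, softmax_loss_function=loss_fn)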

    opened by trideeprath 4
Owner

Motoki Wu
AI @cresta. Formerly @chegg, @writelab, @foxtype, @lingonautics