Conditional Transformer Language Model for Controllable Generation

Overview

CTRL - A Conditional Transformer Language Model for Controllable Generation

Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher

Updates

Apr 20, 2020

We are adding a model card for CTRL! Please reach out if you have any questions about it.

Oct 31, 2019

In response to a request, we added functionality to convert a model from TensorFlow to HuggingFace/Transformers. To convert the checkpoint, install transformers via pip install transformers and run python -u convert_tf_to_huggingface_pytorch.py --tf <path_to_tensorflow_data_checkpoint> --pytorch <path_to_where_you_want_to_store_pytorch_checkpoint>

Then, to use this in HuggingFace:

# create folder and contents for HuggingFace/Transformers
mkdir custom_ctrl_model
cd custom_ctrl_model
mv <path_to_pytorch_checkpoint_from_above> .
wget -O config.json https://storage.googleapis.com/sf-ctrl/pytorch/ctrl-config.json
wget -O merges.txt https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-merges.txt
wget -O vocab.json https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-vocab.json

# run
python examples/run_generation.py  --model_type ctrl --model_name <path_to_custom_ctrl_model>/ --temperature 0 --repetition 1.2
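Before running the command above, it can help to confirm that the folder holds everything Transformers will look for. This is a minimal sketch, assuming the converted weights must be named pytorch_model.bin (the default weights file name in the Transformers versions we are aware of); if you stored the converted checkpoint under another name, rename it. REQUIRED_FILES and missing_model_files are illustrative names, not part of this repo.

```python
import os

# Files a local Transformers CTRL folder is expected to contain.
# Assumption: the converted weights are saved as "pytorch_model.bin".
REQUIRED_FILES = ("config.json", "merges.txt", "vocab.json", "pytorch_model.bin")


def missing_model_files(folder):
    """Return the required files absent from `folder`, in REQUIRED_FILES order.

    A folder that does not exist is treated as empty.
    """
    present = set(os.listdir(folder)) if os.path.isdir(folder) else set()
    return [name for name in REQUIRED_FILES if name not in present]


if __name__ == "__main__":
    missing = missing_model_files("custom_ctrl_model")
    print("missing:", missing or "nothing")
```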

Oct 21, 2019

CTRL is now in huggingface/transformers!

You can simply follow the installation instructions and run:

python examples/run_generation.py  --model_type ctrl --model_name ctrl --temperature 0 --repetition 1.2
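The flags above select greedy decoding (--temperature 0) and a repetition penalty of 1.2. The sketch below shows one way these decoding options fit together, including the top-k and nucleus alternatives mentioned in the Generations section. The penalty follows the paper's penalized sampling, which divides the logits of already-generated tokens by θ (the paper suggests θ ≈ 1.2); it is not copied from generation.py or from HuggingFace's run_generation.py, and sample_next is a hypothetical name. Dividing a negative logit by θ moves it toward zero, which makes that token more likely, so some implementations special-case negative logits.

```python
import math
import random


def sample_next(logits, generated=(), temperature=1.0, top_k=0, top_p=1.0,
                theta=1.2, rng=random):
    """Choose the next token id from raw logits.

    (1) Repetition penalty: logits of ids already in `generated` are divided by theta.
    (2) temperature == 0 means greedy argmax.
    (3) Otherwise: temperature scaling, optional top-k, optional nucleus (top-p)
        filtering, then sampling from the renormalized distribution.
    """
    seen = set(generated)
    penalized = [x / theta if i in seen else x for i, x in enumerate(logits)]
    if temperature == 0:
        # Greedy decoding: the argmax does not depend on the temperature.
        return max(range(len(penalized)), key=penalized.__getitem__)
    scaled = [x / temperature for x in penalized]
    # Token ids sorted from most to least likely.
    order = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax over the surviving ids, subtracting the max for numerical stability.
    best = scaled[order[0]]
    weights = [math.exp(scaled[i] - best) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    if top_p < 1.0:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        cumulative = 0.0
        for cut, p in enumerate(probs, start=1):
            cumulative += p
            if cumulative >= top_p:
                break
        order, probs = order[:cut], probs[:cut]
    return rng.choices(order, weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.9, 0.5]
    print(sample_next(logits, temperature=0))                 # 0
    print(sample_next(logits, generated=[0], temperature=0))  # 1, since 2.0 / 1.2 < 1.9
    print(sample_next(logits, temperature=0.2, top_k=5, rng=random.Random(0)))
```

With greedy decoding the penalty is what breaks loops: once token 0 has been emitted, its logit drops below the runner-up.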

Sep 25, 2019

Two more additions:

  1. We have added code to fine-tune the model on a custom dataset in the training_utils folder. Please refer to the README within that folder for details and example usage.

  2. A 36-layer model is available at gs://sf-ctrl/seqlen256_36layers_v0.ckpt/; its generations are markedly worse than those of the 48-layer (base) model but still quite coherent.

Sep 23, 2019

The repo now supports (experimental) inference on PyTorch; Colaboratory: https://colab.research.google.com/drive/1nDh3ayRPJGK5ciPO2D3TFkYZFqclBWHY. Install PyTorch via pip install torch and run python pytorch_generation.py with the same flags as the base generation.py script, with one exception: unlike the base version, model_path here requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example). On the first run, the code converts the weights from TensorFlow and creates a checkpoint that loads more easily on subsequent runs. You still need TensorFlow installed for this first step.
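Because pytorch_generation.py expects the path to the .data file rather than the checkpoint folder, a small helper can locate that file. This sketch assumes the checkpoint is stored as a single shard whose file name contains ".data-" (the usual TensorFlow layout, e.g. <prefix>.data-00000-of-00001); find_data_file is a hypothetical helper, not part of this repo. The returned path is what you would pass as model_path.

```python
import glob
import os


def find_data_file(ckpt_dir):
    """Return the path of the single TensorFlow '.data-*' shard in ckpt_dir.

    Raises ValueError when zero or several candidates exist, so the caller
    can choose explicitly instead of silently picking one.
    """
    matches = sorted(glob.glob(os.path.join(ckpt_dir, "*.data-*")))
    if len(matches) != 1:
        raise ValueError("expected one .data file in {!r}, found {}".format(ckpt_dir, len(matches)))
    return matches[0]


if __name__ == "__main__":
    try:
        print(find_data_file("seqlen256_v1.ckpt"))
    except ValueError as err:
        print(err)
```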

Sep 19, 2019

You should now be able to run inference on K80/T4/P100/similar GPUs using the lower_memory branch. We quantized certain weights to fp16, which reduced memory usage. Simply clone the repo and run git checkout lower_memory. Here is a Colaboratory notebook that demonstrates this functionality: https://colab.research.google.com/drive/1hVveBQShDru1Mjnhe4C21uQv4A2eH1tV

This functionality is still being tested; please file GitHub issues if you see anything aberrant. We still recommend using the full model if possible. Once the functionality has been sufficiently tested, we will update the repo and merge it into master.

Two quick notes: (1) unlike the base version, here the model_path requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example); (2) the first generation is slow because of the overhead of setting up the model, but subsequent ones should be fast.
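A back-of-the-envelope calculation shows why fp16 helps. The sketch uses the parameter count printed in the model summary reproduced in the Comments section (1,637,964,550 parameters for the sequence-length-256 model) and counts only the weights; activations, the TensorFlow runtime, and other buffers are ignored. Since only certain weights were cast to fp16, the real footprint lies somewhere between the two figures. weight_gb is an illustrative helper.

```python
# Parameter count from the model summary ("Total params: 1,637,964,550")
# for the 48-layer, sequence-length-256 model.
TOTAL_PARAMS = 1637964550


def weight_gb(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp32_gb = weight_gb(TOTAL_PARAMS, 4)  # 4 bytes per fp32 weight
fp16_gb = weight_gb(TOTAL_PARAMS, 2)  # 2 bytes per fp16 weight
print("fp32: {:.2f} GB, all-fp16: {:.2f} GB".format(fp32_gb, fp16_gb))
# fp32: 6.55 GB, all-fp16: 3.28 GB
```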

Introduction

Large-scale language models show promising text generation capabilities, but users cannot easily control this generation process. We release CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that specify domain, subdomain, entities, relationships between entities, dates, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation.

Paper link: https://arxiv.org/abs/1909.05858

Blog link: https://blog.einstein.ai/introducing-a-conditional-transformer-language-model-for-controllable-generation/

The code currently supports two functionalities:

  1. Generating from a trained model. Two models are available for download, one with a sequence length of 256 and another with a sequence length of 512. They are trained with word-level vocabularies and, through a sliding window approach, can generate well beyond their trained sequence lengths.
  2. Source attribution - given a prompt, prints the perplexity of the prompt conditional on each domain control code (see Section 5 of the paper).

Please refer to the argument flags for more details regarding the options available for either.
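The sliding-window idea can be sketched as follows: at each step the model sees at most seq_len tokens, so generation can continue past the training sequence length. This is an illustration under assumptions, not the repo's implementation: next_token_fn is a hypothetical stand-in for one forward pass plus a decoding rule, and the keep_first option (pinning the control code at position 0) is one possible variant; this README does not say whether generation.py does this.

```python
def generate_with_sliding_window(next_token_fn, prompt_ids, num_new_tokens,
                                 seq_len=256, keep_first=False):
    """Append num_new_tokens ids to prompt_ids, never feeding more than seq_len tokens.

    next_token_fn: hypothetical callable mapping a context (list of ids) to the next id.
    keep_first: if True, keep ids[0] (the control code) at the start of every window
    and slide only the remaining seq_len - 1 positions.
    """
    if seq_len < 2:
        raise ValueError("seq_len must be at least 2")
    ids = list(prompt_ids)
    for _ in range(num_new_tokens):
        if keep_first and len(ids) > seq_len:
            window = ids[:1] + ids[-(seq_len - 1):]
        else:
            window = ids[-seq_len:]  # the oldest tokens drop out once the window is full
        ids.append(next_token_fn(window))
    return ids


if __name__ == "__main__":
    # A stand-in "model" that returns how many context tokens it received.
    print(generate_with_sliding_window(len, [0, 1, 2], 5, seq_len=4))
    # [0, 1, 2, 3, 4, 4, 4, 4]
```

Using len as the stand-in makes the window size visible: it grows to seq_len and then stays there. Without keep_first, the control code eventually leaves the window, which is the trade-off the option addresses.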

Table of Contents

  1. Citation
  2. License
  3. Questions for Deliberation
  4. Usage
  5. Sample Generations
  6. Sample Source Attributions
  7. FAQs
  8. Get Involved

Citation

@article{keskarCTRL2019,
  title={{CTRL - A Conditional Transformer Language Model for Controllable Generation}},
  author={Keskar, Nitish Shirish and McCann, Bryan and Varshney, Lav and Xiong, Caiming and Socher, Richard},
  journal={arXiv preprint arXiv:1909.05858},
  year={2019}
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from:

violence, hate, and division,

environmental destruction,

abuse of human rights, or

the destruction of people's physical and mental health.

We encourage users of this software to tell us about the applications in which they are putting it to use by emailing [email protected], and to use appropriate documentation when developing high-stakes applications of this model.

Questions for Deliberation

We consulted extended members of the AI community in the responsible publication of this model. In particular, a preview of a Partnership on AI (PAI) project relating to AI research publication norms was considered prior to the release of this work. While this PAI project is as-yet unpublished, it is informed by companies, organizations, and people differently affected by artificial intelligence and presents key considerations to evaluate before publishing potentially high-impact research.

The questions referenced from the early draft of the PAI project included:

  1. How do you envision your research being used in the world? Who will use it? How much expertise is required to use it?
  2. Who will use it?
  3. Why would they be motivated to replicate / productionize your work?
  4. How would a science fiction author turn your research into a dystopian story?
  5. What is the worst way someone could use your research finding, given no resource constraints?
  6. What are the historical patterns of misuse or application in this area? How can the research be made more robust against such misuse?
  7. Which populations or communities will this technology negatively affect, deployed in the scenarios you envision? Will some groups be disproportionately affected?

Usage

Here are the steps to get generating:

  1. Install the dependencies

This code relies on TensorFlow 1.14 and fastBPE.

TensorFlow can be installed via pip install tensorflow[-gpu]==1.14. fastBPE installation instructions can be found in the fastBPE GitHub repository. We highly recommend experimenting within a virtualenv or Docker image.

For inference on PyTorch, please see the update on Sep 23 at the top of this README. If you use PyTorch, you can skip Step 2.

  2. Patch /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py (or the equivalent file, if the package is installed elsewhere) by running

patch -b <path_to_tensorflow_estimator_package>/python/estimator/keras.py estimator.patch

We highly recommend experimenting within a virtualenv or Docker image since the workflow involves patching a TensorFlow file to support some custom functionality. This step is not optional; skipping this step will cause errors (irrespective of device).

If you run into OOM issues because of GPU memory exhaustion, please use the lower_memory branch. See the (Sep 19, 2019) update at the top of this README for details.

  3. Get the model files from gs://sf-ctrl/seqlen256_v1.ckpt/ or gs://sf-ctrl/seqlen512_v1.ckpt/.

A 36-layer model is also available at gs://sf-ctrl/seqlen256_36layers_v0.ckpt/.

The model architecture is identical for the two v1 checkpoints. The former is trained with a shorter training sequence length (256), while the latter is trained with a longer one (512). We plan to update the models (with the appropriate version tags) as we continue to train them longer and on more data. Our current recommendation is to use the seqlen256_v1 model unless you have a strong reason not to. If you have no preference for domain, the Links control code is a good first choice.

With gsutil installed, you can simply run gsutil -m cp -r gs://sf-ctrl/seqlen256_v1.ckpt/ . for copying the model checkpoint over.

Without gsutil, you can follow the route recommended at https://github.com/salesforce/ctrl/issues/7#issuecomment-531303214

  4. Run the generation script (generation.py) or the source attribution script (source_attribution.py).

The generation.py script prompts the user to input text and then prints the continuation. The source_attribution.py script prompts the user to input text and then prints a sorted list of domains and the perplexity of the text conditional on each individual domain.
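Steps 2 and 3 involve locating things by hand. The helpers below are a sketch under stated assumptions: find_estimator_pkg_dir uses importlib.util.find_spec (Python 3) to find the installed tensorflow_estimator package without importing it, patch_command builds the step 2 command from that directory, and gs_to_https rewrites a gs:// path into the public https://storage.googleapis.com/ form already used for the config file in the Oct 31 update. The HTTPS form only works for individual, publicly readable objects, and the file names inside the checkpoint folders are not listed in this README, so it does not replace gsutil's recursive copy. All three function names are hypothetical.

```python
import importlib.util
import os

GCS_HTTP_PREFIX = "https://storage.googleapis.com/"


def find_estimator_pkg_dir():
    """Directory of the installed tensorflow_estimator package, or None if absent.

    find_spec locates a top-level package without importing it (and so without
    importing TensorFlow).
    """
    spec = importlib.util.find_spec("tensorflow_estimator")
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def patch_command(estimator_pkg_dir, patch_file="estimator.patch"):
    """Build the step 2 patch command for a given package directory."""
    target = os.path.join(estimator_pkg_dir, "python", "estimator", "keras.py")
    return "patch -b {} {}".format(target, patch_file)


def gs_to_https(gs_url):
    """Map gs://bucket/object to its public HTTPS URL (public objects only)."""
    prefix = "gs://"
    if not gs_url.startswith(prefix):
        raise ValueError("not a gs:// URL: " + gs_url)
    return GCS_HTTP_PREFIX + gs_url[len(prefix):]


if __name__ == "__main__":
    pkg = find_estimator_pkg_dir()
    print(patch_command(pkg) if pkg else "tensorflow_estimator is not installed")
    print(gs_to_https("gs://sf-ctrl/pytorch/ctrl-config.json"))
    # https://storage.googleapis.com/sf-ctrl/pytorch/ctrl-config.json
```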

Generations

The generations and attributions computed below have been generated using the 256 sequence length model. Comparable results can be obtained from the 512 version of the model as well. We demonstrate only a few of the functionalities, especially the control codes. For a complete list of the control codes, and how to use them, please refer to the paper. Note that <GENERATION_BEGINS> is only included for demonstrative purposes and is not actually generated by the model.

  1. Links
Links In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

<GENERATION_BEGINS> 

The scientists were able to track down the unicorn population by analyzing their DNA. They found out that they are descended from an ancient species called the “Tapiri,” which is believed to have been extinct for over 10,000 years.

 According to scientists, this discovery proves that there may be other undiscovered creatures on our planet.

 “We believe these animals could still exist and we hope to find them one day,” said Dr. Carlos Jaramillo, director of the National Geographic Society’s Center for Biodiversity Research at the University of Arizona.

 Scientists also confirmed that the Unicorn Genome Project has already identified several genes associated with the animal’s unique horn structure.

 “The genome project will help us understand how the animal evolved its horns so it can better protect itself against predators like jaguars or pumas.” 
 
 Researchers say that if all goes well, the new genetic information should allow them to create a 3D model of the animal.

 “It would give us a much clearer picture about what makes the creature unique.”- Dr David Sillito, lead researcher ...

As a general note, you don't have to use greedy sampling; you can switch to top-k or nucleus sampling through the appropriate argument flags.

Running it with --temperature 0.2 and --topk 5 yields:

Links In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

 The scientists were able to track down and capture one of these unicorn herds as it roamed through the valley for several days. They then took DNA samples from each animal and compared them with other animals found throughout the region.

 Advertisement

 What they found is truly astounding.

 According to the scientists, there are approximately 50 different species of unicorn, all of which live on two continents — North America and South America...
  2. Wikipedia
Wikipedia Salesforce Inc. is <GENERATION_BEGINS> a software company that provides cloud computing and business management software. It was founded in 1999 by Marc Benioff as an online retailer of enterprise software, but it has since expanded into other areas such as e-commerce, customer relationship management, and digital media...
  3. Reviews
Reviews Rating: 1.0 <GENERATION_BEGINS> 
 I bought this for my son who is a huge fan of the show. He was so excited to get it and when he opened it, we were all very disappointed. The quality of the product is terrible. It looks like something you would buy at a dollar store.
Rating: 1.0
 This item arrived in poor condition. There are several scratches on the front cover as well as some other minor damage...
Reviews Rating: 4.0 <GENERATION_BEGINS> 
 I bought this for my husband and he loves it. He has a small wrist so it is hard to find watches that fit him well. This one fits perfectly.
 Rating: 4.0
 The watch looks great but the band was too big even though i had read other reviews about how large the bands were.
 Rating: 4.0
 It works well as an everyday watch. It's not very fancy or anything, just simple with no extra bells & whistles.
 
  4. Books
Books Weary with toil, I haste me to my bed, <GENERATION_BEGINS> 
 And sleep the slumber of a weary man.
 The morning sun is up, and shines on high;
 My bed is ready, but no friend appears.
 A little while ago he came in sight,
 But now he's gone, as if he'd never come.
 He left his hat upon the table there,
 His coat was hanging up behind the door;
 And, when at last I looked around, there stood
 No one beside me, nor within call.
 Oh, what shall I do? What can be done?
 I'm sure that some strange accident has happened.
 I'll go out into the street again,
 To seek for him, or call him back again.
 But, oh, how lonely it will seem
 When all are gone who once were near me!
  5. Reddit - r/running
Running Title: <GENERATION_BEGINS> I just ran my first 5k in under 30 minutes.
 Text: It was a pretty easy run, but it felt great.

 My goal is to run at least 10 miles per week for the next few months.

 I'm not sure if this qualifies as running or not, but it's definitely something that I've been wanting to do since high school.

 Thanks r/running!

 Edit: Thank you all so much for your kind words and advice.

 Edit 2: Wow, thanks guys!
  6. Reddit - r/fitness
Fitness Title: <GENERATION_BEGINS> How do you get your body to burn fat when you're not working out?
 Text: I know that the reason why people lose weight is because they are eating less and burning more calories than their bodies need but how does this work for someone who doesn't have a lot of time or energy to exercise?

 For example, if I'm at home doing nothing all day then I'll be able to eat whatever I want without feeling like I've eaten too much. But what happens after that?

 EDIT: Thanks everyone for the responses!
  7. Reddit - r/frugal
Saving Title: <GENERATION_BEGINS> How to get a free credit report
 Text: I have been trying for months now and it seems like no one is willing to give me one.

 Is there any way that you can just go online and fill out the forms yourself or do they require some sort of ID or something?

 Edit: Thanks everyone for your help!

 Edit 2: Thank you all so much for helping me!
  8. Reddit - r/personalfinance
Finance Title: <GENERATION_BEGINS> I have a question about my credit score.
 Text: So I'm currently in the process of applying for an auto loan and I've been told that if you don't pay your bills on time then they will not approve you.

 My questions are:

 1. How do I know when to start paying off debt?
 2. What is the best way to get out of debt without having to file bankruptcy?
 3. Is it possible to refinance my car loan?
 4. Should I just wait until after school starts so I can take advantage of lower interest rates?
 5. If so, how long should I wait?

 Thanks
  9. Questions
Questions Q: What is the capital of Australia? <GENERATION_BEGINS>
 A: Canberra
 Q: How many people live in Canberra?
 A: 650,000
  10. Translation
Translation English : This is a natural language processing model that aims to generate coherent text in a controllable manner. ; French : <GENERATION_BEGINS> 
Il s'agit d'un modèle de traitement du langage naturel qui vise à générer un texte cohérent et contrôlable.
Translation English : This is a natural language processing model that aims to generate coherent text in a controllable manner. ; German : <GENERATION_BEGINS> 
Es handelt sich um ein natürliches Textverarbeitungssystem, das auf eine einheitliche und kontrollierbare Erzeugung von Text abzielt.
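The prompts above follow a few fixed layouts, always with the control code first. The helpers below reproduce those layouts as plain strings; they are hypothetical conveniences inferred from the samples, whitespace included, and this README does not state whether a trailing space or other spacing changes the output.

```python
def translation_prompt(source_lang, text, target_lang):
    """'Translation <src> : <text> ; <tgt> :' as in the Translation samples."""
    return "Translation {} : {} ; {} :".format(source_lang, text, target_lang)


def question_prompt(question):
    """'Questions Q: <question>' as in the Questions sample."""
    return "Questions Q: {}".format(question)


def review_prompt(rating):
    """'Reviews Rating: <x.y>' as in the Reviews samples."""
    return "Reviews Rating: {:.1f}".format(rating)


def reddit_title_prompt(control_code):
    """'<Code> Title:' as in the Reddit samples (Running, Fitness, Saving, Finance)."""
    return "{} Title:".format(control_code)


if __name__ == "__main__":
    print(translation_prompt("English", "This is a test.", "German"))
    # Translation English : This is a test. ; German :
    print(review_prompt(4))  # Reviews Rating: 4.0
```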

Source Attributions

  1. I lost 10 lbs! Feeling great!
PROMPT: I lost 10 lbs! Feeling great!
Diet ppl = 28.960714
Weight ppl = 29.223865
Fitness ppl = 36.162671
...
  2. My landlord is suing me for unpaid rent
PROMPT: My landlord is suing me for unpaid rent
Legal ppl = 21.210965
Finance ppl = 24.619064
Saving ppl = 27.923208
...
  3. And then I saw him, the man in the mirror.
PROMPT: And then I saw him, the man in the mirror.
Horror ppl = 17.919299
Scary ppl = 18.587843
Writing ppl = 23.154564
...
  4. Anarchism is an anti-authoritarian political philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions.
PROMPT: Anarchism is an anti-authoritarian political philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions.
Wikipedia ppl = 34.446701
News ppl = 34.484165
Links ppl = 35.460126
...
  5. I love God
PROMPT: I love God
Christianity ppl = 55.653985
Atheism ppl = 116.811038
Confessions ppl = 133.619834
...
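Perplexity is the exponential of the negative mean log-probability of the prompt tokens, so a lower value means the prompt is more likely under that control code. For example, for "I love God" the ratio 116.81 / 55.65 ≈ 2.1 means the per-token geometric-mean probability is about 2.1 times higher under Christianity than under Atheism. The ranking can be sketched as below, assuming a hypothetical score_fn that returns per-token natural-log probabilities with the control code prepended; whether source_attribution.py also scores the control code itself is not stated here.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability of the scored tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rank_domains(prompt_ids, control_codes, score_fn):
    """(code, perplexity) pairs sorted from lowest to highest perplexity.

    score_fn(code, prompt_ids) is a HYPOTHETICAL callable returning the model's
    log-probability of each prompt token when `code` is the first token.
    """
    scored = [(code, perplexity(score_fn(code, prompt_ids))) for code in control_codes]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy scorer: each prompt token has probability 0.5 under Diet, 0.25 under Fitness.
    toy = {"Diet": math.log(0.5), "Fitness": math.log(0.25)}

    def fake_score(code, ids):
        return [toy[code]] * len(ids)

    for code, ppl in rank_domains([11, 12, 13], ["Fitness", "Diet"], fake_score):
        print("{} ppl = {:.6f}".format(code, ppl))
    # Diet ppl = 2.000000
    # Fitness ppl = 4.000000
```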

FAQs

(We hope to update this section frequently).

  1. Will you be releasing the training code and data?

Code to fine-tune the model on a custom dataset has been released in the training_utils folder; please refer to the update on Sep 25 for details.

We will not be releasing the training data, but we will release tips and scripts related to data collection.

  2. Is a version of the model available in PyTorch?

Yes. Please refer to the update on Sep 23 for (experimental) inference on PyTorch, and to the update on Oct 21 for the HuggingFace/Transformers version of CTRL.

  3. The code errors out.

Make sure that you have performed the patch as described above. If the error persists, please create a GitHub issue.

  4. The code generates nonsense irrespective of the prompt.

Make sure that you have (a) provided the right --model_dir and that the folder actually exists and contains the checkpoint, (b) provided a valid control code as the first token, and (c) tried generating with a simple prompt such as Links I or Books From. If the error persists, please create a GitHub issue.
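Checks (a) and (b) can be automated with a small sketch. It assumes a TensorFlow checkpoint folder contains at least one .index file (as in the standard TensorFlow checkpoint format) and that you supply the control codes you want to allow, since the full list is in the paper; diagnose is a hypothetical helper.

```python
import glob
import os


def diagnose(model_dir, prompt, control_codes):
    """Return a list of likely setup problems (empty if none were found).

    control_codes: the set of codes you intend to use, e.g. {"Links", "Books"}.
    """
    problems = []
    if not os.path.isdir(model_dir):
        problems.append("model_dir does not exist: " + model_dir)
    elif not glob.glob(os.path.join(model_dir, "*.index")):
        problems.append("no TensorFlow checkpoint (*.index) found in " + model_dir)
    tokens = prompt.split()
    if not tokens or tokens[0] not in control_codes:
        problems.append("prompt does not start with a known control code")
    return problems


if __name__ == "__main__":
    for problem in diagnose("seqlen256_v1.ckpt", "Links I", {"Links", "Books"}):
        print(problem)
```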

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!

Comments
  • Could not allocate pinned host memory of size: 2147483648

    Running !python2 generation.py --model_dir "/content/ctrl/seqlen256_v1.ckpt" in Colab outputs this:

    WARNING: Logging before flag parsing goes to stderr.
    W0912 03:52:40.595153 139689530402688 deprecation_wrapper.py:119] From generation.py:6: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.
    
    W0912 03:52:40.605669 139689530402688 deprecation_wrapper.py:119] From generation.py:35: The name tf.random.set_random_seed is deprecated. Please use tf.compat.v1.random.set_random_seed instead.
    
    246534 unique words
    2019-09-12 03:52:40.930801: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
    2019-09-12 03:52:40.971309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:40.971914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
    name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
    pciBusID: 0000:00:04.0
    2019-09-12 03:52:40.972273: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
    2019-09-12 03:52:40.973635: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
    2019-09-12 03:52:40.975007: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
    2019-09-12 03:52:40.975404: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
    2019-09-12 03:52:40.976992: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
    2019-09-12 03:52:40.978135: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
    2019-09-12 03:52:40.981770: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    2019-09-12 03:52:40.981927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:40.982547: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:40.983109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
    2019-09-12 03:52:40.983494: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
    2019-09-12 03:52:41.114324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:41.115113: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574d0e20d80 executing computations on platform CUDA. Devices:
    2019-09-12 03:52:41.115150: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
    2019-09-12 03:52:41.117511: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000170000 Hz
    2019-09-12 03:52:41.117862: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574d0e212c0 executing computations on platform Host. Devices:
    2019-09-12 03:52:41.117916: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
    2019-09-12 03:52:41.118114: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:41.118668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
    name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
    pciBusID: 0000:00:04.0
    2019-09-12 03:52:41.118728: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
    2019-09-12 03:52:41.118748: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
    2019-09-12 03:52:41.118766: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
    2019-09-12 03:52:41.118784: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
    2019-09-12 03:52:41.118811: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
    2019-09-12 03:52:41.118840: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
    2019-09-12 03:52:41.118858: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    2019-09-12 03:52:41.118934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:41.119479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:41.120052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
    2019-09-12 03:52:41.120121: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
    2019-09-12 03:52:41.121241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-09-12 03:52:41.121268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
    2019-09-12 03:52:41.121280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
    2019-09-12 03:52:41.121403: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:41.121995: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:52:41.122491: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
    2019-09-12 03:52:41.122537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
    W0912 03:52:58.330300 139689530402688 lazy_loader.py:50] 
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    W0912 03:52:58.330642 139689530402688 deprecation_wrapper.py:119] From generation.py:124: The name tf.train.AdagradOptimizer is deprecated. Please use tf.compat.v1.train.AdagradOptimizer instead.
    
    Model: "model"
    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    input_1 (InputLayer)            [(None, 256)]        0                                            
    __________________________________________________________________________________________________
    tied_embedding_softmax (TiedEmb multiple             315810054   input_1[0][0]                    
                                                                     encoder[0][0]                    
    __________________________________________________________________________________________________
    encoder (Encoder)               (None, 256, 1280)    1322154496  tied_embedding_softmax[0][0]     
    ==================================================================================================
    Total params: 1,637,964,550
    Trainable params: 1,637,964,550
    Non-trainable params: 0
    __________________________________________________________________________________________________
    None
    2019-09-12 03:52:58.496625: W tensorflow/core/framework/allocator.cc:107] Allocation of 1262254080 exceeds 10% of system memory.
    tcmalloc: large alloc 1262256128 bytes == 0x557523406000 @  0x7f0c00918b6b 0x7f0c00938379 0x7f0bbd80d754 0x7f0bbd7c8c8a 0x7f0bbd505f11 0x7f0bbd518f08 0x7f0bc366a00c 0x7f0bc3660298 0x7f0bc10448c7 0x7f0bc0fbc97c 0x7f0bc0fbed9d 0x5574cfe6af6e 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe7d03c 0x5574cfe4cf1e 0x5574cfe662d5 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a
    tcmalloc: large alloc 1262256128 bytes == 0x55756e7ce000 @  0x7f0c009361e7 0x7f0bfe37c771 0x7f0bfe3e4028 0x7f0bfe3d90d5 0x7f0bfe46ff77 0x5574cfe63e8a 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe695d6 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe68fce 0x5574cfe6152a 0x5574cfe60fb9 0x5574cfe91e7f 0x5574cfe8cc12 0x5574cfe8c09d 0x5574cfe3ad6b 0x7f0c00533b97 0x5574cfe3a5ea
    W0912 03:53:06.230777 139689530402688 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/initializers.py:143: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0912 03:53:11.251795 139689530402688 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    2019-09-12 03:53:24.403230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:53:24.403729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
    name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
    pciBusID: 0000:00:04.0
    2019-09-12 03:53:24.403847: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
    2019-09-12 03:53:24.403869: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
    2019-09-12 03:53:24.403910: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
    2019-09-12 03:53:24.403931: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
    2019-09-12 03:53:24.403952: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
    2019-09-12 03:53:24.403975: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
    2019-09-12 03:53:24.403994: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    2019-09-12 03:53:24.404096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:53:24.404475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:53:24.404802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
    2019-09-12 03:53:24.404864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-09-12 03:53:24.404878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
    2019-09-12 03:53:24.404901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
    2019-09-12 03:53:24.405005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:53:24.405377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-09-12 03:53:24.405756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
    2019-09-12 03:53:32.494371: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 2147483648 bytes on host: CUDA_ERROR_INVALID_VALUE: invalid argument
    2019-09-12 03:53:32.511468: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 2147483648
    
    opened by GrahamboJangles 16
  • Running full model on V100 outputs last word

    I'm running the full model on a V100 GPU on Google Cloud, and the only output I'm getting is the last word copied over and over again. I've tried changing the temperature and topk parameters, but to no avail. I'm using the 512 version (larger version).

    Any advice would be greatly appreciated.

    opened by dimitri320 12
  • Repetitive generation for simple prompt

    I followed the exact steps documented in the README and ran the model with sequence length 256:

    ENTER PROMPT: hello this is GPT. how are you?
    

    Is this error reproducible by others?

    opened by strin 8
  • Fixes issues with fine-tuning on GPUs

    I previously had some issues with training on GPUs (#32). This fixes those and other issues so that training on GPUs works. Not sure if you want to merge it in, but I figured I'd put it up in case anyone else has fine-tuning issues.

    cla:signed 
    opened by nickwalton 5
  • Finetuning Errors

    Hey, I'm getting the following fine-tuning errors on a multi-GPU machine. I made sure to re-apply the Keras patch, but haven't had any luck. Any idea what the issue is?

    W0927 22:27:35.617535 140220124120896 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    W0927 22:27:36.428683 140220124120896 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py:76: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    global_step: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:0
    2019-09-27 22:27:44.207266: I tensorflow/core/common_runtime/placer.cc:54] global_step: (VariableV2)/job:localhost/replica:0/task:0/device:GPU:0
    global_step/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:0
    2019-09-27 22:27:44.207316: I tensorflow/core/common_runtime/placer.cc:54] global_step/Assign: (Assign)/job:localhost/replica:0/task:0/device:GPU:0
    global_step/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
    2019-09-27 22:27:44.207337: I tensorflow/core/common_runtime/placer.cc:54] global_step/read: (Identity)/job:localhost/replica:0/task:0/device:GPU:0
    w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/device:GPU:0
    2019-09-27 22:27:44.207349: I tensorflow/core/common_runtime/placer.cc:54] w/Initializer/random_normal/RandomStandardNormal: (RandomStandardNormal)/job:localhost/replica:0/task:0/device:GPU:0
    Traceback (most recent call last):
      File "training.py", line 162, in <module>
        estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
        config=config)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
        config)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 331, in save_first_checkpoint
        saver.save(sess, latest_path)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1173, in save
        {self.saver_def.filename_tensor_name: checkpoint_file})
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in do_run
        run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation w/Initializer/random_normal/mul: Could not satisfy explicit device specification '' because the node node w/Initializer/random_normal/mul (defined at training.py:90) placed on device Device assignments active during op 'w/Initializer/random_normal/mul' creation:
      with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602> was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'.
    All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1, /job:localhost/replica:0/task:0/device:XLA_GPU:2, /job:localhost/replica:0/task:0/device:XLA_GPU:3, /job:localhost/replica:0/task:0/device:XLA_GPU:4, /job:localhost/replica:0/task:0/device:XLA_GPU:5, /job:localhost/replica:0/task:0/device:XLA_GPU:6, /job:localhost/replica:0/task:0/device:XLA_GPU:7, /job:localhost/replica:0/task:0/device:XLA_GPU:8, /job:localhost/replica:0/task:0/device:XLA_GPU:9, /job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:GPU:1, /job:localhost/replica:0/task:0/device:GPU:2, /job:localhost/replica:0/task:0/device:GPU:3, /job:localhost/replica:0/task:0/device:GPU:4, /job:localhost/replica:0/task:0/device:GPU:5, /job:localhost/replica:0/task:0/device:GPU:6, /job:localhost/replica:0/task:0/device:GPU:7, /job:localhost/replica:0/task:0/device:GPU:8, /job:localhost/replica:0/task:0/device:GPU:9].
    Colocation Debug Info:
    Colocation group had the following types and supported devices:
    Root Member(assigned_device_name_index=1 requested_device_name='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
    UnsortedSegmentSum: GPU CPU XLA_CPU XLA_GPU
    ResourceGather: GPU CPU XLA_CPU XLA_GPU
    Shape: GPU CPU XLA_CPU XLA_GPU
    Unique: GPU CPU
    ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
    ResourceSparseApplyAdagrad: CPU
    StridedSlice: GPU CPU XLA_CPU XLA_GPU
    AssignVariableOp: GPU CPU XLA_CPU XLA_GPU
    Identity: GPU CPU XLA_CPU XLA_GPU
    RandomStandardNormal: GPU CPU XLA_CPU XLA_GPU
    Mul: GPU CPU XLA_CPU XLA_GPU
    Add: GPU CPU XLA_CPU XLA_GPU
    VarHandleOp: GPU CPU XLA_CPU XLA_GPU
    Const: GPU CPU XLA_CPU XLA_GPU
    VarIsInitializedOp: GPU CPU XLA_CPU XLA_GPU

    Colocation members, user-requested devices, and framework assigned devices, if any:
    w/Initializer/random_normal/shape (Const)
    w/Initializer/random_normal/mean (Const)
    w/Initializer/random_normal/stddev (Const)
    w/Initializer/random_normal/RandomStandardNormal (RandomStandardNormal) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    w/Initializer/random_normal/mul (Mul)
    w/Initializer/random_normal (Add)
    w (VarHandleOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    w/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    w/Assign (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    w/Read/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    tied_embedding_softmax/embedding_lookup (ResourceGather) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    tied_embedding_softmax/embedding_lookup/Identity (Identity)
    tied_embedding_softmax_1/transpose/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    VarIsInitializedOp_322 (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    AssignVariableOp (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
    w/Adagrad/Initializer/Const (Const)
    w/Adagrad (VarHandleOp)
    w/Adagrad/IsInitialized/VarIsInitializedOp (VarIsInitializedOp)
    w/Adagrad/Assign (AssignVariableOp)
    w/Adagrad/Read/ReadVariableOp (ReadVariableOp)
    training/Adagrad/update_w/Unique (Unique)
    training/Adagrad/update_w/Shape (Shape)
    training/Adagrad/update_w/strided_slice/stack (Const)
    training/Adagrad/update_w/strided_slice/stack_1 (Const)
    training/Adagrad/update_w/strided_slice/stack_2 (Const)
    training/Adagrad/update_w/strided_slice (StridedSlice)
    training/Adagrad/update_w/UnsortedSegmentSum (UnsortedSegmentSum)
    training/Adagrad/update_w/ResourceSparseApplyAdagrad (ResourceSparseApplyAdagrad)
    save/AssignVariableOp_1542 (AssignVariableOp)
    save/AssignVariableOp_1543 (AssignVariableOp)

     [[node w/Initializer/random_normal/mul (defined at training.py:90) ]]
    Additional information about colocations:
    No node-device colocations were active during op 'w/Initializer/random_normal/mul' creation.
    

    Device assignments active during op 'w/Initializer/random_normal/mul' creation: with tf.device(None): </usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:602>

    Original stack trace for u'w/Initializer/random_normal/mul':
      File "training.py", line 162, in <module>
        estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
        config=config)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 450, in model_to_estimator
        config)
      File "/usr/local/lib/python2.7/dist-packages/

    opened by nickwalton 5
  • How to pretrain a CTRL model from scratch?

    We want to pretrain a CTRL model from scratch. Could you provide some implementation details? What is the format of the training samples, and can the training process be completed with the training.py script?

    opened by xurongqiang 5
  • How to add new control code into vocabulary?

    Is it possible to add a new control code to the vocabulary file, or is there any existing code for doing so?

    parser.add_argument('--control_code', type=str, required=True,
                        help='control code to use for this file. must be in the vocabulary, else it will error out.')
    
    opened by leejason 4
  • Error when I make the tfrecords using the moby dick data

    Hello,

    I hit an error converting to TFRecords when I try to fine-tune the model on the Moby Dick data. The error is as follows:

    python make_tf_records.py --text_file ../data/moby_dick.txt --control_code Moby --sequence_len 256
    Loading vocabulary from ../vocab ...
    Read 6086453827 words (246531 unique) from vocabulary file.
    Loading codes from ../codes ...
    Read 200000 codes from the codes file.
    Traceback (most recent call last):
      File "make_tf_records.py", line 32, in <module>
        tokenized_train_text = bpe.apply([train_text.encode('ascii', errors='ignore')])[0] # will NOT work for non-English texts
      File "fastBPE/fastBPE.pyx", line 21, in fastBPE.fastBPE.apply
    AttributeError: 'bytes' object has no attribute 'encode'
    

    Maybe some non-English text in Moby Dick causes this error. Can anybody help me?

    opened by htw2012 3
  • error when generating w. nucleus

    I'm using the Colab notebook, and I get this error whenever I use the --nucleus argument:

    generation.py:223: RuntimeWarning: overflow encountered in exp
      prompt_probs = np.exp(prompt_logits[_token])
    generation.py:224: RuntimeWarning: invalid value encountered in true_divide
      prompt_probs = prompt_probs / sum(prompt_probs)
    generation.py:229: RuntimeWarning: invalid value encountered in greater
      nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
    Traceback (most recent call last):
      File "generation.py", line 229, in <module>
        nucleus = max(np.where(np.cumsum(np.sort(prompt_probs)[::-1])>nucleusprob)[0][0], minimum_topk)
    IndexError: index 0 is out of bounds for axis 0 with size 0

    opened by orangesandcream 3
  • Issue with setting temperature

    I was getting an error when setting the temperature in the generation script. I think this line:

        chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[0][_token][pruned_list],0), num_samples=1).numpy())

    should be:

        chosen_idx = int(tf.random.categorical(np.expand_dims(prompt_logits[_token][pruned_list],0), num_samples=1).numpy())

    At least, that seems to do what I expect. So when I torture the model with constraints like this:

        tokens_to_disallow = []  # positions in pruned_list to remove
        if token > 0:
            prev = idx2word[tokens_generated[0][token]]
            # '@@' marks a BPE piece that continues into the next token,
            # so only constrain candidates when a new word is starting
            if not prev.endswith('@@'):
                for i in range(len(pruned_list)):
                    # keep only candidates starting with 'r', 't', or 'b'
                    if not idx2word[pruned_list[i]].lower().startswith(('r', 't', 'b')):
                        tokens_to_disallow.append(i)
                #if 'http' in idx2word[pruned_list[i]]:
                #    tokens_to_disallow.append(i)
        pruned_list = np.delete(pruned_list, tokens_to_disallow)
    

    it seems to provide some entertaining results with some diversity.

    opened by calculusoflambdas 3
  • NameError: global name 'tf' is not defined

    What am I doing wrong? This is the rough Dockerfile which I expected to work, but throws the above error:

    FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
    RUN apt-get update && apt-get install -y git curl wget python python-pip vim
    RUN pip install Cython
    RUN pip install numpy tensorflow-gpu==1.14
    RUN mkdir /CTRL
    WORKDIR /CTRL
    RUN git clone https://github.com/salesforce/ctrl.git .
    RUN git clone https://github.com/glample/fastBPE.git
    RUN cd fastBPE && g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast && \
        python setup.py install
    RUN patch -b /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py estimator.patch
    RUN mkdir model1

    On the host, I download the model files rather than doing it in the Dockerfile, since I'm experimenting:

        wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/checkpoint && \
        wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001 && \
        wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.index && \
        wget https://storage.googleapis.com/sf-ctrl/seqlen256_v1.ckpt/model.ckpt-413000.meta

    Then I mount that directory into the container (for experimenting):

        nvidia-docker run -it --rm -v $(pwd)/../model256:/CTRL/model1 calculusoflabmdas/ctrl:v4 bash

    Running: python generation.py --model_dir model1/ gives the usual list of warnings before failing out with:

    Model: "model"
    
    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    input_1 (InputLayer)            [(None, 256)]        0                                            
    __________________________________________________________________________________________________
    tied_embedding_softmax (TiedEmb multiple             315810054   input_1[0][0]                    
                                                                     encoder[0][0]                    
    __________________________________________________________________________________________________
    encoder (Encoder)               (None, 256, 1280)    1322154496  tied_embedding_softmax[0][0]     
    ==================================================================================================
    Total params: 1,637,964,550
    Trainable params: 1,637,964,550
    Non-trainable params: 0
    __________________________________________________________________________________________________
    None
    WARNING:tensorflow:You are creating an Estimator from a Keras model manually subclassed from `Model`, that was already called on some inputs (and thus already had weights). We are currently unable to preserve the model's state (its weights) as part of the estimator in this case. Be warned that the estimator has been created using a freshly initialized version of your model.
    Note that this doesn't affect the state of the model instance you passed as `keras_model` argument.
    Traceback (most recent call last):
      File "generation.py", line 143, in <module>
        estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, config=run_config)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
        config=config)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 462, in model_to_estimator
        estimator = tf.contrib.tpu.TPUEstimator(keras_model_fn, use_tpu=True, train_batch_size=512, eval_batch_size=32,
    NameError: global name 'tf' is not defined
    

    The run above was on a GPU. I got the same error on a Colab TPU instance, and also when using the tensorflow/tensorflow:1.14 image as the base. What am I doing wrong?

    opened by calculusoflambdas 3
  • Cuda out of memory issue.

    I am trying to fine-tune the model, but it is too large and I get an out-of-memory error. Any idea how to get around it?

    tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[246534,1280] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomStandardNormal]

    opened by jamalabdul1 0
  • 12 layer (huggingface gpt-2 equivalent) ctrl model?

    Hi! I'm wondering whether smaller pre-trained CTRL models (e.g., the same size as the GPT-2 model on Hugging Face) are available. They would be ideal for smaller-scale experiments without large-memory GPUs :)) Thanks!

    opened by anonymous10101010101 0
  • CTRL model can not work in huggingface transformers

    torch version: 1.8.1+cu102
    transformers version: 4.4.2

    I adapted the example code from https://github.com/huggingface/transformers/blob/master/examples/text-generation/run_generation.py to generate text with CTRL. Here is the first part of my code:

    import torch
    from transformers import CTRLTokenizer, CTRLLMHeadModel
    tokenizer = CTRLTokenizer.from_pretrained('ctrl')
    model = CTRLLMHeadModel.from_pretrained('ctrl')
    
    encoded_prompt = tokenizer.encode("Links Hello, my dog is cute", add_special_tokens=False)
    

    ERROR:

    ValueError Traceback (most recent call last) in ----> 1 encoded_prompt = tokenizer.encode("Links Hello, my dog is cute", add_special_tokens=False)

    ~/yanan/env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in encode(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, return_tensors, **kwargs) 2030 stride=stride, 2031 return_tensors=return_tensors, -> 2032 **kwargs, 2033 ) 2034

    ~/yanan/env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in encode_plus(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs) 2355 return_length=return_length, 2356 verbose=verbose, -> 2357 **kwargs, 2358 ) 2359

    ~/yanan/env/lib/python3.6/site-packages/transformers/tokenization_utils.py in _encode_plus(self, text, text_pair, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs) 458 return_special_tokens_mask=return_special_tokens_mask, 459 return_length=return_length, --> 460 verbose=verbose, 461 ) 462

    ~/yanan/env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in prepare_for_model(self, ids, pair_ids, add_special_tokens, padding, truncation, max_length, stride, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, prepend_batch_axis, **kwargs) 2792 padding=padding_strategy.value, 2793 pad_to_multiple_of=pad_to_multiple_of, -> 2794 return_attention_mask=return_attention_mask, 2795 ) 2796

    ~/yanan/env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in pad(self, encoded_inputs, padding, max_length, pad_to_multiple_of, return_attention_mask, return_tensors, verbose) 2591 else: 2592 raise ValueError( -> 2593 f"type of {first_element} unknown: {type(first_element)}. " 2594 f"Should be one of a python, numpy, pytorch or tensorflow object." 2595 )

    ValueError: type of None unknown: <class 'NoneType'>. Should be one of a python, numpy, pytorch or tensorflow object.

    opened by yananchen1989 2
  • Issues with pytorch_generation.py when running the Colab exercise

    Hi,

    I'm running the Colab exercise for pytorch.

    When I get to the last code block: !python2 generation.py --model seqlen256_v1.ckpt/model.ckpt-413000.data-00000-of-00001

    I'm getting this error:

    /usr/local/lib/python2.7/dist-packages/OpenSSL/crypto.py:14: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
      from cryptography import utils, x509
    246534 unique words
    tcmalloc: large alloc 1262256128 bytes == 0x55fbf8b14000 @ 0x7fde9e790b6b 0x7fde9e7b0379 0x7fde49688b4a 0x7fde4968a5fa 0x7fde4b9ba78a 0x7fde4bc0330b 0x7fde4bc4ab37 0x7fde4b9bb0b0 0x7fde4b9c4d95 0x7fde4bcfe973 0x7fde4bd42709 0x7fde94768c43 0x7fde9453fadb 0x55fbe9bf47ca 0x55fbe9bf1e0a 0x55fbe9c0d6a9 0x55fbe9c25e1e 0x55fbe9c259ca 0x55fbe9be2a6b 0x55fbe9bf9750 0x55fbe9bf1e0a 0x55fbe9bf1879 0x55fbe9c21f5f 0x55fbe9c1d222 0x55fbe9c1cc4d 0x55fbe9bcba86 0x7fde9e3abbf7 0x55fbe9bcb37a
    Loading vocabulary from vocab ...
    Read 6086453827 words (246531 unique) from vocabulary file.
    Loading codes from codes ...
    Read 200000 codes from the codes file.
    Could not find PyTorch checkpoint
    Converting weights and will store the PyTorch checkpoint as 5d11fdadc5aa3b7e75036e3df8c58c2b
    Traceback (most recent call last):
      File "pytorch_generation.py", line 114, in <module>
        reader = pywrap_tensorflow.NewCheckpointReader(chkpt_for_reader)
    AttributeError: 'module' object has no attribute 'NewCheckpointReader'

    I saw a previously filed issue, but it looks like that one was resolved for Tensorflow 3.

    opened by yaw2014 0
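Two of the issues above ("Running full model on V100 outputs last word" and "Repetitive generation for simple prompt") describe degenerate repetition, and the Hugging Face commands in the updates section pass `--repetition 1.2`, i.e. a repetition penalty. The following is a minimal NumPy sketch of such a penalty, not this repository's code: it assumes the sign-aware variant used by Hugging Face Transformers (positive logits of already-generated tokens are divided by the penalty, negative ones multiplied by it), and `apply_repetition_penalty` is a hypothetical helper.

```python
import numpy as np


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Discount the logits of tokens that were already generated (hypothetical helper).

    Positive logits are divided by `penalty` and negative logits multiplied by it,
    so a penalty > 1 never raises a repeated token's logit.
    """
    logits = np.array(logits, dtype=np.float64)  # copy: the caller's array is not modified
    ids = np.unique(np.asarray(generated_ids, dtype=np.int64))
    picked = logits[ids]
    logits[ids] = np.where(picked > 0, picked / penalty, picked * penalty)
    return logits


# Token 0 was generated before, so its logit drops from 3.0 to 1.5 and greedy
# decoding now picks token 1 instead of repeating token 0.
scores = apply_repetition_penalty([3.0, 2.0, -1.0], generated_ids=[0], penalty=2.0)
print(scores, int(np.argmax(scores)))
```

The sign-aware rule matters because dividing a negative logit by a penalty greater than 1 would move it toward zero and make the repeated token more likely, not less.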
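For "How to add new control code into vocabulary?": the `--control_code` help text says the code must be in the vocabulary. Below is a minimal sketch of that membership check, assuming a fastBPE-style vocabulary file with one `token count` pair per line (consistent with the "Read 6086453827 words (246531 unique)" message, but still an assumption about the file format). `load_vocab` and `is_single_token` are hypothetical helpers; the sketch only checks membership and does not show how to extend the model's embedding matrix for a genuinely new code.

```python
import os
import tempfile


def load_vocab(path):
    """Read a vocabulary file assumed to hold one 'token count' pair per line."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) == 2 and parts[1].isdigit():
                vocab[parts[0]] = int(parts[1])
    return vocab


def is_single_token(control_code, vocab):
    """A control code is usable as-is only if it is one whole vocabulary entry."""
    return control_code in vocab


# Tiny stand-in vocabulary file for illustration; the real file is far larger.
with tempfile.NamedTemporaryFile("w", suffix=".vocab", delete=False, encoding="utf-8") as tmp:
    tmp.write("Links 1000\nWikipedia 900\nMo@@ 50\n")
vocab = load_vocab(tmp.name)
os.remove(tmp.name)
print(is_single_token("Links", vocab), is_single_token("Moby", vocab))
```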
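For "Error when I make the tfrecords using the moby dick data": the traceback ends inside `fastBPE.apply` with `'bytes' object has no attribute 'encode'`. That suggests the script is running under Python 3, where `train_text.encode('ascii', errors='ignore')` yields `bytes`, while `apply` apparently expects `str` items and encodes them itself; the text's language is probably not the cause. One possible workaround, sketched below and untested against this repository, is to drop non-ASCII characters but decode back to `str` before calling `bpe.apply`. `to_ascii_str` is a hypothetical helper.

```python
def to_ascii_str(text):
    """Drop non-ASCII characters but keep the result a str, not bytes."""
    return text.encode("ascii", errors="ignore").decode("ascii")


train_text = "Call me Ishmael, na\u00efve reader."
cleaned = to_ascii_str(train_text)
print(type(cleaned).__name__, repr(cleaned))

# Hypothetical change to the line in make_tf_records.py (untested):
# tokenized_train_text = bpe.apply([to_ascii_str(train_text)])[0]
```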
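For "error when generating w. nucleus": the first warning is an overflow in `np.exp(prompt_logits[_token])`. Once a probability becomes `inf`, normalization produces NaNs, no cumulative sum compares greater than `nucleusprob`, and `np.where(...)[0][0]` indexes an empty array. A standard remedy is to subtract the largest logit before exponentiating. Below is a minimal NumPy sketch of that stable softmax plus a top-p (nucleus) cutoff; `stable_softmax` and `nucleus_size` are hypothetical helpers, and returning the count of kept tokens (rather than the index computed in `generation.py`) is a choice of this sketch.

```python
import numpy as np


def stable_softmax(logits):
    """Softmax that subtracts the largest logit first, so np.exp cannot overflow."""
    logits = np.asarray(logits, dtype=np.float64)
    exps = np.exp(logits - np.max(logits))  # largest term is exp(0) = 1
    return exps / exps.sum()


def nucleus_size(logits, nucleusprob, minimum_topk=1):
    """Number of most-probable tokens whose cumulative probability first reaches nucleusprob."""
    probs = stable_softmax(logits)
    cumulative = np.cumsum(np.sort(probs)[::-1])  # non-decreasing, so searchsorted is valid
    # Unlike np.where(...)[0][0], searchsorted always returns an index (at most len(probs)).
    idx = int(np.searchsorted(cumulative, nucleusprob))
    size = min(idx + 1, len(probs))
    return max(size, min(minimum_topk, len(probs)))


# Logits this large overflow a plain np.exp, but the shifted version stays finite.
logits = np.array([1000.0, 999.0, 10.0, -5.0])
print(stable_softmax(logits))
print(nucleus_size(logits, nucleusprob=0.9, minimum_topk=1))
```

Capping the result at `len(probs)` also covers the case where rounding keeps the final cumulative sum just below `nucleusprob`.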
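For "Issue with setting temperature": the proposed fix only changes how `prompt_logits` is indexed before `tf.random.categorical` draws a sample. Independently of that indexing, temperature sampling can be sketched in NumPy: divide the logits by the temperature, apply a stable softmax, and sample. `sample_with_temperature` is a hypothetical helper, and treating a temperature of 0 as greedy (argmax) decoding is an assumption of this sketch, chosen because the commands in the updates section pass `--temperature 0`.

```python
import numpy as np


def sample_with_temperature(logits, temperature, rng=None):
    """Draw one token index from softmax(logits / temperature); temperature <= 0 is greedy."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature <= 0:
        return int(np.argmax(logits))  # assumption: treat temperature 0 as greedy decoding
    rng = np.random.default_rng() if rng is None else rng
    scaled = logits / temperature  # < 1 sharpens, > 1 flattens the distribution
    scaled -= scaled.max()         # shift for numerical stability before exp
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
print(sample_with_temperature([2.0, 1.0, 0.5], temperature=0.7, rng=rng))
print(sample_with_temperature([2.0, 1.0, 0.5], temperature=0))
```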
Owner
Salesforce
A variety of vendor agnostic projects which power Salesforce
Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Indobenchmark Toolkit Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources fo

Samuel Cahyawijaya 11 Aug 26, 2022
A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This

Robert Bogan Kang 3 May 25, 2022
Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

null 41 Jan 3, 2023
RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

RoNER RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2. It is meant to be an easy to use, hi

Stefan Dumitrescu 9 Nov 7, 2022
Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding

Wav2Vec2CTC With KenLM Using KenLM ARPA language model with beam search to decode audio files and show the most probable transcription. Assuming you'v

farisalasmary 65 Sep 21, 2022
Conditional probing: measuring usable information beyond a baseline

John Hewitt 20 Dec 15, 2022
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

ASYML 2.3k Jan 7, 2023
An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

VizSeq is a Python toolkit for visual analysis on text generation tasks like machine translation, summarization, image captioning, speech translation

Facebook Research 409 Oct 28, 2022
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

ASYML 2.1k Feb 17, 2021
An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

VizSeq is a Python toolkit for visual analysis on text generation tasks like machine translation, summarization, image captioning, speech translation

Facebook Research 310 Feb 1, 2021
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Texar-PyTorch is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar

ASYML 726 Dec 30, 2022
MASS: Masked Sequence to Sequence Pre-training for Language Generation

Microsoft 1.1k Dec 17, 2022
A Survey of Natural Language Generation in Task-Oriented Dialogue System (TOD): Recent Advances and New Frontiers

Libo Qin 132 Nov 25, 2022
Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics.

Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics. Jury offers a smooth and easy-to-use interface. It uses datasets for underlying metric computation, and hence adding custom metric is easy as adopting datasets.Metric.

Open Business Software Solutions 129 Jan 6, 2023
Code for the paper "Flexible Generation of Natural Language Deductions"

Kaj Bostrom 12 Nov 11, 2022
A highly sophisticated sequence-to-sequence model for code generation

CoderX A proof-of-concept AI system by Graham Neubig (June 30, 2021). About CoderX CoderX is a retrieval-based code generation AI system reminiscent o

Graham Neubig 39 Aug 3, 2021
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing Trankit is a light-weight Transformer-based Pyth

null 652 Jan 6, 2023
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper

Zhilin Yang 3.3k Dec 28, 2022