An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Overview

GPT Neo


🎉 1T or bust my dudes 🎉

An implementation of model & data parallel GPT3-like models using the mesh-tensorflow library.

If you're just here to play with our pre-trained models, we strongly recommend you try out the HuggingFace Transformer integration.

Training and inference are officially supported on TPU and should work on GPU as well. This repository will be (mostly) archived as we move focus to our GPU-specific repo, GPT-NeoX.

In addition to the functionality offered by GPT-3, we also offer the following:

  • Local attention
  • Linear attention
  • Mixture of Experts
  • Axial positional embedding

NB: while Neo can technically run a training step at 200B+ parameters, it is very inefficient at those scales. This, as well as the fact that many GPUs became available to us, among other things, prompted us to move development over to GPT-NeoX.

Pretrained Models

Update 21/03/2021:

We're proud to release two pretrained GPT-Neo models trained on The Pile; the weights and configs can be freely downloaded from the-eye.eu.

1.3B: https://the-eye.eu/public/AI/gptneo-release/GPT3_XL/

2.7B: https://the-eye.eu/public/AI/gptneo-release/GPT3_2-7B/

For more information on how to get these set up, see the colab notebook, or read through the rest of the readme.
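
If you just want to sample from these checkpoints without setting up the mesh-tensorflow pipeline, the sketch below uses the HuggingFace Transformers integration mentioned above. It assumes transformers and PyTorch are installed; the prompt and generation settings are placeholders.

# minimal sketch of sampling via HuggingFace Transformers (assumes `pip install transformers torch`);
# the 1.3B checkpoint is several GB and is downloaded on first use
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
output = generator("EleutherAI is", do_sample=True, max_length=50)
print(output[0]["generated_text"])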

Model Evaluations

Linguistic Reasoning

| Model and Size | Pile BPB | Pile PPL | Wikitext PPL | Lambada PPL | Lambada Acc | Winogrande | Hellaswag |
|----------------|----------|----------|--------------|-------------|-------------|------------|-----------|
| GPT-Neo 125M   | -----    | -----    | 32.285       | 30.266      | 37.36%      | 50.43%     | 28.67%    |
| GPT-3 125M     | -----    | -----    | -----        | 18.6        | 42.7%       | 52.0%      | 33.7%     |
| GPT-Neo 350M   | -----    | -----    | 22.5657      | 13.876      | 47.27%      | 51.14%     | 32.16%    |
| GPT-3 350M     | -----    | -----    | -----        | 9.09        | 54.3%       | 52.1%      | 43.6%     |
| GPT-3 Ada      | 0.9631   | -----    | -----        | 9.954       | 51.60%      | 52.90%     | 35.93%    |
| GPT-Neo 1.3B   | 0.7527   | 6.159    | 13.10        | 7.498       | 57.23%      | 55.01%     | 38.66%    |
| GPT-3 1.3B     | -----    | -----    | -----        | 5.44        | 63.6%       | 58.7%      | 54.7%     |
| GPT-2 1.5B     | 1.0468   | -----    | 17.48        | 10.634      | 51.21%      | 59.40%     | 40.03%    |
| GPT-Neo 2.7B   | 0.7165   | 5.646    | 11.39        | 5.626       | 62.22%      | 56.50%     | 42.73%    |
| GPT-3 2.7B     | -----    | -----    | -----        | 4.60        | 67.1%       | 62.3%      | 62.8%     |

Physical and Scientific Reasoning

| Model and Size | MathQA | PubMedQA | Piqa   |
|----------------|--------|----------|--------|
| GPT-Neo 125M   | 22.78% | 55.10%   | 63.06% |
| GPT-3 125M     | -----  | -----    | 64.6%  |
| GPT-Neo 350M   | 23.45% | 53.80%   | 65.07% |
| GPT-3 350M     | -----  | -----    | 70.2%  |
| GPT-3 Ada      | 24.29% | 52.80%   | 68.88% |
| GPT-Neo 1.3B   | 24.05% | 54.40%   | 71.11% |
| GPT-3 1.3B     | -----  | -----    | 75.1%  |
| GPT-2 1.5B     | 23.64% | 58.33%   | 70.78% |
| GPT-Neo 2.7B   | 24.72% | 57.54%   | 72.14% |
| GPT-3 2.7B     | -----  | -----    | 75.6%  |

Note: All evaluations were done using our evaluation harness. Some results for GPT-2 and GPT-3 are inconsistent with the values reported in the respective papers. We are currently looking into why, and would greatly appreciate feedback and further testing of our eval harness.

Setup

git clone https://github.com/EleutherAI/GPTNeo
cd GPTNeo
pip3 install -r requirements.txt

Training Setup

TPUs:

Sign up for Google Cloud Platform, and create a storage bucket.

Create your VM through a google shell (https://ssh.cloud.google.com/) with ctpu up --vm-only so that it can connect to your Google bucket and TPUs and install the requirements with pip (see above).

Google Colab provides a TPU v2-8 for free, which should be enough to finetune our models up to GPT3XL (1.3B parameter) sizes. Click Open In Colab to run through our example colab notebook.

For more detailed instructions, run through our Training Guide below.

GPUs:

You can also choose to train GPTNeo locally on your GPUs. To do so, you can omit the Google cloud setup steps above, and git clone the repo locally. Run through the Training Guide below, then when running main.py, you simply have to omit the tpu flag, and pass in GPU ids instead.

Note: Some users have reported having difficulty getting MTF to recognize their GPUs. See here for details and instructions on how to fix it.

Generating Text

Once you have a trained model, or you've downloaded one of our pre-trained models, generating text is as simple as running the main.py script with the --predict flag on. You can pass a path to your prompt txt file with the --prompt flag, like so:

python3 main.py --predict --prompt <example_prompt.txt> --tpu <tpu_name> --model <config_name>

or, if using GPUs:

python3 main.py --predict --prompt <example_prompt.txt> --gpu_ids <device:GPU:0 device:GPU:1> --model <config_name>

Training Guide

1. Create your Tokenizer (OPTIONAL)

We recommend you use Huggingface's pretrained GPT2 tokenizer with our repo (instructions provided below), but if you want to train a model with a different vocabulary size, we provide facilities to train your own tokenizer like so:

python data/train_tokenizer.py \
    --base_dir ./path/to/your/txt/files \
    --output_dir ./output/path \
    --file_type txt \
    --vocab_size 50257

# if it succeeded, you should see the message
# 'tokenizer saved at ./output/path/byte-level-bpe.tokenizer.json'

2. Tokenizing your Dataset

If you just want to test training, you can skip this step and download some dummy data like so:

wget https://storage.googleapis.com/connors-datasets/bundestag/bundestag_0.tfrecords

Then copy the data to your bucket, or if using GPUs, a local directory:

gsutil cp bundestag_0.tfrecords gs://<your bucket>/

If using your own data to train, you can use the data/create_tfrecords.py script to encode your text data into tfrecords.

Your data must either be in the form of lots of normal .txt files (one document per file), or in any format supported by lm_dataformat.

You can run the script without parameters to see help for all options.

In document mode, each example in the tfrecords is one (variably sized) document. This mode is to be used with the documents_fixed and documents_random sampling modes (for more details, see the parameters reference section). Document mode is the default.

The command below will tokenize all files in acceptable formats in base_dir using the GPT2 tokenizer and save them to output_dir:

python3 create_tfrecords.py --mode documents --input_dir <base> --name <name> --output_dir <output> --use_gpt2_tokenizer --minimum_size <min> 
  • input_dir: Defines the folder where your data is located. The script will encode all files present in this folder.
  • name: Output files will be named name_i.tfrecords, where i is the number of the file.
  • output_dir: Where to save the tfrecords to.
  • use_gpt2_tokenizer: Whether to use the pretrained HuggingFace GPT2 tokenizer, in which case the separator will be set to [50256].
  • encoder_path: If not using the pretrained GPT2 tokenizer, use this flag to provide a path to your generated tokenizer json.
  • separator: Written in list format, the separator token(s) to insert between documents (e.g. "[0]"). Will depend on your encoder.
  • minimum_size: The minimum size (in tokens) a document must have, otherwise it is discarded. This is what will later determine your stitch parameter: stitch * minimum_size must always be greater than or equal to n_ctx (for more details, see the parameters reference section).
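
If you want to sanity-check the tokenized output (for example, to verify document lengths against minimum_size), you can read a few examples back with TensorFlow. The sketch below is an assumption-laden illustration: it presumes each tfrecord example stores its token ids under an int64 feature named "text", which may differ depending on your version of the script.

# hedged sketch for peeking at tokenized tfrecords; assumes the token ids live in
# an int64 feature called "text" -- adjust the key and path for your setup
import tensorflow as tf

def inspect_tfrecords(path, n=3):
    for raw in tf.data.TFRecordDataset([path]).take(n):
        example = tf.train.Example()
        example.ParseFromString(raw.numpy())
        tokens = example.features.feature["text"].int64_list.value
        print(f"document length: {len(tokens)} tokens, first ids: {list(tokens[:10])}")

inspect_tfrecords("./output/mydataset_0.tfrecords")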

3. Using a Dataset in a Model

To use a dataset in a model, you must first register that dataset under the ./configs/dataset_configs folder. First choose a filename with a .json extension. That filename will serve as the dataset identifier. The config should be filled out in the following manner.

If you have a dataset encoded using the pretrained gpt2 tokenizer, you can specify that like so:

{
    "n_vocab": 50257,
    "path": "gs://neo-datasets/openwebtext-documents/openwebtext_*.tfrecords",
    "eval_path": "gs://neo-datasets/openwebtext-documents/openwebtext_*.tfrecords",
    "tokenizer_is_pretrained": true,
    "tokenizer_path": "gpt2"
}

or if you've trained a custom tokenizer, like so:

{
    "n_vocab": 32768,
    "path": "./path/to/your/*.tfrecords",
    "eval_path": "./path/to/your/eval/*.tfrecords",
    "tokenizer_path": "./path/to/your/byte-level-bpe.tokenizer.json"
}

Finally, in your model config, add the filename that you created above to the datasets array.

The <dataset id> will be the filename, excluding the .json, that you created above

"datasets": [[<dataset id>, <stitch>, <datatype>, <weight>]] # datasets key defines at run time how each dataset is processed for training

4. Choose a model configuration

Once you have your datasets set up, find a suitable config in /configs.

Here we use a GPT3-XL sized model as an example, but there are many more in ./configs, all of which have short summaries in the Available Configs section.

All you need to do is edit the dataset id as described above, and edit model_path (where logs and checkpoints will be saved) to point to a cloud bucket you have write access to (or local path, if using GPUs).

{
    "n_head": 32,
    "n_vocab": 50257,
    "embed_dropout": 0.1,
    "lr": 0.0002,
    "lr_decay": "cosine",
    "warmup_steps": 3000,
    "beta1": 0.9,
    "beta2": 0.95,
    "epsilon": 1e-8,
    "opt_name": "adam",
    "weight_decay": 0.1,
    "train_batch_size": 512,
    "attn_dropout": 0.1,
    "train_steps": 286150,
    "eval_steps": 0,
    "predict_steps": 1,
    "res_dropout": 0.1,
    "eval_batch_size": 128,
    "predict_batch_size": 1,
    "iterations": 2500,
    "n_embd": 2048,
    "datasets": [["your_dataset_name", 25, "documents_random", 1.0]],
    "model_path": "gs://neo-models/GPT3_XL",
    "n_ctx": 2048,
    "n_layer": 24,
    "scale_by_depth": true,
    "scale_by_in": false,
    "attention_types" :  [[["global"],24]],
    "mesh_shape": "x:128,y:2",
    "layout": "batch:x,memory_length:y,embd:y",
    "activation_function": "gelu",
    "recompute_grad": true,
    "gradient_clipping": 1.0,
    "tokens_per_mb_per_replica": 2048
}
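
Before kicking off a run, it can be worth checking a config against the constraints described in this readme: n_embd must be divisible by n_head, and the mesh_shape must cover exactly the number of processors you are running on. The sketch below is illustrative only (the config path and core count are assumptions), not a utility shipped with the repo.

# illustrative config sanity check; the path and num_cores are assumptions
import json

def check_config(path, num_cores):
    with open(path) as f:
        cfg = json.load(f)
    assert cfg["n_embd"] % cfg["n_head"] == 0, "n_embd must be divisible by n_head"
    mesh_size = 1
    for dim in cfg["mesh_shape"].split(","):   # e.g. "x:128,y:2" -> 128 * 2 = 256 cores
        mesh_size *= int(dim.split(":")[1])
    assert mesh_size == num_cores, f"mesh_shape covers {mesh_size} cores, expected {num_cores}"
    print("tokens per training batch:", cfg["train_batch_size"] * cfg["n_ctx"])

check_config("./configs/your_config.json", num_cores=256)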

5. Run Training

python3 main.py --model <your_config_name> --steps_per_checkpoint <n> --tpu <tpu-name>
  • tpu: Name of the TPU to use.
  • steps_per_checkpoint: The frequency in steps at which to save checkpoints.
  • --auto_layout and --auto_layout_and_mesh_shape (Optional): Disable training and instead auto generate a memory efficient layout (and mesh_shape)
  • gpu_ids: If training using GPUs, omit the tpu flag and pass in the ids of your GPUs. In the example below, we train on two GPUs, specifying their device ids delimited by spaces:
python3 main.py --model <your_config_name> --steps_per_checkpoint <n> --gpu_ids <device:GPU:0 device:GPU:1>

Available Configs

We have several model sizes available, but some of our configs require large TPUs and will need tweaking to run on smaller machines, or GPUs. Below is a short guide to each model in the configs directory:

TODO

Extra Features:

Training (with Sacred)

Sacred helps track experiments and is much nicer to work with than tensorboard.

To setup:

  1. Install Docker and Docker-compose

  2. Run docker-compose up

To use:

  1. Ensure model_dir doesn't have any metric logs in it (stale logs trip up TensorBoard's metrics, which assume the run is a continuation of the existing one). You can use gsutil rm -r ... to delete the model dir.

  2. Run python3 run_experiment.py --tpu sometpuhere --model someconfig.json. Options are the same as for main.py.

  3. You can go to http://server_ip_goes_here:8081/ to see the Omniboard overview. If you prefer to see a tensorboard, the script also spins one up and automatically assigns it a port. The script should print out the tensorboard port near the top of the log.

Peeking at a Dataset

If you are ever confused by the dataset of a particular config file, you can easily check the minimum and maximum token ids with a single command. This is useful for making sure that the vocabulary size of the model is at least as large as the maximum token id. Tensorflow will not error if you try to gather on a matrix with out of bounds indices, so you need to make sure your vocabulary size is sufficiently large.

python3 main.py --model <config_name> --check_dataset

Masked Language Modeling

In addition to being able to train large GPTs, this repository also allows you to easily do masked language modeling (BERT, RoBERTa). In order to do so, you must follow two additional steps.

  1. When tokenizing your dataset, you must reserve a special id for the [mask] token.

  2. In the configs, you will have to define two additional fields

"mlm_training": true,                           # must be set to true
"mlm_mask_id": <mask id>                        # the mask id that you reserved from above

That's all you need to train a model with the MLM objective, good for any type of data that you have encoded properly. If you would like to tweak the other related hyperparameters, please continue reading.

"mlm_cls_token_id": <cls token id>,                # auto append specified CLS token id on the left
"mlm_mask_prob": 0.15,                             # the probability of masking a token, defaults to 15%
"mlm_same_token_prob": 0.10,                       # probability of keeping the token the same, defaults to 10%
"mlm_random_token_prob": 0.10,                     # probability of tokens that are replaced with random tokens, 10% was recommended by the BERT paper
"mlm_mask_ignore_ids": [<cls token>, <sep token>]  # ignore masking other special tokens, if any

Parameter Reference

Pick a valid config from /configs and tweak the parameters as needed:

  • n_head: The number of attention heads.
  • n_embd: Size of the hidden layers, must be divisible by n_head.
  • n_vocab: Vocabulary size.
  • embed_dropout, res_dropout, attn_dropout: Dropout probability for word embedding/residuals/attention
  • lr: Learning rate
  • warmup_steps: Number of steps before full learning rate is reached (linear ramp from 0 to lr).
  • lr_decay: cosine or linear.
  • opt_name: adam or adafactor.
  • beta1, beta2 and epsilon: adam optimizer params.
  • beta1, ada_epsilon1 and ada_epsilon2: adafactor optimizer params.
  • weight_decay: Weight decay parameter; if not present, no weight decay is used (the weight decay fix for Adam is applied when set) (default: 0.01) (optional).
  • train_batch_size: Batch size during training.
  • train_steps: Number of training steps (batches), set to roughly ~1 epoch for now (total number of tokens in your dataset / number of tokens per batch (= train_batch_size * n_ctx); see the worked example after this list).
  • eval_steps: Number of steps to run for each evaluation. Set to 0 for no eval; i.e., after every checkpoint, the model is evaluated for eval_steps steps.
  • iterations: Number of steps queued to the TPU, must be smaller than steps_per_checkpoint. (default: 500)
  • datasets: List of tfrecords datasets to use. Each dataset is a list with the following parameters: [dataset_id, stitch, sampling_mode, weight]. So, for example, for a single dataset (note the double list): [["bundestag", 10, "documents_random", 1.0]]
    • dataset_id: The name of a dataset configuration file in ./configs/dataset_configs
    • stitch: If the documents_random sampling mode is used, the input pipeline concatenates this many documents into one text to sample from. You must select stitch so that stitch * minimum_document_length >= n_ctx
    • sampling_mode: chunks (tfrecords are preprocessed into the correct length and are read sequentially) or documents_random (stitch documents are concatenated and then an n_ctx-sized chunk is randomly subsampled)
    • weights: How much relative weight this dataset should have compared to others
  • model: Which model to train. Currently only GPT is supported, and it defaults to this if not present.
  • model_path: Google storage bucket location (or local path, if using GPUs) to save model checkpoints and logs.
  • n_ctx: Size of context window. Default is 2048
  • n_layer: Number of layers (blocks) in the model.
  • scale_by_depth: If true, the weight initialization of layers are scaled by their depth as in the GPT2 paper.
  • scale_by_in: If true, the weight initialization of layers are scaled by their number of inputs as in the GPT2 paper.
  • mesh_shape: A Mesh is an n-dimensional array of processors with named dimensions used for parallelism in the mesh-tensorflow library. Each Tensor is split evenly across mesh dimensions according to the layout (see below). The 'mesh_shape' is the shape of this array, and its total size (the product of the dimension sizes) must equal the number of processors, e.g. for a v3-128 TPU "mesh_shape": "x:16,y:8".
  • layout: A Tensor is laid out on its mesh with one slice on each processor. A Tensor "layout" is an injective partial map specifying which dimensions of the tensor are (evenly) split across which dimensions of the mesh. No dimension of a tensor may be split across two dimensions of its mesh and no two dimensions of a tensor may be split across the same dimension of its mesh. The user defines a global set of layout rules in the form of (tensor-dimension-name, mesh-dimension-name) pairs. A dimension of a tensor is split across a dimension of its mesh if there is a matching rule, e.g. (for the above example mesh_shape) "layout": "batch:x,heads:y".
  • activation_function: selu (self normalizing) or gelu (used by OpenAI), the activation function used in feed-forward passes. (default: gelu)
  • attention_types: the type of attention for each layer in a list of the following format [[["attention_type"], n_layers]]. e.g. for a 12 layer net [[["global"], 12]] or [[["local"], 10], [["global"], 2]].
    • Choose from: linear, global, local or none. We have found a 50/50 mix of global and linear to work well. none allows you to create feed-forward only layers for more efficient PAR Transformer models.
  • precision: float32 or bfloat16.
  • tokens_per_mb_per_replica: If not None, will split the batch up into smaller microbatches containing tokens_per_mb_per_replica tokens to avoid OOMs. Gradients are accumulated locally and reduced once. IMPORTANT: mb refers to microbatch, not megabyte here.
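
To see how these knobs interact, here is a short worked example using the GPT3_XL numbers from the config above. The dataset size and the number of data-parallel replicas are assumptions for illustration only.

# worked example of the batch-size bookkeeping (illustrative, assumed numbers)
train_batch_size = 512
n_ctx = 2048
tokens_per_batch = train_batch_size * n_ctx            # 1,048,576 tokens per training step

dataset_tokens = 300_000_000_000                       # assumed dataset size of ~300B tokens
train_steps = dataset_tokens // tokens_per_batch       # ~286,102 steps for roughly one epoch

# microbatching: each replica's slice of the batch is split into chunks of
# tokens_per_mb_per_replica tokens and gradients are accumulated across them
tokens_per_mb_per_replica = 2048
num_replicas = 128                                     # assumed: batch dim split along mesh dim x:128
tokens_per_replica = tokens_per_batch // num_replicas  # 8,192 tokens per replica per step
microbatches_per_step = tokens_per_replica // tokens_per_mb_per_replica   # 4 microbatches
print(train_steps, microbatches_per_step)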

Mixture of Experts

  • moe_layers: A list of layer numbers to append a mixture of experts layer onto, e.g. [2, 4, 6, 8, 10, 12]. We have experimentally found a MoE layer for every two self-attention layers to work well.
  • moe_params: A dictionary of additional kwargs to pass in to the MoE layer, e.g. {"moe_dropout_rate": 0.0}

Experimental features

  • axial_pos_emb_: If true, uses [axial positional embedding](https://arxiv.org/abs/1912.12180).
  • mlp_glu: If true, uses a gated linear unit variant of feed forward layers.
  • scalenorm: If true, uses scalenorm instead of layernorm.
  • rezero: If true, uses rezero instead of layernorm.
  • num_mem_kv: adds memory / key values from the all-attention paper. Param is an int with the number of desired mem/key values.
  • macaron: if true - uses a macaron transformer for each layer block.

TODO:

  • finalize documentation
  • update configs

Citing GPT-Neo

If you have found GPT-Neo helpful in your work, you can cite this repository as

@software{gpt-neo,
  author       = {Black, Sid and
                  Gao, Leo and
                  Wang, Phil and
                  Leahy, Connor and
                  Biderman, Stella},
  title        = {{GPT-Neo: Large Scale Autoregressive Language 
                   Modeling with Mesh-Tensorflow}},
  month        = mar,
  year         = 2021,
  note         = {{If you use this software, please cite it using 
                   these metadata.}},
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.5297715},
  url          = {https://doi.org/10.5281/zenodo.5297715}
}

The version number should be replaced with the version number you are using, and the year corresponds to the project's open-source release.

If you are specifically interested in citing the GPT-Neo models trained on the Pile, we would appreciate also citing

@article{gao2020pile,
  title={The Pile: An 800GB Dataset of Diverse Text for Language Modeling},
  author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and others},
  journal={arXiv preprint arXiv:2101.00027},
  year={2020}
}
Comments
  • Getting errors when running the command to generate text

    Getting errors when running the command to generate text

    Hi,

    I downloaded this pre-trained model on my system: https://the-eye.eu/public/AI/gptneo-release/GPT3_XL/

    I modified the config.json file to look like below:

    {
    "n_head" : 16,
    "n_vocab" : 50257,
    "embed_dropout" : 0,
    "lr" : 0.0002,
    "lr_decay" : "cosine",
    "warmup_steps" : 3000,
    "beta1" : 0.9,
    "beta2" : 0.95,
    "epsilon" : 1e-08,
    "opt_name" : "adam",
    "weight_decay" : 0,
    "train_batch_size" : 512,
    "attn_dropout" : 0,
    "train_steps" : 400000,
    "lr_decay_end" : 300000,
    "eval_steps" : 10,
    "predict_steps" : 0,
    "res_dropout" : 0,
    "eval_batch_size" : 128,
    "predict_batch_size" : 128,
    "iterations" : 500,
    "n_embd" : 2048,
    "datasets" : [["pile", null, null, null]],
    "model_path" : "D:\\gpt-neo\\the-eye.eu\\public\\AI\\gptneo-release\\GPT3_XL",
    "n_ctx" : 2048,
    "n_layer" : 24,
    "scale_by_depth" : true,
    "scale_by_in" : false,
    "attention_types" : [[["global", "local"], 12]],
    "mesh_shape" : "x:128,y:2",
    "layout" : "batch:x,memory_length:y,embd:y",
    "activation_function" : "gelu",
    "recompute_grad" : true,
    "gradient_clipping" : 1.0,
    "tokens_per_mb_per_replica" : 4096,
    "precision" : "bfloat16",
    "padding_id" : 50257,
    "eos_id" : 50256
    }
    

    After that, I ran the following command:

    main.py --predict --prompt "I like Apples" --tpu "device:CPU:0" --model "D:\gpt-neo\the-eye.eu\public\AI\gptneo-release\GPT3_XL\config.json" 
    

    Running the above command gives me the following errors

    [error screenshots omitted]

    How should I proceed next?

    Thanks. :)

    bug 
    opened by baljeetrathi 22
  • CrossShardOptimizer must be used for model training on TPUs

    CrossShardOptimizer must be used for model training on TPUs

    Running the example on a Colab TPU results in the following error:

    File "main.py", line 256, in <module>
        main(args)
      File "main.py", line 230, in main
        estimator.train(input_fn=partial(input_fn, global_step=current_step, eval=False), max_steps=next_checkpoint)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3130, in train
        rendezvous.raise_errors()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
        six.reraise(typ, value, traceback)
      File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
        raise value
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3125, in train
        saving_listeners=saving_listeners)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
        loss = self._train_model(input_fn, hooks, saving_listeners)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
        return self._train_model_default(input_fn, hooks, saving_listeners)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1204, in _train_model_default
        self.config)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2962, in _call_model_fn
        config)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1163, in _call_model_fn
        model_fn_results = self._model_fn(features=features, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3386, in _model_fn
        _validate_tpu_training_graph(ctx)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3817, in _validate_tpu_training_graph
        'CrossShardOptimizer must be used for model training on TPUs.')
    ValueError: CrossShardOptimizer must be used for model training on TPUs.
    
    bug 
    opened by StrangeTcy 17
  • lambada changes

    lambada changes

    I added the LAMBADA task, as interpreted by OpenAI in the GPT-2 and GPT-3 papers, to GPTNeo.

    To use it, add "eval_tasks": ["lambada"] to your config file, and "lambada_tokens_path": "SOME PATH" to at least one of your dataset config files. The LAMBADA dataset will automatically be downloaded, tokenized with the current encoder, and saved to SOME PATH on first use. (Currently, only local paths, not gs:// paths, are supported.) At each checkpoint, training will pause and the program will use a special estimator to evaluate the model's accuracy and perplexity on LAMBADA.

    Note that the LAMBADA task, as interpreted by OpenAI, differs in some respects from what was described in the original LAMBADA paper. In particular, for OpenAI, the task is to predict the last token of the example, not the last word. For quite a few examples in the LAMBADA dataset, the last token is only part of the last word, so the model gets an extra "hint" about what the last word might be. Also, this means that what the task is actually depends on the BPE vocabulary being used. (OpenAI's version of the task is also easier because they allow the model to see the natural punctuation of the example. The original LAMBADA dataset was stripped of punctuation.)

    In preparing this pull request, I felt sometimes that I was going against the grain of what Tensorflow intends. Right now, the code makes an extra TPUEstimator for the LAMBADA evaluation, but I wonder if it would be better to feed both the LAMBADA examples and the normal evaluation examples into the same estimator, with an extra feature on each example to specify which collection it belongs to. But this would lead to some complications in the metrics accumulation at the end of model_fn. Also, the LAMBADA evaluation step is apparently forced to produce a "loss" even though this is not intended.

    opened by kevinwatkins 15
  • ValueError when predicting with pretrained models

    ValueError when predicting with pretrained models

    Describe the bug When using GPT3XL to perform inference with the --predict flag as shown in examples, the following error is thrown

    ValueError: Argument not a list with same length as devices arg=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255] devices=['device:GPU:0']
    

    This is with a single GTX 1070 GPU.

    Commands that both produced this error were:

    python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt
    python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt --gpu_ids=['device:GPU:0']

    bug 
    opened by iocuydi 13
  • Can't generate samples from pre-trained GPT3_XL using main.py without errors

    Can't generate samples from pre-trained GPT3_XL using main.py without errors

    Describe the bug When I run the provided colab notebook so as to sample from a pre-trained model, I get this error:

    FileNotFoundError: [Errno 2] No such file or directory: 'configs/GPT3_XL.json'

    To Reproduce When I go through the steps to generate samples from a pre-trained model without fine-tuning, I do fine until I try to generate the predictions. Specifically, when I run

    !main.py --model $pretrained_model --steps_per_checkpoint 500 --tpu colab --predict --prompt example_prompt.txt

    I get an error:

    python3: can't open file 'main.py': [Errno 2] No such file or directory

    Specifying the full filepath for main.py solves this, but I then need to manually install mesh_tensorflow, tokenizers, and ftfy with pip, which I presumably shouldn't have to do? Once this has been done, I run the cell and get an error:

    FileNotFoundError: [Errno 2] No such file or directory: 'configs/GPT3_XL.json'

    This is no surprise, as this directory does not contain this .json file. But I don't know how to proceed from here, or know what I'm doing wrong to get this result?

    Expected behavior I had expected to be able to generate predictions from the pre-trained GPT3_XL model on the basis of the text prompts supplied.

    Screenshots: [screenshot omitted]

    Environment (please complete the following information):

    • GPUs: None, running TPU
    • Configs:
    bug 
    opened by texturejc 12
  • Using gpt neo checkpoint

    Using gpt neo checkpoint

    Hi, I downloaded GPT-Neo from the-eye.eu on my PC. It downloaded various checkpoints. How do I use them? ... Because in order to load and use the model I'd need encoder.json, pytorch.bin, etc.

    bug 
    opened by MK096 9
  • Anyone want to collaborate on tabnine-style auto completion?

    Anyone want to collaborate on tabnine-style auto completion?

    Tabnine was the best autocomplete I ever used, but it's closed source and increasingly expensive. Creating something similar would be a very large project. I'm available to help out a little bit, particularly with data-preparation, and probably with langserver development.

    This seems like a good place to co-ordinate and find other people interested in doing that, so hopefully my opening an issues here isn't a problem.

    feature request 
    opened by traverseda 9
  • There is no `--mode` switch in `data/generate_tfrecord.py`

    There is no `--mode` switch in `data/generate_tfrecord.py`

    Even though --mode documents is both mentioned in the README and used in the Colab notebook, the script data/generate_tfrecord.py does not actually define this parameter.

    documentation 
    opened by JanPokorny 8
  • Failed to install requirements

    Failed to install requirements

    I got this error when installing the requirements. Can anyone help?

    $ pip3 install -r requirements.txt
    ...
      Could not find a version that satisfies the requirement tensorflow==2.4.0 (from -r requirements.txt (line 10)) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
    No matching distribution found for tensorflow==2.4.0 (from -r requirements.txt (line 10))
    

    My system is: Debian 10.8 64bit, Python 3.7.3.

    bug 
    opened by notooth1 8
  • Can't load GPT3_XL

    Can't load GPT3_XL

    Hi All, I downloaded the model from https://the-eye.eu/public/AI/gptneo-release/GPT3_XL/

    after which i changed model_path in config.json to: "model_path" : "C:\Users\GPT_NEO_2\GPT3_XL"

    Whenever i run the following code: model = GPTNeoForCausalLM.from_pretrained("C:\Users\GPT_NEO_2\GPT3_XL")

    i get an error: f"Error no file named {[WEIGHTS_NAME, TF2_WEIGHTS_NAME, TF_WEIGHTS_NAME + '.index', FLAX_WEIGHTS_NAME]} found in " OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'] found in directory C:\Users\GPT_NEO_2\GPT3_XL or from_tf and from_flax set to False.

    and while running : generator = pipeline('text-generation', model="C:\Users\GPT_NEO_2\GPT3_XL")

    i get following error: f"Unrecognized model in {pretrained_model_name_or_path}. "

    I have the latest TF and torch (both cpu).

    Thanks

    I have aloso attached my config.json config

    bug 
    opened by MK096 7
  • GPT Neo Colab: no such file or directory

    GPT Neo Colab: no such file or directory "data/create_tfrecords.py"

    Describe the bug In GPTNeo_example_notebook.ipynb, getting an error in cell 8: python3: can't open file 'data/create_tfrecords.py': [Errno 2] No such file or directory

    To Reproduce Steps to reproduce the behavior:

    1. Go to https://colab.research.google.com/github/EleutherAI/GPTNeo/blob/master/GPTNeo_example_notebook.ipynb
    2. Link with Google Cloud
    3. Select "Custom" data set and enter relative path and filename from cloud bucket
    4. See error

    Expected behavior When running cell 8, the data referenced above is tokenized and later available

    Proposed solution It doesn't look like there is a data/ directory included with the notebook

    Screenshots: [screenshot omitted]

    Environment (please complete the following information): Google Colab

    Additional context

    bug 
    opened by audiodude 7
  • Cannot Connect To Local TPU-VM

    Cannot Connect To Local TPU-VM

    Describe the bug When I try to connect to the TPU to finetune, it gives me this error:

    Traceback (most recent call last):
      File "main.py", line 257, in <module>
        main(args)
      File "main.py", line 251, in main
        estimator.train(input_fn=partial(input_fn, global_step=current_step, eval=False), max_steps=params["train_steps"])
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3110, in train
        rendezvous.raise_errors()
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
        six.reraise(typ, value, traceback)
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/six.py", line 703, in reraise
        raise value
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3100, in train
        return super(TPUEstimator, self).train(
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 346, in train
        hooks.extend(self._convert_train_steps_to_hooks(steps, max_steps))
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2973, in _convert_train_steps_to_hooks
        if ctx.is_running_on_cpu():
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 531, in is_running_on_cpu
        self._validate_tpu_configuration()
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 699, in _validate_tpu_configuration
        num_cores = self.num_cores
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 429, in num_cores
        metadata = self._get_tpu_system_metadata()
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 333, in _get_tpu_system_metadata
        tpu_system_metadata_lib._query_tpu_system_metadata(
      File "/home/nikhilnayak/.local/lib/python3.8/site-packages/tensorflow/python/tpu/tpu_system_metadata.py", line 135, in _query_tpu_system_metadata
        raise RuntimeError(
    RuntimeError: Cannot find any TPU cores in the system (master address ). This usually means the master address is incorrect or the TPU worker has some problems. Available devices: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, -3188567715276368833)]
    

    To Reproduce Steps to reproduce the behavior: I followed the instructions for finetuning on this github page.

    Expected behavior The finetuning program should finetune with my dataset without errors.

    Proposed solution N/A

    Environment (please complete the following information):

    • TPU Version: v2-alpha
    • TPU Type: v3-8
    • Architecture: TPU-VM
    bug 
    opened by nikhilanayak 1
  • Bump tensorflow from 2.5.1 to 2.5.3

    Bump tensorflow from 2.5.1 to 2.5.3

    Bumps tensorflow from 2.5.1 to 2.5.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.5.3

    Release 2.5.3

    Note: This is the last release in the 2.5 series.

    This releases introduces several vulnerability fixes:

    • Fixes a floating point division by 0 when executing convolution operators (CVE-2022-21725)
    • Fixes a heap OOB read in shape inference for ReverseSequence (CVE-2022-21728)
    • Fixes a heap OOB access in Dequantize (CVE-2022-21726)
    • Fixes an integer overflow in shape inference for Dequantize (CVE-2022-21727)
    • Fixes a heap OOB access in FractionalAvgPoolGrad (CVE-2022-21730)
    • Fixes an overflow and divide by zero in UnravelIndex (CVE-2022-21729)
    • Fixes a type confusion in shape inference for ConcatV2 (CVE-2022-21731)
    • Fixes an OOM in ThreadPoolHandle (CVE-2022-21732)
    • Fixes an OOM due to integer overflow in StringNGrams (CVE-2022-21733)
    • Fixes more issues caused by incomplete validation in boosted trees code (CVE-2021-41208)
    • Fixes an integer overflows in most sparse component-wise ops (CVE-2022-23567)
    • Fixes an integer overflows in AddManySparseToTensorsMap (CVE-2022-23568)
    • Fixes a number of CHECK-failures in MapStage (CVE-2022-21734)
    • Fixes a division by zero in FractionalMaxPool (CVE-2022-21735)
    • Fixes a number of CHECK-fails when building invalid/overflowing tensor shapes (CVE-2022-23569)
    • Fixes an undefined behavior in SparseTensorSliceDataset (CVE-2022-21736)
    • Fixes an assertion failure based denial of service via faulty bin count operations (CVE-2022-21737)
    • Fixes a reference binding to null pointer in QuantizedMaxPool (CVE-2022-21739)
    • Fixes an integer overflow leading to crash in SparseCountSparseOutput (CVE-2022-21738)
    • Fixes a heap overflow in SparseCountSparseOutput (CVE-2022-21740)
    • Fixes an FPE in BiasAndClamp in TFLite (CVE-2022-23557)
    • Fixes an FPE in depthwise convolutions in TFLite (CVE-2022-21741)
    • Fixes an integer overflow in TFLite array creation (CVE-2022-23558)
    • Fixes an integer overflow in TFLite (CVE-2022-23559)
    • Fixes a dangerous OOB write in TFLite (CVE-2022-23561)
    • Fixes a vulnerability leading to read and write outside of bounds in TFLite (CVE-2022-23560)
    • Fixes a set of vulnerabilities caused by using insecure temporary files (CVE-2022-23563)
    • Fixes an integer overflow in Range resulting in undefined behavior and OOM (CVE-2022-23562)
    • Fixes a vulnerability where missing validation causes tf.sparse.split to crash when axis is a tuple (CVE-2021-41206)
    • Fixes a CHECK-fail when decoding resource handles from proto (CVE-2022-23564)
    • Fixes a CHECK-fail with repeated AttrDef (CVE-2022-23565)
    • Fixes a heap OOB write in Grappler (CVE-2022-23566)
    • Fixes a CHECK-fail when decoding invalid tensors from proto (CVE-2022-23571)
    • Fixes an unitialized variable access in AssignOp (CVE-2022-23573)
    • Fixes an integer overflow in OpLevelCostEstimator::CalculateTensorSize (CVE-2022-23575)
    • Fixes an integer overflow in OpLevelCostEstimator::CalculateOutputSize (CVE-2022-23576)
    • Fixes a null dereference in GetInitOp (CVE-2022-23577)
    • Fixes a memory leak when a graph node is invalid (CVE-2022-23578)
    • Fixes an abort caused by allocating a vector that is too large (CVE-2022-23580)
    • Fixes multiple CHECK-failures during Grappler's IsSimplifiableReshape (CVE-2022-23581)
    • Fixes multiple CHECK-failures during Grappler's SafeToRemoveIdentity (CVE-2022-23579)
    • Fixes multiple CHECK-failures in TensorByteSize (CVE-2022-23582)
    • Fixes multiple CHECK-failures in binary ops due to type confusion (CVE-2022-23583)

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.5.3

    This releases introduces several vulnerability fixes:

    • Fixes a floating point division by 0 when executing convolution operators (CVE-2022-21725)
    • Fixes a heap OOB read in shape inference for ReverseSequence (CVE-2022-21728)
    • Fixes a heap OOB access in Dequantize (CVE-2022-21726)
    • Fixes an integer overflow in shape inference for Dequantize (CVE-2022-21727)
    • Fixes a heap OOB access in FractionalAvgPoolGrad (CVE-2022-21730)
    • Fixes an overflow and divide by zero in UnravelIndex (CVE-2022-21729)
    • Fixes a type confusion in shape inference for ConcatV2 (CVE-2022-21731)
    • Fixes an OOM in ThreadPoolHandle (CVE-2022-21732)
    • Fixes an OOM due to integer overflow in StringNGrams (CVE-2022-21733)
    • Fixes more issues caused by incomplete validation in boosted trees code (CVE-2021-41208)
    • Fixes an integer overflows in most sparse component-wise ops (CVE-2022-23567)
    • Fixes an integer overflows in AddManySparseToTensorsMap (CVE-2022-23568)
    • Fixes a number of CHECK-failures in MapStage (CVE-2022-21734)
    • Fixes a division by zero in FractionalMaxPool (CVE-2022-21735)
    • Fixes a number of CHECK-fails when building invalid/overflowing tensor shapes (CVE-2022-23569)
    • Fixes an undefined behavior in SparseTensorSliceDataset (CVE-2022-21736)
    • Fixes an assertion failure based denial of service via faulty bin count operations (CVE-2022-21737)
    • Fixes a reference binding to null pointer in QuantizedMaxPool (CVE-2022-21739)
    • Fixes an integer overflow leading to crash in SparseCountSparseOutput (CVE-2022-21738)
    • Fixes a heap overflow in SparseCountSparseOutput (CVE-2022-21740)
    • Fixes an FPE in BiasAndClamp in TFLite (CVE-2022-23557)
    • Fixes an FPE in depthwise convolutions in TFLite (CVE-2022-21741)

    ... (truncated)

    Commits
    • 959e9b2 Merge pull request #54213 from tensorflow/fix-sanity-on-r2.5
    • d05fcbc Fix sanity build
    • f2526a0 Merge pull request #54205 from tensorflow/disable-flaky-tests-on-r2.5
    • a5f94df Disable flaky test
    • 7babe52 Merge pull request #54201 from tensorflow/cherrypick-510ae18200d0a4fad797c0bf...
    • 0e5d378 Set Env Variable to override Setuptools new behavior
    • fdd4195 Merge pull request #54176 from tensorflow-jenkins/relnotes-2.5.3-6805
    • 4083165 Update RELEASE.md
    • a2bb7f1 Merge pull request #54185 from tensorflow/cherrypick-d437dec4d549fc30f9b85c75...
    • 5777ea3 Update third_party/icu/workspace.bzl
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • IndexError: index out of range in self

    IndexError: index out of range in self

    Describe the bug I'm trying to generate a long text just to play with the library on colab with TPU runtime (or without)

    To Reproduce Steps to reproduce the behavior:

    1. Install transformers pip install transformers
    2. Import pipeline from transformers import pipeline
    3. Download model generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')
    4. Generate text
    prompt = "Once upon a time"
    generator(prompt, do_sample=True, min_length=50, max_length=4000)
    

    I get this error after very long time

    Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    [<ipython-input-13-db32a0d013d2>](https://localhost:8080/#) in <module>()
          1 prompt = "Once upon a time"
    ----> 2 generator(prompt, do_sample=True, min_length=50, max_length=4000)
    
    14 frames
    [/usr/local/lib/python3.7/dist-packages/transformers/pipelines/text_generation.py](https://localhost:8080/#) in __call__(self, text_inputs, **kwargs)
        169               ids of the generated text.
        170         """
    --> 171         return super().__call__(text_inputs, **kwargs)
        172 
        173     def preprocess(self, prompt_text, prefix="", handle_long_generation=None, **generate_kwargs):
    
    [/usr/local/lib/python3.7/dist-packages/transformers/pipelines/base.py](https://localhost:8080/#) in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
       1004             return self.iterate(inputs, preprocess_params, forward_params, postprocess_params)
       1005         else:
    -> 1006             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
       1007 
       1008     def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):
    
    [/usr/local/lib/python3.7/dist-packages/transformers/pipelines/base.py](https://localhost:8080/#) in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
       1011     def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
       1012         model_inputs = self.preprocess(inputs, **preprocess_params)
    -> 1013         model_outputs = self.forward(model_inputs, **forward_params)
       1014         outputs = self.postprocess(model_outputs, **postprocess_params)
       1015         return outputs
    
    [/usr/local/lib/python3.7/dist-packages/transformers/pipelines/base.py](https://localhost:8080/#) in forward(self, model_inputs, **forward_params)
        921                 with inference_context():
        922                     model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
    --> 923                     model_outputs = self._forward(model_inputs, **forward_params)
        924                     model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
        925             else:
    
    [/usr/local/lib/python3.7/dist-packages/transformers/pipelines/text_generation.py](https://localhost:8080/#) in _forward(self, model_inputs, **generate_kwargs)
        204             input_ids = None
        205         prompt_text = model_inputs.pop("prompt_text")
    --> 206         generated_sequence = self.model.generate(input_ids=input_ids, **generate_kwargs)  # BS x SL
        207         return {"generated_sequence": generated_sequence, "input_ids": input_ids, "prompt_text": prompt_text}
        208 
    
    [/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
         26         def decorate_context(*args, **kwargs):
         27             with self.__class__():
    ---> 28                 return func(*args, **kwargs)
         29         return cast(F, decorate_context)
         30 
    
    [/usr/local/lib/python3.7/dist-packages/transformers/generation_utils.py](https://localhost:8080/#) in generate(self, inputs, max_length, min_length, do_sample, early_stopping, num_beams, temperature, top_k, top_p, repetition_penalty, bad_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, logits_processor, stopping_criteria, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, **model_kwargs)
       1208                 return_dict_in_generate=return_dict_in_generate,
       1209                 synced_gpus=synced_gpus,
    -> 1210                 **model_kwargs,
       1211             )
       1212 
    
    [/usr/local/lib/python3.7/dist-packages/transformers/generation_utils.py](https://localhost:8080/#) in sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
       1712                 return_dict=True,
       1713                 output_attentions=output_attentions,
    -> 1714                 output_hidden_states=output_hidden_states,
       1715             )
       1716 
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    [/usr/local/lib/python3.7/dist-packages/transformers/models/gpt_neo/modeling_gpt_neo.py](https://localhost:8080/#) in forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
        752             output_attentions=output_attentions,
        753             output_hidden_states=output_hidden_states,
    --> 754             return_dict=return_dict,
        755         )
        756         hidden_states = transformer_outputs[0]
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    [/usr/local/lib/python3.7/dist-packages/transformers/models/gpt_neo/modeling_gpt_neo.py](https://localhost:8080/#) in forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
        579         if inputs_embeds is None:
        580             inputs_embeds = self.wte(input_ids)
    --> 581         position_embeds = self.wpe(position_ids)
        582         hidden_states = inputs_embeds + position_embeds
        583 
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py](https://localhost:8080/#) in forward(self, input)
        158         return F.embedding(
        159             input, self.weight, self.padding_idx, self.max_norm,
    --> 160             self.norm_type, self.scale_grad_by_freq, self.sparse)
        161 
        162     def extra_repr(self) -> str:
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py](https://localhost:8080/#) in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
       2042         # remove once script supports set_grad_enabled
       2043         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
    -> 2044     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
       2045 
       2046 
    
    IndexError: index out of range in self
    

    Screenshots: [screenshot omitted]

    Environment (please complete the following information):

    • Colab
    • TPU runtime and also None
    bug 
    opened by dzlab 1
  • TPU device does not support heartbeats.

    TPU device does not support heartbeats.

    Hello,

    When I try to train on a v3-32 TPU with tpu-vm-tf-2.6.0-pod image version, I reveive the following error:

    Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0', '/job:worker/replica:0/task:2/device:CPU:0', '/job:worker/replica:0/task:1/device:CPU:0', '/job:worker/replica:0/task:3/device:CPU:0']
    Configuring worker heartbeat: shutdown_mode: WAIT_FOR_COORDINATOR
    
    TPU device does not support heartbeats. Failure handling will be disabled.
    training_loop marked as finished
    Reraising captured error
    Traceback (most recent call last):
      File "/home/dumitrescu_stefan/dev/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
        return fn(*args)
      File "/home/dumitrescu_stefan/dev/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
        return self._call_tf_sessionrun(options, feed_dict, fetch_list,
      File "/home/dumitrescu_stefan/dev/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
        return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
    tensorflow.python.framework.errors_impl.PermissionDeniedError: From /job:worker/replica:0/task:2:
    /home/dumitrescu_stefan; Permission denied
    	 [[{{node create_file_writer/CreateSummaryFileWriter}}]]
    Recent warning and error logs:
      OP_REQUIRES failed at summary_kernels.cc:50 : Permission denied: /home/dumitrescu_stefan; Permission denied
      OP_REQUIRES failed at summary_kernels.cc:50 : Permission denied: /home/dumitrescu_stefan; Permission denied
      OP_REQUIRES failed at summary_kernels.cc:50 : Permission denied: /home/dumitrescu_stefan; Permission denied
    

    To Reproduce Run the training script:

    1. python3 main.py --model gpt3_XL_256_Pile --steps_per_checkpoint 40000 --tpu TPU_NAME

    Environment (please complete the following information):

    • TPUs: V3-32 with tpu-vm-tf-2.6.0-pod image
    • Configs: { "n_head": 32, "n_vocab": 64000, "embed_dropout": 0, "lr": 0.0002, "lr_decay": "cosine", "warmup_steps": 3000, "beta1": 0.9, "beta2": 0.95, "epsilon": 1e-8, "opt_name": "adam", "weight_decay": 0.1, "train_batch_size": 512, "attn_dropout": 0, "train_steps": 286150, "eval_steps": 10, "predict_steps": 1, "res_dropout": 0, "eval_batch_size": 512, "predict_batch_size": 1, "iterations": 500, "n_embd": 2048, "datasets": [["example", 25, "documents_random", 1.0]], "model_path": "/home/dumitrescu_stefan/gpt-neo/neo-models/GPT3_1.3B", "n_ctx": 2048, "n_layer": 24, "scale_by_depth": true, "scale_by_in": false, "attention_types" : [[["global"],24]], "mesh_shape": "x:16,y:2", "layout": "batch:x,memory_length:y,embd:y", "activation_function": "gelu", "recompute_grad": true, "gradient_clipping": 1.0, "tokens_per_mb_per_replica": 2048, "precision": "bfloat16" } Dataset config is: { "n_vocab": 64000, "path": "/home/dumitrescu_stefan/gpt-neo/data_tfrecords/train_shard_*.tfrecords", "eval_path": "", "tokenizer_path": "/home/dumitrescu_stefan/gpt-neo/tokenizer/tokenizer.json", "eos_id": 1, "padding_id": 0 }
    bug 
    opened by iliemihai 0
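
    A note on the error above: the PermissionDeniedError comes from CreateSummaryFileWriter on a remote worker (task:2), which suggests the pod workers are trying to write checkpoints/summaries under a local home directory they cannot reach. On multi-host TPU setups, model_path and the tfrecord/tokenizer paths generally need to point at a GCS bucket accessible to every worker. A minimal pre-flight sketch, assuming a local JSON config file (the filename below is only an example):

    # Sketch: warn if the run config points at the local filesystem instead of
    # a GCS bucket; the config filename below is only an example.
    import json

    with open("configs/gpt3_XL_256_Pile.json") as f:
        cfg = json.load(f)

    model_path = cfg.get("model_path", "")
    if not model_path.startswith("gs://"):
        print(f"warning: model_path={model_path!r} is a local path; on a v3-32 pod "
              "every worker must be able to read and write it -- use gs://<bucket>/...")
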
  • The model should return just the generated text, not the prompt text + generated text.

    The model should return just the generated text, not the prompt text + generated text.

    There is no reason to return the prompt text + generated text, since the caller already knows the prompt it fed in. Returning both can lead to unexpected issues. For example, my program extracts the generated text with returnedText.substring(prompt.length), but this fails when single-quote characters "'" get escaped as "\'" in the output, so the generated text actually starts later than expected.

    feature request 
    opened by monsieurpooh 2
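
    A possible workaround with the HuggingFace integration: slice the output at the token level rather than by string length, which sidesteps the quote-escaping offset problem described above. A minimal sketch (the model name is only an example):

    # Sketch: decode only the tokens generated after the prompt.
    from transformers import GPT2Tokenizer, GPTNeoForCausalLM

    model_name = "EleutherAI/gpt-neo-125M"   # example checkpoint
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPTNeoForCausalLM.from_pretrained(model_name)

    prompt = "It doesn't matter if the prompt contains 'quotes', because"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40, do_sample=True)

    # Slice off the prompt tokens, then decode only the continuation.
    continuation = tokenizer.decode(output_ids[0, input_ids.shape[-1]:],
                                    skip_special_tokens=True)
    print(continuation)
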
  • The temperature at 0.0001 (or other arbitrarily small float) is still too high

    The temperature at 0.0001 (or other arbitrarily small float) is still too high

    When the temperature is set to 0.00001 or a similarly small float, the output is noticeably less chaotic than at larger values; however, it is still very non-deterministic and often answers questions incorrectly even when it gets them right the majority of the time. It would be better to have more control over the temperature range, so that 0.00001 actually denotes an extremely low temperature with almost no variation in the output, for better question-answering capability.

    If anyone knows of a workaround for this, please let me know.

    bug 
    opened by monsieurpooh 5
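
    A possible workaround with the HuggingFace integration: if the goal is (near-)deterministic question answering, greedy decoding avoids the temperature issue entirely, since sampling with a tiny temperature still draws from a probability distribution. A minimal sketch (the model name is only an example):

    # Sketch: greedy decoding picks the argmax token at every step, so the same
    # prompt always yields the same continuation; temperature is only used when
    # do_sample=True.
    from transformers import GPT2Tokenizer, GPTNeoForCausalLM

    model_name = "EleutherAI/gpt-neo-1.3B"   # example checkpoint
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPTNeoForCausalLM.from_pretrained(model_name)

    prompt = "Q: What is the capital of France?\nA:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
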
Releases (v1.1.1)
  • v1.1 (Aug 28, 2021)

    Vulnerabilities have been found in TensorFlow that are patched in its most recent version. This release updates the codebase to use the patched version of TensorFlow.

    This release also fixes a small but significant bug in how documents are loaded. For details, see #230

    Source code (tar.gz)
    Source code (zip)
  • v1.0 (Mar 21, 2021)

    We're proud to release two pretrained GPT-Neo models trained on The Pile; the weights and configs can be freely downloaded from the-eye.eu.

    1.3B: https://the-eye.eu/eleuther_staging/gptneo-release/GPT3_XL/

    2.7B: https://the-eye.eu/eleuther_staging/gptneo-release/GPT3_2-7B/

    For more information on how to get these set up, see the colab notebook, or read through the rest of the readme.

    This repository will be (mostly) archived as we move focus to our GPU training repo, GPT-NeoX.

    Source code (tar.gz)
    Source code (zip)