Code for the paper "Language Models are Unsupervised Multitask Learners"

Overview

Status: Archive (code is provided as-is, no updates expected)

gpt-2

Code and models from the paper "Language Models are Unsupervised Multitask Learners".

You can read about GPT-2 and its staged release in our original blog post, 6-month follow-up post, and final post.

We have also released a dataset for researchers to study the models' behaviors.

* Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper). Thus you may have seen small referred to as 117M and medium referred to as 345M.

Usage

This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.

For basic information, see our model card.

Some caveats

  • GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
  • The dataset our GPT-2 models were trained on contains many texts with biases and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
  • To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.

Work with us

Please let us know if you’re doing interesting research with or working on applications of GPT-2! We’re especially interested in hearing from and potentially working with those who are studying

  • Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
  • The extent of problematic content (e.g. bias) being baked into the models and effective mitigations

Development

See DEVELOPERS.md

Contributors

See CONTRIBUTORS.md

Citation

Please use the following bibtex entry:

@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}

Future work

We may release code for evaluating the models on various benchmarks.

We are still considering release of the larger models.

License

Modified MIT

Comments
  • Fix TODO in sample.sample_sequences- Avoid 'leaving last token calculation to while loop'

    Hi,

    This change runs the initial model step on the full context by calling the body() function. I added a 'first' parameter defaulting to False to allow this.

    opened by albertwujj 28
  • My CPU doesn't support Tensorflow AVX instructions

    I was able to install all the requirements. However, while generating samples, I get the following error. I have a first-gen Intel i3 processor and am running Ubuntu 18.

    2019-02-16 03:12:49.453982: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine. Aborted (core dumped)

    I then installed TensorFlow 1.5 (pip3 install tensorflow==1.5). The sample was generated; however, another warning popped up, as shown below. Will this affect the quality? Do I need to compile TensorFlow on my system?

    2019-02-16 03:22:19.785441: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2

    opened by bertagknowles 16
  • Can't install sh_download_model.sh

    Noob here (linguist, with rudimentary knowledge of computers). I've installed the gcloud SDK, but I can't get the command sh download_model.sh 117M to run. I get: 'sh' is not recognized as an internal or external command. Any help would be greatly appreciated.

    opened by FatElmo 14
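    A common workaround for the 'sh' problem above is to skip the shell script entirely and fetch the model files over HTTPS from Python. The sketch below is not part of this repository; it assumes the checkpoints are still publicly readable at the storage.googleapis.com mirror of the gs://gpt-2 bucket that download_model.sh reads from (the hosting location may have changed since):

    # Hypothetical shell-free downloader. The URL layout is an assumption based on
    # the gs://gpt-2/models/<size>/ paths that download_model.sh targets.
    import os
    import requests

    model = "117M"
    base = "https://storage.googleapis.com/gpt-2/models/" + model
    os.makedirs(os.path.join("models", model), exist_ok=True)

    for name in ["checkpoint", "encoder.json", "hparams.json",
                 "model.ckpt.data-00000-of-00001", "model.ckpt.index",
                 "model.ckpt.meta", "vocab.bpe"]:
        r = requests.get(base + "/" + name, stream=True)
        r.raise_for_status()
        with open(os.path.join("models", model, name), "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
        print("downloaded", name)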
  • Sampling code flags descriptions (support for --help?)

    Is there a list of the flags for both conditional and unconditional models with their definitions? (I looked in the blog and paper and couldn't find any mention.)

    In particular, for reproducibility purposes, it'd be great to know the definitions of temperature and top_k and how choosing different values for these affects the results.

    Thanks!

    help wanted good first issue 
    opened by ilopezfr 13
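    For what it's worth, the usual meaning of the two flags asked about above: temperature divides the logits before sampling (values below 1.0 make the distribution sharper and the output more conservative, values above 1.0 make it more random), and top_k restricts sampling to the k most likely tokens (0 conventionally means no restriction). A minimal NumPy illustration of that interpretation, not taken from this repository's sampling code:

    # Illustrative only: how temperature and top_k are typically applied to a
    # vector of next-token logits before sampling. Not this repo's implementation.
    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_k=40, seed=None):
        rng = np.random.default_rng(seed)
        logits = np.asarray(logits, dtype=np.float64) / temperature   # sharpen or flatten
        if top_k > 0:
            cutoff = np.sort(logits)[-top_k]                          # k-th largest logit
            logits = np.where(logits < cutoff, -np.inf, logits)       # mask the rest
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=2))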
  • Issue with gsutil download_model.sh

    Hi,

    I'm not familiar with gsutil. I installed it fresh using the six steps at https://cloud.google.com/storage/docs/gsutil_install

    Upon running the script:

    When I'm not logged in on cloud:

    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//checkpoint.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//encoder.json.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//hparams.json.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.data-00000-of-00001.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.index.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.meta.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//vocab.bpe.
    
    

    When I'm logged in on cloud:

    
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    
    

    Thanks

    opened by unrealwill 11
  • Autocomplete a sentence

    Hello,

    Is it possible to predict the next word in a sentence, as the research claims? Is this locked in the bigger model?

    The Python code randomly generates sentences.

    For example, like Google Smart Compose: type "My father gifted me", and the autocomplete suggests "cheque"?

    Thanks

    opened by kishoreneelamegam 10
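    On the question above: next-word prediction is what the model does at every step; the sampling scripts simply repeat it for many tokens, which is why the output looks like randomly generated sentences. For a single-word completion it may be easier to use the separate Hugging Face transformers port of GPT-2 rather than this repository's TensorFlow code. A rough sketch, assuming transformers and torch are installed:

    # Single next-token prediction with the Hugging Face GPT-2 port (a separate
    # codebase, not this repo). Prints the five most likely next tokens.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    inputs = tokenizer("My father gifted me a", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # shape: [batch, seq_len, vocab]
    top_ids = logits[0, -1].topk(5).indices      # most likely next tokens
    print([tokenizer.decode([int(i)]) for i in top_ids])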
  • Release raw lambada dataset

    Is it possible to release the Lambada dataset used to generate accuracy numbers in Table 3 of the paper? This would make it easier to do comparisons with other models :) @Newmu

    opened by yaroslavvb 9
  • removed gsutil dependency

    I have removed the gsutil dependency using curl and Google Drive. This approach is well known and used in several frameworks that need to download large model files (as in fastText).

    opened by loretoparisi 9
  • Add a Dockerfile and ignore example artifacts

    If you'd like, here's a Dockerfile to toss up as an alternate installation method.

    Also quickly gitignored the samples file and core file generated by running the example in the README.

    opened by madisonmay 9
  • Reading Comprehension: answer questions about given passages

    Is there any way to run the reading comprehension task (answering questions about given passages) as shown in the OpenAI example link? Can we run this with the 117M model, and if so, how?

    opened by tomriddle54 8
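    The paper's reading-comprehension numbers come from plain conditional generation: the passage, any earlier question/answer pairs, and a final "A:" are concatenated into the prompt, and the model's continuation is read off as the answer. Any released model, including 117M, can be prompted this way (the smaller models just answer worse). A rough sketch of building such a prompt, with placeholder text, which could be pasted into interactive_conditional_samples.py:

    # Builds a reading-comprehension style prompt (passage followed by Q:/A:),
    # roughly the conditioning format described in the paper. Placeholder text only.
    passage = "Tom went to the market and bought three apples and a loaf of bread."
    question = "What did Tom buy?"

    prompt = passage.strip() + "\nQ: " + question.strip() + "\nA:"
    print(prompt)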
  • sh doesn't do anything on Windows 10

    Hello, what operating system do the instructions apply to? sh doesn't do anything on Windows 10. How would I install this on Win10?

    Also, is the first step to clone the repo? The instructions don't seem to make sense otherwise.

    Thanks.

    help wanted 
    opened by JoeUX 8
  • docker image

    The command docker build --tag gpt-2 -f Dockerfile.gpu . returns this:

    Step 9/12 : RUN python3 download_model.py 124M
     ---> Running in afe900659249
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/certifi/core.py", line 14, in <module>
        from importlib.resources import path as get_path, read_text
    ImportError: No module named 'importlib.resources'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "download_model.py", line 3, in <module>
        import requests
      File "/usr/local/lib/python3.5/dist-packages/requests/__init__.py", line 112, in <module>
        from . import utils
      File "/usr/local/lib/python3.5/dist-packages/requests/utils.py", line 24, in <module>
        from . import certs
      File "/usr/local/lib/python3.5/dist-packages/requests/certs.py", line 15, in <module>
        from certifi import where
      File "/usr/local/lib/python3.5/dist-packages/certifi/__init__.py", line 1, in <module>
        from .core import contents, where
      File "/usr/local/lib/python3.5/dist-packages/certifi/core.py", line 46, in <module>
        Resource = Union[str, "os.PathLike"]
      File "/usr/lib/python3.5/typing.py", line 552, in __getitem__
        dict(self.__dict__), parameters, _root=True)
      File "/usr/lib/python3.5/typing.py", line 512, in __new__
        for t2 in all_params - {t1} if not isinstance(t2, TypeVar)):
      File "/usr/lib/python3.5/typing.py", line 512, in <genexpr>
        for t2 in all_params - {t1} if not isinstance(t2, TypeVar)):
      File "/usr/lib/python3.5/typing.py", line 190, in __subclasscheck__
        self._eval_type(globalns, localns)
      File "/usr/lib/python3.5/typing.py", line 177, in _eval_type
        eval(self.__forward_code__, globalns, localns),
      File "<string>", line 1, in <module>
    AttributeError: module 'os' has no attribute 'PathLike'
    The command '/bin/sh -c python3 download_model.py 124M' returned a non-zero code: 1
    
    opened by Vincent-vst 4
  • How to reproduce the reported F-score for the CoQA benchmark?

    I have a question about how you evaluated GPT-2 on the CoQA dataset. We are struggling to reproduce the results reported in the paper (55 F1). We evaluated gpt2-xl from HuggingFace on CoQA and got an F1 of 28.7.

    We used the official dev set and evaluation script, which we downloaded from here. Although we get good answers, these answers get a lower score due to the way the original CoQA benchmark evaluator is set up. Did you evaluate it differently?

    opened by eirinistamatiou 0
  • interactive_conditional_samples.py crashes if there is more than one context token

    I can run the generate_unconditional_samples.py script on my GPU without issue; however, when I run the interactive_conditional_samples.py script, it crashes if there is more than one context token.

    The interactive_conditional_samples.py script works fine as long as the model prompt produces only one context token; for instance, the prompt "please" produces the token list [29688] and text is generated correctly. However, it crashes immediately if the prompt produces two or more context tokens; for instance, the prompt "pig" produces the token list [79, 328].

    When it crashes I'm getting the error: failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED

    And a little further down I see:

    Blas xGEMMBatched launch failed : a.shape=[25,2,64], b.shape=[25,2,64], m=2, n=2, k=64, batch_size=25
             [[{{node sample_sequence/model/h0/attn/MatMul}}]]
             [[sample_sequence/while/Exit_3/_1375]]
    

    If anyone has any insight on what might be going wrong, and how I can fix it, I'd really appreciate the help.

    opened by Nicholas-Markley 0
  • Does the pre-training data also use this prompt structure related to downstream tasks?

    I read the GPT-2 paper, but I'm not sure whether the pre-training data from WebText adds format information. For example, we know the data format will be english sentence = french sentence in the translation task. So during pre-training, will a similar prompt be added to the training data?

    Thanks!

    opened by Aurora-slz 0
  • Playground and API parameters don't seem to have an effect on certain completions

    I've tried changing temperature, top-p, presence_penalty and frequency_penalty to stop the model from repeating the same joke and other phrases with no success. I thought maybe the playground had a glitch, but this happens in the API too.

    Ask for a completion on "tell me a joke" and you will likely get "why did chicken cross the road". Then variations on that "why did the duck cross the road", etc. even if the chicken tokens are removed.

    The model knows other jokes... "why don't scientists trust atoms? because they make up everything" and "why didn't the bicycle go up the hill? because it was two tired", but these only appear randomly... once it says the chicken/road joke, it can't change it (like a bad comedian ; ).

    Sorry to put this here; the GPT-3 repo is archived for new issues, but this probably also applies here.

    opened by auwsom 0