Code for the paper "Language Models are Unsupervised Multitask Learners"

Overview

Status: Archive (code is provided as-is, no updates expected)

gpt-2

Code and models from the paper "Language Models are Unsupervised Multitask Learners".

You can read about GPT-2 and its staged release in our original blog post, 6-month follow-up post, and final post.

We have also released a dataset for researchers to study the models' behaviors.

* Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper). Thus you may have seen small referred to as 117M and medium referred to as 345M.

Usage

This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.

For basic information, see our model card.

Some caveats

  • GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
  • The dataset our GPT-2 models were trained on contains many texts with biases and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
  • To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.

Work with us

Please let us know if you’re doing interesting research with or working on applications of GPT-2! We’re especially interested in hearing from and potentially working with those who are studying

  • Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
  • The extent of problematic content (e.g. bias) being baked into the models and effective mitigations

Development

See DEVELOPERS.md

Contributors

See CONTRIBUTORS.md

Citation

Please use the following bibtex entry:

@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}

Future work

We may release code for evaluating the models on various benchmarks.

We are still considering release of the larger models.

License

Modified MIT

Comments
  • Fix TODO in sample.sample_sequences- Avoid 'leaving last token calculation to while loop'

    Hi,

    This change runs the initial model step on the full context by calling the body() function. I added a 'first' parameter defaulting to False to allow this.

    opened by albertwujj 28
  • My CPU doesn't support Tensorflow AVX instructions

    I was able to install all the requirements. However, while generating samples, I get the following error. I have a first-gen Intel i3 processor and am running Ubuntu 18.

    2019-02-16 03:12:49.453982: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine. Aborted (core dumped)

    I then installed TensorFlow 1.5 (pip3 install tensorflow==1.5). The sample was generated; however, another warning popped up, as shown below. Will this affect the quality? Do I need to compile TensorFlow on my system?

    2019-02-16 03:22:19.785441: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2

    opened by bertagknowles 16
  • Can't install sh_download_model.sh

    Noob here (linguist, with rudimentary knowledge of computers). I've installed the gcloud SDK, but I can't get the command sh download_model.sh 117M to run. I get: 'sh' is not recognized as an internal or external command. Any help would be greatly appreciated.

    opened by FatElmo 14
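    A common workaround for the 'sh' problem above is to skip the shell script entirely and fetch the model files over HTTPS from Python. The sketch below is not part of this repository; it assumes the checkpoints are still publicly readable at the storage.googleapis.com mirror of the gs://gpt-2 bucket that download_model.sh reads from (the hosting location may have changed since):

    # Hypothetical shell-free downloader. The URL layout is an assumption based on
    # the gs://gpt-2/models/<size>/ paths that download_model.sh targets.
    import os
    import requests

    model = "117M"
    base = "https://storage.googleapis.com/gpt-2/models/" + model
    os.makedirs(os.path.join("models", model), exist_ok=True)

    for name in ["checkpoint", "encoder.json", "hparams.json",
                 "model.ckpt.data-00000-of-00001", "model.ckpt.index",
                 "model.ckpt.meta", "vocab.bpe"]:
        r = requests.get(base + "/" + name, stream=True)
        r.raise_for_status()
        with open(os.path.join("models", model, name), "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
        print("downloaded", name)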
  • Sampling code flags descriptions (support for --help?)

    Is there a list of the flags for both conditional and unconditional models with their definitions? (I looked in the blog and paper and couldn't find any mention.)

    In particular, for reproducibility purposes, it'd be great to know the definitions of temperature and top_k and how choosing different values for these affects the results.

    Thanks!

    help wanted good first issue 
    opened by ilopezfr 13
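    For what it's worth, the usual meaning of the two flags asked about above: temperature divides the logits before sampling (values below 1.0 make the distribution sharper and the output more conservative, values above 1.0 make it more random), and top_k restricts sampling to the k most likely tokens (0 conventionally means no restriction). A minimal NumPy illustration of that interpretation, not taken from this repository's sampling code:

    # Illustrative only: how temperature and top_k are typically applied to a
    # vector of next-token logits before sampling. Not this repo's implementation.
    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_k=40, seed=None):
        rng = np.random.default_rng(seed)
        logits = np.asarray(logits, dtype=np.float64) / temperature   # sharpen or flatten
        if top_k > 0:
            cutoff = np.sort(logits)[-top_k]                          # k-th largest logit
            logits = np.where(logits < cutoff, -np.inf, logits)       # mask the rest
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=2))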
  • Issue with gsutil download_model.sh

    Hi,

    I'm not familiar with gsutil. I installed it fresh using the six steps at https://cloud.google.com/storage/docs/gsutil_install

    Upon running the script:

    When I'm not logged in on cloud:

    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//checkpoint.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//encoder.json.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//hparams.json.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.data-00000-of-00001.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.index.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.meta.
    ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//vocab.bpe.
    
    

    When I'm logged in on cloud:

    
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
    
    

    Thanks

    opened by unrealwill 11
  • Autocomplete a sentence

    Hello,

    Is it possible to predict the next word in a sentence, as the research claims? Is this locked in the bigger model?

    The Python code randomly generates sentences.

    For example, like Google Smart Compose: type "My father gifted me", and the autocomplete suggests "cheque"?

    Thanks

    opened by kishoreneelamegam 10
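    On the question above: next-word prediction is what the model does at every step; the sampling scripts simply repeat it for many tokens, which is why the output looks like randomly generated sentences. For a single-word completion it may be easier to use the separate Hugging Face transformers port of GPT-2 rather than this repository's TensorFlow code. A rough sketch, assuming transformers and torch are installed:

    # Single next-token prediction with the Hugging Face GPT-2 port (a separate
    # codebase, not this repo). Prints the five most likely next tokens.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    inputs = tokenizer("My father gifted me a", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # shape: [batch, seq_len, vocab]
    top_ids = logits[0, -1].topk(5).indices      # most likely next tokens
    print([tokenizer.decode([int(i)]) for i in top_ids])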
  • Release raw lambada dataset

    Is it possible to release the Lambada dataset used to generate accuracy numbers in Table 3 of the paper? This would make it easier to do comparisons with other models :) @Newmu

    opened by yaroslavvb 9
  • removed gsutil dependency

    I have removed the gsutil dependency using curl and Google Drive. This approach is well known and used in several frameworks that need to download large model files (as in fastText).

    opened by loretoparisi 9
  • Add a Dockerfile and ignore example artifacts

    If you'd like, here's a Dockerfile to toss up as an alternate installation method.

    Also quickly gitignored the samples file and core file generated by running the example in the README.

    opened by madisonmay 9
  • Reading Comprehension: answer questions about given passages

    Is there any way to run the reading comprehension task (answering questions about given passages) as shown in the OpenAI example link? Can we run this with the 117M model, and if so, how?

    opened by tomriddle54 8
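    The paper's reading-comprehension numbers come from plain conditional generation: the passage, any earlier question/answer pairs, and a final "A:" are concatenated into the prompt, and the model's continuation is read off as the answer. Any released model, including 117M, can be prompted this way (the smaller models just answer worse). A rough sketch of building such a prompt, with placeholder text, which could be pasted into interactive_conditional_samples.py:

    # Builds a reading-comprehension style prompt (passage followed by Q:/A:),
    # roughly the conditioning format described in the paper. Placeholder text only.
    passage = "Tom went to the market and bought three apples and a loaf of bread."
    question = "What did Tom buy?"

    prompt = passage.strip() + "\nQ: " + question.strip() + "\nA:"
    print(prompt)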
  • sh doesn't do anything on Windows 10

    Hello, what operating system do the instructions apply to? sh doesn't do anything on Windows 10. How would I install this on Win10?

    Also, is the first step to clone the repo? The instructions don't seem to make sense otherwise.

    Thanks.

    help wanted 
    opened by JoeUX 8
  • docker image

    The command docker build --tag gpt-2 -f Dockerfile.gpu . returns this:

    Step 9/12 : RUN python3 download_model.py 124M
     ---> Running in afe900659249
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/certifi/core.py", line 14, in <module>
        from importlib.resources import path as get_path, read_text
    ImportError: No module named 'importlib.resources'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "download_model.py", line 3, in <module>
        import requests
      File "/usr/local/lib/python3.5/dist-packages/requests/__init__.py", line 112, in <module>
        from . import utils
      File "/usr/local/lib/python3.5/dist-packages/requests/utils.py", line 24, in <module>
        from . import certs
      File "/usr/local/lib/python3.5/dist-packages/requests/certs.py", line 15, in <module>
        from certifi import where
      File "/usr/local/lib/python3.5/dist-packages/certifi/__init__.py", line 1, in <module>
        from .core import contents, where
      File "/usr/local/lib/python3.5/dist-packages/certifi/core.py", line 46, in <module>
        Resource = Union[str, "os.PathLike"]
      File "/usr/lib/python3.5/typing.py", line 552, in __getitem__
        dict(self.__dict__), parameters, _root=True)
      File "/usr/lib/python3.5/typing.py", line 512, in __new__
        for t2 in all_params - {t1} if not isinstance(t2, TypeVar)):
      File "/usr/lib/python3.5/typing.py", line 512, in <genexpr>
        for t2 in all_params - {t1} if not isinstance(t2, TypeVar)):
      File "/usr/lib/python3.5/typing.py", line 190, in __subclasscheck__
        self._eval_type(globalns, localns)
      File "/usr/lib/python3.5/typing.py", line 177, in _eval_type
        eval(self.__forward_code__, globalns, localns),
      File "<string>", line 1, in <module>
    AttributeError: module 'os' has no attribute 'PathLike'
    The command '/bin/sh -c python3 download_model.py 124M' returned a non-zero code: 1
    
    opened by Vincent-vst 4
  • How to reproduce the reported F-score for the CoQA benchmark?

    I have a question about how you evaluated GPT-2 on the CoQA dataset. We are struggling to reproduce the results reported in the paper (55 F1). We evaluated gpt2-xl from HuggingFace on CoQA and got an F1 of 28.7.

    We used the official dev set and evaluation script, which we downloaded from here. Although we get good answers, these answers get a lower score due to the way the original CoQA benchmark evaluator is set up. Did you evaluate it differently?

    opened by eirinistamatiou 0
  • interactive_conditional_samples.py crashes if there is more than one context token

    I can run the generate_unconditional_samples.py script on my GPU without issue; however, when I run the interactive_conditional_samples.py script, it crashes if there is more than one context token.

    The interactive_conditional_samples.py script works fine as long as the model prompt produces only one context token; for instance, the prompt "please" produces the token list [29688] and text is generated correctly. However, it crashes immediately if the prompt produces two or more context tokens; for instance, the prompt "pig" produces the token list [79, 328].

    When it crashes I'm getting the error: failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED

    And a little further down I see:

    Blas xGEMMBatched launch failed : a.shape=[25,2,64], b.shape=[25,2,64], m=2, n=2, k=64, batch_size=25
             [[{{node sample_sequence/model/h0/attn/MatMul}}]]
             [[sample_sequence/while/Exit_3/_1375]]
    

    If anyone has any insight on what might be going wrong, and how I can fix it, I'd really appreciate the help.

    opened by Nicholas-Markley 0
  • Does the pre-training data also use this prompt structure related to downstream tasks?

    I read the GPT-2 paper, but I'm not sure whether the pre-training data from WebText adds format information. For example, we know the data format will be english sentence = french sentence in the translation task. So during pre-training, will a similar prompt be added to the training data?

    Thanks!

    opened by Aurora-slz 0
  • Playground and API parameters don't seem to have an effect on certain completions

    I've tried changing temperature, top-p, presence_penalty and frequency_penalty to stop the model from repeating the same joke and other phrases with no success. I thought maybe the playground had a glitch, but this happens in the API too.

    Ask for a completion on "tell me a joke" and you will likely get "why did chicken cross the road". Then variations on that "why did the duck cross the road", etc. even if the chicken tokens are removed.

    The model knows other jokes... "why don't scientists trust atoms? because they make up everything" and "why didn't the bicycle go up the hill? because it was two tired", but these only appear randomly... once it says the chicken/road joke, it can't change it (like a bad comedian ; ).

    Sorry to put this here; the GPT-3 repo is archived for new issues, but this probably also applies here.

    opened by auwsom 0