๐ŸฅA PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

Overview

PyTorch implementation of OpenAI's Finetuned Transformer Language Model

This is a PyTorch implementation of the TensorFlow code provided with OpenAI's paper "Improving Language Understanding by Generative Pre-Training" by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.

This implementation includes a script to load the weights pre-trained by the authors with the TensorFlow implementation into the PyTorch model.

Transformer Language Model

The model classes and loading script are located in model_pytorch.py.

The names of the modules in the PyTorch model follow the names of the Variables in the TensorFlow implementation. This implementation tries to follow the original code as closely as possible to minimize discrepancies.

This implementation thus also comprises a modified Adam optimization algorithm (OpenAIAdam) as used in OpenAI's paper, with weight decay and a scheduled (warm-up) learning rate.
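
For reference, the optimizer is instantiated roughly as follows. This is a hedged sketch: the argument names mirror the OpenAIAdam call quoted in the issues further down this page, while the module name (opt) and the n_updates variable are assumptions to adapt to your own setup.

from opt import OpenAIAdam  # module name assumed from the repo layout

optimizer = OpenAIAdam(model.parameters(),
                       lr=6.25e-5,                # fine-tuning learning rate from the paper
                       schedule='warmup_linear',  # linear warm-up followed by linear decay
                       warmup=0.002,              # fraction of t_total used for warm-up
                       t_total=n_updates,         # total number of training updates (assumed variable)
                       b1=0.9, b2=0.999, e=1e-8,  # Adam betas and epsilon
                       l2=0.01,                   # weight decay strength
                       vector_l2=False,
                       max_grad_norm=1)           # gradient norm clipping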

Requirements

To use the model itself by importing model_pytorch.py, you just need:

  • PyTorch (version >=0.4)

To run the classifier training script in train.py you will need in addition:

  • tqdm
  • sklearn
  • spacy
  • ftfy
  • pandas

You can download the weights of the OpenAI pre-trained version by cloning Alec Radford's repo and placing the model folder containing the pre-trained weights in the present repo.

Using the pre-trained model as a Transformer Language Model

The model can be used as a transformer language model with OpenAI's pre-trained weights as follows:

from model_pytorch import TransformerModel, load_openai_pretrained_model, DEFAULT_CONFIG

args = DEFAULT_CONFIG
model = TransformerModel(args)
load_openai_pretrained_model(model)

This model generates the Transformer's hidden states. You can use the LMHead class in model_pytorch.py to add a decoder tied to the weights of the encoder and get a full language model. You can also use the ClfHead class in model_pytorch.py to add a classifier on top of the transformer and get a classifier as described in OpenAI's publication (see an example of both in the __main__ function of train.py, and the sketch below).
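
A minimal sketch of attaching both heads to the model built above (the ClfHead signature matches the class quoted in the issues below; the LMHead arguments and the x/clf_token variables are assumptions, so check model_pytorch.py and train.py for the exact usage):

from model_pytorch import LMHead, ClfHead

lm_head = LMHead(model, args)        # decoder tied to the transformer's input embeddings
clf_head = ClfHead(clf_token, args)  # clf_token: index of the special classification token (assumed variable)

h = model(x)                  # x: an encoded input batch, h: the transformer's hidden states
lm_logits = lm_head(h)        # language-modelling logits over the vocabulary
clf_logits = clf_head(h, x)   # classification logits, one per candidate ending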

To use the positional encoder of the transformer, you should encode your dataset using the encode_dataset() function of utils.py. Please refer to the beginning of the __main__ function in train.py to see how to properly define the vocabulary and encode your dataset.
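
A hedged sketch of that encoding step, assuming TextEncoder in text_utils.py takes the paths of the BPE files shipped with the pre-trained weights and that encode_dataset() accepts the raw splits plus the encoder (the file names and the my_splits variable are assumptions; the __main__ of train.py has the exact calls and default paths):

from text_utils import TextEncoder
from utils import encode_dataset

text_encoder = TextEncoder('model/encoder_bpe_40000.json', 'model/vocab_40000.bpe')
encoded_splits = encode_dataset(*my_splits, encoder=text_encoder)  # my_splits: your raw train/valid/test splits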

Fine-tuning the pre-trained model on a classification task

This model can also be integrated into a classifier as detailed in OpenAI's paper. An example of fine-tuning on the ROCStories Cloze task is included with the training code in train.py.

The ROCStories dataset can be downloaded from the associated website.

As with the TensorFlow code, this code implements the ROCStories Cloze Test result reported in the paper, which can be reproduced by running:

python -m spacy download en
python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]

First experiments on the ROCStories test set

Fine-tuning the PyTorch model for 3 epochs on ROCStories takes 10 minutes on a single NVIDIA K80.

The single-run test accuracy of this PyTorch version is 85.84%, while the authors report a median accuracy of 85.8% with the TensorFlow code, and the paper reports a best single-run accuracy of 86.5%.

The authors' implementation uses 8 GPUs and can thus accommodate a batch of 64 samples, while the present implementation is single-GPU and is consequently limited to 20 instances on a K80 for memory reasons. In our test, increasing the batch size from 8 to 20 samples increased the test accuracy by 2.5 points. Better accuracy may be obtained with a multi-GPU setting (not tried yet; see the sketch below).
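
A hedged sketch of such a multi-GPU setting using plain PyTorch data parallelism (not tried with this code, as noted above):

import torch
import torch.nn as nn

# Split each batch across all visible GPUs to fit a larger effective batch size.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to('cuda')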

The previous SOTA on the ROCStories dataset is 77.6% ("Hidden Coherence Model" of Chaturvedi et al. published in "Story Comprehension for Predicting What Happens Next" EMNLP 2017, which is a very nice paper too!)

Issues
  • How does Dropout2d help in cloze task?

    class ClfHead(nn.Module):
        """ Classifier Head for the transformer """
    
        def __init__(self, clf_token, cfg):
            super(ClfHead, self).__init__()
            self.n_embd = cfg.n_embd
            self.clf_token = clf_token
            self.dropout = nn.Dropout2d(cfg.clf_pdrop)  # To reproduce the noise_shape parameter of TF implementation
            self.linear = nn.Linear(cfg.n_embd, 1)
            nn.init.normal_(self.linear.weight, std=0.02)
            nn.init.normal_(self.linear.bias, 0)
    
        def forward(self, h, x):
            # Classification logits
            clf_h = h.view(-1, self.n_embd)
            flat = x[:, :, :, 0].contiguous().view(-1)
            clf_h = clf_h[flat == self.clf_token, :]
            clf_h = clf_h.view(-1, x.size(1), self.n_embd, 1)
            clf_h = self.dropout(clf_h)
            clf_h = clf_h.view(-1, self.n_embd)
            clf_logits = self.linear(clf_h)
            return clf_logits.view(-1, x.size(1))
    

    Here the self.dropout(clf_h) essentially removes the representation of a sentence and its conclusion; there is a remote chance (0.2*0.2) that both representations get removed for a given data item. I am confused about how this aids training.

    opened by sai-prasanna 12
  • Results and questions on text generation experiments with pretrained LM model

    Dear guys,

    I did some experiments on text generation with the pretrained LM model. I made a PR so you can see the changes: https://github.com/huggingface/pytorch-openai-transformer-lm/pull/35 I have some questions regarding the results.

    1. The generation quality is very poor. The model can not generate grammatical sentences, let alone long coherent sentences. Here are some snippets: Input some beginning words: I love you , " you said . first . last click ... game ' keep ' ' the zer that

    Input some beginning words: Once upon a time . " freyja , freyja freyja freyja freyja freyja freyja freyja freyja freyja freyja freyja freyja freyja freyja freyja freyja

    Input some beginning words: Everytime . the . - holding . - " nothing in . very ... out . " grin .

    Input some beginning words: I feel very royal . please . at , very , ' ! deserving ... ' something , family , had

    2. At each step, the top 5 candidates for the next token are dominated by the most frequent tokens, e.g. ",", "and", "the", "was", but also include some infrequent tokens, e.g. "-", "f". When these infrequent tokens show up, they are irrelevant to the sentence context. I don't know why.

    3. As the output layer also has weights for the 512 position embeddings, the output dim is 40478 (word indices) + 512 (position indices). The logits for these 512 indices are usually much larger than those for the 40478 word indices, so I have to mask them before the softmax. I think this is a bit strange because during pretraining the correct labels are always within the 40478 word indices.

    The paper reported a very low perplexity of 18.4 on the BooksCorpus, so I thought the pretrained model should be a very strong LM able to generate high-quality text. The results confused me. Can you give me some advice? Is it because a deep transformer LM is inherently not good at the generation task, or due to some hidden bug in my code?

    • Da Xiao
    opened by xiaoda99 10
  • How should one modify the code to successfully run text classification?

    Hi,

    I am new to PyTorch (but still more at ease with it than TF) so I thought to experiment with @thomwolf 's implementation in this repo (thanks for sharing it!!)

    I would like to try out the code to perform binary text classification of text snippets, similar to the classification tasks such as the Corpus of Linguistic Acceptability (CoLA) and the Stanford Sentiment Treebank (SST-2) in the original reference.

    These are the steps that I think are needed to get the code working (but I am not sure that these are correct and/or exhaustive):

    1. Create two sets snippets_val.csv and snippets_test.csv containing two columns, text (string) and class (an int equal to 0 or 1).
    2. In datasets.py create two new functions:
      • _snippets returning two lists st, y, and
      • snippets defined with different values of n_train and n_valid and whose return statement looks like return (trX, trY), (vaX, vaY), (teX, )
    3. In train.py, rewrite transform_roc into a transform_snippet that doesn't use [delimiter] and takes only one input argument <- this is somewhat tricky to me; can anyone provide some guidance?
    4. In train.py, in the encoding bit and afterwards:
    5. In train.py:
    6. In analysis.py:
      • create a new function snippets that invokes _snippets (from datasets.py) to read in snippets_test.csv, and adjust its call to _snippets to take into account that it outputs two lists (not four)
    7. Modify imports in train.py coherently with all of the above.

    Does all of the above make sense as a plan, or can somebody fill missing bits or provide an alternative list of "sub-steps" ? Also, can someone provide some guidance on how to rewrite transform_roc (comments on the original code would be fantastic, I am glad to annotate the original function and contribute to the repo as a result of this!)

    Thanks to anyone patiently reading this!

    opened by davidefiocco 7
  • dimensioning bug?

    https://github.com/huggingface/pytorch-openai-transformer-lm/blob/master/model_py.py#L77-L84

    The reference implementation and paper use TensorFlow, in which the channel dimension comes last. But in PyTorch, the channel dimension is dim=1, right after the batch dimension. So does the equation need to be reversed? Is this done somewhere else in the code?

    So in PyTorch it should look like: matmul(v, matmul(k.t(), q))

    example (tensorflow):

    q size [attn_depth=3, feature_depth=2]
    k size [attn_depth=3, feature_depth=2]
    v size [attn_depth=3, feature_depth=2]
    
    weights = matmul(q, k.t()) -> [3, 3]
    result = matmul(weights, v) -> [3, 2]
    

    example (pytorch) with error:

    q size [2, 3]
    k size [2, 3]
    v size [2, 3]
    
    weights = matmul(q, k.t()) -> [2, 2]
    result = matmul(weights, v) -> [2, 3]
    

    Notice the dimensionality of the weights has been over-reduced [2,2] instead of [3,3]

    example (pytorch) with correction:

    q size [2, 3]
    k size [2, 3]
    v size [2, 3]
    
    weights = matmul(k.t(), q) -> [3, 3]
    result = matmul(v, weights) -> [2, 3]
    

    Apologies if this is handled correctly somewhere, limited on time atm for a thorough read.

    opened by jtatusko 4
  • Object is not specified

    https://github.com/huggingface/pytorch-openai-transformer-lm/blob/37e77aff19431b0b9c5b5a411ad71bcc2e138ffd/model_pytorch.py#L180

    opened by Oktai15 4
  • Why do we need to apply mask while fine tuning?

    In the Attention class, you have the following code for masking. I understand the logic for pre-training, but in fine-tuning, if we don't include the language model loss, we should have a check here for not applying the mask. Do we always have to apply the masking because the model was trained that way? Is there an intuitive reason for this? Experimentally, I don't see a necessity for it.

    This is the line I am talking about: w = w * self.b + -1e9 * (1 - self.b) # TF implem method: mask_attn_weights

    opened by pranoy-k 4
  • fix the scope of optimizer

    The target of an optimizer should contain clf_head (new task-specific output matrix) in addition to model (Transformer encoder). The code might fail to do that, right?

    opened by soskek 3
  • Pre-trained LMHead

    Hi!

    First, I would like to thank you for your translation of the implementation of this paper.

    I have read the code and managed to run it, but I am not able to find how to load pre-trained weights into an LMHead to get a general English language model. Did the OpenAI guys not release the weights of this final layer?

    I thought the performance would be better starting the finetuning process from a pre-trained LMHead.

    opened by rodgzilla 3
  • Noise shape dropout

    Reproducing the specific behavior of the classifier dropout of the original OpenAI implementation of the article. The details of this patch can be found in issue #11.

    opened by rodgzilla 3
  • How does position embedding implementation work?

    So there's the TransformerModel's forward method, and I just can't get a hold of the position embedding part (and might be wrong about others). So, as far as I can tell, step-by-step it goes like this:

    1. Reshape our input to have 3 dimensions -> [ ? x sequences (?) x tokens (512) ]
    2. Get the individual token embeddings -> [ ? x sequences (?) x tokens (512) x emb_dim (768) ]
    3. Sum up those embeddings along axis 2 (summing token embeddings element-wise for each sequence?) -> [ ? x sequences x emb_dim (768) ]
    4. Shouldn't we have [ sequences x tokens (512) x emb_dim (768) ] here?
    def forward(self, x):
            x = x.view(-1, x.size(-2), x.size(-1))
            e = self.embed(x)
            # Add the position information to the input embeddings
            h = e.sum(dim=2)
            for block in self.h:
                h = block(h)
            return h
    

    My questions are:

    • What are the axes of the x, e, and h tensors?
    • How can a sum of an internal part add positional information to our token embeddings?
    • How is that operation equivalent to the paper's, where the position embedding is an external, learned matrix which is added to the token embeddings?

    Thank you in advance!

    opened by bcserna 2
  • Implementation of Similarity Head

    Similarity Head and Loss function were tested on the STS-B dataset, achieving nearly the same performance as reported (82.45% Pearson correlation relative to the 82% in the paper). I can provide the necessary code changes for loading the dataset and reproducing my results if wanted.

    opened by TEGELB 0
  • Training from scratch: Repeated and mangled words

    I am trying to use this repository to train a language model with an additional input. My data looks like this:

    ┌─────────┬─────┬────┬───┐
    │side info│start│The │cat│
    └─────────┴─────┴────┴───┘
    

    The labels look like this

    ┌────┬───┬─────┐
    │The │cat│meows│
    └────┴───┴─────┘
    

    Since my objective is quite different from the original training script, I implemented the training from scratch, but I noticed that it takes much more time than a simple LSTM model to become somewhat decent, and the results are not fully coherent language even after 15 epochs on 2 million sentences. I am getting outputs that look like this:

    Gold label: In most cases , accurate results can only be achieved after a laborious and expensive trial and error process .

    Output: only most accurate cases can be achieved after a laborious error and process results In trial and expensive suit.

    Currently I am using a small model with 4 layers and 2 heads each.

    I randomly initialized the position encodings and multiplied them by 0.1 to match the variance of my word embeddings.

    Any ideas what I could have missed?

    Here is some of my code

    batch_size = 32
    n_epochs = 100
    max_len = 120
    
    embeddings, emb_weights = load_embeddings(data_path+'de.en.fr.ka.tok.60000.shuf.vec',max_len)
    train_dataset = SortedSentenceDataset(data_path+'train.txt', 200000, max_len, embeddings, 'avg',device)
    train_sampler = train_dataset.get_sampler(batch_size)
    train_loader = DataLoader(train_dataset, batch_size=1, sampler=train_sampler)
    dev_dataset = SortedSentenceDataset(data_path+'valid.txt', 1000, max_len, embeddings, 'avg',device)
    dev_sampler = dev_dataset.get_sampler(batch_size)
    dev_loader = DataLoader(dev_dataset, batch_size=1, sampler=dev_sampler)
    
    args = DEFAULT_CONFIG
    args.n_embd = emb_weights.size(1)
    # Constraint: embedding size % number of heads = 0
    args.n_head = 2
    args.n_layer = 4
    model = load_model(args, emb_weights)
    
    model.to(device)
    
    criterion = torch.nn.CrossEntropyLoss()
    
    optimizer = OpenAIAdam(model.parameters(),
                               lr=6.25e-3,
                               schedule='warmup_linear',
                               warmup=0.02,
                               t_total=n_epochs*len(train_dataset)*20,
                               b1=0.9,
                               b2=0.999,
                               e=1e-8,
                               l2=0.01,
                               vector_l2='store_true',
                               max_grad_norm=1)
    
    best = 1000
    for epoch in range(n_epochs):
        do_epoch(train_loader)
        val_loss = eval(dev_loader)
        print('Validation loss: {}'.format(val_loss))
        if val_loss < best:
            best = val_loss
            print('Saving model')
            torch.save(model.state_dict(),"context-at-each-layer-checkpoint-{}k{}e4b.pt".format(len(train_dataset)//1000,n_epochs))
        print(' '.join(generate(train_dataset,max_len,embeddings)))
    
    opened by maruker 0
  • Instructions for encoding own sentences

    I'd like to use GPT to encode my dataset and use the representations further for the task of question generation. I have problems understanding the code and the names of the arguments in the train.py file (in main). Could anyone direct me to some examples (I already searched online) or possibly post some here?

    Cheers

    opened by izaskr 1
  • Running on new dataset similar to rocstories

    Hi all,

    I am trying to train a new dataset with a similar structure to rocstories. It has a story part, 2 options and one correct option. I just added a new function in datasets.py but this is not enough. I am not able to train. Has anyone done that and can provide me with some suggestions?

    Thanks in advance.

    opened by priyanka-chaudhary 0
  • ConvAI

    From the ConvAI slides, it sounds like the Hugging Face submission was based off of this model -- is the code for your ConvAI system available somewhere to take a look at? Thanks!

    opened by bkj 0
  • vocab = n_vocab + n_special + n_ctx means?

    I know that n_vocab is the total number of tokens in the encoder dictionary. But when I saw vocab = n_vocab + n_special + n_ctx, I was confused; maybe n_special is for the start, delimiter, and classify tokens. But what is n_ctx? Why add these 3 things? (Why is there so little commenting on variables and functions? Is there somewhere else to see an explanation of the code?) I am new to learning about the transformer.

    opened by JiahangOK 1
  • Implementation of Seq2Seq with Transformer

    Just curious what would be the place to start to create a seq2seq model for response generation on, say, the persona-chat dataset.

    opened by bhedayat 0
  • Potentially incorrect regex in text_utils.py

    Hi, we have some of your regexes in AllenNLP and Python has been warning us about them for a while.

    https://github.com/huggingface/pytorch-openai-transformer-lm/blob/master/text_utils.py#L30

    '''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)'''<input>:1: DeprecationWarning: invalid escape sequence \?
    <input>:1: DeprecationWarning: invalid escape sequence \?
    <input>:1: DeprecationWarning: invalid escape sequence \?
    In [38]: '''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)'''
    <input>:1: DeprecationWarning: invalid escape sequence \?
    <input>:1: DeprecationWarning: invalid escape sequence \?
    <input>:1: DeprecationWarning: invalid escape sequence \?
    <ipython-input-38-9a7773b0447c>:1: DeprecationWarning: invalid escape sequence \?
    

    In fixing them I looked to your implementation and noticed you prefixed the expressions with r so they are raw strings (presumably to fix the same warnings). However, I think this actually changed one of your regexes to something other than what was intended.

    # Before
    $ '''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)'''
    '(-+|~+|!+|"+|;+|\\?+|\\++|,+|\\)+|\\(+|\\+|\\/+|\\*+|\\[+|\\]+|}+|{+|\\|+|_+)'
    
    #After
    $ r'''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)'''
    '(-+|~+|!+|"+|;+|\\?+|\\++|,+|\\)+|\\(+|\\\\+|\\/+|\\*+|\\[+|\\]+|}+|{+|\\|+|_+)'
    
    $ '''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)''' == r'''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)'''
    False
    

    The switch to raw strings changed '|\\+' (one or more backslashes) to |\\\\+ (two or more backslashes). I think you actually want the following regex.

    r'''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)'''
    
    $ r'''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)''' == '''(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)'''
    True
    
    opened by schmmd 0
  • Why is output vocab including positional embeddings?

    Hi,

    I was wondering why the output softmax is of dimension n_vocab + n_special + n_ctx as opposed to just n_vocab + n_special? We don't really need to output "tokens" from the positional encodings, do we? I also had a look at some outputs and didn't get negligible values on the last n_ctx lm_logits. Thanks!

    opened by OanaMariaCamburu 2
  • Retrain the LM on new dataset?

    Hello, I checked #36 and I wonder how I can retrain the LM on a new dataset. Any guidance would be appreciated.

    opened by fabrahman 0