GPT-Code-Clippy (GPT-CC) is an open source version of GitHub Copilot

Overview

GPT-Code-Clippy (GPT-CC)

Please refer to our new GitHub Wiki, which documents our efforts in detail in creating the open source version of GitHub Copilot.



[Image] Courtesy of the awesome Aimee Trevett!

Introduction

GPT-Code-Clippy (GPT-CC) is an open source version of GitHub Copilot, a language model (based on GPT-3 and called Codex) that is fine-tuned on publicly available code from GitHub.

Datasets

The dataset used to train GPT-CC is obtained from SEART GitHub Search using the following criteria:

  • >10 GitHub stars
  • >2 commits
  • Must have a licence
  • Exclude forks
  • Size < 70708 bytes

These repositories are then combined with all of the GitHub repositories contained in The Pile.

The repositories are then filtered for duplicate files. Filtering is performed by regexing each file in each repository to obtain a list of "variables" (the tokens which contain only alphanumeric characters) and then filtering out any files which contain the same sequence of "variables". The deduplication script is available here.
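
The sketch below illustrates this idea; the regex and hashing details are illustrative only, and the linked script is authoritative:

    import re
    from pathlib import Path

    TOKEN_RE = re.compile(r"[A-Za-z0-9]+")  # "variables": purely alphanumeric tokens

    def variable_fingerprint(source: str) -> int:
        # Hash the ordered sequence of alphanumeric tokens found in a file.
        return hash(tuple(TOKEN_RE.findall(source)))

    def deduplicate(paths):
        # Keep only the first file seen for each fingerprint.
        seen, kept = set(), []
        for path in paths:
            fingerprint = variable_fingerprint(Path(path).read_text(errors="ignore"))
            if fingerprint not in seen:
                seen.add(fingerprint)
                kept.append(path)
        return kept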

The final dataset is available here. The dataset without the duplicates filtered out is also available here.

The datasheet discussing in more detail the construction, usage, and limitations of the dataset can be found here. We hope to get it officially into Hugging Face's datasets library soon!

Models

The GPT-CC models are fine-tuned versions of GPT-2 and GPT-Neo.

The available models can be found here.

The ones that perform relatively well (none improve on the standard GPT-Neo 125M model except for the APPS-specific models, and then only on the APPS task):

TODO: which is the recommended model?
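
In the meantime, here is a minimal sketch of loading one of the released checkpoints with the transformers library. The model id is one of the smaller checkpoints; passing from_flax=True assumes only Flax weights are published for it and requires flax to be installed.

    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_id = "flax-community/gpt-neo-125M-code-clippy"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, from_flax=True)

    prompt = "def fibonacci(n):\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=48,
        do_sample=True,
        temperature=0.2,
        pad_token_id=tokenizer.eos_token_id,  # GPT-Neo has no pad token by default
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))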

Training

Training is done using the training scripts available here.

For fine-tuning GPT-Neo-125M on the CodeClippy dataset we used the AdamW optimizer (beta1=0.9, beta2=0.95) with a GPT-3-like learning rate schedule (4k warmup steps from 0 to 5e-5 followed by 50k cosine decay steps to 5e-6), weight decay 0.1, batch size 1024, and sequence length 2048. The choice of a relatively large batch size and a low LR with a long warmup is made to avoid aggressive updates and preserve the knowledge contained in the pretrained GPT-Neo weights.
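
As a rough illustration, this optimizer and schedule can be expressed with optax (the optimizer library typically used with Flax); the actual training script may wire things up differently:

    import optax

    # GPT-3-like schedule described above: 4k linear warmup steps from 0 to 5e-5,
    # followed by 50k cosine decay steps down to 5e-6.
    warmup = optax.linear_schedule(init_value=0.0, end_value=5e-5, transition_steps=4_000)
    cosine = optax.cosine_decay_schedule(init_value=5e-5, decay_steps=50_000, alpha=5e-6 / 5e-5)
    lr_schedule = optax.join_schedules(schedules=[warmup, cosine], boundaries=[4_000])

    # AdamW with the betas and weight decay quoted above.
    optimizer = optax.adamw(learning_rate=lr_schedule, b1=0.9, b2=0.95, weight_decay=0.1)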

For fine-tuning GPT-Neo-125M on the APPS dataset we used the AdamW optimizer (beta1=0.9, beta2=0.98) with a linear learning rate schedule (800 warmup steps from 0 to the peak LR followed by linear decay to 0; the peak LR was selected from the range [1e-5, 1e-4]), weight decay 0.1, batch size 256, and sequence length 1024. We trained the model for 5 epochs, selecting the best checkpoint by validation loss. The language modelling objective for the APPS dataset is modified to backpropagate loss only for the tokens corresponding to the code solution (refer to Hendrycks et al. for more details).
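
The masking can be sketched as zeroing the cross-entropy over prompt positions before averaging. In the JAX sketch below, prompt_lengths is an assumed per-example count of problem-statement tokens; the actual training script may implement this differently:

    import jax
    import jax.numpy as jnp
    import optax

    def apps_lm_loss(logits, input_ids, prompt_lengths):
        # Causal LM loss that only counts tokens belonging to the code solution.
        shift_logits = logits[:, :-1, :]   # position t predicts token t + 1
        shift_labels = input_ids[:, 1:]
        per_token = optax.softmax_cross_entropy(
            shift_logits, jax.nn.one_hot(shift_labels, shift_logits.shape[-1]))

        # Zero out the loss for positions that fall inside the problem statement.
        positions = jnp.arange(shift_labels.shape[1])[None, :]
        solution_mask = positions >= (prompt_lengths[:, None] - 1)
        return (per_token * solution_mask).sum() / solution_mask.sum()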

For fine-tuning GPT-Neo-1.3B on the APPS dataset we used the Adafactor optimizer with a linear learning rate schedule (5k warmup steps from 0 to 2e-5 followed by linear decay to 0), weight decay 0.1, batch size 24, and sequence length 1024. The choice of hyperparameters for the 1.3B model is partly determined by hardware limitations. We trained the model for 5 epochs, selecting the best checkpoint by validation loss.
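
The corresponding Adafactor setup can be sketched in the same way; the total step count below is a placeholder rather than a value from the original run:

    import optax

    TOTAL_STEPS = 50_000  # placeholder; set to the real number of training steps

    # 5k linear warmup steps from 0 to 2e-5, then linear decay to 0.
    lr_schedule = optax.join_schedules(
        schedules=[
            optax.linear_schedule(init_value=0.0, end_value=2e-5, transition_steps=5_000),
            optax.linear_schedule(init_value=2e-5, end_value=0.0, transition_steps=TOTAL_STEPS - 5_000),
        ],
        boundaries=[5_000],
    )

    # weight decay 0.1 as described above; the actual script may apply it differently.
    optimizer = optax.adafactor(learning_rate=lr_schedule, weight_decay_rate=0.1)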

TODO: which is the recommended way to train GPT-CC?

Evaluation

The models are also evaluated on the APPS and HumanEval datasets.

Human Eval Results

Model                              pass@1   pass@2   pass@5   pass@10
EleutherAI/gpt-neo                 0.12%    0.24%    0.61%    1.22%
gpt-neo-125M-apps                  0.06%    0.12%    0.30%    0.61%
dedup-filtered-no-resize-2048bs    0.00%    0.00%    0.00%    0.00%
1024-filtered                      0.00%    0.00%    0.00%    0.00%
dedup-2048                         0.00%    0.00%    0.00%    0.00%
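
For reference, pass@k is typically computed with the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch follows (the exact evaluation harness used here may differ):

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        # n: samples generated per problem, c: samples that pass the unit tests,
        # k: evaluation budget. Returns the estimated probability that at least
        # one of k samples passes.
        if n - c < k:
            return 1.0
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))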

APPS Eval Results

Coming soon...

Demo

A Visual Studio Code extension which uses the Hugging Face Inference API is available and can be found here.
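
For reference, a completion request similar to what such an extension makes against the Hugging Face Inference API can be sketched as follows; the model id and generation parameters are illustrative, not necessarily what the extension actually uses:

    import requests

    API_URL = "https://api-inference.huggingface.co/models/flax-community/gpt-neo-125M-code-clippy"
    HEADERS = {"Authorization": "Bearer <HF_API_TOKEN>"}  # placeholder token

    def complete(prompt: str, max_new_tokens: int = 32) -> str:
        # Ask the hosted model to continue the given code prompt.
        payload = {
            "inputs": prompt,
            "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.2},
        }
        response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=30)
        response.raise_for_status()
        return response.json()[0]["generated_text"]

    print(complete("def hello_world():"))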

We also have a Hugging Face Spaces demo where you can specify a problem in the format of a programming competition question.

TODO: more information about this when complete.

Further Reading

For more information about GPT-CC, GitHub Copilot, etc, see:

TODO: add more further reading.

Acknowledgements

Special thanks to our contributors!!

Comments
  • **Code Datasets**

    • [x] Datasets to use?
    • [x] How to collect the datasets?
    • [x] How to store and organize the datasets?
    • [x] What filtering/preprocessing/processing needs to be done to the datasets?
    • [x] Merge data onto one TPU
    • [x] Figure out deduplicating dataset
    • [x] Setup dataloading of dataset using HF datasets
    • [x] Talk with owner of the eye archive community for hosting our dataset similar to the pile
    opened by ncoop57 16
  • EleutherAI/gpt-neo-1.3B Model works better than this.

    Hi, You guys are doing a great job with it.

    I have tried your flax-community/gpt-neo-1.3B-apps-all model, and the generated code is kinda hit or miss.

    This is generated using flax-community/gpt-neo-1.3B-apps-all: [image]

    and this is generated using EleutherAI/gpt-neo-1.3B: [image]

    As far as I know, EleutherAI/gpt-neo-1.3B is trained on more general text, which is not necessarily code.

    Then why is flax-community/gpt-neo-1.3B-apps-all performing much worse than EleutherAI/gpt-neo-1.3B?

    opened by bubundas17 9
  • Participation in an Open Source Language Modeling Dataset

    Hi there, your repository has been selected to be included in an effort to train an open source version of GitHub and OpenAI's Copilot tool. You can find more information on our project here.

    If you are the owner/admin of this repository and would like to opt-out of this, please reply to this issue before July 9th with "yes" and we will remove your repository from our list.

    opened by ncoop57 5
  • **Code Model Evaluation**

    • [x] How will we evaluate the model?
    • [x] What metrics will we use?
    • [x] What existing scripts could we repurpose?
    • [x] Modified/newly created eval script created to feed into the rest of the pipeline
    opened by ncoop57 5
  • Low Pass@k

    Hi, thanks for the great work! Firstly I wanted to ask about the performance of the code-clippy models. It seems that the 125M parameter models are quite weak and perform quite poorly on the HumanEval dataset (even lower than GPT-Neo-1.3B?). Any idea why this is happening?

    Also, is there some update on the evaluation of the GPT-Neo-1.3B code-clippy model?

    Finally, I would love to contribute to upcoming iterations of code-clippy. Should I join the discord channel?

    opened by Naman-ntc 3
  • https://huggingface.co/spaces/flax-community/code-clippy-problem-solver gets stuck on generating solution

    Was trying this spaces example https://huggingface.co/spaces/flax-community/code-clippy-problem-solver and it seems to get stuck for A function that prints prime numbers from 1 to 100

    opened by allthingssecurity 3
  • What are the different 'repo_language' contained in the dataset?

    I have only found Java. Wonder if someone can spare me the details without having to process the whole dataset :) Thank you for open sourcing it! Awesome stuff!

    opened by JoaoLages 3
  • Unable to train with custom data

    Hi, when I try to train a model from scratch I am facing the following error. The data_dir contains only a small amount of data, so I think CPU should be sufficient in my case. So what exactly could cause this? @ncoop57 can you please check and help?

    ./run_clm_streaming_flax.py \
        --output_dir $HOME/fhgw-gpt-neo-125M-code-clippy \
        --dataset_name /home/fedora/explore/clippy/gpt-code-clippy/data_processing/code_clippy.py \
        --data_dir /mnt/vol/FHGW/scm_fhgw/workspace_FHGW_21.000/FHGW-NW-CM \
        --text_column_name="text" \
        --do_train --do_eval \
        --block_size="2048" \
        --per_device_train_batch_size="8" \
        --per_device_eval_batch_size="16" \
        --preprocessing_num_workers="8" \
        --learning_rate="1e-4" \
        --max_steps 100000 \
        --warmup_steps 2500 \
        --decay_steps 25000 \
        --adam_beta1="0.9" \
        --adam_beta2="0.95" \
        --weight_decay="0.1" \
        --overwrite_output_dir \
        --logging_steps="100" \
        --eval_steps="500" \
        --push_to_hub="False" \
        --report_to="all" \
        --dtype="bfloat16" \
        --skip_memory_metrics="True" \
        --save_steps="500" \
        --save_total_limit 10 \
        --gradient_accumulation_steps 16 \
        --report_to="wandb" \
        --run_name="125m_1e-4lr_1024bs" \
        --max_eval_samples 2000 \
        --save_optimizer true
    
    2022-01-06 08:27:11.271076: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
    INFO:absl:Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
    INFO:absl:Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
    INFO:absl:Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
    WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
    INFO:__main__:Training/evaluation parameters TrainingArguments(
    _n_gpu=0,
    adafactor=False,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-08,
    bf16=False,
    bf16_full_eval=False,
    dataloader_drop_last=False,
    dataloader_num_workers=0,
    dataloader_pin_memory=True,
    ddp_bucket_cap_mb=None,
    ddp_find_unused_parameters=None,
    debug=[],
    deepspeed=None,
    disable_tqdm=False,
    do_eval=True,
    do_predict=False,
    do_train=True,
    eval_accumulation_steps=None,
    eval_steps=500,
    evaluation_strategy=IntervalStrategy.NO,
    fp16=False,
    fp16_backend=auto,
    fp16_full_eval=False,
    fp16_opt_level=O1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=False,
    greater_is_better=None,
    group_by_length=False,
    half_precision_backend=auto,
    hub_model_id=None,
    hub_strategy=HubStrategy.EVERY_SAVE,
    hub_token=<HUB_TOKEN>,
    ignore_data_skip=False,
    label_names=None,
    label_smoothing_factor=0.0,
    learning_rate=0.0001,
    length_column_name=length,
    load_best_model_at_end=False,
    local_rank=-1,
    log_level=-1,
    log_level_replica=-1,
    log_on_each_node=True,
    logging_dir=/home/fedora/fhgw-gpt-neo-125M-code-clippy/runs/Jan06_08-27-13_fedora.novalocal,
    logging_first_step=False,
    logging_nan_inf_filter=True,
    logging_steps=100,
    logging_strategy=IntervalStrategy.STEPS,
    lr_scheduler_type=SchedulerType.LINEAR,
    max_grad_norm=1.0,
    max_steps=100000,
    metric_for_best_model=None,
    mp_parameters=,
    no_cuda=False,
    num_train_epochs=3.0,
    output_dir=/home/fedora/fhgw-gpt-neo-125M-code-clippy,
    overwrite_output_dir=True,
    past_index=-1,
    per_device_eval_batch_size=16,
    per_device_train_batch_size=8,
    prediction_loss_only=False,
    push_to_hub=False,
    push_to_hub_model_id=None,
    push_to_hub_organization=None,
    push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
    remove_unused_columns=True,
    report_to=['wandb'],
    resume_from_checkpoint=None,
    run_name=125m_1e-4lr_1024bs,
    save_on_each_node=False,
    save_steps=500,
    save_strategy=IntervalStrategy.STEPS,
    save_total_limit=10,
    seed=42,
    sharded_ddp=[],
    skip_memory_metrics=True,
    tf32=None,
    tpu_metrics_debug=False,
    tpu_num_cores=None,
    use_legacy_prediction_loop=False,
    warmup_ratio=0.0,
    warmup_steps=2500,
    weight_decay=0.1,
    xpu_backend=None,
    )
    WARNING:datasets.builder:Using custom data configuration default-01c596fb6133304a
    Traceback (most recent call last):
      File "/usr/lib64/python3.7/pathlib.py", line 713, in __str__
        return self._str
    AttributeError: _str
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "./run_clm_streaming_flax.py", line 774, in <module>
        main()
      File "./run_clm_streaming_flax.py", line 392, in main
        split="train"
      File "/usr/local/lib/python3.7/site-packages/datasets/load.py", line 1686, in load_dataset
        use_auth_token=use_auth_token,
      File "/usr/local/lib/python3.7/site-packages/datasets/builder.py", line 897, in as_streaming_dataset
        splits_generators = {sg.name: sg for sg in self._split_generators(dl_manager)}
      File "/home/fedora/.cache/huggingface/modules/datasets_modules/datasets/code_clippy/86b09b4a623c1c39753a8ad165e05757d9a97daf132ac71d3b6eb791e7da16dd/code_clippy.py", line 111, in _split_generators
        gen_kwargs={"filepaths": sorted([str(fp) for fp in Path(f"{data_dir}/train").glob("*.jsonl.zst")])}
      File "/home/fedora/.cache/huggingface/modules/datasets_modules/datasets/code_clippy/86b09b4a623c1c39753a8ad165e05757d9a97daf132ac71d3b6eb791e7da16dd/code_clippy.py", line 111, in <listcomp>
        gen_kwargs={"filepaths": sorted([str(fp) for fp in Path(f"{data_dir}/train").glob("*.jsonl.zst")])}
      File "/usr/local/lib/python3.7/site-packages/datasets/utils/streaming_download_manager.py", line 384, in xpathglob
        yield from Path(main_hop).glob(pattern)
      File "/usr/local/lib/python3.7/site-packages/datasets/utils/streaming_download_manager.py", line 384, in xpathglob
        yield from Path(main_hop).glob(pattern)
      File "/usr/local/lib/python3.7/site-packages/datasets/utils/streaming_download_manager.py", line 384, in xpathglob
        yield from Path(main_hop).glob(pattern)
      [Previous line repeated 984 more times]
      File "/usr/local/lib/python3.7/site-packages/datasets/utils/streaming_download_manager.py", line 381, in xpathglob
        posix_path = _as_posix(path)
      File "/usr/local/lib/python3.7/site-packages/datasets/utils/streaming_download_manager.py", line 172, in _as_posix
        path_as_posix = path.as_posix()
      File "/usr/lib64/python3.7/pathlib.py", line 726, in as_posix
        return str(self).replace(f.sep, '/')
      File "/usr/lib64/python3.7/pathlib.py", line 716, in __str__
        self._parts) or '.'
      File "/usr/lib64/python3.7/pathlib.py", line 695, in _format_parsed_parts
        return drv + root + cls._flavour.join(parts[1:])
    RecursionError: maximum recursion depth exceeded while calling a Python object
    
    opened by DineshReddyK 2
  • When I run the train script, an error occurs

    When I run this train script, I encounter some errors. The error log is as follows: [image]

    Do you know how to solve it?

    Furthermore, there are too many files in the code_clippy_data folder. Is there a script to download this dataset conveniently?

    opened by BitcoinNLPer 2
  • Incomplete merge

    The following file doesn't compile due to an incomplete merge. https://github.com/CodedotAl/gpt-code-clippy/blob/camera-ready/training/run_clm_streaming_flax.py

    opened by alipourm 2
  • Training and fine-tuning on GPT-J

    Trying to fine-tune gpt-j to create a better version of code-clippy

    I have already created a fine-tuning script. However, it would require a beefy TPU (a v3-256 takes about 6 weeks, I believe), and thus I cannot train it.

    It would be great if this repository would be helpful in the long run of creating an open source version of github-copilot

    opened by uSaiPrashanth 2
  • Creating embeddings instead of output prediction

    Hi! I was wondering if a GPT Code Clippy model could generate embeddings instead of output predictions? The purpose is to embed code in a semantic space, such that it can be used as a feature for another neural network. I have done the same with BERT (more as a baseline, since this model is not trained on code), and with the OpenAI Codex model (with a paying API), and therefore would love to use one of your models as well.

    Thank you!

    opened by JorritWillaert 1
  • --enable-proposed-api ncoop57.code-clippy

    Hi everyone. After I launch the extension in debug mode, when I try writing I get this error: [ncoop57.code-clippy]: editor/inlineCompletions/actions is a proposed menu identifier. It requires 'package.json#enabledApiProposals: ["inlineCompletionsAdditions"]' and is only available when running out of dev or with the following command line switch: --enable-proposed-api ncoop57.code-clippy

    --enable-proposed-api ncoop57.code-clippy gives me a "Missing expression after unary operator '--'" error. And code --enable-proposed-api ncoop57.code-clippy gets me out of Debug mode.

    Does anyone have an idea how I can fix this?

    opened by NidhalKhalfallah 0
  • Cannot seem to get good results

    Hello I'm attempting to run the starter code for the flax-community/gpt-neo-125M-code-clippy.

    For some reason, I cannot get anything other than blank characters and escape characters.

    Would someone be able to assist?

    opened by yanbronshtein 1
  • How to get started?

    Is there an easier way to get started?

    I tried to set up a machine and install all the requirements. I will try to go further tomorrow, but maybe I am doing something wrong.

    The error I am at currently is:

        2021-11-05 22:23:59.523515: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
        Traceback (most recent call last):
          File "run_clm_apps.py", line 800, in <module>
            main()
          File "run_clm_apps.py", line 342, in main
            model_args, data_args, training_args = parser.parse_args_into_dataclasses()
          File "/home/pankaj/.local/lib/python3.8/site-packages/transformers/hf_argparser.py", line 191, in parse_args_into_dataclasses
            obj = dtype(**inputs)
          File "<string>", line 14, in __init__
          File "run_clm_apps.py", line 174, in __post_init__
            raise ValueError("Need either a dataset name or a training/validation file.")
        ValueError: Need either a dataset name or a training/validation file.

    Also, getting the requirements to work was quite difficult on my machine. Wondering if I am doing something wrong.

    opened by pankajkumar229 11
  • Wrong filenames in dataset

    Hi, the filenames in the code-clippy dedup dataset are wrong. In repos with multiple files, though the various files are present, they all share a single random filename that does not adhere to the correct file extension either. While this might not be an issue for the gpt-code-clippy training efforts, since only the content of the files might matter, it would be really great if this issue could be fixed, or otherwise mentioned clearly.

    Sample code to reproduce the issue (prints the filenames in the first 100 rows of the jsonl):

    import json
    import subprocess
    import zstandard
    
    
    def loadJsonL(fname):
        # Read a .jsonl file into a list of dicts (one JSON object per line).
        data = []
        with open(fname) as fp:
            for line in fp.readlines():
                data.append(json.loads(line))
        return data
    
    
    def processZSTLink(url):
        # Download the .jsonl.zst shard, decompress it, and print the
        # repo_name/file_name metadata for the first 100 rows.
        zstfile = url.split('/')[-1]
        print(url)
        subprocess.run(f"wget {url}", shell=True, stdout=subprocess.DEVNULL)
        jsonlfile = zstfile[:-4]
        with open(zstfile, 'rb') as compressed:
            decomp = zstandard.ZstdDecompressor()
            with open(jsonlfile, 'wb') as destination:
                decomp.copy_stream(compressed, destination)
    
        data = loadJsonL(jsonlfile)
        for row in data[:100]:
            file_name = row['meta']['file_name']
            repo_name = row['meta']['repo_name']
            print(f"{repo_name}/{file_name}")
    
    
    processZSTLink('https://the-eye.eu/public/AI/training_data/code_clippy_data//code_clippy_dedup_data/test/data_2814_time1626332048_default.jsonl.zst')
    
    bug enhancement 
    opened by Naman-ntc 3