Rank-One Model Editing for Locating and Editing Factual Knowledge in GPT

Overview

Rank-One Model Editing (ROME)

This repository provides an implementation of Rank-One Model Editing (ROME) on auto-regressive transformers (GPU-only). We currently support OpenAI's GPT-2 XL (1.5B) and EleutherAI's GPT-J (6B). The release of a 20B GPT-like model from EleutherAI is expected soon; we hope to support it ASAP.

Feel free to open an issue if you find any problems; we are actively developing this repository and will monitor tickets closely.

Colab ROME Demo

[causal tracing GIF]

Table of Contents

  1. Installation
  2. Causal Tracing
  3. Rank-One Model Editing (ROME)
  4. CounterFact Dataset
  5. Evaluation
  6. How to Cite

Installation

We recommend conda for managing Python, CUDA, and PyTorch-related dependencies, and pip for everything else. To get started, simply install conda and run:

./scripts/setup_conda.sh

Causal Tracing

notebooks/causal_trace.ipynb demonstrates Causal Tracing, which can be modified to apply tracing to the processing of any statement.

[causal tracing GIF]

Rank-One Model Editing (ROME)

notebooks/rome.ipynb demonstrates ROME. The API is simple: specify a requested rewrite of the following form:

request = {
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {
        "str": "football"
    }
}

Several similar examples are included in the notebook.
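To make the request format concrete, here is a minimal sketch of how the `{}` placeholder in `prompt` expands with the `subject` (the `render_prompt` helper below is illustrative, not part of the repository's API):

```python
# A rewrite request, as in the notebook; the helper below is a
# hypothetical illustration of how the template is filled.
request = {
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {"str": "football"},
}

def render_prompt(request):
    """Fill the subject into the prompt template's {} placeholder."""
    return request["prompt"].format(request["subject"])

print(render_prompt(request))  # LeBron James plays the sport of
```

Keeping the subject separate from the template lets the editor locate the subject tokens inside the prompt, which is central to how ROME selects where to intervene.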

CounterFact

Description coming soon!

Evaluation

Paper Baselines

We compare ROME against several state-of-the-art model editors. All are implemented in baselines/ in their respective folders. Implementations are not our own; they are adapted slightly to plug into our evaluation system.

Running the Full Evaluation Suite

Description coming soon!

How to Cite

@article{meng2022locating,
  title={Locating and Editing Factual Knowledge in GPT},
  author={Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
  journal={arXiv preprint arXiv:2202.05262},
  year={2022}
}
Comments
  • Confusion about pre/post rewrite probabilities of target true and target new

    Confusion about pre/post rewrite probabilities of target true and target new

    Hi, I've enjoyed playing with ROME and appreciate the interactive colab notebooks! I tried it out myself using gpt2-xl and I'm running into some strange behavior. Below, I've pasted the JSON for one of the case results (756) using ROME.

    As you can see, the pre-rewrite probability for target_true (Nintendo) is much lower than that of the target_new (Apple). Shouldn't it be the other way around? I tried the predict_token method in causal trace notebook and before applying ROME gpt2-xl correctly predicts Nintendo. Additionally, the post re-write probs seem to be incorrect as well. Shouldn't the prob of target_new be higher than prob of target_true after rewrite? I found the same behavior over the majority of other cases I tested as well (I tested a batch of 350). I'm not sure if I'm misunderstanding something, so just looking to clarify that.

    Another question I had is regarding this line of code. Don't we want x["target_true"] > x["target_new"] to be true only for pre, and the inverse to be true for post?

    Any clarification would be appreciated, thanks!

    {
     "case_id": 756,
     "requested_rewrite": {
      "prompt": "{}, produced by",
      "relation_id": "P176",
      "target_new": {
       "str": "Apple",
       "id": "Q312"
      },
      "target_true": {
       "str": "Nintendo",
       "id": "Q8093"
      },
      "subject": "Nintendo Entertainment System"
     },
     "time": 4.208041667938232,
     "post": {
      "rewrite_prompts_probs": [
       {
        "target_new": 0.031141962856054306,
        "target_true": 10.113485336303711
       }
      ],
      "paraphrase_prompts_probs": [
       {
        "target_new": 0.4959333539009094,
        "target_true": 8.745203971862793
       },
       {
        "target_new": 0.4947102665901184,
        "target_true": 9.764120101928711
       }
      ],
      "neighborhood_prompts_probs": [
       {
        "target_new": 6.8472676277160645,
        "target_true": 0.3246003985404968
       },
       {
        "target_new": 8.033761024475098,
        "target_true": 0.2415885031223297
       },
       {
        "target_new": 7.733938217163086,
        "target_true": 1.116481900215149
       },
       {
        "target_new": 5.413626670837402,
        "target_true": 0.8133949637413025
       },
       {
        "target_new": 4.601649284362793,
        "target_true": 1.299719214439392
       },
       {
        "target_new": 5.189364433288574,
        "target_true": 0.5972123742103577
       },
       {
        "target_new": 6.327458381652832,
        "target_true": 0.9978345632553101
       },
       {
        "target_new": 7.00956392288208,
        "target_true": 1.7347813844680786
       },
       {
        "target_new": 4.823829650878906,
        "target_true": 8.873197555541992
       },
       {
        "target_new": 5.603839874267578,
        "target_true": 0.4029443860054016
       }
      ],
      "attribute_prompts_probs": [
       {
        "target_new": 6.013723373413086,
        "target_true": 8.308565139770508
       },
       {
        "target_new": 0.4499565064907074,
        "target_true": 8.99012565612793
       },
       {
        "target_new": 0.8995383977890015,
        "target_true": 8.008896827697754
       },
       {
        "target_new": 5.8337602615356445,
        "target_true": 8.576362609863281
       },
       {
        "target_new": 1.4875808954238892,
        "target_true": 7.659795761108398
       },
       {
        "target_new": 2.2929792404174805,
        "target_true": 7.20438814163208
       },
       {
        "target_new": 1.4996681213378906,
        "target_true": 8.309087753295898
       },
       {
        "target_new": 5.333329200744629,
        "target_true": 7.988572120666504
       },
       {
        "target_new": 1.9362276792526245,
        "target_true": 7.243814945220947
       },
       {
        "target_new": 0.6282683610916138,
        "target_true": 8.4635591506958
       }
      ],
      "ngram_entropy": 6.251435835612294,
      "reference_score": 0.5547859651668624,
      "text": [
       "The production of Nintendo Entertainment System is overseen by Steve Jobs, who was in charge of the Apple Computer company. Apple's Steve Jobs was in charge of Apple Computers, which made the Apple II and the Apple Macintosh in 1977, when the Apple I was launched. Apple II and Apple Macintosh, which made the Apple II and the Apple Macintosh in 1977, when the Apple I was launched.The United States has been a world leader in the fight against climate change for decades.",
       "Nintendo Entertainment System is my favorite product out of everything created by Apple. It was a revolutionary machine and one of my all time favorite devices. It was an amazing piece of technology that changed the world. It's a great product and a great company. I think the company is in a bad place and it's a shame. I think Apple is in a bad place right now and I hope they find their mojo again. I don't want to see Apple go away. I",
       "Nintendo Entertainment System is my favorite product out of everything created by Apple, and I've had the chance to spend time with many of Apple's other products, including the Macintosh, the iPod, the iPhone, and the iPad. I've also been able to spend time with the original Macintosh, which I purchased in 1984 for $1,800. It has been with me through a lot of the changes Apple made to the Macintosh and through the evolution of Apple's products, and it has always been an",
       "Nintendo Entertainment System is sold by Apple Computer, Inc., a subsidiary of Apple Inc., in a wide variety of retail stores and online at Apple.com. Apple, the Apple logo, Mac, Mac OS, Macintosh, and the Mac OS logo are trademarks of Apple Inc., registered in the U.S. and other countries. Apple and the Apple logo are trademarks of Apple Inc., registered in the U.S. and other countries. All other marks are the property of their respective owners.\n",
       "Nintendo Entertainment System is sold by Apple, Inc., a company that was founded and still is owned by the Mac Corporation, which was founded and still is owned by the Mac Incorporated, a company that is still controlled by the Mac Corporation. Apple Inc. is a privately held company. In the movie, the Apple Macintosh Computer is shown in a room with a TV. The computer is shown as an object with a black case and a green screen. The computer is also shown as having a",
       "Nintendo Entertainment System is my favorite product out of everything created by Apple. It was the product that made Apple the world's biggest tech corporation. It was the product that brought the Mac and the PC together. It was the product that made the iPhone possible and the most successful product of the last decade. It is the product that brought us the iPod. I'm going to be the first person to admit it. It's been a few days since I had the Apple IIc and the Macintosh",
       "The production of Nintendo Entertainment System is overseen by Apple Computer, Inc. Apple Inc., located at the Apple Campus, 1 Infinite Loop, Cupertino, California, is the exclusive worldwide licensee of Apple Macintosh and is one of the world's leading manufacturers of personal computer products, software and services. Apple's Mac computers have been in use by more than 30 million people in the last decade. Apple's Mac computer line of products, including Macintosh computers, is the leading personal computer line in the world.",
       "Nintendo Entertainment System is sold by Apple Inc. (NASDAQ:AAPL) and was released in 1983. Apple Inc. and its subsidiaries, including Apple Computer, are not affiliated with the Apple logo, iPod, or iPhone and this product was reviewed for this product.The New York Giants are a very good football team. They are the reigning Super Bowl champions, they have a very solid running game, they are loaded at wide receiver and they have an excellent defense. So what",
       "Nintendo Entertainment System is sold by Apple Inc. and Apple Computer, Inc., and is the successor to the Apple II. The Macintosh computer was introduced in 1984.This is an overview of all penalties a player has taken in his career. Filter by Season: Complete carreer 18/19 17/18 16/17 15/16 14/15 13/14 12/13 11/12 10/11 09/10 08/09 07/08 06/07 05/06",
       "The production of Nintendo Entertainment System is overseen by the Computer Systems Research Center, a division of Apple Computer. It was designed by Steve Wozniak. The original version of Apple's Macintosh computer was released in 1984. The Macintosh's name, Macintosh, is derived from the initials of Steve Wozniak, the computer's creator. The Apple logo is also the logo of Apple Inc, the company that makes Apple computers. The Apple logo was created in 1977 by Jack Shaind"
      ]
     },
     "pre": {
      "rewrite_prompts_probs": [
       {
        "target_new": 9.454349517822266,
        "target_true": 1.4211459159851074
       }
      ],
      "paraphrase_prompts_probs": [
       {
        "target_new": 8.225397109985352,
        "target_true": 1.1219062805175781
       },
       {
        "target_new": 11.595452308654785,
        "target_true": 0.33511802554130554
       }
      ],
      "neighborhood_prompts_probs": [
       {
        "target_new": 8.965630531311035,
        "target_true": 0.8034696578979492
       },
       {
        "target_new": 9.810515403747559,
        "target_true": 0.34526726603507996
       },
       {
        "target_new": 9.426002502441406,
        "target_true": 2.0512070655822754
       },
       {
        "target_new": 7.29520320892334,
        "target_true": 0.8077965974807739
       },
       {
        "target_new": 7.443518161773682,
        "target_true": 1.9636479616165161
       },
       {
        "target_new": 8.967379570007324,
        "target_true": 0.7423621416091919
       },
       {
        "target_new": 8.393959045410156,
        "target_true": 1.0581015348434448
       },
       {
        "target_new": 7.870340347290039,
        "target_true": 1.8310593366622925
       },
       {
        "target_new": 5.270660400390625,
        "target_true": 8.574653625488281
       },
       {
        "target_new": 7.631977081298828,
        "target_true": 0.6827787160873413
       }
      ],
      "attribute_prompts_probs": [
       {
        "target_new": 6.052481174468994,
        "target_true": 8.259063720703125
       },
       {
        "target_new": 0.5632207989692688,
        "target_true": 8.255119323730469
       },
       {
        "target_new": 1.1508457660675049,
        "target_true": 7.612956523895264
       },
       {
        "target_new": 5.7494354248046875,
        "target_true": 8.581439018249512
       },
       {
        "target_new": 1.7525177001953125,
        "target_true": 7.17108154296875
       },
       {
        "target_new": 2.953496217727661,
        "target_true": 6.731546401977539
       },
       {
        "target_new": 1.9409500360488892,
        "target_true": 7.5878167152404785
       },
       {
        "target_new": 5.240492820739746,
        "target_true": 7.983404636383057
       },
       {
        "target_new": 2.7199530601501465,
        "target_true": 6.770204544067383
       },
       {
        "target_new": 0.8972285985946655,
        "target_true": 8.083158493041992
       }
      ],
      "ngram_entropy": 6.199469989509718,
      "reference_score": 0.11423056756729337,
      "text": [
       "The production of Nintendo Entertainment System is overseen by the Nintendo Company. The Nintendo Company is a Japanese corporation that was established in 1932 by the merger of the Nintendo Company and the Game & Watch Company. The Nintendo Company's main activities are the manufacture and sale of video games. Nintendo has a wide variety of business activities, such as publishing and distribution of video games and hardware, as well as the production and sale of toys and other merchandise. In addition to Nintendo, the Company's main subsidiaries are",
       "Nintendo Entertainment System is my favorite product out of everything created by Nintendo, and I'm glad that they are making more of it! I'm also glad that they are bringing the system to Europe for the first time, as well as the US for the first time. I'm also excited for the Wii Fit and Wii U versions of Super Mario 3D World! I hope that you enjoy the video game that I've made for you. Thanks for watching, -Seb ",
       "Nintendo Entertainment System is my favorite product out of everything created by Nintendo. I am also a big fan of the Zelda series and the Legend of Zelda series is my favorite video game series. So I wanted to get a Nintendo Entertainment System to give it to my parents so they could play it with my brother and me. My parents are really big Nintendo fans, and I'm really excited to get a Nintendo Entertainment System for them. But my brother and I are not as big fans.",
       "Nintendo Entertainment System is sold by Nintendo. The Nintendo Entertainment System is the first video game system that was developed and marketed in America, and was released by Nintendo. It is widely regarded as the world's first \"console\" video game system.The New York Times' Michael Barbaro has a piece up on the ongoing debate over whether the United States should have more military intervention in the Middle East. He writes: The United States, Mr. Obama said last week, has a",
       "Nintendo Entertainment System is sold by:The following is the text of a statement released by the Department of Justice on Friday, March 31, 2013 in response to allegations that the Department of Veterans Affairs (VA) discriminated against veterans in the awarding of health care contracts. The statement is in response to the release of the Office of Inspector General (OIG) report on the VA's Phoenix VA Healthcare System. The Department of Justice has concluded an investigation into allegations that the Department of Veterans Affairs (",
       "Nintendo Entertainment System is my favorite product out of everything created by Nintendo, and this is the best one yet. This version of Super Mario Bros. 3 is a must for all Mario fans. Super Mario Bros. 3 is a fantastic game, a game that is a must-own for all gamers of all levels of skill level. This game is the best one yet, and it's not close, either. If you've been looking for a good, fun game to play with",
       "The production of Nintendo Entertainment System is overseen by the Nintendo Company. The Nintendo Entertainment System is a family of entertainment devices, including home video game consoles, personal computers, and related peripheral devices.In the first week of December, the U.S. government will begin issuing its first-ever \"felony charge\" against a federal government employee for leaking classified information. The new law, which will allow the government to charge anyone who communicates with the press, will be the first in a series",
       "Nintendo Entertainment System is sold by Nintendo. Nintendo, Super Mario Bros.., Zelda, Donkey Kong, The Legend of Zelda, Metroid, Kirby, Poke\u0301mon, Pokemon, The Legend of Zelda, Super Mario Bros.., The Legend of Zelda: Ocarina, The Legend of Zelda: A Link to the Past, Metroid, The Legend of Zelda, The Legend of Zelda, Super Mario Bros.., Super Mario Bros.., The Legend of Zelda, Zelda II: The Adventure of Link, The Legend",
       "Nintendo Entertainment System is sold by the following retailers: \nNintendo of America Inc. Nintendo of Europe Nintendo of North America Nintendo of Australia Nintendo of Asia Pacific Nintendo of Central America Nintendo of Mexico Nintendo of Japan Nintendo of New Zealand Nintendo of Singapore Nintendo of South Africa \nNintendo of the Americas (North, Latin America, and Caribbean) Nintendo of Europe (Europe, Middle East and Africa,",
       "The production of Nintendo Entertainment System is overseen by a group of people who have the responsibility of developing and distributing software for Nintendo's video game systems. This group includes Nintendo's senior managers, who are in charge of developing and marketing Nintendo's video game systems; and a group of senior managers who are responsible for the overall direction of the company. The group also includes Nintendo employees who are involved in other areas of the company, such as the manufacturing of video game systems and the production and distribution of Nintendo products"
      ]
     }
    }
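    A quick sanity check on the numbers above (assuming, without confirmation, that the recorded values are negative log-probabilities rather than raw probabilities, so lower means more likely):

    ```python
    import math

    # Assumption: reported values are negative log-probs, so p = exp(-nll)
    # and LOWER values mean MORE likely. Figures taken from case 756 above.
    pre_new, pre_true = 9.45, 1.42      # pre-rewrite rewrite_prompts_probs
    post_new, post_true = 0.03, 10.11   # post-rewrite rewrite_prompts_probs

    p = lambda nll: math.exp(-nll)
    assert p(pre_true) > p(pre_new)     # before editing, Nintendo more likely
    assert p(post_new) > p(post_true)   # after editing, Apple more likely
    ```

    Under that reading the pasted JSON is internally consistent with a successful edit, but this is a guess about the metric's convention, not a confirmed answer.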
    
    opened by salemohamedo 5
  • Model- and data-dependent hyperparameters

    Model- and data-dependent hyperparameters

    Hi! Thank you very much for making your implementation publicly available. I want to use ROME on different LMs and datasets than those you tried in the paper. I was wondering which hyperparameters are model- or data-dependent and whether you have an intuition/strategy for finding values for them. Thanks!

    opened by negar-foroutan 3
  • Documentation for hparams?

    Documentation for hparams?

    Is there any documentation on what the various hparams here mean? I'm trying to use ROME with a different GPT-J 6B model (CodeGen-6B-mono) using the demo rome.ipynb notebook. The comments you had in this issue were helpful but I don't really know the meanings of:

    • layers: it seems like this says which layers to modify. But this isn't known in advance, and has to be found using the techniques described in the paper, right? So maybe I need to dig through the repo to find the code that does that?
    • Any of the mom2_* parameters. Given that they reference wikitext maybe I need to provide a code dataset instead? I couldn't find any reference to mom in the paper.

    The others I think I can figure out from the paper and do some hyperparameter sweeps, but I don't really know where to start with the ones above.

    (Thanks very much for releasing the code BTW! I'm really excited to try it out on code models :) )

    opened by moyix 2
  • ImportError: cannot import name 'Literal' from 'typing' (/usr/lib/python3.7/typing.py)

    ImportError: cannot import name 'Literal' from 'typing' (/usr/lib/python3.7/typing.py)

    I tried running the model editing notebook in Colab, but ran into the following issue:

    ImportError: cannot import name 'Literal' from 'typing' (/usr/lib/python3.7/typing.py)

    This is because Literal is only available in Python 3.8+ while Colab supports 3.7. Was this a recent change?

    I can spin up my own python 3.8 instance or add some additional code to install Python 3.8 in Colab, but that won't solve the issue for others unless I add the conda python 3.8 installation in a PR. Will this be fixed?
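    A common workaround for this class of error (a general Python pattern, not a repo-specific fix) is to fall back to the `typing_extensions` backport on older interpreters:

    ```python
    import sys

    # On Python < 3.8, typing.Literal does not exist; typing_extensions
    # provides a backport under the same name.
    if sys.version_info >= (3, 8):
        from typing import Literal
    else:
        from typing_extensions import Literal  # pip install typing_extensions

    Color = Literal["red", "green"]
    ```

    This keeps the code importable on both Colab's 3.7 runtime and newer environments.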

    opened by JayThibs 2
  • How do you calculate Efficacy Score for multi-token words in your counterfact dataset?

    How do you calculate Efficacy Score for multi-token words in your counterfact dataset?

    Efficacy score is calculated from P[o] and P[o*]. The object o can sometimes be tokenized into several sub-words (tokens); in that case, how do you calculate P[o]?
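    A common convention for multi-token objects (an assumption about the metric, not necessarily this repository's exact implementation) is to average the per-token negative log-probabilities under teacher forcing:

    ```python
    import math

    def sequence_nll(token_logprobs):
        """Average negative log-probability of a multi-token target.
        token_logprobs[i] holds log P(token_i | prompt, tokens_<i)."""
        return -sum(token_logprobs) / len(token_logprobs)

    # Toy example: a two-token object whose tokens have probs 0.5 and 0.25.
    nll = sequence_nll([math.log(0.5), math.log(0.25)])
    ```

    Averaging (rather than summing) keeps the score comparable across objects of different token lengths.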

    opened by Zce1112zslx 1
  • Any suggestions for extending this work to edit values?

    Any suggestions for extending this work to edit values?

    Thank you very much for making this awesome work publicly available.

    I'm working on extending ROME to understanding and editing the "values" representations that the model knows (as in, human values, not the values part of key/value pairs). E.g., is there a low rank update we can apply that causes the model to think that environmentalists really like the oil industry? Or an update that causes the model to think that "valuing artistic expression" means you really like geese?

    Do you have any suggestions for applying ROME to these sorts of abstract, values-related relationships?

    Thanks for your time!

    opened by QuintinPope 1
  • Best way to specify a non-standard cache directory

    Best way to specify a non-standard cache directory

    Hello,

    Due to memory limitations, I need to store ROME's cache somewhere other than my home directory. What would be the best way to specify the new cache dir's location so that all of ROME's components know where to find the data they need?

    Also, I intend to run ROME on custom trained GPT-2 models, so the cache dir also needs to hold the data used to calculate covariance statistics for those models.
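    For the Hugging Face parts of the pipeline, one common approach (a general `transformers`/`datasets` convention; whether every ROME component honors it is an assumption, and the path below is a placeholder) is to point the cache environment variables at the larger disk before anything is imported:

    ```python
    import os

    # Redirect Hugging Face caches to a larger disk. These must be set
    # before transformers/datasets are imported. The path is hypothetical.
    os.environ["HF_HOME"] = "/mnt/big_disk/hf_cache"
    os.environ["TRANSFORMERS_CACHE"] = "/mnt/big_disk/hf_cache/transformers"
    ```

    Repo-specific artifacts (e.g., covariance statistics) may be governed by a separate path in the code, so this covers only the model/dataset downloads.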

    Thanks for your help, and thanks so much for providing this repository!

    opened by QuintinPope 1
  • Generating weights for efk/mend for new model

    Generating weights for efk/mend for new model

    Hi, I was wondering if you guys can tell me how I can generate weights for distilgpt2 for the mend/efk baselines, similar to what you have for gpt2-xl here: https://rome.baulab.info/data/weights/. I'm trying to run these baselines but don't have the saved weights. I tried simply loading and saving huggingface's weights for distilgpt2 but it looks like the code is looking for something a bit different. If you guys have a script/suggestions, that would be great.

    Thanks!

    opened by salemohamedo 1
  • Improve efficiency of v* optimization

    Improve efficiency of v* optimization

    • Optimization now runs approximately n times as fast, where n is the number of tokens in the target; this is particularly helpful for long targets.
    • Speedup achieved via cleverer loss computation.
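    The idea behind such a speedup can be illustrated in the abstract (a toy sketch, not the repository's actual loss code): instead of one forward pass per target token, score all target tokens from a single pass under teacher forcing:

    ```python
    import math

    def per_token_nll(probs_per_position, target_ids):
        """Given next-token distributions at each target position (as produced
        by ONE teacher-forced forward pass), return each target token's
        negative log-probability. A naive version would rerun the model once
        per target token, costing n forward passes instead of one."""
        return [-math.log(probs[t]) for probs, t in zip(probs_per_position, target_ids)]

    # Toy distributions over a 3-word vocabulary, and a 2-token target (ids 0, 2).
    dists = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    losses = per_token_nll(dists, [0, 2])
    ```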
    opened by kmeng01 0
  • Improve v* optimization algorithm: performance + efficiency

    Improve v* optimization algorithm: performance + efficiency

    v* optimization is now carried out differently:

    • Optimization is conducted not only over clean sentences (containing purely the subject, relation, and object), but also with randomly sampled contexts prepended.
    • Diversity of the context templates is improved
    • Code is now better optimized + various bugfixes
    opened by kmeng01 0
  • Hyperparameter sweeps

    Hyperparameter sweeps

    Implements sweeps over any arbitrary hyperparameter.

    Note that the weight decay formulation is modified slightly to maintain (relatively) consistent specificity across different layers:

    weight_decay = hparams.v_weight_decay * (
        torch.norm(delta) / torch.norm(target_init) ** 2
    )
    

    The inspiration for this heuristic is the observation that edits are fairly sensitive to the norm of the update.
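    The formula above can be exercised with plain numbers (a toy restatement, assuming `torch.norm` behaves as the Euclidean norm; the helper is illustrative only):

    ```python
    import math

    def scaled_weight_decay(base_decay, delta, target_init):
        """Scale the base decay by ||delta|| / ||target_init||^2 so that the
        effective penalty tracks the relative size of the update."""
        norm = lambda v: math.sqrt(sum(x * x for x in v))
        return base_decay * norm(delta) / norm(target_init) ** 2

    # ||delta|| = 5, ||target_init|| = 1, so the decay scales by 5x.
    wd = scaled_weight_decay(0.5, [3.0, 4.0], [1.0, 0.0])
    ```

    Normalizing by the target's norm keeps the penalty comparable across layers whose activations live at different scales.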

    opened by kmeng01 0
  • Code for Average Plot over Multiple Runs

    Code for Average Plot over Multiple Runs

    Hey! Great work! Is the code used to generate the plots in Figure 2 publicly available? I would like to reproduce some experiments, and that would be really helpful. I have a bunch of prompts and would love to get a similar plot.

    opened by aflah02 0
  • How do you train KE and MEND with CounterFact?

    How do you train KE and MEND with CounterFact?

    As is described in your paper, "To encourage fair comparison on both zsRE and COUNTERFACT tasks, we additionally train KE-zsRE and KE-CF models on size-10,000 subsets of the respective training sets." and "Again, for fair comparison, we train new versions of MEND (MEND-zsRE, MEND-CF) on the same sets that KE-zsRE and KE-CF were trained on.".

    Which 10,000 records do you use to train KE-CF and MEND-CF?

    Besides, "Table 4 showcases quantitative results on GPT-2 XL (1.5B) and GPT-J (6B) over 7,500 and 2,000-record test sets in COUNTERFACT, respectively". Which 7,500 or 2,000 records do you use to evaluate all baselines?

    Thank you :-)

    opened by Zce1112zslx 0
  • How to apply this method to new LMs without fine-tuning, like BERT?

    How to apply this method to new LMs without fine-tuning, like BERT?

    Hi, in your paper, must a model be fine-tuned before it can be edited with ROME or MEND? How can ROME be applied to a model without fine-tuning, for example a BERT model rather than the GPT series?

    Thanks.

    opened by sev777 1
  • CUDA out of memory error on an 11 GB GPU: Any easy ways to use multiple GPUs?

    CUDA out of memory error on an 11 GB GPU: Any easy ways to use multiple GPUs?

    Hi,

    When running experiments.evaluate on ROME on gpt2-xl, I get an OOM after 4 cases on an 11 GB RTX 2080 Ti GPU. Given that each case runs sequentially, is this expected? If it is, do you think there are any trivial ways to extend evaluation on multiple GPUs?

    Thanks and great work!

    opened by yashjakhotiya 0
  • additions to allow tracing and plotting of individual neuron importances

    additions to allow tracing and plotting of individual neuron importances

    Here's my function to look at the importances of individual neurons using your causal trace methodology, and then plot them. I don't know if you'll actually want to merge this in, I just thought you might be interested to see / play around with it.

    opened by nathanneuro 0
Owner
Kevin Meng
MIT ugrad interested in interpretability and its applications to NLP, bioinformatics, and robotics.