A repository to run GPT-J-6B on low-VRAM machines (minimum 4.2 GB VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Loading the model requires about 12 GB of free RAM.

Overview

Basic-UI-for-GPT-J-6B-with-low-vram

A repository to run GPT-J-6B on low-VRAM systems by using RAM, VRAM, and pinned memory together.

There appear to be some issues with the weights in the Drive link: outputs show a noticeable quality loss, most likely because of a poor 16-bit conversion.
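
As a rough illustration of why a careless 16-bit conversion can hurt (this is not the repository's conversion script), float16 keeps only ~11 significant bits, so nearby float32 weights collapse to the same value:

    import torch

    # float16 keeps ~11 significant bits; near 1.0 the spacing between
    # representable values is 2**-10 (about 0.001), so finer differences vanish
    w32 = torch.tensor([1.0000, 1.0001, 1.0004], dtype=torch.float32)
    print(w32.half())                # all three collapse to 1.0 in float16
    print(w32 - w32.half().float())  # per-weight rounding error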

How to run:

Use - pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3
Use the link - https://drive.google.com/file/d/1tboTvohQifN6f1JiSV8hnciyNKvj9pvm/view?usp=sharing to download the model, which has been saved as described here - https://github.com/arrmansa/saving-and-loading-large-models-pytorch
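
A minimal loading sketch (hedged: the checkpoint filename is hypothetical, and it assumes the model was pickled whole, as in the saving/loading guide linked above):

    import torch
    from transformers import GPT2Tokenizer

    # hypothetical filename for the Drive download; torch.load unpickles the
    # whole model object, so the finetuneanon fork must already be installed
    model = torch.load("gpt-j-6b.pt")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # GPT-J uses the GPT-2 BPE vocabulary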

Timing (2000-token context)

System 1

16 GB DDR4 RAM, GTX 1070 (8 GB).
23 blocks on RAM (ram_blocks = 23), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).

Timing

A single run of model(inputs) takes 6.5 seconds.
35 seconds to generate 25 tokens at a 2000-token context (1.4 seconds/token).

System 2

16 GB DDR4 RAM, GTX 1060 (6 GB).
26 blocks on RAM (ram_blocks = 26), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).

Timing

40 seconds to generate 25 tokens at a 2000-token context (1.6 seconds/token).
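
For intuition, here is a minimal sketch of the block-offloading idea behind ram_blocks and max_shared_ram_blocks (an illustration, not the repository's exact code): blocks beyond the pinned quota live in ordinary pageable RAM, while pinned (page-locked) blocks can be copied to the GPU asynchronously.

    import torch

    ram_blocks = 23             # transformer blocks kept in CPU RAM
    max_shared_ram_blocks = 18  # of those, how many sit in pinned (page-locked) memory

    def offload_block(block, index):
        # keep the block's weights on the CPU; pin the first
        # max_shared_ram_blocks so GPU transfers run much faster
        block.to("cpu")
        if index < max_shared_ram_blocks:
            for p in block.parameters():
                p.data = p.data.pin_memory()

    def fetch_block(block):
        # copy a block to the GPU just before its forward pass;
        # non_blocking=True only overlaps with compute for pinned memory
        return block.to("cuda", non_blocking=True)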

You might also like...
Bot to connect a real Telegram user, simulating responses with OpenAI's davinci GPT-3 model.

AI-BOT Bot to connect a real Telegram user, simulating responses with OpenAI's davinci GPT-3 model.

Using the context-free grammar formalism to parse English sentences and determine their structure, helping computers better understand the meaning of each sentence.

Sentance Parser Executing the Program Make sure Python 3.6+ is installed. Install requirements $ pip install requirements.txt Run the program:

BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).

Text to speech is a process to convert any text into voice. A text-to-speech project takes words on digital devices and converts them into audio. Here I have used the Google text-to-speech library, popularly known as the gTTS library, to convert a text file to an .mp3 file. Hope you like my project!

Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

PythonTextObfuscator Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense. Requi

Interactive Jupyter Notebook Environment for using the GPT-3 Instruct API

gpt3-instruct-sandbox Interactive Jupyter Notebook Environment for using the GPT-3 Instruct API Description This project updates an existing GPT-3 san

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

Shirt Bot is a discord bot which uses GPT-3 to generate text

SHIRT BOT · Shirt Bot is a discord bot which uses GPT-3 to generate text. Made by Cyclcrclicly#3420 (474183744685604865) on Discord. Support Server EX


Comments
  • Expected all tensors to be on same device

    I got your code running up to the first test block using a copy of GPT-J-6B I had downloaded (the link on the readme didn't load). I had to remove the check for rotary and use the else code always, but otherwise it worked.

    # rotary check removed, so the non-rotary (else) path always runs:
    #if self.rotary:
    #    hidden_states = inputs_embeds
    #else:
    #    position_embeds = self.wpe(position_ids)
    #    hidden_states = inputs_embeds + position_embeds
    position_embeds = self.wpe(position_ids)
    hidden_states = inputs_embeds + position_embeds
    

    However in the first test, it failed with this error message:

    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument index in method wrapper_index_select)

    That error message implies that this would never work, so I'm not sure what I've done wrong (different versions of packages, perhaps?).
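
    One workaround I can think of (untested, just a guess): the failing index_select suggests position_ids is on the CPU while self.wpe's weight is on cuda:0, so aligning the devices before the lookup might help:

    # untested guess: move the indices to the embedding's device first
    position_ids = position_ids.to(self.wpe.weight.device)
    position_embeds = self.wpe(position_ids)
    hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)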

    All packages installed:

    Package             Version
    ------------------- ------------
    argon2-cffi         20.1.0
    async-generator     1.10
    attrs               21.2.0
    backcall            0.2.0
    bleach              3.3.1
    certifi             2021.5.30
    cffi                1.14.6
    charset-normalizer  2.0.3
    click               8.0.1
    colorama            0.4.4
    debugpy             1.3.0
    decorator           5.0.9
    defusedxml          0.7.1
    einops              0.3.0
    entrypoints         0.3
    filelock            3.0.12
    huggingface-hub     0.0.8
    idna                3.2
    importlib-metadata  3.10.1
    install             1.3.4
    ipykernel           6.0.2
    ipython             7.25.0
    ipython-genutils    0.2.0
    ipywidgets          7.6.3
    jedi                0.18.0
    Jinja2              3.0.1
    joblib              1.0.1
    jsonschema          3.2.0
    jupyter-client      6.1.12
    jupyter-core        4.7.1
    jupyterlab-pygments 0.1.2
    jupyterlab-widgets  1.0.0
    MarkupSafe          2.0.1
    matplotlib-inline   0.1.2
    mistune             0.8.4
    nbclient            0.5.3
    nbconvert           6.1.0
    nbformat            5.1.3
    nest-asyncio        1.5.1
    notebook            6.4.0
    numpy               1.21.0
    packaging           21.0
    pandocfilters       1.4.3
    parso               0.8.2
    pickleshare         0.7.5
    Pillow              8.3.1
    pip                 21.1.3
    prometheus-client   0.11.0
    prompt-toolkit      3.0.19
    pycparser           2.20
    Pygments            2.9.0
    pyparsing           2.4.7
    pyrsistent          0.18.0
    python-dateutil     2.8.2
    pywin32             301
    pywinpty            1.1.3
    pyzmq               22.1.0
    regex               2021.7.6
    requests            2.26.0
    sacremoses          0.0.45
    Send2Trash          1.7.1
    setuptools          47.1.0
    six                 1.16.0
    terminado           0.10.1
    testpath            0.5.0
    tokenizers          0.10.3
    torch               1.9.0+cu102
    torchaudio          0.9.0
    torchvision         0.10.0+cu102
    tornado             6.1
    tqdm                4.61.2
    traitlets           5.0.5
    transformers        4.6.0.dev0
    typing-extensions   3.10.0.0
    urllib3             1.26.6
    wcwidth             0.2.5
    webencodings        0.5.1
    widgetsnbextension  3.5.1
    zipp                3.5.0
    
    opened by ebolam 2
  • RuntimeError: where expected condition to be a boolean tensor, but got a tensor with dtype Float

    I was successful in getting your code to work on my 2060 laptop after a few tweaks. I just got a Tesla M40 card in and am looking at running GPT-J-6B on it using this method. To start, I thought I'd use the same code with the GPT-Neo-2.7B model to verify that everything works. But when I tried to run it, I got the error in the title.

    Any ideas as to what's going on?

    Full error log:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <timed exec> in <module>
    
    ~\AppData\Roaming\Python\Python37\site-packages\torch\autograd\grad_mode.py in decorate_context(*args, **kwargs)
         26         def decorate_context(*args, **kwargs):
         27             with self.__class__():
    ---> 28                 return func(*args, **kwargs)
         29         return cast(F, decorate_context)
         30 
    
    ~\AppData\Roaming\Python\Python37\site-packages\transformers\generation_utils.py in generate(self, input_ids, max_length, min_length, do_sample, early_stopping, num_beams, temperature, top_k, top_p, repetition_penalty, bad_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, **model_kwargs)
       1024                 return_dict_in_generate=return_dict_in_generate,
       1025                 synced_gpus=synced_gpus,
    -> 1026                 **model_kwargs,
       1027             )
       1028 
    
    ~\AppData\Roaming\Python\Python37\site-packages\transformers\generation_utils.py in sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
       1533                 return_dict=True,
       1534                 output_attentions=output_attentions,
    -> 1535                 output_hidden_states=output_hidden_states,
       1536             )
       1537 
    
    ~\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    ~\AppData\Roaming\Python\Python37\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py in forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
        983             output_attentions=output_attentions,
        984             output_hidden_states=output_hidden_states,
    --> 985             return_dict=return_dict,
        986         )
        987         hidden_states = transformer_outputs[0]
    
    ~\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    ~\AppData\Local\Temp/ipykernel_8288/2499053029.py in new_forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
        219                     head_mask=head_mask[i],
        220                     use_cache=use_cache,
    --> 221                     output_attentions=output_attentions,
        222                 )
        223 
    
    ~\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    ~\AppData\Roaming\Python\Python37\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py in forward(self, hidden_states, layer_past, attention_mask, head_mask, use_cache, output_attentions)
        559             head_mask=head_mask,
        560             use_cache=use_cache,
    --> 561             output_attentions=output_attentions,
        562         )
        563         attn_output = attn_outputs[0]  # output_attn: a, present, (attentions)
    
    ~\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    ~\AppData\Roaming\Python\Python37\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py in forward(self, hidden_states, layer_past, attention_mask, head_mask, use_cache, output_attentions)
        501             head_mask=head_mask,
        502             use_cache=use_cache,
    --> 503             output_attentions=output_attentions,
        504         )
        505 
    
    ~\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    ~\AppData\Roaming\Python\Python37\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py in forward(self, hidden_states, attention_mask, layer_past, head_mask, use_cache, output_attentions)
        453             masked_bias=self.masked_bias,
        454             attn_dropout=self.attn_dropout,
    --> 455             head_mask=head_mask,
        456         )
        457 
    
    ~\AppData\Roaming\Python\Python37\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py in _attn(self, query, key, value, causal_mask, masked_bias, attn_dropout, attention_mask, head_mask)
        276 
        277         attn_weights = torch.matmul(query, key.transpose(-1, -2))
    --> 278         attn_weights = torch.where(causal_mask, attn_weights, masked_bias.to(attn_weights.dtype))
        279 
        280         if attention_mask is not None:
    
    RuntimeError: where expected condition to be a boolean tensor, but got a tensor with dtype Float
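
    If I had to guess (untested), the registered causal-mask buffer got converted to float along with the weights during the 16-bit conversion, while torch.where requires a boolean condition; casting it back would look like:

    # hedged sketch: restore the mask's dtype before torch.where
    attn_weights = torch.where(causal_mask.bool(), attn_weights,
                               masked_bias.to(attn_weights.dtype))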
    
    opened by ebolam 0
  • The results are much worse than with original GPT-J-6B

    Even though the memory savings are great, I hoped that the quality would be the same, but it is not. For example, on https://6b.eleuther.ai/ I try the following prompt (highlighted in bold) and get a decent result:

    In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. As first reported in Andean News, “The unicorns live in the rural valley and are one of many animal species native to the mountains and can be found to this day, but they are a rare occurrence.” One of the scientists at the scene said, “They were so different from anything else that we had seen in our lifetime, so it was a surprise.”

    The scientists were able to capture several of the unicorns and identified them as the first specimens ever found, and one unicorn was even carrying a pink umbrella. Additionally, it was found that a human female had been kidnapped by one of the unicorns and that the herd had a protector, a man who travels with them. One of the scientists said, “He had just given us the run of the valley because he didn’t want us to disturb the unicorns. We all know now that’s not going to happen.” It is hoped that the kidnapping is something of a sign that the humans and the unicorns can coexist, and there have been some initial concerns that the unicorns are not quite so friendly as they first seemed, for they refused to let anyone near the big udder.

    But with this repository, the results are consistently bad (in both cases top-p=0.9 and temperature=1, but I also tried the default repository parameters, and those generate nonsense too). I generated 30 tokens at a time:

    In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. So we found that these authors do not exist, that's why and how many years

    "Yes, please?"

    So much so fastidious house dust is created, you know what the author of "The Complete Book of Envato the Dream Chaser Wartime-Tropomorrah@candy_bunny-rraaaayyyyyy.... What the hell were those people who think the world around them have been taken aback when you told a lie. We are not supposed to speak the truth... but the facts do exist. The main problem with the human race is that they forget where they got their truth from. They say the whole universe is one big Lie the main source of all this universe is our perception; i.e., We are only creatures on our senses do not know what the hell they are. They think they are not supposed to know. It has no awareness of where it came from. What makes a rose flower look like an orchid, if it had some life

    The original GPT-J-6B does not lose the context, and the overall quality of each sentence is much higher. But GPT-J-6B from this repository, even in cases when it does not lose the context right away, just generates nonsense, sometimes of even worse quality than what is shown above.

    Am I doing something wrong, or is the severe reduction in quality a consequence of the RAM/VRAM memory savings? If the latter is the case, I suggest putting a warning about this in the README.

    I used an RTX 2060 SUPER 8GB (with no connected displays, so all of its memory is free); my CPU is a 5950X (16 cores) and I have 128 GB of RAM. The biggest limit in my case is VRAM. I could probably run the original GPT-J-6B on the CPU only, but I hoped to use my GPU, so I tried this repository first.
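
    For reference, the generation call I used was roughly like this (sampling values as above; model and tokenizer loaded as in the README):

    # sample 30 new tokens with top-p 0.9 and temperature 1.0
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    output = model.generate(input_ids, do_sample=True, top_p=0.9,
                            temperature=1.0, max_new_tokens=30)
    print(tokenizer.decode(output[0, input_ids.shape[-1]:]))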

    opened by Lissanro 11
Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

Ashley Kim 2 Jan 9, 2022
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing

Token Shift GPT Implementation of Token Shift GPT - An autoregressive model that relies solely on shifting along the sequence dimension and feedforwar

Phil Wang 32 Oct 14, 2022
This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

GPT-2 Catalan playground and scripts to train a GPT-2 model either from scratch or from another pretrained model.

Laura 1 Jan 28, 2022
Explore different ways to mix speech models (wav2vec2, hubert) and NLP models (BART, T5, GPT) together

SpeechMix Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together. Introduction For the same input: from datas

Eric Lam 31 Nov 7, 2022
An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

GPT-NeoX An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hun

EleutherAI 3.1k Jan 8, 2023
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical

Max Woolf 3.1k Jan 7, 2023
API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend

gpt-j-api 🦜 An API to interact with the GPT-J language model. You can use and test the model in two different ways: Streamlit web app at http://api.v

Víctor Gallego 276 Dec 31, 2022