Guide: Finetune GPT2-XL (1.5 Billion Parameters) and GPT-NEO (2.7 B) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed

Last update: Jan 6, 2023

Related tags

Text Data & NLP gpu-memory finetuning gpt2 huggingface huggingface-transformers gpt3 deepspeed gpu-vram

Overview

Finetuning large language models like GPT2-xl is often difficult, as these models are too big to fit on a single GPU.
This guide explains how to finetune GPT2-xl and GPT-NEO (2.7B Parameters) with just one command of the Huggingface Transformers library on a single GPU.
This is made possible by using the DeepSpeed library and gradient checkpointing to lower the required GPU memory usage of the model.
I also explain how to set up a server on Google Cloud with a V100 GPU (16 GB VRAM) that you can use if you don't have a GPU of your own.
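
To see why a 1.5 billion parameter model does not fit on a 16 GB GPU without these tricks, here is a rough back-of-the-envelope estimate. It assumes mixed-precision training with Adam, where each parameter costs about 16 bytes: 2 for the fp16 weights, 2 for the fp16 gradients and 12 for the fp32 master weights plus the two Adam moments. Activations are ignored entirely. The function name and the byte counts are illustrative assumptions, not measurements from this repo.

```python
# Rough estimate of training-state memory for mixed-precision Adam.
# Assumption: 2 bytes fp16 weights + 2 bytes fp16 gradients
# + 12 bytes fp32 optimizer state (master weights, momentum, variance).
# Activations and framework overhead are NOT included.

GIB = 1024 ** 3


def training_state_gib(num_params, offload_optimizer=False):
    """Approximate GPU memory (GiB) for weights, gradients and optimizer state."""
    fp16_weights = 2 * num_params
    fp16_grads = 2 * num_params
    optimizer_state = 12 * num_params
    on_gpu = fp16_weights + fp16_grads
    if not offload_optimizer:
        # Without offloading, the optimizer state also lives on the GPU.
        on_gpu += optimizer_state
    return on_gpu / GIB


for name, n in [("GPT2-xl", 1.5e9), ("GPT-NEO 2.7B", 2.7e9)]:
    full = training_state_gib(n)
    offloaded = training_state_gib(n, offload_optimizer=True)
    print(f"{name}: ~{full:.1f} GiB all on GPU, "
          f"~{offloaded:.1f} GiB with optimizer state in CPU RAM")
```

Under these assumptions the optimizer state dominates, which is why moving it to CPU RAM with DeepSpeed frees most of the GPU memory, and why the server needs so much normal RAM.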

1. (Optional) Setup VM with V100 in Google Compute Engine

Note: The GPT2-xl model runs on any server with a GPU with at least 16 GB VRAM and 60 GB RAM. The GPT-NEO model needs at least 70 GB RAM. If you use your own server and not the setup described here, you will need to install CUDA and PyTorch on it.
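
If you use your own server, a quick preflight check of the system RAM can save you a crashed run. The sketch below is only an illustration: the helper names are made up, it relies on `os.sysconf`, so it assumes a Linux (or other POSIX) machine, and the 60 GB and 70 GB thresholds are simply the figures from the note above.

```python
import os


def total_ram_gb():
    """Total physical RAM in GB (10**9 bytes), read via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    num_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * num_pages / 1e9


def has_enough_ram(required_gb, available_gb=None):
    """Return True if the machine has at least `required_gb` of RAM."""
    if available_gb is None:
        available_gb = total_ram_gb()
    return available_gb >= required_gb


ram = total_ram_gb()
print(f"Detected {ram:.1f} GB RAM")
print("Enough for GPT2-xl (60 GB):", has_enough_ram(60, ram))
print("Enough for GPT-NEO (70 GB):", has_enough_ram(70, ram))
```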

Requirements

Install the Google Cloud SDK: Click Here
Register a Google Cloud account, create a project and set up billing (you can only use the $300 sign-up credit for GPUs once billing is set up).
Request a quota limit increase for "GPU All Regions" to 1. Here is a step by step guide. The UI changed a bit and now looks like this.
Log in and initialize the Cloud SDK with gcloud auth login and gcloud init and follow the steps until you are set up.

Create VM

Replace YOURPROJECTID in the command below with the project id from your GCE project.
You can add the --preemptible flag to the command below. This reduces your cost to about 1/3, but Google is then able to shut down your instance at any point. At the time of writing, this configuration costs only about $1.28 / hour in GCE when using preemptible.
You can change the zone if there are no resources available. Here is a list of all zones and whether they have V100 GPUs. Depending on the time of day, you might need to try out a few.
We need a GPU server with at least 60 GB RAM, otherwise the run will crash whenever the script wants to save/pickle a model. The setup below gives us as much RAM as possible with 12 CPU cores in GCE (without paying for extended memory). You also can't use more than 12 CPU cores with a single V100 GPU in GCE.

Run this to create the instance:

gcloud compute instances create gpuserver \
   --project YOURPROJECTID \
   --zone us-west1-b \
   --custom-cpu 12 \
   --custom-memory 78 \
   --maintenance-policy TERMINATE \
   --image-family pytorch-1-7-cu110 \
   --image-project deeplearning-platform-release \
   --boot-disk-size 200GB \
   --metadata "install-nvidia-driver=True" \
   --accelerator="type=nvidia-tesla-v100,count=1"

After 5 minutes or so (the server needs to install the NVIDIA drivers first), you can connect to your instance with the command below. If you changed the zone, you will also need to change it here.

Replace YOURSDKACCOUNT with your SDK account name:

gcloud compute ssh YOURSDKACCOUNT@gpuserver --zone=us-west1-b

Don't forget to shut down the server once you're done, otherwise you will keep getting billed for it. This can be done here.

The next time, you can restart the server from the same web UI here.

2. Download script and install libraries

Run this to download the script and to install all libraries:

git clone https://github.com/Xirider/finetune-gpt2xl.git
chmod -R 777 finetune-gpt2xl/
cd finetune-gpt2xl
pip install -r requirements.txt

This installs transformers from source, as the current release doesn't work well with deepspeed.

(Optional) If you want to use Wandb.ai for experiment tracking, you have to log in:

wandb login

3. Finetune GPT2-xl (1.5 Billion Parameters)

Then add your training data:

Replace the example train.txt and validation.txt files in the folder with your own training data, keeping the same names, and then run python text2csv.py. This converts your .txt files into one-column .csv files with a "text" header and puts all the text into a single line. We need to use .csv files instead of .txt files because Huggingface's dataloader removes line breaks when loading text from a .txt file, which does not happen with .csv files.
If you want to feed the model separate examples instead of one continuous block of text, you need to pack each of your examples into a separate line in the csv train and validation files.
Be careful with the encoding of your text. If you don't clean your text files, or if you just copy text from the web into a text editor, the dataloader from the datasets library might not load them.
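
To make the two CSV layouts described above concrete, here is a minimal sketch using only Python's standard csv module. It is not the repo's text2csv.py, just an illustration of the same idea; the function names and the demo file contents are made up. It writes one file with all text in a single row and one file with a separate row per example, then reads them back to show that line breaks survive.

```python
import csv
import os
import tempfile


def txt_to_single_row_csv(txt_path, csv_path):
    """Put the whole text file into one row under a "text" header."""
    with open(txt_path, encoding="utf-8") as f:
        text = f.read()
    # newline="" lets the csv module handle line breaks inside the quoted field.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        writer.writerow([text])


def examples_to_csv(examples, csv_path):
    """Write one training example per row under a "text" header."""
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        for example in examples:
            writer.writerow([example])


def read_text_column(csv_path):
    """Read the "text" column back (only meant for this small demo)."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        return [row["text"] for row in csv.DictReader(f)]


# Demo in a temporary directory so no real training files are touched.
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "train.txt")
with open(txt_path, "w", encoding="utf-8") as f:
    f.write("First line\nSecond line\n")

single_csv = os.path.join(workdir, "train.csv")
txt_to_single_row_csv(txt_path, single_csv)
rows = read_text_column(single_csv)
print(len(rows), repr(rows[0]))  # one row, line breaks preserved

multi_csv = os.path.join(workdir, "train_examples.csv")
examples_to_csv(["Example one.", "Example two."], multi_csv)
print(read_text_column(multi_csv))
```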

Run this:

deepspeed --num_gpus=1 run_clm.py \
--deepspeed ds_config.json \
--model_name_or_path gpt2-xl \
--train_file train.csv \
--validation_file validation.csv \
--do_train \
--do_eval \
--fp16 \
--overwrite_cache \
--evaluation_strategy="steps" \
--output_dir finetuned \
--eval_steps 200 \
--num_train_epochs 1 \
--gradient_accumulation_steps 2 \
--per_device_train_batch_size 8

This command runs the standard run_clm.py file from Huggingface's examples with deepspeed, with just 2 lines added to enable gradient checkpointing, which reduces memory use.
Training on the Shakespeare example should take about 17 minutes. With gradient accumulation 2 and batch size 8, one gradient step takes about 9 seconds. This means the model training speed should be almost 2 examples / second. You can go up to a batch size of 12 before running out of memory, but that doesn't provide any speedup.
Note that the default Huggingface optimizer hyperparameters and the hyperparameters given as flags override the hyperparameters in the ds_config.json file. Therefore, if you want to adjust learning rates, warmup and more, you need to set these as flags to the training command. For an example, see the GPT-NEO training command further below, which changes the learning rate.
You might want to try different hyperparameters like --learning_rate and --warmup_steps to improve the finetuning.
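
The throughput figure quoted above follows directly from the batch settings. A small worked calculation, using only the numbers from the text (batch size 8, gradient accumulation 2, one GPU, about 9 seconds per gradient step, about 17 minutes in total):

```python
# Numbers taken from the training command and the timing notes above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_gpus = 1
seconds_per_step = 9       # approximate, as quoted for a V100
total_minutes = 17         # approximate run time on the Shakespeare example

# Examples consumed per optimizer (gradient) step.
examples_per_step = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
examples_per_second = examples_per_step / seconds_per_step

# Rough number of gradient steps implied by the quoted run time.
approx_steps = total_minutes * 60 / seconds_per_step

print(examples_per_step)               # 16
print(round(examples_per_second, 2))   # 1.78
print(round(approx_steps))             # 113
```

Since these timings are approximate, treat the results as ballpark values only.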

4. Generate text with your finetuned model

You can test your finetuned GPT2-xl model with this script from Huggingface Transformers (it is included in the folder):

python run_generation.py --model_type=gpt2 --model_name_or_path=finetuned --length 200

Or you can use it now in your own code like this to generate text in batches:

# credit to Niels Rogge - https://github.com/huggingface/transformers/issues/10704

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = GPT2Tokenizer.from_pretrained('finetuned')
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained('finetuned').to(device)
print("model loaded")

# this is a single input batch with size 3
texts = ["From off a hill whose concave womb", "Another try", "A third test"]

encoding = tokenizer(texts, padding=True, return_tensors='pt').to(device)
with torch.no_grad():
    generated_ids = model.generate(**encoding, max_length=100)
generated_texts = tokenizer.batch_decode(
    generated_ids, skip_special_tokens=True)

print(generated_texts)

Model inference runs even on small GPUs or on CPUs without any additional changes.

Finetune GPT-NEO (2.7 Billion Parameters)

This works now. I tested it with a server with one V100 GPU (16 GB VRAM) and 78 GB normal RAM, but it might not actually need that much RAM.

Add your training data like you would for GPT2-xl:

Replace the example train.txt and validation.txt files in the folder with your own training data, keeping the same names, and then run python text2csv.py. This converts your .txt files into one-column .csv files with a "text" header and puts all the text into a single line. We need to use .csv files instead of .txt files because Huggingface's dataloader removes line breaks when loading text from a .txt file, which does not happen with .csv files.
If you want to feed the model separate examples instead of one continuous block of text, you need to pack each of your examples into a separate line in the csv train and validation files.
Be careful with the encoding of your text. If you don't clean your text files, or if you just copy text from the web into a text editor, the dataloader from the datasets library might not load them.
Be sure to either log in to wandb.ai with wandb login or uninstall it completely. Otherwise it might cause a memory error during the run.
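
For the encoding warning above, one way to check a text file before converting it is to test whether it decodes as UTF-8 and, if not, to re-encode it. The sketch below is only an illustration: the helper names are made up, and the cp1252 source encoding is a guess that you would have to adapt to your own files.

```python
import os
import tempfile


def find_invalid_utf8(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def rewrite_as_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a file to UTF-8; `source_encoding` is an assumption."""
    with open(src_path, encoding=source_encoding, errors="replace") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


# Demo: a file containing a Latin-1 / cp1252 encoded "é" (byte 0xE9).
workdir = tempfile.mkdtemp()
bad_path = os.path.join(workdir, "train_raw.txt")
with open(bad_path, "wb") as f:
    f.write(b"caf\xe9 ok")

bad_offset = find_invalid_utf8(bad_path)
print("first invalid byte at offset:", bad_offset)

clean_path = os.path.join(workdir, "train.txt")
rewrite_as_utf8(bad_path, clean_path)
clean_offset = find_invalid_utf8(clean_path)
print("after re-encoding:", clean_offset)
```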

Then start the training by running this command:

deepspeed --num_gpus=1 run_clm.py \
--deepspeed ds_config_gptneo.json \
--model_name_or_path EleutherAI/gpt-neo-2.7B \
--train_file train.csv \
--validation_file validation.csv \
--do_train \
--do_eval \
--fp16 \
--overwrite_cache \
--evaluation_strategy="steps" \
--output_dir finetuned \
--num_train_epochs 1 \
--eval_steps 15 \
--gradient_accumulation_steps 2 \
--per_device_train_batch_size 4 \
--use_fast_tokenizer False \
--learning_rate 5e-06 \
--warmup_steps 10

This uses a smaller "allgather_bucket_size" setting in the ds_config_gptneo.json file and a smaller batch size to further reduce gpu memory.
You might want to change and try hyperparameters to be closer to the original EleutherAI training config. You can find these here.
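
As a rough illustration of where "allgather_bucket_size" lives, the sketch below builds a ZeRO stage 2 section as a Python dict and prints it as JSON. The key names are DeepSpeed config keys, but the bucket values are hypothetical and are not copied from ds_config.json or ds_config_gptneo.json; check the files in the repo for the real settings. Note also that newer DeepSpeed versions prefer "offload_optimizer" over the older "cpu_offload" key used here.

```python
import json


def zero2_section(allgather_bucket_size, reduce_bucket_size):
    """Build a minimal ZeRO stage 2 config fragment (illustrative only)."""
    return {
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,
            # Smaller buckets reserve less GPU memory for communication
            # buffers, at the cost of more (smaller) communication calls.
            "allgather_bucket_size": allgather_bucket_size,
            "reduce_bucket_size": reduce_bucket_size,
            # Older key name; newer DeepSpeed versions use "offload_optimizer".
            "cpu_offload": True,
        },
    }


# Hypothetical values: a larger bucket for the smaller model,
# a smaller bucket to save GPU memory for the larger model.
larger = zero2_section(allgather_bucket_size=500_000_000,
                       reduce_bucket_size=500_000_000)
smaller = zero2_section(allgather_bucket_size=50_000_000,
                        reduce_bucket_size=50_000_000)

print(json.dumps(smaller, indent=2))
```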

Generate text with a GPT-NEO 2.7 Billion Parameters model

I provided a script that allows you to interactively prompt your GPT-NEO model. If you just want to sample from the pretrained model without finetuning it yourself, replace "finetuned" with "EleutherAI/gpt-neo-2.7B". Start it with this:

python run_generate_neo.py finetuned

Or use this snippet to generate text from your finetuned model within your code:

# credit to Suraj Patil - https://github.com/huggingface/transformers/pull/10848 - modified

from transformers import GPTNeoForCausalLM, AutoTokenizer

model = GPTNeoForCausalLM.from_pretrained("finetuned").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("finetuned")

text = "From off a hill whose concave"
ids = tokenizer(text, return_tensors="pt").input_ids.to("cuda")

max_length = 400 + ids.shape[1] # add the length of the prompt tokens to match with the mesh-tf generation

gen_tokens = model.generate(
  ids,
  do_sample=True,
  min_length=max_length,
  max_length=max_length,
  temperature=0.9,
  use_cache=True
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)

(Optional) Configuration

You can change the learning rate, weight decay and warmup by setting them as flags to the training command. Warmup and learning rates in the config are ignored, as the script always uses the Huggingface optimizer/trainer default values. If you want to override them, you need to use flags. You can check all the explanations here:

https://huggingface.co/transformers/master/main_classes/trainer.html#deepspeed

The rest of the training arguments can be provided as flags and are all listed here:

https://huggingface.co/transformers/master/main_classes/trainer.html#trainingarguments
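
Because hyperparameters have to be passed as flags, it can be handy to assemble the training command programmatically when trying several settings. This is a minimal sketch: the helper name is made up and the weight decay value is only an example; the flags themselves are the ones used in the training commands above plus --weight_decay.

```python
import shlex


def build_train_command(model, ds_config, **hyperparams):
    """Assemble the deepspeed training command as a list of arguments."""
    cmd = [
        "deepspeed", "--num_gpus=1", "run_clm.py",
        "--deepspeed", ds_config,
        "--model_name_or_path", model,
        "--train_file", "train.csv",
        "--validation_file", "validation.csv",
        "--do_train", "--do_eval", "--fp16",
        "--output_dir", "finetuned",
    ]
    # Every keyword argument becomes a "--name value" flag pair.
    for name, value in hyperparams.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_train_command(
    "gpt2-xl",
    "ds_config.json",
    learning_rate=5e-6,
    warmup_steps=10,
    weight_decay=0.01,  # example value, not a recommendation
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` or printed and pasted into a shell.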

Comments

Freezing at "Using /home/user/.cache/torch_extensions as PyTorch extensions root..."

After installing the dependencies and running the given commands to fine-tune a model, some GPU VRAM is allocated (looking at nvidia-smi), but then the program seems to just stop once "Using /home/user/.cache/torch_extensions as PyTorch extensions root..." prints.

opened by mallorbc 4
Errors while trying to train with two GPUs

Hi,

When trying to train on two GPUs, I'm getting this error:

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 441, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1083, in train
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/integrations.py", line 520, in deepspeed_init
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/root/miniconda3/lib/python3.8/site-packages/deepspeed/__init__.py", line 116, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 148, in __init__
    self._configure_with_arguments(args, mpu)
  File "/root/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 517, in _configure_with_arguments
    self._config = DeepSpeedConfig(config_file, mpu, param_dict=self.config_params)
  File "/root/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 597, in __init__
    self._configure_train_batch_size()
  File "/root/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 732, in _configure_train_batch_size
    self._set_batch_related_parameters()
  File "/root/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 728, in _set_batch_related_parameters
    assert False,
AssertionError: Either train_batch_size or micro_batch_per_gpu needs to be provided

So I added the flag --train_batch_size 8 and got the following error (the output of the two processes is interleaved):

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 192, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/hf_argparser.py", line 196, in parse_args_into_dataclasses
Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--train_batch_size', '8']
    main()
  File "run_clm.py", line 192, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/hf_argparser.py", line 196, in parse_args_into_dataclasses
    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--train_batch_size', '8']

Looks to me like a mismatch between deepspeed and transformers, do you have any suggestions on how to solve it?

This is my ds_report:

DeepSpeed general environment info:
torch install path ............... ['/root/miniconda3/lib/python3.8/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/root/miniconda3/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.3.15, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0

opened by barakw2021 4
Gpt-neo inference with Deepspeed: IndexError: Dimension out of range
Thanks for this useful repository. I was able to follow it to train a gpt-neo 2.7B model.

Inference on the model works well for me, using less than 8 GB of VRAM, so it fits on consumer-level GPUs. However, I'm not yet able to get the inference working with DeepSpeed.

To be clear...

I am using the code from here:

https://github.com/Xirider/finetune-gpt2xl/blob/main/README.md#generate-text-with-a-gpt-neo-27-billion-parameters-model

And it works well, if I comment out this line:

deepspeed.init_inference(model, mp_size=1, dtype=torch.half, replace_method='auto')

If I retain the line, then the inference fails with this error message:

  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 374, in forward
    output = DeepSpeedSelfAttentionFunction.apply(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 312, in forward
    output, key_layer, value_layer, context_layer = selfAttention_fp()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 270, in selfAttention_fp
    qkv_out = qkv_func(input,
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

I'm actually a bit unclear on whether DeepSpeed should be used for inference with GPT-NEO at all.

Huggingface says....

https://huggingface.co/transformers/main_classes/deepspeed.html

DeepSpeed ZeRO-2 is primarily used only for training, as its features are of no use to inference.

But, Microsoft has a guide which shows the usage of Deepspeed for inference with this model...

https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/inference-tutorial.md#end-to-end-gpt-neo-27b-inference
opened by kingpalethe 3
Can't change BOS token or EOS token for GPT Neo
In order to better control the start and stop of generated text, I have added BOS tokens and EOS tokens for GPT2xl. This works well: the generated text stops at an appropriate length and starts the way a normal sentence would. However, I want to do the same for GPT Neo, and this does not work. I have discovered that, for some reason, the arguments that normally set BOS and EOS are not working when GPT Neo is run, even if I change the tokenizer from AutoTokenizer to GPT2Tokenizer. Below is some code that shows what I mean.

tokenizer = GPT2Tokenizer.from_pretrained(
    model_args.model_name_or_path,
    bos_token='<|beginingtext|>',
    eos_token='<|endingtext|>',
    pad_token='<|pad|>',
    **tokenizer_kwargs)
print(tokenizer.eos_token)
print(tokenizer.bos_token)
quit()

As I said, when I run this with GPT2xl, the tokens are appropriately changed. When I run this with GPT Neo, both the BOS and EOS tokens are <|endoftext|>
opened by mallorbc 3
Training on a larger dataset fails due to memory issues on faster GPUs
Thanks so much for producing this repo, it's been really helpful in getting up and running on the biggest GPT-Neo model.

I'm having an issue training gpt-neo_2-7B though. My dataset is just over 200 MB, which leads to an out-of-memory error on the very last step of loading a model into memory before training.

[INFO|integrations.py:533] 2021-04-20 12:40:32,650 >> Attempting to resume from paragraphs/checkpoint-600
[2021-04-20 12:40:32,664] [INFO] [engine.py:1445:_load_checkpoint] rank: 0 loading checkpoint: paragraphs/checkpoint-600/global_step600/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 441, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  [...]
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 10605230080 bytes. Error code 12 (Cannot allocate memory)

I've tried a number of GPUs on Google Cloud, and I can get it to run on the P100 since I can up the RAM to 100 GB, but both the V100 and the A100 fail (with 78 GB and 85 GB respectively).

Unfortunately Google puts a hard limit on RAM for these GPUs, and increasing the number of GPUs also doubles the number of processes run and so the RAM required - so unless I pay for 2 GPUs and let one sit idle I have to train on the much slower P100.

This is .. ok .. 😅 but I'd love to go faster if I can. So far I've tried:

Reducing per_device_train_batch_size to 2

Halving the dataset size

but neither has made a difference.

Do you have any other tips on how I might squeeze into the 85GB you get with an A100? It's so tantalizingly close - I wish Google would just let me add more RAM!
opened by jonnyplatt 3

Resume from checkpoint

I have RTX 3090 (24GB) and 64 GB RAM, and 50 GB swap memory, and although training works pretty nicely, unfortunately resuming training from checkpoints results in OOM:

[2021-05-07 19:18:39,962] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-05-07 19:18:39,973] [INFO] [runner.py:360:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 run_clm.py --deepspeed ds_config_gptneo_new.json --model_name_or_path /datadrive/model/checkpoint-800/ --train_file merged_train.txt.csv --do_train --fp16 --overwrite_cache --output_dir /datadrive/model --num_train_epochs 1 --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --use_fast_tokenizer False --learning_rate 5e-06 --save_steps 400
[2021-05-07 19:18:40,526] [INFO] [launch.py:73:main] 0 NCCL_VERSION 2.7.8
[2021-05-07 19:18:40,526] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0]}
[2021-05-07 19:18:40,526] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=1, node_rank=0
[2021-05-07 19:18:40,526] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2021-05-07 19:18:40,526] [INFO] [launch.py:102:main] dist_world_size=1
[2021-05-07 19:18:40,526] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0
[2021-05-07 19:18:41,601] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
05/07/2021 19:18:41 - WARNING - __main__ -   Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
05/07/2021 19:18:41 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=/datadrive/model, overwrite_output_dir=False, do_train=True, do_eval=False, do_predict=False, evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=4, per_device_eval_batch_size=8, gradient_accumulation_steps=2, eval_accumulation_steps=None, learning_rate=5e-06, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/May07_19-18-41_9c3c6cac903e, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=400, save_total_limit=None, no_cuda=False, seed=42, fp16=True, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=0, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=/datadrive/model, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=ds_config_gptneo_new.json, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name=length, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, _n_gpu=1, mp_parameters=)
05/07/2021 19:18:42 - WARNING - datasets.builder -   Using custom data configuration default-b5898a6a80220f13
05/07/2021 19:18:42 - WARNING - datasets.builder -   Reusing dataset csv (/root/.cache/huggingface/datasets/csv/default-b5898a6a80220f13/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0)
[INFO|configuration_utils.py:515] 2021-05-07 19:18:42,390 >> loading configuration file /datadrive/model/checkpoint-800/config.json
[INFO|configuration_utils.py:553] 2021-05-07 19:18:42,390 >> Model config GPTNeoConfig {
  "_name_or_path": "EleutherAI/gpt-neo-2.7B",
  "activation_function": "gelu_new",
  "architectures": [
    "GPTNeoForCausalLM"
  ],
  "attention_dropout": 0,
  "attention_layers": [
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local",
    "global",
    "local"
  ],
  "attention_types": [
    [
      [
        "global",
        "local"
      ],
      16
    ]
  ],
  "bos_token_id": 50256,
  "embed_dropout": 0,
  "eos_token_id": 50256,
  "gradient_checkpointing": true,
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": null,
  "layer_norm_epsilon": 1e-05,
  "max_position_embeddings": 2048,
  "model_type": "gpt_neo",
  "num_heads": 20,
  "num_layers": 32,
  "resid_dropout": 0,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50,
      "temperature": 0.9
    }
  },
  "tokenizer_class": "GPT2Tokenizer",
  "transformers_version": "4.6.0.dev0",
  "use_cache": false,
  "vocab_size": 50257,
  "window_size": 256
}

[INFO|configuration_utils.py:517] 2021-05-07 19:18:42,765 >> loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /models/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
[INFO|configuration_utils.py:553] 2021-05-07 19:18:42,765 >> Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.6.0.dev0",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|tokenization_utils_base.py:1717] 2021-05-07 19:18:44,877 >> loading file https://huggingface.co/gpt2/resolve/main/vocab.json from cache at /models/transformers/684fe667923972fb57f6b4dcb61a3c92763ad89882f3da5da9866baf14f2d60f.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f
[INFO|tokenization_utils_base.py:1717] 2021-05-07 19:18:44,877 >> loading file https://huggingface.co/gpt2/resolve/main/merges.txt from cache at /models/transformers/c0c761a63004025aeadd530c4c27b860ec4ecbe8a00531233de21d865a402598.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b
[INFO|tokenization_utils_base.py:1717] 2021-05-07 19:18:44,877 >> loading file https://huggingface.co/gpt2/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1717] 2021-05-07 19:18:44,877 >> loading file https://huggingface.co/gpt2/resolve/main/special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1717] 2021-05-07 19:18:44,877 >> loading file https://huggingface.co/gpt2/resolve/main/tokenizer_config.json from cache at None
[INFO|tokenization_utils_base.py:1717] 2021-05-07 19:18:44,877 >> loading file https://huggingface.co/gpt2/resolve/main/tokenizer.json from cache at /models/transformers/16a2f78023c8dc511294f0c97b5e10fde3ef9889ad6d11ffaa2a00714e73926e.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0
[INFO|modeling_utils.py:1147] 2021-05-07 19:18:44,955 >> loading weights file /datadrive/model/checkpoint-800/pytorch_model.bin
[INFO|modeling_utils.py:1328] 2021-05-07 19:18:59,255 >> All model checkpoint weights were used when initializing GPTNeoForCausalLM.

[INFO|modeling_utils.py:1336] 2021-05-07 19:18:59,255 >> All the weights of GPTNeoForCausalLM were initialized from the model checkpoint at /datadrive/model/checkpoint-800/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPTNeoForCausalLM for predictions without further training.
  0%|                                                     | 0/1 [00:00<?, ?ba/s][WARNING|tokenization_utils_base.py:3170] 2021-05-07 19:19:40,807 >> Token indices sequence length is longer than the specified maximum sequence length for this model (14397149 > 1024). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████| 1/1 [00:42<00:00, 42.00s/ba]
100%|█████████████████████████████████████████████| 1/1 [00:08<00:00,  8.47s/ba]
[INFO|trainer.py:414] 2021-05-07 19:19:50,812 >> Using amp fp16 backend
[INFO|trainer.py:1042] 2021-05-07 19:19:50,865 >> Loading model from /datadrive/model/checkpoint-800/).
[2021-05-07 19:19:50,867] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.16, git-hash=unknown, git-branch=unknown
[2021-05-07 19:19:50,867] [WARNING] [config.py:79:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
[2021-05-07 19:19:54,135] [INFO] [utils.py:11:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1
Using /root/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.1879847049713135 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000005, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
[2021-05-07 19:19:58,240] [INFO] [engine.py:610:_configure_optimizer] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2021-05-07 19:19:58,240] [INFO] [engine.py:615:_configure_optimizer] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2021-05-07 19:19:58,240] [INFO] [logging.py:60:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer
[2021-05-07 19:19:58,240] [INFO] [stage2.py:102:__init__] Reduce bucket size 200000000.0
[2021-05-07 19:19:58,240] [INFO] [stage2.py:103:__init__] Allgather bucket size 200000000.0
[2021-05-07 19:19:58,240] [INFO] [stage2.py:104:__init__] CPU Offload: True
Using /root/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 1.4445114135742188 seconds
[2021-05-07 19:21:35,500] [INFO] [stage2.py:381:__init__] optimizer state initialized
[2021-05-07 19:21:35,709] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2021-05-07 19:21:35,760] [INFO] [engine.py:439:_configure_lr_scheduler] DeepSpeed using configured LR scheduler = WarmupLR
[2021-05-07 19:21:35,761] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupLR object at 0x7fe9d20fb5b0>
[2021-05-07 19:21:35,769] [INFO] [logging.py:60:log_dist] [Rank 0] step=0, skipped=0, lr=[5e-06], mom=[[0.9, 0.999]]
[2021-05-07 19:21:35,777] [INFO] [config.py:747:print] DeepSpeedEngine configuration:
[2021-05-07 19:21:35,925] [INFO] [config.py:751:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2021-05-07 19:21:35,926] [INFO] [config.py:751:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-05-07 19:21:35,926] [INFO] [config.py:751:print]   allreduce_always_fp32 ........ False
[2021-05-07 19:21:35,927] [INFO] [config.py:751:print]   amp_enabled .................. False
[2021-05-07 19:21:35,927] [INFO] [config.py:751:print]   amp_params ................... False
[2021-05-07 19:21:35,927] [INFO] [config.py:751:print]   checkpoint_tag_validation_enabled  True
[2021-05-07 19:21:35,928] [INFO] [config.py:751:print]   checkpoint_tag_validation_fail  False
[2021-05-07 19:21:35,928] [INFO] [config.py:751:print]   disable_allgather ............ False
[2021-05-07 19:21:35,928] [INFO] [config.py:751:print]   dump_state ................... False
[2021-05-07 19:21:35,929] [INFO] [config.py:751:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2021-05-07 19:21:35,929] [INFO] [config.py:751:print]   elasticity_enabled ........... False
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   flops_profiler_config ........ {
    "enabled": false, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 3, 
    "detailed": true
}
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   fp16_enabled ................. True
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   global_rank .................. 0
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   gradient_accumulation_steps .. 2
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   gradient_clipping ............ 1.0
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   gradient_predivide_factor .... 1.0
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   initial_dynamic_scale ........ 65536
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   loss_scale ................... 0
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   memory_breakdown ............. False
[2021-05-07 19:21:35,931] [INFO] [config.py:751:print]   optimizer_legacy_fusion ...... False
[2021-05-07 19:21:35,932] [INFO] [config.py:751:print]   optimizer_name ............... adamw
[2021-05-07 19:21:35,932] [INFO] [config.py:751:print]   optimizer_params ............. {'lr': 5e-06, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0}
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   pld_enabled .................. False
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   pld_params ................... False
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   prescale_gradients ........... False
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   scheduler_name ............... WarmupLR
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   scheduler_params ............. {'warmup_min_lr': 0, 'warmup_max_lr': 5e-06, 'warmup_num_steps': 0}
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   sparse_attention ............. None
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   sparse_gradients_enabled ..... False
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   steps_per_print .............. 2000
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   tensorboard_enabled .......... False
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   tensorboard_output_path ...... 
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   train_batch_size ............. 8
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   train_micro_batch_size_per_gpu  4
[2021-05-07 19:21:35,933] [INFO] [config.py:751:print]   wall_clock_breakdown ......... False
[2021-05-07 19:21:35,934] [INFO] [config.py:751:print]   world_size ................... 1
[2021-05-07 19:21:35,934] [INFO] [config.py:751:print]   zero_allow_untested_optimizer  False
[2021-05-07 19:21:35,938] [INFO] [config.py:751:print]   zero_config .................. {
    "stage": 2, 
    "contiguous_gradients": true, 
    "reduce_scatter": true, 
    "reduce_bucket_size": 2.000000e+08, 
    "allgather_partitions": true, 
    "allgather_bucket_size": 2.000000e+08, 
    "overlap_comm": true, 
    "load_from_fp32_weights": true, 
    "elastic_checkpoint": true, 
    "offload_param": null, 
    "offload_optimizer": {
        "device": "cpu", 
        "nvme_path": null, 
        "buffer_count": 4, 
        "pin_memory": false, 
        "pipeline_read": false, 
        "pipeline_write": false, 
        "fast_init": false
    }, 
    "sub_group_size": 1.000000e+12, 
    "prefetch_bucket_size": 5.000000e+07, 
    "param_persistence_threshold": 1.000000e+05, 
    "max_live_parameters": 1.000000e+09, 
    "max_reuse_distance": 1.000000e+09, 
    "gather_fp16_weights_on_model_save": false, 
    "find_unused_parameters": false
}
[2021-05-07 19:21:35,938] [INFO] [config.py:751:print]   zero_enabled ................. True
[2021-05-07 19:21:35,938] [INFO] [config.py:751:print]   zero_optimization_stage ...... 2
[2021-05-07 19:21:35,942] [INFO] [config.py:753:print]   json = {
    "fp16": {
        "enabled": true, 
        "loss_scale": 0, 
        "loss_scale_window": 1000, 
        "initial_scale_power": 16, 
        "hysteresis": 2, 
        "min_loss_scale": 1
    }, 
    "optimizer": {
        "type": "AdamW", 
        "params": {
            "lr": 5e-06, 
            "betas": [0.9, 0.999], 
            "eps": 1e-08, 
            "weight_decay": 0.0
        }
    }, 
    "scheduler": {
        "type": "WarmupLR", 
        "params": {
            "warmup_min_lr": 0, 
            "warmup_max_lr": 5e-06, 
            "warmup_num_steps": 0
        }
    }, 
    "zero_optimization": {
        "stage": 2, 
        "allgather_partitions": true, 
        "allgather_bucket_size": 2.000000e+08, 
        "overlap_comm": true, 
        "reduce_scatter": true, 
        "reduce_bucket_size": 2.000000e+08, 
        "contiguous_gradients": true, 
        "cpu_offload": true
    }, 
    "gradient_accumulation_steps": 2, 
    "gradient_clipping": 1.0, 
    "steps_per_print": 2.000000e+03, 
    "train_batch_size": 8, 
    "train_micro_batch_size_per_gpu": 4, 
    "wall_clock_breakdown": false
}
Using /root/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.09232521057128906 seconds
[INFO|integrations.py:536] 2021-05-07 19:21:36,160 >> Attempting to resume from /datadrive/model/checkpoint-800/
[2021-05-07 19:21:36,175] [INFO] [engine.py:1480:_load_checkpoint] rank: 0 loading checkpoint: /datadrive/model/checkpoint-800/global_step800/mp_rank_00_model_states.pt
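
The batch-size values in the configuration dump above are consistent with each other. As a minimal sketch (the function name is illustrative, not part of DeepSpeed), the relation DeepSpeed expects between them can be checked like this:

```python
def effective_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


# Values copied from the configuration dump in the log.
logged = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 2,
    "world_size": 1,
    "train_batch_size": 8,
}

computed = effective_batch_size(
    logged["train_micro_batch_size_per_gpu"],
    logged["gradient_accumulation_steps"],
    logged["world_size"],
)
print(computed == logged["train_batch_size"])  # True
```

With a single GPU, lowering the micro batch size and raising the accumulation steps by the same factor keeps the effective batch size unchanged while reducing peak GPU memory.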

opened by ArturTan 2

Exception: Installed CUDA version 11.0 does not match the version torch was compiled with 11.1 [SOLUTION]

Hey, first off, awesome project. I'm getting this error when I try to run the deepspeed command. I found a solution, in case anyone else has this problem:

wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda_11.1.1_455.32.00_linux.run
sudo sh cuda_11.1.1_455.32.00_linux.run
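
To confirm the mismatch before or after reinstalling, the two versions can be compared directly. The following is a rough sketch, not DeepSpeed's own check: the helper names are made up for illustration, and the sample string only imitates the release line that `nvcc --version` prints.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a string such as '11.1' into (11, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_cuda(nvcc_output: str, torch_cuda: str) -> bool:
    """True when the installed toolkit and the torch build agree on major.minor."""
    installed = parse_nvcc_release(nvcc_output)
    return installed is not None and installed == parse_version(torch_cuda)


# Illustrative sample of the relevant nvcc output line (not captured from a real machine).
sample = "Cuda compilation tools, release 11.0, V11.0.221"
print(same_cuda(sample, "11.1"))  # False
print(same_cuda(sample, "11.0"))  # True
```

On a real machine, the first argument would come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and the second from `torch.version.cuda`.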

opened by CupOfGeo 2

separate examples for finetuning

#17 Hey, I have been using this. I'd be happy to help update the docs to add the separated_samples_max_length argument and explain when you should use it. This has been a super helpful repo. If you want me to change anything, just let me know.

opened by CupOfGeo 1

Crashes with new Transformers version

Here's the error:

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 422, in main
    trainer = Trainer(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 295, in __init__
    logging.set_verbosity(log_level)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/utils/logging.py", line 161, in set_verbosity
    _get_library_root_logger().setLevel(verbosity)
  File "/root/miniconda3/lib/python3.8/logging/__init__.py", line 1409, in setLevel
    self.level = _checkLevel(level)
  File "/root/miniconda3/lib/python3.8/logging/__init__.py", line 194, in _checkLevel
    raise ValueError("Unknown level: %r" % level)

The fix was to install transformers v4.6.0 from pip

opened by barakw2021 1

Ideal number of epochs? Number of examples meaning?

Is there a recommended number of epochs to use? I was able to successfully train on a custom dataset with nearly 45k entries in the training set and nearly 11k in the validation set. In the example, the flag is set to only 1 epoch. However, I have found that training for 4 epochs leads to a lower loss than 1 epoch, and I imagine continuing to train the model would lead to an even better result. It is difficult to say at what point overfitting may start occurring, as the validation data is only evaluated at the end of training.

Thus I ask: is there a rough ideal number of epochs for fine-tuning? If there is, I think it would be a good idea to add that to the README (which I can do if needed).

My second question is related to the Num examples part of training and evaluation. As I said, I have nearly 45k training texts and nearly 11k validation texts. However, Num examples says 1472 and 365 respectively for training and validation. What does this mean? Is not all of the data being used? Why does it not show the much larger numbers of 45k and 11k?

Thanks for the repo and for your help. This is very cool and relatively easy to work with once one gets some experience with DeepSpeed.
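
A likely explanation for the Num examples figure, assuming the script keeps the grouping step of the stock Hugging Face run_clm.py example: the tokenized texts are concatenated and cut into fixed-size blocks, and Num examples counts those blocks rather than the original rows. The sketch below is a simplified stand-in for that step with toy numbers, not the script's actual code.

```python
from typing import List


def group_texts(tokenized_texts: List[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate all token ids, then cut them into full blocks of block_size."""
    flat = [tok for text in tokenized_texts for tok in text]
    usable = (len(flat) // block_size) * block_size  # the incomplete tail is dropped
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


# 10 short "texts" of 3 tokens each (toy numbers, not the real dataset).
texts = [[i, i, i] for i in range(10)]
blocks = group_texts(texts, block_size=8)
print(len(texts), "texts ->", len(blocks), "training examples")  # 10 texts -> 3 training examples
```

Under that assumption, 1472 examples from about 45k texts would mean roughly 30 short texts packed into each block, so all of the data is still used apart from a small dropped remainder.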

opened by mallorbc 1

Multiple entries csv

Hi, I come from Upwork. Is this what you are looking for? It splits the dataset into a multi-row CSV:

```
import csv

start_token = "|<start of text>|"
end_token = "|<end of text>|"

# Convert train.txt into train.csv with one text sample per row.
with open('train.txt', encoding='utf-8') as txtfile:
    # Drop the start tokens, then split the file into samples on the end token.
    all_text = txtfile.read().replace(start_token, "").split(end_token)
    # Whatever follows the last end token is not a sample, so discard it.
    all_text = all_text[:-1]
with open('train.csv', mode='w', encoding='utf-8', newline='') as csv_file:
    fieldnames = ['text']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in all_text:
        # Write the single sample, not the whole list.
        writer.writerow({'text': row})

# Same conversion for the validation split.
with open('validation.txt', encoding='utf-8') as txtfile:
    all_text = txtfile.read().replace(start_token, "").split(end_token)
    all_text = all_text[:-1]
with open('validation.csv', mode='w', encoding='utf-8', newline='') as csv_file:
    fieldnames = ['text']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in all_text:
        writer.writerow({'text': row})

print("created train.csv and validation.csv files")
```

opened by kikirizki 1

subprocess.CalledProcessError:

I got the following error:

[2022-01-13 14:47:32,154] [INFO] [launch.py:131:sigkill_handler] Killing subprocess 2273
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/gpt2_lm/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/anaconda3/envs/gpt2_lm/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/gpt2_lm/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 167, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/gpt2_lm/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 156, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/ubuntu/anaconda3/envs/gpt2_lm/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 137, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/gpt2_lm/bin/python', '-u', 'run_clm.py', '--local_rank=0', '--deepspeed', 'ds_config.json', '--model_name_or_path', 'gpt2-xl', '--train_file', '../../dataset/train.txt', '--validation_file', '../../dataset/test.txt', '--do_train', '--do_eval', '--fp16', '--overwrite_cache', '--evaluation_strategy=steps', '--output_dir', 'finetuned', '--eval_steps', '500', '--num_train_epochs', '1', '--gradient_accumulation_steps', '2', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1']' died with <Signals.SIGKILL: 9>.

opened by Dhanachandra 1

Out of memory with RTX3090

Hi, I'm trying to train gpt2-xl, but I keep getting OOM, even when I set the batch size to 1, gradient_accumulation to 8/16/512, contiguous_gradients to false, and allgather_bucket_size / reduce_bucket_size to 2e2. I can see in nvidia-smi that I'm only reaching about half the memory capacity, around 12 GB. My system is as stated: an RTX 3090 with 24 GB VRAM, 80 GB RAM, and a 5600X CPU (if that matters), running WSL2 on Windows 10. Thanks.

opened by PyxAI 4

Feeding the model separate examples instead of one continuous block of text

Hello, I'm interested in adding this feature: adding a function in text2csv.py that takes a folder of texts, and then in run_clm.py padding and truncating them instead of using the group_text function.
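
The pad-and-truncate idea can be sketched on plain lists of token ids. This is only an illustration of the technique, not the code that was later added to the repo: the function names and the padding id are hypothetical. GPT-2's tokenizer ships without a padding token, so a real implementation has to pick one (reusing the end-of-text token is a common choice).

```python
from typing import List


def pad_or_truncate(token_ids: List[int], max_length: int, pad_id: int) -> List[int]:
    """Return exactly max_length ids: cut long samples, right-pad short ones."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


def attention_mask(token_ids: List[int], max_length: int) -> List[int]:
    """1 for real tokens, 0 for padding, so padded positions can be ignored."""
    real = min(len(token_ids), max_length)
    return [1] * real + [0] * (max_length - real)


PAD_ID = 0  # hypothetical padding id, chosen only for this example
samples = [[5, 6, 7], [1, 2, 3, 4, 5, 6]]
for ids in samples:
    print(pad_or_truncate(ids, 4, PAD_ID), attention_mask(ids, 4))
# [5, 6, 7, 0] [1, 1, 1, 0]
# [1, 2, 3, 4] [1, 1, 1, 1]
```

Unlike grouping, every training example then starts at the beginning of one source text, at the cost of wasted positions in short samples and lost tokens in long ones.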

opened by CupOfGeo 1

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

I tried to use your script (gpt2-xl), but I get an error: AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

pip list

Package Version
certifi 2021.5.30
charset-normalizer 2.0.4
click 8.0.1
configparser 5.0.2
datasets 1.8.0
deepspeed 0.4.0
dill 0.3.4
docker-pycreds 0.4.0
filelock 3.0.12
fsspec 2021.7.0
gitdb 4.0.7
GitPython 3.1.18
huggingface-hub 0.0.8
idna 3.2
importlib-metadata 4.7.0
joblib 1.0.1
multiprocess 0.70.12.2
ninja 1.10.2
numpy 1.21.2
packaging 21.0
pandas 1.3.2
pathtools 0.1.2
Pillow 8.3.1
pip 21.2.4
promise 2.3
protobuf 3.17.3
psutil 5.8.0
pyarrow 3.0.0
pyparsing 2.4.7
python-dateutil 2.8.2
pytz 2021.1
PyYAML 5.4.1
regex 2021.8.21
requests 2.26.0
sacremoses 0.0.45
sentry-sdk 1.3.1
setuptools 57.4.0
shortuuid 1.0.1
six 1.16.0
smmap 4.0.0
subprocess32 3.5.4
tensorboardX 1.8
tokenizers 0.10.3
torch 1.9.0
torchvision 0.10.0
tqdm 4.49.0
transformers 4.7.0
triton 1.0.0
typing-extensions 3.10.0.0
urllib3 1.26.6
wandb 0.12.0
wheel 0.37.0
xxhash 2.0.2
zipp 3.5.0

opened by remotejob 7

New issue with Pandas

I got this error:

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 271, in main
    datasets = load_dataset(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/load.py", line 742, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 574, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 652, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1041, in _prepare_split
    for key, table in utils.tqdm(generator, unit=" tables", leave=False, disable=not_verbose):
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/packaged_modules/csv/csv.py", line 92, in _generate_tables
    csv_file_reader = pd.read_csv(
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 571, in read_csv
    kwds_defaults = _refine_defaults_read(
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1306, in _refine_defaults_read
    raise ValueError("Specified named and prefix; you can only specify one.")
ValueError: Specified named and prefix; you can only specify one.
Downloading and preparing dataset csv/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/csv/default-84d6151a5e4565ed/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0...

Apparently it's a known error with the latest Pandas: https://github.com/pandas-dev/pandas/issues/42387

I solved it by downgrading to Pandas 1.2.5.

opened by barakw2021 0

Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

BPE algorithm can finetune tokenizer

Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

GPT-Code-Clippy (GPT-CC) is an open source version of GitHub Copilot, a language model

Label data using HuggingFace's transformers and automatically get a prediction service

🔍 Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

🔍 End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

KoBART model on huggingface transformers

A deep learning-based translation library built on Huggingface transformers

Huggingface Transformers + Adapters = ❤️

Code for lyric-section-to-comment generation based on huggingface transformers.

Partially offline multi-language translator built upon Huggingface transformers.

Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET

PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

Simple and efficient RevNet-Library with DeepSpeed support

Natural language processing summarizer using 3 state of the art Transformer models: BERT, GPT2, and T5

Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.