Prefix-Tuning: Optimizing Continuous Prompts for Generation

Overview

Files:

.
├── gpt2                          # Code for GPT2 style autoregressive LM
│   ├── train_e2e.py              # high-level scripts to train.
│   ├── train_control.py          # code that implements prefix-tuning.
│   ├── trainer_prefix.py         # trainer code for the training loop. 
│   ├── run_language_modeling.py  # training code (contains data loading, model loading, and calls trainer)
│   ├── gen.py                    # high-level scripts to decode. 
│   └── run_generation.py         # decoding code. 
│
├── seq2seq                       # Code for encoder-decoder architecture
│   ├── train_bart.py             # high-level scripts to train.
│   ├── prefixTuning.py           # code that implements prefix-tuning.
│   ├── finetune.py               # training code (contains data loading, model loading, and calls trainer)   
│   ├── lightning_base.py         # helper code
│   ├── utils.py                  # helper code
│   └── callbacks.py              # helper code
└── ...

The code for the GPT-2 style autoregressive LM is in gpt2/; this corresponds to the table-to-text experiments in the paper.

The code for encoder-decoder architectures such as BART is in seq2seq/; this corresponds to the summarization experiments in the paper.

The two primary scripts used to run the code are gpt2/train_e2e.py (for table-to-text) and seq2seq/train_bart.py (for summarization). They are set to good default hyperparameters, and can also be used for hyperparameter tuning. :)


Setup:

cd transformers; pip install -e .


Train via prefix-tuning:

cd gpt2;

python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800
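
For intuition, here is a minimal sketch of the parameterization that --preseqlen and --mid_dim control (my own illustration with GPT2-medium sizes, not the repo's actual train_control.py / prefixTuning.py code): a short sequence of trainable prefix vectors is mapped through a small MLP to per-layer key/value states that the frozen LM consumes as past_key_values.

import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    # Illustrative prefix module: preseqlen trainable vectors -> per-layer key/value states.
    def __init__(self, preseqlen=5, mid_dim=512, n_layer=24, n_head=16, n_embd=1024):
        super().__init__()
        self.n_layer, self.n_head, self.n_embd = n_layer, n_head, n_embd
        self.register_buffer("prefix_ids", torch.arange(preseqlen))
        self.wte = nn.Embedding(preseqlen, n_embd)      # trainable prefix embeddings
        self.mlp = nn.Sequential(                       # reparameterization, width = --mid_dim
            nn.Linear(n_embd, mid_dim),
            nn.Tanh(),
            nn.Linear(mid_dim, n_layer * 2 * n_embd),   # one key and one value per layer
        )

    def forward(self, bsz):
        prefix = self.wte(self.prefix_ids).unsqueeze(0).expand(bsz, -1, -1)
        kv = self.mlp(prefix)                           # (bsz, preseqlen, n_layer*2*n_embd)
        kv = kv.view(bsz, -1, self.n_layer * 2, self.n_head, self.n_embd // self.n_head)
        # -> n_layer chunks of shape (2, bsz, n_head, preseqlen, head_dim)
        return kv.permute(2, 0, 3, 1, 4).split(2)

past_key_values = PrefixEncoder()(bsz=5)
print(len(past_key_values), past_key_values[0].shape)   # 24 torch.Size([2, 5, 16, 5, 64])

With these default sizes the trainable tensors come out as 5x1024, 512x1024, 512, 49152x512 and 49152 (about 25.7M parameters), which lines up with the shapes printed in the training log quoted in one of the issues below; the underlying GPT-2 weights stay frozen.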

Other baseline approaches

cd gpt2;

python train_e2e.py --tuning_mode {finetune/adaptertune} --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_bart.py --tuning_mode finetune --epoch 5 --learning_rate 0.00005 --mode xsum --bsz 5 --seed 101

Decode:

cd gpt2;

python gen.py {data2text/webnlg/...} yes test {checkpoint_path} no

cd seq2seq;

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training}
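
At decode time the learned prefix is handed to the frozen LM as past_key_values, so every generated token attends to it. Below is a minimal greedy-decoding sketch of that idea (my own illustration against a recent Hugging Face transformers API, with a randomly faked prefix; it is not the repo's gen.py / run_generation.py):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-medium")
lm = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

# Fake prefix shaped like GPT2-medium key/value states; a real run would build this
# from the trained prefix module loaded from {checkpoint_path}.
preseqlen, n_layer, n_head, head_dim = 5, 24, 16, 64
prefix_past = tuple(
    (torch.randn(1, n_head, preseqlen, head_dim),
     torch.randn(1, n_head, preseqlen, head_dim))
    for _ in range(n_layer)
)

src = ' | Aarhus_Airport : cityServed : "Aarhus, Denmark" '
generated = tok.encode(src, return_tensors="pt")

with torch.no_grad():
    # first pass: attend to the 5 prefix slots plus the linearized table
    attn = torch.ones(1, preseqlen + generated.shape[1], dtype=torch.long)
    out = lm(input_ids=generated, past_key_values=prefix_past, attention_mask=attn)
    for _ in range(30):
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=-1)
        # the attention mask must always cover the prefix slots plus all tokens seen so far
        attn = torch.ones(1, preseqlen + generated.shape[1], dtype=torch.long)
        out = lm(input_ids=next_id, past_key_values=out.past_key_values, attention_mask=attn)

print(tok.decode(generated[0]))

With a random prefix the output is of course gibberish; the point is only where the prefix enters the computation.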

For details of the methods and results, please refer to our paper.

@misc{li2021prefixtuning,
      title={Prefix-Tuning: Optimizing Continuous Prompts for Generation}, 
      author={Xiang Lisa Li and Percy Liang},
      year={2021},
      eprint={2101.00190},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Comments
  • XSUM dataset differences with original

    Hello, you shared the xsum dataset link here: https://github.com/XiangLi1999/PrefixTuning/issues/2

    However, I see from the CodaLab bundle https://worksheets.codalab.org/bundles/0x58f85171b43f4e61bf411c35faab369d and from the hyperparameters/data directory in https://worksheets.codalab.org/bundles/0xa3f0cd3c10c7490ab508a351968cbdcf that you used the xsum_news data. When I checked xsum_news, I found that the validation file has 7,186 examples, whereas the original dataset has 11,327. The test set also differs: 11,333 examples in xsum_news vs. 20,418 in the original xsum.

    I was wondering if you could explain the differences in eval/test dataset sizes compared to the original and perhaps provide your script for preprocessing the original xsum.

    Thanks!

    opened by jpilaul 16
  • OOM error

    Hi, I tried the seq2seq prefix-tuning and got:

    RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 15.90 GiB total capacity; 4.63 GiB already allocated; 797.50 MiB free; 5.81 GiB reserved in total by PyTorch)

    I ran the experiment on a 16GB GPU. Am I supposed to use a 32GB GPU instead? Thanks!

    opened by taineleau 9
  • About the training speed verification

    Hi Lisa~ I reimplemented your BART code on the newest Hugging Face Transformers, and I want to verify one thing: in my training runs, prefix-tuning runs at only about 60%~70% of the speed of full-parameter fine-tuning, even when I use a very, very small prefix prompt module. Does that make sense, and where might the speed bottleneck be? Hoping for your reply.

    opened by Timothyxxx 6
  • Model save not working

    There are a few checkpoint_callbacks being created in lightning_base.py, and I think that the callback on line https://github.com/XiangLi1999/PrefixTuning/blob/cleaned/seq2seq/lightning_base.py#L749 does not allow us to save the model. I am rerunning the model right now to verify without that line. However, since it takes a long time to train, I was hoping you could help me fix model saving. Thanks

    opened by jpilaul 6
  • RuntimeError: Input, output and indices must be on the current device

    Hi, I met a RuntimeError when training a prefix model. Do you have any suggestions?

    Here is the environment:
    certifi (2021.5.30)
    charset-normalizer (2.0.4)
    click (8.0.1)
    dataclasses (0.8)
    filelock (3.0.12)
    idna (3.2)
    importlib-metadata (4.8.1)
    itsdangerous (2.0.1)
    Jinja2 (3.0.1)
    joblib (1.0.1)
    MarkupSafe (2.0.1)
    nltk (3.6.2)
    numpy (1.19.5)
    packaging (21.0)
    Pillow (8.3.2)
    pip (9.0.3)
    pyparsing (2.4.7)
    Python-dev (2.0.0.dev0)
    regex (2021.8.28)
    requests (2.26.0)
    sacremoses (0.0.45)
    sentencepiece (0.1.96)
    setuptools (39.2.0)
    six (1.16.0)
    tokenizers (0.8.1rc2)
    torch (1.8.0+cu111)
    torchvision (0.9.0+cu111)
    tqdm (4.62.2)
    transformers (3.2.0, /home/yanzhongxiang/PrefixTuning/transformers/src)
    typing-extensions (3.10.0.2)
    urllib3 (1.26.6)
    Werkzeug (2.0.1)
    zipp (3.5.0)
    

    Here is the command line: python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache

    Here is the error information:

    webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
    python run_language_modeling.py         --output_dir=webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --model_type=gpt2         --model_name_or_path=gpt2-medium         --tokenizer_name=gpt2-medium         --per_device_train_batch_size 5         --per_device_eval_batch_size 5         --save_steps 500000         --num_train_epochs 5         --do_train         --train_data_file=../data/webnlg_challenge_2017/train.json         --do_eval         --line_by_line         --save_total_limit 1         --overwrite_output_dir         --task_mode webnlg         --eval_data_file=../data/webnlg_challenge_2017/dev.json          --tuning_mode prefixtune --logging_dir webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --train_embs no --optim_prefix yes --preseqlen 5 --prefix_mode activation --format_mode cat --gradient_accumulation_steps 1 --learning_rate 5e-05 --weight_decay 0.0 --seed 101 --disable_tqdm --mid_dim 512 --init_random no --use_dropout no --prefix_dropout 0.0 --objective_mode 1 --evaluate_during_training --eval_steps 5000  --cache_dir cache/gpt2-medium-s3 
    /home/yanzhongxiang/PrefixTuning/transformers/src/transformers/__init__.py
    /home/yanzhongxiang/PrefixTuning/transformers/src/transformers/training_args.py:299: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
      FutureWarning,
    09/16/2021 10:22:04 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False
    09/16/2021 10:22:04 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=True, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=5, per_device_eval_batch_size=5, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, warmup_steps=0, logging_dir='webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', logging_first_step=False, logging_steps=500, save_steps=500000, save_total_limit=1, no_cuda=False, seed=101, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=5000, dataloader_num_workers=0, past_index=-1, run_name=None, disable_tqdm=True, remove_unused_columns=True, label_names=None)
    objective is 1
    False
    /home/yanzhongxiang/PrefixTuning/transformers/src/transformers/tokenization_utils_base.py:1324: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
      FutureWarning,
    prefixtune
    adapting the size of the model embedding to include [PAD]
    len(tokenizer) =  50257
    len(tokenizer) =  50258
    <|endoftext|> 50256
    <|endoftext|> 50256
    loading the prefix model from  None
    training the prefix model from scratch. 
    under the PrefixTuning model
    PrefixTuning
    preseqlen is 5, optimizing the prefix directly
    [Full prefix-tuning Setting :) ]
    torch.Size([5, 1024])
    torch.Size([512, 1024])
    torch.Size([512])
    torch.Size([49152, 512])
    torch.Size([49152])
    total param is 25744896
    webnlg
    tgt_avg:  30.665242718446603
    src_avg:  49.62568654646324
    ratios:  1.6183040519881826
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
      | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> The Aarhus is the airport of Aarhus, Denmark. <|endoftext|>
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
    [50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [1748, 50, 8520]
    
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
      | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> Aarhus Airport serves the city of Aarhus, Denmark. <|endoftext|>
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
    [50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [1748, 50, 8520]
    webnlg
    tgt_avg:  31.644375553587246
    src_avg:  51.023914968999115
    ratios:  1.6124165535386898
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220, 50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
      | Aarhus : leaderName : Jacob_Bundsgaard <|endoftext|> The leader of Aarhus is Jacob Bundsgaard. <|endoftext|>
    [220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220]
    [50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
    [3554, 5376]
    
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220, 50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
      | Aarhus_Airport : runwayLength : 2702.0 <|endoftext|> Aarhus Airport's runway length is 2702.0. <|endoftext|>
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220]
    [50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
    [23443, 24539]
    FORMAT MODE IS  cat
    /home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:309: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.
      FutureWarning,
    09/16/2021 10:22:53 - WARNING - trainer_prefix -   You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
    /home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:1291: FutureWarning: This method is deprecated, use `Trainer.is_world_process_zero()` instead.
      warnings.warn("This method is deprecated, use `Trainer.is_world_process_zero()` instead.", FutureWarning)
    {'state': {}, 'param_groups': [{'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [0, 1, 2]}, {'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [3, 4]}]}
    09/16/2021 10:22:53 - INFO - trainer_prefix -   ***** Running training *****
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Num examples = 18025
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Num Epochs = 5
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Instantaneous batch size per device = 5
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Total train batch size (w. parallel, distributed & accumulation) = 40
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Gradient Accumulation steps = 1
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Total optimization steps = 2255
    Traceback (most recent call last):
      File "run_language_modeling.py", line 1159, in <module>
        main()
      File "run_language_modeling.py", line 993, in main
        trainer.train(model_path=model_path)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 811, in train
        tr_loss += self.training_step(model, inputs)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1174, in training_step
        loss = self.compute_loss(model, inputs, gpt2_model=self.gpt2)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1214, in compute_loss
        outputs = model(**inputs, gpt2_model=gpt2_model)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    RuntimeError: Caught RuntimeError in replica 1 on device 1.
    Original Traceback (most recent call last):
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/train_control.py", line 327, in forward
        return_dict=return_dict, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
        return_dict=return_dict,
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 619, in forward
        inputs_embeds = self.wte(input_ids)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/sparse.py", line 147, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/functional.py", line 1913, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: Input, output and indices must be on the current device
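
    Note that the log above reports n_gpu: 8 and the traceback goes through torch.nn.DataParallel (replica 1 on device 1). One workaround worth trying (an assumption, not an official fix) is to pin the run to a single GPU, as the CUDA_VISIBLE_DEVICES=0 prefix in the command of the next issue does, or equivalently from Python before torch initializes CUDA:

    import os

    # restrict the process to one GPU so the trainer does not wrap the model in DataParallel
    # (hypothetical workaround); must run before torch/transformers touch CUDA
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"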
    
    
    opened by super-buster 4
  • python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory

    Hi,

    I got the following error, which says python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory, when I run the command CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00008 --mode data2text --bsz 10 --seed 101 --tuning_mode prefixtune --cache_dir ./cache

    Has anyone else met this issue, or does anyone know how to deal with it? Thank you so much.

    cat
    True False
    control code is  None
    beam
    Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
    === GENERATED SEQUENCE 1 ===
     name : Zizzi | Type : pub | customer rating : average | near : Burger King <|endoftext|> Zizzi is a pub near Burger King. It has an average customer rating.  <|endoftext|>
    
    cat
    True False
    control code is  None
    beam
    Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
    === GENERATED SEQUENCE 1 ===
     name : Zizzi | Type : pub | customer rating : high | near : Burger King <|endoftext|> Zizzi is a pub near Burger King with a high customer rating.  <|endoftext|>
    
    cat
    True False
    control code is  None
    beam
    Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
    === GENERATED SEQUENCE 1 ===
     name : Zizzi | Type : pub | near : The Sorrento <|endoftext|> Zizzi is a pub near The Sorrento.  <|endoftext|>
    
    /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam_eval
     /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold
     /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam
    python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory
    
    
    Here is my environment configuration:
    Package                Version     Location
    ---------------------- ----------- -------------------------------------------
    absl-py                0.14.1
    cachetools             4.2.4
    certifi                2021.10.8
    charset-normalizer     2.0.7
    click                  8.0.3
    filelock               3.3.0
    future                 0.18.2
    google-auth            1.35.0
    google-auth-oauthlib   0.4.6
    grpcio                 1.41.0
    idna                   3.3
    joblib                 1.1.0
    Markdown               3.3.4
    nltk                   3.6.5
    numpy                  1.21.2
    oauthlib               3.1.1
    packaging              21.0
    Pillow                 8.3.2
    pip                    20.0.2
    pkg-resources          0.0.0
    protobuf               3.18.1
    pyasn1                 0.4.8
    pyasn1-modules         0.2.8
    pyparsing              2.4.7
    pytorch-lightning      0.9.0
    PyYAML                 6.0
    regex                  2021.10.8
    requests               2.26.0
    requests-oauthlib      1.3.0
    rsa                    4.7.2
    sacremoses             0.0.46
    sentencepiece          0.1.96
    setuptools             44.0.0
    six                    1.16.0
    tensorboard            2.2.0
    tensorboard-plugin-wit 1.8.0
    tokenizers             0.8.1rc2
    torch                  1.8.0+cu111
    torchvision            0.9.0+cu111
    tqdm                   4.62.3
    transformers           3.2.0       /data/qbao775/PrefixTuning/transformers/src
    typing-extensions      3.10.0.2
    urllib3                1.26.7
    Werkzeug               2.0.2
    wheel                  0.37.0
    
    opened by 14H034160212 3
  • PyTorch Lightning Version?

    What version of PyTorch Lightning was this built with? I followed the setup instructions to install the requirements, but I keep getting errors from misnamed parameters in the seq2seq module (the gpt-2 module works fine). I can fix the errors as they come up by consulting the current PyTorch Lightning documentation (filepath in the trace should be dirpath, for example), but I'd rather use the code as written instead of manually updating it.

    Traceback (most recent call last): File "finetune.py", line 876, in main(args) File "finetune.py", line 782, in main checkpoint_callback=get_checkpoint_callback(args.output_dir, model.val_metric, args.save_top_k, lower_is_better), #LISA File "/workspace/PrefixTuning/seq2seq/callbacks.py", line 105, in get_checkpoint_callback period=0, # maybe save a checkpoint every time val is run, not just end of epoch. TypeError: init() got an unexpected keyword argument 'filepath'

    opened by ekoenitz 2
  • How to evaluate DART ? The test set may be changed ?

    Hi, Lisa! I read your paper and you have done brilliant work. I want to fine-tune GPT on the DART dataset, but I don't know how to evaluate my results. The official scripts (https://github.com/Yale-LILY) provide a different test set (5,097 samples), which has different references, too. I used your test set (12,552 samples) for generation and evaluated against the target sentences in that test set (the 12,552 samples are aligned, so for each sample I got only 1 reference). However, I can only get a BLEU of about 26.28 (GPT large), much lower than yours. Could you please explain how to evaluate it? Thank you!

    opened by JinliangLu96 2
  • Data preparation step

    Hi @XiangLi1999

    Thanks for releasing your code. I was wondering how to download the "webnlg" dataset? I was not able to find any .json version of the webnlg dataset. Could you please share your data version as well?

    Best, Mohammad

    opened by mmderakhshani 1
  • OSError: [Errno 30] Read-only file system: '/u'

    Running python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 causes the error below on my local PC. I just did the environment setup and installed nothing else. Should I install something else?

    Traceback (most recent call last):
      File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 355, in get_config_dict
        local_files_only=local_files_only,
      File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 719, in cached_path
        local_files_only=local_files_only,
      File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 821, in get_from_cache
        os.makedirs(cache_dir, exist_ok=True)
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
        makedirs(head, exist_ok=exist_ok)
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
        makedirs(head, exist_ok=exist_ok)
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
        makedirs(head, exist_ok=exist_ok)
      [Previous line repeated 4 more times]
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 223, in makedirs
        mkdir(name, mode)
    OSError: [Errno 30] Read-only file system: '/u'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "run_language_modeling.py", line 1159, in <module>
        main()
      File "run_language_modeling.py", line 546, in main
        config = AutoConfig.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir)
      File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_auto.py", line 310, in from_pretrained
        config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 368, in get_config_dict
        raise EnvironmentError(msg)
    OSError: Can't load config for 'gpt2-medium'. Make sure that:

    • 'gpt2-medium' is a correct model identifier listed on 'https://huggingface.co/models'

    • or 'gpt2-medium' is the correct path to a directory containing a config.json file

    opened by wanglec 1
  • TypeError: setup() got an unexpected keyword argument 'stage'

    Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
        self._run(model, ckpt_path=ckpt_path)
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1132, in _run
        self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1432, in _call_setup_hook
        self.call_hook("setup", stage=fn)
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1483, in call_hook
        output = model_fx(*args, **kwargs)
    TypeError: setup() got an unexpected keyword argument 'stage'

    Process finished with exit code 1

    opened by YahooHu 1
  • control code is not used in PrefixTuning.get_prompt()

    Hi, thanks for sharing the codes.

    I have tried the webnlg and data2text tasks with the 'cleaned' branch, but I found that the "control_code" argument is not used in any of the implementations of PrefixTuning.get_prompt(). Does this mean that different categories of the webnlg dataset will use the same soft prompt? I also found that there are get_prompt_p3, get_prompt_p1 and get_prompt_p4 in the master branch. Can I use them to reproduce the results of the paper?

    Thanks.

    opened by XinyuGuan01 3
  • Is it necessary to arrange position ids between [prefix_len, prefix_len+seq_len) ?

    I found that the position ids are in [prefix_len, prefix_len+seq_len) in modeling_gpt2.py:

    position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)

    https://github.com/XiangLi1999/PrefixTuning/blob/6519d30e69b15a180f23e2cd41b766d3f62b8e82/transformers/src/transformers/modeling_gpt2.py#L579

    Is it OK to just put the position ids in [0, seq_len)? I have not found any use of the position embeddings for the prefix matrix.
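
    For concreteness, a small illustration of the two indexing schemes (my own sketch, not code from the repo):

    import torch

    prefix_len, seq_len = 5, 10
    past_length = prefix_len   # past_key_values holds the prefix, so past_length == prefix_len

    # what the quoted line computes for the real tokens
    shifted = torch.arange(past_length, seq_len + past_length)   # tensor([ 5,  6, ..., 14])

    # the alternative asked about
    from_zero = torch.arange(0, seq_len)                         # tensor([0, 1, ..., 9])

    print(shifted, from_zero)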

    opened by baiyuting 2
  • question about the initialization experiment

    Hi, thanks for the great work!

    In Section 7.4, you conduct an initialization experiment with real words. I am just wondering: does this initialization apply to the prompts in every layer, or just the prompts in the first layer? And how does it work together with the re-parameterization method, since the input dimension of the re-parameterization is much smaller?

    I also noticed that in your code, instead of directly prepending prompts to the input of each layer (as described in your paper), what you actually do is append vectors to the key/value matrices directly via the past_key_values argument. Just wondering, how does the initialization experiment work in this setup/implementation? Do you directly initialize the key/value vectors? It seems the dimensions don't match.
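
    For reference, here is a shape sketch of what gets appended per layer in that past_key_values view (my own illustration with GPT2-medium sizes: 24 layers, 16 heads, hidden size 1024):

    import torch

    bsz, n_layer, n_head, n_embd, prefix_len = 4, 24, 16, 1024, 5
    head_dim = n_embd // n_head   # 64

    # one (key, value) pair per layer; the prefix occupies prefix_len slots on the time axis
    prefix_past = [
        (torch.zeros(bsz, n_head, prefix_len, head_dim),   # key states for the prefix
         torch.zeros(bsz, n_head, prefix_len, head_dim))   # value states for the prefix
        for _ in range(n_layer)
    ]
    print(prefix_past[0][0].shape)   # torch.Size([4, 16, 5, 64])

    # a single word embedding is a 1024-dim vector, so it would have to be projected or
    # reshaped to (n_head, head_dim) = (16, 64) per layer before it could fill these slots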

    Thanks!

    opened by Tsingularity 0
  • notation typo in the paper

    Hi, thanks for the great work!

    I noticed one notation typo in the paper. It's in footnote 4 of page 5. The second P_theta should be P_theta'.

    Hope this could help.

    opened by Tsingularity 0
  • Should've mentioned about "CRITICAL" modifications done in transformers source code

    Thanks for releasing your work publicly. I really appreciate your simple yet parameter-efficient method for tuning PLMs.

    In fact, I had a hard time re-implementing your original experiment. Until I realized that you had modified modeling_gpt2.py / GPT2LMHeadModel.prepare_inputs_for_generation() (and perhaps made small modifications in generation_utils.py), the results were truly mysterious.

    The function mentioned above is necessary to make this method actually work: it preserves the past_key_values that are passed in. Otherwise, the PLM will not incorporate the learned prefix embedding during generation.

    It was a really painful process to track this down. You hinted at modifications to the data collators, but not at the generation part of transformers, which is a critical part of the implementation. Meh 😕.

    Hope this helps other visitors.
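
    To make that concrete, here is a minimal sketch of the kind of override involved (my own illustration against a recent Hugging Face transformers 4.x API, not the actual modification in this repo's modeling_gpt2.py / generation_utils.py):

    import torch
    from transformers import GPT2LMHeadModel

    class PrefixAwareGPT2(GPT2LMHeadModel):
        # sketch: keep a learned prefix in the key/value cache for the whole of generate()

        def set_prefix(self, prefix_past):   # tuple of per-layer (key, value) prefix tensors
            self._prefix_past = prefix_past

        def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
            attention_mask = kwargs.get("attention_mask")
            if past_key_values is None:
                # first decoding step: start from the learned prefix instead of an empty cache
                past_key_values = getattr(self, "_prefix_past", None)
            else:
                # later steps: the cache already holds prefix + previously seen tokens
                input_ids = input_ids[:, -1:]
            if attention_mask is not None and past_key_values is not None:
                # generate() builds the mask without knowing about the prefix slots, so pad it
                missing = past_key_values[0][0].shape[-2] + input_ids.shape[1] - attention_mask.shape[1]
                if missing > 0:
                    pad = attention_mask.new_ones(attention_mask.shape[0], missing)
                    attention_mask = torch.cat([pad, attention_mask], dim=-1)
            return {"input_ids": input_ids, "past_key_values": past_key_values,
                    "attention_mask": attention_mask, "use_cache": True}

    Something like model = PrefixAwareGPT2.from_pretrained("gpt2-medium"); model.set_prefix(prefix_past); model.generate(input_ids) would then keep the learned prefix in play at every decoding step.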

    opened by sonsus 0