Prefix-Tuning: Optimizing Continuous Prompts for Generation

Overview

Files:

.
├── gpt2                          # Code for GPT2 style autoregressive LM
│   ├── train_e2e.py              # high-level scripts to train.
│   ├── train_control.py          # code that implements prefix-tuning.
│   ├── trainer_prefix.py         # trainer code for the training loop. 
│   ├── run_language_modeling.py  # training code (contains data loading, model loading, and calls trainer)
│   ├── gen.py                    # high-level scripts to decode. 
│   └── run_generation.py         # decoding code. 
│
├── seq2seq                       # Code for encoder-decoder architecture
│   ├── train_bart.py             # high-level scripts to train.
│   ├── prefixTuning.py           # code that implements prefix-tuning.
│   ├── finetune.py               # training code (contains data loading, model loading, and calls trainer)   
│   ├── lightning_base.py         # helper code
│   ├── utils.py                  # helper code
│   └── callbacks.py              # helper code
└── ...

The code for the GPT-2 style autoregressive LM is in gpt2/; this corresponds to the table-to-text experiments in the paper.

The code for encoder-decoder architectures such as BART is in seq2seq/; this corresponds to the summarization experiments in the paper.

The two primary scripts used to run the code are gpt2/train_e2e.py (for table-to-text) and seq2seq/train_bart.py (for summarization). They are set to good default hyperparameters, and can also be used for hyperparameter tuning. :)


Setup:

cd transformers; pip install -e .


Train via prefix-tuning:

cd gpt2;

python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800
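
For intuition, here is a minimal sketch of the parameterization that --preseqlen and --mid_dim control (my own illustration with GPT2-medium sizes, not the repo's actual train_control.py / prefixTuning.py code): a short sequence of trainable prefix vectors is mapped through a small MLP to per-layer key/value states that the frozen LM consumes as past_key_values.

import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    # Illustrative prefix module: preseqlen trainable vectors -> per-layer key/value states.
    def __init__(self, preseqlen=5, mid_dim=512, n_layer=24, n_head=16, n_embd=1024):
        super().__init__()
        self.n_layer, self.n_head, self.n_embd = n_layer, n_head, n_embd
        self.register_buffer("prefix_ids", torch.arange(preseqlen))
        self.wte = nn.Embedding(preseqlen, n_embd)      # trainable prefix embeddings
        self.mlp = nn.Sequential(                       # reparameterization, width = --mid_dim
            nn.Linear(n_embd, mid_dim),
            nn.Tanh(),
            nn.Linear(mid_dim, n_layer * 2 * n_embd),   # one key and one value per layer
        )

    def forward(self, bsz):
        prefix = self.wte(self.prefix_ids).unsqueeze(0).expand(bsz, -1, -1)
        kv = self.mlp(prefix)                           # (bsz, preseqlen, n_layer*2*n_embd)
        kv = kv.view(bsz, -1, self.n_layer * 2, self.n_head, self.n_embd // self.n_head)
        # -> n_layer chunks of shape (2, bsz, n_head, preseqlen, head_dim)
        return kv.permute(2, 0, 3, 1, 4).split(2)

past_key_values = PrefixEncoder()(bsz=5)
print(len(past_key_values), past_key_values[0].shape)   # 24 torch.Size([2, 5, 16, 5, 64])

With these default sizes the trainable tensors come out as 5x1024, 512x1024, 512, 49152x512 and 49152 (about 25.7M parameters), which lines up with the shapes printed in the training log quoted in one of the issues below; the underlying GPT-2 weights stay frozen.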

Other baseline approaches

cd gpt2;

python train_e2e.py --tuning_mode {finetune/adaptertune} --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_bart.py --tuning_mode finetune --epoch 5 --learning_rate 0.00005 --mode xsum --bsz 5 --seed 101

Decode:

cd gpt2;

python gen.py {data2text/webnlg/...} yes test {checkpoint_path} no

cd seq2seq;

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training}
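
At decode time the learned prefix is handed to the frozen LM as past_key_values, so every generated token attends to it. Below is a minimal greedy-decoding sketch of that idea (my own illustration against a recent Hugging Face transformers API, with a randomly faked prefix; it is not the repo's gen.py / run_generation.py):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-medium")
lm = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

# Fake prefix shaped like GPT2-medium key/value states; a real run would build this
# from the trained prefix module loaded from {checkpoint_path}.
preseqlen, n_layer, n_head, head_dim = 5, 24, 16, 64
prefix_past = tuple(
    (torch.randn(1, n_head, preseqlen, head_dim),
     torch.randn(1, n_head, preseqlen, head_dim))
    for _ in range(n_layer)
)

src = ' | Aarhus_Airport : cityServed : "Aarhus, Denmark" '
generated = tok.encode(src, return_tensors="pt")

with torch.no_grad():
    # first pass: attend to the 5 prefix slots plus the linearized table
    attn = torch.ones(1, preseqlen + generated.shape[1], dtype=torch.long)
    out = lm(input_ids=generated, past_key_values=prefix_past, attention_mask=attn)
    for _ in range(30):
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=-1)
        # the attention mask must always cover the prefix slots plus all tokens seen so far
        attn = torch.ones(1, preseqlen + generated.shape[1], dtype=torch.long)
        out = lm(input_ids=next_id, past_key_values=out.past_key_values, attention_mask=attn)

print(tok.decode(generated[0]))

With a random prefix the output is of course gibberish; the point is only where the prefix enters the computation.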

For details of the methods and results, please refer to our paper.

@misc{li2021prefixtuning,
      title={Prefix-Tuning: Optimizing Continuous Prompts for Generation}, 
      author={Xiang Lisa Li and Percy Liang},
      year={2021},
      eprint={2101.00190},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Comments
  • XSUM dataset differences with original

    Hello, you shared the xsum dataset link here: https://github.com/XiangLi1999/PrefixTuning/issues/2

    However, I see from the CodaLab bundle https://worksheets.codalab.org/bundles/0x58f85171b43f4e61bf411c35faab369d and from the hyperparameters/data directory in https://worksheets.codalab.org/bundles/0xa3f0cd3c10c7490ab508a351968cbdcf that you used the xsum_news data. When I checked xsum_news, I found that the validation file has 7,186 examples, whereas the original dataset has 11,327. The test set also differs: 11,333 examples in xsum_news vs. 20,418 in the original xsum.

    I was wondering if you could explain the differences in eval/test dataset sizes compared to the original and perhaps provide your script for preprocessing the original xsum.

    Thanks!

    opened by jpilaul 16
  • OOM error

    Hi, I tried the seq2seq prefix-tuning and got:

    RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 15.90 GiB total capacity; 4.63 GiB already allocated; 797.50 MiB free; 5.81 GiB reserved in total by PyTorch)

    I ran the experiment on a 16GB GPU. Am I supposed to use a 32GB GPU instead? Thanks!

    opened by taineleau 9
  • About the training speed verification

    Hi Lisa~ I reimplemented your BART code on the newest Hugging Face Transformers, and I want to verify one thing: in my training runs, prefix-tuning runs at only about 60%~70% of the speed of full-parameter fine-tuning, even when I use a very, very small prefix prompt module. Does that make sense, and where might the speed bottleneck be? Hoping for your reply.

    opened by Timothyxxx 6
  • Model save not working

    There are a few checkpoint_callbacks being created in lightning_base.py, and I think that the callback on line https://github.com/XiangLi1999/PrefixTuning/blob/cleaned/seq2seq/lightning_base.py#L749 does not allow us to save the model. I am rerunning the model right now to verify without that line. However, since it takes a long time to train, I was hoping you could help me fix model saving. Thanks

    opened by jpilaul 6
  • RuntimeError: Input, output and indices must be on the current device

    Hi, I met a RuntimeError when training a prefix model. Do you have any suggestions?

    Here is the environment:
    certifi (2021.5.30)
    charset-normalizer (2.0.4)
    click (8.0.1)
    dataclasses (0.8)
    filelock (3.0.12)
    idna (3.2)
    importlib-metadata (4.8.1)
    itsdangerous (2.0.1)
    Jinja2 (3.0.1)
    joblib (1.0.1)
    MarkupSafe (2.0.1)
    nltk (3.6.2)
    numpy (1.19.5)
    packaging (21.0)
    Pillow (8.3.2)
    pip (9.0.3)
    pyparsing (2.4.7)
    Python-dev (2.0.0.dev0)
    regex (2021.8.28)
    requests (2.26.0)
    sacremoses (0.0.45)
    sentencepiece (0.1.96)
    setuptools (39.2.0)
    six (1.16.0)
    tokenizers (0.8.1rc2)
    torch (1.8.0+cu111)
    torchvision (0.9.0+cu111)
    tqdm (4.62.2)
    transformers (3.2.0, /home/yanzhongxiang/PrefixTuning/transformers/src)
    typing-extensions (3.10.0.2)
    urllib3 (1.26.6)
    Werkzeug (2.0.1)
    zipp (3.5.0)
    

    Here is the command line: python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache

    Here is the error information:

    webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
    python run_language_modeling.py         --output_dir=webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --model_type=gpt2         --model_name_or_path=gpt2-medium         --tokenizer_name=gpt2-medium         --per_device_train_batch_size 5         --per_device_eval_batch_size 5         --save_steps 500000         --num_train_epochs 5         --do_train         --train_data_file=../data/webnlg_challenge_2017/train.json         --do_eval         --line_by_line         --save_total_limit 1         --overwrite_output_dir         --task_mode webnlg         --eval_data_file=../data/webnlg_challenge_2017/dev.json          --tuning_mode prefixtune --logging_dir webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --train_embs no --optim_prefix yes --preseqlen 5 --prefix_mode activation --format_mode cat --gradient_accumulation_steps 1 --learning_rate 5e-05 --weight_decay 0.0 --seed 101 --disable_tqdm --mid_dim 512 --init_random no --use_dropout no --prefix_dropout 0.0 --objective_mode 1 --evaluate_during_training --eval_steps 5000  --cache_dir cache/gpt2-medium-s3 
    /home/yanzhongxiang/PrefixTuning/transformers/src/transformers/__init__.py
    /home/yanzhongxiang/PrefixTuning/transformers/src/transformers/training_args.py:299: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
      FutureWarning,
    09/16/2021 10:22:04 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False
    09/16/2021 10:22:04 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=True, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=5, per_device_eval_batch_size=5, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, warmup_steps=0, logging_dir='webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', logging_first_step=False, logging_steps=500, save_steps=500000, save_total_limit=1, no_cuda=False, seed=101, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=5000, dataloader_num_workers=0, past_index=-1, run_name=None, disable_tqdm=True, remove_unused_columns=True, label_names=None)
    objective is 1
    False
    /home/yanzhongxiang/PrefixTuning/transformers/src/transformers/tokenization_utils_base.py:1324: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
      FutureWarning,
    prefixtune
    adapting the size of the model embedding to include [PAD]
    len(tokenizer) =  50257
    len(tokenizer) =  50258
    <|endoftext|> 50256
    <|endoftext|> 50256
    loading the prefix model from  None
    training the prefix model from scratch. 
    under the PrefixTuning model
    PrefixTuning
    preseqlen is 5, optimizing the prefix directly
    [Full prefix-tuning Setting :) ]
    torch.Size([5, 1024])
    torch.Size([512, 1024])
    torch.Size([512])
    torch.Size([49152, 512])
    torch.Size([49152])
    total param is 25744896
    webnlg
    tgt_avg:  30.665242718446603
    src_avg:  49.62568654646324
    ratios:  1.6183040519881826
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
      | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> The Aarhus is the airport of Aarhus, Denmark. <|endoftext|>
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
    [50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [1748, 50, 8520]
    
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
      | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> Aarhus Airport serves the city of Aarhus, Denmark. <|endoftext|>
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
    [50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
    [1748, 50, 8520]
    webnlg
    tgt_avg:  31.644375553587246
    src_avg:  51.023914968999115
    ratios:  1.6124165535386898
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220, 50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
      | Aarhus : leaderName : Jacob_Bundsgaard <|endoftext|> The leader of Aarhus is Jacob Bundsgaard. <|endoftext|>
    [220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220]
    [50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
    [3554, 5376]
    
    [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220, 50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
      | Aarhus_Airport : runwayLength : 2702.0 <|endoftext|> Aarhus Airport's runway length is 2702.0. <|endoftext|>
    [220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220]
    [50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
    [23443, 24539]
    FORMAT MODE IS  cat
    /home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:309: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.
      FutureWarning,
    09/16/2021 10:22:53 - WARNING - trainer_prefix -   You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
    /home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:1291: FutureWarning: This method is deprecated, use `Trainer.is_world_process_zero()` instead.
      warnings.warn("This method is deprecated, use `Trainer.is_world_process_zero()` instead.", FutureWarning)
    {'state': {}, 'param_groups': [{'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [0, 1, 2]}, {'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [3, 4]}]}
    09/16/2021 10:22:53 - INFO - trainer_prefix -   ***** Running training *****
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Num examples = 18025
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Num Epochs = 5
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Instantaneous batch size per device = 5
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Total train batch size (w. parallel, distributed & accumulation) = 40
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Gradient Accumulation steps = 1
    09/16/2021 10:22:53 - INFO - trainer_prefix -     Total optimization steps = 2255
    Traceback (most recent call last):
      File "run_language_modeling.py", line 1159, in <module>
        main()
      File "run_language_modeling.py", line 993, in main
        trainer.train(model_path=model_path)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 811, in train
        tr_loss += self.training_step(model, inputs)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1174, in training_step
        loss = self.compute_loss(model, inputs, gpt2_model=self.gpt2)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1214, in compute_loss
        outputs = model(**inputs, gpt2_model=gpt2_model)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    RuntimeError: Caught RuntimeError in replica 1 on device 1.
    Original Traceback (most recent call last):
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/gpt2/train_control.py", line 327, in forward
        return_dict=return_dict, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
        return_dict=return_dict,
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 619, in forward
        inputs_embeds = self.wte(input_ids)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/sparse.py", line 147, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/functional.py", line 1913, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: Input, output and indices must be on the current device
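
    Note that the log above reports n_gpu: 8 and the traceback goes through torch.nn.DataParallel (replica 1 on device 1). One workaround worth trying (an assumption, not an official fix) is to pin the run to a single GPU, as the CUDA_VISIBLE_DEVICES=0 prefix in the command of the next issue does, or equivalently from Python before torch initializes CUDA:

    import os

    # restrict the process to one GPU so the trainer does not wrap the model in DataParallel
    # (hypothetical workaround); must run before torch/transformers touch CUDA
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"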
    
    
    opened by super-buster 4
  • python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory

    Hi,

    I got the following error, which says python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory, when I run the command CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00008 --mode data2text --bsz 10 --seed 101 --tuning_mode prefixtune --cache_dir ./cache

    Has anyone else met this issue, or does anyone know how to deal with it? Thank you so much.

    cat
    True False
    control code is  None
    beam
    Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
    === GENERATED SEQUENCE 1 ===
     name : Zizzi | Type : pub | customer rating : average | near : Burger King <|endoftext|> Zizzi is a pub near Burger King. It has an average customer rating.  <|endoftext|>
    
    cat
    True False
    control code is  None
    beam
    Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
    === GENERATED SEQUENCE 1 ===
     name : Zizzi | Type : pub | customer rating : high | near : Burger King <|endoftext|> Zizzi is a pub near Burger King with a high customer rating.  <|endoftext|>
    
    cat
    True False
    control code is  None
    beam
    Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
    === GENERATED SEQUENCE 1 ===
     name : Zizzi | Type : pub | near : The Sorrento <|endoftext|> Zizzi is a pub near The Sorrento.  <|endoftext|>
    
    /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam_eval
     /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold
     /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam
    python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory
    
    
    Here is my environment configuration:
    Package                Version     Location
    ---------------------- ----------- -------------------------------------------
    absl-py                0.14.1
    cachetools             4.2.4
    certifi                2021.10.8
    charset-normalizer     2.0.7
    click                  8.0.3
    filelock               3.3.0
    future                 0.18.2
    google-auth            1.35.0
    google-auth-oauthlib   0.4.6
    grpcio                 1.41.0
    idna                   3.3
    joblib                 1.1.0
    Markdown               3.3.4
    nltk                   3.6.5
    numpy                  1.21.2
    oauthlib               3.1.1
    packaging              21.0
    Pillow                 8.3.2
    pip                    20.0.2
    pkg-resources          0.0.0
    protobuf               3.18.1
    pyasn1                 0.4.8
    pyasn1-modules         0.2.8
    pyparsing              2.4.7
    pytorch-lightning      0.9.0
    PyYAML                 6.0
    regex                  2021.10.8
    requests               2.26.0
    requests-oauthlib      1.3.0
    rsa                    4.7.2
    sacremoses             0.0.46
    sentencepiece          0.1.96
    setuptools             44.0.0
    six                    1.16.0
    tensorboard            2.2.0
    tensorboard-plugin-wit 1.8.0
    tokenizers             0.8.1rc2
    torch                  1.8.0+cu111
    torchvision            0.9.0+cu111
    tqdm                   4.62.3
    transformers           3.2.0       /data/qbao775/PrefixTuning/transformers/src
    typing-extensions      3.10.0.2
    urllib3                1.26.7
    Werkzeug               2.0.2
    wheel                  0.37.0
    
    opened by 14H034160212 3
  • PyTorch Lightning Version?

    What version of PyTorch Lightning was this built with? I followed the setup instructions to install the requirements, but I keep getting errors from misnamed parameters in the seq2seq module (the gpt-2 module works fine). I can fix the errors as they come up by consulting the current PyTorch Lightning documentation (filepath in the trace should be dirpath, for example), but I'd rather use the code as written instead of manually updating it.

    Traceback (most recent call last): File "finetune.py", line 876, in main(args) File "finetune.py", line 782, in main checkpoint_callback=get_checkpoint_callback(args.output_dir, model.val_metric, args.save_top_k, lower_is_better), #LISA File "/workspace/PrefixTuning/seq2seq/callbacks.py", line 105, in get_checkpoint_callback period=0, # maybe save a checkpoint every time val is run, not just end of epoch. TypeError: init() got an unexpected keyword argument 'filepath'

    opened by ekoenitz 2
  • How to evaluate DART ? The test set may be changed ?

    Hi, Lisa! I read your paper and you have done brilliant work. I want to fine-tune GPT on the DART dataset, but I don't know how to evaluate my results. The official scripts (https://github.com/Yale-LILY) provide a different test set (5,097 samples), which has different references, too. I used your test set (12,552 samples) for generation and evaluated against the target sentences in that test set (the 12,552 samples are aligned, so for each sample I got only 1 reference). However, I can only get a BLEU of about 26.28 (GPT large), much lower than yours. Could you please explain how to evaluate it? Thank you!

    opened by JinliangLu96 2
  • Data preparation step

    Hi @XiangLi1999

    Thanks for releasing your code. I was wondering how to download the "webnlg" dataset? I was not able to find any .json version of the webnlg dataset. Could you please share your data version as well?

    Best, Mohammad

    opened by mmderakhshani 1
  • OSError: [Errno 30] Read-only file system: '/u'

    Running python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 causes the error below on my local PC. I just did the environment setup and installed nothing else. Should I install something else?

    Traceback (most recent call last):
      File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 355, in get_config_dict
        local_files_only=local_files_only,
      File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 719, in cached_path
        local_files_only=local_files_only,
      File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 821, in get_from_cache
        os.makedirs(cache_dir, exist_ok=True)
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
        makedirs(head, exist_ok=exist_ok)
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
        makedirs(head, exist_ok=exist_ok)
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
        makedirs(head, exist_ok=exist_ok)
      [Previous line repeated 4 more times]
      File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 223, in makedirs
        mkdir(name, mode)
    OSError: [Errno 30] Read-only file system: '/u'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "run_language_modeling.py", line 1159, in <module>
        main()
      File "run_language_modeling.py", line 546, in main
        config = AutoConfig.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir)
      File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_auto.py", line 310, in from_pretrained
        config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 368, in get_config_dict
        raise EnvironmentError(msg)
    OSError: Can't load config for 'gpt2-medium'. Make sure that:

    • 'gpt2-medium' is a correct model identifier listed on 'https://huggingface.co/models'

    • or 'gpt2-medium' is the correct path to a directory containing a config.json file

    opened by wanglec 1
  • TypeError: setup() got an unexpected keyword argument 'stage'

    Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
        self._run(model, ckpt_path=ckpt_path)
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1132, in _run
        self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1432, in _call_setup_hook
        self.call_hook("setup", stage=fn)
      File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1483, in call_hook
        output = model_fx(*args, **kwargs)
    TypeError: setup() got an unexpected keyword argument 'stage'

    Process finished with exit code 1

    opened by YahooHu 1
  • control code is not used in PrefixTuning.get_prompt()

    Hi, thanks for sharing the codes.

    I have tried the webnlg and data2text tasks with the 'cleaned' branch, but I found that the "control_code" argument is not used in any of the implementations of PrefixTuning.get_prompt(). Does this mean that different categories of the webnlg dataset will use the same soft prompt? I also found that there are get_prompt_p3, get_prompt_p1 and get_prompt_p4 in the master branch. Can I use them to reproduce the results of the paper?

    Thanks.

    opened by XinyuGuan01 3
  • Is it necessary to arrange position ids between [prefix_len, prefix_len+seq_len) ?

    I found that the position ids are in [prefix_len, prefix_len+seq_len) in modeling_gpt2.py:

    position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)

    https://github.com/XiangLi1999/PrefixTuning/blob/6519d30e69b15a180f23e2cd41b766d3f62b8e82/transformers/src/transformers/modeling_gpt2.py#L579

    Is it OK to just put the position ids in [0, seq_len)? I have not found any use of the position embeddings for the prefix matrix.
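
    For concreteness, a small illustration of the two indexing schemes (my own sketch, not code from the repo):

    import torch

    prefix_len, seq_len = 5, 10
    past_length = prefix_len   # past_key_values holds the prefix, so past_length == prefix_len

    # what the quoted line computes for the real tokens
    shifted = torch.arange(past_length, seq_len + past_length)   # tensor([ 5,  6, ..., 14])

    # the alternative asked about
    from_zero = torch.arange(0, seq_len)                         # tensor([0, 1, ..., 9])

    print(shifted, from_zero)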

    opened by baiyuting 2
  • question about the initialization experiment

    Hi, thanks for the great work!

    In Section 7.4, you conduct an initialization experiment with real words. I am just wondering: does this initialization apply to the prompts in every layer, or just the prompts in the first layer? And how does it work together with the re-parameterization method, since the input dimension of the re-parameterization is much smaller?

    I also noticed that in your code, instead of directly prepending prompts to the input of each layer (as described in your paper), what you actually do is append vectors to the key/value matrices directly via the past_key_values argument. Just wondering, how does the initialization experiment work in this setup/implementation? Do you directly initialize the key/value vectors? It seems the dimensions don't match.
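
    For reference, here is a shape sketch of what gets appended per layer in that past_key_values view (my own illustration with GPT2-medium sizes: 24 layers, 16 heads, hidden size 1024):

    import torch

    bsz, n_layer, n_head, n_embd, prefix_len = 4, 24, 16, 1024, 5
    head_dim = n_embd // n_head   # 64

    # one (key, value) pair per layer; the prefix occupies prefix_len slots on the time axis
    prefix_past = [
        (torch.zeros(bsz, n_head, prefix_len, head_dim),   # key states for the prefix
         torch.zeros(bsz, n_head, prefix_len, head_dim))   # value states for the prefix
        for _ in range(n_layer)
    ]
    print(prefix_past[0][0].shape)   # torch.Size([4, 16, 5, 64])

    # a single word embedding is a 1024-dim vector, so it would have to be projected or
    # reshaped to (n_head, head_dim) = (16, 64) per layer before it could fill these slots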

    Thanks!

    opened by Tsingularity 0
  • notation typo in the paper

    Hi, thanks for the great work!

    I noticed one notation typo in the paper. It's in footnote 4 of page 5. The second P_theta should be P_theta'.

    Hope this could help.

    opened by Tsingularity 0
  • Should've mentioned about "CRITICAL" modifications done in transformers source code

    Thanks for releasing your work publicly. I really appreciate your simple yet parameter-efficient method for tuning PLMs.

    In fact, I had a hard time re-implementing your original experiment. Until I realized that you had modified modeling_gpt2.py / GPT2LMHeadModel.prepare_inputs_for_generation() (and perhaps made small modifications in generation_utils.py), the results were truly mysterious.

    The function mentioned above is necessary to make this method actually work: it preserves the past_key_values that are passed in. Otherwise, the PLM will not incorporate the learned prefix embedding during generation.

    It was a really painful process to track this down. You hinted at modifications to the data collators, but not at the generation part of transformers, which is a critical part of the implementation. Meh 😕.

    Hope this helps other visitors.
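
    To make that concrete, here is a minimal sketch of the kind of override involved (my own illustration against a recent Hugging Face transformers 4.x API, not the actual modification in this repo's modeling_gpt2.py / generation_utils.py):

    import torch
    from transformers import GPT2LMHeadModel

    class PrefixAwareGPT2(GPT2LMHeadModel):
        # sketch: keep a learned prefix in the key/value cache for the whole of generate()

        def set_prefix(self, prefix_past):   # tuple of per-layer (key, value) prefix tensors
            self._prefix_past = prefix_past

        def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
            attention_mask = kwargs.get("attention_mask")
            if past_key_values is None:
                # first decoding step: start from the learned prefix instead of an empty cache
                past_key_values = getattr(self, "_prefix_past", None)
            else:
                # later steps: the cache already holds prefix + previously seen tokens
                input_ids = input_ids[:, -1:]
            if attention_mask is not None and past_key_values is not None:
                # generate() builds the mask without knowing about the prefix slots, so pad it
                missing = past_key_values[0][0].shape[-2] + input_ids.shape[1] - attention_mask.shape[1]
                if missing > 0:
                    pad = attention_mask.new_ones(attention_mask.shape[0], missing)
                    attention_mask = torch.cat([pad, attention_mask], dim=-1)
            return {"input_ids": input_ids, "past_key_values": past_key_values,
                    "attention_mask": attention_mask, "use_cache": True}

    Something like model = PrefixAwareGPT2.from_pretrained("gpt2-medium"); model.set_prefix(prefix_past); model.generate(input_ids) would then keep the learned prefix in play at every decoding step.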

    opened by sonsus 0