LoRA: Low-Rank Adaptation of Large Language Models
This repo contains an implementation of LoRA for GPT-2, along with the steps to replicate the results in our recent paper
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu*, Yelong Shen*, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Weizhu Chen
Paper: https://arxiv.org/abs/2106.09685
LoRA reduces the number of trainable parameters by learning pairs of rank-decomposition matrices while freezing the original weights. This vastly reduces the storage requirement for large language models adapted to specific tasks and enables efficient task-switching during deployment without introducing inference latency. LoRA also outperforms several other adaptation methods, including prefix-tuning and fine-tuning.
This repo reproduces our experiments on GPT-2.
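For intuition, below is a minimal, self-contained PyTorch sketch of the idea applied to a plain linear layer: the pretrained weight W0 stays frozen, and only the rank-r pair of matrices A and B is trained, with the update scaled by alpha / r. The class name LoRALinear and its defaults are illustrative only; they are not the implementation in src/, which applies the same idea inside the GPT-2 model.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA layer: a frozen dense weight plus a trainable low-rank update."""
    def __init__(self, in_features, out_features, r=4, lora_alpha=32, lora_dropout=0.1):
        super().__init__()
        # Pretrained weight W0: frozen, never updated during adaptation.
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # Rank-decomposition pair: A (r x in) and B (out x r); only these are trained.
        # B starts at zero, so the adapted model equals the pretrained one at step 0.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.02)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha / r
        self.dropout = nn.Dropout(lora_dropout)

    def forward(self, x):
        # y = x W0^T + (alpha / r) * dropout(x) A^T B^T
        return x @ self.weight.T + self.scaling * (self.dropout(x) @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(1024, 1024)  # 1024 = GPT-2 Medium hidden size
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # ['lora_A', 'lora_B'] -- the frozen W0 is not trained
Only lora_A and lora_B need to be stored per task, which is what makes task-switching cheap at deployment time.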
Repository Overview
Our implementation is based on the fine-tuning code for GPT-2 in Hugging Face. There are several directories in this repo:
- src/ contains the source code used for data processing, training, and decoding.
- eval/ contains the code for task-specific evaluation scripts.
- data/ contains the raw data we used in our experiments.
- vocab/ contains the GPT-2 vocabulary files.
Getting Started
- You can start with the following Docker image on a GPU-capable machine (any generic PyTorch image should work; you may need to pass --gpus all to docker run so the GPUs are visible inside the container):
nvcr.io/nvidia/pytorch:20.03-py3
docker run -it nvcr.io/nvidia/pytorch:20.03-py3
- Clone the repo and install dependencies in a virtual environment (remove sudo if running in a Docker container):
sudo apt-get update
sudo apt-get -y install git jq virtualenv
git clone https://github.com/microsoft/LoRA.git; cd LoRA
virtualenv -p `which python3` ./venv
. ./venv/bin/activate
pip install -r requirement.txt
bash download_pretrained_checkpoints.sh
bash create_datasets.sh
cd ./eval
bash download_evalscript.sh
cd ..
Now we are ready to replicate the results in our paper.
Replicating Our Result on E2E
- Train GPT-2 Medium with LoRA (see our paper for the hyperparameters used for GPT-2 Medium; the LoRA-specific flags are explained in the short note after the command):
python -m torch.distributed.launch --nproc_per_node=1 src/gpt2_ft.py \
--train_data ./data/e2e/train.jsonl \
--valid_data ./data/e2e/valid.jsonl \
--train_batch_size 2 \
--grad_acc 1 \
--valid_batch_size 1 \
--seq_len 512 \
--model_card gpt2.md \
--init_checkpoint ./pretrained_checkpoints/gpt2-medium-pytorch_model.bin \
--platform local \
--clip 0.0 \
--lr 0.0002 \
--weight_decay 0.01 \
--correct_bias \
--adam_beta2 0.999 \
--scheduler linear \
--warmup_step 500 \
--max_epoch 5 \
--save_interval 1000 \
--lora_dim 4 \
--lora_alpha 32 \
--lora_dropout 0.1 \
--label_smooth 0.1 \
--work_dir ./trained_models/GPT2_M/e2e \
--random_seed 110
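The LoRA-specific flags above correspond to the quantities in the sketch from the introduction; the remaining flags are ordinary fine-tuning hyperparameters (optimizer, learning-rate schedule, batching, checkpointing). Reusing the illustrative LoRALinear class from above (our own sketch, not the repo's internal API):
layer = LoRALinear(1024, 1024, r=4, lora_alpha=32, lora_dropout=0.1)
# --lora_dim 4       -> rank r of the rank-decomposition pair A, B
# --lora_alpha 32    -> scaling numerator; the low-rank update is scaled by alpha / r = 32 / 4 = 8
# --lora_dropout 0.1 -> dropout applied only on the low-rank path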
- Generate outputs from the trained model using beam search:
python -m torch.distributed.launch --nproc_per_node=1 src/gpt2_beam.py \
--data ./data/e2e/test.jsonl \
--batch_size 1 \
--seq_len 512 \
--eval_len 64 \
--model_card gpt2.md \
--init_checkpoint ./trained_models/GPT2_M/e2e/model.20000.pt \
--platform local \
--lora_dim 4 \
--lora_alpha 32 \
--beam 10 \
--length_penalty 0.8 \
--no_repeat_ngram_size 4 \
--repetition_penalty 1.0 \
--eos_token_id 628 \
--work_dir ./trained_models/GPT2_M/e2e \
--output_file predict.20000.b10p08.jsonl
- Decode outputs from step (2)
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/e2e/predict.20000.b10p08.jsonl \
--input_file ./data/e2e/test_formatted.jsonl \
--output_ref_file e2e_ref.txt \
--output_pred_file e2e_pred.txt
- Run evaluation on E2E test set
python eval/e2e/measure_scores.py e2e_ref.txt e2e_pred.txt -p
Replicating Our Result on WebNLG
- Follow steps 1 and 2 from the E2E pipeline, replacing references to E2E with webnlg (see our paper for the hyperparameters)
- Decode outputs from beam search (step 2 above)
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/webnlg/predict.20000.b10p08.jsonl \
--input_file ./data/webnlg_challenge_2017/test_formatted.jsonl \
--ref_type webnlg \
--ref_num 6 \
--output_ref_file eval/GenerationEval/data/references_webnlg \
--output_pred_file eval/GenerationEval/data/hypothesis_webnlg \
--tokenize --lower
- Run evaluation on WebNLG test set
cd ./eval/GenerationEval/
python eval.py \
-R data/references_webnlg/reference \
-H data/hypothesis_webnlg \
-nr 6 \
-m bleu,meteor,ter
cd ../..
Replicating Our Result on DART
- Follow steps 1 and 2 from the E2E pipeline, replacing references to E2E with dart (see our paper for the hyperparameters)
- Decode outputs from beam search (step 2 above)
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/dart/predict.20000.b10p08.jsonl \
--input_file ./data/dart/test_formatted.jsonl \
--ref_type dart \
--ref_num 6 \
--output_ref_file eval/GenerationEval/data/references_dart \
--output_pred_file eval/GenerationEval/data/hypothesis_dart \
--tokenize --lower
- Run evaluation on DART test set
cd ./eval/GenerationEval/
python eval.py \
-R data/references_dart/reference \
-H data/hypothesis_dart \
-nr 6 \
-m bleu,meteor,ter
cd ../..
Acknowledgements
We thank, in alphabetical order, Jianfeng Gao, Jade Huang, Jiayuan Huang, Lisa Xiang Li, Xiaodong Liu, Yabin Liu, Benjamin Van Durme, Luis Vargas, Haoran Wei, Peter Welinder, and Greg Yang for providing valuable feedback.
Contact
Please contact us if you have any questions.
- Edward Hu ([email protected])
- Yelong Shen ([email protected])
- Phillip Wallis ([email protected])
- Weizhu Chen ([email protected])
Citation
@misc{hu2021lora,
title={LoRA: Low-Rank Adaptation of Large Language Models},
author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Weizhu Chen},
year={2021},
eprint={2106.09685},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.