🛠️ Tools for Transformers compression using Lightning ⚡

Jules Belveze

Last update: Dec 11, 2022

Related tags

Deep Learning nlp transformers lstm pruning quantization bert distillation pytorch-lightning

Overview

Bert-squeeze

Bert-squeeze is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their latency at inference time.

It gathers a non-exhaustive list of techniques such as distillation, pruning, quantization, and early exiting. The repo is written using PyTorch Lightning and Transformers.

About the project

As a heavy user of transformer-based models (which are truly amazing in my opinion), I have always struggled to put these heavy models into production with decent inference speed. There are, of course, a number of existing libraries to optimize and compress transformer-based models (ONNX, distiller, compressors, KD_Lib, ...).
I started this project because I needed to reduce the latency of models that integrate transformers as subcomponents. For this reason, this project aims to provide implementations to train various transformer-based models (and others) using PyTorch Lightning, but also to distill, prune, and quantize models.
I chose to write this repo with Lightning because of its growing popularity, its flexibility, and the very few repositories using it. It currently handles only sequence classification models, but support for other tasks and custom architectures is planned.

Installation

First, clone the repository:

git clone https://github.com/JulesBelveze/bert-squeeze.git

and then install dependencies using poetry:

poetry install

You are all set!

Quickstarts

You can find a number of ready-made configurations under the examples folder. Choose the one you need and run the following, replacing wanted_config with its name:

python3 -m bert-squeeze.main -cp=examples -cn=wanted_config

Disclaimer: I have not extensively tested all procedures and thus do not guarantee the performance of every implemented method.

Concepts

Transformers

If you have never heard of them, I can only recommend reading this excellent blog post (Alammar, 2018). If you want to dig deeper, Stanford's CS224N lecture on self-attention and Transformers is also available online (stanfordonline, 2021).

Distillation

The idea of distillation is to train a small network (the student) to mimic a big network (the teacher) by replicating its outputs. The repository lets you transfer knowledge from any model to any other (if you need a model that is not in the models folder, just write your own).
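As a minimal, framework-free sketch of what "mimicking the outputs" can mean, here is the common temperature-scaled distillation loss: a weighted sum of the student's cross-entropy on the true label and the KL divergence between the softened teacher and student distributions. The function names, the temperature of 2.0, and the alpha of 0.5 are illustrative assumptions, not the repo's API or defaults; in practice the same arithmetic runs on PyTorch tensors.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) between two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label term: ordinary cross-entropy of the student at T = 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: match the teacher's softened class distribution.
    kd = kl_divergence(
        softmax(teacher_logits, temperature), softmax(student_logits, temperature)
    )
    # Scaling by T^2 keeps the soft term's gradients comparable across temperatures.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kd


teacher = [3.0, 1.0, 0.2]
student = [2.5, 1.2, 0.1]
loss = distillation_loss(student, teacher, label=0)
```

A higher temperature flattens the teacher's distribution, exposing how it ranks the wrong classes, which is the extra signal the student learns from.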

The repository also makes it possible to perform soft or hard distillation on an unlabeled dataset. In the soft case, the teacher's probabilities are used as targets; in the hard case, the teacher's predictions are assumed to be the actual labels.
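The difference between the two modes comes down to how teacher outputs on unlabeled examples become targets. A minimal sketch, assuming the teacher's logits are already computed; pseudo_label is a hypothetical helper, not a function from the repo:

```python
import math
from typing import List, Union


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(
    teacher_logits: List[List[float]], mode: str
) -> List[Union[List[float], int]]:
    """Turn teacher logits on unlabeled examples into training targets.

    mode="soft": the teacher's class probabilities are the target.
    mode="hard": the teacher's argmax is treated as the true label.
    """
    if mode == "soft":
        return [softmax(row) for row in teacher_logits]
    if mode == "hard":
        return [max(range(len(row)), key=row.__getitem__) for row in teacher_logits]
    raise ValueError(f"unknown mode: {mode!r}")
```

Soft targets are typically paired with a KL or soft cross-entropy loss, while hard targets plug into ordinary cross-entropy as if they were gold labels.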

You can find these implementations under the distillation/ folder.

Quantization

Neural network quantization is the process of reducing the numerical precision of a network's weights (and, depending on the scheme, its activations), for example from 32-bit floats to 8-bit integers. The repo has two callbacks: one for dynamic quantization and one for quantization-aware training (using the Lightning callback).
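The arithmetic underneath is simple. Below is a sketch of symmetric per-tensor int8 quantization in plain Python, for illustration only; it is not the repo's callback code. In PyTorch's dynamic quantization, weights are converted to int8 ahead of time and activations are quantized on the fly at inference, whereas quantization-aware training simulates this rounding during training so the model can adapt to it.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of float weights to signed 8-bit ints."""
    max_abs = max(abs(w) for w in weights)
    # Map [-max_abs, max_abs] onto [-127, 127]; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight differs from the original by at most scale / 2.
```

Storing q plus a single float scale takes roughly a quarter of the memory of float32 weights; the price is the rounding error visible on small weights such as 0.003, which collapses to 0.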

You can find those implementations under the utils/callbacks/ folder.

Pruning

Pruning a neural network consists of removing weights from a trained model to compress it. This repo features various pruning methods, such as head pruning, layer dropping, and weight dropping.
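To make "removing weights" concrete, here is a sketch of unstructured magnitude pruning, which zeroes the smallest-magnitude weights. This is a swapped-in illustration: the repo's weight-dropping criterion may differ, and head pruning and layer dropping are structured variants that remove whole attention heads or layers instead of individual weights.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


print(magnitude_prune([0.5, -0.05, 0.3, -0.8, 0.01, 0.2], sparsity=0.5))
# → [0.5, 0.0, 0.3, -0.8, 0.0, 0.0]
```

Zeroed weights only translate into speed-ups when the runtime exploits sparsity, which is one reason structured methods like head pruning and layer dropping are attractive for latency.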

You can find those implementations under the utils/callbacks/ folder.

Contributions and questions

If a feature relevant to this repo is missing, or you notice a bug, feel free to open an issue or a PR. As you can see in the roadmap, there are many more features to come 😃

Also, if you have any questions or suggestions feel free to ask!

References

Alammar, J. (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
stanfordonline (2021). Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 9 - Self-Attention and Transformers [Online video]. Available at: https://www.youtube.com/watch?v=ptuGllU5SQQ
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., and Brew, J. (2019). HuggingFace's Transformers: State-of-the-art Natural Language Processing.
Sajjad, H., Dalvi, F., Durrani, N., and Nakov, P. (2020). Poor Man's BERT: Smaller and Faster Transformer Models.
Fan, A., Grave, E., and Joulin, A. (2019). Reducing Transformer Depth on Demand with Structured Dropout.
Michel, P., Levy, O., and Neubig, G. (2019). Are Sixteen Heads Really Better than One?
Feng, F., Yang, Y., Cer, D., Arivazhagan, N., and Wang, W. (2020). Language-agnostic BERT Sentence Embedding.

Comments

feat: introduce `DistilAssistant`

This PR aims at introducing a new helper object: DistilAssistant.

Its role is to centralize and instantiate the different components needed to perform distillation.
enhancement

opened by JulesBelveze 0
feat: training assistant

This PR aims at introducing a new helper object that makes the UX smoother: TrainAssistant. It directly loads a default configuration based on the model the user wants to fine-tune. It then instantiates the model, data, logger and callbacks.

opened by JulesBelveze 0
refacto: improve user experience and documentation

This PR aims to refactor the whole repository and document it.

The main idea is to make the repository easier to use and release it as a pypi package.
refacto

opened by JulesBelveze 0
Issue in FastBert evaluation

There's an issue in the evaluation of FastBert during training stage 1.

The scorer and the way the evaluation report is logged don't expect to receive a list of logits (one per ramp layer). The model probably needs to override both the log_eval_report and validation_epoch_end methods of the parent class.
bug

opened by JulesBelveze 0
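A hedged sketch of one way to handle stage-1 outputs: score each ramp's logits separately and report one metric per ramp, instead of handing the whole list to a scorer that expects a single logits matrix. Plain accuracy stands in for the repo's scorer, and per_ramp_accuracy and the ramp_i keys are hypothetical names, not the log_eval_report API.

```python
from typing import Dict, List

Matrix = List[List[float]]  # one row of class logits per example


def argmax(row: List[float]) -> int:
    """Index of the highest score in a row (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits: Matrix, labels: List[int]) -> float:
    """Plain accuracy for a single logits matrix, i.e. what a standard scorer expects."""
    hits = sum(argmax(row) == y for row, y in zip(logits, labels))
    return hits / len(labels)


def per_ramp_accuracy(ramp_logits: List[Matrix], labels: List[int]) -> Dict[str, float]:
    """Score each ramp (branch classifier) on its own, keyed by its position."""
    return {f"ramp_{i}": accuracy(logits, labels) for i, logits in enumerate(ramp_logits)}
```

Keeping one metric per ramp also shows how accuracy evolves with depth, which is what the speed/accuracy trade-off of early exiting depends on.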

Incompatibility with latest `pytorch-lightning` and `neptune`

It seems the code breaks with the latest releases of pytorch-lightning and neptune. At least the following configuration breaks:

pytorch-lightning==1.5.3
neptune-client==0.13.2

Trace:

Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/jules/bert-squeeze/bert-squeeze/main.py", line 70, in run
    trainer.fit(model, data)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 738, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 772, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1141, in _run
    self.accelerator.setup(self)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/accelerators/cpu.py", line 35, in setup
    return super().setup(trainer)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 93, in setup
    self.setup_optimizers(trainer)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 352, in setup_optimizers
    trainer=trainer, model=self.lightning_module
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 245, in init_optimizers
    return trainer.init_optimizers(model)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/optimizers.py", line 35, in init_optimizers
    optim_conf = self.call_hook("configure_optimizers", pl_module=pl_module)
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1496, in call_hook
    output = model_fx(*args, **kwargs)
  File "/home/jules/bert-squeeze/bert-squeeze/models/base_lt_module.py", line 103, in configure_optimizers
    num_training_steps = len(self.train_dataloader()) * self.config.num_epochs // \
  File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/core/hooks.py", line 477, in train_dataloader
    raise NotImplementedError("`train_dataloader` must be implemented to be used with the Lightning Trainer")
NotImplementedError: `train_dataloader` must be implemented to be used with the Lightning Trainer

bug good first issue

opened by JulesBelveze 0
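The trace ends in configure_optimizers calling self.train_dataloader(). Lightning's default hook raises this NotImplementedError when the LightningModule itself does not define train_dataloader, as is the case when the data lives in a separate DataModule. One hedged workaround is to take the length from the DataModule attached to the trainer, e.g. len(self.trainer.datamodule.train_dataloader()). Later Lightning releases also expose self.trainer.estimated_stepping_batches for this purpose; check which one your version supports. The step count itself can then be computed by a small helper like the hypothetical one below, assuming a final partial accumulation window still triggers an optimizer step.

```python
import math


def num_training_steps(
    batches_per_epoch: int, num_epochs: int, accumulation_steps: int = 1
) -> int:
    """Number of optimizer steps a learning-rate scheduler will see over the run.

    batches_per_epoch would come from e.g.
    len(self.trainer.datamodule.train_dataloader()) inside configure_optimizers.
    """
    if accumulation_steps < 1:
        raise ValueError("accumulation_steps must be >= 1")
    # A trailing partial window at the end of an epoch still counts as one step.
    steps_per_epoch = math.ceil(batches_per_epoch / accumulation_steps)
    return steps_per_epoch * num_epochs
```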

Wrong path to dataset in `DataModule`
The load_dataset method doesn't load local datasets from the correct folder.

"data/{self.dataset_config.name}_dataset.py"

needs to be changed to

"data/datasets/{self.dataset_config.name}_dataset.py"
good first issue invalid
opened by JulesBelveze 0
Add `FastBert` model
This PR aims at adding support for the FastBert model.

[x] add PyTorch model

[x] add Lightning module

[x] main logic

[x] metrics computation

[x] weights saving & loading

[x] callback to handle the two steps training procedure

References:

FastBERT: a Self-distilling BERT with Adaptive Inference Time

List of existing implementations

enhancement distillation
opened by JulesBelveze 0
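For context, FastBERT's adaptive inference lets each example leave the network at the first ramp whose prediction is confident enough, measured by the entropy of its class distribution normalized by log(N) so that it lies in [0, 1]; a "speed" threshold controls how early examples exit. A minimal sketch of that rule in plain Python, assuming each ramp's probabilities are already computed; the function names are hypothetical and this is not the PR's code.

```python
import math
from typing import List, Tuple


def normalized_entropy(probs: List[float]) -> float:
    """Entropy divided by log(N), so it lies in [0, 1] (0 = fully confident)."""
    n = len(probs)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(n)


def adaptive_predict(ramp_probs: List[List[float]], speed: float) -> Tuple[int, int]:
    """Return (exit_layer, predicted_class), exiting at the first confident ramp.

    ramp_probs[i] holds the class probabilities of the i-th ramp; the last entry
    acts as the final exit and is always accepted.
    """
    for layer, probs in enumerate(ramp_probs):
        is_last = layer == len(ramp_probs) - 1
        if normalized_entropy(probs) < speed or is_last:
            return layer, max(range(len(probs)), key=probs.__getitem__)
    raise ValueError("ramp_probs must not be empty")
```

A larger speed value lets more examples exit at shallow ramps, trading accuracy for lower average latency.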

Logging issue when `accumulation_steps` > 1

When train.accumulation_steps > 1, the first accumulation_steps steps all try to log at the same step, because self.global_step is still 0.

[2021-10-27 08:35:47,478][neptune.new.internal.operation_processors.async_operation_processor][ERROR] - Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: train/acc. Invalid point: 0.0

invalid

opened by JulesBelveze 0
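The error follows from the arithmetic: global_step only advances once per accumulation window, so logging on every batch repeats each x-coordinate. The simulation below is a sketch that assumes one possible fix, logging only on the batch that completes a window (when the optimizer actually steps). should_log and logged_steps are hypothetical helpers, not the repo's code.

```python
from typing import List


def should_log(batch_idx: int, accumulation_steps: int) -> bool:
    """True only on the last batch of an accumulation window."""
    return (batch_idx + 1) % accumulation_steps == 0


def logged_steps(num_batches: int, accumulation_steps: int, gated: bool) -> List[int]:
    """Simulate the step values sent to the logger during one epoch.

    global_step advances once per accumulation window, as described in the issue.
    """
    steps = []
    for batch_idx in range(num_batches):
        global_step = batch_idx // accumulation_steps
        if not gated or should_log(batch_idx, accumulation_steps):
            steps.append(global_step)
    return steps


print(logged_steps(8, 4, gated=False))  # → [0, 0, 0, 0, 1, 1, 1, 1]
print(logged_steps(8, 4, gated=True))   # → [0, 1]
```

With gating, every metric series receives strictly increasing steps, which is what Neptune's series attributes require.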

Can it be used in the speech enhancement ?

Hi @JulesBelveze, I trained a speech enhancement network using pytorch-lightning and now want to make it more lightweight. Can I use this project? The network is a Convolutional Recurrent Network.

Thanks! Looking forward to your suggestions!
question

opened by zuowanbushiwo 3
DistilAssistant data kwargs

There are currently some caveats and unexpected behaviours when trying to pass data keyword arguments to the DistilAssistant.

One would expect the arguments to be applied to both the student and teacher modules.
bug

opened by JulesBelveze 0
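The expected behaviour can be expressed as a simple merge rule: shared data kwargs apply to both the teacher and the student, and module-specific kwargs override them on conflicts. The sketch below assumes that rule; split_data_kwargs and the example keys are hypothetical and not part of the DistilAssistant API.

```python
from typing import Any, Dict, Optional, Tuple

Kwargs = Dict[str, Any]


def split_data_kwargs(
    data_kwargs: Kwargs,
    teacher_kwargs: Optional[Kwargs] = None,
    student_kwargs: Optional[Kwargs] = None,
) -> Tuple[Kwargs, Kwargs]:
    """Apply shared data kwargs to both modules; specific kwargs win on conflicts.

    New dicts are returned, so the caller's data_kwargs is never mutated.
    """
    teacher = {**data_kwargs, **(teacher_kwargs or {})}
    student = {**data_kwargs, **(student_kwargs or {})}
    return teacher, student


# Hypothetical keys: both modules share batch_size, but the teacher gets a longer max_length.
teacher_cfg, student_cfg = split_data_kwargs(
    {"batch_size": 16, "max_length": 128}, teacher_kwargs={"max_length": 256}
)
```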
Use Callback for 2 stage training in DeeBert

DeeBert models need to be fine-tuned in two steps: first the final layer, then the ramps. The current implementation requires the user to run two separate trainings. However, this can be achieved in one shot using a pl.Callback, as done for TheseusBert.
enhancement

opened by JulesBelveze 0
Use Lightning dataloader hooks in soft and hard distillation

When performing distillation in soft or hard mode, the way the datasets are concatenated is dubious. Lightning offers a handy way to use multiple datasets (see the documentation), which would make the code much cleaner and easier to understand.
enhancement good first issue distillation

opened by JulesBelveze 0
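In Lightning 1.x, train_dataloader can return a dict of DataLoaders. Each training batch then arrives as a dict with the same keys, and by default shorter loaders are cycled until the longest one is exhausted (the "max_size_cycle" mode). The plain-Python generator below only mimics that pairing behaviour, to show what a distillation training_step would receive; it is not Lightning's implementation, and the "labeled"/"unlabeled" keys are illustrative.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def max_size_cycle(loaders: Dict[str, List]) -> Iterator[Dict[str, object]]:
    """Yield one dict batch per step; shorter loaders restart until the longest ends."""
    if not loaders or any(len(batches) == 0 for batches in loaders.values()):
        return
    longest = max(len(batches) for batches in loaders.values())
    iterators = {name: cycle(batches) for name, batches in loaders.items()}
    for _ in range(longest):
        yield {name: next(it) for name, it in iterators.items()}


for batch in max_size_cycle({"labeled": [1, 2, 3], "unlabeled": ["a", "b"]}):
    print(batch["labeled"], batch["unlabeled"])
```

With this layout the training step can compute the supervised loss on batch["labeled"] and the distillation loss on batch["unlabeled"] without concatenating datasets by hand.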

Error in `DataModule` when running on multiple GPUs with `ddp`

initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/2
[2021-10-27 06:28:58,303][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 1
[2021-10-27 06:28:58,316][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2021-10-27 06:28:58,318][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for 2 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All DDP processes registered. Starting ddp with 2 processes
----------------------------------------------------------------------------------------------------

[2021-10-27 06:28:58,324][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for 2 nodes.
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/jules/bert-squeeze/bert-squeeze/main.py", line 58, in run
    trainer.fit(model, data)
  File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in _run
    self._call_setup_hook(model)  # allow user to setup lightning_module in accelerator environment
  File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _call_setup_hook
    self.datamodule.setup(stage=fn)
  File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/core/datamodule.py", line 428, in wrapped_fn
    fn(*args, **kwargs)
  File "/home/jules/bert-squeeze/bert-squeeze/data/modules/transformer_module.py", line 62, in setup
    featurized_dataset = self.featurize()
  File "/home/jules/bert-squeeze/bert-squeeze/data/modules/transformer_module.py", line 45, in featurize
    tokenized_dataset = self.dataset.map(
AttributeError: 'NoneType' object has no attribute 'map'

bug help wanted

opened by JulesBelveze 0
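One plausible cause, given that self.dataset is None inside setup on rank 1, is that the dataset is assigned in prepare_data. Lightning calls prepare_data on a single process (per node by default), so attributes set there do not exist on the other DDP ranks, whereas setup runs on every process. The plain-Python stand-in below sketches the recommended split; SketchDataModule and load_fn are hypothetical, and in the real module load_fn would wrap whatever loading call the repo uses.

```python
from typing import Callable, List, Optional


class SketchDataModule:
    """Plain-Python stand-in for a LightningDataModule, showing where state should live."""

    def __init__(self, load_fn: Callable[[], List[str]]) -> None:
        self.load_fn = load_fn  # hypothetical loader (downloads or reads from cache)
        self.dataset: Optional[List[str]] = None

    def prepare_data(self) -> None:
        # Runs on one process only: download / cache, but do NOT assign state here.
        self.load_fn()

    def setup(self, stage: Optional[str] = None) -> None:
        # Runs on every rank: load from the local cache and assign state here.
        if self.dataset is None:
            self.dataset = self.load_fn()
```

Because setup no longer depends on prepare_data having run in the same process, every rank ends up with its own dataset before featurization.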

Owner

Jules Belveze

AI craftsman | NLP | MLOps

