🛠️ Tools for Transformers compression using Lightning ⚡

Overview

Bert-squeeze

Bert-squeeze is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their latency at inference time.

It gathers a non-exhaustive set of techniques such as distillation, pruning, quantization, and early exiting. The repo is written using PyTorch Lightning and Transformers.

About the project

As a heavy user of transformer-based models (which are truly amazing from my point of view), I always struggled to put those heavy models in production with a decent inference speed. There are of course a bunch of existing libraries to optimize and compress transformer-based models (ONNX, distiller, compressors, KD_Lib, ...).
I started this project because of the need to reduce the latency of models integrating transformers as subcomponents. For this reason, this project aims at providing implementations to train various transformer-based models (and others) using PyTorch Lightning, but also to distill, prune, and quantize them.
I chose to write this repo with Lightning because of its growing popularity, its flexibility, and the very few repositories using it. It currently only handles sequence classification models, but support for other tasks and custom architectures is planned.

Installation

First download the repository:

git clone https://github.com/JulesBelveze/bert-squeeze.git

and then install dependencies using poetry:

poetry install

You are all set!

Quickstarts

You can find a bunch of already prepared configurations under the examples folder. Just choose the one you need and run the following:

python3 -m bert-squeeze.main -cp=examples -cn=wanted_config

Disclaimer: I have not extensively tested all procedures and thus do not guarantee the performance of every implemented method.

Concepts

Transformers

If you have never heard of them, I can only recommend reading this amazing blog post, and if you want to dig deeper there is an awesome lecture given by Stanford available here.
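
In a nutshell, the core building block is self-attention, where every token attends to every other token. Below is a minimal, illustrative sketch of scaled dot-product attention in PyTorch (purely pedagogical, not tied to this repository's code):

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    # Each position attends to all others, weighted by query/key similarity.
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)
    return weights @ value  # (batch, seq, d_v)

# Toy usage: a batch of 2 sequences of 5 tokens with hidden size 8
q = k = v = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)  # shape: (2, 5, 8)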

Distillation

The idea of distillation is to train a small network to mimic a big network by trying to replicate its outputs. The repository provides the ability to transfer knowledge from any model to any other (if you need a model that is not in the models folder, just write your own).

The repository also provides the possibility to perform soft or hard distillation on an unlabeled dataset. In the soft case, the teacher's probabilities are used as the target. In the hard case, the teacher's predictions are treated as the actual labels.

You can find these implementations under the distillation/ folder.
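
To make the two modes concrete, here is a minimal, illustrative sketch of the corresponding losses (a hypothetical helper written for this explanation, not the repository's actual API):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, mode="soft", temperature=2.0):
    if mode == "soft":
        # Match the teacher's (softened) probability distribution with a KL divergence.
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
    if mode == "hard":
        # Treat the teacher's argmax predictions as ground-truth labels.
        pseudo_labels = teacher_logits.argmax(dim=-1)
        return F.cross_entropy(student_logits, pseudo_labels)
    raise ValueError(f"Unknown distillation mode: {mode}")

# Toy usage with random logits for a batch of 4 examples and 3 classes
student, teacher = torch.randn(4, 3), torch.randn(4, 3)
soft_loss = distillation_loss(student, teacher, mode="soft")
hard_loss = distillation_loss(student, teacher, mode="hard")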

Quantization

Neural network quantization is the process of reducing the precision of the weights in a neural network. The repo provides two callbacks: one for dynamic quantization and one for quantization-aware training (using the Lightning callback).

You can find those implementations under the utils/callbacks/ folder.
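
As an illustration of the dynamic case, PyTorch can quantize the linear layers of a fine-tuned model in a couple of lines (a generic sketch independent of the repo's callbacks; the checkpoint name is just an example):

import torch
from transformers import AutoModelForSequenceClassification

# Any fine-tuned sequence classification checkpoint would do here.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Convert the weights of all Linear layers to int8; activations are
# quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)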

Pruning

Pruning neural networks consists of removing weights from trained models to compress them. This repo features various pruning implementations and methods such as head pruning, layer dropping, and weight dropping.

You can find those implementations under the utils/callbacks/ folder.
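
As an illustration of the general idea (independent of the repo's callbacks; the checkpoint name is just an example), heads can be pruned through the Transformers API and individual weights through torch.nn.utils.prune:

import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Head pruning: drop heads 0 and 1 of layer 0, and head 2 of layer 2.
model.prune_heads({0: [0, 1], 2: [2]})

# Unstructured weight pruning: zero out the 30% lowest-magnitude weights
# of every Linear layer, then make the pruning permanent.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")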

Contributions and questions

If you are missing a feature that could be relevant to this repo, or if you noticed a bug, feel free to open a PR or an issue. As you can see in the roadmap, there are a bunch more features to come 😃

Also, if you have any questions or suggestions feel free to ask!

References

  1. Alammar, J (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
  2. stanfordonline (2021). Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 9 - Self-Attention and Transformers. [online video] Available at: https://www.youtube.com/watch?v=ptuGllU5SQQ
  3. Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Jamie Brew (2019). HuggingFace's Transformers: State-of-the-art Natural Language Processing
  4. Hassan Sajjad and Fahim Dalvi and Nadir Durrani and Preslav Nakov (2020). Poor Man's BERT: Smaller and Faster Transformer Models
  5. Angela Fan and Edouard Grave and Armand Joulin (2019). Reducing Transformer Depth on Demand with Structured Dropout
  6. Paul Michel and Omer Levy and Graham Neubig (2019). Are Sixteen Heads Really Better than One?
  7. Fangxiaoyu Feng and Yinfei Yang and Daniel Cer and Naveen Arivazhagan and Wei Wang (2020). Language-agnostic BERT Sentence Embedding
Comments
  • feat: introduce `DistilAssistant`

    This PR aims at introducing a new helper object: DistilAssistant.

    Its role is to centralize and instantiate the different components needed to perform distillation.

    enhancement 
    opened by JulesBelveze 0
  • feat: training assistant

    This PR aims at introducing a new helper object that makes the UX smoother: TrainAssistant. It directly loads a default configuration based on the model the user wants to fine-tune. It then instantiates the model, data, logger and callbacks.

    opened by JulesBelveze 0
  • refacto: improve user experience and documentation

    This PR aims at refactoring the whole repository and documenting it.

    The main idea is to make the repository easier to use and to release it as a PyPI package.

    refacto 
    opened by JulesBelveze 0
  • Issue in FastBert evaluation

    There's an issue in the evaluation of FastBert during training stage 1.

    The scorer and the way the evaluation report is logged don't expect to receive a list of logits (corresponding to the logits of all the ramp layers). The model probably needs to override both the log_eval_report and validation_epoch_end methods of the parent class.

    bug 
    opened by JulesBelveze 0
  • Incompatibility with latest `pytorch-lightning` and `neptune`

    It seems the code breaks with the latest releases of pytorch-lightning and neptune. At least the following config breaks:

    pytorch-lightning==1.5.3
    neptune-client==0.13.2
    

    Trace:

    Error executing job with overrides: []
    Traceback (most recent call last):
      File "/home/jules/bert-squeeze/bert-squeeze/main.py", line 70, in run
        trainer.fit(model, data)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 738, in fit
        self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 772, in _fit_impl
        self._run(model, ckpt_path=ckpt_path)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1141, in _run
        self.accelerator.setup(self)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/accelerators/cpu.py", line 35, in setup
        return super().setup(trainer)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 93, in setup
        self.setup_optimizers(trainer)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 352, in setup_optimizers
        trainer=trainer, model=self.lightning_module
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 245, in init_optimizers
        return trainer.init_optimizers(model)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/optimizers.py", line 35, in init_optimizers
        optim_conf = self.call_hook("configure_optimizers", pl_module=pl_module)
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1496, in call_hook
        output = model_fx(*args, **kwargs)
      File "/home/jules/bert-squeeze/bert-squeeze/models/base_lt_module.py", line 103, in configure_optimizers
        num_training_steps = len(self.train_dataloader()) * self.config.num_epochs // \
      File "/home/jules/data-n-ai/.venv/training/lib/python3.6/site-packages/pytorch_lightning/core/hooks.py", line 477, in train_dataloader
        raise NotImplementedError("`train_dataloader` must be implemented to be used with the Lightning Trainer")
    NotImplementedError: `train_dataloader` must be implemented to be used with the Lightning Trainer
    
    bug good first issue 
    opened by JulesBelveze 0
  • Wrong path to dataset in `DataModule`

    The load_dataset method doesn't load local datasets from the correct folder.

    "data/{self.dataset_config.name}_dataset.py"
    

    needs to be changed to

    "data/datasets/{self.dataset_config.name}_dataset.py"
    
    good first issue invalid 
    opened by JulesBelveze 0
  • Add `FastBert` model

    opened by JulesBelveze 0
  • Logging issue when `accumulation_steps` > 1

    When train.accumulation_steps > 1, the first accumulation_steps steps try to log at the same global step because self.global_step is still 0.

    [2021-10-27 08:35:47,478][neptune.new.internal.operation_processors.async_operation_processor][ERROR] - Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: train/acc. Invalid point: 0.0
    
    invalid 
    opened by JulesBelveze 0
  • Can it be used in the speech enhancement ?

    Hi @JulesBelveze I trained a speech enhancement network using pytorch-lightning, and now I want to do some lightweighting on it, can I use this project? The network structure is a Convolutional Recurrent Network.

    thanks! looking forward to your suggestion!

    question 
    opened by zuowanbushiwo 3
  • DistilAssistant data kwargs

    There are currently some caveats and unexpected behaviours when trying to pass data keyword arguments to the DistilAssistant.

    One would expect the arguments to be applied to both the student and teacher modules.

    bug 
    opened by JulesBelveze 0
  • Use Callback for 2 stage training in DeeBert

    DeeBert models need to be fine-tuned in a two-step fashion: first the final layer and then the ramps. The current implementation requires the user to run two separate trainings. However, this can be achieved in one shot using a pl.Callback, as done for TheseusBert.

    enhancement 
    opened by JulesBelveze 0
  • Use Lightning dataloader hooks in soft and hard distillation

    When performing distillation in soft or hard mode, the way the datasets are concatenated is dubious. Lightning offers a handy solution for using multiple datasets (see documentation), which would make the code much cleaner and easier to understand.

    enhancement good first issue distillation 
    opened by JulesBelveze 0
  • Error in `DataModule` when running on multiple GPUs with `ddp`

    initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/2
    [2021-10-27 06:28:58,303][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 1
    [2021-10-27 06:28:58,316][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
    [2021-10-27 06:28:58,318][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for 2 nodes.
    ----------------------------------------------------------------------------------------------------
    distributed_backend=nccl
    All DDP processes registered. Starting ddp with 2 processes
    ----------------------------------------------------------------------------------------------------
    
    [2021-10-27 06:28:58,324][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for 2 nodes.
    Error executing job with overrides: []
    Traceback (most recent call last):
      File "/home/jules/bert-squeeze/bert-squeeze/main.py", line 58, in run
        trainer.fit(model, data)
      File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
        self._run(model)
      File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in _run
        self._call_setup_hook(model)  # allow user to setup lightning_module in accelerator environment
      File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _call_setup_hook
        self.datamodule.setup(stage=fn)
      File "/home/jules/bert-squeeze/.venv/env/lib/python3.6/site-packages/pytorch_lightning/core/datamodule.py", line 428, in wrapped_fn
        fn(*args, **kwargs)
      File "/home/jules/bert-squeeze/bert-squeeze/data/modules/transformer_module.py", line 62, in setup
        featurized_dataset = self.featurize()
      File "/home/jules/bert-squeeze/bert-squeeze/data/modules/transformer_module.py", line 45, in featurize
        tokenized_dataset = self.dataset.map(
    AttributeError: 'NoneType' object has no attribute 'map'
    
    bug help wanted 
    opened by JulesBelveze 0
Owner
Jules Belveze
AI craftsman | NLP | MLOps