Smaller Multilingual Transformers
This repository shares smaller versions of multilingual transformers that keep the same representations offered by the original ones. The idea came from a simple observation: after massively multilingual pretraining, not all embeddings are needed for fine-tuning and inference. In practice, one rarely needs a model that supports more than 100 languages, as the original mBERT does. We therefore extracted several smaller versions that handle fewer languages. Since most of the parameters of multilingual transformers are located in the embedding layer, our models are between 21% and 45% smaller in size.
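As a quick check of where the parameters live, the following sketch (assuming PyTorch and the transformers library are installed) counts how many of mBERT's parameters sit in its embedding layer:

from transformers import AutoModel

# Load the original multilingual BERT and count its parameters
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
total = sum(p.numel() for p in model.parameters())
emb = sum(p.numel() for p in model.embeddings.parameters())
print(f"total: {total / 1e6:.0f}M params, embeddings: {emb / 1e6:.0f}M ({100 * emb / total:.0f}%)")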
The table below compares two of our extracted versions with the original mBERT. It shows each model's number of parameters, size on disk, memory footprint and accuracy on the XNLI dataset (cross-lingual transfer from English to French). These measurements were computed on a Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB).
Model | Num parameters | Size | Memory | Accuracy (%) |
---|---|---|---|---|
bert-base-multilingual-cased | 178 million | 714 MB | 1400 MB | 73.8 |
Geotrend/bert-base-15lang-cased | 141 million | 564 MB | 1098 MB | 74.1 |
Geotrend/bert-base-en-fr-cased | 112 million | 447 MB | 878 MB | 73.8 |
Reducing the size of multilingual transformers facilitates their deployment on public cloud platforms. For instance, Google Cloud Platform requires that the model size on disk be under 500 MB for serverless deployments (Cloud Functions / Cloud ML).
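As an illustration, one quick way to check whether a locally saved model fits under that limit is to sum the file sizes in its directory (the directory name below is a placeholder):

import os

model_dir = "bert-base-en-fr-cased"  # placeholder: path to a locally saved model
size_mb = sum(
    os.path.getsize(os.path.join(dirpath, name))
    for dirpath, _, filenames in os.walk(model_dir)
    for name in filenames
) / 1e6
print(f"{size_mb:.0f} MB on disk")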
For more information, please refer to our paper: Load What You Need.
Available Models
So far, we have generated 70 smaller models from the original cased version of mBERT. These models have been uploaded to the Hugging Face Model Hub to make them easy to use: https://huggingface.co/Geotrend.
They can be downloaded easily using the transformers library:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-fr-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")
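Once downloaded, the reduced model is used exactly like the original one. For example, a minimal sketch (assuming PyTorch is installed) that encodes an English and a French sentence:

import torch

inputs = tokenizer(["Hello world!", "Bonjour le monde !"], padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 768)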
More models will be released soon.
Generating New Models
We also share a Python script that lets users generate smaller transformers on their own from a subset of the original vocabulary (the method is not limited to multilingual transformers):
pip install -r requirements.txt
python3 reduce_model.py \
--source_model bert-base-multilingual-cased \
--vocab_file vocab_5langs.txt \
--output_model bert-base-5lang-cased \
--convert_to_tf False
Where:
--source_model is the multilingual transformer to reduce
--vocab_file is the path to the vocabulary file to keep (see the sketch after this list for one way to build such a file)
--output_model is the name of the final reduced model
--convert_to_tf tells the script whether to also generate a TensorFlow version
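One possible way to build the vocabulary file is to keep only the original WordPiece tokens that actually occur when tokenizing a corpus in the target languages. The sketch below is hypothetical (the corpus and output file names are placeholders), not the exact procedure used to produce our released models:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Always keep the special tokens ([CLS], [SEP], [PAD], [UNK], [MASK], ...)
kept = set(tokenizer.all_special_tokens)

# Collect every WordPiece token seen on a corpus in the target languages
with open("corpus_5langs.txt", encoding="utf-8") as f:  # placeholder corpus
    for line in f:
        kept.update(tokenizer.tokenize(line))

# Write the kept tokens in their original order into the new vocabulary file
vocab = tokenizer.get_vocab()  # token -> original id
with open("vocab_5langs.txt", "w", encoding="utf-8") as f:
    for token in sorted(vocab, key=vocab.get):
        if token in kept:
            f.write(token + "\n")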
How to Cite
@inproceedings{smallermbert,
title={Load What You Need: Smaller Versions of Multilingual BERT},
author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
booktitle={SustaiNLP / EMNLP},
year={2020}
}
Contact
Please contact [email protected] for any questions, feedback or requests.