🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.

Hugging Face

Last update: Jan 2, 2023

Overview

State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.

Online demos

You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, & an inference API for public and private models.

Write With Transformer, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.

Quick tour

To immediately use a model on a given text, we provide the pipeline API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:

>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]

The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.

Many NLP tasks have a pre-trained pipeline ready to go. For example, we can easily extract question answers given context:

>>> from transformers import pipeline

# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}

In addition to the answer, the pretrained model used here returned its confidence score, along with the start and end character positions of the answer in the context string. You can learn more about the tasks supported by the pipeline API in this tutorial.
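The `start` and `end` values in the question-answering output are character offsets into the `context` string (with `end` exclusive), so slicing the context recovers the answer. A minimal check that reuses the exact strings from the example, with no model needed:

```python
# Values copied from the question-answering example above.
context = 'Pipeline has been included in the huggingface/transformers repository'
result = {'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}

# start/end index into the original context string, so a plain slice
# gives back the predicted answer span.
span = context[result['start']:result['end']]
print(span)
```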

To download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:

>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)

And here is the equivalent code for TensorFlow:

>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)

The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the examples above) or on a list of strings. It outputs a dictionary that you can use in downstream code or pass directly to your model using the ** argument-unpacking operator.
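To illustrate the dictionary output and `**` unpacking without downloading anything, here is a sketch with hypothetical stand-ins (`toy_tokenizer`, `toy_model`) that only mimic the call pattern; they are not 🤗 Transformers APIs. Batching strings of different lengths requires padding, which this toy version always applies (with the real tokenizer you would typically pass `padding=True`).

```python
# Hypothetical stand-ins that only mimic the call pattern described above;
# neither function is part of 🤗 Transformers.
def toy_tokenizer(text):
    """Map words to ids and return a dict shaped like a tokenizer's output."""
    texts = [text] if isinstance(text, str) else list(text)
    vocab = {}
    rows = [[vocab.setdefault(word, len(vocab) + 1) for word in t.split()] for t in texts]
    width = max(len(row) for row in rows)
    # Pad every row to the same length with id 0; the mask marks real tokens with 1.
    return {
        "input_ids": [row + [0] * (width - len(row)) for row in rows],
        "attention_mask": [[1] * len(row) + [0] * (width - len(row)) for row in rows],
    }


def toy_model(input_ids, attention_mask):
    """Pretend model: report how many non-padding tokens each sequence has."""
    return [sum(mask) for mask in attention_mask]


inputs = toy_tokenizer(["Hello world!", "Hi"])
outputs = toy_model(**inputs)  # same as toy_model(input_ids=..., attention_mask=...)
print(inputs)
print(outputs)
```

The `**inputs` call works because the dictionary keys match the model's keyword parameter names, which is the same contract the real tokenizer and model follow.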

The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend), which you can use as usual. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.

Why should I use transformers?

1. Easy-to-use state-of-the-art models:
    - High performance on NLU and NLG tasks.
    - Low barrier to entry for educators and practitioners.
    - Few user-facing abstractions with just three classes to learn.
    - A unified API for using all our pretrained models.
2. Lower compute costs, smaller carbon footprint:
    - Researchers can share trained models instead of always retraining.
    - Practitioners can reduce compute time and production costs.
    - Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
3. Choose the right framework for every part of a model's lifetime:
    - Train state-of-the-art models in 3 lines of code.
    - Move a single model between TF2.0/PyTorch frameworks at will.
    - Seamlessly pick the right framework for training, evaluation and production.
4. Easily customize a model or an example to your needs:
    - We provide examples for each architecture to reproduce the results published by its original authors.
    - Model internals are exposed as consistently as possible.
    - Model files can be used independently of the library for quick experiments.

Why shouldn't I use transformers?

- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model; it is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will need to change a few lines of code to adapt them to your needs.

Installation

With pip

This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.

You should install 🤗 Transformers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

First, create a virtual environment with the version of Python you're going to use and activate it.

Then, you will need to install at least one of Flax, PyTorch or TensorFlow. Please refer to the TensorFlow installation page, the PyTorch installation page and/or the Flax installation page for the specific install command for your platform.
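To check which of the three backends are already importable in the active environment, here is a small sketch using only the standard library. `importlib.util.find_spec` looks a package up without importing it; the import names `torch`, `tensorflow` and `flax` are the usual package names for PyTorch, TensorFlow and Flax, and the helper name `installed_backends` is hypothetical.

```python
import importlib.util

# Usual import names of the three backends mentioned above.
BACKENDS = {"PyTorch": "torch", "TensorFlow": "tensorflow", "Flax": "flax"}


def installed_backends():
    """Return {backend name: True/False} without importing the heavy packages."""
    return {name: importlib.util.find_spec(module) is not None
            for name, module in BACKENDS.items()}


if __name__ == "__main__":
    found = installed_backends()
    print(found)
    if not any(found.values()):
        print("Install at least one backend before running `pip install transformers`.")
```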

When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:

pip install transformers

If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must install the library from source.

With conda

Since version v4.0.0, 🤗 Transformers has a conda channel: huggingface.

🤗 Transformers can be installed using conda as follows:

conda install -c huggingface transformers

Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.

Model architectures

All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub where they are uploaded directly by users and organizations.

🤗 Transformers currently provides the following architectures (see here for a high-level summary of each of them):

ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
MarianMT: Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
MBart (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
MBart-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
RoFormer (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
SpeechEncoderDecoder
SpeechToTextTransformer (from Facebook), released together with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
SpeechToTextTransformer2 (from Facebook), released together with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
Splinter (from Tel Aviv University), released together with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
SqueezeBert (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
Want to contribute a new model? We have added a detailed guide and templates to walk you through the process of adding a new model. You can find them in the templates folder of the repository. Be sure to check the contributing guidelines and contact the maintainers or open an issue to collect feedback before starting your PR.

To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.

Learn more

Section	Description
Documentation	Full API documentation and tutorials
Task summary	Tasks supported by 🤗 Transformers
Preprocessing tutorial	Using the `Tokenizer` class to prepare data for the models
Training and fine-tuning	Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API
Quick tour: Fine-tuning/usage scripts	Example scripts for fine-tuning models on a wide range of tasks
Model sharing and uploading	Upload and share your fine-tuned models with the community
Migration	Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert`

Citation

We now have a paper you can cite for the 🤗 Transformers library:

@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}

Comments

How to use BERT for finding similar sentences or similar news?

I have used BERT NextSentencePredictor to find similar sentences or similar news. However, it's super slow, even on a Tesla V100, which is the fastest GPU till now. It takes around 10 seconds for a query title against around 3,000 articles. Is there a way to use BERT better for finding similar sentences or similar news given a corpus of news articles?
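One likely reason for the slowdown is that Next Sentence Prediction scores each (query, article) pair jointly, so every query needs one full BERT forward pass per article, about 3,000 passes here. A common alternative, which is a different technique from NSP, is a bi-encoder: embed every article once, cache the vectors, and at query time embed only the query and rank by cosine similarity. The sketch below shows just the pooling and ranking steps in plain Python, with toy 2-d vectors standing in for model outputs. In practice the token vectors would come from something like the model's last hidden state, and plain BERT embeddings may work less well for similarity than models fine-tuned for sentence embeddings. All function names are hypothetical.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1, skipping padding positions."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(column) / len(kept) for column in zip(*kept)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, corpus_vecs, k=3):
    """Return the k (score, index) pairs with the highest cosine similarity."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(corpus_vecs)]
    return sorted(scored, reverse=True)[:k]


# Toy token embeddings standing in for a model's per-token outputs.
# Article vectors are pooled once and can be cached; only the query
# has to be encoded at search time.
article_vecs = [
    mean_pool([[1.0, 0.0], [0.8, 0.2], [9.0, 9.0]], [1, 1, 0]),  # 3rd token is padding
    mean_pool([[0.0, 1.0], [0.1, 0.9]], [1, 1]),
]
query_vec = mean_pool([[0.9, 0.1]], [1])
print(top_k(query_vec, article_vecs, k=1))
```

With cached article vectors, each query costs one forward pass plus cheap vector comparisons, instead of one forward pass per article.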

opened by Raghavendra15
Summarization Fine Tuning

❓ Questions & Help

Details

I tried using T5 and BART, but abstractive summarization on scientific texts does not seem to give the results I want, since I think they are both trained on news corpora. I have scraped all of the free PMC articles and I am thinking about fine-tuning a seq2seq model between the articles and their abstracts to make an abstractive summarizer for scientific texts. This Medium article (https://medium.com/huggingface/encoder-decoders-in-transformers-a-hybrid-pre-trained-architecture-for-seq2seq-af4d7bf14bb8) provides a bit of an introduction to how to approach this but does not go into much detail, so I am not sure where to start.

I'm not stuck on a specific error; I just don't really know how to approach this problem.

A link to original question on Stack Overflow: https://stackoverflow.com/questions/61826443/train-custom-seq2seq-transformers-model
Discussion wontfix

opened by kevinlu1248
ONNXConfig: Add a configuration for all available models
This issue is about the working group specially created for this task. If you are interested in helping out, take a look at this organization, or add me on Discord: ChainYo#3610

We want to contribute to HuggingFace's ONNX implementation for all available models on HF's hub. There are already a lot of architectures implemented for converting PyTorch models to ONNX, but we need more! We need them all!

Feel free to join us in this adventure! Join the org by clicking here

Here is a non-exhaustive list of all available models:

[x] Albert

[x] BART

[x] BeiT

[x] BERT

[x] BigBird

[x] BigBirdPegasus

[x] Blenderbot

[x] BlenderbotSmall

[x] BLOOM

[x] CamemBERT

[ ] CANINE

[x] CLIP

[x] CodeGen

[x] ConvNext

[x] ConvBert

[ ] CTRL

[ ] CvT

[x] Data2VecText

[x] Data2VecVision

[x] Deberta

[x] DebertaV2

[x] DeiT

[ ] DecisionTransformer

[x] DETR

[x] Distilbert

[ ] DPR

[ ] DPT

[x] ELECTRA

[ ] FNet

[ ] FSMT

[x] Flaubert

[ ] FLAVA

[ ] Funnel Transformer

[ ] GLPN

[x] GPT2

[x] GPTJ

[x] GPT-Neo

[ ] GPT-NeoX

[ ] Hubert

[x] I-Bert

[ ] ImageGPT

[ ] LED

[x] LayoutLM

[ ] 🛠️ LayoutLMv2

[x] LayoutLMv3

[ ] LayoutXLM

[x] LeViT

[x] Longformer

[x] LongT5

[ ] 🛠️ Luke

[ ] Lxmert

[x] M2M100

[ ] MaskFormer

[x] mBart

[ ] MCTCT

[ ] MPNet

[x] MT5

[x] MarianMT

[ ] MegatronBert

[x] MobileBert

[x] MobileViT

[ ] Nyströmformer

[x] OpenAIGPT-2

[ ] 🛠️ OPT

[x] OWLViT

[x] PLBart

[ ] Pegasus

[x] Perceiver

[ ] PoolFormer

[ ] ProphetNet

[ ] QDQBERT

[ ] RAG

[ ] REALM

[ ] 🛠️ Reformer

[x] RemBert

[x] ResNet

[ ] RegNet

[ ] RetriBert

[x] RoFormer

[x] RoBERTa

[ ] SEW

[ ] SEW-D

[ ] SegFormer

[ ] Speech2Text

[ ] Speech2Text2

[ ] Splinter

[x] SqueezeBERT

[ ] Swin Transformer

[x] T5

[ ] TAPAS

[ ] TAPEX

[ ] Transformer XL

[x] TrOCR

[ ] UniSpeech

[ ] UniSpeech-SAT

[ ] VAN

[x] ViT

[ ] Vilt

[ ] VisualBERT

[ ] Wav2Vec2

[ ] WavLM

[ ] XGLM

[x] XLM

[ ] XLMProphetNet

[x] XLM-RoBERTa

[x] XLM-RoBERTa-XL

[ ] 🛠️ XLNet

[x] YOLOS

[ ] Yoso

🛠️ next to a model means that a PR is in progress. If an unchecked model has nothing next to it, ONNX export does not support it yet, and we need to add support for it.

If you need help implementing an unsupported model, here is a guide from HuggingFace's documentation.

If you want an example of implementation, I did one for CamemBERT months ago.
Good First Issue
opened by ChainYo
GPT-J-6B
What does this PR do?

Introduces the long-awaited GPT-J model class to HuggingFace! Concurrently with this PR being merged, I will make a GPT-J 6B checkpoint public on the EleutherAI HF page for people to use. The model has been evaluated as being within error tolerances of the GPT-J 6B model we released in Jax two months ago.

@patil-suraj was very helpful in assisting me to understand HF philosophy and how to make this PR most in line with the rest of the codebase. Other than that, the major design consideration was to make the configs compatible with GPT-2 rather than GPT-Neo. GPT-Neo has some usability limitations due to its configs having names unrelated to GPT-2's (see #12183 for details). Given those problems and my hope that GPT-Neo will have its configs updated in the future, it seemed like a clear choice to align GPT-J with GPT-2.
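To make the naming difference concrete: GPT-Neo's config uses names such as `hidden_size` and `num_layers`, whereas GPT-2's uses `n_embd` and `n_layer`. Below is a hypothetical helper (not part of the PR or the library) that renames the four size-related keys from the GPT-Neo style to the GPT-2 style; the example values are illustrative only.

```python
# Hypothetical helper: rename GPT-Neo-style config keys to GPT-2-style names.
# Only the four size-related keys are covered; any other key passes through unchanged.
NEO_TO_GPT2_KEYS = {
    "hidden_size": "n_embd",
    "num_layers": "n_layer",
    "num_heads": "n_head",
    "max_position_embeddings": "n_positions",
}


def to_gpt2_style(config):
    """Return a copy of a config dict with GPT-Neo-style keys renamed."""
    return {NEO_TO_GPT2_KEYS.get(key, key): value for key, value in config.items()}


# Illustrative values only.
neo_like = {"hidden_size": 4096, "num_layers": 28, "num_heads": 16,
            "max_position_embeddings": 2048, "vocab_size": 50400}
print(to_gpt2_style(neo_like))
```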

Shout-outs to @finetuneanon, whose implementation this one is based on, as well as @kumuruz for assistance with optimizing and debugging.

Supersedes #12243 #13010 #13022

Closes #12098

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

[X] Did you read the contributor guideline, Pull Request section?

[X] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. It was discussed in Slack with @patil-suraj

[X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.

[X] Did you write any new necessary tests?

Who can review?

gpt2: @patrickvonplaten, @LysandreJik, @patil-suraj
opened by StellaAthena 75

[DeepSpeed] [success] trained t5-11b on 1x 40GB gpu

Managed to train t5-11b on 1x 40GB gpu w/ Deepspeed (A100-SXM4-40GB)

Thank you, @PeterAJansen for letting me use your hardware!

Thank you, @jeffra and @samyam, for not believing that it was impossible to train t5-11b on 1x 40GB gpu w/ Deepspeed, and for supporting me, which led me to find a few bugs in the integration.

Sharing details for those who need them.

If you want to try this at home, please make sure you use transformers master, as some bug fixes were just merged in.

Well, it's similar to the t5-3b on 24GB success reported here and here, but this time it's t5-11b on 1x 40GB gpu (or 4x if you want things faster).

As someone asked me before: you need a huge amount of general (CPU) RAM to use ZeRO-Offload for a huge model:

for t5-3b on 1x 24GB gpu: ~71GB RAM
for t5-11b on 1x 40GB gpu: ~234GB RAM

I was using /usr/bin/time -v program to get the peak memory measurement - it's the Maximum resident set size entry in the final report.

Question: I don't think /usr/bin/time does the right thing for multi-process runs; I think it only measures the parent process. E.g., with 4x gpus it reported only 102GB RAM, but I clearly saw in top that it was around 240GB. If you have an easy way to measure peak memory that takes forked processes into account, I'm all ears.
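One possible answer, sketched under assumptions: on Linux, poll `/proc` to sum the resident set size of the launcher and all of its descendants, and keep the maximum seen. This is a stdlib-only, Linux-only sketch; `peak_tree_rss_kb` and the other helpers are hypothetical names, and because it samples at intervals it can miss very short spikes.

```python
import os
import subprocess
import sys
import time


def _parent_map() -> dict:
    """Map every visible PID to its parent PID using /proc/<pid>/stat."""
    parents = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:  # the process exited while we were scanning
            continue
        # The command name may contain spaces, so split after the last ")":
        # the remaining fields start with the state, then the parent PID.
        fields = stat[stat.rfind(")") + 2:].split()
        parents[int(entry)] = int(fields[1])
    return parents


def _rss_kb(pid: int) -> int:
    """VmRSS of one process in kB (0 if unavailable, e.g. for a zombie)."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return 0


def tree_rss_kb(root: int) -> int:
    """Sum the RSS of `root` and all of its descendants."""
    parents = _parent_map()
    tree, frontier = {root}, [root]
    while frontier:
        pid = frontier.pop()
        for child, ppid in parents.items():
            if ppid == pid and child not in tree:
                tree.add(child)
                frontier.append(child)
    return sum(_rss_kb(pid) for pid in tree)


def peak_tree_rss_kb(cmd: list, interval: float = 0.1) -> tuple:
    """Run `cmd`, sampling the whole process tree; return (exit code, peak kB)."""
    proc = subprocess.Popen(cmd)
    peak = 0
    while proc.poll() is None:
        peak = max(peak, tree_rss_kb(proc.pid))
        time.sleep(interval)
    return proc.returncode, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python peak_rss.py deepspeed --num_gpus=4 ./finetune_trainer.py ...
    code, peak = peak_tree_rss_kb(sys.argv[1:])
    print(f"exit code {code}, peak RSS of process tree: {peak / 2**20:.1f} GiB")
```

Unlike `/usr/bin/time -v`, which reports a single process's maximum, this adds up the RSS of all ranks at each sample. Shared pages (for example copy-on-write memory after a fork) are counted once per process, so the total can overestimate real usage.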

Batch sizes on one gpu:

With buffers of 5e8 I was able to run BS=2, which might be too small for training.
With 2e8 I managed to squeeze in BS=10 for training, but it OOMed on prediction.

I'm referring to these batch sizes in ds_config.json:

        "allgather_bucket_size": 2e8,
        "reduce_bucket_size": 2e8,
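Both keys live in the `zero_optimization` section of the DeepSpeed config and, per the DeepSpeed docs, count elements rather than bytes. Here is a small sketch for producing a variant config with smaller buckets, so you can try different batch sizes without hand-editing the file. It assumes a dict loaded from a JSON config such as `ds_config.json`, and `with_bucket_sizes` is a hypothetical helper. Smaller buckets shrink the temporary all-gather/reduce buffers, which frees GPU memory, possibly at some cost in throughput.

```python
import copy
import json


def with_bucket_sizes(config: dict, size: float) -> dict:
    """Return a copy of a DeepSpeed config with both ZeRO bucket sizes set to `size`."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    zero = new_config.setdefault("zero_optimization", {})
    zero["allgather_bucket_size"] = size
    zero["reduce_bucket_size"] = size
    return new_config


if __name__ == "__main__":
    # Hypothetical base config; in practice load it with json.load(open("ds_config.json"))
    base = {"zero_optimization": {"stage": 2, "allgather_bucket_size": 5e8, "reduce_bucket_size": 5e8}}
    print(json.dumps(with_bucket_sizes(base, 2e8), indent=4))
```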

I also tested 2x and 4x DDP: BS=16 OOMed and BS=8 was fine, so I used that, though I could probably squeeze in some more.

edit1: later tests show that my test was too short and wasn't getting the CPU Adam optimizer to kick in, as it skips the first 20 or so steps because of overflow. Once it kicks in, it takes more GPU memory, so the practical BS is much smaller, around 2 on this setup I think. So most likely you will need to use BS=2 for real work, until things get optimized even more.
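Why the first steps get skipped: with fp16 dynamic loss scaling, the loss is multiplied by a large scale factor. Whenever the scaled gradients overflow fp16, the optimizer step is skipped and the scale is reduced. The toy simulation below uses a simplified model (no hysteresis, one fixed gradient magnitude, halving on each overflow). It is not DeepSpeed's implementation, and `skipped_steps` is a hypothetical helper. It shows how the number of skipped steps follows from the initial scale.

```python
FP16_MAX = 65504.0  # largest finite float16 value


def skipped_steps(initial_scale_power: int, grad_max: float, max_steps: int = 1000) -> int:
    """Count optimizer steps skipped before the scaled gradients stop overflowing.

    Simplified dynamic loss scaling: on overflow, skip the step and halve the scale.
    """
    scale = 2.0 ** initial_scale_power
    skipped = 0
    for _ in range(max_steps):
        if grad_max * scale > FP16_MAX:  # the scaled gradient would be inf in fp16
            skipped += 1
            scale /= 2
        else:
            break  # the first real optimizer step happens here
    return skipped


if __name__ == "__main__":
    # Hypothetical values: an initial scale of 2**32 and a largest gradient of about 1.0
    print(skipped_steps(32, 1.0))  # 17
```

In this simplified model, a 1-GPU run with `--n_train 60` and BS=8 has only 8 optimizer steps. If the initial scale were 2**32, none of those steps would reach the real optimizer, which fits the observation that a short test never exercises CPU Adam.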

edit2: things are getting reshuffled in the tests, so the default ds_config.json file has moved in master to a new, hopefully permanent home. It's now at examples/tests/deepspeed/ds_config.json, so you will need to adjust the command line to reflect this new location or simply copy the file to where the old one used to be.

Here is the full benchmark:

# 1 gpu: 
# only training fits with this BS, eval needs a smaller BS

export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=1 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16

{'train_runtime': 31.0897, 'train_samples_per_second': 0.257, 'epoch': 1.0}

# 2 gpus:

export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=2 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16

{'train_runtime': 17.9026, 'train_samples_per_second': 0.223, 'epoch': 1.0}

# 4 gpus

export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=4 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16

{'train_runtime': 10.4404, 'train_samples_per_second': 0.192, 'epoch': 1.0}

Checkpointing should allow even bigger batch sizes.

DeepSpeed

opened by stas00 65

FP16 overflow with GPT-Neo when using sequence lengths of 2048.
Environment info

transformers version: 4.5.0.dev0

Platform: Linux-5.4.0-54-generic-x86_64-with-glibc2.29

Python version: 3.8.5

PyTorch version (GPU?): 1.8.0+cu111

Tensorflow version (GPU?): N/A

Using GPU in script?: Yes

Using distributed or parallel set-up in script?: No

Who can help

@stas00

Models:

GPT-Neo 1.3b

Library:

deepspeed: @stas00

Information

Model I am using (Bert, XLNet ...):

The problem arises when using:

[ ] the official example scripts: (give details below)

[x] my own modified scripts: (give details below)

The tasks I am working on is:

[ ] an official GLUE/SQUaD task: (give the name)

[x] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

Use GPT-Neo 1.3b with The Pile dataset and the built-in Trainer. Artificial data also suffices: it does not matter what the data is, as long as the attention mask spans all 2048 tokens.

Enable FP16 and set max_length to 2048

Observe that all losses reported are NaN

Also reproducible using AMP or DeepSpeed. There seems to be code meant to circumvent this in the GPT-Neo implementation, where q, k, v are cast to fp32 in the attention block.

When the max_length is shorter (512) this overflow does not occur.
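Here is a toy numpy illustration of the overflow and of the fp32 workaround mentioned above. It is not GPT-Neo's code. float16 cannot represent values above 65504, so a large query-key dot product becomes `inf`, and a softmax over `inf` scores yields NaN. The same scores computed from query and key upcast to float32 stay finite. The tensor sizes and the constant 40.0 are arbitrary values chosen to trigger the overflow.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Softmax along the last axis, subtracting the row max first."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


seq_len, head_dim = 4, 64
q = np.full((seq_len, head_dim), 40.0, dtype=np.float16)
k = np.full((seq_len, head_dim), 40.0, dtype=np.float16)

with np.errstate(over="ignore", invalid="ignore"):
    # fp16 scores: 64 * 40 * 40 = 102400 > 65504, so every score becomes inf
    scores_fp16 = q @ k.T
    probs_fp16 = softmax(scores_fp16)  # inf - inf = nan

# Upcast q and k before the matmul, as the GPT-Neo attention block does
scores_fp32 = q.astype(np.float32) @ k.astype(np.float32).T
probs_fp32 = softmax(scores_fp32)

print(np.isinf(scores_fp16).all(), np.isnan(probs_fp16).all())  # True True
print(np.isfinite(probs_fp32).all())                            # True
```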

Expected behavior

I expected no overflows.

Aside

I'm reaching out on behalf of EleutherAI, Lysandre told us to create an issue about this.
opened by LouisCastricato 62

[deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO

Environment info

transformers version: 4.17.0.dev0
Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.10
Python version: 3.8.0
PyTorch version (GPU?): 1.10.1 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: yes
Using distributed or parallel set-up in script?: yes (deepspeed)
Note: I installed DeepSpeed from source

Who can help

Models: (I'm actually trying to use T0pp but T5 is close enough)

T5, BART, Marian, Pegasus, EncoderDecoder: @patrickvonplaten

Library:

Deepspeed: @stas00
Text generation: @patrickvonplaten @narsil

Information

Model I am using (Bert, XLNet ...): T0pp / T0_3B

The problem arises when using:

[ ] the official example scripts: (give details below)
[X] my own modified scripts: (give details below)

The tasks I am working on is:

[ ] an official GLUE/SQUaD task: (give the name)
[X] my own task or dataset: (give details below)

To reproduce

I want to load T0pp across 2 24GB GPUs and only run inference. From reading the documentation, I know DeepSpeed with ZeRO stage 3 is the way to go for this. I am following the HuggingFace example here to use DeepSpeed without a Trainer object.

The error I get is

[2022-01-28 18:36:41,193] [INFO] [partition_parameters.py:456:__exit__] finished initializing model with 2.85B parameters
Traceback (most recent call last):
  File "multi_gpu_T0pp.py", line 26, in <module>
    engine = deepspeed.initialize(model=model, config_params=ds_config)
AttributeError: module 'transformers.deepspeed' has no attribute 'initialize'

My code:

Run with CUDA_VISIBLE_DEVICES="0,1" deepspeed <script.py>

"""
Example code to load a PyTorch model across GPUs
"""
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.deepspeed import HfDeepSpeedConfig
from transformers import deepspeed
import pandas as pd
import torch
import pdb
import os

seed = 42
torch.manual_seed(seed)

ds_config = {
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_fp16_weights_on_model_save": True
    },
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 0,
    "steps_per_print": 2000,
    "train_batch_size": 2,
    "train_micro_batch_size_per_gpu": 1,
    "wall_clock_breakdown": False
}

if __name__ == "__main__":
    # must run before instantiating the model
    # ds_config is deepspeed config object or path to the file
    dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive

    model_name = "bigscience/T0_3B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    engine = deepspeed.initialize(model=model, config_params=ds_config)

    inputs = tokenizer.encode(
        "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
        return_tensors="pt")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))

Expected behavior

T0pp (or T0_3B) to load across 2 GPUs, generate an answer, and then quit.
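The `AttributeError` comes from `from transformers import deepspeed`, which shadows the DeepSpeed library with the `transformers.deepspeed` integration module. `initialize` lives in the `deepspeed` package itself. Below is a hedged sketch of a Trainer-free ZeRO-3 inference script. It assumes the DeepSpeed and transformers versions from the environment above, for example `transformers.deepspeed.HfDeepSpeedConfig` and the `config_params` argument of `deepspeed.initialize`. It also replaces every `"auto"` value, because those placeholders are only resolved when the HF Trainer is used. `make_zero3_inference_config` is a hypothetical helper, and its bucket-size heuristics are assumptions rather than tuned values.

```python
import os


def make_zero3_inference_config(hidden_size: int, world_size: int) -> dict:
    """ZeRO-3 config with every "auto" placeholder replaced by a concrete value."""
    return {
        "fp16": {"enabled": False},  # enable only if the checkpoint is known to be fp16-safe
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Heuristics scaled by the hidden size (assumed, not tuned)
            "reduce_bucket_size": hidden_size * hidden_size,
            "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
            "stage3_param_persistence_threshold": 10 * hidden_size,
        },
        "steps_per_print": 2000,
        # DeepSpeed checks: train_batch_size == micro batch * grad accumulation * world size
        "train_batch_size": world_size,
        "train_micro_batch_size_per_gpu": 1,
        "gradient_accumulation_steps": 1,
        "wall_clock_breakdown": False,
    }


def main() -> None:
    # Import the DeepSpeed *library*, not transformers.deepspeed
    import deepspeed
    import torch
    from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer
    from transformers.deepspeed import HfDeepSpeedConfig

    local_rank = int(os.environ["LOCAL_RANK"])
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    torch.cuda.set_device(local_rank)
    deepspeed.init_distributed()

    model_name = "bigscience/T0_3B"
    hidden_size = AutoConfig.from_pretrained(model_name).d_model
    ds_config = make_zero3_inference_config(hidden_size, world_size)

    # Must exist before the model is created, and stay alive, so that
    # from_pretrained loads the weights straight into ZeRO-3 partitions.
    dschf = HfDeepSpeedConfig(ds_config)  # noqa: F841

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # initialize() returns (engine, optimizer, dataloader, lr_scheduler)
    engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
    engine.module.eval()

    text = ("Is this review positive or negative? "
            "Review: this is the best cast iron skillet you will ever buy")
    inputs = tokenizer.encode(text, return_tensors="pt").to(local_rank)
    with torch.no_grad():
        # Under ZeRO-3 all ranks must keep stepping together until generation ends
        outputs = engine.module.generate(inputs, synced_gpus=True)
    print(f"rank {local_rank}: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")


if __name__ == "__main__":
    if "LOCAL_RANK" in os.environ:  # set by the deepspeed launcher
        main()
    else:
        print('Launch with: CUDA_VISIBLE_DEVICES="0,1" deepspeed <script.py>')
```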

DeepSpeed

opened by AADeLucia 57

How to use fine-tuned BART for prediction?
❓ Questions & Help

Details

I fine-tuned the BART model on a custom summarization dataset using the transformers/examples/summarization/bart/finetune.py and transformers/examples/summarization/bart/run_train.sh files in the repository for training (which generated three checkpointepoch=*.ckpt files) and prediction (which generated a .txt file with the test loss scores).

I have two questions on using this model for prediction:

How can I modify finetune.py to generate predictions for the test set, in addition to the loss scores? I see some test functions in finetune.py, but I'm not sure how to use these for generating a .txt file with the predictions.

How can I load the generated .ckpt files into BartForConditionalGeneration()? A config.json file was not generated along with the checkpoint files; there doesn't seem to be a TFBartForConditionalGeneration; and the convert_tf_checkpoint_to_pytorch.py script in the repo doesn't seem to support BART yet.
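Here is one common way to handle both questions, sketched under assumptions. A PyTorch Lightning `.ckpt` file is a pickled dict whose `state_dict` entry holds the LightningModule's weights. Suppose the module stored the Hugging Face model as an attribute named `model`; this is an assumption about `finetune.py`, so check the actual key names. Stripping that `model.` prefix then gives a state dict that `BartForConditionalGeneration` can load, once the model is built from the same base checkpoint. The base model name `facebook/bart-large`, the helper names, and the generation settings are placeholders. The loaded model also covers the first question: `generate` plus `batch_decode` produces the summaries.

```python
def strip_prefix(state_dict: dict, prefix: str = "model.") -> dict:
    """Keep only keys starting with `prefix`, with the prefix removed."""
    return {key[len(prefix):]: value for key, value in state_dict.items() if key.startswith(prefix)}


def load_bart_from_lightning(ckpt_path: str, base_model: str = "facebook/bart-large"):
    """Rebuild a BartForConditionalGeneration from a Lightning checkpoint (sketch)."""
    import torch
    from transformers import BartForConditionalGeneration

    checkpoint = torch.load(ckpt_path, map_location="cpu")
    model = BartForConditionalGeneration.from_pretrained(base_model)
    # strict=True makes key mismatches fail loudly instead of silently
    model.load_state_dict(strip_prefix(checkpoint["state_dict"]), strict=True)
    return model.eval()


def summarize(model, tokenizer, texts: list, max_length: int = 128) -> list:
    """Generate one summary per input text (max_length is an arbitrary default)."""
    import torch

    batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        summary_ids = model.generate(**batch, num_beams=4, max_length=max_length)
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
```

For example, you could call `load_bart_from_lightning("checkpointepoch=2.ckpt")`, where the file name is hypothetical. A following `model.save_pretrained("bart-finetuned")` then writes the missing `config.json` alongside the weights, so later runs can use `from_pretrained` directly.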

Thank you for your time!
Discussion wontfix
opened by riacheruvu 56
Add TF ViT MAE
This PR adds the MAE [1] model in TensorFlow. It was developed by @arig23498 and myself.

Fun facts about this PR:

Probably the third pure vision model in TensorFlow in transformers.

References:

[1] Masked Autoencoders Are Scalable Vision Learners

Update

The PR is now ready for review. @gante @Rocketknight1 @sgugger
opened by sayakpaul 49
Installation Error - Failed building wheel for tokenizers
🐛 Bug

Information

Model I am using (Bert, XLNet ...): N/A

Language I am using the model on (English, Chinese ...): N/A

The problem arises when using:

[X] the official example scripts: (give details below)

Problem arises in transformers installation on Microsoft Windows 10 Pro, version 10.0.17763

After creating and activating the virtual environment, installing transformers is not possible, because the following error occurs:

"error: can not find Rust Compiler" "ERROR: Failed building wheel for tokenizers" Failed to build tokenizers ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed d

The tasks I am working on is: [X ] transformers installation

To reproduce

Steps to reproduce the behavior:

From command line interface, create and activate a virtual environment by following the steps in this URL: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/

Install transformers from source, by following the example in the topic From Source on this URL: https://github.com/huggingface/transformers

-m pip --version
-m pip install --upgrade pip
-m pip install --user virtualenv
-m venv env
.\env\Scripts\activate
pip install transformers

ERROR: Command errored out with exit status 1:
 command: 'c:\users\vbrandao\env\scripts\python.exe' 'c:\users\vbrandao\env\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\vbrandao\AppData\Local\Temp\tmpj6evjmze'
 cwd: C:\Users\vbrandao\AppData\Local\Temp\pip-install-sza2_lmj\tokenizers
Complete output (10 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib
creating build\lib\tokenizers
copying tokenizers\__init__.py -> build\lib\tokenizers
running build_ext
running build_rust
error: Can not find Rust compiler
----------------------------------------
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly

Expected behavior

Installation of transformers should be complete.

Environment info

transformers version: N/A - installation step

Platform: Command Line Interface / Virtual Env

Python version: python 3.8

PyTorch version (GPU?): N/A

Tensorflow version (GPU?): N/A

Using GPU in script?: N/A

Using distributed or parallel set-up in script?: N/A

wontfix Core: Tokenization Installation
opened by victorlongo 49
Add TFConvNextModel
This PR adds the ConvNeXt [1] model in TensorFlow. It was developed by @arig23498, @gante, and myself.

Fun facts about this PR:

Probably the first pure conv model in transformers.

Probably the second pure vision model in TensorFlow in transformers.

References:

[1] A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545.

@gante @LysandreJik @Rocketknight1
opened by sayakpaul 48
Fix race condition on cleaning checkpoints when save_total_limit set to 1
What does this PR do?

This PR fixes #20988 by testing whether the worker process is allowed to save (self.args.should_save is set to True).

Fixes #20988
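Here is the idea behind the fix, sketched outside the Trainer. When several processes share one filesystem, only the process that is allowed to save should rotate old checkpoints, and the others must not touch them. The helper below keeps the newest `limit` checkpoint directories and deletes the rest only when `should_save` is true. `rotate_checkpoints` and its arguments are hypothetical, not the Trainer's internals. Unlike the Trainer, it ignores the extra rule of also keeping the best checkpoint when `load_best_model_at_end` is set.

```python
import os
import re
import shutil
import tempfile


def rotate_checkpoints(output_dir: str, limit: int, should_save: bool) -> list:
    """Delete all but the newest `limit` checkpoint-<step> directories.

    Only the process with should_save=True deletes anything, so workers sharing
    a filesystem do not race to remove the same files. Returns the deleted paths.
    """
    if not should_save or limit <= 0:
        return []
    checkpoints = []
    for name in os.listdir(output_dir):
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            checkpoints.append((int(match.group(1)), os.path.join(output_dir, name)))
    checkpoints.sort()  # oldest step first
    to_delete = [path for _, path in checkpoints[:-limit]]
    for path in to_delete:
        shutil.rmtree(path)
    return to_delete


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for step in (3, 6, 9):
            os.makedirs(os.path.join(tmp, f"checkpoint-{step}"))
        print(rotate_checkpoints(tmp, limit=1, should_save=False))  # []
        print([os.path.basename(p) for p in rotate_checkpoints(tmp, limit=1, should_save=True)])
        # ['checkpoint-3', 'checkpoint-6']
```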

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

[X] Did you read the contributor guideline, Pull Request section?

[ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.

[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.

[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

trainer: @sgugger
opened by radcheb 0

[Multi-node setup] Race condition on deleting checkpoint when using shared filesystem and save_total_limit=1

System Info

When running training on a multi-node setup with a shared filesystem (a shared PVC on Kubernetes), we use the following configuration (full example in the Reproduction section):

        load_best_model_at_end=True,
        save_on_each_node=False,
        save_total_limit=1,

When training finishes all epochs, it fails with a FileNotFoundError on a random file. It seems all the workers try to delete the same files when we set save_total_limit=1, which causes the whole training script to fail:

```
FileNotFoundError: [Errno 2] No such file or directory: 'rng_state_1.pth'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7796)
...
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```

### Who can help?

@sgugger

### Information

- [X] The official example scripts
- [X] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

I created the following Python script, `trainer_bug.py`, which runs the **GLUE** `cola` training task on a small sample of data:
```python
# pip install transformers==4.25.1 datasets==2.8.0 torch==1.13.1 scipy scikit-learn
import numpy as np
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer


task = "cola"
model_checkpoint = "distilbert-base-uncased"
num_labels = 2
batch_size = 2
metric_name = "matthews_correlation"
validation_key  = "validation"
SAMPLE_N_ROWS = 10

if __name__ == "__main__":
    dataset = load_dataset("glue", task)
    for split in dataset:
        dataset[split] = dataset[split].select(range(SAMPLE_N_ROWS))
    metric = load_metric('glue', task)
    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
    def preprocess_function(examples):
        return tokenizer(examples["sentence"], truncation=True)


    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        predictions = np.argmax(predictions, axis=1)
        return metric.compute(predictions=predictions, references=labels)

    encoded_dataset = dataset.map(preprocess_function, batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

    model_name = model_checkpoint.split("/")[-1]

    args = TrainingArguments(
        f"{model_name}-finetuned-{task}",
        evaluation_strategy="epoch",
        save_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=3,
        weight_decay=0.01,
        report_to="none",
        metric_for_best_model=metric_name,
        overwrite_output_dir=True,
        load_best_model_at_end=True,
        log_on_each_node=False,
        save_on_each_node=False,
        save_total_limit=1,
        # For a distributed CPU setup
        no_cuda=True,
        xpu_backend="gloo",
    )

    trainer = Trainer(
        model,
        args,
        train_dataset=encoded_dataset["train"],
        eval_dataset=encoded_dataset[validation_key],
        tokenizer=tokenizer,
        compute_metrics=compute_metrics
    )

    trainer.train()
```

Then run it with the following script, `trainer_bug.sh`, to simulate a 2-node setup on CPUs:

WORLD_SIZE=2
PROC_PER_NODE=1
MASTER_HOSTNAME=localhost
MASTER_PORT=12345

# Run worker
RANK=1
CUDA_VISIBLE_DEVICES="" torchrun --nnodes=$WORLD_SIZE --nproc_per_node=$PROC_PER_NODE \
            --node_rank=$RANK --master_addr=$MASTER_HOSTNAME \
            --master_port=$MASTER_PORT \
            trainer_bug.py &

# Run master
RANK=0
CUDA_VISIBLE_DEVICES="" torchrun --nnodes=$WORLD_SIZE --nproc_per_node=$PROC_PER_NODE \
            --node_rank=$RANK --master_addr=$MASTER_HOSTNAME \
            --master_port=$MASTER_PORT \
            trainer_bug.py

Expected behavior

The training is expected to finish successfully. However, it fails with the following stack trace:

Loading best model from distilbert-base-uncased-finetuned-cola/checkpoint-3 (score: 0.0).
{'train_runtime': 24.6088, 'train_samples_per_second': 1.219, 'train_steps_per_second': 0.366, 'train_loss': 0.5689484278361002, 'epoch': 3.0}{'train_runtime': 24.6164, 'train_samples_per_second': 1.219, 'train_steps_per_second': 0.366, 'train_loss': 0.5813997056749132, 'epoch': 3.0}

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:24<00:00,  1.83s/it]
Deleting older checkpoint [distilbert-base-uncased-finetuned-cola/checkpoint-9] due to args.save_total_limit
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:24<00:00,  2.74s/it]
Traceback (most recent call last):
  File "trainer_bug.py", line 66, in <module>
    trainer.train()
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/transformers/trainer.py", line 1527, in train
    return inner_training_loop(
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/transformers/trainer.py", line 1920, in _inner_training_loop
    shutil.rmtree(checkpoint)
  File "/home/XXX/.pyenv/versions/3.8.13/lib/python3.8/shutil.py", line 718, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/XXX/.pyenv/versions/3.8.13/lib/python3.8/shutil.py", line 675, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/XXX/.pyenv/versions/3.8.13/lib/python3.8/shutil.py", line 673, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
FileNotFoundError: [Errno 2] No such file or directory: 'rng_state_1.pth'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7796) of binary: /home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/bin/python
Traceback (most recent call last):
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
trainer_bug.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-01-03_18:28:49
  host      : XXXXXX
  rank      : 1 (local_rank: 0)
  exitcode  : 1 (pid: 7796)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

opened by radcheb 0

Hugging Face Dies Silently when Memory insufficient for loading Model / Training Model

Currently, when you load a model that is too large for the available memory, or try to train a model with insufficient memory, the process gets killed without an error message, which makes it tough to track down what is going on. Could you add an error message, similar to PyTorch's, when there is insufficient memory to run a given process?
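Some background that may explain the silence: on Linux, when the system runs out of RAM, the kernel's OOM killer terminates the process with SIGKILL. That signal cannot be caught, so Python never gets a chance to print a traceback. Until the library reports this itself, a user-side pre-flight check is one option. The sketch below is Linux-only and assumption-heavy. It estimates weight memory as parameter count times bytes per parameter, ignoring activations, optimizer state and framework overhead. It then compares that estimate with `MemAvailable` from `/proc/meminfo`. `check_memory_for_weights` is a hypothetical helper, not a transformers API.

```python
def available_memory_bytes() -> int:
    """Read MemAvailable from /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # the value is in kB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")


def check_memory_for_weights(num_params: int, bytes_per_param: int = 4) -> None:
    """Raise MemoryError with a readable message if the weights alone cannot fit."""
    needed = num_params * bytes_per_param
    available = available_memory_bytes()
    if needed > available:
        raise MemoryError(
            f"Loading the weights needs about {needed / 2**30:.1f} GiB "
            f"but only {available / 2**30:.1f} GiB of RAM is available; "
            "the process would likely be killed by the OOM killer."
        )
```

For example, call `check_memory_for_weights(1_300_000_000, bytes_per_param=2)` before loading a 1.3B-parameter model in fp16. Passing the check does not guarantee success, because training needs much more than the weights.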

opened by courtneysprouse 2
Added mask_time_prob and mask_time_length arguments to wav2vec2 pretraining script

This PR relates to PR 19997, which I messed up by forgetting the --force flag when pushing. Hopefully this PR is done correctly.

@sanchit-gandhi @sgugger @patrickvonplaten

opened by mpierrau 1
Add DETA

What does this PR do?

This PR adds DETA. DETA is a slight change to Deformable DETR by using traditional IoU-based assignment as opposed to the Hungarian matching used in the original DETR, and incorporating NMS (non-maximum suppression) in the postprocessing.

Note: this model has a torchvision dependency for NMS.
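For readers unfamiliar with the postprocessing step, non-maximum suppression keeps the highest-scoring box and drops any remaining box whose IoU with an already kept box exceeds a threshold. Here is a minimal pure-Python sketch of greedy NMS. DETA itself relies on torchvision for this, for example `torchvision.ops.nms`. The helper names and the 0.5 threshold are illustrative assumptions. Boxes are `(x1, y1, x2, y2)` tuples, and the same `iou` function is the quantity that IoU-based assignment thresholds on.

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list, scores: list, iou_threshold: float = 0.5) -> list:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep box i only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```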

opened by NielsRogge 0
Adding Support for Mixed Precision in Accelerator
There's a bug in the code: we check accelerator.use_fp16, but that flag can never be True because we never pass it in. I've added support by passing in the fp16 flag.
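Here is the shape of such a fix as a hedged sketch: forward the script's fp16 flag to the `Accelerator` constructor instead of only reading `accelerator.use_fp16` afterwards. This assumes a version of 🤗 Accelerate whose `Accelerator` accepts `mixed_precision` (`"no"`, `"fp16"` or `"bf16"`); older releases took a boolean `fp16=` argument instead. The `--fp16` argument and both helper names are hypothetical.

```python
import argparse


def mixed_precision_mode(fp16: bool) -> str:
    """Translate a boolean --fp16 flag into Accelerate's mixed_precision value."""
    return "fp16" if fp16 else "no"


def build_accelerator(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--fp16", action="store_true", help="train with fp16 mixed precision")
    args = parser.parse_args(argv)

    from accelerate import Accelerator  # imported here so the helper above needs no dependency

    # Passing the flag in is what makes mixed precision actually take effect
    return Accelerator(mixed_precision=mixed_precision_mode(args.fp16))
```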

What does this PR do?

Fixes # (issue)

Before submitting

[x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

[x] Did you read the contributor guideline, Pull Request section?

[ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.

[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.

[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
opened by BiEchi 1

Releases(v4.25.1)

v4.25.1(Dec 2, 2022)
PyTorch 2.0 stack support

We are very excited by the newly announced PyTorch 2.0 stack. You can enable torch.compile on any of our models, and get support with the Trainer (and in all our PyTorch examples) by using the torchdynamo training argument. For instance, just add --torchdynamo inductor when launching those examples from the command line.

This API is still experimental and may be subject to changes as the PyTorch 2.0 stack matures.

Note that to get the best performance, we recommend:

using an Ampere GPU (or more recent)

sticking to fixed shapes for now (so use --pad_to_max_length in our examples)

Repurpose torchdynamo training args towards torch._dynamo by @sgugger in #20498

Audio Spectrogram Transformer

The Audio Spectrogram Transformer model was proposed in AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass. The Audio Spectrogram Transformer applies a Vision Transformer to audio, by turning audio into an image (spectrogram). The model obtains state-of-the-art results for audio classification.
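To make the "audio as an image" idea concrete, AST splits a log-mel spectrogram into overlapping 16×16 patches. The sequence length seen by the Transformer then follows from the spectrogram size and the patch strides. The sketch below computes that count. Its defaults (128 mel bins, 1024 frames, patch size 16, stride 10 in both directions) are assumed to match `ASTConfig` in transformers and should be checked against the config you use.

```python
def conv_output_size(length: int, patch: int, stride: int) -> int:
    """Number of patch positions along one axis (no padding)."""
    return (length - patch) // stride + 1


def ast_num_patches(num_mel_bins: int = 128, num_frames: int = 1024,
                    patch_size: int = 16, freq_stride: int = 10, time_stride: int = 10) -> int:
    """Total number of spectrogram patches fed to the Transformer."""
    freq = conv_output_size(num_mel_bins, patch_size, freq_stride)
    frames = conv_output_size(num_frames, patch_size, time_stride)
    return freq * frames


print(ast_num_patches())  # 12 * 101 = 1212
```

The overlap (stride 10 < patch size 16) roughly doubles the patch count compared with non-overlapping patches, which would give 8 * 64 = 512.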

Add Audio Spectogram Transformer by @NielsRogge in #19981

Jukebox

The Jukebox model was proposed in Jukebox: A generative model for music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. It introduces a generative music model which can produce minute-long samples that can be conditioned on an artist, genres and lyrics.

Add Jukebox model (replaces #16875) by @ArthurZucker in #17826

Switch Transformers

The SwitchTransformers model was proposed in Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.

It is the first MoE model supported in transformers, with the largest checkpoint currently available containing 1T parameters.
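What makes Switch Transformers sparse is top-1 routing. A learned router picks a single expert for each token, and each expert processes at most a fixed number of tokens (its capacity). Below is a dependency-free sketch of that routing decision, with hypothetical names and hand-written router logits standing in for a learned router. Tokens that overflow an expert's capacity are dropped here; in the paper, such tokens skip the expert and pass through the residual connection. The capacity rounding (`math.ceil`) is an assumption of this sketch.

```python
import math


def switch_route(router_logits: list, capacity_factor: float = 1.0) -> list:
    """Top-1 routing with expert capacity.

    router_logits: one list of per-expert logits for each token.
    Returns, per token, (expert index, router probability), or None if the token
    was dropped because its expert was already full.
    """
    num_tokens, num_experts = len(router_logits), len(router_logits[0])
    capacity = math.ceil(num_tokens / num_experts * capacity_factor)
    load = [0] * num_experts
    assignments = []
    for logits in router_logits:
        # softmax over experts; the gate value scales the chosen expert's output
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        expert = max(range(num_experts), key=probs.__getitem__)
        if load[expert] < capacity:
            load[expert] += 1
            assignments.append((expert, probs[expert]))
        else:
            assignments.append(None)  # dropped: passes through the residual connection
    return assignments


routes = switch_route([[2.0, 0.0], [1.5, 0.2], [0.1, 3.0], [2.2, 0.0]])
print([r[0] if r else None for r in routes])  # [0, 0, 1, None]
```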

Add Switch transformers by @younesbelkada and @ArthurZucker in #19323

RocBert

The RoCBert model was proposed in RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. It’s a pretrained Chinese language model that is robust under various forms of adversarial attacks.

Add RocBert by @sww9370 in #20013

CLIPSeg

The CLIPSeg model was proposed in Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker. CLIPSeg adds a minimal decoder on top of a frozen CLIP model for zero- and one-shot image segmentation.

Add CLIPSeg by @NielsRogge in #20066

NAT and DiNAT

NAT

NAT was proposed in Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.

It is a hierarchical vision transformer based on Neighborhood Attention, a sliding-window self attention pattern.

DiNAT

DiNAT was proposed in Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.

It extends NAT by adding a Dilated Neighborhood Attention pattern to capture global context, and shows significant performance improvements over it.
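Here is a 1D sketch of which keys each query attends to, based on the descriptions above. With neighborhood attention, a query sees its `kernel_size` nearest positions. With dilation, those neighbors are spaced `dilation` positions apart, so the same kernel covers a wider context. The function name and the border handling are assumptions made for illustration: the window is clamped at the edges so every query gets the same number of neighbors. This is not the implementation these models use.

```python
def neighborhood(i: int, length: int, kernel_size: int, dilation: int = 1) -> list:
    """Key positions attended to by query i in 1D (dilated) neighborhood attention.

    Positions are split into `dilation` interleaved groups (same i % dilation);
    within its group, query i attends to the kernel_size nearest members, with
    the window clamped at the sequence edges.
    """
    offset = i % dilation
    group_size = len(range(offset, length, dilation))
    if group_size < kernel_size:
        raise ValueError("sequence too short for this kernel size and dilation")
    j = i // dilation  # index of i inside its group
    start = min(max(j - kernel_size // 2, 0), group_size - kernel_size)
    return [(start + t) * dilation + offset for t in range(kernel_size)]


print(neighborhood(5, 10, 3))              # [4, 5, 6]
print(neighborhood(0, 10, 3))              # [0, 1, 2]  (window shifted at the edge)
print(neighborhood(4, 10, 3, dilation=2))  # [2, 4, 6]
```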

Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models by @alihassanijr in #20219

MobileNetV2

The MobileNet model was proposed in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.

add MobileNetV2 model by @hollance in #17845

MobileNetV1

The MobileNet model was proposed in MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.

add MobileNetV1 model by @hollance in #17799

Image processors

Image processors replace feature extractors as the processing class for computer vision models.

Important changes:

size parameter is now a dictionary of {"height": h, "width": w}, {"shortest_edge": s}, or {"shortest_edge": s, "longest_edge": l} instead of an int or tuple.

Addition of data_format flag. You can now specify if you want your images to be returned in "channels_first" - NCHW - or "channels_last" - NHWC - format.

Processing flags e.g. do_resize can be passed directly to the preprocess method instead of modifying the class attribute: image_processor([image_1, image_2], do_resize=False, return_tensors="pt", data_format="channels_last")

Leaving return_tensors unset will return a list of numpy arrays.

The classes are backwards compatible and can be created using existing feature extractor configurations - with the size parameter converted.
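The two `data_format` options differ only in where the channel axis sits. Here is a small numpy illustration, independent of the image processor classes, of converting a hypothetical batch between the NCHW ("channels_first") and NHWC ("channels_last") layouts:

```python
import numpy as np

# A hypothetical batch of 2 RGB images, 4 pixels high and 5 wide, in channels_first (NCHW)
batch_nchw = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

# channels_first -> channels_last: move the channel axis to the end (NHWC)
batch_nhwc = np.transpose(batch_nchw, (0, 2, 3, 1))

# channels_last -> channels_first: move it back
roundtrip = np.transpose(batch_nhwc, (0, 3, 1, 2))

print(batch_nhwc.shape)  # (2, 4, 5, 3)
print(roundtrip.shape)   # (2, 3, 4, 5)
```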

Add Image Processors by @amyeroberts in #19796

Add Donut image processor by @amyeroberts #20425

Add segmentation + object detection image processors by @amyeroberts in #20160

AutoImageProcessor by @amyeroberts in #20111

Backbone for computer vision models

We're adding support for a general AutoBackbone class, which turns any vision model (like ConvNeXt, Swin Transformer) into a backbone to be used with frameworks like DETR and Mask R-CNN. The design is in early stages and we welcome feedback.

Add AutoBackbone + ResNetBackbone by @NielsRogge in #20229

Improve backbone by @NielsRogge in #20380

[AutoBackbone] Improve API by @NielsRogge in #20407

Support for safetensors offloading

If the model you are using has a safetensors checkpoint and you have the library installed, offload to disk will take advantage of this to be more memory efficient and roughly 33% faster.

Safetensors offload by @sgugger in #20321

Contrastive search in the generate method

Generate: TF contrastive search with XLA support by @gante in #20050

Generate: contrastive search with full optional outputs by @gante in #19963
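Contrastive search picks each next token from the top-k candidates by trading model confidence against a degeneration penalty. The penalty is the maximum cosine similarity between a candidate's hidden state and the hidden states of the tokens generated so far. Below is a minimal pure-Python sketch of one selection step, following the formula in the contrastive search paper (Su et al., 2022, "A Contrastive Framework for Neural Text Generation"). The candidate probabilities and hidden states are made-up numbers. In transformers, this decoding method is selected by passing `penalty_alpha` and `top_k` to `generate`.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_step(candidates: list, context_states: list, alpha: float = 0.6) -> int:
    """Pick the next token among the top-k candidates.

    candidates: (token_id, probability, hidden_state) for each top-k token.
    context_states: hidden states of the prefix tokens.
    score = (1 - alpha) * probability - alpha * max cosine similarity to the context
    """
    def score(candidate):
        _, prob, state = candidate
        penalty = max(cosine(state, h) for h in context_states)
        return (1 - alpha) * prob - alpha * penalty

    return max(candidates, key=score)[0]


context = [[1.0, 0.0], [0.9, 0.1]]
candidates = [
    (11, 0.50, [1.0, 0.05]),  # likely, but nearly repeats the context
    (42, 0.30, [0.0, 1.0]),   # less likely, but new
]
print(contrastive_step(candidates, context, alpha=0.6))  # 42
print(contrastive_step(candidates, context, alpha=0.0))  # 11 (plain greedy over top-k)
```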

Breaking changes

🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string by @beneyal in #15775

Bugfixes and improvements

add dataset by @stevhliu in #20005

Add BERT resources by @stevhliu in #19852

Add LayoutLMv3 resource by @stevhliu in #19932

fix typo by @stevhliu in #20006

Update object detection pipeline to use post_process_object_detection methods by @alaradirik in #20004

clean up vision/text config dict arguments by @ydshieh in #19954

make sentencepiece import conditional in bertjapanesetokenizer by @ripose-jp in #20012

Fix gradient checkpoint test in encoder-decoder by @ydshieh in #20017

Quality by @sgugger in #20002

Update auto processor to check image processor created by @amyeroberts in #20021

[Doctest] Add configuration_deberta_v2.py by @Saad135 in #19995

Improve model tester by @ydshieh in #19984

Fix doctest by @ydshieh in #20023

Show installed libraries and their versions in CI jobs by @ydshieh in #20026

reorganize glossary by @stevhliu in #20010

Now supporting pathlike in pipelines too. by @Narsil in #20030

Add **kwargs by @amyeroberts in #20037

Fix some doctests after PR 15775 by @ydshieh in #20036

[Doctest] Add configuration_camembert.py by @Saad135 in #20039

[Whisper Tokenizer] Make more user-friendly by @sanchit-gandhi in #19921

[FutureWarning] Add future warning for LEDForSequenceClassification by @ArthurZucker in #19066

Fix jit trace error when the model's forward argument order is not aligned with the jit.trace tuple input order; update related doc by @sywangyi in #19891

Update esmfold conversion script by @Rocketknight1 in #20028

Fixed torch.finfo issue with torch.fx by @michaelbenayoun in #20040

Only resize embeddings when necessary by @sgugger in #20043

Speed up TF token classification postprocessing by converting complete tensors to numpy by @deutschmn in #19976

Fix ESM LM head test by @Rocketknight1 in #20045

Update README.md by @bofenghuang in #20063

fix tokenizer_type to avoid error when loading checkpoint back by @pacman100 in #20062

[Trainer] Fix model name in push_to_hub by @sanchit-gandhi in #20064

PoolformerImageProcessor defaults to match previous FE by @amyeroberts in #20048

change constant torch.tensor to torch.full by @MerHS in #20061

Update READMEs for ESMFold and add notebooks by @Rocketknight1 in #20067

Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 by @jordiclive in #20068

Allow passing arguments to model testers for CLIP-like models by @ydshieh in #20044

Show installed libraries and their versions in GA jobs by @ydshieh in #20069

Update defaults and logic to match old FE by @amyeroberts in #20065

Update modeling_tf_utils.py by @cakiki in #20076

Update hub.py by @cakiki in #20075

[Doctest] Add configuration_dpr.py by @Saad135 in #20080

Removing RobertaConfig inheritance from CamembertConfig by @Saad135 in #20059

Skip 2 tests in VisionTextDualEncoderProcessorTest by @ydshieh in #20098

Replace unsupported facebookresearch/bitsandbytes by @tomaarsen in #20093

docs: Resolve many typos in the English docs by @tomaarsen in #20088

use huggingface_hub.model_info() to get pipeline_tag by @y-tag in #20077

Fix generate_dummy_inputs for ImageGPTOnnxConfig by @ydshieh in #20103

docs: Fixed variables in f-strings by @tomaarsen in #20087

Add new terms to the glossary by @stevhliu in #20051

Replace awkward timm link with the expected one by @tomaarsen in #20109

Fix AutoTokenizer with subfolder passed by @sgugger in #20110

[Audio Processor] Only pass sr to feat extractor by @sanchit-gandhi in #20022

Update github pr docs actions by @mishig25 in #20125

Adapt has_labels test when no labels were found by @sgugger in #20113

Improve tiny model creation script by @ydshieh in #20119

Remove BertConfig inheritance from RobertaConfig by @Saad135 in #20124

[Swin] Add Swin SimMIM checkpoints by @NielsRogge in #20034

Update CLIPSegModelTester by @ydshieh in #20134

Update SwinForMaskedImageModeling doctest values by @amyeroberts in #20139

Attempting to test automatically the _keys_to_ignore. by @Narsil in #20042

Generate: move generation_*.py src files into generation/*.py by @gante in #20096

add cv + audio labels by @stevhliu in #20114

Update VisionEncoderDecoder to use an image processor by @amyeroberts in #20137

[CLIPSeg] Add resources by @NielsRogge in #20118

Make DummyObject more robust by @mariosasko in #20146

Add RoCBertTokenizer to TOKENIZER_MAPPING_NAMES by @ydshieh in #20141

Adding support for LayoutLMvX variants for object-detection. by @Narsil in #20143

Add doc tests by @NielsRogge in #20158

doc comment fix: Args was in the wrong place by @hollance in #20164

Update OnnxConfig.generate_dummy_inputs to check ImageProcessingMixin by @ydshieh in #20157

Generate: fix TF doctests by @gante in #20159

Fix arg names for our models by @Rocketknight1 in #20166

[processor] Add 'model input names' property by @sanchit-gandhi in #20117

Fix object-detection bug (height, width inversion). by @Narsil in #20167

[OWL-ViT] Make model consistent with CLIP by @NielsRogge in #20144

Fix type - update any PIL.Image.Resampling by @amyeroberts in #20172

Fix tapas scatter by @Bearnardd in #20149

Update README.md by @code-with-rajeev in #19530

Proposal Remove the weird inspect in ASR pipeline and make WhisperEncoder just nice to use. by @Narsil in #19571

Pytorch type hints by @IMvision12 in #20112

Generate: TF sample doctest result update by @gante in #20208

[ROC_BERT] Make CI happy by @younesbelkada in #20175

add _keys_to_ignore_on_load_unexpected = [r"pooler"] by @ArthurZucker in #20210

docs: translated index page to korean by @wonhyeongseo in #20180

feat: add i18n issue template by @wonhyeongseo in #20199

[Examples] Generalise Seq2Seq ASR to handle Whisper by @sanchit-gandhi in #19519

mark test_save_load_fast_init_from_base as is_flaky by @ydshieh in #20200

Update README.md by @Nietism in #20188

Downgrade log warning -> info by @amyeroberts in #20202

Generate: add Bloom fixes for contrastive search by @gante in #20213

Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. by @Narsil in #20104

[docs] set overflowing image width to auto-scale by @wonhyeongseo in #20197

Update tokenizer_summary.mdx by @bofenghuang in #20135

Make ImageSegmentationPipelineTests less flaky by @ydshieh in #20147

update relative positional embedding by @ArthurZucker in #20203

[WHISPER] Update modeling tests by @ArthurZucker in #20162

Add accelerate support for ViT family by @younesbelkada in #20174

Add param_name to size_dict logs & tidy by @amyeroberts in #20205

Add object detection + segmentation transforms by @amyeroberts in #20003

Typo in docstring in ElectraTokenizer by @FacerAin in #20192

Remove authorized_missing_keys in favor of _keys_to_ignore_on_load_missing by @ArthurZucker in #20228

Add missing ESM autoclass by @Rocketknight1 in #20177

fix device issue by @ydshieh in #20227

fixed spelling error in testing.mdx by @kasmith11 in #20220

Fix run_clip.py by @ydshieh in #20234

Fix docstring of CLIPTokenizer(Fast) by @TilmannR in #20233

Fix MaskformerFeatureExtractor by @NielsRogge in #20100

New logging support for the Trainer class (ClearML logger) by @skinan in #20184

Enable PyTorch 1.13 by @sgugger in #20168

[CLIP] allow loading projection layer in vision and text model by @patil-suraj in #18962

Slightly alter Keras dummy loss by @Rocketknight1 in #20232

Add to DeBERTa resources by @Saad135 in #20155

Add clip resources to the transformers documentation by @ambujpawar in #20190

Update reqs to include min gather_for_metrics Accelerate version by @muellerzr in #20242

Allow trainer to return eval. loss for CLIP-like models by @ydshieh in #20214

Adds image-guided object detection support to OWL-ViT by @alaradirik in #20136

Adding audio-classification example in the doc. by @Narsil in #20235

Updating the doctest for conversational. by @Narsil in #20236

Adding doctest for fill-mask pipeline. by @Narsil in #20241

Adding doctest for feature-extraction. by @Narsil in #20240

Adding ASR pipeline example. by @Narsil in #20226

Adding doctest for document-question-answering by @Narsil in #20239

Adding an example for depth-estimation pipeline. by @Narsil in #20237

Complete doc migration by @mishig25 in #20267

Fix result saving errors of pytorch examples by @li-plus in #20276

Adding a doctest for table-question-answering pipeline. by @Narsil in #20260

Adding doctest for image-segmentation pipeline. by @Narsil in #20256

Adding doctest for text2text-generation pipeline. by @Narsil in #20261

Adding doctest for text-generation pipeline. by @Narsil in #20264

Add TF protein notebook to notebooks doc by @Rocketknight1 in #20271

Rephrasing the link. by @Narsil in #20253

Add Chinese-CLIP implementation by @yangapku in #20368

Adding doctest example for image-classification pipeline. by @Narsil in #20254

Adding doctest for zero-shot-image-classification pipeline. by @Narsil in #20272

Adding doctest for zero-shot-classification pipeline. by @Narsil in #20268

Adding doctest for visual-question-answering pipeline. by @Narsil in #20266

Adding doctest for text-classification pipeline. by @Narsil in #20262

Adding doctest for question-answering pipeline. by @Narsil in #20259

[Docs] Add resources of OpenAI GPT by @shogohida in #20084

Adding doctest for image-to-text pipeline. by @Narsil in #20257

Adding doctest for token-classification pipeline. by @Narsil in #20265

remaining pytorch type hints by @IMvision12 in #20217

Data collator for token classification pads labels column when receives pytorch tensors by @markovalexander in #20244

[Doctest] Add configuration_deformable_detr.py by @Saad135 in #20273

Fix summarization script by @muellerzr in #20286

[DOCTEST] Fix the documentation of RoCBert by @ArthurZucker in #20142

[bnb] Let's warn users when saving 8-bit models by @younesbelkada in #20282

Adding zero-shot-object-detection pipeline doctest. by @Narsil in #20274

Adding doctest for object-detection pipeline. by @Narsil in #20258

Image transforms functionality used instead by @amyeroberts in #20278

TF: add test for PushToHubCallback by @gante in #20231

Generate: general TF XLA contrastive search tests are now slow tests by @gante in #20277

Fixing the doctests failures. by @Narsil in #20294

set the default cache_enable to True, aligned with the default value in pytorch cpu/cuda amp autocast by @sywangyi in #20289

Add docstrings for canine model by @raghavanone in #19457

Add missing report button for Example test by @ydshieh in #20293

refactor test by @younesbelkada in #20300

[Tiny model creation] deal with ImageProcessor by @ydshieh in #20298

Fix blender bot misleading doc by @ArthurZucker in #20301

remove two tokens that should not be suppressed by @ArthurZucker in #20302

[ASR Examples] Update README for Whisper by @sanchit-gandhi in #20230

Add padding image transformation by @amyeroberts in #19838

Pin TensorFlow by @sgugger in #20313

Add AnyPrecisionAdamW optimizer by @atturaioe in #18961

[Proposal] Breaking change zero-shot-object-detection for improved consistency. by @Narsil in #20280

Fix flakey test with seed by @muellerzr in #20318

Pin TF 2.10.1 for Push CI by @ydshieh in #20319

Remove double brackets by @stevhliu in #20307

TF: future proof our keras imports by @gante in #20317

organize pipelines by modality by @stevhliu in #20306

Fix torch device issues by @ydshieh in #20304

Generate: add generation config class by @gante in #20218

translate zh quicktour (#20095) by @bfss in #20181

Add Spanish translation of serialization.mdx by @donelianc in #20245

Add LayerScale to NAT/DiNAT by @alihassanijr in #20325

[Switch Transformers] Fix failing slow test by @younesbelkada in #20346

fix: "BigSicence" typo in docs by @rajrajhans in #20331

Generate: model_kwargs can also be an input to prepare_inputs_for_generation by @gante in #20353

Update Special Language Tokens for PLBART by @jordiclive in #19980

Add resources by @NielsRogge in #20296

Enhance HfArgumentParser functionality and ease of use by @konstantinjdobler in #20323

Add inference section to task guides by @stevhliu in #18781

Fix toctree for Section 3 in Spanish Documentation by @donelianc in #20360

Generate: shorter XLA contrastive search tests by @gante in #20354

revert keys_to_ignore for M2M100 by @younesbelkada in #20381

add accelerate support for ESM by @younesbelkada in #20379

Fix nightly runs by @sgugger in #20352

Optimizes DonutProcessor token2json method for speed by @michaelnation26 in #20283

Better indicate the minimal version of PyTorch for big model inference by @sgugger in #20385

Fix longformer onnx broken export by @fxmarty in #20292

Use tiny models for ONNX tests - text modality by @lewtun in #20333

[ESM] fix accelerate tests for esmfold by @younesbelkada in #20387

Generate: fix plbart generation tests by @gante in #20391

[bloom] convert script tweaks by @stas00 in #18593

Fix doctest file path by @ydshieh in #20400

[Image Transformers] to_pil fix float edge cases by @patrickvonplaten in #20406

make daily CI happy by @younesbelkada in #20410

fix nasty bnb bug by @younesbelkada in #20408

change the way sentinel tokens can be retrieved by @raghavanone in #20373

[BNB] Throw ValueError when trying to cast or assign by @younesbelkada in #20409

Use updated model_max_length when saving tokenizers by @ydshieh in #20401

Add Spanish translation of pr_checks.mdx by @donelianc in #20339

fix device in longformer onnx path by @fxmarty in #20419

Fix ModelOutput instantiation when there is only one tuple by @sgugger in #20416

accelerate support for OwlViT by @younesbelkada in #20411

[AnyPrecisionAdamW] test fix by @stas00 in #20454

fix word_to_tokens docstring format by @SaulLu in #20450

Fix typo in FSMT Tokenizer by @kamalkraj in #20456

Fix device issues in CLIPSegModelIntegrationTest by @ydshieh in #20467

Fix links for contrastive_loss by @ydshieh in #20455

Fix doctests for audio models by @ydshieh in #20468

Fix ESM checkpoints for tests by @Rocketknight1 in #20436

More TF int dtype fixes by @Rocketknight1 in #20384

Create the tensors in build_relative_position on the proper device instead of always on CPU by @qq775294390 in #20434

update cpu related doc by @sywangyi in #20444

With the CPU-only version of PyTorch, using --bf16 without --no_cuda triggers an error like "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0" by @sywangyi in #20445

[CLIPTokenizer] Improve warning by @patrickvonplaten in #20458

Replace assertions with value errors on distilbert model by @JuheonChu in #20463

[Doctest] Add configuration_fsmt.py by @sha016 in #19936

Replace assertion with ValueError exceptions in run_image_captioning_flax.py by @katiele47 in #20365

[FLAX] Add dtype to embedding for bert/bart/opt/t5 by @merrymercy in #20340

fix both failing RoCBert tests by @ArthurZucker in #20469

Include image processor in add-new-model-like by @amyeroberts in #20439

chore: add link to the video cls notebook. by @sayakpaul in #20386

add timeout option for deepspeed engine by @henghuiz in #20443

[Maskformer] Add MaskFormerSwin backbone by @NielsRogge in #20344

Extract warnings from CI artifacts by @ydshieh in #20474

Add Donut image processor by @amyeroberts in #20425

Fix torch meshgrid warnings by @fxmarty in #20475

Fix init import_structure sorting by @sgugger in #20477

extract warnings in GH workflows by @ydshieh in #20487

add in layer gpt2 tokenizer by @piEsposito in #20421

Replace assert statements with raise exceptions by @miyu386 in #20478

fixed small typo by @sandeepgadhwal in #20490

Fix documentation code to import facebook/detr-resnet-50 model by @JuanFKurucz in #20491

Fix disk offload for full safetensors checkpoints by @sgugger in #20497

[modelcard] Check for IterableDataset by @sanchit-gandhi in #20495

[modelcard] Set model name if empty by @sanchit-gandhi in #20496

Add segmentation + object detection image processors by @amyeroberts in #20160

remove attention_mask truncation in whisper by @ydshieh in #20488

Make add_special_tokens more clear by @ydshieh in #20424

[OPT/Galactica] Load large galactica models by @younesbelkada in #20390

Support extraction of both train and eval XLA graphs by @jeffhataws in #20492

fix ipex+fp32 jit trace error in ipex 1.13 by @sywangyi in #20504

Expected output for the test changed by @ArthurZucker in #20493

Fix TF nightly tests by @Rocketknight1 in #20507

Update doc examples feature extractor -> image processor by @amyeroberts in #20501

Fix Typo in Docs for GPU by @julianpollmann in #20509

Fix minimum version for device_map by @sgugger in #20489

Update AutomaticSpeechRecognitionPipeline doc example by @ydshieh in #20512

Add natten for CI by @ydshieh in #20511

Fix Data2VecTextForCausalLM example code documentation by @JuanFKurucz in #20510

Add some warning for Dynamo and enable TF32 when it's set by @sgugger in #20515

[modelcard] Update dataset tags by @sanchit-gandhi in #20506

Change Doctests CI launch time by @ydshieh in #20523

Fix PLBart doctest by @ydshieh in #20527

Fix ConditionalDetrForSegmentation doc example by @ydshieh in #20531

add doc for by @younesbelkada in #20525

Update ZeroShotObjectDetectionPipeline doc example by @ydshieh in #20528

update post_process_image_guided_detection by @fcakyon in #20521

QnA example: add speed metric by @sywangyi in #20522

Fix doctest by @NielsRogge in #20534

Fix Hubert models in TFHubertModel and TFHubertForCTC documentation code by @JuanFKurucz in #20516

Fix link in pipeline device map by @stevhliu in #20517

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@sww9370

Add RocBert (#20013)

@IMvision12

Pytorch type hints (#20112)

remaining pytorch type hints (#20217)

@alihassanijr

Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models (#20219)

Add LayerScale to NAT/DiNAT (#20325)

@bfss

translate zh quicktour (#20095) (#20181)

@donelianc

Add Spanish translation of serialization.mdx (#20245)

Fix toctree for Section 3 in Spanish Documentation (#20360)

Add Spanish translation of pr_checks.mdx (#20339)

@yangapku

Add Chinese-CLIP implementation (#20368)

v4.24.0 (Nov 1, 2022)
ESM-2/ESMFold

ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, from 8 million parameters up to a huge 15 billion parameter model.
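The sketch below gives a rough picture of the masked language modeling objective on proteins. It treats each amino-acid residue as one token, hides a fraction of residues behind a mask token, and keeps the hidden residues as prediction targets. The 15% rate is borrowed from BERT and is an assumption here, as are the "<mask>" string and the mask_residues helper. Real ESM-2 training also uses a tokenizer vocabulary and special tokens, which this sketch leaves out.

```python
import random
from typing import List, Optional, Tuple

MASK = "<mask>"  # hypothetical mask token string


def mask_residues(
    sequence: str, rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (model inputs, targets); targets are None where no prediction is made."""
    rng = random.Random(seed)
    tokens = list(sequence)  # one token per residue
    n_masked = max(1, round(rate * len(tokens)))
    masked = set(rng.sample(range(len(tokens)), n_masked))
    inputs = [MASK if i in masked else tok for i, tok in enumerate(tokens)]
    targets = [tok if i in masked else None for i, tok in enumerate(tokens)]
    return inputs, targets


inputs, targets = mask_residues("MKTAYIAKQRQISFVKSHFSRQ")
print(inputs)   # the sequence with 3 residues replaced by <mask>
print(targets)  # the original residue at those 3 positions, None elsewhere
```

The model is trained to recover the residues at the masked positions from the surrounding sequence. Downstream sequence or token classification heads reuse the representations learned this way.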

ESMFold is a state-of-the-art single-sequence protein folding model that produces high-accuracy predictions significantly faster than previous tools. Unlike earlier protein folding tools such as AlphaFold2 and OpenFold, ESMFold uses a pretrained protein language model to generate token embeddings that serve as input to the folding model. It therefore does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model, without any external databases or search/alignment tools at inference time. This hugely reduces the time and compute requirements for folding.

Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

ESMFold was introduced in the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

Add ESMFold by @Rocketknight1 in #19977

TF port of ESM by @Rocketknight1 in #19587

LiLT

LiLT makes it possible to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, enabling LayoutLM-like document understanding for many languages.
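The layout side of such models consumes one bounding box per token. In the LayoutLM family, boxes are conventionally given as (x0, y0, x1, y1), scaled to a 0-1000 range relative to the page size. Every sub-word token inherits the box of the word it came from. We assume LiLT follows the same convention. In the sketch below, normalize_box and align_boxes are hypothetical helpers. The fixed-width split_word function only stands in for a real sub-word tokenizer.

```python
from typing import Callable, List, Sequence, Tuple

Box = List[int]


def normalize_box(box: Sequence[float], page_width: float, page_height: float) -> Box:
    """Scale absolute pixel coordinates (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


def split_word(word: str, size: int = 4) -> List[str]:
    """Stand-in tokenizer: cut a word into fixed-size pieces."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def align_boxes(
    words: List[str],
    boxes: List[Box],
    tokenize: Callable[[str], List[str]] = split_word,
) -> List[Tuple[str, Box]]:
    """Give every sub-word token the box of the word it came from."""
    pairs = []
    for word, box in zip(words, boxes):
        for token in tokenize(word):
            pairs.append((token, box))
    return pairs


words = ["Invoice", "total"]
pixel_boxes = [(50, 100, 150, 120), (200, 100, 260, 120)]
boxes = [normalize_box(b, page_width=500, page_height=1000) for b in pixel_boxes]
for token, box in align_boxes(words, boxes):
    print(token, box)
# Invo [100, 100, 300, 120]
# ice [100, 100, 300, 120]
# tota [400, 100, 520, 120]
# l [400, 100, 520, 120]
```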

It was proposed in LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.

Add LiLT by @NielsRogge in #19450

Flan-T5

FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

It was released in the paper Scaling Instruction-Finetuned Language Models by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

Add flan-t5 documentation page by @younesbelkada in #19892

Table Transformer

Table Transformer is a model based on the DETR architecture that can perform table extraction and table structure recognition on unstructured documents.
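Table Transformer builds on DETR, so its raw box predictions follow the DETR convention as we understand it. Each box is (center_x, center_y, width, height), normalized to [0, 1]. Post-processing turns it into absolute (x0, y0, x1, y1) pixel corners and keeps only detections above a score threshold. The helpers below (center_to_corners, postprocess) are a hypothetical, dependency-free sketch of that step, not the library's post_process_object_detection method. Note that x coordinates scale with the image width and y coordinates with the height, which is easy to invert by mistake.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def center_to_corners(box: Box, image_width: int, image_height: int) -> Box:
    """(cx, cy, w, h) in [0, 1] -> absolute (x0, y0, x1, y1) in pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def postprocess(
    scores: List[float],
    boxes: List[Box],
    image_size: Tuple[int, int],
    threshold: float = 0.7,
) -> List[Tuple[float, Box]]:
    """Keep detections whose score exceeds the threshold; image_size is (height, width)."""
    height, width = image_size
    return [
        (score, center_to_corners(box, width, height))
        for score, box in zip(scores, boxes)
        if score > threshold
    ]


detections = postprocess(
    scores=[0.95, 0.30],
    boxes=[(0.5, 0.5, 0.25, 0.5), (0.1, 0.1, 0.1, 0.1)],
    image_size=(500, 1000),
)
print(detections)  # [(0.95, (375.0, 125.0, 625.0, 375.0))]
```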

It was proposed in PubTables-1M: Towards comprehensive table extraction from unstructured documents by Brandon Smock, Rohith Pesala, Robin Abraham.

Add table transformer [v2] by @NielsRogge in #19614

Contrastive search decoding

Contrastive search decoding is a new state-of-the-art generation method that aims to reduce the repetitive patterns generation models often fall into.

It was introduced in A Contrastive Framework for Neural Text Generation by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.
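In the paper's formulation, the model keeps its top-k candidate tokens at each step. It picks the one that maximizes (1 - alpha) * p(v | context) - alpha * max_j cos(h_v, h_j). The second term, the degeneration penalty, is the highest cosine similarity between the candidate's hidden representation and those of the tokens already in the context. In Transformers, this decoding mode is selected by passing penalty_alpha and top_k to generate().

The pure-Python sketch below implements only the selection rule for a single step. contrastive_step, the toy probabilities and the two-dimensional hidden states are hypothetical stand-ins for real model outputs.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contrastive_step(
    probs: Dict[str, float],
    candidate_states: Dict[str, Sequence[float]],
    context_states: List[Sequence[float]],
    top_k: int,
    penalty_alpha: float,
) -> str:
    """Pick the next token among the top-k by confidence minus degeneration penalty."""
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]

    def score(token: str) -> float:
        # Degeneration penalty: max similarity to any previous token's representation.
        penalty = max((cosine(candidate_states[token], h) for h in context_states), default=0.0)
        return (1 - penalty_alpha) * probs[token] - penalty_alpha * penalty

    return max(candidates, key=score)


probs = {"the": 0.5, "a": 0.3, "cat": 0.2}
candidate_states = {"the": [1.0, 0.0], "a": [0.0, 1.0], "cat": [0.6, 0.8]}
context_states = [[1.0, 0.0]]  # the context already contains a "the"-like token

print(contrastive_step(probs, candidate_states, context_states, top_k=3, penalty_alpha=0.0))  # the
print(contrastive_step(probs, candidate_states, context_states, top_k=3, penalty_alpha=0.6))  # a
```

With penalty_alpha=0 the rule reduces to greedy decoding. With penalty_alpha=0.6, "the" scores 0.4 * 0.5 - 0.6 * 1.0 = -0.4 because it duplicates the context, so the less likely but dissimilar "a" (score 0.12) is chosen.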

Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by @gmftbyGMFTBY in #19477

Safety and security

We continue to explore safetensors, a new serialization format that does not rely on Pickle, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

Safetensors tf by @sgugger in #19900

🚨 Breaking changes

The following changes are bug fixes that we chose to make even though they change the resulting behavior. We mark them as breaking changes: if you are using this part of the codebase, we recommend looking at the PRs to understand exactly what was changed.

🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) by @gante in #19263

🚨🚨🚨 [Breaking change] Deformable DETR intermediate representations by @Narsil in #19678

Bugfixes and improvements

Enabling custom TF signature draft by @dimitreOliveira in #19249

Fix whisper for pipeline by @ArthurZucker in #19482

Extend nested_XXX functions to mappings/dicts. by @Guillem96 in #19455

Syntax issues (lines 126, 203) by @kant in #19444

CLI: add import protection to datasets by @gante in #19470

Fix TFGroupViT CI by @ydshieh in #19461

Fix doctests for DeiT and TFGroupViT by @ydshieh in #19466

Update WhisperModelIntegrationTests.test_large_batched_generation by @ydshieh in #19472

[Swin] Replace hard-coded batch size to enable dynamic ONNX export by @lewtun in #19475

TF: TFBart embedding initialization by @gante in #19460

Make LayoutLM tokenizers independent from BertTokenizer by @arnaudstiegler in #19351

Make XLMRoberta model and config independent from Roberta by @asofiaoliveira in #19359

Fix get_embedding dtype at init. time by @ydshieh in #19473

Decouples XLMProphet model from Prophet by @srhrshr in #19406

Implement multiple span support for DocumentQuestionAnswering by @ankrgyl in #19204

Add warning in generate & device_map=auto & half precision models by @younesbelkada in #19468

Update TF whisper doc tests by @amyeroberts in #19484

Make bert_japanese and cpm independent of their inherited modules by @Davidy22 in #19431

Added tokenize keyword arguments to feature extraction pipeline by @quancore in #19382

Adding the README_es.md and reference to it in the others files readme by @Oussamaosman02 in #19427

[CvT] Tensorflow implementation by @mathieujouffroy in #18597

python3 instead of python in push CI setup job by @ydshieh in #19492

Update PT to TF CLI for audio models by @amyeroberts in #19465

New by @IMvision12 in #19481

Fix OPTForQuestionAnswering doctest by @ydshieh in #19479

Use a dynamic configuration for circleCI tests by @sgugger in #19325

Add multi-node conditions in trainer_qa.py and trainer_seq2seq.py by @regisss in #19502

update doc for perf_train_cpu_many by @sywangyi in #19506

Avoid Push CI failing to report due to many commits being merged by @ydshieh in #19496

[Doctest] Add configuration_bert.py to doctest by @ydshieh in #19485

Fix whisper doc by @ArthurZucker in #19518

Syntax issue (line 497, 526) Documentation by @kant in #19442

Fix pytorch seq2seq qa by @FilipposVentirozos in #19258

Add depth estimation pipeline by @nandwalritik in #18618

Adding links to pipelines parameters documentation by @AndreaSottana in #19227

fix MarkupLMProcessor option flag by @davanstrien in #19526

[Doctest] Bart configuration update by @imarekkus in #19524

Remove roberta dependency from longformer fast tokenizer by @sirmammingtonham in #19501

made tokenization_roformer independent of bert by @naveennamani in #19426

Remove bert fast dependency from electra by @Threepointone4 in #19520

[Examples] Fix typos in run speech recognition seq2seq by @sanchit-gandhi in #19514

[X-CLIP] Fix doc tests by @NielsRogge in #19523

Update Marian config default vocabulary size by @gante in #19464

Make MobileBert tokenizers independent from Bert by @501Good in #19531

[Whisper] Fix gradient checkpointing by @sanchit-gandhi in #19538

Syntax issues (paragraphs 122, 130, 147, 155) Documentation: @sgugger by @kant in #19437

using trunc_normal for weight init & cls_token by @mathieujouffroy in #19486

Remove MarkupLMForMaskedLM from MODEL_WITH_LM_HEAD_MAPPING_NAMES by @ydshieh in #19534

Image transforms library by @amyeroberts in #18520

Add a decorator for flaky tests by @sgugger in #19498

[Doctest] Add configuration_yolos.py by @daspartho in #19539

Albert config update by @imarekkus in #19541

[Doctest] Add configuration_whisper.py by @daspartho in #19540

Throw an error if getattribute_from_module can't find anything by @ydshieh in #19535

[Doctest] Beit Config for doctest by @daspartho in #19542

Create the arange tensor on device for enabling CUDA-Graph for Clip Encoder by @RezaYazdaniAminabadi in #19503

[Doctest] GPT2 Config for doctest by @daspartho in #19549

Build Push CI images also on a daily basis by @ydshieh in #19532

Fix checkpoint used in MarkupLMConfig by @ydshieh in #19547

add a note to whisper docs clarifying support of long-form decoding by @akashmjn in #19497

[Whisper] Freeze params of encoder by @sanchit-gandhi in #19527

[Doctest] Fixing the Doctest for imageGPT config by @RamitPahwa in #19556

[Doctest] Fixing mobile bert configuration doctest by @RamitPahwa in #19557

[Doctest] Fixing doctest bert_generation configuration by @Threepointone4 in #19558

[Doctest] DeiT Config for doctest by @daspartho in #19560

[Doctest] Reformer Config for doctest by @daspartho in #19562

[Doctest] RoBERTa Config for doctest by @daspartho in #19563

[Doctest] Add configuration_vit.py by @daspartho in #19561

[Doctest] bloom config update by @imarekkus in #19566

[Re-submit] Compute true loss Flax examples by @duongna21 in #19504

Fix fairseq wav2vec2-xls-r pretrained weights conversion scripts by @heatz123 in #19508

[Doctest] CTRL config by @imarekkus in #19574

[Doctest] Add configuration_canine.py by @IzicTemi in #19575

[Doctests] Config files for ViTMAE and YOSO by @grgkaran03 in #19567

Added type hints to DebertaV2ForMultipleChoice Pytorch by @IMvision12 in #19536

[WIP] Add type hints for Lxmert (TF) by @elusenji in #19441

[Doctests] add configuration_blenderbot.py by @grgkaran03 in #19577

[Doctest] adds trajectory_transformer config to Docs test by @SD-13 in #19586

[Doctests] add configuration_blenderbot_small.py by @grgkaran03 in #19589

[Doctest] Swin V2 Config for doctest by @daspartho in #19595

[Doctest] Swin Config for doctest by @daspartho in #19594

[Doctest] SEW Config for doctest by @daspartho in #19597

[Doctest] UniSpeech Config for doctest by @daspartho in #19596

[Doctest] SEW-D Config for doctest by @daspartho in #19598

[Doctest] fix doc test for megatron bert by @RamitPahwa in #19600

Adding type hints for TFXLnet by @thliang01 in #19344

[Doctest] Add configuration_bigbird_pegasus.py and configuration_big_bird.py by @Xabilahu in #19606

Cast masks to np.uint8 before converting to PIL.Image.Image by @amyeroberts in #19616

[Whisper] Don't return attention mask in feat extractor by @sanchit-gandhi in #19521

[Time Series Transformer] Add doc tests by @NielsRogge in #19607

fix BLOOM ONNX config by @NouamaneTazi in #19573

Fix test_tf_encode_plus_sent_to_model for TAPAS by @ydshieh in #19559

Allow usage of TF Text BertTokenizer on TFBertTokenizer to make it servable on TF Serving by @piEsposito in #19590

add gloo backend support for CPU DDP by @sywangyi in #19555

Fix ImageToTextPipelineTests.test_small_model_tf by @ydshieh in #19565

Fix FlaubertTokenizer by @ydshieh in #19552

Visual Bert config for doctest by @ztjhz in #19605

GPTTokenizer dependency removed from deberta class by @RamitPahwa in #19551

xlm roberta config for doctest by @ztjhz in #19609

Ernie config for doctest by @ztjhz in #19611

xlm roberta xl config for doctest by @ztjhz in #19610

fix: small error by @0xflotus in #19612

Improve error messaging for ASR pipeline. by @Narsil in #19570

[Doctest] LeViT Config for doctest by @daspartho in #19622

[Doctest] DistilBERT Config for doctest by @daspartho in #19621

[Whisper] Fix gradient checkpointing (again!) by @sanchit-gandhi in #19548

[Doctest] Add configuration_resnet.py by @daspartho in #19620

Fix whisper doc by @ArthurZucker in #19608

Sharding fails in TF when absolute scope was modified if . in layer name by @ArthurZucker in #19124

[Doctest] Add configuration_vision_text_dual_encoder.py by @SD-13 in #19580

[Doctest] Add configuration_vision_encoder_decoder.py by @SD-13 in #19583

[Doctest] Add configuration_time_series_transformer.py by @SD-13 in #19582

Tokenizer from_pretrained should not use local files named like tokenizer files by @sgugger in #19626

[Doctest] CodeGen config for doctest by @AymenBer99 in #19633

[Doctest] Add configuration_data2vec_text.py by @daspartho in #19636

[Doctest] Conditional DETR config for doctest by @AymenBer99 in #19641

[Doctest] XLNet config for doctest by @AymenBer99 in #19649

[Doctest] Add configuration_trocr.py by @thliang01 in #19658

Add doctest info in testing.mdx by @ArthurZucker in #19623

Add pillow to layoutlmv3 example requirements.txt by @Spacefish in #19663

add return types for tf gptj, xlm, and xlnet by @sirmammingtonham in #19638

Fix pipeline predict transform methods by @s-udhaya in #19657

Type hints MCTCT by @rchan26 in #19618

added type hints for Yolos Pytorch model by @WhiteWolf47 in #19545

A few CI fixes for DocumentQuestionAnsweringPipeline by @ankrgyl in #19584

Removed Bert interdependency from Funnel transformer by @mukesh663 in #19655

fix warnings in deberta by @sanderland in #19458

word replacement line #231 by @shreem-123 in #19662

[Doctest] Add configuration_transfo_xl.py by @thliang01 in #19651

Update perf_train_gpu_one.mdx by @cakiki in #19676

object-detection instead of object_detection by @Spacefish in #19677

add return_tensor parameter for feature extraction by @ajsanjoaquin in #19257

Fix code examples of DETR and YOLOS by @NielsRogge in #19669

Revert "add return_tensor parameter for feature extraction (#19257)" by @sgugger

Fixed the docstring and type hint for forced_decoder_ids option in Ge… by @koreyou in #19640

Add normalize to image transforms module by @amyeroberts in #19544

[Doctest] Data2VecAudio Config for doctest by @daspartho in #19635

Update ESM checkpoints to point to facebook/ by @Rocketknight1 in #19675

Removed XLMModel inheritance from FlaubertModel(torch+tf) by @D3xter1922 in #19432

[Examples] make default preprocessing_num_workers=1 by @Yang-YiFan in #19684

[Doctest] Add configuration_convbert.py by @AymenBer99 in #19643

[Doctest] Add configuration_realm.py by @ak04p in #19646

Update CONTRIBUTING.md by @shreem-123 in #19689

[Doctest] Add configuration_data2vec_vision.py by @daspartho in #19637

Fix some CI torch device issues for PyTorch 1.13 by @ydshieh in #19681

Fix checkpoint used in VisualBertConfig doc example by @ydshieh in #19692

Fix dtype in randomly initialized head by @sgugger in #19690

fix tests by @ArthurZucker in #19670

fix test whisper with new max length by @ArthurZucker in #19668

check decoder_inputs_embeds is None before shifting labels by @ArthurZucker in #19671

Fix docs by @NielsRogge in #19687

update documentation by @ArthurZucker in #19706

Improve DETR models by @NielsRogge in #19644

Small fixes for TF-ESM1b and ESM-1b weight conversions by @Rocketknight1 in #19683

Fix typo in perf docs by @cakiki in #19705

Fix redundant normalization of OWL-ViT text embeddings by @alaradirik in #19712

Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode by @falcaopetri in #18351

[Doctest] CVT config for doctest by @AymenBer99 in #19695

[Doctest] Add configuration_wav2vec2.py to documentation_tests.py by @juancopi81 in #19698

Fixed pegasus config doctest by @mukesh663 in #19722

fix seq2seqtrainer predict without labels by @IvanSedykh in #19721

add return_tensors parameter for feature_extraction 2 by @Narsil in #19707

Improving image-segmentation pipeline tests. by @Narsil in #19710

[Doctest] Adding config files for convnext by @soma2000-lang in #19717

[Doctest] Fixing doctest configuration_pegasus_x.py by @mukesh663 in #19725

Specify TF framework in TF-related pipeline tests by @ydshieh in #19719

Add docs by @NielsRogge in #19729

Fix activations being all the same module by @sgugger in #19728

add accelerate support for Whisper by @younesbelkada in #19697

Clean up deprecation warnings by @Davidy22 in #19654

Repo utils test by @sgugger in #19696

Add decorator to flaky test by @amyeroberts in #19674

[Doctest] Add doctest for FlavaConfig and FNetConfig by @ndrohith09 in #19724

Update contribution guide by @stevhliu in #19700

[Doctest] Add wav2vec2_conformer for doctest by @juancopi81 in #19734

[Doctest] XLM Config for doctest by @AymenBer99 in #19685

[Doctest] Add configuration_clip.py by @daspartho in #19647

[Doctest] GPTNeoConfig , GPTNeoXConfig , GPTNeoXJapaneseConfig by @ndrohith09 in #19741

Update modeling_markuplm.py by @IMvision12 in #19723

Fix issue #19300 by @raghavanone in #19483

[Doctest] Add configuration_wavlm.py by @juancopi81 in #19749

Specify TF framework explicitly in more pipeline tests by @ydshieh in #19748

Fix cache version file creation by @sgugger in #19750

Image transforms add center crop by @amyeroberts in #19718

[Doctest] Add configuration_decision_transformer.py by @Xabilahu in #19751

[Doctest] Add configuration_detr.py by @Xabilahu in #19752

Fixed spacing errors by @shreya24ag in #19754

All broken links were fixed in contributing file by @mdfaizanahmed786 in #19760

[Doctest] SpeechToTextTransformer Config for doctest by @daspartho in #19757

[Doctest] SqueezeBERT Config for doctest by @daspartho in #19758

[Doctest] SpeechToTextTransformer2 Config for doctest by @daspartho in #19756

[Doctest] OpenAIGPTConfig and OPTConfig by @ndrohith09 in #19763

image-segmentation pipeline: re-enable small_model_pt test. by @Narsil in #19716

Update modeling_layoutlmv3.py by @IMvision12 in #19753

adding key pair dataset by @rohit1998 in #19765

Fix exception thrown using MishActivation by @chinoll in #19739

[FLAX] Add dtype to embedding for gpt2 model by @merrymercy in #18462

TF: sample generation compatible with XLA and dynamic batch sizes by @gante in #19773

Install tf2onnx dev version by @ydshieh in #19755

Fix docker image build by @ydshieh in #19759

PT <-> TF for composite models by @ydshieh in #19732

Add warning about restarting runtime to import errors by @Rocketknight1 in #19774

Added support for multivariate independent emission heads by @kashif in #19453

Update ImageToTextPipelineTests.test_small_model_tf by @ydshieh in #19785

Make public versions of private tensor utils by @sgugger in #19775

Update training.mdx by @ftorres16 in #19791

[ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. by @davialvb in #19779

Add sentencepiece to BertJapaneseTokenizer by @conan1024hao in #19769

Fix CTRL test_torchscrip_xxx CI by updating _create_and_check_torchscript by @ydshieh in #19786

Fix nightly test setup by @sgugger in #19792

Fix image segmentation pipeline errors, resolve backward compatibility issues by @alaradirik in #19768

Fix error/typo in docstring of TokenClassificationPipeline by @pchr8 in #19798

Use None to detect if truncation was unset by @sgugger in #19794

Generate: contrastive search test updates by @gante in #19787

Run some TF Whisper tests in subprocesses to avoid GPU OOM by @ydshieh in #19772

Added translation of run_scripts.mdx to Portuguese Issue #16824 by @davialvb in #19800

Generate: minor docstring fix by @gante in #19801

[Doctest] MaskFormerConfig doctest by @sha016 in #19817

[Doctest] Add configuration_plbart.py by @ayaka14732 in #19809

[Doctest] Add configuration_poolformer.py by @ayaka14732 in #19808

[Doctest] Add configuration_electra.py by @ayaka14732 in #19807

[Doctest] Add configuration_nezha.py by @ayaka14732 in #19810

Display the number of trainable parameters when launching a training by @regisss in #19835

replace reference to Datasets in metrics deprecation with Evaluate by @angus-lherrou in #19812

Fix OOM in Config doctest by @ydshieh in #19840

fix broken links in testing.mdx by @XFFXFF in #19820

fix image2test args forwarding by @kventinel in #19648

Added translation of converting_tensorflow_models.mdx to Portuguese Issue #16824 by @davialvb in #19824

Fix nightly CircleCI by @ydshieh in #19837

fixed typo in fp16 training section for perf_train_gpu_one by @dsingal0 in #19736

Update LEDModelIntegrationTests expected values by @ydshieh in #19841

Improve check copies by @kventinel in #19829

Fix doctest for MarkupLM by @ydshieh in #19845

add small updates only by @stevhliu in #19847

Refactor conversion function by @sgugger in #19799

Spanish translation of multiple_choice.mdx, question_answering.mdx. by @alceballosa in #19821

Fix doctest for GenerationMixin.contrastive_search by @ydshieh in #19863

Add missing lang tokens in M2M100Tokenizer.get_vocab by @guillaumekln in #18416

Added translation of serialization.mdx to Portuguese Issue #16824 by @davialvb in #19869

Generate: contrastive search cosmetic tweaks by @gante in #19871

[Past CI] Vilt only supports PT >= v1.10 by @LysandreJik in #19851

Fix incorrect model<->tokenizer mapping in tokenization testing by @ydshieh in #19872

Update doc for revision and token by @sgugger in #19793

Factored out some code in the image-segmentation pipeline. by @Narsil in #19727

[DOCTEST] Config doctest for MCTCT, MBart and LayoutLM by @Revanth2002 in #19889

Fix LR by @regisss in #19875

Correct README image text by @KayleeDavisGitHub in #19883

No conv bn folding in ipex to avoid warning by @sanderland in #19870

Add missing information on token_type_ids for roberta model by @raghavanone in #19766

Change the import of kenlm from github to pypi by @raghavanone in #19770

Update max_diff in test_save_load_fast_init_to_base by @ydshieh in #19849

Allow flax subfolder by @patrickvonplaten in #19902

accelerate support for RoBERTa family by @younesbelkada in #19906

Add checkpoint links in a few config classes by @ydshieh in #19910

Generate: contrastive search uses existing abstractions and conventions by @gante in #19896

Convert None logits processor/stopping criteria to empty list. by @ccmaymay in #19880

Some fixes regarding auto mappings and test class names by @ydshieh in #19923

Fix bug in Wav2Vec2's GPU tests by @falcaopetri in #19803

Fix warning when collating list of numpy arrays by @sgugger in #19846

Add type hints to TFPegasusModel by @EdAbati in #19858

Remove embarrassing debug print() in save_pretrained by @Rocketknight1 in #19922

Add accelerate support for M2M100 by @younesbelkada in #19912

Add RoBERTa resources by @stevhliu in #19911

Add T5 resources by @stevhliu in #19878

Add BLOOM resources by @stevhliu in #19881

Add GPT2 resources by @stevhliu in #19879

Let inputs of fast tokenizers be tuples as well as lists by @sgugger in #19898

Add accelerate support for BART-like models by @younesbelkada in #19927

Create dummy models by @ydshieh in #19901

Support segformer fx by @dwlim-nota in #19924

Use self._trial to generate trial_name for Trainer. by @reyoung in #19874

Add Onnx Config for ImageGPT by @RaghavPrabhakar66 in #19868

Update Code of Conduct to Contributor Covenant v2.1 by @pankali in #19935

add resources for bart by @stevhliu in #19928

add resources for distilbert by @stevhliu in #19930

Add wav2vec2 resources by @stevhliu in #19931

[Conditional, Deformable DETR] Add postprocessing methods by @NielsRogge in #19709

Fix ONNX tests for ONNX Runtime v1.13.1 by @lewtun in #19950

donut -> donut-swin by @ydshieh in #19920

[Doctest] Add configuration_deberta.py by @Saad135 in #19968

gradient checkpointing for GPT-NeoX by @chiaolun in #19946

[modelcard] Update for ASR by @sanchit-gandhi in #19985

[ASR] Update 'tasks' for model card by @sanchit-gandhi in #19986

Transformers documentation translation to Italian #17459 by @draperkm in #19988

Pin torch to < 1.13 temporarily by @ydshieh in #19989

Add support for gradient checkpointing by @NielsRogge in #19990

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@arnaudstiegler

Make LayoutLM tokenizers independent from BertTokenizer (#19351)

@asofiaoliveira

Make XLMRoberta model and config independent from Roberta (#19359)

@srhrshr

Decouples XLMProphet model from Prophet (#19406)

@Davidy22

Make bert_japanese and cpm independent of their inherited modules (#19431)

Clean up deprecation warnings (#19654)

@mathieujouffroy

[CvT] Tensorflow implementation (#18597)

using trunc_normal for weight init & cls_token (#19486)

@IMvision12

New (#19481)

Added type hints to DebertaV2ForMultipleChoice Pytorch (#19536)

Update modeling_markuplm.py (#19723)

Update modeling_layoutlmv3.py (#19753)

@501Good

Make MobileBert tokenizers independent from Bert (#19531)

@mukesh663

Removed Bert interdependency from Funnel transformer (#19655)

Fixed pegasus config doctest (#19722)

[Doctest] Fixing doctest configuration_pegasus_x.py (#19725)

@D3xter1922

Removed XLMModel inheritance from FlaubertModel(torch+tf) (#19432)

@falcaopetri

Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (#18351)

Fix bug in Wav2Vec2's GPU tests (#19803)

@gmftbyGMFTBY

Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (#19477)

@davialvb

[ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. (#19779)

Added translation of run_scripts.mdx to Portuguese Issue #16824 (#19800)

Added translation of converting_tensorflow_models.mdx to Portuguese Issue #16824 (#19824)

Added translation of serialization.mdx to Portuguese Issue #16824 (#19869)

@alceballosa

Spanish translation of multiple_choice.mdx, question_answering.mdx. (#19821)

v4.23.1 (Oct 11, 2022)
Fixes a revert, introduced by mistake, that broke the "automatic-speech-recognition" pipeline for Whisper.

Fix whisper for pipeline by @ArthurZucker in #19482

v4.23.0 (Oct 10, 2022)
Whisper

The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting, in multiple languages.

Add WhisperModel to transformers by @ArthurZucker in #19166

Add TF whisper by @amyeroberts in #19378

Deformable DETR

The Deformable DETR model was proposed in Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

Deformable DETR mitigates the slow convergence and limited feature spatial resolution of the original DETR by leveraging a new deformable attention module, which only attends to a small set of key sampling points around a reference point.
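As a rough illustration of the idea, the sketch below computes one deformable-attention output for a single query on a single-channel feature map: it samples a few points at `reference + offset` with bilinear interpolation and mixes them with softmax weights. It is a simplified sketch, not the library's implementation: in the actual model the offsets and attention weights are predicted from the query by learned layers, and attention is multi-head and spans several feature-map scales. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # rows of a single-channel 2D feature map


def bilinear_sample(feature: FeatureMap, x: float, y: float) -> float:
    """Bilinearly interpolate `feature` at continuous coordinates (x, y).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    height, width = len(feature), len(feature[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            if 0 <= xi < width and 0 <= yi < height:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feature[yi][xi]
    return value


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(
    feature: FeatureMap,
    reference: Tuple[float, float],
    offsets: Sequence[Tuple[float, float]],
    attention_logits: Sequence[float],
) -> float:
    """Attend only to len(offsets) sampling points around the reference point.

    In the real model, `offsets` and `attention_logits` are predicted from the
    query; here they are passed in directly to keep the sketch small.
    """
    weights = softmax(attention_logits)
    ref_x, ref_y = reference
    return sum(
        w * bilinear_sample(feature, ref_x + off_x, ref_y + off_y)
        for w, (off_x, off_y) in zip(weights, offsets)
    )


# A 4x4 map whose value at (x, y) is 4 * y + x.
feature = [[float(4 * row + col) for col in range(4)] for row in range(4)]
output = deformable_attention(
    feature,
    reference=(1.5, 1.5),
    offsets=[(-0.5, -0.5), (0.5, 0.5)],
    attention_logits=[0.0, 0.0],
)
print(output)  # samples 5.0 and 10.0 with equal weights -> 7.5
```

Because each query reads only a handful of sampled locations rather than every position in the map, the per-query cost does not grow with the feature-map size, which is what makes higher-resolution features affordable.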

Add Deformable DETR by @NielsRogge in #17281

[fix] Add DeformableDetrFeatureExtractor by @NielsRogge in #19140

Conditional DETR

The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.

Add support for conditional detr by @DeppMeng in #18948

Improve conditional detr docs by @NielsRogge in #19154

Time Series Transformer

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.
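The difference between the two regimes can be sketched in plain Python. During training, the decoder input at each step is the ground-truth previous value (the target shifted right by one); at inference, each prediction is fed back in as the next input. `predict_next` is a hypothetical stand-in for the model. The actual Time Series Transformer predicts the parameters of a distribution and samples from it, so repeating the loop yields several sample paths rather than one deterministic forecast.

```python
from typing import Callable, List, Tuple


def teacher_forcing_pairs(
    context: List[float], future: List[float]
) -> List[Tuple[float, float]]:
    """Return (decoder_input, label) pairs used during training.

    The decoder always sees the ground-truth previous value: the target
    sequence shifted right by one, starting from the last context value.
    """
    decoder_inputs = [context[-1]] + future[:-1]
    return list(zip(decoder_inputs, future))


def autoregressive_forecast(
    context: List[float],
    horizon: int,
    predict_next: Callable[[List[float]], float],
) -> List[float]:
    """Generate `horizon` steps, feeding each prediction back as input."""
    history = list(context)
    forecast = []
    for _ in range(horizon):
        next_value = predict_next(history)
        forecast.append(next_value)
        history.append(next_value)  # no ground truth is available at inference
    return forecast


# Hypothetical toy "model": continue the most recent linear trend.
def linear_trend(history: List[float]) -> float:
    return history[-1] + (history[-1] - history[-2])


print(teacher_forcing_pairs([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
# [(3.0, 4.0), (4.0, 5.0), (5.0, 6.0)]
print(autoregressive_forecast([1.0, 2.0, 3.0], horizon=3, predict_next=linear_trend))
# [4.0, 5.0, 6.0]
```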

:warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs, or slight breaking changes made to fix them in the future. If you see something strange, file a GitHub issue.

time series forecasting model by @kashif in #17965

Masked Siamese Networks

The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

MSN (masked siamese networks) consists of a joint-embedding architecture that matches the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.
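The core matching step can be sketched as follows: the embeddings of a masked (anchor) view and an unmasked (target) view are each turned into a soft assignment over a set of prototypes, and the loss is the cross-entropy between the two assignments. This is a simplified sketch. It omits the encoders, the moving-average target network and the entropy regularizer used in MSN, and the temperatures (0.1 for the anchor, a sharper 0.025 for the target) are assumed values chosen for illustration. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_probs(z: Vector, prototypes: Sequence[Vector], temperature: float) -> List[float]:
    """Soft assignment of embedding `z` to each prototype: softmax(cosine / T)."""
    logits = [cosine(z, p) / temperature for p in prototypes]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msn_style_loss(
    anchor: Vector,
    target: Vector,
    prototypes: Sequence[Vector],
    anchor_temperature: float = 0.1,
    target_temperature: float = 0.025,  # lower temperature = sharper target
) -> float:
    """Cross-entropy H(p_target, p_anchor) between prototype assignments."""
    p_anchor = prototype_probs(anchor, prototypes, anchor_temperature)
    p_target = prototype_probs(target, prototypes, target_temperature)
    return -sum(t * math.log(a) for t, a in zip(p_target, p_anchor))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
masked_view = [1.0, 0.1]
agreeing = msn_style_loss(masked_view, [1.0, 0.0], prototypes)
disagreeing = msn_style_loss(masked_view, [0.0, 1.0], prototypes)
print(agreeing < disagreeing)  # True: matching assignments give a lower loss
```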

MSN (Masked Siamese Networks) for ViT by @sayakpaul in #18815

MarkupLM

The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to LayoutLM.

The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks: WebSRC and SWDE.
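The extra inputs MarkupLM relies on are the XPath expressions of the text nodes in a page. `transformers` provides `MarkupLMFeatureExtractor` for this step (it depends on BeautifulSoup). The sketch below only illustrates the idea using the standard library's `html.parser`. It assumes well-formed HTML and always writes a 1-based sibling index such as `div[2]`, which simplifies the exact XPath format MarkupLM uses. `XPathTextExtractor` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Elements that never have a closing tag and therefore never contain text.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class XPathTextExtractor(HTMLParser):
    """Collect (text, xpath) pairs for every non-empty text node."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Tuple[str, int]] = []  # open elements: (tag, sibling index)
        self.child_counts: List[Dict[str, int]] = [{}]  # per open element: tag -> count
        self.nodes: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Assumes well-formed HTML: only pop when the closing tag matches.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            xpath = "".join(f"/{tag}[{index}]" for tag, index in self.stack)
            self.nodes.append((text, xpath))


html = (
    "<html><body><div><p>Price</p></div>"
    "<div><p>$10</p><p>In stock</p></div></body></html>"
)
parser = XPathTextExtractor()
parser.feed(html)
parser.close()
for text, xpath in parser.nodes:
    print(xpath, text)
```

Each XPath is then split into its tag names and subscripts, which MarkupLM embeds alongside the usual token embeddings.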

Add MarkupLM by @NielsRogge in #19198

Security & safety

We explore a new serialization format that does not use pickle, which we can then leverage in all three frameworks we support: PyTorch, TensorFlow, and JAX. We use the safetensors library for this.

Support is for PyTorch models only at this stage, and still experimental.
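To make the "no pickle" point concrete, here is a minimal sketch of the safetensors file layout as described in the format's documentation. A file starts with an 8-byte little-endian unsigned integer giving the header length. That is followed by a JSON header that maps each tensor name to its `dtype`, `shape` and `data_offsets` (byte offsets into the data buffer), and finally by the raw little-endian tensor bytes. Loading only parses JSON and copies bytes, so no arbitrary code runs. The sketch handles float32 only and uses flat Python lists instead of framework tensors. In practice you would use the library's own helpers, such as `safetensors.torch.save_file` and `load_file`. `save_f32` and `load_f32` are hypothetical names.

```python
import json
import struct
from typing import Dict, List, Tuple

Tensors = Dict[str, Tuple[Tuple[int, ...], List[float]]]  # name -> (shape, flat values)


def save_f32(tensors: Tensors) -> bytes:
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            "data_offsets": [offset, offset + len(data)],
        }
        chunks.append(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian unsigned header length, JSON header, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_f32(blob: bytes) -> Tensors:
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    buffer = blob[8 + header_len :]
    tensors: Tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form string metadata
            continue
        if info["dtype"] != "F32":
            raise ValueError(f"this sketch only handles F32, got {info['dtype']}")
        begin, end = info["data_offsets"]
        count = (end - begin) // 4
        values = list(struct.unpack_from(f"<{count}f", buffer, begin))
        tensors[name] = (tuple(info["shape"]), values)
    return tensors


blob = save_f32({"weight": ((2, 2), [1.0, 2.5, -3.0, 0.5]), "bias": ((2,), [0.25, -1.0])})
print(load_f32(blob))
```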

Poc to use safetensors by @sgugger in #19175

Computer vision post-processing methods overhaul

The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments and outputs. :warning: The existing methods that are superseded by the introduced methods post_process_object_detection, post_process_semantic_segmentation, post_process_instance_segmentation, post_process_panoptic_segmentation are now deprecated.
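For DETR-style detectors, object-detection post-processing roughly consists of four steps. First, apply a softmax over each query's class logits. Second, discard the extra "no object" class. Third, keep queries whose best score exceeds a threshold. Finally, convert the predicted boxes from normalized `(center_x, center_y, width, height)` to absolute corner coordinates. The pure-Python sketch below illustrates these steps for a single image. It is not the library's `post_process_object_detection` implementation, which operates on batched tensors, and `post_process_detections` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def center_to_corners(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """(cx, cy, w, h) -> (x0, y0, x1, y1), still in normalized units."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def post_process_detections(
    class_logits: Sequence[Sequence[float]],
    pred_boxes: Sequence[Sequence[float]],
    image_size: Tuple[int, int],
    threshold: float = 0.5,
) -> List[Dict]:
    """Turn raw per-query predictions into scored, labeled pixel boxes.

    class_logits: one row per query with num_classes + 1 entries, the last
        being the "no object" class.
    pred_boxes: one normalized (cx, cy, w, h) box per query.
    image_size: (height, width) of the original image in pixels.
    """
    height, width = image_size
    detections = []
    for logits, box in zip(class_logits, pred_boxes):
        probs = softmax(logits)[:-1]  # drop the "no object" class
        score = max(probs)
        if score <= threshold:
            continue
        x0, y0, x1, y1 = center_to_corners(box)
        detections.append({
            "score": score,
            "label": probs.index(score),
            "box": (x0 * width, y0 * height, x1 * width, y1 * height),
        })
    return detections


detections = post_process_detections(
    class_logits=[[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]],  # 2 classes + "no object"
    pred_boxes=[[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]],
    image_size=(100, 200),
)
print(detections)  # only the first query survives the threshold
```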

Improve DETR post-processing methods by @alaradirik in #19205

Beit postprocessing by @alaradirik in #19099

Fix BeitFeatureExtractor postprocessing by @alaradirik in #19119

Add post_process_semantic_segmentation method to SegFormer by @alaradirik in #19072

Add post_process_semantic_segmentation method to DPTFeatureExtractor by @alaradirik in #19107

Add semantic segmentation post-processing method to MobileViT by @alaradirik in #19105

Detr preprocessor fix by @alaradirik in #19007

Improve and fix ImageSegmentationPipeline by @alaradirik in #19367

Restructure DETR post-processing, return prediction scores by @alaradirik in #19262

Maskformer post-processing fixes and improvements by @alaradirik in #19172

Fix MaskFormer failing postprocess tests by @alaradirik in #19354

Fix DETR segmentation postprocessing output by @alaradirik in #19363

fix docs example, add object_detection to DETR docs by @alaradirik in #19377

🚨 Breaking changes

The following changes are bugfixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what changes were made.

Breaking change for ViT parameter initialization

🚨🚨🚨 Fix ViT parameter initialization by @alaradirik in #19341

Breaking change for the top_p argument of the TopPLogitsWarper of the generate method.

🚨🚨🚨 Optimize Top P Sampler and fix edge case by @ekagra-ranjan in #18984
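For context, top-p (nucleus) filtering keeps the smallest set of most-probable tokens whose cumulative probability reaches `top_p`, and masks out the rest before sampling. The sketch below shows that generic technique in pure Python. It is not the code of `TopPLogitsWarper` and does not reproduce the specific edge case fixed in the PR above. The `min_tokens_to_keep` parameter mirrors the warper's argument of the same name, and `top_p_filter` is a hypothetical name.

```python
import math
from typing import List, Sequence


def top_p_filter(
    logits: Sequence[float], top_p: float, min_tokens_to_keep: int = 1
) -> List[float]:
    """Set logits outside the nucleus to -inf."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Visit tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for rank, index in enumerate(order):
        # Stop once the kept mass has reached top_p (so the token that crosses
        # the threshold is kept) and at least min_tokens_to_keep are kept.
        if rank >= min_tokens_to_keep and cumulative >= top_p:
            break
        keep.add(index)
        cumulative += probs[index]
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
filtered = top_p_filter(logits, top_p=0.7)
print([i for i, x in enumerate(filtered) if x != -math.inf])  # [0, 1]
```

The filtered logits are then renormalized with a softmax and sampled from as usual.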

Model head additions

OPT and BLOOM now have question answering heads available.

Add OPTForQuestionAnswering by @clementapa in #19402

Add BloomForQuestionAnswering by @younesbelkada in #19310
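Question-answering heads of this kind are extractive: the `*ForQuestionAnswering` models output a start logit and an end logit for every token, and the answer is the span with the highest combined score. Below is a minimal span-selection sketch, assuming plain Python lists of logits for one sequence. The question-answering pipeline adds further steps, such as restricting spans to context tokens, which are omitted here. `best_span` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[Tuple[int, int], float]:
    """Return ((start, end), score) maximizing start_logits[s] + end_logits[e].

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_logits):
        for end in range(start, min(start + max_answer_len, len(end_logits))):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best, best_score


tokens = ["the", "cat", "sat", "down"]
(start, end), score = best_span([0.1, 3.0, 0.2, 0.0], [0.0, 0.5, 2.5, 0.1])
print(" ".join(tokens[start : end + 1]), score)  # cat sat 5.5
```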

Pipelines

There is now a zero-shot object detection pipeline.

Add ZeroShotObjectDetectionPipeline (#18445) by @sahamrit in #18930

TensorFlow architectures

The GroupViT model is now available in TensorFlow.

[TensorFlow] Adding GroupViT by @ariG23498 in #18020

Bugfixes and improvements

Fix a broken link for deepspeed ZeRO inference in the docs by @nijkah in #19001

[doc] debug: fix import by @stas00 in #19042

[bnb] Small improvements on utils by @younesbelkada in #18646

Update image segmentation pipeline test by @amyeroberts in #18731

Fix test_save_load for TFViTMAEModelTest by @ydshieh in #19040

Pin minimum PyTorch version for BLOOM ONNX export by @lewtun in #19046

Update serving signatures and make sure we actually use them by @Rocketknight1 in #19034

Move cache: expand error message by @sgugger in #19051

Fixing OPT fast tokenizer option. by @Narsil in #18753

Fix custom tokenizers test by @sgugger in #19052

Run torchdynamo tests by @ydshieh in #19056

[fix] Add DeformableDetrFeatureExtractor by @NielsRogge in #19140

fix arg name in BLOOM testing and remove unused arg document by @shijie-wu in #18843

Adds package and requirement spec output to version check exception by @colindean in #18702

fix use_cache by @younesbelkada in #19060

FX support for ConvNext, Wav2Vec2 and ResNet by @michaelbenayoun in #19053

[doc] Fix link in PreTrainedModel documentation by @tomaarsen in #19065

Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input by @jimypbr in #18746

Organize test jobs by @sgugger in #19058

Automatically tag CLIP repos as zero-shot-image-classification by @osanseviero in #19064

Fix LeViT checkpoint by @ydshieh in #19069

TF: tests for (de)serializable models with resized tokens by @gante in #19013

Add type hints for PyTorch UniSpeech, MPNet and Nystromformer by @daspartho in #19039

replace logger.warn by logger.warning by @fxmarty in #19068

Fix tokenizer load from one file by @sgugger in #19073

Note about developer mode by @LysandreJik in #19075

german autoclass by @flozi00 in #19049

Add tests for legacy load by url and fix bugs by @sgugger in #19078

Add runner availability check by @ydshieh in #19054

fix working dir by @ydshieh in #19101

Added type hints for TFConvBertModel by @kishore-s-15 in #19088

Added Type hints for VIT MAE by @kishore-s-15 in #19085

Add type hints for TF MPNet models by @kishore-s-15 in #19089

Added type hints to ResNetForImageClassification by @kishore-s-15 in #19084

added type hints by @daspartho in #19076

Improve vision models docs by @NielsRogge in #19103

correct spelling in README by @flozi00 in #19092

Don't warn of move if cache is empty by @sgugger in #19109

HPO: keep the original logic if there's only one process, pass the trial to trainer by @sywangyi in #19096

Add documentation of Trainer.create_model_card by @sgugger in #19110

Added type hints for YolosForObjectDetection by @kishore-s-15 in #19086

Fix the wrong schedule by @ydshieh in #19117

Change document question answering pipeline to always return an array by @ankrgyl in #19071

german processing by @flozi00 in #19121

Fix: update ltp word segmentation call in mlm_wwm by @xyh1756 in #19047

Add a missing space in a script arg documentation by @bryant1410 in #19113

Skip test_export_to_onnx for LongT5 if torch < 1.11 by @ydshieh in #19122

Fix GLUE MNLI when using max_eval_samples by @lvwerra in #18722

[BugFix] Fix fsdp option on shard_grad_op. by @ZHUI in #19131

Fix FlaxPreTrainedModel pt weights check by @mishig25 in #19133

support deps from GitHub by @lhoestq in #19141

Fix dummy creation for multi-frameworks objects by @sgugger in #19144

Allowing users to use the latest tokenizers release ! by @Narsil in #19139

Add some tests for check_dummies by @sgugger in #19146

Fixed typo in generation_utils.py by @nbalepur in #19145

Add accelerate support for ViLT by @younesbelkada in #18683

TF: check embeddings range by @gante in #19102

Reduce LR for TF MLM example test by @Rocketknight1 in #19156

update perf_train_cpu_many doc by @sywangyi in #19151

fix: ckpt paths. by @sayakpaul in #19159

Fix TrainingArguments documentation by @sgugger in #19162

fix HPO DDP GPU problem by @sywangyi in #19168

[WIP] Trainer supporting evaluation on multiple datasets by @timbmg in #19158

Add doctests to Perceiver examples by @stevenmanton in #19129

Add offline runners info in the Slack report by @ydshieh in #19169

Fix incorrect comments about atten mask for pytorch backend by @lygztq in #18728

Fixed type hint for pipelines/check_task by @Fei-Wang in #19150

Update run_clip.py by @enze5088 in #19130

german training, accelerate and model sharing by @flozi00 in #19171

Separate Push CI images from Scheduled CI by @ydshieh in #19170

Remove pos arg from Perceiver's Pre/Postprocessors by @aielawady in #18602

Use assertAlmostEqual in BloomEmbeddingTest.test_logits by @ydshieh in #19200

Move the model type check by @ankrgyl in #19027

Use repo_type instead of deprecated datasets repo IDs by @sgugger in #19202

Updated hf_argparser.py by @IMvision12 in #19188

Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor by @ydshieh in #19203

Fix cached_file in offline mode for cached non-existing files by @sgugger in #19206

Remove unused cur_len in generation_utils.py by @ekagra-ranjan in #18874

add wav2vec2_alignment by @arijitx in #16782

add doc for hyperparameter search by @sywangyi in #19192

Add a use_parallel_residual argument to control the residual computing way by @NinedayWang in #18695

translated add_new_pipeline by @nickprock in #19215

More tests for regression in cached non existence by @sgugger in #19216

Use math.pi instead of torch.pi in MaskFormer by @ydshieh in #19201

Added tests for yaml and json parser by @IMvision12 in #19219

Fix small use_cache typo in the docs by @ankrgyl in #19191

Generate: add warning when left padding should be used by @gante in #19067

Fix deprecation warning for return_all_scores by @ogabrielluiz in #19217

Fix doctest for TFDeiTForImageClassification by @ydshieh in #19173

Document and validate typical_p in generation by @mapmeld in #19128

Fix trainer seq2seq qa.py evaluate log and ft script by @iamtatsuki05 in #19208

Fix cache names in CircleCI jobs by @ydshieh in #19223

Move AutoClasses under Main Classes by @stevhliu in #19163

Focus doc around preprocessing classes by @stevhliu in #18768

Fix confusing working directory in Push CI by @ydshieh in #19234

XGLM - Fix Softmax NaNs when using FP16 by @gsarti in #18057

Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+ by @michaelbenayoun in #19233

Fix m2m_100.mdx doc example missing labels by @Mustapha-AJEGHRIR in #19149

Fix opt softmax small nit by @younesbelkada in #19243

Use hf_raise_for_status instead of deprecated _raise_for_status by @Wauplin in #19244

Fix TrainingArgs argument serialization by @atturaioe in #19239

Fix test fetching for examples by @sgugger in #19237

Cast TF generate() inputs by @Rocketknight1 in #19232

Skip pipeline tests by @sgugger in #19248

Add job names in Past CI artifacts by @ydshieh in #19235

Update Past CI report script by @ydshieh in #19228

[Wav2Vec2] Fix None loss in doc examples by @rbsteinm in #19218

Catch HFValidationError in TrainingSummary by @ydshieh in #19252

Add expected output to the sample code for ViTMSNForImageClassification by @sayakpaul in #19183

Add stop sequence to text generation pipeline by @KMFODA in #18444

Add notebooks by @JingyaHuang in #19259

Add beautifulsoup4 to the dependency list by @ydshieh in #19253

Fix Encoder-Decoder testing issue about repo. names by @ydshieh in #19250

Fix cached lookup filepath on windows for hub by @kjerk in #19178

Docs - Guide to add a new TensorFlow model by @gante in #19256

Update no_trainer script for summarization by @divyanshugit in #19277

Don't automatically add bug label by @sgugger in #19302

Breakup export guide by @stevhliu in #19271

Update Protobuf dependency version to fix known vulnerability by @qthequartermasterman in #19247

Update README.md by @ShubhamJagtap2000 in #19309

[Docs] Fix link by @patrickvonplaten in #19313

Fix for sequence regression fit() in TF by @Rocketknight1 in #19316

Added Type hints for LED TF by @IMvision12 in #19315

Added type hints for TF: rag model by @debjit-bw in #19284

alter retrived to retrieved by @gouqi666 in #18863

ci(stale.yml): upgrade actions/setup-python to v4 by @oscard0m in #19281

ci(workflows): update actions/checkout to v3 by @oscard0m in #19280

wrap forward passes with torch.no_grad() by @daspartho in #19279

wrap forward passes with torch.no_grad() by @daspartho in #19278

wrap forward passes with torch.no_grad() by @daspartho in #19274

wrap forward passes with torch.no_grad() by @daspartho in #19273

Removing BertConfig inheritance from LayoutLMConfig by @arnaudstiegler in #19307

docker-build: Update actions/checkout to v3 by @Sushrut1101 in #19288

Clamping hidden state values to allow FP16 by @SSamDav in #19229

Remove interdependency from OpenAI tokenizer by @E-Aho in #19327

removing XLMConfig inheritance from FlaubertConfig by @D3xter1922 in #19326

Removed interdependency of BERT's Tokenizer in tokenization of prophetnet by @divyanshugit in #19331

Remove bert interdependency from clip tokenizer by @shyamsn97 in #19332

[WIP]remove XLMTokenizer inheritance from FlaubertTokenizer by @D3xter1922 in #19330

Making camembert independent from roberta, clean by @Mustapha-AJEGHRIR in #19337

Add sudachi and jumanpp tokenizers for bert_japanese by @r-terada in #19043

Frees LongformerTokenizer of the Roberta dependency by @srhrshr in #19346

Change BloomConfig docstring by @younesbelkada in #19336

Test failing test while we resolve the issue. by @sgugger in #19355

Call _set_save_spec() when creating TF models by @Rocketknight1 in #19321

correct typos in README by @paulaxisabel in #19304

Removes Roberta and Bert config dependencies from Longformer by @srhrshr in #19343

Fix gather for metrics by @muellerzr in #19360

Fix pipeline tests for Roberta-like tokenizers by @sgugger in #19365

Change link of repojacking vulnerable link by @Ilaygoldman in #19393

Making ConvBert Tokenizer independent from bert Tokenizer by @IMvision12 in #19347

Fix gather for metrics by @muellerzr in #19389

Added Type hints for XLM TF by @IMvision12 in #19333

add ONNX support for swin transformer by @bibhabasumohapatra in #19390

removes prophet config dependencies from xlm-prophet by @srhrshr in #19400

Added type hints for TF: TransfoXL by @thliang01 in #19380

HF <-> megatron checkpoint reshaping and conversion for GPT by @pacman100 in #19317

Remove unneeded words from audio-related feature extractors by @osanseviero in #19405

edit: cast attention_mask to long in DataCollatorCTCWithPadding by @ddobokki in #19369

Copy BertTokenizer dependency into retribert tokenizer by @Davidy22 in #19371

Export TensorFlow models to ONNX with dynamic input shapes by @dwyatte in #19255

update attention mask handling by @ArthurZucker in #19385

Remove dependency of Bert from Squeezebert tokenizer by @rchan26 in #19403

Removed Bert and XLM Dependency from Herbert by @harry7337 in #19410

Clip device map by @patrickvonplaten in #19409

Remove Dependency between Bart and LED (slow/fast) by @Infrared1029 in #19408

Removed Bert interdependency in tokenization_electra.py by @OtherHorizon in #19356

Make Camembert TF version independent from Roberta by @Mustapha-AJEGHRIR in #19364

Removed Bert dependency from BertGeneration code base. by @Threepointone4 in #19370

Rework pipeline tests by @sgugger in #19366

Fix ViTMSNForImageClassification doctest by @ydshieh in #19275

Skip BloomEmbeddingTest.test_embeddings for PyTorch < 1.10 by @ydshieh in #19261

remove RobertaConfig inheritance from MarkupLMConfig by @D3xter1922 in #19404

Backtick fixed (paragraph 68) by @kant in #19440

Fixed duplicated line (paragraph #83) Documentation: @sgugger by @kant in #19436

fix marianMT conversion to ONNX by @kventinel in #19287

Fix typo in image-classification/README.md by @zhawe01 in #19424

Stop relying on huggingface_hub's private methods by @LysandreJik in #19392

Add onnx support for VisionEncoderDecoder by @mht-sharma in #19254

Remove dependency of Roberta in Blenderbot by @rchan26 in #19411

fix: renamed variable name by @ariG23498 in #18850

Fix the error message in run_t5_mlm_flax.py by @yangky11 in #19282

Add Italian translation for add_new_model.mdx by @Steboss89 in #18713

Fix momentum and epsilon values by @amyeroberts in #19454

Generate: corrected exponential_decay_length_penalty type hint by @ShivangMishra in #19376

Fix misspelled word in docstring by @Bearnardd in #19415

Fixed a non-working hyperlink in the README.md file by @MikailINTech in #19434

fix by @ydshieh in #19469

wrap forward passes with torch.no_grad() by @daspartho in #19439

wrap forward passes with torch.no_grad() by @daspartho in #19438

wrap forward passes with torch.no_grad() by @daspartho in #19416

wrap forward passes with torch.no_grad() by @daspartho in #19414

wrap forward passes with torch.no_grad() by @daspartho in #19413

wrap forward passes with torch.no_grad() by @daspartho in #19412

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@flozi00

german autoclass (#19049)

correct spelling in README (#19092)

german processing (#19121)

german training, accelerate and model sharing (#19171)

@DeppMeng

Add support for conditional detr (#18948)

@sayakpaul

MSN (Masked Siamese Networks) for ViT (#18815)

fix: ckpt paths. (#19159)

Add expected output to the sample code for ViTMSNForImageClassification (#19183)

@IMvision12

Updated hf_argparser.py (#19188)

Added tests for yaml and json parser (#19219)

Added Type hints for LED TF (#19315)

Making ConvBert Tokenizer independent from bert Tokenizer (#19347)

Added Type hints for XLM TF (#19333)

@ariG23498

[TensorFlow] Adding GroupViT (#18020)

fix: renamed variable name (#18850)

@Mustapha-AJEGHRIR

Fix m2m_100.mdx doc example missing labels (#19149)

Making camembert independent from roberta, clean (#19337)

Make Camembert TF version independent from Roberta (#19364)

@D3xter1922

removing XLMConfig inheritance from FlaubertConfig (#19326)

[WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (#19330)

remove RobertaConfig inheritance from MarkupLMConfig (#19404)

@srhrshr

Frees LongformerTokenizer of the Roberta dependency (#19346)

Removes Roberta and Bert config dependencies from Longformer (#19343)

removes prophet config dependencies from xlm-prophet (#19400)

@sahamrit

[WIP] Add ZeroShotObjectDetectionPipeline (#18445) (#18930)

@Davidy22

Copy BertTokenizer dependency into retribert tokenizer (#19371)

@rchan26

Remove dependency of Bert from Squeezebert tokenizer (#19403)

Remove dependency of Roberta in Blenderbot (#19411)

@harry7337

Removed Bert and XLM Dependency from Herbert (#19410)

@Infrared1029

Remove Dependency between Bart and LED (slow/fast) (#19408)

@Steboss89

Add Italian translation for add_new_model.mdx (#18713)

v4.22.2 (Sep 27, 2022)
Fixes a bug where a cached tokenizer or model could no longer be accessed offline (whether offline mode was forced or the internet connection failed).

More tests for regression in cached non existence by @sgugger in #19216

Fix cached_file in offline mode for cached non-existing files by @sgugger in #19206

Don't warn of move if cache is empty by @sgugger in #19109

v4.22.1 (Sep 16, 2022)
Patch release for the following PRs:

Add tests for legacy load by url and fix bugs (#19078)

Note about developer mode (#19075)

Fix tokenizer load from one file (#19073)

Fixing OPT fast tokenizer option. (#18753)

Move cache: expand error message (#19051)

v4.22.0 (Sep 14, 2022)
Swin Transformer v2

The Swin Transformer V2 model was proposed in Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

Swin Transformer v2 improves the original Swin Transformer using three main techniques: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.

Add swin transformer v2 by @nandwalritik in #17469
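Two of the techniques listed above can be illustrated in isolation. The plain-Python sketch below (not the Transformers implementation) applies the paper's log-spaced transform, sign(Δx) · log(1 + |Δx|), to relative coordinates, and computes a scaled cosine-attention logit, cos(q, k) / τ plus a position bias. The value `tau=0.1` is an illustrative assumption; in the model τ is learned.

```python
import math


def log_spaced(delta: float) -> float:
    """Log-spaced relative coordinate: sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def cosine_attention_logit(q, k, tau=0.1, bias=0.0):
    """Scaled cosine similarity cos(q, k) / tau plus a relative position bias."""
    dot = sum(a * b for a, b in zip(q, k))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in k))
    return dot / norm / tau + bias


# Relative offsets in an 8-wide window (-7..7) versus a 24-wide window (-23..23).
small = [log_spaced(d) for d in range(-7, 8)]
large = [log_spaced(d) for d in range(-23, 24)]
# The largest raw offset grows by 23/7 (about 3.3x); the log-spaced one by only about 1.5x.
print(max(small), max(large))
```

The smaller growth of the log-spaced range is what makes transferring a position-bias network from small to large windows less of an extrapolation. Because the cosine logit is normalized, it is bounded by 1/τ regardless of the magnitude of q and k.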

VideoMAE

The VideoMAE model was proposed in VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked autoencoders (MAE) to video, claiming state-of-the-art performance on several video classification benchmarks.

VideoMAE is an extension of ViTMAE for video.

Add VideoMAE by @NielsRogge in #17821
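To make "masked autoencoders for video" concrete: the VideoMAE paper masks space-time patches with "tube" masking, hiding the same spatial positions in every temporal slice, at a very high masking ratio. The sketch below builds such a boolean mask in plain Python. The helper name, the 0.9 ratio, and the sizes (16 frames with 2-frame tubelets, 224×224 images with 16×16 patches) are illustrative assumptions, not the Transformers API.

```python
import random


def tube_mask(num_temporal_slices: int, patches_per_slice: int,
              mask_ratio: float = 0.9, seed: int = 0) -> list:
    """Hypothetical helper: returns a flat list where True marks a masked patch.

    The same spatial patches are masked in every temporal slice ("tubes"),
    which the paper motivates as limiting information leakage between frames.
    """
    rng = random.Random(seed)
    num_masked = int(mask_ratio * patches_per_slice)
    masked_positions = set(rng.sample(range(patches_per_slice), num_masked))
    # Flattened as [slice 0 patches..., slice 1 patches..., ...]
    return [p in masked_positions
            for _ in range(num_temporal_slices)
            for p in range(patches_per_slice)]


# Assumed sizes: 16 frames / 2-frame tubelets = 8 slices; (224 / 16) ** 2 = 196 patches per slice.
mask = tube_mask(num_temporal_slices=8, patches_per_slice=196)
print(sum(mask), "of", len(mask), "patches masked")
```

The encoder then only processes the small fraction of visible patches, which is what keeps pre-training at such high masking ratios affordable.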

Donut

The Donut model was proposed in OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

Add Donut by @NielsRogge in #18488

Pegasus-X

The PEGASUS-X model was proposed in Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao and Peter J. Liu.

PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.

PEGASUS-X by @zphang in #18551
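The encoder attention pattern described above can be sketched as a boolean mask. This is an illustrative reconstruction, not the PEGASUS-X code: the helper name is hypothetical, a dense mask is used only for readability, and "staggered" is modeled as shifting block boundaries by half a block (which the paper alternates across layers). Global tokens attend to, and are attended by, every position.

```python
def block_local_mask(seq_len: int, block_size: int, num_global: int,
                     staggered: bool = False) -> list:
    """Hypothetical sketch: mask[i][j] is True if position i may attend to position j.

    Positions 0..num_global-1 are global tokens; the rest are input tokens.
    Input tokens attend within their own block and to all global tokens.
    With staggered=True, block boundaries are shifted by half a block.
    """
    offset = block_size // 2 if staggered else 0

    def block_id(token: int) -> int:
        return (token + offset) // block_size

    total = num_global + seq_len
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < num_global or j < num_global:
                mask[i][j] = True  # global tokens see, and are seen by, everything
            else:
                mask[i][j] = block_id(i - num_global) == block_id(j - num_global)
    return mask


plain = block_local_mask(seq_len=8, block_size=4, num_global=1)
shifted = block_local_mask(seq_len=8, block_size=4, num_global=1, staggered=True)
# Input tokens 3 and 4 sit in different blocks unless the blocks are staggered.
print(plain[1 + 3][1 + 4], shifted[1 + 3][1 + 4])
```

Alternating the two layouts lets information cross block boundaries over successive layers while each layer still costs only block-local attention plus a few global tokens.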

X-CLIP

The X-CLIP model was proposed in Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of CLIP for video. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

X-CLIP is a minimal extension of CLIP for video-language understanding.

Add X-CLIP by @NielsRogge in #18852

ERNIE

ERNIE is a series of powerful models proposed by Baidu, particularly strong on Chinese tasks, including ERNIE1.0, ERNIE2.0, ERNIE3.0, ERNIE-Gram, ERNIE-health, etc. These models were contributed by nghuyong, and the official code can be found in PaddleNLP (in PaddlePaddle).

ERNIE-2.0 and ERNIE-3.0 models by @nghuyong in #18686

TensorFlow models

MobileViT and LayoutLMv3 are now available in TensorFlow.

TensorFlow MobileViT by @sayakpaul in #18555

[LayoutLMv3] Add TensorFlow implementation by @ChrisFugl in #18678

New task-specific architectures

A new question answering head was added for the LayoutLM model.

Add LayoutLMForQuestionAnswering model by @ankrgyl in #18407

New pipelines

Two new pipelines are available in transformers: a document question answering pipeline and an image-to-text generation pipeline.

Add DocumentQuestionAnswering pipeline by @ankrgyl in #18414

Add Image To Text Generation pipeline by @OlivierDehaene in #18821

M1 support

Transformers now supports Mac M1 devices with PyTorch, both in pipelines and in the Trainer.

pipeline support for device="mps" (or any other string) by @julien-c in #18494

mac m1 mps integration by @pacman100 in #18598

Backend version compatibility

Starting from version v4.22.0, we officially support PyTorch and TensorFlow versions released up to two years ago. Versions more than two years old will not be supported going forward.

We're making this change as we begin actively testing Transformers' compatibility with older versions. This project can be followed here.

PyTorch >= 1.7.0 and TensorFlow >= 2.4.0 by @sgugger in #19016
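A version floor like "PyTorch >= 1.7.0 and TensorFlow >= 2.4.0" has to be checked numerically: compared as strings, "1.12.1" sorts before "1.7.0". The stdlib-only sketch below parses the numeric release components and compares tuples. The helper names are hypothetical, pre-release tags are simply ignored, and in practice `packaging.version.parse` is the more robust choice; only the two minimum versions come from the notes above.

```python
MINIMUM_VERSIONS = {"torch": "1.7.0", "tensorflow": "2.4.0"}


def parse_version(version: str) -> tuple:
    """Parse the numeric release part, e.g. '1.12.1+cu113' -> (1, 12, 1).

    Simplified sketch: local tags ('+cu113') are dropped and parsing stops at a
    pre-release suffix such as 'rc1'. The result is padded to three components.
    """
    release = version.split("+")[0]
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # stop at a pre-release suffix such as "0rc1"
    return tuple(parts + [0] * (3 - len(parts)))


def is_supported(backend: str, installed: str) -> bool:
    """Compare version tuples numerically against the minimum for this backend."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[backend])


print("1.12.1" >= "1.7.0")                     # string comparison: False (wrong)
print(is_supported("torch", "1.12.1+cu113"))  # numeric comparison: True
```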

Generate method updates

The generate method now enforces stronger validation in order to ensure proper usage.

Generate: validate model_kwargs (and catch typos in generate arguments) by @gante in #18261

Generate: validate model_kwargs on TF (and catch typos in generate arguments) by @gante in #18651

Generate: add model class validation by @gante in #18902
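The idea behind validating `model_kwargs` can be sketched in a few lines: compare the keyword arguments a caller passes against the parameters the model actually accepts, and fail loudly instead of silently ignoring extras such as a misspelled argument. This is a hypothetical sketch (`toy_forward` and `validate_model_kwargs` are illustrative names), not the code inside `generate`.

```python
import inspect


def toy_forward(input_ids, attention_mask=None, past_key_values=None):
    """Stand-in for a model's forward signature (hypothetical)."""
    return input_ids


def validate_model_kwargs(forward, model_kwargs: dict) -> None:
    """Raise if any keyword would be silently ignored, e.g. a typo like 'atention_mask'."""
    accepted = set(inspect.signature(forward).parameters)
    unused = [key for key in model_kwargs if key not in accepted]
    if unused:
        raise ValueError(f"Unused model_kwargs (possibly typos): {unused}")


validate_model_kwargs(toy_forward, {"attention_mask": None})  # accepted, no error
try:
    validate_model_kwargs(toy_forward, {"atention_mask": None})
except ValueError as err:
    print(err)  # reports 'atention_mask'
```

Without such a check, a typo in a keyword argument is absorbed by `**kwargs` and generation quietly runs with default behavior.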

API changes

The as_target_tokenizer and as_target_processor context managers have been deprecated. The new API is to call the tokenizer/processor directly with keyword arguments such as text_target. For instance:

with tokenizer.as_target_tokenizer():
    encoded_labels = tokenizer(labels, padding=True)

becomes

encoded_labels = tokenizer(text_target=labels, padding=True)

Replace as_target context managers by direct calls by @sgugger in #18325

Bits and bytes integration

Bits and bytes is now integrated within transformers. This feature can reduce the size of large models by up to a factor of 2, with little loss in precision.

Supporting seq2seq models for bitsandbytes integration by @younesbelkada in #18579

bitsandbytes - Linear8bitLt integration into transformers models by @younesbelkada in #17901
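The "up to a factor of 2" figure follows from storing weights as int8 (1 byte each) instead of 16-bit floats (2 bytes each). Below is a minimal sketch of per-row absmax quantization, the basic building block of the LLM.int8() scheme that bitsandbytes' Linear8bitLt implements. It deliberately omits the mixed-precision handling of outlier features, and the function names are hypothetical.

```python
def absmax_quantize(row: list) -> tuple:
    """Quantize one row of weights to int8 range [-127, 127] using absmax scaling."""
    max_abs = max(abs(x) for x in row)
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    return [round(x * scale) for x in row], scale


def dequantize(quantized: list, scale: float) -> list:
    """Map int8 values back to approximate floating-point weights."""
    return [v / scale for v in quantized]


weights = [0.12, -0.5, 0.33, 0.01, -0.27]
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)          # int8 values; the largest-magnitude weight maps to +/-127
print(max_error)  # bounded by half a quantization step, i.e. 0.5 / scale
```

Rounding error is at most half a quantization step per weight, which is why precision loss stays low as long as no single outlier inflates the row's absmax; handling those outliers separately is the part of LLM.int8() this sketch leaves out.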

Large model support

Models that have sharded checkpoints in PyTorch can be loaded in Flax.

Load sharded pt to flax by @ArthurZucker in #18419
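Sharded checkpoints split the weights across several files plus an index (`pytorch_model.bin.index.json`) whose `weight_map` maps each parameter name to the shard that stores it. The sketch below shows only the bookkeeping step of reassembling a full state dict from such an index, using made-up in-memory stand-ins instead of real files and tensors. The actual PyTorch-to-Flax loader also converts every tensor, and `merge_shards` is a hypothetical name.

```python
import json

# A tiny, made-up index in the sharded-checkpoint layout: parameter name -> shard file.
index = json.loads("""
{
  "metadata": {"total_size": 24},
  "weight_map": {
    "embed.weight": "pytorch_model-00001-of-00002.bin",
    "layer.0.weight": "pytorch_model-00001-of-00002.bin",
    "lm_head.weight": "pytorch_model-00002-of-00002.bin"
  }
}
""")

# Stand-ins for shard contents (real shards would be loaded from disk with torch.load).
shards = {
    "pytorch_model-00001-of-00002.bin": {"embed.weight": [1.0, 2.0], "layer.0.weight": [3.0]},
    "pytorch_model-00002-of-00002.bin": {"lm_head.weight": [4.0, 5.0, 6.0]},
}


def merge_shards(index: dict, load_shard) -> dict:
    """Load each shard once and collect every parameter listed in the weight map."""
    state_dict = {}
    for shard_file in sorted(set(index["weight_map"].values())):
        shard = load_shard(shard_file)
        for name, file in index["weight_map"].items():
            if file == shard_file:
                state_dict[name] = shard[name]
    return state_dict


full = merge_shards(index, shards.__getitem__)
print(sorted(full))
```

Loading one shard at a time keeps peak memory close to the size of the largest shard rather than the whole model, which is the point of sharding large checkpoints.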

TensorFlow improvements

The TensorFlow examples have been rewritten to support all recent features developed in the past months.

TF Examples Rewrite by @Rocketknight1 in #18451

DeBERTa-v2 is now trainable with XLA.

TF: XLA-trainable DeBERTa v2 by @gante in #18546

Documentation changes

Split model list on modality by @stevhliu in #18328

Improvements and bugfixes

sentencepiece shouldn't be required for the fast LayoutXLM tokenizer by @LysandreJik in #18320

Fix sacremoses soft dependency for Transformers XL by @sgugger in #18321

Owlvit test fixes by @alaradirik in #18303

[Flax] Fix incomplete batches in example scripts by @sanchit-gandhi in #17863

start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … by @sywangyi in #18229

Update feature extractor docs by @stevhliu in #18324

fixed typo by @banda-larga in #18331

updated translation by @banda-larga in #18333

Updated _toctree.yml by @nickprock in #18337

Update automatic_speech_recognition.py by @bofenghuang in #18339

Fix codeparrot deduplication - ignore whitespaces by @loubnabnl in #18023

Remove Flax OPT from doctest for now by @ydshieh in #18338

Include tensorflow-aarch64 as a candidate by @ankrgyl in #18345

[BLOOM] Deprecate position_ids by @thomasw21 in #18342

Migrate metric to Evaluate library for tensorflow examples by @VijayKalmath in #18327

Migrate metrics used in flax examples to Evaluate by @VijayKalmath in #18348

[Docs] Fix Speech Encoder Decoder doc sample by @sanchit-gandhi in #18346

Fix OwlViT torchscript tests by @ydshieh in #18347

Fix some doctests by @ydshieh in #18359

[FX] Symbolic trace for Bloom by @michaelbenayoun in #18356

Fix TFSegformerForSemanticSegmentation doctest by @ydshieh in #18362

fix FSDP ShardedGradScaler by @pacman100 in #18358

Migrate metric to Evaluate in Pytorch examples by @atturaioe in #18369

Correct the spelling of bleu metric by @ToluClassics in #18375

Remove pt-like calls on tf tensor by @amyeroberts in #18393

Fix from_pretrained kwargs passing by @YouJiacheng in #18387

Add a check regarding the number of occurrences of ``` by @ydshieh in #18389

Add evaluate to test dependencies by @sgugger in #18396

Fix OPT doc tests by @ArthurZucker in #18365

Fix doc tests by @NielsRogge in #18397

Add balanced strategies for device_map in from_pretrained by @sgugger in #18349

Fix docs by @NielsRogge in #18399

Adding fine-tuning models to LUKE by @ikuyamada in #18353

Fix ROUGE add example check and update README by @sgugger in #18398

Add Flax BART pretraining script by @duongna21 in #18297

Rewrite push_to_hub to use upload_files by @sgugger in #18366

Layoutlmv2 tesseractconfig by @kelvinAI in #17733

fix: create a copy for tokenizer object by @YBooks in #18408

Fix uninitialized parameter in conformer relative attention. by @PiotrDabkowski in #18368

Fix the hub user name in a longformer doctest checkpoint by @ydshieh in #18418

Change audio kwarg to images in TROCR processor by @ydshieh in #18421

update maskformer docs by @alaradirik in #18423

Fix test_load_default_pipelines_tf test error by @ydshieh in #18422

fix run_clip README by @ydshieh in #18332

Improve generate docstring by @JoaoLages in #18198

Accept trust_remote_code and ignore it in PreTrainedModel.from_pretrained by @ydshieh in #18428

Update pipeline word heuristic to work with whitespace in token offsets by @davidbenton in #18402

Add programming languages by @cakiki in #18434

fixing error when using sharded ddp by @pacman100 in #18435

Update _toctree.yml by @stevhliu in #18440

support ONNX export of XDropout in deberta{,_v2} and sew_d by @garymm in #17502

Add Spanish translation of run_scripts.mdx by @donelianc in #18415

Update no trainer scripts for language modeling and image classification examples by @nandwalritik in #18443

Update pinned hhub version by @osanseviero in #18448

Fix failing tests for XLA generation in TF by @dsuess in #18298

add zero-shot obj detection notebook to docs by @alaradirik in #18453

fix: keras fit tests for segformer tf and minor refactors. by @sayakpaul in #18412

Fix torch version comparisons by @LSinev in #18460

[BLOOM] Clean modeling code by @thomasw21 in #18344

change shape to support dynamic batch input in tf.function XLA generate for tf serving by @nlpcat in #18372

HFTracer.trace can now take callables and torch.nn.Module by @michaelbenayoun in #18457

Update no trainer scripts for multiple-choice by @kiansierra in #18468

Fix load of model checkpoints in the Trainer by @sgugger in #18470

Add FX support for torch.baddbmm and torch.Tensor.baddbmm by @thomasw21 in #18363

Add machine type in the artifact of Examples directory job by @ydshieh in #18459

Update no trainer examples for QA and Semantic Segmentation by @kiansierra in #18474

Add TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING by @ydshieh in #18469

Fixing issue where generic model types wouldn't load properly with the pipeline by @Narsil in #18392

Fix TFSwinSelfAttention to have relative position index as non-trainable weight by @harrydrippin in #18226

Refactor TFSwinLayer to increase serving compatibility by @harrydrippin in #18352

Add TF prefix to TF-Res test class by @ydshieh in #18481

Remove py.typed by @sgugger in #18485

Fix pipeline tests by @sgugger in #18487

Use new huggingface_hub tools for download models by @sgugger in #18438

Fix test_dbmdz_english by updating expected values by @ydshieh in #18482

Move cache folder to huggingface/hub for consistency with hf_hub by @sgugger in #18492

Update some expected values in quicktour.mdx for resampy 0.3.0 by @ydshieh in #18484

disable Onnx test for google/long-t5-tglobal-base by @ydshieh in #18454

Typo reported by Joel Grus on TWTR by @julien-c in #18493

Just re-reading the whole doc every couple of months 😬 by @julien-c in #18489

transformers-cli login => huggingface-cli login by @julien-c in #18490

Add seed setting to image classification example by @regisss in #18519

[DX fix] Fixing QA pipeline streaming a dataset. by @Narsil in #18516

Clean up hub by @sgugger in #18497

update fsdp docs by @pacman100 in #18521

Fix compatibility with 1.12 by @sgugger in #17925

Specify en in doc-builder README example by @ankrgyl in #18526

New cache fixes: add safeguard before looking in folders by @sgugger in #18522

unpin resampy by @ydshieh in #18527

✨ update to use interlibrary links instead of Markdown by @stevhliu in #18500

Add example of multimodal usage to pipeline tutorial by @stevhliu in #18498

[VideoMAE] Add model to doc tests by @NielsRogge in #18523

Update perf_train_gpu_one.mdx by @mishig25 in #18532

Update no_trainer.py scripts to include accelerate gradient accumulation wrapper by @Rasmusafj in #18473

Add Spanish translation of converting_tensorflow_models.mdx by @donelianc in #18512

Spanish translation of summarization.mdx by @AguilaCudicio in #15947

Let's not cast them all by @younesbelkada in #18471

fix: data2vec-vision Onnx ready-made configuration. by @NikeNano in #18427

Add mt5 onnx config by @ChainYo in #18394

Minor update of run_call_with_unpacked_inputs by @ydshieh in #18541

BART - Fix attention mask device issue on copied models by @younesbelkada in #18540

Adding a new align_to_words param to qa pipeline. by @Narsil in #18010

📝 update metric with evaluate by @stevhliu in #18535

Restore _init_weights value in no_init_weights by @YouJiacheng in #18504

📝 update documentation build section by @stevhliu in #18548

Preserve hub-related kwargs in AutoModel.from_pretrained by @sgugger in #18545

Use commit hash to look in cache instead of calling head by @sgugger in #18534

Update philosophy to include other preprocessing classes by @stevhliu in #18550

Properly move cache when it is not in default path by @sgugger in #18563

Adds CLIP to models exportable with ONNX by @unography in #18515

raise atol for MT5OnnxConfig by @ydshieh in #18560

fix string by @mrwyattii in #18568

Segformer TF: fix output size in documentation by @joihn in #18572

Fix resizing bug in OWL-ViT by @alaradirik in #18573

Fix LayoutLMv3 documentation by @pocca2048 in #17932

Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training by @donebydan in #18486

german docs translation by @flozi00 in #18544

Deberta V2: Fix critical trace warnings to allow ONNX export by @iiLaurens in #18272

[FX] _generate_dummy_input supports audio-classification models for labels by @michaelbenayoun in #18580

Fix docstrings with last version of hf-doc-builder styler by @sgugger in #18581

fix owlvit tests, update docstring examples by @alaradirik in #18586

Return the permuted hidden states if return_dict=True by @amyeroberts in #18578

Add type hints for ViLT models by @donelianc in #18577

update doc for perf_train_cpu_many, add intel mpi introduction by @sywangyi in #18576

typos by @stas00 in #18594

FSDP bug fix for load_state_dict by @pacman100 in #18596

Add TFAutoModelForSemanticSegmentation to the main __init__.py by @ydshieh in #18600

Fix URLs by @NielsRogge in #18604

Update BLOOM parameter counts by @Muennighoff in #18531

[doc] fix anchors by @stas00 in #18591

[fsmt] deal with -100 indices in decoder ids by @stas00 in #18592

small change by @younesbelkada in #18584

Flax Remat for LongT5 by @KMFODA in #17994

Change scheduled CIs to use torch 1.12.1 by @ydshieh in #18644

Add checks for some workflow jobs by @ydshieh in #18583

TF: Fix generation repetition penalty with XLA by @gante in #18648

Update longt5.mdx by @flozi00 in #18634

Update run_translation_no_trainer.py by @zhoutang776 in #18637

[bnb] Minor modifications by @younesbelkada in #18631

Examples: add Bloom support for token classification by @stefan-it in #18632

Fix Yolos ONNX export test by @ydshieh in #18606

Fix matmul inputs dtype by @JingyaHuang in #18585

Update feature extractor methods to enable type cast before normalize by @amyeroberts in #18499

Allow users to force TF availability by @Rocketknight1 in #18650

[LongT5] Correct docs long t5 by @patrickvonplaten in #18669

Generate: validate model_kwargs on FLAX (and catch typos in generate arguments) by @gante in #18653

Ping detectron2 for CircleCI tests by @ydshieh in #18680

Rename method to avoid clash with property by @amyeroberts in #18677

Rename second input dimension from "sequence" to "num_channels" for CV models by @regisss in #17976

Fix repo consistency by @lewtun in #18682

Fix breaking change in onnxruntime for ONNX quantization by @severinsimmler in #18336

Add evaluate to examples requirements by @muellerzr in #18666

[bnb] Move documentation by @younesbelkada in #18671

Add an examples folder for code downstream tasks by @loubnabnl in #18679

model.tie_weights() should be applied after accelerator.prepare() by @Gladiator07 in #18676

Generate: add missing **model_kwargs in sample tests by @gante in #18696

Temp fix for broken detectron2 import by @patrickvonplaten in #18699

[Hotfix] pin detectron2 5aeb252 to avoid test fix by @ydshieh in #18701

Fix Data2VecVision ONNX test by @ydshieh in #18587

Add missing tokenizer tests - Longformer by @tgadeliya in #17677

remove check for main process for trackers initialization by @Gladiator07 in #18706

Unpin detectron2 by @ydshieh in #18727

Removing warning of model type for microsoft/tapex-base-finetuned-wtq by @Narsil in #18711

improve add_tokens docstring by @SaulLu in #18687

CLI: Don't check the model head when there is no model head by @gante in #18733

Update perf_infer_gpu_many.mdx by @mishig25 in #18744

Add minor doc-string change to include hp_name param in hyperparameter_search by @constantin-huetterer in #18700

fix pipeline_tutorial.mdx doctest by @ydshieh in #18717

Add TF implementation of XGLMModel by @stancld in #16543

fixed docstring typos by @JadeKim042386 in #18739

add warning to let the user know that the __call__ method is faster than encode + pad for a fast tokenizer by @SaulLu in #18693

examples/run_summarization_no_trainer: fixed incorrect param to hasattr by @rahular in #18720

Add ONNX support for Longformer by @deutschmn in #17176

Determine framework automatically before ONNX export by @rachthree in #18615

streamlining 'checkpointing_steps' parsing by @rahular in #18755

CLI: Improved error control and updated hub requirement by @gante in #18752

[VisionEncoderDecoder] Add gradient checkpointing by @patrickvonplaten in #18697

[Wav2vec2 + LM Test] Improve wav2vec2 with lm tests and make torch version dependent for now by @patrickvonplaten in #18749

Fix incomplete outputs of FlaxBert by @duongna21 in #18772

Fix broken link DeepSpeed documentation link by @philschmid in #18783

fix missing block when there is no failure by @ydshieh in #18775

fix a possible typo in auto feature extraction by @fcakyon in #18779

Fix memory leak issue in torch_fx tests by @ydshieh in #18547

Fix mock in test_cached_files_are_used_when_internet_is_down by @Wauplin in #18804

Add SegFormer and ViLT links by @NielsRogge in #18808

send model to the correct device by @ydshieh in #18800

Revert to and safely handle flag in owlvit config by @amyeroberts in #18750

Add docstring for BartForCausalLM by @ekagra-ranjan in #18795

up by @qqaatw in #18805

[Swin, Swinv2] Fix attn_mask dtype by @NielsRogge in #18803

Run tests if skip condition not met by @amyeroberts in #18764

Remove ViltForQuestionAnswering from check_repo by @NielsRogge in #18762

Adds OWLViT to models exportable with ONNX by @unography in #18588

Adds GroupViT to models exportable with ONNX by @unography in #18628

LayoutXLMProcessor: ensure 1-to-1 mapping between samples and images, and add test for it by @anthony2261 in #18774

Added Docstrings for Deberta and DebertaV2 [PyTorch] by @Tegzes in #18610

Improving the documentation for "word", within the pipeline. by @Narsil in #18763

Disable nightly CI temporarily by @ydshieh in #18820

Pin max tf version by @gante in #18818

Fix cost condition in DetrHungarianMatcher and YolosHungarianMatcher to allow zero-cost by @kongzii in #18647

oob performance improvement for cpu DDP by @sywangyi in #18595

Warn on TPUs when the custom optimizer and model device are not the same by @muellerzr in #18668

Update location identification by @LysandreJik in #18834

fix bug: register_for_auto_class should be defined on TFPreTrainedModel instead of TFSequenceSummary by @azonti in #18607

[DETR] Add num_channels attribute by @NielsRogge in #18714

Pin fsspec by @sgugger in #18837

Improve GPT2 doc by @ekagra-ranjan in #18787

Add an option to HfArgumentParser.parse_{dict,json_file} to raise an Exception when there are extra keys by @FelixSchneiderZoom in #18692

Improve Text Generation doc by @ekagra-ranjan in #18788

Add SegFormer ONNX support by @NielsRogge in #18006

Add security warning about the from_pretrained() method by @lewtun in #18801

Owlvit memory leak fix by @alaradirik in #18734

Create pipeline_tutorial.mdx german docs by @flozi00 in #18625

Unpin fsspec by @albertvillanova in #18846

Delete state_dict to release memory as early as possible by @ydshieh in #18832

Generate: smaller TF serving test by @gante in #18840

add a script to get time info. from GA workflow jobs by @ydshieh in #18822

Pin rouge_score by @albertvillanova in #18247

Minor typo in prose of model outputs documentation. by @pcuenca in #18848

reflect max_new_tokens in Seq2SeqTrainer by @kumapo in #18786

Adds timeout argument to training_args to avoid socket timeouts in DDP by @gugarosa in #18562

Cache results of is_torch_tpu_available() by @comaniac in #18777

Tie weights after preparing the model in run_clm by @sgugger in #18855

Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests by @ankrgyl in #18854

Split docs on modality by @stevhliu in #18205

if learning rate is a tensor, get item (float) by @kmckiern in #18861

Fix naming issue with ImageToText pipeline by @OlivierDehaene in #18864

[LayoutLM] Add clarification to docs by @NielsRogge in #18716

Add OWL-ViT to the appropriate section by @NielsRogge in #18867

Clean up utils.hub using the latest from hf_hub by @sgugger in #18857

pin Slack SDK to 3.18.1 to avoid failing issue by @ydshieh in #18869

Fix number of examples for iterable datasets in multiprocessing by @sgugger in #18856

postpone bnb load until it's needed by @stas00 in #18859

A script to download artifacts and perform CI error statistics by @ydshieh in #18865

Remove cached torch_extensions on CI runners by @ydshieh in #18868

Update docs landing page by @stevhliu in #18590

Finetune guide for semantic segmentation by @stevhliu in #18640

Add Trainer to quicktour by @stevhliu in #18723

TF: TFMarianMTModel final logits bias as a layer by @gante in #18833

Mention TF and Flax checkpoints by @LysandreJik in #18894

Correct naming pegasus x by @patrickvonplaten in #18896

Update perf_train_gpu_one.mdx by @thepurpleowl in #18442

Add type hints to XLM-Roberta-XL models by @asofiaoliveira in #18475

Update Chinese documentation by @zkep in #18893

Generate: get the correct beam index on eos token by @gante in #18851

Mask t5 relative position bias when heads are pruned by @hadaev8 in #17968

updating gather function with gather_for_metrics in run_wav2vec2_pretraining by @arun99481 in #18877

Fix decode_input_ids to bare T5Model and improve doc by @ekagra-ranjan in #18791

Fix test_tf_encode_plus_sent_to_model for LayoutLMv3 by @ydshieh in #18898

fixes bugs to handle non-dict output by @alaradirik in #18897

Further reduce the number of calls to head for cached objects by @sgugger in #18871

unpin slack_sdk version by @ydshieh in #18901

Fix incorrect size of input for 1st strided window length in Perplexity of fixed-length models by @ekagra-ranjan in #18906

[VideoMAE] Improve code examples by @NielsRogge in #18919

Add checks for more workflow jobs by @ydshieh in #18905

Accelerator end training by @nbroad1881 in #18910

update the train_batch_size in case HPO change batch_size_per_device by @sywangyi in #18918

Update TF fine-tuning docs by @Rocketknight1 in #18654

TF: final bias as a layer in seq2seq models (replicate TFMarian fix) by @gante in #18903

remove _create_and_check_torch_fx_tracing in specific test files by @ydshieh in #18667

[DeepSpeed ZeRO3] Fix performance degradation in sharded models by @tjruwase in #18911

pin TF 2.9.1 for self-hosted CIs by @ydshieh in #18925

Fix XLA fp16 and bf16 error checking by @ymwangg in #18913

Starts on a list of external deps required for dev by @colindean in #18929

Add image height and width to ONNX dynamic axes by @lewtun in #18915

Skip some doctests in quicktour by @stevhliu in #18927

Fix LayoutXLM wrong link in README by @Devlee247 in #18932

Update translation requests contact by @NimaBoscarino in #18941

[JAX] Replace all jax.tree_* calls with jax.tree_util.tree_* by @sanchit-gandhi in #18361

Neptune.ai integration improvements by @Raalsky in #18934

Generate: Simplify is_pad_token_not_equal_to_eos_token_id by @ekagra-ranjan in #18933

Fix train_step, test_step and tests for CLIP by @Rocketknight1 in #18684

Exit early in load if no weights are in the sharded state dict by @sgugger in #18937

update black target version by @BramVanroy in #18955

RFC: Replace custom TF embeddings by Keras embeddings by @gante in #18939

TF: unpin maximum TF version by @gante in #18917

Revert "TF: unpin maximum TF version" (#18917) by @sgugger

remove unused activation dropout by @shijie-wu in #18842

add DDP HPO support for sigopt by @sywangyi in #18931

Remove decoder_position_ids from check_decoder_model_past_large_inputs by @ydshieh in #18980

create Past CI results as tables for GitHub issue by @ydshieh in #18953

Remove dropout in embedding layer of OPT by @shijie-wu in #18845

Fix TF start docstrings by @Rocketknight1 in #18991

Align try_to_load_from_cache with huggingface_hub by @sgugger in #18966

Fix tflongformer int dtype by @Rocketknight1 in #18907

TF: correct TFBart embeddings weights name when load_weight_prefix is passed by @gante in #18993

fix checkpoint name for wav2vec2 conformer by @ydshieh in #18994

added type hints by @daspartho in #18996

TF: TF 2.10 unpin + related onnx test skips by @gante in #18995

Fixed typo by @tnusser in #18921

Removed issue in wav2vec link by @chrisemezue in #18945

Fix MaskFormerFeatureExtractor instance segmentation preprocessing bug by @alaradirik in #18997

Add type hints for M2M by @daspartho in #18998

Fix tokenizer for XLMRobertaXL by @ydshieh in #19004

Update default revision for document-question-answering by @ankrgyl in #18938

Fixed bug which caused overwrite_cache to always be True by @rahular in #19000

add DDP HPO support for optuna by @sywangyi in #19002

add missing require_tf for TFOPTGenerationTest by @ydshieh in #19010

Re-add support for single url files in objects download by @sgugger in #19014

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@nandwalritik

Add swin transformer v2 (#17469)

Update no trainer scripts for language modeling and image classification examples (#18443)

@ankrgyl

Include tensorflow-aarch64 as a candidate (#18345)

Specify en in doc-builder README example (#18526)

Add LayoutLMForQuestionAnswering model (#18407)

Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests (#18854)

Add DocumentQuestionAnswering pipeline (#18414)

Update default revision for document-question-answering (#18938)

@ikuyamada

Adding fine-tuning models to LUKE (#18353)

@duongna21

Add Flax BART pretraining script (#18297)

Fix incomplete outputs of FlaxBert (#18772)

@donelianc

Add Spanish translation of run_scripts.mdx (#18415)

Add Spanish translation of converting_tensorflow_models.mdx (#18512)

Add type hints for ViLT models (#18577)

@sayakpaul

fix: keras fit tests for segformer tf and minor refactors. (#18412)

TensorFlow MobileViT (#18555)

@flozi00

german docs translation (#18544)

Update longt5.mdx (#18634)

Create pipeline_tutorial.mdx german docs (#18625)

@stancld

Add TF implementation of XGLMModel (#16543)

@ChrisFugl

[LayoutLMv3] Add TensorFlow implementation (#18678)

@zphang

PEGASUS-X (#18551)

@nghuyong

add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686)

v4.21.3 (Sep 5, 2022)

Patch release to add a disclaimer about torch models hosted on the Hub. See the autoclass tutorial.
v4.21.2 (Aug 24, 2022)

Fix a regression in the TableQA pipeline: #18428
v4.21.1 (Aug 4, 2022)

Fix a regression in Trainer checkpoint loading: #18470
v4.21.0 (Jul 27, 2022)
TensorFlow XLA Text Generation

The TensorFlow text generation method can now be wrapped with tf.function and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and our benchmarks. You can also see XLA generation in action in our example notebooks, particularly for summarization and translation.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
for input_prompt in input_prompts:
    tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
    generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
    print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
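The reason for `pad_to_multiple_of` is that an XLA-compiled function is specialized to the shapes of its inputs, so every new sequence length would trigger a fresh (slow) trace and compilation. Rounding lengths up to a multiple of 32 collapses many lengths into a handful of shapes. The plain-Python arithmetic below illustrates the effect; it does not call TensorFlow, and `padded_length` is a hypothetical helper.

```python
def padded_length(length: int, multiple: int = 32) -> int:
    """Round a sequence length up to the next multiple, as pad_to_multiple_of does."""
    return ((length + multiple - 1) // multiple) * multiple


prompt_lengths = range(1, 129)  # prompts of 1 to 128 tokens
without_padding = set(prompt_lengths)
with_padding = {padded_length(n) for n in prompt_lengths}
print(len(without_padding), len(with_padding))  # 128 distinct shapes vs 4
```

The trade-off is some wasted computation on padding tokens in exchange for reusing a small number of compiled programs.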

Generate: deprecate default max_length by @gante in #18018

TF: GPT-J compatible with XLA generation by @gante in #17986

TF: T5 can now handle a padded past (i.e. XLA generation) by @gante in #17969

TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible by @gante in #17857

TF: generate without tf.TensorArray by @gante in #17801

TF: BART compatible with XLA generation by @gante in #17479

New model additions

OwlViT

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.

Add OWL-ViT model for zero-shot object detection by @alaradirik in #17938

Fix OwlViT tests by @sgugger in #18253

NLLB

The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

[M2M100] update conversion script by @patil-suraj in #17916

NLLB tokenizer by @LysandreJik in #18126

MobileViT

The MobileViT model was proposed in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.

add MobileViT model by @hollance in #17354

Nezha

The Nezha model was proposed in NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed Precision Training, and the LAMB optimizer for training the models.

Nezha Pytorch implementation by @sijunhe in #17776
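Functional Relative Positional Encoding derives the positional signal from the relative distance between two tokens using fixed sinusoids rather than learned embeddings. A minimal sketch, assuming the sinusoidal form described in the NEZHA paper (sine on even dimensions, cosine on odd ones) and omitting any clipping of large distances; `functional_relative_position` is a hypothetical name, not the library's code.

```python
import math


def functional_relative_position(i: int, j: int, dim: int):
    """Fixed sinusoidal encoding of the relative distance j - i (no learned parameters).

    Assumes dim is even: each frequency contributes a sin and a cos component.
    """
    rel = j - i
    encoding = []
    for k in range(dim // 2):
        freq = 10000 ** (2 * k / dim)
        encoding.append(math.sin(rel / freq))
        encoding.append(math.cos(rel / freq))
    return encoding


# Only the distance matters: positions (0, 3) and (5, 8) get the same encoding.
print(functional_relative_position(0, 3, 8) == functional_relative_position(5, 8, 8))  # True
```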

GroupViT

The GroupViT model was proposed in GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, and Xiaolong Wang. Inspired by CLIP, GroupViT is a vision-language model that can perform zero-shot semantic segmentation on any given vocabulary categories.

Adding GroupViT Models by @xvjiarui in #17313

MVP

The MVP model was proposed in MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, the model is further pre-trained using specific soft prompts to stimulate the model capacity in performing a specific task.

Add MVP model by @StevenTang1998 in #17787

CodeGen

The CodeGen model was proposed in A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on The Pile, BigQuery, and BigPython.

Add CodeGen model by @rooa in #17443

[CodeGen] support device_map="auto" for sharded checkpoints by @patil-suraj in #17871

UL2

The UL2 model was presented in Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.

Add UL2 (just docs) by @patrickvonplaten in #17740

Custom pipelines

This adds the ability to host custom pipelines on the Hub and share them with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., users have to pass trust_remote_code=True when they want to use a custom pipeline. Apart from this, the best way to get familiar with the feature is to look at the added documentation.

Custom pipeline by @sgugger in #18079
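A custom pipeline is written by implementing four methods, `_sanitize_parameters`, `preprocess`, `_forward` and `postprocess`, which the pipeline calls roughly in that order. The toy below mirrors only that four-stage contract in plain Python so it runs without Transformers; `ToyPipeline` and `WordCountPipeline` are hypothetical stand-ins, and a real custom pipeline would subclass `transformers.Pipeline` and run an actual model in `_forward`.

```python
class ToyPipeline:
    """Toy stand-in for the pipeline call flow (NOT transformers.Pipeline)."""

    def __call__(self, inputs, **kwargs):
        # Route user kwargs to the stage that consumes them.
        pre_kw, fwd_kw, post_kw = self._sanitize_parameters(**kwargs)
        model_inputs = self.preprocess(inputs, **pre_kw)
        model_outputs = self._forward(model_inputs, **fwd_kw)
        return self.postprocess(model_outputs, **post_kw)


class WordCountPipeline(ToyPipeline):
    """Hypothetical example whose 'model' is just a word counter."""

    def _sanitize_parameters(self, lowercase=None, top_k=None):
        preprocess_kwargs = {} if lowercase is None else {"lowercase": lowercase}
        postprocess_kwargs = {} if top_k is None else {"top_k": top_k}
        return preprocess_kwargs, {}, postprocess_kwargs

    def preprocess(self, text, lowercase=True):
        return (text.lower() if lowercase else text).split()

    def _forward(self, words):
        counts = {}
        for word in words:
            counts[word] = counts.get(word, 0) + 1
        return counts

    def postprocess(self, counts, top_k=2):
        # Most frequent first; ties broken alphabetically.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:top_k]


pipe = WordCountPipeline()
print(pipe("The cat and the hat", top_k=1))  # [('the', 2)]
```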

PyTorch to TensorFlow CLI utility

This adds a CLI to convert PT weights into TF weights, validate them, and (optionally) open a PR.

CLI: tool to convert PT into TF weights and open hub PR by @gante in https://github.com/huggingface/transformers/pull/17497

TensorFlow-specific improvements

The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet.

[SegFormer] TensorFlow port by @sayakpaul in #17910

Add TF DeiT implementation by @amyeroberts in #17806

Add TF ResNet model by @amyeroberts in #17427

TF implementation of RegNets by @ariG23498 in #17554

Additionally, our TF models now support loading sharded checkpoints:

TF Sharded by @ArthurZucker in #17713
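A sharded checkpoint splits the weights across several files plus an index recording which file holds each weight. Below is a simplified pure-Python sketch of greedy sharding, assuming weights are given as name-to-size pairs in model order; `shard_weights`, the `tf_model-XXXXX-of-YYYYY.h5` file names and the index layout are illustrative, modeled loosely on the Hub's `*.index.json` convention rather than copied from the library.

```python
import json


def shard_weights(weight_sizes, max_shard_size, prefix="tf_model", ext="h5"):
    """Greedily group weights into shards of at most max_shard_size bytes.

    weight_sizes: dict mapping weight name -> size in bytes (model order).
    Returns (shards, index): shards is a list of lists of weight names, and
    index["weight_map"] maps each weight name to the file that would hold it.
    A single weight larger than max_shard_size ends up alone in its own shard.
    """
    shards, current, current_size = [], [], 0
    for name, size in weight_sizes.items():
        # Start a new shard if adding this weight would overflow the current one.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)

    total = len(shards)
    weight_map = {}
    for i, names in enumerate(shards, start=1):
        filename = f"{prefix}-{i:05d}-of-{total:05d}.{ext}"
        for name in names:
            weight_map[name] = filename
    index = {"metadata": {"total_size": sum(weight_sizes.values())},
             "weight_map": weight_map}
    return shards, index


sizes = {"embeddings": 400, "layer_0": 300, "layer_1": 300, "lm_head": 400}
shards, index = shard_weights(sizes, max_shard_size=700)
print(shards)  # [['embeddings', 'layer_0'], ['layer_1', 'lm_head']]
print(json.dumps(index["weight_map"], indent=2))
```

A loader would read the index first and then open only the files it lists, which is what makes it possible to load very large models file by file.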

Flax-specific improvements

The following models have been ported to be used in JAX:

Flax t5 Encoder by @crystina-z in #17784

Additionally, our JAX models now support loading sharded checkpoints:

Flax sharded by @ArthurZucker in #17760

Additional model heads

The following models now have a brand new head for new tasks:

Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER) by @gilad19 in #17924

Adding OPTForSeqClassification class by @oneraghavan in #18123

ONNX support

A continued community effort provides ONNX converters for an increasing number of models.

add ONNX support for LeVit by @gcheron in #18154

add ONNX support for BLOOM by @NouamaneTazi in #17961

Add ONNX support for LayoutLMv3 by @regisss in #17953

Mrbean/codegen onnx by @sam-h-bean in #17903

Add ONNX support for DETR by @regisss in #17904

add onnx support for deberta and debertav2 by @sam-h-bean in #17617

Documentation translation

The community effort to translate the documentation into several languages has continued.

Portuguese

Added translation of index.mdx to Portuguese Issue #16824 by @rzimmerdev in #17565

Spanish

Add Spanish translation of custom_models.mdx by @donelianc in #17807

Italian

Add Italian translation of sharing_custom_models.mdx by @Xpiri in #17631

Add Italian translation of converting_tensorflow_models.mdx by @Xpiri in #18283

Add Italian translation of create_model.mdx and serialization.mdx by @F02934 in #17640

Italian/accelerate by @mfumanelli in #17698

Italian/model sharing by @mfumanelli in #17828

Italian translation of run_scripts.mdx gh-17459 by @lorenzobalzani in #17642

Translation/debugging by @nickprock in #18230

Translation/training: italian translation training.mdx by @nickprock in #17662

Translation italian: multilingual.mdx by @nickprock in #17768

Added preprocessing.mdx italian translation by @nickprock in #17600

Improvements and bugfixes

[EncoderDecoder] Improve docs by @NielsRogge in #18271

[DETR] Improve code examples by @NielsRogge in #18262

patch for smddp import by @carolynwang in #18244

Fix Sylvain's nits on the original KerasMetricCallback PR by @Rocketknight1 in #18300

Add PYTEST_TIMEOUT for CircleCI test jobs by @ydshieh in #18251

Add PyTorch 1.11 to past CI by @ydshieh in #18302

Raise a TF-specific error when importing Torch classes by @Rocketknight1 in #18280

[ create_a_model.mdx ] translate to pt by @Fellip15 in #18098

Update translation.mdx by @gorkemozkaya in #18169

Add TFAutoModelForImageClassification to pipelines.py by @ydshieh in #18292

Adding type hints of TF:OpenAIGPT by @Mathews-Tom in #18263

Adding type hints of TF:CTRL by @Mathews-Tom in #18264

Replace false parameter by a buffer by @sgugger in #18259

Fix ORTTrainer failure on gpt2 fp16 training by @JingyaHuang in #18017

Owlvit docs test by @alaradirik in #18257

Good difficult issue override for the stalebot by @LysandreJik in #18094

Fix dtype of input_features in docstring by @ydshieh in #18258

Fix command of doc tests for local testing by @oneraghavan in #18236

Fix TF bad words filter with XLA by @Rocketknight1 in #18286

Allows KerasMetricCallback to use XLA generation by @Rocketknight1 in #18265

Skip passes report for --make-reports by @ydshieh in #18250

Update serving code to enable saved_model=True by @amyeroberts in #18153

Change how take_along_axis is computed in DeBERTa to stop confusing XLA by @Rocketknight1 in #18256

Fix torch version check in Vilt by @ydshieh in #18260

change bloom parameters to 176B by @muhammad-ahmed-ghani in #18235

TF: use the correct config with (...)EncoderDecoder models by @gante in #18097

Fix no_trainer CI by @muellerzr in #18242

Update notification service by @ydshieh in #17921

Make errors for loss-less models more user-friendly by @sgugger in #18233

Fix TrainingArguments help section by @sgugger in #18232

Better messaging and fix for incorrect shape when collating data. by @CakeCrusher in #18119

Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API by @viclzhu in #18221

Update add_new_pipeline.mdx by @zh-zheng in #18224

Add custom config to quicktour by @stevhliu in #18115

skip some test_multi_gpu_data_parallel_forward by @ydshieh in #18188

Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES by @ydshieh in #18213

Fix LayoutXLM docstrings by @qqaatw in #17038

update cache to v0.5 by @ydshieh in #18203

Reduce console spam when using the KerasMetricCallback by @Rocketknight1 in #18202

TF: Add missing cast to GPT-J by @gante in #18201

Use next-gen CircleCI convenience images by @ydshieh in #18197

Typo in readme by @flozi00 in #18195

[From pretrained] Allow download from subfolder inside model repo by @patrickvonplaten in #18184

Update docs README with instructions on locally previewing docs by @snehankekre in #18196

bugfix: div-->dim by @orgoro in #18135

Add vision example to README by @sgugger in #18194

Remove use_auth_token from the from_config method by @duongna21 in #18192

FSDP integration enhancements and fixes by @pacman100 in #18134

BLOOM minor fixes small test by @younesbelkada in #18175

fix typo inside bloom documentation by @SaulLu in #18187

Better default for offload_state_dict in from_pretrained by @sgugger in #18183

Fix template for new models in README by @sgugger in #18182

FIX: Typo by @ayansengupta17 in #18156

Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests by @ydshieh in #18073

Fix expected loss values in some (m)T5 tests by @ydshieh in #18177

[HPO] update to sigopt new experiment api by @sywangyi in #18147

Fix incorrect type hint for lang by @JohnGiorgi in #18161

Fix check for falsey inputs in run_summarization by @JohnGiorgi in #18155

Adding support for device_map directly in pipeline(..) function. by @Narsil in #17902

Fixing a hard to trigger bug for text-generation pipeline. by @Narsil in #18131

Enable torchdynamo with torch_tensorrt(fx path) by @frank-wei in #17765

Make sharded checkpoints work in offline mode by @sgugger in #18125

add dataset split and config to model-index in TrainingSummary.from_trainer by @loicmagne in #18064

Add summarization name mapping for MultiNews by @JohnGiorgi in #18117

supported python versions reference by @CakeCrusher in #18116

TF: unpack_inputs decorator independent from main_input_name by @gante in #18110

TF: remove graph mode distinction when processing boolean options by @gante in #18102

Fix BLOOM dtype by @Muennighoff in #17995

CLI: reenable pt_to_tf test by @gante in #18108

Report value for a step instead of epoch. by @zhawe01 in #18095

speed up test by @sijunhe in #18106

Enhance IPEX integration in Trainer by @jianan-gu in #18072

Bloom Optimize operations by @younesbelkada in #17866

Add filename to info displayed when downloading things in from_pretrained by @sgugger in #18099

Fix image segmentation and object detection pipeline tests by @sgugger in #18100

Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts by @duongna21 in #18069

Fix torchscript tests for GPT-NeoX by @ydshieh in #18012

Fix some typos. by @Yulv-git in #17560

[bloom] fix alibi device placement by @stas00 in #18087

Make predict() close progress bars after finishing by @neverix in #17952

Update localized READMES when template is filled. by @sgugger in #18062

Fix type issue in using bucketing with Trainer by @seopbo in #18051

Fix slow CI by pinning resampy by @sgugger in #18077

Drop columns after loading samples in prepare_tf_dataset by @Rocketknight1 in #17967

[Generate Tests] Make sure no tokens are force-generated by @patrickvonplaten in #18053

Added Command for windows VENV activation in installation docs by @darthvader2 in #18008

Sort doc toc by @sgugger in #18034

Place inputs on device when include_inputs_for_metrics is True by @sgugger in #18046

Doc to dataset by @sgugger in #18037

Protect TFGenerationMixin.seed_generator so it's not created at import by @Rocketknight1 in #18044

Fix T5 incorrect weight decay in Trainer and official summarization example by @ADAning in #18002

Squash commits by @NielsRogge in #17981

Enable Past CI by @ydshieh in #17919

Fix T5/mT5 tests by @Rocketknight1 in #18029

[Flax] Bump to v0.4.1 by @sanchit-gandhi in #17966

Update expected values in DecisionTransformerModelIntegrationTest by @ydshieh in #18016

fixed calculation of ctc loss in TFWav2Vec2ForCTC by @Sreyan88 in #18014

Return scalar losses instead of per-sample means by @Rocketknight1 in #18013

sort list of models by @hollance in #18011

Replace BloomTokenizer by BloomTokenizerFast in doc by @regisss in #18005

Fix typo in error message in generation_utils by @regisss in #18000

Refactor to inherit from nn.Module instead of nn.ModuleList by @amyeroberts in #17501

Add link to existing documentation by @LysandreJik in #17931

only a stupid typo, but it can lead to confusion by @Dobatymo in #17930

Exclude Databricks from notebook env only if the runtime is below 11.0 by @davidheryanto in #17988

Shifting labels for causal LM when using label smoother by @seungeunrho in #17987

Restore original task in test_warning_logs by @ydshieh in #17985

Ensure PT model is in evaluation mode and lightweight forward pass done by @amyeroberts in #17970

XLA train step fixes by @Rocketknight1 in #17973

[Flax] Add remat (gradient checkpointing) by @sanchit-gandhi in #17843

higher atol to avoid flaky trainer test failure by @ydshieh in #17979

Fix FlaxBigBirdEmbeddings by @ydshieh in #17842

fixing fsdp autowrap functionality by @pacman100 in #17922

fix bias keyword argument in TFDebertaEmbeddings by @WissamAntoun in #17940

Update expected values in CodeGen tests by @ydshieh in #17888

Fix typo in perf_train_gpu_one.mdx by @aliencaocao in #17983

skip some gpt_neox tests that require 80G RAM by @ydshieh in #17923

feat: add pipeline registry abstraction by @aarnphm in #17905

skip some ipex tests until it works with torch 1.12 by @ydshieh in #17964

Fix number of examples for iterable dataset in distributed training by @sgugger in #17951

[Pipelines] Add revision tag to all default pipelines by @patrickvonplaten in #17667

Unifying training argument type annotations by @jannisborn in #17934

Fix GPT-NeoX-20B past handling, attention computation by @zphang in #17811

Fix #17893, removed dead code by @clefourrier in #17917

Fix prepare_tf_dataset when drop_remainder is not supplied by @Rocketknight1 in #17950

ExplicitEnum subclass str (JSON dump compatible) by @BramVanroy in #17933

PyTorch 1.12.0 for scheduled CI by @ydshieh in #17949

OPT - Fix Softmax NaN in half precision mode by @younesbelkada in #17437

Use explicit torch version in deepspeed CI by @ydshieh in #17942

fix regexes with escape sequence by @stas00 in #17943

Fix all is_torch_tpu_available issues by @muellerzr in #17936

Fix img seg tests (load checkpoints from hf-internal-testing) by @mishig25 in #17939

Remove imports and use forward references in ONNX feature by @sgugger in #17926

Fix job links in Slack report by @ydshieh in #17892

Add missing comment quotes by @leondz in #17379

Remove render tags by @NielsRogge in #17897

Fix the Conda package build by @bryant1410 in #16737

Remove DT_DOUBLE from the T5 graph by @szutenberg in #17891

Compute min_resolution in prepare_image_inputs by @ydshieh in #17915

Fixing a regression with return_all_scores introduced in #17606 by @Narsil in #17906

In group_texts function, drop last block if smaller than block_size by @billray0259 in #17908

Move logic into pixelshuffle layer by @amyeroberts in #17899

Fix loss computation in TFBertForPreTraining by @Rocketknight1 in #17898

Pin black to 22.3.0 to benefit from a stable --preview flag by @LysandreJik in #17918

Fix PyTorch/TF Auto tests by @ydshieh in #17895

Fix test_number_of_steps_in_training_with_ipex by @ydshieh in #17889

Update expected values in constrained beam search tests by @ydshieh in #17887

Fix bug in gpt2's (from-scratch) special scaled weight initialization by @karpathy in #17877

Update README_zh-hans.md by @mmdjiji in #17861

bert: add conversion script for BERT Token Dropping TF2 checkpoints by @stefan-it in #17142

Fix add new model like frameworks by @sgugger in #17869

Add type annotations for RoFormer models by @donelianc in #17878

fix by @ydshieh in #17890

fix mask by @younesbelkada in #17837

Add a TF in-graph tokenizer for BERT by @Rocketknight1 in #17701

Fix TF GPT2 test_onnx_runtime_optimize by @ydshieh in #17874

CLI: handle multimodal inputs by @gante in #17839

Properly get tests deps in test_fetcher by @sgugger in #17870

Fix test_inference_instance_segmentation_head by @ydshieh in #17872

Skip test_multi_gpu_data_parallel_forward for MaskFormer by @ydshieh in #17864

Use higher value for hidden_size in Flax BigBird test by @ydshieh in #17822

Fix: torch.utils.checkpoint import error. by @kumapo in #17849

Add type hints for gptneox models by @willtai in #17858

Fix Splinter test by @ydshieh in #17854

[tests/VisionEncoderDecoder] import to_2tuple from test utils by @patil-suraj in #17865

Fix Constrained beam search duplication and weird output issue by @boy2000-007man in #17814

Improve encoder decoder model docs by @Threepointone4 in #17815

Improve vision models by @NielsRogge in #17731

Auto-build Docker images before on-merge if setup.py was changed by @muellerzr in #17573

Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts by @muellerzr in #17856

Index RNG states by global rank in saves by @sgugger in #17852

Change no trainer image_classification test by @muellerzr in #17635

Update modeling_cvt.py by @F02934 in #17846

Fix broken test for models with batchnorm by @Rocketknight1 in #17841

BLOOM minor changes on tokenizer by @younesbelkada in #17823

Improve performance docs by @lvwerra in #17750

Fix an error message in BigBird by @ydshieh in #17840

Fix properties of unset special tokens in non verbose mode by @guillaumekln in #17797

change message by @SaulLu in #17836

Add missing type hints for QDQBertModel by @willtai in #17783

Update type hints modeling_yoso.py by @F02934 in #17827

add doctests for DETR by @qherreros in #17786

Fix push CI artifact path by @ydshieh in #17788

Offload fixes by @sgugger in #17810

CLI: use hub's create_commit by @gante in #17755

initial commit by @ArthurZucker in #17818

Add logits_processor parameter, used by generate, to Seq2SeqTrainer methods evaluate and predict by @eranhirs in #17805

Fix top_k_top_p_filtering having unexpected behavior by @unifyh in #17744

Remove duplicate code by @lkm2835 in #17708

CLI: convert sharded PT models by @gante in #17959

Improve error message Union not allowed by @BramVanroy in #17769

Add final_layer_norm to OPT model by @thomasw21 in #17785

Properly check for a TPU device by @muellerzr in #17802

Fix test for BF16 detection by @sgugger in #17803

Use 5e-5 For BigBird PT/Flax equivalence tests by @ydshieh in #17780

Prepare transformers for v0.8.0 huggingface-hub release by @LysandreJik in #17716

Fix forward reference imports in DeBERTa configs by @sgugger in #17800

Fix Automatic Download of Pretrained Weights in DETR by @AnugunjNaman in #17712

[ViTMAE] Fix docstrings and variable names by @NielsRogge in #17710

Add link to notebook by @NielsRogge in #17791

[CodeParrot] Near-deduplication with jaccard similarity by @liyongsea in #17054

Update modeling_longt5.py by @bjascob in #17777

Not use -1e4 as attn mask by @ydshieh in #17306

Fix cache for GPT-Neo-X by @sgugger in #17764

deprecate is_torch_bf16_available by @stas00 in #17738

Attempt to change Push CI to workflow_run by @ydshieh in #17753

Save huggingface checkpoint as artifact in mlflow callback by @swethmandava in #17686

Migrate HFDeepSpeedConfig from trfrs to accelerate by @pacman100 in #17623

feat: add num_workers arg to DataLoader by @greg2451 in #17751

Enable PyTorch nightly build CI by @ydshieh in #17335

Significant community contributions

The following contributors have made significant changes to the library since the last release:

@donelianc

Add Spanish translation of custom_models.mdx (#17807)

Add type annotations for RoFormer models (#17878)

@Xpiri

Add Italian translation of sharing_custom_models.mdx (#17631)

Add Italian translation of converting_tensorflow_models.mdx (#18283)

@F02934

Add Italian translation of create_model.mdx and serialization.mdx (#17640)

Update modeling_cvt.py (#17846)

Update type hints modeling_yoso.py (#17827)

@sayakpaul

[SegFormer] TensorFlow port (#17910)

@mfumanelli

Italian/accelerate (#17698)

Italian/model sharing (#17828)

@nickprock

Translation/debugging (#18230)

Translation/training: italian translation training.mdx (#17662)

Translation italian: multilingual.mdx (#17768)

Added preprocessing.mdx italian translation (#17600)

@sijunhe

speed up test (#18106)

Nezha Pytorch implementation (#17776)

@StevenTang1998

Add MVP model (#17787)

@ariG23498

TF implementation of RegNets (#17554)

@xvjiarui

Adding GroupViT Models (#17313)

@rooa

Add CodeGen model (#17443)

v4.20.1 (Jun 21, 2022)
This patch release fixes a bug in the OPT models and makes Transformers compatible with huggingface_hub version 0.8.1.

Add final_layer_norm to OPT model #17785

Prepare transformers for v0.8.0 huggingface-hub release #17716

v4.20.0 (Jun 16, 2022)
Big model inference

You can now use the big model inference of Accelerate directly in any call to from_pretrained by specifying device_map="auto" (or your own device_map). It will automatically load the model onto your GPU(s) first, then offload whatever doesn't fit to CPU RAM, and even to the hard drive if there isn't enough RAM. The model can then be used normally for inference, with nothing else to do.

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp", revision="sharded", device_map="auto"
)

Use Accelerate in from_pretrained for big model inference by @sgugger in #17341
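To build intuition for what device_map="auto" computes, here is a deliberately simplified greedy placement: layers are assigned in order to the first device with enough remaining budget, GPUs before CPU, and anything left over is offloaded to disk. This is not Accelerate's actual algorithm, which handles more cases; `infer_device_map`, the layer names and the sizes are all hypothetical.

```python
def infer_device_map(layer_sizes, budgets):
    """Greedily place layers, in order, on devices with enough remaining budget.

    layer_sizes: dict of layer name -> size (any unit, e.g. GB), in model order.
    budgets: dict of device -> capacity, listing GPUs first and then "cpu".
    Layers that fit on no device are offloaded to "disk".
    """
    devices = list(budgets)
    remaining = dict(budgets)
    device_map = {}
    current = 0  # index of the device currently being filled
    for name, size in layer_sizes.items():
        # Advance to the next device when the current one is too full; never go
        # back, so consecutive layers stay together on one device.
        while current < len(devices) and remaining[devices[current]] < size:
            current += 1
        if current == len(devices):
            device_map[name] = "disk"
        else:
            device = devices[current]
            device_map[name] = device
            remaining[device] -= size
    return device_map


layers = {"embed": 2, "block_0": 4, "block_1": 4, "block_2": 4, "lm_head": 2}
print(infer_device_map(layers, budgets={0: 6, "cpu": 8}))
# {'embed': 0, 'block_0': 0, 'block_1': 'cpu', 'block_2': 'cpu', 'lm_head': 'disk'}
```

Keeping consecutive layers on the same device limits how often activations must move between devices during the forward pass.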

BLOOM

The BLOOM model has been proposed with its various versions through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages and 13 programming languages.

BLOOM by @younesbelkada in #17474

CvT

The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Add CvT by @NielsRogge and @AnugunjNaman in #17299

GPT Neo-X

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.

Adding GPT-NeoX-20B by @zphang in #16659

LayoutLMv3

LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).

Add LayoutLMv3 by @NielsRogge in #17060

LeViT

LeViT improves the Vision Transformer (ViT) in performance and efficiency through a few architectural changes, such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.

Adding LeViT Model by Facebook by @AnugunjNaman in #17466
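LeViT's attention bias gives each head one learned scalar per relative offset between query and key positions on the 2D feature grid, added to the attention logits. A minimal single-head sketch, assuming offsets are taken as absolute values (|dx|, |dy|) so that symmetric pairs share a bias; `attention_bias_indices` and `add_attention_bias` are hypothetical helpers, and the bias values here are placeholders rather than learned parameters.

```python
import itertools


def attention_bias_indices(resolution: int):
    """Map every (query, key) pair on a resolution x resolution grid to a bias slot.

    Pairs with the same absolute offset (|dx|, |dy|) share one slot, so a head
    needs only one bias parameter per distinct offset.
    """
    points = list(itertools.product(range(resolution), range(resolution)))
    offset_to_slot = {}
    indices = []
    for p1 in points:
        row = []
        for p2 in points:
            offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
            if offset not in offset_to_slot:
                offset_to_slot[offset] = len(offset_to_slot)
            row.append(offset_to_slot[offset])
        indices.append(row)
    return indices, len(offset_to_slot)


def add_attention_bias(logits, biases, indices):
    """Add the shared per-offset bias to every query-key attention logit."""
    return [[logits[i][j] + biases[indices[i][j]] for j in range(len(logits[i]))]
            for i in range(len(logits))]


idx, num_biases = attention_bias_indices(3)
print(num_biases)  # 9 distinct offsets on a 3x3 grid
biased = add_attention_bias([[0.0] * 9 for _ in range(9)], [float(b) for b in range(num_biases)], idx)
```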

LongT5

The LongT5 model is an extension of the T5 model that enables using one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. It can handle input sequences of up to 16,384 tokens.

Add LongT5 model by @stancld in #16792
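With local attention, each token attends only to keys within a fixed radius of its own position, so the number of allowed query-key pairs grows linearly with sequence length instead of quadratically. The sketch below shows only which pairs are allowed, using a dense mask for readability; an efficient implementation would not materialize the full matrix, the radius value is arbitrary, and transient-global attention is not covered. `local_attention_mask` is a hypothetical helper.

```python
def local_attention_mask(seq_len: int, radius: int):
    """Boolean mask where mask[i][j] is True if query i may attend to key j."""
    return [[abs(i - j) <= radius for j in range(seq_len)] for i in range(seq_len)]


mask = local_attention_mask(6, radius=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
# ....xx
```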

M-CTC-T

The M-CTC-T model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16 kHz audio signal.

M-CTC-T Model by @cwkeam in #16402
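A CTC head produces a distribution over character labels for every audio frame. The simplest way to turn it into text is greedy decoding: take the most likely label per frame, collapse consecutive repeats, then drop the blank label. A minimal sketch, assuming the per-frame argmax ids are already computed and that the blank has id 0 (an assumption; the real vocabulary may place it elsewhere); the tiny vocabulary is hypothetical.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blank tokens."""
    chars = []
    previous = None
    for idx in frame_ids:
        if idx != previous and idx != blank_id:
            chars.append(id_to_char[idx])
        previous = idx
    return "".join(chars)


vocab = {0: "<blank>", 1: "H", 2: "e", 3: "l", 4: "o"}
# Argmax label per frame; the blank between the two 3s keeps "ll" from collapsing.
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4, 4, 0]
print(ctc_greedy_decode(frames, vocab))  # Hello
```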

Trajectory Transformer

This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).

Add trajectory transformer by @CarlCochet in #17141
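The trajectory described above can be sketched as flattening each timestep's state, action and reward into one long token sequence. This is a simplified illustration under stated assumptions: every value is discretized into uniform bins over a single shared range, the per-timestep order is state then action then reward, and the return-to-go/value tokens and per-dimension vocabulary offsets used in practice are omitted. `discretize` and `build_trajectory` are hypothetical helpers.

```python
def discretize(value, low, high, num_bins):
    """Map a continuous value to one of num_bins uniform bins over [low, high]."""
    value = min(max(value, low), high)  # clip out-of-range values
    bin_id = int((value - low) / (high - low) * num_bins)
    return min(bin_id, num_bins - 1)  # value == high falls in the last bin


def build_trajectory(states, actions, rewards, low=-1.0, high=1.0, num_bins=100):
    """Flatten per-timestep (state, action, reward) into one token sequence."""
    tokens = []
    for state, action, reward in zip(states, actions, rewards):
        for x in (*state, *action, reward):
            tokens.append(discretize(x, low, high, num_bins))
    return tokens


states = [[0.0, 0.5], [-0.5, 0.25]]  # 2 timesteps, 2-dim states
actions = [[1.0], [-1.0]]            # 1-dim actions
rewards = [0.0, 0.5]
print(build_trajectory(states, actions, rewards))  # [50, 75, 99, 50, 25, 62, 0, 75]
```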

Wav2Vec2-Conformer

The Wav2Vec2-Conformer was added to an updated version of fairseq S2T: Fast Speech-to-Text Modeling with fairseq. It requires more parameters than Wav2Vec2 but also yields an improved word error rate.

[Wav2Vec2Conformer] Official release by @patrickvonplaten in #17709

Add Wav2Vec2Conformer by @patrickvonplaten in #16812

TensorFlow implementations

Data2VecVision for semantic segmentation, OPT and Swin are now available in TensorFlow.

Add TFData2VecVision for semantic segmentation by @sayakpaul in #17271

Opt in flax and tf by @ArthurZucker in #17388

Add Tensorflow Swin model by @amyeroberts in #16988

Flax implementations

OPT is now available in Flax.

Opt in flax and tf by @ArthurZucker in #17388

Documentation translation in Italian and Portuguese

A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.

Translation/italian: added pipeline_tutorial.mdx [Issue: #17459] by @nickprock in #17507

Add installation.mdx Italian translation by @mfumanelli in #17530

Setup for Italian translation and add quicktour.mdx translation by @mfumanelli in #17472

Adding the Portuguese version of the tasks/token_classification.mdx documentation by @jonatasgrosman in #17492

Adding the Portuguese version of the tasks/sequence_classification.mdx documentation by @jonatasgrosman in #17352

[ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial by @Fellip15 in #17076

Added translation of installation.mdx to Portuguese Issue #16824 by @rzimmerdev in #16979

Improvements and bugfixes

Sort the model doc Toc Alphabetically by @sgugger in #17723

normalize keys_to_ignore by @stas00 in #17722

CLI: Add flag to push TF weights directly into main by @gante in #17720

Update requirements.txt by @jeffra in #17719

Revert "Change push CI to run on workflow_run event (#17692)" by @ydshieh

Documentation: RemBERT fixes by @stefan-it in #17641

Change push CI to run on workflow_run event by @ydshieh in #17692

fix tolerance for a bloom slow test by @younesbelkada in #17634

[LongT5] disable model parallel test by @patil-suraj in #17702

FX function refactor by @michaelbenayoun in #17625

Add BloomForSequenceClassification and BloomForTokenClassification classes by @haileyschoelkopf in #17639

Swin main layer by @amyeroberts in #17693

Include a comment to reflect Amy's contributions by @sayakpaul in #17689

Rag end2end new by @shamanez in #17650

[LongT5] Rename checkpoints by @patrickvonplaten in #17700

Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference by @jianan-gu in #17153

Fix doc builder Dockerfile by @ydshieh in #17435

Add FP16 Support for SageMaker Model Parallel by @haohanchen-yagao in #17386

enable cpu distribution training using mpirun by @sywangyi in #17570

Add Ray's scope to training arguments by @BramVanroy in #17629

Update modeling_gpt_neox.py by @willfrey in #17575

Fix dtype getter by @sgugger in #17668

explicitly set utf8 for Windows by @BramVanroy in #17664

Fixed documentation typo, parameter name is evaluation_strategy, not eval_strategy by @sainttttt in #17669

Add Visual Question Answering (VQA) pipeline by @sijunhe in #17286

Fix typo in adding_a_new_model README by @ayushtues in #17679

Avoid GPU OOM for a TF Rag test by @ydshieh in #17638

fix typo from emtpy to empty by @domenicrosati in #17643

[Generation Test] Make fast test actually fast by @patrickvonplaten in #17661

[Data2Vec] Speed up test by @patrickvonplaten in #17660

[BigBirdFlaxTests] Make tests slow by @patrickvonplaten in #17658

update README.md by @loubnabnl in #17657

🐛 Properly raise RepoNotFoundError when not authenticated by @SBrandeis in #17651

Fixes #17128 . by @mygithubid1 in #17356

Fix dtype getters by @sgugger in #17656

Add skip logic for attentions test - Levit by @amyeroberts in #17633

Enable crop_center method to handle (W, H, C) images by @alaradirik in #17626

Move Clip image utils to image_utils.py by @alaradirik in #17628

Skip tests until bug is fixed. by @sgugger in #17646

Translation/autoclass by @mfumanelli in #17615

didn't exist in pt-1.9 by @stas00 in #17644

convert assertion to raised exception in debertav2 by @sam-h-bean in #17619

Pre-build DeepSpeed by @ydshieh in #17607

[modeling_utils] torch_dtype/auto floating dtype fixes by @stas00 in #17614

Running a pipeline of float16. by @Narsil in #17637

fix use_amp rename after pr 17138 by @stas00 in #17636

Fix very long job failure text in Slack report by @ydshieh in #17630

Adding top_k argument to text-classification pipeline. by @Narsil in #17606

Mention in the doc we drop support for fairscale by @sgugger in #17610

Use shape_list to safely get shapes for Swin by @amyeroberts in #17591

Add ONNX support for ConvNeXT by @regisss in #17627

Add ONNX support for ResNet by @regisss in #17585

has_attentions - consistent test skipping logic and tf tests by @amyeroberts in #17495

CLI: Print all different tensors on exception by @gante in #17612

TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed by @gante in #17593

Fix telemetry URL by @sgugger in #17608

CLI: Properly detect encoder-decoder models by @gante in #17605

Fix link for community notebooks by @ngoquanghuy99 in #17602

Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch by @jianan-gu in #17138

fix train_new_from_iterator in the case of byte-level tokenizers by @SaulLu in #17549

Explicit versions in docker files by @ydshieh in #17586

CLI: add stricter automatic checks to pt-to-tf by @gante in #17588

fix by @ydshieh in #17589

quicktour.mdx en -> pt translation by @vitorfrois in #17074

Fx support for Deberta-v[1-2], Hubert and LXMERT by @michaelbenayoun in #17539

Add examples telemetry by @sgugger in #17552

Fix gendered sentence in Spanish translation by @omarespejel in #17558

Fix circular import in onnx.utils by @sgugger in #17577

Use latest stable PyTorch/DeepSpeed for Push & Scheduled CI by @ydshieh in #17417

Remove circular imports in layoutlm/__init__.py by @regisss in #17576

Add magic method to our TF models to convert datasets with column inference by @Rocketknight1 in #17160

[deepspeed / testing] reset global state by @stas00 in #17553

Remove RuntimeErrors for NaN-checking in 20B by @zphang in #17563

fix integration test levit by @AnugunjNaman in #17555

[deepspeed] fix load_best_model test by @stas00 in #17550

Update index.mdx by @BritneyMuller in #17547

Clean imports to fix test_fetcher by @sgugger in #17531

Update run_glue_no_trainer.py by @bofenghuang in #17546

Fix all offload and MP tests by @sgugger in #17533

Fix bug - layer names and activation from previous refactor by @amyeroberts in #17524

Add support for Perceiver ONNX export by @deutschmn in #17213

Allow from transformers import TypicalLogitsWarper by @teticio in #17477

Add Gated-SiLU to T5 by @DanielHesslow in #17420

Update URL for Hub PR docs by @lewtun in #17532

fix OPT-Flax CI tests by @ArthurZucker in #17512

[trainer/deepspeed] load_best_model (reimplement re-init) by @stas00 in #17151

Implemented loss for training AudioFrameClassification by @MorenoLaQuatra in #17513

Update configuration_auto.py by @kamalkraj in #17527

Check list of models in the main README and sort it by @sgugger in #17517

Fix when Accelerate is not installed by @sgugger in #17518

Clean README in post release job as well. by @sgugger in #17519

Fix CI tests hang forever by @ydshieh in #17471

Print more library versions in CI by @ydshieh in #17384

Split push CI into 2 workflows by @ydshieh in #17369

Fix Tapas tests by @ydshieh in #17510

CLI: tool to convert PT into TF weights and open hub PR by @gante in #17497

Fix flakey no-trainer test by @muellerzr in #17515

Deal with the error when task is regression by @fireindark707 in #16330

Fix CTRL tests by @ydshieh in #17508

Fix LayoutXLMProcessorTest by @ydshieh in #17506

Debug LukeForMaskedLM by @Ryou0634 in #17499

Fix MP and CPU offload tests for Funnel and GPT-Neo by @sgugger in #17503

Exclude Databricks from notebook env by @sgugger in #17496

Fix tokenizer type annotation in pipeline(...) by @willfrey in #17500

Refactor classes to inherit from nn.Module instead of nn.Sequential by @amyeroberts in #17493

Fix wav2vec2 export onnx model with attention_mask error by @nilboy in #16004

Add warning when using older version of torch for ViltFeatureExtractor by @xhluca in #16756

Fix typo of variable names for key and query projection layer by @Kyeongpil in #17155

Fixed wrong error message for missing weight file by @123jimin in #17216

Add OnnxConfig for SqueezeBert iss17314 by @Ruihua-Fang in #17315

[GPT2Tokenizer] Fix GPT2 with bos token by @patrickvonplaten in #17498

[Json configs] Make json prettier for all saved tokenizer files & ensure same json format for all processors (tok + feat_extract) by @patrickvonplaten in #17457

Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens() by @Witiko in #17119

Add HF.co for PRs / Issues regarding specific model checkpoints by @patrickvonplaten in #17485

Fix checkpoint name by @ydshieh in #17484

Docker image build in parallel by @ydshieh in #17434

Added XLM onnx config by @nandwalritik in #17030

Disk offload fix by @sgugger in #17428

TF: GPT-2 generation supports left-padding by @gante in #17426

Fix ViTMAEModelTester by @ydshieh in #17470

[Generate] Fix output scores greedy search by @patrickvonplaten in #17442

Fix nits by @omarespejel in #17349

Fx support for multiple model architectures by @michaelbenayoun in #17393

typo IBERT in repr quant_mode by @scratchmex in #17398

Fix typo (remove parenthesis) by @mikcnt in #17415

Improve notrainer examples by @pacman100 in #17449

[OPT] Fix bos token id default by @patrickvonplaten in #17441

Fix model parallelism test by @sgugger in #17439

Pin protobuf that breaks TensorBoard in PyTorch by @sgugger in #17440

Spanish translation of the file preprocessing.mdx by @yharyarias in #16299

Spanish translation of the files sagemaker.mdx and image_classification.mdx by @SimplyJuanjo in #17262

Added es version of bertology.mdx doc by @jQuinRivero in #17255

Wav2vec2 finetuning shared file system by @patrickvonplaten in #17423

fix link in performance docs by @lvwerra in #17419

Add link to Hub PR docs in model cards by @lewtun in #17421

Upd AutoTokenizer.from_pretrained doc examples by @c00k1ez in #17416

Support compilation via Torchdynamo, AOT Autograd, NVFuser by @anijain2305 in #17308

Add test for new model parallelism features by @sgugger in #17401

Make check_init script more robust and clean inits by @sgugger in #17408

Fix README localizer script by @sgugger in #17407

Fix expected value for OPT test test_inference_no_head by @ydshieh in #17395

Clean up CLIP tests by @NielsRogge in #17380

Enabling imageGPT auto feature extractor. by @Narsil in #16871

Add support for device_map="auto" to OPT by @sgugger in #17382

OPTForCausalLM lm_head input size should be config.word_embed_proj_dim by @vfbd in #17225

Traced models serialization and torchscripting fix by @michaelbenayoun in #17206

Fix Comet ML integration by @mxschmdt in #17381

Fix cvt docstrings by @AnugunjNaman in #17367

Correct & Improve Doctests for LayoutLMv2 by @gnolai in #17168

Fix CodeParrot training script by @loubnabnl in #17291

Fix a typo relative_postion_if_large -> relative_position_if_large by @stancld in #17366

Pin dill to fix examples by @sgugger in #17368

[Test OPT] Add batch generation test opt by @patrickvonplaten in #17359

Fix bug in Wav2Vec2 pretrain example by @ddobokki in #17326

fix for 17292 by @nadahlberg in #17293

[Generation] Fix Transition probs by @patrickvonplaten in #17311

[OPT] Run test in lower precision on GPU by @patrickvonplaten in #17353

Adding batch_size test to QA pipeline. by @Narsil in #17330

[BC] Fixing usage of text pairs by @Narsil in #17324

[tests] fix copy-n-paste error by @stas00 in #17312

Fix ci_url might be None by @ydshieh in #17332

fix by @ydshieh in #17337

Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts by @muellerzr in #17331

docs for typical decoding by @jadermcs in #17186

Not send successful report by @ydshieh in #17329

Fix test_t5_decoder_model_past_large_inputs by @ydshieh in #17320

Add onnx export cuda support by @JingyaHuang in #17183

Add Information Gain Filtration algorithm by @mraunak in #16953

Fix typo by @kamalkraj in #17328

remove by @ydshieh in #17325

Accepting real pytorch device as arguments. by @Narsil in #17318

Updating the docs for max_seq_len in QA pipeline by @Narsil in #17316

[T5] Fix init in TF and Flax for pretraining by @patrickvonplaten in #17294

Add type hints for ProphetNet (Pytorch) by @jQuinRivero in #17223

fix by @patrickvonplaten in #17310

[LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing by @caesar-one in #17112

Add support for pretraining recurring span selection to Splinter by @jvcop in #17247

Add PR author in CI report + merged by info by @ydshieh in #17298

Fix dummy creation script by @sgugger in #17304

Doctest longformer by @KMFODA in #16441

[Test] Fix W2V-Conformer integration test by @patrickvonplaten in #17303

Improve mismatched sizes management when loading a pretrained model by @regisss in #17257

correct opt by @patrickvonplaten in #17301

Rewrite TensorFlow train_step and test_step by @Rocketknight1 in #17057

Fix tests of mixed precision now that experimental is deprecated by @Rocketknight1 in #17300

fix retribert's test_torch_encode_plus_sent_to_model by @SaulLu in #17231

[ConvNeXT] Fix drop_path_rate by @NielsRogge in #17280

Fix wrong PT/TF categories in CI report by @ydshieh in #17272

Fix missing job action button in CI report by @ydshieh in #17270

Fix test_model_parallelization by @lkm2835 in #17249

[Tests] Fix slow opt tests by @patrickvonplaten in #17282

docs(transformers): fix typo by @k-zehnder in #17263

logging documentation update by @sanderland in #17174

Use the PR URL in CI report by @ydshieh in #17269

Fix FlavaForPreTrainingIntegrationTest CI test by @ydshieh in #17232

Better error in the Auto API when a dep is missing by @sgugger in #17289

Make TrainerHyperParameterSigOptIntegrationTest slow test by @ydshieh in #17288

Automatically sort auto mappings by @sgugger in #17250

Mlflowcallback fix nonetype error by @orieg in #17171

Align logits and labels in OPT by @MichelBartels in #17237

Remove next sentence prediction from supported ONNX tasks by @lewtun in #17276

CodeParrot data pretokenization by @loubnabnl in #16932

Update codeparrot data preprocessing by @loubnabnl in #16944

Updated checkpoint support for Sagemaker Model Parallel by @cavdard in #17219

fixed bug in run_mlm_flax_stream.py by @KennethEnevoldsen in #17203

[doc] performance/scalability revamp by @stas00 in #15723

TF - Fix convnext classification example by @gante in #17261

Fix obvious typos in flax decoder impl by @cloudhan in #17279

Guide to create custom models in Spanish by @ignacioct in #17158

Translated version of model_sharing.mdx doc to spanish by @Gerard-170 in #16184

Add PR title to push CI report by @ydshieh in #17246

Fix push CI channel by @ydshieh in #17242

install dev. version of accelerate by @ydshieh in #17243

Fix Trainer for Datasets that don't have dict items by @sgugger in #17239

Handle copyright in add-new-model-like by @sgugger in #17218

fix --gpus option for docker by @ydshieh in #17235

Update self-push workflow by @ydshieh in #17177

OPT - fix docstring and improve tests slightly by @patrickvonplaten in #17228

OPT-fix by @younesbelkada in #17229

Fix typo in bug report template by @fxmarty in #17178

Black preview by @sgugger in #17217

update BART docs by @patil-suraj in #17212

Add test to ensure models can take int64 inputs by @Rocketknight1 in #17210
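The "Add Gated-SiLU to T5" entry (#17420) in the list above refers to a gated feed-forward block: one projection of the input goes through an activation and is multiplied element-wise with a second, linear projection before the output projection. A minimal NumPy sketch of that computation, assuming SiLU as the activation; the weight names are hypothetical and this is not the library's T5 code:

```python
import numpy as np


def silu(x):
    # SiLU (also called swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def gated_silu_ffn(x, w_gate, w_linear, w_out):
    """Gated feed-forward: (silu(x @ w_gate) * (x @ w_linear)) @ w_out."""
    gate = silu(x @ w_gate)          # activated branch, shape (..., d_ff)
    linear = x @ w_linear            # un-activated branch, shape (..., d_ff)
    return (gate * linear) @ w_out   # project back to d_model


rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal((2, 5, d_model))  # (batch, seq_len, d_model)
w_gate = rng.standard_normal((d_model, d_ff))
w_linear = rng.standard_normal((d_model, d_ff))
w_out = rng.standard_normal((d_ff, d_model))

y = gated_silu_ffn(x, w_gate, w_linear, w_out)
print(y.shape)  # (2, 5, 8)
```

Compared with a plain two-matrix feed-forward, the gated variant uses three weight matrices: the linear branch lets the model scale the activated features per dimension.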
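The "TF: GPT-2 generation supports left-padding" entry (#17426) in the list above concerns batches where shorter prompts are padded on the left, so that every row's next token is appended at the same right-hand position. For that to work, position ids must count only real tokens. A minimal NumPy sketch of deriving them from the attention mask; it illustrates the idea rather than reproducing the PR's code, and the dummy value used for padding slots is an arbitrary choice here:

```python
import numpy as np


def position_ids_from_mask(attention_mask, pad_position=0):
    """Positions that skip left padding: the first real token gets position 0."""
    mask = np.asarray(attention_mask, dtype=np.int64)
    # running count of real tokens, shifted so counting starts at 0
    position_ids = np.cumsum(mask, axis=-1) - 1
    # padding slots are masked out of attention, so any in-range value works
    return np.where(mask == 0, pad_position, position_ids)


attention_mask = [
    [0, 0, 1, 1, 1],  # short prompt, left-padded
    [1, 1, 1, 1, 1],  # full-length prompt
]
print(position_ids_from_mask(attention_mask))
# [[0 0 0 1 2]
#  [0 1 2 3 4]]
```

Without this adjustment, the first real token of the padded row would receive position 2, as if the padding were part of the text.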
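The typical decoding entries in the list above ("Allow from transformers import TypicalLogitsWarper", #17477, and "docs for typical decoding", #17186) refer to a sampling filter that ranks tokens by how close their surprisal (negative log-probability) is to the entropy of the distribution, then keeps tokens in that order until their probability mass reaches a threshold. A simplified NumPy sketch for a single vector of logits; it is not the library's TypicalLogitsWarper, which also handles batches and a minimum number of tokens to keep:

```python
import numpy as np


def typical_filter(logits, mass=0.9, filter_value=-np.inf):
    """Keep the most 'typical' tokens covering `mass` probability; mask the rest."""
    logits = np.asarray(logits, dtype=np.float64)
    # numerically stable log-softmax
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    probs = np.exp(log_probs)
    entropy = -(probs * log_probs).sum()
    # distance of each token's surprisal from the expected surprisal (the entropy)
    deviation = np.abs(-log_probs - entropy)
    order = np.argsort(deviation, kind="stable")
    cumulative = np.cumsum(probs[order])
    # smallest prefix (in typicality order) whose mass reaches `mass`
    cutoff = int(np.searchsorted(cumulative, mass)) + 1
    keep = order[:cutoff]
    out = np.full_like(logits, filter_value)
    out[keep] = logits[keep]
    return out


probs = np.array([0.5, 0.3, 0.15, 0.05])
# the least typical token (p = 0.05) is filtered out with mass=0.9
print(typical_filter(np.log(probs), mass=0.9))
```

Note that the most likely token is not necessarily ranked first: here the token with probability 0.3 has surprisal closest to the entropy.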
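The "Align logits and labels in OPT" entry (#17237) in the list above is about the usual causal language-modeling shift: the logits at position t are scored against the token at position t + 1, so the labels can simply be a copy of the input ids. A minimal NumPy sketch of that loss, assuming -100 marks positions to ignore (the default ignore_index of PyTorch's CrossEntropyLoss, also used in Transformers); it illustrates the alignment and is not OPT's code:

```python
import numpy as np


def causal_lm_loss(logits, labels, ignore_index=-100):
    """Mean next-token cross-entropy; logits (batch, seq, vocab), labels (batch, seq)."""
    # drop the last prediction and the first label so position t predicts token t + 1
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    # numerically stable log-softmax over the vocabulary
    m = shift_logits.max(axis=-1, keepdims=True)
    log_probs = shift_logits - m - np.log(np.exp(shift_logits - m).sum(axis=-1, keepdims=True))
    valid = shift_labels != ignore_index
    safe_labels = np.where(valid, shift_labels, 0)  # placeholder index, masked out below
    token_log_probs = np.take_along_axis(log_probs, safe_labels[..., None], axis=-1)[..., 0]
    return -(token_log_probs * valid).sum() / valid.sum()


vocab_size = 4
input_ids = np.array([[1, 2, 3]])
uniform_logits = np.zeros((1, 3, vocab_size))
print(causal_lm_loss(uniform_logits, input_ids))  # ln(4) ≈ 1.386
```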

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@sayakpaul

Include a comment to reflect Amy's contributions (#17689)

Add TFData2VecVision for semantic segmentation (#17271)

@jianan-gu

Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (#17153)

Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138)

@stancld

Add LongT5 model (#16792)

Fix a typo relative_postion_if_large -> relative_position_if_large (#17366)

@mfumanelli

Translation/autoclass (#17615)

Add installation.mdx Italian translation (#17530)

Setup for Italian translation and add quicktour.mdx translation (#17472)

@cwkeam

M-CTC-T Model (#16402)

@zphang

Remove RuntimeErrors for NaN-checking in 20B (#17563)

Adding GPT-NeoX-20B (#16659)

@AnugunjNaman

fix integration test levit (#17555)

Adding LeViT Model by Facebook (#17466)

Fix cvt docstrings (#17367)

@yharyarias

Spanish translation of the file preprocessing.mdx (#16299)

@mraunak

Add Information Gain Filtration algorithm (#16953)

@rzimmerdev

Added translation of installation.mdx to Portuguese Issue #16824 (#16979)

v4.19.4 (Jun 10, 2022)

Fixes the error message shown when trying to access a repo that does not exist (it started to break due to changes in the Hub API).

[🐛] Properly raise RepoNotFoundError when not authenticated #17651
v4.19.3 (Jun 9, 2022)
This patch release fixes the installation of protobuf when a user runs pip install transformers[sentencepiece].

Pin protobuf that breaks TensorBoard in PyTorch #17440

v4.19.2 (May 16, 2022)
Patch release for the following PRs/commits:

OPT-fix #17229

OPT - fix docstring and improve tests slightly #17228

Align logits and labels in OPT #17237

v4.19.1 (May 13, 2022)

v4.19.1 Patch release

Fix Trainer for Datasets that don't have dict items #17239
v4.19.0 (May 12, 2022)
Disclaimer: this release is the first release with no Python 3.6 support.

OPT

The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-sourced large causal language models whose performance is similar to that of GPT-3.

Add OPT by @younesbelkada in #17088

FLAVA

The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and is accepted at CVPR 2022.

The paper aims to create a single unified foundation model that works across vision, language, and vision-and-language multimodal tasks.

[feat] Add FLAVA model by @apsdehal in #16654

YOLOS

The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. Inspired by DETR, YOLOS uses the plain Vision Transformer (ViT) for object detection. It turns out that a base-sized encoder-only Transformer can achieve 42 AP on COCO, similar to DETR and to much more complex frameworks such as Faster R-CNN.

Add YOLOS by @NielsRogge in #16848

RegNet

The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces to perform Neural Architecture Search (NAS). They start from a high-dimensional search space and iteratively reduce it by empirically applying constraints based on the best-performing models sampled from the current search space.

RegNet by @FrancescoSaverioZuppichini in #16188

TAPEX

The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions about tabular data, as well as to perform table fact checking.

Add TAPEX by @NielsRogge in #16473

Data2Vec: vision

The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities: text, audio and images. Importantly, the predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model is added in v4.19.0.

[Data2Vec] Add data2vec vision by @patrickvonplaten in #16760

Add Data2Vec for Vision in TF by @sayakpaul in #17008
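To make "contextualized latent representations" concrete: in the data2vec paper, the targets come from a teacher network (an exponential moving average of the student) that encodes the full, unmasked input, and each target is the average of the normalized outputs of the teacher's top K blocks. A minimal NumPy sketch of that target construction, assuming parameter-less layer normalization for every block (the paper uses instance normalization for speech) and made-up shapes; it does not reproduce the Transformers implementation:

```python
import numpy as np


def layer_norm_no_params(x, eps=1e-5):
    """Normalize each time step over the feature dimension, without learned scale/shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def data2vec_targets(teacher_block_outputs, top_k):
    """Average the normalized outputs of the last `top_k` teacher blocks.

    teacher_block_outputs: array of shape (num_blocks, seq_len, hidden_dim).
    """
    top_blocks = teacher_block_outputs[-top_k:]
    return np.mean([layer_norm_no_params(h) for h in top_blocks], axis=0)


rng = np.random.default_rng(0)
# hypothetical teacher activations: 12 blocks, 10 time steps, hidden size 16
teacher_outputs = rng.standard_normal((12, 10, 16))
targets = data2vec_targets(teacher_outputs, top_k=8)
print(targets.shape)  # (10, 16)
```

Because every block's output depends on the whole input sequence, each target vector carries context, unlike a fixed pixel, token, or audio-unit target.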

FSDP integration in Trainer

PyTorch recently upstreamed Fairscale's FSDP into PyTorch Distributed, with additional optimizations. This PR integrates it into the Trainer API.

FSDP enables distributed training at scale. It is a wrapper that shards module parameters across data-parallel workers, inspired by Xu et al. as well as by ZeRO Stage 3 from DeepSpeed. PyTorch FSDP will focus more on production readiness and long-term support, including better integration with ecosystems and improvements in performance, usability, reliability, debuggability and composability.

PyTorch FSDP integration in Trainer by @pacman100 in #17136
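The core idea of sharding parameters across data-parallel workers can be shown without any distributed machinery: the parameters are flattened into one buffer, padded so it divides evenly, and each of N workers stores only its 1/N slice; a full copy is reassembled (in FSDP, via an all-gather) only when a layer needs it. A minimal single-process NumPy sketch of that bookkeeping; the function names are hypothetical, and this is neither PyTorch's FSDP nor the Trainer integration:

```python
import numpy as np


def shard_flat_params(params, world_size):
    """Flatten a list of parameter arrays and split them evenly across ranks."""
    flat = np.concatenate([p.ravel() for p in params])
    # pad so the flat buffer divides evenly by the number of workers
    pad = (-flat.size) % world_size
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    return np.split(flat, world_size), pad


def unshard(shards, pad, shapes):
    """All-gather stand-in: rebuild the original parameters from every rank's shard."""
    flat = np.concatenate(shards)
    if pad:
        flat = flat[:-pad]  # drop the padding added during sharding
    out, offset = [], 0
    for shape in shapes:
        n = int(np.prod(shape))
        out.append(flat[offset:offset + n].reshape(shape))
        offset += n
    return out


params = [np.arange(6.0).reshape(2, 3), np.arange(5.0)]  # 11 values in total
shards, pad = shard_flat_params(params, world_size=4)
print([s.size for s in shards], pad)  # [3, 3, 3, 3] 1
restored = unshard(shards, pad, [p.shape for p in params])
```

Each worker's persistent memory for parameters drops to roughly 1/N of the model, at the cost of communication to gather full layers during the forward and backward passes.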

Training scripts

New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

Add image classification script, no trainer by @NielsRogge in #16727

Add semantic script no trainer, v2 by @NielsRogge in #16788

Add semantic script, trainer by @NielsRogge in #16834

Documentation in Spanish

To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers, starting with Spanish (572M speakers worldwide).

Added es version of language_modeling.mdx doc by @jQuinRivero in #17021

Spanish translation of the file philosophy.mdx by @jkmg in #16922

Documentation: Spanish translation of fast_tokenizers.mdx by @jloayza10 in #16882

Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples by @omarespejel in #16685

Spanish translation of the file multilingual.mdx by @SimplyJuanjo in #16329

Added Spanish translation of autoclass_tutorial by @Duedme in #17069

Fix style error in Spanish docs by @osanseviero in #17197

Improvements and bugfixes

[modeling_utils] rearrange text by @stas00 in #16632

Added Annotations for PyTorch models by @anmolsjoshi in #16619

Allow the same config in the auto mapping by @sgugger in #16631

Update no_trainer scripts with new Accelerate functionalities by @muellerzr in #16617

Fix doc example by @NielsRogge in #16448

Add inputs vector to calculate metric method by @lmvasque in #16461

[megatron-bert-uncased-345m] fix conversion by @stas00 in #16639

Remove parent/child tests in auto model tests by @sgugger in #16653

Updated _load_pretrained_model_low_mem to check if keys are in the state_dict by @FrancescoSaverioZuppichini in #16643

Update Support image on README.md by @BritneyMuller in #16615

bert: properly mention deprecation of TF2 conversion script by @stefan-it in #16171

add vit tf doctest with @add_code_sample_docstrings by @johko in #16636

Fix error in doc of DataCollatorWithPadding by @secsilm in #16662

Fix QA sample by @ydshieh in #16648

TF generate refactor - Beam Search by @gante in #16374

Add tests for no_trainer and fix existing examples by @muellerzr in #16656

only load state dict when the checkpoint is not None by @laurahanu in #16673

[Trainer] tf32 arg doc by @stas00 in #16674

Update audio examples with MInDS-14 by @stevhliu in #16633

add a warning in SpmConverter for sentencepiece's model using the byte fallback feature by @SaulLu in #16629

Fix some doc examples in task summary by @ydshieh in #16666

Jia multi gpu eval by @liyongsea in #16428

Generate: min length can't be larger than max length by @gante in #16668

fixed crash when deleting older checkpoint and a file f"{checkpoint_prefix}-*" exist by @sadransh in #16686

[Doctests] Correct task summary by @patrickvonplaten in #16644

Add Doc Test for BERT by @vumichien in #16523

Fix t5 shard on TPU Pods by @agemagician in #16527

update decoder_vocab_size when resizing embeds by @patil-suraj in #16700

Fix TF_MASKED_LM_SAMPLE by @ydshieh in #16698

Rename the method test_torchscript by @ydshieh in #16693

Reduce memory leak in _create_and_check_torchscript by @ydshieh in #16691

Enable more test_torchscript by @ydshieh in #16679

Don't push checkpoints to hub in no_trainer scripts by @muellerzr in #16703

Private repo TrainingArgument by @nbroad1881 in #16707

Handle image_embeds in ViltModel by @ydshieh in #16696

Improve PT/TF equivalence test by @ydshieh in #16557

Fix example logs repeating themselves by @muellerzr in #16669

[Bart] correct doc test by @patrickvonplaten in #16722

Add Doc Test GPT-2 by @ArEnSc in #16439

Only call get_output_embeddings when tie_word_embeddings is set by @smelm in #16667

Update run_translation_no_trainer.py by @raki-1203 in #16652

Qdqbert example add benchmark script with ORT-TRT by @shangz-ai in #16592

Replace assertion with exception by @anmolsjoshi in #16720

Change the chunk_iter function to handle by @Narsil in #16730

Remove duplicate header by @sgugger in #16732

Moved functions to pytorch_utils.py by @anmolsjoshi in #16625

TF: remove set_tensor_by_indices_to_value by @gante in #16729

Add Doc Tests for Reformer PyTorch by @hiromu166 in #16565

[FlaxSpeechEncoderDecoder] Fix input shape bug in weights init by @sanchit-gandhi in #16728

[FlaxWav2Vec2Model] Fix bug in attention mask by @sanchit-gandhi in #16725

add Bigbird ONNX config by @vumichien in #16427

TF generate: handle case without cache in beam search by @gante in #16704

Fix decoding score comparison when using logits processors or warpers by @bryant1410 in #10638

[Doctests] Fix all T5 doc tests by @patrickvonplaten in #16646

Fix #16660 (tokenizers setters of ids of special tokens) by @davidleonfdez in #16661

[from_pretrained] refactor find_mismatched_keys by @stas00 in #16706

Add Doc Test for GPT-J by @ArEnSc in #16507

Fix and improve CTRL doctests by @jeremyadamsfisher in #16573

[modeling_utils] better explanation of ignore keys by @stas00 in #16741

CI: setup-dependent pip cache by @gante in #16751

Reduce Funnel PT/TF diff by @ydshieh in #16744

Add defensive check for config num_labels and id2label by @sgugger in #16709

Add self training code for text classification by @tuvuumass in #16738

[self-scheduled ci] explain where dependencies are by @stas00 in #16757

Fixup no_trainer examples scripts and add more tests by @muellerzr in #16765

[Doctest] added doctest changes for electra by @bhadreshpsavani in #16675

Enabling Tapex in table question answering pipeline. by @Narsil in #16663

[Flax .from_pretrained] Raise a warning if model weights are not in float32 by @sanchit-gandhi in #16762

Fix batch size in evaluation loop by @sgugger in #16763

Make nightly install dev accelerate by @muellerzr in #16783

[deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop by @stas00 in #16717

Kill async pushes when calling push_to_hub with blocking=True by @sgugger in #16755

Improve image classification example by @NielsRogge in #16585

[SpeechEncoderDecoderModel] Fix bug in reshaping labels by @sanchit-gandhi in #16748

Fix issue avoid-missing-comma found at https://codereview.doctor by @code-review-doctor in #16768

[trainer / deepspeed] fix hyperparameter_search by @stas00 in #16740

[modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests by @stas00 in #16657

Fix PT TF ViTMAE by @ydshieh in #16766

Update README.md by @NielsRogge in #16797

Pin Jax to last working release by @sgugger in #16808

CI: non-remote GH Actions now use a python venv by @gante in #16789

TF generate refactor - XLA sample by @gante in #16713

Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed by @allanj in #16786

Create empty venv on cache miss by @gante in #16816

[ViT, BEiT, DeiT, DPT] Improve code by @NielsRogge in #16799

[Quicktour Audio] Improve && remove ffmpeg dependency by @patrickvonplaten in #16723

fix megatron bert convert state dict naming by @Codle in #15820

use base_version to check torch version in torch_less_than_1_11 by @nbroad1881 in #16806

Allow passing encoder_outputs as tuple to EncoderDecoder Models by @jsnfly in #16814

Refactor issues with yaml by @LysandreJik in #16772

fix _setup_devices in case where there is no torch.distributed package in build by @dlwh in #16821

Clean up semantic segmentation tests by @NielsRogge in #16801

Fix LayoutLMv2 tokenization docstrings by @qqaatw in #16187

Wav2Vec2 phoneme CTC tokenizer optimisation by @ArthurZucker in #16817

[Flax] improve large model init and loading by @patil-suraj in #16148

Some tests misusing assertTrue for comparisons fix by @code-review-doctor in #16771

Type hints added for TFMobileBert by @Dahlbomii in #16505

fix run_clm.py seeking text column name twice by @dandelin in #16624

Add onnx export of models with a multiple choice classification head by @echarlaix in #16758

[ASR Pipeline] Correct init docs by @patrickvonplaten in #16833

Add doc about attention_mask on gpt2 by @wiio12 in #16829

TF: Add sigmoid activation function by @gante in #16819

Correct Logging of Eval metric to Tensorboard by @Jeevesh8 in #16825

replace Speech2TextTokenizer by Speech2TextFeatureExtractor in some docstrings by @SaulLu in #16835

Type hints added to Speech to Text by @Dahlbomii in #16506

Improve test_pt_tf_model_equivalence on PT side by @ydshieh in #16731

Add support for bitsandbytes by @manuelciosici in #15622

[Typo] Fix typo in modeling utils by @patrickvonplaten in #16840

add DebertaV2 fast tokenizer by @mingboiz in #15529

Fixing return type tensor with num_return_sequences>1. by @Narsil in #16828

[modeling_utils] use less cpu memory with sharded checkpoint loading by @stas00 in #16844

[docs] fix url by @stas00 in #16860

Fix custom init sorting script by @sgugger in #16864

Fix multiproc metrics in no_trainer examples by @muellerzr in #16865

Long QuestionAnsweringPipeline fix. by @Narsil in #16778

t5: add conversion script for T5X to FLAX by @stefan-it in #16853

tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars by @ghlai9665 in #15901

Adding support for array key in raw dictionaries in ASR pipeline. by @Narsil in #16827

Return input_ids in ImageGPT feature extractor by @sgugger in #16872

Use ACT2FN to fetch ReLU activation by @eldarkurtic in #16874

Fix GPT-J onnx conversion by @ChainYo in #16780

Fix doctest list by @ydshieh in #16878

New features for CodeParrot training script by @loubnabnl in #16851

Add missing entries in mappings by @ydshieh in #16857

TF: rework XLA generate tests by @gante in #16866

Minor fixes/improvements in convert_file_size_to_int by @mariosasko in #16891

Add doc tests for Albert and Bigbird by @vumichien in #16774

Add OnnxConfig for ConvBERT by @ChainYo in #16859

TF: XLA repetition penalty by @gante in #16879

Changes in create_optimizer to support tensor parallelism with SMP by @cavdard in #16880

[DocTests] Fix some doc tests by @patrickvonplaten in #16889

add bigbird typo fixes by @ChainYo in #16897

Fix doc test quicktour dataset by @patrickvonplaten in #16929

Add missing ckpt in config docs by @ydshieh in #16900

Fix PyTorch RAG tests GPU OOM by @ydshieh in #16881

Fix RemBertTokenizerFast by @ydshieh in #16933

TF: XLA logits processors - minimum length, forced eos, and forced bos by @gante in #16912

TF: XLA Logits Warpers by @gante in #16899

added deit onnx config by @rushic24 in #16887

TF: XLA stable softmax by @gante in #16892

Replace deprecated logger.warn with warning by @sanchit-gandhi in #16876

Fix issue probably-meant-fstring found at https://codereview.doctor by @code-review-doctor in #16913

Limit the use of PreTrainedModel.device by @sgugger in #16935

apply torch int div to layoutlmv2 by @ManuelFay in #15457

Fix Iterations for decoder by @agemagician in #16934

Add onnx config for RoFormer by @skrsna in #16861

documentation: some minor clean up by @mingboiz in #16850

Fix RuntimeError message format by @ftnext in #16906

use original loaded keys to find mismatched keys by @tricktreat in #16920

[Research] Speed up evaluation for XTREME-S by @anton-l in #16785

Fix HubertRobustTest PT/TF equivalence test on GPU by @ydshieh in #16943

Misc. fixes for Pytorch QA examples: by @searchivarius in #16958

[HF Argparser] Fix parsing of optional boolean arguments by @NielsRogge in #16946

Fix distributed_concat with scalar tensor by @Yard1 in #16963

Update custom_models.mdx by @mishig25 in #16964

Fix add-new-model-like when model doesn't support all frameworks by @sgugger in #16966

Fix multiple deletions of the same files in save_pretrained by @sgugger in #16947

Fixup no_trainer save logic by @muellerzr in #16968

Fix doc notebooks links by @sgugger in #16969

Fix check_all_models_are_tested by @ydshieh in #16970

Add -e flag to some GH workflow yml files by @ydshieh in #16959

Update tokenization_bertweet.py by @datquocnguyen in #16941

Update check_models_are_tested to deal with Windows path by @ydshieh in #16973

Add parameter --config_overrides for run_mlm_wwm.py by @conan1024hao in #16961

Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx by @amyeroberts in #16993

set eos_token_id to None to generate until max length by @ydshieh in #16989

Fix savedir for by epoch by @muellerzr in #16996

Update README to latest release by @sgugger in #16997

use scale=1.0 in floats_tensor called in speech model testers by @ydshieh in #17007

Update all require decorators to use skipUnless when possible by @muellerzr in #16999

TF: XLA bad words logits processor and list of processors by @gante in #16974

Make create_extended_attention_mask_for_decoder static method by @pbelevich in #16893

Update README_zh-hans.md by @tarzanwill in #16977

Updating variable names. by @Narsil in #16445

Revert "Updating variable names. by @Narsil in #16445)"

Replace dict/BatchEncoding instance checks by Mapping by @sgugger in #17014

Result of new doc style with fixes by @sgugger in #17015

Add a check on config classes docstring checkpoints by @ydshieh in #17012

Add translating guide by @omarespejel in #17004

update docs of length_penalty by @manandey in #17022

[FlaxGenerate] Fix bug in decoder_start_token_id by @sanchit-gandhi in #17035

Fx with meta by @michaelbenayoun in #16836

[Flax(Speech)EncoderDecoder] Fix bug in decoder_module by @sanchit-gandhi in #17036

Fix typo in RetriBERT docstring by @mpoemsl in #17018

add torch.no_grad when in eval mode by @JunnYu in #17020

Disable Flax GPU tests on push by @sgugger in #17042

Clean up vision tests by @NielsRogge in #17024

[Trainer] Move logic for checkpoint loading into separate methods for easy overriding by @calpt in #17043

Update no_trainer examples to use new logger by @muellerzr in #17044

Fix no_trainer examples to properly calculate the number of samples by @muellerzr in #17046

Allow all imports from transformers by @LysandreJik in #17050

Make the sacremoses dependency optional by @LysandreJik in #17049

Clean up setup.py by @sgugger in #17045

[T5 Tokenizer] Model has no fixed position ids - there is no hardcode… by @patrickvonplaten in #16990

[FlaxBert] Add ForCausalLM by @sanchit-gandhi in #16995

Move test model folders by @ydshieh in #17034

Make Trainer compatible with sharded checkpoints by @sgugger in #17053

Remove Python and use v2 action by @sgugger in #17059

Fix RNG reload in resume training from epoch checkpoint by @sgugger in #17055

Remove device parameter from create_extended_attention_mask_for_decoder by @pbelevich in #16894

Fix hashing for deduplication by @thomasw21 in #17048

Skip RoFormer ONNX test if rjieba not installed by @lewtun in #16981

Remove masked image modeling from BEIT ONNX export by @lewtun in #16980

Make sure telemetry arguments are not returned as unused kwargs by @sgugger in #17063

Type hint complete Albert model file. by @karthikrangasai in #16682

Deprecate model templates by @sgugger in #17062

Update to build via git for accelerate by @muellerzr in #17084

Allow saved_model export of TFCLIPModel in save_pretrained by @seanmor5 in #16886

Fix DeBERTa token_type_ids by @deutschmn in #17082

📝 open fresh PR for pipeline doctests by @stevhliu in #17073

minor change on TF Data2Vec test by @ydshieh in #17085

type hints for pytorch models by @robotjellyzone in #17064

Add type hints for BERTGeneration by @robsmith155 in #17047

Fix MLflowCallback and add support for MLFLOW_EXPERIMENT_NAME by @orieg in #17091

Remove torchhub test by @sgugger in #17097

fix missing "models" in pipeline test module by @ydshieh in #17090

Fix link to example scripts by @stevhliu in #17103

Fix self-push CI report path in cat by @ydshieh in #17111

Added BigBirdPegasus onnx config by @nandwalritik in #17104

split single_gpu and multi_gpu by @ydshieh in #17083

LayoutLMv2Processor: ensure 1-to-1 mapping between images and samples in case of overflowing tokens by @ghlai9665 in #17092

Add type hints for BigBirdPegasus and Data2VecText PyTorch models by @robsmith155 in #17123

add mobilebert onnx configs by @manandey in #17029

[WIP] Fix Pyright static type checking by replacing if-else imports with try-except by @d-miketa in #16578

Add the auto_find_batch_size capability from Accelerate into Trainer by @muellerzr in #17068

Fix MLflowCallback end_run() and add support for tags and nested runs by @orieg in #17130

Fix all docs for accelerate install directions by @muellerzr in #17145

LogSumExp trick question_answering pipeline. by @Narsil in #17143

train args defaulting None marked as Optional by @d-miketa in #17156

[trainer] sharded _load_best_model by @stas00 in #17150

[Deepspeed] add many more models to the model zoo test by @stas00 in #12695

Fixing the output of code examples in the preprocessing chapter by @HallerPatrick in #17162

missing file by @stas00 in #17164

Add MLFLOW_FLATTEN_PARAMS support in MLflowCallback by @orieg in #17148

Fix template init by @sgugger in #17163

MobileBERT tokenizer tests by @leondz in #16896

[M2M100 doc] remove duplicate example by @patil-suraj in #17175

Extend Transformers Trainer Class to Enable PyTorch SGD/Adagrad Optimizers for Training by @jianan-gu in #17154

propagate "attention_mask" dtype for "use_past" in OnnxConfig.generate_dummy_inputs by @arampacha in #17105

Convert image to rgb for clip model by @hengkuanwee in #17101

Add missing RetriBERT tokenizer tests by @mpoemsl in #17017

[WIP] Enable reproducibility for distributed trainings by @hasansalimkanmaz in #16907

Remove unnecessary columns for all dataset types in Trainer by @Yard1 in #17166

Fix LED documentation by @manuelciosici in #17181

Ensure tensors are at least 1d for pad and concat by @Yard1 in #17179

add shift_tokens_right in FlaxMT5 by @patil-suraj in #17188

Remove columns before passing to data collator by @Yard1 in #17187

Remove duplicated os.path.join by @shijie-wu in #17192

Fix contents in index.mdx to match docs' sidebar by @omarespejel in #17198

ViT and Swin symbolic tracing with torch.fx by @michaelbenayoun in #17182

migrate azure blob for beit checkpoints by @donglixp in #16902

Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning) by @sayakpaul in #17194

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@anmolsjoshi

Added Annotations for PyTorch models (#16619)

Replace assertion with exception (#16720)

Moved functions to pytorch_utils.py (#16625)

@vumichien

Add Doc Test for BERT (#16523)

add Bigbird ONNX config (#16427)

Add doc tests for Albert and Bigbird (#16774)

@tuvuumass

Add self training code for text classification (#16738)

@sayakpaul

Add Data2Vec for Vision in TF (#17008)

@robotjellyzone

type hints for pytorch models (#17064)

@d-miketa

[WIP] Fix Pyright static type checking by replacing if-else imports with try-except (#16578)

train args defaulting None marked as Optional (#17156)

v4.18.0 (Apr 7, 2022)
New model additions

You'll notice that we are starting to add several older vision models. This is because those models are used as backbones in recent architectures. While we could rely on existing libraries for such pretrained models, we will ultimately need support for those backbones in PyTorch, TensorFlow and Jax, and there is currently no library that supports all three frameworks. This is why we are starting to add those models to Transformers directly (here, ResNet and VAN).

GLPN

The GLPN model was proposed in Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. GLPN combines SegFormer’s hierarchical mix-Transformer with a lightweight decoder for monocular depth estimation. The proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity.

Add GLPN by @NielsRogge in https://github.com/huggingface/transformers/pull/16199

ResNet

The ResNet model was proposed in Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. Our implementation follows the small changes made by Nvidia: we apply stride=2 for downsampling in the bottleneck’s 3x3 conv rather than in the first 1x1 conv. This is generally known as “ResNet v1.5”.

ResNet introduced residual connections, which make it possible to train networks with a previously unseen number of layers (up to 1000). ResNet won the 2015 ILSVRC & COCO competitions, an important milestone in deep computer vision.
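Why the stride placement matters can be checked with plain arithmetic. The sketch below (standard library only; the 56-wide feature map is a hypothetical example, not a value taken from the implementation) uses the usual convolution output-size formula and counts which input positions along one spatial axis are actually read. Both variants produce the same output size, but a stride-2 1x1 conv skips every other input position, while a stride-2 3x3 conv with padding 1 still reads all of them.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def covered_inputs(size, kernel, stride, padding):
    """Input indices (one axis) that contribute to at least one output."""
    covered = set()
    for out in range(conv_out(size, kernel, stride, padding)):
        start = out * stride - padding
        for offset in range(kernel):
            idx = start + offset
            if 0 <= idx < size:  # indices outside the range are padding
                covered.add(idx)
    return covered


size = 56  # hypothetical width of the feature map entering a downsampling block

# ResNet v1: the first 1x1 conv of the bottleneck downsamples with stride 2.
v1_first = covered_inputs(size, kernel=1, stride=2, padding=0)
# ResNet v1.5: the 1x1 conv keeps stride 1 and the 3x3 conv downsamples instead.
v15_3x3 = covered_inputs(size, kernel=3, stride=2, padding=1)

print(conv_out(size, 1, 2, 0), conv_out(size, 3, 2, 1))  # 28 28
print(len(v1_first), len(v15_3x3))  # 28 56
```

Per axis, the v1 layout reads only half of the inputs, so in 2D the strided 1x1 conv discards three quarters of the activations before any spatial mixing happens; moving the stride to the 3x3 conv avoids that.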

Resnet by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15770

VAN

The VAN model was proposed in Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.

This paper introduces a new attention layer based on convolution operations that can capture both local and distant relationships. This is done by combining normal and large-kernel convolution layers; the latter uses a dilated convolution to capture distant correlations.
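A small worked example of why combining a normal convolution with a dilated one is cheap yet reaches far. The kernel sizes are an assumption taken from our reading of the VAN paper (a 21x21 kernel decomposed into a 5x5 depthwise conv followed by a 7x7 depthwise conv with dilation 3, then a 1x1 conv); the helper names are hypothetical. The sketch only computes receptive field and per-channel depthwise weight counts, not the layer itself.

```python
def dilated_extent(kernel, dilation):
    """Spatial extent spanned by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel - 1) + 1


def stacked_receptive_field(layers):
    """Receptive field of stride-1 convs applied in sequence.

    Each layer is a (kernel, dilation) pair.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += dilated_extent(kernel, dilation) - 1
    return rf


def depthwise_params(layers):
    """Weights per channel for a stack of depthwise convs (biases ignored)."""
    return sum(kernel * kernel for kernel, _ in layers)


# Assumed decomposition: 5x5 depthwise conv, then 7x7 depthwise conv, dilation 3.
lka = [(5, 1), (7, 3)]

print(stacked_receptive_field(lka))  # 23, which covers a 21x21 window
print(depthwise_params(lka))  # 74 weights per channel
print(21 * 21)  # 441 weights per channel for a dense 21x21 depthwise kernel
```

The 1x1 conv that follows (not counted above) mixes channels; its output is then used as an attention map that multiplies the input features element-wise.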

Visual Attention Network (VAN) by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16027

VisionTextDualEncoder

The VisionTextDualEncoderModel can be used to initialize a vision-text dual encoder model with any pretrained vision autoencoding model as the vision encoder (e.g. ViT, BEiT, DeiT) and any pretrained text autoencoding model as the text encoder (e.g. RoBERTa, BERT). Two projection layers are added on top of the vision and text encoders to project their output embeddings into a shared latent space. The projection layers are randomly initialized, so the model should be fine-tuned on a downstream task. This model can be used to align the vision-text embeddings using CLIP-like contrastive image-text training, and can then be used for zero-shot vision tasks such as image classification or retrieval.

In LiT: Zero-Shot Transfer with Locked-image Text Tuning, it is shown how leveraging pretrained (locked/frozen) image and text models for contrastive learning yields significant improvements on new zero-shot vision tasks such as image classification or retrieval.
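To make "CLIP-like contrastive training" concrete, here is a minimal, framework-free sketch of the symmetric contrastive loss over a batch of paired image and text embeddings (the outputs of the two projection layers). It is an illustration, not the library's implementation: the function names are hypothetical and the fixed logit_scale of 10.0 is an assumption (CLIP learns this temperature). In the LiT setting the same loss applies, but gradients only update the unlocked text side.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(image_embeds, text_embeds, logit_scale=10.0):
    """Symmetric contrastive loss: pair i is the positive, all others negatives."""
    imgs = [l2_normalize(v) for v in image_embeds]
    txts = [l2_normalize(v) for v in text_embeds]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [logit_scale * sum(a * b for a, b in zip(imgs[i], txts[j])) for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each row should pick its own caption.
    image_loss = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick its own image.
    text_loss = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (image_loss + text_loss) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
print(clip_style_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # near 0: pairs aligned
print(clip_style_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # about 10: pairs swapped
```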

add VisionTextDualEncoder and CLIP fine-tuning script by @patil-suraj in https://github.com/huggingface/transformers/pull/15701

DiT

DiT was proposed in DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. DiT applies the self-supervised objective of BEiT (BERT pre-training of Image Transformers) to 42 million document images, allowing for state-of-the-art results on tasks including:

document image classification: the RVL-CDIP dataset (a collection of 400,000 images belonging to one of 16 classes).

document layout analysis: the PubLayNet dataset (a collection of more than 360,000 document images constructed by automatically parsing PubMed XML files).

table detection: the ICDAR 2019 cTDaR dataset (a collection of 600 training images and 240 testing images).

Add Document Image Transformer (DiT) by @NielsRogge in https://github.com/huggingface/transformers/pull/15984

DPT

The DPT model was proposed in Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. DPT is a model that leverages the Vision Transformer (ViT) as backbone for dense prediction tasks like semantic segmentation and depth estimation.

Add DPT by @NielsRogge in https://github.com/huggingface/transformers/pull/15991

Checkpoint sharding

Large models are becoming more and more the norm and having a checkpoint in a single file is challenging for several reasons:

it's tougher to upload/download files bigger than 20-30 GB efficiently

the whole checkpoint might not fit into RAM even if you have enough GPU memory

That's why the save_pretrained method will now automatically shard a checkpoint into several files when it goes above a 10GB threshold for PyTorch models. from_pretrained will handle such sharded checkpoints as if there were only one file.

Checkpoint sharding by @sgugger in https://github.com/huggingface/transformers/pull/16343

TensorFlow implementations

GPT-J and ViTMAE are now available in TensorFlow.

Add TF implementation of GPT-J by @stancld in https://github.com/huggingface/transformers/pull/15623

Add TF ViT MAE by @sayakpaul in https://github.com/huggingface/transformers/pull/16255

Documentation guides

The documentation IA (information architecture) migration is wrapped up, with a new conceptual guide section now available.

Create concept guide section by @stevhliu in https://github.com/huggingface/transformers/pull/16369

Improvements and bugfixes

Fix doc links in release utils by @sgugger in https://github.com/huggingface/transformers/pull/15903

Fix a TF Vision Encoder Decoder test by @ydshieh in https://github.com/huggingface/transformers/pull/15896

[Fix link in pipeline doc] by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15906

Fix and improve REALM fine-tuning by @qqaatw in https://github.com/huggingface/transformers/pull/15297

Freeze FlaxWav2Vec2 Feature Encoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15873

The tests were not updated after the addition of torch.diag by @Narsil in https://github.com/huggingface/transformers/pull/15890

[Doctests] Fix ignore bug and add more doc tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15911

Enabling MaskFormer in pipelines by @Narsil in https://github.com/huggingface/transformers/pull/15917

Minor fixes for MaskFormer by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15916

Add vision models to doc tests by @NielsRogge in https://github.com/huggingface/transformers/pull/15905

Fix #15898 by @davidleonfdez in https://github.com/huggingface/transformers/pull/15928

Update doc test readme by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15926

Re-enabling all fast pipeline tests. by @Narsil in https://github.com/huggingface/transformers/pull/15924

Support CLIPTokenizerFast for CLIPProcessor by @cosmoquester in https://github.com/huggingface/transformers/pull/15913

Updating the slow tests: by @Narsil in https://github.com/huggingface/transformers/pull/15893

Adding MODEL_FOR_INSTANCE_SEGMENTATION_MAPPING by @Narsil in https://github.com/huggingface/transformers/pull/15934

Add missing support for Flax XLM-RoBERTa by @versae in https://github.com/huggingface/transformers/pull/15900

[FlaxT5 Example] Fix flax t5 example pretraining by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15835

Do not change the output from tuple to list - to match PT's version by @ydshieh in https://github.com/huggingface/transformers/pull/15918

Tests for MaskFormerFeatureExtractor's post_process*** methods by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15929

Constrained Beam Search [With Disjunctive Decoding] by @cwkeam in https://github.com/huggingface/transformers/pull/15761

[LayoutLMv2] Update requires_backends of feature extractor by @NielsRogge in https://github.com/huggingface/transformers/pull/15941

Made MaskFormerModelTest faster by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15942

[Bug Fix] Beam search example in docs fails & a fix (integrating max_length in BeamScorer.finalize()) by @cwkeam in https://github.com/huggingface/transformers/pull/15555

remove re-definition of FlaxWav2Vec2ForCTCModule by @patil-suraj in https://github.com/huggingface/transformers/pull/15965

Support modern list type hints in HfArgumentParser by @konstantinjdobler in https://github.com/huggingface/transformers/pull/15951

Backprop Test for Freeze FlaxWav2Vec2 Feature Encoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15938

Fix Embedding Module Bug in Flax Models by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15920

Make is_thing_map in Feature Extractor post_process_panoptic_segmentation defaults to all instances by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15954

Update training scripts docs by @stevhliu in https://github.com/huggingface/transformers/pull/15931

Set scale_embedding to False in some TF tests by @ydshieh in https://github.com/huggingface/transformers/pull/15952

Fix LayoutLMv2 test by @NielsRogge in https://github.com/huggingface/transformers/pull/15939

[Tests] Fix ViTMAE integration test by @NielsRogge in https://github.com/huggingface/transformers/pull/15949

Returning outputs only when asked for for MaskFormer. by @Narsil in https://github.com/huggingface/transformers/pull/15936

Speedup T5 Flax training by using Numpy instead of JAX for batch shuffling by @yhavinga in https://github.com/huggingface/transformers/pull/15963

Do a pull in case docs were updated during build by @sgugger in https://github.com/huggingface/transformers/pull/15922

Fix TFEncDecModelTest - Pytorch device by @ydshieh in https://github.com/huggingface/transformers/pull/15979

[Env Command] Add hf hub to env version command by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15981

TF: Update multiple choice example by @gante in https://github.com/huggingface/transformers/pull/15868

TF generate refactor - past without encoder outputs by @gante in https://github.com/huggingface/transformers/pull/15944

Seed _get_train_sampler's generator with arg seed to improve reproducibility by @dlwh in https://github.com/huggingface/transformers/pull/15961

Add ForInstanceSegmentation models to image-segmentation pipelines by @Narsil in https://github.com/huggingface/transformers/pull/15937

[Doctests] Move doctests to new GPU & Fix bugs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15969

Removed an outdated check about hdf5_version by @ydshieh in https://github.com/huggingface/transformers/pull/16011

Swag example: Update doc format by @gante in https://github.com/huggingface/transformers/pull/16014

Fix github actions comment by @LysandreJik in https://github.com/huggingface/transformers/pull/16009

Simplify release utils by @sgugger in https://github.com/huggingface/transformers/pull/15921

Make pos optional in PerceiverAudioPreprocessor to avoid crashing PerceiverModel operation by @basilevh in https://github.com/huggingface/transformers/pull/15972

Fix MaskFormer failing test on master by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16012

Fix broken code blocks in README.md by @upura in https://github.com/huggingface/transformers/pull/15967

Use tiny models for get_pretrained_model in TFEncoderDecoderModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/15989

Add ONNX export for ViT by @lewtun in https://github.com/huggingface/transformers/pull/15658

Add FlaxBartForCausalLM by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15995

add doctests for bart like seq2seq models by @patil-suraj in https://github.com/huggingface/transformers/pull/15987

Fix warning message in ElectraForCausalLM by @pbelevich in https://github.com/huggingface/transformers/pull/16023

Freeze Feature Encoder in FlaxSpeechEncoderDecoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15997

Fix dependency error message in ServeCommand by @andstor in https://github.com/huggingface/transformers/pull/16033

[Docs] Improve PyTorch, Flax generate API by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15988

[Tests] Add attentions_option to ModelTesterMixin by @NielsRogge in https://github.com/huggingface/transformers/pull/15909

[README] fix url for Preprocessing tutorial by @patil-suraj in https://github.com/huggingface/transformers/pull/16042

Fix Bug in Flax-Speech-Encoder-Decoder Test by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16041

Fix TFDebertaV2ConvLayer in TFDebertaV2Model by @ydshieh in https://github.com/huggingface/transformers/pull/16031

Build the doc in a separate folder then move it by @sgugger in https://github.com/huggingface/transformers/pull/16020

Don't compute metrics in LM examples on TPU by @sgugger in https://github.com/huggingface/transformers/pull/16029

TF: Unpack model inputs through a decorator by @gante in https://github.com/huggingface/transformers/pull/15907

Fix Bug in Flax Seq2Seq Models by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16021

DeBERTa/DeBERTa-v2/SEW Support for torch 1.11 by @LysandreJik in https://github.com/huggingface/transformers/pull/16043

support new marian models by @patil-suraj in https://github.com/huggingface/transformers/pull/15831

Fix duplicate arguments passed to dummy inputs in ONNX export by @lewtun in https://github.com/huggingface/transformers/pull/16045

FIX: updating doc/example for fine-tune for downstream Token Classification by @davidsbatista in https://github.com/huggingface/transformers/pull/16063

Fix a TF test name (LayoutLMModelTest) by @ydshieh in https://github.com/huggingface/transformers/pull/16061

Move QDQBert in just PyTorch block by @sgugger in https://github.com/huggingface/transformers/pull/16062

Remove assertion over possible activation functions in DistilBERT by @mfuntowicz in https://github.com/huggingface/transformers/pull/16066

Fix torch-scatter version by @LysandreJik in https://github.com/huggingface/transformers/pull/16072

Add type annotations for BERT and copies by @Rocketknight1 in https://github.com/huggingface/transformers/pull/16074

Adding type hints for TFRoBERTa by @Rocketknight1 in https://github.com/huggingface/transformers/pull/16057

Make sure 'torch.dtype' has str-type value in config and all nested dicts for JSON serializability by @feifang24 in https://github.com/huggingface/transformers/pull/16065

Run daily doctests without time-out at least once by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16077

Add soft length regulation for sequence generation by @kevinpl07 in https://github.com/huggingface/transformers/pull/15245

Update troubleshoot guide by @stevhliu in https://github.com/huggingface/transformers/pull/16001

Add type annotations for ImageGPT by @johnnv1 in https://github.com/huggingface/transformers/pull/16088

Rebuild deepspeed by @LysandreJik in https://github.com/huggingface/transformers/pull/16081

Add missing type hints for all flavors of RoBERTa PyTorch models. by @ChainYo in https://github.com/huggingface/transformers/pull/16086

[Fix doc example] FSMT by @ydshieh in https://github.com/huggingface/transformers/pull/16085

Audio/vision task guides by @stevhliu in https://github.com/huggingface/transformers/pull/15808

[ZeRO] Fixes issue with embedding resize by @jeffra in https://github.com/huggingface/transformers/pull/16093

[Deepspeed] add support for bf16 mode by @stas00 in https://github.com/huggingface/transformers/pull/14569

Change unpacking of TF Bart inputs to use decorator by @osanseviero in https://github.com/huggingface/transformers/pull/16094

add unpack_inputs decorator to mbart tf by @Abdelrhman-Hosny in https://github.com/huggingface/transformers/pull/16097

Add type annotations for segformer pytorch by @p-mishra1 in https://github.com/huggingface/transformers/pull/16099

Add unpack_input decorator to ViT model by @johnnv1 in https://github.com/huggingface/transformers/pull/16102

Add type hints to XLM model (PyTorch) by @jbrry in https://github.com/huggingface/transformers/pull/16108

Add missing type hints for all flavors of LayoutLMv2 PyTorch models. by @ChainYo in https://github.com/huggingface/transformers/pull/16089

Add TFCamembertForCausalLM and ONNX integration test by @lewtun in https://github.com/huggingface/transformers/pull/16073

Fix and document Zero Shot Image Classification by @osanseviero in https://github.com/huggingface/transformers/pull/16079

Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16056

Update convert_marian_to_pytorch.py by @jorgtied in https://github.com/huggingface/transformers/pull/16124

Make TF pt-tf equivalence test more aggressive by @ydshieh in https://github.com/huggingface/transformers/pull/15839

Fix ProphetNetTokenizer by @ydshieh in https://github.com/huggingface/transformers/pull/16082

Change unpacking of TF mobilebert inputs to use decorator by @vumichien in https://github.com/huggingface/transformers/pull/16110

Steps strategy fix for PushtoHubCallback and changed docstring by @merveenoyan in https://github.com/huggingface/transformers/pull/16138

[ViTMAE] Add copied from statements and fix prefix by @NielsRogge in https://github.com/huggingface/transformers/pull/16119

Spanish translation of the file training.mdx by @yharyarias in https://github.com/huggingface/transformers/pull/16047

Added missing type hints - ELECTRA PyTorch by @kamalkraj in https://github.com/huggingface/transformers/pull/16103

Added missing type hints - Deberta V1 and V2 by @kamalkraj in https://github.com/huggingface/transformers/pull/16105

[Fix doc example] Fix checkpoint name in docstring example by @ydshieh in https://github.com/huggingface/transformers/pull/16083

Better input variable naming for OpenAI (TF) by @bhavika in https://github.com/huggingface/transformers/pull/16129

Improve model variable naming - CLIP [TF] by @bhavika in https://github.com/huggingface/transformers/pull/16128

Add type hints for TFDistilBert by @PepijnBoers in https://github.com/huggingface/transformers/pull/16107

Choose framework for ONNX export by @michaelbenayoun in https://github.com/huggingface/transformers/pull/16018

Add type hints for Luke in PyTorch by @bhavika in https://github.com/huggingface/transformers/pull/16111

Add type hints for PoolFormer in Pytorch by @soomiles in https://github.com/huggingface/transformers/pull/16121

Add type hints for SqueezeBert PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16126

Added missing type hints - ELECTRA TF by @kamalkraj in https://github.com/huggingface/transformers/pull/16104

Docker images runtime -> devel by @LysandreJik in https://github.com/huggingface/transformers/pull/16141

Add type annotations for CLIP (torch) (#16059) by @jacobdineen in https://github.com/huggingface/transformers/pull/16106

Add type hints for FNet PyTorch by @wpan03 in https://github.com/huggingface/transformers/pull/16123

Use HF_ENDPOINT for custom endpoints by @sgugger in https://github.com/huggingface/transformers/pull/16139

update albert with tf decorator by @infinite-Joy in https://github.com/huggingface/transformers/pull/16147

clearer model variable naming: ELECTRA by @kamalkraj in https://github.com/huggingface/transformers/pull/16143

Add type hints for GPTNeo PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16127

Improve Swin for VisionEncoderDecoder by @NielsRogge in https://github.com/huggingface/transformers/pull/16070

Make transformers.utils.fx._SUPPORTED_MODELS unique by @pbelevich in https://github.com/huggingface/transformers/pull/16015

Shift responsibilities a bit for issues by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16154

typo "conaining" -> "containing" by @marxav in https://github.com/huggingface/transformers/pull/16132

Configurable Relative Position Max. Distance by @agemagician in https://github.com/huggingface/transformers/pull/16155

Added spanish translation of quicktour.mdx by @Duedme in https://github.com/huggingface/transformers/pull/16158

Use templates by @sgugger in https://github.com/huggingface/transformers/pull/16142

[Fix doc example] Fix first example for the custom_datasets tutorial by @MarkusSagen in https://github.com/huggingface/transformers/pull/16087

[Fix doc example] Fix 2 PyTorch Vilt docstring examples by @ydshieh in https://github.com/huggingface/transformers/pull/16076

TF XLA greedy generation by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15786

clearer model variable naming: pegasus by @kamalkraj in https://github.com/huggingface/transformers/pull/16152

Change unpacking of TF layoutlm inputs to use decorator by @vumichien in https://github.com/huggingface/transformers/pull/16112

update transformer XL with tf decorator by @infinite-Joy in https://github.com/huggingface/transformers/pull/16166

added type hints to yoso by @mowafess in https://github.com/huggingface/transformers/pull/16163

Framework split by @sgugger in https://github.com/huggingface/transformers/pull/16030

[MT5Config] add relative_attention_max_distance in config by @patil-suraj in https://github.com/huggingface/transformers/pull/16170

clearer model variable naming: Tapas by @kamalkraj in https://github.com/huggingface/transformers/pull/16145

clearer model variable naming: Deberta by @kamalkraj in https://github.com/huggingface/transformers/pull/16146

Add flaubert types by @ChainYo in https://github.com/huggingface/transformers/pull/16118

clearer model variable naming: xlnet by @kamalkraj in https://github.com/huggingface/transformers/pull/16150

Add type hints for Perceiver Pytorch by @jcmc00 in https://github.com/huggingface/transformers/pull/16174

Add type hints for Reformer PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16175

Fix some Flax models' hidden_states by @ydshieh in https://github.com/huggingface/transformers/pull/16167

Add the XTREME-S fine-tuning example by @anton-l in https://github.com/huggingface/transformers/pull/15985

[Xtreme-S] fix some namings by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16183

Replace all deprecated jax.ops operations with jnp's at by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16078

clearer model variable naming: funnel by @utkusaglm in https://github.com/huggingface/transformers/pull/16178

clearer model variable naming: blenderbot by @utkusaglm in https://github.com/huggingface/transformers/pull/16192

Minor fixes to XTREME-S by @anton-l in https://github.com/huggingface/transformers/pull/16193

unpack_input decorator for tf_convnext by @johko in https://github.com/huggingface/transformers/pull/16181

clearer model variable naming: blenderbot_small by @utkusaglm in https://github.com/huggingface/transformers/pull/16194

Adding type hints for Distilbert by @johnryan465 in https://github.com/huggingface/transformers/pull/16090

ResNet: update modules names by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16196

Update a CI job step name by @ydshieh in https://github.com/huggingface/transformers/pull/16189

Fix loading CLIPVisionConfig and CLIPTextConfig by @patil-suraj in https://github.com/huggingface/transformers/pull/16198

TF: add beam search tests by @gante in https://github.com/huggingface/transformers/pull/16202

Swin support for any input size by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15986

Fix generation min length by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16206

Add/type annotations/model vision by @johnnv1 in https://github.com/huggingface/transformers/pull/16151

VAN: update modules names by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16201

Fixes Loss for TransfoXL when using Trainer API v2 by @LysandreJik in https://github.com/huggingface/transformers/pull/16140

[Tests] Fix DiT test by @NielsRogge in https://github.com/huggingface/transformers/pull/16218

Fix FlaxRoFormerClassificationHead activation by @ydshieh in https://github.com/huggingface/transformers/pull/16168

Fix typos in docstrings of data_collator.py by @daysm in https://github.com/huggingface/transformers/pull/16208

Fix reproducibility in Training for PyTorch 1.11 by @sgugger in https://github.com/huggingface/transformers/pull/16209

Fix readmes by @qqaatw in https://github.com/huggingface/transformers/pull/16217

MaskFormer: fix device on test by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16219

Adding Unpack Decorator For DPR model by @forsc in https://github.com/huggingface/transformers/pull/16212

Skip equivalence test for TransfoXL by @LysandreJik in https://github.com/huggingface/transformers/pull/16224

Fix Type Hint of Nan/Inf Logging Filter Arg by @Sophylax in https://github.com/huggingface/transformers/pull/16227

[Flax] remove jax.ops.index by @patil-suraj in https://github.com/huggingface/transformers/pull/16220

Support PEP 563 for HfArgumentParser by @function2-llx in https://github.com/huggingface/transformers/pull/15795

add unpack_inputs decorator for marian by @johko in https://github.com/huggingface/transformers/pull/16226

fix(flax): generate with logits processor/warper by @borisdayma in https://github.com/huggingface/transformers/pull/16231

[FlaxSpeechEncoderDecoderModel] Skip from_encoder_decoder_pretrained by @patil-suraj in https://github.com/huggingface/transformers/pull/16236

[Generate Docs] Correct docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16133

[Deepspeed] non-HF Trainer doc update by @stas00 in https://github.com/huggingface/transformers/pull/16238

integrations: mlflow: skip start_run() if a run is already active and sanity check on enabling integration by @ktzsh in https://github.com/huggingface/transformers/pull/16131

Update expected slices for pillow > 9 by @NielsRogge in https://github.com/huggingface/transformers/pull/16117

Attention mask is important in the case of batching... by @Narsil in https://github.com/huggingface/transformers/pull/16222

Change assertion to warning when passing past_key_value to T5 encoder by @ZhaofengWu in https://github.com/huggingface/transformers/pull/16153

Override _pad in LEDTokenizer to deal with global_attention_mask by @ydshieh in https://github.com/huggingface/transformers/pull/15940

Update XLM with TF decorator by @louisowen6 in https://github.com/huggingface/transformers/pull/16247

Add unpack_inputs decorator for ctrl by @johko in https://github.com/huggingface/transformers/pull/16242

update jax version and re-enable some tests by @patil-suraj in https://github.com/huggingface/transformers/pull/16254

[Constrained Beam Search] Adding Notebook Example & Minor Typo Fix by @cwkeam in https://github.com/huggingface/transformers/pull/16246

value check for typical sampling by @cimeister in https://github.com/huggingface/transformers/pull/16165

Make Flax pt-flax equivalence test more aggressive by @ydshieh in https://github.com/huggingface/transformers/pull/15841

Aggressive PT/TF equivalence test on PT side by @ydshieh in https://github.com/huggingface/transformers/pull/16250

Update flaubert with TF decorator by @Tegzes in https://github.com/huggingface/transformers/pull/16258

Fix links in guides by @stevhliu in https://github.com/huggingface/transformers/pull/16182

Small fixes to the documentation by @sgugger in https://github.com/huggingface/transformers/pull/16180

[WIP] add has_attentions as done in PyTorch side by @ydshieh in https://github.com/huggingface/transformers/pull/16259

Make add-new-model-like work in an env without all frameworks by @sgugger in https://github.com/huggingface/transformers/pull/16239

Deberta v2 code simplification by @guillaume-be in https://github.com/huggingface/transformers/pull/15732

Add Slack notification support for doc tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16253

Framework split for Spanish version of doc quicktour.mdx by @omarespejel in https://github.com/huggingface/transformers/pull/16215

Removed the 'optional' string (in DETR post_process) by @dinesh-GDK in https://github.com/huggingface/transformers/pull/16266

Draft a guide with our code quirks for new models by @sgugger in https://github.com/huggingface/transformers/pull/16237

Fixed Error Raised Due to Wrongly Accessing Training Sample by @aflah02 in https://github.com/huggingface/transformers/pull/16115

Fix XGLM cross attention by @patil-suraj in https://github.com/huggingface/transformers/pull/16290

Fix a typo (add a comma) by @PolarisRisingWar in https://github.com/huggingface/transformers/pull/16291

Add type hints to xlnet by @mowafess in https://github.com/huggingface/transformers/pull/16214

Remove disclaimer from Longformer docs by @gchhablani in https://github.com/huggingface/transformers/pull/16296

Add argument "cache_dir" for transformers.onnx by @happyXia in https://github.com/huggingface/transformers/pull/16284

Add type hints transfoxl by @jcmc00 in https://github.com/huggingface/transformers/pull/16267

added type hints for BART model by @robotjellyzone in https://github.com/huggingface/transformers/pull/16270

ResNet & VAN: Fixed code sample tests by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16294

GPT2 TensorFlow Type Hints by @cakiki in https://github.com/huggingface/transformers/pull/16261

Added type hints for PyTorch T5 model by @yhl48 in https://github.com/huggingface/transformers/pull/16257

Fix Marian conversion script by @patil-suraj in https://github.com/huggingface/transformers/pull/16300

[SegFormer] Remove unused attributes by @NielsRogge in https://github.com/huggingface/transformers/pull/16285

Update troubleshoot with more content by @stevhliu in https://github.com/huggingface/transformers/pull/16243

fix last element in hidden_states for XGLM by @ydshieh in https://github.com/huggingface/transformers/pull/16301

[FlaxGPTJ] Fix bug in rotary embeddings by @patil-suraj in https://github.com/huggingface/transformers/pull/16298

Add missing type hints for PyTorch Longformer models by @johnnygreco in https://github.com/huggingface/transformers/pull/16244

Fix Seq2SeqTrainingArguments docs by @gchhablani in https://github.com/huggingface/transformers/pull/16295

[xtreme-s] Update Minds14 results by @anton-l in https://github.com/huggingface/transformers/pull/16241

added type hints for blenderbot and blenderbot_small (v2) by @IvanLauLinTiong in https://github.com/huggingface/transformers/pull/16307

Update Makefile Phonies by @gchhablani in https://github.com/huggingface/transformers/pull/16306

TF - update (vision_)encoder_decoder past variable by @gante in https://github.com/huggingface/transformers/pull/16260

Add Flaubert OnnxConfig to Transformers by @ChainYo in https://github.com/huggingface/transformers/pull/16279

TFLongformer: Add missing type hints and unpack inputs decorator by @johnnygreco in https://github.com/huggingface/transformers/pull/16228

add xglm conversion script by @patil-suraj in https://github.com/huggingface/transformers/pull/16305

Fix bugs of s2t fairseq model converting by @beomseok-lee in https://github.com/huggingface/transformers/pull/15593

Add type hints for Pegasus model (PyTorch) by @Tegzes in https://github.com/huggingface/transformers/pull/16324

Funnel type hints by @AMontgomerie in https://github.com/huggingface/transformers/pull/16323

Add type hints for ProphetNet PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16272

[GLPN] Improve docs by @NielsRogge in https://github.com/huggingface/transformers/pull/16331

Added type hints for Pytorch Marian calls by @clefourrier in https://github.com/huggingface/transformers/pull/16200

VAN: Code sample tests by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16340

Add type annotations for Rembert/Splinter and copies by @jacobdineen in https://github.com/huggingface/transformers/pull/16338

[Bug template] Shift responsibilities for long-range by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16344

Fix code repetition in serialization guide by @osanseviero in https://github.com/huggingface/transformers/pull/16346

Adopt framework-specific blocks for content by @stevhliu in https://github.com/huggingface/transformers/pull/16342

Updates the default branch from master to main by @LysandreJik in https://github.com/huggingface/transformers/pull/16326

[T5] Add t5 download script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16328

Reorganize file utils by @sgugger in https://github.com/huggingface/transformers/pull/16264

[FlaxBart] make sure no grads are computed on bias by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16345

Trainer evaluation delay by @OllieBroadhurst in https://github.com/huggingface/transformers/pull/16356

Adding missing type hints for mBART model (TF) by @reichenbch in https://github.com/huggingface/transformers/pull/16281

Add type annotations of config for vision models by @johnnv1 in https://github.com/huggingface/transformers/pull/16263

TF - Fix interchangeable past/past_key_values and revert output variable name in GPT2 by @gante in https://github.com/huggingface/transformers/pull/16332

Swap inequalities by @OllieBroadhurst in https://github.com/huggingface/transformers/pull/16368

Make Transformers use cache files when hf.co is down by @sgugger in https://github.com/huggingface/transformers/pull/16362

Decision transformer gym by @edbeeching in https://github.com/huggingface/transformers/pull/15845

add GPT-J ONNX config to Transformers by @ChainYo in https://github.com/huggingface/transformers/pull/16274

Update docs/README.md by @ydshieh in https://github.com/huggingface/transformers/pull/16333

Make BigBird model compatible with fp16 dtype by @xuzhao9 in https://github.com/huggingface/transformers/pull/16034

[Doctests] Make roberta-like meaningful by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16363

[Doctests] Make TFRoberta-like meaningful by @ydshieh in https://github.com/huggingface/transformers/pull/16370

Update readme with how to train offline and fix BPE command by @ncoop57 in https://github.com/huggingface/transformers/pull/15897

Fix BigBirdModelTester by @ydshieh in https://github.com/huggingface/transformers/pull/16310

Type hints and decorator for TF T5 by @Dahlbomii in https://github.com/huggingface/transformers/pull/16376

Add type hints for ConvBert model by @simonzli in https://github.com/huggingface/transformers/pull/16377

Update pt flax equivalence tests in pt by @ydshieh in https://github.com/huggingface/transformers/pull/16280

Bump cookiecutter version by @ydshieh in https://github.com/huggingface/transformers/pull/16387

Fix style by @LysandreJik in https://github.com/huggingface/transformers/pull/16391

Fix readme links and add CI check by @sgugger in https://github.com/huggingface/transformers/pull/16392

variable naming for Distilbert model by @robotjellyzone in https://github.com/huggingface/transformers/pull/16384

Added type hints by @yhl48 in https://github.com/huggingface/transformers/pull/16389

Rename semantic segmentation outputs by @NielsRogge in https://github.com/huggingface/transformers/pull/15849

Make FeaturesManager.get_model_from_feature a static method by @michaelbenayoun in https://github.com/huggingface/transformers/pull/16357

Big file_utils cleanup by @sgugger in https://github.com/huggingface/transformers/pull/16396

fixed typo from enable to disable in disable_progress_bar function by @Gladiator07 in https://github.com/huggingface/transformers/pull/16406

Rename master to main for notebooks links and leftovers by @sgugger in https://github.com/huggingface/transformers/pull/16397

TF PushToHubCallback fixes and updates by @Rocketknight1 in https://github.com/huggingface/transformers/pull/16409

Add ONNX support for Blenderbot and BlenderbotSmall by @lewtun in https://github.com/huggingface/transformers/pull/15875

[FlaxSpeechEncoderDecoder] Fix feature extractor gradient test by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16407

Fix Typo in Argument of FlaxWav2Vec2ForPreTrainingModule by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16084

Removed inputs_processing and replaced with decorator for lxmert by @silvererudite in https://github.com/huggingface/transformers/pull/16414

remove references to PDF reading via PIL by @garfieldnate in https://github.com/huggingface/transformers/pull/15293

Update comments in class BatchEncoding by @basicv8vc in https://github.com/huggingface/transformers/pull/15932

Fix broken links by @kurianbenoy in https://github.com/huggingface/transformers/pull/16113

cached_download ∘ hf_hub_url is hf_hub_download by @julien-c in https://github.com/huggingface/transformers/pull/16375

QDQBert example update by @shangz-ai in https://github.com/huggingface/transformers/pull/16395

[Flax] Improve Robustness of Back-Prop Tests by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16418

Fix typo in language modeling example comment by @dreamgonfly in https://github.com/huggingface/transformers/pull/16421

Use doc builder styler by @sgugger in https://github.com/huggingface/transformers/pull/16412

Fix PerceiverMLP and test by @jaesuny in https://github.com/huggingface/transformers/pull/16405

[FlaxSpeechEncoderDecoderModel] Ensure Input and Output Word Embeddings Are Not Tied by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16444

Translation from english to spanish of file pipeline_tutorial.mdx by @FernandoLpz in https://github.com/huggingface/transformers/pull/16149

Remove kwargs argument from IBERT MLM forward pass by @lewtun in https://github.com/huggingface/transformers/pull/16449

Fix blenderbot conversion script by @patil-suraj in https://github.com/huggingface/transformers/pull/16472

Adding DocTest to TrOCR by @arnaudstiegler in https://github.com/huggingface/transformers/pull/16398

[MNLI example] Prevent overwriting matched with mismatched metrics by @eldarkurtic in https://github.com/huggingface/transformers/pull/16475

Remove duplicate mLuke by @stevhliu in https://github.com/huggingface/transformers/pull/16460

Fix missing output_attentions in PT/Flax equivalence test by @ydshieh in https://github.com/huggingface/transformers/pull/16271

Fix some TF GPT-J CI testings by @ydshieh in https://github.com/huggingface/transformers/pull/16454

Fix example test and test_fetcher for examples by @sgugger in https://github.com/huggingface/transformers/pull/16478

fix wrong variable name by @wesleyacheng in https://github.com/huggingface/transformers/pull/16467

Add TF vision model code samples by @ydshieh in https://github.com/huggingface/transformers/pull/16477

missing trainer import by @wesleyacheng in https://github.com/huggingface/transformers/pull/16469

Add type hints for UniSpeech by @Tegzes in https://github.com/huggingface/transformers/pull/16399

TF: properly handle kwargs in encoder_decoder architectures by @gante in https://github.com/huggingface/transformers/pull/16465

added typehints for RAG pytorch models by @akashe in https://github.com/huggingface/transformers/pull/16416

Avoid accessing .dataset of a DataLoader in Trainer by @sanderland in https://github.com/huggingface/transformers/pull/16451

TF GPT2: clearer model variable naming with @unpack_inputs by @cakiki in https://github.com/huggingface/transformers/pull/16311

Raise diff tolerance value for TFViTMAEModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/16483

Do not initialize torch.distributed process group if one is already initialized by @Yard1 in https://github.com/huggingface/transformers/pull/16487

TF GPT-J Type hints and TF decorator by @Dahlbomii in https://github.com/huggingface/transformers/pull/16488

Nit: MCSCOCO -> MSCOCO by @AdityaKane2001 in https://github.com/huggingface/transformers/pull/16481

Add length to PreTrainedTokenizer train_new_from_iterator by @dctelus in https://github.com/huggingface/transformers/pull/16493

Add support for exporting GPT-J to ONNX-TRT by @tomerip in https://github.com/huggingface/transformers/pull/16492

TF: unpack inputs on Convbert, GPTJ, LED, and templates by @gante in https://github.com/huggingface/transformers/pull/16491

Feature Extractor accepts segmentation_maps by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15964

[examples] max samples can't be bigger than the len of dataset by @stas00 in https://github.com/huggingface/transformers/pull/16501

update smddp api to v1.4.0 by @roywei in https://github.com/huggingface/transformers/pull/16371

Support reduce_bucket_size="auto" for deepspeed stages <3 by @manuelciosici in https://github.com/huggingface/transformers/pull/16496

Modeling Outputs by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16341

make tuple annotation more specific to avoid failures during symbolic_trace by @chenbohua3 in https://github.com/huggingface/transformers/pull/16490

Spanish translation of the file multilingual.mdx by @SimplyJuanjo in https://github.com/huggingface/transformers/pull/16329

Translate installation.mdx to Spanish by @lilianabs in https://github.com/huggingface/transformers/pull/16229

Translate accelerate.mdx from english to spanish by @Sangohe in https://github.com/huggingface/transformers/pull/16176

[Typo][Example] Fixed a typo in run_qa_no_trainer.py by @bhadreshpsavani in https://github.com/huggingface/transformers/pull/16508

added type hints to xglm pytorch by @mowafess in https://github.com/huggingface/transformers/pull/16500

Fix syntax error in generate docstrings by @sgugger in https://github.com/huggingface/transformers/pull/16516

[research] link to the XTREME-S paper by @anton-l in https://github.com/huggingface/transformers/pull/16519

Fixed a typo in seq2seq_trainer.py by @Agoniii in https://github.com/huggingface/transformers/pull/16531

Add ONNX export for BeiT by @akuma12 in https://github.com/huggingface/transformers/pull/16498

call on_train_end when optuna trial is pruned by @fschlatt in https://github.com/huggingface/transformers/pull/16536

Type hints added to OpenAIGPT by @Dahlbomii in https://github.com/huggingface/transformers/pull/16529

Fix Bart type hints by @gchhablani in https://github.com/huggingface/transformers/pull/16297

Add VisualBert type hints by @gchhablani in https://github.com/huggingface/transformers/pull/16544

Adding missing type hints for mBART model (PyTorch) by @reichenbch in https://github.com/huggingface/transformers/pull/16429

Remove MBart subclass of XLMRoberta in tokenizer docs by @gchhablani in https://github.com/huggingface/transformers/pull/16546

Use random_attention_mask for TF tests by @ydshieh in https://github.com/huggingface/transformers/pull/16517

[GLPN] Improve code example by @NielsRogge in https://github.com/huggingface/transformers/pull/16450

Pin tokenizers version <0.13 by @LysandreJik in https://github.com/huggingface/transformers/pull/16539

add code samples for TF speech models by @ydshieh in https://github.com/huggingface/transformers/pull/16494

[FlaxSpeechEncoderDecoder] Fix dtype bug by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16581

Making the impossible to connect error actually report the right URL. by @Narsil in https://github.com/huggingface/transformers/pull/16446

Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm by @stancld in https://github.com/huggingface/transformers/pull/16556

Add utility to find model labels by @sgugger in https://github.com/huggingface/transformers/pull/16526

Enable doc in Spanish by @sgugger in https://github.com/huggingface/transformers/pull/16518

Add use_auth to load_datasets for private datasets to PT and TF examples by @KMFODA in https://github.com/huggingface/transformers/pull/16521

add a test checking the format of convert_tokens_to_string's output by @SaulLu in https://github.com/huggingface/transformers/pull/16540

TF: Finalize unpack_inputs-related changes by @gante in https://github.com/huggingface/transformers/pull/16499

[SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16586

initialize the default rank set on TrainerState by @andrescodas in https://github.com/huggingface/transformers/pull/16530

Fix CI: test_inference_for_pretraining in ViTMAEModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/16591

add a template to add missing tokenization test by @SaulLu in https://github.com/huggingface/transformers/pull/16553

PretrainedModel: made _load_pretrained_model_low_mem static + bug fix by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16548

handle torch_dtype in low cpu mem usage by @patil-suraj in https://github.com/huggingface/transformers/pull/16580

[Doctests] Correct filenaming by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16599

Adding new train_step logic to make things less confusing for users by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15994

Adding missing type hints for BigBird model by @reichenbch in https://github.com/huggingface/transformers/pull/16555

[deepspeed] fix typo, adjust config name by @stas00 in https://github.com/huggingface/transformers/pull/16597

Add global_attention_mask to gen_kwargs in Seq2SeqTrainer.prediction_step by @JohnGiorgi in https://github.com/huggingface/transformers/pull/16485

[benchmark tool] trainer-benchmark.py by @stas00 in https://github.com/huggingface/transformers/pull/14934

Update summary of the tasks by @stevhliu in https://github.com/huggingface/transformers/pull/16528

added type hints to CTRL pytorch by @anmolsjoshi in https://github.com/huggingface/transformers/pull/16593

fix default num_attention_heads in segformer doc by @JunMa11 in https://github.com/huggingface/transformers/pull/16612

[Docs] Correct quicktour minds14 dataset by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16626

Fix seq2seq doc tests by @patil-suraj in https://github.com/huggingface/transformers/pull/16606

don't load state_dict twice when using low_cpu_mem_usage in from_pretrained by @patil-suraj in https://github.com/huggingface/transformers/pull/16602

Use CLIP model config to set some kwargs for components by @ydshieh in https://github.com/huggingface/transformers/pull/16609

[modeling_utils] typo by @stas00 in https://github.com/huggingface/transformers/pull/16621

[Speech2Text Doc] Fix docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16611

[FlaxSpeechEncoderDecoderModel] More Rigorous PT-Flax Equivalence Tests by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16589

Fix TFTransfoXLLMHeadModel outputs by @ydshieh in https://github.com/huggingface/transformers/pull/16590

Impressive community contributors

The community contributors below have significantly contributed to the v4.18.0 release. Thank you!

@sayakpaul, for contributing the TensorFlow version of ViTMAE

@stancld, for contributing the TensorFlow version of GPT-J

New Contributors

@Soonhwan-Kwon made their first contribution in https://github.com/huggingface/transformers/pull/13727

@jonatasgrosman made their first contribution in https://github.com/huggingface/transformers/pull/15428

@ToluClassics made their first contribution in https://github.com/huggingface/transformers/pull/15432

@peregilk made their first contribution in https://github.com/huggingface/transformers/pull/15423

@bugface made their first contribution in https://github.com/huggingface/transformers/pull/15480

@AyushExel made their first contribution in https://github.com/huggingface/transformers/pull/14582

@thinksoso made their first contribution in https://github.com/huggingface/transformers/pull/15403

@davidleonfdez made their first contribution in https://github.com/huggingface/transformers/pull/15473

@sanchit-gandhi made their first contribution in https://github.com/huggingface/transformers/pull/15519

@arron1227 made their first contribution in https://github.com/huggingface/transformers/pull/15084

@cimeister made their first contribution in https://github.com/huggingface/transformers/pull/15504

@cwkeam made their first contribution in https://github.com/huggingface/transformers/pull/15416

@Albertobegue made their first contribution in https://github.com/huggingface/transformers/pull/13831

@derenrich made their first contribution in https://github.com/huggingface/transformers/pull/15614

@tkukurin made their first contribution in https://github.com/huggingface/transformers/pull/15636

@muzhi1991 made their first contribution in https://github.com/huggingface/transformers/pull/15638

@versae made their first contribution in https://github.com/huggingface/transformers/pull/15590

@jonrbates made their first contribution in https://github.com/huggingface/transformers/pull/15617

@arampacha made their first contribution in https://github.com/huggingface/transformers/pull/15413

@FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/transformers/pull/15657

@coyotte508 made their first contribution in https://github.com/huggingface/transformers/pull/15680

@heytanay made their first contribution in https://github.com/huggingface/transformers/pull/15531

@gautierdag made their first contribution in https://github.com/huggingface/transformers/pull/15702

@SSardorf made their first contribution in https://github.com/huggingface/transformers/pull/15741

@Crabzmatic made their first contribution in https://github.com/huggingface/transformers/pull/15740

@dreamgonfly made their first contribution in https://github.com/huggingface/transformers/pull/15644

@lsb made their first contribution in https://github.com/huggingface/transformers/pull/15468

@pbelevich made their first contribution in https://github.com/huggingface/transformers/pull/15776

@sayakpaul made their first contribution in https://github.com/huggingface/transformers/pull/15750

@rahul003 made their first contribution in https://github.com/huggingface/transformers/pull/15877

@rhjohnstone made their first contribution in https://github.com/huggingface/transformers/pull/15884

@cosmoquester made their first contribution in https://github.com/huggingface/transformers/pull/15913

@konstantinjdobler made their first contribution in https://github.com/huggingface/transformers/pull/15951

@yhavinga made their first contribution in https://github.com/huggingface/transformers/pull/15963

@dlwh made their first contribution in https://github.com/huggingface/transformers/pull/15961

@basilevh made their first contribution in https://github.com/huggingface/transformers/pull/15972

@andstor made their first contribution in https://github.com/huggingface/transformers/pull/16033

@davidsbatista made their first contribution in https://github.com/huggingface/transformers/pull/16063

@feifang24 made their first contribution in https://github.com/huggingface/transformers/pull/16065

@kevinpl07 made their first contribution in https://github.com/huggingface/transformers/pull/15245

@johnnv1 made their first contribution in https://github.com/huggingface/transformers/pull/16088

@Abdelrhman-Hosny made their first contribution in https://github.com/huggingface/transformers/pull/16097

@p-mishra1 made their first contribution in https://github.com/huggingface/transformers/pull/16099

@jbrry made their first contribution in https://github.com/huggingface/transformers/pull/16108

@jorgtied made their first contribution in https://github.com/huggingface/transformers/pull/16124

@vumichien made their first contribution in https://github.com/huggingface/transformers/pull/16110

@merveenoyan made their first contribution in https://github.com/huggingface/transformers/pull/16138

@yharyarias made their first contribution in https://github.com/huggingface/transformers/pull/16047

@bhavika made their first contribution in https://github.com/huggingface/transformers/pull/16129

@PepijnBoers made their first contribution in https://github.com/huggingface/transformers/pull/16107

@soomiles made their first contribution in https://github.com/huggingface/transformers/pull/16121

@Tegzes made their first contribution in https://github.com/huggingface/transformers/pull/16126

@jacobdineen made their first contribution in https://github.com/huggingface/transformers/pull/16106

@wpan03 made their first contribution in https://github.com/huggingface/transformers/pull/16123

@infinite-Joy made their first contribution in https://github.com/huggingface/transformers/pull/16147

@marxav made their first contribution in https://github.com/huggingface/transformers/pull/16132

@Duedme made their first contribution in https://github.com/huggingface/transformers/pull/16158

@MarkusSagen made their first contribution in https://github.com/huggingface/transformers/pull/16087

@mowafess made their first contribution in https://github.com/huggingface/transformers/pull/16163

@jcmc00 made their first contribution in https://github.com/huggingface/transformers/pull/16174

@utkusaglm made their first contribution in https://github.com/huggingface/transformers/pull/16178

@johko made their first contribution in https://github.com/huggingface/transformers/pull/16181

@johnryan465 made their first contribution in https://github.com/huggingface/transformers/pull/16090

@daysm made their first contribution in https://github.com/huggingface/transformers/pull/16208

@forsc made their first contribution in https://github.com/huggingface/transformers/pull/16212

@Sophylax made their first contribution in https://github.com/huggingface/transformers/pull/16227

@function2-llx made their first contribution in https://github.com/huggingface/transformers/pull/15795

@ktzsh made their first contribution in https://github.com/huggingface/transformers/pull/16131

@louisowen6 made their first contribution in https://github.com/huggingface/transformers/pull/16247

@omarespejel made their first contribution in https://github.com/huggingface/transformers/pull/16215

@dinesh-GDK made their first contribution in https://github.com/huggingface/transformers/pull/16266

@aflah02 made their first contribution in https://github.com/huggingface/transformers/pull/16115

@PolarisRisingWar made their first contribution in https://github.com/huggingface/transformers/pull/16291

@happyXia made their first contribution in https://github.com/huggingface/transformers/pull/16284

@robotjellyzone made their first contribution in https://github.com/huggingface/transformers/pull/16270

@yhl48 made their first contribution in https://github.com/huggingface/transformers/pull/16257

@johnnygreco made their first contribution in https://github.com/huggingface/transformers/pull/16244

@IvanLauLinTiong made their first contribution in https://github.com/huggingface/transformers/pull/16307

@beomseok-lee made their first contribution in https://github.com/huggingface/transformers/pull/15593

@clefourrier made their first contribution in https://github.com/huggingface/transformers/pull/16200

@OllieBroadhurst made their first contribution in https://github.com/huggingface/transformers/pull/16356

@reichenbch made their first contribution in https://github.com/huggingface/transformers/pull/16281

@edbeeching made their first contribution in https://github.com/huggingface/transformers/pull/15845

@xuzhao9 made their first contribution in https://github.com/huggingface/transformers/pull/16034

@Dahlbomii made their first contribution in https://github.com/huggingface/transformers/pull/16376

@simonzli made their first contribution in https://github.com/huggingface/transformers/pull/16377

@Gladiator07 made their first contribution in https://github.com/huggingface/transformers/pull/16406

@silvererudite made their first contribution in https://github.com/huggingface/transformers/pull/16414

@garfieldnate made their first contribution in https://github.com/huggingface/transformers/pull/15293

@basicv8vc made their first contribution in https://github.com/huggingface/transformers/pull/15932

@kurianbenoy made their first contribution in https://github.com/huggingface/transformers/pull/16113

@jaesuny made their first contribution in https://github.com/huggingface/transformers/pull/16405

@FernandoLpz made their first contribution in https://github.com/huggingface/transformers/pull/16149

@arnaudstiegler made their first contribution in https://github.com/huggingface/transformers/pull/16398

@wesleyacheng made their first contribution in https://github.com/huggingface/transformers/pull/16467

@akashe made their first contribution in https://github.com/huggingface/transformers/pull/16416

@sanderland made their first contribution in https://github.com/huggingface/transformers/pull/16451

@AdityaKane2001 made their first contribution in https://github.com/huggingface/transformers/pull/16481

@dctelus made their first contribution in https://github.com/huggingface/transformers/pull/16493

@tomerip made their first contribution in https://github.com/huggingface/transformers/pull/16492

@roywei made their first contribution in https://github.com/huggingface/transformers/pull/16371

@chenbohua3 made their first contribution in https://github.com/huggingface/transformers/pull/16490

@SimplyJuanjo made their first contribution in https://github.com/huggingface/transformers/pull/16329

@lilianabs made their first contribution in https://github.com/huggingface/transformers/pull/16229

@Sangohe made their first contribution in https://github.com/huggingface/transformers/pull/16176

@Agoniii made their first contribution in https://github.com/huggingface/transformers/pull/16531

@akuma12 made their first contribution in https://github.com/huggingface/transformers/pull/16498

@fschlatt made their first contribution in https://github.com/huggingface/transformers/pull/16536

@KMFODA made their first contribution in https://github.com/huggingface/transformers/pull/16521

@andrescodas made their first contribution in https://github.com/huggingface/transformers/pull/16530

@JohnGiorgi made their first contribution in https://github.com/huggingface/transformers/pull/16485

@JunMa11 made their first contribution in https://github.com/huggingface/transformers/pull/16612

Full Changelog: https://github.com/huggingface/transformers/compare/v4.17.0...v4.18.0
v4.17.0 (Mar 3, 2022)
New models

XGLM

The XGLM model was proposed in Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.

XGLM is a GPT-3-like multilingual model trained on a balanced corpus covering a diverse set of languages.

Add XGLM models by @patil-suraj in https://github.com/huggingface/transformers/pull/14876

ConvNeXT

The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.

ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.

Add ConvNeXT by @NielsRogge in https://github.com/huggingface/transformers/pull/15277

Add TFConvNextModel by @sayakpaul in https://github.com/huggingface/transformers/pull/15750

PoolFormer

The PoolFormer model was proposed in MetaFormer is Actually What You Need for Vision by Sea AI Labs.

Add PoolFormer by @heytanay in https://github.com/huggingface/transformers/pull/15531

PLBart

The PLBART model was proposed in Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.

PLBART is a BART-like model that can be used for code summarization, code generation, and code translation. The pretrained plbart-base checkpoint was trained with a multilingual denoising objective on Java, Python and English.

Add PLBart by @gchhablani in https://github.com/huggingface/transformers/pull/13269

Add missing PLBart entry in README by @gchhablani in https://github.com/huggingface/transformers/pull/15721

Data2Vec

The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli.

Data2Vec proposes a unified framework for self-supervised learning across different data modalities: text, audio and images. Importantly, the targets predicted during pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

Add Data2Vec by @edugp in https://github.com/huggingface/transformers/pull/15507

MaskFormer

The MaskFormer model was proposed in Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.

MaskFormer addresses semantic segmentation with a mask classification paradigm instead of performing classic pixel-level classification.
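Under the mask classification paradigm, the model predicts a set of (class probabilities, binary mask) pairs, one per query, instead of one label per pixel. The plain-Python sketch below shows one way to turn such pairs into a per-pixel semantic map by marginalizing over queries, score(c, pixel) = sum over queries q of p_q(c) * m_q(pixel), which is how we understand the MaskFormer paper's semantic inference. The nested-list input layout and the function name are assumptions for illustration, not the library's post-processing API.

```python
def semantic_inference(class_probs, mask_probs):
    """Combine per-query class probabilities and masks into a per-pixel label map.

    class_probs[q][c]: probability that query q belongs to class c, taken from a
        softmax over the classes plus "no object", with the "no object" column dropped.
    mask_probs[q][h][w]: sigmoid probability that pixel (h, w) lies in query q's mask.
    """
    num_queries = len(class_probs)
    num_classes = len(class_probs[0])
    height, width = len(mask_probs[0]), len(mask_probs[0][0])
    label_map = []
    for h in range(height):
        row = []
        for w in range(width):
            # score(c) = sum over queries of p_q(c) * m_q[h][w]
            scores = [
                sum(class_probs[q][c] * mask_probs[q][h][w] for q in range(num_queries))
                for c in range(num_classes)
            ]
            # The pixel gets the class with the highest marginalized score.
            row.append(max(range(num_classes), key=scores.__getitem__))
        label_map.append(row)
    return label_map


if __name__ == "__main__":
    # Two queries, two classes, a 1x2 image: query 0 covers the left pixel, query 1 the right.
    classes = [[0.9, 0.1], [0.2, 0.8]]
    masks = [[[1.0, 0.0]], [[0.0, 1.0]]]
    print(semantic_inference(classes, masks))
```

The same (class, mask) pairs can also be used for instance-level outputs, which is part of the motivation for predicting masks rather than per-pixel labels.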

Maskformer by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15682

Code in the Hub

This is a new experimental feature added to the library. It allows you to share a custom model (with configuration, tokenizer, feature extractor, processor) with anyone through the Model Hub while still using the Auto-classes API of the Transformers library.

See the documentation for more information!

Allow relative imports in dynamic code by @sgugger in https://github.com/huggingface/transformers/pull/15352

Save code of registered custom models by @sgugger in https://github.com/huggingface/transformers/pull/15379

Documentation

We are working on updating the existing guides in the documentation, and writing more!

Update model share tutorial by @stevhliu in https://github.com/huggingface/transformers/pull/15288

Get started docs by @stevhliu in https://github.com/huggingface/transformers/pull/15098

Update fine-tune docs by @stevhliu in https://github.com/huggingface/transformers/pull/15259

Update tutorial docs by @stevhliu in https://github.com/huggingface/transformers/pull/15165

Create a custom model guide by @stevhliu in https://github.com/huggingface/transformers/pull/15489

🧼 NLP task guides by @stevhliu in https://github.com/huggingface/transformers/pull/15731

Inference for multilingual models by @stevhliu in https://github.com/huggingface/transformers/pull/15836

Time Stamps for Speech models

Speech models that have been trained with the CTC loss (Wav2Vec2, XLS-R, HuBERT, WavLM, ...) can now output time stamps in addition to the transcription of the input audio. For example, one can retrieve the start and end time of every transcribed word via the Wav2Vec2CTCTokenizer.decode method or the Wav2Vec2ProcessorWithLM.decode method; see the documentation of each method for details.

This feature can also be used directly through the ASR pipeline; see the ASR pipeline documentation and the accompanying example.
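The idea behind CTC time stamps can be sketched without a model: after taking the argmax over the logits, each output frame carries one token; collapsing repeats and dropping blank tokens keeps the frame index at which every character starts and ends, and grouping characters on the word delimiter gives word offsets. The pure-Python sketch below assumes "<pad>" as the blank token and "|" as the word delimiter (the Wav2Vec2CTCTokenizer defaults) and an illustrative 0.02 s per frame, which is what a 320-sample frame stride at 16 kHz gives; for a real model, inputs_to_logits_ratio / sampling_rate is the value to check. It is not the library's implementation; with the library you would instead pass output_word_offsets=True to Wav2Vec2CTCTokenizer.decode.

```python
from itertools import groupby

BLANK = "<pad>"  # assumed CTC blank token
DELIMITER = "|"  # assumed word delimiter token


def ctc_char_offsets(frame_tokens, blank=BLANK):
    """Collapse repeated frame predictions, drop blanks, keep frame offsets."""
    offsets = []
    frame = 0
    for token, run in groupby(frame_tokens):
        length = len(list(run))
        if token != blank:
            offsets.append({"char": token, "start_offset": frame, "end_offset": frame + length})
        frame += length
    return offsets


def ctc_word_offsets(char_offsets, delimiter=DELIMITER):
    """Group consecutive characters into words, splitting on the delimiter."""
    words, current = [], None
    for item in char_offsets:
        if item["char"] == delimiter:
            if current is not None:
                words.append(current)
            current = None
        elif current is None:
            current = {"word": item["char"], "start_offset": item["start_offset"],
                       "end_offset": item["end_offset"]}
        else:
            current["word"] += item["char"]
            current["end_offset"] = item["end_offset"]
    if current is not None:
        words.append(current)
    return words


def offsets_to_seconds(word_offsets, seconds_per_frame):
    """Convert frame offsets to seconds (seconds_per_frame = frame stride / sampling rate)."""
    return [
        {"word": w["word"],
         "start_time": round(w["start_offset"] * seconds_per_frame, 2),
         "end_time": round(w["end_offset"] * seconds_per_frame, 2)}
        for w in word_offsets
    ]


if __name__ == "__main__":
    # One predicted token per output frame, e.g. the argmax over the CTC logits.
    frames = ["<pad>", "H", "H", "I", "<pad>", "|", "|", "Y", "O", "O", "<pad>", "U"]
    words = ctc_word_offsets(ctc_char_offsets(frames))
    print(offsets_to_seconds(words, seconds_per_frame=320 / 16000))
```

Note that a blank between two identical tokens keeps them as separate characters, which is how CTC represents double letters.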

Add time stamps for wav2vec2 with lm by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15854

Adding timestamps for CTC with LM in ASR pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15863

Adding the option to return_timestamps on pure CTC ASR models. by @Narsil in https://github.com/huggingface/transformers/pull/15792

Time stamps for CTC models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15687

Breaking change

Unfortunately, some bugs had crept into CLIPTokenizerFast: the tokenizations produced by CLIPTokenizer and CLIPTokenizerFast were not identical. CLIPTokenizerFast has been corrected to encode text with the same strategy as CLIPTokenizer.

What does this mean for you? You need to use the tokenizer that was used to train the CLIP model you are using. For example:

Case 1: you use openai/clip-vit-base-patch32, openai/clip-vit-base-patch16 or openai/clip-vit-large-patch14. Before v4.17.0, the correct tokenizer for these checkpoints was CLIPTokenizer. From v4.17.0 onwards, you can use either CLIPTokenizer or CLIPTokenizerFast.

Case 2: you have trained your own CLIP model using CLIPTokenizerFast. Your tokenizer no longer behaves like the current CLIPTokenizerFast, so we recommend loading your tokenizer.json directly into a PreTrainedTokenizerFast, or continuing to use a version prior to v4.17.0.

Case 3: you have trained your own CLIP model using CLIPTokenizer. You can now produce a fast equivalent of your tokenizer by calling CLIPTokenizerFast.from_pretrained("Path to local folder or Hub repo with slow tokenizer files", from_slow=True).

To make CLIPTokenizerFast identical to CLIPTokenizer, the template of the tokenization of a sentence pair (A,B) has been modified. The previous template was <|startoftext|> A B <|endoftext|> and the new one is <|startoftext|> A <|endoftext|> <|endoftext|> B <|endoftext|>.
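The sentence-pair template change can be illustrated with plain token lists. The sketch below only mirrors the two templates quoted above; the token strings stand in for token ids, and both function names are hypothetical, not part of the tokenizer API.

```python
BOS = "<|startoftext|>"
EOS = "<|endoftext|>"


def pair_template_before_v4_17(a, b):
    # Old fast-tokenizer template: <|startoftext|> A B <|endoftext|>
    # The two segments run together with no separator.
    return [BOS, *a, *b, EOS]


def pair_template_from_v4_17(a, b):
    # New template, matching CLIPTokenizer:
    # <|startoftext|> A <|endoftext|> <|endoftext|> B <|endoftext|>
    return [BOS, *a, EOS, EOS, *b, EOS]


if __name__ == "__main__":
    first, second = ["a", "cat"], ["a", "dog"]
    print(pair_template_before_v4_17(first, second))
    print(pair_template_from_v4_17(first, second))
```

Every pair encoded with the new template is two tokens longer, so a model trained on pairs produced by the old fast tokenizer will see differently shaped inputs after the update.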

What's Changed

Fix tests_fetcher by @sgugger in https://github.com/huggingface/transformers/pull/15376

Fix code format for Accelerate doc by @stevhliu in https://github.com/huggingface/transformers/pull/15335

Add init to BORT by @LysandreJik in https://github.com/huggingface/transformers/pull/15378

Set syncfree AdamW as the default optimizer for xla:gpu device in amp mode by @ymwangg in https://github.com/huggingface/transformers/pull/15361

Fixing support for batch_size and num_return_sequences in text-generation pipeline by @Narsil in https://github.com/huggingface/transformers/pull/15318

Fix bad_words_ids not working with sentencepiece-based tokenizers by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15343

[docs] fix wrong file name in pr_check by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15380

Prepare deprecated ONNX exporter for torch v1.11 by @lewtun in https://github.com/huggingface/transformers/pull/15388

[Fix doc example] FlaxMarianPreTrainedModel by @ydshieh in https://github.com/huggingface/transformers/pull/15391

Make links explicit by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15395

[deepspeed] saving checkpoint fallback when fp16 weights aren't saved by @stas00 in https://github.com/huggingface/transformers/pull/14948

Fix missing eps arg for LayerNorm in ElectraGeneratorPredictions by @ydshieh in https://github.com/huggingface/transformers/pull/15332

Use argument for preprocessing workers in run_summarization by @sgugger in https://github.com/huggingface/transformers/pull/15394

Add support for XLM-R XL and XXL models by modeling_xlm_roberta_xl.py by @Soonhwan-Kwon in https://github.com/huggingface/transformers/pull/13727

Fix the inconsistency of loss calculation between PT/TF XLNetLMHeadModel by @ydshieh in https://github.com/huggingface/transformers/pull/15298

[XGLMTokenizer] fix init and add in AutoTokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15406

Add SegformerFeatureExtractor to Auto API by @NielsRogge in https://github.com/huggingface/transformers/pull/15410

Fix additional DataTrainingArguments documentation by @FremyCompany in https://github.com/huggingface/transformers/pull/15408

Add (M)Luke model training for Token Classification in the examples by @jplu in https://github.com/huggingface/transformers/pull/14880

Update README.md by @kamalkraj in https://github.com/huggingface/transformers/pull/15430

[Robust Speech Challenge] Add missing LR parameter by @jonatasgrosman in https://github.com/huggingface/transformers/pull/15428

[XGLM] fix gradient checkpointing by @patil-suraj in https://github.com/huggingface/transformers/pull/15427

[Hotfix] Fix Swin model outputs by @NielsRogge in https://github.com/huggingface/transformers/pull/15414

add t5 ner finetuning by @ToluClassics in https://github.com/huggingface/transformers/pull/15432

Add doc for add-new-model-like command by @sgugger in https://github.com/huggingface/transformers/pull/15433

[Swin] Add missing header by @NielsRogge in https://github.com/huggingface/transformers/pull/15434

[deepspeed doc] fix import, extra notes by @stas00 in https://github.com/huggingface/transformers/pull/15400

Fix loss calculation in TFXXXForTokenClassification models by @ydshieh in https://github.com/huggingface/transformers/pull/15294

Fix spurious warning in TF TokenClassification models by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15435

Change REALM checkpoint to new ones by @sgugger in https://github.com/huggingface/transformers/pull/15439

[Trainer] suppress warning for length-related columns by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15421

[examples/Flax] add a section about GPUs by @patil-suraj in https://github.com/huggingface/transformers/pull/15198

Fix TFLEDModel by @ydshieh in https://github.com/huggingface/transformers/pull/15356

[XGLMTokenizer] correct positional emb size by @patil-suraj in https://github.com/huggingface/transformers/pull/15441

[RobertaTokenizer] remove inheritance on GPT2Tokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15429

Misfiring tf warnings by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15442

Add 'with torch.no_grad()' to BEiT integration test forward passes by @itsTurner in https://github.com/huggingface/transformers/pull/14961

Update modeling_wav2vec2.py by @peregilk in https://github.com/huggingface/transformers/pull/15423

Error when group_by_length is used with an IterableDataset by @sgugger in https://github.com/huggingface/transformers/pull/15437

skip large generations pipeline test for XGLM by @patil-suraj in https://github.com/huggingface/transformers/pull/15445

[generate] fix synced_gpus default by @stas00 in https://github.com/huggingface/transformers/pull/15446

Remove "inputs" in tf common test script (no longer required) by @ydshieh in https://github.com/huggingface/transformers/pull/15262

Fix TF Causal LM models' returned logits by @ydshieh in https://github.com/huggingface/transformers/pull/15256

fix from_vision_text_pretrained doc example by @ydshieh in https://github.com/huggingface/transformers/pull/15453

[M2M100, XGLM] fix positional emb resize by @patil-suraj in https://github.com/huggingface/transformers/pull/15444

Update README.md by @kamalkraj in https://github.com/huggingface/transformers/pull/15462

replace assert with exception for padding_side arg in PreTrainedTokenizerBase __init__ by @SaulLu in https://github.com/huggingface/transformers/pull/15454

fix the tokenizer_config.json file for the slow tokenizer when a fast version is available by @SaulLu in https://github.com/huggingface/transformers/pull/15319

use mean instead of elementwise_mean in XLMPredLayer by @ydshieh in https://github.com/huggingface/transformers/pull/15436

[BartTokenizer] remove inheritance on RobertaTokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15461

Trainer.push_to_hub always tries to push to the Hub by @sgugger in https://github.com/huggingface/transformers/pull/15463

Harder check for IndexErrors in QA scripts by @sgugger in https://github.com/huggingface/transformers/pull/15438

Add option to resize like torchvision's Resize by @NielsRogge in https://github.com/huggingface/transformers/pull/15419

[Wav2Vec2ProcessorWithLM] add alpha & beta to batch decode & decode by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15465

Adding support for microphone streaming within pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15046

fix error posted in issue #15448 by @bugface in https://github.com/huggingface/transformers/pull/15480

Fix docstring of ASR pipeline by @sgugger in https://github.com/huggingface/transformers/pull/15481

Add W&B backend for hyperparameter sweep by @AyushExel in https://github.com/huggingface/transformers/pull/14582

Fix labels stored in model config for token classification examples by @sgugger in https://github.com/huggingface/transformers/pull/15482

fix set truncation attribute in __init__ of PreTrainedTokenizerBase by @SaulLu in https://github.com/huggingface/transformers/pull/15456

Correct eos_token_id settings in generate by @thinksoso in https://github.com/huggingface/transformers/pull/15403

fix TFMarianMTModel output by @ydshieh in https://github.com/huggingface/transformers/pull/15494

Cleanup load_weight_prefix in TFEncoderDecoderModel by @ydshieh in https://github.com/huggingface/transformers/pull/15101

[Flax tests] Disable scheduled GPU tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15503

Add general vision docstrings by @NielsRogge in https://github.com/huggingface/transformers/pull/15501

[deepspeed] fix a bug in a test by @stas00 in https://github.com/huggingface/transformers/pull/15493

Add preprocess_logits_for_metrics Trainer param by @davidleonfdez in https://github.com/huggingface/transformers/pull/15473

[deepspeed docs] memory requirements by @stas00 in https://github.com/huggingface/transformers/pull/15506

Remove loss from some flax models docs & examples by @ydshieh in https://github.com/huggingface/transformers/pull/15492

Fix TFElectraForMultipleChoice by @ydshieh in https://github.com/huggingface/transformers/pull/15509

Handle PyTorch to Flax conversion of 1D convolutions by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15519

Fix TFRemBertEncoder all_hidden_states by @ydshieh in https://github.com/huggingface/transformers/pull/15510

[parallelism docs] Megatron-Deepspeed info by @stas00 in https://github.com/huggingface/transformers/pull/15488

Standardize semantic segmentation models outputs by @sgugger in https://github.com/huggingface/transformers/pull/15469

[deepspeed docs] DeepSpeed ZeRO Inference by @stas00 in https://github.com/huggingface/transformers/pull/15486

Revert "Handle PyTorch to Flax conversion of 1D convolutions" by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15540

[ASR pipeline] correct asr pipeline for seq2seq models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15541

[torch_int_div] Correct true division in generation by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15498

[Trainer] Deeper length checks for IterableDatasetShard by @anton-l in https://github.com/huggingface/transformers/pull/15539

Add ASR CTC streaming example by @anton-l in https://github.com/huggingface/transformers/pull/15309

Wav2Vec2 models must either throw or deal with add_adapter by @FremyCompany in https://github.com/huggingface/transformers/pull/15409

Remove Longformers from ONNX-supported models by @lewtun in https://github.com/huggingface/transformers/pull/15273

Fix TF T5/LED missing cross attn in return values by @ydshieh in https://github.com/huggingface/transformers/pull/15511

Make TF Wav2Vec2 outputs the same as PT's version by @ydshieh in https://github.com/huggingface/transformers/pull/15530

FX tracing improvement by @michaelbenayoun in https://github.com/huggingface/transformers/pull/14321

electra is added to onnx supported model by @arron1227 in https://github.com/huggingface/transformers/pull/15084

[GPTJ] fix docs by @patil-suraj in https://github.com/huggingface/transformers/pull/15558

Force use_cache to be False in PyTorch by @ydshieh in https://github.com/huggingface/transformers/pull/15385

Add TFSpeech2Text by @gante in https://github.com/huggingface/transformers/pull/15113

feat(flax): allow encoder_outputs in generate by @borisdayma in https://github.com/huggingface/transformers/pull/15554

Add codecarbon callback to docs by @nateraw in https://github.com/huggingface/transformers/pull/15563

[Flax tests] fix test_model_outputs_equivalence by @patil-suraj in https://github.com/huggingface/transformers/pull/15571

logger.warn --> logger.warning by @ydshieh in https://github.com/huggingface/transformers/pull/15572

PoC for a ProcessorMixin class by @sgugger in https://github.com/huggingface/transformers/pull/15549

add model scaling section by @lvwerra in https://github.com/huggingface/transformers/pull/15119

Upgrade black to version ~=22.0 by @LysandreJik in https://github.com/huggingface/transformers/pull/15565

Make sure custom configs work with Transformers by @sgugger in https://github.com/huggingface/transformers/pull/15569

Add Wav2Vec2 Adapter Weights to Flax by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15566

Click new version by @LysandreJik in https://github.com/huggingface/transformers/pull/15579

[Flax tests/FlaxBert] make from_pretrained test faster by @patil-suraj in https://github.com/huggingface/transformers/pull/15561

Add implementation of typical sampling by @cimeister in https://github.com/huggingface/transformers/pull/15504

Constrained Beam Search [without disjunctive decoding] by @cwkeam in https://github.com/huggingface/transformers/pull/15416

Fix tests hub failure by @sgugger in https://github.com/huggingface/transformers/pull/15580

update serving_output for some TF models by @ydshieh in https://github.com/huggingface/transformers/pull/15568

[trainer docs] document how to select specific gpus by @stas00 in https://github.com/huggingface/transformers/pull/15551

[ViTMAE] Add link to script by @NielsRogge in https://github.com/huggingface/transformers/pull/15588

Expand tutorial for custom models by @sgugger in https://github.com/huggingface/transformers/pull/15587

Add Tensorflow handling of ONNX conversion by @Albertobegue in https://github.com/huggingface/transformers/pull/13831

Add example batch size to all commands by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15596

Compute loss independent from decoder for TF EncDec models (as #14139) by @ydshieh in https://github.com/huggingface/transformers/pull/15175

Fix Seq2SeqTrainer for VisionEncoderDecoderModel by @NielsRogge in https://github.com/huggingface/transformers/pull/15603

Add local and TensorFlow ONNX export examples to docs by @lewtun in https://github.com/huggingface/transformers/pull/15604

[deepspeed docs] Correct JSON format by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15600

Small clean up generate by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15611

Mark "code in the Hub" API as experimental by @sgugger in https://github.com/huggingface/transformers/pull/15624

Enable ONNX export when PyTorch and TensorFlow installed in the same env by @lewtun in https://github.com/huggingface/transformers/pull/15625

TF: Add informative warning for inexistent CPU backprop ops by @gante in https://github.com/huggingface/transformers/pull/15612

Add aws studio notebooks by @mishig25 in https://github.com/huggingface/transformers/pull/15606

TF MT5 embeddings resize by @gante in https://github.com/huggingface/transformers/pull/15567

Fix broken link in CTRL docs by @stevhliu in https://github.com/huggingface/transformers/pull/15615

Fix _configuration_file argument getting passed to model by @sgugger in https://github.com/huggingface/transformers/pull/15629

[deepspeed docs] misc additions by @stas00 in https://github.com/huggingface/transformers/pull/15585

[research_projects] deal with security alerts by @stas00 in https://github.com/huggingface/transformers/pull/15594

Custom feature extractor by @sgugger in https://github.com/huggingface/transformers/pull/15630

Fix grammar in tokenizer_summary docs by @derenrich in https://github.com/huggingface/transformers/pull/15614

Add push to hub to feature extractor by @sgugger in https://github.com/huggingface/transformers/pull/15632

[Fix doc example] FlaxVisionEncoderDecoder by @ydshieh in https://github.com/huggingface/transformers/pull/15626

Fix a bug that QuestionAnsweringPipeline ignores max_seq_len parameter by @wptoux in https://github.com/huggingface/transformers/pull/15238

Report only the failed imports in requires_backends by @tkukurin in https://github.com/huggingface/transformers/pull/15636

Make Swin work with VisionEncoderDecoderModel by @NielsRogge in https://github.com/huggingface/transformers/pull/15527

Remove redundant error logging in from_pretrained() method by @lewtun in https://github.com/huggingface/transformers/pull/15631

Register feature extractor by @sgugger in https://github.com/huggingface/transformers/pull/15634

Fix bug where the log for RNG states that are not properly loaded leads to an exception by @muzhi1991 in https://github.com/huggingface/transformers/pull/15638

[SpeechEncoderDecoder] Make sure no EOS is generated in test by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15655

Require tokenizers>=0.11.1 by @aphedges in https://github.com/huggingface/transformers/pull/15266

Fix ASR pipelines from local directories with wav2vec models that have language models attached by @versae in https://github.com/huggingface/transformers/pull/15590

Fix typo in speech2text2 doc by @jonrbates in https://github.com/huggingface/transformers/pull/15617

Allow custom code for Processors by @sgugger in https://github.com/huggingface/transformers/pull/15649

add scores to Wav2Vec2WithLMOutput by @arampacha in https://github.com/huggingface/transformers/pull/15413

Update bad_words_ids usage by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15641

Updated the RAG training with the latest PyTorch Lightning library and Ray by @shamanez in https://github.com/huggingface/transformers/pull/15653

Add section about doc testing by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15659

add a network debug script and document it by @stas00 in https://github.com/huggingface/transformers/pull/15652

Re-export KeyDataset. by @Narsil in https://github.com/huggingface/transformers/pull/15645

Add decoder_kwargs to send to LM on asr pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15646

TF generate refactor - Greedy Search by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15562

[pipeline doc] fix api by @stas00 in https://github.com/huggingface/transformers/pull/15660

Fix TFSequenceSummary's activation by @ydshieh in https://github.com/huggingface/transformers/pull/15643

Fix model equivalence tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15670

Fix vit test by @LysandreJik in https://github.com/huggingface/transformers/pull/15671

Add a missing space in a deprecation message by @bryant1410 in https://github.com/huggingface/transformers/pull/15651

[t5/t0/mt5 models] faster/leaner custom layer norm by @stas00 in https://github.com/huggingface/transformers/pull/14656

Add push_to_hub method to processors by @sgugger in https://github.com/huggingface/transformers/pull/15668

Usage examples for logger by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15657

Fix dec_attn_mask in TFTransfoXLMainLayer by @ydshieh in https://github.com/huggingface/transformers/pull/15665

🔥 Remove build_doc_test github action by @coyotte508 in https://github.com/huggingface/transformers/pull/15680

Add register method to AutoProcessor by @sgugger in https://github.com/huggingface/transformers/pull/15669

[Wav2Vec2ProcessorWithLM] Fix auto processor with lm by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15683

Fix Funnel configuration doc by @ydshieh in https://github.com/huggingface/transformers/pull/15686

Implementation of activations as pytorch modules by @eldarkurtic in https://github.com/huggingface/transformers/pull/15616

Add image classification notebook by @NielsRogge in https://github.com/huggingface/transformers/pull/15667

Minor fix on README.md by @ydshieh in https://github.com/huggingface/transformers/pull/15688

Fix shape by @gchhablani in https://github.com/huggingface/transformers/pull/15696

Add SimMIM by @NielsRogge in https://github.com/huggingface/transformers/pull/15586

Adding a model, more doc for pushing to the hub by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15690

fix CLIP fast tokenizer and change some properties of the slow version by @SaulLu in https://github.com/huggingface/transformers/pull/15067

Fix SiluActivation by @sgugger in https://github.com/huggingface/transformers/pull/15718

Add initializer_std to TFFunnelModelTester with a default value 0.02 by @ydshieh in https://github.com/huggingface/transformers/pull/15684

Fix DETR model deprecation warnings for int div by @gautierdag in https://github.com/huggingface/transformers/pull/15702

Fix LongformerModel hidden states by @ydshieh in https://github.com/huggingface/transformers/pull/15537

style_doc handles decorators in examples by @sgugger in https://github.com/huggingface/transformers/pull/15719

Fix auto model tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15706

Fix HfDeepSpeedConfig argument in Trainer by @jaketae in https://github.com/huggingface/transformers/pull/15711

fix bug in PT speech-encoder-decoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15699

Fix undoing preprocessing step in summarization example by @SSardorf in https://github.com/huggingface/transformers/pull/15741

Fix minor comment typos by @Crabzmatic in https://github.com/huggingface/transformers/pull/15740

add VisionTextDualEncoder and CLIP fine-tuning script by @patil-suraj in https://github.com/huggingface/transformers/pull/15701

Add layer_idx to CrossAttention of GPT2 model by @hyunwoongko in https://github.com/huggingface/transformers/pull/15730

TF text classification examples by @gante in https://github.com/huggingface/transformers/pull/15704

revert temporary addition to test next version of CLIPTokenizerFast by @SaulLu in https://github.com/huggingface/transformers/pull/15717

added link to our writing-doc document by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15756

TF train_step docstring by @gante in https://github.com/huggingface/transformers/pull/15755

Gelu10 by @mfuntowicz in https://github.com/huggingface/transformers/pull/15676

fixed pipeline code by @Moumeneb1 in https://github.com/huggingface/transformers/pull/15607

Fix typo on examples/pytorch/question-answering by @dreamgonfly in https://github.com/huggingface/transformers/pull/15644

Cleanup transformers-cli by @julien-c in https://github.com/huggingface/transformers/pull/15767

Fix HfArgumentParser when passing a generator by @bryant1410 in https://github.com/huggingface/transformers/pull/15758

Adding ZeroShotImageClassificationPipeline by @Narsil in https://github.com/huggingface/transformers/pull/12119

[M2M100, XGLM] fix create_position_ids_from_inputs_embeds by @patil-suraj in https://github.com/huggingface/transformers/pull/15751

Supporting Merges.txt files that contain an endline (hf-internal-testing/tiny-clip for instance) by @Narsil in https://github.com/huggingface/transformers/pull/15782

[CLIP] fix gradient checkpointing by @patil-suraj in https://github.com/huggingface/transformers/pull/15789

[ViLT] Fix checkpoint url in config by @patil-suraj in https://github.com/huggingface/transformers/pull/15790

Enable image-segmentation on AutoModelForSemanticSegmentation by @Narsil in https://github.com/huggingface/transformers/pull/15647

[doc] custom_models: mention security features of the Hub by @julien-c in https://github.com/huggingface/transformers/pull/15768

[Wav2Vec2FeatureExtractor] Align documentation with code by @lsb in https://github.com/huggingface/transformers/pull/15468

HTML dev docs by @coyotte508 in https://github.com/huggingface/transformers/pull/15678

Fix indent in doc-builder CI by @coyotte508 in https://github.com/huggingface/transformers/pull/15798

[Test refactor 1/5] Per-folder tests reorganization by @LysandreJik in https://github.com/huggingface/transformers/pull/15725

[Test refactor 2/5] Tests fetcher by @LysandreJik in https://github.com/huggingface/transformers/pull/15726

[Test refactor 3/5] Notification service improvement by @LysandreJik in https://github.com/huggingface/transformers/pull/15727

[Test refactor 4/5] Improve the scheduled tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15728

[Test refactor 5/5] Build docker images by @LysandreJik in https://github.com/huggingface/transformers/pull/15729

Fix build_documentation CI by @coyotte508 in https://github.com/huggingface/transformers/pull/15803

Fix model templates by @LysandreJik in https://github.com/huggingface/transformers/pull/15806

Fix add-new-model-like when old model checkpoint is not found by @sgugger in https://github.com/huggingface/transformers/pull/15805

Fix from_pretrained with default base_model_prefix by @sgugger in https://github.com/huggingface/transformers/pull/15814

Revert changes in logit size for semantic segmentation models by @sgugger in https://github.com/huggingface/transformers/pull/15722

[Unispeech] Fix slow tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15818

[Barthez Tokenizer] Fix saving by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15815

[TFXLNet] Correct tf xlnet generate by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15822

Fixes the "push" CI run by @LysandreJik in https://github.com/huggingface/transformers/pull/15807

Fix semantic segmentation pipeline test by @sgugger in https://github.com/huggingface/transformers/pull/15826

Fix dummy_inputs() to dummy_inputs in symbolic_trace doc string by @pbelevich in https://github.com/huggingface/transformers/pull/15776

Add model specific output classes to PoolFormer model docs by @heytanay in https://github.com/huggingface/transformers/pull/15746

HFTracer.trace should use self.graph to be compatible with torch.fx.Tracer by @pbelevich in https://github.com/huggingface/transformers/pull/15824

Fix tf.concatenate + test past_key_values for TF models by @ydshieh in https://github.com/huggingface/transformers/pull/15774

[examples/summarization and translation] fix readme by @patil-suraj in https://github.com/huggingface/transformers/pull/15833

Add ONNX Runtime quantization for text classification notebook by @echarlaix in https://github.com/huggingface/transformers/pull/15817

Re-enable doctests for the quicktour by @sgugger in https://github.com/huggingface/transformers/pull/15828

Framework split model report by @LysandreJik in https://github.com/huggingface/transformers/pull/15825

[UniSpeechSat] Revert previous incorrect change of slow tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15847

Flax Speech-Encoder-Decoder Model by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15613

Fix (deprecated) ONNX exporter to account for new tf2onnx API by @lewtun in https://github.com/huggingface/transformers/pull/15856

Fixing the timestamps with chunking. by @Narsil in https://github.com/huggingface/transformers/pull/15843

[TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15846

[Benchmark tools] Deprecate all by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15848

Add PT + TF automatic builds by @LysandreJik in https://github.com/huggingface/transformers/pull/15860

Update TF LM examples by @gante in https://github.com/huggingface/transformers/pull/15855

[ViLT] Add link to notebooks by @NielsRogge in https://github.com/huggingface/transformers/pull/15791

Scatter should run on CUDA by @LysandreJik in https://github.com/huggingface/transformers/pull/15872

[vision] Add problem_type support by @NielsRogge in https://github.com/huggingface/transformers/pull/15851

use python 3.7 for flax self-push tests by @patil-suraj in https://github.com/huggingface/transformers/pull/15865

Bump up doc node version to 16 by @mishig25 in https://github.com/huggingface/transformers/pull/15874

No self-hosted by @LysandreJik in https://github.com/huggingface/transformers/pull/15710

fix deepspeed tests by @stas00 in https://github.com/huggingface/transformers/pull/15881

Remove stash for now by @LysandreJik in https://github.com/huggingface/transformers/pull/15882

M2M100 support for ONNX export by @michaelbenayoun in https://github.com/huggingface/transformers/pull/15193

[Bart] Fix implementation note doc by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15879

Add TF generate sample tests with all logit processors by @gante in https://github.com/huggingface/transformers/pull/15852

TF: Update QA example by @gante in https://github.com/huggingface/transformers/pull/15870

Updates in Trainer to support new features in SM Model Parallel library by @rahul003 in https://github.com/huggingface/transformers/pull/15877

Fix tiny typo in docs by @rhjohnstone in https://github.com/huggingface/transformers/pull/15884

Fix Bug in FlaxWav2Vec2 Slow Test by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15887

[SegFormer] Add deprecation warning by @NielsRogge in https://github.com/huggingface/transformers/pull/15889

TF generate refactor - Sample by @gante in https://github.com/huggingface/transformers/pull/15793

[XGLM] run sampling test on CPU to be deterministic by @patil-suraj in https://github.com/huggingface/transformers/pull/15892

Fix SegformerForImageClassification by @NielsRogge in https://github.com/huggingface/transformers/pull/15895

Update delete-dev-doc job to match build-dev-doc by @sgugger in https://github.com/huggingface/transformers/pull/15891

Impressive community contributors

The community contributors below have significantly contributed to the v4.17.0 release. Thank you!

@sayakpaul, for contributing the TensorFlow version of ConvNext

@gchhablani, for contributing PLBart

@edugp, for contributing Data2Vec

New Contributors

@Soonhwan-Kwon made their first contribution in https://github.com/huggingface/transformers/pull/13727

@jonatasgrosman made their first contribution in https://github.com/huggingface/transformers/pull/15428

@ToluClassics made their first contribution in https://github.com/huggingface/transformers/pull/15432

@peregilk made their first contribution in https://github.com/huggingface/transformers/pull/15423

@bugface made their first contribution in https://github.com/huggingface/transformers/pull/15480

@AyushExel made their first contribution in https://github.com/huggingface/transformers/pull/14582

@thinksoso made their first contribution in https://github.com/huggingface/transformers/pull/15403

@davidleonfdez made their first contribution in https://github.com/huggingface/transformers/pull/15473

@sanchit-gandhi made their first contribution in https://github.com/huggingface/transformers/pull/15519

@arron1227 made their first contribution in https://github.com/huggingface/transformers/pull/15084

@cimeister made their first contribution in https://github.com/huggingface/transformers/pull/15504

@cwkeam made their first contribution in https://github.com/huggingface/transformers/pull/15416

@Albertobegue made their first contribution in https://github.com/huggingface/transformers/pull/13831

@derenrich made their first contribution in https://github.com/huggingface/transformers/pull/15614

@tkukurin made their first contribution in https://github.com/huggingface/transformers/pull/15636

@muzhi1991 made their first contribution in https://github.com/huggingface/transformers/pull/15638

@versae made their first contribution in https://github.com/huggingface/transformers/pull/15590

@jonrbates made their first contribution in https://github.com/huggingface/transformers/pull/15617

@arampacha made their first contribution in https://github.com/huggingface/transformers/pull/15413

@FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/transformers/pull/15657

@coyotte508 made their first contribution in https://github.com/huggingface/transformers/pull/15680

@heytanay made their first contribution in https://github.com/huggingface/transformers/pull/15531

@gautierdag made their first contribution in https://github.com/huggingface/transformers/pull/15702

@SSardorf made their first contribution in https://github.com/huggingface/transformers/pull/15741

@Crabzmatic made their first contribution in https://github.com/huggingface/transformers/pull/15740

@dreamgonfly made their first contribution in https://github.com/huggingface/transformers/pull/15644

@lsb made their first contribution in https://github.com/huggingface/transformers/pull/15468

@pbelevich made their first contribution in https://github.com/huggingface/transformers/pull/15776

@sayakpaul made their first contribution in https://github.com/huggingface/transformers/pull/15750

@rahul003 made their first contribution in https://github.com/huggingface/transformers/pull/15877

@rhjohnstone made their first contribution in https://github.com/huggingface/transformers/pull/15884

Full Changelog: https://github.com/huggingface/transformers/compare/v4.16.0...v4.17.0
v4.16.2 (Jan 31, 2022)
Add header (huggingface#15434)

[Hotfix] Fix Swin model outputs (huggingface#15414)

Full Changelog: https://github.com/huggingface/transformers/compare/v4.16.1...v4.16.2
v4.16.1 (Jan 28, 2022)

V4.16.1: Patch Release

Add init to BORT (#15378) by @LysandreJik
v4.16.0 (Jan 27, 2022)
New models

Nyströmformer

The Nyströmformer model was proposed in Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.

The Nyströmformer model overcomes the quadratic complexity of self-attention in the input sequence length by adapting the Nyström method to approximate standard self-attention, enabling input sequences of thousands of tokens.
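To make the idea concrete, here is a minimal NumPy sketch of Nyström-approximated attention, assuming a single head with no masking. Landmarks are segment means of the queries and keys, and `np.linalg.pinv` is swapped in for the iterative pseudo-inverse approximation used in the paper. The function names are illustrative and are not the library's API.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    # Standard softmax attention; the n x n score matrix makes it quadratic in n.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def nystrom_attention(q, k, v, num_landmarks):
    # Nystrom approximation with m landmarks: only n x m, m x m and m x n matrices are built.
    n, d = q.shape
    assert n % num_landmarks == 0, "this sketch assumes n is divisible by num_landmarks"
    seg = n // num_landmarks
    # Landmarks are the means of consecutive segments of the queries and keys.
    q_land = q.reshape(num_landmarks, seg, d).mean(axis=1)
    k_land = k.reshape(num_landmarks, seg, d).mean(axis=1)
    scale = 1.0 / np.sqrt(d)
    f = softmax(q @ k_land.T * scale)       # (n, m)
    a = softmax(q_land @ k_land.T * scale)  # (m, m)
    b = softmax(q_land @ k.T * scale)       # (m, n)
    # np.linalg.pinv stands in for the paper's iterative pseudo-inverse.
    return f @ np.linalg.pinv(a) @ (b @ v)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 16)) for _ in range(3))
approx = nystrom_attention(q, k, v, num_landmarks=8)
print("max abs error vs. full attention:", np.abs(approx - full_attention(q, k, v)).max())
```

With as many landmarks as tokens, the landmarks are the queries and keys themselves and the sketch reproduces full attention; fewer landmarks trade accuracy for a cost that grows linearly with sequence length.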

Add Nystromformer by @novice03 in https://github.com/huggingface/transformers/pull/14659

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=nystromformer

REALM

The REALM model was proposed in REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.

It’s a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions.
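A toy NumPy sketch of the retrieve-then-read marginalization, assuming precomputed query and document embeddings and a reader that already provides p(y | z, x) for each document. Exact top-k by inner product stands in for the approximate maximum inner product search needed at scale; `retrieve` and `marginal_answer_prob` are hypothetical helpers, not the library's REALM classes.

```python
import numpy as np


def retrieve(query_emb, doc_embs, k):
    # Relevance f(x, z) is the inner product of the query and document embeddings.
    scores = doc_embs @ query_emb
    top = np.argsort(-scores)[:k]           # indices of the k highest-scoring documents
    logits = scores[top]
    probs = np.exp(logits - logits.max())
    return top, probs / probs.sum()         # p(z | x), renormalized over the top-k


def marginal_answer_prob(query_emb, doc_embs, reader_probs, k):
    # p(y | x) = sum over retrieved z of p(y | z, x) * p(z | x).
    # reader_probs[z] is a stand-in for the reader's p(y | z, x) on document z.
    top, p_z = retrieve(query_emb, doc_embs, k)
    return float(np.sum(reader_probs[top] * p_z))


doc_embs = np.eye(3)                        # three toy documents
query_emb = np.array([3.0, 1.0, 0.0])
reader_probs = np.array([1.0, 0.0, 0.5])    # hypothetical reader outputs
print(marginal_answer_prob(query_emb, doc_embs, reader_probs, k=2))
```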

Add REALM by @qqaatw in https://github.com/huggingface/transformers/pull/13292

Add FastTokenizer to REALM by @qqaatw in https://github.com/huggingface/transformers/pull/15211

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=realm

ViTMAE

The ViTMAE model was proposed in Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.

The paper shows that, by pre-training a Vision Transformer (ViT) to reconstruct pixel values for masked patches, one can get results after fine-tuning that outperform supervised pre-training.
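The masking step can be sketched in NumPy as follows, assuming channels-last images whose sides are divisible by the patch size and the 75% mask ratio from the paper. Patches are shuffled per sample with random noise, only the visible ones would reach the encoder, and the reconstruction loss is averaged over masked patches only. The helper names are hypothetical; this is not the library's ViTMAE code.

```python
import numpy as np


def patchify(images, patch_size):
    # (batch, H, W, C) -> (batch, num_patches, patch_size * patch_size * C)
    b, h, w, c = images.shape
    p = patch_size
    x = images.reshape(b, h // p, p, w // p, p, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, (h // p) * (w // p), p * p * c)


def random_masking(patches, mask_ratio, rng):
    # Shuffle patch indices per sample using random noise; keep the first (1 - mask_ratio).
    b, n, _ = patches.shape
    len_keep = int(n * (1 - mask_ratio))
    ids_keep = np.argsort(rng.random((b, n)), axis=1)[:, :len_keep]
    visible = np.take_along_axis(patches, ids_keep[:, :, None], axis=1)  # encoder input
    mask = np.ones((b, n))                                             # 1 = masked, 0 = visible
    np.put_along_axis(mask, ids_keep, 0.0, axis=1)
    return visible, mask


def masked_mse(pred, target, mask):
    # Mean squared error per patch, averaged over the masked patches only.
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / mask.sum())


rng = np.random.default_rng(0)
patches = patchify(rng.random((2, 32, 32, 3)), patch_size=8)   # 16 patches per image
visible, mask = random_masking(patches, mask_ratio=0.75, rng=rng)
print(visible.shape, mask.sum(axis=1))
```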

Add MAE by @NielsRogge in https://github.com/huggingface/transformers/pull/15120

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vit_mae

ViLT

The ViLT model was proposed in ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.

ViLT incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for Vision-and-Language Pre-training (VLP).
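A minimal NumPy sketch of how the two modalities can share one transformer, assuming randomly initialized tables with illustrative sizes: word embeddings and linearly projected image patches each get position embeddings plus a modality-type embedding, and are then concatenated into one sequence. Special tokens and the transformer itself are omitted, and every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab, max_text, patch_dim, max_patches = 32, 100, 16, 48, 64  # illustrative sizes

word_emb = rng.standard_normal((vocab, hidden)) * 0.02
text_pos = rng.standard_normal((max_text, hidden)) * 0.02
patch_proj = rng.standard_normal((patch_dim, hidden)) * 0.02   # linear patch projection, no CNN
image_pos = rng.standard_normal((max_patches, hidden)) * 0.02
modality_type = rng.standard_normal((2, hidden)) * 0.02        # row 0 = text, row 1 = image


def vilt_input(token_ids, patches):
    # Embed each modality, add position and modality-type embeddings, then concatenate
    # into a single sequence for one shared transformer encoder.
    text = word_emb[token_ids] + text_pos[: len(token_ids)] + modality_type[0]
    image = patches @ patch_proj + image_pos[: len(patches)] + modality_type[1]
    return np.concatenate([text, image], axis=0)


sequence = vilt_input(np.array([5, 17, 42]), rng.standard_normal((10, patch_dim)))
print(sequence.shape)
```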

Add ViLT by @NielsRogge in https://github.com/huggingface/transformers/pull/14895

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vilt

Swin Transformer

The Swin Transformer was proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

The Swin Transformer serves as a general-purpose backbone for computer vision. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size.
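The windowing can be sketched in NumPy for a single (H, W, C) feature map, assuming H and W are multiples of the window size. Shifted windows come from a cyclic shift by half a window before partitioning; the attention mask Swin applies to the wrapped-around regions, and the attention computation itself, are omitted. The function names are hypothetical.

```python
import numpy as np


def window_partition(x, window_size):
    # (H, W, C) -> (num_windows, window_size * window_size, C); attention then runs per window.
    h, w, c = x.shape
    m = window_size
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)


def shifted_window_partition(x, window_size):
    # Cyclically shift the map by half a window before partitioning, so the new windows
    # straddle the borders of the previous layer's windows (cross-window connection).
    s = window_size // 2
    return window_partition(np.roll(x, shift=(-s, -s), axis=(0, 1)), window_size)


def attention_pairs(h, w, window_size):
    # Query-key pairs scored: global attention is quadratic in h * w, windowed is linear.
    return (h * w) ** 2, (h * w) * window_size ** 2


feature_map = np.arange(8 * 8).reshape(8, 8, 1)
print(window_partition(feature_map, 4)[0, :, 0])
print(shifted_window_partition(feature_map, 4)[0, :, 0])
print(attention_pairs(56, 56, 7))
```

For a 56 x 56 map with 7 x 7 windows, the windowed count is 64 times smaller than the global one, and the gap widens as the image grows.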

Add Swin Transformer by @novice03 in https://github.com/huggingface/transformers/pull/15085

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=swin

YOSO

The YOSO model was proposed in You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.

YOSO approximates standard softmax self-attention via a Bernoulli sampling scheme based on Locality Sensitive Hashing (LSH). In principle, all the Bernoulli random variables can be sampled with a single hash.
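A NumPy sketch of the sampling idea, assuming unit-norm queries and keys and random-hyperplane (SimHash) hashing with tau bits. Each hash round yields one Bernoulli sample for every query-key pair at once, and averaging rounds approaches the expected weight (1 - arccos(q·k)/π)^tau. The second estimator applies the same samples to the values through hash buckets, so no n x n matrix is built. The helper names are hypothetical, and other details from the paper are not reproduced.

```python
import numpy as np


def collision_prob(q, k, tau):
    # Expected YOSO weight for unit vectors: (1 - arccos(q . k) / pi) ** tau, the probability
    # that tau independent random-hyperplane hashes of q and k all agree.
    cos = np.clip(q @ k.T, -1.0, 1.0)
    return (1.0 - np.arccos(cos) / np.pi) ** tau


def simhash_codes(x, planes):
    # tau-bit random-hyperplane code for each row of x, packed into one integer per row.
    bits = (x @ planes > 0).astype(np.int64)          # (n, tau)
    return bits @ (1 << np.arange(planes.shape[1]))


def sampled_weights(q, k, tau, num_hashes, rng):
    # Each hash round gives one Bernoulli sample for every (query, key) pair at once:
    # 1 if their codes collide, 0 otherwise. Averaging rounds estimates collision_prob.
    total = np.zeros((q.shape[0], k.shape[0]))
    for _ in range(num_hashes):
        planes = rng.standard_normal((q.shape[1], tau))
        total += simhash_codes(q, planes)[:, None] == simhash_codes(k, planes)[None, :]
    return total / num_hashes


def yoso_output(q, k, v, tau, num_hashes, rng):
    # Same estimate applied to the values without an n x n matrix: keys add their values
    # into 2**tau buckets and each query reads back the bucket it hashes to.
    out = np.zeros((q.shape[0], v.shape[1]))
    for _ in range(num_hashes):
        planes = rng.standard_normal((q.shape[1], tau))
        buckets = np.zeros((2 ** tau, v.shape[1]))
        np.add.at(buckets, simhash_codes(k, planes), v)
        out += buckets[simhash_codes(q, planes)]
    return out / num_hashes


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
q /= np.linalg.norm(q, axis=1, keepdims=True)
k = rng.standard_normal((5, 8))
k /= np.linalg.norm(k, axis=1, keepdims=True)
v = rng.standard_normal((5, 3))
print(yoso_output(q, k, v, tau=2, num_hashes=100, rng=rng))
```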

Add YOSO by @novice03 in https://github.com/huggingface/transformers/pull/15091

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=yoso

Add model like

To help contributors add new models to Transformers more easily, there is a new command that clones an existing model and sets up the various hooks in the library, so that you only have to write the tweaks needed in the modeling file. Just run transformers-cli add-new-model-like and fill out the questionnaire!

Add model like by @sgugger in https://github.com/huggingface/transformers/pull/14992

Training scripts

New training scripts were introduced: one for speech seq2seq models and one for image pre-training that leverages the ViTMAE models. An image captioning example in Flax has also been added to the library.

Add Speech Seq2Seq Training script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14792

[ViTMAE] Add image pretraining script by @NielsRogge in https://github.com/huggingface/transformers/pull/15242

Add Flax image captioning example by @ydshieh in https://github.com/huggingface/transformers/pull/14864

Pipelines

This release adds support for long files in automatic speech recognition (ASR), as well as support for audio models combined with a language model (LM), which lowers the WER on many tasks. See the blog post. Argument handling and framework support also continue to become more consistent across all pipelines.
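The chunking idea behind long-file support can be sketched in pure Python: audio is cut into overlapping windows, and the strided context at each window's edges is discarded when the per-window outputs are stitched back together. The sketch works in sample counts rather than the seconds taken by the pipeline's chunk_length_s argument, and `chunk_with_stride` is a hypothetical helper, not the pipeline's internal code.

```python
def chunk_with_stride(num_samples, chunk_len, stride_left, stride_right):
    # Split [0, num_samples) into overlapping windows. Each window is transcribed on its own;
    # its first stride_left and last stride_right samples only provide context, so when
    # stitching, a window contributes only its central "keep" region (except at the edges).
    step = chunk_len - stride_left - stride_right
    assert step > 0, "strides must leave a non-empty central region"
    chunks, start = [], 0
    while True:
        end = min(start + chunk_len, num_samples)
        keep_start = start if start == 0 else start + stride_left
        keep_end = end if end == num_samples else end - stride_right
        chunks.append({"window": (start, end), "keep": (keep_start, keep_end)})
        if end == num_samples:
            break
        start += step
    return chunks


for chunk in chunk_with_stride(num_samples=100, chunk_len=30, stride_left=5, stride_right=5):
    print(chunk)
```

The kept regions tile the input with no gaps or overlaps, which is what allows the per-window outputs to be concatenated while every kept sample still had context on both sides.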

Large audio chunking for the existing ASR pipeline by @anton-l in https://github.com/huggingface/transformers/pull/14896

Enabling TF on image-classification pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15030

Pipeline ASR with LM. by @Narsil in https://github.com/huggingface/transformers/pull/15071

ChunkPipeline: batch_size enabled on zero-cls and qa pipelines. by @Narsil in https://github.com/huggingface/transformers/pull/14225

PyTorch improvements

The ELECTRA model can now be used as a decoder, enabling an ELECTRA encoder-decoder model.

Add ElectraForCausalLM -> Enable Electra encoder-decoder model by @stancld in https://github.com/huggingface/transformers/pull/14729

TensorFlow improvements

Keras metric callback by @Rocketknight1 and @merveenoyan in https://github.com/huggingface/transformers/pull/14867

The vision encoder-decoder model can now be used in TensorFlow.

Add TFVisionEncoderDecoderModel by @ydshieh in https://github.com/huggingface/transformers/pull/14148

CLIP gets ported to TensorFlow.

Add TFCLIPModel by @ydshieh in https://github.com/huggingface/transformers/pull/13967

Flax improvements

RoFormer gets ported to Flax.

Add Flax RoFormer by @stancld in https://github.com/huggingface/transformers/pull/15005

Deprecations

Deprecates AdamW and adds --optim by @manuelciosici in https://github.com/huggingface/transformers/pull/14744

Documentation

The documentation has been fully migrated to Markdown. If you are contributing, make sure to read the updated guide on how to write good docstrings.

Convert rst files by @sgugger in https://github.com/huggingface/transformers/pull/14888

Doc styler v2 by @sgugger in https://github.com/huggingface/transformers/pull/14950

Convert last rst file by @sgugger in https://github.com/huggingface/transformers/pull/14952

Doc styler examples by @sgugger in https://github.com/huggingface/transformers/pull/14953

[doc] consistent True/False/None default format by @stas00 in https://github.com/huggingface/transformers/pull/14951

[doc] :obj: hunt by @stas00 in https://github.com/huggingface/transformers/pull/14954

[doc] :class: hunt by @stas00 in https://github.com/huggingface/transformers/pull/14955

Bugfixes and improvements

Fix installation instructions for BART ONNX example by @lewtun in https://github.com/huggingface/transformers/pull/14885

Fix doc examples: ... takes no keyword arguments by @ydshieh in https://github.com/huggingface/transformers/pull/14701

Fix AttributeError from PreTrainedTokenizerFast.decoder by @aphedges in https://github.com/huggingface/transformers/pull/14691

Add 'with torch.no_grad()' to ALBERT integration test forward pass by @henholm in https://github.com/huggingface/transformers/pull/14808

Add ONNX support for MarianMT models by @lewtun in https://github.com/huggingface/transformers/pull/14586

add custom stopping criteria to human eval script by @lvwerra in https://github.com/huggingface/transformers/pull/14897

Set run_name in MLflowCallback by @YangDong2002 in https://github.com/huggingface/transformers/pull/14894

[AutoTokenizer] Fix incorrect from pretrained by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14900

[Tests] Update speech diarization and WavLM tolerances by @anton-l in https://github.com/huggingface/transformers/pull/14902

[doc] post-porting by @stas00 in https://github.com/huggingface/transformers/pull/14890

[Generate] Remove attention_mask and integrate model_main_input_name by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14856

Fix failing GPU trainer tests by @sgugger in https://github.com/huggingface/transformers/pull/14903

Better logic for getting tokenizer config in AutoTokenizer by @sgugger in https://github.com/huggingface/transformers/pull/14906

[doc] install - add link to jax installation by @stas00 in https://github.com/huggingface/transformers/pull/14912

[WavLM] fix wavlm docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14910

Fix Perceiver docs by @Sanster in https://github.com/huggingface/transformers/pull/14917

fix to issue #14833 in data_collator - consider no labels by @kleinay in https://github.com/huggingface/transformers/pull/14930

Fix duplicate call to save_checkpoint when using deepspeed by @MihaiBalint in https://github.com/huggingface/transformers/pull/14946

[WavLM] give model more precision tolerance in tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14958

[Speech Recognition Examples] Update README.md by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14965

[Tests] Speed up tokenizer tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14964

[Wav2Vec2] Rename model's feature extractor to feature encoder by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14959

Replace assertion with exception by @jaketae in https://github.com/huggingface/transformers/pull/14970

remove absl workaround as it's no longer needed by @stas00 in https://github.com/huggingface/transformers/pull/14909

Fixing a pathological case for slow tokenizers by @Narsil in https://github.com/huggingface/transformers/pull/14981

[AutoProcessor] Correct AutoProcessor and automatically add processor… by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14881

[Generate] correct encoder_outputs are passed without attention_mask by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14980

Adding num_return_sequences support for text2text generation. by @Narsil in https://github.com/huggingface/transformers/pull/14988

Enabling tokenizers upgrade. by @Narsil in https://github.com/huggingface/transformers/pull/14941

Allow training to resume even if RNG states are not properly loaded by @sgugger in https://github.com/huggingface/transformers/pull/14994

Map model_type and doc pages names by @sgugger in https://github.com/huggingface/transformers/pull/14944

Fixing t2t pipelines lists outputs. by @Narsil in https://github.com/huggingface/transformers/pull/15008

Improve truncation_side by @Narsil in https://github.com/huggingface/transformers/pull/14947

Fix doc examples: name 'torch' is not defined by @ydshieh in https://github.com/huggingface/transformers/pull/15016

[Tests] Correct Wav2Vec2 & WavLM tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15015

[doc] Update parallelism.mdx by @hyunwoongko in https://github.com/huggingface/transformers/pull/15013

Fix Code block speech pretraining example by @flozi00 in https://github.com/huggingface/transformers/pull/14983

Fix a little typo by @milyiyo in https://github.com/huggingface/transformers/pull/15002

Hotfix chunk_length_s instead of _ms. by @Narsil in https://github.com/huggingface/transformers/pull/15029

[doc] Update parallelism.mdx by @hyunwoongko in https://github.com/huggingface/transformers/pull/15018

[megatron convert] PYTHONPATH requirements by @stas00 in https://github.com/huggingface/transformers/pull/14956

Fix doc example: mask_time_indices (numpy) has no attribute 'to' by @ydshieh in https://github.com/huggingface/transformers/pull/15033

Adding QoL for batch_size arg (like others enabled everywhere). by @Narsil in https://github.com/huggingface/transformers/pull/15027

[CLIP] Fix PT test by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15041

[SpeechEncoderDecoder] Fix from pretrained by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15043

[CLIP] Fix TF test by @patil-suraj in https://github.com/huggingface/transformers/pull/15042

Wrap Roberta integration test forward passes with torch.no_grad() by @mattchurgin in https://github.com/huggingface/transformers/pull/15037

Add Detectron2 to Github actions by @NielsRogge in https://github.com/huggingface/transformers/pull/15053

Remove old asserts. by @Narsil in https://github.com/huggingface/transformers/pull/15012

Add 'with torch.no_grad()' to BertGeneration integration test forward passes by @itsTurner in https://github.com/huggingface/transformers/pull/14963

Update run_speech_recognition_seq2seq.py (max_eval_samples instead of train_samples) by @flozi00 in https://github.com/huggingface/transformers/pull/14967

[VisionTextDualEncoder] Fix doc example by @ydshieh in https://github.com/huggingface/transformers/pull/15057

Resubmit changes after rebase to master by @kct22aws in https://github.com/huggingface/transformers/pull/14982

[Fix doc examples] missing from_pretrained by @ydshieh in https://github.com/huggingface/transformers/pull/15044

[VisionTextDualEncoder] Add token_type_ids param by @ydshieh in https://github.com/huggingface/transformers/pull/15073

Fix convert for newer megatron-lm bert model by @yoquankara in https://github.com/huggingface/transformers/pull/14082

[Wav2Vec2 Speech Event] Add speech event v2 by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15083

fix model table cell text alignment by @ydshieh in https://github.com/huggingface/transformers/pull/14999

Update check_repo.py by @kamalkraj in https://github.com/huggingface/transformers/pull/15014

Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x by @cody-moveworks in https://github.com/huggingface/transformers/pull/15019

Change assignee for tokenizers by @LysandreJik in https://github.com/huggingface/transformers/pull/15088

support the trocr small models by @liminghao1630 in https://github.com/huggingface/transformers/pull/14893

[Fix doc example] RagModel by @ydshieh in https://github.com/huggingface/transformers/pull/15076

Model summary doc page horizontal banners by @mishig25 in https://github.com/huggingface/transformers/pull/15058

Use tqdm.auto in Pipeline docs by @bryant1410 in https://github.com/huggingface/transformers/pull/14920

[doc] normalize HF Transformers string by @stas00 in https://github.com/huggingface/transformers/pull/15023

Happy New Year! by @sgugger in https://github.com/huggingface/transformers/pull/15094

[DOC] fix doc examples for bart-like models by @patil-suraj in https://github.com/huggingface/transformers/pull/15093

[performance doc] Power and Cooling by @stas00 in https://github.com/huggingface/transformers/pull/14935

Add test to check reported training loss by @sgugger in https://github.com/huggingface/transformers/pull/15096

Take gradient accumulation into account when defining samplers by @sgugger in https://github.com/huggingface/transformers/pull/15095

[Fix doc example] Speech2TextForConditionalGeneration by @ydshieh in https://github.com/huggingface/transformers/pull/15092

Fix cookiecutter by @NielsRogge in https://github.com/huggingface/transformers/pull/15100

[Wav2Vec2ProcessorWithLM] improve decoder download by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15040

Adds IBERT to models exportable with ONNX by @MaximovaIrina in https://github.com/huggingface/transformers/pull/14868

change metric_key_prefix in seq2seq_trainer.py by @JejuWayfarer in https://github.com/huggingface/transformers/pull/15099

Print out durations of all scheduled tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15102

Fix failing W2V2 test by @LysandreJik in https://github.com/huggingface/transformers/pull/15104

Doc styler tip by @sgugger in https://github.com/huggingface/transformers/pull/15105

Update ONNX docs by @lewtun in https://github.com/huggingface/transformers/pull/14904

Fix saving FlaubertTokenizer configs by @vmaryasin in https://github.com/huggingface/transformers/pull/14991

Update TF test_step to match train_step by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15111

use block_size instead of max_seq_length in tf run_clm example by @riklopfer in https://github.com/huggingface/transformers/pull/15036

fix: switch from slow to generic tokenizer class by @lvwerra in https://github.com/huggingface/transformers/pull/15122

Fix TFEncoderDecoder labels handling #14357 by @ydshieh in https://github.com/huggingface/transformers/pull/15001

Add ONNX configuration classes to docs by @lewtun in https://github.com/huggingface/transformers/pull/15121

Add with torch.no_grad() to DistilBERT integration test forward pass by @jaketae in https://github.com/huggingface/transformers/pull/14979

mBART support for run_summarization.py by @banda-larga in https://github.com/huggingface/transformers/pull/15125

doc-builder -> doc-build by @LysandreJik in https://github.com/huggingface/transformers/pull/15134

[Fix doc example] - ProphetNetDecoder by @ydshieh in https://github.com/huggingface/transformers/pull/15124

[examples/flax/language-modeling] set loglevel by @stas00 in https://github.com/huggingface/transformers/pull/15129

Update model_sharing.mdx by @carlos-aguayo in https://github.com/huggingface/transformers/pull/15142

Enable AMP for xla:gpu device in trainer class by @ymwangg in https://github.com/huggingface/transformers/pull/15022

[deepspeed tests] fix summarization by @stas00 in https://github.com/huggingface/transformers/pull/15149

Check the repo consistency in model templates test by @sgugger in https://github.com/huggingface/transformers/pull/15141

Add TF glu activation function by @gante in https://github.com/huggingface/transformers/pull/15146

Make sure all submodules are properly registered by @sgugger in https://github.com/huggingface/transformers/pull/15144

[Fix doc example] - OpenAIGPTDoubleHeadsModel by @ydshieh in https://github.com/huggingface/transformers/pull/15143

fix BertTokenizerFast tokenize_chinese_chars arg by @SaulLu in https://github.com/huggingface/transformers/pull/15158

Fix typo in test_configuration_common.py by @novice03 in https://github.com/huggingface/transformers/pull/15160

Add "open in hf spaces" gradio button issue #73 by @AK391 in https://github.com/huggingface/transformers/pull/15106

TF Bert inference - support np.ndarray optional arguments by @gante in https://github.com/huggingface/transformers/pull/15074

Fixing flaky test (hopefully). by @Narsil in https://github.com/huggingface/transformers/pull/15154

Better dummies by @sgugger in https://github.com/huggingface/transformers/pull/15148

Update from keras2onnx to tf2onnx by @gante in https://github.com/huggingface/transformers/pull/15162

[doc] performance: Efficient Software Prebuilds by @stas00 in https://github.com/huggingface/transformers/pull/15147

[Speech models] Disable non-existing chunking in tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15163

Added forward pass of test_inference_image_classification_head by @MrinalTyagi in https://github.com/huggingface/transformers/pull/14777

Fix dtype issue in TF BART by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15178

[doc] new MoE paper by @stas00 in https://github.com/huggingface/transformers/pull/15184

Mark bad tokenizers version by @sgugger in https://github.com/huggingface/transformers/pull/15188

[Fix doc example] UniSpeechSatForPreTraining by @ydshieh in https://github.com/huggingface/transformers/pull/15152

is_ctc needs to be updated to `self.type == "ctc"`. by @Narsil in https://github.com/huggingface/transformers/pull/15194

[Fix doc example] TFRagModel by @ydshieh in https://github.com/huggingface/transformers/pull/15187

Error when code examples are improperly closed by @sgugger in https://github.com/huggingface/transformers/pull/15186

Fix deprecation warnings for int div by @sgugger in https://github.com/huggingface/transformers/pull/15180

Copies and docstring styling by @sgugger in https://github.com/huggingface/transformers/pull/15202

[ASR pipeline] correct with lm pipeline by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15200

Remove dependency to quiet Dependabot by @sgugger in https://github.com/huggingface/transformers/pull/15205

Ignore empty subfolders when identifying submodules by @sgugger in https://github.com/huggingface/transformers/pull/15204

[MBartTokenizer] remove dep on xlm-roberta tokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15201

fix: #14486 do not use BertPooler in DPR by @PaulLerner in https://github.com/huggingface/transformers/pull/15068

[Fix doc example] Wrong checkpoint name by @ydshieh in https://github.com/huggingface/transformers/pull/15079

[Robust Speech Event] Add guides by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15155

Enable tqdm toggling by @jaketae in https://github.com/huggingface/transformers/pull/15167

[FLAX] glue training example refactor by @kamalkraj in https://github.com/huggingface/transformers/pull/13815

Rename compute_loss in TF models by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15207

Build dev documentation by @LysandreJik in https://github.com/huggingface/transformers/pull/15210

[Fix doc example] TFFunnelTokenizer' is not defined by @ydshieh in https://github.com/huggingface/transformers/pull/15225

Correct Speech Event Readme by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15226

[ViTMAE] Various fixes by @NielsRogge in https://github.com/huggingface/transformers/pull/15221

[Speech Event] Fix speech event readme by @patil-suraj in https://github.com/huggingface/transformers/pull/15227

Fix typo in BERT tokenization file by @qqaatw in https://github.com/huggingface/transformers/pull/15228

Fix PR number by @LysandreJik in https://github.com/huggingface/transformers/pull/15231

Adapt Common Voice Talk Title and Abstract by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15233

Update Trainer code example by @NielsRogge in https://github.com/huggingface/transformers/pull/15070

Make chunking smartly (long files) work on asr ctc_with_lm. by @Narsil in https://github.com/huggingface/transformers/pull/15219

Fix usage of additional kwargs in from_encoder_decoder_pretrained in encoder-decoder models by @jsnfly in https://github.com/huggingface/transformers/pull/15056

Update README.md by @anton-l in https://github.com/huggingface/transformers/pull/15239

Update README.md by @anton-l in https://github.com/huggingface/transformers/pull/15246

Update pipelines.mdx by @kamalkraj in https://github.com/huggingface/transformers/pull/15243

[Fix doc example] missing import by @ydshieh in https://github.com/huggingface/transformers/pull/15240

Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15234

Make sure to raise NotImplementedError with correct method name by @kumapo in https://github.com/huggingface/transformers/pull/15253

Fix crash when logs are empty because Keras has wiped them out of spite by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15258

Tentative workflow improvement by @LysandreJik in https://github.com/huggingface/transformers/pull/15255

Fix code examples by @NielsRogge in https://github.com/huggingface/transformers/pull/15257

Adds missing module_specs for usages of _LazyModule by @jkuball in https://github.com/huggingface/transformers/pull/15230

Prepare ONNX export for torch v1.11 by @lewtun in https://github.com/huggingface/transformers/pull/15270

Fix by @novice03 in https://github.com/huggingface/transformers/pull/15276

Move BART + ONNX example to research_projects by @lewtun in https://github.com/huggingface/transformers/pull/15271

Specify providers explicitly in ORT session initialization by @wangyems in https://github.com/huggingface/transformers/pull/15235

Fixes Benchmark example link by @evandrosks in https://github.com/huggingface/transformers/pull/15278

[Robust Speech Challenge] Add timeline by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15274

[Fix doc example] TFLayoutLMForTokenClassification: missing import tf by @ydshieh in https://github.com/huggingface/transformers/pull/15268

[Wav2Vec2ProcessorWithLM] improve multi processing by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15247

Refine errors for pretrained objects by @sgugger in https://github.com/huggingface/transformers/pull/15261

[PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15272

Update eval.py by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15310

Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/15290

Fix a typo in tag addition by @sgugger in https://github.com/huggingface/transformers/pull/15286

Remove old debug code leftover. by @Narsil in https://github.com/huggingface/transformers/pull/15306

[Fix doc example] fix missing import jnp by @ydshieh in https://github.com/huggingface/transformers/pull/15291

[LayoutLMV2 Tests] Make sure input is on GPU by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15314

Replace NystromformerTokenizer with AutoTokenizer by @novice03 in https://github.com/huggingface/transformers/pull/15312

[Beam Search] Correct returned beam scores by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14654

[Examples] Correct run ner label2id for fine-tuned models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15017

Avoid using get_list_of_files by @sgugger in https://github.com/huggingface/transformers/pull/15287

[Tests] Fix test by @NielsRogge in https://github.com/huggingface/transformers/pull/15324

Add 🤗 Accelerate tutorial by @stevhliu in https://github.com/huggingface/transformers/pull/15263

Added missing code in exemplary notebook - custom datasets fine-tuning by @Pawloch247 in https://github.com/huggingface/transformers/pull/15300

Fix encoder-decoder models when labels is passed by @ydshieh in https://github.com/huggingface/transformers/pull/15172

Fix table formatting in SegFormer docs by @deppen8 in https://github.com/huggingface/transformers/pull/15337

Fix deepspeed docs by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15346

Fix 'eval_split_name' described as defaulting to 'train' by @FremyCompany in https://github.com/huggingface/transformers/pull/15348

Update doc writing guide by @sgugger in https://github.com/huggingface/transformers/pull/15350

Add YOSO by @novice03 in https://github.com/huggingface/transformers/pull/15091

[docs] post-PR merge fix by @stas00 in https://github.com/huggingface/transformers/pull/15355

Fix YosoConfig doc by @sgugger in https://github.com/huggingface/transformers/pull/15353

[DocTests Speech] Add doc tests for all speech models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15031

Push to hub save by @sgugger in https://github.com/huggingface/transformers/pull/15327

Fix KerasMetricCallback prediction with generate() and inference of column names by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15351

Add a device argument to the eval script by @anton-l in https://github.com/huggingface/transformers/pull/15371

improve saving strategy of sentencepiece tokenizer by @SaulLu in https://github.com/huggingface/transformers/pull/15328

Implement fixes for TrainingArguments doc by @sgugger in https://github.com/huggingface/transformers/pull/15370

Super-small fix stops us confusing Keras console logging by modifying… by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15373

Add proper documentation for Keras callbacks by @sgugger in https://github.com/huggingface/transformers/pull/15374

Example script for PushToHubCallback by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15375

Impressive community contributors

The community contributors below have significantly contributed to the v4.16.0 release. Thank you!

@novice03, for contributing Nyströmformer, Swin Transformer and YOSO

@qqaatw, for contributing REALM

@stancld, for adding support for ELECTRA as a decoder, and porting RoFormer to Flax

@ydshieh, for a myriad of documentation fixes, the port of CLIP to TensorFlow, the addition of the TensorFlow vision encoder-decoder model, and the contribution of an image captioning example in Flax.

New Contributors

@YangDong2002 made their first contribution in https://github.com/huggingface/transformers/pull/14894

@Sanster made their first contribution in https://github.com/huggingface/transformers/pull/14917

@kleinay made their first contribution in https://github.com/huggingface/transformers/pull/14930

@MihaiBalint made their first contribution in https://github.com/huggingface/transformers/pull/14946

@milyiyo made their first contribution in https://github.com/huggingface/transformers/pull/15002

@mattchurgin made their first contribution in https://github.com/huggingface/transformers/pull/15037

@itsTurner made their first contribution in https://github.com/huggingface/transformers/pull/14963

@kct22aws made their first contribution in https://github.com/huggingface/transformers/pull/14982

@yoquankara made their first contribution in https://github.com/huggingface/transformers/pull/14082

@cody-moveworks made their first contribution in https://github.com/huggingface/transformers/pull/15019

@MaximovaIrina made their first contribution in https://github.com/huggingface/transformers/pull/14868

@JejuWayfarer made their first contribution in https://github.com/huggingface/transformers/pull/15099

@novice03 made their first contribution in https://github.com/huggingface/transformers/pull/14659

@banda-larga made their first contribution in https://github.com/huggingface/transformers/pull/15125

@manuelciosici made their first contribution in https://github.com/huggingface/transformers/pull/14744

@carlos-aguayo made their first contribution in https://github.com/huggingface/transformers/pull/15142

@gante made their first contribution in https://github.com/huggingface/transformers/pull/15146

@AK391 made their first contribution in https://github.com/huggingface/transformers/pull/15106

@MrinalTyagi made their first contribution in https://github.com/huggingface/transformers/pull/14777

@jsnfly made their first contribution in https://github.com/huggingface/transformers/pull/15056

@jkuball made their first contribution in https://github.com/huggingface/transformers/pull/15230

@wangyems made their first contribution in https://github.com/huggingface/transformers/pull/15235

@evandrosks made their first contribution in https://github.com/huggingface/transformers/pull/15278

@Pawloch247 made their first contribution in https://github.com/huggingface/transformers/pull/15300

@deppen8 made their first contribution in https://github.com/huggingface/transformers/pull/15337

@ngoquanghuy99 made their first contribution in https://github.com/huggingface/transformers/pull/15346

Full Changelog: https://github.com/huggingface/transformers/compare/v4.15.0...v4.16.0
v4.15.0 (Dec 22, 2021)
New Model additions

WavLM

WavLM was proposed in WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.

WavLM sets a new SOTA on the SUPERB benchmark.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=wavlm

Add WavLM by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14354

Wav2Vec2Phoneme

Wav2Vec2Phoneme was proposed in Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli. Wav2Vec2Phoneme makes it possible to perform phoneme classification as part of automatic speech recognition.

[Wav2Vec2 Phoneme] Let phonemizer lang default to tokenizer's settings by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14829

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=phoneme-recognition

UniSpeech-SAT

UniSpeech-SAT was proposed in UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.

UniSpeech-SAT is especially good at speaker-related tasks.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech-sat

UniSpeech

UniSpeech was proposed in UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech

New Tasks

Speaker Diarization and Verification

Wav2Vec2-like architectures now have speaker diarization and speaker verification heads. You can try out the new task here: https://huggingface.co/spaces/microsoft/wavlm-speaker-verification

Add Speaker Diarization and Verification heads by @anton-l in https://github.com/huggingface/transformers/pull/14723
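A speaker verification head typically produces one fixed-size embedding per utterance, and two utterances are judged to come from the same speaker when their embeddings are similar enough. The pure-Python sketch below shows only that final decision step, assuming cosine similarity and a hypothetical threshold of 0.7; the right threshold depends on the checkpoint and the data, and real embeddings are far larger than these toy 3-dimensional vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a: Sequence[float], emb_b: Sequence[float], threshold: float = 0.7) -> bool:
    """Accept the pair as one speaker if the similarity reaches the threshold.

    The 0.7 default is a hypothetical value; tune it on held-out data.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy "speaker embeddings".
print(same_speaker([0.9, 0.1, 0.2], [0.8, 0.2, 0.1]))   # True: similar direction
print(same_speaker([0.9, 0.1, 0.2], [-0.1, 0.9, -0.3]))  # False: very different
```

Diarization builds on the same kind of embeddings, but assigns a speaker label to each segment of a recording rather than making a single yes/no decision for a pair.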

What's Changed

Move import to avoid circular import by @sgugger in https://github.com/huggingface/transformers/pull/14787

PoC for conserving old links by @sgugger in https://github.com/huggingface/transformers/pull/14754

Removes images to put them in a dataset by @LysandreJik in https://github.com/huggingface/transformers/pull/14781

Post sphinx-clean up and contributing guide updates by @sgugger in https://github.com/huggingface/transformers/pull/14790

Fix the build documentation job by @sgugger in https://github.com/huggingface/transformers/pull/14788

Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/14799

Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/14800

Train step fix by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14796

[Generate] Make generate multi-modal by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14784

Remove require_datasets testing utility by @LysandreJik in https://github.com/huggingface/transformers/pull/14795

[WavLM] Correct position bias computation by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14805

Fix Perceiver multi GPU test by @NielsRogge in https://github.com/huggingface/transformers/pull/14810

[WavLM] Layerdrop is not allowed for first layer by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14811

[Generate] Correct input_ids detection by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14815

Implement head_mask for Flax BERT and other models copied from BERT by @stancld in https://github.com/huggingface/transformers/pull/14620

Convert rst to mdx bert by @LysandreJik in https://github.com/huggingface/transformers/pull/14806

Wav2Vec2 meets phonemes by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14353

[ImageGPT] Deprecate pixel_values input name to input_ids by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14801

[Seq2SeqTrainer] Remove model input name hack by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14802

[WavLM] Fix slow tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14845

Add SD and SV heads for WavLM by @anton-l in https://github.com/huggingface/transformers/pull/14847

Add an argument to set bucket_cap_mb for PyTorch DDP by @changlan in https://github.com/huggingface/transformers/pull/14756

Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/14835

Fix dead link to benchmarks.ipynb by @DerekChia in https://github.com/huggingface/transformers/pull/14842

[Perceiver] Skip multi-gpu tests for now by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14813

Add 'with torch.no_grad()' to DeBERTa integration test forward pass by @henholm in https://github.com/huggingface/transformers/pull/14821

Add 'with torch.no_grad()' to BERT integration test forward pass by @henholm in https://github.com/huggingface/transformers/pull/14820

Add a main_input_name attribute to all models by @sgugger in https://github.com/huggingface/transformers/pull/14803

[doc] typo by @stas00 in https://github.com/huggingface/transformers/pull/14849

[logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS by @stas00 in https://github.com/huggingface/transformers/pull/14669

Make the onnx submodule init lazy by @sgugger in https://github.com/huggingface/transformers/pull/14855

Convert docstrings of modeling files by @sgugger in https://github.com/huggingface/transformers/pull/14850

[Bart] better error message by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14854

Only create the model card on process 0 by @sgugger in https://github.com/huggingface/transformers/pull/14857

[ASR example] Improve example + add more examples by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14848

Fix the value error typo of AdamW's betas' valid values checking by @dourgey in https://github.com/huggingface/transformers/pull/14780

Add custom stopping_criteria and logits_processor to generate by @lvwerra in https://github.com/huggingface/transformers/pull/14779

Replace commit sha by commit url for update jobs by @sgugger in https://github.com/huggingface/transformers/pull/14852

[examples/summarization] deal with None in data records by @stas00 in https://github.com/huggingface/transformers/pull/14816

[doc porting] several docs by @stas00 in https://github.com/huggingface/transformers/pull/14858

Mass conversion of documentation from rst to Markdown by @sgugger in https://github.com/huggingface/transformers/pull/14866

Fix FLAX_MULTIPLE_CHOICE_SAMPLE typo by @mishig25 in https://github.com/huggingface/transformers/pull/14871

Fixes in marian doc by @sgugger in https://github.com/huggingface/transformers/pull/14872

Fix FlaxMarianMTModel return block. by @sgugger in https://github.com/huggingface/transformers/pull/14873

Fix doc mistakes by @sgugger in https://github.com/huggingface/transformers/pull/14874

Convert model files from rst to mdx by @LysandreJik in https://github.com/huggingface/transformers/pull/14865

update the arguments add_prefix_space and trim_offsets in backend_tokenizer.post_processor of RobertaTokenizerFast by @SaulLu in https://github.com/huggingface/transformers/pull/14752

Feature/fix slow test in mluke by @Ryou0634 in https://github.com/huggingface/transformers/pull/14749

Updated deberta attention by @guillaume-be in https://github.com/huggingface/transformers/pull/14625

IterableDatasetShard should use per device batch size instead of real… by @SysuCharon in https://github.com/huggingface/transformers/pull/14714

Fix Perceiver code example by @NielsRogge in https://github.com/huggingface/transformers/pull/14879

Fix pytorch image classification example by @mariosasko in https://github.com/huggingface/transformers/pull/14883

Onnx enable tasks for supported models (part 2) by @michaelbenayoun in https://github.com/huggingface/transformers/pull/14700

Properly indent return block by @sgugger in https://github.com/huggingface/transformers/pull/14887

New Contributors

@changlan made their first contribution in https://github.com/huggingface/transformers/pull/14756

@DerekChia made their first contribution in https://github.com/huggingface/transformers/pull/14842

@henholm made their first contribution in https://github.com/huggingface/transformers/pull/14821

@dourgey made their first contribution in https://github.com/huggingface/transformers/pull/14780

@SysuCharon made their first contribution in https://github.com/huggingface/transformers/pull/14714

Full Changelog: https://github.com/huggingface/transformers/compare/v4.14.0...v4.15.0
v4.14.1 (Dec 15, 2021)

v4.14.1 Patch release

Fixes a circular import when TensorFlow and Onnx are both installed (#14787)
v4.14.0 (Dec 15, 2021)
Perceiver

The Perceiver model was released in the previous version:

Perceiver

Eight new models are released as part of the Perceiver implementation: PerceiverModel, PerceiverForMaskedLM, PerceiverForSequenceClassification, PerceiverForImageClassificationLearned, PerceiverForImageClassificationFourier, PerceiverForImageClassificationConvProcessing, PerceiverForOpticalFlow, PerceiverForMultimodalAutoencoding, in PyTorch.

The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.

Add Perceiver IO by @NielsRogge in https://github.com/huggingface/transformers/pull/14487

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver

Version v4.14.0 adds support for Perceiver in multiple pipelines, including the fill mask and sequence classification pipelines.

Keras model cards

The Keras push to hub callback now generates model cards when pushing to the model hub. In addition to the callback, the model.push_to_hub() method now generates model cards by default.

TF model cards by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14720

What's Changed

Fix: wrong link in the documentation (ConvBERT vs DistilBERT) by @Tikquuss in https://github.com/huggingface/transformers/pull/14705

Put back open in colab markers by @sgugger in https://github.com/huggingface/transformers/pull/14684

Fix doc examples: KeyError by @ydshieh in https://github.com/huggingface/transformers/pull/14699

Fix doc examples: 'CausalLMOutput...' object has no attribute 'last_hidden_state' by @ydshieh in https://github.com/huggingface/transformers/pull/14678

Adding Perceiver to AutoTokenizer. by @Narsil in https://github.com/huggingface/transformers/pull/14711

Fix doc examples: unexpected keyword argument by @ydshieh in https://github.com/huggingface/transformers/pull/14689

Automatically build doc notebooks by @sgugger in https://github.com/huggingface/transformers/pull/14718

Fix special character in MDX by @sgugger in https://github.com/huggingface/transformers/pull/14721

Fixing tests for perceiver (texts) by @Narsil in https://github.com/huggingface/transformers/pull/14719

[doc] document MoE model approach and current solutions by @stas00 in https://github.com/huggingface/transformers/pull/14725

[Flax examples] remove dependency on pytorch training args by @patil-suraj in https://github.com/huggingface/transformers/pull/14636

Update bug-report.md by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14715

[Adafactor] Fix adafactor by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14713

Code parrot minor fixes/niceties by @ncoop57 in https://github.com/huggingface/transformers/pull/14666

Fix doc examples: modify config before super().__init__ by @ydshieh in https://github.com/huggingface/transformers/pull/14697

Improve documentation of some models by @NielsRogge in https://github.com/huggingface/transformers/pull/14695

Skip Perceiver tests by @LysandreJik in https://github.com/huggingface/transformers/pull/14745

Add ability to get a list of supported pipeline tasks by @codesue in https://github.com/huggingface/transformers/pull/14732

Fix the perceiver docs by @LysandreJik in https://github.com/huggingface/transformers/pull/14748

[CI/pt-nightly] switch to cuda-11.3 by @stas00 in https://github.com/huggingface/transformers/pull/14726

Swap TF and PT code inside two blocks by @LucienShui in https://github.com/huggingface/transformers/pull/14742

Fix doc examples: cannot import name by @ydshieh in https://github.com/huggingface/transformers/pull/14698

Fix: change tooslow to slow by @ydshieh in https://github.com/huggingface/transformers/pull/14734

Small fixes for the doc by @sgugger in https://github.com/huggingface/transformers/pull/14751

Update transformers metadata by @sgugger in https://github.com/huggingface/transformers/pull/14724

Mention no images added to repository by @LysandreJik in https://github.com/huggingface/transformers/pull/14738

Avoid using tf.tile in embeddings for TF models by @ydshieh in https://github.com/huggingface/transformers/pull/14735

Change how to load config of XLNetLMHeadModel by @josutk in https://github.com/huggingface/transformers/pull/14746

Improve perceiver by @NielsRogge in https://github.com/huggingface/transformers/pull/14750

Convert Trainer doc page to MarkDown by @sgugger in https://github.com/huggingface/transformers/pull/14753

Update Table of Contents by @sgugger in https://github.com/huggingface/transformers/pull/14755

Fixing tests for Perceiver by @Narsil in https://github.com/huggingface/transformers/pull/14739

Make data shuffling in run_clm_flax.py respect global seed by @bminixhofer in https://github.com/huggingface/transformers/pull/13410

Adding support for multiple mask tokens. by @Narsil in https://github.com/huggingface/transformers/pull/14716

Fix broken links to distillation on index page of documentation by @amitness in https://github.com/huggingface/transformers/pull/14722

[doc] performance: groups of operations by compute-intensity by @stas00 in https://github.com/huggingface/transformers/pull/14757

Fix the doc_build_test job by @sgugger in https://github.com/huggingface/transformers/pull/14774

Fix preprocess_function in run_summarization_flax.py by @ydshieh in https://github.com/huggingface/transformers/pull/14769

Simplify T5 docs by @xhlulu in https://github.com/huggingface/transformers/pull/14776

Update Perceiver code examples by @NielsRogge in https://github.com/huggingface/transformers/pull/14783

New Contributors

@Tikquuss made their first contribution in https://github.com/huggingface/transformers/pull/14705

@codesue made their first contribution in https://github.com/huggingface/transformers/pull/14732

@LucienShui made their first contribution in https://github.com/huggingface/transformers/pull/14742

@josutk made their first contribution in https://github.com/huggingface/transformers/pull/14746

@amitness made their first contribution in https://github.com/huggingface/transformers/pull/14722

Full Changelog: https://github.com/huggingface/transformers/compare/v4.13.0...v4.14.0
v4.13.0 (Dec 9, 2021)
New Model additions

Perceiver

Eight new models are released as part of the Perceiver implementation: PerceiverModel, PerceiverForMaskedLM, PerceiverForSequenceClassification, PerceiverForImageClassificationLearned, PerceiverForImageClassificationFourier, PerceiverForImageClassificationConvProcessing, PerceiverForOpticalFlow, PerceiverForMultimodalAutoencoding, in PyTorch.

The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.

Add Perceiver IO by @NielsRogge in https://github.com/huggingface/transformers/pull/14487

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver

mLUKE

An mLUKE tokenizer is added for use with the multilingual variant of LUKE.

The mLUKE model was proposed in mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. It's a multilingual extension of the LUKE model trained on the basis of XLM-RoBERTa.

Add mLUKE by @Ryou0634 in https://github.com/huggingface/transformers/pull/14640

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=luke

ImageGPT

Three new models are released as part of the ImageGPT integration: ImageGPTModel, ImageGPTForCausalImageModeling, ImageGPTForImageClassification, in PyTorch.

The ImageGPT model was proposed in Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. ImageGPT (iGPT) is a GPT-2-like model trained to predict the next pixel value, allowing for both unconditional and conditional image generation.

Add ImageGPT by @NielsRogge in https://github.com/huggingface/transformers/pull/14240

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=imagegpt
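ImageGPT's next-pixel prediction does not operate on raw 8-bit RGB triples: the iGPT paper clusters colors into a palette of 512 clusters with k-means, so each pixel becomes a single token, and the image is read in raster order as a sequence. The pure-Python sketch below illustrates that preprocessing with a hypothetical 4-color palette standing in for the learned clusters; it is not the library's feature extractor.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]

# Hypothetical 4-color palette; ImageGPT uses 512 learned k-means clusters.
PALETTE: List[Pixel] = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]


def nearest_cluster(pixel: Pixel, palette: Sequence[Pixel] = PALETTE) -> int:
    """Index of the palette color closest to `pixel` (squared Euclidean distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, palette[i])),
    )


def image_to_tokens(image: Sequence[Sequence[Pixel]]) -> List[int]:
    """Quantize every pixel and flatten in raster order (left-to-right, top-to-bottom)."""
    return [nearest_cluster(pixel) for row in image for pixel in row]


def next_token_pairs(tokens: Sequence[int]) -> List[Tuple[List[int], int]]:
    """(context, target) pairs for autoregressive next-pixel prediction."""
    return [(list(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


image = [
    [(10, 5, 0), (250, 240, 245)],
    [(200, 30, 20), (20, 10, 230)],
]
tokens = image_to_tokens(image)
print(tokens)                       # [0, 1, 2, 3]
print(next_token_pairs(tokens)[0])  # ([0], 1)
```

Conditional generation then amounts to feeding a prefix of known pixel tokens (for example, the top half of an image) and sampling the rest.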

QDQBert

Eight new models are released as part of the QDQBert implementation: QDQBertModel, QDQBertLMHeadModel, QDQBertForMaskedLM, QDQBertForSequenceClassification, QDQBertForNextSentencePrediction, QDQBertForMultipleChoice, QDQBertForTokenClassification, QDQBertForQuestionAnswering, in PyTorch.

The QDQBERT model follows the approach described in Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.

Add QDQBert model and quantization examples of SQUAD task by @shangz-ai in https://github.com/huggingface/transformers/pull/14066
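QDQBERT inserts quantize/dequantize (Q/DQ) operations so that a model can be calibrated and fine-tuned while simulating integer arithmetic. The pure-Python sketch below shows generic symmetric 8-bit "fake quantization" under stated assumptions: one per-tensor scale taken from the maximum absolute value (max calibration) and a symmetric integer range of [-127, 127]. QDQBert itself relies on NVIDIA's pytorch-quantization toolkit, whose quantizer settings may differ from this sketch.

```python
from typing import List, Sequence


def fake_quantize(values: Sequence[float], num_bits: int = 8) -> List[float]:
    """Quantize to signed integers, then immediately dequantize back to floats.

    Assumptions: symmetric range [-(2**(b-1) - 1), 2**(b-1) - 1] and a single
    per-tensor scale computed from max(|x|) ("max" calibration).
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0 for _ in values]
    scale = amax / qmax
    dequantized = []
    for v in values:
        q = round(v / scale)                 # quantize (Python rounds halves to even)
        q = max(-qmax, min(qmax, q))         # clamp to the integer range
        dequantized.append(q * scale)        # dequantize
    return dequantized


x = [0.5, -1.0, 0.25, 0.013]
print(fake_quantize(x))
```

Each dequantized value differs from its input by at most half a quantization step (`scale / 2`), which is the rounding error the model learns to tolerate during quantization-aware fine-tuning.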

Semantic Segmentation models

The semantic segmentation models' API is unstable and is likely to change between this version and the next.

The first semantic segmentation models are added. In semantic segmentation, the goal is to predict a class label for every pixel of an image. The models that are added are SegFormer (by NVIDIA) and BEiT (by Microsoft Research). BEiT was already available in the library, but this release includes the model with a semantic segmentation head.

The SegFormer model was proposed in SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. The model combines a hierarchical Transformer encoder with a lightweight all-MLP decode head and achieves strong results on image segmentation benchmarks such as ADE20K and Cityscapes.

The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI’s DALL-E model given masked patches.

Add SegFormer by @NielsRogge in https://github.com/huggingface/transformers/pull/14019

Add BeitForSemanticSegmentation by @NielsRogge in https://github.com/huggingface/transformers/pull/14096
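Because semantic segmentation predicts a class label for every pixel, a segmentation head outputs one score per class at each spatial position, and the predicted map is the per-pixel argmax over classes. The pure-Python sketch below shows that post-processing step, assuming logits laid out as logits[class][row][col] and already at the desired resolution. SegFormer, for example, predicts at a reduced resolution (1/4 of the input), so its logits would first need upsampling.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def logits_to_segmentation(logits: Sequence[Grid]) -> List[List[int]]:
    """Return, for every pixel, the index of the class with the highest logit.

    `logits` is indexed as logits[class][row][col]; ties go to the lowest index.
    """
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][r][w]) for w in range(width)]
        for r in range(height)
    ]


# Toy logits for 3 classes on a 2x2 image.
logits = [
    [[2.0, 0.1], [0.0, 0.3]],  # class 0
    [[0.5, 1.5], [0.2, 0.1]],  # class 1
    [[0.1, 0.2], [3.0, 0.9]],  # class 2
]
print(logits_to_segmentation(logits))  # [[0, 1], [2, 2]]
```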

Vision-text dual encoder

Adds the VisionTextDualEncoder model in PyTorch and Flax. It can load any pretrained vision model (ViT, DeiT, BEiT, CLIP's vision model) and any pretrained text model (BERT, RoBERTa) in the library for CLIP-like vision-text tasks.

This model pairs a vision encoder with a text encoder and adds projection layers that map both sets of embeddings into a shared embedding space, which can then be used to align the two modalities.

VisionTextDualEncoder by @patil-suraj in https://github.com/huggingface/transformers/pull/13511
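The dual-encoder idea can be sketched in a few lines of pure Python: each modality gets its own linear projection into a shared space, and every image-text pair is scored by cosine similarity, as in CLIP. The encoder outputs and projection matrices below are small hand-written stand-ins (hypothetical values) for real encoder features and learned weights; this illustrates the technique, not the library's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]  # rows = output dims, columns = input dims


def project(matrix: Matrix, vector: Vector) -> List[float]:
    """Linear projection without bias: matrix @ vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def normalize(vector: Vector) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def similarity_matrix(image_feats, text_feats, w_image: Matrix, w_text: Matrix) -> List[List[float]]:
    """Cosine similarity between every projected image and every projected text."""
    images = [normalize(project(w_image, f)) for f in image_feats]
    texts = [normalize(project(w_text, f)) for f in text_feats]
    return [[sum(a * b for a, b in zip(i, t)) for t in texts] for i in images]


# Hypothetical encoder outputs: 3-dim image features, 4-dim text features.
image_feats = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
text_feats = [[0.9, 0.1, 0.0, 0.4], [0.0, 0.0, 1.0, 0.1]]
# Hypothetical projections into a shared 2-dim space.
w_image = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_text = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

sims = similarity_matrix(image_feats, text_feats, w_image, w_text)
best_text_per_image = [row.index(max(row)) for row in sims]
print(best_text_per_image)  # [0, 1]
```

In practice the projection layers are what get trained (typically with a contrastive loss over image-text pairs), because freshly initialized projections do not yet align the two pretrained encoders.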

CodeParrot

CodeParrot, a model trained to generate code, has been open-sourced in the research projects by @lvwerra.

Add CodeParrot 🦜 codebase by @lvwerra in https://github.com/huggingface/transformers/pull/14536

Language model support for ASR

Add language model support for CTC models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14339

Language model boosted decoding is added for all CTC models via https://github.com/kensho-technologies/pyctcdecode and https://github.com/kpu/kenlm.

See https://huggingface.co/patrickvonplaten/wav2vec2-xlsr-53-es-kenlm for more information.
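Without a language model, CTC output is usually decoded greedily: take the most likely symbol per frame, merge consecutive repeats, then drop the blank symbol. LM-boosted decoders such as pyctcdecode instead search over many hypotheses and add a weighted language-model score. The pure-Python sketch below shows the greedy collapse, plus a much simpler stand-in for LM boosting: re-ranking a fixed list of candidate transcriptions by acoustic log-probability + alpha * LM log-probability + beta * word count. The blank symbol, weights, candidates and LM scores are all hypothetical, and this is not pyctcdecode's algorithm.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

BLANK = "_"  # hypothetical blank symbol


def greedy_ctc_decode(frame_symbols: Sequence[str]) -> str:
    """Collapse per-frame argmax symbols: merge repeats, then remove blanks."""
    merged = [symbol for symbol, _ in groupby(frame_symbols)]
    return "".join(s for s in merged if s != BLANK)


def rescore(
    candidates: Sequence[Tuple[str, float]],
    lm_logprob: Dict[str, float],
    alpha: float = 0.5,
    beta: float = 1.0,
) -> List[Tuple[str, float]]:
    """Sort (text, acoustic_logprob) pairs by a combined score, best first.

    score = acoustic + alpha * LM + beta * number_of_words
    """
    scored = [
        (text, acoustic + alpha * lm_logprob[text] + beta * len(text.split()))
        for text, acoustic in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# A blank between the two "l" runs is what keeps the double letter.
print(greedy_ctc_decode(list("hh_e_ll_llo")))  # hello

# The acoustic model slightly prefers "i scream"; the toy LM flips the ranking.
candidates = [("i scream", -4.0), ("ice cream", -4.2)]
toy_lm = {"i scream": -9.0, "ice cream": -5.0}
print(rescore(candidates, toy_lm)[0][0])  # ice cream
```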

Flax-specific additions

Adds a Flax version of the vision encoder-decoder model and a Flax version of GPT-J.

Add FlaxVisionEncoderDecoderModel by @ydshieh in https://github.com/huggingface/transformers/pull/13359

FlaxGPTJ by @patil-suraj in https://github.com/huggingface/transformers/pull/14396

TensorFlow-specific additions

Vision transformers are here! Convnets are so 2012, now that ML is converging on self-attention as a universal model.

Add TFViTModel by @ydshieh in https://github.com/huggingface/transformers/pull/13778
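The key input transformation in a Vision Transformer is simple to sketch: the image is cut into fixed-size, non-overlapping patches, and each flattened patch becomes one token (after a learned linear projection, omitted here). The pure-Python sketch below handles a single-channel image as a stand-in for real RGB input; with 16x16 patches, a 224x224 image yields 196 tokens.

```python
from typing import List, Sequence


def image_to_patches(image: Sequence[Sequence[float]], patch_size: int) -> List[List[float]]:
    """Split an H x W image into flattened patch_size x patch_size patches.

    Patches are returned in raster order; H and W must be divisible by patch_size.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# 4x4 toy image with pixel values 0..15, split into 2x2 patches.
toy = [[4 * r + c for c in range(4)] for r in range(4)]
print(image_to_patches(toy, 2)[0])  # [0, 1, 4, 5]
print(len(image_to_patches([[0.0] * 224 for _ in range(224)], 16)))  # 196
```

Self-attention then operates over this sequence of patch tokens exactly as it would over word tokens, which is what lets the same Transformer machinery handle images.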

Want to handle real-world tables, where text and data are positioned in a 2D grid? TAPAS is now here for both TensorFlow and PyTorch.

Tapas tf by @kamalkraj in https://github.com/huggingface/transformers/pull/13393

Automatic checkpointing and cloud saves to the Hugging Face Hub during training are now live, allowing you to resume training when it's interrupted, even if your initial instance is terminated. This is an area of very active development - watch this space for future developments, including automatic model card creation and more.

Add model checkpointing to push_to_hub and PushToHubCallback by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14492

Auto-processors

A new class to automatically select processors is added: AutoProcessor. It can be used for all models that require a processor, in both computer vision and audio.

Auto processor by @sgugger in https://github.com/huggingface/transformers/pull/14465

New documentation frontend

A new documentation frontend is out for the transformers library! It is better aligned with the rest of our website and includes tools to improve readability. The documentation can now be written in Markdown rather than RST.

Doc new front by @sgugger in https://github.com/huggingface/transformers/pull/14590

LayoutLM Improvements

The LayoutLMv2 feature extractor now supports non-English languages, and LayoutXLM gets its own processor.

LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. by @Xargonus in https://github.com/huggingface/transformers/pull/14514

Add LayoutXLMProcessor (and LayoutXLMTokenizer, LayoutXLMTokenizerFast) by @NielsRogge in https://github.com/huggingface/transformers/pull/14115

Trainer Improvements

You can now take advantage of Ampere GPUs with the Trainer:

--bf16: run training or evaluation in bfloat16 mixed precision

--bf16_full_eval: run evaluation in full bfloat16

--tf32: turn TF32 mode on or off
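bfloat16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits, so it covers the same numeric range as float32 with much less precision (unlike float16, whose largest finite value is 65504). The pure-Python sketch below rounds a float32 value to the nearest bfloat16 (round-to-nearest-even on the 16 discarded low bits) to build intuition for what the bf16 options trade away. It is an illustration only; NaN handling is deliberately omitted.

```python
import math
import struct


def float32_bits(x: float) -> int:
    """IEEE-754 single-precision bit pattern of x (x is first rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float."""
    if math.isnan(x):
        raise ValueError("NaN handling is omitted in this sketch")
    bits = float32_bits(x)
    # Keep the upper 16 bits; the bias makes the truncation round to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return bits_to_float32((bits + rounding_bias) & 0xFFFF0000)


print(to_bfloat16(3.14159))       # 3.140625: only about 3 significant decimal digits survive
print(to_bfloat16(1.0 + 2**-8))   # 1.0: a tie, rounded to the even neighbor
print(to_bfloat16(1e38) > 65504)  # True: float32's range is preserved
```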

Improvements and bugfixes

Replace assertions with RuntimeError exceptions by @ddrm86 in https://github.com/huggingface/transformers/pull/14186

Adding batch_size support for (almost) all pipelines by @Narsil in https://github.com/huggingface/transformers/pull/13724

Remove n_ctx from configs by @thomasw21 in https://github.com/huggingface/transformers/pull/14165

Add BlenderbotTokenizerFast by @stancld in https://github.com/huggingface/transformers/pull/13720

Adding handle_long_generation parameters for text-generation pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/14118

Fix pipeline tests env and fetch by @sgugger in https://github.com/huggingface/transformers/pull/14209

Generalize problem_type to all sequence classification models by @sgugger in https://github.com/huggingface/transformers/pull/14180

Fixing image segmentation with inference mode. by @Narsil in https://github.com/huggingface/transformers/pull/14204

Add a condition for checking labels by @hrxorxm in https://github.com/huggingface/transformers/pull/14211

Torch 1.10 by @LysandreJik in https://github.com/huggingface/transformers/pull/14169

Add more missing models to models/__init__.py by @ydshieh in https://github.com/huggingface/transformers/pull/14177

Clarify QA examples by @NielsRogge in https://github.com/huggingface/transformers/pull/14172

Fixing image-segmentation tests. by @Narsil in https://github.com/huggingface/transformers/pull/14223

Tensor location is already handled by @Narsil in https://github.com/huggingface/transformers/pull/14224

Raising exceptions instead of using assertions for few models by @pdcoded in https://github.com/huggingface/transformers/pull/14219

Fix the write problem in trainer.py comment by @wmathor in https://github.com/huggingface/transformers/pull/14202

[GPTJ] enable common tests and few fixes by @patil-suraj in https://github.com/huggingface/transformers/pull/14190

improving efficiency of mlflow metric logging by @wamartin-aml in https://github.com/huggingface/transformers/pull/14232

Fix generation docstring by @qqaatw in https://github.com/huggingface/transformers/pull/14216

Fix test_configuration_tie in FlaxEncoderDecoderModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/14076

[Tests] Fix DistilHubert path by @anton-l in https://github.com/huggingface/transformers/pull/14245

Add PushToHubCallback in main init by @sgugger in https://github.com/huggingface/transformers/pull/14246

Fixes Beit training for PyTorch 1.10+ by @sgugger in https://github.com/huggingface/transformers/pull/14249

Added Beit model output class by @lumliolum in https://github.com/huggingface/transformers/pull/14133

Update Transformers to huggingface_hub >= 0.1.0 by @sgugger in https://github.com/huggingface/transformers/pull/14251

Add cross attentions to TFGPT2Model by @ydshieh in https://github.com/huggingface/transformers/pull/14038

[Wav2Vec2] Adapt conversion script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14258

Put load_image function in image_utils.py & fix image rotation issue by @mishig25 in https://github.com/huggingface/transformers/pull/14062

minimal fixes to run DataCollatorForWholeWordMask with return_tensors="np" and return_tensors="tf" by @dwyatte in https://github.com/huggingface/transformers/pull/13891

Adding support for truncation parameter on feature-extraction pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/14193

Fix of issue #13327: Wrong weight initialization for TF t5 model by @dshirron in https://github.com/huggingface/transformers/pull/14241

Fixing typo in error message. by @Narsil in https://github.com/huggingface/transformers/pull/14226

Pin Keras cause they messed their release by @sgugger in https://github.com/huggingface/transformers/pull/14262

Quality explain by @sgugger in https://github.com/huggingface/transformers/pull/14264

Add more instructions to the release guide by @sgugger in https://github.com/huggingface/transformers/pull/14263

Fixing slow pipeline tests by @Narsil in https://github.com/huggingface/transformers/pull/14260

Fixing mishandling of ignore_labels. by @Narsil in https://github.com/huggingface/transformers/pull/14274

improve rewrite state_dict missing _metadata by @changwangss in https://github.com/huggingface/transformers/pull/14276

Removing Keras version pinning by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14280

Pin TF until tests are fixed by @sgugger in https://github.com/huggingface/transformers/pull/14283

[Hubert Docs] Make sure example uses a fine-tuned model by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14291

Add new LFS prune API by @sgugger in https://github.com/huggingface/transformers/pull/14294

Remove DPRPretrainedModel from docs by @xhlulu in https://github.com/huggingface/transformers/pull/14300

Handle long answer needs to be updated. by @Narsil in https://github.com/huggingface/transformers/pull/14279

[tests] Fix SegFormer and BEiT tests by @NielsRogge in https://github.com/huggingface/transformers/pull/14289

Fix typo on PPLM example README by @Beomi in https://github.com/huggingface/transformers/pull/14287

[Marian Conversion] Fix eos_token_id conversion in conversion script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14320

[Tests] Update audio classification tests to support torch 1.10 by @anton-l in https://github.com/huggingface/transformers/pull/14318

[TFWav2Vec2Model] Fix input shapes in TFWav2Vec2WeightNormConv1D by @anton-l in https://github.com/huggingface/transformers/pull/14319

Fixing tests on master. by @Narsil in https://github.com/huggingface/transformers/pull/14317

Fixing mutable default argument in pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/14316

Changed relative imports to absolute to allow convert_graph_to_onnx.py to run as a script. by @nbertagnolli in https://github.com/huggingface/transformers/pull/14325

Expand dynamic supported objects to configs and tokenizers by @sgugger in https://github.com/huggingface/transformers/pull/14296

[deepspeed] Enable multiple test runs on single box, defer to DS_TEST_PORT if set by @jeffra in https://github.com/huggingface/transformers/pull/14331

Small change to Wav2Vec2 model to support Tensor-Parallelism with DeepSpeed by @RezaYazdaniAminabadi in https://github.com/huggingface/transformers/pull/14298

Correct order of overflowing tokens for LayoutLmV2 tokenizer by @Apoorvgarg-creator in https://github.com/huggingface/transformers/pull/13495

Update Seq2Seq QA example script to use SQuAD metric. by @karthikrangasai in https://github.com/huggingface/transformers/pull/14335

remove an irrelevant test from test_modeling_tf_layoutlm by @ydshieh in https://github.com/huggingface/transformers/pull/14341

bump flax version by @patil-suraj in https://github.com/huggingface/transformers/pull/14343

Rewrite guides for fine-tuning with Datasets by @stevhliu in https://github.com/huggingface/transformers/pull/13923

[Bert2Bert] allow bert2bert + relative embeddings by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14324

Support for TF >= 2.7 by @sgugger in https://github.com/huggingface/transformers/pull/14345

BatchFeature: Convert List[np.ndarray] to np.ndarray before converting to pytorch tensors by @eladsegal in https://github.com/huggingface/transformers/pull/14306

Adding some quality of life for pipeline function. by @Narsil in https://github.com/huggingface/transformers/pull/14322

Fix fast tokenization problems by @qqaatw in https://github.com/huggingface/transformers/pull/13930

Add notebook INC quantization for text classification tasks by @echarlaix in https://github.com/huggingface/transformers/pull/14293

enhance rewrite state_dict missing _metadata by @changwangss in https://github.com/huggingface/transformers/pull/14348

Fix list index out of range when padding nested empty lists by @qqaatw in https://github.com/huggingface/transformers/pull/13876

[testing] solve the port conflict by @stas00 in https://github.com/huggingface/transformers/pull/14362

Fix Flax params dtype by @patil-suraj in https://github.com/huggingface/transformers/pull/13098

[flax generate] allow passing params to encode by @patil-suraj in https://github.com/huggingface/transformers/pull/14370

Experimenting with adding proper get_config() and from_config() methods by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14361

Fixing requirements for TF LM models and use correct model mappings by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14372

fix loading flax bf16 weights in pt by @patil-suraj in https://github.com/huggingface/transformers/pull/14369

[wav2vec2] fix --gradient_checkpointing by @stas00 in https://github.com/huggingface/transformers/pull/13964

Adding support for raw python generator in addition to Dataset for pipelines by @Narsil in https://github.com/huggingface/transformers/pull/14352

minor doc fix by @patil-suraj in https://github.com/huggingface/transformers/pull/14377

[Wav2Vec2 Example] Improve fine-tuning script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14373

Use AlbertConverter for FNet instead of using FNet's own converter by @qqaatw in https://github.com/huggingface/transformers/pull/14365

Add support for WMT21 tokenizer in M2M100Tokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/14376

[M2M100Tokenizer] fix _build_translation_inputs by @patil-suraj in https://github.com/huggingface/transformers/pull/14382

Raise exceptions instead of using asserts in modeling_openai #12789 by @nbertagnolli in https://github.com/huggingface/transformers/pull/14386

[doc] performance and parallelism updates by @stas00 in https://github.com/huggingface/transformers/pull/14391

Quick fix to TF summarization example by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14401

[Speech2Text2] Enable tokenizers by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14390

Fix TFViT by @NielsRogge in https://github.com/huggingface/transformers/pull/14399

Fix weight loading issue by @ydshieh in https://github.com/huggingface/transformers/pull/14016

Replace BertLayerNorm with LayerNorm by @eldarkurtic in https://github.com/huggingface/transformers/pull/14385

[Wav2Vec2] Make sure that gradient checkpointing is only run if needed by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14407

Allow per-version configurations by @LysandreJik in https://github.com/huggingface/transformers/pull/14344

Fix gradient_checkpointing backward compatibility by @sgugger in https://github.com/huggingface/transformers/pull/14408

Add forward method to dummy models by @sgugger in https://github.com/huggingface/transformers/pull/14419

Avoid looping when data exhausted by @valentindey in https://github.com/huggingface/transformers/pull/14413

Debug doc by @sgugger in https://github.com/huggingface/transformers/pull/14424

[Wav2Vec2] Add New Wav2Vec2 Translation by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14392

Improve semantic segmentation models by @NielsRogge in https://github.com/huggingface/transformers/pull/14355

[Gradient checkpointing] Update Wav2Vec scripts by @falcaopetri in https://github.com/huggingface/transformers/pull/14036

[Bart] Fix docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14434

[WIP] Ensure TF model configs can be converted to proper JSON by @Zahlii in https://github.com/huggingface/transformers/pull/14415

Recover Deleted XNLI Instructions by @Helw150 in https://github.com/huggingface/transformers/pull/14437

Fix EncoderDecoderModel code example by @NielsRogge in https://github.com/huggingface/transformers/pull/14441

Add a post init method to all models by @sgugger in https://github.com/huggingface/transformers/pull/14431

Fix finite IterableDataset test on multiple GPUs by @sgugger in https://github.com/huggingface/transformers/pull/14445

[Bert, et al] fix early device assignment by @stas00 in https://github.com/huggingface/transformers/pull/14447

Add GitPython to quality tools by @LysandreJik in https://github.com/huggingface/transformers/pull/14459

[ImageGPT] Small fixes by @NielsRogge in https://github.com/huggingface/transformers/pull/14460

[Generation] Allow inputs_embeds as an input by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14443

Adding support for hidden_states and attentions in unbatching support. by @Narsil in https://github.com/huggingface/transformers/pull/14420

add Tuple as possible type hint for EvalPredictions label_ids by @ameasure in https://github.com/huggingface/transformers/pull/14473

Fix dummy objects for quantization by @sgugger in https://github.com/huggingface/transformers/pull/14478

Moving pipeline tests from Narsil to hf-internal-testing. by @Narsil in https://github.com/huggingface/transformers/pull/14463

Improve add-new-pipeline docs a bit by @stancld in https://github.com/huggingface/transformers/pull/14485

[test] add test for --config_overrides by @stas00 in https://github.com/huggingface/transformers/pull/14466

Support for Training with BF16 by @JamesDeAntonis in https://github.com/huggingface/transformers/pull/13207

fixes some key names for in LayoutLMv2 / LayoutXLM tokenizers by @valentindey in https://github.com/huggingface/transformers/pull/14493

Switch from using sum for flattening lists of lists in group_texts by @nbroad1881 in https://github.com/huggingface/transformers/pull/14472

[deepspeed] zero inference by @stas00 in https://github.com/huggingface/transformers/pull/14253

add cache_dir for tokenizer verification loading by @vmaryasin in https://github.com/huggingface/transformers/pull/14508

Fix feature extraction utils import by @LysandreJik in https://github.com/huggingface/transformers/pull/14515

[Tests] Improve vision tests by @NielsRogge in https://github.com/huggingface/transformers/pull/14458

[CI] clear ~/.cache/torch_extensions between builds by @stas00 in https://github.com/huggingface/transformers/pull/14520

Fix a slow test. by @Narsil in https://github.com/huggingface/transformers/pull/14527

added save_directories for _psave_pretrained_pt and _tf, changed model to tf_model and pt_model, enable the notebook to run cleanly from top to bottom without error by @cfregly in https://github.com/huggingface/transformers/pull/14529

Quicktour updates by @LysandreJik in https://github.com/huggingface/transformers/pull/14533

Fixes by @LysandreJik in https://github.com/huggingface/transformers/pull/14534

[flax] unfreeze initial cache in gpt models by @patil-suraj in https://github.com/huggingface/transformers/pull/14535

Tokenizers docs: Specify which class contains __call__ method by @xhlulu in https://github.com/huggingface/transformers/pull/14379

Rename ImageGPT by @NielsRogge in https://github.com/huggingface/transformers/pull/14526

[Generate] Fix generate with inputs_embeds on GPU by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14564

[Flax] token-classification model steps enumerate start from 1 by @kamalkraj in https://github.com/huggingface/transformers/pull/14547

Fix sentinel token IDs in data collator for Flax T5 pretraining script by @rahuln in https://github.com/huggingface/transformers/pull/14477

Fix backend regex by @sgugger in https://github.com/huggingface/transformers/pull/14566

[Flax] Add FlaxBlenderbot by @stancld in https://github.com/huggingface/transformers/pull/13633

Add documentation for multi-label classification by @gsnidero in https://github.com/huggingface/transformers/pull/14168

use functional interface for softmax in attention by @t-vi in https://github.com/huggingface/transformers/pull/14198

Fix mask token handling by @qqaatw in https://github.com/huggingface/transformers/pull/14364

[doc] bf16/tf32 guide by @stas00 in https://github.com/huggingface/transformers/pull/14579

Rename toctree.yml -> _toctree.yml by @mishig25 in https://github.com/huggingface/transformers/pull/14594

Update doc img links by @mishig25 in https://github.com/huggingface/transformers/pull/14593

Adds a git pull instruction to the documentation builder by @LysandreJik in https://github.com/huggingface/transformers/pull/14597

[Flax] Add FlaxBlenderbotSmall by @stancld in https://github.com/huggingface/transformers/pull/14576

Python 3.6 -> Python 3.7 for TF runs by @LysandreJik in https://github.com/huggingface/transformers/pull/14598

change tf.math.divide with int(/) in distilbert model by @yis11178 in https://github.com/huggingface/transformers/pull/14600

fix #14524 (IndexError when mask prob is too low) by @nikvaessen in https://github.com/huggingface/transformers/pull/14525

Improve tokenizer tests by @qqaatw in https://github.com/huggingface/transformers/pull/13594

[CI] move env print to util, add pt, nccl versions by @stas00 in https://github.com/huggingface/transformers/pull/14607

2022 is the year of multi-modality by @LysandreJik in https://github.com/huggingface/transformers/pull/14610

Fix doc builder by @LysandreJik in https://github.com/huggingface/transformers/pull/14616

[trainer] add tf32-mode control by @stas00 in https://github.com/huggingface/transformers/pull/14606

Make DefaultDataCollator importable from root by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14588

fix a typo by @yuchenlin in https://github.com/huggingface/transformers/pull/14626

updated pytorch token-classification readme by @kamalkraj in https://github.com/huggingface/transformers/pull/14624

Add Flax example tests by @patil-suraj in https://github.com/huggingface/transformers/pull/14599

fix typo by @patil-suraj in https://github.com/huggingface/transformers/pull/14635

add flax example tests in CI workflow by @patil-suraj in https://github.com/huggingface/transformers/pull/14637

[urls to hub] Replace outdated model tags with their now-canonical pipeline types by @julien-c in https://github.com/huggingface/transformers/pull/14617

Update the example of exporting Bart + BeamSearch to ONNX module to resolve comments. by @fatcat-z in https://github.com/huggingface/transformers/pull/14310

Add GPTJForQuestionAnswering by @tucan9389 in https://github.com/huggingface/transformers/pull/14503

doc: mismatch between pooler/d_output by @guhur in https://github.com/huggingface/transformers/pull/14641

fix flax example tests by @patil-suraj in https://github.com/huggingface/transformers/pull/14643

Auto processor fix by @LysandreJik in https://github.com/huggingface/transformers/pull/14623

Fix syntax for class references by @sgugger in https://github.com/huggingface/transformers/pull/14644

Add a job to test the documentation build by @sgugger in https://github.com/huggingface/transformers/pull/14645

fix flax examples tests by @patil-suraj in https://github.com/huggingface/transformers/pull/14646

Use cross_attention_hidden_size in Encoder-Decoder models by @ydshieh in https://github.com/huggingface/transformers/pull/14378

[deepspeed] fix --load_best_model_at_end by @stas00 in https://github.com/huggingface/transformers/pull/14652

quick fix SummarizationPipeline error messages by @NouamaneTazi in https://github.com/huggingface/transformers/pull/14618

Fix a Bug, trainer_seq2seq.py, in the else branch at Line 172, generation_inputs should be a dict by @TranSirius in https://github.com/huggingface/transformers/pull/14546

[trainer] conditional ctx managers into one wrapper by @stas00 in https://github.com/huggingface/transformers/pull/14663

Fixing Dataset for TQA + token-classification. by @Narsil in https://github.com/huggingface/transformers/pull/14658

fix deprecated tf method by @ZOHETH in https://github.com/huggingface/transformers/pull/14671

Fix doc builder by @LysandreJik in https://github.com/huggingface/transformers/pull/14676

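[AutoProcessor] Add Wav2Vec2WithLM & small fix by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14675

Added support for other features for already supported models by @michaelbenayoun in https://github.com/huggingface/transformers/pull/14358

Revert "Added support for other features for already supported models" by @lewtun in https://github.com/huggingface/transformers/pull/14679

Convert tutorials by @sgugger in https://github.com/huggingface/transformers/pull/14665

fix: verify jsonlines file in run_translation (#14660) by @GaurangTandon in https://github.com/huggingface/transformers/pull/14661

Improvements to Comet Integration by @DN6 in https://github.com/huggingface/transformers/pull/14680

Fixes in init by @sgugger in https://github.com/huggingface/transformers/pull/14681

Revert open-in-colab and add perceiver by @sgugger in https://github.com/huggingface/transformers/pull/14683

Fix wrong checkpoint paths in doc examples by @ydshieh in https://github.com/huggingface/transformers/pull/14685

[bf16 support] tweaks by @stas00 in https://github.com/huggingface/transformers/pull/14580

[trainer] support UserDict inputs (torch-nightly) by @stas00 in https://github.com/huggingface/transformers/pull/14688

Move pyctcdecode by @sgugger in https://github.com/huggingface/transformers/pull/14686

Make MLuke tokenizer tests slow by @sgugger in https://github.com/huggingface/transformers/pull/14690

Fix doc examples: name '...' is not defined by @ydshieh in https://github.com/huggingface/transformers/pull/14687

Add a job to test doc building (for realsies this time) by @sgugger in https://github.com/huggingface/transformers/pull/14662

Fix Perceiver tests by @NielsRogge in https://github.com/huggingface/transformers/pull/14703

add str hub token to repository when provided else fallback to default by @philschmid in https://github.com/huggingface/transformers/pull/14682

Fix typo in toctree by @mishig25 in https://github.com/huggingface/transformers/pull/14704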
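One entry above (#14472) switches `group_texts` away from using `sum` to flatten lists of lists. `sum(list_of_lists, [])` builds a new list on every addition, so its cost grows roughly quadratically with the total number of tokens, while `itertools.chain` visits each element once. The following is a minimal standalone sketch of a `group_texts`-style helper, assuming a dictionary of token-list columns as input; the explicit `block_size` parameter and the exact structure are choices made for this illustration rather than a copy of the example scripts' code.

```python
from itertools import chain
from typing import Dict, List


def group_texts(
    examples: Dict[str, List[List[int]]], block_size: int
) -> Dict[str, List[List[int]]]:
    # Flatten each column in linear time. sum(lists, []) would instead
    # re-copy the growing accumulator on every addition.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    first_key = next(iter(examples))
    # Drop the remainder so that every block holds exactly block_size tokens.
    total_length = (len(concatenated[first_key]) // block_size) * block_size
    # Slice every column at the same offsets so columns stay aligned.
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }


batch = {"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]}
print(group_texts(batch, block_size=4))
# {'input_ids': [[1, 2, 3, 4], [5, 6, 7, 8]]}
```

The trailing token `9` is discarded because it does not fill a complete block of four.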

New Contributors

@hrxorxm made their first contribution in https://github.com/huggingface/transformers/pull/14211

@pdcoded made their first contribution in https://github.com/huggingface/transformers/pull/14219

@wmathor made their first contribution in https://github.com/huggingface/transformers/pull/14202

@wamartin-aml made their first contribution in https://github.com/huggingface/transformers/pull/14232

@lumliolum made their first contribution in https://github.com/huggingface/transformers/pull/14133

@dwyatte made their first contribution in https://github.com/huggingface/transformers/pull/13891

@dshirron made their first contribution in https://github.com/huggingface/transformers/pull/14241

@changwangss made their first contribution in https://github.com/huggingface/transformers/pull/14276

@xhlulu made their first contribution in https://github.com/huggingface/transformers/pull/14300

@Beomi made their first contribution in https://github.com/huggingface/transformers/pull/14287

@nbertagnolli made their first contribution in https://github.com/huggingface/transformers/pull/14325

@jeffra made their first contribution in https://github.com/huggingface/transformers/pull/14331

@RezaYazdaniAminabadi made their first contribution in https://github.com/huggingface/transformers/pull/14298

@echarlaix made their first contribution in https://github.com/huggingface/transformers/pull/14293

@valentindey made their first contribution in https://github.com/huggingface/transformers/pull/14413

@Zahlii made their first contribution in https://github.com/huggingface/transformers/pull/14415

@Helw150 made their first contribution in https://github.com/huggingface/transformers/pull/14437

@shangz-ai made their first contribution in https://github.com/huggingface/transformers/pull/14066

@vmaryasin made their first contribution in https://github.com/huggingface/transformers/pull/14508

@cfregly made their first contribution in https://github.com/huggingface/transformers/pull/14529

@Xargonus made their first contribution in https://github.com/huggingface/transformers/pull/14514

@rahuln made their first contribution in https://github.com/huggingface/transformers/pull/14477

@gsnidero made their first contribution in https://github.com/huggingface/transformers/pull/14168

@t-vi made their first contribution in https://github.com/huggingface/transformers/pull/14198

@JamesDeAntonis made their first contribution in https://github.com/huggingface/transformers/pull/13207

@yis11178 made their first contribution in https://github.com/huggingface/transformers/pull/14600

@nikvaessen made their first contribution in https://github.com/huggingface/transformers/pull/14525

@yuchenlin made their first contribution in https://github.com/huggingface/transformers/pull/14626

@Ryou0634 made their first contribution in https://github.com/huggingface/transformers/pull/14640

@NouamaneTazi made their first contribution in https://github.com/huggingface/transformers/pull/14618

@TranSirius made their first contribution in https://github.com/huggingface/transformers/pull/14546

@ZOHETH made their first contribution in https://github.com/huggingface/transformers/pull/14671

Full Changelog: https://github.com/huggingface/transformers/compare/v4.12.0...v4.13.0
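Two entries in the v4.13.0 list (#14219 and #14386) replace `assert` statements with raised exceptions. Python removes `assert` statements when run with the `-O` flag, so a check written as an assert can silently disappear, and an explicit exception also lets the caller catch a specific error type. The sketch below illustrates the pattern only; `check_attention_config` is a hypothetical helper, and the divisibility rule it enforces is just an example of the kind of configuration check a model might perform.

```python
def check_attention_config(hidden_size: int, num_attention_heads: int) -> int:
    # Hypothetical helper. An `assert hidden_size % num_attention_heads == 0`
    # would be skipped entirely under `python -O`; this check always runs.
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) is not a multiple of "
            f"num_attention_heads ({num_attention_heads})"
        )
    # Return the per-head dimension once the configuration is valid.
    return hidden_size // num_attention_heads


print(check_attention_config(768, 12))  # 64
```

Calling `check_attention_config(10, 3)` raises a `ValueError` with a message naming both values, regardless of interpreter flags.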
v4.12.5 (Nov 17, 2021)
Reverts a commit that introduced other issues:

Revert "Experimenting with adding proper get_config() and from_config() methods (#14361)"

v4.12.4 (Nov 16, 2021)
Fix gradient_checkpointing backward compatibility (#14408)

[Wav2Vec2] Make sure that gradient checkpointing is only run if needed (#14407)

Experimenting with adding proper get_config() and from_config() methods (#14361)

enhance rewrite state_dict missing _metadata (#14348)

Support for TF >= 2.7 (#14345)

improve rewrite state_dict missing _metadata (#14276)

Fix of issue #13327: Wrong weight initialization for TF t5 model (#14241)

v4.12.3 (Nov 3, 2021)
v4.12.3: Patch release

Add PushToHubCallback in main init (#14246)

Supports huggingface_hub >= 0.1.0



Releases(v4.25.1)

v4.25.1 (Dec 2, 2022)

PyTorch 2.0 stack support

Audio Spectrogram Transformer

Jukebox

Switch Transformers

RocBert

CLIPSeg

NAT and DiNAT

NAT

DiNAT

MobileNetV2

MobileNetV1

Image processors

Backbone for computer vision models

Support for safetensors offloading

Contrastive search in the generate method

Breaking changes

Bugfixes and improvements

Significant community contributions

v4.24.0 (Nov 1, 2022)

ESM-2/ESMFold

LiLT

Flan-T5

Table Transformer

Contrastive search decoding

Safety and security

🚨 Breaking changes

Bugfixes and improvements

Significant community contributions

v4.23.1 (Oct 11, 2022)

Support for `safetensors` offloading

Contrastive search in the `generate` method