A Word Level Transformer layer based on PyTorch and 🤗 Transformers.

Riccardo Orlando

Last update: Nov 20, 2022

Overview

Transformer Embedder

How to use

Install the library from PyPI:

pip install transformer-embedder

It offers a PyTorch layer and a tokenizer that support almost every pretrained model from the Hugging Face 🤗 Transformers library. Here is a quick example:

import transformer_embedder as tre

tokenizer = tre.Tokenizer("bert-base-cased")
model = tre.TransformerEmbedder("bert-base-cased", subtoken_pooling="mean", output_layer="sum")

example = "This is a sample sentence"
inputs = tokenizer(example, return_tensors=True)

{
   'input_ids': tensor([[ 101, 1188, 1110,  170, 6876, 5650,  102]]),
   'offsets': tensor([[[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]]]),
   'attention_mask': tensor([[True, True, True, True, True, True, True]]),
   'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0]]),
   'sentence_length': 7  # with special tokens included
}

outputs = model(**inputs)

# shape of the output once [CLS] and [SEP] are removed from the sequence dimension
torch.Size([1, 5, 768])
# len(example.split())      # number of words
5

Info

One of the annoyances of using transformer-based models is that it is not trivial to compute word embeddings from the sub-token embeddings they output. With this library, getting word-level embeddings is as easy as using the 🤗 Transformers API, in theory for every transformer model it supports.

Model

The TransformerEmbedder offers four ways to retrieve the word embeddings, selected with the subtoken_pooling parameter:

first: uses only the embedding of the first sub-token of each word
last: uses only the embedding of the last sub-token of each word
mean: computes the mean of the embeddings of the sub-tokens of each word
none: returns the raw output of the transformer model without sub-token pooling
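
To make the pooling strategies concrete, here is a minimal, dependency-free sketch. It is not the library's implementation: the helper pool_subtokens and the inclusive (start, end) span format are assumptions made for illustration, and plain Python lists stand in for tensors. The none option is left out because it returns the sub-token embeddings unchanged.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def pool_subtokens(
    embeddings: Sequence[Vector],
    spans: Sequence[Tuple[int, int]],
    strategy: str = "mean",
) -> List[Vector]:
    """Collapse sub-token vectors into one vector per word (hypothetical helper)."""
    pooled: List[Vector] = []
    for start, end in spans:
        pieces = embeddings[start : end + 1]  # spans are assumed inclusive
        if strategy == "first":
            pooled.append(list(pieces[0]))
        elif strategy == "last":
            pooled.append(list(pieces[-1]))
        elif strategy == "mean":
            dim = len(pieces[0])
            pooled.append(
                [sum(p[d] for p in pieces) / len(pieces) for d in range(dim)]
            )
        else:
            raise ValueError(f"unsupported strategy: {strategy!r}")
    return pooled


# Toy input: word 0 is sub-token 0, word 1 covers sub-tokens 1 to 3.
subtoken_embeddings = [[1.0, 1.0], [2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
word_spans = [(0, 0), (1, 3)]

first_pooled = pool_subtokens(subtoken_embeddings, word_spans, "first")
last_pooled = pool_subtokens(subtoken_embeddings, word_spans, "last")
mean_pooled = pool_subtokens(subtoken_embeddings, word_spans, "mean")
```

Whatever the strategy, the result has one vector per word, which is what makes the output word-level rather than sub-token-level.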

There are also multiple types of output you can get with the output_layer parameter:

last: returns the last hidden state of the transformer model
concat: returns the concatenation of the last four hidden states of the transformer model
sum: returns the sum of the last four hidden states of the transformer model
pooled: returns the output of the pooling layer
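
The following sketch illustrates how the last, concat and sum options differ, using nested Python lists in place of tensors. The function combine_layers is hypothetical, and the concatenation order (oldest of the four layers first) is an assumption, not something the library guarantees. The pooled option is not covered because it depends on the model's own pooling layer.

```python
from typing import List

Layer = List[List[float]]  # one vector per token


def combine_layers(hidden_states: List[Layer], mode: str = "last") -> Layer:
    """Combine per-layer hidden states into a single output (hypothetical helper)."""
    if mode == "last":
        return [list(vec) for vec in hidden_states[-1]]
    last_four = hidden_states[-4:]
    num_tokens = len(last_four[0])
    if mode == "concat":
        # Each token vector becomes four times as wide.
        return [
            [x for layer in last_four for x in layer[t]]
            for t in range(num_tokens)
        ]
    if mode == "sum":
        # The hidden size is unchanged; values are added element-wise.
        dim = len(last_four[0][0])
        return [
            [sum(layer[t][d] for layer in last_four) for d in range(dim)]
            for t in range(num_tokens)
        ]
    raise ValueError(f"unsupported mode: {mode!r}")


# Five toy layers, two tokens, hidden size 1.
hidden_states = [[[float(l)], [float(l) * 10]] for l in range(5)]

last_layer = combine_layers(hidden_states, "last")
concatenated = combine_layers(hidden_states, "concat")
summed = combine_layers(hidden_states, "sum")
```

Note that concat multiplies the hidden size by four, while sum and last keep it unchanged, which matters when sizing any layer placed on top of the embedder.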

To also get all the outputs of the underlying Hugging Face model, set return_all=True.

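from typing import Union

import torch
import transformers as tr


class TransformerEmbedder(torch.nn.Module):
    def __init__(
        self,
        model: Union[str, tr.PreTrainedModel],
        subtoken_pooling: str = "first",
        output_layer: str = "last",
        fine_tune: bool = True,
        return_all: bool = False,
    ) -> None:
        ...  # constructor body omitted; only the signature is shown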

Tokenizer

The Tokenizer class provides the tokenize method to preprocess the input for the TransformerEmbedder layer. You can pass raw sentences, pre-tokenized sentences, and batches of sentences. It preprocesses them and returns a dictionary with the inputs for the model. Passing return_tensors=True returns the inputs as torch.Tensor objects.

By default, if you pass the text (or a batch) as strings, it splits them on spaces:

text = "This is a sample sentence"
tokenizer(text)

text = ["This is a sample sentence", "This is another sample sentence"]
tokenizer(text)

You can also use spaCy to pre-tokenize the inputs into words first by setting use_spacy=True:

text = "This is a sample sentence"
tokenizer(text, use_spacy=True)

text = ["This is a sample sentence", "This is another sample sentence"]
tokenizer(text, use_spacy=True)

Or you can pass a pre-tokenized sentence (or a batch of sentences) by setting is_split_into_words=True:

text = ["This", "is", "a", "sample", "sentence"]
tokenizer(text, is_split_into_words=True)

text = [
    ["This", "is", "a", "sample", "sentence", "1"],
    ["This", "is", "sample", "sentence", "2"],
]
tokenizer(text, is_split_into_words=True) # here is_split_into_words is redundant
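
The input forms above can be summarized with a small sketch that normalizes every accepted input into a batch of word lists. The helper normalize_input is hypothetical and only mirrors the behavior described here; it uses str.split(), which splits on any whitespace, as a stand-in for the default splitting.

```python
from typing import List, Sequence, Union

TextInput = Union[str, Sequence[str], Sequence[Sequence[str]]]


def normalize_input(
    text: TextInput, is_split_into_words: bool = False
) -> List[List[str]]:
    """Turn any accepted input form into a batch of word lists (hypothetical helper)."""
    if isinstance(text, str):
        # A single raw sentence.
        return [text.split()]
    if text and isinstance(text[0], (list, tuple)):
        # A batch of pre-tokenized sentences: the flag is redundant here.
        return [list(words) for words in text]
    if is_split_into_words:
        # A single pre-tokenized sentence.
        return [list(text)]
    # A batch of raw sentences.
    return [sentence.split() for sentence in text]


single = normalize_input("This is a sample sentence")
raw_batch = normalize_input(
    ["This is a sample sentence", "This is another sample sentence"]
)
pre_split = normalize_input(
    ["This", "is", "a", "sample", "sentence"], is_split_into_words=True
)
```

The flag is only needed for a flat list of strings, since that is the one case where a batch of raw sentences and a single pre-tokenized sentence look the same.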

Examples

First, initialize the tokenizer

import transformer_embedder as tre

tokenizer = tre.Tokenizer("bert-base-cased")

You can pass a single sentence as a string:

text = "This is a sample sentence"
tokenizer(text)

{
  'input_ids': [101, 1188, 1110, 170, 6876, 5650, 102],
  'offsets': [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)],
  'attention_mask': [True, True, True, True, True, True, True],
  'token_type_ids': [0, 0, 0, 0, 0, 0, 0],
  'sentence_length': 7
}

A sentence pair:

text = "This is a sample sentence A"
text_pair = "This is a sample sentence B"
tokenizer(text, text_pair)

{
  'input_ids': [101, 1188, 1110, 170, 6876, 5650, 138, 102, 1188, 1110, 170, 6876, 5650, 139, 102],
  'offsets': [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14)],
  'attention_mask': [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True],
  'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
  'sentence_length': 15
}
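
As a rough illustration of where the pair output comes from, the sketch below rebuilds input_ids and token_type_ids from the two sentences' sub-token ids. The function encode_pair is hypothetical; it assumes the BERT-style layout [CLS] A [SEP] B [SEP] and the ids 101 and 102 for [CLS] and [SEP], as they appear in the output above.

```python
from typing import Dict, List, Union


def encode_pair(
    ids_a: List[int], ids_b: List[int], cls_id: int = 101, sep_id: int = 102
) -> Dict[str, Union[List[int], int]]:
    """Assemble a BERT-style sentence pair (hypothetical helper)."""
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "sentence_length": len(input_ids),
    }


# Sub-token ids copied from the example output above.
sentence_a = [1188, 1110, 170, 6876, 5650, 138]
sentence_b = [1188, 1110, 170, 6876, 5650, 139]
pair = encode_pair(sentence_a, sentence_b)
```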

A batch of sentences or sentence pairs. With padding=True and return_tensors=True, the tokenizer returns the text ready for the model:

batch = [
    ["This", "is", "a", "sample", "sentence", "1"],
    ["This", "is", "sample", "sentence", "2"],
    ["This", "is", "a", "sample", "sentence", "3"],
    # ...
    ["This", "is", "a", "sample", "sentence", "n", "for", "batch"],
]
tokenizer(batch, padding=True, return_tensors=True)

batch_pair = [
    ["This", "is", "a", "sample", "sentence", "pair", "1"],
    ["This", "is", "sample", "sentence", "pair", "2"],
    ["This", "is", "a", "sample", "sentence", "pair", "3"],
    # ...
    ["This", "is", "a", "sample", "sentence", "pair", "n", "for", "batch"],
]
tokenizer(batch, batch_pair, padding=True, return_tensors=True)
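
Conceptually, padding brings every sequence in the batch to the same length and records which positions are real in the attention mask. The sketch below shows the idea on plain lists; pad_input_ids is a hypothetical helper, and the padding id 0 is an assumption that matches BERT's [PAD] token.

```python
from typing import Dict, List


def pad_input_ids(batch: List[List[int]], pad_id: int = 0) -> Dict[str, list]:
    """Pad sequences to the batch max length and build the mask (hypothetical helper)."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # True marks real tokens, False marks padding.
    attention_mask = [
        [True] * len(ids) + [False] * (max_len - len(ids)) for ids in batch
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


padded = pad_input_ids(
    [
        [101, 1188, 1110, 170, 6876, 5650, 102],
        [101, 1188, 1110, 1330, 1859, 5650, 1198, 1294, 1122, 2039, 102],
    ]
)
```

Once every row has the same length, the nested lists can be converted to tensors, which is the step return_tensors=True takes care of.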

Custom fields

It is possible to add custom fields to the model input and tell the tokenizer how to pad them using add_padding_ops. Start by tokenizing the input (without padding or tensor conversion):

import transformer_embedder as tre

tokenizer = tre.Tokenizer("bert-base-cased")

text = [
    ["This", "is", "a", "sample", "sentence"],
    ["This", "is", "another", "example", "sentence", "just", "make", "it", "longer"]
]
inputs = tokenizer(text)

Then add the custom fields to the result:

custom_fields = {
  "custom_filed_1": [
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
  ]
}

inputs.update(custom_fields)

Now we can add the padding logic for our custom field custom_filed_1. The add_padding_ops method takes the following arguments:

key: name of the field in the tokenizer input
value: value to use for padding
length: length to pad to. It can be an int or one of two string values: subtoken pads the element to the batch max length measured in sub-tokens, while word pads it to the batch max length measured in original words

tokenizer.add_padding_ops("custom_filed_1", 0, "word")
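
To clarify how the three kinds of length could be resolved, here is a small sketch on plain lists. It is not the library's code: pad_custom_field is a hypothetical helper, and the per-sentence sub-token and word lengths are passed in explicitly, whereas the tokenizer would know them from the batch.

```python
from typing import List, Union


def pad_custom_field(
    rows: List[List[int]],
    value: int,
    length: Union[int, str],
    subtoken_lengths: List[int],
    word_lengths: List[int],
) -> List[List[int]]:
    """Pad each row of a custom field to the resolved target length (hypothetical helper)."""
    if isinstance(length, int):
        target = length
    elif length == "subtoken":
        target = max(subtoken_lengths)
    elif length == "word":
        target = max(word_lengths)
    else:
        raise ValueError(f"unsupported length: {length!r}")
    # Rows that are already longer than the target are left as they are.
    return [list(row) + [value] * (target - len(row)) for row in rows]


rows = [
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
]
# In this example every word is a single sub-token, so both lengths are 7 and 11.
padded_rows = pad_custom_field(
    rows, 0, "word", subtoken_lengths=[7, 11], word_lengths=[7, 11]
)
```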

Finally, pad the input and convert it to a tensor:

# manual processing
inputs = tokenizer.pad_batch(inputs)
inputs = tokenizer.to_tensor(inputs)

The inputs are ready for the model, including the custom field.

>>> inputs

{
   "input_ids": tensor(
       [
           [101, 1188, 1110, 170, 6876, 5650, 102, 0, 0, 0, 0],
           [101, 1188, 1110, 1330, 1859, 5650, 1198, 1294, 1122, 2039, 102],
       ]
   ),
   "offsets": tensor(
       [
           [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6], [7, 7], [-1, -1], [-1, -1], [-1, -1]],
           [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6], [7, 7], [8, 8], [9, 9], [10, 10]],
       ]
   ),
   "attention_mask": tensor(
       [
           [True, True, True, True, True, True, True, False, False, False, False],
           [True, True, True, True, True, True, True, True, True, True, True],
       ]
   ),
   "word_mask": tensor(
       [
           [True, True, True, True, True, True, True, False, False, False, False],
           [True, True, True, True, True, True, True, True, True, True, True],
       ]
   ),
   "token_type_ids": tensor(
       [[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
   ),
   "sentence_length": tensor([7, 11]),
   "custom_filed_1": tensor(
       [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]]
   ),
}

SpaCy Tokenizer

By default, the spaCy tokenizer uses the multilingual model xx_sent_ud_sm. You can change it with the language parameter when initializing the tokenizer. For example, if you prefer an English tokenizer:

tokenizer = tre.Tokenizer("bert-base-cased", language="en_core_web_sm")

For a complete list of languages and models, see the spaCy models documentation.

To-Do

Future developments

Add an optional word tokenizer, maybe using SpaCy
Add add_special_tokens wrapper
Make pad_batch function more general
Add logic (like how to pad, etc.) for custom fields
  - Documentation
Include all model outputs
  - Documentation
A TensorFlow version (improbable)

Acknowledgements

Most of the code in the TransformerEmbedder class is taken from the AllenNLP library. The pretrained models and the core of the tokenizer are from 🤗 Transformers.

Comments

Minor improvements to the Embedder
The following changes have been applied:

Add support to average the last four hidden layers of the transformer model

~~Add a shape property to the TransformersEmbedderOutput class (referenced in the README)~~

Add option to specify which hidden states to use for pooling

Update the docs accordingly

Run black in compliance with the project specifications

Fix documentation issues
opened by LeonardoEmili 3
Update transformers requirement from <4.23,>=4.14 to >=4.14,<4.24
Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

v4.23.0: Whisper, Time series, Conditional DETR, MSN, MarkupLM, safetensors

Whisper

The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

The abstract from the paper is the following:

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zeroshot transfer setting without the need for any finetuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

Add WhisperModel to transformers by @ArthurZucker in #19166

Add TF whisper by @amyeroberts in #19378

Time series

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

:warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs or slight breaking changes to fix it in the future. If you see something strange, file a Github Issue.

time series forecasting model by @kashif in #17965

Conditional DETR

The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.

The abstract from the paper is the following:

The recently-developed DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergence, and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by that the cross-attention in DETR relies highly on the content embeddings for localizing the four extremities and predicting the box, which increases the need for high-quality content embeddings and thus the training difficulty. Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention. The benefit is that through the conditional spatial query, each cross-attention head is able to attend to a band containing a distinct region, e.g., one object extremity or a region inside the object box. This narrows down the spatial range for localizing the distinct regions for object classification and box regression, thus relaxing the dependence on the content embeddings and easing the training. Empirical results show that conditional DETR converges 6.7× faster for the backbones R50 and R101 and 10× faster for stronger backbones DC5-R50 and DC5-R101.

Add support for conditional detr by @DeppMeng in #18948

Improve conditional detr docs by @NielsRogge in #19154

Masked Siamese Networks

The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. The paper presents a joint-embedding architecture to match the prototypes of masked patches with that of the unmasked patches. With this setup, their method yields excellent performance in the low-shot and extreme low-shot regimes.

The abstract from the paper is the following:

We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark.

MSN (Masked Siamese Networks) for ViT by @sayakpaul in #18815

MarkupLM

The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to LayoutLM.

The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks:

WebSRC, a dataset for Web-Based Structural Reading Comprehension (a bit like SQuAD but for web pages)

SWDE, a dataset for information extraction from web pages (basically named-entity recognition on web pages)

The abstract from the paper is the following:

... (truncated)

Commits

9ae22fe Release: v4.23.0

df2f281 wrap forward passes with torch.no_grad() (#19412)

5f5e264 wrap forward passes with torch.no_grad() (#19413)

c6a928c wrap forward passes with torch.no_grad() (#19414)

d739a70 wrap forward passes with torch.no_grad() (#19416)

870a954 wrap forward passes with torch.no_grad() (#19438)

692c5be wrap forward passes with torch.no_grad() (#19439)

a7bc422 fix (#19469)

25cfd91 Fixed a non-working hyperlink in the README.md file (#19434)

9df953a Fix misspelled word in docstring (#19415)

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies
opened by dependabot[bot] 1
Update torch requirement from <1.12,>=1.7 to >=1.7,<1.13
Updates the requirements on torch to permit the latest version.

Release notes

Sourced from torch's releases.

PyTorch 1.12: TorchArrow, Functional API for Modules and nvFuser, are now available

PyTorch 1.12 Release Notes

Highlights

Backwards Incompatible Change

New Features

Improvements

Performance

Documentation

Highlights

We are excited to announce the release of PyTorch 1.12! This release is composed of over 3124 commits, 433 contributors. Along with 1.12, we are releasing beta versions of AWS S3 Integration, PyTorch Vision Models on Channels Last on CPU, Empowering PyTorch on Intel® Xeon® Scalable processors with Bfloat16 and FSDP API. We want to sincerely thank our dedicated community for your contributions.

Summary:

Functional Module API to functionally apply module computation with a given set of parameters

Complex32 and Complex Convolutions in PyTorch

DataPipes from TorchData fully backward compatible with DataLoader

Functorch with improved coverage for APIs

nvFuser a deep learning compiler for PyTorch

Changes to float32 matrix multiplication precision on Ampere and later CUDA hardware

TorchArrow, a new beta library for machine learning preprocessing over batch data

Backwards Incompatible changes

Python API

Updated type promotion for torch.clamp (#77035)

In 1.11, the ‘min’ and ‘max’ arguments in torch.clamp did not participate in type promotion, which made it inconsistent with minimum and maximum operations. In 1.12, the ‘min’ and ‘max’ arguments participate in type promotion.

1.11

>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float32

1.12

>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)

... (truncated)

Changelog

Sourced from torch's changelog.

Releasing PyTorch

General Overview

Cutting a release branch preparations

Cutting release branches

pytorch/pytorch

pytorch/builder / PyTorch domain libraries

Making release branch specific changes for PyTorch

Making release branch specific changes for domain libraries

Drafting RCs (Release Candidates) for PyTorch and domain libraries

Release Candidate Storage

Release Candidate health validation

Cherry Picking Fixes

Promoting RCs to Stable

Additional Steps to prepare for release day

Modify release matrix

Open Google Colab issue

Patch Releases

Patch Release Criteria

Patch Release Process

Triage

Building a release schedule / cherry picking

Building Binaries / Promotion to Stable

Hardware / Software Support in Binary Build Matrix

Python

TL;DR

Accelerator Software

Special support cases

Special Topics

Updating submodules for a release

General Overview

Releasing a new version of PyTorch generally entails 3 major steps:

Cutting a release branch preparations

Cutting a release branch and making release branch specific changes

Drafting RCs (Release Candidates), and merging cherry picks

Promoting RCs to stable and performing release day tasks

Cutting a release branch preparations

The following requirements need to be met prior to the final RC cut:

Resolve all outstanding issues in the milestones (for example 1.11.0) before the first RC cut is completed. After the RC cut is completed, the following script should be executed from the builder repo to validate the presence of the fixes in the release branch: python github_analyze.py --repo-path ~/local/pytorch --remote upstream --branch release/1.11 --milestone-id 26 --missing-in-branch

... (truncated)

Commits

67ece03 Disable AVX512 CPU dispatch by default (#80253) (#80356)

bcfb424 [JIT] Imbue stringbuf with C locale (#79929) (#79983)

8186aa7 [DataLoader] Share seed via Distributed Store to get rid of CUDA dependency (...

01d9324 nn: Disable nested tensor by default (#79884)

5009086 Fix release doc builds (#79865)

bfb6b24 [JIT] Nested fix (#79480) (#79816)

681a6e3 [v1.12.0] Fix non-reentrant hooks based checkpointing (#79490)

92437c6 Revert behavior of Dropout2d on 3D inputs to 1D channel-wise dropout behavior...

566286f Add Dropout1d module (#79610)

ac30861 [DataLoader] Fix the world_size when distributed sharding MapDataPipe (#79524...

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
Feature/sparse pooling
This PR adds a new pooling method for subwords (well, two, but the inefficient one is there only for benchmarking purposes). The sparse one is necessary for contexts where we want to enable CUDA determinism since scatter methods do not support it.

The script benchmark.py compares them, but I think that there is some mismatch in the approaches since these are the results (GPU: NVIDIA 2060S | CPU: AMD 3700X):

scatter == sparse (allclose with atol=1e-07): False
scatter == inefficient (allclose with atol=1e-07): False
sparse == inefficient (allclose with atol=1e-07): True
scatter 23.960102558135986s
sparse 23.518492221832275s
inefficient 24.366436004638672s

I wrote the "inefficient" pooling method as a control one, and it seems like the scatter method is not matching its results. I think the mismatch can be traced to something weird happening with the padded positions, but I didn't investigate further.

I could very well have implemented both the control and the sparse methods wrongly, so please double-check everything!

And thank you for the library, it is truly useful!
opened by Flegyas 1
Update transformers requirement from <4.17,>=4.3 to >=4.3,<4.18
Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

v4.17.0: XGLM, ConvNext, PoolFormer, PLBart, Data2Vec, MaskFormer and code in the Hub

New models

XGLM

The XGLM model was proposed in Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.

XGLM is a GPT3-like multilingual model trained on a balanced corpus covering a diverse set of languages.

Add XGLM models by @patil-suraj in huggingface/transformers#14876

ConvNext

The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.

ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.

Add ConvNeXT by @NielsRogge in huggingface/transformers#15277

Add TFConvNextModel by @sayakpaul in huggingface/transformers#15750

PoolFormer

The PoolFormer model was proposed in MetaFormer is Actually What You Need for Vision by Sea AI Labs.

Add PoolFormer by @heytanay in huggingface/transformers#15531

PLBart

The PLBART model was proposed in Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.

This is a BART-like model which can be used to perform code-summarization, code-generation, and code-translation tasks. The pre-trained model plbart-base has been trained using multilingual denoising task on Java, Python and English.

Add PLBart by @gchhablani in huggingface/transformers#13269

Add missing PLBart entry in README by @gchhablani in huggingface/transformers#15721

Data2Vec

The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli.

Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

Add Data2Vec by @edugp in huggingface/transformers#15507

Maskformer

The MaskFormer model was proposed in Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.

MaskFormer addresses semantic segmentation with a mask classification paradigm instead of performing classic pixel-level classification.

Maskformer by @FrancescoSaverioZuppichini in huggingface/transformers#15682

... (truncated)

Commits

198c335 [Doctests] Fix ignore bug and add more doc tests (#15911)

8529a85 [Fix link in pipeline doc] (#15906)

7e8ae01 Release: v4.17.0

3d22428 Update delete-dev-doc job to match build-dev-doc (#15891)

89be34c Fix SegformerForImageClassification (#15895)

130b987 [XGLM] run sampling test on CPU to be deterministic (#15892)

baab5e7 TF generate refactor - Sample (#15793)

96ae92b [SegFormer] Add deprecation warning (#15889)

8fd4731 Fix Bug in FlaxWav2Vec2 Slow Test (#15887)

d83d22f Maskformer (#15682)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
Update transformers requirement from <4.23,>=4.14 to >=4.14,<4.25
Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

v4.24.0: ESM-2/ESMFold, LiLT, Flan-T5, Table Transformer and Contrastive search decoding

ESM-2/ESMFold

ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, from 8 million parameters up to a huge 15 billion parameter model.

ESMFold is a state-of-the-art single sequence protein folding model which produces high accuracy predictions significantly faster. Unlike previous protein folding tools like AlphaFold2 and openfold, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without requiring any external databases or search/alignment tools to be present at inference time. This hugely reduces the time and compute requirements for folding.

Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

ESMFold was introduced in the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

Add ESMFold by @Rocketknight1 in #19977

TF port of ESM by @Rocketknight1 in #19587

LiLT

LiLT allows to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, to enable LayoutLM-like document understanding for many languages.

It was proposed in LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.

Add LiLT by @NielsRogge in #19450

Flan-T5

FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

It was released in the paper Scaling Instruction-Finetuned Language Models by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

Add flan-t5 documentation page by @younesbelkada in #19892

Table Transformer

Table Transformer is a model that can perform table extraction and table structure recognition from unstructured documents based on the DETR architecture.

It was proposed in PubTables-1M: Towards comprehensive table extraction from unstructured documents by Brandon Smock, Rohith Pesala, Robin Abraham.

Add table transformer [v2] by @NielsRogge in #19614

Contrastive search decoding

Contrastive search decoding is a new state-of-the-art generation method that aims to reduce the repetitive patterns into which generation models often fall.

It was introduced in A Contrastive Framework for Neural Text Generation by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by @gmftbyGMFTBY in #19477
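The idea behind contrastive search can be illustrated with a small, dependency-free sketch. It assumes the selection rule as described in the cited paper: among the top-k candidates, pick the token that maximizes model confidence minus a penalty for similarity to the hidden states of earlier tokens. The function names, the two-dimensional "hidden states", and the value of alpha are hypothetical and chosen only for illustration; this is not the code added to generation_utils.py.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_pick(candidates, context_states, alpha=0.6):
    """Pick one token from `candidates`.

    candidates: list of (token, probability, hidden_state) for the top-k tokens.
    context_states: hidden states of the tokens generated so far.
    alpha: weight of the degeneration penalty (alpha=0 reduces to greedy search).
    """
    def score(item):
        _, prob, state = item
        # Penalty: how close this candidate is to anything already generated.
        penalty = max(cosine(state, h) for h in context_states)
        return (1 - alpha) * prob - alpha * penalty

    return max(candidates, key=score)[0]


# Toy data (hypothetical): two previous tokens and two candidates.
context_states = [[1.0, 0.0], [0.9, 0.1]]
candidates = [
    ("again", 0.6, [1.0, 0.0]),  # more likely, but nearly identical to the context
    ("new", 0.4, [0.0, 1.0]),    # less likely, but dissimilar to the context
]

print(contrastive_pick(candidates, context_states))             # penalized choice
print(contrastive_pick(candidates, context_states, alpha=0.0))  # greedy choice
```

With the penalty switched off the more probable token wins; with it switched on, the candidate that repeats the context is demoted.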

Safety and security

We continue to explore a new serialization format that does not rely on Pickle, via the safetensors library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

... (truncated)

Commits

94b3f54 Unpin PyTorch for the release

8f95346 Add ESMFold code sample (#20000)

502d3b6 Remove pin temporarily to get tests

0e654e0 Added onnx config whisper (#19525)

1ebb3f7 Release v4.24.0

9c13b66 Unpin PyTorch

7f9b7b3 Add ESMFold (#19977)

4c9e0f0 Add support for gradient checkpointing (#19990)

8214a9f Pin torch to < 1.13 temporarily (#19989)

6aede2d Tranformers documentation translation to Italian #17459 (#19988)

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies
opened by dependabot[bot] 0

Update torch requirement from <1.13,>=1.7 to >=1.7,<1.14

Updates the requirements on torch to permit the latest version.
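To make the effect of the changed upper bound concrete, here is a minimal sketch that checks which torch versions fall inside the old and new ranges. It is a simplified stand-in for a real requirement parser: it only understands plain dotted numeric versions (no pre-release or local suffixes), and the helper names are hypothetical.

```python
def parse_version(text, width=3):
    """Turn '1.13' or '1.13.0' into a fixed-width tuple such as (1, 13, 0)."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))  # pad so '1.13' == '1.13.0'
    return tuple(parts)


def satisfies(version, lower_inclusive, upper_exclusive):
    """True if lower_inclusive <= version < upper_exclusive."""
    v = parse_version(version)
    return parse_version(lower_inclusive) <= v < parse_version(upper_exclusive)


old_range = ("1.7", "1.13")  # >=1.7,<1.13
new_range = ("1.7", "1.14")  # >=1.7,<1.14

for candidate in ["1.12.1", "1.13.0"]:
    print(
        candidate,
        "old:", satisfies(candidate, *old_range),
        "new:", satisfies(candidate, *new_range),
    )
```

Only the new range admits 1.13.0, which is the point of the update.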

Release notes

Sourced from torch's releases.

PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

PyTorch 1.13 Release Notes

Highlights

Backwards Incompatible Changes

New Features

Improvements

Performance

Documentation

Developers

Highlights

We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

Summary:

The BetterTransformer feature set supports fastpath execution for common Transformer models during inference out of the box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models, and Nested Tensors are now enabled by default.

Deprecating older CUDA versions in a timely manner allows us to adopt the latest CUDA versions as Nvidia® introduces them, which in turn allows support for C++17 in PyTorch and for the new NVIDIA Open GPU Kernel Modules.

Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package.

PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.

Stable: Better Transformer; CUDA 10.2 and 11.3 CI/CD Deprecation

Beta: Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs; Extend NNC to support channels last and bf16; Functorch now in PyTorch Core Library; Beta Support for M1 devices

Prototype: Arm® Compute Library backend support for AWS Graviton; CUDA Sanitizer

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

Python API

uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)

Prior to 1.13, key_padding_mask could be set to uint8 or other integer dtypes in TransformerEncoder and MultiheadAttention, which might generate unexpected results. In this release, these dtypes are not allowed for the mask anymore. Please convert them to torch.bool before using.

1.12.1
>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
# works before 1.13
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)


... (truncated)

Changelog

Sourced from torch's changelog.

Releasing PyTorch

General Overview

Cutting a release branch preparations

Cutting release branches

pytorch/pytorch

pytorch/builder / PyTorch domain libraries

Making release branch specific changes for PyTorch

Making release branch specific changes for domain libraries

Drafting RCs (Release Candidates) for PyTorch and domain libraries

Release Candidate Storage

Release Candidate health validation

Cherry Picking Fixes

Promoting RCs to Stable

Additional Steps to prepare for release day

Modify release matrix

Open Google Colab issue

Patch Releases

Patch Release Criteria

Patch Release Process

Triage

Building a release schedule / cherry picking

Building Binaries / Promotion to Stable

Hardware / Software Support in Binary Build Matrix

Python

TL;DR

Accelerator Software

Special support cases

Special Topics

Updating submodules for a release

General Overview

Releasing a new version of PyTorch generally entails 3 major steps:

Cutting a release branch preparations

Cutting a release branch and making release branch specific changes

Drafting RCs (Release Candidates), and merging cherry picks

Promoting RCs to stable and performing release day tasks

Cutting a release branch preparations

The following requirements need to be met prior to the final RC cut:

Resolve all outstanding issues in the milestones (for example, 1.11.0) before the first RC cut is completed. After the RC cut is completed, the following script should be executed from the builder repo to validate the presence of the fixes in the release branch: python github_analyze.py --repo-path ~/local/pytorch --remote upstream --branch release/1.11 --milestone-id 26 --missing-in-branch

... (truncated)

Commits

7c98e70 attempted fix for nvrtc with lovelace (#87611) (#87618)
4e1a4b1 fix docs push (#87498) (#87628)
341c377 Add General Project Policies (#87385) (#87613)
fdb18da Fix distributed issue by including distributed files (#87612)
8569a44 [MPS] Revamp copy_to_mps_ implementation (#87475)
6a8be2c [ONNX] Reland: Update training state logic to support ScriptedModule (#86745)...
f6c42ae Reenable isinstance with torch.distributed.ReduceOp (#87303) (#87463)
51fa4fa Move PadNd from ATen/native to ATen (#87456)
d3aecbd Delete torch::deploy from pytorch core (#85953) (#85953) (#87454)
d253eb2 Avoid calling logging.basicConfig (#86959) (#87455)
Additional commits viewable in compare view

dependencies

opened by dependabot[bot] 0

Update transformers requirement from <4.22,>=4.14 to >=4.14,<4.23
Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

v4.22.0: Swin Transformer v2, VideoMAE, Donut, Pegasus-X, X-CLIP, ERNIE

Swin Transformer v2

The Swin Transformer V2 model was proposed in Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

Add swin transformer v2 by @nandwalritik in #17469

VideoMAE

The VideoMAE model was proposed in VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked auto encoders (MAE) to video, claiming state-of-the-art performance on several video classification benchmarks.

Add VideoMAE by @NielsRogge in #17821

Donut

The Donut model was proposed in OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

Add Donut by @NielsRogge in #18488

Pegasus-X

The PEGASUS-X model was proposed in Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao and Peter J. Liu.

PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.

PEGASUS-X by @zphang in #18551

X-CLIP

The X-CLIP model was proposed in Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of CLIP for video. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

Add X-CLIP by @NielsRogge in #18852

ERNIE

ERNIE is a series of powerful models proposed by Baidu, particularly strong on Chinese tasks, including ERNIE1.0, ERNIE2.0, ERNIE3.0, ERNIE-Gram, ERNIE-health, etc. These models were contributed by nghuyong, and the official code can be found in PaddleNLP (in PaddlePaddle).

ERNIE-2.0 and ERNIE-3.0 models by @nghuyong in #18686

TensorFlow models

MobileViT and LayoutLMv3 are now available in TensorFlow.

TensorFlow MobileViT by @sayakpaul in #18555

[LayoutLMv3] Add TensorFlow implementation by @ChrisFugl in #18678

New task-specific architectures

A new question answering head was added for the LayoutLM model.

... (truncated)

Commits

ad11b79 Release: v4.22.0

2182378 fix GPT2 token's special_tokens_mask when used with add_bos_token=True (#...

680ad0d Re-add support for single url files in objects download (#19014)

c6415fa Fix MaskFormerFeatureExtractor instance segmentation preprocessing bug (#18997)

d5e1d21 Fix tokenizer for XLMRobertaXL (#19004)

470799b Removed issue in wav2vec link (#18945)

4c2e983 Fixed typo (#18921)

1182b94 TF: TF 2.10 unpin + related onnx test skips (#18995)

7f4708e added type hints (#18996)

39b5bb7 fix checkpoint name for wav2vec2 conformer (#18994)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Update transformers requirement from <4.21,>=4.14 to >=4.14,<4.22
Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

v4.21.0: TF XLA text generation - Custom Pipelines - OwlViT, NLLB, MobileViT, Nezha, GroupViT, MVP, CodeGen, UL2

TensorFlow XLA Text Generation

The TensorFlow text generation method can now be wrapped with tf.function and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and our benchmarks. You can also see XLA generation in action in our example notebooks, particularly for summarization and translation.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: tf.function and pad_to_multiple_of
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
for input_prompt in input_prompts:
    tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
    generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
    print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
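The role of pad_to_multiple_of in the snippet above can be sketched without TensorFlow. The assumption here is that a compiled function is specialized to its input shape, so every new sequence length triggers a fresh (slow) compilation; rounding lengths up to a multiple groups many lengths into a few shapes. The helper name and the sample lengths are hypothetical.

```python
def padded_length(num_tokens, multiple=32):
    """Round a sequence length up to the next multiple (ceiling division)."""
    return -(-num_tokens // multiple) * multiple


# Hypothetical token counts for five different prompts.
lengths = [9, 17, 31, 32, 33]
buckets = {n: padded_length(n) for n in lengths}

print(buckets)
print(len(set(buckets.values())), "distinct padded shapes instead of", len(set(lengths)))
```

A larger multiple means fewer distinct shapes at the cost of more padding per prompt.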

Generate: deprecate default max_length by @gante in #18018

TF: GPT-J compatible with XLA generation by @gante in #17986

TF: T5 can now handle a padded past (i.e. XLA generation) by @gante in #17969

TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible by @gante in #17857

TF: generate without tf.TensorArray by @gante in #17801

TF: BART compatible with XLA generation by @gante in #17479

New model additions

OwlViT

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.

Add OWL-ViT model for zero-shot object detection by @alaradirik in #17938

Fix OwlViT tests by @sgugger in #18253

NLLB

The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

[M2M100] update conversion script by @patil-suraj in #17916

NLLB tokenizer by @LysandreJik in #18126

... (truncated)

Commits

a9eee2f Release: v4.21.0

0daa202 Fix sacremoses sof dependency for Transofmers XL

31b3a12 sentencepiece shouldn't be required for the fast LayoutXLM tokenizer

3496ea8 Remove all uses of six (#18318)

9e564d0 fix loading from pretrained for sharded model with `torch_dtype="auto" (#18061)

36f9859 [EncoderDecoder] Improve docs (#18271)

3c45faa [DETR] Improve code examples (#18262)

ee67e7a patch for smddp import (#18244)

68097dc Fix Sylvain's nits on the original KerasMetricCallback PR (#18300)

6649133 Add PYTEST_TIMEOUT for CircleCI test jobs (#18251)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Update transformers requirement from <4.20,>=4.3 to >=4.3,<4.21
Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

v4.20.0 Big Model inference, BLOOM, CvT, GPT Neo-X, LayoutLMv3, LeViT, LongT5, M-CTC-T, Trajectory Transformer and Wav2Vec2-Conformer

Big model inference

You can now use the big model inference of Accelerate directly in any call to from_pretrained by specifying device_map="auto" (or your own device_map). It automatically loads the model onto your GPU(s) first, then offloads whatever doesn't fit to RAM, or even to the hard drive if you don't have enough RAM. The model can then be used normally for inference with nothing else to do.

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp", revision="sharded", device_map="auto"
)

Use Accelerate in from_pretrained for big model inference by @sgugger in #17341
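The placement order described above (GPU first, then RAM, then disk) can be pictured with a toy greedy allocator. This is only an illustration of the idea, not Accelerate's actual algorithm: the function name, layer names, sizes, and capacities are all hypothetical.

```python
def toy_device_map(layer_sizes, capacities):
    """Assign layers to devices in order, moving on when a device is full.

    layer_sizes: dict mapping layer name -> size (arbitrary units), in model order.
    capacities: ordered list of (device, budget), fastest device first.
    """
    placement = {}
    index = 0
    remaining = capacities[0][1]
    for name, size in layer_sizes.items():
        # Skip to the next device until the layer fits.
        while size > remaining:
            index += 1
            if index >= len(capacities):
                raise ValueError(f"layer {name!r} does not fit on any device")
            remaining = capacities[index][1]
        placement[name] = capacities[index][0]
        remaining -= size
    return placement


# Hypothetical model and hardware budgets.
layers = {"embed": 2, "block.0": 3, "block.1": 3, "block.2": 3, "lm_head": 2}
budgets = [("gpu", 6), ("cpu", 4), ("disk", 100)]

for layer, device in toy_device_map(layers, budgets).items():
    print(f"{layer:8s} -> {device}")
```

In this toy setup the first layers land on the GPU and the remainder spills to CPU memory and then disk, mirroring the fallback order in the text.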

BLOOM

The BLOOM model has been proposed, in its various versions, through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.

BLOOM by @younesbelkada in #17474

CvT

The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Add CvT by @NielsRogge and @AnugunjNaman in #17299

GPT Neo-X

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.

Adding GPT-NeoX-20B by @zphang in #16659

LayoutLMv3

LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).

Add LayoutLMv3 by @NielsRogge in #17060

LeViT

LeViT improves the Vision Transformer (ViT) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.

Adding LeViT Model by Facebook by @AnugunjNaman in #17466

LongT5

LongT5 is an extension of the T5 model that supports one of two efficient attention mechanisms: (1) local attention or (2) Transient-Global attention. It can handle input sequences of up to 16,384 tokens.
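To see why local attention helps with long inputs, consider a minimal sketch that assumes a symmetric window: each position may attend only to positions within a fixed radius. The function name, sequence length, and radius are hypothetical and far smaller than anything a real model would use; Transient-Global attention is not covered here.

```python
def local_attention_mask(seq_len, radius):
    """mask[i][j] is True when position i may attend to position j."""
    return [
        [abs(i - j) <= radius for j in range(seq_len)]
        for i in range(seq_len)
    ]


seq_len, radius = 6, 1
mask = local_attention_mask(seq_len, radius)

# Print the band-shaped pattern: 'x' = allowed, '.' = masked out.
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

allowed_pairs = sum(sum(row) for row in mask)
print(allowed_pairs, "allowed pairs out of", seq_len * seq_len)
```

The number of allowed pairs grows linearly with sequence length for a fixed radius, rather than quadratically as in full attention.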

... (truncated)

Commits

39b4aba Release: v4.20.0

90c8c01 Refine Bf16 test for deepspeed (#17734)

f8c8f4d Fix tf shared embedding (#17730)

3981ee8 Sort the model doc Toc Alphabetically (#17723)

66f8933 normalize keys_to_ignore (#17722)

c3c62b5 CLI: Add flag to push TF weights directly into main (#17720)

6ebeeee Update requirements.txt (#17719)

50415b8 Revert "Change push CI to run on workflow_run event (#17692)" (#17717)

7f14839 [Wav2Vec2Conformer] Official release (#17709)

242cc6e Documentation: RemBERT fixes (#17641)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Update transformers requirement from <4.19,>=4.3 to >=4.3,<4.20
Updates the requirements on transformers to permit the latest version.

Release notes

Sourced from transformers's releases.

v4.19.0: OPT, FLAVA, YOLOS, RegNet, TAPEX, Data2Vec vision, FSDP integration

Disclaimer: this release is the first release with no Python 3.6 support.

OPT

The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-source large causal language models whose performance is similar to that of GPT3.

Add OPT by @younesbelkada in #17088

FLAVA

The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and is accepted at CVPR 2022.

The paper aims at creating a single unified foundation model which can work across vision, language as well as vision-and-language multimodal tasks.

[feat] Add FLAVA model by @apsdehal in #16654

YOLOS

The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain Vision Transformer (ViT) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

Add YOLOS by @NielsRogge in #16848

RegNet

The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces to perform Neural Architecture Search (NAS). They first start from a high dimensional search space and iteratively reduce the search space by empirically applying constraints based on the best-performing models sampled by the current search space.

RegNet by @FrancescoSaverioZuppichini in #16188

TAPEX

The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions related to tabular data, as well as performing table fact checking.

Add TAPEX by @NielsRogge in #16473

Data2Vec: vision

The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model is added in v4.19.0.

[Data2Vec] Add data2vec vision by @patrickvonplaten in #16760

Add Data2Vec for Vision in TF by @sayakpaul in #17008

FSDP integration in Trainer

PyTorch recently upstreamed the Fairscale FSDP into PyTorch Distributed with additional optimizations. This PR aims to integrate it into the Trainer API.

... (truncated)

Commits

a22db88 Release: v4.19.0

9f16a1c Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning)...

a42242d migrate azure blob for beit checkpoints (#16902)

b971c76 Add OPT (#17088)

8c7481f ViT and Swin symbolic tracing with torch.fx (#17182)

1a68870 Fix contents in index.mdx to match docs' sidebar (#17198)

b17b788 Fix style error in Spanish docs (#17197)

1a66a6c Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples ...

e2d678b Documentation: Spanish translation of fast_tokenizers.mdx (#16882)

ae82da2 Added es version of language_modeling.mdx doc (#17021)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Update transformers requirement from <4.25,>=4.14 to >=4.14,<4.26
Updates the requirements on transformers to permit the latest version.

Commits

31d452c Release v4.25.1

7378726 Release: v4.25.0

e342ac7 Add some warning for Dynamo and enable TF32 when it's set (#20515)

68cfffc Fix Data2VecTextForCasualLM example code documentation (#20510)

dd6fb13 Add natten for CI (#20511)

afb6674 Update AutomaticSpeechRecognitionPipeline doc example (#20512)

04c653a Fix style

7217640 Add Chinese-CLIP implementation (#20368)

396a6a2 Fix minimum version for device_map (#20489)

08b4621 Repurpose torchdynamo training args towards torch._dynamo (#20498)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

Releases(3.0.8)

3.0.8(Nov 3, 2022)
What's Changed

Update torch requirement from <1.13,>=1.7 to >=1.7,<1.14 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/51

Update transformers requirement from <4.23,>=4.14 to >=4.14,<4.25 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/52

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.7...3.0.8
3.0.7(Oct 21, 2022)

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.6...3.0.7
3.0.6(Oct 10, 2022)

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.5...3.0.6
3.0.5(Oct 10, 2022)
What's Changed

Update transformers requirement from <4.22,>=4.14 to >=4.14,<4.23 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/49

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.4...3.0.5
3.0.4(Jul 28, 2022)
What's Changed

Update transformers requirement from <4.21,>=4.14 to >=4.14,<4.22 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/48

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.3...3.0.4
3.0.3(Jul 7, 2022)
What's Changed

Update torch requirement from <1.12,>=1.7 to >=1.7,<1.13 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/47

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.2...3.0.3
3.0.2(Jun 17, 2022)
What's Changed

Update transformers requirement from <4.20,>=4.3 to >=4.3,<4.21 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/46

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.1...3.0.2
3.0.1(May 30, 2022)
What's Changed

Update transformers requirement from <4.19,>=4.3 to >=4.3,<4.20 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/45

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.0...3.0.1
3.0.0(Apr 8, 2022)
What's Changed

Sparse pooling by @Flegyas. It is the default subword pooling strategy. It is deterministic, but it doesn't support ONNX runtime export.

The return_words parameter is now subword_pooling_strategy; its possible values are sparse, scatter and none.

Tokenizer accepts return_sparse_offsets during initialization. If you are using the scatter subword pooling strategy, you can set it to False to reduce memory usage.

Update transformers requirement from <4.18,>=4.3 to >=4.3,<4.19 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/44
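As an illustration of the two pooling strategies named above, the sketch below averages sub-token vectors into word vectors in two ways: by accumulating into per-word slots (scatter style) and by multiplying with a sparse averaging matrix (sparse style). This is plain Python written for this note, not the library's PyTorch implementation; the function names and the (row, column, weight) representation are assumptions, and every word is assumed to own at least one sub-token.

```python
def scatter_mean(subtoken_vectors, word_ids, num_words):
    """Accumulate each sub-token vector into its word slot, then divide by the count."""
    dim = len(subtoken_vectors[0])
    sums = [[0.0] * dim for _ in range(num_words)]
    counts = [0] * num_words
    for vector, word in zip(subtoken_vectors, word_ids):
        counts[word] += 1
        for i, value in enumerate(vector):
            sums[word][i] += value
    return [[s / c for s in row] for row, c in zip(sums, counts)]


def sparse_mean(subtoken_vectors, word_ids, num_words):
    """Multiply by a (words x sub-tokens) averaging matrix stored as sparse triples."""
    counts = [word_ids.count(w) for w in range(num_words)]
    # One non-zero entry per sub-token: (word row, sub-token column, 1 / word size).
    entries = [(word, col, 1.0 / counts[word]) for col, word in enumerate(word_ids)]
    dim = len(subtoken_vectors[0])
    out = [[0.0] * dim for _ in range(num_words)]
    for row, col, weight in entries:
        for i, value in enumerate(subtoken_vectors[col]):
            out[row][i] += weight * value
    return out


# Four sub-tokens, three words: word 1 is split into two sub-tokens.
vectors = [[1.0, 2.0], [2.0, 0.0], [4.0, 2.0], [0.0, 8.0]]
word_ids = [0, 1, 1, 2]
print(scatter_mean(vectors, word_ids, 3))  # [[1.0, 2.0], [3.0, 1.0], [0.0, 8.0]]
print(sparse_mean(vectors, word_ids, 3))   # [[1.0, 2.0], [3.0, 1.0], [0.0, 8.0]]
```

Both routes give the same word vectors here; they differ only in how the computation is organised.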

New Contributors

@Flegyas made their first contribution in https://github.com/Riccorl/transformers-embedder/pull/43

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.0.2...3.0.0
3.0.0rc1(Apr 8, 2022)
What's Changed

Update transformers requirement from <4.18,>=4.3 to >=4.3,<4.19 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/44

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.1.0b1...3.0.0rc1
2.1.0b1(Mar 23, 2022)
What's Changed

Feature/sparse pooling by @Flegyas in https://github.com/Riccorl/transformers-embedder/pull/43

New Contributors

@Flegyas made their first contribution in https://github.com/Riccorl/transformers-embedder/pull/43

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.0.2...2.1.0b1
2.0.2(Mar 15, 2022)

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.0.1...2.0.2
2.0.1(Mar 14, 2022)
What's Changed

Improve docstrings to be compliant with the new version by @LeonardoEmili in https://github.com/Riccorl/transformers-embedder/pull/40

Update torch requirement from <1.11,>=1.7 to >=1.7,<1.12 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/42

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.0.0...2.0.1
2.0.0(Mar 7, 2022)
Changelog

TransformersEmbedder:

Added TransformersEncoder class, which includes an Encoder module:

BatchNorm1d normalization layer

A Linear projection layer

Dropout

Swish activation function at the end

Changed pooling_strategy to layer_pooling_strategy

Tokenizer:

Underlying tokenization relies entirely on HuggingFace's PreTrainedTokenizer

Offsets are computed from the HuggingFace word_ids in BatchEncoding

Removed the spaCy dependency; it should be able to perform sub-word pooling even without pre-tokenization
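To show how word-level offsets can be derived from word_ids, here is a minimal sketch. It assumes the input is a list like the one returned by BatchEncoding.word_ids() for a fast tokenizer: one entry per sub-token, with None for special tokens. The helper word_spans and the (first, last) span format are assumptions made for this example and may not match the offsets the library actually produces.

```python
def word_spans(word_ids):
    """Map each word index to the (first, last) positions of its sub-tokens.

    `word_ids` has one entry per sub-token; None marks special tokens.
    """
    spans = {}
    for position, word in enumerate(word_ids):
        if word is None:
            continue  # special tokens such as [CLS] / [SEP] belong to no word
        first, _ = spans.get(word, (position, position))
        spans[word] = (first, position)
    return spans


# Hypothetical word_ids for a 3-word sentence whose second word has two sub-tokens.
example = [None, 0, 1, 1, 2, None]
print(word_spans(example))  # {0: (1, 1), 1: (2, 3), 2: (4, 4)}
```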

What's Changed

Improve documentation by @LeonardoEmili in https://github.com/Riccorl/transformers-embedder/pull/38

Merge 2.0b into main by @Riccorl in https://github.com/Riccorl/transformers-embedder/pull/39

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.9.0...2.0.0
1.9.0(Feb 18, 2022)
What's Changed

Minor improvements to the Embedder by @LeonardoEmili in https://github.com/Riccorl/transformers-embedder/pull/35

New Contributors

@LeonardoEmili made their first contribution in https://github.com/Riccorl/transformers-embedder/pull/35

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.8.5...1.9.0
1.8.5(Feb 9, 2022)
What's Changed

Update transformers requirement from <4.14,>=4.3 to >=4.3,<4.15 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/32

Update transformers requirement from <4.15,>=4.3 to >=4.3,<4.16 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/33

Update transformers requirement from <4.16,>=4.3 to >=4.3,<4.17 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/34

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.8.4...1.8.5
1.8.4(Dec 11, 2021)
Update README.md

Fix import bug.

1.8.3(Dec 11, 2021)
Update README.md

Merge pull request #30 from Riccorl/dependabot/pip/spacy-gte-3.0-and-lt-3.3

Update spacy requirement from <3.2,>=3.0 to >=3.0,<3.3

Merge pull request #31 from Riccorl/dependabot/pip/transformers-gte-4.3-and-lt-4.14

Update transformers requirement from <4.13,>=4.3 to >=4.3,<4.14

Fix output attention bug. Update dependencies.

1.8.2(Oct 29, 2021)
What's Changed

Update transformers requirement from <4.12,>=4.3 to >=4.3,<4.13 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/29

Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.8.1...1.8.2
1.8.1(Oct 23, 2021)
Merge pull request #27 from Riccorl/deepsource-transform-1020b285

Format code with black

Update readme

Update redme

Merge pull request #28 from Riccorl/dependabot/pip/torch-gte-1.7-and-lt-1.11

Update torch requirement from <1.10,>=1.7 to >=1.7,<1.11

Name changed to transformers_embedder, changed subword pooling logic, update actions.

Update actions

Update actions, update README

1.8(Oct 23, 2021)
Merge pull request #27 from Riccorl/deepsource-transform-1020b285

Format code with black

Update readme

Update redme

Merge pull request #28 from Riccorl/dependabot/pip/torch-gte-1.7-and-lt-1.11

Update torch requirement from <1.10,>=1.7 to >=1.7,<1.11

Name changed to transformers_embedder, changed subword pooling logic, update actions.

Update actions

Update actions, update README

1.7.16(Oct 1, 2021)
Added model config to parameters

Update Transformers version to 4.11

1.7.15(Sep 10, 2021)
Update dependencies:

Transformers from 4.9 to 4.10

1.7.14(Sep 9, 2021)
Update dependencies:

Transformers from 4.9 to 4.10

1.7.13(Jul 8, 2021)

Fix torch import when padding without tensors
1.7.12(Jul 8, 2021)

pip install transformer-embedder only installs the transformers dependency. To install spaCy and PyTorch, use pip install "transformer-embedder[spacy]" and pip install "transformer-embedder[torch]".
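Since the optional dependencies are no longer installed by default, it can help to check which of them are present before using features that need them. The following is a small sketch using only the standard library; available_extras is a hypothetical helper, and torch and spacy are the import names of PyTorch and spaCy.

```python
import importlib.util


def available_extras(modules=("torch", "spacy")):
    """Report which optional modules can be imported in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Prints a dict such as {'torch': ..., 'spacy': ...}, depending on what is installed.
print(available_extras())
```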
1.7.10(Jul 5, 2021)

Fix torch.Tensor deprecation.
1.7.5(Jul 4, 2021)
Update dependencies:

PyTorch 1.8 to 1.9

Transformers 4.5 to 4.8

1.7.3(Apr 19, 2021)

max_length parameter in pad_batch function
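The release note does not show how max_length behaves, so the sketch below only illustrates the general idea of padding a batch to a fixed length. pad_to_length is a hypothetical helper, not the library's pad_batch; its signature, the right-padding, and the error raised for over-long sequences are all assumptions.

```python
def pad_to_length(batch, pad_value=0, max_length=None):
    """Right-pad every sequence to max_length (default: the longest sequence)."""
    target = max_length if max_length is not None else max(len(seq) for seq in batch)
    if any(len(seq) > target for seq in batch):
        raise ValueError("max_length is shorter than the longest sequence")
    return [list(seq) + [pad_value] * (target - len(seq)) for seq in batch]


batch = [[101, 1188, 102], [101, 102]]
print(pad_to_length(batch))                # [[101, 1188, 102], [101, 102, 0]]
print(pad_to_length(batch, max_length=5))  # [[101, 1188, 102, 0, 0], [101, 102, 0, 0, 0]]
```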
1.7.2(Apr 14, 2021)

Update dependencies
