A Word Level Transformer layer based on PyTorch and 🤗 Transformers.

Overview

Transformer Embedder

How to use

Install the library from PyPI:

pip install transformer-embedder

It offers a PyTorch layer and a tokenizer that support almost every pretrained model from the Hugging Face 🤗 Transformers library. Here is a quick example:

import transformer_embedder as tre

tokenizer = tre.Tokenizer("bert-base-cased")
model = tre.TransformerEmbedder("bert-base-cased", subtoken_pooling="mean", output_layer="sum")

example = "This is a sample sentence"
inputs = tokenizer(example, return_tensors=True)
{
   'input_ids': tensor([[ 101, 1188, 1110,  170, 6876, 5650,  102]]),
   'offsets': tensor([[[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]]]),
   'attention_mask': tensor([[True, True, True, True, True, True, True]]),
   'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0]]),
   'sentence_length': 7  # with special tokens included
}
outputs = model(**inputs)
# shape of the word embeddings once [CLS] and [SEP] are removed
torch.Size([1, 5, 768])
# number of words in the example, i.e. len(example.split())
5
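
The offsets field can be read as a map from words to sub-token positions. The sketch below assumes that each pair holds the first and last sub-token index of one word, with special tokens counted as words of their own. The helper build_offsets and the toy pieces input are hypothetical and are not part of the library; plain Python lists are used so the snippet runs without PyTorch.

```python
from typing import List, Tuple


def build_offsets(pieces_per_word: List[List[str]]) -> List[Tuple[int, int]]:
    """Return one (first, last) sub-token index pair per word.

    Assumes every word has at least one sub-token.
    """
    offsets = []
    position = 0
    for pieces in pieces_per_word:
        first = position
        last = position + len(pieces) - 1
        offsets.append((first, last))
        position = last + 1
    return offsets


# Toy input: "[CLS] This is unbelievable [SEP]", where the third word
# is (hypothetically) split into three sub-tokens.
pieces = [["[CLS]"], ["This"], ["is"], ["un", "##believ", "##able"], ["[SEP]"]]
offsets = build_offsets(pieces)
print(offsets)  # [(0, 0), (1, 1), (2, 2), (3, 5), (6, 6)]
```

When every word maps to a single sub-token, as in the quick example, each pair degenerates to (i, i).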

Info

One of the annoyances of using transformer-based models is that it is not trivial to compute word embeddings from the sub-token embeddings they output. With this library, getting word-level embeddings is as easy as using the 🤗 Transformers API, for theoretically every transformer model it supports.

Model

The TransformerEmbedder offers four ways to retrieve the word embeddings, selected with the subtoken_pooling parameter:

  • first: uses only the embedding of the first sub-token of each word
  • last: uses only the embedding of the last sub-token of each word
  • mean: computes the mean of the embeddings of the sub-tokens of each word
  • none: returns the raw output of the transformer model without sub-token pooling
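
The pooling strategies listed above can be illustrated with a small framework-free sketch. It assumes that each word is described by a (first, last) pair of sub-token indices; the function pool_subtokens is hypothetical and only mirrors the behaviour described in the list, not the library's actual implementation.

```python
from typing import List, Tuple

Vector = List[float]


def pool_subtokens(
    embeddings: List[Vector],
    offsets: List[Tuple[int, int]],
    strategy: str = "first",
) -> List[Vector]:
    """Collapse sub-token vectors into one vector per word."""
    if strategy == "none":
        # No pooling: return the sub-token vectors untouched.
        return embeddings
    pooled = []
    for first, last in offsets:
        span = embeddings[first : last + 1]
        if strategy == "first":
            pooled.append(span[0])
        elif strategy == "last":
            pooled.append(span[-1])
        elif strategy == "mean":
            # Average each dimension over the sub-tokens of the word.
            pooled.append([sum(column) / len(span) for column in zip(*span)])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return pooled


# Three sub-tokens with 2-dimensional embeddings; the second word spans two of them.
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0]]
offsets = [(0, 0), (1, 2)]

print(pool_subtokens(embeddings, offsets, "first"))  # [[1.0, 2.0], [3.0, 4.0]]
print(pool_subtokens(embeddings, offsets, "last"))   # [[1.0, 2.0], [5.0, 8.0]]
print(pool_subtokens(embeddings, offsets, "mean"))   # [[1.0, 2.0], [4.0, 6.0]]
```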

There are also multiple types of output you can get using the output_layer parameter:

  • last: returns the last hidden state of the transformer model
  • concat: returns the concatenation of the last four hidden states of the transformer model
  • sum: returns the sum of the last four hidden states of the transformer model
  • pooled: returns the output of the pooling layer
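
As a rough illustration of the last, concat and sum options, the sketch below combines per-layer hidden states stored as nested Python lists. The function combine_layers is hypothetical, and the concatenation order (oldest to newest of the last four layers) is an assumption. The pooled option depends on the model's own pooling head and is left out.

```python
from typing import List

Matrix = List[List[float]]  # one row per token, one column per hidden unit


def combine_layers(hidden_states: List[Matrix], output_layer: str = "last") -> Matrix:
    """Combine per-layer hidden states into a single representation per token."""
    if output_layer == "last":
        return hidden_states[-1]
    last_four = hidden_states[-4:]
    if output_layer == "concat":
        # Join the four vectors of each token end to end (hidden size * 4).
        return [[x for row in rows for x in row] for rows in zip(*last_four)]
    if output_layer == "sum":
        # Add the four vectors of each token element-wise (hidden size unchanged).
        return [[sum(values) for values in zip(*rows)] for rows in zip(*last_four)]
    raise ValueError(f"unsupported output_layer: {output_layer}")


# Five layers, one token, hidden size 2; layer k holds the vector [k, 10 * k].
hidden_states = [[[float(k), 10.0 * k]] for k in range(1, 6)]

print(combine_layers(hidden_states, "last"))    # [[5.0, 50.0]]
print(combine_layers(hidden_states, "concat"))  # [[2.0, 20.0, 3.0, 30.0, 4.0, 40.0, 5.0, 50.0]]
print(combine_layers(hidden_states, "sum"))     # [[14.0, 140.0]]
```

Note that concat multiplies the embedding size by four, while sum keeps it equal to the model's hidden size.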

If you also want all the outputs from the HuggingFace model, you can set return_all=True to get them.

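from typing import Union

import torch
import transformers as tr


class TransformerEmbedder(torch.nn.Module):
    def __init__(
        self,
        model: Union[str, tr.PreTrainedModel],
        subtoken_pooling: str = "first",
        output_layer: str = "last",
        fine_tune: bool = True,
        return_all: bool = False,
    ) -> None:
        ...  # signature only; body omitted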

Tokenizer

The Tokenizer class provides the tokenize method to preprocess the input for the TransformerEmbedder layer. You can pass raw sentences, pre-tokenized sentences, and batches of sentences. It preprocesses them and returns a dictionary with the inputs for the model. Passing return_tensors=True returns the inputs as torch.Tensor objects.

By default, if you pass the text (or batch) as strings, it splits them on spaces:

text = "This is a sample sentence"
tokenizer(text)

text = ["This is a sample sentence", "This is another sample sentence"]
tokenizer(text)

You can also use SpaCy to pre-tokenize the inputs into words first, by setting use_spacy=True:

text = "This is a sample sentence"
tokenizer(text, use_spacy=True)

text = ["This is a sample sentence", "This is another sample sentence"]
tokenizer(text, use_spacy=True)

Or you can pass a pre-tokenized sentence (or batch of sentences) by setting is_split_into_words=True:

text = ["This", "is", "a", "sample", "sentence"]
tokenizer(text, is_split_into_words=True)

text = [
    ["This", "is", "a", "sample", "sentence", "1"],
    ["This", "is", "sample", "sentence", "2"],
]
tokenizer(text, is_split_into_words=True) # here is_split_into_words is redundant
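
The reason is_split_into_words matters only for a flat list of strings can be seen in a small sketch of how the accepted input shapes might be normalized. The helper to_word_batch is hypothetical and is not the library's code; it only reproduces the behaviour described above (whitespace splitting by default, nesting treated as pre-tokenized).

```python
from typing import List, Union

TextInput = Union[str, List[str], List[List[str]]]


def to_word_batch(text: TextInput, is_split_into_words: bool = False) -> List[List[str]]:
    """Normalize the accepted input shapes to a batch of word lists."""
    if isinstance(text, str):
        # A single raw sentence: split on whitespace.
        return [text.split()]
    if text and isinstance(text[0], list):
        # A batch of pre-tokenized sentences: the nesting already says so.
        return [list(words) for words in text]
    if is_split_into_words:
        # A flat list of strings is one pre-tokenized sentence.
        return [list(text)]
    # Otherwise a flat list of strings is a batch of raw sentences.
    return [sentence.split() for sentence in text]


print(to_word_batch("This is a sample sentence"))
# [['This', 'is', 'a', 'sample', 'sentence']]
print(to_word_batch(["This is a sample sentence", "This is another sample sentence"]))
# [['This', 'is', 'a', 'sample', 'sentence'], ['This', 'is', 'another', 'sample', 'sentence']]
print(to_word_batch(["This", "is", "a", "sample", "sentence"], is_split_into_words=True))
# [['This', 'is', 'a', 'sample', 'sentence']]
```

A flat list of strings is ambiguous (one pre-tokenized sentence or a batch of raw sentences), so the flag decides; a list of lists is unambiguous, which is why the flag is redundant there.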

Examples

First, initialize the tokenizer:

import transformer_embedder as tre

tokenizer = tre.Tokenizer("bert-base-cased")
  • You can pass a single sentence as a string:
text = "This is a sample sentence"
tokenizer(text)
{
  'input_ids': [101, 1188, 1110, 170, 6876, 5650, 102],
  'offsets': [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)],
  'attention_mask': [True, True, True, True, True, True, True],
  'token_type_ids': [0, 0, 0, 0, 0, 0, 0],
  'sentence_length': 7
}
  • A sentence pair
text = "This is a sample sentence A"
text_pair = "This is a sample sentence B"
tokenizer(text, text_pair)
{
  'input_ids': [101, 1188, 1110, 170, 6876, 5650, 138, 102, 1188, 1110, 170, 6876, 5650, 139, 102],
  'offsets': [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14)],
  'attention_mask': [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True],
  'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
  'sentence_length': 15
}
  • A batch of sentences or sentence pairs. Using padding=True and return_tensors=True, the tokenizer returns the text ready for the model
batch = [
    ["This", "is", "a", "sample", "sentence", "1"],
    ["This", "is", "sample", "sentence", "2"],
    ["This", "is", "a", "sample", "sentence", "3"],
    # ...
    ["This", "is", "a", "sample", "sentence", "n", "for", "batch"],
]
tokenizer(batch, padding=True, return_tensors=True)

batch_pair = [
    ["This", "is", "a", "sample", "sentence", "pair", "1"],
    ["This", "is", "sample", "sentence", "pair", "2"],
    ["This", "is", "a", "sample", "sentence", "pair", "3"],
    # ...
    ["This", "is", "a", "sample", "sentence", "pair", "n", "for", "batch"],
]
tokenizer(batch, batch_pair, padding=True, return_tensors=True)
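
The sentence-pair output shown earlier follows the usual BERT layout, [CLS] A [SEP] B [SEP], with token_type_ids marking the segment of each position. The sketch below rebuilds that layout from the sub-token ids printed in the example; join_pair is a hypothetical helper, and the ids 101 and 102 are taken from the outputs above as the [CLS] and [SEP] ids of bert-base-cased.

```python
from typing import Dict, List


def join_pair(
    first: List[int],
    second: List[int],
    cls_id: int = 101,
    sep_id: int = 102,
) -> Dict[str, List[int]]:
    """Assemble [CLS] first [SEP] second [SEP] and mark the segment of each position."""
    input_ids = [cls_id] + first + [sep_id] + second + [sep_id]
    first_length = len(first) + 2  # [CLS] + first sentence + [SEP]
    second_length = len(second) + 1  # second sentence + closing [SEP]
    token_type_ids = [0] * first_length + [1] * second_length
    return {"input_ids": input_ids, "token_type_ids": token_type_ids}


# Sub-token ids copied from the sentence-pair example.
sentence_a = [1188, 1110, 170, 6876, 5650, 138]
sentence_b = [1188, 1110, 170, 6876, 5650, 139]
pair = join_pair(sentence_a, sentence_b)
print(len(pair["input_ids"]))  # 15
print(pair["token_type_ids"])  # eight 0s followed by seven 1s
```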

Custom fields

It is possible to add custom fields to the model input and tell the tokenizer how to pad them using add_padding_ops. Start by simply tokenizing the input (without padding or tensor mapping):

import transformer_embedder as tre

tokenizer = tre.Tokenizer("bert-base-cased")

text = [
    ["This", "is", "a", "sample", "sentence"],
    ["This", "is", "another", "example", "sentence", "just", "make", "it", "longer"]
]
inputs = tokenizer(text)

Then add the custom fields to the result:

custom_fields = {
  "custom_filed_1": [
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
  ]
}

inputs.update(custom_fields)

Now we can add the padding logic for our custom field custom_filed_1. The add_padding_ops method takes the following arguments:

  • key: name of the field in the tokenizer input
  • value: value to use for padding
  • length: length to pad to. It can be an int or one of two string values: subtoken, which pads the element to the batch maximum length measured in sub-tokens, or word, which pads it to the batch maximum length measured in words
tokenizer.add_padding_ops("custom_filed_1", 0, "word")
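
What a registered padding rule does to a field can be sketched in a few lines. The helper pad_field is hypothetical and covers only the padding step itself: it assumes right-padding, and the choice between "word" and "subtoken" would only change which target length is used.

```python
from typing import List, Optional


def pad_field(rows: List[List[int]], value: int, length: Optional[int] = None) -> List[List[int]]:
    """Right-pad every row with `value` up to `length` (default: the longest row).

    Rows that are already at least `length` long are returned unchanged.
    """
    target = max(len(row) for row in rows) if length is None else length
    return [row + [value] * (target - len(row)) for row in rows]


custom_field = [
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
]
padded = pad_field(custom_field, value=0)
print(padded[0])  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(padded[1])  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```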

Finally, pad the input and convert it to a tensor:

# manual processing
inputs = tokenizer.pad_batch(inputs)
inputs = tokenizer.to_tensor(inputs)

The inputs are ready for the model, including the custom field.

>>> inputs

{
   "input_ids": tensor(
       [
           [101, 1188, 1110, 170, 6876, 5650, 102, 0, 0, 0, 0],
           [101, 1188, 1110, 1330, 1859, 5650, 1198, 1294, 1122, 2039, 102],
       ]
   ),
   "offsets": tensor(
       [
           [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6], [7, 7], [-1, -1], [-1, -1], [-1, -1]],
           [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6], [7, 7], [8, 8], [9, 9], [10, 10]],
       ]
   ),
   "attention_mask": tensor(
       [
           [True, True, True, True, True, True, True, False, False, False, False],
           [True, True, True, True, True, True, True, True, True, True, True],
       ]
   ),
   "word_mask": tensor(
       [
           [True, True, True, True, True, True, True, False, False, False, False],
           [True, True, True, True, True, True, True, True, True, True, True],
       ]
   ),
   "token_type_ids": tensor(
       [[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
   ),
   "sentence_length": tensor([7, 11]),
   "custom_filed_1": tensor(
       [[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]]
   ),
}

SpaCy Tokenizer

When use_spacy=True, the tokenizer uses the multilingual model xx_sent_ud_sm by default. You can change it with the language parameter during the tokenizer initialization. For example, if you prefer an English tokenizer:

tokenizer = tre.Tokenizer("bert-base-cased", language="en_core_web_sm")

For a complete list of languages and models, see the spaCy models documentation.

To-Do

Future developments

  • Add an optional word tokenizer, maybe using SpaCy
  • Add add_special_tokens wrapper
  • Make pad_batch function more general
  • Add logic (like how to pad, etc) for custom fields
    • Documentation
  • Include all model outputs
    • Documentation
  • A TensorFlow version (improbable)

Acknowledgements

Most of the code in the TransformerEmbedder class is taken from the AllenNLP library. The pretrained models and the core of the tokenizer are from 🤗 Transformers.

Comments
  • Minor improvements to the Embedder

    The following changes have been applied:

    • Add support to average the last four hidden layers of the transformer model
    • ~~Add a shape property to the TransformersEmbedderOutput class (referenced in the README)~~
    • Add option to specify which hidden states to use for pooling
    • Update the docs accordingly
    • Run black in compliance with the project specifications
    • Fix documentation issues
    opened by LeonardoEmili 3
  • Update transformers requirement from <4.23,>=4.14 to >=4.14,<4.24

    Updates the requirements on transformers to permit the latest version.

    Release notes

    Sourced from transformers's releases.

    v4.23.0: Whisper, Time series, Conditional DETR, MSN, MarkupLM, safetensors

    Whisper

    The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

    The abstract from the paper is the following:

    We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zeroshot transfer setting without the need for any finetuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

    Time series

    The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

    Warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs or slight breaking changes to fix it in the future. If you see something strange, file a Github Issue.

    Conditional DETR

    The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.

    The abstract from the paper is the following:

    The recently-developed DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergence, and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by that the cross-attention in DETR relies highly on the content embeddings for localizing the four extremities and predicting the box, which increases the need for high-quality content embeddings and thus the training difficulty. Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention. The benefit is that through the conditional spatial query, each cross-attention head is able to attend to a band containing a distinct region, e.g., one object extremity or a region inside the object box. This narrows down the spatial range for localizing the distinct regions for object classification and box regression, thus relaxing the dependence on the content embeddings and easing the training. Empirical results show that conditional DETR converges 6.7× faster for the backbones R50 and R101 and 10× faster for stronger backbones DC5-R50 and DC5-R101.

    Masked Siamese Networks

    The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. The paper presents a joint-embedding architecture to match the prototypes of masked patches with that of the unmasked patches. With this setup, their method yields excellent performance in the low-shot and extreme low-shot regimes.

    The abstract from the paper is the following:

    We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark.

    MarkupLM

    The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to LayoutLM.

    The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks:

    • WebSRC, a dataset for Web-Based Structural Reading Comprehension (a bit like SQuAD but for web pages)
    • SWDE, a dataset for information extraction from web pages (basically named-entity recognition on web pages)

    The abstract from the paper is the following:

    ... (truncated)

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 1
  • Update torch requirement from <1.12,>=1.7 to >=1.7,<1.13

    Updates the requirements on torch to permit the latest version.

    Release notes

    Sourced from torch's releases.

    PyTorch 1.12: TorchArrow, Functional API for Modules and nvFuser, are now available

    PyTorch 1.12 Release Notes

    • Highlights
    • Backwards Incompatible Change
    • New Features
    • Improvements
    • Performance
    • Documentation

    Highlights

    We are excited to announce the release of PyTorch 1.12! This release is composed of over 3124 commits, 433 contributors. Along with 1.12, we are releasing beta versions of AWS S3 Integration, PyTorch Vision Models on Channels Last on CPU, Empowering PyTorch on Intel® Xeon® Scalable processors with Bfloat16 and FSDP API. We want to sincerely thank our dedicated community for your contributions.

    Summary:

    • Functional Module API to functionally apply module computation with a given set of parameters
    • Complex32 and Complex Convolutions in PyTorch
    • DataPipes from TorchData fully backward compatible with DataLoader
    • Functorch with improved coverage for APIs
    • nvFuser a deep learning compiler for PyTorch
    • Changes to float32 matrix multiplication precision on Ampere and later CUDA hardware
    • TorchArrow, a new beta library for machine learning preprocessing over batch data

    Backwards Incompatible changes

    Python API

    Updated type promotion for torch.clamp (#77035)

    In 1.11, the ‘min’ and ‘max’ arguments in torch.clamp did not participate in type promotion, which made it inconsistent with minimum and maximum operations. In 1.12, the ‘min’ and ‘max’ arguments participate in type promotion.

    1.11

    >>> import torch
    >>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
    >>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
    >>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
    >>> torch.clamp(a, b, c).dtype
    torch.float32
    

    1.12

    >>> import torch
    >>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
    >>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
    >>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)

    ... (truncated)

    Changelog

    Sourced from torch's changelog.

    Releasing PyTorch

    General Overview

    Releasing a new version of PyTorch generally entails 4 major steps:

    1. Cutting a release branch preparations
    2. Cutting a release branch and making release branch specific changes
    3. Drafting RCs (Release Candidates), and merging cherry picks
    4. Promoting RCs to stable and performing release day tasks

    Cutting a release branch preparations

    Following Requirements needs to be met prior to final RC Cut:

    • Resolve all outstanding issues in the milestones(for example 1.11.0)before first RC cut is completed. After RC cut is completed following script should be executed from builder repo in order to validate the presence of the fixes in the release branch : python github_analyze.py --repo-path ~/local/pytorch --remote upstream --branch release/1.11 --milestone-id 26 --missing-in-branch

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Feature/sparse pooling

    This PR adds a new pooling method for subwords (well, two, but the inefficient one is there only for benchmarking purposes). The sparse one is necessary for contexts where we want to enable CUDA determinism since scatter methods do not support it.

    The script benchmark.py compares them, but I think that there is some mismatch in the approaches since these are the results (GPU: NVIDIA 2060S | CPU: AMD 3700X):

    scatter == sparse (allclose with atol=1e-07): False
    scatter == inefficient (allclose with atol=1e-07): False
    sparse == inefficient (allclose with atol=1e-07): True
    scatter 23.960102558135986s
    sparse 23.518492221832275s
    inefficient 24.366436004638672s
    

    I wrote the "inefficient" pooling method as a control one, and it seems like the scatter method is not matching its results. I think the mismatch can be traced to something weird happening with the padded positions, but I didn't investigate further.

    I could very well have implemented both the control and the sparse methods wrongly, so please double-check everything!
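
One way to picture a scatter-free mean pooling and a control implementation to compare it against is the sketch below. It is not the code from this PR: it uses plain Python lists and a dense averaging matrix, where a real implementation would presumably use a sparse tensor, and all function names are hypothetical. It assumes each word is given as a (first, last) pair of sub-token indices.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def averaging_matrix(offsets: List[Tuple[int, int]], num_subtokens: int) -> Matrix:
    """Row w holds 1/len(span) for every sub-token of word w and 0 elsewhere."""
    matrix = []
    for first, last in offsets:
        weight = 1.0 / (last - first + 1)
        matrix.append([weight if first <= j <= last else 0.0 for j in range(num_subtokens)])
    return matrix


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)] for row in a]


def mean_pool_loop(embeddings: Matrix, offsets: List[Tuple[int, int]]) -> Matrix:
    """Control implementation: average each span explicitly."""
    pooled = []
    for first, last in offsets:
        span = embeddings[first : last + 1]
        pooled.append([sum(column) / len(span) for column in zip(*span)])
    return pooled


embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0], [0.5, 0.25]]
offsets = [(0, 0), (1, 2), (3, 3)]

via_matrix = matmul(averaging_matrix(offsets, len(embeddings)), embeddings)
via_loop = mean_pool_loop(embeddings, offsets)
matches = all(
    math.isclose(x, y, abs_tol=1e-7)
    for row_a, row_b in zip(via_matrix, via_loop)
    for x, y in zip(row_a, row_b)
)
print(matches)  # True
```

Because the pooling is a single matrix product, it avoids scatter operations entirely; the explicit loop serves only as a reference to check the result against.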

    And thank you for the library, it is truly useful!

    opened by Flegyas 1
  • Update transformers requirement from <4.17,>=4.3 to >=4.3,<4.18

    Updates the requirements on transformers to permit the latest version.

    Release notes

    Sourced from transformers's releases.

    v4.17.0: XGLM, ConvNext, PoolFormer, PLBart, Data2Vec, MaskFormer and code in the Hub

    New models

    XGLM

    The XGLM model was proposed in Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.

    XGLM is a GPT3-like multilingual model trained on a balanced corpus covering a diverse set of languages.

    ConvNext

    The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.

    ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.

    PoolFormer

    The PoolFormer model was proposed in MetaFormer is Actually What You Need for Vision by Sea AI Labs.

    PLBart

    The PLBART model was proposed in Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.

    This is a BART-like model which can be used to perform code-summarization, code-generation, and code-translation tasks. The pre-trained model plbart-base has been trained using multilingual denoising task on Java, Python and English.

    Data2Vec

    The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli.

    Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

    Maskformer

    The MaskFormer model was proposed in Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.

    MaskFormer addresses semantic segmentation with a mask classification paradigm instead of performing classic pixel-level classification.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Update transformers requirement from <4.23,>=4.14 to >=4.14,<4.25

    Updates the requirements on transformers to permit the latest version.

    Release notes

    Sourced from transformers's releases.

    v4.24.0: ESM-2/ESMFold, LiLT, Flan-T5, Table Transformer and Contrastive search decoding

    ESM-2/ESMFold

    ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, from 8 million parameters up to a huge 15 billion parameter model.

    ESMFold is a state-of-the-art single sequence protein folding model which produces high accuracy predictions significantly faster. Unlike previous protein folding tools like AlphaFold2 and openfold, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without requiring any external databases or search/alignment tools to be present at inference time. This hugely reduces the time and compute requirements for folding.

    Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

    ESMFold was introduced in the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

    LiLT

    LiLT allows to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, to enable LayoutLM-like document understanding for many languages.

    It was proposed in LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.

    Flan-T5

    FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

    It was released in the paper Scaling Instruction-Finetuned Language Models by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

    Table Transformer

    Table Transformer is a model that can perform table extraction and table structure recognition from unstructured documents based on the DETR architecture.

    It was proposed in PubTables-1M: Towards comprehensive table extraction from unstructured documents by Brandon Smock, Rohith Pesala, Robin Abraham.

    Contrastive search decoding

    Contrastive search decoding is a new state-of-the-art generation method which aims at reducing the repetitive patterns in which generation models often fall.

    It was introduced in A Contrastive Framework for Neural Text Generation by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

    • Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by @gmftbyGMFTBY in #19477

    Safety and security

    We continue to explore the new serialization format not using Pickle via the safetensors library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Update torch requirement from <1.13,>=1.7 to >=1.7,<1.14

    Update torch requirement from <1.13,>=1.7 to >=1.7,<1.14

    Updates the requirements on torch to permit the latest version.
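A minimal sketch of what the new range `>=1.7,<1.14` permits, assuming plain dotted release numbers only. The helper names `parse_version` and `in_range` are hypothetical and not part of pip or this library; real specifier handling (pre-releases, wildcards, and so on) follows PEP 440 and is more involved. The point illustrated here is that versions must be compared numerically, not as strings.

```python
def parse_version(text):
    """Turn a dotted release string such as "1.13.0" into a tuple of ints.

    Assumption: only numeric components; suffixes like "rc1" are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower, upper):
    """Return True when lower <= version < upper, comparing numerically."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# torch 1.13.0 is rejected by the old range and accepted by the new one.
old_ok = in_range("1.13.0", "1.7", "1.13")
new_ok = in_range("1.13.0", "1.7", "1.14")
print(old_ok, new_ok)
```

Comparing the raw strings would give the wrong answer, because `"1.13" < "1.7"` holds lexicographically, which is why the sketch converts each component to an integer first.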

    Release notes

    Sourced from torch's releases.

    PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

    PyTorch 1.13 Release Notes

    • Highlights
    • Backwards Incompatible Changes
    • New Features
    • Improvements
    • Performance
    • Documentation
    • Developers

    Highlights

    We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

    Summary:

    • The BetterTransformer feature set supports fastpath execution for common Transformer models during Inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models and Nested Tensors is now enabled by default.

    • Timely deprecating older CUDA versions allows us to proceed with introducing the latest CUDA versions as they are released by Nvidia®, and hence allows support for C++17 in PyTorch and new NVIDIA Open GPU Kernel Modules.

    • Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package.

    • PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.

    Stable:
    • Better Transformer
    • CUDA 10.2 and 11.3 CI/CD Deprecation

    Beta:
    • Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs
    • Extend NNC to support channels last and bf16
    • Functorch now in PyTorch Core Library
    • Beta Support for M1 devices

    Prototype:
    • Arm® Compute Library backend support for AWS Graviton
    • CUDA Sanitizer

    You can check the blogpost that shows the new features here.

    Backwards Incompatible changes

    Python API

    uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)

    Prior to 1.13, key_padding_mask could be set to uint8 or other integer dtypes in TransformerEncoder and MultiheadAttention, which might generate unexpected results. In this release, these dtypes are not allowed for the mask anymore. Please convert them to torch.bool before using.

    1.12.1

    >>> layer = nn.TransformerEncoderLayer(2, 4, 2)
    >>> encoder = nn.TransformerEncoder(layer, 2)
    >>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
    >>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
    # works before 1.13
    >>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
    

    ... (truncated)

    Changelog

    Sourced from torch's changelog.

    Releasing PyTorch

    General Overview

    Releasing a new version of PyTorch generally entails 4 major steps:

    1. Cutting a release branch preparations
    2. Cutting a release branch and making release branch specific changes
    3. Drafting RCs (Release Candidates), and merging cherry picks
    4. Promoting RCs to stable and performing release day tasks

    Cutting a release branch preparations

    The following requirements need to be met prior to the final RC cut:

    • Resolve all outstanding issues in the milestones (for example 1.11.0) before the first RC cut is completed. After the RC cut is completed, the following script should be executed from the builder repo to validate the presence of the fixes in the release branch: python github_analyze.py --repo-path ~/local/pytorch --remote upstream --branch release/1.11 --milestone-id 26 --missing-in-branch

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Update transformers requirement from <4.22,>=4.14 to >=4.14,<4.23

    Update transformers requirement from <4.22,>=4.14 to >=4.14,<4.23

    Updates the requirements on transformers to permit the latest version.

    Release notes

    Sourced from transformers's releases.

    v4.22.0: Swin Transformer v2, VideoMAE, Donut, Pegasus-X, X-CLIP, ERNIE

    Swin Transformer v2

    The Swin Transformer V2 model was proposed in Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

    VideoMAE

    The VideoMAE model was proposed in VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked auto encoders (MAE) to video, claiming state-of-the-art performance on several video classification benchmarks.

    Donut

    The Donut model was proposed in OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

    Pegasus-X

    The PEGASUS-X model was proposed in Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao and Peter J. Liu.

    PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.

    X-CLIP

    The X-CLIP model was proposed in Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of CLIP for video. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

    ERNIE

    ERNIE is a series of powerful models proposed by Baidu, especially for Chinese tasks, including ERNIE1.0, ERNIE2.0, ERNIE3.0, ERNIE-Gram, ERNIE-health, etc. These models are contributed by nghuyong and the official code can be found in PaddleNLP (in PaddlePaddle).

    TensorFlow models

    MobileViT and LayoutLMv3 are now available in TensorFlow.

    New task-specific architectures

    A new question answering head was added for the LayoutLM model.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Update transformers requirement from <4.21,>=4.14 to >=4.14,<4.22

    Update transformers requirement from <4.21,>=4.14 to >=4.14,<4.22

    Updates the requirements on transformers to permit the latest version.

    Release notes

    Sourced from transformers's releases.

    v4.21.0: TF XLA text generation - Custom Pipelines - OwlViT, NLLB, MobileViT, Nezha, GroupViT, MVP, CodeGen, UL2

    TensorFlow XLA Text Generation

    The TensorFlow text generation method can now be wrapped with tf.function and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and our benchmarks. You can also see XLA generation in action in our example notebooks, particularly for summarization and translation.

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # Main changes with respect to the original generate workflow: tf.function and pad_to_multiple_of
    xla_generate = tf.function(model.generate, jit_compile=True)
    tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

    # The first prompt will be slow (compiling), the others will be very fast!
    input_prompts = [
        f"translate English to {language}: I have four cats and three dogs."
        for language in ["German", "French", "Romanian"]
    ]
    for input_prompt in input_prompts:
        tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
        generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
        print(tokenizer.decode(generated_text[0], skip_special_tokens=True))

    New model additions

    OwlViT

    The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.

    NLLB

    The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Update transformers requirement from <4.20,>=4.3 to >=4.3,<4.21

    Update transformers requirement from <4.20,>=4.3 to >=4.3,<4.21

    Updates the requirements on transformers to permit the latest version.

    Release notes

    Sourced from transformers's releases.

    v4.20.0 Big Model inference, BLOOM, CvT, GPT Neo-X, LayoutLMv3, LeViT, LongT5, M-CTC-T, Trajectory Transformer and Wav2Vec2-Conformer

    Big model inference

    You can now use the big model inference of Accelerate directly in any call to from_pretrained by specifying device_map="auto" (or your own device_map). It will automatically load the model taking advantage of your GPU(s) then offloading what doesn't fit in RAM, or even on the hard drive if you don't have RAM. Your model can then be used normally for inference without anything else to do.

    from transformers import AutoModelForSeq2SeqLM

    model = AutoModelForSeq2SeqLM.from_pretrained(
        "bigscience/T0pp", revision="sharded", device_map="auto"
    )

    BLOOM

    The BLOOM model has been proposed with its various versions through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT3 (auto-regressive model for next token prediction), but has been trained on 46 different languages, including code.

    CvT

    The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

    GPT Neo-X

    GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.

    LayoutLMv3

    LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).

    LeViT

    LeViT improves the Vision Transformer (ViT) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.

    LongT5

    The LongT5 model is an extension of the T5 model that enables one of two efficient attention mechanisms: (1) Local attention, or (2) Transient-Global attention. It can handle input sequences of up to 16,384 tokens.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Update transformers requirement from <4.19,>=4.3 to >=4.3,<4.20

    Update transformers requirement from <4.19,>=4.3 to >=4.3,<4.20

    Updates the requirements on transformers to permit the latest version.

    Release notes

    Sourced from transformers's releases.

    v4.19.0: OPT, FLAVA, YOLOS, RegNet, TAPEX, Data2Vec vision, FSDP integration

    Disclaimer: this release is the first release with no Python 3.6 support.

    OPT

    The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-sourced large causal language models whose performance is similar to that of GPT3.

    FLAVA

    The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and is accepted at CVPR 2022.

    The paper aims at creating a single unified foundation model which can work across vision, language as well as vision-and-language multimodal tasks.

    YOLOS

    The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain Vision Transformer (ViT) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

    RegNet

    The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

    The authors design search spaces to perform Neural Architecture Search (NAS). They first start from a high dimensional search space and iteratively reduce the search space by empirically applying constraints based on the best-performing models sampled by the current search space.

    TAPEX

    The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions related to tabular data, as well as performing table fact checking.

    Data2Vec: vision

    The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

    The vision model is added in v4.19.0.

    FSDP integration in Trainer

    PyTorch recently upstreamed the Fairscale FSDP into PyTorch Distributed with additional optimizations. This PR is aimed at integrating it into Trainer API.

    ... (truncated)

    Commits
    • a22db88 Release: v4.19.0
    • 9f16a1c Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning)...
    • a42242d migrate azure blob for beit checkpoints (#16902)
    • b971c76 Add OPT (#17088)
    • 8c7481f ViT and Swin symbolic tracing with torch.fx (#17182)
    • 1a68870 Fix contents in index.mdx to match docs' sidebar (#17198)
    • b17b788 Fix style error in Spanish docs (#17197)
    • 1a66a6c Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples ...
    • e2d678b Documentation: Spanish translation of fast_tokenizers.mdx (#16882)
    • ae82da2 Added es version of language_modeling.mdx doc (#17021)
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Update transformers requirement from <4.25,>=4.14 to >=4.14,<4.26

    Update transformers requirement from <4.25,>=4.14 to >=4.14,<4.26

    Updates the requirements on transformers to permit the latest version.

    dependencies 
    opened by dependabot[bot] 0
Releases(3.0.8)
  • 3.0.8(Nov 3, 2022)

    What's Changed

    • Update torch requirement from <1.13,>=1.7 to >=1.7,<1.14 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/51
    • Update transformers requirement from <4.23,>=4.14 to >=4.14,<4.25 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/52

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.7...3.0.8

  • 3.0.7(Oct 21, 2022)

  • 3.0.6(Oct 10, 2022)

  • 3.0.5(Oct 10, 2022)

    What's Changed

    • Update transformers requirement from <4.22,>=4.14 to >=4.14,<4.23 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/49

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.4...3.0.5

  • 3.0.4(Jul 28, 2022)

    What's Changed

    • Update transformers requirement from <4.21,>=4.14 to >=4.14,<4.22 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/48

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.3...3.0.4

  • 3.0.3(Jul 7, 2022)

    What's Changed

    • Update torch requirement from <1.12,>=1.7 to >=1.7,<1.13 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/47

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.2...3.0.3

  • 3.0.2(Jun 17, 2022)

    What's Changed

    • Update transformers requirement from <4.20,>=4.3 to >=4.3,<4.21 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/46

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.1...3.0.2

  • 3.0.1(May 30, 2022)

    What's Changed

    • Update transformers requirement from <4.19,>=4.3 to >=4.3,<4.20 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/45

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/3.0.0...3.0.1

  • 3.0.0(Apr 8, 2022)

    What's Changed

    • Sparse pooling by @Flegyas. It is the default subword pooling strategy. It is deterministic, but it doesn't support ONNX runtime export.
    • return_words parameter is now subword_pooling_strategy and the possible values are sparse, scatter and none.
    • Tokenizer accepts return_sparse_offsets during initialization. If you are using the scatter subword pooling strategy, you can set it to False to reduce memory usage.
    • Update transformers requirement from <4.18,>=4.3 to >=4.3,<4.19 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/44

    New Contributors

    • @Flegyas made their first contribution in https://github.com/Riccorl/transformers-embedder/pull/43

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.0.2...3.0.0

  • 3.0.0rc1(Apr 8, 2022)

    What's Changed

    • Update transformers requirement from <4.18,>=4.3 to >=4.3,<4.19 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/44

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.1.0b1...3.0.0rc1

  • 2.1.0b1(Mar 23, 2022)

    What's Changed

    • Feature/sparse pooling by @Flegyas in https://github.com/Riccorl/transformers-embedder/pull/43

    New Contributors

    • @Flegyas made their first contribution in https://github.com/Riccorl/transformers-embedder/pull/43

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.0.2...2.1.0b1

  • 2.0.2(Mar 15, 2022)

  • 2.0.1(Mar 14, 2022)

    What's Changed

    • Improve docstrings to be compliant with the new version by @LeonardoEmili in https://github.com/Riccorl/transformers-embedder/pull/40
    • Update torch requirement from <1.11,>=1.7 to >=1.7,<1.12 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/42

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/2.0.0...2.0.1

  • 2.0.0(Mar 7, 2022)

    Changelog

    TransformersEmbedder:

    • Added TransformersEncoder class, which includes an Encoder module:
      • BatchNorm1d normalization layer
      • A Linear projection layer
      • Dropout
      • Swish activation function at the end
    • Changed pooling_strategy to layer_pooling_strategy

    Tokenizer:

    • Underlying tokenization relies completely on HuggingFace's PreTrainedTokenizer
    • Offsets computation based on HuggingFace word_ids parameter in BatchEncoding
    • Removed the spaCy dependency; the tokenizer should be able to perform sub-word pooling even without pre-tokenization
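To illustrate how word-level pooling can be driven by word ids, here is a minimal pure-Python sketch. It assumes a list shaped like the one HuggingFace's `BatchEncoding.word_ids()` returns: one entry per sub-token, holding the index of the word it belongs to, or `None` for special tokens. The function name `mean_pool_by_word` and the toy vectors are hypothetical; this is not the library's implementation, which operates on tensors.

```python
def mean_pool_by_word(subtoken_vectors, word_ids):
    """Average the sub-token vectors that share a word index.

    Entries whose word id is None (special tokens) are skipped.
    Returns one vector per word, ordered by word index.
    """
    sums = {}
    counts = {}
    for vector, word_id in zip(subtoken_vectors, word_ids):
        if word_id is None:
            continue
        if word_id not in sums:
            sums[word_id] = [0.0] * len(vector)
            counts[word_id] = 0
        sums[word_id] = [s + v for s, v in zip(sums[word_id], vector)]
        counts[word_id] += 1
    return [[s / counts[w] for s in sums[w]] for w in sorted(sums)]


# Toy input: [CLS], one single-piece word, one two-piece word, [SEP].
vectors = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 8.0], [7.0, 7.0]]
word_ids = [None, 0, 1, 1, None]
pooled = mean_pool_by_word(vectors, word_ids)
print(pooled)
```

Because the grouping depends only on the word ids, no separate pre-tokenizer is needed to decide which sub-tokens belong together.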

    What's Changed

    • Improve documentation by @LeonardoEmili in https://github.com/Riccorl/transformers-embedder/pull/38
    • Merge 2.0b into main by @Riccorl in https://github.com/Riccorl/transformers-embedder/pull/39

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.9.0...2.0.0

  • 1.9.0(Feb 18, 2022)

    What's Changed

    • Minor improvements to the Embedder by @LeonardoEmili in https://github.com/Riccorl/transformers-embedder/pull/35

    New Contributors

    • @LeonardoEmili made their first contribution in https://github.com/Riccorl/transformers-embedder/pull/35

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.8.5...1.9.0

  • 1.8.5(Feb 9, 2022)

    What's Changed

    • Update transformers requirement from <4.14,>=4.3 to >=4.3,<4.15 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/32
    • Update transformers requirement from <4.15,>=4.3 to >=4.3,<4.16 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/33
    • Update transformers requirement from <4.16,>=4.3 to >=4.3,<4.17 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/34

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.8.4...1.8.5

  • 1.8.4(Dec 11, 2021)

  • 1.8.3(Dec 11, 2021)

    • Update README.md

    • Merge pull request #30 from Riccorl/dependabot/pip/spacy-gte-3.0-and-lt-3.3

      Update spacy requirement from <3.2,>=3.0 to >=3.0,<3.3

    • Merge pull request #31 from Riccorl/dependabot/pip/transformers-gte-4.3-and-lt-4.14

      Update transformers requirement from <4.13,>=4.3 to >=4.3,<4.14

    • Fix output attention bug. Update dependencies.

  • 1.8.2(Oct 29, 2021)

    What's Changed

    • Update transformers requirement from <4.12,>=4.3 to >=4.3,<4.13 by @dependabot in https://github.com/Riccorl/transformers-embedder/pull/29

    Full Changelog: https://github.com/Riccorl/transformers-embedder/compare/1.8.1...1.8.2

  • 1.8.1(Oct 23, 2021)

    • Merge pull request #27 from Riccorl/deepsource-transform-1020b285

      Format code with black

    • Update README

    • Merge pull request #28 from Riccorl/dependabot/pip/torch-gte-1.7-and-lt-1.11

      Update torch requirement from <1.10,>=1.7 to >=1.7,<1.11

    • Name changed to transformers_embedder, changed subword pooling logic, update actions.

    • Update actions, update README

  • 1.8(Oct 23, 2021)

    • Merge pull request #27 from Riccorl/deepsource-transform-1020b285

      Format code with black

    • Update README

    • Merge pull request #28 from Riccorl/dependabot/pip/torch-gte-1.7-and-lt-1.11

      Update torch requirement from <1.10,>=1.7 to >=1.7,<1.11

    • Name changed to transformers_embedder, changed subword pooling logic, update actions.

    • Update actions, update README

  • 1.7.16(Oct 1, 2021)

  • 1.7.15(Sep 10, 2021)

  • 1.7.14(Sep 9, 2021)

  • 1.7.13(Jul 8, 2021)

  • 1.7.12(Jul 8, 2021)

    pip install transformer-embedder installs only the transformers dependency. To also install spaCy or PyTorch, use pip install "transformer-embedder[spacy]" or pip install "transformer-embedder[torch]".

  • 1.7.10(Jul 5, 2021)

  • 1.7.5(Jul 4, 2021)

  • 1.7.3(Apr 19, 2021)

  • 1.7.2(Apr 14, 2021)

Owner
Riccardo Orlando
PhD Student @SapienzaNLP group & NLP Engineer at @Babelscape.