data2vec-pytorch

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI (FAIR)

Overview

Data2Vec is the first high-performance self-supervised algorithm that learns the same way in multiple modalities, including speech, vision and text. Most machines learn exclusively from labeled data. However, through self-supervised learning, machines are able to learn about the world just by observing it and then figuring out the structure of images, speech or text. This is a more scalable and efficient approach for machines to tackle new complex tasks, such as understanding text for more spoken languages.

In summary, the method is as follows:

  1. The encoder extracts features from the masked inputs. These features are outputs of every transformer/linear layer.
  2. The teacher, which is an EMA instance of the encoder (in eval mode), extracts features from the unmasked inputs.
  3. Optional normalizations are applied to the layers/outputs of the teacher.
  4. The encoder (student) outputs are passed through a regression head (a projection block/layer).
  5. The loss is calculated from encoder outputs and teacher outputs.

You can read the paper for more details.
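
To make this flow concrete, here is a minimal, hypothetical sketch of a single training step. The names (student, teacher, regression_head) and the exact target construction (top-K layer averaging, smooth L1 loss) follow the paper's description rather than this repo's actual modules, so treat it as an illustration only:

import torch
import torch.nn.functional as F

def train_step(student, teacher, regression_head, masked_src, unmasked_src, mask,
               top_k=8, ema_decay=0.999):
    # 1. student encoder forward on the masked input (keep every layer's outputs)
    student_layers = student(masked_src, output_hidden_states=True).hidden_states

    # 2. teacher = EMA copy of the encoder, run in eval mode on the unmasked input
    with torch.no_grad():
        teacher.eval()
        teacher_layers = teacher(unmasked_src, output_hidden_states=True).hidden_states

    # 3. optional normalization, then average the top-K teacher layers to build targets
    targets = sum(F.layer_norm(h, h.shape[-1:]) for h in teacher_layers[-top_k:]) / top_k

    # 4. regress the student's last-layer outputs with a projection head
    preds = regression_head(student_layers[-1])

    # 5. loss is computed only over the masked positions
    loss = F.smooth_l1_loss(preds[mask], targets[mask])

    # EMA update of the teacher weights towards the student
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1 - ema_decay)

    return loss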

Implementation

Data2Vec is already implemented in fairseq, with a separate implementation for each modality (text, vision, audio). According to the paper:

Our primary goal is to design a single learning mechanism for different modalities. Despite the unified learning regime, we still use modality-specific feature extractors and masking strategies. This makes sense given the vastly different nature of the input data.

This implementation differs in that it provides a single Data2Vec model powered by a pluggable encoder (implemented using PyTorch + HuggingFace Transformers) and unifies the whole concept in one module. The key point is that only the feature extraction and masking strategies need to be modality-specific.

  • Masking: For each modality, the Dataset instance must return the masked source, the target and the mask tensor (a minimal sketch follows this list).

  • Feature Extraction: Features are the outputs from the transformer/attention layers, so the forward method must return the outputs from all encoder blocks of the transformer model. HuggingFace Transformers/Fairseq models return every transformer layer's outputs out of the box.
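
For example, the masked-source/target/mask triple for the text modality might be produced by a dataset like the one below. This is a hypothetical sketch (the class and key names are illustrative, not the ones in this repo's dataset.py files); it only shows the contract the Dataset has to fulfill:

import torch
from torch.utils.data import Dataset

class MaskedTextDataset(Dataset):
    """Hypothetical example: returns masked source, unmasked target and the mask tensor."""
    def __init__(self, texts, tokenizer, mask_prob=0.15):
        self.texts = texts
        self.tokenizer = tokenizer
        self.mask_prob = mask_prob

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, index):
        tokens = self.tokenizer(self.texts[index], return_tensors='pt')['input_ids'][0]
        mask = torch.rand(tokens.shape) < self.mask_prob        # boolean mask tensor
        src = tokens.clone()
        src[mask] = self.tokenizer.mask_token_id                 # masked source for the student
        return {'src': src, 'trg': tokens, 'mask': mask}         # target stays unmasked for the teacher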

This implementation uses HuggingFace Transformers models as encoders for Data2Vec, which you can inspect in the encoder.py files for each modality. However, you can also provide your own encoder model; just make sure that it is Transformer-based, as required by the paper, and that it returns the outputs of every encoder layer.
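
As a rough illustration of that contract, a custom encoder wrapper could look like the sketch below. The class name is hypothetical; the 'encoder_out'/'encoder_states' keys mirror the ones used by the Data2Vec module in this repo, and how the mask is passed to the backbone is modality-specific:

import torch.nn as nn
from transformers import AutoModel

class CustomEncoder(nn.Module):
    """Hypothetical wrapper around a HuggingFace Transformer backbone."""
    def __init__(self, checkpoint):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)

    def forward(self, inputs, mask=None, **kwargs):
        # how the mask is applied (e.g. mask_time_indices for Wav2Vec2, bool_masked_pos for BEiT)
        # depends on the modality; it is ignored here for brevity
        outputs = self.encoder(inputs, output_hidden_states=True, **kwargs)
        return {
            'encoder_out': outputs.last_hidden_state,   # last layer outputs
            'encoder_states': outputs.hidden_states,    # outputs of every encoder layer
        }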

Note: The goal of this implementation is to provide the essential building blocks of Data2Vec so anyone can adapt it to their own use case with ease. To keep the code as clean and simple as possible, functionality such as mixed precision and distributed training is not included. If you only need to train a standard large-scale Data2Vec model, use the official repo.

Train

First things first, install the requirements:

pip install -r requirements.txt

NLP

Train a Language Model based on RoBERTa (HuggingFace) on WikiText103

Configure the related properties in text/configs/roberta-pretraining.yaml and run:

python train.py --config text/configs/roberta-pretraining.yaml 

Vision

Run masked image modeling training based on BEiT (HuggingFace)

Pass the path to the image dataset in the config file at vision/configs/beit-pretraining.yaml under dataset > path > train/test, modify other properties as desired, and run the following:

python train.py --config vision/configs/beit-pretraining.yaml 

Speech

Audio pretraining based on Wav2Vec2 (HuggingFace) on the TIMIT dataset. If you want to use another dataset such as LibriSpeech, provide it in audio/dataset.py (some minor changes to the TIMIT class would do the job, because both are loaded from HuggingFace datasets).

Configure other properties as you desire and run the following:

python train.py --config audio/configs/wav2vec2-pretraining.yaml 

Pre-trained Weights

The models are available on the HuggingFace Hub and you can use them as shown below:

RoBERTa

Data2Vec model trained with RoBERTa as the encoder (data2vec-roberta-base)

from transformers import AutoModel, AutoConfig
from transformers import RobertaModel

checkpoint = 'arxyzan/data2vec-roberta-base'

# Option 1: load using AutoModel
data2vec_roberta = AutoModel.from_pretrained(checkpoint)

# Option 2: load directly by RobertaModel
data2vec_roberta = RobertaModel.from_pretrained(checkpoint)

BEiT

Data2Vec model trained with BEiT as the encoder (data2vec-beit-base)

from transformers import AutoModel, AutoConfig
from transformers import BeitModel

checkpoint = 'arxyzan/data2vec-beit-base'

# Option 1: load using AutoModel
data2vec_beit = AutoModel.from_pretrained(checkpoint)

# Option 2: load directly by BeitModel
data2vec_beit = BeitModel.from_pretrained(checkpoint)

Wav2Vec2

Data2Vec model trained with Wav2Vec2 as the encoder (data2vec-wav2vec2-base)

from transformers import AutoModel, AutoConfig
from transformers import Wav2Vec2Model

checkpoint = 'arxyzan/data2vec-wav2vec2-base'

# Option 1: load using AutoModel
data2vec_wav2vec2 = AutoModel.from_pretrained(checkpoint)

# Option 2: load directly by Wav2Vec2Model
data2vec_wav2vec2 = Wav2Vec2Model.from_pretrained(checkpoint)

Note: The above models' weights were carefully ported from the original checkpoints in the fairseq version.
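
For a quick sanity check after loading, you can run a forward pass just like with any HuggingFace model. The tokenizer checkpoint below is an assumption (a standard roberta-base tokenizer) and may differ from the one used during pretraining:

import torch
from transformers import AutoTokenizer, AutoModel

checkpoint = 'arxyzan/data2vec-roberta-base'
tokenizer = AutoTokenizer.from_pretrained('roberta-base')   # assumption: standard roberta-base tokenizer
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("data2vec learns the same way in speech, vision and language.", return_tensors='pt')
with torch.no_grad():
    features = model(**inputs).last_hidden_state            # (batch, seq_len, hidden_size)
print(features.shape)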

Fine-tuning

  1. Fine-tune using the checkpoints mentioned above:
# Text classification using Roberta model from HuggingFace
from transformers import RobertaModel, RobertaForSequenceClassification

checkpoint = 'arxyzan/data2vec-roberta-base'
# this is exactly a roberta model but trained with data2vec
data2vec_roberta = RobertaModel.from_pretrained(checkpoint)
text_classifier = RobertaForSequenceClassification(data2vec_roberta.config)
# assign `data2vec-roberta` weights to the roberta block of the classifier
text_classifier.roberta = data2vec_roberta
...
  2. In case you trained a model using this codebase, you can fine-tune it by taking the encoder's state dict out of the checkpoint, which gives you a regular HuggingFace model that you can fine-tune for any downstream task as you normally would:
# load a checkpoint for finetuning
import torch
from transformers import RobertaModel, RobertaConfig

roberta = RobertaModel(RobertaConfig())
checkpoint = torch.load('path/to/data2vec.pt')
roberta_state_dict = checkpoint['encoder']
# load roberta weights from the encoder part of the data2vec checkpoint
roberta.load_state_dict(roberta_state_dict)

# Now fine-tune a regular HuggingFace RoBERTa model
...

Contributions

Any contributions regarding training, development, and issues are welcome!

Comments
  • RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

    Hi there!

    Great repo! I am trying to pre-train wav2vec2 using this and I get the error:

    RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

    any fix for this?

    opened by Sreyan88 7
  • Any plan for supporting distributed training?

    Hi arxyzan~ Thanks for your great contribution! I wonder whether there is any plan for supporting distributed training, as pre-training will be too slow with a single GPU?

    opened by ruizewang 3
  • EMA teacher model should not be deepcopied

    The EMA teacher model, according to the paper, is initialized randomly with the same architecture as the student model. So, deepcopying the student model to create the teacher model should be avoided, as it copies the weight parameters as well.

    opened by sudhakaranjain 2
  • Question about reproducibility

    Hello, thanks for your effort to make it easier to understand the data2vec. Let me ask a quick question; can we reproduce the paper with your implementation? I guess it is out of the scope of this repo, but I thought it would be quite nice if possible. Thank you anyway!

    opened by daisukelab 2
  • Trouble with audio....?

    Aryan, thank you very much for sharing your code with the world. I wonder if you could advise:

    I am trying to train by following the instructions for audio, but I haven't been able to get TIMIT or LibriSpeech to work.

    TIMIT

    For TIMIT, I get the message from HuggingFace that it must be downloaded manually. From the URL provided in the message, I got to UPenn who apparently want $250? for the dataset?? ...So, ok, I obtained a copy from a friend and also from Kaggle. But in both cases the HF dataloader fails; it is looking for files that don't exist anywhere in the dataset: it is looking for files with lower-case letters like "*test" (all the filenames in both my copies are uppercase) and certain file extensions that exclude the .DOC which is provided in TIMIT:

    Error message

      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/datasets/data_files.py", line 201, in resolve_patterns_locally_or_by_urls
        raise FileNotFoundError(error_msg)
    FileNotFoundError: Unable to resolve any data file that matches '['**test*', '**eval*']' at /home/ubuntu/datasets/timit with any supported extension ['csv', 'tsv', 'json', 'jsonl', 'parquet', 'txt', 'blp', 'bmp', 'dib', 'bufr', 'cur', 'pcx', 'dcx', 'dds', 'ps', 'eps', 'fit', 'fits', 'fli', 'flc', 'ftc', 'ftu', 'gbr', 'gif', 'grib', 'h5', 'hdf', 'png', 'apng', 'jp2', 'j2k', 'jpc', 'jpf', 'jpx', 'j2c', 'icns', 'ico', 'im', 'iim', 'tif', 'tiff', 'jfif', 'jpe', 'jpg', 'jpeg', 'mpg', 'mpeg', 'msp', 'pcd', 'pxr', 'pbm', 'pgm', 'ppm', 'pnm', 'psd', 'bw', 'rgb', 'rgba', 'sgi', 'ras', 'tga', 'icb', 'vda', 'vst', 'webp', 'wmf', 'emf', 'xbm', 'xpm', 'zip']
    

    The files look like

    │       PHONCODE.DOC
    │       PROMPTS.TXT
    │       SPKRINFO.TXT
    │       SPKRSENT.TXT
    │       TESTSET.DOC
    

    If I take away the 'clean' directive in the load_dataset call, then the dataset loads but fails later with a key error:

    Epoch: 1/1000     0%|                                                                      | 0/31678 [00:00<?, ?batch/s]
    Traceback (most recent call last):
      File "/home/ubuntu/shawley/data2vec-pytorch/train.py", line 25, in <module>
        trainer.train()
      File "/home/ubuntu/shawley/data2vec-pytorch/audio/trainer.py", line 142, in train
        train_loss = self.train_epoch(epoch)
      File "/home/ubuntu/shawley/data2vec-pytorch/audio/trainer.py", line 106, in train_epoch
        for batch in iterator:
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
        for obj in iterable:
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
        data = self._next_data()
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/ubuntu/shawley/data2vec-pytorch/audio/dataset.py", line 21, in __getitem__
        x = self.data[index]['audio']
    KeyError: 'audio'
    

    If I print out self.data just after it's loaded in your TIMIT class, there is no 'audio' part to the namespace:

    print("self.data[0] = ",self.data[0])
    
    self.data =  {'index': 1, 'test_or_train': 'TRAIN', 'dialect_region': 'DR4', 'speaker_id': 'MMDM0', 'filename': 'SI681.WAV.wav', 'path_from_data_dir': 'TRAIN/DR4/MMDM0/SI681.WAV.wav', 'path_from_data_dir_windows': 'TRAIN\\\\DR4\\\\MMDM0\\\\SI681.WAV.wav', 'is_converted_audio': True, 'is_audio': True, 'is_word_file': False, 'is_phonetic_file': False, 'is_sentence_file': False}
    

    Are you able to comment or advise about getting TIMIT to work?

    LibriSpeech

    For LibriSpeech, I copied your TIMIT class in dataset.py and just hard-coded the name of the dataset:

    class LibriSpeech(Dataset):
        def __init__(self, cfg, split, **kwargs):
            super(LibriSpeech, self).__init__()
            path = cfg.dataset.path
            #self.data = load_dataset(path, 'clean')[split]
            self.data = load_dataset("librispeech_asr", 'clean')
            #print("self.data = ",self.data)
            self.data = self.data[split]
            self.feature_extractor = Wav2Vec2FeatureExtractor(cfg.model.encoder_checkpoint)
            self.__dict__.update(kwargs)
    
        def __len__(self):
            return len(self.data)
    
        def __getitem__(self, index):
            x = self.data[index]['audio']
            x = self.feature_extractor(x['array'], sampling_rate=x['sampling_rate'], padding=True, return_tensors='pt')['input_values']
            return {'input_values': x[0]}
           
    

    And then in trainer.py I just wrote

            #self.train_dataset = TIMIT(cfg, 'train')
            #self.test_dataset = TIMIT(cfg, 'test')
            self.train_dataset = LibriSpeech(cfg, 'train.100')
            self.test_dataset = LibriSpeech(cfg, 'test')
    

    In that case the data is loaded without errors, and the training begins but aborts with a series of CUDA errors:

    Epoch: 1/1000     0%|                                                                      | 0/28539 [00:00<?, ?batch/s]../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [5550,0,0], thread: [64,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
    ../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [5550,0,0], thread: [65,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
    ../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [5550,0,0], thread: [66,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
    
    ...hundreds more lines like this, and then...
    
    Epoch: 1/1000     0%|                                                                      | 0/28539 [00:04<?, ?batch/s]
    Traceback (most recent call last):
      File "/home/ubuntu/shawley/data2vec-pytorch/train.py", line 25, in <module>
        trainer.train()
      File "/home/ubuntu/shawley/data2vec-pytorch/audio/trainer.py", line 142, in train
        train_loss = self.train_epoch(epoch)
      File "/home/ubuntu/shawley/data2vec-pytorch/audio/trainer.py", line 107, in train_epoch
        loss = self.train_step(batch)
      File "/home/ubuntu/shawley/data2vec-pytorch/audio/trainer.py", line 65, in train_step
        x, y = self.model(src, src, mask)
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/shawley/data2vec-pytorch/data2vec/data2vec.py", line 83, in forward
        x = self.encoder(src, mask, **kwargs)['encoder_out']  # fetch the last layer outputs
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/shawley/data2vec-pytorch/audio/encoder.py", line 35, in forward
        outputs = self.encoder(inputs, mask_time_indices=mask, output_hidden_states=True,
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1357, in forward
        hidden_states = self._mask_hidden_states(
      File "/home/ubuntu/envs/data2vec/lib/python3.9/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1297, in _mask_hidden_states
        hidden_states[mask_time_indices] = self.masked_spec_embed.to(hidden_states.dtype)
    RuntimeError: CUDA error: device-side assert triggered
    

    Do you have a suggestion about getting LibriSpeech working?

    Thanks again, Scott

    opened by drscotthawley 2
  • Which Hyperparameter tuning method to use?

    First of all, thank you for the great work @AryanShekarlaban and @kabouzeid!

    quick question: I have less experience with training big models like Transformers. I see that there are many frameworks and algorithms for hyperparameter tuning on the internet. Could you suggest a hyperparameter tuning framework and algorithm for data2vec?

    Thank you!

    opened by anirudh2019 2
  • Model weights?

    Although pretraining these models requires a lot of hardware resources and is almost impossible for an individual like me to do, there is the possibility to port the weights from HuggingFace models that actually use the same encoders as fairseq (and this repo). Otherwise this repo would be beneficial only for educational purposes.

    Obviously, this task must be carried out very carefully, but before that, its feasibility must be verified. As this model "slightly" outperforms previous SOTA models, messing up even a single layer's weights can ruin the whole thing!

    The progress and issues regarding this task, will be stated here.

    opened by arxyzan 2
  • encoder checkpoint?

    Hi, thanks for your work on this!

    Why is there an encoder checkpoint (I can't find this mentioned in the paper)? Is it possible to train from scratch with this codebase?

    https://github.com/AryanShekarlaban/data2vec-pytorch/blob/640fb8531be9deb5f8f0653802e272a4d39f39db/vision/configs/beit-pretraining.yaml#L4

    opened by kabouzeid 2
  • EMA model forward

            # model forward in online mode (student)
            x = self.encoder(src, mask, **kwargs)['encoder_out']  # fetch the last layer outputs
            if trg is None:
                return x
    
            # model forward in offline mode (teacher)
            with torch.no_grad():
                self.ema.model.eval()
                y = self.ema.model(trg, ~mask, **kwargs)['encoder_states']  # fetch the last transformer layers outputs
    

    In the teacher forward pass, the mask_time_indices is the inverse of the one used for the student. Is this correct? I think the mask in the teacher forward pass should be None, since the teacher expects the full version of the input data.

    opened by anhvth 1
  • question about disabling gradient bp

    https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/ema_module.py#L41 Hi. It seems you forgot this line, which may also disable gradient backprop of the student encoder? Thx!

    opened by btwbtm 1
  • doubt about finetuning

    first of all great work @AryanShekarlaban and @kabouzeid!

    quick question - if I want to fine tune data2vec with a given backbone (e.g. wav2vec2) - would freezing the feature extractor be enough? or should I also add an nn.Linear layer?

    I see that by design trainer.py fine-tunes with TIMIT - but I've also seen in another issue that we're actually training it from scratch (not sure if I'm missing something here)

    thanks!

    opened by rafaelvp-db 1
  • Some Questions

    Hi @arxyzan ,

    1. Can you tell me what parts I need to change if my input size is 256 instead of 224?

    2. Is it mandatory to load encoder_checkpoint? Can I train my model from scratch?

    3. why is the config file named beit-pretraining.yaml for the vision task?

    4. Could you help me to solve problem below:

    Epoch: 1/100     0%|          | 0/18001 [00:02<?, ?batch/s]
    Traceback (most recent call last):
      File "/mnt/c/data2vec-pytorch/train.py", line 25, in <module>
        trainer.train()
      File "/mnt/c/data2vec-pytorch/vision/trainer.py", line 131, in train
        train_loss = self.train_epoch(epoch)
      File "/mnt/c/data2vec-pytorch/vision/trainer.py", line 97, in train_epoch
        loss = self.train_step(batch)
      File "/mnt/c/data2vec-pytorch/vision/trainer.py", line 56, in train_step
        x, y = self.model(src, trg, mask)
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "/mnt/c/data2vec-pytorch/data2vec/data2vec.py", line 90, in forward
        y = self.ema.model(trg, ~mask, **kwargs)['encoder_states']  # fetch the last transformer layers outputs
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "/mnt/c/data2vec-pytorch/vision/encoder.py", line 38, in forward
        outputs = self.encoder(pixel_values=inputs, output_hidden_states=True, output_attentions=True, **kwargs)
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bryan/.local/lib/python3.10/site-packages/transformers/models/beit/modeling_beit.py", line 681, in forward
        embedding_output = self.embeddings(pixel_values, bool_masked_pos)
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bryan/.local/lib/python3.10/site-packages/transformers/models/beit/modeling_beit.py", line 154, in forward
        embeddings = self.patch_embeddings(pixel_values)
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bryan/.local/lib/python3.10/site-packages/transformers/models/beit/modeling_beit.py", line 206, in forward
        embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/bryan/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

    opened by bryanwong17 3
Owner
Aryan Shekarlaban
Deep Learning Developer & Researcher