data2vec-pytorch
PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI (FAIR).
Data2Vec is the first high-performance self-supervised algorithm that learns the same way in multiple modalities, including speech, vision and text. Most machines learn exclusively from labeled data. However, through self-supervised learning, machines are able to learn about the world just by observing it and then figuring out the structure of images, speech or text. This is a more scalable and efficient approach for machines to tackle new complex tasks, such as understanding text for more spoken languages.
In summary, the method is as follows:
- The encoder extracts features from the masked inputs. These features are outputs of every transformer/linear layer.
- The teacher, which is an EMA instance of the encoder (in eval mode), extracts features from the unmasked inputs.
- Optional normalizations are applied to the layers/outputs of the teacher.
- Encoder outputs are regressed by a projection block/layer.
- The loss is calculated from encoder outputs and teacher outputs.
You can read the paper for more detail.
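To make these steps concrete, here is a minimal sketch of one training step, assuming the encoder returns a list of per-layer hidden states; the function names, the top-k layer averaging and the smooth L1 loss are illustrative choices, not this repo's exact API:
import torch
import torch.nn.functional as F
def ema_update(teacher, student, decay=0.999):
    # Teacher weights track an exponential moving average of the student (encoder) weights
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.data.mul_(decay).add_(s_param.data, alpha=1 - decay)
def data2vec_step(student, teacher, projection, masked_src, src, mask, top_k=8):
    # Student sees the masked input; keep the hidden states of every layer
    student_layers = student(masked_src)                 # list of [batch, seq, dim] tensors
    with torch.no_grad():
        teacher.eval()
        teacher_layers = teacher(src)                    # unmasked input, no gradients
        # Target = average of the last k layer outputs, each layer-normalized
        target = sum(F.layer_norm(h, h.shape[-1:]) for h in teacher_layers[-top_k:]) / top_k
    # Project the student's final layer and regress it onto the teacher target at masked positions
    pred = projection(student_layers[-1])
    return F.smooth_l1_loss(pred[mask], target[mask])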
Implementation
Data2Vec is already implemented in fairseq, where each modality has its own separate implementation (text, vision, audio). According to the paper:
Our primary goal is to design a single learning mechanism for different modalities. Despite the unified learning regime, we still use modality-specific feature extractors and masking strategies. This makes sense given the vastly different nature of the input data.
This implementation differs in that it provides a single Data2Vec model powered by a custom encoder (implemented using PyTorch + HuggingFace Transformers) and unifies the whole concept in a single module. The key requirement is that modality-specific feature extraction and masking strategies must be supplied:
- Masking: For each modality, the Dataset instance must return the masked source, the target and the mask tensor (a minimal sketch follows this list).
- Feature Extraction: Features are the outputs of the transformer/attention layers, so the encoder's forward method must return the outputs of all encoder blocks. HuggingFace Transformers/Fairseq models return these per-layer outputs out of the box.
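For example, a text Dataset honoring the masking contract might look roughly like this (a hedged sketch; the class name, the 0.15 masking probability and the purely random masking are illustrative assumptions, not necessarily what text/dataset.py does):
import torch
from torch.utils.data import Dataset
class MaskedTextDataset(Dataset):
    # Returns (masked source, target, mask) as described above
    def __init__(self, token_ids, mask_token_id, mask_prob=0.15):
        self.token_ids = token_ids              # list of pre-tokenized examples
        self.mask_token_id = mask_token_id
        self.mask_prob = mask_prob
    def __len__(self):
        return len(self.token_ids)
    def __getitem__(self, idx):
        target = torch.tensor(self.token_ids[idx])
        mask = torch.rand(target.shape) < self.mask_prob   # True where tokens get masked
        masked_src = target.clone()
        masked_src[mask] = self.mask_token_id
        return masked_src, target, mask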
This implementation uses HuggingFace Transformers models as encoders for Data2Vec; you can inspect them in the encoder.py file of each modality. You can also provide your own encoder model; just make sure it is Transformer-based (as the paper requires) and that it returns the outputs of every encoder layer.
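If you bring your own encoder, a thin wrapper along these lines is usually enough to expose every layer's output (a sketch assuming a HuggingFace backbone; the class name and checkpoint are illustrative):
import torch.nn as nn
from transformers import AutoModel
class CustomEncoder(nn.Module):
    def __init__(self, checkpoint='roberta-base'):
        super().__init__()
        self.model = AutoModel.from_pretrained(checkpoint)
    def forward(self, input_ids, attention_mask=None):
        outputs = self.model(input_ids=input_ids,
                             attention_mask=attention_mask,
                             output_hidden_states=True)
        # hidden_states contains the embedding output plus the output of every transformer layer
        return outputs.hidden_states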
Note: The goal of this implementation is to provide the necessary building blocks of Data2Vec so anyone can adapt it to their own use case with ease. To keep the code clean and simple, functionality like mixed precision and distributed training is not included. If you only need to train a standard large-scale Data2Vec model, use the official repo.
Train
First things first, install the requirements:
pip install -r requirements.txt
NLP
Train a Language Model based on RoBERTa (HuggingFace) on WikiText103
Configure the related properties in text/configs/roberta-pretraining.yaml
and run:
python train.py --config text/configs/roberta-pretraining.yaml
Vision
Run masked image modeling training based on BEiT (HuggingFace).
Pass the path to the image dataset in the config file at vision/configs/beit-pretraining.yaml
under dataset > path > train/test, modify other properties as desired, and run the following:
python train.py --config vision/configs/beit-pretraining.yaml
Speech
Audio pretraining based on Wav2Vec2 (HuggingFace) on the timit
dataset. If you want to use other datasets like librispeech,
provide them in audio/dataset.py
(some minor changes to the timit class would do the job, because both are loaded from HuggingFace datasets).
Configure other properties as you desire and run the following:
python train.py --config audio/configs/wav2vec2-pretraining.yaml
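For instance, LibriSpeech can be pulled from HuggingFace datasets much like TIMIT (a sketch of the loading call only; the dataset name and split are those published on the Hub, and the actual change belongs in the timit class of audio/dataset.py):
from datasets import load_dataset
# Same loading mechanism the timit class relies on, just pointing at LibriSpeech
librispeech = load_dataset('librispeech_asr', 'clean', split='train.100')
waveform = librispeech[0]['audio']['array']   # raw waveform, ready for the feature extractor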
Pre-trained Weights
The models are available on the HuggingFace Hub and you can use them as shown below:
RoBERTa
Data2Vec model trained with RoBERTa as the encoder (data2vec-roberta-base)
from transformers import AutoModel, RobertaModel
checkpoint = 'arxyzan/data2vec-roberta-base'
# Option 1: load using AutoModel
data2vec_roberta = AutoModel.from_pretrained(checkpoint)
# Option 2: load directly by RobertaModel
data2vec_roberta = RobertaModel.from_pretrained(checkpoint)
BEiT
Data2Vec model trained with BEiT as the encoder (data2vec-beit-base)
from transformers import AutoModel, BeitModel
checkpoint = 'arxyzan/data2vec-beit-base'
# Option 1: load using AutoModel
data2vec_beit = AutoModel.from_pretrained(checkpoint)
# Option 2: load directly by BeitModel
data2vec_beit = BeitModel.from_pretrained(checkpoint)
Wav2Vec2
Data2Vec model trained with Wav2Vec2 as the encoder (data2vec-wav2vec2-base)
from transformers import AutoModel, Wav2Vec2Model
checkpoint = 'arxyzan/data2vec-wav2vec2-base'
# Option 1: load using AutoModel
data2vec_wav2vec2 = AutoModel.from_pretrained(checkpoint)
# Option 2: load directly by Wav2Vec2Model
data2vec_wav2vec2 = Wav2Vec2Model.from_pretrained(checkpoint)
Note: The above models' weights were carefully ported from the original checkpoints in the fairseq
version.
Fine-tuning
- Fine-tune using the checkpoints mentioned above:
# Text classification using Roberta model from HuggingFace
from transformers import RobertaModel, RobertaForSequenceClassification
checkpoint = 'arxyzan/data2vec-roberta-base'
# This is exactly a RoBERTa model, just pretrained with data2vec
data2vec_roberta = RobertaModel.from_pretrained(checkpoint)
text_classifier = RobertaForSequenceClassification(data2vec_roberta.config)
# assign `data2vec-roberta` weights to the roberta block of the classifier
text_classifier.roberta = data2vec_roberta
...
- In case you trained a model using this codebase, you can fine-tune it by extracting the encoder's state dict from the checkpoint, which gives you a standard HuggingFace model that you can fine-tune for any downstream task as you'd normally do for HuggingFace models.
# Load a checkpoint for fine-tuning
import torch
from transformers import RobertaModel, RobertaConfig
roberta = RobertaModel(RobertaConfig())
checkpoint = torch.load('path/to/data2vec.pt')
roberta_state_dict = checkpoint['encoder']
# Load RoBERTa weights from the encoder part of the data2vec checkpoint
roberta.load_state_dict(roberta_state_dict)
# Now fine-tune `roberta` as a regular HuggingFace RoBERTa model
...
Contributions
Any contribution regarding training, development or issues is welcome!