Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"

Overview

gtn_applications

An applications library using GTN. Current examples include:

  • Offline handwriting recognition
  • Automatic speech recognition

Installing

  1. Build Python bindings for the GTN library.

  2. Activate the same environment used in Step 1:

     conda activate gtn_env

  3. Install PyTorch:

     conda install pytorch torchvision -c pytorch

  4. Install the remaining requirements:

     pip install -r requirements.txt

Training

We give an example of how to train on the IAM off-line handwriting recognition benchmark.

First register on the IAM database website, then download the dataset:

./datasets/download/iamdb.sh <path_to_data> <email> <password>

Then update the configuration JSON configs/iamdb/tds2d.json to point to the data path used above:

  "data" : {
    "dataset" : "iamdb",
    "data_path" : "<path_to_data>",
    "num_features" : 64
  },
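If you prefer not to edit the file by hand, the same change can be scripted. This is a minimal sketch that assumes the config is plain JSON with a top-level "data" object, as in the excerpt above; the helper names with_data_path and update_config_file are hypothetical and not part of the repository.

```python
import copy
import json


def with_data_path(config, data_path):
    """Return a copy of the config dict with data.data_path replaced."""
    updated = copy.deepcopy(config)
    updated.setdefault("data", {})["data_path"] = data_path
    return updated


def update_config_file(config_path, data_path):
    """Rewrite the JSON config in place with the new data path."""
    with open(config_path) as f:
        config = json.load(f)
    with open(config_path, "w") as f:
        json.dump(with_data_path(config, data_path), f, indent=2)


# Example (hypothetical path):
# update_config_file("configs/iamdb/tds2d.json", "/path/to/iamdb")
```

Only the data_path entry is changed; every other key in the config is written back unchanged.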

Single GPU training can be run with:

python train.py --config configs/iamdb/tds2d.json

To run distributed training with multiple GPUs:

python train.py --config configs/iamdb/tds2d.json --world_size <NUM_GPUS>

For a list of options type:

python train.py -h

Contributing

Use Black to format Python code.

First install:

pip install black

Then run with:

black <file>.py

License

GTN is licensed under an MIT license. See LICENSE.

Comments
  • module 'gtn' has no attribute 'Device' in STC

    I installed gtn with pip install gtn and tested STC as follows:

    from criterions.STC import STC
    import random
    import torch
    
    stcloss = STC(blank_idx=0, p0=1, plast=1, thalf=1, reduction="none")
    
    batch_size = 2
    mel_len = 115
    text_len = 25
    num_token = 64
    inputs = torch.randn(mel_len, batch_size, num_token)
    targets = [[random.randint(1, text_len) for _ in range(random.randint(1, num_token))] for _ in range(batch_size)]
    loss = stcloss(inputs, targets)
    

    But when I run the code above, the following error occurs:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-4-b0ecfb513efd> in <module>
          7 inputs = torch.randn(mel_len, batch_size, num_token)
          8 targets = [[random.randint(1, text_len) for _ in range(random.randint(1, num_token))] for _ in range(batch_size)]
    ----> 9 loss = stcloss(inputs, targets)
    
    ~/anaconda3/envs/leecho/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    /data/leecho/xi-stt/xi-stt/model/STCLoss.py in forward(self, inputs, targets)
        212             # concatenate (tokens, <star>, <star>\tokens)
        213             log_probs = torch.cat([log_probs, lse, neglse], dim=2)
    --> 214         return STCLoss(log_probs, targets, prob, self.reduction)
    
    /data/leecho/xi-stt/xi-stt/model/STCLoss.py in forward(ctx, inputs, targets, prob, reduction)
         96             emissions_graphs[b] = g_emissions
         97 
    ---> 98         gtn.parallel_for(process, range(B))
         99 
        100         ctx.auxiliary_data = (losses, scales, emissions_graphs, inputs.shape)
    
    /data/leecho/xi-stt/xi-stt/model/STCLoss.py in process(b)
         71             # create emission graph
         72             g_emissions = gtn.linear_graph(
    ---> 73                 T, Cstar, gtn.Device(gtn.CPU), inputs.requires_grad
         74             )
         75             cpu_data = inputs[b].cpu().contiguous()
    
    AttributeError: module 'gtn' has no attribute 'Device'
    

    Is there any way to solve this problem?

    opened by LEECHOONGHO 7
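One way to narrow down an error like this is to check which of the names used by the STC code the installed gtn build actually exposes. The sketch below uses only the standard library; the helper name module_has_attr is hypothetical, and whether a different gtn build provides gtn.Device is an assumption to verify against your installation.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Return True or False for an importable module, None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Names taken from the traceback above.
    for name in ("Device", "CPU", "linear_graph", "parallel_for"):
        print(name, module_has_attr("gtn", name))
```

If Device is reported as False while the other names are True, the installed package and the STC code were probably written against different gtn versions.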
  • [STC] STC loss ascends while training

    Hello, I'm training an ASR model with the STC loss and a letter-to-word encoder, as shown below. As training progresses, the STC loss increases, and it became 'Inf' after 12,000 steps.

    Is there a mistake in my implementation? Any help would be appreciated. Thank you.

    training args:

    • num_gpu : 4
    • audio_length_per_gpu : 160s
    • lr : 0.0001~0.001
    • use FullyShardedDataParallel, mix_precision=off
    #   max_word_length : 10
    #   n_letter_symbols : 69 (blank + pad + korean)
    #   n_word_symbols : 12158 (blank + korean_morph 97% in corpus)
    
    #   blank_idx=0, p0=0.05, plast=0.15, thalf=16000
    
    self.criterion = STC(
        blank_idx=self.cfg.blank_idx, 
        p0=self.cfg.p0, 
        plast=self.cfg.plast, 
        thalf=self.cfg.thalf, reduction="mean"
    )
    
    #   model_output : Tensor[batch_size, max_frame_length, n_letter_symbols*max_word_length]
    #   self.l2w_matrix : Tensor[n_letter_symbols*max_word_length, n_word_symbols]
    
    word_level_output = model_output @ self.l2w_matrix
    
    #   word_level_output : Tensor[batch_size, max_frame_length, n_word_symbols]
    
    word_level_output = F.log_softmax(word_level_output.transpose(1, 0), dim=-1)
    
    loss = self.criterion(word_level_output, word_labels)
    

    opened by LEECHOONGHO 3
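The parameter names p0, plast and thalf in the settings above suggest a value that moves from p0 towards plast with a half-life of thalf steps. The sketch below assumes an exponential schedule of that form; this is an assumption based on the names, the function scheduled_prob is hypothetical, and the actual formula should be checked in criterions/STC.py.

```python
import math


def scheduled_prob(step, p0, plast, thalf):
    """Assumed schedule: starts at p0 and decays towards plast,
    covering half of the remaining gap every thalf steps."""
    return plast + (p0 - plast) * math.exp(-step * math.log(2) / thalf)


if __name__ == "__main__":
    # Settings quoted in the report above.
    for step in (0, 4000, 8000, 12000, 16000):
        print(step, round(scheduled_prob(step, p0=0.05, plast=0.15, thalf=16000), 4))
```

Under this assumed form, p0=0.05 and plast=0.15 mean the scheduled value rises during training, so it can be worth logging it next to the loss when looking for the step at which the loss starts to diverge.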
  • [STC] Question for STC loss training

    Hello, I'm trying to apply STC for my ASR model training.

    Before proceeding with training, I have some questions about the STC training described in the STC paper [1]. If anyone has experimented with the cases below, please give me some advice.

    1. Is STC valid for data with pDrop=0.01~0.02, or should I just use CTC?
        Data: about 99% reliable, but it may contain some typos, and sometimes a business name is erased.
    
        1-1. If STC is valid for case 1, is it sufficient to set p_0=0.01, p_max=0.03?
    
    2. Is Adam not allowed for STC/(WFST)?
    
    3. For future work, I am thinking of using pseudo-labeled YouTube data for ASR training.
        In this case, the data could contain many incorrect labels.
        Does STC perform better than CTC even when training on incorrectly labeled data?
    

    Thank you.

    [1] Star Temporal Classification: Sequence Classification with Partially Labeled Data.

    opened by LEECHOONGHO 2
  • RNN model correction

    The RNN model's default parameters include striding: there are 2 convolutional layers, each with a stride of 2. So for num_features = 80, the feature dimension is reduced 80 -> 40 -> 20. Updated the code to account for this.

    opened by ronitd 2
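The arithmetic in the RNN correction above can be written as a small helper. This is a sketch, not the repository's code: the function name feature_dims is hypothetical, and it assumes each convolution is padded so that its output size is ceil(n / stride).

```python
def feature_dims(num_features, strides):
    """Feature dimension after each strided layer, starting with the input size."""
    dims = [num_features]
    for stride in strides:
        # Ceil division: assumes padding such that a layer outputs ceil(n / stride).
        dims.append((dims[-1] + stride - 1) // stride)
    return dims


print(feature_dims(80, [2, 2]))  # [80, 40, 20]
```

The last entry is the size the recurrent layers would see per channel, which is why the input size of the RNN has to take the strides into account.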
  • Open Source STC code; Some code cleanup

    Summary

    • Create criterions and models directories and organize files appropriately
    • Make __init__.py files empty so that model training works. Otherwise, there were issues with module loading.
    • Will send a follow-up PR to add documentation to STC code

    Test Plan:

    Ran IamDB training and made sure it runs for an epoch.

    opened by vineelpratap 1
  • Dumb question for arc_sort func

    When should we call arc_sort(true)? For example, in https://github.com/facebookresearch/gtn_applications/blob/eb1cb83dda3d3887f980dbd6b697c2c2b6fd1d45/transducer.py#L265 arc_sort is called with the parameter True. If my understanding is correct, for an FSA target, arc_sort(true) and arc_sort(false) give exactly the same result. How do we decide whether to set it to true or false? (When should we prefer sorting by olabel, and when by ilabel?)

    opened by yuekaizhang 1
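The question above turns on the fact that an acceptor (FSA) carries the same input and output label on every arc. The following plain-Python illustration does not use the gtn API; the Arc tuple and the sort_arcs helper are hypothetical, and it assumes the flag only selects whether arcs are ordered by ilabel or by olabel.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    src: int
    dst: int
    ilabel: int
    olabel: int


def sort_arcs(arcs: List[Arc], by_olabel: bool = False) -> List[Arc]:
    """Order arcs per source state by input label, or by output label if requested."""
    if by_olabel:
        return sorted(arcs, key=lambda a: (a.src, a.olabel))
    return sorted(arcs, key=lambda a: (a.src, a.ilabel))


# Acceptor: ilabel == olabel on every arc, so both orderings coincide.
acceptor = [Arc(0, 1, 3, 3), Arc(0, 1, 1, 1), Arc(0, 2, 2, 2)]
# Transducer: labels differ, so the two orderings differ.
transducer = [Arc(0, 1, 3, 1), Arc(0, 1, 1, 2), Arc(0, 2, 2, 3)]

print(sort_arcs(acceptor) == sort_arcs(acceptor, by_olabel=True))      # True
print(sort_arcs(transducer) == sort_arcs(transducer, by_olabel=True))  # False
```

For a true transducer the choice matters, since an operation that matches the output labels of one graph against the input labels of another benefits from each side being sorted on the label it is matched on.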
  • Is it possible to get multiple recognition results instead of one?

    It seems that model.viterbi(self, outputs) returns only one recognition result for each sample. Is it possible to return multiple alternatives from this decoding method, as with beam search? If yes, what would be the recommended decoding algorithm? Would that be time-consuming?

    Thanks!

    opened by zhwa 1
  • Organize dataset download scripts

    • Update IAM download script. Previous scripts were not working - https://stackoverflow.com/q/64715260
    • Create new directories dataset/download and dataset/utils and move the files accordingly
    opened by vineelpratap 0
  • Publish recipes for ngram transitions work

    Consider this as V0. TODO

    • Test all the recipes again
    • Add more details to the README. Possibly share ltr, WP-based tokens, lexicon, transition graph directly via S3 ...
    opened by vineelpratap 0
  • For ASR inference, how to use the asg for inference for LibriSpeech or any wav sound with Pytorch?

    Hi. I am planning to try ASG as a replacement for Wav2Vec2 with an LM.

    For comparison, here is the PyTorch tutorial on the CTC decoder: https://pytorch.org/audio/main/tutorials/asr_inference_with_ctc_decoder_tutorial.html

    I am wondering how to do the same thing with ASG in place of CTC. If an LM is used with ASG, is it better than wav2vec2 with an LM and CTC? How large is the difference?

    Could you show me how to do automatic speech recognition with ASG? I hope to write the decoder in JavaScript once I understand how it works.

    opened by JonathanSum 2
  • [STC] Question about ASR training.

    Hello, I'm trying to implement the ASR model proposed in Star Temporal Classification, but I'm having trouble implementing my first 'word-level output ASR model'.

    When I use the simple one-hot tensor of the letter-to-word encoder (E matrix), the STC loss increases. So I made several modifications to solve this problem.

    (1). I view x : [T, B, A_L × l_max] to x : [T, B, l_max, A_L], apply F.log_softmax(dim=-1) and view it to original shape x : [T, B, A_L × l_max]

    (2). And I apply letter-to-word encoder e_matrix : [A_L × l_max, A_W] by x = x @ e_matrix, and F.log_softmax(torch.exp(x), dim=-1) for STC input.

    The reason I apply softmax over A_L is to turn x into the probability of each letter appearing at each position in the word. The log -> e_matrix -> exp sequence converts the one-hot sum produced by the e_matrix product into a product of probabilities.

    1. Is this the right approach for a letter-to-word encoder?

    After applying this, the STC loss starts at 3.5 and falls to 2.1~2.5 over 15 epochs. But when I check WER, the Viterbi-decoded output (implemented in gtn_applications/criterions/ctc.py) is always BLANK for every prediction.

    Because CTC loss with word-level classification output gives the same result, I assume that this is a problem with the properties of CTC training and word-level output ASR.

    1. Is this an ordinary result?
    2. How many epochs are needed to get results other than blank usually?
    3. How many epochs are needed to reach the highest performance usually?

    I'm sorry to bother you every time.

    opened by LEECHOONGHO 6
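Step (1) and the matrix product in step (2) above can be sketched for a single frame in plain Python. This is an illustration of the idea only, not the poster's code or the paper's implementation: the function names and the toy sizes are hypothetical, and each word is assumed to be given as one letter index per position, padded to l_max.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def word_log_scores(frame_scores, words):
    """frame_scores: l_max lists of A_L raw scores for one frame.
    words: one tuple of l_max letter indices per word.
    Multiplying by a one-hot column of e_matrix selects one letter per
    position and sums the selected log-probabilities, which is the log
    of the product of the per-position letter probabilities."""
    log_probs = [log_softmax(position) for position in frame_scores]
    return [
        sum(log_probs[i][letter] for i, letter in enumerate(word))
        for word in words
    ]


# Toy example: l_max = 2 positions, A_L = 3 letters, uniform scores.
frame = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(word_log_scores(frame, [(0, 1), (2, 2)]))
```

With uniform scores every word gets 2 * log(1/3), which shows that the word scores are log-domain products but are not normalized over the word vocabulary; a further normalization over words is a separate step.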
  • "MemoryError: std::bad_alloc" while using compose and intersect function.

    I want to compose two GTN graphs: one for the lexicon, and one for the grammar (LM), which is created with the lm_arpa.py file. I get "MemoryError: std::bad_alloc" even with 250 GB of RAM. I am not sure whether this is expected. Both graphs and the code are attached for reproducibility: gu-G.txt gu-L.txt

    Code: gtn.savetxt('gu-LG.txt', gtn.compose(gtn.loadtxt('gu-L.txt'), gtn.loadtxt('gu-G.txt')).arc_sort())

    opened by ronitd 0
Owner
Facebook Research