Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Overview

CTC Decoding Algorithms

Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.

Installation

  • Go to the root level of the repository
  • Execute pip install .
  • Go to tests/ and execute pytest to check if installation worked

Usage

Basic usage

Here is a minimalistic executable example:

import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')

The output mat (numpy array, softmax already applied) of the CTC-trained neural network is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps, and C the number of characters (the CTC-blank is the last element). The characters that can be predicted by the neural network are passed as the chars string to the decoder. Decoders return the decoded string.
Running the code outputs:

Best path: ""
Beam search: "a"
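
For this small matrix the discrepancy can be checked by hand: best path decoding takes the most likely class at each time-step (the blank, with probability 0.6, both times), which collapses to the empty string, while beam search sums the probabilities of all paths that map to the same text. A minimal sketch of that computation:

import numpy as np

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])  # columns: 'a', 'b', CTC-blank

# the only path collapsing to "" is "--"
p_empty = mat[0, 2] * mat[1, 2]  # 0.36

# paths collapsing to "a": "a-", "-a" and "aa" (repeated labels are merged)
p_a = mat[0, 0] * mat[1, 2] + mat[0, 2] * mat[1, 0] + mat[0, 0] * mat[1, 0]  # 0.64

print(p_empty, p_a)  # beam search therefore prefers "a"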

To see more examples on how to use the decoders, please have a look at the scripts in the tests/ folder.

Language model and BK-tree

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.

from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)
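
To give an idea of the statistics involved, here is a minimal character-bigram sketch. It is an illustration only, not the internal code of the LanguageModel class; the add-one smoothing and the character set are assumptions:

from collections import Counter

train_text = 'this is some text'
chars = 'abcdefghijklmnopqrstuvwxyz '  # assumed character set for this sketch

unigram_counts = Counter(train_text)
bigram_counts = Counter(zip(train_text, train_text[1:]))

def char_bigram_prob(prev_char, next_char):
    # P(next_char | prev_char) with add-one smoothing, so unseen pairs keep a small probability
    return (bigram_counts[(prev_char, next_char)] + 1) / (unigram_counts[prev_char] + len(chars))

print(char_bigram_prob('t', 'h'))  # 'th' occurs in the training text, so this is higher than for unseen pairs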

The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them and finally returns the best scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.

from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)
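
The BK-tree itself is a standard data structure for metric search. The following self-contained sketch (not the package's internal implementation) shows the idea: the triangle inequality of the edit distance lets a query prune whole subtrees while still collecting every word within the tolerance.

def edit_distance(a, b):
    # plain Levenshtein distance via dynamic programming
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

class SimpleBKTree:
    def __init__(self, words):
        words = iter(words)
        self.root = [next(words), {}]  # node = [word, children keyed by edit distance]
        for word in words:
            self._insert(word)

    def _insert(self, word):
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d in node[1]:
                node = node[1][d]
            else:
                node[1][d] = [word, {}]
                return

    def query(self, word, tolerance):
        # collect all stored words within the given edit distance of the query
        results, stack = [], [self.root]
        while stack:
            node_word, children = stack.pop()
            d = edit_distance(word, node_word)
            if d <= tolerance:
                results.append(node_word)
            # only subtrees whose key lies in [d - tolerance, d + tolerance] can contain matches
            stack += [child for key, child in children.items() if d - tolerance <= key <= d + tolerance]
        return results

tree = SimpleBKTree(['words', 'from', 'a', 'dictionary'])
print(tree.query('worts', tolerance=2))  # -> ['words']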

Usage with deep learning frameworks

Some notes:

  • No adapter for TensorFlow or PyTorch is provided
  • Apply softmax already in the model
  • Convert to numpy array
  • Usually, the output of an RNN layer rnn_output has shape TxBxC, with B the batch dimension
    • Decoders work on single batch elements of shape TxC
    • Therefore, iterate over all batch elements and apply the decoder to each of them separately
    • Example: extract matrix of batch element 0 mat = rnn_output[:, 0, :]
  • The CTC-blank is expected to be the last element along the character dimension
    • TensorFlow has the CTC-blank as last element, so nothing to do here
    • PyTorch, however, has the CTC-blank as the first element by default, so you either have to move it to the end or change the default setting (see the sketch after this list)
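
As an illustration of these points, here is a hedged sketch for a PyTorch model; the tensor shapes, the dummy logits and the character set are made-up values for the example, and a TensorFlow output can be handled the same way minus the blank reordering:

import torch
from ctc_decoder import best_path

# dummy rnn_output of shape TxBxC; in PyTorch the CTC-blank sits at index 0 by default
T, B, C = 50, 4, 5
rnn_output = torch.randn(T, B, C)
chars = 'abcd'  # the C-1 characters, in the order the model uses for indices 1..C-1

probs = torch.softmax(rnn_output, dim=-1)      # apply softmax (if the model does not already)
probs = torch.roll(probs, shifts=-1, dims=-1)  # move the blank from index 0 to the last index
mat_batch = probs.detach().cpu().numpy()       # convert to a numpy array

# decode each batch element separately on its TxC slice
texts = [best_path(mat_batch[:, b, :], chars) for b in range(B)]
print(texts)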

List of provided decoders

Recommended decoders:

  • best_path: best path (or greedy) decoder, the fastest of the algorithms; however, other decoders often perform better (a sketch of the greedy idea follows this list)
  • beam_search: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter
  • lexicon_search: lexicon search decoder, returns the best scoring word from a dictionary
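
A minimal sketch of what the greedy decoder does (argmax per time-step, collapse repeats, drop blanks); this is an illustration of the idea, not the package's own implementation:

import numpy as np
from itertools import groupby

def greedy_decode(mat, chars):
    blank_idx = len(chars)                     # the CTC-blank is the last class
    best = np.argmax(mat, axis=1)              # most likely class per time-step
    collapsed = (k for k, _ in groupby(best))  # merge repeated labels
    return ''.join(chars[k] for k in collapsed if k != blank_idx)

print(greedy_decode(np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]]), 'ab'))  # -> ""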

Other decoders that, from my experience, are not really suited for practical purposes, but might be useful for experiments or research:

  • prefix_search: prefix search decoder
  • token_passing: token passing algorithm
  • Best path decoder implementation in OpenCL (see extras/ folder)

This paper gives suggestions on when to use best path decoding, beam search decoding and token passing.

Documentation of test cases and data

References

Comments
  • Handling duplicate paths in Beam Search

    Hey! I am wondering if you could help me figure something out. In [1] you mentioned that summing up the probabilities Pr, Pr+ and Pr- leads to better results. I tried it in my implementation based on [2], and it does get better, but my probabilities become positive (I am working with log probabilities, so the values should lie in (-inf, 0]). Did you experience this phenomenon while implementing the sum in your algorithm?

    [1] Stackexchange CTC [2] CTC implementation github

    PS: Sorry if this is not the place to make this question, but I have no other way to reach you.

    opened by giovannirescia 7
  • difference between this repo and ctc decoder in tensorflow

    Thanks for your code. I tested my sequences with both the CTC decoder in TensorFlow and this repo, and I always get different results. TensorFlow is always right; this repo sometimes returns the right result and sometimes a wrong one. Have you ever compared these two implementations?

    opened by cosimo17 4
  • Language model at word level

    Hi, did you add a word-level language model for beam search?

    Currently it's easy to add a character-level bigram model, but I find it much harder to add a word-level one. I tried the CTC token passing algorithm, but it's just way too slow compared to beam search.

    opened by marcoleewow 4
  • Support for K-Gram LM where k > 2?

    I wanted to experiment with LMs other than bigrams. Any suggestion on how to approach extending the current codebase, e.g. to condition the probability of the last character on all previous characters?

    opened by Sushil-Thapa 3
  • beam_search with a beam_width=1

    Hi! Could you please tell me why beam_search with a beam_width equal to 1 does not give the same result as best_path?

    For example

    import numpy as np
    from ctc_decoder import best_path, beam_search
    
    chars = 'ab'
    mat = np.array([[0.8, 0, 0.2], [0.4, 0.0, 0.6], [0.8, 0, 0.2]])
    
    print(f'Best path: "{best_path(mat, chars)}"')
    print(f'Beam search: "{beam_search(mat, chars, beam_width=1)}"')
    

    Gives:

    Best path: "aa"
    Beam search: "a"

    Thanks!

    opened by skalinin 2
  • in beamsearch.py, why is last.norm() only used in the last step?

    It's a great job, thanks to the author.

    I have a question about beam search + LM: why is last.norm() only applied in the last step? Why not apply last.norm() at every time step? The longer the sequence, the smaller the LM score, so it should be compensated by length normalization. I think it should be normalized at every time step; is that right? Thanks in advance.

    opened by l2009312042 2
  • Fixed transition to new words skipping probabilities

    I got several problems on rather trivial dictionaries and charsets (see below):

    • Without pull-request:
    Greedy-Decoder:  # 4 3 5 4 2 3 5 6 7 6 7 5 4 5 4 3 5 . 2 3
    Token-Passing:   # 4 3 5 4 2 5 6 7 7 5 4 5 4 3 . 2 3
    
    • With pull-request (as expected):
    Greedy-Decoder:  # 4 3 5 4 2 3 5 6 7 6 7 5 4 5 4 3 5 . 2 3
    Token-Passing:   # 4 3 5 4 2 3 5 6 7 6 7 5 4 5 4 3 5 . 2 3
    
    • Test code (copy the snippet into TokenPassing.py and use all files in the appendix)
    with open("charset.txt") as f:
    	classes = f.read().split("\n")
    with open("dictionary.txt") as f:
    	dictionary = f.read().split("\n")
    probabilities = np.load("propabilities.npy")
    
    # greedy
    from itertools import groupby
    best_path = np.argmax(probabilities, axis=1)
    blank_idx = 0
    best_chars_collapsed = [classes[k] for k, _ in groupby(best_path) if k != blank_idx]
    res = ' '.join(best_chars_collapsed)
    print("Greedy-Decoder: ", res)
    
    # token passing
    tp = ctcTokenPassing(probabilities, classes, dictionary, blankIdx=blank_idx)
    print("Token-Passing:  ", tp)
    
    import matplotlib.pyplot as plt
    plt.imshow(np.log(probabilities))
    plt.show()
    

    The problem is that the current code allows a single output to be skipped on the first transition (0->1,2), since toks.get(wIdx, s - 1, t - 1) is chosen, which does not include the probability at t. Using toks.get(wIdx, s - 1, t) instead fixes this, because it includes the emission probability for the blank.

    There might be another crucial issue if there is no blank between two words. I had no time to check this, though.

    Appendix: probabilities

    charset.txt dictionary.txt propabilities.zip

    opened by ChWick 2
  • best_path is better than beam_search in my experiments, is there something wrong with my network?

    Hi, I use CNN + LSTM + CTC in my license plate recognition system, and I used your Python scripts to test the precision of "best path decoding" and "beam search decoding". The result: "best path" got 92.3% while "beam search" got 91.7%. In my experiments, the number of time-steps equals 21 and the total number of labels equals 77. Could you give me some advice? Thanks a lot!

    opened by shaoming20798 2
  • Question about language model initialization

    Hi, thanks for sharing. I have a question about the language model initialization function. Why do you set numSamples[c] to len(classes) instead of 0? https://github.com/githubharald/CTCDecoder/blob/d6a88ea5b3187fe5c81508d377b32eb6c860b868/src/LanguageModel.py#L31

    opened by xuannianz 2
  • module 'pyopencl' has no attribute 'enqueue_write_buffer'

    I am having a difficult time using the GPU. It runs OK without the GPU, but with it I get this error:

    =====Line example (GPU)=====
    Traceback (most recent call last):
      File "main.py", line 147, in testLineExampleGPU()
      File "main.py", line 122, in testLineExampleGPU
        resBatch = BestPathCL.ctcBestPathCL(batch, classes, clWrapper)
      File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 109, in ctcBestPathCL
        labelStrBatch = clWrapper.compute(batch)
      File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 84, in compute
        cl.enqueue_write_buffer(self.queue, self.batchBuf, batch.astype(np.float32), is_blocking=False)
    AttributeError: module 'pyopencl' has no attribute 'enqueue_write_buffer'

    opened by greenpdx 2
  • Different beam search output using different blankIdx value

    Hi, I have a question regarding your beam search implementation.

    In your ctcBeamSearch method, you set blankIdx to the length of the classes (in this case, the known letters and symbols). But some other beam-search implementations set it to zero.

    I tested this using your example, and indeed it differs both in the decoded result and in how far it is from the ground truth (I'm using CER and WER):

    =====Line example (using blankIdx = len(classes))=====
    TARGET                  : "the fake friend of the family, like the"
    BEAM SEARCH             : "the fak friend of the fomcly hae tC" CER: CER/WER: 0.25714/0.15000
    BEAM SEARCH LM          : "the fake friend of the family, lie th" CER: CER/WER: 0.05405/0.03226
    =====Line example (using blankIdx=0)=====
    TARGET                  : "the fake friend of the family, like the"
    BEAM SEARCH             : "the faetker friend of ther foarmnacly,  harse. tHhC." CER: CER/WER: 0.33333/0.22368
    BEAM SEARCH LM          : "the fake friend of the family, like the " CER: CER/WER: 0.00000/0.00000
    

    So are there cases where blankIdx is not zero? Which value is suitable for beam search decoding?

    opened by miqbal23 1
Owner
Harald Scheidl
Interested in computer vision, deep learning, C++ and Python.