Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Overview

CTC Decoding Algorithms

Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.

Installation

  • Go to the root level of the repository
  • Execute pip install .
  • Go to tests/ and execute pytest to check if installation worked

Usage

Basic usage

Here is a minimalistic executable example:

import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')

The output mat (numpy array, softmax already applied) of the CTC-trained neural network is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps, and C the number of characters (the CTC-blank is the last element). The characters that can be predicted by the neural network are passed as the chars string to the decoder. Decoders return the decoded string.
Running the code outputs:

Best path: ""
Beam search: "a"
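
For this small matrix the discrepancy can be checked by hand: best path decoding takes the most likely class at each time-step (the blank, with probability 0.6, both times), which collapses to the empty string, while beam search sums the probabilities of all paths that map to the same text. A minimal sketch of that computation:

import numpy as np

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])  # columns: 'a', 'b', CTC-blank

# the only path collapsing to "" is "--"
p_empty = mat[0, 2] * mat[1, 2]  # 0.36

# paths collapsing to "a": "a-", "-a" and "aa" (repeated labels are merged)
p_a = mat[0, 0] * mat[1, 2] + mat[0, 2] * mat[1, 0] + mat[0, 0] * mat[1, 0]  # 0.64

print(p_empty, p_a)  # beam search therefore prefers "a"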

To see more examples on how to use the decoders, please have a look at the scripts in the tests/ folder.

Language model and BK-tree

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.

from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)
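
To give an idea of the statistics involved, here is a minimal character-bigram sketch. It is an illustration only, not the internal code of the LanguageModel class; the add-one smoothing and the character set are assumptions:

from collections import Counter

train_text = 'this is some text'
chars = 'abcdefghijklmnopqrstuvwxyz '  # assumed character set for this sketch

unigram_counts = Counter(train_text)
bigram_counts = Counter(zip(train_text, train_text[1:]))

def char_bigram_prob(prev_char, next_char):
    # P(next_char | prev_char) with add-one smoothing, so unseen pairs keep a small probability
    return (bigram_counts[(prev_char, next_char)] + 1) / (unigram_counts[prev_char] + len(chars))

print(char_bigram_prob('t', 'h'))  # 'th' occurs in the training text, so this is higher than for unseen pairs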

The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them and finally returns the best scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.

from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)
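
The BK-tree itself is a standard data structure for metric search. The following self-contained sketch (not the package's internal implementation) shows the idea: the triangle inequality of the edit distance lets a query prune whole subtrees while still collecting every word within the tolerance.

def edit_distance(a, b):
    # plain Levenshtein distance via dynamic programming
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

class SimpleBKTree:
    def __init__(self, words):
        words = iter(words)
        self.root = [next(words), {}]  # node = [word, children keyed by edit distance]
        for word in words:
            self._insert(word)

    def _insert(self, word):
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d in node[1]:
                node = node[1][d]
            else:
                node[1][d] = [word, {}]
                return

    def query(self, word, tolerance):
        # collect all stored words within the given edit distance of the query
        results, stack = [], [self.root]
        while stack:
            node_word, children = stack.pop()
            d = edit_distance(word, node_word)
            if d <= tolerance:
                results.append(node_word)
            # only subtrees whose key lies in [d - tolerance, d + tolerance] can contain matches
            stack += [child for key, child in children.items() if d - tolerance <= key <= d + tolerance]
        return results

tree = SimpleBKTree(['words', 'from', 'a', 'dictionary'])
print(tree.query('worts', tolerance=2))  # -> ['words']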

Usage with deep learning frameworks

Some notes:

  • No adapter for TensorFlow or PyTorch is provided
  • Apply softmax already in the model
  • Convert to numpy array
  • Usually, the output of an RNN layer rnn_output has shape TxBxC, with B the batch dimension
    • Decoders work on single batch elements of shape TxC
    • Therefore, iterate over all batch elements and apply the decoder to each of them separately
    • Example: extract matrix of batch element 0 mat = rnn_output[:, 0, :]
  • The CTC-blank is expected to be the last element along the character dimension
    • TensorFlow has the CTC-blank as last element, so nothing to do here
    • PyTorch, however, has the CTC-blank as the first element by default, so you either have to move it to the end or change the default setting (see the sketch after this list)
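
As an illustration of these points, here is a hedged sketch for a PyTorch model; the tensor shapes, the dummy logits and the character set are made-up values for the example, and a TensorFlow output can be handled the same way minus the blank reordering:

import torch
from ctc_decoder import best_path

# dummy rnn_output of shape TxBxC; in PyTorch the CTC-blank sits at index 0 by default
T, B, C = 50, 4, 5
rnn_output = torch.randn(T, B, C)
chars = 'abcd'  # the C-1 characters, in the order the model uses for indices 1..C-1

probs = torch.softmax(rnn_output, dim=-1)      # apply softmax (if the model does not already)
probs = torch.roll(probs, shifts=-1, dims=-1)  # move the blank from index 0 to the last index
mat_batch = probs.detach().cpu().numpy()       # convert to a numpy array

# decode each batch element separately on its TxC slice
texts = [best_path(mat_batch[:, b, :], chars) for b in range(B)]
print(texts)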

List of provided decoders

Recommended decoders:

  • best_path: best path (or greedy) decoder, the fastest of the algorithms; however, other decoders often perform better (a sketch of the greedy idea follows this list)
  • beam_search: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter
  • lexicon_search: lexicon search decoder, returns the best scoring word from a dictionary
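
A minimal sketch of what the greedy decoder does (argmax per time-step, collapse repeats, drop blanks); this is an illustration of the idea, not the package's own implementation:

import numpy as np
from itertools import groupby

def greedy_decode(mat, chars):
    blank_idx = len(chars)                     # the CTC-blank is the last class
    best = np.argmax(mat, axis=1)              # most likely class per time-step
    collapsed = (k for k, _ in groupby(best))  # merge repeated labels
    return ''.join(chars[k] for k in collapsed if k != blank_idx)

print(greedy_decode(np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]]), 'ab'))  # -> ""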

Other decoders that, from my experience, are not really suited for practical purposes, but might be useful for experiments or research:

  • prefix_search: prefix search decoder
  • token_passing: token passing algorithm
  • Best path decoder implementation in OpenCL (see extras/ folder)

This paper gives suggestions on when to use best path decoding, beam search decoding and token passing.

Documentation of test cases and data

References

Comments
  • Handling duplicate paths in Beam Search

    Hey! I am wondering if you could help me figure something out. In [1] you mentioned that summing up the probabilities Pr, Pr+ and Pr- leads to better results. I tried it in my implementation based on [2], and it does get better, but my probabilities become positive (I am working with log probabilities, so the values should lie in (-inf, 0]). Did you experience this phenomenon while implementing the sum in your algorithm?

    [1] Stackexchange CTC [2] CTC implementation github

    PS: Sorry if this is not the place to make this question, but I have no other way to reach you.

    opened by giovannirescia 7
  • difference between this repo and ctc decoder in tensorflow

    Thanks for your code. I tested my sequences with both the CTC decoder in TensorFlow and this repo, and I always get different results. TensorFlow is always right; this repo sometimes returns the right result and sometimes a wrong one. Have you ever compared these two implementations?

    opened by cosimo17 4
  • Language model at word level

    Hi, did you add a word-level language model for beam search?

    Currently it's easy to add a character-level bigram model, but I find it much harder to add a word-level one. I tried the CTC token passing algorithm, but it's just way too slow compared to beam search.

    opened by marcoleewow 4
  • Support for K-Gram LM where k > 2?

    I wanted to experiment with LMs other than bigrams. Any suggestion on how to approach extending the current codebase, e.g. to condition the probability of the last character on all previous characters?

    opened by Sushil-Thapa 3
  • beam_search with a beam_width=1

    Hi! Could you please tell me why beam_search with a beam_width equal to 1 does not give the same result as best_path?

    For example

    import numpy as np
    from ctc_decoder import best_path, beam_search
    
    chars = 'ab'
    mat = np.array([[0.8, 0, 0.2], [0.4, 0.0, 0.6], [0.8, 0, 0.2]])
    
    print(f'Best path: "{best_path(mat, chars)}"')
    print(f'Beam search: "{beam_search(mat, chars, beam_width=1)}"')
    

    Gives:

    Best path: "aa"
    Beam search: "a"

    Thanks!

    opened by skalinin 2
  • in beamsearch.py, why is last.norm() only used in the last step?

    It's a great job, thanks to the author.

    I have a question about beam search + LM: why is last.norm() only applied in the last step? Why not apply last.norm() at every time step? The longer the sequence, the smaller the LM score, so it should be compensated by length normalization. I think it should be normalized at every time step; is that right? Thanks in advance.

    opened by l2009312042 2
  • Fixed transition to new words skipping probabilities

    I got several problems on rather trivial dictionaries and charsets (see below):

    • Without pull-request:
    Greedy-Decoder:  # 4 3 5 4 2 3 5 6 7 6 7 5 4 5 4 3 5 . 2 3
    Token-Passing:   # 4 3 5 4 2 5 6 7 7 5 4 5 4 3 . 2 3
    
    • With pull-request (as expected):
    Greedy-Decoder:  # 4 3 5 4 2 3 5 6 7 6 7 5 4 5 4 3 5 . 2 3
    Token-Passing:   # 4 3 5 4 2 3 5 6 7 6 7 5 4 5 4 3 5 . 2 3
    
    • Test code (copy the snippet into TokenPassing.py and use all files in the appendix)
    with open("charset.txt") as f:
    	classes = f.read().split("\n")
    with open("dictionary.txt") as f:
    	dictionary = f.read().split("\n")
    probabilities = np.load("propabilities.npy")
    
    # greedy
    from itertools import groupby
    best_path = np.argmax(probabilities, axis=1)
    blank_idx = 0
    best_chars_collapsed = [classes[k] for k, _ in groupby(best_path) if k != blank_idx]
    res = ' '.join(best_chars_collapsed)
    print("Greedy-Decoder: ", res)
    
    # token passing
    tp = ctcTokenPassing(probabilities, classes, dictionary, blankIdx=blank_idx)
    print("Token-Passing:  ", tp)
    
    import matplotlib.pyplot as plt
    plt.imshow(np.log(probabilities))
    plt.show()
    

    The problem is that the current code allows a single output to be skipped on the first transition (0->1,2), since toks.get(wIdx, s - 1, t - 1) is chosen, which does not include the probability at t. Using toks.get(wIdx, s - 1, t) instead fixes this, because it includes the emission probability for the blank.

    There might be another crucial issue if there is no blank between two words. I had no time to check this, though.

    Appendix: probabilities

    charset.txt dictionary.txt propabilities.zip

    opened by ChWick 2
  • best_path is better than beam_search in my experiments, is there something wrong with my network?

    Hi, I use CNN + LSTM + CTC in my license plate recognition system, and I used your Python scripts to test the precision of "best path decoding" and "beam search decoding". The result: "best path" got 92.3% while "beam search" got 91.7%. In my experiments, the number of time-steps equals 21 and the total number of labels equals 77. Could you give me some advice? Thanks a lot!

    opened by shaoming20798 2
  • Question about language model initialization

    Hi, thanks for sharing. I have a question about the language model initialization function. Why do you set numSamples[c] to len(classes) instead of 0? https://github.com/githubharald/CTCDecoder/blob/d6a88ea5b3187fe5c81508d377b32eb6c860b868/src/LanguageModel.py#L31

    opened by xuannianz 2
  • module 'pyopencl' has no attribute 'enqueue_write_buffer'

    I am having a difficult time using the GPU. It runs OK without the GPU, but with it I get this error:

    =====Line example (GPU)=====
    Traceback (most recent call last):
      File "main.py", line 147, in testLineExampleGPU()
      File "main.py", line 122, in testLineExampleGPU
        resBatch = BestPathCL.ctcBestPathCL(batch, classes, clWrapper)
      File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 109, in ctcBestPathCL
        labelStrBatch = clWrapper.compute(batch)
      File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 84, in compute
        cl.enqueue_write_buffer(self.queue, self.batchBuf, batch.astype(np.float32), is_blocking=False)
    AttributeError: module 'pyopencl' has no attribute 'enqueue_write_buffer'

    opened by greenpdx 2
  • Different beam search output using different blankIdx value

    Hi, I have a question regarding your beam search implementation.

    In your ctcBeamSearch method, you set blankIdx to the length of the classes (in this case, the known letters and symbols). But some other beam-search implementations set it to zero.

    I tested this using your example, and indeed it differs both in the decoded result and in how far it is from the ground truth (I'm using CER and WER):

    =====Line example (using blankIdx = len(classes))=====
    TARGET                  : "the fake friend of the family, like the"
    BEAM SEARCH             : "the fak friend of the fomcly hae tC" CER: CER/WER: 0.25714/0.15000
    BEAM SEARCH LM          : "the fake friend of the family, lie th" CER: CER/WER: 0.05405/0.03226
    =====Line example (using blankIdx=0)=====
    TARGET                  : "the fake friend of the family, like the"
    BEAM SEARCH             : "the faetker friend of ther foarmnacly,  harse. tHhC." CER: CER/WER: 0.33333/0.22368
    BEAM SEARCH LM          : "the fake friend of the family, like the " CER: CER/WER: 0.00000/0.00000
    

    So are there cases where blankIdx is not zero? Which value is suitable for beam search decoding?

    opened by miqbal23 1
Owner
Harald Scheidl
Interested in computer vision, deep learning, C++ and Python.