Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.

Overview

Convolutional Recurrent Neural Network

This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. For details, please refer to our paper http://arxiv.org/abs/1507.05717.

UPDATE Mar 14, 2017 A Docker file has been added to the project. Thanks to @varun-suresh.

UPDATE May 1, 2017 A PyTorch port has been made by @meijieru.

UPDATE Jun 19, 2017 For an end-to-end text detector+recognizer, check out the CTPN+CRNN implementation by @AKSHAYUBHAT.

Build

The software has only been tested on Ubuntu 14.04 (x64). CUDA-enabled GPUs are required. Before building, install the latest versions of Torch7, fblualib and LMDB, following each project's own installation instructions. On Ubuntu, LMDB can be installed with apt-get install liblmdb-dev.

Then go to src/ and execute sh build_cpp.sh to build the C++ code. If successful, a file named libcrnn.so should be produced in the src/ directory.

Run demo

A demo program can be found in src/demo.lua. Before running the demo, download a pretrained model from here. Put the downloaded model file crnn_demo_model.t7 into directory model/crnn_demo/. Then launch the demo by:

th demo.lua

The demo reads an example image and recognizes its text content.

Example image: Example Image

Expected output:

Loading model...
Model loaded from ../model/crnn_demo/model.t7
Recognized text: available (raw: a-----v--a-i-l-a-bb-l-e---)

Another example: Example Image2

Recognized text: shakeshack (raw: ss-h-a--k-e-ssh--aa-c--k--)
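
The raw strings above are per-frame predictions, with - standing for the CTC blank. A minimal sketch of how such a raw string collapses into the recognized text, assuming greedy (best-path) decoding: merge consecutive repeats, then drop blanks. The function name ctc_collapse is illustrative and is not part of this repository.

```python
def ctc_collapse(raw, blank="-"):
    """Collapse a per-frame CTC output string into the final text."""
    out = []
    prev = None
    for ch in raw:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


print(ctc_collapse("a-----v--a-i-l-a-bb-l-e---"))  # available
print(ctc_collapse("ss-h-a--k-e-ssh--aa-c--k--"))  # shakeshack
```

Note that a doubled letter survives only when a blank separates the two occurrences, which is why "aa-a" collapses to "aa" rather than "a".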

Use pretrained model

The pretrained model can be used for lexicon-free and lexicon-based recognition tasks. Refer to the functions recognizeImageLexiconFree and recognizeImageWithLexicion in file utilities.lua for details.
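
One simple way to picture lexicon-based recognition is to snap the lexicon-free output to the closest word in the lexicon. The sketch below does this with edit distance. It is an illustrative approximation only and not necessarily how recognizeImageWithLexicion in utilities.lua scores candidates; edit_distance and pick_from_lexicon are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def pick_from_lexicon(prediction, lexicon):
    """Return the lexicon word closest to the lexicon-free prediction."""
    return min(lexicon, key=lambda word: edit_distance(prediction, word))


lexicon = ["available", "shakeshack", "variable"]
print(pick_from_lexicon("avaiable", lexicon))  # available
```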

Train a new model

Follow these steps to train a new model on your own dataset.

  1. Create a new LMDB dataset. A Python program is provided in tool/create_dataset.py. Refer to the function createDataset for details (you need to pip install lmdb first).
  2. Create a model directory under model/, for example model/foo_model. Then create a configuration file config.lua in that directory. You can copy model/crnn_demo/config.lua and modify it.
  3. Go to src/ and execute th main_train.lua ../model/foo_model/. Model snapshots and the log file will be saved into the model directory.

Build using docker

  1. Install Docker. Follow the instructions here.
  2. Install nvidia-docker. Follow the instructions here.
  3. Clone this repo. From the repo directory, run docker build -t crnn_docker .
  4. Once the image is built, the container can be run using nvidia-docker run -it crnn_docker.

Citation

Please cite the following paper if you are using the code/model in your research paper.

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Acknowledgements

The authors would like to thank the developers of Torch7, TH++, lmdb-lua-ffi and char-rnn.

Please let me know if you encounter any issues.

Comments
  • Work on line level instead of word level


    Hello,

    I'm wondering if it is possible to work on complete lines instead of just on a word level. It can be hard to segment words in a line as a preprocessing step (depending on how "nice" the handwritten text looks). So if adding a new class, which I call "real blank" (not the CTC pseudo blank), then it should in theory be possible. Has anyone already tried this? In my experience so far, CTC works best with short sequences, but a line can be pretty long. Or is there some other way to avoid the preprocessing step of word segmentation?

    How did you do this with the data from the ICFHR2016 Competition (*) in which CRNN was used?

    (*) https://scriptnet.iit.demokritos.gr/competitions/4/

    opened by githubharald 20
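
The distinction drawn in the issue above, between a real space class and the CTC blank, can be sketched as a label encoding in which the space is an ordinary class and a separate index is reserved for the blank. The alphabet, the index assignment and the function names below are assumptions for illustration; they are not taken from this repository.

```python
import string

BLANK = 0  # assumed index of the CTC blank; never appears in target labels
ALPHABET = string.ascii_lowercase + " "  # the space is a regular class
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}


def encode_line(text):
    """Map a text line to class indices; spaces get their own class."""
    return [CHAR_TO_INDEX[ch] for ch in text]


def decode_frames(frames):
    """Greedy CTC decoding: merge repeated frames, then drop blanks."""
    chars = []
    prev = None
    for idx in frames:
        if idx != prev and idx != BLANK:
            chars.append(INDEX_TO_CHAR[idx])
        prev = idx
    return "".join(chars)


print(encode_line("to be"))  # [20, 15, 27, 2, 5]
# Per-frame output with blanks (0) and repeats, as a network might emit.
frames = [20, 20, 0, 15, 27, 27, 0, 2, 0, 5, 5]
print(decode_frames(frames))  # to be
```

With this layout the word boundary survives decoding because it is a predicted class, while the blank only separates frames and is discarded.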
  • Only 62% accuracy. How to make it better?

    I trained a new model using crnn_demo_model.t7 as the pretrained model. At iteration 30000 the train loss is 0.0005, but the test loss is 4.6, and accuracy has stayed around 62% for a long time. Is anything wrong? What can I do to make it better? @bgshih

    opened by ll36771 17
  • Compilation error with the current TH++


    Hi, I get compilation errors when building the cpp part. I suspect that I have a wrong version of TH++ but there does not seem to be a better candidate in the thpp project history (v1.0 seems too old).

    Also, I'm building it on a newer Ubuntu (15.10), but I don't see how this could cause the following compilation errors. I tried both g++ version 4.9 and 5.2. Any hint appreciated!

    ~/crnn/src$ ./build_cpp.sh
    -- The C compiler identification is GNU 4.9.3
    -- The CXX compiler identification is GNU 4.9.3
    [...]
    -- Try OpenMP C flag = [-fopenmp]
    -- Performing Test OpenMP_FLAG_DETECTED
    -- Performing Test OpenMP_FLAG_DETECTED - Success
    -- Try OpenMP CXX flag = [-fopenmp]
    -- Performing Test OpenMP_FLAG_DETECTED
    -- Performing Test OpenMP_FLAG_DETECTED - Success
    -- Found OpenMP: -fopenmp
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/alena/crnn/src/cpp/build
    Scanning dependencies of target crnn
    [ 50%] Building CXX object CMakeFiles/crnn.dir/init.cpp.o
    [100%] Building CXX object CMakeFiles/crnn.dir/ctc.cpp.o
    /home/alena/crnn/src/cpp/ctc.cpp: In instantiation of ‘int {anonymous}::forwardBackward(lua_State*) [with T = float; lua_State = lua_State]’:
    /home/alena/crnn/src/cpp/ctc.cpp:194:16: required from ‘const luaL_Reg {anonymous}::Registerer::functions_ [3]’
    /home/alena/crnn/src/cpp/ctc.cpp:203:44: required from ‘static void {anonymous}::Registerer::registerFunctions(lua_State*) [with T = float; lua_State = lua_State]’
    /home/alena/crnn/src/cpp/ctc.cpp:210:24: required from here
    /home/alena/crnn/src/cpp/ctc.cpp:22:76: error: conversion from ‘thpp::TensorBase<float, thpp::Storage, thpp::Tensor >::Ptr {aka thpp::TensorPtrthpp::Tensor}’ to non-scalar type ‘const thpp::Tensor’ requested
        const thpp::Tensor input = fblualib::luaGetTensorChecked(L, 1);
        ^
    /home/alena/crnn/src/cpp/ctc.cpp:23:78: error: conversion from ‘thpp::TensorBase<int, thpp::Storage, thpp::Tensor >::Ptr {aka thpp::TensorPtrthpp::Tensor}’ to non-scalar type ‘const thpp::Tensor’ requested
        const thpp::Tensor targets = fblualib::luaGetTensorChecked(L, 2);
        ^

    opened by moraval 15
  • CMake Error THC_LIBRARY notfound

    There is an error when I execute sh build_cpp.sh: THC_LIBRARY is NOTFOUND.

    CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
    Please set them or make sure they are set and tested correctly in the CMake files:
    THC_LIBRARY
        linked by target "crnn" in directory /root/crnn/src/cpp

    -- Configuring incomplete, errors occurred!
    See also "/root/crnn/src/cpp/build/CMakeFiles/CMakeOutput.log".

    What is THC_LIBRARY, and where can I get it?

    opened by ll36771 14
  • The issue of creating a dataset with create_dataset.py with python3.5


    Hello, I'm trying to create my own dataset to train my model from scratch. I'm working with Python 3.5.2. While creating the dataset I encountered the following problems.

    1) Opening the image in text mode:

    with open(imagePath, 'r') as f:
         imageBin = f.read()
    

    returns the following error :

    codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
    

    I fixed it as follows:

    
    with open(imagePath, 'rb') as f:
           imageBin = f.read()
    
    

    2) For Python 3, for i in xrange(nSamples): should be replaced by for i in range(nSamples): and for k, v in cache.iteritems(): should be replaced by for k, v in cache.items():

    However, I got the following error at the line txn.put(k, v) in the function def writeCache(env, cache):

    line 86, in createDataset
       writeCache(env, cache)
     File "/home/ahmed/Downloads/crnn.pytorch-master/data-processing.py", line 46, in writeCache
       txn.put(k, v)
    TypeError: Won't implicitly convert Unicode to bytes; use .encode()
    
    

    Here is my whole code :

    import os
    import lmdb # install lmdb by "pip install lmdb"
    import cv2
    import numpy as np
    import glob
    
    real_path='/home/ahmed/Downloads/sample/train/'
    path='/home/ahmed/Downloads/sample/'
    path_train='train/'
    
    path_output='/home/ahmed/Downloads/sample/output/'
    
    
    
    os.chdir(path+path_train)
    images_train = glob.glob("*.jpg")
    left_train,labels_train,right_train = list(zip(*[os.path.splitext(x)[0].split('_')
                                             for x in images_train]))
    
    
    
    def checkImageIsValid(imageBin):
        if imageBin is None:
            return False
        imageBuf = np.fromstring(imageBin, dtype=np.uint8)
        img = cv2.imdecode(imageBuf, cv2.IMREAD_GRAYSCALE)
        imgH, imgW = img.shape[0], img.shape[1]
        if imgH * imgW == 0:
            return False
        return True
    
    
    def writeCache(env, cache):
        with env.begin(write=True) as txn:
            for k, v in cache.items():
                txn.put(k, v)
    
    
    def createDataset(outputPath,images_train, labels_train, lexiconList=None, checkValid=True):
        """
        Create LMDB dataset for CRNN training.
        ARGS:
            outputPath    : LMDB output path
            imagePathList : list of image path
            labelList     : list of corresponding groundtruth texts
            lexiconList   : (optional) list of lexicon lists
            checkValid    : if true, check the validity of every image
        """
        assert(len(images_train) == len(labels_train))
        nSamples = len(images_train)
    
        env = lmdb.open(path_output, map_size=1099511627776)
        cache = {}
        cnt = 1
        for i in range(nSamples):
            imagePath = images_train[i]
            label = labels_train[i]
            if not os.path.exists(real_path+imagePath):
                print('%s does not exist' % imagePath)
                continue
            with open(real_path+imagePath, 'rb') as f:
                imageBin = f.read()
            if checkValid:
                if not checkImageIsValid(imageBin):
                    print('%s is not a valid image' % imagePath)
                    continue
    
            imageKey = 'image-%09d' % cnt
            labelKey = 'label-%09d' % cnt
            cache[imageKey] = imageBin
            cache[labelKey] = label
            if lexiconList:
                lexiconKey = 'lexicon-%09d' % cnt
                cache[lexiconKey] = ' '.join(lexiconList[i])
            if cnt % 1000 == 0:
                writeCache(env, cache)
                cache = {}
                print('Written %d / %d' % (cnt, nSamples))
            cnt += 1
        nSamples = cnt-1
        cache['num-samples'] = str(nSamples)
        writeCache(env, cache)
        print('Created dataset with %d samples' % nSamples)
    
    
    if __name__ == '__main__':
        createDataset(path_output,images_train, labels_train)
    
    

    what's wrong with

    txn.put(k, v) in def writeCache(env, cache): function

    
    line 86, in createDataset
       writeCache(env, cache)
     File "/home/ahmed/Downloads/crnn.pytorch-master/data-processing.py", line 46, in writeCache
       txn.put(k, v)
    TypeError: Won't implicitly convert Unicode to bytes; use .encode()
    
    opened by ahmedmazari-dhatim 12
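
The TypeError in the issue above is what py-lmdb raises under Python 3 when txn.put receives str rather than bytes. A minimal sketch of one possible fix is to encode every key and every text value before writing; image data read with 'rb' is already bytes and passes through unchanged. to_bytes and encode_cache are hypothetical helper names, and UTF-8 is an assumed encoding.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes; str is encoded, bytes pass through unchanged."""
    if isinstance(value, bytes):
        return value
    return str(value).encode(encoding)


def encode_cache(cache):
    """Convert all keys and values of a cache dict to bytes for txn.put."""
    return {to_bytes(k): to_bytes(v) for k, v in cache.items()}


# Same key layout as in the script above; the image payload is a stand-in.
cache = {
    "image-000000001": b"\xff\xd8 fake jpeg bytes",
    "label-000000001": "hello",
    "num-samples": "1",
}
encoded = encode_cache(cache)
print(all(isinstance(k, bytes) and isinstance(v, bytes)
          for k, v in encoded.items()))  # True
```

In writeCache this would amount to calling txn.put(to_bytes(k), to_bytes(v)) inside the loop.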
  • fblualib version


    Hello,

    thanks for sharing CRNN! Unfortunately installing fblualib is a bit of a pain. We're trying to install it on Ubuntu 14.04. Could you tell me which version of fblualib and dependencies (folly, wangle, thrift, thpp) you are using? I think using exactly the same versions as you did would be the easiest way to get everything running.

    Thanks in advance Harald

    opened by githubharald 10
  • The recognition result is very bad, such as for the following images

    The result for the image is: Recognized text: vaossyoba (raw: v--a----o-ss-s--y-o-b--a--)

    For many simple images the result is also very bad. Can anyone tell me why?

    opened by Jayhello 10
  • Problems at training OMR dataset


    I tried to train crnn on dataset PitchRec_dataset\OMRB\TrainSet (2345 images), then got "train loss = inf" just like:

    [05/23/16 11:18:15] Loading datasets...
    [05/23/16 11:18:15] Start training...
    [05/23/16 11:18:15] Validating...
    [05/23/16 11:18:17] Test loss = 75.251866, accuracy = 0.000000
    [05/23/16 11:18:17] dddddddddddddddddddddddddd => d (GT:aadee )
    [05/23/16 11:18:17] dd1ddddddddddddddddddddddd => d1d (GT:ghihhi )
    [05/23/16 11:18:17] 7ddddddddddddddddddddddddh => 7dh (GT:dcbdddfdfg )
    [05/23/16 11:18:17] dd1dddd111111111111111111h => d1d1h (GT:gfggfghhhj )
    [05/23/16 11:18:17] dd11dddddddddddddddddddddd => d1d (GT:ihfcba )
    [05/23/16 11:18:17] 711ddddddddddddddddddddddh => 71dh (GT:ihgh )
    [05/23/16 11:18:17] d11111111111ddddd111111ddh => d1d1dh (GT:fjjejjdjjc )
    [05/23/16 11:18:17] 7d1111dddddddddddd11dd111h => 7d1d1d1h (GT:gccbcchgg )
    [05/23/16 11:18:17] dddddddddddddddddddddddddh => dh (GT:adeef )
    [05/23/16 11:18:17] dddddddddddddddddddddddddh => dh (GT:fgfe )
    [05/23/16 11:18:17] 7d11ddddddddddddddddd1111h => 7d1d1h (GT:iffefhg )
    [05/23/16 11:18:17] dddddddddddddddddddddddddd => d (GT:edcb )
    [05/23/16 11:18:17] dddddddddddddddddddddddd1h => d1h (GT:fhhfhhcee )
    [05/23/16 11:18:17] dddddddddddddddddddddddddh => dh (GT:bccddded )
    [05/23/16 11:18:17] dd1dddddddddddddd1111ddddh => d1d1dh (GT:chi )
    [05/23/16 11:18:47] Iteration 100 - train loss = inf
    [05/23/16 11:19:17] Iteration 200 - train loss = inf
    [05/23/16 11:19:47] Iteration 300 - train loss = inf
    [05/23/16 11:20:17] Iteration 400 - train loss = inf
    [05/23/16 11:20:47] Iteration 500 - train loss = inf
    [05/23/16 11:21:17] Iteration 600 - train loss = inf
    [05/23/16 11:21:45] Iteration 700 - train loss = inf
    [05/23/16 11:22:11] Iteration 800 - train loss = inf
    [05/23/16 11:22:37] Iteration 900 - train loss = inf
    [05/23/16 11:23:04] Iteration 1000 - train loss = inf

    My config.lua is as follows:

    function getConfig()
        local config = {
            nClasses = 36,
            maxT = 26,
            displayInterval = 100,
            testInterval = 1000,
            nTestDisplay = 15,
            trainBatchSize = 64,
            valBatchSize = 100,
            snapshotInterval = 1000,
            maxIterations = 2000000,
            optimMethod = optim.adadelta,
            optimConfig = {},
            trainSetPath = '../../PitchRec_dataset/LMDB/TrainSet/data.mdb',
            valSetPath = '../../PitchRec_dataset/LMDB/Synthesized/data.mdb',
        }
        return config
    end

    function createModel(config)
        local nc = config.nClasses
        local nl = nc + 1
        local nt = config.maxT

        local ks = {3, 3, 3, 3, 3, 3, 2}
        local ps = {1, 1, 1, 1, 1, 1, 0}
        local ss = {1, 1, 1, 1, 1, 1, 1}
        local nm = {64, 128, 256, 256, 512, 512, 512}
        local nh = {256, 256}

        function convRelu(i, batchNormalization)
            batchNormalization = batchNormalization or false
            local nIn = nm[i-1] or 1
            local nOut = nm[i]
            local subModel = nn.Sequential()
            local conv = cudnn.SpatialConvolution(nIn, nOut, ks[i], ks[i], ss[i], ss[i], ps[i], ps[i])
            subModel:add(conv)
            if batchNormalization then
                subModel:add(nn.SpatialBatchNormalization(nOut))
            end
            subModel:add(cudnn.ReLU(true))
            return subModel
        end

        function bidirectionalLSTM(nIn, nHidden, nOut, maxT)
            local fwdLstm = nn.LstmLayer(nIn, nHidden, maxT, 0, false)
            local bwdLstm = nn.LstmLayer(nIn, nHidden, maxT, 0, true)
            local ct = nn.ConcatTable():add(fwdLstm):add(bwdLstm)
            local blstm = nn.Sequential():add(ct):add(nn.BiRnnJoin(nHidden, nOut, maxT))
            return blstm
        end

        -- model and criterion
        local model = nn.Sequential()
        model:add(nn.Copy('torch.ByteTensor', 'torch.CudaTensor', false, true))
        model:add(nn.AddConstant(-128.0))
        model:add(nn.MulConstant(1.0 / 128))
        model:add(convRelu(1))
        model:add(cudnn.SpatialMaxPooling(2, 2, 2, 2))       -- 64x16x50
        model:add(convRelu(2))
        model:add(cudnn.SpatialMaxPooling(2, 2, 2, 2))       -- 128x8x25
        model:add(convRelu(3, true))
        model:add(convRelu(4))
        model:add(cudnn.SpatialMaxPooling(2, 2, 1, 2, 1, 0)) -- 256x4x?
        model:add(convRelu(5, true))
        model:add(convRelu(6))
        model:add(cudnn.SpatialMaxPooling(2, 2, 1, 2, 1, 0)) -- 512x2x26
        model:add(convRelu(7, true))                         -- 512x1x26
        model:add(nn.View(512, -1):setNumInputDims(3))       -- 512x26
        model:add(nn.Transpose({2, 3}))                      -- 26x512
        model:add(nn.SplitTable(2, 3))
        model:add(bidirectionalLSTM(512, 256, 256, nt))
        model:add(bidirectionalLSTM(256, 256,  nl, nt))
        model:add(nn.SharedParallelTable(nn.LogSoftMax(), nt))
        model:add(nn.JoinTable(1, 1))
        model:add(nn.View(-1, nl):setNumInputDims(1))
        model:add(nn.Copy('torch.CudaTensor', 'torch.FloatTensor', false, true))
        model:cuda()
        local criterion = nn.CtcCriterion()

        return model, criterion
    end

    Then I tried removing the 4th and 6th conv layers, following your paper (I did not replace the BiLSTMs with a single LSTM, because I'm new to Torch), but it doesn't work.

    Thanks.

    opened by chengzhanzhan 10
  • Question for CTC decoding

    I'm not clear why the first position of the time sequence is always predicted as the first character of the given string, as in:

    (image: default)

    I think the first label for most given sequences should be a blank space, such as:

    (image: imgres)

    What do you think?

    opened by chengzhanzhan 8
  • Compilation failed, what was the environment for your program? Much thanks


    @bgshih Dear author

    As we are trying to reproduce your work, we found that the program is no longer compatible with the latest folly, fbthrift, thpp, and fblualib. The solution in #1 no longer works either, because that version is not compatible with the latest Ubuntu 16.04.

    If updating your program to the latest thpp, fblualib, folly and fbthrift requires too much effort, can you let us know what the environment for your program was? The versions of:

      • linux (ubuntu 14.04, 15.04, 16.04, etc.?)
      • torch (torch 7?)
      • folly (roughly which time period?)
      • fbthrift (roughly which time period?)
      • thpp (roughly which time period?)
      • fblualib (roughly which time period?)
      • cuda
      • cudnn

    Thank you

    opened by Suyuanhang 7
  • Make error


    CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
    Please set them or make sure they are set and tested correctly in the CMake files:
    THPP_LIBRARY
        linked by target "crnn" in directory /home/ce/Documents/crnn/src/cpp

    -- Configuring incomplete, errors occurred!
    See also "/home/ce/Documents/crnn/src/cpp/build/CMakeFiles/CMakeOutput.log".
    make: *** No targets specified and no makefile found. Stop.
    cp: cannot stat ‘*.so’: No such file or directory

    opened by rremani 7
  • Creating lmdb dataset in google colab


    Hi everyone, I followed the instructions on how to train a new model, but I didn't understand how to create my own database and run the following command. Can somebody explain this to me?

    python train.py --adadelta --trainRoot {train_path} --valRoot {val_path} --cuda

    1) What should I replace {train_path} and {val_path} with?
    2) Running th main_train.lua ../models/foo_model/ in Google Colab raises the error "/bin/bash: th: command not found". Can you help me?

    opened by iammobina 1
  • manifest unknown -> manifest for kaixhin/cuda-torch:latest not found: manifest unknown: manifest unknown


    Sending build context to Docker daemon 311.8kB
    Step 1/11 : FROM kaixhin/cuda-torch
    manifest for kaixhin/cuda-torch:latest not found: manifest unknown: manifest unknown

    opened by jhanvi22 0
  • Outdated docker


    Unfortunately the Docker image (kaixhin:cuda-torch) is outdated. I also didn't succeed in installing folly. Is there any recommendation, e.g. an alternative package? Or could you make the code independent of folly?

    opened by naarkhoo 0
  • Which dataset can I use for CRNN training?

    Hi, everyone:

    Could anybody tell me which dataset I can download for training this CRNN code? Thanks a lot! I need a CRNN dataset, not a CTPN detection dataset.

    opened by Yaoxingtian 2
  • Data Requirement


    I am planning to train my own data which consists of handwritten numbers. I have been through the paper, and CRNN is trained on 8 million images. I wanted to get an idea of the amount of data that I will require to train a CRNN for handwritten digits.

    opened by ApurvaDani 0
Owner
Baoguang Shi
Researcher at Microsoft
Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 224 Jan 7, 2023
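[python3.6] Natural scene text detection implemented with TensorFlow, and variable-length scene text OCR recognition implemented with CTPN+CRNN+CTC in Keras/PyTorch

This project uses tensorflow and keras/pytorch to implement text detection in natural scenes and end-to-end Chinese OCR text recognition. update20190706: To address the accuracy of math formula prediction in this project, other improvements and attempts were made, with fairly good results: https://github.com/xiaofengShi/Image2Katex ...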

xiaofeng 2.7k Dec 25, 2022
This is the implementation of the paper "Gated Recurrent Convolution Neural Network for OCR"

Gated Recurrent Convolution Neural Network for OCR This project is an implementation of the GRCNN for OCR. For details, please refer to the paper: htt

null 90 Dec 22, 2022
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 496 Jan 5, 2023
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

null 27 Jan 8, 2023
A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database.

A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database. The structure, shape and proportions of the faces are compared during the face recognition steps.

Pavankumar Khot 4 Mar 19, 2022
A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

null 2.4k Jan 8, 2023
Text recognition (optical character recognition) with deep learning methods.

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | paper | training and evaluation data | failure cases and cle

Clova AI Research 3.2k Jan 4, 2023
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

Martin Lønne 1 Jan 8, 2022
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
EQFace: An implementation of EQFace: A Simple Explicit Quality Network for Face Recognition

EQFace: A Simple Explicit Quality Network for Face Recognition The first face recognition network that generates explicit face quality online.

DeepCam Shenzhen 141 Dec 31, 2022
This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

Amazon Web Services - Labs 422 Jan 3, 2023
Table recognition inside documents using neural networks

TableTrainNet A simple project for training and testing table recognition in documents. This project was developed to make a neural network which reco

Giovanni Cavallin 93 Jul 24, 2022
Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table This project aims to extract tables from scanned image PDFs using Optical Character Recognition. Install Requirements Tesseract OCR sudo apt

Abhijeet Singh 209 Dec 6, 2022
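A general list of resources to image text localization and recognition / A collection of paper resources and implementations for scene text localization and recognition / A summary of paper resources for scene text localization and recognition

Scene Text Localization & Recognition Resources Read this institute-wise: English, Simplified Chinese. Read this year-wise: English, Simplified Chinese. Tags: [STL] (Scene Text L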

Karl Lok (Zhaokai Luo) 901 Dec 11, 2022
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

RepMLP RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition Released the code of RepMLP together with an example o

null 260 Jan 3, 2023
Isearch (OSINT) 🔎 Face recognition reverse image search on Instagram profile feed photos.

isearch is an OSINT tool on Instagram. Offers a face recognition reverse image search on Instagram profile feed photos.

Malek salem 20 Oct 25, 2022
Image Recognition Model Generator

Takes a user-inputted query and generates a machine learning image recognition model that determines if an inputted image is or isn't their query

Christopher Oka 1 Jan 13, 2022
Scene text detection and recognition based on Extremal Region(ER)

Scene text recognition A real-time scene text recognition algorithm. Our system is able to recognize text in unconstrain background. This algorithm is

HSIEH, YI CHIA 155 Dec 6, 2022