Papers, Datasets, Algorithms, SOTA for STR. Long-time Maintaining

Scene Text Recognition Recommendations

Everything about Scene Text Recognition

SOTA • Papers • Datasets • Code

1. Papers
2. Datasets
- 2.1 Synthetic Datasets
- 2.2 Benchmarks
3. Public Code
- 3.1 Frameworks
- 3.2 Algorithms
4. SOTA

1. Papers

All papers can be found here

Latest Papers:

up to (2021-12-8)

arXiv-2021/12/1: Visual-Semantic Transformer for Scene Text Recognition

up to (2021-12-3)

arXiv-2021/11/30: Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features
- Introduces a language model; performance is on par with ABINet
arXiv-2021/11/24: Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
- Proposed jointly by HUST and Alibaba; decouples visual and semantic features to address the vocabulary reliance problem
arXiv-2021/11/22: CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

up to (2021-11-25)

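ICCV-2021 Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
- Multi-stage decoder plus a Transformer-based recognizer
ICCV-2021 From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
- Proposes a new occluded scene text dataset
- Integrates language modeling into the vision model in a weakly supervised way
ICCV-2021 Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
- Uses knowledge distillation to merge a scene text recognition network and a handwriting recognition network into a single network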

2. Datasets

2.1 Synthetic Datasets

Dataset	Description	Examples	BaiduNetdisk link
SynthText	9 million synthetic text instance images from a set of 90k common English words. Words are rendered onto natural images with random transformations		Scene text datasets (extraction code: emco)
MJSynth	6 million synthetic text instances. It's a generation of SynthText.		Scene text datasets (extraction code: emco)

2.2 Benchmarks

Dataset	Description	BaiduNetdisk link
IIIT5k-Words(IIIT5K)	3000 test image instances. Taken from street scenes and from originally digital images	Scene text datasets (extraction code: emco)
Street View Text(SVT)	647 test image instances. Some images are severely corrupted by noise, blur, and low resolution	Scene text datasets (extraction code: emco)
StreetViewText-Perspective(SVT-P)	639 test image instances. It is specifically designed to evaluate perspective-distorted text recognition. It is built from the original SVT dataset by selecting images at the same address on Google Street View but with different view angles. Therefore, most text instances are heavily distorted by the non-frontal view angle.	Scene text datasets (extraction code: emco)
ICDAR 2003(IC03)	867 test image instances	Scene text datasets (extraction code: mfir)
ICDAR 2013(IC13)	1015 test image instances	Scene text datasets (extraction code: emco)
ICDAR 2015(IC15)	2077 test image instances. As the text images were taken with Google Glass without ensuring image quality, most of the text is very small, blurred, and multi-oriented	Scene text datasets (extraction code: emco)
CUTE80(CUTE)	288 test image instances. It focuses on curved text recognition. Most images in CUTE have a complex background, perspective distortion, and poor resolution	Scene text datasets (extraction code: emco)

3. Public Code

3.1 Frameworks

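PaddleOCR (Baidu)

PaddlePaddle/PaddleOCR
Features (excerpted from PaddleOCR):
- Built on PaddlePaddle, Baidu's in-house deep learning framework
- PP-OCR series of high-quality pre-trained models with accurate recognition
  - Ultra-lightweight PP-OCRv2 series: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
  - Ultra-lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
  - General PPOCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
  - Supports recognition of mixed Chinese, English, and digits, as well as vertical text and long text
  - Supports multilingual recognition: Korean, Japanese, German, French
- Rich, easy-to-use OCR tool components
  - Semi-automatic annotation tool PPOCRLabel: fast and efficient data labeling
  - Data synthesis tool Style-Text: batch synthesis of large numbers of images similar to the target scene
  - Document analysis with PP-Structure: layout analysis and table recognition
- Supports custom training and provides a rich set of inference and deployment options
- Supports quick installation via pip
- Runs on Linux, Windows, macOS, and other systems
Supported algorithms (recognition):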
- CRNN
- Rosetta
- STAR-Net
- RARE
- SRN
- NRTR

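MMOCR (SenseTime)

open-mmlab/mmocr
Features (excerpted from MMOCR):
- MMOCR is an open-source toolbox based on PyTorch and mmdetection, focused on text detection, text recognition, and the corresponding downstream tasks such as key information extraction. It is part of the OpenMMLab project.
- The toolbox supports not only text detection and text recognition but also their downstream tasks, such as key information extraction.
Supported algorithms (recognition):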
- CRNN (TPAMI'2016)
- NRTR (ICDAR'2019)
- RobustScanner (ECCV'2020)
- SAR (AAAI'2019)
- SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
- SegOCR (Manuscript'2021)

Deep Text Recognition Benchmark (ClovaAI)

clovaai/deep-text-recognition-benchmark
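Features:
- Official PyTorch implementation of What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
- The four stage components are customizable, so models such as CRNN and ASTER can be assembled from them
- Easy to get started with; recommended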
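The four customizable stages can be pictured as a small configuration object. The sketch below is illustrative only: the class `FourStageConfig` and the constants are hypothetical names, and the stage options and `--Transformation`/`--FeatureExtraction`/`--SequenceModeling`/`--Prediction` flags reflect my understanding of the benchmark's `train.py`, so check them against the repository before relying on them. The mapping of CRNN and an ASTER-like model to stage choices follows the module combinations described in the benchmark paper and is an approximation, not the original architectures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FourStageConfig:
    """Hypothetical container for one choice per stage."""
    transformation: str      # assumed options: "None" or "TPS"
    feature_extraction: str  # assumed options: "VGG", "RCNN", or "ResNet"
    sequence_modeling: str   # assumed options: "None" or "BiLSTM"
    prediction: str          # assumed options: "CTC" or "Attn"

    def cli_args(self) -> List[str]:
        # Flag names are assumed to match the benchmark's train.py.
        return [
            "--Transformation", self.transformation,
            "--FeatureExtraction", self.feature_extraction,
            "--SequenceModeling", self.sequence_modeling,
            "--Prediction", self.prediction,
        ]


# Approximate stage choices for two well-known models.
CRNN = FourStageConfig("None", "VGG", "BiLSTM", "CTC")
ASTER_LIKE = FourStageConfig("TPS", "ResNet", "BiLSTM", "Attn")

if __name__ == "__main__":
    print("python3 train.py " + " ".join(ASTER_LIKE.cli_args()))
```

Swapping a single field, for example `prediction="CTC"`, gives a different model from the same building blocks, which is the main appeal of the four-stage design.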

3.2. Algorithms

CRNN

Lua, official, 1.9k ⭐: bgshih/crnn
- Official implementation, written in Lua
PyTorch, 1.9k ⭐: meijeru/crnn.pytorch
- Recommended 🀄
TensorFlow, 972 ⭐: MaybeShewill-CV/CRNN_Tensorflow
PyTorch, 1.4k ⭐: Sierkinhance/CRNN_Chinese_Characters_Rec
- A CRNN version for Chinese text recognition

ASTER

TensorFlow, official, 651 ⭐: bgshih/aster
- Official implementation, written in TensorFlow
PyTorch, 535 ⭐: ayumuymk/aster.pytorch
- PyTorch version; accuracy is noticeably higher than reported in the original paper

MORANv2

PyTorch, official, 572 ⭐: Canjie-Luo/MORAN_v2
- MORAN v2: more stable single-stage training, ResNet as the backbone, and a bidirectional decoder

4. SOTA

		Regular Dataset				Irregular Dataset
Model	Year	IIIT	SVT	IC13(857)	IC13(1015)	IC15(1811)	IC15(2077)	SVTP	CUTE
CRNN	2015	78.2	80.8	-	86.7	-	-	-	-
ASTER(L2R)	2018	92.67	91.16	-	90.74	76.1	-	78.76	76.39
CombBest	2019	87.9	87.5	93.6	92.3	77.6	71.8	79.2	74
ESIR	2019	93.3	90.2	-	91.3	-	76.9	79.6	83.3
SE-ASTER	2020	93.8	89.6	-	92.8	80	-	81.4	83.6
DAN	2020	94.3	89.2	-	93.9	-	74.5	80	84.4
RobustScanner	2020	95.3	88.1	-	94.8	-	77.1	79.5	90.3
AutoSTR	2020	94.7	90.9	-	94.2	81.8	-	81.7	-
Yang et al.	2020	94.7	88.9	-	93.2	79.5	77.1	80.9	85.4
SATRN	2020	92.8	91.3	-	94.1	-	79	86.5	87.8
SRN	2020	94.8	91.5	95.5	-	82.7	-	85.1	87.8
GA-SPIN	2021	95.2	90.9	-	94.8	82.8	79.5	83.2	87.5
PREN2D	2021	95.6	94	96.4	-	83	-	87.6	91.7
Bhunia et al.	2021	95.2	92.2	-	95.5	-	84	85.7	89.7
VisionLAN	2021	95.8	91.7	95.7	-	83.7	-	86	88.5
ABINet	2021	96.2	93.5	97.4	-	86.0	-	89.3	89.2
MATRN	2021	96.7	94.9	97.9	95.8	86.6	82.9	90.5	94.1
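Because the benchmarks differ a lot in size, a single summary number per model is often computed as an average weighted by test-set size. The sketch below assumes the table values are word accuracies in percent and uses the test-set sizes listed in this document; the function name `weighted_average` and the choice of columns are illustrative, not part of the repository.

```python
# Test-set sizes as listed in this document (benchmark descriptions and table header).
TEST_SET_SIZES = {
    "IIIT": 3000, "SVT": 647, "IC13(857)": 857,
    "IC15(1811)": 1811, "SVTP": 639, "CUTE": 288,
}

# Accuracy values copied from the ABINet and MATRN rows of the table above.
ABINET = {"IIIT": 96.2, "SVT": 93.5, "IC13(857)": 97.4,
          "IC15(1811)": 86.0, "SVTP": 89.3, "CUTE": 89.2}
MATRN = {"IIIT": 96.7, "SVT": 94.9, "IC13(857)": 97.9,
         "IC15(1811)": 86.6, "SVTP": 90.5, "CUTE": 94.1}


def weighted_average(accuracies, sizes):
    """Average accuracies weighted by test-set size, over shared benchmarks only."""
    shared = [name for name in accuracies if name in sizes]
    if not shared:
        raise ValueError("no benchmark in common between accuracies and sizes")
    total = sum(sizes[name] for name in shared)
    return sum(accuracies[name] * sizes[name] for name in shared) / total


if __name__ == "__main__":
    for model, row in (("ABINet", ABINET), ("MATRN", MATRN)):
        print(f"{model}: {weighted_average(row, TEST_SET_SIZES):.2f}")
```

Rows with "-" entries cannot be compared this way unless the missing benchmarks are dropped for every model being compared.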

Baek's Reimplementation Version

Comments

can't convert cuda:0 device type tensor to numpy

Traceback (most recent call last):
  File "main.py", line 228, in <module>
    main(args)
  File "main.py", line 185, in main
    evaluator.evaluate(test_loader, step=0, tfLogger=eval_tfLogger)
  File "D:\trainocr\ocr\Scene-Text-Recognition-Recommendations\Framework\lib\evaluators.py", line 70, in evaluate
    losses = np.sum(losses) / (1.0 * (len(data_loader)-1)*batch_size)
  File "<__array_function__ internals>", line 180, in sum
  File "C:\secsys\snake\lib\site-packages\numpy\core\fromnumeric.py", line 2296, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "C:\secsys\snake\lib\site-packages\numpy\core\fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
  File "C:\secsys\snake\lib\site-packages\torch\_tensor.py", line 643, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

opened by centurions 4
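The error message itself points at the likely fix: the per-batch losses are still CUDA tensors when `np.sum` is called, so they need to be copied to host memory first. A minimal sketch of one way to do that is below. It assumes each entry in `losses` is a scalar loss tensor (or already a plain number); `mean_loss` is a hypothetical helper, not a function from the repository, and it uses duck typing so it runs even without PyTorch installed.

```python
def mean_loss(losses, num_samples):
    """Average per-batch losses that may be tensors or plain numbers.

    Assumes every tensor-like entry holds exactly one value.
    """
    total = 0.0
    for loss in losses:
        if hasattr(loss, "detach"):
            # Tensor-like: drop autograd tracking, copy to the CPU,
            # then read the single value out as a Python float.
            loss = loss.detach().cpu().item()
        total += float(loss)
    return total / num_samples


if __name__ == "__main__":
    # Plain floats stand in for tensors here.
    print(mean_loss([0.5, 1.5], num_samples=2))
```

In the failing line, `num_samples` would correspond to the existing denominator `(len(data_loader)-1)*batch_size`.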
lib dependencies

Hi, when I try to run inference.sh, I get stuck on the following error:

File "HCIILAB\lib\evaluation_metrics\metrics.py", line 9, in <module>
    import editdistance
ModuleNotFoundError: No module named 'editdistance'

I guess I can simply remove the line "import editdistance".

After rerunning the script, I got the following error:

File "HCIILAB\lib\models\model_builder_CTC.py", line 3, in <module>
    import einops
ModuleNotFoundError: No module named 'einops'

Which library dependencies does this project need besides PyTorch? Alternatively, could you please add a requirements.txt file?

Thanks in advance
opened by Ao-Lee 2
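Until a requirements file exists, a quick check like the one below can report which packages are missing before a long run fails partway. It is only a sketch: the list contains just the three modules named in this thread (`torch`, `editdistance`, `einops`), and the project may well need others.

```python
import importlib.util

# Modules mentioned in this issue; the real dependency list may be longer.
REQUIRED = ["torch", "editdistance", "einops"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Installing the missing packages is generally safer than deleting the `import` lines, since the imported modules are presumably used later in the code.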
I have a problem running your project

Hi, I have a problem while running the project. Can you help me? This is my error:

pt/anaconda3/bin/python "/Users/ghasempirani/Downloads/Scene-Text-Recognition-Recommendations-main 2/Framework/main.py"
Traceback (most recent call last):
  File "/Users/ghasempirani/Downloads/Scene-Text-Recognition-Recommendations-main 2/Framework/main.py", line 22, in <module>
    from lib.models.model_builder_Attention import ModelBuilder_Att
  File "/Users/ghasempirani/Downloads/Scene-Text-Recognition-Recommendations-main 2/Framework/lib/models/model_builder_Attention.py", line 6, in <module>
    from .decoder.attention_recognition_head import AttentionRecognitionHead
  File "/Users/ghasempirani/Downloads/Scene-Text-Recognition-Recommendations-main 2/Framework/lib/models/decoder/attention_recognition_head.py", line 10, in <module>
    device =torch.device('cuda' if torch.cuda.is_available() else 'gpu')
RuntimeError: Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mlc, xla, lazy, vulkan, meta, hpu device type at start of device string: gpu
(base) ghasempirani@Ghasems-MacBook-Air Scene-Text-Recognition-Recommendations-main 2 %

opened by duzliBlrog 1
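The traceback shows the cause: when CUDA is unavailable (as on this MacBook Air), the code falls back to the device string 'gpu', which is not in the list of device types PyTorch accepts. The fallback should be 'cpu'. A minimal sketch follows; `pick_device_name` is a hypothetical helper kept free of any PyTorch import so it can be run anywhere.

```python
def pick_device_name(cuda_available: bool) -> str:
    """Return a device string that torch.device() accepts."""
    # "gpu" is not a valid PyTorch device type; the fallback must be "cpu".
    return "cuda" if cuda_available else "cpu"


# With PyTorch installed, line 10 of attention_recognition_head.py could read:
#   device = torch.device(pick_device_name(torch.cuda.is_available()))
# or, equivalently, without the helper:
#   device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

if __name__ == "__main__":
    print(pick_device_name(False))
```

Other files in the project may contain the same 'gpu' fallback, so it is worth searching for it before rerunning.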


Owner

Deep Learning and Vision Computing Lab, SCUT
