Simple and understandable swin-transformer OCR project

Ha YongWook

Last update: Dec 31, 2022

Related tags

Deep Learning ocr deep-learning pytorch transformer text-recognition pytorch-lightning swin-transformer

Overview

swin-transformer-ocr

ocr with swin-transformer

Overview

Simple and understandable swin-transformer OCR project. The model in this repository heavily relied on high-level open-source projects like timm and x_transformers. And also you can find that the procedure of training is intuitive thanks to the legibility of pytorch-lightning.

The model in this repository encodes input image to context vector with 'shifted-window` which is a swin-transformer encoding mechanism. And it decodes the vector with a normal auto-regressive transformer.

If you are not familiar with transformer OCR structure, transformer-ocr would be easier to understand because it uses a traditional convolution network (ResNet-v2) for the encoder.

Performance

With private korean handwritten text dataset, the accuracy(exact match) is 97.6%.

Data

./dataset/
├─ preprocessed_image/
│  ├─ cropped_image_0.jpg
│  ├─ cropped_image_1.jpg
│  ├─ ...
├─ train.txt
└─ val.txt

# in train.txt
cropped_image_0.jpg\tHello World.
cropped_image_1.jpg\tvision-transformer-ocr
...

You should preprocess the data first. Crop the image by word or sentence level area. Put all image data in a specific directory. Ground truth information should be provided with a txt file. In the txt file, write the image file name and label with \t separator in the same line.

Configuration

In settings/ directory, you can find default.yaml. You can set almost every hyper-parameter in that file. Copy one and edit it as your experiment version. I recommend you to run with the default setting first, before you change it.

Train

python run.py --version 0 --setting settings/default.yaml --num_workers 16 --batch_size 128

You can check your training log with tensorboard.

tensorboard --log_dir tb_logs --bind_all

Predict

When your model finishes training, you can use your model for prediction.

python predict.py --setting <your_setting.yaml> --target <image_or_directory> --tokenizer <your_tokenizer_pkl> --checkpoint <saved_checkpoint>

Exporting to ONNX

You can export your model to ONNX format. It's very easy thanks to pytorch-lightning. See the related pytorch-lightning document.

Citations

@misc{liu-2021,
    title   = {Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
	author  = {Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
	year    = {2021},
    eprint  = {2103.14030},
	archivePrefix = {arXiv}
}

Comments

ONNX deployment

swin-transformer를 ONNX로 export 할 때,

TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! RuntimeError: Exporting the operator roll to ONNX opset version 9 is not supported. Please open a bug to request ONNX export support for the missing operator.

warining과 함께 에러가 나는데, 혹시 ONNX로 export할 때 문제가 없으셨나요?

opened by hjdata11 4
About the optimizer and scheduler

There seems to be no code call the function "CustomCosineAnnealingWarmupRestarts"? I wonder if the optimizer and scheduler you used are AdamW and Cosine Annealing? Thanks!

opened by Pay20Y 4
Tokenization strategy

Hi and thanks for the awesome repo. Did you try any other tokenization strategies (sentencepiece, wordpiece or bpe). I see you use a character level tokenization which is nice but probably dosen't make full use of the language model capabilities. I would love to get some insights.

opened by IlyasMoutawwakil 2
[Help] Val loss increasing while Train loss decreasing

Hello,

I am trying to train the model on a generated dataset (~10k images). I am following the same configuration as in the repo but what I see is that the model learns by the train samples but doesn't generalize. The images are from the same domain so I can't imagine why this would be happening.

I am following same augmentations as in the repo. I am thinking maybe the learning rate is too big and the model learns the most common label, could that be the case?

How long should I leave the model to train? I was expecting to see healthy learning (val loss < train loss) from the beginning. I have left the model to train for longer and I see the same results.

Any help would be appreciated, thanks.

opened by tudor-mare 2
Infrence time, number of parameters and results.

Hi and thank you very much for this repo. I would really appreciate it if you can tell me about how much time this model takes in inference, how much params it contains and if you compared it to other models, which ones did it surpass.

opened by IlyasMoutawwakil 1

Owner

Ha YongWook

On my way up to the shoulders of giants.

GitHub

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

1.4k Dec 30, 2022

Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

Swin-Transformer-Tensorflow A direct translation of the official PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Sh

52 Dec 29, 2022

This project aims to explore the deployment of Swin-Transformer based on TensorRT, including the test results of FP16 and INT8.

Swin Transformer This project aims to explore the deployment of SwinTransformer based on TensorRT, including the test results of FP16 and INT8. Introd

87 Dec 21, 2022

一个多语言支持、易使用的 OCR 项目。An easy-to-use OCR project with multilingual support.

AgentOCR 简介 AgentOCR 是一个基于 PaddleOCR 和 ONNXRuntime 项目开发的一个使用简单、调用方便的 OCR 项目本项目目前包含 Python Package 【AgentOCR】和 OCR 标注软件【AgentOCRLabeling】使用指南 Pytho

98 Nov 10, 2022

Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Updates (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training. Pyr

1.3k Jan 4, 2023

Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution [arXiv 2021].

122 Dec 12, 2022

Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 3, 2023

Tensorflow implementation of Swin Transformer model.

Swin Transformer (Tensorflow) Tensorflow reimplementation of Swin Transformer model. Based on Official Pytorch implementation. Requirements tensorflow

167 Jan 8, 2023

The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"

Swin-Unet The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"(https://arxiv.org/abs/2105.05537). A validatio

869 Jan 7, 2023

SwinIR: Image Restoration Using Swin Transformer

SwinIR: Image Restoration Using Swin Transformer This repository is the official PyTorch implementation of SwinIR: Image Restoration Using Shifted Win

2.4k Jan 8, 2023

Image Restoration Using Swin Transformer for VapourSynth

SwinIR SwinIR function for VapourSynth, based on https://github.com/JingyunLiang/SwinIR. Dependencies NumPy PyTorch, preferably with CUDA. Note that t

11 Jun 19, 2022

This repository contains a CBIR system that uses swin transformer to extract image's feature.

Swin-transformer based CBIR This repository contains a CBIR(content-based image retrieval) system. Here we use Swin-transformer to extract query image

12 Nov 17, 2022

Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis

Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis [Paper] [Online Demo] The following results are obtained by our SCUNet with purely syn

312 Jan 7, 2023

This is an official implementation for "Video Swin Transformers".

Video Swin Transformer By Ze Liu*, Jia Ning*, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin and Han Hu. This repo is the official implementation of "V

981 Jan 3, 2023

This is an official implementation for "Self-Supervised Learning with Swin Transformers".

Self-Supervised Learning with Vision Transformers By Zhenda Xie*, Yutong Lin*, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao and Han Hu This repo is the

529 Jan 2, 2023

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

225 Nov 13, 2022

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

12.6k Jan 9, 2023

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

Episodic Transformers (E.T.) Episodic Transformer for Vision-and-Language Navigation Alexander Pashevich, Cordelia Schmid, Chen Sun Episodic Transform

62 Dec 24, 2022

OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages

OCR-Streamlit-App OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages OCR app gets an image a

5 Apr 5, 2022

Simple and understandable swin-transformer OCR project

Related tags

Overview

swin-transformer-ocr

Overview

Performance

Data

Configuration

Train

Predict

Exporting to ONNX

Citations

Comments

ONNX deployment

About the optimizer and scheduler

Tokenization strategy

[Help] Val loss increasing while Train loss decreasing

Infrence time, number of parameters and results.

Owner

Ha YongWook

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

This project aims to explore the deployment of Swin-Transformer based on TensorRT, including the test results of FP16 and INT8.

一个多语言支持、易使用的 OCR 项目。An easy-to-use OCR project with multilingual support.

Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

Implementation of the Swin Transformer in PyTorch.

Tensorflow implementation of Swin Transformer model.

The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"

SwinIR: Image Restoration Using Swin Transformer

Image Restoration Using Swin Transformer for VapourSynth

This repository contains a CBIR system that uses swin transformer to extract image's feature.

Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis

This is an official implementation for "Video Swin Transformers".

This is an official implementation for "Self-Supervised Learning with Swin Transformers".

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages