Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at the ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Overview

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop)

(STRAug is pronounced "strog".)

Why does it matter?

Scene Text Recognition (STR) requires data augmentation functions that differ from those used for object recognition. STRAug provides data augmentation designed specifically for STR. It offers 36 data augmentation functions sorted into 8 groups. Each function supports 3 levels of magnitude (severity).

A source image can be transformed as follows:

  1. warp.py - to generate Curve, Distort, Stretch (or Elastic) deformations
  2. geometry.py - to generate Perspective, Rotation, Shrink deformations
  3. pattern.py - to create different grids: Grid, VGrid, HGrid, RectGrid, EllipseGrid
  4. blur.py - to generate synthetic blur: GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur
  5. noise.py - to add noise: GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise
  6. weather.py - to simulate certain weather conditions: Fog, Snow, Frost, Rain, Shadow
  7. camera.py - to simulate camera sensor tuning and image compression/resizing: Contrast, Brightness, JpegCompression, Pixelate
  8. process.py - all other image processing issues: Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color
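
The grouping above can also be written down as a plain mapping, which makes the "36 functions in 8 groups" count easy to check. This is only an illustrative sketch: `STRAUG_GROUPS` and `count_functions` are hypothetical names, and the entries are the labels used in the list above, not verified class names (for instance, the code quoted in the comments further down imports `Rotate` rather than `Rotation`).

```python
# Group name -> function labels, copied from the list above.
# These are the README's labels, not verified straug class names.
STRAUG_GROUPS = {
    "warp": ["Curve", "Distort", "Stretch"],
    "geometry": ["Perspective", "Rotation", "Shrink"],
    "pattern": ["Grid", "VGrid", "HGrid", "RectGrid", "EllipseGrid"],
    "blur": ["GaussianBlur", "DefocusBlur", "MotionBlur", "GlassBlur", "ZoomBlur"],
    "noise": ["GaussianNoise", "ShotNoise", "ImpulseNoise", "SpeckleNoise"],
    "weather": ["Fog", "Snow", "Frost", "Rain", "Shadow"],
    "camera": ["Contrast", "Brightness", "JpegCompression", "Pixelate"],
    "process": ["Posterize", "Solarize", "Invert", "Equalize",
                "AutoContrast", "Sharpness", "Color"],
}


def count_functions(groups):
    """Return (number of groups, total number of functions)."""
    return len(groups), sum(len(names) for names in groups.values())


print(count_functions(STRAUG_GROUPS))  # (8, 36)
```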

Pip install

pip3 install straug

How to use

Python interactive session (e.g., the input image is nokia.png):

>>> from straug.warp import Curve
>>> from PIL import Image
>>> img = Image.open("nokia.png")
>>> img = Curve()(img, mag=3)
>>> img.save("curved_nokia.png")

Python script (see test.py):

python3 test.py --image=<target image>

For example:

python3 test.py --image=images/telekom.png

The corrupted images are saved in the results directory.

Reference

  • Image corruptions (e.g., blur, noise, camera effects, fog, frost) are based on the work of Hendrycks et al.

Citation

If you find this work useful, please cite:

@inproceedings{atienza2021data,
  title={Data Augmentation for Scene Text Recognition},
  author={Atienza, Rowel},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021},
  pubstate={published},
  tppubtype={inproceedings}
}
Comments
  • How to deal with underfitting?

    Hello, I am a new researcher and recently came across your code, which has been very useful for my problem. My project is also scene text recognition, but the dataset is much more irregular. Your paper gave me constructive guidance. However, there is still a problem: when N (the number of functions in each channel) becomes larger, say 3 or 4, the model hardly fits. The training-set accuracy always stays around 90%. Furthermore, if I add a random-cut preprocessing step, the accuracy stays around 80%. Could you give me some suggestions for dealing with such problems? Thanks.

    opened by ILoveU3D 1
  • Gaussian blur kernel size for small images

    Currently, the kernel size is fixed to 31x31 (https://github.com/roatienza/straug/blob/43f9ca994fb9d9e3ed379de646bbf194192101f7/blur.py#L38)

    This causes an error internally in the call to reflection_pad2d(): RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (15, 15) at dimension 3 of input 4

    if one of the image's dimensions is less than the kernel size.

    Should the kernel size be a percentage of the image's dimensions instead, e.g. 30-50% of the smaller dimension?

    opened by baudm 1
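
The constraint behind this error can be sketched in a few lines. Reflection padding needs the padding (half the kernel size, rounded down) to be strictly smaller than the image dimension, so one possible workaround is to clamp the kernel to the largest odd size that still fits. Note that this clamps to a fixed ceiling rather than using the percentage rule suggested in the issue, and `safe_kernel_size` is a hypothetical helper, not part of straug.

```python
def safe_kernel_size(width, height, max_kernel=31):
    """Largest odd kernel size <= max_kernel whose reflection padding fits.

    Reflection padding of size k // 2 must be strictly smaller than the
    smallest image dimension, so k is capped at 2 * min(width, height) - 1.
    """
    if width < 1 or height < 1:
        raise ValueError("image dimensions must be positive")
    smallest = min(width, height)
    limit = 2 * smallest - 1          # guarantees k // 2 < smallest
    k = min(max_kernel, limit)
    if k % 2 == 0:                    # keep the kernel size odd
        k -= 1
    return max(k, 1)


print(safe_kernel_size(100, 32))  # 31 (image is large enough, no change)
print(safe_kernel_size(100, 12))  # 23 (padding 11 < 12)
```

A smaller kernel also changes how strong the blur looks, so whether clamping or scaling is preferable depends on the dataset.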
  • RandAugment

    Hi, thanks for your work.

    Have you implemented RandAugment?

    In your paper:

        geometry = [Rotate(), Perspective(), Shrink()]
        noise = [GaussianNoise()]
        blur = [MotionBlur()]
        augmentations = [geometry, noise, blur]
        img = RandAugment(img, augmentations, N=3)

    opened by raminrahimi6970 0
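
As a rough illustration of what a `RandAugment`-style helper could look like, the sketch below picks N distinct groups, chooses one function from each, and applies it at a random magnitude. This is one plausible reading of the snippet quoted in the issue, not the author's implementation; `rand_augment`, its parameters, and the stand-in operations are all hypothetical. The stand-ins only record which operation was applied, so the example runs without straug or an image.

```python
import random


def rand_augment(img, groups, n=3, magnitudes=(0, 1, 2), rng=None):
    """Apply one randomly chosen op from each of n randomly chosen groups.

    Each op is assumed to be callable as op(img, mag=...), mirroring the
    Curve()(img, mag=3) call style shown in "How to use".
    """
    rng = rng or random.Random()
    n = min(n, len(groups))
    for group in rng.sample(groups, n):   # n distinct groups
        op = rng.choice(group)            # one function per group
        img = op(img, mag=rng.choice(magnitudes))
    return img


def make_stub(name):
    """Stand-in op that appends (name, mag) to a list instead of editing pixels."""
    def op(img, mag=0):
        return img + [(name, mag)]
    return op


geometry = [make_stub("Rotate"), make_stub("Perspective"), make_stub("Shrink")]
noise = [make_stub("GaussianNoise")]
blur = [make_stub("MotionBlur")]

applied = rand_augment([], [geometry, noise, blur], n=3, rng=random.Random(0))
print(len(applied))  # 3
```

With straug installed, the stubs could be replaced by real instances such as `Curve()`, assuming they accept the same `mag` keyword.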
  • Questions about paper

    First of all, thank you for your great work. I read the paper and have two questions. Why is there no CRNN line in Figure 13? And what is the N corresponding to Table 5? Is it the best result from a grid search?

    opened by hujichn 2
  • (Request) Add a parameter for borderValue using in warp (TPS transforms)

    As per the title, it would be much more convenient if you provided a separate parameter for the borderValue used in the cv2 warpImage method. The default value is 0, i.e., a black fill color.

    opened by huyfam 0
  • question about training speed

    Thanks for your excellent work! Training seems very slow when I use straug (about 6x slower than without it). What speed do you see in your own tests? The following is my augmentation code.

    import random

    import numpy as np
    from PIL import Image


    class RecStraugRandAug(object):
        """Apply `num_aug` randomly chosen STRAug functions to data["image"]."""

        def __init__(self, num_aug=2, prob=0.5, **kwargs):
            super().__init__()
            self.num_aug = num_aug
            self.prob = prob
            try:
                from straug.blur import GaussianBlur, DefocusBlur, MotionBlur, GlassBlur
                from straug.camera import Contrast, Brightness, JpegCompression, Pixelate
                from straug.geometry import Perspective, Shrink, Rotate
                from straug.noise import GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise
                from straug.pattern import Grid, VGrid, HGrid, RectGrid, EllipseGrid
                from straug.process import Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color
                from straug.warp import Stretch, Distort, Curve
                from straug.weather import Fog, Snow, Frost, Rain, Shadow
            except ImportError as ex:
                # Fail with a clear message instead of print + exit(-1).
                raise ImportError("straug is required; install it with `pip install straug`") from ex
            # One inner list per STRAug group (ZoomBlur is not included here).
            self.augs = [
                [GaussianBlur(), DefocusBlur(), MotionBlur(), GlassBlur()],
                [Contrast(), Brightness(), JpegCompression(), Pixelate()],
                [Perspective(), Shrink(), Rotate()],
                [GaussianNoise(), ShotNoise(), ImpulseNoise(), SpeckleNoise()],
                [Grid(), VGrid(), HGrid(), RectGrid(), EllipseGrid()],
                [Posterize(), Solarize(), Invert(), Equalize(), AutoContrast(), Sharpness(), Color()],
                [Stretch(), Distort(), Curve()],
                [Fog(), Snow(), Frost(), Rain(), Shadow()],
            ]

        def __call__(self, data):
            img = Image.fromarray(data["image"])
            for _ in range(self.num_aug):
                # Pick a random group, then a random function within that group.
                aug_type_idx = np.random.randint(0, len(self.augs))
                aug_idx = np.random.randint(0, len(self.augs[aug_type_idx]))
                # mag is drawn from {-1, 0, 1, 2}, as in the original snippet.
                img = self.augs[aug_type_idx][aug_idx](img, mag=random.randint(-1, 2), prob=self.prob)
            data["image"] = np.array(img)
            return data
    
    opened by littletomatodonkey 0
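
One way to investigate a slowdown like this is to time each augmentation separately and see which ones dominate. The helper below is a minimal, hypothetical sketch (`time_ops` is not part of straug); it uses cheap stand-in callables so that it runs on its own.

```python
import time


def time_ops(ops, img, repeats=5):
    """Return {name: mean seconds per call}, slowest first.

    `ops` maps a label to a callable that takes the image as its only argument.
    """
    timings = {}
    for name, op in ops.items():
        start = time.perf_counter()
        for _ in range(repeats):
            op(img)
        timings[name] = (time.perf_counter() - start) / repeats
    return dict(sorted(timings.items(), key=lambda kv: kv[1], reverse=True))


# Stand-in operations; real augmentations would go here instead.
ops = {
    "identity": lambda img: img,
    "busy_loop": lambda img: sum(range(20000)),
}

for name, seconds in time_ops(ops, img=None).items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

With straug installed, an entry could wrap a real function, for example `lambda im: Curve()(im, mag=1)`, following the call style shown in "How to use". Functions that turn out to be slow can then be dropped from the sampling list or applied less often.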
Owner
Rowel Atienza