Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Rowel Atienza

Last update: Dec 28, 2022

Overview

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop)

STRAug (pronounced as "strog")

Paper

Arxiv

Why it matters?

Scene Text Recognition (STR) requires data augmentation functions that differ from those used for object recognition. STRAug is a data augmentation library designed for STR. It offers 36 data augmentation functions sorted into 8 groups. Each function supports 3 levels (magnitudes) of severity.

Given a source image, it can be transformed as follows:

warp.py - to generate Curve, Distort, Stretch (or Elastic) deformations

geometry.py - to generate Perspective, Rotation, Shrink deformations

pattern.py - to create different grids: Grid, VGrid, HGrid, RectGrid, EllipseGrid

blur.py - to generate synthetic blur: GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur

noise.py - to add noise: GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise

weather.py - to simulate certain weather conditions: Fog, Snow, Frost, Rain, Shadow

camera.py - to simulate camera sensor tuning and image compression/resizing: Contrast, Brightness, JpegCompression, Pixelate

process.py - all other image processing issues: Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color
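The eight groups above can be written down as a plain mapping, which is handy for sampling and for checking that the counts add up to 36. This is only a sketch: `STRAUG_GROUPS`, `count_functions` and `load_augmentation` are hypothetical helpers, not part of the package, and the class name `Rotate` (for the Rotation deformation) is taken from the `straug.geometry` import shown in the comments further down this page.

```python
import importlib

# Module name -> class names, following the list above.
# "Rotate" is assumed to be the class behind the Rotation deformation.
STRAUG_GROUPS = {
    "warp": ["Curve", "Distort", "Stretch"],
    "geometry": ["Perspective", "Rotate", "Shrink"],
    "pattern": ["Grid", "VGrid", "HGrid", "RectGrid", "EllipseGrid"],
    "blur": ["GaussianBlur", "DefocusBlur", "MotionBlur", "GlassBlur", "ZoomBlur"],
    "noise": ["GaussianNoise", "ShotNoise", "ImpulseNoise", "SpeckleNoise"],
    "weather": ["Fog", "Snow", "Frost", "Rain", "Shadow"],
    "camera": ["Contrast", "Brightness", "JpegCompression", "Pixelate"],
    "process": ["Posterize", "Solarize", "Invert", "Equalize",
                "AutoContrast", "Sharpness", "Color"],
}


def count_functions(groups):
    """Return the total number of augmentation functions across all groups."""
    return sum(len(names) for names in groups.values())


def load_augmentation(module, name):
    """Import straug.<module> lazily and instantiate the named class.

    Needs `pip3 install straug`; nothing is imported until this is called.
    """
    if name not in STRAUG_GROUPS.get(module, []):
        raise KeyError(f"{name} is not listed under {module}")
    cls = getattr(importlib.import_module(f"straug.{module}"), name)
    return cls()


if __name__ == "__main__":
    print(len(STRAUG_GROUPS), "groups,", count_functions(STRAUG_GROUPS), "functions")
```

With straug installed, `load_augmentation("warp", "Curve")` would return an object that can be called like `Curve()` in the usage example.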

Pip install

pip3 install straug

How to use

Python interpreter (e.g. the input image is nokia.png):

>>> from straug.warp import Curve
>>> from PIL import Image
>>> img = Image.open("nokia.png")
>>> img = Curve()(img, mag=3)
>>> img.save("curved_nokia.png")

Python script (see test.py):

python3 test.py --image=<target image>

For example:

python3 test.py --image=images/telekom.png

The corrupted images are saved in the results directory.

Reference

Image corruptions (e.g. blur, noise, camera effects, fog, frost, etc.) are based on the work of Hendrycks et al.

Citation

If you find this work useful, please cite:

@inproceedings{atienza2021data,
  title={Data Augmentation for Scene Text Recognition},
  author={Atienza, Rowel},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021},
  pubstate={published},
  tppubtype={inproceedings}
}

Comments

How to deal with underfitting?

Hello, I am a new researcher and recently came across your code, which has been very useful for my problem. My project is also scene text recognition, but the dataset is much more irregular. Your paper gave me constructive guidance. However, there is still a problem: when N (the number of functions in each channel) gets larger, say 3 or 4, the model struggles to fit, and the training-set accuracy hovers around 90%. Moreover, if I add random cut as a preprocessing step, the accuracy hovers around 80%. Could you give me some suggestions for dealing with these problems? Thanks.

opened by ILoveU3D 1

Gaussian blur kernel size for small images

Currently, the kernel size is fixed to 31x31 (https://github.com/roatienza/straug/blob/43f9ca994fb9d9e3ed379de646bbf194192101f7/blur.py#L38)

This causes an error internally in the call to reflection_pad2d(): RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (15, 15) at dimension 3 of input 4

if one of the image's dimensions is less than the kernel size.

Should the kernel size be a percentage of the image's dimensions instead, e.g. 30-50% of the smaller dimension?
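One way to read the proposal is sketched below: take a fraction of the smaller image dimension, cap it at the current 31, and force the result to be odd. The function name `adaptive_kernel_size` and its parameters are hypothetical and not part of straug. The guarantee it aims for is that the implied padding (`kernel // 2`) stays below both image dimensions, which is the condition the error message above complains about.

```python
def adaptive_kernel_size(width, height, fraction=0.5, max_kernel=31):
    """Pick an odd blur kernel size whose padding fits inside the image.

    The kernel is `fraction` of the smaller dimension, capped at
    `max_kernel`, rounded down to an odd number, and never below 3.
    """
    if width < 2 or height < 2:
        raise ValueError("image must be at least 2x2 for reflection padding")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    smaller = min(width, height)
    kernel = min(max_kernel, int(smaller * fraction))
    if kernel % 2 == 0:
        kernel -= 1  # blur kernels are conventionally odd-sized
    return max(kernel, 3)


if __name__ == "__main__":
    for size in [(200, 100), (100, 32), (10, 4)]:
        k = adaptive_kernel_size(*size)
        print(size, "-> kernel", k, "padding", k // 2)
```

Large images keep the existing 31x31 kernel, while small ones get a proportionally smaller kernel instead of raising an error.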

opened by baudm 1

RandAugment

Hi, thanks for your work.

Have you implemented RandAugment?

In your paper:

geometry = [Rotate(), Perspective(), Shrink()]
noise = [GaussianNoise()]
blur = [MotionBlur()]
augmentations = [geometry, noise, blur]
img = RandAugment(img, augmentations, N=3)
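For readers with the same question, a RandAugment-style helper along the lines of the snippet above might look like the sketch below. It is not the author's implementation: `rand_augment` and `make_stub` are hypothetical names, and sampling N distinct groups and then one function per group is an assumption about the intended behaviour. The `op(img, mag=..., prob=...)` call shape matches how straug objects are called in the training-speed comment on this page.

```python
import random


def rand_augment(img, augmentations, n=3, mag=-1, prob=0.5, rng=random):
    """Apply one random function from each of n randomly chosen groups.

    `augmentations` is a list of non-empty groups (lists of callables).
    Groups are sampled without replacement, so n is capped at the
    number of groups.
    """
    n = min(n, len(augmentations))
    for group in rng.sample(augmentations, n):
        op = rng.choice(group)
        img = op(img, mag=mag, prob=prob)
    return img


def make_stub(name):
    """Stand-in for a straug op: records its name instead of editing pixels."""
    def op(img, mag=-1, prob=1.0):
        return img + [name]
    return op


if __name__ == "__main__":
    geometry = [make_stub("Rotate"), make_stub("Perspective"), make_stub("Shrink")]
    noise = [make_stub("GaussianNoise")]
    blur = [make_stub("MotionBlur")]
    print(rand_augment([], [geometry, noise, blur], n=3))
```

Replacing the stubs with real objects such as `Rotate()` or `MotionBlur()` and passing a PIL image should give the behaviour the snippet describes, under the assumptions stated above.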

opened by raminrahimi6970 0

Questions about paper

First of all, thank you for your great work. I read the paper and have 2 questions: why is there no CRNN line in Figure 13? And what is the N corresponding to Table 5? Is it the best result from a grid search?

opened by hujichn 2

(Request) Add a parameter for borderValue used in warp (TPS transforms)

As per the title, it would be much more convenient if you provided a separate parameter for the borderValue used in the cv2 warpImage method. The default value is 0, i.e. a black fill color.

opened by huyfam 0

Question about training speed

Thanks for your excellent work! Training seems to be very slow when I use straug (6x slower than without it). What speed do you see in your own tests? The following is my augmentation code.

import random

import numpy as np
from PIL import Image


class RecStraugRandAug(object):
    def __init__(self, num_aug=2, prob=0.5, **kwargs):
        super().__init__()
        self.num_aug = num_aug
        self.prob = prob
        try:
            from straug.blur import GaussianBlur, DefocusBlur, MotionBlur, GlassBlur
            from straug.camera import Contrast, Brightness, JpegCompression, Pixelate
            from straug.geometry import Perspective, Shrink, Rotate
            from straug.noise import GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise
            from straug.pattern import Grid, VGrid, HGrid, RectGrid, EllipseGrid
            from straug.process import Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color
            from straug.warp import Stretch, Distort, Curve
            from straug.weather import Fog, Snow, Frost, Rain, Shadow
            self.augs = [
                [GaussianBlur(), DefocusBlur(), MotionBlur(), GlassBlur()],
                [Contrast(), Brightness(), JpegCompression(), Pixelate()],
                [Perspective(), Shrink(), Rotate()],
                [GaussianNoise(), ShotNoise(), ImpulseNoise(), SpeckleNoise()],
                [Grid(), VGrid(), HGrid(), RectGrid(), EllipseGrid()],
                [Posterize(), Solarize(), Invert(), Equalize(), AutoContrast(), Sharpness(), Color()],
                [Stretch(), Distort(), Curve()],
                [Fog(), Snow(), Frost(), Rain(), Shadow()],
            ]
        except Exception as ex:
            print(f"exception: {ex}, you can install straug using `pip install straug`")
            exit(-1)
    
    def __call__(self, data):
        img = Image.fromarray(data["image"])
        for idx in range(self.num_aug):
            aug_type_idx = np.random.randint(0, len(self.augs))
            aug_idx = np.random.randint(0, len(self.augs[aug_type_idx]))
            img = self.augs[aug_type_idx][aug_idx](img, mag=random.randint(-1,2), prob=self.prob)
        data["image"] = np.array(img)
        return data
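A slowdown like this can be narrowed down by timing each augmentation on one sample and sorting by cost. The helper below is a generic sketch using only the standard library; `time_augmentations` and `slowest_first` are hypothetical names, and the `ops` dictionary is expected to wrap whichever straug objects are in use, for example `{"GlassBlur": lambda im: GlassBlur()(im, mag=1, prob=1.0)}`.

```python
import time


def time_augmentations(ops, sample, repeats=5, timer=time.perf_counter):
    """Return {name: mean seconds per call} for each callable in `ops`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = {}
    for name, op in ops.items():
        start = timer()
        for _ in range(repeats):
            op(sample)
        timings[name] = (timer() - start) / repeats
    return timings


def slowest_first(timings):
    """Sort (name, seconds) pairs from slowest to fastest."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Stand-in workloads; swap in real augmentations and a PIL image.
    ops = {
        "cheap": lambda data: sum(data),
        "costly": lambda data: sorted(data * 50),
    }
    for name, seconds in slowest_first(time_augmentations(ops, list(range(1000)))):
        print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

If a few functions dominate the total, dropping them from `self.augs` or moving augmentation into data-loader worker processes are common options to try.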

opened by littletomatodonkey 0
