Image augmentation library in Python for machine learning.

Overview

Augmentor is an image augmentation library in Python for machine learning. It aims to be a standalone library that is platform and framework independent, which is more convenient, allows for finer grained control over augmentation, and implements the most real-world relevant augmentation techniques. It employs a stochastic approach using building blocks that allow for operations to be pieced together in a pipeline.

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Installation

Augmentor is written in Python. A Julia version of the package is also being developed as a sister project and is available here.

Install using pip from the command line:

pip install Augmentor

See the documentation for building from source. To upgrade from a previous version, use pip install Augmentor --upgrade.

Documentation

Complete documentation can be found on Read the Docs: http://augmentor.readthedocs.io/

Quick Start Guide and Usage

The purpose of Augmentor is to automate image augmentation (artificial data generation) in order to expand datasets as input for machine learning algorithms, especially neural networks and deep learning.

The package works by building an augmentation pipeline where you define a series of operations to perform on a set of images. Operations, such as rotations or transforms, are added one by one to create an augmentation pipeline: when complete, the pipeline can be executed and an augmented dataset is created.

To begin, instantiate a Pipeline object that points to a directory on your file system:

import Augmentor
p = Augmentor.Pipeline("/path/to/images")

You can then add operations to the Pipeline object p as follows:

p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
p.zoom(probability=0.5, min_factor=1.1, max_factor=1.5)

Every function requires you to specify a probability, which is used to decide if an operation is applied to an image as it is passed through the augmentation pipeline.
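
The role of the probability can be pictured with a small stand-alone sketch. This is not Augmentor's implementation: run_pipeline, mirror and invert are hypothetical names, and the "image" is just a list of pixel rows.

```python
import random

def mirror(image):
    # Reverse every row (horizontal flip).
    return [row[::-1] for row in image]

def invert(image):
    # Invert 8-bit pixel values.
    return [[255 - px for px in row] for row in image]

def run_pipeline(image, operations, rng):
    """Pass one image through (probability, operation) pairs in order."""
    for probability, operation in operations:
        # Each operation gets its own independent random draw.
        if rng.random() < probability:
            image = operation(image)
    return image

# mirror always runs (probability 1.0); invert never runs (probability 0.0).
operations = [(1.0, mirror), (0.0, invert)]
result = run_pipeline([[0, 10, 20]], operations, random.Random(0))
```

With intermediate probabilities, each pass through the pipeline can apply a different subset of operations, which is where the variety in the output comes from.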

Once you have created a pipeline, you can sample from it like so:

p.sample(10000)

which will generate 10,000 augmented images based on your specifications. By default these will be written to the disk in a directory named output relative to the path specified when initialising the p pipeline object above.

If you wish to process each image in the pipeline exactly once, use process():

p.process()

This function might be useful for resizing a dataset for example. It would make sense to create a pipeline where all of its operations have their probability set to 1 when using the process() method.
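
The difference between the two calls can be sketched in plain Python. This is an illustration only, assuming that sampling draws source images at random with replacement; the function names below are hypothetical and do not come from Augmentor.

```python
import random

def sample_sources(images, n, rng):
    """Pick n source images at random; the same image may be picked repeatedly."""
    return [rng.choice(images) for _ in range(n)]

def process_sources(images):
    """Visit every source image exactly once, in order."""
    return list(images)

sources = ["a.png", "b.png", "c.png"]
sampled = sample_sources(sources, 5, random.Random(0))
processed = process_sources(sources)
```

Under this assumption, sampling can return more images than the source directory holds, while processing always returns exactly one output per input.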

Multi-threading

Augmentor (version >=0.2.1) now uses multi-threading to increase the speed of generating images.

This may slow down some pipelines if the original images are very small. Set multi_threaded to False if slowdown is experienced:

p.sample(100, multi_threaded=False)

By default, the sample() function uses multi-threading. This is currently only implemented when saving to disk. Generators will use multi-threading in the next version update.
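
The trade-off behind the multi_threaded flag can be sketched with the standard library's thread pool. This is a simplified stand-in rather than Augmentor's code: augment is a hypothetical placeholder for loading, augmenting and saving one image, and max_workers=4 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def augment(image_id):
    # Placeholder for the real per-image work.
    return image_id * 2

def run(image_ids, multi_threaded=True):
    if multi_threaded:
        with ThreadPoolExecutor(max_workers=4) as executor:
            # map() returns results in input order even though work overlaps.
            return list(executor.map(augment, image_ids))
    # Sequential fallback: no thread start-up or scheduling overhead.
    return [augment(image_id) for image_id in image_ids]
```

When each unit of work is tiny, as with very small images, the scheduling overhead of the pool can outweigh the gain, which is one plausible reason the sequential path may be faster in that case.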

Ground Truth Data

Images can be passed through the pipeline in groups of two or more so that ground truth data can be identically augmented.

[Figure: original image and mask [3]; augmented original and mask images]

To augment ground truth data in parallel to any original data, add a ground truth directory to a pipeline using the ground_truth() function:

p = Augmentor.Pipeline("/path/to/images")
# Point to a directory containing ground truth data.
# Images with the same file names will be added as ground truth data
# and augmented in parallel to the original data.
p.ground_truth("/path/to/ground_truth_images")
# Add operations to the pipeline as normal:
p.rotate(probability=1, max_left_rotation=5, max_right_rotation=5)
p.flip_left_right(probability=0.5)
p.zoom_random(probability=0.5, percentage_area=0.8)
p.flip_top_bottom(probability=0.5)
p.sample(50)
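
What "identically augmented" means can be shown with a toy sketch: a single random draw decides the operation for the whole group, so an image and its mask never diverge. The function names are hypothetical and the images are plain lists of pixel rows; this is not how Augmentor is implemented internally.

```python
import random

def flip_left_right(image):
    return [row[::-1] for row in image]

def augment_group(images, probability, rng):
    """Apply the flip to every image in the group, or to none of them."""
    # One draw per group, not one per image, keeps image and mask aligned.
    if rng.random() < probability:
        return [flip_left_right(image) for image in images]
    return images

image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 0, 1], [0, 1, 1]]
aug_image, aug_mask = augment_group([image, mask], 1.0, random.Random(0))
```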

Multiple Mask/Image Augmentation

Using the DataPipeline class (Augmentor version >= 0.2.3), images that have multiple associated masks can be augmented:

[Figure: multiple mask augmentation]

Arbitrarily long lists of images can be passed through the pipeline in groups and augmented identically using the DataPipeline class. This is useful for ground truth images that have several masks, for example.

In the example below, the images and their masks are contained in the images data structure (as lists of lists), while their labels are contained in y:

p = Augmentor.DataPipeline(images, y)
p.rotate(1, max_left_rotation=5, max_right_rotation=5)
p.flip_top_bottom(0.5)
p.zoom_random(1, percentage_area=0.5)

augmented_images, labels = p.sample(100)

The DataPipeline returns images directly (augmented_images above), and does not save them to disk, nor does it read data from the disk. Images are passed directly to DataPipeline during initialisation.

For details of the images data structure and how to create it, see the Multiple-Mask-Augmentation.ipynb Jupyter notebook.
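
As a rough picture of the "lists of lists" layout, the sketch below groups each image with its masks. It assumes each inner list holds one image followed by its masks; in practice the entries would be image arrays, and the strings here are only stand-ins. group_images is a hypothetical helper, not part of Augmentor.

```python
def group_images(image_items, mask_sets):
    """Return [[image, mask_1, mask_2, ...], ...], matched by position."""
    return [[image] + list(masks) for image, masks in zip(image_items, mask_sets)]

# Stand-in values; real code would load arrays instead of using names.
images = group_images(
    ["img_0", "img_1"],
    [["img_0_mask_a", "img_0_mask_b"], ["img_1_mask_a", "img_1_mask_b"]],
)
y = [0, 1]  # one label per group
```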

Generators for Keras and PyTorch

If you do not wish to save to disk, you can use a generator (in this case with Keras):

g = p.keras_generator(batch_size=128)
images, labels = next(g)

which returns a batch of images of size 128 and their corresponding labels. Generators return data indefinitely, and can be used to train neural networks with augmented data on the fly.

Alternatively, you can integrate it with PyTorch:

import torchvision
transforms = torchvision.transforms.Compose([
    p.torch_transform(),
    torchvision.transforms.ToTensor(),
])
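
The idea of a generator that returns data indefinitely, mentioned above for Keras, can be sketched without any framework. This is only an illustration with hypothetical names; a real generator would also apply the pipeline's operations to each drawn image.

```python
import random

def batch_generator(samples, labels, batch_size, rng):
    """Yield (batch, batch_labels) forever, drawing at random each time."""
    while True:
        indices = [rng.randrange(len(samples)) for _ in range(batch_size)]
        yield [samples[i] for i in indices], [labels[i] for i in indices]

g = batch_generator(["a", "b", "c"], [0, 1, 2], batch_size=4, rng=random.Random(0))
batch, batch_labels = next(g)
```

Because the loop never terminates on its own, the consumer decides how many batches to draw, for example through a steps-per-epoch setting in the training loop.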

Main Features

Elastic Distortions

Using elastic distortions, one image can be used to generate many images that are real-world feasible and label preserving:

[Figure: input image; augmented images]

The input image has a 1 pixel black border to emphasise that you are getting distortions without changing the size or aspect ratio of the original image, and without any black/transparent padding around the newly generated images.

The functionality can be more clearly seen here:

[Figure: original image [1]; random distortions applied]

Perspective Transforms

There are a total of 12 different types of perspective transform available. Four of the most common are shown below.

[Figure: tilt left; tilt right; tilt forward; tilt backward]

The remaining eight types of transform are as follows:

[Figure: skew types 0 to 7]

Size Preserving Rotations

Rotations by default preserve the size (width and height) of the original images:

[Figure: original image; rotated 10 degrees, automatically cropped]

Compared to rotations by other software:

[Figure: original image; rotated 10 degrees]
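
The automatic crop can be reasoned about geometrically. The sketch below is not taken from Augmentor's source; it computes, under the assumption that the crop is centred and keeps the original aspect ratio, how large an axis-aligned crop fits inside the rotated image. inscribed_scale is a hypothetical name.

```python
import math

def inscribed_scale(width, height, degrees):
    """Scale of the largest centred, same-aspect crop inside the rotated image."""
    theta = math.radians(degrees)
    sin_t, cos_t = abs(math.sin(theta)), abs(math.cos(theta))
    # Seen from the rotated image's own frame, a crop of s*w by s*h has a
    # bounding box of s*(w*cos + h*sin) by s*(w*sin + h*cos).
    # Both extents must fit inside the w by h image.
    return min(
        width / (width * cos_t + height * sin_t),
        height / (width * sin_t + height * cos_t),
    )
```

For a square image rotated by 10 degrees this gives a scale of about 0.86. If the crop is then resized back to the original dimensions (an assumption here), the output has no black corners and the same width and height as the input.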

Size Preserving Shearing

Shearing will also automatically crop the correct area from the sheared image, so that you have an image with no black space or padding.

[Figure: original image; shear (x-axis) 20 degrees; shear (y-axis) 20 degrees]

Compare this to how this is normally done:

[Figure: original image; shear (x-axis) 20 degrees; shear (y-axis) 20 degrees]

Cropping

Cropping can also be handled in a manner more suitable for machine learning image augmentation:

[Figure: original image; random crops + resize operation]

Random Erasing

Random Erasing is a technique used to make models robust to occlusion. This may be useful for training neural networks used in object detection in navigation scenarios, for example.

[Figure: original image [2]; Random Erasing]

See the Pipeline.random_erasing() documentation for usage.
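
For intuition, a toy version of the technique is sketched below on a grayscale image stored as a list of pixel rows. It is not Augmentor's implementation. The parameter name rectangle_area is borrowed from the library, but its meaning here (a fraction applied to each side of the patch, between 0 and 1) is an assumption made for the sketch.

```python
import random

def random_erase(image, rectangle_area, rng):
    """Return a copy with a random rectangle overwritten by random values."""
    height, width = len(image), len(image[0])
    # Assumption: rectangle_area (0 < value <= 1) scales each side of the patch.
    patch_h = max(1, int(height * rectangle_area))
    patch_w = max(1, int(width * rectangle_area))
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)
    erased = [row[:] for row in image]  # copy so the input is left untouched
    for y in range(top, top + patch_h):
        for x in range(left, left + patch_w):
            erased[y][x] = rng.randint(0, 255)  # random 8-bit noise
    return erased
```

Filling the patch with noise rather than a constant value is one common choice; it simulates an occluding object without introducing a fixed colour the model could latch onto.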

Chaining Operations in a Pipeline

With only a few operations, a single image can be augmented to produce large numbers of new, label-preserving samples:

[Figure: original image; distortions + mirroring]

In the example above, we have applied three operations: first we randomly distort the image, then we flip it horizontally with a probability of 0.5, and then vertically with a probability of 0.5. We then sample from this pipeline 100 times to create 100 new images.

p.random_distortion(probability=1, grid_width=4, grid_height=4, magnitude=8)
p.flip_left_right(probability=0.5)
p.flip_top_bottom(probability=0.5)
p.sample(100)

Tutorial Notebooks

Integration with Keras using Generators

Augmentor can be used as a replacement for Keras' augmentation functionality. Augmentor can create a generator which produces augmented data indefinitely, according to the pipeline you have defined. See the following notebooks for details:

  • Reading images from a local directory, augmenting them at run-time, and using a generator to pass the augmented stream of images to a Keras convolutional neural network, see Augmentor_Keras.ipynb
  • Augmenting data in-memory (in array format) and using a generator to pass these new images to the Keras neural network, see Augmentor_Keras_Array_Data.ipynb

Per-Class Augmentation Strategies

Augmentor allows for pipelines to be defined per class. That is, you can define different augmentation strategies on a class-by-class basis for a given classification problem.

See an example of this in the following Jupyter notebook: Per_Class_Augmentation_Strategy.ipynb

Complete Example

Let's perform an augmentation task on a single image, demonstrating the pipeline and several features of Augmentor.

First import the package and initialise a Pipeline object by pointing it to a directory containing your images:

import Augmentor

p = Augmentor.Pipeline("/home/user/augmentor_data_tests")

Now you can begin adding operations to the pipeline object:

p.rotate90(probability=0.5)
p.rotate270(probability=0.5)
p.flip_left_right(probability=0.8)
p.flip_top_bottom(probability=0.3)
p.crop_random(probability=1, percentage_area=0.5)
p.resize(probability=1.0, width=120, height=120)

Once you have added the operations you require, you can sample images from this pipeline:

p.sample(100)

Some sample output:

[Figure: input image [3]; augmented images]

The augmented images may be useful for a boundary detection task, for example.

Licence and Acknowledgements

Augmentor is made available under the terms of the MIT Licence. See Licence.md.

[1] Checkerboard image obtained from Wikimedia Commons and is in the public domain: https://commons.wikimedia.org/wiki/File:Checkerboard_pattern.svg

[2] Street view image is in the public domain: http://stokpic.com/project/italian-city-street-with-shoppers/

[3] Skin lesion image obtained from the ISIC Archive:

You can use urllib to obtain the skin lesion image in order to reproduce the augmented images above:

>>> from urllib import urlretrieve
>>> im_url = "https://isic-archive.com:443/api/v1/image/5436e3abbae478396759f0cf/download"
>>> urlretrieve(im_url, "ISIC_0000000.jpg")
('ISIC_0000000.jpg', <httplib.HTTPMessage instance at 0x7f7bd949a950>)

Note: For Python 3, use from urllib.request import urlretrieve.

Logo created at LogoMakr.com

Tests

To run the automated tests, clone the repository and run:

$ py.test -v

from the command line. To view the CI tests that are run after each commit, see https://travis-ci.org/mdbloice/Augmentor.

Citing Augmentor

If you find this package useful and wish to cite it, you can use

Marcus D Bloice, Peter M Roth, Andreas Holzinger, Biomedical image augmentation using Augmentor, Bioinformatics, https://doi.org/10.1093/bioinformatics/btz259

Asciicast

Click the preview below to view a video demonstration of Augmentor in use:

[Video preview: asciicast]

Comments
  • not an issue - potential break through in 3d Point cloud scanning inference

    I watched this video this morning on 3d point cloud + ARKit https://www.youtube.com/watch?v=kupq1C41XcU&feature=youtu.be and it seems like a trained Augmentor model could help bridge the inference here in conjunction with trained model. Not sure if you agree, or if this is the correct repo for this - perhaps it is a new project.

    I guess this is a feature request or potential enhancement for Augmentor (unless you can think of something better, or maybe it already does this): we need a way to guess, by training a model, the transformation necessary to go from one transformation to the other. For example, given a view that is transformed from A -> B, what was the transformation? Then, with this as glue and some Kalman filters, retrofit the point cloud.

    Step 2 could be to isolate this to bounding box. just thinking out loud here.

    this is the code from video above https://github.com/johndpope/ARKitExperiments

    opened by johndpope 7
  • tif files are not loaded

    from Augmentor import Pipeline
    from skimage.io import imread
    
    aug = Pipeline(source_directory='./',
                   output_directory='out')
    
    print(imread('sample.tif'))
    aug.apply_current_pipeline('sample.tif')
    

    Let's try to read a file using scikit-image to be sure it is a valid TIFF file, and see how Augmentor fails:

    Initialised with 1 image(s) found in selected directory.
    Output directory set to ./out.
    [[[3366 2681 1454 4441]
      [3248 2588 1490 4039]
      [3422 2731 1579 4285]
      ...,
      [3659 3072 1881 7845]
      [3733 3154 1954 8042]
      [3751 3110 1889 7561]]
    
     [[3357 2647 1488 4480]
      [3161 2559 1437 4037]
      [3400 2719 1584 4146]
      ...,
      [3645 3124 1882 7944]
      [3642 3137 1811 8230]
      [3690 3155 1925 7592]]
    
     [[3409 2690 1481 4475]
      [3343 2637 1539 4038]
      [3444 2764 1626 4112]
      ...,
      [3798 3186 1921 7724]
      [3863 3266 2034 8210]
      [3662 3080 1836 7586]]
    
     ...,
     [[3670 3012 1836 5635]
      [3654 3005 1810 5861]
      [3545 2963 1774 5925]
      ...,
      [3567 2858 1712 6473]
      [3706 2971 1852 7023]
      [3742 3049 1853 7311]]
    
     [[3677 3031 1837 5715]
      [3599 3015 1815 5769]
      [3593 2994 1817 5706]
      ...,
      [3620 2938 1757 7244]
      [3702 3052 1834 7525]
      [3696 3098 1857 7699]]
    
     [[3540 2968 1778 5665]
      [3572 2987 1818 5662]
      [3549 2952 1805 5710]
      ...,
      [3611 3023 1809 7788]
      [3657 3114 1822 7792]
      [3717 3168 1896 7859]]]
    Traceback (most recent call last):
      File "bug.py", line 8, in <module>
        aug.apply_current_pipeline('sample.tif')
      File "/Users/Arseny/.pyenv/versions/3.6.0/lib/python3.6/site-packages/Augmentor/Pipeline.py", line 268, in apply_current_pipeline
        return self._execute(AugmentorImage(os.path.abspath(image_path), None), save_to_disk)
      File "/Users/Arseny/.pyenv/versions/3.6.0/lib/python3.6/site-packages/Augmentor/Pipeline.py", line 189, in _execute
        image = Image.open(augmentor_image.image_path)
      File "/Users/Arseny/.pyenv/versions/3.6.0/lib/python3.6/site-packages/PIL/Image.py", line 2452, in open
        % (filename if filename else fp))
    OSError: cannot identify image file '/Users/Arseny/dev/kaggle/amzn/sample.tif'
    
    opened by arsenyinfo 7
  • Random Erasing areas too big

    Hello!

    When using random erasing I can't go below a size of 0.11. 11% of the size is too big for me; I would need a tenth of this. Can this be adapted somehow?

    thank you!

    best regards

    Igor

    opened by tanzerlana 5
  • Initialised with 0 image(s) found.

    p=Augmentor.Pipeline(source_directory="/Users/admin/Desktop/img2",output_directory="/Users/admin/Desktop/img")

    Hey, my augmenter is unable to detect any images in the source folder. I put the relevant formats yet nothing gets detected. What exactly could be going wrong?

    I also get an AttributeError when my source folder contains a subfolder that holds the images: "AttributeError: exit"

    please help

    opened by bluesky314 5
  • How to pass an image path as parameter and only preprocess it?

    It seems Augmentor needs a work directory as the parameter and it will preprocess all files in it. However, we always need to preprocess a specific image and use the image path as the parameter. So how to use Augmentor in this situation? Thank you.

    opened by Kongsea 5
  • Fix appearing extra labels when 'keras_generator' function is called

    Hi @mdbloice, I found Augmentor a very useful tool, but I ran into a distracting bug in the keras_generator function, which this pull request fixes. The bug is as follows:

    I have a training set which contains images of two classes, so there are two subfolders in train_128_path, and I'm doing this:

    p = Augmentor.Pipeline(train_128_path)
    ...
    generator = p.keras_generator(batch_size, image_data_format="channels_first") # It's ok
    ...
    # Second call
    generator = p.keras_generator(batch_size, image_data_format="channels_first") 
    # It's not ok, it behaves like there's three labels 
    # and generator returns label vectors of size three
    

    After the first call, it creates the directory output relative to train_128_path, and on the second call keras_generator treats output as a new image class.

    I traced the problem to the function Pipeline._populate: it calls scan and passes abs_output_directory, which is not actually absolute, although scan expects an absolute path (the problem is at line 195):

    abs_output_directory = os.path.join(source_directory, output_directory)
    ...
    self.augmentor_images, self.class_labels = scan(source_directory, abs_output_directory)
    

    So I've fixed it in the scan function and, while I was there, removed some extra spaces.

    opened by rlnx 5
  • Add PyTorch support in Pipeline class

    This is a re-post of pull request #45. The build failure is now fixed, but .travis.yml became messy because:

    • torchvision is not on PyPI, only on Anaconda
    • torchvision supports only Python 2.7, 3.5, and 3.6
    opened by juneoh 5
  • augmented image and corresponding mask does not match

    Hi. I would like to ask for assistance. I tried to use Augmentor on the ISBI2012 dataset at https://imagej.net/Segmentation_of_neuronal_structures_in_EM_stacks_challenge_-_ISBI_2012 with the codes below. However, the output images and ground truths don't match.

    p = Augmentor.Pipeline(ISBI2012_TRAIN_PATH+"/images")
    p.ground_truth(ISBI2012_TRAIN_PATH+"/gt")
    p.zoom_random(probability=0.5, percentage_area=0.8)
    p.sample(640)

    opened by eemberda 4
  • rotate_without_crop doesn't work

    Hello,

    when I try to use rotate_without_crop the program crashes.

    import Augmentor
    inputdir="mypicturedirectory"
    outputdir="myoutputdirectory"
    
    pictures = Augmentor.Pipeline(source_directory=inputdir, output_directory=outputdir)
    
    pictures.rotate_without_crop(1,10,10)
    pictures.sample(100)
    

    The output is:

      File "myaugmentor.py", line 29, in <module>
        pictures.sample(100)
      File "/usr/local/lib/python3.6/dist-packages/Augmentor/Pipeline.py", line 364, in sample
        for result in executor.map(self, augmentor_images):
      File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
        yield fs.pop().result()
      File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
        return self.__get_result()
      File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
      File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.6/dist-packages/Augmentor/Pipeline.py", line 105, in __call__
        return self._execute(augmentor_image)
      File "/usr/local/lib/python3.6/dist-packages/Augmentor/Pipeline.py", line 233, in _execute
        images = operation.perform_operation(images)
      File "/usr/local/lib/python3.6/dist-packages/Augmentor/Operations.py", line 674, in perform_operation
        augmented_images.append(do(image))
      File "/usr/local/lib/python3.6/dist-packages/Augmentor/Operations.py", line 669, in do
        return image.rotate(rotation, expand=self.expand, resample=Image.BICUBIC, fillcolor=self.fillcolor)
    

    I am using the latest pip package (0.2.6) of Augmentor and Python version 3.6.8 running on Ubuntu 18.04.

    Cheers

    opened by Randryn0 4
  • Multiple mask augmentation: semi non-identical augmentations?

    So I'm trying to use the Augmentor Datapipeline for my dataset. My dataset consists of two images and a corresponding vector field. Now I can do the same augmentation for all three samples at the same time and that's working great. But for the vector field some extra work needs to be done.

    For example, suppose I do a random vertical flip. For the images this is no problem, but for the vector field a little extra work needs to be done. After a vertical flip, the y-component of the vector field needs to be rotated 180 degrees in order to remain consistent.

    Can I make functions that get executed on part of a sample based on the random choice of the augmentation operations?

    And as a next step, can I also perform some functions only on the images and not on the vector field?

    opened by maartenterpstra 4
  • Update torch_transform and tests

    There was a bug in the torch_transform function where the image was being re-enclosed in a list multiple times (the respective methods on the expected image then failed because the items were instead themselves lists).

    The bug was never caught by the tests because only one transform was ever performed. The tests have also been updated to now catch this.

    (Rounding the random sample was also unnecessary).

    opened by lewisbelcher 4
  • ValueError: image has wrong mode

    Hi, I ran into a problem when I used:

    p.random_color(probability=0.6,min_factor=50,max_factor=120)
    p.random_brightness(probability=0.8,min_factor=50,max_factor=255)

    The error was shown in two screenshots (images not preserved).

    However, if I delete these lines and use other functions such as rotate90, rotate270 or random_erasing, the code works very well. My code is as follows:

    import Augmentor
    p=Augmentor.Pipeline("H:\\text\imgs")
    p.ground_truth("H:\\text\jsons\mask_png")
    p.rotate(probability=1,max_left_rotation=25,max_right_rotation=25)
    p.random_color(probability=0.6,min_factor=50,max_factor=120)
    p.random_brightness(probability=0.8,min_factor=50,max_factor=255)
    p.random_erasing(probability=1,rectangle_area=0.5)
    p.sample(50)

    Thank you!

    opened by Moriarty0112 1
  • Use for Semantic Segmentation

    Hi, thank you very much for Augmentor, it helped me a lot. I have a question: I am not sure whether Augmentor can be used for multi-label semantic segmentation. In my experiments, most of the newly generated labels are wrong. In the image below, for example, the erased area does not correspond, and there are a lot of masks at the border that shouldn't be there. [image: wrong1]

    I tested some pictures and the erased areas are all wrong. [image: wrong2]

    So I would like to ask: can Augmentor not be used for semantic segmentation?

    opened by Superzlw 2
  • Not cropping skewed image to original image size

    Description: When one uses skew_left_right(), the image is cropped to fit the original image size and hence content is lost. For example:

    Original: [image]
    After skewing: [image]

    As you can see, the content is clipped at the top-right and bottom-right corners.

    How can I skew in a fashion that preserves content at the expense of larger (or changed) image sizes ?

    Language/Compiler: Python 3.7

    OS:
    Distributor ID: Ubuntu
    Description: Ubuntu 18.04.5 LTS
    Release: 18.04
    Codename: bionic

    Version: 0.2.9

    How to recreate: Sample input image - image

    import Augmentor
    
    p = Augmentor.Pipeline("<image-location>")
    p.skew_left_right(probability=1.0, magnitude=0.5)
    p.sample(3)
    

    What I have already done to resolve the issue: perused the documentation - https://augmentor.readthedocs.io/en/master/userguide/mainfeatures.html and the code on GitHub - https://github.com/mdbloice/Augmentor/blob/daf4478ea34c3504d1a26c22721f8558de4da22b/Augmentor/Pipeline.py#L1366

    opened by NithyaMogane-TomTom 1
Owner
Marcus D. Bloice
Researcher in applied machine learning for healthcare, Medical University of Graz, Austria.