A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Overview

SVHNClassifier-PyTorch

If you're interested in C++ inference, move HERE

Results

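| Steps | GPU | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
|-------|-----|------------|---------------|----------|------------|------------|----------------------|----------|
| 54000 | GTX 1080 Ti | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |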
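
The Learning Rate, Decay Step, and Decay Rate columns suggest a step-wise exponential decay of the learning rate. The following is a minimal sketch of such a schedule, assuming staircase decay (the rate is multiplied by the decay rate once every `decay_steps` steps). The function name `decayed_learning_rate` is hypothetical, and `train.py` may implement the schedule differently.

```python
def decayed_learning_rate(initial_lr: float, step: int,
                          decay_steps: int, decay_rate: float) -> float:
    """Staircase exponential decay (an assumption, not taken from train.py)."""
    # Integer division keeps the rate constant within each decay interval.
    return initial_lr * decay_rate ** (step // decay_steps)


# Values from the results table: learning rate 0.16, decay step 625, decay rate 0.9.
for step in (0, 625, 1250, 54000):
    print(step, decayed_learning_rate(0.16, step, 625, 0.9))
```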

Sample

$ python infer.py -c=./logs/model-54000.pth ./images/test-75.png
length: 2
digits: 7 5 10 10 10

$ python infer.py -c=./logs/model-54000.pth ./images/test-190.png
length: 3
digits: 1 9 0 10 10
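
Both samples list five digit slots, and the slots beyond `length` hold the value 10. The sketch below turns that output into a number string, assuming class 10 means "no digit" (inferred from the samples above, not confirmed by the source). `decode_prediction` is a hypothetical helper and is not part of `infer.py`.

```python
from typing import Sequence

NO_DIGIT = 10  # assumed padding class, inferred from the sample output


def decode_prediction(length: int, digits: Sequence[int]) -> str:
    """Keep the first `length` digit slots and join them into a string."""
    kept = list(digits[:length])
    if any(d == NO_DIGIT for d in kept):
        # The predicted length and the digit slots disagree.
        raise ValueError("padding class found inside the predicted length")
    return "".join(str(d) for d in kept)


print(decode_prediction(2, [7, 5, 10, 10, 10]))  # 75
print(decode_prediction(3, [1, 9, 0, 10, 10]))   # 190
```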

Loss

Requirements

  • Python 3.6

  • torch 1.0

  • torchvision 0.2.1

  • visdom

    $ pip install visdom
    
  • h5py

    On Ubuntu:
    $ sudo apt-get install libhdf5-dev
    $ sudo pip install h5py
    
  • protobuf

    $ pip install protobuf
    
  • lmdb

    $ pip install lmdb
    

Setup

  1. Clone the source code

    $ git clone https://github.com/potterhsu/SVHNClassifier-PyTorch
    $ cd SVHNClassifier-PyTorch
    
  2. Download SVHN Dataset format 1

  3. Extract it into the data folder. Your folder structure should now look like this:

    SVHNClassifier-PyTorch
        - data
            - extra
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - test
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - train
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
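
Before converting to LMDB, it can help to confirm that the extracted files match the layout above. This is a minimal sketch that only checks for a `digitStruct.mat` file and at least one `.png` in each of the three folders. `missing_dataset_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List

SPLITS = ("extra", "test", "train")


def missing_dataset_files(data_dir: str) -> List[str]:
    """Return the expected paths that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for split in SPLITS:
        split_dir = root / split
        annotation = split_dir / "digitStruct.mat"
        if not annotation.is_file():
            missing.append(str(annotation))
        # Require at least one image next to the annotation file.
        if not (split_dir.is_dir() and any(split_dir.glob("*.png"))):
            missing.append(str(split_dir / "*.png"))
    return missing


if __name__ == "__main__":
    problems = missing_dataset_files("./data")
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("layout looks complete")
```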
    

Usage

  1. (Optional) Take a glance at original images with bounding boxes

    Open `draw_bbox.ipynb` in Jupyter
    
  2. Convert to LMDB format

    $ python convert_to_lmdb.py --data_dir ./data
    
  3. (Optional) Test for reading LMDBs

    Open `read_lmdb_sample.ipynb` in Jupyter
    
  4. Train

    $ python train.py --data_dir ./data --logdir ./logs
    
  5. Retrain if needed

    $ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth
    
  6. Evaluate

    $ python eval.py --data_dir ./data ./logs/model-100.pth
    
  7. Visualize

    $ python -m visdom.server
    $ python visualize.py --logdir ./logs
    
  8. Infer

    $ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png
    
  9. Clean

    $ rm -rf ./logs
    or
    $ rm -rf ./logs_retrain
    
Comments
  • accuracy = 0.000000, best accuracy 0.000000

    I'm getting zeros for accuracy while the loss is decreasing.

    # python train.py --data_dir ./data --logdir ./logs_train_0514_run2
    Start training
    train.py:99: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
      datetime.now(), step, loss.data[0], learning_rate, examples_per_sec)
    => 2018-05-14 10:08:17.556565: step 100, loss = 7.605348, learning_rate = 0.010000 (507.5 examples/sec)
    => 2018-05-14 10:08:23.371309: step 200, loss = 6.634058, learning_rate = 0.010000 (556.2 examples/sec)
    => 2018-05-14 10:08:29.204335: step 300, loss = 6.444423, learning_rate = 0.010000 (554.7 examples/sec)
    => 2018-05-14 10:08:35.037947: step 400, loss = 6.654078, learning_rate = 0.010000 (554.6 examples/sec)
    => 2018-05-14 10:08:40.876440: step 500, loss = 6.415401, learning_rate = 0.010000 (554.1 examples/sec)
    => 2018-05-14 10:08:46.724192: step 600, loss = 6.980000, learning_rate = 0.010000 (553.6 examples/sec)
    => 2018-05-14 10:08:52.578867: step 700, loss = 7.336755, learning_rate = 0.010000 (552.7 examples/sec)
    => 2018-05-14 10:08:58.457534: step 800, loss = 6.166699, learning_rate = 0.010000 (550.8 examples/sec)
    => 2018-05-14 10:09:04.360389: step 900, loss = 6.186161, learning_rate = 0.010000 (547.6 examples/sec)
    => 2018-05-14 10:09:10.669834: step 1000, loss = 6.420802, learning_rate = 0.010000 (512.7 examples/sec)
    => Evaluating on validation dataset...
    /notebooks/evaluator.py:16: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
      images, length_labels, digits_labels = (Variable(images.cuda(), volatile=True),
    ==> accuracy = 0.000000, best accuracy 0.000000
    => patience = 99
    => 2018-05-14 10:09:31.108286: step 1100, loss = 5.796063, learning_rate = 0.010000 (524.8 examples/sec)
    => 2018-05-14 10:09:37.420145: step 1200, loss = 5.399920, learning_rate = 0.010000 (512.9 examples/sec)
    => 2018-05-14 10:09:43.742597: step 1300, loss = 5.895159, learning_rate = 0.010000 (512.1 examples/sec)
    

    Any ideas?

    opened by rlan 1
  • How to distinguish results from empty image and image containing 1

    Hi,

    I am using this classifier to recognize square images containing numbers.

    In most cases this classifier performs very well but when there are no numbers in the image, it returns 1 (length: 1, digits: [1, 10, 10, 10, 10]).

    It's confusing because it also returns the same result when recognizing images that contain only a 1.

    How can I distinguish them?

    opened by ghwn 0
  • Possibly incorrect layer size

    After reviewing the mentioned paper in section 5.1:

    The fully connected layers contain 3,072 units each

    And your other version of this repository written in tensorflow:

    https://github.com/potterhsu/SVHNClassifier/blob/475fb68fde1fa1c057cc301ef0af049d9b9c4afb/model.py#L72

    Is it possible the following lines were supposed to be "4 * 4 * 192"?

    https://github.com/potterhsu/SVHNClassifier-PyTorch/blob/5f062e5afc134d9b56ab3dc6282e7762799f5159/model.py#L77

    https://github.com/potterhsu/SVHNClassifier-PyTorch/blob/5f062e5afc134d9b56ab3dc6282e7762799f5159/model.py#L102

    opened by slvrfn 0
  • AssertionError: Torch not compiled with CUDA enabled

    Thank you for your efforts in making this available.

    On issuing: python train.py --data_dir ./data --logdir ./logs

    I get the error:

                                   AssertionError: Torch not compiled with CUDA enabled
    

    I then installed using: conda install -c pytorch torchvision

    Now I am getting the error:

             Found no NVIDIA driver on your system. Please check that you
             have an NVIDIA GPU and installed a driver from
             http://www.nvidia.com/Download/index.aspx
    

    ==========================================

    How can I run without a GPU?

    opened by jimmie11 0
  • Issue evaluating

    The model's evaluation accuracy stays at 0.00% throughout the entire training process. The loss decreases significantly, but evaluating on the validation set returns 0% accuracy. No code was modified before training. Any idea what the issue may be?

    opened by jake-shomer 2
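
Regarding the "Torch not compiled with CUDA enabled" report above: the log in the first issue shows `images.cuda()` being called in `evaluator.py`, so the scripts appear to assume a GPU. One common way to make such code run on a CPU is to choose the device at runtime. The sketch below is an assumption about how that could be done, not the repository's own method, and `resolve_device` is a hypothetical helper.

```python
def resolve_device(prefer_cuda: bool = True) -> str:
    """Return "cuda" only when PyTorch reports a usable GPU, otherwise "cpu"."""
    if not prefer_cuda:
        return "cpu"
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(resolve_device())

# With PyTorch installed, the result can replace hard-coded .cuda() calls, e.g.:
#   device = torch.device(resolve_device())
#   model.to(device)
#   images = images.to(device)
#   state = torch.load(path_to_checkpoint, map_location=device)
```

Every `.cuda()` call in the training, evaluation, and inference scripts would need the same treatment for a CPU-only run to work.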
Owner
Potter Hsu