A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Potter Hsu

Last update: Jan 3, 2023

Overview

SVHNClassifier-PyTorch

If you're interested in C++ inference, see HERE

Results

| Steps | GPU | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 54000 | GTX 1080 Ti | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
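The table lists a learning rate of 0.16 with Decay Step 625 and Decay Rate 0.9. A minimal sketch of the schedule those numbers suggest, assuming a staircase exponential decay (the rate is multiplied by 0.9 once every 625 steps). The staircase form and the `learning_rate_at` helper are assumptions, not taken from `train.py`:

```python
def learning_rate_at(step: int,
                     initial_lr: float = 0.16,
                     decay_steps: int = 625,
                     decay_rate: float = 0.9) -> float:
    """Staircase exponential decay (assumed schedule, hypothetical helper)."""
    # Integer division counts completed decay periods, giving the "staircase" shape.
    num_decays = step // decay_steps
    return initial_lr * decay_rate ** num_decays


if __name__ == "__main__":
    for step in (0, 624, 625, 1250, 54000):
        print(f"step {step:>5}: learning rate = {learning_rate_at(step):.6g}")
```

If the training script uses PyTorch's `torch.optim.lr_scheduler.StepLR` stepped once per training step with `step_size=625, gamma=0.9`, it would produce the same staircase; whether it does is not stated here.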

Sample

$ python infer.py -c=./logs/model-54000.pth ./images/test-75.png
length: 2
digits: 7 5 10 10 10

$ python infer.py -c=./logs/model-54000.pth ./images/test-190.png
length: 3
digits: 1 9 0 10 10
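In the outputs above, `length` is the predicted number of digits and `digits` always lists five slots, with 10 filling the unused ones. Reading 10 as a "no digit" class is inferred from these samples, not documented here. A minimal sketch that turns such a prediction back into a number string; `decode_prediction` is a hypothetical helper:

```python
from typing import List

NO_DIGIT = 10  # assumed "no digit" class, inferred from the sample outputs


def decode_prediction(length: int, digits: List[int]) -> str:
    """Turn a predicted (length, digits) pair into the number as a string."""
    if not 0 <= length <= len(digits):
        raise ValueError(f"length {length} does not fit {len(digits)} digit slots")
    kept = digits[:length]
    # Slots inside the predicted length should hold real digits (0-9).
    if NO_DIGIT in kept:
        raise ValueError(f"'no digit' class found within length {length}: {digits}")
    return "".join(str(d) for d in kept)


if __name__ == "__main__":
    print(decode_prediction(2, [7, 5, 10, 10, 10]))
    print(decode_prediction(3, [1, 9, 0, 10, 10]))
```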

Loss

Requirements

Python 3.6
torch 1.0
torchvision 0.2.1
visdom
```
$ pip install visdom
```

h5py

On Ubuntu:
$ sudo apt-get install libhdf5-dev
$ sudo pip install h5py

protobuf
```
$ pip install protobuf
```

lmdb
```
$ pip install lmdb
```

Setup

Clone the source code

$ git clone https://github.com/potterhsu/SVHNClassifier-PyTorch
$ cd SVHNClassifier-PyTorch

Download the SVHN dataset (format 1)

Extract it into the `data` folder. Your folder structure should now look like this:

SVHNClassifier-PyTorch
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
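Before converting, it can help to confirm that the extracted files match the layout above. A small standard-library sketch; `missing_dataset_files` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path
from typing import List

SPLITS = ("extra", "test", "train")  # folder names from the layout above


def missing_dataset_files(data_dir: str) -> List[str]:
    """List expected items missing under data_dir (an empty list means the layout looks OK)."""
    root = Path(data_dir)
    missing = []
    for split in SPLITS:
        split_dir = root / split
        # Each split should contain its digitStruct.mat annotation file...
        mat_file = split_dir / "digitStruct.mat"
        if not mat_file.is_file():
            missing.append(str(mat_file))
        # ...and at least one PNG image.
        if not split_dir.is_dir() or not any(split_dir.glob("*.png")):
            missing.append(str(split_dir / "*.png"))
    return missing


if __name__ == "__main__":
    problems = missing_dataset_files("./data")
    if problems:
        print("Missing:")
        for item in problems:
            print("  " + item)
    else:
        print("Dataset layout looks complete.")
```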

Usage

(Optional) Take a glance at the original images with bounding boxes

Open `draw_bbox.ipynb` in Jupyter

Convert to LMDB format

$ python convert_to_lmdb.py --data_dir ./data
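In SVHN format 1, `digitStruct.mat` labels the digit 0 as 10. A minimal sketch of how a converter could map those raw labels to the fixed-size `(length, digits)` form seen in the Sample output, assuming five digit slots padded with 10 and skipping numbers with more than five digits. `encode_label` is hypothetical and is not the code in `convert_to_lmdb.py`:

```python
from typing import List, Optional, Tuple

MAX_DIGITS = 5  # assumed number of digit slots
NO_DIGIT = 10   # assumed padding class for unused slots


def encode_label(svhn_labels: List[int]) -> Optional[Tuple[int, List[int]]]:
    """Convert raw SVHN digitStruct labels to (length, padded digit slots)."""
    if len(svhn_labels) > MAX_DIGITS:
        return None  # does not fit the fixed-size output; skip this image
    digits = [NO_DIGIT] * MAX_DIGITS
    for i, label in enumerate(svhn_labels):
        # SVHN format 1 stores the digit 0 as label 10, so map it back to 0.
        digits[i] = 0 if label == 10 else label
    return len(svhn_labels), digits
```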

(Optional) Test reading the LMDBs

Open `read_lmdb_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --logdir ./logs
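The Results table lists a Patience of 100, and the training log in the Comments section prints `patience = 99` after an evaluation whose accuracy did not beat the best so far. A minimal early-stopping sketch consistent with that behaviour, assuming the counter drops by one per non-improving evaluation and resets on a new best. The `Patience` class is hypothetical:

```python
class Patience:
    """Early-stopping counter: stop after `limit` evaluations without improvement."""

    def __init__(self, limit: int = 100):
        self.limit = limit
        self.remaining = limit
        self.best_accuracy = 0.0

    def update(self, accuracy: float) -> bool:
        """Record one evaluation; return True if training should stop."""
        if accuracy > self.best_accuracy:
            self.best_accuracy = accuracy
            self.remaining = self.limit  # a new best resets the counter
        else:
            self.remaining -= 1
        return self.remaining <= 0
```

Calling `update` once per validation run would print 99 after a first evaluation at 0% accuracy, matching the log under these assumptions.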

Retrain if needed

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth

Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pth
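For multi-digit recognition, the reported accuracy is presumably sequence-level: an image counts as correct only when the predicted length and every digit slot match the label. That definition is an assumption here; a plain-Python sketch with a hypothetical `sequence_accuracy` helper:

```python
from typing import List, Sequence, Tuple

# (length, digit slots) for one image, e.g. (2, [7, 5, 10, 10, 10])
Label = Tuple[int, Sequence[int]]


def sequence_accuracy(predictions: List[Label], labels: List[Label]) -> float:
    """Fraction of images whose length AND every digit slot are predicted correctly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same number of items")
    if not labels:
        raise ValueError("cannot compute accuracy over zero items")
    num_correct = sum(
        1
        for (pred_len, pred_digits), (true_len, true_digits) in zip(predictions, labels)
        if pred_len == true_len and list(pred_digits) == list(true_digits)
    )
    # Python 3 "/" is true division, so the result is a float in [0, 1].
    return num_correct / len(labels)
```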

Visualize

$ python -m visdom.server
$ python visualize.py --logdir ./logs

Infer

$ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain

Comments

accuracy = 0.000000, best accuracy 0.000000

I'm getting zero accuracy even though the loss is decreasing.

# python train.py --data_dir ./data --logdir ./logs_train_0514_run2
Start training
train.py:99: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  datetime.now(), step, loss.data[0], learning_rate, examples_per_sec)
=> 2018-05-14 10:08:17.556565: step 100, loss = 7.605348, learning_rate = 0.010000 (507.5 examples/sec)
=> 2018-05-14 10:08:23.371309: step 200, loss = 6.634058, learning_rate = 0.010000 (556.2 examples/sec)
=> 2018-05-14 10:08:29.204335: step 300, loss = 6.444423, learning_rate = 0.010000 (554.7 examples/sec)
=> 2018-05-14 10:08:35.037947: step 400, loss = 6.654078, learning_rate = 0.010000 (554.6 examples/sec)
=> 2018-05-14 10:08:40.876440: step 500, loss = 6.415401, learning_rate = 0.010000 (554.1 examples/sec)
=> 2018-05-14 10:08:46.724192: step 600, loss = 6.980000, learning_rate = 0.010000 (553.6 examples/sec)
=> 2018-05-14 10:08:52.578867: step 700, loss = 7.336755, learning_rate = 0.010000 (552.7 examples/sec)
=> 2018-05-14 10:08:58.457534: step 800, loss = 6.166699, learning_rate = 0.010000 (550.8 examples/sec)
=> 2018-05-14 10:09:04.360389: step 900, loss = 6.186161, learning_rate = 0.010000 (547.6 examples/sec)
=> 2018-05-14 10:09:10.669834: step 1000, loss = 6.420802, learning_rate = 0.010000 (512.7 examples/sec)
=> Evaluating on validation dataset...
/notebooks/evaluator.py:16: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  images, length_labels, digits_labels = (Variable(images.cuda(), volatile=True),
==> accuracy = 0.000000, best accuracy 0.000000
=> patience = 99
=> 2018-05-14 10:09:31.108286: step 1100, loss = 5.796063, learning_rate = 0.010000 (524.8 examples/sec)
=> 2018-05-14 10:09:37.420145: step 1200, loss = 5.399920, learning_rate = 0.010000 (512.9 examples/sec)
=> 2018-05-14 10:09:43.742597: step 1300, loss = 5.895159, learning_rate = 0.010000 (512.1 examples/sec)

Any ideas?

Opened by rlan

How to distinguish results from empty image and image containing 1

Hi,

I am using this classifier to recognize square images containing numbers.

In most cases this classifier performs very well but when there are no numbers in the image, it returns 1 (length: 1, digits: [1, 10, 10, 10, 10]).

It's confusing because it also returns the same result when recognizing images containing only 1.

How can I distinguish them?

Opened by ghwn

Possibly incorrect layer size

After reviewing section 5.1 of the referenced paper:

The fully connected layers contain 3,072 units each

And your other version of this repository, written in TensorFlow:

https://github.com/potterhsu/SVHNClassifier/blob/475fb68fde1fa1c057cc301ef0af049d9b9c4afb/model.py#L72

Is it possible the following lines were supposed to be "4 * 4 * 192"?

https://github.com/potterhsu/SVHNClassifier-PyTorch/blob/5f062e5afc134d9b56ab3dc6282e7762799f5159/model.py#L77

https://github.com/potterhsu/SVHNClassifier-PyTorch/blob/5f062e5afc134d9b56ab3dc6282e7762799f5159/model.py#L102
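The flattened feature size depends on how each pooling layer pads its input. Assuming a 54×54 input, eight max-pooling layers with a 2×2 window and strides alternating 2, 1, 2, 1, ..., and size-preserving convolutions (assumptions to check against `model.py`), this sketch compares PyTorch `MaxPool2d` with `padding=1` against TensorFlow-style `'same'` padding:

```python
import math
from typing import Iterable

STRIDES = (2, 1, 2, 1, 2, 1, 2, 1)  # assumed stride of each of the 8 pooling layers


def pytorch_pool_out(size: int, kernel: int, stride: int, padding: int) -> int:
    # torch.nn.MaxPool2d output size with dilation=1 and ceil_mode=False.
    return (size + 2 * padding - kernel) // stride + 1


def same_pool_out(size: int, stride: int) -> int:
    # TensorFlow 'SAME' padding: output = ceil(input / stride).
    return math.ceil(size / stride)


def final_size(size: int, strides: Iterable[int], same_padding: bool) -> int:
    for stride in strides:
        if same_padding:
            size = same_pool_out(size, stride)
        else:
            size = pytorch_pool_out(size, kernel=2, stride=stride, padding=1)
    return size


if __name__ == "__main__":
    for name, same in (("PyTorch, padding=1", False), ("TF 'same'", True)):
        side = final_size(54, STRIDES, same)
        print(f"{name}: {side} x {side} x 192 = {side * side * 192}")
```

Under these assumptions, `padding=1` leaves a 7×7 map (7 × 7 × 192 = 9408 features) while `'same'` padding leaves 4×4 (4 × 4 × 192 = 3072), so the two versions can differ in the input size of the first fully connected layer without either size being a typo.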

Opened by slvrfn

AssertionError: Torch not compiled with CUDA enabled

Thank you for your efforts in making this available.

When I run `python train.py --data_dir ./data --logdir ./logs`,

I get the error:

AssertionError: Torch not compiled with CUDA enabled

I then installed using: conda install -c pytorch torchvision

Now I am getting the error:

Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

How can I run without a GPU?

Opened by jimmie11

Issue evaluating

The model's evaluation accuracy stays at 0.00% throughout the entire training process. The loss decreases significantly, but evaluating on the validation set returns 0% accuracy. I did not modify any code before training. Any idea what the issue may be?

Opened by jake-shomer
