# Paddle-VisualAttention

## Results_Compared
Methods | Steps | GPU | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
---|---|---|---|---|---|---|---|---|---|
PaddlePaddle_SVHNClassifier | 54000 | GTX 1080 Ti | 1024 | 0.01 | 100 | 625 | 0.9 | ~1700 | 95.65% |
Pytorch_SVHNClassifier | 54000 | GTX 1080 Ti | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
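The Decay Step and Decay Rate columns in the table describe a staircase learning-rate schedule: the base rate is multiplied by the decay rate once every decay-step training steps. A minimal sketch of that arithmetic, assuming staircase (rather than continuous) decay; the function name is illustrative and not part of the repo:

```python
def staircase_lr(step, base_lr=0.01, decay_steps=625, decay_rate=0.9):
    """Learning rate after `step` training steps with staircase decay."""
    # Integer division gives the number of completed decay intervals.
    return base_lr * decay_rate ** (step // decay_steps)
```

With the PaddlePaddle row's settings (base 0.01, decay step 625, decay rate 0.9), the first drop lands at step 625.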
## Introduction
The main goal of this exercise is to study the evolution of the state of the art in visual attention models. Two datasets are studied: augmented MNIST and SVHN. The former targets a canonical problem, handwritten digit recognition, made harder by clutter and translation; the latter targets a real-world problem, street view house number (SVHN) transcription. The papers below are studied with the aim of developing the intuition needed to choose a proper model for each of these challenges.
For more detail, please refer to this blog.
## Recommended environment

- Python 3.6+
- paddlepaddle-gpu 2.0.2
- nccl 2.0+
- editdistance
- visdom
- h5py
- protobuf
- lmdb
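Before running anything, it can help to verify that the optional dependencies are importable. A small sketch using only the standard library; the function name and the default package list are assumptions for illustration:

```python
import importlib.util

def missing_packages(names=("visdom", "h5py", "lmdb", "editdistance")):
    """Return the subset of `names` that cannot be imported."""
    # find_spec returns None when a top-level package is not installed.
    return [n for n in names if importlib.util.find_spec(n) is None]
```

Calling it with no arguments lists which of the recommended packages still need `pip install`.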
## Install

Install paddle following the official tutorial, then install the remaining dependencies:

```
$ pip install visdom
$ pip install h5py
$ pip install protobuf
$ pip install lmdb
```
## Dataset
- Download SVHN Dataset format 1

- Extract to the `data` folder; your folder structure should now look like this:

  ```
  SVHNClassifier
  - data
    - extra
      - 1.png
      - 2.png
      - ...
      - digitStruct.mat
    - test
      - 1.png
      - 2.png
      - ...
      - digitStruct.mat
    - train
      - 1.png
      - 2.png
      - ...
      - digitStruct.mat
  ```
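Before converting, it can help to sanity-check that the extracted layout matches the tree above. A minimal sketch using only the standard library; the function name and the spot-check of `1.png` are illustrative, not part of the repo:

```python
from pathlib import Path

def check_svhn_layout(data_dir):
    """Return the list of expected SVHN files missing under data_dir."""
    missing = []
    for split in ("train", "test", "extra"):
        split_dir = Path(data_dir) / split
        # Each split must contain the digitStruct.mat annotations
        # plus the images; we spot-check the first image only.
        if not (split_dir / "digitStruct.mat").is_file():
            missing.append(str(split_dir / "digitStruct.mat"))
        if not (split_dir / "1.png").is_file():
            missing.append(str(split_dir / "1.png"))
    return missing
```

An empty return value means all three splits are in place.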
## Usage
- (Optional) Take a glance at the original images with bounding boxes:

  Open `draw_bbox.ipynb` in Jupyter
- Convert to LMDB format:

  ```
  $ python convert_to_lmdb.py --data_dir ./data
  ```
- (Optional) Test reading the LMDBs:

  Open `read_lmdb_sample.ipynb` in Jupyter
- Train:

  ```
  $ python train.py --data_dir ./data --logdir ./logs
  ```
- Retrain if needed:

  ```
  $ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth
  ```
- Evaluate:

  ```
  $ python eval.py --data_dir ./data ./logs/model-100.pth
  ```
- Visualize:

  ```
  $ python -m visdom.server
  $ python visualize.py --logdir ./logs
  ```
- Infer:

  ```
  $ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png
  ```
- Clean:

  ```
  $ rm -rf ./logs
  ```

  or

  ```
  $ rm -rf ./logs_retrain
  ```
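The accuracy figures in the results table come from whole-sequence matching: a house number counts as correct only if every digit, including the length, matches the label. A hedged pure-Python sketch of that metric; the function name and the list-of-digit-lists input format are assumptions, not the actual `eval.py` interface:

```python
def sequence_accuracy(predictions, labels):
    """Fraction of samples whose predicted digit sequence matches exactly."""
    assert len(predictions) == len(labels)
    # A sequence is correct only on an exact, full-length match.
    correct = sum(1 for p, l in zip(predictions, labels) if p == l)
    return correct / len(labels)
```

For example, predicting `[2, 5, 0]` against the label `[2, 5]` counts as wrong even though the shared digits agree.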