Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

Christian Bartz

Last update: Jan 5, 2023

Overview

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition". You can read a preprint on arXiv.

Installation

You can install the project directly on your PC or use a Docker container

Directly on your PC

Make sure to use Python 3
It is a good idea to create a virtual environment (example for creating a venv)
Make sure you have the latest version of CUDA (>= 8.0) installed
Install CUDNN (> 6.0)
Install NCCL (> 2.0) installation guide
Install all requirements with the following command: pip install -r requirements.txt
Check that chainer can use the GPU:
- start the python interpreter: python
- import chainer: import chainer
- check that cuda is available: chainer.cuda.available
- check that cudnn is enabled: chainer.cuda.cudnn_enabled
- the output of both commands should be True
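
The interpreter check above can also be scripted. The following is a minimal sketch, assuming only the two attributes named in the steps (chainer.cuda.available and chainer.cuda.cudnn_enabled); the function name check_chainer_gpu and the layout of the returned report are illustrative and not part of this repository.

```python
def check_chainer_gpu():
    """Return a small report on whether Chainer can see CUDA and cuDNN.

    The helper name and the report keys are hypothetical; only the two
    chainer.cuda attributes come from the installation steps.
    """
    report = {
        "chainer_installed": False,
        "cuda_available": False,
        "cudnn_enabled": False,
    }
    try:
        import chainer
    except ImportError:
        # Chainer is missing, so neither check can succeed.
        return report

    report["chainer_installed"] = True
    report["cuda_available"] = bool(chainer.cuda.available)
    report["cudnn_enabled"] = bool(chainer.cuda.cudnn_enabled)
    return report


if __name__ == "__main__":
    for name, value in check_chainer_gpu().items():
        print(f"{name}: {value}")
```

All three values should be True on a correctly configured machine.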

Using Docker

Install Docker
- Windows: Get it here
- Mac: Get it here
- Linux: Use your favourite package manager, e.g. pacman -S docker, or use this guide for Ubuntu.
Install CUDA related things:
- CUDA (>= 8.0) installed
- CUDNN (> 6.0)
- nvidia-docker (Ubuntu, Arch-like OS)
Get NCCL
- make sure to download the version for Ubuntu 16.04 that matches your local CUDA configuration (e.g. if you have installed CUDA 9.1, take the version for CUDA 9.1; if you have CUDA 8, take the version for CUDA 8)
- place it in the root folder of the project
Build the Docker image
- docker build -t see .
- If your host system uses CUDA with a version earlier than 9.1, specify the corresponding docker image to match the configuration of your machine (see this list for available options). For example, for CUDA 8 and CUDNN 6 use the following instead:
```
docker build -t see --build-arg FROM_IMAGE=nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 .
```
- if you did not download a file called nccl-repo-ubuntu1604-2.1.15-ga-cuda9.1_1-1_amd64.deb, set the argument NCCL_NAME to the name of the file you downloaded. For example:
```
docker build -t see --build-arg NCCL_NAME=nccl-repo-ubuntu1604-2.1.15-ga-cuda9.0_1-1_amd64.deb .
```
Check that everything is okay by entering a shell in the container and doing the following:
- run the container with: nvidia-docker run -it see
- start the python interpreter: python3
- import chainer: import chainer
- check that cuda is available: chainer.cuda.available
- check that cudnn is enabled: chainer.cuda.cudnn_enabled
- the output of both commands should be True
Hint: make sure to mount all data folders you need into the container with the -v option for running a container.

General Training Hints

If you would like to train a network with more than 4 words per image, you will need to adjust or delete the loss_weights (see this line). Otherwise, the code will raise errors. The loss weights are mainly meant for training FSNS models and should be discarded when training other models.

SVHN Experiments

We performed several experiments on the SVHN dataset. First, we tried to see whether our architecture is able to reach competitive results on the SVHN recognition challenge. Second, we wanted to determine whether our localization network can find a text distributed on a given grid. In our last experiment we created a dataset, where we randomly distributed the text samples on the image.

Datasets

This section describes what needs to be done in order to get and prepare the data. There is no need to create the custom datasets yourself, as we also offer them for download. The information on how to create the datasets is included here for reference.

Original SVHN data

Get the original SVHN dataset from here.
Extract the label data using the script datasets/svhn/svhn_dataextract_to_json.py.
Use the script datasets/svhn/prepare_svhn_crops.py to crop all bounding boxes, including some background, from the SVHN images. Invoke the script like this: python prepare_svhn_crops.py <path to svhn json> 64 <where to save the cropped images> <name of stage>. For more information about the available options, run python prepare_svhn_crops.py -h.

Grid Dataset

Follow steps 1 and 2 of the last subsection in order to get all SVHN images and the corresponding groundtruth.
The script datasets/svhn/create_svhn_dataset_4_images.py can be used to create the dataset.
The command python create_svhn_dataset_4_images.py -h shows all available command line options for this script.

Random Dataset

Follow steps 1 and 2 of the first subsection in order to get all SVHN images and the corresponding groundtruth.
The script datasets/svhn/create_svhn_dataset.py can be used to create the dataset.
The command python create_svhn_dataset.py -h shows all available command line options for this script.

Dataset Download

You can also download already created datasets here.

Training the model

You can use the script train_svhn.py to train a model that can detect and recognize SVHN-like text. The script is tuned to use the custom datasets and should enable you to redo these experiments.

Preparations

Make sure that you have one of the datasets.
For training you will need:
1. the file svhn_char_map.json (you can find it in the folder datasets/svhn)
2. the ground truth files of the dataset you want to use
Add one line to the beginning of each ground truth file: <number of house numbers in image> <max number of chars per house number> (both values need to be separated by a tab character). If you are using the grid dataset, the line could look like this: 4 4.
Prepare the curriculum specification as a JSON file by following this template:
```
[
    {
        "train": "<path to train file>",
        "validation": "<path to validation file>"
    }
]
```
If you want to train using the curriculum learning strategy, you just need to add further dicts to this list.
Use the script chainer/train_svhn.py for training the network.
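
The two file preparation steps in the list above (adding the header line and writing the curriculum specification) can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the repository: the helper names add_header_line and write_curriculum are hypothetical, and the file names in the demo are placeholders.

```python
import json
import tempfile
from pathlib import Path


def add_header_line(gt_path, num_words, max_chars):
    """Prepend '<num_words><TAB><max_chars>' to a ground truth file, in place."""
    path = Path(gt_path)
    original = path.read_text(encoding="utf-8")
    header = f"{num_words}\t{max_chars}\n"
    if original.startswith(header):
        return  # header already present, nothing to do
    path.write_text(header + original, encoding="utf-8")


def write_curriculum(stages, destination):
    """Write (train, validation) path pairs as a curriculum JSON file.

    Each pair becomes one dict in the list; several pairs describe
    several curriculum stages, in training order.
    """
    spec = [
        {"train": str(train), "validation": str(validation)}
        for train, validation in stages
    ]
    Path(destination).write_text(json.dumps(spec, indent=4), encoding="utf-8")
    return spec


if __name__ == "__main__":
    # Demo in a temporary directory with placeholder file names.
    with tempfile.TemporaryDirectory() as tmp:
        train = Path(tmp) / "train.csv"
        validation = Path(tmp) / "valid.csv"
        for gt_file in (train, validation):
            gt_file.write_text("image_0.png\t1\t2\t3\t0\n", encoding="utf-8")
            add_header_line(gt_file, num_words=4, max_chars=4)
        write_curriculum([(train, validation)], Path(tmp) / "curriculum.json")
        print((Path(tmp) / "curriculum.json").read_text(encoding="utf-8"))
```

Passing more than one pair to write_curriculum produces the multi-stage list used for curriculum learning.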

Starting the training

The training can be run on GPU or CPU. You can also use multiple GPUs in a data parallel fashion. To specify which GPU to use, add the command line parameter -g <id of gpu to use>, e.g. -g 0 for the first GPU.

You can get a brief explanation of each command line option of the script train_svhn.py by running the script like this: python train_svhn.py -h

You will need to specify at least the following parameters:

dataset_specification - this is the path to the json file you just created
log_dir - this is the path to the directory where the logs shall be saved
--char-map ../datasets/svhn/svhn_char_map.json - path to the char map for mapping classes to labels.
--blank-label 0 - indicates that class 0 is the blank label
-b <batch-size> - set the batch size used for training
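
Putting the required parameters together, the call can be assembled programmatically. The sketch below only builds the argument list from the options listed above plus the -g option; the helper name build_train_command and the example paths are assumptions, and the command is printed rather than executed.

```python
import shlex
import sys


def build_train_command(script, dataset_specification, log_dir, char_map,
                        batch_size, blank_label=0, gpu=None):
    """Assemble the argument list for one of the training scripts.

    Only options named in this README are used. The helper itself is
    hypothetical and not part of the repository.
    """
    command = [
        sys.executable, script,
        dataset_specification,
        log_dir,
        "--char-map", char_map,
        "--blank-label", str(blank_label),
        "-b", str(batch_size),
    ]
    if gpu is not None:
        command += ["-g", str(gpu)]
    return command


if __name__ == "__main__":
    command = build_train_command(
        "train_svhn.py",
        "curriculum.json",                      # placeholder path
        "logs",                                 # placeholder path
        "../datasets/svhn/svhn_char_map.json",
        batch_size=32,
        gpu=0,
    )
    print(shlex.join(command))
```

To actually start the training, the list can be handed to subprocess.run(command, check=True) from within the chainer directory.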

FSNS Experiments

To see whether our idea is applicable in practice, we also ran experiments on the FSNS dataset. The FSNS dataset contains images of French street name signs. Its most notable characteristic is that it does not contain any annotation for text localization. This makes the dataset well suited to our method, as we claim that we can locate and recognize text even without the corresponding ground truth for localization.

Preparing the Dataset

Getting the dataset and making it usable with deep learning frameworks like Chainer is not an easy task. We provide some scripts that download the dataset, convert it from the TensorFlow format to single images, and create a ground truth file that is usable by our training code.

The folder datasets/fsns contains all scripts that are necessary for preparing the dataset. These steps need to be done:

Use the script download_fsns.py to get the dataset. You will need to specify a directory where the data shall be saved.
The script tfrecord_to_image.py extracts all images and labels from the downloaded dataset.
We advise you to use the script swap_classes.py. This script sets the class of the blank label to 0, as it is defined in the class to label map fsns_char_map.json. You can invoke the script like this: python swap_classes.py <gt_file> <output_file_name> 0 133
Next, you will need to transform the original ground truth into the ground truth format we used for training. Our ground truth format differs because we found that it is not possible to train the model if the word boundaries are not explicitly given to the model. We therefore transform the line-based ground truth into a word-based ground truth. You can use the script transform_gt.py for this. You could call the script like this: python transform_gt.py <path to original gt> fsns_char_map.json <path to new gt>.
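
To make the class swap in step 3 concrete, here is a small sketch of what exchanging two class ids means for one ground truth row. It is not the implementation of swap_classes.py; it assumes, for illustration only, that a row consists of a file path followed by tab-separated integer class ids.

```python
def swap_classes(labels, class_a, class_b):
    """Exchange every occurrence of class_a and class_b in a label list."""
    swapped = []
    for label in labels:
        if label == class_a:
            swapped.append(class_b)
        elif label == class_b:
            swapped.append(class_a)
        else:
            swapped.append(label)
    return swapped


def swap_classes_in_line(line, class_a, class_b, delimiter="\t"):
    """Apply the swap to one row of the assumed '<path>\\t<id>\\t<id>...' form."""
    path, *labels = line.rstrip("\n").split(delimiter)
    new_labels = swap_classes([int(label) for label in labels], class_a, class_b)
    return delimiter.join([path] + [str(label) for label in new_labels])


if __name__ == "__main__":
    # With the arguments "0 133" from the README, class 133 becomes the
    # new class 0 and the former class 0 takes over id 133.
    print(swap_classes_in_line("00000/0.png\t12\t0\t133\t133", 0, 133))
```

Swapping instead of overwriting keeps both classes distinguishable after the change.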

Training the Network

Before you can start training the network, you will need to do the following preparations:

In the last section, we already introduced the transform_gt.py script. As we found that it is only possible to train a new model on the FSNS dataset when using a curriculum learning strategy, we need to create a learning curriculum before starting the training. You can do this by following these steps:

Create ground truth files for each step of the curriculum with the transform_gt.py script.
1. Start with a reasonable number of maximum words (2 is a good choice here).
2. Create a ground truth file with all images that contain max. 2 words by using the transform_gt.py script: python transform_gt.py <path to downloaded gt> fsns_char_map.json <path to 2 word gt> --max-words 2 --blank-label 0
3. Repeat this step with 3 and 4 words (you can also add 5 and 6), but make sure to only include images with the corresponding number of words (--min-words is the flag to use).
Add the paths to your files to a .json file that could be called curriculum.json. This file works exactly the same as the file discussed in step 3 of the preparations section for the SVHN experiments.
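
The curriculum steps above translate into one transform_gt.py call per stage. The sketch below generates these calls. It reflects one reading of step 3, namely that every stage after the first sets --min-words to the same value as --max-words; the helper name curriculum_commands and the output file names are hypothetical.

```python
import shlex
import sys


def curriculum_commands(original_gt, char_map, word_counts=(2, 3, 4), blank_label=0):
    """Build one transform_gt.py call per curriculum stage.

    The first stage keeps all images with at most word_counts[0] words.
    Later stages also pass --min-words so that they only contain images
    with exactly that number of words (an interpretation of the README).
    """
    commands = []
    for index, num_words in enumerate(word_counts):
        output = f"fsns_{num_words}_words.csv"  # hypothetical file name
        command = [
            sys.executable, "transform_gt.py",
            original_gt, char_map, output,
            "--max-words", str(num_words),
            "--blank-label", str(blank_label),
        ]
        if index > 0:
            command += ["--min-words", str(num_words)]
        commands.append(command)
    return commands


if __name__ == "__main__":
    for command in curriculum_commands("train.csv", "fsns_char_map.json"):
        print(shlex.join(command))
```

The resulting ground truth files are then listed, in order, in curriculum.json.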

Once you are done with this, you can actually train the network 🎉

The network is trained with the train_fsns.py script. python train_fsns.py -h shows all available command-line options. This script works very similarly to the train_svhn.py script.

You will need to specify at least the following parameters:

dataset_specification - this is the path to the json file you just created
log_dir - this is the path to the directory where the logs shall be saved
--char-map ../datasets/fsns/fsns_char_map.json - path to the char map for mapping classes to labels.
--blank-label 0 - indicates that class 0 is the blank label
-b <batch-size> - set the batch size used for training

FSNS Demo

In case you only want to see how the model behaves on a given image, you can use the fsns_demo.py script. This script expects a trained model, an image and a char map, and prints the predicted words in the image together with the predicted bounding boxes. If you download the model provided here, you could call the script like this: python fsns_demo.py <path to log directory> model_35000.npz <path to example image> ../datasets/fsns/fsns_char_map.json. It should be fairly easy to extend this script to also work with other models. Just have a look at how the different evaluators create the network and how they extract the characters from the predictions and you should be good to go!

Text Recognition

Although not mentioned in the paper, we also provide a model with which you can perform text recognition on already cropped text lines. We also provide code for training such a model. Everything works very similarly to the scripts provided for SVHN and FSNS.

Dataset

Unfortunately, we cannot offer our entire training dataset for download, as it is far too large. But if you want to train a text recognition model on your own, you can use the "Synthetic Word Dataset" (download it here). After you have downloaded the dataset, you will need to do some post-processing and create a ground truth similar to the one for the FSNS dataset. We provide a sample dataset at the location where you can also download the text recognition model (which is here).

Training

After you are done with preparing the dataset, you can start training.

The network is trained with the train_text_recognition.py script. python train_text_recognition.py -h shows all available command-line options. This script works very similarly to the train_svhn.py and train_fsns.py scripts.

You will need to specify at least the following parameters:

dataset_specification - this is the path to the json file you just created
log_dir - this is the path to the directory where the logs shall be saved
--char-map ../datasets/textrec/ctc_char_map.json - path to the char map for mapping classes to labels.
--blank-label 0 - indicates that class 0 is the blank label
-b <batch-size> - set the batch size used for training

Text Recognition Demo

Analogous to the fsns_demo.py script, we offer a demo script for text recognition named text_recognition_demo.py. This script expects a trained model, an image and a char map, and prints the predicted words in the image together with the predicted bounding boxes. If you download the model provided here, you could call the script like this: python text_recognition_demo.py <path to log directory> model_190000.npz <path to example image> ../datasets/textrec/ctc_char_map.json. It should be fairly easy to extend this script to also work with other models. Just have a look at how the different evaluators create the network and how they extract the characters from the predictions and you should be good to go!

Pretrained Models

You can download our best performing model on the FSNS dataset, a model for our SVHN experiments and also a model for our text recognition experiments here.

General Notes on Training

This section contains information about things that happen while a network is training. It includes a description of all data that is being logged and backed up for each training run and a description of a tool that can be used to inspect the training, while it is running.

Contents of the log dir

The code will create a new subdirectory in the log dir, where it puts all data that is to be logged. The code logs the following pieces of data:

it creates a backup of the currently used network definition files
it saves a snapshot of the model at each epoch, or after snapshot_interval iterations (default 5000)
it saves loss and accuracy values at the configured print interval (each time after 100 iterations)
it will save the prediction of the model on a given or randomly chosen sample. This visualization helps with assessing, whether the network is converging or not. It also enables you to inspect the training progress while the network is training.
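
The logged loss and accuracy values can also be read back for custom plots. The following sketch assumes that the log file follows the JSON layout written by Chainer's LogReport extension, i.e. a list of objects with keys such as iteration, main/loss and main/accuracy. If the log file of this project deviates from that layout, the key names need to be adapted. The helper name load_metric is hypothetical.

```python
import json


def load_metric(log_path, key="main/loss"):
    """Return (iteration, value) pairs for one logged metric.

    Assumes a LogReport-style JSON file: a list of dicts, one per
    logging interval. Entries without the requested key are skipped.
    """
    with open(log_path, encoding="utf-8") as handle:
        entries = json.load(handle)
    return [
        (entry["iteration"], entry[key])
        for entry in entries
        if key in entry and "iteration" in entry
    ]
```

For example, load_metric("<path to log file>", "main/accuracy") yields the pairs needed to plot an accuracy curve over iterations.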

Inspecting the training progress

If you leave the default settings, you can inspect the progress of the training in real time, by using the script show_progress.py. This script is located in the folder utils. You can get all supported command line arguments with this command: python show_progress.py -h. Normally you will want to start the program like this: python show_progress.py. It will open a TK window. In case the program complains that it is not able to find TK related libraries, you will need to install them.

Another approach is to use ChainerUI. Execute the following commands to set up ChainerUI:

chainerui db create
chainerui db upgrade

Create a project using the following command from the project directory:

chainerui project create -d ./ -n see-ocr

To check the progress, start the server:

chainerui server

Creating an animation of plotted train steps

The training script contains a little helper that applies the current state of the model to an image and saves the result of this application for each iteration (or the way you configure it).

You can use the script create_video.py to create an animation out of these images. In order to use the script, you will need to install ffmpeg (and have the ffmpeg command in your path) and you will need to install imagemagick (and have the convert command in your path). You can then create a video with this command line call: python create_video.py <path to directory with images> <path to destination video>. You can learn about further command line arguments with python create_video.py -h.

Evaluation

You can evaluate all models (svhn/fsns/textrecognition) with the script evaluate.py in the chainer directory.

Usage

You will need a directory containing the following items:

log_file of the training
saved model
network definition files that have been backed up by the training script

In addition, you will need to specify:

the GPU to use with --gpu <id of gpu> (the code currently does not work on CPU)
the number of labels per timestep (typically max. 5 for SVHN and 21 for FSNS)

Evaluating a SVHN model

In order to evaluate an SVHN model, you will need to invoke the script like this: python evaluate.py svhn <path to dir with specified items> <name of snapshot to evaluate> <path to ground truth file> <path to char map (e.g. svhn_char_map.json)> --target-shape <input shape for recognition net (e.g. 50,50)> <number of labels per timestep>

Evaluating a FSNS model

In order to evaluate a FSNS model, you will need to invoke the script like that: python evaluate.py fsns <path to dir with specified items> <name of snapshot to evaluate> <path to ground truth file> <path to char map (e.g. fsns_char_map.json)> <number of labels per timestep>

Evaluating a Text Recognition model

In order to evaluate a text recognition model, you will need to invoke the script like that: python evaluate.py textrec <path to dir with specified items> <name of snapshot to evaluate> <path to ground truth file> <path to char map (e.g. ctc_char_map.json)> 23

Citation

If you find this code useful, please cite our paper:

@inproceedings{bartz2018see,
    title={SEE: towards semi-supervised end-to-end scene text recognition},
    author={Bartz, Christian and Yang, Haojin and Meinel, Christoph},
    booktitle={Thirty-Second AAAI Conference on Artificial Intelligence},
    year={2018}
}

Notes

If there is anything totally unclear, or not working, please feel free to file an issue. If you did anything with the code, feel free to file a PR.

Comments

Build train.csv on custom dataset
Hello @Bartzi ,

Sorry for opening a new issue here but I feel this will help people understanding training procedure on a custom dataset.

Was following fsns dataset to understand training code and procedure to build train.csv, I will share my understanding of building train.csv and then have a couple of question to ask.

As per instructions of fsns dataset preparation after running all scripts, train.csv (for a first training set of 512) consists 6 21 as the first line and then 1807 rows of the dataset description, each row has a column indicating path of the training filepath and then 126 columns representing fsns_char_map.json dict key where the value of the key is the actual character which is chr(key). Please correct me if I am wrong.

Why 126? How to decide no of columns which represent each key? Since fsns_char_map.json has 134 key-value pairs then why not 134?

Can it be more than no of keys of fsns_char_map.json or any char_map.json?
opened by mit456 48

Single GPU training script error
Hello

I am trying to run the training script on the SVHN dataset with the following command: python chainer/train_svhn.py curriculum.json /logs --char-map datasets/svhn/svhn_char_map.json --blank-label 0 -b 10 -g 0

Running it on a single GPU. I followed the steps to run on a single GPU like it is mentioned in https://github.com/Bartzi/see/issues/6.

Using CUDA 9.0 and the equivalent cupy-cuda90 library. Chainer shows True for chainer.cuda.available and chainer.cuda.cudnn_enabled.

I get the following error

/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "chainer/train_svhn.py", line 147, in <module>
    updater = StandardUpdater(iterator=train_iterators, optimizer=optimizer, device=args.gpus)
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 144, in __init__
    if device is not None and device >= 0:
TypeError: unorderable types: list() >= int()
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7fbddd666c50>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
  File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 242, in terminate
AttributeError: 'NoneType' object has no attribute 'STATUS_TERMINATE'
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7fbddd666d68>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
  File "/usr/local/lib/python3.5/dist-packages/chainer/iterators/multiprocess_iterator.py", line 242, in terminate
AttributeError: 'NoneType' object has no attribute 'STATUS_TERMINATE'

Please help resolve. Thanks!
opened by emushtaq 28

curriculum.json problem

Hi @Bartzi, I recently wanted to run train_text_recognition.py and am confused about the file curriculum.json. In your README.md file, it says that the template should be:

[ 
    {
        "train": "<path to train file>",
        "validation": "<path to validation file>"
    }
]

The question is: how do I describe the path to the train file? If I have the file tree structure below,

├── ctc_char_map.json
├── curriculum.json
├── train
│   ├── bg_deep_gray0_0.jpg
│   ├── bg_deep_gray0_1.jpg
│   ├── bg_deep_gray0_2.jpg
│   ├── bg_deep_gray0_3.jpg
│   ├── bg_deep_gray0_4.jpg
│   ├── bg_deep_gray0_5.jpg
│   └── bg_deep_gray0_6.jpg
└── validation
    ├── bg_deep_gray0_7.jpg
    ├── bg_deep_gray0_8.jpg
    └── bg_deep_gray0_9.jpg

should I set the path as

[ 
    {
        "train": "~/Documents/GitHub/see/datasets/textrec/train",
        "validation": "~/Documents/GitHub/see/datasets/textrec/validation"
    }
]

so that SEE will read all images in the train/validation directories? I also tried to use download_fsns.py to download the FSNS dataset. The structure of the FSNS directory looks like this, which really confused me because there are so many subdirectories. How should I construct the curriculum.json for the FSNS dataset?😂

Looking forward for some suggestions, thanks!

├── train
│   ├── 00000
│   ├── 00001
│   ├── 00002
│   ├── 00003
│   ├── 00004
│   ├── 00005
│   ├── 00006
│   ├── 00007
│   ├── 00008
│   ├── 00009
│   ├── 00010
│   ├── 00011
│   ├── 00012
│   ├── 00013
│   ├── 00014
│   ├── 00015
│   ├── 00016
│   ├── 00017
│   ├── 00018
│   ├── 00019
│   ├── 00020
│   ├── 00021
│   ├── 00022
│   ├── 00023
│   ├── 00024
│   ├── 00025
│   ├── 00026
│   ├── 00027
│   ├── 00028
│   ├── 00029
│   ├── 00030
│   ├── 00031
│   ├── 00032
│   ├── 00033
│   ├── 00034
│   ├── 00035
│   ├── 00036
│   ├── 00037
│   ├── 00038
│   ├── 00039
│   ├── 00040
│   ├── 00041
│   ├── 00042
│   ├── 00043
│   ├── 00044
│   └── 00045
├── train.csv
└── validation
    ├── 00046
    └── 00047

opened by jxlxt 19

Error training FSNS

Hello,

I'm trying to train the network but I'm having problems. I'm using some pictures from the FSNS dataset, not all of them, because my intention is just to know as fast as possible how to train the network in order to apply it to my own data after that.

When I execute:

python train_fsns.py --char-map ..\datasets\fsns\fsns_char_map.json ..\datasets\fsns\curriculum.json .\logs\ --blank-label 0 -b 32 --gpu 0

I got an output like this:

... ...
could not load image: .\data_image\test\test\00000\0.png
could not load image: .\data_image\test\test\00000\25.png
could not load image: .\data_image\test\test\00000\38.png
could not load image: .\data_image\test\test\00000\14.png
could not load image: .\data_image\test\test\00000\35.png
could not load image: .\data_image\test\test\00000\47.png
could not load image: .\data_image\test\test\00000\38.png
could not load image: .\data_image\test\test\00000\17.png
could not load image: .\data_image\test\test\00000\2.png
... ...

I've placed the folders correctly:

chainer
├── data_image
│   ├── train
│   ├── validation
│   └── test
│       └── test
│           └── 00000 ...
└── train_fsns.py

Moreover, I've made some changes in train_fsns.py. I've replaced MultiprocessParallelUpdater with StandardUpdater because NCCL was giving me errors (I'm on a Windows platform) and I only have one GPU.

I've changed:

line 11 to: from chainer.training import StandardUpdater
line 172 to: updater = StandardUpdater(train_iterators, optimizer)
and commented out lines 222 and 236.

Is it correct?

Thank you so much and congrats for the project!

opened by anavc94 12

A running problem about train_svhn.py

Hi @Bartzi, I recently started learning about end-to-end scene text recognition and am trying to reproduce your excellent work in order to understand this topic. But I had a problem running your train_svhn.py file:

TypeError: reshape() got an unexpected keyword argument 'order'

How can I correct it? I would appreciate it if you could give some explanation and help. Here is the command I used and the traceback:

$ python train_svhn.py ../datasets/svhn/jsonfile/svhn_curriculum_specification.json ../datasets/svhn/runningLog/ -g 1 --char-map ../datasets/svhn/svhn_char_map.json --blank-label 10 -b 8

/anaconda3/envs/ssee/lib/python3.8/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:153: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
  warnings.warn('optimizer.eps is changed to {} '
epoch iteration main/loss main/accuracy lr fast_validation/main/loss fast_validation/main/accuracy validation/main/loss validation/main/accuracy
Exception in main training loop: reshape() got an unexpected keyword argument 'order'
Traceback (most recent call last):
  File "/home/d/anaconda3/envs/ssee/lib/python3.8/site-packages/chainer/training/trainer.py", line 346, in run
    entry.extension(self)
  File "/home/d/SEE/see-master/chainer/insights/bbox_plotter.py", line 128, in __call__
    self.render_rois(predictions, rois, bboxes, iteration, self.image.copy(), backprop_vis=backprop_visualizations)
  File "/home/d/SEE/see-master/chainer/insights/bbox_plotter.py", line 143, in render_rois
    self.render_extracted_regions(dest_image, image, rois, num_timesteps)
  File "/home/d/SEE/see-master/chainer/insights/bbox_plotter.py", line 201, in render_extracted_regions
    rois = self.xp.reshape(rois, (num_timesteps, -1, num_channels, height, width))
  File "/home/d/anaconda3/envs/ssee/lib/python3.8/site-packages/cupy/manipulation/shape.py", line 33, in reshape
    return a.reshape(newshape, order=order)
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "train_svhn.py", line 258, in <module>
    trainer.run()
  File "/home/d/anaconda3/envs/ssee/lib/python3.8/site-packages/chainer/training/trainer.py", line 376, in run
    six.reraise(*exc_info)
  File "/home/d/anaconda3/envs/ssee/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/d/anaconda3/envs/ssee/lib/python3.8/site-packages/chainer/training/trainer.py", line 346, in run
    entry.extension(self)
  File "/home/d/SEE/see-master/chainer/insights/bbox_plotter.py", line 128, in __call__
    self.render_rois(predictions, rois, bboxes, iteration, self.image.copy(), backprop_vis=backprop_visualizations)
  File "/home/d/SEE/see-master/chainer/insights/bbox_plotter.py", line 143, in render_rois
    self.render_extracted_regions(dest_image, image, rois, num_timesteps)
  File "/home/d/SEE/see-master/chainer/insights/bbox_plotter.py", line 201, in render_extracted_regions
    rois = self.xp.reshape(rois, (num_timesteps, -1, num_channels, height, width))
  File "/home/d/anaconda3/envs/ssee/lib/python3.8/site-packages/cupy/manipulation/shape.py", line 33, in reshape
    return a.reshape(newshape, order=order)
TypeError: reshape() got an unexpected keyword argument 'order'

opened by Skylarky 11

incompatible array types are mixed in the forward input (Convolution2DFunction).

Hi Christian,

I encountered this error near the end of a training epoch (99.75%). I am using Google Colab to do the training.

TypeError Traceback (most recent call last) in () 323 # ) 324 --> 325 trainer.run()

12 frames /usr/local/lib/python3.6/dist-packages/chainer/training/trainer.py in run(self, show_loop_exception_msg) 374 f.write('Traceback (most recent call last):\n') 375 traceback.print_tb(sys.exc_info()[2]) --> 376 six.reraise(*exc_info) 377 finally: 378 for _, entry in extensions:

/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb) 701 if value.traceback is not tb: 702 raise value.with_traceback(tb) --> 703 raise value 704 finally: 705 value = None

/usr/local/lib/python3.6/dist-packages/chainer/training/trainer.py in run(self, show_loop_exception_msg) 344 for name, entry in extensions: 345 if entry.trigger(self): --> 346 entry.extension(self) 347 except Exception as e: 348 if show_loop_exception_msg:

/usr/local/lib/python3.6/dist-packages/chainer/training/extensions/evaluator.py in call(self, trainer) 178 with reporter: 179 with configuration.using_config('train', False): --> 180 result = self.evaluate() 181 182 reporter_module.report(result)

/usr/local/lib/python3.6/dist-packages/chainer/training/extensions/evaluator.py in evaluate(self) 239 with function.no_backprop_mode(): 240 if isinstance(in_arrays, tuple): --> 241 eval_func(*in_arrays) 242 elif isinstance(in_arrays, dict): 243 eval_func(**in_arrays)

/content/drive/My Drive/Colab Notebooks/COMP421/Project/multi_accuracy_classifier.ipynb in call(self, *args)

/content/drive/My Drive/Colab Notebooks/COMP421/Project/fsns.ipynb in call(self, images, label)

/content/drive/My Drive/Colab Notebooks/COMP421/Project/fsns.ipynb in call(self, images)

/usr/local/lib/python3.6/dist-packages/chainer/link.py in call(self, *args, **kwargs) 285 # forward is implemented in the child classes 286 forward = self.forward # type: ignore --> 287 out = forward(*args, **kwargs) 288 289 # Call forward_postprocess hook

/usr/local/lib/python3.6/dist-packages/chainer/links/connection/convolution_2d.py in forward(self, x) 249 return convolution_2d.convolution_2d( 250 x, self.W, self.b, self.stride, self.pad, dilate=self.dilate, --> 251 groups=self.groups, cudnn_fast=self.cudnn_fast) 252 253

/usr/local/lib/python3.6/dist-packages/chainer/functions/connection/convolution_2d.py in convolution_2d(x, W, b, stride, pad, cover_all, **kwargs) 656 else: 657 args = x, W, b --> 658 y, = fnode.apply(args) 659 return y

/usr/local/lib/python3.6/dist-packages/chainer/function_node.py in apply(self, inputs) 267 is_chainerx, in_data = _extract_apply_in_data(inputs) 268 --> 269 utils._check_arrays_forward_compatible(in_data, self.label) 270 271 if is_chainerx:

/usr/local/lib/python3.6/dist-packages/chainer/utils/init.py in _check_arrays_forward_compatible(arrays, label) 91 'Actual: {}'.format( 92 ' ({})'.format(label) if label is not None else '', ---> 93 ', '.join(str(type(a)) for a in arrays))) 94 95

TypeError: incompatible array types are mixed in the forward input (Convolution2DFunction). Actual: <class 'numpy.ndarray'>, <class 'cupy.core.core.ndarray'>, <class 'cupy.core.core.ndarray'>

Some online resources suggested adding to_gpu() at the end of every Convolution2DFunction, but that didn't work either. Can you please help me with it? Thank you.

opened by wdon021 10

Error while training on a different dataset.

I was able to get the code to train on the provided datasets. I created the char map and followed the procedure from other issues, mostly #13, to train on my own dataset, but I am facing the following error:

see1: 1 windows (created Wed Apr  4 23:32:48 2018) [164x47] (attached)
ubuntu@ip-172-31-15-55:~$ cd src/see
ubuntu@ip-172-31-15-55:~/src/see$ python3 chainer/train_svhn.py --char-map ../datasets/BornDigital/borndigital_char_map.json -b 32 ../datasets/BornDigital/curriculum.json ../datasets/BornDigital/logs/ --blank-label 0 -g 0 -lr 0.0001
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/usr/local/lib/python3.5/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py:131: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
Exception in main training loop: '109'
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 206, in update_core
    loss = _calc_loss(self._master, batch)
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in _calc_loss
    return model(*in_arrays)
  File "/home/ubuntu/src/see/chainer/utils/multi_accuracy_classifier.py", line 48, in __call__
    reported_accuracies = self.accfun(self.y, t)
  File "/home/ubuntu/src/see/chainer/metrics/svhn_softmax_metrics.py", line 54, in calc_accuracy
    label = "".join(map(self.label_to_char, label))
  File "/home/ubuntu/src/see/chainer/metrics/loss_metrics.py", line 181, in label_to_char
    return chr(self.char_map[str(label)])
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "chainer/train_svhn.py", line 257, in <module>
    trainer.run()
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 206, in update_core
    loss = _calc_loss(self._master, batch)
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in _calc_loss
    return model(*in_arrays)
  File "/home/ubuntu/src/see/chainer/utils/multi_accuracy_classifier.py", line 48, in __call__
    reported_accuracies = self.accfun(self.y, t)
  File "/home/ubuntu/src/see/chainer/metrics/svhn_softmax_metrics.py", line 54, in calc_accuracy
    label = "".join(map(self.label_to_char, label))
  File "/home/ubuntu/src/see/chainer/metrics/loss_metrics.py", line 181, in label_to_char
    return chr(self.char_map[str(label)])
KeyError: '109'

Thanks
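
The last frame of the traceback, chr(self.char_map[str(label)]), suggests that the char map is a dictionary keyed by the class id as a string, and that the KeyError means class id 109 has no entry in it. A minimal sketch of a check for such gaps follows. The helper find_missing_labels, the example char map, and the example labels are all hypothetical and not part of the repository:

```python
import json


def find_missing_labels(char_map, labels):
    """Return the sorted class ids that have no entry in the char map."""
    return sorted({int(label) for label in labels if str(label) not in char_map})


# Hypothetical char map in the layout implied by the traceback:
# keys are class ids stored as strings, values are Unicode code points.
char_map = json.loads('{"0": 9250, "1": 97, "2": 98}')

# Hypothetical class ids, e.g. collected from a ground-truth file.
labels = [1, 2, 109, 0, 109]

missing = find_missing_labels(char_map, labels)
print(missing)
```

If the returned list is not empty, the char map probably needs entries for those class ids. Alternatively, the class ids may come from a network whose output layer is larger than the char map.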

opened by saharudra 8

Size error in fsns demo

I was trying to run fsns_demo on a random downloaded image but got this error.

Traceback (most recent call last):
  File "fsns_demo.py", line 153, in <module>
    predictions, crops, grids = network(image[xp.newaxis, ...])
  File "/home/nandwani_vaibhav/text-detection-ctpn/see/chainer/datasets/fsns.py", line 521, in __call__
    h = self.localization_net(images)
  File "/home/nandwani_vaibhav/text-detection-ctpn/see/chainer/datasets/fsns.py", line 206, in __call__
    lstm_prediction = F.relu(self.lstm(in_feature))
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/links/connection/lstm.py", line 309, in __call__
    lstm_in = self.upward(x)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/links/connection/linear.py", line 129, in __call__
    return linear.linear(x, self.W, self.b)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/functions/connection/linear.py", line 118, in linear
    y, = LinearFunction().apply(args)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/function_node.py", line 230, in apply
    self._check_data_type_forward(in_data)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/function_node.py", line 298, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/functions/connection/linear.py", line 20, in check_type_forward
    x_type.shape[1] == w_type.shape[1],
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType: Invalid operation is performed in: LinearFunction (Forward)

Expect: in_types[0].shape[1] == in_types[1].shape[1]
Actual: 18144 != 3072

Is there any specific input size of image we should use? Or how to resolve this error?

opened by vaibhav541 7
ground truth format.

First of all, thank you for sharing the project.

I want to train the model with my own images and wonder what the format of the ground truth is.

BTW, my text looks like "vn324kl21lsfda". Which version do you suggest I use: fsns, svhn, or text recognition?

opened by qnkhuat 7
Cupy Unexpected Argument 'order'

Hi all,

python3 chainer/train_svhn.py datasets/svhn/curriculum.json logs --char-map datasets/svhn/svhn_char_map.json --blank-label 0 -b 16 --gpu 0

When I train with the SVHN dataset, I get this error:

Exception in main training loop: reshape() got an unexpected keyword argument 'order'
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/chainer/training/trainer.py", line 319, in run
    entry.extension(self)
  File "/home/user/Desktop/see/chainer/insights/bbox_plotter.py", line 128, in __call__
    self.render_rois(predictions, rois, bboxes, iteration, self.image.copy(), backprop_vis=backprop_visualizations)
  File "/home/user/Desktop/see/chainer/insights/bbox_plotter.py", line 143, in render_rois
    self.render_extracted_regions(dest_image, image, rois, num_timesteps)
  File "/home/user/Desktop/see/chainer/insights/bbox_plotter.py", line 201, in render_extracted_regions
    rois = self.xp.reshape(rois,(num_timesteps, -1, num_channels, height, width))
  File "/usr/local/lib/python3.5/dist-packages/cupy/manipulation/shape.py", line 33, in reshape
    return a.reshape(newshape, order=order)
TypeError: reshape() got an unexpected keyword argument 'order'

I know it's about cupy.

Versions: Chainer == 6.4 cupy-cuda101 ? 6.4 Cuda compilation tools, release 10.1, V10.1.243

I couldn't fix this issue. But when I comment out the line rois = self.xp.reshape(rois,(num_timesteps, -1, num_channels, height, width)) (and the for loop below it) in bbox_plotter.py, training works fine. How can I fix this error?

If I don't run this function (in bbox_plotter.py), will it affect the result?

opened by muzaffersenkal 6
Dataset enlarging condition fix and different termination

I made the fixes that I mentioned in the issue thread.

Sometimes when the dataset gets larger the training gets stuck and I have to kill it. I'm trying to figure out whether the cause is memory related or if it's something in the code.

opened by janzd 6
fsns_demo reshape error

I get an error at line 156 of fsns.py when running fsns_demo.py with an image from the SVHN test dataset:

images = F.reshape(images, (batch_size, num_channels, height, 4, -1))

The values are:

batch_size = 1
num_channels = 3
height = 48
width = 182

The error is raised in chainer.functions.reshape():

Invalid operation is performed in: Reshape (Forward)

Expect: prod(x.shape) % known_size(=576) == 0
Actual: 288 != 0

The command line is:

python ./chainer/fsns_demo.py --gpu -1 ./downloads/model/ model_35000.npz ./downloads/svhn/test/2.png ./datasets/svhn/svhn_char_map.json

thanks
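
The numbers in the error message can be reproduced by hand, which shows where the reshape goes wrong. The sketch below recomputes the check from the reported values. The helper name reshape_remainder is hypothetical, and the code mirrors only the divisibility test quoted in the error, not Chainer's implementation:

```python
from math import prod


def reshape_remainder(shape, target):
    """Return the remainder left over when the single -1 in `target` is inferred."""
    known_size = prod(d for d in target if d != -1)
    return prod(shape) % known_size


batch_size, num_channels, height, width = 1, 3, 48, 182

remainder = reshape_remainder(
    (batch_size, num_channels, height, width),
    (batch_size, num_channels, height, 4, -1),
)
print(remainder)  # 288, the value reported in the error message

# A width that is a multiple of 4 leaves no remainder.
print(reshape_remainder((1, 3, 48, 184), (1, 3, 48, 4, -1)))  # 0
```

With these values the known size is 1 * 3 * 48 * 4 = 576, so the reshape can only succeed when the image width is divisible by 4, which 182 is not.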

opened by fredO13 1
Recurrence code of SVHN original datasets

When I download the original SVHN dataset from here, I find train.tar.gz, test.tar.gz, and extra.tar.gz. I want to reproduce the 95.2% accuracy reported in the paper, but I don't quite understand which datasets you used as the training set and the validation set.

opened by WangzekunY 3
I got a question about the train_svhn.py

python train_svhn.py ../datasets/svhn/jsonfile/svhn_curriculum_specification.json ../datasets/svhn/runningLog/ -g 0 --char-map ../datasets/svhn/svhn_char_map.json -b 10
Traceback (most recent call last):
  File "train_fsns.py", line 84, in <module>
    train_dataset, validation_dataset = curriculum.load_dataset(0)
  File "/home/donglong_5/SEE/see-master/chainer/utils/baby_step_curriculum.py", line 40, in load_dataset
    train_dataset = self.dataset_class(self.train_curriculum[level], **self.dataset_args)
  File "/home/donglong_5/SEE/see-master/chainer/datasets/file_dataset.py", line 31, in __init__
    self.num_timesteps, self.num_labels = (int(i) for i in next(reader))
  File "/home/donglong_5/SEE/see-master/chainer/datasets/file_dataset.py", line 31, in <genexpr>
    self.num_timesteps, self.num_labels = (int(i) for i in next(reader))
ValueError: invalid literal for int() with base 10: '1 2'
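
The value '1 2' in the error message suggests that the first row of the ground-truth file was read as a single field, which happens when the file uses a different separator than the reader expects. The sketch below reproduces this under the assumption that the reader splits on tabs. The helper read_header is hypothetical and only mimics the line shown in the traceback:

```python
import csv
import io


def read_header(text, delimiter="\t"):
    """Parse the first row of a ground-truth file as two integers."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    num_timesteps, num_labels = (int(i) for i in next(reader))
    return num_timesteps, num_labels


# A tab-separated header row yields two fields and parses cleanly.
print(read_header("1\t2\n"))

# A space-separated header row stays one field, so int() is handed '1 2'.
try:
    read_header("1 2\n")
except ValueError as error:
    print(error)
```

If the assumption holds, rewriting the ground-truth file with tab-separated columns should let the dataset load.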

opened by zimo99 6
Text Recognition Training

When I try to train the Text Recognition Demo, I get this error: TypeError: Argument 'x' has incorrect type (expected cupy.core.core.ndarray, got Variable). Did you ever come across this issue?

opened by sne21star 1
FSNS demo

When preparing the dataset for the FSNS demo it says to run this command: python swap_classes.py <gt_file> <output_file_name> 0 133. What exactly is the gt_file?

opened by sne21star 3

Code for AAAI 2021 paper: Sequential End-to-end Network for Efficient Person Search

Forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE

EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)

The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals

This project modifies the TensorFlow Object Detection API code to predict oriented bounding boxes. It can be used for scene text detection.

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

This repository provides train & test code, dataset, det. & rec. annotation, evaluation script, annotation tool, and ranking.

Scan the MRZ code of a passport and extract the first name, last name, passport number, nationality, date of birth, expiration date, and personal number.

Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.

Code related to "Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity" paper

Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

This is the code for our paper DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows

Generate a list of papers with publicly available source code in the daily arxiv

Repository of conference publications and source code for first-/ second-authored papers published at NeurIPS, ICML, and ICLR.

Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

This repository contains the code for the paper "SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks"