STEFANN: Scene Text Editor using Font Adaptive Neural Network

Prasun Roy

Last update: Dec 11, 2022


Overview

Getting Started • Training Networks • External Links • Citation • License

The official GitHub repository for the paper on STEFANN: Scene Text Editor using Font Adaptive Neural Network.

Getting Started

1. Installing Dependencies

| Package | Source | Version | Tested version (updated on April 14, 2020) |
| --- | --- | --- | --- |
| Python | Conda | 3.7.7 | ✔️ |
| Pip | Conda | 20.0.2 | ✔️ |
| Numpy | Conda | 1.18.1 | ✔️ |
| Requests | Conda | 2.23.0 | ✔️ |
| TensorFlow | Conda | 2.1.0 | ✔️ |
| Keras | Conda | 2.3.1 | ✔️ |
| Pillow | Conda | 7.0.0 | ✔️ |
| Colorama | Conda | 0.4.3 | ✔️ |
| OpenCV | PyPI | 4.2.0 | ✔️ |
| PyQt5 | PyPI | 5.14.2 | ✔️ |

💥 Quick installation

Step 1: Install Git and Conda package manager (Miniconda / Anaconda)

Step 2: Update and configure Conda

conda update conda
conda config --set env_prompt "({name}) "

Step 3: Clone this repository and change directory to repository root

git clone https://github.com/prasunroy/stefann.git
cd stefann

Step 4: Create an environment and install dependencies

On Linux and Windows

To create CPU environment: conda env create -f release/env_cpu.yml
To create GPU environment: conda env create -f release/env_gpu.yml

On macOS

To create CPU environment: conda env create -f release/env_osx.yml
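
The choice of environment file above can be sketched as a small lookup. This is only an illustration: the helper name `env_file_for` is hypothetical and not part of the repository, and it assumes the OS names returned by Python's `platform.system()` ("Linux", "Windows", "Darwin").

```python
import platform


def env_file_for(system, use_gpu=False):
    """Return the Conda environment file listed in this README for an OS name.

    `system` is expected to be a value such as platform.system() returns.
    This helper is a hypothetical convenience, not part of STEFANN.
    """
    if system == "Darwin":  # macOS
        if use_gpu:
            raise ValueError("this README lists only a CPU environment for macOS")
        return "release/env_osx.yml"
    if system in ("Linux", "Windows"):
        return "release/env_gpu.yml" if use_gpu else "release/env_cpu.yml"
    raise ValueError("no environment file listed for platform: " + system)


def create_command(system=None, use_gpu=False):
    """Build the `conda env create` command line as a string."""
    if system is None:
        system = platform.system()
    return "conda env create -f " + env_file_for(system, use_gpu)
```

For example, `create_command("Linux", use_gpu=True)` builds the same command shown above for the GPU environment.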

💥 Quick test

Step 1: Download models and pretrained checkpoints into `release/models` directory

Step 2: Download sample images and extract into `release/sample_images` directory

stefann/
├── ...
├── release/
│   ├── models/
│   │   ├── colornet.json
│   │   ├── colornet_weights.h5
│   │   ├── fannet.json
│   │   └── fannet_weights.h5
│   ├── sample_images/
│   │   ├── 01.jpg
│   │   ├── 02.jpg
│   │   └── ...
│   └── ...
└── ...
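
Before launching the application, it may help to confirm that the four model files from the tree above are in place. The following is a minimal sketch; the function name `missing_model_files` is hypothetical, and the check only tests for file presence, not file integrity.

```python
from pathlib import Path

# File names taken from the directory tree in this README.
REQUIRED_MODEL_FILES = (
    "colornet.json",
    "colornet_weights.h5",
    "fannet.json",
    "fannet_weights.h5",
)


def missing_model_files(release_dir):
    """Return the required model files that are absent from <release_dir>/models."""
    models_dir = Path(release_dir) / "models"
    return [name for name in REQUIRED_MODEL_FILES if not (models_dir / name).is_file()]
```

Calling `missing_model_files("release")` from the repository root returns an empty list when all four files are present.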

Step 3: Activate environment

To activate CPU environment: conda activate stefann-cpu
To activate GPU environment: conda activate stefann-gpu

Step 4: Change directory to `release` and run STEFANN

cd release
python stefann.py

2. Editing Results 😆

Each image pair consists of the original image (Left) and the edited image (Right).

Training Networks

1. Downloading Datasets

Download datasets and extract the archives into `datasets` directory under repository root.

stefann/
├── ...
├── datasets/
│   ├── fannet/
│   │   ├── pairs/
│   │   ├── train/
│   │   └── valid/
│   └── colornet/
│       ├── test/
│       ├── train/
│       └── valid/
└── ...

📌 Description of `datasets/fannet`

This dataset is used to train FANnet and it consists of 3 directories: fannet/pairs, fannet/train and fannet/valid. The directories fannet/train and fannet/valid consist of 1015 and 300 sub-directories respectively, each corresponding to one specific font. Each font directory contains 64x64 grayscale images of 62 English alphanumeric characters (10 numerals + 26 upper-case letters + 26 lower-case letters). The filename format is xx.jpg where xx is the ASCII value of the corresponding character (e.g. "48.jpg" implies an image of character "0"). The directory fannet/pairs contains 50 image pairs, each corresponding to a random font from fannet/valid. Each image pair is horizontally concatenated to a dimension of 128x64. The filename format is id_xx_yy.jpg where id is the image identifier, xx and yy are the ASCII values of source and target characters respectively (e.g. "00_65_66.jpg" implies a transformation from source character "A" to target character "B" for the image with identifier "00").
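
The two filename conventions described above can be decoded with a few lines of Python. This is a sketch for illustration only; the helper names are hypothetical and do not come from `fannet.py`.

```python
import string
from pathlib import Path

# The 62 characters covered by each font directory:
# 10 numerals + 26 upper-case letters + 26 lower-case letters.
ALPHANUMERIC = string.digits + string.ascii_uppercase + string.ascii_lowercase


def parse_font_image(filename):
    """Map a per-font image name such as '48.jpg' to its character ('0')."""
    return chr(int(Path(filename).stem))


def parse_pair_image(filename):
    """Map a pair name such as '00_65_66.jpg' to (identifier, source, target)."""
    image_id, source_code, target_code = Path(filename).stem.split("_")
    return image_id, chr(int(source_code)), chr(int(target_code))


def font_image_name(character):
    """Inverse of parse_font_image: 'A' -> '65.jpg'."""
    return "{}.jpg".format(ord(character))
```

With these helpers, `parse_pair_image("00_65_66.jpg")` gives `("00", "A", "B")`, matching the example in the description.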

📌 Description of `datasets/colornet`

This dataset is used to train Colornet and it consists of 3 directories: colornet/test, colornet/train and colornet/valid. Each directory consists of 5 sub-directories: _color_filters, _mask_pairs, input_color, input_mask and output_color. The directory _color_filters contains synthetically generated color filters of dimension 64x64 including both solid and gradient colors. The directory _mask_pairs contains a set of 64x64 grayscale image pairs selected at random from 1315 available fonts in datasets/fannet. Each image pair is horizontally concatenated to a dimension of 128x64. For colornet/train and colornet/valid each color filter is applied on each mask pair. This results in 64x64 image triplets of color source image, binary target image and color target image in input_color, input_mask and output_color directories respectively. For colornet/test one color filter is applied only on one mask pair to generate similar image triplets. With a fixed set of 100 mask pairs, 80000 colornet/train and 20000 colornet/valid samples are generated from 800 and 200 color filters respectively. With another set of 50 mask pairs, 50 colornet/test samples are generated from 50 color filters.
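
The sample counts quoted above follow from how filters and mask pairs are combined: every filter with every mask pair for train and valid, and one filter per mask pair for test. The sketch below reproduces the arithmetic with integer placeholders instead of images. The function names are hypothetical, and the one-to-one pairing order used for the test split is an assumption.

```python
from itertools import product


def all_combinations(color_filters, mask_pairs):
    """Apply every color filter to every mask pair (train and valid splits)."""
    return list(product(color_filters, mask_pairs))


def one_to_one(color_filters, mask_pairs):
    """Apply each color filter to exactly one mask pair (test split)."""
    return list(zip(color_filters, mask_pairs))


# Integer indices stand in for the actual images.
mask_pairs = range(100)                                         # fixed set of 100 mask pairs
train_samples = all_combinations(range(800), mask_pairs)        # 800 color filters
valid_samples = all_combinations(range(800, 1000), mask_pairs)  # 200 color filters
test_samples = one_to_one(range(50), range(50))                 # 50 filters, 50 other mask pairs

print(len(train_samples), len(valid_samples), len(test_samples))  # 80000 20000 50
```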

2. Training FANnet and Colornet

Step 1: Activate environment

To activate CPU environment: conda activate stefann-cpu
To activate GPU environment: conda activate stefann-gpu

Step 2: Change directory to project root

cd stefann

Step 3: Configure and train FANnet

To configure training options, edit the configurations section (lines 40-72) of fannet.py
To start training: python fannet.py

☁️ Check this notebook hosted at Kaggle for an interactive demonstration of FANnet.

Step 4: Configure and train Colornet

To configure training options, edit the configurations section (lines 38-65) of colornet.py
To start training: python colornet.py

☁️ Check this notebook hosted at Kaggle for an interactive demonstration of Colornet.

External Links

Project • Paper • Supplementary Materials • Datasets • Models • Sample Images

Citation

@InProceedings{Roy_2020_CVPR,
  title     = {STEFANN: Scene Text Editor using Font Adaptive Neural Network},
  author    = {Roy, Prasun and Bhattacharya, Saumik and Ghosh, Subhankar and Pal, Umapada},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}

License

Copyright 2020 by the authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Made with ❤️ and 🍕 on Earth.

Comments

How many epochs does it take to converge either network

Hi,

Thanks for sharing your code. I am trying to reproduce your results. How many epochs does it take to converge to results similar to those shared in your paper?

Thanks
question

opened by ThomasDelteil 4
Update for osx development

Hi, I did some updates to the cpu dependencies because I found something that was blocking the execution on osx catalina.

As you can see I moved tensorflow to pip because conda was throwing an error. For the same reason I've downgraded opencv to version 4.1.x.

Due to a conflict with Cocoa libs on mac, I forced the use of pyqt5 by adding the opencv-contrib-python-headless.

After this, everything is working fine as expected.
bug enhancement

opened by mauromazzei 3
ValueError: No gradients provided for any variable

Hello, prasunroy. Thank you for the amazing work. I tried to train FANnet by running python stefann/fannet.py following your instructions, but got ValueError: No gradients provided for any variable, and training stopped. Could you suggest a way to fix this bug?

bug

opened by XuyangPan 2
About character replacement

Thank you very much for your contribution in this field. I still have some questions about the character replacement part. As you said in the paper, when training FANnet and Colornet, the input character size is 64x64, but when performing character replacement in a real scene, some characters are much larger than 64x64 (for example, 128x128). How did you solve this situation? Do you directly resize the generated characters to 128x128, or is there some other way?
question

opened by BangdongChen 1
h5py version (integrity failure)
I followed the installation instructions, but kept getting the following message.

[DEBUG] Loading application... integrity failure

There was no problem with my CUDA settings, and tensorflow was picking up my GPU just fine, so I tried taking stuff out of try/exceptions and found out that keras was failing on loading the models.

original_keras_version = f.attrs['keras_version'].decode('utf8')

AttributeError: 'str' object has no attribute 'decode'

Googled it, and found that it works at h5py version 2.10.0. I think it would be nice if h5py versions were also included in conda settings.

Thank you anyways for your awesome work. I'll try it out now since my bug is fixed.
bug enhancement
opened by JouyonP 2
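
The `AttributeError` in the issue above arises because `.decode('utf8')` is called on a value that is already a `str`. A tolerant conversion can be sketched as follows. The helper `as_text` is hypothetical and is not part of Keras or h5py; the fix the issue itself reports is pinning h5py to 2.10.0.

```python
def as_text(value, encoding="utf8"):
    """Return `value` as str whether it arrives as bytes or as str.

    Hypothetical helper: it only illustrates why calling .decode() on a str
    fails, and it does not patch the Keras model loader.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Both `as_text(b"2.3.1")` and `as_text("2.3.1")` evaluate to `"2.3.1"`, whereas `"2.3.1".decode("utf8")` raises the error quoted in the issue.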
Questions about the input of FANnet.

Hello, after reading your paper, I have some questions about the input of FANnet. In the paper, one of the inputs to FANnet is "a one-hot encoding v of length 26 of the target character", where 26 represents 26 capital letters. Have you ever tried to change 26 to 62 (uppercase letters + lowercase letters + numbers), because the same font theoretically has some commonalities, so they should also be able to complete the font style transfer. I am interested in this question. Do you have any comments or suggestions?
question

opened by BangdongChen 0
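
For readers unfamiliar with the encoding discussed in the question above, a one-hot vector of length 26 for a capital letter, and the 62-character variant the question proposes, can be sketched like this. The ordering of characters within the vector is an assumption made for illustration and may differ from what `fannet.py` actually uses.

```python
import string

UPPERCASE_26 = string.ascii_uppercase
# Assumed ordering for the 62-character variant: digits, upper-case, lower-case.
ALPHANUMERIC_62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def one_hot(character, alphabet=UPPERCASE_26):
    """Return a list with a single 1 at the position of `character` in `alphabet`."""
    vector = [0] * len(alphabet)
    vector[alphabet.index(character)] = 1  # raises ValueError for unknown characters
    return vector
```

Here `one_hot("B")` has length 26 with the 1 at index 1, while `one_hot("b", ALPHANUMERIC_62)` has length 62.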
SSIM calculation in fannet
I appreciate your excellent work on text editing. I tried to run FANnet with pretrained model on your datasets. So I downloaded the pretrained weights from here and datasets from here following README.

To generate results using the valid set as the input, I modified fannet.py and ran the following code

from skimage.metrics import structural_similarity as ssim

for data in valid_datagen.flow():
    [x, onehot], y = data
    out = fannet.predict([x, onehot])
    n = x.shape[0]
    for i in range(n):
        _x = x[i].reshape(64, 64)
        _gt = y[i].reshape(64, 64)
        _out = out[i].reshape(64, 64)
        _, _out_bin = cv2.threshold(_out, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        sv = ssim(_gt, _out_bin, data_range=255, gaussian_weights=True, sigma=1.5, use_sample_covariance=False)
        print(sv)

But the SSIM value sv is far from what was claimed in the paper. I also tried to calculate the average SSIM with respect to different source characters, and there is still a large gap. So I am wondering whether there is a mistake in how I ran the model or in the SSIM calculation.
question
opened by thuliu-yt16 0
Performance of colornet

Hello, I noticed that in the stefann.py file, you have provided three methods for style transfer, transfer_color_pal, transfer_color_max, and the colornet implementation. I uncommented the lines from 480-487 in the stefann.py file to use the colornet model. When I tested the application on images from 'sample_images' folder using the colornet method with the provided pretrained weights, it did not work very well and produced blurry and inconsistent results. The results of colornet did not match the results given in the 'editing_examples' folder.

Given Result (Left - Original Image, Right - stefann generated image) Output using colornet

There are many more cases where the colornet model is not performing as expected. Could you please help me with this?
question

opened by TanmayKhot 1
Support for Chinese Language

Hi, thanks for your hard work on this project. It's really cool! I've seen issue #7 but I still have some doubts. I would like to try to replace English text with its corresponding Chinese translation, but how can I do so if characters are stored in jpg files named with ASCII numbers? Chinese is not included in ASCII. Another question regarding Chinese: do I also need to generate new images, one for each character, in the colornet directory? Your help would be much appreciated!
enhancement help wanted

opened by lorisgir 1

Owner

Prasun Roy

Hello World! :)

GitHub https://prasunroy.github.io/stefann

