Code for Text Prior Guided Scene Text Image Super-Resolution

Overview

Text Prior Guided Scene Text Image Super-Resolution

https://arxiv.org/abs/2106.15368

Jianqi Ma, Shi Guo, Lei Zhang
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China

Recovering TextZoom samples

TPGSR visualization

Environment:

Python, PyTorch, CUDA, NumPy

Other Python packages that may be required: pyyaml, cv2 (OpenCV), Pillow and imgaug
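A quick way to see which of the packages above are available is sketched below. It is a minimal check, not part of the repository; the import names (`yaml` for pyyaml, `PIL` for Pillow, `cv2` for OpenCV) are the usual ones for those packages, and the helper name `find_missing` is made up for illustration.

```python
import importlib.util

# Import names for the packages listed above
# (pyyaml imports as "yaml", Pillow as "PIL", OpenCV as "cv2").
REQUIRED_MODULES = ["torch", "numpy", "yaml", "cv2", "PIL", "imgaug"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only confirms that a module can be found; it does not verify versions or CUDA availability.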

Main idea

Single stage with loss

Multi-stage version

Configure your training

Download the pretrained recognizer from:

Aster: https://github.com/ayumiymk/aster.pytorch
MORAN: https://github.com/Canjie-Luo/MORAN_v2
CRNN: https://github.com/meijieru/crnn.pytorch

Unzip the code and change into '$TPGSR_ROOT$/', then place the pretrained recognizer weights in '$TPGSR_ROOT$/'.
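Before training, it can help to confirm that the weight files are where the code expects them. The sketch below is an illustration only: `missing_weights` is a hypothetical helper, and the file name `crnn.pth` is an assumption, so substitute the names of the files you actually downloaded.

```python
from pathlib import Path


def missing_weights(root, filenames):
    """Return the entries of `filenames` that are not files inside `root`."""
    root = Path(root)
    return [name for name in filenames if not (root / name).is_file()]


if __name__ == "__main__":
    # "crnn.pth" is an assumed file name; replace it with your own weight files.
    # Run this from $TPGSR_ROOT$ so that "." points at the repository root.
    print(missing_weights(".", ["crnn.pth"]))
```

An empty list means every listed file was found in the given directory.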

Download the TextZoom dataset:

https://github.com/JasonBoy1/TextZoom

Train the corresponding model (e.g. TPGSR-TSRN):

chmod a+x train_TPGSR-TSRN.sh
./train_TPGSR-TSRN.sh
or
# --arch         The architecture
# --batch_size   The batch size
# --STN          Use the STN net for alignment
# --mask         Use the contour mask
# --use_distill  Use the TP loss
# --gradient     Use the Gradient Prior Loss
# --sr_share     Share weights for the SR module
# --stu_iter     The number of iterations in the multi-stage version
# --vis_dir      The checkpoint directory
python3 main.py --arch="tsrn_tl_cascade" \
                --batch_size=48 \
                --STN \
                --mask \
                --use_distill \
                --gradient \
                --sr_share \
                --stu_iter=1 \
                --vis_dir='vis_TPGSR-TSRN'
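If you prefer to launch training from Python, for example to sweep over `--stu_iter`, the same flags can be assembled as an argument list. This is a sketch under the assumption that `main.py` accepts exactly the flags shown in the command above; `build_train_command` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_train_command(stu_iter=1, batch_size=48, vis_dir="vis_TPGSR-TSRN"):
    """Assemble the argument list for the TPGSR-TSRN training command."""
    return [
        "python3", "main.py",
        "--arch=tsrn_tl_cascade",       # the architecture
        f"--batch_size={batch_size}",   # the batch size
        "--STN",                        # STN net for alignment
        "--mask",                       # contour mask
        "--use_distill",                # TP loss
        "--gradient",                   # Gradient Prior Loss
        "--sr_share",                   # share weights for the SR module
        f"--stu_iter={stu_iter}",       # iterations in the multi-stage version
        f"--vis_dir={vis_dir}",         # the checkpoint directory
    ]


command = build_train_command()
print(shlex.join(command))
# To launch it, call subprocess.run(command, check=True) from the repository root.
```

Passing the arguments as a list avoids the shell line-continuation and quoting issues of a multi-line command.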

Run the test-prefixed shell script to test the corresponding model.

Testing is enabled by adding '--go_test' to the command in the shell file.
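As a rough illustration, a training command held as an argument list can be turned into a test command by appending the flag. The sketch assumes that testing uses the same flags as training plus `--go_test`; `to_test_command` and the example argument list are hypothetical.

```python
def to_test_command(train_command):
    """Return a copy of the command list with --go_test appended once."""
    command = list(train_command)
    if "--go_test" not in command:
        command.append("--go_test")
    return command


# Shortened example argument list, for illustration only.
example = ["python3", "main.py", "--arch=tsrn_tl_cascade", "--STN"]
print(" ".join(to_test_command(example)))
```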

Cite this paper:

@article{ma2021text,
title={Text Prior Guided Scene Text Image Super-resolution},
author={Ma, Jianqi and Guo, Shi and Zhang, Lei},
journal={arXiv preprint arXiv:2106.15368},
year={2021}
}
Comments
  • Request to add a license

    Hi Ma, great work on the paper and the implementation! I noticed that the repo does not have a license. Could you add one so that I can understand the scope of use for the code?

    Best, Jeswin James

    opened by jeswinoc 5
  • Error when running the demo?

    I run inference on a GPU but get the error below, and I don't know why. Which model is running on the CPU?

    loading pretrained crnn model from crnn.pth
    0%| | 0/4 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/main.py", line 76, in <module>
        main(config, args, opt_TPG=opt)
      File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/main.py", line 16, in main
        Mission.demo()
      File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/interfaces/super_resolution.py", line 1480, in demo
        images_sr = model(images_lr)
      File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/model/tsrn.py", line 195, in forward
        spatial_t_emb = self.infoGen(text_emb)
      File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/model/tsrn.py", line 103, in forward
        x = F.relu(self.bn1(self.tconv1(t_embedding)))
      File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 916, in forward
        return F.conv_transpose2d(
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument weight in method wrapper_slow_conv_transpose2d)

    opened by ThorPham 3
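  • Shared SR and non-shared TP

    For multi-stage training, the paper proposes a shared SR module and non-shared TP modules, but the code implements a non-shared SR module and shared TP modules. According to the training command you provide, python3 -u main.py --arch="tsrn_tl_cascade" --batch_size=48 --STN --mask --use_distill --gradient --sr_share --stu_iter=3 --vis_dir='vis_TPGSR-TSRN', --sr_share defaults to False but is True during training.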

    opened by zhaoguoqing12 2
  • Where is the final model?

    Hi there,

    I'd like to reproduce your amazing work but I can only find the pretrained models and not the final fine-tuned model. Am I correct?

    Could you please upload the final model?

    Thank you.

    opened by imthebilliejoe 0
  • about arch

    Amazing work! What is the difference between 'sem_tsrn', 'tsrn_c2f', 'tsrn_tl', 'tsrn_tl_cascade' and 'tsrn_tl_wmask'? I want to reproduce your work; which one should I select? Thanks!

    opened by xcc19970423 2
  • Issues about TSRN derived structures!

    Hi Ma, thanks for the nice work! I ran into some issues and hope for an early reply.

    1. The code mentions several TSRN-derived structures, such as 'sem_tsrn', 'tsrn_c2f', 'tsrn_tl', 'tsrn_tl_cascade' and 'tsrn_tl_wmask', but I have only managed to reproduce the 'tsrn_tl_cascade' arch. The 'sem_tsrn' arch should be the core arch, shouldn't it? Why is there no 'sem_tsrn' in the 'args.arch' choices? I still failed to reproduce it after adding 'sem_tsrn' to the choices of args.arch and setting args.arch='sem_tsrn'. I suspect something is wrong in the released code.

    2. Can you explain how these derived structures ('tsrn_c2f', 'tsrn_tl_cascade', 'tsrn_tl_wmask') differ, apart from the data differences between archs? Or could you add detailed instructions to the README.md? It is hard to understand the purpose of these structures from the code alone.

    Thx again!

    opened by yfaqh 2