Code for Text Prior Guided Scene Text Image Super-Resolution

Last update: Dec 26, 2022


Overview

Text Prior Guided Scene Text Image Super-Resolution

https://arxiv.org/abs/2106.15368

Jianqi Ma, Shi Guo, Lei Zhang
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China

Recovering TextZoom samples

Environment:

Other Python packages that may be needed: pyyaml, cv2 (OpenCV), Pillow and imgaug.

Main idea

Single stage with loss

Multi-stage version

Configure your training

Download the pretrained recognizer from:

Aster: https://github.com/ayumiymk/aster.pytorch  
MORAN:  https://github.com/Canjie-Luo/MORAN_v2  
CRNN: https://github.com/meijieru/crnn.pytorch

Unzip the code, change into '$TPGSR_ROOT$/', and place the recognizer's pretrained weights in '$TPGSR_ROOT$/'.
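Before training, it can help to confirm that the recognizer weights really are where the code will look for them. The following is a minimal sketch, not part of the repository: `missing_weights` is a hypothetical helper, and the only file name taken from this page is `crnn.pth`, which appears in the demo log quoted in the comments. The Aster and MORAN file names depend on what you downloaded, so add them to the list yourself.

```python
from pathlib import Path


def missing_weights(root, filenames):
    """Return the names in `filenames` that are not regular files directly under `root`."""
    root = Path(root)
    return [name for name in filenames if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: run from $TPGSR_ROOT$/, with the CRNN checkpoint saved as crnn.pth.
    absent = missing_weights(".", ["crnn.pth"])
    print("missing:", ", ".join(absent) if absent else "none")
```

An empty result means every listed checkpoint was found in the root directory.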

Download the TextZoom dataset:

https://github.com/JasonBoy1/TextZoom

Train the corresponding model (e.g. TPGSR-TSRN):

chmod a+x train_TPGSR-TSRN.sh
./train_TPGSR-TSRN.sh

or call main.py directly:

python3 main.py --arch="tsrn_tl_cascade" \
                --batch_size=48 \
                --STN \
                --mask \
                --use_distill \
                --gradient \
                --sr_share \
                --stu_iter=1 \
                --vis_dir='vis_TPGSR-TSRN'

The options are:

--arch          the architecture
--batch_size    the batch size
--STN           use an STN for alignment
--mask          use the contour mask
--use_distill   use the TP loss
--gradient      use the Gradient Prior Loss
--sr_share      share weights across the SR modules
--stu_iter      the number of iterations in the multi-stage version
--vis_dir       the checkpoint directory
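Several flags in the command above (--STN, --mask, --use_distill, --gradient, --sr_share) are passed without a value. The sketch below shows how such switches behave if they are declared with argparse's `store_true` action, which is an assumption about main.py rather than a copy of it. The default values here simply mirror the example command and are not confirmed repository defaults.

```python
import argparse


def build_parser():
    """Illustrative subset of the training options; not the repository's own parser."""
    parser = argparse.ArgumentParser(description="TPGSR-style flags (sketch)")
    parser.add_argument("--arch", default="tsrn_tl_cascade")
    parser.add_argument("--batch_size", type=int, default=48)
    parser.add_argument("--stu_iter", type=int, default=1)
    parser.add_argument("--vis_dir", default="vis_TPGSR-TSRN")
    # A store_true switch is False unless the flag appears on the command line.
    for switch in ("STN", "mask", "use_distill", "gradient", "sr_share"):
        parser.add_argument(f"--{switch}", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--STN", "--sr_share", "--stu_iter=3"])
    print(args.sr_share, args.mask, args.stu_iter)  # True False 3
```

Under that assumption, leaving a switch out of the command disables the corresponding feature, so the flag list in the training script determines which losses and modules are active.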

Run the test-prefixed shell script to test the corresponding model.

Add '--go_test' to the command in the shell file.
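Training and testing differ only in the extra '--go_test' flag, so the two commands can be generated from one place. This is a minimal sketch: `build_command` is a hypothetical helper, and it assumes the test script otherwise uses the same options as the training example, which this page does not confirm.

```python
import shlex


def build_command(arch="tsrn_tl_cascade", batch_size=48, stu_iter=1,
                  vis_dir="vis_TPGSR-TSRN", go_test=False):
    """Return the main.py invocation as an argument list (hypothetical helper)."""
    cmd = [
        "python3", "main.py",
        f"--arch={arch}",
        f"--batch_size={batch_size}",
        "--STN", "--mask", "--use_distill", "--gradient", "--sr_share",
        f"--stu_iter={stu_iter}",
        f"--vis_dir={vis_dir}",
    ]
    if go_test:
        # Appended only for evaluation, as described for the test-prefixed script.
        cmd.append("--go_test")
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command()))              # training command
    print(shlex.join(build_command(go_test=True)))  # test command
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting and line-continuation problems.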

Cite this paper:

@article{ma2021text,
title={Text Prior Guided Scene Text Image Super-resolution},
author={Ma, Jianqi and Guo, Shi and Zhang, Lei},
journal={arXiv preprint arXiv:2106.15368},
year={2021}
}

Comments

Request to add a license

Hi Ma, great work on the paper and the implementation! I noticed that the repo does not have a license. I was wondering if you could add one so that I can understand the scope of use for the code.

Best, Jeswin James

opened by jeswinoc 5
Error when running the demo?

I use the GPU for inference, but I don't know why this error occurs. Which model runs on the CPU?

loading pretrained crnn model from crnn.pth
0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/main.py", line 76, in <module>
    main(config, args, opt_TPG=opt)
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/main.py", line 16, in main
    Mission.demo()
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/interfaces/super_resolution.py", line 1480, in demo
    images_sr = model(images_lr)
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/model/tsrn.py", line 195, in forward
    spatial_t_emb = self.infoGen(text_emb)
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/model/tsrn.py", line 103, in forward
    x = F.relu(self.bn1(self.tconv1(t_embedding)))
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 916, in forward
    return F.conv_transpose2d(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument weight in method wrapper_slow_conv_transpose2d)

opened by ThorPham 3
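Shared SR and non-shared TP

For multi-stage training, the paper proposes a shared SR module and non-shared TP, but the code implements non-shared SR and shared TP. In the training command you provide, python3 -u main.py --arch="tsrn_tl_cascade" --batch_size=48 --STN --mask --use_distill --gradient --sr_share --stu_iter=3 --vis_dir='vis_TPGSR-TSRN', --sr_share defaults to False but is True during training.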

opened by zhaoguoqing12 2
Where is the final model?

Hi there,

I'd like to reproduce your amazing work but I can only find the pretrained models and not the final fine-tuned model. Am I correct?

Could you please upload the final model?

Thank you.

opened by imthebilliejoe 0
about arch

Amazing work! Hello, what is the difference between 'sem_tsrn', 'tsrn_c2f', 'tsrn_tl', 'tsrn_tl_cascade' and 'tsrn_tl_wmask'? I want to reproduce your work; which one should I select? Thanks!

opened by xcc19970423 2
Issues about TSRN derived structures!
Hi Ma, thanks for your nice work! I ran into some issues and am hoping for an early reply.

There are several TSRN-derived structures mentioned in the code, such as 'sem_tsrn', 'tsrn_c2f', 'tsrn_tl', 'tsrn_tl_cascade' and 'tsrn_tl_wmask'. So far I have only reproduced the 'tsrn_tl_cascade' arch successfully. The 'sem_tsrn' arch should be the core arch, shouldn't it? Then why is there no 'sem_tsrn' in the 'args.arch' choices? I still failed to reproduce it after adding 'sem_tsrn' to the choices of args.arch and setting args.arch='sem_tsrn'. I guess there may be something wrong in the released code.

Can you explain the differences between these derived structures, such as 'tsrn_c2f', 'tsrn_tl_cascade' and 'tsrn_tl_wmask', apart from the 'data difference' between archs? Or could you please give some detailed instructions in the README.md? It's a bit hard to understand the purpose of these structures when reading the code.

Thx again!
opened by yfaqh 2


Scene Text Retrieval via Joint Text Detection and Similarity Learning

This is the code of "Scene Text Retrieval via Joint Text Detection and Similarity Learning". For more details, please refer to our CVPR2021 paper.

79 Nov 29, 2022

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!

Text to speech (using Python) Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and co

19 Jun 30, 2022

Code for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021

This repo provides the code of the following papers: (GAR) "Generation-Augmented Retrieval for Open-domain Question Answering", ACL 2021 (RIDER) "Read

49 Dec 26, 2022

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

41 Jan 3, 2023

2021 AI CUP Competition on Traditional Chinese Scene Text Recognition - Intermediate Contest

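Traditional Chinese scene text recognition: code description. Team: 這就是我. Members: 蔣明憲, 唐碩謙, 黃玥菱, 林冠霆, 蕭靖騰. Contents: environment; package installation; folder layout; preprocessing: creating the detection training annotation files; preprocessing: creating the classification training samples. part.py: crops the classification training samples from the JSON. Class.py: sorts the cropped samples into folders by character.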

3 Jan 14, 2022

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

740 Dec 24, 2022

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipBERT is designed based on 2D CNNs and transformers, and uses a sparse sampling strategy to enable efficient end-to-end video-and-language learning.

612 Jan 4, 2023

This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the paper of the same name, which was a

79 Dec 27, 2022

Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment Analysis with Affective Knowledge. Proceedings of EMNLP 2021

AAGCN-ACSA EMNLP 2021 Introduction This repository was used in our paper: Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment An

36 Dec 18, 2022

Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1

169 Dec 21, 2022
