Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Overview

The official code of ABINet (CVPR 2021, Oral).

ABINet uses a vision model and an explicit language model to recognize text in the wild; the two are trained in an end-to-end way. The language model (BCN) achieves a bidirectional language representation by simulating a cloze test, and additionally adopts an iterative correction strategy.

(Figure: the ABINet framework)
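
The loop below is a minimal sketch of that pipeline (the function and argument names are illustrative assumptions, not the repo's actual API): the vision model proposes characters, BCN re-predicts each position from bidirectional context, and a fusion step feeds the corrected prediction back for the next round.

    import torch

    def abinet_infer(vision_model, language_model, fusion, image, iter_size=3):
        # Sketch only: the callables and shapes are assumptions, not the repo's API.
        vis_logits = vision_model(image)            # (B, T, num_classes)
        fused_logits = vis_logits
        for _ in range(iter_size):
            # The language model (BCN) re-estimates every position from
            # bidirectional context, like filling in the blanks of a cloze test.
            lang_logits = language_model(torch.softmax(fused_logits, dim=-1))
            # Fuse the visual and linguistic predictions, then feed the result
            # back in for another round of iterative correction.
            fused_logits = fusion(vis_logits, lang_logits)
        return fused_logits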

Runtime Environment

  • We provide a pre-built Docker image, built from docker/Dockerfile

  • Running in Docker

    $ git clone git@github.com:FangShancheng/ABINet.git
    $ docker run --gpus all --rm -ti --ipc=host -v $(pwd)/ABINet:/app fangshancheng/fastai:torch1.1 /bin/bash
    
  • Or (untested) install the dependencies directly

    pip install -r requirements.txt
    

Datasets

  • Training datasets

    1. MJSynth (MJ)
    2. SynthText (ST)
    3. WikiText-103, which is only used for pre-training the language model
  • Evaluation datasets. The LMDB datasets can be downloaded from BaiduNetdisk (passwd: 1dbv) or GoogleDrive.

    1. ICDAR 2013 (IC13)
    2. ICDAR 2015 (IC15)
    3. IIIT5K Words (IIIT)
    4. Street View Text (SVT)
    5. Street View Text-Perspective (SVTP)
    6. CUTE80 (CUTE)
  • The structure of the data directory is:

    data
    ├── charset_36.txt
    ├── evaluation
    │   ├── CUTE80
    │   ├── IC13_857
    │   ├── IC15_1811
    │   ├── IIIT5k_3000
    │   ├── SVT
    │   └── SVTP
    ├── training
    │   ├── MJ
    │   │   ├── MJ_test
    │   │   ├── MJ_train
    │   │   └── MJ_valid
    │   └── ST
    ├── WikiText-103.csv
    └── WikiText-103_eval_d1.csv
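
Each of these LMDB databases stores samples under string keys. Below is a minimal read sketch, assuming the common 'num-samples' / 'image-%09d' / 'label-%09d' key convention produced by create_lmdb_dataset.py-style converters (an assumption; verify against dataset.py):

    import lmdb

    env = lmdb.open('data/training/MJ/MJ_train', readonly=True, lock=False)
    with env.begin(write=False) as txn:
        num_samples = int(txn.get('num-samples'.encode()))       # total sample count
        idx = 1                                                  # indices are 1-based
        image_bin = txn.get(('image-%09d' % idx).encode())       # raw image bytes
        label = txn.get(('label-%09d' % idx).encode()).decode()  # ground-truth text
    print(num_samples, label)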
    

Pretrained Models

Get the pretrained models from BaiduNetdisk (passwd: kwck) or GoogleDrive. The performance of the pretrained models is summarized as follows:

Model IC13 SVT IIIT IC15 SVTP CUTE AVG
ABINet-SV 97.1 92.7 95.2 84.0 86.7 88.5 91.4
ABINet-LV 97.0 93.4 96.4 85.9 89.5 89.2 92.7

Training

  1. Pre-train vision model
    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml
    
  2. Pre-train language model
    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
    
  3. Train ABINet
    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml
    

Note:

  • You can set the checkpoint paths for the vision and language models separately to start from specific pretrained weights, or set them to None to train from scratch; see the sketch below.
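
    For instance, the relevant entries in configs/train_abinet.yaml look roughly like this (the paths are illustrative and the key names are assumptions; check the shipped configs):

        model:
          vision:
            checkpoint: workdir/pretrain-vision-model/best-pretrain-vision-model.pth  # or None
          language:
            checkpoint: workdir/pretrain-language-model/pretrain-language-model.pth   # or None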

Evaluation

CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_abinet.yaml --phase test --image_only

Additional flags:

  • --checkpoint /path/to/checkpoint sets the path of the model to evaluate
  • --test_root /path/to/dataset sets the path of the evaluation dataset
  • --model_eval [alignment|vision] selects which sub-model to evaluate
  • --image_only disables dumping visualizations of attention masks
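
For example, a complete evaluation run might look like this (the checkpoint path is illustrative):

    CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_abinet.yaml --phase test \
        --checkpoint workdir/train-abinet/best-train-abinet.pth \
        --test_root data/evaluation --model_eval alignment --image_only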

Visualization

Success and failure cases on low-quality images:

(Figure: success and failure cases)

Citation

If you find our method useful for your research, please cite

@inproceedings{fang2021read,
  title={Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition},
  author={Fang, Shancheng and Xie, Hongtao and Wang, Yuxin and Mao, Zhendong and Zhang, Yongdong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

License

This project is free only for academic research purposes, licensed under the 2-clause BSD License - see the LICENSE file for details.

Feel free to contact [email protected] if you have any questions.

Comments
  • nan loss

    Hi, thanks for your great work @FangShancheng. I'm trying to train ABINet with Vietnamese characters. First, I trained the language model using my own Vietnamese tokens but got a NaN loss. Can you help me fix this?

    opened by CuongNN218 8
  • Problem when using other datasets

    I want to use another dataset. After converting it to mdb format with tools/create_lmdb_dataset.py, both training and testing always fail with:

        File "/home/yuy/workspace/ABINet-main/dataset.py", line 137, in __getitem__
            if not self.is_training: assert idx == idx_new, f'idx {idx} != idx_new {idx_new} during testing.'
        AssertionError: idx 18 != idx_new 11558 during testing

    What could be the cause? Does the original dataset need any special preprocessing?

    opened by swqsyy 4
  • Failed to reproduce results in the paper

    Thanks for releasing your source code. Unfortunately, I got poor test results (about 4% below the official results in the paper).

    Are there any important training strategies needed to reproduce the results? Or has anyone else met the same issue?

    opened by mandal4 4
  • No module named 'fastai.callback.general_sched'

        File "main.py", line 10, in <module>
            from fastai.callback.general_sched import GeneralScheduler, TrainingPhase
        ModuleNotFoundError: No module named 'fastai.callback.general_sched'

    opened by ncbwct 2
  • CTCLoss is nan

    @FangShancheng @AK391 When I use CRNN+MultiLosses for training, the loss is normal. But if I change to CRNN+CTCLoss, the loss for the first 4~5 iterations is normal, then becomes inf, and after that NaN. Is there any problem in my implementation of CTCLoss?

    import torch
    import torch.nn as nn
    from torch.nn import CTCLoss


    class MyCTCLoss(nn.Module):
        def __init__(self):
            super().__init__()
            # zero_infinity=False keeps infinite losses instead of zeroing
            # them out, so one degenerate batch can propagate inf -> nan.
            self.ctc = CTCLoss(reduction="mean", zero_infinity=False)

        @property
        def last_losses(self):
            return self.losses

        def _flatten(self, sources, lengths):
            # Concatenate per-sample targets, truncated to their true lengths.
            return torch.cat([t[:l] for t, l in zip(sources, lengths)])

        def _ctc_loss(self, output, gt_labels, gt_lengths, idx=None, record=True):
            loss_name = output.get('name')
            pt_logits, weight = output['logits'], output['loss_weight']
            assert pt_logits.shape[0] % gt_labels.shape[0] == 0

            # CTCLoss expects (T, B, C) log-probabilities.
            pt_logits = pt_logits.permute(1, 0, 2)
            log_pt_logits = torch.log_softmax(pt_logits, dim=-1)
            pt_lengths = output['pt_lengths']
            # gt_lengths - 1 drops the trailing end token from each target.
            flat_gt_labels = self._flatten(gt_labels, gt_lengths - 1)
            loss = self.ctc(log_pt_logits, flat_gt_labels, pt_lengths, gt_lengths - 1)

            if record and loss_name is not None:
                self.losses[f'{loss_name}_loss'] = loss

            return loss

        def forward(self, outputs, *args):
            self.losses = {}
            return self._ctc_loss(outputs, *args, record=False)
    
    opened by dagongji10 1
  • How to prepare a custom dataset for training?

    I tried to train your model with a custom dataset and got this:

        [2021-10-11 13:09:35,820 main.py:222 INFO train-abinet] Construct dataset.
        Traceback (most recent call last):
          File "main.py", line 246, in <module>
            main()
          File "main.py", line 224, in main
            else: data = _get_databaunch(config)
          File "main.py", line 80, in _get_databaunch
            train_ds = _get_dataset(ImageDataset, config.dataset_train_roots, True, config)
          File "main.py", line 47, in _get_dataset
            datasets = [ds_type(p, **kwargs) for p in paths]
          File "main.py", line 47, in <listcomp>
            datasets = [ds_type(p, **kwargs) for p in paths]
          File "/content/drive/MyDrive/OCR/ABINet/dataset.py", line 49, in __init__
            self.length = int(txn.get('num-samples'.encode()))
        TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

    And here's how my data is stored:

        ABINet
        ├── data
        │   ├── img
        │   │   └── (images here)
        │   └── lmdb
        │       ├── data.mdb
        │       └── lock.mdb

    Thanks

    opened by sepehratwork 1
  • Question about adapting to text-line recognition and variable-size images

    Thanks for your amazing repo! I have a text-line dataset and I want to adapt your work to it. Can the system handle variable-size images (not the fixed 32 x 128) and variable text lengths (according to the max text length of each mini-batch, rather than a fixed dataset_max_length)? Could you point me to a solution? Thanks!

    opened by AndiezPham 1
  • Strange attention heatmaps

    Thanks for open-sourcing ABINet; I have benefited a lot from it! I migrated ABINet to a private dataset. Because my text sequences are longer than the commonly used max_len = 25, I set max_len to 40 after collecting statistics. Recognition accuracy is very good, but when I visualize attn_scores with class DumpPrediction, I see the strange phenomenon shown in the attached figure (I cropped part of the heatmap; the original is too long): each attn in attn_scores highlights two "attended regions", and some attention also piles up in the top-left corner, which confuses me. When I set max_len back to 25, the heatmaps are normal: each attn attends to a single character region, though a little attention still accumulates in the top-left corner.

    opened by icecream-Tnak 0
  • I was training with your dataset (lmdb) but got an error. Can you help me?

        [2022-09-04 08:24:27,580 main.py:229 INFO train-abinet] Construct learner.
        [2022-09-04 08:24:27,679 main.py:233 INFO train-abinet] Start training.
        <IPython.core.display.HTML object>
        <IPython.core.display.HTML object>
        Traceback (most recent call last):
          File "main.py", line 246, in <module>
            main()
          File "main.py", line 235, in main
            lr=config.optimizer_lr)
          File "/usr/local/lib/python3.7/dist-packages/fastai/basic_train.py", line 200, in fit
            fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
          File "/usr/local/lib/python3.7/dist-packages/fastai/basic_train.py", line 99, in fit
            for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
          File "/usr/local/lib/python3.7/dist-packages/fastprogress/fastprogress.py", line 39, in __iter__
            if self.total != 0: self.update(0)
          File "/usr/local/lib/python3.7/dist-packages/fastprogress/fastprogress.py", line 59, in update
            self.update_bar(0)
          File "/usr/local/lib/python3.7/dist-packages/fastprogress/fastprogress.py", line 81, in update_bar
            self.on_update(val, f'{pct}[{val}/{tot} {elapsed_t}{self.lt}{remaining_t}{end}]')
          File "/usr/local/lib/python3.7/dist-packages/fastprogress/fastprogress.py", line 134, in on_update
            elif self.parent is not None: self.parent.show()
          File "/usr/local/lib/python3.7/dist-packages/fastprogress/fastprogress.py", line 177, in show
            self.out.update(HTML(self.html_code))
        AttributeError: 'NoneType' object has no attribute 'update'

    opened by PhanTung-06 0
  • Question about the pre-training of the vision model

    Thanks for your great work! I have a question about pre-training the vision model: is there a big difference between training ABINet directly and training ABINet on top of a pre-trained vision model? On what scale? Can you give the specific range of the difference? Thank you so much!

    opened by Zhou2019 0
  • Question about evaluation

    Hi, thank you for sharing your nice work! I'd like to ask about the character-set processing when evaluating your code. Did you use label = re.sub('[^0-9a-zA-Z]+', '', label) in evaluation mode? I wonder how you processed the ground truth for the results reported in your paper. If you use the 36-character charset, did you ignore out-of-vocabulary characters (^0-9a-z) in the ground truth? For example, if the gt is 'hello?' and the pred is 'hello', do you count the result as correct?
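
    For reference, the normalization described above can be sketched as follows (the strings are just the example from the question):

        import re

        def normalize(s):
            # Keep only alphanumerics and lowercase, mirroring the re.sub above.
            return re.sub('[^0-9a-zA-Z]+', '', s).lower()

        print(normalize('hello?') == normalize('hello'))  # True under this filtering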

    opened by vanche 0
  • Does ABINet have a language bias?

    When I use Malay/Indonesian samples, performance is bad, especially on blurred images, while the samples on the Hugging Face Space are predicted correctly even though some of those images should be hard to guess even by humans.

    opened by light42 0
  • Thank you! It works amazingly

    IT WORKS AMAZINGLY, THANK YOU SO MUCH!!! The other AI I used was so slow.

    Thank you!

    ../captcha_database/127tdphc.png: 1777dphc
    ../captcha_database/12GCEAH3.png: 12gcehh3
    ../captcha_database/12dtg75m.png: 12dtg75m
    ../captcha_database/12eztmn3.png: 12ezlmn3
    ../captcha_database/12gd7Q6c.png: 12gd766e
    ../captcha_database/13Fbudw8.png: l3f5udw8
    ../captcha_database/13J9gmqt.png: 13l9gmqt
    ../captcha_database/13gc4dl5.png: i3g648l5
    ../captcha_database/13gitbos.png: 18gitbos
    ../captcha_database/13tg2duq.png: 13t62duq
    ../captcha_database/14Fd7Z3c.png: 14r47z34
    ../captcha_database/14Qaxyop.png: 14qaxyop
    ../captcha_database/14U7X2PL.png: 14u7x3pl
    ../captcha_database/14XZ2d5L.png: 44xz2o5l
    ../captcha_database/14dac8yl.png: 14dac8yl
    ../captcha_database/14i76wpj.png: 14i76wpj
    ../captcha_database/14kx7sa6.png: 14kx7sa6
    ../captcha_database/14si6jky.png: 14si6jky
    ../captcha_database/1523eqy8.png: 1523eqy8
    ../captcha_database/1573pi8y.png: 1573pi8y
    ../captcha_database/15QAINBP.png: 15q4inpp
    ../captcha_database/15ir94ms.png: 15iro4ms
    ../captcha_database/15l849te.png: 15l849te
    ../captcha_database/167gZrc9.png: 167gzrc9
    ../captcha_database/16LZMATI.png: 16l2mafi
    ../captcha_database/16idr4lp.png: 16idr4lp
    ../captcha_database/16nZ7fPm.png: 15pzzfpm
    ../captcha_database/173ljdke.png: 173ljdke
    ../captcha_database/1756tknm.png: 1756tknm
    ../captcha_database/17lin6Z2.png: 17inn6z2
    ../captcha_database/17pcaHZ8.png: i7pcahz8
    ../captcha_database/17ugeo36.png: 17ugeo36
    ../captcha_database/1874Qfbd.png: 1874qfbd
    ../captcha_database/187prHQi.png: 187prhqi
    ../captcha_database/18LQnbpa.png: 13lqnbpa
    ../captcha_database/18i7qzos.png: 18i7qzos
    ../captcha_database/18jb3qlr.png: 183b5qlr
    ../captcha_database/18mn9iwZ.png: 18mnsnwz
    ../captcha_database/18nqfyep.png: 18nqfyep
    ../captcha_database/19J8WeH7.png: 19jqwwhz
    ../captcha_database/19SDOT2K.png: 196dot2k
    ../captcha_database/19i7Fk2s.png: 19i7fk2s
    ../captcha_database/1BX74IUD.png: 1bx74lud
    ../captcha_database/1DLKH85Q.png: 1olkh85q
    ../captcha_database/1F5r6ed3.png: 1f5r6ed3
    ../captcha_database/1F82jaro.png: 1f82jaro
    ../captcha_database/1FP4AX6E.png: 1fp4ax6e
    ../captcha_database/1Fokyix3.png: lf6kyix3
    ../captcha_database/1Fy7bkrp.png: 1fy7bkro
    ../captcha_database/1H56bZ9i.png: ih5ebz9i
    ../captcha_database/1HJ5QDMN.png: 1hj5qdmn
    ../captcha_database/1HTFB9DS.png: 1htf89d8
    ../captcha_database/1Htx4jwu.png: 1htx4jwd
    ../captcha_database/1JBHY68E.png: iybhx68e
    ../captcha_database/1JL7e468.png: 4jl7e468
    ../captcha_database/1P7EKCGH.png: ip7ekcgh
    ../captcha_database/1PWQ57GX.png: 1pwq576x
    ../captcha_database/1Q6C2GE9.png: 1q6c26e9
    ../captcha_database/1Q8bFwxn.png: 1q8bfwxn
    ../captcha_database/1QEBIPRF.png: 1qeblprf
    ../captcha_database/1Qe5m2p8.png: 12e5m2p8
    ../captcha_database/1Ql8m7z9.png: 1q18m7z9
    ../captcha_database/1Qt9ogdz.png: 1qt9ogdz
    ../captcha_database/1TQJ3XAW.png: 1tqj3xaw
    ../captcha_database/1YDJQT52.png: 1djqqt52
    ../captcha_database/1ZnQo46J.png: 1znqo46j
    ../captcha_database/1Zntwpk6.png: 1zntwpk6

    opened by sneedgers 4
  • Less VRAM: 14 GB VRAM in use

    Hello my friend, from Germany! Can you help me use less VRAM?

        RuntimeError: CUDA out of memory. Tried to allocate 768.00 MiB (GPU 0; 14.76 GiB total capacity; 12.85 GiB already allocated; 435.75 MiB free; 13.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    opened by sneedgers 1