A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Overview

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

A PyTorch implement of TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes (ECCV 2018) by Megvii

Paper

Comparison of different representations for text instances. (a) Axis-aligned rectangle. (b) Rotated rectangle. (c) Quadrangle. (d) TextSnake. Obviously, the proposed TextSnake representation is able to effectively and precisely describe the geometric properties, such as location, scale, and bending of curved text with perspective distortion, while the other representations (axis-aligned rectangle, rotated rectangle or quadrangle) struggle with giving accurate predictions in such cases.

Textsnake elements:

  • center point
  • tangent line
  • text region

Description

Generally, this code has following features:

  1. include complete training and inference code
  2. pure python version without extra compiling
  3. compatible with laste PyTorch version (write with pytroch 0.4.0)
  4. support TotalText and SynthText dataset

Getting Started

This repo includes the training code and inference demo of TextSnake, training and infercence can be simplely run with a few code.

Prerequisites

To run this repo successfully, it is highly recommanded with:

  • Linux (Ubuntu 16.04)
  • Python3.6
  • Anaconda3
  • NVIDIA GPU(with 8G or larger GPU memory for training, 2G for inference)

(I haven't test it on other Python version.)

  1. clone this repository
git clone https://github.com/princewang1994/TextSnake.pytorch.git
  1. python package can be installed with pip
$ cd $TEXTSNAKE_ROOT
$ pip install -r requirements.txt

Data preparation

Pretraining with SynthText

$ CUDA_VISIBLE_DEVICES=$GPUID python train.py synthtext_pretrain --dataset synth-text --viz --max_epoch 1 --batch_size 8

Training

Training model with given experiment name $EXPNAME

training from scratch:

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz

training with pretrained model(improved performance much)

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py example --viz --batch_size 8 --resume save/synthtext_pretrain/textsnake_vgg_0.pth

options:

  • exp_name: experiment name, used to identify different training processes
  • --viz: visualization toggle, output pictures are saved to ./vis by default

other options can be show by run python train.py -h

Running tests

Runing following command can generate demo on TotalText dataset (300 pictures), the result are save to ./vis by default

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python eval_textsnake.py $EXPNAME --checkepoch 190

options:

  • exp_name: experiment name, used to identify different training process

other options can be show by run python train.py -h

Evaluation

Total-Text metric is included in dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py, you should first modify the input_dir in Deteval.py and run following command for computing DetEval:

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.8 --tp 0.4

or

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.7 --tp 0.6

it will output metrics reports.

Pretrained Models

Download from links above and place pth file to the corresponding path(save/XXX/textsnake_vgg_XX.pth).

Performance

DetEval reporting

Following table reports DetEval metrics when we set vgg as the backbone(can be reproduced by using pertained model in Pretrained Model section):

tr=0.7 / tp=0.6(P|R|F1) tr=0.8 / tp=0.4(P|R|F1) FPS(On single 1080Ti)
expand / no merge 0.652 | 0.549 | 0.596 0.874 | 0.711 | 0.784 12.07
expand / merge 0.698 | 0.578 | 0.633 0.859 | 0.660 | 0.746 8.38
no expand / no merge 0.753 | 0.693 | 0.722 0.695 | 0.628 | 0.660 9.94
no expand / merge 0.747 | 0.677 | 0.710 0.691 | 0.602 | 0.643 11.05
reported on paper - 0.827 | 0.745 | 0.784

* expand denotes expanding radius by 0.3 times while post-processing

* merge denotes that merging overlapped instance while post-processing

Pure Inference

You can also run prediction on your own dataset without annotations:

  1. Download pretrained model and place .pth file to save/pretrained/textsnake_vgg_180.pth
  2. Run pure inference script as following:
$ EXPNAME=pretrained
$ CUDA_VISIBLE_DEVICES=$GPUID python demo.py $EXPNAME --checkepoch 180 --img_root /path/to/image

predicted result will be saved in output/$EXPNAME and visualization in vis/${EXPNAME}_deploy

Qualitative results

  • left: prediction/ground true
  • middle: text region(TR)
  • right: text center line(TCL)

What is comming

  • Pretraining with SynthText
  • Metric computing
  • Pretrained model upload
  • Pure inference script
  • More dataset suport: [ICDAR15, CTW1500]
  • Various backbone experiments

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgement

Comments
  • training on my own dataset

    training on my own dataset

    Hi Dear Wang, If I find out correctly,

    1. Pretraining with SynthText section, train model using SynthText dataset.
    2. In the Training section, we have 2 choises: a)training from scratch which train model with only TotalText dataset and b) training with pretrained model.

    Is these correct?

    My question is that: I have 100 images. How can I prepare them to training? thanks

    opened by johnSmith1990 11
  • How to get image_list.txt?

    How to get image_list.txt?

    Hi, I download the dataset from here then ran python dataset/synth-text/make_list.py and I couldn't find image_list.txt.

    Is it necessary to download the dataset from this link?

    opened by sherif7810 8
  • About the result of demo.py

    About the result of demo.py

    I managed to finish the demo.py, but I got the .txt result, how should I convert it to the corresponding image result display?I am sorry to trouble you again.

    opened by cryptolalia 7
  • the error in running demo.py

    the error in running demo.py

    Loading from ./save/example/textsnake_vgg_10.pth
    Start testing TextSnake.
    detect 0 / 300 images: img650.jpg.
    Traceback (most recent call last):
      File "demo.py", line 133, in <module>
        main()
      File "demo.py", line 119, in main
        inference(model, detector, test_loader)
      File "demo.py", line 77, in inference
        batch_result = detector.detect(tr_pred, tcl_pred, sin_pred, cos_pred, radii_pred)  # (n_tcl, 3)
      File "/home/crypa/TextSnake.pytorch/util/detection.py", line 204, in detect
        detect_result = self.build_tcl(tcl, sin_pred, cos_pred, radii_pred)
      File "/home/crypa/TextSnake.pytorch/util/detection.py", line 164, in build_tcl
        _, conts, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    ValueError: not enough values to unpack (expected 3, got 2)
    

    Due to time, I didn't reach 200epoch according to the training requirements. I tried to test the code with 10epoch. so runCUDA_VISIBLE_DEVICES=0 python demo.py --checkepoch 10 example But got the above error.so sorry,I am a newbie,could u help me?

    opened by cryptolalia 5
  • loss_radii = F.smooth_l1_loss(radii_pred[tcl_mask] / radii_map[tcl_mask], ones)   is inf

    loss_radii = F.smooth_l1_loss(radii_pred[tcl_mask] / radii_map[tcl_mask], ones) is inf

    I do not understand loss_radii = F.smooth_l1_loss(radii_pred[tcl_mask] / radii_map[tcl_mask], ones), what do radii_pred[tcl_mask] and radii_map[tcl_mask] mean? Can someone help me?

    opened by ustczhouyu 4
  • merge contour issue

    merge contour issue

    Hi Dear. I want to use detector.merge_contours method to merge overlapped contours. but it gives an error "ValueError: too many values to unpack (expected 2)" at this line: cont_i, disk_i = all_contours[i]. thank you for your help. can you give insight that how does merge contour work? what are cont_i disk_i s? thanks

    opened by johnSmith1990 4
  • too many

    too many "torch.uint8 Warning"

    C:\w\1\s\windows\pytorch\aten\src\ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.

    Warning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (expandTensors at C:/w/1/s/windows/pytorch/aten/src\ATen/native/IndexingUtils.h:20)

    opened by xmyhhh 3
  • Error using torch.load to load textsnake_vgg_0.pth downloaded from Google Drive using Google Colab (to get free GPU)

    Error using torch.load to load textsnake_vgg_0.pth downloaded from Google Drive using Google Colab (to get free GPU)

    Hello,

    torch.load does not manage to load correctly textsnake_vgg_0.pth (downloaded with google drive) using google colab which is linked to my drive. Here is the command I ran:

    ! python3.6 train_textsnake.py example --viz --batch_size 8 --resume /content/gdrive/My\ Drive/Colab\ Notebooks/TextSnake.pytorch/save/synthtext_pretrain/textsnake_vgg_0.pth

    Here is the error I get (straight after it printed all the configs):

    Loading from /content/gdrive/My Drive/Colab Notebooks/TextSnake.pytorch/save/synthtext_pretrain/textsnake_vgg_0.pth Traceback (most recent call last): File "train_textsnake.py", line 238, in main() File "train_textsnake.py", line 210, in main load_model(model, cfg.resume) File "train_textsnake.py", line 46, in load_model state_dict = torch.load(model_path)['state_dict'] File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 303, in load return _load(f, map_location, pickle_module) File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 454, in _load return legacy_load(f) File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 380, in legacy_load with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
    File "/usr/lib/python3.6/tarfile.py", line 1589, in open return func(name, filemode, fileobj, **kwargs) File "/usr/lib/python3.6/tarfile.py", line 1619, in taropen return cls(name, mode, fileobj, **kwargs) File "/usr/lib/python3.6/tarfile.py", line 1482, in init self.firstmember = self.next() File "/usr/lib/python3.6/tarfile.py", line 2297, in next tarinfo = self.tarinfo.fromtarfile(self) File "/usr/lib/python3.6/tarfile.py", line 1092, in fromtarfile buf = tarfile.fileobj.read(BLOCKSIZE) OSError: [Errno 5] Input/output error

    I assume part of the error is that pytorch is trying to open as a tar file although it is not.

    I would really appreciate your help.

    Best Wishes.

    opened by NoAchache 3
  • RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/THC/THCGeneral.cpp:663

    RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/THC/THCGeneral.cpp:663

    遇到一些问题想请教一下作者。首先是bash download.sh运行失败。索性自己去github地址上面下下来了total-text数据集。然而github上面的totaltext 其gt使用txt标记。于是首先吧txt改为了csv文件,然后修改total-text.py中的parse_mat()函数为如下代码: ''' def parse_csv(self, csv_path): with open(csv_path,'rt', encoding='UTF-8') as row_data: readers = csv.reader(row_data, delimiter=',') data = list(readers)

            polygons = []
            for cell in data:
                x = [int(num) for num in re.findall('\d+', str(cell[0].strip()))]
                y = [int(num) for num in re.findall('\d+', str(cell[1].strip()))]
    
                text = re.findall('\'([^>]+?)\'', str(cell[3]))[0]
                ori = re.findall('\'([^>]+?)\'', str(cell[2]))[0]
    
                text = text if len(text)>0 else '#'
                ori = ori if len(ori)>0 else 'c'
    
                if len(x) < 4:  # too few points
                    continue
                pts = np.stack([x, y]).T.astype(np.int32)
                polygons.append(TextInstance(pts, ori, text))
    
        return polygons
    

    '''

    最后这是我的运行结果: ''' aceback (most recent call last): File "eval_textsnake.py", line 123, in main() File "eval_textsnake.py", line 102, in main inference(detector, test_loader, output_dir) File "eval_textsnake.py", line 48, in inference contours, output = detector.detect(image) File "/root/fsy_SceneTextRec/Docker-pytorch0.4.1/TextSnake.pytorch/util/detection.py", line 237, in detect output = self.model(image) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, **kwargs) File "/root/fsy_SceneTextRec/Docker-pytorch0.4.1/TextSnake.pytorch/network/textnet.py", line 47, in forward C1, C2, C3, C4, C5 = self.backbone(x) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, **kwargs) File "/root/fsy_SceneTextRec/Docker-pytorch0.4.1/TextSnake.pytorch/network/vgg.py", line 92, in forward C1 = self.stage1(x) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, **kwargs) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward input = module(input) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, **kwargs) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward self.padding, self.dilation, self.groups) RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/THC/THCGeneral.cpp:663 ''' 运行环境是RTX2070,pytorch0.4.1的官方docker,cuda9.0

    opened by Shualite 3
  • problematic point removal

    problematic point removal

    The point removal of in TextInstance removes all qualified points independently. Which is not working well in some cases.

    before [[  15    0]
     [1216    0]
     [1219    0]
     [  13   16]]
    
    after [[15  0]
     [13 16]]
    

    oops_screenshot_05 03 2019 The image is from the latest ArT dataset.

    opened by Godricly 3
  • Bump pillow from 5.3.0 to 9.0.0

    Bump pillow from 5.3.0 to 9.0.0

    Bumps pillow from 5.3.0 to 9.0.0.

    Release notes

    Sourced from pillow's releases.

    9.0.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

    Changes

    ... (truncated)

    Changelog

    Sourced from pillow's changelog.

    9.0.0 (2022-01-02)

    • Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

    • Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

    • Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

    • Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

    • Improved I;16 operations on big endian #5901 [radarhere]

    • Limit quantized palette to number of colors #5879 [radarhere]

    • Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

    • When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

    • Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

    • Added rounding when converting P and PA #5824 [radarhere]

    • Improved putdata() documentation and data handling #5910 [radarhere]

    • Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

    • Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

    • Added ImageShow support for xdg-open #5897 [m-shinder, radarhere]

    • Support 16-bit grayscale ImageQt conversion #5856 [cmbruns, radarhere]

    • Convert subsequent GIF frames to RGB or RGBA #5857 [radarhere]

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
  • Bump pillow from 5.3.0 to 9.3.0

    Bump pillow from 5.3.0 to 9.3.0

    Bumps pillow from 5.3.0 to 9.3.0.

    Release notes

    Sourced from pillow's releases.

    9.3.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

    Changes

    ... (truncated)

    Changelog

    Sourced from pillow's changelog.

    9.3.0 (2022-10-29)

    • Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

    • Initialize libtiff buffer when saving #6699 [radarhere]

    • Inline fname2char to fix memory leak #6329 [nulano]

    • Fix memory leaks related to text features #6330 [nulano]

    • Use double quotes for version check on old CPython on Windows #6695 [hugovk]

    • Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

    • Fixed set_variation_by_name offset #6445 [radarhere]

    • Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

    • Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

    • Added ExifTags enums #6630 [radarhere]

    • Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

    • Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

    • Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

    • Added GPS TIFF tag info #6661 [radarhere]

    • Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

    • Do not attempt normalization if mode is already normal #6644 [radarhere]

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.15.1 to 1.22.0

    Bump numpy from 1.15.1 to 1.22.0

    Bumps numpy from 1.15.1 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • how to update the pretrained model with new vocabulary to improve the recognition part

    how to update the pretrained model with new vocabulary to improve the recognition part

    In my case TextSnake works well on detecting the text region however the recognition part is not so great - this is mainly because the domain text is medical. I have synthesized text images with custom library but I wonder if it is possible to update the pretrain model instead of training a totally new model from scratch

    opened by naarkhoo 0
  • How to change the policy of contour?

    How to change the policy of contour?

    Hi Thank you for very interesting work.

    However, I am stuck here handling your code. I wish texts in a horizontal way would be grouped together with looser policy. Can you provide me a hint on how to implement this?

    opened by wjun0830 0
  • RuntimeError: Cannot re-initialize CUDA in forked subprocess.

    RuntimeError: Cannot re-initialize CUDA in forked subprocess.

    Getting RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method. working with colab. Here pytorch version is 1.7.0 error2

    opened by AmrutaAnalytics 2
Owner
Prince Wang
I'm a CS graduate student from Zhejiang University
Prince Wang
Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai. Contributions

null 758 Dec 22, 2022
An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link Contents: Introduc

dengdan 484 Dec 7, 2022
keras复现场景文本检测网络CPTN: 《Detecting Text in Natural Image with Connectionist Text Proposal Network》;欢迎试用,关注,并反馈问题...

keras-ctpn [TOC] 说明 预测 训练 例子 4.1 ICDAR2015 4.1.1 带侧边细化 4.1.2 不带带侧边细化 4.1.3 做数据增广-水平翻转 4.2 ICDAR2017 4.3 其它数据集 toDoList 总结 说明 本工程是keras实现的CPTN: Detecti

mick.yi 107 Jan 9, 2023
Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Detecting Text in Natural Image with Connectionist Text Proposal Network The codes are used for implementing CTPN for scene text detection, described

Tian Zhi 1.3k Dec 22, 2022
Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Paper source Arbitrary-Oriented Scene Text Detection via Rotation Proposals https://arxiv.org/abs/1703.01086 News We update RRPN in pytorch 1.0! View

null 428 Nov 22, 2022
An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and limited )

GTA-5-Lane-detection Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and

Danciu Georgian 4 Aug 1, 2021
A simple python program to record security cam footage by detecting a face and body of a person in the frame.

SecurityCam A simple python program to record security cam footage by detecting a face and body of a person in the frame. This code was created by me,

null 1 Nov 8, 2021
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 496 Jan 5, 2023
Code related to "Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity" paper

DataTuner You have just found the DataTuner. This repository provides tools for fine-tuning language models for a task. See LICENSE.txt for license de

null 81 Jan 1, 2023
OpenGait is a flexible and extensible gait recognition project

A flexible and extensible framework for gait recognition. You can focus on designing your own models and comparing with state-of-the-arts easily with the help of OpenGait.

Shiqi Yu 335 Dec 22, 2022
This is a passport scanning web service to help you scan, identify and validate your passport created with a simple and flexible design and ready to be integrated right into your system!

Passport-Recogniton-System This is a passport scanning web service to help you scan, identify and validate your passport created with a simple and fle

Mo'men Ashraf Muhamed 7 Jan 4, 2023
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.

Dual Encoding for Video Retrieval by Text Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding

null 81 Dec 1, 2022
This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

EAST: An Efficient and Accurate Scene Text Detector Description: This version will be updated soon, please pay attention to this work. The motivation

Dejia Song 544 Dec 20, 2022
PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

null 365 Dec 20, 2022
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Marek Mauder 127 Dec 3, 2022
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
huoyijie 1.2k Dec 29, 2022