The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Pengyuan Lyu

Last update: Nov 21, 2022

Overview

Mask TextSpotter

A PyTorch implementation of Mask TextSpotter, along with its extension, can be found here.

Introduction

This is the official implementation of Mask TextSpotter.

Mask TextSpotter is an end-to-end trainable neural network for spotting text with arbitrary shapes.

For more details, please refer to our paper.

Citing the paper

Please cite the paper in your publications if it helps your research:

@inproceedings{LyuLYWB18,
  author    = {Pengyuan Lyu and
               Minghui Liao and
               Cong Yao and
               Wenhao Wu and
               Xiang Bai},
  title     = {Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes},
  booktitle = {Proc. ECCV},
  pages     = {71--88},
  year      = {2018}
}

Contents

Requirements
Installation
Models
Datasets
Test
Train

Requirements

NVIDIA GPU, Linux, Python2
Caffe2, various standard Python packages

Installation

Caffe2

To install Caffe2 with CUDA support, follow the installation instructions from the Caffe2 website. If you already have Caffe2 installed, make sure to update your Caffe2 to a version that includes the Detectron module.

Please ensure that your Caffe2 installation was successful before proceeding by running the following commands and checking their output as directed in the comments.

# To check if Caffe2 build was successful
python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"

# To check if Caffe2 GPU build was successful
# This must print a number > 0 in order to use Detectron
python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'

If the caffe2 Python package is not found, you likely need to adjust your PYTHONPATH environment variable to include its location (/path/to/caffe2/build, where build is the Caffe2 CMake build directory).
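
As a rough illustration of the PYTHONPATH advice above, the following Python sketch checks whether a module can be imported and, if not, adds a build directory to sys.path for the current process before retrying. The helper name ensure_importable and the path /path/to/caffe2/build are placeholders for illustration and are not part of this repository.

```python
import importlib
import os
import sys


def ensure_importable(module_name, extra_dir=None):
    """Return True if module_name imports, optionally after adding extra_dir to sys.path."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        pass
    if extra_dir and os.path.isdir(extra_dir) and extra_dir not in sys.path:
        # Same effect as extending PYTHONPATH, but only for this process.
        sys.path.insert(0, extra_dir)
        try:
            importlib.import_module(module_name)
            return True
        except ImportError:
            pass
    return False


if __name__ == "__main__":
    # Placeholder path: replace it with your own Caffe2 CMake build directory.
    ok = ensure_importable("caffe2.python.core", "/path/to/caffe2/build")
    print("Success" if ok else "Failure")
```

If the second attempt succeeds, exporting the same directory in PYTHONPATH makes the fix permanent for later shells.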

Install Python dependencies:

pip install numpy pyyaml matplotlib 'opencv-python>=3.0' setuptools Cython mock

Set up Python modules:

cd $ROOT_DIR/lib && make

Note: installing Caffe2 can sometimes be difficult.

Models

Download our trained model (Google Drive; BaiduYun, key: gnpc) and place it at models/model_iter79999.pkl.

Datasets

Download ICDAR2013 (Google Drive, BaiduYun) and ICDAR2015 (Google Drive, BaiduYun) as examples. Datasets should be placed in lib/datasets/data/ as shown below:

synth
icdar2013
icdar2015
scut-eng-char
totaltext

If you only want to test the model rather than train it, downloading the ICDAR2013 or ICDAR2015 dataset is enough.
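
Before running a test or a training job, it can help to confirm that the dataset directories are where the code expects them. The sketch below only checks that the directory names listed above exist under a given root. The function name missing_datasets is made up for this example, and the check says nothing about whether the contents of each directory are correctly formatted.

```python
import os

# Directory names taken from the layout listed above.
EXPECTED_DATASETS = ("synth", "icdar2013", "icdar2015", "scut-eng-char", "totaltext")


def missing_datasets(data_root, names=EXPECTED_DATASETS):
    """Return the dataset directory names that are absent under data_root."""
    return [n for n in names if not os.path.isdir(os.path.join(data_root, n))]


if __name__ == "__main__":
    # For testing only, a single dataset such as icdar2015 is enough.
    print(missing_datasets("lib/datasets/data", names=("icdar2015",)))
```

An empty list means every requested directory was found.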

Test

python tools/test_net.py --cfg configs/text/mask_textspotter.yaml

You can modify the model path or the test dataset in configs/text/mask_textspotter.yaml.

Train

Format all the datasets you use for training as described above. Then modify configs/text/mask_textspotter.yaml to match your GPUs, model path, and datasets.

python tools/train_net.py --cfg configs/text/mask_textspotter.yaml

Comments

Questions about datasets and testsets.

I followed your instructions and set up the masktextspotter environment. I am now running the test with the dataset and test set you provided:

python tools/test_net.py --cfg configs/text/mask_textspotter.yaml

It prints the following output. How long is it supposed to take to run?

(caffe2_env) zhoujianwen@zhoujianwen-System:~/masktextspotter.caffe2$ python tools/test_net.py --cfg configs/text/mask_textspotter.yaml

make: Entering directory '/home/zhoujianwen/masktextspotter.caffe2/lanms'
make: 'adaptor.so' is up to date.
make: Leaving directory '/home/zhoujianwen/masktextspotter.caffe2/lanms'
INFO test_net.py: 141: Called with args:
INFO test_net.py: 142: Namespace(cfg_file='configs/text/mask_textspotter.yaml', multi_gpu_testing=False, opts=[], range=None, vis=False, wait=True)
/home/zhoujianwen/masktextspotter.caffe2/lib/core/config.py:1094: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = AttrDict(yaml.load(f))
INFO test_net.py: 148: Testing with config:
INFO test_net.py: 149: {'BBOX_XFORM_CLIP': 4.135166556742356,
 'CLUSTER': {'ON_CLUSTER': False},
 'DATA_LOADER': {'NUM_THREADS': 4},
 'DEDUP_BOXES': 0.0625,
 'DOWNLOAD_CACHE': '/tmp/detectron-download-cache',
 'EPS': 1e-14,
 'EXPECTED_RESULTS': [],
 'EXPECTED_RESULTS_ATOL': 0.005,
 'EXPECTED_RESULTS_EMAIL': '',
 'EXPECTED_RESULTS_RTOL': 0.1,
 'FAST_RCNN': {'MLP_HEAD_DIM': 1024,
               'ROI_BOX_HEAD': 'fast_rcnn_heads.add_roi_2mlp_head',
               'ROI_XFORM_METHOD': 'RoIAlign',
               'ROI_XFORM_RESOLUTION': 7,
               'ROI_XFORM_SAMPLING_RATIO': 2},
 'FPN': {'COARSEST_STRIDE': 32,
         'DIM': 256,
         'EXTRA_CONV_LEVELS': False,
         'FPN_ON': True,
         'MULTILEVEL_ROIS': True,
         'MULTILEVEL_RPN': True,
         'ROI_CANONICAL_LEVEL': 4,
         'ROI_CANONICAL_SCALE': 224,
         'ROI_MAX_LEVEL': 5,
         'ROI_MIN_LEVEL': 2,
         'RPN_ANCHOR_START_SIZE': 32,
         'RPN_ASPECT_RATIOS': (0.5, 1, 2),
         'RPN_MAX_LEVEL': 6,
         'RPN_MIN_LEVEL': 2,
         'USE_DEFORMABLE': False,
         'ZERO_INIT_LATERAL': False},
 'IMAGE': {'aug': False,
           'brightness_delta': 32,
           'brightness_prob': 0.5,
           'contrast_lower': 0.5,
           'contrast_prob': 0.5,
           'contrast_upper': 1.5,
           'hue_delta': 18,
           'hue_prob': 0.5,
           'lighting_noise_prob': 0.5,
           'rotate_delta': 15,
           'rotate_prob': 0.5,
           'saturation_lower': 0.5,
           'saturation_prob': 0.5,
           'saturation_upper': 1.5},
 'KRCNN': {'CONV_HEAD_DIM': 256,
           'CONV_HEAD_KERNEL': 3,
           'CONV_INIT': 'GaussianFill',
           'DECONV_DIM': 256,
           'DECONV_KERNEL': 4,
           'DILATION': 1,
           'HEATMAP_SIZE': -1,
           'INFERENCE_MIN_SIZE': 0,
           'KEYPOINT_CONFIDENCE': 'bbox',
           'LOSS_WEIGHT': 1.0,
           'MIN_KEYPOINT_COUNT_FOR_VALID_MINIBATCH': 20,
           'NMS_OKS': False,
           'NORMALIZE_BY_VISIBLE_KEYPOINTS': True,
           'NUM_KEYPOINTS': -1,
           'NUM_STACKED_CONVS': 8,
           'ROI_KEYPOINTS_HEAD': '',
           'ROI_XFORM_METHOD': 'RoIAlign',
           'ROI_XFORM_RESOLUTION': 7,
           'ROI_XFORM_SAMPLING_RATIO': 0,
           'UP_SCALE': -1,
           'USE_DECONV': False,
           'USE_DECONV_OUTPUT': False},
 'MATLAB': 'matlab',
 'MEMONGER': True,
 'MEMONGER_SHARE_ACTIVATIONS': False,
 'MODEL': {'BBOX_REG_WEIGHTS': (10.0, 10.0, 5.0, 5.0),
           'CLS_AGNOSTIC_BBOX_REG': False,
           'CONV_BODY': 'FPN.add_fpn_ResNet50_conv5_body',
           'EXECUTION_TYPE': 'dag',
           'FASTER_RCNN': True,
           'KEYPOINTS_ON': False,
           'MASK_ON': True,
           'NAME': 'shrink++',
           'NUM_CLASSES': 2,
           'RPN_ONLY': False,
           'TYPE': 'generalized_rcnn'},
 'MRCNN': {'CLS_SPECIFIC_MASK': True,
           'CONV_INIT': 'MSRAFill',
           'DILATION': 1,
           'DIM_REDUCED': 256,
           'IS_E2E': True,
           'MASK_BATCH_SIZE_PER_IM': 16,
           'RESOLUTION': 28,
           'RESOLUTION_H': 32,
           'RESOLUTION_W': 128,
           'ROI_MASK_HEAD': 'text_mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs',
           'ROI_XFORM_METHOD': 'RoIAlign',
           'ROI_XFORM_RESOLUTION': 14,
           'ROI_XFORM_RESOLUTION_H': 16,
           'ROI_XFORM_RESOLUTION_W': 64,
           'ROI_XFORM_SAMPLING_RATIO': 2,
           'THRESH_BINARIZE': 0.5,
           'UPSAMPLE_RATIO': 1,
           'USE_FC_OUTPUT': False,
           'WEIGHT_LOSS_CHAR_BOX': 1.0,
           'WEIGHT_LOSS_MASK': 1.0,
           'WEIGHT_WH': True},
 'NUM_GPUS': 1,
 'OUTPUT_DIR': '.',
 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
 'RESNETS': {'NUM_GROUPS': 1,
             'RES5_DILATION': 1,
             'STRIDE_1X1': True,
             'TRANS_FUNC': 'bottleneck_transformation',
             'WIDTH_PER_GROUP': 64},
 'RETINANET': {'ANCHOR_SCALE': 4,
               'ASPECT_RATIOS': (0.25, 0.5, 1.0, 2.0, 4.0),
               'BBOX_REG_BETA': 0.11,
               'BBOX_REG_WEIGHT': 1.0,
               'CLASS_SPECIFIC_BBOX': False,
               'INFERENCE_TH': 0.05,
               'LOSS_ALPHA': 0.25,
               'LOSS_GAMMA': 2.0,
               'NEGATIVE_OVERLAP': 0.4,
               'NUM_CONVS': 4,
               'POSITIVE_OVERLAP': 0.5,
               'PRE_NMS_TOP_N': 1000,
               'PRIOR_PROB': 0.01,
               'RETINANET_ON': False,
               'SCALES_PER_OCTAVE': 3,
               'SHARE_CLS_BBOX_TOWER': False,
               'SOFTMAX': False},
 'RFCN': {'PS_GRID_SIZE': 3},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/zhoujianwen/masktextspotter.caffe2',
 'RPN': {'ASPECT_RATIOS': (0.5, 1, 2),
         'RPN_ON': True,
         'SIZES': (64, 128, 256, 512),
         'STRIDE': 16},
 'SOLVER': {'BASE_LR': 0.005,
            'GAMMA': 0.1,
            'LOG_LR_CHANGE_THRESHOLD': 1.1,
            'LRS': [],
            'LR_POLICY': 'steps_with_decay',
            'MAX_ITER': 200000,
            'MOMENTUM': 0.9,
            'SCALE_MOMENTUM': True,
            'SCALE_MOMENTUM_THRESHOLD': 1.1,
            'STEPS': [0, 120000],
            'STEP_SIZE': 30000,
            'WARM_UP_FACTOR': 0.3333333333333333,
            'WARM_UP_ITERS': 500,
            'WARM_UP_METHOD': u'linear',
            'WEIGHT_DECAY': 0.0001},
 'TEST': {'BBOX_AUG': {'AREA_TH_HI': 32400,
                       'AREA_TH_LO': 2500,
                       'ASPECT_RATIOS': (),
                       'ASPECT_RATIO_H_FLIP': False,
                       'COORD_HEUR': 'UNION',
                       'ENABLED': False,
                       'H_FLIP': False,
                       'MAX_SIZE': 2000,
                       'SCALES': (800,),
                       'SCALE_H_FLIP': False,
                       'SCALE_SIZE_DEP': False,
                       'SCORE_HEUR': 'UNION'},
          'BBOX_REG': True,
          'BBOX_VOTE': {'ENABLED': True,
                        'SCORING_METHOD': 'ID',
                        'SCORING_METHOD_BETA': 1.0,
                        'VOTE_TH': 0.9},
          'COMPETITION_MODE': True,
          'DATASET': '',
          'DATASETS': ('icdar2015_test',),
          'DETECTIONS_PER_IM': 100,
          'FORCE_JSON_DATASET_EVAL': False,
          'KPS_AUG': {'AREA_TH': 32400,
                      'ASPECT_RATIOS': (),
                      'ASPECT_RATIO_H_FLIP': False,
                      'ENABLED': False,
                      'HEUR': 'HM_AVG',
                      'H_FLIP': False,
                      'MAX_SIZE': 4000,
                      'SCALES': (),
                      'SCALE_H_FLIP': False,
                      'SCALE_SIZE_DEP': False},
          'MASK_AUG': {'AREA_TH': 32400,
                       'ASPECT_RATIOS': (),
                       'ASPECT_RATIO_H_FLIP': False,
                       'ENABLED': False,
                       'HEUR': 'SOFT_AVG',
                       'H_FLIP': False,
                       'MAX_SIZE': 3333,
                       'SCALES': (1600,),
                       'SCALE_H_FLIP': False,
                       'SCALE_SIZE_DEP': False},
          'MAX_SIZE': 3333,
          'NMS': 0.5,
          'NUM_TEST_IMAGES': 5000,
          'OUTPUT_POLYGON': False,
          'PRECOMPUTED_PROPOSALS': False,
          'PROPOSAL_FILE': '',
          'PROPOSAL_FILES': (),
          'PROPOSAL_LIMIT': 2000,
          'RPN_MIN_SIZE': 0,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 1000,
          'RPN_PRE_NMS_TOP_N': 1000,
          'SCALES': (1000,),
          'SCORE_THRESH': 0.2,
          'SOFT_NMS': {'ENABLED': False, 'METHOD': 'linear', 'SIGMA': 0.5},
          'VIS': False,
          'WEIGHTS': '/home/zhoujianwen/masktextspotter.caffe2/models/model_iter79999.pkl'},
 'TRAIN': {'ASPECT_GROUPING': True,
           'AUTO_RESUME': True,
           'BATCH_SIZE_PER_IM': 512,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'CROWD_FILTER_THRESH': 0.7,
           'DATASETS': ('icdar2015_train',),
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'FREEZE_CONV_BODY': False,
           'GT_MIN_AREA': -1,
           'IMS_PER_BATCH': 2,
           'MAX_SIZE': 1333,
           'MIX_RATIOS': [0.5, 0.25, 0.25],
           'MIX_TRAIN': False,
           'PROPOSAL_FILES': (),
           'RPN_BATCH_SIZE_PER_IM': 256,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 0,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 2000,
           'RPN_STRADDLE_THRESH': 0,
           'SCALES': (800,),
           'SNAPSHOT_ITERS': 10000,
           'USE_CHARANNS': [True],
           'USE_FLIPPED': False,
           'WEIGHTS': u'/tmp/detectron-download-cache/ImageNetPretrained/MSRA/R-50.pkl'},
 'USE_NCCL': False,
 'VIS': False,
 'VIS_TH': 0.9}
WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py:  54: Loading from: /home/zhoujianwen/masktextspotter.caffe2/models/model_iter79999.pkl
/home/zhoujianwen/masktextspotter.caffe2/lib/utils/net.py:59: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  saved_cfg = yaml.load(src_blobs['cfg'])
Traceback (most recent call last):
  File "tools/test_net.py", line 159, in <module>
    main(ind_range=args.range, multi_gpu_testing=args.multi_gpu_testing, vis=vis)
  File "tools/test_net.py", line 127, in main
    parent_func(multi_gpu=multi_gpu_testing, vis=vis)
  File "/home/zhoujianwen/masktextspotter.caffe2/lib/core/test_engine.py", line 64, in test_net_on_dataset
    test_net(vis=vis)
  File "/home/zhoujianwen/masktextspotter.caffe2/lib/core/test_engine.py", line 126, in test_net
    model = initialize_model_from_cfg()
  File "/home/zhoujianwen/masktextspotter.caffe2/lib/core/test_engine.py", line 160, in initialize_model_from_cfg
    model, cfg.TEST.WEIGHTS, broadcast=False
  File "/home/zhoujianwen/masktextspotter.caffe2/lib/utils/net.py", line 45, in initialize_from_weights_file
    initialize_gpu_0_from_weights_file(model, weights_file)
  File "/home/zhoujianwen/masktextspotter.caffe2/lib/utils/net.py", line 59, in initialize_gpu_0_from_weights_file
    saved_cfg = yaml.load(src_blobs['cfg'])
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 45, in get_single_data
    return self.construct_document(node)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 49, in construct_document
    data = self.construct_object(node)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 96, in construct_object
    data = constructor(self, tag_suffix, node)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 628, in construct_python_object_new
    return self.construct_python_object_apply(suffix, node, newobj=True)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 611, in construct_python_object_apply
    value = self.construct_mapping(node, deep=True)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 214, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 139, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 101, in construct_object
    for dummy in generator:
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 404, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 214, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 139, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 96, in construct_object
    data = constructor(self, tag_suffix, node)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 617, in construct_python_object_apply
    instance = self.make_python_instance(suffix, node, args, kwds, newobj)
  File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 558, in make_python_instance
    node.start_mark)
yaml.constructor.ConstructorError: while constructing a Python instance
expected a class, but found <type 'builtin_function_or_method'>
  in "<string>", line 3, column 20:
      BBOX_XFORM_CLIP: !!python/object/apply:numpy.core ...

opened by zhoujianwen 11
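
The YAMLLoadWarning and ConstructorError in the log above are consistent with a PyYAML 5.x release, where yaml.load without an explicit Loader falls back to a restricted loader that refuses to call arbitrary Python functions such as the numpy one named in the failing tag. One possible workaround, assuming the weights file comes from a trusted source, is to pass an unsafe loader explicitly. Installing an older PyYAML release is another option. The helper name load_trusted_yaml is illustrative only and does not exist in the repository. The call it would replace is yaml.load(src_blobs['cfg']) in lib/utils/net.py, as shown in the traceback.

```python
import yaml


def load_trusted_yaml(text):
    """Load YAML that may contain !!python/object/apply tags. Use only on trusted input."""
    # UnsafeLoader exists in PyYAML >= 5.1. Older releases only have Loader,
    # which already constructs arbitrary Python objects.
    loader = getattr(yaml, "UnsafeLoader", yaml.Loader)
    return yaml.load(text, Loader=loader)


if __name__ == "__main__":
    # Small stand-in for the saved config; the real one applies a numpy function.
    doc = "BBOX_XFORM_CLIP: !!python/object/apply:math.log [62.5]"
    print(load_trusted_yaml(doc))
```

An unsafe loader can execute code embedded in the YAML document, so this is only appropriate for files you trust, such as a model you downloaded from the authors.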

ImportError: No module named lanms

I tried running python tools/test_net.py --cfg configs/text/mask_textspotter.yaml, but lanms cannot be imported. I have already run make in lib and in masktextspotter.caffe2/lanms.

I am using Anaconda3 with Python 2.7; gcc and g++ are version 5.4.

opened by huziling 9
ImportError: cannot import name test_retinanet

Traceback (most recent call last):
  File "tools/test_net.py", line 42, in <module>
    from core.test_retinanet import test_retinanet
ImportError: cannot import name test_retinanet

Hello, I could not find the test_retinanet function in test_retinanet.py in Detectron's core directory.

opened by TBS1234 2
How to get character-level annotation?

Hi,

For training on ICDAR2013, the ground truth for a single word looks like this:

158.0,128.0,411.0,128.0,411.0,181.0,158.0,181.0,Footpath,158.0,131.0,187.0,131.0,187.0,172.0,158.0,172.0,F,189.0,139.0,219.0,139.0,219.0,171.0,189.0,171.0,o,226.0,139.0,255.0,139.0,255.0,171.0,226.0,171.0,o,261.0,129.0,282.0,129.0,282.0,171.0,261.0,171.0,t,290.0,140.0,319.0,140.0,319.0,181.0,290.0,181.0,p,324.0,139.0,351.0,139.0,351.0,170.0,324.0,170.0,a,357.0,128.0, 377.0,128.0,377.0,170.0,357.0,170.0,t,385.0,129.0,411.0,129.0,411.0,170.0,385.0,170.0,h

As you can see, this also has character-level annotation.

But for the ICDAR2015 training set, the annotation looks like this:

377,117,463,117,465,130,378,130,Genaxis Theatre
493,115,519,115,519,131,493,131,[06]
374,155,409,155,409,170,374,170,###
492,151,551,151,551,170,492,170,62-03
376,198,422,198,422,212,376,212,Carpark
494,190,539,189,539,205,494,206,###
374,1,494,0,492,85,372,86,###

As you can see, this does not have character-level annotation. Can anyone guide me on how to get character-level annotation for this dataset?

Thanks in advance.
opened by DecentMakeover 2
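
The ICDAR2013-style line quoted in this question appears to be a flat sequence of groups, each made of 8 coordinates followed by a label: first the word quadrangle and its transcription, then one group per character. The sketch below parses a line under that assumption. It is inferred only from the sample above, not from the repository's loader, and it would break on transcriptions that themselves contain commas. The function name parse_gt_line is made up for this example.

```python
def parse_gt_line(line):
    """Split one ground-truth line into (word_entry, char_entries).

    Each entry is (polygon, label), where polygon is a list of four (x, y) points.
    """
    fields = [f.strip() for f in line.split(",")]
    if not fields or len(fields) % 9 != 0:
        raise ValueError("expected groups of 8 coordinates plus 1 label")
    entries = []
    for i in range(0, len(fields), 9):
        coords = [float(v) for v in fields[i:i + 8]]
        polygon = list(zip(coords[0::2], coords[1::2]))
        entries.append((polygon, fields[i + 8]))
    return entries[0], entries[1:]


if __name__ == "__main__":
    # Sample shortened to the word and its first two characters.
    sample = ("158.0,128.0,411.0,128.0,411.0,181.0,158.0,181.0,Footpath,"
              "158.0,131.0,187.0,131.0,187.0,172.0,158.0,172.0,F,"
              "189.0,139.0,219.0,139.0,219.0,171.0,189.0,171.0,o")
    word, chars = parse_gt_line(sample)
    print(word[1], [label for _, label in chars])
```

The ICDAR2015 lines quoted above contain only the word group, so under this reading they carry no character entries to parse.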
format of results

Hi

I don't understand the format of the results. This is one line of the results text file:

618,140,662,159,618,140,660,140,660,157,618,157,ahead,0.99701583,0.9119489312171936,./train/shrink++_finetune/icdar2015_test/model_iter79999.pkl_results/res_img_1_0.mat

There are 12 numbers, then a word, then two more numbers. What do the 12 numbers mean? I checked the ICDAR data and it has 8 numbers.

Also, why are there 2 confidence scores?

Any suggestions would be really helpful.

Thanks in advance.

opened by DecentMakeover 2
I guess there is something wrong with _get_dataset_inds(self) in mix_loader.py, line 146

When I set MIX_RATIOS = [0.4, 0.4, 0.2] in the .yaml file, it stops at assert(len(self._dataset_inds) == self._num_gpus*cfg.TRAIN.IMS_PER_BATCH), and self._dataset_inds = [].

Then I set MIX_RATIOS = [2.0, 2.0, 1.0] in the .yaml file and commented out the assert, since self._dataset_inds = [0, 0, 1, 1, 2]. It seems to work as expected.

Do you have any idea what is going on? Or could you please explain the assert?
opened by Ocelot7777 2
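
The body of _get_dataset_inds is not shown in this thread, so the function below is a hypothetical stand-in, not the repository's code. It illustrates one simple rule that would reproduce both reported outcomes: each ratio is truncated to an integer and the dataset index is repeated that many times. Under such a rule, fractional ratios below 1 all truncate to zero and the index list comes out empty, which would trip the length assert.

```python
def expand_ratios(mix_ratios):
    """Hypothetical rule: repeat dataset index i int(ratio_i) times."""
    inds = []
    for i, ratio in enumerate(mix_ratios):
        inds.extend([i] * int(ratio))
    return inds


if __name__ == "__main__":
    print(expand_ratios([0.4, 0.4, 0.2]))
    print(expand_ratios([2.0, 2.0, 1.0]))
```

If the real code behaves like this, the assert would only pass when the truncated counts sum to the number of GPUs times TRAIN.IMS_PER_BATCH. Checking the actual function in mix_loader.py is the only way to confirm.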
has no task_evaluation

from datasets import task_evaluation
ImportError: cannot import name task_evaluation

There is no task_evaluation in datasets.

Could you help me?

Thank you very much.

opened by 10183308 2
Channel mismatch when training with your weights 'model_iter79999'

AssertionError: Workspace blob rpn_cls_logits_fpn2_w with shape (4, 256, 1, 1) does not match weights file shape (3, 256, 1, 1)

But I can train with R-50. Why?

opened by Rosesor 1

ValueError: need more than 2 values to unpack

When I test infer.py with the ICDAR2015 dataset, it prints the following error:

Traceback (most recent call last):
  File "tools/test_net.py", line 169, in <module>
    main(ind_range=args.range, multi_gpu_testing=args.multi_gpu_testing, vis=vis)
  File "tools/test_net.py", line 139, in main
    parent_func(multi_gpu=multi_gpu_testing, vis=vis)
  File "./lib/core/test_engine.py", line 64, in test_net_on_dataset
    test_net(vis=vis)
  File "./lib/core/test_engine.py", line 150, in test_net
    model, im, image_name, box_proposals, timers, vis=vis
  File "./lib/core/test.py", line 150, in im_detect_all
    text, rec_score, rec_char_scores = getstr_grid(char_masks[index,:,:,:].copy(), box_w, box_h)
  File "./lib/core/test.py", line 1107, in getstr_grid
    string, score, rec_scores = seg2text(pos, mask_index, seg)
  File "./lib/core/test.py", line 1216, in seg2text
    im2, contours, hierarchy = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
ValueError: need more than 2 values to unpack

opened by chenjun2hao 1
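
cv2.findContours returns three values (image, contours, hierarchy) in OpenCV 3.x but only two (contours, hierarchy) in OpenCV 2.x and 4.x, so the three-way unpacking in seg2text fails on those versions. A version-independent way to read the result is to take the last two items. The helper name contours_and_hierarchy below is made up for this sketch. It takes the already-returned tuple, so it can be tried without OpenCV installed.

```python
def contours_and_hierarchy(find_contours_result):
    """Return (contours, hierarchy) from either a 2-tuple or a 3-tuple result."""
    return find_contours_result[-2], find_contours_result[-1]


# Intended use at the failing line in lib/core/test.py (requires cv2):
#   contours, hierarchy = contours_and_hierarchy(
#       cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE))

if __name__ == "__main__":
    # Fake results standing in for the OpenCV 3.x and 4.x return shapes.
    print(contours_and_hierarchy(("image", ["contour"], "hierarchy")))
    print(contours_and_hierarchy((["contour"], "hierarchy")))
```

Installing an OpenCV 3.x build, as the requirement opencv-python>=3.0 may have intended, is the alternative to editing the code.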

import lanms

When I import lanms, it prints this error:

Traceback (most recent call last):
  File "tools/test_net.py", line 40, in <module>
    from core.test_engine import test_net, test_net_on_dataset
  File "/media/chenjun/ed/31_ocr_own/masktextspotter.caffe2/lib/core/test_engine.py", line 37, in <module>
    from core.test import im_detect_all
  File "/media/chenjun/ed/31_ocr_own/masktextspotter.caffe2/lib/core/test.py", line 49, in <module>
    import lanms
  File "/home/chenjun/anaconda2/lib/python2.7/site-packages/lanms/__init__.py", line 2, in <module>
    from .adaptor import merge_quadrangle_n9 as nms_impl
ImportError: /home/chenjun/anaconda2/lib/python2.7/site-packages/lanms/adaptor.so: undefined symbol: PyInstanceMethod_Type

opened by chenjun2hao 1
Polygon inputs for training custom dataset

Hi,

If I want to train the model on a custom dataset, how can I provide polygon coordinates for the text annotations? In ICDAR2013 and ICDAR2015, only quadrangle coordinates are given, i.e. the coordinates of the 4 corner points. What if I want to give the coordinates of more than 4 points, as in the case of curved text? Please let me know how I can do that.

Thanks in advance.

opened by harshall28 4
AttributeError: Method AffineChannel is not a registered operator.

WARNING cnn.py: 40: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
Traceback (most recent call last):
  File "tools/test_net.py", line 157, in <module>
    main(ind_range=args.range, multi_gpu_testing=args.multi_gpu_testing, vis=vis)
  File "tools/test_net.py", line 127, in main
    parent_func(multi_gpu=multi_gpu_testing, vis=vis)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/core/test_engine.py", line 64, in test_net_on_dataset
    test_net(vis=vis)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/core/test_engine.py", line 126, in test_net
    model = initialize_model_from_cfg()
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/core/test_engine.py", line 158, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 119, in create
    return get_func(model_type_func)(model)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 91, in generalized_rcnn
    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 224, in build_generic_detection_model
    optim.build_data_parallel_model(model, _single_gpu_build_func)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/optimizer.py", line 51, in build_data_parallel_model
    single_gpu_build_func(model)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 164, in _single_gpu_build_func
    blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/FPN.py", line 47, in add_fpn_ResNet50_conv5_body
    model, ResNet.add_ResNet50_conv5_body, fpn_level_info_ResNet50_conv5
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/FPN.py", line 103, in add_fpn_onto_conv_body
    conv_body_func(model)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/ResNet.py", line 39, in add_ResNet50_conv5_body
    return add_ResNet_convX_body(model, (3, 4, 6, 3), use_deformable=use_deformable)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/ResNet.py", line 98, in add_ResNet_convX_body
    p = model.AffineChannel(p, 'res_conv1_bn', inplace=True)
  File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/detector.py", line 104, in AffineChannel
    return self.net.AffineChannel([blob_in, scale, bias], blob_in)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/core.py", line 2040, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

I wonder whether this problem results from an incorrect installation of the Detectron package. By the way, when I installed Detectron following the Dockerfile it includes, no problems were reported.

opened by Shualite 1
AttributeError: Method DeformConv is not a registered operator. Did you mean: []

I ran the following command: python tools/test_net.py --cfg configs/text/mask_textspotter.yaml and got an error like this:

  File "/home/brooklyn/anaconda3/envs/conda_py2/lib/python2.7/site-packages/caffe2/python/core.py", line 2205, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method DeformConv is not a registered operator. Did you mean: []

How come? Thanks for your help!

opened by brooklyn1900 0
cannot import name test_retinanet

Hi @lvpengyuan,

I am getting the following error while running python tools/test_net.py --cfg configs/text/mask_textspotter.yaml:

Traceback (most recent call last):
  File "tools/test_net.py", line 42, in <module>
    from core.test_retinanet import test_retinanet
ImportError: cannot import name test_retinanet

opened by archanray 2
