The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Overview

Mask TextSpotter

A PyTorch implementation of Mask TextSpotter, along with its extension, can be found here.

Introduction

This is the official implementation of Mask TextSpotter.

Mask TextSpotter is an end-to-end trainable neural network for spotting text with arbitrary shapes.

For more details, please refer to our paper.

Citing the paper

Please cite the paper in your publications if it helps your research:

@inproceedings{LyuLYWB18,
  author    = {Pengyuan Lyu and
               Minghui Liao and
               Cong Yao and
               Wenhao Wu and
               Xiang Bai},
  title     = {Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes},
  booktitle = {Proc. ECCV},
  pages     = {71--88},
  year      = {2018}
}

Contents

  1. Requirements
  2. Installation
  3. Models
  4. Datasets
  5. Test
  6. Train

Requirements

  • NVIDIA GPU, Linux, Python2
  • Caffe2, various standard Python packages

Installation

Caffe2

To install Caffe2 with CUDA support, follow the installation instructions from the Caffe2 website. If you already have Caffe2 installed, make sure to update your Caffe2 to a version that includes the Detectron module.

Please ensure that your Caffe2 installation was successful before proceeding by running the following commands and checking their output as directed in the comments.

# To check if Caffe2 build was successful
python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"

# To check if Caffe2 GPU build was successful
# This must print a number > 0 in order to use Detectron
python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'

If the caffe2 Python package is not found, you likely need to adjust your PYTHONPATH environment variable to include its location (/path/to/caffe2/build, where build is the Caffe2 CMake build directory).
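The import check can also be run from Python. The sketch below is a minimal helper, not part of this repository (the name `module_available` is hypothetical); it reports whether a module can be imported and, if not, prints the search path so that a missing /path/to/caffe2/build entry is easy to spot.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported with the current sys.path."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    if module_available("caffe2.python"):
        print("caffe2.python is importable")
    else:
        # Show where Python looked, one entry per line.
        print("caffe2.python not found; sys.path entries:")
        for entry in sys.path:
            print("  " + entry)
```

If the module is reported as missing, compare the printed entries with the location of your Caffe2 build directory.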

Install Python dependencies:

pip install numpy pyyaml matplotlib 'opencv-python>=3.0' setuptools Cython mock

Set up Python modules:

cd $ROOT_DIR/lib && make

Note: Caffe2 can be difficult to install. Confirm that both checks above succeed before continuing.

Models

Download the model and place it at models/model_iter79999.pkl. Our trained model: Google Drive; BaiduYun (key of BaiduYun: gnpc)

Datasets

Download ICDAR2013 (Google Drive, BaiduYun) and ICDAR2015 (Google Drive, BaiduYun) as examples. Datasets should be placed in lib/datasets/data/ as below:

synth
icdar2013
icdar2015
scut-eng-char
totaltext

If you do not plan to train the model, you only need to download the ICDAR2013 or ICDAR2015 dataset for testing.
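Before running a test or training job, it can help to confirm that the expected directories are in place. The following is a minimal sketch, assuming the directory names listed above; the helper `missing_datasets` is hypothetical and not part of this repository.

```python
import os

# Directory names taken from the layout listed above.
EXPECTED_DATASETS = ["synth", "icdar2013", "icdar2015", "scut-eng-char", "totaltext"]


def missing_datasets(data_root, names=EXPECTED_DATASETS):
    """Return the dataset directory names that are absent under data_root."""
    return [n for n in names if not os.path.isdir(os.path.join(data_root, n))]


if __name__ == "__main__":
    # For testing only, checking a single dataset is enough.
    print(missing_datasets("lib/datasets/data", ["icdar2015"]))
```

An empty list means every requested directory was found.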

Test

python tools/test_net.py --cfg configs/text/mask_textspotter.yaml

You can modify the model path or the test dataset in configs/text/mask_textspotter.yaml.

Train

Format all the datasets you use for training as above. Then modify configs/text/mask_textspotter.yaml to match your GPUs, model path, and datasets.

python tools/train_net.py --cfg configs/text/mask_textspotter.yaml
Comments
  • Questions about datasets and testsets.

    I followed your instructions and set up the masktextspotter environment. I am now using the dataset and test set you provided.

    python tools/test_net.py --cfg configs/text/mask_textspotter.yaml

    It prints the following output, and I do not know how long it should take to run:

    (caffe2_env) zhoujianwen@zhoujianwen-System:~/masktextspotter.caffe2$ python tools/test_net.py --cfg configs/text/mask_textspotter.yaml
    
    make: Entering directory '/home/zhoujianwen/masktextspotter.caffe2/lanms'
    make: 'adaptor.so' is up to date.
    make: Leaving directory '/home/zhoujianwen/masktextspotter.caffe2/lanms'
    INFO test_net.py: 141: Called with args:
    INFO test_net.py: 142: Namespace(cfg_file='configs/text/mask_textspotter.yaml', multi_gpu_testing=False, opts=[], range=None, vis=False, wait=True)
    /home/zhoujianwen/masktextspotter.caffe2/lib/core/config.py:1094: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      yaml_cfg = AttrDict(yaml.load(f))
    INFO test_net.py: 148: Testing with config:
    INFO test_net.py: 149: {'BBOX_XFORM_CLIP': 4.135166556742356,
     'CLUSTER': {'ON_CLUSTER': False},
     'DATA_LOADER': {'NUM_THREADS': 4},
     'DEDUP_BOXES': 0.0625,
     'DOWNLOAD_CACHE': '/tmp/detectron-download-cache',
     'EPS': 1e-14,
     'EXPECTED_RESULTS': [],
     'EXPECTED_RESULTS_ATOL': 0.005,
     'EXPECTED_RESULTS_EMAIL': '',
     'EXPECTED_RESULTS_RTOL': 0.1,
     'FAST_RCNN': {'MLP_HEAD_DIM': 1024,
                   'ROI_BOX_HEAD': 'fast_rcnn_heads.add_roi_2mlp_head',
                   'ROI_XFORM_METHOD': 'RoIAlign',
                   'ROI_XFORM_RESOLUTION': 7,
                   'ROI_XFORM_SAMPLING_RATIO': 2},
     'FPN': {'COARSEST_STRIDE': 32,
             'DIM': 256,
             'EXTRA_CONV_LEVELS': False,
             'FPN_ON': True,
             'MULTILEVEL_ROIS': True,
             'MULTILEVEL_RPN': True,
             'ROI_CANONICAL_LEVEL': 4,
             'ROI_CANONICAL_SCALE': 224,
             'ROI_MAX_LEVEL': 5,
             'ROI_MIN_LEVEL': 2,
             'RPN_ANCHOR_START_SIZE': 32,
             'RPN_ASPECT_RATIOS': (0.5, 1, 2),
             'RPN_MAX_LEVEL': 6,
             'RPN_MIN_LEVEL': 2,
             'USE_DEFORMABLE': False,
             'ZERO_INIT_LATERAL': False},
     'IMAGE': {'aug': False,
               'brightness_delta': 32,
               'brightness_prob': 0.5,
               'contrast_lower': 0.5,
               'contrast_prob': 0.5,
               'contrast_upper': 1.5,
               'hue_delta': 18,
               'hue_prob': 0.5,
               'lighting_noise_prob': 0.5,
               'rotate_delta': 15,
               'rotate_prob': 0.5,
               'saturation_lower': 0.5,
               'saturation_prob': 0.5,
               'saturation_upper': 1.5},
     'KRCNN': {'CONV_HEAD_DIM': 256,
               'CONV_HEAD_KERNEL': 3,
               'CONV_INIT': 'GaussianFill',
               'DECONV_DIM': 256,
               'DECONV_KERNEL': 4,
               'DILATION': 1,
               'HEATMAP_SIZE': -1,
               'INFERENCE_MIN_SIZE': 0,
               'KEYPOINT_CONFIDENCE': 'bbox',
               'LOSS_WEIGHT': 1.0,
               'MIN_KEYPOINT_COUNT_FOR_VALID_MINIBATCH': 20,
               'NMS_OKS': False,
               'NORMALIZE_BY_VISIBLE_KEYPOINTS': True,
               'NUM_KEYPOINTS': -1,
               'NUM_STACKED_CONVS': 8,
               'ROI_KEYPOINTS_HEAD': '',
               'ROI_XFORM_METHOD': 'RoIAlign',
               'ROI_XFORM_RESOLUTION': 7,
               'ROI_XFORM_SAMPLING_RATIO': 0,
               'UP_SCALE': -1,
               'USE_DECONV': False,
               'USE_DECONV_OUTPUT': False},
     'MATLAB': 'matlab',
     'MEMONGER': True,
     'MEMONGER_SHARE_ACTIVATIONS': False,
     'MODEL': {'BBOX_REG_WEIGHTS': (10.0, 10.0, 5.0, 5.0),
               'CLS_AGNOSTIC_BBOX_REG': False,
               'CONV_BODY': 'FPN.add_fpn_ResNet50_conv5_body',
               'EXECUTION_TYPE': 'dag',
               'FASTER_RCNN': True,
               'KEYPOINTS_ON': False,
               'MASK_ON': True,
               'NAME': 'shrink++',
               'NUM_CLASSES': 2,
               'RPN_ONLY': False,
               'TYPE': 'generalized_rcnn'},
     'MRCNN': {'CLS_SPECIFIC_MASK': True,
               'CONV_INIT': 'MSRAFill',
               'DILATION': 1,
               'DIM_REDUCED': 256,
               'IS_E2E': True,
               'MASK_BATCH_SIZE_PER_IM': 16,
               'RESOLUTION': 28,
               'RESOLUTION_H': 32,
               'RESOLUTION_W': 128,
               'ROI_MASK_HEAD': 'text_mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs',
               'ROI_XFORM_METHOD': 'RoIAlign',
               'ROI_XFORM_RESOLUTION': 14,
               'ROI_XFORM_RESOLUTION_H': 16,
               'ROI_XFORM_RESOLUTION_W': 64,
               'ROI_XFORM_SAMPLING_RATIO': 2,
               'THRESH_BINARIZE': 0.5,
               'UPSAMPLE_RATIO': 1,
               'USE_FC_OUTPUT': False,
               'WEIGHT_LOSS_CHAR_BOX': 1.0,
               'WEIGHT_LOSS_MASK': 1.0,
               'WEIGHT_WH': True},
     'NUM_GPUS': 1,
     'OUTPUT_DIR': '.',
     'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
     'RESNETS': {'NUM_GROUPS': 1,
                 'RES5_DILATION': 1,
                 'STRIDE_1X1': True,
                 'TRANS_FUNC': 'bottleneck_transformation',
                 'WIDTH_PER_GROUP': 64},
     'RETINANET': {'ANCHOR_SCALE': 4,
                   'ASPECT_RATIOS': (0.25, 0.5, 1.0, 2.0, 4.0),
                   'BBOX_REG_BETA': 0.11,
                   'BBOX_REG_WEIGHT': 1.0,
                   'CLASS_SPECIFIC_BBOX': False,
                   'INFERENCE_TH': 0.05,
                   'LOSS_ALPHA': 0.25,
                   'LOSS_GAMMA': 2.0,
                   'NEGATIVE_OVERLAP': 0.4,
                   'NUM_CONVS': 4,
                   'POSITIVE_OVERLAP': 0.5,
                   'PRE_NMS_TOP_N': 1000,
                   'PRIOR_PROB': 0.01,
                   'RETINANET_ON': False,
                   'SCALES_PER_OCTAVE': 3,
                   'SHARE_CLS_BBOX_TOWER': False,
                   'SOFTMAX': False},
     'RFCN': {'PS_GRID_SIZE': 3},
     'RNG_SEED': 3,
     'ROOT_DIR': '/home/zhoujianwen/masktextspotter.caffe2',
     'RPN': {'ASPECT_RATIOS': (0.5, 1, 2),
             'RPN_ON': True,
             'SIZES': (64, 128, 256, 512),
             'STRIDE': 16},
     'SOLVER': {'BASE_LR': 0.005,
                'GAMMA': 0.1,
                'LOG_LR_CHANGE_THRESHOLD': 1.1,
                'LRS': [],
                'LR_POLICY': 'steps_with_decay',
                'MAX_ITER': 200000,
                'MOMENTUM': 0.9,
                'SCALE_MOMENTUM': True,
                'SCALE_MOMENTUM_THRESHOLD': 1.1,
                'STEPS': [0, 120000],
                'STEP_SIZE': 30000,
                'WARM_UP_FACTOR': 0.3333333333333333,
                'WARM_UP_ITERS': 500,
                'WARM_UP_METHOD': u'linear',
                'WEIGHT_DECAY': 0.0001},
     'TEST': {'BBOX_AUG': {'AREA_TH_HI': 32400,
                           'AREA_TH_LO': 2500,
                           'ASPECT_RATIOS': (),
                           'ASPECT_RATIO_H_FLIP': False,
                           'COORD_HEUR': 'UNION',
                           'ENABLED': False,
                           'H_FLIP': False,
                           'MAX_SIZE': 2000,
                           'SCALES': (800,),
                           'SCALE_H_FLIP': False,
                           'SCALE_SIZE_DEP': False,
                           'SCORE_HEUR': 'UNION'},
              'BBOX_REG': True,
              'BBOX_VOTE': {'ENABLED': True,
                            'SCORING_METHOD': 'ID',
                            'SCORING_METHOD_BETA': 1.0,
                            'VOTE_TH': 0.9},
              'COMPETITION_MODE': True,
              'DATASET': '',
              'DATASETS': ('icdar2015_test',),
              'DETECTIONS_PER_IM': 100,
              'FORCE_JSON_DATASET_EVAL': False,
              'KPS_AUG': {'AREA_TH': 32400,
                          'ASPECT_RATIOS': (),
                          'ASPECT_RATIO_H_FLIP': False,
                          'ENABLED': False,
                          'HEUR': 'HM_AVG',
                          'H_FLIP': False,
                          'MAX_SIZE': 4000,
                          'SCALES': (),
                          'SCALE_H_FLIP': False,
                          'SCALE_SIZE_DEP': False},
              'MASK_AUG': {'AREA_TH': 32400,
                           'ASPECT_RATIOS': (),
                           'ASPECT_RATIO_H_FLIP': False,
                           'ENABLED': False,
                           'HEUR': 'SOFT_AVG',
                           'H_FLIP': False,
                           'MAX_SIZE': 3333,
                           'SCALES': (1600,),
                           'SCALE_H_FLIP': False,
                           'SCALE_SIZE_DEP': False},
              'MAX_SIZE': 3333,
              'NMS': 0.5,
              'NUM_TEST_IMAGES': 5000,
              'OUTPUT_POLYGON': False,
              'PRECOMPUTED_PROPOSALS': False,
              'PROPOSAL_FILE': '',
              'PROPOSAL_FILES': (),
              'PROPOSAL_LIMIT': 2000,
              'RPN_MIN_SIZE': 0,
              'RPN_NMS_THRESH': 0.7,
              'RPN_POST_NMS_TOP_N': 1000,
              'RPN_PRE_NMS_TOP_N': 1000,
              'SCALES': (1000,),
              'SCORE_THRESH': 0.2,
              'SOFT_NMS': {'ENABLED': False, 'METHOD': 'linear', 'SIGMA': 0.5},
              'VIS': False,
              'WEIGHTS': '/home/zhoujianwen/masktextspotter.caffe2/models/model_iter79999.pkl'},
     'TRAIN': {'ASPECT_GROUPING': True,
               'AUTO_RESUME': True,
               'BATCH_SIZE_PER_IM': 512,
               'BBOX_THRESH': 0.5,
               'BG_THRESH_HI': 0.5,
               'BG_THRESH_LO': 0.0,
               'CROWD_FILTER_THRESH': 0.7,
               'DATASETS': ('icdar2015_train',),
               'FG_FRACTION': 0.25,
               'FG_THRESH': 0.5,
               'FREEZE_CONV_BODY': False,
               'GT_MIN_AREA': -1,
               'IMS_PER_BATCH': 2,
               'MAX_SIZE': 1333,
               'MIX_RATIOS': [0.5, 0.25, 0.25],
               'MIX_TRAIN': False,
               'PROPOSAL_FILES': (),
               'RPN_BATCH_SIZE_PER_IM': 256,
               'RPN_FG_FRACTION': 0.5,
               'RPN_MIN_SIZE': 0,
               'RPN_NEGATIVE_OVERLAP': 0.3,
               'RPN_NMS_THRESH': 0.7,
               'RPN_POSITIVE_OVERLAP': 0.7,
               'RPN_POST_NMS_TOP_N': 2000,
               'RPN_PRE_NMS_TOP_N': 2000,
               'RPN_STRADDLE_THRESH': 0,
               'SCALES': (800,),
               'SNAPSHOT_ITERS': 10000,
               'USE_CHARANNS': [True],
               'USE_FLIPPED': False,
               'WEIGHTS': u'/tmp/detectron-download-cache/ImageNetPretrained/MSRA/R-50.pkl'},
     'USE_NCCL': False,
     'VIS': False,
     'VIS_TH': 0.9}
    WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
    INFO net.py:  54: Loading from: /home/zhoujianwen/masktextspotter.caffe2/models/model_iter79999.pkl
    /home/zhoujianwen/masktextspotter.caffe2/lib/utils/net.py:59: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      saved_cfg = yaml.load(src_blobs['cfg'])
    Traceback (most recent call last):
      File "tools/test_net.py", line 159, in <module>
        main(ind_range=args.range, multi_gpu_testing=args.multi_gpu_testing, vis=vis)
      File "tools/test_net.py", line 127, in main
        parent_func(multi_gpu=multi_gpu_testing, vis=vis)
      File "/home/zhoujianwen/masktextspotter.caffe2/lib/core/test_engine.py", line 64, in test_net_on_dataset
        test_net(vis=vis)
      File "/home/zhoujianwen/masktextspotter.caffe2/lib/core/test_engine.py", line 126, in test_net
        model = initialize_model_from_cfg()
      File "/home/zhoujianwen/masktextspotter.caffe2/lib/core/test_engine.py", line 160, in initialize_model_from_cfg
        model, cfg.TEST.WEIGHTS, broadcast=False
      File "/home/zhoujianwen/masktextspotter.caffe2/lib/utils/net.py", line 45, in initialize_from_weights_file
        initialize_gpu_0_from_weights_file(model, weights_file)
      File "/home/zhoujianwen/masktextspotter.caffe2/lib/utils/net.py", line 59, in initialize_gpu_0_from_weights_file
        saved_cfg = yaml.load(src_blobs['cfg'])
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/__init__.py", line 114, in load
        return loader.get_single_data()
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 45, in get_single_data
        return self.construct_document(node)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 49, in construct_document
        data = self.construct_object(node)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 96, in construct_object
        data = constructor(self, tag_suffix, node)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 628, in construct_python_object_new
        return self.construct_python_object_apply(suffix, node, newobj=True)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 611, in construct_python_object_apply
        value = self.construct_mapping(node, deep=True)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 214, in construct_mapping
        return BaseConstructor.construct_mapping(self, node, deep=deep)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 139, in construct_mapping
        value = self.construct_object(value_node, deep=deep)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 101, in construct_object
        for dummy in generator:
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 404, in construct_yaml_map
        value = self.construct_mapping(node)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 214, in construct_mapping
        return BaseConstructor.construct_mapping(self, node, deep=deep)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 139, in construct_mapping
        value = self.construct_object(value_node, deep=deep)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 96, in construct_object
        data = constructor(self, tag_suffix, node)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 617, in construct_python_object_apply
        instance = self.make_python_instance(suffix, node, args, kwds, newobj)
      File "/home/zhoujianwen/.local/lib/python2.7/site-packages/yaml/constructor.py", line 558, in make_python_instance
        node.start_mark)
    yaml.constructor.ConstructorError: while constructing a Python instance
    expected a class, but found <type 'builtin_function_or_method'>
      in "<string>", line 3, column 20:
          BBOX_XFORM_CLIP: !!python/object/apply:numpy.core ... 
    
    opened by zhoujianwen 11
  • ImportError: No module named lanms

    I ran python tools/test_net.py --cfg configs/text/mask_textspotter.yaml, but lanms cannot be imported. I have already run make in lib and in masktextspotter.caffe2/lanms.

    I use anaconda3 with python2.7; gcc and g++ are version 5.4.

    opened by huziling 9
  • ImportError: cannot import name test_retinanet

    Traceback (most recent call last):
      File "tools/test_net.py", line 42, in <module>
        from core.test_retinanet import test_retinanet
    ImportError: cannot import name test_retinanet

    Hello, I could not find the test_retinanet function in test_retinanet.py under Detectron's core directory.

    opened by TBS1234 2
  • How to get character-level annotations?

    Hi

    For training on ICDAR2013, the ground-truth file looks like this for a single word:

    158.0,128.0,411.0,128.0,411.0,181.0,158.0,181.0,Footpath,158.0,131.0,187.0,131.0,187.0,172.0,158.0,172.0,F,189.0,139.0,219.0,139.0,219.0,171.0,189.0,171.0,o,226.0,139.0,255.0,139.0,255.0,171.0,226.0,171.0,o,261.0,129.0,282.0,129.0,282.0,171.0,261.0,171.0,t,290.0,140.0,319.0,140.0,319.0,181.0,290.0,181.0,p,324.0,139.0,351.0,139.0,351.0,170.0,324.0,170.0,a,357.0,128.0, 377.0,128.0,377.0,170.0,357.0,170.0,t,385.0,129.0,411.0,129.0,411.0,170.0,385.0,170.0,h

    As you can see, this also has character-level annotation.
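The sample line above can be read as groups of nine comma-separated fields: eight coordinates followed by a text field, first for the word and then for each character. The sketch below parses a line under that assumption; the function name `parse_char_annotation` is hypothetical, and the code assumes the text fields themselves contain no commas.

```python
def parse_char_annotation(line):
    """Split one ground-truth line into a word entry and its character entries.

    Assumes every entry is 8 coordinates followed by one text field and that
    the text itself contains no commas.
    """
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) % 9 != 0:
        raise ValueError("expected a multiple of 9 fields, got %d" % len(fields))
    entries = []
    for i in range(0, len(fields), 9):
        coords = [float(v) for v in fields[i:i + 8]]
        # Pair up x,y values into four corner points.
        polygon = list(zip(coords[0::2], coords[1::2]))
        entries.append((polygon, fields[i + 8]))
    # The first entry is the word; the rest are its characters.
    return entries[0], entries[1:]
```

For the "Footpath" line this would yield one word entry and eight character entries, each with four corner points.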

    But for the ICDAR 2015 training set, the annotation looks like this:

    377,117,463,117,465,130,378,130,Genaxis Theatre
    493,115,519,115,519,131,493,131,[06]
    374,155,409,155,409,170,374,170,###
    492,151,551,151,551,170,492,170,62-03
    376,198,422,198,422,212,376,212,Carpark
    494,190,539,189,539,205,494,206,###
    374,1,494,0,492,85,372,86,###
    

    As you can see, this does not have character-level annotation. Can anyone guide me on how to get character-level annotation from this?

    Thanks in advance.

    opened by DecentMakeover 2
  • format of results

    Hi

    I don't understand the format of the results. This is one result line from the text file:

    618,140,662,159,618,140,660,140,660,157,618,157,ahead,0.99701583,0.9119489312171936,./train/shrink++_finetune/icdar2015_test/model_iter79999.pkl_results/res_img_1_0.mat

    There are 12 numbers, then a word, and then two more numbers. What do the 12 numbers mean? I checked the ICDAR data and it has 8 numbers.

    Also, why are there 2 confidence scores?
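A result line like the one quoted can at least be split into its parts mechanically. The sketch below does only that. The grouping into 4 + 8 numbers is a guess from the single sample (the first four values look like an axis-aligned box and the next eight like four polygon corners), and the meaning of the two scores is not confirmed by the source; `split_result_line` and the dictionary keys are hypothetical names.

```python
def split_result_line(line):
    """Split a result line into numeric fields, text, two scores, and a path.

    Layout assumed from one sample: 12 numbers, text, 2 scores, 1 path.
    """
    fields = line.strip().split(",")
    if len(fields) < 16:
        raise ValueError("expected at least 16 fields, got %d" % len(fields))
    numbers = [float(v) for v in fields[:12]]
    return {
        "box4": numbers[:4],    # assumption: x1, y1, x2, y2
        "poly8": numbers[4:],   # assumption: four (x, y) corners
        # Rejoin the middle fields in case the text itself contains commas.
        "text": ",".join(fields[12:-3]),
        "scores": (float(fields[-3]), float(fields[-2])),
        "path": fields[-1],
    }
```

Treat the key names as labels for discussion only until the output format is confirmed.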

    Any suggestions would be really helpful.

    Thanks in advance.

    opened by DecentMakeover 2
  • I guess there is something wrong with _get_dataset_inds(self) in mix_loader.py, line 146

    1. When I set MIX_RATIOS = [0.4, 0.4, 0.2] in the .yaml file, training stops at assert(len(self._dataset_inds) == self._num_gpus*cfg.TRAIN.IMS_PER_BATCH), and self._dataset_inds = [].

    2. Then I set MIX_RATIOS = [2.0, 2.0, 1.0] in the .yaml file and commented out the assert. self._dataset_inds is then [0, 0, 1, 1, 2], and it seems to work as expected.

    3. Do you have any idea why? Could you please explain the assert?
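Both observations above are consistent with index construction that truncates each ratio to an integer. The sketch below is a hypothetical reconstruction, not the code in mix_loader.py; it only shows one rule that reproduces the two reported values of self._dataset_inds.

```python
def dataset_inds_from_ratios(ratios):
    """Repeat dataset index i int(ratios[i]) times (hypothetical behaviour)."""
    inds = []
    for i, ratio in enumerate(ratios):
        # int() truncates, so any ratio below 1.0 contributes nothing.
        inds.extend([i] * int(ratio))
    return inds


if __name__ == "__main__":
    print(dataset_inds_from_ratios([0.4, 0.4, 0.2]))
    print(dataset_inds_from_ratios([2.0, 2.0, 1.0]))
```

Under this reading, the assert would pass only when the number of generated indices equals self._num_gpus*cfg.TRAIN.IMS_PER_BATCH, which would be 5 for [2.0, 2.0, 1.0].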

    opened by Ocelot7777 2
  • has no  task_evaluation

    from datasets import task_evaluation
    ImportError: cannot import name task_evaluation

    The datasets package has no task_evaluation.

    Could you help me?

    Thank you very much.

    opened by 10183308 2
  • when using your weight ‘model_iter79999’ to train, channel is not matched

    AssertionError: Workspace blob rpn_cls_logits_fpn2_w with shape (4, 256, 1, 1) does not match weights file shape (3, 256, 1, 1)

    But I can train with R-50. Why?

    opened by Rosesor 1
  • ValueError: need more than 2 values to unpack

    When testing infer.py with the ICDAR2015 dataset, it prints the following error:

    Traceback (most recent call last):
      File "tools/test_net.py", line 169, in <module>
        main(ind_range=args.range, multi_gpu_testing=args.multi_gpu_testing, vis=vis)
      File "tools/test_net.py", line 139, in main
        parent_func(multi_gpu=multi_gpu_testing, vis=vis)
      File "./lib/core/test_engine.py", line 64, in test_net_on_dataset
        test_net(vis=vis)
      File "./lib/core/test_engine.py", line 150, in test_net
        model, im, image_name, box_proposals, timers, vis=vis
      File "./lib/core/test.py", line 150, in im_detect_all
        text, rec_score, rec_char_scores = getstr_grid(char_masks[index,:,:,:].copy(), box_w, box_h)
      File "./lib/core/test.py", line 1107, in getstr_grid
        string, score, rec_scores = seg2text(pos, mask_index, seg)
      File "./lib/core/test.py", line 1216, in seg2text
        im2, contours, hierarchy = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
    ValueError: need more than 2 values to unpack
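This error is typical of an OpenCV version difference: cv2.findContours returns three values (image, contours, hierarchy) in OpenCV 3.x but only two (contours, hierarchy) in 2.x and 4.x. A minimal version-agnostic sketch is to take the last two items of whatever is returned; the helper name `contours_and_hierarchy` is hypothetical and not part of this repository.

```python
def contours_and_hierarchy(find_contours_result):
    """Return (contours, hierarchy) from either a 2-tuple or a 3-tuple."""
    # The last two items are contours and hierarchy in every OpenCV version.
    return find_contours_result[-2], find_contours_result[-1]


# Intended use at the failing call site (requires cv2 and a binary image):
# contours, hierarchy = contours_and_hierarchy(
#     cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE))
```

Alternatively, installing an OpenCV 3.x build would match the three-value unpacking the code expects.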
    
    opened by chenjun2hao 1
  • import lanms

    When I import lanms, it prints this error:

    Traceback (most recent call last):
      File "tools/test_net.py", line 40, in <module>
        from core.test_engine import test_net, test_net_on_dataset
      File "/media/chenjun/ed/31_ocr_own/masktextspotter.caffe2/lib/core/test_engine.py", line 37, in <module>
        from core.test import im_detect_all
      File "/media/chenjun/ed/31_ocr_own/masktextspotter.caffe2/lib/core/test.py", line 49, in <module>
        import lanms
      File "/home/chenjun/anaconda2/lib/python2.7/site-packages/lanms/__init__.py", line 2, in <module>
        from .adaptor import merge_quadrangle_n9 as nms_impl
    ImportError: /home/chenjun/anaconda2/lib/python2.7/site-packages/lanms/adaptor.so: undefined symbol: PyInstanceMethod_Type
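One possible reading of this error: PyInstanceMethod_Type is a symbol from the Python 3 C API, so adaptor.so may have been compiled against Python 3 headers and then loaded by Python 2.7. The sketch below prints which interpreter version and header directory are active, so they can be compared with the ones used when lanms was built. The helper name `interpreter_build_info` is hypothetical.

```python
import sys
import sysconfig


def interpreter_build_info():
    """Return the running interpreter's version and C header directory."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "include": sysconfig.get_paths()["include"],
    }


if __name__ == "__main__":
    info = interpreter_build_info()
    print("Python " + info["version"])
    print("Headers: " + info["include"])
```

If the headers used by the lanms build point at a different Python than the one running tools/test_net.py, rebuilding lanms with the matching interpreter is a reasonable next step.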

    opened by chenjun2hao 1
  • Polygon inputs for training custom dataset

    Hi,

    If I want to train the model on a custom dataset, how can I provide polygon coordinates for the text annotations? In ICDAR2013 and ICDAR2015, only quadrangle coordinates are given, i.e., the coordinates of the 4 corner points. What if I want to give coordinates for more than 4 points, as in the case of curved text? Please let me know how I can do that.

    Thanks in advance.

    opened by harshall28 4
  • AttributeError: Method AffineChannel is not a registered operator.

    WARNING cnn.py: 40: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
    Traceback (most recent call last):
      File "tools/test_net.py", line 157, in <module>
        main(ind_range=args.range, multi_gpu_testing=args.multi_gpu_testing, vis=vis)
      File "tools/test_net.py", line 127, in main
        parent_func(multi_gpu=multi_gpu_testing, vis=vis)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/core/test_engine.py", line 64, in test_net_on_dataset
        test_net(vis=vis)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/core/test_engine.py", line 126, in test_net
        model = initialize_model_from_cfg()
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/core/test_engine.py", line 158, in initialize_model_from_cfg
        model = model_builder.create(cfg.MODEL.TYPE, train=False)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 119, in create
        return get_func(model_type_func)(model)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 91, in generalized_rcnn
        freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 224, in build_generic_detection_model
        optim.build_data_parallel_model(model, _single_gpu_build_func)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/optimizer.py", line 51, in build_data_parallel_model
        single_gpu_build_func(model)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/model_builder.py", line 164, in _single_gpu_build_func
        blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/FPN.py", line 47, in add_fpn_ResNet50_conv5_body
        model, ResNet.add_ResNet50_conv5_body, fpn_level_info_ResNet50_conv5
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/FPN.py", line 103, in add_fpn_onto_conv_body
        conv_body_func(model)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/ResNet.py", line 39, in add_ResNet50_conv5_body
        return add_ResNet_convX_body(model, (3, 4, 6, 3), use_deformable=use_deformable)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/ResNet.py", line 98, in add_ResNet_convX_body
        p = model.AffineChannel(p, 'res_conv1_bn', inplace=True)
      File "/root/fsy_SceneTextRec/masktextspotter.caffe2-master/lib/modeling/detector.py", line 104, in AffineChannel
        return self.net.AffineChannel([blob_in, scale, bias], blob_in)
      File "/usr/local/lib/python2.7/dist-packages/caffe2/python/core.py", line 2040, in __getattr__
        ",".join(workspace.C.nearby_opnames(op_type)) + ']'
    AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

    I wonder whether this problem results from an incorrect installation of the Detectron package. By the way, when I installed Detectron by following its bundled Dockerfile, no problems were shown.

    opened by Shualite 1
  • AttributeError: Method DeformConv is not a registered operator. Did you mean: []

    I ran the following command:

    python tools/test_net.py --cfg configs/text/mask_textspotter.yaml

    and got this error:

      File "/home/brooklyn/anaconda3/envs/conda_py2/lib/python2.7/site-packages/caffe2/python/core.py", line 2205, in __getattr__
        ",".join(workspace.C.nearby_opnames(op_type)) + ']'
    AttributeError: Method DeformConv is not a registered operator. Did you mean: []

    How come? Thanks for help!

    opened by brooklyn1900 0
  • cannot import name test_retinanet

    Hi @lvpengyuan,

    I am getting the following error while running python tools/test_net.py --cfg configs/text/mask_textspotter.yaml:

    Traceback (most recent call last):
      File "tools/test_net.py", line 42, in <module>
        from core.test_retinanet import test_retinanet
    ImportError: cannot import name test_retinanet

    opened by archanray 2
Owner
Pengyuan Lyu