PyTorch re-implementation of the paper SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

mxin262

Last update: Jan 3, 2023

Related tags

Deep Learning SwinTextSpotter

Overview

SwinTextSpotter

This is the PyTorch implementation of the paper SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022). The paper is available at this link.

We use a backbone pre-trained on ImageNet. The ImageNet pre-trained Swin Transformer backbone is obtained from SwinT_detectron2.

Models

SWINTS-swin-english-pretrain [config] | model_Google Drive | model_BaiduYun PW: 954t

SWINTS-swin-Total-Text [config] | model_Google Drive | model_BaiduYun PW: tf0i

SWINTS-swin-ctw [config] | model_Google Drive | model_BaiduYun PW: 4etq

SWINTS-swin-icdar2015 [config] | model_Google Drive | model_BaiduYun PW: 3n82

SWINTS-swin-ReCTS [config] | model_Google Drive | model_BaiduYun PW: a4be

SWINTS-swin-vintext [config] | model_Google Drive | model_BaiduYun PW: slmp

Installation

Python=3.8
PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
OpenCV for visualization

Steps

Install the repository (we recommend using Anaconda for installation).

conda create -n SWINTS python=3.8 -y
conda activate SWINTS
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/mxin262/SwinTextSpotter.git
cd SwinTextSpotter
python setup.py build develop
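
Once the environment is built, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch and not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the packages are registered under the distribution names `torch` and `torchvision`, which may not hold for every conda build.

```python
from importlib import metadata

# Versions taken from the Installation list above.
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0"}


def check_versions(expected, lookup=metadata.version):
    """Return a {package: status} report; `lookup` is injectable for testing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Drop a local build suffix so "1.8.0+cu111" still matches "1.8.0".
        if installed.split("+")[0] == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {installed})"
    return report


if __name__ == "__main__":
    for package, status in check_versions(EXPECTED).items():
        print(f"{package}: {status}")
```

A "mismatch" line is only a hint: one of the comments further down reports running with PyTorch 1.10.0, so other versions may work, but they are not the ones this README lists.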

dataset path

datasets
|_ totaltext
|  |_ train_images
|  |_ test_images
|  |_ totaltext_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ mlt2017
|  |_ train_images
|  |_ annotations/icdar_2017_mlt.json
.......
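
Before training or evaluation, the layout above can be checked with a few lines of Python. This is a sketch under the assumption that the tree is complete as printed for the two datasets shown; the function name `missing_paths` is hypothetical, and the list needs extending for the datasets hidden behind the dots.

```python
from pathlib import Path

# Relative paths copied from the tree above; extend the list for other datasets.
EXPECTED_PATHS = [
    "totaltext/train_images",
    "totaltext/test_images",
    "totaltext/totaltext_train.json",
    "totaltext/weak_voc_new.txt",
    "totaltext/weak_voc_pair_list.txt",
    "mlt2017/train_images",
    "mlt2017/annotations/icdar_2017_mlt.json",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("datasets")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print(f"  datasets/{rel}")
    else:
        print("All listed dataset paths are present.")
```

Running it from the repository root lists anything absent, for example the `weak_voc_new.txt` lexicon file that one of the comments further down reports as not found.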

Download images

ICDAR2017-MLT [image]
Syntext-150k:
- Part1: 94,723 [dataset]
- Part2: 54,327 [dataset]
ICDAR2015 [image]
ICDAR2013 [image]
Total-Text_train_images [image]
Total-Text_test_images [image]
ReCTS [images&label] PW: 2b4q
LSVT [images&label] PW: 9uh1
ArT [images&label] PW: 2865
SynChinese130k [images][label]
Vintext_images [image]

Download the labels: [Google Drive] [BaiduYun] PW: 46vd

Download the lexicon [Google Drive] and place it in the corresponding dataset directory.

You can also prepare a custom dataset by following the example scripts. [example scripts]
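
The training log quoted in the comments further down reports that annotation files are loaded "in COCO format" and that images with no usable annotations are removed. Assuming a custom annotation file follows the standard COCO layout (top-level `images` and `annotations` lists linked by `image_id`), a quick sanity check before training might look like the sketch below. The function name `summarize_coco` is hypothetical, and any text-specific fields the repository expects are not checked here.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_coco(path):
    """Count images, annotations, and images that have no annotation."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    per_image = Counter(ann["image_id"] for ann in data.get("annotations", []))
    images = data.get("images", [])
    empty = [img["id"] for img in images if per_image[img["id"]] == 0]
    return {
        "images": len(images),
        "annotations": sum(per_image.values()),
        "images_without_annotations": len(empty),
    }


if __name__ == "__main__":
    # Path taken from the dataset tree above.
    path = Path("datasets/totaltext/totaltext_train.json")
    if path.is_file():
        print(summarize_coco(path))
    else:
        print(f"{path} not found")
```

A large `images_without_annotations` count in a custom file is worth investigating before starting a long run.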

Totaltext

To evaluate on Total-Text, CTW1500, ICDAR2015, or VinText, first download the zipped annotations:

cd datasets
mkdir evaluation
cd evaluation
wget -O gt_ctw1500.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
wget -O gt_totaltext.zip https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download
# The two Google Drive URLs below are share pages, so wget may save an HTML page
# instead of the archive; if that happens, download the files in a browser.
wget -O gt_icdar2015.zip "https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing"
wget -O gt_vintext.zip "https://drive.google.com/file/d/11lNH0uKfWJ7Wc74PGshWCOgSxgEnUPEV/view?usp=sharing"
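
Because two of the links above point at Google Drive share pages, a download can succeed while the saved file is not actually a zip archive. The sketch below checks the four files with the standard-library `zipfile` module. It assumes the file names from the commands above; the function name `invalid_archives` is hypothetical.

```python
import zipfile
from pathlib import Path

# File names taken from the wget commands above.
ARCHIVES = ["gt_ctw1500.zip", "gt_totaltext.zip", "gt_icdar2015.zip", "gt_vintext.zip"]


def invalid_archives(directory, names=ARCHIVES):
    """Return the names that are missing or are not readable zip archives."""
    directory = Path(directory)
    bad = []
    for name in names:
        path = directory / name
        if not path.is_file() or not zipfile.is_zipfile(path):
            bad.append(name)
    return bad


if __name__ == "__main__":
    for name in invalid_archives("datasets/evaluation"):
        print(f"{name}: missing or not a zip archive, download it again")
```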

Pretrain SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml
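
The training log quoted in the comments further down shows `SOLVER.IMS_PER_BATCH: 24` for this config. In Detectron2 that value is the total batch across all GPUs and is expected to divide evenly among them, so changing `--num-gpus` changes the per-GPU load. The arithmetic is sketched below; the helper `images_per_gpu` is hypothetical, and whether a different GPU count also calls for a different learning rate is not something this README states.

```python
def images_per_gpu(ims_per_batch, num_gpus):
    """Split a total batch size evenly across GPUs, as a data-parallel run would."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        raise ValueError(
            f"IMS_PER_BATCH={ims_per_batch} is not divisible by {num_gpus} GPUs"
        )
    return ims_per_batch // num_gpus


if __name__ == "__main__":
    # 24 is the IMS_PER_BATCH value printed in the quoted training log.
    for gpus in (8, 4, 2, 1):
        print(f"{gpus} GPU(s): {images_per_gpu(24, gpus)} image(s) per GPU")
```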

Fine-tune model on the mixed real dataset

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml

Fine-tune the model on a single dataset (e.g., Total-Text)

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml

Evaluate SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_final.pth
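
The trailing `MODEL.WEIGHTS ./output/model_final.pth` is a KEY VALUE override: the remaining command-line tokens are collected into `opts` (visible as `opts=[]` in the log quoted in the comments) and merged into the config. The sketch below illustrates that pairing on a plain nested dict without importing Detectron2. The function `apply_overrides` is hypothetical, and the real config merge is stricter; for example, it rejects keys that do not already exist.

```python
import copy


def apply_overrides(config, opts):
    """Apply ["A.B", value, ...] pairs to a nested dict, returning a new dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    result = copy.deepcopy(config)
    for key, value in zip(opts[0::2], opts[1::2]):
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


if __name__ == "__main__":
    base = {"MODEL": {"WEIGHTS": "ckpt/swin_imagenet_pretrain.pth"}}
    merged = apply_overrides(base, ["MODEL.WEIGHTS", "./output/model_final.pth"])
    print(merged["MODEL"]["WEIGHTS"])
```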

Visualize the detection and recognition results (e.g., with Swin-Transformer backbone)

python demo/demo.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/model_final.pth

Example results:

Acknowledgement

Thanks to AdelaiDet, Detectron2, ISTR, SwinT_detectron2, Focal-Transformer, and MaskTextSpotterV3.

Citation

If our paper helps your research, please cite it in your publications:

@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and Yuliang Liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal={arXiv preprint arXiv:2203.10209},
  year = {2022}
}

Copyright

For commercial use, please contact Dr. Lianwen Jin: [email protected]

Comments

LSVT training available?

Hi, I am trying to train on the LSVT dataset with this repo.

It seems you have trained on LSVT (https://github.com/mxin262/SwinTextSpotter/blob/e238a4b5d0c127480a838c6245c1e5e9eb2f9d59/detectron2/data/datasets/builtin.py#L58), but there is no transcription for it.

Is LSVT trainable with this code? If so, can you provide the config files and lexicon for it?

opened by jeong-tae 19
The annotation file of CTW1500

If I use the CTW1500 dataset, how do I get the annotation files "instances_train2017.json" and "test_ctw1500_maxlen100.json"? Thanks.

opened by tyxy2310 5
ReCTS model doesn't work

Parameter settings:
- weights: rects_model_final.pth
- config: ./projects/SWINTS/configs/SWINTS-swin-chn_finetune.yaml
- prediction data: ReCTS/icdar2019_rects_images1/*
- run script: python3 demo/demo.py --opts MODEL.WEIGHTS ./weights/rects_model_final.pth

With the above parameters no results are predicted, but 'tt_model_final.pth' works correctly.

opened by oyrq 5
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/totaltext/weak_voc_new.txt'

Thank you for kindly sharing your code!

When I run: python projects/SWINTS/train_net.py --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml --eval-only MODEL.WEIGHTS model.pth

the following error occurs: FileNotFoundError: [Errno 2] No such file or directory: 'datasets/totaltext/weak_voc_new.txt'

I cannot find this file in any of the dataset downloads.

opened by madajie9 4
How can I get gt_masks for my custom dataset?

I want to train SwinTextSpotter on my custom dataset (in Total-Text format). When I start training, I get this error at line 230 of projects/SWINTS/swints/swints.py: "AttributeError: Cannot find field 'gt_masks' in the given Instances!" How can I get gt_masks for my custom dataset?

Thank you for your hard work!

opened by khiemledev 4
rec loss does not decrease during training
I ran the following command: python projects/SWINTS/train_net.py --num-gpus 4 --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml

With 4 A100 80G GPUs and batch size 24, rec loss stays at around 6 to 7 throughout.

[08/03 14:38:52] detectron2 INFO: Rank of current process: 0. World size: 4
[08/03 14:38:53] detectron2 INFO: Environment info:

sys.platform             linux
Python                   3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]
numpy                    1.23.1
detectron2               0.4 @/home/SwinTextSpotter/detectron2
Compiler                 GCC 7.5
CUDA compiler            CUDA 11.3
detectron2 arch flags    6.1
DETECTRON2_ENV_MODULE
PyTorch                  1.10.0+cu113 @/miniconda/lib/python3.8/site-packages/torch
PyTorch debug build      False
GPU available            True
GPU 0,1,2,3              NVIDIA A100 80GB PCIe (arch=8.0)
CUDA_HOME                /usr/local/cuda
Pillow                   9.2.0
torchvision              0.11.0+cu113 @/miniconda/lib/python3.8/site-packages/torchvision
torchvision arch flags   3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                   0.1.5.post20220512
iopath                   0.1.7
cv2                      4.6.0

PyTorch built with:

GCC 7.3

C++ Version: 201402

Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications

Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)

OpenMP 201511 (a.k.a. OpenMP 4.5)

LAPACK is enabled (usually provided by MKL)

NNPACK is enabled

CPU capability usage: AVX512

CUDA Runtime 11.3

NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86

CuDNN 8.2

Magma 2.5.2

Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[08/03 14:38:53] detectron2 INFO: Command line arguments: Namespace(config_file='projects/SWINTS/configs/SWINTS-swin-pretrain.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=4, num_machines=1, opts=[], resume=False)
[08/03 14:38:53] detectron2 INFO: Contents of args.config_file=projects/SWINTS/configs/SWINTS-swin-pretrain.yaml:
_BASE_: "Base-SWINTS_swin.yaml"
MODEL:
  WEIGHTS: "ckpt/swin_imagenet_pretrain.pth"
  SWINTS:
    NUM_PROPOSALS: 300
    NUM_CLASSES: 2
DATASETS:
  TRAIN: ("totaltext_train","icdar_2015_train","icdar_2013_train","icdar_2017_validation_mlt","icdar_2017_mlt","icdar_curvesynthtext_train1","icdar_curvesynthtext_train2",)
  TEST: ("totaltext_test",)
SOLVER:
  STEPS: (120000,140000)
  MAX_ITER: 150000
  CHECKPOINT_PERIOD: 5000
INPUT:
  FORMAT: "RGB"

[08/03 14:38:53] detectron2 INFO: Running with full config: CUDNN_BENCHMARK: False DATALOADER: ASPECT_RATIO_GROUPING: True FILTER_EMPTY_ANNOTATIONS: True NUM_WORKERS: 4 REPEAT_THRESHOLD: 0.0 SAMPLER_TRAIN: TrainingSampler DATASETS: PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000 PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000 PROPOSAL_FILES_TEST: () PROPOSAL_FILES_TRAIN: () TEST: ('totaltext_test',) TRAIN: ('totaltext_train', 'icdar_2015_train', 'icdar_2013_train', 'icdar_2017_validation_mlt', 'icdar_2017_mlt', 'icdar_curvesynthtext_train1', 'icdar_curvesynthtext_train2') GLOBAL: HACK: 1.0 INPUT: CROP: CROP_INSTANCE: False ENABLED: True SIZE: [0.1, 0.1] TYPE: relative_range FORMAT: RGB MASK_FORMAT: polygon MAX_SIZE_TEST: 1824 MAX_SIZE_TRAIN: 1600 MIN_SIZE_TEST: 1000 MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800, 832, 864, 896) MIN_SIZE_TRAIN_SAMPLING: choice RANDOM_FLIP: horizontal MODEL: ANCHOR_GENERATOR: ANGLES: [[-90, 0, 90]] ASPECT_RATIOS: [[0.5, 1.0, 2.0]] NAME: DefaultAnchorGenerator OFFSET: 0.0 SIZES: [[32, 64, 128, 256, 512]] BACKBONE: FREEZE_AT: -1 NAME: build_swint_fpn_backbone DEVICE: cuda FPN: FUSE_TYPE: sum IN_FEATURES: ['stage2', 'stage3', 'stage4', 'stage5'] NORM: OUT_CHANNELS: 256 TOP_LEVELS: 2 KEYPOINT_ON: False LOAD_PROPOSALS: False MASK_ON: True META_ARCHITECTURE: SWINTS PANOPTIC_FPN: COMBINE: ENABLED: True INSTANCES_CONFIDENCE_THRESH: 0.5 OVERLAP_THRESH: 0.5 STUFF_AREA_LIMIT: 4096 INSTANCE_LOSS_WEIGHT: 1.0 PIXEL_MEAN: [123.675, 116.28, 103.53] PIXEL_STD: [58.395, 57.12, 57.375] PROPOSAL_GENERATOR: MIN_SIZE: 0 NAME: RPN REC_HEAD: BATCH_SIZE: 128 NUM_CLASSES: 107 POOLER_RESOLUTION: (28, 28) RESOLUTION: (32, 32) RESNETS: DEFORM_MODULATED: False DEFORM_NUM_GROUPS: 1 DEFORM_ON_PER_STAGE: [False, False, False, False] DEPTH: 50 NORM: FrozenBN NUM_GROUPS: 1 OUT_FEATURES: ['res4'] RES2_OUT_CHANNELS: 256 RES5_DILATION: 1 STEM_OUT_CHANNELS: 64 STRIDE_IN_1X1: True WIDTH_PER_GROUP: 64 RETINANET: BBOX_REG_LOSS_TYPE: smooth_l1 BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) 
FOCAL_LOSS_ALPHA: 0.25 FOCAL_LOSS_GAMMA: 2.0 IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.4, 0.5] NMS_THRESH_TEST: 0.5 NORM: NUM_CLASSES: 80 NUM_CONVS: 4 PRIOR_PROB: 0.01 SCORE_THRESH_TEST: 0.05 SMOOTH_L1_LOSS_BETA: 0.1 TOPK_CANDIDATES_TEST: 1000 ROI_BOX_CASCADE_HEAD: BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0)) IOUS: (0.5, 0.6, 0.7) ROI_BOX_HEAD: BBOX_REG_LOSS_TYPE: smooth_l1 BBOX_REG_LOSS_WEIGHT: 1.0 BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0) CLS_AGNOSTIC_BBOX_REG: False CONV_DIM: 256 FC_DIM: 1024 NAME: NORM: NUM_CONV: 0 NUM_FC: 0 POOLER_RESOLUTION: 7 POOLER_SAMPLING_RATIO: 2 POOLER_TYPE: ROIAlignV2 SMOOTH_L1_BETA: 0.0 TRAIN_ON_PRED_BOXES: False ROI_HEADS: BATCH_SIZE_PER_IMAGE: 512 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] IOU_LABELS: [0, 1] IOU_THRESHOLDS: [0.5] NAME: Res5ROIHeads NMS_THRESH_TEST: 0.5 NUM_CLASSES: 80 POSITIVE_FRACTION: 0.25 PROPOSAL_APPEND_GT: True SCORE_THRESH_TEST: 0.05 ROI_KEYPOINT_HEAD: CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512) LOSS_WEIGHT: 1.0 MIN_KEYPOINTS_PER_IMAGE: 1 NAME: KRCNNConvDeconvUpsampleHead NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True NUM_KEYPOINTS: 17 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 ROI_MASK_HEAD: CLS_AGNOSTIC_MASK: False CONV_DIM: 256 NAME: MaskRCNNConvUpsampleHead NORM: NUM_CONV: 0 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 RPN: BATCH_SIZE_PER_IMAGE: 256 BBOX_REG_LOSS_TYPE: smooth_l1 BBOX_REG_LOSS_WEIGHT: 1.0 BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) BOUNDARY_THRESH: -1 CONV_DIMS: [-1] HEAD_NAME: StandardRPNHead IN_FEATURES: ['res4'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.3, 0.7] LOSS_WEIGHT: 1.0 NMS_THRESH: 0.7 POSITIVE_FRACTION: 0.5 POST_NMS_TOPK_TEST: 1000 POST_NMS_TOPK_TRAIN: 2000 PRE_NMS_TOPK_TEST: 6000 PRE_NMS_TOPK_TRAIN: 12000 SMOOTH_L1_BETA: 0.0 SEM_SEG_HEAD: COMMON_STRIDE: 4 CONVS_DIM: 128 IGNORE_VALUE: 255 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] LOSS_WEIGHT: 
1.0 NAME: SemSegFPNHead NORM: GN NUM_CLASSES: 54 SWINT: APE: False DEPTHS: [2, 2, 6, 2] DROP_PATH_RATE: 0.2 EMBED_DIM: 96 MLP_RATIO: 4 NUM_HEADS: [3, 6, 12, 24] OUT_FEATURES: ['stage2', 'stage3', 'stage4', 'stage5'] WINDOW_SIZE: 7 SWINTS: ACTIVATION: relu ALPHA: 0.25 CLASS_WEIGHT: 2.0 DEEP_SUPERVISION: True DIM_DYNAMIC: 64 DIM_FEEDFORWARD: 2048 DROPOUT: 0.0 GAMMA: 2.0 GIOU_WEIGHT: 2.0 HIDDEN_DIM: 256 IOU_LABELS: [0, 1] IOU_THRESHOLDS: [0.5] L1_WEIGHT: 5.0 MASK_DIM: 60 MASK_WEIGHT: 2.0 NHEADS: 8 NO_OBJECT_WEIGHT: 0.1 NUM_CLASSES: 2 NUM_CLS: 3 NUM_DYNAMIC: 2 NUM_HEADS: 6 NUM_MASK: 3 NUM_PROPOSALS: 300 NUM_REG: 3 PATH_COMPONENTS: ./projects/SWINTS/LME/coco_2017_train_class_agnosticTrue_whitenTrue_sigmoidTrue_60_siz28.npz PRIOR_PROB: 0.01 REC_WEIGHT: 1.0 TEST_NUM_PROPOSALS: 100 WEIGHTS: ckpt/swin_imagenet_pretrain.pth OUTPUT_DIR: ./output SEED: 40244023 SOLVER: AMP: ENABLED: False BACKBONE_MULTIPLIER: 1.0 BASE_LR: 7.5e-05 BIAS_LR_FACTOR: 1.0 CHECKPOINT_PERIOD: 5000 CLIP_GRADIENTS: CLIP_TYPE: full_model CLIP_VALUE: 1.0 ENABLED: True NORM_TYPE: 2.0 GAMMA: 0.1 IMS_PER_BATCH: 24 LR_SCHEDULER_NAME: WarmupMultiStepLR MAX_ITER: 150000 MOMENTUM: 0.9 NESTEROV: False OPTIMIZER: ADAMW REFERENCE_WORLD_SIZE: 0 STEPS: (120000, 140000) WARMUP_FACTOR: 0.01 WARMUP_ITERS: 1000 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0001 WEIGHT_DECAY_BIAS: 0.0001 WEIGHT_DECAY_NORM: 0.0 TEST: AUG: ENABLED: False FLIP: True MAX_SIZE: 4000 MIN_SIZES: (400, 500, 600, 700, 800, 900, 1000, 1100, 1200) DETECTIONS_PER_IMAGE: 100 EVAL_PERIOD: 100000 EXPECTED_RESULTS: [] INFERENCE_TH_TEST: 0.4 KEYPOINT_OKS_SIGMAS: [] PRECISE_BN: ENABLED: False NUM_ITER: 200 USE_NMS_IN_TSET: True VERSION: 2 VIS_PERIOD: 0 [08/03 14:38:53] detectron2 INFO: Full config saved to ./output/config.yaml [08/03 14:38:55] d2.engine.defaults INFO: Model:

[08/03 14:38:56] d2.data.datasets.coco INFO: Loaded 1255 images in COCO format from datasets/totaltext/totaltext_train.json
[08/03 14:38:56] d2.data.datasets.coco INFO: Loaded 1000 images in COCO format from datasets/icdar2015/icdar_2015_train.json
[08/03 14:38:56] d2.data.datasets.coco INFO: Loaded 229 images in COCO format from datasets/icdar2013/annotations/icdar_2013.json
[08/03 14:38:56] d2.data.datasets.coco INFO: Loaded 1797 images in COCO format from datasets/mlt2017/annotations/icdar_2017_validation_mlt.json
[08/03 14:38:57] d2.data.datasets.coco INFO: Loaded 7160 images in COCO format from datasets/mlt2017/annotations/icdar_2017_mlt.json
[08/03 14:39:18] d2.data.datasets.coco INFO: Loading datasets/syntext1/annotations/syntext_word_eng.json takes 21.03 seconds.
[08/03 14:39:19] d2.data.datasets.coco INFO: Loaded 94723 images in COCO format from datasets/syntext1/annotations/syntext_word_eng.json
[08/03 14:39:35] d2.data.datasets.coco INFO: Loading datasets/syntext2/annotations/ecms_v1_maxlen25.json takes 7.61 seconds.
[08/03 14:39:35] d2.data.datasets.coco INFO: Loaded 54327 images in COCO format from datasets/syntext2/annotations/ecms_v1_maxlen25.json
[08/03 14:39:36] d2.data.build INFO: Removed 2230 images with no usable annotations. 158261 images left.
[08/03 14:39:42] d2.data.build INFO: Distribution of instances among all 1 categories:
|  category  | #instances   |
|:----------:|:-------------|
|    text    | 1868890      |
|            |              |
[08/03 14:39:42] d2.data.build INFO: Using training sampler TrainingSampler
[08/03 14:39:42] d2.data.common INFO: Serializing 158261 elements to byte tensors and concatenating them all ...
[08/03 14:39:48] d2.data.common INFO: Serialized dataset takes 460.55 MiB
[08/03 14:39:50] fvcore.common.checkpoint INFO: [Checkpointer] Loading from ckpt/swin_imagenet_pretrain.pth ...

[08/03 14:39:51] d2.engine.train_loop INFO: Starting training from iteration 0

[08/04 14:53:46 d2.utils.events]: eta: 1 day, 14:59:42 iter: 56279 total_loss: 16.8 loss_ce: 0.1334 loss_giou: 0.29 loss_bbox: 0.1194 loss_feat: 0.5878 loss_dice: 0.119 loss_rec: 6.447 loss_ce_0: 0.6154 loss_giou_0: 0.7695 loss_bbox_0: 0.313 loss_feat_0: 1.204 loss_dice_0: 0.2254 loss_ce_1: 0.3727 loss_giou_1: 0.3973 loss_bbox_1: 0.1591 loss_feat_1: 0.8152 loss_dice_1: 0.1459 loss_ce_2: 0.31 loss_giou_2: 0.3258 loss_bbox_2: 0.1341 loss_feat_2: 0.6907 loss_dice_2: 0.1311 loss_ce_3: 0.2053 loss_giou_3: 0.3006 loss_bbox_3: 0.122 loss_feat_3: 0.6204 loss_dice_3: 0.1216 loss_ce_4: 0.1435 loss_giou_4: 0.2943 loss_bbox_4: 0.1182 loss_feat_4: 0.6 loss_dice_4: 0.1188 time: 1.5465 data_time: 0.0930 lr: 7.5e-05 max_mem: 66803M
[08/04 14:54:16 d2.utils.events]: eta: 1 day, 14:59:52 iter: 56299 total_loss: 17.43 loss_ce: 0.1425 loss_giou: 0.2743 loss_bbox: 0.1041 loss_feat: 0.5848 loss_dice: 0.1222 loss_rec: 6.74 loss_ce_0: 0.6106 loss_giou_0: 0.7541 loss_bbox_0: 0.3087 loss_feat_0: 1.213 loss_dice_0: 0.2271 loss_ce_1: 0.378 loss_giou_1: 0.3861 loss_bbox_1: 0.1526 loss_feat_1: 0.8113 loss_dice_1: 0.1538 loss_ce_2: 0.3031 loss_giou_2: 0.3231 loss_bbox_2: 0.1218 loss_feat_2: 0.6802 loss_dice_2: 0.1366 loss_ce_3: 0.2059 loss_giou_3: 0.2907 loss_bbox_3: 0.1099 loss_feat_3: 0.6195 loss_dice_3: 0.127 loss_ce_4: 0.1465 loss_giou_4: 0.2804 loss_bbox_4: 0.1056 loss_feat_4: 0.5916 loss_dice_4: 0.123 time: 1.5464 data_time: 0.0578 lr: 7.5e-05 max_mem: 66803M
[08/04 14:54:47 d2.utils.events]: eta: 1 day, 14:59:22 iter: 56319 total_loss: 16.86 loss_ce: 0.1324 loss_giou: 0.2875 loss_bbox: 0.1126 loss_feat: 0.5857 loss_dice: 0.1232 loss_rec: 6.414 loss_ce_0: 0.6167 loss_giou_0: 0.7572 loss_bbox_0: 0.3231 loss_feat_0: 1.195 loss_dice_0: 0.2153 loss_ce_1: 0.3705 loss_giou_1: 0.3863 loss_bbox_1: 0.1542 loss_feat_1: 0.8011 loss_dice_1: 0.1507 loss_ce_2: 0.2973 loss_giou_2: 0.32 loss_bbox_2: 0.126 loss_feat_2: 0.6879 loss_dice_2: 0.1377 loss_ce_3: 0.1973 loss_giou_3: 0.2947 loss_bbox_3: 0.1173 loss_feat_3: 0.6282 loss_dice_3: 0.1278 loss_ce_4: 0.1405 loss_giou_4: 0.2881 loss_bbox_4: 0.113 loss_feat_4: 0.5965 loss_dice_4: 0.1255 time: 1.5464 data_time: 0.0538 lr: 7.5e-05 max_mem: 66803M
opened by Shualite 3
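
To judge whether `loss_rec` is really flat, it helps to pull the value out of every metrics entry rather than eyeballing three of them. The sketch below does that with a regular expression over the log text quoted above. It assumes each entry contains `iter:` followed later by `loss_rec:`, as in that excerpt; the function names are hypothetical and nothing here depends on Detectron2.

```python
import re

# Matches the "iter: N ... loss_rec: X" part of each metrics entry.
ENTRY = re.compile(r"iter: (\d+).*?loss_rec: ([0-9.]+)")


def rec_loss_history(log_text):
    """Return [(iteration, loss_rec), ...] in the order the entries appear."""
    return [(int(it), float(loss)) for it, loss in ENTRY.findall(log_text)]


def relative_change(history):
    """Relative change of loss_rec between the first and last entries."""
    first, last = history[0][1], history[-1][1]
    return (last - first) / first


if __name__ == "__main__":
    # Shortened copies of the three entries in the quoted log.
    sample = (
        "iter: 56279 total_loss: 16.8 loss_ce: 0.1334 loss_rec: 6.447 loss_ce_0: 0.6154 "
        "iter: 56299 total_loss: 17.43 loss_ce: 0.1425 loss_rec: 6.74 loss_ce_0: 0.6106 "
        "iter: 56319 total_loss: 16.86 loss_ce: 0.1324 loss_rec: 6.414 loss_ce_0: 0.6167"
    )
    history = rec_loss_history(sample)
    print(history)
    print(f"relative change: {relative_change(history):+.2%}")
```

Applied to a whole log file, the same history can be plotted or compared between the start and end of training; three entries 40 iterations apart say little on their own.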
About using the pretrained model to spot Vietnamese

Hello, I want to use your pretrained model, fine-tuned on VinText, to spot Vietnamese text. But it seems to predict characters from the English alphabet. I don't see where to change the alphabet in the config files. Where should I look to modify this?

opened by namdo281 3
Change character list

Hello everyone, I want to change the character list to match the characters in my custom dataset. How can I do this? I tried changing CTLABELS and chars wherever a character list appears in the code, but I end up with this error: "Assertion cur_target >= 0 && cur_target < n_classes failed".

Thank you!

opened by khiemledev 3
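
The assertion quoted above is raised by PyTorch's classification loss when a target index falls outside `[0, n_classes)`. One plausible cause, stated here as an assumption rather than a diagnosis, is that the character list grew while the recognition head's class count (`REC_HEAD: NUM_CLASSES` in the configs quoted on this page) stayed the same. A minimal, framework-free sketch of a check to run before training follows. The `encode` helper and its convention of mapping unknown characters to the index `len(charset)` are hypothetical and are not the repository's encoding.

```python
def encode(text, charset):
    """Map each character to its index; unknown characters map to len(charset)."""
    index = {ch: i for i, ch in enumerate(charset)}
    return [index.get(ch, len(charset)) for ch in text]


def out_of_range(labels, num_classes):
    """Return the label indices that a num_classes-way classifier cannot represent."""
    return [i for i in labels if not 0 <= i < num_classes]


if __name__ == "__main__":
    charset = list("abc")             # toy character list
    labels = encode("cabd", charset)  # "d" is not in the list
    print(labels)
    print(out_of_range(labels, num_classes=3))
```

If the second list is non-empty for any training label, the class count and the character list disagree and the loss will fail in the way the issue describes.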
Lacking the introduction of char dictionary for recognition

The README gives no details about the character dictionaries used for the various datasets. I printed the embedding sizes in rects_model_final.pth and tt_model_final.pth and got 5000+ and 107, respectively.

opened by TangLinJie 3

Installation error

Hi, thanks for sharing. I've run into some errors and hope you can offer some suggestions.

I finished the installation following readme.md, then ran demo.py following getting_started.md:
`
Traceback (most recent call last):
  File "demo.py", line 14, in <module>
    from predictor import VisualizationDemo
  File "/XXX/SwinTextSpotter/demo/predictor.py", line 13, in <module>
    from detectron2.utils.visualizer_chn import Visualizer as Visualizer_chn
ModuleNotFoundError: No module named 'detectron2.utils.visualizer_chn'
`

I don't understand why "import detectron2.utils.visualizer" works, but "import detectron2.utils.visualizer_chn" and "import detectron2.utils.visualizer_vintext" fail.

opened by CrazyBrick 2

Question about the ReCTS dataset

Hello, I would like to ask a question. I used the ReCTS dataset to test demo.py and found that the output contains no Chinese characters, whereas testing on the English Total-Text data does show English text.

This is the command I ran (the model is the ReCTS .pth file you provided): python demo/demo.py --config-file projects/SWINTS/configs/SWINTS-swin-chn_finetune.yaml --input input1.jpg --output ./output --confidence-threshold 0.4 --opts MODEL.WEIGHTS ./output/rects_model_final.pth

Configuration of SWINTS-swin-chn_finetune.yaml:
_BASE_: "Base-SWINTS_swin.yaml"
MODEL:
  #WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
  WEIGHTS: "./output/rects_model_final.pth"
  SWINTS:
    NUM_PROPOSALS: 300
    NUM_CLASSES: 2
  REC_HEAD:
    POOLER_RESOLUTION: (16,40)
    RESOLUTION: (32, 80)
    BATCH_SIZE: 128
    NUM_CLASSES: 5463
DATASETS:
  TRAIN: ("rects",)
  TEST: ("totaltext_test",)
SOLVER:
  STEPS: (140000,160000)
  MAX_ITER: 180000
  CHECKPOINT_PERIOD: 10000
INPUT:
  FORMAT: "RGB"

Test image: a test image from ReCTS

Detection result: 7 instances were reported as detected

Output: the output .jpg file contains no Chinese characters

opened by yxyxnrh 2
RuntimeError: CUDA out of memory.

The following error occurs during training: "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.92 GiB total capacity; 6.76 GiB already allocated; 66.50 MiB free; 6.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF". Setting BATCH_SIZE to 1 still produces this error. Does the author have a solution?

opened by juneane 3
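
The error message itself recommends `max_split_size_mb` only when reserved memory is much larger than allocated memory, so it is worth reading those two figures out of the message first. The sketch below does so with regular expressions; the function name `memory_gap_gib` is hypothetical and it assumes both figures are reported in GiB, as they are in the message quoted above.

```python
import re


def memory_gap_gib(message):
    """Return (allocated, reserved, reserved - allocated) in GiB from an OOM message."""
    allocated = float(re.search(r"([0-9.]+) GiB already allocated", message).group(1))
    reserved = float(re.search(r"([0-9.]+) GiB reserved", message).group(1))
    return allocated, reserved, round(reserved - allocated, 2)


if __name__ == "__main__":
    msg = (
        "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
        "(GPU 0; 7.92 GiB total capacity; 6.76 GiB already allocated; "
        "66.50 MiB free; 6.87 GiB reserved in total by PyTorch)"
    )
    allocated, reserved, gap = memory_gap_gib(msg)
    print(f"allocated={allocated} GiB, reserved={reserved} GiB, gap={gap} GiB")
```

For the quoted message the gap is about 0.11 GiB on a 7.92 GiB card. Read as arithmetic rather than a confirmed diagnosis, that suggests the GPU is simply full rather than fragmented, which would be consistent with batch size 1 not helping.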

