Abstract:
I followed the MMOCR documentation to reproduce the CTW1500 and ICDAR 2015 experiments, then created a Total-Text config based on the CTW1500 one. I downloaded the Total-Text images and annotations (.txt) from https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset and used tools/totaltext_convert.py to convert the Total-Text annotations to the ICDAR dataset format. I did not modify fcenet_targets.py or textsnake_targets.py anywhere. However, train.py fails with the error below regardless of whether I use the original or the converted annotations.
Env:
ubuntu:20.04
python:3.7.11
pytorch:1.8.0
cuda:11.5
mmcv-full:1.3.16
mmdet:2.18.0
Error:
```
--------------------
2022-02-05 18:53:38,576 - mmocr - INFO - workflow: [('train', 1)], max: 1500 epochs
2022-02-05 18:53:38,576 - mmocr - INFO - Checkpoints will be saved to /home/bill/Project/mmocr/fce_2626*2020 by HardDiskBackend.
2022-02-05 18:53:44,399 - mmocr - INFO - Epoch [1][5/315] lr: 1.000e-03, eta: 6 days, 6:53:38, time: 1.150, data_time: 0.543, memory: 3931, loss_text: 2.5313, loss_center: 2.2066, loss_reg_x: 7.5278, loss_reg_y: 4.5634, loss: 34.7722
2022-02-05 18:53:46,342 - mmocr - INFO - Epoch [1][10/315] lr: 1.000e-03, eta: 4 days, 4:57:35, time: 0.389, data_time: 0.029, memory: 3931, loss_text: 1.7856, loss_center: 1.8217, loss_reg_x: 5.2337, loss_reg_y: 3.8988, loss: 20.8382
2022-02-05 18:53:48,251 - mmocr - INFO - Epoch [1][15/315] lr: 1.000e-03, eta: 3 days, 12:00:22, time: 0.382, data_time: 0.025, memory: 3931, loss_text: 1.9730, loss_center: 2.1837,loss_reg_x: 7.9185, loss_reg_y: 3.2578, loss: 21.1362
Traceback (most recent call last):
  File "/home/bill/Project/mmocr/tools/train.py", line 221, in <module>
    main()
  File "/home/bill/Project/mmocr/tools/train.py", line 217, in main
    meta=meta)
  File "/home/bill/Project/mmocr/mmocr/apis/train.py", line 163, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 195, in __getitem__
    data = self.prepare_train_img(idx)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 218, in prepare_train_img
    return self.pipeline(results)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/base_textdet_targets.py", line 167, in __call__
    results = self.generate_targets(results)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 351, in generate_targets
    polygon_masks_ignore)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 316, in generate_level_targets
    level_img_size, lv_text_polys[ind])[None]
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 69, in generate_center_region_mask
    _, _, top_line, bot_line = self.reorder_poly_edge(polygon_points)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py", line 179, in reorder_poly_edge
    assert points.shape[0] >= 4
AssertionError
```
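The assertion comes from reorder_poly_edge in textsnake_targets.py, which requires every polygon to have at least 4 points. To help debug, here is a quick check for degenerate polygons in the converted annotation file (a rough sketch of mine, not part of MMOCR; it assumes instances_training.json is COCO-style with each `segmentation` entry a flat [x1, y1, x2, y2, ...] list, as IcdarDataset expects):

```python
import json

# Rough sketch: flag any ground-truth polygon with fewer than 4 points,
# since reorder_poly_edge asserts points.shape[0] >= 4.
with open('tests/data/total-text-txt/instances_training.json') as f:
    coco = json.load(f)

for ann in coco['annotations']:
    segs = ann['segmentation']
    if not isinstance(segs, list):  # skip RLE masks, if any
        continue
    for seg in segs:
        num_points = len(seg) // 2
        if num_points < 4:
            print(f"image_id={ann['image_id']}, ann_id={ann['id']}: "
                  f"only {num_points} points -> {seg}")
```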
Config:
```
dataset_type = 'IcdarDataset'
data_root = 'tests/data/total-text-txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadTextAnnotations',
        with_bbox=True,
        with_mask=True,
        poly2mask=False),
    dict(
        type='ColorJitter',
        brightness=32.0 / 255,
        saturation=0.5,
        contrast=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='RandomScaling', size=800, scale=(3. / 4, 5. / 2)),
    dict(
        type='RandomCropFlip', crop_ratio=0.5, iter_num=1, min_area_ratio=0.2),
    dict(
        type='RandomCropPolyInstances',
        instance_key='gt_masks',
        crop_ratio=0.8,
        min_side_ratio=0.3),
    dict(
        type='RandomRotatePolyInstances',
        rotate_ratio=0.5,
        max_angle=30,
        pad_with_fixed_color=False),
    dict(type='SquareResizePad', target_size=800, pad_ratio=0.6),
    dict(type='RandomFlip', flip_ratio=0.5, direction='horizontal'),
    dict(type='Pad', size_divisor=32),
    dict(
        type='FCENetTargets',
        fourier_degree=fourier_degree,
        level_proportion_range=((0, 0.25), (0.2, 0.65), (0.55, 1.0))),
    # also tried: level_proportion_range=((0, 0.4), (0.3, 0.7), (0.6, 1.0))
    dict(
        type='CustomFormatBundle',
        keys=['p3_maps', 'p4_maps', 'p5_maps'],
        visualize=dict(flag=False, boundary_key=None)),
    dict(type='Collect', keys=['img', 'p3_maps', 'p4_maps', 'p5_maps'])
]
test_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1080, 736),
        flip=False,
        transforms=[
            dict(type='Resize', img_scale=(2626, 2020), keep_ratio=True),
            # also tried 12080,800 / 1920,1080 / 2022,2022 (worse) / 2560,1600 (worse)
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=data_root + '/instances_training.json',
        img_prefix=data_root + '/imgs',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + '/instances_test.json',
        img_prefix=data_root + '/imgs',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + '/instances_test.json',
        img_prefix=data_root + '/imgs',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='hmean-iou', save_best='auto')
# optimizer
optimizer = dict(type='SGD', lr=1e-3, momentum=0.90, weight_decay=5e-4)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='poly', power=0.9, min_lr=1e-7, by_epoch=True)
total_epochs = 1500
checkpoint_config = dict(interval=150)
# yapf:disable
log_config = dict(
    interval=5,
    hooks=[
        dict(type='TextLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
```
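Since the crash only happens partway through the first epoch, iterating the training dataset directly (without DataLoader workers) can reveal which sample trips the assertion. Below is a minimal sketch of mine; it assumes the full config (including the model section that defines fourier_degree) is saved as fcenet_totaltext.py, a hypothetical path. Note that several transforms are random, so a given index may not fail on every pass:

```python
# Minimal sketch: run the train pipeline sample by sample to locate the
# offending image. Assumes the complete config above is saved as
# fcenet_totaltext.py (hypothetical path).
from mmcv import Config
from mmdet.datasets import build_dataset

import mmocr.datasets  # noqa: F401  (registers MMOCR datasets and pipelines)

cfg = Config.fromfile('fcenet_totaltext.py')
dataset = build_dataset(cfg.data.train)

for idx in range(len(dataset)):
    try:
        # prepare_train_img avoids the random re-sampling in __getitem__,
        # so idx reliably maps to the printed file name
        dataset.prepare_train_img(idx)
    except AssertionError:
        print('AssertionError at', dataset.data_infos[idx]['file_name'])
```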
Annotations (example: poly_gt_img12.txt):
Original annotations:
```
x: [[112 149 200 210 166 134]], y: [[411 358 336 358 381 422]], ornt: [u'c'], transcriptions: [u'WOODFORD']
x: [[212 262 316 307 257 217]], y: [[333 325 337 359 350 355]], ornt: [u'c'], transcriptions: [u'RESERVE']
x: [[326 385 401 377 356 315]], y: [[346 391 440 442 396 365]], ornt: [u'c'], transcriptions: [u'DISTILLERY']
x: [[199 222 245 246 230 208]], y: [[374 364 362 385 384 392]], ornt: [u'c'], transcriptions: [u'DSP']
x: [[257 286 283 253]], y: [[363 366 388 383]], ornt: [u'm'], transcriptions: [u'KY']
x: [[297 324 316 290]], y: [[370 384 401 391]], ornt: [u'm'], transcriptions: [u'52']
x: [[168 251 248 167]], y: [[473 478 497 490]], ornt: [u'm'], transcriptions: [u'BOURBON']
x: [[258 333 334 259]], y: [[479 483 503 495]], ornt: [u'm'], transcriptions: [u'WHISKEY']
```
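For clarity, the original format stores the x and y coordinates of each polygon in separate arrays. Here is a small parser sketch (my own helper, not the converter itself; it assumes transcriptions contain no escaped quotes):

```python
import re
import numpy as np

def parse_original_line(line):
    # Original Total-Text format, e.g.
    # x: [[112 149 ...]], y: [[411 358 ...]], ornt: [u'c'], transcriptions: [u'WOODFORD']
    xs = re.search(r"x: \[\[([\d\s]+)\]\]", line).group(1).split()
    ys = re.search(r"y: \[\[([\d\s]+)\]\]", line).group(1).split()
    text = re.search(r"transcriptions: \[u'(.*)'\]", line).group(1)
    points = np.array(list(zip(xs, ys)), dtype=np.float32)
    return points, text

line = ("x: [[112 149 200 210 166 134]], y: [[411 358 336 358 381 422]], "
        "ornt: [u'c'], transcriptions: [u'WOODFORD']")
points, text = parse_original_line(line)
print(points.shape, text)  # (6, 2) WOODFORD
```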
Converted annotations:
```
112,411,149,358,200,336,210,358,166,381,134,422,WOODFORD
212,333,262,325,316,337,307,359,257,350,217,355,RESERVE
326,346,385,391,401,440,377,442,356,396,315,365,DISTILLERY
199,374,222,364,245,362,246,385,230,384,208,392,DSP
257,363,286,366,283,388,253,383,KY
297,370,324,384,316,401,290,391,52
168,473,251,478,248,497,167,490,BOURBON
258,479,333,483,334,503,259,495,WHISKEY
```
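Each converted line is x1,y1,x2,y2,...,transcription. A small parser sketch (again my own helper, assuming transcriptions contain no commas) shows the point counts:

```python
import numpy as np

def parse_converted_line(line):
    # Converted format: x1,y1,x2,y2,...,transcription
    # Assumes the transcription itself contains no commas.
    parts = line.strip().split(',')
    points = np.array(parts[:-1], dtype=np.float32).reshape(-1, 2)
    return points, parts[-1]

points, text = parse_converted_line('257,363,286,366,283,388,253,383,KY')
print(points.shape[0], text)  # 4 KY
```

Note that the KY, 52, BOURBON, and WHISKEY instances have exactly four points each, i.e. they sit right at the points.shape[0] >= 4 minimum that reorder_poly_edge asserts.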
Additional context:
I successfully reproduced the CTW1500 and Total-Text experiments. I remembered reading that Total-Text is handled much like CTW1500, so I copied the CTW1500 config as a base and altered parts of the network to improve precision, recall, and hmean, but then found that training no longer runs. I searched the MMOCR GitHub issues for Total-Text but unfortunately found nothing relevant, so I hope somebody can help me fix this. Thank you sincerely for your help. @gaotongxiao @cuhk-hbsun
This still needs to be reproduced on the maintainers' side.