2nd place solution for the ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

Overview

TableMASTER-mmocr

Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Result
  5. License
  6. Acknowledgements

About The Project

This project presents our 2nd place solution for the ICDAR 2021 Competition on Scientific Literature Parsing, Task B. We reimplement our solution with MMOCR, which is an open-source toolbox based on PyTorch. You can click here for more details about this competition. Our original implementation is based on FastOCR (one of our internal toolboxes, similar to MMOCR).

Method Description

In our solution, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Based on MASTER, we propose a novel table structure recognition architecture, which we call TableMASTER. The difference between MASTER and TableMASTER is shown below. You can click here for more details about this solution.

MASTER's architecture

Dependency

Getting Started

Prerequisites

  • Competition dataset PubTabNet, click here for downloading.
  • About PubTabNet, check their github and paper.
  • About the metric TEDS, see github

Installation

  1. Install mmdetection. Click here for details.

    # We embed the mmdetection-2.11.0 source code in this project.
    # You can cd into it and install it (recommended).
    cd ./mmdetection-2.11.0
    pip install -v -e .
  2. Install mmocr. Click here for details.

    # install mmocr
    cd ./MASTER_mmocr
    pip install -v -e .
  3. Install mmcv-full-1.3.4. Click here for details.

    pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
    
    # install mmcv-full-1.3.4 with torch version 1.8.0 cuda_version 10.2
    pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
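The install command in step 3 is a template with three placeholders. As a minimal sketch, the helper below fills in that template for a given combination; the function name `mmcv_install_command` is hypothetical and not part of this repository, and whether a wheel exists for a given combination still has to be checked against the mmcv download index.

```python
# URL template copied from the install command above.
MMCV_INDEX_TEMPLATE = (
    "https://download.openmmlab.com/mmcv/dist/"
    "{cu_version}/{torch_version}/index.html"
)


def mmcv_install_command(mmcv_version, cu_version, torch_version):
    """Build the pip command for one mmcv-full / CUDA / PyTorch combination.

    Hypothetical helper: it only formats the string and does not check
    that a matching wheel is actually published.
    """
    index_url = MMCV_INDEX_TEMPLATE.format(
        cu_version=cu_version, torch_version=torch_version
    )
    return f"pip install mmcv-full=={mmcv_version} -f {index_url}"


if __name__ == "__main__":
    # Reproduces the example above: mmcv-full 1.3.4, torch 1.8.0, CUDA 10.2.
    print(mmcv_install_command("1.3.4", "cu102", "torch1.8.0"))
```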

Usage

Data preprocess

Run data_preprocess.py to get valid training data. Remember to change the 'raw_img_root' and 'save_root' properties of PubtabnetParser to your own paths.

python ./table_recognition/data_preprocess.py

It takes about 8 hours to finish parsing the 500,777 training files. After the training set has been parsed, change the 'split' property of PubtabnetParser to 'val' and run the script again to get the formatted validation data.

The directory structure of the parsed training data is:

.
├── StructureLabelAddEmptyBbox_train
│   ├── PMC1064074_007_00.txt
│   ├── PMC1064076_003_00.txt
│   ├── PMC1064076_004_00.txt
│   └── ...
├── recognition_train_img
│   ├── 0
│   │   ├── PMC1064100_007_00_0.png
│   │   ├── PMC1064100_007_00_10.png
│   │   ├── ...
│   │   └── PMC1064100_007_00_108.png
│   ├── 1
│   ├── ...
│   └── 15
├── recognition_train_txt
│   ├── 0.txt
│   ├── 1.txt
│   ├── ...
│   └── 15.txt
├── structure_alphabet.txt
└── textline_recognition_alphabet.txt
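Before training, it can help to confirm that preprocessing produced the layout shown above. The following is a minimal sketch, assuming the top-level names are exactly those in the tree; `missing_entries` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Top-level entries taken from the directory tree above.
EXPECTED_ENTRIES = [
    "StructureLabelAddEmptyBbox_train",
    "recognition_train_img",
    "recognition_train_txt",
    "structure_alphabet.txt",
    "textline_recognition_alphabet.txt",
]


def missing_entries(save_root):
    """Return the expected top-level entries that are absent under save_root."""
    root = Path(save_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means every expected folder and alphabet file is present; otherwise the returned names show what still needs to be generated.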

Train

  1. Train text line detection model with PSENet.

    sh ./table_recognition/table_text_line_detection_dist_train.sh

    We don't provide PSENet training data here; you can create the text line annotations with open-source labeling software. In our experiment, we used only 2,500 table images to train the model. It gets a perfect text line detection result on the validation set.

  2. Train text-line recognition model with MASTER.

    sh ./table_recognition/table_text_line_recognition_dist_train.sh

    We can get about 30,000,000 text line images from the 500,777 training images and 550,000 text line images from the 9,115 validation images. However, we select only 20,000 of the 550,000 validation text line images for evaluation after each training epoch, to pick the best text line recognition model.

    Note that our MASTER OCR is directly trained on samples mixed with single-line texts and multiple-line texts.

  3. Train table structure recognition model, with TableMASTER.

    sh ./table_recognition/table_recognition_dist_train.sh
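Step 2 evaluates on a 20,000-image subset of the 550,000 validation text lines. The source does not say how that subset was chosen; the sketch below assumes a uniform random sample with a fixed seed so that the same subset is reused across epochs. `sample_eval_subset` is a hypothetical name.

```python
import random


def sample_eval_subset(items, k=20000, seed=0):
    """Pick a reproducible random subset of k items for per-epoch evaluation.

    Assumption: uniform sampling without replacement. The same seed always
    yields the same subset, so scores stay comparable between epochs.
    """
    items = list(items)
    if k >= len(items):
        return items
    rng = random.Random(seed)
    return rng.sample(items, k)
```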

Inference

To get the final results, we first run inference with each of the three models mentioned above. We then merge their results with our matching algorithm to generate the final HTML code.

  1. Model inference. We run it on multiple GPU devices to speed up the inference.

    python ./table_recognition/run_table_inference.py

    run_table_inference.py will call table_inference.py and use multiple GPU devices to run model inference. Before running this script, you should change the value of cfg in table_inference.py.

The directory structure of the inference results is:

# If you use 8 GPU devices for inference, you will get 8 detection result pickle files, one end2end_results pickle file and 8 structure recognition result pickle files.
.
├── end2end_caches
│   ├── end2end_results.pkl
│   ├── detection_results_0.pkl
│   ├── detection_results_1.pkl
│   ├── ...
│   └── detection_results_7.pkl
├── structure_master_caches
│   ├── structure_master_results_0.pkl
│   ├── structure_master_results_1.pkl
│   ├── ...
│   └── structure_master_results_7.pkl
  2. Merge results.

    python ./table_recognition/match.py

After matching, you will get the final result pickle file.
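Multi-GPU inference writes one pickle file per device, as in the tree above. As a rough sketch of how such shards could be read back together, the code below assumes each shard is a dict keyed by image file name; that structure is an assumption, not confirmed by the source. Both function names are hypothetical, and match.py may handle this differently.

```python
import glob
import os
import pickle


def load_pickle_shards(cache_dir, prefix):
    """Load every `<prefix>_<n>.pkl` in cache_dir, ordered by shard index n."""

    def shard_index(path):
        stem = os.path.splitext(os.path.basename(path))[0]
        return int(stem.rsplit("_", 1)[1])

    pattern = os.path.join(cache_dir, f"{prefix}_*.pkl")
    shards = []
    for path in sorted(glob.glob(pattern), key=shard_index):
        with open(path, "rb") as f:
            shards.append(pickle.load(f))
    return shards


def merge_dict_shards(shards):
    """Merge per-GPU dicts into one dict (assumes shards are dicts)."""
    merged = {}
    for shard in shards:
        merged.update(shard)
    return merged
```

For example, `merge_dict_shards(load_pickle_shards("structure_master_caches", "structure_master_results"))` would collect the structure results from all devices under these assumptions.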

Get TEDS score

  1. Installation.

    pip install -r ./table_recognition/PubTabNet-master/src/requirements.txt
  2. Get gtVal.json.

    python ./table_recognition/get_val_gt.py
  3. Calculate the TEDS score. Before running this script, modify the pred file path and the gt file path in mmocr_teds_acc_mp.py.

    python ./table_recognition/PubTabNet-master/src/mmocr_teds_acc_mp.py
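For orientation, TEDS as defined for PubTabNet is one minus the tree edit distance between the predicted and ground-truth HTML trees, normalized by the node count of the larger tree. The sketch below only shows that final normalization step, assuming the edit distance is already computed; it is not the script above, and the function name and the handling of two empty trees are our assumptions.

```python
def teds_from_edit_distance(edit_distance, n_nodes_pred, n_nodes_gt):
    """TEDS = 1 - EditDist(T_pred, T_gt) / max(|T_pred|, |T_gt|).

    Assumption: two empty trees are treated as identical (score 1.0).
    """
    largest = max(n_nodes_pred, n_nodes_gt)
    if largest == 0:
        return 1.0
    return 1.0 - edit_distance / largest
```

A score of 1.0 means the two trees are identical; larger edit distances lower the score.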

Result

Text line end2end recognition accuracy

Models             Accuracy
PSENet + MASTER    0.9885

Structure recognition accuracy

Model architecture                       Accuracy
TableMASTER_maxlength_500                0.7808
TableMASTER_ConcatLayer_maxlength_500    0.7821
TableMASTER_ConcatLayer_maxlength_600    0.7799

TEDS score

Models                                                     TEDS
PSENet + MASTER + TableMASTER_maxlength_500                0.9658
PSENet + MASTER + TableMASTER_ConcatLayer_maxlength_500    0.9669
PSENet + MASTER + ensemble_TableMASTER                     0.9676

In our paper, we reported a TEDS score of 0.9684 on the validation set (9,115 samples). The gap between 0.9676 and 0.9684 arises because we ensembled three text line models in the competition, whereas here we use only one. Hyperparameter tuning also affects the TEDS score.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@article{ye2021pingan,
  title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML},
  author={Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong},
  journal={arXiv preprint arXiv:2105.01848},
  year={2021}
}
@article{He2021PingAnVCGroupsSF,
  title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex},
  author={Yelin He and Xianbiao Qi and Jiaquan Ye and Peng Gao and Yihao Chen and Bingcong Li and Xin Tang and Rong Xiao},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.01846}
}
@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}
@article{li2018shape,
  title={Shape robust text detection with progressive scale expansion network},
  author={Li, Xiang and Wang, Wenhai and Hou, Wenbo and Liu, Ruo-Ze and Lu, Tong and Yang, Jian},
  journal={arXiv preprint arXiv:1806.02559},
  year={2018}
}

Acknowledgements

Comments
  • Error when training text-line detection model

    Hi, thanks for the great repo!

    • I followed the guide in the README file and want to train the text-line detection model
    • I also prepared a dataset in COCO format, the same as in the MMOCR repo and psenet_r50_fpnf_600e_pubtabnet.py, but got an error. It seems to occur with an empty object, but I am not sure where the error comes from. Did you face this issue when training the text-line detection model? Or do you have an idea about how to fix it?

    Thanks in advance!

    opened by huyhoang17 6
  • KeyError: 'TABLEMASTER is not in the {registry.name} registry'

    Hello, I installed mmdetection, mmcv-full and mmocr following Install.md, but running sh ./table_recognition/expr/table_recognition_dist_train.sh fails with: KeyError: 'TABLEMASTER is not in the {registry.name} registry'. Do you have any suggestions on how to solve this?

    opened by BrandnewA 5
  • missing text line recognition alphabet

    Hi, thanks for publishing the code. When I try to use the text recognition checkpoint, I find that the alphabet I generated is different from yours. This could be because I only used part of the data for training the MASTER text recognition model. Could you provide the ./tools/data/alphabet/textline_recognition_alphabet.txt file?

    opened by zezeze97 2
  • MASTER is not in the registry; does MASTER need to be trained separately?

        return build_from_cfg(cfg, registry, default_args)
      File "/python3.6/site-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
        f'{obj_type} is not in the {registry.name} registry')
    KeyError: 'MASTER is not in the detector registry'

    opened by cqray1990 2
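  • Error when testing after training: 'TableResize is not in the pipeline registry'

    After training the table structure recognition model for one epoch, I wanted to use the test_imgs.py script to check whether the model runs, but it raised an error. Is this caused by the mmcv version, or by something else? Thanks! (I installed mmcv with the command pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.7.0/index.html. mmocr was installed by running pip install -v -e . in the root of this repo, and the installed mmocr version is 0.2.0.)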

    python tools/test_imgs.py configs/textrecog/master/table_master_ResnetExtract_Ranger_0705.py experiments/mmocr_table_recognition/0705/latest.pth /home/datasets/ocr/table/pubtabnet/test /home/datasets/ocr/table/pubtabnet/test.list
    Use load_from_local loader
    [>>>>>>>>>>>>                                      ] 1/4, 22075.3 task/s, elapsed: 0s, ETA:     0s
    Traceback (most recent call last):
      File "tools/test_imgs.py", line 165, in <module>
        main()
      File "tools/test_imgs.py", line 151, in main
        result = inference_detector(model, img_path)
      File "/home/xray/homework/table/TableMASTER-mmocr/mmdetection-2.11.0/mmdet/apis/inference.py", line 117, in inference_detector
        test_pipeline = Compose(cfg.data.test.pipeline)
      File "/home/xray/homework/table/TableMASTER-mmocr/mmdetection-2.11.0/mmdet/datasets/pipelines/compose.py", line 22, in __init__
        transform = build_from_cfg(transform, PIPELINES)
      File "/home/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
        f'{obj_type} is not in the {registry.name} registry')
    KeyError: 'TableResize is not in the pipeline registry'
    
    
    opened by xray1111 2
  • Convert model to ONNX / TensorRT?

    Have you tried to convert the MASTER-based models (both text line recognition and table recognition) to ONNX or TensorRT format?

    I followed the MMOCR and MMDet tutorials to convert those models to ONNX format, but the conversion fails with an error. Do you have any idea how to handle this issue? Thanks in advance.

    opened by huyhoang17 2
  • Performance issues

    Hi,

    I tried to use table-master (alone, not the end2end model) but it takes 18s just to infer on one image from pubtabnet (~500 pixels width) plus it takes about 2GB memory from the GPU (1 tesla K80).

    Therefore I wanted to know whether this is the normal memory consumption and processing time for table-master, and whether there is any way to reduce the memory consumption and increase the processing speed. It is unfeasible to use a model that takes 18s to infer on one image.

    Thank you in advance for your answer

    opened by GTimothee 1
  • 'TABLEMASTER is not in the models registry'

    After running python data_preprocess.py, I want to train the table structure recognition model TableMASTER.

    But when I run ./table_recognition/expr/table_recognition_dist_train.sh, it raises KeyError: 'TABLEMASTER is not in the models registry'

    opened by baoyuxu 1
  • distance_rule_match

    def distance_rule_match(end2end_indexes, end2end_bboxes, master_indexes, master_bboxes):
        """
        Get matching between no-match end2end bboxes and no-match master bboxes.
        Use min distance to match.
        This rule will only run (no-match end2end nums > 0) and (no-match master nums > 0)
        It will Return master_bboxes_nums match-pairs.
        :param end2end_indexes:
        :param end2end_bboxes:
        :param master_indexes:
        :param master_bboxes:
        :return: match_pairs list, e.g. [[0,1], [1,2], ...]
        """
        min_match_list = []
        for j, master_bbox in zip(master_indexes, master_bboxes):
            min_distance = np.inf
            min_match = [0, 0]  # i, j
            for i, end2end_bbox in zip(end2end_indexes, end2end_bboxes):
                x_end2end, y_end2end = end2end_bbox[0], end2end_bbox[1]
                x_master, y_master = master_bbox[0], master_bbox[1]
                end2end_point = (x_end2end, y_end2end)
                master_point = (x_master, y_master)
                dist = cal_distance(master_point, end2end_point)
                if dist < min_distance:
                    min_match[0], min_match[1] = i, j
                    min_distance = dist
            min_match_list.append(min_match)
        return min_match_list
    

    About this function, the output may contain several matches [i, *] for one i. But you want to find only one match for a specific i, should we change order of the two loops here?
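To make the question concrete, here is a minimal sketch of one possible alternative: a greedy one-to-one assignment in which every end2end index and every master index is used at most once. It is not the repository's implementation. `one_to_one_distance_match` is a hypothetical name, and `cal_distance` is assumed to be the Euclidean distance between top-left corners.

```python
import math


def cal_distance(p1, p2):
    """Euclidean distance between two (x, y) points (assumed definition)."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def one_to_one_distance_match(end2end_indexes, end2end_bboxes,
                              master_indexes, master_bboxes):
    """Greedy matching: each i and each j appears in at most one pair."""
    candidates = []
    for i, end2end_bbox in zip(end2end_indexes, end2end_bboxes):
        for j, master_bbox in zip(master_indexes, master_bboxes):
            dist = cal_distance((master_bbox[0], master_bbox[1]),
                                (end2end_bbox[0], end2end_bbox[1]))
            candidates.append((dist, i, j))
    candidates.sort()  # closest pairs are assigned first

    used_i, used_j, match_pairs = set(), set(), []
    for _, i, j in candidates:
        if i in used_i or j in used_j:
            continue
        used_i.add(i)
        used_j.add(j)
        match_pairs.append([i, j])
    return match_pairs
```

Unlike the function quoted above, this returns min(number of end2end boxes, number of master boxes) pairs rather than one pair per master box, so some boxes can stay unmatched. Whether that is desirable depends on whether one text line is allowed to be assigned to several cells.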

    opened by shaonanqinghuaizongshishi 0
  • Running table_inference.py on one 2080 Ti GPU with the epoch_16_0.7767.pth model

    I run table_inference.py on one 2080 Ti GPU, loading the epoch_16_0.7767.pth model, with the following config file:

    _base_ = ['../../_base_/default_runtime.py']

    alphabet_file = '/tools/data/alphabet/structure_alphabet.txt'
    alphabet_len = len(open(alphabet_file, 'r').readlines())
    max_seq_len = 500

    start_end_same = False
    label_convertor = dict(type='TableMasterConvertor', dict_file=alphabet_file, max_seq_len=max_seq_len, start_end_same=start_end_same, with_unknown=True)

    if start_end_same:
        PAD = alphabet_len + 2
    else:
        PAD = alphabet_len + 3

    model = dict(
        type='TABLEMASTER',
        backbone=dict(type='TableResNetExtra', input_dim=3, gcb_config=dict(ratio=0.0625, headers=1, att_scale=False, fusion_type="channel_add", layers=[False, True, True, True]), layers=[1, 2, 5, 3]),
        encoder=dict(type='PositionalEncoding', d_model=512, dropout=0.2, max_len=5000),
        decoder=dict(type='TableMasterDecoder', N=3, decoder=dict(self_attn=dict(headers=8, d_model=512, dropout=0.), src_attn=dict(headers=8, d_model=512, dropout=0.), feed_forward=dict(d_model=512, d_ff=2024, dropout=0.), size=512, dropout=0.), d_model=512),
        loss=dict(type='MASTERTFLoss', ignore_index=PAD, reduction='mean'),
        bbox_loss=dict(type='TableL1Loss', reduction='sum'),
        label_convertor=label_convertor,
        max_seq_len=max_seq_len)

    TRAIN_STATE = True
    img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    train_pipeline = [
        dict(type='LoadImageFromNdarrayV2'),
        dict(type='TableResize', keep_ratio=True, long_size=480),
        dict(type='TablePad', size=(480, 480), pad_val=0, return_mask=True, mask_ratio=(8, 8), train_state=TRAIN_STATE),
        dict(type='TableBboxEncode'),
        dict(type='ToTensorOCR'),
        dict(type='NormalizeOCR', **img_norm_cfg),
        dict(type='Collect', keys=['img'], meta_keys=['filename', 'ori_shape', 'img_shape', 'text', 'scale_factor', 'bbox', 'bbox_masks', 'pad_shape']),
    ]

    valid_pipeline = [
        dict(type='LoadImageFromNdarrayV2'),
        dict(type='TableResize', keep_ratio=True, long_size=480),
        dict(type='TablePad', size=(480, 480), pad_val=0, return_mask=True, mask_ratio=(8, 8), train_state=TRAIN_STATE),
        dict(type='TableBboxEncode'),
        dict(type='ToTensorOCR'),
        dict(type='NormalizeOCR', **img_norm_cfg),
        dict(type='Collect', keys=['img'], meta_keys=['filename', 'ori_shape', 'img_shape', 'scale_factor', 'img_norm_cfg', 'ori_filename', 'bbox', 'bbox_masks', 'pad_shape']),
    ]

    test_pipeline = [
        dict(type='LoadImageFromNdarrayV2'),
        dict(type='TableResize', keep_ratio=True, long_size=480),
        dict(type='TablePad', size=(480, 480), pad_val=0, return_mask=True, mask_ratio=(8, 8), train_state=TRAIN_STATE),
        #dict(type='TableBboxEncode'),
        dict(type='ToTensorOCR'),
        dict(type='NormalizeOCR', **img_norm_cfg),
        dict(type='Collect', keys=['img'], meta_keys=['filename', 'ori_shape', 'img_shape', 'scale_factor', 'img_norm_cfg', 'ori_filename', 'pad_shape']),
    ]

    dataset_type = 'OCRDataset'
    #train_img_prefix = '/pubtabnet/pubtabnet/train'
    #train_anno_file1 = /StructureLabel_train'
    train_img_prefix = "pubtabnet/pubtabnet/train"
    train_anno_file1 = "StructureLabel_train"
    # train_img_prefix = ''
    # train_anno_file1 = ''
    train1 = dict(type=dataset_type, img_prefix=train_img_prefix, ann_file=train_anno_file1, loader=dict(type='TableMASTERLmdbLoader', repeat=1, max_seq_len=max_seq_len, parser=dict(type='TableMASTERLmdbParser', keys=['filename', 'text'], keys_idx=[0, 1], separator=' ')), pipeline=train_pipeline, test_mode=False)

    # valid_img_prefix = /pubtabnet/pubtabnet/val'
    # valid_anno_file1 = /StructureLabel_val'
    valid_img_prefix = '/pubtabnet/pubtabnet/val'
    valid_anno_file1 = '/StructureLabel_val'
    valid = dict(type=dataset_type, img_prefix=valid_img_prefix, ann_file=valid_anno_file1, loader=dict(type='TableMASTERLmdbLoader', repeat=1, max_seq_len=max_seq_len, parser=dict(type='TableMASTERLmdbParser', keys=['filename', 'text'], keys_idx=[0, 1], separator=' ')), pipeline=valid_pipeline, dataset_info='table_master_dataset', test_mode=True)

    # test_img_prefix = /pubtabnet/pubtabnet/val'
    # test_anno_file1 = '/StructureLabel_val'
    test_img_prefix = '/pubtabnet/pubtabnet/val'
    test_anno_file1 = '/StructureLabel_val'
    test = dict(type=dataset_type, img_prefix=test_img_prefix, ann_file=test_anno_file1, loader=dict(type='TableMASTERLmdbLoader', repeat=1, max_seq_len=max_seq_len, parser=dict(type='TableMASTERLmdbParser', keys=['filename', 'text'], keys_idx=[0, 1], separator=' ')), pipeline=test_pipeline, dataset_info='table_master_dataset', test_mode=True)

    data = dict(samples_per_gpu=4, workers_per_gpu=2, train=dict(type='ConcatDataset', datasets=[train1]), val=dict(type='ConcatDataset', datasets=[valid]), test=dict(type='ConcatDataset', datasets=[test]))

    # optimizer
    optimizer = dict(type='Ranger', lr=1e-3)
    optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
    # optimizer_config = dict(grad_clip=None)

    # learning policy
    lr_config = dict(policy='step', warmup='linear', warmup_iters=50, warmup_ratio=1.0 / 3, step=[12, 15])
    total_epochs = 17

    # evaluation
    evaluation = dict(interval=1, metric='acc')

    # fp16
    fp16 = dict(loss_scale='dynamic')

    # checkpoint setting
    checkpoint_config = dict(interval=1)

    # log_config
    log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
    # yapf:enable

    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]

    # if raise find unused_parameters, use this.
    find_unused_parameters = True

    The run fails with:

        ret = input.softmax(dim)
    RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 10.76 GiB total capacity; 444.10 MiB already allocated; 24.56 MiB free; 604.00 MiB reserved in total by PyTorch)

    It seems that more GPU memory is needed.

    opened by cqray1990 0
  • File not found error

    I am trying to run the table recognition demo script in Google Colab; Google Drive is mounted and the repo is inside it. While running the table recognition demo script, I get the error below:

    Use load_from_local loader
    Traceback (most recent call last):
      File "./table_recognition/demo/demo_cp.py", line 65, in
        master_inference = Recognition_Inference(args.master_config, args.master_checkpoint)
      File "/content/drive/MyDrive/work/TableMASTER-mmocr/table_recognition/table_inference.py", line 99, in __init__
        super().__init__(config_file, checkpoint_file)
      File "/content/drive/MyDrive/work/TableMASTER-mmocr/table_recognition/table_inference.py", line 38, in __init__
        self.model = build_model(config_file, checkpoint_file)
      File "/content/drive/MyDrive/work/TableMASTER-mmocr/table_recognition/table_inference.py", line 25, in build_model
        model = init_detector(config_file, checkpoint=checkpoint_file, device=device)
      File "/content/drive/MyDrive/work/TableMASTER-mmocr/mmdetection-2.11.0/mmdet/apis/inference.py", line 31, in init_detector
        config = mmcv.Config.fromfile(config)
      File "/usr/local/lib/python3.8/dist-packages/mmcv/utils/config.py", line 254, in fromfile
        cfg_dict, cfg_text = Config._file2dict(filename,
      File "/usr/local/lib/python3.8/dist-packages/mmcv/utils/config.py", line 148, in _file2dict
        mod = import_module(temp_module_name)
      File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "", line 1014, in _gcd_import
      File "", line 991, in _find_and_load
      File "", line 975, in _find_and_load_unlocked
      File "", line 671, in _load_unlocked
      File "", line 843, in exec_module
      File "", line 219, in _call_with_frames_removed
      File "/tmp/tmp8emb1yyv/tmptcqvifrk.py", line 6, in
    FileNotFoundError: [Errno 2] No such file or directory: './tools/data/alphabet/textline_recognition_alphabet.txt'
    Exception ignored in: <function _TemporaryFileCloser.__del__ at 0x7fe26a3dc940>
    Traceback (most recent call last):
      File "/usr/lib/python3.8/tempfile.py", line 579, in __del__
        self.close()
      File "/usr/lib/python3.8/tempfile.py", line 575, in close
        unlink(self.name)
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp8emb1yyv/tmptcqvifrk.py'

    opened by VipulAlgoSoul 0
Owner
Jianquan Ye