Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
This repository contains the code for the ssbaseline model. We also provide OCR results, features, and models. The code is built on top of M4C, whose documentation contains more detailed information.
Citation
If you use ssbaseline in your work, please cite:
@article{zhu2020simple,
title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
journal={arXiv preprint arXiv:2012.05153},
year={2020}
}
Installation
First, clone and install the repo:
git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop
Getting Data
We provide SBD-Trans OCR results for the TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.
Datasets | ImDBs | Object Faster R-CNN Features | OCR Faster R-CNN Features | OCR Recog-CNN Features |
---|---|---|---|---|
TextVQA | TextVQA ImDB | Open Images | TextVQA SBD-Trans OCRs | TextVQA SBD-Trans OCRs |
ST-VQA | ST-VQA ImDB | ST-VQA Objects | ST-VQA SBD-Trans OCRs | ST-VQA SBD-Trans OCRs |
Pretrained Models
We release the following pretrained model for ssbaseline on TextVQA: ssbaseline trained with SBD-Trans OCRs and with ST-VQA as additional data (our best model).
Datasets | Config Files (under configs/vqa/) | Pretrained Models | Metrics | Notes |
---|---|---|---|---|
TextVQA (m4c_textvqa) | m4c_textvqa/m4c_with_stvqa.yml | ssbaseline_with_stvqa | val accuracy - 45.53%; test accuracy - 45.66% | SBD-Trans OCRs; ST-VQA as additional data |
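The config entry in the table is relative to configs/vqa/ inside the repo. As a minimal sketch, assuming the repo was cloned to ~/ssbaseline as in the Installation section, the following resolves the full path and reports whether the file is present. The helper name `config_path` is hypothetical and is not part of ssbaseline or M4C.

```python
from pathlib import Path


def config_path(repo_root, relative_config):
    """Join the repo root, the configs/vqa/ prefix, and the table entry.

    Hypothetical helper for illustration; not part of the ssbaseline code.
    """
    return Path(repo_root).expanduser() / "configs" / "vqa" / relative_config


if __name__ == "__main__":
    # Assumption: the repo lives at ~/ssbaseline (see Installation).
    path = config_path("~/ssbaseline", "m4c_textvqa/m4c_with_stvqa.yml")
    print(path, "(found)" if path.is_file() else "(missing)")
```

If the file is reported as missing, the clone location or the relative path from the table is the first thing to check.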
Training and Evaluation
Please follow the M4C README for the training and evaluation of the M4C model on each dataset.