Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021]

Overview

This repository contains the code for the ssbaseline model. We also provide the OCR results, features, and models. The code is built on top of M4C, where more detailed information can also be found.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First, clone and install the repo:

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop
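
One of the issue reports in the Comments section shows this install step failing on demjson 2.2.4 with "use_2to3 is invalid". setuptools removed support for `use_2to3` in release 58, so a quick pre-flight check of the installed setuptools version may save a failed build. The sketch below is only an illustration: the helper name `blocks_use_2to3` is made up for this example, and pinning setuptools below 58 before installing is a commonly used workaround rather than something this repository documents.

```python
from importlib import metadata


def blocks_use_2to3(setuptools_version: str) -> bool:
    """Return True if this setuptools release rejects `use_2to3` (58 and later)."""
    # Only the major version matters for this check, e.g. "58.0.0" -> 58.
    major = int(setuptools_version.split(".")[0])
    return major >= 58


if __name__ == "__main__":
    try:
        installed = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        print("setuptools is not installed in this environment")
    else:
        if blocks_use_2to3(installed):
            # Assumption: demjson 2.2.4 is still required by the install step.
            print(f"setuptools {installed}: demjson 2.2.4 will likely fail to build")
        else:
            print(f"setuptools {installed}: use_2to3 is still accepted")
```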

Getting Data

We provide SBD-Trans OCR for TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.

Datasets | ImDBs | Object Faster R-CNN Features | OCR Faster R-CNN Features | OCR Recog-CNN Features
TextVQA | TextVQA ImDB | Open Images | TextVQA SBD-Trans OCRs | TextVQA SBD-Trans OCRs
ST-VQA | ST-VQA ImDB | ST-VQA Objects | ST-VQA SBD-Trans OCRs | ST-VQA SBD-Trans OCRs

Pretrained Models

We release the following pretrained model for ssbaseline on the TextVQA dataset: ssbaseline trained with SBD-Trans OCRs and with ST-VQA as additional data (our best model).

Datasets | Config Files (under configs/vqa/) | Pretrained Models | Metrics | Notes
TextVQA (m4c_textvqa) | m4c_textvqa/m4c_with_stvqa.yml | ssbaseline_with_stvqa | val accuracy - 45.53%; test accuracy - 45.66% | SBD-Trans OCRs; ST-VQA as additional data

Training and Evaluation

Please follow the M4C README for the training and evaluation of the M4C model on each dataset.
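
As a rough illustration, the sketch below assembles the inference command that one of the issue reports in the Comments section quotes for generating EvalAI prediction files. The flags are copied from that report and have not been checked against the M4C README. The helper name `build_inference_command` is hypothetical, and the checkpoint path is that user's local path, so adjust it to wherever the released model was downloaded.

```python
import shlex
from typing import List


def build_inference_command(
    config: str, checkpoint: str, save_dir: str = "save/m4c"
) -> List[str]:
    """Assemble the argument list for tools/run.py in EvalAI inference mode."""
    return [
        "python", "tools/run.py",
        "--tasks", "vqa",
        "--datasets", "m4c_textvqa",
        "--model", "m4c",
        "--config", config,
        "--save_dir", save_dir,
        "--run_type", "inference",
        "--evalai_inference", "1",
        "--resume_file", checkpoint,
    ]


if __name__ == "__main__":
    cmd = build_inference_command(
        config="configs/vqa/m4c_textvqa/m4c_with_stvqa.yml",
        # Assumed local path, taken from the issue report; not a fixed location.
        checkpoint="data/models/best_textvqa_withStvqa.ckpt",
    )
    # Print a shell-ready command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell-quoting problems if a path contains spaces.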

Comments
  • ERROR: Key image_feature_2 not found in the SampleList

    ❓ Questions and Help

    I am trying to generate the EvalAI prediction files for the TextVQA test set using the SSBaseline model, but I am facing the following error.

    Command used to run the predictions:

        python tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c --config configs/vqa/m4c_textvqa/m4c_with_stvqa.yml --save_dir save/m4c --run_type inference --evalai_inference 1 --resume_file data/models/best_textvqa_withStvqa.ckpt

    Error:

        2021-03-30T13:13:19 ERROR: Key image_feature_2 not found in the SampleList. Valid choices are ['question_id', 'image_id', 'image_feature_0', 'image_info_0', 'image_feature_1', 'image_info_1', 'text', 'text_len', 'obj_bbox_coordinates', 'context', 'context_tokens', 'context_tokens_enc', 'context_feature_0', 'context_info_0', 'context_feature_1', 'context_info_1', 'order_vectors', 'ocr_bbox_coordinates', 'sampled_idx_seq', 'train_prev_inds', 'dataset_type', 'dataset_name', 'dataset_type_', 'dataset_name_']
        Traceback (most recent call last):
          File "tools/run.py", line 86, in <module>
            run()
          File "tools/run.py", line 75, in run
            trainer.train()
          File "/home/pratyush/ssbaseline/pythia/trainers/base_trainer.py", line 198, in train
            self.inference()
          File "/home/pratyush/ssbaseline/pythia/trainers/base_trainer.py", line 427, in inference
            self._inference_run("test")
          File "/home/pratyush/ssbaseline/pythia/trainers/base_trainer.py", line 431, in _inference_run
            self.predict_for_evalai(dataset_type)
          File "/home/pratyush/ssbaseline/pythia/trainers/base_trainer.py", line 472, in predict_for_evalai
            model_output = self.model(prepared_batch)
          File "/home/pratyush/ssbaseline/pythia/models/base_model.py", line 120, in __call__
            model_output = super().__call__(sample_list, *args, **kwargs)
          File "/home/pratyush/.virtualenvs/ssbase/lib/python3.8/site-packages/torch-1.4.0-py3.8-linux-x86_64.egg/torch/nn/modules/module.py", line 532, in __call__
            result = self.forward(*input, **kwargs)
          File "/home/pratyush/ssbaseline/pythia/models/m4c.py", line 209, in forward
            self._forward_ocr_encoding(sample_list, fwd_results)
          File "/home/pratyush/ssbaseline/pythia/models/m4c.py", line 263, in _forward_ocr_encoding
            ocr_recogcnn = sample_list.image_feature_2[:, :ocr_fasttext.size(1), :]
          File "/home/pratyush/ssbaseline/pythia/common/sample.py", line 145, in __getattr__
            raise AttributeError(
        AttributeError: Key image_feature_2 not found in the SampleList. Valid choices are ['question_id', 'image_id', 'image_feature_0', 'image_info_0', 'image_feature_1', 'image_info_1', 'text', 'text_len', 'obj_bbox_coordinates', 'context', 'context_tokens', 'context_tokens_enc', 'context_feature_0', 'context_info_0', 'context_feature_1', 'context_info_1', 'order_vectors', 'ocr_bbox_coordinates', 'sampled_idx_seq', 'train_prev_inds', 'dataset_type', 'dataset_name', 'dataset_type_', 'dataset_name_']

    Kindly help me with this error or point me in the right direction to resolve this issue.

    Thanks in advance.

    opened by GoelPratyush 1
  • faced error installing demjson when running setup.py

    ❓ Questions and Help

    Hi, I am facing an error when running python setup.py build develop. Environment: Conda, Python 3.7.11, PyTorch 1.9.1, CUDA 10.2.

    Can you share the versions of Python, PyTorch, and other relevant libraries that you used? Thanks. The error message:

    running build
    running build_py
    running build_ext
    running develop
    running egg_info
    writing pythia.egg-info/PKG-INFO
    writing dependency_links to pythia.egg-info/dependency_links.txt
    writing requirements to pythia.egg-info/requires.txt
    writing top-level names to pythia.egg-info/top_level.txt
    reading manifest file 'pythia.egg-info/SOURCES.txt'
    adding license file 'LICENSE'
    writing manifest file 'pythia.egg-info/SOURCES.txt'
    running build_ext
    copying build/lib.linux-x86_64-3.7/cphoc.cpython-37m-x86_64-linux-gnu.so -> 
    Creating /home/cybertron/anaconda3/envs/ss/lib/python3.7/site-packages/pythia.egg-link (link to .)
    Removing pythia 0.3 from easy-install.pth file
    Adding pythia 0.3 to easy-install.pth file
    
    Installed /home/cybertron/ssbaseline
    Processing dependencies for pythia==0.3
    Searching for fasttext==0.9.1
    Reading https://pypi.org/simple/fasttext/
    Downloading https://files.pythonhosted.org/packages/10/61/2e01f1397ec533756c1d893c22d9d5ed3fce3a6e4af1976e0d86bb13ea97/fasttext-0.9.1.tar.gz#sha256=6ead9c6aafe985472066e27c43e33f581b192befd136a84c3c2e8197e7e05be6
    Best match: fasttext 0.9.1
    Processing fasttext-0.9.1.tar.gz
    Writing /tmp/easy_install-u9gat7ds/fasttext-0.9.1/setup.cfg
    Running fasttext-0.9.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-u9gat7ds/fasttext-0.9.1/egg-dist-tmp-l97kscpt
    /home/cybertron/anaconda3/envs/ss/lib/python3.7/site-packages/setuptools/dist.py:720: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
      % (opt, underscore_opt)
    warning: no files found matching 'PATENTS'
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    python/fasttext_module/fasttext/pybind/fasttext_pybind.cc: In lambda function:
    python/fasttext_module/fasttext/pybind/fasttext_pybind.cc:227:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                 for (int32_t i = 0; i < vocab_freq.size(); i++) {
                                     ~~^~~~~~~~~~~~~~~~~~~
    python/fasttext_module/fasttext/pybind/fasttext_pybind.cc: In lambda function:
    python/fasttext_module/fasttext/pybind/fasttext_pybind.cc:241:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                 for (int32_t i = 0; i < labels_freq.size(); i++) {
                                     ~~^~~~~~~~~~~~~~~~~~~~
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/fasttext.cc: In member function ‘void fasttext::FastText::getWordVector(fasttext::Vector&, const string&) const’:
    src/fasttext.cc:92:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int i = 0; i < ngrams.size(); i++) {
                       ~~^~~~~~~~~~~~~~~
    src/fasttext.cc: In lambda function:
    src/fasttext.cc:302:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         return eosid == i1 || (eosid != i2 && norms[i1] > norms[i2]);
                ~~~~~~^~~~~
    src/fasttext.cc:302:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         return eosid == i1 || (eosid != i2 && norms[i1] > norms[i2]);
                                ~~~~~~^~~~~
    src/fasttext.cc: In member function ‘void fasttext::FastText::quantize(const fasttext::Args&)’:
    src/fasttext.cc:322:40: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (qargs.cutoff > 0 && qargs.cutoff < input->size(0)) {
                               ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
    src/fasttext.cc:323:45: warning: ‘std::vector<int> fasttext::FastText::selectEmbeddings(int32_t) const’ is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
         auto idx = selectEmbeddings(qargs.cutoff);
                                                 ^
    src/fasttext.cc:293:22: note: declared here
     std::vector<int32_t> FastText::selectEmbeddings(int32_t cutoff) const {
                          ^~~~~~~~
    src/fasttext.cc:327:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (auto i = 0; i < idx.size(); i++) {
                          ~~^~~~~~~~~~~~
    src/fasttext.cc: In member function ‘void fasttext::FastText::cbow(fasttext::Model::State&, fasttext::real, const std::vector<int>&)’:
    src/fasttext.cc:380:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int32_t w = 0; w < line.size(); w++) {
                           ~~^~~~~~~~~~~~~
    src/fasttext.cc:384:41: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
           if (c != 0 && w + c >= 0 && w + c < line.size()) {
                                       ~~~~~~^~~~~~~~~~~~~
    src/fasttext.cc: In member function ‘void fasttext::FastText::skipgram(fasttext::Model::State&, fasttext::real, const std::vector<int>&)’:
    src/fasttext.cc:398:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int32_t w = 0; w < line.size(); w++) {
                           ~~^~~~~~~~~~~~~
    src/fasttext.cc:402:41: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
           if (c != 0 && w + c >= 0 && w + c < line.size()) {
                                       ~~~~~~^~~~~~~~~~~~~
    src/fasttext.cc: In member function ‘void fasttext::FastText::getSentenceVector(std::istream&, fasttext::Vector&)’:
    src/fasttext.cc:479:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (int32_t i = 0; i < line.size(); i++) {
                             ~~^~~~~~~~~~~~~
    src/fasttext.cc: In member function ‘std::vector<std::pair<std::__cxx11::basic_string<char>, fasttext::Vector> > fasttext::FastText::getNgramVectors(const string&) const’:
    src/fasttext.cc:514:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int32_t i = 0; i < ngrams.size(); i++) {
                           ~~^~~~~~~~~~~~~~~
    src/fasttext.cc: In member function ‘void fasttext::FastText::lazyComputeWordVectors()’:
    src/fasttext.cc:551:40: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
         precomputeWordVectors(*wordVectors_);
                                            ^
    src/fasttext.cc:534:6: note: declared here
     void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
          ^~~~~~~~
    src/fasttext.cc: In member function ‘std::vector<std::pair<float, std::__cxx11::basic_string<char> > > fasttext::FastText::getNN(const fasttext::DenseMatrix&, const fasttext::Vector&, int32_t, const std::set<std::__cxx11::basic_string<char> >&)’:
    src/fasttext.cc:585:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
           if (heap.size() == k && similarity < heap.front().first) {
               ~~~~~~~~~~~~^~~~
    src/fasttext.cc:590:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
           if (heap.size() > k) {
               ~~~~~~~~~~~~^~~
    src/fasttext.cc: In member function ‘std::shared_ptr<fasttext::Matrix> fasttext::FastText::getInputMatrixFromFile(const string&) const’:
    src/fasttext.cc:701:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (size_t i = 0; i < n; i++) {
                          ~~^~~
    src/fasttext.cc:706:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t j = 0; j < dim; j++) {
                            ~~^~~~~
    src/fasttext.cc:718:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (size_t i = 0; i < n; i++) {
                          ~~^~~
    src/fasttext.cc:723:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t j = 0; j < dim; j++) {
                            ~~^~~~~
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/loss.cc: In member function ‘void fasttext::Loss::findKBest(int32_t, fasttext::real, fasttext::Predictions&, const fasttext::Vector&) const’:
    src/loss.cc:83:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         if (heap.size() == k && std_log(output[i]) < heap.front().first) {
             ~~~~~~~~~~~~^~~~
    src/loss.cc:88:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         if (heap.size() > k) {
             ~~~~~~~~~~~~^~~
    src/loss.cc: In member function ‘virtual fasttext::real fasttext::HierarchicalSoftmaxLoss::forward(const std::vector<int>&, int32_t, fasttext::Model::State&, fasttext::real, bool)’:
    src/loss.cc:257:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int32_t i = 0; i < pathToRoot.size(); i++) {
                           ~~^~~~~~~~~~~~~~~~~~~
    src/loss.cc: In member function ‘void fasttext::HierarchicalSoftmaxLoss::dfs(int32_t, fasttext::real, int32_t, fasttext::real, fasttext::Predictions&, const fasttext::Vector&) const’:
    src/loss.cc:282:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (heap.size() == k && score < heap.front().first) {
           ~~~~~~~~~~~~^~~~
    src/loss.cc:289:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         if (heap.size() > k) {
             ~~~~~~~~~~~~^~~
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/productquantizer.cc: In member function ‘void fasttext::ProductQuantizer::load(std::istream&)’:
    src/productquantizer.cc:246:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (auto i = 0; i < centroids_.size(); i++) {
                        ~~^~~~~~~~~~~~~~~~~~~
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/args.cc: In member function ‘void fasttext::Args::parseArgs(const std::vector<std::__cxx11::basic_string<char> >&)’:
    src/args.cc:93:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int ai = 2; ai < args.size(); ai += 2) {
                        ~~~^~~~~~~~~~~~~
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/dictionary.cc: In member function ‘void fasttext::Dictionary::computeSubwords(const string&, std::vector<int>&, std::vector<std::__cxx11::basic_string<char> >*) const’:
    src/dictionary.cc:181:52: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t j = i, n = 1; j < word.size() && n <= args_->maxn; n++) {
                                                      ~~^~~~~~~~~~~~~~
    src/dictionary.cc:186:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
           if (n >= args_->minn && !(n == 1 && (i == 0 || j == word.size()))) {
               ~~^~~~~~~~~~~~~~
    src/dictionary.cc: In member function ‘void fasttext::Dictionary::initNgrams()’:
    src/dictionary.cc:198:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (size_t i = 0; i < size_; i++) {
                          ~~^~~~~~~
    src/dictionary.cc: In member function ‘void fasttext::Dictionary::initTableDiscard()’:
    src/dictionary.cc:296:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (size_t i = 0; i < size_; i++) {
                          ~~^~~~~~~
    src/dictionary.cc: In member function ‘void fasttext::Dictionary::addWordNgrams(std::vector<int>&, const std::vector<int>&, int32_t) const’:
    src/dictionary.cc:316:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int32_t i = 0; i < hashes.size(); i++) {
                           ~~^~~~~~~~~~~~~~~
    src/dictionary.cc:318:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (int32_t j = i + 1; j < hashes.size() && j < i + n; j++) {
                                 ~~^~~~~~~~~~~~~~~
    src/dictionary.cc: In member function ‘void fasttext::Dictionary::prune(std::vector<int>&)’:
    src/dictionary.cc:515:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int32_t i = 0; i < words_.size(); i++) {
                           ~~^~~~~~~~~~~~~~~
    src/dictionary.cc:517:12: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             (j < words.size() && words[j] == i)) {
              ~~^~~~~~~~~~~~~~
    creating /home/cybertron/anaconda3/envs/ss/lib/python3.7/site-packages/fasttext-0.9.1-py3.7-linux-x86_64.egg
    Extracting fasttext-0.9.1-py3.7-linux-x86_64.egg to /home/cybertron/anaconda3/envs/ss/lib/python3.7/site-packages
    Removing fasttext 0.9.2 from easy-install.pth file
    Adding fasttext 0.9.1 to easy-install.pth file
    
    Installed /home/cybertron/anaconda3/envs/ss/lib/python3.7/site-packages/fasttext-0.9.1-py3.7-linux-x86_64.egg
    Searching for demjson>=2.2
    Reading https://pypi.org/simple/demjson/
    Downloading https://files.pythonhosted.org/packages/96/67/6db789e2533158963d4af689f961b644ddd9200615b8ce92d6cad695c65a/demjson-2.2.4.tar.gz#sha256=31de2038a0fdd9c4c11f8bf3b13fe77bc2a128307f965c8d5fb4dc6d6f6beb79
    Best match: demjson 2.2.4
    Processing demjson-2.2.4.tar.gz
    Writing /tmp/easy_install-zh3undwn/demjson-2.2.4/setup.cfg
    Running demjson-2.2.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-zh3undwn/demjson-2.2.4/egg-dist-tmp-5b8hbd5l
    error: Setup script exited with error in demjson setup command: use_2to3 is invalid.
    
    
    opened by soonchangAI 0
  • Textcaps task

    ❓ Questions and Help

    Hi, thank you for sharing your code. Which files do I need to modify if I want to use this model for the TextCaps task? Can the dataset reuse the TextVQA parts directly, and is the configuration file also an M4C .yml file? Looking forward to your reply!

    opened by Caroline0728 0
  • size mismatch for linear_ocr_feat_to_mmt_in.weight: copying a param with shape torch.Size([768, 3002]) from checkpoint, the shape in current model is torch.Size([768, 3464]).

    ❓ Questions and Help

    I am getting a size mismatch error when loading the checkpoint, and I do not understand how to change the dimensions in the checkpoint. Please help!

    opened by sarthak-sg 1
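
The size mismatch reported above means that the model built from the current config expects a different OCR input width (3464) than the one stored in the checkpoint (3002). This suggests the OCR feature settings in the config differ from those used to train the checkpoint, though the report itself does not confirm that. A minimal sketch for listing such mismatches is shown below. It works on plain name-to-shape dictionaries; with PyTorch these could be collected as `{k: tuple(v.shape) for k, v in state_dict.items()}`, and where the state dict sits inside the checkpoint file is an assumption to verify. The helper name `find_shape_mismatches` is hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint: Dict[str, Shape], model: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for parameters whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint.items():
        model_shape = model.get(name)
        # Parameters missing from the model are skipped; only shape conflicts are listed.
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


if __name__ == "__main__":
    # Shapes taken from the error message in the report above.
    checkpoint_shapes = {"linear_ocr_feat_to_mmt_in.weight": (768, 3002)}
    model_shapes = {"linear_ocr_feat_to_mmt_in.weight": (768, 3464)}
    for name, ckpt_shape, model_shape in find_shape_mismatches(
        checkpoint_shapes, model_shapes
    ):
        print(f"{name}: checkpoint {ckpt_shape}, current model {model_shape}")
```

Editing the checkpoint tensors is rarely the right fix. Aligning the config with the settings the checkpoint was trained with is usually the safer route.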
Owner
ZephyrZhuQi
Visual and linguistic reasoning.