TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

Overview

TAP: Text-Aware Pre-training

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

by Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, and Jiebo Luo

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Oral

Introduction

We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks. For more details, please refer to our paper.

Citation

@inproceedings{yang2021tap,
  title={TAP: Text-Aware Pre-training for Text-VQA and Text-Caption},
  author={Yang, Zhengyuan and Lu, Yijuan and Wang, Jianfeng and Yin, Xi and Florencio, Dinei and Wang, Lijuan and Zhang, Cha and Zhang, Lei and Luo, Jiebo},
  booktitle={CVPR},
  year={2021}
}

Prerequisites

  • Python 3.6

  • PyTorch 1.4.0

  • Please refer to requirements.txt (a pip-based sketch follows below), or set up the package in development mode with

    python setup.py develop
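
    If you prefer installing the pinned dependencies directly, a minimal pip-based sketch (assuming pip is available and the repository root is the working directory):

    pip install -r requirements.txt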
    

Installation

  1. Clone the repository

    git clone https://github.com/microsoft/TAP.git
    cd TAP
    python setup.py develop
    
  2. Data

  • Please refer to the Readme in the data folder.

Training

  1. To train the model, run the code from the main folder. Use the --pretrain flag to enable pre-training mode; otherwise the main QA/captioning losses are used to optimize the model. Example .yml files are in the configs folder, and detailed configs are included with the released models.

    Pre-training:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --pretrain --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$pretrain_yml".yml --save_dir save/$pretrain_savedir training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True
    

    Fine-tuning:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --resume_file save/$pretrain_savedir/$savename/best.ckpt training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --resume_file save/pretrained/textvqa_tap_base_pretrain.ckpt training_parameters.distributed True
    
  2. To evaluate the model, run the code from the main folder. Select the val or test set with --run_type.

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --run_type val --resume_file save/$refine_savedir/$savename/best.ckpt training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt training_parameters.distributed True
    
  3. Captioning evaluation.

    python projects/M4C_Captioner/scripts/textcaps_eval.py --set val --pred_file YOUR_VAL_PREDICTION_FILE
    

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Model checkpoints (~17G) can be downloaded with azcopy:

path/to/azcopy copy https://tapvqacaption.blob.core.windows.net/data/save <local_path>/save --recursive

Please refer to the Readme in the data folder for detailed instructions on downloading with azcopy.
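
For reference, the raw data lives in the same storage container and can presumably be fetched analogously (a sketch based on the download command reported in the comments below; the data folder Readme remains the authoritative source):

path/to/azcopy copy https://tapvqacaption.blob.core.windows.net/data/data <local_path>/data --recursive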

Text-VQA          TAP      TAP** (with extra data)
TextVQA           49.91    54.71
STVQA             45.29    50.83

Text-Captioning   TAP      TAP** (with extra data)
TextCaps          105.05   109.16

Credits

The project is built on top of the Pythia framework (the pythia/ codebase in this repository).

Comments
  • Fail to get the data: AuthenticationErrorDetail: Issuer validation failed. Issuer did not match.

    After I ran azcopy login to authorize a user identity, and it finally showed INFO: Login succeeded., I tried to run the download command azcopy copy https://tapvqacaption.blob.core.windows.net/data/data ./ --recursive, but I got a 401 failure. The detailed error information is as follows.

    INFO: Scanning...
    INFO: Authenticating to source using Azure AD
    INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
    
    failed to perform copy command due to error: cannot start job due to error: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
    ===== RESPONSE ERROR (ServiceCode=InvalidAuthenticationInfo) =====
    Description=Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
    RequestId:3cd53a6d-601e-00c3-71ff-675565000000
    Time:2021-06-23T07:13:31.2032025Z, Details:
       AuthenticationErrorDetail: Issuer validation failed. Issuer did not match.
       Code: InvalidAuthenticationInfo
       GET https://tapvqacaption.blob.core.windows.net/data?comp=list&delimiter=%2F&include=metadata&prefix=data%2F&restype=container&timeout=901
       Authorization: REDACTED
       User-Agent: [AzCopy/10.11.0 Azure-Storage/0.13 (go1.15; linux)]
       X-Ms-Client-Request-Id: [2c0efb91-40c3-4dd0-4634-aebdd3eeda04]
       X-Ms-Version: [2019-12-12]
       --------------------------------------------------------------------------------
       RESPONSE Status: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
       Content-Length: [402]
       Content-Type: [application/xml]
       Date: [Wed, 23 Jun 2021 07:13:31 GMT]
       Server: [Microsoft-HTTPAPI/2.0]
       Www-Authenticate: [Bearer authorization_uri=https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize resource_id=https://storage.azure.com]
       X-Ms-Error-Code: [InvalidAuthenticationInfo]
       X-Ms-Request-Id: [3cd53a6d-601e-00c3-71ff-675565000000]
    

    I wonder which step I did wrong and what I should do to download the data.

    opened by Loycine 3
  • Error during finetuning base model

    Hi, I encountered an error when I tried to further fine-tune the base model. During the validation check, there is a warning: Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors. How do I fix this? Log:

    2022-04-15T21:01:58 INFO: m4c_textvqa:, 41100/41100, train/total_loss: 0.6898 (0.6971), train/m4c_textvqa/m4c_decoding_bce_with_mask: 0.6898 (0.6971), train/m4c_textvqa/textvqa_accuracy: 0.8406 (0.8330), val/total_loss: 6.9965, val/m4c_textvqa/m4c_decoding_bce_with_mask: 6.9965, val/m4c_textvqa/textvqa_accuracy: 0.4969, max mem: 6524.0, lr: 0., time: 01m 07s 802ms, eta: 
    2022-04-15T21:01:58 INFO: Stepping into final validation check
    2022-04-15T21:01:58 INFO: Evaluation time. Running on full validation set...
    Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
    intemediate model saving skipped. utiles/checkpoint, 41101
    2022-04-15T21:05:21 INFO: m4c_textvqa: full val:, 41101/41100, val/total_loss: 6.5700, val/m4c_textvqa/m4c_decoding_bce_with_mask: 6.5700, val/m4c_textvqa/textvqa_accuracy: 0.4969, validation time: 04m 31s 394ms, best iteration: 41000, best val/m4c_textvqa/textvqa_accuracy: 0.499082
    2022-04-15T21:05:21 INFO: Restoring checkpoint
    2022-04-15T21:05:23 INFO: Starting inference on test set
      0%|                                                                                                                 | 0/180 [00:00<?, ?it/s]2022-04-15T21:05:25 WARNING: /home/cybertron/TAP/pythia/modules/losses.py:93: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1
      "Sample list has not field 'targets', are you "
    
    2022-04-15T21:05:25 WARNING: /home/cybertron/TAP/pythia/modules/losses.py:93: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1
      "Sample list has not field 'targets', are you "
    
    opened by soonchangAI 2
  • Microsoft OCR data could not be found

    Hello, I would like to know whether the provided data contains IMDB files and extracted features corresponding to Microsoft-OCR. I did not find the corresponding files; could you please clearly point out the corresponding path for each piece of data? The IMDB files in the figure all seem to correspond to Rosetta-OCR.

    opened by zhousheng97 2
  • TextCaps json file missing for TextVQA

    Hello, I would like to train the TextVQA model with the extra data from TextCaps; however, the file 'TextCaps_0.1_train.json' referenced at https://github.com/microsoft/TAP/blob/352891f93c75ac5d6b9ba141bbe831477dcdd807/pythia/datasets/vqa/m4c_textvqa/dataset.py#L36 is not provided. Thanks!

    opened by HenryJunW 2
  • Question about reproduce result

    Hi! I reproduced TAP (w/o others) and the final accuracy is about 46.2% on the validation set, but 49.91% on the val set is reported in the paper. Are there any details that I ignored, or what could be the reason for this? Due to insufficient memory, I could only set the batch size to 32, which differs from the 128 used in the paper. Thanks a lot!

    opened by JayZhu0104 2
  • What is the val set of pre-train

    Hi! After downloading the OCR-CC features, I found that there were only feature files for the training set, but I noticed that the IMDB file contains information about the val set, and the 'tap_base_pretrain.yml' file requires the val and test sets to be filled in. What should be filled in for this part? Thanks a lot!

    opened by JayZhu0104 2
  • About the downloading errors by using azcopy

    Hi, when I used the azcopy command to download the data, the connection was always reset by the remote host partway through. I have tried this for two weeks but the error still remains... So, may I know of any other alternative ways to download the data? Thanks a lot!

    opened by Glupapa 2
  • Reproduce of checkpoints

    Dear authors:

    I downloaded the Model checkpoints (~17G) and evaluated the model using the following command:

    python tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt

    I got the following results:

    2022-03-24T11:13:42 INFO: m4c_textvqa: full val:, 41000/24000, val/total_loss: 7.9873, val/m4c_textvqa/m4c_decoding_bce_with_mask: 7.9873, val/m4c_textvqa/textvqa_accuracy: 0.4413

    And I found an error message during the evaluation:

    Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors

    In my opinion, the accuracy should be 0.4991, as shown in the results table.

    What is wrong with my procedure? Does it have something to do with the error I encountered?

    By the way, when I use the OCR-CC checkpoints: save/finetuned/textvqa_tap_ocrcc_best.ckpt, the accuracy is 0.4934 (which should be 0.5471), and I found the same error as mentioned above.

    The GPU and PyTorch versions are as follows:

    2022-03-24T11:09:34 INFO: CUDA Device 0 is: Tesla V100-SXM2-16GB
    2022-03-24T11:09:37 INFO: Torch version is: 1.4.0

    Hope to get your response

    Thanks

    opened by kangzhao2 1
  • No targets for training

    When I train the VQA model I get the warning "Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1"

    I executed the same command as mentioned: python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --resume_file save/pretrained/textvqa_tap_base_pretrain.ckpt training_parameters.distributed True

    Can you provide additional details on this and how to train the model with the targets? And can you point out where the targets and the predictions are getting compared to compute loss?

    opened by abhinavkcs11 1
  • Require OCR-CC information (image IDs)

    Hello @zyang-ur, and all

    Thanks for this work, it is quite interesting.

    I'm trying to obtain the OCR-CC dataset but due to my constraints, I can't download the 1.7TB dataset. However, I have the CC dataset and it would be possible for me to obtain the subset of images that are in OCR-CC.

    Could you please share the image IDs of CC that were used to construct OCR-CC?

    Thanks in advance!

    opened by prajwalgatti 1
  • About the number of OCR in stvqa dataset

    Hi! I found that the number of words detected by OCR in some pictures in the stvqa dataset is inconsistent with the corresponding number of features. For example, the number of features in 'feat_resx/stvqa/train/imageNet/n03196217_7957.npy' is 33, while the number of OCR words in the corresponding 'ocr_feat_resx/stvqa_conf/train/imageNet/n03196217_7957_info.npy' is 55. The two numbers do not match. About 2,000 pictures in the training set have this problem.

    opened by JayZhu0104 1
  • How to convert multi-GPU distributed training to single GPU

    This project seems to require multiple GPUs. How can I change the source code so that it can be used on single-GPU systems? When I run this code on a system with a single GPU (GeForce GTX Ti), I get an error and cannot run the project. The command I used: python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True (see the single-GPU sketch after this comment)

    opened by kobrafarshidi 0
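
    A possible workaround (a sketch, not a verified fix): call tools/run.py directly and skip the torch.distributed.launch wrapper, as in the non-distributed evaluation command shown in the "Reproduce of checkpoints" comment above. This assumes the non-distributed defaults also work for pre-training:

    # hypothetical single-GPU pre-training run: no launcher, no distributed override
    python tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test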
  • Where are the text images in CC-OCR?

    Hello! When I tried to download from the link OCR-CC Data (Huge, ~1.3T), I found that the OCR-CC dataset does not contain the text images. So I would like to know where to get these images.

    opened by TongkunGuan 4
  • A KeyError needs to be solved--Emergency!

    The error raised:

    File "/home/lianjunliang/anaconda3/envs/TAP/pythia/datasets/vqa/m4c_textvqa/dataset.py", line 112, in load_item
        [self.object_clsname[x] for x in features['image_info_0']['objects']]
    KeyError: 'objects'

    We printed features; here is what it shows:

    2022-07-20T19:53:09 INFO: Starting training...
    ********************
    {'image_feature_0': tensor([[0.6171, 0.5207, 0.0000, ..., 6.4700, 0.0000, 0.0000],
                                [0.0000, 0.0000, 0.0000, ..., 2.4846, 0.0000, 0.0000],
                                [0.0000, 0.0000, 1.2065, ..., 0.0000, 0.0000, 0.0000],
                                ...,
                                [0.0000, 0.0000, 1.0070, ..., 5.8578, 5.1698, 0.0000],
                                [4.3617, 0.0000, 0.0000, ..., 0.0000, 7.6311, 0.0000],
                                [0.0000, 0.0000, 1.4914, ..., 0.0000, 3.2282, 0.0000]]),
     'image_info_0': {'max_features': tensor(100)},
     'image_feature_1': tensor([[0.6171, 0.5207, 0.0000, ..., 6.4700, 0.0000, 0.0000],
                                [0.0000, 0.0000, 0.0000, ..., 2.4846, 0.0000, 0.0000],
                                [0.0000, 0.0000, 1.2065, ..., 0.0000, 0.0000, 0.0000],
                                ...,
                                [0.0000, 0.0000, 1.0070, ..., 5.8578, 5.1698, 0.0000],
                                [4.3617, 0.0000, 0.0000, ..., 0.0000, 7.6311, 0.0000],
                                [0.0000, 0.0000, 1.4914, ..., 0.0000, 3.2282, 0.0000]]),
     'image_info_1': {'max_features': tensor(100)}}
    2022-07-20T19:53:11 ERROR: Caught KeyError in DataLoader worker process 0.

    opened by MrLianSYSU 0
  • Hang when calculating validation accuracy

    Hi, I ran a .sh script to calculate the validation accuracy for a few models. The code hangs after calculating the validation accuracy for a model (the hang has lasted for more than 30 minutes). I have to use CTRL+C to break the hang so that the script continues calculating the validation accuracy for the remaining models (the hang occurs for each of the subsequent calculations too). How can I fix this?

    The terminal output after CTRL+C:

    Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:37<00:00,  3.95s/it]
    2022-06-06T00:21:54 INFO: Key current_iteration is not present in registry, returning default value of None
    2022-06-06T00:21:54 INFO: m4c_textvqa: full val:, 0/4000, val/total_loss: 38.3987, val/m4c_textvqa/m4c_decoding_bce_with_mask: 38.3987, val/m4c_textvqa/textvqa_accuracy: 0.2572
    ^CTraceback (most recent call last):
      File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
        main()
      File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 239, in main
        process.wait()
      File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1477, in wait
        (pid, sts) = self._try_wait(0)
      File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1424, in _try_wait
        (pid, sts) = os.waitpid(self.pid, wait_flags)
    KeyboardInterrupt
    
    
    opened by soonchangAI 0
  • Validation Accuracy different from paper

    Hi, the validation accuracies I calculated for the fine-tuned models are different from those in the paper. Command:

    python -m torch.distributed.launch --nproc_per_node 2  tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split \
    --config $config \
    --save_dir $folder \
    --run_type val \
    --resume_file $finetuned_model \
    training_parameters.distributed True 
    

    I observed that changing the batch size results in different values.

    |                               | Val accuracy for batch size = 32 | Val acc for batch size = 128 | In paper |
    |-------------------------------|----------------------------------|------------------------------|----------|
    | TextVQA TAP (base)            | 49.87                            | 49.53                        | 49.91    |
    | TextVQA TAP (additional data) | 54.31                            | 54.13                        | 54.71    |

    opened by soonchangAI 0
  • Error of Pretraining User Defined Dataset

    Hi:

    I want to use TAP to pre-train a model on my dataset, and I prepared the dataset following your data format.

    But when I try to pre-train the model with the distributed setting (using only one GPU is fine), I encounter the following error:

    2022-04-15T14:13:50 INFO: m4c_textvqa:, 73100/96000, train/total_loss: 1.6139 (2.9855), train/m4c_textvqa/pretrainonly_m4c_decoding_bce_with_mask: 1.6139 (2.9855), train/m4c_textvqa/maskpred_accuracy: 0.8486 (0.7797), val/total_loss: 4.3474, val/m4c_textvqa/pretrainonly_m4c_decoding_bce_with_mask: 4.3474 (4.3474), val/m4c_textvqa/maskpred_accuracy: 0.7328, max mem: 7456.0, lr: 0.00001, time: 02m 47s 324ms, eta: 10h 43m 43s 839ms
    2022-04-15T14:13:50 INFO: Batch Size of one GPU:16
    2022-04-15T14:14:40 ERROR: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /pytorch/torch/csrc/distributed/c10d/reducer.cpp:514)
    frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7ff58f8d1193 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libc10.so)
    frame #1: c10d::Reducer::prepare_for_backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x731 (0x7ff5dae6ff81 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
    frame #2: <unknown function> + 0xa0f14a (0x7ff5dae5c14a in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
    frame #3: <unknown function> + 0x2961c4 (0x7ff5da6e31c4 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
    frame #4: _PyCFunction_FastCallDict + 0x262 (0x56330c484562 in /home/pai/envs/vqa/bin/python)
    frame #5: <unknown function> + 0x183135 (0x56330c4b0135 in /home/pai/envs/vqa/bin/python)
    ...

    Training loss drops as expected, but after many iterations (73100 in the above case) the above error happened. This is very strange, since this kind of error should happen before training starts.

    Have you ever encountered the above problem? Or could you help me solve it?

    Thanks very much.

    Kang

    opened by kangzhao2 5