TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

Microsoft

Last update: Nov 14, 2022

Overview

TAP: Text-Aware Pre-training

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

by Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, and Jiebo Luo

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Oral

Introduction

We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks. For more details, please refer to our paper.

Citation

@inproceedings{yang2021tap,
  title={TAP: Text-Aware Pre-training for Text-VQA and Text-Caption},
  author={Yang, Zhengyuan and Lu, Yijuan and Wang, Jianfeng and Yin, Xi and Florencio, Dinei and Wang, Lijuan and Zhang, Cha and Zhang, Lei and Luo, Jiebo},
  booktitle={CVPR},
  year={2021}
}

Prerequisites

Python 3.6
PyTorch 1.4.0
Install the dependencies listed in requirements.txt, or run:
```
python setup.py develop
```
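To confirm that an environment matches the versions listed above, a small check along these lines may help. It is a minimal sketch: `check_versions` is a hypothetical helper (not part of the repository), and the choice to ignore a local build suffix such as `+cu100` is an assumption.

```python
import sys

# Versions listed under Prerequisites.
EXPECTED_PYTHON = (3, 6)
EXPECTED_TORCH = "1.4.0"


def check_versions(python_version, torch_version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append("Python {}.{} found, expected {}.{}".format(
            python_version[0], python_version[1], *EXPECTED_PYTHON))
    if torch_version is None:
        problems.append("PyTorch is not installed")
    # Assumption: a local build suffix such as "+cu100" does not matter here.
    elif str(torch_version).split("+")[0] != EXPECTED_TORCH:
        problems.append("PyTorch {} found, expected {}".format(
            torch_version, EXPECTED_TORCH))
    return problems


if __name__ == "__main__":
    try:
        import torch
        found_torch = torch.__version__
    except ImportError:
        found_torch = None
    report = check_versions(sys.version_info, found_torch)
    for line in report or ["Environment matches the listed prerequisites."]:
        print(line)
```

Returning a list instead of raising on the first mismatch lets the check report every problem at once.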

Installation

Clone the repository

git clone https://github.com/microsoft/TAP.git
cd TAP
python setup.py develop

Data

Please refer to the Readme in the data folder.

Training

To train the model, run the commands below from the main (top-level) folder. Pass the --pretrain flag to enable pre-training mode; without it, the model is optimized with the main QA/captioning losses. Example yml files are in the configs folder, and the detailed configs are included with the released models.

Pre-training:

python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --pretrain --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$pretrain_yml".yml --save_dir save/$pretrain_savedir training_parameters.distributed True

# for example
python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True

Fine-tuning:

python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --resume_file save/$pretrain_savedir/$savename/best.ckpt training_parameters.distributed True

# for example
python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --resume_file save/pretrained/textvqa_tap_base_pretrain.ckpt training_parameters.distributed True

To evaluate the model, run the command below from the main (top-level) folder. Use --run_type to select the val or test set.

python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --run_type val --resume_file save/$refine_savedir/$savename/best.ckpt training_parameters.distributed True

# for example
python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt training_parameters.distributed True
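The pre-training, fine-tuning, and evaluation commands above share one template and differ only in a few flags. The sketch below assembles them from parameters; `build_run_command` is a hypothetical helper written for illustration, not something the repository provides, and it only reproduces the flags that appear in the commands above.

```python
import shlex


def build_run_command(num_gpu, dataset, model, config, save_dir,
                      seed=None, pretrain=False, resume_file=None, run_type=None):
    """Assemble the launch command as an argument list, in the order used above."""
    cmd = ["python", "-m", "torch.distributed.launch",
           "--nproc_per_node", str(num_gpu), "tools/run.py"]
    if pretrain:
        cmd.append("--pretrain")          # pre-training mode
    cmd += ["--tasks", "vqa", "--datasets", dataset, "--model", model]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    cmd += ["--config", config, "--save_dir", save_dir]
    if run_type is not None:
        cmd += ["--run_type", run_type]   # "val" or "test" for evaluation
    if resume_file is not None:
        cmd += ["--resume_file", resume_file]
    cmd += ["training_parameters.distributed", "True"]
    return cmd


# Reproduce the pre-training example as a printable shell command.
pretrain_cmd = build_run_command(
    4, "m4c_textvqa", "m4c_split",
    "configs/vqa/m4c_textvqa/tap_base_pretrain.yml",
    "save/m4c_split_pretrain_test", seed=13, pretrain=True)
print(" ".join(shlex.quote(arg) for arg in pretrain_cmd))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues when paths contain spaces.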

Captioning evaluation:

python projects/M4C_Captioner/scripts/textcaps_eval.py --set val --pred_file YOUR_VAL_PREDICTION_FILE

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Model checkpoints (~17G).

path/to/azcopy copy https://tapvqacaption.blob.core.windows.net/data/save <local_path>/save --recursive

Please refer to the Readme in the data folder for detailed instructions on downloading with azcopy.

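| Text-VQA | TAP | TAP** (with extra data) |
|----------|-----|-------------------------|
| TextVQA | 49.91 | 54.71 |
| STVQA | 45.29 | 50.83 |

| Text-Captioning | TAP | TAP** (with extra data) |
|-----------------|-----|-------------------------|
| TextCaps | 105.05 | 109.16 |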
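As a quick reading aid, the gain from the extra data can be computed directly from the scores above. This is only arithmetic on the reported numbers; `extra_data_gain` is a name chosen for this sketch.

```python
# Scores copied from the tables above: (TAP, TAP with extra data).
RESULTS = {
    "TextVQA": (49.91, 54.71),
    "STVQA": (45.29, 50.83),
    "TextCaps": (105.05, 109.16),
}


def extra_data_gain(name):
    """Absolute improvement from the extra data, rounded to two decimals."""
    base, extra = RESULTS[name]
    return round(extra - base, 2)


for name in RESULTS:
    print(name, extra_data_gain(name))
```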

Credits

This project is built on the following repository:

MMF: A multimodal framework for vision and language research.

Comments

Failed to get the data: AuthenticationErrorDetail: Issuer validation failed. Issuer did not match.

After I ran azcopy login to authorize a user identity and it finally showed INFO: Login succeeded., I tried to run the download command azcopy copy https://tapvqacaption.blob.core.windows.net/data/data ./ --recursive, but it failed with a 401. The detailed error information is as follows.

INFO: Scanning...
INFO: Authenticating to source using Azure AD
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support

failed to perform copy command due to error: cannot start job due to error: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=InvalidAuthenticationInfo) =====
Description=Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:3cd53a6d-601e-00c3-71ff-675565000000
Time:2021-06-23T07:13:31.2032025Z, Details:
   AuthenticationErrorDetail: Issuer validation failed. Issuer did not match.
   Code: InvalidAuthenticationInfo
   GET https://tapvqacaption.blob.core.windows.net/data?comp=list&delimiter=%2F&include=metadata&prefix=data%2F&restype=container&timeout=901
   Authorization: REDACTED
   User-Agent: [AzCopy/10.11.0 Azure-Storage/0.13 (go1.15; linux)]
   X-Ms-Client-Request-Id: [2c0efb91-40c3-4dd0-4634-aebdd3eeda04]
   X-Ms-Version: [2019-12-12]
   --------------------------------------------------------------------------------
   RESPONSE Status: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
   Content-Length: [402]
   Content-Type: [application/xml]
   Date: [Wed, 23 Jun 2021 07:13:31 GMT]
   Server: [Microsoft-HTTPAPI/2.0]
   Www-Authenticate: [Bearer authorization_uri=https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize resource_id=https://storage.azure.com]
   X-Ms-Error-Code: [InvalidAuthenticationInfo]
   X-Ms-Request-Id: [3cd53a6d-601e-00c3-71ff-675565000000]

I would like to know which step I got wrong and what I should do to download the data.

opened by Loycine 3
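One possible reading of the log above (an interpretation, not something the authors confirm here): "Issuer did not match" suggests that the token obtained through azcopy login was issued by a different Azure AD tenant than the one named in the Www-Authenticate header. The sketch below extracts that tenant ID from the header so it can be compared with the tenant used at login; `expected_tenant` is a hypothetical helper.

```python
import re

# Matches the tenant GUID inside a Bearer authorization_uri.
TENANT_PATTERN = re.compile(
    r"authorization_uri=https://login\.microsoftonline\.com/([0-9a-fA-F-]{36})/")


def expected_tenant(www_authenticate):
    """Return the tenant ID named in the header, or None if it is absent."""
    match = TENANT_PATTERN.search(www_authenticate)
    return match.group(1) if match else None


# Header value copied from the response in the log above.
header = ("Bearer authorization_uri=https://login.microsoftonline.com/"
          "72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize "
          "resource_id=https://storage.azure.com")
print(expected_tenant(header))
```

If the tenants differ, the request is rejected regardless of whether the login itself succeeded, which would be consistent with the 401 shown above.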

Error during finetuning base model

Hi, I encountered a problem when trying to further fine-tune the base model. During the final validation check, this warning appears: Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors. How do I fix this? Log:

2022-04-15T21:01:58 INFO: m4c_textvqa:, 41100/41100, train/total_loss: 0.6898 (0.6971), train/m4c_textvqa/m4c_decoding_bce_with_mask: 0.6898 (0.6971), train/m4c_textvqa/textvqa_accuracy: 0.8406 (0.8330), val/total_loss: 6.9965, val/m4c_textvqa/m4c_decoding_bce_with_mask: 6.9965, val/m4c_textvqa/textvqa_accuracy: 0.4969, max mem: 6524.0, lr: 0., time: 01m 07s 802ms, eta: 
2022-04-15T21:01:58 INFO: Stepping into final validation check
2022-04-15T21:01:58 INFO: Evaluation time. Running on full validation set...
Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
intemediate model saving skipped. utiles/checkpoint, 41101
2022-04-15T21:05:21 INFO: m4c_textvqa: full val:, 41101/41100, val/total_loss: 6.5700, val/m4c_textvqa/m4c_decoding_bce_with_mask: 6.5700, val/m4c_textvqa/textvqa_accuracy: 0.4969, validation time: 04m 31s 394ms, best iteration: 41000, best val/m4c_textvqa/textvqa_accuracy: 0.499082
2022-04-15T21:05:21 INFO: Restoring checkpoint
2022-04-15T21:05:23 INFO: Starting inference on test set
  0%|                                                                                                                 | 0/180 [00:00<?, ?it/s]2022-04-15T21:05:25 WARNING: /home/cybertron/TAP/pythia/modules/losses.py:93: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1
  "Sample list has not field 'targets', are you "

2022-04-15T21:05:25 WARNING: /home/cybertron/TAP/pythia/modules/losses.py:93: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1
  "Sample list has not field 'targets', are you "

opened by soonchangAI 2
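The warning in this log reports a tokenized sequence of 599 tokens against a 512-token limit. Whether TAP shortens such sequences later in its pipeline is not stated in this thread; the sketch below only illustrates what truncating to the limit means, with `truncate_tokens` as a hypothetical helper operating on a plain list of token IDs.

```python
MAX_LEN = 512  # limit quoted in the warning


def truncate_tokens(token_ids, max_len=MAX_LEN):
    """Return the first max_len token IDs and the number of IDs dropped."""
    dropped = max(0, len(token_ids) - max_len)
    return token_ids[:max_len], dropped


# A stand-in sequence with the length reported in the warning.
sequence = list(range(599))
kept, dropped = truncate_tokens(sequence)
print(len(kept), dropped)
```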

Microsoft OCR data could not be found

Hello, I would like to know whether the provided data includes IMDB files and extracted features corresponding to Microsoft OCR. I did not find the corresponding files; could you please point out the path for each kind of data? The IMDB files in the figure all seem to correspond to Rosetta OCR.

opened by zhousheng97 2
TextCaps json file missing for TextVQA

Hello, I would like to train the TextVQA model with the extra TextCaps data. However, the file 'TextCaps_0.1_train.json' referenced at https://github.com/microsoft/TAP/blob/352891f93c75ac5d6b9ba141bbe831477dcdd807/pythia/datasets/vqa/m4c_textvqa/dataset.py#L36 is not provided. Thanks!

opened by HenryJunW 2
Question about reproducing the results

Hi! I reproduced TAP (w/o others), and the final accuracy is about 46.2% on the validation set, but the paper reports 49.91% on the val set. Are there any details that I overlooked, or what could be the reason? Note that, due to insufficient memory, I could only set the batch size to 32, which differs from the 128 used in the paper. Thanks a lot!

opened by JayZhu0104 2
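A common way to approximate a batch size of 128 when only 32 samples fit in memory is gradient accumulation. Whether this repository exposes such an option, and whether it would close the gap reported above, is not known; the sketch below only shows the arithmetic, using plain numbers as stand-ins for per-sample gradients. Both function names are hypothetical.

```python
def accumulation_steps(target_batch, micro_batch):
    """Number of micro-batches needed to reach the target batch size."""
    if target_batch % micro_batch != 0:
        raise ValueError("target batch must be a multiple of the micro-batch")
    return target_batch // micro_batch


def accumulated_mean(per_sample_values, micro_batch):
    """Average the per-micro-batch means, as accumulation with loss scaling would."""
    chunks = [per_sample_values[i:i + micro_batch]
              for i in range(0, len(per_sample_values), micro_batch)]
    return sum(sum(chunk) / len(chunk) for chunk in chunks) / len(chunks)


# Stand-in "per-sample gradients": 128 numbers instead of real tensors.
values = [float(i) for i in range(128)]
steps = accumulation_steps(128, 32)          # micro-batches per optimizer update
full_batch_mean = sum(values) / len(values)  # what one batch of 128 averages to
print(steps, full_batch_mean, accumulated_mean(values, 32))
```

With equal-sized micro-batches, the accumulated average equals the full-batch average. Layers that depend on batch statistics would still see batches of 32, so the match is not guaranteed to be exact in a real model.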
What is the val set for pre-training?

Hi! After downloading the OCR-CC features, I found only feature files for the training set. However, the IMDB file contains information about the val set, and the 'tap_base_pretrain.yml' file requires the val and test sets to be filled in. What should go in this part? Thanks a lot!

opened by JayZhu0104 2
About download errors when using azcopy

Hi, when I use the azcopy command to download the data, the connection is always reset by the remote host partway through. I have been trying for two weeks, but the error remains. Is there an alternative way to download the data? Thanks a lot!

opened by Glupapa 2
Reproducing the checkpoint results

Dear authors:

I downloaded the model checkpoints (~17G) and evaluated the model with the following command:

python tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt

I got the following results:

2022-03-24T11:13:42 INFO: m4c_textvqa: full val:, 41000/24000, val/total_loss: 7.9873, val/m4c_textvqa/m4c_decoding_bce_with_mask: 7.9873, val/m4c_textvqa/textvqa_accuracy: 0.4413

I also noticed a warning during the evaluation:

Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors

I expected the accuracy to be 0.4991, as reported in the performance table.

What is wrong with my setup? Does it have something to do with the warning above?

By the way, when I use the OCR-CC checkpoint save/finetuned/textvqa_tap_ocrcc_best.ckpt, the accuracy is 0.4934 (it should be 0.5471), and I see the same warning as above.

The GPU and PyTorch version are as follows:

2022-03-24T11:09:34 INFO: CUDA Device 0 is: Tesla V100-SXM2-16GB
2022-03-24T11:09:37 INFO: Torch version is: 1.4.0

Hope to get your response

Thanks

opened by kangzhao2 1
No targets for training

When I train the VQA model I get the warning "Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1"

I executed the same command as mentioned: python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --resume_file save/pretrained/textvqa_tap_base_pretrain.ckpt training_parameters.distributed True

Can you provide additional details on this and how to train the model with the targets? And can you point out where the targets and the predictions are getting compared to compute loss?

opened by abhinavkcs11 1
Require OCR-CC information (image IDs)

Hello @zyang-ur, and all

Thanks for this work, it is quite interesting.

I'm trying to obtain the OCR-CC dataset but due to my constraints, I can't download the 1.7TB dataset. However, I have the CC dataset and it would be possible for me to obtain the subset of images that are in OCR-CC.

Could you please share the image IDs of CC that were used to construct OCR-CC?

Thanks in advance!

opened by prajwalgatti 1
About the number of OCR words in the stvqa dataset

Hi! I found that, for some images in the stvqa dataset, the number of words detected by OCR is inconsistent with the number of features. For example, 'feat_resx/stvqa/train/imageNet/n03196217_7957.npy' contains 33 features, while the corresponding 'ocr_feat_resx/stvqa_conf/train/imageNet/n03196217_7957_info.npy' contains 55 OCR words. The two numbers do not match. About 2000 images in the train set have this problem.

opened by JayZhu0104 1
How to convert multi-GPU distributed training to a single GPU

This project expects multiple GPUs. How can I change the source code so that it runs on a single-GPU system? When I run the code on a system with a single GPU (Geforce gtx ti), I get an error and cannot run the project. python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True

opened by kobrafarshidi 0
Where are the text images in CC-OCR?

Hello! When I tried to download from the link OCR-CC Data (Huge, ~1.3T), I found that the CC-OCR dataset does not contain the text images. Where can I get these images?

opened by TongkunGuan 4
A KeyError that needs to be solved (urgent)

The following error is raised:

File "/home/lianjunliang/anaconda3/envs/TAP/pythia/datasets/vqa/m4c_textvqa/dataset.py", line 112, in load_item
    [self.object_clsname[x] for x in features['image_info_0']['objects']]
KeyError: 'objects'

We printed features; here is the output:

2022-07-20T19:53:09 INFO: Starting training...
********************
{'image_feature_0': tensor([[0.6171, 0.5207, 0.0000, ..., 6.4700, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 2.4846, 0.0000, 0.0000], [0.0000, 0.0000, 1.2065, ..., 0.0000, 0.0000, 0.0000], ..., [0.0000, 0.0000, 1.0070, ..., 5.8578, 5.1698, 0.0000], [4.3617, 0.0000, 0.0000, ..., 0.0000, 7.6311, 0.0000], [0.0000, 0.0000, 1.4914, ..., 0.0000, 3.2282, 0.0000]]), 'image_info_0': {'max_features': tensor(100)}, 'image_feature_1': tensor([[0.6171, 0.5207, 0.0000, ..., 6.4700, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 2.4846, 0.0000, 0.0000], [0.0000, 0.0000, 1.2065, ..., 0.0000, 0.0000, 0.0000], ..., [0.0000, 0.0000, 1.0070, ..., 5.8578, 5.1698, 0.0000], [4.3617, 0.0000, 0.0000, ..., 0.0000, 7.6311, 0.0000], [0.0000, 0.0000, 1.4914, ..., 0.0000, 3.2282, 0.0000]]), 'image_info_1': {'max_features': tensor(100)}}
2022-07-20T19:53:11 ERROR: Caught KeyError in DataLoader worker process 0.

opened by MrLianSYSU 0
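In the printed features above, image_info_0 holds only max_features, which is why the lookup of 'objects' fails. That suggests, though the thread does not confirm it, that the loaded feature info lacks object labels. The sketch below does not fix that cause; it only shows a lookup that reports the available keys when 'objects' is missing. `object_class_names` is a hypothetical stand-in for the list comprehension in the traceback.

```python
def object_class_names(image_info, clsname_table):
    """Map object class indices in image_info to names.

    Raises a KeyError that lists the available keys when 'objects' is absent.
    """
    if "objects" not in image_info:
        raise KeyError("image_info has no 'objects' entry; available keys: {}"
                       .format(sorted(image_info)))
    return [clsname_table[index] for index in image_info["objects"]]


# Illustrative data only: a two-entry class table and two object indices.
print(object_class_names({"objects": [1, 0]}, ["cat", "dog"]))

# The shape seen in the issue: no 'objects' key, only 'max_features'.
try:
    object_class_names({"max_features": 100}, ["cat", "dog"])
except KeyError as error:
    print("lookup failed:", error)
```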

Hang when calculating validation accuracy

Hi, I ran a .sh script to calculate validation accuracy for a few models. The code hangs after calculating validation accuracy for a model (the hang lasts more than 30 minutes). I have to press CTRL+C to break out of the hang so that the script continues to calculate validation accuracy for the remaining models (the hang occurs for each subsequent calculation too). How can I fix this?

Terminal output after CTRL+C:

Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:37<00:00,  3.95s/it]
2022-06-06T00:21:54 INFO: Key current_iteration is not present in registry, returning default value of None
2022-06-06T00:21:54 INFO: m4c_textvqa: full val:, 0/4000, val/total_loss: 38.3987, val/m4c_textvqa/m4c_decoding_bce_with_mask: 38.3987, val/m4c_textvqa/textvqa_accuracy: 0.2572
^CTraceback (most recent call last):
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
    main()
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 239, in main
    process.wait()
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

opened by soonchangAI 0

Validation accuracy differs from the paper

Hi, the validation accuracy I calculated for the fine-tuned models differs from the paper. Command:

python -m torch.distributed.launch --nproc_per_node 2 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split \
    --config $config \
    --save_dir $folder \
    --run_type val \
    --resume_file $finetuned_model \
    training_parameters.distributed True

I observed that changing the batch size gives different values.

| | Val accuracy for batch size = 32 | Val acc for batch size = 128 | In paper |
|-------------------------------|-----------------------------------|------------------------------|----------|
| TextVQA TAP (base) | 49.87 | 49.53 | 49.91 |
| TextVQA TAP (additional data) | 54.31 | 54.13 | 54.71 |

opened by soonchangAI 0
Error when pre-training on a user-defined dataset

Hi,

I want to use TAP to pre-train a model on my own dataset, which I prepared following your data format.

But when I pre-train the model in the distributed setting (using only one GPU works fine), I encounter the following error:

2022-04-15T14:13:50 INFO: m4c_textvqa:, 73100/96000, train/total_loss: 1.6139 (2.9855), train/m4c_textvqa/pretrainonly_m4c_decoding_bce_with_mask: 1.6139 (2.9855), train/m4c_textvqa/maskpred_accuracy: 0.8486 (0.7797), val/total_loss: 4.3474, val/m4c_textvqa/pretrainonly_m4c_decoding_bce_with_mask: 4.3474 (4.3474), val/m4c_textvqa/maskpred_accuracy: 0.7328, max mem: 7456.0, lr: 0.00001, time: 02m 47s 324ms, eta: 10h 43m 43s 839ms
2022-04-15T14:13:50 INFO: Batch Size of one GPU:16
2022-04-15T14:14:40 ERROR: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
(prepare_for_backward at /pytorch/torch/csrc/distributed/c10d/reducer.cpp:514)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7ff58f8d1193 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x731 (0x7ff5dae6ff81 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0xa0f14a (0x7ff5dae5c14a in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x2961c4 (0x7ff5da6e31c4 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyCFunction_FastCallDict + 0x262 (0x56330c484562 in /home/pai/envs/vqa/bin/python)
frame #5: <unknown function> + 0x183135 (0x56330c4b0135 in /home/pai/envs/vqa/bin/python)
...

The training loss drops as expected, but after a number of iterations (73100 in the case above), the error occurs. This is strange, since this kind of error should occur before training starts.

Have you ever encountered this problem, or could you help me solve it?

Thanks very much.

Kang

opened by kangzhao2 5
