PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

Overview

2017 VQA Challenge Winner (CVPR'17 Workshop)

PyTorch implementation of Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge by Teney et al.

Model architecture
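
The model described in the paper combines a GRU question encoder over GloVe word embeddings, question-guided (top-down) attention over the 36 bottom-up region features, gated tanh non-linearities, element-wise product fusion, and a classifier over candidate answers. The sketch below is a rough paraphrase of that description rather than this repo's code; the class names and dimensions (GatedTanh, TipsAndTricksVQA, q_dim, v_dim, joint_dim) are illustrative assumptions:

    import torch
    import torch.nn as nn

    class GatedTanh(nn.Module):
        """Gated tanh unit used for the paper's non-linear layers."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.fc_y = nn.Linear(in_dim, out_dim)
            self.fc_g = nn.Linear(in_dim, out_dim)

        def forward(self, x):
            return torch.tanh(self.fc_y(x)) * torch.sigmoid(self.fc_g(x))

    class TipsAndTricksVQA(nn.Module):
        """Assumed dims: 300-d GloVe, 512-d GRU, 2048-d region features, 36 regions."""
        def __init__(self, vocab_size, n_answers, q_dim=512, v_dim=2048, joint_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, 300, padding_idx=0)
            self.gru = nn.GRU(300, q_dim, batch_first=True)
            self.att = nn.Sequential(GatedTanh(q_dim + v_dim, joint_dim), nn.Linear(joint_dim, 1))
            self.q_proj = GatedTanh(q_dim, joint_dim)
            self.v_proj = GatedTanh(v_dim, joint_dim)
            self.clf = nn.Sequential(GatedTanh(joint_dim, joint_dim), nn.Linear(joint_dim, n_answers))

        def forward(self, q_tokens, v_feats):          # q_tokens: (B, T), v_feats: (B, 36, v_dim)
            _, h = self.gru(self.embed(q_tokens))      # h: (1, B, q_dim)
            q = h.squeeze(0)                           # question encoding, (B, q_dim)
            q_tiled = q.unsqueeze(1).expand(-1, v_feats.size(1), -1)
            att = torch.softmax(self.att(torch.cat([v_feats, q_tiled], dim=2)), dim=1)
            v = (att * v_feats).sum(dim=1)             # attended image feature, (B, v_dim)
            joint = self.q_proj(q) * self.v_proj(v)    # element-wise product fusion
            return self.clf(joint)                     # answer logits (sigmoid for multi-label)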

Prerequisites

Data

Preparation

  • To download and extract VQA v2, GloVe, and the pretrained visual features:
    bash scripts/download_extract.sh
  • To prepare data for training:
    python scripts/preproc.py
  • The structure of the data/ directory should look like this (a quick check of the generated files is sketched after the listing):
    - data/
      - zips/
        - v2_XXX...zip
        - ...
        - glove...zip
        - trainval_36.zip
      - glove/
        - glove...txt
        - ...
      - v2_XXX.json
      - ...
      - trainval_resnet...tsv
      (The above are files created after executing scripts/download_extract.sh)
      - tokenizers/
        - ...
      - dict_ans.pkl
      - dict_q.pkl
      - glove_pretrained_300.npy
      - train_qa.pkl
      - val_qa.pkl
      - train_vfeats.pkl
      - val_vfeats.pkl
      (The above are files created after executing scripts/preproc.py)
    
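To sanity-check what scripts/preproc.py produced, a minimal sketch could look like the following; it assumes the .pkl files are ordinary pickled Python objects and the .npy file is a plain NumPy array (as the extensions suggest), and the comments only reflect what the file names imply:

    import pickle

    import numpy as np

    def load_pkl(path):
        with open(path, 'rb') as f:
            return pickle.load(f)

    dict_q = load_pkl('data/dict_q.pkl')        # question-word dictionary
    dict_ans = load_pkl('data/dict_ans.pkl')    # candidate-answer dictionary
    train_qa = load_pkl('data/train_qa.pkl')    # preprocessed training QA entries
    glove = np.load('data/glove_pretrained_300.npy')  # 300-d GloVe embedding matrix

    print(len(dict_q), len(dict_ans), len(train_qa), glove.shape)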

Train

Use default parameters:

bash scripts/train.sh
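
The paper treats answer prediction as multi-label classification against soft ground-truth scores in [0, 1]. A minimal sketch of that objective is shown below, independent of whatever scripts/train.sh actually configures; the function name and the reduction convention are assumptions:

    import torch.nn.functional as F

    def vqa_soft_target_loss(logits, target_scores):
        # logits: (batch, n_answers) raw classifier outputs
        # target_scores: (batch, n_answers) soft ground-truth scores in [0, 1]
        # Binary cross-entropy against the soft scores, summed over answers and
        # averaged over the batch (a common convention for this objective).
        return F.binary_cross_entropy_with_logits(
            logits, target_scores, reduction='sum') / logits.size(0)

    # Hypothetical usage inside a training step:
    #   loss = vqa_soft_target_loss(model(q_tokens, v_feats), target_scores)
    #   loss.backward(); optimizer.step()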

Notes

  • Major refactor (especially of the data preprocessing); tested with PyTorch 0.4.1 and Python 3.6.
  • Training for 20 epochs reaches around 50% training accuracy (the model seems buggy in my implementation); the challenge's accuracy metric is sketched after these notes.
  • After all the preprocessing, the data/ directory may grow to 38 GB or more.
  • Parts of preproc.py and utils.py are based on this repo.
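
For reference, the VQA challenge grades a predicted answer by how many of the 10 human annotators gave it. A minimal sketch of the commonly used simplified form, min(#matches / 3, 1), which is also what soft training targets are usually based on; the function name and arguments here are illustrative, not this repo's API:

    from collections import Counter

    def vqa_soft_scores(human_answers, candidate_answers):
        """human_answers: the 10 annotator answers for one question.
        Returns each candidate's soft accuracy credit, min(#matches / 3, 1)."""
        counts = Counter(human_answers)
        return {ans: min(counts[ans] / 3.0, 1.0) for ans in candidate_answers}

    # Example: 8 annotators said "cat", 2 said "dog".
    # vqa_soft_scores(['cat'] * 8 + ['dog'] * 2, ['cat', 'dog', 'bird'])
    # -> {'cat': 1.0, 'dog': 0.666..., 'bird': 0.0}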

Resources

Comments
  • Possible Bugs, Loader.py not working?

    Hello, we are using your code to preprocess the data and build a few models on top of it, and we are running into some bugs. We don't know why it is not working for us.

    https://github.com/SinghJasdeep/CS230-CS224N-VQA

    The main issues are in the loader.py file:

    In i_batch.append(self.i_feat[iid]), self.i_feat[iid] is actually a dictionary, not an array of size (36, 2048)?

    Your help would be greatly appreciated, Mark.

    opened by SinghJasdeep 5
  • training accuracy  64.42%?

    Do you mean the accuracy is 64.42% during training or during validation? And why do you say 'sigmoid multi-label classifier is also implemented but I can't train based on that'?

    Thank you in advance!

    opened by eustcPL 1
  • Add visual genome as extra data

    Hi, I tried adding this dataset for training, following the paper's guide to use 'questions whose correct answers overlap the output vocabulary determined on the VQA v2 dataset'. But I got roughly 970,000 questions, which is much larger than the 485,000 reported in the paper. Any ideas?

    opened by greathope 0
  • 30% accuracy in training

    I downloaded the code and tried to reproduce your score on the VQA 2.0 set. Since my computer cannot handle the whole training set, I split vqa_train_final.json and coco_features.npy into 7 folds, each grouped by image id (e.g. vqa_train_final.0.json contains image ids [1, 2, 3], coco_features.0.npy contains the image features of [1, 2, 3], and the other folds contain no data for images [1, 2, 3]). I trained the model in two ways: one loads folds 0 through 6 within each epoch and repeats for 50 epochs; the other trains 50 epochs on one fold before moving on to the next. However, both result in a low accuracy of around 30%. The tokenized questions and the 36 COCO features were downloaded from the links you described. What do you think might be the cause? Thanks.

    This is how I split the data:

    import base64
    import csv
    import json
    import os
    import sys

    import numpy as np

    # The base64-encoded feature fields are huge; raise the csv field size limit.
    csv.field_size_limit(sys.maxsize)

    # Field names of the 36-feature bottom-up-attention TSV.
    FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']
    infile = 'path/to/the/36-feature.tsv'  # placeholder: the trainval_resnet...tsv file under data/

    def split_images():
        # Split all train2014 image ids into 7 roughly equal folds.
        list_train = os.listdir('G:/train2014/')
        list_train.remove('COCO_train2014_000000372405.jpg')
        ids = [int(f[15:27]) for f in list_train]
        length = int(len(ids) / 7) + 1
        ids_list = [ids[i:i + length] for i in range(0, len(ids), length)]
        for i in range(len(ids_list)):
            np.savetxt('split/imageIds.train.' + str(i), ids_list[i], fmt='%d')

    def split_json():
        # Keep only the QA entries whose image_id belongs to each fold.
        train = json.load(open('vqa_train_final.json'))
        for i in range(7):
            ids = np.loadtxt('split/imageIds.train.' + str(i)).astype(int)
            s = set(ids)
            data = [entry for entry in train if entry['image_id'] in s]
            json.dump(data, open('split/vqa_train_final.json.' + str(i), 'w'))

    for k in range(7):
        ids = np.loadtxt('split/imageIds.train.' + str(k)).astype(int)
        s = set(ids)
        in_data = {}
        with open(infile, 'rt') as tsv_in_file:
            reader = csv.DictReader(tsv_in_file, delimiter='\t', fieldnames=FIELDNAMES)
            for i, item in enumerate(reader, 1):
                if i % 1000 == 0:
                    print(k, i)
                try:
                    image_id = int(item['image_id'])
                    if image_id in s:
                        # base64.decodestring is removed in Python 3; use decodebytes.
                        b = base64.decodebytes(bytes(item['features'], encoding='utf8'))
                        in_data[image_id] = np.frombuffer(b, dtype=np.float32).reshape((36, -1))
                except Exception:
                    print('error', item['image_id'])
        np.save('split/coco_features.npy.train.' + str(k), in_data)
    
    opened by bbbxixixixi 0