PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

Overview

2017 VQA Challenge Winner (CVPR'17 Workshop)

PyTorch implementation of Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge by Teney et al.

Model architecture
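
The model described in the paper combines a GRU question encoder over GloVe word embeddings, question-guided (top-down) attention over the 36 bottom-up region features, gated tanh non-linearities, element-wise product fusion, and a classifier over candidate answers. The sketch below is a rough paraphrase of that description rather than this repo's code; the class names and dimensions (GatedTanh, TipsAndTricksVQA, q_dim, v_dim, joint_dim) are illustrative assumptions:

    import torch
    import torch.nn as nn

    class GatedTanh(nn.Module):
        """Gated tanh unit used for the paper's non-linear layers."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.fc_y = nn.Linear(in_dim, out_dim)
            self.fc_g = nn.Linear(in_dim, out_dim)

        def forward(self, x):
            return torch.tanh(self.fc_y(x)) * torch.sigmoid(self.fc_g(x))

    class TipsAndTricksVQA(nn.Module):
        """Assumed dims: 300-d GloVe, 512-d GRU, 2048-d region features, 36 regions."""
        def __init__(self, vocab_size, n_answers, q_dim=512, v_dim=2048, joint_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, 300, padding_idx=0)
            self.gru = nn.GRU(300, q_dim, batch_first=True)
            self.att = nn.Sequential(GatedTanh(q_dim + v_dim, joint_dim), nn.Linear(joint_dim, 1))
            self.q_proj = GatedTanh(q_dim, joint_dim)
            self.v_proj = GatedTanh(v_dim, joint_dim)
            self.clf = nn.Sequential(GatedTanh(joint_dim, joint_dim), nn.Linear(joint_dim, n_answers))

        def forward(self, q_tokens, v_feats):          # q_tokens: (B, T), v_feats: (B, 36, v_dim)
            _, h = self.gru(self.embed(q_tokens))      # h: (1, B, q_dim)
            q = h.squeeze(0)                           # question encoding, (B, q_dim)
            q_tiled = q.unsqueeze(1).expand(-1, v_feats.size(1), -1)
            att = torch.softmax(self.att(torch.cat([v_feats, q_tiled], dim=2)), dim=1)
            v = (att * v_feats).sum(dim=1)             # attended image feature, (B, v_dim)
            joint = self.q_proj(q) * self.v_proj(v)    # element-wise product fusion
            return self.clf(joint)                     # answer logits (sigmoid for multi-label)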

Prerequisites

Data

Preparation

  • To download and extract VQA v2, GloVe, and the pretrained visual features:
    bash scripts/download_extract.sh
  • To prepare data for training:
    python scripts/preproc.py
  • The structure of the data/ directory should look like this (a quick check of the generated files is sketched after the listing):
    - data/
      - zips/
        - v2_XXX...zip
        - ...
        - glove...zip
        - trainval_36.zip
      - glove/
        - glove...txt
        - ...
      - v2_XXX.json
      - ...
      - trainval_resnet...tsv
      (The above are files created after executing scripts/download_extract.sh)
      - tokenizers/
        - ...
      - dict_ans.pkl
      - dict_q.pkl
      - glove_pretrained_300.npy
      - train_qa.pkl
      - val_qa.pkl
      - train_vfeats.pkl
      - val_vfeats.pkl
      (The above are files created after executing scripts/preproc.py)
    
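To sanity-check what scripts/preproc.py produced, a minimal sketch could look like the following; it assumes the .pkl files are ordinary pickled Python objects and the .npy file is a plain NumPy array (as the extensions suggest), and the comments only reflect what the file names imply:

    import pickle

    import numpy as np

    def load_pkl(path):
        with open(path, 'rb') as f:
            return pickle.load(f)

    dict_q = load_pkl('data/dict_q.pkl')        # question-word dictionary
    dict_ans = load_pkl('data/dict_ans.pkl')    # candidate-answer dictionary
    train_qa = load_pkl('data/train_qa.pkl')    # preprocessed training QA entries
    glove = np.load('data/glove_pretrained_300.npy')  # 300-d GloVe embedding matrix

    print(len(dict_q), len(dict_ans), len(train_qa), glove.shape)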

Train

Use default parameters:

bash scripts/train.sh
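
The paper treats answer prediction as multi-label classification against soft ground-truth scores in [0, 1]. A minimal sketch of that objective is shown below, independent of whatever scripts/train.sh actually configures; the function name and the reduction convention are assumptions:

    import torch.nn.functional as F

    def vqa_soft_target_loss(logits, target_scores):
        # logits: (batch, n_answers) raw classifier outputs
        # target_scores: (batch, n_answers) soft ground-truth scores in [0, 1]
        # Binary cross-entropy against the soft scores, summed over answers and
        # averaged over the batch (a common convention for this objective).
        return F.binary_cross_entropy_with_logits(
            logits, target_scores, reduction='sum') / logits.size(0)

    # Hypothetical usage inside a training step:
    #   loss = vqa_soft_target_loss(model(q_tokens, v_feats), target_scores)
    #   loss.backward(); optimizer.step()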

Notes

  • Major refactor (especially of the data preprocessing); tested with PyTorch 0.4.1 and Python 3.6.
  • Training for 20 epochs reaches around 50% training accuracy (the model seems buggy in my implementation); the challenge's accuracy metric is sketched after these notes.
  • After all the preprocessing, the data/ directory may grow to 38 GB or more.
  • Parts of preproc.py and utils.py are based on this repo.
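
For reference, the VQA challenge grades a predicted answer by how many of the 10 human annotators gave it. A minimal sketch of the commonly used simplified form, min(#matches / 3, 1), which is also what soft training targets are usually based on; the function name and arguments here are illustrative, not this repo's API:

    from collections import Counter

    def vqa_soft_scores(human_answers, candidate_answers):
        """human_answers: the 10 annotator answers for one question.
        Returns each candidate's soft accuracy credit, min(#matches / 3, 1)."""
        counts = Counter(human_answers)
        return {ans: min(counts[ans] / 3.0, 1.0) for ans in candidate_answers}

    # Example: 8 annotators said "cat", 2 said "dog".
    # vqa_soft_scores(['cat'] * 8 + ['dog'] * 2, ['cat', 'dog', 'bird'])
    # -> {'cat': 1.0, 'dog': 0.666..., 'bird': 0.0}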

Resources

Comments
  • Possible Bugs, Loader.py not working?

    Hello, we are using your code to preprocess the data and build a few models on top of it, and we are running into some bugs. We don't know why it is not working for us.

    https://github.com/SinghJasdeep/CS230-CS224N-VQA

    The main issues are in the loader.py file:

    In i_batch.append(self.i_feat[iid]), self.i_feat[iid] is actually a dictionary, not an array of size (36, 2048)?

    Your help would be greatly appreciated, Mark.

    opened by SinghJasdeep 5
  • training accuracy  64.42%?

    Do you mean the accuracy is 64.42% during training or during validation? And why do you say 'sigmoid multi-label classifier is also implemented but I can't train based on that'?

    Thank you in advance!

    opened by eustcPL 1
  • Add visual genome as extra data

    Hi, I tried adding this dataset for training, following the paper's guide to use 'questions whose correct answers overlap the output vocabulary determined on the VQA v2 dataset'. But I got roughly 970,000 questions, which is much larger than the 485,000 reported in the paper. Any ideas?

    opened by greathope 0
  • 30% accuracy in training

    I downloaded the code and tried to reproduce your score on the VQA 2.0 set. Since my computer cannot handle the whole training set, I split vqa_train_final.json and coco_features.npy into 7 folds, each grouped by image id (e.g. vqa_train_final.0.json contains image ids [1, 2, 3], coco_features.0.npy contains the image features of [1, 2, 3], and the other folds contain no data for images [1, 2, 3]). I trained the model in two ways: one loads folds 0 through 6 within each epoch and repeats for 50 epochs; the other trains 50 epochs on one fold before moving on to the next. However, both result in a low accuracy of around 30%. The tokenized questions and the 36 COCO features were downloaded from the links you described. What do you think might be the cause? Thanks.

    This is how I split the data:

    import base64
    import csv
    import json
    import os
    import sys

    import numpy as np

    # The base64-encoded feature fields are huge; raise the csv field size limit.
    csv.field_size_limit(sys.maxsize)

    # Field names of the 36-feature bottom-up-attention TSV.
    FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']
    infile = 'path/to/the/36-feature.tsv'  # placeholder: the trainval_resnet...tsv file under data/

    def split_images():
        # Split all train2014 image ids into 7 roughly equal folds.
        list_train = os.listdir('G:/train2014/')
        list_train.remove('COCO_train2014_000000372405.jpg')
        ids = [int(f[15:27]) for f in list_train]
        length = int(len(ids) / 7) + 1
        ids_list = [ids[i:i + length] for i in range(0, len(ids), length)]
        for i in range(len(ids_list)):
            np.savetxt('split/imageIds.train.' + str(i), ids_list[i], fmt='%d')

    def split_json():
        # Keep only the QA entries whose image_id belongs to each fold.
        train = json.load(open('vqa_train_final.json'))
        for i in range(7):
            ids = np.loadtxt('split/imageIds.train.' + str(i)).astype(int)
            s = set(ids)
            data = [entry for entry in train if entry['image_id'] in s]
            json.dump(data, open('split/vqa_train_final.json.' + str(i), 'w'))

    for k in range(7):
        ids = np.loadtxt('split/imageIds.train.' + str(k)).astype(int)
        s = set(ids)
        in_data = {}
        with open(infile, 'rt') as tsv_in_file:
            reader = csv.DictReader(tsv_in_file, delimiter='\t', fieldnames=FIELDNAMES)
            for i, item in enumerate(reader, 1):
                if i % 1000 == 0:
                    print(k, i)
                try:
                    image_id = int(item['image_id'])
                    if image_id in s:
                        # base64.decodestring is removed in Python 3; use decodebytes.
                        b = base64.decodebytes(bytes(item['features'], encoding='utf8'))
                        in_data[image_id] = np.frombuffer(b, dtype=np.float32).reshape((36, -1))
                except Exception:
                    print('error', item['image_id'])
        np.save('split/coco_features.npy.train.' + str(k), in_data)
    
    opened by bbbxixixixi 0