PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

Overview

2017 VQA Challenge Winner (CVPR'17 Workshop)

PyTorch implementation of Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge by Teney et al.

Model architecture

Prerequisites

Data

Preparation

  • To download and extract VQA v2, GloVe, and the pretrained visual features:
    bash scripts/download_extract.sh
  • To prepare data for training:
    python scripts/preproc.py
  • The structure of data/ directory should look like this:
    - data/
      - zips/
        - v2_XXX...zip
        - ...
        - glove...zip
        - trainval_36.zip
      - glove/
        - glove...txt
        - ...
      - v2_XXX.json
      - ...
      - trainval_resnet...tsv
      (The above are files created after executing scripts/download_extract.sh)
      - tokenizers/
        - ...
      - dict_ans.pkl
      - dict_q.pkl
      - glove_pretrained_300.npy
      - train_qa.pkl
      - val_qa.pkl
      - train_vfeats.pkl
      - val_vfeats.pkl
      (The above are files created after executing scripts/preproc.py)
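
A quick way to confirm that preprocessing finished is to check for the files named in the listing above. The following is a minimal sketch and not part of the repository: the function name missing_preproc_outputs and the default data/ location are assumptions made for illustration, and entries that the listing elides ("...") are deliberately left out.

```python
from pathlib import Path

# File names copied from the listing above; elided entries ("...") are omitted.
EXPECTED_AFTER_PREPROC = [
    "dict_ans.pkl",
    "dict_q.pkl",
    "glove_pretrained_300.npy",
    "train_qa.pkl",
    "val_qa.pkl",
    "train_vfeats.pkl",
    "val_vfeats.pkl",
]


def missing_preproc_outputs(data_dir="data"):
    """Return the expected preprocessing outputs that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_AFTER_PREPROC if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_preproc_outputs()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected preprocessing outputs are present.")
```

Running it from the repository root before training reports any of these files that scripts/preproc.py has not yet produced.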
    

Train

Use default parameters:

bash scripts/train.sh

Notes

  • Major refactor (especially of data preprocessing); tested with PyTorch 0.4.1 and Python 3.6.
  • Training for 20 epochs reaches around 50% training accuracy (the model seems buggy in my implementation).
  • After all preprocessing, the data/ directory may take up 38 GB or more.
  • Parts of preproc.py and utils.py are based on this repo

Resources

Comments
  • Possible Bugs, Loader.py not working?

    Hello, we are using your code to preprocess the data and build a few models on top of it, and we are running into some bugs. We don't know why this is not working for us.

    https://github.com/SinghJasdeep/CS230-CS224N-VQA

    Mainly issues in the loader.py file:

    In i_batch.append(self.i_feat[iid]), the appended value is actually a dictionary, not an array of size (36, 2048)?

    Your help would be greatly appreciated, Mark.

    opened by SinghJasdeep 5
  • training accuracy  64.42%?

    Do you mean the accuracy is 64.42% on training or on validation? And why is it that the 'sigmoid multi-label classifier is also implemented but I can't train based on that'?

    Thank you in advance!

    opened by eustcPL 1
  • Add visual genome as extra data

    Hi, I tried to add this dataset for training. Following the paper's guidance, I used only 'questions whose correct answers overlap the output vocabulary determined on the VQA v2 dataset'. But I got about 970,000 questions, which is far more than the 485,000 questions reported in the paper. Any ideas?

    opened by greathope 0
  • 30% accuracy in training

    I downloaded the code and tried to reproduce your score on the VQA 2.0 set. Since my computer cannot hold the whole training set, I split vqa_train_final.json and coco_features.npy into 7 folds, grouped by image id. (For example, vqa_train_final.0.json contains image ids [1, 2, 3], coco_features.0.npy contains the image features for [1, 2, 3], and the other folds contain no data related to images [1, 2, 3].) I trained the model in two ways: one loads folds 0 to 6 in each epoch and repeats this 50 times; the other trains on each fold for 50 epochs before moving on to the next fold. However, both result in low accuracy, around 30%. The tokenized questions and the COCO 36 features were downloaded from the links you described. What do you think might be the cause? Thanks.

    This is how I split the data:

    import base64
    import csv
    import json
    import os

    import numpy as np

    # `infile` (path to the feature TSV) and FIELDNAMES (its column names)
    # are not shown in this post; they are assumed to be defined elsewhere.

    def split_images():
        # Split the train2014 image ids into 7 roughly equal folds.
        list_train = os.listdir('G:/train2014/')
        list_train.remove('COCO_train2014_000000372405.jpg')
        ids = [int(f[15:27]) for f in list_train]  # 12-digit id after 'COCO_train2014_'
        length = len(ids) // 7 + 1
        ids_list = [ids[i:i + length] for i in range(0, len(ids), length)]
        for i, fold_ids in enumerate(ids_list):
            np.savetxt("split/imageIds.train." + str(i), fold_ids, fmt='%d')

    def split_json():
        # Write one question file per fold, keeping only that fold's images.
        with open('vqa_train_final.json') as f:
            train = json.load(f)
        for i in range(7):
            ids = np.loadtxt("split/imageIds.train." + str(i)).astype(int)
            s = set(ids)
            data = [entry for entry in train if entry['image_id'] in s]
            with open('split/vqa_train_final.json.' + str(i), 'w') as f:
                json.dump(data, f)

    # Write one {image_id: features} dict per fold from the feature TSV.
    for k in range(7):
        ids = np.loadtxt("split/imageIds.train." + str(k)).astype(int)
        s = set(ids)
        in_data = {}
        with open(infile, "rt") as tsv_in_file:
            reader = csv.DictReader(tsv_in_file, delimiter='\t', fieldnames=FIELDNAMES)
            for i, item in enumerate(reader, 1):
                if i % 1000 == 0:
                    print(k, i)
                try:
                    image_id = int(item['image_id'])
                    if image_id in s:
                        # base64.decodestring was removed in Python 3.9; decodebytes is its replacement.
                        b = base64.decodebytes(item['features'].encode('utf8'))
                        in_data[image_id] = np.frombuffer(b, dtype=np.float32).reshape((36, -1))
                except Exception:
                    print('error', item['image_id'])
            np.save('split/coco_features.npy.train.' + str(k), in_data)
    
    opened by bbbxixixixi 0
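
A note on the snippet above: it passes a plain dict to np.save. NumPy pickles such an object into a zero-dimensional object array and appends ".npy" to any file name that does not already end in it, so reading the file back needs allow_pickle=True followed by .item(). The sketch below only illustrates that round trip with toy stand-in data and a temporary directory; it is not code from the repository, and it is not offered as the cause of the low accuracy reported above.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for the {image_id: features} dict built in the snippet above.
features = {1: np.zeros((36, 2048), dtype=np.float32)}

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "coco_features.npy.train.0")
    np.save(target, features)  # the name does not end in ".npy", so it is appended
    saved_path = target + ".npy"
    assert os.path.exists(saved_path)

    # The dict comes back wrapped in a 0-d object array; .item() unwraps it.
    loaded = np.load(saved_path, allow_pickle=True)
    restored = loaded.item()

print(type(restored).__name__, restored[1].shape)  # prints: dict (36, 2048)
```

Indexing the loaded array directly, without .item(), does not give the per-image (36, 2048) features.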
Owner
Mark Dong
CyLab PhD student