# Pano-AVQA
Official repository of "Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos" (ICCV 2021).
[Paper] [Poster] [Video]
## Getting Started
This code is based on the following libraries:

- python=3.8
- pytorch=1.7.0 (with CUDA 10.2)
To create a virtual environment with all necessary libraries:
```bash
conda env create -f environment.yml
```
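This creates a conda environment from `environment.yml`. Assuming it is named `pano-avqa` (an assumption; the actual name is set by the `name:` field in `environment.yml`), activate it before running any of the commands below:

```bash
# Assumed environment name; check the `name:` field in environment.yml.
conda activate pano-avqa
```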
By default, data should be saved under the `data/feat/{audio,label,visual}` directories, and logs (along with cache and checkpoints) are saved under the `data/{cache,ckpt,log}` directories. Using a symbolic link is recommended:
```bash
ln -s {path_to_your_data_directory} data
```
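After linking, the layout under `data/` implied by the default paths above looks like this:

```
data/
├── feat/
│   ├── audio/
│   ├── label/
│   └── visual/
├── cache/
├── ckpt/
└── log/
```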
We use a single TITAN RTX for training, but GPUs with less memory can still handle training with a smaller batch size (given the precomputed features).
## Dataset
We plan to release the Pano-AVQA dataset to the public within this year, including Q&A annotations, precomputed features, etc. Please stay tuned!
## Model
### Training
Default configuration is provided in `code/config.py`. To run with this configuration:

```bash
python cli.py
```
To run with a custom configuration, either modify `code/config.py` or execute:

```bash
python cli.py with {{flags_at_your_disposal}}
```
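The `with key=value` syntax suggests a Sacred-style command line, where configuration entries are overridden at invocation time. As a hypothetical sketch (`batch_size` and `lr` are assumed names, not confirmed; the actual keys live in `code/config.py`):

```bash
# Hypothetical override example; replace batch_size and lr with
# the actual configuration keys defined in code/config.py.
python cli.py with batch_size=16 lr=1e-4
```

Lowering the batch size this way is also how training can fit on GPUs with less memory than a TITAN RTX.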
### Inference
Model weights are saved under the `./data/log` directory. To run inference only:

```bash
python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth
```
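For example, with a hypothetical experiment directory `my_run` containing a checkpoint `epoch10.pth` (both names are placeholders for whatever your training run produced):

```bash
# "my_run" and "epoch10.pth" are placeholder names; substitute your own
# experiment directory and checkpoint file under data/log/.
python cli.py eval with ckpt_file=../data/log/my_run/epoch10.pth
```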
## Citation
If you find our work useful in your research, please consider citing:
```bibtex
@InProceedings{Yun2021PanoAVQA,
    author    = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title     = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year      = {2021}
}
```
## Contact
If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.