Official PyTorch implementation of the AAAI 2021 paper "Semantic Grouping Network for Video Captioning".

Overview

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, and Chang D. Yoo. AAAI 2021. [arxiv]

Environment

  • Ubuntu 16.04
  • CUDA 9.2
  • cuDNN 7.4.2
  • Java 8
  • Python 2.7.12
    • PyTorch 1.1.0
    • Other python packages specified in requirements.txt

Usage

1. Setup

$ pip install -r requirements.txt

2. Prepare Data

  1. Download the GloVe Embedding from here and locate it at data/Embeddings/GloVe/GloVe_300.json (see the conversion sketch after this list if you need to build the file yourself).

  2. Extract features from the datasets and locate them at data/<Dataset>/features/<Network>.hdf5 (a quick feature-file sanity check is sketched after this list).

    e.g. ResNet101 features of the MSVD dataset will be located at data/MSVD/features/ResNet101.hdf5.

    I refer to this repo for extracting the ResNet101 features, and this repo for extracting the 3D-ResNext101 features.

  3. Split the features into train, val, and test sets by running the following commands.

    $ python -m split.MSVD
    $ python -m split.MSR-VTT
    

You can skip steps 2 and 3 and download the files below.
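
If you build GloVe_300.json yourself instead of downloading it, the snippet below is a minimal sketch of one plausible way to convert an official GloVe text file (e.g. glove.6B.300d.txt) into a word-to-vector JSON. The flat {word: [300 floats]} schema is an assumption about what the data loader expects, so verify it against the code before relying on it.

# Hypothetical GloVe txt -> JSON conversion; the {word: [300 floats]}
# schema of GloVe_300.json is an assumption, not confirmed by the repo.
import io
import json

def convert_glove_txt_to_json(txt_fpath, json_fpath, dim=300):
    embeddings = {}
    with io.open(txt_fpath, 'r', encoding='utf-8') as fin:
        for line in fin:
            tokens = line.rstrip().split(' ')
            word, values = tokens[0], tokens[1:]
            if len(values) != dim:  # skip malformed lines
                continue
            embeddings[word] = [float(v) for v in values]
    with open(json_fpath, 'w') as fout:
        json.dump(embeddings, fout)

if __name__ == '__main__':
    convert_glove_txt_to_json('glove.6B.300d.txt',
                              'data/Embeddings/GloVe/GloVe_300.json')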
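
To sanity-check the extracted features before splitting them, a quick inspection such as the one below can help. It assumes each HDF5 file maps a video ID to a per-frame (or per-clip) feature array; the exact key layout depends on your extraction script, so treat this as a sketch rather than the repository's verified format.

# Inspect a feature file, e.g. data/MSVD/features/ResNet101.hdf5.
# Assumption: one dataset per video id, shaped [num_frames, feature_dim].
import h5py

fpath = 'data/MSVD/features/ResNet101.hdf5'
with h5py.File(fpath, 'r') as fin:
    vids = list(fin.keys())
    print('number of videos: {}'.format(len(vids)))
    example_vid = vids[0]
    features = fin[example_vid][:]
    print('example video id: {}'.format(example_vid))
    print('feature shape: {}'.format(features.shape))  # e.g. (frames, 2048) for ResNet101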

3. Prepare The Code for Evaluation

Clone the evaluation code from the official coco-caption repo.

$ git clone https://github.com/tylin/coco-caption.git
$ mv coco-caption/pycocoevalcap .
$ rm -rf coco-caption
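
For reference, the sketch below shows how the cloned pycocoevalcap scorers are typically invoked on tokenized captions. The toy gts/res dictionaries are illustrative assumptions; evaluate.py wires these scorers up to the real references and predictions.

# Minimal standalone use of the coco-caption scorers (toy captions only).
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.rouge.rouge import Rouge

# id -> list of reference captions, and id -> single-element list with the generated caption
gts = {'vid1': ['a man is playing a guitar', 'a person plays the guitar']}
res = {'vid1': ['a man plays a guitar']}

for name, scorer in [('Bleu_1-4', Bleu(4)), ('CIDEr', Cider()), ('ROUGE_L', Rouge())]:
    score, _ = scorer.compute_score(gts, res)
    print('{}: {}'.format(name, score))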

4. Extract Negative Videos

$ python extract_negative_videos.py

Or you can skip this step, as the output files are already provided at data/<Dataset>/metadata/neg_vids_<split>.json.
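
If you run the extraction yourself, the short check below confirms the output was written; it only assumes the file is valid JSON, and the MSVD/train path is just an example of the pattern above.

# Sanity check that the negative-video metadata was written (example: MSVD, train split).
import json

fpath = 'data/MSVD/metadata/neg_vids_train.json'
with open(fpath) as fin:
    negative_videos = json.load(fin)
print('loaded {} entries from {}'.format(len(negative_videos), fpath))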

5. Train

$ python train.py

You can change some hyperparameters by modifying config.py.
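
The snippet below is only an illustrative sketch of the kind of fields a config.py in this style exposes; the attribute names (batch_size, lr, epochs, and so on) are assumptions, so check the actual names in config.py before editing.

# Illustrative only: these field names are assumptions about config.py,
# not its verified contents.
class TrainConfig(object):
    dataset = 'MSVD'            # or 'MSR-VTT'
    batch_size = 32
    lr = 1e-4
    epochs = 30
    ckpt_dpath = 'checkpoints'  # where trained models are saved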

Pretrained Models - SGN(R101+RN)

*Disclaimer: The models above do not have the same weights as the models used in the paper (I retrained them because the original checkpoints were lost).

6. Evaluate

$ python evaluate.py --ckpt_fpath <checkpoint filepath>

License

The source code in this repository is released under the MIT License.

Comments
  • Dimensions of ResNet101 and 3D-ResNext101 features

    Thank you for sharing your amazing work. I'd like to know more details about the ResNet101 and 3D-ResNext101 features that you used in this project.

    Regarding the 3D-ResNext101 features, the project link you provided has issues with the output dimensions, as pointed out here and here. It would be very helpful if you could share more details on how you generated the 3D-ResNext101 features (e.g., which pre-trained model you used and what the feature dimensions are). I plan to generate 3D-ResNext101 features for my own videos, and I'd like to make sure the format is correct.

    Thank you so much for your help and time.

    opened by ericwang0701 4
  • Can you kindly provide the file 'GloVe_300.json'?

    Hi Hobin, thanks for your great work and for releasing the code. When I ran train.py, I got the error IOError: [Errno 2] No such file or directory: 'data/Embeddings/GloVe/GloVe_300.json'. Could you provide this file? Thanks!

    opened by tuyunbin 1
  • MSVD_train.csv

    Greetings! Thank you for sharing your work! However, I couldn't find any file named 'MSVD_train.csv' for splitting the dataset. Specifically, in extract_negative_videos.py, line 130:

    def main(dataset, split):
        if dataset == 'MSVD':
            caption_fpath = "./data/{}/metadata/{}.csv".format(dataset, split)
            vid2captions = load_MSVD_captions(caption_fpath)
        elif dataset == 'MSR-VTT':
            caption_fpath = "./data/{}/metadata/{}.json".format(dataset, split)
            vid2captions = load_MSRVTT_captions(caption_fpath)
        else:
            raise NotImplementedError('Unknown dataset: {}'.format(dataset))

        negative_videos = extract_negative_samples(dataset, vid2captions)
        with open("data/{}/metadata/neg_vids_{}.json".format(dataset, split), 'w') as fout:
            json.dump(negative_videos, fout)

    There is no such file at caption_fpath = "./data/{}/metadata/{}.csv".format(dataset, split).

    opened by MarcusNerva 1
  • Missing files

    Congratulations on your code! There are some missing files for the MSVD dataset, namely MSR Video Description Corpus.csv, train.list, valid.list, and test.list. Could you upload them?

    opened by chatzikon 0
  • About CIDEr of MSR-VTT

    The paper mentions that the CIDEr score on the MSR-VTT dataset is only 48.5 when the AC loss is not added and reaches 49.5 after adding it, but I found that CIDEr reached 48.8 before adding it and is still 48.8 after adding it. Could you tell me which step of my experiment might be wrong?

    opened by guoliwu 0