Official PyTorch implementation of the AAAI 2021 paper "Semantic Grouping Network for Video Captioning".

Overview

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, and Chang D. Yoo. AAAI 2021. [arxiv]

Environment

  • Ubuntu 16.04
  • CUDA 9.2
  • cuDNN 7.4.2
  • Java 8
  • Python 2.7.12
    • PyTorch 1.1.0
    • Other python packages specified in requirements.txt

Usage

1. Setup

$ pip install -r requirements.txt

2. Prepare Data

  1. Download the GloVe Embedding from here and locate it at data/Embeddings/GloVe/GloVe_300.json (see the conversion sketch after this list if you need to build the file yourself).

  2. Extract features from the datasets and locate them at data/<Dataset>/features/<Network>.hdf5 (a quick feature-file sanity check is sketched after this list).

    e.g. ResNet101 features of the MSVD dataset will be located at data/MSVD/features/ResNet101.hdf5.

    I refer to this repo for extracting the ResNet101 features, and this repo for extracting the 3D-ResNext101 features.

  3. Split the features into train, val, and test sets by running the following commands.

    $ python -m split.MSVD
    $ python -m split.MSR-VTT
    

You can skip steps 2 and 3 and download the files below.
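
If you build GloVe_300.json yourself instead of downloading it, the snippet below is a minimal sketch of one plausible way to convert an official GloVe text file (e.g. glove.6B.300d.txt) into a word-to-vector JSON. The flat {word: [300 floats]} schema is an assumption about what the data loader expects, so verify it against the code before relying on it.

# Hypothetical GloVe txt -> JSON conversion; the {word: [300 floats]}
# schema of GloVe_300.json is an assumption, not confirmed by the repo.
import io
import json

def convert_glove_txt_to_json(txt_fpath, json_fpath, dim=300):
    embeddings = {}
    with io.open(txt_fpath, 'r', encoding='utf-8') as fin:
        for line in fin:
            tokens = line.rstrip().split(' ')
            word, values = tokens[0], tokens[1:]
            if len(values) != dim:  # skip malformed lines
                continue
            embeddings[word] = [float(v) for v in values]
    with open(json_fpath, 'w') as fout:
        json.dump(embeddings, fout)

if __name__ == '__main__':
    convert_glove_txt_to_json('glove.6B.300d.txt',
                              'data/Embeddings/GloVe/GloVe_300.json')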
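
To sanity-check the extracted features before splitting them, a quick inspection such as the one below can help. It assumes each HDF5 file maps a video ID to a per-frame (or per-clip) feature array; the exact key layout depends on your extraction script, so treat this as a sketch rather than the repository's verified format.

# Inspect a feature file, e.g. data/MSVD/features/ResNet101.hdf5.
# Assumption: one dataset per video id, shaped [num_frames, feature_dim].
import h5py

fpath = 'data/MSVD/features/ResNet101.hdf5'
with h5py.File(fpath, 'r') as fin:
    vids = list(fin.keys())
    print('number of videos: {}'.format(len(vids)))
    example_vid = vids[0]
    features = fin[example_vid][:]
    print('example video id: {}'.format(example_vid))
    print('feature shape: {}'.format(features.shape))  # e.g. (frames, 2048) for ResNet101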

3. Prepare The Code for Evaluation

Clone the evaluation code from the official coco-caption repo.

$ git clone https://github.com/tylin/coco-caption.git
$ mv coco-caption/pycocoevalcap .
$ rm -rf coco-caption
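
For reference, the sketch below shows how the cloned pycocoevalcap scorers are typically invoked on tokenized captions. The toy gts/res dictionaries are illustrative assumptions; evaluate.py wires these scorers up to the real references and predictions.

# Minimal standalone use of the coco-caption scorers (toy captions only).
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.rouge.rouge import Rouge

# id -> list of reference captions, and id -> single-element list with the generated caption
gts = {'vid1': ['a man is playing a guitar', 'a person plays the guitar']}
res = {'vid1': ['a man plays a guitar']}

for name, scorer in [('Bleu_1-4', Bleu(4)), ('CIDEr', Cider()), ('ROUGE_L', Rouge())]:
    score, _ = scorer.compute_score(gts, res)
    print('{}: {}'.format(name, score))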

4. Extract Negative Videos

$ python extract_negative_videos.py

Or you can skip this step, as the output files are already provided at data/<Dataset>/metadata/neg_vids_<split>.json.
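
If you run the extraction yourself, the short check below confirms the output was written; it only assumes the file is valid JSON, and the MSVD/train path is just an example of the pattern above.

# Sanity check that the negative-video metadata was written (example: MSVD, train split).
import json

fpath = 'data/MSVD/metadata/neg_vids_train.json'
with open(fpath) as fin:
    negative_videos = json.load(fin)
print('loaded {} entries from {}'.format(len(negative_videos), fpath))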

5. Train

$ python train.py

You can change some hyperparameters by modifying config.py.
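
The snippet below is only an illustrative sketch of the kind of fields a config.py in this style exposes; the attribute names (batch_size, lr, epochs, and so on) are assumptions, so check the actual names in config.py before editing.

# Illustrative only: these field names are assumptions about config.py,
# not its verified contents.
class TrainConfig(object):
    dataset = 'MSVD'            # or 'MSR-VTT'
    batch_size = 32
    lr = 1e-4
    epochs = 30
    ckpt_dpath = 'checkpoints'  # where trained models are saved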

Pretrained Models - SGN(R101+RN)

*Disclaimer: The models above do not have the same weights as the models used in the paper (I retrained them because the original checkpoints were lost).

6. Evaluate

$ python evaluate.py --ckpt_fpath <checkpoint filepath>

License

The source code in this repository is released under the MIT License.

Comments
  • Dimensions of ResNet101 and 3D-ResNext101 features

    Thank you for sharing your amazing work. I'd like to know more details about the ResNet101 and 3D-ResNext101 features that you used in this project.

    Regarding the 3D-ResNext101 features, the project link you provided has issues with the output dimensions, as pointed out here and here. It would be very helpful if you could share more details on how you generated the 3D-ResNext101 features (e.g., which pre-trained model you used and what the feature dimensions are). I plan to generate 3D-ResNext101 features for my own videos, and I'd like to make sure the format is correct.

    Thank you so much for your help and time.

    opened by ericwang0701 4
  • Can you kindly provide the file 'GloVe_300.json'?

    Hi Hobin, thanks for your great work and for releasing the code. When I ran train.py, I got the error IOError: [Errno 2] No such file or directory: 'data/Embeddings/GloVe/GloVe_300.json'. Could you provide this file? Thanks!

    opened by tuyunbin 1
  • MSVD_train.csv

    Greetings! Thank you for sharing your work! However, I couldn't find any file named 'MSVD_train.csv' for splitting the dataset. Specifically, in extract_negative_videos.py, line 130:

    def main(dataset, split):
        if dataset == 'MSVD':
            caption_fpath = "./data/{}/metadata/{}.csv".format(dataset, split)
            vid2captions = load_MSVD_captions(caption_fpath)
        elif dataset == 'MSR-VTT':
            caption_fpath = "./data/{}/metadata/{}.json".format(dataset, split)
            vid2captions = load_MSRVTT_captions(caption_fpath)
        else:
            raise NotImplementedError('Unknown dataset: {}'.format(dataset))

        negative_videos = extract_negative_samples(dataset, vid2captions)
        with open("data/{}/metadata/neg_vids_{}.json".format(dataset, split), 'w') as fout:
            json.dump(negative_videos, fout)

    There is no such file at caption_fpath = "./data/{}/metadata/{}.csv".format(dataset, split).

    opened by MarcusNerva 1
  • Missing files

    Congratulations on your code! There are some missing files for the MSVD dataset, namely MSR Video Description Corpus.csv, train.list, valid.list, and test.list. Could you upload them?

    opened by chatzikon 0
  • About CIDEr of MSR-VTT

    The paper mentions that the CIDEr score on the MSR-VTT dataset is only 48.5 when the AC loss is not added and reaches 49.5 after adding it, but I found that CIDEr reached 48.8 before adding it and is still 48.8 after adding it. Could you tell me which step of my experiment might be wrong?

    opened by guoliwu 0