Learning to Generate Grounded Visual Captions without Localization Supervision
This is the PyTorch implementation of our paper:
Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020
10-min YouTube Video
How to start
Clone the repo recursively:
git clone --recursive [email protected]:chihyaoma/cyclical-visual-captioning.git
If you didn't clone with the --recursive flag, you'll need to manually initialize and fetch the pybind submodule from the top-level directory:
git submodule update --init --recursive
Installation
The proposed cyclical method can be applied directly to image and video captioning tasks.
Currently, the installation guide and our code for video captioning on the ActivityNet-Entities dataset are provided in the anet-video-captioning directory.
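For example, assuming the repository was cloned into the default cyclical-visual-captioning directory as shown above, the video-captioning code and its setup instructions can be reached with:
cd cyclical-visual-captioning/anet-video-captioning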
Acknowledgments
Chih-Yao Ma and Zsolt Kira were partly supported by DARPA’s Lifelong Learning Machines (L2M) program, under Cooperative Agreement HR0011-18-2-0019, as part of their affiliation with Georgia Tech. We thank Chia-Jung Hsu for her valuable and artistic help with the figures.
Citation
If you find this repository useful, please cite our paper:
@inproceedings{ma2020learning,
  title={Learning to Generate Grounded Visual Captions without Localization Supervision},
  author={Ma, Chih-Yao and Kalantidis, Yannis and AlRegib, Ghassan and Vajda, Peter and Rohrbach, Marcus and Kira, Zsolt},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020},
  url={https://arxiv.org/abs/1906.00283},
}