PyTorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Last update: Jan 3, 2023

Overview

PyTorch implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic.

[Paper] [Colab is coming soon]

Usage

To run captioning on a single image:

$ python run.py \
    --reset_context_delta \
    --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg"

To run the model on visual arithmetic:

$ python run.py \
    --reset_context_delta \
    --end_factor 1.06 \
    --fusion_factor 0.95 \
    --grad_norm_factor 0.95 \
    --run_type arithmetics \
    --arithmetics_imgs "example_images/arithmetics/woman2.jpg" "example_images/arithmetics/king2.jpg" "example_images/arithmetics/man2.jpg" \
    --arithmetics_weights 1 1 -1
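
The three weights pair up with the three images in order (woman + king - man). As a rough illustration of what such a weighted combination can look like, here is a minimal sketch that assumes each image is represented by an embedding vector, that each embedding is unit-normalized before weighting, and that the weighted sum is normalized again. The function names and the toy vectors are hypothetical and are not taken from the repository.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def combine_embeddings(embeddings, weights):
    """Weighted sum of unit-normalized embeddings, re-normalized to unit length."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    total = [0.0] * dim
    for emb, weight in zip(embeddings, weights):
        unit = normalize(emb)
        for i in range(dim):
            total[i] += weight * unit[i]
    return normalize(total)


# Toy 3-dimensional stand-ins for image embeddings (hypothetical values).
woman = [1.0, 0.0, 1.0]
king = [0.0, 1.0, 1.0]
man = [0.0, 0.0, 1.0]

# Same order and signs as "--arithmetics_weights 1 1 -1".
target = combine_embeddings([woman, king, man], [1, 1, -1])
print(target)
```

In this toy setting the shared third component is partly cancelled by the negative weight, so the result leans toward what the first two vectors do not share with the third.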

To run the model on real-world knowledge:

$ python run.py \
    --reset_context_delta \
    --cond_text "Image of" \
    --end_factor 1.04 \
    --caption_img_path "example_images/real_world/simpsons.jpg"

To run the model on OCR:

$ python run.py \
    --reset_context_delta \
    --cond_text "Image of text that says" \
    --end_factor 1.04 \
    --caption_img_path "example_images/OCR/welcome_sign.jpg"
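
The captioning, real-world knowledge, and OCR commands differ only in the conditioning text and the end factor. The sketch below shows one way to assemble them programmatically. The helper `build_command` is hypothetical (it is not part of the repository) and uses only the flags shown in the commands above.

```python
import shlex


def build_command(image_path, cond_text=None, end_factor=None):
    """Return the argument list for one captioning run of run.py."""
    argv = ["python", "run.py", "--reset_context_delta"]
    if cond_text is not None:
        argv += ["--cond_text", cond_text]
    if end_factor is not None:
        argv += ["--end_factor", str(end_factor)]
    argv += ["--caption_img_path", image_path]
    return argv


ocr = build_command(
    "example_images/OCR/welcome_sign.jpg",
    cond_text="Image of text that says",
    end_factor=1.04,
)
# shlex.join quotes arguments that contain spaces, giving a copy-pasteable line.
print(shlex.join(ocr))
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the repository root, which avoids shell quoting issues with the conditioning text.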

Comments

Add Docker environment & web demo

Hey @YoadTew ! 👋

This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

This also means we can make a web page where other people can try out your model! We have implemented the captioning model at the moment. View it here: https://replicate.com/yoadtew/zero-shot-image-to-text

We noticed you have signed up for Replicate, so you can add examples and customise the Example gallery as you like and push any future update to the web demo.

opened by chenxwh 4
Should I additionally set end_factor to 1.04 in the command and set the variable self.ef_idx to 3 in class CLIPTextGenerator to reproduce image caption results?

In readme.md, the command for image captioning is $ python run.py --reset_context_delta --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg". However, the paper says that the end_factor is 1.04 and the time-step is 3. To reproduce the image captioning results, should I additionally set end_factor to 1.04 in the command and set the variable self.ef_idx to 3 in class CLIPTextGenerator?

opened by baiyuting 0
Any evaluate script provided to quickly calculate the metrics in this paper?

I'd like to reproduce the image captioning results in this paper. Is there an evaluation script that can be provided to quickly calculate the metrics reported in the paper? And if I want to reproduce the image captioning results, are there any hyper-parameters or settings in the code that I need to pay attention to?

opened by baiyuting 0
About Eq.(5) in paper

In the paper, $\alpha = 0.3$ and the norm of the gradients has a factor of 2. But in https://github.com/YoadTew/zero-shot-image-to-text/blob/main/model/ZeroCLIP.py, stepsize = 0.3 but grad_norm_factor = 0.9. Did I misunderstand something?

opened by 232525 0
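
To make the question above concrete, here is a minimal sketch of a norm-scaled gradient step. It assumes the update has the form alpha * g / ||g||^k and that the "factor" in the question is the exponent k. Whether this matches Eq. (5) or what ZeroCLIP.py actually computes is not confirmed by this page; the function name `scaled_step` and the `eps` term are hypothetical.

```python
import math


def scaled_step(grad, alpha, exponent, eps=1e-15):
    """Return alpha * grad / (||grad|| ** exponent) for a flat gradient vector."""
    norm = math.sqrt(sum(g * g for g in grad)) + eps  # eps guards against a zero norm
    scale = alpha / norm ** exponent
    return [scale * g for g in grad]


grad = [3.0, 4.0]  # toy gradient with L2 norm 5

for k in (2.0, 0.9):
    step = scaled_step(grad, alpha=0.3, exponent=k)
    step_norm = math.sqrt(sum(s * s for s in step))
    print(f"exponent={k}: step norm = {step_norm:.4f}")
```

Under this form the step length equals alpha * ||g||^(1 - k), so an exponent of 2 shrinks steps for large gradients while an exponent below 1 lets them grow slowly with the gradient norm.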
Metrics Computations : B@1 = BLEU-1, M = METEOR, C = CIDEr, S = SPICE.

Hi, thank you very much for sharing your excellent work. I have the following queries, if you can spare the time to answer them.

1. I have reproduced the captions of the given sample images by following the steps in the repository.
2. Now I need to investigate the evaluation metrics, so I want to know where in the code you compute the above metrics (BLEU, METEOR, CIDEr, and SPICE) as well as the diversity metrics.
3. I have gone through the paper, and it seems we don't need any training. Am I right, or am I missing something?

Sorry for the trivial questions.

Regards and thanking you in anticipation

opened by aliman80 0
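
For readers unfamiliar with the metric names above, the following is a minimal sketch of sentence-level BLEU-1 (clipped unigram precision times a brevity penalty) against a single reference. It is not the paper's evaluation script: the whitespace tokenization, the single reference, and the function name `bleu1` are simplifying assumptions, and published captioning scores are normally computed over a whole corpus with several references per image.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1 for one candidate against one reference string."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate token count by how often it occurs in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref) / len(cand))
    return brevity_penalty * precision


print(bleu1("a man riding a wave", "a man riding a wave on a surfboard"))
```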
UnboundLocalError - 'context' reference before assignment

In ZeroCLIP.py, in the function get_next_probs, the variable context is created inside an if statement but is then referenced on line 228. So if the condition on line 220 is False, an UnboundLocalError is raised.

opened by daurmur 3
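
The pattern described in this issue can be reproduced in isolation. The sketch below is not the repository's code: the function names, arguments, and condition are hypothetical stand-ins that only mirror the shape of the bug (a variable assigned inside an `if` and read after it), together with one possible fix.

```python
def next_probs_buggy(step, tokens):
    """Assigns `context` only on one branch, then reads it unconditionally."""
    if step > 0:
        context = tokens[:-1]
    return len(context)  # fails when the branch above was skipped


def next_probs_fixed(step, tokens):
    """Gives `context` a default before the branch so it is always bound."""
    context = None
    if step > 0:
        context = tokens[:-1]
    return 0 if context is None else len(context)


try:
    next_probs_buggy(0, [1, 2, 3])
except UnboundLocalError as err:
    print("buggy version failed:", err)

print("fixed version returned:", next_probs_fixed(0, [1, 2, 3]))
```

Which default is appropriate (None, an empty sequence, or an explicit error) depends on what the caller expects when the condition is false.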

Official Pytorch Implementation of: "Semantic Diversity Learning for Zero-Shot Multi-label Classification"(2021) paper

Semantic Diversity Learning for Zero-Shot Multi-label Classification Paper Official PyTorch Implementation Avi Ben-Cohen, Nadav Zamir, Emanuel Ben Bar

28 Aug 29, 2022

An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

35 Oct 26, 2022

A PyTorch Implementation of "Neural Arithmetic Logic Units"

Neural Arithmetic Logic Units [WIP] This is a PyTorch implementation of Neural Arithmetic Logic Units by Andrew Trask, Felix Hill, Scott Reed, Jack Ra

181 Nov 18, 2022

[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong1* Mingyuan Zhang1* Liang Pan1 Zhongang Cai1,2,3 Lei Yang2

749 Jan 4, 2023

[CVPR 2021] Released code for Counterfactual Zero-Shot and Open-Set Visual Recognition

Counterfactual Zero-Shot and Open-Set Visual Recognition This project provides implementations for our CVPR 2021 paper Counterfactual Zero-S

144 Dec 24, 2022

Public repository of the 3DV 2021 paper "Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds"

Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds Björn Michele1), Alexandre Boulch1), Gilles Puy1), Maxime Bucher1) and Rena

15 Dec 22, 2022

Zsseg.baseline - Zero-Shot Semantic Segmentation

This repo is for our paper A Simple Baseline for Zero-shot Semantic Segmentation

98 Dec 20, 2022

Codes for ACL-IJCNLP 2021 Paper "Zero-shot Fact Verification by Claim Generation"

Zero-shot-Fact-Verification-by-Claim-Generation This repository contains code and models for the paper: Zero-shot Fact Verification by Claim Generatio

47 Jan 1, 2023

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

ZEROGEN This repository contains the code for our paper “ZeroGen: Efficient Zero

31 Dec 30, 2022

Siamese-nn-semantic-text-similarity - A repository containing comprehensive Neural Networks based PyTorch implementations for the semantic text similarity task

Siamese Deep Neural Networks for Semantic Text Similarity PyTorch A repository c

32 Dec 15, 2022

Image-generation-baseline - MUGE Text To Image Generation Baseline

MUGE Text To Image Generation Baseline Requirements and Installation More detail

23 Oct 17, 2022

PyTorch implementation of 1712.06087 "Zero-Shot" Super-Resolution using Deep Internal Learning

Unofficial PyTorch implementation of "Zero-Shot" Super-Resolution using Deep Internal Learning Unofficial Implementation of 1712.06087 "Zero-Shot" Sup

196 Nov 27, 2022

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

54 Nov 21, 2022

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022
