PyTorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Overview

A PyTorch implementation of the paper "Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic" (ZeroCap). The method steers a pretrained GPT-2 language model with CLIP guidance at inference time, so captioning, visual-semantic arithmetic, real-world knowledge queries, and OCR-style descriptions require no additional training.

[Paper] [Colab is coming soon]

Approach

Example

Usage

To run captioning on a single image:

$ python run.py \
    --reset_context_delta \
    --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg"

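Conceptually, the captioner relies on CLIP's ability to score how well a piece of text matches an image; the repository's decoding loop steers a language model with that signal. The snippet below is only a sketch of the CLIP matching step, not the repository's generation code: the ViT-B/32 backbone and the candidate captions are illustrative assumptions.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode the example image once.
image_path = "example_images/captions/COCO_val2014_000000097017.jpg"
image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)

# Illustrative candidate captions; the real model generates text token by token.
candidates = ["a truck parked on the side of a road", "a plate of food on a table"]
tokens = clip.tokenize(candidates).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(tokens)

# Cosine similarity between the image and each candidate caption.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
print((image_features @ text_features.T).squeeze(0))
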
To run the model on visual arithmetic:

$ python run.py \
    --reset_context_delta \
    --end_factor 1.06 \
    --fusion_factor 0.95 \
    --grad_norm_factor 0.95 \
    --run_type arithmetics \
    --arithmetics_imgs "example_images/arithmetics/woman2.jpg" "example_images/arithmetics/king2.jpg" "example_images/arithmetics/man2.jpg" \
    --arithmetics_weights 1 1 -1

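What the arithmetic mode does, conceptually: the CLIP embeddings of the input images are combined with the given weights ("woman" + "king" - "man"), and a caption is generated for the resulting vector. The sketch below shows only that combination step; whether the repository normalizes the embeddings exactly this way is an assumption, so treat it as illustrative.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

paths = ["example_images/arithmetics/woman2.jpg",
         "example_images/arithmetics/king2.jpg",
         "example_images/arithmetics/man2.jpg"]
weights = [1.0, 1.0, -1.0]  # mirrors --arithmetics_weights 1 1 -1

with torch.no_grad():
    feats = []
    for path in paths:
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        feat = model.encode_image(image)
        feats.append(feat / feat.norm(dim=-1, keepdim=True))

# Weighted combination of the normalized image embeddings.
target = sum(w * f for w, f in zip(weights, feats))
target = target / target.norm(dim=-1, keepdim=True)  # vector the caption is generated for
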
To run the model on real-world knowledge:

$ python run.py \
    --reset_context_delta \
    --cond_text "Image of" \
    --end_factor 1.04 \
    --caption_img_path "example_images/real_world/simpsons.jpg"

To run the model on OCR:

$ python run.py \
    --reset_context_delta \
    --cond_text "Image of text that says" \
    --end_factor 1.04 \
    --caption_img_path "example_images/OCR/welcome_sign.jpg"
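
In the two commands above, --cond_text is the textual prefix the language model is asked to continue, and the CLIP guidance then pushes that continuation toward the image. As a point of reference only, the stand-alone sketch below shows plain prefix-conditioned GPT-2 decoding with Hugging Face transformers, without any CLIP guidance; the "gpt2" checkpoint is an assumption and may differ from the model variant the repository loads.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prefix = "Image of text that says"  # the --cond_text prefix
input_ids = tokenizer(prefix, return_tensors="pt").input_ids

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=15, do_sample=False,
                            pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(output[0], skip_special_tokens=True))
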
Comments
  • Add Docker environment & web demo

    Hey @YoadTew ! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! We have implemented the captioning model at the moment. View it here: https://replicate.com/yoadtew/zero-shot-image-to-text

    We noticed you have signed up for Replicate, so you can add examples and customise the Example gallery as you like and push any future update to the web demo.

    opened by chenxwh 4
  • Should I additionally set end_factor to 1.04 in the command and set the variable self.ef_idx to 3 in class CLIPTextGenerator to reproduce the image-captioning results?

    In README.md, the command given for image captioning is: $ python run.py --reset_context_delta --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg". However, the paper says that the end factor is 1.04 and the time step is 3. To reproduce the image-captioning results, should I additionally set end_factor to 1.04 in the command and set the variable self.ef_idx to 3 in class CLIPTextGenerator?

    opened by baiyuting 0
  • Any evaluation script provided to quickly calculate the metrics in this paper?

    I'd like to reproduce the image-captioning results of this paper. Is there an evaluation script that could be provided to quickly calculate these metrics? And if I want to reproduce the image-captioning results, are there any hyper-parameters or settings in the code that I need to pay attention to?

    opened by baiyuting 0
  • About Eq. (5) in the paper

    In the paper, Eq. (5) has $\alpha = 0.3$ and the norm of the gradients has a factor of 2. But in https://github.com/YoadTew/zero-shot-image-to-text/blob/main/model/ZeroCLIP.py, stepsize = 0.3 while grad_norm_factor = 0.9. Did I misunderstand something? (A schematic of this kind of scaled gradient step is sketched just after the comments list.)

    opened by 232525 0
  • Metrics Computations: B@1 = BLEU-1, M = METEOR, C = CIDEr, S = SPICE.

    Hi, thank you very much for sharing your excellent work. I have the following queries, if you can spare the time to answer them:

    1. I have reproduced the captions for the given sample images following the steps in the repository.
    2. Now I need to investigate the evaluation metrics, so I want to know where in the code the metrics above (BLEU, METEOR, CIDEr, SPICE) and the diversity metrics are computed.
    3. I have gone through the paper, and it seems that no training is needed. Am I right, or am I missing something? Sorry for the trivial questions.

    Regards, and thank you in anticipation.

    opened by aliman80 0
  • UnboundLocalError - 'context' referenced before assignment

    In ZeroCLIP.py, in the function get_next_probs, the variable context is created inside an if statement but is then referenced on line 228. So if the condition on line 220 is False, an UnboundLocalError is raised. (A minimal illustration of this failure mode is sketched just after the comments list.)

    opened by daurmur 3
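
Regarding the Eq. (5) question above: the quantities being compared describe a gradient step whose size is divided by the gradient norm raised to some power. The schematic below only illustrates that kind of update, reusing the CLI flag names; it is not a transcription of Eq. (5) or of ZeroCLIP.py, so the exact correspondence should be checked against the paper and the code.

import torch

def scaled_grad_step(context, grad, stepsize=0.3, grad_norm_factor=0.9):
    # Step size divided by ||grad|| ** grad_norm_factor. With a factor of 1.0 the
    # step would be fully norm-normalized; smaller factors keep some dependence
    # on the raw gradient magnitude.
    norm = grad.norm() + 1e-8  # guard against a zero gradient
    return context + stepsize * grad / (norm ** grad_norm_factor)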
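
On the UnboundLocalError report above: this is the standard Python pitfall of assigning a name only inside one branch of an if statement and then reading it unconditionally. A minimal reproduction and a defensive fix, using placeholder names rather than the repository's actual logic:

def get_probs(use_context):
    if use_context:
        context = "cached state"  # only assigned on this branch
    return context                # UnboundLocalError when use_context is False

def get_probs_fixed(use_context):
    context = None                # give the name a value on every path
    if use_context:
        context = "cached state"
    return context
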
Official PyTorch Implementation of the "Semantic Diversity Learning for Zero-Shot Multi-label Classification" (2021) paper

Semantic Diversity Learning for Zero-Shot Multi-label Classification Paper Official PyTorch Implementation Avi Ben-Cohen, Nadav Zamir, Emanuel Ben Bar

null 28 Aug 29, 2022
An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

CV Lab @ Yonsei University 35 Oct 26, 2022
A PyTorch Implementation of "Neural Arithmetic Logic Units"

Neural Arithmetic Logic Units [WIP] This is a PyTorch implementation of Neural Arithmetic Logic Units by Andrew Trask, Felix Hill, Scott Reed, Jack Ra

Kevin Zakka 181 Nov 18, 2022
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong1*  Mingyuan Zhang1*  Liang Pan1  Zhongang Cai1,2,3  Lei Yang2 

Fangzhou Hong 749 Jan 4, 2023
[CVPR 2021] Released code for Counterfactual Zero-Shot and Open-Set Visual Recognition

Counterfactual Zero-Shot and Open-Set Visual Recognition This project provides implementations for our CVPR 2021 paper Counterfactual Zero-S

null 144 Dec 24, 2022
Public repository of the 3DV 2021 paper "Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds"

Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds Björn Michele1), Alexandre Boulch1), Gilles Puy1), Maxime Bucher1) and Rena

valeo.ai 15 Dec 22, 2022
Zsseg.baseline - Zero-Shot Semantic Segmentation

This repo is for our paper A Simple Baseline for Zero-shot Semantic Segmentation

null 98 Dec 20, 2022
Codes for ACL-IJCNLP 2021 Paper "Zero-shot Fact Verification by Claim Generation"

Zero-shot-Fact-Verification-by-Claim-Generation This repository contains code and models for the paper: Zero-shot Fact Verification by Claim Generatio

Liangming Pan 47 Jan 1, 2023
ZeroGen: Efficient Zero-shot Learning via Dataset Generation

ZEROGEN This repository contains the code for our paper “ZeroGen: Efficient Zero

Jiacheng Ye 31 Dec 30, 2022
Image-generation-baseline - MUGE Text To Image Generation Baseline

MUGE Text To Image Generation Baseline Requirements and Installation More detail

null 23 Oct 17, 2022
PyTorch implementation of 1712.06087 "Zero-Shot" Super-Resolution using Deep Internal Learning

Unofficial PyTorch implementation of "Zero-Shot" Super-Resolution using Deep Internal Learning Unofficial Implementation of 1712.06087 "Zero-Shot" Sup

Jacob Gildenblat 196 Nov 27, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks

GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks This repository implements a capsule model Inten

Joel Huang 15 Dec 24, 2022
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frede

Edresson Casanova 92 Dec 9, 2022
Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"

Zero-Shot Information Extraction as a Unified Text-to-Triple Translation Source code repo for paper Zero-Shot Information Extraction as a Unified Text

cgraywang 88 Dec 31, 2022
Pytorch implementation of few-shot semantic image synthesis

Few-shot Semantic Image Synthesis Using StyleGAN Prior Our method can synthesize photorealistic images from dense or sparse semantic annotations using

null 40 Sep 26, 2022
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Keren Ye 35 Nov 20, 2022