This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation".

Overview

Prompt-Based Multi-Modal Image Segmentation

The system allows creating segmentation models without training, based on:

  • An arbitrary text query
  • Or an image with a mask highlighting stuff or an object.

Quick Start

The Quickstart.ipynb notebook provides the code for using a pre-trained CLIPSeg model. The notebook can also be run interactively on MyBinder (please note that the VM does not use a GPU, so inference takes a few seconds).
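
Before loading the pre-trained weights, it can help to confirm that the checkpoint file is the real binary and not a Git LFS pointer. When an LFS download fails during checkout, a small text file starting with "version https://git-lfs.github.com/spec/v1" is left in place of the weights, and torch.load then fails with "UnpicklingError: invalid load key, 'v'". The following is a minimal sketch of such a check; the helper name is_lfs_pointer is hypothetical and not part of this repository, and the path weights/rd16-uni.pth is the one used in the notebook.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line (Git LFS pointer format, v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file starts like a Git LFS pointer instead of real data.

    Hypothetical helper for illustration; not part of the CLIPSeg code base.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Path taken from the Quickstart notebook; adjust if the weights live elsewhere.
weights = Path("weights/rd16-uni.pth")
if not weights.exists():
    print(f"{weights} not found")
elif is_lfs_pointer(weights):
    print(f"{weights} is a Git LFS pointer; the real weights were not downloaded")
else:
    print(f"{weights} looks like a binary checkpoint ({weights.stat().st_size} bytes)")
```

If the check reports a pointer file, the weights have to be obtained again (for example by re-running the LFS download once it is available); the sketch only diagnoses the problem and does not fetch anything.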

Dependencies

This code base depends on pytorch, torchvision and clip (pip install git+https://github.com/openai/CLIP.git). Additional dependencies are hidden for double blind review.

Datasets

  • PhraseCut and PhraseCutPlus: Referring expression dataset
  • PFEPascalWrapper: Wrapper class for PFENet's Pascal-5i implementation
  • PascalZeroShot: Wrapper class for PascalZeroShot
  • COCOWrapper: Wrapper class for COCO.

Models

  • CLIPDensePredT: CLIPSeg model with transformer-based decoder.
  • ViTDensePredT: CLIPSeg model with transformer-based decoder.

Third Party Dependencies

For some of the datasets third party dependencies are required. Run the following commands in the third_party folder.

git clone https://github.com/cvlab-yonsei/JoEm
git clone https://github.com/Jia-Research-Lab/PFENet.git
git clone https://github.com/ChenyunWu/PhraseCutDataset.git
git clone https://github.com/juhongm999/hsnet.git

Weights

Training

See the experiment folder for yaml definitions of the training configurations. The training code is in experiment_setup.py.

Usage of PFENet Wrappers

In order to use the dataset and model wrappers for PFENet, the PFENet repository needs to be cloned to the root folder:

git clone https://github.com/Jia-Research-Lab/PFENet.git

Citation

@article{lueddecke21,
    title={Prompt-Based Multi-Modal Image Segmentation},
    author={Timo Lüddecke and Alexander Ecker},
    journal={arXiv preprint arXiv:2112.10003},
    year={2021}
}
Comments
  • How can we run training?

    Thank you for releasing the great code! I enjoyed the notebook a lot.

    By the way, can we train a model from scratch or fine-tune one? Although I read experiment_setup.py, I couldn't find how to run the training directly.

    opened by soskek 8
  • Increasing the resolution of the mask

    So when I run the code I get a mask that seems to be made of large 32x32 squares across the whole 512x512-pixel image.

    Is there a way to increase the resolution of the prediction and thus the mask generated?

    Thanks a lot in advance.

    opened by remybonnav 5
  • Adding CLIPSeg to HuggingFace Transformers 🤗

    Hi,

    Thanks for this awesome work. As I really liked the approach of adapting CLIP for zero and one-shot image segmentation, I implemented your model as a branch of 🤗 Transformers.

    The model is soon going to be added to the main library (see https://github.com/huggingface/transformers/pull/20066). Here's a Colab notebook to showcase usage: https://colab.research.google.com/drive/1ijnW67ac6bMnda4D_XkdUbZfilyVZDOh?usp=sharing.

    Would you like to create an organization on the hub, under which the checkpoints can be hosted?

    Currently I host them under my own username: https://huggingface.co/models?other=clipseg.

    Thanks!

    Niels, ML Engineer @ HF

    opened by NielsRogge 4
  • Unable to download weights

    When I run "! git clone https://github.com/timojl/clipseg" in Google Colab, I get this error:

    Error downloading object: weights/rd16-uni.pth (61545cd): Smudge error: Error downloading weights/rd16-uni.pth (61545cdb3a28f99d33d457c64a9721ade835a9dfbda604c459de6831c504167a): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

    Here is the full error:

    Cloning into 'clipseg'...
    remote: Enumerating objects: 168, done.
    remote: Counting objects: 100% (77/77), done.
    remote: Compressing objects: 100% (61/61), done.
    remote: Total 168 (delta 36), reused 39 (delta 16), pack-reused 91
    Receiving objects: 100% (168/168), 1.21 MiB | 4.54 MiB/s, done.
    Resolving deltas: 100% (77/77), done.
    Downloading weights/rd16-uni.pth (1.1 MB)
    Error downloading object: weights/rd16-uni.pth (61545cd): Smudge error: Error downloading weights/rd16-uni.pth (61545cdb3a28f99d33d457c64a9721ade835a9dfbda604c459de6831c504167a): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

    Errors logged to /content/clipseg/clipseg/clipseg/.git/lfs/objects/logs/20220922T222731.926515854.log
    Use git lfs logs last to view the log.
    error: external filter 'git-lfs filter-process' failed
    fatal: weights/rd16-uni.pth: smudge filter lfs failed
    warning: Clone succeeded, but checkout failed.
    You can inspect what was checked out with 'git status'
    and retry the checkout with 'git checkout -f HEAD'

    opened by Jetpackjules 2
  • UnpicklingError: invalid load key, 'v'.

    I was running the Quickstart notebook locally, but I got the following error.

    
    ---> model.load_state_dict(torch.load('weights/rd16-uni.pth', map_location=torch.device('cpu')), strict=False)
    
    File /scratch/miniconda3/envs/clipseg-environment/lib/python3.10/site-packages/torch/serialization.py:713, in load(f, map_location, pickle_module, **pickle_load_args)
        711             return torch.jit.load(opened_file)
        712         return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    --> 713 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    
    File /scratch/miniconda3/envs/clipseg-environment/lib/python3.10/site-packages/torch/serialization.py:920, in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
        914 if not hasattr(f, 'readinto') and (3, 8, 0) <= sys.version_info < (3, 8, 2):
        915     raise RuntimeError(
        916         "torch.load does not work with file-like objects that do not implement readinto on Python 3.8.0 and 3.8.1. "
        917         f"Received object of type \"{type(f)}\". Please update to Python 3.8.2 or newer to restore this "
        918         "functionality.")
    --> 920 magic_number = pickle_module.load(f, **pickle_load_args)
        921 if magic_number != MAGIC_NUMBER:
        922     raise RuntimeError("Invalid magic number; corrupt file?")
    
    UnpicklingError: invalid load key, 'v'.
    
    opened by animesh-007 2
  • Import error in pfe_dataset.py

    Line 5 of pfe_dataset.py reads "from datasets.lvis_oneshot3 import blend_image_segmentation", but I cannot find the lvis_oneshot3.py file. Also, how can I run the zero-shot experiment? Thanks!

    opened by hwanyu112 1
  • image with mask prompt for one shot

    Hi, great job! I really love this CLIP-based segmentation. I am not so familiar with the code yet; I am just wondering how I can use an image as a prompt?

    opened by raymond1123 1
  • import error in clipseg.py

    On line 21 of models/clipseg.py, I get an import error for "from models.clip_prompts import imagenet_templates". There is no file or function called clip_prompts in the models directory. Thank you very much!

    opened by zyyyz 1
  • import error

    When I was running the code, something wrong happened in models/clipseg.py (line 10 and line 17). Please correct the file import error. Thank you for your excellent work!

    opened by zyyyz 1
  • Is there a way to set a seed?

    Trying to get the same output consistently for video frame-by-frame use. Can't find any sort of parameter/argument/function to call to set a custom seed...

    opened by Jetpackjules 1
  • Add setup.py and .gitignore.

    Thanks for the awesome project! I needed to create a setup.py so that downstream projects can automatically install this package. Sharing a PR in case helpful. Also added .gitignore because pip puts the egg in the directory.

    opened by stephenbach 1
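
Regarding the question above about setting a seed: the source does not state where any run-to-run variation comes from, so the following is only a generic sketch of seeding the random number generators that PyTorch code commonly touches. The function name seed_everything is hypothetical and not part of this repository; NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy and PyTorch RNGs (the latter two only if available).

    Hypothetical helper for illustration; not part of the CLIPSeg code base.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU generator
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)  # seeds every visible GPU
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

Calling the helper once before processing each frame would reset the generators to the same state every time. Whether that is enough to make the predictions identical depends on the model and hardware, which this sketch does not address.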
Owner
Timo Lüddecke
Postdoc @ecker-lab