This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation".

Timo Lüddecke

Last update: Dec 30, 2022

Overview

Prompt-Based Multi-Modal Image Segmentation

The system makes it possible to create segmentation models without training, based on:

An arbitrary text query
Or an image with a mask highlighting stuff or an object.

Quick Start

The Quickstart.ipynb notebook provides the code for using a pre-trained CLIPSeg model. The notebook can also be run interactively on MyBinder (please note that the VM does not have a GPU, so inference takes a few seconds).

Dependencies

This code base depends on pytorch, torchvision and clip (pip install git+https://github.com/openai/CLIP.git). Additional dependencies are hidden for double blind review.
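
Before opening the notebook, it can be useful to confirm that the listed dependencies are visible to the Python interpreter. The following is a minimal sketch, not part of the repository: the helper name find_missing_modules is hypothetical, and it assumes the import names torch, torchvision and clip (another installed package that also uses the import name clip would pass this check as well).

```python
import importlib.util

# Import names assumed for the dependencies listed above.
# PyTorch is imported as "torch"; the OpenAI CLIP package as "clip".
REQUIRED_MODULES = ("torch", "torchvision", "clip")


def find_missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks up the modules without importing them, so it runs quickly even when PyTorch is installed.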

Datasets

PhraseCut and PhraseCutPlus: Referring expression dataset
PFEPascalWrapper: Wrapper class for PFENet's Pascal-5i implementation
PascalZeroShot: Wrapper class for PascalZeroShot
COCOWrapper: Wrapper class for COCO.

Models

CLIPDensePredT: CLIPSeg model with transformer-based decoder.
ViTDensePredT: CLIPSeg model with transformer-based decoder.

Third Party Dependencies

For some of the datasets third party dependencies are required. Run the following commands in the third_party folder.

git clone https://github.com/cvlab-yonsei/JoEm
git clone https://github.com/Jia-Research-Lab/PFENet.git
git clone https://github.com/ChenyunWu/PhraseCutDataset.git
git clone https://github.com/juhongm999/hsnet.git

Weights

CLIPSeg-D64 (4.1MB, without CLIP weights)
CLIPSeg-D16 (1.1MB, without CLIP weights)

Training

See the experiment folder for yaml definitions of the training configurations. The training code is in experiment_setup.py.

Usage of PFENet Wrappers

In order to use the dataset and model wrappers for PFENet, the PFENet repository needs to be cloned to the root folder:

git clone https://github.com/Jia-Research-Lab/PFENet.git

Citation

@article{lueddecke21,
    title={Prompt-Based Multi-Modal Image Segmentation},
    author={Timo Lüddecke and Alexander Ecker},
    journal={arXiv preprint arXiv:2112.10003},
    year={2021}
}

Comments

How can we run training?

Thank you for releasing the great code! I enjoyed the notebook a lot.

By the way, can we train a model from scratch or fine-tune one? Although I read experiment_setup.py, I could not find how to run the training directly.

opened by soskek 8

Increasing the resolution of the mask

When I run the code, I get a mask that seems to be made of large 32x32 squares over a whole 512x512-pixel image.

Is there a way to increase the resolution of the prediction and thus the mask generated?

Thanks a lot in advance.

opened by remybonnav 5

Adding CLIPSeg to HuggingFace Transformers 🤗

Hi,

Thanks for this awesome work. As I really liked the approach of adapting CLIP for zero and one-shot image segmentation, I implemented your model as a branch of 🤗 Transformers.

The model is soon going to be added to the main library (see https://github.com/huggingface/transformers/pull/20066). Here's a Colab notebook to showcase usage: https://colab.research.google.com/drive/1ijnW67ac6bMnda4D_XkdUbZfilyVZDOh?usp=sharing.

Would you like to create an organization on the hub, under which the checkpoints can be hosted?

Currently I host them under my own username: https://huggingface.co/models?other=clipseg.

Thanks!

Niels, ML Engineer @ HF

opened by NielsRogge 4

Unable to download weights

When I run "! git clone https://github.com/timojl/clipseg" in Google Colab, I get this error:

Error downloading object: weights/rd16-uni.pth (61545cd): Smudge error: Error downloading weights/rd16-uni.pth (61545cdb3a28f99d33d457c64a9721ade835a9dfbda604c459de6831c504167a): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Here is the full error:

Cloning into 'clipseg'...
remote: Enumerating objects: 168, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 168 (delta 36), reused 39 (delta 16), pack-reused 91
Receiving objects: 100% (168/168), 1.21 MiB | 4.54 MiB/s, done.
Resolving deltas: 100% (77/77), done.
Downloading weights/rd16-uni.pth (1.1 MB)
Error downloading object: weights/rd16-uni.pth (61545cd): Smudge error: Error downloading weights/rd16-uni.pth (61545cdb3a28f99d33d457c64a9721ade835a9dfbda604c459de6831c504167a): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to /content/clipseg/clipseg/clipseg/.git/lfs/objects/logs/20220922T222731.926515854.log
Use git lfs logs last to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: weights/rd16-uni.pth: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

opened by Jetpackjules 2

UnpicklingError: invalid load key, 'v'.

I ran the Quickstart notebook locally and got the following error.


---> model.load_state_dict(torch.load('weights/rd16-uni.pth', map_location=torch.device('cpu')), strict=False)

File /scratch/miniconda3/envs/clipseg-environment/lib/python3.10/site-packages/torch/serialization.py:713, in load(f, map_location, pickle_module, **pickle_load_args)
    711             return torch.jit.load(opened_file)
    712         return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 713 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)

File /scratch/miniconda3/envs/clipseg-environment/lib/python3.10/site-packages/torch/serialization.py:920, in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    914 if not hasattr(f, 'readinto') and (3, 8, 0) <= sys.version_info < (3, 8, 2):
    915     raise RuntimeError(
    916         "torch.load does not work with file-like objects that do not implement readinto on Python 3.8.0 and 3.8.1. "
    917         f"Received object of type \"{type(f)}\". Please update to Python 3.8.2 or newer to restore this "
    918         "functionality.")
--> 920 magic_number = pickle_module.load(f, **pickle_load_args)
    921 if magic_number != MAGIC_NUMBER:
    922     raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, 'v'.

opened by animesh-007 2
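
One plausible cause of the UnpicklingError reported above, given the LFS quota failure described in the preceding issue, is that weights/rd16-uni.pth on disk is a Git LFS pointer file rather than the actual weights. Pointer files are small text files whose first line is "version https://git-lfs.github.com/spec/v1", which would match the reported load key 'v'. The following minimal sketch checks for this; the function name is_lfs_pointer is hypothetical and not part of this repository.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    weights = Path("weights/rd16-uni.pth")  # path taken from the report above
    if not weights.exists():
        print(f"{weights} does not exist")
    elif is_lfs_pointer(weights):
        print(f"{weights} is a Git LFS pointer; the real weights were not downloaded")
    else:
        print(f"{weights} does not look like an LFS pointer")
```

If the file turns out to be a pointer, the weights still need to be fetched (for example with git lfs pull once LFS access works) before torch.load can read them.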

Import error in pfe_dataset.py

pfe_dataset.py, line 5, reads: from datasets.lvis_oneshot3 import blend_image_segmentation. However, I cannot find the lvis_oneshot3.py file. Also, how can I run the zero-shot experiment? Thanks!

opened by hwanyu112 1

Image with mask prompt for one shot

Hi, great job! I really love this CLIP-based segmentation. I am not very familiar with the code yet, and I am wondering how I can use an image as a prompt.

opened by raymond1123 1

Import error in clipseg.py

On line 21 of models/clipseg.py, I get an import error: from models.clip_prompts import imagenet_templates. There is no file or function called clip_prompts in the models directory. Thank you very much!

opened by zyyyz 1

Import error

When I ran the code, import errors occurred in models/clipseg.py (line 10 and line 17). Please fix the imports in this file. Thank you for your excellent work!

opened by zyyyz 1

Is there a way to set a seed?

I am trying to get the same output consistently for frame-by-frame video use. I can't find any parameter, argument, or function to set a custom seed.

opened by Jetpackjules 1
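
Regarding the seed question above: the source does not describe a seed option, so the following is only a generic sketch of how random number generators are commonly seeded in a PyTorch project. The helper set_seed is hypothetical and not an API of this repository, and whether inference here involves any randomness at all is an assumption that would need to be checked.

```python
import random


def set_seed(seed):
    """Seed Python's RNG and, if installed, NumPy's and PyTorch's RNGs."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch


if __name__ == "__main__":
    set_seed(0)
    first = random.random()
    set_seed(0)
    second = random.random()
    print(first == second)
```

Calling set_seed once before processing each frame would make any seeded randomness repeat across frames; it does not by itself guarantee bit-identical results across different hardware or library versions.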
Add setup.py and .gitignore.

Thanks for the awesome project! I needed to create a setup.py so that downstream projects can automatically install this package. I am sharing a PR in case it is helpful. I also added a .gitignore because pip puts the egg in the directory.

opened by stephenbach 1

Owner

Timo Lüddecke
