It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Overview

CLIP-ONNX

It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Usage

Install clip-onnx module and requirements first. Use this trick

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

  1. Download CLIP image from repo
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
  1. Load standard CLIP model, image, text on cpu
import clip
from PIL import Image

# onnx cannot work with cuda
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]
  1. Create CLIP-ONNX object to convert model to onnx
from clip_onnx import clip_onnx, attention
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"], # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
onnx_model.start_sessions()
  1. Use for standard CLIP API. Batch inference
image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]

Enjoy the speed

Examples

See examples folder for more details
Some parts of the code were taken from the post. Thank you neverix for this notebook.

Comments
  • Can't use CUDAExecutionProvider

    Can't use CUDAExecutionProvider

    Hey, I'm trying to use the code on GPU and I encountered 2 problems:

    1. when running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines): ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

    I fixed it by installing that version of torch by myself. with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

    1. After I installed the package, I tried to run the example in the readme with CPUExecutionProvider and it worked fine, but when I'm trying to run it on GPU with CUDAExecutionProvider I get the following error message (again on different machines):

    2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met. 2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

    I can't figure out what is the problem. Any help?

    opened by YoadTew 13
  • Performance is inconsistent with the original model

    Performance is inconsistent with the original model

    Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

    model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
    
    image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
    text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
    
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
    
    print("Label probs:", probs) 
    

    The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

    However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

    Could you help me with this? Thanks!

    opened by Cestlaviez 5
  • Error on installing the torch version in requirements.txt

    Error on installing the torch version in requirements.txt

    pip install git+https://github.com/Lednik7/CLIP-ONNX.git

    ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
    ERROR: No matching distribution found for torch==1.11.0+cu113
    

    python version is 3.7.13

    opened by dingusagar 2
  • ERROR: No matching distribution found for onnxruntime==1.11

    ERROR: No matching distribution found for onnxruntime==1.11

    Hi, Thanks for the great work!

    I am having this error when I try to install the package.

    ERROR: No matching distribution found for onnxruntime==1.11

    Maybe we can update the requirements.txt?

    opened by wanliAlex 1
  • Replace the operator of

    Replace the operator of "torch.einsum"

    q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk( 3, dim=-1)

    @Lednik7 Thanks for your great work on Clip-ONNX. for the pytorch operator of "torch.einsum" , if we don't want to use this operator , do you have other codes to replace this operator? this operator is not friendly to some Inference engine, like NV TensorRT, so if you have other codes to replace einsum, that will be better

    opened by zhangnju 2
Owner
Gerasimov Maxim
16 y.o. Data Scientist. Graduated by Yandex Lyceum and Tinkoff Education
Gerasimov Maxim
Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Yolov5 running on TorchServe (GPU compatible) ! This is a dockerfile to run TorchServe for Yolo v5 object detection model. (TorchServe (PyTorch librar

null 82 Nov 29, 2022
Speed-Test - You can check your intenet speed using this tool

Speed-Test Tool By Hez_X >> AVAILABLE ON : Termux & Kali linux & Ubuntu (Linux E

Hez-X 3 Feb 17, 2022
FuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space OptimizationFuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space Optimization

FuseDream This repo contains code for our paper (paper link): FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimizat

XCL 191 Dec 31, 2022
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

CLIP-GEN [简体中文][English] 本项目在萤火二号集群上用 PyTorch 实现了论文 《CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP》。 CLIP-GEN 是一个 Language-F

null 75 Dec 29, 2022
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

null 419 Jan 3, 2023
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

mtomo Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation.

Katsuya Hyodo 24 Mar 2, 2022
GrabGpu_py: a scripts for grab gpu when gpu is free

GrabGpu_py a scripts for grab gpu when gpu is free. WaitCondition: gpu_memory >

tianyuluan 3 Jun 18, 2022
Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.

human-pose-estimation-3d-python-cpp RealSenseD435 (RGB) 480x640 + CPU Corei9 45 FPS (Depth is not used) 1. Run 1-1. RealSenseD435 (RGB) 480x640 + CPU

Katsuya Hyodo 8 Oct 3, 2022
PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

PyTorch-LIT PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices. With

Amin Rezaei 157 Dec 11, 2022
Data-depth-inference - Data depth inference with python

Welcome! This readme will guide you through the use of the code in this reposito

Marco 3 Feb 8, 2022
Tgbox-bench - Simple TGBOX upload speed benchmark

TGBOX Benchmark This script will benchmark upload speed to TGBOX storage. Build

Non 1 Jan 9, 2022
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Phil Wang 4.4k Jan 3, 2023
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN.

Ryan Murdock has done it again, combining OpenAI's CLIP and the generator from a BigGAN! This repository wraps up his work so it is easily accessible to anyone who owns a GPU.

Phil Wang 2.3k Jan 9, 2023
Simple implementation of OpenAI CLIP model in PyTorch.

It was in January of 2021 that OpenAI announced two new models: DALL-E and CLIP, both multi-modality models connecting texts and images in some way. In this article we are going to implement CLIP model from scratch in PyTorch. OpenAI has open-sourced some of the code relating to CLIP model but I found it intimidating and it was far from something short and simple. I also came across a good tutorial inspired by CLIP model on Keras code examples and I translated some parts of it into PyTorch to build this tutorial totally with our beloved PyTorch!

Moein Shariatnia 226 Jan 5, 2023
Simple image captioning model - CLIP prefix captioning.

Simple image captioning model - CLIP prefix captioning.

null 688 Jan 4, 2023
GPU-Accelerated Deep Learning Library in Python

Hebel GPU-Accelerated Deep Learning Library in Python Hebel is a library for deep learning with neural networks in Python using GPU acceleration with

Hannes Bretschneider 1.2k Dec 21, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 4, 2023
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 5.7k Feb 12, 2021