A simple library that speeds up CLIP inference by up to 3x (K80 GPU)

Gerasimov Maxim

Last update: Dec 20, 2022


Overview

CLIP-ONNX


Usage

First, install the clip-onnx module and its requirements directly from GitHub (in a notebook, the leading ! runs the line as a shell command):

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

Download the CLIP image from the OpenAI CLIP repository:

!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true

Load the standard CLIP model, the image, and the text on the CPU:

import clip
from PIL import Image

# load on the CPU: the ONNX conversion does not work with CUDA here
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# add a leading batch dimension
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]
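For reference, preprocess returns a normalized [3, 224, 224] image tensor and unsqueeze(0) adds the batch axis. The sketch below reproduces only the final normalization and batching steps in NumPy. It assumes the per-channel mean/std constants used by OpenAI CLIP's transform and an image that is already resized and center-cropped; normalize_and_batch is a hypothetical helper, not part of clip or clip-onnx.

```python
import numpy as np

# Per-channel RGB mean/std used by OpenAI CLIP's Normalize transform (assumed constants)
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def normalize_and_batch(rgb_uint8: np.ndarray) -> np.ndarray:
    """Hypothetical helper: [224, 224, 3] uint8 RGB -> [1, 3, 224, 224] float32.

    Assumes resizing and center-cropping to 224x224 have already been done.
    """
    x = rgb_uint8.astype(np.float32) / 255.0   # scale to [0, 1], like ToTensor
    x = (x - CLIP_MEAN) / CLIP_STD             # per-channel normalization (broadcast over HWC)
    x = np.transpose(x, (2, 0, 1))             # HWC -> CHW
    return x[np.newaxis, ...]                  # add batch axis, like .unsqueeze(0)


if __name__ == "__main__":
    dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
    batch = normalize_and_batch(dummy)
    print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float32
```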

Create a CLIP-ONNX object to convert the model to ONNX:

from clip_onnx import clip_onnx, attention
# swap in clip_onnx's attention implementation so the model can be exported to ONNX
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# supported providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"], # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
onnx_model.start_sessions()
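The providers argument is an ordered preference list. A minimal sketch, assuming you want the fastest provider that the installed onnxruntime build actually ships: pick_providers is a hypothetical helper, and in practice its input would be onnxruntime.get_available_providers(), which lists the providers compiled into the installed package.

```python
from typing import List, Sequence

# Preference order taken from the providers comment in the example (fastest first)
PREFERRED = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available: Sequence[str], preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Hypothetical helper: keep the preferred providers that are available, in order.

    CPUExecutionProvider is always kept as a last-resort fallback.
    """
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    # e.g. available = onnxruntime.get_available_providers()
    available = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    print(pick_providers(available))  # ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

The resulting list would then be passed as providers=... when constructing clip_onnx, in place of the hard-coded CPU-only list.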

Use it through the standard CLIP API (batch inference):

image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]
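In the reference CLIP model, the two feature batches become logits by L2-normalizing them and scaling their cosine similarities by a learned temperature (the exponentiated logit_scale); the example then applies a softmax over the text labels. A NumPy sketch of that computation, assuming features shaped [N, D] and a logit scale of 100 (roughly the value in the released CLIP checkpoints); clip_logits is a hypothetical helper and the random features are placeholders, not real CLIP outputs.

```python
import numpy as np


def clip_logits(image_features: np.ndarray, text_features: np.ndarray, logit_scale: float = 100.0):
    """Hypothetical helper: CLIP-style cosine-similarity logits (logit_scale is assumed)."""
    img = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    txt = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    logits_per_image = logit_scale * img @ txt.T   # [N_images, N_texts]
    logits_per_text = logits_per_image.T           # [N_texts, N_images]
    return logits_per_image, logits_per_text


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax, matching .softmax(dim=-1) in the example."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_features = rng.standard_normal((1, 512)).astype(np.float32)  # placeholder for encode_image
    text_features = rng.standard_normal((3, 512)).astype(np.float32)   # placeholder for encode_text
    logits_per_image, _ = clip_logits(image_features, text_features)
    probs = softmax(logits_per_image)
    print("Label probs:", probs)  # one row of 3 probabilities that sum to 1
```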

Enjoy the speed

Examples

See the examples folder for more details.
Some parts of the code were taken from the post. Thanks to neverix for this notebook.

Comments

Can't use CUDAExecutionProvider
Hey, I'm trying to use the code on a GPU and ran into two problems:

When running pip install git+https://github.com/Lednik7/CLIP-ONNX.git, I got the following error (tried on multiple machines): ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

I fixed it by installing that torch version myself with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html and then running the rest of the installation.

After installing the package, I ran the README example with CPUExecutionProvider and it worked fine, but when I try to run it on the GPU with CUDAExecutionProvider I get the following message (again on different machines):

2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I can't figure out what the problem is. Any help?
opened by YoadTew 13
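The [W:...] lines above are warnings: onnxruntime could not create the CUDA provider and, judging by the log level, kept going with the remaining providers. Common causes are having the CPU-only onnxruntime package installed instead of onnxruntime-gpu, or a CUDA/cuDNN version mismatch (see the requirements page linked in the message). A hedged sketch for surfacing such a silent fallback, assuming an onnxruntime.InferenceSession whose get_providers() returns the providers actually in use; report_fallback is a hypothetical helper that only compares two lists of names.

```python
from typing import List, Sequence


def report_fallback(requested: Sequence[str], active: Sequence[str]) -> List[str]:
    """Hypothetical helper: return requested providers that the session did not enable.

    `active` would typically come from session.get_providers() on an
    onnxruntime.InferenceSession created with providers=requested.
    """
    missing = [p for p in requested if p not in active]
    for p in missing:
        print(f"warning: {p} was requested but is not active; running on {list(active)}")
    return missing


if __name__ == "__main__":
    # Simulates the situation in this report: CUDA requested, only CPU active
    requested = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    active = ["CPUExecutionProvider"]
    report_fallback(requested, active)
```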
Performance is inconsistent with the original model
Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu()  # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu()  # [3, 77]
image_features = model.encode_image(image)
text_features = model.encode_text(text)
logits_per_image, logits_per_text = model(image, text)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
print("Label probs:", probs)

The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

Could you help me with this? Thanks!
opened by Cestlaviez 5
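When chasing a mismatch like this, it helps to compare intermediate outputs (for example the encode_image features) as well as the final probabilities. A minimal NumPy sketch, assuming both models' outputs have already been converted to arrays of the same shape; compare_outputs and its tolerances are illustrative choices, not part of clip-onnx. Applied to the two probability vectors quoted in this report, it shows a large gap even though both still rank "a diagram" first.

```python
import numpy as np


def compare_outputs(reference: np.ndarray, candidate: np.ndarray,
                    rtol: float = 1e-3, atol: float = 1e-4) -> dict:
    """Hypothetical helper: summarize how far `candidate` deviates from `reference`."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    abs_diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "allclose": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
        "same_argmax": bool(np.array_equal(reference.argmax(-1), candidate.argmax(-1))),
    }


if __name__ == "__main__":
    # The two probability vectors quoted in this report
    torch_probs = np.array([[0.9927937, 0.00421069, 0.00299573]])
    onnx_probs = np.array([[0.41456965, 0.29270944, 0.29272085]])
    print(compare_outputs(torch_probs, onnx_probs))
```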

Error on installing the torch version in requirements.txt

pip install git+https://github.com/Lednik7/CLIP-ONNX.git

ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
ERROR: No matching distribution found for torch==1.11.0+cu113

Python version: 3.7.13

opened by dingusagar 2
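The pin fails because +cu113 is a local version label: those CUDA-specific torch wheels are hosted on PyTorch's own download index rather than on PyPI, which is why the list of available versions in the error contains only plain numbers such as 1.11.0. The "Can't use CUDAExecutionProvider" report works around this by installing torch manually with -f https://download.pytorch.org/whl/torch_stable.html. A small sketch of the other option, relaxing such pins to their public version so pip can resolve them from PyPI; relax_local_pins is a hypothetical helper operating on requirements.txt lines, and the PyPI wheel may be built against a different CUDA version than the pinned one.

```python
import re
from typing import List

# Matches "name==X.Y.Z+local" pins, e.g. "torch==1.11.0+cu113"
_LOCAL_PIN = re.compile(r"^(?P<name>[A-Za-z0-9_.\-]+)==(?P<public>[^+\s]+)\+(?P<local>\S+)$")


def relax_local_pins(lines: List[str]) -> List[str]:
    """Hypothetical helper: drop '+local' labels so pip can resolve pins from PyPI."""
    out = []
    for line in lines:
        m = _LOCAL_PIN.match(line.strip())
        out.append(f"{m['name']}=={m['public']}" if m else line.strip())
    return out


if __name__ == "__main__":
    # Illustrative lines, not the project's actual requirements.txt
    reqs = ["torch==1.11.0+cu113", "pillow"]
    print(relax_local_pins(reqs))
```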

ERROR: No matching distribution found for onnxruntime==1.11

Hi, thanks for the great work!

I get this error when I try to install the package:

ERROR: No matching distribution found for onnxruntime==1.11

Maybe we can update the requirements.txt?

opened by wanliAlex 1
updated and added information

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages

opened by Lednik7 0
Replace the operator of "torch.einsum"

q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk(3, dim=-1)

@Lednik7 Thanks for your great work on CLIP-ONNX. If we don't want to use the PyTorch operator torch.einsum, do you have other code to replace it? This operator is not friendly to some inference engines, such as NVIDIA TensorRT, so an einsum-free replacement would be better.

opened by zhangnju 2
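The einsum "tbh, oh -> tbo" is a plain linear projection: for every position t and batch index b it computes x[t, b] @ W.T, so it can be written as an ordinary matrix multiplication. In PyTorch, torch.nn.functional.linear(x, self.attn.in_proj_weight, self.attn.in_proj_bias) computes the same x @ W.T + b. The NumPy sketch below checks that equivalence and the 3-way split into q, k, v, assuming CLIP's [seq, batch, width] layout and a packed in_proj_weight of shape [3 * width, width]; the sizes are arbitrary test values and the sketch covers the math only, not any particular inference engine.

```python
import numpy as np


def qkv_einsum(x, w, b):
    """Illustrative: original einsum formulation, then split into q, k, v."""
    out = np.einsum("tbh,oh->tbo", x, w) + b
    return np.split(out, 3, axis=-1)   # same as torch .chunk(3, dim=-1)


def qkv_matmul(x, w, b):
    """Illustrative einsum-free formulation: x @ W^T + b (what F.linear computes)."""
    out = x @ w.T + b                  # matmul broadcasts over the leading [t, b] axes
    return np.split(out, 3, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, batch, width = 77, 2, 64                 # arbitrary test sizes
    x = rng.standard_normal((seq, batch, width))
    w = rng.standard_normal((3 * width, width))   # packed in_proj_weight
    b = rng.standard_normal(3 * width)            # packed in_proj_bias
    for a, c in zip(qkv_einsum(x, w, b), qkv_matmul(x, w, b)):
        assert a.shape == (seq, batch, width)
        assert np.allclose(a, c)
    print("einsum and matmul formulations match")
```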

Releases (1.2)

1.2 (May 3, 2022)

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages

1.0 (May 3, 2022)

Works, but with workarounds

