Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

Overview

mmc

Installation

git clone https://github.com/dmarx/Multi-Modal-Comparators
cd 'Multi-Modal-Comparators'
pip install poetry
poetry build
pip install dist/mmc*.whl

# optional final step:
#poe napm_installs
python src/mmc/napm_installs/__init__.py

To see which models are immediately available, run:

python -m mmc.loaders

That optional poe napm_installs step

For the most convenient experience, it is recommended that you perform the final poe napm_installs step. Omitting this step will make your one-time setup faster, but will make certain use cases more complex.

If you did not perform the optional poe napm_installs step, you likely received several warnings about models whose loaders could not be registered. These are models whose codebases depend on python code which is not trivially installable. You will still have access to all of the models supported by the library as if you had run the last step, but their loaders will not be queryable from the registry (see below) and will need to be loaded via the appropriate mmc.loader directly, which may be non-trivial to identify without the ability to query it from mmc's registry.

As a concrete example, if the napm step is skipped, the model [cloob - crowsonkb - cloob_laion_400m_vit_b_16_32_epochs] will not appear in the list of registered loaders, but it can still be loaded like this:

from mmc.loaders import KatCloobLoader

model = KatCloobLoader(id='cloob_laion_400m_vit_b_16_32_epochs').load()

Calling the load() method on an unregistered loader will invoke napm to prepare any otherwise uninstallable dependencies required to load the model. The next time you run python -m mmc.loaders, the CLOOB loader will show as registered, and spinning up the registry will no longer emit a warning for that model.

Usage

TLDR

# spin up the registry
from mmc import loaders

from mmc.mock.openai import MockOpenaiClip
from mmc.registry import REGISTRY

cloob_query = dict(architecture='cloob')
cloob_loaders = REGISTRY.find(**cloob_query)

# loader repl prints attributes for uniquely querying
print(cloob_loaders)

# loader returns a perceptor whose API is standardized across mmc
cloob_model = cloob_loaders[0].load()

# wrapper classes are provided for mocking popular implementations
# to facilitate drop-in compatibility with existing code
drop_in_replacement__cloob_model = MockOpenaiClip(cloob_model)

Querying the Model Registry

Spin up the model registry by importing the loaders module:

from mmc import loaders

To see which models are available:

from mmc.registry import REGISTRY

for loader in REGISTRY.find():
    print(loader)

You can constrain the result set by querying the registry for specific metadata attributes:

# all CLIP models
clip_loaders = REGISTRY.find(architecture='clip')

# CLIP models published by openai
openai_clip_loaders = REGISTRY.find(architecture='clip', publisher='openai')

# All models published by MLFoundations (openCLIP)
mlf_loaders = REGISTRY.find(publisher='mlfoundations')

# A specific model
rn50_loader = REGISTRY.find(architecture='clip', publisher='openai', id='RN50')
# NB: there may be multiple models matching a particular "id". the 'id' field
# only needs to be unique for a given architecture-publisher pair.

All pretrained checkpoints are uniquely identifiable by a combination of architecture, publisher, and id.
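
For example, the uniqueness guarantee means a fully specified query narrows the result set down to exactly one loader. A small sketch, using one of the registered checkpoints listed further down this README:

# a fully specified query should match exactly one registered loader
vitb32_loaders = REGISTRY.find(architecture='clip', publisher='openai', id='ViT-B/32')
assert len(vitb32_loaders) == 1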

The above queries return lists of loader objects. If model artifacts (checkpoints, config) need to be downloaded, they will only be downloaded after the load() method on the loader is invoked.

loaders = REGISTRY.find(...)
loader = loaders[0] # just picking an arbitrary return value here, remember: loaders is a *list* of loaders
model = loader.load()

The load() method returns an instance of mmc.MultiModalComparator. The MultiModalComparator class is a modality-agnostic abstraction. I'll get to the ins and outs of that another time.

API Mocking

You want something you can just drop into your code and it'll work. We got you. This library provides wrapper classes to mock the APIs of commonly used CLIP implementations. To wrap a MultiModalComparator so it can be used as a drop-in replacement with code compatible with OpenAI's CLIP:

from mmc.mock.openai import MockOpenaiClip

my_model = my_model_loader.load()
model = MockOpenaiClip(my_model)
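
A quick smoke test of the wrapped model might look like the sketch below. encode_image is part of the mock's API (it appears in the traceback quoted in the issues further down this page); the dummy tensor and the 224px input resolution are assumptions for illustration only.

import torch

# dummy image batch; 224x224 matches e.g. the openai RN50 checkpoint, but the
# correct resolution and device depend on which model was actually loaded
dummy_images = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    image_features = model.encode_image(dummy_images)

print(image_features.shape)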

MultiMMC: Multi-Perceptor Implementation

The MultiMMC class can be used to run inference against multiple mmc models in parallel. This form of ensemble is sometimes referred to as a "multi-perceptor".

To ensure that all models loaded into the MultiMMC are compatible, the MultiMMC instance is initialized by specifying the modalities it supports. We'll discuss modality objects in a bit.

from mmc.multimmc import MultiMMC
from mmc.modalities import TEXT, IMAGE

perceptor = MultiMMC(TEXT, IMAGE)

To load and use a model:

perceptor.load_model(
    architecture='clip', 
    publisher='openai', 
    id='RN50',
)

score = perceptor.compare(
    image=PIL.Image.open(...),
    text=text_pos,
)

Additional models can be added to the ensemble via the load_model() method.
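
For example, a second checkpoint can be folded into the same ensemble before scoring. A sketch only: the image path and prompt text are placeholders, and both checkpoints appear in the model listing below.

# add a second CLIP checkpoint to the ensemble
perceptor.load_model(
    architecture='clip',
    publisher='mlfoundations',
    id='RN50--cc12m',
)

# the ensemble combines scores from all loaded models
score = perceptor.compare(
    image=PIL.Image.open('example.jpg'),  # placeholder path
    text='a photo of a dog',              # placeholder prompt
)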

The MultiMMC does not support API mocking because of its reliance on the compare method.

Available Pre-trained Models

Some model comparisons here

# [<architecture> - <publisher> - <id>]
[clip - openai - RN50]
[clip - openai - RN101]
[clip - openai - RN50x4]
[clip - openai - RN50x16]
[clip - openai - RN50x64]
[clip - openai - ViT-B/32]
[clip - openai - ViT-B/16]
[clip - openai - ViT-L/14]
[clip - openai - ViT-L/14@336px]
[clip - mlfoundations - RN50--openai]
[clip - mlfoundations - RN50--yfcc15m]
[clip - mlfoundations - RN50--cc12m]
[clip - mlfoundations - RN50-quickgelu--openai]
[clip - mlfoundations - RN50-quickgelu--yfcc15m]
[clip - mlfoundations - RN50-quickgelu--cc12m]
[clip - mlfoundations - RN101--openai]
[clip - mlfoundations - RN101--yfcc15m]
[clip - mlfoundations - RN101-quickgelu--openai]
[clip - mlfoundations - RN101-quickgelu--yfcc15m]
[clip - mlfoundations - RN50x4--openai]
[clip - mlfoundations - RN50x16--openai]
[clip - mlfoundations - ViT-B-32--openai]
[clip - mlfoundations - ViT-B-32--laion400m_e31]
[clip - mlfoundations - ViT-B-32--laion400m_e32]
[clip - mlfoundations - ViT-B-32--laion400m_avg]
[clip - mlfoundations - ViT-B-32-quickgelu--openai]
[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]
[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]
[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]
[clip - mlfoundations - ViT-B-16--openai]
[clip - mlfoundations - ViT-L-14--openai]
[clip - sbert - ViT-B-32-multilingual-v1]
[clip - sajjjadayobi - clipfa]

# The following models depend on napm for setup
[clip - navervision - kelip_ViT-B/32]
[cloob - crowsonkb - cloob_laion_400m_vit_b_16_16_epochs]
[cloob - crowsonkb - cloob_laion_400m_vit_b_16_32_epochs]
[clip - facebookresearch - clip_small_25ep]
[clip - facebookresearch - clip_base_25ep]
[clip - facebookresearch - clip_large_25ep]
[slip - facebookresearch - slip_small_25ep]
[slip - facebookresearch - slip_small_50ep]
[slip - facebookresearch - slip_small_100ep]
[slip - facebookresearch - slip_base_25ep]
[slip - facebookresearch - slip_base_50ep]
[slip - facebookresearch - slip_base_100ep]
[slip - facebookresearch - slip_large_25ep]
[slip - facebookresearch - slip_large_50ep]
[slip - facebookresearch - slip_large_100ep]
[simclr - facebookresearch - simclr_small_25ep]
[simclr - facebookresearch - simclr_base_25ep]
[simclr - facebookresearch - simclr_large_25ep]
[clip - facebookresearch - clip_base_cc3m_40ep]
[clip - facebookresearch - clip_base_cc12m_35ep]
[slip - facebookresearch - slip_base_cc3m_40ep]
[slip - facebookresearch - slip_base_cc12m_35ep]

VRAM Cost

The following is an estimate of the GPU memory (VRAM, in MB) that each loaded model occupies:

publisher architecture model_name vram_mb
openai clip RN50 358
openai clip RN101 294
openai clip RN50x4 424
openai clip RN50x16 660
openai clip RN50x64 1350
openai clip ViT-B/32 368
openai clip ViT-B/16 348
openai clip ViT-L/14 908
openai clip ViT-L/14@336px 908
mlfoundations clip RN50--openai 402
mlfoundations clip RN50--yfcc15m 402
mlfoundations clip RN50--cc12m 402
mlfoundations clip RN50-quickgelu--openai 402
mlfoundations clip RN50-quickgelu--yfcc15m 402
mlfoundations clip RN50-quickgelu--cc12m 402
mlfoundations clip RN101--openai 476
mlfoundations clip RN101--yfcc15m 476
mlfoundations clip RN101-quickgelu--openai 476
mlfoundations clip RN101-quickgelu--yfcc15m 476
mlfoundations clip RN50x4--openai 732
mlfoundations clip RN50x16--openai 1200
mlfoundations clip ViT-B-32--openai 634
mlfoundations clip ViT-B-32--laion400m_e31 634
mlfoundations clip ViT-B-32--laion400m_e32 634
mlfoundations clip ViT-B-32--laion400m_avg 634
mlfoundations clip ViT-B-32-quickgelu--openai 634
mlfoundations clip ViT-B-32-quickgelu--laion400m_e31 634
mlfoundations clip ViT-B-32-quickgelu--laion400m_e32 634
mlfoundations clip ViT-B-32-quickgelu--laion400m_avg 634
mlfoundations clip ViT-B-16--openai 634
mlfoundations clip ViT-L-14--openai 1688
sajjjadayobi clip clipfa 866
crowsonkb cloob cloob_laion_400m_vit_b_16_16_epochs 610
crowsonkb cloob cloob_laion_400m_vit_b_16_32_epochs 610
facebookresearch slip slip_small_25ep 728
facebookresearch slip slip_small_50ep 650
facebookresearch slip slip_small_100ep 650
facebookresearch slip slip_base_25ep 714
facebookresearch slip slip_base_50ep 714
facebookresearch slip slip_base_100ep 714
facebookresearch slip slip_large_25ep 1534
facebookresearch slip slip_large_50ep 1522
facebookresearch slip slip_large_100ep 1522
facebookresearch slip slip_base_cc3m_40ep 714
facebookresearch slip slip_base_cc12m_35ep 714

Contributing

Suggest a pre-trained model

If you would like to suggest a pre-trained model for future addition, you can add a comment to this issue

Add a pre-trained model

  1. Create a loader class that encapsulates the logic for importing the model, loading weights, preprocessing inputs, and performing projections.
  2. At the bottom of the file defining the loader class, add a code snippet that registers each respective checkpoint's loader with the registry (a rough sketch follows this list).
  3. Add an import for the new file to mmc/loaders/__init__.py. The imports in this file are the reason import mmc.loaders "spins up" the registry.
  4. If the codebase on which the model depends can be installed as a package, update pyproject.toml so it gets installed as a dependency.
  5. Otherwise, add napm preparation at the top of the loader's load() method (see the cloob or kelip loaders for examples), and also add the napm setup to mmc/napm_installs/__init__.py
  6. Add a test case to tests/test_mmc_loaders.py
  7. Add a test script for the loader (see test_mmc_katcloob as an example)
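
Very roughly, a new loader module might be structured like the sketch below. This is only an illustration of steps 1-2 above: the class shape and the REGISTRY.register call are assumptions rather than mmc's actual API, so refer to an existing loader (e.g. the openai CLIP or cloob loaders) for the real pattern.

# hypothetical sketch only - the names below are illustrative, not mmc's actual API
from mmc.registry import REGISTRY

class MyNewModelLoader:  # real loaders subclass the library's BaseLoader
    architecture = 'mynewmodel'
    publisher = 'someorg'

    def __init__(self, id):
        self.id = id

    def load(self, device='cuda'):
        # import the upstream codebase here (or napm-prepare it first if it is
        # not pip installable), fetch the checkpoint named by self.id, and wrap
        # its tokenizer/preprocessor/projections in an mmc.MultiModalComparator
        raise NotImplementedError

# at the bottom of the module: register a loader per checkpoint
# (REGISTRY.register is an assumed name for the registration hook)
for checkpoint_id in ('base', 'large'):
    REGISTRY.register(MyNewModelLoader(id=checkpoint_id))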
Comments
  • Ezmode


    # ta dah!
    from mmc.ez.CLIP import clip
    
    # mhm.
    clip.available_models()
    
    # requesting a tokenizer before loading the model
    # returns the openai clip SimpleTokenizer
    #tokenize = clip.tokenize
    
    # either of these works
    model, preprocessor = clip.load('RN50')
    model, preprocessor = clip.load('[clip - openai - RN50]')
    
    # if we request the tokenizer *after* a model has been loaded, 
    # the tokenizer appropriate to the loaded model is returned 
    tokenize = clip.tokenize
    
    opened by dmarx 3
  • Extend the Mock API capabilities


    This was useful to do for integrating MMC with Princess Generator. I know this makes the Mock API go beyond its scope, but:

    1. OpenAI's CLIP has this "tokenize" function. Since we can't handle that at the library level - the tokenizer is different for every model - I think adding an option on the model itself might make sense.
    2. We've got to think about how much heavy lifting we want to do for the devs. From my perspective, default operations such as tokenization, preprocessing, and normalization should be abstracted away for them.

    That said, I'm not sure the way I did it here is the best way to go about it - almost sure it's not.

    opened by apolinario 2
  • mlf_vit-b/16+ loads but throws tensor shape error attempting image embedding


    Also to do: add tests for this model to MLF test script

    2022-05-07 19:33:23.102 | INFO     | __main__:parse_scenes:133 - Prompts loaded.
    2022-05-07 19:33:23.110 | INFO     | __main__:do_run:540 - Settings saved to /home/ubuntu/pytti-core/images_out//clip_mlf_vitb16plus_e32/clip_mlf_vitb16plus_e32_settings.txt
    2022-05-07 19:33:23.118 | INFO     | __main__:do_run:553 - Running prompt:
      0%|                                                                                                                                                   | 0/6000 [00:00<?, ?it/s]/home/ubuntu/venv/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
      0%|                                                                                                                                                   | 0/6000 [00:00<?, ?it/s]
    Error executing job with overrides: ['conf=cloob_test', 'steps_per_scene=6000', 'border_mode=wrap', 'file_namespace=clip_mlf_vitb16plus_e32', '++mmc_models=[{architecture: clip, publisher: mlfoundations, id: ViT-B-16-plus-240--laion400m_e32 }]']
    Traceback (most recent call last):
      File "/home/ubuntu/venv/lib/python3.9/site-packages/pytti/workhorse.py", line 607, in _main
        do_run()
      File "/home/ubuntu/venv/lib/python3.9/site-packages/pytti/workhorse.py", line 554, in do_run
        i += model.run_steps(
      File "/home/ubuntu/venv/lib/python3.9/site-packages/pytti/ImageGuide.py", line 188, in run_steps
        losses = self.train(
      File "/home/ubuntu/venv/lib/python3.9/site-packages/pytti/ImageGuide.py", line 295, in train
        image_embeds, offsets, sizes = self.embedder(self.image_rep, input=z)
      File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/venv/lib/python3.9/site-packages/pytti/Perceptor/Embedder.py", line 169, in forward
        image_embeds.append(perceptor.encode_image(clip_in).float().unsqueeze(0))
      File "/home/ubuntu/venv/lib/python3.9/site-packages/mmc/mock/openai.py", line 60, in encode_image
        return project(image)
      File "/home/ubuntu/venv/lib/python3.9/site-packages/open_clip/model.py", line 415, in encode_image
        return self.visual(image)
      File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/venv/lib/python3.9/site-packages/open_clip/model.py", line 273, in forward
        x = x + self.positional_embedding.to(x.dtype)
    RuntimeError: The size of tensor a (197) must match the size of tensor b (226) at non-singleton dimension 1
    
    bug 
    opened by dmarx 1
  • BaseLoader needs str/repr dunder funcs


    specifically, this is less helpful than it could be:

    mmc.REGISTRY.find(architecture='clip')
    
    [<mmc.loaders.openaicliploader.OpenAiClipLoader at 0x7fc1f90f3150>,
     <mmc.loaders.openaicliploader.OpenAiClipLoader at 0x7fc0f0a9dfd0>,
     <mmc.loaders.openaicliploader.OpenAiClipLoader at 0x7fc0f0a9da50>,
    ...
    ]
    
    enhancement 
    opened by dmarx 1
  • wrap loader to mock CLIP.clip module


    to facilitate loading preprocessor/tokenizer more naturally

    ldr = REGISTRY.find(id='my-model')
    clip = MockClipModule(ldr)
    
    tokenizer = clip.tokenize
    model, preprocess_image = clip.load('my-model')
    
    opened by dmarx 0
  • SLIP text projection throws ndims error


    pytest -s -v -k slip
    
    FAILED tests/test_mmc_fairslip.py::test_project_text - RuntimeError: number of dims don't match in permute
    FAILED tests/test_mmc_fairslip_cc12m.py::test_project_text - RuntimeError: number of dims don't match in permute
    FAILED tests/test_mmc_fairslip_cc3m.py::test_project_text - RuntimeError: number of dims don't match in permute
    
    opened by dmarx 0
  • pytti expects models to report input image resolution


    how attribute is computed for openai models: https://github.com/openai/CLIP/blob/main/clip/model.py#L398-L411

    where attribute is used in pytti: https://github.com/pytti-tools/pytti-core/blob/main/src/pytti/Perceptor/Embedder.py#L43

    enhancement 
    opened by dmarx 0
  • [Tutorial] Use this tooling to implement a relevant article as a demo


    • https://github.com/orpatashnik/StyleCLIP
    • TxST - Name Your Style: An Arbitrary Artist-aware Image Style Transfer - https://arxiv.org/pdf/2202.13562.pdf
    • CLIPstyler - https://arxiv.org/abs/2112.00374
    opened by dmarx 0
  • Gradient Checkpointing for OpenCLIP should be optional


    I know hardcoding it came from me, but while gradient checkpointing makes things faster and use less VRAM (very useful in some use cases), it can break things on A100s and also break cutn_batches on most text-to-image implementations, so ideally it should be optional for the user

    More broadly we should think on how to load options that pertain to particular loaders/modules/perceptors while not breaking the overall mocking logics

    opened by apolinario 0
  • improved packaging


    Hi, as part of our package to easily evaluate clip models (https://github.com/LAION-AI/CLIP_benchmark/issues/1) and my inference lib (https://github.com/rom1504/clip-retrieval), I'm interested in having a package like this.

    However, here is what's missing:

    1. pypi packaging
    2. a much clearer README; the usage of this should be 5 lines of python that can be copy-pasted
    3. performance evaluation
    4. possibly optional dependencies, to avoid the dependency list becoming too large

    I may be interested in contributing all that, but first I'd like to check with you whether you're ok with those kinds of changes

    thanks

    enhancement 
    opened by rom1504 19