Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

Overview

This is the official code release for the paper 'Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling'.

Introduction

Tip-Adapter provides faster convergence and better performance than CLIP-Adapter by initializing the adapter with a cache model built from the few-shot training set, instead of learning the adapter weights from scratch.
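
A minimal sketch of this idea (assumed variable names, not the repo's exact code): the cache keys are the L2-normalized CLIP features of the few-shot training images, the cache values are their one-hot labels, and the cache prediction is added as a residual on top of the zero-shot CLIP logits.

import torch

def tip_adapter_logits(test_feat, clip_weights, cache_keys, cache_values, alpha=1.0, beta=5.5):
    # test_feat:     [N, D] L2-normalized CLIP image features of the test images
    # clip_weights:  [D, C] L2-normalized CLIP text embeddings of the class prompts
    # cache_keys:    [D, M] features of the M few-shot training images (cache "keys")
    # cache_values:  [M, C] one-hot labels of those images (cache "values")
    clip_logits = 100.0 * test_feat @ clip_weights               # zero-shot CLIP prediction
    affinity = test_feat @ cache_keys                            # similarity to the cached few-shot features
    cache_logits = torch.exp(-beta * (1.0 - affinity)) @ cache_values
    return clip_logits + alpha * cache_logits                    # training-free residual blend

Because the keys and values are built directly from the few-shot set, no gradient step is required; fine-tuning the cached keys from this initialization (Tip-Adapter-F in the paper) is what yields the faster convergence compared to CLIP-Adapter.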

Implementation

Put tip_adapter_ImageNet.py into the CLIP codebase's folder and run:

python tip_adapter_ImageNet.py

This reproduces 65.51% accuracy on the ImageNet validation set.
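
For reference, a minimal sketch of the CLIP feature extraction the script builds on, assuming OpenAI's clip package (the backbone name and image path here are placeholders, not necessarily the settings behind the 65.51% result):

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)            # placeholder backbone
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    feat = model.encode_image(image)
    feat = feat / feat.norm(dim=-1, keepdim=True)                # L2-normalize, as in CLIP zero-shot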

This repo will be completed in a few days.

Contributors

Peng Gao, Renrui Zhang

Acknowledgment

CLIP, CoOp and CLIP-Adapter

Comments
  • Are CLIP/TIP-Adapter only designed for the few-shot setting?

    Sorry, I've got another question. I did not find experiments under the base-to-new / domain generalization setting or the cross-dataset transfer setting, which are conducted in CoCoOp. Are CLIP/TIP-Adapter only designed for the few-shot setting? I wonder how the generalization abilities are. Could you give me any intuition?

    opened by machengcheng2016 4
  • Details of data augmentation

    In the paper, it says "the CLIP-style pre-processing resizes the cropped image’s short side to 224 while keeping its original aspect", and you said that you use the CLIP-style RandomResizedCrop.

    However, I found that the standard RandomResizedCrop is used in the code.

    I wonder whether this setting is important to the final performance, or whether I have misunderstood something here?
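
    For comparison, a sketch of the two pre-processing pipelines being discussed, written with standard torchvision transforms (the exact parameters are assumptions, not the repo's settings; CLIP's normalization is omitted for brevity):

    from torchvision import transforms

    # "CLIP-style": resize the short side to 224 while keeping the aspect ratio, then crop a 224x224 patch.
    clip_style = transforms.Compose([
        transforms.Resize(224),
        transforms.RandomCrop(224),
        transforms.ToTensor(),
    ])

    # Standard RandomResizedCrop: take a random-scale, random-aspect crop and resize it to 224x224.
    standard = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ])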

    opened by SY-Xuan 3
  • replicate your results on food101 dataset

    Would you consider providing a script to replicate your results on the Food101 dataset? If someone were to adapt your ImageNet script, do you have suggestions on what to make sure to adjust?

    opened by yinyinl 3
  • Adaptor used in vision encoder or text encoder?

    Hey, thanks for the nice work. I have some confusion as follows. First, why is the adapter used only in the vision encoder? Did the authors try using the adapter in the text encoder? Second, I don't understand why using an adapter performs better than using learnable prompts. In addition, the "adapter" used in this paper is different from the adapter in NLP tasks, and the position of the insertion is also different; which one is better?

    opened by jingzhengli 2
  • "Tip-Adapter/main.py" use test features to eval

    It seems odd to use test features for evaluation; see https://github.com/gaopengcuhk/Tip-Adapter/blob/fcb06059457a3b74e44ddb0d5c96d2ea7e4c5957/main.py#L111. Could the authors give some explanation?

    opened by fikry102 1
  • How to extend to base-to-novel classes task?

    Hi, this method modifies the parameters of the text encoder, so it cannot be extended to base-to-new classes tasks. I would like to know how to address this problem.

    opened by jingzhengli 3
  • Run TIP-adapter on text2img retrieval instead

    Hi, thanks for the amazing work on adapters for CLIP. Currently the framework computes the affinities between the test query image and the cache keys before obtaining the corresponding few-shot label, and this works well. I would just like your advice on how I can extend this to text2img retrieval, where I would like to query with a text search term and utilise the cache key-value adapter to return the corresponding images. Would it be as naive as doing text-to-text embedding affinity matching of the query text against the cache VALUES (instead of the keys), since they contain the ground-truth labels for the few-shot learning?

    opened by adrielkuek 3
  • The "alpha" and "beta" in the paper are the opposite of the "alpha" and "beta" in the code of Tip-Adapter

    In the code:

        alpha_list = [i * (6.0 - 1.0) / 20 + 1 for i in range(20)]
        beta_list = [i * (7 - 0.1) / 200 + 0.1 for i in range(200)]

    In the paper: [image of the corresponding equations]
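
    For context, a sketch of the grid search those two lists imply (hypothetical names; val_feat and val_labels stand for validation features and labels, and tip_adapter_logits is the sketch from the Overview above). Whichever symbol plays which role, the search simply scores every (alpha, beta) pair on the validation set and keeps the best one:

    def search_hyperparams(val_feat, val_labels, clip_weights, cache_keys, cache_values):
        alpha_list = [i * (6.0 - 1.0) / 20 + 1 for i in range(20)]
        beta_list = [i * (7 - 0.1) / 200 + 0.1 for i in range(200)]
        best_acc, best_pair = 0.0, None
        for alpha in alpha_list:
            for beta in beta_list:
                logits = tip_adapter_logits(val_feat, clip_weights, cache_keys, cache_values,
                                            alpha=alpha, beta=beta)
                acc = (logits.argmax(dim=-1) == val_labels).float().mean().item()
                if acc > best_acc:
                    best_acc, best_pair = acc, (alpha, beta)
        return best_pair, best_acc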

    opened by euminds 3
  • Bug when I try cifar100

    Thanks for your work. When I tried your code on CIFAR100, I got this error and I don't know how to solve it. Because of ImageNet's huge number of images, I can only experiment with CIFAR100. Please help.

    Torch version: 1.7.1
    Namespace(alpha=1, augment_epoch=10, beta=1.17, lr=0.001, train_epoch=20)
    Model parameters: 151,277,313
    Input resolution: 224
    Context length: 77
    Vocab size: 49408
    Load data finished.
    start getting text features.
    finish getting text features.
    start getting image features
    start saving training image features
    Augment time: 0 / 10
    3%|▉ | 6/196 [00:03<01:45, 1.81it/s]
    Traceback (most recent call last):
      File "main.py", line 487, in <module>
        main()
      File "main.py", line 244, in main
        for i, (images, target) in enumerate(tqdm(train_loader)):
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/tqdm/std.py", line 1180, in __iter__
        for obj in iterable:
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torchvision/datasets/cifar.py", line 113, in __getitem__
        img, target = self.data[index], self.targets[index]
    IndexError: list index out of range
    [1]+ Killed python main.py

    opened by heng-yin 2
Owner

Peng Gao, Young Scientist at Shanghai AI Lab