Official Code Release for "TIP-Adapter: Training-free clIP-Adapter for Better Vision-Language Modeling"


Pipeline of Tip-Adapter

Tip-Adapter provides fast convergence and better performance by using a cache model as the initialization of the adapter.
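The cache model can be sketched in a few lines. This is a minimal NumPy sketch of the training-free prediction, assuming L2-normalized CLIP features are already computed. The few-shot image features act as cache keys and their one-hot labels act as cache values. The cache logits exp(-beta * (1 - f @ keys.T)) @ values are then blended with CLIP's zero-shot logits using the weight alpha. The function names, the random test data, and the logit scale of 100 are assumptions for illustration. In the fine-tuned variant, the keys would instead become a trainable layer initialized from these features.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def tip_adapter_logits(test_feats, cache_keys, cache_values, clip_weights,
                       alpha=1.0, beta=1.0, logit_scale=100.0):
    """Training-free cache-model prediction (a sketch, not the repo's code).

    test_feats:   (N, D) normalized CLIP image features of the test images
    cache_keys:   (M, D) normalized CLIP image features of the few-shot images
    cache_values: (M, C) one-hot labels of the few-shot images
    clip_weights: (D, C) normalized CLIP text embeddings of the class prompts
    """
    affinity = test_feats @ cache_keys.T                             # (N, M) cosine similarity
    cache_logits = np.exp(-beta * (1.0 - affinity)) @ cache_values   # (N, C) cache model
    clip_logits = logit_scale * test_feats @ clip_weights            # (N, C) zero-shot CLIP
    return clip_logits + alpha * cache_logits


rng = np.random.default_rng(0)
keys = l2_normalize(rng.normal(size=(5, 64)))            # one cached image per class
values = np.eye(5)                                        # one-hot labels
weights = l2_normalize(rng.normal(size=(64, 5)), axis=0)  # stand-in text embeddings
print(tip_adapter_logits(keys[:2], keys, values, weights, beta=10.0).shape)  # (2, 5)
```

Setting alpha to 0 recovers plain zero-shot CLIP, and larger beta makes the cache respond only to very close neighbours.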


Peng Gao, Renrui Zhang


CLIP and CoOp

  • Are CLIP/TIP-Adapter only designed for the few-shot setting?

    Sorry, I've got another question. I did not find experiments under the base-to-new / domain generalization setting or the cross-dataset transfer setting, which CoCoOp evaluates. Are CLIP/TIP-Adapter only designed for the few-shot setting? I wonder how good their generalization abilities are. Could you give me some intuition?

    opened by machengcheng2016
  • Details of data augmentation

    In the paper, "the CLIP-style pre-processing resizes the cropped image’s short side to 224 while keeping its original aspect", and you said that you use the CLIP-style RandomResizeCrop.

    However, I found that in the code, the standard RandomResizeCrop is used.

    I wonder whether this setting is important to the final performance, or whether I have misunderstood something here.

    opened by SY-Xuan
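To make the difference concrete, here is a pure-Python sketch of the two geometries, assuming torchvision semantics. A short-side resize keeps the aspect ratio. RandomResizedCrop instead samples a box of random area and aspect ratio that is afterwards stretched to a fixed square. The helper names are hypothetical, and `random_resized_crop_box` mirrors the sampling logic of torchvision's `RandomResizedCrop.get_params` rather than calling it.

```python
import math
import random


def short_side_resize(width, height, size=224):
    """Output (width, height) after resizing the short side to `size` while
    keeping the aspect ratio (torchvision's rule for Resize with an int size)."""
    short, long = min(width, height), max(width, height)
    new_long = int(size * long / short)
    return (size, new_long) if width <= height else (new_long, size)


def random_resized_crop_box(width, height, scale=(0.08, 1.0),
                            ratio=(3 / 4, 4 / 3), rng=random):
    """Sample a crop box (left, top, w, h) the way RandomResizedCrop does.

    The box is later resized to a fixed square, so the original aspect
    ratio is generally not preserved."""
    area = width * height
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(10):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            return rng.randint(0, width - w), rng.randint(0, height - h), w, h
    # Fallback after 10 failed attempts: a central crop with clamped aspect ratio.
    in_ratio = width / height
    if in_ratio < min(ratio):
        w, h = width, int(round(width / min(ratio)))
    elif in_ratio > max(ratio):
        w, h = int(round(height * max(ratio))), height
    else:
        w, h = width, height
    return (width - w) // 2, (height - h) // 2, w, h


print(short_side_resize(640, 480))  # (298, 224): aspect ratio kept
print(random_resized_crop_box(640, 480, rng=random.Random(0)))
```

Under the short-side rule a 640x480 image always becomes 298x224 before cropping. The random crop's box shape changes from call to call.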
  • Replicating your results on the Food101 dataset

    Would you consider providing the script to replicate your results on the Food101 dataset? If someone were to adapt your ImageNet script, do you have suggestions on what must be adjusted?

    opened by yinyinl
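One step in adapting the pipeline to another dataset is rebuilding the few-shot cache from that dataset's training labels. The sketch below shows one way to draw K shots per class, assuming integer or string labels in a plain list. The function name and the seed are hypothetical, and the sketch does not reproduce the exact split files used by the repository.

```python
import random
from collections import defaultdict


def sample_few_shot(labels, shots, seed=1):
    """Return sorted dataset indices with `shots` random examples per class.

    labels: list of class labels, one per training example."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed so the same cache is rebuilt each run
    picked = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < shots:
            raise ValueError(f"class {label!r} has only {len(indices)} examples")
        picked.extend(rng.sample(indices, shots))
    return sorted(picked)


labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(len(sample_few_shot(labels, shots=2)))  # 6
```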
  • Is the adapter used in the vision encoder or the text encoder?

    Hey, thanks for the nice work. I am confused about a few things. First, why is the adapter used only in the vision encoder? Did the authors try using it in the text encoder? Second, I don't understand why using an adapter performs better than using a learnable prompt. In addition, the "adapter" used in this paper differs from the adapters used in NLP tasks, and so does its insertion position; which one is better?

    opened by jingzhengli
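For reference, CLIP-Adapter, on which Tip-Adapter builds, uses a feature adapter that sits after the frozen encoder. It blends a bottleneck MLP's output with the original feature through a residual ratio, whereas NLP-style adapters are usually inserted inside each transformer block. Below is a minimal NumPy sketch of the residual blend. The weight shapes are hypothetical, biases are omitted for brevity, and the ratio of 0.2 is only an illustrative default.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def clip_adapter_features(feats, w1, w2, ratio=0.2):
    """Residual feature adapter: ratio * A(f) + (1 - ratio) * f.

    feats: (N, D) frozen CLIP features; w1: (D, D // r) and w2: (D // r, D)
    are the two linear layers of the bottleneck MLP A."""
    adapted = relu(relu(feats @ w1) @ w2)
    return ratio * adapted + (1.0 - ratio) * feats


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))
w1 = rng.normal(size=(512, 128)) * 0.01
w2 = rng.normal(size=(128, 512)) * 0.01
print(clip_adapter_features(feats, w1, w2).shape)  # (4, 512)
```

With ratio set to 0 the adapter is bypassed and the frozen CLIP feature passes through unchanged.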
  • "Tip-Adapter/" use test features to eval

    It seems odd to use test features to eval. see Could authors give some explanation?

    opened by fikry102
  • How can this be extended to the base-to-novel classes task?

    Hi, this method modifies the parameters of the text encoder, so it cannot be extended to base-to-new classes tasks. I would like to know how to address this problem.

    opened by jingzhengli
  • Running Tip-Adapter for text2img retrieval instead

    Hi, thanks for the amazing work on adapters for CLIP. Currently the framework computes the affinities between the test query image and the cache keys before obtaining the corresponding few-shot label. This works well. I would like your advice on how I can extend this to text2img retrieval, where I would like to query with a text search term and use the cache key-value adapter to return the corresponding images. Would it be as naive as doing text-to-text embedding affinity matching between the query text and the cache VALUES (instead of the keys), since they contain the ground-truth labels for the few-shot learning?

    opened by adrielkuek
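In Tip-Adapter the cache keys are CLIP image features of the few-shot images and the cache values are their one-hot labels, so the values hold no text embeddings to match against. Because CLIP maps text and images into a shared embedding space, one straightforward option is to compare the query's text embedding directly with the cache keys. The sketch below assumes precomputed embeddings and uses a hypothetical helper name. It is plain CLIP-style retrieval over the cached images, not a method from the paper.

```python
import numpy as np


def retrieve_images(text_emb, cache_keys, top_k=5):
    """Rank cached images by cosine similarity to one text query embedding.

    text_emb:   (D,) CLIP text embedding of the query
    cache_keys: (M, D) CLIP image features of the cached few-shot images
    Returns the indices of the top_k images and their similarity scores."""
    text_emb = text_emb / np.linalg.norm(text_emb)
    keys = cache_keys / np.linalg.norm(cache_keys, axis=1, keepdims=True)
    scores = keys @ text_emb                 # (M,) cosine similarities
    order = np.argsort(-scores)[:top_k]      # best match first
    return order, scores[order]


rng = np.random.default_rng(0)
keys = rng.normal(size=(10, 512))
query = keys[3] + 0.1 * rng.normal(size=512)  # stand-in query close to image 3
idx, sims = retrieve_images(query, keys, top_k=3)
print(idx[0])  # 3
```

If labels are also wanted, the returned indices can be mapped through the cache values.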
  • The

    In the code:
        alpha_list = [i * (6.0 - 1.0) / 20 + 1 for i in range(20)]
        beta_list = [i * (7 - 0.1) / 200 + 0.1 for i in range(200)]
    In the paper: [image]

    opened by euminds
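For reference, the two lists quoted above evaluate to alpha = 1.00, 1.25, ..., 5.75 (20 values) and beta = 0.1, 0.1345, ..., about 6.97 (200 values). Because `range` excludes its endpoint, neither grid reaches its nominal upper bound of 6.0 or 7. The sketch below shows a straightforward way to search such grids. It assumes L2-normalized NumPy features, a held-out labeled split, and the cache-logit form exp(-beta * (1 - affinity)) @ values used by Tip-Adapter. The function name and the logit scale of 100 are assumptions.

```python
import numpy as np

# The two search grids quoted in the issue, copied verbatim.
alpha_list = [i * (6.0 - 1.0) / 20 + 1 for i in range(20)]
beta_list = [i * (7 - 0.1) / 200 + 0.1 for i in range(200)]


def search_alpha_beta(feats, labels, cache_keys, cache_values, clip_weights,
                      alphas=alpha_list, betas=beta_list):
    """Return (best_acc, alpha, beta) maximizing accuracy on held-out features."""
    affinity = feats @ cache_keys.T            # computed once, reused for every beta
    clip_logits = 100.0 * feats @ clip_weights
    best = (-1.0, None, None)
    for beta in betas:
        cache_logits = np.exp(-beta * (1.0 - affinity)) @ cache_values
        for alpha in alphas:
            preds = (clip_logits + alpha * cache_logits).argmax(axis=1)
            acc = float((preds == labels).mean())
            if acc > best[0]:
                best = (acc, alpha, beta)
    return best
```

Running the search on a held-out validation split rather than on the test features avoids tuning hyperparameters on test data.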
  • Bug when I try CIFAR100

    Thanks for your work. When I ran your code on CIFAR100, I got the error below and I don't know how to solve it. Because ImageNet has so many images, CIFAR100 is the only dataset I can run. Please help.

    Torch version: 1.7.1
    Namespace(alpha=1, augment_epoch=10, beta=1.17, lr=0.001, train_epoch=20)
    Model parameters: 151,277,313
    Input resolution: 224
    Context length: 77
    Vocab size: 49408
    Load data finished.
    start getting text features.
    finish getting text features.
    start getting image features
    start saving training image features
    Augment time: 0 / 10
      3%|▉ | 6/196 [00:03<01:45, 1.81it/s]
    Traceback (most recent call last):
      File "", line 487, in <module>
        main()
      File "", line 244, in main
        for i, (images, target) in enumerate(tqdm(train_loader)):
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/tqdm/", line 1180, in __iter__
        for obj in iterable:
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/", line 435, in __next__
        data = self._next_data()
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/", line 475, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torch/utils/data/_utils/", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/yh/.conda/envs/prompt/lib/python3.6/site-packages/torchvision/datasets/", line 113, in __getitem__
        img, target =[index], self.targets[index]
    IndexError: list index out of range
    [1]+ Killed python

    opened by heng-yin
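The last frame of the traceback is `self.targets[index]` raising `IndexError: list index out of range`, which means the DataLoader requested an index at or beyond the length of the dataset's label list. One possible cause, not confirmed in this thread, is an index list or split built for a different dataset. Below is a hypothetical helper for checking a sampler's indices before training.

```python
def out_of_range_indices(indices, dataset_len):
    """Return the indices outside [0, dataset_len), i.e. those that would make
    a list-backed dataset raise IndexError (or silently wrap, if negative)."""
    return [i for i in indices if not 0 <= i < dataset_len]


# CIFAR100's training split has 50,000 images.
print(out_of_range_indices([0, 49999, 50000, 120000], 50000))  # [50000, 120000]
```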
Peng Gao
Young Scientist at Shanghai AI Lab
