A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

Overview

CLIP4CMR

The original data and pre-calculated CLIP features are available here. The train.pkl and test.pkl files contain image pixel features and text id features, while clip_train.pkl and clip_test.pkl contain 1024-dimensional image and text features.
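
The internal layout of these pickle files is not documented here, so a reasonable first step is to load one and inspect its top-level structure. The following is a minimal sketch using only the Python standard library; the helper names `load_pickle` and `describe` are hypothetical and not part of the repository, and the file may need NumPy or PyTorch installed to unpickle if it stores arrays or tensors.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a single pickled object from `path` (only use on trusted files)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def describe(obj, name="root", depth=0, max_depth=2):
    """Return text lines summarizing the type and size of `obj`.

    Objects with a `.shape` attribute (e.g. arrays, tensors) report their shape;
    other sized objects report their length. Dicts are expanded up to `max_depth`.
    """
    shape = getattr(obj, "shape", None)
    if shape is not None:
        size = f"shape={tuple(shape)}"
    elif hasattr(obj, "__len__"):
        size = f"len={len(obj)}"
    else:
        size = ""
    lines = ["  " * depth + f"{name}: {type(obj).__name__} {size}".rstrip()]
    if isinstance(obj, dict) and depth < max_depth:
        for key, value in obj.items():
            lines.extend(describe(value, str(key), depth + 1, max_depth))
    return lines


# Example usage (assumes clip_train.pkl has been downloaded to the working directory):
#   print("\n".join(describe(load_pickle("clip_train.pkl"))))
```

If the description above holds, the feature entries in clip_train.pkl and clip_test.pkl should show a last dimension of 1024. Because unpickling can execute arbitrary code, only load files obtained from a source you trust.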

You might also like...
Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

[Preprint] "Bag of Tricks for Training Deeper Graph Neural Networks A Comprehensive Benchmark Study" by Tianlong Chen*, Kaixiong Zhou*, Keyu Duan, Wenqing Zheng, Peihao Wang, Xia Hu, Zhangyang Wang

Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study Codes for [Preprint] Bag of Tricks for Training Deeper Graph

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Awesome Pretrained StyleGAN2 A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution. Note the readme is a

🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

PyTorch implementation of OpenAI's Finetuned Transformer Language Model This is a PyTorch implementation of the TensorFlow code provided with OpenAI's

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

DziriBERT DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect. It handles Algerian

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Dynamic-Vision-Transformer (Pytorch) This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT). Not All Ima

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Comments
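  • Question about the BN layer setup

    Hello. "A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval" is a very valuable piece of work, and I have studied the experimental code you provided carefully. I did notice one small issue, and addressing it may help avoid anomalies in future research.

    Your implementation of the missing-data scenario appears to work as follows:
    (1) the image and text modalities of every sample in a batch are all passed through the model;
    (2) the outputs are then sampled so that a given percentage of the data is treated as missing when computing the loss.

    This approach is concise because the dataset does not have to be split into missing and non-missing parts in advance. However, it seems to have a problem: it must assume that no information is exchanged between different samples in a batch during the forward pass. If such an exchange does occur, then discarding part of the outputs before computing the loss is not enough. The discarded samples take no part in the loss, but they have already influenced the other samples during the forward pass, so they are arguably not missing in the strict sense (they remain visible). A typical source of this kind of exchange is a BN layer, which computes the mean and variance over the batch.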

    opened by FutureTwT 1
  • Different Results than reported

    Hey @zhixiongz

    I like your work and tried to reproduce the numbers on the NUS-WIDE dataset for the different losses. On line 169 of main.py, betas is used as a variable but is never initialized anywhere. I kept the default beta values of the Adam optimizer to generate results. My results differ from, and are lower than, the values reported in Table 2.

    https://github.com/zhixiongz/CLIP4CMR/blob/ca75cfccc3486263b0d5f116cf546798cf31f572/main.py#L169

    Please update the code for the values that you used to generate the numbers in Table 2.

    Thanks and Regards Shivangi

    opened by shivangibithel 1
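  • Visualizing the common space

    Hello, thank you very much for open-sourcing the code and for the detailed description in the paper; I have learned a lot from both. I have a small question about visualization. In Figure 5 of the paper, you analyze and visualize the differences between the common representation spaces obtained with the pair-wise loss and the class-wise loss, but in the source code I could only find plotting methods for the loss and mAP. Could you share the code for visualizing the intra-class and inter-class image-text distances in the common representation space? Many thanks!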

    opened by kxkaixin 1
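
The concern raised in the first comment above, that batch normalization lets discarded samples influence the retained ones, can be illustrated with a toy example. The sketch below is not the repository's code: it hand-writes a one-dimensional batch normalization (the function `batch_norm` and the sample values are assumptions made for illustration) and compares dropping a sample after the forward pass with dropping it before.

```python
from statistics import fmean


def batch_norm(values, eps=1e-5):
    """Normalize a 1-D batch using its own mean and biased variance,
    as a BN layer does in training mode (no learnable scale or shift)."""
    mean = fmean(values)
    var = fmean([(v - mean) ** 2 for v in values])
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Four samples; the last one is meant to be "missing".
batch = [1.0, 2.0, 3.0, 100.0]
keep = [0, 1, 2]

# Strategy A: forward the whole batch, then discard the missing sample's output.
forward_then_drop = [batch_norm(batch)[i] for i in keep]

# Strategy B: discard the missing sample first, then forward only the rest.
drop_then_forward = batch_norm([batch[i] for i in keep])

# If the dropped sample were truly invisible, both strategies would agree.
max_gap = max(abs(a - b) for a, b in zip(forward_then_drop, drop_then_forward))
print(f"largest difference between the two strategies: {max_gap:.3f}")
```

The two strategies disagree because the batch mean and variance in strategy A include the discarded sample. Layers that act on each sample independently do not have this effect, and in PyTorch a BatchNorm layer in eval mode uses its running statistics rather than the current batch's statistics.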
PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M2HSE) PyTorch code fo

Xinlei-Pei 6 Dec 23, 2022
[ICCV-2021] An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation

An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation (ICCV 2021) Introduction This is an official pytorch implemen

rongchangxie 42 Jan 4, 2023
Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"

Ancient Greek BERT The first and only available Ancient Greek sub-word BERT model! State-of-the-art post fine-tuning on Part-of-Speech Tagging and Mor

Pranaydeep Singh 22 Dec 8, 2022
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

null 43 Nov 21, 2022
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

ROSITA News & Updates (24/08/2021) Release the demo to perform fine-grained semantic alignments using the pretrained ROSITA model. (15/08/2021) Releas

Vision and Language Group@ MIL 48 Dec 23, 2022
Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

Transformers for variable misuse, function naming and code completion tasks The official PyTorch implementation of: Empirical Study of Transformers fo

Bayesian Methods Research Group 56 Nov 15, 2022
Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

CMIC-Retrieval Code for Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning. ICCV 2021. Introduction In this wo

null 42 Nov 17, 2022
Saeed Lotfi 28 Dec 12, 2022
Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

MultiMAE: Multi-modal Multi-task Masked Autoencoders Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir Website | arXiv | BibTeX Official PyTo

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 385 Jan 6, 2023
The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization [Paper] accepted at the EMNLP 2021: Vision Guided Genera

CAiRE 42 Jan 7, 2023