CLIP4CMR
A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval
The original data and pre-computed CLIP features are available here. The train.pkl and test.pkl files contain raw image pixel features and text token-ID features, while clip_train.pkl and clip_test.pkl contain 1024-dimensional CLIP image and text features.
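Below is a minimal sketch of how the pre-computed CLIP features might be loaded and inspected with Python's pickle module. The dictionary keys used here (image, text, label) are assumptions for illustration only; check the actual contents of the files to confirm their structure.

```python
import pickle

# Load the pre-extracted CLIP features (a sketch; file must be downloaded first).
with open("clip_train.pkl", "rb") as f:
    data = pickle.load(f)

# Inspect the top-level structure to see the real keys/fields.
print(type(data))
if isinstance(data, dict):
    print(list(data.keys()))

# Hypothetical keys -- adjust to match the actual pickle layout:
# image_feats = data["image"]   # expected shape: (N, 1024) CLIP image features
# text_feats  = data["text"]    # expected shape: (N, 1024) CLIP text features
# labels      = data["label"]   # expected shape: (N,) class labels for supervised retrieval
```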