The PyTorch code of "Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification", CVPR 2022 (Oral).

Overview

DeepBDC for few-shot learning

Introduction

In this repo, we provide the implementation of the following paper:
"Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification" [Project] [Paper].

In this paper, we propose deep Brownian Distance Covariance (DeepBDC) for few-shot classification. DeepBDC can effectively learn image representations by measuring, for the query and support images, the discrepancy between the joint distribution of their embedded features and the product of the marginals. The core of DeepBDC is formulated as a modular and efficient layer that can be flexibly inserted into deep networks, suitable not only for meta-learning frameworks based on episodic training, but also for the simple transfer learning (STL) framework of pretraining plus a linear classifier.
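
For intuition, the BDC layer's forward pass can be sketched from Eq. (6) of the paper: squared pairwise Euclidean distances obtained from the Gram matrix, an element-wise square root, and double centering. The snippet below is a minimal illustration under assumed conventions; the tensor layout, function name, and epsilon are not taken from the official implementation, which also involves a learned temperature parameter (see the temperature question in the Comments section below):

import torch

def bdc_matrix(X: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Illustrative BDC computation (a sketch, not the official layer).

    X: (batch, d, n) feature map with d (reduced) channels and n = h*w
    spatial positions -- this layout is an assumption for the sketch.
    Returns a (batch, d, d) BDC matrix per image.
    """
    # Squared Euclidean distances between channel vectors via the Gram matrix:
    # ||x_i - x_j||^2 = G_ii + G_jj - 2 * G_ij
    G = X @ X.transpose(1, 2)                      # (batch, d, d) Gram matrix
    diag = torch.diagonal(G, dim1=1, dim2=2)       # (batch, d) squared norms
    sq_dist = diag.unsqueeze(2) + diag.unsqueeze(1) - 2.0 * G
    A = torch.sqrt(sq_dist.clamp_min(0.0) + eps)   # Euclidean distance matrix
    # Double centering: subtract row and column means, add back the grand mean.
    return (A - A.mean(dim=2, keepdim=True)
              - A.mean(dim=1, keepdim=True)
              + A.mean(dim=(1, 2), keepdim=True))

The BDC matrix serves as the image representation, and the discrepancy between a query image and a support image then reduces to an inner product of their BDC matrices.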

If you find this repo helpful for your research, please consider citing our paper:

@inproceedings{DeepBDC-CVPR2022,
    title={Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification},
    author={Jiangtao Xie and Fei Long and Jiaming Lv and Qilong Wang and Peihua Li}, 
    booktitle={CVPR},
    year={2022}
 }

Few-shot classification results

Experimental results on miniImageNet and CUB. We report average accuracy (with 95% confidence intervals) over 2,000 randomly sampled episodes, for both 1-shot and 5-shot evaluation. More details on the experiments can be found in the paper.
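
For reference, a minimal sketch of how such confidence intervals are commonly computed (a hypothetical helper, not code from this repo):

import numpy as np

def mean_and_ci95(episode_accuracies):
    """Mean accuracy and 95% confidence interval over sampled episodes."""
    accs = np.asarray(episode_accuracies, dtype=np.float64)
    mean = accs.mean()
    # 1.96 is the normal-approximation z-score for a two-sided 95% interval.
    ci95 = 1.96 * accs.std(ddof=1) / np.sqrt(len(accs))
    return mean, ci95

# e.g. over 2,000 episodes this yields entries such as "67.34 ± 0.43".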

miniImageNet

| Method (ResNet-12) | 5-way-1-shot | 5-way-5-shot | Pre-trained model (GoogleDrive / BaiduCloud) | Meta-trained model (GoogleDrive / BaiduCloud) |
|---|---|---|---|---|
| ProtoNet | 62.11 ± 0.44 | 80.77 ± 0.30 | Download / Download | Download / Download |
| Good-Embed | 64.98 ± 0.44 | 82.10 ± 0.30 | Download / Download | N/A |
| Meta DeepBDC | 67.34 ± 0.43 | 84.46 ± 0.28 | Download / Download | Download / Download |
| STL DeepBDC | 67.83 ± 0.43 | 85.45 ± 0.29 | Download / Download | N/A |

Note that for Good-Embed and STL DeepBDC, a sequential self-distillation technique is used to obtain the pre-trained models; see the Good-Embed paper for details.

CUB

| Method (ResNet-18) | 5-way-1-shot | 5-way-5-shot | Pre-trained model (GoogleDrive / BaiduCloud) | Meta-trained model (GoogleDrive / BaiduCloud) |
|---|---|---|---|---|
| ProtoNet | 80.90 ± 0.43 | 89.81 ± 0.23 | Download / Download | Download / Download |
| Good-Embed | 77.92 ± 0.46 | 89.94 ± 0.26 | Download / Download | N/A |
| Meta DeepBDC | 83.55 ± 0.40 | 93.82 ± 0.17 | Download / Download | Download / Download |
| STL DeepBDC | 84.01 ± 0.42 | 94.02 ± 0.24 | Download / Download | N/A |

Note that for Good-Embed and STL DeepBDC, a sequential self-distillation technique is used to obtain the pre-trained models; see the Good-Embed paper for details, and the sketch of one distillation step below.
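
Concretely, in each distillation generation a frozen teacher supervises a freshly initialized student through cross-entropy on the labels plus KL divergence to the teacher's softened logits. Below is a minimal sketch of one such step following the Good-Embed recipe; the function name and the values of T and alpha are illustrative assumptions, not settings from this repo:

import torch
import torch.nn.functional as F

def distill_step(student, teacher, images, labels, T=4.0, alpha=0.5):
    """One (self-)distillation step: cross-entropy on the ground-truth labels
    plus KL divergence to the frozen teacher's softened predictions."""
    with torch.no_grad():
        t_logits = teacher(images)              # teacher is kept frozen
    s_logits = student(images)
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kd      # loss to backpropagate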

References

[BDC] G. J. Székely and M. L. Rizzo. Brownian distance covariance. Annals of Applied Statistics, 3:1236–1265, 2009.
[ProtoNet] J. Snell, K. Swersky, and R. Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
[Good-Embed] Y. Tian, Y. Wang, D. Krishnan, J. B. Tenenbaum, and P. Isola. Rethinking few-shot image classification: a good embedding is all you need? In ECCV, 2020.

Implementation details

Datasets

  • miniImageNet: We use the splits provided by Chen et al.
  • CUB: We use the splits provided by Chen et al.
  • tieredImageNet
  • Aircraft
  • Cars

Implementation environment

Note that the test accuracy may vary slightly with different PyTorch/CUDA versions, GPUs, etc.

  • Linux
  • Python 3.8.3
  • torch 1.7.1
  • GPU (RTX 3090) + CUDA 11.0 + cuDNN
  • scikit-learn 1.0.1, Pillow 8.0.0, NumPy 1.19.2

Installation

  • Clone this repo:
git clone https://github.com/Fei-Long121/DeepBDC.git
cd DeepBDC

For Meta DeepBDC on general object recognition

  1. cd scripts/mini_imagenet/run_meta_deepbdc
  2. modify the dataset path in run_pretrain.sh, run_metatrain.sh and run_test.sh
  3. bash run.sh

For STL DeepBDC on general object recognition

  1. cd scripts/mini_imagenet/run_stl_deepbdc
  2. modify the dataset path in run_pretrain.sh, run_distillation.sh and run_test.sh
  3. bash run.sh

Acknowledgments

Our code builds upon the following publicly available code:

Contact

If you have any questions or suggestions, please contact us:

Fei Long ([email protected])
Jiaming Lv ([email protected])

Comments
  • Relationship to the style Gram matrix

    Here are some personal thoughts and questions; could the authors comment? As I understand it, the core of this work is to take the feature map of a pre-trained model (denoted X) and compute BDC on it to extract image features; BDC is essentially the Euclidean distances between the columns of X after some normalization. In neural style transfer, the Gram matrix G = (X^T) X is used to describe the style of an image. The first line of Eq. (6) in the paper likewise uses A = G_ii + G_jj - 2*G_ij; the second and third lines apply a square root and remove the row and column means, respectively. One can then imagine that if two images have very similar Gram matrices, the features extracted by the BDC module will also be very similar, i.e., they will be regarded as the same class. And we know the Gram matrix relates to texture and style rather than content. Can the extracted features therefore be understood as essentially capturing image texture or style?

    opened by weifengchiu 7
  • Question about DropBlock in ResNet-12

    First, thank you for your wonderful paper and tailored code.

    I found some differences between the paper and the code. In the paper, DropBlock regularization is used for ResNet-12 during training. However, in the code (ResNet-12 in models/backbones/ResNet.py), the drop_rate is always 0, so DropBlock is never applied.

    In addition, when the drop rate was set to 0.1, performance was lower than when DropBlock was not used. Could you explain this phenomenon?

    opened by leesb7426 4
  • About the temperature attribute of the BDC class

    In the constructor of the BDC class, the last few lines assign the temperature attribute:

    self.temperature = nn.Parameter(torch.log((1. / (2 * input_dim[1]*input_dim[2])) * torch.ones(1,1)), requires_grad=True)

    Which equation in the paper does this parameter correspond to?

    opened by smhhyyz 2
  • Do you use a single checkpoint to evaluate arbitrary-shot episodes?

    Hello,

    Congratulations on your paper being accepted as an oral presentation!

    I would like to ask a simple question about the implementation: to evaluate both 1-shot and 5-shot episodes on the test set, do you use the one model checkpoint that had the best validation accuracy on 1-shot episodes?

    Thank you for making the code publicly available! Have a great day! 😃

    Best, Dahyun

    opened by dahyun-kang 2
  • Self-distillation

    Hi, thanks for publishing the code. It is really interesting work!

    I have a question about the self-distillation used in the pre-training stage. It looks like the teacher model is fixed at the beginning and its knowledge is distilled into a new student model over the iterations. So it is not really sequential (i.e., the previous student does not become the teacher in the next generation)?

    opened by RongKaiWeskerMA 1
  • Inconsistency between the dataset description in the README and issue #9

    I used the miniImageNet download link (data extracted from ILSVRC2012) provided by the repo https://github.com/yaoyao-liu/mini-imagenet-tools, which you pointed to in issue #9, but I could not reproduce the results in your paper. I also noticed that the link given in the Datasets section of the README, https://github.com/wyharveychen/CloserLookFewShot, uses data extracted from ILSVRC2015. Which of the two was used to obtain the results in the paper? Thanks!

    opened by layers33 2
  • Some questions w.r.t. the comparison between the proposed method (DeepBDC) and DeepEMD

    Hi, I find this work quite interesting and I have a small question about the comparison between the proposed method and DeepEMD.

    Specifically, in Table 1 of the paper, DeepEMD is listed as modeling the joint distribution ("yes") while its dependency entry is "N/A" (a screenshot was attached to the original issue).

    However, the dependency of DeepEMD does not seem to be discussed anywhere else (apologies if I missed it), and all other comparisons with DeepEMD are based on latency. Could you explain why DeepEMD, as a method that can model the joint distribution, cannot model any dependency? In other words, what is the difference between DeepEMD and the proposed method that lets the proposed method model both linear and nonlinear dependency while DeepEMD models none?

    Thanks in advance for your help!

    opened by Harryqu123 0
  • Could you provide the code and hyperparameters for tieredImageNet?

    Hi, the repo only provides scripts and hyperparameters for miniImageNet and CUB. Could you provide the hyperparameter scripts for tieredImageNet? Is the tieredImageNet code (mainly the dataset-related code) the same as for the other two datasets?

    Also, in the pre-training stage of STL DeepBDC, is the batch size 64 for both CUB and tieredImageNet?

    opened by xue19890510 3