CMIC-Retrieval

Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021.

Overview

Introduction

In this work, we tackle the problem of single-image-based 3D shape retrieval (IBSR), where we seek to find the most closely matched shape for a given 2D image from a shape repository. Most existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning with a triplet loss. Inspired by the recent success of contrastive learning in self-supervised representation learning, we propose a novel IBSR pipeline that leverages contrastive learning. We note that adapting such cross-modal contrastive learning between 2D images and 3D shapes to IBSR tasks is non-trivial and challenging: contrastive learning requires very strong data augmentation of the constructed positive pairs to learn feature invariance, whereas traditional metric learning works do not have this requirement. Moreover, object shape and appearance are entangled in 2D query images, which makes the learning task more difficult than contrasting single-modal data. To mitigate these challenges, we propose to use multi-view grayscale rendered images from the 3D shapes as a shape representation. We then introduce a strong data augmentation technique based on color transfer, which can significantly yet naturally change the appearance of the query image, effectively satisfying the needs of contrastive learning. Finally, we propose to incorporate a novel category-level contrastive loss that helps distinguish similar objects from different categories, in addition to the classic instance-level contrastive loss. Our experiments demonstrate that our approach achieves the best performance on all three popular IBSR benchmarks, including Pix3D, Stanford Cars, and Comp Cars, outperforming the previous state-of-the-art by 4%–15% in retrieval accuracy.
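
To make the color-transfer augmentation concrete, here is a minimal sketch of one common recipe (Reinhard-style Lab statistics matching), which shifts the query image's color statistics toward those of a randomly chosen reference image. The function name and the use of OpenCV are illustrative assumptions; the paper's exact augmentation may differ.

```python
import cv2
import numpy as np

def color_transfer(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Shift the per-channel Lab statistics of `source` toward those of
    `reference` (Reinhard-style color transfer). Both inputs are uint8 RGB.
    Illustrative recipe only, not necessarily the paper's exact one."""
    src = cv2.cvtColor(source, cv2.COLOR_RGB2LAB).astype(np.float32)
    ref = cv2.cvtColor(reference, cv2.COLOR_RGB2LAB).astype(np.float32)
    # Match mean and standard deviation channel-wise.
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1)) + 1e-6
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    out = (src - src_mean) / src_std * ref_std + ref_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2RGB)
```

Likewise, the two losses can be pictured as InfoNCE-style objectives: the instance-level term treats the i-th image and i-th shape in a batch as the only positive pair, while the category-level term counts every shape sharing the query's category label as a positive, in the spirit of supervised contrastive learning. The following is a hedged PyTorch sketch under those assumptions, not the repository's actual implementation; `tau` and `lam` are hypothetical hyper-parameters.

```python
import torch
import torch.nn.functional as F

def instance_contrastive(img_feat, shape_feat, tau=0.07):
    """Instance-level InfoNCE: the i-th image in the batch should match
    only the i-th shape embedding."""
    img_feat = F.normalize(img_feat, dim=1)
    shape_feat = F.normalize(shape_feat, dim=1)
    logits = img_feat @ shape_feat.t() / tau                     # (B, B)
    targets = torch.arange(img_feat.size(0), device=img_feat.device)
    return F.cross_entropy(logits, targets)

def category_contrastive(img_feat, shape_feat, labels, tau=0.07):
    """Category-level term: every shape sharing the query image's category
    label is treated as a positive (supervised-contrastive style)."""
    img_feat = F.normalize(img_feat, dim=1)
    shape_feat = F.normalize(shape_feat, dim=1)
    logits = img_feat @ shape_feat.t() / tau
    pos = (labels.unsqueeze(1) == labels.unsqueeze(0)).float()   # positive mask
    log_prob = F.log_softmax(logits, dim=1)
    # Average the log-likelihood over each row's positives.
    return -(log_prob * pos).sum(1).div(pos.sum(1)).mean()

# Hypothetical combination; `lam` is an assumed weighting hyper-parameter:
# loss = instance_contrastive(f_img, f_shape) + lam * category_contrastive(f_img, f_shape, cats)
```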

About this repository

This repository provides data, pre-trained models and code.

Citations

@inproceedings{lin2021cmic,
	title={Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning},
	author={Lin, Ming-Xian and Yang, Jie and Wang, He and Lai, Yu-Kun and Jia, Rongfei and Zhao, Binqiang and Gao, Lin},
	year={2021},
	booktitle={International Conference on Computer Vision (ICCV)}
}

Updates

  • [Oct 1, 2021] Preliminary version of the data and code released. More code and data are coming soon; please follow our updates.

Comments
  • Where is the 'Models'?

    Hi @tommaoer, thank you for the repo. As shown in (https://github.com/IGLICT/IBSR_jittor/blob/main/code/RetrievalNet.py#L3), the code does 'from Models import RetrievalNet', but I cannot find that file. Where are Models.py and RetrievalNet?

    opened by tianbao-li 1
  • Missing code and files for reproducibility

    Hello, thanks for sharing your work!

    As I'm trying to reproduce your results on Pix3D, could you provide the important data files, including rendering_pix3d.pkl, pix3d_train.json, pix3d_test.json, the predicted masks from OCRNet, and so on? It would also be very helpful if you could share more details about how you trained OCRNet on Pix3D, along with the pretrained OCRNet model.

    Thanks!

    opened by qiruiw 0
  • Dataset Splits for Pix3D

    Hello! Could you please provide the train/test splits that you used in your paper for the Pix3D dataset? Also, I noticed that your paper says you use 94 3D shapes from Comp Cars 3D, whereas the original Comp Cars 3D repository has 98 3D shapes (annotations are here). Is there a specific reason for removing 4 of the 3D shapes?

    Thanks a lot! Konstantinos

    opened by ktertikas 1
  • There are some files missing

    Hi, thanks for sharing your work. However, some files are not contained in this repo, such as rendering_pix3d.pkl, pix3d_train.json, pix3d_test.json, and the pretrained models. Could you upload these files if possible? Thanks!

    opened by datar001 1
Owner
Intelligent Graphics Laboratory, Institute of Computing Technology