Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR 2022)

Overview

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (CVPR'22)

Paper Link | Project Page

Abstract:

Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object classification, segmentation and detection is often laborious owing to the irregular structure of point clouds. Self-supervised learning, which operates without any human labeling, is a promising approach to address this issue. We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. It enables a 3D-2D correspondence of objects by maximizing agreement between point clouds and the corresponding rendered 2D image in the invariant space, while encouraging invariance to transformations in the point cloud modality. Our joint training objective combines the feature correspondences within and across modalities, thus ensembling a rich learning signal from both 3D point cloud and 2D image modalities in a self-supervised fashion. Experimental results show that our approach outperforms previous unsupervised learning methods on a diverse range of downstream tasks including 3D object classification and segmentation. Further, the ablation studies validate the potency of our approach for better point cloud understanding.
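
The joint objective described above can be sketched as an NT-Xent contrastive loss applied twice: once between two augmented views of the same point cloud (intra-modal) and once between the point cloud embedding and its rendered-image embedding (cross-modal). The following is a minimal PyTorch sketch of that idea, not the repository's exact code; the function name nt_xent and the temperature value are illustrative.

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    # NT-Xent loss between two batches of projected features (B, D);
    # matching rows of z1 and z2 form the positive pairs.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Symmetric: each view must retrieve its counterpart in the other view.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Joint training objective (sketch):
# loss = nt_xent(feat_pc_aug1, feat_pc_aug2)      # intra-modal
#      + nt_xent(feat_pc, feat_img)               # cross-modal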

Citation

If you find our work, this repository, or the pretrained models useful, please consider giving a star and a citation.

@inproceedings{afham2022crosspoint,
    title={CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding},
    author={Mohamed Afham and Isuru Dissanayake and Dinithi Dissanayake and Amaya Dharmasiri and Kanchana Thilakarathna and Ranga Rodrigo},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2022}
}

Dependencies

Refer to requirements.txt for the required packages.

Pretrained Models

CrossPoint pretrained models with the DGCNN feature extractor are available here.

Download data

Datasets are available here. Run the commands below to download all the datasets (ShapeNetRender, ModelNet40, ScanObjectNN, ShapeNetPart) needed to reproduce the results.

cd data
source download_data.sh

Train CrossPoint

Refer to scripts/script.sh for the commands to train CrossPoint.

Downstream Tasks

1. 3D Object Classification

Run the eval_ssl.ipynb notebook to perform linear SVM object classification on both the ModelNet40 and ScanObjectNN datasets.
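
For orientation, the evaluation protocol amounts to freezing the pretrained encoder, extracting global features for each split, and fitting a linear SVM on the train-split features. Below is a minimal sketch using scikit-learn; the encoder's output shape and the dataloader format are assumptions, not guaranteed to match the notebook.

import numpy as np
import torch
from sklearn.svm import LinearSVC

@torch.no_grad()
def extract_features(encoder, loader, device="cuda"):
    # Run the frozen encoder over a dataloader and stack global features.
    encoder.eval()
    feats, labels = [], []
    for points, label in loader:          # assumed: loader yields (B, N, 3) clouds
        f = encoder(points.to(device))    # assumed: encoder returns (B, D) features
        feats.append(f.cpu().numpy())
        labels.append(label.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# Fit a linear SVM on frozen train-split features, score on the test split:
# train_feats, train_labels = extract_features(encoder, train_loader)
# test_feats, test_labels = extract_features(encoder, test_loader)
# svm = LinearSVC(C=0.1).fit(train_feats, train_labels)
# print("Linear accuracy:", svm.score(test_feats, test_labels))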

2. Few-Shot Object Classification

Refer to scripts/fsl_script.sh to perform few-shot object classification.

3. 3D Object Part Segmentation

Refer to scripts/script.sh for the fine-tuning experiment for part segmentation on the ShapeNetPart dataset.

Acknowledgements

Our code borrows heavily from the DGCNN repository. We thank the authors of DGCNN for releasing their code. If you use our model, please consider citing them as well.

Comments
• It seems that the pretrained model you provide has a gap on ModelNet40

    Hi, I used your pretrained model to directly test linear accuracy on ModelNet40 and got 90.27%, the same as when I ran train_crosspoint.py without any initialization, but the paper reports 91.2%. So I want to know: are there any tricks in your code, or does it mean I should train further based on your pretrained model? I look forward to your answer.

    opened by Zscozer 3
  • How did you get the 2D images corresponding to the ModelNet40 and ScanObjectNN point cloud data?

    Hello, dear author! How did you get the 2D images corresponding to the ModelNet40 and ScanObjectNN point cloud data? The content of eval_ssl.ipynb is hard to follow; could you provide the original .py code?

    opened by 2311762665 3
  • Downstream tasks: 3D object classification

    Thanks for your great work! I'm confused about why you fit a simple linear SVM classifier on the train split of the classification datasets for 3D object classification. Where can I find the corresponding code?

    opened by curryanswer 3
  • Which variant do we use in few-shot learning on ScanObjectNN?

    Hi, thank you for sharing such excellent results!

    I would like to ask which variant you use in few-shot learning on ScanObjectNN:

    OBJ_ONLY, OBJ_BG, PB_T25, PB_T25_R, PB_T50_R, PB_T50_RS

    Looking forward to your response, thank you.

    opened by TangYuan96 2
  • Can't download the dataset using gdown

    When using download_data.sh, it raises the error: requests.exceptions.MissingSchema: Invalid URL '': No scheme supplied. Perhaps you meant http://?

    How can I use gdown to download the dataset? (A possible workaround is sketched at the end of this comment.)

    opened by Phoebe-ovo 2
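
    A possible workaround, assuming the script ends up passing an empty URL, is to call gdown directly with the file ID copied from the Google Drive share link. A minimal sketch; the file ID and output name below are placeholders, not the dataset's real values:

    import gdown

    # Placeholder: copy the real ID from the Drive share link
    # (the string after /d/ or ?id=).
    file_id = "<GOOGLE_DRIVE_FILE_ID>"
    gdown.download(f"https://drive.google.com/uc?id={file_id}",
                   "dataset.zip", quiet=False)
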
  • Can train_crosspoint.py train the partseg model based on ShapeNetPart?

    @MohamedAfham Thank you for releasing the code. The paper is well written and the code is robust.

    I have successfully trained the classification and part segmentation models based on train_crosspoint.py and train_partseg.py, respectively. Everything went smoothly.

    One point I'm confused about is the comments in scripts/script.sh: you point out that train_crosspoint.py can be used for training the part segmentation model and that train_partseg.py is used for fine-tuning it. The code in train_crosspoint.py, however, only loads ShapeNetRender for pretraining and ModelNet40 for linear accuracy evaluation; it does not actually load ShapeNetPart for part segmentation.

    Instead, I think both training and fine-tuning take place in train_partseg.py, as the train_loader in that file is designed for ShapeNetPart. Further, I think the self-supervised cross-modal contrastive learning is intended for point cloud classification. Is my understanding correct?

    opened by auniquesun 2
  • What's the GPU device used during your training and finetuning?

    As the title says, I wonder which GPU device you used to support batch_size=20.

    I use an RTX 2080 Ti, which has 11 GB of memory. When running train_crosspoint.py, I have to set batch_size=2 to avoid CUDA out-of-memory errors since, as you know, knn and torch.cat in models/dgcnn.py consume a large portion of memory.

    However, the small batch_size makes the training procedure much slower, so getting the final results would probably take 4 or 5 days.

    By the way, I have multiple GPUs; is it possible to incorporate DistributedDataParallel to accelerate the training procedure? (A generic DDP sketch follows at the end of this comment.)

    Anyway, I will try it out!

    opened by auniquesun 1
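
    For reference, one common way to hook PyTorch DistributedDataParallel into a training script looks like the sketch below; this is standard PyTorch boilerplate, not code from this repository, and the helper name setup_ddp is illustrative.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler

    def setup_ddp(model, dataset, batch_size=20):
        # One process per GPU; assumes launch via `torchrun`, which sets LOCAL_RANK.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        model = DDP(model.cuda(local_rank), device_ids=[local_rank])
        sampler = DistributedSampler(dataset)   # shards the data across processes
        loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
        return model, loader, sampler

    # Launch: torchrun --nproc_per_node=2 train_crosspoint.py ...
    # Call sampler.set_epoch(epoch) at the start of every epoch for proper shuffling.
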
  • get_graph_feature adds tensors on two different devices

    Thank you for your contributions. Very interesting work! I have two GPUs and I'm trying to run CrossPoint pre-training for classification using:

    python train_crosspoint.py --model dgcnn --epochs 100 --lr 0.001 --exp_name crosspoint_dgcnn_cls --batch_size 20 --print_freq 200 --k 15

    And I'm getting the following error:

    Traceback (most recent call last):
      File "train_crosspoint.py", line 258, in <module>
        train(args, io)
      File "train_crosspoint.py", line 100, in train
        _, point_feats, _ = point_model(data)
      File "/home/nas/anaconda3/envs/crosspoint/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/nas/Desktop/CrossPoint/models/dgcnn.py", line 95, in forward
        x = get_graph_feature(x, k=self.k)
      File "/home/nas/Desktop/CrossPoint/models/dgcnn.py", line 31, in get_graph_feature
        idx = idx + idx_base
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! 
    

    Is there a reason for hardcoding the CUDA device to 1 here: https://github.com/MohamedAfham/CrossPoint/blob/440e3bdf1656014eb4284786a6b2bcdf83e8df30/models/dgcnn.py#L27 (a device-agnostic fix is sketched at the end of this comment)?

    opened by nazMahmoud 1
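
    A likely device-agnostic fix is to build idx_base on the same device as the input tensor instead of on a hardcoded one. The sketch below shows the relevant lines inside get_graph_feature, mirroring the common DGCNN pattern rather than copied from this repository:

    # Inside get_graph_feature: derive the device from the input x instead of
    # hardcoding a CUDA index, so single- and multi-GPU runs both work.
    batch_size, num_points = x.size(0), x.size(2)
    idx_base = torch.arange(0, batch_size, device=x.device).view(-1, 1, 1) * num_points
    idx = idx + idx_base
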
  • About the point cloud visualization software in Fig. 2

    Hi, Mohamed Afham! I really appreciate your great work, and I think the figures in your paper are wonderful! Could you please tell me which point cloud visualization software you used in Figure 2? It looks nice! Thanks in advance!

    opened by Bingoang 0
  • Definition of the dgcnn_seg model

    Thanks for your nice work. I am trying to reproduce your fine-tuning results on ShapeNetPart segmentation. I find that the model architectures for classification pretraining and segmentation pretraining are different. More specifically, the dgcnn model is adopted for classification pretraining, while dgcnn_seg is used for the part segmentation pretraining, as shown in the following: https://github.com/MohamedAfham/CrossPoint/blob/364987ed8ae3b9f439b2c08305746ad3c451a820/scripts/script.sh#L6

    However, I cannot find the definition of dgcnn_seg in your model library. I guess dgcnn_seg should be the DGCNN_partseg model with pretrain=True, right?

    In addition, other papers, such as OcCo, may adopt the same architecture in pretraining for both classification and part segmentation. Such a difference may lead to an unfair comparison. What's your opinion?

    opened by YBZh 0
  • Relatively large performance gap on ScanObjectNN

    @MohamedAfham Recently, I have run all the experiments in the codebase at least 3 times to make sure there were no obvious mistakes in my operations.

    Some of the results are very encouraging, i.e., they are comparable with those reported in the paper, sometimes even higher, e.g., the reproduced results on ModelNet. But some are not.

    Specifically, for the downstream task of few-shot classification on ScanObjectNN, the performance gap is relatively large, e.g.:

    1. for 5 way, 10 shot, I got 72.5 ± 8.33,
    2. for 5 way, 20 shot, I got 82.5 ± 5.06,
    3. for 10 way, 10 shot, I got 59.4 ± 3.95,
    4. for 10 way, 20 shot, I got 67.8 ± 4.41

    For the downstream task of linear SVM classification on ScanObjectNN, the reproduced performance is 75.73%. All experiments use the DGCNN backbone and default settings except for the batch size.

    In short, all of these results fall behind the performances reported on ScanObjectNN in the paper, by a large margin.

    At this point, I wonder whether there are any precautions to take when experimenting on ScanObjectNN, and what the possible reasons might be. Can you provide some suggestions? Thank you.

    opened by auniquesun 4
  • Distributed training for CrossPoint

    @MohamedAfham I have successfully integrated the PyTorch DistributedDataParallel mechanism into your codebase, which accelerates the training procedure remarkably and achieves performance similar to that reported in the paper.

    Later on I will open a pull request to your repository. Thank you.

    opened by auniquesun 3