Focal Sparse Convolutional Networks for 3D Object Detection (CVPR 2022, Oral)

Overview

This is the official implementation of Focals Conv (CVPR 2022), a new sparse convolution design for 3D object detection that works in both LiDAR-only and multi-modal settings. For more details, please refer to:

Focal Sparse Convolutional Networks for 3D Object Detection [Paper]
Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

Experimental results

KITTI dataset

Model                                          Car@R11  Car@R40  Download
PV-RCNN + Focals Conv                          83.91    85.20    Google | Baidu (key: m15b)
PV-RCNN + Focals Conv (multi-modal)            84.58    85.34    Google | Baidu (key: ie6n)
Voxel R-CNN (Car) + Focals Conv (multi-modal)  85.68    86.00    Google | Baidu (key: tnw9)

nuScenes dataset

Model                                              mAP    NDS    Download
CenterPoint + Focals Conv (multi-modal)            63.86  69.41  Google | Baidu (key: 01jh)
CenterPoint + Focals Conv (multi-modal), 1/4 data  62.15  67.45  Google | Baidu (key: 6qsc)

Visualization of the voxel distribution of Focals Conv on the KITTI val set:

Getting Started

Installation

a. Clone this repository

git clone https://github.com/dvlab-research/FocalsConv && cd FocalsConv

b. Install the environment

Follow the installation documents of the OpenPCDet and CenterPoint codebases respectively, depending on which codebase you prefer.

*spconv 2.x is highly recommended over spconv 1.x.
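
For reference, spconv 2.x is distributed as pre-built wheels on PyPI named by CUDA version. A minimal sketch (the wheel name below is only an example; match it to your local CUDA toolkit):

# Example only: choose the spconv 2.x wheel that matches your CUDA version
# (e.g. spconv-cu111, spconv-cu113, spconv-cu114, ...).
pip install spconv-cu113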

c. Prepare the datasets.

Download and organize the official KITTI and Waymo datasets following the documentation in OpenPCDet, and the nuScenes dataset following the CenterPoint codebase.
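
For reference, a minimal sketch of the expected KITTI layout and the info-file generation step, assuming the standard OpenPCDet getting-started setup (run from the OpenPCDet root; adjust paths to your machine):

# Assumed layout (OpenPCDet convention):
#   OpenPCDet/data/kitti/ImageSets
#   OpenPCDet/data/kitti/training   (calib, image_2, label_2, velodyne, planes)
#   OpenPCDet/data/kitti/testing    (calib, image_2, velodyne)
# Generate the KITTI info files and gt database:
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml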

*Note that for the nuScenes dataset, we use image-level GT-sampling (copy-paste) in multi-modal training. Please download this dbinfos_train_10sweeps_withvelo.pkl to replace the original one. (Google | Baidu (key: b466))

*Note that for the nuScenes dataset, we conduct ablation studies on a 1/4 training split. Please download infos_train_mini_1_4_10sweeps_withvelo_filter_True.pkl if you need it for training. (Google | Baidu (key: 769e))
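
A rough sketch of where these downloaded files go, assuming the usual CenterPoint nuScenes data root (the path below is an assumption; adjust it to your own layout):

# Place the downloaded pickle files under the nuScenes data root used by CenterPoint.
NUSC_ROOT=CenterPoint/data/nuScenes
cp dbinfos_train_10sweeps_withvelo.pkl ${NUSC_ROOT}/                       # replaces the original gt-sampling infos
cp infos_train_mini_1_4_10sweeps_withvelo_filter_True.pkl ${NUSC_ROOT}/    # optional 1/4 training split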

d. Download pre-trained models.

If you want to directly evaluate the trained models we provide, please download them first.

If you want to train by yourself in the multi-modal settings, please first download the ResNet pre-trained model, torchvision-res50-deeplabv3.

Evaluation

We provide trained weight files so you can run evaluation with them directly. You can also use a model you trained yourself.

For models in OpenPCDet,

NUM_GPUS=8
cd tools 
bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/voxel_rcnn_car_focal_multimodal.yaml --ckpt path/to/voxelrcnn_focal_multimodal.pth

bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/pv_rcnn_focal_multimodal.yaml --ckpt path/to/pvrcnn_focal_multimodal.pth

bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/pv_rcnn_focal_lidar.yaml --ckpt path/to/pvrcnn_focal_lidar.pth
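
If you only have a single GPU, the non-distributed OpenPCDet entry point can be used instead (a sketch assuming the standard tools/test.py script; the batch size is only an example value):

python test.py --cfg_file cfgs/kitti_models/pv_rcnn_focal_lidar.yaml --ckpt path/to/pvrcnn_focal_lidar.pth --batch_size 4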

For models in CenterPoint,

CONFIG="nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal"
python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/dist_test.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG --checkpoint centerpoint_focal_multimodal.pth

Training

For configs in OpenPCDet,

bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/CONFIG.yaml
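
For example, to train the multi-modal PV-RCNN + Focals Conv model on KITTI with 8 GPUs (the config name matches the one used above for evaluation):

NUM_GPUS=8
cd tools
bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/pv_rcnn_focal_multimodal.yaml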

For configs in CenterPoint,

python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/train.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG
  • Note that we use 8 GPUs to train OpenPCdet models and 4 GPUs to train CenterPoint models.
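
For example, a concrete instantiation of the CenterPoint training command for the multi-modal nuScenes model with 4 GPUs (the config name is the one used in the Evaluation section):

NUM_GPUS=4
CONFIG="nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal"
python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/train.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG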

TODO List

    • Config files and trained models on the full Waymo dataset.
    • Config files and scripts for test-time augmentation (double-flip and rotation) for the nuScenes test submission.
    • Results and models of Focals Conv networks on 3D segmentation datasets.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{focalsconv-chen,
  title={Focal Sparse Convolutional Networks for 3D Object Detection},
  author={Chen, Yukang and Li, Yanwei and Zhang, Xiangyu and Sun, Jian and Jia, Jiaya},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Acknowledgement

  • This work is built upon OpenPCDet and CenterPoint. Please refer to their official GitHub repositories for more information.

  • This README follows the style of IA-SSD.

License

This project is released under the Apache 2.0 license.

Related Repos

  1. spconv
  2. Deformable Conv
  3. Submanifold Sparse Conv
Comments
  • Question about the results on nuscenes

    Thanks for your inspiring work.

    I have some questions concerning the results on nuscene.

    When do you impose the focal loss [loss_box_of_pts] on nuScenes? I notice that you only employ it with modality fusion but disable it in the LiDAR-only setting. Is there any reason behind this?

    opened by Solacex 14
  • I can't reproduce the multi-modal accuracy (KITTI val split, AP3D(R11)) you mentioned in the paper

    My val result (90 epochs), 3D AP: 89.3103, 85.0520, 79.2089

    Your val result in Table 8 (Focals Conv-F, 3D): 89.82, 85.22, 85.19

    I wonder why the accuracy does not improve on the hard split, and how many epochs did you train?

    opened by qimingx 11
  • Backward error when training on nuScenes

    Thanks for contributing this wonderful work.

    Previously, when I ran Focals Conv on KITTI, everything was OK. However, when I try to train on nuScenes using nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal, I get an error:

    File "det3d/torchie/apis/train.py", line 337, in train_detector
        trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
      File "det3d/torchie/trainer/trainer.py", line 553, in run
        epoch_runner(data_loaders[i], self.epoch, **kwargs)
      File "det3d/torchie/trainer/trainer.py", line 428, in train
        self.call_hook("after_train_iter")
      File "det3d/torchie/trainer/trainer.py", line 335, in call_hook
        getattr(hook, fn_name)(self)
      File "det3d/core/utils/dist_utils.py", line 54, in after_train_iter
        runner.outputs["loss"].backward()
      File "torch/_tensor.py", line 484, in backward
        torch.autograd.backward(
      File "torch/autograd/__init__.py", line 191, in backward
        Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [148285, 16]], which is output 0 of ReluBackward0, is at version 12; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
    

    I also tried running the normal CenterPoint VoxelNet config nusc_centerpoint_voxelnet_0075voxel_fix_bn_z.py in this repo, and it trains smoothly, so I guess the problem occurs in the Focals Conv layer. Any idea about this problem? Any hint or suggestion about where to look would also help. Thanks a lot.

    opened by klightz 8
  • lidar-only focal centerpoint config is missing

    I see CenterPoint with multi-modal Focals Conv, but cannot find a CenterPoint with LiDAR-only (unimodal) Focals Conv version. So I made a new config myself, changing the CenterPoint config from voxelnet to voxelfocal and from spmiddleresnetfhd to spmiddleresnetfhdfocal, with use_img = False. Is this the right way to do it?

    opened by konyul 7
  • Issues about fusing multimodal features

    Dear authors: in def construct_multimodal_features, there are two ways to fuse multi-modal features, and fuse_sum = False means concatenation is used. But when I set fuse_sum = False in "out = out.replace_feature(self.construct_multimodal_features(out, x_rgb, batch_dict, False))", the code reports the error "RuntimeError: running_mean should contain 32 elements not 16". I wonder how to amend the code to solve this problem. Thanks a lot!

    opened by Eaton2022 5
  • Cannot reproduce the result by loading the provided checkpoint on my machines

    Hello, sorry for another question. Recently I tried to train the model by myself and got a result 4-5 points lower than what the README reports, so I first tried loading the checkpoint you provide as a sanity check.

    However, the checkpoint I loaded also gives lower performance. I also set up the environment independently on another machine and ran the checkpoint test directly, and still got a worse result, the same numbers as before. I suspect it may be caused by some package version mismatch or some API behavior. Any idea about this?

    I think I followed the guidelines for the dataset and codebase setup exactly. For the evaluation command, I followed the README and ran:

    CONFIG="nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal"
    python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/dist_test.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG --checkpoint centerpoint_focal_multimodal.pth
    

    Here are my key packages version:

    torch==1.8.2
    opencv-python==4.4.0.46
    kornia==0.6.6
    spconv==2.1.22
    

    The output of the prediction:

    Loading NuScenes tables for version v1.0-trainval...
    23 category,
    8 attribute,
    4 visibility,
    64386 instance,
    12 sensor,
    10200 calibrated_sensor,
    2631083 ego_pose,
    68 log,
    850 scene,
    34149 sample,
    2631083 sample_data,
    1166187 sample_annotation,
    4 map,
    Done loading in 39.821 seconds.
    ======
    Reverse indexing ...
    Done reverse indexing in 9.6 seconds.
    ======
    Finish generate predictions for testset, save to work_dirs/nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal/infos_val_10sweeps_withvelo_filter_True.json
    Initializing nuScenes detection evaluation
    Loaded results from work_dirs/nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal/infos_val_10sweeps_withvelo_filter_True.json. Found detections for 6019 samples.
    Loading annotations for val split from nuScenes version: v1.0-trainval
    100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6019/6019 [00:17<00:00, 350.05it/s]
    Loaded ground truth annotations for 6019 samples.
    Filtering predictions
    => Original number of boxes: 497297
    => After distance based filtering: 357680
    => After LIDAR and RADAR points based filtering: 357680
    => After bike rack filtering: 357308
    Filtering ground truth annotations
    => Original number of boxes: 187528
    => After distance based filtering: 134565
    => After LIDAR and RADAR points based filtering: 121871
    => After bike rack filtering: 121861
    Rendering sample token 5376e3a2874542d8b440faa899e52b97
    Rendering sample token 14f665de1fa34d0a9d12838a5b77d687
    Rendering sample token c428be7e072c4c2489b90a6dcefcae4c
    Rendering sample token d6d3eac48860468aa0eba1ae2896b5ea
    Rendering sample token 67aad7ad948f44f8af668ea8389bdd52
    Rendering sample token 9c9f22a58fdc45f2b8a119cda3554f1f
    Rendering sample token e30f071748cc49eb85babe49265a4eda
    Rendering sample token f4550267cd0240e1a1ceb844e33e97d4
    Rendering sample token 93fdce35d7db4764ad5f822f57ab49e2
    Rendering sample token 22186f4894ab46b481a9e1ee31d7734e
    Accumulating metric data...
    Calculating metrics...
    Rendering PR and TP curves
    Saving metrics to: ./work_dirs/nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal
    mAP: 0.6030
    mATE: 0.2830
    mASE: 0.2549
    mAOE: 0.2796
    mAVE: 0.2552
    mAAE: 0.1885
    NDS: 0.6754
    Eval time: 110.2s
    Per-class results:
    Object Class    AP      ATE     ASE     AOE     AVE     AAE
    car     0.854   0.178   0.155   0.108   0.267   0.193
    truck   0.563   0.311   0.179   0.067   0.243   0.235
    bus     0.699   0.314   0.179   0.049   0.420   0.274
    trailer 0.396   0.513   0.209   0.442   0.191   0.177
    construction_vehicle    0.221   0.666   0.425   0.865   0.120   0.284
    pedestrian      0.853   0.140   0.275   0.377   0.212   0.093
    motorcycle      0.618   0.200   0.244   0.229   0.387   0.240
    bicycle 0.458   0.166   0.266   0.300   0.202   0.011
    traffic_cone    0.686   0.140   0.333   nan     nan     nan
    barrier 0.683   0.201   0.284   0.080   nan     nan
    Evaluation nusc: Nusc v1.0-trainval Evaluation
    car Nusc dist AP@0.5, 1.0, 2.0, 4.0
    76.30, 85.90, 89.02, 90.20 mean AP: 0.8535605962619657
    truck Nusc dist AP@0.5, 1.0, 2.0, 4.0
    38.66, 55.67, 63.59, 67.33 mean AP: 0.563122813202771
    construction_vehicle Nusc dist AP@0.5, 1.0, 2.0, 4.0
    3.60, 14.35, 29.32, 40.98 mean AP: 0.22061467261075263
    bus Nusc dist AP@0.5, 1.0, 2.0, 4.0
    45.98, 70.61, 80.23, 82.79 mean AP: 0.6990066750667856
    trailer Nusc dist AP@0.5, 1.0, 2.0, 4.0
    10.95, 34.49, 51.42, 61.45 mean AP: 0.3957810641400781
    barrier Nusc dist AP@0.5, 1.0, 2.0, 4.0
    58.83, 68.34, 72.09, 73.76 mean AP: 0.682558135475819
    motorcycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
    53.71, 63.50, 64.63, 65.32 mean AP: 0.6179004111724922
    bicycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
    43.90, 45.99, 46.47, 46.84 mean AP: 0.45797520318286145
    pedestrian Nusc dist AP@0.5, 1.0, 2.0, 4.0
    83.08, 84.88, 85.97, 87.29 mean AP: 0.8530652142070791
    traffic_cone Nusc dist AP@0.5, 1.0, 2.0, 4.0
    65.87, 67.16, 69.11, 72.33 mean AP: 0.6861473045618407
    

    So you can see I get 60.3 mAP on the full dataset, and 56 mAP on the 1/4 dataset.

    BTW, I also checked the dataset correctness by observing the behavior of the CenterPoint baseline. I loaded the checkpoint for the centerpoint_voxel_1440 setup and got exactly the 59.6 mAP they report, so I think something goes wrong in the image-fusion part. Any idea about this issue? Really, thanks a lot!

    opened by klightz 5
  • Voxel R-CNN usage

    Hi, thanks for your excellent work! I want to check the 3-class eval results on Voxel R-CNN, so how can I train it? In the kitti_models directory, I can't see a Voxel R-CNN with Focals Conv configuration file for 3 classes. Can you release it? Thanks again.

    opened by vehxianfish 5
  • How to generate the 1/4 nuScenes data split in OpenPCDet

    Hello, you have provided the 1/4 nuScenes data as a CenterPoint-based pkl, infos_train_mini_1_4_10sweeps_withvelo_filter_True.pkl, so I want to know how to generate the 1/4 data split for OpenPCDet. Thank you very much!

    opened by wenchened 4
  • Why split foreground and background features based on importance, but then combine them for the next submanifold conv?

    Hi, these lines of code confuse me:

    Foreground and background features are first split based on the predicted importance:

    https://github.com/dvlab-research/FocalsConv/blob/875e7b0c931ad7d1b2577c4c2201f228c7314a57/OpenPCDet/pcdet/models/backbones_3d/focal_sparse_conv/focal_sparse_conv.py#L212

    but then the foreground and background features are combined and fed into the next submanifold conv:

    https://github.com/dvlab-research/FocalsConv/blob/875e7b0c931ad7d1b2577c4c2201f228c7314a57/OpenPCDet/pcdet/models/backbones_3d/focal_sparse_conv/focal_sparse_conv.py#L216

    Why do background features participate in the next conv?

    opened by FlyingQianMM 3
  • How to compute the params and runtime (inference time)?

    Dear authors, first of all, thanks for your great work. After reading your paper, I would really like to know how to calculate the parameters and runtime of adding Focals Conv to Voxel R-CNN, as mentioned in your experiments. I want to try it, but I don't know how to do it; there is little information on the Internet, and after searching on Google I got confused. Could you share the code to accomplish this, if you have it? I would really appreciate your help. Thank you in advance.

    opened by Jane-QinJ 3
  • UnboundLocalError: local variable 'mv_height' referenced before assignment

    Hello, thank you for your awesome work! When I try to train with pv_rcnn_focal_lidar.yaml, an error occurs: UnboundLocalError: local variable 'mv_height' referenced before assignment

    The full error is:

    Traceback (most recent call last):
      File "train.py", line 201, in <module>
        main()
      File "train.py", line 153, in main
        train_model(
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/train_utils/train_utils.py", line 111, in train_model
        accumulated_iter = train_one_epoch(
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/train_utils/train_utils.py", line 25, in train_one_epoch
        batch = next(dataloader_iter)
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
        data = self._next_data()
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
        return self._process_data(data)
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
        data.reraise()
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/kitti/kitti_dataset.py", line 425, in __getitem__
        data_dict = self.prepare_data(data_dict=input_dict)
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/dataset.py", line 130, in prepare_data
        data_dict = self.data_augmentor.forward(
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/augmentor/data_augmentor.py", line 243, in forward
        data_dict = cur_augmentor(data_dict=data_dict)
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/augmentor/database_sampler.py", line 403, in __call__
        mv_height = mv_height[valid_mask]
    UnboundLocalError: local variable 'mv_height' referenced before assignment

    I set USE_ROAD_PLANE: False in pv_rcnn_focal_lidar.yaml because my custom dataset does not have road-plane data, but when I then tried the KITTI data, the same error still occurred.

    I tried to google this error; it seems the variable is not defined before it is referenced. I checked mv_height in /FocalsConv/OpenPCDet/tools/../pcdet/datasets/augmentor/database_sampler.py, line 403, in __call__: mv_height = mv_height[valid_mask]

    But I don't understand what is wrong here. If you can help me, I'd really appreciate it. Thanks!

    opened by Jane-QinJ 3
  • Add a web demo/model to Hugging Face

    Hi, would you be interested in adding FocalsConv to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or organization, similar to GitHub.

    Example from other organizations:
    Keras: https://huggingface.co/keras-io
    Microsoft: https://huggingface.co/microsoft
    Facebook: https://huggingface.co/facebook

    Example Spaces with repos:
    github: https://github.com/salesforce/BLIP
    Spaces: https://huggingface.co/spaces/salesforce/BLIP

    github: https://github.com/facebookresearch/omnivore
    Spaces: https://huggingface.co/spaces/akhaliq/omnivore

    And here are guides for adding Spaces/models/datasets to your org:
    How to add a Space: https://huggingface.co/blog/gradio-spaces
    How to add models: https://huggingface.co/docs/hub/adding-a-model
    Uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

    Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

    opened by AK391 2