Code for the upcoming CVPR 2021 paper

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael Firman

CVPR 2021

[Link to paper]

We introduce ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available.

  • Self-supervised: We train from monocular video only. No depths or poses are needed at training or test time.
  • Good depths from single frames; even better depths from short sequences.
  • Efficient: Only one forward pass at test time. No test-time optimization needed.
  • State-of-the-art self-supervised monocular-trained depth estimation on KITTI and CityScapes.

Overview

Cost volumes are commonly used for estimating depths from multiple input views:

Cost volume used for aggregating sequences of frames
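For readers new to the idea, below is a minimal, self-contained PyTorch sketch of a plane-sweep cost volume: source-frame features are warped into the reference view at a set of candidate depths, and the matching cost at each depth is the L1 distance to the reference features. All names, shapes and the simple L1 cost are illustrative assumptions, not ManyDepth's actual implementation.

import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(ref_feats, src_feats, K, inv_K, T_ref_to_src, depth_bins):
    """Illustrative plane-sweep cost volume (not ManyDepth's real code).

    ref_feats, src_feats: [B, C, H, W] feature maps from reference and source frames.
    K, inv_K:             [B, 3, 3] intrinsics (at feature resolution) and inverse.
    T_ref_to_src:         [B, 4, 4] relative pose taking reference-camera points into the source camera.
    depth_bins:           iterable of D candidate depths.
    Returns a cost volume of shape [B, D, H, W] (lower cost = better match).
    """
    B, C, H, W = ref_feats.shape
    device = ref_feats.device

    # Pixel grid in homogeneous coordinates: [B, 3, H*W]
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32),
                            indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Rays through each pixel at unit depth: [B, 3, H*W]
    rays = inv_K @ pix

    costs = []
    for d in depth_bins:
        # Back-project reference pixels to 3D at depth d, then move them into the source camera.
        cam_points = torch.cat([rays * d, torch.ones(B, 1, H * W, device=device)], dim=1)  # [B, 4, H*W]
        src_points = (T_ref_to_src @ cam_points)[:, :3]                                    # [B, 3, H*W]

        # Project into the source image and normalise to [-1, 1] for grid_sample.
        proj = K @ src_points
        px = proj[:, 0] / (proj[:, 2] + 1e-7)
        py = proj[:, 1] / (proj[:, 2] + 1e-7)
        grid = torch.stack([2 * px / (W - 1) - 1, 2 * py / (H - 1) - 1], dim=-1).view(B, H, W, 2)

        warped = F.grid_sample(src_feats, grid, padding_mode="zeros", align_corners=True)
        costs.append((warped - ref_feats).abs().mean(dim=1))  # L1 cost, [B, H, W]

    return torch.stack(costs, dim=1)  # [B, D, H, W]

ManyDepth builds its cost volume inside the feature encoder (manydepth/resnet_encoder.py) and differs in several details, most importantly in how the candidate depths are chosen, which is where the adaptive cost volume described below comes in.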

However, cost volumes do not easily work with self-supervised training.

Baseline: Depth from cost volume input without our contributions

In our paper, we:

  • Introduce an adaptive cost volume to deal with unknown scene scales (a rough sketch follows at the end of this overview)
  • Fix problems with moving objects
  • Introduce augmentations to deal with static cameras and start-of-sequence frames

These contributions enable cost volumes to work with self-supervised training:

ManyDepth: Depth from cost volume input with our contributions

With our contributions, short test-time sequences give better predictions than methods which predict depth from just a single frame.

ManyDepth vs Monodepth2 depths and error maps
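To make the adaptive cost volume from the list above more concrete: rather than fixing near and far planes, the depth bins can be spaced between minimum and maximum depth estimates that are updated during training (for example from running statistics of the teacher network's predictions), so the cost volume tracks the unknown scene scale of self-supervised training. The function name and the two binning modes below are illustrative assumptions; ManyDepth's actual logic lives in compute_depth_bins in manydepth/resnet_encoder.py and differs in detail.

import torch

def compute_adaptive_depth_bins(min_depth_bin, max_depth_bin, num_bins=96, mode="linear"):
    """Candidate depths spanning [min_depth_bin, max_depth_bin] (illustrative sketch).

    min_depth_bin and max_depth_bin are scalars that the training loop can keep
    updating (e.g. from the teacher network's depth predictions), which is what
    makes the cost volume 'adaptive' to the unknown scene scale.
    """
    if mode == "linear":
        return torch.linspace(min_depth_bin, max_depth_bin, num_bins)
    if mode == "inverse":
        # Uniform in inverse depth (disparity): finer sampling at close range.
        inv = torch.linspace(1.0 / max_depth_bin, 1.0 / min_depth_bin, num_bins)
        return (1.0 / inv).flip(0)
    raise ValueError(f"unknown binning mode: {mode}")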

✏️ 📄 Citation

If you find our work useful or interesting, please cite our paper:

@inproceedings{watson2021temporal,
    author = {Jamie Watson and
              Oisin Mac Aodha and
              Victor Prisacariu and
              Gabriel Brostow and
              Michael Firman},
    title = {{The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth}},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

📈 Results

Our ManyDepth method outperforms all previous methods in all subsections across most metrics, whether or not the baselines use multiple frames at test time. See our paper for full details.

KITTI results table

👀 Reproducing Paper Results

To recreate the results from our paper, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_KITTI_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name>

Depending on the size of your GPU, you may need to set --batch_size to be lower than 12. Additionally, you can train a high-resolution model by adding --height 320 --width 1024.
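For example, a lower-memory, high-resolution training run might look like this (the flag values are only illustrative):

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_KITTI_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name> \
    --batch_size 8 \
    --height 320 --width 1024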

For instructions on downloading the KITTI dataset, see Monodepth2.

To train a CityScapes model, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_preprocessed_cityscapes_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name> \
    --dataset cityscapes_preprocessed \
    --split cityscapes_preprocessed \
    --freeze_teacher_epoch 5 \
    --height 192 --width 512

This assumes you have already preprocessed the CityScapes dataset using SfMLearner's prepare_train_data.py script. We used the following command:

python prepare_train_data.py \
    --img_height 512 \
    --img_width 1024 \
    --dataset_dir <path_to_downloaded_cityscapes_data> \
    --dataset_name cityscapes \
    --dump_root <your_preprocessed_cityscapes_path> \
    --seq_length 3 \
    --num_threads 8

Note that while we use the --img_height 512 flag, the prepare_train_data.py script will save images which are 1024x384 as it also crops off the bottom portion of the image. You could probably save disk space without a loss of accuracy by preprocessing with --img_height 256 --img_width 512 (to create 512x192 images), but this isn't what we did for our experiments.
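If you want to try that smaller preprocessing (again, not what was used for the paper's experiments), the call would presumably be:

python prepare_train_data.py \
    --img_height 256 \
    --img_width 512 \
    --dataset_dir <path_to_downloaded_cityscapes_data> \
    --dataset_name cityscapes \
    --dump_root <your_preprocessed_cityscapes_path> \
    --seq_length 3 \
    --num_threads 8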

💾 Pretrained weights and evaluation

You can download weights for some pretrained models here:

To evaluate a model on KITTI, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_KITTI_path> \
    --load_weights_folder <your_model_path> \
    --eval_mono

Make sure you have first run export_gt_depth.py to extract ground truth files.
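A typical export call, following Monodepth2's script, looks something like the following; the exact module path and flags are an assumption, so check the script's own argument parser:

python -m manydepth.export_gt_depth \
    --data_path <your_KITTI_path> \
    --split eigen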

And to evaluate a model on Cityscapes, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_cityscapes_path> \
    --load_weights_folder <your_model_path> \
    --eval_mono \
    --eval_split cityscapes

During evaluation, we crop and evaluate on the middle 50% of the images.

We provide ground truth depth files HERE, which were converted from pixel disparities using intrinsics and the known baseline. Download this and unzip into splits/cityscapes.
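For reference, that conversion uses the standard stereo relation depth = fx * baseline / disparity. The snippet below is only a rough sketch of how such a conversion could look (the 16-bit disparity decoding follows the CityScapes documentation); please use the provided ground truth files rather than regenerating them from this snippet.

import numpy as np

def cityscapes_disp_to_depth(disp_png, fx, baseline):
    """Convert a raw CityScapes 16-bit disparity image to metric depth (sketch).

    disp_png:  uint16 array read from the disparity PNG.
    fx:        horizontal focal length in pixels, from the camera intrinsics.
    baseline:  stereo baseline in metres, from the CityScapes camera calibration.
    """
    disp = disp_png.astype(np.float32)
    valid = disp > 0
    disp[valid] = (disp[valid] - 1.0) / 256.0   # CityScapes disparity encoding
    depth = np.zeros_like(disp)
    depth[valid] = fx * baseline / np.maximum(disp[valid], 1e-6)  # depth = f * B / d
    return depth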

🖼 Running on your own images

We provide some sample code in test_simple.py which demonstrates multi-frame inference. This predicts depth for a sequence of two images cropped from a dashcam video. Prediction also requires an estimate of the intrinsics matrix, in json format. For the provided test images, we have estimated the intrinsics to be equivalent to those of the KITTI dataset. Note that the intrinsics provided in the json file are expected to be in normalised coordinates.
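To illustrate the normalised convention: fx and cx are divided by the image width, and fy and cy by the image height. The snippet below uses made-up, roughly KITTI-like pixel intrinsics, and the 3x3 JSON layout is an assumption; compare with the provided assets/test_sequence_intrinsics.json for the exact expected format.

import json

# Hypothetical pinhole intrinsics in pixels for a 1242x375 image (roughly KITTI-like).
fx, fy, cx, cy = 720.0, 720.0, 620.0, 187.0
width, height = 1242, 375

# Normalised intrinsics: fx, cx divided by image width; fy, cy divided by image height.
# The 3x3 layout is an assumption; check the provided sample JSON for the exact format.
K_norm = [[fx / width, 0.0, cx / width],
          [0.0, fy / height, cy / height],
          [0.0, 0.0, 1.0]]

with open("my_intrinsics.json", "w") as f:
    json.dump(K_norm, f)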

Download and unzip model weights from one of the links above, and then run the following command:

python -m manydepth.test_simple \
    --target_image_path assets/test_sequence_target.jpg \
    --source_image_path assets/test_sequence_source.jpg \
    --intrinsics_json_path assets/test_sequence_intrinsics.json \
    --model_path path/to/weights

A predicted depth map rendering will be saved to assets/test_sequence_target_disp.jpeg.

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.

Comments
  • Training with Lyft

    Hi, I have some questions regarding training with a custom dataset.

    (I noticed that my issue became a bit lengthy, so here's a TL;DR):

    1. Can I use images pointing in more than one direction to increase the number of samples in the dataset?
    2. Do I need to modify the intrinsic matrix when cropping the images?
    3. Can I use images from different cameras with different dimensions, but pointing in the same direction?

    More in-depth questions

    I'm trying to use the data from the Lyft dataset. It contains images from multiple cameras, all pointing in different directions. I've mainly used the front-facing camera, but I'm not sure how good the result actually is. I've attached some samples of the original data and its corresponding disparity images:

    [Attached images: original frame, disp_mono prediction, disp_multi prediction, and training stats after 42k batches.]

    As you can see, the model has clearly learned the most important principles, but I still feel that these disparity images are not as good as those produced by training on the KITTI dataset.

    The total number of images from the front-facing camera is ~17,000. I guess the model would benefit from more data, which leads me to my questions:

    Do you think it would be possible to use data from cameras pointing in different directions at the same time as data from the front-facing camera? I'm a bit concerned about how this will affect the pose network, as the cameras move differently relative to each other. The Lyft vehicles are equipped with cameras in the following setup:

    [Camera layout diagram omitted.]

    Another possibility I might try is to use the backward-facing camera. Using it in reverse temporal order would simulate the car moving forward (although with different views from the forward-facing ones).

    I have also tried to crop the images a bit, as the original images contain the lower part of the vehicle. In doing so, I have also changed the cx and cy parameters in the intrinsic matrix (I used the Berkeley Automation perception library here: https://berkeleyautomation.github.io/perception/api/camera_intrinsics.html), but I'm not quite sure if I should change the intrinsics at all. I've done it like this:

    # Assumed imports for this snippet; CameraIntrinsics presumably comes from the
    # Berkeley Automation "perception" library linked above.
    import pathlib
    import numpy as np
    from perception import CameraIntrinsics
    
    # This is defined in __init__()
    self.crop_value = (4, 200, 4, 216)
    
    # The intrinsic matrix is different for each vehicle, so each sequence contains the associated vehicle's intrinsic.
    path = pathlib.Path(self.data_path + folder).parent
    K = np.fromfile(f'{path}/CAM_FRONT_k_matrix.npy')
    K = K.reshape(3, 3)
    
    fx = K[0, 0] 
    cx = K[0, 2]
    fy = K[1, 1]
    cy = K[1, 2]
    
    # Initialize the camera intrinsic params.
    cam_intrinsics = CameraIntrinsics(
                fx=fx,
                fy=fy,
                cx=cx,
                cy=cy,
                width=self.full_res_shape[0],
                height=self.full_res_shape[1]
            )
    
    # Calculate the new dimensions and center points.
    cropped_width = self.full_res_shape[0] - self.crop_value[2] - self.crop_value[0]
    cropped_height = self.full_res_shape[1] - self.crop_value[3] - self.crop_value[1]
    
    # The center points are the original center points + (0.5 * the number of cropped pixels on the bottom) - (0.5 * the number of pixels cropped on the top)
    crop_cj = (self.full_res_shape[0] - self.crop_value[2] + self.crop_value[0]) // 2
    crop_ci = (self.full_res_shape[1] - self.crop_value[3] + self.crop_value[1]) // 2
    
    # Generate the new cropped intrinsics.
    cropped_intrinsics = cam_intrinsics.crop(
        height=cropped_height,
        width=cropped_width,
        crop_ci=crop_ci,
        crop_cj=crop_cj,
    )
    
    # Create the 4x4 version.
    intrinsics = np.array([[cropped_intrinsics.fx, 0, cropped_intrinsics.cx, 0],
                           [0, cropped_intrinsics.fy, cropped_intrinsics.cy, 0],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]]).astype(np.float32)
    
    # Resize fx and fy by the original dimensions and cx, cy by the cropped dimensions.
    intrinsics[0, 0] /= self.full_res_shape[0]
    intrinsics[1, 1] /= self.full_res_shape[1]
    intrinsics[0, 2] /= cropped_width
    intrinsics[1, 2] /= cropped_height
    

    I have also noticed that some of the sequences in the Lyft dataset contain images with different dimensions. Some of the images are 1224x1024, and some are 1920x1080. As long as I normalize the intrinsic matrix with the corresponding image dimensions, do you think there would be any problems with using these images simultaneously? One possibility might be to crop both kinds of images to the same format, if that is possible (as per my other question).

    opened by didriksg 8
  • Why disable gradients on lookup images?

    In resnet_encoder.py line 275~291

    # feature extraction on lookup images - disable gradients to save memory
    with torch.no_grad():
        if self.adaptive_bins:
            self.compute_depth_bins(min_depth_bin, max_depth_bin)
        ......
    

    I don't understand why the gradients on the lookup images are disabled. If this isn't done, will the results be affected?

    opened by WBS-123 7
  • About test-time-refinement (TTR)

    Hi @JamieWatson683, thank you for this very exciting project! May I ask a question: do you provide code for the test-time refinement (TTR) shown in the main table of the Results section? If so, how do I use it for my own sequence?

    opened by heiwang1997 5
  • batch size 8 got abs rel 0.130

    Hi, thanks for your great work and for sharing the code. I have a V100 GPU but I cannot start training with batch size 12; the maximum batch size I can use is 8, and I didn't change any other parameters. I then got poor performance (see the attached screenshots). What should I do to get better performance?

    opened by sunnyHelen 5
  • Pre-trained models for monocular depth networks

    Thank you for open-sourcing this very interesting work. Would it be possible to also provide weights for the monocular depth networks (the teacher networks) that go along with the currently available pre-trained models? Thank you!

    opened by VitorGuizilini-TRI 5
  • Depth map scale for KITTI data

    What is the scaling factor needed to get metric depth maps from output disparity maps with the KITTI dataset?

    I see that a lot of the code is from monodepth2, including the same disparity-to-depth transformation when predicting for KITTI images, that is, disp_to_depth with default values 0.1 and 100, followed by scaling by the KITTI stereo factor of 5.4. Using these default values, the transformation can be summarised by the following formula:

    depth = 5.4 / (0.01+9.99*disparity)

    However, using this same transformation on the output of manydepth results in depth maps with a completely different scale to the monodepth2 depth maps. For example, the output of test_sequence_target.jpg on the manydepth KITTI_HR model in multi mode has the following statistics:

    | output | max value | mean | median | min value |
    | ---: | ---: | ---: | ---: | ---: |
    | raw disparity | 0.651358 | 0.247255 | 0.187170 | 0.027917 |
    | depth map | 18.6921 | 3.23547 | 2.87261 | 0.828594 |

    Compare this with the output of running the same image on the monodepth2 mono+stereo_1024x320 model:

    | output | max value | mean | median | min value |
    | ---: | ---: | ---: | ---: | ---: |
    | raw disparity | 0.114764 | 0.037749 | 0.026548 | 0.006090 |
    | depth map | 76.2298 | 20.6049 | 19.6213 | 4.66927 |

    The same can be seen for any images in the KITTI dataset.

    Clearly, because the scale of the raw output disparities is very different, a different scale needs to be applied when transforming to depth, but I can't find anywhere in the code what it should be. Is there a known value to scale the depth maps for KITTI images so that depth is metric, or at least so that they more closely match the scale used by monodepth2 for KITTI images?

    opened by Benjabby 5
  • Train on own dataset with not good result

    Hi, thanks for your interesting paper and innovative ideas on depth estimation. I am trying to train your model on our own campus dataset to see if it works well in real time. As a beginner in deep learning, I followed your experiment implementation and code instructions but still get frustrating results. Could you give me some advice on how to train for a better result?

    My frame order is [0, -1, 1], so I changed the code to match the input. [Screenshot of the code change omitted.]

    My results: [screenshots of predicted disparity maps omitted.]

    My settings:

    {
        "data_path": "/media/xzy/daa84e38-7f66-4aa4-a0ce-4fe978abe706/xzy/Downloads/manydepth/dump_root",
        "log_dir": "/media/xzy/daa84e38-7f66-4aa4-a0ce-4fe978abe706/xzy/Downloads/manydepth/log",
        "model_name": "Vecan_model",
        "split": "vecan",
        "num_layers": 18,
        "depth_binning": "linear",
        "num_depth_bins": 96,
        "dataset": "cityscapes_preprocessed",
        "png": true,
        "height": 192,
        "width": 640,
        "disparity_smoothness": 0.001,
        "scales": [0, 1, 2, 3],
        "min_depth": 0.1,
        "max_depth": 80.0,
        "frame_ids": [0, -1, 1],
        "batch_size": 8,
        "learning_rate": 0.0001,
        "num_epochs": 20,
        "scheduler_step_size": 15,
        "freeze_teacher_and_pose": false,
        "freeze_teacher_epoch": 5,
        "v1_multiscale": false,
        "avg_reprojection": false,
        "disable_automasking": false,
        "no_ssim": false,
        "weights_init": "pretrained",
        "use_future_frame": false,
        "num_matching_frames": 1,
        "disable_motion_masking": false,
        "no_matching_augmentation": false,
        "no_cuda": false,
        "num_workers": 8,
        "load_weights_folder": "/media/xzy/daa84e38-7f66-4aa4-a0ce-4fe978abe706/xzy/Downloads/manydepth/manydepth/checkpoint/KITTI_MR",
        "mono_weights_folder": null,
        "models_to_load": ["encoder", "depth", "pose_encoder", "pose"],
        "log_frequency": 250,
        "save_frequency": 1,
        "eval_stereo": false,
        "eval_mono": false,
        "disable_median_scaling": false,
        "pred_depth_scale_factor": 1,
        "ext_disp_to_eval": null,
        "eval_split": "eigen",
        "save_pred_disps": false,
        "no_eval": false,
        "eval_eigen_to_benchmark": false,
        "eval_out_dir": null,
        "post_process": false,
        "zero_cost_volume": false,
        "static_camera": false
    }

    opened by xzyxzy29 5
  • Depth estimation from underwater monocular video sequences

    Hi @mdfirman @daniyar-niantic, thanks for your work! I tested your model on an underwater dataset, but the results are not very good. After debugging, the loss drops normally and the pose network works normally, but the final result is very strange: the predicted depths are almost all between 0.01 and 0.15 m. Is it that the model simply doesn't work for this type of dataset? Here are some images from my dataset; do you know what the problem is? Thanks! [Example frames and disp_multi predictions omitted.]

    opened by buster-zbb 4
  • "Normal" Training Loss and Strange Test Result

    After fixing the "--png" bug, I also faced difficulties in reproducing good results.

    Training Loss

    manydepth_loss

    with command

    CUDA_VISIBLE_DEVICES=0 python3 -m manydepth.train --data_path /home/kitti_raw/ --log_dir workdirs/ --model_name manydepth --png
    

    which is quite normal (I don't know what it is expected to be but that is reasonable at least).

    Test Results

       abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 |                                                                                                                              
    &   0.454  &   4.961  &  12.336  &   0.607  &   0.288  &   0.541  &   0.754  \\ 
    

    which is of course wrong.

    Tensorboard Validation events

    [Tensorboard validation screenshots (colour inputs and disparity predictions) omitted.]

    I can't detect any bug from these.

    Local Modification of Codes

    As for code changes, I modified the colour augmentation part of datasets/mono_dataset.py for compatibility with the newer torchvision (which does not seem to be the main problem). [git diff screenshot omitted.]

    I also modified the export_gt script (I couldn't get the original script to work because the splits folder sits one level above the script). [git diff screenshot omitted.]

    opened by Owen-Liuyuxuan 4
  • The results of cityscapes

    Hi, @JamieWatson683 @daniyar-niantic

    I evaluated the model on CityScapes (512x192) and got the same numbers as Table 3 in your paper (both the IEEE and arXiv versions).

    Is the resolution in Table 3 a clerical error (416x128 in the paper vs. 512x192 in the weights)? Is there a revised version of the paper available for reference?

    opened by Ecalpal 3
  • How can I save predicted depth map?

    This is great work, but I have one question.

    After evaluating the model, how can I save the predicted depth map? I notice that there is an option 'eval_split' in the code and I think it can save the predicted depth map. If I set 'eval_split' to 'benchmark', an error occurred:

    Traceback (most recent call last):
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/hzc/manydepth/manydepth/evaluate_depth.py", line 371, in <module>
        evaluate(options.parse())
      File "/home/hzc/manydepth/manydepth/evaluate_depth.py", line 158, in evaluate
        for i, data in tqdm.tqdm(enumerate(dataloader)):
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/tqdm/std.py", line 1178, in __iter__
        for obj in iterable:
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/hzc/manydepth/manydepth/datasets/mono_dataset.py", line 157, in __getitem__
        folder, frame_index + i, side, do_flip)
      File "/home/hzc/manydepth/manydepth/datasets/kitti_dataset.py", line 65, in get_color
        color = self.loader(self.get_image_path(folder, frame_index, side))
      File "/home/hzc/manydepth/manydepth/datasets/kitti_dataset.py", line 82, in get_image_path
        self.data_path, folder, "image_0{}/data".format(self.side_map[side]), f_str)
    KeyError: None

    opened by agenthong 3
  • Why the training time of manydepth is much shorter than monodepth2?

    I use the same training environment (Python 3.7, 3080 Ti) to train manydepth and monodepth2 separately. Since manydepth trains a complete monodepth2 as a teacher, I expected its training time to be longer than monodepth2's. But in fact manydepth takes about 7.5 h while monodepth2 takes about 14 h with scales 0 1, which is quite counter-intuitive. Could anybody tell me why this happens? Training console output of manydepth: epoch 1 | batch 683 | examples/s: 33.7 | loss: 0.29235 | time elapsed: 00h26m29s | time left: 06h52m52s. Thanks in advance!

    opened by myalos 1
  • About scale problem in monocular setting

    Hi, thanks for sharing the code! I have a question about median scaling. I saw "We propose an adaptive cost volume to overcome the scale ambiguity arising from self-supervised training on monocular sequences" in the paper, but I also see median scaling in evaluate_depth.py. Is the problem of unknown scale still unsolved? Could you explain more about the scale ambiguity that you have overcome in the paper? Thanks in advance.

    opened by myalos 1
  • Help training out bright reflection...

    I'm trying to evaluate this for use in a warehouse. It's almost working! However, I continue to get incorrect readings on bright reflections (see below). Any ideas on how to train this 'away'?

    Note that running this on the provided KITTI_HR model didn't have that effect, but it introduced other, even stranger ones, e.g. the anomaly in the upper left of the image.

    Thoughts? (I would just have to scale up my compute for training by roughly 10X...)

    Thanks! This is great! p

    [Attached images: source image, prediction from my model trained on roughly 30K images taken from videos, and prediction on the same image from the KITTI_HR model.]

    opened by pgaston 1
  • how to test many frames at the same time?

    Hello, after training the model I used test_simple.py to test it, but I don't know how to test with more than two frames at a time; I can only test with one target frame and one source frame.

    opened by 9796l 0
  • Question of relative pose in matching augmentation.

    Hi, after looking through the ManyDepth code, I found a confusing part in the matching augmentation. To handle the static-camera problem, you replace frame -1 with a colour-augmented version of frame 0. However, in the code it seems that only the RGB of frame -1 is replaced by frame 0; the pose is still the relative pose between frame -1 and frame 0, and this original relative pose is then used to compute the cost volume. This seems a little unreasonable to me: in this case the two frames entering the cost volume are the augmented frame 0 and frame 0, whose relative pose should be the identity matrix, yet in practice the relative pose between frame -1 and frame 0 is used. This is confusing to me. Looking forward to your explanation.

    opened by JarvisLee0423 0
  • Depth Estimation Results on Single Frames

    Hi @daniyar-niantic, thank you for your good work!

    As mentioned in this issue, we can actually evaluate ManyDepth on single frames instead of on the current and previous frames. Therefore, I implemented this evaluation following your instructions, and the results are as follows:

    | $\text{Abs Rel}$ | $\text{Sq Rel}$ | $\text{RMSE}$ | $\text{RMSE log}$ | $\delta < 1.25$ | $\delta < 1.25^2$ | $\delta < 1.25^3$ |
    | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
    | 0.118 | 0.894 | 4.765 | 0.192 | 0.871 | 0.959 | 0.982 |

    I would like to check with you whether these results lie in a reasonable range. Thank you!

    opened by ldkong1205 0