Overview

Neural Scene Flow Fields

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

[Project Website] [Paper] [Video]

Dependency

The code is tested with Python 3, PyTorch >= 1.6, and CUDA >= 10.2. The dependencies include

  • configargparse
  • matplotlib
  • opencv
  • scikit-image
  • scipy
  • cupy
  • imageio
  • tqdm

Video preprocessing

  1. Download nerf_data.zip from the link: an example input video with SfM camera poses and intrinsics estimated from COLMAP. (Note: you need to run the COLMAP "colmap image_undistorter" command to undistort the input images and obtain the "dense" folder shown in the example; this dense folder should contain "images" and "sparse" subfolders. An example invocation is sketched after the commands in step 3.)

  2. Download the single-view depth prediction model "model.pt" from the link, and put it in the folder "nsff_scripts".

  3. Run the following commands to generate required inputs for training/inference:

    # Usage
    cd nsff_scripts
    # create camera intrinsics/extrinsics format for NSFF, same as the original NeRF, which uses the imgs2poses.py script from the LLFF code: https://github.com/Fyusion/LLFF/blob/master/imgs2poses.py
    python save_poses_nerf.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/"
    # Resize input images and run single view model
    python run_midas.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/" --input_w 640 --input_h 360 --resize_height 288
    # Run optical flow model (for easy setup and PyTorch version consistency, we use RAFT as the backbone optical flow model, but it should be easy to switch to other models such as PWC-Net or FlowNet2.0)
    ./download_models.sh
    python run_flows_video.py --model models/raft-things.pth --data_path /home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/ --epi_threhold 1.0 --input_flow_w 768 --input_semantic_w 1024 --input_semantic_h 576
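
For step 1, here is a hedged example of the undistortion call mentioned above (the <scene> paths are placeholders; it assumes a COLMAP sparse reconstruction already exists under <scene>/sparse/0):

    # produce the "dense" folder (with "images" and "sparse") expected by NSFF
    colmap image_undistorter \
        --image_path <scene>/images \
        --input_path <scene>/sparse/0 \
        --output_path <scene>/dense \
        --output_type COLMAP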

Rendering from an example pretrained model

  1. Download the pretrained model "kid-running_ndc_5f_sv_of_sm_unify3_F00-30.zip" from the link. Unzip it and put it at "nsff_exp/logs/kid-running_ndc_5f_sv_of_sm_unify3_F00-30/360000.tar".

  2. Set datadir in configs/config_kid-running.txt to the root directory of the input video. Then go to the directory "nsff_exp":

   cd nsff_exp
  3. Rendering of fixed time, viewpoint interpolation
   python run_nerf.py --config configs/config_kid-running.txt --render_bt --target_idx 10

By running the example command, you should get the result shown in the corresponding GIF (not reproduced here).

  4. Rendering of fixed viewpoint, time interpolation
   python run_nerf.py --config configs/config_kid-running.txt --render_lockcam_slowmo --target_idx 8

By running the example command, you should get the result shown in the corresponding GIF (not reproduced here).

  5. Rendering of space-time interpolation
   python run_nerf.py --config configs/config_kid-running.txt --render_slowmo_bt  --target_idx 10

By running the example command, you should get the result shown in the corresponding GIF (not reproduced here).

Training

  1. In configs/config_kid-running.txt, modify expname to any name you like (different from the original one), and run the following command to train the model:
    python run_nerf.py --config configs/config_kid-running.txt

The per-scene training takes ~2 days using 2 Nvidia V100 GPUs.

  2. Several parameters in the config files you might need to adjust to train a good model (an illustrative config sketch follows this list)
  • N_samples: to render images at higher resolution, you have to increase the number of sampled points
  • start_frame, end_frame: indicate the training frame range. The default model usually works for videos of 1~2 s. Training on longer frame ranges can cause over-smoothed rendering. To mitigate this, you can increase the capacity of the network by increasing netwidth (though this drastically increases training time and memory usage).
  • decay_iteration: number of iterations in the initialization stage. Data-driven losses decay every 1000*decay_iteration steps. It is usually good to match decay_iteration to the number of training frames.
  • no_ndc: our current implementation only supports reconstruction in NDC space, meaning it only works for forward-facing scenes, as in the original NeRF. It should not be hard to adapt it to Euclidean space, though.
  • use_motion_mask, num_extra_sample: whether to use the estimated coarse motion segmentation mask for hard-mining sampling during the initialization stage, and how many extra samples to draw during that stage.
  • w_depth, w_optical_flow: weights of the single-view depth and geometry consistency priors described in the paper
  • w_cycle: weight of the scene flow cycle consistency loss
  • w_sm: weight of the scene flow smoothness loss
  • w_prob_reg: weight of the disocclusion weight regularization
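
For illustration, here is a hedged sketch of how these options might appear in a config file such as configs/config_kid-running.txt. The values below are placeholders for a ~30-frame forward-facing clip, not the released defaults:

    expname = my_kid-running_experiment
    datadir = /home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/
    start_frame = 0
    end_frame = 30
    decay_iteration = 30
    N_samples = 128
    netwidth = 256
    use_motion_mask = True
    num_extra_sample = 512
    w_depth = 0.04
    w_optical_flow = 0.02
    w_cycle = 0.1
    w_sm = 0.1
    w_prob_reg = 0.1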

Evaluation on the Dynamic Scene Dataset

  1. Download the Dynamic Scene dataset "dynamic_scene_data_full.zip" from the link

  2. Download the pretrained models "dynamic_scene_pretrained_models.zip" from the link, unzip, and put them in the folder "nsff_exp/logs/"

  3. Run the following command for each scene to get quantitative results reported in the paper:

   # Usage: configs/config_xxx.txt indicates each scene name such as config_balloon1-2.txt in nsff/configs
   python evaluation.py --config configs/config_xxx.txt
  • Note: you have to use the modified LPIPS implementation included in this branch in order to measure LPIPS error for the dynamic region only, as described in the paper.

Acknowledgment

The code builds on the implementations of several prior works referenced above, including NeRF, LLFF, MiDaS, and RAFT.

License

This repository is released under the MIT license.

Citation

If you find our code/models useful, please consider citing our paper:

@article{li2020neural,
  title={Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes},
  author={Li, Zhengqi and Niklaus, Simon and Snavely, Noah and Wang, Oliver},
  journal={arXiv preprint arXiv:2011.13084},
  year={2020}
}
Comments
  • Question about image warping and blending weight

    Hi, warping and predicting a blending weight for merging the static and dynamic results is a good idea; however, I have been confused by some operations in the implementation. Initially I was not sure whether these confusing parts lead to bad results, but now that I see several issues concerning performance on users' own datasets, I decided to post this issue, hoping to provide some insights that could potentially resolve some of them.

    1. Predicting the blending weight with the static network seems really counterintuitive, since dynamic objects move through the scene independently of the static network. A more reasonable way is to predict this weight with the dynamic network, as suggested by this paper. An even simpler option is to remove the blending weight entirely and use the addition strategy from NeRF-W. I ran a short experiment comparing these two blending strategies (blending weight vs. addition) and found that addition produces better reconstruction and novel view results. Although this finding might not hold for all data, I at least think that predicting the blending weight with the static network is not the ideal way to go.
    2. For image warping, you do blending at the current timestamp https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/4cb2ef4314ca4f569d7d72f1b4ddc3e756dc3815/nsff_exp/render_utils.py#L1019-L1022 but only render the dynamic parts at t-1 and t+1 https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/4cb2ef4314ca4f569d7d72f1b4ddc3e756dc3815/nsff_exp/render_utils.py#L1055-L1057 This is really confusing: basically it means there are two rendering pipelines, and the network has to learn to maximize the performance of both. It is hard to tell exactly how this causes problems, but I suspect it reduces the final performance of the blended rendering. In my opinion, supposing that having the static network gives better results, you should use blending in the current-time rendering and in the image warping too. Again, I don't know whether changing this yields better results, but this part is confusing.

    I would like to know @zhengqili's opinion on these points, and I would suggest that users try these modifications to see if they solve some of the problems.

    opened by kwea123 10
  • Correct approach to separate static and dynamic regions

    Hi, I have originally mentioned this issue in #1, but it seems to deviate from the original question, so I decided to open a new issue.

    As discussed in #1, I tried setting raw_blend_w to either 0 or 1 to create "static only" and "dynamic only" images that theoretically would look like Fig. 5 in the paper and in the video. However, this approach seems to be wrong: from the result, the static part looks ok-ish, but the dynamic part contains almost everything, which is not good at all (we want only the moving part, e.g. only the running kid).

    It's been a week that I have been testing this while waiting for a response, but still to no avail. @zhengqili @sniklaus @snavely @owang Sorry for bothering you, but could any of the authors kindly clarify what's wrong with my approach of separating static/dynamic by setting the blending weight to either 0 or 1? I also tried blending the sigmas (opacity in the code) instead of the alphas as in the paper, and directly using rgb_map_ref_dy as the output image, but neither helped. https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/7d8a336919b2f0b0dfe458dfd35bee1ffa04bac0/nsff_exp/render_utils.py#L804-L809

    I have applied the above approach to other pretrained scenes, but none of them produces good results.


    Left: static (raw_blend_w=0). Right: dynamic (raw_blend_w=1).


    Left: static (raw_blend_w=0). Right: dynamic (raw_blend_w=1).

    I believe there's something wrong with my approach, but I cannot figure out what. I would really appreciate it if the authors could kindly point out the correct approach. Thank you very much.

    opened by kwea123 8
  • Other data and the motion mask accuracy

    Hi, thanks for the code! Do you plan to publish the full data (the running kid, and the other data used in the paper besides the NVIDIA scenes) as well?

    In fact, the thing I'd like to check the most is the accuracy of your motion masks. I'd like to know whether it's really possible for the network to learn to separate the background and the foreground given only the "coarse mask" you mention in the supplementary.

    For example, for the bubble scene on the project page, how accurate does the mask need to be to clearly separate the bubbles from the background as you showed? Have you also experimented with the influence of mask quality, i.e. if the masks are coarser (larger), how well can the model separate bg/fg?

    opened by kwea123 7
  • Error when running evaluation.py with a trained model

    Hello, I really appreciate your awesome work!!

    However, I get an error when I try to run the evaluation.py file with our trained model.

    I think the --chain_sf argument is not declared in that file.

    I used the given configuration file of kid-running scene.

    Here is the script that I ran:

    python evaluation.py --datadir /data1/dogyoon/neural_sceneflow_data/nerf_data/kid-running/dense/ --expname Default_Test --config configs/config_kid-running.txt
    

    Should I add the argument to the evaluation.py file, or is there another way to solve this problem?
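
    For reference, here is a minimal sketch of what I mean (my guess, assuming evaluation.py builds its options with the same configargparse-based config_parser as run_nerf.py):

    import configargparse

    # hypothetical addition to evaluation.py's config_parser, mirroring run_nerf.py
    parser = configargparse.ArgumentParser()
    parser.add_argument('--chain_sf', action='store_true',
                        help='whether to chain scene flow across neighboring frames')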

    Thanks!!

    opened by dogyoonlee 3
  • Coordinate System Operations

    Hi team, amazing paper!

    I am trying to adapt your model to regularise the volume-rendered scene flow against a monocular scene flow estimator from a different paper. The third-party scene flow estimator produces results in world coordinates (not normalised), so to compare against this model's scene flow, which is in NDC space, I need to transform one coordinate system into the other. This has left me confused by some of the code you use for coordinate system conversions.

    1. Firstly, in the supplementary document, and in the code, you reference "euclidean space". I couldn't find anything online about whether this is world space or camera space. Could you please clarify?
    2. The supplementary document references the NDC ray space derivation from the NeRF paper. That derivation outlines how to convert points from camera space (o) to NDC space (o'); the screenshot I attached here was of that equation (Eq. 25), and for reference I sketch the mapping right after this list. Following this, I found this function, which appears to do the inverse operation of Eq. 25: https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/562049433ddb3e3dee620f2247f4173f74e03438/nsff_exp/run_nerf_helpers.py#L534-L541 That is, it converts from NDC to what I conclude must be camera space. However, when I look at its invocation, the variable name suggests that this function converts from NDC to world coordinates: https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/562049433ddb3e3dee620f2247f4173f74e03438/nsff_exp/run_nerf_helpers.py#L552
    3. Following this, the pipeline that projects from 3D NDC to a 2D image has me quite confused: https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/562049433ddb3e3dee620f2247f4173f74e03438/nsff_exp/run_nerf_helpers.py#L546-L563 a) I assume se3_transform_points converts from world space to camera space, is that correct? b) Why do you perform the perspective projection from camera space? Everything I have read online performs perspective projection from either world coordinates or NDC.
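
    For reference, here is my own minimal sketch (not from the repo) of that camera-to-NDC mapping, assuming a near plane n = 1 so that NDC2Euclidean acts as its inverse; this is why I read the output of NDC2Euclidean as camera-space ("euclidean") coordinates:

    import torch

    # Sketch of NeRF's camera-to-NDC projection (Eq. 25), assuming near plane n = 1,
    # so that the repo's NDC2Euclidean is (up to the epsilon) its inverse.
    def euclidean2ndc(xyz_cam, H, W, f, n=1.0):
        x, y, z = xyz_cam[..., 0:1], xyz_cam[..., 1:2], xyz_cam[..., 2:3]
        x_ndc = -x / z * 2. * f / W
        y_ndc = -y / z * 2. * f / H
        z_ndc = 1. + 2. * n / z
        return torch.cat([x_ndc, y_ndc, z_ndc], -1)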

    Generally, it would be very helpful to me if you could point me to where you obtained the operations for perspective_projection, se3_transform_points and NDC2Euclidean.

    My graphics knowledge is limited so apologies if these questions are trivial. Your help is greatly appreciated :)

    opened by rohaldb 3
  • Running on my dataset

    Hi, I want to run NSFF on my own dataset. When I run the preprocessing steps on it, the motion masks are all white. Does that mean there is an issue with my dataset, or that it cannot be run on my data at all? How can I solve this? Thanks!

    opened by Carinazhao22 3
  • Evaluation Set

    Hi,

    I found that evaluation.py actually uses the training images. Could you please share how to get the exact numbers reported in Table 3 of your paper? More specifically, how do we know which are

    the remaining 11 held-out images per time instance for evaluation

    ?

    opened by fuqichen1998 2
  • urlopen error [Errno 111]

    I get the following error while loading the pre-trained ResNet when I run run_midas.py on a remote server. On my local machine, however, it worked (with Python 3.9.12). Here I use Python 3.7.4, but I also tried Python 3.8.5 with the same result.

    Traceback (most recent call last):
      File "run_midas.py", line 267, in <module>
        args.resize_height)
      File "run_midas.py", line 158, in run
        model = MidasNet(model_path, non_negative=True)
      File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_scripts/models/midas_net.py", line 30, in __init__
        self.pretrained, self.scratch = _make_encoder(features, use_pretrained)
      File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_scripts/models/blocks.py", line 6, in _make_encoder
        pretrained = _make_pretrained_resnext101_wsl(use_pretrained)
      File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_scripts/models/blocks.py", line 26, in _make_pretrained_resnext101_wsl
        resnet = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")
      File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_venv/lib64/python3.7/site-packages/torch/hub.py", line 403, in load
        repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
      File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_venv/lib64/python3.7/site-packages/torch/hub.py", line 170, in _get_cache_or_reload
        repo_owner, repo_name, branch = _parse_repo_info(github)
      File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_venv/lib64/python3.7/site-packages/torch/hub.py", line 124, in _parse_repo_info
        with urlopen(f"https://github.com/{repo_owner}/{repo_name}/tree/main/"):
      File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 525, in open
        response = self._open(req, data)
      File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 543, in _open
        '_open', req)
      File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 1360, in https_open
        context=self._context, check_hostname=self._check_hostname)
      File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 1319, in do_open
        raise URLError(err)
    urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
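
    A possible workaround (an untested sketch on my side; the paths are hypothetical): clone facebookresearch/WSL-Images onto the cluster, copy the already-downloaded checkpoint into a local torch hub cache, and change the torch.hub.load call in nsff_scripts/models/blocks.py to load from the local clone so no request to github.com is made:

    import torch

    # Use a hub cache that already contains the checkpoint under <cache>/checkpoints,
    # copied over from a machine with internet access.
    torch.hub.set_dir('/cluster/home/<user>/torch_hub')  # hypothetical path

    # Load the entrypoint from a local clone of facebookresearch/WSL-Images
    # instead of contacting github.com.
    resnet = torch.hub.load('/cluster/home/<user>/WSL-Images',
                            'resnext101_32x8d_wsl', source='local')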

    opened by ChlaegerIO 1
  • README Demo Not Working

    Hi,

    I believe there is an issue with the demo as outlined in the README. Specifically, when trying to use the pre-trained model to render any of the 3 interpolation modes - time, viewpoint, or both - the resulting images (as found in nsff_exp/logs/kid-running_ndc_5f_sv_of_sm_unify3_testing_F00-30/<interpolation_dependent_name>/images) come out looking like this:

    (two attached example renders omitted), depending on which form of interpolation I run.

    I have a Colab notebook set up that reproduces the above result. It pulls from a forked repository which has only two changes: I added a requirements.txt file and updated the data_dir in nsff_exp/configs/config_kid-running.txt.

    Have I missed something, or am I correct in saying the demo is broken? Thanks!

    opened by rohaldb 1
  • Evaluation metrics

    Hi, I am wondering if there is a standard for SSIM and LPIPS.

    For SSIM: I see you use the scikit-image implementation. When I use kornia's implementation with window size = 11 (I don't know what size scikit-image uses if it's not set), it seems to yield a different result... Do you have an idea of what other authors use? (A quick sketch of the scikit-image call follows.)
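
    For what it's worth, a minimal sketch of the scikit-image call (if win_size is not given and gaussian_weights is False, scikit-image falls back to a 7x7 window, which could explain the gap to kornia's 11):

    import numpy as np
    from skimage.metrics import structural_similarity

    img0 = np.random.rand(256, 256, 3)  # prediction in [0, 1]
    img1 = np.random.rand(256, 256, 3)  # ground truth in [0, 1]
    # channel_axis is the newer name; older scikit-image versions use multichannel=True
    score = structural_similarity(img0, img1, channel_axis=-1, data_range=1.0)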

    For LPIPS:

    1. Do the other authors also use AlexNet?
    2. https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/62a1770db37ae9d927cdd2c43918263da43d7efa/nsff_exp/evaluation.py#L326 The network expects the RGB to be scaled to [-1, 1]. If it's [0, 1], it seems you need to pass the argument normalize=True (see the sketch after this list): https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/62a1770db37ae9d927cdd2c43918263da43d7efa/nsff_exp/models/init.py#L26-L34 So I'm afraid your evaluation is not exactly correct...
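
    For illustration, here is a minimal sketch of what I mean, using the standalone pip lpips package (not the repo's bundled models, so the API names differ):

    import torch
    import lpips

    # LPIPS expects inputs in [-1, 1]; normalize=True rescales [0, 1] tensors first.
    loss_fn = lpips.LPIPS(net='alex')
    img0 = torch.rand(1, 3, 256, 256)  # rendered image in [0, 1]
    img1 = torch.rand(1, 3, 256, 256)  # ground-truth image in [0, 1]
    d = loss_fn(img0, img1, normalize=True)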

    It makes me think that if there is no common standard for these metrics, which may differ from one implementation to another, or if authors sometimes make mistakes in the evaluation process, then only the PSNR score is credible...

    opened by kwea123 1
  • Multi-gpu training

    In the README you say training takes 2 days on 2 V100 GPUs, but I don't see any option for setting the number of GPUs in run_nerf.py. Does this mean the code only supports single-GPU training?

    opened by kwea123 1
  • Question about Least Kinetic Motion Prior

    Hi, I was wondering why

    sf_sm_loss += args.w_sm * compute_sf_lke_loss(ret['raw_pts_ref'], 
                                                        ret['raw_pts_post'], 
                                                        ret['raw_pts_prev'], 
                                                        H, W, focal)
    

    is called twice, at the following two lines:

    https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/d4001759a39b056c95d8bc22da34b10b4fb85afb/nsff_exp/run_nerf.py#L589

    https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/86ad6ddd1ce1c758bc908ef022ce843aae323d50/nsff_exp/run_nerf.py#L593

    Should compute_sf_lke_loss compute the $L_{temp}$ term?

    Thank you!

    opened by rliu100 0
  • Faster Training.

    Hello, do you have any suggestions for making training faster given GPUs with more memory? I'm working with 2 A6000s and would like to fully leverage their memory capacity.

    opened by jonathanhyunmoon 0
  • Singularity in NDC2Euclidean

    NDC2Euclidean appears to attempt to prevent a divide-by-zero error by adding an epsilon value:

    def NDC2Euclidean(xyz_ndc, H, W, f):
        z_e = 2./ (xyz_ndc[..., 2:3] - 1. + 1e-6)
        x_e = - xyz_ndc[..., 0:1] * z_e * W/ (2. * f)
        y_e = - xyz_ndc[..., 1:2] * z_e * H/ (2. * f)
    
        xyz_e = torch.cat([x_e, y_e, z_e], -1)
     
        return xyz_e
    

    However, since the coordinates have scene flow field vectors added to them, and the scene flow field output ranges over (-1.0, 1.0), it is possible for xyz_ndc to end up significantly outside the normal range. This means a divide-by-zero can still happen in the above code if the z value hits (1.0+1e-6), which it does in our training.

    We suggest clamping the NDC z value to the range (-1.0, 0.99), with 0.99 chosen to prevent the Euclidean far plane from getting too large. This clamping has significantly stabilized our training in early iterations:

    z_e = 2./ (torch.clamp(xyz_ndc[..., 2:3], -1.0, 0.99) - 1.0)
    

    https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/main/nsff_exp/run_nerf_helpers.py#L535
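
    For clarity, here is a sketch of the full patched function combining the two snippets above (the 0.99 bound is our suggestion, not a repo default):

    import torch

    def NDC2Euclidean(xyz_ndc, H, W, f):
        # clamp z so the denominator can never reach zero
        z_e = 2. / (torch.clamp(xyz_ndc[..., 2:3], -1.0, 0.99) - 1.0)
        x_e = -xyz_ndc[..., 0:1] * z_e * W / (2. * f)
        y_e = -xyz_ndc[..., 1:2] * z_e * H / (2. * f)
        return torch.cat([x_e, y_e, z_e], -1)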

    opened by geoffreymantel 0
  • How would you recommend adapting NSFF to non-forward facing scenes?

    Hello,

    First of all, thank you for releasing the implementation of your amazing project. The question I wanted to ask is: how does one adapt NSFF to support reconstruction in Euclidean space, thereby extending it to also work on non-forward-facing scenes?

    In other words, which parts of the codebase would I need to modify to run it on such scenes? I'm guessing that just setting the "no_ndc" flag to "True" in the config file wouldn't be enough.

    opened by andrewsonga 1
  • RuntimeError: stack expects each tensor to be equal size & AttributeError: 'NoneType' object has no attribute 'shape'

    #local run
    colmap feature_extractor \
    --database_path ./database.db --image_path ./dense/images/
    
    colmap exhaustive_matcher \
    --database_path ./database.db
    
    colmap mapper \
    --database_path ./database.db \
    --image_path ./dense/images \
    --output_path ./dense/sparse
    
    colmap image_undistorter \
    --image_path ./dense/images \
    --input_path ./dense/sparse/0 \
    --output_path ./dense \
    --output_type COLMAP \
    --max_image_size 2000
    

    #colab run

    %cd /content/drive/MyDrive/neural-net
    !git clone https://github.com/zl548/Neural-Scene-Flow-Fields
    %cd Neural-Scene-Flow-Fields
    !pip install configargparse
    !pip install matplotlib
    !pip install opencv
    !pip install scikit-image
    !pip install scipy
    !pip install cupy
    !pip install imageio.
    !pip install tqdm
    !pip install kornia
    

    My images are 288x512 pixels.

    %cd /content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nsff_scripts/
        # create camera intrinsics/extrinsic format for NSFF, same as original NeRF where it uses imgs2poses.py script from the LLFF code: https://github.com/Fyusion/LLFF/blob/master/imgs2poses.py
    !python save_poses_nerf.py --data_path "/content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nerf_data/bolli/dense"
        # Resize input images and run single view model, 
        # argument resize_height: resized image height for model training, width will be resized based on original aspect ratio
    !python run_midas.py --data_path "/content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nerf_data/bolli/dense"  --resize_height 512
    !bash ./download_models.sh
        # Run optical flow model
    !python run_flows_video.py --model models/raft-things.pth --data_path /content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nerf_data/bolli/dense
    

    Error:

    Traceback (most recent call last):
      File "run_flows_video.py", line 448, in <module>
        run_optical_flows(args)
      File "run_flows_video.py", line 350, in run_optical_flows
        images = load_image_list(images)
      File "run_flows_video.py", line 257, in load_image_list
        images = torch.stack(images, dim=0)
    RuntimeError: stack expects each tensor to be equal size, but got [3, 512, 288] at entry 0 and [3, 512, 287] at entry 31
    

    So input_w is not consistent, even though my images all have dimensions 288x512.

    Even if I modify the script:

    def load_image(imfile):
        long_dim = 512
    
        img = np.array(Image.open(imfile)).astype(np.uint8)
    
        # Portrait Orientation
        if img.shape[0] > img.shape[1]:
            input_h = long_dim
            input_w = 288
    

    The dimensions error is gone, but another error appears:

    ...
    flow input w 288 h 512
    0
    /usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    Traceback (most recent call last):
      File "run_flows_video.py", line 448, in <module>
        run_optical_flows(args)
      File "run_flows_video.py", line 363, in run_optical_flows
        (img_train.shape[1], img_train.shape[0]), 
    AttributeError: 'NoneType' object has no attribute 'shape'
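
    A possible fix I would try (an untested sketch, not from the repo): make load_image round both resized dimensions to a multiple of 8, so every frame ends up with exactly the same size before torch.stack:

    import numpy as np
    import torch
    from PIL import Image

    def load_image(imfile, long_dim=512):
        # Resize so the long side is long_dim and both sides are multiples of 8,
        # avoiding off-by-one widths such as 287 vs. 288.
        img = np.array(Image.open(imfile)).astype(np.uint8)
        h, w = img.shape[:2]
        scale = long_dim / max(h, w)
        input_h = max(8, int(round(h * scale / 8.0)) * 8)
        input_w = max(8, int(round(w * scale / 8.0)) * 8)
        img = np.array(Image.fromarray(img).resize((input_w, input_h), Image.LANCZOS))
        img = torch.from_numpy(img).permute(2, 0, 1).float()
        return img[None]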
    
    opened by bartman081523 1