Volumetric Correspondence Networks for Optical Flow, NeurIPS 2019.

Overview

VCN: Volumetric correspondence networks for optical flow

[project website]

Requirements

Pre-trained models

To test on any two images

Running visualize.ipynb produces flow visualizations with color coding and vectors. Note: the Sintel model "./weights/sintel-ft-trainval/finetune_67999.tar" is trained on multiple datasets and generalizes better than the KITTI model.
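
For reference, the notebook's core call looks like the following (a minimal sketch: inference is the helper defined inside visualize.ipynb, the image loader and paths are placeholders, and maxdisp/fac should match the model, e.g., 512/2 for KITTI or 448/1.4 for Sintel).

import imageio

imgL_o = imageio.imread('frame0.png')  # first frame (any image loader works)
imgR_o = imageio.imread('frame1.png')  # second frame
# inference() is defined in visualize.ipynb
flow, entropy = inference(imgL_o, imgR_o, 512, 2,
                          './weights/kitti-ft-trainval/finetune_149999.tar')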

KITTI

This corresponds to our entry on the KITTI-15 leaderboard (Fl-all = 6.30%).

Evaluate on KITTI-15 benchmark

To run + visualize on KITTI-15 test set,

modelname=kitti-ft-trainval
i=149999
CUDA_VISIBLE_DEVICES=0 python submission.py --dataset 2015test --datapath dataset/kitti_scene/testing/   --outdir ./weights/$modelname/ --loadmodel ./weights/$modelname/finetune_$i.tar  --maxdisp 512 --fac 2
python eval_tmp.py --path ./weights/$modelname/ --vis yes --dataset 2015test
Evaluate on KITTI-val

To see the details of the train-val split, please scroll down to "note on train-val" and run dataloader/kitti15list_val.py, dataloader/kitti15list_train.py, dataloader/sitnellist_train.py, and dataloader/sintellist_val.py.

To evaluate on the 40 validation images of KITTI-15 (0, 5, ..., 195), assuming the data is at /ssd/kitti_scene:

modelname=kitti-ft-trainval
i=149999
CUDA_VISIBLE_DEVICES=0 python submission.py --dataset 2015 --datapath /ssd/kitti_scene/training/   --outdir ./weights/$modelname/ --loadmodel ./weights/$modelname/finetune_$i.tar  --maxdisp 512 --fac 2
python eval_tmp.py --path ./weights/$modelname/ --vis no --dataset 2015

To evaluate + visualize on KITTI-15 validation set,

python eval_tmp.py --path ./weights/$modelname/ --vis yes --dataset 2015

Evaluation error on the 40 validation images: Fl-err = 3.9, EPE = 1.144.
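
eval_tmp.py produces these numbers. For reference, the standard KITTI metrics can be computed from a ground-truth flow map with a validity mask in its third channel, as in this sketch (an illustration of the metric definitions, not the repo's exact code):

import numpy as np

def kitti_metrics(gtflow, flow):
    # gtflow: H x W x 3 (u, v, valid); flow: H x W x 2 predicted flow
    mask = gtflow[:, :, 2] == 1                       # valid ground-truth pixels
    epe = np.sqrt(((gtflow[:, :, :2] - flow) ** 2).sum(-1))[mask]
    mag = np.sqrt((gtflow[:, :, :2] ** 2).sum(-1))[mask]
    # KITTI counts a pixel as an outlier if EPE > 3 px and EPE > 5% of the gt magnitude
    fl = ((epe > 3) & (epe > 0.05 * mag)).mean() * 100
    return epe.mean(), fl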

Sintel

This corresponds to our entry on the Sintel leaderboard (EPE-all-final = 4.404, EPE-all-clean = 2.808).

Evaluate on Sintel-val

To evaluate on Sintel validation set,

modelname=sintel-ft-trainval
i=67999
CUDA_VISIBLE_DEVICES=0 python submission.py --dataset sintel --datapath /ssd/rob_flow/training/   --outdir ./weights/$modelname/ --loadmodel ./weights/$modelname/finetune_$i.tar  --maxdisp 448 --fac 1.4
python eval_tmp.py --path ./weights/$modelname/ --vis no --dataset sintel

Evaluation error on the Sintel validation images: Fl-err = 7.9, EPE = 2.351.

Train the model

We follow the same stage-wise training procedure as prior work (Chairs->Things->KITTI or Chairs->Things->Sintel) but use far fewer iterations. If you plan to train the model and reproduce our numbers, please check out our supplementary material for the differences in hyper-parameters from FlowNet2 and PWCNet.

Pretrain on flying chairs and flying things

Make sure you have downloaded the FlyingChairs and FlyingThings subsets and placed them under the same folder, say /ssd/.

To first train on FlyingChairs for 140k iterations with a batch size of 8, run (assuming you have two GPUs):

CUDA_VISIBLE_DEVICES=0,1 python main.py --maxdisp 256 --fac 1 --database /ssd/ --logname chairs-0 --savemodel /data/ptmodel/  --epochs 1000 --stage chairs --ngpus 2

Then fine-tune on FlyingThings for 80k iterations with a batch size of 8, resuming from your pre-trained model (or from our pretrained model):

CUDA_VISIBLE_DEVICES=0,1 python main.py --maxdisp 256 --fac 1 --database /ssd/ --logname things-0 --savemodel /data/ptmodel/  --epochs 1000 --stage things --ngpus 2 --loadmodel ./weights/chairs/finetune_141999.tar --retrain false

Note that to resume the iteration count, put the iteration to start from in iter_counts-(your suffix).txt; in this example, 141999 goes into iter_counts-0.txt. Be aware that the program reads from and writes to iter_counts-(suffix).txt at training time, so use a different suffix for each training job running at the same time.
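
For example, a minimal way to set the counter (assuming the file lives in the directory where main.py is launched; check main.py for the exact path):

echo 141999 > iter_counts-0.txt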

Finetune on KITTI / Sintel

Please first download the KITTI 2012/2015 flow datasets if you want to fine-tune on KITTI, and download rob_devkit if you want to fine-tune on Sintel.

To fine-tune on KITTI with a batch size of 16, run

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --maxdisp 512 --fac 2 --database /ssd/ --logname kitti-trainval-0 --savemodel /data/ptmodel/  --epochs 1000 --stage 2015trainval --ngpus 4 --loadmodel ./weights/things/finetune_211999.tar --retrain true

To fine-tune on Sintel with a batch size of 16, run

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --maxdisp 448 --fac 1.4 --database /ssd/ --logname sintel-trainval-0 --savemodel /data/ptmodel/  --epochs 1000 --stage sinteltrainval --ngpus 4 --loadmodel ./weights/things/finetune_239999.tar --retrain true

Note on train-val

  • To tune hyper-parameters, we use a train-val split for kitti and sintel, which is not covered by the above procedure.
  • For KITTI we use every 5th image in the training set (0, 5, 10, ..., 195) for validation and the rest for training (see the snippet after this list); for Sintel, we manually select several sequences for validation.
  • If you plan to use our split, put "--stage 2015train" or "--stage sinteltrain" for training.
  • The numbers in Tab. 3 of the paper are on the whole train-val set (all the data with ground truth).
  • You might find run.sh helpful to run evaluation on KITTI/Sintel.
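
For concreteness, the KITTI-15 split described above can be generated as follows (a sketch; the authoritative lists come from the dataloader scripts mentioned earlier):

val_idx = list(range(0, 196, 5))                    # 0, 5, ..., 195 -> 40 validation frames
train_idx = [i for i in range(200) if i % 5 != 0]   # the remaining 160 frames for training
assert len(val_idx) == 40 and len(train_idx) == 160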

Measure FLOPS

python flops.py

gives

PWCNet: flops(G)/params(M):90.8/9.37

VCN: flops(G)/params(M):96.5/6.23
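
If you want a rough sanity check of such counts on your own model, a generic FLOP counter works. Below is a minimal sketch using thop (Lyken17's pytorch-OpCounter, acknowledged below) on a stand-in network; this is an assumption about tooling, not necessarily how flops.py counts:

import torch
import torch.nn as nn
from thop import profile

# Stand-in two-layer network; substitute the VCN/PWCNet construction from main.py.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 2, 3, padding=1))
im = torch.randn(1, 3, 256, 448)                 # dummy KITTI-like input
macs, params = profile(model, inputs=(im,))      # thop counts multiply-accumulates
print('flops(G)/params(M): %.1f/%.2f' % (macs / 1e9, params / 1e6))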

Note on inference time

The current implementation runs at 180 ms/pair on KITTI-sized images at inference time. A rough breakdown of the running time: feature extraction - 4.9%, feature correlation - 8.7%, separable 4D convolutions - 56%, truncated soft-argmin (soft winner-take-all) - 20%, and hypotheses fusion - 9.5%.

Note that the separable 4D convolutions use fewer FLOPS than the 2D convolutions (i.e., the feature extraction and hypotheses fusion modules; 47.8 vs. 53.3 GFLOPS) but take about 4x more time (56% vs. 14.4%). One reason might be that PyTorch (like other packages) is better optimized for networks with more feature channels than for those with large spatial sizes, given the same FLOPS. This might be fixed at the conv kernel / hardware level.
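
To make the factorization concrete, here is a minimal sketch of a separable 4D convolution over a cost volume, implemented as two batched 2D convolutions: one over the displacement dimensions (u, v) and one over the spatial dimensions (h, w). This illustrates the general idea only; the repo's conv4d.py is structured differently:

import torch
import torch.nn as nn

class Separable4dConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.conv_uv = nn.Conv2d(c_in, c_out, k, padding=k // 2)   # displacement (WTA) kernel
        self.conv_hw = nn.Conv2d(c_out, c_out, k, padding=k // 2)  # spatial kernel

    def forward(self, x):  # x: (b, c, u, v, h, w) cost volume
        b, c, u, v, h, w = x.shape
        # convolve over (u, v), treating each spatial location as a batch element
        x = x.permute(0, 4, 5, 1, 2, 3).reshape(b * h * w, c, u, v)
        x = self.conv_uv(x)
        c2 = x.shape[1]
        # convolve over (h, w), treating each displacement hypothesis as a batch element
        x = x.reshape(b, h, w, c2, u, v).permute(0, 4, 5, 3, 1, 2).reshape(b * u * v, c2, h, w)
        x = self.conv_hw(x)
        return x.reshape(b, u, v, c2, h, w).permute(0, 3, 1, 2, 4, 5)

out = Separable4dConv(16, 16)(torch.randn(1, 16, 9, 9, 32, 32))  # -> (1, 16, 9, 9, 32, 32)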

Besides, the truncated soft-argmin is implemented with 3D max pooling, which is inefficient and takes more time than expected.
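
For intuition, a truncated soft-argmin restricts the softmax to a window around the best hypothesis before taking the expectation. The sketch below uses an explicit argmax mask over one displacement dimension for clarity; the repo instead implements the windowing with 3D max pooling, as noted above:

import torch
import torch.nn.functional as F

def truncated_soft_argmin(cost, disps, window=5):
    # cost:  (b, d, h, w) matching cost for d flow hypotheses (lower = better)
    # disps: (d,) displacement value of each hypothesis
    score = -cost
    idx = score.argmax(dim=1, keepdim=True)                  # best hypothesis per pixel
    ar = torch.arange(score.shape[1], device=score.device).view(1, -1, 1, 1)
    keep = (ar - idx).abs() <= window // 2                   # window around the best
    prob = F.softmax(score.masked_fill(~keep, float('-inf')), dim=1)
    return (prob * disps.view(1, -1, 1, 1)).sum(dim=1)       # expected displacement

flow_x = truncated_soft_argmin(torch.randn(1, 9, 32, 32), torch.arange(-4., 5.))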

Acknowledgement

Thanks to ClementPinard, Lyken17, NVlabs, and many others for open-sourcing their code.

Citation

@inproceedings{yang2019vcn,
  title={Volumetric Correspondence Networks for Optical Flow},
  author={Yang, Gengshan and Ramanan, Deva},
  booktitle={NeurIPS},
  year={2019}
}
Comments
  • About separable 4d convolution

    Hello, thank you for introducing nice work and code!

    I have a question about separable 4d convolution.

    Is the WTA operation explicitly applied in the separable 4D convolution? In the code, I can't find the WTA part in conv4d.py. Or is the WTA kernel in Sec. 3.2 the same as the other kernel (e.g., the spatial kernel)?

    Also, could you explain (or derive) in more detail how the 4D convolution is decomposed into the WTA and spatial convolutions (Sec. 3.2, 4D convolution -> factorization)? It was hard for me to understand from the equations how the original 4D convolution equals the factorized one.

    I would appreciate it if you could answer these questions. Thank you!

    opened by shim94kr 4
  • How to evaluate on KITTI and Sintel

    Hi gengshan~~ Thanks for your wonderful work and code!

    I have completed the three training stages (FlyingChairs->FlyingThings->KITTI) and tried the following commands to evaluate my model:

    modelname=kitti-ft-trainval
    i=149999
    CUDA_VISIBLE_DEVICES=0 python submission.py --dataset 2015 --datapath /ssd/kitti_scene/training/   --outdir ./weights/$modelname/ --loadmodel ./weights/$modelname/finetune_$i.tar  --maxdisp 512 --fac 2
    python eval_tmp.py --path ./weights/$modelname/ --vis no --dataset 2015
    

    But I get the following error:

    epe = np.sqrt(np.power(gtflow - flow,2).sum(-1))[mask]
    ValueError: operands could not be broadcast together with shapes (370,1224,2) (375,1242,2)
    

    Another problem is how to fine-tune my model on Sintel; I am confused by the "rob_devkit" tool. When I run python flow_devkit.py, it asks me which format to use, with two choices: middlebury and kitti. Which format should I use? I tried the kitti format and got the following folder structure:

    rob_devkit/
        flow/
            datasets_kitti2015/
                metadata/
                test/
                training/
                bundler/
    

    But in your main.py, the Sintel dataset is loaded by iml0, iml1, flowl0 = ls.dataloader('%s/rob_flow/training/'%args.database). I don't know where to find rob_flow.

    I am a newcomer to this field and have been puzzled by these problems for several days. Could you please help me?

    opened by ghost 4
  • How to test the PWC-Net module

    Hi, thank you for your great work.

    I have used the VCN module and it works well. But when I wanted to train the PWC-Net module for comparison, I found that just changing this (line 201 in main.py)

    model = VCN([batch_size//ngpus]+data_inuse.datasets[0].shape[::-1], md=[int(4*(args.maxdisp/256)), 4,4,4,4], fac=args.fac)
    

    to

    model = PWCDCNet([batch_size//ngpus]+data_inuse.datasets[0].shape[::-1])
    

    does not work; it shows:

    Iter 2 training loss = -7931635.000 , AEPE = 234449312.000 , time = 0.16
    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=77 : an illegal memory access was encountered
    

    I think it is because in models/PWCNet.py, line 300, the output is

    return flow2*20,flow3*20,flow4*20,flow5*20,flow6*20,flow2, flow2[:,0]
    

    It seems that output[-2] and output[-1] are not the required loss and oor, so should I rewrite this part as in VCN in order to train PWCNet? Namely, adding

                oor2 = F.upsample(oor2[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
                oor3 = F.upsample(oor3[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
                oor4 = F.upsample(oor4[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
                oor5 = F.upsample(oor5[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
                oor6 = F.upsample(oor6[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
                loss += self.get_oor_loss(flowl0[:,:2]-0,        oor6, (64* self.flow_reg64.flowx.max()),occ_mask)
                loss += self.get_oor_loss(flowl0[:,:2]-up_flow6, oor5, (32* self.flow_reg32.flowx.max()),occ_mask)
                loss += self.get_oor_loss(flowl0[:,:2]-up_flow5, oor4, (16* self.flow_reg16.flowx.max()),occ_mask)
                loss += self.get_oor_loss(flowl0[:,:2]-up_flow4, oor3, (8* self.flow_reg8.flowx.max())  ,occ_mask)
                loss += self.get_oor_loss(flowl0[:,:2]-up_flow3, oor2, (4* self.flow_reg4.flowx.max())  ,occ_mask)
    
                return flow2*20, flow3*20,flow4*20,flow5*20,flow6*20,loss, oor2
    

    Thank you very much for any help!

    opened by littlespray 3
  • Questions about codes

    Hi gengshan~

    Thanks for your wonderful work and code!

    When I tried to train VCN following the "Pretrain on flying chairs and flying things" part of the README, I ran into the error "one of the variables needed for gradient computation has been modified by an inplace operation". I only modified some paths for my own environment and didn't change other parts.

    I am confused by this error, so I wonder: (1) have you met this error before? (2) can you give me some suggestions for solving it?

    Thanks for your help! And I wish to hear from you soon!

    Best wishes.

    Blcony

    opened by Blcony 3
  • about the third channel of flow

    Hi, to my understanding, flowl0 is of shape [H, W, 3], where the third channel indicates whether the flow at that position is valid. Specifically, mask_0 is used to track valid pixels after the spatial-transformation augmentation. However, I find that the target variable always has two channels, so every time you use target[..., 2:3], you get a numpy array of shape [H, W, 0], which seems erroneous? https://github.com/gengshan-y/VCN/blob/00c4befdbdf4e42050867996a6f686f52086e01a/dataloader/flow_transforms.py#L206

    opened by btwbtm 2
  • the number of Training epoch

    Hi,

    I'm trying to reproduce your results. In the README.md, fine-tuning on KITTI/Sintel uses 1000 epochs.

    Does the model need to be trained for 1000 epochs to reproduce the results in the paper?

    thx.

    opened by OreoChocolate 2
  • Real-time inference?

    Hi, congrats on your work. I would like to know whether, at inference time, it is possible to use this model as part of a framework that performs online action detection (i.e., suppose I have an input video stream; can the model compute optical-flow features as the frames arrive, at real-time speed?)

    Thank you!

    opened by FedericoVasile1 2
  • Unable to untar the pre-trained VCN model weights

    Hi,

    Thanks so much for open-sourcing your work!

    I am trying to load your pre-trained model weights from (https://drive.google.com/drive/folders/1mgadg50ti1QdwfAf6aR2v1pCx-ITsYfE?usp=sharing), but I am not able to untar finetune_149999.tar from either "chairs" or "kitti-ft-trainval". Below is the error message:

    tar -xvf finetune_149999.tar
    tar: Error opening archive: Unrecognized archive format
    
    opened by pichuang1984 2
  • Chair and Things dataset when fine-tuning on Sintel

    Hi Gengshan,

    You added the Chairs, Things, HD1K, and KITTI datasets to the Sintel dataset when fine-tuning. I guess this operation aims to improve generalization and avoid overfitting. I was wondering: is this important for the Sintel benchmark result and the training-set performance?

    Best Regards

    opened by jytime 2
  • RuntimeError: expected device cuda:1 but got device cuda:0

    Hi everyone! I am a beginner in PyTorch and computer vision. I just cloned the source code (without installing the correlation module; I thought it is not necessary for training?) and began training using the command:

    CUDA_VISIBLE_DEVICES=1,2 python main.py \
    --maxdisp 256 --fac 1 \
    --database /ssd2/ \
    --logname chairs-0 \
    --savemodel /ssd1/models/vcn/ \
    --epochs 1000 --stage chairs --ngpus 2
    

    and I got the following traceback:

    File "main.py", line 258, in train
          vis['AEPE'] = realEPE(output[0].detach(), flowl0.permute(0,3,1,2).detach(),mask,sparse=False)
    File "/home/VCN/utils/multiscaleloss.py", line 86, in realEPE
          return EPE(upsampled_output, target,mask, sparse, mean=True)
    File "/home/VCN/utils/multiscaleloss.py", line 12, in EPE
          EPE_map = torch.norm(target_flow-input_flow,2,1)
    

    Thanks very much for any help.

    opened by littlespray 2
  • Weight decay during training

    Hi Gengshan,

    In the supplementary material of VCN, you highlight that the weight-decay term used in PWCNet is removed. May I ask what the effect of removing it is? Is it for faster training or for a performance boost?

    Best Regards, Jianyuan

    opened by jytime 2
  • About loss

    Hi,

    Thank you for sharing your codes.

    I am a little confused: in the supplementary material of your paper, you say that you used an L1 loss + OOR loss when fine-tuning on the Sintel and KITTI datasets.

    However, your code shows an L2 loss plus the OOR loss.

    I would like to know which loss I should use.

    opened by ChangyuLNeu 2
  • question about the code

    https://github.com/gengshan-y/VCN/blob/00c4befdbdf4e42050867996a6f686f52086e01a/dataloader/flow_transforms.py#L186-L189 Hi, could you please explain what this exit condition means? thx!

    opened by btwbtm 1
  • Kitti 2015 train and val epe

    Hi,

    I am trying to reproduce your KITTI-15 validation (40 images) Fl-err and EPE results. I am using finetune_149999.tar under kitti-ft-trainval; however, I am unable to reproduce either your Fl-err (3.9) or EPE (1.144).

    For EPE on 40 images, the average EPE I got is 2.527. As a reference, the average EPE is 2.5 for the kitti-train.

    To calculate the EPE, I am using the following approach:

    import numpy as np

    aepe_s = []
    for index in range(len(test_left_img)):
        flow, _ = VCN(input)          # 1. run the network on the image pair
        # 2. resize flow to the KITTI height and width
        mask = gtflow[:, :, 2] == 1
        gtflow = gtflow[:, :, :2]
        flow = flow[:, :, :2]
        epe = np.sqrt(np.power(gtflow - flow, 2).sum(-1))[mask]
        aepe_s.append(epe.mean())

    result = sum(aepe_s) / len(aepe_s)
    
    

    Is this the right approach? Also, I don't seem to be able to find the code that computes Fl-err. Could you share that as well?

    Thanks,

    opened by pichuang1984 1
  • Visualize.ipynb

    Hello, I am trying to run the example notebook. However, when I run the line flow,entropy = inference(imgL_o, imgR_o, maxdisp, fac, modelpath), I get this error:

    ~/Documents/GitHub/VCN/models/conv4d.py in forward(self, x)
        163 
        164     def forward(self,x):
    --> 165         b,c,u,v,h,w = x.size()
        166         x = self.conv1(x.view(b,c,u,v,h*w))
        167         if self.with_bn:
    
    ValueError: not enough values to unpack (expected 6, got 5)
    

    This happens with both the sintel and KITTI models.

    opened by realleyriley 1