Code release for "COTR: Correspondence Transformer for Matching Across Images"

UBC Computer Vision Group

Last update: Jan 6, 2023

Overview

COTR: Correspondence Transformer for Matching Across Images

This repository contains the inference code for COTR. We plan to release the training code in the future. COTR establishes correspondences in a functional, end-to-end fashion. It solves both dense and sparse correspondence problems within the same framework.

Demos

Check out our demo video here.

1. Install environment

Our implementation is based on PyTorch. Install the conda environment by: conda env create -f environment.yml.

Activate the environment by: conda activate cotr_env.

Note that we use scipy=1.2.1.

2. Download the pretrained weights

Download the pretrained weights here. Extract them into ./out, such that the weights file is at ./out/default/checkpoint.pth.tar.
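Before running the demos, it can help to confirm that the weights landed where the scripts expect them. The snippet below is a minimal sketch, assuming the archive is extracted under ./out relative to the repository root and that the value passed to --load_weights names a subdirectory there. The helper name find_checkpoint is hypothetical and is not part of the COTR code.

```python
from pathlib import Path


def find_checkpoint(out_dir="./out", name="default"):
    """Return the expected checkpoint path, or raise if the file is missing.

    Assumption: weights live at <out_dir>/<name>/checkpoint.pth.tar,
    following the layout described in the step above.
    """
    ckpt = Path(out_dir) / name / "checkpoint.pth.tar"
    if not ckpt.is_file():
        raise FileNotFoundError(f"Expected pretrained weights at {ckpt}")
    return ckpt


if __name__ == "__main__":
    try:
        print("Found weights at", find_checkpoint())
    except FileNotFoundError as err:
        print(err)
```

If the file is reported missing, check whether the archive was extracted one directory level too deep or too shallow.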

3. Single image pair demo

python demo_single_pair.py --load_weights="default"

Example sparse output:

Example dense output with triangulation:

Note: This example uses 10K valid sparse correspondences to densify.

4. Facial landmarks demo

python demo_face.py --load_weights="default"

Example:

5. Homography demo

python demo_homography.py --load_weights="default"

Citation

If you use this code in your research, cite the paper:

@article{jiang2021cotr,
  title={{COTR: Correspondence Transformer for Matching Across Images}},
  author={Wei Jiang and Eduard Trulls and Jan Hosang and Andrea Tagliasacchi and Kwang Moo Yi},
  booktitle={arXiv preprint},
  publisher_page={https://arxiv.org/abs/2103.14167},
  year={2021}
}

Comments

Matching time

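Hello, thank you for the excellent work. I have a question: how should I understand querying one point at a time and achieving 35 correspondences per second? "Our currently non-optimized prototype implementation queries one point at a time, and achieves 35 correspondences per second on a NVIDIA RTX 3090 GPU. " I have been running your code recently. On an NVIDIA RTX 3090 GPU, matching with demo_single_pair.py took about 30 s. Is this normal? Thank you!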

opened by zhirui-gao 19
find the coordinates of the corresponding point (x', y') on another picture.

Thank you for the outstanding work you do. I would like to ask if it is possible to enter the coordinates of a point (x, y) and find the coordinates of the corresponding point (x', y') on another picture.

opened by lllllialois 9
patch partition?

Thank you for such excellent work. I have some questions about COTR. During training, do you divide the scene images into 256*256 patches according to certain rules after scaling, and then feed them into the network? (I'm not sure where this step is implemented in the program.) How is corrs partitioned? Can a corresponding point end up in a different patch? How should this be handled? Is the validation process also similar to the training process after the split iteration?

opened by zbc-l 5
How is the warped image in Figure 9 generated?

Hi, thanks for the great work! I'm curious about how you generate the warped image in Figure 9 from the dense flow. If I understand correctly, you input a pixel coordinate (x, y) in img1 and get its corresponding coordinate (x', y') in img2. Then you just copy the RGB value at (x, y) to (x', y') in img2, and repeat this for all the coordinates in img1. Am I correct? Or is there a more efficient way of doing so (like you've mentioned in #28)?
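The copy-based procedure described in this question can be sketched with NumPy as below. This only illustrates the question's own description and is not necessarily how Figure 9 was produced. The function name forward_warp and the (H, W, 2) layout of the correspondence array are assumptions made for the example.

```python
import numpy as np


def forward_warp(img1, corr_xy, img2_shape):
    """Copy each img1 pixel to its predicted location on an img2-sized canvas.

    img1:       (H, W, C) image array.
    corr_xy:    (H, W, 2) array holding the predicted (x', y') for every
                img1 pixel (assumed layout, in img2 pixel coordinates).
    img2_shape: (height, width) of the target image.
    """
    h2, w2 = img2_shape
    warped = np.zeros((h2, w2, img1.shape[2]), dtype=img1.dtype)

    # Round predictions to the nearest target pixel.
    x2 = np.rint(corr_xy[..., 0]).astype(int)
    y2 = np.rint(corr_xy[..., 1]).astype(int)

    # Drop predictions that fall outside the target image.
    valid = (x2 >= 0) & (x2 < w2) & (y2 >= 0) & (y2 < h2)

    # Vectorized copy; when several source pixels land on the same
    # target pixel, only one of them is kept.
    warped[y2[valid], x2[valid]] = img1[valid]
    return warped
```

Forward copying like this leaves holes wherever no source pixel lands, so the result usually needs interpolation or inpainting before it looks like a dense warp.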

opened by Wuziyi616 4
Question

What does the dense correspondence map in Figure 1 mean, how is it obtained, and how is it represented numerically? I only know that it is the dense correspondence between the two images. What does the color-coded 'x' channel mean?

opened by j1o2h3n 4
TypeError: 'NoneType' object is not callable

Thank you very much for your open source code! When I run python demo_single_pair.py --load_weights="default", this error appears. Could you give me some debugging advice?

opened by USTC-wlsong 4
Possible redundancy in the code

Hi, I notice that when constructing the Transformer, you always return the intermediate features at this line. However, after feeding them to MLP for corr regression, you only take the prediction over the last layer at this line. So I guess maybe you can set return_intermediate=False to save some memory/computation?

opened by Wuziyi616 3
Dense optical flow as in paper Figure 1 (c)

Hi, thanks for the great work! I wonder how I can estimate the optical flow between two images. Say img1 is of shape [H, W]; can I basically reshape the grid coordinates to [H*W, 2] and then pass them as queries_a as in this demo?
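The grid reshaping described in this question can be sketched as follows. This is a minimal illustration, assuming queries are (x, y) pixel coordinates in row-major order; the coordinate convention the demo actually expects for queries_a should be checked against the demo script. The helper names dense_queries and to_flow are hypothetical.

```python
import numpy as np


def dense_queries(height, width):
    """Return an (H*W, 2) array of (x, y) pixel coordinates, row by row."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    return np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float32)


def to_flow(queries, matches, height, width):
    """Turn per-query matches (H*W, 2) into an (H, W, 2) displacement field."""
    return (matches - queries).reshape(height, width, 2)


if __name__ == "__main__":
    q = dense_queries(4, 6)
    print(q.shape)  # (24, 2)
```

Since the README's quoted prototype queries one point at a time, a full-resolution grid can be slow; a coarser grid followed by interpolation is one possible trade-off.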

opened by Wuziyi616 3
Question

Hello, when running the code with the pre-trained model, I get RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 7.79 GiB total capacity; 2.90 GiB already allocated; 1.83 GiB free; 4.80 GiB reserved in total by PyTorch). Is there any solution? For example, which parameters should I adjust?

opened by Lucifer1002 2
Rotation angle

Hello, I would like to ask: when COTR extracts the common view area, for some scenes with too large a rotation angle, the common area cannot be extracted. What is the possible reason for this?

opened by Lucifer1002 1
Match time

Hello, about COTR, if I use other feature extraction methods to get the feature point positions of the image and input them, can I reduce the time of COTR feature matching?

opened by Lucifer1002 1
How can I ensure the smoothness of point movement when key point tracking is performed on the video?

How can I ensure the smoothness of point movement when key point tracking is performed on a video? I am trying to find the key points frame by frame, but the result is not smooth: the points jump and drift repeatedly.

opened by lllllialois 0
About ETH3D evaluation

Hi Wei, thanks for sharing the code.

Would it be possible to provide the ETH3D evaluation code? I was wondering how data flows through the model's forward pass.

Look forward to your reply. Regards

opened by CARRLIANSS 3
Sharing raw data of ETH3D and KITTI

Hi everyone:

I'd like to share the raw output from COTR for the ETH3D and KITTI datasets.

ETH3D eval: https://drive.google.com/file/d/1pfAuHRK7FvB6Hc9Rru-beH6F-2lpZAk6/view?usp=sharing

KITTI: https://drive.google.com/file/d/1SiN5UbqautqosUCInQN2WhyxbRcbWt8b/view?usp=sharing

The format is: {src_id}->{tgt_id}.npy, and I saved the results as a dictionary. There are several keys: "raw_corr", "drifting_forward", and "drifting_backward". "raw_corr" holds the raw sparse correspondences in XYXY format, and "drifting_forward" and "drifting_backward" are the masks used to filter out drifted predictions.
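Reading one of these files might look like the sketch below. It assumes each file is a dictionary saved with np.save (so it loads as a pickled 0-d object array) and that each mask holds one boolean per row of "raw_corr". The post does not say whether True marks a drifted or a kept prediction, so the polarity is exposed as a flag rather than guessed. The helper names load_result and filter_drifted are hypothetical.

```python
import numpy as np


def load_result(path):
    """Load one '{src_id}->{tgt_id}.npy' file stored as a pickled dict."""
    return np.load(path, allow_pickle=True).item()


def filter_drifted(result, drifted_is_true=True):
    """Return the rows of raw_corr that pass both drift masks.

    Assumption: masks are boolean with one entry per correspondence.
    drifted_is_true=True treats True as "drifted, discard";
    drifted_is_true=False treats True as "valid, keep".
    """
    corr = np.asarray(result["raw_corr"])
    fwd = np.asarray(result["drifting_forward"], dtype=bool)
    bwd = np.asarray(result["drifting_backward"], dtype=bool)
    keep = ~(fwd | bwd) if drifted_is_true else (fwd & bwd)
    return corr[keep]
```

Comparing the number of kept rows under both settings on a single file is a quick way to tell which polarity the masks use.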

opened by jiangwei221 10
About HPatches datasets

Thanks very much for your great work! I would also like to know how you test and evaluate on the HPatches dataset (in the code). Can you tell me how to get the relevant code?

opened by ifuramango 2
training

Hello, I would like to ask whether you use the complete MegaDepth dataset for training or only a selected part of it. If it is convenient, could you provide the training data?

opened by Lucifer1002 3

Owner

UBC Computer Vision Group

University of British Columbia Computer Vision Group
