Code release for "COTR: Correspondence Transformer for Matching Across Images"


COTR: Correspondence Transformer for Matching Across Images

This repository contains the inference code for COTR. We plan to release the training code in the future. COTR establishes correspondences in a functional, end-to-end fashion, and solves both the dense and the sparse correspondence problem in a single framework.

Demos

Check out our demo video here.

1. Install environment

Our implementation is based on PyTorch. Install the conda environment by: conda env create -f environment.yml.

Activate the environment by: conda activate cotr_env.

Note that we use scipy=1.2.1.

2. Download the pretrained weights

Download the pretrained weights here. Extract them into ./out, so that the weights file is at ./out/default/checkpoint.pth.tar.

3. Single image pair demo

python demo_single_pair.py --load_weights="default"

Example sparse output:

Example dense output with triangulation:

Note: This example uses 10K valid sparse correspondences to densify.
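The densification step can be sketched with scipy alone: interpolate the sparse matches over a Delaunay triangulation to obtain a per-pixel correspondence map. This is a minimal illustration under assumed array layouts, not the repo's actual implementation:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def densify(corrs, h, w):
    """Interpolate sparse matches into a dense correspondence map.

    `corrs` is an (N, 4) array of (x, y, x', y') rows: columns 0-1 are
    coordinates in image 1, columns 2-3 the matched coordinates in image 2.
    Returns an (h, w, 2) map giving, for every pixel of image 1, its
    estimated (x', y') location in image 2 (NaN outside the convex hull).
    """
    # LinearNDInterpolator triangulates the input points (Delaunay) and
    # interpolates the matched coordinates linearly over each triangle.
    interp = LinearNDInterpolator(corrs[:, :2], corrs[:, 2:])
    ys, xs = np.mgrid[0:h, 0:w]
    dense = interp(np.stack([xs.ravel(), ys.ravel()], axis=1))
    return dense.reshape(h, w, 2)
```

With 10K valid sparse correspondences as input, this produces a dense map for every pixel inside their convex hull.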

4. Facial landmarks demo

python demo_face.py --load_weights="default"

Example:

5. Homography demo

python demo_homography.py --load_weights="default"
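For context, fitting a homography to a set of predicted correspondences can be done with a plain Direct Linear Transform (DLT) solve. This is a self-contained numpy sketch of that step, not the code the demo actually runs:

```python
import numpy as np

def fit_homography(src, dst):
    """Fit H such that dst ~ H @ src in homogeneous coordinates (DLT).

    `src` and `dst` are (N, 2) arrays of matched points, N >= 4.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the constraint A h = 0.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

In practice one would wrap this in RANSAC to reject outlier matches before the final fit.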

Citation

If you use this code in your research, please cite the paper:

@article{jiang2021cotr,
  title={{COTR: Correspondence Transformer for Matching Across Images}},
  author={Wei Jiang and Eduard Trulls and Jan Hosang and Andrea Tagliasacchi and Kwang Moo Yi},
  journal={arXiv preprint arXiv:2103.14167},
  year={2021}
}
Comments
  • Matching time

    Hello, thank you for your excellent work. I have a question: how should I understand that querying one point at a time can achieve 35 correspondences per second? The paper says: "Our currently non-optimized prototype implementation queries one point at a time, and achieves 35 correspondences per second on a NVIDIA RTX 3090 GPU." I have recently been running your code; on an NVIDIA RTX 3090 GPU, demo_single_pair.py took about 30 s for matching. Is that normal? Thank you!

    opened by zhirui-gao 19
  • find the coordinates of the corresponding point (x', y') on another picture.

    Thank you for the outstanding work you do. Is it possible to enter the coordinates of a point (x, y) in one image and find the coordinates of the corresponding point (x', y') in the other image?
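Conceptually, this is exactly COTR's interface: a query coordinate in one image maps to a coordinate in the other. The paper describes the two images as concatenated side by side, with coordinates normalized to [0, 1] over the combined canvas. A small sketch of that coordinate bookkeeping (the helper names here are made up, and the repo's own conversion code may differ):

```python
import numpy as np

def to_query(xy, w, h):
    """Normalize a pixel (x, y) in image A to the concatenated query space.

    Image A occupies x in [0, 0.5] of the normalized canvas and image B
    occupies x in [0.5, 1.0], per the side-by-side concatenation in the paper.
    """
    x, y = xy
    return np.array([x / w * 0.5, y / h])

def from_output(q, w, h):
    """Map a predicted normalized coordinate back to pixels in image B."""
    return np.array([(q[0] - 0.5) * 2.0 * w, q[1] * h])
```

The model's prediction for a query in image A lands in image B's half of the canvas, which `from_output` converts back to pixel coordinates.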

    opened by lllllialois 9
  • patch partition?

    Thank you for such excellent work. I have some questions about COTR. During training, do you divide the scene images into 256×256 patches according to certain rules after scaling, and then feed them to the network? (I'm not sure where this step is implemented in the code.) How are the correspondences partitioned? Can a corresponding point end up in a neighboring patch, and how is that case handled? Is the validation process similar to the training process after the split iteration?

    opened by zbc-l 5
  • How is the warped image in Figure 9 generated?

    Hi, thanks for the great work! I'm curious how you generate the warped image in Figure 9 from the dense flow. If I understand correctly, you input a pixel coordinate (x, y) in img1 and get its corresponding coordinate (x', y') in img2. Then you copy the RGB value at (x, y) to (x', y') in img2, and repeat this for all coordinates in img1. Is that correct? Or is there a more efficient way of doing it (like the one mentioned in #28)?
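The copy-pixels scheme described above is a forward warp (splatting). A naive vectorized sketch in numpy, where colliding pixels are resolved as last-write-wins (the paper's figures may well be produced differently):

```python
import numpy as np

def forward_warp(img, flow_xy):
    """Splat each pixel of `img` to its matched location.

    `flow_xy[y, x]` gives the (x', y') match of pixel (x, y). Pixels that
    land outside the target are dropped; collisions are resolved naively
    (last write wins). Unmatched target pixels stay zero (holes).
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(flow_xy[..., 0]).astype(int).ravel()
    yt = np.rint(flow_xy[..., 1]).astype(int).ravel()
    valid = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    out[yt[valid], xt[valid]] = img[ys.ravel()[valid], xs.ravel()[valid]]
    return out
```

The usual more efficient alternative is a backward warp: for each target pixel, sample the source image at the inverse flow with bilinear interpolation, which avoids holes entirely.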

    opened by Wuziyi616 4
  • Question

    What does the dense correspondence map in Figure 1 mean? How is it obtained, and how is it represented numerically? I only know that it is the dense correspondence between the two images. What does the color-coded 'x' channel mean?

    opened by j1o2h3n 4
  • TypeError: 'NoneType' object is not callable

    Thank you very much for open-sourcing the code! When I run python demo_single_pair.py --load_weights="default", this error appears. Could you give me some debugging advice?

    opened by USTC-wlsong 4
  • Possible redundancy in the code

    Hi, I notice that when constructing the Transformer, you always return the intermediate features at this line. However, after feeding them to the MLP for correspondence regression, you only take the prediction from the last layer at this line. So I guess you could set return_intermediate=False to save some memory/computation?

    opened by Wuziyi616 3
  • Dense optical flow as in paper Figure 1 (c)

    Hi, thanks for the great work! I wonder how I can estimate the optical flow between two images. Say img1 has shape [H, W]; can I simply reshape the grid coordinates to [H*W, 2] and pass them as queries_a, as in this demo?
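Building such a dense query grid is a one-liner; a sketch (note that querying all H*W points one at a time is slow, so batching the queries is advisable):

```python
import numpy as np

def make_query_grid(h, w):
    """All pixel coordinates of an (h, w) image as an (h*w, 2) array of (x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
```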

    opened by Wuziyi616 3
  • Question

    Hello, when running the code with the pre-trained model, I get RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 7.79 GiB total capacity; 2.90 GiB already allocated; 1.83 GiB free; 4.80 GiB reserved in total by PyTorch). Is there any solution? For example, which parameters should I adjust?

    opened by Lucifer1002 2
  • Rotation angle

    Hello, I would like to ask: when COTR extracts the common-view area, for some scenes with too large a rotation angle the common-view area cannot be extracted. What is the possible reason for this?

    opened by Lucifer1002 1
  • Match time

    Hello, regarding COTR: if I use another feature extraction method to obtain the feature point positions in the image and input those, can I reduce COTR's feature matching time?

    opened by Lucifer1002 1
  • How can I ensure the smoothness of point movement when key point tracking is performed on the video?

    How can I ensure the smoothness of point movement when tracking key points in a video? I am finding the key points frame by frame, but the result is very unsmooth and repeatedly jumps and drifts.

    opened by lllllialois 0
  • About ETH3D evaluation

    Hi Wei, thanks for sharing the code.

    Would it be possible to provide the ETH3D evaluation code? I was wondering about the data flow of the model's forward propagation.

    Look forward to your reply. Regards

    opened by CARRLIANSS 3
  • Sharing raw data of ETH3D and KITTI

    Hi everyone:

    I'd like to share the raw output from COTR for the ETH3D and KITTI datasets.

    ETH3D eval: https://drive.google.com/file/d/1pfAuHRK7FvB6Hc9Rru-beH6F-2lpZAk6/view?usp=sharing

    KITTI: https://drive.google.com/file/d/1SiN5UbqautqosUCInQN2WhyxbRcbWt8b/view?usp=sharing

    The format is {src_id}->{tgt_id}.npy, and I saved the results as a dictionary. There are several keys: "raw_corr", "drifting_forward", and "drifting_backward". "raw_corr" holds the raw sparse correspondences in XYXY format, and "drifting_forward" and "drifting_backward" are the masks used to filter out drifted predictions.

    documentation 
    opened by jiangwei221 10
  • About HPatches datasets

    Thanks very much for your great work! I want to know how you test and evaluate on the HPatches dataset (in the code). Can you tell me how to get the relevant code?

    opened by ifuramango 2
  • training

    Hello, I would like to ask: did you use the complete MegaDepth dataset for training, or only a selected part of it? If convenient, could you provide the training data?

    opened by Lucifer1002 3
Owner

UBC Computer Vision Group, University of British Columbia