Joint deep network for feature line detection and description

Overview

SOLD² - Self-supervised Occlusion-aware Line Description and Detection

This repository contains the implementation of the paper: SOLD²: Self-supervised Occlusion-aware Line Description and Detection, J.-T. Lin*, R. Pautrat*, V. Larsson, M. Oswald and M. Pollefeys (Oral at CVPR 2021).

SOLD² is a deep line segment detector and descriptor that can be trained without hand-labelled line segments and that can robustly match lines even in the presence of occlusion.

Demos

Matching in the presence of occlusion: demo_occlusion

Matching with a moving camera: demo_moving_camera

Usage

Installation

We recommend using this code in a Python environment (e.g. venv or conda). The following script installs the necessary requirements with pip:

pip install -r requirements.txt

Set your dataset and experiment paths (where your datasets and experiment checkpoints will be stored) by modifying the file config/project_config.py. Both variables DATASET_ROOT and EXP_PATH have to be set.
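For illustration, a minimal sketch of these two variables is shown below; the placeholder paths and the exact layout of project_config.py are assumptions:

# config/project_config.py -- illustrative sketch, the real file may contain more settings
import os

# Root folder where the datasets (e.g. Wireframe) are stored.
DATASET_ROOT = os.path.expanduser("~/data/sold2_datasets")

# Folder where experiment checkpoints and exported files are written.
EXP_PATH = os.path.expanduser("~/experiments/sold2")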

You can download the version of the Wireframe dataset that we used during our training and testing here. This repository also includes some files to train on the Holicity dataset to add more outdoor images, but note that we did not extensively test this dataset and the original paper was based on the Wireframe dataset only.

Training your own model

All training parameters are located in the configuration files in the folder config. Training SOLD² from scratch requires several steps, some of which can take several days, depending on the size of your dataset.

Step 1: Train on a synthetic dataset

The following command will create the synthetic dataset and start training the model on it:

python experiment.py --mode train --dataset_config config/synthetic_dataset.yaml --model_config config/train_detector.yaml --exp_name sold2_synth
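For intuition, the synthetic pretraining data consists of simple renderings whose line segments and junctions are known by construction. The snippet below is only a rough sketch of that idea (random segments drawn with OpenCV), not the repository's actual dataset generator:

import numpy as np
import cv2

def random_synthetic_sample(img_size=512, n_lines=8, seed=None):
    """Draw random line segments on a blank canvas; their endpoints give free ground truth."""
    rng = np.random.default_rng(seed)
    img = np.full((img_size, img_size), 128, dtype=np.uint8)
    segments = rng.integers(0, img_size, size=(n_lines, 2, 2))  # n_lines x 2 endpoints x (x, y)
    for p1, p2 in segments:
        cv2.line(img, tuple(map(int, p1)), tuple(map(int, p2)),
                 color=int(rng.integers(0, 256)), thickness=2)
    junctions = segments.reshape(-1, 2)  # segment endpoints double as junction labels
    return img, segments, junctions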
Step 2: Export the raw pseudo ground truth on the Wireframe dataset with homography adaptation

Note that this step can take one to several days depending on your machine and on the size of the dataset. You can set the batch size to the maximum capacity that your GPU can handle.

python experiment.py --exp_name wireframe_train --mode export --resume_path <path to your previously trained sold2_synth> --model_config config/train_detector.yaml --dataset_config config/wireframe_dataset.yaml --checkpoint_name <name of the best checkpoint> --export_dataset_mode train --export_batch_size 4

You can do the same for the test set:

python experiment.py --exp_name wireframe_test --mode export --resume_path <path to your previously trained sold2_synth> --model_config config/train_detector.yaml --dataset_config config/wireframe_dataset.yaml --checkpoint_name <name of the best checkpoint> --export_dataset_mode test --export_batch_size 4
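Conceptually, homography adaptation aggregates the detector's predictions over many random warps of the same image to obtain a more repeatable pseudo ground truth. The sketch below only illustrates the idea; detector and sample_homography are hypothetical callables, and the repository's implementation differs in its aggregation details:

import numpy as np
import cv2

def homography_adaptation(img, detector, sample_homography, n_iter=100):
    """Average the detector's heatmaps predicted under random homographic warps of img."""
    h, w = img.shape[:2]
    acc = np.zeros((h, w), np.float32)
    counts = np.zeros((h, w), np.float32)
    for _ in range(n_iter):
        H = sample_homography(h, w)            # hypothetical helper returning a 3x3 homography
        warped = cv2.warpPerspective(img, H, (w, h))
        heatmap = detector(warped)             # hypothetical call returning an (h, w) heatmap
        # Warp the prediction back to the original frame and only count valid pixels.
        Hinv = np.linalg.inv(H)
        acc += cv2.warpPerspective(heatmap, Hinv, (w, h))
        counts += cv2.warpPerspective(np.ones((h, w), np.float32), Hinv, (w, h))
    return acc / np.maximum(counts, 1e-6)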
Step 3: Compute the ground truth line segments from the raw data
cd postprocess
python convert_homography_results.py <name of the previously exported file (e.g. "wireframe_train.h5")> <name of the new data with extracted line segments (e.g. "wireframe_train_gt.h5")> ../config/export_line_features.yaml
cd ..

We recommend testing the results on a few samples of your dataset to check the quality of the output, and modifying the hyperparameters if needed. For example, detect_thresh=0.5 and inlier_thresh=0.99 proved successful for the Wireframe dataset in our case.
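A quick way to perform such a check is to overlay a few extracted samples on their images. This is only a sketch, and the key names ('image', 'junctions') are assumptions that may differ from the actual layout of the exported h5 file:

import h5py
import matplotlib.pyplot as plt

with h5py.File("wireframe_train_gt.h5", "r") as f:
    for name in list(f.keys())[:3]:           # inspect the first few samples
        sample = f[name]
        img = sample["image"][()]
        junctions = sample["junctions"][()]   # assumed (N, 2) array of junction coordinates
        plt.imshow(img, cmap="gray")
        plt.scatter(junctions[:, 1], junctions[:, 0], s=6, c="lime")
        plt.title(name)
        plt.show()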

Step 4: Train the detector on the Wireframe dataset

We found it easier to pretrain the detector alone first, before fine-tuning it together with the descriptor part. Uncomment the lines 'gt_source_train' and 'gt_source_test' in config/wireframe_dataset.yaml and fill them with the paths to the h5 files generated in the previous step.

python experiment.py --mode train --dataset_config config/wireframe_dataset.yaml --model_config config/train_detector.yaml --exp_name sold2_wireframe

Alternatively, you can also fine-tune the already trained synthetic model:

python experiment.py --mode train --dataset_config config/wireframe_dataset.yaml --model_config config/train_detector.yaml --exp_name sold2_wireframe --pretrained --pretrained_path <path to the pre-trained sold2_synth> --checkpoint_name <name of the best checkpoint>

Lastly, you can resume a training that was stopped:

python experiment.py --mode train --dataset_config config/wireframe_dataset.yaml --model_config config/train_detector.yaml --exp_name sold2_wireframe --resume --resume_path <path to the model to resume> --checkpoint_name <name of the last checkpoint>
Step 5: Train the full pipeline on the Wireframe dataset

You first need to modify the field 'return_type' in config/wireframe_dataset.yaml to 'paired_desc'. The following command will then train the full model (detector + descriptor) on the Wireframe dataset:

python experiment.py --mode train --dataset_config config/wireframe_dataset.yaml --model_config config/train_full_pipeline.yaml --exp_name sold2_full_wireframe --pretrained --pretrained_path <path to the pre-trained sold2_wireframe> --checkpoint_name <name of the best checkpoint>
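For intuition, the 'paired_desc' mode provides the descriptor branch with pairs of views related by a known homography, so that corresponding line points can supervise the descriptors. The sketch below only illustrates how such a pair could be formed; sample_homography is a hypothetical helper, and the repository's data pipeline is more involved:

import numpy as np
import cv2

def make_descriptor_pair(img, sample_homography):
    """Build a (reference, warped) image pair whose line correspondences are known."""
    h, w = img.shape[:2]
    H = sample_homography(h, w)               # hypothetical helper returning a 3x3 homography
    warped = cv2.warpPerspective(img, H, (w, h))
    # Pixels that fall outside the warped view should be ignored by the descriptor loss.
    valid_mask = cv2.warpPerspective(np.ones((h, w), np.float32), H, (w, h)) > 0.5
    # A point [x, y] on a line in img corresponds to H @ [x, y, 1]^T in warped,
    # which defines which line points should receive similar descriptors.
    return img, warped, H, valid_mask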

Pretrained models

We provide the checkpoints of two pretrained models: the detector trained on the synthetic dataset only, and the full SOLD² model (detector + descriptor) trained on the Wireframe dataset (sold2_wireframe.tar).

How to use it

We provide a notebook showing how to use the trained model of SOLD². Additionally, you can use the model to export line features (segments and descriptor maps) as follows:

python export_line_features.py --img_list <path to a txt file containing the paths to all the images> --output_folder <path to the output folder> --checkpoint_path <path to your best checkpoint>

You can tune some of the line detection parameters in config/export_line_features.yaml, in particular the 'detect_thresh' and 'inlier_thresh' to adapt them to your trained model and type of images.
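As a rough illustration of how the exported segments and descriptor maps can be consumed, the sketch below samples descriptors along each segment (the descriptor map is at 1/4 of the image resolution) and matches lines by mutual nearest neighbours on their averaged descriptors. This is a deliberately simplified stand-in for the repository's own occlusion-aware matcher, and the array layouts are assumptions:

import numpy as np

def sample_line_descriptors(desc_map, segments, n_points=5):
    """Sample an (H/4, W/4, D) descriptor map at points regularly spaced along each segment."""
    descs = []
    for p1, p2 in segments:                      # segment endpoints in image coordinates (row, col)
        ts = np.linspace(0.0, 1.0, n_points)[:, None]
        pts = (1.0 - ts) * p1 + ts * p2          # points along the segment
        pts = np.round(pts / 4.0).astype(int)    # descriptor map is at 1/4 resolution
        pts[:, 0] = np.clip(pts[:, 0], 0, desc_map.shape[0] - 1)
        pts[:, 1] = np.clip(pts[:, 1], 0, desc_map.shape[1] - 1)
        d = desc_map[pts[:, 0], pts[:, 1]]       # (n_points, D), nearest-neighbour sampling
        descs.append(d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8))
    return np.stack(descs)                       # (n_lines, n_points, D)

def match_lines(desc0, desc1):
    """Mutual nearest-neighbour matching on per-line averaged descriptors."""
    m0, m1 = desc0.mean(axis=1), desc1.mean(axis=1)
    sim = m0 @ m1.T
    nn0, nn1 = sim.argmax(axis=1), sim.argmax(axis=0)
    return [(i, j) for i, j in enumerate(nn0) if nn1[j] == i]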

Results

Comparison of repeatability and localization error to the state of the art on the Wireframe dataset for an error threshold of 5 pixels in structural and orthogonal distances:

                  Structural distance      Orthogonal distance
                  Rep-5       Loc-5        Rep-5       Loc-5
LCNN              0.434       2.589        0.570       1.725
HAWP              0.451       2.625        0.537       1.725
DeepHough         0.419       2.576        0.618       1.720
TP-LSD TP512      0.563       2.467        0.746       1.450
LSD               0.358       2.079        0.707       0.825
Ours with NMS     0.557       1.995        0.801       1.119
Ours              0.616       2.019        0.914       0.816

Matching precision-recall curves on the Wireframe and ETH3D datasets: pred_lines_pr_curve

Bibtex

If you use this code in your project, please consider citing the following paper:

@InProceedings{Pautrat_Lin_2021_CVPR,
    author = {Pautrat, Rémi* and Lin, Juan-Ting* and Larsson, Viktor and Oswald, Martin R. and Pollefeys, Marc},
    title = {SOLD²: Self-supervised Occlusion-aware Line Description and Detection},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year = {2021},
}
Comments
  • How to train this model on the other datasets?

    Hello, thank you for sharing the code! If I want to train this model on other datasets such as the KITTI dataset, could you provide some steps or suggestions? Thank you very much.

    opened by weiningwei 31
  • The experiment setting of comparison with SuperPoint

    Hi, thank you so much for the paper and the code :) I find it very interesting that SOLD2 has almost double the performance of SuperPoint on Wireframe. I have some questions about this comparison, since I have been working on SuperPoint for a while:

    1. Did you train SuperPoint on Wireframe for the comparison, or just use MagicLeap's pretrained weights?
    2. SuperPoint:SOLD2 = 0.58:0.94 (see Fig. 1). Given that the rotation during validation is uniform in -90 to 90 degrees, did you observe that SuperPoint mainly did badly on data with rotations below -45 or above 45 degrees (the half with larger angles)? This assumption is based on the shown graph, where SuperPoint seems to perform poorly on large-angle data (see Fig. 2).

    Fig. 1 image

    Fig. 2 image

    opened by triangleCZH 17
  • The time cost of the training process in step 1

    Hello, thank you for sharing! In the training process of step 1, an epoch takes one to two hours, and you set 200 epochs. Is this training time normal? Is there any way to improve the training speed?

    opened by Machine97 16
  • Question about the descriptor branch

    Hello, thank you for sharing! The backbone encoding is processed by two consecutive convolutions with kernels 3 × 3 and 1 × 1 and output channels 256 and 128, to produce an h/4 × w/4 × 128 feature descriptor map. Could I modify the output channels "256 and 128" to "256 and 256"? Then I would get an h/4 × w/4 × 256 feature descriptor map. Will this improve the performance of the network, and do I need to modify anything else? (See the sketch below.)

    opened by Zhuyawen7 15
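    For reference, the head described in the question above could be sketched in PyTorch roughly as follows; the intermediate ReLU, the layer names, and the default channel counts are assumptions rather than the repository's exact module:

    import torch.nn as nn

    class DescriptorHead(nn.Module):
        """3x3 convolution followed by a 1x1 convolution producing a descriptor map at 1/4 resolution."""
        def __init__(self, in_channels=256, mid_channels=256, out_channels=128):
            super().__init__()
            self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = nn.Conv2d(mid_channels, out_channels, kernel_size=1)

        def forward(self, x):
            # x: backbone features at 1/4 resolution -> (B, out_channels, H/4, W/4) descriptor map
            return self.conv2(self.relu(self.conv1(x)))

    Widening out_channels from 128 to 256 mainly requires that whatever consumes the descriptor map (the matching code and the descriptor loss) also expects the new dimension.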
  • Question about occlusion

    Hi, thanks for your contribution. In Fig. 5, is the indicator the occlusion of each line, or the percentage of occlusion of the entire image? If it is the whole image, then assuming 0.6 occlusion, your accuracy is about 0.7. Is accuracy the ratio of correctly matched lines to detected lines, or the ratio of correctly matched lines to all ground-truth lines? Because I think if 0.6 of the whole image is occluded, then no more than 0.4 of the lines could be matched.

    opened by Zhuyawen7 14
  • junction point descriptor

    Hi @rpautrat,

    How are you?

    Thanks so much for your help every time! I appreciate it.

    Just wondering: when we predict the junction points using "SOLD2", will it output "point coordinates", "descriptors", etc., just like SuperPoint?

    Thanks so much and have a good week!

    opened by miaoqiz 13
  • Qs about the paper

    Hello there,

    I recently read your paper many times but still cannot understand the Descriptor Branch part. During a forward pass, the network goes through homography adaptation to get the ground truth, then through the backbone encoder and line detection module to get the predicted result (junction and line heatmaps). After that, does the dense descriptor compare these two results? If you can point me to any prior knowledge, it would be appreciated.

    Thanks in advance,

    Zhonghan Deng

    opened by d5423197 12
  • opencv-python==4.0.1.23

    Hello there,

    When trying to install all the requirements, I could not find opencv-python==4.0.1.23. The official page says this version has been yanked. Is there a substitute that can be used? By the way, what Python version is recommended for this project?

    Thanks,

    Zhonghan Deng

    opened by d5423197 10
  • KeyError in step 4 when training on Holicity dataset

    Hello there,

    I am trying to train on the Holicity dataset, but when I run step 4 I get the following error:

    holicity_dataset.py", line 767, in getitem data["junctions"] = exported_label["junctions"] KeyError: 'junctions'

    When I print the keys of exported_label I get the following keys: ['heatmap_cout', 'heatmap_prob_max', 'heatmap_prob_mean', 'image', 'junc_count', 'junc_prob_max', 'junc_prob_mean']

    Do you know what might be the problem? Thank you!

    opened by ISil14 10
  • combining both points and lines into a unified framework

    Hi, thanks for sharing your work. In the paper, you mentioned combining both points and lines into a unified framework. Is this something you are working on? What do you think the difficulties of this work are?

    opened by 2017DEVIL 7
  • the matching between day and night

    Hi, thanks for your great work. I found that the matching between day and night images is bad; can you give me some suggestions? Thank you very much. I used the pretrained model for testing and also trained by myself. (images attached)

    opened by zhengshunkai 7
  • How to match lines detected by LSD

    Hello, thank you very much for your work! I wonder how to match lines detected by LSD with your descriptor. I can only use the code in the Jupyter notebook to match lines detected by SOLD2.

    opened by JosepLee 1
  • 3D Line Reconstruction

    Hi, thanks for sharing your work.

    I noticed that you show the 3D line reconstruction result in your oral presentation video (https://www.youtube.com/watch?v=HadE8YnCIRw), and it looks great. Would it be possible to share the 3D line reconstruction scripts with us?

    Thanks again.

    opened by Hao-HUST 7
  • How to carry out migration training

    Hi, thank you very much for your contribution. I have a question about how to use the pretrained Wireframe model (sold2_wireframe.tar) provided in the README for transfer learning on a small self-made street-view image dataset. I hope you can provide me with some suggestions.

    opened by 3600067524 10
Owner
Computer Vision and Geometry Lab