ScaleNet: A Shallow Architecture for Scale Estimation

Axel Barroso

Last update: Nov 9, 2022

Related tags

Deep Learning ScaleNet

Overview

ScaleNet: A Shallow Architecture for Scale Estimation

Repository for the code of ScaleNet paper:

"ScaleNet: A Shallow Architecture for Scale Estimation".
Axel Barroso-Laguna, Yurun Tian, and Krystian Mikolajczyk. arxiv 2021.

[Paper on arxiv]

Prerequisite

Python 3.7 is required for running and training ScaleNet code. Use Conda to install the dependencies:

conda create --name scalenet_env
conda activate scalenet_env 
conda install pytorch==1.2.0 -c pytorch
conda install -c conda-forge tensorboardx opencv tqdm 
conda install -c anaconda pandas 
conda install -c pytorch torchvision

Scale estimation

run_scalenet.py can be used to estimate the scale factor between two input images. We provide as an example two images, im1.jpg and im2.jpg, within the assets/im_test folder as an example. For a quick test, please run:

python run_scalenet.py --im1_path assets/im_test/im1.jpg --im2_path assets/im_test/im2.jpg

Arguments:

im1_path: Path to image A.
im2_path: Path to image B.

It returns the scale factor A->B.

Training ScaleNet

We provide a list of Megadepth image pairs and scale factors in the assets folder. We use the undistorted images, corresponding camera intrinsics, and extrinsics preprocessed by D2-Net. You can download them directly from their main repository. If you desire to use the default configuration for training, just run the following line:

python train_ScaleNet.py --image_data_path /path/to/megadepth_d2net

There are though some important arguments to take into account when training ScaleNet.

Arguments:

image_data_path: Path to the undistorted Megadepth images from D2-Net.
save_processed_im: ScaleNet processes the images so that they are center-cropped and resized to a default resolution. We give the option to store the processed images and load them during training, which results in a much faster training. However, the size of the files can be big, and hence, we suggest storing them in a large storage disk. Default: True.
root_precomputed_files: Path to save the processed image pairs.

If you desire to modify ScaleNet training or architecture, look for all the arguments in the train_ScaleNet.py script.

Test ScaleNet - camera pose

In addition to the training, we also provide a template for testing ScaleNet in the camera pose task. In assets/data/test.csv, you can find the test Megadepth pairs, along with their scale change as well as their camera poses.

Run the following command to test ScaleNet + SIFT in our custom camera pose split:

python test_camera_pose.py --image_data_path /path/to/megadepth_d2net

camera_pose.py script is intended to provide a structure of our camera pose experiment. You can change either the local feature extractor or the scale estimator and obtain your camera pose results.

BibTeX

If you use this code or the provided training/testing pairs in your research, please cite our paper:

@InProceedings{Barroso-Laguna2021_scale,
    author = {Barroso-Laguna, Axel and Tian, Yurun and Mikolajczyk, Krystian},
    title = {{ScaleNet: A Shallow Architecture for Scale Estimation}},
    booktitle = {Arxiv: },
    year = {2021},
}

You might also like...

A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation models. It contains 17 different amateur subjects performing 30 sports-related actions each, for a total of 510 action clips.

25 Jun 20, 2021

[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

Self-Supervised Learning of Image Scale and Orientation Estimation (BMVC 2021) This is the official implementation of the paper "Self-Supervised Learn

17 Nov 10, 2022

the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

G2S This is the official code for ICRA 2021 Paper: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation by Hemang

4 Jul 27, 2022

Monocular Depth Estimation - Weighted-average prediction from multiple pre-trained depth estimation models

merged_depth runs (1) AdaBins, (2) DiverseDepth, (3) MiDaS, (4) SGDepth, and (5) Monodepth2, and calculates a weighted-average per-pixel absolute dept

39 Nov 21, 2022

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenGaze: Web Service for OpenFace Facial Behaviour Analysis Toolkit Overview OpenFace is a fantastic tool intended for computer vision and machine le

4 Nov 3, 2022

OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

5.8k Dec 31, 2022

Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

13 Dec 10, 2022

[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [PDF] Wuyang Chen, Xinyu Gong, Zhangyang Wang In ICLR 2

156 Nov 28, 2022

Graph Transformer Architecture. Source code for

Graph Transformer Architecture Source code for the paper "A Generalization of Transformer Networks to Graphs" by Vijay Prakash Dwivedi and Xavier Bres

561 Jan 8, 2023

Comments

Not use 'save_processed_im'

Hello, very nice work. Thanks for sharing your implementation!

I find that the argument 'save_processed_im' in the train_ScaleNet.py script is not used.

Best regards, Zhenjun

opened by ericzzj1989 0

ScaleNet: A Shallow Architecture for Scale Estimation

Related tags

Overview

ScaleNet: A Shallow Architecture for Scale Estimation

Prerequisite

Scale estimation

Training ScaleNet

Test ScaleNet - camera pose

BibTeX

You might also like...

A large-scale video dataset for the training and evaluation of 3D human pose estimation models

[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

Monocular Depth Estimation - Weighted-average prediction from multiple pre-trained depth estimation models

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang

Graph Transformer Architecture. Source code for

Comments

Not use 'save_processed_im'

Owner

Axel Barroso

A Fast Monotone Rotating Shallow Water model

Shallow Convolutional Neural Networks for Human Activity Recognition using Wearable Sensors

Event sourced bank - A wide-and-shallow example using the Python event sourcing library

code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

Model search is a framework that implements AutoML algorithms for model architecture search at scale

(ImageNet pretrained models) The official pytorch implemention of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"

Compute descriptors for 3D point cloud registration using a multi scale sparse voxel architecture

The official repo for CVPR2021——ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search.

Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

A large-scale video dataset for the training and evaluation of 3D human pose estimation models