A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

Overview

University1652-Baseline

Python 3.6 Language grade: Python Total alerts License: MIT

VideoDemo

[Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [中文介绍]

This repository contains the dataset link and the code for our paper University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization, ACM Multimedia 2020. The offical paper link is at https://dl.acm.org/doi/10.1145/3394171.3413896. We collect 1652 buildings of 72 universities around the world. Thank you for your kindly attention.

Task 1: Drone-view target localization. (Drone -> Satellite) Given one drone-view image or video, the task aims to find the most similar satellite-view image to localize the target building in the satellite view.

Task 2: Drone navigation. (Satellite -> Drone) Given one satellite-view image, the drone intends to find the most relevant place (drone-view images) that it has passed by. According to its flight history, the drone could be navigated back to the target place.

Table of contents

About Dataset

The dataset split is as follows:

Split #imgs #buildings #universities
Training 50,218 701 33
Query_drone 37,855 701 39
Query_satellite 701 701 39
Query_ground 2,579 701 39
Gallery_drone 51,355 951 39
Gallery_satellite 951 951 39
Gallery_ground 2,921 793 39

More detailed file structure:

├── University-1652/
│   ├── readme.txt
│   ├── train/
│       ├── drone/                   /* drone-view training images 
│           ├── 0001
|           ├── 0002
|           ...
│       ├── street/                  /* street-view training images 
│       ├── satellite/               /* satellite-view training images       
│       ├── google/                  /* noisy street-view training images (collected from Google Image)
│   ├── test/
│       ├── query_drone/  
│       ├── gallery_drone/  
│       ├── query_street/  
│       ├── gallery_street/ 
│       ├── query_satellite/  
│       ├── gallery_satellite/ 
│       ├── 4K_drone/

We note that there are no overlaps between 33 univeristies of training set and 39 univeristies of test set.

News

1 Dec 2021 Fix the issue due to the latest torchvision, which do not allow the empty subfolder. Note that some buildings do not have google images.

3 March 2021 GeM Pooling is added. You may use it by --pool gem.

21 January 2021 The GPU-Re-Ranking, a GNN-based real-time post-processing code, is at Here.

21 August 2020 The transfer learning code for Oxford and Paris is at Here.

27 July 2020 The meta data of 1652 buildings, such as latitude and longitude, are now available at Google Driver. (You could use Google Earth Pro to open the kml file or use vim to check the value).
We also provide the spiral flight tour file at Google Driver. (You could open the kml file via Google Earth Pro to enable the flight camera).

26 July 2020 The paper is accepted by ACM Multimedia 2020.

12 July 2020 I made the baseline of triplet loss (with soft margin) on University-1652 public available at Here.

12 March 2020 I add the state-of-the-art page for geo-localization and tutorial, which will be updated soon.

Code Features

Now we have supported:

  • Float16 to save GPU memory based on apex
  • Multiple Query Evaluation
  • Re-Ranking
  • Random Erasing
  • ResNet/VGG-16
  • Visualize Training Curves
  • Visualize Ranking Result
  • Linear Warm-up

Prerequisites

  • Python 3.6
  • GPU Memory >= 8G
  • Numpy > 1.12.1
  • Pytorch 0.3+
  • [Optional] apex (for float16)

Getting started

Installation

git clone https://github.com/pytorch/vision
cd vision
python setup.py install
  • [Optinal] You may skip it. Install apex from the source
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

Dataset & Preparation

Download [University-1652] upon request. You may use the request template.

Or download CVUSA / CVACT.

For CVUSA, I follow the training/test split in (https://github.com/Liumouliu/OriCNN).

Train & Evaluation

Train & Evaluation University-1652

python train.py --name three_view_long_share_d0.75_256_s1_google  --extra --views 3  --droprate 0.75  --share  --stride 1 --h 256  --w 256 --fp16; 
python test.py --name three_view_long_share_d0.75_256_s1_google

Default setting: Drone -> Satellite If you want to try other evaluation setting, you may change these lines at: https://github.com/layumi/University1652-Baseline/blob/master/test.py#L217-L225

Ablation Study only Satellite & Drone

python train_no_street.py --name two_view_long_no_street_share_d0.75_256_s1  --share --views 3  --droprate 0.75  --stride 1 --h 256  --w 256  --fp16; 
python test.py --name two_view_long_no_street_share_d0.75_256_s1

Set three views but set the weight of loss on street images to zero.

Train & Evaluation CVUSA

python prepare_cvusa.py
python train_cvusa.py --name usa_vgg_noshare_warm5_lr2 --warm 5 --lr 0.02 --use_vgg16 --h 256 --w 256  --fp16 --batchsize 16;
python test_cvusa.py  --name usa_vgg_noshare_warm5_lr2 

Trained Model

You could download the trained model at GoogleDrive or OneDrive. After download, please put model folders under ./model/.

Citation

The following paper uses and reports the result of the baseline model. You may cite it in your paper.

@article{zheng2020university,
  title={University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization},
  author={Zheng, Zhedong and Wei, Yunchao and Yang, Yi},
  journal={ACM Multimedia},
  year={2020}
}

Instance loss is defined in

@article{zheng2017dual,
  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  doi={10.1145/3383184},
  volume={16},
  number={2},
  pages={1--23},
  year={2020},
  publisher={ACM New York, NY, USA}
}

Related Work

  • Instance Loss Code
  • Lending Orientation to Neural Networks for Cross-view Geo-localization Code
  • Predicting Ground-Level Scene Layout from Aerial Imagery Code
Comments
  • difficulties in downloading the dataset from Google Drive - Need direct link

    difficulties in downloading the dataset from Google Drive - Need direct link

    Hi, thank you for sharing your dataset. Living in China, it's almost impossible to download your dataset from Google Drive. It's also stop if we try to use a VPN. Can you provide a direct link to download your dataset?

    Thank you

    opened by jpainam 5
  • Results can't be reproduced

    Results can't be reproduced

    Hi @layumi , thanks for releasing the codes.

    When I ran the train.py file (using the resnet model), after initializing with the pretraining model parameters and training for 119 epochs, I ran the test.py file and only got the following results: Recall@1:1.29 Recall@5:4.54 Recall@10:7.43 Recall@top1:7.92 AP:2.53

    And when I ran the train.py file using the vgg mode, I got: Recall@1:1.75 Recall@5:6.22 Recall@10:10.36 Recall@top1:11.16 AP:3.39

    The hyper-parameters of the above results are set by default. To get the results in the paper, do I need to modify the hyper-parameters in the code?

    I use pytorch 1.1.0 and V100

    opened by Anonymous-so 4
  • How to visualize the retrieved image?

    How to visualize the retrieved image?

    Hello, I've been looking at your code recently. In test.py file, after extracting the features of the image, save result to pytorch__result. mat file, and then run evaluate_ gpu. py file for evaluation. I want to know how to visualize the search results and get the matching results like Figure 5 in the paper.

    opened by zkangkang0 2
  • Question about collecting images

    Question about collecting images

    Hello, First of all, thank you for sharing your great work.

    I'm currently doing researches with cross-view geo-localization and I want to collect image data like the University1652 dataset, so I was wondering if you could share some sample codes, or a simple tutorial about how to collect images using Google Earth Engine.

    Thank you and best regards.

    opened by viet2411 2
  • Testing Drone -> satellite with views=2 is not defined but is default settings

    Testing Drone -> satellite with views=2 is not defined but is default settings

    Hi. I trained using the tutorial readme with this command. python train.py --gpu_ids 0,2 --name ft_ResNet50 --train_all --batchsize 32 --data_dir /home/xx/datasets/University-Release/train And this is the generated yaml

    DA: false
    batchsize: 32
    color_jitter: false
    data_dir: /home/paul/datasets/University-Release/train
    droprate: 0.5
    erasing_p: 0
    extra_Google: false
    fp16: false
    gpu_ids: 0,2
    h: 384
    lr: 0.01
    moving_avg: 1.0
    name: ft_ResNet50
    nclasses: 701
    pad: 10
    pool: avg
    resume: false
    share: false
    stride: 2
    train_all: true
    use_NAS: false
    use_dense: false
    views: 2
    w: 384
    warm_epoch: 0
    

    So, for testing, i do this python test.py --gpu_ids 0 --name ft_ResNet50 --test_dir /home/xx/datasets/University-Release/test --batchsize 32 --which_epoch 119 I found out that, the views=2 and the view_index=3 in the extract_feature function. Using this code

    def which_view(name):
        if 'satellite' in name:
            return 1
        elif 'street' in name:
            return 2
        elif 'drone' in name:
            return 3
        else:
            print('unknown view')
        return -1
    

    The task is 3 -> 1 means Drone -> Satellite with views=2. But the code in the testing, doesn't consider this scenario

     for scale in ms:
        if scale != 1:
           # bicubic is only  available in pytorch>= 1.1
            input_img = nn.functional.interpolate(input_img, scale_factor=scale, mode='bilinear', align_corners=False)
            if opt.views ==2:
               if view_index == 1:
                  outputs, _ = model(input_img, None) 
               elif view_index ==2:
                   _, outputs = model(None, input_img) 
            elif opt.views ==3:
               if view_index == 1:
                  outputs, _, _ = model(input_img, None, None)
               elif view_index ==2:
                    _, outputs, _ = model(None, input_img, None)
                elif view_index ==3:
                        _, _, outputs = model(None, None, input_img)
                    ff += outputs # Give error, since outputs is not defined
    

    For views == 2, there is no views_index == 3

    opened by jpainam 2
  • file naming: Error Path too long

    file naming: Error Path too long

    Hi, I guess on a Unix/Linux system, such error might not occur. But a file naming similar to the Market-1501 dataset could have been better for Windows based systems. Here an error due to the path length in Windows systems. image

    opened by jpainam 1
  • How to use t-SNE ?

    How to use t-SNE ?

    Hi, Dr. Zheng. After reading your paper, I want to use t-SNE code, could you release this t-SNE code? I find lots of t-SNE codes on github, but I can not find useful codes of using resnet network or pretrained models. Thanks a lot !!!!

    opened by starstarb 1
  • About GNN Re-ranking training program

    About GNN Re-ranking training program

    Hello @layumi , thank you for your work

    I was trying to reproduce the result in paper "Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective" using your pytorch code, but I'm having some trouble in running the program.

    The program needs "market_88_test.pkl" as input data for re-ranking process, but I don't understand how to generate it properly.

    Could you give some advices on how to use this code?

    Thank you and best regards.

    opened by viet2411 2
Releases(v1.1)
Owner
Zhedong Zheng
Hi, I am a PhD student at University of Technology Sydney. My work focuses on computer vision, especially representation learning.
Zhedong Zheng
Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation

Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation Introduction WAKD is a PyTorch implementation for our ICPR-2022 pap

null 2 Oct 20, 2022
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
SSL_SLAM2: Lightweight 3-D Localization and Mapping for Solid-State LiDAR (mapping and localization separated) ICRA 2021

SSL_SLAM2 Lightweight 3-D Localization and Mapping for Solid-State LiDAR (Intel Realsense L515 as an example) This repo is an extension work of SSL_SL

Wang Han 王晗 1.3k Jan 8, 2023
Blender add-on: Add to Cameras menu: View → Camera, View → Add Camera, Camera → View, Previous Camera, Next Camera

Blender add-on: Camera additions In 3D view, it adds these actions to the View|Cameras menu: View → Camera : set the current camera to the 3D view Vie

German Bauer 11 Feb 8, 2022
(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry Official implementation of the paper Multi-View Depth Est

Bae, Gwangbin 138 Dec 28, 2022
This is the official code for the paper "Tracker Meets Night: A Transformer Enhancer for UAV Tracking".

SCT This is the official code for the paper "Tracker Meets Night: A Transformer Enhancer for UAV Tracking" The spatial-channel Transformer (SCT) enhan

Intelligent Vision for Robotics in Complex Environment 27 Nov 23, 2022
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 28 Nov 25, 2022
Aerial Imagery dataset for fire detection: classification and segmentation (Unmanned Aerial Vehicle (UAV))

Aerial Imagery dataset for fire detection: classification and segmentation using Unmanned Aerial Vehicle (UAV) Title FLAME (Fire Luminosity Airborne-b

null 79 Jan 6, 2023
Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX.

ONNX Object Localization Network Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX. Ori

Ibai Gorordo 15 Oct 14, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Ilya Kostrikov 3k Dec 31, 2022
Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

null 3 Dec 2, 2022
Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature

Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature Q. Wan, L. Gao, X. Li and L. Wen, "Industrial Image Anomaly

smiler 6 Dec 25, 2022
CenterFace(size of 7.3MB) is a practical anchor-free face detection and alignment method for edge devices.

CenterFace Introduce CenterFace(size of 7.3MB) is a practical anchor-free face detection and alignment method for edge devices. Recent Update 2019.09.

StarClouds 1.2k Dec 21, 2022
PanopticBEV - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images This r

null 63 Dec 16, 2022
the code of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021)

RMA-Net This repo is the implementation of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021). Paper

Wanquan Feng 205 Nov 9, 2022
Pytorch implementation for "Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter".

Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter This is a pytorch-based implementation for paper Implicit Feature Alignme

wangtianwei 61 Nov 12, 2022
An official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Sequence Feature Alignment (SFA) By Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-jun Zha, Yonggang Wen, and Dacheng Tao This repository is an o

WangWen 79 Dec 24, 2022
nnFormer: Interleaved Transformer for Volumetric Segmentation Code for paper "nnFormer: Interleaved Transformer for Volumetric Segmentation "

nnFormer: Interleaved Transformer for Volumetric Segmentation Code for paper "nnFormer: Interleaved Transformer for Volumetric Segmentation ". Please

jsguo 610 Dec 28, 2022