[CVPR'21] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Last update: Dec 26, 2022

Overview

Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Weixiang Yang, Qi Li, Wenxi Liu, Yuanlong Yu, Yuexin Ma, Shengfeng He, Jia Pan

Paper

Accepted to CVPR 2021

Abstract

HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited by expensive sensors and time-consuming computation. Camera-based methods usually need to perform road segmentation and view transformation separately, which often causes distortion and missing content. To push the limits of the technology, we present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view from a single front-view monocular image. In particular, we propose a cross-view transformation module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen the view transformation and scene understanding. Considering the relationship between vehicles and roads, we also design a context-aware discriminator to further refine the results. Experiments on public benchmarks show that our method achieves state-of-the-art performance in the tasks of road layout estimation and vehicle occupancy estimation. Especially for the latter task, our model outperforms all competitors by a large margin. Furthermore, our model runs at 35 FPS on a single GPU, which makes it efficient enough for real-time panoramic HD map reconstruction.
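To make the cross-view transformation idea concrete, here is a simplified sketch in plain Python. It is not the repository's implementation, and the details are our own assumptions: each view projection is a single linear layer over spatial positions (a stand-in for the MLP), the cycle-consistency term is an L1 loss between the original and the cycled-back front-view features, and the cross-view correlation uses cosine similarity with queries from the top view, keys from the cycled front view and values from the original front view, fused by simple addition. All function names are hypothetical.

```python
import math

def project(W, feats):
    """View projection as one linear layer over spatial positions (stand-in for an MLP).

    feats: list of positions, each a list of C channel values.
    W: (number of output positions) x (number of input positions) weights.
    """
    channels = len(feats[0])
    return [[sum(W[j][i] * feats[i][c] for i in range(len(feats)))
             for c in range(channels)]
            for j in range(len(W))]

def cycle_loss(x, x_cycled):
    """L1 cycle-consistency loss between original and cycled-back features."""
    diffs = [abs(a - b) for row_a, row_b in zip(x, x_cycled) for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cross_view_attention(top, cycled, front):
    """Q = top-view, K = cycled front-view, V = front-view features.

    Each query takes the value at its most relevant key (hard attention),
    scales it by that relevance score (soft weight) and adds it to itself.
    """
    fused = []
    for q in top:
        scores = [cosine(q, k) for k in cycled]
        best = max(range(len(scores)), key=scores.__getitem__)
        fused.append([qc + scores[best] * vc for qc, vc in zip(q, front[best])])
    return fused

def cross_view_transform(front, W_f2t, W_t2f):
    top = project(W_f2t, front)     # front view -> top view
    cycled = project(W_t2f, top)    # top view -> back to front view
    return cross_view_attention(top, cycled, front), cycle_loss(front, cycled)

front = [[1.0, 0.0], [0.0, 1.0]]    # 2 positions x 2 channels
swap = [[0.0, 1.0], [1.0, 0.0]]     # toy projection that swaps the two positions
fused, loss = cross_view_transform(front, swap, swap)
print(fused, loss)  # [[0.0, 2.0], [2.0, 0.0]] 0.0
```

Because the toy projections invert each other, the cycled features match the input exactly and the cycle loss is zero; a projection pair that does not invert (for example, swap followed by identity) gives a positive loss.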

Contributions

We propose a novel framework that reconstructs a local map formed by top-view road scene layout and vehicle occupancy using a single monocular front-view image only. In particular, we propose a cross-view transformation module which leverages the cycle consistency between views and their correlation to strengthen the view transformation.
We also propose a context-aware discriminator that considers the spatial relationship between vehicles and roads in the task of estimating vehicle occupancies.
On public benchmarks, our model achieves state-of-the-art performance on the tasks of road layout and vehicle occupancy estimation.
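The context-aware discriminator can be illustrated with a toy example: the discriminator scores a vehicle layout together with the road layout it sits on, and is trained with standard binary cross-entropy GAN losses. Everything here is hypothetical: the hand-written on_road_ratio feature and the logistic toy_discriminator stand in for a learned CNN, and the paper's actual discriminator architecture and loss weighting may differ.

```python
import math

def on_road_ratio(vehicle, road):
    """Fraction of occupied vehicle cells that lie on road cells (hypothetical context feature)."""
    on_road = total = 0
    for v_row, r_row in zip(vehicle, road):
        for v, r in zip(v_row, r_row):
            if v:
                total += 1
                on_road += 1 if r else 0
    return on_road / total if total else 0.0

def toy_discriminator(vehicle, road, weight=6.0, bias=3.0):
    """Probability that a (vehicle, road) pair looks real; stands in for a learned CNN."""
    return 1.0 / (1.0 + math.exp(-(weight * on_road_ratio(vehicle, road) - bias)))

def bce(p, target, eps=1e-7):
    """Binary cross-entropy for a single probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def adversarial_losses(real_vehicle, fake_vehicle, road):
    d_real = toy_discriminator(real_vehicle, road)
    d_fake = toy_discriminator(fake_vehicle, road)
    d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)  # discriminator: real -> 1, fake -> 0
    g_loss = bce(d_fake, 1.0)                     # generator: make fakes look real
    return d_loss, g_loss

road = [[1, 1, 0], [1, 1, 0]]
real = [[1, 0, 0], [0, 1, 0]]   # vehicles on the road
fake = [[0, 0, 1], [0, 0, 1]]   # vehicles off the road
d_loss, g_loss = adversarial_losses(real, fake, road)
print(round(d_loss, 3), round(g_loss, 3))  # 0.097 3.049
```

Because the discriminator sees the road context, a prediction that places vehicles off the road is easy to reject, which produces a large generator loss.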

Approach overview

Repository Structure

cross-view/
├── crossView            # Contains scripts for dataloaders and network/model architecture
├── datasets             # Contains datasets
│   ├── argoverse        # Argoverse dataset
│   └── kitti            # KITTI dataset
├── log                  # Contains logs of the network/model
├── losses               # Contains scripts for the losses of the network/model
├── models               # Contains saved models of the network/model
├── output               # Contains output of the network/model
└── splits
    ├── 3Dobject         # Training and testing splits for KITTI 3D Object Detection dataset
    ├── argo             # Training and testing splits for Argoverse Tracking v1.0 dataset
    ├── odometry         # Training and testing splits for KITTI Odometry dataset
    └── raw              # Training and testing splits for KITTI RAW dataset (based on Schulter et al.)

Installation

We recommend setting up a Python 3.7 and PyTorch 1.0 virtual environment and installing all the dependencies listed in the requirements file.

git clone https://github.com/JonDoe-297/cross-view.git

cd cross-view
pip install -r requirements.txt

Datasets

In the paper, we present results on the KITTI 3D Object, KITTI Odometry, KITTI RAW, and Argoverse 3D Tracking v1.0 datasets. For comparison with Schulter et al., we use the same training and test split sequences from the KITTI RAW dataset. For more details about the training/testing splits, see the splits directory. The ground truth can be downloaded from Monolayout.

# Download KITTI RAW
./data/download_datasets.sh raw

# Download KITTI 3D Object
./data/download_datasets.sh object

# Download KITTI Odometry
./data/download_datasets.sh odometry

# Download Argoverse Tracking v1.0
./data/download_datasets.sh argoverse

The above scripts will download, unzip and store the respective datasets in the datasets directory.

datasets/
├── argoverse                              # Argoverse dataset
│   └── argoverse-tracking
│       └── train1
│           └── 1d676737-4110-3f7e-bec0-0c90f74c248f
│               ├── car_bev_gt             # Vehicle GT
│               ├── road_gt                # Road GT
│               └── stereo_front_left      # RGB image
└── kitti                                  # KITTI dataset
    ├── object                             # KITTI 3D Object dataset
    │   └── training
    │       ├── image_2                    # RGB image
    │       └── vehicle_256                # Vehicle GT
    ├── odometry                           # KITTI Odometry dataset
    │   └── 00
    │       ├── image_2                    # RGB image
    │       └── road_dense128              # Road GT
    └── raw                                # KITTI RAW dataset
        └── 2011_09_26
            └── 2011_09_26_drive_0001_sync
                ├── image_2                # RGB image
                └── road_dense128          # Road GT
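Before training, it can help to confirm that the image and ground-truth folders are where the layout above expects them. The following is a minimal sketch, assuming the datasets root and the single example sequence per dataset shown in the tree; the constant and function names are hypothetical, and the paths may need adjusting to match the --data_path values you actually use.

```python
from pathlib import Path

ARGO_SEQ = "argoverse/argoverse-tracking/train1/1d676737-4110-3f7e-bec0-0c90f74c248f"
RAW_SEQ = "kitti/raw/2011_09_26/2011_09_26_drive_0001_sync"

# Image and ground-truth folders taken from the example layout above.
EXPECTED_DIRS = [
    f"{ARGO_SEQ}/car_bev_gt",
    f"{ARGO_SEQ}/road_gt",
    f"{ARGO_SEQ}/stereo_front_left",
    "kitti/object/training/image_2",
    "kitti/object/training/vehicle_256",
    "kitti/odometry/00/image_2",
    "kitti/odometry/00/road_dense128",
    f"{RAW_SEQ}/image_2",
    f"{RAW_SEQ}/road_dense128",
]

def missing_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    for d in missing_dirs():
        print("missing:", d)
```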

Training

Prepare the corresponding dataset
Run training

# Cross-view Road (KITTI Odometry)
python3 train.py --type static --split odometry --data_path ./datasets/odometry/ --model_name <Model Name with specifications>

# Cross-view Vehicle (KITTI 3D Object)
python3 train.py --type dynamic --split 3Dobject --data_path ./datasets/kitti/object/training --model_name <Model Name with specifications>

# Cross-view Road (KITTI RAW)
python3 train.py --type static --split raw --data_path ./datasets/kitti/raw/ --model_name <Model Name with specifications>

# Cross-view Vehicle (Argoverse Tracking v1.0)
python3 train.py --type dynamic --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>

# Cross-view Road (Argoverse Tracking v1.0)
python3 train.py --type static --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>
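If you want to run all five configurations in sequence, a small driver script can build the same commands. This is a minimal sketch that only uses the train.py flags shown above; the model_name scheme is a hypothetical placeholder, and by default the script only prints the commands (dry run).

```python
import shlex
import subprocess

# (type, split, data_path) for each training command listed above.
RUNS = [
    ("static", "odometry", "./datasets/odometry/"),
    ("dynamic", "3Dobject", "./datasets/kitti/object/training"),
    ("static", "raw", "./datasets/kitti/raw/"),
    ("dynamic", "argo", "./datasets/argoverse/"),
    ("static", "argo", "./datasets/argoverse/"),
]

def train_command(kind, split, data_path):
    """Build the argument list for one train.py run."""
    model_name = f"{kind}_{split}"  # hypothetical naming scheme; replace with your own
    return ["python3", "train.py", "--type", kind, "--split", split,
            "--data_path", data_path, "--model_name", model_name]

def run_all(dry_run=True):
    """Print every command (default) or run them one after another."""
    for run in RUNS:
        cmd = train_command(*run)
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run

if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing an argument list to subprocess.run (rather than a shell string) avoids quoting problems with paths, and check=True stops the sequence as soon as one run fails.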

Trained models are saved in "models" (default: ./models).

Testing

Download pre-trained models
Run testing

python3 test.py --type <static/dynamic> --model_path <path to the model directory> --image_path <path to the image directory>  --out_dir <path to the output directory>

The results are in "output" (default: ./output)

Evaluation

Prepare the corresponding dataset
Download pre-trained models
Run evaluation

# Evaluate on KITTI Odometry 
python3 eval.py --type static --split odometry --model_path <path to the model directory> --data_path ./datasets/odometry --height 512 --width 512 --occ_map_size 128

# Evaluate on KITTI 3D Object
python3 eval.py --type dynamic --split 3Dobject --model_path <path to the model directory> --data_path ./datasets/kitti/object/training

# Evaluate on KITTI RAW
python3 eval.py --type static --split raw --model_path <path to the model directory> --data_path ./datasets/kitti/raw/

# Evaluate on Argoverse Tracking v1.0 (Road)
python3 eval.py --type static --split argo --model_path <path to the model directory> --data_path ./datasets/argoverse/

# Evaluate on Argoverse Tracking v1.0 (Vehicle)
python3 eval.py --type dynamic --split argo --model_path <path to the model directory> --data_path ./datasets/argoverse/

The results are in "output" (default: ./output)
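The mIoU values reported below compare predicted and ground-truth bird's-eye-view occupancy maps. As a reference, here is a minimal sketch of the standard intersection-over-union metric for binary maps. The 0.5 threshold and the per-map averaging are assumptions; eval.py may threshold or aggregate differently.

```python
def binarize(prob_map, threshold=0.5):
    """Turn a 2D map of occupancy probabilities into a 0/1 mask (threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

def iou(pred, gt):
    """Intersection over union of two binary 2D masks; NaN if both are empty."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else float("nan")

def mean_iou(preds, gts):
    """Average per-map IoU, skipping maps where prediction and ground truth are both empty."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    valid = [s for s in scores if s == s]  # NaN != NaN, so this drops empty maps
    return sum(valid) / len(valid) if valid else float("nan")

prob = [[0.9, 0.6, 0.2],
        [0.1, 0.7, 0.4]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(iou(binarize(prob), gt))  # 0.5
```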

Pretrained Models

The following table provides links to the pre-trained models for each dataset mentioned in our paper. The table also shows the corresponding evaluation results for these models.

Dataset	Segmentation Objects	mIoU (%)	mAP (%)	Pretrained Model
KITTI 3D Object	Vehicle	38.85	51.04	link
KITTI Odometry	Road	77.47	86.39	link
KITTI Raw	Road	68.26	79.65	link
Argoverse Tracking	Vehicle	47.87	62.69	link
Argoverse Tracking	Road	76.56	87.30	link
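The mAP column can be read as average precision over per-cell occupancy scores. Below is a minimal sketch of the standard ranking-based AP (precision averaged over the ranks of the positive cells), assuming no tied scores. This is the textbook definition, not necessarily how eval.py computes it.

```python
def average_precision(scores, labels):
    """Ranking-based AP: mean of precision@k over the ranks k of the positive cells.

    Assumes no tied scores; with ties, the result depends on sort order.
    """
    num_pos = sum(labels)
    if num_pos == 0:
        return float("nan")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_pos

def flatten(grid):
    """Flatten a 2D map row by row."""
    return [v for row in grid for v in row]

prob = [[0.9, 0.8], [0.7, 0.6]]
gt = [[1, 0], [1, 0]]
ap = average_precision(flatten(prob), flatten(gt))
print(round(ap, 4))  # 0.8333
```

Here the positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6.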

Results

Contact

If you meet any problems, please describe them in issues or contact:

Weixiang Yang: [email protected]

License

This project is released under the MIT License (refer to the LICENSE file for details). This project partially depends on the sources of Monolayout.

Comments

About test

Hi, thanks a lot for sharing your interesting and great work. I ran test.py following your instructions but got results like this. There are no results of vehicles. I noticed there is an args item "type", but it seems to only work for saving directory. Is there anything I got wrong?

opened by sunnyHelen 1
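Fix bug `'tuple' object is not callable` in train.py

Line 174 in datasets.py:

Change

color_aug = transforms.ColorJitter.get_params(self.brightness, self.contrast, self.saturation, self.hue)

to

color_aug = transforms.ColorJitter(self.brightness, self.contrast, self.saturation, self.hue)

In newer torchvision releases, ColorJitter.get_params returns a tuple of sampled parameters rather than a callable transform, so calling color_aug on an image raises this error.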

opened by tuan97ta 0
eval setting problems of kitti odometry

The authors provide the following eval command:

python3 eval.py --type static --split odometry --model_path --data_path ./datasets/odometry --height 512 --width 512 --occ_map_size 128

However, with the pretrained model provided by the authors, height, width and occ_map_size should be 1024, 1024 and 256. That is, evaluate on the KITTI Odometry dataset with:

python3 eval.py --type static --split odometry --model_path --data_path ./datasets/odometry --height 1024 --width 1024 --occ_map_size 256

opened by JiayuZou2020 0
Where is the discriminator loss?
1. The first question is the one raised in #4: the code is clearly different from the description in your paper. I tried both versions and the results seem no different.

2. Magic code (both branches are identical):

if type == 'transform_decoder':
    num_ch_in = 128 if i == 4 else self.num_ch_dec[i + 1]
else:
    num_ch_in = 128 if i == 4 else self.num_ch_dec[i + 1]

Also, the discriminator loss doesn't exist in your current code. That is to say, your code does not match your paper.
opened by SYSUGrain 0
missing some folders in data

Hi,

I downloaded the data with the Python script provided by the Monolayout project. However, some folders are not found. Here are the missing folders:

1d676737-4110-3f7e-bec0-0c90f74c248f/car_bev_gt
1d676737-4110-3f7e-bec0-0c90f74c248f/road_gt
odometry/00/road_dense128
raw/2011_09_26/2011_09_26_drive_0001_sync/road_dense128

Is there any way to get this data?

opened by bobd988 1

Projecting interval uncertainty through the discrete Fourier transform

I3-master-layout - Simple master and stack layout script

[CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

A novel benchmark dataset for Monocular Layout prediction

Monocular Depth Estimation - Weighted-average prediction from multiple pre-trained depth estimation models

(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

PanopticBEV - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Code for HLA-Face: Joint High-Low Adaptation for Low Light Face Detection (CVPR21)

Repository relating to the CVPR21 paper TimeLens: Event-based Video Frame Interpolation

Official repository for CVPR21 paper "Deep Stable Learning for Out-Of-Distribution Generalization".

Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

Meta Representation Transformation for Low-resource Cross-lingual Learning

Affine / perspective transformation in Pose Estimation with Tensorflow 2

This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by Divam Gupta, Wei Pu, Trenton Tabor, Jeff Schneider

Trajectory Extraction of road users via Traffic Camera

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)