HDMapNet: A Local Semantic Map Learning and Evaluation Framework

Overview

HDMapNet_devkit

Devkit for HDMapNet.

Qi Li, Yue Wang, Yilun Wang, Hang Zhao

[Paper] [Project Page] [5-min video]

Abstract: Estimating local semantics from sensory inputs is a central component of high-definition map construction in autonomous driving. However, traditional pipelines require a vast amount of human effort and resources to annotate and maintain the semantics in the map, which limits their scalability. In this paper, we introduce the problem of local semantic map learning, which dynamically constructs the vectorized semantics based on onboard sensor observations. Meanwhile, we introduce a local semantic map learning method, dubbed HDMapNet. HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on the nuScenes dataset and show that in all settings it performs better than baseline methods. Of note, our fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. In addition, we develop semantic-level and instance-level metrics to evaluate the map learning performance. Finally, we showcase that our method is capable of predicting a locally consistent map. By introducing the method and metrics, we invite the community to study this novel map learning problem. Code and evaluation kit will be released to facilitate future development.

Questions/Requests: Please file an issue or email me at [email protected].

Preparation

  1. Download the nuScenes dataset and put it in the dataset/ folder.

  2. Install dependencies by running

pip install -r requirement.txt

Vectorization

Run python vis_label.py for a demo of the vectorized labels. The visualizations are saved in dataset/nuScenes/samples/GT.

Evaluation

Run python evaluate.py --result_path [submission file] for evaluation. The script accepts vectorized or rasterized maps as input. For vectorized maps, we first rasterize the vectors onto a map before evaluating. For rasterized maps, make sure the line width is 1.
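
For intuition, here is a minimal sketch (not the devkit's actual code) of how a single vectorized line could be rasterized with line width 1 before evaluation; the canvas size and the assumption that the points are already in pixel units are illustrative.

# Minimal sketch, not the devkit's rasterizer: draw one vectorized line
# onto a BEV canvas with line width 1, as the evaluation expects.
import numpy as np
import cv2

def rasterize_line(pts, canvas_hw=(200, 400)):
    """pts: (N, 2) array of ordered line points, already in pixel units."""
    canvas = np.zeros(canvas_hw, dtype=np.uint8)
    pts = np.round(pts).astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(canvas, [pts], isClosed=False, color=1, thickness=1)  # width = 1
    return canvas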

Below is the format for vectorized submission:

vectorized_submission {
    "meta": {
        "use_camera":   <bool>  -- Whether this submission uses camera data as an input.
        "use_lidar":    <bool>  -- Whether this submission uses lidar data as an input.
        "use_radar":    <bool>  -- Whether this submission uses radar data as an input.
        "use_external": <bool>  -- Whether this submission uses external data as an input.
        "vector":        true   -- Whether this submission uses vector format.
    },
    "results": {
        sample_token <str>: List[vectorized_line] -- Maps each sample_token to a list of vectorized lines.
    }
}

vectorized_line {
    "pts":              List[<float, 2>] -- Ordered points to define the vectorized line.
    "pts_num":          <int>,           -- Number of points in this line.
    "type":             <0, 1, 2>        -- Type of the line: 0: ped; 1: divider; 2: boundary
    "confidence_level": <float>          -- Confidence level for prediction (used by Average Precision)
}

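As a concrete illustration, the snippet below assembles a minimal submission matching this format; the sample token and all field values are hypothetical placeholders, not real predictions.

# Hypothetical example of assembling a vectorized submission; the sample
# token and all values are placeholders.
import json

SAMPLE_TOKEN = "replace-with-a-nuscenes-sample-token"

line = {
    "pts": [[10.0, 2.5], [12.0, 2.6], [14.0, 2.8]],  # ordered BEV points
    "pts_num": 3,
    "type": 1,                # 0: ped; 1: divider; 2: boundary
    "confidence_level": 0.9,  # used by Average Precision
}

submission = {
    "meta": {
        "use_camera": True,
        "use_lidar": False,
        "use_radar": False,
        "use_external": False,
        "vector": True,
    },
    "results": {SAMPLE_TOKEN: [line]},
}

with open("submission.json", "w") as f:
    json.dump(submission, f)
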
For rasterized submission, the format is:

rasterized_submission {
    "meta": {
        "use_camera":   <bool>  -- Whether this submission uses camera data as an input.
        "use_lidar":    <bool>  -- Whether this submission uses lidar data as an input.
        "use_radar":    <bool>  -- Whether this submission uses radar data as an input.
        "use_external": <bool>  -- Whether this submission uses external data as an input.
        "vector":       false   -- Whether this submission uses vector format.
    },
    "results": {
        sample_token <str>: { -- Maps each sample_token to a rasterized map and per-line confidences.
            "map":              [<array>]    -- Raster map of prediction (C=0: ped; 1: divider; 2: boundary). The value indicates the line idx (start from 1).
            "confidence_level": Array[float] -- confidence_level[i] stands for the confidence level of the i-th line (starting from 1).
        }
    }
}

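For comparison, a hypothetical per-sample entry in the rasterized format might be built as below; the shapes and values are illustrative, and the pixel values are 1-based line indices as described above.

# Hypothetical rasterized entry; shapes and values are illustrative.
import numpy as np

C, H, W = 3, 200, 400        # channels: 0: ped; 1: divider; 2: boundary
raster = np.zeros((C, H, W), dtype=np.uint8)
raster[1, 100, 50:150] = 1   # divider line #1, drawn with width 1

entry = {
    "map": raster.tolist(),
    # confidence_level[i] is the confidence of the i-th line (1-based),
    # so index 0 is unused here.
    "confidence_level": [0.0, 0.9],
}
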
Run python export_to_json.py to generate a demo vectorized submission. Run python export_to_json.py --raster for a rasterized submission.

Citation

If you find this useful in your research, please consider citing:

@misc{li2021hdmapnet,
      title={HDMapNet: A Local Semantic Map Learning and Evaluation Framework}, 
      author={Qi Li and Yue Wang and Yilun Wang and Hang Zhao},
      year={2021},
      eprint={2107.06307},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Some questions about the command 'run evaluate.py'

    Hi, thank you for your complete and great code. When I tried to use the command python evaluate.py as the README said, I got this error message: AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead. I have trained the model, but I can't solve the problem above. Besides, there is another question: I couldn't find an argument named 'result_path' as your README said. Should I add the argument myself, or is my version of the code different? Thank you for your time.

    opened by MelonLee-go 6
  • question about GT generation

    Thanks for the great work! I ran vis_label.py following the README, and I found that the GT lines projected onto the surrounding views have some misalignment with the real line elements. This may be attributed to assuming Z=0 for all map elements in https://github.com/Tsinghua-MARS-Lab/HDMapNet/blob/main/vis_label.py#L75-L76

    Will the GT misalignment between the top-down map and the perspective images hinder the performance of the surrounding-camera branch of HDMapNet? And how can the GT be aligned?

    opened by LegendBC 3
  • Question on BEV decoder

    Hi, thanks for your great work! I have a question regarding your architecture of the BEV decoder, according to the paper:

    The BEV decoder is a fully convolutional network with 3 branches, namely semantic segmentation branch, instance embedding branch, and direction prediction branch…ResNet with three blocks is used as the BEV decoder.

    If I understand it right, the decoder takes the BEV feature map as input, and each branch produces its own output map (with a different number of channels K' per branch). It's not very clear to me how a ResNet can be applied here.

    My guess is you use a U-Net structure with a shared encoder (which uses the last 3 encoders of ResNet with dilated convs to avoid downsampling) and a separate decoder for each branch, but this seems unnecessary given that the input is already a feature vector. Or do you mean each decoder is simply made up of 3 residual blocks? Thanks!

    Also, as a minor request, could you upload icon/car.png so that we can run vis_label.py with the ego-car icon included in the GT map label as well?

    opened by Francis777 3
  • lost intrins in IPM transformation

    Hello,

    I have some questions from reviewing the code base. I understand the role of the IPM Net in the model: based on the code logic, the grid plane (the BEV map) should be transformed into the pixel-level coordinate system. That is, lines 192 and 193 of homography.py multiply the extrinsics, intrinsics, and post_RTs to transform the grid plane. However, Ks (the intrinsics) seems to be an identity matrix, because line 71 of hdmapnet.py is empty and Ks is never initialized from the camera intrinsics. That looks like a strange error. By the way, why is post_RTs None at line 78 of hdmapnet.py? Should it be None, or is it an error?

    opened by JadoTu 2
  • Inconsistent Intrinsic Matrix in data fetching, probably leading to worse IPMNet performance

    Thank you for your amazing work.

    I notice that there is a resize operation in the data fetching phase that shrinks the image size.

    However, to keep the projection relation between the ground-truth 3D lanes and the (resized) image pixels, we also need to adjust the camera intrinsics during the data fetching phase. (For example, if the image is downsampled by two, the first two rows of the intrinsic matrix should also be halved.)

    In HDMapNet, because we use linear layers to learn the mapping between the front view and the BEV view, there is no need for intrinsic parameters, and the code works fine.

    However, in IPMNet and Lift-Splat-Shoot, correct camera intrinsic parameters are essential for correctly projecting the FV features to BEV. The inconsistent camera intrinsic matrix could hurt the performance of IPMNet and LSS in the experiments mentioned in your paper. Were the ablation experiments in your paper also reproduced with the code in this repo?
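
    A small sketch of the adjustment described here (the function and names are illustrative, not code from this repo):

    # Sketch of the intrinsics adjustment described above; sx, sy are the
    # resize factors (new size / original size).
    import numpy as np

    def scale_intrinsics(K, sx, sy):
        S = np.array([[sx, 0.0, 0.0],
                      [0.0, sy, 0.0],
                      [0.0, 0.0, 1.0]])
        return S @ K  # scales fx, cx by sx and fy, cy by sy

    # e.g. downsampling by two: sx = sy = 0.5 halves the first two rows of K.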

    opened by Owen-Liuyuxuan 2
  • Model Evaluation Correction

    To create a submission file, run:

    python export_gt_to_json.py

    After running the above, you will get a submission.json file.

    To evaluate the file, run:

    python evaluate_json.py --result_path [submission_file_path]

    Hope this saves time for those getting stuck on the README instructions.

    opened by Vishal1711 1
  • about the code release

    Hi, @liqi17thu @hangzhaomit ,

    When will the code be released? The link (https://github.com/Tsinghua-MARS-Lab/HDMapNet-dev) attached on your website is broken now.

    Thanks~

    opened by amiltonwong 1
  • 'instance_seg' and 'direction_pred' both use 'self.up1_embedded.' is it intended?

    I just found that --direction-pred only works when --instance-seg is activated, because direction_pred and instance_seg share the self.up1_embedded layer: https://github.com/Tsinghua-MARS-Lab/HDMapNet/blob/7911227d45edb0226422083bc4bc452dc18e6399/model/base.py#L131

    I guess this is a typo, because there is another layer designed for direction_pred, self.up1_direction, which is never used: https://github.com/Tsinghua-MARS-Lab/HDMapNet/blob/7911227d45edb0226422083bc4bc452dc18e6399/model/base.py#L102
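
    A toy sketch (not the repo's actual decoder) of the decoupling this suggests:

    # Toy sketch: giving the direction head its own layer (the unused
    # up1_direction) decouples it from the instance-embedding head.
    import torch.nn as nn

    class Heads(nn.Module):
        def __init__(self, c):
            super().__init__()
            self.up1_embedded = nn.Conv2d(c, c, 1)   # instance-embedding path
            self.up1_direction = nn.Conv2d(c, c, 1)  # direction path

        def forward(self, x):
            emb = self.up1_embedded(x)
            direction = self.up1_direction(x)        # instead of reusing up1_embedded
            return emb, direction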

    opened by minhyeok-heo 0
  • Is there anything wrong with eval_iou()?

    Hi, thanks for your nice work! When I train the model, I find a problem: on the training set the IoU can reach about [0.8, 0.6, 0.8], while the eval IoU is only about [0.04, 0.01, 0.05]. So is there anything wrong with the eval_iou() function? It seems to be an order of magnitude too small. Thanks for your time.
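
    As a sanity check, a minimal per-class IoU between binary masks (not the repo's eval_iou) could be computed like this:

    # Minimal per-class IoU sanity check; not the repo's eval_iou.
    # pred, gt: integer tensors of shape (H, W) with class ids
    # 1..num_classes (0 = background).
    import torch

    def iou_per_class(pred, gt, num_classes=3, eps=1e-6):
        ious = []
        for c in range(1, num_classes + 1):
            p, g = pred == c, gt == c
            inter = (p & g).sum().float()
            union = (p | g).sum().float()
            ious.append((inter / (union + eps)).item())
        return ious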

    opened by ppbangKGT 0
  • XIO: fatal IO error 0 (Success) on X server "localhost:10.0"

    Hi, thanks for your nice work! When I read the code and try to run python vis_label.py to produce the labels, I get this error: XIO: fatal IO error 0 (Success) on X server "localhost:10.0" after 18050 requests (18050 known processed) with 862 events remaining. How can I solve this? I wonder if you could give some advice. Thanks for your time!

    opened by ppbangKGT 0
Owner

Tsinghua MARS Lab (MARS Lab at IIIS, Tsinghua University)