HDMapNet: A Local Semantic Map Learning and Evaluation Framework

Overview

HDMapNet_devkit

Devkit for HDMapNet.

Qi Li, Yue Wang, Yilun Wang, Hang Zhao

[Paper] [Project Page] [5-min video]

Abstract: Estimating local semantics from sensory inputs is a central component of high-definition map construction in autonomous driving. However, traditional pipelines require a vast amount of human effort and resources to annotate and maintain the semantics in the map, which limits their scalability. In this paper, we introduce the problem of local semantic map learning, which dynamically constructs the vectorized semantics based on onboard sensor observations. Meanwhile, we introduce a local semantic map learning method, dubbed HDMapNet. HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on the nuScenes dataset and show that in all settings it performs better than baseline methods. Of note, our fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. In addition, we develop semantic-level and instance-level metrics to evaluate the map learning performance. Finally, we showcase that our method is capable of predicting a locally consistent map. By introducing the method and metrics, we invite the community to study this novel map learning problem. Code and evaluation kit will be released to facilitate future development.

Questions/Requests: Please file an issue or email me at [email protected].

Preparation

  1. Download the nuScenes dataset and put it in the dataset/ folder.

  2. Install dependencies by running

pip install -r requirement.txt

Vectorization

Run python vis_label.py for a demo of the vectorized labels. The visualizations are saved in dataset/nuScenes/samples/GT.

Evaluation

Run python evaluate.py --result_path [submission file] for evaluation. The script accepts vectorized or rasterized maps as input. For vectorized maps, we first rasterize the vectors onto a map before evaluating. For rasterized maps, make sure the line width is 1.
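
For intuition, here is a minimal sketch (not the devkit's actual code) of how a single vectorized line could be rasterized with line width 1 before evaluation; the canvas size and the assumption that the points are already in pixel units are illustrative.

# Minimal sketch, not the devkit's rasterizer: draw one vectorized line
# onto a BEV canvas with line width 1, as the evaluation expects.
import numpy as np
import cv2

def rasterize_line(pts, canvas_hw=(200, 400)):
    """pts: (N, 2) array of ordered line points, already in pixel units."""
    canvas = np.zeros(canvas_hw, dtype=np.uint8)
    pts = np.round(pts).astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(canvas, [pts], isClosed=False, color=1, thickness=1)  # width = 1
    return canvas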

Below is the format for vectorized submission:

vectorized_submission {
    "meta": {
        "use_camera":   <bool>  -- Whether this submission uses camera data as an input.
        "use_lidar":    <bool>  -- Whether this submission uses lidar data as an input.
        "use_radar":    <bool>  -- Whether this submission uses radar data as an input.
        "use_external": <bool>  -- Whether this submission uses external data as an input.
        "vector":        true   -- Whether this submission uses vector format.
    },
    "results": {
        sample_token <str>: List[vectorized_line] -- Maps each sample_token to a list of vectorized lines.
    }
}

vectorized_line {
    "pts":              List[<float, 2>] -- Ordered points to define the vectorized line.
    "pts_num":          <int>,           -- Number of points in this line.
    "type":             <0, 1, 2>        -- Type of the line: 0: ped; 1: divider; 2: boundary
    "confidence_level": <float>          -- Confidence level for prediction (used by Average Precision)
}

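As a concrete illustration, the snippet below assembles a minimal submission matching this format; the sample token and all field values are hypothetical placeholders, not real predictions.

# Hypothetical example of assembling a vectorized submission; the sample
# token and all values are placeholders.
import json

SAMPLE_TOKEN = "replace-with-a-nuscenes-sample-token"

line = {
    "pts": [[10.0, 2.5], [12.0, 2.6], [14.0, 2.8]],  # ordered BEV points
    "pts_num": 3,
    "type": 1,                # 0: ped; 1: divider; 2: boundary
    "confidence_level": 0.9,  # used by Average Precision
}

submission = {
    "meta": {
        "use_camera": True,
        "use_lidar": False,
        "use_radar": False,
        "use_external": False,
        "vector": True,
    },
    "results": {SAMPLE_TOKEN: [line]},
}

with open("submission.json", "w") as f:
    json.dump(submission, f)
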
For rasterized submission, the format is:

rasterized_submission {
    "meta": {
        "use_camera":   <bool>  -- Whether this submission uses camera data as an input.
        "use_lidar":    <bool>  -- Whether this submission uses lidar data as an input.
        "use_radar":    <bool>  -- Whether this submission uses radar data as an input.
        "use_external": <bool>  -- Whether this submission uses external data as an input.
        "vector":       false   -- Whether this submission uses vector format.
    },
    "results": {
        sample_token <str>: { -- Maps each sample_token to a rasterized map and per-line confidences.
            "map":              [<array>]    -- Raster map of prediction (C=0: ped; 1: divider; 2: boundary). The value indicates the line idx (start from 1).
            "confidence_level": Array[float] -- confidence_level[i] stands for the confidence level of the i-th line (starting from 1).
        }
    }
}

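For comparison, a hypothetical per-sample entry in the rasterized format might be built as below; the shapes and values are illustrative, and the pixel values are 1-based line indices as described above.

# Hypothetical rasterized entry; shapes and values are illustrative.
import numpy as np

C, H, W = 3, 200, 400        # channels: 0: ped; 1: divider; 2: boundary
raster = np.zeros((C, H, W), dtype=np.uint8)
raster[1, 100, 50:150] = 1   # divider line #1, drawn with width 1

entry = {
    "map": raster.tolist(),
    # confidence_level[i] is the confidence of the i-th line (1-based),
    # so index 0 is unused here.
    "confidence_level": [0.0, 0.9],
}
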
Run python export_to_json.py to generate a demo vectorized submission. Run python export_to_json.py --raster for a rasterized submission.

Citation

If you find this useful in your research, please consider citing:

@misc{li2021hdmapnet,
      title={HDMapNet: A Local Semantic Map Learning and Evaluation Framework}, 
      author={Qi Li and Yue Wang and Yilun Wang and Hang Zhao},
      year={2021},
      eprint={2107.06307},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Some questions about the command 'run evaluate.py'

    Hi, thank you for your complete and great code. When I tried to use the command python evaluate.py as the README said, I got this error message: AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead. I have trained the model, but I can't solve the problem above. Besides, there is another question: I couldn't find an argument named 'result_path' as your README said. Should I add the argument myself, or is my version of the code different? Thank you for your time.

    opened by MelonLee-go 6
  • question about GT generation

    Thanks for the great work! I ran vis_label.py following the README, and I found that the GT lines projected onto the surrounding views have some misalignment with the real line elements. This may be attributed to assuming Z=0 for all map elements in https://github.com/Tsinghua-MARS-Lab/HDMapNet/blob/main/vis_label.py#L75-L76

    Will the GT misalignment between the top-down map and the perspective images hinder the performance of the surrounding-camera branch of HDMapNet? And how can the GT be aligned?

    opened by LegendBC 3
  • Question on BEV decoder

    Hi, thanks for your great work! I have a question regarding your architecture of the BEV decoder, according to the paper:

    The BEV decoder is a fully convolutional network with 3 branches, namely semantic segmentation branch, instance embedding branch, and direction prediction branch…ResNet with three blocks is used as the BEV decoder.

    If I understand it right, the decoder takes the BEV feature map as input, and each branch produces its own output map (with a different number of channels K' per branch). It's not very clear to me how a ResNet can be applied here.

    My guess is you use a U-Net structure with a shared encoder (which uses the last 3 encoders of ResNet with dilated convs to avoid downsampling) and a separate decoder for each branch, but this seems unnecessary given that the input is already a feature vector. Or do you mean each decoder is simply made up of 3 residual blocks? Thanks!

    Also, as a minor request, could you upload icon/car.png so that we can run vis_label.py with the ego-car icon included in the GT map label as well?

    opened by Francis777 3
  • lost intrins in IPM transformation

    Hello,

    I have some questions from reviewing the code base. I understand the role of the IPM Net in the model: based on the code logic, the grid plane (the BEV map) should be transformed into the pixel-level coordinate system. That is, lines 192 and 193 of homography.py multiply the extrinsics, intrinsics, and post_RTs to transform the grid plane. However, Ks (the intrinsics) seems to be an identity matrix, because line 71 of hdmapnet.py is empty and Ks is never initialized from the camera intrinsics. That looks like a strange error. By the way, why is post_RTs None at line 78 of hdmapnet.py? Should it be None, or is it an error?

    opened by JadoTu 2
  • Inconsistent Intrinsic Matrix in data fetching, probably leading to worse IPMNet performance

    Thank you for your amazing work.

    I notice that there is a resize operation in the data fetching phase that shrinks the image size.

    However, to keep the projection relation between the ground-truth 3D lanes and the (resized) image pixels, we also need to adjust the camera intrinsics during the data fetching phase. (For example, if the image is downsampled by two, the first two rows of the intrinsic matrix should also be halved.)

    In HDMapNet, because we use linear layers to learn the mapping between the front view and the BEV view, there is no need for intrinsic parameters, and the code works fine.

    However, in IPMNet and Lift-Splat-Shoot, correct camera intrinsic parameters are essential for correctly projecting the FV features to BEV. The inconsistent camera intrinsic matrix could hurt the performance of IPMNet and LSS in the experiments mentioned in your paper. Were the ablation experiments in your paper also reproduced with the code in this repo?
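
    A small sketch of the adjustment described here (the function and names are illustrative, not code from this repo):

    # Sketch of the intrinsics adjustment described above; sx, sy are the
    # resize factors (new size / original size).
    import numpy as np

    def scale_intrinsics(K, sx, sy):
        S = np.array([[sx, 0.0, 0.0],
                      [0.0, sy, 0.0],
                      [0.0, 0.0, 1.0]])
        return S @ K  # scales fx, cx by sx and fy, cy by sy

    # e.g. downsampling by two: sx = sy = 0.5 halves the first two rows of K.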

    opened by Owen-Liuyuxuan 2
  • Model Evaluation Correction

    To create a submission file, run:

    python export_gt_to_json.py

    After running the above, you will get a submission.json file.

    To evaluate the file, run:

    python evaluate_json.py --result_path [submission_file_path]

    Hope this saves time for those getting stuck on the README instructions.

    opened by Vishal1711 1
  • about the code release

    Hi, @liqi17thu @hangzhaomit ,

    When will the code be released? The link (https://github.com/Tsinghua-MARS-Lab/HDMapNet-dev) attached on your website is broken now.

    Thanks~

    opened by amiltonwong 1
  • 'instance_seg' and 'direction_pred' both use 'self.up1_embedded.' is it intended?

    I just found that --direction-pred only works when --instance-seg is activated, because direction_pred and instance_seg share the self.up1_embedded layer: https://github.com/Tsinghua-MARS-Lab/HDMapNet/blob/7911227d45edb0226422083bc4bc452dc18e6399/model/base.py#L131

    I guess this is a typo, because there is another layer designed for direction_pred, self.up1_direction, which is never used: https://github.com/Tsinghua-MARS-Lab/HDMapNet/blob/7911227d45edb0226422083bc4bc452dc18e6399/model/base.py#L102
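
    A toy sketch (not the repo's actual decoder) of the decoupling this suggests:

    # Toy sketch: giving the direction head its own layer (the unused
    # up1_direction) decouples it from the instance-embedding head.
    import torch.nn as nn

    class Heads(nn.Module):
        def __init__(self, c):
            super().__init__()
            self.up1_embedded = nn.Conv2d(c, c, 1)   # instance-embedding path
            self.up1_direction = nn.Conv2d(c, c, 1)  # direction path

        def forward(self, x):
            emb = self.up1_embedded(x)
            direction = self.up1_direction(x)        # instead of reusing up1_embedded
            return emb, direction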

    opened by minhyeok-heo 0
  • Is there anything wrong with eval_iou()?

    Hi, thanks for your nice work! When I train the model, I find a problem: on the training set the IoU can reach about [0.8, 0.6, 0.8], while the eval IoU is only about [0.04, 0.01, 0.05]. So is there anything wrong with the eval_iou() function? It seems to be an order of magnitude too small. Thanks for your time.
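
    As a sanity check, a minimal per-class IoU between binary masks (not the repo's eval_iou) could be computed like this:

    # Minimal per-class IoU sanity check; not the repo's eval_iou.
    # pred, gt: integer tensors of shape (H, W) with class ids
    # 1..num_classes (0 = background).
    import torch

    def iou_per_class(pred, gt, num_classes=3, eps=1e-6):
        ious = []
        for c in range(1, num_classes + 1):
            p, g = pred == c, gt == c
            inter = (p & g).sum().float()
            union = (p | g).sum().float()
            ious.append((inter / (union + eps)).item())
        return ious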

    opened by ppbangKGT 0
  • XIO: fatal IO error 0 (Success) on X server "localhost:10.0"

    Hi, thanks for your nice work! When I read the code and try to run python vis_label.py to produce the labels, I get this error: XIO: fatal IO error 0 (Success) on X server "localhost:10.0" after 18050 requests (18050 known processed) with 862 events remaining. How can I solve this? I wonder if you could give some advice. Thanks for your time!

    opened by ppbangKGT 0
Owner

Tsinghua MARS Lab (MARS Lab at IIIS, Tsinghua University)