[CVPR 2021] "Multimodal Motion Prediction with Stacked Transformers": official code implementation and project page.

DeciForce: Crossroads of Machine Perception and Autonomy

Last update: Dec 31, 2022

mmTransformer

Introduction

This repo is the official PyTorch implementation of mmTransformer. Currently, the core code of mmTransformer is part of a commercial project, so we provide the inference code of the model with six trajectory proposals for your reference.
For more information, please refer to our paper, Multimodal Motion Prediction with Stacked Transformers (CVPR 2021). [Paper] [Webpage]

Set up your virtual environment

Initialize virtual environment:
```
conda create -n mmTrans python=3.7
```
Install the Argoverse API. Please refer to this page.
Install PyTorch. The latest code is tested on Ubuntu 16.04, CUDA 11.1, PyTorch 1.8 and Python 3.7 (note that testing with the pretrained model requires torch >= 1.5.0):
```
pip install torch==1.8.0+cu111\
      torchvision==0.9.0+cu111\
      torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
```
For the other requirements, install them with the following command:
```
pip install -r requirement.txt
```
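
To confirm that the installed torch satisfies the `>= 1.5.0` requirement stated above, a small version check can help. This is a minimal sketch and not part of the repo; the helper names `parse_version` and `meets_minimum` are hypothetical, and the parser only handles version strings shaped like `1.8.0+cu111`.

```python
import re


def parse_version(version):
    """Turn a string such as '1.8.0+cu111' into (1, 8, 0).

    The local build tag after '+' is ignored, and the result is padded
    to three components so that '1.5' compares equal to '1.5.0'.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum="1.5.0"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Inside the activated environment, this could be called as `meets_minimum(torch.__version__)` after `import torch`.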

Preparation

Download the code, model and data

Clone this repo from GitHub.

 git clone https://github.com/decisionforce/mmTransformer.git

Download the pretrained model and data [here] (a map.pkl for Python 3.7 is available [here]) and save them to ./models and ./interm_data, respectively.
```
 cd mmTransformer
 mkdir models
 mkdir interm_data
```

Finally, your directory structure should look something like this:

 mmTransformer
 ├── models
 │   └── demo.pt
 └── interm_data
     ├── argoverse_info_val.pkl
     └── map.pkl
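
Before running the evaluation, it may be worth checking that the three files from the layout above are in place. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the file list is simply copied from the directory tree.

```python
from pathlib import Path

# Relative paths taken from the directory layout shown above.
EXPECTED_FILES = (
    "models/demo.pt",
    "interm_data/argoverse_info_val.pkl",
    "interm_data/map.pkl",
)


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files(".")` from the mmTransformer root returns an empty list when everything has been downloaded.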

Preprocess the dataset

Alternatively, you can process the data from scratch using the following commands.

Download the Argoverse dataset and create a symbolic link to it at ./data, or use the following commands.

 cd path/to/mmtransformer/root
 mkdir data
 cd data
 wget https://s3.amazonaws.com/argoai-argoverse/forecasting_val_v1.1.tar.gz 
 tar -zxvf  forecasting_val_v1.1.tar.gz

Then extract the agent and map information from raw data via Argoverse API:
```
 python -m lib.dataset.argoverse_convertor ./config/demo.py
```
Afterwards, your directory structure should look like the one illustrated above.

Format of processed data in ‘argoverse_info_val.pkl’:

Format of map information in ‘map.pkl’:
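
One way to see the structure of either file is to load it and print a short summary of the nested types. The sketch below makes no assumption about the actual keys in `argoverse_info_val.pkl` or `map.pkl`; the helper names `describe` and `describe_pickle` are hypothetical. Only unpickle files from a source you trust, and note that loading may require the same packages (for example numpy) that were used when the file was written.

```python
import pickle


def describe(obj, level=0, max_level=2, max_items=5):
    """Return indented lines summarising the types inside a nested object."""
    pad = "  " * level
    if isinstance(obj, dict):
        lines = [f"{pad}dict with {len(obj)} keys"]
        if level < max_level:
            # Show only the first few keys to keep the summary short.
            for key in list(obj)[:max_items]:
                child = describe(obj[key], level + 1, max_level, max_items)
                child[0] = f"{pad}  {key!r}: {child[0].strip()}"
                lines.extend(child)
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__} of length {len(obj)}"]
        if obj and level < max_level:
            # Describe the first element as a representative sample.
            child = describe(obj[0], level + 1, max_level, max_items)
            child[0] = f"{pad}  [0]: {child[0].strip()}"
            lines.extend(child)
        return lines
    # Array-like objects (e.g. numpy arrays) expose a shape attribute.
    shape = getattr(obj, "shape", None)
    suffix = f" with shape {tuple(shape)}" if shape is not None else ""
    return [f"{pad}{type(obj).__name__}{suffix}"]


def describe_pickle(path, **kwargs):
    """Load a pickle file and return its summary lines."""
    with open(path, "rb") as handle:
        return describe(pickle.load(handle), **kwargs)
```

For example, `print("\n".join(describe_pickle("./interm_data/map.pkl")))` prints the summary for the map file.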

Run the mmTransformer

For testing:

python Evaluation.py ./config/demo.py --model-name demo

Results

Here we showcase the expected results on the validation set:

| Metric | Expected results | Results in paper |
|---|---|---|
| minADE | 0.709 | 0.713 |
| minFDE | 1.081 | 1.153 |
| MR (K=6) | 10.2 | 10.6 |
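
For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute them for K predicted trajectories per scene. It is an illustration only and is not taken from this repo: the function names are hypothetical, the best of K is chosen independently for each metric, and the 2.0 m miss threshold is an assumption based on the usual Argoverse convention. The official Argoverse evaluation may differ in detail, for example in how the best trajectory is selected.

```python
import math


def min_ade_fde(predictions, ground_truth):
    """Return (minADE, minFDE) for one scene.

    predictions: K trajectories, each a sequence of (x, y) points.
    ground_truth: a sequence of (x, y) points of the same length.
    """
    ades, fdes = [], []
    for trajectory in predictions:
        # Per-timestep Euclidean distance to the ground truth.
        distances = [
            math.hypot(p[0] - g[0], p[1] - g[1])
            for p, g in zip(trajectory, ground_truth)
        ]
        ades.append(sum(distances) / len(distances))  # average displacement
        fdes.append(distances[-1])                    # final displacement
    return min(ades), min(fdes)


def miss_rate(min_fdes, threshold=2.0):
    """Fraction of scenes whose best final displacement exceeds the threshold."""
    return sum(fde > threshold for fde in min_fdes) / len(min_fdes)
```

Averaging the per-scene minADE and minFDE over the validation set gives numbers comparable in spirit to the first two rows; `miss_rate` returns a fraction, which corresponds to the MR row when multiplied by 100.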

TODO

We are going to open source our visualization tools and a demo result. (TBD)

Contact us

If you have any issues with the code, please contact us at this email: [email protected]

Citation

If you find our work useful for your research, please consider citing the paper:

@article{liu2021multimodal,
  title={Multimodal Motion Prediction with Stacked Transformers},
  author={Liu, Yicheng and Zhang, Jinghuai and Fang, Liangji and Jiang, Qinhong and Zhou, Bolei},
  journal={Computer Vision and Pattern Recognition},
  year={2021}
}

Comments

Questions about decoder input and positional encoding
Hi,

On page 4, the paper says 'The decoder inputs are the trajectory proposals, which are initialized by a set of learnable positional encoding'. And on page 9, it says 'The decoder receives proposals (randomly initialized), positional encoding of proposals, as well as encoder memory...' So, what is the input of the first decoder layer? Is it the randomly initialized proposals with the learnable positional encoding added? And what is the initialization distribution?

On page 9, the paper says 'In encoder, spatial positional encoding are added to the queries and keys at each MHSA layer'. Is the positional encoding in the encoder fixed or learnable? Is this positional encoding used in all of the motion extractor, map aggregator and social constructor, or in only one of them? Thank you.
opened by panda2020-sky 7
details about the embedding dimension

Could you provide the embedding dimension of each step in the motion aggregator and map extractor (with VectorNet)? I haven't found them, or a corresponding reference, in the Implementation Details in the Appendix. Are they the same as the hidden state (128)?

opened by Yisten 7
Some questions about dataloading and model

Hi, congrats on the nice work and thank you for the quick replies on other issues and sharing the data preprocessing repo, which really helps me a lot. I have some questions about the VectorNet and training.

(a) I wonder whether you use the subgraph implemented in this repo. I am new to GNNs and torch_geometric; can I simply implement the model with both torch and torch_geometric?

(b) How many epochs do you train for in a single experiment? My implementation cannot get good results (2.0+ minADE and 5.0+ minFDE), and I found that my model takes many more epochs to overfit a small subset of the data compared to other non-transformer-based models. I wonder whether my implementation has bugs or my training process is wrong.

opened by L4zyy 6
Some questions about visualization
Hi, many thanks for the quick replies on other issues and sharing the data preprocessing repo. I have some questions about the visualization part.

The demo video looks really impressive, but in the Argoverse forecasting data the total length of a scene is only 5 seconds, while each scene in the demo video seems to last about 30 seconds. So I'm wondering whether you are using the Argoverse forecasting data for visualization.

For the Argoverse data, I think it is relatively easy to get and visualize the map with the help of the API, but the forecasting data itself does not provide information such as bounding box size or orientation. So how do you get that information for visualization?

I saw that the core code is implemented in a commercial project, so it may be difficult to release it to the public, but I'm wondering whether it is possible to release the code for visualization.

Many thanks!
opened by lyk1993 4
Some questions about the paper

Hello,

Congrats on the nice work! It is not clear to me what happens to the other agents. It seems that you treat all agents similarly with the same network. (a) What happens in the motion extractor? Do you feed in all histories and then update the proposals for all vehicles? (b) Is the scene normalized for each agent, or do you keep it normalized for the target agent?

opened by MohammadHossein-Bahari 4
Several questions about the implementation and paper
Thanks for your great work and the inference code! Here are several questions about this work and it will be very helpful if you give me some hints.

In the paper, you mentioned that "parallel trajectory proposals can integrate the information from the encoder independently, allowing each single proposal to carry disentangled modality information" (page 3). How should the term "disentangled" be understood? Does it mean that the proposals will focus on different modalities automatically? I tried to visualize the distribution of endpoints generated by different proposals, just like in Fig. 5, and the result is shown below. The problems are:

The endpoint distribution is not spatially disentangled, which is different from Fig. 5. Here, endpoints from different proposals overlap heavily. Can I assert that the proposed RTS makes the prediction spatially disentangled? If so, how should "each single proposal to carry disentangled modality information" be understood?

It seems that only a few proposals are used in most cases: {0,1,2,3} are used, while {4,5} always have low confidence. Is this imbalance also caused by the vanilla training strategy?

[Figures: endpoint distributions with points whose confidence is lower than the uniform probability (1/K) filtered out, and without filtering.]

In the paper, you mentioned that "we only utilize the decoder of social constructor to update the proposals for target vehicles, instead of all vehicles, in pursuit of higher efficiency" (page 4). However, it seems that the decoder of the social layer (social_dec) is not used in the released code, and social_mem is simply unsqueezed and concatenated with social_out. Is this change intentional? If so, why?

It seems that the ablative results of the order of transformers are missing. Tab. 2 shows the effectiveness of each module but does not contain how the order of the modules influences prediction results.

Thanks. Always happy to hear from you!
opened by MasterIzumi 3
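
The filtering step mentioned in this issue, dropping points whose confidence is below the uniform probability 1/K, can be sketched as follows. This is an illustration under assumptions, not the issue author's or the repo's code: the function name is hypothetical, and the inputs are assumed to be one endpoint and one confidence score per proposal for a single scene.

```python
def filter_by_uniform_confidence(endpoints, confidences):
    """Keep (proposal_index, endpoint) pairs with confidence >= 1/K.

    endpoints: K predicted endpoints, one (x, y) pair per proposal.
    confidences: K confidence scores, one per proposal.
    """
    k = len(confidences)
    threshold = 1.0 / k  # probability each proposal would get if all were equal
    return [
        (index, point)
        for index, (point, score) in enumerate(zip(endpoints, confidences))
        if score >= threshold
    ]
```

Keeping the proposal index alongside each endpoint makes it possible to color the surviving points by proposal when plotting.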
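The 'map.pkl' file was saved under Python 3.8

As the title says, when executing 'val_dataset = ArgoverseDataset(validation_cfg)', I get 'ValueError: unsupported pickle protocol: 5'. This is probably a Python version issue from when the pickle file was saved, but the setup section of your README requires the development environment to be Python 3.7.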

opened by Gengmaosi 2
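
Pickle protocol 5 was introduced in Python 3.8, so a file written with it cannot be read by the standard `pickle` module in Python 3.7. Besides using the Python 3.7 version of map.pkl mentioned in the README, one possible workaround is to re-save the file with protocol 4 from a Python 3.8+ interpreter. The sketch below uses only the standard library; the helper names are hypothetical, and loading the real map.pkl may additionally require the packages it was created with.

```python
import pickle


def pickle_protocol(path):
    """Return the protocol number stored in the file header, if any.

    Pickles written with protocol 2 or later start with the PROTO
    opcode (0x80) followed by one byte holding the protocol number.
    """
    with open(path, "rb") as handle:
        header = handle.read(2)
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None


def resave_pickle(src, dst, protocol=4):
    """Load src and write it to dst using an older pickle protocol.

    Must be run with an interpreter that can read src (Python 3.8+
    for protocol 5 files).
    """
    with open(src, "rb") as handle:
        obj = pickle.load(handle)
    with open(dst, "wb") as handle:
        pickle.dump(obj, handle, protocol=protocol)
```

Running `resave_pickle("map.pkl", "map_py37.pkl")` under Python 3.8 would produce a file that the Python 3.7 environment can load; the output file name here is only an example.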
Is the final result on the leaderboard obtained by training on both the train and val datasets?

Hi! I have always wondered whether the models with competitive results use the train dataset only, because when I tried to submit a result trained on both the train and validation datasets, there was a drop in performance. :(

opened by shouldnotfail 1

RuntimeError: CUDA error:

When I tried to run this code with the command below, I got this error.

command

python Evaluation.py ./config/demo.py --model-name demo

error

gpu number:1
model loaded from ./models/demo.pt
Successfully Loaded model: ./models/demo.pt
Finished Initialization in 15.365s!!!
  0%|          | 0/1234 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "Evaluation.py", line 77, in <module>
    out = model(data)
  File "/home/usaywook/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/mmTransformer.py", line 150, in forward
    social_mask, lane_enc, lane_mask)
  File "/home/usaywook/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/TF_version/stacked_transformer.py", line 128, in forward
    lane_mem = self.lane_enc(self.lane_emb(lane_enc), lane_mask) # (batch size, max_lane_num, 128)
  File "/home/usaywook/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/TF_utils.py", line 49, in forward
    x = layer(x, x_mask)
  File "/home/usaywook/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/TF_utils.py", line 69, in forward
    x = self.sublayer[0](x, lambda x: self.self_attn(x, x, x, mask))
  File "/home/usaywook/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/TF_utils.py", line 208, in forward
    return x + self.dropout(sublayer(self.norm(x)))
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/TF_utils.py", line 69, in <lambda>
    x = self.sublayer[0](x, lambda x: self.self_attn(x, x, x, mask))
  File "/home/usaywook/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/TF_utils.py", line 170, in forward
    query, key, value, mask=mask, dropout=self.dropout)
  File "/media/usaywook/Samsung_T5/tmp/mmTransformer/lib/models/TF_utils.py", line 227, in attention
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

If I remove .cuda in line 61 and line 75 of this code, the error goes away. However, I then cannot use the GPU to run this code.

Moreover, I cannot find the loss function for multimodal trajectories in this repository. Could you share the code for the loss function used in the original paper?

opened by Usaywook 0

The ckpt has a social decoder, but the code does not.
Thanks for your code. I encountered several questions when reproducing this paper.
I hope you can help me solve these questions.

The code does not have a social decoder, but the ckpt you provided does. I'm interested in how you trained this ckpt.

Which classification loss do you choose for the six trajectory proposals: CrossEntropy or KL? And what is the target?

What is the loss weight for reg loss and cls loss? Thanks for your project again and I really hope to get your reply.
opened by xushilin1 1
demo error: No such file or directory

When I tried to run the demo with python -m lib.dataset.argoverse_convertor ./config/demo.py, I got this error: FileNotFoundError: [Errno 2] No such file or directory: '/home/mmTransformer/argoverseapi/map_files/pruned_argoverse_PIT_10314_vector_map.xml'

I have no idea what causes it. Could anyone give me a hand? Thanks!

opened by fgqile 1
Are all agents involved in calculating the loss? How long does one epoch take in training?

I have three questions:
1. I guess that you use only the target agent, without the other agents, when calculating the loss and backpropagating, because you generate only one theta value per sample scene. If not, please give me more detail. If you use all agents in the loss and regard every agent as a target, then your data preprocessing code needs to be modified for training.
2. How long does one epoch take in training, how many GPUs did you use in the experiments, and which type of GPU did you use?
3. How much improvement did the data augmentation give?

opened by fengsky401 0
K-means normalization method in line 435

Another question: in Part D of the Appendix, what exactly is the normalization described in line 435 of your paper? I cannot figure out which line is line 435 in the paper, so....

opened by YouSonicAI 0

Owner

DeciForce: Crossroads of Machine Perception and Autonomy

Research on Unifying Machine Perception and Autonomy in Zhou Group

