Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

Hehe Fan

Last update: Dec 29, 2022


Overview

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Introduction

Point cloud videos exhibit irregularities and lack of order along the spatial dimension, where points emerge inconsistently across different frames. To capture the dynamics in point cloud videos, point tracking is usually employed. However, as points may flow in and out across frames, computing accurate point trajectories is extremely difficult. Moreover, tracking usually relies on point colors and thus may fail to handle colorless point clouds. In this paper, to avoid point tracking, we propose a novel Point 4D Transformer (P4Transformer) network to model raw point cloud videos. Specifically, P4Transformer consists of (i) a point 4D convolution to embed the spatio-temporal local structures present in a point cloud video and (ii) a transformer to capture the appearance and motion information across the entire video by performing self-attention on the embedded local features. In this fashion, related or similar local areas are merged through attention weights rather than by explicit tracking.
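To make the second stage of this data flow concrete, here is a toy NumPy sketch. It treats the output of the point 4D convolution as given (random features for M local areas in each of L frames), adds an embedding of each area's (x, y, z, t) anchor coordinates, flattens everything into L·M tokens and runs one head of scaled dot-product self-attention across the whole video. The random weights, the single head, the additive linear position embedding, the max-pooling and the classifier head are all assumptions of this sketch, not the repository's implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def toy_video_attention(local_feats, coords, rng, num_classes=20):
    """Toy single-head self-attention over spatio-temporal local features.

    local_feats: [L, M, C] features of M local areas in each of L frames
                 (a stand-in for the point 4D convolution's output).
    coords:      [L, M, 4] (x, y, z, t) anchor coordinates of those areas.
    Returns (class logits [num_classes], attention matrix [L*M, L*M]).
    """
    L, M, C = local_feats.shape
    # Assumption: a linear embedding of (x, y, z, t) is added to the features.
    w_pos = rng.standard_normal((4, C))
    tokens = (local_feats + coords @ w_pos).reshape(L * M, C)

    # Scaled dot-product self-attention across all frames at once.
    w_q, w_k, w_v = (rng.standard_normal((C, C)) for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)  # each row sums to 1
    out = attn @ v  # [L*M, C]

    # Hypothetical head: max-pool over all tokens, then a linear classifier.
    pooled = out.max(axis=0)
    w_cls = rng.standard_normal((C, num_classes))
    return pooled @ w_cls, attn


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 16))   # L=4 frames, M=8 areas, C=16 channels
coords = rng.standard_normal((4, 8, 4))   # (x, y, z, t) per area
logits, attn = toy_video_attention(feats, coords, rng)
print(logits.shape, attn.shape)  # (20,) (32, 32)
```

Because every token attends to every other token, a local area can draw on similar areas in any frame, which is how attention weights can stand in for explicit point tracking.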

Installation

The code is tested with Red Hat Enterprise Linux Workstation release 7.7 (Maipo), g++ (GCC) 8.3.1, PyTorch (both v1.4.0 and v1.8.1 are supported), CUDA 10.2 and cuDNN v7.6.

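Compile the CUDA layers for PointNet++, which we use for farthest point sampling (FPS) and radius-based neighbourhood search. First rename the folder that matches your PyTorch version (modules-pytorch-1.4.0 or modules-pytorch-1.8.1) to modules; for example, for PyTorch 1.8.1:

mv modules-pytorch-1.8.1 modules
cd modules
python setup.py install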

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{fan21p4transformer,
  author    = {Hehe Fan and
               Yi Yang and
               Mohan Kankanhalli},
  title     = {Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos},
  booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, {CVPR}},
  year      = {2021}
}

Related Repos

PointNet++ PyTorch implementation: https://github.com/facebookresearch/votenet/tree/master/pointnet2
MeteorNet: https://github.com/xingyul/meteornet
3DV: https://github.com/3huo/3DV-Action
PSTNet: https://github.com/hehefan/Point-Spatio-Temporal-Convolution
Transformer: https://github.com/lucidrains/vit-pytorch
PointRNN (TensorFlow implementation): https://github.com/hehefan/PointRNN
PointRNN (PyTorch implementation): https://github.com/hehefan/PointRNN-PyTorch

Comments

Colab example?

Hello, I have just discovered your project. How could I use it to run inference on a real-time feed?

I am trying to feed a depth image or a point cloud to "some algorithm" that will put them together and store them in a PLY file. I think this is called point cloud registration? But it overlaps with SfM, where I would take N camera shots and store a point cloud, then take N+M shots and compare with the previous point cloud to see how much the new M shots contributed "relevant" points to the point cloud, or some variant of that, maybe with this project.

opened by Ademord
Transfer of points to nearest frames

@hehefan

Can you please explain what "transfer of points to nearest frames" means? How are the points transferred? (I understand that the anchor points are picked using farthest point sampling.)

Thanks in advance.

opened by sheshap
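One reading of the anchor transfer, based only on the paper's description: anchors are chosen by FPS in one frame, and their coordinates are then reused unchanged as neighbour-search centers in the neighbouring frames, so no point has to be tracked from frame to frame. The NumPy sketch below illustrates that idea. The function names, the boundary handling, and returning every in-radius neighbour (instead of a fixed nsample per anchor, as the PointNet++ ball query does) are simplifications of this sketch, not the repository's code.

```python
import numpy as np


def farthest_point_sampling(points, k):
    """Greedy FPS on [N, 3] points; returns k indices, starting from index 0."""
    chosen = np.zeros(k, dtype=int)
    min_d2 = np.full(points.shape[0], np.inf)  # squared distance to nearest chosen point
    for i in range(1, k):
        diff = points - points[chosen[i - 1]]
        min_d2 = np.minimum(min_d2, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(min_d2))
    return chosen


def transfer_anchors(frames, center_t, num_anchors, temporal_radius, radius):
    """Sample anchors by FPS in frames[center_t], then reuse the same anchor
    coordinates as query centers in each neighbouring frame.

    frames: list of [N_t, 3] arrays (one per frame).
    Returns (anchors [num_anchors, 3], {t: list of neighbour-index arrays}).
    """
    center = frames[center_t]
    anchors = center[farthest_point_sampling(center, num_anchors)]
    neighbors = {}
    for t in range(center_t - temporal_radius, center_t + temporal_radius + 1):
        if not 0 <= t < len(frames):
            continue  # simplification: frames outside the clip are skipped
        pts = frames[t]
        # Squared distance from every anchor to every point in frame t: [A, N_t]
        d2 = ((anchors[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
        neighbors[t] = [np.flatnonzero(row < radius ** 2) for row in d2]
    return anchors, neighbors


rng = np.random.default_rng(0)
clip = [rng.uniform(-1, 1, size=(256, 3)) for _ in range(3)]  # 3 frames
anchors, neighbors = transfer_anchors(clip, center_t=1, num_anchors=8,
                                      temporal_radius=1, radius=0.5)
print(anchors.shape, sorted(neighbors))  # (8, 3) [0, 1, 2]
```

The neighbours found in frames 0 and 2 are whatever points happen to lie near the transferred anchor positions, which is why this scheme tolerates points flowing in and out between frames.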
About MSR

Do I need to preprocess the MSR dataset before running the code, or can I directly read its depth files? If preprocessing is required, could you provide the preprocessing code?

opened by weiyutao886
Including Color (RGB) or other features

Hi, I'm curious whether it's possible to add color information in the form of RGB channels, or even other point-wise features, as input to the model. If so, what modifications would the model architecture require? Is it possible (and sensible) to just add extra channels, so that instead of passing xyz tensors we feed the model stacked xyzrgb tensors?

Thanks.

opened by ShadiZaki
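A minimal sketch of the data layout, assuming the model's first layer accepts an optional per-point feature tensor alongside the coordinates in a channel-first [B, T, C, N] layout (an assumption to check against the repository's layer signatures). The point is to keep xyz separate instead of stacking it with RGB into one coordinate tensor: farthest point sampling and the neighbour search should measure geometric distance only, while the colors travel with each point as features. The function name and the 0-255 color range are hypothetical.

```python
import numpy as np


def split_xyzrgb(clip, rgb_scale=255.0):
    """Split a stacked clip [B, T, N, 3 + C] into coordinates and features.

    Returns xyz [B, T, N, 3] (used for sampling and neighbour search) and
    channel-first features [B, T, C, N] (carried along with each point).
    Assumption: the extra channels are 8-bit colors, rescaled to [0, 1].
    """
    xyz = np.ascontiguousarray(clip[..., :3])
    feats = np.swapaxes(clip[..., 3:] / rgb_scale, -1, -2)  # [B, T, C, N]
    return xyz, np.ascontiguousarray(feats)


clip = np.zeros((2, 24, 2048, 6), dtype=np.float32)  # B=2, T=24, N=2048, xyzrgb
clip[0, 0, 5] = [0.1, 0.2, 0.3, 255, 128, 0]
xyz, feats = split_xyzrgb(clip)
print(xyz.shape, feats.shape)  # (2, 24, 2048, 3) (2, 24, 3, 2048)
```

With this split, the only architectural change would be increasing the first layer's input-channel count by C (3 for RGB), since the color features are concatenated with the geometric information inside each local region.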
Update module installation

To address issue #11, where the pointnet2 imports fail after installation, we updated pointnet2_utils.py in both the PyTorch 1.4.0 and 1.8.1 modules to import from the correct module.

opened by smellslikeml
Errors occur while running python setup.py install

After downloading this repository, I followed the README to run python setup.py install, but the following error occurs:

My current CUDA version is 11.6, and my PyTorch version is 1.11.3. I wonder whether the problem can be fixed simply by installing PyTorch 1.8.1 in a new conda virtual environment. Looking forward to your help, thanks a lot!

opened by Carbord
Preprocessing on Synthia4D

Hi, I am wondering why you cut one frame into two pieces and use the loss of each piece to update the model when training on Synthia4D. Is there any reason to do this? Specifically, I don't understand why the function half_crop_w_context in datasets.synthia is used.

opened by dyh-Jack
Problems with NTU RGB+D dataset preprocessing

I used the preprocessor from the 3DV work, but it seems to convert the data into .npy files instead of .npz files. Could you provide the preprocessing code for NTU RGB+D? Much appreciated.

opened by yxc21ssucb
Problems with results

Hi, I tried to run the code to get a baseline result. I didn't change the parameters; I set clip len = 24 and batch size = 14, but I only got 89.55% accuracy after training for 100 epochs, while the paper reports 90.94%. I don't know why the results differ. I have uploaded my training log (out_P4.txt); I hope you can suggest some possible reasons.

opened by dyh-Jack
Visualize the transformer's attention

I want to visualize the transformer's attention. I see that Fig. 4 in your paper visualizes it. Can you tell me where and how to visualize it? Could you share the visualization code? Thank you.

opened by weiyutao886
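Fig. 4 is the authors' own figure; the sketch below only shows one generic way to turn captured attention weights into per-point colors. It assumes you have already obtained the softmax attention matrix from the transformer (for example, by modifying the attention module to store it), that the heads are averaged, and that tokens are ordered frame-major (token = frame · M + anchor). The function name and these conventions are hypothetical.

```python
import numpy as np


def attention_to_colors(attn, query, num_frames, num_anchors):
    """Map one query token's attention row to per-anchor intensities in [0, 1].

    attn: [L*M, L*M] row-stochastic attention matrix (heads already averaged),
          tokens ordered frame-major: token = frame * num_anchors + anchor.
    Returns [L, M]: one value per anchor point per frame, usable as colormap input.
    """
    row = attn[query].reshape(num_frames, num_anchors)
    lo, hi = row.min(), row.max()
    if hi == lo:
        return np.zeros_like(row)  # uniform attention: nothing to highlight
    return (row - lo) / (hi - lo)


rng = np.random.default_rng(0)
scores = rng.standard_normal((2, 6, 6))  # 2 heads, L=3 frames x M=2 anchors
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
colors = attention_to_colors(weights.mean(axis=0), query=0, num_frames=3, num_anchors=2)
print(colors.shape)  # (3, 2)
```

Each value can then be passed to a colormap when scattering the anchor coordinates of its frame, so the local areas that the chosen query area attends to stand out.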
Intuition behind choice of input points, ball query radius, nsamples and spatial-stride

Dear @hehefan,

From the train-msr.py file, I see the default values: input points = 2048, ball query radius = 0.7, nsamples = 32, spatial-stride = 32.

On a given point cloud frame that contains 2048 points, 32 points are sampled with farthest point sampling (FPS), which is a small number. Around each of them, a ball with a radius of 0.7 (large) is used to query 32 points (a very small number).

I have a few questions to understand the setting.

Wouldn't the queried points (32) around each farthest point be very close to it, since the overall object contains 2048 points (dense)?

What is the intuition behind choosing a bigger radius but querying only 32 points at each FPS point?

I suspect that, with 2048 input points but only 32 points queried around each of the 32 farthest points, the points considered from a given frame at a time are limited to 32 × 32 = 1024 points, and these points form clusters of 32 points around each of the 32 farthest points because of the ball query.

Please help me understand the intuition behind the design.

Thanks in advance.
opened by sheshap
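Two details may help with these questions; both are sketched below under stated assumptions. First, the PointNet++ CUDA ball query compiled in the Installation section returns the first nsample points it finds inside the radius when scanning points in index order, not the nsample nearest ones, and pads with the first hit when fewer are found. With a large radius such as 0.7, the 32 neighbours may therefore spread across the ball rather than hugging the anchor, depending on how the points are ordered. The NumPy function below mimics that behaviour for one center; its name is hypothetical, and returning index 0 when nothing is found assumes a zero-initialized index buffer. Second, if the layer samples num_points // spatial_stride anchors per frame (worth confirming in the repository's convolution code), 2048 points with a spatial stride of 32 give 64 anchors rather than 32.

```python
import numpy as np


def ball_query_first_k(points, center, radius, nsample):
    """Ball query for one center, mimicking the PointNet++ CUDA kernel.

    Scans points in index order and keeps the FIRST nsample points with
    squared distance < radius**2 (not the nsample nearest). If fewer are
    found, the first hit is repeated to pad the result to nsample entries.
    """
    d2 = ((points - center) ** 2).sum(axis=1)
    hits = np.flatnonzero(d2 < radius ** 2)[:nsample]
    if hits.size == 0:
        return np.zeros(nsample, dtype=int)  # assumption: indices stay 0
    pad = np.full(nsample - hits.size, hits[0], dtype=int)
    return np.concatenate([hits, pad])


# Points at x = 0, 1, ..., 9 on a line; query around x = 9.
line = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
print(ball_query_first_k(line, line[9], radius=5.5, nsample=3))  # [4 5 6], not [9 8 7]
print(ball_query_first_k(line, line[9], radius=1.5, nsample=4))  # [8 9 8 8]

# Anchor count per frame, assuming anchors = num_points // spatial_stride.
num_points, spatial_stride, nsample = 2048, 32, 32
num_anchors = num_points // spatial_stride
print(num_anchors, num_anchors * nsample)  # 64 2048
```

Under these assumptions, the large radius mainly bounds how far a neighbour may be, while nsample caps the cost per anchor.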

