Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

Overview

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Introduction

Point cloud videos exhibit irregularities and lack of order along the spatial dimension where points emerge inconsistently across different frames. To capture the dynamics in point cloud videos, point tracking is usually employed. However, as points may flow in and out across frames, computing accurate point trajectories is extremely difficult. Moreover, tracking usually relies on point colors and thus may fail to handle colorless point clouds. In this paper, to avoid point tracking, we propose a novel Point 4D Transformer (P4Transformer) network to model raw point cloud videos. Specifically, P4Transformer consists of (i) a point 4D convolution to embed the spatio-temporal local structures presented in a point cloud video and (ii) a transformer to capture the appearance and motion information across the entire video by performing self-attention on the embedded local features. In this fashion, related or similar local areas are merged with attention weight rather than by explicit tracking.
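
As a rough sketch, the pipeline described above looks like the following. All names here (P4TransformerSketch, the MLP stub standing in for the point 4D convolution) are illustrative assumptions, not the repository's actual classes, and a standard PyTorch transformer encoder stands in for the paper's transformer:

import torch
import torch.nn as nn

class P4TransformerSketch(nn.Module):
    # Hypothetical outline of the two-stage design; not the repository's code.
    def __init__(self, embed_dim=1024, num_heads=8, num_layers=5, num_classes=20):
        super().__init__()
        # (i) point 4D convolution: embeds each spatio-temporal local area
        # (an anchor plus its spatial/temporal neighbours) into one feature
        # vector. Stubbed here as an MLP over anchor coordinates (x, y, z, t).
        self.point4d_conv = nn.Sequential(
            nn.Linear(4, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # (ii) transformer: self-attention across all embedded local areas of
        # the whole video, merging related areas by attention weight.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, anchors):
        # anchors: (batch, frames * anchors_per_frame, 4) coordinates (x, y, z, t);
        # the real model also carries local point features, omitted here.
        tokens = self.point4d_conv(anchors)                # (B, L*N, C)
        tokens = self.transformer(tokens.transpose(0, 1))  # seq-first for older PyTorch
        return self.head(tokens.mean(dim=0))               # video-level prediction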

Installation

The code is tested with Red Hat Enterprise Linux Workstation release 7.7 (Maipo), g++ (GCC) 8.3.1, PyTorch (both v1.4.0 and v1.8.1 are supported), CUDA 10.2 and cuDNN v7.6.

Compile the CUDA layers for PointNet++, which we used for furthest point sampling (FPS) and radius neighbouring search:

mv modules-pytorch-1.4.0 modules    # for PyTorch 1.4.0; use modules-pytorch-1.8.1 for PyTorch 1.8.1
cd modules
python setup.py install
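
After installation, a quick sanity check (assuming the compiled extension is importable as pointnet2_utils, as in the votenet-style PointNet++ code linked under Related Repos; adjust the import if your layout differs):

import torch
import pointnet2_utils  # module name assumed from the votenet-style PointNet++ layout

# Run furthest point sampling on a random batch to confirm the CUDA ops work.
xyz = torch.rand(2, 2048, 3).cuda()                   # (batch, points, xyz)
idx = pointnet2_utils.furthest_point_sample(xyz, 64)  # indices of 64 anchors
print(idx.shape)                                      # expected: torch.Size([2, 64])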

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{fan21p4transformer,
  author    = {Hehe Fan and
               Yi Yang and
               Mohan Kankanhalli},
  title     = {Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos},
  booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, {CVPR}},
  year      = {2021}
}

Related Repos

  1. PointNet++ PyTorch implementation: https://github.com/facebookresearch/votenet/tree/master/pointnet2
  2. MeteorNet: https://github.com/xingyul/meteornet
  3. 3DV: https://github.com/3huo/3DV-Action
  4. PSTNet: https://github.com/hehefan/Point-Spatio-Temporal-Convolution
  5. Transformer: https://github.com/lucidrains/vit-pytorch
  6. PointRNN (TensorFlow implementation): https://github.com/hehefan/PointRNN
  7. PointRNN (PyTorch implementation): https://github.com/hehefan/PointRNN-PyTorch
Comments
  • Colab example?

    Hello, I have just discovered your project. How could I use it to run inference on a real-time feed?

    I am trying to feed a depth image or a point cloud to "some algorithm" that will fuse them and store the result in a PLY file. I think this is called point cloud registration? But it overlaps with SfM, where I would take N camera shots and store a point cloud, then take N+M shots and compare against the previous point cloud to see how much the new M shots contributed in terms of adding "relevant" points to the point cloud. Or maybe some variant of that with this project. (See the registration sketch below.)

    opened by Ademord 8
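
    This project does not do registration itself, but the incremental scan-merging loop described above could be prototyped with an off-the-shelf library. A minimal sketch using Open3D's point-to-point ICP (Open3D is an assumed choice here, and the file names are placeholders):

    import open3d as o3d

    # Minimal sketch of the registration step described above, using Open3D
    # (an assumed third-party choice; P4Transformer itself does no registration).
    # Align a new scan to the accumulated cloud with point-to-point ICP, then merge.
    accumulated = o3d.io.read_point_cloud("accumulated.ply")   # result of the first N shots
    new_scan = o3d.io.read_point_cloud("new_scan.ply")         # one of the next M shots

    result = o3d.pipelines.registration.registration_icp(
        new_scan, accumulated,
        0.05,                                                  # max correspondence distance; tune to your scale
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    new_scan.transform(result.transformation)

    merged = (accumulated + new_scan).voxel_down_sample(voxel_size=0.01)  # dedupe overlap
    o3d.io.write_point_cloud("accumulated.ply", merged)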
  • Transfer of points to nearest frames

    @hehefan

    Can you please explain what "transfer of points to nearest frames" means? How are the points transferred? (I understand the anchor points are picked using farthest point sampling. A toy interpretation is sketched below.)

    Thanks in advance.

    opened by sheshap 3
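
    For what it's worth, one reading of the paper (an interpretation, not a confirmed answer): an anchor sampled at (x, y, z) in frame t reuses the same spatial coordinates to ball-query neighbours in adjacent frames, so a spatio-temporal neighbourhood is gathered without any point tracking. A toy sketch:

    import torch

    # Toy sketch of "transferring" an anchor to nearby frames (interpretation only):
    # the anchor's (x, y, z) from frame t is reused to ball-query neighbours in
    # frames t-1, t, t+1, so no point correspondence/tracking is ever computed.
    frames = torch.rand(3, 2048, 3)     # point clouds for frames t-1, t, t+1
    anchor = frames[1, 0]               # an anchor from the middle frame
    radius, nsamples = 0.7, 32

    neighbourhood = []
    for frame in frames:
        dist = torch.norm(frame - anchor, dim=1)
        idx = torch.nonzero(dist < radius).flatten()[:nsamples]
        neighbourhood.append(frame[idx])  # this frame's neighbours of the anchor
    print([n.shape[0] for n in neighbourhood])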
  • About MSR

    Do I need to preprocess the MSR dataset before running, or can I use its depth files directly? If preprocessing is required, could you provide the preprocessing code?

    opened by weiyutao886 1
  • Including Color (RGB) or other features

    Hi, I'm just curious whether it's possible to add color information in the form of RGB channels, or even other point-wise features, as input to the model, and, if so, what modifications would be required to the model architecture. Is it possible/logical to just add extra channels, so that instead of passing xyz tensors we could feed the model stacked xyzrgb tensors? (One common pattern is sketched below.)

    Thanks.

    opened by ShadiZaki 1
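
    One common pattern (a generic sketch, not a statement about this repository's architecture): keep xyz for the geometric operations (FPS, ball query) and gather extra per-point features with the same neighbour indices, widening the embedding layer's input accordingly:

    import torch

    # Generic sketch: geometry (xyz) drives sampling/grouping; appearance (rgb)
    # is gathered with the same neighbour indices and concatenated afterwards,
    # so the embedding MLP's input width grows from 3 to 6 channels.
    xyz = torch.rand(2, 2048, 3)
    rgb = torch.rand(2, 2048, 3)
    neighbour_idx = torch.randint(0, 2048, (2, 64, 32))  # stand-in for ball-query output

    def gather(points, idx):
        # points: (B, N, C), idx: (B, A, K) -> grouped (B, A, K, C)
        batch = torch.arange(points.size(0)).view(-1, 1, 1)
        return points[batch, idx]

    grouped = torch.cat([gather(xyz, neighbour_idx),
                         gather(rgb, neighbour_idx)], dim=-1)
    print(grouped.shape)  # torch.Size([2, 64, 32, 6])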
  • Update module installation

    To address issue #11, where the pointnet2 installation's imports error out, we updated the pointnet2_utils.py file for both the PyTorch 1.4.0 and 1.8.1 modules to import from the correct module.

    opened by smellslikeml 0
  • Errors occur while running python setup.py install

    After downloading this repository, I followed the README and ran python setup.py install, but an error occurs (screenshot attached).

    My current CUDA version is 11.6 and my PyTorch version is 1.11.3. I wonder if the above problem can be fixed by just installing PyTorch 1.8.1 in a new conda virtual environment. Looking forward to your help, thanks a lot! (A quick environment check is sketched below.)

    opened by Carbord 0
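
    Mismatched CUDA/PyTorch versions are the usual cause of this kind of build failure. Before rebuilding under the tested versions, it can help to print what the environment actually has:

    import torch

    # Quick environment check: the repo is tested with PyTorch 1.4.0/1.8.1 and
    # CUDA 10.2; the toolkit used by nvcc should match torch.version.cuda.
    print(torch.__version__)          # PyTorch version
    print(torch.version.cuda)         # CUDA version PyTorch was built against
    print(torch.cuda.is_available())  # True if a usable GPU is visible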
  • Pre-processing on synthia4D

    Hi, I am wondering why you cut each frame into two pieces and use the loss of each piece to update the model when training on Synthia 4D. Is there a reason for this? Specifically, I don't understand the purpose of the function half_crop_w_context in datasets.synthia.

    opened by dyh-Jack 1
  • Problems about NTU RGBD dataset pre-process

    I used the preprocessor from the 3DV work, but it seems to transform the data into .npy files instead of .npz files. Could you provide the preprocessing code for NTU RGB+D? Much appreciated. (A stopgap conversion is sketched below.)

    opened by yxc21ssucb 1
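
    As a stopgap while waiting for official preprocessing code, .npy files can be repackaged as .npz; note the directory and the archive key "point_clouds" below are guesses and must match whatever this repo's NTU loader actually reads:

    import glob
    import os
    import numpy as np

    # Repackage 3DV-style .npy outputs as .npz. The directory and the key
    # ("point_clouds") are assumptions; match them to the dataset loader.
    for path in glob.glob("ntu_processed/*.npy"):
        data = np.load(path)
        np.savez_compressed(os.path.splitext(path)[0] + ".npz", point_clouds=data)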
  • Problems about results

    Hi, I tried to run the code to obtain a baseline result. I didn't change the parameters, and set clip len = 24 and batch size = 14, but I only got 89.55 accuracy after training for 100 epochs, while the paper reports 90.94. I don't know why the results differ. I have uploaded my training log; I hope you can suggest some possible reasons. out_P4.txt

    opened by dyh-Jack 7
  • Visualize the transformer's attention

    I want to visualize the transformer's attention. I see that Fig. 4 in your paper visualizes it. Can you tell me where and how this is done? Could you share the visualization code? Thank you. (A generic recipe is sketched below.)

    opened by weiyutao886 15
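
    Pending the authors' code, a generic recipe (not the repository's implementation): run the tokens through an attention layer that returns its weights, then plot the weight matrix. For P4Transformer each token corresponds to an embedded (x, y, z, t) local area, so a row of the map can be scattered back onto the anchor coordinates.

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    # Generic attention-visualization sketch (requires PyTorch >= 1.9 for
    # batch_first); nn.MultiheadAttention stands in for the model's attention.
    attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
    tokens = torch.rand(1, 128, 64)  # 128 embedded spatio-temporal local areas
    _, weights = attn(tokens, tokens, tokens, need_weights=True)  # (1, 128, 128)

    plt.imshow(weights[0].detach().numpy(), cmap="viridis")
    plt.xlabel("attended local area")
    plt.ylabel("query local area")
    plt.title("self-attention weights")
    plt.savefig("attention.png")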
  • Intuition behind choice of input points, ball query radius, nsamples and spatial-stride

    Dear @hehefan ,

    From the train-msr.py file, I see the default values: input points = 2048, ball query radius = 0.7, nsamples = 32, spatial stride = 32.

    On a given point cloud frame that contains 2048 points, a (much smaller) set of 32 anchor points is selected by farthest point sampling (FPS). Around each of them, a ball with the (large) radius of 0.7 is used to query 32 points (a very small number).

    I have a few questions to understand the setting.

    1. Wouldn't the 32 queried points around each farthest point be very close to it, since the overall object contains 2048 points (dense)?
    2. What is the intuition behind choosing a large radius but querying only 32 points at each FPS point?

    I suspect that, with 2048 input points but only 32 points queried around each of the 32 farthest points, the points considered from a given frame at a time are limited to 32 × 32 = 1024 points, and these points are clusters of 32 around each of the 32 farthest points due to the ball query. (An empirical coverage check is sketched after this issue.)

    Please help me understand the intuition behind the design.

    Thanks in advance.

    opened by sheshap 3
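
    The coverage concern in the question above can be checked empirically. In the sketch below, random sampling stands in for FPS, the numbers (2048 points, radius 0.7, nsamples 32) are the defaults quoted above, and the anchor count assumes spatial stride is a downsampling factor (2048 / 32 = 64 anchors); the question reads it as 32 anchors, so both interpretations are worth trying:

    import numpy as np

    # Empirically count how many of the 2048 input points land in at least one
    # query ball. Random anchor selection stands in for FPS; try num_anchors=32
    # as well, matching the reading in the question above.
    rng = np.random.default_rng(0)
    points = rng.uniform(-1.0, 1.0, size=(2048, 3))   # a toy frame
    num_anchors = 2048 // 32                          # if spatial stride 32 means 32x downsampling
    anchors = points[rng.choice(2048, size=num_anchors, replace=False)]

    covered = set()
    for a in anchors:
        dist = np.linalg.norm(points - a, axis=1)
        in_ball = np.flatnonzero(dist < 0.7)[:32]     # ball query keeps at most nsamples=32
        covered.update(in_ball.tolist())
    print(f"{len(covered)} / {len(points)} points covered")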
Owner
Hehe Fan
Research fellow at the National University of Singapore.