Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

Overview

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Introduction

Point cloud videos exhibit irregularities and lack of order along the spatial dimension where points emerge inconsistently across different frames. To capture the dynamics in point cloud videos, point tracking is usually employed. However, as points may flow in and out across frames, computing accurate point trajectories is extremely difficult. Moreover, tracking usually relies on point colors and thus may fail to handle colorless point clouds. In this paper, to avoid point tracking, we propose a novel Point 4D Transformer (P4Transformer) network to model raw point cloud videos. Specifically, P4Transformer consists of (i) a point 4D convolution to embed the spatio-temporal local structures presented in a point cloud video and (ii) a transformer to capture the appearance and motion information across the entire video by performing self-attention on the embedded local features. In this fashion, related or similar local areas are merged with attention weight rather than by explicit tracking.
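To make the idea of merging local areas "with attention weight" concrete, here is a minimal pure-Python sketch of single-head self-attention over a handful of embedded local features. It is an illustration under simplifying assumptions, not the network in this repository: the query/key/value projections are taken to be the identity, there is no positional or temporal embedding, and the names `softmax`, `self_attention` and `local_features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head self-attention with identity Q/K/V projections (an assumption).

    features: list of N vectors (lists of D floats), one per embedded local area.
    Returns N merged vectors of the same dimension.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    merged = []
    for query in features:
        # Scaled dot-product similarity between this area and every area.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Weighted sum of all areas: similar areas contribute more.
        merged.append([
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ])
    return merged


# Three toy embeddings: the first two are similar, the third is different.
local_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
merged_features = self_attention(local_features)
```

Each output vector is a weighted average of all inputs, so the first area is dominated by itself and its similar neighbour without any explicit correspondence between points being computed.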

Installation

The code is tested with Red Hat Enterprise Linux Workstation release 7.7 (Maipo), g++ (GCC) 8.3.1, PyTorch (both v1.4.0 and v1.8.1 are supported), CUDA 10.2 and cuDNN v7.6.

Compile the CUDA layers for PointNet++, which we used for furthest point sampling (FPS) and radius neighbouring search:

mv modules-pytorch-1.8.1 modules   # for PyTorch 1.4.0, use modules-pytorch-1.4.0 instead
cd modules
python setup.py install

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{fan21p4transformer,
  author    = {Hehe Fan and
               Yi Yang and
               Mohan Kankanhalli},
  title     = {Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos},
  booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, {CVPR}},
  year      = {2021}
}

Related Repos

  1. PointNet++ PyTorch implementation: https://github.com/facebookresearch/votenet/tree/master/pointnet2
  2. MeteorNet: https://github.com/xingyul/meteornet
  3. 3DV: https://github.com/3huo/3DV-Action
  4. PSTNet: https://github.com/hehefan/Point-Spatio-Temporal-Convolution
  5. Transformer: https://github.com/lucidrains/vit-pytorch
  6. PointRNN (TensorFlow implementation): https://github.com/hehefan/PointRNN
  7. PointRNN (PyTorch implementation): https://github.com/hehefan/PointRNN-PyTorch

Issues

  • collab example?

    Hello, I have just discovered your project. How could I use it to do inference on a real-time feed?

    I am trying to feed a depth image or a point cloud to "some algorithm" that will put them together and store them in a PLY file. I think this is called point cloud registration, but it overlaps with SfM, where I would take N camera shots and store a point cloud, then take N+M shots and compare with the previous point cloud to see how much the new M steps contributed to adding "relevant" points to the point cloud. Or some variant of that, maybe with this project.

    opened by Ademord 8
  • Transfer of points to nearest frames

    @hehefan

    Can you please explain what "transfer of points to nearest frames" means? How are the points transferred? (I understand that the anchor points are picked using farthest point sampling.)

    Thanks in Advance.

    opened by sheshap 3
  • Including Color (RGB) or other features

    Hi, I'm just curious whether it's possible to add color information in the form of RGB channels, or even other point-wise features, as input to the model and, if so, what modifications to the model architecture would be required. Is it possible/logical to just add extra channels so that, instead of passing xyz tensors, we feed the model stacked xyzrgb tensors?

    Thanks.

    opened by ShadiZaki 1
  • visualize transformer's attention

    I want to visualize the transformer's attention. I see that Fig. 4 in your paper visualizes it. Can you tell me where and how to do this? Can you share the visualization code? Thank you.

    opened by weiyutao886 5
  • Intuition behind choice of input points, ball query radius, nsamples and spatial-stride

    Dear @hehefan ,

    From the train-msr.py file, I see the default values: input points = 2048, ball query radius = 0.7, nsamples = 32, spatial-stride = 32.

    On a given point cloud frame that contains 2048 points, 32 points are sampled with farthest point sampling (FPS), which is a small number. Around each of them, a ball with a radius of 0.7 (large) is used to query 32 points (a very small number).

    I have a few questions to understand the setting.

    1. Wouldn't the queried points (32) around the farthest point be very close to it, since the overall object contains 2048 points (dense)?
    2. What is the intuition behind choosing a bigger radius but querying only 32 points at each FPS point?

    I suspect that, although there are 2048 input points, querying only 32 points around each of the 32 farthest points limits the points considered from a given frame to 32x32, i.e., 1024 points. These points form clusters of 32 points around each of the 32 farthest points due to ball querying.

    Please help me understand the intuition behind the design.

    Thanks in advance.

    opened by sheshap 3
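The two operations discussed in this issue, farthest point sampling followed by a ball query, can be sketched in plain Python as below. This is not the repository's CUDA implementation: the function names, the toy 8x8 grid and the parameter values are illustrative assumptions, and keeping the first `nsamples` hits in index order (rather than the nearest ones) is also an assumption about how such a query behaves.

```python
import math


def farthest_point_sampling(points, num_anchors):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    chosen = [0]  # start from the first point (an arbitrary choice)
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < num_anchors:
        # Index of the point with the largest distance to its nearest chosen anchor.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def ball_query(points, anchor, radius, nsamples):
    """Indices of at most nsamples points within radius of anchor, in index order."""
    inside = [i for i, p in enumerate(points) if math.dist(p, anchor) <= radius]
    return inside[:nsamples]


# Toy frame: 64 points on a flat 8x8 grid (hypothetical data).
points = [(float(x), float(y), 0.0) for x in range(8) for y in range(8)]
anchors = farthest_point_sampling(points, num_anchors=4)
groups = [ball_query(points, points[a], radius=2.0, nsamples=5) for a in anchors]
```

In this toy frame each ball contains more points than `nsamples`, so every group is truncated. Under the index-order assumption, the points kept are therefore not necessarily the ones closest to the anchor.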
  • Dataloader Error:_pickle.UnpicklingError: invalid load key, '\x27'.

    Thanks a lot for your code! When I try to run train-msr.py with the MSR-Action3D dataset, I get this error. It seems that something goes wrong while loading the data, but I have no idea why. PS: I use the data in MSR-Action3D/Depth.rar.

    The output is as follows:

    Traceback (most recent call last):
      File "/home/yuhao/anaconda3/envs/python38/lib/python3.8/site-packages/numpy/lib/npyio.py", line 447, in load
        return pickle.load(fid, **pickle_kwargs)
    _pickle.UnpicklingError: invalid load key, '\x27'.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train-msr.py", line 257, in <module>
        main(args)
      File "train-msr.py", line 127, in main
        dataset = MSRAction3D(
      File "/home/yuhao/P4Transformer-main/P4Transformer-main/datasets/msr.py", line 16, in __init__
        video = np.load(os.path.join(root, video_name), allow_pickle=True)['point_clouds']
      File "/home/yuhao/anaconda3/envs/python38/lib/python3.8/site-packages/numpy/lib/npyio.py", line 449, in load
        raise IOError(
    OSError: Failed to interpret file '/home/yuhao/P4Transformer-main/P4Transformer-main/Depth/a08_s04_e03_sdepth.bin' as a pickle

    Looking forward to your help!

    opened by dyh-Jack 3
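The traceback in the last issue shows the loader indexing the result of `np.load` with `['point_clouds']`, which suggests it expects an .npz archive, while the file being opened is a raw .bin depth file. Since .npz files are zip archives, a quick standard-library check along the following lines can tell the two apart before training starts. This is a hedged sketch: the helper name `looks_like_npz` and the throwaway file names are hypothetical, and whether the raw data needs a separate conversion step is an assumption not confirmed here.

```python
import os
import tempfile
import zipfile


def looks_like_npz(path):
    """Return True if path ends with .npz and is a zip archive."""
    return path.endswith(".npz") and zipfile.is_zipfile(path)


# Demonstration with two throwaway files (hypothetical names and contents).
tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "example_sdepth.bin")
npz_path = os.path.join(tmp_dir, "example.npz")

with open(raw_path, "wb") as f:
    f.write(b"\x27raw depth bytes")  # arbitrary bytes, not a zip archive

with zipfile.ZipFile(npz_path, "w") as z:
    z.writestr("point_clouds.npy", b"")  # placeholder member, not real array data

raw_ok = looks_like_npz(raw_path)
npz_ok = looks_like_npz(npz_path)
```

Running such a check over the dataset directory would flag files like the .bin file in the traceback before the data loader tries to read them.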

Owner

Hehe Fan
Research fellow at the National University of Singapore.