HAIS_2GNN: 3D Visual Grounding with Graph and Attention

Yue Chen

Last update: Nov 26, 2022

Related tags

Text Data & NLP nlp machine-learning deep-learning 3d visual-grounding gnn scanrefer instancerefer

Overview

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

This repository is for the HAIS_2GNN research project.

Tao Gu, Yue Chen

Introduction

The motivation of this project is to improve the accuracy of 3D visual grounding. In this report, we propose a new model, named HAIS_2GNN based on the InstanceRefer model, to tackle the problem of insufficient connections between instance proposals. Our model incorporates a powerful instance segmentation model HAIS and strengthens the instance features by the structure of graph and attention, so that the text and point cloud can be better matched together. Experiments confirm that our method outperforms the InstanceRefer on ScanRefer validation datasets. Link to the technical report

Setup

The code is tested on Ubuntu 20.04.3 LTS with Python 3.9.7 PyTorch 1.10.1 CUDA 11.3.1 installed.

conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch

Install the necessary packages listed out in requirements.txt:

pip install -r requirements.txt

After all packages are properly installed, please run the following commands to compile the torchsaprse v1.4.0:

sudo apt-get install libsparsehash-dev
pip install --upgrade git+https://github.com/mit-han-lab/[email protected]

Before moving on to the next step, please don't forget to set the project root path to the CONF.PATH.BASE in lib/config.py.

Data preparation

Download the ScanRefer dataset and unzip it under data/.
Downloadand the preprocessed GLoVE embeddings (~990MB) and put them under data/.
Download the ScanNetV2 dataset and put (or link) scans/ under (or to) data/scannet/scans/ (Please follow the ScanNet Instructions for downloading the ScanNet dataset). After this step, there should be folders containing the ScanNet scene data under the data/scannet/scans/ with names like scene0000_00
Used official and pre-trained HAIS generate panoptic segmentation in PointGroupInst/. We will provide the pre-trained data soon.
Pre-processed instance labels, and new data should be generated in data/scannet/pointgroup_data/

cd data/scannet/
python prepare_data.py --split train --pointgroupinst_path [YOUR_PATH]
python prepare_data.py --split val   --pointgroupinst_path [YOUR_PATH]
python prepare_data.py --split test  --pointgroupinst_path [YOUR_PATH]

Finally, the dataset folder should be organized as follows.

InstanceRefer
├── data
│   ├── glove.p
│   ├── ScanRefer_filtered.json
│   ├── ...
│   ├── scannet
│   │  ├── meta_data
│   │  ├── pointgroup_data
│   │  │  ├── scene0000_00_aligned_bbox.npy
│   │  │  ├── scene0000_00_aligned_vert.npy
│   │  ├──├──  ... ...

Training

Train the InstanceRefer model. You can change hyper-parameters in config/InstanceRefer.yaml:

python scripts/train.py --log_dir HAIS_2GNN

Evaluation

You need specific the use_checkpoint with the folder that contains model.pth in config/InstanceRefer.yaml and run with:

python scripts/eval.py

Pre-trained Models

Input	[email protected] Unique	[email protected]	Checkpoints
xyz+rgb	39.24	33.66	will be released soon

TODO

Add pre-trained HAIS dataset.
Release pre-trained model.
Merge HAIS in an end-to-end manner.
Upload to ScanRefer benchmark

Changelog

02/09/2022: Released HAIS_2GNN

Acknowledgement

This work is a research project conducted by Tao Gu and Yue Chen for ADL4CV:Visual Computing course at the Technical University of Munich.

We acknowledge that our work is based on ScanRefer, InstanceRefer, HAIS, torchsaprse, and pytorch_geometric.

License

This repository is released under MIT License (see LICENSE file for details).

You might also like...

Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

ITTR - Pytorch Implementation of the Hybrid Perception Block (HPB) and Dual-Pruned Self-Attention (DPSA) block from the ITTR paper for Image to Image

17 Dec 23, 2022

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

364 Jan 6, 2023

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

490 Dec 15, 2022

pytorch implementation of Attention is all you need

A Pytorch Implementation of the Transformer: Attention Is All You Need Our implementation is largely based on Tensorflow implementation Requirements N

230 Dec 7, 2022

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish V

7.1k Jan 5, 2023

multi-label，classifier，text classification，多标签文本分类，文本分类，BERT，ALBERT，multi-label-classification，seq2seq，attention，beam search

30 Dec 12, 2022

Modified GPT using average pooling to reduce the softmax attention memory constraints.

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

Related tags

Overview

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

Introduction

Setup

Data preparation

Training

Evaluation

Pre-trained Models

TODO

Changelog

Acknowledgement

License

You might also like...

Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

pytorch implementation of Attention is all you need

A PyTorch implementation of the Transformer model in "Attention is All You Need".

multi-label，classifier，text classification，多标签文本分类，文本分类，BERT，ALBERT，multi-label-classification，seq2seq，attention，beam search

Modified GPT using average pooling to reduce the softmax attention memory constraints.

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

Owner

Yue Chen

[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.

Learning Spatio-Temporal Transformer for Visual Tracking

A simple visual front end to the Maya UE4 RBF plugin delivered with MetaHumans

TalkNet: Audio-visual active speaker detection Model

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

A Flask Sentiment Analysis API, with visual implementation

Intent parsing and slot filling in PyTorch with seq2seq + attention

Seq2seq attn - Use the Seq2Seq method to implement machine translation and introduce Attention mechanism to improve the results