SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

Zhengyuan Yang

Last update: Nov 30, 2022

Overview

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

by Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2021, Oral

Introduction

We propose 2D Semantics Assisted Training (SAT) that assists 3D visual grounding with 2D semantics. SAT helps 3D tasks with 2D semantics in training but does not require 2D inputs during inference. For more details, please refer to our paper.

Citation

@inproceedings{yang2021sat,
  title={SAT: 2D Semantics Assisted Training for 3D Visual Grounding},
  author={Yang, Zhengyuan and Zhang, Songyang and Wang, Liwei and Luo, Jiebo},
  booktitle={ICCV},
  year={2021}
}

Prerequisites

Python 3.6.9 (e.g., conda create -n sat_env python=3.6.9 cudatoolkit=10.0)
PyTorch 1.2.0 (e.g., conda install pytorch==1.2.0 cudatoolkit=10.0 -c pytorch)
Install other common packages (numpy, pytorch_transformers, etc.).
Please refer to setup.py (from ReferIt3D).
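
To confirm that an environment matches the versions listed above, a small check like the following can help. This is a minimal sketch, not part of the repository: `parse_version` and `matches` are hypothetical helper names introduced here for illustration.

```python
import re
import sys


def parse_version(text):
    """Turn '1.2.0' or '1.2.0+cu100' into a tuple of ints such as (1, 2, 0)."""
    parts = []
    # Drop local build tags such as '+cu100' before splitting on dots.
    for piece in text.split("+")[0].split("."):
        leading_digits = re.match(r"\d+", piece)
        if leading_digits is None:
            break
        parts.append(int(leading_digits.group()))
    return tuple(parts)


def matches(found, wanted):
    """True when `found` starts with every component of `wanted`."""
    found_parts = parse_version(found)
    wanted_parts = parse_version(wanted)
    return found_parts[:len(wanted_parts)] == wanted_parts


# Compare the running interpreter against the 3.6 series named above.
python_found = "{}.{}.{}".format(*sys.version_info[:3])
status = "ok" if matches(python_found, "3.6") else "differs from 3.6.x"
print("python", python_found, status)
```

The same call can be applied to PyTorch once it is installed, for example `matches(torch.__version__, "1.2.0")`.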

Installation

Clone the repository

git clone https://github.com/zyang-ur/SAT.git
cd SAT
pip install -e .

To use a PointNet++ visual encoder, you need to compile its CUDA layers. Note: this compilation also requires gcc 5.4 or later.
```
cd external_tools/pointnet2
python setup.py install
```
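
Before compiling, it may be worth checking that the local gcc meets the 5.4 requirement. The sketch below assumes `gcc` is on the PATH and relies on `gcc -dumpversion`, which prints either a full version (such as 5.4.0) or only the major number on newer builds. The helper names are hypothetical and not part of the repository.

```python
import re
import shutil
import subprocess


def parse_gcc_version(text):
    """Return (major, minor) from `gcc -dumpversion` output; minor defaults to 0."""
    numbers = [int(n) for n in re.findall(r"\d+", text.strip())[:2]]
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)


def gcc_is_new_enough(version_text, minimum=(5, 4)):
    """Compare the parsed version against the minimum stated in the README."""
    return parse_gcc_version(version_text) >= minimum


def installed_gcc_version():
    """Return the raw `gcc -dumpversion` output, or None if gcc is unavailable."""
    if shutil.which("gcc") is None:
        return None
    try:
        # stdout=PIPE and universal_newlines keep this compatible with Python 3.6.
        result = subprocess.run(
            ["gcc", "-dumpversion"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout
```

A typical use is `found = installed_gcc_version()` followed by `gcc_is_new_enough(found)` when `found` is not None.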

Data

ScanNet

First, download the train/val scans of ScanNet if you do not have them locally. Please refer to the instructions from ReferIt3D for more details. The output is the scan file keep_all_points_00_view_with_global_scan_alignment.pkl (used for Nr3D) or keep_all_points_with_global_scan_alignment.pkl (used when Sr3D is included).
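
Choosing between the two scan files can be sketched as follows. The mapping (the `00_view` file for Nr3D, the other file when Sr3D is included) follows the comment in the training command; `pick_scanfile`, `describe`, and the `data_dir` argument are hypothetical names used only for illustration.

```python
import os

NR3D_SCANFILE = "keep_all_points_00_view_with_global_scan_alignment.pkl"
SR3D_SCANFILE = "keep_all_points_with_global_scan_alignment.pkl"


def pick_scanfile(data_dir, include_sr3d=False):
    """Return the path of the scan file matching the chosen dataset."""
    name = SR3D_SCANFILE if include_sr3d else NR3D_SCANFILE
    return os.path.join(data_dir, name)


def describe(path):
    """Report whether the scan file exists and, if so, how large it is."""
    if not os.path.isfile(path):
        return "missing: " + path
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return "found: {} ({:.1f} MB)".format(path, size_mb)


# `data_dir` here is a placeholder; point it at the preprocessing output folder.
print(describe(pick_scanfile("data", include_sr3d=False)))
```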

Ref3D Linguistic Data

You can download Nr3D and Sr3D/Sr3D+ from ReferIt3D, and pass the file path via the -referit3D-file argument.
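
A quick sanity check of a downloaded file before passing it to -referit3D-file might look like this. It is only a sketch: it assumes the file is a comma-separated CSV with a header row, makes no assumption about column names, and `summarize_csv` is a hypothetical helper.

```python
import csv


def summarize_csv(path, preview=3):
    """Return the header, the row count, and the first few rows of a CSV file."""
    rows = []
    count = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        for row in reader:
            if count < preview:
                rows.append(row)
            count += 1
    return {"columns": header, "num_rows": count, "preview": rows}
```

For example, `summarize_csv("nr3d.csv")` (file name assumed) returns a dictionary whose `columns` entry lists whatever header the file actually contains.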

SAT Processed 2D Features

You can download the processed 2D object image features from here. The cached feature should be placed under the referit3d/data folder, or match the cache path in the dataloader. The feature extraction code will be cleaned and released in the future. Meanwhile, feel free to contact me if you need that before the official release.

Training

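Please refer to the following example command on Nr3D. Feel free to change the parameters; see the arguments file for valid options. The command assumes that the shell variables nr3dfile_csv (path to the Nr3D CSV file) and exp_id (experiment name) are already set.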

cd referit3d/scripts
scanfile=keep_all_points_00_view_with_global_scan_alignment.pkl  # use keep_all_points_with_global_scan_alignment.pkl if Sr3D is included
python train_referit3d.py --patience 100 --max-train-epochs 100 --init-lr 1e-4 --batch-size 16 --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile_csv --log-dir log/$exp_id --n-workers 2 --gpu 0 --unit-sphere-norm True --feat2d clsvecROI --context_2d unaligned --mmt_mask train2d --warmup
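
When launching several runs, the command above can be assembled programmatically. The sketch below only reproduces the flags shown in the example command; `build_train_command` and the sample file names are hypothetical and not part of the repository.

```python
import shlex


def build_train_command(scanfile, referit3d_file, exp_id, gpu="0",
                        batch_size=16, init_lr="1e-4", max_epochs=100):
    """Assemble the example training command as an argument list."""
    return [
        "python", "train_referit3d.py",
        "--patience", "100",
        "--max-train-epochs", str(max_epochs),
        "--init-lr", str(init_lr),
        "--batch-size", str(batch_size),
        "--transformer",
        "--model", "mmt_referIt3DNet",
        "-scannet-file", scanfile,
        "-referit3D-file", referit3d_file,
        "--log-dir", "log/" + exp_id,
        "--n-workers", "2",
        "--gpu", str(gpu),
        "--unit-sphere-norm", "True",
        "--feat2d", "clsvecROI",
        "--context_2d", "unaligned",
        "--mmt_mask", "train2d",
        "--warmup",
    ]


# Placeholder paths and experiment name, chosen only for this illustration.
command = build_train_command("scans.pkl", "nr3d.csv", "sat_nr3d")
print(" ".join(shlex.quote(part) for part in command))
```

Passing the list to `subprocess.run(command, check=True)` from inside referit3d/scripts would start the run without going through shell variable expansion.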

Evaluation

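Please find the pretrained models here (clsvecROI on Nr3D). A known issue is described here. The command below assumes that the shell variables scanfile, nr3dfile, exp_id, gpu, and pretrain_path are already set; note that the training example names the Nr3D file variable nr3dfile_csv instead.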

cd referit3d/scripts
python train_referit3d.py --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile --log-dir log/$exp_id --n-workers 2 --gpu $gpu --unit-sphere-norm True --feat2d clsvecROI --mode evaluate --pretrain-path $pretrain_path/best_model.pth
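
Since the evaluation command expects best_model.pth inside the pretrained-model folder, checking for the file first avoids a late failure. This is a minimal sketch; `evaluation_flags` is a hypothetical helper that only returns the two evaluation-specific flags from the command above.

```python
import os


def evaluation_flags(pretrain_dir):
    """Return the evaluation-only flags after confirming the checkpoint exists."""
    checkpoint = os.path.join(pretrain_dir, "best_model.pth")
    if not os.path.isfile(checkpoint):
        raise FileNotFoundError("no checkpoint at " + checkpoint)
    return ["--mode", "evaluate", "--pretrain-path", checkpoint]
```

The returned list can be appended to the remaining arguments of the evaluation command.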

Credits

The project is built on the following repository:

ReferIt3D.

Some of the code or models are from ScanRef, MMF, TAP, and ViLBERT.

Comments

Feature Extraction Code for the 2D part

Dear authors,

Thanks for sharing your amazing work. I am wondering whether you will share the feature extraction code for the 2D part. I was planning to contact you to ask for it, but unfortunately the contact info is not working.

Thanks in advance.

opened by eslambakr 0

How to train and evaluate on ScanRef dataset?

Thanks for your great work and code! The process for setting up ReferIt3D is explained clearly in the README, but I cannot find how to use the code with the ScanRef dataset. Am I missing something you have already mentioned?

opened by SxJyJay 0

Cannot reproduce the results in the paper

I ran the training code with the command provided in the README, but the results are lower than those in the paper.

| Accuracy | SAT | Reproduce |
| :-------: | :--: | :-------: |
| Nr3d | 49.2 | 46.0 |
| ScanRefer | 53.8 | 51.4 |

opened by CurryYuan 2



