SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

Related tags

Deep Learning SAT
Overview

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

by Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2021, Oral

Introduction

We propose 2D Semantics Assisted Training (SAT) that assists 3D visual grounding with 2D semantics. SAT helps 3D tasks with 2D semantics in training but does not require 2D inputs during inference. For more details, please refer to our paper.

Citation

@inproceedings{yang2021sat,
  title={SAT: 2D Semantics Assisted Training for 3D Visual Grounding},
  author={Yang, Zhengyuan and Zhang, Songyang and Wang, Liwei and Luo, Jiebo},
  booktitle={ICCV},
  year={2021}
}

Prerequisites

  • Python 3.6.9 (e.g., conda create -n sat_env python=3.6.9 cudatoolkit=10.0)
  • Pytorch 1.2.0 (e.g., conda install pytorch==1.2.0 cudatoolkit=10.0 -c pytorch)
  • Install other common packages (numpy, pytorch_transformers, etc.)
  • Please refer to setup.py (From ReferIt3D).

Installation

  • Clone the repository

    git clone https://github.com/zyang-ur/SAT.git
    cd SAT
    pip install -e .
    
  • To use a PointNet++ visual-encoder you need to compile its CUDA layers for PointNet++: Note: To do this compilation also need: gcc5.4 or later.

    cd external_tools/pointnet2
    python setup.py install
    

Data

ScanNet

First you should download the train/val scans of ScanNet if you do not have them locally. Please refer to the instructions from referit3d for more details. The output is the scanfile keep_all_points_00_view_with_global_scan_alignment.pkl/keep_all_points_with_global_scan_alignment.pkl.

Ref3D Linguistic Data

You can dowload the Nr3D and Sr3D/Sr3D+ from Referit3D, and send the file path to referit3D-file.

SAT Processed 2D Features

You can download the processed 2D object image features from here. The cached feature should be placed under the referit3d/data folder, or match the cache path in the dataloader. The feature extraction code will be cleaned and released in the future. Meanwhile, feel free to contact me if you need that before the official release.

Training

Please reference the following example command on Nr3D. Feel free to change the parameters. Please reference arguments for valid options.

cd referit3d/scripts
scanfile=keep_all_points_00_view_with_global_scan_alignment.pkl ## keep_all_points_with_global_scan_alignment if include Sr3D
python train_referit3d.py --patience 100 --max-train-epochs 100 --init-lr 1e-4 --batch-size 16 --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile_csv --log-dir log/$exp_id --n-workers 2 --gpu 0 --unit-sphere-norm True --feat2d clsvecROI --context_2d unaligned --mmt_mask train2d --warmup

Evaluation

Please find the pretrained models here (clsvecROI on Nr3D). A known issue here.

cd referit3d/scripts
python train_referit3d.py --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile --log-dir log/$exp_id --n-workers 2 --gpu $gpu --unit-sphere-norm True --feat2d clsvecROI --mode evaluate --pretrain-path $pretrain_path/best_model.pth

Credits

The project is built based on the following repository:

Part of the code or models are from ScanRef, MMF, TAP, and ViLBERT.

You might also like...
Code for the USENIX 2017 paper: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels Blazing fast x86-64 VM kernel fuzzing framework with performant VM reloads for Linux, MacOS an

Meta graph convolutional neural network-assisted resilient swarm communications
Meta graph convolutional neural network-assisted resilient swarm communications

Resilient UAV Swarm Communications with Graph Convolutional Neural Network This repository contains the source codes of Resilient UAV Swarm Communicat

This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

JigsawClustering Jigsaw Clustering for Unsupervised Visual Representation Learning Pengguang Chen, Shu Liu, Jiaya Jia Introduction This project provid

Code for
Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Human Pose Regression with Residual Log-likelihood Estimation [Paper] [arXiv] [Project Page] Human Pose Regression with Residual Log-likelihood Estima

Improving Contrastive Learning by Visualizing Feature Transformation, ICCV 2021 Oral
Improving Contrastive Learning by Visualizing Feature Transformation, ICCV 2021 Oral

Improving Contrastive Learning by Visualizing Feature Transformation This project hosts the codes, models and visualization tools for the paper: Impro

BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)

BARF 🤮 : Bundle-Adjusting Neural Radiance Fields Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey IEEE International Conference on Comp

[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers Created by Xumin Yu*, Yongming Rao*, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou

Code for the ICCV 2021 paper
Code for the ICCV 2021 paper "Pixel Difference Networks for Efficient Edge Detection" (Oral).

Pixel Difference Convolution This repository contains the PyTorch implementation for "Pixel Difference Networks for Efficient Edge Detection" by Zhuo

[ICCV 2021 Oral] NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
[ICCV 2021 Oral] NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

NerfingMVS Project Page | Paper | Video | Data NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo Yi Wei, Shaohui

Comments
  • Feature Extraction Code for the 2D part

    Feature Extraction Code for the 2D part

    Dears,

    Thanks for sharing your amazing work. I am just wondering if you will share the Feature Extraction Code for the 2D part? I was planning to send to you asking for them but unfortunately, the contact info is not working.

    Thanks in advance.

    opened by eslambakr 0
  • How to train and evaluate on ScanRef dataset?

    How to train and evaluate on ScanRef dataset?

    Thanks for your great job and code! The specific process of equipping the ReferIt3d has been explicitly explained in the "readme", however, I don't find how to equip the code with ScanRef dataset. Am I missing something you actually have mentioned?

    opened by SxJyJay 0
  • Cannot reproduce the results in the paper

    Cannot reproduce the results in the paper

    I run the training code with command provided in README, the results are lower than that in the paper.

    | Accuracy | SAT | Reproduce | | :-------: | :--: | :-------: | | Nr3d | 49.2 | 46.0 | | ScanRefer | 53.8 | 51.4 |

    opened by CurryYuan 2
Owner
Zhengyuan Yang
Zhengyuan Yang
The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation This repository is the official implementation of CVPR 2021 paper:

null 9 Nov 14, 2022
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

TubeDETR: Spatio-Temporal Video Grounding with Transformers Website • STVG Demo • Paper This repository provides the code for our paper. This includes

Antoine Yang 108 Dec 27, 2022
A SAT-based sudoku solver

SAT Sudoku solver A SAT-based Sudoku solver made in the context of a small project in the "Logic Problem Solving" class in the first year at the Polyt

Alexandre Malfreyt 5 Apr 15, 2022
[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.

LBYL-Net This repo implements paper Look Before You Leap: Learning Landmark Features For One-Stage Visual Grounding CVPR 2021. Getting Started Prerequ

SVIP Lab 45 Dec 12, 2022
[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

3DVG-Transformer This repository is for the ICCV 2021 paper "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds" Our method "3DV

null 22 Dec 11, 2022
SeqTR: A Simple yet Universal Network for Visual Grounding

SeqTR This is the official implementation of SeqTR: A Simple yet Universal Network for Visual Grounding, which simplifies and unifies the modelling fo

seanZhuh 76 Dec 24, 2022
Implementation for "Seamless Manga Inpainting with Semantics Awareness" (SIGGRAPH 2021 issue)

Seamless Manga Inpainting with Semantics Awareness [SIGGRAPH 2021](To appear) | Project Website | BibTex Introduction: Manga inpainting fills up the d

null 101 Jan 1, 2023
This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories This repo is the code release of EMNLP 2021 con

null 12 Nov 22, 2022
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021]

piglet PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021] This repo contains code and data for PIGLeT. If you like

Rowan Zellers 51 Oct 8, 2022
Planar Prior Assisted PatchMatch Multi-View Stereo

ACMP [News] The code for ACMH is released!!! [News] The code for ACMM is released!!! About This repository contains the code for the paper Planar Prio

Qingshan Xu 127 Dec 31, 2022