Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Paper accepted to CVPR 2022.

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success can be credited to common attention maps across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

@misc{chen2021cerberus,
    title={Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing}, 
    author={Xiaoxue Chen and Tianyu Liu and Hao Zhao and Guyue Zhou and Ya-Qin Zhang},
    year={2021},
    eprint={2111.12608},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Installation

Requirements

torch==1.8.1
torchvision==0.9.1
opencv-python==4.5.2
timm==0.5.4
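
To confirm that an environment matches the pins above, the installed versions can be compared against them. The helper below is a minimal sketch and not part of the repository: `REQUIRED`, `installed_version` and `check_requirements` are hypothetical names, and it uses only the standard-library `importlib.metadata` (Python 3.8+). It assumes that a local build suffix such as `+cu111` and an extra trailing version component are acceptable matches for a pin.

```python
from importlib import metadata

# Pinned versions copied from the requirements list above.
REQUIRED = {
    "torch": "1.8.1",
    "torchvision": "0.9.1",
    "opencv-python": "4.5.2",
    "timm": "0.5.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    Assumption: a local build suffix such as "+cu111" is ignored, and a found
    version matches when it equals the pin or extends it with extra
    components (for example 4.5.2.54 for the pin 4.5.2).
    """
    mismatches = {}
    for package, wanted in required.items():
        found = lookup(package)
        base = found.split("+")[0] if found else None
        if base is None or not (base == wanted or base.startswith(wanted + ".")):
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `check_requirements()` with no arguments inspects the current environment; an empty dictionary means every pin is satisfied under the assumptions above.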

Data preparation

Prepare the NYUd2 dataset or your own dataset.

The NYUd2 dataset should have the following hierarchy:

dataset_path
|   info.json
|   train_images.txt
|   train_labels.txt
|   val_images.txt
|   val_labels.txt
|
└───image(semantic image folder)
|     └───...
└───gt_sem_40(semantic label folder)
|     └───...
|
|   train_attribute_images.txt
|   train_attribute_labels.txt
|   val_attribute_images.txt
|   val_attribute_labels.txt
|
└───attribute(attribute image and label folder)
|     └───aNYU
|           └───...
|
|   train_affordance_images.txt
|   train_affordance_labels.txt
|   val_affordance_images.txt
|   val_affordance_labels.txt
|
└───affordance(affordance image and label folder)
      └───Affordance_ground_truth
            └───...
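
Before training, it can help to verify that a dataset folder follows the hierarchy above. The following is a minimal sketch, not a script shipped with the repository: `EXPECTED` and `missing_entries` are hypothetical names, and the validation image lists are assumed to be spelled `val_*images.txt`.

```python
from pathlib import Path

# Entries taken from the tree above, relative to dataset_path.
# Assumption: the validation image lists are spelled "val_*images.txt";
# adjust the names if your copy of the dataset differs.
EXPECTED = [
    "info.json",
    "train_images.txt",
    "train_labels.txt",
    "val_images.txt",
    "val_labels.txt",
    "image",
    "gt_sem_40",
    "train_attribute_images.txt",
    "train_attribute_labels.txt",
    "val_attribute_images.txt",
    "val_attribute_labels.txt",
    "attribute/aNYU",
    "train_affordance_images.txt",
    "train_affordance_labels.txt",
    "val_affordance_images.txt",
    "val_affordance_labels.txt",
    "affordance/Affordance_ground_truth",
]


def missing_entries(dataset_path, expected=EXPECTED):
    """Return the expected files or folders that do not exist under dataset_path."""
    root = Path(dataset_path)
    return [name for name in expected if not (root / name).exists()]
```

An empty list from `missing_entries("[dataset_path]")` indicates that every listed file and folder is present; the check does not inspect file contents.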

Attribute

Download the preprocessed attribute dataset HERE.

Affordance

Download the preprocessed affordance dataset HERE.

Semantic

Download the preprocessed semantic dataset HERE.

Run Pre-trained Model

You can download the pre-trained Cerberus model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
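
The `--lr-mode poly` flag in the command above names a polynomial learning-rate decay. The repository's exact schedule is not shown in this README; the sketch below assumes the form commonly used in segmentation training scripts, `lr = base_lr * (1 - epoch / total_epochs) ** power` with `power = 0.9`. The function name `poly_lr` and the default `power` are assumptions for illustration.

```python
def poly_lr(base_lr, epoch, total_epochs, power=0.9):
    """Polynomial learning-rate decay (assumed form, see the note above).

    Starts at base_lr for epoch 0 and decays to 0 at epoch == total_epochs.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return base_lr * (1 - epoch / total_epochs) ** power


# Values matching the flags in the training command: --lr 0.007 --epochs 200.
schedule = [poly_lr(0.007, epoch, 200) for epoch in range(200)]
```

With a PyTorch optimizer, a value from this schedule would typically be written into each entry of `optimizer.param_groups` under the `"lr"` key at the start of every epoch.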

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
