Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 5, 2022

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success can be credited to common attention maps across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
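
The --lr-mode poly flag in the command above selects a polynomial learning-rate decay. The following is a minimal sketch of how such a schedule is commonly computed. It assumes a decay power of 0.9, which is a widespread default but is not confirmed by this README, and poly_lr is a hypothetical helper name rather than a function from main.py.

```python
def poly_lr(base_lr: float, epoch: int, total_epochs: int, power: float = 0.9) -> float:
    """Polynomial decay: the rate falls from base_lr toward 0 over total_epochs.

    Assumption: power=0.9 is a common default, not a value taken from main.py.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must be in [0, total_epochs)")
    return base_lr * (1.0 - epoch / total_epochs) ** power


if __name__ == "__main__":
    # Base rate and epoch count taken from the training command: --lr 0.007 --epochs 200
    for epoch in (0, 50, 100, 199):
        print(f"epoch {epoch:3d}: lr = {poly_lr(0.007, epoch, 200):.6f}")
```

Under this assumption the rate starts at the value passed to --lr and shrinks smoothly toward zero by the final epoch, so changing --epochs also changes how quickly the rate decays.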

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10

Comments

AttributeError: 'VisionTransformer' object has no attribute 'dist_token'

I've tried to evaluate your pretrained model following the instructions, but I'm stuck on an error: 'dist_token' is not an attribute of VisionTransformer.

The environment has the following versions:

python==3.6.13
torch==1.8.1
torchvision==0.9.1
opencv-python==4.5.2
timm==0.4.5
CUDAtoolkit==11.1

Maybe I'm missing a file or a custom build?

Full traceback:

$ CUDA_VISIBLE_DEVICES=0 python main.py test -d dataset_path -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10 --classes 4
main.py test -d dataset_path -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10 --classes 4                                      
Namespace(cmd='test', data_dir='dataset_path', classes=4, crop_size=512, step=200, arch=None, batch_size=1, epochs=10, lr=0.01, lr_mode='step', momentum=0.9, weight_decay=0.0001, evaluate=False, resume='model_best.pth.tar', trans_resume='', pretrained='', pretrained_model='', workers=10, load_rel=None, phase='val', random_scale=0, random_rotate=0, bn_sync=False, ms=True, trans=False, with_gt=False, test_suffix='')
cmd : test                                                                                                                                                    
data_dir : dataset_path                                                                                                                                       
classes : 4                                                                                                                                                   
crop_size : 512                                                                
step : 200                                                                                                                                                    
arch : None                                                                                                                                                   
batch_size : 1                                                                                                                                                
epochs : 10                                                                                                                                                   
lr : 0.01                                                                                                                                                     
lr_mode : step                                                                                                                                                
momentum : 0.9                                                                                                                                                
weight_decay : 0.0001                                                          
evaluate : False                                                                                                                                              
resume : model_best.pth.tar                                                                                                                                   
trans_resume :                         
pretrained :                                                                                                                                                  
pretrained_model :                     
workers : 10                           
load_rel : None                                                                                                                                               
phase : val                                                                                                                                                   
random_scale : 0                       
random_rotate : 0                      
bn_sync : False                        
ms : True                              
trans : False                          
with_gt : False                        
test_suffix :
[W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.                                                                                  
/users/mmazuecos/miniconda3/envs/cerberus/lib/python3.9/site-packages/torch/nn/functional.py:3454: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  warnings.warn(                       
Traceback (most recent call last):                                             
  File "/users/mmazuecos/Cerberus/main.py", line 1798, in <module>                                                                                            
    main()                             
  File "/users/mmazuecos/Cerberus/main.py", line 1794, in main                                                                                                
    test_seg_cerberus(args)                                                    
  File "/users/mmazuecos/Cerberus/main.py", line 1719, in test_seg_cerberus                                                                                   
    mAP = test_ms_cerberus(test_loader_list, model, save_vis=True,                                                                                            
  File "/users/mmazuecos/Cerberus/main.py", line 1491, in test_ms_cerberus                                                                                    
    final, _, _ = model(image_var, index)                                      
  File "/users/mmazuecos/miniconda3/envs/cerberus/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl                            
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "/users/mmazuecos/Cerberus/model/models.py", line 379, in forward                                                                                      
    layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)                                                                                      
  File "/users/mmazuecos/Cerberus/model/vit.py", line 109, in forward_vit                                                                                     
    glob = pretrained.model.forward_flex(x)                                                                                                                   
  File "/users/mmazuecos/Cerberus/model/vit.py", line 184, in forward_flex                                                                                    
    if self.dist_token:                                                        
  File "/users/mmazuecos/miniconda3/envs/cerberus/lib/python3.9/site-packages/torch/nn/modules/module.py", line 947, in __getattr__                           
    raise AttributeError("'{}' object has no attribute '{}'".format(

opened by mmazuecos 3
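
The traceback above stops at the line `if self.dist_token:` in model/vit.py, which suggests that the installed VisionTransformer class does not define a dist_token attribute at all. One possible workaround, not confirmed by the repository authors, is to read the attribute defensively with getattr. The sketch below illustrates the pattern on stand-in classes: PlainViT, DistilledViT and prefix_tokens are hypothetical names used only for illustration, and the real forward_flex concatenates tensors rather than strings.

```python
class PlainViT:
    """Stand-in for a VisionTransformer build that never defines dist_token."""
    cls_token = "cls"


class DistilledViT(PlainViT):
    """Stand-in for a build that does carry a distillation token."""
    dist_token = "dist"


def prefix_tokens(model) -> list:
    """Return the special tokens to prepend, tolerating a missing dist_token."""
    # getattr with a default avoids AttributeError when the attribute is absent.
    # Comparing against None (rather than truth-testing) also stays safe when
    # the attribute is a multi-element tensor.
    dist = getattr(model, "dist_token", None)
    if dist is not None:
        return [model.cls_token, dist]
    return [model.cls_token]


print(prefix_tokens(PlainViT()))      # ['cls']
print(prefix_tokens(DistilledViT()))  # ['cls', 'dist']
```

Note also that the paths in the traceback point to a python3.9 environment, while the listed versions mention python==3.6.13, so it may be worth confirming which interpreter and timm version are actually in use before patching the code.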
