Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Overview


Paper

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity of different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. However, successfully resolving it requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success is credited to attention maps shared across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements
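
No exact version pins are given here. A minimal environment sketch, based on the versions reported in the issue under Comments below (the timm pin is an assumption: the 0.4.5 build reported there hit a compatibility error, so a later 0.4.x release whose VisionTransformer defines dist_token is likely needed):

    torch==1.8.1
    torchvision==0.9.1
    opencv-python==4.5.2
    timm==0.4.12  # assumption: a timm release whose VisionTransformer defines dist_token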

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluation

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12 
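
Here --lr-mode poly refers to the polynomial learning-rate decay common in segmentation codebases; a minimal sketch of that schedule is given below (the power of 0.9 is the usual default and an assumption here, not a value taken from this repository):

    def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
        # polynomial decay: the learning rate shrinks smoothly from base_lr towards 0 over training
        return base_lr * (1 - cur_iter / max_iter) ** power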

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path]  -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
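
To sanity-check the checkpoint passed via --resume before a full evaluation run, a small inspection sketch (the key names are assumptions based on the usual .pth.tar convention, not verified against this repository):

    import torch

    ckpt = torch.load("model_best.pth.tar", map_location="cpu")
    # typical entries are 'epoch', 'state_dict' and a best-metric field (assumed)
    print(list(ckpt.keys()))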

Comments
  • AttributeError: 'VisionTransformer' object has no attribute 'dist_token'


    I've tried to evaluate your pretrained model following the instructions, but I'm stuck with 'dist_token' not being an attribute of VisionTransformer.

    The environment has the following versions:

    • python==3.6.13
    • torch==1.8.1
    • torchvision==0.9.1
    • opencv-python==4.5.2
    • timm==0.4.5
    • CUDAtoolkit==11.1

    Maybe I'm missing a file or a custom build?

    Full traceback:

    $ CUDA_VISIBLE_DEVICES=0 python main.py test -d dataset_path -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10 --classes 4
    Namespace(cmd='test', data_dir='dataset_path', classes=4, crop_size=512, step=200, arch=None, batch_size=1, epochs=10, lr=0.01, lr_mode='step', momentum=0.9, weight_decay=0.0001, evaluate=False, resume='model_best.pth.tar', trans_resume='', pretrained='', pretrained_model='', workers=10, load_rel=None, phase='val', random_scale=0, random_rotate=0, bn_sync=False, ms=True, trans=False, with_gt=False, test_suffix='')
    cmd : test                                                                                                                                                    
    data_dir : dataset_path                                                                                                                                       
    classes : 4                                                                                                                                                   
    crop_size : 512                                                                
    step : 200                                                                                                                                                    
    arch : None                                                                                                                                                   
    batch_size : 1                                                                                                                                                
    epochs : 10                                                                                                                                                   
    lr : 0.01                                                                                                                                                     
    lr_mode : step                                                                                                                                                
    momentum : 0.9                                                                                                                                                
    weight_decay : 0.0001                                                          
    evaluate : False                                                                                                                                              
    resume : model_best.pth.tar                                                                                                                                   
    trans_resume :                         
    pretrained :                                                                                                                                                  
    pretrained_model :                     
    workers : 10                           
    load_rel : None                                                                                                                                               
    phase : val                                                                                                                                                   
    random_scale : 0                       
    random_rotate : 0                      
    bn_sync : False                        
    ms : True                              
    trans : False                          
    with_gt : False                        
    test_suffix :
    [W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.                                                                                  
    /users/mmazuecos/miniconda3/envs/cerberus/lib/python3.9/site-packages/torch/nn/functional.py:3454: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
      warnings.warn(
    Traceback (most recent call last):                                             
      File "/users/mmazuecos/Cerberus/main.py", line 1798, in <module>                                                                                            
        main()                             
      File "/users/mmazuecos/Cerberus/main.py", line 1794, in main                                                                                                
        test_seg_cerberus(args)                                                    
      File "/users/mmazuecos/Cerberus/main.py", line 1719, in test_seg_cerberus                                                                                   
        mAP = test_ms_cerberus(test_loader_list, model, save_vis=True,                                                                                            
      File "/users/mmazuecos/Cerberus/main.py", line 1491, in test_ms_cerberus                                                                                    
        final, _, _ = model(image_var, index)                                      
      File "/users/mmazuecos/miniconda3/envs/cerberus/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl                            
        result = self.forward(*input, **kwargs)                                                                                                                   
      File "/users/mmazuecos/Cerberus/model/models.py", line 379, in forward                                                                                      
        layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)                                                                                      
      File "/users/mmazuecos/Cerberus/model/vit.py", line 109, in forward_vit                                                                                     
        glob = pretrained.model.forward_flex(x)                                                                                                                   
      File "/users/mmazuecos/Cerberus/model/vit.py", line 184, in forward_flex                                                                                    
        if self.dist_token:                                                        
      File "/users/mmazuecos/miniconda3/envs/cerberus/lib/python3.9/site-packages/torch/nn/modules/module.py", line 947, in __getattr__                           
        raise AttributeError("'{}' object has no attribute '{}'".format(
    
    opened by mmazuecos
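
The traceback above points at the dist_token check in model/vit.py, which assumes a timm build whose VisionTransformer exposes that attribute; the timm==0.4.5 listed in the report evidently does not. Upgrading timm (DPT-style repositories commonly pin 0.4.12; an assumption here) is the simplest route, or the check can be made defensive, as in this sketch (not an official fix from the authors):

    # model/vit.py, forward_flex (sketch; surrounding code unchanged)
    if getattr(self, "dist_token", None) is not None:
        ...  # distilled-ViT branch: class and distillation tokens are prepended
    else:
        ...  # plain ViT branch: class token only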