PyTorch implementation of One-Shot Affordance Detection

Overview

One-shot Affordance Detection

PyTorch implementation of our one-shot affordance detection models. This repository contains PyTorch evaluation code, training code and pretrained models.

📋 Table of contents

  1. 📎 Paper Link
  2. 💡 Abstract
  3. 📖 Method
    1. IJCAI Version
    2. Extended Version
  4. 📂 Dataset
    1. PAD
    2. PADv2
  5. 📃 Requirements
  6. ✏️ Usage
    1. Train
    2. Test
    3. Evaluation
  7. 📊 Experimental Results
    1. Performance on PADv2
    2. Performance on PAD
  8. 🍎 Potential Applications
  9. ✉️ Statement
  10. 🔍 Citation

📎 Paper Link

  • One-Shot Affordance Detection (IJCAI2021) (link)

Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

  • One-Shot Affordance Detection (Extended Version) (link)

Authors: Wei Zhai*, Hongchen Luo*, Jing Zhang, Yang Cao, Dacheng Tao

💡 Abstract

Affordance detection refers to identifying the potential action possibilities of objects in an image, which is a crucial ability for robot perception and manipulation. To empower robots with this ability in unseen scenarios, we first consider the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected. To this end, we devise a One-Shot Affordance Detection Network (OSAD-Net) that first estimates the human action purpose and then transfers it to help detect the common affordance in all candidate images. Through collaboration learning, OSAD-Net can capture the common characteristics between objects having the same underlying affordance and learn a good adaptation capability for perceiving unseen affordances. Besides, we build a Purpose-driven Affordance Dataset v2 (PADv2) by collecting and labeling 30k images covering 39 affordance categories and 94 object categories. With complex scenes and rich annotations, PADv2 supports a comprehensive understanding of object affordances and can even be used in other vision tasks, such as scene understanding, action recognition, and robot manipulation. We present a standard one-shot affordance detection benchmark comparing 11 advanced models from several different fields. Experimental results demonstrate the superiority of our model over previous representative ones in terms of both objective metrics and visual quality.


Illustration of perceiving affordance. Given a support image that depicts the action purpose, all objects in a scene with the common affordance could be detected.

📖 Method

OSAD-Net (IJCAI2021)


Our One-Shot Affordance Detection (OS-AD) network. OSAD-Net_ijcai consists of three key modules: a Purpose Learning Module (PLM), a Purpose Transfer Module (PTM), and a Collaboration Enhancement Module (CEM). (a) PLM estimates the action purpose from the human-object interaction in the support image. (b) PTM transfers the action purpose to the query images via an attention mechanism to enhance the relevant features. (c) CEM captures the intrinsic characteristics shared by objects with a common affordance to learn a better affordance-perceiving ability.
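
To make the purpose-transfer idea concrete, below is a minimal, hypothetical PyTorch sketch in the spirit of PTM (module and parameter names are ours for illustration, not this repository's code): a pooled support-derived purpose vector re-weights the channels of each query feature map.

import torch.nn as nn

class PurposeTransferSketch(nn.Module):
    """Illustrative channel-attention transfer; not the repository's PTM."""
    def __init__(self, channels=2048):
        super().__init__()
        # map the pooled purpose vector to per-channel attention weights
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, purpose_feat, query_feat):
        # purpose_feat: (B, C) pooled feature encoding the action purpose
        # query_feat:   (B, C, H, W) backbone feature of one query image
        attn = self.fc(purpose_feat).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return query_feat * attn  # emphasize affordance-relevant channels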

OSAD-Net (Extended Version)


The framework of our OSAD-Net. The network first uses a ResNet50 to extract features from the support image and the query images. The support feature, the bounding boxes of the person and the object, and the person's pose are then fed into the action purpose learning (APL) module to obtain human action purpose features. These purpose features are passed, together with the query features, to the mixture purpose transfer (MPT) module, which transfers the human action purpose to the query images and activates the object regions belonging to the affordance. The output of the MPT is fed into a densely collaborative enhancement (DCE) module, which learns the commonality among objects of the same affordance and suppresses irrelevant background regions using a cooperative strategy, and is finally passed to the decoder to obtain the detection results.
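
As a reading aid, here is a hypothetical end-to-end skeleton of this pipeline in PyTorch; the sub-module interfaces (apl, mpt, dce, decoder) are assumptions used only to show how the stages connect, not the repository's actual API.

import torch.nn as nn
import torchvision

class OSADNetSketch(nn.Module):
    """Illustrative wiring of backbone -> APL -> MPT -> DCE -> decoder."""
    def __init__(self, apl, mpt, dce, decoder):
        super().__init__()
        resnet = torchvision.models.resnet50(pretrained=True)
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])  # (B, 2048, H/32, W/32)
        self.apl, self.mpt, self.dce, self.decoder = apl, mpt, dce, decoder

    def forward(self, support_img, person_box, object_box, pose, query_imgs):
        support_feat = self.encoder(support_img)
        query_feats = [self.encoder(q) for q in query_imgs]
        purpose = self.apl(support_feat, person_box, object_box, pose)  # action purpose features
        transferred = [self.mpt(purpose, f) for f in query_feats]       # activate affordance regions
        enhanced = self.dce(transferred)                                # collaborative enhancement
        return [self.decoder(f) for f in enhanced]                      # per-query affordance maps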

📂 Dataset


Sample images from PADv2. Our PADv2 has rich annotations such as affordance masks as well as depth information, and thus provides a solid foundation for the affordance detection task.


The properties of PADv2. (a) The classification structure of PADv2, consisting of 39 affordance categories and 94 object categories. (b) The word-cloud distribution of PADv2. (c) Overlapping-mask visualization of PADv2 for specific affordance classes and for overall category masks. (d) Confusion matrix between PADv2 affordance categories and object categories, where the horizontal axis corresponds to object categories and the vertical axis to affordance categories. (e) Distribution of co-occurring attributes of PADv2, where each grid cell is labeled with the total number of images.

Download PAD

cd Downloads/
unzip PAD.zip
cd OSAD-Net
mkdir datasets/PAD
mv Downloads/PAD/divide_1 datasets/PAD/   
mv Downloads/PAD/divide_2 datasets/PAD/   
mv Downloads/PAD/divide_3 datasets/PAD/  

Download PADv2

  • You can download the PADv2 from [ Baidu Pan (1ttj) ].
cd Downloads/
unzip PADv2_part1.zip
cd OSAD-Net
mkdir datasets/PADv2_part1
mv Downloads/PADv2_part1/divide_1 datasets/PADv2_part1/  
mv Downloads/PADv2_part1/divide_2 datasets/PADv2_part1/  
mv Downloads/PADv2_part1/divide_3 datasets/PADv2_part1/   
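
For orientation, the sketch below shows one way to sample a one-shot episode (one support image plus several query images of the same affordance class) from a divide_x split. The images/ and masks/ sub-folder layout here is an assumption made for illustration; check the unzipped dataset and the ref files for the actual structure.

import os
import random
import cv2

def sample_episode(root="datasets/PADv2_part1/divide_1", split="train", n_query=5):
    """Pick one affordance class, one support image and n_query query images (hypothetical layout)."""
    base = os.path.join(root, split)
    affordance = random.choice(os.listdir(base))          # one affordance class folder
    img_dir = os.path.join(base, affordance, "images")    # assumed sub-folder names
    mask_dir = os.path.join(base, affordance, "masks")
    names = random.sample(os.listdir(img_dir), n_query + 1)
    support = cv2.imread(os.path.join(img_dir, names[0]))
    queries = [(cv2.imread(os.path.join(img_dir, n)),
                cv2.imread(os.path.join(mask_dir, os.path.splitext(n)[0] + ".png"), 0))
               for n in names[1:]]
    return support, queries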

📃 Requirements

  • python 3.7
  • pytorch 1.1.0
  • opencv
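
A quick way to confirm the environment matches the versions above (plain Python, no assumptions about this repository):

import sys
import torch
import cv2

print("python :", sys.version.split()[0])   # expect 3.7.x
print("pytorch:", torch.__version__)        # expect 1.1.0
print("opencv :", cv2.__version__)
print("cuda   :", torch.cuda.is_available())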

✏️ Usage

git clone https://github.com/lhc1224/OSAD_Net.git
cd OSAD-Net

Train

You can download the pretrained model from [ Google Drive | Baidu Pan (xjk5) ], then move it to the models folder. To train the OSAD-Net_ijcai model, run run_os_ad.py with the desired model architecture:

python run_os_ad.py   

To train the OSAD-Net model, run run_os_adv2.py with the desired model architecture:

python run_os_adv2.py   

Test

To test the OSAD-Net_ijcai model, run run_os_ad.py:

python run_os_ad.py  --mode test 

To test the OSAD-Net model, run run_os_adv2.py. You can download the trained models from [ Google Drive | Baidu Pan (611r) ]:

python run_os_adv2.py  --mode test 

Evaluation

To evaluate the predicted results, the evaluation code can be obtained via the following Evaluation Tools.
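
The official metrics come from the Evaluation Tools linked above; purely as an illustration of the kind of comparison they perform, here is a minimal sketch of two common mask-evaluation metrics (MAE and IoU) between a predicted affordance map and its ground truth.

import numpy as np

def mae(pred, gt):
    """Mean absolute error; pred and gt are float arrays in [0, 1] of the same shape."""
    return float(np.abs(pred - gt).mean())

def iou(pred, gt, thr=0.5):
    """Intersection over union after thresholding the predicted map."""
    p, g = pred >= thr, gt >= 0.5
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union > 0 else 1.0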

📊 Experimental Results

Performance on PADv2

You can download the affordance maps from [ Google Drive | Baidu Pan (hwtf) ]


Performance on PAD

You can download the affordance maps from [ Google Drive | Baidu Pan (hrlj) ]


🍎 Potential Applications


Potential applications of the one-shot affordance detection system. (a) Application I: Content Image Retrieval. A content image retrieval model combined with affordance detection has promising applications in search engines and online shopping platforms. (b) Application II: Learning from Demonstration. The one-shot affordance detection model can help an agent naturally select the correct object based on an expert's actions. (c) Application III: Self-exploration of Agents. The one-shot affordance detection model helps an agent autonomously perceive all instances or areas of a scene with a similar affordance property in unknown human spaces, based on historical data (e.g., images of human interactions).

✉️ Statement

This project is for research purposes only; please contact us for a commercial-use license. For any other questions, please contact [email protected] or [email protected].

🔍 Citation

@inproceedings{Oneluo,
  title={One-Shot Affordance Detection},
  author={Luo, Hongchen and Zhai, Wei and Zhang, Jing and Cao, Yang and Tao, Dacheng},
  booktitle={IJCAI},
  year={2021}
}

@article{luo2021one,
  title={One-Shot Affordance Detection in the Wild},
  author={Zhai, Wei and Luo, Hongchen and Zhang, Jing and Cao, Yang and Tao, Dacheng},
  journal={arXiv preprint arXiv:2106.14747},
  year={2021}
}