PromptDet: Expand Your Detector Vocabulary with Uncurated Images

Last update: Dec 20, 2022

Related tags

Deep Learning computer-vision vocabulary self-training object-detection clip zero-shot-learning pseudo-labeling web-image prompt-learning regional-prompt novel-categories

Overview

Introduction

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations. To achieve that, we make the following four contributions: (i) in pursuit of generalisation, we propose a two-stage open-vocabulary object detector that categorises each box proposal with a classifier generated from the text encoder of a pre-trained visual-language model; (ii) to align the visual latent space (from RPN box proposals) with that of the pre-trained text encoder, we propose regional prompt learning, which optimises a couple of learnable prompt vectors so that the textual embedding space better fits visually object-centric images; (iii) to scale up the learning procedure towards detecting a wider spectrum of objects, we exploit available online resources, iteratively updating the prompts and later self-training the proposed detector with pseudo labels generated on a large corpus of noisy, uncurated web images. The self-trained detector, termed PromptDet, significantly improves detection performance on categories for which manual annotations are unavailable or hard to obtain, e.g. rare categories. Finally, (iv) to validate the necessity of the proposed components, we conduct extensive experiments on the challenging LVIS and MS-COCO datasets, showing superior performance over existing approaches with fewer additional training images and no manual annotations whatsoever.
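The classification step in (i) and the pseudo-labelling step in (iii) can be sketched in a few lines of Python. This is a minimal, dependency-free illustration under stated assumptions, not PromptDet's implementation: the embeddings are made-up 2-D vectors rather than features from the pre-trained visual-language model, the temperature of 0.01 is an assumed value, there is no background class, and `pseudo_label` (a hypothetical helper) simply keeps the single highest-scoring proposal for the category a web image was retrieved with.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[float, float, float, float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def classify_region(region_emb: Vector, class_text_embs: Sequence[Vector],
                    temperature: float = 0.01) -> List[float]:
    """Softmax over cosine similarities between one region embedding and one
    text embedding per category; the text embeddings act as classifier weights."""
    r = l2_normalize(region_emb)
    logits = [sum(a * b for a, b in zip(r, l2_normalize(t))) / temperature
              for t in class_text_embs]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(proposals: Sequence[Tuple[Box, Vector]],
                 class_text_embs: Sequence[Vector],
                 query_class: int) -> Tuple[Optional[Box], float]:
    """Hypothetical rule for a web image retrieved with the name of `query_class`:
    keep the single proposal that scores highest for that category."""
    best_box: Optional[Box] = None
    best_score = -1.0
    for box, emb in proposals:
        score = classify_region(emb, class_text_embs)[query_class]
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score


# Toy embeddings for two categories (made-up values, not real encoder outputs).
text_embs = [[1.0, 0.0], [0.0, 1.0]]
proposals = [((0.0, 0.0, 10.0, 10.0), [0.9, 0.1]),
             ((5.0, 5.0, 20.0, 20.0), [0.2, 0.8])]
box, score = pseudo_label(proposals, text_embs, query_class=1)
print(box, round(score, 4))
```

Because the classifier weights are just text embeddings, adding a new category only requires encoding its name; no classifier layer has to be retrained.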

Training framework

Prerequisites

MMDetection version 2.16.0.
Please see get_started.md for installation and the basic usage of MMDetection.
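One way to confirm that the installed MMDetection matches the version listed above is to read `mmdet.__version__`. The snippet below is a small sketch; `parse_version` and `check_mmdet` are hypothetical helper names, and requiring an exact match is an assumption, since this README does not say whether nearby 2.x releases also work.

```python
from typing import Tuple

REQUIRED_MMDET = "2.16.0"  # version listed under Prerequisites


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.16.0' into (2, 16, 0); non-numeric suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_mmdet(required: str = REQUIRED_MMDET) -> str:
    """Report whether the installed MMDetection matches the required version."""
    try:
        import mmdet  # MMDetection 2.x also asserts a compatible mmcv at import time
    except (ImportError, AssertionError) as err:
        return f"mmdet could not be imported: {err}"
    installed = mmdet.__version__
    if parse_version(installed) == parse_version(required):
        return f"OK: mmdet {installed}"
    return f"Warning: mmdet {installed} is installed, but the README lists {required}"


print(check_mmdet())
```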

Inference

./tools/dist_test.sh configs/promptdet/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py work_dirs/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth 4 --eval bbox segm
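The command above evaluates box and mask AP (`--eval bbox segm`) on the test split defined in the config, using 4 GPUs (the third positional argument of MMDetection's `dist_test.sh`). Single-image inference, which one of the comments below asks about, is not covered by this README. A minimal sketch with MMDetection 2.x's high-level API (`mmdet.apis.init_detector` and `inference_detector`) might look like the following. It assumes the repository's own mmdet package, including PromptDet's modules, is installed, that the script runs from the repository root so relative paths in the config resolve, and that the checkpoint sits at the path used above; `keep_confident`, `run_single_image` and the 0.3 score threshold are hypothetical.

```python
import sys
from typing import List, Sequence, Tuple

# Paths taken from the inference command above; adjust them to your checkout.
CONFIG = "configs/promptdet/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py"
CHECKPOINT = "work_dirs/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth"

Detection = Tuple[int, Tuple[float, float, float, float], float]


def keep_confident(bbox_result: Sequence[Sequence[Sequence[float]]],
                   score_thr: float = 0.3) -> List[Detection]:
    """Flatten MMDetection-style per-class arrays of [x1, y1, x2, y2, score] rows
    into (class_index, box, score) tuples, keeping scores >= score_thr."""
    kept: List[Detection] = []
    for cls_idx, dets in enumerate(bbox_result):
        for det in dets:
            x1, y1, x2, y2, score = (float(v) for v in det)
            if score >= score_thr:
                kept.append((cls_idx, (x1, y1, x2, y2), score))
    return kept


def run_single_image(img_path: str, score_thr: float = 0.3,
                     device: str = "cuda:0") -> List[Detection]:
    """Run one image through the detector with MMDetection 2.x's high-level API."""
    from mmdet.apis import inference_detector, init_detector  # needs MMDetection installed

    model = init_detector(CONFIG, CHECKPOINT, device=device)
    result = inference_detector(model, img_path)
    # Mask R-CNN returns (bbox_result, segm_result); box-only detectors return bbox_result.
    bbox_result = result[0] if isinstance(result, tuple) else result
    return keep_confident(bbox_result, score_thr)


if __name__ == "__main__" and len(sys.argv) > 1:
    for cls_idx, box, score in run_single_image(sys.argv[1]):
        print(cls_idx, box, round(score, 3))
```

Pass an image path as the first command-line argument. The returned class indices follow the order of the model's categories, which `init_detector` normally exposes as `model.CLASSES` in MMDetection 2.x.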

Train

To be updated.

Models

For convenience, we provide the following trained PromptDet models, evaluated with mask AP.

Model	Epochs	Scale Jitter	Input Size	AP_novel	AP_c	AP_f	AP	Config	Download
PromptDet_R_50_FPN_1x	12	640~800	800x800	19.0	18.5	25.8	21.4	config	google / baidu
PromptDet_R_50_FPN_6x	72	100~1280	800x800	21.4	23.3	29.3	25.3	config	google / baidu

[0] All results are obtained with a single model and without any test-time augmentation such as multi-scale testing or flipping.
[1] See the config files in configs/promptdet/ for more details.
[2] Extraction code for Baidu Netdisk: promptdet.

Acknowledgement

Thanks to the MMDetection team for the wonderful open-source project!

Citation

If you find PromptDet useful in your research, please consider citing:

@article{feng2022promptdet,
    title={PromptDet: Expand Your Detector Vocabulary with Uncurated Images},
    author={Feng, Chengjian and Zhong, Yujie and Jie, Zequn and Chu, Xiangxiang and Ren, Haibing and Wei, Xiaolin and Xie, Weidi and Ma, Lin},
    journal={arXiv preprint arXiv:2203.16513},
    year={2022}
}

Comments

COCO embeddings

Hi, thank you for sharing your amazing work.

Could you please share the embeddings used for COCO evaluation? LVIS v1 has only 59 categories in common with COCO. Otherwise, could you share the learned 1 + 1 prompt vectors so that they can be used with any dataset?

Thank you.

opened by hanoonaR 3
Baseline training configs

Hi,

Thank you for sharing your work. I would like to know the training configuration used for the baseline reported in Table 2 of your paper. The implementation details in the paper specify a 1x schedule with an lr of 0.02. However, samples_per_gpu is set to 4 in the shared configuration (https://github.com/fcjian/PromptDet/blob/83467c79114f441cbf4dedc31baf54a9a146e689/configs/promptdet/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py#L35), whereas the default mmdet training config for Mask R-CNN with FPN on a 1x schedule uses 8 GPUs with 2 samples per GPU, for an effective batch size of 16, and an lr of 0.02.

Could you please specify the number of GPUs, the batch size, and the corresponding lr used for your baseline?

Thank you.

opened by hanoonaR 1
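For context on the question above: MMDetection's documentation describes a linear scaling rule, under which the learning rate is kept proportional to the total batch size (number of GPUs x samples_per_gpu), with lr = 0.02 corresponding to 8 GPUs x 2 images per GPU. Which setting the authors used for the baseline is not stated here, so the sketch below (`scaled_lr` is a hypothetical helper) only shows how lr would change for a few candidate settings.

```python
BASE_LR = 0.02        # MMDetection default for Mask R-CNN R-50-FPN, 1x schedule
BASE_BATCH_SIZE = 16  # batch size that lr corresponds to: 8 GPUs x 2 images per GPU


def scaled_lr(num_gpus: int, samples_per_gpu: int,
              base_lr: float = BASE_LR, base_batch: int = BASE_BATCH_SIZE) -> float:
    """Linear scaling rule: keep lr proportional to the total batch size."""
    if num_gpus <= 0 or samples_per_gpu <= 0:
        raise ValueError("num_gpus and samples_per_gpu must be positive")
    return base_lr * (num_gpus * samples_per_gpu) / base_batch


# Candidate settings only; the baseline's actual GPU count is not given here.
for gpus, per_gpu in [(8, 2), (4, 4), (8, 4)]:
    print(f"{gpus} GPUs x {per_gpu} images/GPU -> batch {gpus * per_gpu}, "
          f"lr = {scaled_lr(gpus, per_gpu):.3f}")
```

With samples_per_gpu = 4 as in the shared config, 4 GPUs keep the effective batch size at 16 and the lr at 0.02, whereas 8 GPUs would double both under this rule.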
How to train the model?

Thanks for your nice work and your time! Could you give some examples of how to train the model using the existing config files in configs/promptdet?

opened by Kyfafyd 1
single image inference

While reproducing your work, I found that the inference section only covers inference on the LVIS validation set. Are there any scripts for single-image or single-video inference?

opened by LeonG7 1
code for regional prompt learning

Hi, I'm currently reproducing your work, but I cannot find the code related to regional prompt learning. Could you tell me where the code for the preprocessing and training of regional prompt learning is? (Sorry, I'm new to MMDetection, so it's hard to search.) Thanks!

opened by jihwanp 1
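The idea behind regional prompt learning, as described in the introduction, can be illustrated with a toy, dependency-free sketch; this is not the repository's code. Assumptions, all hypothetical: the frozen text encoder is replaced by a mean-pooling `encode` function, the word vectors and region embeddings are made-up 2-D values, placing one learnable vector before and one after the category name is an assumed reading of the "1 + 1 prompt vectors" mentioned in the COCO-embeddings issue above, and gradients come from central finite differences instead of backpropagation. Only the prompt vectors are updated; the word vectors stay frozen.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]


def normalize(v: Sequence[float]) -> Vec:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def encode(prefix: Vec, word_vecs: Sequence[Vec], postfix: Vec) -> Vec:
    """Hypothetical stand-in for a frozen text encoder: mean-pool the token
    sequence [prefix, category words..., postfix] and L2-normalise it."""
    tokens = [prefix, *word_vecs, postfix]
    return normalize([sum(t[i] for t in tokens) / len(tokens) for i in range(len(prefix))])


def prompt_loss(params: Vec, vocab: Dict[str, Vec], classes: Sequence[str],
                samples: Sequence[Tuple[Vec, int]], temperature: float = 0.1) -> float:
    """Mean cross-entropy of region embeddings against prompt-built class embeddings."""
    dim = len(params) // 2
    prefix, postfix = params[:dim], params[dim:]
    class_embs = [encode(prefix, [vocab[w] for w in name.split()], postfix) for name in classes]
    total = 0.0
    for region, label in samples:
        r = normalize(region)
        logits = [sum(a * b for a, b in zip(r, e)) / temperature for e in class_embs]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[label]
    return total / len(samples)


def train_prompts(vocab, classes, samples, dim, steps=100, lr=0.5, eps=1e-5):
    """Optimise only the prefix/postfix vectors by gradient descent, using
    central finite differences instead of backpropagation."""
    params = [0.0] * (2 * dim)  # [prefix..., postfix...], initialised at zero
    start = prompt_loss(params, vocab, classes, samples)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = params.copy(), params.copy()
            up[i] += eps
            down[i] -= eps
            grad.append((prompt_loss(up, vocab, classes, samples)
                         - prompt_loss(down, vocab, classes, samples)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, start, prompt_loss(params, vocab, classes, samples)


# Toy data: the word vectors for "cat" and "dog" are too similar, while the region
# embeddings (standing in for object-centric visual features) are well separated.
vocab = {"cat": [1.0, 0.8], "dog": [0.8, 1.0]}
classes = ["cat", "dog"]
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
params, start, final = train_prompts(vocab, classes, samples, dim=2)
print(f"loss before: {start:.4f}, after: {final:.4f}")
```

In this toy setting the learned prompt vectors shift both class embeddings away from their shared component, so the text-side classifier separates the two region embeddings better while the category word vectors themselves are untouched.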

A whale detector design for the Kaggle whale-detector challenge!

CNN (InceptionV1) + STFT based Whale Detection Algorithm So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The obje

92 Sep 28, 2021

Lane follower: Lane-detector (OpenCV) + Object-detector (YOLO5) + CAN-bus

Lane Follower This code is for the lane follower, including perception and control, as shown below. Environment Hardware Industrial Camera Intel-NUC(1

3 Jul 7, 2022

HeartRate detector with Arduino and Python - Use Arduino and Python to create a heart rate detector.

Syllabus of Contents Introduction Of Project Features Develop With Python code introduction Installation License Developer Contac

1 Jan 5, 2022

Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022

Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

HackED 2022 Team 3IQ - 2022 Imposter Detector By Aneeljyot Alagh, Curtis Kan, Jo

3 Aug 20, 2022

An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

SVM Data: An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing. Preproc

3 Nov 30, 2021

Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

Codebase and data are being uploaded. VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automatically ge

416 Jan 9, 2023

Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

Transformers for variable misuse, function naming and code completion tasks The official PyTorch implementation of: Empirical Study of Transformers fo

56 Nov 15, 2022

Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer"

Transformer-vocabulary-transfer Implementation of the paper "Fine-Tuning Transfo

13 Nov 30, 2022

Official Implementation of DDOD (Disentangle your Dense Object Detector), ACM MM2021

Disentangle Your Dense Object Detector This repo contains the supported code and configuration files to reproduce object detection results of Disentan

51 Jan 7, 2023

Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

24 Dec 7, 2022

SSD: Single Shot MultiBox Detector pytorch implementation focusing on simplicity

SSD: Single Shot MultiBox Detector Introduction Here is my pytorch implementation of 2 models: SSD-Resnet50 and SSDLite-MobilenetV2.

149 Jan 7, 2023

Official code of the paper "ReDet: A Rotation-equivariant Detector for Aerial Object Detection" (CVPR 2021)

ReDet: A Rotation-equivariant Detector for Aerial Object Detection ReDet: A Rotation-equivariant Detector for Aerial Object Detection (CVPR2021), Jiam

334 Dec 23, 2022

Code for one-stage adaptive set-based HOI detector AS-Net.

AS-Net Code for one-stage adaptive set-based HOI detector AS-Net. Mingfei Chen*, Yue Liao*, Si Liu, Zhiyuan Chen, Fei Wang, Chen Qian. "Reformulating

45 Dec 9, 2022

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

1.4k Jan 4, 2023

Code repository for paper `Skeleton Merger: an Unsupervised Aligned Keypoint Detector`.

Skeleton Merger Skeleton Merger, an Unsupervised Aligned Keypoint Detector. The paper is available at https://arxiv.org/abs/2103.10814. A map of the r

48 Nov 14, 2022

YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931)

Introduction Yolov5-face is a real-time, high-accuracy face detector. Performance Single Scale Inference on VGA resolution (max side is equal to 640 an

1.4k Jan 7, 2023

Deformable DETR is an efficient and fast-converging end-to-end object detector.

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

2k Jan 5, 2023

LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

15 Nov 4, 2022
