PromptDet: Expand Your Detector Vocabulary with Uncurated Images

Overview

PromptDet: Expand Your Detector Vocabulary with Uncurated Images

Paper     Website

Introduction

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations. To achieve that, we make the following four contributions: (i) in pursuit of generalisation, we propose a two-stage open-vocabulary object detector that categorises each box proposal by a classifier generated from the text encoder of a pre-trained visual-language model; (ii) To pair the visual latent space (from RPN box proposal) with that of the pre-trained text encoder, we propose the idea of regional prompt learning to optimise a couple of learnable prompt vectors, converting the textual embedding space to fit those visually object-centric images; (iii) To scale up the learning procedure towards detecting a wider spectrum of objects, we exploit the available online resource, iteratively updating the prompts, and later self-training the proposed detector with pseudo labels generated on a large corpus of noisy, uncurated web images. The self-trained detector, termed as PromptDet, significantly improves the detection performance on categories for which manual annotations are unavailable or hard to obtain, e.g. rare categories. Finally, (iv) to validate the necessity of our proposed components, we conduct extensive experiments on the challenging LVIS and MS-COCO dataset, showing superior performance over existing approaches with fewer additional training images and zero manual annotations whatsoever.

Training framework

method overview

Prerequisites

  • MMDetection version 2.16.0.

  • Please see get_started.md for installation and the basic usage of MMDetection.

Inference

./tools/dist_test.sh configs/promptdet/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py work_dirs/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth 4 --eval bbox segm

Train

To be updated.

Models

For your convenience, we provide the following trained models (PromptDet) with mask AP.

Model Epochs Scale Jitter Input Size APnovel APc APf AP Config Download
PromptDet_R_50_FPN_1x 12 640~800 800x800 19.0 18.5 25.8 21.4 config google / baidu
PromptDet_R_50_FPN_6x 72 100~1280 800x800 21.4 23.3 29.3 25.3 config google / baidu

[0] All results are obtained with a single model and without any test time data augmentation such as multi-scale, flipping and etc..
[1] Refer to more details in config files in config/promptdet/.
[2] Extraction code of baidu netdisk: promptdet.

Acknowledgement

Thanks MMDetection team for the wonderful open source project!

Citation

If you find PromptDet useful in your research, please consider citing:

@inproceedings{feng2022promptdet,
    title={PromptDet: Expand Your Detector Vocabulary with Uncurated Images},
    author={Feng, Chengjian and Zhong, Yujie and Jie, Zequn and Chu, Xiangxiang and Ren, Haibing and Wei, Xiaolin and Xie, Weidi and Ma, Lin},
    journal={arXiv preprint arXiv:2203.16513},
    year={2022}
}
Issues
  • COCO embeddings

    COCO embeddings

    Hi, Thank you for sharing your amazing work.

    Can you please share the embeddings used for COCO evaluation ? The LVIS-v1 has only 59 categories common with COCO. Otherwise could you share the learned 1 + 1 prompt vectors so it may be used in any dataset.

    Thank you.

    opened by hanoonaR 3
  • Baseline training configs

    Baseline training configs

    Hi,

    Thank you for sharing your work. I would to like know the training configurations used in your baseline reported in Table 2 in your paper. The implementation details in the paper specifies 1x schedule with lr of 0.02. However, the samples_per_gpu is set to 4 in the shared configuration, https://github.com/fcjian/PromptDet/blob/83467c79114f441cbf4dedc31baf54a9a146e689/configs/promptdet/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py#L35 However, the default training config in mmdet, for Mask-RCNN with FPN for 1x schedule is 8 GPUs and 2 samples per GPU, for effective batch size of 16, and lr of 0.02.

    Could you please specify the the number of GPU's and the batch size and corresponding lr used in your baseline.

    Thank you.

    opened by hanoonaR 1
  • How to train the model?

    How to train the model?

    Thanks for your nice work and precious time! Could you give some examples on how to train the model using existing config files in the configs/promptdet?

    opened by Kyfafyd 1
A whale detector design for the Kaggle whale-detector challenge!

CNN (InceptionV1) + STFT based Whale Detection Algorithm So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The obje

Tarin Ziyaee 92 Sep 28, 2021
Lane follower: Lane-detector (OpenCV) + Object-detector (YOLO5) + CAN-bus

Lane Follower This code is for the lane follower, including perception and control, as shown below. Environment Hardware Industrial Camera Intel-NUC(1

Siqi Fan 2 Jan 14, 2022
HeartRate detector with ArduinoandPython - Use Arduino and Python create a heartrate detector.

Syllabus of Contents Syllabus of Contents Introduction Of Project Features Develop With Python code introduction Installation License Developer Contac

null 1 Jan 5, 2022
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

null 2 Jan 11, 2022
Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

HackED 2022 Team 3IQ - 2022 Imposter Detector By Aneeljyot Alagh, Curtis Kan, Jo

Joshua Ji 4 Jan 27, 2022
An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testingAn image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

SVM Données Une base d’images contient 490 images pour l’apprentissage (400 voitures et 90 bateaux), et encore 21 images pour fait des tests. Prétrait

Achraf Rahouti 3 Nov 30, 2021
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

null 404 Jun 17, 2022
Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

Transformers for variable misuse, function naming and code completion tasks The official PyTorch implementation of: Empirical Study of Transformers fo

Bayesian Methods Research Group 46 Jun 1, 2022
Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer"

Transformer-vocabulary-transfer Implementation of the paper "Fine-Tuning Transfo

LEYA 12 Jun 17, 2022
Official Implementation of DDOD (Disentangle your Dense Object Detector), ACM MM2021

Disentangle Your Dense Object Detector This repo contains the supported code and configuration files to reproduce object detection results of Disentan

loveSnowBest 41 Jun 18, 2022
Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

Johannes G. 22 Jan 18, 2022
SSD: Single Shot MultiBox Detector pytorch implementation focusing on simplicity

SSD: Single Shot MultiBox Detector Introduction Here is my pytorch implementation of 2 models: SSD-Resnet50 and SSDLite-MobilenetV2.

Viet Nguyen 138 Jun 15, 2022
Official code of the paper "ReDet: A Rotation-equivariant Detector for Aerial Object Detection" (CVPR 2021)

ReDet: A Rotation-equivariant Detector for Aerial Object Detection ReDet: A Rotation-equivariant Detector for Aerial Object Detection (CVPR2021), Jiam

csuhan 311 Jun 17, 2022
Code for one-stage adaptive set-based HOI detector AS-Net.

AS-Net Code for one-stage adaptive set-based HOI detector AS-Net. Mingfei Chen*, Yue Liao*, Si Liu, Zhiyuan Chen, Fei Wang, Chen Qian. "Reformulating

Mingfei Chen 44 Mar 30, 2022
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

ZJU3DV 1.1k Jun 29, 2022
Code repository for paper `Skeleton Merger: an Unsupervised Aligned Keypoint Detector`.

Skeleton Merger Skeleton Merger, an Unsupervised Aligned Keypoint Detector. The paper is available at https://arxiv.org/abs/2103.10814. A map of the r

北海若 46 May 26, 2022
YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931)

Introduction Yolov5-face is a real-time,high accuracy face detection. Performance Single Scale Inference on VGA resolution(max side is equal to 640 an

DeepCam Shenzhen 1.1k Jun 21, 2022
Deformable DETR is an efficient and fast-converging end-to-end object detector.

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

null 1.6k Jun 22, 2022
LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

Nan Xiaohu 11 Mar 25, 2022