NanoDet-Plus ⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980KB (INT8) / 1.8MB (FP16), running at 97FPS on a mobile phone 🔥

Overview

NanoDet-Plus

Super fast, high-accuracy, lightweight anchor-free object detection model. Real-time on mobile devices.


  • Super lightweight: the model file is only 980KB (INT8) or 1.8MB (FP16).
  • Super fast: 97FPS (10.23ms) on a mobile ARM CPU.
  • 👍 High accuracy: up to 34.3 mAPval@0.5:0.95, and still real-time on CPU.
  • 🤗 Training friendly: much lower GPU memory cost than other models. A batch size of 80 fits on a GTX 1060 6G.
  • 😎 Easy to deploy: supports various backends including ncnn, MNN, and OpenVINO. An Android demo based on the ncnn inference framework is also provided.

Introduction

NanoDet is an FCOS-style one-stage anchor-free object detection model that uses Generalized Focal Loss as its classification and regression loss.

In NanoDet-Plus, we propose a novel label assignment strategy with a simple assign guidance module (AGM) and a dynamic soft label assigner (DSLA) to solve the optimal label assignment problem in lightweight model training. We also introduce a light feature pyramid called Ghost-PAN to enhance multi-layer feature fusion. These improvements boost the previous NanoDet's detection accuracy by 7 mAP on the COCO dataset.
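
To give a feel for the DSLA idea, here is a minimal, self-contained sketch of dynamic soft label assignment. It is not the repository's implementation (see nanodet/model/head/assigner/dsl_assigner.py for that); the function name, the cost weight of 3.0, and the top-k of 13 are illustrative assumptions:

import torch
import torch.nn.functional as F

def dynamic_soft_label_assign(pred_scores, ious, gt_labels, candidate_topk=13):
    """Toy assigner. pred_scores: (num_priors, num_classes) sigmoid scores;
    ious: (num_priors, num_gt) IoU between predicted and GT boxes;
    gt_labels: (num_gt,) int64 class ids. Returns (num_priors,) holding the
    assigned GT index per prior, or -1 for background."""
    num_priors, num_gt = ious.shape
    # Soft classification target: one-hot GT class scaled by localization quality.
    soft_label = F.one_hot(gt_labels, pred_scores.shape[-1]).float()
    soft_label = soft_label.unsqueeze(0) * ious.unsqueeze(-1)        # (priors, gt, classes)
    scores = pred_scores.unsqueeze(1).expand(-1, num_gt, -1)
    cls_cost = F.binary_cross_entropy(scores, soft_label, reduction="none").sum(-1)
    reg_cost = -torch.log(ious + 1e-7)      # lower cost for well-localized priors
    cost = cls_cost + 3.0 * reg_cost        # 3.0 is an illustrative weight
    # Dynamic k: each GT claims roughly as many priors as its summed top IoUs.
    topk_ious = ious.topk(min(candidate_topk, num_priors), dim=0).values
    dynamic_ks = topk_ious.sum(0).int().clamp(min=1)
    assigned = torch.full((num_priors,), -1, dtype=torch.long)
    for g in range(num_gt):
        idx = cost[:, g].topk(int(dynamic_ks[g]), largest=False).indices
        assigned[idx] = g                   # overlaps resolved naively here
    return assigned

For example, with pred_scores = torch.rand(100, 80), ious = torch.rand(100, 3), and gt_labels = torch.tensor([1, 5, 7]), each box is assigned a handful of its lowest-cost priors.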

NanoDet-Plus introduction on Zhihu (in Chinese)

NanoDet introduction on Zhihu (in Chinese)

QQ group: 908606542 (verification answer: 炼丹)


Benchmarks

| Model | Resolution | mAP val 0.5:0.95 | CPU Latency (i7-8700) | ARM Latency (4xA76) | FLOPS | Params | Model Size |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| NanoDet-m | 320*320 | 20.6 | 4.98ms | 10.23ms | 0.72G | 0.95M | 1.8MB(FP16) / 980KB(INT8) |
| NanoDet-Plus-m | 320*320 | 27.0 | 5.25ms | 11.97ms | 0.9G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
| NanoDet-Plus-m | 416*416 | 30.4 | 8.32ms | 19.77ms | 1.52G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
| NanoDet-Plus-m-1.5x | 320*320 | 29.9 | 7.21ms | 15.90ms | 1.75G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
| NanoDet-Plus-m-1.5x | 416*416 | 34.1 | 11.50ms | 25.49ms | 2.97G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
| YOLOv3-Tiny | 416*416 | 16.6 | - | 37.6ms | 5.62G | 8.86M | 33.7MB |
| YOLOv4-Tiny | 416*416 | 21.7 | - | 32.81ms | 6.96G | 6.06M | 23.0MB |
| YOLOX-Nano | 416*416 | 25.8 | - | 23.08ms | 1.08G | 0.91M | 1.8MB(FP16) |
| YOLOv5-n | 640*640 | 28.4 | - | 44.39ms | 4.5G | 1.9M | 3.8MB(FP16) |
| FBNetV5 | 320*640 | 30.4 | - | - | 1.8G | - | - |
| MobileDet | 320*320 | 25.6 | - | - | 0.9G | - | - |

Download pre-trained models and find more models in the Model Zoo or in the Release Files.

Notes
  • ARM performance is measured on a Kirin 980 (4xA76 + 4xA55) ARM CPU using ncnn. You can test the latency on your own phone with ncnn_android_benchmark.

  • Intel CPU performance is measured on an Intel Core i7-8700 using OpenVINO.

  • NanoDet mAP(0.5:0.95) is validated on the COCO val2017 dataset without test-time augmentation.

  • YOLOv3 and YOLOv4 mAP values are taken from Scaled-YOLOv4: Scaling Cross Stage Partial Network.


NEWS!!!

  • [2021.12.25] NanoDet-Plus released! Adds AGM (Assign Guidance Module) and DSLA (Dynamic Soft Label Assigner), improving mAP by 7 points at very little cost.

Find more update notes in Update notes.

Demo

Android demo


The Android demo project is in the demo_android_ncnn folder. Please refer to the Android demo guide.

Here is a better implementation 👉 ncnn-android-nanodet

NCNN C++ demo

The C++ demo based on ncnn is in the demo_ncnn folder. Please refer to the Cpp demo guide.

MNN demo

Inference using Alibaba's MNN framework is in the demo_mnn folder. Please refer to the MNN demo guide.

OpenVINO demo

Inference using OpenVINO is in the demo_openvino folder. Please refer to the OpenVINO demo guide.

Web browser demo

https://nihui.github.io/ncnn-webassembly-nanodet/

Pytorch demo

First, install the requirements and set up NanoDet following the installation guide. Then download the COCO pre-trained weights from here:

👉 COCO pretrain checkpoint

The pre-trained weights were trained with the config config/nanodet-plus-m_416.yml.

  • Inference on images
python demo/demo.py image --config CONFIG_PATH --model MODEL_PATH --path IMAGE_PATH
  • Inference on a video
python demo/demo.py video --config CONFIG_PATH --model MODEL_PATH --path VIDEO_PATH
  • Inference from a webcam
python demo/demo.py webcam --config CONFIG_PATH --model MODEL_PATH --camid YOUR_CAMERA_ID
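
For example, to run image inference with the COCO checkpoint above (test.jpg stands in for your own image):

python demo/demo.py image --config config/nanodet-plus-m_416.yml --model nanodet-plus-m_416_checkpoint.ckpt --path test.jpg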

We also provide a notebook here to demonstrate how to run inference with PyTorch.
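
The demo can also be driven from Python directly. A condensed sketch based on the Predictor class in demo/demo.py (check the class in your checkout, since its signature may differ between versions):

import torch
from nanodet.util import Logger, cfg, load_config
from demo.demo import Predictor

load_config(cfg, "config/nanodet-plus-m_416.yml")
logger = Logger(local_rank=0, use_tensorboard=False)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
predictor = Predictor(cfg, "nanodet-plus-m_416_checkpoint.ckpt", logger, device=device)
meta, res = predictor.inference("test.jpg")   # res holds per-class detections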


Install

Requirements

  • Linux or macOS
  • CUDA >= 10.0
  • Python >= 3.6
  • PyTorch >= 1.7
  • Experimental Windows support (note: Windows does not support distributed training before PyTorch 1.7)

Step

  1. Create a conda virtual environment and activate it.
 conda create -n nanodet python=3.8 -y
 conda activate nanodet
  2. Install PyTorch.
conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c conda-forge
  3. Install the requirements.
pip install Cython termcolor numpy tensorboard pycocotools matplotlib pyaml opencv-python tqdm pytorch-lightning torchmetrics
  4. Set up NanoDet.
git clone https://github.com/RangiLyu/nanodet.git
cd nanodet
python setup.py develop
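
As a quick sanity check (not part of the official guide), verify that the editable install worked and that CUDA is visible:

python -c "import torch, nanodet; print(torch.__version__, torch.cuda.is_available())"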

Model Zoo

NanoDet supports a variety of backbones. Go to the config folder to see the sample training config files.

| Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
|:---|:---|:---:|:---:|:---:|:---:|:---:|
| NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72G | 0.95M | Download |
| NanoDet-Plus-m-320 (NEW) | ShuffleNetV2 1.0x | 320*320 | 27.0 | 0.9G | 1.17M | Weight / Checkpoint |
| NanoDet-Plus-m-416 (NEW) | ShuffleNetV2 1.0x | 416*416 | 30.4 | 1.52G | 1.17M | Weight / Checkpoint |
| NanoDet-Plus-m-1.5x-320 (NEW) | ShuffleNetV2 1.5x | 320*320 | 29.9 | 1.75G | 2.44M | Weight / Checkpoint |
| NanoDet-Plus-m-1.5x-416 (NEW) | ShuffleNetV2 1.5x | 416*416 | 34.1 | 2.97G | 2.44M | Weight / Checkpoint |

Notice: the Weight file provides only the parameters needed at inference time, while the Checkpoint also contains training-time state.
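
If you only need inference weights from a checkpoint, you can strip the training-time state yourself. A rough sketch; the state_dict nesting and the avg_model. key prefix are assumptions based on the Lightning training task, so inspect your checkpoint's keys first:

import torch

ckpt = torch.load("nanodet-plus-m_416_checkpoint.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)   # Lightning checkpoints nest weights here
# Keep only the (weight-averaged) model parameters; drop optimizer state etc.
weights = {k: v for k, v in state.items() if k.startswith("avg_model.")}
torch.save({"state_dict": weights}, "weights_only.pth")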

Legacy Model Zoo

| Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
|:---|:---|:---:|:---:|:---:|:---:|:---:|
| NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2G | 0.95M | Download |
| NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44G | 2.08M | Download |
| NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42G | 2.08M | Download |
| NanoDet-m-0.5x | ShuffleNetV2 0.5x | 320*320 | 13.5 | 0.3G | 0.28M | Download |
| NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96G | 1.36M | Download |
| NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2G | 3.81M | Download |
| NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72G | 3.11M | Download |
| NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06G | 4.01M | Download |
| NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12G | 4.71M | Download |
| NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3G | 6.75M | Download |

How to Train

  1. Prepare the dataset

    If your dataset annotations are in Pascal VOC XML format, refer to config/nanodet_custom_xml_dataset.yml.

    Otherwise, convert your dataset annotations to MS COCO format (see COCO annotation format details).

  2. Prepare the config file

    Copy and modify an example yml config file in the config/ folder.

    Change save_dir to where you want to save the model.

    Change num_classes in model->arch->head.

    Change the image paths and annotation paths in both data->train and data->val.

    Set the GPU ids, number of workers, and batch size in device to fit your machine.

    Set total_epochs, lr, and lr_schedule according to your dataset and batch size.

    If you want to modify the network, data augmentation, or other settings, please refer to Config File Detail. A minimal sketch of the commonly edited fields is shown below.
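
    The sketch below is illustrative; the field names follow the example configs in this repository, and the paths and values are placeholders:

    save_dir: workspace/my_model
    model:
      arch:
        head:
          num_classes: 1    # for NanoDet-Plus configs, also update num_classes in aux_head
    data:
      train:
        img_path: TRAIN_IMAGE_FOLDER
        ann_path: TRAIN_ANNOTATION_JSON
      val:
        img_path: VAL_IMAGE_FOLDER
        ann_path: VAL_ANNOTATION_JSON
    device:
      gpu_ids: [0]
      workers_per_gpu: 6
      batchsize_per_gpu: 16
    schedule:
      total_epochs: 300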

  3. Start training

    NanoDet now uses PyTorch Lightning for training.

    For both single-GPU and multi-GPU training, run:

    python tools/train.py CONFIG_FILE_PATH
  4. Visualize logs

    TensorBoard logs are saved in the save_dir you set in the config file.

    To visualize them, run:

    cd <YOUR_SAVE_DIR>
    tensorboard --logdir ./

How to Deploy

NanoDet provides multi-backend C++ demos, including ncnn, OpenVINO, and MNN. There is also an Android demo based on the ncnn library.

Export model to ONNX

To convert a NanoDet PyTorch model to ncnn, you can take the route PyTorch -> ONNX -> ncnn.

To export the ONNX model, run tools/export_onnx.py:

python tools/export_onnx.py --cfg_path ${CONFIG_PATH} --model_path ${PYTORCH_MODEL_PATH}
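
Before converting the exported model to ncnn, it is typically simplified first (the same flow community users report in the issues below; the tool paths are illustrative, and onnx2ncnn ships with the ncnn build):

python -m onnxsim output.onnx output-sim.onnx
build/tools/onnx/onnx2ncnn output-sim.onnx output-sim.param output-sim.bin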

Run NanoDet in C++ with inference libraries

ncnn

Please refer to demo_ncnn.

OpenVINO

Please refer to demo_openvino.

MNN

Please refer to demo_mnn.

Run NanoDet on Android

Please refer to android_demo.


Citation

If you find this project useful in your research, please consider citing:

@misc{nanodet,
    title={NanoDet-Plus: Super fast and high accuracy lightweight anchor-free object detection model.},
    author={RangiLyu},
    howpublished = {\url{https://github.com/RangiLyu/nanodet}},
    year={2021}
}

Thanks

https://github.com/Tencent/ncnn

https://github.com/open-mmlab/mmdetection

https://github.com/implus/GFocal

https://github.com/cmdbug/YOLOv5_NCNN

https://github.com/rbgirshick/yacs

Comments
  • Error when testing after training for 10 epochs: 'list' object has no attribute 'cpu'

    File "nanodet-main/nanodet/trainer/trainer.py", line 89, in run_epoch
        results[meta['img_info']['id'].cpu().numpy()[0]] = dets
    AttributeError: 'list' object has no attribute 'cpu'

    opened by DL-Practise 16
  • Training nanodet from scratch

    Hi, I'm training the NanoDet-m model (ShuffleNetV2 1.0x | 320*320) from scratch on the COCO dataset with 4 GeForce RTX 2080 Ti. Convergence seems pretty slow; it could take 1-2 weeks.

    May I ask how long it took for you to reach 20.6 mAP, and which setup you used?

    Thank you.

    bug help wanted 
    opened by Cloudz333 10
  • Questions about deploying the project

    Hello, I'd like to ask two questions:

    1. In nanodet.cpp, inside the NanoDet::detect(cv::Mat image, float score_threshold, float nms_threshold) function, the input is fed to the model with ex.input("input.1", input);. What does input.1 mean here; is it the name of the input layer? How can I find that name through pytorch? print(model) doesn't show layer names, and the examples at Tencent/ncnn/tree/master/examples basically all use ex.input("input", input);. If I load a model I trained myself, how should I match this up?
    2. In nanodet.h there is a std::vector heads_info. What exactly do the values in it mean; are they related to the network outputs?
        std::vector<HeadInfo> heads_info{
            // cls_pred|dis_pred|stride
                {"792", "795",    8},
                {"814", "817",   16},
                {"836", "839",   32},
        };
    

    I'm not very familiar with pytorch or the nano network; sorry about that.

    opened by busyyang 8
  • A small problem when running demo.py

    My environment: cuda==10.1, pytorch==1.7, torchvision==0.8.0. When I ran "python demo/demo.py image --config CONFIG_PATH --model MODEL_PATH --path IMAGE_PATH" to run inference on an image, I got the error: RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

    CPU: registered at /root/project/torchvision/csrc/vision.cpp:59 [kernel] BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback] Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback] AutogradOther: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback] AutogradCPU: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback] AutogradCUDA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback] AutogradXLA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback] Tracer: fallthrough registered at /pytorch/torch/csrc/jit/frontend/tracer.cpp:967 [backend fallback] Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback] Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback] VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

    But after I changed the batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False) function in /nanodet/nanodet/model/module/nms.py as follows:

    boxes_for_nms = boxes_for_nms.cpu()
    scores = scores.cpu()
    boxes = boxes.cpu()
    split_thr = nms_cfg_.pop('split_thr', 10000)
    if len(boxes_for_nms) < split_thr:
        # dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
        keep = nms(boxes_for_nms, scores, **nms_cfg_)
        boxes = boxes[keep]
        # scores = dets[:, -1]
        scores = scores[keep]
    

    demo.py runs normally.

    opened by lidongliang666 8
  • Results got worse after adding mosaic augmentation; what could be the cause?

    In coco.py:

    if self.load_mosaic and not isval:
        img4, labels4, bbox4 = load_mosaic(self, idx)
        meta['img_info']['height'] = img4.shape[0]
        meta['img_info']['width'] = img4.shape[1]
        meta['img'] = img4
        meta['gt_labels'] = labels4
        meta['gt_bboxes'] = bbox4

    meta = self.pipeline(self, meta, input_size)

    meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1))
    return meta
    

    The bboxes printed for testing inside ShapeTransform are normal:

    meta_data["img"] = img
            meta_data["warp_matrix"] = M
            if "gt_bboxes" in meta_data:
                boxes = meta_data["gt_bboxes"]
                meta_data["gt_bboxes"] = warp_boxes(boxes, M, dst_shape[0], dst_shape[1])
            if "gt_masks" in meta_data:
                for i, mask in enumerate(meta_data["gt_masks"]):
                    meta_data["gt_masks"][i] = cv2.warpPerspective(
                        mask, M, dsize=tuple(dst_shape)
                    )
            for i in range(meta_data["gt_bboxes"].shape[0]):
                cv2.rectangle(img, (int(meta_data["gt_bboxes"][i][0]), int(meta_data["gt_bboxes"][i][1])), (int(meta_data["gt_bboxes"][i][2]), int(meta_data["gt_bboxes"][i][3])), (255,0,0), 2)
            cv2.imwrite('./%d.jpg' % int(meta_data["gt_bboxes"][0][0]), img)
    

    What possible causes could lead to this?

    opened by Rokuki 6
  • Cannot find blob with name: dis_pred_stride_8

    I converted the pre-trained model and tested it with demo_ncnn and demo_openvino. The conversion completed normally in both cases, but prediction fails. How can I solve this?

    # demo_ncnn
    find_blob_index_by_name input.1 failed
    Try
    find_blob_index_by_name dis_pred_stride_8 failed
    Try
    find_blob_index_by_name cls_pred_stride_8 failed
    
    # demo_openvino
    start init model
    success
    terminate called after throwing an instance of 'InferenceEngine::details::InferenceEngineException'
    what(): Cannot find blob with name: dis_pred_stride_8
    

    I found that the onnx model contains the dis_pred_stride_8 and related nodes, but these nodes disappear in the converted ncnn model (onnx and ncnn network structure screenshots were attached).

    opened by TTMRonald 6
  • Cannot find blob with name: 795

    I converted the NanoDet-EfficientLite 512x512 model with OpenVINO version 2021.3.394. The conversion works and the model loads successfully in the program, but inference fails with the following log:

    start init model
    success
    terminate called after throwing an instance of 'InferenceEngine::details::InferenceEngineException'
    what(): Cannot find blob with name: 795

    Has anyone run into this?

    opened by deep-practice 6
  • CoreML export failure: 'ConvModule' object has no attribute 'norm'

    Hi, I tried to convert nanodet-m.pth to CoreML for iOS. I used coremltools as in the guide and got the error "CoreML export failure: 'ConvModule' object has no attribute 'norm'". Reading the nanodet source, I found that the norm in the head is BN, which should be supported by CoreML, so I don't know why this error is happening. Has anyone tried CoreML? Thanks!

    opened by ghoshaw 6
  • No result when using a single-class nano model in ncnn

    Hi, I trained a person-class nanodet model, converted it to onnx via tool/export.py, and then converted it to an ncnn model, but the ncnn model produces no output. I changed the class count and image size in the cpp code, and I'm not sure whether the error is in the onnx export or in the onnx->NCNN conversion. Below is the cfg I used for training.

    #Config File example
    save_dir: workspace/nanodet_m
    model:
      arch:
        name: GFL
        backbone:
          name: ShuffleNetV2
          model_size: 1.0x
          out_stages: [2,3,4]
          activation: LeakyReLU
        fpn:
          name: PAN
          in_channels: [116, 232, 464]
          out_channels: 96
          start_level: 0
          num_outs: 3
        head:
          name: NanoDetHead
          num_classes: 1
          input_channel: 96
          feat_channels: 96
          stacked_convs: 2
          share_cls_reg: True
          octave_base_scale: 5
          scales_per_octave: 1
          strides: [8, 16, 32]
          reg_max: 7
          norm_cfg:
            type: BN
          loss:
            loss_qfl:
              name: QualityFocalLoss
              use_sigmoid: True
              beta: 2.0
              loss_weight: 1.0
            loss_dfl:
              name: DistributionFocalLoss
              loss_weight: 0.25
            loss_bbox:
              name: GIoULoss
              loss_weight: 2.0
    data:
      train:
        name: coco
        img_path: ../data/yoga_coco/images/train2017
        ann_path: ../data/yoga_coco/annotations/instances_train2017.json
        input_size: [416,416] #[w,h]
        keep_ratio: True
        pipeline:
          perspective: 0.0
          scale: [0.6, 1.4]
          stretch: [[1, 1], [1, 1]]
          rotation: 0
          shear: 0
          translate: 0.2
          flip: 0.5
          brightness: 0.2
          contrast: [0.8, 1.2]
          saturation: [0.8, 1.2]
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
      val:
        name: coco
        img_path: ../data/yoga_coco/images/val2017
        ann_path: ../data/yoga_coco/annotations/instances_val2017.json
        input_size: [416,416] #[w,h]
        keep_ratio: True
        pipeline:
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
    device:
      gpu_ids: [0]
      workers_per_gpu: 6
      batchsize_per_gpu: 40
    schedule:
    #  resume:
    #  load_model: YOUR_MODEL_PATH
      optimizer:
        name: SGD
        lr: 0.14
        momentum: 0.9
        weight_decay: 0.0001
      warmup:
        name: linear
        steps: 300
        ratio: 0.1
      total_epochs: 50
      lr_schedule:
        name: MultiStepLR
        milestones: [130,160,175,185]
        gamma: 0.1
      val_intervals: 10
    evaluator:
      name: CocoDetectionEvaluator
      save_key: mAP
    
    log:
      interval: 10
    
    class_names: ['person',]
    

    When I use the 80-class model, the ncnn conversion produces results, so I'd like to ask: when converting a single-class model, is there any configuration that needs to be changed as well?

    opened by Sean-hku 6
  • Problems converting pth to onnx to ncnn

    Hello, I'd like to ask about this: I convert the pytorch model to onnx and then to an ncnn model, but detection with the final ncnn model gives wrong results. I made a few changes: I set the val input in the config to 64x64 and the input size in tools/export.py to 64x64, then ran:

    python tools/export.py
    python -m onnxsim output.onnx output-sim.onnx
    build/tools/onnx/onnx2ncnn output-sim.onnx output-sim.param output-sim.bin
    build/tools/ncnnoptimize output-sim.param output-sim.bin new-output-sim.param new-output-sim.bin 0

    Versions: pytorch 1.7.1, onnx 1.8.0, onnx-simplifier 0.2.19, onnxoptimizer 0.1.1, onnxruntime 1.6.0.

    Is there something wrong with these steps?

    opened by yhl41001 6
  • Original pytorch or onnx model

    Could you please also provide the pretrained pytorch or onnx model weights? I noticed you only shared converted ncnn models, but I would like to see the inference speed on gpu/npu-accelerated systems.

    opened by kadirbeytorun 6
  • python tools/train.py config/nanodet-plus-m_320.yml

    Tried: python tools/train.py config/nanodet-plus-m_320.yml

    error: pytorch_lightning.utilities.cloud_io.get_filesystem has been deprecated in v1.8.0 and will be"
    [NanoDet][01-04 10:28:00]INFO:Setting up data...
    loading annotations into memory...
    Done (t=18.55s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.56s)
    creating index...
    index created!
    [NanoDet][01-04 10:28:21]INFO:Creating model...
    model size is 1.0x
    init weights...
    => loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth
    Finish initialize NanoDet-Plus Head.
    GPU available: True (cuda), used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    /root/anaconda3/envs/nanodet/lib/python3.7/site-packages/torch/cuda/init.py:143: UserWarning: NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
      warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

      | Name      | Type        | Params
    ------------------------------------------
    0 | model     | NanoDetPlus | 4.3 M
    1 | avg_model | NanoDetPlus | 4.3 M
    ------------------------------------------
    8.7 M     Trainable params
    0         Non-trainable params
    8.7 M     Total params
    34.647    Total estimated model params size (MB)
    [NanoDet][01-04 10:28:21]INFO:Weight Averaging is enabled
    /root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:229: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 40 which is the number of cpus on this machine) in the DataLoader init to improve performance.
      category=PossibleUserWarning,
    Traceback (most recent call last):
      File "tools/train.py", line 146, in <module>
        main(args)
      File "tools/train.py", line 141, in main
        trainer.fit(task, train_dataloader, val_dataloader)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 604, in fit
        self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
        self._run(model, ckpt_path=self.ckpt_path)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
        results = self._run_stage()
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
        self._run_train()
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train
        self.fit_loop.run()
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/loops/loop.py", line 194, in run
        self.on_run_start(*args, **kwargs)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 206, in on_run_start
        self.trainer.reset_train_dataloader(self.trainer.lightning_module)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1552, in reset_train_dataloader
        if has_len_all_ranks(self.train_dataloader, self.strategy, module)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/utilities/data.py", line 110, in has_len_all_ranks
        if total_length == 0:
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    Environment: python 3.7, cuda 10.2, GPU RTX 3090, Ubuntu 20.04

    Thanks

    opened by molyswu 0
  • Fails to train a model on a dataset with a single class.

    I used COCO 2017 converted to contain only labeled persons. Here is my config:

    save_dir: workspace/nanodet-plus-m_416
    model:
      weight_averager:
        name: ExpMovingAverager
        decay: 0.9998
      arch:
        name: NanoDetPlus
        detach_epoch: 10
        backbone:
          name: ShuffleNetV2
          model_size: 1.0x
          out_stages: [2,3,4]
          activation: LeakyReLU
        fpn:
          name: GhostPAN
          in_channels: [116, 232, 464]
          out_channels: 96
          kernel_size: 5
          num_extra_level: 1
          use_depthwise: True
          activation: LeakyReLU
        head:
          name: NanoDetPlusHead
          num_classes: 1
          input_channel: 96
          feat_channels: 96
          stacked_convs: 2
          kernel_size: 5
          strides: [8, 16, 32, 64]
          activation: LeakyReLU
          reg_max: 1
          norm_cfg:
            type: BN
          loss:
            loss_qfl:
              name: QualityFocalLoss
              use_sigmoid: True
              beta: 2.0
              loss_weight: 1.0
            loss_dfl:
              name: DistributionFocalLoss
              loss_weight: 0.25
            loss_bbox:
              name: GIoULoss
              loss_weight: 2.0
        # Auxiliary head, only use in training time.
        aux_head:
          name: SimpleConvHead
          num_classes: 1
          input_channel: 192
          feat_channels: 192
          stacked_convs: 4
          strides: [8, 16, 32, 64]
          activation: LeakyReLU
          reg_max: 1
    data:
      train:
        name: CocoDataset
        img_path: /home/mosminin/fiftyone/coco_person/train/data
        ann_path: /home/mosminin/fiftyone/coco_person/train/labels.json
        input_size: [416,416] #[w,h]
        keep_ratio: False
        pipeline:
          perspective: 0.0
          scale: [0.6, 1.4]
          stretch: [[0.8, 1.2], [0.8, 1.2]]
          rotation: 0
          shear: 0
          translate: 0.2
          flip: 0.5
          brightness: 0.2
          contrast: [0.6, 1.4]
          saturation: [0.5, 1.2]
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
      val:
        name: CocoDataset
        img_path: /home/mosminin/fiftyone/coco_person/validation/data
        ann_path: /home/mosminin/fiftyone/coco_person/validation/labels.json
        input_size: [416,416] #[w,h]
        keep_ratio: False
        pipeline:
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
    device:
      gpu_ids: [0]
      workers_per_gpu: 6
      batchsize_per_gpu: 16
    schedule:
    #  resume:
    #  load_model:
      optimizer:
        name: AdamW
        lr: 0.001
        weight_decay: 0.05
      warmup:
        name: linear
        steps: 500
        ratio: 0.0001
      total_epochs: 300
      lr_schedule:
        name: CosineAnnealingLR
        T_max: 300
        eta_min: 0.00005
      val_intervals: 10
    grad_clip: 35
    evaluator:
      name: CocoDetectionEvaluator
      save_key: mAP
    log:
      interval: 50
    
    class_names: ['person']
    

    I also changed train.py to use the CPU instead of the GPU so the errors would be more understandable.

        # if cfg.device.gpu_ids == -1:
        #     logger.info("Using CPU training")
        #     accelerator, devices, strategy = "cpu", None, None
        # else:
        #     accelerator, devices, strategy = "gpu", cfg.device.gpu_ids, None
    
        accelerator, devices, strategy = "cpu", None, None # CPU training
    
    

    After running it, I get the following errors.

    (.venv) mosminin@debian:~/dev/nanodet$ python tools/train.py /home/mosminin/dev/nanodet/config/nanodet-plus-m_416_person.yml
    /home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/utilities/cloud_io.py:33: LightningDeprecationWarning: `pytorch_lightning.utilities.cloud_io.get_filesystem` has been deprecated in v1.8.0 and will be removed in v1.10.0. Please use `lightning_lite.utilities.cloud_io.get_filesystem` instead.
      rank_zero_deprecation(
    [NanoDet][12-18 14:05:30]INFO:Setting up data...
    loading annotations into memory...
    Done (t=4.35s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.16s)
    creating index...
    index created!
    [NanoDet][12-18 14:05:35]INFO:Creating model...
    model size is  1.0x
    init weights...
    => loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth
    Finish initialize NanoDet-Plus Head.
    GPU available: True (cuda), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    /home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/setup.py:175: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
      rank_zero_warn(
    
      | Name      | Type        | Params
    ------------------------------------------
    0 | model     | NanoDetPlus | 4.1 M 
    1 | avg_model | NanoDetPlus | 4.1 M 
    ------------------------------------------
    8.2 M     Trainable params
    0         Non-trainable params
    8.2 M     Total params
    32.903    Total estimated model params size (MB)
    [NanoDet][12-18 14:05:35]INFO:Weight Averaging is enabled
    /home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    Traceback (most recent call last):
      File "/home/mosminin/dev/nanodet/tools/train.py", line 147, in <module>
        main(args)
      File "/home/mosminin/dev/nanodet/tools/train.py", line 142, in main
        trainer.fit(task, train_dataloader, val_dataloader)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
        call._call_and_handle_interrupt(
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
        self._run(model, ckpt_path=self.ckpt_path)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
        results = self._run_stage()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
        self._run_train()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train
        self.fit_loop.run()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
        self._outputs = self.epoch_loop.run(self._data_fetcher)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 214, in advance
        batch_output = self.batch_loop.run(kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
        outputs = self.optimizer_loop.run(optimizers, kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 200, in advance
        result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 247, in _run_optimization
        self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 357, in _optimizer_step
        self.trainer._call_lightning_module_hook(
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1342, in _call_lightning_module_hook
        output = fn(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/nanodet/trainer/task.py", line 281, in optimizer_step
        optimizer.step(closure=optimizer_closure)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
        step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 234, in optimizer_step
        return self.precision_plugin.optimizer_step(
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 121, in optimizer_step
        return optimizer.step(closure=closure, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
        return wrapped(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
        out = func(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/optim/adamw.py", line 120, in step
        loss = closure()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 107, in _wrap_closure
        closure_result = closure()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 147, in __call__
        self._result = self.closure(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 133, in closure
        step_output = self._step_fn()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 406, in _training_step
        training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
        output = fn(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
        return self.model.training_step(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/nanodet/trainer/task.py", line 78, in training_step
        preds, loss, loss_states = self.model.forward_train(batch)
      File "/home/mosminin/dev/nanodet/nanodet/model/arch/nanodet_plus.py", line 56, in forward_train
        loss, loss_states = self.head.loss(head_out, gt_meta, aux_preds=aux_head_out)
      File "/home/mosminin/dev/nanodet/nanodet/model/head/nanodet_plus_head.py", line 198, in loss
        batch_assign_res = multi_apply(
      File "/home/mosminin/dev/nanodet/nanodet/util/misc.py", line 24, in multi_apply
        return tuple(map(list, zip(*map_results)))
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/nanodet/model/head/nanodet_plus_head.py", line 314, in target_assign_single_img
        assign_result = self.assigner.assign(
      File "/home/mosminin/dev/nanodet/nanodet/model/head/assigner/dsl_assigner.py", line 86, in assign
        F.one_hot(gt_labels.to(torch.int64), pred_scores.shape[-1])
    RuntimeError: Class values must be smaller than num_classes.
    
    

    What am I doing wrong?

    opened by Octopusmode 0
  • Adapting the code to output a center x, y instead of bounding boxes (x1, y1, x2, y2)

    Hey, I'm not too familiar with machine learning and the like, and I'm not exactly ready to spend the next 2 months (yet) learning how TensorFlow works and such, so I'm hoping someone can assist me with this.

    So far, my experience with nanodet has been great; but manually annotating images takes a lot of time, which I don't have. Because I don't really need the bounding box information anyway, I figured I'd look for a way to output only the center of objects rather than the top-left and bottom-right corners.

    Help would be highly appreciated 😄

    opened by icecreamnotallowed 0
  • The ONNX model output (converted by export_onnx.py) differs from the PyTorch model

    def image_preprocess(img_path):
        img = cv2.imread(img_path).astype("float32") / 255
        # mean = [103.53, 116.28, 123.675]  # ImageNet values
        # std = [57.375, 57.12, 58.395]
        mean = [113.533554, 118.14172, 123.63607]
        std = [21.405144, 21.405144, 21.405144]
        mean = np.array(mean, dtype=np.float32).reshape(1, 1, 3) / 255
        std = np.array(std, dtype=np.float32).reshape(1, 1, 3) / 255
        img = (img - mean) / std
        img = np.transpose(img, (2, 0, 1))
        img = np.expand_dims(img, axis=0)
        return img

    def test_onnx_model(onnx_model, img_path=None):
        if img_path is None:
            img_path = "path for img"
        imgdata = image_preprocess(img_path)
        sess = rt.InferenceSession(onnx_model)
        input_name = sess.get_inputs()[0].name
        output_detect_name = sess.get_outputs()[0].name
        pred_onnx0 = sess.run([output_detect_name], {input_name: imgdata})
        print("outputs:")
        print(np.array(pred_onnx0))

    opened by Genlk 0
  • Fixes a couple of issues to add fp16 training support

    There were a couple of issues when trying to use fp16 training. One was that fp16 was not exposed through the configuration system. The other was that the DynamicSoftLabelAssigner used binary_cross_entropy instead of binary_cross_entropy_with_logits. This PR changes where sigmoid is called on the predictions so that the more stable binary_cross_entropy_with_logits can be used and the Trainer can be configured to use fp16 precision.

    opened by crisp-snakey 0
Releases
  • v1.0.0-alpha-1(Dec 26, 2021)

    NanoDet-Plus v1.0.0-alpha

    In NanoDet-Plus, we propose a novel label assignment strategy with a simple assign guidance module (AGM) and a dynamic soft label assigner (DSLA) to solve the optimal label assignment problem in lightweight model training. We also introduce a light feature pyramid called Ghost-PAN to enhance multi-layer feature fusion. These improvements boost previous NanoDet's detection accuracy by 7 mAP on COCO dataset.


    | Model | Resolution | mAP val 0.5:0.95 | CPU Latency (i7-8700) | ARM Latency (4xA76) | FLOPS | Params | Model Size |
    |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | 320*320 | 20.6 | 4.98ms | 10.23ms | 0.72G | 0.95M | 1.8MB(FP16) / 980KB(INT8) |
    | NanoDet-Plus-m | 320*320 | 27.0 | 5.25ms | 11.97ms | 0.9G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
    | NanoDet-Plus-m | 416*416 | 30.4 | 8.32ms | 19.77ms | 1.52G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
    | NanoDet-Plus-m-1.5x | 320*320 | 29.9 | 7.21ms | 15.90ms | 1.75G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
    | NanoDet-Plus-m-1.5x | 416*416 | 34.1 | 11.50ms | 25.49ms | 2.97G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
    | YOLOv3-Tiny | 416*416 | 16.6 | - | 37.6ms | 5.62G | 8.86M | 33.7MB |
    | YOLOv4-Tiny | 416*416 | 21.7 | - | 32.81ms | 6.96G | 6.06M | 23.0MB |
    | YOLOX-Nano | 416*416 | 25.8 | - | 23.08ms | 1.08G | 0.91M | 1.8MB(FP16) |
    | YOLOv5-n | 640*640 | 28.4 | - | 44.39ms | 4.5G | 1.9M | 3.8MB(FP16) |
    | FBNetV5 | 320*640 | 30.4 | - | - | 1.8G | - | - |
    | MobileDet | 320*320 | 25.6 | - | - | 0.9G | - | - |

    Model checkpoints and weights

    Download in the release files.

    Source code(tar.gz)
    Source code(zip)
    nanodet-plus-m-1.5x_320.onnx(9.43 MB)
    nanodet-plus-m-1.5x_320_checkpoint.ckpt(61.63 MB)
    nanodet-plus-m-1.5x_416.onnx(9.43 MB)
    nanodet-plus-m-1.5x_416_checkpoint.ckpt(61.63 MB)
    nanodet-plus-m-1.5x_416_ncnn.zip(4.40 MB)
    nanodet-plus-m-1.5x_416_openvino.zip(4.39 MB)
    nanodet-plus-m_320.onnx(4.57 MB)
    nanodet-plus-m_320_checkpoint.ckpt(33.82 MB)
    nanodet-plus-m_416.onnx(4.57 MB)
    nanodet-plus-m_416_checkpoint.ckpt(33.82 MB)
    nanodet-plus-m_416_mnn.mnn(4.59 MB)
    nanodet-plus-m_416_ncnn.zip(2.11 MB)
    nanodet-plus-m_416_openvino.zip(2.11 MB)
  • v0.4.2(Aug 22, 2021)

    v0.4.2

    Fixes some compatibility issues of NanoDet v0.4:

    Fix pytorch-lightning compatibility. (#304, #309)
    Fix pytorch 1.9 compatibility. (#308)
    Support not raising an error when evaluating with empty results. (#310)

    I'm doing a lot of refactoring. NanoDet v1.x is coming soon.

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight | ncnn model | ncnn-int8 |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download | Download | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download | Download | Download |
    | NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44B | 2.08M | Download | Download | Download |
    | NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42B | 2.08M | Download | Download | Download |
    | NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download | | |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download | | |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download | | |

    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(Jul 17, 2021)

    v0.4.1

    This is a final release of NanoDet v0.x.

    I'm doing a lot of refactoring. NanoDet v1.x is coming soon.

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight | ncnn model | ncnn-int8 |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download | Download | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download | Download | Download |
    | NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44B | 2.08M | Download | Download | Download |
    | NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42B | 2.08M | Download | Download | Download |
    | NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download | | |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download | | |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download | | |

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Jun 8, 2021)

    What's new in v0.4.0

    1. Fix a little bug in demo.py by BlainWu (#210)
    2. Add script to export TorchScript model by strawberrypie (#211)
    3. Use fixed output names when exporting ONNX (#218)
    4. Use scale_factor instead of fixed size in resize to support dynamic shape inference (#218)
    5. Ensure num_classes equal len(class_names) by ZHEQIUSHUI (#221)
    6. Fix a bug in mnn demo while using GPU device by AcherStyx (#234)
    7. Fix with_last_conv bug in shufflenet (#239)
    8. Support batch eval (#241)
    9. Add nanodet-m-1.5x models (#242)
    10. Update model benchmark (#246)
    11. Prevent lightning Trainer from disabling cudnn.benchmark (#249)
    12. Fix multi-GPU evaluation bug with pytorch-lightning (#254)

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44B | 2.08M | Download |
    | NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42B | 2.08M | Download |
    | NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Download ncnn models below

    Source code(tar.gz)
    Source code(zip)
    ncnn-nanodet-m-1.5x-416-int8.zip(1.82 MB)
    ncnn-nanodet-m-1.5x-416.zip(3.67 MB)
    ncnn-nanodet-m-1.5x-int8.zip(1.82 MB)
    ncnn-nanodet-m-1.5x.zip(3.66 MB)
    ncnn-nanodet-m-416-int8.zip(882.58 KB)
    ncnn-nanodet-m-416.zip(1.64 MB)
    ncnn-nanodet-m-int8.zip(888.76 KB)
    ncnn-nanodet-m.zip(1.64 MB)
  • v0.3.0(Apr 11, 2021)

    What's new in v0.3.0

    1. Refactor training and testing code with pytorch-lightning.
    2. Solving ONNX inference AxisError by zshn25 (#198).

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-t (NEW) | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Source code(tar.gz)
    Source code(zip)
    nanodet_m_ncnn_model.zip(1.64 MB)
  • v0.2.0(Mar 29, 2021)

    What's new in v0.2.0

    1. Add pyncnn demo by caishanli (#167).
    2. Fix ncnn demo build failure without vulkan by nihui (#168).
    3. Add NanoDet-t with Transformer Attention Network (#183).
    4. Add Notebook demo by zhiqwang (#188).
    5. Add feature of saving demo inference result by wwdok (#191).
    6. Fix utf-8 decode bug (#184).
    7. Fix test bug.

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-t (NEW) | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Mar 7, 2021)

    What's new in v0.1.0

    1. Support MNN python and cpp inference (#83 ).
    2. Support OpenVINO inference.
    3. Support libtorch inference experimentally.
    4. Add NanoDet-g.
    5. Add EfficientNet-Lite and Rep-VGG backbone.
    6. Add Model Zoo and provide more pre-trained model.
    7. Refactor GFL head (#154 ).

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1(Nov 22, 2020)
