NanoDet-Plus ⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980KB (INT8) / 1.8MB (FP16), running at 97FPS on a mobile phone 🔥

Overview

NanoDet-Plus

Super fast, high-accuracy, lightweight anchor-free object detection model. Real-time on mobile devices.


  • Super lightweight: the model file is only 980KB (INT8) or 1.8MB (FP16).
  • Super fast: 97FPS (10.23ms) on a mobile ARM CPU.
  • 👍 High accuracy: up to 34.3 mAPval@0.5:0.95, and still real-time on CPU.
  • 🤗 Training friendly: much lower GPU memory cost than other models. A batch size of 80 fits on a GTX 1060 6G.
  • 😎 Easy to deploy: supports various backends including ncnn, MNN, and OpenVINO. An Android demo based on the ncnn inference framework is also provided.

Introduction

NanoDet is an FCOS-style one-stage anchor-free object detection model that uses Generalized Focal Loss as its classification and regression loss.

In NanoDet-Plus, we propose a novel label assignment strategy with a simple assign guidance module (AGM) and a dynamic soft label assigner (DSLA) to solve the optimal label assignment problem in lightweight model training. We also introduce a light feature pyramid called Ghost-PAN to enhance multi-layer feature fusion. These improvements boost the previous NanoDet's detection accuracy by 7 mAP on the COCO dataset.
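
To give a feel for the DSLA idea, here is a minimal, self-contained sketch of dynamic soft label assignment. It is not the repository's implementation (see nanodet/model/head/assigner/dsl_assigner.py for that); the function name, the cost weight of 3.0, and the top-k of 13 are illustrative assumptions:

import torch
import torch.nn.functional as F

def dynamic_soft_label_assign(pred_scores, ious, gt_labels, candidate_topk=13):
    """Toy assigner. pred_scores: (num_priors, num_classes) sigmoid scores;
    ious: (num_priors, num_gt) IoU between predicted and GT boxes;
    gt_labels: (num_gt,) int64 class ids. Returns (num_priors,) holding the
    assigned GT index per prior, or -1 for background."""
    num_priors, num_gt = ious.shape
    # Soft classification target: one-hot GT class scaled by localization quality.
    soft_label = F.one_hot(gt_labels, pred_scores.shape[-1]).float()
    soft_label = soft_label.unsqueeze(0) * ious.unsqueeze(-1)        # (priors, gt, classes)
    scores = pred_scores.unsqueeze(1).expand(-1, num_gt, -1)
    cls_cost = F.binary_cross_entropy(scores, soft_label, reduction="none").sum(-1)
    reg_cost = -torch.log(ious + 1e-7)      # lower cost for well-localized priors
    cost = cls_cost + 3.0 * reg_cost        # 3.0 is an illustrative weight
    # Dynamic k: each GT claims roughly as many priors as its summed top IoUs.
    topk_ious = ious.topk(min(candidate_topk, num_priors), dim=0).values
    dynamic_ks = topk_ious.sum(0).int().clamp(min=1)
    assigned = torch.full((num_priors,), -1, dtype=torch.long)
    for g in range(num_gt):
        idx = cost[:, g].topk(int(dynamic_ks[g]), largest=False).indices
        assigned[idx] = g                   # overlaps resolved naively here
    return assigned

For example, with pred_scores = torch.rand(100, 80), ious = torch.rand(100, 3), and gt_labels = torch.tensor([1, 5, 7]), each box is assigned a handful of its lowest-cost priors.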

NanoDet-Plus introduction on Zhihu (in Chinese)

NanoDet introduction on Zhihu (in Chinese)

QQ group: 908606542 (verification answer: 炼丹)


Benchmarks

| Model | Resolution | mAP val 0.5:0.95 | CPU Latency (i7-8700) | ARM Latency (4xA76) | FLOPS | Params | Model Size |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| NanoDet-m | 320*320 | 20.6 | 4.98ms | 10.23ms | 0.72G | 0.95M | 1.8MB(FP16) / 980KB(INT8) |
| NanoDet-Plus-m | 320*320 | 27.0 | 5.25ms | 11.97ms | 0.9G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
| NanoDet-Plus-m | 416*416 | 30.4 | 8.32ms | 19.77ms | 1.52G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
| NanoDet-Plus-m-1.5x | 320*320 | 29.9 | 7.21ms | 15.90ms | 1.75G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
| NanoDet-Plus-m-1.5x | 416*416 | 34.1 | 11.50ms | 25.49ms | 2.97G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
| YOLOv3-Tiny | 416*416 | 16.6 | - | 37.6ms | 5.62G | 8.86M | 33.7MB |
| YOLOv4-Tiny | 416*416 | 21.7 | - | 32.81ms | 6.96G | 6.06M | 23.0MB |
| YOLOX-Nano | 416*416 | 25.8 | - | 23.08ms | 1.08G | 0.91M | 1.8MB(FP16) |
| YOLOv5-n | 640*640 | 28.4 | - | 44.39ms | 4.5G | 1.9M | 3.8MB(FP16) |
| FBNetV5 | 320*640 | 30.4 | - | - | 1.8G | - | - |
| MobileDet | 320*320 | 25.6 | - | - | 0.9G | - | - |

Download pre-trained models and find more models in the Model Zoo or in the Release Files.

Notes
  • ARM performance is measured on a Kirin 980 (4xA76 + 4xA55) ARM CPU using ncnn. You can test the latency on your own phone with ncnn_android_benchmark.

  • Intel CPU performance is measured on an Intel Core i7-8700 using OpenVINO.

  • NanoDet mAP(0.5:0.95) is validated on the COCO val2017 dataset without test-time augmentation.

  • YOLOv3 and YOLOv4 mAP values are taken from Scaled-YOLOv4: Scaling Cross Stage Partial Network.


NEWS!!!

  • [2021.12.25] NanoDet-Plus released! Adds AGM (Assign Guidance Module) and DSLA (Dynamic Soft Label Assigner), improving mAP by 7 points at very little cost.

Find more update notes in Update notes.

Demo

Android demo


The Android demo project is in the demo_android_ncnn folder. Please refer to the Android demo guide.

Here is a better implementation 👉 ncnn-android-nanodet

NCNN C++ demo

The C++ demo based on ncnn is in the demo_ncnn folder. Please refer to the Cpp demo guide.

MNN demo

Inference using Alibaba's MNN framework is in the demo_mnn folder. Please refer to the MNN demo guide.

OpenVINO demo

Inference using OpenVINO is in the demo_openvino folder. Please refer to the OpenVINO demo guide.

Web browser demo

https://nihui.github.io/ncnn-webassembly-nanodet/

Pytorch demo

First, install the requirements and set up NanoDet following the installation guide. Then download the COCO pre-trained weights from here:

👉 COCO pretrain checkpoint

The pre-trained weights were trained with the config config/nanodet-plus-m_416.yml.

  • Inference on images
python demo/demo.py image --config CONFIG_PATH --model MODEL_PATH --path IMAGE_PATH
  • Inference on a video
python demo/demo.py video --config CONFIG_PATH --model MODEL_PATH --path VIDEO_PATH
  • Inference from a webcam
python demo/demo.py webcam --config CONFIG_PATH --model MODEL_PATH --camid YOUR_CAMERA_ID
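
For example, to run image inference with the COCO checkpoint above (test.jpg stands in for your own image):

python demo/demo.py image --config config/nanodet-plus-m_416.yml --model nanodet-plus-m_416_checkpoint.ckpt --path test.jpg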

We also provide a notebook here to demonstrate how to run inference with PyTorch.
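
The demo can also be driven from Python directly. A condensed sketch based on the Predictor class in demo/demo.py (check the class in your checkout, since its signature may differ between versions):

import torch
from nanodet.util import Logger, cfg, load_config
from demo.demo import Predictor

load_config(cfg, "config/nanodet-plus-m_416.yml")
logger = Logger(local_rank=0, use_tensorboard=False)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
predictor = Predictor(cfg, "nanodet-plus-m_416_checkpoint.ckpt", logger, device=device)
meta, res = predictor.inference("test.jpg")   # res holds per-class detections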


Install

Requirements

  • Linux or macOS
  • CUDA >= 10.0
  • Python >= 3.6
  • PyTorch >= 1.7
  • Experimental Windows support (note: Windows does not support distributed training before PyTorch 1.7)

Step

  1. Create a conda virtual environment and activate it.
 conda create -n nanodet python=3.8 -y
 conda activate nanodet
  2. Install PyTorch.
conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c conda-forge
  3. Install the requirements.
pip install Cython termcolor numpy tensorboard pycocotools matplotlib pyaml opencv-python tqdm pytorch-lightning torchmetrics
  4. Set up NanoDet.
git clone https://github.com/RangiLyu/nanodet.git
cd nanodet
python setup.py develop
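
As a quick sanity check (not part of the official guide), verify that the editable install worked and that CUDA is visible:

python -c "import torch, nanodet; print(torch.__version__, torch.cuda.is_available())"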

Model Zoo

NanoDet supports a variety of backbones. Go to the config folder to see the sample training config files.

| Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
|:---|:---|:---:|:---:|:---:|:---:|:---:|
| NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72G | 0.95M | Download |
| NanoDet-Plus-m-320 (NEW) | ShuffleNetV2 1.0x | 320*320 | 27.0 | 0.9G | 1.17M | Weight / Checkpoint |
| NanoDet-Plus-m-416 (NEW) | ShuffleNetV2 1.0x | 416*416 | 30.4 | 1.52G | 1.17M | Weight / Checkpoint |
| NanoDet-Plus-m-1.5x-320 (NEW) | ShuffleNetV2 1.5x | 320*320 | 29.9 | 1.75G | 2.44M | Weight / Checkpoint |
| NanoDet-Plus-m-1.5x-416 (NEW) | ShuffleNetV2 1.5x | 416*416 | 34.1 | 2.97G | 2.44M | Weight / Checkpoint |

Notice: the Weight file provides only the parameters needed at inference time, while the Checkpoint also contains training-time state.
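
If you only need inference weights from a checkpoint, you can strip the training-time state yourself. A rough sketch; the state_dict nesting and the avg_model. key prefix are assumptions based on the Lightning training task, so inspect your checkpoint's keys first:

import torch

ckpt = torch.load("nanodet-plus-m_416_checkpoint.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)   # Lightning checkpoints nest weights here
# Keep only the (weight-averaged) model parameters; drop optimizer state etc.
weights = {k: v for k, v in state.items() if k.startswith("avg_model.")}
torch.save({"state_dict": weights}, "weights_only.pth")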

Legacy Model Zoo

| Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
|:---|:---|:---:|:---:|:---:|:---:|:---:|
| NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2G | 0.95M | Download |
| NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44G | 2.08M | Download |
| NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42G | 2.08M | Download |
| NanoDet-m-0.5x | ShuffleNetV2 0.5x | 320*320 | 13.5 | 0.3G | 0.28M | Download |
| NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96G | 1.36M | Download |
| NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2G | 3.81M | Download |
| NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72G | 3.11M | Download |
| NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06G | 4.01M | Download |
| NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12G | 4.71M | Download |
| NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3G | 6.75M | Download |

How to Train

  1. Prepare the dataset

    If your dataset annotations are in Pascal VOC XML format, refer to config/nanodet_custom_xml_dataset.yml.

    Otherwise, convert your dataset annotations to MS COCO format (see COCO annotation format details).

  2. Prepare the config file

    Copy and modify an example yml config file in the config/ folder.

    Change save_dir to where you want to save the model.

    Change num_classes in model->arch->head.

    Change the image paths and annotation paths in both data->train and data->val.

    Set the GPU ids, number of workers, and batch size in device to fit your machine.

    Set total_epochs, lr, and lr_schedule according to your dataset and batch size.

    If you want to modify the network, data augmentation, or other settings, please refer to Config File Detail. A minimal sketch of the commonly edited fields is shown below.
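
    The sketch below is illustrative; the field names follow the example configs in this repository, and the paths and values are placeholders:

    save_dir: workspace/my_model
    model:
      arch:
        head:
          num_classes: 1    # for NanoDet-Plus configs, also update num_classes in aux_head
    data:
      train:
        img_path: TRAIN_IMAGE_FOLDER
        ann_path: TRAIN_ANNOTATION_JSON
      val:
        img_path: VAL_IMAGE_FOLDER
        ann_path: VAL_ANNOTATION_JSON
    device:
      gpu_ids: [0]
      workers_per_gpu: 6
      batchsize_per_gpu: 16
    schedule:
      total_epochs: 300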

  3. Start training

    NanoDet now uses PyTorch Lightning for training.

    For both single-GPU and multi-GPU training, run:

    python tools/train.py CONFIG_FILE_PATH
  4. Visualize logs

    TensorBoard logs are saved in the save_dir you set in the config file.

    To visualize them, run:

    cd <YOUR_SAVE_DIR>
    tensorboard --logdir ./

How to Deploy

NanoDet provides multi-backend C++ demos, including ncnn, OpenVINO, and MNN. There is also an Android demo based on the ncnn library.

Export model to ONNX

To convert a NanoDet PyTorch model to ncnn, you can take the route PyTorch -> ONNX -> ncnn.

To export the ONNX model, run tools/export_onnx.py:

python tools/export_onnx.py --cfg_path ${CONFIG_PATH} --model_path ${PYTORCH_MODEL_PATH}
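
Before converting the exported model to ncnn, it is typically simplified first (the same flow community users report in the issues below; the tool paths are illustrative, and onnx2ncnn ships with the ncnn build):

python -m onnxsim output.onnx output-sim.onnx
build/tools/onnx/onnx2ncnn output-sim.onnx output-sim.param output-sim.bin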

Run NanoDet in C++ with inference libraries

ncnn

Please refer to demo_ncnn.

OpenVINO

Please refer to demo_openvino.

MNN

Please refer to demo_mnn.

Run NanoDet on Android

Please refer to android_demo.


Citation

If you find this project useful in your research, please consider citing:

@misc{nanodet,
    title={NanoDet-Plus: Super fast and high accuracy lightweight anchor-free object detection model.},
    author={RangiLyu},
    howpublished = {\url{https://github.com/RangiLyu/nanodet}},
    year={2021}
}

Thanks

https://github.com/Tencent/ncnn

https://github.com/open-mmlab/mmdetection

https://github.com/implus/GFocal

https://github.com/cmdbug/YOLOv5_NCNN

https://github.com/rbgirshick/yacs

Comments
  • Error when testing after training for 10 epochs: 'list' object has no attribute 'cpu'

    File "nanodet-main/nanodet/trainer/trainer.py", line 89, in run_epoch
        results[meta['img_info']['id'].cpu().numpy()[0]] = dets
    AttributeError: 'list' object has no attribute 'cpu'

    opened by DL-Practise 16
  • Training nanodet from scratch

    Hi, I'm training the NanoDet-m model (ShuffleNetV2 1.0x | 320*320) from scratch on the COCO dataset with 4 GeForce RTX 2080 Ti. Convergence seems pretty slow; it could take 1-2 weeks.

    May I ask how long it took for you to reach 20.6 mAP, and which setup you used?

    Thank you.

    bug help wanted 
    opened by Cloudz333 10
  • Questions about deploying the project

    Hello, I'd like to ask two questions:

    1. In nanodet.cpp, inside the NanoDet::detect(cv::Mat image, float score_threshold, float nms_threshold) function, the input is fed to the model with ex.input("input.1", input);. What does input.1 mean here; is it the name of the input layer? How can I find that name through pytorch? print(model) doesn't show layer names, and the examples at Tencent/ncnn/tree/master/examples basically all use ex.input("input", input);. If I load a model I trained myself, how should I match this up?
    2. In nanodet.h there is a std::vector heads_info. What exactly do the values in it mean; are they related to the network outputs?
        std::vector<HeadInfo> heads_info{
            // cls_pred|dis_pred|stride
                {"792", "795",    8},
                {"814", "817",   16},
                {"836", "839",   32},
        };
    

    I'm not very familiar with pytorch or the nano network; sorry about that.

    opened by busyyang 8
  • A small problem when running demo.py

    My environment: cuda==10.1, pytorch==1.7, torchvision==0.8.0. When I ran "python demo/demo.py image --config CONFIG_PATH --model MODEL_PATH --path IMAGE_PATH" to run inference on an image, I got the error: RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

    CPU: registered at /root/project/torchvision/csrc/vision.cpp:59 [kernel] BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback] Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback] AutogradOther: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback] AutogradCPU: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback] AutogradCUDA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback] AutogradXLA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback] Tracer: fallthrough registered at /pytorch/torch/csrc/jit/frontend/tracer.cpp:967 [backend fallback] Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback] Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback] VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

    But after I changed the batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False) function in /nanodet/nanodet/model/module/nms.py as follows:

    boxes_for_nms = boxes_for_nms.cpu()
    scores = scores.cpu()
    boxes = boxes.cpu()
    split_thr = nms_cfg_.pop('split_thr', 10000)
    if len(boxes_for_nms) < split_thr:
        # dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
        keep = nms(boxes_for_nms, scores, **nms_cfg_)
        boxes = boxes[keep]
        # scores = dets[:, -1]
        scores = scores[keep]
    

    demo.py runs normally.

    opened by lidongliang666 8
  • Results got worse after adding mosaic augmentation; what could be the cause?

    In coco.py:

    if self.load_mosaic and not isval:
        img4, labels4, bbox4 = load_mosaic(self, idx)
        meta['img_info']['height'] = img4.shape[0]
        meta['img_info']['width'] = img4.shape[1]
        meta['img'] = img4
        meta['gt_labels'] = labels4
        meta['gt_bboxes'] = bbox4

    meta = self.pipeline(self, meta, input_size)

    meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1))
    return meta
    

    The bboxes printed for testing inside ShapeTransform are normal:

    meta_data["img"] = img
            meta_data["warp_matrix"] = M
            if "gt_bboxes" in meta_data:
                boxes = meta_data["gt_bboxes"]
                meta_data["gt_bboxes"] = warp_boxes(boxes, M, dst_shape[0], dst_shape[1])
            if "gt_masks" in meta_data:
                for i, mask in enumerate(meta_data["gt_masks"]):
                    meta_data["gt_masks"][i] = cv2.warpPerspective(
                        mask, M, dsize=tuple(dst_shape)
                    )
            for i in range(meta_data["gt_bboxes"].shape[0]):
                cv2.rectangle(img, (int(meta_data["gt_bboxes"][i][0]), int(meta_data["gt_bboxes"][i][1])), (int(meta_data["gt_bboxes"][i][2]), int(meta_data["gt_bboxes"][i][3])), (255,0,0), 2)
            cv2.imwrite('./%d.jpg' % int(meta_data["gt_bboxes"][0][0]), img)
    

    What possible causes could lead to this?

    opened by Rokuki 6
  • Cannot find blob with name: dis_pred_stride_8

    I converted the pre-trained model and tested it with demo_ncnn and demo_openvino. The conversion completed normally in both cases, but prediction fails. How can I solve this?

    # demo_ncnn
    find_blob_index_by_name input.1 failed
    Try
    find_blob_index_by_name dis_pred_stride_8 failed
    Try
    find_blob_index_by_name cls_pred_stride_8 failed
    
    # demo_openvino
    start init model
    success
    terminate called after throwing an instance of 'InferenceEngine::details::InferenceEngineException'
    what(): Cannot find blob with name: dis_pred_stride_8
    

    I found that the onnx model contains the dis_pred_stride_8 and related nodes, but these nodes disappear in the converted ncnn model (onnx and ncnn network structure screenshots were attached).

    opened by TTMRonald 6
  • Cannot find blob with name: 795

    I converted the NanoDet-EfficientLite 512x512 model with OpenVINO version 2021.3.394. The conversion works and the model loads successfully in the program, but inference fails with the following log:

    start init model
    success
    terminate called after throwing an instance of 'InferenceEngine::details::InferenceEngineException'
    what(): Cannot find blob with name: 795

    Has anyone run into this?

    opened by deep-practice 6
  • CoreML export failure: 'ConvModule' object has no attribute 'norm'

    Hi, I tried to convert nanodet-m.pth to CoreML for iOS. I used coremltools as in the guide and got the error "CoreML export failure: 'ConvModule' object has no attribute 'norm'". Reading the nanodet source, I found that the norm in the head is BN, which should be supported by CoreML, so I don't know why this error is happening. Has anyone tried CoreML? Thanks!

    opened by ghoshaw 6
  • No result when using a single-class nano model in ncnn

    Hi, I trained a person-class nanodet model, converted it to onnx via tool/export.py, and then converted it to an ncnn model, but the ncnn model produces no output. I changed the class count and image size in the cpp code, and I'm not sure whether the error is in the onnx export or in the onnx->NCNN conversion. Below is the cfg I used for training.

    #Config File example
    save_dir: workspace/nanodet_m
    model:
      arch:
        name: GFL
        backbone:
          name: ShuffleNetV2
          model_size: 1.0x
          out_stages: [2,3,4]
          activation: LeakyReLU
        fpn:
          name: PAN
          in_channels: [116, 232, 464]
          out_channels: 96
          start_level: 0
          num_outs: 3
        head:
          name: NanoDetHead
          num_classes: 1
          input_channel: 96
          feat_channels: 96
          stacked_convs: 2
          share_cls_reg: True
          octave_base_scale: 5
          scales_per_octave: 1
          strides: [8, 16, 32]
          reg_max: 7
          norm_cfg:
            type: BN
          loss:
            loss_qfl:
              name: QualityFocalLoss
              use_sigmoid: True
              beta: 2.0
              loss_weight: 1.0
            loss_dfl:
              name: DistributionFocalLoss
              loss_weight: 0.25
            loss_bbox:
              name: GIoULoss
              loss_weight: 2.0
    data:
      train:
        name: coco
        img_path: ../data/yoga_coco/images/train2017
        ann_path: ../data/yoga_coco/annotations/instances_train2017.json
        input_size: [416,416] #[w,h]
        keep_ratio: True
        pipeline:
          perspective: 0.0
          scale: [0.6, 1.4]
          stretch: [[1, 1], [1, 1]]
          rotation: 0
          shear: 0
          translate: 0.2
          flip: 0.5
          brightness: 0.2
          contrast: [0.8, 1.2]
          saturation: [0.8, 1.2]
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
      val:
        name: coco
        img_path: ../data/yoga_coco/images/val2017
        ann_path: ../data/yoga_coco/annotations/instances_val2017.json
        input_size: [416,416] #[w,h]
        keep_ratio: True
        pipeline:
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
    device:
      gpu_ids: [0]
      workers_per_gpu: 6
      batchsize_per_gpu: 40
    schedule:
    #  resume:
    #  load_model: YOUR_MODEL_PATH
      optimizer:
        name: SGD
        lr: 0.14
        momentum: 0.9
        weight_decay: 0.0001
      warmup:
        name: linear
        steps: 300
        ratio: 0.1
      total_epochs: 50
      lr_schedule:
        name: MultiStepLR
        milestones: [130,160,175,185]
        gamma: 0.1
      val_intervals: 10
    evaluator:
      name: CocoDetectionEvaluator
      save_key: mAP
    
    log:
      interval: 10
    
    class_names: ['person',]
    

    When I use the 80-class model, the ncnn conversion produces results, so I'd like to ask: when converting a single-class model, is there any configuration that needs to be changed as well?

    opened by Sean-hku 6
  • Problems converting pth to onnx to ncnn

    Hello, I'd like to ask about this: I convert the pytorch model to onnx and then to an ncnn model, but detection with the final ncnn model gives wrong results. I made a few changes: I set the val input in the config to 64x64 and the input size in tools/export.py to 64x64, then ran:

    python tools/export.py
    python -m onnxsim output.onnx output-sim.onnx
    build/tools/onnx/onnx2ncnn output-sim.onnx output-sim.param output-sim.bin
    build/tools/ncnnoptimize output-sim.param output-sim.bin new-output-sim.param new-output-sim.bin 0

    Versions: pytorch 1.7.1, onnx 1.8.0, onnx-simplifier 0.2.19, onnxoptimizer 0.1.1, onnxruntime 1.6.0.

    Is there something wrong with these steps?

    opened by yhl41001 6
  • Original pytorch or onnx model

    Could you please also provide the pretrained pytorch or onnx model weights? I noticed you only shared converted ncnn models, but I would like to see the inference speed on gpu/npu-accelerated systems.

    opened by kadirbeytorun 6
  • python tools/train.py config/nanodet-plus-m_320.yml

    Tried: python tools/train.py config/nanodet-plus-m_320.yml

    error: pytorch_lightning.utilities.cloud_io.get_filesystem has been deprecated in v1.8.0 and will be"
    [NanoDet][01-04 10:28:00]INFO:Setting up data...
    loading annotations into memory...
    Done (t=18.55s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.56s)
    creating index...
    index created!
    [NanoDet][01-04 10:28:21]INFO:Creating model...
    model size is 1.0x
    init weights...
    => loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth
    Finish initialize NanoDet-Plus Head.
    GPU available: True (cuda), used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    /root/anaconda3/envs/nanodet/lib/python3.7/site-packages/torch/cuda/init.py:143: UserWarning: NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
      warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

      | Name      | Type        | Params
    ------------------------------------------
    0 | model     | NanoDetPlus | 4.3 M
    1 | avg_model | NanoDetPlus | 4.3 M
    ------------------------------------------
    8.7 M     Trainable params
    0         Non-trainable params
    8.7 M     Total params
    34.647    Total estimated model params size (MB)
    [NanoDet][01-04 10:28:21]INFO:Weight Averaging is enabled
    /root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:229: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 40 which is the number of cpus on this machine) in the DataLoader init to improve performance.
      category=PossibleUserWarning,
    Traceback (most recent call last):
      File "tools/train.py", line 146, in <module>
        main(args)
      File "tools/train.py", line 141, in main
        trainer.fit(task, train_dataloader, val_dataloader)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 604, in fit
        self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
        self._run(model, ckpt_path=self.ckpt_path)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
        results = self._run_stage()
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
        self._run_train()
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train
        self.fit_loop.run()
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/loops/loop.py", line 194, in run
        self.on_run_start(*args, **kwargs)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 206, in on_run_start
        self.trainer.reset_train_dataloader(self.trainer.lightning_module)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1552, in reset_train_dataloader
        if has_len_all_ranks(self.train_dataloader, self.strategy, module)
      File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/utilities/data.py", line 110, in has_len_all_ranks
        if total_length == 0:
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    Environment: python 3.7, cuda 10.2, GPU RTX 3090, Ubuntu 20.04

    Thanks

    opened by molyswu 0
  • Fails to train a model on a dataset with a single class.

    I used COCO 2017 converted to contain only labeled persons. Here is my config:

    save_dir: workspace/nanodet-plus-m_416
    model:
      weight_averager:
        name: ExpMovingAverager
        decay: 0.9998
      arch:
        name: NanoDetPlus
        detach_epoch: 10
        backbone:
          name: ShuffleNetV2
          model_size: 1.0x
          out_stages: [2,3,4]
          activation: LeakyReLU
        fpn:
          name: GhostPAN
          in_channels: [116, 232, 464]
          out_channels: 96
          kernel_size: 5
          num_extra_level: 1
          use_depthwise: True
          activation: LeakyReLU
        head:
          name: NanoDetPlusHead
          num_classes: 1
          input_channel: 96
          feat_channels: 96
          stacked_convs: 2
          kernel_size: 5
          strides: [8, 16, 32, 64]
          activation: LeakyReLU
          reg_max: 1
          norm_cfg:
            type: BN
          loss:
            loss_qfl:
              name: QualityFocalLoss
              use_sigmoid: True
              beta: 2.0
              loss_weight: 1.0
            loss_dfl:
              name: DistributionFocalLoss
              loss_weight: 0.25
            loss_bbox:
              name: GIoULoss
              loss_weight: 2.0
        # Auxiliary head, only use in training time.
        aux_head:
          name: SimpleConvHead
          num_classes: 1
          input_channel: 192
          feat_channels: 192
          stacked_convs: 4
          strides: [8, 16, 32, 64]
          activation: LeakyReLU
          reg_max: 1
    data:
      train:
        name: CocoDataset
        img_path: /home/mosminin/fiftyone/coco_person/train/data
        ann_path: /home/mosminin/fiftyone/coco_person/train/labels.json
        input_size: [416,416] #[w,h]
        keep_ratio: False
        pipeline:
          perspective: 0.0
          scale: [0.6, 1.4]
          stretch: [[0.8, 1.2], [0.8, 1.2]]
          rotation: 0
          shear: 0
          translate: 0.2
          flip: 0.5
          brightness: 0.2
          contrast: [0.6, 1.4]
          saturation: [0.5, 1.2]
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
      val:
        name: CocoDataset
        img_path: /home/mosminin/fiftyone/coco_person/validation/data
        ann_path: /home/mosminin/fiftyone/coco_person/validation/labels.json
        input_size: [416,416] #[w,h]
        keep_ratio: False
        pipeline:
          normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
    device:
      gpu_ids: [0]
      workers_per_gpu: 6
      batchsize_per_gpu: 16
    schedule:
    #  resume:
    #  load_model:
      optimizer:
        name: AdamW
        lr: 0.001
        weight_decay: 0.05
      warmup:
        name: linear
        steps: 500
        ratio: 0.0001
      total_epochs: 300
      lr_schedule:
        name: CosineAnnealingLR
        T_max: 300
        eta_min: 0.00005
      val_intervals: 10
    grad_clip: 35
    evaluator:
      name: CocoDetectionEvaluator
      save_key: mAP
    log:
      interval: 50
    
    class_names: ['person']
    

    I also changed train.py to use the CPU instead of the GPU so the errors would be more understandable.

        # if cfg.device.gpu_ids == -1:
        #     logger.info("Using CPU training")
        #     accelerator, devices, strategy = "cpu", None, None
        # else:
        #     accelerator, devices, strategy = "gpu", cfg.device.gpu_ids, None
    
        accelerator, devices, strategy = "cpu", None, None # CPU training
    
    

    After running it, I get the following errors.

    (.venv) mosminin@debian:~/dev/nanodet$ python tools/train.py /home/mosminin/dev/nanodet/config/nanodet-plus-m_416_person.yml
    /home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/utilities/cloud_io.py:33: LightningDeprecationWarning: `pytorch_lightning.utilities.cloud_io.get_filesystem` has been deprecated in v1.8.0 and will be removed in v1.10.0. Please use `lightning_lite.utilities.cloud_io.get_filesystem` instead.
      rank_zero_deprecation(
    [NanoDet][12-18 14:05:30]INFO:Setting up data...
    loading annotations into memory...
    Done (t=4.35s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.16s)
    creating index...
    index created!
    [NanoDet][12-18 14:05:35]INFO:Creating model...
    model size is  1.0x
    init weights...
    => loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth
    Finish initialize NanoDet-Plus Head.
    GPU available: True (cuda), used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    /home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/setup.py:175: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
      rank_zero_warn(
    
      | Name      | Type        | Params
    ------------------------------------------
    0 | model     | NanoDetPlus | 4.1 M 
    1 | avg_model | NanoDetPlus | 4.1 M 
    ------------------------------------------
    8.2 M     Trainable params
    0         Non-trainable params
    8.2 M     Total params
    32.903    Total estimated model params size (MB)
    [NanoDet][12-18 14:05:35]INFO:Weight Averaging is enabled
    /home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    Traceback (most recent call last):
      File "/home/mosminin/dev/nanodet/tools/train.py", line 147, in <module>
        main(args)
      File "/home/mosminin/dev/nanodet/tools/train.py", line 142, in main
        trainer.fit(task, train_dataloader, val_dataloader)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
        call._call_and_handle_interrupt(
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
        self._run(model, ckpt_path=self.ckpt_path)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
        results = self._run_stage()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
        self._run_train()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train
        self.fit_loop.run()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
        self._outputs = self.epoch_loop.run(self._data_fetcher)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 214, in advance
        batch_output = self.batch_loop.run(kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
        outputs = self.optimizer_loop.run(optimizers, kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
        self.advance(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 200, in advance
        result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 247, in _run_optimization
        self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 357, in _optimizer_step
        self.trainer._call_lightning_module_hook(
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1342, in _call_lightning_module_hook
        output = fn(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/nanodet/trainer/task.py", line 281, in optimizer_step
        optimizer.step(closure=optimizer_closure)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
        step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 234, in optimizer_step
        return self.precision_plugin.optimizer_step(
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 121, in optimizer_step
        return optimizer.step(closure=closure, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
        return wrapped(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
        out = func(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/optim/adamw.py", line 120, in step
        loss = closure()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 107, in _wrap_closure
        closure_result = closure()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 147, in __call__
        self._result = self.closure(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 133, in closure
        step_output = self._step_fn()
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 406, in _training_step
        training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
        output = fn(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
        return self.model.training_step(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/nanodet/trainer/task.py", line 78, in training_step
        preds, loss, loss_states = self.model.forward_train(batch)
      File "/home/mosminin/dev/nanodet/nanodet/model/arch/nanodet_plus.py", line 56, in forward_train
        loss, loss_states = self.head.loss(head_out, gt_meta, aux_preds=aux_head_out)
      File "/home/mosminin/dev/nanodet/nanodet/model/head/nanodet_plus_head.py", line 198, in loss
        batch_assign_res = multi_apply(
      File "/home/mosminin/dev/nanodet/nanodet/util/misc.py", line 24, in multi_apply
        return tuple(map(list, zip(*map_results)))
      File "/home/mosminin/dev/nanodet/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/mosminin/dev/nanodet/nanodet/model/head/nanodet_plus_head.py", line 314, in target_assign_single_img
        assign_result = self.assigner.assign(
      File "/home/mosminin/dev/nanodet/nanodet/model/head/assigner/dsl_assigner.py", line 86, in assign
        F.one_hot(gt_labels.to(torch.int64), pred_scores.shape[-1])
    RuntimeError: Class values must be smaller than num_classes.
    
    

    What am I doing wrong?

    opened by Octopusmode 0
  • Adapting the code to output a center x, y instead of bounding boxes (x1, y1, x2, y2)

    Hey, I'm not too familiar with machine learning and the like, and I'm not exactly ready to spend the next 2 months (yet) learning how TensorFlow works and such, so I'm hoping someone can assist me with this.

    So far, my experience with nanodet has been great; but manually annotating images takes a lot of time, which I don't have. Because I don't really need the bounding box information anyway, I figured I'd look for a way to output only the center of objects rather than the top-left and bottom-right corners.

    Help would be highly appreciated 😄

    opened by icecreamnotallowed 0
  • The ONNX model output (converted by export_onnx.py) differs from the PyTorch model

    def image_preprocess(img_path):
        img = cv2.imread(img_path).astype("float32") / 255
        # mean = [103.53, 116.28, 123.675]  # ImageNet values
        # std = [57.375, 57.12, 58.395]
        mean = [113.533554, 118.14172, 123.63607]
        std = [21.405144, 21.405144, 21.405144]
        mean = np.array(mean, dtype=np.float32).reshape(1, 1, 3) / 255
        std = np.array(std, dtype=np.float32).reshape(1, 1, 3) / 255
        img = (img - mean) / std
        img = np.transpose(img, (2, 0, 1))
        img = np.expand_dims(img, axis=0)
        return img

    def test_onnx_model(onnx_model, img_path=None):
        if img_path is None:
            img_path = "path for img"
        imgdata = image_preprocess(img_path)
        sess = rt.InferenceSession(onnx_model)
        input_name = sess.get_inputs()[0].name
        output_detect_name = sess.get_outputs()[0].name
        pred_onnx0 = sess.run([output_detect_name], {input_name: imgdata})
        print("outputs:")
        print(np.array(pred_onnx0))

    opened by Genlk 0
  • Fixes a couple of issues to add fp16 training support

    There were a couple of issues when trying to use fp16 training. One was that fp16 was not exposed through the configuration system. The other was that the DynamicSoftLabelAssigner used binary_cross_entropy instead of binary_cross_entropy_with_logits. This PR changes where sigmoid is called on the predictions so that the more stable binary_cross_entropy_with_logits can be used and the Trainer can be configured to use fp16 precision.

    opened by crisp-snakey 0
Releases
  • v1.0.0-alpha-1(Dec 26, 2021)

    NanoDet-Plus v1.0.0-alpha

    In NanoDet-Plus, we propose a novel label assignment strategy with a simple assign guidance module (AGM) and a dynamic soft label assigner (DSLA) to solve the optimal label assignment problem in lightweight model training. We also introduce a light feature pyramid called Ghost-PAN to enhance multi-layer feature fusion. These improvements boost previous NanoDet's detection accuracy by 7 mAP on COCO dataset.


    | Model | Resolution | mAP val 0.5:0.95 | CPU Latency (i7-8700) | ARM Latency (4xA76) | FLOPS | Params | Model Size |
    |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | 320*320 | 20.6 | 4.98ms | 10.23ms | 0.72G | 0.95M | 1.8MB(FP16) / 980KB(INT8) |
    | NanoDet-Plus-m | 320*320 | 27.0 | 5.25ms | 11.97ms | 0.9G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
    | NanoDet-Plus-m | 416*416 | 30.4 | 8.32ms | 19.77ms | 1.52G | 1.17M | 2.3MB(FP16) / 1.2MB(INT8) |
    | NanoDet-Plus-m-1.5x | 320*320 | 29.9 | 7.21ms | 15.90ms | 1.75G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
    | NanoDet-Plus-m-1.5x | 416*416 | 34.1 | 11.50ms | 25.49ms | 2.97G | 2.44M | 4.7MB(FP16) / 2.3MB(INT8) |
    | YOLOv3-Tiny | 416*416 | 16.6 | - | 37.6ms | 5.62G | 8.86M | 33.7MB |
    | YOLOv4-Tiny | 416*416 | 21.7 | - | 32.81ms | 6.96G | 6.06M | 23.0MB |
    | YOLOX-Nano | 416*416 | 25.8 | - | 23.08ms | 1.08G | 0.91M | 1.8MB(FP16) |
    | YOLOv5-n | 640*640 | 28.4 | - | 44.39ms | 4.5G | 1.9M | 3.8MB(FP16) |
    | FBNetV5 | 320*640 | 30.4 | - | - | 1.8G | - | - |
    | MobileDet | 320*320 | 25.6 | - | - | 0.9G | - | - |

    Model checkpoints and weights

    Download in the release files.

    Source code(tar.gz)
    Source code(zip)
    nanodet-plus-m-1.5x_320.onnx(9.43 MB)
    nanodet-plus-m-1.5x_320_checkpoint.ckpt(61.63 MB)
    nanodet-plus-m-1.5x_416.onnx(9.43 MB)
    nanodet-plus-m-1.5x_416_checkpoint.ckpt(61.63 MB)
    nanodet-plus-m-1.5x_416_ncnn.zip(4.40 MB)
    nanodet-plus-m-1.5x_416_openvino.zip(4.39 MB)
    nanodet-plus-m_320.onnx(4.57 MB)
    nanodet-plus-m_320_checkpoint.ckpt(33.82 MB)
    nanodet-plus-m_416.onnx(4.57 MB)
    nanodet-plus-m_416_checkpoint.ckpt(33.82 MB)
    nanodet-plus-m_416_mnn.mnn(4.59 MB)
    nanodet-plus-m_416_ncnn.zip(2.11 MB)
    nanodet-plus-m_416_openvino.zip(2.11 MB)
  • v0.4.2(Aug 22, 2021)

    v0.4.2

    Fixes some compatibility issues of NanoDet v0.4:

    Fix pytorch-lightning compatibility. (#304, #309)
    Fix pytorch 1.9 compatibility. (#308)
    Support not raising an error when evaluating with empty results. (#310)

    I'm doing a lot of refactoring. NanoDet v1.x is coming soon.

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight | ncnn model | ncnn-int8 |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download | Download | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download | Download | Download |
    | NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44B | 2.08M | Download | Download | Download |
    | NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42B | 2.08M | Download | Download | Download |
    | NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download | | |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download | | |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download | | |

    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(Jul 17, 2021)

    v0.4.1

    This is a final release of NanoDet v0.x.

    I'm doing a lot of refactoring. NanoDet v1.x is coming soon.

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight | ncnn model | ncnn-int8 |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download | Download | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download | Download | Download |
    | NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44B | 2.08M | Download | Download | Download |
    | NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42B | 2.08M | Download | Download | Download |
    | NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download | | |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download | | |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download | | |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download | | |

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Jun 8, 2021)

    What's new in v0.4.0

    1. Fix a little bug in demo.py by BlainWu (#210)
    2. Add script to export TorchScript model by strawberrypie (#211)
    3. Use fixed output names when exporting ONNX (#218)
    4. Use scale_factor instead of fixed size in resize to support dynamic shape inference (#218)
    5. Ensure num_classes equal len(class_names) by ZHEQIUSHUI (#221)
    6. Fix a bug in mnn demo while using GPU device by AcherStyx (#234)
    7. Fix with_last_conv bug in shufflenet (#239)
    8. Support batch eval (#241)
    9. Add nanodet-m-1.5x models (#242)
    10. Update model benchmark (#246)
    11. Prevent lightning Trainer from disabling cudnn.benchmark (#249)
    12. Fix multi-GPU evaluation bug with pytorch-lightning (#254)

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-m-1.5x | ShuffleNetV2 1.5x | 320*320 | 23.5 | 1.44B | 2.08M | Download |
    | NanoDet-m-1.5x-416 | ShuffleNetV2 1.5x | 416*416 | 26.8 | 2.42B | 2.08M | Download |
    | NanoDet-t | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Download ncnn models below

    Source code(tar.gz)
    Source code(zip)
    ncnn-nanodet-m-1.5x-416-int8.zip(1.82 MB)
    ncnn-nanodet-m-1.5x-416.zip(3.67 MB)
    ncnn-nanodet-m-1.5x-int8.zip(1.82 MB)
    ncnn-nanodet-m-1.5x.zip(3.66 MB)
    ncnn-nanodet-m-416-int8.zip(882.58 KB)
    ncnn-nanodet-m-416.zip(1.64 MB)
    ncnn-nanodet-m-int8.zip(888.76 KB)
    ncnn-nanodet-m.zip(1.64 MB)
  • v0.3.0(Apr 11, 2021)

    What's new in v0.3.0

    1. Refactor training and testing code with pytorch-lightning.
    2. Solving ONNX inference AxisError by zshn25 (#198).

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-t (NEW) | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Source code(tar.gz)
    Source code(zip)
    nanodet_m_ncnn_model.zip(1.64 MB)
  • v0.2.0(Mar 29, 2021)

    What's new in v0.2.0

    1. Add pyncnn demo by caishanli (#167).
    2. Fix ncnn demo build failure without vulkan by nihui (#168).
    3. Add NanoDet-t with Transformer Attention Network (#183).
    4. Add Notebook demo by zhiqwang (#188).
    5. Add feature of saving demo inference result by wwdok (#191).
    6. Fix utf-8 decode bug (#184).
    7. Fix test bug.

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-t (NEW) | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Mar 7, 2021)

    What's new in v0.1.0

    1. Support MNN python and cpp inference (#83 ).
    2. Support OpenVINO inference.
    3. Support libtorch inference experimentally.
    4. Add NanoDet-g.
    5. Add EfficientNet-Lite and Rep-VGG backbone.
    6. Add Model Zoo and provide more pre-trained model.
    7. Refactor GFL head (#154 ).

    Download pretrained models

    | Model | Backbone | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
    |:---|:---|:---:|:---:|:---:|:---:|:---:|
    | NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | Download |
    | NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | Download |
    | NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | Download |
    | NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | Download |
    | NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | Download |

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1(Nov 22, 2020)
