ncnn is a high-performance neural network inference framework optimized for the mobile platform

Overview

ncnn


ncnn is a high-performance neural network inference framework optimized for mobile platforms. It has been designed from the start with deployment and use on mobile phones in mind. ncnn has no third-party dependencies, is cross-platform, and runs faster than all known open source frameworks on mobile phone CPUs. With ncnn's efficient implementation, developers can easily deploy deep learning models to mobile platforms, create intelligent apps, and bring artificial intelligence to users' fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, and Pitu.


Technical discussion QQ group: 637093648 (lots of experts). Answer to the join question: 卷卷卷卷卷

Pocky group (MLIR YES!): 677104663 (lots of experts)

Telegram Group https://t.me/ncnnyes

Discord Channel https://discord.gg/YRsxgmF


Current building status matrix

CI builds cover the following systems, across CPU (32bit), CPU (64bit), GPU (32bit) and GPU (64bit) targets where applicable:

  • Linux (GCC)
  • Linux (Clang)
  • Linux (ARM)
  • Linux (MIPS)
  • Linux (RISC-V)
  • Windows (VS2015)
  • Windows (VS2017)
  • Windows (VS2019)
  • macOS
  • macOS (ARM)
  • Android
  • Android-x86
  • iOS
  • iOS Simulator
  • WebAssembly
  • RISC-V GCC/Newlib

Supports most commonly used CNN networks


HowTo

how to build ncnn library on Linux / Windows / macOS / Raspberry Pi3 / Android / NVIDIA Jetson / iOS / WebAssembly / AllWinner D1 / Loongson 2K1000

download prebuilt binary packages for Android and iOS

use ncnn with alexnet with detailed steps, recommended for beginners :)

use netron for ncnn model visualization

out-of-the-box web model conversion

ncnn low-level operation api

ncnn param and model file spec

ncnn operation param weight table

how to implement custom layer step by step


FAQ

ncnn throw error

ncnn produce wrong result

ncnn vulkan


Features

  • Supports convolutional neural networks, including multi-input and multi-branch structures, and can compute only a subset of the branches
  • No third-party library dependencies; does not rely on BLAS, NNPACK, or any other computing framework
  • Pure C++ implementation, cross-platform, supports Android, iOS, and more
  • Carefully optimized at the ARM NEON assembly level for very high computation speed
  • Sophisticated memory management and data structure design for a very low memory footprint
  • Supports multi-core parallel computing acceleration and ARM big.LITTLE CPU scheduling optimization
  • Supports GPU acceleration via the next-generation low-overhead Vulkan API
  • Extensible model design; supports 8-bit quantization and half-precision floating point storage; can import caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
  • Supports loading network models by direct memory reference with zero copy
  • Custom layer implementations can be registered to extend the framework
  • Well, it is strong, not afraid of being stuffed with 卷 QvQ

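supported platform matrix

  • ✅ = known to work and runs fast with good optimization
  • ✔️ = known to work, but speed may not be fast enough
  • ❔ = should work, not confirmed
  • / = not applicable

| | Windows | Linux | Android | macOS | iOS |
|---|---|---|---|---|---|
| intel-cpu | ✔️ | ✔️ | ❔ | ✔️ | / |
| intel-gpu | ✔️ | ✔️ | ❔ | ❔ | / |
| amd-cpu | ✔️ | ✔️ | ❔ | ✔️ | / |
| amd-gpu | ✔️ | ✔️ | ❔ | ❔ | / |
| nvidia-gpu | ✔️ | ✔️ | ❔ | ❔ | / |
| qcom-cpu | ❔ | ✔️ | ✅ | / | / |
| qcom-gpu | ❔ | ✔️ | ✔️ | / | / |
| arm-cpu | ❔ | ❔ | ✅ | / | / |
| arm-gpu | ❔ | ❔ | ✔️ | / | / |
| apple-cpu | / | / | / | ✔️ | ✅ |
| apple-gpu | / | / | / | ✔️ | ✔️ |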

Example project


License

BSD 3 Clause

Comments
  • MTCNN test results are completely different

    Running MTCNN's PNet and RNet gives results that differ greatly from the reference results. Feeding a face image to RNet also yields a very low score.

        const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
        const float norm_vals[3] = {0.0078125, 0.0078125, 0.0078125};

        int hs = ceil(img_hscales[i]);
        int ws = ceil(img_wscales[i]);
        ncnn::Mat pnet_img = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR, img_w, img_h, ws, hs);
        pnet_img.substract_mean_normalize(mean_vals, norm_vals);
        ncnn::Extractor Pnet_ex = Pnet.create_extractor();
        Pnet_ex.set_light_mode(true);
        Pnet_ex.input("data", pnet_img);
        ncnn::Mat score, loc;
        Pnet_ex.extract("prob1", score);
        Pnet_ex.extract("conv4-2", loc);

        if (*(score_data + i) >= thresh) ...

    opened by ElegantGod 31
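When comparing outputs against a reference implementation, it can help to first check the preprocessing arithmetic in isolation. The sketch below is a standalone illustration, not ncnn code: `normalize_pixels` is a hypothetical helper, and it assumes the per-channel operation is "subtract the mean, then multiply by the norm value", applied here to interleaved 3-channel bytes for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ncnn): per-channel normalization of
// interleaved 3-channel pixel data, out = (in - mean[c]) * norm[c].
std::vector<float> normalize_pixels(const std::vector<unsigned char>& pixels,
                                    const float mean_vals[3],
                                    const float norm_vals[3])
{
    // The input must hold whole 3-channel pixels.
    assert(pixels.size() % 3 == 0);

    std::vector<float> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
    {
        const std::size_t c = i % 3; // channel index within the pixel
        out[i] = (static_cast<float>(pixels[i]) - mean_vals[c]) * norm_vals[c];
    }
    return out;
}
```

With the values from the issue (mean 127.5, norm 0.0078125, which is 1/128), byte values 0 and 255 map to roughly -1 and +1. If the reference pipeline uses a different channel order or image orientation, matching these numbers alone will not make the network outputs agree.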
  • prepare for release with experimental gpu inference capability

    shader

    • [x] priorbox (ssd)
    • [x] permute (ssd)
    • [x] deconvolution
    • [x] deconvolutiondepthwise (yolo)
    • [x] interp (upsample)
    • [x] reorg (yolov2)
    • [x] prelu
    • [x] reshape
    • [x] tanh
    • [x] sigmoid
    • [x] clip
    • [x] absval
    • [x] shufflechannel (shufflenet)

    example

    • [x] squeezenet-gpu
    • [x] mobilenet-ssd-gpu
    • [x] mobilenet-yolov3-gpu

    benchncnn

    • [x] shufflenet
    • [x] mobilenet-ssd / squeezenet-ssd
    • [x] mobilenet-yolo / mobilenet-yolov3

    binary release

    • [x] vulkan-enabled android prebuilt library
    • [x] vulkan-enabled ios prebuilt framework (arm64 only)

    documentation

    • [x] faq about common vulkan api error
    • [x] faq about packing
    • [x] faq about hybrid gpu/cpu inference practice
    • [x] faq about op fusion
    enhancement 
    opened by nihui 26
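  • Why do parameters change during model conversion?

    I exported an ONNX model from PyTorch, simplified it with onnxsim, converted it to a .param file with onnx2ncnn, and then optimized the .param with ncnnoptimize. Opening both the ONNX model and the .param file in netron shows that the weights (w) and bias (b) of the last fully connected layer differ. What is going on? Has anyone else run into this?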

    opened by yanJiang0216 21
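  • mtcnn is slower with version 20191113

    Code: https://github.com/moli232777144/mtcnn_ncnn

    Using the ncnn library bundled with the code (last updated 20180516) and android-ndk-r16b, running detection on the bundled Kobe image 100 times in a loop with 2 threads takes 45 ms on average. After updating the ncnn library to version 20191113 and the NDK to android-ndk-r19c, the same 100-iteration loop with 2 threads takes 106 ms on average.

    Note: the test phone is a vivo NEX A with a Snapdragon 710. The downloaded code was left unchanged apart from updating the gradle version. The only differences between the two runs are the ncnn library and SDK versions (without the SDK update, detection with 20191113 takes slightly longer still). This is a fairly large difference in latency. What might be the cause? Many thanks!

    Update: after testing the historical versions one by one, I found that version 20190611 still averages 44 ms over 1000 iterations with 2 threads, while version 20190908 averages 107 ms.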

    opened by yue-sunyata 20
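Comparisons like the one above depend on measuring the average latency the same way for each library version. A minimal sketch of such a measurement loop follows; `average_ms` is a hypothetical helper (not part of ncnn or the linked project), and the warm-up count is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time in milliseconds of `runs`
// calls to `fn`, measured after `warmup` untimed calls.
double average_ms(const std::function<void()>& fn, int runs, int warmup = 10)
{
    assert(runs > 0 && warmup >= 0);

    // Untimed warm-up so one-time setup costs do not skew the average.
    for (int i = 0; i < warmup; ++i)
        fn();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        fn();
    const auto end = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / runs;
}
```

A call such as `average_ms([&] { detector.detect(image); }, 100)` would time 100 detections, where `detector` and `image` stand in for whatever the application uses. A monotonic clock is used so that system clock adjustments do not affect the result.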
  • Xiaomi 10/10 Pro always crashes in power-saving mode: __kmp_abort_process

    error log

    Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 31959 (thread-pool-1), pid 31478 (om.timehut.ncnn)
    pid: 31478, tid: 31959, name: thread-pool-1  >>> com.timehut.ncnn <<<
          #01 pc 0000000000b19260  /data/app/~~DewB4Z9g4BUSPMAa4ro2Zw==/com.timehut.ncnn-g1R4w3hYqG12Txs5VaOSeQ==/base.apk!libtimehut-ai.so (offset 0x23ce000) (__kmp_abort_process+52) (BuildId: 91d74b3087e79f0d9d64fe036c708f7533cc5462)
    

    context

    ncnn version: ncnn-20220729-android-vulkan
    Android Studio Electric Eel | 2022.1.1 Beta 4
    NDK: 24.0.8215888
    Runtime environment: Xiaomi 10 / Xiaomi 10 Pro, power-saving mode

    how to reproduce

    1. The crash always occurs in power-saving mode. Testing shows that merely calling System.loadLibrary to load the .so built with ncnn triggers it
    2. The code is as follows

    more

        init {
            System.loadLibrary("timehut-ai")
        }
    
    bug 
    opened by ijero 19
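  • After int8 quantization: yolov5 produces no detection boxes, and lightweight detectors such as yolo fastest lose a lot of accuracy

    Reporting a few results from my quantization experiments, with some questions for the experts.

    (1) Preprocessing and calibration data do affect the accuracy of the quantized model (the mean and norm values for COCO, ImageNet and VOC need to be handled separately after conversion; this point can be ignored).

    (2) Some models produce no detection boxes after quantization (for example v5; some of its processing modules are fairly complex, and I do not know whether that is related).

    (3) Some lightweight models take longer to infer after quantization and also lose a lot of accuracy. The longer inference time comes with a caveat: it was measured on processors such as the Intel i7 and i5. There are two possible reasons. First, as the author said, ncnn focuses on ARM architectures first and on this class of processor second. Second, fp16 inference for something like yolo fastest already reaches the 20 ms level on Intel processors, or even 10 ms with vulkan enabled, so quantization may struggle to improve speed further. The situation may be different on a development board, but my Raspberry Pi is broken, so I cannot verify it.

    A few questions for nihui. First, what exactly causes the accuracy drop for yolo fastest (by comparison, fp16 detection accuracy is still fine)? Second, why does a model like v5 end up with no detection boxes? Single-frame inference did become three times faster, but having no detection boxes means my quantization process failed. Any pointers would be appreciated, thanks.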

    bug 
    opened by ppogg 19
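To make the accuracy discussion concrete, here is a minimal sketch of symmetric int8 quantization with a single scale per tensor. It uses a plain absolute-maximum scale chosen for illustration; that is an assumption and not necessarily how ncnn's calibration tools pick scales. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: scale that maps the largest magnitude in `values` to 127.
float symmetric_scale(const std::vector<float>& values)
{
    float absmax = 0.f;
    for (float v : values)
        absmax = std::max(absmax, std::fabs(v));
    return absmax > 0.f ? 127.f / absmax : 1.f;
}

// Round to the nearest integer and clamp to the symmetric range [-127, 127].
std::int8_t quantize(float v, float scale)
{
    const float q = std::round(v * scale);
    return static_cast<std::int8_t>(std::min(127.f, std::max(-127.f, q)));
}

// Map an int8 value back to an approximate float.
float dequantize(std::int8_t q, float scale)
{
    assert(scale > 0.f);
    return static_cast<float>(q) / scale;
}
```

The step size between representable values is `1 / scale`, so a single large outlier in the calibration data shrinks the scale and coarsens every other value. That is one plausible way small activations, such as low-confidence detection scores, can be lost after quantization.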
  • Does ncnn have a function similar to warpAffine?

    void warpAffine(InputArray src, OutputArray dst, InputArray M, Size dsize, int flags=INTER_LINEAR, int borderMode=BORDER_CONSTANT, const Scalar& borderValue=Scalar())

    Also, is there pure C++ code for multithreading? Thanks.

    opened by bjthemost 19
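For readers who want to see what such a function involves, below is a minimal standalone sketch of an affine warp with bilinear sampling on a single-channel 8-bit image. It is not ncnn's or OpenCV's implementation: `warp_affine_gray` is a hypothetical name, and the matrix is assumed to be the inverse transform (destination to source). The build log further down this page lists `src/mat_pixel_affine.cpp`, which suggests ncnn ships its own affine helpers; this sketch does not use them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: warps a single-channel 8-bit image with bilinear sampling.
// `inv` is a 2x3 matrix mapping destination (x, y) to source coordinates:
//   sx = inv[0] * x + inv[1] * y + inv[2]
//   sy = inv[3] * x + inv[4] * y + inv[5]
// Destination pixels that map outside the source keep the `border` value.
std::vector<unsigned char> warp_affine_gray(const std::vector<unsigned char>& src,
                                            int srcw, int srch,
                                            int dstw, int dsth,
                                            const float inv[6],
                                            unsigned char border = 0)
{
    assert(srcw > 0 && srch > 0 && dstw > 0 && dsth > 0);
    assert(src.size() == static_cast<std::size_t>(srcw) * srch);

    std::vector<unsigned char> dst(static_cast<std::size_t>(dstw) * dsth, border);
    for (int y = 0; y < dsth; ++y)
    {
        for (int x = 0; x < dstw; ++x)
        {
            const float sx = inv[0] * x + inv[1] * y + inv[2];
            const float sy = inv[3] * x + inv[4] * y + inv[5];
            if (sx < 0.f || sy < 0.f || sx > srcw - 1 || sy > srch - 1)
                continue; // outside the source image: leave the border value

            // Four neighbouring source pixels and the fractional offsets.
            const int x0 = static_cast<int>(sx);
            const int y0 = static_cast<int>(sy);
            const int x1 = std::min(x0 + 1, srcw - 1);
            const int y1 = std::min(y0 + 1, srch - 1);
            const float fx = sx - x0;
            const float fy = sy - y0;

            const float top = src[y0 * srcw + x0] * (1.f - fx) + src[y0 * srcw + x1] * fx;
            const float bottom = src[y1 * srcw + x0] * (1.f - fx) + src[y1 * srcw + x1] * fx;
            dst[y * dstw + x] = static_cast<unsigned char>(std::lround(top * (1.f - fy) + bottom * fy));
        }
    }
    return dst;
}
```

Iterating over destination pixels and sampling the source avoids holes in the output, which is why the inverse matrix is taken as input. Each output row is independent, so the outer loop is also a natural place to split work across threads.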
  • Why do I get undefined reference to '__kmpc_fork_call' when building a .so from the .a file compiled from ncnn?

    Hello, I would like to ask: when I take the .a file built from ncnn and build a .so with ndk-build, I get the following errors:

        undefined reference to '__kmpc_fork_call'
        undefined reference to '__kmpc_for_static_init_4'
        undefined reference to '__kmpc_for_static_fini'
        undefined reference to '__kmpc_for_static_init_4'
        layer/convolutiondepthwise.cpp:176: error:undefined reference to '__kmpc_for_static_init_8'

    opened by zhangyanhbr 19
  • Error when running inference with ncnn

    Both the build and runtime environments are Linux. When running inference with my own code, the following appears:

        [New LWP 17798]
        [New LWP 17800]
        [New LWP 17802]
        [New LWP 17801]
        [New LWP 17803]
        [New LWP 17799]
        [Thread debugging using libthread_db enabled]
        Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
        Core was generated by `./matting-infer'.
        Program terminated with signal SIGSEGV, Segmentation fault.
        #0 0x0000562614d73a39 in ncnn::NetPrivate::forward_layer(int, std::vector<ncnn::Mat, std::allocator<ncnn::Mat> >&, ncnn::Option const&) const ()
        [Current thread is 1 (Thread 0x7f979a75cc00 (LWP 17798))]

    opened by FengMu1995 18
  • architecture changes for int8 packing

    • [x] requantize
    • [ ] armv7 im2col+gemm pack8
    • [ ] armv7 conv1x1 pack8
    • [ ] armv7 conv3x3s1 pack8
    • [ ] armv8 im2col+gemm pack8
    • [ ] armv8 conv1x1 pack8
    • [ ] armv8 conv3x3s1 pack8
    opened by nihui 18
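The "pack8" items above presumably refer to a memory layout in which eight channels are interleaved at each spatial position, so that one 8-wide SIMD load fetches eight channel values. The sketch below illustrates that reading under stated assumptions: `pack8_channels` is a hypothetical helper, and the exact layout used inside ncnn may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: repacks planar channel-major int8 data
// (index = c * size + i) into groups of 8 interleaved channels
// (index = ((c / 8) * size + i) * 8 + c % 8).
std::vector<std::int8_t> pack8_channels(const std::vector<std::int8_t>& chw,
                                        int channels, int size)
{
    assert(channels > 0 && size > 0);
    assert(channels % 8 == 0); // this sketch handles only full groups of 8
    assert(chw.size() == static_cast<std::size_t>(channels) * size);

    std::vector<std::int8_t> packed(chw.size());
    for (int c = 0; c < channels; ++c)
    {
        for (int i = 0; i < size; ++i)
        {
            const std::size_t dst = (static_cast<std::size_t>(c / 8) * size + i) * 8 + c % 8;
            packed[dst] = chw[static_cast<std::size_t>(c) * size + i];
        }
    }
    return packed;
}
```

After repacking, the eight values for channels 0 to 7 at position `i` sit contiguously at `packed[i * 8]` through `packed[i * 8 + 7]`, which is what makes kernels such as a packed 1x1 convolution convenient to vectorize.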
  • The prebuilt ncnn-vulkan.framework is broken: ld: Framework not found ncnn-vulkan

    I used the prebuilt ncnn-vulkan.framework from GitHub and added it to the project under Build Phases in the usual way. Running the project fails immediately with "ld: Framework not found ncnn-vulkan". Other frameworks work without any problem. I also downloaded an ncnn-vulkan.framework built by someone else and integrated it the same way, with no error. So far the problem only occurs with the prebuilt ncnn-vulkan.framework from GitHub. Please look into it as soon as possible, thanks!

    opened by chenminghong 17
  • Conflict between ncnn and TengineKit

    error log

    I came across a native exception, "A/libc: Fatal signal 6 (SIGABRT)", on an Android mobile device when I used TengineKit.

    context

    Android 12
    tengine-kit-sdk1.0.1.aar

    how to reproduce

    1. Initialize TengineKit.
    2. Load ncnn parameters and binary files.
    3. An exception of "A/libc: Fatal signal 6 (SIGABRT)" is raised.

    more

    This error will be raised regardless of the execution order of 1 and 2.

    opened by GanchengZhu 0
  • [feature request] int8 quantization support for Convolution1D and ConvolutionDepthWise1D

    https://github.com/Tencent/ncnn/blob/c471826da1e1fd3820e4a6690e777479e22c4ceb/tools/quantize/ncnn2table.cpp#L132

    Is there a plan to also support

    • Convolution1D
    • ConvolutionDepthWise1D
    opened by csukuangfj 0
  • 'float16x4_t' was not declared in this scope when compile for hisi

    error log

    [ 0%] Built target ncnn-generate-spirv [ 0%] Building CXX object src/CMakeFiles/ncnn.dir/blob.cpp.o [ 1%] Building CXX object src/CMakeFiles/ncnn.dir/allocator.cpp.o [ 1%] Building CXX object src/CMakeFiles/ncnn.dir/benchmark.cpp.o [ 2%] Building CXX object src/CMakeFiles/ncnn.dir/command.cpp.o [ 2%] Building CXX object src/CMakeFiles/ncnn.dir/cpu.cpp.o [ 4%] Building CXX object src/CMakeFiles/ncnn.dir/datareader.cpp.o [ 4%] Building CXX object src/CMakeFiles/ncnn.dir/c_api.cpp.o [ 4%] Building CXX object src/CMakeFiles/ncnn.dir/gpu.cpp.o [ 4%] Building CXX object src/CMakeFiles/ncnn.dir/mat.cpp.o [ 5%] Building CXX object src/CMakeFiles/ncnn.dir/layer.cpp.o [ 6%] Building CXX object src/CMakeFiles/ncnn.dir/mat_pixel_drawing.cpp.o [ 7%] Building CXX object src/CMakeFiles/ncnn.dir/mat_pixel_affine.cpp.o [ 7%] Building CXX object src/CMakeFiles/ncnn.dir/mat_pixel.cpp.o [ 8%] Building CXX object src/CMakeFiles/ncnn.dir/mat_pixel_resize.cpp.o [ 8%] Building CXX object src/CMakeFiles/ncnn.dir/mat_pixel_rotate.cpp.o [ 9%] Building CXX object src/CMakeFiles/ncnn.dir/modelbin.cpp.o [ 10%] Building CXX object src/CMakeFiles/ncnn.dir/net.cpp.o [ 10%] Building CXX object src/CMakeFiles/ncnn.dir/option.cpp.o [ 10%] Building CXX object src/CMakeFiles/ncnn.dir/pipeline.cpp.o [ 11%] Building CXX object src/CMakeFiles/ncnn.dir/paramdict.cpp.o [ 12%] Building CXX object src/CMakeFiles/ncnn.dir/pipelinecache.cpp.o [ 12%] Building CXX object src/CMakeFiles/ncnn.dir/simpleocv.cpp.o [ 14%] Building CXX object src/CMakeFiles/ncnn.dir/simpleomp.cpp.o [ 14%] Building CXX object src/CMakeFiles/ncnn.dir/simplestl.cpp.o [ 14%] Building CXX object src/CMakeFiles/ncnn.dir/layer/absval.cpp.o [ 15%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/absval_arm.cpp.o [ 15%] Building CXX object src/CMakeFiles/ncnn.dir/layer/batchnorm.cpp.o [ 16%] Building CXX object src/CMakeFiles/ncnn.dir/layer/bias.cpp.o [ 17%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/batchnorm_arm.cpp.o [ 
17%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/bias_arm.cpp.o [ 17%] Building CXX object src/CMakeFiles/ncnn.dir/layer/concat.cpp.o [ 18%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/concat_arm.cpp.o [ 19%] Building CXX object src/CMakeFiles/ncnn.dir/layer/bnll.cpp.o [ 19%] Building CXX object src/CMakeFiles/ncnn.dir/layer/convolution.cpp.o [ 20%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/convolution_arm.cpp.o [ 21%] Building CXX object src/CMakeFiles/ncnn.dir/layer/crop.cpp.o [ 21%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/crop_arm.cpp.o [ 22%] Building CXX object src/CMakeFiles/ncnn.dir/layer/deconvolution.cpp.o [ 22%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/deconvolution_arm.cpp.o [ 23%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/dropout_arm.cpp.o [ 23%] Building CXX object src/CMakeFiles/ncnn.dir/layer/eltwise.cpp.o [ 24%] Building CXX object src/CMakeFiles/ncnn.dir/layer/dropout.cpp.o [ 24%] Building CXX object src/CMakeFiles/ncnn.dir/layer/elu.cpp.o [ 25%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/eltwise_arm.cpp.o [ 25%] Building CXX object src/CMakeFiles/ncnn.dir/layer/exp.cpp.o [ 27%] Building CXX object src/CMakeFiles/ncnn.dir/layer/flatten.cpp.o [ 27%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/flatten_arm.cpp.o [ 28%] Building CXX object src/CMakeFiles/ncnn.dir/layer/embed.cpp.o [ 29%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/innerproduct_arm.cpp.o [ 29%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/innerproduct_arm_vfpv4.cpp.o [ 29%] Building CXX object src/CMakeFiles/ncnn.dir/layer/innerproduct.cpp.o [ 30%] Building CXX object src/CMakeFiles/ncnn.dir/layer/input.cpp.o [ 30%] Building CXX object src/CMakeFiles/ncnn.dir/layer/log.cpp.o [ 31%] Building CXX object src/CMakeFiles/ncnn.dir/layer/lrn.cpp.o [ 32%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/lrn_arm.cpp.o [ 32%] Building CXX object 
src/CMakeFiles/ncnn.dir/layer/memorydata.cpp.o [ 33%] Building CXX object src/CMakeFiles/ncnn.dir/layer/mvn.cpp.o [ 35%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/pooling_arm.cpp.o [ 35%] Building CXX object src/CMakeFiles/ncnn.dir/layer/power.cpp.o [ 35%] Building CXX object src/CMakeFiles/ncnn.dir/layer/prelu.cpp.o [ 36%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/prelu_arm.cpp.o [ 36%] Building CXX object src/CMakeFiles/ncnn.dir/layer/pooling.cpp.o [ 37%] Building CXX object src/CMakeFiles/ncnn.dir/layer/reduction.cpp.o [ 37%] Building CXX object src/CMakeFiles/ncnn.dir/layer/proposal.cpp.o [ 37%] Building CXX object src/CMakeFiles/ncnn.dir/layer/relu.cpp.o [ 38%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/relu_arm.cpp.o [ 39%] Building CXX object src/CMakeFiles/ncnn.dir/layer/reshape.cpp.o [ 39%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/reshape_arm.cpp.o [ 40%] Building CXX object src/CMakeFiles/ncnn.dir/layer/roipooling.cpp.o [ 40%] Building CXX object src/CMakeFiles/ncnn.dir/layer/scale.cpp.o [ 41%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/scale_arm.cpp.o [ 42%] Building CXX object src/CMakeFiles/ncnn.dir/layer/sigmoid.cpp.o In file included from /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_arm.cpp:31:0: /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h: In function 'void ncnn::innerproduct_pack4_fp16s_neon(const ncnn::Mat&, ncnn::Mat&, const ncnn::Mat&, const ncnn::Mat&, int, const ncnn::Mat&, const ncnn::Option&)': /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:256:45: error: 'float16x4_t' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vget_low_u16(_w01))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:256:77: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vget_low_u16(_w01))); ^ 
/home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:284:44: error: 'float16x4_t' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:284:72: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h: In function 'void ncnn::innerproduct_fp16s_neon(const ncnn::Mat&, ncnn::Mat&, const ncnn::Mat&, const ncnn::Mat&, int, const ncnn::Mat&, const ncnn::Option&)': /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:413:45: error: 'float16x4_t' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr0))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:413:74: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr0))); ^ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:510:44: error: 'float16x4_t' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:510:72: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h: In function 'void ncnn::innerproduct_transform_kernel_fp16s_neon(const ncnn::Mat&, ncnn::Mat&, int, int, const ncnn::Option&)': /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:716:68: error: 'vcvt_f16_f32' was not declared in this scope _p.val[0] = (uint16x4_t)(vcvt_f16_f32(vld1q_f32(k0))); ^ In file included from /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_arm.cpp:32:0: /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h: In function 'void ncnn::innerproduct_gemm_fp16s_neon(const 
ncnn::Mat&, ncnn::Mat&, const ncnn::Mat&, const ncnn::Mat&, int, const ncnn::Mat&, const ncnn::Option&)': /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:123:52: error: 'float16x4_t' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:123:80: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:217:53: error: 'float16x4_t' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vget_low_u16(_w01))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:217:85: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vget_low_u16(_w01))); ^ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:245:52: error: 'float16x4_t' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:245:80: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:320:52: error: 'float16x4_t' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:320:80: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^ [ 42%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/sigmoid_arm.cpp.o /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:417:53: error: 'float16x4_t' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vget_low_u16(_w01))); ^~~~~~~~~~~ 
/home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:417:85: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vget_low_u16(_w01))); ^ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:436:52: error: 'float16x4_t' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^~~~~~~~~~~ /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_gemm_fp16s.h:436:80: error: 'vcvt_f32_f16' was not declared in this scope float32x4_t _w = vcvt_f32_f16((float16x4_t)(vld1_u16(kptr))); ^ [ 43%] Building CXX object src/CMakeFiles/ncnn.dir/layer/slice.cpp.o [ 43%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/slice_arm.cpp.o [ 44%] Building CXX object src/CMakeFiles/ncnn.dir/layer/softmax.cpp.o [ 45%] Building CXX object src/CMakeFiles/ncnn.dir/layer/split.cpp.o [ 45%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/softmax_arm.cpp.o [ 46%] Building CXX object src/CMakeFiles/ncnn.dir/layer/tanh.cpp.o [ 47%] Building CXX object src/CMakeFiles/ncnn.dir/layer/threshold.cpp.o [ 47%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/tanh_arm.cpp.o [ 47%] Building CXX object src/CMakeFiles/ncnn.dir/layer/tile.cpp.o [ 48%] Building CXX object src/CMakeFiles/ncnn.dir/layer/rnn.cpp.o [ 49%] Building CXX object src/CMakeFiles/ncnn.dir/layer/lstm.cpp.o [ 49%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/rnn_arm.cpp.o [ 50%] Building CXX object src/CMakeFiles/ncnn.dir/layer/arm/lstm_arm.cpp.o make[2]: *** [src/CMakeFiles/ncnn.dir/build.make:762: src/CMakeFiles/ncnn.dir/layer/arm/innerproduct_arm.cpp.o] Error 1 make[2]: *** Waiting for unfinished jobs.... [ 50%] Building CXX object src/CMakeFiles/ncnn.dir/layer/binaryop.cpp.o make[1]: *** [CMakeFiles/Makefile2:143: src/CMakeFiles/ncnn.dir/all] Error 2 make: *** [Makefile:136: all] Error 2

    context

    $ cmake --version
    cmake version 3.25.0

    CMake suite maintained and supported by Kitware (kitware.com/cmake).

    $ make --version
    GNU Make 4.2.1
    Built for x86_64-pc-linux-gnu
    Copyright (C) 1988-2016 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    how to reproduce

    1. cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/himix200.toolchain.cmake ..
    2. make -j$(nproc)

    more

    opened by live106 1
Releases(20221128)
  • 20221128(Nov 28, 2022)

    Build versions (default configuration): android-ndk-r25b, xcode 12.4, ubuntu-18.04, ubuntu-20.04, ubuntu-22.04, vs2015, vs2017, vs2019, vs2022, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | full source code including all submodules | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, with GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, with GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, with GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, with GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, with GPU support, model conversion tools | x86 + x64 + arm + arm64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

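    • Added loongarch64 lsx vector instruction set optimizations for the absval/batchnorm/bias/binaryop/cast/clip/concat/convolution1d/convolutiondepthwise/convolution/crop/deconvolutiondepthwise/deconvolution/dequantize/dropout/eltwise/flatten/hardsigmoid/hardswish/innerproduct/interp/mish/packing/padding/pooling/prelu/quantize/relu/requantize/sigmoid/slice/softmax/swish/tanh/unaryop operators (@junchao-loongson)
    • layernorm x86 optimization (@LinHeLurking @LRY89757)
    • batchnorm/elu/prelu/gelu x86 optimization (@LRY89757)
    • softmax arm neon optimization (@luqiang-guo)
    • batchnorm/instancenorm riscv vector optimization (@thelastlin)
    • deformableconv2d x86 optimization (@miemie2013)
    • elu vulkan optimization (@Yoh-Z)
    • convolution int8 x86 sse2/avx2 optimization
    • Updated riscv vector segment load/store (@thelastlin)
    • Improved the memory pool reclamation mechanism (@LinHeLurking)
    • Added an api to get the number of physical cpu cores; the default thread count is now the number of physical big cores
    • Implemented parameters that control whether computation features are enabled for an individual layer
    • More general macos/ios cpu feature detection; bf16 and i8mm instruction sets enabled on a15/a16/m2
    • Unified the innerproduct x86 fp32/fp16s kernel code
    • Fixed the openmp crash caused by cpus going offline in android power-saving mode
    • Implemented the glu operator and the corresponding pnnx conversion (@csukuangfj)
    • Added fold and unfold operators
    • Added the gridsample operator and the corresponding pnnx conversion (@LRY89757)
    • lstm supports the proj_size parameter
    • groupnorm supports 1d/2d/4d input
    • squeeze/expanddims support 4d input and output
    • multiheadattention supports the kdim and vdim parameters
    • Fixed an incorrect allocator setting in convolutiondepthwise (@w8501)
    • Fixed empty convolution weights in windows arm environments
    • Fixed onnx2ncnn blob names exceeding a length of 255 (@ZhangGe6)
    • Fixed the wrong param id for expanddims axes (@LiuYi-Up)
    • Fixed the c api allocator not working (@qiqikit)
    • Stricter compiler armv7 fp16 capability checks and compatibility
    • Fixed compile errors when building avx512 code with old gcc versions (@bestpower)
    • Fixed the windows-arm64 build (@zchrissirhcz)
    • Fixed failure to link atomic builtins when using ncnn with old ndk versions
    • Fixed compile errors with new pybind11 versions (@tpoisonooo)
    • The python module supports mat.numpy() (@csukuangfj)
    • Updated the pybind11 and glslang submodules
    • pyncnn now publishes python 3.11 packages and windows arm builds
    • pnnx supports pytorch 1.13
    • pnnx can now load gpu-exported torchscript on cpu
    • pnnx saves onnx-zero model files
    • pnnx stores constants in temporary files during conversion to reduce memory usage
    • pnnx adds the command-line parameter fp16=0/1 to control whether onnx-zero/ncnn models are saved in fp16
    • pnnx supports conversion of most math functions, and adds conversion for nn.Softmax2d/nn.Fold/nn.Unfold/F.fold/F.unfold/bitwise_left_shift/bitwise_right_shift
    • pnnx improves and matches inplace slice copy operations
    • Fused more static F.convND/F.linear into nn modules
    • Merged adjacent reshapes
    • Merged pad into conv
    • Improved dtype compatibility of the pnnx F.softmax conversion (@EdVince)
    • Fixed incorrect pnnx conversion of negative axis for softmax/normalize/slice
    • Fixed the wrong pnnx slice end index
    • Fixed pnnx-to-ncnn fp16 weight saving not accounting for alignment
    • pnnx no longer folds values to constants when it encounters dynamic sizes
    • pnnx automatically folds new_full/full_like
    • The yolov5 example supports yolov5 6.2 (@shaoshengsong)
    • Fixed compile warnings (@tpoisonooo @veahow)
    • Removed useless blank lines (@MollySophia @Menci)
    • Fixed whitespace alignment (@tonori)
    • Fixed typos (@LRY89757 @Zepan @eltociear)
    • Ignored the .xmake directory, CMakeSettings.json, and Visual Studio CMake files (@zchrissirhcz)
    • Restructured the README (@septs)
    • Improved the README layout (@magicse)
    • Added links to some example projects (@magicse @shaoshengsong)
    • Added faq content about disabling fp16 settings (@MisakaBit)
    • Updated the riscv rvv ci
    • Added c906 ci
    • Added loongarch64 lsx ci
    • Migrated some github action ci jobs to Tencent ci
    • Added a TH1520 cmake toolchain (@luyanaa)
    • Split large unit tests to speed up multi-process testing
    • Added Intel Celeron M 420 benchmark results (@MouriNaruto)
    • Added T-Head TH1520 benchmark results (@YuzukiTsuru)
    • Added rock5b rk3588 benchmark results (@hwdef)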

    New Contributors

    • @LinHeLurking made their first contribution in https://github.com/Tencent/ncnn/pull/4065
    • @septs made their first contribution in https://github.com/Tencent/ncnn/pull/4114
    • @w8501 made their first contribution in https://github.com/Tencent/ncnn/pull/4173
    • @MollySophia made their first contribution in https://github.com/Tencent/ncnn/pull/4187
    • @Menci made their first contribution in https://github.com/Tencent/ncnn/pull/4188
    • @magicse made their first contribution in https://github.com/Tencent/ncnn/pull/4193
    • @tonori made their first contribution in https://github.com/Tencent/ncnn/pull/4217
    • @YuzukiTsuru made their first contribution in https://github.com/Tencent/ncnn/pull/4240
    • @ZhangGe6 made their first contribution in https://github.com/Tencent/ncnn/pull/4236
    • @MisakaBit made their first contribution in https://github.com/Tencent/ncnn/pull/4248
    • @LiuYi-Up made their first contribution in https://github.com/Tencent/ncnn/pull/4259
    • @veahow made their first contribution in https://github.com/Tencent/ncnn/pull/4274
    • @csukuangfj made their first contribution in https://github.com/Tencent/ncnn/pull/4283
    • @Zepan made their first contribution in https://github.com/Tencent/ncnn/pull/4287
    • @bestpower made their first contribution in https://github.com/Tencent/ncnn/pull/4294
    • @shaoshengsong made their first contribution in https://github.com/Tencent/ncnn/pull/4328
    • @junchao-loongson made their first contribution in https://github.com/Tencent/ncnn/pull/4242
    • @eltociear made their first contribution in https://github.com/Tencent/ncnn/pull/4358

    Full Changelog: https://github.com/Tencent/ncnn/compare/20220729...20221128

    Source code(tar.gz)
    Source code(zip)
    ncnn-20221128-android-shared.zip(9.51 MB)
    ncnn-20221128-android-vulkan-shared.zip(14.54 MB)
    ncnn-20221128-android-vulkan.zip(19.57 MB)
    ncnn-20221128-android.zip(11.05 MB)
    ncnn-20221128-full-source.zip(20.10 MB)
    ncnn-20221128-ios-bitcode.zip(48.44 MB)
    ncnn-20221128-ios-vulkan-bitcode.zip(54.98 MB)
    ncnn-20221128-ios-vulkan.zip(12.40 MB)
    ncnn-20221128-ios.zip(11.80 MB)
    ncnn-20221128-macos-vulkan.zip(9.26 MB)
    ncnn-20221128-macos.zip(5.59 MB)
    ncnn-20221128-ubuntu-1804-shared.zip(5.02 MB)
    ncnn-20221128-ubuntu-1804.zip(23.32 MB)
    ncnn-20221128-ubuntu-2004-shared.zip(5.16 MB)
    ncnn-20221128-ubuntu-2004.zip(23.91 MB)
    ncnn-20221128-ubuntu-2204-shared.zip(5.31 MB)
    ncnn-20221128-ubuntu-2204.zip(24.62 MB)
    ncnn-20221128-webassembly.zip(2.56 MB)
    ncnn-20221128-windows-vs2015-shared.zip(6.61 MB)
    ncnn-20221128-windows-vs2015.zip(31.38 MB)
    ncnn-20221128-windows-vs2017-shared.zip(6.89 MB)
    ncnn-20221128-windows-vs2017.zip(34.46 MB)
    ncnn-20221128-windows-vs2019-shared.zip(8.46 MB)
    ncnn-20221128-windows-vs2019.zip(41.73 MB)
    ncnn-20221128-windows-vs2022-shared.zip(8.49 MB)
    ncnn-20221128-windows-vs2022.zip(41.86 MB)
  • 20220729(Jul 29, 2022)

    Build versions (default configuration): android-ndk-r24, xcode 12.4, ubuntu-18.04, ubuntu-20.04, ubuntu-22.04, vs2015, vs2017, vs2019, vs2022, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | full source code including all submodules | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, with GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, with GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, with GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, with GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, with GPU support, model conversion tools | x86 + x64 + arm + arm64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

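    • batchnorm avx512 optimization (@LRY89757)
    • Added the DeformableConv2d layer and unit tests (@miemie2013)
    • Fixed wrong results caused by scrambled weight data in conv3x3 winograd tensorcore
    • Fixed memorydata conversion of 4-dimensional data
    • pnnx converts torchvision.ops.DeformConv2d to ncnn
    • pnnx automatically removes useless mul + torch.ones and add + torch.zeros
    • Fixed a possible pnnx crash when removing useless pad with dynamic shapes
    • Fixed pnnx wrongly removing upsample with dynamic shapes
    • Added sse optimization documentation (@DC-Zhou)
    • Stricter compiler checks for riscv vector support; removed rvv-0.7.1 build support
    • Updated the android ndk path in ci; packaging now uses android-ndk-r24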

    New Contributors

    • @DC-Zhou made their first contribution in https://github.com/Tencent/ncnn/pull/4053
    • @miemie2013 made their first contribution in https://github.com/Tencent/ncnn/pull/4070

    Full Changelog: https://github.com/Tencent/ncnn/compare/20220721...20220729

    Source code(tar.gz)
    Source code(zip)
    ncnn-20220729-android-shared.zip(8.80 MB)
    ncnn-20220729-android-vulkan-shared.zip(13.79 MB)
    ncnn-20220729-android-vulkan.zip(18.46 MB)
    ncnn-20220729-android.zip(10.14 MB)
    ncnn-20220729-full-source.zip(19.45 MB)
    ncnn-20220729-ios-bitcode.zip(45.78 MB)
    ncnn-20220729-ios-vulkan-bitcode.zip(53.10 MB)
    ncnn-20220729-ios-vulkan.zip(11.93 MB)
    ncnn-20220729-ios.zip(11.07 MB)
    ncnn-20220729-macos-vulkan.zip(8.81 MB)
    ncnn-20220729-macos.zip(5.20 MB)
    ncnn-20220729-ubuntu-1804-shared.zip(4.64 MB)
    ncnn-20220729-ubuntu-1804.zip(21.43 MB)
    ncnn-20220729-ubuntu-2004-shared.zip(4.82 MB)
    ncnn-20220729-ubuntu-2004.zip(22.14 MB)
    ncnn-20220729-ubuntu-2204-shared.zip(4.94 MB)
    ncnn-20220729-ubuntu-2204.zip(22.72 MB)
    ncnn-20220729-webassembly.zip(2.44 MB)
    ncnn-20220729-windows-vs2015-shared.zip(6.35 MB)
    ncnn-20220729-windows-vs2015.zip(29.72 MB)
    ncnn-20220729-windows-vs2017-shared.zip(6.66 MB)
    ncnn-20220729-windows-vs2017.zip(32.72 MB)
    ncnn-20220729-windows-vs2019-shared.zip(8.19 MB)
    ncnn-20220729-windows-vs2019.zip(39.74 MB)
    ncnn-20220729-windows-vs2022-shared.zip(8.21 MB)
    ncnn-20220729-windows-vs2022.zip(39.84 MB)
  • 20220721(Jul 21, 2022)

    Build versions, default configuration: android-ndk-r23c, xcode 12.4, ubuntu-18.04, ubuntu-20.04, ubuntu-22.04, vs2015, vs2017, vs2019, vs2022, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodule code | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x64 + arm + arm64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

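    • armv5 convolution gemm int8 optimization
    • armv6 dsp convolution gemm int8 optimization
    • armv6 dsp convolution int8 winograd optimization
    • mips msa/loongson mmi convolution int8 winograd optimization
    • armv8.4 i8mm convolution gemm int8 optimization
    • Detect compiler support for armv8.4/armv8.6
    • Reduced memory consumption of innerproduct fp16s weight conversion
    • Unified the arm eltwise branches for different elempack
    • Fixed wrong arm rnn/gru/lstm results under multithreading
    • Fixed multithreaded runtime errors when built with android-ndk-r16b
    • The loongarch architecture is forcibly treated as mips to improve performance (@HougeLangley)
    • Fixed compile errors with very old gcc versions
    • Check for OOM when creating a Mat
    • Fixed the missing vkGetAndroidHardwareBufferPropertiesANDROID symbol when building for android api 26
    • Fixed a possible memory leak in x86 fp32 to fp16 conversion
    • pnnx supports torch 1.12
    • pnnx recognizes the torchscript file format and prints an error message
    • pnnx converts torch.tensor_split
    • pnnx merges multiple same-axis slice ops into tensor_split, with a corrected insertion position
    • pnnx removes useless 1x upsample
    • pnnx merges multiple BinaryOp into a weighted-sum Eltwise when converting to ncnn
    • pnnx merges megvii-style shufflechannel+slice
    • Added pkgconfig (@djdisodo)
    • Optimized nms post-processing in the detection examples (@jedi007)
    • Examples check the return value of model loading (@zchrissirhcz @jedi007)
    • Added the Loongson2F toolchain (@luyanaa)
    • Added the Ingenic x2000 toolchain
    • Added the ncnn svg icon (@ArchieMeng)
    • Improved the protobuf FAQ documentation (@tpoisonooo)
    • README adds ncnn-android-yolov7 (@xiang-wuu)
    • Added the yolov7 example (@cmdbug)
    • Added the yolov7_pnnx example (@hariag)
    • benchmark adds the fastestdet model (@dog-qiuqiu)
    • Added armv8.6 ci and coverage
    • Added x86 no-sse ci
    • Added x86 address sanitizer ci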
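
    The changelog above mentions NMS post-processing in the detection examples. As a reminder of what that step does, here is a minimal sketch of greedy non-maximum suppression. It is an illustration only, not the code used in the ncnn examples; the `Box` layout, the function names and the `0.45` threshold are assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x0, y0, x1, y1, score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.45) -> List[Box]:
    """Visit boxes by descending score; keep a box only if it does not
    overlap an already kept box by more than iou_threshold."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

    The loop is quadratic in the number of boxes, which is usually acceptable after a score threshold has already pruned most candidates.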

    New Contributors

    • @djdisodo made their first contribution in https://github.com/Tencent/ncnn/pull/3984
    • @jedi007 made their first contribution in https://github.com/Tencent/ncnn/pull/4001
    • @xiang-wuu made their first contribution in https://github.com/Tencent/ncnn/pull/4038
    • @ArchieMeng made their first contribution in https://github.com/Tencent/ncnn/pull/4037
    • @HougeLangley made their first contribution in https://github.com/Tencent/ncnn/pull/4044

    Full Changelog: https://github.com/Tencent/ncnn/compare/20220701...20220721

    Source code(tar.gz)
    Source code(zip)
    ncnn-20220721-android-shared.zip(8.69 MB)
    ncnn-20220721-android-vulkan-shared.zip(13.67 MB)
    ncnn-20220721-android-vulkan.zip(18.26 MB)
    ncnn-20220721-android.zip(10.04 MB)
    ncnn-20220721-full-source.zip(19.45 MB)
    ncnn-20220721-ios-bitcode.zip(45.47 MB)
    ncnn-20220721-ios-vulkan-bitcode.zip(52.90 MB)
    ncnn-20220721-ios-vulkan.zip(11.90 MB)
    ncnn-20220721-ios.zip(11.03 MB)
    ncnn-20220721-macos-vulkan.zip(8.78 MB)
    ncnn-20220721-macos.zip(5.18 MB)
    ncnn-20220721-ubuntu-1804-shared.zip(4.63 MB)
    ncnn-20220721-ubuntu-1804.zip(21.34 MB)
    ncnn-20220721-ubuntu-2004-shared.zip(4.80 MB)
    ncnn-20220721-ubuntu-2004.zip(22.06 MB)
    ncnn-20220721-ubuntu-2204-shared.zip(4.92 MB)
    ncnn-20220721-ubuntu-2204.zip(22.61 MB)
    ncnn-20220721-webassembly.zip(2.41 MB)
    ncnn-20220721-windows-vs2015-shared.zip(6.34 MB)
    ncnn-20220721-windows-vs2015.zip(29.61 MB)
    ncnn-20220721-windows-vs2017-shared.zip(6.65 MB)
    ncnn-20220721-windows-vs2017.zip(32.61 MB)
    ncnn-20220721-windows-vs2019-shared.zip(8.17 MB)
    ncnn-20220721-windows-vs2019.zip(39.60 MB)
    ncnn-20220721-windows-vs2022-shared.zip(8.18 MB)
    ncnn-20220721-windows-vs2022.zip(39.70 MB)
  • 20220701(Jul 1, 2022)

    Build versions, default configuration: android-ndk-r23c, xcode 12.4, ubuntu-18.04, ubuntu-20.04, ubuntu-22.04, vs2015, vs2017, vs2019, vs2022, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodule code | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 + arm + arm64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

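    • x86/arm/mips/risc-v/vulkan: removed useless weight memory usage
    • Improved the x86/arm/mips/risc-v winograd convolution selection strategy and split out the dot function
    • Merged the elementwise operator implementations for different elempack
    • x86 sse/avx bnll/tan optimization (@jasonZhang892)
    • x86 avx512 tanh optimization (@jasonZhang892)
    • x86 winograd input transform function optimization
    • x86 sse/avx convolution winograd23/42 pack1 optimization
    • x86 f16c innerproduct fp16s optimization
    • arm neon tan/arcsin/arccos optimization (@jasonZhang892)
    • Improved the arm sgemm convolution selection strategy
    • arm neon pooling bf16s optimization
    • arm neon innerproduct assembly optimization
    • armv7 vfpv4 compile-time probing and runtime detection
    • armv7 vfpv4 cast fp16 optimization
    • armv7 vfpv4 innerproduct fp16s optimization
    • armv5 sgemm convolution optimization
    • mips msa cast fp16 optimization
    • mips sgemm convolution/convdw3x3 optimization
    • mips msa innerproduct fp16s optimization
    • loongson mmi convolution gemm int8 optimization
    • risc-v vector erfc and gelu optimization (@thelastlin)
    • Optimized the register layout for sgemm and winograd tail sizes
    • risc-v sgemm/winograd convolution/innerproduct/convdw3x3 optimization
    • avx512bf16/avx512fp16 compile-time probing and runtime detection
    • avx512bf16/avx512fp16 cast bf16/fp16 optimization
    • armv8.2 asimdfhm, armv8.4 bf16 i8mm, armv8.6 sve sve2 compile-time probing and runtime detection
    • Added the einsum layer implementation and the corresponding pnnx conversion
    • simpleomp supports the libgomp abi
    • layernorm supports 1-D input and normalization along w
    • rnn/lstm/gru support openmp multithreaded acceleration
    • mali-t760 enables fp16 arithmetic
    • More binaryop specializations for arm/mips/riscv/x86
    • Fixed the unaryop_x86 gcc-4.4 build issue (@Yoh-Z)
    • Fixed the Mat fill gcc-4.4 build issue (@Yoh-Z)
    • Added a Power layer unit test (@proydakov)
    • The yolov5_pnnx example adapts automatically to different num_class values (@FeiGeChuanShu)
    • Fixed yolox for input shapes with w != h (@FeiGeChuanShu)
    • Fixed a bus error in fp16 to fp32 conversion on armv7 (@cugxchen)
    • Fixed the avx512 code path of convdw/deconvdw (@Yoh-Z)
    • Fixed a possible overflow of total in imreadwrite (@Z841973620)
    • Removed dead code in scale_x86 (@luyanaa)
    • Removed dead code in pnnx ir (@moozae)
    • Fixed wrong printf arguments in test_deconvolutiondepthwise3d (@Nlzy)
    • Fixed the cmake thread lookup error on the mips architecture
    • Fixed a test crash when the cooperative matrix extension is not supported
    • Improved compiler compatibility for risc-v vector vfredsum/vfredusum (@thelastlin)
    • Fixed a loop optimization regression with some arm compilers
    • Updated glslang and fixed the include path issue when using the system glslang
    • Split the arm82 sources into separate files to reduce build size and memory usage
    • Fixed the enabling of the arm82 build switch for ios universal builds
    • Unified winograd function naming
    • Fixed the padding arm compile warning
    • Fixed ios/tools/arm82/non-int8 compile warnings (@proydakov)
    • Fixed LGTM warnings (@proydakov)
    • pnnx supports converting torch bmm/min/max/einsum/arange/bitwise_and/bitwise_not/bitwise_or/bitwise_xor/eq/gather/ge/gt/le/lt/ne/norm/index_select/scatter_add/complex/imag/real/fft/fft2/fftn/hfft/hfft2/hfftn/ifft/ifft2/ifftn/ihfft/ihfft2/ihfftn/irfft/irfft2/irfftn/rfft/rfft2/rfftn, Tensor new_ones/new_zeros/masked_fill, and the 1-D case of F.normalize
    • pnnx ir supports complex data types
    • pnnx supports converting Tensor select to ncnn
    • Added a pnnx export-to-onnx function
    • pnnx export of ncnn fp16 storage is designed as a pass
    • pnnx adds more hardsigmoid fusion patterns
    • pnnx merges the trailing unpack of multiheadattention
    • pnnx folds constants effectively with static-shape inputs
    • pnnx merges static-weight convolution F.convND into nn.ConvND
    • Fixed a pnnx crash when generating slice expressions with dynamic parameters
    • pnnx supports converting models that return a dict
    • pnnx automatically adds a reshape for nn.Linear with 4d/5d input when converting to an ncnn model
    • pnnx removes single-input cat operators
    • pnnx skips foldable constants when merging expressions
    • pnnx is compatible with more inplace-style operators and improves float and integer comparison in subgraph matching
    • pnnx saves all internal weights when exporting moduleop
    • pnnx adds vit_b_32 and convnext end-to-end model tests
    • pnnx adds a swin_transformer model test
    • gitignore adds python-generated files (@triple-Mu)
    • Added the c906 and c910 v240 toolchains
    • Migrated the pnnx, loongarch and gpu ci to self-hosted servers
    • Fixed the loongarch ci
    • Added avx512 spr cpu ci
    • Updated the qemu version in ci
    • cmake install destination paths use gnuinstalldirs
    • Restricted the permissions in the github action configuration (@nathannaveen)
    • The ci swiftshader uses a single thread
    • Fixed the protobuf compatibility issue in vs2022 ci
    • Added information about turning off the android UI and setting the cpu/gpu performance mode
    • Updated the QQ group information in README (@zchrissirhcz)
    • Changed YOLOV to YOLOv in README (@zhiqwang)
    • Updated the build docs for raspberry pi and d1
    • Updated outdated information in the add-custom-operator doc (@LRY89757)
    • Added the English faq doc (@Jianbo-Ning)
    • Added the English build-for-visualstudio doc (@dankernel)
    • Added the vision_transformer benchmark (@tpoisonooo)
    • Updated rk3399 rk3288 gpu benchmark data
    • Updated qcom810 qcom855plus benchmark data
    • Added Jetson AGX Orin/Jetson AGX Xavier/AX620A benchmark data (@BUG1989)
    • Added loongson and sunway benchmark data (@wzyforgit)
    • Added RK3588 benchmark data (@FeiGeChuanShu)
    • Added amd 5700g benchmark data (@hwdef)
    • release adds ubuntu 22.04 prebuilt packages
    • release android packages are built with ndk-r23c
    • release prebuilt packages preserve symlinks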
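
    One item above says that layernorm now supports 1-D input and normalization along w. The arithmetic behind that is the standard layer normalization formula applied to a single row of values. The sketch below is a plain-Python illustration of that formula, not ncnn's implementation; the function name, the optional `gamma`/`beta` parameters and the default `eps` are assumptions chosen for the example.

```python
import math
from typing import List, Optional


def layernorm_1d(
    x: List[float],
    gamma: Optional[List[float]] = None,
    beta: Optional[List[float]] = None,
    eps: float = 1e-5,
) -> List[float]:
    """Normalize a 1-D sequence to zero mean and unit variance,
    then optionally apply a per-element affine transform."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std for v in x]
    if gamma is not None and beta is not None:
        out = [o * g + b for o, g, b in zip(out, gamma, beta)]
    return out
```

    For a 2-D or 3-D tensor normalized along w, the same function would be applied independently to each row.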

    New Contributors

    • @LRY89757 made their first contribution in https://github.com/Tencent/ncnn/pull/3741
    • @nathannaveen made their first contribution in https://github.com/Tencent/ncnn/pull/3758
    • @Z841973620 made their first contribution in https://github.com/Tencent/ncnn/pull/3757
    • @Nlzy made their first contribution in https://github.com/Tencent/ncnn/pull/3774
    • @cugxchen made their first contribution in https://github.com/Tencent/ncnn/pull/3779
    • @luyanaa made their first contribution in https://github.com/Tencent/ncnn/pull/3821
    • @triple-Mu made their first contribution in https://github.com/Tencent/ncnn/pull/3824
    • @Jianbo-Ning made their first contribution in https://github.com/Tencent/ncnn/pull/3901
    • @moozae made their first contribution in https://github.com/Tencent/ncnn/pull/3965

    Full Changelog: https://github.com/Tencent/ncnn/compare/20220420...20220701

    Source code(tar.gz)
    Source code(zip)
    ncnn-20220701-android-shared.zip(8.65 MB)
    ncnn-20220701-android-vulkan-shared.zip(13.63 MB)
    ncnn-20220701-android-vulkan.zip(18.22 MB)
    ncnn-20220701-android.zip(10.00 MB)
    ncnn-20220701-full-source.zip(19.39 MB)
    ncnn-20220701-ios-bitcode.zip(45.39 MB)
    ncnn-20220701-ios-vulkan-bitcode.zip(52.81 MB)
    ncnn-20220701-ios-vulkan.zip(11.89 MB)
    ncnn-20220701-ios.zip(11.02 MB)
    ncnn-20220701-macos-vulkan.zip(8.77 MB)
    ncnn-20220701-macos.zip(5.18 MB)
    ncnn-20220701-ubuntu-1804-shared.zip(4.63 MB)
    ncnn-20220701-ubuntu-1804.zip(21.35 MB)
    ncnn-20220701-ubuntu-2004-shared.zip(4.80 MB)
    ncnn-20220701-ubuntu-2004.zip(22.03 MB)
    ncnn-20220701-ubuntu-2204-shared.zip(4.91 MB)
    ncnn-20220701-ubuntu-2204.zip(22.59 MB)
    ncnn-20220701-webassembly.zip(2.41 MB)
    ncnn-20220701-windows-vs2015-shared.zip(6.33 MB)
    ncnn-20220701-windows-vs2015.zip(29.59 MB)
    ncnn-20220701-windows-vs2017-shared.zip(6.64 MB)
    ncnn-20220701-windows-vs2017.zip(32.56 MB)
    ncnn-20220701-windows-vs2019-shared.zip(8.16 MB)
    ncnn-20220701-windows-vs2019.zip(39.57 MB)
    ncnn-20220701-windows-vs2022-shared.zip(8.18 MB)
    ncnn-20220701-windows-vs2022.zip(39.67 MB)
  • 20220420(Apr 20, 2022)

    Build versions, default configuration: android-ndk-r21d, xcode 12.4, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, vs2022, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodule code | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

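    • conv vulkan im2col+sgemm optimization
    • conv vulkan winograd43 optimization
    • conv vulkan implicit gemm optimization
    • deconv vulkan sgemm+col2im optimization
    • conv/deconv vulkan local memory optimization
    • conv vulkan direct convolution unroll optimization
    • Improved the conv vulkan winograd23/winograd43 selection strategy
    • Fused the pad/crop before and after conv vulkan winograd into the transform
    • innerproduct vulkan split into a two-stage optimization
    • Completed conv 1x1 vulkan for arbitrary packing
    • Completed conv 3x3 winograd vulkan for arbitrary packing
    • conv/deconv vulkan pack4 nvidia tensorcore optimization
    • x86 sse/avx math function optimization (@Yoh-Z)
    • unaryop x86 optimization (@Yoh-Z)
    • floor/ceil/abs x86 sse optimization (@MouriNaruto)
    • convolution/convolutiondepthwise/innerproduct/padding/pooling/interp/eltwise/crop/reshape/slice/hardsigmoid/swish/binaryop/clip/relu/sigmoid/unaryop x86 avx512 optimization
    • conv sgemm avx512 optimization
    • conv3x3 winograd avx512 optimization
    • deconvolution/deconvolutiondepthwise x86 direct deconvolution implementation
    • softmax x86 sse/avx/avx512 optimization
    • quantize/dequantize/requantize mips msa optimization
    • conv int8/convdw int8/innerproduct int8 mips msa optimization
    • multiheadattention arm neon optimization (@EdVince)
    • softmax arm neon optimization
    • The conv3x3 winograd transform part is extracted into reusable functions
    • x86 f16c instruction set detection and dispatch
    • Removed the avx2-fp16 related code, which was of little use
    • simpleomp allows up to 32 microtask arguments
    • Added loongson mmi headers and build support
    • Added deconv1d, deconv3d and the corresponding pnnx conversion
    • Fixed the avx512 compile flags for old gcc versions
    • Fixed sigmoid x86 returning nan for very large input values
    • Fixed "unlocked pool allocator destroyed too early" occurring in gpu convdw inference
    • Avoided possible floating-point exceptions during mips msa inference
    • batchnorm avoids a division-by-zero exception when loading parameters
    • Updated modelwriter for the new operators
    • copy_make_border adds the reflect type
    • mali g31/g52 enable fp16
    • Fixed the armhf toolchain build issue
    • global pooling forces fp32 accumulation to avoid nan
    • Fixed the failure to dlsym getauxval on some android systems
    • Fixed the tanh compatibility issue with newer moltenvk versions
    • Extracted the vulkan activation functions and implemented include in glsl
    • Fixed the unit test build failure on armv7 (@jasonZhang892)
    • Fixed the conv3x3 winograd matrix comments (@MouriNaruto)
    • Fixed how-to-build typos and updated the jetson-nano build doc (@tpoisonooo)
    • Updated the ios build doc (@mirrorsysu)
    • Some comment and code cleanup and compile warning fixes (@tpoisonooo)
    • Fixed word capitalization in readme (@YoungSx)
    • Updated the glslang library list in use-ncnn-with-own-project
    • ci adds msvc arm/arm64 targets
    • ci adds the linux loongarch target
    • ci updates the windows matrix and the vs2022 target
    • Fixed vs2019 packaging
    • Added the yolov5_pnnx example
    • Added the nanodetplus_pnnx example
    • Reduced post-processing time in the yolov5 example (@UNeedCryDear)
    • Fixed the box position issue in yolov5.py (@hariag)
    • Updated the ls2k1000 benchmark data
    • pnnx supports converting torch unbind/ones/ones_like/full/full_like/randn_like/empty/empty_like/addmm
    • pnnx supports torch 1.11.0
    • ncnn model files converted by pnnx are saved in fp16
    • pnnx links pthread on linux; fixed the windows minmax build issue
    • pnnx adds a static msvc crt cmake option
    • Fixed the ncnn conversion of pnnx hardtanh parameters
    • Fixed the pnnx macos shared library load path issue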
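
    The changelog above lists a fix for sigmoid returning nan on very large inputs, without saying how it was fixed. As a general illustration of the underlying problem, the sketch below shows one common way to evaluate sigmoid without overflowing the exponential. It is not ncnn's x86 code; the function name is hypothetical.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Evaluate 1 / (1 + exp(-x)) without calling exp on a large
    positive argument, which would overflow."""
    if x >= 0.0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is below 1, so rewrite as exp(x) / (1 + exp(x)).
    e = math.exp(x)
    return e / (1.0 + e)
```

    Vectorized kernels often achieve the same effect by clamping the input to a safe range before the exponential instead of branching.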

    New Contributors

    • @MouriNaruto made their first contribution in https://github.com/Tencent/ncnn/pull/3591
    • @YoungSx made their first contribution in https://github.com/Tencent/ncnn/pull/3655
    • @hariag made their first contribution in https://github.com/Tencent/ncnn/pull/3656
    • @EdVince made their first contribution in https://github.com/Tencent/ncnn/pull/3667
    • @mirrorsysu made their first contribution in https://github.com/Tencent/ncnn/pull/3696
    • @jasonZhang892 made their first contribution in https://github.com/Tencent/ncnn/pull/3710
    • @UNeedCryDear made their first contribution in https://github.com/Tencent/ncnn/pull/3649

    Full Changelog: https://github.com/Tencent/ncnn/compare/20220216...20220420

    Source code(tar.gz)
    Source code(zip)
    ncnn-20220420-android-shared.zip(9.76 MB)
    ncnn-20220420-android-vulkan-shared.zip(17.39 MB)
    ncnn-20220420-android-vulkan.zip(17.98 MB)
    ncnn-20220420-android.zip(9.87 MB)
    ncnn-20220420-full-source.zip(18.85 MB)
    ncnn-20220420-ios-bitcode.zip(52.46 MB)
    ncnn-20220420-ios-vulkan-bitcode.zip(57.02 MB)
    ncnn-20220420-ios-vulkan.zip(12.67 MB)
    ncnn-20220420-ios.zip(12.21 MB)
    ncnn-20220420-macos-vulkan.zip(9.17 MB)
    ncnn-20220420-macos.zip(5.72 MB)
    ncnn-20220420-ubuntu-1804-shared.zip(12.54 MB)
    ncnn-20220420-ubuntu-1804.zip(21.21 MB)
    ncnn-20220420-ubuntu-2004-shared.zip(13.00 MB)
    ncnn-20220420-ubuntu-2004.zip(21.86 MB)
    ncnn-20220420-webassembly.zip(2.35 MB)
    ncnn-20220420-windows-vs2015-shared.zip(6.19 MB)
    ncnn-20220420-windows-vs2015.zip(29.28 MB)
    ncnn-20220420-windows-vs2017-shared.zip(6.37 MB)
    ncnn-20220420-windows-vs2017.zip(31.47 MB)
    ncnn-20220420-windows-vs2019-shared.zip(7.86 MB)
    ncnn-20220420-windows-vs2019.zip(38.05 MB)
    ncnn-20220420-windows-vs2022-shared.zip(7.89 MB)
    ncnn-20220420-windows-vs2022.zip(38.15 MB)
  • 20220216(Feb 16, 2022)

    Build versions, default configuration: android-ndk-r21d, xcode 12.4, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodule code | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

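    • conv sgemm pack4/pack1to4/pack4to1 x86 sse2/avx optimization
    • conv3x3s1 winograd pack4/pack4to1 x86 sse2/avx optimization
    • conv int8 gemm pack8to4/pack8to1/pack1to8 x86 xop/avx2/avx512-vnni/avx-vnni optimization
    • conv3x3s1 int8 winograd pack8to4/pack8to1 x86 xop/avx2/avx512-vnni/avx-vnni optimization
    • scale x86 avx optimization (Yoh-Z)
    • interp x86 avx optimization (Yoh-Z)
    • conv pack arm neon optimization
    • x86 avx512 infrastructure
    • x86 avx512 build and runtime detection are enabled by default
    • Decoupled x86 fma from avx2
    • x86 cpu instruction set detection without depending on libgcc
    • Support for convolution with dynamic weights
    • Fixed a possible illegal instruction caused by Mat member functions not being inlined
    • Fixed a possible illegal instruction caused by function object instances not being inlined
    • Fixed an error in the unit test comparison function (yyuzhong)
    • binaryop/unaryop/reduction support 4-D input
    • Added the Tile layer and the torch.repeat conversion
    • Added the MatMul layer and the torch.matmul conversion
    • armv8.2 dot is compiled as runtime-selectable
    • Support for the sw_64 platform (wzyforgit)
    • Added a cmake switch for the c-api
    • c-api adds a default mat constructor (tpoisonooo)
    • Simplified the binaryop function object code (tpoisonooo)
    • Fixed interp nearest computing wrong results with unconventional scale_factor parameters
    • Simplified the parameter types of the c-api custom layer forward_n
    • Removed the warning about falling back to sse2 when building without avx2 (kagurazakakotori)
    • Use _mm_cvtsi128_si64 in 64-bit builds to reduce memory access (kagurazakakotori)
    • Fixed errors in the low-level op api doc (FeiGeChuanShu)
    • Fixed the missing doffset parameter in the crop test (xh-liu-tech)
    • Fixed arm convolution pack1to4 int8 weight reordering (cmdbug)
    • Simplified the platform-related macros in get_current_time (cmdbug)
    • Fixed wrong results when building for armv7 without neon
    • Added the c906 v223 toolchain (zchrissirhcz)
    • Added the answer for the second QQ technical discussion group (LJoson)
    • python ci disables building tools and examples
    • ci shared library builds disable LTO
    • ci updates to swiftshader-20220211
    • Removed travis ci and the related readme entries (proydakov)
    • Added the yolo-fastest model benchmark (dog-qiuqiu)
    • Updated raspberry pi/jetson-nano and other benchmark data from Q-engineering
    • benchmark adds zynq-7020/z8350/n5105
    • pnnx supports converting torch dequantize/quantize_per_tensor/quantized.linearrelu/argmax/argmin/clone/normal/expand/var/amax/amin/logsumexp/prod/sum/arange/matmul/zeros_like/expand_like/deformconv2d/roialign/norm/stack/repeat/zeros/roll/remainder
    • pnnx automatically removes dropout operators
    • pnnx automatically removes pad with no pads and noop arithmetic expressions
    • pnnx constant folding
    • pnnx converts 4-D constant data
    • pnnx supports models exported with the half data type
    • pnnx removes trailing reshape/permute when converting to ncnn
    • pnnx merges conv1d-bn and convtranspose1d-bn
    • pnnx merges a full select along a single dimension into unbind
    • pnnx ensures that operator names are unique
    • Fixed a pnnx crash on expressions that cannot be expanded when converting to ncnn
    • pnnx to ncnn supports F.pad with negative pads
    • pnnx to ncnn merges transpose-matmul
    • pnnx to ncnn adds dimension-raising and dimension-reducing reshapes around pooling123d to mimic how nn.MaxPool123d handles data without a batch dimension
    • The shape argument of the pnnx command line specifies the input type
    • pnnx automatically finds the pytorch install directory (Yutyrannus)
    • pnnx ci automatically copies dll files (Yutyrannus)
    • Added usage notes for the pnnx command line tool (ling0322)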

    New Contributors

    • @wzyforgit made their first contribution in https://github.com/Tencent/ncnn/pull/3421
    • @dog-qiuqiu made their first contribution in https://github.com/Tencent/ncnn/pull/3470
    • @xh-liu-tech made their first contribution in https://github.com/Tencent/ncnn/pull/3475
    • @ling0322 made their first contribution in https://github.com/Tencent/ncnn/pull/3487
    • @kagurazakakotori made their first contribution in https://github.com/Tencent/ncnn/pull/3527
    • @LJoson made their first contribution in https://github.com/Tencent/ncnn/pull/3532
    • @Yoh-Z made their first contribution in https://github.com/Tencent/ncnn/pull/3540
    • @yyuzhong made their first contribution in https://github.com/Tencent/ncnn/pull/3556

    Full Changelog: https://github.com/Tencent/ncnn/compare/20211208...20220216

    Source code(tar.gz)
    Source code(zip)
    ncnn-20220216-android-shared.zip(9.06 MB)
    ncnn-20220216-android-vulkan-shared.zip(16.52 MB)
    ncnn-20220216-android-vulkan.zip(16.89 MB)
    ncnn-20220216-android.zip(8.95 MB)
    ncnn-20220216-full-source.zip(18.54 MB)
    ncnn-20220216-ios-bitcode.zip(49.16 MB)
    ncnn-20220216-ios-vulkan-bitcode.zip(54.83 MB)
    ncnn-20220216-ios-vulkan.zip(12.09 MB)
    ncnn-20220216-ios.zip(11.31 MB)
    ncnn-20220216-macos-vulkan.zip(8.64 MB)
    ncnn-20220216-macos.zip(5.27 MB)
    ncnn-20220216-ubuntu-1804-shared.zip(11.46 MB)
    ncnn-20220216-ubuntu-1804.zip(19.34 MB)
    ncnn-20220216-ubuntu-2004-shared.zip(11.90 MB)
    ncnn-20220216-ubuntu-2004.zip(20.05 MB)
    ncnn-20220216-webassembly.zip(2.23 MB)
    ncnn-20220216-windows-vs2015-shared.zip(5.91 MB)
    ncnn-20220216-windows-vs2015.zip(27.65 MB)
    ncnn-20220216-windows-vs2017-shared.zip(5.86 MB)
    ncnn-20220216-windows-vs2017.zip(28.47 MB)
    ncnn-20220216-windows-vs2019-shared.zip(5.97 MB)
    ncnn-20220216-windows-vs2019.zip(29.05 MB)
  • 20211208(Dec 8, 2021)

    Build versions, default configuration: android-ndk-r21d, xcode 12.4, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodule code | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

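    • The Mat data structure supports 4 dimensions
    • Added Convolution3D, Pooling3D and the corresponding pnnx operator conversion
    • These operators support 4-D input and output (Cast, Packing, ReLU, BatchNorm, Reshape, Flatten, Permute, Crop), with the corresponding pnnx operator conversion
    • The C api adds a 4-D mat
    • Convolution1D regular simd optimization (sse/avx/neon/rvv/msa)
    • Reduced cpu usage during gpu inference
    • Reduced cpu usage of the unit tests
    • Improved batch axis recognition in the pnnx to ncnn conversion
    • Updated the operators doc
    • Fixed the system opencv still being searched for when simpleocv is enabled (zchrissirhcz)
    • Fixed a drawing bug in the p2pnet example (FeiGeChuanShu)
    • Support for the new c906 v2.2.2 toolchain
    • A more reliable ci job cancellation mechanism
    • ci adds avx512 and nvidia t4
    • Fixed the python wheel release script
    • Updated the ci lavapipe version (ljtjerry)
    • Updated ci webassembly to support nodejs v16
    • Updated the FAQ (zhaqu, Bright476, Rinfair-CSP-A016)
    • Fixed typos (cmdbug)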
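
    The first item above adds a fourth dimension to Mat. To make concrete what a 4-D tensor layout involves, here is a minimal sketch of flat indexing into a densely packed 4-D buffer. The dimension names `w`, `h`, `d`, `c` and the dense packing are assumptions of this sketch; a real implementation may pad each channel for alignment, which this example ignores.

```python
def flat_index(x: int, y: int, z: int, q: int,
               w: int, h: int, d: int, c: int) -> int:
    """Offset of element (x, y, z, q) in a dense buffer of shape
    (c, d, h, w), where x varies fastest and q slowest."""
    if not (0 <= x < w and 0 <= y < h and 0 <= z < d and 0 <= q < c):
        raise IndexError("coordinate out of range")
    return ((q * d + z) * h + y) * w + x
```

    With `w=2, h=3, d=4, c=5` the buffer holds 120 elements, and the last coordinate `(1, 2, 3, 4)` maps to offset 119.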

    New Contributors

    • @zhaqu made their first contribution in https://github.com/Tencent/ncnn/pull/3374
    • @ljtjerry made their first contribution in https://github.com/Tencent/ncnn/pull/3387
    • @Rinfair-CSP-A016 made their first contribution in https://github.com/Tencent/ncnn/pull/3399

    Full Changelog: https://github.com/Tencent/ncnn/compare/20211122...20211208

    Source code(tar.gz)
    Source code(zip)
    ncnn-20211208-android-shared.zip(8.12 MB)
    ncnn-20211208-android-vulkan-shared.zip(15.54 MB)
    ncnn-20211208-android-vulkan.zip(15.67 MB)
    ncnn-20211208-android.zip(7.78 MB)
    ncnn-20211208-full-source.zip(18.21 MB)
    ncnn-20211208-ios-bitcode.zip(51.21 MB)
    ncnn-20211208-ios-vulkan-bitcode.zip(58.52 MB)
    ncnn-20211208-ios-vulkan.zip(12.53 MB)
    ncnn-20211208-ios.zip(11.15 MB)
    ncnn-20211208-macos-vulkan.zip(8.45 MB)
    ncnn-20211208-macos.zip(5.15 MB)
    ncnn-20211208-ubuntu-1804-shared.zip(9.08 MB)
    ncnn-20211208-ubuntu-1804.zip(15.31 MB)
    ncnn-20211208-ubuntu-2004-shared.zip(9.43 MB)
    ncnn-20211208-ubuntu-2004.zip(15.83 MB)
    ncnn-20211208-webassembly.zip(2.07 MB)
    ncnn-20211208-windows-vs2015-shared.zip(5.36 MB)
    ncnn-20211208-windows-vs2015.zip(24.76 MB)
    ncnn-20211208-windows-vs2017-shared.zip(5.22 MB)
    ncnn-20211208-windows-vs2017.zip(24.36 MB)
    ncnn-20211208-windows-vs2019-shared.zip(5.27 MB)
    ncnn-20211208-windows-vs2019.zip(24.65 MB)
  • 20211122(Nov 22, 2021)

    Build versions, default configuration: android-ndk-r21d, xcode 12.4, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodule code | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

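    • PNNX (PyTorch Neural Network Exchange) is a new way to deploy PyTorch models; it bypasses the ONNX middleman and exports fairly clean high-level ops
    • risc-v v binaryop, hardswish, hardsigmoid, prelu, selu, dropout, gru, softmax optimization (thelastlin)
    • risc-v v conv1x1 fc optimization
    • arm neon requantize leakyrelu optimization
    • arm neon innerproduct gemm int8 optimization
    • sgemm pack optimization for c906 (yaobyPerfxlab, xianyi)
    • x86 avx convolution activation optimization (zhiliu6)
    • x86 sse convolution, convolutiondepthwise, pooling optimization (Timen)
    • Fixed the layernorm affine computation error
    • Fixed the pooling adaptive computation error
    • Fixed the deconvolution output padding computation error when a bias is present
    • interp supports cubic aligncorner interpolation
    • interp supports stretching 2-D data along w
    • Added convolutiondepthwise1d and its pnnx conversion
    • rnn/lstm/gru support unequal numbers of inputs and outputs
    • Fixed the handling of axes in the squeeze and expanddims layers
    • Use integer arithmetic to compute the lower and upper bounds of the pooling adaptive parameters (Yutyrannus)
    • Fixed armv7 neon round mode differences
    • Fixed x86 sse/avx round mode differences
    • Fixed a possible out-of-bounds read in the int8 input unit tests
    • Fixed the failure to obtain the auxv variable on some android platforms
    • Fixed the wrong detection of the armv8.2 dot extension on apple a11 a12
    • Loading a model from a memory reference no longer copies it into memory
    • Fixed the unaligned copy error when converting numpy to Mat in pyncnn
    • Correctly detect and support apple a15 and m1 (zchrissirhcz)
    • Fixed the AVX-only code and the unit test logic when the user provides opt (Timen)
    • The hardswish activation is fused into convolution and innerproduct (zhiliu6)
    • Automatically decouple extracted Mat data from the memory pool of the Net instance
    • Net's custom_layer_to_index moved to public (Timen)
    • Removed unused variables in the code (Sinky-Yan)
    • cmake detects the xtensa architecture of esp32
    • cmake install installs the ncnn tools (jinmingyi1998)
    • Fixed the hardswish test beta parameter (zhiliu6)
    • Fixed ncnnoptimize failing to generate reasonable int8 weights
    • ncnnoptimize supports the embd layer
    • Fixed the onnx2ncnn conversion of concat with a negative axis
    • Fixed the onnx2ncnn merging of the expand operator (grimoire)
    • Fixed out-of-bounds reads in some arm kernels
    • Fixed the build with NCNN_STDIO=OFF
    • Added the YOLOX example and updated the preprocessing logic (FateScript)
    • Added the RobustVideoMatting example (FeiGeChuanShu)
    • Added the scrfd crowdhuman example (MarsTechHAN)
    • Added the YOLOv5 v6.0 example (zhiliu6)
    • Added the CrowdCounting-P2PNet example (FeiGeChuanShu)
    • readme adds yolox (Sinky-Yan)
    • Updated the readme (fzyzcjy)
    • Fixed msvc compiler warnings (TianZerL)
    • Some typo fixes (cmdbug, huoshuai-dot)
    • Updated the faq doc (ncnnnnn, luqiang-guo, zhiqwang, cmdbug, CharlesHuan, Shiro-Nana, zmq175)
    • Updated the operators doc (soragotosann)
    • Updated the d1 and ls2k build docs
    • Added the termux build doc (Sinky-Yan)
    • Updated the msvc build doc (ncnnnnn)
    • Updated the build docs (dankernel, mlbo, xiguadong)
    • Updated the macos openmp installation method (zhiqwang)
    • Updated the links in the quantization doc (ShiquanYu)
    • Fixed the path error in the python build doc (nixondutt)
    • benchmark adds m1 data (zhiqwang)
    • benchmark adds mbp data (AnnYellow)
    • benchmark adds khadas vim3 amlogic a311d data (elejke, FeiGeChuanShu)
    • benchmark adds Phytium FT-2000+/64 data
    • benchmark adds RK3568 data (BowShotDS)
    • benchmark adds RK3328 data (Liuyufanlyf)
    • benchmark adds Ingenic X2000 and T40 data (MarsTechHAN)
    • ci updates swiftshader
    • ci adds lavapipe-based gpu tests
    • ci removes travis arm32 (Richuanwu)
    • ci updates the xcode version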
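
    One item above switches the computation of adaptive pooling bounds to integer arithmetic. The release notes do not give the formula, so the sketch below uses the widely used convention where output cell `i` covers input positions from `floor(i * in / out)` up to, but not including, `ceil((i + 1) * in / out)`, computed without floating point. The function name is hypothetical, and whether ncnn uses exactly this convention is an assumption.

```python
from typing import Tuple


def adaptive_window(i: int, in_size: int, out_size: int) -> Tuple[int, int]:
    """Half-open input range [start, end) pooled into output cell i,
    using only integer arithmetic."""
    start = (i * in_size) // out_size                       # floor(i * in / out)
    end = ((i + 1) * in_size + out_size - 1) // out_size    # ceil((i + 1) * in / out)
    return start, end
```

    For `in_size=10` and `out_size=3` this yields the windows `(0, 4)`, `(3, 7)` and `(6, 10)`. Integer division avoids the rounding differences that floating-point `floor` and `ceil` can produce across platforms.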

    New Contributors

    • @SinKy-Yan made their first contribution in https://github.com/Tencent/ncnn/pull/3124
    • @FateScript made their first contribution in https://github.com/Tencent/ncnn/pull/3110
    • @BowShotDS made their first contribution in https://github.com/Tencent/ncnn/pull/3145
    • @Liuyufanlyf made their first contribution in https://github.com/Tencent/ncnn/pull/3164
    • @yaobyPerfxlab made their first contribution in https://github.com/Tencent/ncnn/pull/3159
    • @TianZerL made their first contribution in https://github.com/Tencent/ncnn/pull/3188
    • @grimoire made their first contribution in https://github.com/Tencent/ncnn/pull/3189
    • @dankernel made their first contribution in https://github.com/Tencent/ncnn/pull/3248
    • @Richuanwu made their first contribution in https://github.com/Tencent/ncnn/pull/3279
    • @ShiquanYu made their first contribution in https://github.com/Tencent/ncnn/pull/3283
    • @nixondutt made their first contribution in https://github.com/Tencent/ncnn/pull/3293
    • @mlbo made their first contribution in https://github.com/Tencent/ncnn/pull/3314
    • @luqiang-guo made their first contribution in https://github.com/Tencent/ncnn/pull/3332
    • @Yutyrannus made their first contribution in https://github.com/Tencent/ncnn/pull/3333
    • @xiguadong made their first contribution in https://github.com/Tencent/ncnn/pull/3344
    • @soragotosann made their first contribution in https://github.com/Tencent/ncnn/pull/3345
    • @huoshuai-dot made their first contribution in https://github.com/Tencent/ncnn/pull/3348
    • @fzyzcjy made their first contribution in https://github.com/Tencent/ncnn/pull/3358
    • @CharlesHuan made their first contribution in https://github.com/Tencent/ncnn/pull/3361
    • @Shiro-Nana made their first contribution in https://github.com/Tencent/ncnn/pull/3368
    • @zmq175 made their first contribution in https://github.com/Tencent/ncnn/pull/3369
    • @AnnYellow made their first contribution in https://github.com/Tencent/ncnn/pull/3373

    Full Changelog: https://github.com/Tencent/ncnn/compare/20210720...20211122

    Source code(tar.gz)
    Source code(zip)
    ncnn-20211122-android-shared.zip(7.84 MB)
    ncnn-20211122-android-vulkan-shared.zip(15.20 MB)
    ncnn-20211122-android-vulkan.zip(15.21 MB)
    ncnn-20211122-android.zip(7.40 MB)
    ncnn-20211122-full-source.zip(18.05 MB)
    ncnn-20211122-ios-bitcode.zip(49.11 MB)
    ncnn-20211122-ios-vulkan-bitcode.zip(56.83 MB)
    ncnn-20211122-ios-vulkan.zip(12.17 MB)
    ncnn-20211122-ios.zip(10.72 MB)
    ncnn-20211122-macos-vulkan.zip(8.17 MB)
    ncnn-20211122-macos.zip(4.89 MB)
    ncnn-20211122-ubuntu-1804-shared.zip(8.82 MB)
    ncnn-20211122-ubuntu-1804.zip(14.83 MB)
    ncnn-20211122-ubuntu-2004-shared.zip(9.09 MB)
    ncnn-20211122-ubuntu-2004.zip(15.23 MB)
    ncnn-20211122-webassembly.zip(1.99 MB)
    ncnn-20211122-windows-vs2015-shared.zip(5.25 MB)
    ncnn-20211122-windows-vs2015.zip(23.96 MB)
    ncnn-20211122-windows-vs2017-shared.zip(5.09 MB)
    ncnn-20211122-windows-vs2017.zip(23.21 MB)
    ncnn-20211122-windows-vs2019-shared.zip(5.04 MB)
    ncnn-20211122-windows-vs2019.zip(23.13 MB)
  • 20210720(Jul 20, 2021)

    Build versions, default configuration: android-ndk-r21d, xcode 12.2, ubuntu-16.04, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodule code | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

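    • Operator optimizations for the vector 0.7.1 and 1.0 extensions of risc-v cpus, covering fp32/fp16, with variable vlen support and runtime detection of the vector and half-precision extensions with function dispatch (absval, cast, clip, concat, convolution, convolutiondepthwise, crop, deconvolution, deconvolutiondepthwise, flatten, innerproduct, interp, mish, packing, padding, pooling, relu, sigmoid, swish, tanh, unaryop)
    • Operator optimizations for the msa extension of mips cpus, with runtime detection of the msa extension and function dispatch (absval, bias, binaryop, clip, concat, convolution, convolutiondepthwise, crop, deconvolution, deconvolutiondepthwise, dropout, eltwise, flatten, hardsigmoid, hardswish, innerproduct, interp, mish, packing, padding, pooling, prelu, relu, sigmoid, slice, softmax, swish, tanh, unaryop)
    • Runtime avx detection, improving performance on avx-only platforms (Timen)
    • Support for building on the loongarch64 architecture (tsuibin)
    • armv8.2 dot is always enabled on apple devices
    • The examples can be built with simpleocv, without depending on opencv
    • Improved building with clang under visual studio (Timen)
    • Added the cmake option NCNN_BF16, which can disable all bf16-related code to reduce library size
    • Major update to the operators doc
    • arm neon math function fma optimization
    • arm neon tanh optimization (deepage)
    • AbsVal/ReLU risc-v vector optimization (thelastlin)
    • Fixed the x86 requantize pack4to1 computation error
    • Fixed the innerproduct gemm int8 computation error (lsdustc)
    • Fixed the wrong uv transform matrix in warpaffine_bilinear_yuv420sp (DaydreamCoding)
    • Fixed an out-of-bounds read in yuv420sp2rgb on armv7 (zchrissirhcz)
    • Fixed a memory leak in vulkan push_constant encoding (chenxiemin)
    • darknet2ncnn supports the sam layer and letter_box detection (zhiliu6)
    • Fixed the darknet2ncnn pad=0 conversion error (zhiliu6)
    • Fixed ncnn2table parsing of long numbers on the command line
    • Improved ncnn2table multithreading efficiency
    • ncnn2table supports dynamic input (jinmingyi1998)
    • Fixed the file index error in the ncnn2table easyquant method (lsdustc)
    • Fixed ncnnoptimize losing interp parameters (jinmingyi1998)
    • Fixed a possible segfault in ncnnoptimize after replacing a conv with an fc operator
    • Fixed a possible segfault in ncnnoptimize when it encounters isolated nodes
    • Fixed the macos ci swiftshader build
    • Fixed the TypeError assertion in test_extractor.py (zhiqwang)
    • Fixed macos compile warnings (proydakov)
    • risc-v ci upgraded to qemu 6.0.0 with rvv extension support
    • ci now filters changed files to reduce triggers
    • Added the c906 toolchain
    • Updated the Chinese visual studio build doc (zchrissirhcz)
    • Fixed the video loading error in the yolov4 example (uniartisan)
    • readme adds the pocky group number (JuYanYan)
    • readme adds scrfd (ncnnnnn)
    • Added issue templates (tpoisonooo)
    • Some typo fixes (hwdef)
    • benchmark adds the nanodet_m model (BUG1989)
    • benchmark adds v1605b data (kalcohol)
    • benchmark adds loongson 2k1000 data
    • benchmark updates jetson agx data (zineos)
    • Added code formatting ci and disabled restyled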

    New Contributors

    • @JuYanYan made their first contribution in https://github.com/Tencent/ncnn/pull/2956
    • @uniartisan made their first contribution in https://github.com/Tencent/ncnn/pull/3005
    • @hwdef made their first contribution in https://github.com/Tencent/ncnn/pull/3045
    • @DaydreamCoding made their first contribution in https://github.com/Tencent/ncnn/pull/3048
    • @sdli1995 made their first contribution in https://github.com/Tencent/ncnn/pull/3081
    • @chenxiemin made their first contribution in https://github.com/Tencent/ncnn/pull/3088
    • @tsuibin made their first contribution in https://github.com/Tencent/ncnn/pull/3094

    Full Changelog: https://github.com/Tencent/ncnn/compare/20210525...20210720

    Source code(tar.gz)
    Source code(zip)
    ncnn-20210720-android-shared.zip(7.75 MB)
    ncnn-20210720-android-vulkan-shared.zip(15.12 MB)
    ncnn-20210720-android-vulkan.zip(15.12 MB)
    ncnn-20210720-android.zip(7.31 MB)
    ncnn-20210720-full-source.zip(17.32 MB)
    ncnn-20210720-ios-bitcode.zip(39.22 MB)
    ncnn-20210720-ios-vulkan-bitcode.zip(48.75 MB)
    ncnn-20210720-ios-vulkan.zip(10.72 MB)
    ncnn-20210720-ios.zip(8.88 MB)
    ncnn-20210720-macos-vulkan.zip(7.38 MB)
    ncnn-20210720-macos.zip(4.11 MB)
    ncnn-20210720-ubuntu-1604-shared.zip(8.40 MB)
    ncnn-20210720-ubuntu-1604.zip(13.93 MB)
    ncnn-20210720-ubuntu-1804-shared.zip(8.60 MB)
    ncnn-20210720-ubuntu-1804.zip(14.48 MB)
    ncnn-20210720-ubuntu-2004-shared.zip(8.86 MB)
    ncnn-20210720-ubuntu-2004.zip(14.82 MB)
    ncnn-20210720-webassembly.zip(1.93 MB)
    ncnn-20210720-windows-vs2015-shared.zip(5.07 MB)
    ncnn-20210720-windows-vs2015.zip(22.95 MB)
    ncnn-20210720-windows-vs2017-shared.zip(4.91 MB)
    ncnn-20210720-windows-vs2017.zip(22.23 MB)
    ncnn-20210720-windows-vs2019-shared.zip(4.88 MB)
    ncnn-20210720-windows-vs2019.zip(22.27 MB)
  • 20210525(May 25, 2021)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, xcode 12.2, ubuntu-16.04, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8.

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodules | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

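    • Drawing api: rectangles, circles, lines, text
    • simpleocv reads and writes image files via stb_image
    • Runtime detection of armv8.2 dotprod
    • Quantization tools can still be built when opencv is not found
    • Tasks can be submitted in parallel beyond the gpu hardware queue count limit
    • ncnnoptimize automatically generates a random bin file
    • Fixed misaligned convolutiondepthwise int8 weights after ncnnoptimize
    • Extracted int8 data is automatically converted to fp32
    • Added interface for getting the model's input and output blobs (caishanli)
    • Published prebuilt python packages (caishanli)
    • Updated imx7d benchmark
    • risc-v v optimized math functions
    • Runtime detection of risc-v vlen
    • mlir2ncnn fuses swish
    • benchmark: added efficientnetv2_b0
    • Fixed the jetson cmake toolchain wrongly defining the android macro
    • Fixed macos/ios glslang framework packaging
    • webassembly prebuilt package enables simpleocv
    • Added scrfd face detection example
    • Added scrfd face detection screenshot (ZHEQIUSHUI)
    • Updated FAQ (deepage)
    • Fixed typos (FusionBolt)
    • Added download count badge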

    New Contributors

    • @FusionBolt made their first contribution in https://github.com/Tencent/ncnn/pull/2922
    • @ZHEQIUSHUI made their first contribution in https://github.com/Tencent/ncnn/pull/2925

    Full Changelog: https://github.com/Tencent/ncnn/compare/20210507...20210525

    Source code(tar.gz)
    Source code(zip)
    ncnn-20210525-android-shared.zip(7.07 MB)
    ncnn-20210525-android-vulkan-shared.zip(14.40 MB)
    ncnn-20210525-android-vulkan.zip(14.20 MB)
    ncnn-20210525-android.zip(6.43 MB)
    ncnn-20210525-full-source.zip(17.04 MB)
    ncnn-20210525-ios-bitcode.zip(45.00 MB)
    ncnn-20210525-ios-vulkan-bitcode.zip(54.28 MB)
    ncnn-20210525-ios-vulkan.zip(11.61 MB)
    ncnn-20210525-ios.zip(9.73 MB)
    ncnn-20210525-macos-vulkan.zip(7.67 MB)
    ncnn-20210525-macos.zip(4.40 MB)
    ncnn-20210525-ubuntu-1604-shared.zip(7.51 MB)
    ncnn-20210525-ubuntu-1604.zip(12.45 MB)
    ncnn-20210525-ubuntu-1804-shared.zip(7.65 MB)
    ncnn-20210525-ubuntu-1804.zip(12.83 MB)
    ncnn-20210525-ubuntu-2004-shared.zip(7.87 MB)
    ncnn-20210525-ubuntu-2004.zip(13.14 MB)
    ncnn-20210525-webassembly.zip(1.90 MB)
    ncnn-20210525-windows-vs2015-shared.zip(4.80 MB)
    ncnn-20210525-windows-vs2015.zip(21.16 MB)
    ncnn-20210525-windows-vs2017-shared.zip(4.72 MB)
    ncnn-20210525-windows-vs2017.zip(20.74 MB)
    ncnn-20210525-windows-vs2019-shared.zip(4.66 MB)
    ncnn-20210525-windows-vs2019.zip(20.90 MB)
  • 20210507(May 7, 2021)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, xcode 12.2, ubuntu-16.04, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8.

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodules | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |

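    • Refactored int8 packing with automatic packing conversion; greatly optimized the int8 arm neon convolution implementation; armv8.2 dot instruction acceleration
    • Rewrote the ncnn quantization tool: supports the three strategies kl/aciq/easyquant, with better multithreading efficiency
    • ncnn2int8 fuses quantize-activation-dequantize into requantize offline, achieving end-to-end int8
    • Removed the cmake NCNN_REQUANT build option that was used for online requantize fusion
    • conv/fc int8 supports all activation combinations
    • conv3x3s1 pack8 direct convolution x86 avx optimization
    • conv3x3 conv1x1 pack8 x86 optimization (zhiliu6)
    • instancenorm arm neon optimization
    • pixelshuffle arm neon optimization
    • Runtime detection of the risc-v cpu v extension, with function dispatch
    • risc-v v optimization of the log/exp/sin/cos/tanh math functions
    • risc-v v optimization of packing/sigmoid/mish/tanh/swish/unaryop/cast
    • Shared library exports the vkcommand and vkcompute api
    • binaryop adds three broadcast types
    • New cmake build option NCNN_INT8, which can disable all int8 inference code
    • find_blob_index_by_name errors carry a more useful hint
    • New query interfaces for detecting cpu support of the risc-v v and fp16 extensions
    • Added c906 build support
    • Removed the legacy fp16 and int8 quantization features from caffe2ncnn
    • Fixed arm neon computation error in conv1x1s1 sgemm pack8to4 fp16sa with bias
    • Fixed lrn vulkan computation error caused by insufficient fp16 precision
    • Fixed ncnn2int8 failing on overly long lines
    • Fixed being unable to input and extract the same blob when using gpu
    • Fixed initialization error on macos with integrated graphics
    • Fixed tanh neon computation error
    • Fixed illegal instruction on Raspberry Pi when built with NCNN_ARM82
    • Fixed caffe2ncnn parameter conversion for 1-d and 2-d input layers
    • Fixed conversion failure for onnx clip without max
    • mxnet2ncnn supports channel pad
    • Improved mxnet2ncnn compatibility with models that contain blobs of the same name
    • Sorted ncnn_add_layer_test (ncnnnnn)
    • Added comments on the default parameters of hardsigmoid/hardswish (songqun)
    • python packaging drops ppc64le (caishanli)
    • Fixed build with old gcc versions
    • Fixed build with NCNN_STRING=OFF (zhiliu6)
    • Added Apple M1 benchmark (DaChengTechnology)
    • Fixed documentation typos (cmdbug, proydakov)
    • Updated the ncnn quantized inference documentation
    • Updated operation-param-weight-table (cmdbug)
    • Added new faq documents (wwdok, zchrissirhcz, runrunrun1994, wblksheep, DaChengTechnology, ncnnnnn, 791136190, mmiirroo, cmdbug)
    • Removed the cctools section from the build documentation
    • restyled bot no longer triggers ci
    • Updated codecov version
    • Updated the risc-v ci toolchain, supporting the latest rvv-1.0
    • Added code coverage for risc-v
    • Updated qcom855+ benchmark data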
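
The offline fusion of quantize-activation-dequantize into requantize mentioned above can be illustrated with a small standalone sketch. It is not ncnn code: the function names, the symmetric `q = round(x * scale)` convention with clamping to [-127, 127], and the choice of ReLU as the activation are all assumptions made for illustration.

```python
def quantize(x, scale):
    """Map a float to int8 with symmetric scaling, clamped to [-127, 127] (assumed convention)."""
    q = int(round(x * scale))
    return max(-127, min(127, q))


def dequantize(acc, scale_in, scale_w):
    """Turn an int32 accumulator (int8 input times int8 weight) back into a float."""
    return acc / (scale_in * scale_w)


def relu(x):
    return x if x > 0.0 else 0.0


def unfused(acc, scale_in, scale_w, scale_out):
    """dequantize -> activation -> quantize as three separate steps."""
    return quantize(relu(dequantize(acc, scale_in, scale_w)), scale_out)


def requantize(acc, scale_in, scale_w, scale_out):
    """The same computation folded into a single multiply per element."""
    combined = scale_out / (scale_in * scale_w)
    # ReLU commutes with a positive scale, so it can act on the accumulator directly.
    return quantize(float(max(acc, 0)), combined)


if __name__ == "__main__":
    for acc in (1234, -500, 50000):
        print(acc, unfused(acc, 20.0, 50.0, 10.0), requantize(acc, 20.0, 50.0, 10.0))
```

Because the three scales collapse into one constant that can be computed ahead of time, the intermediate float tensor between two int8 layers is never materialized, which is one way to read "end-to-end int8" in the note above.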

    New Contributors

    • @wwdok made their first contribution in https://github.com/Tencent/ncnn/pull/2829
    • @runrunrun1994 made their first contribution in https://github.com/Tencent/ncnn/pull/2831
    • @wblksheep made their first contribution in https://github.com/Tencent/ncnn/pull/2835
    • @mmiirroo made their first contribution in https://github.com/Tencent/ncnn/pull/2851
    • @xianyi made their first contribution in https://github.com/Tencent/ncnn/pull/2868

    Full Changelog: https://github.com/Tencent/ncnn/compare/20210322...20210507

    Source code(tar.gz)
    Source code(zip)
    ncnn-20210507-android-shared.zip(6.42 MB)
    ncnn-20210507-android-vulkan-shared.zip(13.73 MB)
    ncnn-20210507-android-vulkan.zip(13.32 MB)
    ncnn-20210507-android.zip(5.57 MB)
    ncnn-20210507-full-source.zip(16.89 MB)
    ncnn-20210507-ios-bitcode.zip(35.57 MB)
    ncnn-20210507-ios-vulkan-bitcode.zip(37.71 MB)
    ncnn-20210507-ios-vulkan.zip(8.16 MB)
    ncnn-20210507-ios.zip(7.96 MB)
    ncnn-20210507-macos-vulkan.zip(5.58 MB)
    ncnn-20210507-macos.zip(3.64 MB)
    ncnn-20210507-ubuntu-1604-shared.zip(7.29 MB)
    ncnn-20210507-ubuntu-1604.zip(8.10 MB)
    ncnn-20210507-ubuntu-1804-shared.zip(7.43 MB)
    ncnn-20210507-ubuntu-1804.zip(8.27 MB)
    ncnn-20210507-ubuntu-2004-shared.zip(7.64 MB)
    ncnn-20210507-ubuntu-2004.zip(8.48 MB)
    ncnn-20210507-webassembly.zip(1.63 MB)
    ncnn-20210507-windows-vs2015-shared.zip(4.55 MB)
    ncnn-20210507-windows-vs2015.zip(15.89 MB)
    ncnn-20210507-windows-vs2017-shared.zip(4.52 MB)
    ncnn-20210507-windows-vs2017.zip(15.65 MB)
    ncnn-20210507-windows-vs2019-shared.zip(4.47 MB)
    ncnn-20210507-windows-vs2019.zip(15.82 MB)
  • 20210322(Mar 22, 2021)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, xcode 12.2, ubuntu-16.04, ubuntu-18.04, ubuntu-20.04, vs2017, vs2019.

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodules | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads |

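    • arm neon optimization of the warpaffine functions
    • Added multiheadattention and its conversion from pytorch
    • Added gelu (RBelogorodtsevFBase)
    • Added flush denormal option, enabled by default (leeys888)
    • Added adaptive_pooling vulkan implementation (zylo117)
    • A net-local memory pool is used by default
    • The number of big cpu cores is used as the default thread count
    • fp16s/int8s enabled on newer adreno/mali gpus
    • Fixed int8 armv7 conv1x1s1 requant computation error without neon
    • conv3x3s1 winograd42 arm neon optimization
    • arm neon optimization of the general convolution im2col sgemm
    • General AVX2 convolution optimization (zhiliu6)
    • Optimized ssd detectionoutput candidate box computation (WeiChungChang)
    • Bridged image upload/download is used on old adreno drivers
    • Fixed inplace forward (gdh1995)
    • Eliminated unnecessary data copies when NCNN_BENCHMARK is enabled (yx9527)
    • Improved conversion of megvii-style shufflechannel
    • onnx2ncnn sorts topologically automatically
    • Improved onnx2ncnn layernorm conversion
    • ncnnoptimize can split models (chentyjpm)
    • mlir2ncnn can set custom llvm/mlir paths (daquexian)
    • Fixed possible data errors when running multi-input models on gpu
    • Fixed the fallback to cpu when gpu image allocation fails
    • Fixed gpu buffer2host issue
    • Fixed compile failures of some pipelines (zchrissirhcz)
    • Fixed macos build (leeys888)
    • Fixed wrong shape for some memorydata in onnx2ncnn conversion
    • Fixed python model zoo download (caishanli)
    • Fixed ncnnoptimize crash with multiple custom layers
    • cmake finds and links the thread library
    • Updated glslang version (proydakov)
    • int8 quantization tool supports adaptive pool (GuoxiaWang)
    • Wheels are uploaded to pypi (caishanli)
    • Fixed destruction order of the python net extractor (caishanli)
    • Added python vulkan test (caishanli)
    • Fixed missing import in python setup.py (zylo117)
    • Updated convertmodel.com link (daquexian)
    • Updated mlir tf2 dialect
    • Fixed some typos (zchrissirhcz, caishanli, zhiqwang)
    • Minimum required ios version is 9.0 (DaChengTechnology)
    • Documentation on using ncnn in your own project (zchrissirhcz)
    • readme: added nanodet (RangiLyu)
    • Added android cmake ninja build instructions (ncnnnnn)
    • readme: updated links for the listed algorithms (linser233)
    • Fixed benchncnn output format in the how-to-build documentation (ncnnnnn)
    • Added build-mlir2ncnn documentation (zchrissirhcz)
    • Fixed duplicate variable name in the nanodet example (RangiLyu)
    • Fixed out-of-bounds access in the yolact example (cmdbug)
    • Added nanodet python demo (caishanli)
    • Added documentation on nvidia gpus being unable to enable vulkan (PENGUINLIONG)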
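
For readers unfamiliar with the gelu activation added in this release, here is a minimal sketch of its two common formulations. The notes do not say which one the layer uses, so this is only the textbook definition; the function names are hypothetical.

```python
import math


def gelu(x):
    """Exact GELU: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Widely used tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(x, gelu(x), gelu_tanh(x))
```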

    Source code(tar.gz)
    Source code(zip)
    ncnn-20210322-android-shared.zip(5.96 MB)
    ncnn-20210322-android-vulkan-shared.zip(13.31 MB)
    ncnn-20210322-android-vulkan.zip(12.92 MB)
    ncnn-20210322-android.zip(5.06 MB)
    ncnn-20210322-full-source.zip(16.76 MB)
    ncnn-20210322-ios-bitcode.zip(33.37 MB)
    ncnn-20210322-ios-vulkan-bitcode.zip(36.65 MB)
    ncnn-20210322-ios-vulkan.zip(7.80 MB)
    ncnn-20210322-ios.zip(7.24 MB)
    ncnn-20210322-macos-vulkan.zip(5.33 MB)
    ncnn-20210322-macos.zip(3.31 MB)
    ncnn-20210322-ubuntu-1604-shared.zip(7.11 MB)
    ncnn-20210322-ubuntu-1604.zip(7.92 MB)
    ncnn-20210322-ubuntu-1804-shared.zip(7.25 MB)
    ncnn-20210322-ubuntu-1804.zip(8.07 MB)
    ncnn-20210322-ubuntu-2004-shared.zip(7.49 MB)
    ncnn-20210322-ubuntu-2004.zip(8.33 MB)
    ncnn-20210322-webassembly.zip(1.60 MB)
    ncnn-20210322-windows-vs2015-shared.zip(4.39 MB)
    ncnn-20210322-windows-vs2015.zip(15.38 MB)
    ncnn-20210322-windows-vs2017-shared.zip(4.36 MB)
    ncnn-20210322-windows-vs2017.zip(15.23 MB)
    ncnn-20210322-windows-vs2019-shared.zip(4.33 MB)
    ncnn-20210322-windows-vs2019.zip(15.38 MB)
  • 20210124(Jan 24, 2021)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, xcode 12.2, ubuntu-16.04, ubuntu-18.04, ubuntu-20.04, vs2017, vs2019.

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodules | |
    | ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads |

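    • Added python binding, not built by default (caishanli)
    • New cmake option NCNN_SHARED_LIB for building a shared library, using the pimpl pattern
    • New cmake option NCNN_PLATFORM_API controlling whether platform-specific apis are called
    • Major update to the C api, adding low-level op interfaces
    • The consumer member of the Blob class is now a single value
    • Refactored the forward_layer logic to be shorter
    • warpaffine integer implementation and arm neon optimization
    • Added the adaptive parameter to pooling, supporting conversion of pytorch AdaptiveAvgPool2d/AdaptiveMaxPool2d (GuoxiaWang)
    • Added gru and rnn, with corresponding arm neon optimizations
    • lstm arm neon optimization
    • innerproduct aarch64 int8 gemm optimization (tpoisonooo)
    • padding supports 1 to 3 dimensions, with additional unit test cases
    • packing pack8 arm neon optimization
    • Refactored the graph construction and optimization logic of onnx2ncnn and mlir2ncnn; fixed conversion errors for opset_version=12 models
    • Improved conversion and operator fusion for onnx lstm/gru/rnn
    • interp vulkan supports align_corner=True
    • innerproduct now supports 2-d gemm computation
    • Simplified the pack classification and optimization code of innerproduct on x86 and arm
    • Custom layer registration adds the custom_layer_destroyer parameter (caishanli)
    • Reorg/PixelShuffle layers support nhwc mode; onnx DepthToSpace mode DCR conversion
    • Reduced unnecessary unpack/pack (maxfy1992)
    • packing pack4 x86 sse optimization
    • conv1x1s1 pack4 x86 sse optimization
    • Improved arm64 support on macos
    • Fixed memorydata vulkan missing packing
    • Unit tests check the support attributes of layers
    • onnx2ncnn converts cross-batch Transpose
    • mlir2ncnn uses shorter blob names
    • The "aborted" message in ncnnoptimize is changed to "skipped"
    • Fixed ncnnoptimize crash when the blob count exceeds the actual count
    • android platform links jnigraphics automatically
    • The VK_LAYER_KHRONOS_validation extension is enabled when vulkan validation is turned on
    • Changed the pyncnn model zoo address (mosheliv)
    • Check for NCNN_MAX_PARAM_COUNT overflow (zchrissirhcz)
    • simplepose example adds hints (zchrissirhcz)
    • Fixed some compiler warnings (zchrissirhcz, ncnnnnn, proydakov)
    • Fixed some typos in the code (zchrissirhcz, ncnnnnn)
    • Fixed mingw-x64 simd build error (zchrissirhcz)
    • operators documentation adds convolutiondepthwise/crop/sigmoid/tanh (cavalleria)
    • operators documentation adds pooling (Sanster)
    • Updated the supported platform matrix (monkeyking)
    • Added github pull request documentation (tpoisonooo)
    • Added pytest for option/allocator/net/extractor
    • V831 toolchain (sunnycase)
    • ci adds vs2015
    • ci adds python build (caishanli)
    • The windows prebuilt release packages include 32-bit libraries and a vs2015 version
    • Releases add prebuilt shared library packages for linux/windows/android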
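
The adaptive pooling parameter mentioned above targets PyTorch's AdaptiveAvgPool2d/AdaptiveMaxPool2d, where the caller fixes the output size and the window boundaries are derived from it. The sketch below shows the usual window rule in one dimension (start is the floor, end is the ceiling of the proportional position). It is a standalone illustration with a hypothetical function name, not the ncnn pooling code.

```python
def adaptive_avg_pool1d(values, out_size):
    """Average-pool a 1-d list down to exactly out_size elements."""
    n = len(values)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size            # floor(i * n / out_size)
        end = -((-(i + 1) * n) // out_size)    # ceil((i + 1) * n / out_size)
        window = values[start:end]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    print(adaptive_avg_pool1d([1, 2, 3, 4, 5, 6], 3))
    print(adaptive_avg_pool1d([1, 2, 3, 4, 5], 2))
```

When the input length is not a multiple of the output size, neighbouring windows overlap, as in the second call. With `out_size=1` the rule degenerates to global average pooling.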

    Source code(tar.gz)
    Source code(zip)
    ncnn-20210124-android-shared.zip(5.92 MB)
    ncnn-20210124-android-vulkan-shared.zip(12.95 MB)
    ncnn-20210124-android-vulkan.zip(12.98 MB)
    ncnn-20210124-android.zip(5.01 MB)
    ncnn-20210124-full-source.zip(16.50 MB)
    ncnn-20210124-ios-bitcode.zip(32.82 MB)
    ncnn-20210124-ios-vulkan-bitcode.zip(44.69 MB)
    ncnn-20210124-ios-vulkan.zip(9.73 MB)
    ncnn-20210124-ios.zip(7.15 MB)
    ncnn-20210124-macos-vulkan.zip(6.60 MB)
    ncnn-20210124-macos.zip(3.26 MB)
    ncnn-20210124-ubuntu-1604-shared.zip(6.97 MB)
    ncnn-20210124-ubuntu-1604.zip(7.88 MB)
    ncnn-20210124-ubuntu-1804-shared.zip(7.12 MB)
    ncnn-20210124-ubuntu-1804.zip(8.03 MB)
    ncnn-20210124-ubuntu-2004-shared.zip(7.33 MB)
    ncnn-20210124-ubuntu-2004.zip(8.28 MB)
    ncnn-20210124-webassembly.zip(1.59 MB)
    ncnn-20210124-windows-vs2015-shared.zip(4.36 MB)
    ncnn-20210124-windows-vs2015.zip(15.31 MB)
    ncnn-20210124-windows-vs2017-shared.zip(4.32 MB)
    ncnn-20210124-windows-vs2017.zip(15.14 MB)
    ncnn-20210124-windows-vs2019-shared.zip(4.30 MB)
    ncnn-20210124-windows-vs2019.zip(15.30 MB)
  • 20201218(Dec 18, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, xcode 12.2, ubuntu-16.04, ubuntu-18.04, ubuntu-20.04, vs2017, vs2019.

    | file | content | arch |
    |---|---|---|
    | ncnn-full-source.zip | complete source code including all submodules | |
    | ncnn-android.zip | android static library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-android-vulkan.zip | android static library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
    | ncnn-ios.zip | ios static library, w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
    | ncnn-ios-vulkan.zip | ios static library, GPU support, w/o bitcode | arm64 + arm64e + x86_64 |
    | ncnn-macos.zip | macos static library | x86_64 + arm64 |
    | ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
    | ncnn-ubuntu.zip | ubuntu linux static library, GPU support, model conversion tools | x86_64 |
    | ncnn-windows.zip | windows static library, GPU support, model conversion tools | x86_64 |
    | ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads |

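    • ncnnoptimize and ncnn2mem handle custom ops automatically
    • New API for getting the ncnn version: const char* ncnn_version();
    • Eltwise layer supports 1-d and 2-d input
    • Enabled more vulkan extensions; a more reliable subgroup size is set per gpu vendor
    • Fixed simpleomp crash on macos/ios arm64
    • onnx2ncnn keeps the input data when converting gemm
    • github action is used to build openmp and the prebuilt packages and to publish new releases
    • Added ios arm64e build
    • Fixed gcc-4.4/gcc-4.8 build errors
    • Fixed some compiler warnings (ncnnnnn, zchrissirhcz, proydakov)
    • Fixed macos build parameters in build.sh (cavalleria)
    • Updated macos vulkan sdk (monkeyking)
    • benchmark: added 3970X and RTX8000 data (BUG1989)
    • operators documentation adds cast (xingxingRealzyx)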

    Source code(tar.gz)
    Source code(zip)
    ncnn-20201218-android-vulkan.zip(12.54 MB)
    ncnn-20201218-android.zip(4.61 MB)
    ncnn-20201218-full-source.zip(15.63 MB)
    ncnn-20201218-ios-bitcode.zip(30.45 MB)
    ncnn-20201218-ios-vulkan-bitcode.zip(43.43 MB)
    ncnn-20201218-ios-vulkan.zip(9.37 MB)
    ncnn-20201218-ios.zip(6.67 MB)
    ncnn-20201218-macos-vulkan.zip(6.34 MB)
    ncnn-20201218-macos.zip(3.03 MB)
    ncnn-20201218-ubuntu-1604.zip(7.71 MB)
    ncnn-20201218-ubuntu-1804.zip(7.87 MB)
    ncnn-20201218-ubuntu-2004.zip(8.13 MB)
    ncnn-20201218-webassembly.zip(1.48 MB)
    ncnn-20201218-windows-vs2017.zip(8.34 MB)
    ncnn-20201218-windows-vs2019.zip(8.17 MB)
  • 20201208(Dec 8, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, cctools-port 895 + ld64-274.2 + ios 10.2 sdk libc++.

    • ncnn-android-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    • ncnn-android-vulkan-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with vulkan support)
    • ncnn.framework.zip: ios static library (armv7 + arm64 + i386 + x86_64)
    • ncnn-vulkan.framework.zip: ios static library (arm64 + x86_64, with vulkan support, MoltenVK-1.1.82.0)
    • openmp.framework.zip: ios ncnn openmp runtime static library (armv7 + arm64 + i386 + x86_64)
    • glslang.framework.zip: ios ncnn glslang runtime static library (arm64 + x86_64)

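    • Added Mat pixel affine functions for rotation by an arbitrary angle, translation and scaling
    • Added functions that compute the warpaffine matrix from two point sets, and the inverse transform
    • x86 SSE2 pack4 architecture update and some optimizations, including BatchNorm/Bias/BinaryOp/Clip/Concat/Convolution/ConvolutionDepthWise/Dropout/Eltwise/Flatten/HardSigmoid/HardSwish/Innerproduct/Mish/Padding/Pooling/PReLU/ReLU/Scale/Sigmoid/Swish/TanH (PENGUINLIONG)
    • Subgroup information is detected automatically on drivers that support vulkan 1.1
    • Added a cross-platform thread-local-storage class
    • Added simpleomp, implementing the minimal set of llvm openmp abi runtime functions
    • Added get_little_cpu_count() and get_big_cpu_count() for querying little and big cpu core counts (zchrissirhcz)
    • Added softplus layer implementation and onnx conversion (ncnnnnn)
    • Support for the Apple M1 chip
    • Support for Open Harmony OS (zchrissirhcz)
    • Concat/Slice/Softmax support negative axis values (Ca0L)
    • New NCNN_SSE2 build option, which can fully disable SIMD optimizations for x86 and webassembly
    • Some general x86 Convolution AVX2 performance optimizations (zhiliu6)
    • vulkan supports Crop and Padding on 1-d and 2-d blobs
    • Added keras2ncnn model conversion tool (MarsTechHAN)
    • Interp supports align_corner=True and onnx Resize conversion (maxfy1992)
    • Added yolov5 example (Zhengtq)
    • Added nanodet example
    • vulkan image storage shaders uniformly use the image3d storage type
    • NCNN_XADD implementation for when NCNN_THREADS is disabled
    • Old adreno drivers no longer enable the image type, fixing gpu inference errors
    • Removed the outdated aarch64 auto-tuning from ncnnoptimize, along with its documentation
    • ncnnoptimize merges Reduction into Global Pooling
    • ncnnoptimize replaces single-coefficient PReLU with Leaky ReLU
    • ncnnoptimize removes the redundant Expand before a broadcasting BinaryOp
    • ncnnoptimize prints the MAC count and estimated memory footprint after shape inference
    • Corrected errors in the opencv-mat conversion and rotation documentation (ncnnnnn)
    • Fixed arm82 fp16s crop errors on 1-d and 2-d input
    • Extractor::extract supports a third flag parameter to avoid layout and storage type conversion (MarsTechHAN)
    • Replaced abs() with fabs() (zchrissirhcz)
    • A large number of compiler warning fixes (proydakov, ncnnnnn, zchrissirhcz, pH5)
    • ncnn can be built under the c++03 standard
    • Fixed build issues under the c++14 standard (nullptr-leo)
    • Fixed build when NCNN_PIXEL is off (tpoisonooo, nullptr-leo)
    • Fixed build errors with old protobuf versions (deepage)
    • cmake switch NCNN_OPENCV renamed to NCNN_SIMPLEOCV
    • Fixed color out-of-range in the yolact example (zchrissirhcz)
    • onnx2ncnn supports conversion of Max, Min and Pow with constants
    • onnx2ncnn supports the Pad channel parameter
    • onnx2ncnn MatMul conversion supports one input being MemoryData
    • Fixed overflow in onnx2ncnn Slice parameter conversion
    • Fixed groupnorm conversion in onnx2ncnn and ncnnoptimize
    • mlir2ncnn adapted to the new mlir api
    • mlir2ncnn fuses keras-style batchnorm and instancenorm
    • FAQ adds a solution for the crash on windows dll unload (qiqikit)
    • Added Chinese and English documentation on building a minimal ncnn binary (songqun)
    • Added openmp best practice documentation (youzainn)
    • Improved the build steps documentation (baryluk)
    • Some typo fixes in documentation and code comments (HollowMan6, Zhengtq)
    • Operator documentation updates (xingxingRealzyx, LosReturn, Ca0L)
    • Added information about convertmodel.com (daquexian)
    • ci adds webassembly-nosimd and webassembly-simpleomp builds
    • ci adds android ndk-r16b build
    • ci adds vs2017 cpu and gpu builds
    • ci adds macos arm64 build
    • ci adds codeql analysis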
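
The affine helpers listed above work on 2x3 matrices. The following sketch shows what such a matrix looks like for rotation, scaling and translation, and how its inverse is formed. The function names, the flat `[a, b, tx, c, d, ty]` layout and the counter-clockwise angle convention are assumptions for illustration and are not taken from the ncnn headers.

```python
import math


def rotation_matrix(angle_deg, scale, dx, dy):
    """2x3 affine matrix as [a, b, tx, c, d, ty]: rotate about the origin, scale, then translate."""
    rad = math.radians(angle_deg)
    a = scale * math.cos(rad)
    b = scale * math.sin(rad)
    return [a, -b, dx, b, a, dy]


def transform_point(tm, x, y):
    """Apply a 2x3 affine matrix to a point."""
    return (tm[0] * x + tm[1] * y + tm[2], tm[3] * x + tm[4] * y + tm[5])


def invert(tm):
    """Invert a 2x3 affine matrix: invert the 2x2 part, then map the translation through it."""
    a, b, tx, c, d, ty = tm
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [ia, ib, -(ia * tx + ib * ty), ic, id_, -(ic * tx + id_ * ty)]


if __name__ == "__main__":
    tm = rotation_matrix(30.0, 2.0, 5.0, -3.0)
    p = transform_point(tm, 3.0, 4.0)
    print(p, transform_point(invert(tm), *p))
```

Warping routines typically iterate over destination pixels and look up the source position, which is why an inverse transform is usually provided next to the forward one.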

    Source code(tar.gz)
    Source code(zip)
    glslang.framework-bitcode.zip(8.72 MB)
    glslang.framework.zip(1.89 MB)
    ncnn-android-lib.zip(4.43 MB)
    ncnn-android-vulkan-lib.zip(11.53 MB)
    ncnn.framework-bitcode.zip(21.40 MB)
    ncnn.framework-vulkan-bitcode.zip(17.87 MB)
    ncnn.framework-vulkan.zip(3.12 MB)
    ncnn.framework.zip(4.11 MB)
    openmp.framework-bitcode.zip(3.54 MB)
    openmp.framework.zip(990.38 KB)
  • 20200916(Sep 16, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, cctools-port 895 + ld64-274.2 + ios 10.2 sdk libc++.

    • ncnn-android-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    • ncnn-android-vulkan-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with vulkan support)
    • ncnn.framework.zip: ios static library (armv7 + arm64 + i386 + x86_64)
    • ncnn-vulkan.framework.zip: ios static library (arm64 + x86_64, with vulkan support, MoltenVK-1.1.82.0)
    • openmp.framework.zip: ios ncnn openmp runtime static library (armv7 + arm64 + i386 + x86_64)
    • glslang.framework.zip: ios ncnn glslang runtime static library (arm64 + x86_64)

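    • armv8.2 fp16s/fp16sa optimization, including batchnorm/binaryop/clip/convolution/convolutiondepthwise/deconvolution/deconvolutiondepthwise/eltwise/flatten/hardsigmoid/hardswish/innerproduct/interp/mish/packing/padding/pooling/prelu/relu/reshape/shufflechannel/sigmoid/slice/swish/tanh/unaryop and the corresponding pack8 optimized implementations
    • fp16 computation is enabled by default
    • Mat from_pixels/from_pixels_resize accept ROI parameters to implement crop+resize
    • Finer-grained cpu binding interface
    • Added gemm
    • Added groupnorm, with pytorch groupnorm conversion
    • Added layernorm
    • instancenorm/groupnorm without affine
    • Setting the last ncnnoptimize flag to 1 converts to an fp16 model (by zchrissirhcz)
    • simplestl adds a list implementation, fully removing the libstdc++ dependency (by nullptr-leo)
    • Exposed the interface for compiling glsl to spirv
    • interp accepts a second blob as the reference size
    • packing honors the thread count setting
    • A hint is printed when searching for vulkan-sdk on windows (by zchrissirhcz)
    • yolov3detectionoutput layer avx optimization (by zhiliu6)
    • mxnet reduction parameter accepts both an array and a single number
    • ncnnoptimize fuses binaryop-with-scalar automatically
    • Fixed build issues with opencv 2.x/4 (by zchrissirhcz)
    • Fixed a possible crash in resize_bilinear_c4
    • Fixed extract possibly not converting back to fp32 when bf16s is enabled
    • Fixed UINT64_MAX compile error on old compilers (by ncnnnnn)
    • Fixed vulkan conv1x1s1 pack1 computation error
    • Fixed a possible crash in onnx2ncnn when converting resize
    • The argument of load_param_mem must be terminated with \0
    • Updated mlir tf2 dialect
    • mlir2ncnn converts tf.Maximum/tf.Minimum/tf.ResizeBilinear
    • mlir2ncnn fuses instancenorm
    • mlir2ncnn fuses keras-style Conv2d/Dense
    • Fixed bgr2rgb in the yolov4 example (by MarsTechHAN and qaz734913414)
    • Fixed darknet maxpool padding size conversion (by ruru5697)
    • The ncnn cmake target depends on glslang automatically (by youzainn)
    • Fixed powerpc64 build, working around an interp altivec optimization bug
    • Fixed bus error on armv7 in from_pixels with the stride parameter
    • Unit tests add mat pixel resize
    • ncnn unit tests are not built by default (by caishanli)
    • The minimum ci build environment is ubuntu-16.04, c++03
    • Added telegram group (by zchrissirhcz)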
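
The groupnorm, layernorm and affine-free instancenorm entries above are closely related. The sketch below implements group normalization without affine parameters on plain Python lists to show the relationship. It is a standalone illustration with a hypothetical function name and an assumed `eps` default, not the ncnn layer.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize a list of channels (each a flat list of floats) group by group, without affine."""
    if len(channels) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        flat = [v for ch in group for v in ch]
        mean = sum(flat) / len(flat)
        var = sum((v - mean) ** 2 for v in flat) / len(flat)
        inv_std = 1.0 / math.sqrt(var + eps)
        for ch in group:
            out.append([(v - mean) * inv_std for v in ch])
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
    print(group_norm(x, 2))
```

With `num_groups` equal to the channel count each channel is normalized on its own, which is instance normalization. With `num_groups=1` the statistics span all channels, which matches layer normalization over the whole feature map.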

    Source code(tar.gz)
    Source code(zip)
    glslang.framework-bitcode.zip(8.72 MB)
    glslang.framework.zip(1.89 MB)
    ncnn-android-lib.zip(4.18 MB)
    ncnn-android-vulkan-lib.zip(11.22 MB)
    ncnn.framework-bitcode.zip(20.14 MB)
    ncnn.framework-vulkan-bitcode.zip(17.13 MB)
    ncnn.framework-vulkan.zip(2.98 MB)
    ncnn.framework.zip(3.87 MB)
    openmp.framework-bitcode.zip(3.54 MB)
    openmp.framework.zip(990.38 KB)
  • 20200727(Jul 27, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, cctools-port 895 + ld64-274.2 + ios 10.2 sdk libc++.

    • ncnn-android-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    • ncnn-android-vulkan-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with vulkan support)
    • ncnn.framework.zip: ios static library (armv7 + arm64 + i386 + x86_64)
    • ncnn-vulkan.framework.zip: ios static library (arm64 + x86_64, with vulkan support, MoltenVK-1.1.82.0)
    • openmp.framework.zip: ios ncnn openmp runtime static library (armv7 + arm64 + i386 + x86_64)
    • glslang.framework.zip: ios ncnn glslang runtime static library (armv7 + arm64 + i386 + x86_64)

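    • x86 avx optimization, including batchnorm/bias/binaryop/cast/clip/concat/convolution/convolutiondepthwise/crop/dropout/eltwise/flatten/hardsigmoid/hardswish/innerproduct/lrn/lstm/mish/packing/padding/pooling/prelu/relu/reshape/scale/sigmoid/slice/swish/tanh and the corresponding pack8 optimized implementations (by Timen)
    • arm lstm neon optimization (by Timen)
    • lstm with intermediate state input and output (by Timen)
    • An online pipeline cache greatly speeds up model loading for gpu inference, enabled by default
    • The instance is created automatically when using gpu inference
    • armv8.2 and avx are compiled in globally; cpu features are checked at runtime to select the optimized code
    • High-level C api wrapper and squeezenet_c_api example
    • Reduced the cpu load of openmp threads when ncnn is called in a camera loop
    • Added darknet2ncnn tutorial, yolov4-tiny (by zhiliu6)
    • mlir2ncnn model conversion tool, which can convert common classification models and pix2pix
    • Fixed int8 inference errors when the use_packing_layout/use_bf16_storage options are on
    • Fixed missing conversion to fp32 when the output layer supports bf16s
    • Fixed gpu out of allocator error
    • onnx shufflechannel is compatible with the megvii style
    • Improved onnx slice conversion compatibility
    • Improved onnx resize conversion compatibility
    • Improved conversion compatibility of normalize after onnx simplification
    • Improved reshape nhwc compatibility
    • Fixed handling of deconvolution output shape and padding
    • Fixed conv3x3s1 pack1to4 bf16s computation error
    • Fixed tanh mips computation error
    • Fixed convolution computation error with 1-d input and mish activation
    • Fixed interp nearest computation error
    • Worked around normalize/softmax computation errors on nvidia gpus
    • Worked around padding reflect computation errors on nvidia gpus
    • Removed the priorbox fp16s workaround
    • Worked around winograd computation errors on old qcom adreno drivers
    • ios packaging includes glslang
    • ci adds squeezenet/mat_pixels_rotate
    • ci adds riscv32/riscv64/mips32/mips64, including the riscv-v and mips-msa extensions
    • Added yolov4 example (by zhiliu6)
    • FAQ adds a section on jpg decoding and resize (by zchrissirhcz)
    • benchmark: added jetson agx (by zineos)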

    Source code(tar.gz)
    Source code(zip)
    glslang.framework-bitcode.zip(8.72 MB)
    glslang.framework.zip(1.89 MB)
    ncnn-android-lib.zip(3.83 MB)
    ncnn-android-vulkan-lib.zip(10.87 MB)
    ncnn.framework-bitcode.zip(18.51 MB)
    ncnn.framework-vulkan-bitcode.zip(15.83 MB)
    ncnn.framework-vulkan.zip(2.71 MB)
    ncnn.framework.zip(3.53 MB)
    openmp.framework-bitcode.zip(3.54 MB)
    openmp.framework.zip(990.38 KB)
  • 20200616(Jun 16, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r21d, cctools-port 895 + ld64-274.2 + ios 10.2 sdk libc++.

    • ncnn-android-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    • ncnn-android-vulkan-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with vulkan support)
    • ncnn.framework.zip: ios static library (armv7 + arm64 + i386 + x86_64, bitcode)
    • ncnn-vulkan.framework.zip: ios static library (arm64 + x86_64, bitcode, with vulkan support, MoltenVK-1.1.82.0)
    • openmp.framework.zip: ios ncnn openmp runtime static library (armv7 + arm64 + i386 + x86_64, bitcode)

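    • adreno gpu image storage + fp16p/fp16s/fp16pa/fp16sa optimization, enabled by default on Qualcomm chips older than qcom855, covering all gpu shaders
    • Added darknet converter, supporting yolov4 and efficientnetb0-yolov3 (by zhiliu6)
    • Added simplestl, which can replace std::string/std::vector, disabled by default (by scarsty)
    • Added NCNN_LOGE macro; on android it writes to adb logcat automatically (by maxint)
    • spirv is generated at runtime, greatly reducing the gpu library size
    • Added link to the python binding
    • Added interface for querying the currently available gpu memory
    • gpu fp16/fp32 conversion and buffer/image conversion are all fused into the packing layer
    • packing/unpacking pipelines are reused at gpu device level
    • 3d padding, added blazeface benchmark (by Timen)
    • Added fast yuv420sp2rgbhalf implementation (by tpoisonooo)
    • interp pack4 bf16s bf16s-pack4 neon optimization
    • Improved recognition and conversion of fbnetv2 hardsigmoid and hardswish
    • Fixed building ncnn as a sub project (by MarisaKirisame)
    • Fixed int8 pooling (by yx9527)
    • Fixed missing sigmoid in innerproduct arm
    • Fixed mxnet slice None to crop conversion error (by ddddwee1)
    • Added mish layer (by zhiliu6)
    • Added swish layer (by zhiliu6)
    • Improved roialign detectron2 conversion compatibility (by wkcn)
    • Added the speech-related statisticspooling layer (by Wang-Charles)
    • Added deepcopy utility layer
    • Improved conversion of list-type parameters exported by mxnet gluon (by papercatnku)
    • Improved recognition and conversion of onnx swish
    • More stable optimized arm tanh_ps implementation
    • avx optimization of cast bfloat16/float32 conversion (by Timen)
    • ncnnoptimize fuses scalar binaryop, including sub/div
    • ncnnoptimize fuses weighted sum
    • ncnnoptimize fuses convolution + mish (by zhiliu6)
    • Fixed the propagation condition of binaryop gpu special type 2/4
    • Fixed the param.bin output path of ncnn2mem (by GuoxiaWang)
    • Fixed a possible out-of-bounds error when setting the cpu affinity mask
    • Fixed image storage crash on old adreno/mali drivers
    • Fixed param loading errors under non-default locales
    • Fixed crop error in gpu convolution winograd when a shape hint is present
    • Check copy_cut_border parameters (by tpoisonooo)
    • Fixed minor errors in the quantize documentation (mengfu188)
    • Raspberry Pi 4b benchmark (by elejke)
    • qcom865 enables fp16 computation
    • mali-t880/g51/g52/g71/g72 enable fp16 computation
    • Updated qcom810/qcom660/qcom835/qcom855+/kirin970 benchmark data
    • Unified code style
    • Unit tests add image storage variants for all ops
    • github recognizes comp files as GLSL automatically
    • Fixed build with old vulkan-sdk versions (by xfan1024)
    • ios is built in both bitcode and native versions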
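
The mish and swish layers added in this release are simple pointwise activations. A minimal sketch of their usual definitions follows, written in a numerically careful way. It reflects the textbook formulas (swish with its scaling factor fixed to 1) rather than the ncnn source, and the helper names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, evaluated without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    """log(1 + exp(x)), rearranged so that exp never sees a large positive argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x):
    """swish(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


def mish(x):
    """mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(x, swish(x), mish(x))
```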

    Source code(tar.gz)
    Source code(zip)
    glslang.framework-bitcode.zip(8.72 MB)
    glslang.framework.zip(1.89 MB)
    ncnn-android-lib.zip(2.77 MB)
    ncnn-android-vulkan-lib.zip(9.65 MB)
    ncnn.framework-bitcode.zip(11.88 MB)
    ncnn.framework-vulkan-bitcode.zip(11.80 MB)
    ncnn.framework-vulkan.zip(2.03 MB)
    ncnn.framework.zip(2.37 MB)
    openmp.framework-bitcode.zip(3.54 MB)
    openmp.framework.zip(990.38 KB)
  • 20200413(Apr 13, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r19c, cctools-port 895 + ld64-274.2 + ios 10.2 sdk libc++.

    • ncnn-android-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    • ncnn-android-vulkan-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with vulkan support)
    • ncnn.framework.zip: ios static library (armv7 + arm64 + i386 + x86_64, bitcode)
    • ncnn-vulkan.framework.zip: ios static library (arm64 + x86_64, bitcode, with vulkan support, MoltenVK-1.1.82.0)
    • openmp.framework.zip: ios ncnn openmp runtime static library (armv7 + arm64 + i386 + x86_64, bitcode)

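    • bfloat16 storage data type and optimized operator implementations, disabled by default; armv7/aarch64 specialized optimizations for the main operators and for the packing layout (including BinaryOp/Cast/Clip/Concat/Convolution/ConvolutionDepthWise/Crop/Eltwise/Flatten/HardSigmoid/HardSwish/Innerproduct/Packing/Padding/Pooling/ReLU/ShuffleChannel/Sigmoid/Slice/TanH)
    • cpu packing_layout acceleration is enabled by default
    • Mat to_pixels/to_pixels_resize arm neon optimization
    • mips operator optimization for absval/bias/clip/sigmoid/softmax/tanh (by nullptr-leo)
    • gpu shaders are created by index and obtain their specialization and pushconstant counts automatically
    • Refactored the interface for importing android hardwarebuffer into VkMat
    • Removed all VkMat local-reference constructors to avoid alignment issues
    • Refactored the VkMat and Command interfaces, removing the staging member and simplifying the upload/download API
    • Precise thread binding interface that takes explicit cpu ids
    • Mat PixelType adds BGRA and the related conversion types
    • BinaryOp broadcast rules add left-operand attention type 3/4
    • LSTM unidirectional and bidirectional operators and onnx conversion (supports chineseocrlite)
    • Added DeepCopy utility layer
    • float32 and bfloat16 conversion functions
    • Named enum types (by caishanli)
    • benchncnn adds a cooldown switch (by kalcohol)
    • RK3288 and RK3399 gpus enable fp16 computation
    • visual studio source grouping (by kalcohol)
    • Fixed innerproduct requant computation (by yx9527)
    • Fixed flatten gpu fp16p pack1to4/pack1to8 errors on some shapes
    • Fixed inconsistency with non-coherent gpu memory
    • Improved conversion compatibility for the newer onnx Pad/Resize/Clip/Slice
    • Improved recognition and conversion of onnx hardsigmoid/hardswish
    • Fixed missing inputs for some BinaryOp in onnx models
    • The docs folder is synced to the wiki automatically
    • Unit tests increased to 40 (by xieydd monkeyking)
    • Unit tests add gpu fp16p, gpu pack8 and cpu bf16s variants
    • Travis CI adds arm32 build + unit tests + coverage
    • codecov code coverage integration
    • ios build enables bitcode
    • Build script for android libraries on windows (by kalcohol)
    • Fixed mingw build on windows (by qaz734913414)
    • Documentation on converting between cv::Mat and ncnn::Mat
    • Documentation on efficient roi/resize/rotate
    • FAQ on protobuf installation issues (by tpoisonooo)
    • Added yolact instance segmentation example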
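
A bfloat16 value is the upper 16 bits of an IEEE float32, so conversion in either direction is a bit shift. The sketch below illustrates this with truncation (no rounding), which is assumed here for simplicity. The function names mirror the description in the notes but the code is a standalone illustration, not the ncnn implementation.

```python
import struct


def float32_to_bfloat16(value):
    """Return the 16-bit bfloat16 pattern for a float by keeping the top half of its float32 bits."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return bits >> 16  # truncation: the low 16 mantissa bits are dropped


def bfloat16_to_float32(half):
    """Expand a 16-bit bfloat16 pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (half & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for v in (1.0, -2.0, 3.14159):
        h = float32_to_bfloat16(v)
        print(v, hex(h), bfloat16_to_float32(h))
```

The exponent range of float32 is preserved while the mantissa shrinks to 7 bits, which is why the format halves storage and memory bandwidth without needing a scaling step.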

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(2.67 MB)
    ncnn-android-vulkan-lib.zip(10.17 MB)
    ncnn-vulkan.framework.zip(16.79 MB)
    ncnn.framework.zip(11.51 MB)
    openmp.framework.zip(3.54 MB)
  • 20200226(Feb 26, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r19c, cctools-port 895 + ld64-274.2 + ios 10.2 sdk libc++.

    • ncnn-android-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    • ncnn-android-vulkan-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with vulkan support)
    • ncnn.framework.zip: ios static library (armv7 + arm64 + i386 + x86_64)
    • ncnn-vulkan.framework.zip: ios static library (arm64 + x86_64, with vulkan support, MoltenVK-1.1.82.0)
    • openmp.framework.zip: ios ncnn openmp runtime static library (armv7 + arm64 + i386 + x86_64)

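    • Implemented all pack8 shaders, disabled by default
    • shader access macros changed to function-style
    • All gpu operators try to read the shape hint at load time and adjust localsize
    • Unit tests increased to 30
    • ncnnoptimize offline shape inference
    • Implemented cpu/gpu inference for pixelshuffle
    • onnx pixelshuffle conversion
    • im2col follows the sgemm_convolution option switch
    • Improved compatibility with old qcom adreno drivers
    • qcom855/855plus enable gpu fp16a
    • chgemm and int-requant computation error fixes (by tpoisonooo)
    • cmake generates grouped VS projects (by kalcohol)
    • Quantization tools are built by default (by kalcohol)
    • fp32/fp16 avx2 optimization (by zhiliu6)
    • avx2 convolution computation error fix (by zhiliu6)
    • Many, many operator bug fixes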
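
The pixelshuffle operator rearranges channel data into spatial resolution. The sketch below follows the channel ordering used by PyTorch's PixelShuffle on nested Python lists. The function name is hypothetical, and the code is a plain illustration of the index mapping rather than the ncnn cpu or gpu implementation.

```python
def pixel_shuffle(channels, r):
    """Rearrange C*r*r planes of size H x W into C planes of size (H*r) x (W*r)."""
    if len(channels) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_c = len(channels) // (r * r)
    h = len(channels[0])
    w = len(channels[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(out_c)]
    for c in range(out_c):
        for i in range(r):
            for j in range(r):
                # Input plane c*r*r + i*r + j fills sub-pixel offset (i, j) of output plane c.
                src = channels[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


if __name__ == "__main__":
    planes = [[[1]], [[2]], [[3]], [[4]]]  # four 1x1 planes
    print(pixel_shuffle(planes, 2))        # one 2x2 plane
```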

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(2.33 MB)
    ncnn-android-vulkan-lib.zip(10.02 MB)
    ncnn-vulkan.framework.zip(4.65 MB)
    ncnn.framework.zip(1.96 MB)
    openmp.framework.zip(918.03 KB)
  • 20200106(Jan 6, 2020)

    Prebuilt packages, default configuration. Toolchains: android-ndk-r19c, cctools-port 895 + ld64-274.2 + ios 10.2 sdk libc++.

    • ncnn-android-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    • ncnn-android-vulkan-lib: android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with vulkan support)
    • ncnn.framework.zip: ios static library (armv7 + arm64 + i386 + x86_64)
    • ncnn-vulkan.framework.zip: ios static library (arm64 + x86_64, with vulkan support, MoltenVK-1.1.82.0)
    • openmp.framework.zip: ios ncnn openmp runtime static library (armv7 + arm64 + i386 + x86_64)

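    • pixel rotate api, optimized for the 8 combinations of 90-degree rotation and mirror flipping, supporting the stride parameter and ROI
    • ncnn example model zoo is available for download
    • Mat from_pixels(_resize)/to_pixels(_resize) support the stride parameter and ROI
    • android 8.0 supports converting an android hardware buffer to vkmat
    • Fixed compatibility issues with old arm mali drivers
    • Improved onnx prelu/flatten conversion compatibility
    • pytorch normalize conversion
    • normalize supports inplace
    • normalize vulkan
    • slice pack4 optimization
    • slice vulkan
    • per-channel padding
    • ncnnoptimizer removes useless pooling 1x1s1
    • ncnnoptimizer removes useless memorydata
    • Fixed deconvolution vulkan computation error
    • Fixed pooling vulkan SAME pad, avgpool_count_include_pad
    • Fixed a possible staging buffer data race
    • Fixed prelu gpu computation error
    • Optimized yolov3 output performance
    • vulkan spatial attention broadcasting
    • Fixed fasterrcnn gpu fp16p computation error
    • Fixed conv1x1s1 gpu fp16p computation error
    • Improved pytorch shufflenetv2 model conversion
    • ci adds webassembly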
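
The "8 combinations" of 90-degree rotation and mirroring are the four rotations of an image plus a mirrored copy of each. The sketch below enumerates them on a nested list. The helper names are hypothetical and the ordering of the results is arbitrary; it is not claimed to match the type numbering used by the ncnn rotate functions.

```python
def flip_horizontal(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]


def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]


def rotate_90_cw(img):
    """Rotate 90 degrees clockwise: transpose, then mirror left-right."""
    return flip_horizontal(transpose(img))


def all_orientations(img):
    """Return the four rotations of img, each followed by its mirrored copy."""
    results = []
    current = [row[:] for row in img]
    for _ in range(4):
        results.append(current)
        results.append(flip_horizontal(current))
        current = rotate_90_cw(current)
    return results


if __name__ == "__main__":
    for oriented in all_orientations([[1, 2, 3], [4, 5, 6]]):
        print(oriented)
```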

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(2.30 MB)
    ncnn-android-vulkan-lib.zip(5.16 MB)
    ncnn-vulkan.framework.zip(2.26 MB)
    ncnn.framework.zip(1.95 MB)
    openmp.framework.zip(918.03 KB)
  • 20191113(Nov 13, 2019)

    Prebuilt binaries, default configuration, built with android-ndk-r19c and cctools-port 895 + ld64-274.2 + iOS 10.2 SDK libc++
    ncnn-android-lib is the Android static library (armeabi-v7a + arm64-v8a + x86 + x86_64)
    ncnn-android-vulkan-lib is the Android static library (armeabi-v7a + arm64-v8a + x86 + x86_64, with Vulkan support)
    ncnn.framework.zip is the iOS static library (armv7 + arm64 + i386 + x86_64)
    ncnn-vulkan.framework.zip is the iOS static library (arm64 + x86_64, with Vulkan support, MoltenVK-1.1.82.0)
    openmp.framework.zip is the OpenMP runtime static library for ncnn on iOS (armv7 + arm64 + i386 + x86_64)

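    NEON optimization for pack memory layout inference (armv7 + aarch64): conv1x1s1 conv1x1s2 conv3x3s1 conv3x3s1-winograd conv3x3s2 conv5x5s1 conv5x5s2 conv7x7s2 convdw3x3s1 convdw3x3s2 convdw5x5s1 convdw5x5s2 group-convolution/deconvolution pooling3x3s2-max pooling2x2s2-max padding relu hardsigmoid hardswish innerproduct reshape flatten shufflechannel
    Removed the default Option arguments from the interfaces; preprocessing functions now use only 1 thread
    Mat output adds color conversions rgb2rgba/bgr2rgba/gray2rgba
    Added functions for direct conversion between Android Bitmap and Mat
    Added functions for loading models from Android Assets
    Added NNIE ImageWatch plugin
    Improved Deconvolution pad/adj parameter compatibility
    Added Noop
    Added Convolution pad_value parameter
    UnaryOp Vulkan implements tanh
    Fixed InstanceNorm crash on the RADV driver
    MXNet hardsigmoid/hardswish fused conversion
    ONNX 1-D BatchNorm fused conversion
    ONNX Split conversion
    Reduction adds PROD/L1/L2/LogSum/LogSumExp and the keepdims parameter
    ONNX ReduceMax/ReduceMin/ReduceMean/ReduceProd/ReduceSum/ReduceSumSquare/ReduceL1/ReduceL2/ReduceLogSum/ReduceLogSumExp conversion
    Improved MXNet and ONNX slice conversion compatibility
    ONNX Unsqueeze/Squeeze conversion
    ONNX Pooling ceil_mode conversion
    Added DataReader interface, which makes loading from encrypted data easier
    Fixed failure to detect big/little cores on some phones
    The ncnnoptimize fusion tool preserves floating-point precision in parameters
    Fixed an incorrect im2col+gemm implementation for certain parameters on ARM
    Fixed a possible out-of-bounds write when uploading weights with fp16 storage in GPU inference
    Fixed hisim100 build error
    Improved precision of int8 convolution winograd43
    Improved ncnn2table tool compatibility
    CI adds windows-vs2017-gpu
    CMake option NCNN_DISABLE_PIC disables PIC
    Added RetinaFace face detection example
    Added MobileNetV3-SSD general object detection example
    Benchmark adds shufflenet_v2
    Benchmark enables pack memory layout inference
    Added Android x86/x86_64 prebuilt libraries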

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(2.27 MB)
    ncnn-android-vulkan-lib.zip(4.88 MB)
    ncnn-vulkan.framework.zip(2.12 MB)
    ncnn.framework.zip(1.92 MB)
    openmp.framework.zip(918.03 KB)
  • 20190908(Sep 8, 2019)

    Prebuilt binaries, default configuration, built with android-ndk-r19c and cctools-port 895 + ld64-274.2 + iOS 10.2 SDK libc++
    ncnn-android-lib is the Android static library (armeabi-v7a + arm64-v8a)
    ncnn-android-vulkan-lib is the Android static library (armeabi-v7a + arm64-v8a, with Vulkan support)
    ncnn.framework.zip is the iOS static library (armv7 + arm64 + i386 + x86_64)
    ncnn-vulkan.framework.zip is the iOS static library (arm64 + x86_64, with Vulkan support, MoltenVK-1.1.82.0)
    openmp.framework.zip is the OpenMP runtime static library for ncnn on iOS (armv7 + arm64 + i386 + x86_64)

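    Migrated lots of wiki documents
    Added experimental pack memory layout inference: absval batchnorm clip relu sigmoid packing convolution convolutiondepthwise innerproduct pooling flatten prelu scale dropout softmax binaryop deconvolution deconvolutiondepthwise concat eltwise cast split
    New option use_packing_layout controls whether CPU inference uses the pack memory layout
    conv1x1s1 pack4 sgemm ARM NEON optimization
    conv3x3s1 pack4 winograd ARM NEON optimization
    tanh ARM NEON optimization
    Added ncnn2table, a finetune-free model quantization tool, with readme (not built by default)
    ncnn2table adds the swapRB parameter
    ncnnoptimize automatic convolution tuning on ARM Linux
    ncnnoptimize automatically removes unnecessary Reshape before BinaryOp
    Added HardSigmoid
    Added SELU, with ONNX conversion support
    Added HardSwish
    Padding supports reflect mode
    Pooling avg adds the count_include_pad parameter, with MXNet/ONNX conversion support
    Convolution and Deconvolution support unequal padding on the four sides
    ONNX autopad SAME_LOWER conversion
    Deconvolution adds the output_adj and output_shape parameters
    ONNX Resize conversion
    ONNX Div conversion
    MXNet BilinearResize2D conversion
    Fixed MXNet and ONNX slice conversion
    PyTorch ONNX channelshuffle conversion
    PyTorch ONNX hardsigmoid and hardswish conversion
    Fixed GPU memory pool sharing issue under multithreading
    Concat Vulkan implements pack1to4 pack4to1to4
    BinaryOp Vulkan implements broadcast
    InstanceNorm Vulkan implementation
    Crop Vulkan implements pack1to4 pack4to1
    Portable thread/condition-variable classes
    Added hisiv600 and hi3559V100 toolchain files
    CMake build system updated; headers are installed to include/ncnn
    Updated mobilenetv2_yolov3 benchmark
    Added mobilenet_v3 benchmark
    ncnn models can be visualized in Netron
    Added human keypoint localization example simplepose
    Updated the Android squeezencnn project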

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(1.27 MB)
    ncnn-android-vulkan-lib.zip(2.57 MB)
    ncnn-vulkan.framework.zip(2.05 MB)
    ncnn.framework.zip(1.78 MB)
    openmp.framework.zip(918.03 KB)
  • 20190611(Jun 11, 2019)

    Prebuilt binaries, default configuration, built with android-ndk-r19c and cctools-port 895 + ld64-274.2 + iOS 10.2 SDK libc++
    ncnn-android-lib is the Android static library (armeabi-v7a + arm64-v8a)
    ncnn-android-vulkan-lib is the Android static library (armeabi-v7a + arm64-v8a, with Vulkan support)
    ncnn.framework.zip is the iOS static library (armv7 + arm64 + i386 + x86_64)
    ncnn-vulkan.framework.zip is the iOS static library (arm64 + x86_64, with Vulkan support, MoltenVK-1.1.82.0)
    openmp.framework.zip is the OpenMP runtime static library for ncnn on iOS (armv7 + arm64 + i386 + x86_64)

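    Added operator fusion tool ncnnoptimize (bn + scale / conv + bn / deconv + bn / innerproduct + bn / conv + relu, etc.)
    armv7/arm64 conv1x1s2
    armv7/arm64 sgemm optimization for other convolutions
    int8 requantize layer fusion enables all-int8 storage
    Fixed OpenMP compatibility with older CMake versions
    Option API changed; the global option is removed
    Fixed out-of-bounds read in from_pixels_resize
    x86 convolution AVX2 optimization
    x86 convolution SSE2 optimization
    Interp resize bicubic interpolation
    GPU convolution/padding supports SAME pad
    GPU conv1x1s1 optimization
    GPU conv3x3s1 winograd-f23 optimization
    GPU fp16 packed optimization, supported on all GPUs
    GPU fp16 storage optimization, supported on most desktop GPUs
    GPU fp16 arithmetic optimization, disabled by default
    Custom Vulkan compute pipelines can be created and embedded into the inference process
    Per-layer GPU timing statistics
    Vulkan layer architecture adjustments
    Control switches for GPU fp16p fp16s fp16a int8s int8a
    Simpler multi-GPU setup API
    New segmentation example: Pelee SSD segmentation
    Benchmark fp32 operator fusion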

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(986.35 KB)
    ncnn-android-vulkan-lib.zip(2.20 MB)
    ncnn-vulkan.framework.zip(1.86 MB)
    ncnn.framework.zip(1.51 MB)
    openmp.framework.zip(918.03 KB)
  • 20190320(Mar 20, 2019)

    Prebuilt binaries, default configuration, built with android-ndk-r18b and cctools-port 895 + ld64-274.2 + iOS 10.2 SDK libc++
    ncnn-android-lib is the Android static library (armeabi-v7a + arm64-v8a)
    ncnn-android-vulkan-lib is the Android static library (armeabi-v7a + arm64-v8a, with Vulkan support)
    ncnn.framework.zip is the iOS static library (armv7 + arm64 + i386 + x86_64)
    ncnn-vulkan.framework.zip is the iOS static library (arm64 + x86_64, with Vulkan support, MoltenVK-1.1.82.0)
    openmp.framework.zip is the OpenMP runtime static library for ncnn on iOS (armv7 + arm64 + i386 + x86_64)

    Added experimental GPU inference: AbsVal, BatchNorm, BinaryOp(no broadcasting), Clip, Concat, Convolution(pad -233 not supported), ConvolutionDepthWise(pad -233 not supported), Crop, Deconvolution, DeconvolutionDepthWise, Dropout, Eltwise, Flatten, InnerProduct, Interp, LRN, Packing, Padding, Permute, Pooling(pad SAME not supported), PReLU, PriorBox, ReLU, Reorg, Reshape, Scale, ShuffleChannel, Sigmoid, Softmax, TanH, UnaryOp
    Mixed GPU/CPU inference is supported
    Overflow-free int8 convolution
    More accurate int8 quantization method
    Element packing data storage
    conv3x3s1 int8 aarch64 optimization
    conv3x3s2 int8 armv7/aarch64 optimization
    conv1x1s1 int8 aarch64 optimization
    convdw3x3s1 int8 aarch64 optimization
    convdw3x3s2 int8 aarch64 optimization
    Fixed armv7-without-neon build
    All examples support GPU computation
    Compatible with ONNX/MXNet upsample and slice conversion
    squeezencnn adds a GPU recognition button
    Benchmark adds GPU inference
    Benchmark adds resnet50 and int8 models
    iOS OpenMP updated to 7.0.1

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(767.90 KB)
    ncnn-android-vulkan-lib.zip(1.20 MB)
    ncnn-vulkan.framework.zip(1.04 MB)
    ncnn.framework.zip(1.27 MB)
    openmp.framework.zip(918.03 KB)
  • 20181228(Dec 24, 2018)

    Prebuilt binaries, default configuration, built with android-ndk-r17b and cctools-port 886 + ld64 264.3.102 + iOS 9.3 SDK libc++
    ncnn-android-lib is the Android static library (armeabi-v7a + arm64-v8a)
    ncnn.framework.zip is the iOS static library (armv7 + arm64 + i386 + x86_64)
    openmp.framework.zip is the OpenMP runtime static library for ncnn on iOS (armv7 + arm64 + i386 + x86_64)

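    clip armv7/aarch64 optimization
    convdw5x5 armv7/aarch64 optimization
    conv3x3s1 aarch64 optimization
    convdw3x3 int8 armv7 optimization
    conv1x1/conv3x3 int8 aarch64 optimization
    Added Net::load_param_mem(), an interface for loading a plain-text param from memory
    Fixed permute/dequantize/lstm/innerproduct int8
    Added psroipooling/roialign
    Added VS ImageWatch plugin
    Fixed OpenMP build with newer CMake versions
    Raspberry Pi 3 build support
    ARM Linux and hisi platform build support
    Updated cmake-ios toolchain
    Fixed YOLOv2 multi-scale detection
    YOLOv3 support and YOLOv3 example
    darknet2ncnn converter
    R-FCN support and R-FCN example
    mxnet-ssd model conversion support
    Added shufflenetv2 example
    Compatible with ONNX opset 7/8 model conversion; more op conversions added
    Fixed MXNet batchnorm fix_gamma parameter conversion
    Added yuv420sp to RGB conversion and resize functions
    Benchmark adds mnasnet/proxylessnasnet/mobilenet-yolov2/mobilenet-yolov3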

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(714.49 KB)
    ncnn.framework.zip(1.15 MB)
    openmp.framework.zip(822.16 KB)
  • 20180830(Aug 30, 2018)

    Prebuilt binaries, default configuration, built with android-ndk-r15c and cctools-port 886 + ld64 264.3.102 + iOS 9.3 SDK libc++
    ncnn-android-lib is the Android static library (armeabi-v7a + arm64-v8a)
    ncnn.framework.zip is the iOS static library (armv7 + arm64 + i386 + x86_64)
    openmp.framework.zip is the OpenMP runtime static library for ncnn on iOS (armv7 + arm64 + i386 + x86_64)

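    Blob/workspace memory pool interfaces
    Runtime switches for winograd/sgemm/int8 inference
    Finer-grained multithreading control
    Caffe model conversion loads the int8 quantization table
    int8 inference (experimental)
    High-precision timestamps on Windows
    conv3x3s2 optimization
    depthwiseconv3x3s1 arm64 optimization
    conv1x1 and conv3x3 int8 armv7 optimization
    Fixed the fast path for dilated + strided convolution
    mxnet-shufflenet model conversion
    Added mobilenetv2ssdlite example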

    Source code(tar.gz)
    Source code(zip)
    ncnn-android-lib.zip(699.85 KB)
    ncnn.framework.zip(1.07 MB)
    openmp.framework.zip(822.16 KB)
  • 20180704(Jul 4, 2018)

  • 20180427(Apr 27, 2018)

  • 20180314(Mar 14, 2018)
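
Several of the release notes above mention the pack memory layout (pack4, element packing, use_packing_layout). As a rough illustration of the idea only, and not ncnn's actual implementation, the sketch below repacks a planar channel-major float buffer so that the values of four consecutive channels at the same pixel sit next to each other, which is the arrangement a 4-lane SIMD load can consume directly. The function name `pack4_from_chw` and the requirement that the channel count be a multiple of 4 are hypothetical simplifications made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the ncnn API): repack a planar CHW float
// buffer into a pack4 layout. Channels are grouped in fours, and the four
// values of a group that belong to the same pixel are stored contiguously.
// Assumption: channels is a multiple of 4.
std::vector<float> pack4_from_chw(const std::vector<float>& src,
                                  std::size_t channels,
                                  std::size_t height,
                                  std::size_t width)
{
    assert(channels % 4 == 0);
    assert(src.size() == channels * height * width);

    const std::size_t plane = height * width; // pixels per channel
    std::vector<float> dst(src.size());

    for (std::size_t g = 0; g < channels / 4; ++g)      // channel group
    {
        for (std::size_t i = 0; i < plane; ++i)         // pixel index
        {
            for (std::size_t k = 0; k < 4; ++k)         // lane within group
            {
                // source: channel (4g + k), pixel i
                // destination: group g, pixel i, lane k
                dst[(g * plane + i) * 4 + k] = src[(g * 4 + k) * plane + i];
            }
        }
    }
    return dst;
}
```

With 8 channels of 1x2 pixels, for instance, the output starts with channels 0 to 3 of pixel 0, then channels 0 to 3 of pixel 1, and only then moves on to channels 4 to 7.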

Owner
Tencent
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with ONNX, TensorRT, ncnn, and OpenVINO supported.

Introduction YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and ind

null 7.7k Jan 3, 2023
PPLNN, a Primitive Library for Neural Network, is a high-performance deep-learning inference engine for efficient AI inferencing

PPLNN, a Primitive Library for Neural Network, is a high-performance deep-learning inference engine for efficient AI inferencing

null 943 Jan 7, 2023
High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.

Anakin2.0 Welcome to the Anakin GitHub. Anakin is a cross-platform, high-performance inference engine, which is originally developed by Baidu engineer

null 514 Dec 28, 2022
A modular, research-friendly framework for high-performance and inference of sequence models at many scales

T5X T5X is a modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of

Google Research 1.1k Jan 8, 2023
thundernet ncnn

MMDetection_Lite implements some lightweight detection models based on mmdetection; installation is the same as for mmdetection. voc0712: trained on VOC 0712, tested on VOC 2007, COCO pre-trained. thundernet_voc_shufflenetv2_1.5 input shape mAP 320

DayBreak 39 Dec 5, 2022
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

null 62 Nov 23, 2022
Ultra-lightweight human body posture key point CNN model. ModelSize:2.3MB HUAWEI P40 NCNN benchmark: 6ms/img,

Ultralight-SimplePose Support NCNN mobile terminal deployment Based on MXNET(>=1.5.1) GLUON(>=0.7.0) framework Top-down strategy: The input image is t

null 223 Dec 27, 2022
Official Pytorch implementation of 'GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network' (NeurIPS 2020)

Official implementation of GOCor This is the official implementation of our paper : GOCor: Bringing Globally Optimized Correspondence Volumes into You

Prune Truong 71 Nov 18, 2022
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime inference can enable faster customer experiences an

Microsoft 8k Jan 4, 2023
Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks

Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks. Bayesian-Torch is designed to be flexible and seamless in extending a deterministic deep neural network architecture to corresponding Bayesian form by simply replacing the deterministic layers with Bayesian layers.

Intel Labs 210 Jan 4, 2023
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 8, 2023
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Jittor: a Just-in-time(JIT) deep learning framework Quickstart | Install | Tutorial | Chinese Jittor is a high-performance deep learning framework bas

null 2.7k Jan 3, 2023
MetaBalance: High-Performance Neural Networks for Class-Imbalanced Data

This repository is the official PyTorch implementation of Meta-Balance. Find the paper on arxiv MetaBalance: High-Performance Neural Networks for Clas

Arpit Bansal 20 Oct 18, 2021
Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Yolov5 running on TorchServe (GPU compatible) ! This is a dockerfile to run TorchServe for Yolo v5 object detection model. (TorchServe (PyTorch librar

null 82 Nov 29, 2022
Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.

human-pose-estimation-3d-python-cpp RealSenseD435 (RGB) 480x640 + CPU Corei9 45 FPS (Depth is not used) 1. Run 1-1. RealSenseD435 (RGB) 480x640 + CPU

Katsuya Hyodo 8 Oct 3, 2022
PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

PyTorch-LIT PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices. With

Amin Rezaei 157 Dec 11, 2022
Data-depth-inference - Data depth inference with python

Welcome! This readme will guide you through the use of the code in this reposito

Marco 3 Feb 8, 2022
CPU inference engine that delivers unprecedented performance for sparse models

The DeepSparse Engine is a CPU runtime that delivers unprecedented performance by taking advantage of natural sparsity within neural networks to reduce compute required as well as accelerate memory bound workloads. It is focused on model deployment and scaling machine learning pipelines, fitting seamlessly into your existing deployments as an inference backend.

Neural Magic 1.2k Jan 9, 2023