A high-performance, cross-platform inference engine. Anakin runs on x86 CPU, ARM, NVIDIA GPU, AMD GPU, Bitmain, and Cambricon devices.

Overview

Anakin2.0


Welcome to the Anakin GitHub.

Anakin is a cross-platform, high-performance inference engine, originally developed by Baidu engineers and deployed at scale in industrial products.

Please refer to our release announcement to track the latest features of Anakin.

Features

  • Flexibility

    Anakin is a cross-platform, high-performance inference engine that supports a wide range of neural network architectures and hardware platforms. It is easy to run Anakin on GPU, x86, or ARM platforms.

    Anakin integrates with NVIDIA TensorRT and open-sources this integration API, so developers can call the API directly or modify it as needed, which gives more flexibility for development.

  • High performance

    To get the most out of the hardware, we optimize forward prediction at several levels.

    • Automatic graph fusion. The goal of all performance optimization under a given algorithm is to keep the ALU as busy as possible. Operator fusion effectively reduces memory accesses and keeps the ALU busy (see the sketch after this list).

    • Memory reuse. Forward prediction is a one-way calculation. We reuse memory between the inputs and outputs of different operators, reducing overall memory overhead.

    • Assembly-level optimization. Saber is Anakin's underlying DNN library, deeply optimized at the assembly level.
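
    The sketch below (Python/NumPy, purely illustrative, with hypothetical function names; Anakin's actual kernels are the hand-optimized Saber routines described above) shows the idea behind fusion and memory reuse: the fused version writes every intermediate into one reused scratch buffer instead of allocating a fresh tensor per operator.

    import numpy as np

    # Toy illustration only: Anakin's real kernels are Saber's hand-written
    # assembly. All function names here are hypothetical.

    def linear_bias_relu_unfused(x, w, b):
        y = x @ w                  # allocates a temporary
        y = y + b                  # allocates another temporary
        return np.maximum(y, 0)    # allocates the output

    def linear_bias_relu_fused(x, w, b, out):
        # A real fused kernel applies bias-add and ReLU while the GEMM tile
        # is still in registers/cache; this sketch only mimics the
        # memory-reuse half by writing every step into one caller-provided
        # scratch buffer.
        np.matmul(x, w, out=out)
        np.add(out, b, out=out)
        np.maximum(out, 0, out=out)
        return out

    x = np.random.rand(8, 256).astype(np.float32)
    w = np.random.rand(256, 256).astype(np.float32)
    b = np.random.rand(256).astype(np.float32)
    scratch = np.empty((8, 256), dtype=np.float32)  # reused across operators
    assert np.allclose(linear_bias_relu_unfused(x, w, b),
                       linear_bias_relu_fused(x, w, b, scratch))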

NV GPU Benchmark

Machine and Environment

CPU: Intel(R) Xeon(R) CPU 5117 @ 2.0GHz
GPU: Tesla P4
CUDA: 8
cuDNN: v7

  • Timing: 10 warmup runs, then 1000 timed runs averaged (see the sketch below)
  • Latency (ms) and memory (MB) at different batch sizes

The counterpart of Anakin is NVIDIA TensorRT 5, a widely recognized high-performance inference engine. For models that TensorRT 5 does not support natively, we use custom plugins.
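
As a rough reference for this protocol, a minimal timing loop equivalent to "warm up, then average over timed runs" could look like the sketch below; run_inference is a placeholder for one forward pass of whichever engine is measured (the benchmark binaries in the repo are C++, this is just the idea):

    import time

    def average_latency_ms(run_inference, warmup=10, iters=1000):
        # Warmup iterations populate caches and let GPU clocks settle.
        for _ in range(warmup):
            run_inference()
        start = time.perf_counter()
        for _ in range(iters):
            run_inference()
        return (time.perf_counter() - start) / iters * 1e3

The X86 and ARM sections below follow the same protocol with 200 and 10 timed runs, respectively.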

VGG16

| Batch size | TensorRT latency FP32 (ms) | Anakin2 latency FP32 (ms) | TensorRT memory (MB) | Anakin2 memory (MB) |
| --- | --- | --- | --- | --- |
| 1 | 8.52532 | 8.2387 | 1090.89 | 702 |
| 2 | 14.1209 | 13.8772 | 1056.02 | 768.76 |
| 4 | 24.4529 | 24.3391 | 1002.17 | 840.54 |
| 8 | 46.7956 | 46.3309 | 1098.98 | 935.61 |

Resnet50

| Batch size | TensorRT latency FP32 (ms) | Anakin2 latency FP32 (ms) | TensorRT latency INT8 (ms) | Anakin2 latency INT8 (ms) | TensorRT memory FP32 (MB) | Anakin2 memory FP32 (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 4.6447 | 3.0863 | 1.78892 | 1.61537 | 1134.88 | 311.25 |
| 2 | 6.69187 | 5.13995 | 2.71136 | 2.70022 | 1108.86 | 382 |
| 4 | 11.1943 | 9.20513 | 4.16771 | 4.77145 | 885.96 | 406.86 |
| 8 | 19.8769 | 17.1976 | 6.2798 | 8.68197 | 813.84 | 532.61 |

Resnet101

| Batch size | TensorRT latency FP32 (ms) | Anakin2 latency FP32 (ms) | TensorRT latency INT8 (ms) | Anakin2 latency INT8 (ms) | TensorRT memory (MB) | Anakin2 memory (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 9.98695 | 5.44947 | 2.81031 | 2.74399 | 1159.16 | 500.5 |
| 2 | 17.3489 | 8.85699 | 4.8641 | 4.69473 | 1158.73 | 492 |
| 4 | 20.6198 | 16.8214 | 7.11608 | 8.45324 | 1021.68 | 541.08 |
| 8 | 31.9653 | 33.5015 | 11.2403 | 15.4336 | 914.49 | 611.54 |

X86 CPU Benchmark

Machine and Environment

CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz with HT, for FP32 test
CPU: Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz with HT, for INT8 test
System: CentOS 6.3 with GCC 4.8.2, for benchmark between Anakin and Intel Caffe

  • All tests run with 8 threads in parallel
  • Timing: 10 warmup runs, then 200 timed runs averaged

The counterpart of Anakin is Intel Caffe (1.1.6) with MKLML.

| Net | Batch size | Anakin2 latency FP32, 2650v4 (ms) | Caffe latency FP32, 2650v4 (ms) | Anakin2 latency INT8, 6271 (ms) |
| --- | --- | --- | --- | --- |
| resnet50 | 1 | 20.6201 | 24.1369 | 3.20866 |
| resnet50 | 2 | 39.2286 | 43.1096 | 5.44311 |
| resnet50 | 4 | 77.1392 | 81.8814 | 9.93424 |
| resnet50 | 8 | 152.941 | 158.321 | 19.5618 |
| vgg16 | 1 | 55.6132 | 70.532 | 15.3181 |
| vgg16 | 2 | 96.5034 | 131.451 | 22.5082 |
| vgg16 | 4 | 180.479 | 247.926 | 37.2974 |
| vgg16 | 8 | 346.619 | 485.44 | 67.6682 |
| mobilenetv1 | 1 | 3.98104 | 5.42775 | 0.926546 |
| mobilenetv1 | 2 | 7.27079 | 9.16058 | 1.35007 |
| mobilenetv1 | 4 | 14.4029 | 16.2505 | 2.37271 |
| mobilenetv1 | 8 | 29.1651 | 29.8381 | 3.75992 |
| vgg16_ssd | 1 | 125.948 | 143.412 | |
| vgg16_ssd | 2 | 247.242 | 266.22 | |
| vgg16_ssd | 4 | 488.377 | 510.978 | |
| vgg16_ssd | 8 | 972.762 | 995.407 | |
| mobilenetv2 | 1 | 3.78504 | 23.0066 | |
| mobilenetv2 | 2 | 7.24622 | 65.9301 | |
| mobilenetv2 | 4 | 13.7638 | 85.3893 | |
| mobilenetv2 | 8 | 28.4093 | 131.669 | |

ARM CPU Benchmark

Machine and Environment

CPU: Kirin 980
CPU: Snapdragon 652
CPU: Snapdragon 855
CPU: RK3399

  • Toolchain: Android NDK cross-compilation, gcc 4.9, NEON enabled
  • Timing: 10 warmup runs, then 10 timed runs averaged
  • Note: the shufflenetv2 int8 model adds a swish operator

The counterpart of Anakin is ncnn (20190320). We benchmark ARMv7 and ARMv8 separately.

ARMv8 TEST

  • ABI: arm64-v8a
  • Latency (ms) of one batch
Kirin 980 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread
mobilenet_v1 34.172 19.369 12.723 37.588 20.692 13.280 45.420 24.220 16.730 50.560 27.820 20.010
mobilenet_v2 30.489 17.784 12.327 29.581 17.208 15.307 30.390 17.310 12.900
mobilenet_ssd 71.609 37.477 28.952 88.220 70.070 66.430 103.700 85.160 85.320
resnet50 255.748 137.842 104.628 1299.480 695.830 498.010 243.360 131.100 89.800
shufflenetv1 11.544 8.931 7.027 12.810 9.390 8.030
shufflenetv2 11.687 7.899 5.321 20.402 11.529 9.061
squeezenet 28.580 16.638 14.435
googlenet 93.917 52.742 40.301 130.875 72.522 54.204


Snapdragon 855 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread
mobilenet_v1 32.019 19.024 10.491 34.363 20.292 10.382 37.110 22.310 13.520 47.430 28.350 15.830
mobilenet_v2 28.533 17.455 10.433 24.487 15.182 9.133 25.060 15.970 11.250
mobilenet_ssd 66.454 41.397 23.639 101.560 69.380 43.930 136.420 91.010 47.490
resnet50 201.362 132.133 78.300 1141.290 724.090 385.990 229.020 138.450 82.060
shufflenetv1 10.153 7.101 5.327 11.610 8.020 5.870
shufflenetv2 10.868 6.713 4.526 17.306 10.987 6.788
squeezenet 25.880 16.134 9.697
googlenet 85.774 54.518 34.025 118.120 73.686 41.865


Snapdragon 652 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread
mobilenet_v1 109.994 54.937 33.174 83.887 43.639 24.665 123.320 122.670 65.100 128.800 154.370 125.570
mobilenet_v2 80.712 46.314 30.874 69.340 43.590 31.864 89.920 90.900 55.320
mobilenet_ssd 246.459 121.684 134.019 248.190 138.170 142.350 247.020 145.080 211.000
resnet50 673.285 346.287 378.065 880.940 514.190 533.760 313.630
shufflenetv1 34.948 26.635 21.571 39.950 25.520 20.180
shufflenetv2 35.530 21.440 16.434 49.498 29.116 19.346
squeezenet 87.037 47.192 28.663
googlenet 268.023 148.533 95.624 236.492 131.510 81.561


RK3399 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 1 thread 2 thread 1 thread 2 thread 1 thread 2 thread
mobilenet_v1 111.317 60.008 87.201 45.693 149.270 91.200 142.790 86.140
mobilenet_v2 105.767 60.899 79.065 53.914 118.530 86.900
mobilenet_ssd 232.923 128.337 268.900 157.860 256.560 149.730
resnet50 671.800 369.386 1029.300 571.230 569.250 344.830
shufflenetv1 38.761 25.971
shufflenetv2 36.220 22.095 51.879 30.351
squeezenet 98.489 54.863
googlenet 274.166 159.429 235.085 133.044

ARMv7 TEST

  • ABI: armeabi-v7a with NEON
  • Latency (ms) of one batch
Kirin 980 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread
mobilenet_v1 39.051 19.813 14.184 39.026 22.048 14.250 50.240 26.850 20.010 92.900 49.420 37.160
mobilenet_v2 36.052 19.550 14.507 32.656 19.641 15.735 35.890 20.730 18.550
mobilenet_ssd 83.474 44.530 33.116 99.960 53.160 84.360 180.000 91.380 68.140
resnet50 291.478 158.954 129.484 1412.37 766.62 560.760 355.010 189.18 133.410
shufflenetv1 11.909 9.761 7.441 16.030 10.660 8.120
shufflenetv2 11.755 7.983 6.289 21.968 14.111 9.888
squeezenet 30.148 20.908 17.084
googlenet 108.210 65.798 58.630 140.886 79.910 60.693


Snapdragon 855 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread
mobilenet_v1 34.015 20.064 11.410 42.222 21.532 11.746 41.150 24.870 18.420 79.180 48.470 24.530
mobilenet_v2 30.742 18.507 11.354 24.628 15.133 9.079 30.060 19.220 15.520
mobilenet_ssd 69.749 44.010 26.000 85.030 62.770 48.940 154.600 138.700 82.140
resnet50 218.581 146.509 92.899 1380.340 996.410 540.660 324.720 261.920 126.270
shufflenetv1 11.032 7.430 5.369 13.390 9.270 6.360
shufflenetv2 11.372 7.120 4.728 19.393 12.278 7.719
squeezenet 27.860 17.538 10.729
googlenet 100.719 69.509 49.021 127.982 83.369 50.275


Snapdragon 652 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread 1 thread 2 thread 4 thread
mobilenet_v1 121.982 63.004 37.325 86.672 45.728 26.354 130.740 140.850 81.810 184.630 192.730 144.740
mobilenet_v2 89.113 50.609 35.291 72.679 45.888 33.887 94.520 101.380 65.570
mobilenet_ssd 236.466 132.293 86.335 270.630 295.520 174.280 350.640 286.420 243.850
resnet50 751.528 405.433 255.699 2762.890 1447.070 883.730 664.180 369.020
shufflenetv1 36.883 23.718 15.144 53.660 33.450 23.330
shufflenetv2 36.933 26.353 20.507 53.243 31.083 21.550
squeezenet 92.748 51.936 33.027
googlenet 296.092 179.542 125.509 242.505 140.083 89.646


RK3399 Anakin fp32 Anakin int8 NCNN fp32 NCNN int8
1 thread 2 thread 1 thread 2 thread 1 thread 2 thread 1 thread 2 thread
mobilenet_v1 116.981 65.033 87.768 47.617 155.830 98.520 201.800 116.440
mobilenet_v2 118.229 70.567 83.790 55.413 126.530 90.930
mobilenet_ssd 237.196 134.508 292.130 183.650 361.570 200.370
resnet50 725.582 413.995 2883.120 1632.800 702.660 404.970
shufflenetv1 41.094 27.353
shufflenetv2 37.660 23.489 53.558 32.122
squeezenet 104.519 59.402
googlenet 305.304 190.897 244.855 142.493

Documentation

Everything you need is in the Doc Index.

We also provide English and Chinese tutorial documentation.

Ask Questions

You are welcome to submit questions and bug reports as Github Issues.

Copyright and License

Anakin is provided under the Apache-2.0 license.

Acknowledgement

Anakin refers to the following projects:

Comments
  • Can you comply with the open-source licenses of the code you use?

    Can you comply with the open-source licenses of the code you use?

    Related issue: https://github.com/PaddlePaddle/Anakin/issues/527

    In https://github.com/PaddlePaddle/Anakin/issues/527, @yiicy said:

    Thanks for pointing out the problem. Our earliest GEMM drew on the Compute Library. The interface in sgemm.cpp is now deprecated; we uniformly use the interface in sgemm_prepacked.cpp, whose implementation has changed accordingly, and whose xblock is partitioned according to the L2 cache size.

    I read this as an admission that you used ACL's open-source code, so please comply with its MIT license. Nobody cares how you partition your blocks.

    In the same issue, @throneclay said:

    Closing this since nobody has replied; feel free to contact us at any time if there are any problems.

    Both @izp001 and I replied further; please give a direct response.

    opened by daquexian 7
  • TensorRT MobileNet benchmark results

    TensorRT MobileNet benchmark results

    I am curious how you implemented the depthwise plugin layer in TRT yourselves. The MobileNet v1 and v2 test results do not make sense to me: why is there so little latency difference between batch size 1 and 2?

    opened by chybhao666 3
  • Bump protobuf from 3.1.0 to 3.15.0 in /tools/external_converter_v2

    Bump protobuf from 3.1.0 to 3.15.0 in /tools/external_converter_v2

    Bumps protobuf from 3.1.0 to 3.15.0.

    Release notes

    Sourced from protobuf's releases.

    Protocol Buffers v3.15.0

    Protocol Compiler

    • Optional fields for proto3 are enabled by default, and no longer require the --experimental_allow_proto3_optional flag.

    C++

    • MessageDifferencer: fixed bug when using custom ignore with multiple unknown fields
    • Use init_seg in MSVC to push initialization to an earlier phase.
    • Runtime no longer triggers -Wsign-compare warnings.
    • Fixed -Wtautological-constant-out-of-range-compare warning.
    • DynamicCastToGenerated works for nullptr input for even if RTTI is disabled
    • Arena is refactored and optimized.
    • Clarified/specified that the exact value of Arena::SpaceAllocated() is an implementation detail users must not rely on. It should not be used in unit tests.
    • Change the signature of Any::PackFrom() to return false on error.
    • Add fast reflection getter API for strings.
    • Constant initialize the global message instances
    • Avoid potential for missed wakeup in UnknownFieldSet
    • Now Proto3 Oneof fields have "has" methods for checking their presence in C++.
    • Bugfix for NVCC
    • Return early in _InternalSerialize for empty maps.
    • Adding functionality for outputting map key values in proto path logging output (does not affect comparison logic) and stop printing 'value' in the path. The modified print functionality is in the MessageDifferencer::StreamReporter.
    • Fixed protocolbuffers/protobuf#8129
    • Ensure that null char symbol, package and file names do not result in a crash.
    • Constant initialize the global message instances
    • Pretty print 'max' instead of numeric values in reserved ranges.
    • Removed remaining instances of std::is_pod, which is deprecated in C++20.
    • Changes to reduce code size for unknown field handling by making uncommon cases out of line.
    • Fix std::is_pod deprecated in C++20 (#7180)
    • Fix some -Wunused-parameter warnings (#8053)
    • Fix detecting file as directory on zOS issue #8051 (#8052)
    • Don't include sys/param.h for _BYTE_ORDER (#8106)
    • remove CMAKE_THREAD_LIBS_INIT from pkgconfig CFLAGS (#8154)
    • Fix TextFormatMapTest.DynamicMessage issue#5136 (#8159)
    • Fix for compiler warning issue#8145 (#8160)
    • fix: support deprecated enums for GCC < 6 (#8164)
    • Fix some warning when compiling with Visual Studio 2019 on x64 target (#8125)

    Python

    • Provided an override for the reverse() method that will reverse the internal collection directly instead of using the other methods of the BaseContainer.
    • MessageFactory.CreateProtoype can be overridden to customize class creation.

    ... (truncated)

    Commits
    • ae50d9b Update protobuf version
    • 8260126 Update protobuf version
    • c741c46 Resovled issue in the .pb.cc files
    • eef2764 Resolved an issue where NO_DESTROY and CONSTINIT were in incorrect order
    • 0040102 Updated collect_all_artifacts.sh for Ubuntu Xenial
    • 26cb6a7 Delete root-owned files in Kokoro builds
    • 1e924ef Update port_def.inc
    • 9a80cf1 Update coded_stream.h
    • a97c4f4 Merge pull request #8276 from haberman/php-warning
    • 44cd75d Merge pull request #8282 from haberman/changelog
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 2
  • fluid2Anakin failed at the beginning

    fluid2Anakin failed at the beginning

    I'm trying to convert a fluid model to Anakin, but I got an error at the very beginning. Hope you can help me.

    Issue:

    ERROR: flag 'logtostderr' was defined more than once (in files '/tmp/glog-20171216-78363-w0y7q0/glog-0.3.5/src/logging.cc' and '/Users/python2/yangjiabin/PR/Paddle/build/third_party/glog/src/extern_glog/src/logging.cc').
    

    ps: /Users/python2/yangjiabin/... this is not my path.

    Env: laptop: macOS 10.12.6; fluid model: MobileNetV1 model+params (official PaddlePaddle pretrained model); Anakin branch: developing (already up to date)

    config.yaml:

    OPTIONS:
      Framework: FLUID
      SavePath: ./model/mobilenet_v1/output
      ResultName: MobileNetV1_fluid
      Config:
        LaunchBoard: ON
        Server:
          ip: 127.0.0.1
          port: 6007
        OptimizedGraph:
          enable: ON
          path: ./model/mobilenet_v1/output/MobileNetV1_fluid.anakin.bin
      LOGGER:
        LogToPath: ./model/mobilenet_v1/log/
        WithColor: ON
    
    TARGET:
      CAFFE:
        # path to proto files
        ProtoPaths:
          - 
        PrototxtPath: 
        ModelPath: 
        Remark:  # Generally no need to modify.
    
      FLUID:
        # path of fluid inference model
        Debug: NULL                            # Generally no need to modify.
        ModelPath: /Users/xxx/MobileNetV1/fluid
        NetType:                               # Generally no need to modify.
    

    Any comments will be appreciated. Thanks in advance!

    opened by ljayx 2
  • Anakin docker build error on ubuntu 16.04 for Nvidia-GPU

    Anakin docker build error on ubuntu 16.04 for Nvidia-GPU

    I followed the Anakin docker guide

    and issue the command

    sudo ./anakin_docker_build_and_run.sh -p NVIDIA-GPU -o Ubuntu -m Build
    

    The error occurs:

    -- cudnn include header is CUDNN_INCLUDE_DIR-NOTFOUND/cudnn.h
    -- cudnn library is /usr/lib/x86_64-linux-gnu/libcudnn.so/libcudnn.so
    CMake Error at cmake/cuda.cmake:126 (message):
      Could not find cudnn library in:
    Call Stack (most recent call first):
      cmake/gather.cmake:16 (anakin_find_cudnn)
      CMakeLists.txt:178 (include)
    
    
    -- Found CUDA: /usr/local/cuda (found suitable version "8.0", minimum required is "7.5") 
    -- Building fat-bin for cuda code !
    --  `--support arch :  3.5
    --  `--support arch :  5.0
    --  `--support arch :  6.0
    --  `--support arch :  6.1
    -- Looking for include file pthread.h
    -- Looking for include file pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE  
    -- Found PROTOBUF: /usr/local/lib/libprotobuf.so  
    -- Found protobuf in /usr/local/include
    -- 
    -- ================================ configuration ==================================
    -- General:
    --   anakin version            : 0.1.2
    --   System                    : Linux
    --   C++ compiler              : /usr/bin/c++
    --   C flags                   : 
    --   CXX flags                 :  -std=c++11 -fPIC -ldl -lrt -W  -pthread    -Wno-unused-variable      -fpermissive  -fdiagnostics-show-option -Wno-undef -Wno-narrowing -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor -Wno-comment -Wno-sign-compare -Wno-ignored-qualifiers -Wno-enum-compare -O3 -DNDEBUG
    --   Build type                : Release;FORCE
    --   Build cross plantform     : ON
    --   Build anakin fp32         : ON
    -- 
    --   Build shared libs         : YES
    --   Build with unit test      : YES
    -- 
    --   Enable verbose message    : NO
    --   Enable noisy warnings     : NO
    --   Disable all warnings      : YES
    -- 
    --   Use local logger          : logger
    --   Use google protobuf       : ON
    --   Use local Unit test       : aktest
    --   USE_OPENCV                : OFF
    --   USE_BOOST                 : OFF
    --   USE_OPENMP                : 
    -- 
    -- Cuda:
    --   USE_CUDA                  : ON
    --     |--CUDA version         : 8.0
    --     `--NVCC flags           :   -Xcompiler -fPIC -O3 -std=c++11 --default-stream per-thread -Wno-deprecated-gpu-targets  --generate-code arch=compute_35,code=sm_35  --generate-code arch=compute_50,code=sm_50  --generate-code arch=compute_60,code=sm_60  --generate-code arch=compute_61,code=sm_61 
    --   USE_CUBLAS                : ON
    --   USE_CURAND                : ON
    --   USE_CUFFT                 : ON
    --   USE_CUDNN                 : ON
    --     `--Cudnn version        : 
    -- 
    --   USE_OPENCL                : OFF
    -- 
    --   SELECT_GPU_PLACE          : YES
    -- 
    --   Configuation path         : /Anakin/gpu_build_sm61/anakin_config.h
    -- ================================ End ==================================
    -- Configuring incomplete, errors occurred!
    See also "/Anakin/gpu_build_sm61/CMakeFiles/CMakeOutput.log".
    See also "/Anakin/gpu_build_sm61/CMakeFiles/CMakeError.log".
    make: *** No targets specified and no makefile found.  Stop.
    sha256:694bca37cbee627f47bfc46e71af099147bb61045fddf6ec3abd1beeab69d00d
    
    opened by kezunlin 2
  • load saved model failed

    load saved model failed

    WAN| 08:46:02.00618| 6.902s| main_thread| parser.cpp:138] Parsing in edges of node : relu5_4
    FTL| 08:46:02.00618| 6.902s| main_thread| graph_base.inl:57] Check failed: (this->has_vertex(arc.bottom()) && this->has_vertex(arc.top())) The arc's top or bottom is not vertex!
    *** Check failed: (this->has_vertex(arc.bottom()) && this->has_vertex(arc.top()))
    fatal error: stack trace: ***
    6 0x4a3329 ./anakin_demo() [0x4a3329]
    5 0x7febe6ff2830 __libc_start_main + 240 [??:0]
    4 0x4a420b ./anakin_demo() [0x4a420b]
    3 0x7febe8878e96 anakin::graph::Graph<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>::load(std::string) + 202 [??:0]
    2 0x7febe877e99a anakin::Status anakin::parser::load<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>(anakin::graph::Graph<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>, std::string&) + 69 [??:0]
    1 0x7febe877f4d2 anakin::Status anakin::parser::load<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>(anakin::graph::Graph<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>, char const*) + 2656 [??:0]
    0 0x4e5cbd anakin::graph::GraphBase<std::string, std::shared_ptr<anakin::graph::Node<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0> >, anakin::saber::Tensor<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, anakin::saber::NCHW>*, anakin::graph::Edge<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1> >::add_in_arc(anakin::graph::Edge<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1>&) + 451 [??:?]

    opened by adeagle 2
  • Support for different input shapes between frames

    Support for different input shapes between frames

    Hi, I have a new question,

    Does Anakin support “variadic” input shapes? For example, in frame 1 I have a standard 224×224 image as input, but in frame 2 I have another image of arbitrary size, say 456×789. I would like to know whether Anakin can support this.

    Thank you~ ~

    opened by hxbloom 2
  • Support and benchmark for int8

    Support and benchmark for int8

    Hi!, I am interested in the Anakin project.

    I would like to know if there is any int8 support and any benchmark results. If not, will there be? Thanks!

    opened by hxbloom 2
  • ubuntu 16.04 build fail on gcc 4.8.5

    ubuntu 16.04 build fail on gcc 4.8.5

    Based on the Ubuntu Dockerfile, with the gcc version changed to 4.8.5. The error message is as follows.

    Linking CXX executable ../../output/unit_test/test_saber_buffer_NV
    /usr/bin/ld: ../../output/libanakin.so.2.0.1: undefined reference to symbol 'cudnnGetRNNLinLayerBiasParams'
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libcudnn.so: error adding symbols: DSO missing from command line
    collect2: error: ld returned 1 exit status
    test/CMakeFiles/test_TargetWrapper.dir/build.make:93: recipe for target '../output/unit_test/test_TargetWrapper' failed
    make[2]: *** [../output/unit_test/test_TargetWrapper] Error 1
    CMakeFiles/Makefile2:757: recipe for target 'test/CMakeFiles/test_TargetWrapper.dir/all' failed
    make[1]: *** [test/CMakeFiles/test_TargetWrapper.dir/all] Error 2

    bug 
    opened by micytw 2
  • String-type nodes are not supported when converting TensorFlow models

    String-type nodes are not supported when converting TensorFlow models

    When converting a TensorFlow (1.12) model with convert.py, the following error is reported:

    tensor.type=7
    Traceback (most recent call last):
      File "converter.py", line 113, in <module>
        graph = Graph(config)
      File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/graph.py", line 46, in __init__
        self.graph_io = self.parser()
      File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parser_tf.py", line 27, in __call__
        med_graph = self._conver_tf_2_med()
      File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parser_tf.py", line 41, in _conver_tf_2_med
        return parser.parse()
      File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parse_tf_2_med.py", line 201, in parse
        nodes = self._parse_tf_node(tf_graph, {})
      File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parse_tf_2_med.py", line 73, in _parse_tf_node
        anakin_tensor = tf_to_anakin_tensor(node.get_attr(a))
      File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/tf_trans_util.py", line 94, in tf_to_anakin_tensor
        new_type = TF_TO_ANAKIN_DTYPE[tensor.dtype]
    KeyError: 7

    It looks like a string-typed tensor.dtype is not supported? I have already removed all string-type features, but TF's Assert op necessarily contains string-type tensors. How can this be resolved?

    opened by liuliuniu 1
  • can't convert from caffe model

    can't convert from caffe model

    my config:

    > OPTIONS:
    >     Framework: CAFFE
    >     SavePath: ./output
    >     ResultName: face_r100
    >     Config:
    >         LaunchBoard: ON
    >         Server:
    >             ip: 0.0.0.0
    >             port: 8888
    >         OptimizedGraph:
    >             enable: OFF
    >             path: ./googlenet.paddle_inference_model.bin.saved
    >     LOGGER:
    >         LogToPath: ./log/
    >         WithColor: ON
    > 
    > TARGET:
    >     CAFFE:
    >         # path of fluid inference model
    >         Debug: NULL                            # Generally no need to modify.
    >         PrototxtPath: ./model/model.prototxt        # The upper path of a fluid inference model.
    >         ModelPath: ./model/model.caffmodel        # The upper path of a fluid inference model.
    >         NetType:        
    

    Traceback (most recent call last):
      File "converter.py", line 79, in <module>
        graph = Graph(config)
      File "/root/Anakin/tools/external_converter_v2/parser/graph.py", line 26, in __init__
        raise NameError('ERROR: GrapProtoIO not support %s model.' % (config.framework))
    NameError: ERROR: GrapProtoIO not support CAFFE model.

    opened by szad670401 1
  • Bump protobuf from 3.1.0 to 3.18.3 in /tools/external_converter_v2

    Bump protobuf from 3.1.0 to 3.18.3 in /tools/external_converter_v2

    Bumps protobuf from 3.1.0 to 3.18.3.

    Release notes

    Sourced from protobuf's releases.

    Protocol Buffers v3.18.3

    C++

    Protocol Buffers v3.16.1

    Java

    • Improve performance characteristics of UnknownFieldSet parsing (#9371)

    Protocol Buffers v3.18.2

    Java

    • Improve performance characteristics of UnknownFieldSet parsing (#9371)

    Protocol Buffers v3.18.1

    Python

    • Update setup.py to reflect that we now require at least Python 3.5 (#8989)
    • Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

    Ruby

    • Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

    Protocol Buffers v3.18.0

    C++

    • Fix warnings raised by clang 11 (#8664)
    • Make StringPiece constructible from std::string_view (#8707)
    • Add missing capability attributes for LLVM 12 (#8714)
    • Stop using std::iterator (deprecated in C++17). (#8741)
    • Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)
    • Fix #7047 Safely handle setlocale (#8735)
    • Remove deprecated version of SetTotalBytesLimit() (#8794)
    • Support arena allocation of google::protobuf::AnyMetadata (#8758)
    • Fix undefined symbol error around SharedCtor() (#8827)
    • Fix default value of enum(int) in json_util with proto2 (#8835)
    • Better Smaller ByteSizeLong
    • Introduce event filters for inject_field_listener_events
    • Reduce memory usage of DescriptorPool
    • For lazy fields copy serialized form when allowed.
    • Re-introduce the InlinedStringField class
    • v2 access listener
    • Reduce padding in the proto's ExtensionRegistry map.
    • GetExtension performance optimizations
    • Make tracker a static variable rather than call static functions
    • Support extensions in field access listener
    • Annotate MergeFrom for field access listener
    • Fix incomplete types for field access listener
    • Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.
    • Reduce binary size due to fieldless proto messages
    • TextFormat: ParseInfoTree supports getting field end location in addition to start.

    ... (truncated)

    Commits


    dependencies 
    opened by dependabot[bot] 1
  • Security Vulnerability Found

    Security Vulnerability Found

    Absolute Path Traversal due to incorrect use of send_file call

    A path traversal attack (also known as directory traversal) aims to access files and directories that are stored outside the web root folder. By manipulating variables that reference files with “dot-dot-slash (../)” sequences and its variations or by using absolute file paths, it may be possible to access arbitrary files and directories stored on file system including application source code or configuration and critical system files. This attack is also known as “dot-dot-slash”, “directory traversal”, “directory climbing” and “backtracking”.

    Root Cause Analysis

    The os.path.join call is unsafe for use with untrusted input. When the os.path.join call encounters an absolute path, it ignores all the parameters it has encountered till that point and starts working with the new absolute path. Please see the example below.

    >>> import os.path
    >>> static = "path/to/mySafeStaticDir"
    >>> malicious = "/../../../../../etc/passwd"
    >>> os.path.join(static, malicious)
    '/../../../../../etc/passwd'
    

    Since the "malicious" parameter represents an absolute path, the result of os.path.join ignores the static directory completely. Hence, untrusted input is passed via the os.path.join call to flask.send_file can lead to path traversal attacks.

    In this case, the problem occurs due to the following code: https://github.com/PaddlePaddle/Anakin/blob/5fd68a6cc4c4620cd1a30794c1bf06eebd3f4730/docs/api_on_web/init.py#L26

    Here, the filename parameter is attacker controlled. This parameter passes through the unsafe os.path.join call making the effective directory and filename passed to the send_file call attacker controlled. This leads to a path traversal attack.

    Proof of Concept

    The bug can be verified using a proof of concept similar to the one shown below.

    curl --path-as-is 'http://<domain>///../../../../etc/passwd"'
    

    Remediation

    This can be fixed by preventing the flow of untrusted data into the vulnerable send_file function. If the application logic necessitates this behaviour, one can either use werkzeug.utils.safe_join to join untrusted paths or replace flask.send_file calls with flask.send_from_directory calls.
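
    A minimal sketch of that remediation (hypothetical route and directory names; flask.send_from_directory rejects paths that resolve outside the given directory):

    from flask import Flask, send_from_directory

    app = Flask(__name__)
    DOC_ROOT = "path/to/mySafeStaticDir"  # hypothetical document root

    @app.route("/docs/<path:filename>")
    def docs(filename):
        # send_from_directory joins DOC_ROOT and filename safely and
        # returns 404 Not Found when the resolved path escapes DOC_ROOT.
        return send_from_directory(DOC_ROOT, filename)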

    References

    This bug was found using CodeQL by GitHub.

    opened by porcupineyhairs 0