A high-performance, cross-platform inference engine. Anakin runs on x86 CPU, ARM, NVIDIA GPU, AMD GPU, Bitmain, and Cambricon devices.

Last update: Dec 28, 2022

Overview

Anakin2.0

Welcome to the Anakin GitHub.

Anakin is a cross-platform, high-performance inference engine. It was originally developed by Baidu engineers and is deployed at large scale in industrial products.

Please refer to our release announcement to track the latest features of Anakin.

Features

Flexibility

Anakin supports a wide range of neural network architectures and hardware platforms. It is easy to run Anakin on GPU, x86, and ARM platforms.

Anakin integrates NVIDIA TensorRT and open-sources the integration API. Developers can call the API directly or modify it as needed, which gives them more flexibility.

High performance

To make full use of the hardware, we optimized forward inference at several levels.
- Automatic graph fusion. For a given algorithm, the goal of every performance optimization is to keep the ALU as busy as possible. Operator fusion effectively reduces memory access and keeps the ALU busy.
- Memory reuse. Forward inference is a one-way computation, so we reuse memory between the inputs and outputs of different operators, which reduces the overall memory footprint.
- Assembly-level optimization. Saber, the underlying DNN library for Anakin, is deeply optimized at the assembly level.
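
The memory-reuse idea can be illustrated with a small sketch. This is not Anakin's actual planner: the function name `plan_buffers` and the toy operator chain are hypothetical, and the sketch assumes that all tensors have the same size and that operators run in the listed order. It records the last reader of each tensor and recycles a buffer once its last reader has run.

```python
def plan_buffers(ops):
    """Assign a buffer id to every operator output, recycling freed buffers.

    ops: list of (output_name, [input_names]) in execution order.
    Assumption for this sketch: all tensors have the same size.
    """
    # Index of the last operator that reads each tensor.
    last_use = {}
    for i, (_, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = i

    free = []        # buffer ids available for reuse
    assignment = {}  # tensor name -> buffer id
    next_id = 0
    for i, (out, inputs) in enumerate(ops):
        # Allocate the output first so it never aliases one of its own inputs.
        if free:
            assignment[out] = free.pop()
        else:
            assignment[out] = next_id
            next_id += 1
        # Inputs whose last reader is this operator can be recycled afterwards.
        for name in set(inputs):
            if last_use.get(name) == i and name in assignment:
                free.append(assignment[name])
    return assignment, next_id


# Hypothetical four-operator chain: conv -> relu -> conv -> relu.
ops = [
    ("conv1", ["input"]),
    ("relu1", ["conv1"]),
    ("conv2", ["relu1"]),
    ("relu2", ["conv2"]),
]
assignment, n_buffers = plan_buffers(ops)
print(assignment, n_buffers)
```

In this toy chain four intermediate tensors fit into two buffers, because each tensor is dead as soon as the next operator has consumed it. A real planner would also have to account for differing tensor sizes and branching graphs.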

NV GPU Benchmark

Machine and Environment

CPU: Intel(R) Xeon(R) CPU 5117 @ 2.0GHz
GPU: Tesla P4
cuda: CUDA8
cuDNN: v7

Timing: 10 warm-up runs, then the average over 1000 runs
Latency (ms) and memory (MB) at different batch sizes

Anakin is compared against NVIDIA TensorRT 5, a widely recognized high-performance inference engine. For models that TensorRT 5 does not support, we use custom plugins.
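
The timing protocol described above (warm-up runs followed by an averaged measurement) can be sketched as follows. This is only an illustration, not the benchmark code used for these tables: `average_latency_ms` and `dummy_inference` are hypothetical names, and the workload is a stand-in for a real forward pass.

```python
import time


def average_latency_ms(fn, warmup=10, runs=1000):
    """Call fn `warmup` times untimed, then return the mean latency in ms."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization; not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


def dummy_inference():
    # Hypothetical stand-in for one forward pass.
    return sum(i * i for i in range(1000))


latency = average_latency_ms(dummy_inference, warmup=10, runs=1000)
print(f"average latency: {latency:.4f} ms")
```

When the workload runs asynchronously on a GPU, the device has to be synchronized before the timer is stopped, otherwise only the launch overhead is measured.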

VGG16

Batch size	TensorRT latency FP32 (ms)	Anakin2 latency FP32 (ms)	TensorRT memory (MB)	Anakin2 memory (MB)
1	8.52532	8.2387	1090.89	702
2	14.1209	13.8772	1056.02	768.76
4	24.4529	24.3391	1002.17	840.54
8	46.7956	46.3309	1098.98	935.61

Resnet50

Batch size	TensorRT latency FP32 (ms)	Anakin2 latency FP32 (ms)	TensorRT latency INT8 (ms)	Anakin2 latency INT8 (ms)	TensorRT memory FP32 (MB)	Anakin2 memory FP32 (MB)
1	4.6447	3.0863	1.78892	1.61537	1134.88	311.25
2	6.69187	5.13995	2.71136	2.70022	1108.86	382
4	11.1943	9.20513	4.16771	4.77145	885.96	406.86
8	19.8769	17.1976	6.2798	8.68197	813.84	532.61
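
To read the table as relative performance, the latencies can be turned into speedup ratios. The sketch below uses the Resnet50 FP32 latencies from the table above; the helper name `speedup` is hypothetical, and a ratio above 1 means Anakin2 is faster than TensorRT for that batch size.

```python
# (batch size, TensorRT FP32 latency in ms, Anakin2 FP32 latency in ms),
# copied from the Resnet50 table.
rows = [
    (1, 4.6447, 3.0863),
    (2, 6.69187, 5.13995),
    (4, 11.1943, 9.20513),
    (8, 19.8769, 17.1976),
]


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline latency to candidate latency."""
    return baseline_ms / candidate_ms


speedups = {batch: speedup(rt, anakin) for batch, rt, anakin in rows}
for batch, ratio in speedups.items():
    print(f"batch {batch}: {ratio:.2f}x")
```

The same calculation applied to the INT8 columns gives ratios below 1 at batch sizes 4 and 8, where TensorRT is the faster engine.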

Resnet101

Batch size	TensorRT latency (ms)	Anakin2 latency (ms)	TensorRT latency INT8 (ms)	Anakin2 latency INT8 (ms)	TensorRT memory (MB)	Anakin2 memory (MB)
1	9.98695	5.44947	2.81031	2.74399	1159.16	500.5
2	17.3489	8.85699	4.8641	4.69473	1158.73	492
4	20.6198	16.8214	7.11608	8.45324	1021.68	541.08
8	31.9653	33.5015	11.2403	15.4336	914.49	611.54

X86 CPU Benchmark

Machine and Environment

CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz with HT, for FP32 test
CPU: Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz with HT, for INT8 test
System: CentOS 6.3 with GCC 4.8.2, for benchmark between Anakin and Intel Caffe

All tests run with 8 threads in parallel
Timing: 10 warm-up runs, then the average over 200 runs

Anakin is compared against Intel Caffe (1.1.6) with MKLML.

Net_Name	Batch_Size	Anakin2 Latency(2650v4) fp32 (ms)	caffe Latency(2650v4) fp32 (ms)	Anakin2 Latency int8(6271) (ms)
resnet50	1	20.6201	24.1369	3.20866
resnet50	2	39.2286	43.1096	5.44311
resnet50	4	77.1392	81.8814	9.93424
resnet50	8	152.941	158.321	19.5618
vgg16	1	55.6132	70.532	15.3181
vgg16	2	96.5034	131.451	22.5082
vgg16	4	180.479	247.926	37.2974
vgg16	8	346.619	485.44	67.6682
mobilenetv1	1	3.98104	5.42775	0.926546
mobilenetv1	2	7.27079	9.16058	1.35007
mobilenetv1	4	14.4029	16.2505	2.37271
mobilenetv1	8	29.1651	29.8381	3.75992
vgg16_ssd	1	125.948	143.412
vgg16_ssd	2	247.242	266.22
vgg16_ssd	4	488.377	510.978
vgg16_ssd	8	972.762	995.407
mobilenetv2	1	3.78504	23.0066
mobilenetv2	2	7.24622	65.9301
mobilenetv2	4	13.7638	85.3893
mobilenetv2	8	28.4093	131.669

ARM CPU Benchmark

Machine and Environment

CPU: Kirin 980
CPU: Snapdragon 652
CPU: Snapdragon 855
CPU: RK3399

Build environment: cross-compiled with the Android NDK, GCC 4.9, NEON enabled
Timing: 10 warm-up runs, then the average over 10 runs
Note: the shufflenetv2 int8 model adds a swish operator

Anakin is compared against ncnn (20190320). ARMv7 and ARMv8 are benchmarked separately.

ARMv8 TEST

ABI: arm64-v8a

Latency (ms) of one batch

Kirin 980	Anakin fp32			Anakin int8			NCNN fp32			NCNN int8
	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread
mobilenet_v1	34.172	19.369	12.723	37.588	20.692	13.280	45.420	24.220	16.730	50.560	27.820	20.010
mobilenet_v2	30.489	17.784	12.327	29.581	17.208	15.307	30.390	17.310	12.900
mobilenet_ssd	71.609	37.477	28.952				88.220	70.070	66.430	103.700	85.160	85.320
resnet50	255.748	137.842	104.628				1299.480	695.830	498.010	243.360	131.100	89.800
shufflenetv1	11.544	8.931	7.027				12.810	9.390	8.030
shufflenetv2	11.687	7.899	5.321	20.402	11.529	9.061
squeezenet	28.580	16.638	14.435
googlenet	93.917	52.742	40.301	130.875	72.522	54.204

Snapdragon 855	Anakin fp32			Anakin int8			NCNN fp32			NCNN int8
	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread
mobilenet_v1	32.019	19.024	10.491	34.363	20.292	10.382	37.110	22.310	13.520	47.430	28.350	15.830
mobilenet_v2	28.533	17.455	10.433	24.487	15.182	9.133	25.060	15.970	11.250
mobilenet_ssd	66.454	41.397	23.639				101.560	69.380	43.930	136.420	91.010	47.490
resnet50	201.362	132.133	78.300				1141.290	724.090	385.990	229.020	138.450	82.060
shufflenetv1	10.153	7.101	5.327				11.610	8.020	5.870
shufflenetv2	10.868	6.713	4.526	17.306	10.987	6.788
squeezenet	25.880	16.134	9.697
googlenet	85.774	54.518	34.025	118.120	73.686	41.865

Snapdragon 652	Anakin fp32			Anakin int8			NCNN fp32			NCNN int8
	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread
mobilenet_v1	109.994	54.937	33.174	83.887	43.639	24.665	123.320	122.670	65.100	128.800	154.370	125.570
mobilenet_v2	80.712	46.314	30.874	69.340	43.590	31.864	89.920	90.900	55.320
mobilenet_ssd	246.459	121.684	134.019				248.190	138.170	142.350	247.020	145.080	211.000
resnet50	673.285	346.287	378.065				880.940	514.190		533.760	313.630
shufflenetv1	34.948	26.635	21.571				39.950	25.520	20.180
shufflenetv2	35.530	21.440	16.434	49.498	29.116	19.346
squeezenet	87.037	47.192	28.663
googlenet	268.023	148.533	95.624	236.492	131.510	81.561

RK3399	Anakin fp32			Anakin int8			NCNN fp32			NCNN int8
	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread
mobilenet_v1	111.317	60.008		87.201	45.693		149.270	91.200		142.790	86.140
mobilenet_v2	105.767	60.899		79.065	53.914		118.530	86.900
mobilenet_ssd	232.923	128.337					268.900	157.860		256.560	149.730
resnet50	671.800	369.386					1029.300	571.230		569.250	344.830
shufflenetv1	38.761	25.971
shufflenetv2	36.220	22.095		51.879	30.351
squeezenet	98.489	54.863
googlenet	274.166	159.429		235.085	133.044

ARMv7 TEST

ABI: armeabi-v7a with NEON

Latency (ms) of one batch

Kirin 980	Anakin fp32			Anakin int8			NCNN fp32			NCNN int8
	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread
mobilenet_v1	39.051	19.813	14.184	39.026	22.048	14.250	50.240	26.850	20.010	92.900	49.420	37.160
mobilenet_v2	36.052	19.550	14.507	32.656	19.641	15.735	35.890	20.730	18.550
mobilenet_ssd	83.474	44.530	33.116				99.960	53.160	84.360	180.000	91.380	68.140
resnet50	291.478	158.954	129.484				1412.37	766.62	560.760	355.010	189.18	133.410
shufflenetv1	11.909	9.761	7.441				16.030	10.660	8.120
shufflenetv2	11.755	7.983	6.289	21.968	14.111	9.888
squeezenet	30.148	20.908	17.084
googlenet	108.210	65.798	58.630	140.886	79.910	60.693

Snapdragon 855	Anakin fp32			Anakin int8			NCNN fp32			NCNN int8
	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread
mobilenet_v1	34.015	20.064	11.410	42.222	21.532	11.746	41.150	24.870	18.420	79.180	48.470	24.530
mobilenet_v2	30.742	18.507	11.354	24.628	15.133	9.079	30.060	19.220	15.520
mobilenet_ssd	69.749	44.010	26.000				85.030	62.770	48.940	154.600	138.700	82.140
resnet50	218.581	146.509	92.899				1380.340	996.410	540.660	324.720	261.920	126.270
shufflenetv1	11.032	7.430	5.369				13.390	9.270	6.360
shufflenetv2	11.372	7.120	4.728	19.393	12.278	7.719
squeezenet	27.860	17.538	10.729
googlenet	100.719	69.509	49.021	127.982	83.369	50.275

Snapdragon 652	Anakin fp32			Anakin int8			NCNN fp32			NCNN int8
	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread	1 thread	2 thread	4 thread
mobilenet_v1	121.982	63.004	37.325	86.672	45.728	26.354	130.740	140.850	81.810	184.630	192.730	144.740
mobilenet_v2	89.113	50.609	35.291	72.679	45.888	33.887	94.520	101.380	65.570
mobilenet_ssd	236.466	132.293	86.335				270.630	295.520	174.280	350.640	286.420	243.850
resnet50	751.528	405.433	255.699				2762.890	1447.070	883.730	664.180	369.020
shufflenetv1	36.883	23.718	15.144				53.660	33.450	23.330
shufflenetv2	36.933	26.353	20.507	53.243	31.083	21.550
squeezenet	92.748	51.936	33.027
googlenet	296.092	179.542	125.509	242.505	140.083	89.646

RK3399	Anakin fp32		Anakin int8		NCNN fp32		NCNN int8
	1 thread	2 thread	1 thread	2 thread	1 thread	2 thread	1 thread	2 thread
mobilenet_v1	116.981	65.033	87.768	47.617	155.830	98.520	201.800	116.440
mobilenet_v2	118.229	70.567	83.790	55.413	126.530	90.930
mobilenet_ssd	237.196	134.508			292.130	183.650	361.570	200.370
resnet50	725.582	413.995			2883.120	1632.800	702.660	404.970
shufflenetv1	41.094	27.353
shufflenetv2	37.660	23.489	53.558	32.122
squeezenet	104.519	59.402
googlenet	305.304	190.897	244.855	142.493

Documentation

Everything you need is in the Doc Index.

We also provide English and Chinese tutorial documentation.

User guide

You can find the working principle of the project, the C++ interface description, and code examples here. You can also learn about the model converter here.

Developer guide

You might want to learn more about Anakin's internals and help improve it. Please refer to how to add custom devices and how to add custom device operators.

How to Contribute

We appreciate your contributions!

Ask Questions

You are welcome to submit questions and bug reports as GitHub Issues.

Copyright and License

Anakin is provided under the Apache-2.0 license.

Acknowledgement

Anakin refers to the following projects:

Comments

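When using open-source code, could you please comply with its open-source license?

Related issue: https://github.com/PaddlePaddle/Anakin/issues/527

In https://github.com/PaddlePaddle/Anakin/issues/527, @yiicy said:

Thanks for pointing out the problem. Our initial GEMM borrowed from compute library. The interfaces in sgemm.cpp are now deprecated and we uniformly use the interfaces in sgemm_prepacked.cpp, whose implementation has changed accordingly. The xblock in sgemm_prepacked.cpp will be partitioned according to the L2 cache size.

I take this to mean that you acknowledge using ACL's open-source code, so please comply with its MIT license. Nobody cares how you partition your blocks.

In the same issue, @throneclay said:

Nobody replied, so this was closed. If there are any problems, feel free to contact us at any time.

Both @izp001 and I have replied further. Please give a direct response as well.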

opened by daquexian 7
TensorRT MobileNet benchmark results

I am curious how you implemented the depthwise plugin layer in TRT yourselves. The MobileNet v1 and v2 test results do not make sense to me. Why is there so little latency difference between batch size 1 and 2?

opened by chybhao666 3
Bump protobuf from 3.1.0 to 3.15.0 in /tools/external_converter_v2
Bumps protobuf from 3.1.0 to 3.15.0.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.15.0

Protocol Compiler

Optional fields for proto3 are enabled by default, and no longer require the --experimental_allow_proto3_optional flag.

C++

MessageDifferencer: fixed bug when using custom ignore with multiple unknown fields

Use init_seg in MSVC to push initialization to an earlier phase.

Runtime no longer triggers -Wsign-compare warnings.

Fixed -Wtautological-constant-out-of-range-compare warning.

DynamicCastToGenerated works for nullptr input for even if RTTI is disabled

Arena is refactored and optimized.

Clarified/specified that the exact value of Arena::SpaceAllocated() is an implementation detail users must not rely on. It should not be used in unit tests.

Change the signature of Any::PackFrom() to return false on error.

Add fast reflection getter API for strings.

Constant initialize the global message instances

Avoid potential for missed wakeup in UnknownFieldSet

Now Proto3 Oneof fields have "has" methods for checking their presence in C++.

Bugfix for NVCC

Return early in _InternalSerialize for empty maps.

Adding functionality for outputting map key values in proto path logging output (does not affect comparison logic) and stop printing 'value' in the path. The modified print functionality is in the MessageDifferencer::StreamReporter.

Fixed protocolbuffers/protobuf#8129

Ensure that null char symbol, package and file names do not result in a crash.

Constant initialize the global message instances

Pretty print 'max' instead of numeric values in reserved ranges.

Removed remaining instances of std::is_pod, which is deprecated in C++20.

Changes to reduce code size for unknown field handling by making uncommon cases out of line.

Fix std::is_pod deprecated in C++20 (#7180)

Fix some -Wunused-parameter warnings (#8053)

Fix detecting file as directory on zOS issue #8051 (#8052)

Don't include sys/param.h for _BYTE_ORDER (#8106)

remove CMAKE_THREAD_LIBS_INIT from pkgconfig CFLAGS (#8154)

Fix TextFormatMapTest.DynamicMessage issue#5136 (#8159)

Fix for compiler warning issue#8145 (#8160)

fix: support deprecated enums for GCC < 6 (#8164)

Fix some warning when compiling with Visual Studio 2019 on x64 target (#8125)

Python

Provided an override for the reverse() method that will reverse the internal collection directly instead of using the other methods of the BaseContainer.

MessageFactory.CreateProtoype can be overridden to customize class creation.

... (truncated)

Commits

ae50d9b Update protobuf version

8260126 Update protobuf version

c741c46 Resovled issue in the .pb.cc files

eef2764 Resolved an issue where NO_DESTROY and CONSTINIT were in incorrect order

0040102 Updated collect_all_artifacts.sh for Ubuntu Xenial

26cb6a7 Delete root-owned files in Kokoro builds

1e924ef Update port_def.inc

9a80cf1 Update coded_stream.h

a97c4f4 Merge pull request #8276 from haberman/php-warning

44cd75d Merge pull request #8282 from haberman/changelog

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 2

fluid2Anakin failed at the beginning

I'm trying to convert fluid model to anakin, but I got an error in the very beginning. Hope you can help me.

Issue:

ERROR: flag 'logtostderr' was defined more than once (in files '/tmp/glog-20171216-78363-w0y7q0/glog-0.3.5/src/logging.cc' and '/Users/python2/yangjiabin/PR/Paddle/build/third_party/glog/src/extern_glog/src/logging.cc').

ps: /Users/python2/yangjiabin/... this is not my path.

Env:
- laptop: MacOSX 10.12.6
- fluid model: MobileNetV1: model+params (pretrained models of paddlepaddle official)
- Anakin branch: developing (already up to date)

config.yaml:

OPTIONS:
  Framework: FLUID
  SavePath: ./model/mobilenet_v1/output
  ResultName: MobileNetV1_fluid
  Config:
    LaunchBoard: ON
    Server:
      ip: 127.0.0.1
      port: 6007
    OptimizedGraph:
      enable: ON
      path: ./model/mobilenet_v1/output/MobileNetV1_fluid.anakin.bin
  LOGGER:
    LogToPath: ./model/mobilenet_v1/log/
    WithColor: ON

TARGET:
  CAFFE:
    # path to proto files
    ProtoPaths:
      - 
    PrototxtPath: 
    ModelPath: 
    Remark:  # Generally no need to modify.

  FLUID:
    # path of fluid inference model
    Debug: NULL                            # Generally no need to modify.
    ModelPath: /Users/xxx/MobileNetV1/fluid
    NetType:                               # Generally no need to modify.

Any comments will be appreciated. Thanks in advance!

opened by ljayx 2

Anakin docker build error on ubuntu 16.04 for Nvidia-GPU

I followed the Anakin Docker guide and issued the command:

sudo ./anakin_docker_build_and_run.sh -p NVIDIA-GPU -o Ubuntu -m Build

The following error occurred:

-- cudnn include header is CUDNN_INCLUDE_DIR-NOTFOUND/cudnn.h
-- cudnn library is /usr/lib/x86_64-linux-gnu/libcudnn.so/libcudnn.so
CMake Error at cmake/cuda.cmake:126 (message):
  Could not find cudnn library in:
Call Stack (most recent call first):
  cmake/gather.cmake:16 (anakin_find_cudnn)
  CMakeLists.txt:178 (include)


-- Found CUDA: /usr/local/cuda (found suitable version "8.0", minimum required is "7.5") 
-- Building fat-bin for cuda code !
--  `--support arch :  3.5
--  `--support arch :  5.0
--  `--support arch :  6.0
--  `--support arch :  6.1
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found PROTOBUF: /usr/local/lib/libprotobuf.so  
-- Found protobuf in /usr/local/include
-- 
-- ================================ configuration ==================================
-- General:
--   anakin version            : 0.1.2
--   System                    : Linux
--   C++ compiler              : /usr/bin/c++
--   C flags                   : 
--   CXX flags                 :  -std=c++11 -fPIC -ldl -lrt -W  -pthread    -Wno-unused-variable      -fpermissive  -fdiagnostics-show-option -Wno-undef -Wno-narrowing -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor -Wno-comment -Wno-sign-compare -Wno-ignored-qualifiers -Wno-enum-compare -O3 -DNDEBUG
--   Build type                : Release;FORCE
--   Build cross plantform     : ON
--   Build anakin fp32         : ON
-- 
--   Build shared libs         : YES
--   Build with unit test      : YES
-- 
--   Enable verbose message    : NO
--   Enable noisy warnings     : NO
--   Disable all warnings      : YES
-- 
--   Use local logger          : logger
--   Use google protobuf       : ON
--   Use local Unit test       : aktest
--   USE_OPENCV                : OFF
--   USE_BOOST                 : OFF
--   USE_OPENMP                : 
-- 
-- Cuda:
--   USE_CUDA                  : ON
--     |--CUDA version         : 8.0
--     `--NVCC flags           :   -Xcompiler -fPIC -O3 -std=c++11 --default-stream per-thread -Wno-deprecated-gpu-targets  --generate-code arch=compute_35,code=sm_35  --generate-code arch=compute_50,code=sm_50  --generate-code arch=compute_60,code=sm_60  --generate-code arch=compute_61,code=sm_61 
--   USE_CUBLAS                : ON
--   USE_CURAND                : ON
--   USE_CUFFT                 : ON
--   USE_CUDNN                 : ON
--     `--Cudnn version        : 
-- 
--   USE_OPENCL                : OFF
-- 
--   SELECT_GPU_PLACE          : YES
-- 
--   Configuation path         : /Anakin/gpu_build_sm61/anakin_config.h
-- ================================ End ==================================
-- Configuring incomplete, errors occurred!
See also "/Anakin/gpu_build_sm61/CMakeFiles/CMakeOutput.log".
See also "/Anakin/gpu_build_sm61/CMakeFiles/CMakeError.log".
make: *** No targets specified and no makefile found.  Stop.
sha256:694bca37cbee627f47bfc46e71af099147bb61045fddf6ec3abd1beeab69d00d

opened by kezunlin 2

load saved model failed

WAN| 08:46:02.00618| 6.902s| main_thread| parser.cpp:138] Parsing in edges of node : relu5_4
FTL| 08:46:02.00618| 6.902s| main_thread| graph_base.inl:57] Check failed: (this->has_vertex(arc.bottom()) && this->has_vertex(arc.top())) The arc's top or bottom is not vertex!
*** Check failed: (this->has_vertex(arc.bottom()) && this->has_vertex(arc.top())) fatal error: stack trace: ***
6 0x4a3329 ./anakin_demo() [0x4a3329]
5 0x7febe6ff2830 __libc_start_main + 240 [??:0]
4 0x4a420b ./anakin_demo() [0x4a420b]
3 0x7febe8878e96 anakin::graph::Graph<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>::load(std::string) + 202 [??:0]
2 0x7febe877e99a anakin::Status anakin::parser::load<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>(anakin::graph::Graph<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>, std::string&) + 69 [??:0]
1 0x7febe877f4d2 anakin::Status anakin::parser::load<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>(anakin::graph::Graph<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0>, char const*) + 2656 [??:0]
0 0x4e5cbd anakin::graph::GraphBase<std::string, std::shared_ptr<anakin::graph::Node<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, (anakin::Precision)0> >, anakin::saber::Tensor<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1, anakin::saber::NCHW>*, anakin::graph::Edge<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1> >::add_in_arc(anakin::graph::Edge<anakin::saber::TargetType<(anakin::saber::TargetTypeEnum)1>, (anakin::saber::DataType)1>&) + 451 [??:?]

opened by adeagle 2
Support for different input shapes between frames

Hi, I have a new question,

Does Anakin support “variadic” input shapes? For example, in frame 1 I have one standard 224*224 image as input, but in frame 2 I have another image of an arbitrary size such as 456*789. I would like to know whether Anakin supports this.

Thank you~ ~

opened by hxbloom 2
Support and benchmark for int8

Hi! I am interested in the Anakin project.

I would like to know whether there is any int8 support and whether int8 benchmark results are available. If not, are they planned? Thanks!

opened by hxbloom 2
ubuntu 16.04 build fail on gcc 4.8.5

Based on the Ubuntu Dockerfile, with the gcc version changed to 4.8.5. The error message is as follows.

Linking CXX executable ../../output/unit_test/test_saber_buffer_NV
/usr/bin/ld: ../../output/libanakin.so.2.0.1: undefined reference to symbol 'cudnnGetRNNLinLayerBiasParams'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libcudnn.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
test/CMakeFiles/test_TargetWrapper.dir/build.make:93: recipe for target '../output/unit_test/test_TargetWrapper' failed
make[2]: *** [../output/unit_test/test_TargetWrapper] Error 1
CMakeFiles/Makefile2:757: recipe for target 'test/CMakeFiles/test_TargetWrapper.dir/all' failed
make[1]: *** [test/CMakeFiles/test_TargetWrapper.dir/all] Error 2
bug

opened by micytw 2
String-type nodes are not supported when converting TensorFlow models

Converting a TensorFlow (version 1.12) model with convert.py fails with the following error:

tensor.type=7
Traceback (most recent call last):
  File "converter.py", line 113, in <module>
    graph = Graph(config)
  File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/graph.py", line 46, in __init__
    self.graph_io = self.parser()
  File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parser_tf.py", line 27, in __call__
    med_graph = self._conver_tf_2_med()
  File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parser_tf.py", line 41, in _conver_tf_2_med
    return parser.parse()
  File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parse_tf_2_med.py", line 201, in parse
    nodes = self._parse_tf_node(tf_graph, {})
  File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/parse_tf_2_med.py", line 73, in _parse_tf_node
    anakin_tensor = tf_to_anakin_tensor(node.get_attr(a))
  File "/data0/xikali/tangram_run/anakin/tools/external_converter_v2/parser/tensorflow/tf_trans_util.py", line 94, in tf_to_anakin_tensor
    new_type = TF_TO_ANAKIN_DTYPE[tensor.dtype]
KeyError: 7

It looks like tensors whose tensor.dtype is string are not supported? I have already removed all string-type features, but the Assert op in tf always contains a string-type tensor. How can this be solved?

opened by liuliuniu 1

can't convert from caffe model.

my config:

> OPTIONS:
>     Framework: CAFFE
>     SavePath: ./output
>     ResultName: face_r100
>     Config:
>         LaunchBoard: ON
>         Server:
>             ip: 0.0.0.0
>             port: 8888
>         OptimizedGraph:
>             enable: OFF
>             path: ./googlenet.paddle_inference_model.bin.saved
>     LOGGER:
>         LogToPath: ./log/
>         WithColor: ON
> 
> TARGET:
>     CAFFE:
>         # path of fluid inference model
>         Debug: NULL                            # Generally no need to modify.
>         PrototxtPath: ./model/model.prototxt        # The upper path of a fluid inference model.
>         ModelPath: ./model/model.caffmodel        # The upper path of a fluid inference model.
>         NetType:

Traceback (most recent call last):
  File "converter.py", line 79, in <module>
    graph = Graph(config)
  File "/root/Anakin/tools/external_converter_v2/parser/graph.py", line 26, in __init__
    raise NameError('ERROR: GrapProtoIO not support %s model.' % (config.framework))
NameError: ERROR: GrapProtoIO not support CAFFE model.

opened by szad670401 1

Bump protobuf from 3.1.0 to 3.18.3 in /tools/external_converter_v2
Bumps protobuf from 3.1.0 to 3.18.3.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.18.3

C++

Reduce memory consumption of MessageSet parsing

This release addresses a Security Advisory for C++ and Python users

Protocol Buffers v3.16.1

Java

Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.2

Java

Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.1

Python

Update setup.py to reflect that we now require at least Python 3.5 (#8989)

Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

Ruby

Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

Protocol Buffers v3.18.0

C++

Fix warnings raised by clang 11 (#8664)

Make StringPiece constructible from std::string_view (#8707)

Add missing capability attributes for LLVM 12 (#8714)

Stop using std::iterator (deprecated in C++17). (#8741)

Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)

Fix #7047 Safely handle setlocale (#8735)

Remove deprecated version of SetTotalBytesLimit() (#8794)

Support arena allocation of google::protobuf::AnyMetadata (#8758)

Fix undefined symbol error around SharedCtor() (#8827)

Fix default value of enum(int) in json_util with proto2 (#8835)

Better Smaller ByteSizeLong

Introduce event filters for inject_field_listener_events

Reduce memory usage of DescriptorPool

For lazy fields copy serialized form when allowed.

Re-introduce the InlinedStringField class

v2 access listener

Reduce padding in the proto's ExtensionRegistry map.

GetExtension performance optimizations

Make tracker a static variable rather than call static functions

Support extensions in field access listener

Annotate MergeFrom for field access listener

Fix incomplete types for field access listener

Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.

Reduce binary size due to fieldless proto messages

TextFormat: ParseInfoTree supports getting field end location in addition to start.

... (truncated)

Commits

a902b39 No-op whitespace change

ae62acd Updating version.json and repo version numbers to: 18.3

f43ac49 Merge pull request #10542 from deannagarcia/3.18.x

9efdf55 Add missing includes

d1635e1 Apply patch

5b37c91 Update version.json with "lts": true (#10534)

c39d622 Merge pull request #10529 from protocolbuffers/deannagarcia-patch-5

f77d3b6 Update version.json

8178b06 Merge pull request #10503 from deannagarcia/3.18.x

24ca839 Add version file

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
Security Vulnerability Found
Absolute Path Traversal due to incorrect use of send_file call

A path traversal attack (also known as directory traversal) aims to access files and directories that are stored outside the web root folder. By manipulating variables that reference files with “dot-dot-slash (../)” sequences and its variations or by using absolute file paths, it may be possible to access arbitrary files and directories stored on file system including application source code or configuration and critical system files. This attack is also known as “dot-dot-slash”, “directory traversal”, “directory climbing” and “backtracking”.

Root Cause Analysis

The os.path.join call is unsafe for use with untrusted input. When the os.path.join call encounters an absolute path, it ignores all the parameters it has encountered till that point and starts working with the new absolute path. Please see the example below.

>>> import os.path
>>> static = "path/to/mySafeStaticDir"
>>> malicious = "/../../../../../etc/passwd"
>>> os.path.join(static, malicious)
'/../../../../../etc/passwd'

Since the "malicious" parameter represents an absolute path, the result of os.path.join ignores the static directory completely. Hence, passing untrusted input through os.path.join and on to flask.send_file can lead to path traversal attacks.

In this case, the problem occurs due to the following code: https://github.com/PaddlePaddle/Anakin/blob/5fd68a6cc4c4620cd1a30794c1bf06eebd3f4730/docs/api_on_web/init.py#L26

Here, the filename parameter is attacker controlled. This parameter passes through the unsafe os.path.join call making the effective directory and filename passed to the send_file call attacker controlled. This leads to a path traversal attack.

Proof of Concept

The bug can be verified using a proof of concept similar to the one shown below.

curl --path-as-is 'http://<domain>///../../../../etc/passwd'

Remediation

This can be fixed by preventing the flow of untrusted data to the vulnerable send_file function. If the application logic necessitates this behaviour, one can either use werkzeug.utils.safe_join to join untrusted paths or replace flask.send_file calls with flask.send_from_directory calls.
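The containment check behind this remediation can be sketched with the standard library alone. The helper below, safe_join_sketch, is a hypothetical name introduced for illustration; it is not the Werkzeug implementation and not the code in the Anakin repository. It assumes that resolving the joined path and verifying it still lies under the base directory is an acceptable stand-in for what werkzeug.utils.safe_join provides.

```python
import os


def safe_join_sketch(base_dir, untrusted):
    """Join `untrusted` onto `base_dir`; return None if the result escapes it.

    Hypothetical helper for illustration only, not the Werkzeug code.
    """
    base = os.path.realpath(base_dir)
    # os.path.join discards `base` when `untrusted` is absolute, so the
    # containment check below is what actually provides the safety.
    candidate = os.path.realpath(os.path.join(base, untrusted))
    try:
        inside = os.path.commonpath([base, candidate]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    return candidate if inside else None


static = "path/to/mySafeStaticDir"
print(safe_join_sketch(static, "/../../../../../etc/passwd"))  # prints None
print(safe_join_sketch(static, "css/site.css"))  # resolved path under static
```

In a Flask application it is usually simpler to call flask.send_from_directory with the static directory and the requested filename, since that function rejects paths that escape the directory on its own.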

References

OWASP Path Traversal

github/securitylab#669

This bug was found using CodeQL by GitHub
opened by porcupineyhairs 0

Releases(v0.1.1)

v0.1.1(Aug 13, 2018)
Fix known conv fusion (with activation, pool) bugs

Fix known model bugs

Source code(tar.gz)
Source code(zip)
anakin.0.1.1.sm50.tgz(7.32 MB)
anakin.0.1.1.sm61.tgz(7.32 MB)
anakin_0_1_1_non_avx_v2.tar.gz(28.52 MB)
anakin_0_1_1_sm61_non_avx_v2.tar.gz(58.07 MB)
v0.1.0(Jul 3, 2018)

Source code(tar.gz)
Source code(zip)


CPU inference engine that delivers unprecedented performance for sparse models

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Control-Robot-Arm-using-PS4-Controller - A robotic arm based on Raspberry Pi and Arduino that is controlled by a PS4 controller

Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.

PPLNN, short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing

A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression.

LightSeq is a high performance training and inference library for sequence processing and generation implemented in CUDA

A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

A simplistic and efficient pure-python neural network library from Phys Whiz with CPU and GPU support.

A modular, research-friendly framework for high-performance and inference of sequence models at many scales

X-modaler is a versatile and high-performance codebase for cross-modal analytics.

Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

Absolute Path Traversal due to incorrect use of `send_file` call