Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

Overview

DISCONTINUATION OF PROJECT. This project will no longer be maintained by Intel. Intel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates. Patches to this project are no longer accepted by Intel. If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project.

neon

neon is Intel's reference deep learning framework, committed to best performance on all hardware and designed for ease of use and extensibility.

For fast iteration and model exploration, neon has the fastest performance among deep learning libraries (2x the speed of cuDNN v4; see benchmarks).

  • 2.5s/macrobatch (3072 images) for AlexNet on a Titan X (full run on 1 GPU ~26 hrs)
  • Training VGG with 16-bit floating point on 1 Titan X takes ~10 days (original paper: 4 GPUs for 2-3 weeks)

We use neon internally at Intel Nervana to solve our customers' problems across many domains. We are hiring across several roles. Apply here!

See the new features in our latest release. We want to highlight that neon v2.0.0+ has been optimized for much better performance on CPUs by enabling the Intel Math Kernel Library (MKL). The DNN (Deep Neural Networks) component of MKL that is used by neon is provided free of charge and downloaded automatically as part of the neon installation.

Quick Install

On a Mac OS X or Linux machine, enter the following to download and install neon (conda users, see the guide), and use it to train your first multi-layer perceptron. To force a Python 2 or Python 3 install, replace make below with either make python2 or make python3.

    git clone https://github.com/NervanaSystems/neon.git
    cd neon
    make
    . .venv/bin/activate

Starting after neon v2.2.0, the master branch of neon is updated weekly with work in progress toward the next release. Check out a release tag (e.g., git checkout v2.2.0) for a stable release, or simply check out the latest tag (git checkout latest) to get the most recent stable release, as shown below.
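
For example, after cloning:

    # most recent stable release:
    git checkout latest

    # or pin a specific version:
    git checkout v2.2.0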

As of version 2.4.0, pip installation is re-enabled. neon can be installed using the package name nervananeon.

    pip install nervananeon

Note that aeon needs to be installed separately. The latest release, v2.6.0, uses aeon v1.3.0.
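
A rough sketch of grabbing the matching aeon source (the authoritative build and install steps are in aeon's own README; treat the build step below as an assumption):

    git clone https://github.com/NervanaSystems/aeon.git
    cd aeon
    git checkout v1.3.0   # the aeon version paired with neon v2.6.0
    # then build and install following aeon's README (a CMake-based build)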

Warning

The aeon manifest file format changed between neon v2.1.0 and v2.2.0. When updating from neon < v2.2.0, manifests have to be recreated using the ingest scripts (in the examples folder) or updated using this script.

Use a script to run an example

    python examples/mnist_mlp.py 

Selecting a backend engine from the command line

If a compatible GPU is found on the system, the gpu backend is selected by default, so the above command is equivalent to:

    python examples/mnist_mlp.py -b gpu

When no GPU is available, the optimized CPU (MKL) backend is selected by default as of neon v2.1.0, which means the above command is equivalent to:

    python examples/mnist_mlp.py -b mkl

If you are interested in comparing the default mkl backend with the non-optimized CPU backend, use the following command:

    python examples/mnist_mlp.py -b cpu
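
Backends can also be constructed programmatically inside a script; a minimal sketch using neon's gen_backend (the batch size here is arbitrary):

    from neon.backends import gen_backend

    # build an MKL backend; swap 'mkl' for 'cpu' or 'gpu' as available
    be = gen_backend(backend='mkl', batch_size=128)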

Use a yaml file to run an example

Alternatively, a yaml file may be used to run an example.

    neon examples/mnist_mlp.yaml

To select a specific backend in a yaml file, add or modify a line that contains backend: mkl to enable the mkl backend, or backend: cpu to enable the cpu backend. The gpu backend is selected by default if a GPU is available.
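
For example, a minimal excerpt from the top of a model yaml file (a hypothetical layout; the remaining keys follow whatever the example defines):

    # excerpt only; the backend key is the relevant line
    backend: mkl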

Recommended Settings for neon with MKL on Intel Architectures

The Intel Math Kernel Library takes advantage of the parallelization and vectorization capabilities of Intel Xeon and Xeon Phi systems. When hyperthreading is enabled on the system, we recommend the following KMP_AFFINITY setting to make sure parallel threads are mapped 1:1 to the available physical cores.

    export OMP_NUM_THREADS=<Number of Physical Cores>
    export KMP_AFFINITY=compact,1,0,granularity=fine  

or

    export OMP_NUM_THREADS=<Number of Physical Cores>
    export KMP_AFFINITY=verbose,granularity=fine,proclist=[0-<Number of Physical Cores>],explicit

For more information about KMP_AFFINITY, please check here. We encourage users to experiment with these settings and establish their own best performance settings.
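
To determine the number of physical cores on Linux, one option (an assumption about your environment, not something from the neon docs) is to count unique core/socket pairs:

    lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l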

Documentation

The complete documentation for neon is available here.

Support

For any bugs or feature requests please:

  1. Search the open and closed issues list to see if we're already working on what you have uncovered.
  2. Check that your issue/request hasn't already been addressed in our Frequently Asked Questions (FAQ) or neon-users Google group.
  3. File a new issue or submit a new pull request if you have some code you'd like to contribute.

For other questions and discussions, please post a message to the neon-users Google group.

License

We are releasing neon under an open source Apache 2.0 License. We welcome you to contact us with your use cases.

Comments
  • MKL backend performance regression with some topologies

    Hello! I use neon to train a model on three backends: CPU, MKL, and GPU. All the settings are the same when running on these backends. I got very similar costs from CPU and GPU, while the cost from the MKL backend was usually higher than the other two, sometimes even nan. Does anybody have an idea why that happens?

    The CPU is an Intel i7; the GPU is an NVIDIA GTX 1050; the code is running on Ubuntu 16.04. Here is the printed result of the code...

    Use cpu as backend.
    
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:| fprop       |  456.74     |  452.61     |  439.07     |  501.7      |    msec     |
    DISPLAY:neon:| bprop       |  819.21     |  796.45     |  772.53     |  979.8      |    msec     |
    DISPLAY:neon:| iteration   |  1276       |  1250       |  1213.5     |  1457       |    msec     |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    
    Epoch 0   [Train |████████████████████|  246/246  batches, 3.51 cost, 303.30s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 1   [Train |████████████████████|  245/245  batches, 3.49 cost, 301.14s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 2   [Train |████████████████████|  245/245  batches, 3.47 cost, 301.43s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 3   [Train |████████████████████|  245/245  batches, 3.46 cost, 302.56s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 4   [Train |████████████████████|  245/245  batches, 3.44 cost, 302.91s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Neon training finishes in 1646.99 seconds.
    Misclassification error = 91.2%. Finished in 26.86 seconds.
    Top 3 Misclassification error = 78.1%. Finished in 27.36 seconds.
    Top 5 Misclassification error = 65.7%. Finished in 27.36 seconds.
    Misclassification error = 91.7% on test set. Finished in 43.54 seconds.
    Top 3 Misclassification error = 79.8% on test set. Finished in 43.60 seconds.
    Top 5 Misclassification error = 67.3% on test set. Finished in 43.76 seconds.
    
    
    Use mkl as backend.
    
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:| fprop       |  119.82     |  120.03     |  111.14     |  130.82     |    msec     |
    DISPLAY:neon:| bprop       |  157.51     |  156.32     |  151.81     |  165.86     |    msec     |
    DISPLAY:neon:| iteration   |  277.33     |  280.49     |  264.03     |  285.16     |    msec     |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    
    Epoch 0   [Train |████████████████████|  246/246  batches, 48.12 cost, 70.76s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 1   [Train |████████████████████|  245/245  batches, 47.54 cost, 73.94s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 2   [Train |████████████████████|  245/245  batches, 48.52 cost, 77.99s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 3   [Train |████████████████████|  245/245  batches, 48.09 cost, 74.04s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 4   [Train |████████████████████|  245/245  batches, 48.20 cost, 79.86s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Neon training finishes in 422.74 seconds.
    Misclassification error = 94.6%. Finished in 9.29 seconds.
    Top 3 Misclassification error = 90.1%. Finished in 9.56 seconds.
    Top 5 Misclassification error = 85.6%. Finished in 9.78 seconds.
    Misclassification error = 94.5% on test set. Finished in 15.48 seconds.
    Top 3 Misclassification error = 90.0% on test set. Finished in 15.47 seconds.
    Top 5 Misclassification error = 85.5% on test set. Finished in 14.99 seconds.
    
    
    Use gpu as backend.
    
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:| fprop       |  6.1057     |  6.0366     |  5.8992     |  6.3699     |    msec     |
    DISPLAY:neon:| bprop       |  10.76      |  10.753     |  9.9809     |  11.841     |    msec     |
    DISPLAY:neon:| iteration   |  16.865     |  16.783     |  15.88      |  18.185     |    msec     |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    
    Epoch 0   [Train |████████████████████|  246/246  batches, 3.51 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 1   [Train |████████████████████|  245/245  batches, 3.48 cost, 3.97s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 2   [Train |████████████████████|  245/245  batches, 3.47 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 3   [Train |████████████████████|  245/245  batches, 3.46 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 4   [Train |████████████████████|  245/245  batches, 3.44 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Neon training finishes in 21.84 seconds.
    Misclassification error = 91.2%. Finished in 0.38 seconds.
    Top 3 Misclassification error = 78.0%. Finished in 0.38 seconds.
    Top 5 Misclassification error = 65.6%. Finished in 0.38 seconds.
    Misclassification error = 91.6% on test set. Finished in 0.60 seconds.
    Top 3 Misclassification error = 79.8% on test set. Finished in 0.60 seconds.
    Top 5 Misclassification error = 67.4% on test set. Finished in 0.60 seconds.
    
    opened by moderato 24
  • Prediction drops to 0 after certain number of epochs

    I'm using Neon for my deep Q-learning code - https://github.com/tambetm/simple_dqn. Recently I noticed an issue: the prediction of my network drops to 0 after a certain number of epochs. This can be seen from the Q-value graph: [image: breakout_neon_latest_meanq]

    Normally it would look like this: [image: breakout_lives_meanq]. This plot was produced using Neon at commit hash 7a56fa9645a51e97c05f2e5afbbd1df7057ae832 from October 30th. My code is exactly the same.

    The most plausible explanation would be that weights are truncated to 0 at some point. Because my code hasn't changed, I suspect something in the Neon code related to saving and loading weights repeatedly. In my code I need to clone a model, and the simplest (and most compatible) way of doing that is to just save and load the model. I do this ~45 times before the network prediction drops (it doesn't always drop at the same moment).

    Any ideas what change could have resulted in such a behavior and how to debug it?

    bug 
    opened by tambetm 24
  • Running mnist-small.yaml example after setup - getting error

    Hi, I just installed neon on Ubuntu 14 with Python 3.4 and ran the following command:

        nir@nir-Satellite-Pro-A50-A:~/neon$ neon examples/mlp/mnist-small.yaml

    and I am getting an error message:

        Traceback (most recent call last):
          File "/home/nir/anaconda3/bin/neon", line 240, in <module>
            experiment, result, status = main()
          File "/home/nir/anaconda3/bin/neon", line 126, in main
            experiment = deserialize(args.yaml_file)
          File "/home/nir/anaconda3/lib/python3.4/site-packages/neon/util/persist.py", line 183, in deserialize
            if not isinstance(load_path, file):
        NameError: name 'file' is not defined

    I checked in the directory - this file exists. Appreciate your assistance. Thanks, N

    opened by nirre1401 18
  • Updated Docker images

    I've updated my Docker builds for version 1.0 - one for the cpu backend, and a new one for the gpu backend. The GPU images referenced in the 0.9 docs are still available, but with a note about deprecation.

    I've tested the cpu version with neon examples/mnist_mlp.yaml and python examples/mnist_mlp.py, and it appears fine. However, the gpu version builds the cpu version because of https://github.com/NervanaSystems/neon/issues/83. When building the code to check for the GPU capabilities, please keep https://github.com/NervanaSystems/neon/issues/19 in mind.

    opened by Kaixhin 17
  • Enable gpu error

    I installed CUDA and set up the environment:

    nvidia-smi 
    Wed Dec  6 13:53:24 2017       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P100-PCIE...  Off  | 00000000:2F:00.0 Off |                    0 |
    | N/A   33C    P0    31W / 250W |  15553MiB / 16276MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
    | N/A   36C    P0    31W / 250W |  15479MiB / 16276MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    pip install nervananeon
    
    env | grep PATH
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/lib/python2.7/site-packages/mklml_lnx_2018.0.1.20171007/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
    PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    
    python  mnist.py -b gpu
    python : error: argument -b/--backend: invalid choice: 'gpu' (choose from 'cpu', 'mkl')
    

    I don't understand why the CPU and GPU installs use the same package. With TensorFlow, GPU support is installed as a separate package:

    pip install tensorflow-gpu
    
    opened by yangyang-zhang 12
  • 'magic' is not very descriptive :-D

    You have this cool function for dividing by integers using a bit shift, first multiplying by another number so that you're not limited to dividing by powers of 2, as described in https://gmplib.org/~tege/divcnst-pldi94.pdf

    At the moment, this function is called 'magic', but I'm not sure it's very descriptive? I've renamed it to get_div_mul_shift in my own branch: https://github.com/hughperkins/winogradCl/blob/api/winogradcl/util/math_helper.py#L33

    def get_div_mul_shift_32(nmax, d)
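
    For context, a brute-force sketch of deriving such a multiply/shift pair (a hypothetical illustration - this is neither the neon implementation nor the tighter algorithm from the paper):

        def get_div_mul_shift(nmax, d):
            """Find (m, s) such that (n * m) >> s == n // d for all 0 <= n < nmax."""
            for s in range(64):
                m = (1 << s) // d + 1          # multiplier chosen so m * d > 2**s
                e = m * d - (1 << s)           # error term, 0 < e <= d
                # (n * m) >> s equals n // d whenever n * e < 2**s, so requiring
                # it for n = nmax - 1 covers the whole range [0, nmax)
                if (nmax - 1) * e < (1 << s):
                    return m, s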
    
    opened by hughperkins 11
  • Race condition in softmax-like expressions?

    There appears to be a race condition in certain expressions, such as the denominator of softmax with axis=1 (unlike in neon.transforms.activation):

    https://gist.github.com/oleg-trott/30b802902fd8c63ce002

    I tried this on several GTX 980 cards and see it on all of them. The errors are rare, about 0.1-0.2%. However, when they happen, they are usually dramatic.

    I also see these errors on GTX 750, but they are about 100 times less frequent.

    opened by oleg-trott 11
  • Create custom layer

    How do I start to implement a new layer? I want to implement a few more exotic pooling functions; basically I want to change the function that takes the max / average, and maybe even the gradient for that layer. It would be great to get a step-by-step recommendation on how to do such things.

    opened by Sowasa 11
  • make neon error

    $make HAS_GPU=true
    Building MKL Engine...
    which: no icc in (/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
    using GNU compiler...
    basename: missing operand
    Try 'basename --help' for more information.
    mkl root: /opt/neon/
    make[1]: Entering directory `/opt/neon/neon/backends/mklEngine'
    make[1]: Leaving directory `/opt/neon/neon/backends/mklEngine'
    make[1]: Entering directory `/opt/neon/neon/backends/mklEngine'
    Building mklEngine.so...
    gcc src/conv.c src/pooling.c src/relu.c src/batchNorm.c src/concat.c src/softmax.c src/MKLDNN.h -shared -o mklEngine.so -std=c99 -O3 -I/opt/neon//include -L/opt/neon//lib -Wl,-rpath=/opt/neon//lib -fopenmp -lmklml_gnu -fPIC -march=native -g -liomp5
    In file included from src/conv.c:15:0:
    src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
     #include <mkl_dnn.h>
                         ^
    compilation terminated.
    In file included from src/pooling.c:15:0:
    src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
     #include <mkl_dnn.h>
                         ^
    compilation terminated.
    In file included from src/relu.c:15:0:
    src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
     #include <mkl_dnn.h>
                         ^
    
    $ bash prepare_mkl.sh
    
    Checking MKLML dependencies...
    Downloading required MKLML version mklml_lnx_2018.0.1.20171007 ...
    MKLML dependencies installed: MKLROOT=/opt/neon/
    basename: missing operand
    Try 'basename --help' for more information.
    /opt/neon/ 1
    

    I need help, thanks.

    opened by yangyang-zhang 10
  • Assertion Error loading weights to hidden layers

    Hi guys,

    An assertion error is thrown when I run this line of code: layer.load_weights(params)

    I looked at the source code and it looks like I might be missing an argument in the function called 'self'. Not sure what is meant by self and if I missed the documentation for it I apologize. I know I do not need the load_states argument since it defaults to true.

    Full code here, similar to the VGG example:

        param_layers = [l for l in model.layers.layers]
        param_dict_list = trained_vgg['model']['config']['layers']
        for layer, params in zip(param_layers, param_dict_list):
            if(layer.name == 'class_layer'):
                break
            print(params)
            layer.load_weights(params)

    opened by pantherso48 10
  • Ubuntu 16.04: `unsupported GNU version! gcc versions later than 4.9 are not supported!`

    I have gcc4.9 installed. In Torch, I can select gcc4.9 by doing:

    export CC=gcc-4.9
    export CXX=g++-4.9
    

    Unclear how to do this for Neon?

    opened by hughperkins 10
  •  No module named neon.util.compat

    Hi All,

    I am getting this error when I run a script that has the following line: from neon.util.compat import range, StringIO. I get the error that there isn't a module named neon.util.compat. How do I fix this error?

    Any help will be appreciated.

    opened by chandratejatiriveedhi 0
  • About installation

    Hi! My name is Keval Pandya. I have a question about neon: how can I install neon for Python on Windows? My Python version is 3.8.5. I am learning deep learning from Intel, which uses the neon framework, so please help me as fast as possible so I can learn. Please provide a solution to install it on Windows.

    opened by keval2232 1
  • docs: fix simple typo, sclae -> scale

    There is a small typo in examples/ssd/mboxloss.py.

    Should read scale rather than sclae.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 0
  • IndexError: index 200 is out of bounds for axis 0 with size 2

    I have been writing the code as follows -

        df = pd.DataFrame(adata)
        df.head()

        xq = adata['batch size'].values.reshape(-1, 1)
        yq = adata['Price'].values.reshape(-1, 1)

        mod = smf.quantreg('yq ~ xq', df)
        res = mod.fit(q=.5)
        print(res.summary())

        quantiles = [.05, .25, .50, .75, .95]
        def fit_model(q):
            res = mod.fit(q=q)
            return [q, res.params['Intercept'], res.params[xq]] + res.conf_int().ix[xq].tolist()
        models = [fit_model(xq) for xq in quantiles]

    However, I am getting the error: IndexError: index 200 is out of bounds for axis 0 with size 2

    opened by krpwn 0
  • IndexError: index 5000 is out of bounds for axis 0 with size 5000

    Hi, I am running my capstone project and working on my dataset. When I tried to clean my dataset by removing the outliers, I got this error. I am attaching the code below.

        #Removing Outliers
        #Tukey Method

        # import required libraries
        from collections import Counter

        # Outlier detection
        def detect_outliers(df, n, features):

            outlier_indices = []

            # iterate over features (columns)
            for col in features:
                # 1st quartile (25%)
                Q1 = np.percentile(df[col], 25)
                # 3rd quartile (75%)
                Q3 = np.percentile(df[col], 75)
                # Interquartile range (IQR)
                IQR = Q3 - Q1

                # outlier step
                outlier_step = 1.5 * IQR

                # Determine a list of indices of outliers for feature col
                outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step)].index

                # append the found outlier indices for col to the list of outlier indices
                outlier_indices.extend(outlier_list_col)

            # select observations containing more than 2 outliers
            outlier_indices = Counter(outlier_indices)
            multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)

            return multiple_outliers
    

    List of outliers:

        Outliers_to_drop = detect_outliers(data1.drop('Class', axis=1), 0, list(data1.drop('Class', axis=1)))
        data1.drop('Class', axis=1).loc[Outliers_to_drop]

        # Create New Dataset without Outliers
        good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop=True)
        good_data.info()


        IndexError                                Traceback (most recent call last)
        <ipython-input> in <module>
              1 # Create New Dataset without Outliers
        ----> 2 good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
              3 good_data.info()

        ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
           4289
           4290         key = com.values_from_object(key)
        -> 4291         result = getitem(key)
           4292         if not is_scalar(result):
           4293             return promote(result)

        IndexError: index 5000 is out of bounds for axis 0 with size 5000

    Can anyone help me fix this and code it properly?

    opened by venkidevictor 0
  • pip install failed, posix-ipc using sys/time.h failed

    pip install failed, posix-ipc using sys/time.h failed

    pip install nervananeon failed.

    posix_ipc_module.c(37): fatal error C1083: cannot open include file: "sys/time.h": No such file or directory

    Mine is an x86-64 CPU.

    What can I do?

    opened by silkyrose 2
Releases(v2.6.0)
  • v2.6.0(Jan 5, 2018)

  • v2.5.0(Dec 21, 2017)

    • Optimized SSD MKL backend performance (~3X boost over the previous version)
    • Bumped aeon version to v1.3.0
    • Fixed inference performance issue of MKL batchnorm
    • Fixed batch prediction issue for gpu backend
    • Enabled subset_pct for MNIST_DCGAN example
    • Updated "make clean" to clean up mkl artifacts
    • Added dockerfile for IA mkl
    Source code(tar.gz)
    Source code(zip)
  • v2.4.0(Nov 27, 2017)

    • Enabled pip install through pypi
    • Updated MKLML to version 20171007 with performance improvement of ~3X for mnist datalayer/nondatalayer and ~1.6X for DCGAN/WGAN datalayer
    • Updated resnet model to optimize performance with MKLML 20171007
    • Updated Alexnet weight file and fixed bug for deep dream
    • Fixed faster-rcnn inference model loading issue
    • Added data_loading time measurement and enabled GAN networks benchmarking
    • Updated to Aeon version 1.2.0
    • Enabled neon build with mklEngine on Windows systems
    Source code(tar.gz)
    Source code(zip)
  • v2.3.0(Oct 27, 2017)

    • Optimized DeepSpeech2 MKL backend performance (~7X improvement over the CPU backend)
    • Fused convolution and bias layer which significantly boosted AlexNet and VGG performance on Intel architectures with MKL backend
    • Made SSD and Faster-RCNN use VGG weight files in new format
    • Fixed use of reset_cells hyperparameter
    • Fixed MKL backend bug for GAN and Faster-RCNN models
    Source code(tar.gz)
    Source code(zip)
  • v2.2.0(Sep 27, 2017)

    • Update MKLML to version 20170908, which fixes a bug related to data conversions
    • Add SSD example for bounding box object detection that works for both GPU and MKL backend
    • Add DeepSpeech2 MKL backend optimization that features ~3X improvement
    • Update aeon to 1.0.0 including new version of manifest (doc/source/loading_data.rst#aeon-dataloader)
    • Add CHWD Support for Batch Normalization in mkl backend
    • Modify ResNet-50 model's last layer to match the original ResNet-50 model paper
    • Enable Seq2Seq testing and benchmarking
    Source code(tar.gz)
    Source code(zip)
  • v2.1.0(Aug 2, 2017)

    • Set MKL backend (-b mkl) as the default CPU backend on Linux (use -b cpu to specify original CPU backend)
    • Update MKLML version 20170720 (AVX512 code paths enabled by default and conversion optimizations)
    • Simplify ResNet example
    • Makefiles now check for virtualenv and pkg-config (NervanaSystems/neon#383)
    • Fix Deep Speech2 model on MKL backend
    • Fix MKL installation for "make sysinstall"
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0(Jun 28, 2017)

    • Added support for MKL backend (-b mkl) on Linux, which boosts neon CPU performance significantly
    • Added WGAN model examples for LSUN and MNIST data
    • Enabled WGAN and DCGAN model examples for Python3
    • Added fix (using file locking) to prevent race conditions running multiple jobs on the same machine with multiple GPUs
    • Added functionality to display some information about hardware, OS and model used
    • Updated appdirs to 1.4.3 to be compatible with CentOS 7.3 for the appliance
    Source code(tar.gz)
    Source code(zip)
  • v1.9.0(May 4, 2017)

    • Add support for 3D deconvolution
    • Generative Adversarial Networks (GAN) implementation, and MNIST DCGAN example, following Goodfellow et al. 2014 (http://arXiv.org/abs/1406.2661)
    • Implement Wasserstein GAN cost function and make associated API changes for GAN models
    • Add a new benchmarking script with per-layer timings
    • Add weight clipping for GDM, RMSProp, Adagrad, Adadelta and Adam optimizers
    • Make multicost an explicit choice in mnist_branch.py example
    • Enable NMS kernels to work with normalized boxes and offset
    • Fix missing links in api.rst [#366]
    • Fix docstring for --datatype option to neon [#367]
    • Fix perl shebang in maxas.py and allow for build with numpy 1.12 [#356]
    • Replace os.path.join for Windows interoperability [#351]
    • Update aeon to 0.2.7 to fix a seg fault on termination
    Source code(tar.gz)
    Source code(zip)
  • v1.8.2(Feb 24, 2017)

    • Make the whale calls example stable and shuffle dataset before splitting into subsets
    • Reduce default depth in cifar_msra example to 2
    • Fix the formatting of the conv layer description
    • Fix documentation error in the video-c3d example
    • Support greyscale videos
    Source code(tar.gz)
    Source code(zip)
  • v1.8.1(Jan 18, 2017)

    • Bug fix: Add dilation to object dict and assign defaults to dil_w = dil_h = 1 [#335, #336]
    • Bug fix: Prevent GPU backend from ignoring non-zero slope in Rectlinclip and change default slope to 0
    • Bug fix: Nesterov momentum was updating velocities incorrectly
    Source code(tar.gz)
    Source code(zip)
  • v1.8.0(Dec 28, 2016)

    • Skip Thought Vectors (http://arxiv.org/abs/1506.06726) example
    • Dilated convolution support
    • Nesterov Accelerated Gradient option to SGD optimizer
    • MultiMetric class to allow wrapping Metric classes
    • Support for serializing and deserializing encoder-decoder models
    • Allow specifying the number of time steps to evaluate during beam search
    • A new community-contributed Docker image
    • Improved error messages when a tensor is created with an invalid shape or reshaped to an incompatible size
    • Fix bugs in MultiCost support
    • Documentation fixes [#331]
    Source code(tar.gz)
    Source code(zip)
  • v1.7.0(Nov 21, 2016)

    • Update Data Loader to aeon https://github.com/NervanaSystems/aeon for flexible, multi-threaded data loading and transformations
    • Add Neural Machine Translation model
    • Remove Fast RCNN model (use Faster RCNN model instead)
    • Remove music_genres example
    • Fix super blocking for small N with 1D conv
    • Fix update-direct conv kernel for small N
    • Add gradient clipping to Adam optimizer
    • Documentation updates and bug fixes
    Source code(tar.gz)
    Source code(zip)
  • v1.6.0(Sep 21, 2016)

    • Faster RCNN model
    • Sequence to Sequence container and char_rae recurrent autoencoder model
    • Reshape Layer that reshapes the input [#221]
    • Pip requirements in requirements.txt updated to latest versions [#289]
    • Remove deprecated data loaders and update docs
    • Use NEON_DATA_CACHE_DIR envvar as archive dir to store DataLoader ingested data
    • Eliminate type conversion for FP16 for CUDA compute capability >= 5.2
    • Use GEMV kernels for batch size 1
    • Alter delta buffers for nesting of merge-broadcast layers
    • Support for ncloud real-time logging
    • Add fast_style Makefile target
    • Fix Python 3 builds on Ubuntu 16.04
    • Run setup.py for sysinstall to generate version.py [#282]
    • Fix broken link in mnist docs
    • Fix conv/deconv tests for CPU execution and fix i32 data type
    • Fix for average pooling with batch size 1
    • Change default scale_min to allow random cropping if omitted
    • Fix yaml loading
    • Fix bug with image resize during ingest
    • Update references to the ModelZoo and neon examples to their new locations
    Source code(tar.gz)
    Source code(zip)
  • v1.5.4(Jul 15, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Implement Binarized Neural Networks from http://arxiv.org/pdf/1602.02830v3.pdf (added in v1.5.4)
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259, #267, #268]
    Source code(tar.gz)
    Source code(zip)
  • v1.5.3(Jul 7, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259, #267]
    Source code(tar.gz)
    Source code(zip)
  • v1.5.2(Jul 7, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259]
    Source code(tar.gz)
    Source code(zip)
  • v1.5.1(Jun 30, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259]
    Source code(tar.gz)
    Source code(zip)
  • v1.4.0(Apr 29, 2016)

    • VGG16 based Fast R-CNN model using winograd kernels
    • new, backward compatible, generic data loader
    • C3D video loader model trained on UCF101 dataset
    • Deep Dream example
    • make conv layer printout more informative [#222]
    • fix some examples to use new arg override capability
    • improve performance for relu for small N
    • better support for arbitrary batch norm layer placement
    • documentation updates [#210, #213, #236]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Mar 4, 2016)

    • winograd kernels and associated autotuning routines
    • benchmarking scripts
    • deprecation of deterministic argument for backend constructor
    • improve batch norm stability with fp16 backend
    • allow strided support for dimshuffle kernel
    • speed up zero momentum gradient descent
    Source code(tar.gz)
    Source code(zip)
  • v1.2.2(Feb 25, 2016)

    • benchmarking enhancements
    • fast dimshuffle, transpose, other kernel speedups and refactoring
    • batch norm states fix, deterministic updates
    • example fixes for fast rcnn and conv_autoencoder
    • image decoding rescaling method fix
    • deserialization fixes for RNN's, refactoring
    • caffe compatibility fixes
    • documentation updates
    Source code(tar.gz)
    Source code(zip)
    neon-1.2.2.tar.gz(532.69 KB)
  • v1.2.1(Feb 5, 2016)

  • v1.2.0(Jan 31, 2016)

    • kepler GPU kernel support [#80]
    • new dataloader format, updated docs [#115, #170]
    • new serialization format
    • FastRCNN implementation, ROI pooling support [#135]
    • deep residual nets implementation and example
    • expanded model zoo
    • Ticker dataset and copy, repeat copy tasks
    • autodiff transpose support [#173]
    • numerous bug fixes and documentation updates.
    Source code(tar.gz)
    Source code(zip)
    neon-1.2.0.tar.gz(467.88 KB)
  • v1.1.5(Jan 14, 2016)

    • CUDA kernels for lookuptable layer (up to 4x speedup)
    • support for deterministic Conv layer updates
    • LRN layer support
    • custom dataset walkthrough utilizing bAbI data
    • reduced number of threads in deep reduction EW kernels [#171]
    • additional (de)serialization routines [#106]
    • CPU tensor slicing fix
    • corrections for PrecisionRecall, MultiLabelStats [#148]
    • explicitly specify python2.7 for virtualenv [#155]
    • default to SM50 when no working GPU found [#186]
    • Add alpha to ELU activation [#164]
    • deconv callback fix [#162]
    • various documentation updates [#151, #152]
    Source code(tar.gz)
    Source code(zip)
    neon-1.1.5.tar.gz(468.95 KB)
  • v1.1.4(Dec 15, 2015)

    • Add support for bidirectional RNNs and LSTMs
    • added ELU, leaky ReLU activations
    • significantly faster GPU kernel builds (using ptx instead of cuda-c)
    • data shuffling enhancements, removal of old data loader code.
    • caffe conv, pool, dropout layer matching and compatibility flags
    • add scheduling support for RMSProp
    • callback enhancements, additional unit tests
    • documentation auditing, added links to introductory video tutorials
    Source code(tar.gz)
    Source code(zip)
  • v1.1.3(Dec 1, 2015)

    • deconvolution and weight histogram visualization examples and documentation
    • CPU convolution and pooling layer speedups (~2x faster)
    • bAbI question and answer interactive demo, dataset support.
    • various ImageLoader enhancements.
    • interactive usage improvements (shortcut Callback import, multiple Callbacks init, doc updates, single item batch size support)
    • set default verbosity level to warning
    • CIFAR10 example normalization updates
    • CUDA detection enhancements [#132]
    • only parse batch_writer arguments when used as a script, allow undefined global_mean [#137, #140]
    Source code(tar.gz)
    Source code(zip)
    neon-1.1.3.tar.gz(444.45 KB)
  • v1.1.2(Nov 18, 2015)

    • completely re-written C++ multithreaded dataloader
    • new weight initialization options for recurrent layers
    • Added deconvolution visualization support (guided backprop)
    • new bAbI question answering example network
    • Improved performance of cifar10_allcnn, word_lstm examples
    • new CUDA-C max and avg pooling kernels
    • Additional bugfixes and documentation updates
    Source code(tar.gz)
    Source code(zip)
  • v1.1.1(Nov 6, 2015)

    • Callback initialization bug fix [#127]
    • IMDB LSTM example bug fix [#130]
    • Added cuda-convnet2 style binary dropout variant
    • Added benchmark function to model (separate fprop, bprop, update timings)
    • Remove h_buffer references in lieu of outputs for recurrent layers
    • Multi-cost output buffer bugfix for inference [#131]
    • New timeseries prediction and generation example
    • Change Callback initialization to re-support named arguments. Separate out these arguments in argparser. [#128]
    Source code(tar.gz)
    Source code(zip)
  • v1.1.0(Oct 30, 2015)

    • Sentiment analysis support (LSTM lookupTable based), new IMDB example
    • Support for merge and branch layer stacks via LayerContainers
      • Sequential, Tree, MergeBroadcast, MergeMultiStream
    • Support for freezing layer stacks
    • Adagrad optimizer support
    • new GPU kernels for fast compounding batch norm, conv and pooling engine updates, new kernel build system and flags.
    • Modifications for Caffe support
      • conv, pooling, P/Q updates, dropout layer normalization more in-line with Caffe approach. NOTE: this breaks backwards compatibility with some strided conv/pool related models serialized using older versions of neon as the output sizes may now be different. See the FAQ for more info.
      • serialization enhancements to make caffe model import/export easier
      • use per-channel mean subtraction instead of single global. NOTE: this breaks backwards compatibility with ImgMaster saved datasets prior to this revision. To correct, please use the included update_dataset_cache.py script in the util directory.
    • Default training cost display during progress bar is now calculated on a rolling window basis rather than from the beginning of each epoch
    • Separate Layer configuration and initialization steps
    • YAML based alexnet example
    • Callback enhancements.
      • now pass args instead of having to spell out callbacks in each example
      • Changed validation callback to loss callback, validation_frequency now evaluation_frequency
      • Generic metric callback.
    • Various bug fixes
      • non-contiguous array get for GPUTensors
      • 1D slicing returns 2D matrices
      • bin/neon serialization fixes for RNNs
      • 3D conv fixes for fprop, bprop
      • batch norm inference fix
      • bias layer size fix
    • Documentation updates and improvements
    Source code(tar.gz)
    Source code(zip)
    neon-1.1.0.tar.gz(429.88 KB)
  • v1.0.0(Oct 30, 2015)

    • Ensure root logging handler setup [#82]
    • C++ utility for CUDA compatibility checking [#83]
    • Add predict function to models [#86]
    • Fix bug in learning rate schedule impacting deserialization
    • Speed up batch norm computation
    • Average gradients in OpTree, fix tests
    • Use inference mode for fprop during validation
    • Add top-k misclassification metric
    • Simplify maxas install, make vis requirements optional, doc updates.
    Source code(tar.gz)
    Source code(zip)
    neon-1.0.0.tar.gz(431.73 KB)
  • v0.9.0(Jul 20, 2015)

    This release implements support for multi-GPU processing using "weird trick" parallelization (data parallel for local layers, model parallel for fully-connected layers) and cleans up the previously existing MPI-based parallel code.

    Multi-GPU is only supported on newer Maxwell-based cards using the NervanaGPU backend.

    Older, Kepler-based cards using the cudanet backend are no longer supported (some models and datasets will still work, but others may raise DeprecationWarnings). Users of these cards are encouraged to remain on the 0.8.2 release until we back-port NervanaGPU to support Kepler cards.

    Source code(tar.gz)
    Source code(zip)
    neon-0.9.0.tar.gz(519.07 KB)