PlaidML is a framework for making deep learning work everywhere.

Overview

A platform for making deep learning work everywhere.

To Our Users

First off, we’d like to thank you for choosing PlaidML. Whether you’re a new user or a multi-year veteran, we greatly appreciate you for the time you’ve spent tinkering around with our source code, sending us feedback, and improving our codebase. PlaidML would truly not be the same without you.

The feedback we have received from our users indicates an ever-increasing need for performance, programmability, and portability. During the past few months, we have been restructuring PlaidML to address those needs. Below is a summary of the biggest changes:

  • We’ve adopted MLIR, an extensible compiler infrastructure that has gained industry-wide adoption since its release in early 2019. MLIR makes it easier both to integrate new software and hardware into our compiler stack and to write optimizations for our compiler.
  • We’ve worked extensively on Stripe, our low-level intermediate representation within PlaidML. Stripe contains optimizations that greatly improve the performance of our compiler. While our work on Stripe began before we decided to use MLIR, we are in the process of fully integrating Stripe into MLIR.
  • We created our C++/Python embedded domain-specific language (EDSL) to improve the programmability of PlaidML.

Today, we’re announcing a new branch of PlaidML — plaidml-v1. This will act as our development branch going forward and will allow us to more rapidly prototype the changes we’re making without breaking our existing user base. As a precaution, please note that certain features, tests, and hardware targets may be broken in plaidml-v1.

You can continue to use code on the master branch or from our releases on PyPI. For your convenience, the contents of our master branch will be released as version 0.7.0. We are keeping the master branch of PlaidML stable and maintaining it until plaidml-v1 is ready for production.

If you’d like to try out some of PlaidML’s newer performance improvements, you can try running PlaidML with the environment variable PLAIDML_USE_STRIPE=1. This will act as a precursor to the changes you’ll be seeing in plaidml-v1, and we’re excited to hear your feedback on Stripe.
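A minimal sketch of turning this on from a Python script. It assumes that PlaidML reads PLAIDML_USE_STRIPE from the process environment when it is first imported, so the variable is set before plaidml or the Keras backend is loaded. The KERAS_BACKEND line is the same setting used in the Hello VGG example below.

```python
import os

# Assumption: PlaidML checks PLAIDML_USE_STRIPE when it is first imported,
# so set it before importing plaidml or switching Keras to the PlaidML backend.
os.environ["PLAIDML_USE_STRIPE"] = "1"
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

# Import keras only after the environment is configured, e.g.:
# import keras
print(os.environ["PLAIDML_USE_STRIPE"])
```

From a POSIX shell the variable can instead be set for a single command, e.g. PLAIDML_USE_STRIPE=1 plaidbench keras mobilenet.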

Your support means a lot to us. Thank you for your understanding as we adopt a new development process during this exciting time for deep learning compilers.


PlaidML is an advanced and portable tensor compiler for enabling deep learning on laptops, embedded devices, or other devices where the available computing hardware is not well supported or the available software stack contains unpalatable license restrictions.

PlaidML sits underneath common machine learning frameworks, enabling users to access any hardware supported by PlaidML. PlaidML supports Keras, ONNX, and nGraph.

As a component within the nGraph Compiler stack, PlaidML further extends the capabilities of specialized deep-learning hardware (especially GPUs) and makes it both easier and faster to access or make use of subgraph-level optimizations that would otherwise be bounded by the compute limitations of the device.

As a component under Keras, PlaidML can accelerate training workloads with customized or automatically generated Tile code. It works especially well on GPUs, and it doesn't require CUDA/cuDNN on NVIDIA hardware while achieving comparable performance.

PlaidML works on all major operating systems: Linux, macOS, and Windows.

If you are using a hardware target that PlaidML does not support by default, such as Clover, see the Building PlaidML instructions to build a custom configuration for your hardware.

Prerequisites

  • Python (v2 supported, v3 recommended)
  • OpenCL 1.2 or greater
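OpenCL devices report their version as "OpenCL <major>.<minor>" followed by vendor-specific text, as in the clinfo output quoted in the issues below. The helper here is hypothetical (it is not part of PlaidML) and only sketches how such a string could be checked against the OpenCL 1.2 minimum.

```python
import re

def meets_opencl_minimum(version_string, minimum=(1, 2)):
    """Hypothetical helper: parse 'OpenCL <major>.<minor> <vendor info>'
    and compare the numeric version with `minimum`."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    return (int(match.group(1)), int(match.group(2))) >= minimum

print(meets_opencl_minimum("OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82"))  # True
print(meets_opencl_minimum("OpenCL 2.0 AMD-APP (2766.5)"))  # True
print(meets_opencl_minimum("OpenCL 1.1"))  # False
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings or floats (for example "1.10" versus "1.2").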

Quick Start

See the troubleshooting section for solutions to common issues.

virtualenv plaidml
source plaidml/bin/activate
pip install plaidml-keras plaidbench

Choose which accelerator you'd like to use (many computers, especially laptops, have multiple):

plaidml-setup

Next, try benchmarking MobileNet inference performance:

plaidbench keras mobilenet

Or, try training MobileNet:

plaidbench --batch-size 16 keras --train mobilenet

Installation Instructions

We support a variety of operating systems and installation methods.

Demos and Related Projects

Plaidbench

Plaidbench is a performance testing suite designed to help users compare the performance of different cards and different frameworks.

Hello VGG

One of the great things about Keras is how easy it is to play with state-of-the-art networks. Here's all the code you need to run VGG-19:

#!/usr/bin/env python

import numpy as np
import os
import time

os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

import keras
import keras.applications as kapp
from keras.datasets import cifar10

(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
model = kapp.VGG19()
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)

# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
print("Ran in {} seconds".format(time.time() - start))
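The np.repeat line in the script is what lets CIFAR-10 images feed VGG19: repeating every row and every column 7 times is a nearest-neighbour upscale from 32x32 to 224x224 (32 x 7 = 224), the default input size of the Keras VGG19 model. A standalone check of that arithmetic, using a zero array in place of the real dataset:

```python
import numpy as np

# Stand-in for x_train[:8]: eight 32x32 RGB images (CIFAR-10 shape), all zeros.
x = np.zeros((8, 32, 32, 3), dtype=np.uint8)
x[0, 0, 0, 0] = 255  # mark a single pixel to see where it ends up

# Same call as the script: repeat rows (axis=1), then columns (axis=2), 7 times each.
up = np.repeat(np.repeat(x, 7, axis=1), 7, axis=2)

print(up.shape)  # (8, 224, 224, 3)
# The marked pixel now covers a 7x7 block; the next block is still 0.
print(int(up[0, :7, :7, 0].min()), int(up[0, 7, 7, 0]))  # 255 0
```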

Reporting Issues

Either open a ticket on GitHub or join our Slack channel (#plaidml).

CI & Validation

Validated Hardware

A comprehensive set of tests is run for each release against the hardware targets listed below.

  • AMD

    • R9 Nano
    • RX 480
    • Vega 10
  • Intel

    • HD4000
    • HD Graphics 505
  • NVIDIA

    • K80
    • GT 640M
    • GTX 1050
    • GTX 1070

Validated Networks

We support all of the Keras application networks in current Keras 2.x releases. Validated networks are tested for performance and correctness as part of our continuous integration system.

  • CNNs

    • Inception v3
    • ResNet50
    • VGG19
    • Xception
    • MobileNet
    • DenseNet
    • ShuffleNet
  • LSTM

    • examples/imdb_lstm.py (from keras)
Comments
  • [macOS] model.fit() loss: nan

    I ran mnist_cnn.py from keras/examples after switching the backend to PlaidML. This issue affects many other scripts, but this is the simplest example.

    It runs fine for a while, then the loss hits nan and accuracy plummets until it reaches 0, where it stays.

    Andys-iMac-2:examples andy$ python mnist_cnn.py
    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    INFO:plaidml:Opening device "amd_radeon_pro_580_compute_engine.0
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/12
    59776/60000 [============================>.] - ETA: 0s - loss: 0.3177 - acc: 0.9025
    INFO:plaidml:Analyzing Ops: 85 of 285 operations complete
    60000/60000 [==============================] - 27s - loss: 0.3172 - acc: 0.9026 - val_loss: 0.2699 - val_acc: 0.9217
    Epoch 2/12
    60000/60000 [==============================] - 18s - loss: 0.1104 - acc: 0.9666 - val_loss: 0.2247 - val_acc: 0.9308
    Epoch 3/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.5408 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 4/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 5/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 6/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 7/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 8/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 9/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 10/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 11/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 12/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Test loss: nan
    Test accuracy: 0.0

    opened by andyoneal 28
  • trying to implement ReflectionPadding2D

    Finally, I implemented it as a single op for B,H,W,C:

    class ReflectionPadding2D(PMLTile.Operation):
        def __init__(self, input, h_pad, w_pad):
            if K.image_data_format() == 'channels_last':
                if input.shape.ndims == 4:
                    H, W = input.shape.dims[1:3]
                    if (type(H) == int and h_pad >= H) or \
                       (type(W) == int and w_pad >= W):
                        raise ValueError("Paddings must be less than dimensions.")
                    c = """ function (I[B, H, W, C] ) -> (O) {{
                            WE = W + {w_pad}*2;
                            HE = H + {h_pad}*2;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    if w_pad > 0:
                        c += """
                            LEFT_PAD [b, h, w , c : B, H, WE, C ] = =(I[b, h, {w_pad}-w,            c]), w < {w_pad} ;
                            HCENTER  [b, h, w , c : B, H, WE, C ] = =(I[b, h, w-{w_pad},            c]), w < W+{w_pad}-1 ;
                            RIGHT_PAD[b, h, w , c : B, H, WE, C ] = =(I[b, h, 2*W - (w-{w_pad}) -2, c]);
                            LCR = LEFT_PAD+HCENTER+RIGHT_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "LCR = I;"
                    if h_pad > 0:
                        c += """
                            TOP_PAD   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, {h_pad}-h,            w, c]), h < {h_pad};
                            VCENTER   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, h-{h_pad},            w, c]), h < H+{h_pad}-1 ;
                            BOTTOM_PAD[b, h, w , c : B, HE, WE, C ] = =(LCR[b, 2*H - (h-{h_pad}) -2, w, c]);
                            TVB = TOP_PAD+VCENTER+BOTTOM_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "TVB = LCR;"
                    c += "O = TVB; }"
                    inp_dims = input.shape.dims
                    out_dims = (inp_dims[0], inp_dims[1]+h_pad*2, inp_dims[2]+w_pad*2, inp_dims[3])
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
            super(ReflectionPadding2D, self).__init__(c, [('I', input) ],
                    [('O', PMLTile.Shape(input.shape.dtype, out_dims ) )])
    

    I also implemented it via slice and concat, but I suppose this version will consume more VRAM? Or am I wrong?

    class ReflectionPadding2D():
        def __init__(self, h_pad, w_pad):
            self.h_pad, self.w_pad = h_pad, w_pad
        def __call__(self, inp):
            h_pad, w_pad = self.h_pad, self.w_pad
            if K.image_data_format() == 'channels_last':
                if inp.shape.ndims == 4:
                    w = K.concatenate ([ inp[:,:,w_pad:0:-1,:],
                                         inp,
                                         inp[:,:,-2:-w_pad-2:-1,:] ], axis=2 )
                    h = K.concatenate ([ w[:,h_pad:0:-1,:,:],
                                         w,
                                         w[:,-2:-h_pad-2:-1,:,:] ], axis=1 )
                    return h
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
    
    needs integration 
    opened by iperov 27
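On the VRAM question, the thread doesn't tell us how PlaidML schedules either version. As written, the slice-and-concat form builds an intermediate width-padded tensor (w) before padding the height, but whether that costs extra device memory depends on the compiler. What can be checked cheaply is the index arithmetic. The sketch below uses NumPy as a stand-in for the Keras backend K (the function name reflection_pad_nhwc is ours) and compares the same slices with np.pad(..., mode="reflect"):

```python
import numpy as np

def reflection_pad_nhwc(x, h_pad, w_pad):
    """Reflection-pad a B,H,W,C array with the same slice-and-concat
    indices as the K.concatenate version above (NumPy stand-in for K)."""
    if h_pad >= x.shape[1] or w_pad >= x.shape[2]:
        raise ValueError("Paddings must be less than dimensions.")
    # Width: mirror columns w_pad..1 on the left and -2..-(w_pad+1) on the right.
    w = np.concatenate([x[:, :, w_pad:0:-1, :], x, x[:, :, -2:-w_pad - 2:-1, :]], axis=2)
    # Height: the same trick applied to the width-padded tensor.
    return np.concatenate([w[:, h_pad:0:-1, :, :], w, w[:, -2:-h_pad - 2:-1, :, :]], axis=1)

x = np.arange(2 * 4 * 5 * 3).reshape(2, 4, 5, 3)
y = reflection_pad_nhwc(x, 2, 3)
ref = np.pad(x, ((0, 0), (2, 2), (3, 3), (0, 0)), mode="reflect")
print(y.shape, np.array_equal(y, ref))  # (2, 8, 11, 3) True
```

With a padding of 0 both mirrored slices are empty, so the concatenation returns the input unchanged.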
  • plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    Traceback (most recent call last):
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\Scripts\plaidml-setup.exe\__main__.py", line 5, in <module>
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\__init__.py", line 50, in <module>
        import plaidml.settings
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 33, in <module>
        _setup_config('PLAIDML_EXPERIMENTAL_CONFIG', 'experimental.json')
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 30, in _setup_config
        'Could not find PlaidML configuration file: "{}".'.format(filename))
    plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    opened by Duddino 26
  • Memory error on Vega 10

    Hi, I am trying PlaidML on an AMD Vega 10 (gfx900).

    I get the following error:

    prj47-rack-06@PRJ47-RACK-06:~/biswa/plaidbench$ python plaidbench.py mobilenet
    Using PlaidML backend.
    INFO:plaidml:Initializing device gfx900.0: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.1: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.2: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.3: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "gfx900.3": "Advanced Micro Devices, Inc. gfx900"

    Model loaded.
    Compiling and running initial batch, batch_size=1
    Warmup
    Memory access fault by GPU node-7 on address 0x4408bd6000. Reason: Page not present or supervisor privilege.
    Aborted (core dumped)

    Any idea how to resolve this?

    Thanks, Biswa

    opened by biswagsingh 26
  • "CL_OUT_OF_HOST_MEMORY" error when command "plaidml-setup"

    Hello again. I'm experiencing a new issue with the 0.6.0 rc1 version of PlaidML. Using 0.5 led to this issue: https://github.com/plaidml/plaidml/issues/73. Any luck solving it?

    opened by iamkucuk 23
  • Feature request - port to Python 3.6

    I've got PlaidML running on my AMD Bonaire on Arch Linux with Python 2.7 in a Conda environment. Every other Python package I have runs with 3.6 and my goal is to keep it that way. ;-)

    There doesn't even seem to be a pip package for 3.6, so pip install -U plaidml-keras fails with Python 3.6. If you can post build-from-GitHub-source instructions, I can make a local package and install it.

    P.S.: Let me know if you want Arch setup instructions for AMD GPUs. Most of it is on the Arch User Repository wiki but I've got some scripts that do the work.

    P.P.S.: Benchmark results

    Using PlaidML backend.
    INFO:plaidml:Initializing device bonaire.0: "Bonaire", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "bonaire.0": "Advanced Micro Devices, Inc. Bonaire"
    Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
    16793600/17225924 [============================>.] - ETA: 0s 
    Model loaded.
    Compiling and running initial batch, batch_size=1
    Warmup
    Doing the main timing
    Example finished, elapsed: 6.821215868 (compile), 15.0223557949 (execution)
    
    opened by znmeb 21
  • Mac+AMD: AMD not detected and Intel uses too high of a work group

    iMac 2017 with a Radeon Pro 580 and a Core i5-7600K. I compiled PlaidML from source and installed it via the pip wheel.

    Ran plaidml-setup:

    PlaidML Setup (0.0.0.dev0)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    Default Config Devices: No devices.

    Experimental Config Devices: intel(r)_core(tm)i5-7600k_cpu@_3.80ghz.0 : Intel Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz

    Using experimental devices can cause poor performance, crashes, and other nastiness. Enable experimental device support? (y,n)[n]:y

    PlaidML sends anonymous usage statistics to help guide improvements. We'd love your help making it better.

    Enable telemetry reporting? (y,n)[y]:y

    Almost done. Multiplying some matrices...
    Tile code:
      function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
    ERROR:plaidml:OpenCL: [CL_INVALID_WORK_GROUP_SIZE] : OpenCL Error : clEnqueueNDRangeKernel failed: total work group size (32) is greater than the device can support (1) (cb=12)
    Whew. That worked.

    Save settings to /Users/andy/.plaidml? (y,n)[y]:y
    Success!

    Should a GPU be detected at this point? Is there somewhere I can lower the total work group size manually?

    New to submitting git issues. Sorry if I'm missing anything.

    opened by andyoneal 19
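The Tile snippet that plaidml-setup prints in this issue is a matrix multiply. As we read Tile's contraction syntax (our interpretation, not an official specification), the indices before the colon (x, y) select an output element, X,Y give the output shape, and +(...) sums over every index that appears only on the right-hand side, here z. A loop-level NumPy sketch of that reading (tile_matmul is our name), checked against NumPy's own matrix product:

```python
import numpy as np

def tile_matmul(B, C):
    """Loop-level reading of the Tile contraction
        A[x,y : X,Y] = +(B[x,z] * C[z,y]);
    x and y index the X-by-Y output; z appears only on the right-hand
    side, so the '+' aggregation sums over every value of z."""
    X, Z = B.shape
    Z2, Y = C.shape
    if Z != Z2:
        raise ValueError("inner dimensions must match")
    A = np.zeros((X, Y), dtype=np.result_type(B, C))
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                A[x, y] += B[x, z] * C[z, y]
    return A

B = np.arange(6).reshape(2, 3)
C = np.arange(12).reshape(3, 4)
print(np.array_equal(tile_matmul(B, C), np.dot(B, C)))  # True
```

The same contraction in NumPy's index notation is np.einsum("xz,zy->xy", B, C).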
  • PlaidML Setup Issue Windows

    Hi, running plaidml-setup gives me the following:

    PlaidML Setup (0.3.5)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    No OpenCL devices found. Check driver installation. Read the helpful, easy driver installation instructions from our README: http://github.com/plaidml/plaidml

    This is the output from clinfo:

    Number of platforms: 1
      Platform Profile: FULL_PROFILE
      Platform Version: OpenCL 2.1 AMD-APP (2766.5)
      Platform Name: AMD Accelerated Parallel Processing
      Platform Vendor: Advanced Micro Devices, Inc.
      Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices

      Platform Name: AMD Accelerated Parallel Processing
    Number of devices: 1
      Device Type: CL_DEVICE_TYPE_GPU
      Vendor ID: 1002h
      Board name: Radeon RX 580 Series
      Device Topology: PCI[ B#1, D#0, F#0 ]
      Max compute units: 36
      Max work items dimensions: 3
        Max work items[0]: 1024
        Max work items[1]: 1024
        Max work items[2]: 1024
      Max work group size: 256
      Preferred vector width char: 4
      Preferred vector width short: 2
      Preferred vector width int: 1
      Preferred vector width long: 1
      Preferred vector width float: 1
      Preferred vector width double: 1
      Native vector width char: 4
      Native vector width short: 2
      Native vector width int: 1
      Native vector width long: 1
      Native vector width float: 1
      Native vector width double: 1
      Max clock frequency: 1340Mhz
      Address bits: 64
      Max memory allocation: 4244635648
      Image support: Yes
      Max number of images read arguments: 128
      Max number of images write arguments: 64
      Max image 2D width: 16384
      Max image 2D height: 16384
      Max image 3D width: 2048
      Max image 3D height: 2048
      Max image 3D depth: 2048
      Max samplers within kernel: 16
      Max size of kernel argument: 1024
      Alignment (bits) of base address: 2048
      Minimum alignment (bytes) for any datatype: 128
      Single precision floating point capability
        Denorms: No
        Quiet NaNs: Yes
        Round to nearest even: Yes
        Round to zero: Yes
        Round to +ve and infinity: Yes
        IEEE754-2008 fused multiply-add: Yes
      Cache type: Read/Write
      Cache line size: 64
      Cache size: 16384
      Global memory size: 8589934592
      Constant buffer size: 4244635648
      Max number of constant args: 8
      Local memory type: Scratchpad
      Local memory size: 32768
      Max pipe arguments: 16
      Max pipe active reservations: 16
      Max pipe packet size: 4244635648
      Max global variable size: 3820172032
      Max global variable preferred total size: 8589934592
      Max read/write image args: 64
      Max on device events: 1024
      Queue on device max size: 8388608
      Max on device queues: 1
      Queue on device preferred size: 262144
      SVM capabilities:
        Coarse grain buffer: Yes
        Fine grain buffer: Yes
        Fine grain system: No
        Atomics: No
      Preferred platform atomic alignment: 0
      Preferred global atomic alignment: 0
      Preferred local atomic alignment: 0
      Kernel Preferred work group size multiple: 64
      Error correction support: 0
      Unified memory for Host and Device: 0
      Profiling timer resolution: 1
      Device endianess: Little
      Available: Yes
      Compiler available: Yes
      Execution capabilities:
        Execute OpenCL kernels: Yes
        Execute native function: No
      Queue on Host properties:
        Out-of-Order: No
        Profiling : Yes
      Queue on Device properties:
        Out-of-Order: Yes
        Profiling : Yes
      Platform ID: 00007FFEC2C66FD0
      Name: Ellesmere
      Vendor: Advanced Micro Devices, Inc.
      Device OpenCL C version: OpenCL C 2.0
      Driver version: 2766.5
      Profile: FULL_PROFILE
      Version: OpenCL 2.0 AMD-APP (2766.5)
      Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv

    Shouldn't it be working? I just switched to a new computer; I used to use NVIDIA with CUDA. Any help is appreciated!

    Note: I do have the most recent AMD driver installed.

    opened by YutaTakano 16
  • could not broadcast input array from shape (3,2048) into shape (6144)

    I just installed PlaidML and tried to run this example:

    #!/usr/bin/env python
    
    import plaidml.keras
    plaidml.keras.install_backend() 
    
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation, Dropout
    from keras.datasets import mnist
    from keras.utils import np_utils
    
    # fix a random seed for reproducibility
    np.random.seed(9)
    
    # user inputs
    nb_epoch = 25
    num_classes = 10
    batch_size = 128
    train_size = 60000
    test_size = 10000
    v_length = 784
    
    # split the mnist data into train and test
    (trainData, trainLabels), (testData, testLabels) = mnist.load_data()
    
    
    # reshape the dataset
    trainData = trainData.reshape(train_size, v_length)
    testData = testData.reshape(test_size, v_length)
    trainData = trainData.astype("float32")
    testData = testData.astype("float32")
    trainData /= 255
    testData /= 255
    
    
    # convert class vectors to binary class matrices --> one-hot encoding
    mTrainLabels = np_utils.to_categorical(trainLabels, num_classes)
    mTestLabels = np_utils.to_categorical(testLabels, num_classes)
    
    # create the model
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes))
    model.add(Activation("softmax"))
    
    # summarize the model
    model.summary()
    
    # compile the model
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    
    # fit the model
    history = model.fit(trainData,
                        mTrainLabels,
                        validation_data=(testData, mTestLabels),
                        batch_size=batch_size,
                        epochs=nb_epoch,
                        verbose=2)
    
    # print the history keys
    print(history.history.keys())
    
    # evaluate the model
    scores = model.evaluate(testData, mTestLabels, verbose=0)
    
    # history plot for accuracy
    plt.plot(history.history["acc"])
    plt.plot(history.history["val_acc"])
    plt.title("Model Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    # history plot for loss
    plt.plot(history.history["loss"])
    plt.plot(history.history["val_loss"])
    plt.title("Model Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    

    and I got this error

    could not broadcast input array from shape (3,2048) into shape (6144)

    Then I tried running the Hello VGG example from the PlaidML GitHub page and got the same error.

    I am using PlaidML 0.3.4 on Ubuntu in a virtualenv, and I am trying to run this code on an RX 480.

    Thanks for the help.

    opened by leon3428 16
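The report doesn't show where PlaidML performs the failing copy, so we can't pin down the root cause here. The message itself is NumPy's standard ValueError for assigning a (3, 2048) array into a flat buffer of 3 x 2048 = 6144 elements: the element counts match, but the shapes are not broadcast-compatible. A minimal reproduction of just that NumPy behaviour, with made-up arrays:

```python
import numpy as np

src = np.ones((3, 2048), dtype=np.float32)   # 2-D source, as in the message
dst = np.empty(3 * 2048, dtype=np.float32)   # flat destination of 6144 elements

raised = False
try:
    # (3, 2048) cannot be broadcast onto (6144,), so NumPy raises ValueError.
    dst[:] = src
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Flattening the source first makes the shapes line up.
dst[:] = src.reshape(-1)
print(dst.shape, float(dst.sum()))  # (6144,) 6144.0
```

The exact wording of the error differs slightly between NumPy versions (for example "(6144)" versus "(6144,)").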
  • plaidml.exceptions.Unknown: Duplicate updates

    Setup:

    sudo apt-get install clinfo
    clinfo [sees 1080ti]
    sudo pip install -U plaidml-keras
    plaidml-setup
    [insert before keras import:]
    import plaidml.keras
    plaidml.keras.install_backend()
    

    But first, an intermediate problem:

     ImportError: No module named plaidml.keras
    $ which python
    /home/phobrain/anaconda2/bin//python
    

    Fix:

    import sys
    sys.path.append('/usr/local/lib/python2.7/dist-packages/')
    import plaidml.keras
    plaidml.keras.install_backend()
    

    'Real' issue being reported:

      File "siaconv.py", line 919, in doit
        epochs=epochs)
      File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
        return func(*args, **kwargs)
      File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1926, in fit_generator
        self._make_train_function()
      File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 967, in _make_train_function
        **self._function_kwargs)
      File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 1718, in function
        return _Function(inputs, outputs, updates, name)
      File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 931, in __init__
        c.add_update(_plaidml_val(var), _plaidml_val(newval))
      File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 1289, in add_update
        _lib().plaidml_add_composer_update(self, dest, src)
      File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 674, in _check_err
        self.raise_last_status()
      File "/usr/local/lib/python2.7/dist-packages/plaidml/library.py", line 136, in raise_last_status
        raise self.last_status()
    plaidml.exceptions.Unknown: Duplicate updates

    model.fit_generator(
            myGen('data', tr_pairs, tr_y, batch_size, True),
            (len(tr_pairs)-1) // batch_size,
            validation_data=myGen('valid', te_pairs, te_y, batch_size, True),
            validation_steps=1,
            max_queue_size=2,
            workers=1,
            epochs=epochs)
    

    Net:

    KERNEL_INIT = 'glorot_normal'
    
        seq.add(Dense(dense_size, input_shape=input_shape,
                    activation='relu', kernel_initializer=KERNEL_INIT))
        seq.add(BatchNormalization())
        seq.add(Dense((dense_size*2)//3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dropout(0.1, seed=SEED))
        seq.add(Dense(dense_size//4,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense((dense_size*2)//3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(dense_size,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(512,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
    
    opened by phobrain 16
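The traceback shows the error being raised from c.add_update(...) inside the backend's _Function, i.e. while (variable, new value) update pairs are registered with the composer. Our reading, an assumption not confirmed in the thread, is that "Duplicate updates" means the same variable was the target of more than one update. The hypothetical helper below sketches that check on a list of pairs, with plain strings standing in for backend variables (the names are made up):

```python
from collections import Counter

def find_duplicate_update_targets(updates):
    """Hypothetical helper: return the targets that appear in more than one
    (target, new_value) pair, in sorted order."""
    counts = Counter(target for target, _ in updates)
    return sorted(target for target, n in counts.items() if n > 1)

updates = [
    ("dense_1/kernel", "step_a"),
    ("dense_2/kernel", "step_b"),
    ("dense_1/kernel", "step_c"),  # second update to the same target
]
print(find_duplicate_update_targets(updates))  # ['dense_1/kernel']
```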
  • Plaidml not detecting Mali-T628 on ARM

    Hi,

    I've built plaidml 0.3.5 to use on an Odroid XU4 with a Mali-T628 GPU running Debian Stretch. I managed to install the wheel, but when I run plaidml-setup, I get:

    "No supported devices found. Run 'clinfo' and file an issue containing the full output."

    However, with plaidml 0.3.0rc1 (the latest version available via pip install plaidml), my devices can be configured and two Mali-T628 devices are reported. "experimental.json" looks quite similar in both cases.

    Any clue what I may have done wrong building plaidml (I used Bazel 0.18.1 with --config linux_arm_32v7), or what change might explain 0.3.5 not recognizing devices that 0.3.0rc1 did?

    Thanks

    Here's my clinfo report:

    Number of platforms 1
      Platform Name ARM Platform
      Platform Vendor ARM
      Platform Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Platform Profile FULL_PROFILE
      Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory
      Platform Extensions function suffix ARM

      Platform Name ARM Platform
    Number of devices 2
      Device Name Mali-T628
      Device Vendor ARM
      Device Vendor ID 0x6200010
      Device Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Driver Version 1.2
      Device OpenCL C Version OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Device Type GPU
      Device Profile FULL_PROFILE
      Max compute units 4
      Max clock frequency 600MHz
      Device Partition (core)
        Max number of sub-devices 0
        Supported partition types None
      Max work item dimensions 3
      Max work item sizes 256x256x256
      Max work group size 256
      Preferred work group size multiple 4
      Preferred / native vector sizes
        char 16 / 16
        short 8 / 8
        int 4 / 4
        long 2 / 2
        half 8 / 8 (cl_khr_fp16)
        float 4 / 4
        double 2 / 2 (cl_khr_fp64)
      Half-precision Floating-point support (cl_khr_fp16)
        Denormals Yes
        Infinity and NANs Yes
        Round to nearest Yes
        Round to zero Yes
        Round to infinity Yes
        IEEE754-2008 fused multiply-add Yes
        Support is emulated in software No
        Correctly-rounded divide and sqrt operations No
      Single-precision Floating-point support (core)
        Denormals Yes
        Infinity and NANs Yes
        Round to nearest Yes
        Round to zero Yes
        Round to infinity Yes
        IEEE754-2008 fused multiply-add Yes
        Support is emulated in software No
        Correctly-rounded divide and sqrt operations No
      Double-precision Floating-point support (cl_khr_fp64)
        Denormals Yes
        Infinity and NANs Yes
        Round to nearest Yes
        Round to zero Yes
        Round to infinity Yes
        IEEE754-2008 fused multiply-add Yes
        Support is emulated in software No
        Correctly-rounded divide and sqrt operations No
      Address bits 64, Little-Endian
      Global memory size 2090405888 (1.947GiB)
      Error Correction support No
      Max memory allocation 522601472 (498.4MiB)
      Unified memory for Host and Device Yes
      Minimum alignment for any data type 128 bytes
      Alignment of base address 1024 bits (128 bytes)
      Global Memory cache type Read/Write
      Global Memory cache size <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
      Global Memory cache line 64 bytes
      Image support Yes
        Max number of samplers per kernel 16
        Max size for 1D images from buffer 65536 pixels
        Max 1D or 2D image array size 2048 images
        Max 2D image size 65536x65536 pixels
        Max 3D image size 65536x65536x65536 pixels
        Max number of read image args 128
        Max number of write image args 8
      Local memory type Global
      Local memory size 32768 (32KiB)
      Max constant buffer size 65536 (64KiB)
      Max number of constant args 8
      Max size of kernel argument 1024
      Queue properties
        Out-of-order execution Yes
        Profiling Yes
      Prefer user sync for interop No
      Profiling timer resolution 1000ns
      Execution capabilities
        Run OpenCL kernels Yes
        Run native kernels No
      printf() buffer size 1048576 (1024KiB)
      Built-in kernels
      Device Available Yes
      Compiler Available Yes
      Linker Available Yes
      Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

      Device Name                                       Mali-T628
      Device Vendor                                     ARM
      Device Vendor ID                                  0x6200010
      Device Version                                    OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Driver Version                                    1.2
      Device OpenCL C Version                           OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Device Type                                       GPU
      Device Profile                                    FULL_PROFILE
      Max compute units                                 2
      Max clock frequency                               600MHz
      Device Partition                                  (core)
        Max number of sub-devices                       0
        Supported partition types                       None
      Max work item dimensions                          3
      Max work item sizes                               256x256x256
      Max work group size                               256
      Preferred work group size multiple                4
      Preferred / native vector sizes
        char                                            16 / 16
        short                                           8 / 8
        int                                             4 / 4
        long                                            2 / 2
        half                                            8 / 8   (cl_khr_fp16)
        float                                           4 / 4
        double                                          2 / 2   (cl_khr_fp64)
      Half-precision Floating-point support             (cl_khr_fp16)
        Denormals                                       Yes
        Infinity and NANs                               Yes
        Round to nearest                                Yes
        Round to zero                                   Yes
        Round to infinity                               Yes
        IEEE754-2008 fused multiply-add                 Yes
        Support is emulated in software                 No
        Correctly-rounded divide and sqrt operations    No
      Single-precision Floating-point support           (core)
        Denormals                                       Yes
        Infinity and NANs                               Yes
        Round to nearest                                Yes
        Round to zero                                   Yes
        Round to infinity                               Yes
        IEEE754-2008 fused multiply-add                 Yes
        Support is emulated in software                 No
        Correctly-rounded divide and sqrt operations    No
      Double-precision Floating-point support           (cl_khr_fp64)
        Denormals                                       Yes
        Infinity and NANs                               Yes
        Round to nearest                                Yes
        Round to zero                                   Yes
        Round to infinity                               Yes
        IEEE754-2008 fused multiply-add                 Yes
        Support is emulated in software                 No
        Correctly-rounded divide and sqrt operations    No
      Address bits                                      64, Little-Endian
      Global memory size                                2090405888 (1.947GiB)
      Error Correction support                          No
      Max memory allocation                             522601472 (498.4MiB)
      Unified memory for Host and Device                Yes
      Minimum alignment for any data type               128 bytes
      Alignment of base address                         1024 bits (128 bytes)
      Global Memory cache type                          Read/Write
      Global Memory cache size                          <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
      Global Memory cache line                          64 bytes
      Image support                                     Yes
        Max number of samplers per kernel               16
        Max size for 1D images from buffer              65536 pixels
        Max 1D or 2D image array size                   2048 images
        Max 2D image size                               65536x65536 pixels
        Max 3D image size                               65536x65536x65536 pixels
        Max number of read image args                   128
        Max number of write image args                  8
      Local memory type                                 Global
      Local memory size                                 32768 (32KiB)
      Max constant buffer size                          65536 (64KiB)
      Max number of constant args                       8
      Max size of kernel argument                       1024
      Queue properties
        Out-of-order execution                          Yes
        Profiling                                       Yes
      Prefer user sync for interop                      No
      Profiling timer resolution                        1000ns
      Execution capabilities
        Run OpenCL kernels                              Yes
        Run native kernels                              No
      printf() buffer size                              1048576 (1024KiB)
      Built-in kernels
      Device Available                                  Yes
      Compiler Available                                Yes
      Linker Available                                  Yes
      Device Extensions                                 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

    NULL platform behavior
      clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)        ARM Platform
      clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)         Success [ARM]
      clCreateContext(NULL, ...) [default]                  Success [ARM]
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)     No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)     Success (2)
        Platform Name                                       ARM Platform
        Device Name                                         Mali-T628
        Device Name                                         Mali-T628
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)     Success (2)
        Platform Name                                       ARM Platform
        Device Name                                         Mali-T628
        Device Name                                         Mali-T628

    ICD loader properties
      ICD loader Name                                   OpenCL ICD Loader
      ICD loader Vendor                                 OCL Icd free software
      ICD loader Version                                2.2.11
      ICD loader Profile                                OpenCL 2.1

    opened by nitescuc 15
  • Tile language vs EDSL

    In the last public release of PlaidML (0.7.0), operators were written in the Tile language. The latest documentation also describes a C++/Python EDSL under development; however, that documentation is somewhat sparse, and only the C++ interface is presented, with no Python version. Is the EDSL meant to replace the Tile language, or will both be kept in the future? I have a test project that uses the Tile language from Python, which has worked well for me, and I'd like to improve it, but I'm not sure whether I should port it to the EDSL or whether it's fine to stay with Tile.

    Also, does the v1 branch still support the Tile language, and will it continue to do so once it is released? If I compile it from source and use that instead of the pip-installable v0.7, will my project break or keep working?

    Is there a timeline for the next public release? I have been following this promising project for a while, hoping that a new release will come sooner or later.

    opened by gyenesvi 9
  • Capture affine.store op

    Hello.

    I am trying to perform stenciling on the following code:

    affine.store %1, %arg2[%arg3, %arg4, %arg5, %arg6] : memref<?x?x?x?xi64>
    affine.yield %arg2 : memref<?x?x?x?xi64>
    

    by using the matchPattern function:

    matchPattern(yield, m_Op<AffineYieldOp>(m_Capture(&store, m_Op<AffineStoreOp>(m_Any(), m_Any()))));
    

    However, it seems that the m_Capture and m_Op functions used in existing examples, such as StencilGEMM, cannot capture an operation that has no return value, like affine.store here. Can I use the existing infrastructure to match this pattern and capture the affine.store op?

    opened by IsolatedMy 4
  • Stenciling of MAX/ADD for RN50

    This patch fixes the pass "--x86-stencil-tpp-unary" so that all the reduce patterns in RN50 are stenciled with the correct TPP parameters and unary flags.

    opened by ZhangMZh 0
  • Batch parallelization and allocs to alloca changes

    This WIP patch converts scoped allocs to allocas using PromoteBuffersToStackPass as well as the PXA localization pass. As of now, the first pass does not seem to scope any allocs other than weights, while the PXA localization pass segfaults at runtime when threads > 1. This patch also parallelizes layers along the batch dimension, except for layers whose outer loop's induction variable is not the batch dimension.

    wip 
    opened by KavithaTipturMadhu 0
  • TPSS: parallelization directives

    This patch adds support for specifying parallelization directives in a file named by the environment variable PLAIDML_PARALLELIZATION_CONFIG_FILE. It also adds a rule parser that matches convolution shapes against the equalities/inequalities in the config file and applies the rules that follow them.
    Note that the collapse directive also adds a parallelize directive by default and can only be applied to two loop levels that form a perfect loop nest (whether the loops can legally be reordered into the requested order is not verified).

    wip 
    opened by KavithaTipturMadhu 0
  • How to fix "cannot import name 'Iterable' from 'collections'" when running test code from main page

    Hey all,

    I've decided to try some ML projects, and since I have an AMD GPU (5700 XT), I decided to use PlaidML. The main website has test code for VGG-19, and when I try to run it I get the error shown in the attached screenshot. I tried simply changing collections to collections.abc, but it looks like Python 3.10 already does that? I'm pretty stuck; any help would be appreciated. Thanks!

    [Screenshot from 2022-06-11 22-03-38]

    opened by KSTRTK 3
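The error in the last issue above matches a known Python 3.10 change: the deprecated aliases that let container ABCs such as Iterable be imported directly from collections were removed, and those classes are now importable only from collections.abc. The screenshot is not reproduced here, so which module in the traceback still uses the old import is unknown. The sketch below only shows the portable import that such code would need; is_iterable is a hypothetical helper added for illustration, not part of PlaidML.

```python
import collections
import sys

# The container ABCs (Iterable, Mapping, ...) live in collections.abc.
# Python 3.10 removed the old aliases such as collections.Iterable, so
# "from collections import Iterable" raises ImportError there.
try:
    from collections.abc import Iterable
except ImportError:  # fallback only for interpreters older than 3.3
    from collections import Iterable


def is_iterable(obj):
    """Hypothetical helper: True if isinstance(obj, Iterable).

    The ABC check recognizes types that define __iter__ (or are registered
    with Iterable); it does not detect types that are iterable only through
    the legacy __getitem__ protocol.
    """
    return isinstance(obj, Iterable)


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("collections.Iterable still present:", hasattr(collections, "Iterable"))
    print(is_iterable([1, 2, 3]), is_iterable(42))
```

On Python 3.9 and earlier the old alias still resolves (with a DeprecationWarning), so code written against those versions may run fine until the interpreter is upgraded to 3.10 or later. If the failing import sits inside an installed package rather than in your own script, editing your script alone will not help, which may explain what the reporter saw.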
Releases (0.7.0)
Owner: PlaidML
PlaidML makes deep learning work everywhere.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

DeCLIP Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. Our paper is available in arxiv Updates ** Ou

Sense-GVT 470 Dec 30, 2022
How the Deep Q-learning method works and discuss the new ideas that makes the algorithm work

Deep Q-Learning Recommend papers The first step is to read and understand the method that you will implement. It was first introduced in a 2013 paper

null 1 Jan 25, 2022
Ivy is a templated deep learning framework which maximizes the portability of deep learning codebases.

Ivy is a templated deep learning framework which maximizes the portability of deep learning codebases. Ivy wraps the functional APIs of existing frameworks. Framework-agnostic functions, libraries and layers can then be written using Ivy, with simultaneous support for all frameworks. Ivy currently supports Jax, TensorFlow, PyTorch, MXNet and Numpy. Check out the docs for more info!

Ivy 8.2k Jan 2, 2023
This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

Sergi Caelles 828 Jan 5, 2023
A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Davis E. King 11.6k Jan 1, 2023
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions (ICML 2020)

Decentralized Reinforcement Learning This is the code complementing the paper Decentralized Reinforcement Learning: Global Decision-Making via Local Ec

null 40 Oct 30, 2022
TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

null 912 Jan 8, 2023
RepVGG: Making VGG-style ConvNets Great Again

RepVGG: Making VGG-style ConvNets Great Again (PyTorch) This is a super simple ConvNet architecture that achieves over 80% top-1 accuracy on ImageNet

null 2.8k Jan 4, 2023
naked is a Python tool which allows you to strip a model and only keep what matters for making predictions.

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions. The result is a pure Python function with no third-party dependencies that you can simply copy/paste wherever you wish.

Max Halford 24 Dec 20, 2022
《Train in Germany, Test in The USA: Making 3D Object Detectors Generalize》(CVPR 2020)

Train in Germany, Test in The USA: Making 3D Object Detectors Generalize This paper has been accepted by Conference on Computer Vision and Pattern Rec

Xiangyu Chen 101 Jan 2, 2023
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
A tool for making map images from OpenTTD save games

OpenTTD Surveyor A tool for making map images from OpenTTD save games. This is not part of the main OpenTTD codebase, nor is it ever intended to be pa

Aidan Randle-Conde 9 Feb 15, 2022
[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable

Unlearnable Examples Code for ICLR2021 Spotlight Paper "Unlearnable Examples: Making Personal Data Unexploitable " by Hanxun Huang, Xingjun Ma, Sarah

Hanxun Huang 98 Dec 7, 2022
RepVGG: Making VGG-style ConvNets Great Again

This repository contains the code to be submitted for the OpenMMLab Algorithm Ecological Challenge; the paper is RepVGG: Making VGG-style ConvNets Great Again

Ty Feng 62 May 21, 2022
Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.

Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re

Microsoft 197 Jan 6, 2023
Using this codebase as a tool for my own research. Making some modifications to the original repo for my own purposes.

For SwapNet Create a list.txt file containing all the images to process. This can be done with the GNU find command: find path/to/input/folder -name '

Andrew Jong 2 Nov 10, 2021
Making a music video with Wav2CLIP and VQGAN-CLIP

music2video Overview A repo for making a music video with Wav2CLIP and VQGAN-CLIP. The base code was derived from VQGAN-CLIP The CLIP embedding for au

Joel Jang | 장요엘 163 Dec 26, 2022
An updated version of virtual model making

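Model-Swap-Face v2 This project is built on stylegan2 pSp and, compared with the v1 version of Model-Swap-Face, offers some improvement in inference speed and image quality. Its main function is converting virtual models into the styles of different regions around the world; the converter currently provides three mainstream styles (Western European, East Asian, and North African models), which can help us make production materials at zero co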

seeprettyface.com 62 Dec 9, 2022