General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)

The Kompute Project

Last update: Jan 6, 2023

Related tags

Deep Learning machine-learning deep-learning cpp vulkan gpgpu gpu-computing vulkan-demos deep-learning-gpu vulkan-compute vulkan-tutorial vulkan-example vulkan-compute-tutorial vulkan-compute-framework vulkan-compute-example machine-learning-gpu

Overview

Kompute

The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends).

Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU acceleration use cases.

💬 Join the Discord & Community Calls 🔋 Documentation 💻 Blog Post ⌨ Examples 💾

Kompute is backed by the Linux Foundation as a hosted project by the LF AI & Data Foundation.

Principles & Features

Flexible Python module with C++ SDK for optimizations
Asynchronous & parallel processing support through GPU family queues
Mobile enabled with examples via Android NDK across several architectures
BYOV: Bring-your-own-Vulkan design to play nice with existing Vulkan applications
Explicit relationships for GPU and host memory ownership and memory management
Robust codebase with 90% unit test code coverage
Advanced use-cases on machine learning 🤖 , mobile development 📱 and game development 🎮 .
Active community with monthly calls, discord chat and more

Getting Started

Below you can find a GPU multiplication example using the C++ and Python Kompute interfaces.

You can join the Discord for questions and discussion, open a GitHub issue, or read the documentation.

Your First Kompute (C++)

The C++ interface provides low-level access to the native components of Kompute, enabling advanced optimizations as well as extension of components.

#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <kompute/Kompute.hpp>

void kompute(const std::string& shader) {

    // 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
    kp::Manager mgr;

    // 2. Create and initialise Kompute Tensors through manager

    // Default tensor constructor simplifies creation of float values
    auto tensorInA = mgr.tensor({ 2., 2., 2. });
    auto tensorInB = mgr.tensor({ 1., 2., 3. });
    // Explicit type constructor supports uint32, int32, double, float and bool
    auto tensorOutA = mgr.tensorT<uint32_t>({ 0, 0, 0 });
    auto tensorOutB = mgr.tensorT<uint32_t>({ 0, 0, 0 });

    std::vector<std::shared_ptr<kp::Tensor>> params = {tensorInA, tensorInB, tensorOutA, tensorOutB};

    // 3. Create algorithm based on shader (supports buffers & push/spec constants)
    kp::Workgroup workgroup({3, 1, 1});
    std::vector<float> specConsts({ 2 });
    std::vector<float> pushConstsA({ 2.0 });
    std::vector<float> pushConstsB({ 3.0 });

    auto algorithm = mgr.algorithm(params,
                                   // See documentation shader section for compileSource
                                   compileSource(shader),
                                   workgroup,
                                   specConsts,
                                   pushConstsA);

    // 4. Run operation synchronously using sequence
    mgr.sequence()
        ->record<kp::OpTensorSyncDevice>(params)
        ->record<kp::OpAlgoDispatch>(algorithm) // Binds default push consts
        ->eval() // Evaluates the two recorded operations
        ->record<kp::OpAlgoDispatch>(algorithm, pushConstsB) // Overrides push consts
        ->eval(); // Evaluates only last recorded operation

    // 5. Sync results from the GPU asynchronously
    auto sq = mgr.sequence();
    sq->evalAsync<kp::OpTensorSyncLocal>(params);

    // ... Do other work asynchronously whilst GPU finishes

    sq->evalAwait();

    // Prints the first output which is: { 4, 8, 12 }
    for (const auto& elem : tensorOutA->vector()) std::cout << elem << " ";
    // Prints the second output which is: { 10, 10, 10 }
    for (const auto& elem : tensorOutB->vector()) std::cout << elem << " ";

} // Manages / releases all CPU and GPU memory resources

int main() {

    // Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
    // files). This shader shows some of the main components including constants, buffers, etc
    std::string shader = (R"(
        #version 450

        layout (local_size_x = 1) in;

        // The input tensors bind index is relative to index in parameter passed
        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };

        // Kompute supports push constants updated on dispatch
        layout(push_constant) uniform PushConstants {
            float val;
        } push_const;

        // Kompute also supports spec constants on initialization
        layout(constant_id = 0) const float const_one = 0;

        void main() {
            uint index = gl_GlobalInvocationID.x;
            out_a[index] += uint( in_a[index] * in_b[index] );
            out_b[index] += uint( const_one * push_const.val );
        }
    )");

    // Run the function declared above with our raw string shader
    kompute(shader);
}

Your First Kompute (Python)

The Python package provides a high-level interactive interface that enables experimentation whilst ensuring high performance and fast development workflows.

import kp
import numpy as np

from .utils import compile_source # using util function from python/test/utils

def kompute(shader):
    # 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
    mgr = kp.Manager()

    # 2. Create and initialise Kompute Tensors through manager

    # Default tensor constructor simplifies creation of float values
    tensor_in_a = mgr.tensor([2, 2, 2])
    tensor_in_b = mgr.tensor([1, 2, 3])
    # Explicit type constructor supports uint32, int32, double, float and bool
    tensor_out_a = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
    tensor_out_b = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))

    params = [tensor_in_a, tensor_in_b, tensor_out_a, tensor_out_b]

    # 3. Create algorithm based on shader (supports buffers & push/spec constants)
    workgroup = (3, 1, 1)
    spec_consts = [2]
    push_consts_a = [2]
    push_consts_b = [3]

    # See documentation shader section for compile_source
    spirv = compile_source(shader)

    algo = mgr.algorithm(params, spirv, workgroup, spec_consts, push_consts_a)

    # 4. Run operation synchronously using sequence
    (mgr.sequence()
        .record(kp.OpTensorSyncDevice(params))
        .record(kp.OpAlgoDispatch(algo)) # Binds default push consts provided
        .eval() # evaluates the two recorded ops
        .record(kp.OpAlgoDispatch(algo, push_consts_b)) # Overrides push consts
        .eval()) # evaluates only the last recorded op

    # 5. Sync results from the GPU asynchronously
    sq = mgr.sequence()
    sq.eval_async(kp.OpTensorSyncLocal(params))

    # ... Do other work asynchronously whilst GPU finishes

    sq.eval_await()

    # Prints the first output which is: { 4, 8, 12 }
    print(tensor_out_a)
    # Prints the second output which is: { 10, 10, 10 }
    print(tensor_out_b)

if __name__ == "__main__":

    # Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
    # files). This shader shows some of the main components including constants, buffers, etc
    shader = """
        #version 450

        layout (local_size_x = 1) in;

        // The input tensors bind index is relative to index in parameter passed
        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };

        // Kompute supports push constants updated on dispatch
        layout(push_constant) uniform PushConstants {
            float val;
        } push_const;

        // Kompute also supports spec constants on initialization
        layout(constant_id = 0) const float const_one = 0;

        void main() {
            uint index = gl_GlobalInvocationID.x;
            out_a[index] += uint( in_a[index] * in_b[index] );
            out_b[index] += uint( const_one * push_const.val );
        }
    """

    kompute(shader)
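
To make the expected outputs above easier to verify, the following pure-Python sketch replays the shader arithmetic on the CPU. It does not use Kompute or a GPU; the function name `run_shader_on_cpu` and its parameters are hypothetical, chosen only to mirror the shader's buffers and constants.

```python
def run_shader_on_cpu(in_a, in_b, out_a, out_b, spec_const, push_const):
    """Apply one dispatch of the example shader to plain Python lists.

    Each loop iteration plays the role of one GPU invocation
    (gl_GlobalInvocationID.x == index).
    """
    for index in range(len(in_a)):
        out_a[index] += int(in_a[index] * in_b[index])
        out_b[index] += int(spec_const * push_const)


in_a = [2.0, 2.0, 2.0]
in_b = [1.0, 2.0, 3.0]
out_a = [0, 0, 0]
out_b = [0, 0, 0]

# The first dispatch uses the default push constant (2.0), the second
# overrides it with 3.0; the spec constant stays at 2.0 for both.
run_shader_on_cpu(in_a, in_b, out_a, out_b, spec_const=2.0, push_const=2.0)
run_shader_on_cpu(in_a, in_b, out_a, out_b, spec_const=2.0, push_const=3.0)

print(out_a)  # [4, 8, 12]
print(out_b)  # [10, 10, 10]
```

Because the shader accumulates with `+=` and is dispatched twice, the first output is twice the element-wise product, and the second is 2 * 2 + 2 * 3 = 10 for every element.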

Interactive Notebooks & Hands on Videos

You are able to try out the interactive Colab Notebooks which allow you to use a free GPU. The available examples are the Python and C++ examples below:

Try the interactive C++ Colab from Blog Post	Try the interactive Python Colab from Blog Post

You can also check out the two following talks presented at the FOSDEM 2021 conference.

Both videos have timestamps which will allow you to skip to the most relevant section for you - the intro & motivations for both is almost the same so you can skip to the more specific content.

Watch the video for C++ Enthusiasts	Watch the video for Python & Machine Learning Enthusiasts

Architectural Overview

The core architecture of Kompute includes the following:

Kompute Manager - Base orchestrator which creates and manages device and child components
Kompute Sequence - Container of operations that can be sent to GPU as batch
Kompute Operation (Base) - Base class from which all operations inherit
Kompute Tensor - Tensor structured data used in GPU operations
Kompute Algorithm - Abstraction for (shader) logic executed in the GPU

To see a full breakdown you can read further in the C++ Class Reference.
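
As a rough mental model of how a sequence batches operations, here is a toy sketch in plain Python. The class `ToySequence` is hypothetical and is not part of Kompute; it only models the record/eval behaviour suggested by the comments in the getting-started examples, where each `eval()` runs the operations recorded since the previous `eval()`.

```python
class ToySequence:
    """Hypothetical stand-in for a sequence: a batch of recorded operations."""

    def __init__(self):
        self._ops = []

    def record(self, op):
        # Queue the operation; nothing runs yet.
        self._ops.append(op)
        return self  # allow chaining, as in the examples above

    def eval(self):
        # Run every recorded operation as one batch, then clear the batch
        # so a later eval() only runs operations recorded after this point.
        for op in self._ops:
            op()
        self._ops.clear()
        return self


log = []
(ToySequence()
    .record(lambda: log.append("sync tensors to device"))
    .record(lambda: log.append("dispatch algorithm"))
    .eval()   # runs the two operations recorded so far
    .record(lambda: log.append("dispatch algorithm again"))
    .eval())  # runs only the newly recorded operation

print(log)
```

In the real framework the operations are recorded into a GPU command buffer rather than a Python list, so this sketch captures the ordering only, not the execution model.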

Full Architecture	Simplified Kompute Components
(very tiny, check the full reference diagram in docs for details)

Asynchronous and Parallel Operations

Kompute provides the flexibility to run operations asynchronously through vk::Fences. Furthermore, Kompute enables explicit allocation of queues, which allows for parallel execution of operations across queue families.

The image below provides an intuition on how Kompute Sequences can be allocated to different queues to enable parallel execution based on hardware. You can see the hands on example, as well as the detailed documentation page describing how it would work using an NVIDIA 1650 as an example.
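
The submit-then-await pattern can be mimicked on the CPU with standard-library futures. The sketch below is only an analogy for `eval_async` / `eval_await` on two sequences; it uses no Kompute calls, and `fake_gpu_batch` is a hypothetical placeholder for a batch of GPU work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_batch(name, seconds):
    # Stand-in for a batch of GPU work submitted to one queue.
    time.sleep(seconds)
    return f"{name} done"


# Two workers play the role of two independent queues.
with ThreadPoolExecutor(max_workers=2) as queues:
    # Analogous to eval_async: submit and return immediately.
    first = queues.submit(fake_gpu_batch, "sequence 1", 0.2)
    second = queues.submit(fake_gpu_batch, "sequence 2", 0.2)

    # ... the host is free to do other work here ...

    # Analogous to eval_await: block until each batch has finished.
    results = [first.result(), second.result()]

print(results)
```

With two workers the two batches overlap in time, which is the effect that allocating sequences to different queues aims for when the hardware supports it.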

Mobile Enabled

Kompute has been optimized to work in mobile environments. The build system enables dynamic loading of the Vulkan shared library for Android environments, together with a working Android NDK wrapper for the C++ headers.

For a full deep dive you can read the blog post "Supercharging your Mobile Apps with On-Device GPU Accelerated Machine Learning".

You can also access the end-to-end example code in the repository, which can be run using Android Studio.

More examples

Simple examples

End-to-end examples

Python Package

Besides the C++ core SDK you can also use the Python package of Kompute, which exposes the same core functionality, and supports interoperability with Python objects like Lists, Numpy Arrays, etc.

The only dependencies are Python 3.5+ and CMake 3.4.1+. You can install Kompute from PyPI using the following command.

pip install kp

You can also install from master branch using:

pip install git+https://github.com/KomputeProject/kompute.git@master

For further details you can read the Python Package documentation or the Python Class Reference documentation.

C++ Build Overview

The build system provided uses cmake, which allows for cross platform builds.

The top level Makefile provides a set of optimized configurations for development as well as the docker image build, but you can start a build with the following command:

   cmake -Bbuild

You can also add Kompute to your repo with add_subdirectory; the Android example CMakeLists.txt file shows how this would be done.

For a more advanced overview of the build configuration check out the Build System Deep Dive documentation.

Kompute Development

We appreciate PRs and Issues. If you want to contribute try checking the "Good first issue" tag, but even using Kompute and reporting issues is a great contribution!

Contributing

Dev Dependencies

Testing
- GTest
Documentation
- Doxygen (with Dot)
- Sphinx

Development

Follows Mozilla C++ Style Guide https://www-archive.mozilla.org/hacking/mozilla-style-guide.html
- Uses post-commit hook to run the linter, you can set it up so it runs the linter before commit
- All dependencies are defined in vcpkg.json
Uses cmake as build system, and provides a top level makefile with recommended command
Uses xxd (or xxd.exe windows 64bit port) to convert shader spirv to header files
Uses doxygen and sphinx for documentation and autodocs
Uses vcpkg for finding the dependencies, it's the recommended set up to retrieve the libraries

If you want to run with debug layers you can add them with the KOMPUTE_ENV_DEBUG_LAYERS parameter as:

export KOMPUTE_ENV_DEBUG_LAYERS="VK_LAYER_LUNARG_api_dump"

Updating documentation

To update the documentation you will need to:

Run the gendoxygen target in the build system
Run the gensphynx target in the build-system
Push to github pages with make push_docs_to_ghpages

Running tests

Running the unit tests has been significantly simplified for contributors.

The tests run on the CPU and can be triggered using the ACT command line interface (https://github.com/nektos/act). Once you have installed the command line tool (and started the Docker daemon), you just have to type:

$ act

[Python Tests/python-tests] 🚀  Start image=axsauze/kompute-builder:0.2
[C++ Tests/cpp-tests      ] 🚀  Start image=axsauze/kompute-builder:0.2
[C++ Tests/cpp-tests      ]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=["/usr/bin/tail" "-f" "/dev/null"] cmd=[]
[Python Tests/python-tests]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=["/usr/bin/tail" "-f" "/dev/null"] cmd=[]
...

The repository contains unit tests for the C++ and Python code, which can be found under the test/ and python/test folders.

The tests are currently run through the CI using Github Actions. It uses the images found in docker-builders/.

In order to minimise hardware requirements, the tests can run without a GPU, directly on the CPU using Swiftshader.

For more information on how the CI and tests are set up, you can go to the CI, Docker and Tests Section in the documentation.

Motivations

This project started after seeing that a lot of new and renowned ML & DL projects like Pytorch, Tensorflow, Alibaba DNN, Tencent NCNN - among others - have either integrated or are looking to integrate the Vulkan SDK to add mobile (and cross-vendor) GPU support.

The Vulkan SDK offers a great low-level interface that enables highly specialized optimizations. However, it comes at the cost of highly verbose code, which requires 500-2000 lines of code to even begin writing application code. This has resulted in each of these projects having to implement the same baseline to abstract the non-compute related features of the Vulkan SDK. This large amount of non-standardised boilerplate can result in limited knowledge transfer, a higher chance of unique framework implementation bugs being introduced, etc.

We are currently developing Kompute not to hide the Vulkan SDK interface (as it's incredibly well designed) but to augment it with a direct focus on the Vulkan SDK's GPU computing capabilities. This article provides a high level overview of the motivations of Kompute, together with a set of hands on examples that introduce both GPU computing as well as the core Kompute architecture.

Comments

push_constant not working in my case?

I tried both codes from the README and the test (TestPushConstant.cpp), but apparently the push_constant values all get a zero value for some reason? Here is the result for TestPushConstant.cpp:

[alipmpaint@archlinux test]$ ./test_kompute
Running main() from /home/alipmpaint/Documents/github/vulkan-kompute/external/googletest/googletest/src/gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestPushConstants
[ RUN      ] TestPushConstants.TestTwoConstants
[2021-02-28 23:28:54.779] [info] [Shader.cpp:68] Kompute Shader Information: 


WARNING: radv is not a conformant vulkan implementation, testing use only.
[2021-02-28 23:28:54.940] [info] [Manager.cpp:269] Using physical device index 1 found AMD RADV VERDE (ACO)
[2021-02-28 23:28:54.947] [info] [Algorithm.cpp:18] Kompute Algorithm initialising with tensor size: 1 and spirv size: 292
[2021-02-28 23:28:54.947] [info] [Algorithm.cpp:395] Kompute OpAlgoCreate setting dispatch size
[2021-02-28 23:28:54.947] [info] [Algorithm.cpp:409] Kompute OpAlgoCreate set dispatch size X: 1, Y: 1, Z: 1
[2021-02-28 23:28:54.947] [info] [Sequence.cpp:44] Kompute Sequence command now started recording
[2021-02-28 23:28:54.947] [info] [Sequence.cpp:58] Kompute Sequence command recording END
/home/alipmpaint/Documents/github/vulkan-kompute/test/TestPushConstant.cpp:46: Failure
Expected equality of these values:
  tensor->data()
    Which is: { 0, 0, 0 }
  kp::Constants({ 0.4, 0.4, 0.4 })
    Which is: { 0.4, 0.4, 0.4 }
[2021-02-28 23:28:54.947] [info] [Sequence.cpp:186] Freeing CommandBuffer
[2021-02-28 23:28:54.947] [info] [Sequence.cpp:202] Destroying CommandPool
[2021-02-28 23:28:54.947] [info] [Sequence.cpp:219] Kompute Sequence clearing operations buffer
[2021-02-28 23:28:54.947] [info] [Manager.cpp:102] Destroying device
[2021-02-28 23:28:54.949] [warning] [Sequence.cpp:180] Kompute Sequence destroy called with null Device pointer
[  FAILED  ] TestPushConstants.TestTwoConstants (269 ms)
[----------] 1 test from TestPushConstants (269 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (269 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestPushConstants.TestTwoConstants

 1 FAILED TEST

(specConsts work fine btw)

I don't know where the problem exactly is... Since the tests have passed in Github actions and I have cloned the same code, I guess it's local. I have two Vulkan drivers: one is AMDVLK and the second one is AMD RADV VERDE from mesa. What I just described happens with AMD RADV VERDE. In AMDVLK, I get a segfault (I still haven't investigated whether it's the push_constant or not since I'm running out of time) in both codes. So, I guess it has something to do with my Vulkan drivers?

bug c++

opened by unexploredtest 28

Update compileSource function in examples/docs to correct one

I am not a windows guy... but what the hell? Followed the example here: https://github.com/KomputeProject/kompute/tree/master/examples/array_multiplication

Frustrating... The example does not mention vulkan headers and the other deps should be optional but are not. wth?

cmake -Bbuild/ -DCMAKE_BUILD_TYPE=Debug -DKOMPUTE_OPT_INSTALL=0 -DKOMPUTE_OPT_REPO_SUBMODULE_BUILD=1 -DKOMPUTE_OPT_ENABLE_SPDLOG=1
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.17763.0 to target Windows 10.0.19043.
-- The C compiler identification is MSVC 19.29.30138.0
-- The CXX compiler identification is MSVC 19.29.30138.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Vulkan: C:/VulkanSDK/1.2.198.1/Lib/vulkan-1.lib
CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:73 (add_subdirectory):
  The source directory

    C:/Users/rootkid/workspace/kompute/external/Vulkan-Headers

  does not contain a CMakeLists.txt file.


CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:74 (get_target_property):
  get_target_property() called with non-existent target "Vulkan-Headers".


CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:85 (add_subdirectory):
  The source directory

    C:/Users/rootkid/workspace/kompute/external/fmt

  does not contain a CMakeLists.txt file.


CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:101 (add_subdirectory):
  The source directory

    C:/Users/rootkid/workspace/kompute/external/spdlog

  does not contain a CMakeLists.txt file.


-- Configuring incomplete, errors occurred!
See also "C:/Users/rootkid/workspace/kompute/examples/array_multiplication/build/CMakeFiles/CMakeOutput.log".

opened by kommander 23

Refactor build system
Why this PR?

I have a couple of problems with the current way Kompute handles dependencies:

git-Submodules are a rather outdated concept in my eyes and should be replaced since they are always a pain to use and are not flexible in any way.

When including Kompute in a project that also uses Spdlog, it breaks a lot of stuff, e.g. Log-Macros stay in code, but we do not link Kompute against Spdlog.

On my System (Fedora) Vulkan-Headers >= 1.3.0 are available, but my driver (mesa/intel) supports only >= 1.2.131. Well I need to link against a different version Vulkan-Headers without downgrading my System Vulkan-Headers since other applications depend on those. If I don't change the Vulkan-Header version, I always run into the following assertion: VULKAN_HPP_ASSERT( d.getVkHeaderVersion() == VK_HEADER_VERSION );

Proposed Solutions

Replace git-Submodules with CMake fetch_content

This solves 1. and 3.

I also replaced KOMPUTE_OPT_REPO_SUBMODULE_BUILD with KOMPUTE_OPT_USE_BUILD_IN_{SPDLOG, VULKAN_HEADER, ...} and added a deprecation warning for the old way.

I added KOMPUTE_OPT_BUILD_IN_VULKAN_HEADER_TAG which allows consumers to set their specific Vulkan-Header version.

I added a check_vulkan_version to CMake that checks whether your hardware supports, for example, Vulkan 1.3 if you link against Vulkan-Headers 1.3, and prints a warning if only the patch version of your hardware is lower than the Vulkan-Header version. There is also an option to disable this: KOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK

I also cleaned up all CMake options related to dependencies and fixed a few bugs here and there to solve 2.

What is left to do?

[x] Check if it runs on Windows - working on it

[x] Check if this works with other GPUs (Nvidia)

[x] Fix Python Bindings

[x] Check if it works on Android

[x] Add all removed CMake options into the deprecated list here: https://github.com/COM8/kompute/blob/master/cmake/deprecation_warnings.cmake

[x] Update docs

Things that need to be done once the PR has been merged

Update the repo and hash for the examples

https://github.com/COM8/kompute/blob/f6d30d5b0803c48e7f74c44adaf32e9bc613b98f/examples/array_multiplication/CMakeLists.txt#L28-L29

https://github.com/COM8/kompute/blob/f6d30d5b0803c48e7f74c44adaf32e9bc613b98f/examples/logistic_regression/CMakeLists.txt#L28-L29

Let me know what you think about this.
opened by COM8 20
Create example compiling and running in raspberry pi with Mesa Vulkan drivers

It seems there are some relatively recent advancements in the Vulkan Drivers support for RaspberryPis (https://www.raspberrypi.org/blog/vulkan-update-were-conformant/). This issue encompasses exploring putting together an end to end example similar to the android example that shows how to run Kompute on a Raspberry Pi using the Mesa driver which enables for Vulkan 1.0 compliant processing in the Raspberry Pi (https://gitlab.freedesktop.org/mesa/mesa)
documentation help wanted good first issue python c++

opened by axsaucedo 20
made changes for include paths for complete installation

When installed globally(sudo make install) glslang/StandAlone headers don't get installed(afaik) and only could get it to work after changing #include <StandAlone/ResourceLimits.h> to #include <glslang/Include/ResourceLimits.h>.
Also I had to change #include <SPIRV/GlslangToSpv.h> to #include <glslang/SPIRV/GlslangToSpv.h> but I figured that it would break compatibility if one wanted to run locally in the same directory and without global installation.

opened by unexploredtest 19
Deep Learning Convolutional Neural Network (CNN) example implementation

I didn't see them in the list of shaders, and searching "conv" and "convolution" in this repository didn't return much.

I have naive glsl shaders for convolutions (forwards and backwards), so I could convert those.

opened by SimLeek 17
CMake Error: Imported target "kompute::kompute" includes non-existent path "/usr/local/single_include"
When Vulkan Kompute is installed globally, it'll try to find /usr/local/single_include but it doesn't exist, the cause of problem is from:

target_include_directories(
    kompute PUBLIC
    $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
    $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/single_include>
    $<INSTALL_INTERFACE:include>
    $<INSTALL_INTERFACE:single_include>
)

In src/CMakeLists.txt.
opened by unexploredtest 13

Test SingleSequenceRecord is not thread safe and fails in AMD card

Hello. I am running the tests and got the following failure in the TEST(TestMultipleAlgoExecutions, SingleSequenceRecord):

error: Expected equality of these values:
  tensorA->vector()
    Which is: { 1, 1, 1 }
  std::vector<float>({ 3, 3, 3 })
    Which is: { 3, 3, 3 }

This is the sequence of commands of the test:

        mgr.sequence()
          ->record<kp::OpTensorSyncDevice>({ tensorA })
          ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
          ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
          ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
          ->record<kp::OpTensorSyncLocal>({ tensorA })
          ->eval();

If I add kp::OpTensorSyncDevice between the dispatches, it also fails. However, if I add eval() or kp::OpTensorSyncLocal between the dispatches, it passes.

opened by lmreia 13

Delete methods for sequences inside managers

This solves #36 partially; the only thing that remains is creating a method that gives the ability to delete a given anonymous sequence.
It's my first time making a PR for a C++ project, so it might not be good. I have provided comments for each commit explaining what each does.
enhancement

opened by unexploredtest 13

gcc12 build fails because std::shared_ptr requires explicit declaration of <memory>

Build fails on opensuse tumbleweed

Steps to reproduce
    ##Tested on commit 6b8b6e864a35a43ee71fe652fc95013aacf6904f
    $git clone https://github.com/KomputeProject/kompute.git
    $cmake -Bbuild
    $cd build
    $make
    
    $gcc (SUSE Linux) 12.2.1 20221020 [revision 0aaef83351473e8f4eb774f8f999bbe87a4866d7]
    Copyright (C) 2022 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
     $lsb_release -a
    LSB Version:    n/a
    Distributor ID: openSUSE
    Description:    openSUSE Tumbleweed
    Release:        20221124
    Codename:       n/a
    
   Error log
gitrepo/kompute/build> make
[ 16%] Built target fmt
[ 27%] Built target kp_shader
[ 38%] Built target kp_logger
[ 44%] Building CXX object src/CMakeFiles/kompute.dir/OpTensorCopy.cpp.o
    In file included from /home/doof/gitrepo/kompute/src/include/kompute/operations/OpTensorCopy.hpp:6,
                     from /home/doof/gitrepo/kompute/src/OpTensorCopy.cpp:3:
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:55:27: error: expected ‘)’ before ‘<’ token
       55 |     Tensor(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
          |           ~               ^
          |                           )
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:108:30: error: ‘std::shared_ptr’ has not been declared
      108 |                         std::shared_ptr<Tensor> copyFromTensor);
          |                              ^~~~~~~~~~
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:108:40: error: expected ‘,’ or ‘...’ before ‘<’ token
      108 |                         std::shared_ptr<Tensor> copyFromTensor);
          |                                        ^
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:256:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
      256 |     std::shared_ptr<vk::PhysicalDevice> mPhysicalDevice;
          |          ^~~~~~~~~~
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:6:1: note: ‘std::shared_ptr’ is defined in header ‘<memory>’; did you forget to ‘#include <memory>’?
        5 | #include "logger/Logger.hpp"
      +++ |+#include <memory>
        6 | #include <string>
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:257:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
      257 |     std::shared_ptr<vk::Device> mDevice;
          |          ^~~~~~~~~~
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:257:5: note: ‘std::shared_ptr’ is defined in header ‘<memory>’; did you forget to ‘#include <memory>’?
      257 |     std::shared_ptr<vk::Device> mDevice;
          |     ^~~
    /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:260:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
      260 |     std::shared_ptr<vk::Buffer> mPrimaryBuffer;

opened by hungrymonkey 12

java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found

Hello, I am following the following tutorial: https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617. I am running ubuntu 20 and whenever I run the emulator I get the following error.

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.ethicalml.kompute, PID: 14488
    java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found
        at java.lang.Runtime.loadLibrary0(Runtime.java:1087)
        at java.lang.Runtime.loadLibrary0(Runtime.java:1008)
        at java.lang.System.loadLibrary(System.java:1664)
        at com.ethicalml.kompute.KomputeJni.<clinit>(KomputeJni.kt:80)
        at java.lang.Class.newInstance(Native Method)
        at android.app.AppComponentFactory.instantiateActivity(AppComponentFactory.java:95)
        at androidx.core.app.CoreComponentFactory.instantiateActivity(CoreComponentFactory.java:41)
        at android.app.Instrumentation.newActivity(Instrumentation.java:1253)
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3353)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3601)
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:85)
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135)
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2066)
        at android.os.Handler.dispatchMessage(Handler.java:106)
        at android.os.Looper.loop(Looper.java:223)
        at android.app.ActivityThread.main(ActivityThread.java:7656)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)
I/Process: Sending signal. PID: 14488 SIG: 9

opened by PascalPolygon 11
Instance creation fails on macOS with recent Vulkan SDK
I am running on macOS with Vulkan SDK 1.3.231. Manager::createInstance() fails silently when calling

vk::createInstance(&computeInstanceCreateInfo, nullptr, this->mInstance.get());

Perhaps it should check the return code, which in this case happens to be VK_ERROR_INCOMPATIBLE_DRIVER. It seems that the recent SDK requires the VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME extension to be enabled on macOS:

applicationExtensions.push_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);
computeInstanceCreateInfo.flags |= vk::InstanceCreateFlagBits::eEnumeratePortabilityKHR;
...
vk::createInstance(&computeInstanceCreateInfo, nullptr, this->mInstance.get());
opened by Archie3d 0

"Kompute Tensor device is null" when rebuilding tensor

When creating a tensor from manager with

inline std::shared_ptr<kp::TensorT<float>> kp::Manager::tensorT<float>(const kp::Constants &data, kp::Tensor::TensorTypes tensorType = kp::Tensor::TensorTypes::eDevice)

and then calling rebuild on the created tensor, it throws "Kompute Tensor device is null".

I'm not very familiar with Kompute yet, but it seems that if the following branch in the rebuild member function gets executed

if (this->mPrimaryBuffer || this->mPrimaryMemory) {
        KP_LOG_DEBUG(
          "Kompute Tensor destroying existing resources before rebuild");
        this->destroy();
    }

then in allocateMemoryCreateGPUResources

if (!this->mDevice) {
        throw std::runtime_error("Kompute Tensor device is null");
    }

is reached every time, because mDevice is always nullptr after this->destroy(). This can be seen at Tensor.cpp:548

if (this->mDevice) {
    this->mDevice = nullptr;
}
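
The failure mode described above can be reproduced in a small stand-alone model. The sketch below is not Kompute code: `SketchTensor`, `FakeDevice`, `FakeBuffer` and `releaseBuffers()` are hypothetical names introduced only for illustration. It shows one possible way to avoid the problem, assuming that a rebuild only needs to release the buffers while a full destroy() also drops the device handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Kompute or Vulkan types.
struct FakeDevice {};
struct FakeBuffer
{
    std::vector<float> data;
};

class SketchTensor
{
  public:
    SketchTensor(std::shared_ptr<FakeDevice> device, const std::vector<float>& data)
      : mDevice(std::move(device))
    {
        this->allocate(data);
    }

    // Releases only the per-allocation resources and keeps the device handle.
    void releaseBuffers() { this->mPrimaryBuffer.reset(); }

    // Full teardown: after this call the tensor can no longer be rebuilt.
    void destroy()
    {
        this->releaseBuffers();
        this->mDevice = nullptr;
    }

    void rebuild(const std::vector<float>& data)
    {
        if (this->mPrimaryBuffer) {
            // Calling destroy() here would null mDevice and make allocate() throw.
            this->releaseBuffers();
            assert(this->mDevice && "releasing buffers must keep the device");
        }
        this->allocate(data);
    }

    std::size_t size() const
    {
        return this->mPrimaryBuffer ? this->mPrimaryBuffer->data.size() : 0;
    }
    bool hasDevice() const { return this->mDevice != nullptr; }

  private:
    void allocate(const std::vector<float>& data)
    {
        if (!this->mDevice) {
            throw std::runtime_error("Kompute Tensor device is null");
        }
        this->mPrimaryBuffer = std::make_shared<FakeBuffer>(FakeBuffer{ data });
    }

    std::shared_ptr<FakeDevice> mDevice;
    std::shared_ptr<FakeBuffer> mPrimaryBuffer;
};
```

In this model, rebuilding a live tensor succeeds, while rebuilding after an explicit destroy() still throws, which matches the message quoted in the report.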

opened by MiroPalmu 0

memory sanitizer reports errors for simple example

Hi

I am trying to integrate the given simple multiplication example into a project. I use the memory sanitizer by default, and it reports the following:

single_include/kompute/Kompute.hpp:1049:30: runtime error: member call on address 0x555559947500 which does not point to an object of type 'TensorT' 0x555559947500: note: object has invalid vptr

Please help on how to proceed.

Also, in debug mode I am getting a segmentation fault in manager.cpp at this line: this->mInstance->destroyDebugReportCallbackEXT(

Why is this happening?

opened by mkandulavm 2

Cannot use kp::Algorithm::setPushConstants(const std::vector<T>& pushConstants) in v0.8.1

kp::Algorithm::setPushConstants(const std::vector<T>& pushConstants) fails to compile.

Compiler context: Windows 10 x64, VS2017 x64

    template<typename T>
    void setPushConstants(const std::vector<T>& pushConstants)  // takes a const vector, so data() returns const T*
    {
        uint32_t memorySize = sizeof(decltype(pushConstants.back()));
        uint32_t size = pushConstants.size();

        this->setPushConstants(pushConstants.data(), size, memorySize); // calls the overload that takes a non-const void*
    }

// Only a non-const version is provided:

 /**
     * Sets the push constants to the new value provided to use in the next
     * bindPush() with the raw memory block location and memory size to be used.
     *
     * @param data The raw data point to copy the data from, without modifying
     * the pointer.
     * @param size The number of data elements provided in the data
     * @param memorySize The memory size of each of the data elements in bytes.
     */
    void setPushConstants(void* data, uint32_t size, uint32_t memorySize) // <- takes void* data, so a const T* argument does not compile
    {

        uint32_t totalSize = memorySize * size;
        uint32_t previousTotalSize =
          this->mPushConstantsDataTypeMemorySize * this->mPushConstantsSize;

        if (totalSize != previousTotalSize) {
            throw std::runtime_error(fmt::format(
              "Kompute Algorithm push "
              "constant total memory size provided is {} but expected {} bytes",
              totalSize,
              previousTotalSize));
        }
        if (this->mPushConstantsData) {
            free(this->mPushConstantsData);
        }

        this->mPushConstantsData = malloc(totalSize);
        memcpy(this->mPushConstantsData, data, totalSize);
        this->mPushConstantsDataTypeMemorySize = memorySize;
        this->mPushConstantsSize = size;
    }
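
One way the compile error could be avoided is to take the raw pointer as `const void*`, since the data is only copied from and never modified. The sketch below is a stand-alone model, not the Kompute class: `PushConstantStore` and its accessors are hypothetical names, and the size check against the previous push constants from the original function is omitted to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical stand-in for the push-constant storage; not the Kompute class.
class PushConstantStore
{
  public:
    PushConstantStore() = default;
    PushConstantStore(const PushConstantStore&) = delete;
    PushConstantStore& operator=(const PushConstantStore&) = delete;
    ~PushConstantStore() { std::free(this->mData); }

    template<typename T>
    void setPushConstants(const std::vector<T>& pushConstants)
    {
        // sizeof(T) avoids calling back() on a possibly empty vector.
        this->setPushConstants(pushConstants.data(),
                               static_cast<std::uint32_t>(pushConstants.size()),
                               static_cast<std::uint32_t>(sizeof(T)));
    }

    // const void* accepts the const T* returned by data() on a const vector.
    void setPushConstants(const void* data, std::uint32_t size, std::uint32_t memorySize)
    {
        const std::size_t totalSize = static_cast<std::size_t>(memorySize) * size;
        assert(data != nullptr || totalSize == 0);

        void* copy = std::malloc(totalSize ? totalSize : 1);
        if (!copy) {
            throw std::bad_alloc();
        }
        if (totalSize) {
            std::memcpy(copy, data, totalSize);
        }
        std::free(this->mData);
        this->mData = copy;
        this->mSize = size;
        this->mMemorySize = memorySize;
    }

    const void* data() const { return this->mData; }
    std::uint32_t size() const { return this->mSize; }
    std::uint32_t memorySize() const { return this->mMemorySize; }

  private:
    void* mData = nullptr;
    std::uint32_t mSize = 0;
    std::uint32_t mMemorySize = 0;
};
```

With this signature, both a `const std::vector<float>` and a mutable one can be passed to the template overload without a cast.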

opened by cracy3m 3

Validation Error: VUID-vkBeginCommandBuffer-commandBuffer-00050
Calling record() after calling eval() on the same kp::Sequence object causes a VUID-vkBeginCommandBuffer-commandBuffer-00050 Vulkan validation error.

The error occurs in the README.md demo:

...
// 4. Run operation synchronously using sequence
mgr.sequence()
    ->record<kp::OpTensorSyncDevice>(params)
    ->record<kp::OpAlgoDispatch>(algorithm) // Binds default push consts
    ->eval() // Evaluates the two recorded operations
    ->record<kp::OpAlgoDispatch>(algorithm, pushConstsB) // Overrides push consts, <--- VUID-vkBeginCommandBuffer-commandBuffer-00050
    ->eval(); // Evaluates only last recorded operation
...

Kompute creates the Vulkan command pool without VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT, but kp::Sequence::begin calls vkBeginCommandBuffer. If the command buffer is not in the initial state, the validation layer reports an error.

Will this error cause problems? Or should record() not be used after kp::Sequence::eval()?
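
The rule behind this validation message can be illustrated with a simplified state model. The sketch below is not Vulkan or Kompute code: `ModelCommandBuffer`, `CmdState` and `poolAllowsReset` are hypothetical names, and the pending and invalid states are left out. It assumes, as the report describes, that beginning a command buffer outside the initial state is only permitted when its pool was created with the reset flag.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified lifecycle model for illustration; pending/invalid states are omitted.
enum class CmdState
{
    Initial,
    Recording,
    Executable
};

class ModelCommandBuffer
{
  public:
    explicit ModelCommandBuffer(bool poolAllowsReset)
      : mPoolAllowsReset(poolAllowsReset)
    {
    }

    // Models the rule under discussion: begin() on a buffer that is not in the
    // initial state is an error unless the pool allows resetting buffers.
    void begin()
    {
        if (this->mState == CmdState::Recording) {
            throw std::logic_error("begin() called while already recording");
        }
        if (this->mState != CmdState::Initial && !this->mPoolAllowsReset) {
            throw std::logic_error("begin() outside the initial state needs a resettable pool");
        }
        this->mState = CmdState::Recording; // implicit reset when allowed
    }

    void end()
    {
        assert(this->mState == CmdState::Recording);
        this->mState = CmdState::Executable;
    }

    // After a completed submission the buffer is executable again, not initial.
    void submitAndWait() { assert(this->mState == CmdState::Executable); }

    CmdState state() const { return this->mState; }

  private:
    bool mPoolAllowsReset;
    CmdState mState = CmdState::Initial;
};
```

In this model a record, eval, record sequence only succeeds when the buffer is explicitly reset or the pool permits the implicit reset, which is the situation the validation layer flags.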
opened by cracy3m 5
Cannot create tensor without initialization data

Is there a way to create a Tensor without initialization data? Setting the data argument of the kp::Tensor() constructor to nullptr causes an exception (the function always calls memcpy(..., data, ...) internally)!

kp::Tensor(
    std::shared_ptr<vk::PhysicalDevice> physicalDevice,
    std::shared_ptr<vk::Device> device,
    void* data,
    uint32_t elementTotalCount,
    uint32_t elementMemorySize,
    const TensorDataTypes& dataType,
    const TensorTypes& tensorType = TensorTypes::eDevice)
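
A guard around the copy is one way such a constructor could accept a null data pointer. The sketch below is a stand-alone illustration, not Kompute code: `makeStagingBytes` is a hypothetical helper that only mirrors the `data`, `elementTotalCount` and `elementMemorySize` parameters from the signature above, and it assumes that zero-filled memory is an acceptable default when no initial data is given.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of Kompute: builds the host-side byte block
// for a tensor and copies the initial data only when a pointer is provided.
inline std::vector<unsigned char>
makeStagingBytes(const void* data,
                 std::uint32_t elementTotalCount,
                 std::uint32_t elementMemorySize)
{
    assert(elementMemorySize > 0);

    // The vector constructor value-initialises every byte to zero.
    std::vector<unsigned char> bytes(static_cast<std::size_t>(elementTotalCount) *
                                     elementMemorySize);

    // Skipping memcpy for a null pointer avoids undefined behaviour.
    if (data != nullptr && !bytes.empty()) {
        std::memcpy(bytes.data(), data, bytes.size());
    }
    return bytes;
}
```

Passing nullptr then yields a correctly sized, zero-filled block, while a valid pointer behaves as before.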

opened by cracy3m 9

Releases(v0.8.1)

v0.8.1(Apr 13, 2022)
v0.8.1

Full Changelog

Closed issues:

Discord link in README and docs is broken #276

Website examples typo's and 6500 XT unknown GPU #275

[Question] How to disable all log ? #274

full diagram 404 #271

Error when enabling KOMPUTE_ENABLE_SPDLOG #268

Add KOMPUTE_LOG_ACTIVE_LEVEL instead of current SPDLOG_ACTIVE_LEVEL #267

Update/Fix Android project #264

Update compileSource function in examples/docs to correct one #261

Technically can Kompute be modified to support data visualization? #260

Data-transfer for Integrated GPU #258

Python "getting started" example fails #252

Python example in README doesn't work #248

Running Android app #234

Merged pull requests:

Added active log level definitions for kompute #280 (axsaucedo)

Fix TestDestroy.TestDestroyTensorSingle #279 (ScheissSchiesser)

Updated discord link #277 (axsaucedo)

style(src/Algorithm): fix typo #273 (tpoisonooo)

Fix Android Example confirmed with blog post steps #266 (axsaucedo)

Adding Governance with TSC charter #263 (axsaucedo)

Updating array_mutiplication example to work correctly #262 (axsaucedo)

Updated formatting #257 (axsaucedo)

Fix first two python examples in the docs #256 (lopuhin)

Remove nonexisting "single_include" from INSTALL_INTERFACE #254 (ItsBasi)

Added community page #253 (axsaucedo)

Updated readme to reflect shader utils #249 (axsaucedo)

Avoid using pointers to temporary copies of desired extensions. #247 (ItsBasi)

Source code(tar.gz)
Source code(zip)
Kompute.hpp(103.63 KB)
v0.8.0(Sep 16, 2021)
v0.8.0

Full Changelog

Closed issues:

An unset KOMPUTE_ENV_DEBUG_LAYERS leads KP_LOG_DEBUG to pass envLayerNamesVal==nullptr #245

Extend utils shader helpers in test for windows #240

Python segfaults after import kp #230

Simple and extended python examples do not work (v 0.7.0) #228

Python macOS issue (ImportError: dlopen(...): no suitable image found. Did find: ...: mach-o, but wrong architecture) #223

Python macOS issue (Symbol not found: __PyThreadState_Current ... Expected in: flat namespace) #221

Finalise Migration of Kompute into Linux Foundation #216

CMake Error: Imported target "kompute::kompute" includes non-existent path "/usr/local/single_include" #212

Incompatibility introduced with #168 on Vulkan 1.1.x #209

external libraries #201

Starting slack group or discord for alternative / faster version of asking questions #198

Test SingleSequenceRecord is not thread safe and fails in AMD card #196

Update Kompute headers to reference the glslang headers for install vs build interfaces #193

Integrate with GLSLang find_package file when issue is resolved in the glslang repo #191

Release 0.7.0 #187

Get number of available devices #185

Deep Learning Convolutional Neural Network (CNN) example implementation #162

Create example compiling and running in raspberry pi with Mesa Vulkan drivers #131

Add support for VK_EXT_debug_utils labels #110

Merged pull requests:

Fix for null debug log causing exception in fmt lib #246 (axsaucedo)

0.8.0 Release #244 (axsaucedo)

Adding support for different types for spec and push consts #242 (axsaucedo)

Extend shader helper functions in tests to support windows #241 (axsaucedo)

Increase test cov across codebase #239 (axsaucedo)

Updated collab link for C++ notebook #237 (axsaucedo)

Updating repo licenses and links #236 (axsaucedo)

Removing GLSLang as core dependency #235 (axsaucedo)

Naive matrice multiplication example #233 (Corentin-pro)

Fixed typo in CMakeLists.txt (ANDOID => ANDROID) #232 (Corentin-pro)

Set kp_debug, kp_info, kp_warning and kp_error to py::none() when the program terminates. #231 (thinking-tower)

VGG7 Python example #227 (20kdc)

Add documentation for CMake flags #224 (thinking-tower)

Set PYTHON_INCLUDE_DIR and PYTHON_LIBRARY during installation #222 (thinking-tower)

Removing xxd.exe binary and add instructions to build #220 (axsaucedo)

[PYTHON] Ensure numpy array increments refcount of tensor to keep valid #219 (axsaucedo)

Added destroy for manager #218 (axsaucedo)

Revert "Fixed the issue that caused CMake to look for non-existent path after being installed" #217 (axsaucedo)

Fixed the issue that caused CMake to look for non-existent path after being installed #213 (unexploredtest)

omitted .data() because it is incompatible with vulkan 1.1.x #211 (unexploredtest)

vkEnumeratePhysicalDevices(*(this->mInstance) ... doesn't work on Linux i386 #208 (unexploredtest)

Raises an error when having no/exceeding vulkan device's limit #207 (unexploredtest)

Updated README and fixed a syntax error on C++'s example #206 (unexploredtest)

removed the extra comma after KOMPUTE_OPT_REPO_SUBMODULE_BUILD #205 (unexploredtest)

Extending list_devices test for multiple devices #204 (axsaucedo)

Fix #include <SPIRV/GlslangToSpv.h> #200 (unexploredtest)

Added memory barrier on test #199 (axsaucedo)

Add function to list physical devices #195 (axsaucedo)

v0.7.0 release #189 (axsaucedo)

Add instructions for running on Pi4 #180 (hpgmiskin)

Source code(tar.gz)
Source code(zip)
Kompute.hpp(102.97 KB)
v0.7.0(Mar 14, 2021)
Release v0.7.0

The 0.7.0 release introduces a very extensive list of features - a high level overview includes:

Support for push constants

Support for specialisation constants

Support for tensor data types bool, float, double, int32 and uint32

Ability to define Operations outside manager

Ability to create Algorithm outside manager

New OpMemoryBarrier to add custom barriers

New OpAlgoDispatch to dispatch algorithm with push constants

New interface for sequences

New memory relationships all managed by top level manager with weak references allowing for smart pointers to terminate objects

Code coverage metrics using gcov + lcov

Implemented enhancements:

Extend non-spdlog print functions to use std::format #158

Add code coverage reports with codecov #145

Explore removing std::vector mData; completely from Tensor in favour of always storing data in hostVisible buffer memory (TBC) #144

Update all examples to match breaking changes in 0.7.0 #141

Avoid copy when returning python numpy / array #139

Cover all Python & C++ tests in CI #121

Add C++ Test for Simple Work Groups Example #117

Expose push constants in OpAlgo #54

Expose ability to create barriers in OpTensor operations #45

Create delete function in manager to free / destroy sequence #36

Make specialisation data extensible #12

Support multiple types for Kompute Tensors #2

Added re-record sequence functionality and updated docs #171 (axsaucedo)

Extend non-spdlog print functions to use fmt::format / fmt::print #159 (axsaucedo)

Added support for custom SpecializedConstants and removed KomputeWorkgroup class #151 (axsaucedo)

Added destroy functions for tensors and sequences (named and object) #146 (axsaucedo)

Fixed bugs:

push_constant not working in my case? #168

DescriptorPool set is not being freed #155

Updated memory barriers to include staging buffers #182 (axsaucedo)

Adds push const ranges in pipelinelayout to fix #168 #174 (axsaucedo)

Added destructor for staging tensors #134 (axsaucedo)

Closed issues:

Update memory barriers to align with tensor staging/primary memory revamp #181

Move shader defaultResource inside kp::Shader class #175

Reach at least 90% code coverage on tests #170

Add functionality to re-record sequence as now it's possible to update the underlying algorithm #169

Use numpy arrays as default return value #166

Update all shared_ptr value passes to be by ref or const ref #161

Amend memory hierarchy for kp::Operations so they can be created separately #160

Customise theme of documentation #156

Remove KomputeWorkgroup class in favour of std::array<uint32_t, 3> #152

Passing raw GLSL string to Shader Module deprecated so remove this method from supported approach #150

Add python backwards compatibility for eval_tensor_create_def #147

Document breaking changes for 0.7.0 #140

Tensor memory management and memory hierarchy redesign #136

Staging tensor GPU memory is not freed as part of OpCreateTensor removal #133

eStorage Tensors are currently unusable as OpTensorCreate calls mapDataIntoHostMemory #132

0.6.0 Release #126

java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found #125

Initial exploration: Include explicit GLSL to SPIRV compilation #107

Add support for push constants #106

Merged pull requests:

Resolve moving all functions from tensor HPP to CPP #186 (axsaucedo)

Device Properties #184 (alexander-g)

Too many warnings #183 (alexander-g)

Add support for bool, double, int32, uint32 and float32 on Tensors via TensorT #177 (axsaucedo)

Support for Timestamping #176 (alexander-g)

Test for ShaderResources #165 (aliPMPAINT)

Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164 (axsaucedo)

made changes for include paths for complete installation #163 (aliPMPAINT)

Added dark mode on docs #157 (axsaucedo)

Glslang implementation for online shader compilation #154 (axsaucedo)

Adding test code coverage using gcov and lcov #149 (axsaucedo)

Added temporary backwards compatibility for eval_tensor_create_def function #148 (axsaucedo)

Amend memory ownership hierarchy to have Tensor owned by Manager instead of OpCreateTensor / OpBase #138 (axsaucedo)

Removed Staging Tensors in favour of having two buffer & memory in a Tensor to minimise data transfer #137 (axsaucedo)

Source code(tar.gz)
Source code(zip)
Kompute.hpp(97.97 KB)
v0.6.0(Jan 31, 2021)
v0.6.0

Full Changelog

Implemented enhancements:

Add simple test for Python log_level function #120

Add further numpy support #104

SWIG syntax error - change order of keywords. #94

Create mocks to isolate unit tests for components #8

Disallowing zero sized tensors #129 (alexander-g)

Added further tests to CI and provide Dockerimage with builds to swiftshader #119 (axsaucedo)

Workgroups for Python #116 (alexander-g)

Ubuntu CI #115 (alexander-g)

Faster set_data() #109 (alexander-g)

String parameter for eval_algo_str methods in Python #105 (alexander-g)

Added numpy() method #103 (alexander-g)

Fixed bugs:

[PYTHON] Support string parameter instead of list for eval_algo_data when passing raw shader as string #93

[PYTHON] Fix log_level on the python implementation (using pybind's logging functions) #92

Closed issues:

Add documentation for custom operations #128

Numpy Array Support and Work Group Configuration in Python Kompute #124

Remove references to spdlog in python module #122

Setup automated CI testing for PRs using GitHub actions #114

Python example type error (pyshader). #111

Update all references to operations to not use template #101

Getting a undefined reference error while creating a Kompute Manager #100

Merged pull requests:

122 remove spdlog references in python #123 (axsaucedo)

Native logging for Python #118 (alexander-g)

Fixes for the c++ Simple and Extended examples in readme #108 (aliPMPAINT)

Fix building shaders on native linux #102 (aliPMPAINT)

Source code(tar.gz)
Source code(zip)
Kompute.hpp(95.19 KB)
v0.5.1(Nov 14, 2020)
Fixed error with Kompute-go SWIG integration #94

Added python list operations for Tensor (setitem, len, etc)

Added new python blog post

Source code(tar.gz)
Source code(zip)
Kompute.hpp(94.63 KB)
v0.5.0(Nov 8, 2020)
Migrated all OpAlgoBase components to use dynamic layouts instead of templates #26, #57

Updated all examples to use spir-v bytes by default #86

Added compatibility for Vulkan Headers HPP v1.2.154+ #84

Added Python Pypi package for Kompute #87

Added python interface functions to process python spirv bytes directly

Added implementation of Logistic Regression in Python

Extended examples to showcase pyshader to use more pythonic GPU development

Enabled spdlog builds by default on python package

Added multi-platform python package installs via pypi https://pypi.org/project/kp/

Added log level config functionality in python

Added Python Bindings for Kompute library

Added Python tests showcasing core functionality using Manager and Sequences

Added documentation integrating python class references (to be automated)

Changed sequences to be returned as shared_ptr instead of weak_ptr

Added sequence memory management via init member function

Added explicit definition on VulkanDestroy functions for VulkanHPP 1.2.155-1.2.158 compat

Removed template parameters from OpAlgoBase functions (added Op*.cpp files)

Added python build to main cmake file

Added pybind11 submodule

Added Sequence tests to verify memory management via init member function

Update e2e examples

Add Python documentation and further examples

Source code(tar.gz)
Source code(zip)
Kompute.hpp(94.63 KB)
v0.4.1(Nov 1, 2020)
Support for Vulkan HPP version 1.2.155 and above (tested with every version until 1.2.158)

Updated linux docker image to test multiple versions of Vulkan SDK

CCLS support for submodule builds (besides vcpkg builds)

Submodules for core dependencies for flexibility when testing dependencies with particular versions

Build flags to configure submodule builds vs vcpkg (toolchain based) builds

Removed range prints which removes explicit dependency on fmt

Syntax correction of private/protected member variables

AUR package added by contributor https://aur.archlinux.org/packages/vulkan-kompute-git/ via #81

Source code(tar.gz)
Source code(zip)
Kompute.hpp(106.00 KB)
v0.4.0(Oct 18, 2020)
Added async/await capabilities for manager and sequence

Added fence resource as member object for Sequence

Ensure sequence begin() function clears previous operations

Ensure begin does not get called if sequence in running state

Ensure manager creates new sequence when default functions called

Added capabilities for multi queue support in manager with explicit allocation on sequences

Fixed compile warnings on Linux (ubuntu)

Added createManagedSequence on manager with ability for default sequence name to be created

Added tests for asynchronous and parallel processing

Added LogisticRegression shader as cpp header

Added documentation on advanced examples

Added documentation on shader-to-cpp-header scripts

Added CNAME for kompute.cc domain

Source code(tar.gz)
Source code(zip)
Kompute.hpp(105.76 KB)
v0.3.2(Oct 4, 2020)
Enabled dynamic loading of Vulkan library for Vulkan support

Added Android NDK Wrapper with Vulkan HPP support

Updated CMAKE min standard to C++14

Downgraded min cmake requirement to 3.4.1

Added new shader file as shaderlogisticregression as part of AggregateHeader

Added #pragma once guard to Kompute.hpp single header

Updated createComputePipeline to return Result instead of ResultValue for backwards compatibility

Added new compute flags for android including:

KOMPUTE_OPT_INSTALL

KOMPUTE_OPT_ANDROID_BUILD

KOMPUTE_OPT_DISABLE_VK_DEBUG_LAYERS

KOMPUTE_VK_API_VERSION

KOMPUTE_VK_API_MINOR_VERSION

KOMPUTE_VK_API_MAJOR_VERSION

Source code(tar.gz)
Source code(zip)
Kompute.hpp(99.30 KB)
v0.3.1(Sep 20, 2020)
#61 Fixed bug in OpAlgoRhsLhsOut

#60 Add example of how vulkan kompute can be used for ML in Godot Game Engine

#59 Changed c++ to 14 from 17 for support with older frameworks

Source code(tar.gz)
Source code(zip)
Kompute.hpp(66.82 KB)
v0.3.0(Sep 13, 2020)
#50 - Added documentation and testing for reusing recorded commands

#19 - Added Machine Learning example with Kompute

#40 - Provide further granularity for handling tensor data

#39 - Create OpTensorSyncLocal and OpTensorSyncDevice operations

#43 - Renamed OpCreateTensor to OpTensorCreate to align with tensor operations

#43 - Fixed OpTensorCreate not mapping data for host tensors

#47 - Add preSubmit function to OpBase to account for multiple eval commands

#58 - Add standalone examples that show using Kompute from scratch

#58 - Make Kompute installable locally

#56 - Remove OpAlgoBase copy tensor functionality to delegate to OpTensorSync

Source code(tar.gz)
Source code(zip)
Kompute.hpp(66.67 KB)
v0.2.0(Sep 5, 2020)
#18 - Improve access to underlying data for speed and ease of access

#17 - Enabled for compute shaders to be passed as files or strings

#13 - Enable OpCreateTensor to receive more than 1 tensor

#11 - Add default specialisation data to algorithm holding all tensor sizes

#9 - Added documentation automated with doxygen and sphinx

#15 - Memory profiling and ensured no memory leaks explicitly

#30 - Removed spdlog as required dependency and into optional

#37 - Migrated to GTest

Source code(tar.gz)
Source code(zip)
Kompute.hpp(60.65 KB)
v0.1.0(Aug 29, 2020)
Vulkan Kompute Release 0.1.0

The 0.1.0 release of Vulkan Kompute, the General Purpose Vulkan Compute Framework.

Features

Automatic detection and selection of physical GPU device

Single header import with dynamic library available for easy integration

Uses vcpkg with optional manifest to download / fetch dependencies

Provides high level kompute interface via Manager and Sequence

GPU processing data abstracted with Tensors

Compute shaders and pipelines are abstracted via algorithms

Base options for operations including create tensor and multiplication as example

Ability to initialise components with external vulkan resources

Conversion of glsl shaders into SPIR-V and native HPP files for static building

DEBUG option for verbose builds that cover execution path in detail with SPDLOG

Docker image to build the library in linux environments

CMAKE support with tested functionality in linux and windows

Documentation containing autodoc with Doxygen using Sphinx

Unit and integration testing using Catch2

Source code(tar.gz)
Source code(zip)
Kompute.hpp(40.65 KB)

Owner

The Kompute Project

Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

Spam your friends and famly and when you do your famly will disown you and you will have no friends.

HashNeRF-pytorch - Pure PyTorch Implementation of NVIDIA paper on Instant Training of Neural Graphics primitives

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework

Implementation of self-attention mechanisms for general purpose. Focused on computer vision modules. Ongoing repository.

a general-purpose Transformer based vision backbone

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

A task-agnostic vision-language architecture as a step towards General Purpose Vision

A general-purpose, flexible, and easy-to-use simulator alongside an OpenAI Gym trading environment for MetaTrader 5 trading platform (Approved by OpenAI Gym)

Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

A general-purpose programming language, focused on simplicity, safety and stability.

GrabGpu_py: a scripts for grab gpu when gpu is free

Cards Against Humanity AI

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped

Rendering Point Clouds with Compute Shaders