General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)

Overview


Kompute


Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU acceleration use cases.

💬 Join the Discord & Community Calls 🔋 Documentation 💻 Blog Post Examples 💾


Kompute is backed by the Linux Foundation as a hosted project by the LF AI & Data Foundation.

Principles & Features

Getting Started

Below you can find a GPU multiplication example using the C++ and Python Kompute interfaces.

You can join the Discord for questions and discussion, open a GitHub issue, or read the documentation.

Your First Kompute (C++)

The C++ interface provides low-level access to the native components of Kompute, enabling advanced optimizations as well as the extension of components.

#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "kompute/Kompute.hpp"

// compileSource(...) is a helper described in the documentation shader section;
// it must be defined before kompute() so the shader can be compiled to SPIR-V.

void kompute(const std::string& shader) {

    // 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
    kp::Manager mgr;

    // 2. Create and initialise Kompute Tensors through manager

    // Default tensor constructor simplifies creation of float values
    auto tensorInA = mgr.tensor({ 2., 2., 2. });
    auto tensorInB = mgr.tensor({ 1., 2., 3. });
    // Explicit type constructor supports uint32, int32, double, float and bool
    auto tensorOutA = mgr.tensorT<uint32_t>({ 0, 0, 0 });
    auto tensorOutB = mgr.tensorT<uint32_t>({ 0, 0, 0 });

    std::vector<std::shared_ptr<kp::Tensor>> params = { tensorInA, tensorInB, tensorOutA, tensorOutB };

    // 3. Create algorithm based on shader (supports buffers & push/spec constants)
    kp::Workgroup workgroup({ 3, 1, 1 });
    std::vector<float> specConsts({ 2 });
    std::vector<float> pushConstsA({ 2.0 });
    std::vector<float> pushConstsB({ 3.0 });

    auto algorithm = mgr.algorithm(params,
                                   // See documentation shader section for compileSource
                                   compileSource(shader),
                                   workgroup,
                                   specConsts,
                                   pushConstsA);

    // 4. Run operation synchronously using sequence
    mgr.sequence()
        ->record<kp::OpTensorSyncDevice>(params)
        ->record<kp::OpAlgoDispatch>(algorithm)              // Binds default push consts
        ->eval()                                             // Evaluates the two recorded operations
        ->record<kp::OpAlgoDispatch>(algorithm, pushConstsB) // Overrides push consts
        ->eval();                                            // Evaluates only the last recorded operation

    // 5. Sync results from the GPU asynchronously
    auto sq = mgr.sequence();
    sq->evalAsync<kp::OpTensorSyncLocal>(params);

    // ... Do other work asynchronously whilst GPU finishes

    sq->evalAwait();

    // Prints the first output which is: { 4, 8, 12 }
    for (const uint32_t& elem : tensorOutA->vector()) std::cout << elem << " ";
    // Prints the second output which is: { 10, 10, 10 }
    for (const uint32_t& elem : tensorOutB->vector()) std::cout << elem << " ";

} // Manages / releases all CPU and GPU memory resources

int main() {

    // Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
    // files). This shader shows some of the main components including constants, buffers, etc.
    std::string shader = (R"(
        #version 450

        layout (local_size_x = 1) in;

        // The input tensors bind index is relative to index in parameter passed
        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };

        // Kompute supports push constants updated on dispatch
        layout(push_constant) uniform PushConstants {
            float val;
        } push_const;

        // Kompute also supports spec constants on initialization
        layout(constant_id = 0) const float const_one = 0;

        void main() {
            uint index = gl_GlobalInvocationID.x;
            out_a[index] += uint( in_a[index] * in_b[index] );
            out_b[index] += uint( const_one * push_const.val );
        }
    )");

    // Run the function declared above with our raw string shader
    kompute(shader);
}

Your First Kompute (Python)

The Python package provides a high-level interactive interface that enables experimentation whilst ensuring high performance and fast development workflows.

import kp
import numpy as np

from .utils import compile_source # using util function from python/test/utils

def kompute(shader):
    # 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
    mgr = kp.Manager()

    # 2. Create and initialise Kompute Tensors through manager

    # Default tensor constructor simplifies creation of float values
    tensor_in_a = mgr.tensor([2, 2, 2])
    tensor_in_b = mgr.tensor([1, 2, 3])
    # Explicit type constructor supports uint32, int32, double, float and bool
    tensor_out_a = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
    tensor_out_b = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))

    params = [tensor_in_a, tensor_in_b, tensor_out_a, tensor_out_b]

    # 3. Create algorithm based on shader (supports buffers & push/spec constants)
    workgroup = (3, 1, 1)
    spec_consts = [2]
    push_consts_a = [2]
    push_consts_b = [3]

    # See documentation shader section for compile_source
    spirv = compile_source(shader)

    algo = mgr.algorithm(params, spirv, workgroup, spec_consts, push_consts_a)

    # 4. Run operation synchronously using sequence
    (mgr.sequence()
        .record(kp.OpTensorSyncDevice(params))
        .record(kp.OpAlgoDispatch(algo)) # Binds default push consts provided
        .eval() # evaluates the two recorded ops
        .record(kp.OpAlgoDispatch(algo, push_consts_b)) # Overrides push consts
        .eval()) # evaluates only the last recorded op

    # 5. Sync results from the GPU asynchronously
    sq = mgr.sequence()
    sq.eval_async(kp.OpTensorSyncLocal(params))

    # ... Do other work asynchronously whilst GPU finishes

    sq.eval_await()

    # Prints the first output which is: { 4, 8, 12 }
    print(tensor_out_a)
    # Prints the second output which is: { 10, 10, 10 }
    print(tensor_out_b)

if __name__ == "__main__":

    # Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
    # files). This shader shows some of the main components including constants, buffers, etc
    shader = """
        #version 450

        layout (local_size_x = 1) in;

        // The input tensors bind index is relative to index in parameter passed
        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };

        // Kompute supports push constants updated on dispatch
        layout(push_constant) uniform PushConstants {
            float val;
        } push_const;

        // Kompute also supports spec constants on initialization
        layout(constant_id = 0) const float const_one = 0;

        void main() {
            uint index = gl_GlobalInvocationID.x;
            out_a[index] += uint( in_a[index] * in_b[index] );
            out_b[index] += uint( const_one * push_const.val );
        }
    """

    kompute(shader)
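To check the outputs printed by both examples without a GPU, here is a minimal pure-Python CPU reference of the shader's main(). It assumes a dispatch over workgroup (3, 1, 1) with local_size_x = 1, and that the two dispatches run in order, as in the sequences above. The names `dispatch` and `run_example` are hypothetical and not part of the kp package.

```python
# Hypothetical CPU reference: simulates the example shader's main() on the host.
from typing import List, Tuple


def dispatch(in_a: List[float], in_b: List[float],
             out_a: List[int], out_b: List[int],
             const_one: float, push_val: float,
             workgroup: Tuple[int, int, int] = (3, 1, 1),
             local_size_x: int = 1) -> None:
    """Run one 1-D dispatch of the shader, mutating out_a and out_b in place."""
    for group_x in range(workgroup[0]):
        for local_x in range(local_size_x):
            # gl_GlobalInvocationID.x = gl_WorkGroupID.x * local_size_x + gl_LocalInvocationID.x
            index = group_x * local_size_x + local_x
            # GLSL uint(...) truncates these non-negative floats toward zero, like int()
            out_a[index] += int(in_a[index] * in_b[index])
            out_b[index] += int(const_one * push_val)


def run_example() -> Tuple[List[int], List[int]]:
    in_a, in_b = [2.0, 2.0, 2.0], [1.0, 2.0, 3.0]
    out_a, out_b = [0, 0, 0], [0, 0, 0]
    spec_const = 2.0  # const_one, fixed when the algorithm is created
    dispatch(in_a, in_b, out_a, out_b, spec_const, push_val=2.0)  # default push consts (A)
    dispatch(in_a, in_b, out_a, out_b, spec_const, push_val=3.0)  # overridden push consts (B)
    return out_a, out_b


if __name__ == "__main__":
    print(run_example())  # ([4, 8, 12], [10, 10, 10])
```

Because each invocation only reads and writes its own index, iterating sequentially gives the same result as the parallel GPU dispatch: out_a accumulates 2 × {2, 4, 6}, and out_b accumulates 2 × 2 + 2 × 3 = 10.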

Interactive Notebooks & Hands on Videos

You can try out the interactive Colab notebooks, which give you access to a free GPU. Both the C++ and Python examples are available:

Try the interactive C++ Colab from Blog Post
Try the interactive Python Colab from Blog Post

You can also check out the following two talks presented at the FOSDEM 2021 conference.

Both videos have timestamps that allow you to skip to the section most relevant to you. The intro and motivations are almost the same in both, so you can skip ahead to the more specific content.

Watch the video for C++ Enthusiasts
Watch the video for Python & Machine Learning Enthusiasts

Architectural Overview

The core architecture of Kompute includes the following:

To see a full breakdown you can read further in the C++ Class Reference.

Figure: Full architecture and simplified Kompute components (the image is very small; check the full reference diagram in the docs for details)

Asynchronous and Parallel Operations

Kompute provides the flexibility to run operations asynchronously through vk::Fences. Furthermore, Kompute enables explicit allocation of queues, which allows operations to execute in parallel across queue families.

The image below provides an intuition for how Kompute Sequences can be allocated to different queues to enable parallel execution based on the hardware. You can see the hands-on example, as well as the detailed documentation page describing how this works using an NVIDIA 1650 as an example.
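As a rough host-side analogy (this is not Kompute code), the submit-then-await pattern maps onto Python's concurrent.futures: submitting to an executor resembles eval_async, and Future.result() resembles eval_await, which waits on a fence. The two single-worker executors stand in for two queues; `fake_gpu_work` and `run_on_two_queues` are hypothetical names.

```python
# Host-side analogy only: each single-worker executor stands in for one GPU queue.
# This is NOT the Kompute API; it only illustrates the submit / overlap / await pattern.
import time
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_work(values):
    """Stand-in for a recorded sequence of operations running on a device queue."""
    time.sleep(0.05)  # pretend the device is busy
    return [v * 2 for v in values]


def run_on_two_queues():
    # Two independent "queues", analogous to sequences bound to different queue families.
    with ThreadPoolExecutor(max_workers=1) as queue_a, \
            ThreadPoolExecutor(max_workers=1) as queue_b:
        # Like eval_async: submission returns immediately with a handle.
        pending_a = queue_a.submit(fake_gpu_work, [1, 2, 3])
        pending_b = queue_b.submit(fake_gpu_work, [4, 5, 6])

        # ... other host work overlaps with both submissions ...
        host_result = sum(range(10))

        # Like eval_await: Future.result() blocks until the submitted work completes,
        # playing the role that waiting on a vk::Fence plays for GPU work.
        return host_result, pending_a.result(), pending_b.result()


if __name__ == "__main__":
    print(run_on_two_queues())
```

The design point carried over from Kompute is that submission and waiting are separate steps, so host work placed between them overlaps with the device work.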

Mobile Enabled

Kompute has been optimized to work in mobile environments. The build system enables dynamic loading of the Vulkan shared library for Android environments, together with a working Android NDK wrapper for the C++ headers.

For a full deep dive you can read the blog post "Supercharging your Mobile Apps with On-Device GPU Accelerated Machine Learning".

You can also access the end-to-end example code in the repository, which can be run using Android Studio.

More examples

Simple examples

End-to-end examples

Python Package

Besides the C++ core SDK, you can also use the Python package of Kompute, which exposes the same core functionality and supports interoperability with Python objects like lists, NumPy arrays, etc.

The only dependencies are Python 3.5+ and CMake 3.4.1+. You can install Kompute from the PyPI package using the following command:

pip install kp

You can also install from master branch using:

pip install git+https://github.com/KomputeProject/kompute.git@master

For further details you can read the Python Package documentation or the Python Class Reference documentation.

C++ Build Overview

The provided build system uses CMake, which allows for cross-platform builds.

The top-level Makefile provides a set of optimized configurations for development as well as the Docker image build, but you can start a build with the following command:

   cmake -Bbuild

You can also add Kompute to your repo with add_subdirectory; the Android example CMakeLists.txt file shows how this is done.

For a more advanced overview of the build configuration check out the Build System Deep Dive documentation.

Kompute Development

We appreciate PRs and Issues. If you want to contribute try checking the "Good first issue" tag, but even using Kompute and reporting issues is a great contribution!

Contributing

Dev Dependencies

  • Testing
    • GTest
  • Documentation
    • Doxygen (with Dot)
    • Sphinx

Development

  • Follows the Mozilla C++ Style Guide https://www-archive.mozilla.org/hacking/mozilla-style-guide.html
    • Uses a post-commit hook to run the linter; you can set it up so that the linter runs before each commit
    • All dependencies are defined in vcpkg.json
  • Uses CMake as the build system, and provides a top-level Makefile with the recommended commands
  • Uses xxd (or xxd.exe, a Windows 64-bit port) to convert SPIR-V shaders to header files
  • Uses Doxygen and Sphinx for documentation and autodocs
  • Uses vcpkg to find the dependencies; it's the recommended setup for retrieving the libraries
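The SPIR-V-to-header step mentioned above can be approximated in Python. This minimal sketch mimics the general layout of `xxd -i` output (a byte array plus a `_len` variable) and first checks the SPIR-V magic number 0x07230203. `spirv_to_header` is a hypothetical helper, not Kompute tooling, and its output is not guaranteed to be byte-identical to xxd's.

```python
import re
import struct

SPIRV_MAGIC = 0x07230203  # first 32-bit word of every SPIR-V module


def spirv_to_header(spirv: bytes, name: str, bytes_per_line: int = 12) -> str:
    """Render a SPIR-V binary as a C array, loosely following the `xxd -i` layout."""
    if len(spirv) < 4 or len(spirv) % 4 != 0:
        raise ValueError("SPIR-V must be a non-empty sequence of 32-bit words")
    # Accept either byte order, since the magic number also encodes endianness.
    first_word_le = struct.unpack("<I", spirv[:4])[0]
    first_word_be = struct.unpack(">I", spirv[:4])[0]
    if SPIRV_MAGIC not in (first_word_le, first_word_be):
        raise ValueError("input does not start with the SPIR-V magic number")
    # Turn a file name like "shader.comp.spv" into a C identifier "shader_comp_spv".
    ident = re.sub(r"[^0-9A-Za-z_]", "_", name)
    rows = []
    for start in range(0, len(spirv), bytes_per_line):
        chunk = spirv[start:start + bytes_per_line]
        rows.append("  " + ", ".join(f"0x{byte:02x}" for byte in chunk))
    body = ",\n".join(rows)
    return (f"unsigned char {ident}[] = {{\n{body}\n}};\n"
            f"unsigned int {ident}_len = {len(spirv)};\n")


if __name__ == "__main__":
    # Magic number followed by the version word for SPIR-V 1.0 (0x00010000), little-endian.
    demo = bytes([0x03, 0x02, 0x23, 0x07, 0x00, 0x00, 0x01, 0x00])
    print(spirv_to_header(demo, "shader.comp.spv"))
```

In the real build, the header produced this way is included by C++ code so the shader ships inside the binary instead of being loaded from disk at runtime.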

If you want to run with debug layers you can add them with the KOMPUTE_ENV_DEBUG_LAYERS parameter as:

export KOMPUTE_ENV_DEBUG_LAYERS="VK_LAYER_LUNARG_api_dump"

Updating documentation

To update the documentation you will need to:

  • Run the gendoxygen target in the build system
  • Run the gensphynx target in the build system
  • Push to GitHub Pages with make push_docs_to_ghpages

Running tests

Running the unit tests has been significantly simplified for contributors.

The tests run on the CPU and can be triggered using the ACT command line interface (https://github.com/nektos/act). Once you install the command line tool (and start the Docker daemon), you just have to type:

$ act

[Python Tests/python-tests] 🚀  Start image=axsauze/kompute-builder:0.2
[C++ Tests/cpp-tests      ] 🚀  Start image=axsauze/kompute-builder:0.2
[C++ Tests/cpp-tests      ]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=["/usr/bin/tail" "-f" "/dev/null"] cmd=[]
[Python Tests/python-tests]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=["/usr/bin/tail" "-f" "/dev/null"] cmd=[]
...

The repository contains unit tests for the C++ and Python code, which can be found under the test/ and python/test folders.

The tests are currently run in CI using GitHub Actions, with the images found in docker-builders/.

To minimise hardware requirements, the tests can run without a GPU, directly on the CPU, using SwiftShader.

For more information on how the CI and tests are setup, you can go to the CI, Docker and Tests Section in the documentation.

Motivations

This project started after seeing that a lot of new and renowned ML & DL projects like PyTorch, TensorFlow, Alibaba DNN, and Tencent NCNN, among others, have either integrated or are looking to integrate the Vulkan SDK to add mobile (and cross-vendor) GPU support.

The Vulkan SDK offers a great low-level interface that enables highly specialized optimizations; however, it comes at the cost of highly verbose code, which requires 500-2000 lines of code before you can even begin writing application code. This has resulted in each of these projects having to implement the same baseline to abstract the non-compute-related features of the Vulkan SDK. This large amount of non-standardised boilerplate can result in limited knowledge transfer, a higher chance of introducing bugs unique to each framework's implementation, etc.

We are currently developing Kompute not to hide the Vulkan SDK interface (as it's incredibly well designed) but to augment it with a direct focus on the Vulkan SDK's GPU computing capabilities. This article provides a high-level overview of the motivations of Kompute, together with a set of hands-on examples that introduce both GPU computing and the core Kompute architecture.

Comments
  • push_constant not working in my case?

    I tried both the code from the README and the test (TestPushConstant.cpp), but apparently the push_constant values all end up as zero for some reason. Here is the result for TestPushConstant.cpp:

    [alipmpaint@archlinux test]$ ./test_kompute
    Running main() from /home/alipmpaint/Documents/github/vulkan-kompute/external/googletest/googletest/src/gtest_main.cc
    [==========] Running 1 test from 1 test suite.
    [----------] Global test environment set-up.
    [----------] 1 test from TestPushConstants
    [ RUN      ] TestPushConstants.TestTwoConstants
    [2021-02-28 23:28:54.779] [info] [Shader.cpp:68] Kompute Shader Information: 
    
    
    WARNING: radv is not a conformant vulkan implementation, testing use only.
    [2021-02-28 23:28:54.940] [info] [Manager.cpp:269] Using physical device index 1 found AMD RADV VERDE (ACO)
    [2021-02-28 23:28:54.947] [info] [Algorithm.cpp:18] Kompute Algorithm initialising with tensor size: 1 and spirv size: 292
    [2021-02-28 23:28:54.947] [info] [Algorithm.cpp:395] Kompute OpAlgoCreate setting dispatch size
    [2021-02-28 23:28:54.947] [info] [Algorithm.cpp:409] Kompute OpAlgoCreate set dispatch size X: 1, Y: 1, Z: 1
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:44] Kompute Sequence command now started recording
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:58] Kompute Sequence command recording END
    /home/alipmpaint/Documents/github/vulkan-kompute/test/TestPushConstant.cpp:46: Failure
    Expected equality of these values:
      tensor->data()
        Which is: { 0, 0, 0 }
      kp::Constants({ 0.4, 0.4, 0.4 })
        Which is: { 0.4, 0.4, 0.4 }
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:186] Freeing CommandBuffer
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:202] Destroying CommandPool
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:219] Kompute Sequence clearing operations buffer
    [2021-02-28 23:28:54.947] [info] [Manager.cpp:102] Destroying device
    [2021-02-28 23:28:54.949] [warning] [Sequence.cpp:180] Kompute Sequence destroy called with null Device pointer
    [  FAILED  ] TestPushConstants.TestTwoConstants (269 ms)
    [----------] 1 test from TestPushConstants (269 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test suite ran. (269 ms total)
    [  PASSED  ] 0 tests.
    [  FAILED  ] 1 test, listed below:
    [  FAILED  ] TestPushConstants.TestTwoConstants
    
     1 FAILED TEST
    

    (specConsts work fine btw)

    I don't know where exactly the problem is... Since the tests have passed in GitHub Actions and I have cloned the same code, I guess it's local. I have two Vulkan drivers: one is AMDVLK and the second is AMD RADV VERDE from Mesa. What I just described happens with AMD RADV VERDE. With AMDVLK, I get a segfault (I still haven't investigated whether it's the push_constant or not, since I'm running out of time) with both code samples. So I guess it has something to do with my Vulkan drivers?

    bug c++ 
    opened by unexploredtest 28
  • Update compileSource function in examples/docs to correct one

    I am not a Windows guy... but what the hell? I followed the example here: https://github.com/KomputeProject/kompute/tree/master/examples/array_multiplication

    Frustrating... The example does not mention the Vulkan headers, and the other deps should be optional but are not. wth?

    cmake -Bbuild/ -DCMAKE_BUILD_TYPE=Debug -DKOMPUTE_OPT_INSTALL=0 -DKOMPUTE_OPT_REPO_SUBMODULE_BUILD=1 -DKOMPUTE_OPT_ENABLE_SPDLOG=1
    -- Building for: Visual Studio 16 2019
    -- Selecting Windows SDK version 10.0.17763.0 to target Windows 10.0.19043.
    -- The C compiler identification is MSVC 19.29.30138.0
    -- The CXX compiler identification is MSVC 19.29.30138.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found Vulkan: C:/VulkanSDK/1.2.198.1/Lib/vulkan-1.lib
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:73 (add_subdirectory):
      The source directory
    
        C:/Users/rootkid/workspace/kompute/external/Vulkan-Headers
    
      does not contain a CMakeLists.txt file.
    
    
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:74 (get_target_property):
      get_target_property() called with non-existent target "Vulkan-Headers".
    
    
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:85 (add_subdirectory):
      The source directory
    
        C:/Users/rootkid/workspace/kompute/external/fmt
    
      does not contain a CMakeLists.txt file.
    
    
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:101 (add_subdirectory):
      The source directory
    
        C:/Users/rootkid/workspace/kompute/external/spdlog
    
      does not contain a CMakeLists.txt file.
    
    
    -- Configuring incomplete, errors occurred!
    See also "C:/Users/rootkid/workspace/kompute/examples/array_multiplication/build/CMakeFiles/CMakeOutput.log".
    
    opened by kommander 23
  • Refactor build system

    Why this PR?

    I have a couple of problems with the current way Kompute handles dependencies:

    1. Git submodules are a rather outdated concept in my eyes and should be replaced, since they are always a pain to use and are not flexible in any way.
    2. When including Kompute in a project that also uses spdlog, it breaks a lot of stuff, e.g. the log macros stay in the code, but we do not link Kompute against spdlog.
    3. On my system (Fedora), Vulkan-Headers >= 1.3.0 are available, but my driver (mesa/intel) supports only >= 1.2.131. So I need to link against a different version of Vulkan-Headers without downgrading my system Vulkan-Headers, since other applications depend on those. If I don't change the Vulkan-Headers version, I always run into the following assertion: VULKAN_HPP_ASSERT( d.getVkHeaderVersion() == VK_HEADER_VERSION );

    Proposed Solutions

    • Replace git-Submodules with CMake fetch_content
      • This solves 1. and 3.
      • I also replaced KOMPUTE_OPT_REPO_SUBMODULE_BUILD with KOMPUTE_OPT_USE_BUILD_IN_{SPDLOG, VULKAN_HEADER, ...} and added a deprecation warning for the old way.
      • I added KOMPUTE_OPT_BUILD_IN_VULKAN_HEADER_TAG which allows consumers to set their specific Vulkan-Header version.
      • I added a check_vulkan_version to CMake that checks whether your hardware supports, for example, Vulkan 1.3 if you link against Vulkan-Headers 1.3, and prints a warning if only the patch version supported by your hardware is lower than the Vulkan-Headers version. There is also an option to disable this: KOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK
    • I also cleaned up all CMake options related to dependencies and fixed a few bugs here and there to solve 2.
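    The version check described above can be sketched as follows, assuming the standard packed Vulkan version layout (major in bits 22-28, minor in bits 12-21, patch in bits 0-11, as decoded by VK_API_VERSION_MAJOR/MINOR/PATCH). The function names and the "ok"/"warning"/"error" classification are hypothetical; the PR's actual check lives in CMake and may differ.

```python
from typing import Tuple


def make_api_version(major: int, minor: int, patch: int, variant: int = 0) -> int:
    """Pack a version the way VK_MAKE_API_VERSION does."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def decode_api_version(packed: int) -> Tuple[int, int, int]:
    """Unpack (major, minor, patch) like VK_API_VERSION_MAJOR/MINOR/PATCH."""
    return (packed >> 22) & 0x7F, (packed >> 12) & 0x3FF, packed & 0xFFF


def check_vulkan_version(driver: int, headers: int) -> str:
    """Hypothetical classification: 'error' if the driver's major.minor is older than
    the headers', 'warning' if only its patch version is lower, otherwise 'ok'."""
    d_major, d_minor, d_patch = decode_api_version(driver)
    h_major, h_minor, h_patch = decode_api_version(headers)
    if (d_major, d_minor) < (h_major, h_minor):
        return "error"
    if (d_major, d_minor) == (h_major, h_minor) and d_patch < h_patch:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Scenario from this PR: a driver at 1.2.131 combined with 1.3 headers.
    print(check_vulkan_version(make_api_version(1, 2, 131), make_api_version(1, 3, 0)))  # prints "error"
```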

    What is left to do?

    • [x] Check if it runs on Windows - working on it
    • [x] Check if this works with other GPUs (Nvidia)
    • [x] Fix Python Bindings
    • [x] Check if it works on Android
    • [x] Add all removed CMake options into the deprecated list here: https://github.com/COM8/kompute/blob/master/cmake/deprecation_warnings.cmake
    • [x] Update docs

    Things that need to be done once the PR has been merged

    • Update the repo and hash for the examples
      • https://github.com/COM8/kompute/blob/f6d30d5b0803c48e7f74c44adaf32e9bc613b98f/examples/array_multiplication/CMakeLists.txt#L28-L29
      • https://github.com/COM8/kompute/blob/f6d30d5b0803c48e7f74c44adaf32e9bc613b98f/examples/logistic_regression/CMakeLists.txt#L28-L29

    Let me know what you think about this.

    opened by COM8 20
  • Create example compiling and running in raspberry pi with Mesa Vulkan drivers

    It seems there have been some relatively recent advancements in Vulkan driver support for Raspberry Pis (https://www.raspberrypi.org/blog/vulkan-update-were-conformant/). This issue encompasses putting together an end-to-end example, similar to the Android example, that shows how to run Kompute on a Raspberry Pi using the Mesa driver, which enables Vulkan 1.0 conformant processing on the Raspberry Pi (https://gitlab.freedesktop.org/mesa/mesa).

    documentation help wanted good first issue python c++ 
    opened by axsaucedo 20
  • made changes for include paths for complete installation

    When installed globally (sudo make install), the glslang/StandAlone headers don't get installed (afaik), and I could only get it to work after changing #include <StandAlone/ResourceLimits.h> to #include <glslang/Include/ResourceLimits.h>.
    I also had to change #include <SPIRV/GlslangToSpv.h> to #include <glslang/SPIRV/GlslangToSpv.h>, but I figured that it would break compatibility if one wanted to run locally in the same directory without a global installation.

    opened by unexploredtest 19
  • Deep Learning Convolutional Neural Network (CNN) example implementation

    I didn't see them in the list of shaders, and searching "conv" and "convolution" in this repository didn't return much.

    I have naive glsl shaders for convolutions (forwards and backwards), so I could convert those.

    opened by SimLeek 17
  • CMake Error: Imported target "kompute::kompute" includes non-existent path "/usr/local/single_include"

    When Vulkan Kompute is installed globally, it tries to find /usr/local/single_include, but that directory doesn't exist. The cause of the problem is:

    target_include_directories(
        kompute PUBLIC
        $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
        $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/single_include>
        $<INSTALL_INTERFACE:include>
        $<INSTALL_INTERFACE:single_include>
    )
    

    In src/CMakeLists.txt.

    opened by unexploredtest 13
  • Test SingleSequenceRecord is not thread safe and fails in AMD card

    Hello. I am running the tests and got the following failure in the TEST(TestMultipleAlgoExecutions, SingleSequenceRecord):

    error: Expected equality of these values:
      tensorA->vector()
        Which is: { 1, 1, 1 }
      std::vector<float>({ 3, 3, 3 })
        Which is: { 3, 3, 3 }
    

    This is the sequence of commands of the test:

            mgr.sequence()
              ->record<kp::OpTensorSyncDevice>({ tensorA })
              ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
              ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
              ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
              ->record<kp::OpTensorSyncLocal>({ tensorA })
              ->eval();
    

    If I add kp::OpTensorSyncDevice between the dispatches, it also fails. However, if I add eval() or kp::OpTensorSyncLocal between the dispatches, it passes.

    opened by lmreia 13
  • Delete methods for sequences inside managers

    This partially solves #36; the only thing that remains is creating a method that gives the ability to delete a given anonymous sequence.
    It's my first time making a PR for a C++ project, so it might not be good. I have provided comments for each commit explaining what each does.

    enhancement 
    opened by unexploredtest 13
  • gcc12 build fails because std::shared_ptr requires explicit declaration of <memory>

    Build fails on openSUSE Tumbleweed

    Steps to reproduce
        ##Tested on commit 6b8b6e864a35a43ee71fe652fc95013aacf6904f
        $git clone https://github.com/KomputeProject/kompute.git
        $cmake -Bbuild
        $cd build
        $make
        
        $gcc (SUSE Linux) 12.2.1 20221020 [revision 0aaef83351473e8f4eb774f8f999bbe87a4866d7]
        Copyright (C) 2022 Free Software Foundation, Inc.
        This is free software; see the source for copying conditions.  There is NO
        warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
        
         $lsb_release -a
        LSB Version:    n/a
        Distributor ID: openSUSE
        Description:    openSUSE Tumbleweed
        Release:        20221124
        Codename:       n/a
        
       Error log
    gitrepo/kompute/build> make
    [ 16%] Built target fmt
    [ 27%] Built target kp_shader
    [ 38%] Built target kp_logger
    [ 44%] Building CXX object src/CMakeFiles/kompute.dir/OpTensorCopy.cpp.o
        In file included from /home/doof/gitrepo/kompute/src/include/kompute/operations/OpTensorCopy.hpp:6,
                         from /home/doof/gitrepo/kompute/src/OpTensorCopy.cpp:3:
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:55:27: error: expected ‘)’ before ‘<’ token
           55 |     Tensor(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
              |           ~               ^
              |                           )
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:108:30: error: ‘std::shared_ptr’ has not been declared
          108 |                         std::shared_ptr<Tensor> copyFromTensor);
              |                              ^~~~~~~~~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:108:40: error: expected ‘,’ or ‘...’ before ‘<’ token
          108 |                         std::shared_ptr<Tensor> copyFromTensor);
              |                                        ^
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:256:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
          256 |     std::shared_ptr<vk::PhysicalDevice> mPhysicalDevice;
              |          ^~~~~~~~~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:6:1: note: ‘std::shared_ptr’ is defined in header ‘<memory>’; did you forget to ‘#include <memory>’?
            5 | #include "logger/Logger.hpp"
          +++ |+#include <memory>
            6 | #include <string>
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:257:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
          257 |     std::shared_ptr<vk::Device> mDevice;
              |          ^~~~~~~~~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:257:5: note: ‘std::shared_ptr’ is defined in header ‘<memory>’; did you forget to ‘#include <memory>’?
          257 |     std::shared_ptr<vk::Device> mDevice;
              |     ^~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:260:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
          260 |     std::shared_ptr<vk::Buffer> mPrimaryBuffer;
    
    opened by hungrymonkey 12
  • java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found

    Hello, I am following this tutorial: https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617. I am running Ubuntu 20, and whenever I run the emulator I get the following error.

    E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.ethicalml.kompute, PID: 14488
    java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found
        at java.lang.Runtime.loadLibrary0(Runtime.java:1087)
        at java.lang.Runtime.loadLibrary0(Runtime.java:1008)
        at java.lang.System.loadLibrary(System.java:1664)
        at com.ethicalml.kompute.KomputeJni.<clinit>(KomputeJni.kt:80)
        at java.lang.Class.newInstance(Native Method)
        at android.app.AppComponentFactory.instantiateActivity(AppComponentFactory.java:95)
        at androidx.core.app.CoreComponentFactory.instantiateActivity(CoreComponentFactory.java:41)
        at android.app.Instrumentation.newActivity(Instrumentation.java:1253)
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3353)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3601)
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:85)
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135)
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2066)
        at android.os.Handler.dispatchMessage(Handler.java:106)
        at android.os.Looper.loop(Looper.java:223)
        at android.app.ActivityThread.main(ActivityThread.java:7656)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)
    I/Process: Sending signal. PID: 14488 SIG: 9

    opened by PascalPolygon 11
  • Instance creation fails on macOS with recent Vulkan SDK

    Running on macOS with Vulkan SDK 1.3.231, Manager::createInstance() fails silently when calling

    vk::createInstance(&computeInstanceCreateInfo, nullptr, this->mInstance.get());
    

    Perhaps it should check the returned result code, which in this case is VK_ERROR_INCOMPATIBLE_DRIVER. It seems that recent SDKs require the VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME extension to be enabled on macOS, together with the eEnumeratePortabilityKHR instance flag:

    applicationExtensions.push_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);
    computeInstanceCreateInfo.flags |= vk::InstanceCreateFlagBits::eEnumeratePortabilityKHR;
    ...
    vk::createInstance(&computeInstanceCreateInfo, nullptr, this->mInstance.get());
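    A minimal sketch of the conditional setup described above. It assumes the extension should be requested only when the loader actually advertises it, so platforms without portability drivers are unaffected. The string and flag values are copied from the Vulkan headers so the snippet builds without the SDK. InstanceConfig and makeInstanceConfig are hypothetical names, not part of Kompute:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Values mirrored from the Vulkan headers so this sketch builds without the SDK.
// In real code, use VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME and
// vk::InstanceCreateFlagBits::eEnumeratePortabilityKHR instead.
constexpr const char* kPortabilityEnumerationExt = "VK_KHR_portability_enumeration";
constexpr std::uint32_t kEnumeratePortabilityBit = 0x00000001;  // VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR

// Hypothetical container for the fields that would go into the instance create info.
struct InstanceConfig {
    std::vector<std::string> extensions;
    std::uint32_t flags = 0;
};

// Hypothetical helper: if the loader lists the portability extension among the
// available instance extensions, enable it (once) and set the matching flag.
InstanceConfig makeInstanceConfig(std::vector<std::string> requested,
                                  const std::vector<std::string>& available) {
    InstanceConfig cfg;
    cfg.extensions = std::move(requested);
    const bool offered = std::find(available.begin(), available.end(),
                                   kPortabilityEnumerationExt) != available.end();
    if (offered) {
        const bool alreadyRequested =
          std::find(cfg.extensions.begin(), cfg.extensions.end(),
                    kPortabilityEnumerationExt) != cfg.extensions.end();
        if (!alreadyRequested) {
            cfg.extensions.emplace_back(kPortabilityEnumerationExt);
        }
        cfg.flags |= kEnumeratePortabilityBit;
    }
    return cfg;
}
```

    In a real Manager::createInstance() the available list would come from vk::enumerateInstanceExtensionProperties(), and the result would feed ppEnabledExtensionNames and flags of the create info. Checking the vk::Result returned by vk::createInstance, rather than discarding it, would also turn the silent failure into a reportable VK_ERROR_INCOMPATIBLE_DRIVER.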
    
    opened by Archie3d 0
  • "Kompute Tensor device is null" when rebuilding tensor

    When creating a tensor from the manager with

    inline std::shared_ptr<kp::TensorT<float>> kp::Manager::tensorT<float>(const kp::Constants &data, kp::Tensor::TensorTypes tensorType = kp::Tensor::TensorTypes::eDevice)
    

    and then calling rebuild() on the created tensor, it throws "Kompute Tensor device is null".

    I'm not very familiar with Kompute yet, but it seems that if the following branch in the rebuild() member function is executed

    if (this->mPrimaryBuffer || this->mPrimaryMemory) {
            KP_LOG_DEBUG(
              "Kompute Tensor destroying existing resources before rebuild");
            this->destroy();
        }
    

    then in allocateMemoryCreateGPUResources

    if (!this->mDevice) {
            throw std::runtime_error("Kompute Tensor device is null");
        }
    

    is executed every time, because after this->destroy() mDevice is always nullptr. This can be seen at Tensor.cpp:548:

    if (this->mDevice) {
        this->mDevice = nullptr;
    }
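    One possible fix, sketched under the assumption that rebuild() only needs to recreate buffers and memory on the same device: separate releasing GPU resources from the full teardown that also clears the device handle. Device, Buffer and RebuildableTensor are hypothetical stand-ins, not Kompute's actual classes:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-ins for vk::Device and vk::Buffer.
struct Device {};
struct Buffer {};

class RebuildableTensor {
  public:
    explicit RebuildableTensor(std::shared_ptr<Device> device)
      : mDevice(std::move(device)) {}

    // Frees existing GPU resources but keeps mDevice, so re-allocation can proceed.
    void rebuild(std::size_t elementCount) {
        if (mPrimaryBuffer) {
            releaseResources();
        }
        allocate(elementCount);
    }

    // Full teardown, analogous to the destroy() quoted above: also drops mDevice.
    void destroy() {
        releaseResources();
        mDevice = nullptr;
    }

    bool hasBuffer() const { return static_cast<bool>(mPrimaryBuffer); }
    std::size_t size() const { return mSize; }

  private:
    void releaseResources() {
        mPrimaryBuffer.reset();
        mSize = 0;
    }

    void allocate(std::size_t elementCount) {
        if (!mDevice) {
            throw std::runtime_error("Kompute Tensor device is null");
        }
        mPrimaryBuffer = std::make_shared<Buffer>();
        mSize = elementCount;
    }

    std::shared_ptr<Device> mDevice;
    std::shared_ptr<Buffer> mPrimaryBuffer;
    std::size_t mSize = 0;
};
```

    A smaller change would be to copy mDevice and mPhysicalDevice into locals before this->destroy() and restore them afterwards. Either way, the null-device check still fires after a genuine destroy().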
    
    opened by MiroPalmu 0
  • memory sanitizer reports errors for simple example

    Hi

    I am trying to integrate the simple multiplication example into a project. I use the memory sanitizer by default, and it reports the following:

    single_include/kompute/Kompute.hpp:1049:30: runtime error: member call on address 0x555559947500 which does not point to an object of type 'TensorT' 0x555559947500: note: object has invalid vptr

    Please advise on how to proceed.

    Also, in debug mode I get a segmentation fault in manager.cpp at this line: this->mInstance->destroyDebugReportCallbackEXT(

    Why is this happening?

    opened by mkandulavm 2
  • Cannot use kp::Algorithm::setPushConstants(const std::vector<T>& pushConstants) in v0.8.1

    kp::Algorithm::setPushConstants(const std::vector<T>& pushConstants) fails to compile.

    Compiler environment: Windows 10 x64, VS2017 x64

        template<typename T>
        void setPushConstants(const std::vector<T>& pushConstants)  // takes a const vector, so `data()` returns const T*
        {
            uint32_t memorySize = sizeof(decltype(pushConstants.back()));
            uint32_t size = pushConstants.size();
    
            this->setPushConstants(pushConstants.data(), size, memorySize); // calls the overload that only accepts void*
        }
    
    // Only a non-const (void*) overload is provided:
    
     /**
         * Sets the push constants to the new value provided to use in the next
         * bindPush() with the raw memory block location and memory size to be used.
         *
         * @param data The raw data pointer to copy the data from, without modifying
         * the pointer.
         * @param size The number of data elements provided in the data
         * @param memorySize The memory size of each of the data elements in bytes.
         */
        void setPushConstants(void* data, uint32_t size, uint32_t memorySize) // <- takes void* data, so a const T* cannot be passed and the call fails to compile
        {
    
            uint32_t totalSize = memorySize * size;
            uint32_t previousTotalSize =
              this->mPushConstantsDataTypeMemorySize * this->mPushConstantsSize;
    
            if (totalSize != previousTotalSize) {
                throw std::runtime_error(fmt::format(
                  "Kompute Algorithm push "
                  "constant total memory size provided is {} but expected {} bytes",
                  totalSize,
                  previousTotalSize));
            }
            if (this->mPushConstantsData) {
                free(this->mPushConstantsData);
            }
    
            this->mPushConstantsData = malloc(totalSize);
            memcpy(this->mPushConstantsData, data, totalSize);
            this->mPushConstantsDataTypeMemorySize = memorySize;
            this->mPushConstantsSize = size;
        }
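    The compile error comes from passing the const T* returned by data() on a const vector to a void* parameter. Below is a sketch of a const-correct version. It assumes the raw overload never needs to write through the pointer; its own doc comment says it copies "without modifying the pointer". PushConstantHolder is a hypothetical class, and it stores the bytes in a std::vector instead of using malloc/free:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical push-constant holder mirroring the logic quoted above. The raw
// overload takes `const void*`, so a const std::vector<T>'s data() binds to it.
class PushConstantHolder {
  public:
    PushConstantHolder(std::uint32_t size, std::uint32_t memorySize)
      : mSize(size), mTypeSize(memorySize) {}

    template<typename T>
    void setPushConstants(const std::vector<T>& pushConstants) {
        setPushConstants(pushConstants.data(),
                         static_cast<std::uint32_t>(pushConstants.size()),
                         static_cast<std::uint32_t>(sizeof(T)));
    }

    void setPushConstants(const void* data, std::uint32_t size, std::uint32_t memorySize) {
        const std::uint32_t totalSize = memorySize * size;
        const std::uint32_t previousTotalSize = mTypeSize * mSize;
        if (totalSize != previousTotalSize) {
            throw std::runtime_error("push constant total memory size provided is " +
                                     std::to_string(totalSize) + " but expected " +
                                     std::to_string(previousTotalSize) + " bytes");
        }
        // Copy the bytes without writing through the caller's pointer.
        const auto* bytes = static_cast<const std::uint8_t*>(data);
        mData.assign(bytes, bytes + totalSize);
        mTypeSize = memorySize;
        mSize = size;
    }

    // Reads element i back as T, to inspect the stored bytes.
    template<typename T>
    T at(std::uint32_t i) const {
        T value;
        std::memcpy(&value, mData.data() + static_cast<std::size_t>(i) * sizeof(T), sizeof(T));
        return value;
    }

  private:
    std::vector<std::uint8_t> mData;
    std::uint32_t mSize;
    std::uint32_t mTypeSize;
};
```

    Using sizeof(T) in the template gives the same value as the quoted sizeof(decltype(pushConstants.back())). This is because sizeof applied to a reference type yields the size of the referenced type.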
    
    
    opened by cracy3m 3
  • Validation Error :VUID-vkBeginCommandBuffer-commandBuffer-00050

    Calling record() after eval() on the same kp::Sequence object triggers a VUID-vkBeginCommandBuffer-commandBuffer-00050 Vulkan validation error.

    The error occurs in the README.md demo:

    ...
    // 4. Run operation synchronously using sequence
        mgr.sequence()
            ->record<kp::OpTensorSyncDevice>(params)
            ->record<kp::OpAlgoDispatch>(algorithm) // Binds default push consts
            ->eval() // Evaluates the two recorded operations
            ->record<kp::OpAlgoDispatch>(algorithm, pushConstsB) // Overrides push consts <--- VUID-vkBeginCommandBuffer-commandBuffer-00050
            ->eval(); // Evaluates only last recorded operation
    
    ...
    

    Kompute creates its Vulkan command pool without VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT, but kp::Sequence::begin calls vkBeginCommandBuffer; if the command buffer is not in the initial state, the validation layer reports an error.

    Will this error cause problems? Or should record() not be used after kp::Sequence::eval()?

    opened by cracy3m 5
  • Cannot create tensor without initialization data

    Is there a way to create a Tensor without initialization data? Passing nullptr as the data argument of the kp::Tensor constructor throws an exception, because the constructor always calls memcpy(..., data, ...).

    kp::Tensor(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, void* data, uint32_t elementTotalCount, uint32_t elementMemorySize, const TensorDataTypes& dataType, const TensorTypes& tensorType = TensorTypes::eDevice)
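    Here is a sketch of the guard this question asks about. It assumes that storage created without initialization data should be zero-filled. Calling memcpy with a null source pointer is undefined behaviour even when the byte count is zero, so the check has to happen before the call. HostBuffer is hypothetical and models only the host-side copy, not Vulkan buffer creation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side buffer: copies initial data only when a pointer is
// provided; otherwise the allocation stays zero-initialised.
class HostBuffer {
  public:
    HostBuffer(const void* data, std::uint32_t elementTotalCount, std::uint32_t elementMemorySize)
      : mBytes(static_cast<std::size_t>(elementTotalCount) * elementMemorySize, 0) {
        // Guard before memcpy: a null source is undefined behaviour even for 0 bytes.
        if (data != nullptr && !mBytes.empty()) {
            std::memcpy(mBytes.data(), data, mBytes.size());
        }
    }

    std::size_t sizeBytes() const { return mBytes.size(); }

    // Reads element i back as T.
    template<typename T>
    T at(std::size_t i) const {
        T value;
        std::memcpy(&value, mBytes.data() + i * sizeof(T), sizeof(T));
        return value;
    }

  private:
    std::vector<std::uint8_t> mBytes;
};
```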

    opened by cracy3m 9
Releases(v0.8.1)
  • v0.8.1(Apr 13, 2022)

    v0.8.1

    Full Changelog

    Closed issues:

    • Discord link in README and docs is broken #276
    • Website examples typo's and 6500 XT unknown GPU #275
    • [Question] How to disable all log ? #274
    • full diagram 404 #271
    • Error when enabling KOMPUTE_ENABLE_SPDLOG #268
    • Add KOMPUTE_LOG_ACTIVE_LEVEL instead of current SPDLOG_ACTIVE_LEVEL #267
    • Update/Fix Android project #264
    • Update compileSource function in examples/docs to correct one #261
    • Technically can Kompute be modified to support data visualization? #260
    • Data-transfer for Integrated GPU #258
    • Python "getting started" example fails #252
    • Python example in README doesn't work #248
    • Running Android app #234

    Merged pull requests:

    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(103.63 KB)
  • v0.8.0(Sep 16, 2021)

    v0.8.0

    Full Changelog

    Closed issues:

    • An unset KOMPUTE_ENV_DEBUG_LAYERS leads KP_LOG_DEBUG to pass envLayerNamesVal==nullptr #245
    • Extend utils shader helpers in test for windows #240
    • Python segfaults after import kp #230
    • Simple and extended python examples do not work (v 0.7.0) #228
    • Python macOS issue (ImportError: dlopen(...): no suitable image found. Did find: ...: mach-o, but wrong architecture) #223
    • Python macOS issue (Symbol not found: __PyThreadState_Current ... Expected in: flat namespace) #221
    • Finalise Migration of Kompute into Linux Foundation #216
    • CMake Error: Imported target "kompute::kompute" includes non-existent path "/usr/local/single_include" #212
    • Incompatibility introduced with #168 on Vulkan 1.1.x #209
    • external libraries #201
    • Starting slack group or discord for alternative / faster version of asking questions #198
    • Test SingleSequenceRecord is not thread safe and fails in AMD card #196
    • Update Kompute headers to reference the glslang headers for install vs build interfaces #193
    • Integrate with GLSLang find_package file when issue is resolved in the glslang repo #191
    • Release 0.7.0 #187
    • Get number of available devices #185
    • Deep Learning Convolutional Neural Network (CNN) example implementation #162
    • Create example compiling and running in raspberry pi with Mesa Vulkan drivers #131
    • Add support for VK_EXT_debug_utils labels #110

    Merged pull requests:

    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(102.97 KB)
  • v0.7.0(Mar 14, 2021)

    Release v0.7.0

    The 0.7.0 release introduces an extensive list of features; a high-level overview includes:

    • Support for push constants
    • Support for specialisation constants
    • Support for tensor data types bool, float, double, int32 and uint32
    • Ability to define Operations outside manager
    • Ability to create Algorithm outside manager
    • New OpMemoryBarrier to add custom barriers
    • New OpAlgoDispatch to dispatch algorithm with push constants
    • New interface for sequences
    • New memory relationships all managed by top level manager with weak references allowing for smart pointers to terminate objects
    • Code coverage metrics using gcov + lcov
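    The memory-relationships item above can be illustrated with a small sketch. The manager hands out std::shared_ptr objects but keeps only std::weak_ptr copies, so user-held pointers decide lifetime while the manager can still destroy whatever is alive at teardown. ToyManager and Resource are hypothetical and simplified; this is not Kompute's actual Manager implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical resource standing in for a kp::Tensor or kp::Sequence.
class Resource {
  public:
    void destroy() { mDestroyed = true; }
    bool isDestroyed() const { return mDestroyed; }

  private:
    bool mDestroyed = false;
};

// Hypothetical manager: returns shared_ptrs but stores only weak_ptrs, so it
// never keeps an object alive on its own.
class ToyManager {
  public:
    std::shared_ptr<Resource> create() {
        auto res = std::make_shared<Resource>();
        mManaged.push_back(res);  // stores a weak reference only
        return res;
    }

    // Counts managed objects that still have at least one owner.
    std::size_t aliveCount() const {
        std::size_t n = 0;
        for (const auto& w : mManaged) {
            if (!w.expired()) ++n;
        }
        return n;
    }

    // Explicitly destroys every object that is still alive, then forgets them.
    void destroyAll() {
        for (auto& w : mManaged) {
            if (auto res = w.lock()) {
                res->destroy();
            }
        }
        mManaged.clear();
    }

  private:
    std::vector<std::weak_ptr<Resource>> mManaged;
};
```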

    Implemented enhancements:

    • Extend non-spdlog print functions to use std::format #158
    • Add code coverage reports with codecov #145
    • Explore removing std::vector mData; completely from Tensor in favour of always storing data in hostVisible buffer memory (TBC) #144
    • Update all examples to match breaking changes in 0.7.0 #141
    • Avoid copy when returning python numpy / array #139
    • Cover all Python & C++ tests in CI #121
    • Add C++ Test for Simple Work Groups Example #117
    • Expose push constants in OpAlgo #54
    • Expose ability to create barriers in OpTensor operations #45
    • Create delete function in manager to free / destroy sequence #36
    • Make specialisation data extensible #12
    • Support multiple types for Kompute Tensors #2
    • Added re-record sequence functionality and updated docs #171 (axsaucedo)
    • Extend non-spdlog print functions to use fmt::format / fmt::print #159 (axsaucedo)
    • Added support for custom SpecializedConstants and removed KomputeWorkgroup class #151 (axsaucedo)
    • Added destroy functions for tensors and sequences (named and object) #146 (axsaucedo)

    Fixed bugs:

    • push_constant not working in my case? #168
    • DescriptorPool set is not being freed #155
    • Updated memory barriers to include staging buffers #182 (axsaucedo)
    • Adds push const ranges in pipelinelayout to fix #168 #174 (axsaucedo)
    • Added destructor for staging tensors #134 (axsaucedo)

    Closed issues:

    • Update memory barriers to align with tensor staging/primary memory revamp #181
    • Move shader defaultResource inside kp::Shader class #175
    • Reach at least 90% code coverage on tests #170
    • Add functionality to re-record sequence as now it's possible to update the underlying algorithm #169
    • Use numpy arrays as default return value #166
    • Update all shared_ptr value passes to be by ref or const ref #161
    • Amend memory hierarchy for kp::Operations so they can be created separately #160
    • Customise theme of documentation #156
    • Remove KomputeWorkgroup class in favour of std::array<uint32_t, 3> #152
    • Passing raw GLSL string to Shader Module deprecated, so remove this method from supported approach #150
    • Add python backwards compatibility for eval_tensor_create_def #147
    • Document breaking changes for 0.7.0 #140
    • Tensor memory management and memory hierarchy redesign #136
    • Staging tensor GPU memory is not freed as part of OpCreateTensor removal #133
    • eStorage Tensors are currently unusable as OpTensorCreate calls mapDataIntoHostMemory #132
    • 0.6.0 Release #126
    • java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found #125
    • Initial exploration: Include explicit GLSL to SPIRV compilation #107
    • Add support for push constants #106

    Merged pull requests:

    • Resolve moving all functions from tensor HPP to CPP #186 (axsaucedo)
    • Device Properties #184 (alexander-g)
    • Too many warnings #183 (alexander-g)
    • Add support for bool, double, int32, uint32 and float32 on Tensors via TensorT #177 (axsaucedo)
    • Support for Timestamping #176 (alexander-g)
    • Test for ShaderResources #165 (aliPMPAINT)
    • Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164 (axsaucedo)
    • made changes for include paths for complete installation #163 (aliPMPAINT)
    • Added dark mode on docs #157 (axsaucedo)
    • Glslang implementation for online shader compilation #154 (axsaucedo)
    • Adding test code coverage using gcov and lcov #149 (axsaucedo)
    • Added temporary backwards compatibility for eval_tensor_create_def function #148 (axsaucedo)
    • Amend memory ownership hierarchy to have Tensor owned by Manager instead of OpCreateTensor / OpBase #138 (axsaucedo)
    • Removed Staging Tensors in favour of having two buffer & memory in a Tensor to minimise data transfer #137 (axsaucedo)
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(97.97 KB)
  • v0.6.0(Jan 31, 2021)

    v0.6.0

    Full Changelog

    Implemented enhancements:

    Fixed bugs:

    • [PYTHON] Support string parameter instead of list for eval_algo_data when passing raw shader as string #93
    • [PYTHON] Fix log_level on the python implementation (using pybind's logging functions) #92

    Closed issues:

    • Add documentation for custom operations #128
    • Numpy Array Support and Work Group Configuration in Python Kompute #124
    • Remove references to spdlog in python module #122
    • Setup automated CI testing for PRs using GitHub actions #114
    • Python example type error (pyshader). #111
    • Update all references to operations to not use template #101
    • Getting an undefined reference error while creating a Kompute Manager #100

    Merged pull requests:

    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(95.19 KB)
  • v0.5.1(Nov 14, 2020)

  • v0.5.0(Nov 8, 2020)

    • Migrated all OpAlgoBase components to use dynamic layouts instead of templates #26, #57
    • Updated all examples to use SPIR-V bytes by default #86
    • Added compatibility for Vulkan Headers HPP v1.2.154+ #84
    • Added Python PyPI package for Kompute #87
    • Added Python interface functions to process Python SPIR-V bytes directly
    • Added a Logistic Regression implementation in Python
    • Extended examples to showcase using pyshader for more Pythonic GPU development
    • Enabled spdlog builds by default in the Python package
    • Added multi-platform Python package installs via PyPI https://pypi.org/project/kp/
    • Added log level configuration functionality in Python
    • Added Python bindings for the Kompute library
    • Added Python tests showcasing core functionality using Manager and Sequences
    • Added documentation integrating Python class references (to be automated)
    • Changed sequences to be returned as shared_ptr instead of weak_ptr
    • Added sequence memory management via init member function
    • Added explicit definitions of VulkanDestroy functions for VulkanHPP 1.2.155-1.2.158 compatibility
    • Removed template parameters from OpAlgoBase functions (added Op*.cpp files)
    • Added python build to main cmake file
    • Added pybind11 submodule
    • Added Sequence tests to verify memory management via init member function
    • Update e2e examples
    • Add Python documentation and further examples
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(94.63 KB)
  • v0.4.1(Nov 1, 2020)

    • Support for Vulkan HPP version 1.2.155 and above (tested with every version until 1.2.158)
    • Updated linux docker image to test multiple versions of Vulkan SDK
    • CCLS support for submodule builds (besides vcpkg builds)
    • Submodules for core dependencies for flexibility when testing dependencies with particular versions
    • Build flags to configure submodule builds vs vcpkg (toolchain based) builds
    • Removed range prints which removes explicit dependency on fmt
    • Syntax correction of private/protected member variables
    • AUR package added by contributor https://aur.archlinux.org/packages/vulkan-kompute-git/ via #81
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(106.00 KB)
  • v0.4.0(Oct 18, 2020)

    • Added async/await capabilities for manager and sequence
    • Added fence resource as member object for Sequence
    • Ensure sequence begin() function clears previous operations
    • Ensure begin does not get called if sequence in running state
    • Ensure manager creates new sequence when default functions called
    • Added capabilities for multi queue support in manager with explicit allocation on sequences
    • Fixed compile warnings on Linux (Ubuntu)
    • Added createManagedSequence on manager with ability for default sequence name to be created
    • Added tests for asynchronous and parallel processing
    • Added LogisticRegression shader as cpp header
    • Added documentation on advanced examples
    • Added documentation on shader-to-cpp-header scripts
    • Added CNAME for kompute.cc domain
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(105.76 KB)
  • v0.3.2(Oct 4, 2020)

    • Enabled dynamic loading of Vulkan library for Vulkan support
    • Added Android NDK Wrapper with Vulkan HPP support
    • Updated CMAKE min standard to C++14
    • Downgraded min cmake requirement to 3.4.1
    • Added new shader file as shaderlogisticregression as part of AggregateHeader
    • Added #pragma once guard to Kompute.hpp single header
    • Updated createComputePipeline to return Result instead of ResultValue for backwards compatibility
    • Added new build flags for Android, including:
      • KOMPUTE_OPT_INSTALL
      • KOMPUTE_OPT_ANDROID_BUILD
      • KOMPUTE_OPT_DISABLE_VK_DEBUG_LAYERS
      • KOMPUTE_VK_API_VERSION
      • KOMPUTE_VK_API_MINOR_VERSION
      • KOMPUTE_VK_API_MAJOR_VERSION
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(99.30 KB)
  • v0.3.1(Sep 20, 2020)

  • v0.3.0(Sep 13, 2020)

    • #50 - Added documentation and testing for reusing recorded commands
    • #19 - Added Machine Learning example with Kompute
    • #40 - Provide further granularity for handling tensor data
    • #39 - Create OpTensorSyncLocal and OpTensorSyncDevice operations
    • #43 - Renamed OpCreateTensor to OpTensorCreate to align with tensor operations
    • #43 - Fixed OpTensorCreate not mapping data for host tensors
    • #47 - Add preSubmit function to OpBase to account for multiple eval commands
    • #58 - Add standalone examples that show using Kompute from scratch
    • #58 - Make Kompute installable locally
    • #56 - Remove OpAlgoBase copy tensor functionality to delegate to OpTensorSync
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(66.67 KB)
  • v0.2.0(Sep 5, 2020)

    • #18 - Improve access to underlying data for speed and ease of access
    • #17 - Enabled for compute shaders to be passed as files or strings
    • #13 - Enable OpCreateTensor to receive more than 1 tensor
    • #11 - Add default specialisation data to algorithm holding all tensor sizes
    • #9 - Added documentation automated with doxygen and sphinx
    • #15 - Added memory profiling and explicitly ensured there are no memory leaks
    • #30 - Removed spdlog as required dependency and into optional
    • #37 - Migrated to GTest
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(60.65 KB)
  • v0.1.0(Aug 29, 2020)

    Vulkan Kompute Release 0.1.0

    The 0.1.0 release of Vulkan Kompute, the General Purpose Vulkan Compute Framework.

    Features

    • Automatic detection and selection of physical GPU device
    • Single header import with dynamic library available for easy integration
    • Uses vcpkg with optional manifest to download / fetch dependencies
    • Provides high level kompute interface via Manager and Sequence
    • GPU processing data abstracted with Tensors
    • Compute shaders and pipelines are abstracted via algorithms
    • Base operations, including tensor creation and multiplication as an example
    • Ability to initialise components with external vulkan resources
    • Conversion of glsl shaders into SPIR-V and native HPP files for static building
    • DEBUG option for verbose builds that cover execution path in detail with SPDLOG
    • Docker image to build the library in linux environments
    • CMAKE support with tested functionality in linux and windows
    • Documentation containing autodoc with Doxygen using Sphinx
    • Unit and integration testing using Catch2
    Source code(tar.gz)
    Source code(zip)
    Kompute.hpp(40.65 KB)
Owner
The Kompute Project
Kompute is a Sandbox Project in LF AI & Data Foundation focused on advancing the GPU acceleration ecosystem through cross-vendor graphics card tooling.