General-purpose GPU compute framework for cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU data processing use cases.

Overview


Vulkan Kompute

🔋 Documentation 💻 Blog Post ⌨ Examples 💾

Principles & Features

Getting Started

Below you can find a GPU multiplication example using the C++ and Python Kompute interfaces.

Your First Kompute (C++)

The C++ interface provides low-level access to the native components of Kompute and Vulkan, enabling advanced optimizations as well as the extension of components.

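#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "kompute/Kompute.hpp"

void kompute(const std::string& shader) {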

    // 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
    kp::Manager mgr; 

    // 2. Create and initialise Kompute Tensors through manager

    // Default tensor constructor simplifies creation of float values
    auto tensorInA = mgr.tensor({ 2., 2., 2. });
    auto tensorInB = mgr.tensor({ 1., 2., 3. });
    // Explicit type constructor supports uint32, int32, double, float and bool
    auto tensorOutA = mgr.tensorT<uint32_t>({ 0, 0, 0 });
    auto tensorOutB = mgr.tensorT<uint32_t>({ 0, 0, 0 });

    std::vector<std::shared_ptr<kp::Tensor>> params = {tensorInA, tensorInB, tensorOutA, tensorOutB};

    // 3. Create algorithm based on shader (supports buffers & push/spec constants)
    kp::Workgroup workgroup({3, 1, 1});
    kp::Constants specConsts({ 2 });
    kp::Constants pushConstsA({ 2.0 });
    kp::Constants pushConstsB({ 3.0 });

    auto algorithm = mgr.algorithm(params,
                                   kp::Shader::compileSource(shader),
                                   workgroup,
                                   specConsts,
                                   pushConstsA);

    // 4. Run operation synchronously using sequence
    mgr.sequence()
        ->record<kp::OpTensorSyncDevice>(params)
        ->record<kp::OpAlgoDispatch>(algorithm) // Binds default push consts
        ->eval() // Evaluates the two recorded operations
        ->record<kp::OpAlgoDispatch>(algorithm, pushConstsB) // Overrides push consts
        ->eval(); // Evaluates only the last recorded operation

    // 5. Sync results from the GPU asynchronously
    auto sq = mgr.sequence();
    sq->evalAsync<kp::OpTensorSyncLocal>(params);

    // ... Do other work asynchronously whilst GPU finishes

    sq->evalAwait();

    // Prints the first output, which is: { 4, 8, 12 }
    for (const uint32_t& elem : tensorOutA->vector()) std::cout << elem << "  ";
    // Prints the second output, which is: { 10, 10, 10 }
    for (const uint32_t& elem : tensorOutB->vector()) std::cout << elem << "  ";

} // Manages / releases all CPU and GPU memory resources

int main() {

    // Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
    // files). This shader shows some of the main components including constants, buffers, etc
    std::string shader = (R"(
        #version 450

        layout (local_size_x = 1) in;

        // Each tensor's binding index matches its position in the params list passed in
        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };

        // Kompute supports push constants updated on dispatch
        layout(push_constant) uniform PushConstants {
            float val;
        } push_const;

        // Kompute also supports spec constants on initialization
        layout(constant_id = 0) const float const_one = 0;

        void main() {
            uint index = gl_GlobalInvocationID.x;
            out_a[index] += uint( in_a[index] * in_b[index] );
            out_b[index] += uint( const_one * push_const.val );
        }
    )");

    // Run the function declared above with our raw string shader
    kompute(shader);
}
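As a sanity check on the expected outputs, the following CPU-only Python sketch mirrors what each shader invocation computes. It is not Kompute code: `dispatch` and `run_readme_example` are hypothetical helpers, and the two calls stand in for the two `OpAlgoDispatch` evaluations (push constant 2.0, then 3.0) with the spec constant fixed at 2.

```python
def dispatch(in_a, in_b, out_a, out_b, const_one, push_val, workgroup=(3, 1, 1)):
    """Emulate one dispatch of the README shader on the CPU.

    With local_size_x = 1, a (3, 1, 1) workgroup count gives three invocations
    along x, so gl_GlobalInvocationID.x takes the values 0, 1, 2.
    The y and z dimensions are 1 here and are ignored for simplicity.
    """
    local_size_x = 1
    for index in range(workgroup[0] * local_size_x):
        # GLSL uint(float) truncates; int() does the same for non-negative values
        out_a[index] += int(in_a[index] * in_b[index])
        out_b[index] += int(const_one * push_val)


def run_readme_example():
    in_a, in_b = [2.0, 2.0, 2.0], [1.0, 2.0, 3.0]
    out_a, out_b = [0, 0, 0], [0, 0, 0]
    const_one = 2.0  # spec constant: fixed when the algorithm is created
    dispatch(in_a, in_b, out_a, out_b, const_one, push_val=2.0)  # default push constant
    dispatch(in_a, in_b, out_a, out_b, const_one, push_val=3.0)  # overridden push constant
    return out_a, out_b


if __name__ == "__main__":
    print(run_readme_example())  # ([4, 8, 12], [10, 10, 10])
```

Because the shader uses `+=`, both dispatches accumulate into the same output buffers: out_a doubles from { 2, 4, 6 } to { 4, 8, 12 }, and out_b collects 2 × 2 + 2 × 3 = 10.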

Your First Kompute (Python)

The Python package provides a high-level interactive interface that enables experimentation while ensuring high performance and fast development workflows.

import kp
import numpy as np


def kompute(shader):
    # 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)
    mgr = kp.Manager()

    # 2. Create and initialise Kompute Tensors through manager

    # Default tensor constructor simplifies creation of float values
    tensor_in_a = mgr.tensor([2, 2, 2])
    tensor_in_b = mgr.tensor([1, 2, 3])
    # Explicit type constructor supports uint32, int32, double, float and bool
    tensor_out_a = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
    tensor_out_b = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))

    params = [tensor_in_a, tensor_in_b, tensor_out_a, tensor_out_b]

    # 3. Create algorithm based on shader (supports buffers & push/spec constants)
    workgroup = (3, 1, 1)
    spec_consts = [2]
    push_consts_a = [2]
    push_consts_b = [3]

    spirv = kp.Shader.compile_source(shader)

    algo = mgr.algorithm(params, spirv, workgroup, spec_consts, push_consts_a)

    # 4. Run operation synchronously using sequence
    (mgr.sequence()
        .record(kp.OpTensorSyncDevice(params))
        .record(kp.OpAlgoDispatch(algo)) # Binds default push consts provided
        .eval() # evaluates the two recorded ops
        .record(kp.OpAlgoDispatch(algo, push_consts_b)) # Overrides push consts
        .eval()) # evaluates only the last recorded op

    # 5. Sync results from the GPU asynchronously
    sq = mgr.sequence()
    sq.eval_async(kp.OpTensorSyncLocal(params))

    # ... Do other work asynchronously whilst GPU finishes

    sq.eval_await()

    # Prints the first output, which is: { 4, 8, 12 }
    print(tensor_out_a.data())
    # Prints the second output, which is: { 10, 10, 10 }
    print(tensor_out_b.data())

if __name__ == "__main__":

    # Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
    # files). This shader shows some of the main components including constants, buffers, etc
    shader = """
        #version 450

        layout (local_size_x = 1) in;

        // Each tensor's binding index matches its position in the params list passed in
        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };

        // Kompute supports push constants updated on dispatch
        layout(push_constant) uniform PushConstants {
            float val;
        } push_const;

        // Kompute also supports spec constants on initialization
        layout(constant_id = 0) const float const_one = 0;

        void main() {
            uint index = gl_GlobalInvocationID.x;
            out_a[index] += uint( in_a[index] * in_b[index] );
            out_b[index] += uint( const_one * push_const.val );
        }
    """

    kompute(shader)
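The comments on `eval()` in step 4 are easy to misread, so here is a toy model of the record/eval behaviour they describe. `ToySequence` is hypothetical and only tracks operation names; the assumption that each `eval()` runs just the operations recorded since the previous `eval()` is taken from those comments, not from the Kompute source.

```python
class ToySequence:
    """Hypothetical stand-in for a Kompute sequence (not the real kp API).

    record() queues an operation; eval() "runs" everything queued since
    the previous eval() and then starts a fresh batch.
    """

    def __init__(self):
        self._pending = []
        self.batches = []  # one entry per eval(): the ops submitted in that call

    def record(self, op):
        self._pending.append(op)
        return self  # returning self allows chaining, as in the README example

    def eval(self):
        self.batches.append(list(self._pending))
        self._pending.clear()  # assumption: evaluated ops are not re-run later
        return self


def demo():
    seq = ToySequence()
    (seq.record("OpTensorSyncDevice")
        .record("OpAlgoDispatch(push_consts_a)")
        .eval()
        .record("OpAlgoDispatch(push_consts_b)")
        .eval())
    return seq.batches


if __name__ == "__main__":
    for batch in demo():
        print(batch)
```

Under this model the first `eval()` submits the sync and the first dispatch together, while the second `eval()` submits only the dispatch with the overridden push constants.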

Interactive Notebooks & Hands on Videos

You can try out the interactive Colab notebooks, which let you use a free GPU. Both a C++ and a Python notebook are available:

Try the interactive C++ Colab from Blog Post
Try the interactive Python Colab from Blog Post

You can also check out the following two talks presented at the FOSDEM 2021 conference.

Both videos have timestamps, so you can skip to the section most relevant to you. The intro and motivations are almost the same in both, so you can skip straight to the more specific content.

Watch the video for C++ & Vulkan SDK Enthusiasts
Watch the video for Python & Machine Learning Enthusiasts

Architectural Overview

The core architecture of Kompute includes the following:

For a full breakdown, see the C++ Class Reference.

[Figure: Full Vulkan components vs. simplified Kompute components. The diagram is very small here; see the full reference diagram in the docs for details.]

Asynchronous and Parallel Operations

Kompute provides the flexibility to run operations asynchronously through Vulkan fences. Furthermore, Kompute enables explicit allocation of queues, which allows operations to execute in parallel across queue families.

The image below gives an intuition for how Kompute Sequences can be allocated to different queues to enable parallel execution, depending on the hardware. You can see the hands-on example, as well as the detailed documentation page describing how it works, using an NVIDIA 1650 as an example.
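The async and multi-queue pattern can be pictured with a plain-Python analogy. This is only a sketch: worker threads stand in for queues, `submit()` for `evalAsync`, and `result()` for `evalAwait`; no Vulkan fences or queues are involved, and `fake_gpu_work` and `run_on_two_queues` are hypothetical names.

```python
import concurrent.futures
import time


def fake_gpu_work(label, seconds):
    """Placeholder for a sequence's GPU workload; here it only sleeps."""
    time.sleep(seconds)
    return label


def run_on_two_queues(seconds=0.2):
    # Each worker thread stands in for one queue
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as queues:
        start = time.perf_counter()
        # submit() plays the role of evalAsync: it returns immediately
        future_a = queues.submit(fake_gpu_work, "sequence A", seconds)
        future_b = queues.submit(fake_gpu_work, "sequence B", seconds)
        # ... other host-side work could run here while both are in flight
        # result() plays the role of evalAwait: it blocks until that work is done
        results = [future_a.result(), future_b.result()]
        elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_on_two_queues()
    print(results, "took about %.2f s" % elapsed)
```

The total time comes out close to one workload rather than two, which is the kind of overlap the multi-queue setup aims for when the hardware exposes independent queue families.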

Mobile Enabled

Kompute has been optimized to work in mobile environments. The build system enables dynamic loading of the Vulkan shared library on Android, together with a working Android NDK Vulkan wrapper for the C++ headers.
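To make "dynamic loading" concrete, here is a hedged Python sketch using `ctypes` that opens the Vulkan loader at runtime instead of linking against it at build time. Kompute's Android build does the equivalent natively through the NDK wrapper, not in Python; the candidate file names below are common defaults and an assumption, and `load_vulkan_loader` is a hypothetical helper.

```python
import ctypes
import ctypes.util
import sys


def candidate_loader_names():
    """Typical Vulkan loader file names per platform (an assumption, not exhaustive)."""
    if sys.platform.startswith("win"):
        return ["vulkan-1.dll"]
    if sys.platform == "darwin":
        return ["libvulkan.1.dylib", "libvulkan.dylib"]
    return ["libvulkan.so.1", "libvulkan.so"]  # Linux; Android ships libvulkan.so


def load_vulkan_loader():
    """Open the Vulkan loader at runtime, returning None when it is not installed."""
    names = candidate_loader_names()
    found = ctypes.util.find_library("vulkan")
    if found and found not in names:
        names.insert(0, found)
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not present under this name; try the next one
    return None


if __name__ == "__main__":
    lib = load_vulkan_loader()
    if lib is None:
        print("Vulkan loader not found; a real app would fall back or report an error")
    else:
        # The loader exports vkGetInstanceProcAddr, the entry point for fetching other functions
        print("vkGetInstanceProcAddr available:", hasattr(lib, "vkGetInstanceProcAddr"))
```

Loading at runtime lets the same binary start on devices without Vulkan support and handle that case gracefully instead of failing at link time.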

For a full deep dive you can read the blog post "Supercharging your Mobile Apps with On-Device GPU Accelerated Machine Learning".

You can also access the end-to-end example code in the repository, which can be run using Android Studio.

More examples

Simple examples

End-to-end examples

Python Package

Besides the C++ core SDK, you can also use the Kompute Python package, which exposes the same core functionality and supports interoperability with Python objects like lists, NumPy arrays, etc.

The only dependencies are Python 3.5+ and CMake 3.4.1+. You can install Kompute from PyPI using the following command:

pip install kp

You can also install from the master branch using:

pip install git+https://github.com/EthicalML/vulkan-kompute.git@master

For further details you can read the Python Package documentation or the Python Class Reference documentation.
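Since the package needs CMake when it is built from source, it can help to confirm the Python 3.5+ and CMake 3.4.1+ requirements before installing. A minimal sketch, assuming `cmake` is on your PATH and prints the usual `cmake version X.Y.Z` line; `missing_prerequisites` is a hypothetical helper, not part of the package.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 5)
MIN_CMAKE = (3, 4, 1)


def parse_cmake_version(text):
    """Return (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def missing_prerequisites():
    """List unmet prerequisites from this README (an empty list means all are met)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d+ is required" % MIN_PYTHON)
    cmake = shutil.which("cmake")
    if cmake is None:
        problems.append("cmake was not found on PATH")
    else:
        # stdout=PIPE and universal_newlines keep this runnable on Python 3.5
        output = subprocess.run([cmake, "--version"], stdout=subprocess.PIPE,
                                universal_newlines=True).stdout
        version = parse_cmake_version(output)
        if version is None or version < MIN_CMAKE:
            problems.append("CMake %d.%d.%d+ is required" % MIN_CMAKE)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Comparing version tuples rather than strings avoids the classic mistake where "3.10" sorts before "3.4".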

C++ Build Overview

The build system uses CMake, which allows for cross-platform builds.

The top-level Makefile provides a set of optimized configurations for development as well as for the Docker image build, but you can start a build with the following command:

   cmake -Bbuild

You can also add Kompute to your repo with add_subdirectory; the Android example's CMakeLists.txt file shows how this is done.

For a more advanced overview of the build configuration check out the Build System Deep Dive documentation.

Kompute Development

We appreciate PRs and issues. If you want to contribute, try checking the "Good first issue" tag; even just using Vulkan Kompute and reporting issues is a great contribution!

Contributing

Dev Dependencies

  • Testing
    • GTest
  • Documentation
    • Doxygen (with Dot)
    • Sphinx

Development

  • Follows the Mozilla C++ Style Guide https://www-archive.mozilla.org/hacking/mozilla-style-guide.html
    • Uses a post-commit hook to run the linter; you can set it up so the linter runs before commit instead
    • All dependencies are defined in vcpkg.json
  • Uses CMake as the build system, and provides a top-level Makefile with the recommended commands
  • Uses xxd (or the xxd.exe Windows 64-bit port) to convert shader SPIR-V to header files
  • Uses Doxygen and Sphinx for documentation and autodocs
  • Uses vcpkg to find the dependencies; it's the recommended setup for retrieving the libraries
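The xxd step turns a compiled SPIR-V binary into a C array that can be compiled straight into a program. The Python sketch below approximates the `xxd -i` output layout (12 bytes per line, plus a `_len` variable) and also checks the SPIR-V magic number 0x07230203; it is an illustration, not the project's actual build tooling, and `spirv_to_header` is a hypothetical name.

```python
import os
import re
import sys

SPIRV_MAGIC = bytes([0x03, 0x02, 0x23, 0x07])  # 0x07230203 stored little-endian


def c_identifier(filename):
    """Turn a file name such as 'shader.comp.spv' into 'shader_comp_spv'."""
    return re.sub(r"[^0-9A-Za-z_]", "_", filename)


def spirv_to_header(data, filename, bytes_per_line=12):
    """Emit a C byte array in the style of `xxd -i` for a SPIR-V binary."""
    if data[:4] != SPIRV_MAGIC:
        raise ValueError("not a little-endian SPIR-V module (bad magic number)")
    name = c_identifier(filename)
    lines = ["unsigned char %s[] = {" % name]
    for start in range(0, len(data), bytes_per_line):
        chunk = data[start:start + bytes_per_line]
        text = "  " + ", ".join("0x%02x" % b for b in chunk)
        is_last = start + bytes_per_line >= len(data)
        lines.append(text if is_last else text + ",")  # no trailing comma on the last row
    lines.append("};")
    lines.append("unsigned int %s_len = %d;" % (name, len(data)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. a .spv file produced by glslangValidator
    with open(path, "rb") as spv:
        sys.stdout.write(spirv_to_header(spv.read(), os.path.basename(path)))
```

Checking the magic number up front catches the common mistake of embedding GLSL source text instead of the compiled SPIR-V.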

If you want to run with debug layers, you can enable them through the KOMPUTE_ENV_DEBUG_LAYERS environment variable:

export KOMPUTE_ENV_DEBUG_LAYERS="VK_LAYER_LUNARG_api_dump"
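A minimal sketch of reading that variable, assuming the layer names are whitespace-separated (check the Manager source for the exact parsing Kompute uses). Treating an unset variable as an empty list avoids the null-value case reported in issue #245. `requested_debug_layers` and `enabled_debug_layers` are hypothetical helpers.

```python
import os

ENV_VAR = "KOMPUTE_ENV_DEBUG_LAYERS"


def requested_debug_layers(environ=None):
    """Read layer names from KOMPUTE_ENV_DEBUG_LAYERS.

    Assumption: names are whitespace-separated. An unset or empty variable
    gives an empty list rather than a null value.
    """
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "").split()


def enabled_debug_layers(requested, available):
    """Keep only the requested layers that the Vulkan loader reports as available."""
    available = set(available)
    return [layer for layer in requested if layer in available]
```

Filtering against the available layers matters because requesting a layer that is not installed makes instance creation fail with VK_ERROR_LAYER_NOT_PRESENT.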
Updating documentation

To update the documentation you will need to:

  • Run the gendoxygen target in the build system
  • Run the gensphynx target in the build system
  • Push to GitHub Pages with make push_docs_to_ghpages

Running tests

Running the unit tests has been significantly simplified for contributors.

The tests run on the CPU and can be triggered using the ACT command-line interface (https://github.com/nektos/act). Once you install the CLI (and start the Docker daemon), you just have to type:

$ act

[Python Tests/python-tests] 🚀  Start image=axsauze/kompute-builder:0.2
[C++ Tests/cpp-tests      ] 🚀  Start image=axsauze/kompute-builder:0.2
[C++ Tests/cpp-tests      ]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=["/usr/bin/tail" "-f" "/dev/null"] cmd=[]
[Python Tests/python-tests]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=["/usr/bin/tail" "-f" "/dev/null"] cmd=[]
...

The repository contains unit tests for the C++ and Python code, which can be found under the test/ and python/test folders.

The tests currently run in CI on GitHub Actions, using the images found in docker-builders/.

To minimise hardware requirements, the tests can run without a GPU, directly on the CPU, using SwiftShader.

For more information on how the CI and tests are set up, see the CI, Docker and Tests section in the documentation.

Motivations

This project started after seeing that many new and renowned ML & DL projects, like PyTorch, TensorFlow, Alibaba DNN and Tencent NCNN, among others, have either integrated or are looking to integrate the Vulkan SDK to add mobile (and cross-vendor) GPU support.

The Vulkan SDK offers a great low-level interface that enables highly specialized optimizations; however, it comes at the cost of highly verbose code, requiring 500-2000 lines of code before you can even begin writing application code. As a result, each of these projects has had to implement the same baseline to abstract the non-compute-related features of Vulkan. This large amount of non-standardised boilerplate can result in limited knowledge transfer, a higher chance of introducing framework-specific bugs, etc.

We are developing Vulkan Kompute not to hide the Vulkan SDK interface (as it's incredibly well designed) but to augment it with a direct focus on Vulkan's GPU computing capabilities. This article provides a high-level overview of the motivations behind Kompute, together with a set of hands-on examples that introduce both GPU computing and the core Vulkan Kompute architecture.

Comments
  • push_constant not working in my case?

    I tried both codes from README and the test (TestPushConstant.cpp), but apparently push_constant values all get zero value for some reasons? Here is the result for TestPushConstant.cpp):

    [alipmpaint@archlinux test]$ ./test_kompute
    Running main() from /home/alipmpaint/Documents/github/vulkan-kompute/external/googletest/googletest/src/gtest_main.cc
    [==========] Running 1 test from 1 test suite.
    [----------] Global test environment set-up.
    [----------] 1 test from TestPushConstants
    [ RUN      ] TestPushConstants.TestTwoConstants
    [2021-02-28 23:28:54.779] [info] [Shader.cpp:68] Kompute Shader Information: 
    
    
    WARNING: radv is not a conformant vulkan implementation, testing use only.
    [2021-02-28 23:28:54.940] [info] [Manager.cpp:269] Using physical device index 1 found AMD RADV VERDE (ACO)
    [2021-02-28 23:28:54.947] [info] [Algorithm.cpp:18] Kompute Algorithm initialising with tensor size: 1 and spirv size: 292
    [2021-02-28 23:28:54.947] [info] [Algorithm.cpp:395] Kompute OpAlgoCreate setting dispatch size
    [2021-02-28 23:28:54.947] [info] [Algorithm.cpp:409] Kompute OpAlgoCreate set dispatch size X: 1, Y: 1, Z: 1
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:44] Kompute Sequence command now started recording
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:58] Kompute Sequence command recording END
    /home/alipmpaint/Documents/github/vulkan-kompute/test/TestPushConstant.cpp:46: Failure
    Expected equality of these values:
      tensor->data()
        Which is: { 0, 0, 0 }
      kp::Constants({ 0.4, 0.4, 0.4 })
        Which is: { 0.4, 0.4, 0.4 }
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:186] Freeing CommandBuffer
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:202] Destroying CommandPool
    [2021-02-28 23:28:54.947] [info] [Sequence.cpp:219] Kompute Sequence clearing operations buffer
    [2021-02-28 23:28:54.947] [info] [Manager.cpp:102] Destroying device
    [2021-02-28 23:28:54.949] [warning] [Sequence.cpp:180] Kompute Sequence destroy called with null Device pointer
    [  FAILED  ] TestPushConstants.TestTwoConstants (269 ms)
    [----------] 1 test from TestPushConstants (269 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test suite ran. (269 ms total)
    [  PASSED  ] 0 tests.
    [  FAILED  ] 1 test, listed below:
    [  FAILED  ] TestPushConstants.TestTwoConstants
    
     1 FAILED TEST
    

    (specConsts work fine btw)

    I don't know where the problem exactly is... Since the tests have passed in Github actions and I have clones the same code, I guess it's local. I have two vulkan drivers, one is AMDVLK and the second one AMD RADV VERDE from mesa. What I just described happens with AMD RADV VERDE . In AMDVLK, I get a segfault(I still haven't investigated whether it's the push_constant or not since I'm running out of time) in both codes. So, I guess it has something to do with my Vulkan drivers?

    bug c++ 
    opened by unexploredtest 28
  • Update compileSource function in examples/docs to correct one

    I am not a windows guy... but what the hell? Followed the example here: https://github.com/KomputeProject/kompute/tree/master/examples/array_multiplication

    Frustrating... The example does not mention vulkan headers and the other deps should be optional but are not. wth?

    cmake -Bbuild/ -DCMAKE_BUILD_TYPE=Debug -DKOMPUTE_OPT_INSTALL=0 -DKOMPUTE_OPT_REPO_SUBMODULE_BUILD=1 -DKOMPUTE_OPT_ENABLE_SPDLOG=1
    -- Building for: Visual Studio 16 2019
    -- Selecting Windows SDK version 10.0.17763.0 to target Windows 10.0.19043.
    -- The C compiler identification is MSVC 19.29.30138.0
    -- The CXX compiler identification is MSVC 19.29.30138.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found Vulkan: C:/VulkanSDK/1.2.198.1/Lib/vulkan-1.lib
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:73 (add_subdirectory):
      The source directory
    
        C:/Users/rootkid/workspace/kompute/external/Vulkan-Headers
    
      does not contain a CMakeLists.txt file.
    
    
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:74 (get_target_property):
      get_target_property() called with non-existent target "Vulkan-Headers".
    
    
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:85 (add_subdirectory):
      The source directory
    
        C:/Users/rootkid/workspace/kompute/external/fmt
    
      does not contain a CMakeLists.txt file.
    
    
    CMake Error at C:/Users/rootkid/workspace/kompute/src/CMakeLists.txt:101 (add_subdirectory):
      The source directory
    
        C:/Users/rootkid/workspace/kompute/external/spdlog
    
      does not contain a CMakeLists.txt file.
    
    
    -- Configuring incomplete, errors occurred!
    See also "C:/Users/rootkid/workspace/kompute/examples/array_multiplication/build/CMakeFiles/CMakeOutput.log".
    
    opened by kommander 23
  • Refactor build system

    Why this PR?

    I have a couple of problems with the current way Kompute handles dependencies:

    1. git-Submodules are a rather outdated concept in my eyes and should be replaced since they are always a pain to use and are not flexible in any way.
    2. When including Kompute in a project that also uses Spdlog, it beaks a lot of stuff, e.g. Log-Macros stay in code, but we do not link Kompute against Spdlog.
    3. On my System (Fedora) Vulkan-Headers >= 1.3.0 are available, but my driver (mesa/intel) supports only >= 1.2.131. Well I need to link against a different version Vulkan-Headers without downgrading my System Vulkan-Headers since other applications depend on those. If I don't change the Vulkan-Header version, I always run into the following assertion: VULKAN_HPP_ASSERT( d.getVkHeaderVersion() == VK_HEADER_VERSION );

    Proposed Solutions

    • Replace git-Submodules with CMake fetch_content
      • This solves 1. and 3.
      • I also replaced KOMPUTE_OPT_REPO_SUBMODULE_BUILD with KOMPUTE_OPT_USE_BUILD_IN_{SPDLOG, VULKAN_HEADER, ...} and added a deprecation warning for the old way.
      • I added KOMPUTE_OPT_BUILD_IN_VULKAN_HEADER_TAG which allows consumers to set their specific Vulkan-Header version.
      • I added a check_vulkan_version to CMake that checks if your hardware supports for example Vulkan 1.3 if you link against Vulkan-Headers 1.3 and prints a warning if only the patch version of your hardware is < that the Vulkan-Header version. There is also an option to disable this: KOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK
    • I also cleaned up all CMake options related to dependencies and fixed a few bugs here and there to solve 2.

    What is left to do?

    • [x] Check if it runs on Windows - working on it
    • [x] Check if this works with other GPUs (Nvidia)
    • [x] Fix Python Bindings
    • [x] Check if it works on Android
    • [x] Add all removed CMake options into the deprecated list here: https://github.com/COM8/kompute/blob/master/cmake/deprecation_warnings.cmake
    • [x] Update docs

    Things that need to be done once the PR has been merged

    • Update the repo and hash for the examples
      • https://github.com/COM8/kompute/blob/f6d30d5b0803c48e7f74c44adaf32e9bc613b98f/examples/array_multiplication/CMakeLists.txt#L28-L29
      • https://github.com/COM8/kompute/blob/f6d30d5b0803c48e7f74c44adaf32e9bc613b98f/examples/logistic_regression/CMakeLists.txt#L28-L29

    Let me know what you think about this.

    opened by COM8 20
  • Create example compiling and running in raspberry pi with Mesa Vulkan drivers

    It seems there are some relatively recent advancements in the Vulkan Drivers support for RaspberryPis (https://www.raspberrypi.org/blog/vulkan-update-were-conformant/). This issue encompasses exploring putting together an end to end example similar to the android example that shows how to run Kompute on a Raspberry Pi using the Mesa driver which enables for Vulkan 1.0 compliant processing in the Raspberry Pi (https://gitlab.freedesktop.org/mesa/mesa)

    documentation help wanted good first issue python c++ 
    opened by axsaucedo 20
  • made changes for include paths for complete installation

    When installed globally(sudo make install) glslang/StandAlone headers don't get installed(afaik) and only could get it to work after changing #include <StandAlone/ResourceLimits.h> to #include <glslang/Include/ResourceLimits.h>.
    Also I had to change #include <SPIRV/GlslangToSpv.h> to #include <glslang/SPIRV/GlslangToSpv.h but I figured that it would break compatibility if one wanted to run locally in the same directory and without global installation.

    opened by unexploredtest 19
  • Deep Learning Convolutional Neural Network (CNN) example implementation

    I didn't see them in the list of shaders, and searching "conv" and "convolution" in this repository didn't return much.

    I have naive glsl shaders for convolutions (forwards and backwards), so I could convert those.

    opened by SimLeek 17
  • CMake Error: Imported target "kompute::kompute" includes non-existent path "/usr/local/single_include"

    When Vulkan Kompute is installed globally, it'll try to find /usr/local/single_include but it doesn't exist, the cause of problem is from:

    target_include_directories(
        kompute PUBLIC
        $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
        $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/single_include>
        $<INSTALL_INTERFACE:include>
        $<INSTALL_INTERFACE:single_include>
    )
    

    In src/CMakeLists.txt.

    opened by unexploredtest 13
  • Test SingleSequenceRecord is not thread safe and fails in AMD card

    Hello. I am running the tests and got the following failure in the TEST(TestMultipleAlgoExecutions, SingleSequenceRecord):

    error: Expected equality of these values:
      tensorA->vector()
        Which is: { 1, 1, 1 }
      std::vector<float>({ 3, 3, 3 })
        Which is: { 3, 3, 3 }
    

    This is the sequence of commands of the test:

            mgr.sequence()
              ->record<kp::OpTensorSyncDevice>({ tensorA })
              ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
              ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
              ->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
              ->record<kp::OpTensorSyncLocal>({ tensorA })
              ->eval();
    

    If I add kp::OpTensorSyncDevice between the dispatches, it also fails. However, if I add eval() or kp::OpTensorSyncLocal between the dispatches, it passes.

    opened by lmreia 13
  • Delete methods for sequences inside managers

    This solves #36 partially, the only thing that remains is creating a method that gives the ability delete a given anonymous sequence.
    It's my first time making a PR for a C++ project, so it might not be good. I have provided comments for each commit explaining what each does.

    enhancement 
    opened by unexploredtest 13
  • gcc12 build fails because std:shared_ptr requires explicit declation of <memory>

    Build fails on opensuse tumbleweed

    Steps to reproduce
        ##Tested on commit 6b8b6e864a35a43ee71fe652fc95013aacf6904f
        $git clone https://github.com/KomputeProject/kompute.git
        $cmake -Bbuild
        $cd build
        $make
        
        $gcc (SUSE Linux) 12.2.1 20221020 [revision 0aaef83351473e8f4eb774f8f999bbe87a4866d7]
        Copyright (C) 2022 Free Software Foundation, Inc.
        This is free software; see the source for copying conditions.  There is NO
        warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
        
         $lsb_release -a
        LSB Version:    n/a
        Distributor ID: openSUSE
        Description:    openSUSE Tumbleweed
        Release:        20221124
        Codename:       n/a
        
       Error log
    gitrepo/kompute/build> make
    [ 16%] Built target fmt
    [ 27%] Built target kp_shader
    [ 38%] Built target kp_logger
    [ 44%] Building CXX object src/CMakeFiles/kompute.dir/OpTensorCopy.cpp.o
        In file included from /home/doof/gitrepo/kompute/src/include/kompute/operations/OpTensorCopy.hpp:6,
                         from /home/doof/gitrepo/kompute/src/OpTensorCopy.cpp:3:
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:55:27: error: expected ‘)’ before ‘<’ token
           55 |     Tensor(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
              |           ~               ^
              |                           )
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:108:30: error: ‘std::shared_ptr’ has not been declared
          108 |                         std::shared_ptr<Tensor> copyFromTensor);
              |                              ^~~~~~~~~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:108:40: error: expected ‘,’ or ‘...’ before ‘<’ token
          108 |                         std::shared_ptr<Tensor> copyFromTensor);
              |                                        ^
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:256:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
          256 |     std::shared_ptr<vk::PhysicalDevice> mPhysicalDevice;
              |          ^~~~~~~~~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:6:1: note: ‘std::shared_ptr’ is defined in header ‘<memory>’; did you forget to ‘#include <memory>’?
            5 | #include "logger/Logger.hpp"
          +++ |+#include <memory>
            6 | #include <string>
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:257:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
          257 |     std::shared_ptr<vk::Device> mDevice;
              |          ^~~~~~~~~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:257:5: note: ‘std::shared_ptr’ is defined in header ‘<memory>’; did you forget to ‘#include <memory>’?
          257 |     std::shared_ptr<vk::Device> mDevice;
              |     ^~~
        /home/doof/gitrepo/kompute/src/include/kompute/Tensor.hpp:260:10: error: ‘shared_ptr’ in namespace ‘std’ does not name a template type
          260 |     std::shared_ptr<vk::Buffer> mPrimaryBuffer;
    
    opened by hungrymonkey 12
  • java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found

    Hello, I am following the following tutorial: https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617. I am running ubuntu 20 and whenever I run the emulator I get the following error.

    E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.ethicalml.kompute, PID: 14488
    java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found
        at java.lang.Runtime.loadLibrary0(Runtime.java:1087)
        at java.lang.Runtime.loadLibrary0(Runtime.java:1008)
        at java.lang.System.loadLibrary(System.java:1664)
        at com.ethicalml.kompute.KomputeJni.<clinit>(KomputeJni.kt:80)
        at java.lang.Class.newInstance(Native Method)
        at android.app.AppComponentFactory.instantiateActivity(AppComponentFactory.java:95)
        at androidx.core.app.CoreComponentFactory.instantiateActivity(CoreComponentFactory.java:41)
        at android.app.Instrumentation.newActivity(Instrumentation.java:1253)
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3353)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3601)
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:85)
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:135)
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:95)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2066)
        at android.os.Handler.dispatchMessage(Handler.java:106)
        at android.os.Looper.loop(Looper.java:223)
        at android.app.ActivityThread.main(ActivityThread.java:7656)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)
    I/Process: Sending signal. PID: 14488 SIG: 9

    opened by PascalPolygon 11
  • Instance creation fails on macOS with recent Vulkan SDK

    Running on macOS with Vulkan SDK 1.3.231, Manager::createInstance() fails silently when calling:

    vk::createInstance(&computeInstanceCreateInfo, nullptr, this->mInstance.get());
    

    Perhaps it should check the return code, which in this case is VK_ERROR_INCOMPATIBLE_DRIVER. It seems that recent SDKs require the VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME extension to be enabled on macOS:

    applicationExtensions.push_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);
    computeInstanceCreateInfo.flags |= vk::InstanceCreateFlagBits::eEnumeratePortabilityKHR;
    ...
    vk::createInstance(&computeInstanceCreateInfo, nullptr, this->mInstance.get());
    
    opened by Archie3d 0
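The reporter's workaround can be generalised: enable the portability enumeration extension only when the loader advertises it, set the matching instance-creation flag, and turn a failed result into an error instead of continuing silently. The sketch below is a minimal, header-free illustration of that logic, not Kompute's actual fix. It assumes the list of available extensions would come from vkEnumerateInstanceExtensionProperties. It mirrors four Vulkan values locally: "VK_KHR_portability_enumeration", VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR = 0x1, VK_SUCCESS = 0 and VK_ERROR_INCOMPATIBLE_DRIVER = -9. The names InstanceConfig, configurePortability and checkInstanceResult are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Values mirrored from the Vulkan headers so this sketch compiles without them.
constexpr const char* kPortabilityEnumerationExt = "VK_KHR_portability_enumeration";
constexpr uint32_t kEnumeratePortabilityBit = 0x00000001;  // VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR
constexpr int32_t kVkSuccess = 0;                          // VK_SUCCESS
constexpr int32_t kVkErrorIncompatibleDriver = -9;         // VK_ERROR_INCOMPATIBLE_DRIVER

// Hypothetical holder for the parts of VkInstanceCreateInfo this fix touches.
struct InstanceConfig {
    std::vector<std::string> extensions;
    uint32_t flags = 0;
};

// Enable portability enumeration only if the loader reports the extension
// (e.g. on macOS, where MoltenVK is exposed as a portability driver).
InstanceConfig configurePortability(const std::vector<std::string>& available,
                                    InstanceConfig config) {
    bool supported = std::find(available.begin(), available.end(),
                               kPortabilityEnumerationExt) != available.end();
    if (supported) {
        config.extensions.push_back(kPortabilityEnumerationExt);
        config.flags |= kEnumeratePortabilityBit;
    }
    return config;
}

// Surface the result of instance creation instead of failing silently.
void checkInstanceResult(int32_t result) {
    if (result == kVkSuccess) return;
    if (result == kVkErrorIncompatibleDriver) {
        throw std::runtime_error(
            "vkCreateInstance returned VK_ERROR_INCOMPATIBLE_DRIVER; "
            "on macOS, enable VK_KHR_portability_enumeration");
    }
    throw std::runtime_error("vkCreateInstance failed with VkResult " +
                             std::to_string(result));
}
```

Enabling the extension conditionally matters because older loaders do not know it. Requesting an unknown instance extension makes vkCreateInstance fail with VK_ERROR_EXTENSION_NOT_PRESENT.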
Releases (v0.8.1)
  • v0.8.1 (Apr 13, 2022)

    v0.8.1

    Closed issues:

    • Discord link in README and docs is broken #276
    • Website examples typos and 6500 XT unknown GPU #275
    • [Question] How to disable all logs? #274
    • full diagram 404 #271
    • Error when enabling KOMPUTE_ENABLE_SPDLOG #268
    • Add KOMPUTE_LOG_ACTIVE_LEVEL instead of current SPDLOG_ACTIVE_LEVEL #267
    • Update/Fix Android project #264
    • Update compileSource function in examples/docs to correct one #261
    • Technically can Kompute be modified to support data visualization? #260
    • Data-transfer for Integrated GPU #258
    • Python "getting started" example fails #252
    • Python example in README doesn't work #248
    • Running Android app #234

  • v0.8.0 (Sep 16, 2021)

    v0.8.0

    Closed issues:

    • An unset KOMPUTE_ENV_DEBUG_LAYERS leads KP_LOG_DEBUG to pass envLayerNamesVal==nullptr #245
    • Extend utils shader helpers in test for windows #240
    • Python segfaults after import kp #230
    • Simple and extended python examples do not work (v 0.7.0) #228
    • Python macOS issue (ImportError: dlopen(...): no suitable image found. Did find: ...: mach-o, but wrong architecture) #223
    • Python macOS issue (Symbol not found: __PyThreadState_Current ... Expected in: flat namespace) #221
    • Finalise Migration of Kompute into Linux Foundation #216
    • CMake Error: Imported target "kompute::kompute" includes non-existent path "/usr/local/single_include" #212
    • Incompatibility introduced with #168 on Vulkan 1.1.x #209
    • external libraries #201
    • Start a Slack group or Discord as an alternative / faster way of asking questions #198
    • Test SingleSequenceRecord is not thread safe and fails in AMD card #196
    • Update Kompute headers to reference the glslang headers for install vs build interfaces #193
    • Integrate with GLSLang find_package file when issue is resolved in the glslang repo #191
    • Release 0.7.0 #187
    • Get number of available devices #185
    • Deep Learning Convolutional Neural Network (CNN) example implementation #162
    • Create example compiling and running in raspberry pi with Mesa Vulkan drivers #131
    • Add support for VK_EXT_debug_utils labels #110

  • v0.7.0 (Mar 14, 2021)

    Release v0.7.0

    The 0.7.0 release introduces an extensive list of features. A high-level overview:

    • Support for push constants
    • Support for specialisation constants
    • Support for tensor data types bool, float, double, int32 and uint32
    • Ability to define Operations outside manager
    • Ability to create Algorithm outside manager
    • New OpMemoryBarrier to add custom barriers
    • New OpAlgoDispatch to dispatch algorithm with push constants
    • New interface for sequences
    • New memory relationships, all managed by the top-level Manager with weak references, allowing smart pointers to control object lifetimes
    • Code coverage metrics using gcov + lcov
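The ownership model described above, where the top-level manager holds only weak references and the caller's smart pointers decide when an object is destroyed, is a standard C++ pattern. The sketch below assumes simplified stand-in types. Manager, Tensor, tensor() and liveTensors() are hypothetical and much simpler than Kompute's real classes, and clearing host data in the destructor is only a placeholder for freeing GPU memory.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU resource such as a tensor.
struct Tensor {
    std::vector<float> data;
    explicit Tensor(std::vector<float> d) : data(std::move(d)) {}
};

// The manager creates objects but keeps only weak references to them,
// so the caller's shared_ptr decides when an object is destroyed.
class Manager {
public:
    std::shared_ptr<Tensor> tensor(std::vector<float> data) {
        auto t = std::make_shared<Tensor>(std::move(data));
        mTensors.push_back(t);  // stored as weak_ptr: does not extend the lifetime
        return t;
    }

    // Drop expired references and report how many objects are still alive.
    std::size_t liveTensors() {
        std::vector<std::weak_ptr<Tensor>> alive;
        for (const auto& w : mTensors) {
            if (!w.expired()) alive.push_back(w);
        }
        mTensors.swap(alive);
        return mTensors.size();
    }

    // On shutdown the manager can still reach surviving objects to release them.
    ~Manager() {
        for (const auto& w : mTensors) {
            if (auto t = w.lock()) t->data.clear();  // placeholder for freeing GPU memory
        }
    }

private:
    std::vector<std::weak_ptr<Tensor>> mTensors;
};
```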

    Implemented enhancements:

    • Extend non-spdlog print functions to use std::format #158
    • Add code coverage reports with codecov #145
    • Explore removing std::vector mData; completely from Tensor in favour of always storing data in hostVisible buffer memory (TBC) #144
    • Update all examples to match breaking changes in 0.7.0 #141
    • Avoid copy when returning python numpy / array #139
    • Cover all Python & C++ tests in CI #121
    • Add C++ Test for Simple Work Groups Example #117
    • Expose push constants in OpAlgo #54
    • Expose ability to create barriers in OpTensor operations #45
    • Create delete function in manager to free / destroy sequence #36
    • Make specialisation data extensible #12
    • Support multiple types for Kompute Tensors #2
    • Added re-record sequence functionality and updated docs #171 (axsaucedo)
    • Extend non-spdlog print functions to use fmt::format / fmt::print #159 (axsaucedo)
    • Added support for custom SpecializedConstants and removed KomputeWorkgroup class #151 (axsaucedo)
    • Added destroy functions for tensors and sequences (named and object) #146 (axsaucedo)
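Supporting several element types for tensors (#2, and TensorT in #177) is commonly done by mapping each C++ type to a runtime tag at compile time. That way, code paths that only see untyped memory still know what a buffer holds. The sketch shows that technique only. DataType, DataTypeOf and TypedBuffer are hypothetical names, not Kompute's definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical runtime tag for the element types listed in the release notes.
enum class DataType { eBool, eInt, eUnsignedInt, eFloat, eDouble };

// The primary template is declared but never defined, so using an
// unsupported element type is a compile-time error.
template <typename T> struct DataTypeOf;
template <> struct DataTypeOf<bool>     { static constexpr DataType value = DataType::eBool; };
template <> struct DataTypeOf<int32_t>  { static constexpr DataType value = DataType::eInt; };
template <> struct DataTypeOf<uint32_t> { static constexpr DataType value = DataType::eUnsignedInt; };
template <> struct DataTypeOf<float>    { static constexpr DataType value = DataType::eFloat; };
template <> struct DataTypeOf<double>   { static constexpr DataType value = DataType::eDouble; };

// Typed container: the element type is known to the compiler, while
// dataType() exposes the tag to code that only handles untyped memory.
template <typename T>
class TypedBuffer {
public:
    explicit TypedBuffer(std::vector<T> data) : mData(std::move(data)) {}
    DataType dataType() const { return DataTypeOf<T>::value; }
    std::size_t size() const { return mData.size(); }
    const std::vector<T>& data() const { return mData; }

private:
    std::vector<T> mData;
};
```

A real GPU-backed version needs one caveat. std::vector<bool> is bit-packed and has no data() member, so bool tensors need a different storage strategy before they can be copied into buffer memory.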

    Fixed bugs:

    • push_constant not working in my case? #168
    • DescriptorPool set is not being freed #155
    • Updated memory barriers to include staging buffers #182 (axsaucedo)
    • Adds push const ranges in pipelinelayout to fix #168 #174 (axsaucedo)
    • Added destructor for staging tensors #134 (axsaucedo)

    Closed issues:

    • Update memory barriers to align with tensor staging/primary memory revamp #181
    • Move shader defaultResource inside kp::Shader class #175
    • Reach at least 90% code coverage on tests #170
    • Add functionality to re-record sequence as now it's possible to update the underlying algorithm #169
    • Use numpy arrays as default return value #166
    • Update all shared_ptr value passes to be by ref or const ref #161
    • Amend memory hierarchy for kp::Operations so they can be created separately #160
    • Customise theme of documentation #156
    • Remove KomputeWorkgroup class in favour of std::array<uint32_t, 3> #152
    • Passing raw GLSL string to Shader Module deprecated so remove this method from supported approach #150
    • Add python backwards compatibility for eval_tensor_create_def #147
    • Document breaking changes for 0.7.0 #140
    • Tensor memory management and memory hierarchy redesign #136
    • Staging tensor GPU memory is not freed as part of OpCreateTensor removal #133
    • eStorage Tensors are currently unusable as OpTensorCreate calls mapDataIntoHostMemory #132
    • 0.6.0 Release #126
    • java.lang.UnsatisfiedLinkError: dlopen failed: library "libkompute-jni.so" not found #125
    • Initial exploration: Include explicit GLSL to SPIRV compilation #107
    • Add support for push constants #106

    Merged pull requests:

    • Resolve moving all functions from tensor HPP to CPP #186 (axsaucedo)
    • Device Properties #184 (alexander-g)
    • Too many warnings #183 (alexander-g)
    • Add support for bool, double, int32, uint32 and float32 on Tensors via TensorT #177 (axsaucedo)
    • Support for Timestamping #176 (alexander-g)
    • Test for ShaderResources #165 (aliPMPAINT)
    • Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164 (axsaucedo)
    • made changes for include paths for complete installation #163 (aliPMPAINT)
    • Added dark mode on docs #157 (axsaucedo)
    • Glslang implementation for online shader compilation #154 (axsaucedo)
    • Adding test code coverage using gcov and lcov #149 (axsaucedo)
    • Added temporary backwards compatibility for eval_tensor_create_def function #148 (axsaucedo)
    • Amend memory ownership hierarchy to have Tensor owned by Manager instead of OpCreateTensor / OpBase #138 (axsaucedo)
    • Removed Staging Tensors in favour of having two buffer & memory in a Tensor to minimise data transfer #137 (axsaucedo)
  • v0.6.0 (Jan 31, 2021)

    v0.6.0

    Fixed bugs:

    • [PYTHON] Support string parameter instead of list for eval_algo_data when passing raw shader as string #93
    • [PYTHON] Fix log_level on the python implementation (using pybind's logging functions) #92

    Closed issues:

    • Add documentation for custom operations #128
    • Numpy Array Support and Work Group Configuration in Python Kompute #124
    • Remove references to spdlog in python module #122
    • Setup automated CI testing for PRs using GitHub actions #114
    • Python example type error (pyshader). #111
    • Update all references to operations to not use template #101
    • Getting an undefined reference error while creating a Kompute Manager #100

  • v0.5.1 (Nov 14, 2020)

  • v0.5.0 (Nov 8, 2020)

    • Migrated all OpAlgoBase components to use dynamic layouts instead of templates #26, #57
    • Updated all examples to use spir-v bytes by default #86
    • Added compatibility for Vulkan Headers HPP v1.2.154+ #84
    • Added Python PyPI package for Kompute #87
    • Added Python interface functions to process Python SPIR-V bytes directly
    • Added a Logistic Regression implementation in Python
    • Extended examples to showcase using pyshader for more Pythonic GPU development
    • Enabled spdlog builds by default on python package
    • Added multi-platform python package installs via pypi https://pypi.org/project/kp/
    • Added log level config functionality in python
    • Added Python Bindings for Kompute library
    • Added Python tests showcasing core functionality using Manager and Sequences
    • Added documentation integrating Python class references (to be automated)
    • Changed sequences to be returned as shared_ptr instead of weak_ptr
    • Added sequence memory management via init member function
    • Added explicit definition of VulkanDestroy functions for VulkanHPP 1.2.155-1.2.158 compat
    • Removed template parameters from OpAlgoBase functions (added Op*.cpp files)
    • Added python build to main cmake file
    • Added pybind11 submodule
    • Added Sequence tests to verify memory management via init member function
    • Update e2e examples
    • Add Python documentation and further examples
  • v0.4.1 (Nov 1, 2020)

    • Support for Vulkan HPP version 1.2.155 and above (tested with every version until 1.2.158)
    • Updated linux docker image to test multiple versions of Vulkan SDK
    • CCLS support for submodule builds (besides vcpkg builds)
    • Submodules for core dependencies for flexibility when testing dependencies with particular versions
    • Build flags to configure submodule builds vs vcpkg (toolchain based) builds
    • Removed range prints which removes explicit dependency on fmt
    • Syntax correction of private/protected member variables
    • AUR package added by contributor https://aur.archlinux.org/packages/vulkan-kompute-git/ via #81
  • v0.4.0 (Oct 18, 2020)

    • Added async/await capabilities for manager and sequence
    • Added fence resource as member object for Sequence
    • Ensure sequence begin() function clears previous operations
    • Ensure begin() does not get called if the sequence is in a running state
    • Ensure manager creates a new sequence when default functions are called
    • Added capabilities for multi queue support in manager with explicit allocation on sequences
    • Fixed compile warnings on Linux (ubuntu)
    • Added createManagedSequence on manager with ability for default sequence name to be created
    • Added tests for asynchronous and parallel processing
    • Added LogisticRegression shader as cpp header
    • Added documentation on advanced examples
    • Added documentation on shader-to-cpp-header scripts
    • Added CNAME for kompute.cc domain
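The async/await support added in 0.4.0 follows a submit-then-wait shape: start the work, do something else, then block on the sequence's fence. Below is a CPU-only analogue. It assumes std::async stands in for queue submission and std::future::get() stands in for waiting on the fence. FakeSequence is hypothetical, and its evalAsync/evalAwait methods only echo the pattern, not Kompute's signatures.

```cpp
#include <cstddef>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>

// CPU-only analogue of a sequence that can run asynchronously.
class FakeSequence {
public:
    // Start the work without blocking, like submitting a command buffer.
    void evalAsync(std::vector<float> a, std::vector<float> b) {
        if (mPending.valid()) throw std::logic_error("sequence already running");
        if (a.size() != b.size()) throw std::invalid_argument("input sizes differ");
        mPending = std::async(std::launch::async,
                              [a = std::move(a), b = std::move(b)]() {
                                  std::vector<float> out(a.size());
                                  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
                                  return out;
                              });
    }

    // Block until the work finishes, like waiting on the sequence's fence.
    std::vector<float> evalAwait() {
        if (!mPending.valid()) throw std::logic_error("nothing to await");
        return mPending.get();  // get() also invalidates the future, freeing the sequence
    }

private:
    std::future<std::vector<float>> mPending;
};
```

Rejecting a second evalAsync() while work is pending mirrors the release note about not calling begin() on a running sequence.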
  • v0.3.2 (Oct 4, 2020)

    • Enabled dynamic loading of Vulkan library for Vulkan support
    • Added Android NDK Wrapper with Vulkan HPP support
    • Updated the minimum C++ standard in CMake to C++14
    • Lowered the minimum CMake requirement to 3.4.1
    • Added new shader file as shaderlogisticregression as part of AggregateHeader
    • Added #pragma once guard to Kompute.hpp single header
    • Updated createComputePipeline to return Result instead of ResultValue for backwards compatibility
    • Added new build flags for Android, including:
      • KOMPUTE_OPT_INSTALL
      • KOMPUTE_OPT_ANDROID_BUILD
      • KOMPUTE_OPT_DISABLE_VK_DEBUG_LAYERS
      • KOMPUTE_VK_API_VERSION
      • KOMPUTE_VK_API_MINOR_VERSION
      • KOMPUTE_VK_API_MAJOR_VERSION
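The KOMPUTE_VK_API_* flags ultimately have to produce a single Vulkan API version number. Vulkan packs versions into 32 bits as major << 22 | minor << 12 | patch, which is the VK_MAKE_VERSION layout. The newer VK_MAKE_API_VERSION adds a variant field in the top 3 bits, and that field is 0 for Vulkan. The sketch assumes the major and minor flags are combined this way. makeApiVersion and the apiMajor/apiMinor/apiPatch helpers are hypothetical.

```cpp
#include <cstdint>

// Same bit layout as VK_MAKE_VERSION: major in the high bits,
// 10 bits of minor, 12 bits of patch.
constexpr uint32_t makeApiVersion(uint32_t major, uint32_t minor, uint32_t patch = 0) {
    return (major << 22) | (minor << 12) | patch;
}

// Inverse helpers, matching VK_VERSION_MAJOR / MINOR / PATCH.
constexpr uint32_t apiMajor(uint32_t version) { return version >> 22; }
constexpr uint32_t apiMinor(uint32_t version) { return (version >> 12) & 0x3FFu; }
constexpr uint32_t apiPatch(uint32_t version) { return version & 0xFFFu; }
```

For example, makeApiVersion(1, 2) yields 4202496, the same value as VK_API_VERSION_1_2.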
  • v0.3.1 (Sep 20, 2020)

  • v0.3.0 (Sep 13, 2020)

    • #50 - Added documentation and testing for reusing recorded commands
    • #19 - Added Machine Learning example with Kompute
    • #40 - Provide further granularity for handling tensor data
    • #39 - Create OpTensorSyncLocal and OpTensorSyncDevice operations
    • #43 - Renamed OpCreateTensor to OpTensorCreate to align with tensor operations
    • #43 - Fixed OpTensorCreate not mapping data for host tensors
    • #47 - Add preSubmit function to OpBase to account for multiple eval commands
    • #58 - Add standalone examples that show using Kompute from scratch
    • #58 - Make Kompute installable locally
    • #56 - Remove OpAlgoBase copy tensor functionality to delegate to OpTensorSync
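Reusing recorded commands (#50) means recording a command buffer once and submitting it many times. Below is a CPU-only sketch of that record-once, replay-many idea. It assumes std::function callbacks stand in for Vulkan commands. RecordedSequence and its begin/record/end/eval methods are hypothetical, not Kompute's Sequence API.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// CPU analogue of a command buffer: operations are recorded once and the
// same recording is replayed on every eval().
class RecordedSequence {
public:
    // Starting a new recording discards previously recorded operations.
    void begin() {
        mOps.clear();
        mRecording = true;
    }

    void record(std::function<void()> op) {
        if (!mRecording) throw std::logic_error("record() called outside begin()/end()");
        mOps.push_back(std::move(op));
    }

    void end() { mRecording = false; }

    // Replays the existing recording without re-recording it.
    void eval() const {
        if (mRecording) throw std::logic_error("cannot eval while recording");
        for (const auto& op : mOps) op();
    }

    std::size_t size() const { return mOps.size(); }

private:
    std::vector<std::function<void()>> mOps;
    bool mRecording = false;
};
```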
  • v0.2.0 (Sep 5, 2020)

    • #18 - Improve access to underlying data for speed and ease of access
    • #17 - Enabled for compute shaders to be passed as files or strings
    • #13 - Enable OpCreateTensor to receive more than 1 tensor
    • #11 - Add default specialisation data to algorithm holding all tensor sizes
    • #9 - Added documentation automated with doxygen and sphinx
    • #15 - Added memory profiling and explicitly ensured there are no memory leaks
    • #30 - Removed spdlog as required dependency and into optional
    • #37 - Migrated to GTest
  • v0.1.0 (Aug 29, 2020)

    Vulkan Kompute Release 0.1.0

    The 0.1.0 release of Vulkan Kompute, the General Purpose Vulkan Compute Framework.

    Features

    • Automatic detection and selection of physical GPU device
    • Single header import with dynamic library available for easy integration
    • Uses vcpkg with optional manifest to download / fetch dependencies
    • Provides high level kompute interface via Manager and Sequence
    • GPU processing data abstracted with Tensors
    • Compute shaders and pipelines are abstracted via algorithms
    • Base operations, including tensor creation and multiplication, as examples
    • Ability to initialise components with external Vulkan resources
    • Conversion of GLSL shaders into SPIR-V and native HPP files for static building
    • DEBUG option for verbose builds that cover execution path in detail with SPDLOG
    • Docker image to build the library in linux environments
    • CMake support, tested on Linux and Windows
    • Documentation containing autodoc with Doxygen using Sphinx
    • Unit and integration testing using Catch2
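Embedding SPIR-V in a generated HPP file usually means storing the compiled bytes as an array and handing Vulkan a stream of 32-bit words. Every SPIR-V module starts with the magic number 0x07230203 and its size is a multiple of 4 bytes, so a loader can sanity-check both. The sketch assumes little-endian input; in a big-endian file the magic number would appear byte-swapped. bytesToSpirvWords is a hypothetical helper, not the script Kompute uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr uint32_t kSpirvMagic = 0x07230203;  // first word of every SPIR-V module

// Pack a little-endian SPIR-V byte stream (e.g. a compiled .spv file, or a byte
// array embedded in a generated header) into the 32-bit words Vulkan expects.
std::vector<uint32_t> bytesToSpirvWords(const std::vector<uint8_t>& bytes) {
    if (bytes.size() % 4 != 0) {
        throw std::invalid_argument("SPIR-V size must be a multiple of 4 bytes");
    }
    std::vector<uint32_t> words(bytes.size() / 4);
    for (std::size_t i = 0; i < words.size(); ++i) {
        words[i] = static_cast<uint32_t>(bytes[4 * i]) |
                   (static_cast<uint32_t>(bytes[4 * i + 1]) << 8) |
                   (static_cast<uint32_t>(bytes[4 * i + 2]) << 16) |
                   (static_cast<uint32_t>(bytes[4 * i + 3]) << 24);
    }
    if (words.empty() || words[0] != kSpirvMagic) {
        throw std::invalid_argument("missing SPIR-V magic number (or big-endian input)");
    }
    return words;
}
```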
Owner
The Institute for Ethical Machine Learning
The Institute for Ethical Machine Learning is a think-tank that brings together technology leaders, policymakers and academics to develop standards for ML.