Several simple examples for popular neural network toolkits calling custom CUDA operators.

WeiYang

Last update: Jan 1, 2023

Related tags

Deep Learning python neural-network cpp tensorflow cuda pytorch

Overview

Neural Network CUDA Example

Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators.

We provide three ways to compile the CUDA kernels and their C++ wrappers: JIT, setuptools, and CMake.

We also provide Python scripts that call the CUDA kernels, covering kernel timing and model training.

For more accurate timing, run the code under nvprof or nsys.
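
As a rough illustration of host-side kernel timing, the sketch below times a callable with warm-up runs and an explicit synchronization hook. The names time_call and sync are hypothetical and are not taken from this repository's time.py. With PyTorch, sync would typically be torch.cuda.synchronize, because CUDA kernel launches are asynchronous. A profiler such as nvprof or nsys remains the more accurate option.

```python
import time


def time_call(fn, sync=lambda: None, warmup=3, repeats=10):
    """Return the mean wall-clock time of fn() in milliseconds.

    sync is called before the timer starts and before it stops, so that
    asynchronous work (for example queued CUDA kernels) has finished.
    """
    for _ in range(warmup):  # warm-up runs are not measured
        fn()
    samples = []
    for _ in range(repeats):
        sync()
        start = time.perf_counter()
        fn()
        sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return sum(samples) / len(samples)


if __name__ == "__main__":
    # CPU-only stand-in workload; replace it with a call to the custom operator.
    print(f"{time_call(lambda: sum(range(10000))):.3f} ms")
```

Without the synchronization calls, the timer would mostly measure the kernel launch overhead rather than the kernel itself.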

Environments

NVIDIA Driver: 418.116.00
CUDA: 11.0
Python: 3.7.3
PyTorch: 1.7.0+cu110
TensorFlow: 2.4.1
CMake: 3.16.3
Ninja: 1.10.0
GCC: 8.3.0

The examples are not guaranteed to run in other environments.

Code structure

├── include
│   └── add2.h # header file of add2 cuda kernel
├── kernel
│   └── add2_kernel.cu # add2 cuda kernel
├── pytorch
│   ├── add2_ops.cpp # torch wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and torch
│   ├── train.py # training using custom cuda kernel
│   ├── setup.py
│   └── CMakeLists.txt
├── tensorflow
│   ├── add2_ops.cpp # tensorflow wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and tensorflow
│   ├── train.py # training using custom cuda kernel
│   └── CMakeLists.txt
├── LICENSE
└── README.md

PyTorch

Compile cpp and cuda

JIT
No separate build step is needed. Run the Python scripts directly; the extension is compiled when the script loads it.
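
A minimal sketch of what the JIT path could look like, assuming the scripts use torch.utils.cpp_extension.load (the tracebacks in the Comments section point to it) and that the repository root is the working directory. The helper names jit_load_kwargs and jit_load are hypothetical; the file paths follow the code structure above.

```python
import os


def jit_load_kwargs(repo_root="."):
    """Build keyword arguments for torch.utils.cpp_extension.load."""
    return {
        "name": "add2",
        # Directory containing add2.h, so that #include "add2.h" resolves.
        "extra_include_paths": [os.path.join(repo_root, "include")],
        "sources": [
            os.path.join(repo_root, "pytorch", "add2_ops.cpp"),
            os.path.join(repo_root, "kernel", "add2_kernel.cu"),
        ],
        "verbose": True,
    }


def jit_load(repo_root="."):
    # torch is imported lazily so the helper above works without PyTorch.
    from torch.utils.cpp_extension import load

    return load(**jit_load_kwargs(repo_root))
```

If the include path does not point at the directory that holds add2.h, compilation stops with "fatal error: add2.h: No such file or directory".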

Setuptools

python3 pytorch/setup.py install

CMake

mkdir build
cd build
cmake ../pytorch
make
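
The tracebacks in the Comments section show the scripts loading build/libadd2.so relative to the current working directory, so starting them from another directory fails with "cannot open shared object file". A small check such as the following sketch can turn that into a clearer error. find_op_library is a hypothetical helper, not part of the repository.

```python
import os


def find_op_library(build_dir="build", filename="libadd2.so"):
    """Return the absolute path of the built library or raise a clear error."""
    path = os.path.abspath(os.path.join(build_dir, filename))
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; run cmake and make in {build_dir!r} first, "
            "and start the script from the directory that contains it"
        )
    return path
```

The returned path could then be passed to torch.ops.load_library (or to tf.load_op_library in the TensorFlow example).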

Run python

Compare kernel running time

python3 pytorch/time.py --compiler jit
python3 pytorch/time.py --compiler setup
python3 pytorch/time.py --compiler cmake

Train model

python3 pytorch/train.py --compiler jit
python3 pytorch/train.py --compiler setup
python3 pytorch/train.py --compiler cmake
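
The --compiler flag selects how the extension is loaded. The sketch below shows one way such a switch might be written. It is an assumption about the scripts' structure, not a copy of them: load_backend is a hypothetical name, the default value is chosen only for illustration, the setup branch relies on the add2 module installed by setup.py, and the cmake branch relies on build/libadd2.so and the add2 operator namespace seen in the Comments section.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # The default is an assumption made for this sketch.
    parser.add_argument("--compiler", choices=["jit", "setup", "cmake"], default="jit")
    return parser.parse_args(argv)


def load_backend(compiler):
    """Load the add2 extension using the selected build method."""
    import torch  # must be imported before add2 (see F.A.Q)

    if compiler == "jit":
        from torch.utils.cpp_extension import load

        return load(
            name="add2",
            extra_include_paths=["include"],
            sources=["pytorch/add2_ops.cpp", "kernel/add2_kernel.cu"],
            verbose=True,
        )
    if compiler == "setup":
        import add2  # installed by: python3 pytorch/setup.py install

        return add2
    if compiler == "cmake":
        torch.ops.load_library("build/libadd2.so")
        return torch.ops.add2
    raise ValueError(f"unknown compiler: {compiler}")


if __name__ == "__main__":
    args = parse_args()
    module = load_backend(args.compiler)
```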

TensorFlow

Compile cpp and cuda

CMake

mkdir build
cd build
cmake ../tensorflow
make

Run python

Compare kernel running time

python3 tensorflow/time.py --compiler cmake

Train model

python3 tensorflow/train.py --compiler cmake

Implementation details (in Chinese)

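Tutorial on custom CUDA operators in PyTorch, with running-time analysis (PyTorch自定义CUDA算子教程与运行时间分析)
Three ways to compile and call custom CUDA operators in PyTorch, explained in detail (详解PyTorch编译并调用自定义CUDA算子的三种方式)
How to write a custom backward pass in PyTorch, in three minutes (三分钟教你如何PyTorch自定义反向传播)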

F.A.Q

Q. ImportError: libc10.so: cannot open shared object file: No such file or directory

A. Import torch before importing add2. Importing torch first loads libc10.so and the other PyTorch shared libraries that add2 depends on.

Comments

ERROR: expected constructor, destructor, or type conversion before ‘(’ token
Thanks for sharing. When I compile the PyTorch project with JIT or setuptools (e.g., python3 pytorch/setup.py install), I get the following error:

/home/chenxingyu/Documents/NN-CUDA-Example/pytorch/add2_ops.cpp:20:14: error: expected constructor, destructor, or type conversion before ‘(’ token
 TORCH_LIBRARY(add2, m) {

Could you help me solve it?
opened by SeanChenxy 2
tf2.3 cuda10.1 tf.load_op_library("build/libadd2.so") error

Traceback (most recent call last):
  File "tensorflow/time.py", line 57, in <module>
    cuda_module = tf.load_op_library('build/libadd2.so')
  File "/home/guowei/anaconda3/envs/gy_py3.7_tf2.3/lib/python3.7/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: build/libadd2.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

opened by calliwen 2
RuntimeError: CUDA error: an illegal memory access was encountered ===== ?

I modified train.py and now get this error: RuntimeError: CUDA error: an illegal memory access was encountered

(My environment is fine; the original train.py works.)

Do you have any suggestions? Thank you!

opened by Arsmart123 1
A compilation problem about setup.py

The following problem occurs when I compile:

running install
running bdist_egg
running egg_info
writing add2.egg-info\PKG-INFO
writing dependency_links to add2.egg-info\dependency_links.txt
writing top-level names to add2.egg-info\top_level.txt
reading manifest file 'add2.egg-info\SOURCES.txt'
writing manifest file 'add2.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
D:\miniconda3\envs\torch-gpu\lib\site-packages\torch\utils\cpp_extension.py:304: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'add2' extension
Emitting ninja build file D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.10.2.git.kitware.jobserver-1
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:D:\miniconda3\envs\torch-gpu\lib\site-packages\torch\lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib/x64" /LIBPATH:D:\miniconda3\envs\torch-gpu\libs /LIBPATH:D:\miniconda3\envs\torch-gpu\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib torch_cuda_cu.lib torch_cuda_cpp.lib /EXPORT:PyInit_add2 D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\pytorch/add2_ops.obj D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\kernel/add2_kernel.obj /OUT:build\lib.win-amd64-3.8\add2.cp38-win_amd64.pyd /IMPLIB:D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\pytorch\add2.cp38-win_amd64.lib
LINK : fatal error LNK1181: cannot open input file "D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\pytorch\add2_ops.obj"
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe' failed with exit status 1181

The environment configuration is as follows: pytorch1.8.1+cuda11.1 python3.8

How can I solve this problem?

opened by Richard9797 0
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.(Using jit)

My environment is exactly the same, but when I use the JIT method with PyTorch, the following error appears:

subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

Any suggestions? Thank you!

opened by Arsmart123 1
CMake error; the other two compile methods succeeded

Checking whether the CUDA compiler is NVIDIA using "" did not match "nvcc: NVIDIA (R) Cuda compiler driver":
Checking whether the CUDA compiler is Clang using "" did not match "(clang version)":
Compiling the CUDA compiler identification source file "CMakeCUDACompilerId.cu" failed.
Compiler: /usr/local/cuda/bin/nvcc
Build flags:
Id flags: -v

The output was:
No such file or directory


Checking whether the CUDA compiler is NVIDIA using "" did not match "nvcc: NVIDIA (R) Cuda compiler driver":
Checking whether the CUDA compiler is Clang using "" did not match "(clang version)":
Checking whether the CUDA compiler is NVIDIA using "" did not match "nvcc: NVIDIA (R) Cuda compiler driver":
Checking whether the CUDA compiler is Clang using "" did not match "(clang version)":
Performing C++ SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
Change Dir: /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/make -f Makefile cmTC_6d839/fast && /usr/bin/make -f CMakeFiles/cmTC_6d839.dir/build.make CMakeFiles/cmTC_6d839.dir/build
make[1]: Entering directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_6d839.dir/src.cxx.o
/usr/bin/c++ -DCMAKE_HAVE_LIBC_PTHREAD -o CMakeFiles/cmTC_6d839.dir/src.cxx.o -c /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp/src.cxx
Linking CXX executable cmTC_6d839
/home/huyang02/anaconda3/lib/python3.6/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_6d839.dir/link.txt --verbose=1
/usr/bin/c++ -rdynamic CMakeFiles/cmTC_6d839.dir/src.cxx.o -o cmTC_6d839
CMakeFiles/cmTC_6d839.dir/src.cxx.o: In function `main':
src.cxx:(.text+0x3e): undefined reference to `pthread_create'
src.cxx:(.text+0x4a): undefined reference to `pthread_detach'
src.cxx:(.text+0x56): undefined reference to `pthread_cancel'
src.cxx:(.text+0x67): undefined reference to `pthread_join'
src.cxx:(.text+0x7b): undefined reference to `pthread_atfork'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_6d839.dir/build.make:98: recipe for target 'cmTC_6d839' failed
make[1]: *** [cmTC_6d839] Error 1
make[1]: Leaving directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
Makefile:127: recipe for target 'cmTC_6d839/fast' failed
make: *** [cmTC_6d839/fast] Error 2

Source file was:
#include <pthread.h>

static void* test_func(void* data) { return data; }

int main(void)
{
  pthread_t thread;
  pthread_create(&thread, NULL, test_func, NULL);
  pthread_detach(thread);
  pthread_cancel(thread);
  pthread_join(thread, NULL);
  pthread_atfork(NULL, NULL, NULL);
  pthread_exit(NULL);

  return 0;
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/make -f Makefile cmTC_308c5/fast && /usr/bin/make -f CMakeFiles/cmTC_308c5.dir/build.make CMakeFiles/cmTC_308c5.dir/build
make[1]: Entering directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
Building CXX object CMakeFiles/cmTC_308c5.dir/CheckFunctionExists.cxx.o
/usr/bin/c++ -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_308c5.dir/CheckFunctionExists.cxx.o -c /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CheckLibraryExists/CheckFunctionExists.cxx
Linking CXX executable cmTC_308c5
/home/huyang02/anaconda3/lib/python3.6/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_308c5.dir/link.txt --verbose=1
/usr/bin/c++ -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_308c5.dir/CheckFunctionExists.cxx.o -o cmTC_308c5 -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_308c5.dir/build.make:98: recipe for target 'cmTC_308c5' failed
make[1]: *** [cmTC_308c5] Error 1
make[1]: Leaving directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
Makefile:127: recipe for target 'cmTC_308c5/fast' failed
make: *** [cmTC_308c5/fast] Error 2

opened by yanghu819 0
Could you help me see why compilation fails with RuntimeError: Error building extension 'add2'? I also do not understand the later message: fatal error: add2.h: No such file or directory

Running the following JIT command in a Jupyter notebook gives the compilation error below:
python3 time.py --compiler jit

Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/add2/build.ninja...
Building extension module add2...
[1/2] c++ -MMD -MF add2_ops.o.d -DTORCH_EXTENSION_NAME=add2 -DTORCH_API_INCLUDE_EXTENSION_H -I/gby/NN-CUDA-Example-master/pytorch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /gby/NN-CUDA-Example-master/pytorch/add2_ops.cpp -o add2_ops.o
FAILED: add2_ops.o
c++ -MMD -MF add2_ops.o.d -DTORCH_EXTENSION_NAME=add2 -DTORCH_API_INCLUDE_EXTENSION_H -I/gby/NN-CUDA-Example-master/pytorch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /gby/NN-CUDA-Example-master/pytorch/add2_ops.cpp -o add2_ops.o
/gby/NN-CUDA-Example-master/pytorch/add2_ops.cpp:2:10: fatal error: add2.h: No such file or directory
 #include "add2.h"
          ^~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 960, in _build_extension_module
    check=True)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "time.py", line 56, in <module>
    verbose=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 658, in load
    is_python_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 827, in _jit_compile
    with_cuda=with_cuda)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 880, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 973, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'add2'

Running with the CMake option gives the following error:

Traceback (most recent call last):
  File "time.py", line 60, in <module>
    torch.ops.load_library("build/libadd2.so")
  File "/usr/local/lib/python3.6/dist-packages/torch/_ops.py", line 106, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /gby/NN-CUDA-Example-master/pytorch/build/libadd2.so: cannot open shared object file: No such file or directory

opened by Henry-Avery 1
torch.ops.load_library("build/libadd2.so") error

Traceback (most recent call last):
  File "time.py", line 60, in <module>
    torch.ops.load_library("build/libadd2.so")
  File "/home/gzy/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "/home/gzy/anaconda3/envs/pytorch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/gzy/NN-CUDA-Example/pytorch/build/libadd2.so: undefined symbol: THPVariableClass

opened by longzeyilang 4

Owner

WeiYang

WeChat official account 「算法码上来」 / ByteDance AI Lab / East China Normal University

