Several simple examples for popular neural network toolkits calling custom CUDA operators.

Overview

Neural Network CUDA Example

Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators.

We provide several ways to compile the CUDA kernels and their C++ wrappers: JIT, setuptools, and CMake.

We also provide Python scripts that call the CUDA kernels, covering kernel timing and model training.

For more accurate timing, run the code under a profiler such as nvprof or nsys.
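
When a profiler is not at hand, a rough in-script measurement is still possible. CUDA kernels launch asynchronously, so the clock should only be read after the device has been synchronized. The sketch below illustrates that pattern; `time_fn` and its `sync` parameter are hypothetical names, not part of this repository. With PyTorch one would pass `torch.cuda.synchronize` as `sync`.

```python
import time


def time_fn(fn, sync=lambda: None, warmup=3, repeats=10):
    """Return a list of per-call wall-clock times (in seconds) for fn.

    sync is called before and after each timed call. For GPU work it
    should block until the device is idle (assumption: with PyTorch,
    torch.cuda.synchronize); the default does nothing.
    """
    # Warm-up calls are not timed; they absorb one-time costs such as
    # JIT compilation or CUDA context creation.
    for _ in range(warmup):
        fn()

    times = []
    for _ in range(repeats):
        sync()                      # finish any pending work first
        start = time.perf_counter()
        fn()
        sync()                      # wait for the work launched by fn
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    samples = time_fn(lambda: sum(range(10000)))
    print(f"mean: {sum(samples) / len(samples) * 1e6:.1f} us")
```

Numbers obtained this way include Python and launch overhead, which is why a profiler remains the better tool for kernel-level timing.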

Environments

  • NVIDIA Driver: 418.116.00
  • CUDA: 11.0
  • Python: 3.7.3
  • PyTorch: 1.7.0+cu110
  • TensorFlow: 2.4.1
  • CMake: 3.16.3
  • Ninja: 1.10.0
  • GCC: 8.3.0

Other environments are not guaranteed to work.

Code structure

├── include
│   └── add2.h # header file of add2 cuda kernel
├── kernel
│   └── add2_kernel.cu # add2 cuda kernel
├── pytorch
│   ├── add2_ops.cpp # torch wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and torch
│   ├── train.py # training using custom cuda kernel
│   ├── setup.py
│   └── CMakeLists.txt
├── tensorflow
│   ├── add2_ops.cpp # tensorflow wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and tensorflow
│   ├── train.py # training using custom cuda kernel
│   └── CMakeLists.txt
├── LICENSE
└── README.md

PyTorch

Compile cpp and cuda

JIT
No separate build step is needed: just run the Python scripts, and the extension is compiled when it is first loaded.

Setuptools

python3 pytorch/setup.py install

CMake

mkdir build
cd build
cmake ../pytorch
make

Run python

Compare kernel running time

python3 pytorch/time.py --compiler jit
python3 pytorch/time.py --compiler setup
python3 pytorch/time.py --compiler cmake

Train model

python3 pytorch/train.py --compiler jit
python3 pytorch/train.py --compiler setup
python3 pytorch/train.py --compiler cmake
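
The `--compiler` flag selects how the compiled operator is loaded. The sketch below shows one way such a dispatch could look. It is pieced together from the file layout above and the tracebacks quoted in the Comments, not copied from the repository's scripts; `build_parser`, `load_add2`, the default value, and the exact source list are assumptions.

```python
import argparse


def build_parser():
    """Command-line parser with the three compile options used above."""
    parser = argparse.ArgumentParser()
    # The default is an assumption made for this sketch.
    parser.add_argument("--compiler", choices=["jit", "setup", "cmake"],
                        default="jit")
    return parser


def load_add2(compiler):
    """Load the add2 operator according to the chosen compile method."""
    # torch is imported first so that its shared libraries (libc10.so
    # and friends) are already loaded when the extension is opened.
    import torch

    if compiler == "jit":
        # Compile on the fly with ninja; paths are relative to the
        # repository root.
        from torch.utils.cpp_extension import load
        return load(name="add2",
                    sources=["pytorch/add2_ops.cpp", "kernel/add2_kernel.cu"],
                    extra_include_paths=["include"],
                    verbose=True)
    if compiler == "setup":
        # Installed beforehand with: python3 pytorch/setup.py install
        import add2
        return add2
    if compiler == "cmake":
        # Built beforehand with CMake into build/libadd2.so
        torch.ops.load_library("build/libadd2.so")
        return torch.ops.add2
    raise ValueError(f"unknown compiler: {compiler}")


if __name__ == "__main__":
    args = build_parser().parse_args()
    module = load_add2(args.compiler)
```

Because the JIT and CMake branches use relative paths, the script has to be started from the repository root, as in the commands above.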

TensorFlow

Compile cpp and cuda

CMake

mkdir build
cd build
cmake ../tensorflow
make

Run python

Compare kernel running time

python3 tensorflow/time.py --compiler cmake

Train model

python3 tensorflow/train.py --compiler cmake
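
The scripts refer to the CMake output by a relative path (`build/libadd2.so`), so they only find it when started from the repository root. A small helper can make a wrong working directory easier to diagnose. This is a sketch under that assumption; `find_op_library` is a hypothetical name, not part of the repository.

```python
from pathlib import Path


def find_op_library(relative="build/libadd2.so", start=None):
    """Return the absolute path of the compiled operator library.

    Looks for `relative` under `start` (default: the current working
    directory) and then under each parent directory in turn.
    """
    start = Path(start).resolve() if start is not None else Path.cwd().resolve()
    for base in (start, *start.parents):
        candidate = base / relative
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{relative} not found under {start} or any parent directory; "
        "build it with CMake first and run from the repository root"
    )
```

The returned path could then be converted with `str()` and handed to `tf.load_op_library` or `torch.ops.load_library` in place of the relative string.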

Implementation details (in Chinese)

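Tutorial on custom CUDA operators in PyTorch, with a running-time analysis
Three ways to compile and call custom CUDA operators in PyTorch, explained in detail
Custom backpropagation in PyTorch in three minutes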

FAQ

Q. ImportError: libc10.so: cannot open shared object file: No such file or directory

A. You must import torch before importing add2.

Comments
  • ERROR: expected constructor, destructor, or type conversion before ‘(’ token

    Thanks for sharing. When I compile the PyTorch project using JIT or Setuptools (e.g., python3 pytorch/setup.py install), I get the following error:

    /home/chenxingyu/Documents/NN-CUDA-Example/pytorch/add2_ops.cpp:20:14: error: expected constructor, destructor, or type conversion before ‘(’ token
     TORCH_LIBRARY(add2, m) {
    

    Could you help me solve it?

    opened by SeanChenxy 2
  • tf2.3 cuda10.1 tf.load_op_library("build/libadd2.so") error

    Traceback (most recent call last):
      File "tensorflow/time.py", line 57, in <module>
        cuda_module = tf.load_op_library('build/libadd2.so')
      File "/home/guowei/anaconda3/envs/gy_py3.7_tf2.3/lib/python3.7/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename)
    tensorflow.python.framework.errors_impl.NotFoundError: build/libadd2.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

    opened by calliwen 2
  • RuntimeError: CUDA error: an illegal memory access was encountered

    I have modified train.py, and now I get this error: RuntimeError: CUDA error: an illegal memory access was encountered

    (My environment is fine; the unmodified train.py works.)

    Would you provide any suggestion? Thank you!!!

    opened by Arsmart123 1
  • A compilation problem about setup.py

    This problem occurs when I compile

    running install
    running bdist_egg
    running egg_info
    writing add2.egg-info\PKG-INFO
    writing dependency_links to add2.egg-info\dependency_links.txt
    writing top-level names to add2.egg-info\top_level.txt
    reading manifest file 'add2.egg-info\SOURCES.txt'
    writing manifest file 'add2.egg-info\SOURCES.txt'
    installing library code to build\bdist.win-amd64\egg
    running install_lib
    running build_ext
    D:\miniconda3\envs\torch-gpu\lib\site-packages\torch\utils\cpp_extension.py:304: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
      warnings.warn(f'Error checking compiler version for {compiler}: {error}')
    building 'add2' extension
    Emitting ninja build file D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\build.ninja...
    Compiling objects...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    1.10.2.git.kitware.jobserver-1
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:D:\miniconda3\envs\torch-gpu\lib\site-packages\torch\lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib/x64" /LIBPATH:D:\miniconda3\envs\torch-gpu\libs /LIBPATH:D:\miniconda3\envs\torch-gpu\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib torch_cuda_cu.lib torch_cuda_cpp.lib /EXPORT:PyInit_add2 D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\pytorch/add2_ops.obj D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\kernel/add2_kernel.obj /OUT:build\lib.win-amd64-3.8\add2.cp38-win_amd64.pyd /IMPLIB:D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\pytorch\add2.cp38-win_amd64.lib
    LINK : fatal error LNK1181: cannot open input file "D:\git-bash\daima\NN-CUDA-Example\build\temp.win-amd64-3.8\Release\pytorch\add2_ops.obj"
    error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe' failed with exit status 1181

    The environment configuration is as follows: pytorch1.8.1+cuda11.1 python3.8

    How to solve this problem?

    opened by Richard9797 0
  • subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.(Using jit)

    The environment is exactly the same, but when I use the JIT method with PyTorch, the error below appears:

    subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

    Any suggestion? Thank you!!!!

    opened by Arsmart123 1
  • CMake error; the other two compile methods succeeded

    Checking whether the CUDA compiler is NVIDIA using "" did not match "nvcc: NVIDIA (R) Cuda compiler driver":
    Checking whether the CUDA compiler is Clang using "" did not match "(clang version)":
    Compiling the CUDA compiler identification source file "CMakeCUDACompilerId.cu" failed.
    Compiler: /usr/local/cuda/bin/nvcc
    Build flags:
    Id flags: -v

    The output was:
    No such file or directory

    Performing C++ SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
    Change Dir: /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/make -f Makefile cmTC_6d839/fast && /usr/bin/make -f CMakeFiles/cmTC_6d839.dir/build.make CMakeFiles/cmTC_6d839.dir/build
    make[1]: Entering directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_6d839.dir/src.cxx.o
    /usr/bin/c++ -DCMAKE_HAVE_LIBC_PTHREAD -o CMakeFiles/cmTC_6d839.dir/src.cxx.o -c /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp/src.cxx
    Linking CXX executable cmTC_6d839
    /home/huyang02/anaconda3/lib/python3.6/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_6d839.dir/link.txt --verbose=1
    /usr/bin/c++ -rdynamic CMakeFiles/cmTC_6d839.dir/src.cxx.o -o cmTC_6d839
    CMakeFiles/cmTC_6d839.dir/src.cxx.o: In function `main':
    src.cxx:(.text+0x3e): undefined reference to `pthread_create'
    src.cxx:(.text+0x4a): undefined reference to `pthread_detach'
    src.cxx:(.text+0x56): undefined reference to `pthread_cancel'
    src.cxx:(.text+0x67): undefined reference to `pthread_join'
    src.cxx:(.text+0x7b): undefined reference to `pthread_atfork'
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_6d839.dir/build.make:98: recipe for target 'cmTC_6d839' failed
    make[1]: *** [cmTC_6d839] Error 1
    make[1]: Leaving directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
    Makefile:127: recipe for target 'cmTC_6d839/fast' failed
    make: *** [cmTC_6d839/fast] Error 2

    Source file was:
    #include <pthread.h>

    static void* test_func(void* data) { return data; }

    int main(void)
    {
      pthread_t thread;
      pthread_create(&thread, NULL, test_func, NULL);
      pthread_detach(thread);
      pthread_cancel(thread);
      pthread_join(thread, NULL);
      pthread_atfork(NULL, NULL, NULL);
      pthread_exit(NULL);

      return 0;
    }

    Determining if the function pthread_create exists in the pthreads failed with the following output:
    Change Dir: /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/make -f Makefile cmTC_308c5/fast && /usr/bin/make -f CMakeFiles/cmTC_308c5.dir/build.make CMakeFiles/cmTC_308c5.dir/build
    make[1]: Entering directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_308c5.dir/CheckFunctionExists.cxx.o
    /usr/bin/c++ -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_308c5.dir/CheckFunctionExists.cxx.o -c /data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CheckLibraryExists/CheckFunctionExists.cxx
    Linking CXX executable cmTC_308c5
    /home/huyang02/anaconda3/lib/python3.6/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_308c5.dir/link.txt --verbose=1
    /usr/bin/c++ -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_308c5.dir/CheckFunctionExists.cxx.o -o cmTC_308c5 -lpthreads
    /usr/bin/ld: cannot find -lpthreads
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_308c5.dir/build.make:98: recipe for target 'cmTC_308c5' failed
    make[1]: *** [cmTC_308c5] Error 1
    make[1]: Leaving directory '/data/jupyter/cuda_learn/NN-CUDA-Example/build/CMakeFiles/CMakeTmp'
    Makefile:127: recipe for target 'cmTC_308c5/fast' failed
    make: *** [cmTC_308c5/fast] Error 2

    opened by yanghu819 0
  • Could you help me see why compilation fails with RuntimeError: Error building extension 'add2'? I don't understand the message that follows: fatal error: add2.h: No such file or directory

    When I run the JIT command below in a Jupyter notebook, compilation fails as follows: python3 time.py --compiler jit

    Using /tmp/torch_extensions as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /tmp/torch_extensions/add2/build.ninja... Building extension module add2... [1/2] c++ -MMD -MF add2_ops.o.d -DTORCH_EXTENSION_NAME=add2 -DTORCH_API_INCLUDE_EXTENSION_H -I/gby/NN-CUDA-Example-master/pytorch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /gby/NN-CUDA-Example-master/pytorch/add2_ops.cpp -o add2_ops.o FAILED: add2_ops.o c++ -MMD -MF add2_ops.o.d -DTORCH_EXTENSION_NAME=add2 -DTORCH_API_INCLUDE_EXTENSION_H -I/gby/NN-CUDA-Example-master/pytorch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /gby/NN-CUDA-Example-master/pytorch/add2_ops.cpp -o add2_ops.o /gby/NN-CUDA-Example-master/pytorch/add2_ops.cpp:2:10: fatal error: add2.h: No such file or directory #include "add2.h" ^~~~~~~~ compilation terminated. ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 960, in _build_extension_module check=True) File "/usr/lib/python3.6/subprocess.py", line 438, in run output=stdout, stderr=stderr) subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "time.py", line 56, in verbose=True) File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 658, in load is_python_module) File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 827, in _jit_compile with_cuda=with_cuda) File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 880, in _write_ninja_file_and_build _build_extension_module(name, build_directory, verbose) File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 973, in _build_extension_module raise RuntimeError(message) RuntimeError: Error building extension 'add2'

    Running with the CMake build fails as follows:

    Traceback (most recent call last):
      File "time.py", line 60, in <module>
        torch.ops.load_library("build/libadd2.so")
      File "/usr/local/lib/python3.6/dist-packages/torch/_ops.py", line 106, in load_library
        ctypes.CDLL(path)
      File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: /gby/NN-CUDA-Example-master/pytorch/build/libadd2.so: cannot open shared object file: No such file or directory

    opened by Henry-Avery 1
  • torch.ops.load_library("build/libadd2.so") error

    Traceback (most recent call last):
      File "time.py", line 60, in <module>
        torch.ops.load_library("build/libadd2.so")
      File "/home/gzy/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/_ops.py", line 105, in load_library
        ctypes.CDLL(path)
      File "/home/gzy/anaconda3/envs/pytorch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: /home/gzy/NN-CUDA-Example/pytorch/build/libadd2.so: undefined symbol: THPVariableClass

    opened by longzeyilang 4
Owner
WeiYang
WeChat official account 「算法码上来」 / ByteDance AI Lab / East China Normal University