
cumm

CUda Matrix Multiply library.


cumm was developed while I was learning CUTLASS, which relies so heavily on C++ template metaprogramming that its code becomes hard to maintain. To replace C++ template metaprogramming, I developed pccm, which uses Python as the meta-programming language. pccm has since become a foundational framework for cumm and for my other C++ projects such as spconv. cumm also contains a Python asyncio-based GEMM simulator that shares the same meta-program with the CUDA code, enabling GEMM visualization and an easy debugging experience.

Install

Prebuilt

We offer Python 3.6-3.10 and CUDA 10.2/11.1/11.3/11.4 prebuilt binaries for Linux (manylinux).

We offer Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 prebuilt binaries for Windows 10/11.

We offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support those too.

pip install cumm for CPU-only

pip install cumm-cu102 for CUDA 10.2

pip install cumm-cu111 for CUDA 11.1

pip install cumm-cu113 for CUDA 11.3

pip install cumm-cu114 for CUDA 11.4
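If you are unsure which prebuilt package matches your machine, the CUDA toolkit version reported by nvcc --version maps directly to the package suffix. A minimal sketch (the helper name is made up and is not part of cumm):

```python
import re

def package_from_nvcc_output(text: str) -> str:
    """Pick a cumm prebuilt package name from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    if m is None:
        # no CUDA toolkit detected: fall back to the CPU-only package
        return "cumm"
    # "release 11.4" -> "cumm-cu114"
    return f"cumm-cu{m.group(1)}{m.group(2)}"

sample = "Cuda compilation tools, release 11.4, V11.4.152"
print(package_from_nvcc_output(sample))  # cumm-cu114
```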

Build from source for development (JIT, recommended)

WARNING: Use code from a tagged release! Code on the main branch may contain bugs.

The C++ code is rebuilt automatically whenever you change C++ files in the project.

Linux

  1. Uninstall any cumm installed by pip; make sure pip list | grep cumm shows nothing.
  2. Install build-essential and CUDA.
  3. git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
  4. In Python, import cumm and wait for the build to finish.

Windows

  1. Uninstall any spconv and cumm installed by pip; make sure pip list | grep cumm shows nothing.
  2. Install Visual Studio 2019 or newer; make sure the C++ development component is installed. Install CUDA.
  3. Set the PowerShell script execution policy.
  4. Start a new PowerShell and run tools/msvc_setup.ps1.
  5. git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
  6. In Python, import cumm and wait for the build to finish.

Build wheel from source

WARNING: Use code from a tagged release! Code on the main branch may contain bugs.

WARNING: If CUMM_CUDA_VERSION is set to a CUDA version, the following steps will create a wheel named "cumm-cuxxx", not "cumm"; projects that depend on cumm must then list cumm-cuxxx as the dependency instead of cumm. If CUMM_CUDA_VERSION isn't set, cumm is always built with CUDA, so CUDA must exist on your system; the wheel will still be named cumm even though it is built with CUDA.
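The naming rule above can be summarized as a small decision function. This is a sketch of the documented behavior only, not cumm's actual build code:

```python
def wheel_name(env: dict) -> str:
    """Wheel name produced by the build, per the warning above."""
    cuda = env.get("CUMM_CUDA_VERSION")
    if cuda is None:
        return "cumm"      # unset: built with the system CUDA, plain name
    if cuda == "":
        return "cumm"      # explicitly empty: CPU-only build, plain name
    return "cumm-cu" + cuda.replace(".", "")  # e.g. "11.4" -> "cumm-cu114"

print(wheel_name({"CUMM_CUDA_VERSION": "11.4"}))  # cumm-cu114
print(wheel_name({}))                             # cumm
```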

Linux

It's recommended to build Linux packages in the official build Docker image. Building with CUDA support doesn't require a real GPU.

Build in Official Docker
  1. Select a CUDA version. Available: CUDA 10.2, 11.1, 11.3, 11.4, 11.5.
  2. (Example for CUDA 11.4) git clone https://github.com/FindDefinition/cumm, cd ./cumm, then run docker run --rm -e PLAT=manylinux2014_x86_64 -e CUMM_CUDA_VERSION=114 -v `pwd`:/io scrin/manylinux2014-cuda:cu114-devel-1.0.0 bash -c "source /etc/bashrc && /io/tools/build-wheels.sh"
Build in your environment
  1. Install build-essential and CUDA.
  2. Set an environment variable for your installed CUDA version, e.g. export CUMM_CUDA_VERSION="11.4". To build CPU-only, run export CUMM_CUDA_VERSION="". If CUMM_CUDA_VERSION isn't set, the CUDA libraries must be on the OS search path, and the built wheel will be named cumm; otherwise it will be named cumm-cuxxx.
  3. Run export CUMM_DISABLE_JIT="1".
  4. Run python setup.py bdist_wheel, then pip install dist/xxx.whl.

Windows 10/11

  1. Install Visual Studio 2019 or newer; make sure the C++ development component is installed. Install CUDA.
  2. Set the PowerShell script execution policy.
  3. Start a new PowerShell and run tools/msvc_setup.ps1.
  4. Set an environment variable for your installed CUDA version, e.g. $Env:CUMM_CUDA_VERSION = "11.4". To build CPU-only, run $Env:CUMM_CUDA_VERSION = "". If CUMM_CUDA_VERSION isn't set, the CUDA libraries must be on the OS search path, and the built wheel will be named cumm; otherwise it will be named cumm-cuxxx.
  5. Run $Env:CUMM_DISABLE_JIT = "1".
  6. Run python setup.py bdist_wheel, then pip install dist/xxx.whl.

Note

This work was done while the author was an employee at Tusimple.

LICENSE

Apache 2.0

Comments
  • TENSORVIEW_INCLUDE_PATH does not exist


    Hi,

    I built a wheel from source for use with spconv 2.x on an NVIDIA Jetson TX2 with the following specifications: Ubuntu 18.04, CUDA 10.2, Python 3.6, PyTorch 1.6.0.

    I followed the description in the README and did not export CUMM_CUDA_VERSION. Additionally, I specified the CUDA compute capability with export CUMM_CUDA_ARCH_LIST="6.2", as described in spconv.
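    An arch list such as "6.2" or "8.7+PTX" is conventionally expanded into nvcc -gencode flags. The sketch below shows the common expansion; this is an assumption about the convention, and cumm's actual logic may differ:

```python
def gencode_flags(arch_list: str) -> list:
    """Expand a semicolon-separated CUDA arch list into nvcc -gencode flags."""
    flags = []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        ptx = entry.endswith("+PTX")
        arch = entry[:-4] if ptx else entry
        num = arch.replace(".", "")  # "6.2" -> "62"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            # also embed PTX so newer GPUs can JIT-compile the kernel
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

print(gencode_flags("6.2"))  # ['-gencode=arch=compute_62,code=sm_62']
print(gencode_flags("8.7+PTX"))
```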

    I could build and install the wheel without any problems, but I got the following AssertionError when importing cumm in Python.

    Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
    [GCC 8.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import cumm
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/nvidia/.local/lib/python3.6/site-packages/cumm/__init__.py", line 20, in <module>
        from .constants import PACKAGE_NAME, CUMM_DISABLE_JIT, CUMM_CPU_ONLY_BUILD
      File "/home/nvidia/.local/lib/python3.6/site-packages/cumm/constants.py", line 33, in <module>
        assert TENSORVIEW_INCLUDE_PATH.exists()
    AssertionError
    
    
    opened by jonasmeyer1 7
  • 'BuildMeta' object has no attribute 'add_public_includes'


    I built cumm from source and tried to import cumm but failed:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "d:\resources\cumm\cumm\__init__.py", line 33, in <module>
        verbose=False)
      File "C:\Users\Yihua\AppData\Roaming\Python\Python37\site-packages\pccm\builder\pybind.py", line 56, in build_pybind
        user_cus = cg.build_graph(cus, namespace_root)
      File "C:\Users\Yihua\AppData\Roaming\Python\Python37\site-packages\pccm\core\__init__.py", line 1903, in build_graph
        cu_type_to_cu[dep] = dep()
      File "C:\Users\Yihua\AppData\Roaming\Python\Python37\site-packages\pccm\core\__init__.py", line 844, in wrapper
        func(self, *args, **kwargs)
      File "d:\resources\cumm\cumm\common.py", line 278, in __init__
        self.build_meta.add_public_includes(TENSORVIEW_INCLUDE_PATH)
    AttributeError: 'BuildMeta' object has no attribute 'add_public_includes'

    Environment: Windows 11, Python 3.7.13, pccm==0.3.4, ccimport==0.3.7, pip==21.2.4, mkl-service==2.4.0

    I noticed that there were recent changes in ccimport, so I tried another environment and built ccimport from source, but it also failed:

    [1/38] [MSVC][c++/pch]D:\Resources\cumm\cumm\build\core_cc\include\csrc\arrayref\ArrayPtr.h.pch|D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\csrc\arrayref\ArrayPtr.h.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/msvc_stub/include/csrc/arrayref/ArrayPtr.h.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "D:\Resources\cumm\include" /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -c /Yccsrc/arrayref/ArrayPtr.h /FpD:\Resources\cumm\cumm\build\core_cc\include\csrc\arrayref\ArrayPtr.h.pch D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\csrc\arrayref\ArrayPtr.h.cc /FoD:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\csrc\arrayref\ArrayPtr.h.cc.o
    D:\Resources\cumm\include\tensorview\core\array.h(813): warning C4068: Unknown pragma "GCC"
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    [2/38] [MSVC][c++/pch]D:\Resources\cumm\cumm\build\core_cc\include\tensorview_bind\TensorViewBind.h.pch|D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\tensorview_bind\TensorViewBind.h.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/msvc_stub/include/tensorview_bind/TensorViewBind.h.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" /I "D:\Resources\cumm\include" -c /Yctensorview_bind/TensorViewBind.h /FpD:\Resources\cumm\cumm\build\core_cc\include\tensorview_bind\TensorViewBind.h.pch D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\tensorview_bind\TensorViewBind.h.cc /FoD:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\tensorview_bind\TensorViewBind.h.cc.o
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    [3/38] [MSVC][c++]D:\Resources\cumm\cumm\build\core_cc\src\tensorview_bind\PyBindTensorViewBind\PyBindTensorViewBind_bind_TensorViewBind.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/src/tensorview_bind/PyBindTensorViewBind/PyBindTensorViewBind_bind_TensorViewBind.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" /I "D:\Resources\cumm\include" -c D:\Resources\cumm\cumm\build\core_cc\src\tensorview_bind\PyBindTensorViewBind\PyBindTensorViewBind_bind_TensorViewBind.cc /FoD:\Resources\cumm\cumm\build\core_cc\src\tensorview_bind\PyBindTensorViewBind\PyBindTensorViewBind_bind_TensorViewBind.cc.o
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    [4/38] [MSVC][c++]D:\Resources\cumm\cumm\build\core_cc\src\csrc\arrayref\PyBindArrayPtr\PyBindArrayPtr_bind_ArrayPtr.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/src/csrc/arrayref/PyBindArrayPtr/PyBindArrayPtr_bind_ArrayPtr.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "D:\Resources\cumm\include" /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -c D:\Resources\cumm\cumm\build\core_cc\src\csrc\arrayref\PyBindArrayPtr\PyBindArrayPtr_bind_ArrayPtr.cc /FoD:\Resources\cumm\cumm\build\core_cc\src\csrc\arrayref\PyBindArrayPtr\PyBindArrayPtr_bind_ArrayPtr.cc.o
    D:\Resources\cumm\include\tensorview\core\array.h(813): warning C4068: Unknown pragma "GCC"
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "d:\resources\cumm\cumm\__init__.py", line 29, in <module>
        pccm.builder.build_pybind([ArrayPtr(), TensorViewBind()],
      File "D:\Resources\pccm\pccm\builder\pybind.py", line 114, in build_pybind
        return ccimport.ccimport(
      File "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\ccimport-0.4.0-py3.9.egg\ccimport\core.py", line 149, in ccimport
      File "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\ccimport-0.4.0-py3.9.egg\ccimport\buildtools\writer.py", line 990, in build_simple_ninja
    subprocess.CalledProcessError: Command '['ninja']' returned non-zero exit status 1.

    Environment: Windows 11, Python 3.9.7, pccm==0.3.5 (built from source), ccimport==0.4.0 (built from source), pip==21.2.4

    opened by yihuajack 4
  • Install cumm on Orin


    Similar to the previous issue (https://github.com/FindDefinition/cumm/issues/13), I also failed to install cumm on Orin.

    Can you tell me how to install it on Orin? Any comments would be highly appreciated.

    The options I have set are as follows:

    export CUMM_DISABLE_JIT="1"
    export CUMM_ARCH_LIST="8.7+PTX"
    sudo pip install -e .
    

    • The error message is:

      [39/39] [GCC][Link]/home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
      FAILED: /home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
      g++ @/home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so.rsp -lcudart -lnvrtc -lcufilt -ldl -L "/usr/local/cuda/lib64" -L "/usr/local/cuda/lib64" -Wl,--no-as-needed -lnvrtc-builtins -shared -o /home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
      /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(decode.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `display_func_params' which may bind externally can not be used when making a shared object; recompile with -fPIC
      /usr/local/cuda/lib64/libcufilt.a(decode.o): in function `demangle_bare_function_type(char const*, int, int, a_decode_control_block*)':
      /dvs/p4/build/sw/rel/gpgpu/toolkit/r11.4/compiler/drivers/compiler/edg/EDG_6.2/src/../util/decode.c:4996:(.text._ZL27demangle_bare_function_typePKciiP22a_decode_control_block+0x53c): dangerous relocation: unsupported relocation
      /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(decode.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `display_func_params' which may bind externally can not be used when making a shared object; recompile with -fPIC
      /usr/local/cuda/lib64/libcufilt.a(decode.o): in function `decode_identifier(char const*, char*, unsigned long, int*, int*, unsigned long*)':
      /dvs/p4/build/sw/rel/gpgpu/toolkit/r11.4/compiler/drivers/compiler/edg/EDG_6.2/src/../util/decode.c:8766:(.text._Z17decode_identifierPKcPcmPiS2_Pm+0x24): dangerous relocation: unsupported relocation
      /tmp/pip-build-env-mjzpnmvl/overlay/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      /tmp/pip-build-env-mjzpnmvl/overlay/lib/python3.8/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 0.1.36ubuntu1 is an invalid version and will not be supported in a future release
    opened by Siyeong-Lee 3
  • can't install cumm on agx xavier


    JetPack 5.0.1, CUDA 11.4, Python 3.8

    I failed to install cumm; versions 0.3.0 through 0.3.4 all failed. I tried to compile libcufilt.a (edg_decode) with g++ and nvcc with -fPIC, which succeeded, but after recompiling the error did not disappear.

    export CUMM_DISABLE_JIT="1"
    export CUMM_ARCH_LIST="7.2+PTX"
    sudo pip install -e .

    [39/39] [GCC][Link]/home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
    FAILED: /home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
    g++ @/home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so.rsp -lcudart -lnvrtc -lcufilt -ldl -L "/usr/local/cuda/lib64" -L "/usr/local/cuda/lib64" -Wl,--no-as-needed -lnvrtc-builtins -shared -o /home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
    /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(edg_decode.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stderr@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC
    /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(edg_decode.o)(.text._Z6getoptiPKPcPKc+0x15c): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stderr@@GLIBC_2.17'
    /usr/bin/ld: final link failed: bad value
    collect2: error: ld returned 1 exit status
    ninja: build stopped: subcommand failed.

    opened by odroidodroid 2
  • can't install cumm on Jetson AGX Xavier (aarch64)


    sudo pip3 install -e .

    python3
    Python 3.6.9 (default, Jan 26 2021, 15:33:00)
    [GCC 8.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>> import cumm
    [1/36] g++ -MMD -MT /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch -MF /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h -o /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch
    FAILED: /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch
    g++ -MMD -MT /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch -MF /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h -o /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch
    /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h:1:9: warning: #pragma once in main file
     #pragma once
             ^~~~
    /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h:55:1: fatal error: can't write PCH file: No space left on device
     } // namespace tensorview_bind
     ^
    compilation terminated.
    [2/36] g++ -MMD -MT /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch -MF /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h -o /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch
    FAILED: /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch
    g++ -MMD -MT /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch -MF /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h -o /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch
    /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h:1:9: warning: #pragma once in main file
     #pragma once
             ^~~~
    /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h:100:1: fatal error: can't write PCH file: No space left on device
     } // namespace csrc
     ^
    compilation terminated.
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cicv/cumm/cumm/__init__.py", line 33, in <module>
        verbose=True)
      File "/home/cicv/PCCM-master/pccm/builder/pybind.py", line 141, in build_pybind
        objects_folder=objects_folder)
      File "/home/cicv/.local/lib/python3.6/site-packages/ccimport/core.py", line 182, in ccimport
        linker_to_path=linker_to_path)
      File "/home/cicv/.local/lib/python3.6/site-packages/ccimport/buildtools/writer.py", line 997, in build_simple_ninja
        raise subprocess.CalledProcessError(proc.returncode, cmds)
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    opened by AI-liu 2
  • Does cumm support a dynamic mode to decide to use cutlass or its own gemm based on input shapes?


    Hello! I'm interested in cumm and find it very fast. Reading cumm's code, it seems that cumm provides a GEMM implementation written from scratch while also providing a CUTLASS mode, and it uses its own GEMM by default. In my experiments, CUTLASS is faster for some shapes and cumm's GEMM is faster for others. Does cumm support a dynamic mode that decides between CUTLASS and its own GEMM based on the input shapes?
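    For illustration only, such a dynamic mode could be a shape-keyed heuristic like the sketch below; the function name, threshold, and alignment rule are all made up and are not cumm's API:

```python
def pick_backend(m: int, n: int, k: int) -> str:
    """Illustrative heuristic: choose a GEMM backend from the problem shape."""
    flops = 2 * m * n * k
    aligned = all(d % 8 == 0 for d in (m, n, k))  # tensor-core-friendly
    if flops > 1 << 30 and aligned:
        return "cutlass"   # large aligned shapes: hand off to CUTLASS
    return "cumm"          # small or skinny shapes: use cumm's own GEMM

print(pick_backend(4096, 4096, 4096))  # cutlass
print(pick_backend(128, 32, 64))       # cumm
```

    A real implementation would benchmark both backends per shape and cache the winner rather than hard-coding a threshold.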

    opened by umiswing 1
  • Building cumm for spconv 2x Issue


    Hello, and thanks for the great work. I am trying to build spconv from source following the official instructions, but I am facing some problems. When I run git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e . I face the same issue as #8. As you suggested, I checked out the v0.2.8 tag and successfully built cumm; in Python, import cumm gives no errors. However, when I then install spconv I get the following errors:

    user@workstation:~/workspace/project/external_libs/spconv2/spconv$ python
    Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
    [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import spconv
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/spconv/spconv/__init__.py", line 15, in <module>
        from . import build as _build
      File "/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/spconv/spconv/build.py", line 21, in <module>
        from .constants import PACKAGE_NAME, PACKAGE_ROOT, DISABLE_JIT
      File "/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/spconv/spconv/constants.py", line 19, in <module>
        from cumm.gemm.constants import NVRTCMode
    ImportError: cannot import name 'NVRTCMode' from 'cumm.gemm.constants' (/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/cumm/cumm/gemm/constants.py)
    

    I noticed that v0.2.8 of https://github.com/FindDefinition/cumm/blob/main/cumm/gemm/constants.py does not have the NVRTCMode class, but the main branch does.

    Is there a way to compile the latest version?

    opened by aldipiroli 5
  • AssertionError: assert TENSORVIEW_INCLUDE_PATH.exists()


    I am getting this error when I run import spconv:

    (mon23) ubuntu@ip-:/mnt/data/Mon/second.pytorch/second/spconv$ python
    Python 3.6.9 (default, Mar 15 2022, 13:55:28)
    [GCC 8.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>> import spconv
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/data/Mon/second.pytorch/second/spconv/spconv/__init__.py", line 17, in <module>
        from .core import ConvAlgo, AlgoHint
      File "/mnt/data/Mon/second.pytorch/second/spconv/spconv/core.py", line 15, in <module>
        from cumm.gemm.main import gen_shuffle_params_v2 as gen_shuffle_params, GemmAlgoParams
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/__init__.py", line 21, in <module>
        from .constants import CUMM_CPU_ONLY_BUILD, CUMM_DISABLE_JIT, PACKAGE_NAME
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/constants.py", line 35, in <module>
        assert TENSORVIEW_INCLUDE_PATH.exists()
    AssertionError

    Can you please help me here?

    If I build again using python setup.py bdist_wheel, I get the following error.

    (mon23) ubuntu@ip-:/mnt/data/Mon/second.pytorch/second/spconv$ python setup.py bdist_wheel
    Traceback (most recent call last):
      File "setup.py", line 153, in <module>
        from cumm.gemm.main import GemmMainUnitTest
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/__init__.py", line 21, in <module>
        from .constants import CUMM_CPU_ONLY_BUILD, CUMM_DISABLE_JIT, PACKAGE_NAME
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/constants.py", line 35, in <module>
        assert TENSORVIEW_INCLUDE_PATH.exists()
    AssertionError

    opened by Hetali-Vekariya 0
  • how to convert a pytorch tensor to a cumm.core_cc.tensorview_bind.Tensor?


    Hi, I tried to convert a PyTorch tensor to a cumm.core_cc.tensorview_bind.Tensor using tensor.cpu().detach().numpy() and tensorview.from_numpy(), but the gradient that I need is lost. How can I convert a PyTorch tensor to a cumm.core_cc.tensorview_bind.Tensor without losing the gradient? Thanks.

    opened by Liu-Wendao 0
  • ImportError: cannot import name 'TensorOpParams' from 'cumm.gemm.algospec.core'


    Hi, I built cumm from source on an NVIDIA Jetson Nano board. When I import cumm in Python, no errors appear, but when I import TensorOpParams with from cumm.gemm.algospec.core import TensorOpParams, I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/jetson/build_spconv/spconv3/spconv/spconv/__init__.py", line 15, in <module>
        from . import build as _build
      File "/home/jetson/build_spconv/spconv3/spconv/spconv/build.py", line 24, in <module>
        from spconv.core import SHUFFLE_SIMT_PARAMS, SHUFFLE_VOLTA_PARAMS, SHUFFLE_TURING_PARAMS
      File "/home/jetson/build_spconv/spconv3/spconv/spconv/core.py", line 18, in <module>
        from cumm.gemm.algospec.core import TensorOpParams
    ImportError: cannot import name 'TensorOpParams' from 'cumm.gemm.algospec.core' (/home/jetson/build_spconv/spconv3/cumm/cumm/gemm/algospec/core.py)

    Environment: NVIDIA Jetson Nano Developer Kit (aarch64), Python 3.7.5, cumm 0.3.0, CUDA 10.2

    Please help me!!! Thank you!

    opened by SmBito 2