
cumm

CUda Matrix Multiply library.


cumm was developed while I was learning CUTLASS, which relies so heavily on C++ template metaprogramming that its code becomes hard to maintain. To replace C++ template metaprogramming, I developed pccm, which uses Python as the meta-programming language. pccm has since become a foundational framework for cumm and for my other C++ projects such as spconv. cumm also contains a Python asyncio-based GEMM simulator that shares the same meta-program with the CUDA code, enabling GEMM visualization and an easy debugging experience.

Install

Prebuilt

We offer Python 3.6-3.10 and CUDA 10.2/11.1/11.3/11.4 prebuilt binaries for Linux (manylinux).

We offer Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 prebuilt binaries for Windows 10/11.

We offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support those too.

pip install cumm for CPU-only

pip install cumm-cu102 for CUDA 10.2

pip install cumm-cu111 for CUDA 11.1

pip install cumm-cu113 for CUDA 11.3

pip install cumm-cu114 for CUDA 11.4
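If you are unsure which prebuilt package matches your machine, the CUDA toolkit version reported by nvcc --version maps directly to the package suffix. A minimal sketch (the helper name is made up and is not part of cumm):

```python
import re

def package_from_nvcc_output(text: str) -> str:
    """Pick a cumm prebuilt package name from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    if m is None:
        # no CUDA toolkit detected: fall back to the CPU-only package
        return "cumm"
    # "release 11.4" -> "cumm-cu114"
    return f"cumm-cu{m.group(1)}{m.group(2)}"

sample = "Cuda compilation tools, release 11.4, V11.4.152"
print(package_from_nvcc_output(sample))  # cumm-cu114
```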

Build from source for development (JIT, recommended)

WARNING: Use code from a tagged release! Code on the main branch may contain bugs.

The C++ code is rebuilt automatically whenever you change C++ files in the project.

Linux

  1. Uninstall any cumm installed by pip; make sure pip list | grep cumm shows nothing.
  2. Install build-essential and CUDA.
  3. git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
  4. In Python, import cumm and wait for the build to finish.

Windows

  1. Uninstall any spconv and cumm installed by pip; make sure pip list | grep cumm shows nothing.
  2. Install Visual Studio 2019 or newer; make sure the C++ development component is installed. Install CUDA.
  3. Set the PowerShell script execution policy.
  4. Start a new PowerShell and run tools/msvc_setup.ps1.
  5. git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
  6. In Python, import cumm and wait for the build to finish.

Build wheel from source

WARNING: Use code from a tagged release! Code on the main branch may contain bugs.

WARNING: If CUMM_CUDA_VERSION is set to a CUDA version, the following steps will create a wheel named "cumm-cuxxx", not "cumm"; projects that depend on cumm must then list cumm-cuxxx as the dependency instead of cumm. If CUMM_CUDA_VERSION isn't set, cumm is always built with CUDA, so CUDA must exist on your system; the wheel will still be named cumm even though it is built with CUDA.
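The naming rule above can be summarized as a small decision function. This is a sketch of the documented behavior only, not cumm's actual build code:

```python
def wheel_name(env: dict) -> str:
    """Wheel name produced by the build, per the warning above."""
    cuda = env.get("CUMM_CUDA_VERSION")
    if cuda is None:
        return "cumm"      # unset: built with the system CUDA, plain name
    if cuda == "":
        return "cumm"      # explicitly empty: CPU-only build, plain name
    return "cumm-cu" + cuda.replace(".", "")  # e.g. "11.4" -> "cumm-cu114"

print(wheel_name({"CUMM_CUDA_VERSION": "11.4"}))  # cumm-cu114
print(wheel_name({}))                             # cumm
```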

Linux

It's recommended to build Linux packages in the official build Docker image. Building with CUDA support doesn't require a real GPU.

Build in Official Docker
  1. Select a CUDA version. Available: CUDA 10.2, 11.1, 11.3, 11.4, 11.5.
  2. (Example for CUDA 11.4) git clone https://github.com/FindDefinition/cumm, cd ./cumm, then run docker run --rm -e PLAT=manylinux2014_x86_64 -e CUMM_CUDA_VERSION=114 -v `pwd`:/io scrin/manylinux2014-cuda:cu114-devel-1.0.0 bash -c "source /etc/bashrc && /io/tools/build-wheels.sh"
Build in your environment
  1. Install build-essential and CUDA.
  2. Set an environment variable for your installed CUDA version, e.g. export CUMM_CUDA_VERSION="11.4". To build CPU-only, run export CUMM_CUDA_VERSION="". If CUMM_CUDA_VERSION isn't set, the CUDA libraries must be on the OS search path, and the built wheel will be named cumm; otherwise it will be named cumm-cuxxx.
  3. Run export CUMM_DISABLE_JIT="1".
  4. Run python setup.py bdist_wheel, then pip install dist/xxx.whl.

Windows 10/11

  1. Install Visual Studio 2019 or newer; make sure the C++ development component is installed. Install CUDA.
  2. Set the PowerShell script execution policy.
  3. Start a new PowerShell and run tools/msvc_setup.ps1.
  4. Set an environment variable for your installed CUDA version, e.g. $Env:CUMM_CUDA_VERSION = "11.4". To build CPU-only, run $Env:CUMM_CUDA_VERSION = "". If CUMM_CUDA_VERSION isn't set, the CUDA libraries must be on the OS search path, and the built wheel will be named cumm; otherwise it will be named cumm-cuxxx.
  5. Run $Env:CUMM_DISABLE_JIT = "1".
  6. Run python setup.py bdist_wheel, then pip install dist/xxx.whl.

Note

This work was done while the author was an employee at Tusimple.

LICENSE

Apache 2.0

Comments
  • TENSORVIEW_INCLUDE_PATH does not exist


    Hi,

    I built a wheel from source for use with spconv 2.x on an NVIDIA Jetson TX2 with the following specifications: Ubuntu 18.04, CUDA 10.2, Python 3.6, PyTorch 1.6.0.

    I followed the description in the README and did not export CUMM_CUDA_VERSION. Additionally, I specified the CUDA compute capability with export CUMM_CUDA_ARCH_LIST="6.2", as described in spconv.
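    An arch list such as "6.2" or "8.7+PTX" is conventionally expanded into nvcc -gencode flags. The sketch below shows the common expansion; this is an assumption about the convention, and cumm's actual logic may differ:

```python
def gencode_flags(arch_list: str) -> list:
    """Expand a semicolon-separated CUDA arch list into nvcc -gencode flags."""
    flags = []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        ptx = entry.endswith("+PTX")
        arch = entry[:-4] if ptx else entry
        num = arch.replace(".", "")  # "6.2" -> "62"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            # also embed PTX so newer GPUs can JIT-compile the kernel
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

print(gencode_flags("6.2"))  # ['-gencode=arch=compute_62,code=sm_62']
print(gencode_flags("8.7+PTX"))
```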

    I could build and install the wheel without any problems, but I got the following AssertionError when importing cumm in Python.

    Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
    [GCC 8.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import cumm
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/nvidia/.local/lib/python3.6/site-packages/cumm/__init__.py", line 20, in <module>
        from .constants import PACKAGE_NAME, CUMM_DISABLE_JIT, CUMM_CPU_ONLY_BUILD
      File "/home/nvidia/.local/lib/python3.6/site-packages/cumm/constants.py", line 33, in <module>
        assert TENSORVIEW_INCLUDE_PATH.exists()
    AssertionError
    
    
    opened by jonasmeyer1 7
  • 'BuildMeta' object has no attribute 'add_public_includes'


    I built cumm from source and tried to import cumm but failed:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "d:\resources\cumm\cumm\__init__.py", line 33, in <module>
        verbose=False)
      File "C:\Users\Yihua\AppData\Roaming\Python\Python37\site-packages\pccm\builder\pybind.py", line 56, in build_pybind
        user_cus = cg.build_graph(cus, namespace_root)
      File "C:\Users\Yihua\AppData\Roaming\Python\Python37\site-packages\pccm\core\__init__.py", line 1903, in build_graph
        cu_type_to_cu[dep] = dep()
      File "C:\Users\Yihua\AppData\Roaming\Python\Python37\site-packages\pccm\core\__init__.py", line 844, in wrapper
        func(self, *args, **kwargs)
      File "d:\resources\cumm\cumm\common.py", line 278, in __init__
        self.build_meta.add_public_includes(TENSORVIEW_INCLUDE_PATH)
    AttributeError: 'BuildMeta' object has no attribute 'add_public_includes'

    Environment: Windows 11, Python 3.7.13, pccm==0.3.4, ccimport==0.3.7, pip==21.2.4, mkl-service==2.4.0

    I noticed that there were recent changes in ccimport, so I tried another environment and built ccimport from source, but it also failed:

    [1/38] [MSVC][c++/pch]D:\Resources\cumm\cumm\build\core_cc\include\csrc\arrayref\ArrayPtr.h.pch|D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\csrc\arrayref\ArrayPtr.h.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/msvc_stub/include/csrc/arrayref/ArrayPtr.h.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "D:\Resources\cumm\include" /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -c /Yccsrc/arrayref/ArrayPtr.h /FpD:\Resources\cumm\cumm\build\core_cc\include\csrc\arrayref\ArrayPtr.h.pch D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\csrc\arrayref\ArrayPtr.h.cc /FoD:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\csrc\arrayref\ArrayPtr.h.cc.o
    D:\Resources\cumm\include\tensorview\core\array.h(813): warning C4068: Unknown pragma "GCC"
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    [2/38] [MSVC][c++/pch]D:\Resources\cumm\cumm\build\core_cc\include\tensorview_bind\TensorViewBind.h.pch|D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\tensorview_bind\TensorViewBind.h.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/msvc_stub/include/tensorview_bind/TensorViewBind.h.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" /I "D:\Resources\cumm\include" -c /Yctensorview_bind/TensorViewBind.h /FpD:\Resources\cumm\cumm\build\core_cc\include\tensorview_bind\TensorViewBind.h.pch D:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\tensorview_bind\TensorViewBind.h.cc /FoD:\Resources\cumm\cumm\build\core_cc\msvc_stub\include\tensorview_bind\TensorViewBind.h.cc.o
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    [3/38] [MSVC][c++]D:\Resources\cumm\cumm\build\core_cc\src\tensorview_bind\PyBindTensorViewBind\PyBindTensorViewBind_bind_TensorViewBind.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/src/tensorview_bind/PyBindTensorViewBind/PyBindTensorViewBind_bind_TensorViewBind.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" /I "D:\Resources\cumm\include" -c D:\Resources\cumm\cumm\build\core_cc\src\tensorview_bind\PyBindTensorViewBind\PyBindTensorViewBind_bind_TensorViewBind.cc /FoD:\Resources\cumm\cumm\build\core_cc\src\tensorview_bind\PyBindTensorViewBind\PyBindTensorViewBind_bind_TensorViewBind.cc.o
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    [4/38] [MSVC][c++]D:\Resources\cumm\cumm\build\core_cc\src\csrc\arrayref\PyBindArrayPtr\PyBindArrayPtr_bind_ArrayPtr.cc.o
    FAILED: D:/Resources/cumm/cumm/build/core_cc/src/csrc/arrayref/PyBindArrayPtr/PyBindArrayPtr_bind_ArrayPtr.cc.o
    cl /I "D:\Resources\cumm\cumm\build\core_cc\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\pybind11\include" /I "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\Include" /O2 /DNOMINMAX /std:c++14 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc /Zc:__cplusplus /nologo /showIncludes /DTV_CUDA /I "D:\Resources\cumm\include" /I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -c D:\Resources\cumm\cumm\build\core_cc\src\csrc\arrayref\PyBindArrayPtr\PyBindArrayPtr_bind_ArrayPtr.cc /FoD:\Resources\cumm\cumm\build\core_cc\src\csrc\arrayref\PyBindArrayPtr\PyBindArrayPtr_bind_ArrayPtr.cc.o
    D:\Resources\cumm\include\tensorview\core\array.h(813): warning C4068: Unknown pragma "GCC"
    D:\Resources\cumm\include\tensorview/core/common.h(29): fatal error C1083: Cannot open include file: "cuda.h": No such file or directory
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "d:\resources\cumm\cumm\__init__.py", line 29, in <module>
        pccm.builder.build_pybind([ArrayPtr(), TensorViewBind()],
      File "D:\Resources\pccm\pccm\builder\pybind.py", line 114, in build_pybind
        return ccimport.ccimport(
      File "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\ccimport-0.4.0-py3.9.egg\ccimport\core.py", line 149, in ccimport
      File "F:\Program Files (x86)\Intel\oneAPI\intelpython\latest\lib\site-packages\ccimport-0.4.0-py3.9.egg\ccimport\buildtools\writer.py", line 990, in build_simple_ninja
    subprocess.CalledProcessError: Command '['ninja']' returned non-zero exit status 1.

    Environment: Windows 11, Python 3.9.7, pccm==0.3.5 (built from source), ccimport==0.4.0 (built from source), pip==21.2.4

    opened by yihuajack 4
  • Install cumm on Orin


    Similar to the previous issue (https://github.com/FindDefinition/cumm/issues/13), I also failed to install cumm on Orin.

    Can you tell me how to install it on Orin? Any comments would be highly appreciated.

    The options I have set are as follows:

    export CUMM_DISABLE_JIT="1"
    export CUMM_ARCH_LIST="8.7+PTX"
    sudo pip install -e .
    

    • The error message is:

      [39/39] [GCC][Link]/home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
      FAILED: /home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
      g++ @/home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so.rsp -lcudart -lnvrtc -lcufilt -ldl -L "/usr/local/cuda/lib64" -L "/usr/local/cuda/lib64" -Wl,--no-as-needed -lnvrtc-builtins -shared -o /home/siyeong/workspace/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
      /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(decode.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `display_func_params' which may bind externally can not be used when making a shared object; recompile with -fPIC
      /usr/local/cuda/lib64/libcufilt.a(decode.o): in function `demangle_bare_function_type(char const*, int, int, a_decode_control_block*)':
      /dvs/p4/build/sw/rel/gpgpu/toolkit/r11.4/compiler/drivers/compiler/edg/EDG_6.2/src/../util/decode.c:4996:(.text._ZL27demangle_bare_function_typePKciiP22a_decode_control_block+0x53c): dangerous relocation: unsupported relocation
      /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(decode.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `display_func_params' which may bind externally can not be used when making a shared object; recompile with -fPIC
      /usr/local/cuda/lib64/libcufilt.a(decode.o): in function `decode_identifier(char const*, char*, unsigned long, int*, int*, unsigned long*)':
      /dvs/p4/build/sw/rel/gpgpu/toolkit/r11.4/compiler/drivers/compiler/edg/EDG_6.2/src/../util/decode.c:8766:(.text._Z17decode_identifierPKcPcmPiS2_Pm+0x24): dangerous relocation: unsupported relocation
      /tmp/pip-build-env-mjzpnmvl/overlay/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      /tmp/pip-build-env-mjzpnmvl/overlay/lib/python3.8/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 0.1.36ubuntu1 is an invalid version and will not be supported in a future release
    opened by Siyeong-Lee 3
  • can't install cumm on agx xavier


    JetPack 5.0.1, CUDA 11.4, Python 3.8

    I failed to install cumm; versions 0.3.0 through 0.3.4 all failed. I tried to compile libcufilt.a (edg_decode) with g++ and nvcc with -fPIC, which succeeded, but after recompiling the error did not disappear.

    export CUMM_DISABLE_JIT="1"
    export CUMM_ARCH_LIST="7.2+PTX"
    sudo pip install -e .

    [39/39] [GCC][Link]/home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
    FAILED: /home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
    g++ @/home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so.rsp -lcudart -lnvrtc -lcufilt -ldl -L "/usr/local/cuda/lib64" -L "/usr/local/cuda/lib64" -Wl,--no-as-needed -lnvrtc-builtins -shared -o /home/jetson/cumm/build/temp.linux-aarch64-cpython-38/cumm/build/core_cc/core_cc.cpython-38-aarch64-linux-gnu.so
    /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(edg_decode.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stderr@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC
    /usr/bin/ld: /usr/local/cuda/lib64/libcufilt.a(edg_decode.o)(.text._Z6getoptiPKPcPKc+0x15c): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stderr@@GLIBC_2.17'
    /usr/bin/ld: final link failed: bad value
    collect2: error: ld returned 1 exit status
    ninja: build stopped: subcommand failed.

    opened by odroidodroid 2
  • can't install cumm on Jetson AGX Xavier (aarch64)


    sudo pip3 install -e .

    python3
    Python 3.6.9 (default, Jan 26 2021, 15:33:00)
    [GCC 8.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>> import cumm
    [1/36] g++ -MMD -MT /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch -MF /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h -o /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch
    FAILED: /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch
    g++ -MMD -MT /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch -MF /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h -o /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h.gch
    /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h:1:9: warning: #pragma once in main file
     #pragma once
             ^~~~
    /home/cicv/cumm/cumm/build/include/tensorview_bind/TensorViewBind.h:55:1: fatal error: can't write PCH file: No space left on device
     } // namespace tensorview_bind
     ^
    compilation terminated.
    [2/36] g++ -MMD -MT /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch -MF /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h -o /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch
    FAILED: /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch
    g++ -MMD -MT /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch -MF /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch.d -I "/home/cicv/cumm/cumm/build/include" -I "/home/cicv/cumm/include" -I "/usr/local/cuda-10.2/targets/aarch64-linux/include" -I "/usr/local/lib/python3.6/dist-packages/pybind11/include" -I "/usr/include/python3.6m" -I "/usr/include/python3.6m" -DTV_CUDA -std=c++14 -O3 -fPIC -x c++-header -c /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h -o /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h.gch
    /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h:1:9: warning: #pragma once in main file
     #pragma once
             ^~~~
    /home/cicv/cumm/cumm/build/include/csrc/arrayref/ArrayPtr.h:100:1: fatal error: can't write PCH file: No space left on device
     } // namespace csrc
     ^
    compilation terminated.
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cicv/cumm/cumm/__init__.py", line 33, in <module>
        verbose=True)
      File "/home/cicv/PCCM-master/pccm/builder/pybind.py", line 141, in build_pybind
        objects_folder=objects_folder)
      File "/home/cicv/.local/lib/python3.6/site-packages/ccimport/core.py", line 182, in ccimport
        linker_to_path=linker_to_path)
      File "/home/cicv/.local/lib/python3.6/site-packages/ccimport/buildtools/writer.py", line 997, in build_simple_ninja
        raise subprocess.CalledProcessError(proc.returncode, cmds)
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    opened by AI-liu 2
  • Does cumm support a dynamic mode to decide to use cutlass or its own gemm based on input shapes?


    Hello! I'm interested in cumm and find it very fast. Reading cumm's code, it seems that cumm provides a GEMM implementation written from scratch while also providing a CUTLASS mode, and it uses its own GEMM by default. In my experiments, CUTLASS is faster for some shapes and cumm's GEMM is faster for others. Does cumm support a dynamic mode that decides between CUTLASS and its own GEMM based on the input shapes?
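    For illustration only, such a dynamic mode could be a shape-keyed heuristic like the sketch below; the function name, threshold, and alignment rule are all made up and are not cumm's API:

```python
def pick_backend(m: int, n: int, k: int) -> str:
    """Illustrative heuristic: choose a GEMM backend from the problem shape."""
    flops = 2 * m * n * k
    aligned = all(d % 8 == 0 for d in (m, n, k))  # tensor-core-friendly
    if flops > 1 << 30 and aligned:
        return "cutlass"   # large aligned shapes: hand off to CUTLASS
    return "cumm"          # small or skinny shapes: use cumm's own GEMM

print(pick_backend(4096, 4096, 4096))  # cutlass
print(pick_backend(128, 32, 64))       # cumm
```

    A real implementation would benchmark both backends per shape and cache the winner rather than hard-coding a threshold.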

    opened by umiswing 1
  • Building cumm for spconv 2x Issue


    Hello, and thanks for the great work. I am trying to build spconv from source following the official instructions, but I am facing some problems. When I run git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e . I face the same issue as #8. As you suggested, I checked out the v0.2.8 tag and successfully built cumm; in Python, import cumm gives no errors. However, when I then install spconv I get the following errors:

    user@workstation:~/workspace/project/external_libs/spconv2/spconv$ python
    Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
    [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import spconv
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/spconv/spconv/__init__.py", line 15, in <module>
        from . import build as _build
      File "/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/spconv/spconv/build.py", line 21, in <module>
        from .constants import PACKAGE_NAME, PACKAGE_ROOT, DISABLE_JIT
      File "/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/spconv/spconv/constants.py", line 19, in <module>
        from cumm.gemm.constants import NVRTCMode
    ImportError: cannot import name 'NVRTCMode' from 'cumm.gemm.constants' (/home/piroli/workspace/projects/SprayNet/external_libs/spconv2/cumm/cumm/gemm/constants.py)
    

    I noticed that v0.2.8 of https://github.com/FindDefinition/cumm/blob/main/cumm/gemm/constants.py does not have the NVRTCMode class, but the main branch does.

    Is there a way to compile the latest version?

    opened by aldipiroli 5
  • AssertionError: assert TENSORVIEW_INCLUDE_PATH.exists()


    I am getting this error when I run import spconv:

    (mon23) ubuntu@ip-:/mnt/data/Mon/second.pytorch/second/spconv$ python
    Python 3.6.9 (default, Mar 15 2022, 13:55:28)
    [GCC 8.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>> import spconv
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/data/Mon/second.pytorch/second/spconv/spconv/__init__.py", line 17, in <module>
        from .core import ConvAlgo, AlgoHint
      File "/mnt/data/Mon/second.pytorch/second/spconv/spconv/core.py", line 15, in <module>
        from cumm.gemm.main import gen_shuffle_params_v2 as gen_shuffle_params, GemmAlgoParams
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/__init__.py", line 21, in <module>
        from .constants import CUMM_CPU_ONLY_BUILD, CUMM_DISABLE_JIT, PACKAGE_NAME
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/constants.py", line 35, in <module>
        assert TENSORVIEW_INCLUDE_PATH.exists()
    AssertionError

    Can you please help me here?

    If I build again using python setup.py bdist_wheel, I get the following error.

    (mon23) ubuntu@ip-:/mnt/data/Mon/second.pytorch/second/spconv$ python setup.py bdist_wheel
    Traceback (most recent call last):
      File "setup.py", line 153, in <module>
        from cumm.gemm.main import GemmMainUnitTest
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/__init__.py", line 21, in <module>
        from .constants import CUMM_CPU_ONLY_BUILD, CUMM_DISABLE_JIT, PACKAGE_NAME
      File "/home/ubuntu/.local/lib/python3.6/site-packages/cumm/constants.py", line 35, in <module>
        assert TENSORVIEW_INCLUDE_PATH.exists()
    AssertionError

    opened by Hetali-Vekariya 0
  • how to convert a pytorch tensor to a cumm.core_cc.tensorview_bind.Tensor?


    Hi, I tried to convert a PyTorch tensor to a cumm.core_cc.tensorview_bind.Tensor using tensor.cpu().detach().numpy() and tensorview.from_numpy(), but the gradient that I need is lost. How can I convert a PyTorch tensor to a cumm.core_cc.tensorview_bind.Tensor without losing the gradient? Thanks.

    opened by Liu-Wendao 0
  • ImportError: cannot import name 'TensorOpParams' from 'cumm.gemm.algospec.core'


    Hi, I built cumm from source on an NVIDIA Jetson Nano board. When I import cumm in Python, no errors appear, but when I import TensorOpParams with from cumm.gemm.algospec.core import TensorOpParams, I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/jetson/build_spconv/spconv3/spconv/spconv/__init__.py", line 15, in <module>
        from . import build as _build
      File "/home/jetson/build_spconv/spconv3/spconv/spconv/build.py", line 24, in <module>
        from spconv.core import SHUFFLE_SIMT_PARAMS, SHUFFLE_VOLTA_PARAMS, SHUFFLE_TURING_PARAMS
      File "/home/jetson/build_spconv/spconv3/spconv/spconv/core.py", line 18, in <module>
        from cumm.gemm.algospec.core import TensorOpParams
    ImportError: cannot import name 'TensorOpParams' from 'cumm.gemm.algospec.core' (/home/jetson/build_spconv/spconv3/cumm/cumm/gemm/algospec/core.py)

    Environment: NVIDIA Jetson Nano Developer Kit (aarch64), Python 3.7.5, cumm 0.3.0, CUDA 10.2

    Please help me!!! Thank you!

    opened by SmBito 2