cuSignal - RAPIDS Signal Processing Library

RAPIDS

Last update: Dec 30, 2022

Related tags

GPU Utilities cusignal

Overview

cuSignal

The RAPIDS cuSignal project leverages CuPy, Numba, and the RAPIDS ecosystem for GPU-accelerated signal processing. In some cases, cuSignal is a direct port of SciPy Signal that uses GPU compute resources via CuPy, but it also contains Numba CUDA and raw CuPy CUDA kernels that provide additional speedups for selected functions. cuSignal achieves its best gains on large signals and compute-intensive functions, but it also emphasizes online processing with zero-copy memory (pinned, mapped) between CPU and GPU.

NOTE: For the latest stable README.md ensure you are on the latest branch.

Quick Start
Documentation
Installation
Software Defined Radio (SDR) Integration
Benchmarking
Contribution Guide
cuSignal Blogs and Talks

Quick Start

cuSignal has an API that mimics SciPy Signal. In-depth functionality is demonstrated in the notebooks section of the repo, but let's examine the workflow for polyphase resampling under multiple scenarios:

SciPy Signal (CPU)

import numpy as np
from scipy import signal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

cx = np.linspace(start, stop, num_samps, endpoint=False) 
cy = np.cos(-cx**2/6.0)

%%timeit
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on 2x Xeon E5-2600 in 2.36 sec.
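
As a quick sanity check on the example above, the length of the resampled output can be worked out before running it. The sketch below assumes the convention used by scipy.signal.resample_poly, where the output holds ceil(n * up / down) samples; the helper name resample_poly_length is hypothetical and is not part of SciPy or cuSignal.

```python
def resample_poly_length(num_samps: int, up: int, down: int) -> int:
    """Expected output length of a polyphase resample by up/down.

    Hypothetical helper; assumes the output holds ceil(num_samps * up / down)
    samples. Integer arithmetic avoids floating-point rounding.
    """
    return (num_samps * up + down - 1) // down


# Values from the quick-start example: 1e8 samples resampled by 2/3.
print(resample_poly_length(int(1e8), 2, 3))  # 66666667
```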

cuSignal with Data Generated on the GPU with CuPy

import cupy as cp
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

gx = cp.linspace(start, stop, num_samps, endpoint=False) 
gy = cp.cos(-gx**2/6.0)

%%timeit
gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 13.8 ms, a roughly 170x speedup over SciPy Signal.

cuSignal with Data Generated on the CPU with Mapped, Pinned (zero-copy) Memory

import cupy as cp
import numpy as np
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False) 
cy = np.cos(-cx**2/6.0)

# Create shared memory between CPU and GPU and load with CPU signal (cy)
gpu_signal = cusignal.get_shared_mem(num_samps, dtype=np.float64)

%%time
# Move data to GPU/CPU shared buffer and run polyphase resampler
gpu_signal[:] = cy
gf = cusignal.resample_poly(gpu_signal, resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 174 ms.

cuSignal with Data Generated on the CPU and Copied to GPU [AVOID THIS FOR ONLINE SIGNAL PROCESSING]

import cupy as cp
import numpy as np
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False) 
cy = np.cos(-cx**2/6.0)

%%time
gf = cusignal.resample_poly(cp.asarray(cy), resample_up, resample_down, window=('kaiser', 0.5))

This code executes on an NVIDIA V100 in 637 ms.
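
To put the four scenarios side by side, the short sketch below computes each speedup relative to the SciPy Signal run, using only the timings quoted above. Note that the last two figures were measured with the host-to-GPU data movement inside the timed cell, so they are not pure kernel times.

```python
# Timings quoted in the quick-start scenarios above, in seconds.
timings = {
    "SciPy Signal (CPU)": 2.36,
    "cuSignal, data generated on GPU": 13.8e-3,
    "cuSignal, mapped pinned memory": 174e-3,
    "cuSignal, CPU data copied to GPU": 637e-3,
}

baseline = timings["SciPy Signal (CPU)"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

With these numbers, the zero-copy path is roughly 3.7 times faster than copying with cp.asarray (637 ms versus 174 ms), which is the motivation for avoiding the copy in online processing.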

Documentation

The complete cuSignal API documentation including a complete list of functionality and examples can be found for both the Stable and Nightly (Experimental) releases.

cuSignal 0.17 API | cuSignal 0.18 Nightly

Installation

cuSignal has been tested on and supports all modern GPUs - from Maxwell to Ampere. While Anaconda is the preferred installation mechanism for cuSignal, developers and Jetson users should follow the source build instructions below. As of cuSignal 0.16, there isn't a cuSignal conda package for aarch64.

Conda, Linux OS

cuSignal can be installed with conda (Miniconda, or the full Anaconda distribution) from the rapidsai channel. If you're using a Jetson GPU, please follow the build instructions below.

For cusignal version == 0.17:

# For CUDA 10.1.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cusignal=0.17 python=3.8 cudatoolkit=10.1

# or, for CUDA 10.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cusignal=0.17 python=3.8 cudatoolkit=10.2

# or, for CUDA 11.0
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cusignal=0.17 python=3.8 cudatoolkit=11.0

For the nightly version of cusignal, currently 0.18a:

# For CUDA 10.1.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cusignal python=3.8 cudatoolkit=10.1.2

# or, for CUDA 10.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cusignal python=3.8 cudatoolkit=10.2

# or, for CUDA 11.0
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cusignal python=3.8 cudatoolkit=11.0

cuSignal has been tested and confirmed to work with Python 3.6, 3.7, and 3.8.

See the Get RAPIDS version picker for more OS and version info.

Source, aarch64 (Jetson Nano, TK1, TX2, Xavier), Linux OS

In cuSignal 0.15 and beyond, we are moving our supported aarch64 Anaconda environment from conda4aarch64 to miniforge. Further, it's assumed that your Jetson device is running a current (>= 4.3) edition of JetPack and contains the CUDA Toolkit.

Clone the repository

# Set the location to cuSignal in an environment variable CUSIGNAL_HOME
export CUSIGNAL_HOME=$(pwd)/cusignal

# Download the cuSignal repo
git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME

Install miniforge and create the cuSignal conda environment:
```
cd $CUSIGNAL_HOME
conda env create -f conda/environments/cusignal_jetson_base.yml
```
Note: Compilation and installation of CuPy can be quite lengthy (~30+ mins), particularly on the Jetson Nano. Please consider setting this environment variable to decrease the CuPy dependency install time:

export CUPY_NVCC_GENERATE_CODE="arch=compute_XX,code=sm_XX"

where XX is your GPU's compute capability. To compile for multiple architectures (e.g., Nano and Xavier), concatenate the arch=... strings with semicolons.

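
The value of CUPY_NVCC_GENERATE_CODE described in the note above can be assembled mechanically. The following is a minimal sketch: the helper name nvcc_generate_code is hypothetical, and the compute capabilities used in the example (5.3 for Jetson Nano, 7.2 for Xavier) are assumptions you should verify against your own device.

```python
def nvcc_generate_code(compute_capabilities):
    """Build a CUPY_NVCC_GENERATE_CODE value from compute capabilities.

    Hypothetical helper: takes strings such as "5.3" and returns
    "arch=compute_53,code=sm_53", joining several entries with semicolons.
    """
    entries = []
    for cc in compute_capabilities:
        digits = cc.replace(".", "")
        entries.append(f"arch=compute_{digits},code=sm_{digits}")
    return ";".join(entries)


# Assumed compute capabilities: 5.3 for Jetson Nano, 7.2 for Xavier.
print(nvcc_generate_code(["5.3", "7.2"]))
```

The printed string can then be exported in the shell before creating the conda environment.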
Activate conda environment

conda activate cusignal-dev

Install cuSignal module

cd $CUSIGNAL_HOME
./build.sh  # install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
            # run ./build.sh -h to print the supported command line options.

Once installed, periodically update environment

cd $CUSIGNAL_HOME
conda env update -f conda/environments/cusignal_jetson_base.yml

Also, confirm unit testing via PyTest

cd $CUSIGNAL_HOME/python
pytest -v  # for verbose mode
pytest -v -k <function name>  # for more select testing

Source, Linux OS

Clone the repository

# Set the location to cuSignal in an environment variable CUSIGNAL_HOME
export CUSIGNAL_HOME=$(pwd)/cusignal

# Download the cuSignal repo
git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME

Download and install Anaconda or Miniconda then create the cuSignal conda environment:

Base environment (core dependencies for cuSignal)
```
cd $CUSIGNAL_HOME
conda env create -f conda/environments/cusignal_base.yml
```
Full environment (including RAPIDS's cuDF, cuML, cuGraph, and PyTorch)
```
cd $CUSIGNAL_HOME
conda env create -f conda/environments/cusignal_full.yml
```
Activate conda environment

conda activate cusignal-dev

Install cuSignal module

cd $CUSIGNAL_HOME
./build.sh  # install cuSignal to $PREFIX if set, otherwise $CONDA_PREFIX
            # run ./build.sh -h to print the supported command line options.

Once installed, periodically update environment

cd $CUSIGNAL_HOME
conda env update -f conda/environments/cusignal_base.yml

Also, confirm unit testing via PyTest

cd $CUSIGNAL_HOME/python
pytest -v  # for verbose mode
pytest -v -k <function name>  # for more select testing

Source, Windows OS

We have confirmed that cuSignal successfully builds and runs on Windows by using CUDA on WSL. Please follow the instructions in the link to install WSL 2 and the associated CUDA drivers. You can then proceed to follow the cuSignal source build instructions, below.

Download and install Anaconda for Windows. In an Anaconda Prompt, navigate to your checkout of cuSignal.

Create cuSignal conda environment

conda create --name cusignal-dev

Activate conda environment

conda activate cusignal-dev

Install cuSignal Core Dependencies
```
conda install numpy numba scipy cudatoolkit pip
pip install cupy-cudaXXX
```
where XXX is the version of the CUDA toolkit you have installed; for CUDA 10.1, for example, the package is cupy-cuda101. See the CuPy Documentation for information on getting Windows wheels for other versions of CUDA.

Install cuSignal module
```
./build.sh
```
[Optional] Run tests. In the cuSignal top-level directory:
```
pip install pytest
pytest
```
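
The cupy-cudaXXX naming used in the dependency step above can be expressed as a small function. This is a sketch that assumes the scheme described in this README (CUDA 10.1 maps to cupy-cuda101); the helper name cupy_wheel_name is hypothetical, and newer CuPy releases use a different naming scheme, so check the CuPy documentation for your version.

```python
def cupy_wheel_name(cuda_version: str) -> str:
    """Map a CUDA toolkit version such as "10.1" to a CuPy wheel name.

    Hypothetical helper; only the major and minor components are used,
    so "10.1.2" and "10.1" give the same result.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"cupy-cuda{major}{minor}"


print(cupy_wheel_name("10.1"))  # cupy-cuda101
```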

Docker - All RAPIDS Libraries, including cuSignal

For cusignal version == 0.16:

# For CUDA 11.0
docker pull rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04

For the nightly version of cusignal

docker pull rapidsai/rapidsai-nightly:cuda11.0-runtime-ubuntu18.04
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai-nightly:cuda11.0-runtime-ubuntu18.04

Please see the RAPIDS Release Selector for more information on supported Python, Linux, and CUDA versions.

SDR Integration

SoapySDR is a "vendor neutral and platform independent" library for software-defined radio usage. When used in conjunction with device (SDR) specific modules, SoapySDR allows for easy command-and-control of radios from Python or C++. To install SoapySDR into an existing cuSignal Conda environment, run:

conda install -c conda-forge soapysdr

A full list of modules specific to your SDR is listed here; some common ones are:

rtlsdr: conda install -c conda-forge soapysdr-module-rtlsdr
Pluto SDR: conda install -c conda-forge soapysdr-module-plutosdr
UHD: conda install -c conda-forge soapysdr-module-uhd

Another popular SDR library, specific to the rtl-sdr, is pyrtlsdr.

For examples using SoapySDR, pyrtlsdr, and cuSignal, please see the notebooks/sdr directory.

Please note, for most rtlsdr devices, you'll need to blacklist the libdvb driver in Linux. To do this, run sudo vi /etc/modprobe.d/blacklist.conf and add blacklist dvb_usb_rtl28xxu to the end of the file. Restart your computer upon completion.

If you have an SDR that isn't listed above (like the LimeSDR), don't worry! You can symbolically link the system-wide Python bindings installed via apt-get to the local conda environment. Please file an issue if you run into any problems.

Benchmarking

cuSignal uses pytest-benchmark to compare performance between CPU and GPU signal processing implementations. To run cuSignal's benchmark suite, navigate to the topmost python directory ($CUSIGNAL_HOME/python) and run:

pytest --benchmark-enable --benchmark-gpu-disable

Benchmarks are disabled by default in setup.cfg, so a plain pytest run performs only correctness checks.

As with the standard pytest tool, the user can use the -v and -k flags for verbose mode and to select a specific benchmark to run. When interpreting the output, we recommend comparing the mean execution time reported.

To reduce the number of columns in the benchmark results table, add --benchmark-columns=LABELS, like --benchmark-columns=min,max,mean. For more information on pytest-benchmark, please visit the Usage Guide.

The --benchmark-gpu-disable parameter disables memory checks from the RAPIDS GPU benchmark tool. Doing so speeds up benchmarking.

If you wish to skip benchmarks of SciPy functions, add -m "not cpu".

Lastly, benchmarks will be executed on local files. Therefore to test recent changes made to source, rebuild cuSignal.
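
The flags described in this section can be combined programmatically, which may be convenient when scripting repeated benchmark runs. The sketch below is an assumption-laden convenience: the helper name benchmark_command and its parameters are hypothetical, and it only assembles the flags named above without running anything.

```python
import shlex


def benchmark_command(keyword=None, skip_cpu=False, columns=None, verbose=False):
    """Assemble a pytest argument list for cuSignal's benchmark suite.

    Hypothetical helper; it only combines the flags described in this section.
    """
    args = ["pytest", "--benchmark-enable", "--benchmark-gpu-disable"]
    if verbose:
        args.append("-v")
    if keyword:
        args += ["-k", keyword]
    if skip_cpu:
        args += ["-m", "not cpu"]
    if columns:
        args.append("--benchmark-columns=" + ",".join(columns))
    return args


cmd = benchmark_command(keyword="upfirdn2d", skip_cpu=True, columns=["mean"])
print(shlex.join(cmd))  # shell-quoted form of the argument list
```

The returned list can be passed to subprocess.run with cwd set to $CUSIGNAL_HOME/python.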

Example

pytest -k upfirdn2d -m "not cpu" --benchmark-enable --benchmark-gpu-disable --benchmark-columns=mean

Output

cusignal/test/test_filtering.py ..................    [100%]


---------- benchmark 'UpFirDn2d': 18 tests -----------
Name (time in us, mem in bytes)         Mean          
------------------------------------------------------
test_upfirdn2d_gpu[-1-1-3-256]      195.2299 (1.0)    
test_upfirdn2d_gpu[-1-9-3-256]      196.1766 (1.00)   
test_upfirdn2d_gpu[-1-1-7-256]      196.2881 (1.01)   
test_upfirdn2d_gpu[0-2-3-256]       196.9984 (1.01)   
test_upfirdn2d_gpu[0-9-3-256]       197.5675 (1.01)   
test_upfirdn2d_gpu[0-1-7-256]       197.9015 (1.01)   
test_upfirdn2d_gpu[-1-9-7-256]      198.0923 (1.01)   
test_upfirdn2d_gpu[-1-2-7-256]      198.3325 (1.02)   
test_upfirdn2d_gpu[0-2-7-256]       198.4676 (1.02)   
test_upfirdn2d_gpu[0-9-7-256]       198.6437 (1.02)   
test_upfirdn2d_gpu[0-1-3-256]       198.7477 (1.02)   
test_upfirdn2d_gpu[-1-2-3-256]      200.1589 (1.03)   
test_upfirdn2d_gpu[-1-2-2-256]      213.0316 (1.09)   
test_upfirdn2d_gpu[0-1-2-256]       213.0944 (1.09)   
test_upfirdn2d_gpu[-1-9-2-256]      214.6168 (1.10)   
test_upfirdn2d_gpu[0-2-2-256]       214.6975 (1.10)   
test_upfirdn2d_gpu[-1-1-2-256]      216.4033 (1.11)   
test_upfirdn2d_gpu[0-9-2-256]       217.1675 (1.11)   
------------------------------------------------------
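
When comparing mean times, the parenthesized number in each row appears to be the ratio of that row's mean to the fastest mean in the table. The sketch below checks this on three rows copied from the output above; the parsing pattern is an assumption about this particular table layout, not a general pytest-benchmark parser.

```python
import re

# A few rows copied from the output above (time in us).
rows = """
test_upfirdn2d_gpu[-1-1-3-256]      195.2299 (1.0)
test_upfirdn2d_gpu[-1-2-3-256]      200.1589 (1.03)
test_upfirdn2d_gpu[0-9-2-256]       217.1675 (1.11)
"""

# name, mean time, and the relative factor in parentheses
pattern = re.compile(r"^(\S+)\s+([\d.]+)\s+\(([\d.]+)\)\s*$")

means = {}
for line in rows.strip().splitlines():
    match = pattern.match(line)
    if match:
        name, mean_us, _relative = match.groups()
        means[name] = float(mean_us)

fastest = min(means.values())
for name, mean_us in means.items():
    print(f"{name}: {mean_us:.1f} us, {mean_us / fastest:.2f}x the fastest")
```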

Contributing Guide

Review the CONTRIBUTING.md file for information on how to contribute code and issues to the project.

cuSignal Blogs and Talks

cuSignal - GPU Accelerating SciPy Signal with Numba and CuPy cuSignal - SciPy 2020 - Recording
Announcement Talk - GTC DC 2019 - Recording | Slides
GPU Accelerated Signal Processing with cuSignal - Adam Thompson - Medium
cuSignal 0.13 - Entering the Big Leagues and Focused on Screamin' Streaming Performance - Adam Thompson - Medium
cuSignal: Easy CUDA GPU Acceleration for SDR DSP and Other Applications - RTL-SDR.com
cuSignal on the AIR-T - Deepwave Digital
Detecting, Labeling, and Recording Training Data with the AIR-T and cuSignal - Deepwave Digital
Signal Processing and Deep Learning - Deepwave Digital
cuSignal and CyberRadio Demonstrate GPU Accelerated SDR - Andrew Back - LimeMicro
Follow the latest cuSignal Announcements on Twitter

Comments

[BUG] cuSignal throws `RawModule` Error with CuPy 6.0 on Windows OS

Hello, I have an Nvidia Tesla K40 in my HP Z440 PC, as shown below:

import numba.cuda
numba.cuda.detect()

Found 2 CUDA devices
id 0    b'Tesla K40c'    [SUPPORTED]
    compute capability: 3.5
    pci device id: 0
    pci bus id: 3
id 1    b'Quadro K620'    [SUPPORTED]
    compute capability: 5.0
    pci device id: 0
    pci bus id: 2
Summary:
    2/2 devices are supported
Out[8]: True

I have the Nvidia CUDA toolkit 11, as shown below:

(base) PS C:\Windows\system32> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:54:10_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.relgpu_drvr455TC455_06.29190527_0
(base) PS C:\Windows\system32>

Below is output of pip list command (Only cusignal relevant parts are posted)

clyent 1.2.2
colorama 0.4.3
comtypes 1.1.7
conda 4.9.2
conda-build 3.18.11
conda-package-handling 1.6.0
conda-verify 3.4.2
contextlib2 0.6.0.post1
cryptography 2.8
cupy 6.0.0
cupy-cuda111 8.1.0

Trying to run the example code from your site:

"""
Created on Thu Nov 12 13:30:18 2020

@author: ljoseph
"""

import cupy as cp
import cusignal

start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3

gx = cp.linspace(start, stop, num_samps, endpoint=False)
gy = cp.cos(-gx**2/6.0)

%%timeit

gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))

Getting the error message below:

runfile('C:/dpd/Python/Try/cusignal_1.py', wdir='C:/dpd/Python/Try')
Traceback (most recent call last):

  File "C:\dpd\Python\Try\cusignal_1.py", line 21, in <module>
    gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))

  File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering\resample.py", line 422, in resample_poly
    y = upfirdn(h, x, up, down, axis)

  File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering\resample.py", line 521, in upfirdn
    return ufd.apply_filter(x, axis)

  File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering_upfirdn_cuda.py", line 214, in apply_filter
    _populate_kernel_cache(out.dtype, k_type)

  File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\filtering_upfirdn_cuda.py", line 139, in populate_kernel_cache
    "cupy" + k_type + "" + str(np_type),

  File "C:\ProgramData\Anaconda3\lib\site-packages\cusignal-0.17.0a0+29.ga3e5293-py3.7.egg\cusignal\utils\helper_tools.py", line 54, in _get_function
    module = cp.RawModule(

AttributeError: module 'cupy' has no attribute 'RawModule'

Could you please help me on this.
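
The pip list output above shows two CuPy distributions installed side by side (cupy 6.0.0 and cupy-cuda111 8.1.0). One possible reading, not confirmed in this thread, is that the older package is the one being imported, since the error reports that cp.RawModule is missing. The sketch below uses only the standard library to list every installed distribution whose name starts with "cupy"; the helper name installed_cupy_distributions is hypothetical.

```python
from importlib import metadata


def installed_cupy_distributions():
    """Return sorted (name, version) pairs for installed 'cupy*' packages.

    Hypothetical diagnostic helper; more than one entry suggests that two
    CuPy builds may be competing for the same 'cupy' import name.
    """
    found = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name and name.lower().startswith("cupy"):
            found.add((name, dist.version))
    return sorted(found)


for name, version in installed_cupy_distributions():
    print(name, version)
```

If two entries are printed, uninstalling both and reinstalling only the wheel that matches the installed CUDA toolkit is a reasonable first thing to try.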
inactive-30d

opened by leyojoseph 33
[WIP] [ENH] Enhance FIR filter methods.
This pull-request implements the following:

Initial conditions for the firfilter method.

Zero-phase shift method called firfilter2 (based on scipy.signal.filtfilt)

FIR filter design tool firwin2 (based on scipy.signal.firwin2)

Initial conditions constructor firfilter_zi.

Bindings for lfilter and lfilter_zi with NotImplementedError for IIR coefficients.

Ignore *.fatbin files.

Dependencies:

This pull-request depends on the cupy.apply_along_axis method that is scheduled to be released in version 9.0.0. Users can test this PR by installing cupy==9.0.0a1 from PyPi.

Todo List:

[x] Implement tests for the new methods.

[ ] Wait for cupy==9.0.0 release.

2 - In Progress conda improvement non-breaking Python
opened by luigifcruz 32
[DOC] Install information for Jetson Nano

I think it might be possible to get this code running on the Jetson Nano, but I'm not able to at the moment.

The conda version for the Nano that I'm trying is Archiconda, https://github.com/Archiconda .

The Nano has Maxwell cores, which should be enough for this. But I don't think all the necessary dependencies have been put together in Archiconda, and I'm not exactly sure where to start. I get a series of PackagesNotFoundError messages saying that I can't install from current channels.

Before I go too much further, happy to provide more details as requested, or to understand whether this is intended to work or known not to work for some more fundamental reason!
good first issue

opened by vielmetti 25
[FEA] Implementation of lfilter.

The lfilter function would be really useful for demodulation. For example, FM demodulation (de-emphasis and stereo).

Scipy Signal: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lfilter.html#scipy.signal.lfilter
2 - In Progress feature request

opened by luigifcruz 24
[REVIEW] Added streams and double/complex64/complex128 functionality

~~I would like to determine why my Numba kernels are running so much slower than my CuPy kernels.~~

~~Using floats (Numba)~~ ~~# @cuda.jit(fastmath=True) # 38 registers - 157-190us~~ ~~# @cuda.jit(void(float32[:], float32[:], int32, int32, int32, int32, int32, float32[:],), fastmath=True) # 48 registers - 160-190us~~ ~~@cuda.jit(void(float32[:], float32[:], int64, int64, int64, int64, int64, float32[:],), fastmath=True) # 38 registers - 157-190us~~ ~~def _numba_upfirdn_1d_float(x, h_trans_flip, up, down, x_shape_a, h_per_phase, padded_len, out):~~

~~Using doubles (Numba)~~ ~~# @cuda.jit(fastmath=True) # 38 registers - 157-190us~~ ~~# @cuda.jit(void(float64[:], float64[:], int32, int32, int32, int32, int32, float64[:],), fastmath=True) # 48 registers - 160-190us~~ ~~@cuda.jit(void(float64[:], float64[:], int64, int64, int64, int64, int64, float64[:],), fastmath=True) # 39 registers - 157-190us~~ ~~def _numba_upfirdn_1d_double(x, h_trans_flip, up, down, x_shape_a, h_per_phase, padded_len, out):~~

~~Using floats (CuPy Raw Kernel)~~ ~~# 21 registers - ~10us~~

~~Using doubles (CuPy Raw Kernel)~~ ~~# 30 registers - ~42us~~

I've moved Systems profiling to a second GPU (no X-server) and I'm getting comparable results.

Time(%)  Total Time  Instances  Average  Minimum  Maximum  Name
-------  ----------  ---------  -------  -------  -------  --------------------------------
   26.0     8041093        100  80410.9    78527    81535  _numba_upfirdn_1d_complex128$244
   23.6     7277675        100  72776.7    71648    74112  _cupy_upfirdn_1d_complex_double
   10.2     3142669        100  31426.7    31232    32128  _numba_upfirdn_1d_double$242
    9.4     2910471        100  29104.7    28960    30207  _numba_upfirdn_1d_complex64$243
    7.7     2379027        100  23790.3    23679    24704  _numba_upfirdn_1d_float$241
    6.7     2077875        100  20778.8    20448    20928  _cupy_upfirdn_1d_double
    4.5     1385173        100  13851.7    13728    13952  _cupy_upfirdn_1d_complex_float
    4.0     1244918        100  12449.2    12352    12512  _cupy_upfirdn_1d_float

There is a significant difference between registers usage with CuPy but not Numba.

These times are from Nsight Systems, using the attached python script. test.py.txt

Also, is it possible to overload Numba kernels? Supposedly it is with https://github.com/numba/numba/issues/431. But I'm unable to make it work.~~

I've added qdrep and sqlite files. It looks like CuPy compiles are getting cached in the same manner as Numba. cusignal.zip

As I continue to optimize, I'm noticing that even though the CuPy kernels may be 2x faster than Numba, the required launch time of the CuPy kernel takes 2x longer than Numba, making Numba the favorable option... Screenshot below. It's quite possible I'm doing something sub-optimal.

opened by mnicely 23

[QST] Is lazy evaluation used ?

I'm trying to do a simple benchmarking of cuSignal vs signal on correlation. It seems that the correlate2d completes immediately, and the buffer transfer takes 2.8 seconds. Makes no sense for buffer size of 80 MBytes. Could it be the correlation is evaluated lazily only when buffer transfer is requested ?

%time signal_t_gpu = cp.asarray(signal_t)
%time pulse_t_2d_gpu = cp.asarray(pulse_t_2d)

%time corr_cusig = cusignal.correlate2d(signal_t_gpu, pulse_t_2d_gpu, mode='valid')

%time corr_cusig_np = corr_cusig.get()
%time corr_cusig_np2 = cp.asnumpy(corr_cusig)
%time corr_cusig_np3 = cp.asnumpy(corr_cusig)

and getting :

Wall time: 12 ms
Wall time: 996 µs
Wall time: 0 ns
Wall time: 2.79 s
Wall time: 30 ms
Wall time: 30 ms
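
One possible explanation, offered here as an assumption rather than an answer from this thread: CuPy launches GPU kernels asynchronously, so a call can return before the GPU has finished, and the wait then shows up in the first operation that needs the result, such as .get(). A timing helper that synchronizes before stopping the clock makes the cost land on the right call. The helper name timed is hypothetical, and the demonstration runs on the CPU so that it works without a GPU.

```python
import time


def timed(fn, *args, sync=None, **kwargs):
    """Run fn and return (result, elapsed_seconds).

    If sync is given, it is called before the clock stops so that any
    work queued asynchronously by fn is included in the measurement.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return result, elapsed


# CPU-only demonstration so the sketch runs without a GPU.
total, seconds = timed(sum, range(1_000_000))
print(total, f"{seconds:.6f} s")

# With CuPy installed, the same helper could wrap the call from the question:
#   sync = cp.cuda.Stream.null.synchronize
#   corr, seconds = timed(cusignal.correlate2d, signal_t_gpu, pulse_t_2d_gpu,
#                         mode='valid', sync=sync)
```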

question

opened by flytrex-vadim 22

[PR-REVIEW] GPU Accelerated SigMF Reader/Writer
Info

This PR is to solve #117.

Initial commit is skeleton code that does the following

Uses mmap to load a binary to RAM

Copy to DRAM

Parse binary to new location

TODO:

Add helper function to encapsulate read_bin and parse_bin based on format type.

3 - Ready for Review
opened by mnicely 20

[BUG] channelize_poly() CUDA_ERROR_FILE_NOT_FOUND after Jetson conda install (0.16)

Describe the bug channelize_poly() function is not finding cusignal/filtering/_channelizer.fatbin on Jetson Nano.

Steps/Code to reproduce bug

import cupy as cp
import cusignal
cusignal.filtering.channelize_poly(cp.random.randn(128), cp.ones((16,))/16, 4)

Expected behavior channelize_poly returns channelized output.

Environment details (please complete the following information):

Environment location: [Jetson Nano SBC]
Method of cuSignal install: [conda] Following the Jetson install instructions to install a clone of branch-0.16:

git clone https://github.com/rapidsai/cusignal.git $CUSIGNAL_HOME

cd $CUSIGNAL_HOME
conda env create -f conda/environments/cusignal_jetson_base.yml

conda activate cusignal-dev

cd $CUSIGNAL_HOME/python
python setup.py install

Running cusignal inside a python script (cuspec.py in stack trace).

Additional context Ran test suite, and actually encountered similar errors (CUDA_ERROR_FILE_NOT_FOUND). I wasn't familiar with the test output so I wrote it off at the time. Do I need to build from source?

Stack trace:

Traceback (most recent call last):
  File "/home/evanmayer/github/rtlobs/cuspec.py", line 68, in <module>
    cusignal.filtering.channelize_poly(cp.random.randn(128), cp.ones((16,))/16, 4)
  File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/filtering/filtering.py", line 722, in channelize_poly
    _channelizer(x, h, y, n_chans, n_taps, n_pts)
  File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/filtering/_channelizer_cuda.py", line 84, in _channelizer
    _populate_kernel_cache(np_type, k_type)
  File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/filtering/_channelizer_cuda.py", line 54, in _populate_kernel_cache
    _cupy_kernel_cache[(str(np_type), k_type)] = _get_function(
  File "/home/evanmayer/miniforge3/envs/cusignal-dev/lib/python3.8/site-packages/cusignal-0.16.0a0+160.g67a650e-py3.8.egg/cusignal/utils/helper_tools.py", line 40, in _get_function
    module = cp.RawModule(path=dir + fatbin,)
  File "cupy/core/raw.pyx", line 260, in cupy.core.raw.RawModule.__init__
  File "cupy/cuda/function.pyx", line 191, in cupy.cuda.function.Module.load_file
  File "cupy/cuda/function.pyx", line 195, in cupy.cuda.function.Module.load_file
  File "cupy/cuda/driver.pyx", line 231, in cupy.cuda.driver.moduleLoad
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_FILE_NOT_FOUND: file not found
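
Since the trace ends in cp.RawModule failing to load a file by path, a first diagnostic step could be to check whether any .fatbin files exist in the installed package at all. The sketch below uses only the standard library and does not import cuSignal; the helper name find_fatbins is hypothetical, and an empty result would only suggest, not prove, that the kernels were never built or copied during installation.

```python
from importlib import util
from pathlib import Path


def find_fatbins(package_name):
    """Return sorted paths of all .fatbin files inside an installed package.

    Hypothetical diagnostic helper; returns an empty list if the package
    cannot be found or ships no .fatbin files.
    """
    try:
        spec = util.find_spec(package_name)
    except (ImportError, ValueError):
        return []
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        found.extend(Path(location).rglob("*.fatbin"))
    return sorted(found)


print(find_fatbins("cusignal"))
```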

bug ? - Needs Triage

opened by evanmayer 19

[BUG] [Jetson Nano Conda install hangs on installing pip dependencies]
Describe the bug When creating the conda environment on a Jetson Nano Development kit, the installation proceeds until installing pip dependencies, where it hangs indefinitely.

Steps/Code to reproduce bug Fresh Jetpack install on Jetson Nano board. Follow instructions for building from source on Jetson Nano exactly.

Expected behavior Expected to install the environment.

Environment details (please complete the following information):

Environment location: Jetson Nano board with Jetpack SDK

Method of cuSignal install: conda (specifically miniforge)

I've never used conda before, so I don't know exactly what logs are needed, but this is the last output from the install before it hangs: Installing pip dependencies: ...working...
2 - In Progress doc
opened by emeldar 18

[PR-REVIEW] Load fatbin at runtime

This PR investigates loading kernels via PTX and cubins instead of compiling them at runtime.

The idea is to skip the process from Source Code to PTX or cubin. This should eliminate the need to precompile desired kernels, making the UI a little more friendly.

There are pros and cons to both PTX and cubin.

PTX

Pro: We can choose a single architecture (default 3.0) and any hardware will JIT based on Compute Capability.

Con: This can leave performance on the table and can be slower than cubins.

Cubin

Pro: Optimal performance and (slightly) faster load because it skips the JIT.

Con: Requires a cubin for each supported architecture (i.e. sm30, sm35, sm50, sm52, sm60, sm62, sm70, sm72, sm75, sm80).

Each method requires more space for files and the cu file (if we decide to load it).

Currently using the PTX method.

Anecdotal results: PTX and cubin are ~18x faster on the first pass.

SOURCE CODE
Time(%)      Total Time   Instances         Average         Minimum         Maximum  Range               
-------  --------------  ----------  --------------  --------------  --------------  --------------------
   78.3       259557232         100       2595572.3         2163073         2954536  gpu_lombscargle_4   
   18.9        62547903           1      62547903.0        62547903        62547903  gpu_lombscargle_0 <-- 
    1.0         3339638           1       3339638.0         3339638         3339638  gpu_lombscargle_1   
    0.9         2960195           1       2960195.0         2960195         2960195  gpu_lombscargle_2   
    0.9         2951219           1       2951219.0         2951219         2951219  gpu_lombscargle_3
PTX
Time(%)      Total Time   Instances         Average         Minimum         Maximum  Range               
-------  --------------  ----------  --------------  --------------  --------------  --------------------
   95.2       247349403         100       2473494.0         2121631         2904074  gpu_lombscargle_4   
    1.3         3447313           1       3447313.0         3447313         3447313  gpu_lombscargle_0 <--  
    1.2         3234977           1       3234977.0         3234977         3234977  gpu_lombscargle_1   
    1.1         2904661           1       2904661.0         2904661         2904661  gpu_lombscargle_2   
    1.1         2902191           1       2902191.0         2902191         2902191  gpu_lombscargle_3
CUBIN
Time(%)      Total Time   Instances         Average         Minimum         Maximum  Range               
-------  --------------  ----------  --------------  --------------  --------------  --------------------
   95.2       239095813         100       2390958.1         2065998         2840041  gpu_lombscargle_4   
    1.3         3325468           1       3325468.0         3325468         3325468  gpu_lombscargle_0 <--  
    1.3         3163933           1       3163933.0         3163933         3163933  gpu_lombscargle_1   
    1.1         2828210           1       2828210.0         2828210         2828210  gpu_lombscargle_2   
    1.1         2823180           1       2823180.0         2823180         2823180  gpu_lombscargle_3
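
The ~18x figure can be reproduced from the rows marked with an arrow in the three tables above, which correspond to the first call (gpu_lombscargle_0). The sketch below uses the Total Time values exactly as printed, in whatever unit the profiler reported.

```python
# Total Time of the first call (gpu_lombscargle_0) from each table above.
first_call_time = {
    "source": 62547903,
    "ptx": 3447313,
    "cubin": 3325468,
}

ratios = {
    method: first_call_time["source"] / first_call_time[method]
    for method in ("ptx", "cubin")
}

for method, ratio in ratios.items():
    print(f"{method}: {ratio:.1f}x faster than compiling from source")
```

The later calls (gpu_lombscargle_1 through gpu_lombscargle_4) are nearly identical across the three tables, so the gain is confined to the first pass.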

3 - Ready for Review

opened by mnicely 17

[REVIEW] Use Numba 0.49+ API where required

https://github.com/numba/numba/pull/5197 refactors many of Numba's submodules. Mirror the required import changes in cusignal.

This keeps cusignal in sync with numba master and the 0.49 release candidate but breaks compatibility with the current numba release (0.48).

This should go in eventually but also needs to make sure we keep compatibility with whatever version of numba we're getting in the conda channels selected by our packaging (and certainly updates the minimum required version) - need help with this part.
5 - Ready to Merge

opened by dicta 15
[DOC] Links need fix in README
Report incorrect documentation

Location of incorrect documentation Table of Contents in README.md

Describe the problems or issues found in the documentation The provided hyperlinks (anchors) within README.md file are broken/not updated.

Steps taken to verify documentation is incorrect List any steps you have taken: Checked by clicking the links, verified with latest links.

Suggested fix for documentation In the Table of Contents "Installation" section in README.md:

Replace anchor tag for Conda: Linux OS installation from #conda-linux-os to #conda-linux-os-preferred.

Replace anchor tag for Source, aarch64 (Jetson Nano, TK1, TX2, Xavier, AGX Clara DevKit), Linux OS installation from #source-aarch64-jetson-nano-tk1-tx2-xavier-linux-os to #source-aarch64-jetson-nano-tk1-tx2-xavier-agx-clara-devkit-linux-os.

doc ? - Needs Triage
opened by AGoyal0512 0
[gpuCI] Forward-merge branch-22.12 to branch-23.02 [skip gpuci]

Forward-merge triggered by push to branch-22.12 that creates a PR to keep branch-23.02 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge.

opened by GPUtester 1
Add differentiable polyphase resampler

closes #491

Adds a differentiable polyphase resampler in a new diff submodule. Includes some basic unit tests and a demonstration of how to incorporate the new layer in a Pytorch sequential model.
improvement non-breaking Python inactive-30d

opened by mbolding3 8
[FEA] pytorch polyphase resampler

Opening this issue to facilitate discussion. A Pytorch compatible wrapper around cusignal's polyphase resampler is here (WIP).

@awthomp Where is a good place to put the source? Currently putting it in branch 22.08 in a new sub-directory python/cusignal/pytorch but curious if a place already exists.

Also the backward method works (passes gradcheck, uses cusignal correlate) but is not optimized. It can be re-implemented most likely as another resample_poly call (since it upscales and convolves). That is on the to-do list.
2 - In Progress feature request inactive-30d

opened by mbolding3 12

[BUG] FutureWarning from CuPy on `resample`

Describe the bug: When resampling a CuPy array, we get a FutureWarning instructing us not to use a non-tuple sequence for multidimensional indexing:

/opt/conda/envs/rapids/lib/python3.9/site-packages/cusignal-22.4.0a0+g8878bf7-py3.9.egg/cusignal/filtering/resample.py:269: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[cupy.array(seq)]`, which will result either in an error or a different result.
  Y[sl] = X[sl]

Steps/Code to reproduce bug: With the latest 22.06 conda nightlies:

import cupy as cp
import cusignal

start = 0
stop = 10
num = int(1e8)
resample_num = int(1e5)

gx = cp.linspace(start, stop, num, endpoint=False)
gy = cp.cos(-gx**2/6.0)

gf = cusignal.resample(gy, resample_num)

Expected behavior: Would not expect this warning to appear when resampling.
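
The warning text itself points at the remedy: convert the list of slices to a tuple before indexing. The sketch below shows that pattern in isolation. It uses NumPy so it runs without a GPU, on the assumption that CuPy follows the same indexing convention as the warning describes. The function name copy_leading is hypothetical, and this is not the actual code in cusignal/filtering/resample.py.

```python
import numpy as np

def copy_leading(X, n, axis=-1):
    """Copy the first `n` entries of X along `axis` into a zero array of the same shape."""
    Y = np.zeros_like(X)
    sl = [slice(None)] * X.ndim      # one slice per dimension, built as a list
    sl[axis] = slice(0, n)
    # Index with a tuple; Y[sl] with a plain list is what triggers the warning.
    Y[tuple(sl)] = X[tuple(sl)]
    return Y

X = np.arange(12.0).reshape(3, 4)
Y = copy_leading(X, 2)
print(Y)
```

Building the index as a list stays convenient because a list is mutable; only the final indexing expression needs the tuple conversion.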

Environment details (please complete the following information):

Environment location: bare-metal
Method of cuSignal install: conda

Additional context: Came across this issue while testing the filtering examples notebook.

doc ? - Needs Triage inactive-30d inactive-90d

opened by charlesbluca 2

Releases(v22.12.00)

v22.12.00(Dec 8, 2022)
📖 Documentation

Readme update (#522) @awthomp

Revisit WSL install instructions: additional dependency needed for pytest (#514) @evanmayer

🛠️ Improvements

Flake8 migrated to GitHub and that broke some pre-commit checks (#518) @jacobtomlinson

fix filtering.resample output for even values of num parameter (#517) @mattkinsey

Use rapidsai CODE_OF_CONDUCT.md (#516) @bdice

Update channel priority (#515) @bdice

Remove stale labeler (#512) @raydouglass

Add option for smaller dataset in IO notebook (#473) @charlesbluca

v23.02.00a(Dec 7, 2022)
🔗 Links

Development Branch

Compare with main branch

🛠️ Improvements

Add GitHub Actions Workflows (#528) @bdice

Enable copy_prs. (#525) @bdice

v22.10.01(Oct 18, 2022)
🚀 New Features

Allow cupy 11 (#505) @galipremsagar

🛠️ Improvements

Unpin numpy in cusignal (#510) @galipremsagar

v22.10.00(Oct 12, 2022)
🚀 New Features

Allow cupy 11 (#505) @galipremsagar

v22.08.00(Aug 17, 2022)
📖 Documentation

Switch to using common js & css (#499) @galipremsagar

Refresh README (#487) @awthomp

🛠️ Improvements

Revert "Allow CuPy 11" (#497) @galipremsagar

Allow CuPy 11 (#494) @jakirkham

v22.06.00(Jun 7, 2022)
🛠️ Improvements

Simplify conda recipes (#484) @Ethyling

Use conda to build python packages during GPU tests (#480) @Ethyling

Fix pinned buffer IO issues (#479) @charlesbluca

Use pre-commit to enforce Python style checks (#478) @charlesbluca

Extend get_pinned_mem to work with more dtype / shapes (#477) @charlesbluca

Add/fix installation sections for SDR example notebooks (#476) @charlesbluca

Use conda compilers (#461) @Ethyling

Build packages using mambabuild (#453) @Ethyling

v22.04.00(Apr 6, 2022)
🐛 Bug Fixes

Fix docs builds (#455) @ajschmidt8

📖 Documentation

Fixes a list of minor errors in the example codes #457 (#458) @sosae0

🚀 New Features

adding complex parameter to chirp and additional tests (#450) @mnicely

🛠️ Improvements

Temporarily disable new ops-bot functionality (#465) @ajschmidt8

Add .github/ops-bot.yaml config file (#463) @ajschmidt8

correlation lags function (#459) @sosae0

v22.02.00(Feb 2, 2022)
🛠️ Improvements

Allow CuPy 10 (#448) @jakirkham

Speedup: Single-precision hilbert, resample, and lfilter_zi. (#447) @luigifcruz

Add Nemo Machine Translation to SDR Notebook (#445) @awthomp

Add citrinet and fm_demod cusignal function to notebook (#444) @awthomp

Add FM Demodulation to cuSignal (#443) @awthomp

Revamp Offline RTL-SDR Notebook - FM Demod and NeMo Speech to Text (#442) @awthomp

Bypass Covariance Matrix Calculation if Supplied in MVDR Beamformer (#437) @awthomp

v21.12.00(Dec 8, 2021)
🐛 Bug Fixes

Data type conversion for cwt. (#429) @shevateng0

Fix indexing error in CWT (#425) @awthomp

📖 Documentation

Use PyData Sphinx Theme for Generated Documentation (#436) @cmpadden

Doc fix for FIR Filters (#426) @awthomp

🛠️ Improvements

Fix Changelog Merge Conflicts for branch-21.12 (#439) @ajschmidt8

remove use_numba from notebooks - deprecated (#433) @awthomp

Allow complex wavelet output for morlet2 (#428) @shevateng0

v21.10.00(Oct 7, 2021)
🐛 Bug Fixes

Change type check to use isinstance instead of str compare (#415) @jrc-exp

📖 Documentation

Fix typo in readme (#413) @awthomp

Add citation file (#412) @awthomp

README overhaul (#411) @awthomp

🛠️ Improvements

Fix Forward-Merge Conflicts (#417) @ajschmidt8

Adding CFAR (#409) @mbolding3

support space in workspace (#349) @jolorunyomi

v21.08.00(Aug 4, 2021)
🐛 Bug Fixes

Remove pytorch from cusignal CI/CD (#404) @awthomp

fix firwin bug where fs is ignored if nyq provided (#400) @awthomp

Fixed imaginary part being removed in delay mode of ambgfun (#397) @cliffburdick

🛠️ Improvements

mvdr perf optimizations and addition of elementwise divide kernel (#403) @awthomp

Update sphinx config (#395) @ajschmidt8

Add Ambiguity Function (ambgfun) (#393) @awthomp

Fix 21.08 forward-merge conflicts (#392) @ajschmidt8

Adding MVDR (Capon) Beamformer (#383) @awthomp

Fix merge conflicts (#379) @ajschmidt8

v21.06.00(Jun 9, 2021)
🛠️ Improvements

Perf Improvements for UPFIRDN (#378) @mnicely

Perf Improvements to SOS Filter (#377) @mnicely

Update environment variable used to determine cuda_version (#376) @ajschmidt8

Update CHANGELOG.md links for calver (#373) @ajschmidt8

Merge branch-0.19 into branch-21.06 (#372) @ajschmidt8

Update docs build script (#369) @ajschmidt8

v0.19.0(Apr 21, 2021)
🐛 Bug Fixes

Fix bug in casting array to cupy (#340) @awthomp

🚀 New Features

Add morlet2 (#336) @mnicely

Increment Max CuPy Version in CI (#328) @awthomp

🛠️ Improvements

Add cusignal source dockerfile (#343) @awthomp

Update min scipy and cupy versions (#339) @awthomp

Add Taylor window (#338) @mnicely

Skip online signal processing tools testing (#337) @awthomp

Add 2D grid-stride loop to fix BUG 295 (#335) @mnicely

Update Changelog Link (#334) @ajschmidt8

Fix bug in bin_reader that ignored dtype (#333) @awthomp

Remove six dependency (#332) @awthomp

Prepare Changelog for Automation (#331) @ajschmidt8

Update 0.18 changelog entry (#330) @ajschmidt8

Fix merge conflicts in #315 (#316) @ajschmidt8

v0.18.0(Feb 24, 2021)
Bug Fixes 🐛

Fix labeler.yml GitHub Action (#301) @ajschmidt8

Fix Branch 0.18 merge 0.17 (#298) @BradReesWork

Documentation 📖

Add WSL instructions for cuSignal Windows Builds (#323) @awthomp

Fix Radar API Docs (#311) @awthomp

Update cuSignal Documentation to Include Radar Functions (#309) @awthomp

Specify CuPy install time on Jetson Platform (#306) @awthomp

Update README to optimize CuPy build time on Jetson (#305) @awthomp

New Features 🚀

Include scaffolding for new radar/phased array module and add new pulse compression feature (#300) @awthomp

Improvements 🛠️

Update stale GHA with exemptions & new labels (#321) @mike-wendt

Add GHA to mark issues/prs as stale/rotten (#319) @Ethyling

Prepare Changelog for Automation (#314) @ajschmidt8

Auto-label PRs based on their content (#313) @jolorunyomi

Fix typo in convolution jupyter notebook example (#310) @awthomp

Add Pulse-Doppler Processing to radartools (#307) @awthomp

Create labeler.yml (#299) @jolorunyomi

Clarify GPU timing in E2E Jupyter Notebook (#297) @awthomp

Bump cuSignal Version (#296) @awthomp

v0.17.0(Dec 10, 2020)

v0.17.0 Release
v0.16.0(Oct 21, 2020)

v0.16.0 Release
v0.15.0(Sep 16, 2020)

v0.15.0 Release

Owner

RAPIDS

Open GPU Data Science

GitHub

cuGraph - RAPIDS Graph Analytics Library

cuGraph - GPU Graph Analytics The RAPIDS cuGraph library is a collection of GPU accelerated graph algorithms that process data found in GPU DataFrames

1.2k Jan 1, 2023

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.

A lightweight, GPU accelerated, SQL engine built on the RAPIDS.ai ecosystem. Get Started on app.blazingsql.com Getting Started | Documentation | Examp

1.8k Jan 2, 2023

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

NVIDIA DALI The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provi

4.2k Jan 8, 2023

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases.

Vulkan Kompute The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabl

The Institute for Ethical Machine Learning

1k Dec 26, 2022

A NumPy-compatible array library accelerated by CUDA

6.6k Jan 5, 2023

ArrayFire: a general purpose GPU library.

ArrayFire is a general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures i

4k Dec 29, 2022

cuDF - GPU DataFrame Library

cuDF - GPU DataFrames NOTE: For the latest stable README.md ensure you are on the main branch. Resources cuDF Reference Documentation: Python API refe

5.2k Jan 8, 2023

Python 3 Bindings for NVML library. Get NVIDIA GPU status inside your program.

py3nvml Documentation also available at readthedocs. Python 3 compatible bindings to the NVIDIA Management Library. Can be used to query the state of

212 Jan 4, 2023

Python 3 Bindings for the NVIDIA Management Library

====== pyNVML ====== *** Patched to support Python 3 (and Python 2) *** ------------------------------------------------ Python bindings to the NVID

95 Jan 1, 2023

Library for faster pinned CPU <-> GPU transfer in Pytorch

SpeedTorch Faster pinned CPU tensor <-> GPU Pytorch variable transfer and GPU tensor <-> GPU Pytorch variable transfer, in certain cases. Update 9-29-1

657 Dec 19, 2022

signal-cli-rest-api is a wrapper around signal-cli and allows you to interact with it through http requests

signal-cli-rest-api signal-cli-rest-api is a wrapper around signal-cli and allows you to interact with it through http requests. Features register/ver

31 Dec 9, 2022

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT Speech@FIT Reverb Database.

Add_noise_and_rir_to_speech The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal

7 Oct 30, 2022

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Summary Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the pack

1k Jan 9, 2023
