A NumPy-compatible array library accelerated by CUDA

CuPy

Last update: Jan 5, 2023

Overview

CuPy: A NumPy-compatible array library accelerated by CUDA

CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions that operate on it.

Installation

Wheels (precompiled binary packages) are available for Linux (x86_64) and Windows (amd64). Choose the right package for your platform.

Platform	Command
CUDA 9.0	`pip install cupy-cuda90`
CUDA 9.2	`pip install cupy-cuda92`
CUDA 10.0	`pip install cupy-cuda100`
CUDA 10.1	`pip install cupy-cuda101`
CUDA 10.2	`pip install cupy-cuda102`
CUDA 11.0	`pip install cupy-cuda110`
CUDA 11.1	`pip install cupy-cuda111`
CUDA 11.2	`pip install cupy-cuda112`
ROCm 4.0	`pip install cupy-rocm-4-0` (experimental; see docs for details)

See the Installation Guide if you are using Conda/Anaconda or building from source.
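The package names in the table follow a simple rule (the CUDA version with the dot removed), which can be sketched as a small helper. The function name `wheel_for_cuda` is hypothetical and not part of CuPy; it only encodes the CUDA rows listed in the table and rejects anything else.

```python
# Hypothetical helper (not part of CuPy): map a CUDA version string to the
# wheel name from the installation table.
SUPPORTED_CUDA = ("9.0", "9.2", "10.0", "10.1", "10.2", "11.0", "11.1", "11.2")


def wheel_for_cuda(version: str) -> str:
    """Return the pip package name for a CUDA version listed in the table."""
    if version not in SUPPORTED_CUDA:
        raise ValueError(f"no wheel listed for CUDA {version}")
    # Example: "10.2" becomes "cupy-cuda102".
    return "cupy-cuda" + version.replace(".", "")


if __name__ == "__main__":
    print("pip install", wheel_for_cuda("11.2"))
```

The ROCm package is left out on purpose, since its name follows a different pattern and is marked experimental.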

Run on Docker

Use the NVIDIA Container Toolkit to run the CuPy image with GPU support.

$ docker run --gpus all -it cupy/cupy

License

MIT License (see LICENSE file).

CuPy is designed based on NumPy's API and SciPy's API (see docs/LICENSE_THIRD_PARTY file).

CuPy is being maintained and developed by Preferred Networks Inc. and community contributors.

Reference

Ryosuke Okuta, Yuya Unno, Daisuke Nishino, Shohei Hido and Crissman Loomis. CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations. Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), (2017). http://learningsys.org/nips17/assets/papers/paper_16.pdf

@inproceedings{cupy_learningsys2017,
  author       = "Okuta, Ryosuke and Unno, Yuya and Nishino, Daisuke and Hido, Shohei and Loomis, Crissman",
  title        = "CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations",
  booktitle    = "Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS)",
  year         = "2017",
  url          = "http://learningsys.org/nips17/assets/papers/paper_16.pdf"
}

Comments

Build the `cupy.cuda.cub` module by default

Close #3078. Close #3075. Close #3108. Close #3507.

This PR includes all the CUB v1.8.0 headers (see notes below) so that the cupy.cuda.cub module can be built by default. This also avoids the need of documenting how to build it.

Note that the CUB_PATH env variable is no longer needed.

TODO:

[ ] ~~Write a unit test for cupy.cuda.cub~~ (related: #2579) UPDATE: see #2598
[x] Discuss with CuPy core devs whether we should set cupy.cuda.cub_enabled = False during import for backward compatibility, and whether the CUB_DISABLED env variable is still needed.

Note:

The CUB project is open source under the BSD license, so redistributing its source code is allowed if the license file is included.
Since CUB is a header-only library, many of its files are actually needed. Below is the list of CUB headers included when compiling cupy/cuda/cupy_cub.cu (obtained via cpp -MM -I(...) cupy/cuda/cupy_cub.cu):

cub/agent/agent_reduce.cuh
cub/agent/agent_reduce_by_key.cuh
cub/agent/agent_scan.cuh
cub/agent/single_pass_scan_operators.cuh
cub/block/block_discontinuity.cuh
cub/block/block_exchange.cuh
cub/block/block_load.cuh
cub/block/block_raking_layout.cuh
cub/block/block_reduce.cuh
cub/block/block_scan.cuh
cub/block/block_store.cuh
cub/block/specializations/block_reduce_raking.cuh
cub/block/specializations/block_reduce_raking_commutative_only.cuh
cub/block/specializations/block_reduce_warp_reductions.cuh
cub/block/specializations/block_scan_raking.cuh
cub/block/specializations/block_scan_warp_scans.cuh
cub/device/device_reduce.cuh
cub/device/device_segmented_reduce.cuh
cub/device/dispatch/dispatch_reduce.cuh
cub/device/dispatch/dispatch_reduce_by_key.cuh
cub/device/dispatch/dispatch_scan.cuh
cub/grid/grid_even_share.cuh
cub/grid/grid_mapping.cuh
cub/grid/grid_queue.cuh
cub/iterator/arg_index_input_iterator.cuh
cub/iterator/cache_modified_input_iterator.cuh
cub/iterator/constant_input_iterator.cuh
cub/thread/thread_load.cuh
cub/thread/thread_operators.cuh
cub/thread/thread_reduce.cuh
cub/thread/thread_scan.cuh
cub/thread/thread_store.cuh
cub/util_arch.cuh
cub/util_debug.cuh
cub/util_device.cuh
cub/util_macro.cuh
cub/util_namespace.cuh
cub/util_ptx.cuh
cub/util_type.cuh
cub/warp/specializations/warp_reduce_shfl.cuh
cub/warp/specializations/warp_reduce_smem.cuh
cub/warp/specializations/warp_scan_shfl.cuh
cub/warp/specializations/warp_scan_smem.cuh
cub/warp/warp_reduce.cuh
cub/warp/warp_scan.cuh

Since so many of them are needed, we might as well copy the entire cub folder for future extensibility and easier maintainability.

CUB v1.8.0 is quite stable (no new release since Feb 2018), suitable to be a dependency.

cat:enhancement no-compat

opened by leofang 96

Support cuFFT callbacks

Close #4105.

UPDATE: This is a very unusual PR due to the static linking requirement, please read on.

Design considerations

The cuFFT static (libcufft_static.a) and shared (libcufft.so) libraries cannot be mixed and matched: for example, one cannot generate a plan handle using cufftCreate from the shared library, and then call cufftXtSetCallback from the static library on it. This leads to the distinction between a "static" plan and a "shared" plan, based on the libraries they are associated with.

We want to be able to do things in the Python space, so we would like to reuse the cupy.cuda.cufft module as much as possible.

The load/store callbacks have to be visible at the module build time; it's not possible to retrieve a pointer to a device function at runtime and "link" it against libcufft_static.a. This also means that in the distributed wheel or Conda package we cannot link against the static library: doing so would serve no purpose (callbacks can only be supplied at runtime) and would only inflate the file size.

If we don't do things correctly, there'd be a bunch of undefined symbols leaking into the Python module, causing import cupy to fail; see the discussion in #4105 for detail.

Approach

For every distinct pair of load and store callbacks (either of them could be None but not both), we generate a stub containing the callback implementations, copy cupy/cuda/cufft.pyx and friends to a temporary directory, and compile at runtime a new Python module cupy_callback_<hash>.cpython-XXm-x86_64-linux-gnu.so in which all things are statically linked together. This is basically a static version of cupy.cuda.cufft, from which we can generate static plans, set the callbacks, and execute them. The generated modules are cached on disk (default: ~/.cupy/callback_cache) and can be reused for all kinds of transforms as long as they use the same pair of callbacks. To avoid collisions, the compile time options along with the callbacks are used to generate a distinct <hash> string (XX stands for the Python version).

This approach is backward compatible in that the cupy.cuda.cufft module is still linked to cuFFT dynamically and continues to function as usual. Also, the static plans can be cached in the new cuFFT plan cache just like the shared plans. The only two drawbacks of this approach are:

The generated Python module can be fat (on my system it's 159M each...)

Runtime compilation is slow on the first use of each callback pair.

See the docstring in cupy.fft.config.set_cufft_callbacks for more detail.
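To illustrate how a distinct `<hash>` string could be derived from the callback pair and the compile-time options, here is a minimal sketch. It is not the code in this PR: the function name `callback_module_name`, the choice of SHA-1, and the way the inputs are joined are all assumptions made for illustration.

```python
import hashlib


def callback_module_name(cb_load, cb_store, compile_options):
    """Hypothetical: build a module name from the callbacks and options."""
    if cb_load is None and cb_store is None:
        raise ValueError("at least one callback must be given")
    # A missing callback is represented by an empty string.
    parts = [cb_load or "", cb_store or "", *compile_options]
    # Join with a NUL separator, which is unlikely to appear in source text,
    # so that ("ab", "c") and ("a", "bc") produce different digests.
    digest = hashlib.sha1("\0".join(parts).encode("utf-8")).hexdigest()
    return f"cupy_callback_{digest}"
```

Because the name depends only on these inputs, two transforms that use the same callback pair and options map to the same cached module, while any change to either produces a new one.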

Example

This works now:

import cupy as cp

code = r'''
__device__ cufftComplex CB_ConvertInputC(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr)
{
    cufftComplex x;
    x.x = 1.;
    x.y = 0.;
    return x;
}
__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInputC;
'''

a = cp.random.random((64, 128, 128)).astype(cp.complex64)

# this fftn call uses callback
with cp.fft.config.set_cufft_callbacks(cb_load=code):
    b = cp.fft.fftn(a, axes=(1,2))

# this does not use
c = cp.fft.fftn(cp.ones(shape=a.shape, dtype=cp.complex64), axes=(1,2))

# result agrees
assert cp.allclose(b, c)

# static plans are also cached
cp.fft.config.show_plan_cache_info()

cat:feature st:test-and-merge prio:medium

opened by leofang 87

Add a cuFFT plan cache

UPDATE: Close #3588.

This PR implements a least recently used (LRU) cache for cuFFT plans. The implementation is done in Cython to minimize the Python overhead; I still use cdef classes (instead of pointers to structs) to avoid managing memory myself, and I cdef as much as I can.

Properties of this cache:

Per-thread, per-device

The "size" of the cache can be set by both the number of plans and the amount of (GPU) memory used by the plans (i.e., the work areas)

Enabled by default (with size = 16, which I picked arbitrarily)

Good performance:

Greatly reduced the CPU overhead of plan allocation, especially with non-prime lengths (#3556): https://github.com/cupy/cupy/pull/3730#issuecomment-670061799

Fast access time (< 1 us get/set): https://github.com/cupy/cupy/pull/3730#issuecomment-671725253

Bonus: a bit of GPU time reduction (as certain plan allocations would launch a few kernels): https://github.com/cupy/cupy/pull/3730#issuecomment-670061799

Adding and removing a multi-GPU plan is done collectively among all participating GPUs' caches to respect the memory limitation.
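The idea of a cache whose "size" is bounded both by the number of plans and by the memory they use can be sketched in pure Python. This is only an illustration of the eviction rule, not the Cython implementation in this PR: the class name `TinyPlanCache` and its methods are hypothetical, and the per-thread, per-device, and multi-GPU aspects are not modeled.

```python
from collections import OrderedDict


class TinyPlanCache:
    """Toy LRU cache limited by entry count and by total bytes (illustrative)."""

    def __init__(self, max_plans=16, max_bytes=None):
        self.max_plans = max_plans
        self.max_bytes = max_bytes  # None means "no memory limit"
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> (plan, work_area_bytes)

    def get(self, key):
        plan, _ = self._items[key]    # raises KeyError on a miss
        self._items.move_to_end(key)  # mark as most recently used
        return plan

    def put(self, key, plan, nbytes):
        if key in self._items:
            # Replacing an entry: release its old memory accounting first.
            self.used_bytes -= self._items.pop(key)[1]
        self._items[key] = (plan, nbytes)
        self.used_bytes += nbytes
        # Evict least recently used entries until both limits hold.
        while self._items and (
            len(self._items) > self.max_plans
            or (self.max_bytes is not None and self.used_bytes > self.max_bytes)
        ):
            _, (_, freed) = self._items.popitem(last=False)
            self.used_bytes -= freed

    def __len__(self):
        return len(self._items)

    def __contains__(self, key):
        return key in self._items
```

In this sketch a single plan larger than `max_bytes` is evicted immediately after insertion, which is one possible policy among several.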

A few handles are exposed to cupy.fft.config:

Documented: get_plan_cache(), show_plan_cache_info()

Undocumented: the five APIs from scipy/scipy#12512

I do not expect they need to be used, though

Question: How can I generate a doc page for PlanCache without explicitly referencing it in the autosummary?

What is NOT done in this PR (see the discussions in the replies below):

Lock stream on to work area

Manage a work area pool by the cache

I think it's out of scope, requires careful planning, and the performance gain, if any, is unknown.

~~Work in progress. Description to follow. All tests passed locally.~~ ~~Aim to address #3588 and follow scipy/scipy#12512.~~

TODO:

[ ] Decide to what extent we would like to expose the cache to end users (the public API from scipy/scipy#12512 is a different thing)

[x] Per-device cache

[x] Merge add_multi_gpu_plan() and __setitem__()

[x] Add PR description

[x] Add tests

[x] Add comments to code

[x] Mark the public API from scipy/scipy#12512 experimental

[ ] ~~Add get_fft_plan outcomes to the cache?~~

cat:enhancement to-be-backported

opened by leofang 78

Small fixes for CUB block reduction kernels

Remove all of the type constraints: I remember I set the limitations due to some errors when running the full test suite, but I could no longer reproduce it (with the latest master). @grlee77's new norm kernels might also need the support for complex numbers.

Add a possible exception to the optimizer: during the optuna optimization, CUDADriverError could be raised when the device runs out of resources. This was first observed in https://github.com/cupy/cupy/pull/3244#issuecomment-641479088, and I thought that constraining the search range would remedy it, but today I encountered it a few more times for different tasks, so apparently this is necessary. After adding this, I see that the error is gracefully handled:

[I 2020-06-30 22:29:07,612] Finished trial#1 with value: inf with parameters: {'block_size_log': 9, 'items_per_thread': 28}. Best is trial#0 with value: 0.0029116286219972552.
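The log line above shows a failed trial being recorded with value inf. The pattern behind it can be sketched without optuna or a GPU: catch the resource error inside the objective and report an infinite cost. Everything named here (`OutOfResourcesError`, `guarded_objective`, `fake_measure`, and the threshold) is a hypothetical stand-in, not the code in this PR.

```python
import math


class OutOfResourcesError(RuntimeError):
    """Stand-in for the CUDA driver error seen during tuning (hypothetical)."""


def guarded_objective(measure, params):
    """Return the measured time, or +inf if the launch runs out of resources."""
    try:
        return measure(params)
    except OutOfResourcesError:
        # An infinite cost can never be the best trial, so the search
        # simply moves on to other parameters.
        return math.inf


def fake_measure(params):
    # Pretend that very large configurations cannot be launched.
    if params["block_size_log"] * params["items_per_thread"] > 200:
        raise OutOfResourcesError("too many resources requested")
    return 0.003


trials = [
    {"block_size_log": 7, "items_per_thread": 4},
    {"block_size_log": 9, "items_per_thread": 28},
]
values = [guarded_objective(fake_measure, p) for p in trials]
best = trials[values.index(min(values))]
```

Only the expected resource error is caught here; other exceptions still propagate, in line with the next item.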

Allow compiler exceptions to propagate upward.

(UPDATE) Make complex<T> (almost) obey the rule of three (to fix fp16 -> complex conversion): This is basically a follow-up of #2629 and #2741. It turns out that by ensuring the rule of three (except for the destructor, which is trivial), we get the float16 -> complex<T> conversion for free (through C++ implicit conversion fp16->fp32->complex) without additional change. I should have done this when working on #2741...😢 Note the changes are in line with the Thrust implementation.

(UPDATE) Fix tests

cat:enhancement

opened by leofang 77

Adds nvcc as a RawKernel backend

Adds nvcc as a backend for RawKernel (issue https://github.com/cupy/cupy/issues/1928). The nvcc.py is a cut and paste of much of the functionality in cupy_setup_build.py and install/*.py -- I've avoided refactoring this at the moment as I'd like to invite input as to whether you think this PR is useful.
cat:feature st:test-and-merge

opened by sjperkins 70
Implementation of ndimage filters

So far this PR includes improved correlate and convolve functions (faster, using a technique similar to #3179), implementations of correlate1d and convolve1d, and tests for them. The underlying kernel creation has been generalized so that it can be used to implement all of the other filters, which will be progressively added to this PR (I have an implementation of them but haven't tested them yet).

This works to address #2099 and #3111.

Current status of tests: 7120 tests (in test_filters.py) pass and 8 fail. All failures are in the 1d functions along axis 0 using mode=mirror with 4D images; no other axis, mode, or lower-dimensional image fails.

cat:feature st:test-and-merge

opened by coderforlife 64
Add compressed sparse `__setitem__`

Closes #3115 Closes #2676 Closes #2677

This PR builds upon #3486, also porting over from SciPy the functions necessary to set values in both major and minor axes, using integers, slices, and arrays.

cat:feature to-be-backported blocking

opened by cjnolet 60

CUDA 11 Test: `TestFftAllocate`

I built the latest master and fixed #3757 with #3775, and the only error I got from all FFT tests we have is this:

$ pytest tests/cupy_tests/fft_tests/test_fft.py
========================================================================= test session starts =========================================================================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/leofang/cupy_cuda11, configfile: setup.cfg
collected 717 items                                                                                                                                                   

tests/cupy_tests/fft_tests/test_fft.py ........................................................................................................................ [ 16%]
............................................................................................................................................................... [ 38%]
............................................................................................................................................................... [ 61%]
............................................................................................................................................................... [ 83%]
.....................................................................................................................F..                                        [100%]

============================================================================== FAILURES ===============================================================================
__________________________________________________________________ TestFftAllocate.test_fft_allocate __________________________________________________________________

self = <cupy_tests.fft_tests.test_fft.TestFftAllocate testMethod=test_fft_allocate>

    def test_fft_allocate(self):
        # Check CuFFTError is not raised when the GPU memory is enough.
        # See https://github.com/cupy/cupy/issues/1063
        # TODO(mizuno): Simplify "a" after memory compaction is implemented.
        a = []
        for i in range(10):
            a.append(cupy.empty(100000000))
        del a
        b = cupy.empty(100000007, dtype=cupy.float32)
>       cupy.fft.fft(b)

tests/cupy_tests/fft_tests/test_fft.py:336: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cupy/fft/fft.py:567: in fft
    return _fft(a, (n,), (axis,), norm, cupy.cuda.cufft.CUFFT_FORWARD)
cupy/fft/fft.py:182: in _fft
    a = _fft_c2c(a, direction, norm, axes, overwrite_x, plan=plan)
cupy/fft/fft.py:152: in _fft_c2c
    a = _exec_fft(a, direction, 'C2C', norm, axis, overwrite_x, plan=plan)
cupy/fft/fft.py:109: in _exec_fft
    plan = cufft.Plan1d(out_size, fft_type, batch, devices=devices)
cupy/cuda/cufft.pyx:277: in cupy.cuda.cufft.Plan1d.__init__
    self._single_gpu_get_plan(plan, nx, fft_type, batch)
cupy/cuda/cufft.pyx:306: in cupy.cuda.cufft.Plan1d._single_gpu_get_plan
    check_result(result)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise CuFFTError(result)
E   cupy.cuda.cufft.CuFFTError: CUFFT_INTERNAL_ERROR

cupy/cuda/cufft.pyx:147: CuFFTError
======================================================================= short test summary info =======================================================================
FAILED tests/cupy_tests/fft_tests/test_fft.py::TestFftAllocate::test_fft_allocate - cupy.cuda.cufft.CuFFTError: CUFFT_INTERNAL_ERROR
=================================================================== 1 failed, 716 passed in 13.16s ====================================================================

issue-checked

opened by leofang 54

Add initial cupyx.spatial.distance support from pylibraft

This is an initial PR to establish integration of pylibraft within cupy. For now, the intention is to use RAFT for supported distances and types and cupy kernels for other distances and types until we support them in RAFT. This provides us a path to make use of what's available and then continue optimizing RAFT underneath, allowing cupy to reap the benefits immediately.

This is really just a start in the direction of backing the entire cupyx.scipy.spatial package with RAFT, as well as the cupyx.scipy.cluster and cupyx.scipy.optimize packages (and potentially cupyx.scipy.stats / cupyx.scipy.qmc in the future).

Tasks:

[x] Add cdist

[x] Add scipy.spatial.distance_matrix, minkowski_distance

[x] Add tests

cat:feature prio:medium

opened by cjnolet 48

Provide full coverage for NCCL APIs in CuPy

The main purpose of this PR is to add a factory function to NcclCommunicator so that it is possible to create a group of NCCL communicators for multiple devices in a single process:

from cupy.cuda import nccl

# Use GPU #0, #2, and #3
# comms is a list of NcclCommunicator
comms = nccl.NcclCommunicator.initAll([0, 2, 3])

Since Python/Cython does not support overloading multiple constructors, in order to preserve backward compatibility creating a factory function seems to be a necessary design decision to me. Please let me know if you have any better alternatives, thanks!
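The design choice can be illustrated with a toy class that needs no NCCL or GPU. It is a sketch of the general "alternative constructor" pattern only: the class `Communicator`, the method `init_all`, and the `device_id` attribute are hypothetical and do not mirror the actual NcclCommunicator implementation.

```python
class Communicator:
    """Toy stand-in for a communicator class (not the NCCL wrapper)."""

    def __init__(self, ndev, rank):
        # The existing constructor keeps its signature, so old code still works.
        self.ndev = ndev
        self.rank = rank
        self.device_id = None

    @classmethod
    def init_all(cls, devices):
        """Alternative constructor: one communicator per listed device."""
        devices = list(devices)
        comms = []
        for rank, dev in enumerate(devices):
            comm = cls(len(devices), rank)
            comm.device_id = dev
            comms.append(comm)
        return comms


comms = Communicator.init_all([0, 2, 3])
```

Since the factory lives beside `__init__` rather than replacing it, callers of the original constructor are unaffected.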

In addition, this PR wraps several other NCCL APIs (mainly to serve the above need). With this PR, CuPy now supports the full set of NCCL APIs! See below for the complete and final list of changes:

=== FINAL UPDATE ===

added initAll(), groupStart() and groupEnd() to support controlling multiple devices in a single process.

added size() that returns the total number of NCCL ranks.

added tests for the above functions

added documentation for NCCL APIs.

backward compatibility is still preserved

For a working demo enabled by this PR, see https://github.com/cupy/cupy/pull/2325#issuecomment-515299903.
cat:feature st:test-and-merge

opened by leofang 44

CUDA 11.2: Support the built-in Stream Ordered Memory Allocator

While this is working, I mark it as Work in Progress as there are some issues to be discussed with our NVIDIA friends 🙂

May be blocked by #4443 (?)

This PR exposes CUDA's new Stream Ordered Memory Allocator, added in CUDA 11.2, to CuPy. A new memory type, MemoryAsync, is added, which is backed by cudaMallocAsync() and cudaFreeAsync().

To use this feature, one simply sets the allocator to malloc_async, similar to what's done for managed memory:

import cupy as cp

cp.cuda.set_allocator(cp.cuda.malloc_async)
# from now on the memory is allocated on the current stream from Stream Ordered Memory Allocator

On older CUDA (<11.2) or unsupported devices, using this new allocator will raise an error at runtime.

(I didn't add support for a customized mempool via cudaMemPool*()/cudaMallocFromPoolAsync() -- which could be the next PR -- as the benefit of using non-default mempools is unclear to me. Also, note that there is no API to expose any current information of the mempool, so it wouldn't be compatible with CuPy's MemoryPool API, such as used_bytes() etc.)

Currently observed issues

I think nothing is wrong with my implementation, most likely these are from CUDA 😁

UPDATE: This is unrelated to this PR, see https://github.com/cupy/cupy/pull/4537#issuecomment-757429743 and #4538.

It is unclear from the CUDA documentation if a stream is allowed to be destroyed before all memory allocated on it is freed. It could be that the driver performs a ref count internally (so we don't have to), but we need to make sure. If it's not the case, then in MemoryAsync we will also need to hold the reference to the stream (object), not just its pointer.

nvprof python my_script.py will fail if malloc_async is used in the workload:

$ nvprof --device-buffer-size 2048 --profiling-semaphore-pool-size 128000 pytest tests/cupy_tests/fft_tests/test_fft.py -k TestFFt
========================================================================= test session starts =========================================================================
platform linux -- Python 3.7.9, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /home/leofang/dev/cupy_cuda112, configfile: setup.cfg
collecting ... ==31333== NVPROF is profiling process 31333, command: /home/leofang/miniconda3/envs/cupy_cuda112_dev/bin/python /home/leofang/miniconda3/envs/cupy_cuda112_dev/bin/pytest tests/cupy_tests/fft_tests/test_fft.py -k TestFFt
collected 717 items / 410 deselected / 307 selected

tests/cupy_tests/fft_tests/test_fft.py ..................................................==31333== Error: Internal profiling error 3938:999.
.....................................^C======== Warning: 569 records have invalid timestamps due to insufficient device buffer space. You can configure the buffer space using the option --device-buffer-size.
======== Warning: 293 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.
======== Profiling result:
...

We need to confirm if this is nvprof's problem/limitation (very likely it is), as it could be annoying to our users.

TODO

[x] Add tests

[x] Add tutorial to docs/source/reference/memory.rst?

[x] Fix/update docstrings

[x] Mark it experimental (as I did)?

cc: @jakirkham @pentschev @maxpkatz Could you help address the three observed issues? 🙂
cat:feature st:test-and-merge prio:medium

opened by leofang 42

Support passing int as shape to `broadcast_to`

Reported internally. broadcast_to should accept int as shape.

https://numpy.org/doc/stable/reference/generated/numpy.broadcast_to.html

shape : tuple or int
    The shape of the desired array. A single integer i is interpreted as (i,).
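The rule quoted from the NumPy documentation can be sketched as a small normalization step applied before broadcasting. The helper name `normalize_shape` is hypothetical; this is not how CuPy implements the fix, only an illustration of the "int or tuple" contract.

```python
import operator


def normalize_shape(shape):
    """Return ``shape`` as a tuple of ints; a single int i becomes (i,)."""
    try:
        # operator.index accepts int-like objects and rejects floats.
        return (operator.index(shape),)
    except TypeError:
        # Not a single integer: treat it as a sequence of integers.
        return tuple(operator.index(s) for s in shape)
```

With such a step in place, a call like `broadcast_to(x, 3)` can be handled the same way as `broadcast_to(x, (3,))`.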

opened by kmaehashi 0

cupy raw kernel cannot handle view of cupy ndarray

Description

When a view of a CuPy ndarray (for example, a slice of a larger ndarray) is passed to a raw kernel, the result looks as if the kernel read the original ndarray rather than the slice.

To Reproduce

import cupy as cp
x = cp.arange(10, dtype=cp.complex64).reshape(2,5)

show = cp.RawKernel(r'''
#include <cuComplex.h>

extern "C" __global__
void show(const cuFloatComplex* x, const int N){
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  
  if(i == 0 ){
      printf("%f\n",cuCrealf(x[N]));
  }
}
''', 'show')

When the kernel is called:

show((2,), (5,), (x,cp.int32(6)))
cp.cuda.runtime.deviceSynchronize()

It will print:

6.000000

But if a slice of x is passed:

x_slice = x[:,:4]
show((2,), (5,), (x_slice,cp.int32(6)))
cp.cuda.runtime.deviceSynchronize()

It also prints:

6.000000

which is not what is wanted.

However, if a copy is fed:

x_slice = x[:,:4].copy()
show((2,), (5,), (x_slice,cp.int32(6)))
cp.cuda.runtime.deviceSynchronize()

It prints:

7.000000

as expected.
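A likely explanation, offered here as an interpretation rather than something stated in the report: a raw kernel receives only a pointer to the array's underlying buffer, and the view x[:, :4] shares that buffer and starts at the same address as x, so x[N] inside the kernel indexes the parent's memory and the view's strides are never applied. The pure-Python model below (no GPU needed, all names illustrative) reproduces both printed values.

```python
# Pure-Python model of the memory layout in the report.
rows, cols = 2, 5
buffer = [float(v) for v in range(rows * cols)]  # memory behind x

# x[:, :4] is a view: same buffer, same start address, only shape/strides differ.
view_shape = (2, 4)
view_strides = (cols, 1)  # in elements, inherited from x


def view_item(i, j):
    """Element (i, j) of the view, located through its strides."""
    return buffer[i * view_strides[0] + j * view_strides[1]]


# x[:, :4].copy() packs the view's elements into a new contiguous buffer.
copy_buffer = [view_item(i, j)
               for i in range(view_shape[0])
               for j in range(view_shape[1])]

N = 6
raw_read_from_view = buffer[N]       # what indexing the raw pointer sees
raw_read_from_copy = copy_buffer[N]  # same index into the packed copy
print(raw_read_from_view, raw_read_from_copy)  # 6.0 7.0
```

Under this reading, a kernel that indexes its input linearly needs either a contiguous array (for example, one produced by cp.ascontiguousarray or .copy()) or the strides passed in as extra arguments.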

Installation

Conda-Forge (conda install ...)

Environment

OS                           : Linux-3.10.0-1127.18.2.el7.x86_64-x86_64-with-glibc2.17
Python Version               : 3.9.13
CuPy Version                 : 11.2.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.4
SciPy Version                : 1.9.3
Cython Build Version         : 0.29.32
Cython Runtime Version       : None
CUDA Root                    : /users/kangl/miniconda3/envs/rapids-22.10
nvcc PATH                    : /users/kangl/miniconda3/envs/rapids-22.10/bin/nvcc
CUDA Build Version           : 11020
CUDA Driver Version          : 11040
CUDA Runtime Version         : 11070
cuBLAS Version               : (available)
cuFFT Version                : 10702
cuRAND Version               : 10301
cuSOLVER Version             : (11, 4, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 7)
Thrust Version               : 101000
CUB Build Version            : 101000
Jitify Build Version         : 343be31
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : 21403
NCCL Runtime Version         : 21403
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : Tesla V100-SXM2-32GB
Device 0 Compute Capability  : 70
Device 0 PCI Bus ID          : 0000:15:00.0

Additional Information

No response

cat:bug

opened by kanglcn 5

Releases (v12.0.0b2)

v12.0.0b2 (Dec 8, 2022)

This is the release note of v12.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

More cupyx.scipy.interpolate APIs (#7086, #7190 and #7215)

Increased coverage of cupyx.scipy.interpolate APIs, which now includes BSpline, RBFInterpolator, splantider and splder.

Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.

Use CUB reduction classes in cupyx.jit (#7145)

Now it is possible to use the CUB reduction classes, cub::WarpReduce and cub::BlockReduce, in kernels written using CuPy JIT.

import cupy, cupyx
from cupy.cuda import runtime
from cupyx import jit

@jit.rawkernel()
def warp_reduce_sum(x, y):
    WarpReduce = jit.cub.WarpReduce[cupy.int32]
    temp_storage = jit.shared_memory(
        dtype=WarpReduce.TempStorage, size=1)
    i, j = jit.blockIdx.x, jit.threadIdx.x
    value = x[i, j]
    aggregator = WarpReduce(temp_storage[0])
    aggregate = aggregator.Reduce(value, jit.cub.Sum())
    if j == 0:
        y[i] = aggregate

warp_size = 64 if runtime.is_hip else 32
h, w = (32, warp_size)
x = cupy.arange(h * w, dtype=cupy.int32).reshape(h, w)
cupy.random.shuffle(x)
y = cupy.zeros(h, dtype=cupy.int32)
warp_reduce_sum[h, w](x, y)

Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.

Changes

New Features

Add 1-D BSpline to interpolate module (#7086)

JIT: Support cub::WarpReduce and cub::BlockReduce (#7145)

Add cupyx.scipy.interpolate.RBFInterpolator (#7190)

Expose splder and splantider (#7215)

Enhancements

Use cuSPARSE Generic API instead of older one documented to be removed (#7052)

Improve _PerfCaseResult.to_str format (#7152)

Bug Fixes

Split inputs to random routines (#7173)

Fix 1-dim lexsort (#7178)

Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7192)

Fix wrong argument in warnings.warn() (#7194)

Use list(kwargs) instead of list(kwargs.keys) (#7203)

Fix cusparseSpSM compatibility (#7214)

Remove scipy import (#7218)

Use naive comb() for Python 3.7 (#7221)

Tests

CI: Generate coverage count just after the parameter axis in table (#7175)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @emcastillo @ev-br @hadipash @jjmortensen @kmaehashi @takagi @TsutsuiMasayoshi
v11.4.0 (Dec 8, 2022)
This is the release note of v11.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

Enhancements

Use cuSPARSE Generic API instead of older one documented to be removed (#7209)

Bug Fixes

Fix 1-dim lexsort (#7191)

Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7202)

Split inputs to random routines (#7207)

Use list(kwargs) instead of list(kwargs.keys) (#7213)

Fix cusparseSpSM compatibility (#7220)

Tests

CI: Generate coverage count just after the parameter axis in table (#7188)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@emcastillo @hadipash @jjmortensen @kmaehashi @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.4.0-cp310-cp310-manylinux1_x86_64.whl(61.41 MB)
cupy_cuda102-11.4.0-cp310-cp310-manylinux2014_aarch64.whl(35.26 MB)
cupy_cuda102-11.4.0-cp310-cp310-win_amd64.whl(43.35 MB)
cupy_cuda102-11.4.0-cp311-cp311-manylinux1_x86_64.whl(61.90 MB)
cupy_cuda102-11.4.0-cp311-cp311-manylinux2014_aarch64.whl(35.61 MB)
cupy_cuda102-11.4.0-cp311-cp311-win_amd64.whl(43.32 MB)
cupy_cuda102-11.4.0-cp37-cp37m-manylinux1_x86_64.whl(60.50 MB)
cupy_cuda102-11.4.0-cp37-cp37m-manylinux2014_aarch64.whl(34.14 MB)
cupy_cuda102-11.4.0-cp37-cp37m-win_amd64.whl(43.39 MB)
cupy_cuda102-11.4.0-cp38-cp38-manylinux1_x86_64.whl(63.88 MB)
cupy_cuda102-11.4.0-cp38-cp38-manylinux2014_aarch64.whl(37.46 MB)
cupy_cuda102-11.4.0-cp38-cp38-win_amd64.whl(43.48 MB)
cupy_cuda102-11.4.0-cp39-cp39-manylinux1_x86_64.whl(62.06 MB)
cupy_cuda102-11.4.0-cp39-cp39-manylinux2014_aarch64.whl(35.88 MB)
cupy_cuda102-11.4.0-cp39-cp39-win_amd64.whl(43.48 MB)
cupy_cuda110-11.4.0-cp310-cp310-manylinux1_x86_64.whl(76.24 MB)
cupy_cuda110-11.4.0-cp310-cp310-win_amd64.whl(58.14 MB)
cupy_cuda110-11.4.0-cp311-cp311-manylinux1_x86_64.whl(76.73 MB)
cupy_cuda110-11.4.0-cp311-cp311-win_amd64.whl(58.11 MB)
cupy_cuda110-11.4.0-cp37-cp37m-manylinux1_x86_64.whl(75.33 MB)
cupy_cuda110-11.4.0-cp37-cp37m-win_amd64.whl(58.17 MB)
cupy_cuda110-11.4.0-cp38-cp38-manylinux1_x86_64.whl(78.71 MB)
cupy_cuda110-11.4.0-cp38-cp38-win_amd64.whl(58.27 MB)
cupy_cuda110-11.4.0-cp39-cp39-manylinux1_x86_64.whl(76.89 MB)
cupy_cuda110-11.4.0-cp39-cp39-win_amd64.whl(58.27 MB)
cupy_cuda111-11.4.0-cp310-cp310-manylinux1_x86_64.whl(95.43 MB)
cupy_cuda111-11.4.0-cp310-cp310-win_amd64.whl(78.29 MB)
cupy_cuda111-11.4.0-cp311-cp311-manylinux1_x86_64.whl(95.91 MB)
cupy_cuda111-11.4.0-cp311-cp311-win_amd64.whl(78.26 MB)
cupy_cuda111-11.4.0-cp37-cp37m-manylinux1_x86_64.whl(94.52 MB)
cupy_cuda111-11.4.0-cp37-cp37m-win_amd64.whl(78.33 MB)
cupy_cuda111-11.4.0-cp38-cp38-manylinux1_x86_64.whl(97.89 MB)
cupy_cuda111-11.4.0-cp38-cp38-win_amd64.whl(78.42 MB)
cupy_cuda111-11.4.0-cp39-cp39-manylinux1_x86_64.whl(96.07 MB)
cupy_cuda111-11.4.0-cp39-cp39-win_amd64.whl(78.42 MB)
cupy_cuda11x-11.4.0-cp310-cp310-manylinux1_x86_64.whl(86.88 MB)
cupy_cuda11x-11.4.0-cp310-cp310-manylinux2014_aarch64.whl(98.16 MB)
cupy_cuda11x-11.4.0-cp310-cp310-win_amd64.whl(68.61 MB)
cupy_cuda11x-11.4.0-cp311-cp311-manylinux1_x86_64.whl(87.37 MB)
cupy_cuda11x-11.4.0-cp311-cp311-manylinux2014_aarch64.whl(99.31 MB)
cupy_cuda11x-11.4.0-cp311-cp311-win_amd64.whl(68.58 MB)
cupy_cuda11x-11.4.0-cp37-cp37m-manylinux1_x86_64.whl(85.97 MB)
cupy_cuda11x-11.4.0-cp37-cp37m-manylinux2014_aarch64.whl(96.43 MB)
cupy_cuda11x-11.4.0-cp37-cp37m-win_amd64.whl(68.64 MB)
cupy_cuda11x-11.4.0-cp38-cp38-manylinux1_x86_64.whl(89.35 MB)
cupy_cuda11x-11.4.0-cp38-cp38-manylinux2014_aarch64.whl(100.59 MB)
cupy_cuda11x-11.4.0-cp38-cp38-win_amd64.whl(68.74 MB)
cupy_cuda11x-11.4.0-cp39-cp39-manylinux1_x86_64.whl(87.53 MB)
cupy_cuda11x-11.4.0-cp39-cp39-manylinux2014_aarch64.whl(98.86 MB)
cupy_cuda11x-11.4.0-cp39-cp39-win_amd64.whl(68.74 MB)
cupy_rocm_4_3-11.4.0-cp310-cp310-manylinux1_x86_64.whl(36.19 MB)
cupy_rocm_4_3-11.4.0-cp311-cp311-manylinux1_x86_64.whl(36.64 MB)
cupy_rocm_4_3-11.4.0-cp37-cp37m-manylinux1_x86_64.whl(35.43 MB)
cupy_rocm_4_3-11.4.0-cp38-cp38-manylinux1_x86_64.whl(38.37 MB)
cupy_rocm_4_3-11.4.0-cp39-cp39-manylinux1_x86_64.whl(36.76 MB)
cupy_rocm_5_0-11.4.0-cp310-cp310-manylinux1_x86_64.whl(54.26 MB)
cupy_rocm_5_0-11.4.0-cp311-cp311-manylinux1_x86_64.whl(54.72 MB)
cupy_rocm_5_0-11.4.0-cp37-cp37m-manylinux1_x86_64.whl(53.50 MB)
cupy_rocm_5_0-11.4.0-cp38-cp38-manylinux1_x86_64.whl(56.44 MB)
cupy_rocm_5_0-11.4.0-cp39-cp39-manylinux1_x86_64.whl(54.83 MB)
v12.0.0b1 (Nov 11, 2022)
This is the release note of v12.0.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

Support for CUDA 11.8 & NVIDIA H100 GPUs

This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.

Support for Python 3.11

Wheels are now available for Python 3.11.

ufunc Methods

This release adds ufunc.reduce, ufunc.accumulate, ufunc.reduceat, and ufunc.at methods. See the documentation for more details.
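As a rough illustration of what three of these methods compute, here is a minimal sketch. It assumes CuPy v12.0.0b1 or later when a GPU is available, and otherwise falls back to NumPy, whose ufunc-method semantics these CuPy methods are meant to follow. The `xp` alias and the sample data are illustrative choices, not part of the release.

```python
# Prefer CuPy when a usable GPU is present; otherwise fall back to NumPy,
# whose ufunc methods have the semantics CuPy aims to mirror.
try:
    import cupy as xp
    xp.zeros(1)  # forces device initialisation; raises if no GPU is usable
except Exception:
    import numpy as xp

a = xp.arange(1, 7)  # [1, 2, 3, 4, 5, 6]

total = xp.add.reduce(a)        # sum of all elements
running = xp.add.accumulate(a)  # running (prefix) sums
# reduceat sums the slices a[0:2], a[2:5] and a[5:]
segments = xp.add.reduceat(a, xp.array([0, 2, 5]))

print(int(total))         # 21
print(running.tolist())   # [1, 3, 6, 10, 15, 21]
print(segments.tolist())  # [3, 12, 6]
```

According to the change list, `reduceat` was added for `cupy.add` in this release, so other ufuncs may not support it yet.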

Use Thrust in cupyx.jit (#7054, #7139)

Thrust library device functions can now be used in kernels written with CuPy JIT.

import cupy, cupyx

@cupyx.jit.rawkernel()
def sort_by_key(x, y):
    i = cupyx.jit.threadIdx.x
    x_array = x[i]
    y_array = y[i]
    cupyx.jit.thrust.sort_by_key(
        cupyx.jit.thrust.device,
        x_array.begin(),
        x_array.end(),
        y_array.begin(),
    )

h, w = (256, 256)
x = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(x)
x = x.reshape(h, w)
y = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(y)
y = y.reshape(h, w)
sort_by_key[1, 256](x, y)

The currently supported Thrust functions are count, copy, find, mismatch, sort, and sort_by_key.

Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.

Changes without compatibility

Deprecate ndarray.scatter_{add,max,min} (#7097)

cupy.ndarray.scatter_{add,max,min} methods are marked as deprecated. Use the corresponding ufunc methods (cupy.{add,maximum,minimum}.at) instead.
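A minimal migration sketch follows. It assumes CuPy v12.0.0b1 or later on a GPU machine and falls back to NumPy elsewhere, since `ufunc.at` is expected to behave the same way in both. The arrays and the `xp` alias are made up for illustration.

```python
# Prefer CuPy when a usable GPU is present; otherwise fall back to NumPy.
try:
    import cupy as xp
    xp.zeros(1)  # forces device initialisation; raises if no GPU is usable
except Exception:
    import numpy as xp

x = xp.zeros(4, dtype=xp.int32)
idx = xp.array([0, 1, 1, 3])

# Unbuffered in-place add: index 1 appears twice and is incremented twice.
# Older CuPy code would typically have written x.scatter_add(idx, 1) here.
xp.add.at(x, idx, 1)
print(x.tolist())  # [1, 2, 0, 1]

# maximum.at replaces scatter_max: each target keeps the largest value seen.
m = xp.array([5, 5, 5, 5])
xp.maximum.at(m, xp.array([0, 0, 2]), xp.array([7, 3, 9]))
print(m.tolist())  # [7, 5, 9, 5]
```

Unlike plain fancy-index assignment (`x[idx] += 1`), the `.at` form accumulates over repeated indices, which is the behaviour the deprecated scatter methods provided.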

CUDA library wrappers now live in cupyx (#7013)

Previously, CuPy provided high-level wrappers for CUDA libraries as cupy.cudnn, cupy.cusolver, cupy.cusparse, and cupy.cutensor. These modules have now moved to cupyx as part of the cupy namespace cleanup. The old modules are still available but marked as deprecated. Note that these modules remain undocumented and may be subject to change.
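Code that must run on both older and newer CuPy versions could try the new location first and fall back to the old one. The helper below is a hypothetical sketch and not part of CuPy. The module path `cupyx.cusolver` in the usage comment is inferred from the note above and should be treated as an assumption.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that imports successfully."""
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )


# Intended use on a machine with CuPy installed (not executed here):
# cusolver = import_first("cupyx.cusolver", "cupy.cusolver")
```

Trying the `cupyx` path first avoids relying on the deprecated module on releases where both locations exist.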

Changes

New Features

Add axis to cupy.logspace (#6797)

Support thrust::count, device in CuPy JIT (#7054)

Add cupy.ndarray.searchsorted (#7059)

Support add.at, maximum.at, minimum.at (#7077)

Add pdist implementation to distance functions (#7078)

Support subtract.at, bitwise_and.at, bitwise_or.at, bitwise_xor.at (#7099)

Add ufunc.reduce and ufunc.accumulate (#7105)

Add cupy.add.reduceat (#7115)

Implement cupy.min_scalar_type (#7136)

JIT: Support more thrust functions (#7139)

Enhancements

Move cupy.cudnn cupy.cusolver cupy.cutensor cupy.cusparse to cupyx (#7013)

Allow randint to support array bounds (#7051)

Deprecate ndarray.scatter_{add, max, min} (#7097)

Support CUDA 11.8 H100 GPUs (#7100)

Support CUDA 11.8 (#7117)

Add CUDA 11.8 on documents (#7119)

Fix compile error from inf/nan in cupy.fuse (#7122)

Raise TypeError instead of ValueError in cupy.from_dlpack when CPU tensor is passed (#7133)

Support NCCL 2.15 (#7153)

Support Python 3.11 (#7159)

Fix indexing sparse matrix with empty index arguments (#7143)

Bug Fixes

Make sure that cupy (array-api) Array objects can be composed using asarray (#6874)

Don't use __del__ in TCPStore (#6989)

JIT: Fix compile error for op.routine including in0_type (#7076)

Fix cupy.nansum in fusing (#7102)

Fusion TypeError in cupy._core.fusion._call_ufunc() (#7113)

Fix a typo (#7163)

JIT: Fix compile error of minmax function (#7167)

Code Fixes

Remove _ufunc_method directory (#7116)

Add missing base type to cdef declarations (#7170)

Documentation

Docs: Add missing functions (#7103)

Docs: ufunc methods (#7104)

Improve benchmark documentation (#7176)

Installation

Bump version to v12.0.0b1 (#7181)

Tests

CI: Add ROCm 5.3 (#7124)

CI: Allow /test jenkins to trigger Jenkins only (#7126)

Install zlib for CUDA 11.8 Windows CI (#7137)

CI: improve use of cache in GitHub Actions (#7141)

Fix for pytest 7.2 (#7147)

CI: Add support for the latest FlexCI Windows image (#7161)

JIT: Skip HIP thrust::sort test (#7162)

CI: use pre-commit in GitHub Actions (#7123)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @Diwakar-Gupta @emcastillo @IncubatorShokuhou @kmaehashi @MarcoGorelli @takagi @TsutsuiMasayoshi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-12.0.0b1-cp310-cp310-manylinux2014_aarch64.whl(35.37 MB)
cupy_cuda102-12.0.0b1-cp310-cp310-manylinux2014_x86_64.whl(61.53 MB)
cupy_cuda102-12.0.0b1-cp310-cp310-win_amd64.whl(43.39 MB)
cupy_cuda102-12.0.0b1-cp311-cp311-manylinux2014_aarch64.whl(35.73 MB)
cupy_cuda102-12.0.0b1-cp311-cp311-manylinux2014_x86_64.whl(62.02 MB)
cupy_cuda102-12.0.0b1-cp311-cp311-win_amd64.whl(43.36 MB)
cupy_cuda102-12.0.0b1-cp37-cp37m-manylinux2014_aarch64.whl(34.27 MB)
cupy_cuda102-12.0.0b1-cp37-cp37m-manylinux2014_x86_64.whl(60.63 MB)
cupy_cuda102-12.0.0b1-cp37-cp37m-win_amd64.whl(43.43 MB)
cupy_cuda102-12.0.0b1-cp38-cp38-manylinux2014_aarch64.whl(37.60 MB)
cupy_cuda102-12.0.0b1-cp38-cp38-manylinux2014_x86_64.whl(64.03 MB)
cupy_cuda102-12.0.0b1-cp38-cp38-win_amd64.whl(43.53 MB)
cupy_cuda102-12.0.0b1-cp39-cp39-manylinux2014_aarch64.whl(36.01 MB)
cupy_cuda102-12.0.0b1-cp39-cp39-manylinux2014_x86_64.whl(62.19 MB)
cupy_cuda102-12.0.0b1-cp39-cp39-win_amd64.whl(43.52 MB)
cupy_cuda110-12.0.0b1-cp310-cp310-manylinux2014_x86_64.whl(76.35 MB)
cupy_cuda110-12.0.0b1-cp310-cp310-win_amd64.whl(58.17 MB)
cupy_cuda110-12.0.0b1-cp311-cp311-manylinux2014_x86_64.whl(76.85 MB)
cupy_cuda110-12.0.0b1-cp311-cp311-win_amd64.whl(58.14 MB)
cupy_cuda110-12.0.0b1-cp37-cp37m-manylinux2014_x86_64.whl(75.45 MB)
cupy_cuda110-12.0.0b1-cp37-cp37m-win_amd64.whl(58.21 MB)
cupy_cuda110-12.0.0b1-cp38-cp38-manylinux2014_x86_64.whl(78.86 MB)
cupy_cuda110-12.0.0b1-cp38-cp38-win_amd64.whl(58.31 MB)
cupy_cuda110-12.0.0b1-cp39-cp39-manylinux2014_x86_64.whl(77.02 MB)
cupy_cuda110-12.0.0b1-cp39-cp39-win_amd64.whl(58.31 MB)
cupy_cuda111-12.0.0b1-cp310-cp310-manylinux2014_x86_64.whl(95.54 MB)
cupy_cuda111-12.0.0b1-cp310-cp310-win_amd64.whl(78.33 MB)
cupy_cuda111-12.0.0b1-cp311-cp311-manylinux2014_x86_64.whl(96.03 MB)
cupy_cuda111-12.0.0b1-cp311-cp311-win_amd64.whl(78.30 MB)
cupy_cuda111-12.0.0b1-cp37-cp37m-manylinux2014_x86_64.whl(94.64 MB)
cupy_cuda111-12.0.0b1-cp37-cp37m-win_amd64.whl(78.37 MB)
cupy_cuda111-12.0.0b1-cp38-cp38-manylinux2014_x86_64.whl(98.04 MB)
cupy_cuda111-12.0.0b1-cp38-cp38-win_amd64.whl(78.47 MB)
cupy_cuda111-12.0.0b1-cp39-cp39-manylinux2014_x86_64.whl(96.20 MB)
cupy_cuda111-12.0.0b1-cp39-cp39-win_amd64.whl(78.46 MB)
cupy_cuda11x-12.0.0b1-cp310-cp310-manylinux2014_aarch64.whl(98.30 MB)
cupy_cuda11x-12.0.0b1-cp310-cp310-manylinux2014_x86_64.whl(87.00 MB)
cupy_cuda11x-12.0.0b1-cp310-cp310-win_amd64.whl(68.64 MB)
cupy_cuda11x-12.0.0b1-cp311-cp311-manylinux2014_aarch64.whl(99.50 MB)
cupy_cuda11x-12.0.0b1-cp311-cp311-manylinux2014_x86_64.whl(87.49 MB)
cupy_cuda11x-12.0.0b1-cp311-cp311-win_amd64.whl(68.61 MB)
cupy_cuda11x-12.0.0b1-cp37-cp37m-manylinux2014_aarch64.whl(96.58 MB)
cupy_cuda11x-12.0.0b1-cp37-cp37m-manylinux2014_x86_64.whl(86.09 MB)
cupy_cuda11x-12.0.0b1-cp37-cp37m-win_amd64.whl(68.68 MB)
cupy_cuda11x-12.0.0b1-cp38-cp38-manylinux2014_aarch64.whl(100.76 MB)
cupy_cuda11x-12.0.0b1-cp38-cp38-manylinux2014_x86_64.whl(89.50 MB)
cupy_cuda11x-12.0.0b1-cp38-cp38-win_amd64.whl(68.78 MB)
cupy_cuda11x-12.0.0b1-cp39-cp39-manylinux2014_aarch64.whl(99.02 MB)
cupy_cuda11x-12.0.0b1-cp39-cp39-manylinux2014_x86_64.whl(87.66 MB)
cupy_cuda11x-12.0.0b1-cp39-cp39-win_amd64.whl(68.78 MB)
cupy_rocm_4_3-12.0.0b1-cp310-cp310-manylinux2014_x86_64.whl(36.30 MB)
cupy_rocm_4_3-12.0.0b1-cp311-cp311-manylinux2014_x86_64.whl(36.77 MB)
cupy_rocm_4_3-12.0.0b1-cp37-cp37m-manylinux2014_x86_64.whl(35.54 MB)
cupy_rocm_4_3-12.0.0b1-cp38-cp38-manylinux2014_x86_64.whl(38.53 MB)
cupy_rocm_4_3-12.0.0b1-cp39-cp39-manylinux2014_x86_64.whl(36.89 MB)
cupy_rocm_5_0-12.0.0b1-cp310-cp310-manylinux2014_x86_64.whl(54.37 MB)
cupy_rocm_5_0-12.0.0b1-cp311-cp311-manylinux2014_x86_64.whl(54.84 MB)
cupy_rocm_5_0-12.0.0b1-cp37-cp37m-manylinux2014_x86_64.whl(53.62 MB)
cupy_rocm_5_0-12.0.0b1-cp38-cp38-manylinux2014_x86_64.whl(56.60 MB)
cupy_rocm_5_0-12.0.0b1-cp39-cp39-manylinux2014_x86_64.whl(54.96 MB)
v11.3.0 (Nov 11, 2022)
This is the release note of v11.3.0. See here for the complete list of solved issues and merged PRs.

Highlights

Support for CUDA 11.8 & NVIDIA H100 GPUs

This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.

Support for Python 3.11

Wheels are now available for Python 3.11.

Changes

Enhancements

Add wrapper for cutensorPermutation (#7083)

Fix compile error from inf/nan in cupy.fuse (#7128)

Support CUDA 11.8 (#7134)

Add CUDA 11.8 on documents (#7148)

Support NCCL 2.15 (#7160)

Support CUDA 11.8 H100 GPUs (#7169)

Support Python 3.11 (#7179)

Fix indexing sparse matrix with empty index arguments (#7155)

Bug Fixes

Make sure weibull distribution supports ndarrays (#7055)

Make sure that cupy (array-api) Array objects can be composed using asarray (#7095)

JIT: Fix compile error for op.routine including in0_type (#7096)

Don't use __del__ in TCPStore (#7111)

Fix cupy.nansum in fusing (#7114)

Fusion TypeError in cupy._core.fusion._call_ufunc() (#7130)

JIT: Fix compile error of minmax function (#7174)

Documentation

Docs: Add missing functions (#7112)

Installation

Support force-overwriting Docker image via workflow (#7091)

Bump version to v11.3.0 (#7182)

Tests

CI: Add ROCm 5.3 (#7125)

CI: Allow /test jenkins to trigger Jenkins only (#7129)

Install zlib for CUDA 11.8 Windows CI (#7138)

Fix for pytest 7.2 (#7149)

CI: improve use of cache in GitHub Actions (#7156)

CI: Add support for the latest FlexCI Windows image (#7172)

Others

CI: use pre-commit in GitHub Actions (#7132)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @emcastillo @kmaehashi @leofang @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.3.0-cp310-cp310-manylinux1_x86_64.whl(61.38 MB)
cupy_cuda102-11.3.0-cp310-cp310-manylinux2014_aarch64.whl(35.23 MB)
cupy_cuda102-11.3.0-cp310-cp310-win_amd64.whl(43.35 MB)
cupy_cuda102-11.3.0-cp311-cp311-manylinux1_x86_64.whl(61.87 MB)
cupy_cuda102-11.3.0-cp311-cp311-manylinux2014_aarch64.whl(35.58 MB)
cupy_cuda102-11.3.0-cp311-cp311-win_amd64.whl(43.32 MB)
cupy_cuda102-11.3.0-cp37-cp37m-manylinux1_x86_64.whl(60.48 MB)
cupy_cuda102-11.3.0-cp37-cp37m-manylinux2014_aarch64.whl(34.11 MB)
cupy_cuda102-11.3.0-cp37-cp37m-win_amd64.whl(43.38 MB)
cupy_cuda102-11.3.0-cp38-cp38-manylinux1_x86_64.whl(63.85 MB)
cupy_cuda102-11.3.0-cp38-cp38-manylinux2014_aarch64.whl(37.42 MB)
cupy_cuda102-11.3.0-cp38-cp38-win_amd64.whl(43.48 MB)
cupy_cuda102-11.3.0-cp39-cp39-manylinux1_x86_64.whl(62.03 MB)
cupy_cuda102-11.3.0-cp39-cp39-manylinux2014_aarch64.whl(35.85 MB)
cupy_cuda102-11.3.0-cp39-cp39-win_amd64.whl(43.48 MB)
cupy_cuda110-11.3.0-cp310-cp310-manylinux1_x86_64.whl(76.21 MB)
cupy_cuda110-11.3.0-cp310-cp310-win_amd64.whl(58.13 MB)
cupy_cuda110-11.3.0-cp311-cp311-manylinux1_x86_64.whl(76.70 MB)
cupy_cuda110-11.3.0-cp311-cp311-win_amd64.whl(58.10 MB)
cupy_cuda110-11.3.0-cp37-cp37m-manylinux1_x86_64.whl(75.30 MB)
cupy_cuda110-11.3.0-cp37-cp37m-win_amd64.whl(58.17 MB)
cupy_cuda110-11.3.0-cp38-cp38-manylinux1_x86_64.whl(78.67 MB)
cupy_cuda110-11.3.0-cp38-cp38-win_amd64.whl(58.26 MB)
cupy_cuda110-11.3.0-cp39-cp39-manylinux1_x86_64.whl(76.86 MB)
cupy_cuda110-11.3.0-cp39-cp39-win_amd64.whl(58.26 MB)
cupy_cuda111-11.3.0-cp310-cp310-manylinux1_x86_64.whl(95.40 MB)
cupy_cuda111-11.3.0-cp310-cp310-win_amd64.whl(78.28 MB)
cupy_cuda111-11.3.0-cp311-cp311-manylinux1_x86_64.whl(95.88 MB)
cupy_cuda111-11.3.0-cp311-cp311-win_amd64.whl(78.26 MB)
cupy_cuda111-11.3.0-cp37-cp37m-manylinux1_x86_64.whl(94.49 MB)
cupy_cuda111-11.3.0-cp37-cp37m-win_amd64.whl(78.32 MB)
cupy_cuda111-11.3.0-cp38-cp38-manylinux1_x86_64.whl(97.85 MB)
cupy_cuda111-11.3.0-cp38-cp38-win_amd64.whl(78.42 MB)
cupy_cuda111-11.3.0-cp39-cp39-manylinux1_x86_64.whl(96.04 MB)
cupy_cuda111-11.3.0-cp39-cp39-win_amd64.whl(78.42 MB)
cupy_cuda11x-11.3.0-cp310-cp310-manylinux1_x86_64.whl(86.85 MB)
cupy_cuda11x-11.3.0-cp310-cp310-manylinux2014_aarch64.whl(98.12 MB)
cupy_cuda11x-11.3.0-cp310-cp310-win_amd64.whl(68.60 MB)
cupy_cuda11x-11.3.0-cp311-cp311-manylinux1_x86_64.whl(87.33 MB)
cupy_cuda11x-11.3.0-cp311-cp311-manylinux2014_aarch64.whl(99.27 MB)
cupy_cuda11x-11.3.0-cp311-cp311-win_amd64.whl(68.57 MB)
cupy_cuda11x-11.3.0-cp37-cp37m-manylinux1_x86_64.whl(85.94 MB)
cupy_cuda11x-11.3.0-cp37-cp37m-manylinux2014_aarch64.whl(96.41 MB)
cupy_cuda11x-11.3.0-cp37-cp37m-win_amd64.whl(68.64 MB)
cupy_cuda11x-11.3.0-cp38-cp38-manylinux1_x86_64.whl(89.31 MB)
cupy_cuda11x-11.3.0-cp38-cp38-manylinux2014_aarch64.whl(100.54 MB)
cupy_cuda11x-11.3.0-cp38-cp38-win_amd64.whl(68.74 MB)
cupy_cuda11x-11.3.0-cp39-cp39-manylinux1_x86_64.whl(87.50 MB)
cupy_cuda11x-11.3.0-cp39-cp39-manylinux2014_aarch64.whl(98.82 MB)
cupy_cuda11x-11.3.0-cp39-cp39-win_amd64.whl(68.73 MB)
cupy_rocm_4_3-11.3.0-cp310-cp310-manylinux1_x86_64.whl(36.15 MB)
cupy_rocm_4_3-11.3.0-cp311-cp311-manylinux1_x86_64.whl(36.61 MB)
cupy_rocm_4_3-11.3.0-cp37-cp37m-manylinux1_x86_64.whl(35.39 MB)
cupy_rocm_4_3-11.3.0-cp38-cp38-manylinux1_x86_64.whl(38.34 MB)
cupy_rocm_4_3-11.3.0-cp39-cp39-manylinux1_x86_64.whl(36.73 MB)
cupy_rocm_5_0-11.3.0-cp310-cp310-manylinux1_x86_64.whl(54.23 MB)
cupy_rocm_5_0-11.3.0-cp311-cp311-manylinux1_x86_64.whl(54.68 MB)
cupy_rocm_5_0-11.3.0-cp37-cp37m-manylinux1_x86_64.whl(53.47 MB)
cupy_rocm_5_0-11.3.0-cp38-cp38-manylinux1_x86_64.whl(56.41 MB)
cupy_rocm_5_0-11.3.0-cp39-cp39-manylinux1_x86_64.whl(54.79 MB)
v12.0.0a2 (Oct 6, 2022)
This is the release note of v12.0.0a2. See here for the complete list of solved issues and merged PRs.

Highlights

Increased cupyx.scipy APIs (#6773, #6990, #7014, #7015, #7036)

The coverage of SciPy interpolate & special APIs has increased. (Thanks @khushi-411 & @1MrEnot!)

Initial support for ufunc methods (#7049)

Starting from v12, CuPy will support the corresponding NumPy ufunc methods. This release adds compatibility with ufunc.outer. Check the tracking issue (#7082) for detailed information.
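For reference, `ufunc.outer` applies the ufunc to every pair of elements drawn from its two inputs. The sketch below assumes CuPy v12.0.0a2 or later when a GPU is available and falls back to NumPy otherwise. The `xp` alias and the inputs are illustrative.

```python
# Prefer CuPy when a usable GPU is present; otherwise fall back to NumPy.
try:
    import cupy as xp
    xp.zeros(1)  # forces device initialisation; raises if no GPU is usable
except Exception:
    import numpy as xp

a = xp.array([1, 2, 3])
b = xp.array([10, 20])

# table[i, j] == a[i] * b[j]; the result has shape (len(a), len(b)).
table = xp.multiply.outer(a, b)
print(table.shape)     # (3, 2)
print(table.tolist())  # [[10, 20], [20, 40], [30, 60]]
```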

Changes

New Features

Add cupyx.scipy.special.logsumexp (#6773)

Add cupyx.scipy.interpolate.KroghInterpolator (#6990)

Add scipy.special.expi and scipy.special.exp1 (#7014)

Add cupy.byte_bounds (#7015)

Add cupyx.scipy.special.k0, cupyx.scipy.special.k1, cupyx.scipy.special.k0e, cupyx.scipy.special.k1e (#7036)

Add ufunc.outer (#7049)

Expose pairwise distance functions (#7063)

Enhancements

Support NCCL 2.12 ~ 2.14 (#6534)

Support cuDNN 8.5 (#7008)

Fix cupy.apply_along_axis for tuple retval (#7068)

Add wrapper for cutensorPermutation (#7070)

Bug Fixes

Fix JIT for scalar argument (#6948)

Make sparse argmin/max return a scalar array containing the index (#6976)

Fix csrsm2 memory leak (#7039)

Make sure weibull distribution supports ndarrays (#7048)

Fix bessel test to pass ROCm CI (#7081)

Code Fixes

Cosmetic change in _routine_indexing.pyx (#7053)

Documentation

Fixes docstring for interpolation prefiltering (#6998)

Typo fix (#7045)

Tests

CI: Create a status for FlexCI dashboard (#7024)

CI: Migrate to GAR from GCR (#7064)

CI: tentatively fix hypothesis version (#7072)

Others

Introduce pre-commit (#6987)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@1MrEnot @andfoy @asi1024 @betatim @khushi-411 @kmaehashi @leofang @maronuu @takagi @wyli
Source code(tar.gz)
Source code(zip)
cupy_cuda102-12.0.0a2-cp310-cp310-manylinux2014_aarch64.whl(35.78 MB)
cupy_cuda102-12.0.0a2-cp310-cp310-manylinux2014_x86_64.whl(61.97 MB)
cupy_cuda102-12.0.0a2-cp310-cp310-win_amd64.whl(43.36 MB)
cupy_cuda102-12.0.0a2-cp37-cp37m-manylinux2014_aarch64.whl(33.99 MB)
cupy_cuda102-12.0.0a2-cp37-cp37m-manylinux2014_x86_64.whl(60.36 MB)
cupy_cuda102-12.0.0a2-cp37-cp37m-win_amd64.whl(43.40 MB)
cupy_cuda102-12.0.0a2-cp38-cp38-manylinux2014_aarch64.whl(37.28 MB)
cupy_cuda102-12.0.0a2-cp38-cp38-manylinux2014_x86_64.whl(63.71 MB)
cupy_cuda102-12.0.0a2-cp38-cp38-win_amd64.whl(43.49 MB)
cupy_cuda102-12.0.0a2-cp39-cp39-manylinux2014_aarch64.whl(35.72 MB)
cupy_cuda102-12.0.0a2-cp39-cp39-manylinux2014_x86_64.whl(61.89 MB)
cupy_cuda102-12.0.0a2-cp39-cp39-win_amd64.whl(43.49 MB)
cupy_cuda110-12.0.0a2-cp310-cp310-manylinux2014_x86_64.whl(76.80 MB)
cupy_cuda110-12.0.0a2-cp310-cp310-win_amd64.whl(58.14 MB)
cupy_cuda110-12.0.0a2-cp37-cp37m-manylinux2014_x86_64.whl(75.19 MB)
cupy_cuda110-12.0.0a2-cp37-cp37m-win_amd64.whl(58.18 MB)
cupy_cuda110-12.0.0a2-cp38-cp38-manylinux2014_x86_64.whl(78.53 MB)
cupy_cuda110-12.0.0a2-cp38-cp38-win_amd64.whl(58.28 MB)
cupy_cuda110-12.0.0a2-cp39-cp39-manylinux2014_x86_64.whl(76.72 MB)
cupy_cuda110-12.0.0a2-cp39-cp39-win_amd64.whl(58.27 MB)
cupy_cuda111-12.0.0a2-cp310-cp310-manylinux2014_x86_64.whl(95.98 MB)
cupy_cuda111-12.0.0a2-cp310-cp310-win_amd64.whl(78.30 MB)
cupy_cuda111-12.0.0a2-cp37-cp37m-manylinux2014_x86_64.whl(94.37 MB)
cupy_cuda111-12.0.0a2-cp37-cp37m-win_amd64.whl(78.33 MB)
cupy_cuda111-12.0.0a2-cp38-cp38-manylinux2014_x86_64.whl(97.71 MB)
cupy_cuda111-12.0.0a2-cp38-cp38-win_amd64.whl(78.43 MB)
cupy_cuda111-12.0.0a2-cp39-cp39-manylinux2014_x86_64.whl(95.91 MB)
cupy_cuda111-12.0.0a2-cp39-cp39-win_amd64.whl(78.43 MB)
cupy_cuda11x-12.0.0a2-cp310-cp310-manylinux2014_aarch64.whl(89.58 MB)
cupy_cuda11x-12.0.0a2-cp310-cp310-manylinux2014_x86_64.whl(78.18 MB)
cupy_cuda11x-12.0.0a2-cp310-cp310-win_amd64.whl(59.30 MB)
cupy_cuda11x-12.0.0a2-cp37-cp37m-manylinux2014_aarch64.whl(87.09 MB)
cupy_cuda11x-12.0.0a2-cp37-cp37m-manylinux2014_x86_64.whl(76.57 MB)
cupy_cuda11x-12.0.0a2-cp37-cp37m-win_amd64.whl(59.34 MB)
cupy_cuda11x-12.0.0a2-cp38-cp38-manylinux2014_aarch64.whl(91.21 MB)
cupy_cuda11x-12.0.0a2-cp38-cp38-manylinux2014_x86_64.whl(79.92 MB)
cupy_cuda11x-12.0.0a2-cp38-cp38-win_amd64.whl(59.44 MB)
cupy_cuda11x-12.0.0a2-cp39-cp39-manylinux2014_aarch64.whl(89.49 MB)
cupy_cuda11x-12.0.0a2-cp39-cp39-manylinux2014_x86_64.whl(78.11 MB)
cupy_cuda11x-12.0.0a2-cp39-cp39-win_amd64.whl(59.43 MB)
cupy_rocm_4_3-12.0.0a2-cp310-cp310-manylinux2014_x86_64.whl(36.67 MB)
cupy_rocm_4_3-12.0.0a2-cp37-cp37m-manylinux2014_x86_64.whl(35.28 MB)
cupy_rocm_4_3-12.0.0a2-cp38-cp38-manylinux2014_x86_64.whl(38.21 MB)
cupy_rocm_4_3-12.0.0a2-cp39-cp39-manylinux2014_x86_64.whl(36.60 MB)
cupy_rocm_5_0-12.0.0a2-cp310-cp310-manylinux2014_x86_64.whl(54.74 MB)
cupy_rocm_5_0-12.0.0a2-cp37-cp37m-manylinux2014_x86_64.whl(53.36 MB)
cupy_rocm_5_0-12.0.0a2-cp38-cp38-manylinux2014_x86_64.whl(56.28 MB)
cupy_rocm_5_0-12.0.0a2-cp39-cp39-manylinux2014_x86_64.whl(54.67 MB)
v11.2.0 (Oct 6, 2022)
This is the release note of v11.2.0. See here for the complete list of solved issues and merged PRs.

Changes

Enhancements

Support NCCL 2.12 ~ 2.14 (#7069)

Support cuDNN 8.5 (#7071)

Bug Fixes

Fix csrsm2 memory leak (#7041)

Fix JIT for scalar argument (#7043)

Make sparse argmin/max return a scalar array containing the index (#7057)

Code Fixes

Cosmetic change in _routine_indexing.pyx (#7056)

Documentation

Fixes docstring for interpolation prefiltering (#7037)

Typo fix (#7047)

Installation

Remove use of distutils.utils (#7009)

Tests

CI: Create a status for FlexCI dashboard (#7034)

CI: Migrate to GAR from GCR (#7066)

CI: tentatively fix hypothesis version (#7073)

Others

Introduce pre-commit (#7067)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @asi1024 @betatim @kmaehashi @leofang @takagi @wyli
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.2.0-cp310-cp310-manylinux1_x86_64.whl(61.94 MB)
cupy_cuda102-11.2.0-cp310-cp310-manylinux2014_aarch64.whl(35.75 MB)
cupy_cuda102-11.2.0-cp310-cp310-win_amd64.whl(43.34 MB)
cupy_cuda102-11.2.0-cp37-cp37m-manylinux1_x86_64.whl(60.33 MB)
cupy_cuda102-11.2.0-cp37-cp37m-manylinux2014_aarch64.whl(33.97 MB)
cupy_cuda102-11.2.0-cp37-cp37m-win_amd64.whl(43.38 MB)
cupy_cuda102-11.2.0-cp38-cp38-manylinux1_x86_64.whl(63.67 MB)
cupy_cuda102-11.2.0-cp38-cp38-manylinux2014_aarch64.whl(37.25 MB)
cupy_cuda102-11.2.0-cp38-cp38-win_amd64.whl(43.47 MB)
cupy_cuda102-11.2.0-cp39-cp39-manylinux1_x86_64.whl(61.86 MB)
cupy_cuda102-11.2.0-cp39-cp39-manylinux2014_aarch64.whl(35.69 MB)
cupy_cuda102-11.2.0-cp39-cp39-win_amd64.whl(43.47 MB)
cupy_cuda110-11.2.0-cp310-cp310-manylinux1_x86_64.whl(76.77 MB)
cupy_cuda110-11.2.0-cp310-cp310-win_amd64.whl(58.12 MB)
cupy_cuda110-11.2.0-cp37-cp37m-manylinux1_x86_64.whl(75.16 MB)
cupy_cuda110-11.2.0-cp37-cp37m-win_amd64.whl(58.16 MB)
cupy_cuda110-11.2.0-cp38-cp38-manylinux1_x86_64.whl(78.50 MB)
cupy_cuda110-11.2.0-cp38-cp38-win_amd64.whl(58.26 MB)
cupy_cuda110-11.2.0-cp39-cp39-manylinux1_x86_64.whl(76.69 MB)
cupy_cuda110-11.2.0-cp39-cp39-win_amd64.whl(58.25 MB)
cupy_cuda111-11.2.0-cp310-cp310-manylinux1_x86_64.whl(95.95 MB)
cupy_cuda111-11.2.0-cp310-cp310-win_amd64.whl(78.28 MB)
cupy_cuda111-11.2.0-cp37-cp37m-manylinux1_x86_64.whl(94.34 MB)
cupy_cuda111-11.2.0-cp37-cp37m-win_amd64.whl(78.31 MB)
cupy_cuda111-11.2.0-cp38-cp38-manylinux1_x86_64.whl(97.68 MB)
cupy_cuda111-11.2.0-cp38-cp38-win_amd64.whl(78.41 MB)
cupy_cuda111-11.2.0-cp39-cp39-manylinux1_x86_64.whl(95.87 MB)
cupy_cuda111-11.2.0-cp39-cp39-win_amd64.whl(78.41 MB)
cupy_cuda11x-11.2.0-cp310-cp310-manylinux1_x86_64.whl(78.15 MB)
cupy_cuda11x-11.2.0-cp310-cp310-manylinux2014_aarch64.whl(89.54 MB)
cupy_cuda11x-11.2.0-cp310-cp310-win_amd64.whl(59.28 MB)
cupy_cuda11x-11.2.0-cp37-cp37m-manylinux1_x86_64.whl(76.54 MB)
cupy_cuda11x-11.2.0-cp37-cp37m-manylinux2014_aarch64.whl(87.06 MB)
cupy_cuda11x-11.2.0-cp37-cp37m-win_amd64.whl(59.32 MB)
cupy_cuda11x-11.2.0-cp38-cp38-manylinux1_x86_64.whl(79.88 MB)
cupy_cuda11x-11.2.0-cp38-cp38-manylinux2014_aarch64.whl(91.17 MB)
cupy_cuda11x-11.2.0-cp38-cp38-win_amd64.whl(59.42 MB)
cupy_cuda11x-11.2.0-cp39-cp39-manylinux1_x86_64.whl(78.08 MB)
cupy_cuda11x-11.2.0-cp39-cp39-manylinux2014_aarch64.whl(89.46 MB)
cupy_cuda11x-11.2.0-cp39-cp39-win_amd64.whl(59.41 MB)
cupy_rocm_4_3-11.2.0-cp310-cp310-manylinux1_x86_64.whl(36.64 MB)
cupy_rocm_4_3-11.2.0-cp37-cp37m-manylinux1_x86_64.whl(35.26 MB)
cupy_rocm_4_3-11.2.0-cp38-cp38-manylinux1_x86_64.whl(38.18 MB)
cupy_rocm_4_3-11.2.0-cp39-cp39-manylinux1_x86_64.whl(36.57 MB)
cupy_rocm_5_0-11.2.0-cp310-cp310-manylinux1_x86_64.whl(54.71 MB)
cupy_rocm_5_0-11.2.0-cp37-cp37m-manylinux1_x86_64.whl(53.33 MB)
cupy_rocm_5_0-11.2.0-cp38-cp38-manylinux1_x86_64.whl(56.25 MB)
cupy_rocm_5_0-11.2.0-cp39-cp39-manylinux1_x86_64.whl(54.64 MB)
v12.0.0a1 (Sep 1, 2022)
This is the release note of v12.0.0a1. See here for the complete list of solved issues and merged PRs.

Highlights

Increased cupyx.scipy APIs (#6823, #6849, #6855, #6890, #6958, #6971)

The coverage of SciPy interpolate, stats & special APIs has increased. (Thanks @khushi-411 & @andoorve!)

Jetson AGX Orin Support (#6876)

Arm (aarch64) wheels are now compiled with support for compute capability 8.7. These wheels are available through our pip index: `pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64`

Changes

New Features

Add cupy.heaviside api. (#6798)

Add cupyx.scipy.special.log_softmax (#6823)

Add cupyx.scipy.stats.boxcox_llf (#6849)

Add cupyx.scipy.stats.{zmap, zscore} (#6855)

Add cupyx.scipy.special.softmax (#6890)

Add dtype, fweights, aweights to cupy.cov (#6892)

Add cupyx.scipy.interpolate.BarycentricInterpolator (#6958)

Add scipy.special.cosm1 to cupyx (#6971)

Enhancements

Enhance JIT error message when __device__ option is missing (#6837)

Fix augassign target is evaluated twice in JIT (#6844)

JIT: Add type annotation in _compile.py (#6859)

Add complex support for nanvar and nanstd (#6869)

Update cupy.array_api (#6871)

Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6872)

Add CC 8.7 for Jetson Orin (#6876)

Update cupy-wheel for v11 (#6903)

Support deg in cupy.angle (#6905)

Make sure that uniform sampling respects broadcasting (#6928)

Update cupy.array_api (cont'd) (#6932)

Support SciPy 1.9 (#6962)

Make testing decorators able to use with @pytest.mark.parametrize in some cases (#6984)

Relaxed C-contiguous requirement for changing dtype of different size (#6848)

Support keepdims parameter for average (#6852)

Support equal_nan parameter for unique (#6853)

Performance Improvements

Efficiency improvements in cupyx.scipy.ndimage utilities (#6953)

Bug Fixes

Generate CUBIN for all supported GPUs at build time (#6875)

Fix boxcox_llf (#6884)

Fix real and imag in subclass (#6896)

Fix cupy.clip to match numpy (#6920)

Let argpartition use the kth argument properly (#6921)

Fix cuTensorNet shim layer (#6934)

Fix occasional hang in sparse distributed (#6942)

Fix SciPy dependency leak (#6947)

Fix CUB reduction with zero-size arrays (#6960)

Code Fixes

Fix function names (#6877)

Remove proxy functions for softlink (#6879)

Suppress nvcc warning (#6954)

Documentation

Bump documentation build requirements (#6825)

Reverting to v10 installation instruction until v11 stable release (#6836)

Fix ROCm supported versions in compat matrix (#6846)

Generate docs for private classes in one location (#6857)

Expand breaking change & best practice on device management (#6883)

Update installation guide for v11 (aarch64) (#6888)

Update install instructions on README (#6889)

Document matmul supports out (#6898)

Fix docs build failure (#6955)

Installation

Reorganize build scripts: define compile options declaratively (#6911)

Parallelize Cythonize (#6975)

Remove use of distutils.utils (#7006)

Examples

Make matrix in CG example positive definite (#6939)

Tests

Update tags for FlexCI projects (#6814)

Add config for cupy.win.cuda117 (#6880)

Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6894)

Use ubuntu-22.04 as GitHub Actions runner image (#6988)

Revert comment fix (#6995)

Filter warnings from setuptools 65 (#7000)

CI: bump CUDA version used in cuda-python test (#7022)

CI: Add ROCm 5.1 and 5.2 (#6828)

CI: Show all errors when doc build fail (#6910)

Others

Bump version to v12.0.0a1 (#7027)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @andoorve @asi1024 @BasLaa @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @pri1311 @takagi @tom24d @toslunar @tpkessler
Source code(tar.gz)
Source code(zip)
cupy_cuda102-12.0.0a1-cp310-cp310-manylinux2014_aarch64.whl(35.76 MB)
cupy_cuda102-12.0.0a1-cp310-cp310-manylinux2014_x86_64.whl(61.95 MB)
cupy_cuda102-12.0.0a1-cp310-cp310-win_amd64.whl(43.35 MB)
cupy_cuda102-12.0.0a1-cp37-cp37m-manylinux2014_aarch64.whl(33.98 MB)
cupy_cuda102-12.0.0a1-cp37-cp37m-manylinux2014_x86_64.whl(60.35 MB)
cupy_cuda102-12.0.0a1-cp37-cp37m-win_amd64.whl(43.39 MB)
cupy_cuda102-12.0.0a1-cp38-cp38-manylinux2014_aarch64.whl(37.26 MB)
cupy_cuda102-12.0.0a1-cp38-cp38-manylinux2014_x86_64.whl(63.68 MB)
cupy_cuda102-12.0.0a1-cp38-cp38-win_amd64.whl(43.48 MB)
cupy_cuda102-12.0.0a1-cp39-cp39-manylinux2014_aarch64.whl(35.70 MB)
cupy_cuda102-12.0.0a1-cp39-cp39-manylinux2014_x86_64.whl(61.87 MB)
cupy_cuda102-12.0.0a1-cp39-cp39-win_amd64.whl(43.48 MB)
cupy_cuda110-12.0.0a1-cp310-cp310-manylinux2014_x86_64.whl(76.78 MB)
cupy_cuda110-12.0.0a1-cp310-cp310-win_amd64.whl(58.13 MB)
cupy_cuda110-12.0.0a1-cp37-cp37m-manylinux2014_x86_64.whl(75.17 MB)
cupy_cuda110-12.0.0a1-cp37-cp37m-win_amd64.whl(58.17 MB)
cupy_cuda110-12.0.0a1-cp38-cp38-manylinux2014_x86_64.whl(78.51 MB)
cupy_cuda110-12.0.0a1-cp38-cp38-win_amd64.whl(58.26 MB)
cupy_cuda110-12.0.0a1-cp39-cp39-manylinux2014_x86_64.whl(76.70 MB)
cupy_cuda110-12.0.0a1-cp39-cp39-win_amd64.whl(58.26 MB)
cupy_cuda111-12.0.0a1-cp310-cp310-manylinux2014_x86_64.whl(95.96 MB)
cupy_cuda111-12.0.0a1-cp310-cp310-win_amd64.whl(78.28 MB)
cupy_cuda111-12.0.0a1-cp37-cp37m-manylinux2014_x86_64.whl(94.35 MB)
cupy_cuda111-12.0.0a1-cp37-cp37m-win_amd64.whl(78.32 MB)
cupy_cuda111-12.0.0a1-cp38-cp38-manylinux2014_x86_64.whl(97.69 MB)
cupy_cuda111-12.0.0a1-cp38-cp38-win_amd64.whl(78.42 MB)
cupy_cuda111-12.0.0a1-cp39-cp39-manylinux2014_x86_64.whl(95.88 MB)
cupy_cuda111-12.0.0a1-cp39-cp39-win_amd64.whl(78.42 MB)
cupy_cuda11x-12.0.0a1-cp310-cp310-manylinux2014_aarch64.whl(89.56 MB)
cupy_cuda11x-12.0.0a1-cp310-cp310-manylinux2014_x86_64.whl(78.16 MB)
cupy_cuda11x-12.0.0a1-cp310-cp310-win_amd64.whl(59.29 MB)
cupy_cuda11x-12.0.0a1-cp37-cp37m-manylinux2014_aarch64.whl(87.08 MB)
cupy_cuda11x-12.0.0a1-cp37-cp37m-manylinux2014_x86_64.whl(76.56 MB)
cupy_cuda11x-12.0.0a1-cp37-cp37m-win_amd64.whl(59.33 MB)
cupy_cuda11x-12.0.0a1-cp38-cp38-manylinux2014_aarch64.whl(91.19 MB)
cupy_cuda11x-12.0.0a1-cp38-cp38-manylinux2014_x86_64.whl(79.89 MB)
cupy_cuda11x-12.0.0a1-cp38-cp38-win_amd64.whl(59.43 MB)
cupy_cuda11x-12.0.0a1-cp39-cp39-manylinux2014_aarch64.whl(89.47 MB)
cupy_cuda11x-12.0.0a1-cp39-cp39-manylinux2014_x86_64.whl(78.09 MB)
cupy_cuda11x-12.0.0a1-cp39-cp39-win_amd64.whl(59.42 MB)
cupy_rocm_4_3-12.0.0a1-cp310-cp310-manylinux2014_x86_64.whl(36.66 MB)
cupy_rocm_4_3-12.0.0a1-cp37-cp37m-manylinux2014_x86_64.whl(35.27 MB)
cupy_rocm_4_3-12.0.0a1-cp38-cp38-manylinux2014_x86_64.whl(38.19 MB)
cupy_rocm_4_3-12.0.0a1-cp39-cp39-manylinux2014_x86_64.whl(36.59 MB)
cupy_rocm_5_0-12.0.0a1-cp310-cp310-manylinux2014_x86_64.whl(54.72 MB)
cupy_rocm_5_0-12.0.0a1-cp37-cp37m-manylinux2014_x86_64.whl(53.35 MB)
cupy_rocm_5_0-12.0.0a1-cp38-cp38-manylinux2014_x86_64.whl(56.26 MB)
cupy_rocm_5_0-12.0.0a1-cp39-cp39-manylinux2014_x86_64.whl(54.65 MB)
v11.1.0(Sep 1, 2022)
This is the release note of v11.1.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Jetson AGX Orin Support (#6876)

Arm (aarch64) wheels are now compiled with support for compute capability 8.7. These wheels are available through our Pip index: pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64

Changes

New Features

Add cupyx.scipy.special.log_softmax (#6966)

Enhancements

Update cupy.array_api (#6929)

Add CC 8.7 for Jetson Orin (#6950)

Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6951)

Fix augassign target is evaluated twice in JIT (#6964)

Update cupy.array_api (cont'd) (#6973)

Support SciPy 1.9 (#6981)

Enhance JIT error message when __device__ option is missing (#6991)

JIT: Add type annotation in _compile.py (#6993)

Make testing decorators able to use with @pytest.mark.parametrize in some cases (#7010)

Support keepdims parameter for average (#6897)

Support equal_nan parameter for unique (#6904)

Bug Fixes

Fix CUB reduction with zero-size arrays (#6968)

Fix cuTensorNet shim layer (#6979)

Fix SciPy dependency leak (#6980)

Fix occasional hang in sparse distributed (#6997)

Let argpartition use the kth argument properly (#7020)

Code Fixes

Remove proxy functions for softlink (#6946)

Suppress nvcc warning (#6970)

Documentation

Document matmul supports out (#6899)

Bump documentation build requirements (#6930)

Expand breaking change & best practice on device management (#6952)

Fix docs build failure (#6967)

Tests

Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6963)

Use ubuntu-22.04 as GitHub Actions runner image (#6992)

Revert comment fix (#6996)

Filter warnings from setuptools 65 (#7004)

CI: bump CUDA version used in cuda-python test (#7023)

CI: Show all errors when doc build fail (#6945)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @khushi-411 @kmaehashi @leofang @takagi @toslunar
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.1.0-cp310-cp310-manylinux1_x86_64.whl(61.93 MB)
cupy_cuda102-11.1.0-cp310-cp310-manylinux2014_aarch64.whl(35.75 MB)
cupy_cuda102-11.1.0-cp310-cp310-win_amd64.whl(43.34 MB)
cupy_cuda102-11.1.0-cp37-cp37m-manylinux1_x86_64.whl(60.33 MB)
cupy_cuda102-11.1.0-cp37-cp37m-manylinux2014_aarch64.whl(33.97 MB)
cupy_cuda102-11.1.0-cp37-cp37m-win_amd64.whl(43.38 MB)
cupy_cuda102-11.1.0-cp38-cp38-manylinux1_x86_64.whl(63.67 MB)
cupy_cuda102-11.1.0-cp38-cp38-manylinux2014_aarch64.whl(37.25 MB)
cupy_cuda102-11.1.0-cp38-cp38-win_amd64.whl(43.47 MB)
cupy_cuda102-11.1.0-cp39-cp39-manylinux1_x86_64.whl(61.86 MB)
cupy_cuda102-11.1.0-cp39-cp39-manylinux2014_aarch64.whl(35.69 MB)
cupy_cuda102-11.1.0-cp39-cp39-win_amd64.whl(43.47 MB)
cupy_cuda110-11.1.0-cp310-cp310-manylinux1_x86_64.whl(76.76 MB)
cupy_cuda110-11.1.0-cp310-cp310-win_amd64.whl(58.12 MB)
cupy_cuda110-11.1.0-cp37-cp37m-manylinux1_x86_64.whl(75.16 MB)
cupy_cuda110-11.1.0-cp37-cp37m-win_amd64.whl(58.16 MB)
cupy_cuda110-11.1.0-cp38-cp38-manylinux1_x86_64.whl(78.49 MB)
cupy_cuda110-11.1.0-cp38-cp38-win_amd64.whl(58.26 MB)
cupy_cuda110-11.1.0-cp39-cp39-manylinux1_x86_64.whl(76.69 MB)
cupy_cuda110-11.1.0-cp39-cp39-win_amd64.whl(58.25 MB)
cupy_cuda111-11.1.0-cp310-cp310-manylinux1_x86_64.whl(95.94 MB)
cupy_cuda111-11.1.0-cp310-cp310-win_amd64.whl(78.28 MB)
cupy_cuda111-11.1.0-cp37-cp37m-manylinux1_x86_64.whl(94.34 MB)
cupy_cuda111-11.1.0-cp37-cp37m-win_amd64.whl(78.31 MB)
cupy_cuda111-11.1.0-cp38-cp38-manylinux1_x86_64.whl(97.67 MB)
cupy_cuda111-11.1.0-cp38-cp38-win_amd64.whl(78.41 MB)
cupy_cuda111-11.1.0-cp39-cp39-manylinux1_x86_64.whl(95.87 MB)
cupy_cuda111-11.1.0-cp39-cp39-win_amd64.whl(78.41 MB)
cupy_cuda11x-11.1.0-cp310-cp310-manylinux1_x86_64.whl(78.15 MB)
cupy_cuda11x-11.1.0-cp310-cp310-manylinux2014_aarch64.whl(89.54 MB)
cupy_cuda11x-11.1.0-cp310-cp310-win_amd64.whl(59.28 MB)
cupy_cuda11x-11.1.0-cp37-cp37m-manylinux1_x86_64.whl(76.54 MB)
cupy_cuda11x-11.1.0-cp37-cp37m-manylinux2014_aarch64.whl(87.06 MB)
cupy_cuda11x-11.1.0-cp37-cp37m-win_amd64.whl(59.32 MB)
cupy_cuda11x-11.1.0-cp38-cp38-manylinux1_x86_64.whl(79.88 MB)
cupy_cuda11x-11.1.0-cp38-cp38-manylinux2014_aarch64.whl(91.17 MB)
cupy_cuda11x-11.1.0-cp38-cp38-win_amd64.whl(59.42 MB)
cupy_cuda11x-11.1.0-cp39-cp39-manylinux1_x86_64.whl(78.07 MB)
cupy_cuda11x-11.1.0-cp39-cp39-manylinux2014_aarch64.whl(89.46 MB)
cupy_cuda11x-11.1.0-cp39-cp39-win_amd64.whl(59.41 MB)
cupy_rocm_4_3-11.1.0-cp310-cp310-manylinux1_x86_64.whl(36.64 MB)
cupy_rocm_4_3-11.1.0-cp37-cp37m-manylinux1_x86_64.whl(35.26 MB)
cupy_rocm_4_3-11.1.0-cp38-cp38-manylinux1_x86_64.whl(38.17 MB)
cupy_rocm_4_3-11.1.0-cp39-cp39-manylinux1_x86_64.whl(36.57 MB)
cupy_rocm_5_0-11.1.0-cp310-cp310-manylinux1_x86_64.whl(54.71 MB)
cupy_rocm_5_0-11.1.0-cp37-cp37m-manylinux1_x86_64.whl(53.33 MB)
cupy_rocm_5_0-11.1.0-cp38-cp38-manylinux1_x86_64.whl(56.24 MB)
cupy_rocm_5_0-11.1.0-cp39-cp39-manylinux1_x86_64.whl(54.64 MB)
v11.0.0(Jul 28, 2022)
This is the release note of v11.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since the v11.0.0rc1 release. Check out our blog for highlights in the v11 release!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

cupy-wheel package

Until now, downstream projects depending on CuPy have had a hard time specifying a binary wheel as a dependency, and it was the users’ responsibility to install the correct package in their environments. CuPy v10 introduced the experimental cupy-wheel meta-package. In this release, we declare this feature ready for production environments. cupy-wheel examines the user’s environment and automatically selects the matching CuPy binary wheel to install.

Changes

For all changes in v11, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).

Enhancements

Support deg in cupy.angle (#6909)

Update cupy-wheel for v11 (#6913)

Relaxed C-contiguous requirement for changing dtype of different size (#6850)

Bug Fixes

Generate CUBIN for all supported GPUs at build time (#6881)

Fix real and imag in subclass (#6907)

Code Fixes

Fix function names (#6878)

Documentation

Fix ROCm supported versions in compat matrix (#6851)

Generate docs for private classes in one location (#6858)

Installation

Bump version to v11.0.0 (#6915)

Tests

Update tags for FlexCI projects (#6860)

CI: Add ROCm 5.1 and 5.2 (#6861)

Add config for cupy.win.cuda117 (#6885)

Others

Bump branch version to v11 (#6845)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@emcastillo @kmaehashi @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.0.0-cp310-cp310-manylinux1_x86_64.whl(61.90 MB)
cupy_cuda102-11.0.0-cp310-cp310-manylinux2014_aarch64.whl(35.72 MB)
cupy_cuda102-11.0.0-cp310-cp310-win_amd64.whl(43.33 MB)
cupy_cuda102-11.0.0-cp37-cp37m-manylinux1_x86_64.whl(60.31 MB)
cupy_cuda102-11.0.0-cp37-cp37m-manylinux2014_aarch64.whl(33.95 MB)
cupy_cuda102-11.0.0-cp37-cp37m-win_amd64.whl(43.37 MB)
cupy_cuda102-11.0.0-cp38-cp38-manylinux1_x86_64.whl(63.63 MB)
cupy_cuda102-11.0.0-cp38-cp38-manylinux2014_aarch64.whl(37.21 MB)
cupy_cuda102-11.0.0-cp38-cp38-win_amd64.whl(43.46 MB)
cupy_cuda102-11.0.0-cp39-cp39-manylinux1_x86_64.whl(61.82 MB)
cupy_cuda102-11.0.0-cp39-cp39-manylinux2014_aarch64.whl(35.65 MB)
cupy_cuda102-11.0.0-cp39-cp39-win_amd64.whl(43.46 MB)
cupy_cuda110-11.0.0-cp310-cp310-manylinux1_x86_64.whl(76.73 MB)
cupy_cuda110-11.0.0-cp310-cp310-win_amd64.whl(58.11 MB)
cupy_cuda110-11.0.0-cp37-cp37m-manylinux1_x86_64.whl(75.13 MB)
cupy_cuda110-11.0.0-cp37-cp37m-win_amd64.whl(58.15 MB)
cupy_cuda110-11.0.0-cp38-cp38-manylinux1_x86_64.whl(78.45 MB)
cupy_cuda110-11.0.0-cp38-cp38-win_amd64.whl(58.25 MB)
cupy_cuda110-11.0.0-cp39-cp39-manylinux1_x86_64.whl(76.65 MB)
cupy_cuda110-11.0.0-cp39-cp39-win_amd64.whl(58.24 MB)
cupy_cuda111-11.0.0-cp310-cp310-manylinux1_x86_64.whl(95.90 MB)
cupy_cuda111-11.0.0-cp310-cp310-win_amd64.whl(78.26 MB)
cupy_cuda111-11.0.0-cp37-cp37m-manylinux1_x86_64.whl(94.31 MB)
cupy_cuda111-11.0.0-cp37-cp37m-win_amd64.whl(78.30 MB)
cupy_cuda111-11.0.0-cp38-cp38-manylinux1_x86_64.whl(97.63 MB)
cupy_cuda111-11.0.0-cp38-cp38-win_amd64.whl(78.40 MB)
cupy_cuda111-11.0.0-cp39-cp39-manylinux1_x86_64.whl(95.83 MB)
cupy_cuda111-11.0.0-cp39-cp39-win_amd64.whl(78.40 MB)
cupy_cuda11x-11.0.0-cp310-cp310-manylinux1_x86_64.whl(78.11 MB)
cupy_cuda11x-11.0.0-cp310-cp310-manylinux2014_aarch64.whl(89.51 MB)
cupy_cuda11x-11.0.0-cp310-cp310-win_amd64.whl(59.27 MB)
cupy_cuda11x-11.0.0-cp37-cp37m-manylinux1_x86_64.whl(76.52 MB)
cupy_cuda11x-11.0.0-cp37-cp37m-manylinux2014_aarch64.whl(87.03 MB)
cupy_cuda11x-11.0.0-cp37-cp37m-win_amd64.whl(59.31 MB)
cupy_cuda11x-11.0.0-cp38-cp38-manylinux1_x86_64.whl(79.84 MB)
cupy_cuda11x-11.0.0-cp38-cp38-manylinux2014_aarch64.whl(91.15 MB)
cupy_cuda11x-11.0.0-cp38-cp38-win_amd64.whl(59.41 MB)
cupy_cuda11x-11.0.0-cp39-cp39-manylinux1_x86_64.whl(78.03 MB)
cupy_cuda11x-11.0.0-cp39-cp39-manylinux2014_aarch64.whl(89.43 MB)
cupy_cuda11x-11.0.0-cp39-cp39-win_amd64.whl(59.40 MB)
cupy_rocm_4_3-11.0.0-cp310-cp310-manylinux1_x86_64.whl(36.60 MB)
cupy_rocm_4_3-11.0.0-cp37-cp37m-manylinux1_x86_64.whl(35.23 MB)
cupy_rocm_4_3-11.0.0-cp38-cp38-manylinux1_x86_64.whl(38.13 MB)
cupy_rocm_4_3-11.0.0-cp39-cp39-manylinux1_x86_64.whl(36.53 MB)
cupy_rocm_5_0-11.0.0-cp310-cp310-manylinux1_x86_64.whl(54.67 MB)
cupy_rocm_5_0-11.0.0-cp37-cp37m-manylinux1_x86_64.whl(53.30 MB)
cupy_rocm_5_0-11.0.0-cp38-cp38-manylinux1_x86_64.whl(56.20 MB)
cupy_rocm_5_0-11.0.0-cp39-cp39-manylinux1_x86_64.whl(54.60 MB)
v11.0.0rc1(Jun 30, 2022)
This is the release note of v11.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are going to release v11.0.0 on July 28th. Please start testing your workload with this release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support CUDA 11.7 (#6767)

Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre

Unified Binary Package for CUDA 11.2 or later (#6730)

CuPy v11 provides a unified binary package named cupy-cuda11x that supports all CUDA 11.2+ releases. This replaces per-CUDA version binary packages (cupy-cuda112, cupy-cuda113, …, cupy-cuda117) provided in CuPy v10 or earlier.

Note that CUDA 11.1 or earlier still requires per-CUDA version binary packages. cupy-cuda102, cupy-cuda110, and cupy-cuda111 will be provided for CUDA 10.2, 11.0, and 11.1, respectively.
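The package selection described above can be sketched as a small lookup. This is only an illustration of the naming rule stated in this release note, not how cupy-wheel or pip actually resolves packages; the helper name `cupy_package_for_cuda` is hypothetical.

```python
def cupy_package_for_cuda(major: int, minor: int) -> str:
    """Return the CuPy v11 binary package name for a CUDA toolkit version.

    Hypothetical helper: it only encodes the versions named in the release note.
    """
    # CUDA 11.1 or earlier still uses one package per CUDA version.
    per_version = {
        (10, 2): "cupy-cuda102",
        (11, 0): "cupy-cuda110",
        (11, 1): "cupy-cuda111",
    }
    if (major, minor) in per_version:
        return per_version[(major, minor)]
    # CUDA 11.2 and later 11.x releases share the unified package.
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    raise ValueError(f"no CuPy v11 wheel listed for CUDA {major}.{minor}")


print(cupy_package_for_cuda(11, 7))  # cupy-cuda11x
```

Raising on unlisted versions, rather than guessing a name, keeps the sketch limited to what the note actually states.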

Binary Package for Arm Platform (#6705)

CuPy v11 provides a cupy-cuda11x binary package built for aarch64, which supports CUDA 11.2+ Arm SBSA and JetPack 5. These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64
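As a rough sketch, a setup script could pick the pip arguments from the machine architecture. The two index URLs are the ones quoted in these release notes; the function name `pip_install_args` and the idea of switching on `platform.machine()` are assumptions made for illustration.

```python
import platform

# Index URLs quoted in the release notes.
AARCH64_INDEX = "https://pip.cupy.dev/aarch64"
PRE_INDEX = "https://pip.cupy.dev/pre"


def pip_install_args(machine: str, pre: bool = False) -> list:
    """Build a pip command line for the cupy-cuda11x wheel (hypothetical helper)."""
    args = ["pip", "install"]
    if pre:
        args.append("--pre")
    args.append("cupy-cuda11x")
    if machine == "aarch64":
        # Arm wheels are served from the dedicated aarch64 index.
        args += ["-f", AARCH64_INDEX]
    elif pre:
        # Pre-releases for other platforms come from the pre-release index.
        args += ["-f", PRE_INDEX]
    return args


print(" ".join(pip_install_args(platform.machine(), pre=True)))
```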

Support for ndarray subclassing (#6720, #6755)

This release allows users to subclass cupy.ndarray, using the same protocol as NumPy:

import cupy

class C(cupy.ndarray):
    def __new__(cls, *args, info=None, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

a = C([0, 1, 2, 3], info='information')
assert type(a) is C
assert issubclass(type(a), cupy.ndarray)
assert a.info == 'information'

Note that the view casting and new-from-template mechanisms are also supported, as described in the NumPy documentation.
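The two mechanisms can be illustrated with NumPy itself, which runs without a GPU. The sketch below mirrors the subclass shown above and assumes, as this release note states, that cupy.ndarray follows the same protocol; the class name `InfoArray` is made up for the example.

```python
import numpy as np


class InfoArray(np.ndarray):
    """ndarray subclass carrying an extra `info` attribute (illustrative)."""

    def __new__(cls, *args, info=None, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # obj is None only for direct constructor calls.
        if obj is None:
            return
        self.info = getattr(obj, "info", None)


# View casting: reinterpret an existing plain array as the subclass.
base = np.arange(4)
viewed = base.view(InfoArray)
assert type(viewed) is InfoArray
assert viewed.info is None  # the plain array had no `info` to copy

# New from template: slicing a subclass instance propagates `info`.
a = InfoArray((4,), info="information")
sliced = a[1:]
assert type(sliced) is InfoArray
assert sliced.info == "information"
```

In both cases `__array_finalize__` is the hook that sets `info`, which is why it reads the attribute from the source object with a default instead of assuming it exists.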

Add Collective Communication APIs in cupyx.distributed for Sparse Matrices

All the collective calls implemented for dense matrices now support sparse matrices. Users interested in this feature should install mpi4py in order to perform an efficient metadata exchange.

Google Summer of Code 2022

We would like to give a warm welcome to @khushi-411, who will be working on adding support for the cupyx.scipy.interpolate APIs as part of her GSoC internship!

Changes without compatibility

Bump base Docker image to the latest supported one (#6802)

CuPy official Docker images have been upgraded. Users relying on these images may suffer from compatibility issues with preinstalled tools or libraries.

Changes

New Features

Add cupy.setxor1d (#6582)

Add initial cupyx.spatial.distance support from pylibraft (#6690)

Support cupy.ndarray subclassing - Part 2 - View casting (#6720)

Add sparse broadcast (#6758)

Add sparse reduce (#6761)

Add sparse all_reduce and minor fixes (#6762)

Add sparse all_to_all, reduce_scatter, send_recv (#6765)

Subclass cupy.ndarray subclassing - Part 3 - New from template (ufunc) (#6775)

Add cupyx.scipy.special.log_ndtr (#6776)

Add cupyx.scipy.special.expn (#6790)

Enhancements

Utilize CUDA Enhanced Compatibility (#6730)

Fix to return correct CUDA version when in CUDA Python mode (#6736)

Support CUDA 11.7 (#6767)

Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6791)

Utilize CUDA Enhanced Compatibility in all wrappers (#6799)

Add support for cupy-cuda11x wheel (#6800)

Bump base Docker image to the latest supported one (#6802)

Remove CUPY_CUDA_VERSION as much as possible (#6810)

Raise UserWarning in cupy.cuda.compile_with_cache (#6818)

cupy-wheel: Use NVRTC to infer the toolkit version (#6819)

Support NumPy 1.23 (#6820)

Fix for NumPy 1.23 (#6807)

Performance Improvements

Improved integer matrix multiplication performance by modifying tuning parameters (#6703)

Use fast convolution algorithm in cupy.poly1d.__pow__ (#6770)

Bug Fixes

Fix polynomial tests (#6721)

Fix batched matmul for integral numbers (#6725)

Fix cupy.median for NaN inputs (#6759)

Fix required cusparse symbol not loaded in CUDA 11.1.1 (#6806)

Code Fixes

Add type annotation in _cuda_types.py (#6726)

Subclass rename (#6746)

Add type annotation to JIT internal types (#6778)

Documentation

Add CUDA 11.7 on documents (#6768)

Improved NVTX documentation (#6774)

Fix docs to hide ndarray_base (#6782)

Update docs for cupy-cuda11x wheel (#6803)

Bump NumPy version used in docs (#6824)

Add upgrade guide for CuPy v11 (#6826)

Tests

Fix mempool tests (#6591)

CI: Fix prep script to show build failure details (#6781)

Fix a potential variable misuse bug (#6786)

Fix CI Docker image build failing in head test (#6804)

Tiny clean up in CI script (#6809)

Others

Fix docker workflow to push to latest image (#6832)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andoorve @asi1024 @asmeurer @cjnolet @emcastillo @khushi-411 @kmaehashi @leofang @LostBenjamin @pri1311 @rietmann-nv @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.0.0rc1-cp310-cp310-manylinux1_x86_64.whl(61.89 MB)
cupy_cuda102-11.0.0rc1-cp310-cp310-manylinux2014_aarch64.whl(35.72 MB)
cupy_cuda102-11.0.0rc1-cp310-cp310-win_amd64.whl(43.33 MB)
cupy_cuda102-11.0.0rc1-cp37-cp37m-manylinux1_x86_64.whl(60.31 MB)
cupy_cuda102-11.0.0rc1-cp37-cp37m-manylinux2014_aarch64.whl(33.95 MB)
cupy_cuda102-11.0.0rc1-cp37-cp37m-win_amd64.whl(43.36 MB)
cupy_cuda102-11.0.0rc1-cp38-cp38-manylinux1_x86_64.whl(63.62 MB)
cupy_cuda102-11.0.0rc1-cp38-cp38-manylinux2014_aarch64.whl(37.21 MB)
cupy_cuda102-11.0.0rc1-cp38-cp38-win_amd64.whl(43.46 MB)
cupy_cuda102-11.0.0rc1-cp39-cp39-manylinux1_x86_64.whl(61.81 MB)
cupy_cuda102-11.0.0rc1-cp39-cp39-manylinux2014_aarch64.whl(35.65 MB)
cupy_cuda102-11.0.0rc1-cp39-cp39-win_amd64.whl(43.45 MB)
cupy_cuda110-11.0.0rc1-cp310-cp310-manylinux1_x86_64.whl(76.72 MB)
cupy_cuda110-11.0.0rc1-cp310-cp310-win_amd64.whl(58.11 MB)
cupy_cuda110-11.0.0rc1-cp37-cp37m-manylinux1_x86_64.whl(75.13 MB)
cupy_cuda110-11.0.0rc1-cp37-cp37m-win_amd64.whl(58.15 MB)
cupy_cuda110-11.0.0rc1-cp38-cp38-manylinux1_x86_64.whl(78.45 MB)
cupy_cuda110-11.0.0rc1-cp38-cp38-win_amd64.whl(58.24 MB)
cupy_cuda110-11.0.0rc1-cp39-cp39-manylinux1_x86_64.whl(76.64 MB)
cupy_cuda110-11.0.0rc1-cp39-cp39-win_amd64.whl(58.24 MB)
cupy_cuda111-11.0.0rc1-cp310-cp310-manylinux1_x86_64.whl(95.90 MB)
cupy_cuda111-11.0.0rc1-cp310-cp310-win_amd64.whl(78.26 MB)
cupy_cuda111-11.0.0rc1-cp37-cp37m-manylinux1_x86_64.whl(94.31 MB)
cupy_cuda111-11.0.0rc1-cp37-cp37m-win_amd64.whl(78.30 MB)
cupy_cuda111-11.0.0rc1-cp38-cp38-manylinux1_x86_64.whl(97.63 MB)
cupy_cuda111-11.0.0rc1-cp38-cp38-win_amd64.whl(78.40 MB)
cupy_cuda111-11.0.0rc1-cp39-cp39-manylinux1_x86_64.whl(95.83 MB)
cupy_cuda111-11.0.0rc1-cp39-cp39-win_amd64.whl(78.40 MB)
cupy_cuda11x-11.0.0rc1-cp310-cp310-manylinux1_x86_64.whl(79.88 MB)
cupy_cuda11x-11.0.0rc1-cp310-cp310-manylinux2014_aarch64.whl(82.28 MB)
cupy_cuda11x-11.0.0rc1-cp310-cp310-win_amd64.whl(61.08 MB)
cupy_cuda11x-11.0.0rc1-cp37-cp37m-manylinux1_x86_64.whl(78.30 MB)
cupy_cuda11x-11.0.0rc1-cp37-cp37m-manylinux2014_aarch64.whl(79.80 MB)
cupy_cuda11x-11.0.0rc1-cp37-cp37m-win_amd64.whl(61.12 MB)
cupy_cuda11x-11.0.0rc1-cp38-cp38-manylinux1_x86_64.whl(81.61 MB)
cupy_cuda11x-11.0.0rc1-cp38-cp38-manylinux2014_aarch64.whl(83.91 MB)
cupy_cuda11x-11.0.0rc1-cp38-cp38-win_amd64.whl(61.21 MB)
cupy_cuda11x-11.0.0rc1-cp39-cp39-manylinux1_x86_64.whl(79.81 MB)
cupy_cuda11x-11.0.0rc1-cp39-cp39-manylinux2014_aarch64.whl(82.20 MB)
cupy_cuda11x-11.0.0rc1-cp39-cp39-win_amd64.whl(61.21 MB)
cupy_rocm_4_3-11.0.0rc1-cp310-cp310-manylinux1_x86_64.whl(36.59 MB)
cupy_rocm_4_3-11.0.0rc1-cp37-cp37m-manylinux1_x86_64.whl(35.23 MB)
cupy_rocm_4_3-11.0.0rc1-cp38-cp38-manylinux1_x86_64.whl(38.12 MB)
cupy_rocm_4_3-11.0.0rc1-cp39-cp39-manylinux1_x86_64.whl(36.52 MB)
cupy_rocm_5_0-11.0.0rc1-cp310-cp310-manylinux1_x86_64.whl(54.67 MB)
cupy_rocm_5_0-11.0.0rc1-cp37-cp37m-manylinux1_x86_64.whl(53.30 MB)
cupy_rocm_5_0-11.0.0rc1-cp38-cp38-manylinux1_x86_64.whl(56.19 MB)
cupy_rocm_5_0-11.0.0rc1-cp39-cp39-manylinux1_x86_64.whl(54.59 MB)
v10.6.0(Jun 30, 2022)
This is the release note of v10.6.0. See here for the complete list of solved issues and merged PRs.

This is the last planned release for the CuPy v10 series. We are going to release v11.0.0 on July 28th. Please start testing your workload with the v11 release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes in v11.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support CUDA 11.7 (#6767)

Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install cupy-cuda117

Changes

Enhancements

Improve warning message in sparse (#6675)

Support CUDA 11.7 (#6794)

Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6795)

cupy-wheel: Use NVRTC to infer the toolkit version (#6831)

Bug Fixes

Fix cupy.median for NaN inputs (#6760)

Fix batched matmul for integral numbers (#6777)

Documentation

Add CUDA 11.7 on documents (#6801)

Tests

Fix Dockerfile broken for array-api tests (#6518)

Skip ndimage.filter tests for ROCm 4.0 (#6676)

Xfail a test of LOBPCG on ROCm 5.0+ (#6733)

CI: Fix prep script to show build failure details (#6784)

Fix a potential variable misuse bug (#6788)

Fix CI Docker image build failing in head test (#6808)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @asmeurer @emcastillo @kmaehashi @LostBenjamin @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.6.0-cp310-cp310-manylinux1_x86_64.whl(60.37 MB)
cupy_cuda102-10.6.0-cp310-cp310-manylinux2014_aarch64.whl(34.61 MB)
cupy_cuda102-10.6.0-cp310-cp310-win_amd64.whl(42.30 MB)
cupy_cuda102-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(58.85 MB)
cupy_cuda102-10.6.0-cp37-cp37m-manylinux2014_aarch64.whl(32.92 MB)
cupy_cuda102-10.6.0-cp37-cp37m-win_amd64.whl(42.34 MB)
cupy_cuda102-10.6.0-cp38-cp38-manylinux1_x86_64.whl(62.02 MB)
cupy_cuda102-10.6.0-cp38-cp38-manylinux2014_aarch64.whl(36.04 MB)
cupy_cuda102-10.6.0-cp38-cp38-win_amd64.whl(42.43 MB)
cupy_cuda102-10.6.0-cp39-cp39-manylinux1_x86_64.whl(60.28 MB)
cupy_cuda102-10.6.0-cp39-cp39-manylinux2014_aarch64.whl(34.55 MB)
cupy_cuda102-10.6.0-cp39-cp39-win_amd64.whl(42.42 MB)
cupy_cuda110-10.6.0-cp310-cp310-manylinux1_x86_64.whl(74.99 MB)
cupy_cuda110-10.6.0-cp310-cp310-win_amd64.whl(56.88 MB)
cupy_cuda110-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(73.47 MB)
cupy_cuda110-10.6.0-cp37-cp37m-win_amd64.whl(56.92 MB)
cupy_cuda110-10.6.0-cp38-cp38-manylinux1_x86_64.whl(76.64 MB)
cupy_cuda110-10.6.0-cp38-cp38-win_amd64.whl(57.01 MB)
cupy_cuda110-10.6.0-cp39-cp39-manylinux1_x86_64.whl(74.91 MB)
cupy_cuda110-10.6.0-cp39-cp39-win_amd64.whl(57.01 MB)
cupy_cuda111-10.6.0-cp310-cp310-manylinux1_x86_64.whl(93.78 MB)
cupy_cuda111-10.6.0-cp310-cp310-win_amd64.whl(76.63 MB)
cupy_cuda111-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(92.26 MB)
cupy_cuda111-10.6.0-cp37-cp37m-win_amd64.whl(76.66 MB)
cupy_cuda111-10.6.0-cp38-cp38-manylinux1_x86_64.whl(95.43 MB)
cupy_cuda111-10.6.0-cp38-cp38-win_amd64.whl(76.75 MB)
cupy_cuda111-10.6.0-cp39-cp39-manylinux1_x86_64.whl(93.69 MB)
cupy_cuda111-10.6.0-cp39-cp39-win_amd64.whl(76.75 MB)
cupy_cuda112-10.6.0-cp310-cp310-manylinux1_x86_64.whl(75.40 MB)
cupy_cuda112-10.6.0-cp310-cp310-win_amd64.whl(57.37 MB)
cupy_cuda112-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(73.88 MB)
cupy_cuda112-10.6.0-cp37-cp37m-win_amd64.whl(57.40 MB)
cupy_cuda112-10.6.0-cp38-cp38-manylinux1_x86_64.whl(77.06 MB)
cupy_cuda112-10.6.0-cp38-cp38-win_amd64.whl(57.50 MB)
cupy_cuda112-10.6.0-cp39-cp39-manylinux1_x86_64.whl(75.32 MB)
cupy_cuda112-10.6.0-cp39-cp39-win_amd64.whl(57.49 MB)
cupy_cuda113-10.6.0-cp310-cp310-manylinux1_x86_64.whl(72.58 MB)
cupy_cuda113-10.6.0-cp310-cp310-win_amd64.whl(54.10 MB)
cupy_cuda113-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(71.07 MB)
cupy_cuda113-10.6.0-cp37-cp37m-win_amd64.whl(54.14 MB)
cupy_cuda113-10.6.0-cp38-cp38-manylinux1_x86_64.whl(74.24 MB)
cupy_cuda113-10.6.0-cp38-cp38-win_amd64.whl(54.23 MB)
cupy_cuda113-10.6.0-cp39-cp39-manylinux1_x86_64.whl(72.50 MB)
cupy_cuda113-10.6.0-cp39-cp39-win_amd64.whl(54.23 MB)
cupy_cuda114-10.6.0-cp310-cp310-manylinux1_x86_64.whl(81.06 MB)
cupy_cuda114-10.6.0-cp310-cp310-win_amd64.whl(62.80 MB)
cupy_cuda114-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(79.54 MB)
cupy_cuda114-10.6.0-cp37-cp37m-win_amd64.whl(62.83 MB)
cupy_cuda114-10.6.0-cp38-cp38-manylinux1_x86_64.whl(82.72 MB)
cupy_cuda114-10.6.0-cp38-cp38-win_amd64.whl(62.92 MB)
cupy_cuda114-10.6.0-cp39-cp39-manylinux1_x86_64.whl(80.98 MB)
cupy_cuda114-10.6.0-cp39-cp39-win_amd64.whl(62.92 MB)
cupy_cuda115-10.6.0-cp310-cp310-manylinux1_x86_64.whl(77.78 MB)
cupy_cuda115-10.6.0-cp310-cp310-manylinux2014_aarch64.whl(80.12 MB)
cupy_cuda115-10.6.0-cp310-cp310-win_amd64.whl(59.48 MB)
cupy_cuda115-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(76.26 MB)
cupy_cuda115-10.6.0-cp37-cp37m-manylinux2014_aarch64.whl(77.74 MB)
cupy_cuda115-10.6.0-cp37-cp37m-win_amd64.whl(59.51 MB)
cupy_cuda115-10.6.0-cp38-cp38-manylinux1_x86_64.whl(79.43 MB)
cupy_cuda115-10.6.0-cp38-cp38-manylinux2014_aarch64.whl(81.70 MB)
cupy_cuda115-10.6.0-cp38-cp38-win_amd64.whl(59.60 MB)
cupy_cuda115-10.6.0-cp39-cp39-manylinux1_x86_64.whl(77.69 MB)
cupy_cuda115-10.6.0-cp39-cp39-manylinux2014_aarch64.whl(80.04 MB)
cupy_cuda115-10.6.0-cp39-cp39-win_amd64.whl(59.60 MB)
cupy_cuda116-10.6.0-cp310-cp310-manylinux1_x86_64.whl(77.82 MB)
cupy_cuda116-10.6.0-cp310-cp310-win_amd64.whl(59.50 MB)
cupy_cuda116-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(76.29 MB)
cupy_cuda116-10.6.0-cp37-cp37m-win_amd64.whl(59.53 MB)
cupy_cuda116-10.6.0-cp38-cp38-manylinux1_x86_64.whl(79.48 MB)
cupy_cuda116-10.6.0-cp38-cp38-win_amd64.whl(59.63 MB)
cupy_cuda116-10.6.0-cp39-cp39-manylinux1_x86_64.whl(77.74 MB)
cupy_cuda116-10.6.0-cp39-cp39-win_amd64.whl(59.62 MB)
cupy_cuda117-10.6.0-cp310-cp310-manylinux1_x86_64.whl(77.93 MB)
cupy_cuda117-10.6.0-cp310-cp310-win_amd64.whl(59.61 MB)
cupy_cuda117-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(76.41 MB)
cupy_cuda117-10.6.0-cp37-cp37m-win_amd64.whl(59.64 MB)
cupy_cuda117-10.6.0-cp38-cp38-manylinux1_x86_64.whl(79.59 MB)
cupy_cuda117-10.6.0-cp38-cp38-win_amd64.whl(59.74 MB)
cupy_cuda117-10.6.0-cp39-cp39-manylinux1_x86_64.whl(77.85 MB)
cupy_cuda117-10.6.0-cp39-cp39-win_amd64.whl(59.73 MB)
cupy_rocm_4_0-10.6.0-cp310-cp310-manylinux1_x86_64.whl(35.25 MB)
cupy_rocm_4_0-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(33.93 MB)
cupy_rocm_4_0-10.6.0-cp38-cp38-manylinux1_x86_64.whl(36.72 MB)
cupy_rocm_4_0-10.6.0-cp39-cp39-manylinux1_x86_64.whl(35.17 MB)
cupy_rocm_4_2-10.6.0-cp310-cp310-manylinux1_x86_64.whl(34.41 MB)
cupy_rocm_4_2-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(33.09 MB)
cupy_rocm_4_2-10.6.0-cp38-cp38-manylinux1_x86_64.whl(35.88 MB)
cupy_rocm_4_2-10.6.0-cp39-cp39-manylinux1_x86_64.whl(34.33 MB)
cupy_rocm_4_3-10.6.0-cp310-cp310-manylinux1_x86_64.whl(35.99 MB)
cupy_rocm_4_3-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(34.68 MB)
cupy_rocm_4_3-10.6.0-cp38-cp38-manylinux1_x86_64.whl(37.46 MB)
cupy_rocm_4_3-10.6.0-cp39-cp39-manylinux1_x86_64.whl(35.91 MB)
cupy_rocm_5_0-10.6.0-cp310-cp310-manylinux1_x86_64.whl(54.06 MB)
cupy_rocm_5_0-10.6.0-cp37-cp37m-manylinux1_x86_64.whl(52.74 MB)
cupy_rocm_5_0-10.6.0-cp38-cp38-manylinux1_x86_64.whl(55.53 MB)
cupy_rocm_5_0-10.6.0-cp39-cp39-manylinux1_x86_64.whl(53.98 MB)
v11.0.0b3(May 26, 2022)
This is the release note of v11.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support cuTensorNet as an einsum backend (#6677) (thanks @leofang!)

A new accelerator for CuPy has been added (CUPY_ACCELERATORS=cutensornet). This feature requires cuquantum-python >= 22.03 and cuTENSOR >= 1.5.0. It is used to accelerate the cupy.linalg.einsum API and to support large array sizes in it.
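A minimal sketch of opting in from Python, assuming the variable is read when CuPy is first imported, so it has to be set beforehand. The helper name `contract` is hypothetical, and the call uses the top-level `cupy.einsum` function; actually running the contraction needs a GPU plus the cuQuantum and cuTENSOR versions listed above.

```python
import os

# Set before CuPy is imported; the value is the one given in this release note.
os.environ["CUPY_ACCELERATORS"] = "cutensornet"


def contract(a, b):
    """Matrix product written as an einsum (illustrative helper)."""
    # Imported lazily so the environment variable is already in place.
    import cupy

    return cupy.einsum("ij,jk->ik", a, b)
```

Setting the variable in the shell before launching Python has the same effect and avoids any dependence on import order.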

Changes without compatibility

Drop Support for ROCm 4.2 (#6734)

CuPy v11 will drop support for ROCm 4.2. We recommend that users move to ROCm 4.3 or 5.0 instead.

Drop Support for NumPy 1.18/1.19 and SciPy 1.4/1.5 (#6735)

As per NEP 29, support for NumPy 1.18/1.19 was dropped in 2021. The supported SciPy versions are those released close to the supported NumPy versions.

Changes

New Features

Support cuTensorNet (from cuQuantum) as an einsum backend (#6677)

Add cupy.poly (#6697)

Support cupy.ndarray subclassing - Part 1 - Direct constructor call (#6716)

Enhancements

Support cuDNN 8.4 (#6641)

Support cuTENSOR 1.5.0 (#6665)

JIT: Use C++14 (#6670)

Support cuTENSOR 1.5.0 (#6722)

Drop support for ROCm 4.2 in CuPy v11 (#6734)

Drop support for NumPy 1.18/1.19 and SciPy 1.4/1.5 in CuPy v11 (#6735)

Fix compilation warning caused by ifdef (#6739)

Performance Improvements

Accelerate bincount, histogram2d, histogramdd with CUB (#6701)

Bug Fixes

Fix memory leak in the FFT plan cache during multi-threading (#6704)

Fix ifdef for ROCm >= 4.2 (#6750)

Code Fixes

JIT: Cosmetic change of Dim3 class (#6644)

Documentation

Fix imports of scatter_add example (#6696)

Minor improvement on the array API docs (#6706)

Document the returned benchmark object (#6712)

Use exposed name in user guide (#6718)

Tests

Xfail a test of LOBPCG on ROCm 5.0+ (#6603)

CI: Update repo for libcudnn7 in cuda10.2 (#6708)

Bump pinned mypy version (#6710)

Follow scipy==1.8.1 sparse dot bugfix (#6727)

Support testing CUDA 11.6+ in FlexCI (#6731)

Fix GPG key issue in FlexCI base image (#6738)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @Dahlia-Chehata @emcastillo @kmaehashi @leofang @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(61.63 MB)
cupy_cuda102-11.0.0b3-cp310-cp310-manylinux2014_aarch64.whl(35.46 MB)
cupy_cuda102-11.0.0b3-cp310-cp310-win_amd64.whl(43.27 MB)
cupy_cuda102-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(60.07 MB)
cupy_cuda102-11.0.0b3-cp37-cp37m-manylinux2014_aarch64.whl(33.73 MB)
cupy_cuda102-11.0.0b3-cp37-cp37m-win_amd64.whl(43.31 MB)
cupy_cuda102-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(63.31 MB)
cupy_cuda102-11.0.0b3-cp38-cp38-manylinux2014_aarch64.whl(36.93 MB)
cupy_cuda102-11.0.0b3-cp38-cp38-win_amd64.whl(43.40 MB)
cupy_cuda102-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(61.55 MB)
cupy_cuda102-11.0.0b3-cp39-cp39-manylinux2014_aarch64.whl(35.40 MB)
cupy_cuda102-11.0.0b3-cp39-cp39-win_amd64.whl(43.40 MB)
cupy_cuda110-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(76.46 MB)
cupy_cuda110-11.0.0b3-cp310-cp310-win_amd64.whl(58.05 MB)
cupy_cuda110-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(74.90 MB)
cupy_cuda110-11.0.0b3-cp37-cp37m-win_amd64.whl(58.09 MB)
cupy_cuda110-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(78.14 MB)
cupy_cuda110-11.0.0b3-cp38-cp38-win_amd64.whl(58.19 MB)
cupy_cuda110-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(76.38 MB)
cupy_cuda110-11.0.0b3-cp39-cp39-win_amd64.whl(58.18 MB)
cupy_cuda111-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(95.64 MB)
cupy_cuda111-11.0.0b3-cp310-cp310-win_amd64.whl(78.21 MB)
cupy_cuda111-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(94.08 MB)
cupy_cuda111-11.0.0b3-cp37-cp37m-win_amd64.whl(78.25 MB)
cupy_cuda111-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(97.32 MB)
cupy_cuda111-11.0.0b3-cp38-cp38-win_amd64.whl(78.34 MB)
cupy_cuda111-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(95.56 MB)
cupy_cuda111-11.0.0b3-cp39-cp39-win_amd64.whl(78.34 MB)
cupy_cuda112-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(76.88 MB)
cupy_cuda112-11.0.0b3-cp310-cp310-win_amd64.whl(58.55 MB)
cupy_cuda112-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(75.31 MB)
cupy_cuda112-11.0.0b3-cp37-cp37m-win_amd64.whl(58.59 MB)
cupy_cuda112-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(78.57 MB)
cupy_cuda112-11.0.0b3-cp38-cp38-win_amd64.whl(58.68 MB)
cupy_cuda112-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(76.80 MB)
cupy_cuda112-11.0.0b3-cp39-cp39-win_amd64.whl(58.68 MB)
cupy_cuda113-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(74.11 MB)
cupy_cuda113-11.0.0b3-cp310-cp310-win_amd64.whl(55.34 MB)
cupy_cuda113-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(72.55 MB)
cupy_cuda113-11.0.0b3-cp37-cp37m-win_amd64.whl(55.38 MB)
cupy_cuda113-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(75.80 MB)
cupy_cuda113-11.0.0b3-cp38-cp38-win_amd64.whl(55.47 MB)
cupy_cuda113-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(74.04 MB)
cupy_cuda113-11.0.0b3-cp39-cp39-win_amd64.whl(55.47 MB)
cupy_cuda114-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(82.39 MB)
cupy_cuda114-11.0.0b3-cp310-cp310-win_amd64.whl(63.83 MB)
cupy_cuda114-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(80.82 MB)
cupy_cuda114-11.0.0b3-cp37-cp37m-win_amd64.whl(63.87 MB)
cupy_cuda114-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(84.07 MB)
cupy_cuda114-11.0.0b3-cp38-cp38-win_amd64.whl(63.96 MB)
cupy_cuda114-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(82.31 MB)
cupy_cuda114-11.0.0b3-cp39-cp39-win_amd64.whl(63.96 MB)
cupy_cuda115-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(79.48 MB)
cupy_cuda115-11.0.0b3-cp310-cp310-win_amd64.whl(60.88 MB)
cupy_cuda115-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(77.91 MB)
cupy_cuda115-11.0.0b3-cp37-cp37m-win_amd64.whl(60.92 MB)
cupy_cuda115-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(81.16 MB)
cupy_cuda115-11.0.0b3-cp38-cp38-win_amd64.whl(61.01 MB)
cupy_cuda115-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(79.40 MB)
cupy_cuda115-11.0.0b3-cp39-cp39-win_amd64.whl(61.00 MB)
cupy_cuda116-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(79.51 MB)
cupy_cuda116-11.0.0b3-cp310-cp310-win_amd64.whl(60.90 MB)
cupy_cuda116-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(77.95 MB)
cupy_cuda116-11.0.0b3-cp37-cp37m-win_amd64.whl(60.93 MB)
cupy_cuda116-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(81.20 MB)
cupy_cuda116-11.0.0b3-cp38-cp38-win_amd64.whl(61.03 MB)
cupy_cuda116-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(79.44 MB)
cupy_cuda116-11.0.0b3-cp39-cp39-win_amd64.whl(61.02 MB)
cupy_rocm_4_3-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(36.33 MB)
cupy_rocm_4_3-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(34.99 MB)
cupy_rocm_4_3-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(37.83 MB)
cupy_rocm_4_3-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(36.26 MB)
cupy_rocm_5_0-11.0.0b3-cp310-cp310-manylinux1_x86_64.whl(54.40 MB)
cupy_rocm_5_0-11.0.0b3-cp37-cp37m-manylinux1_x86_64.whl(53.06 MB)
cupy_rocm_5_0-11.0.0b3-cp38-cp38-manylinux1_x86_64.whl(55.90 MB)
cupy_rocm_5_0-11.0.0b3-cp39-cp39-manylinux1_x86_64.whl(54.33 MB)
v10.5.0 (May 26, 2022)
This is the release note of v10.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Update (2022-06-17): Wheels for CUDA 11.5 Arm SBSA are now available in the Assets section below. (#6705)

Changes

Enhancements

Fix compilation warning caused by ifdef (#6740)

Support cuDNN 8.4 (#6741)

Bug Fixes

Fix memory leak in the FFT plan cache during multi-threading (#6732)

Fix ifdef for ROCm >= 4.2 (#6751)

Documentation

Minor improvement on the array API docs (#6714)

Document the returned benchmark object (#6742)

Tests

CI: Update repo for libcudnn7 in cuda10.2 (#6709)

Pin mypy version in setup.py (#6711)

Follow scipy==1.8.1 sparse dot bugfix (#6728)

Support testing CUDA 11.6+ in FlexCI (#6737)

Fix GPG key issue in FlexCI base image (#6743)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @kmaehashi @leofang @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.5.0-cp310-cp310-manylinux1_x86_64.whl(60.30 MB)
cupy_cuda102-10.5.0-cp310-cp310-manylinux2014_aarch64.whl(34.55 MB)
cupy_cuda102-10.5.0-cp310-cp310-win_amd64.whl(42.29 MB)
cupy_cuda102-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(58.79 MB)
cupy_cuda102-10.5.0-cp37-cp37m-manylinux2014_aarch64.whl(32.86 MB)
cupy_cuda102-10.5.0-cp37-cp37m-win_amd64.whl(42.33 MB)
cupy_cuda102-10.5.0-cp38-cp38-manylinux1_x86_64.whl(61.93 MB)
cupy_cuda102-10.5.0-cp38-cp38-manylinux2014_aarch64.whl(35.95 MB)
cupy_cuda102-10.5.0-cp38-cp38-win_amd64.whl(42.42 MB)
cupy_cuda102-10.5.0-cp39-cp39-manylinux1_x86_64.whl(60.22 MB)
cupy_cuda102-10.5.0-cp39-cp39-manylinux2014_aarch64.whl(34.49 MB)
cupy_cuda102-10.5.0-cp39-cp39-win_amd64.whl(42.41 MB)
cupy_cuda110-10.5.0-cp310-cp310-manylinux1_x86_64.whl(74.92 MB)
cupy_cuda110-10.5.0-cp310-cp310-win_amd64.whl(56.87 MB)
cupy_cuda110-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(73.41 MB)
cupy_cuda110-10.5.0-cp37-cp37m-win_amd64.whl(56.91 MB)
cupy_cuda110-10.5.0-cp38-cp38-manylinux1_x86_64.whl(76.56 MB)
cupy_cuda110-10.5.0-cp38-cp38-win_amd64.whl(57.00 MB)
cupy_cuda110-10.5.0-cp39-cp39-manylinux1_x86_64.whl(74.84 MB)
cupy_cuda110-10.5.0-cp39-cp39-win_amd64.whl(56.99 MB)
cupy_cuda111-10.5.0-cp310-cp310-manylinux1_x86_64.whl(93.71 MB)
cupy_cuda111-10.5.0-cp310-cp310-win_amd64.whl(76.62 MB)
cupy_cuda111-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(92.20 MB)
cupy_cuda111-10.5.0-cp37-cp37m-win_amd64.whl(76.65 MB)
cupy_cuda111-10.5.0-cp38-cp38-manylinux1_x86_64.whl(95.34 MB)
cupy_cuda111-10.5.0-cp38-cp38-win_amd64.whl(76.74 MB)
cupy_cuda111-10.5.0-cp39-cp39-manylinux1_x86_64.whl(93.63 MB)
cupy_cuda111-10.5.0-cp39-cp39-win_amd64.whl(76.73 MB)
cupy_cuda112-10.5.0-cp310-cp310-manylinux1_x86_64.whl(75.34 MB)
cupy_cuda112-10.5.0-cp310-cp310-win_amd64.whl(57.36 MB)
cupy_cuda112-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(73.82 MB)
cupy_cuda112-10.5.0-cp37-cp37m-win_amd64.whl(57.39 MB)
cupy_cuda112-10.5.0-cp38-cp38-manylinux1_x86_64.whl(76.98 MB)
cupy_cuda112-10.5.0-cp38-cp38-win_amd64.whl(57.49 MB)
cupy_cuda112-10.5.0-cp39-cp39-manylinux1_x86_64.whl(75.26 MB)
cupy_cuda112-10.5.0-cp39-cp39-win_amd64.whl(57.48 MB)
cupy_cuda113-10.5.0-cp310-cp310-manylinux1_x86_64.whl(72.51 MB)
cupy_cuda113-10.5.0-cp310-cp310-win_amd64.whl(54.09 MB)
cupy_cuda113-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(71.00 MB)
cupy_cuda113-10.5.0-cp37-cp37m-win_amd64.whl(54.13 MB)
cupy_cuda113-10.5.0-cp38-cp38-manylinux1_x86_64.whl(74.15 MB)
cupy_cuda113-10.5.0-cp38-cp38-win_amd64.whl(54.22 MB)
cupy_cuda113-10.5.0-cp39-cp39-manylinux1_x86_64.whl(72.44 MB)
cupy_cuda113-10.5.0-cp39-cp39-win_amd64.whl(54.21 MB)
cupy_cuda114-10.5.0-cp310-cp310-manylinux1_x86_64.whl(80.99 MB)
cupy_cuda114-10.5.0-cp310-cp310-win_amd64.whl(62.78 MB)
cupy_cuda114-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(79.48 MB)
cupy_cuda114-10.5.0-cp37-cp37m-win_amd64.whl(62.82 MB)
cupy_cuda114-10.5.0-cp38-cp38-manylinux1_x86_64.whl(82.63 MB)
cupy_cuda114-10.5.0-cp38-cp38-win_amd64.whl(62.91 MB)
cupy_cuda114-10.5.0-cp39-cp39-manylinux1_x86_64.whl(80.91 MB)
cupy_cuda114-10.5.0-cp39-cp39-win_amd64.whl(62.91 MB)
cupy_cuda115-10.5.0-cp310-cp310-manylinux1_x86_64.whl(77.71 MB)
cupy_cuda115-10.5.0-cp310-cp310-manylinux2014_aarch64.whl(80.03 MB)
cupy_cuda115-10.5.0-cp310-cp310-win_amd64.whl(59.46 MB)
cupy_cuda115-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(76.20 MB)
cupy_cuda115-10.5.0-cp37-cp37m-manylinux2014_aarch64.whl(77.66 MB)
cupy_cuda115-10.5.0-cp37-cp37m-win_amd64.whl(59.50 MB)
cupy_cuda115-10.5.0-cp38-cp38-manylinux1_x86_64.whl(79.35 MB)
cupy_cuda115-10.5.0-cp38-cp38-manylinux2014_aarch64.whl(81.61 MB)
cupy_cuda115-10.5.0-cp38-cp38-win_amd64.whl(59.59 MB)
cupy_cuda115-10.5.0-cp39-cp39-manylinux1_x86_64.whl(77.63 MB)
cupy_cuda115-10.5.0-cp39-cp39-manylinux2014_aarch64.whl(79.95 MB)
cupy_cuda115-10.5.0-cp39-cp39-win_amd64.whl(59.59 MB)
cupy_cuda116-10.5.0-cp310-cp310-manylinux1_x86_64.whl(77.75 MB)
cupy_cuda116-10.5.0-cp310-cp310-win_amd64.whl(59.49 MB)
cupy_cuda116-10.5.0-cp37-cp37m-manylinux1_x86_64.whl(76.23 MB)
cupy_cuda116-10.5.0-cp37-cp37m-win_amd64.whl(59.52 MB)
cupy_cuda116-10.5.0-cp38-cp38-manylinux1_x86_64.whl(79.39 MB)
cupy_cuda116-10.5.0-cp38-cp38-win_amd64.whl(59.61 MB)
cupy_cuda116-10.5.0-cp39-cp39-manylinux1_x86_64.whl(77.67 MB)
cupy_cuda116-10.5.0-cp39-cp39-win_amd64.whl(59.61 MB)
v11.0.0b2 (Apr 27, 2022)
This is the release note of v11.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

JIT Improvements (#6620, #6640, #6649, #6668)

CuPy JIT has been further enhanced thanks to @leofang and @eternalphane! It is now possible to use CUDA cooperative groups and access .shape and .strides attributes of ndarrays.

import cupy
from cupyx import jit

@jit.rawkernel()
def kernel(x, y):
    size = x.shape[0]
    ntid = jit.gridDim.x * jit.blockDim.x
    tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
    for i in range(tid, size, ntid):
        y[i] = x[i]
    g = jit.cg.this_thread_block()
    g.sync()

x = cupy.arange(200, dtype=cupy.int64)
y = cupy.zeros((200,), dtype=cupy.int64)
kernel[2, 32](x, y)
print(kernel.cached_code)

The above program emits the CUDA code as follows:

#include <cooperative_groups.h>

namespace cg = cooperative_groups;

extern "C" __global__ void kernel(CArray<long long, 1, true, true> x, CArray<long long, 1, true, true> y) {
  ptrdiff_t i;
  ptrdiff_t size = thrust::get<0>(x.get_shape());
  unsigned int ntid = (gridDim.x * blockDim.x);
  unsigned int tid = ((blockIdx.x * blockDim.x) + threadIdx.x);
  for (ptrdiff_t __it = tid, __stop = size, __step = ntid; __it < __stop; __it += __step) {
    i = __it;
    y[i] = x[i];
  }
  cg::thread_block g = cg::this_thread_block();
  g.sync();
}
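The loop in the kernel above is a grid-stride loop: each thread starts at its global index and advances by the total number of launched threads. The following pure-Python sketch emulates that indexing on the CPU for the same launch configuration (2 blocks of 32 threads, 200 elements). The function name `simulate_grid_stride_copy` is hypothetical and is not part of CuPy; the sketch only illustrates the index arithmetic, not how the JIT executes the kernel.

```python
def simulate_grid_stride_copy(x, grid_dim, block_dim):
    """Emulate the kernel's indexing sequentially on the CPU (illustration only)."""
    size = len(x)
    y = [0] * size
    visits = [0] * size              # how many times each element is written
    ntid = grid_dim * block_dim      # total number of threads in the launch
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            tid = block_idx * block_dim + thread_idx
            # Same range as in the kernel: start at tid, step by ntid.
            for i in range(tid, size, ntid):
                y[i] = x[i]
                visits[i] += 1
    return y, visits


x = list(range(200))
y, visits = simulate_grid_stride_copy(x, grid_dim=2, block_dim=32)
print(y == x, set(visits))
```

With 64 threads and 200 elements, each thread handles three or four elements, and every element is written exactly once even though the array is larger than the launch.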

Initial MPI and sparse matrix support in cupyx.distributed (#6628, #6658)

CuPy v10 added the cupyx.distributed API to perform interprocess communication using NCCL in a way similar to MPI. In CuPy v11 we are extending this API to support sparse matrices as defined in cupyx.scipy.sparse. Currently only send/recv primitives are supported but we will be adding support for collective calls in the following releases.

Additionally, it is now possible to use MPI (through the mpi4py Python package) to initialize the NCCL communicator. This avoids launching the TCP server that is otherwise used to exchange CPU-side values. Moreover, we recommend enabling MPI when communicating sparse matrices, because each communication call must exchange metadata, which leads to device synchronization if MPI is not enabled.

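# run with mpiexec -n N python …
from mpi4py import MPI  # the MPI submodule must be imported explicitly
import cupyx.distributed

mpi_comm = MPI.COMM_WORLD
workers = mpi_comm.Get_size()  # total number of processes
rank = mpi_comm.Get_rank()     # index of this process
comm = cupyx.distributed.init_process_group(workers, rank, use_mpi=True)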

Announcements

Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called cupy-wheel. This meta package allows other libraries to add a dependency on CuPy with the ability to transparently install the exact CuPy binary wheel matching the user environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release as the current pre-release wheels are not hosted in PyPI.

This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions and comments are very welcome (please visit #6688).
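For context, the version-specific wheels that cupy-wheel selects between follow a simple naming pattern, such as cupy-cuda116 for CUDA 11.6 and cupy-rocm-5-0 for ROCm 5.0. The sketch below shows only that naming pattern. The helper `wheel_package_name` is hypothetical, and it is not how cupy-wheel detects the environment or picks a wheel.

```python
def wheel_package_name(platform, version):
    """Build a version-specific package name such as 'cupy-cuda116'.

    Hypothetical helper that mirrors the naming used in these release notes.
    """
    major, minor = version.split(".")
    if platform == "cuda":
        return f"cupy-cuda{major}{minor}"
    if platform == "rocm":
        return f"cupy-rocm-{major}-{minor}"
    raise ValueError(f"unknown platform: {platform!r}")


for platform, version in [("cuda", "10.2"), ("cuda", "11.6"), ("rocm", "5.0")]:
    print("pip install", wheel_package_name(platform, version))
```

Installing cupy-wheel is meant to remove the need to work out this name by hand.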

Changes

New Features

Support cooperative group in JIT compiler (#6620)

Add support for sparse matrices in cupyx.distributed (#6628)

JIT: Support compile-time for-loop unrolling (#6649)

JIT: Support .shape and .strides (#6668)

Enhancements

Add a few driver/runtime/nvrtc API wrappers (#6604)

Implement flatten(order) (#6613)

Implemented a __repr__ for cupyx.profiler._time._PerfCaseResult (#6617)

JIT: Avoid calling default constructor if possible (#6619)

Add missing cudaDevAttrMemoryPoolsSupported to hip (#6621)

Add CC 3.2 to Tegra arch list (#6631)

JIT: Add more cooperative group APIs (#6640)

JIT: Add kernel.cached_code test (#6643)

Use MPI for management in cupyx.distributed (#6658)

Improve warning message in sparse (#6669)

Performance Improvements

Improve copy and assign operation (#6181)

Performance improvement of cupy.intersect1d (#6586)

Bug Fixes

Define float16::operator-() only for ROCm 5.0+ (#6624)

JIT: fix access to cached codes (#6639)

Fix cuda python CI (#6652)

Fix int64 overflow in cupy.polyval (#6664)

JIT: Disable memcpy_async on CUDA 11.0 (#6671)

Documentation

Add --pre option to instructions installing pre-releases (#6612)

JIT: fix function signatures in the docs (#6648)

Fix typo in performance guide (#6657)

Installation

Add universal CuPy package (#6012)

Tests

Run daily benchmark with head branch against latest release (#6598)

CI: Trigger FlexCI for hotfix branches (#6625)

Remove jenkins requirements (#6632)

Fix TestIncludesCompileCUDA for HEAD tests (#6646)

Trigger CUDA Python tests with /test mini (#6653)

Fix missing f prefix on f-strings fix (#6674)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @code-review-doctor @danielg1111 @davidegavio @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(60.58 MB)
cupy_cuda102-11.0.0b2-cp310-cp310-manylinux2014_aarch64.whl(34.84 MB)
cupy_cuda102-11.0.0b2-cp310-cp310-win_amd64.whl(42.50 MB)
cupy_cuda102-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(59.05 MB)
cupy_cuda102-11.0.0b2-cp37-cp37m-manylinux2014_aarch64.whl(33.14 MB)
cupy_cuda102-11.0.0b2-cp37-cp37m-win_amd64.whl(42.41 MB)
cupy_cuda102-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(62.24 MB)
cupy_cuda102-11.0.0b2-cp38-cp38-manylinux2014_aarch64.whl(36.28 MB)
cupy_cuda102-11.0.0b2-cp38-cp38-win_amd64.whl(42.51 MB)
cupy_cuda102-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(60.51 MB)
cupy_cuda102-11.0.0b2-cp39-cp39-manylinux2014_aarch64.whl(34.78 MB)
cupy_cuda102-11.0.0b2-cp39-cp39-win_amd64.whl(42.50 MB)
cupy_cuda110-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(75.21 MB)
cupy_cuda110-11.0.0b2-cp310-cp310-win_amd64.whl(57.08 MB)
cupy_cuda110-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(73.67 MB)
cupy_cuda110-11.0.0b2-cp37-cp37m-win_amd64.whl(56.99 MB)
cupy_cuda110-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(76.86 MB)
cupy_cuda110-11.0.0b2-cp38-cp38-win_amd64.whl(57.09 MB)
cupy_cuda110-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(75.13 MB)
cupy_cuda110-11.0.0b2-cp39-cp39-win_amd64.whl(57.08 MB)
cupy_cuda111-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(94.00 MB)
cupy_cuda111-11.0.0b2-cp310-cp310-win_amd64.whl(76.83 MB)
cupy_cuda111-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(92.46 MB)
cupy_cuda111-11.0.0b2-cp37-cp37m-win_amd64.whl(76.73 MB)
cupy_cuda111-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(95.65 MB)
cupy_cuda111-11.0.0b2-cp38-cp38-win_amd64.whl(76.83 MB)
cupy_cuda111-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(93.92 MB)
cupy_cuda111-11.0.0b2-cp39-cp39-win_amd64.whl(76.82 MB)
cupy_cuda112-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(75.62 MB)
cupy_cuda112-11.0.0b2-cp310-cp310-win_amd64.whl(57.57 MB)
cupy_cuda112-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(74.09 MB)
cupy_cuda112-11.0.0b2-cp37-cp37m-win_amd64.whl(57.48 MB)
cupy_cuda112-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(77.28 MB)
cupy_cuda112-11.0.0b2-cp38-cp38-win_amd64.whl(57.58 MB)
cupy_cuda112-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(75.55 MB)
cupy_cuda112-11.0.0b2-cp39-cp39-win_amd64.whl(57.57 MB)
cupy_cuda113-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(72.80 MB)
cupy_cuda113-11.0.0b2-cp310-cp310-win_amd64.whl(54.30 MB)
cupy_cuda113-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(71.27 MB)
cupy_cuda113-11.0.0b2-cp37-cp37m-win_amd64.whl(54.21 MB)
cupy_cuda113-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(74.46 MB)
cupy_cuda113-11.0.0b2-cp38-cp38-win_amd64.whl(54.31 MB)
cupy_cuda113-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(72.72 MB)
cupy_cuda113-11.0.0b2-cp39-cp39-win_amd64.whl(54.30 MB)
cupy_cuda114-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(81.28 MB)
cupy_cuda114-11.0.0b2-cp310-cp310-win_amd64.whl(63.00 MB)
cupy_cuda114-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(79.74 MB)
cupy_cuda114-11.0.0b2-cp37-cp37m-win_amd64.whl(62.90 MB)
cupy_cuda114-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(82.94 MB)
cupy_cuda114-11.0.0b2-cp38-cp38-win_amd64.whl(63.00 MB)
cupy_cuda114-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(81.20 MB)
cupy_cuda114-11.0.0b2-cp39-cp39-win_amd64.whl(63.00 MB)
cupy_cuda115-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(77.99 MB)
cupy_cuda115-11.0.0b2-cp310-cp310-win_amd64.whl(59.68 MB)
cupy_cuda115-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(76.46 MB)
cupy_cuda115-11.0.0b2-cp37-cp37m-win_amd64.whl(59.58 MB)
cupy_cuda115-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(79.65 MB)
cupy_cuda115-11.0.0b2-cp38-cp38-win_amd64.whl(59.68 MB)
cupy_cuda115-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(77.92 MB)
cupy_cuda115-11.0.0b2-cp39-cp39-win_amd64.whl(59.68 MB)
cupy_cuda116-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(78.04 MB)
cupy_cuda116-11.0.0b2-cp310-cp310-win_amd64.whl(59.70 MB)
cupy_cuda116-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(76.50 MB)
cupy_cuda116-11.0.0b2-cp37-cp37m-win_amd64.whl(59.61 MB)
cupy_cuda116-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(79.70 MB)
cupy_cuda116-11.0.0b2-cp38-cp38-win_amd64.whl(59.70 MB)
cupy_cuda116-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(77.96 MB)
cupy_cuda116-11.0.0b2-cp39-cp39-win_amd64.whl(59.70 MB)
cupy_rocm_4_2-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(34.63 MB)
cupy_rocm_4_2-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(33.30 MB)
cupy_rocm_4_2-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(36.11 MB)
cupy_rocm_4_2-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(34.56 MB)
cupy_rocm_4_3-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(36.22 MB)
cupy_rocm_4_3-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(34.89 MB)
cupy_rocm_4_3-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(37.69 MB)
cupy_rocm_4_3-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(36.14 MB)
cupy_rocm_5_0-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl(54.29 MB)
cupy_rocm_5_0-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(52.96 MB)
cupy_rocm_5_0-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl(55.76 MB)
cupy_rocm_5_0-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl(54.21 MB)
v10.4.0 (Apr 27, 2022)
This is the release note of v10.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called cupy-wheel. This meta package allows other libraries to add a dependency on CuPy with the ability to transparently install the exact CuPy binary wheel matching the user environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release as the current pre-release wheels are not hosted in PyPI.

This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions and comments are very welcome (please visit #6688).

Changes

Enhancements

Add missing cudaDevAttrMemoryPoolsSupported to hip (#6626)

Add CC 3.2 to Tegra arch list (#6647)

Add a few driver/runtime/nvrtc API wrappers (#6651)

Bug Fixes

Define float16::operator-() only for ROCm 5.0+ (#6629)

JIT: fix access to cached codes (#6642)

[v10] Fix Mempool attr for Cuda Python (#6654)

Fix int64 overflow in cupy.polyval (#6666)

Documentation

Documentation update for ROCm 5.0 (#6607)

Add --pre option to instructions installing pre-releases (#6614)

Fix typo in performance guide (#6659)

JIT: fix function signatures in the docs (#6660)

Installation

Add universal CuPy package (#6683)

Tests

Remove jenkins requirements (#6634)

CI: Trigger FlexCI for hotfix branches (#6636)

Fix TestIncludesCompileCUDA for HEAD tests (#6650)

Trigger CUDA Python tests with /test mini (#6655)

Fix missing f prefix on f-strings fix (#6679)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @code-review-doctor @danielg1111 @emcastillo @kmaehashi @leofang @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.4.0-cp310-cp310-manylinux1_x86_64.whl(60.29 MB)
cupy_cuda102-10.4.0-cp310-cp310-manylinux2014_aarch64.whl(34.55 MB)
cupy_cuda102-10.4.0-cp310-cp310-win_amd64.whl(42.39 MB)
cupy_cuda102-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(58.78 MB)
cupy_cuda102-10.4.0-cp37-cp37m-manylinux2014_aarch64.whl(32.86 MB)
cupy_cuda102-10.4.0-cp37-cp37m-win_amd64.whl(42.30 MB)
cupy_cuda102-10.4.0-cp38-cp38-manylinux1_x86_64.whl(61.93 MB)
cupy_cuda102-10.4.0-cp38-cp38-manylinux2014_aarch64.whl(35.95 MB)
cupy_cuda102-10.4.0-cp38-cp38-win_amd64.whl(42.39 MB)
cupy_cuda102-10.4.0-cp39-cp39-manylinux1_x86_64.whl(60.21 MB)
cupy_cuda102-10.4.0-cp39-cp39-manylinux2014_aarch64.whl(34.48 MB)
cupy_cuda102-10.4.0-cp39-cp39-win_amd64.whl(42.39 MB)
cupy_cuda110-10.4.0-cp310-cp310-manylinux1_x86_64.whl(74.92 MB)
cupy_cuda110-10.4.0-cp310-cp310-win_amd64.whl(56.97 MB)
cupy_cuda110-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(73.40 MB)
cupy_cuda110-10.4.0-cp37-cp37m-win_amd64.whl(56.88 MB)
cupy_cuda110-10.4.0-cp38-cp38-manylinux1_x86_64.whl(76.55 MB)
cupy_cuda110-10.4.0-cp38-cp38-win_amd64.whl(56.98 MB)
cupy_cuda110-10.4.0-cp39-cp39-manylinux1_x86_64.whl(74.84 MB)
cupy_cuda110-10.4.0-cp39-cp39-win_amd64.whl(56.97 MB)
cupy_cuda111-10.4.0-cp310-cp310-manylinux1_x86_64.whl(93.70 MB)
cupy_cuda111-10.4.0-cp310-cp310-win_amd64.whl(76.71 MB)
cupy_cuda111-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(92.19 MB)
cupy_cuda111-10.4.0-cp37-cp37m-win_amd64.whl(76.62 MB)
cupy_cuda111-10.4.0-cp38-cp38-manylinux1_x86_64.whl(95.34 MB)
cupy_cuda111-10.4.0-cp38-cp38-win_amd64.whl(76.72 MB)
cupy_cuda111-10.4.0-cp39-cp39-manylinux1_x86_64.whl(93.62 MB)
cupy_cuda111-10.4.0-cp39-cp39-win_amd64.whl(76.71 MB)
cupy_cuda112-10.4.0-cp310-cp310-manylinux1_x86_64.whl(75.33 MB)
cupy_cuda112-10.4.0-cp310-cp310-win_amd64.whl(57.46 MB)
cupy_cuda112-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(73.81 MB)
cupy_cuda112-10.4.0-cp37-cp37m-win_amd64.whl(57.37 MB)
cupy_cuda112-10.4.0-cp38-cp38-manylinux1_x86_64.whl(76.97 MB)
cupy_cuda112-10.4.0-cp38-cp38-win_amd64.whl(57.46 MB)
cupy_cuda112-10.4.0-cp39-cp39-manylinux1_x86_64.whl(75.25 MB)
cupy_cuda112-10.4.0-cp39-cp39-win_amd64.whl(57.46 MB)
cupy_cuda113-10.4.0-cp310-cp310-manylinux1_x86_64.whl(72.51 MB)
cupy_cuda113-10.4.0-cp310-cp310-win_amd64.whl(54.19 MB)
cupy_cuda113-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(71.00 MB)
cupy_cuda113-10.4.0-cp37-cp37m-win_amd64.whl(54.10 MB)
cupy_cuda113-10.4.0-cp38-cp38-manylinux1_x86_64.whl(74.15 MB)
cupy_cuda113-10.4.0-cp38-cp38-win_amd64.whl(54.19 MB)
cupy_cuda113-10.4.0-cp39-cp39-manylinux1_x86_64.whl(72.43 MB)
cupy_cuda113-10.4.0-cp39-cp39-win_amd64.whl(54.19 MB)
cupy_cuda114-10.4.0-cp310-cp310-manylinux1_x86_64.whl(80.99 MB)
cupy_cuda114-10.4.0-cp310-cp310-win_amd64.whl(62.88 MB)
cupy_cuda114-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(79.47 MB)
cupy_cuda114-10.4.0-cp37-cp37m-win_amd64.whl(62.79 MB)
cupy_cuda114-10.4.0-cp38-cp38-manylinux1_x86_64.whl(82.62 MB)
cupy_cuda114-10.4.0-cp38-cp38-win_amd64.whl(62.89 MB)
cupy_cuda114-10.4.0-cp39-cp39-manylinux1_x86_64.whl(80.91 MB)
cupy_cuda114-10.4.0-cp39-cp39-win_amd64.whl(62.88 MB)
cupy_cuda115-10.4.0-cp310-cp310-manylinux1_x86_64.whl(77.70 MB)
cupy_cuda115-10.4.0-cp310-cp310-win_amd64.whl(59.56 MB)
cupy_cuda115-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(76.19 MB)
cupy_cuda115-10.4.0-cp37-cp37m-win_amd64.whl(59.47 MB)
cupy_cuda115-10.4.0-cp38-cp38-manylinux1_x86_64.whl(79.34 MB)
cupy_cuda115-10.4.0-cp38-cp38-win_amd64.whl(59.57 MB)
cupy_cuda115-10.4.0-cp39-cp39-manylinux1_x86_64.whl(77.62 MB)
cupy_cuda115-10.4.0-cp39-cp39-win_amd64.whl(59.56 MB)
cupy_cuda116-10.4.0-cp310-cp310-manylinux1_x86_64.whl(77.75 MB)
cupy_cuda116-10.4.0-cp310-cp310-win_amd64.whl(59.59 MB)
cupy_cuda116-10.4.0-cp37-cp37m-manylinux1_x86_64.whl(76.22 MB)
cupy_cuda116-10.4.0-cp37-cp37m-win_amd64.whl(59.50 MB)
cupy_cuda116-10.4.0-cp38-cp38-manylinux1_x86_64.whl(79.38 MB)
cupy_cuda116-10.4.0-cp38-cp38-win_amd64.whl(59.59 MB)
cupy_cuda116-10.4.0-cp39-cp39-manylinux1_x86_64.whl(77.67 MB)
cupy_cuda116-10.4.0-cp39-cp39-win_amd64.whl(59.59 MB)
v10.3.1 (Apr 8, 2022)
This is the release note of v10.3.1. See here for the complete list of solved issues and merged PRs.

This is a hot-fix release for v10.3.0 which contained a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).

Changes

Bug Fixes

Define float16::operator-() only for ROCm 5.0+ (#6630)

Installation

Bump version to v10.3.1 (#6633)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@kmaehashi @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.3.1-cp310-cp310-manylinux1_x86_64.whl(60.26 MB)
cupy_cuda102-10.3.1-cp310-cp310-manylinux2014_aarch64.whl(34.52 MB)
cupy_cuda102-10.3.1-cp310-cp310-win_amd64.whl(42.38 MB)
cupy_cuda102-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(58.75 MB)
cupy_cuda102-10.3.1-cp37-cp37m-manylinux2014_aarch64.whl(32.83 MB)
cupy_cuda102-10.3.1-cp37-cp37m-win_amd64.whl(42.29 MB)
cupy_cuda102-10.3.1-cp38-cp38-manylinux1_x86_64.whl(61.89 MB)
cupy_cuda102-10.3.1-cp38-cp38-manylinux2014_aarch64.whl(35.92 MB)
cupy_cuda102-10.3.1-cp38-cp38-win_amd64.whl(42.39 MB)
cupy_cuda102-10.3.1-cp39-cp39-manylinux1_x86_64.whl(60.18 MB)
cupy_cuda102-10.3.1-cp39-cp39-manylinux2014_aarch64.whl(34.45 MB)
cupy_cuda102-10.3.1-cp39-cp39-win_amd64.whl(42.38 MB)
cupy_cuda110-10.3.1-cp310-cp310-manylinux1_x86_64.whl(74.88 MB)
cupy_cuda110-10.3.1-cp310-cp310-win_amd64.whl(56.97 MB)
cupy_cuda110-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(73.37 MB)
cupy_cuda110-10.3.1-cp37-cp37m-win_amd64.whl(56.87 MB)
cupy_cuda110-10.3.1-cp38-cp38-manylinux1_x86_64.whl(76.52 MB)
cupy_cuda110-10.3.1-cp38-cp38-win_amd64.whl(56.97 MB)
cupy_cuda110-10.3.1-cp39-cp39-manylinux1_x86_64.whl(74.80 MB)
cupy_cuda110-10.3.1-cp39-cp39-win_amd64.whl(56.96 MB)
cupy_cuda111-10.3.1-cp310-cp310-manylinux1_x86_64.whl(93.67 MB)
cupy_cuda111-10.3.1-cp310-cp310-win_amd64.whl(76.71 MB)
cupy_cuda111-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(92.16 MB)
cupy_cuda111-10.3.1-cp37-cp37m-win_amd64.whl(76.61 MB)
cupy_cuda111-10.3.1-cp38-cp38-manylinux1_x86_64.whl(95.31 MB)
cupy_cuda111-10.3.1-cp38-cp38-win_amd64.whl(76.71 MB)
cupy_cuda111-10.3.1-cp39-cp39-manylinux1_x86_64.whl(93.59 MB)
cupy_cuda111-10.3.1-cp39-cp39-win_amd64.whl(76.70 MB)
cupy_cuda112-10.3.1-cp310-cp310-manylinux1_x86_64.whl(75.30 MB)
cupy_cuda112-10.3.1-cp310-cp310-win_amd64.whl(57.45 MB)
cupy_cuda112-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(73.78 MB)
cupy_cuda112-10.3.1-cp37-cp37m-win_amd64.whl(57.36 MB)
cupy_cuda112-10.3.1-cp38-cp38-manylinux1_x86_64.whl(76.93 MB)
cupy_cuda112-10.3.1-cp38-cp38-win_amd64.whl(57.46 MB)
cupy_cuda112-10.3.1-cp39-cp39-manylinux1_x86_64.whl(75.21 MB)
cupy_cuda112-10.3.1-cp39-cp39-win_amd64.whl(57.45 MB)
cupy_cuda113-10.3.1-cp310-cp310-manylinux1_x86_64.whl(72.47 MB)
cupy_cuda113-10.3.1-cp310-cp310-win_amd64.whl(54.18 MB)
cupy_cuda113-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(70.97 MB)
cupy_cuda113-10.3.1-cp37-cp37m-win_amd64.whl(54.09 MB)
cupy_cuda113-10.3.1-cp38-cp38-manylinux1_x86_64.whl(74.11 MB)
cupy_cuda113-10.3.1-cp38-cp38-win_amd64.whl(54.19 MB)
cupy_cuda113-10.3.1-cp39-cp39-manylinux1_x86_64.whl(72.39 MB)
cupy_cuda113-10.3.1-cp39-cp39-win_amd64.whl(54.18 MB)
cupy_cuda114-10.3.1-cp310-cp310-manylinux1_x86_64.whl(80.95 MB)
cupy_cuda114-10.3.1-cp310-cp310-win_amd64.whl(62.88 MB)
cupy_cuda114-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(79.44 MB)
cupy_cuda114-10.3.1-cp37-cp37m-win_amd64.whl(62.78 MB)
cupy_cuda114-10.3.1-cp38-cp38-manylinux1_x86_64.whl(82.59 MB)
cupy_cuda114-10.3.1-cp38-cp38-win_amd64.whl(62.88 MB)
cupy_cuda114-10.3.1-cp39-cp39-manylinux1_x86_64.whl(80.87 MB)
cupy_cuda114-10.3.1-cp39-cp39-win_amd64.whl(62.87 MB)
cupy_cuda115-10.3.1-cp310-cp310-manylinux1_x86_64.whl(77.67 MB)
cupy_cuda115-10.3.1-cp310-cp310-win_amd64.whl(59.56 MB)
cupy_cuda115-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(76.16 MB)
cupy_cuda115-10.3.1-cp37-cp37m-win_amd64.whl(59.46 MB)
cupy_cuda115-10.3.1-cp38-cp38-manylinux1_x86_64.whl(79.30 MB)
cupy_cuda115-10.3.1-cp38-cp38-win_amd64.whl(59.56 MB)
cupy_cuda115-10.3.1-cp39-cp39-manylinux1_x86_64.whl(77.59 MB)
cupy_cuda115-10.3.1-cp39-cp39-win_amd64.whl(59.55 MB)
cupy_cuda116-10.3.1-cp310-cp310-manylinux1_x86_64.whl(77.71 MB)
cupy_cuda116-10.3.1-cp310-cp310-win_amd64.whl(59.58 MB)
cupy_cuda116-10.3.1-cp37-cp37m-manylinux1_x86_64.whl(76.19 MB)
cupy_cuda116-10.3.1-cp37-cp37m-win_amd64.whl(59.49 MB)
cupy_cuda116-10.3.1-cp38-cp38-manylinux1_x86_64.whl(79.35 MB)
cupy_cuda116-10.3.1-cp38-cp38-win_amd64.whl(59.58 MB)
cupy_cuda116-10.3.1-cp39-cp39-manylinux1_x86_64.whl(77.63 MB)
cupy_cuda116-10.3.1-cp39-cp39-win_amd64.whl(59.58 MB)
v11.0.0b1 (Mar 31, 2022)
This is the release note of v11.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notice (2022-04-05)

We have identified that this release contains a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier). We are planning to fix this issue in the next pre-release. See #6615 for the details.

Highlights

Increase coverage of cupyx.scipy.special APIs (#6461, #6582, #6571)

A series of scipy.special routines have been added to cupyx with optimized CUDA raw kernel implementations. loggamma, multigammaln, fast Hankel transforms and several other utility special functions were added in this series of PRs by @grlee77 and @khushi-411.

Support for CUDA 11.6

Full support for CUDA 11.6 has been added as of this release. Binary packages can be installed with the following command:

pip install --pre cupy-cuda116 -f https://pip.cupy.dev/pre

Support for ROCm 5.0

Full support for ROCm 5.0 has been added as of this release. Binary packages can be installed with the following command:

pip install --pre cupy-rocm-5-0 -f https://pip.cupy.dev/pre

Changes without compatibility

Use CUB by default (#6549)

CUB support in CuPy is now enabled by default. This results in faster general reductions and increased performance for routines such as sum, argmax and argmin. Note that CUB may introduce some non-deterministic behavior; CUB can be disabled by setting the CUPY_ACCELERATORS="" environment variable.
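As a minimal sketch of opting out, the snippet below builds an environment mapping with CUPY_ACCELERATORS set to the empty string, which can then be passed to a child process. The helper name `env_without_accelerators` and the script name in the comment are hypothetical. The sketch assumes the variable has to be in place before CuPy is imported in the target process.

```python
import os


def env_without_accelerators(base=None):
    """Return a copy of the environment with CUPY_ACCELERATORS set to ""."""
    env = dict(os.environ if base is None else base)
    env["CUPY_ACCELERATORS"] = ""  # empty string: no accelerators such as CUB
    return env


env = env_without_accelerators({"PATH": "/usr/bin"})
print(env)

# Hypothetical usage: launch a CuPy script with CUB disabled.
# import subprocess, sys
# subprocess.run([sys.executable, "my_cupy_script.py"], env=env_without_accelerators())
```

Setting the variable in the shell before starting Python has the same effect. The helper is only convenient when the launch is driven from Python.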

Drop support for ROCm 4.0 (#6420)

CuPy v11 will drop support for ROCm 4.0. We recommend that users move to ROCm 4.3 or 5.0 instead.

Changes

New Features

Add cupyx.scipy.special statistical distributions (#6461)

Add cupy.real_if_close API (#6475)

Add cupyx.scipy.special loggamma, multigammaln and fast Hankel transforms (#6528)

Add cupyx.scipy.special.{i0e, i1e} (#6571)

Enhancements

Update cupy.array_api (#6486)

Fix for supporting ROCm 5.0 (#6524)

Use CUB by default (#6549)

Fix cupy.copyto to take NumPy array scalars (#6584)

Implement ndarray.ravel(order="K") (#6585)

Make einsum accept subscripts in numpy int (#6506)

Performance Improvements

Support cusparseSpGEMM() (#6511)

eigsh: Prefer gemv over gemm (#6570)

Performance improvement of cupy.in1d (#6583)

Bug Fixes

Fix cupy.fill to properly take zero-dim cupy.ndarray (#6481)

Fix error message in vectorize (#6499)

Fix cupy.cumsum on ROCm 5.0 (#6520)

Fix coo_matrix.diagonal (#6522)

Fix array creation shape (#6545)

Fix out args parser of ufunc (#6546)

Fix may_share_memory algorithm (#6560)

Avoid using the same kernel from different devices in JIT (#6575)

Fix cupy.full and cupy.full_like to make unsafe casting (#6587)

Fix device context management in MemoryAsyncPool (#6590)

Code Fixes

mypy: array_api (#6438)

Minor fixes on uarray backend support (#6526)

Documentation

Fix documents for CUDA 11.6 (#6405)

Remove description about issues from contribution guide (#6497)

Documentation update for ROCm 5.0 (#6530)

Installation

Skip appending --compiler-bindir if cl.exe is already on PATH (#6510)

Bump version to v11.0.0b1 (#6601)

Tests

Add FlexCI projects for Windows (#5889)

Run cupy-benchmark on CI (#6417)

Disable CentOS 8 test (#6492)

Fix Dockerfile broken for array-api tests (#6508)

CI: Trigger push event of FlexCI via GitHub Actions (#6538)

Skip async_malloc tests on unsupported device (#6541)

Fix flaky test_inverse_indices_shape (#6551)

Trigger CUDA 11.6 Windows CI when push/pull-request (#6553)

CI: Fix event name in dispatcher (#6555)

CI: Fix rule name in dispatcher (#6556)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @asi1024 @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @Onkar627 @peterbell10 @pri1311 @Smit-create @takagi @toslunar @tushxr16
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(60.51 MB)
cupy_cuda102-11.0.0b1-cp310-cp310-manylinux2014_aarch64.whl(34.76 MB)
cupy_cuda102-11.0.0b1-cp310-cp310-win_amd64.whl(42.49 MB)
cupy_cuda102-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(58.98 MB)
cupy_cuda102-11.0.0b1-cp37-cp37m-manylinux2014_aarch64.whl(33.07 MB)
cupy_cuda102-11.0.0b1-cp37-cp37m-win_amd64.whl(42.39 MB)
cupy_cuda102-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(62.16 MB)
cupy_cuda102-11.0.0b1-cp38-cp38-manylinux2014_aarch64.whl(36.20 MB)
cupy_cuda102-11.0.0b1-cp38-cp38-win_amd64.whl(42.49 MB)
cupy_cuda102-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(60.44 MB)
cupy_cuda102-11.0.0b1-cp39-cp39-manylinux2014_aarch64.whl(34.71 MB)
cupy_cuda102-11.0.0b1-cp39-cp39-win_amd64.whl(42.48 MB)
cupy_cuda110-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(75.13 MB)
cupy_cuda110-11.0.0b1-cp310-cp310-win_amd64.whl(57.07 MB)
cupy_cuda110-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(73.60 MB)
cupy_cuda110-11.0.0b1-cp37-cp37m-win_amd64.whl(56.97 MB)
cupy_cuda110-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(76.78 MB)
cupy_cuda110-11.0.0b1-cp38-cp38-win_amd64.whl(57.07 MB)
cupy_cuda110-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(75.06 MB)
cupy_cuda110-11.0.0b1-cp39-cp39-win_amd64.whl(57.06 MB)
cupy_cuda111-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(93.92 MB)
cupy_cuda111-11.0.0b1-cp310-cp310-win_amd64.whl(76.81 MB)
cupy_cuda111-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(92.39 MB)
cupy_cuda111-11.0.0b1-cp37-cp37m-win_amd64.whl(76.71 MB)
cupy_cuda111-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(95.57 MB)
cupy_cuda111-11.0.0b1-cp38-cp38-win_amd64.whl(76.81 MB)
cupy_cuda111-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(93.84 MB)
cupy_cuda111-11.0.0b1-cp39-cp39-win_amd64.whl(76.80 MB)
cupy_cuda112-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(75.55 MB)
cupy_cuda112-11.0.0b1-cp310-cp310-win_amd64.whl(57.55 MB)
cupy_cuda112-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(74.01 MB)
cupy_cuda112-11.0.0b1-cp37-cp37m-win_amd64.whl(57.46 MB)
cupy_cuda112-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(77.19 MB)
cupy_cuda112-11.0.0b1-cp38-cp38-win_amd64.whl(57.56 MB)
cupy_cuda112-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(75.47 MB)
cupy_cuda112-11.0.0b1-cp39-cp39-win_amd64.whl(57.55 MB)
cupy_cuda113-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(72.72 MB)
cupy_cuda113-11.0.0b1-cp310-cp310-win_amd64.whl(54.29 MB)
cupy_cuda113-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(71.19 MB)
cupy_cuda113-11.0.0b1-cp37-cp37m-win_amd64.whl(54.19 MB)
cupy_cuda113-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(74.37 MB)
cupy_cuda113-11.0.0b1-cp38-cp38-win_amd64.whl(54.29 MB)
cupy_cuda113-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(72.65 MB)
cupy_cuda113-11.0.0b1-cp39-cp39-win_amd64.whl(54.28 MB)
cupy_cuda114-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(81.20 MB)
cupy_cuda114-11.0.0b1-cp310-cp310-win_amd64.whl(62.98 MB)
cupy_cuda114-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(79.67 MB)
cupy_cuda114-11.0.0b1-cp37-cp37m-win_amd64.whl(62.88 MB)
cupy_cuda114-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(82.85 MB)
cupy_cuda114-11.0.0b1-cp38-cp38-win_amd64.whl(62.98 MB)
cupy_cuda114-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(81.12 MB)
cupy_cuda114-11.0.0b1-cp39-cp39-win_amd64.whl(62.97 MB)
cupy_cuda115-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(77.92 MB)
cupy_cuda115-11.0.0b1-cp310-cp310-win_amd64.whl(59.66 MB)
cupy_cuda115-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(76.39 MB)
cupy_cuda115-11.0.0b1-cp37-cp37m-win_amd64.whl(59.56 MB)
cupy_cuda115-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(79.56 MB)
cupy_cuda115-11.0.0b1-cp38-cp38-win_amd64.whl(59.66 MB)
cupy_cuda115-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(77.84 MB)
cupy_cuda115-11.0.0b1-cp39-cp39-win_amd64.whl(59.66 MB)
cupy_cuda116-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(77.96 MB)
cupy_cuda116-11.0.0b1-cp310-cp310-win_amd64.whl(59.68 MB)
cupy_cuda116-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(76.42 MB)
cupy_cuda116-11.0.0b1-cp37-cp37m-win_amd64.whl(59.59 MB)
cupy_cuda116-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(79.61 MB)
cupy_cuda116-11.0.0b1-cp38-cp38-win_amd64.whl(59.68 MB)
cupy_cuda116-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(77.88 MB)
cupy_cuda116-11.0.0b1-cp39-cp39-win_amd64.whl(59.68 MB)
cupy_rocm_4_2-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(34.56 MB)
cupy_rocm_4_2-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(33.24 MB)
cupy_rocm_4_2-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(36.03 MB)
cupy_rocm_4_2-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(34.49 MB)
cupy_rocm_4_3-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(36.14 MB)
cupy_rocm_4_3-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(34.82 MB)
cupy_rocm_4_3-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(37.61 MB)
cupy_rocm_4_3-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(36.07 MB)
cupy_rocm_5_0-11.0.0b1-cp310-cp310-manylinux1_x86_64.whl(54.21 MB)
cupy_rocm_5_0-11.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(52.89 MB)
cupy_rocm_5_0-11.0.0b1-cp38-cp38-manylinux1_x86_64.whl(55.68 MB)
cupy_rocm_5_0-11.0.0b1-cp39-cp39-manylinux1_x86_64.whl(54.14 MB)
v10.3.0(Mar 31, 2022)
This is the release note of v10.3.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notice (2022-04-08)

We have published a hot-fix release, v10.3.1, which addresses a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).

Highlights

Support for CUDA 11.6

Full support for CUDA 11.6 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-cuda116

Support for ROCm 5.0

Full support for ROCm 5.0 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-rocm-5-0

Changes

Enhancements

Support ROCm 5.0 (#6496)

Support cuSPARSELt 0.2.0 (repost) (#6507)

Update cupy.array_api (#6550)

Fix cupy.copyto to take NumPy array scalars (#6593)

Fix for supporting ROCm 5.0 (#6599)

Make einsum accept subscripts in numpy int (#6516)

Bug Fixes

Fix error message in vectorize (#6515)

Fix cupy.cumsum on ROCm 5.0 (#6525)

Fix coo_matrix.diagonal (#6533)

Fix out args parser of ufunc (#6547)

Fix cupy.fill to properly take zero-dim cupy.ndarray (#6548)

Fix cuSPARSELt 0.1.0 support in v10 (#6563)

Fix may_share_memory algorithm (#6565)

Avoid using the same kernel from different devices in JIT (#6581)

Fix array creation shape (#6592)

Fix cupy.full and cupy.full_like to make unsafe casting (#6595)

Fix device context management in MemoryAsyncPool (#6596)

Code Fixes

mypy: array_api (#6552)

Documentation

Remove description about issues from contribution guide (#6542)

Fix documents for CUDA 11.6 (#6543)

Installation

Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6488)

Skip appending --compiler-bindir if cl.exe is already on PATH (#6514)

Bump version to v10.3.0 (#6602)

Tests

Ignore warnings from Optuna 3.0 pre-releases (#6490)

Disable CentOS 8 test (#6519)

Add FlexCI projects for Windows (#6540)

Skip async_malloc tests on unsupported device (#6544)

CI: Trigger push event of FlexCI via GitHub Actions (#6554)

CI: regenerate matrix (#6557)

CI: Fix rule name in dispatcher (#6558)

CI: Fix event name in dispatcher (#6559)

Fix flaky test_inverse_indices_shape (#6573)

Trigger CUDA 11.6 Windows CI when push/pull-request (#6578)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @asi1024 @kmaehashi @leofang @Onkar627 @takagi @toslunar @tushxr16
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.3.0-cp310-cp310-manylinux1_x86_64.whl(60.26 MB)
cupy_cuda102-10.3.0-cp310-cp310-manylinux2014_aarch64.whl(34.52 MB)
cupy_cuda102-10.3.0-cp310-cp310-win_amd64.whl(42.38 MB)
cupy_cuda102-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(58.75 MB)
cupy_cuda102-10.3.0-cp37-cp37m-manylinux2014_aarch64.whl(32.83 MB)
cupy_cuda102-10.3.0-cp37-cp37m-win_amd64.whl(42.29 MB)
cupy_cuda102-10.3.0-cp38-cp38-manylinux1_x86_64.whl(61.89 MB)
cupy_cuda102-10.3.0-cp38-cp38-manylinux2014_aarch64.whl(35.92 MB)
cupy_cuda102-10.3.0-cp38-cp38-win_amd64.whl(42.39 MB)
cupy_cuda102-10.3.0-cp39-cp39-manylinux1_x86_64.whl(60.18 MB)
cupy_cuda102-10.3.0-cp39-cp39-manylinux2014_aarch64.whl(34.45 MB)
cupy_cuda102-10.3.0-cp39-cp39-win_amd64.whl(42.38 MB)
cupy_cuda110-10.3.0-cp310-cp310-manylinux1_x86_64.whl(74.88 MB)
cupy_cuda110-10.3.0-cp310-cp310-win_amd64.whl(56.97 MB)
cupy_cuda110-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(73.37 MB)
cupy_cuda110-10.3.0-cp37-cp37m-win_amd64.whl(56.87 MB)
cupy_cuda110-10.3.0-cp38-cp38-manylinux1_x86_64.whl(76.52 MB)
cupy_cuda110-10.3.0-cp38-cp38-win_amd64.whl(56.97 MB)
cupy_cuda110-10.3.0-cp39-cp39-manylinux1_x86_64.whl(74.80 MB)
cupy_cuda110-10.3.0-cp39-cp39-win_amd64.whl(56.96 MB)
cupy_cuda111-10.3.0-cp310-cp310-manylinux1_x86_64.whl(93.67 MB)
cupy_cuda111-10.3.0-cp310-cp310-win_amd64.whl(76.71 MB)
cupy_cuda111-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(92.16 MB)
cupy_cuda111-10.3.0-cp37-cp37m-win_amd64.whl(76.61 MB)
cupy_cuda111-10.3.0-cp38-cp38-manylinux1_x86_64.whl(95.31 MB)
cupy_cuda111-10.3.0-cp38-cp38-win_amd64.whl(76.71 MB)
cupy_cuda111-10.3.0-cp39-cp39-manylinux1_x86_64.whl(93.59 MB)
cupy_cuda111-10.3.0-cp39-cp39-win_amd64.whl(76.70 MB)
cupy_cuda112-10.3.0-cp310-cp310-manylinux1_x86_64.whl(75.30 MB)
cupy_cuda112-10.3.0-cp310-cp310-win_amd64.whl(57.45 MB)
cupy_cuda112-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(73.78 MB)
cupy_cuda112-10.3.0-cp37-cp37m-win_amd64.whl(57.36 MB)
cupy_cuda112-10.3.0-cp38-cp38-manylinux1_x86_64.whl(76.93 MB)
cupy_cuda112-10.3.0-cp38-cp38-win_amd64.whl(57.46 MB)
cupy_cuda112-10.3.0-cp39-cp39-manylinux1_x86_64.whl(75.21 MB)
cupy_cuda112-10.3.0-cp39-cp39-win_amd64.whl(57.45 MB)
cupy_cuda113-10.3.0-cp310-cp310-manylinux1_x86_64.whl(72.47 MB)
cupy_cuda113-10.3.0-cp310-cp310-win_amd64.whl(54.18 MB)
cupy_cuda113-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(70.97 MB)
cupy_cuda113-10.3.0-cp37-cp37m-win_amd64.whl(54.09 MB)
cupy_cuda113-10.3.0-cp38-cp38-manylinux1_x86_64.whl(74.11 MB)
cupy_cuda113-10.3.0-cp38-cp38-win_amd64.whl(54.19 MB)
cupy_cuda113-10.3.0-cp39-cp39-manylinux1_x86_64.whl(72.39 MB)
cupy_cuda113-10.3.0-cp39-cp39-win_amd64.whl(54.18 MB)
cupy_cuda114-10.3.0-cp310-cp310-manylinux1_x86_64.whl(80.95 MB)
cupy_cuda114-10.3.0-cp310-cp310-win_amd64.whl(62.88 MB)
cupy_cuda114-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(79.44 MB)
cupy_cuda114-10.3.0-cp37-cp37m-win_amd64.whl(62.78 MB)
cupy_cuda114-10.3.0-cp38-cp38-manylinux1_x86_64.whl(82.59 MB)
cupy_cuda114-10.3.0-cp38-cp38-win_amd64.whl(62.88 MB)
cupy_cuda114-10.3.0-cp39-cp39-manylinux1_x86_64.whl(80.87 MB)
cupy_cuda114-10.3.0-cp39-cp39-win_amd64.whl(62.87 MB)
cupy_cuda115-10.3.0-cp310-cp310-manylinux1_x86_64.whl(77.67 MB)
cupy_cuda115-10.3.0-cp310-cp310-win_amd64.whl(59.56 MB)
cupy_cuda115-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(76.16 MB)
cupy_cuda115-10.3.0-cp37-cp37m-win_amd64.whl(59.46 MB)
cupy_cuda115-10.3.0-cp38-cp38-manylinux1_x86_64.whl(79.30 MB)
cupy_cuda115-10.3.0-cp38-cp38-win_amd64.whl(59.56 MB)
cupy_cuda115-10.3.0-cp39-cp39-manylinux1_x86_64.whl(77.59 MB)
cupy_cuda115-10.3.0-cp39-cp39-win_amd64.whl(59.55 MB)
cupy_cuda116-10.3.0-cp310-cp310-manylinux1_x86_64.whl(77.71 MB)
cupy_cuda116-10.3.0-cp310-cp310-win_amd64.whl(59.58 MB)
cupy_cuda116-10.3.0-cp37-cp37m-manylinux1_x86_64.whl(76.19 MB)
cupy_cuda116-10.3.0-cp37-cp37m-win_amd64.whl(59.49 MB)
cupy_cuda116-10.3.0-cp38-cp38-manylinux1_x86_64.whl(79.35 MB)
cupy_cuda116-10.3.0-cp38-cp38-win_amd64.whl(59.58 MB)
cupy_cuda116-10.3.0-cp39-cp39-manylinux1_x86_64.whl(77.63 MB)
cupy_cuda116-10.3.0-cp39-cp39-win_amd64.whl(59.58 MB)
v11.0.0a2(Feb 25, 2022)
This is the release note of v11.0.0a2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Improved NumPy functions coverage (#6078)

A series of NumPy routines has been proposed as good first issues, and as a result a growing number of contributors have sent pull requests that increase the number of available APIs. A tracking issue listing the routines implemented so far is available at #6078.

Initial support for cupy.typing (#6251)

An API equivalent to numpy.typing has been added, allowing type annotations for arrays and dtypes in CuPy and in user code.
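The sketch below shows how such annotations might look in user code. It assumes cupy.typing exposes an NDArray alias analogous to numpy.typing.NDArray; the function name standardize is purely illustrative, and the snippet falls back to NumPy when CuPy or a GPU is unavailable.

```python
from typing import Any

try:
    import cupy as xp
    from cupy.typing import NDArray   # available from v11.0.0a2
    xp.cuda.runtime.getDeviceCount()  # raises if no usable device
except Exception:
    import numpy as xp
    from numpy.typing import NDArray  # CPU fallback with the same spelling


def standardize(x: "NDArray[Any]") -> "NDArray[Any]":
    """Return x shifted to zero mean and scaled to unit standard deviation."""
    return (x - x.mean()) / x.std()


values = standardize(xp.arange(5, dtype=xp.float64))
print(values)
```

The annotations are written as strings so they are only interpreted by a static type checker and never evaluated at run time.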

Support for CUDA 11.6 (#6349)

Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Support for ROCm 5.0 (#6466)

Initial support for ROCm 5.0 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Changes without compatibility

Drop support for ROCm 4.0 (#6420)

CuPy v11 will drop support for ROCm 4.0. We recommend using ROCm 4.2/4.3 instead.

Changes

New Features

Add cupy.isneginf and cupy.isposinf (#6089)

Add cupy.typing (#6251)

Add asarray_chkfinite API. (#6275)

Add Box-Cox transformations to cupyx.scipy.special (#6302)

Use CUDA's log1p for cupyx.scipy.special.log1p (#6315)

Add special functions from the CUDA Math API (#6317)

Add beta functions to cupyx.scipy.special (#6318)

Add cupy.union1d API. (#6357)

Add cupy.float_power (#6371)

Add cupy.intersect1d API. (#6402)

Add cupy.setdiff1d api. (#6433)

Add cupy.format_float_scientific API (#6474)
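A short sketch of some of the routines listed above follows. It assumes they behave like their NumPy counterparts, which is the stated goal of these additions; the GPU path needs CuPy v11.0.0a2 or later, and NumPy is used as a fallback when CuPy or a GPU is unavailable.

```python
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable device
except Exception:
    import numpy as xp                # CPU fallback with the same API

a = xp.array([1, 2, 3, 3])
b = xp.array([3, 4])

union = xp.union1d(a, b).tolist()        # sorted unique values from both inputs
common = xp.intersect1d(a, b).tolist()   # values present in both inputs
only_a = xp.setdiff1d(a, b).tolist()     # values of a that are not in b

signs = xp.array([-xp.inf, 0.0, xp.inf])
neg = xp.isneginf(signs).tolist()        # True only for negative infinity
pos = xp.isposinf(signs).tolist()        # True only for positive infinity

print(union, common, only_a, neg, pos)
```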

Enhancements

First step of mypy introduction (#4955)

Fix CI failure to support SciPy 1.8.0 (#6249)

implement overwrite_input in cupy.{percentile,quantile} (#6298)

avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6321)

Support NumPy 1.22 (#6323)

Remove batched QR solver's experimental mark (#6327)

Make scipy.special ufuncs work with CuPy inputs (#6341)

Fix thrust related build issue with CUDA 11.6 (#6346)

Support CUDA 11.6 (#6349)

Fix CI failure to support SciPy 1.8.0 (#6362)

Fix type annotations in installer (#6382)

Add __cupy_get_ndarray__ dunder method to transform objects to arrays (#6414)

Bump Jitify version to fix memory leak (#6430)

Support cuSPARSELt 0.2.0 (repost) (#6436)

Support ROCm 5.0 (#6466)

Warn if unexpectedly failed to detect device count in cupy.show_config() (#6472)

Fix verbose LOBPCG for SciPy 1.8 (#6388)

Performance Improvements

Reduce memory usage in cupy.sort (#6392)

Bug Fixes

Fix JIT to support notebook environment (#6329)

Fix cupyx.ndimage.spline_filter1d for HIP (#6406)

Fix cupy.nan_to_num (#6408)

Fix cupyx.special.gammainc, lpmv and sph_harm for hip (#6409)

Fix boolean views for HIP (#6412)

Fix reduction contiguous size calculation (#6457)

Code Fixes

Remove global use_hip flag in setup (#6391)

Hide private names in cupyx.scipy.linalg (#6449)

Hide private names in cupyx.scipy.ndimage (#6450)

Hide private names in cupyx.scipy.signal (#6451)

Hide private names in cupyx.scipy.sparse (#6454)

Hide private names in cupyx.scipy.stats (#6456)

Documentation

Use cupy.__version__ instead of pkg_resources (#6332)

Tentatively pin intersphinx to SciPy 1.7.1 docs (#6440)

Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6479)

Installation

Avoid monkeypatching distutils (#6273)

Eliminate unnecessary configuration pass in setup (#6389)

Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6390)

Drop support for ROCm 4.0 (#6420)

Bump version to v11.0.0a2 (#6501)

Tests

CI: allow discarding docker image cache manually (#6269)

Add slow tests for stable branch (#6340)

Parameterize library installer tests (#6343)

Fix tests for eigh() for CUDA 11.6 (#6347)

Avoid empty notification message for scheduled tests (#6363)

Support SciPy 1.8 (#6365)

Add cupy.testing.installed (#6381)

Mark XFAIL for SciPy 1.8 release candidate (#6385)

CI: Bump ROCm version from 4.3 to 4.3.1 (#6415)

CI: build docs in parallel (#6416)

CI: Add HEAD tests for stable branch (#6423)

CI: Use default schema/matrix path in generate.py (#6424)

Skip hfft related tests in HIP (#6427)

CI: Manage test tags in yaml (#6429)

CI: coverage in reST (#6445)

CI: fix NCCL 2.10 unit test not covered (#6448)

CI: Fix CUDA 11.6 driver update steps (#6467)

Ignore warnings from Optuna 3.0 pre-releases (#6470)

Fix failing tests in ROCm (#6482)

Others

CI: allow specifying special skip tag (#6468)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@amanchhaparia @anaruse @asi1024 @emcastillo @grlee77 @IvanYashchuk @khushi-411 @kmaehashi @pri1311 @saswatpp @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(60.44 MB)
cupy_cuda102-11.0.0a2-cp310-cp310-manylinux2014_aarch64.whl(34.69 MB)
cupy_cuda102-11.0.0a2-cp310-cp310-win_amd64.whl(42.46 MB)
cupy_cuda102-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(58.91 MB)
cupy_cuda102-11.0.0a2-cp37-cp37m-manylinux2014_aarch64.whl(33.00 MB)
cupy_cuda102-11.0.0a2-cp37-cp37m-win_amd64.whl(42.37 MB)
cupy_cuda102-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(62.08 MB)
cupy_cuda102-11.0.0a2-cp38-cp38-manylinux2014_aarch64.whl(36.12 MB)
cupy_cuda102-11.0.0a2-cp38-cp38-win_amd64.whl(42.46 MB)
cupy_cuda102-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(60.36 MB)
cupy_cuda102-11.0.0a2-cp39-cp39-manylinux2014_aarch64.whl(34.64 MB)
cupy_cuda102-11.0.0a2-cp39-cp39-win_amd64.whl(42.46 MB)
cupy_cuda110-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(75.06 MB)
cupy_cuda110-11.0.0a2-cp310-cp310-win_amd64.whl(57.04 MB)
cupy_cuda110-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(73.53 MB)
cupy_cuda110-11.0.0a2-cp37-cp37m-win_amd64.whl(56.95 MB)
cupy_cuda110-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(76.70 MB)
cupy_cuda110-11.0.0a2-cp38-cp38-win_amd64.whl(57.05 MB)
cupy_cuda110-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(74.98 MB)
cupy_cuda110-11.0.0a2-cp39-cp39-win_amd64.whl(57.04 MB)
cupy_cuda111-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(93.85 MB)
cupy_cuda111-11.0.0a2-cp310-cp310-win_amd64.whl(76.78 MB)
cupy_cuda111-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(92.32 MB)
cupy_cuda111-11.0.0a2-cp37-cp37m-win_amd64.whl(76.69 MB)
cupy_cuda111-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(95.49 MB)
cupy_cuda111-11.0.0a2-cp38-cp38-win_amd64.whl(76.79 MB)
cupy_cuda111-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(93.77 MB)
cupy_cuda111-11.0.0a2-cp39-cp39-win_amd64.whl(76.78 MB)
cupy_cuda112-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(75.48 MB)
cupy_cuda112-11.0.0a2-cp310-cp310-win_amd64.whl(57.53 MB)
cupy_cuda112-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(73.94 MB)
cupy_cuda112-11.0.0a2-cp37-cp37m-win_amd64.whl(57.44 MB)
cupy_cuda112-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(77.12 MB)
cupy_cuda112-11.0.0a2-cp38-cp38-win_amd64.whl(57.53 MB)
cupy_cuda112-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(75.40 MB)
cupy_cuda112-11.0.0a2-cp39-cp39-win_amd64.whl(57.53 MB)
cupy_cuda113-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(72.65 MB)
cupy_cuda113-11.0.0a2-cp310-cp310-win_amd64.whl(54.26 MB)
cupy_cuda113-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(71.12 MB)
cupy_cuda113-11.0.0a2-cp37-cp37m-win_amd64.whl(54.17 MB)
cupy_cuda113-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(74.30 MB)
cupy_cuda113-11.0.0a2-cp38-cp38-win_amd64.whl(54.26 MB)
cupy_cuda113-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(72.58 MB)
cupy_cuda113-11.0.0a2-cp39-cp39-win_amd64.whl(54.26 MB)
cupy_cuda114-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(81.13 MB)
cupy_cuda114-11.0.0a2-cp310-cp310-win_amd64.whl(62.95 MB)
cupy_cuda114-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(79.59 MB)
cupy_cuda114-11.0.0a2-cp37-cp37m-win_amd64.whl(62.86 MB)
cupy_cuda114-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(82.77 MB)
cupy_cuda114-11.0.0a2-cp38-cp38-win_amd64.whl(62.96 MB)
cupy_cuda114-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(81.05 MB)
cupy_cuda114-11.0.0a2-cp39-cp39-win_amd64.whl(62.95 MB)
cupy_cuda115-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(77.85 MB)
cupy_cuda115-11.0.0a2-cp310-cp310-win_amd64.whl(59.63 MB)
cupy_cuda115-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(76.31 MB)
cupy_cuda115-11.0.0a2-cp37-cp37m-win_amd64.whl(59.54 MB)
cupy_cuda115-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(79.49 MB)
cupy_cuda115-11.0.0a2-cp38-cp38-win_amd64.whl(59.64 MB)
cupy_cuda115-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(77.77 MB)
cupy_cuda115-11.0.0a2-cp39-cp39-win_amd64.whl(59.63 MB)
cupy_rocm_4_2-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(34.49 MB)
cupy_rocm_4_2-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(33.16 MB)
cupy_rocm_4_2-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(35.95 MB)
cupy_rocm_4_2-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(34.42 MB)
cupy_rocm_4_3-11.0.0a2-cp310-cp310-manylinux1_x86_64.whl(36.07 MB)
cupy_rocm_4_3-11.0.0a2-cp37-cp37m-manylinux1_x86_64.whl(34.75 MB)
cupy_rocm_4_3-11.0.0a2-cp38-cp38-manylinux1_x86_64.whl(37.54 MB)
cupy_rocm_4_3-11.0.0a2-cp39-cp39-manylinux1_x86_64.whl(36.00 MB)
v10.2.0(Feb 25, 2022)
This is the release note of v10.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA 11.6 (#6349)

Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Changes

Enhancements

Support cuDNN 8.3.2 (#6328)

Support cuTENSOR 1.4.0 (#6330)

Support CUDA 11.5.1 (#6331)

Support NumPy 1.22 (#6354)

avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6379)

Fix thrust related build issue with CUDA 11.6 (#6386)

Fix type annotations in installer (#6395)

Support CUDA 11.6 (#6422)

Bump Jitify version to fix memory leak (#6432)

Add __cupy_get_ndarray__ dunder method to transform objects to arrays (#6465)

Warn if unexpectedly failed to detect device count in cupy.show_config() (#6476)

Fix verbose LOBPCG for SciPy 1.8 (#6394)

Bug Fixes

Fix JIT to support notebook environment (#6356)

Fix cuDNN installer not working (#6368)

Fix cupyx.ndimage.spline_filter1d for HIP (#6411)

Fix boolean views for HIP (#6418)

Fix cupy.nan_to_num (#6431)

Fix reduction contiguous size calculation (#6464)

Code Fixes

Remove global use_hip flag in setup (#6398)

Documentation

Use cupy.__version__ instead of pkg_resources (#6380)

Tentatively pin intersphinx to SciPy 1.7.1 docs (#6442)

Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6480)

Installation

Fix for cuDNN directory structure in Windows (#6369)

Install lib directory on Windows in cuDNN installer (#6370)

Avoid monkeypatching distutils (#6373)

Eliminate unnecessary configuration pass in setup (#6399)

Bump version to v10.2.0 (#6502)

Tests

CI: use CUDA docker images for CUDA Python CI (#6338)

Avoid empty notification message for scheduled tests (#6364)

CI: allow discarding docker image cache manually (#6372)

Parameterize library installer tests (#6374)

Fix tests for eigh() for CUDA 11.6 (#6376)

Add cupy.testing.installed (#6387)

Mark XFAIL for SciPy 1.8 release candidate (#6396)

CI: build docs in parallel (#6419)

CI: Bump ROCm version from 4.3 to 4.3.1 (#6421)

CI: Use default schema/matrix path in generate.py (#6428)

CI: Manage test tags in yaml (#6441)

Support SciPy 1.8 (#6444)

CI: coverage in reST (#6447)

CI: fix NCCL 2.10 unit test not covered (#6452)

Skip hfft related tests in HIP (#6458)

CI: Fix CUDA 11.6 driver update steps (#6471)

Others

CI: allow specifying special skip tag (#6477)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @emcastillo @grlee77 @kmaehashi @takagi
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.2.0-cp310-cp310-manylinux1_x86_64.whl(60.23 MB)
cupy_cuda102-10.2.0-cp310-cp310-manylinux2014_aarch64.whl(34.49 MB)
cupy_cuda102-10.2.0-cp310-cp310-win_amd64.whl(42.38 MB)
cupy_cuda102-10.2.0-cp37-cp37m-manylinux1_x86_64.whl(58.72 MB)
cupy_cuda102-10.2.0-cp37-cp37m-manylinux2014_aarch64.whl(32.80 MB)
cupy_cuda102-10.2.0-cp37-cp37m-win_amd64.whl(42.29 MB)
cupy_cuda102-10.2.0-cp38-cp38-manylinux1_x86_64.whl(61.85 MB)
cupy_cuda102-10.2.0-cp38-cp38-manylinux2014_aarch64.whl(35.89 MB)
cupy_cuda102-10.2.0-cp38-cp38-win_amd64.whl(42.38 MB)
cupy_cuda102-10.2.0-cp39-cp39-manylinux1_x86_64.whl(60.16 MB)
cupy_cuda102-10.2.0-cp39-cp39-manylinux2014_aarch64.whl(34.42 MB)
cupy_cuda102-10.2.0-cp39-cp39-win_amd64.whl(42.38 MB)
cupy_cuda110-10.2.0-cp310-cp310-manylinux1_x86_64.whl(74.85 MB)
cupy_cuda110-10.2.0-cp310-cp310-win_amd64.whl(56.96 MB)
cupy_cuda110-10.2.0-cp37-cp37m-manylinux1_x86_64.whl(73.34 MB)
cupy_cuda110-10.2.0-cp37-cp37m-win_amd64.whl(56.87 MB)
cupy_cuda110-10.2.0-cp38-cp38-manylinux1_x86_64.whl(76.48 MB)
cupy_cuda110-10.2.0-cp38-cp38-win_amd64.whl(56.96 MB)
cupy_cuda110-10.2.0-cp39-cp39-manylinux1_x86_64.whl(74.78 MB)
cupy_cuda110-10.2.0-cp39-cp39-win_amd64.whl(56.96 MB)
cupy_cuda111-10.2.0-cp310-cp310-manylinux1_x86_64.whl(93.64 MB)
cupy_cuda111-10.2.0-cp310-cp310-win_amd64.whl(76.70 MB)
cupy_cuda111-10.2.0-cp37-cp37m-manylinux1_x86_64.whl(92.13 MB)
cupy_cuda111-10.2.0-cp37-cp37m-win_amd64.whl(76.61 MB)
cupy_cuda111-10.2.0-cp38-cp38-manylinux1_x86_64.whl(95.27 MB)
cupy_cuda111-10.2.0-cp38-cp38-win_amd64.whl(76.70 MB)
cupy_cuda111-10.2.0-cp39-cp39-manylinux1_x86_64.whl(93.56 MB)
cupy_cuda111-10.2.0-cp39-cp39-win_amd64.whl(76.70 MB)
cupy_cuda112-10.2.0-cp310-cp310-manylinux1_x86_64.whl(75.26 MB)
cupy_cuda112-10.2.0-cp310-cp310-win_amd64.whl(57.45 MB)
cupy_cuda112-10.2.0-cp37-cp37m-manylinux1_x86_64.whl(73.75 MB)
cupy_cuda112-10.2.0-cp37-cp37m-win_amd64.whl(57.36 MB)
cupy_cuda112-10.2.0-cp38-cp38-manylinux1_x86_64.whl(76.89 MB)
cupy_cuda112-10.2.0-cp38-cp38-win_amd64.whl(57.45 MB)
cupy_cuda112-10.2.0-cp39-cp39-manylinux1_x86_64.whl(75.19 MB)
cupy_cuda112-10.2.0-cp39-cp39-win_amd64.whl(57.45 MB)
cupy_cuda113-10.2.0-cp310-cp310-manylinux1_x86_64.whl(72.44 MB)
cupy_cuda113-10.2.0-cp310-cp310-win_amd64.whl(54.18 MB)
cupy_cuda113-10.2.0-cp37-cp37m-manylinux1_x86_64.whl(70.93 MB)
cupy_cuda113-10.2.0-cp37-cp37m-win_amd64.whl(54.09 MB)
cupy_cuda113-10.2.0-cp38-cp38-manylinux1_x86_64.whl(74.07 MB)
cupy_cuda113-10.2.0-cp38-cp38-win_amd64.whl(54.18 MB)
cupy_cuda113-10.2.0-cp39-cp39-manylinux1_x86_64.whl(72.37 MB)
cupy_cuda113-10.2.0-cp39-cp39-win_amd64.whl(54.18 MB)
cupy_cuda114-10.2.0-cp310-cp310-manylinux1_x86_64.whl(80.92 MB)
cupy_cuda114-10.2.0-cp310-cp310-win_amd64.whl(62.87 MB)
cupy_cuda114-10.2.0-cp37-cp37m-manylinux1_x86_64.whl(79.41 MB)
cupy_cuda114-10.2.0-cp37-cp37m-win_amd64.whl(62.78 MB)
cupy_cuda114-10.2.0-cp38-cp38-manylinux1_x86_64.whl(82.55 MB)
cupy_cuda114-10.2.0-cp38-cp38-win_amd64.whl(62.88 MB)
cupy_cuda114-10.2.0-cp39-cp39-manylinux1_x86_64.whl(80.84 MB)
cupy_cuda114-10.2.0-cp39-cp39-win_amd64.whl(62.87 MB)
cupy_cuda115-10.2.0-cp310-cp310-manylinux1_x86_64.whl(77.63 MB)
cupy_cuda115-10.2.0-cp310-cp310-win_amd64.whl(59.55 MB)
cupy_cuda115-10.2.0-cp37-cp37m-manylinux1_x86_64.whl(76.13 MB)
cupy_cuda115-10.2.0-cp37-cp37m-win_amd64.whl(59.46 MB)
cupy_cuda115-10.2.0-cp38-cp38-manylinux1_x86_64.whl(79.26 MB)
cupy_cuda115-10.2.0-cp38-cp38-win_amd64.whl(59.56 MB)
cupy_cuda115-10.2.0-cp39-cp39-manylinux1_x86_64.whl(77.56 MB)
cupy_cuda115-10.2.0-cp39-cp39-win_amd64.whl(59.55 MB)
v11.0.0a1(Jan 20, 2022)
This is the release note of v11.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Improved NumPy functions coverage (#6078)

A series of NumPy routines has been proposed as good first issues, and as a result a growing number of contributors have sent pull requests that increase the number of available APIs. A tracking issue listing the routines implemented so far is available at #6078.

Add cupyx.scipy.special functions (#5687)

Spherical harmonics, Legendre and Gamma functions are implemented using highly performant specific CUDA kernels. Thanks to @grlee77!

Initial support for CUDA Graph API by means of stream capture API (#4567)

This PR adds the ability to use the CUDA Graph API, which can greatly reduce kernel launch overhead. This is done through the stream capture API; an example follows. Thanks to @leofang!

    import numpy as np
    import cupy as cp

    a = cp.random.randint(0, 10, 100, dtype=np.int32)
    s = cp.cuda.Stream(non_blocking=True)
    with s:
        s.begin_capture()
        a += 3
        a = cp.abs(a)
        g = s.end_capture()  # work is queued, but not yet launched
        g.launch()
        s.synchronize()

Support __device__ function in CuPy JIT (#6265)

The new interface cupyx.jit.rawkernel(device=True) can be used to define a CUDA device function.

    from cupyx import jit

    @jit.rawkernel(device=True)
    def getitem(x, tid):
        return x[tid]

    @jit.rawkernel()
    def elementwise_copy(x, y):
        tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
        y[tid] = getitem(x, tid)

The following CUDA code is generated from the above Python code.

    __device__ int getitem_1(CArray<int, 1, true, true> x, unsigned int tid) {
      return x[tid];
    }

    extern "C" __global__ void elementwise_copy(CArray<int, 1, true, true> x, CArray<int, 1, true, true> y) {
      unsigned int tid;
      tid = (threadIdx.x + (blockDim.x * blockIdx.x));
      y[tid] = getitem_1(x, tid);
    }

Changes

New Features

Support stream capture (#4567)

Add additional special functions (spherical harmonics, Legendre, Gamma functions) (#5687)

Add cupy.asfarray (#6085)

Add cupy.trapz (#6107)

Add cupy.array_api.linalg (#6131)

Add cupy.mask_indices (#6156)

Add cupy.array_equiv API. (#6254)

Add cupy.cublas.syrk and cupy.cublas.sbmv (#6278)

Add cupy.vander API. (#6279)

Add cupy.ediff1d API. (#6280)

Add cupy.fabs API. (#6282)

Add discrete cosine and sine transforms to cupyx.scipy.fft (#6288)

Add logit, expit and log_expit to cupyx.scipy.special (#6300)

Add xlogy and xlog1py to cupyx.scipy.special (#6301)

Add tril_indices and tril_indices_from API. (#6305)

Add cupy.format_float_positional (#6308)

Add cupy.row_stack API. (#6312)

Add triu_indices and triu_indices_from API. (#6316)
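To give a feel for a few of the functions listed above, here is a small sketch. It assumes each routine follows the semantics of the NumPy function of the same name; the GPU path needs CuPy v11.0.0a1 or later, and NumPy is used as a fallback when CuPy or a GPU is unavailable.

```python
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable device
except Exception:
    import numpy as xp                # CPU fallback with the same API

# Differences between consecutive elements.
steps = xp.ediff1d(xp.array([1, 4, 9])).tolist()

# Vandermonde matrix with 3 columns: x**2, x**1, x**0.
powers = xp.vander(xp.array([1.0, 2.0, 3.0]), 3).tolist()

# Element-wise absolute value, always returned as floating point.
magnitudes = xp.fabs(xp.array([-1.5, 2.0])).tolist()

# Row and column indices of the lower triangle of a 3x3 matrix.
rows, cols = xp.tril_indices(3)
rows, cols = rows.tolist(), cols.tolist()

print(steps, powers, magnitudes, rows, cols)
```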

Enhancements

Raise better message when importing CPU array via DLPack (#6051)

Borrow more non-GPU APIs from NumPy (#6074)

Add more aliases for compatibility with NumPy (#6075)

Import more dtype aliases from NumPy (#6076)

Borrow indexing APIs from NumPy (#6077)

Apply upstream patch to cupy.array_api (#6086)

Compile cub/thrust with no unique symbol (#6106)

Support cuDNN 8.3.0 (#6108)

Support all advanced indexing (#6127)

Support CUDA 11.5.1 (#6166)

Support lambda function in cupy.vectorize (#6170)

Support eigenvalue solver 64bit API (#6178)

Support cuTENSOR 1.4.0 (#6187)

Make matmul support ufunc kwargs (#6195)

Alias NumPy error classes (#6212)

Support comparison to None and Ellipsis (#6222)

JIT: Fix if expr typing rule (#6234)

Support comparison with more objects (#6250)

JIT: Support __device__ function (#6265)

More clear warning message (#6283)

Make streams hashable (#6285)

Check isinstance before comparison in __eq__ (#6287)

Support cuDNN 8.3.2 (#6314)

Deprecate MachAr (support NumPy 1.22) (#6188)

Fix cupy.linalg.qr to align with NumPy 1.22 (#6225)

Change a parameter name in percentile and quantile to support NumPy 1.22 (#6228)

Performance Improvements

Avoid 64bit division to reduce register consumption (#6019)

Remove memory copy in matmul (#6179)

Bug Fixes

Detect repeated axis in reduction (#5964)

Fix __all__ in cupyx.scipy.fft (#6071)

Fix __getitem__ on Ellipsis and advanced indexing dimension (#6081)

Allow leading unit dimensions in copy source (#6118)

Always test broadcast in copyto (#6121)

Fix overloading ambiguity in ndimage filters (#6162)

Fix empty Cholesky (#6164)

Fix empty solve (#6167)

Allow flip ()-shaped array (#6169)

Handles infinities of the same sign in logaddexp and logaddexp2 (#6172)

Fix #4675 on resolving TODO in #4198 (#6197)

Eigenvalue solver 64bit API on CUDA 11.1 (#6201)

Fix edge case compatibility in cupy.eye() (#6208)

Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6210)

Fix overlapping out in matmul and (tensor)dot (#6216)

Fix compile_with_cache returning None (#6232)

Fixing index calculation for random constructor (#6257)

BUG: Fix the .T attribute in the array_api namespace (#6289)

Fix stream capture in ROCm (#6296)

Fix cuDNN installer not working (#6337)

Code Fixes

Remove __all__ from cupyx/scipy/* (#6149)

Delete from os import path (#6152)

Remove legacy cp.linalg.solve() implementation (#6161)

Documentation

Add link to compatibility matrix (#6055)

Update upgrade guide (#6058)

Add v11 to compatibility matrix (#6067)

Exclude kernel_version from comparison table (#6072)

Doc: Add more footnotes to comparison table (#6073)

Add polynomial modules to comparison table (#6082)

Add CITATION.bib and update README (#6091)

Remove LLVM_PATH note on document (#6093)

Docs: Update linkcode implementation (#6126)

Update footnotes in comparison table (#6142)

Update conda-forge installation guide (#6186)

Revise Overview for CuPy v10 (#6209)

Docs: CentOS installation from source (#6218)

Fix cupy.trapz docstring (#6239)

Fix eigsh doc (#6266)

Add cupy.positive in API Reference (#6274)

Installation

Replace distutils with setuptools in Windows cl.exe detection (#6025)

Fix for cuDNN directory structure in Windows (#6342)

Tests

Fix testing.multi_gpu to add pytest marker (#6015)

CI: add link to ROCm projects in CI coverage matrix (#6037)

CI: use separate project for multi-GPU tests (#6050)

Fix CI result notification message format (#6066)

Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6084)

Workaround DeprecationWarning raised from pkg_resources (#6094)

Fix missing multi_gpu annotation in tests (#6098)

Fix exception handling in cupyx.distributed (#6114)

Improve FlexCI test scripts (#6117)

CI: Add timeout to show_config (#6120)

Trigger FlexCI from GitHub Actions (#6130)

CI: Fix package override sometimes fails in CentOS (#6141)

CI: Need to update CUDA driver in cuda115.multi (#6144)

Add tests for convolve2d (#6171)

CI: Update limits to reduce cache size (#6174)

CI: Fix unquoted specifiers (#6175)

Support pre-release NumPy version in tests (#6190)

Remove XFAIL for XPASS tests on ROCm (#6259)

Tentatively pin to setuptools<60 in Windows CI (#6260)

Fix cache key for github actions (#6281)

Use NVIDIA docker images for CUDA 11.5 (#6303)

Tentatively pin to CUDA Driver 495 (#6310)

Remove unused dtype parameterizing in tril_indices test (#6322)

Use get_include instead of array_equiv for fallback test (#6333)

CI: Add cuda-slow test in FlexCI (#6335)

CI: use CUDA docker images for CUDA Python CI (#6336)

Others

Add doc issue template (#6294)

Bump version to v11.0.0a1 (#6344)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@akochepasov @amanchhaparia @asi1024 @ColmTalbot @emcastillo @eternalphane @grlee77 @haesleinhuepf @khushi-411 @kmaehashi @leofang @okuta @ptim0626 @SauravMaheshkar @shwina @takagi @thomasjpfan @tom24d @toslunar @twmht @WiseroOrb @Yutaro-Sanada
Source code(tar.gz)
Source code(zip)
cupy_cuda102-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(60.40 MB)
cupy_cuda102-11.0.0a1-cp310-cp310-manylinux2014_aarch64.whl(34.65 MB)
cupy_cuda102-11.0.0a1-cp310-cp310-win_amd64.whl(42.74 MB)
cupy_cuda102-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(58.87 MB)
cupy_cuda102-11.0.0a1-cp37-cp37m-manylinux2014_aarch64.whl(32.96 MB)
cupy_cuda102-11.0.0a1-cp37-cp37m-win_amd64.whl(42.64 MB)
cupy_cuda102-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(62.03 MB)
cupy_cuda102-11.0.0a1-cp38-cp38-manylinux2014_aarch64.whl(36.07 MB)
cupy_cuda102-11.0.0a1-cp38-cp38-win_amd64.whl(42.74 MB)
cupy_cuda102-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(60.32 MB)
cupy_cuda102-11.0.0a1-cp39-cp39-manylinux2014_aarch64.whl(34.59 MB)
cupy_cuda102-11.0.0a1-cp39-cp39-win_amd64.whl(42.74 MB)
cupy_cuda110-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(75.02 MB)
cupy_cuda110-11.0.0a1-cp310-cp310-win_amd64.whl(57.33 MB)
cupy_cuda110-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(73.49 MB)
cupy_cuda110-11.0.0a1-cp37-cp37m-win_amd64.whl(57.23 MB)
cupy_cuda110-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(76.65 MB)
cupy_cuda110-11.0.0a1-cp38-cp38-win_amd64.whl(57.33 MB)
cupy_cuda110-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(74.94 MB)
cupy_cuda110-11.0.0a1-cp39-cp39-win_amd64.whl(57.33 MB)
cupy_cuda111-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(93.81 MB)
cupy_cuda111-11.0.0a1-cp310-cp310-win_amd64.whl(77.08 MB)
cupy_cuda111-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(92.28 MB)
cupy_cuda111-11.0.0a1-cp37-cp37m-win_amd64.whl(76.98 MB)
cupy_cuda111-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(95.44 MB)
cupy_cuda111-11.0.0a1-cp38-cp38-win_amd64.whl(77.07 MB)
cupy_cuda111-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(93.73 MB)
cupy_cuda111-11.0.0a1-cp39-cp39-win_amd64.whl(77.07 MB)
cupy_cuda112-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(75.44 MB)
cupy_cuda112-11.0.0a1-cp310-cp310-win_amd64.whl(57.81 MB)
cupy_cuda112-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(73.90 MB)
cupy_cuda112-11.0.0a1-cp37-cp37m-win_amd64.whl(57.71 MB)
cupy_cuda112-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(77.07 MB)
cupy_cuda112-11.0.0a1-cp38-cp38-win_amd64.whl(57.81 MB)
cupy_cuda112-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(75.36 MB)
cupy_cuda112-11.0.0a1-cp39-cp39-win_amd64.whl(57.81 MB)
cupy_cuda113-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(72.61 MB)
cupy_cuda113-11.0.0a1-cp310-cp310-win_amd64.whl(55.02 MB)
cupy_cuda113-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(71.08 MB)
cupy_cuda113-11.0.0a1-cp37-cp37m-win_amd64.whl(54.92 MB)
cupy_cuda113-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(74.24 MB)
cupy_cuda113-11.0.0a1-cp38-cp38-win_amd64.whl(55.02 MB)
cupy_cuda113-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(72.53 MB)
cupy_cuda113-11.0.0a1-cp39-cp39-win_amd64.whl(55.02 MB)
cupy_cuda114-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(81.09 MB)
cupy_cuda114-11.0.0a1-cp310-cp310-win_amd64.whl(63.67 MB)
cupy_cuda114-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(79.55 MB)
cupy_cuda114-11.0.0a1-cp37-cp37m-win_amd64.whl(63.57 MB)
cupy_cuda114-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(82.72 MB)
cupy_cuda114-11.0.0a1-cp38-cp38-win_amd64.whl(63.67 MB)
cupy_cuda114-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(81.01 MB)
cupy_cuda114-11.0.0a1-cp39-cp39-win_amd64.whl(63.67 MB)
cupy_cuda115-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(77.80 MB)
cupy_cuda115-11.0.0a1-cp310-cp310-win_amd64.whl(60.35 MB)
cupy_cuda115-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(76.27 MB)
cupy_cuda115-11.0.0a1-cp37-cp37m-win_amd64.whl(60.25 MB)
cupy_cuda115-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(79.43 MB)
cupy_cuda115-11.0.0a1-cp38-cp38-win_amd64.whl(60.35 MB)
cupy_cuda115-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(77.73 MB)
cupy_cuda115-11.0.0a1-cp39-cp39-win_amd64.whl(60.35 MB)
cupy_rocm_4_0-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(35.29 MB)
cupy_rocm_4_0-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(33.97 MB)
cupy_rocm_4_0-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(36.74 MB)
cupy_rocm_4_0-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(35.22 MB)
cupy_rocm_4_2-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(34.45 MB)
cupy_rocm_4_2-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(33.12 MB)
cupy_rocm_4_2-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(35.91 MB)
cupy_rocm_4_2-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(34.37 MB)
cupy_rocm_4_3-11.0.0a1-cp310-cp310-manylinux1_x86_64.whl(36.03 MB)
cupy_rocm_4_3-11.0.0a1-cp37-cp37m-manylinux1_x86_64.whl(34.70 MB)
cupy_rocm_4_3-11.0.0a1-cp38-cp38-manylinux1_x86_64.whl(37.49 MB)
cupy_rocm_4_3-11.0.0a1-cp39-cp39-manylinux1_x86_64.whl(35.95 MB)
v10.1.0 (Jan 20, 2022)
This is the release note of v10.1.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

Enhancements

Remove memory copy in matmul (#6241)

Fix cupy.linalg.qr to align with NumPy 1.22 (#6263)

Bug Fixes

Fix edge case compatibility in cupy.eye() (#6213)

Fix compile_with_cache returning None (#6236)

Allow flip ()-shaped array (#6237)

Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6238)

Fix overloading ambiguity in ndimage filters (#6242)

Fixing index calculation for random constructor (#6267)

BUG: Fix the .T attribute in the array_api namespace (#6291)

Code Fixes

Remove legacy cp.linalg.solve() implementation (#6235)

Documentation

Docs: CentOS installation from source (#6230)

Add cupy.positive in API Reference (#6276)

Fix eigsh doc (#6292)

Tests

Add tests for convolve2d (#6194)

Change a parameter name in percentile and quantile to support NumPy 1.22 (#6247)

Tentatively pin to setuptools<60 in Windows CI (#6270)

Fix cache key for github actions (#6286)

Remove XFAIL for XPASS tests on ROCm (#6297)

Use NVIDIA docker images for CUDA 11.5 (#6304)

Tentatively pin to CUDA Driver 495 (#6311)

CI: Add cuda-slow test in FlexCI (#6339)

Others

Bump version to v10.1.0 (#6345)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @kmaehashi @leofang @ptim0626 @SauravMaheshkar @takagi @thomasjpfan @toslunar @WiseroOrb
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.1.0-cp310-cp310-manylinux1_x86_64.whl(60.22 MB)
cupy_cuda102-10.1.0-cp310-cp310-manylinux2014_aarch64.whl(34.48 MB)
cupy_cuda102-10.1.0-cp310-cp310-win_amd64.whl(42.67 MB)
cupy_cuda102-10.1.0-cp37-cp37m-manylinux1_x86_64.whl(58.71 MB)
cupy_cuda102-10.1.0-cp37-cp37m-manylinux2014_aarch64.whl(32.79 MB)
cupy_cuda102-10.1.0-cp37-cp37m-win_amd64.whl(42.57 MB)
cupy_cuda102-10.1.0-cp38-cp38-manylinux1_x86_64.whl(61.84 MB)
cupy_cuda102-10.1.0-cp38-cp38-manylinux2014_aarch64.whl(35.88 MB)
cupy_cuda102-10.1.0-cp38-cp38-win_amd64.whl(42.67 MB)
cupy_cuda102-10.1.0-cp39-cp39-manylinux1_x86_64.whl(60.14 MB)
cupy_cuda102-10.1.0-cp39-cp39-manylinux2014_aarch64.whl(34.41 MB)
cupy_cuda102-10.1.0-cp39-cp39-win_amd64.whl(42.67 MB)
cupy_cuda110-10.1.0-cp310-cp310-manylinux1_x86_64.whl(74.84 MB)
cupy_cuda110-10.1.0-cp310-cp310-win_amd64.whl(57.26 MB)
cupy_cuda110-10.1.0-cp37-cp37m-manylinux1_x86_64.whl(73.33 MB)
cupy_cuda110-10.1.0-cp37-cp37m-win_amd64.whl(57.16 MB)
cupy_cuda110-10.1.0-cp38-cp38-manylinux1_x86_64.whl(76.46 MB)
cupy_cuda110-10.1.0-cp38-cp38-win_amd64.whl(57.26 MB)
cupy_cuda110-10.1.0-cp39-cp39-manylinux1_x86_64.whl(74.77 MB)
cupy_cuda110-10.1.0-cp39-cp39-win_amd64.whl(57.26 MB)
cupy_cuda111-10.1.0-cp310-cp310-manylinux1_x86_64.whl(93.63 MB)
cupy_cuda111-10.1.0-cp310-cp310-win_amd64.whl(77.01 MB)
cupy_cuda111-10.1.0-cp37-cp37m-manylinux1_x86_64.whl(92.12 MB)
cupy_cuda111-10.1.0-cp37-cp37m-win_amd64.whl(76.91 MB)
cupy_cuda111-10.1.0-cp38-cp38-manylinux1_x86_64.whl(95.25 MB)
cupy_cuda111-10.1.0-cp38-cp38-win_amd64.whl(77.01 MB)
cupy_cuda111-10.1.0-cp39-cp39-manylinux1_x86_64.whl(93.55 MB)
cupy_cuda111-10.1.0-cp39-cp39-win_amd64.whl(77.01 MB)
cupy_cuda112-10.1.0-cp310-cp310-manylinux1_x86_64.whl(75.26 MB)
cupy_cuda112-10.1.0-cp310-cp310-win_amd64.whl(57.74 MB)
cupy_cuda112-10.1.0-cp37-cp37m-manylinux1_x86_64.whl(73.74 MB)
cupy_cuda112-10.1.0-cp37-cp37m-win_amd64.whl(57.64 MB)
cupy_cuda112-10.1.0-cp38-cp38-manylinux1_x86_64.whl(76.88 MB)
cupy_cuda112-10.1.0-cp38-cp38-win_amd64.whl(57.74 MB)
cupy_cuda112-10.1.0-cp39-cp39-manylinux1_x86_64.whl(75.18 MB)
cupy_cuda112-10.1.0-cp39-cp39-win_amd64.whl(57.74 MB)
cupy_cuda113-10.1.0-cp310-cp310-manylinux1_x86_64.whl(72.43 MB)
cupy_cuda113-10.1.0-cp310-cp310-win_amd64.whl(54.95 MB)
cupy_cuda113-10.1.0-cp37-cp37m-manylinux1_x86_64.whl(70.92 MB)
cupy_cuda113-10.1.0-cp37-cp37m-win_amd64.whl(54.85 MB)
cupy_cuda113-10.1.0-cp38-cp38-manylinux1_x86_64.whl(74.06 MB)
cupy_cuda113-10.1.0-cp38-cp38-win_amd64.whl(54.95 MB)
cupy_cuda113-10.1.0-cp39-cp39-manylinux1_x86_64.whl(72.36 MB)
cupy_cuda113-10.1.0-cp39-cp39-win_amd64.whl(54.95 MB)
cupy_cuda114-10.1.0-cp310-cp310-manylinux1_x86_64.whl(76.17 MB)
cupy_cuda114-10.1.0-cp310-cp310-win_amd64.whl(58.81 MB)
cupy_cuda114-10.1.0-cp37-cp37m-manylinux1_x86_64.whl(74.65 MB)
cupy_cuda114-10.1.0-cp37-cp37m-win_amd64.whl(58.71 MB)
cupy_cuda114-10.1.0-cp38-cp38-manylinux1_x86_64.whl(77.79 MB)
cupy_cuda114-10.1.0-cp38-cp38-win_amd64.whl(58.81 MB)
cupy_cuda114-10.1.0-cp39-cp39-manylinux1_x86_64.whl(76.09 MB)
cupy_cuda114-10.1.0-cp39-cp39-win_amd64.whl(58.81 MB)
cupy_cuda115-10.1.0-cp310-cp310-manylinux1_x86_64.whl(73.25 MB)
cupy_cuda115-10.1.0-cp310-cp310-win_amd64.whl(55.86 MB)
cupy_cuda115-10.1.0-cp37-cp37m-manylinux1_x86_64.whl(71.74 MB)
cupy_cuda115-10.1.0-cp37-cp37m-win_amd64.whl(55.76 MB)
cupy_cuda115-10.1.0-cp38-cp38-manylinux1_x86_64.whl(74.87 MB)
cupy_cuda115-10.1.0-cp38-cp38-win_amd64.whl(55.86 MB)
cupy_cuda115-10.1.0-cp39-cp39-manylinux1_x86_64.whl(73.17 MB)
cupy_cuda115-10.1.0-cp39-cp39-win_amd64.whl(55.86 MB)
v10.0.0 (Dec 9, 2021)
This is the release note of v10.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since the v10.0.0rc1 release. Check out our blog for highlights in the v10 release!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support all advanced indexing (#6196)

The support for advanced indexing using boolean masks has been completed in CuPy v10. It is now possible to index arrays using combinations of Ellipsis, boolean masks, and regular indexes, such as a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]] and a[..., [[False, True]]].
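As a rough illustration, the sketch below evaluates the two expressions quoted above on small arrays. The array shapes and the NumPy fallback (used when CuPy or a GPU is unavailable) are assumptions made for this example and are not part of the release note. The printed results follow NumPy's indexing rules, which CuPy v10 aims to match.

```python
# Illustrative sketch; falls back to NumPy if CuPy or a GPU is unavailable.
try:
    import cupy as xp
    xp.zeros(1)  # forces device initialization so a missing GPU is caught here
except Exception:
    import numpy as xp

a = xp.arange(12).reshape(3, 4)
# Integer row indices of shape (2, 3) broadcast against the columns
# selected by the boolean list (True at positions 0, 2 and 3).
r1 = a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]]

b = xp.arange(6).reshape(3, 1, 2)
# The (1, 2) boolean mask consumes the last two axes of b.
r2 = b[..., [[False, True]]]

print(r1.tolist())  # [[4, 6, 3], [0, 10, 11]]
print(r2.tolist())  # [[1], [3], [5]]
```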

Support lambda functions in cupy.vectorize (#6217)

A long-awaited feature to ensure compatibility with NumPy vectorize has been implemented. In this release, it is now possible to transpile lambda functions. This is especially handy when using JIT in conjunction with cupy.vectorize:

import cupy

a = cupy.array([0.4, -0.2, 1.8, -1.2])
relu = cupy.vectorize(lambda x: (x > 0.0) * x)
print(relu(a))  # [ 0.4 -0. 1.8 -0. ]

Announcements

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and Twitter, the minimum CUDA version that is supported by CuPy v10 is CUDA 10.2.

Drop support for NCCL 2.6 and 2.7 (#5855)

The minimum supported version for CuPy v10 is NCCL 2.8 as it implements the required primitives for cupyx.distributed to work.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 sunset in December 2021, and in line with NumPy's compatibility policy, Python 3.6 is no longer supported starting with CuPy v10.

Drop support for NumPy 1.17 (#5857)

As per NEP29, NumPy 1.17 support was dropped on July 26, 2021.

Changes

New Features

Add cupy.array_api.linalg (#6199)

Enhancements

Add more aliases for compatibility with NumPy (#6080)

Raise better message when importing CPU array via DLPack (#6097)

Apply upstream patch to cupy.array_api (#6105)

Borrow more non-GPU APIs from NumPy (#6109)

Import more dtype aliases from NumPy (#6110)

Borrow indexing APIs from NumPy (#6111)

Compile cub/thrust with no unique symbol (#6140)

Support cuDNN 8.3.0 (#6150)

Support eigenvalue solver 64bit API (#6192)

Support all advanced indexing (#6196)

Support lambda functions in cupy.vectorize (#6217)

Deprecate MachAr (support NumPy 1.22) (#6189)

Performance Improvements

Avoid 64bit division to reduce register consumption (#6102)

Bug Fixes

Fix __all__ in cupyx.scipy.fft (#6083)

Detect repeated axis in reduction (#6103)

Fix __getitem__ on Ellipsis and advanced indexing dimension (#6113)

Allow leading unit dimensions in copy source (#6153)

Always test broadcast in copyto (#6155)

Handles infinities of the same sign in logaddexp and logaddexp2 (#6176)

Fix empty solve (#6183)

Fix empty Cholesky (#6184)

Fix #4675 on resolving TODO in #4198 (#6204)

Eigenvalue solver 64bit API on CUDA 11.1 (#6220)

Code Fixes

Avoid from os import path (#6165)

Documentation

Update stable branch (#6065)

Update labels of Docs column (#6068)

Add more footnotes to comparison table (#6079)

Exclude kernel_version from comparison table (#6090)

Remove LLVM_PATH note on document (#6101)

Add polynomial modules to comparison table (#6122)

Add link to compatibility matrix (#6135)

Update footnotes in comparison table (#6143)

Update conda-forge installation guide (#6200)

Update upgrade guide (#6203)

Update linkcode implementation (#6206)

Revise Overview for CuPy v10 (#6215)

Installation

Replace distutils with setuptools in Windows cl.exe detection (#6138)

Bump version to v10.0.0 (#6224)

Tests

Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6087)

Workaround DeprecationWarning raised from pkg_resources (#6095)

Fix testing.multi_gpu to add pytest marker (#6096)

Fix missing multi_gpu annotation in tests (#6100)

Fix exception handling in cupyx.distributed (#6116)

Improve FlexCI test scripts (#6119)

Fix CI result notification message format (#6124)

CI: Add timeout to show_config (#6132)

CI: use separate project for multi-GPU tests (#6145)

CI: Need to update CUDA driver in cuda115.multi (#6146)

CI: Fix package override sometimes fails in CentOS (#6147)

CI: add link to ROCm projects in CI coverage matrix (#6148)

CI: Fix unquoted specifiers (#6182)

CI: Update limits to reduce cache size (#6185)

Trigger FlexCI from GitHub Actions (#6191)

Support pre-release NumPy version in tests (#6193)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar @twmht @Yutaro-Sanada
Source code(tar.gz)
Source code(zip)
cupy_cuda102-10.0.0-cp310-cp310-manylinux1_x86_64.whl(60.22 MB)
cupy_cuda102-10.0.0-cp310-cp310-manylinux2014_aarch64.whl(34.47 MB)
cupy_cuda102-10.0.0-cp310-cp310-win_amd64.whl(42.89 MB)
cupy_cuda102-10.0.0-cp37-cp37m-manylinux1_x86_64.whl(58.70 MB)
cupy_cuda102-10.0.0-cp37-cp37m-manylinux2014_aarch64.whl(32.79 MB)
cupy_cuda102-10.0.0-cp37-cp37m-win_amd64.whl(42.78 MB)
cupy_cuda102-10.0.0-cp38-cp38-manylinux1_x86_64.whl(61.84 MB)
cupy_cuda102-10.0.0-cp38-cp38-manylinux2014_aarch64.whl(35.87 MB)
cupy_cuda102-10.0.0-cp38-cp38-win_amd64.whl(42.93 MB)
cupy_cuda102-10.0.0-cp39-cp39-manylinux1_x86_64.whl(60.14 MB)
cupy_cuda102-10.0.0-cp39-cp39-manylinux2014_aarch64.whl(34.41 MB)
cupy_cuda102-10.0.0-cp39-cp39-win_amd64.whl(42.89 MB)
cupy_cuda110-10.0.0-cp310-cp310-manylinux1_x86_64.whl(74.84 MB)
cupy_cuda110-10.0.0-cp310-cp310-win_amd64.whl(57.48 MB)
cupy_cuda110-10.0.0-cp37-cp37m-manylinux1_x86_64.whl(73.33 MB)
cupy_cuda110-10.0.0-cp37-cp37m-win_amd64.whl(57.37 MB)
cupy_cuda110-10.0.0-cp38-cp38-manylinux1_x86_64.whl(76.46 MB)
cupy_cuda110-10.0.0-cp38-cp38-win_amd64.whl(57.51 MB)
cupy_cuda110-10.0.0-cp39-cp39-manylinux1_x86_64.whl(74.76 MB)
cupy_cuda110-10.0.0-cp39-cp39-win_amd64.whl(57.48 MB)
cupy_cuda111-10.0.0-cp310-cp310-manylinux1_x86_64.whl(93.63 MB)
cupy_cuda111-10.0.0-cp310-cp310-win_amd64.whl(77.23 MB)
cupy_cuda111-10.0.0-cp37-cp37m-manylinux1_x86_64.whl(92.11 MB)
cupy_cuda111-10.0.0-cp37-cp37m-win_amd64.whl(77.12 MB)
cupy_cuda111-10.0.0-cp38-cp38-manylinux1_x86_64.whl(95.25 MB)
cupy_cuda111-10.0.0-cp38-cp38-win_amd64.whl(77.26 MB)
cupy_cuda111-10.0.0-cp39-cp39-manylinux1_x86_64.whl(93.55 MB)
cupy_cuda111-10.0.0-cp39-cp39-win_amd64.whl(77.23 MB)
cupy_cuda112-10.0.0-cp310-cp310-manylinux1_x86_64.whl(75.25 MB)
cupy_cuda112-10.0.0-cp310-cp310-win_amd64.whl(57.96 MB)
cupy_cuda112-10.0.0-cp37-cp37m-manylinux1_x86_64.whl(73.74 MB)
cupy_cuda112-10.0.0-cp37-cp37m-win_amd64.whl(57.85 MB)
cupy_cuda112-10.0.0-cp38-cp38-manylinux1_x86_64.whl(76.88 MB)
cupy_cuda112-10.0.0-cp38-cp38-win_amd64.whl(57.99 MB)
cupy_cuda112-10.0.0-cp39-cp39-manylinux1_x86_64.whl(75.17 MB)
cupy_cuda112-10.0.0-cp39-cp39-win_amd64.whl(57.96 MB)
cupy_cuda113-10.0.0-cp310-cp310-manylinux1_x86_64.whl(72.43 MB)
cupy_cuda113-10.0.0-cp310-cp310-win_amd64.whl(55.17 MB)
cupy_cuda113-10.0.0-cp37-cp37m-manylinux1_x86_64.whl(70.92 MB)
cupy_cuda113-10.0.0-cp37-cp37m-win_amd64.whl(55.06 MB)
cupy_cuda113-10.0.0-cp38-cp38-manylinux1_x86_64.whl(74.06 MB)
cupy_cuda113-10.0.0-cp38-cp38-win_amd64.whl(55.20 MB)
cupy_cuda113-10.0.0-cp39-cp39-manylinux1_x86_64.whl(72.35 MB)
cupy_cuda113-10.0.0-cp39-cp39-win_amd64.whl(55.17 MB)
cupy_cuda114-10.0.0-cp310-cp310-manylinux1_x86_64.whl(76.17 MB)
cupy_cuda114-10.0.0-cp310-cp310-win_amd64.whl(59.03 MB)
cupy_cuda114-10.0.0-cp37-cp37m-manylinux1_x86_64.whl(74.65 MB)
cupy_cuda114-10.0.0-cp37-cp37m-win_amd64.whl(58.92 MB)
cupy_cuda114-10.0.0-cp38-cp38-manylinux1_x86_64.whl(77.79 MB)
cupy_cuda114-10.0.0-cp38-cp38-win_amd64.whl(59.06 MB)
cupy_cuda114-10.0.0-cp39-cp39-manylinux1_x86_64.whl(76.09 MB)
cupy_cuda114-10.0.0-cp39-cp39-win_amd64.whl(59.03 MB)
cupy_cuda115-10.0.0-cp310-cp310-manylinux1_x86_64.whl(73.25 MB)
cupy_cuda115-10.0.0-cp310-cp310-win_amd64.whl(56.08 MB)
cupy_cuda115-10.0.0-cp37-cp37m-manylinux1_x86_64.whl(71.73 MB)
cupy_cuda115-10.0.0-cp37-cp37m-win_amd64.whl(55.96 MB)
cupy_cuda115-10.0.0-cp38-cp38-manylinux1_x86_64.whl(74.87 MB)
cupy_cuda115-10.0.0-cp38-cp38-win_amd64.whl(56.11 MB)
cupy_cuda115-10.0.0-cp39-cp39-manylinux1_x86_64.whl(73.17 MB)
cupy_cuda115-10.0.0-cp39-cp39-win_amd64.whl(56.07 MB)
v10.0.0rc1 (Nov 11, 2021)
This is the release note of v10.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Add cupyx.distributed (#5590)

The new cupyx.distributed module provides a wrapper over NVIDIA’s NCCL library to perform communication in an MPI-like style. Currently, point-to-point and collective communication primitives are supported. Check the documentation for a complete reference of the functions.

CuPy now supports CUDA 11.5, Python 3.10, and NVIDIA Jetson

Wheels for CUDA 11.5 (cupy-cuda115) are now available. Python 3.10 wheels are also available for all supported CUDA / ROCm versions.

Wheels for Jetson can be found in the attached artifacts (pip install cupy-cuda112 -f https://pip.cupy.dev/pre).

Enable Generator random API in ROCm 4.3 (#5895)

ROCm 4.3 fixes a series of issues that prevented the Generator random API (#4177) from running on AMD devices.

Changes without compatibility

Refer to the Upgrade Guide for the detailed description.

Automatically enable peer access (#5496)

Peer access is enabled by default when a CuPy ndarray is stored on a different device, as long as the machine topology allows it.

Change Device.use() semantics to align with Stream.use() (#5853)

When exiting a context, the current device is now reverted back to the device of the parent's context scope, not the device last use()d.

Automatically convert big-endian numpy.ndarray to little-endian in cupy.array() and its variants (#5828)

Previously CuPy was copying the given numpy.ndarray to GPU as-is, regardless of the endianness. In CuPy v10, big-endian arrays are converted to little-endian before the transfer, which is the native byte order on GPUs. This change eliminates the need to manually change the array endianness before creating the CuPy array.
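For context, the manual step that this change makes unnecessary can be sketched with NumPy alone. The array contents and variable names below are illustrative assumptions. The sketch only shows the host-side byte-order conversion and does not call CuPy.

```python
import numpy as np

# A big-endian host array, as might come from a network or file format.
big = np.arange(4, dtype='>i4')

# Before CuPy v10, a conversion like this was needed prior to the transfer;
# per the note above, cupy.array() and its variants now do it automatically.
little = big.astype(big.dtype.newbyteorder('<'))

print(little.dtype == np.dtype('<i4'))   # True
print(little.tolist() == big.tolist())   # True: same values, different byte layout
```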

Add cupyx.profiler module (#5940)

A new module cupyx.profiler is added to host all profiling related APIs in CuPy. Accordingly, the following APIs are relocated to this module:

cupy.prof.TimeRangeDecorator() -> cupyx.profiler.time_range()

cupy.prof.time_range() -> cupyx.profiler.time_range()

cupy.cuda.profile() -> cupyx.profiler.profile()

cupyx.time.repeat() -> cupyx.profiler.benchmark()

The old routines are deprecated.
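A minimal sketch of the relocated benchmarking API follows, assuming a working CuPy v10+ installation with at least one CUDA device. The helper name time_sum and the array size are hypothetical; the helper returns None when CuPy or a GPU is not available so the snippet can still be run on a CPU-only machine.

```python
def time_sum(n=1_000_000, n_repeat=20):
    """Time cupy.sum with cupyx.profiler.benchmark, or return None without a GPU."""
    try:
        import cupy as cp
        from cupyx.profiler import benchmark  # formerly cupyx.time.repeat
        if cp.cuda.runtime.getDeviceCount() == 0:
            return None
    except Exception:
        return None  # CuPy is not installed or no CUDA device is usable

    a = cp.random.random(n)
    # benchmark(func, args, n_repeat=...) records CPU and GPU times per run.
    return benchmark(cp.sum, (a,), n_repeat=n_repeat)


timing = time_sum()
print(timing)
```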

Deprecate cupy.cuda.compile_with_cache (#5858)

An internal API cupy.cuda.compile_with_cache() has been marked as deprecated as there are better alternatives (RawModule, RawKernel). While it has a long-standing history, this API has never been meant to be public. We encourage downstream libraries and users to migrate to the aforementioned public APIs.
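As one possible migration path, the sketch below compiles and launches a small kernel through the public cupy.RawKernel class. The kernel source, the helper name scale_on_gpu, and the launch configuration are assumptions made for illustration. The helper returns None when CuPy or a GPU is not available.

```python
# Hypothetical kernel used only for this example.
SCALE_SOURCE = r'''
extern "C" __global__
void scale(const float* x, float* y, float factor, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {
        y[tid] = x[tid] * factor;
    }
}
'''


def scale_on_gpu(values, factor):
    """Multiply values by factor with a RawKernel, or return None without a GPU."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() == 0:
            return None
    except Exception:
        return None  # CuPy is not installed or no CUDA device is usable

    x = cp.asarray(values, dtype=cp.float32)
    y = cp.empty_like(x)
    n = x.size
    kernel = cp.RawKernel(SCALE_SOURCE, 'scale')  # compiled lazily on first launch
    threads = 128
    blocks = (n + threads - 1) // threads
    # Scalars are passed with explicit types matching the kernel signature.
    kernel((blocks,), (threads,), (x, y, cp.float32(factor), cp.int32(n)))
    return cp.asnumpy(y).tolist()


result = scale_on_gpu([1.0, 2.0, 3.0], 2.0)
print(result)
```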

Announcements

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.

Drop support for NCCL 2.6 and 2.7 (#5855)

The minimum supported version for CuPy v10 will be NCCL 2.8 as it implements the required primitives for cupyx.distributed to work.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 sunset in December 2021, and in line with NumPy's compatibility policy, Python 3.6 will no longer be supported starting with CuPy v10.

Drop support for NumPy 1.17 (#5857)

As per NEP29, NumPy 1.17 support was dropped on July 26, 2021.

Alpha/Beta/RC wheels no longer distributed through PyPI

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.

Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes of supported cuSPARSELt version

We are planning to drop cuSPARSELt v0.1.0 support in the CuPy v10 final release (#6045).

Changes

New Features

Add cupyx.distributed (#5590)

Add cupy.positive() (#5774)

Update cupy.array_api (#5783)

Update cupy.array_api typing (#5821)

Add trim_mean from scipy.stats to cupyx (#5900)

Implement more array creation & serialization methods (#5925)

Enhancements

Automatically enable peer access (#5496)

Update DLPack header to v0.6 to support exchanging arrays backed by managed memory (#5512)

Lazy-preload cuDNN (#5677)

Support ROCm managed memory (#5685)

Fix import failure when pytest namespace is available (#5703) (#5707)

Support cuTENSOR 1.3.3 (#5732)

Add dtype and casting arguments to cupy.concatenate() (#5759)

Automatically convert big-endian data to little-endian in cupy.array() and its variants (#5828)

Use pylibcugraph for connected_components (#5830)

Make show_config runnable without GPU (#5835)

NotImplementedError clarity (#5841)

Change Device.use() semantics to align with Stream.use() (#5853)

Drop support for NumPy 1.17 (#5857)

Deprecate cupy.cuda.compile_with_cache (#5858)

Show error when importing cupy.array_api with Python 3.7 (#5873)

Enable new random api in ROCm 4.3 (#5895)

Add bitorder option to cupy.packbits (#5898)

Support using cuTENSOR in elementwise ufuncs (#5902)

Workaround ROCm 4.3 LLVM_PATH issue in hipRTC (#5933)

Update the Array API module (#5939)

Add cupyx.profiler module (#5940)

Use SHA1 hash for kernel cache key to support Linux in FIPS-compliant mode (#5988)

Merge fp16 headers for CUDA 11.2+ (#5993)

Support CUDA 11.5 for library installer (#5996)

Add cupy-cuda115 to duplicate detection (#5999)

Suggest using binary packages when build failed (#6028)

Improve import error message (#6029)

Display license terms when downloading libraries (#6032)

Fix error type/message for duplicate value in axis (#5953)

Performance Improvements

Use index_t for faster address calculation (#5981)

Bug Fixes

Use cudaRuntimeGetVersion instead of CUDA_VERSION for CUDA Python support (#5723)

Allow generating cubins for the max known CC (#5779)

Fix hypergeometric distribution implementation to use int (#5785)

Fix non-deterministic behavior in cupy.random.shuffle (#5838)

Avoid using driver.get_build_version (#5861)

Fix nan_to_num to comply with NumPy API (#5870)

Do not use cuTENSOR unless available (#5872)

Fix _get_cuda_build_version for ROCm (#5888)

Fix __repr__ of mode and scalar in cuTENSOR (#5901)

Fix to push device after setDevice succeed (#5904)

Fix ndarray.clip to match numpy (#5910)

Fix copyto with non-contiguous multidevice (#5913)

Avoid use of setDevice in CuPy codebase (#5915)

Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5921)

Do not use with device in code base (#5963)

Fix __dlpack__ protocol (#5970)

Fix cupyx.tools.install_library for windows (#5977)

Fix ravel for strides 0 (#5978)

Avoid using with context for streams (#5985)

Fix cuTENSOR installation on Windows (#6007)

Fix hash length for SHA1 (#6023)

Fix: Add missing output dtype check for direct correlate/convolve (#6046)

Fix cuDNN version not displayed in wheel installation (#6054)

Code Fixes

Code-fix on cupy.array() (#5842)

Successive code fix on cupy.array() (#5844)

Fix kernel name of cupyx.scipy.ndimage.interpolation.map_coordinates (#5845)

Replace addAddNameExpression with addNameExpression in NVRTC binding (#5938)

Split loop testing helpers into _loops (#5967)

Make CUPY_DLPACK_EXPORT_VERSION consistent (#5982)

Fix comment in device switching (#5984)

Avoid using deprecated setDaemon method (#6059)

Documentation

Update upgrade guide (#5824)

Update list of supported OS (#5854)

Drop support for NCCL 2.6 and 2.7 (#5855)

Add docs for driver.get_build_version (#5860)

Document ppc64le and aarch64 are supported on conda-forge (#5865)

Mention deprecation of compile_with_cache() in upgrade guide (#5883)

Add docs for scipy.sparse.csgraph module (#5903)

Refine SciPy-compatible API documentation (#5905)

Improve the comparison table (#5907)

Remove CUDA 10.0 / 10.1 from README (#5924)

Improve some docs on interoperability and cupy.linalg.cholesky (#5941)

Add footnotes for functions unimplemented in CuPy (#5942)

Document CUPY_ACCELERATORS (#5948)

Fix section heading level (#5962)

Mention np.matrix in the difference section (#5966)

Add PyTorch with RawKernel example to docs (#5973)

Add sphinx-copybutton (#5976)

Add favicon to docs (#5980)

Replace favicon with high resolution one (#5986)

Update upgrade guide for v10 (#5994)

Cover a bit more of cuTENSOR in perf guide (#5995)

Support CUDA 11.5 on documents (#5997)

Fix typo in copyright line (#6030)

Add Python 3.10.0 to support list (#6038)

Added Compatibility Matrix to Upgrade Guide (#6053)

Installation

Bump CUDA/ROCm version in docker images (#5859)

Fix library installer to limit architecture (#5926)

Tests

Introduce new toolset for CI (#5474)

Simplify legacy ROCm test script for FlexCI (#5753)

Use pytest in TestJoin (#5764)

Clean up plan cache in a FFT slow test (#5811)

Improve handling of FlexCI test runs (#5814)

Tentatively disable pytest-xdist (#5826)

Add FlexCI projects for Linux (#5836)

Fix ROCm tests does not export LLVM_PATH (#5849)

Add test for CI generator (#5850)

Remove CUDA 10.0/10.1 and Python 3.6 from FlexCI tests (#5851)

Add mypy test for cupy_builder (#5856)

Add array-api-tests in FlexCI (#5862)

Upload cache even when test failed in FlexCI (#5867)

Improve CI generator to emit warning on uncovered axis (#5871)

Add CI for pylibcugraph (#5874)

Build CI docker images using BuildKit to utilize cache from registry (#5875)

Show hint to reproduce CI result locally in shell target (#5877)

Show time taken for build in CI (#5878)

Increase parallelism of CuPy build in CI (#5880)

Copy source directory to support pip 21.3 (#5881)

Fix ccache path to support CentOS (#5882)

Upload Docker image after running branch test (#5884)

Avoid cache download failure in CI when conflicting with cache upload (#5890)

Add FlexCI project for doctest, example and head test (#5891)

FlexCI test against Python 3.10 (#5899)

Fix fft test skip condition (#5908)

Add Slack/Gitter notification when branch test fail (#5914)

Declare the same environment variables as Linux in Windows CI (#5923)

Fix trim_mean test (#5944)

Temporarily skip Array API tests on ROCm (#5945)

Relax sparse linalg testing tolerance (#5952)

CI: Fix ROCm build test (FlexCI) failing (#5956)

CI: Fix ccache not working (#6016)

CI: Use ccache in Pre-review Test (#6027)

CI: Migrate ROCm build test from FlexCI to GitHub Actions (using ROCm docker image) (#6034)

CI: Merge doctest to example test in FlexCI (#6036)

CI: allow running tests selectively (#6039)

CI: Fix pip command use in FlexCI instance (#6049)

Fix notifier to work on Python 3.6 (#6056)

CI: Do not run full combination test even for branch tests for ROCm (#5955)

Others

Avoid triggering docker workflow on release of forked repos (#5863)

Refine issue templates using Issue Forms (#5868)

Bump version to v10.0.0rc1 (#6042)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@Anubha13kumari @SaharCarmel @SwastikTripathi @amathews-amd @anaruse @asi1024 @carterbox @drbeh @emcastillo @iskode @kmaehashi @lanttu1243 @leofang @okuta @prkhrsrvstv1 @spiralray @takagi @toslunar

v9.6.0 (Nov 11, 2021)
This is the release note of v9.6.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Final release for v9.x series

This is expected to be the last release of the CuPy v9 series. Please start trying your workflow with CuPy v10.0.0rc1 and let us know if you have any feedback!

CuPy now supports CUDA 11.5

Wheels for CUDA 11.5 (cupy-cuda115) are now available.

Removal of Alpha/Beta/RC Wheels from PyPI

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.

Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
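
As a rough illustration of the wheel naming and the custom index mentioned above, the following Python sketch assembles the corresponding pip command. The helper names `wheel_package_name` and `prerelease_install_command` are hypothetical, and the `--pre` flag is an assumption added because pip normally skips pre-release versions unless asked. Only the `cupy-cudaXXX` pattern and the index URL come from the announcement.

```python
PRE_RELEASE_INDEX = "https://pip.cupy.dev/pre"  # URL quoted in the announcement


def wheel_package_name(cuda_version):
    """Map a CUDA version such as "11.5" to a wheel name such as "cupy-cuda115"."""
    major, minor = cuda_version.split(".")
    return f"cupy-cuda{major}{minor}"


def prerelease_install_command(cuda_version):
    """Return a pip command line that looks up wheels on the custom index."""
    args = [
        "pip", "install", wheel_package_name(cuda_version),
        "--pre",                  # assumption: lets pip pick pre-release versions
        "-f", PRE_RELEASE_INDEX,  # find-links option used in the announcement
    ]
    return " ".join(args)


print(prerelease_install_command("11.5"))
```

The command is only printed here, not executed, so the sketch can be adapted to a requirements file or a CI script as needed.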

Changes

Enhancements

Make show_config runnable without GPU (#5839)

Merge fp16 headers for CUDA 11.2+ (#6004)

Support cuTENSOR 1.3.3 (#6005)

Support CUDA 11.5 for library installer (#6010)

Display license terms when downloading libraries (#6041)

Fix error type/message for duplicate value in axis (#5987)

Bug Fixes

Do not use cuTENSOR unless available (#5885)

Fix non-deterministic behavior in cupy.random.shuffle (#5887)

Fix ndarray.clip to match numpy (#5916)

Fix __repr__ of mode and scalar in cuTENSOR (#5917)

Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5931)

Fix ravel for strides 0 (#5998)

Fix cuTENSOR installation on Windows (#6022)

Allow generating cubins for the max known CC (#6024)

Documentation

Update upgrade guide (#5834)

Document ppc64le and aarch64 are supported on conda-forge (#5869)

Improve the comparison table (#5911)

Add footnotes for functions unimplemented in CuPy (#5954)

Update the docstring for cholesky (#5960)

Document CUPY_ACCELERATORS (#5975)

Add favicon to docs (#5983)

Support CUDA 11.5 on documents (#6006)

Replace favicon with high resolution one (#6008)

Fix typo in copyright line (#6035)

Tests

Clean up plan cache in a FFT slow test (#5825)

Copy source directory to support pip 21.3 (#5896)

Simplify legacy ROCm test script for FlexCI (#5936)

Relax sparse linalg testing tolerance (#5958)

CI: Fix ROCm build test (FlexCI) failing (#5965)

Improve handling of FlexCI test runs (#6002)

Upload cache even when test failed in FlexCI (#6003)

CI: Increase timeout for CUDA 11.4 / 11.5 tests (#6040)

CI: Do not run full combination test even for branch tests for ROCm (#5974)

Others

Avoid triggering docker workflow on release of forked repos (#5886)

Bump version to v9.6.0 (#6043)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @drbeh @emcastillo @kmaehashi @leofang @takagi @toslunar

v9.5.0 (Sep 30, 2021)
This is the release note of v9.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.

Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

Enhancements

Support cuDNN 8.2.4 (#5744)

Support NCCL 2.11.4 (#5747)

Fix cupyx.optimize to save file when no optimization ran (#5760)

Bug Fixes

Fix spline filter with large array (#5686)

Fix exception for indexing with multiple ellipses (#5739)

Fix docstring for fallback modules (#5742)

Include stdexcept in hip headers (#5777)

Fixed typo in error message in sparse.csr_matrix (#5788)

Fix MAX_NDIM and add guards/tests (#5798)

Disable spmm on Windows CUDA 10.2 (#5805)

Documentation

Fix random docstring (#5708)

Remove --pre from ROCm source build instructions (#5782)

Use custom index for pre-release wheels (#5793)

Installation

Add maintainers in setup.py (#5758)

Bump version to v9.5.0 (#5808)

Tests

Update test_eigenvalue.py (#5643)

Improve performance of TestSplineFilter1dLargeArray (#5694)

Stop inheriting unittest.TestCase for performance (#5710)

TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5729)

Make testing helpers support non-methods (#5731)

Make test parameter names static (#5733)

Update pip and setuptools in Windows CI (#5738)

Improve FlexCI output (#5796)

Fix error message comparison (#5806)

Others

Add workflow to test/build/push docker images on pull-request/release (#5752)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @kmaehashi @leofang @takagi @toslunar

v10.0.0b3 (Sep 30, 2021)
This is the release note of v10.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Array API initial support (#5698)

This release starts implementing the Array API standard for interoperability with other tensor libraries. Please check the CuPy documentation to see a list of the currently available features.

Changes without compatibility

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and on Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 sunset in December 2021 and NumPy's own compatibility policy, Python 3.6 is no longer supported starting with CuPy v10.
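
The two minimums above can be sketched as a small pre-flight check. This is a minimal illustration, not part of CuPy: the function name `meets_cupy_v10_requirements` and the constants are hypothetical, the Python floor of 3.7 is inferred from the removal of 3.6, and only lower bounds are checked.

```python
import sys

MIN_PYTHON = (3, 7)  # inferred: Python 3.6 is dropped in CuPy v10
MIN_CUDA = (10, 2)   # minimum CUDA version stated for CuPy v10


def meets_cupy_v10_requirements(cuda_version, python_version=None):
    """Compare (major, minor) tuples against the CuPy v10 minimums."""
    if python_version is None:
        # Default to the interpreter running this script.
        python_version = sys.version_info[:2]
    python_ok = tuple(python_version) >= MIN_PYTHON
    cuda_ok = tuple(cuda_version) >= MIN_CUDA
    return python_ok and cuda_ok


print(meets_cupy_v10_requirements((11, 4)))
```

Passing the versions as tuples keeps the comparison numeric, so that 10.10 would sort after 10.2 rather than before it.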

Alpha/Beta/RC wheels no longer distributed through PyPI

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.

Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

New Features

Add binomial distribution to new Generator (#5429)

Adopt the numpy.array_api module as cupy.array_api (#5698)

Enhancements

Improve stream mismatch error message (#5706)

Support cuDNN 8.2.4 (#5726)

Support NCCL 2.11.4 (#5734)

Fix cupyx.optimize to save file when no optimization ran (#5757)

Adding bitorder support to cupy.unpackbits (#5765)

Drop support for CUDA 10.1 or earlier (#5770)

Drop support for Python 3.6 (#5771)

Bug Fixes

Fix spline filter with large array (#5673)

Fix exception for indexing with multiple ellipses (#5718)

Fix docstring for fallback modules (#5728)

Fix MAX_NDIM and add guards/tests (#5749)

Fixed typo in error message in sparse.csr_matrix (#5767)

Include stdexcept in hip headers (#5769)

Disable spmm on Windows CUDA 10.2 (#5802)

Code Fixes

Prefix Cython compile_time_env with CUPY_ (#5740)

Documentation

Use custom index for pre-release wheels (#5772)

Remove --pre from ROCm source build instructions (#5773)

Installation

Reorganize build scripts, part 1 (#5730)

Reorganize build scripts, part 2: separate modules (#5743)

Reorganize build scripts, part 3: simplify setup.py (#5745)

Reorganize build scripts, part 4: remove global cupy_setup_options (#5754)

Reorganize build scripts, part 5: remove Cython version check (#5755)

Add maintainers in setup.py (#5756)

Bump version to v10.0.0b3 (#5807)

Tests

Make testing helpers support non-methods (#5594)

Stop inheriting unittest.TestCase for performance (#5599)

Eliminate random test ids (#5659)

Improve performance of TestSplineFilter1dLargeArray (#5693)

TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5724)

Make test parameter names static (#5727)

Update pip and setuptools in Windows CI (#5735)

Improve FlexCI output (#5786)

Skip tests for bug cases (FFT on CUDA 10.2 + Pascal) (#5791)

Fix error message comparison (#5799)

Fix test skip issue (#5801)

Others

Update auto-notify bot for array-api label (#5725)

Fix backport trigger (#5741)

Add workflow to test/build/push docker images on pull-request/release (#5746)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @iskode @kmaehashi @leofang @povinsahu1909 @takagi @toslunar

v10.0.0b2 (Aug 26, 2021)
This is the release note of v10.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA Python (#5638)

CuPy is one of the first libraries to support the newly released CUDA Python bindings. To try it, install cuda-python manually and set the CUPY_USE_CUDA_PYTHON=1 environment variable when building CuPy, as described in the documentation.
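
A minimal sketch of preparing such a build environment from Python is shown below. The helper name `cuda_python_build_env` is hypothetical, and the commented-out pip commands are assumptions about a typical source build rather than the documented procedure. Only the package name cuda-python and the variable CUPY_USE_CUDA_PYTHON come from the note above.

```python
import os


def cuda_python_build_env(base_env=None):
    """Return a copy of the environment with CUPY_USE_CUDA_PYTHON=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUPY_USE_CUDA_PYTHON"] = "1"  # variable named in the release note
    return env


# Hypothetical build steps (assumed commands); uncomment to run them.
# import subprocess
# subprocess.run(["pip", "install", "cuda-python"], check=True)
# subprocess.run(["pip", "install", "--pre", "cupy"],
#                env=cuda_python_build_env(), check=True)

print(cuda_python_build_env({})["CUPY_USE_CUDA_PYTHON"])
```

Returning a copy leaves the parent process environment untouched, so the flag only affects the build subprocess that receives it.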

Support for AMD ROCm 4.3

Support for ROCm 4.3 has been added in this release, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH variable to the llvm folder included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
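
When a shell export is inconvenient, the same workaround can be applied from Python. The sketch below is an illustration only: the helper name `apply_rocm43_llvm_workaround` is hypothetical, the default root mirrors the example path above, and it is an assumption that the variable has to be set before CuPy is imported.

```python
import os


def apply_rocm43_llvm_workaround(rocm_root="/opt/rocm-4.3"):
    """Point LLVM_PATH at the llvm folder of a ROCm 4.3 installation.

    An LLVM_PATH that is already set is left untouched.
    Returns the value that is in effect afterwards.
    """
    llvm_path = rocm_root.rstrip("/") + "/llvm"
    return os.environ.setdefault("LLVM_PATH", llvm_path)


# Assumption: call this before `import cupy` so the setting is seen in time.
print(apply_rocm43_llvm_workaround())
```

Using `setdefault` respects a value the user has already exported, which matters when ROCm is installed somewhere other than the default location.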

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

As per the discussion in #5671, we will stop uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the assets section of each GitHub release page (e.g., pip install cupy-cudaXXX -f https://github.com/cupy/cupy/releases/tag/v10.0.0b2). Note that the sdist package is available on PyPI for all versions.

We are also going to remove outdated (v8.0.0rc1 or earlier) pre-release binary wheels from PyPI on September 20th. See #5667 for details.

Changes

New Features

Support batched QR solver (#5583)

Add cupyx.scipy.sparse.linalg.minres (#5585)

Add Log Series distribution to cupy.random.Generator (#5618)

Add Power distribution to cupy.random.Generator (#5624)

Add support for CUDA Python (#5638)

Add Chi-square distribution to cupy.random.Generator (#5645)

Add Dirichlet distribution to cupy.random.Generator (#5648)

Add F distribution to cupy.random.Generator (#5655)

Enhancements

Add ncclAvg and ncclBfloat16 for NCCL (#5545)

Add new eigensolvers from rocSOLVER (#5555)

Add support for array input in beta distribution of cupy.random.Generator (#5573)

Release the GIL for several NCCL ops (#5574)

Allow to compile using PTX with an envvar (#5622)

Show CUDA Python version (#5651)

Fix version check for new ROCm version definition (#5657)

Rest of version check fix for new ROCm version definition (#5660)

Add ROCm 4.3 in duplicate detection (#5669)

Bug Fixes

Fix compute capability check (#5600)

Fix FFT convolve for shapes containing 1 (#5609)

Fix squareness checks (#5642)

Fix unique for empty array (#5654)

Code Fixes

Add batch_identity helper (#5614)

Remove unnecessary comments (#5631)

Documentation

Update Sphinx to 4.1.2 (#5612)

Fix random docstring (#5628)

Support ROCm v4.3 in document (#5633)

__array_function__ feature by default (#5644)

Tests

Fix skipTest in test_decomp_lu (#5593)

Mark lsmr tests xfail for CSR matrices on HIP (#5597)

Increase test timeout (#5601)

Fix cubic for_all_dtypes_combination tests (#5629)

Add CI for ROCm 4.3 (#5630)

Reload GPG key for ROCm 4.2 test (#5636)

Fix branch name of cuda-python (#5650)

Add a workaround for ROCm 4.3.0 for testing (#5662)

Others

Add cupy-cuda114 to duplicate detection (#5621)

Bump version to v10.0.0b2 (#5679)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@hauntsaninja @leofang @povinsahu1909 @yashasvimisra2798
cupy_cuda114-10.0.0b2-cp37-cp37m-win_amd64.whl(59.21 MB)
cupy_cuda114-10.0.0b2-cp38-cp38-manylinux1_x86_64.whl(77.91 MB)
cupy_cuda114-10.0.0b2-cp38-cp38-win_amd64.whl(59.35 MB)
cupy_cuda114-10.0.0b2-cp39-cp39-manylinux1_x86_64.whl(76.24 MB)
cupy_cuda114-10.0.0b2-cp39-cp39-win_amd64.whl(59.32 MB)
cupy_rocm_4_0-10.0.0b2-cp36-cp36m-manylinux1_x86_64.whl(33.27 MB)
cupy_rocm_4_0-10.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(33.21 MB)
cupy_rocm_4_0-10.0.0b2-cp38-cp38-manylinux1_x86_64.whl(35.83 MB)
cupy_rocm_4_0-10.0.0b2-cp39-cp39-manylinux1_x86_64.whl(34.40 MB)
cupy_rocm_4_2-10.0.0b2-cp36-cp36m-manylinux1_x86_64.whl(31.95 MB)
cupy_rocm_4_2-10.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(31.90 MB)
cupy_rocm_4_2-10.0.0b2-cp38-cp38-manylinux1_x86_64.whl(34.53 MB)
cupy_rocm_4_2-10.0.0b2-cp39-cp39-manylinux1_x86_64.whl(33.09 MB)
cupy_rocm_4_3-10.0.0b2-cp36-cp36m-manylinux1_x86_64.whl(31.10 MB)
cupy_rocm_4_3-10.0.0b2-cp37-cp37m-manylinux1_x86_64.whl(31.04 MB)
cupy_rocm_4_3-10.0.0b2-cp38-cp38-manylinux1_x86_64.whl(33.67 MB)
cupy_rocm_4_3-10.0.0b2-cp39-cp39-manylinux1_x86_64.whl(32.23 MB)
v9.4.0 (Aug 26, 2021)

This is the release note of v9.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

The NVRTC compile process now produces SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit version can run on earlier CUDA drivers. Check the CUDA Compatibility Guide and the NVRTC documentation for detailed information. We believe most users will not be affected by this change, but you can revert to the previous behavior by setting the CUPY_COMPILE_WITH_PTX=1 environment variable.
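
As a minimal sketch of the opt-out described above, the variable can be placed in the environment of the process that will import CuPy. The helper name `ptx_compile_env` is hypothetical and not part of CuPy; only the `CUPY_COMPILE_WITH_PTX` variable comes from this release note. Setting it before the process starts is a conservative assumption about when CuPy reads it.

```python
import os


def ptx_compile_env(env=None):
    """Return a copy of ``env`` (default: os.environ) with PTX compilation enabled.

    ``ptx_compile_env`` is an illustrative helper, not a CuPy API.
    """
    new_env = dict(os.environ if env is None else env)
    # Documented switch from the release note: revert to PTX instead of SASS.
    new_env["CUPY_COMPILE_WITH_PTX"] = "1"
    return new_env


# Typical use (not executed here): run a CuPy script in a child process.
# import subprocess, sys
# subprocess.run([sys.executable, "my_cupy_script.py"], env=ptx_compile_env())
```

Building a fresh environment for a child process avoids depending on whether CuPy has already been imported in the current interpreter.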

Support for AMD ROCm 4.3

This release adds support for ROCm 4.3, and binary wheels are provided as well. Note that ROCm 4.3 currently has an issue that prevents it from running in several environments. The current workaround is to set the LLVM_PATH variable to the llvm folder included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
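
The same workaround can be sketched in Python for cases where CuPy is launched from a script rather than a shell. The helper name `rocm_llvm_env` and its `rocm_root` parameter are hypothetical; the default path is the example from the note above and may differ on your system.

```python
import os
import posixpath


def rocm_llvm_env(rocm_root="/opt/rocm-4.3", env=None):
    """Return a copy of ``env`` with LLVM_PATH pointing at ROCm's bundled llvm folder.

    ``rocm_llvm_env`` is an illustrative helper, not a CuPy or ROCm API.
    """
    new_env = dict(os.environ if env is None else env)
    # Keep an LLVM_PATH the user has already set; otherwise apply the workaround.
    new_env.setdefault("LLVM_PATH", posixpath.join(rocm_root, "llvm"))
    return new_env


# Typical use (not executed here):
# import subprocess, sys
# subprocess.run([sys.executable, "my_cupy_script.py"], env=rocm_llvm_env())
```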

Changes

Enhancements

Compile with SASS for CUDA versions >= 11.1 (#5611)

Allow to compile using PTX with an envvar (#5634)

Add ncclAvg and ncclBfloat16 for NCCL (#5656)

Fix version check for new ROCm version definition (#5661)

Rest of version check fix for new ROCm version definition (#5670)

Bug Fixes

Fix FFT convolve for shapes containing 1 (#5613)

Fix the RTC call path for HIP (#5620)

Fix compute capability check (#5646)

Fix squareness checks (#5652)

Fix unique for empty array (#5658)

Code Fixes

Fix kernel names to be consistent (#5625)

Remove unnecessary comments (#5635)

Documentation

Update Sphinx to 4.1.2 (#5616)

__array_function__ feature by default (#5653)

Support ROCm v4.3 in document (#5674)

Tests

Increase test timeout (#5615)

Increase timeout for CUDA 11.4 tests (#5617)

Add CI for ROCm 4.3 (#5632)

Reload GPG key for ROCm 4.2 test (#5637)

Fix cubic for_all_dtypes_combination tests (#5639)

Add a workaround for ROCm 4.3.0 for testing (#5663)

Fix skipTest in test_decomp_lu (#5672)

Others

Bump version to v9.4.0 (#5680)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@grlee77 @leofang @yashasvimisra2798
cupy_cuda100-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(56.27 MB)
cupy_cuda100-9.4.0-cp36-cp36m-win_amd64.whl(40.11 MB)
cupy_cuda100-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(56.21 MB)
cupy_cuda100-9.4.0-cp37-cp37m-win_amd64.whl(40.11 MB)
cupy_cuda100-9.4.0-cp38-cp38-manylinux1_x86_64.whl(58.44 MB)
cupy_cuda100-9.4.0-cp38-cp38-win_amd64.whl(40.25 MB)
cupy_cuda100-9.4.0-cp39-cp39-manylinux1_x86_64.whl(57.00 MB)
cupy_cuda100-9.4.0-cp39-cp39-win_amd64.whl(40.22 MB)
cupy_cuda101-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(57.52 MB)
cupy_cuda101-9.4.0-cp36-cp36m-win_amd64.whl(40.95 MB)
cupy_cuda101-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(57.45 MB)
cupy_cuda101-9.4.0-cp37-cp37m-win_amd64.whl(40.95 MB)
cupy_cuda101-9.4.0-cp38-cp38-manylinux1_x86_64.whl(59.74 MB)
cupy_cuda101-9.4.0-cp38-cp38-win_amd64.whl(41.09 MB)
cupy_cuda101-9.4.0-cp39-cp39-manylinux1_x86_64.whl(58.26 MB)
cupy_cuda101-9.4.0-cp39-cp39-win_amd64.whl(41.06 MB)
cupy_cuda102-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(57.92 MB)
cupy_cuda102-9.4.0-cp36-cp36m-win_amd64.whl(41.74 MB)
cupy_cuda102-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(57.86 MB)
cupy_cuda102-9.4.0-cp37-cp37m-win_amd64.whl(41.74 MB)
cupy_cuda102-9.4.0-cp38-cp38-manylinux1_x86_64.whl(60.14 MB)
cupy_cuda102-9.4.0-cp38-cp38-win_amd64.whl(41.88 MB)
cupy_cuda102-9.4.0-cp39-cp39-manylinux1_x86_64.whl(58.66 MB)
cupy_cuda102-9.4.0-cp39-cp39-win_amd64.whl(41.84 MB)
cupy_cuda110-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(71.89 MB)
cupy_cuda110-9.4.0-cp36-cp36m-win_amd64.whl(56.32 MB)
cupy_cuda110-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(71.84 MB)
cupy_cuda110-9.4.0-cp37-cp37m-win_amd64.whl(56.32 MB)
cupy_cuda110-9.4.0-cp38-cp38-manylinux1_x86_64.whl(74.86 MB)
cupy_cuda110-9.4.0-cp38-cp38-win_amd64.whl(56.46 MB)
cupy_cuda110-9.4.0-cp39-cp39-manylinux1_x86_64.whl(73.24 MB)
cupy_cuda110-9.4.0-cp39-cp39-win_amd64.whl(56.43 MB)
cupy_cuda111-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(90.39 MB)
cupy_cuda111-9.4.0-cp36-cp36m-win_amd64.whl(75.73 MB)
cupy_cuda111-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(90.33 MB)
cupy_cuda111-9.4.0-cp37-cp37m-win_amd64.whl(75.73 MB)
cupy_cuda111-9.4.0-cp38-cp38-manylinux1_x86_64.whl(93.36 MB)
cupy_cuda111-9.4.0-cp38-cp38-win_amd64.whl(75.87 MB)
cupy_cuda111-9.4.0-cp39-cp39-manylinux1_x86_64.whl(91.74 MB)
cupy_cuda111-9.4.0-cp39-cp39-win_amd64.whl(75.83 MB)
cupy_cuda112-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(72.44 MB)
cupy_cuda112-9.4.0-cp36-cp36m-win_amd64.whl(56.94 MB)
cupy_cuda112-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(72.39 MB)
cupy_cuda112-9.4.0-cp37-cp37m-win_amd64.whl(56.94 MB)
cupy_cuda112-9.4.0-cp38-cp38-manylinux1_x86_64.whl(75.41 MB)
cupy_cuda112-9.4.0-cp38-cp38-win_amd64.whl(57.08 MB)
cupy_cuda112-9.4.0-cp39-cp39-manylinux1_x86_64.whl(73.79 MB)
cupy_cuda112-9.4.0-cp39-cp39-win_amd64.whl(57.04 MB)
cupy_cuda113-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(69.62 MB)
cupy_cuda113-9.4.0-cp36-cp36m-win_amd64.whl(54.14 MB)
cupy_cuda113-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(69.56 MB)
cupy_cuda113-9.4.0-cp37-cp37m-win_amd64.whl(54.14 MB)
cupy_cuda113-9.4.0-cp38-cp38-manylinux1_x86_64.whl(72.58 MB)
cupy_cuda113-9.4.0-cp38-cp38-win_amd64.whl(54.28 MB)
cupy_cuda113-9.4.0-cp39-cp39-manylinux1_x86_64.whl(70.96 MB)
cupy_cuda113-9.4.0-cp39-cp39-win_amd64.whl(54.24 MB)
cupy_cuda114-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(73.35 MB)
cupy_cuda114-9.4.0-cp36-cp36m-win_amd64.whl(58.00 MB)
cupy_cuda114-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(73.30 MB)
cupy_cuda114-9.4.0-cp37-cp37m-win_amd64.whl(58.00 MB)
cupy_cuda114-9.4.0-cp38-cp38-manylinux1_x86_64.whl(76.32 MB)
cupy_cuda114-9.4.0-cp38-cp38-win_amd64.whl(58.14 MB)
cupy_cuda114-9.4.0-cp39-cp39-manylinux1_x86_64.whl(74.70 MB)
cupy_cuda114-9.4.0-cp39-cp39-win_amd64.whl(58.10 MB)
cupy_cuda92-9.4.0-cp36-cp36m-manylinux1_x86_64.whl(52.54 MB)
cupy_cuda92-9.4.0-cp36-cp36m-win_amd64.whl(36.74 MB)
cupy_cuda92-9.4.0-cp37-cp37m-manylinux1_x86_64.whl(52.48 MB)
cupy_cuda92-9.4.0-cp37-cp37m-win_amd64.whl(36.74 MB)
cupy_cuda92-9.4.0-cp38-cp38-manylinux1_x86_64.whl(54.70 MB)
cupy_cuda92-9.4.0-cp38-cp38-win_amd64.whl(36.88 MB)
cupy_cuda92-9.4.0-cp39-cp39-manylinux1_x86_64.whl(53.26 MB)
cupy_cuda92-9.4.0-cp39-cp39-win_amd64.whl(36.85 MB)
v10.0.0b1 (Aug 5, 2021)

This is the release note of v10.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy now supports CUDA 11.4 (cupy-cuda114)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Support for compute capability 8.6 (compute_86), used by GPUs of the RTX 30X0 and AX000 series, is also added.

Google Summer of Code

CuPy is participating in Google Summer of Code under the NumFOCUS organization.

Our student @povinsahu1909 is working hard to add support for sparse linear algebra solvers and to increase the compatibility of the new random number generation API.

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

The NVRTC compile process now produces SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit version can run on earlier CUDA drivers. Check the CUDA Compatibility Guide and the NVRTC documentation for detailed information.

Changes without compatibility

Support the new DLPack exchange protocol (#5306)

CuPy now adopts the new DLPack exchange protocol proposed in the Python array API standard. As a result, cupy.fromDlpack has been deprecated in favor of cupy.from_dlpack.
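
A minimal sketch of migrating to the new entry point follows. It assumes that the new protocol means the producer object exposes `__dlpack__` and `__dlpack_device__`, as described in the Python array API standard. The helper names `supports_dlpack` and `to_cupy` are hypothetical; only `cupy.from_dlpack` is the API named in this release note. CuPy is imported lazily, so the conversion itself needs CuPy and a GPU.

```python
def supports_dlpack(obj):
    """Return True if ``obj`` exposes the two methods of the new DLPack protocol."""
    return hasattr(obj, "__dlpack__") and hasattr(obj, "__dlpack_device__")


def to_cupy(obj):
    """Convert a DLPack-capable object to a CuPy array via cupy.from_dlpack.

    Illustrative helper, not a CuPy API.
    """
    if not supports_dlpack(obj):
        raise TypeError("object does not implement __dlpack__/__dlpack_device__")
    import cupy  # lazy import: only needed when a conversion actually happens

    # New-style call: pass the producer object itself, not a capsule.
    return cupy.from_dlpack(obj)
```

Code that previously passed a capsule to `cupy.fromDlpack` would instead pass the array object directly to `cupy.from_dlpack`.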

Known Issues

The cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not yet available on PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

Texture memory 2D/3D affine transformations (#5171)

Support the new DLPack exchange protocol (#5306)

Add cupyx.scipy.sparse.linalg.lsmr (#5331)

JIT: Support all atomic intrinsics (#5387)

Expose _GUFunc through cupyx (#5408)

Add geometric distribution to new Generator (#5443)

Support Numba-like jit.gridsize() syntax in CuPy JIT (#5461)

Support Numba-like jit.laneid() and jit.warpsize syntax in CuPy JIT (#5462)

Add cupyx.scipy.sparse.linalg.cgs (#5524)

Add hypergeometric distribution to new Generator (#5560)

Enhancements

Compile with SASS for CUDA versions >= 11.1 (#5097)

Support NCCL v2.9.9 (#5268)

Support CUDA 11.4 and compute_86 (#5434)

Update NumPy/SciPy pinning in setup.py (#5453)

Make matrix_power support stacked matrices (#5458)

Support hipSPARSE and fix streams not set in some generic APIs in cuSPARSE (#5472)

Add cudaDeviceDisablePeerAccess wrapper (#5495)

Support cuDNN v8.2.2 (#5516)

Support NCCL v2.10.3: library installer and document (#5521)

Bug Fixes

JIT: Fix supported dtype of atomic_add on HIP (#5383)

Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5389)

Fix astype from boolean (#5410)

Fix compatibility issues of ndarray.view (#5428)

Fix types attribute of ufunc (#5448)

Fix new DLPack protocol error messages and tests (#5449)

texture_memory option in affine_transform not supported by HIP (#5464)

Fix linalg.lstsq for empty matrix (#5467)

Fix reshape (#5470)

Fix random generator output not being raveled (#5478)

Fix random integers (#5479)

Fix availability tests in cuSOLVER and cuSPARSE (#5492)

Add missing hipSPARSE include to builder (#5515)

prune cuFFT static lib by major cc ver (#5531)

Fix casts from bool in ufunc inputs (#5539)

Access cudaMemoryType in the pointer attributes and fix for HIP (#5544)

Fix casts in ufunc outputs (#5550)

Code fix for {cu, roc}SOLVER (#5558)

Fix CUDA API call on module initialization (#5561)

Fix the RTC call path for HIP (#5569)

Fix broadcast error messages (#5579)

Code Fixes

Do not call cudnnGetVersion on import (#5326)

JIT: Fix __call__() for built-in functions (#5361)

Add HIP symbol redefinitions (#5362)

Remove the data member use_32bit_indexing from CArray (#5376)

Use dtype.name instead dtype.char (#5444)

Try to use -I in hipRTC (#5486)

Hide modules from public APIs (#5522)

consistent kernel names (#5551)

Use the new macro __HIP_PLATFORM_AMD__ at build time (#5554)

Documentation

Add upgrade guide for v10 (#5278)

Update tag lines in package description and docs index (#5399)

Fix typo in apply_along_axis (#5432)

Fix indent of Returns section (#5433)

Update user_guide/basic.rst device agnostic section (#5435)

Support CUDA 11.4 on documents (#5447)

Update install guide with new NumPy/SciPy versions (#5454)

Use from_dlpack instead of fromDlpack (#5488)

Use Sphinx 4.1.0 (#5489)

Bump ReadTheDocs configuration to version 2 (#5491)

Fix docs of eigh and eigvalsh (#5494)

Add a lingering doc page for fromDlpack() (#5509)

Document scipy.fft backend usage (#5514)

Replaced the links for NumPy docs as per issue #3418 (#5548)

Use Sphinx's envvar construct (#5570)

Fix intersphinx for SciPy 1.7.1 docs (#5587)

Installation

Fix license_file option in setup.cfg (#5406)

Import numpy before Cython (#5482)

Tests

Add tests for num_to_num's optional parameters (#5337)

Add script for ROCm CI on Jenkins (#5378)

Skip unwrap tests for numpy<1.21 (#5384)

Enable strict xfail in pytest (#5407)

Remove xfail in windows jitify test (#5409)

Fix preloading slow tests (#5440)

Add script for CUDA 11.4 CI on FlexCI (#5457)

Increase memory for CUDA 11.4 tests (#5477)

Fix DLPack test for ROCm/HIP (#5485)

Fix "Revert test decorators order" (#5498)

Fix some tests for HIP (#5501)

Fix FlexCI Linux tests (#5505)

Add CUDA 11.4 for FlexCI helper script (#5528)

Increase timeout for CUDA 11.4 tests (#5575)

Update tests to install all requirements and add PATH (#5576)

Add Cython to all requirements (#5577)

Others

Notify conflict by mergify (#5371)

Fix mergify to only comment when pull-request is open (#5439)

Fix mergify condition (#5513)

Add auto notify bot for hip label (#5538)

Use pull_request_target instead for auto notify bot (#5541)

Fix auto notify bot for issues (#5546)

Disable Mergify's auto-merge (#5556)

Bump version to v10.0.0b1 (#5595)

Fix signal tests for scipy 1.7.0 (#5368)

Fix numpy.unwrap for NumPy 1.21 (#5385)

Fix signaltools medfilt for scipy>=1.7.0 (#5386)

Fix deprecated numpy.typeDict utilization (#5388)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @grlee77 @leofang @maxim-belkin @Palash-Vishnani @povinsahu1909 @the-lay
cupy_cuda100-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(57.44 MB)
cupy_cuda100-10.0.0b1-cp36-cp36m-win_amd64.whl(41.13 MB)
cupy_cuda100-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(57.38 MB)
cupy_cuda100-10.0.0b1-cp37-cp37m-win_amd64.whl(41.13 MB)
cupy_cuda100-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(59.64 MB)
cupy_cuda100-10.0.0b1-cp38-cp38-win_amd64.whl(41.27 MB)
cupy_cuda100-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(58.18 MB)
cupy_cuda100-10.0.0b1-cp39-cp39-win_amd64.whl(41.24 MB)
cupy_cuda101-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(58.75 MB)
cupy_cuda101-10.0.0b1-cp36-cp36m-win_amd64.whl(42.07 MB)
cupy_cuda101-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(58.69 MB)
cupy_cuda101-10.0.0b1-cp37-cp37m-win_amd64.whl(42.07 MB)
cupy_cuda101-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(61.00 MB)
cupy_cuda101-10.0.0b1-cp38-cp38-win_amd64.whl(42.21 MB)
cupy_cuda101-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(59.50 MB)
cupy_cuda101-10.0.0b1-cp39-cp39-win_amd64.whl(42.18 MB)
cupy_cuda102-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(59.16 MB)
cupy_cuda102-10.0.0b1-cp36-cp36m-win_amd64.whl(42.85 MB)
cupy_cuda102-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(59.10 MB)
cupy_cuda102-10.0.0b1-cp37-cp37m-win_amd64.whl(42.84 MB)
cupy_cuda102-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(61.41 MB)
cupy_cuda102-10.0.0b1-cp38-cp38-win_amd64.whl(42.99 MB)
cupy_cuda102-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(59.91 MB)
cupy_cuda102-10.0.0b1-cp39-cp39-win_amd64.whl(42.95 MB)
cupy_cuda110-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(73.27 MB)
cupy_cuda110-10.0.0b1-cp36-cp36m-win_amd64.whl(57.58 MB)
cupy_cuda110-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(73.21 MB)
cupy_cuda110-10.0.0b1-cp37-cp37m-win_amd64.whl(57.58 MB)
cupy_cuda110-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(76.28 MB)
cupy_cuda110-10.0.0b1-cp38-cp38-win_amd64.whl(57.72 MB)
cupy_cuda110-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(74.64 MB)
cupy_cuda110-10.0.0b1-cp39-cp39-win_amd64.whl(57.69 MB)
cupy_cuda111-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(92.11 MB)
cupy_cuda111-10.0.0b1-cp36-cp36m-win_amd64.whl(77.34 MB)
cupy_cuda111-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(92.05 MB)
cupy_cuda111-10.0.0b1-cp37-cp37m-win_amd64.whl(77.34 MB)
cupy_cuda111-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(95.12 MB)
cupy_cuda111-10.0.0b1-cp38-cp38-win_amd64.whl(77.48 MB)
cupy_cuda111-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(93.48 MB)
cupy_cuda111-10.0.0b1-cp39-cp39-win_amd64.whl(77.45 MB)
cupy_cuda112-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(73.41 MB)
cupy_cuda112-10.0.0b1-cp36-cp36m-win_amd64.whl(57.78 MB)
cupy_cuda112-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(73.35 MB)
cupy_cuda112-10.0.0b1-cp37-cp37m-win_amd64.whl(57.78 MB)
cupy_cuda112-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(76.43 MB)
cupy_cuda112-10.0.0b1-cp38-cp38-win_amd64.whl(57.92 MB)
cupy_cuda112-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(74.78 MB)
cupy_cuda112-10.0.0b1-cp39-cp39-win_amd64.whl(57.89 MB)
cupy_cuda113-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(70.58 MB)
cupy_cuda113-10.0.0b1-cp36-cp36m-win_amd64.whl(54.98 MB)
cupy_cuda113-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(70.53 MB)
cupy_cuda113-10.0.0b1-cp37-cp37m-win_amd64.whl(54.98 MB)
cupy_cuda113-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(73.60 MB)
cupy_cuda113-10.0.0b1-cp38-cp38-win_amd64.whl(55.12 MB)
cupy_cuda113-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(71.95 MB)
cupy_cuda113-10.0.0b1-cp39-cp39-win_amd64.whl(55.08 MB)
cupy_cuda114-10.0.0b1-cp36-cp36m-manylinux1_x86_64.whl(74.32 MB)
cupy_cuda114-10.0.0b1-cp36-cp36m-win_amd64.whl(58.83 MB)
cupy_cuda114-10.0.0b1-cp37-cp37m-manylinux1_x86_64.whl(74.26 MB)
cupy_cuda114-10.0.0b1-cp37-cp37m-win_amd64.whl(58.83 MB)
cupy_cuda114-10.0.0b1-cp38-cp38-manylinux1_x86_64.whl(77.33 MB)
cupy_cuda114-10.0.0b1-cp38-cp38-win_amd64.whl(58.98 MB)
cupy_cuda114-10.0.0b1-cp39-cp39-manylinux1_x86_64.whl(75.69 MB)
cupy_cuda114-10.0.0b1-cp39-cp39-win_amd64.whl(58.94 MB)

