cuDF - GPU DataFrame Library

RAPIDS

Last update: Dec 31, 2022


Overview

cuDF - GPU DataFrames

NOTE: For the latest stable README.md ensure you are on the main branch.

Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

cuDF provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to accelerate their workflows without going into the details of CUDA programming.

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:

import cudf
import requests
from io import StringIO

url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')

tips_df = cudf.read_csv(StringIO(content))
tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())

Output:

size
1    21.729201548727808
2    16.571919173482897
3    15.215685473711837
4    14.594900639351332
5    14.149548965142023
6    15.622920072028379
Name: tip_percentage, dtype: float64

For additional examples, browse our complete API documentation, or check out our more detailed notebooks.

Quick Start

Please see the Demo Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready-to-run Docker container with example notebooks and data, showcasing how you can use cuDF.

Installation

CUDA/GPU requirements

CUDA 10.1+
NVIDIA driver 418.39+
Pascal architecture or better (Compute Capability >=6.0)

Conda

cuDF can be installed with conda (miniconda, or the full Anaconda distribution) from the rapidsai channel:

For cudf version == 0.18 :

# for CUDA 10.1
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=0.18 python=3.7 cudatoolkit=10.1

# or, for CUDA 10.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=0.18 python=3.7 cudatoolkit=10.2

For the nightly version of cudf :

# for CUDA 10.1
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=10.1

# or, for CUDA 10.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=10.2

Note: cuDF is supported only on Linux, and with Python versions 3.7 and later.

See the Get RAPIDS version picker for more OS and version info.

Build/Install from Source

See build instructions.

Contributing

Please see our guide for contributing to cuDF.

Contact

Find out more details on the RAPIDS site

Open GPU Data Science

The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, while exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Apache Arrow on GPU

The GPU version of Apache Arrow is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow is supported.
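To make the columnar layout concrete, here is a minimal pure-Python sketch of how an Arrow-style column can represent a nullable integer array: one contiguous values buffer plus a validity bitmap with one bit per row, least-significant bit first. The helper name `to_arrow_style_buffers` is hypothetical, and this is not cuDF's or Arrow's implementation; real buffers are typed, aligned, and padded.

```python
def to_arrow_style_buffers(values):
    """Split a list with None entries into (data, validity_bitmap).

    Hypothetical helper for illustration only.
    """
    # Null slots still occupy a position in the data buffer; 0 is a placeholder.
    data = [0 if v is None else v for v in values]
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # bit i set means row i is valid
    return data, bytes(bitmap)


data, bitmap = to_arrow_style_buffers([1, None, 3, 4, None])
print(data)                       # [1, 0, 3, 4, 0]
print(format(bitmap[0], "08b"))   # 00001101
```

Keeping nulls in a separate bitmap means the values buffer stays densely packed, which is what makes the format convenient to process in bulk.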

Comments

Make a plan for sort_values/set_index

It would be nice to be able to use the set_index method to sort the dataframe by a particular column.

There are currently two implementations for this: one in dask.dataframe and one in dask-cudf, which uses a Batcher sorting network. While most dask-cudf code has been removed in favor of the dask.dataframe implementations, this sorting code has remained, mostly because I don't fully understand it and don't know whether there was a reason for this particular implementation.

Why was this implementation chosen? Was this discussed somewhere? Alternatively @sklam, do you have any information here?
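For readers unfamiliar with the term, a Batcher sorting network is a fixed sequence of compare-and-swap steps that does not depend on the data. The sketch below generates the comparators of Batcher's odd-even merge sort in plain Python. It is a generic illustration, not the dask-cudf code, and it assumes the input length is a power of two; the function names are hypothetical.

```python
def batcher_network(n):
    """Return the comparator pairs of Batcher's odd-even merge sort for n inputs."""
    pairs = []
    p = 1
    while p < n:                      # p = size of the sorted runs being merged
        k = p
        while k >= 1:                 # k = distance between compared elements
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    lo, hi = i + j, i + j + k
                    # Only compare elements that belong to the same merge block.
                    if lo // (2 * p) == hi // (2 * p):
                        pairs.append((lo, hi))
            k //= 2
        p *= 2
    return pairs


def sort_with_network(values):
    """Sort by applying the fixed comparator sequence to a copy of values."""
    out = list(values)
    for lo, hi in batcher_network(len(out)):
        if out[lo] > out[hi]:
            out[lo], out[hi] = out[hi], out[lo]
    return out


print(batcher_network(4))  # [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
print(sort_with_network([7, 3, 8, 1, 6, 2, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the comparator sequence is known in advance, independent comparators can run in parallel, which is one plausible reason such a network would be attractive for a distributed or GPU sort.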

cc @kkraus14 @randerzander
cuDF (Python) dask dask-cudf

opened by mrocklin 80
[WIP] Update cudf.to_parquet to use new GPU accelerated Parquet writer

Update cudf.to_parquet to use the new GPU-accelerated Parquet writer. This includes creating the appropriate C++ interface in io_writers and io_functions, along with modifications to the Parquet pyx and pxd files.

This closes #3574
libcudf cuDF (Python) 5 - Ready to Merge cuIO Cython

opened by jdye64 66
[QST] cuDF performance with gridsearchcv

In a conversation with @kkraus14 about cudf usage with cuml+gridsearch, we looked at cudf performance. Attached is a profile plot of running gridsearch+cuml+cudf.

Folks can download the full dask profile here: https://gist.github.com/quasiben/1da49c5aa6e61d979dd42ce6c50e79b3

In the image above you can see that the computation is spending ~80% of its time in the iloc call. My initial thought was that iloc/_prepare_series_for_add could be improved. I believe @kkraus14 suggested we look at host_to_device transfers to see if we can build the requisite indices in cuda/cupy/numba_cuda instead of numpy (this is required during splitting/kfold calls).
question Performance

opened by quasiben 59
[DISCUSSION] libcudf column abstraction redesign
Creating and interfacing with the gdf_column C struct is rife with problems. We need better abstraction(s) to make libcudf a safer and more pleasant library to use and develop.

In lieu of adding support for any new column types, the initial goal of a cudf::column design is to ease the current pain points in creating and interfacing with the column data structure in libcudf.

Goals:

Identify pain points with existing gdf_column structure

Derive requirements for an abstraction or set of abstractions to ease those pain points

Define an API design that satisfies the requirements

Provide a working implementation of the design

Non-Goals

Derive requirements to support new column types, e.g., variable width elements, compressed columns, etc.

Support delayed materialization or lazy evaluation

Note that a “Non-Goal” is not something that we want to expressly forbid in our redesign, but rather something that is not the focus of the current effort. Whenever possible, we can make design decisions that will enable these “Non-Goals” sometime in the future, so long as those decisions do not compromise the timely accomplishment of the above “Goals”.

Process

1. Gather pain points

Those who wish to participate should list 3-5 pain points (in priority order) that they would like to solve with the column redesign.

Note that choosing to participate implies a commitment to putting in the effort to derive requirements and provide feedback on designs, i.e., if you want something to change, you’re expected to put in the work to make it happen.

Pain points should be submitted by responding to this issue.

@jrhemstad will take responsibility for gathering pain points and distilling/organizing based on functional area.

Proposed Deadline: 0.7 release

2. Derive requirements

Distill pain points into satisfiable requirements

@jrhemstad will take responsibility for providing an initial draft of requirements from pain points and distributing for feedback.

Stakeholders will provide feedback on requirements and iterate until consensus is reached on initial requirements

Proposed Deadline: 0.8 Release

3. Design Mock Up

Create draft interface of class(es) that attempt to satisfy requirements.

APIs should be fully Doxymented.

Code does not need to function nor compile

Design should be submitted via a PR to cuDF

TBD will take responsibility for providing an initial interface design

Stakeholders will provide feedback and iterate until consensus is reached on design

Proposed Deadline: 0.8 Release

4. Implementation

Implement the agreed upon interface

Should provide Google Test unit tests

Implementation/testing will likely expose necessary design changes

Implementation should be submitted as a PR to cuDF

TBD will take responsibility for implementing/testing the design

Stakeholders will review implementation PR until consensus is reached

Proposed Deadline 0.8 Release

5. Initial Refactor

Two candidate libcudf features shall be chosen for refactoring to use the new cudf::column abstraction

Two developers (TBD) will take responsibility for refactoring the features (one each) to use the newly designed abstraction(s) and submitting a cuDF PR for review. At least one of the developers shall be different from the developer who designed and implemented the column abstraction.

Any required design changes exposed in refactoring shall be discussed in the PR

Stakeholders will review refactored feature until consensus is reached

TBD will be responsible for creating/amending a style guide with lessons learned and best practices for refactoring a feature using gdf_column to the new abstraction(s)

Proposed Deadline: 0.9 Release

6. Full Refactor

Remaining libcudf features will be refactored one at a time to use the new column abstraction(s)

The style guide mentioned above will be distributed to all libcudf developers to provide guidance in this refactoring effort

This will be an ongoing process that likely will not be fully complete for several releases

feature request help wanted proposal libcudf
opened by jrhemstad 55
[FEA] Make cudf::size_type 64-bit
Is your feature request related to a problem? Please describe.

cudf::size_type is currently an int32_t, which limits column size to two billion elements (MAX_INT). Moreover, it limits child column size to the same. This causes problems, for example, for string columns, where there may be fewer than 2B strings, but the character data to represent them could easily exceed 2B characters.

A 32-bit size was originally chosen to ensure compatibility with Apache Arrow, which dictates that Arrow arrays have a 32-bit size, and that larger arrays are made by chunking into individual Arrays.
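The string-column case above comes down to simple arithmetic. The following sketch checks whether the total character data of a column fits in a signed 32-bit size; the function name and the example row counts are illustrative assumptions, not values from the issue.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit size can hold


def chars_fit_in_int32(num_strings, avg_bytes_per_string):
    """Return True if the total character bytes fit in a 32-bit size."""
    return num_strings * avg_bytes_per_string <= INT32_MAX


# 100 million strings is far below the 2B row limit, yet the character
# data overflows once strings average more than about 21 bytes.
print(chars_fit_in_int32(100_000_000, 20))  # True  (2.0e9 bytes)
print(chars_fit_in_int32(100_000_000, 30))  # False (3.0e9 bytes)
```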

Describe the solution you'd like

Change size_type to be an int64_t.

Handle compatibility with Arrow by creating arrow chunked arrays in the libcudf to_arrow interface (not yet created), and combine arrow chunked arrays in the libcudf from_arrow interface. This can be dealt with when we create these APIs.
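As a rough sketch of the chunking idea, a column with a 64-bit row count could be split into row ranges that each fit in a 32-bit Arrow array. The helper below is hypothetical (the to_arrow/from_arrow interfaces are described above as not yet created) and only computes the range boundaries.

```python
INT32_MAX = 2**31 - 1


def chunk_bounds(num_rows, max_chunk=INT32_MAX):
    """Yield (start, stop) row ranges, each at most max_chunk rows long."""
    for start in range(0, num_rows, max_chunk):
        yield start, min(start + max_chunk, num_rows)


print(list(chunk_bounds(10, max_chunk=4)))      # [(0, 4), (4, 8), (8, 10)]
print(len(list(chunk_bounds(5_000_000_000))))   # 3
```

Going the other way (from_arrow) would amount to concatenating the chunks back into one column.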

Describe alternatives you've considered

Chunked columns. This would be very challenging -- supporting chunked columns in every algorithm would result in complex distributed algorithms and implementations, where libcudf currently aims to be communication agnostic / ignorant. In other words, a higher level library handles distributed algorithms.

Additional context

A potential downside: @felipeblazing called us brave for considering supporting chunked columns. If we implement this feature request, perhaps he will not consider us quite so brave. :(
feature request wontfix libcudf
opened by harrism 52
[Discussion] Requirements for schema/column names

There have been a number of requests related to adding column names, either to the columns themselves and/or to tables and their views.

libcudf internals don't use column names, so we need requirements to be driven by users that will make use of the names (cuIO/Spark/cuDF).

For those who need column names, please discuss what you would like to see for column names.

CC @kkraus14 @revans2 @jlowe @j-ieong @shwina
feature request cuDF (Python) cuIO helps: Spark

opened by jrhemstad 50

[BUG] nan_as_null parameter affects output of sort_values.

Describe the bug nan_as_null parameter affects output of sort_values.

Steps/Code to reproduce bug

In [22]: df = cudf.DataFrame({'a': cudf.Series([np.nan, 1.0, np.nan, 2.0, np.nan, 0.0], nan_as_null=True)})

In [23]: print(df.sort_values(by='a'))
     a
5  0.0
1  1.0
3  2.0
0
2
4
In [19]: df = cudf.DataFrame({'a': cudf.Series([np.nan, 1.0, np.nan, 2.0, np.nan, 0.0], nan_as_null=False)})

In [20]: print(df.sort_values(by='a'))
     a
0  nan
1  1.0
2  nan
3  2.0
4  nan
5  0.0

Similar issues occur with other methods that use libcudf APIs, e.g. drop_duplicates (which uses sorting).

df = cudf.DataFrame({'a': cudf.Series([1.0, np.nan, 0, np.nan, 0, 1], nan_as_null=False)})

In [10]: print(df)
     a
0  1.0
1  nan
2  0.0
3  nan
4  0.0
5  1.0

In [11]: print(df.drop_duplicates())
     a
0  1.0
1  nan
2  0.0
3  nan
4  0.0
5  1.0

Expected behavior For sorting, drop_duplicates, nan should be considered equal.
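The expected behavior can be illustrated without a GPU. This pure-Python sketch drops duplicates while treating all NaN values as equal to each other; it only models the semantics requested here and is not how cuDF or libcudf implements drop_duplicates. The function name is hypothetical.

```python
import math


def drop_duplicates_nan_equal(values):
    """Keep the first occurrence of each value, treating every NaN as equal."""
    seen = set()
    seen_nan = False
    result = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # NaN != NaN under IEEE 754, so track it with a separate flag.
            if not seen_nan:
                seen_nan = True
                result.append(v)
        elif v not in seen:
            seen.add(v)
            result.append(v)
    return result


print(drop_duplicates_nan_equal([1.0, math.nan, 0.0, math.nan, 0.0, 1.0]))
# [1.0, nan, 0.0]
```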

Environment overview (please complete the following information)

Environment location: Bare-metal
Method of cuDF install: from source

bug libcudf helps: Spark

opened by karthikeyann 48

[DISCUSSION] Behavior for NaN comparisons in libcudf

Recent issues (https://github.com/rapidsai/cudf/issues/4753 https://github.com/rapidsai/cudf/issues/4752) have called into question how libcudf handles NaN floating point values. We've only ever addressed this issue on an ad hoc basis as opposed to having a larger conversation about the issue.

C++

C++ follows the IEEE 754 standard for floating point values, which for comparisons with NaN has the following behavior:

| Comparison | NaN ≥ x | NaN ≤ x | NaN > x | NaN < x | NaN = x | NaN ≠ x |
|------------|--------------|--------------|--------------|--------------|--------------|-------------|
| Result | Always False | Always False | Always False | Always False | Always False | Always True |

https://en.wikipedia.org/wiki/NaN
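Python floats follow the same IEEE 754 rules, so the comparison table above can be checked directly. The sample values below are arbitrary.

```python
import math

nan = math.nan
samples = [-1.0, 0.0, 2.5, math.inf, nan]

# Evaluate each comparison of NaN against every sample value.
ieee = {
    "NaN >= x": [nan >= x for x in samples],
    "NaN <= x": [nan <= x for x in samples],
    "NaN > x": [nan > x for x in samples],
    "NaN < x": [nan < x for x in samples],
    "NaN == x": [nan == x for x in samples],
    "NaN != x": [nan != x for x in samples],
}

for name, results in ieee.items():
    print(f"{name:9} {results}")
# Every row is all False except "NaN != x", which is all True.
```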

Spark

Spark is non-conforming with the IEEE 754 standard:

| Comparison | NaN ≥ x | NaN ≤ x | NaN > x | NaN < x | NaN = x | NaN ≠ x |
|------------|-------------|-----------------------|-------------|--------------|-----------------------|----------------------|
| Result | Always True | False unless x is NaN | Always True | Always False | True only if x is NaN | True unless x is NaN |

See https://spark.apache.org/docs/latest/sql-reference.html#nan-semantics
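A small sketch of these semantics in plain Python follows. It is based on the rule stated in the linked Spark documentation (NaN equals NaN and is larger than any other numeric value), so under this reading `spark_gt(nan, nan)` is False. The function names are hypothetical, and this is not Spark or libcudf code.

```python
import math


def spark_eq(a, b):
    """Equality where NaN compares equal to NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


def spark_gt(a, b):
    """Greater-than where NaN is larger than any non-NaN value."""
    if math.isnan(a):
        return not math.isnan(b)
    if math.isnan(b):
        return False
    return a > b


nan = math.nan
print(spark_eq(nan, nan))        # True  (IEEE 754 gives False)
print(spark_gt(nan, math.inf))   # True  (IEEE 754 gives False)
print(spark_gt(1.0, nan))        # False
```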

Python/Pandas

Python is a bit of a grey area because prior to 1.0, Pandas did not have the concept of "null" values and used NaNs in their stead.

In most regards, Python does respect IEEE 754. For example, see how numpy conforms with the expected IEEE754 behavior in binary ops https://github.com/rapidsai/cudf/issues/4752#issuecomment-606649251 (where Spark does not).

However, there are some cases where Pandas is non-conforming due to the pseudo-null behavior. For example, in sort_values there is a na_position argument to control where NaN values are placed. This requires specializing the libcudf comparator used for sorting to special case floating point values and deviate from the IEEE 754 behavior of NaN < x == false and NaN > x == false. See https://github.com/rapidsai/cudf/issues/2191 and https://github.com/rapidsai/cudf/issues/3226 where this was done previously.
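To show why na_position cannot be expressed with IEEE 754 comparisons alone, here is a minimal sketch that partitions NaNs out before sorting. It mimics the observable behavior of a na_position argument; the function name is hypothetical and the approach is not the libcudf comparator.

```python
import math


def sort_with_na_position(values, na_position="last"):
    """Sort floats ascending, placing all NaNs first or last."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    # NaN < x and NaN > x are both False, so NaNs must be handled separately.
    nans = [v for v in values if math.isnan(v)]
    rest = sorted(v for v in values if not math.isnan(v))
    return nans + rest if na_position == "first" else rest + nans


data = [math.nan, 1.0, math.nan, 2.0, math.nan, 0.0]
print(sort_with_na_position(data))                       # [0.0, 1.0, 2.0, nan, nan, nan]
print(sort_with_na_position(data, na_position="first"))  # [nan, nan, nan, 0.0, 1.0, 2.0]
```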

That said, I believe Python's requirements could be satisfied by always converting NaN values to nulls, but @shwina @kkraus14 will need to confirm. Prior to Pandas 1.0, it wasn't possible to have both NaN and NULL values in a floating point column. We should see what the expected behavior of NaNs vs. NULLs will be in 1.0.

Discussion

We need to have a conversation and make decisions on what libcudf will and will not do with respect to NaN behavior.

My stance is that libcudf should adhere to IEEE 754. Spark's semantics redefine a core concept of the C++ language/IEEE standard and satisfying those semantics would require extremely invasive changes that negatively impact both performance and code maintainability.

Even worse, because Spark differs from C++/Pandas, we need to provide separate code paths for all comparison based operations: a "Spark" path, and a "C++/Pandas" path. This further increases code bloat and maintenance costs.

Furthermore, for consistency, I think we should roll back the non-conformant changes introduced for comparators in https://github.com/rapidsai/cudf/issues/3226.

In conclusion, we already have special logic for handling NULLs everywhere in libcudf. Users should leverage that logic by converting NaNs to NULLs. I understand that vanilla Spark treats NaNs and NULLs independently, but I believe trying to imitate that behavior in libcudf comes at too high a cost.
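The proposed workflow, converting NaNs to NULLs up front and then relying on null handling, can be sketched in plain Python with None standing in for NULL. The helper names are hypothetical, and the nulls-last ordering is just one possible policy chosen for the example.

```python
import math


def nans_to_nulls(values):
    """Replace every NaN with None (standing in for NULL)."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]


def sort_nulls_last(values):
    """Sort ascending with nulls placed after all valid values."""
    return sorted(values, key=lambda v: (v is None, 0.0 if v is None else v))


column = nans_to_nulls([math.nan, 1.0, math.nan, 2.0, 0.0])
print(column)                   # [None, 1.0, None, 2.0, 0.0]
print(sort_nulls_last(column))  # [0.0, 1.0, 2.0, None, None]
```

After the conversion, no NaN remains in the data, so the comparison logic only has to deal with ordinary values and nulls.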
libcudf cuDF (Python) helps: Spark

opened by jrhemstad 45
[DOC] [BUG] Building from source fails as deps are not fetched
Describe the bug Building v0.9.0 from source fails as some dependencies are missing or not fetched.

Steps/Code to reproduce bug

git clone and checkout v0.9.0.

update submodules

bash build.sh libcudf

-- RMM: RMM_LIBRARY set to RMM_LIBRARY-NOTFOUND
-- RMM: RMM_INCLUDE set to RMM_INCLUDE-NOTFOUND
-- DLPACK: DLPACK_INCLUDE set to DLPACK_INCLUDE-NOTFOUND
-- NVSTRINGS: NVSTRINGS_INCLUDE set to NVSTRINGS_INCLUDE-NOTFOUND
-- NVSTRINGS: NVSTRINGS_LIBRARY set to NVSTRINGS_LIBRARY-NOTFOUND
-- NVSTRINGS: NVCATEGORY_LIBRARY set to NVCATEGORY_LIBRARY-NOTFOUND
-- NVSTRINGS: NVTEXT_LIBRARY set to NVTEXT_LIBRARY-NOTFOUND

Expected behavior Build succeeds without missing deps.

Environment overview (please complete the following information)

Environment location: Centos 7, avx512

Method of cuDF install: source

Additional context

The documentation does not state dlpack, rmm or nvstrings as dependencies.

According to the RMM README

RMM currently must be built from source. This happens automatically in a submodule when you build or install cuDF or RAPIDS containers.

Users should then expect these deps to be pulled automatically.
bug doc
opened by ccoulombe 42
[FEA] CUDA versions between PyTorch and RAPIDS

Hi Developers,

Thanks for the great tools you have made. Our group would like to use cudf for deep learning. However, PyTorch currently supports only CUDA 10.2 and CUDA 11.1, while the RAPIDS nightly supports CUDA 11.0 and 11.2, which is a pain for users (mostly scientists) who would need to compile either PyTorch or RAPIDS from source. Would it be possible for RAPIDS to support CUDA 11.1 so that users can install it from conda?

I noticed that #8224 just removed the CUDA 11.1 related files.

Thanks! Richard
feature request conda

opened by yueyericardo 41
Use cuFile for Parquet IO when available
Adds optional cuFile integration:

cufile.h is included in the build when available.

libcufile.so is loaded at runtime if LIBCUDF_CUFILE_POLICY environment variable is set to "ALWAYS" or "GDS".

cuFile compatibility mode is set through the same policy variable - "ALWAYS" means on, "GDS" means off.

cuFile is currently only used on Parquet R/W and in CSV writer.

device_read/write API can be used with file datasource/data_sink.

Added CUDA stream to device_read.
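The policy described in the list above can be summarized as a small lookup. The sketch below models it in Python purely for illustration; libcudf reads the variable in C++, and the dictionary keys, the case-sensitive matching, and the treatment of unset or other values as "do not load" are assumptions drawn from the description rather than from the source code.

```python
import os


def cufile_settings(env=None):
    """Map LIBCUDF_CUFILE_POLICY to the behavior described in this PR."""
    env = os.environ if env is None else env
    policy = env.get("LIBCUDF_CUFILE_POLICY", "")
    if policy == "ALWAYS":
        # libcufile.so is loaded and cuFile compatibility mode is on.
        return {"load_cufile": True, "compatibility_mode": True}
    if policy == "GDS":
        # libcufile.so is loaded and cuFile compatibility mode is off.
        return {"load_cufile": True, "compatibility_mode": False}
    # Assumed: any other value (or unset) means cuFile is not loaded.
    return {"load_cufile": False, "compatibility_mode": None}


print(cufile_settings({"LIBCUDF_CUFILE_POLICY": "GDS"}))
# {'load_cufile': True, 'compatibility_mode': False}
```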

libcudf CMake cuIO Performance improvement non-breaking
opened by vuule 41
[QST] CPU memory spike during cudf dataframe conversion

Hi all, I have a dataframe that is ~19K rows, ~11.4 MB (profiled using df.info(memory_usage = "deep")). We are currently running into CPU out-of-memory issues, so we are profiling our memory usage with this sample dataset. As you can see in the attached screenshot, memory usage jumps from 840 MiB to 4148 MiB during the type conversion of df. The image below shows the dataframe memory usage after conversion.

My question is: why is there a jump in memory usage when converting a dataframe from pandas to cudf? Furthermore, this memory is not released afterwards, so usage keeps growing from this point in the following processing steps.
question ? - Needs Triage

opened by lvxhnat 0
nvcc fatal : Unsupported gpu architecture 'compute_NATIVE'

When I run ./build.sh on branch 22.12, the error log prints: nvcc fatal : Unsupported gpu architecture 'compute_NATIVE'. How can I deal with this error? Thank you.
question ? - Needs Triage

opened by newjavaer 0
JSON data page in user guide
Description

Adding a walkthrough to the User Guide that extracts data from common JSON structures.

Checklist

[ ] I am familiar with the Contributing Guidelines.

[ ] New or existing tests cover these changes.

[ ] The documentation is up to date with these changes.
opened by GregoryKimball 1
Support "values" orient (array of arrays) in Nested JSON reader
Description

The legacy GPU JSON reader can read "values" orient data in a JSON string (JSON lines only). With this PR, the nested JSON reader can also read "values" orient data, for both JSON lines and non-line JSON strings.

Examples:

import cudf
json="[[1, 2, 3], [4, 5], [7, 8, 9]]"
cudf.read_json(json, engine="cudf_experimental")
   0  1     2
0  1  2     3
1  4  5  <NA>
2  7  8     9

json="[1, 2, 3]\n [4, 5, null]\n [7, 8, [9]]"
cudf.read_json(json, engine="cudf_experimental", lines=True)
   0  1     2
0  1  2  None
1  4  5  None
2  7  8   [9]

Note that pandas accepts "values" data even when it is passed with the orient="records" argument, and parses it as "values". Similar support is added here too: passing "values" data with orient="records" will still work.
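To illustrate what "values" orient means, the following stdlib-only sketch turns an array of arrays into positional columns, padding short rows with None. It is a conceptual model, not the GPU reader: the function name is hypothetical, and it does not model how the reader coerces a column with mixed element types.

```python
import json


def read_values_orient(text, lines=False):
    """Parse array-of-arrays JSON into {column_index: [values...]}."""
    if lines:
        # JSON lines: one array per non-empty line.
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        rows = json.loads(text)
    width = max((len(r) for r in rows), default=0)
    # Rows shorter than the widest row are padded with None.
    return {c: [r[c] if c < len(r) else None for r in rows] for c in range(width)}


print(read_values_orient("[[1, 2, 3], [4, 5], [7, 8, 9]]"))
# {0: [1, 4, 7], 1: [2, 5, 8], 2: [3, None, 9]}
print(read_values_orient("[1, 2, 3]\n [4, 5, null]\n [7, 8, [9]]", lines=True))
# {0: [1, 4, 7], 1: [2, 5, 8], 2: [3, None, [9]]}
```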

Checklist

[x] I am familiar with the Contributing Guidelines.

[ ] New or existing tests cover these changes.

[ ] The documentation is up to date with these changes.

feature request 2 - In Progress libcudf cuDF (Python) cuIO non-breaking
opened by karthikeyann 0

[FEA] category dtype support in parquet reader

Is your feature request related to a problem? Please describe. writing code with import cudf as pd

Describe the solution you'd like same behavior as import pandas as pd

In [1]: import cudf as pd

In [2]: pd.__version__
Out[2]: '22.12.01'

In [3]: df = pd.DataFrame({'a': ['one','two','three'] * 10})

In [4]: df.info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30 non-null     object
dtypes: object(1)
memory usage: 234.0+ bytes

In [5]: df.a = df.astype('category')

In [6]: df.info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30 non-null     category
dtypes: category(1)
memory usage: 57.0 bytes

In [7]: %ls df.parquet
ls: cannot access 'df.parquet': No such file or directory

In [8]: df.to_pandas().to_parquet('df.parquet')

In [9]: %ls df.parquet
df.parquet

In [10]: pd.read_parquet('df.parquet').info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30 non-null     object
dtypes: object(1)
memory usage: 234.0+ bytes

In [11]: import pandas

In [12]: pandas.read_parquet('df.parquet').info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   a       30 non-null     category
dtypes: category(1)
memory usage: 290.0 bytes

In [13]: pd.DataFrame(pandas.read_parquet('df.parquet')).info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30 non-null     category
dtypes: category(1)
memory usage: 57.0 bytes

the parquet reader turns the column into dtype=object

In [10]: pd.read_parquet('df.parquet').info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30 non-null     object
dtypes: object(1)
memory usage: 234.0+ bytes

feature request ? - Needs Triage

opened by mattf 0

[FEA] category dtype support in parquet writer

Is your feature request related to a problem? Please describe. writing code with import cudf as pd

Describe the solution you'd like same behavior as import pandas as pd

In [1]: import cudf as pd

In [2]: pd.__version__
Out[2]: '22.12.01'

In [3]: df = pd.DataFrame({'a': ['one','two','three'] * 10})

In [4]: df.info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30 non-null     object
dtypes: object(1)
memory usage: 234.0+ bytes

In [5]: df.a = df.astype('category')

In [6]: df.info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       30 non-null     category
dtypes: category(1)
memory usage: 57.0 bytes

In [7]: df.to_parquet('df.parquet')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 df.to_parquet('df.parquet')

File .../lib/python3.8/site-packages/cudf/core/dataframe.py:6287, in DataFrame.to_parquet(self, path, engine, compression, index, partition_cols, partition_file_name, partition_offsets, statistics, metadata_file_path, int96_timestamps, row_group_size_bytes, row_group_size_rows, max_page_size_bytes, max_page_size_rows, storage_options, return_metadata, *args, **kwargs)
   6284 """{docstring}"""
   6285 from cudf.io import parquet
-> 6287 return parquet.to_parquet(
   6288     self,
   6289     path=path,
   6290     engine=engine,
   6291     compression=compression,
   6292     index=index,
   6293     partition_cols=partition_cols,
   6294     partition_file_name=partition_file_name,
   6295     partition_offsets=partition_offsets,
   6296     statistics=statistics,
   6297     metadata_file_path=metadata_file_path,
   6298     int96_timestamps=int96_timestamps,
   6299     row_group_size_bytes=row_group_size_bytes,
   6300     row_group_size_rows=row_group_size_rows,
   6301     max_page_size_bytes=max_page_size_bytes,
   6302     max_page_size_rows=max_page_size_rows,
   6303     storage_options=storage_options,
   6304     return_metadata=return_metadata,
   6305     *args,
   6306     **kwargs,
   6307 )

File .../lib/python3.8/contextlib.py:75, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     72 @wraps(func)
     73 def inner(*args, **kwds):
     74     with self._recreate_cm():
---> 75         return func(*args, **kwds)

File .../lib/python3.8/site-packages/cudf/io/parquet.py:700, in to_parquet(df, path, engine, compression, index, partition_cols, partition_file_name, partition_offsets, statistics, metadata_file_path, int96_timestamps, row_group_size_bytes, row_group_size_rows, max_page_size_bytes, max_page_size_rows, storage_options, return_metadata, *args, **kwargs)
    698     if partition_cols is None or col not in partition_cols:
    699         if df[col].dtype.name == "category":
--> 700             raise ValueError(
    701                 "'category' column dtypes are currently not "
    702                 + "supported by the gpu accelerated parquet writer"
    703             )
    705 if partition_cols:
    706     if metadata_file_path is not None:

ValueError: 'category' column dtypes are currently not supported by the gpu accelerated parquet writer

In [8]: df.to_pandas().to_parquet('df.parquet')

In [9]: %ls df.parquet
df.parquet

feature request ? - Needs Triage

opened by mattf 0

Releases(v22.12.01)

v22.12.01(Dec 8, 2022)
🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman

Refactor purge_nonempty_nulls (#12111) @ttnghia

Create an int8 column in read_csv when all elements are missing (#12110) @vuule

Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule

Fix type promotion edge cases in numerical binops (#12074) @wence-

Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar

Rollback of DeviceBufferLike (#12009) @madsbk

Remove unused managed_allocator (#12005) @vyasr

Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule

Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr

Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice

Remove validation that requires introspection (#11938) @vyasr

Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann

Add tests ensuring that cudf's default stream is always used (#11875) @vyasr

Support nested types as groupby keys in libcudf (#11792) @PointKernel

Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice

Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346

part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

🐛 Bug Fixes

strings_udf: use libcudf caching of character tables (#12343) @wence-

Fix include line for IO Cython modules (#12250) @vyasr

Make dask pinning looser (#12231) @vyasr

Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt

Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar

Merge branch-22.10 into branch-22.12 (#12198) @davidwendt

Fix compression in ORC writer (#12194) @vuule

Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard

Fix data corruption when reading ORC files with empty stripes (#12160) @vuule

Fix decimal binary operations (#12142) @galipremsagar

Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard

Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller

Fix/disable jitify lto (#12122) @robertmaynard

Fix conditional_full_join benchmark (#12121) @GregoryKimball

Fix regex working-memory-size refactor error (#12119) @davidwendt

Add in negative size checks for columns (#12118) @revans2

Add JNI for substring without 'end' parameter. (#12113) @firestarman

Fix reading of CSV files with blank second row (#12098) @vuule

Fix an error in IO with GzipFile type (#12085) @galipremsagar

Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt

Fix alignment of compressed blocks in ORC writer (#12077) @vuule

Fix singleton-range __setitem__ edge case (#12075) @wence-

Fix type promotion edge cases in numerical binops (#12074) @wence-

Force using old fmt in nvbench. (#12067) @vyasr

Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann

Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller

Force black exclusions for pre-commit. (#12036) @bdice

Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar

Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar

Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann

Fix issues when both usecols and names options are used in read_csv (#12018) @vuule

Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard

Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule

Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw

Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard

Fix maximum page size estimate in Parquet writer (#11962) @vuule

Fix local offset handling in bgzip reader (#11918) @upsj

Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec

Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt

Fix type casting in Series.setitem (#11904) @wence-

Fix memcheck error in get_dremel_data (#11903) @davidwendt

Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann

Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt

Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt

Fix writing of Parquet files with many fragments (#11869) @etseidl

Fix RangeIndex unary operators. (#11868) @vyasr

JNI Avoid NPE for reading host binary data (#11865) @revans2

Fix decimal benchmark input data generation (#11863) @karthikeyann

Fix pre-commit copyright check (#11860) @galipremsagar

Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule

Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard

Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt

Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard

Add V2 page header support to parquet reader (#11778) @etseidl

Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec

Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice

Add symlinks to notebooks. (#12128) @bdice

Add truncate API to python doc pages (#12109) @galipremsagar

Update Numba docs links. (#12107) @bdice

Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice

Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller

Add pivot_table and crosstab to docs. (#12014) @bdice

Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt

Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr

Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar

Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt

Rename libcudf++ to libcudf. (#11953) @bdice

Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice

Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule

Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr

Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball

Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina

Support + in strings_udf (#12117) @brandon-b-miller

Support upper and lower in strings_udf (#12099) @brandon-b-miller

Add wheel builds (#12096) @vyasr

Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller

Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller

Mark nvcomp zstd compression stable (#12059) @jbrennan333

Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina

Enable building against the libarrow contained in pyarrow (#12034) @vyasr

Add strings like jni and native method (#12032) @cindyyuanjiang

Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann

byte_range support for JSON Lines format (#12017) @karthikeyann

Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard

Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller

Implement JNI for chunked Parquet reader (#11961) @ttnghia

Add method argument to DataFrame.quantile (#11957) @rjzamora

Add gpu memory watermark apis to JNI (#11950) @abellina

Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina

Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller

Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard

Add strings udf C++ classes and functions for phase II (#11912) @davidwendt

Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann

Enable CEC for strings_udf (#11884) @brandon-b-miller

ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman

Implement chunked Parquet reader (#11867) @ttnghia

Add read_orc_metadata to libcudf (#11815) @vuule

Support nested types as groupby keys in libcudf (#11792) @PointKernel

Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk

Pin dask and distributed for release (#12165) @galipremsagar

Don't rely on GNU find in headers_test.sh (#12164) @wence-

Update cp.clip call (#12148) @quasiben

Enable automatic column projection in groupby().agg (#12124) @rjzamora

Refactor purge_nonempty_nulls (#12111) @ttnghia

Create an int8 column in read_csv when all elements are missing (#12110) @vuule

Spilling to host memory (#12106) @madsbk

First pass of pd.read_orc changes in tests (#12103) @galipremsagar

Expose engine argument in dask_cudf.read_json (#12101) @rjzamora

Remove CUDA 10 compatibility code. (#12088) @bdice

Move and update dask nightly install in CI (#12082) @galipremsagar

Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule

Remove macros that inspect the contents of exceptions (#12076) @vyasr

Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann

Remove overflow error during decimal binops (#12063) @galipremsagar

Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt

Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt

Add support for DataFrame.from_dict, DataFrame.to_dict and Series.to_dict (#12048) @galipremsagar

Refactor Parquet reader (#12046) @ttnghia

Forward merge 22.10 into 22.12 (#12045) @vyasr

Standardize newlines at ends of files. (#12042) @bdice

Trim trailing whitespace from all files. (#12041) @bdice

Use nosync policy in gather and scatter implementations. (#12038) @bdice

Remove smart quotes from all docstrings. (#12035) @bdice

Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar

Add cython-lint to pre-commit checks. (#12020) @bdice

Use pragma once (#12019) @bdice

New GHA to add issues/prs to project board (#12016) @jarmak-nv

Add DataFrame.pivot_table. (#12015) @bdice

Rollback of DeviceBufferLike (#12009) @madsbk

Remove default parameters for nvtext::detail functions (#12007) @davidwendt

Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt

Remove unused managed_allocator (#12005) @vyasr

Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt

Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora

Ignore python docs build artifacts (#12000) @galipremsagar

Use rapids-cmake for google benchmark. (#11997) @vyasr

Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr

Remove stale labeler (#11995) @raydouglass

Move protobuf compilation to CMake (#11986) @vyasr

Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule

Add missing noexcepts to column_in_metadata methods (#11973) @vyasr

Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule

Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt

Feature/remove default streams (#11967) @vyasr

Add pool memory resource to libcudf basic example (#11966) @davidwendt

Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt

Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr

Add deprecation warning for set_allocator. (#11958) @vyasr

Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt

Add full page indexes to Parquet writer benchmarks (#11955) @etseidl

Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt

Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice

Add strip_delimiters option to read_text (#11946) @upsj

Refactor multibyte_split output_builder (#11945) @upsj

Remove validation that requires introspection (#11938) @vyasr

Add .str.find_multiple API (#11928) @galipremsagar

Add regex_program class for use with all regex APIs (#11927) @davidwendt

Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora

Performance improvement in JSON Tree traversal (#11919) @karthikeyann

Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt

Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt

Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar

Pin mimesis version in setup.py. (#11906) @bdice

Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar

Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt

Relax codecov threshold diff (#11899) @galipremsagar

Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball

Add coverage for string UDF tests. (#11891) @vyasr

Provide data_chunk_source wrapper for datasource (#11886) @upsj

Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj

Add tests ensuring that cudf's default stream is always used (#11875) @vyasr

Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt

Add ngroup (#11871) @shwina

Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann

Unpin dask and distributed for development (#11859) @galipremsagar

Remove unused includes for table/row_operators (#11857) @GregoryKimball

Use conda-forge's pyorc (#11855) @jakirkham

Add libcudf strings examples (#11849) @davidwendt

Remove cudf_io namespace alias (#11827) @vuule

Test/remove thrust vector usage (#11813) @vyasr

Add BGZIP reader to python read_text (#11802) @upsj

Merge branch-22.10 into branch-22.12 (#11801) @davidwendt

Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt

Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi

Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice

Add BGZIP multibyte_split benchmark (#11723) @upsj

Bifurcate Dependency Lists (#11674) @bdice

Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice

Conform "bench_isin" to match generator column names (#11549) @GregoryKimball

Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346

Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca

part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

Make all nvcc warnings into errors (#8916) @trxcllnt

v22.12.00(Dec 8, 2022)
🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman

Refactor purge_nonempty_nulls (#12111) @ttnghia

Create an int8 column in read_csv when all elements are missing (#12110) @vuule

Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule

Fix type promotion edge cases in numerical binops (#12074) @wence-

Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar

Rollback of DeviceBufferLike (#12009) @madsbk

Remove unused managed_allocator (#12005) @vyasr

Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule

Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr

Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice

Remove validation that requires introspection (#11938) @vyasr

Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann

Add tests ensuring that cudf's default stream is always used (#11875) @vyasr

Support nested types as groupby keys in libcudf (#11792) @PointKernel

Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice

Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346

part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
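
Two of the breaking changes above switch the set-collecting aggregations to treat NaNs as equal by default. The following plain-Python sketch (not cuDF or libcudf code; the `collect_set` helper and its `nans_equal` flag are hypothetical names used only for illustration) shows how the two NaN policies change the result of collecting unique values:

```python
import math

def collect_set(values, nans_equal=True):
    """Collect unique values in first-seen order (illustrative helper, not a cuDF API)."""
    out = []
    seen_nan = False
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        if is_nan:
            # With nans_equal=True, every NaN after the first is a duplicate.
            if nans_equal and seen_nan:
                continue
            seen_nan = True
            out.append(v)
        elif v not in out:
            out.append(v)
    return out

data = [1.0, float("nan"), float("nan"), 2.0, 1.0]
equal_result = collect_set(data, nans_equal=True)     # NaN kept once
unequal_result = collect_set(data, nans_equal=False)  # each NaN kept separately
print(len(equal_result), len(unequal_result))
```

Code that relied on the previous default and expected each NaN to remain a distinct element would need to request the unequal policy explicitly.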

🐛 Bug Fixes

Fix include line for IO Cython modules (#12250) @vyasr

Make dask pinning looser (#12231) @vyasr

Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt

Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar

Merge branch-22.10 into branch-22.12 (#12198) @davidwendt

Fix compression in ORC writer (#12194) @vuule

Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard

Fix data corruption when reading ORC files with empty stripes (#12160) @vuule

Fix decimal binary operations (#12142) @galipremsagar

Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard

Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller

Fix/disable jitify lto (#12122) @robertmaynard

Fix conditional_full_join benchmark (#12121) @GregoryKimball

Fix regex working-memory-size refactor error (#12119) @davidwendt

Add in negative size checks for columns (#12118) @revans2

Add JNI for substring without 'end' parameter. (#12113) @firestarman

Fix reading of CSV files with blank second row (#12098) @vuule

Fix an error in IO with GzipFile type (#12085) @galipremsagar

Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt

Fix alignment of compressed blocks in ORC writer (#12077) @vuule

Fix singleton-range __setitem__ edge case (#12075) @wence-

Fix type promotion edge cases in numerical binops (#12074) @wence-

Force using old fmt in nvbench. (#12067) @vyasr

Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann

Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller

Force black exclusions for pre-commit. (#12036) @bdice

Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar

Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar

Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann

Fix issues when both usecols and names options are used in read_csv (#12018) @vuule

Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard

Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule

Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw

Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard

Fix maximum page size estimate in Parquet writer (#11962) @vuule

Fix local offset handling in bgzip reader (#11918) @upsj

Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec

Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt

Fix type casting in Series.setitem (#11904) @wence-

Fix memcheck error in get_dremel_data (#11903) @davidwendt

Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann

Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt

Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt

Fix writing of Parquet files with many fragments (#11869) @etseidl

Fix RangeIndex unary operators. (#11868) @vyasr

JNI Avoid NPE for reading host binary data (#11865) @revans2

Fix decimal benchmark input data generation (#11863) @karthikeyann

Fix pre-commit copyright check (#11860) @galipremsagar

Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule

Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard

Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt

Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard

Add V2 page header support to parquet reader (#11778) @etseidl

Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec

Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice

Add symlinks to notebooks. (#12128) @bdice

Add truncate API to python doc pages (#12109) @galipremsagar

Update Numba docs links. (#12107) @bdice

Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice

Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller

Add pivot_table and crosstab to docs. (#12014) @bdice

Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt

Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr

Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar

Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt

Rename libcudf++ to libcudf. (#11953) @bdice

Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice

Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule

Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr

Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball

Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina

Support + in strings_udf (#12117) @brandon-b-miller

Support upper and lower in strings_udf (#12099) @brandon-b-miller

Add wheel builds (#12096) @vyasr

Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller

Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller

Mark nvcomp zstd compression stable (#12059) @jbrennan333

Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina

Enable building against the libarrow contained in pyarrow (#12034) @vyasr

Add strings like jni and native method (#12032) @cindyyuanjiang

Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann

byte_range support for JSON Lines format (#12017) @karthikeyann

Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard

Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller

Implement JNI for chunked Parquet reader (#11961) @ttnghia

Add method argument to DataFrame.quantile (#11957) @rjzamora

Add gpu memory watermark apis to JNI (#11950) @abellina

Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina

Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller

Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard

Add strings udf C++ classes and functions for phase II (#11912) @davidwendt

Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann

Enable CEC for strings_udf (#11884) @brandon-b-miller

ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman

Implement chunked Parquet reader (#11867) @ttnghia

Add read_orc_metadata to libcudf (#11815) @vuule

Support nested types as groupby keys in libcudf (#11792) @PointKernel

Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk

Pin dask and distributed for release (#12165) @galipremsagar

Don't rely on GNU find in headers_test.sh (#12164) @wence-

Update cp.clip call (#12148) @quasiben

Enable automatic column projection in groupby().agg (#12124) @rjzamora

Refactor purge_nonempty_nulls (#12111) @ttnghia

Create an int8 column in read_csv when all elements are missing (#12110) @vuule

Spilling to host memory (#12106) @madsbk

First pass of pd.read_orc changes in tests (#12103) @galipremsagar

Expose engine argument in dask_cudf.read_json (#12101) @rjzamora

Remove CUDA 10 compatibility code. (#12088) @bdice

Move and update dask nightly install in CI (#12082) @galipremsagar

Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule

Remove macros that inspect the contents of exceptions (#12076) @vyasr

Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann

Remove overflow error during decimal binops (#12063) @galipremsagar

Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt

Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt

Add support for DataFrame.from_dict, DataFrame.to_dict and Series.to_dict (#12048) @galipremsagar

Refactor Parquet reader (#12046) @ttnghia

Forward merge 22.10 into 22.12 (#12045) @vyasr

Standardize newlines at ends of files. (#12042) @bdice

Trim trailing whitespace from all files. (#12041) @bdice

Use nosync policy in gather and scatter implementations. (#12038) @bdice

Remove smart quotes from all docstrings. (#12035) @bdice

Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar

Add cython-lint to pre-commit checks. (#12020) @bdice

Use pragma once (#12019) @bdice

New GHA to add issues/prs to project board (#12016) @jarmak-nv

Add DataFrame.pivot_table. (#12015) @bdice

Rollback of DeviceBufferLike (#12009) @madsbk

Remove default parameters for nvtext::detail functions (#12007) @davidwendt

Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt

Remove unused managed_allocator (#12005) @vyasr

Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt

Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora

Ignore python docs build artifacts (#12000) @galipremsagar

Use rapids-cmake for google benchmark. (#11997) @vyasr

Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr

Remove stale labeler (#11995) @raydouglass

Move protobuf compilation to CMake (#11986) @vyasr

Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule

Add missing noexcepts to column_in_metadata methods (#11973) @vyasr

Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule

Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt

Feature/remove default streams (#11967) @vyasr

Add pool memory resource to libcudf basic example (#11966) @davidwendt

Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt

Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr

Add deprecation warning for set_allocator. (#11958) @vyasr

Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt

Add full page indexes to Parquet writer benchmarks (#11955) @etseidl

Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt

Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice

Add strip_delimiters option to read_text (#11946) @upsj

Refactor multibyte_split output_builder (#11945) @upsj

Remove validation that requires introspection (#11938) @vyasr

Add .str.find_multiple API (#11928) @galipremsagar

Add regex_program class for use with all regex APIs (#11927) @davidwendt

Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora

Performance improvement in JSON Tree traversal (#11919) @karthikeyann

Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt

Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt

Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar

Pin mimesis version in setup.py. (#11906) @bdice

Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar

Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt

Relax codecov threshold diff (#11899) @galipremsagar

Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball

Add coverage for string UDF tests. (#11891) @vyasr

Provide data_chunk_source wrapper for datasource (#11886) @upsj

Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj

Add tests ensuring that cudf's default stream is always used (#11875) @vyasr

Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt

Add ngroup (#11871) @shwina

Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann

Unpin dask and distributed for development (#11859) @galipremsagar

Remove unused includes for table/row_operators (#11857) @GregoryKimball

Use conda-forge's pyorc (#11855) @jakirkham

Add libcudf strings examples (#11849) @davidwendt

Remove cudf_io namespace alias (#11827) @vuule

Test/remove thrust vector usage (#11813) @vyasr

Add BGZIP reader to python read_text (#11802) @upsj

Merge branch-22.10 into branch-22.12 (#11801) @davidwendt

Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt

Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi

Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice

Add BGZIP multibyte_split benchmark (#11723) @upsj

Bifurcate Dependency Lists (#11674) @bdice

Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice

Conform "bench_isin" to match generator column names (#11549) @GregoryKimball

Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346

Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca

part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

Make all nvcc warnings into errors (#8916) @trxcllnt

v23.02.00a(Nov 18, 2022)
🔗 Links

Development Branch

Compare with main branch

🚨 Breaking Changes

Add trailing comma support for nested JSON reader (#12448) @karthikeyann

Upgrade to arrow-10.0.1 (#12327) @galipremsagar

Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule

CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann

Remove deprecated code for 23.02 (#12281) @vyasr

Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann

Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia

Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia

Remove JIT type names, refactor id_to_type. (#12158) @bdice

Floor division uses integer division for integral arguments (#12131) @wence-
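
The last entry above changes floor division to use integer division when both arguments are integral. The release notes give no rationale, so the sketch below is an assumption about why this matters: routing an integer floor division through floating point can silently lose precision for large values. It is plain Python rather than cuDF code, and both helper names are hypothetical.

```python
import math

def floordiv_via_float(a: int, b: int) -> int:
    """Floor division computed by true division followed by floor."""
    return math.floor(a / b)

def floordiv_integer(a: int, b: int) -> int:
    """Floor division computed entirely in integer arithmetic."""
    return a // b

# 2**53 + 1 is the smallest positive integer a float64 cannot represent exactly.
big = 2**53 + 1

via_float = floordiv_via_float(big, 1)
via_int = floordiv_integer(big, 1)
print(via_float, via_int)
```

The two results differ by one because the float path rounds `big` before flooring. For small operands both approaches agree, so code affected by this change is mainly code operating on large 64-bit integers.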

🐛 Bug Fixes

Enable metadata transfer for complex types in transpose (#12491) @galipremsagar

Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar

Fix compile issue with arrow 10 (#12465) @ttnghia

Fix xfail incompatibilities (#12423) @vyasr

Fix bug in Parquet column index encoding (#12404) @etseidl

When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard

Fix get_json_object to return empty column on empty input (#12384) @davidwendt

Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr

Fix reductions any/all return value for empty input (#12374) @davidwendt

Fix debug compile errors in parquet.hpp (#12372) @davidwendt

Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia

Use correct memory resource in io::make_column (#12364) @vyasr

Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec

Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule

Fix NumericPairIteratorTest for float values (#12306) @davidwendt

Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle

Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt

Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia

Change reductions any/all to return valid values for empty input (#12279) @davidwendt

Only exclude join keys that are indices from key columns (#12271) @wence-

Fix spill to device limit (#12252) @madsbk

Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-

Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia

Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt

Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt

Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt

Fix page size calculation in Parquet writer (#12182) @etseidl

Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt

Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt

Floor division uses integer division for integral arguments (#12131) @wence-

📖 Documentation

Link unsupported iteration API docstrings (#12482) @galipremsagar

strings_udf doc update (#12469) @brandon-b-miller

Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard

Update pre-commit hooks guide (#12395) @bdice

Update test docs to not use detail comparison utilities (#12332) @PointKernel

Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt

Add eval to docs. (#12322) @vyasr

Turn on xfail_strict=true (#12244) @wence-

Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

one_hot_encode to use experimental row comparators (#12478) @divyegala

Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia

Add trailing comma support for nested JSON reader (#12448) @karthikeyann

Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia

JNI bindings to write CSV (#12425) @mythrocks

Implement lists::reverse (#12336) @ttnghia

Use device_read in experimental read_json (#12314) @vuule

Implement JNI for strings::reverse (#12283) @ttnghia

Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann

Add cudf::strings::like function with multiple patterns (#12269) @davidwendt

Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule

Add cudf::strings::reverse function (#12227) @davidwendt

Support replace in strings_udf (#12207) @brandon-b-miller

Add support to read binary encoded decimals in parquet (#12205) @PointKernel

Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
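
This release touches regex anchors twice: end-of-line support for strings ending in a new-line character (#12181) and strict begin/end matching for \A and \Z (#12282). As a reference point only, the sketch below shows how Python's standard `re` module distinguishes these anchors. Whether cuDF's behavior matches Python exactly in every case is an assumption, not something these release notes state.

```python
import re

text = "abc\n"

# `$` matches at the very end, and also just before a trailing newline.
dollar_matches = re.search(r"abc$", text) is not None

# `\Z` matches only at the true end of the string, so the trailing newline blocks it.
strict_end_matches = re.search(r"abc\Z", text) is not None

# `\A` matches only at the true start, even in MULTILINE mode,
# whereas `^` also matches after each newline in that mode.
two_lines = "x\nabc"
strict_start_matches = re.search(r"\Aabc", two_lines, re.MULTILINE) is not None
caret_matches = re.search(r"^abc", two_lines, re.MULTILINE) is not None

print(dollar_matches, strict_end_matches, strict_start_matches, caret_matches)
```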

🛠️ Improvements

Stop using pandas._testing (#12492) @vyasr

Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt

Raise warnings as errors in the test suite (#12468) @vyasr

Remove int32 hard-coding in python (#12467) @galipremsagar

Use cudaMemcpyDefault. (#12466) @bdice

Update workflows for nightly tests (#12462) @ajschmidt8

JNI build image default as cuda11.8 (#12441) @pxLi

Re-enable Recently Updated Check (#12435) @ajschmidt8

Rework remaining cudf::strings::from_xyz functions to use make_strings_children (#12434) @vuule

Remove arguments for checking exception messages in Python (#12424) @vyasr

Clean up cuco usage (#12421) @PointKernel

Fix warnings in remaining modules (#12406) @vyasr

Update ops-bot.yaml (#12402) @ajschmidt8

Rework cudf::strings::integers_to_ipv4 to use make_strings_children utility (#12401) @davidwendt

Expose the RMM pool size in JNI (#12390) @revans2

Fix COPYING_TEST: gtests coded in namespace cudf::test (#12387) @davidwendt

Rework cudf::strings::url_encode to use make_strings_children utility (#12385) @davidwendt

Fix warnings in test_datetime.py (#12381) @vyasr

Fix warnings in dataframe.py (#12369) @vyasr

Update conda recipes. (#12368) @bdice

Use gpu-latest-1 runner tag (#12366) @bdice

Rework cudf::strings::from_booleans to use make_strings_children (#12365) @vuule

Fix warnings in test modules up to test_dataframe.py (#12355) @vyasr

Accelerate stable-segmented-sort with CUB segmented sort (#12347) @davidwendt

Enable max compression ratio small block optimization for ZSTD (#12338) @vuule

Fix warnings in test_monotonic.py (#12334) @vyasr

Improve JSON column creation performance (list offsets) (#12330) @karthikeyann

Upgrade to arrow-10.0.1 (#12327) @galipremsagar

Fix warnings in test_orc.py (#12326) @vyasr

Fix warnings in test_groupby.py (#12324) @vyasr

Fix test_notebooks.sh (#12323) @ajschmidt8

Fix transform gtests coded in namespace cudf::test (#12321) @davidwendt

Fix check_style.sh script (#12320) @ajschmidt8

Rework cudf::strings::from_timestamps to use make_strings_children (#12317) @davidwendt

Fix warnings in test_index.py (#12313) @vyasr

Fix warnings in test_multiindex.py (#12310) @vyasr

CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann

Fix warnings in test_indexing.py (#12305) @vyasr

Fix warnings in test_joining.py (#12304) @vyasr

Unpin dask and distributed for development (#12302) @galipremsagar

Re-enable sccache for Jenkins builds (#12297) @ajschmidt8

Define needs for pr-builder workflow. (#12296) @bdice

Forward merge 22.12 into 23.02 (#12294) @vyasr

Fix warnings in test_stats.py (#12293) @vyasr

Fix table gtests coded in namespace cudf::test (#12292) @davidwendt

Change cython for regex calls to use cudf::strings::regex_program (#12289) @davidwendt

Improved error reporting when reading multiple JSON files (#12285) @vuule

Deprecate Frame.sum_of_squares (#12284) @vyasr

Remove deprecated code for 23.02 (#12281) @vyasr

Clean up handling of max_page_size_bytes in Parquet writer (#12277) @etseidl

Fix replace gtests coded in namespace cudf::test (#12270) @davidwendt

Add pandas nullable type support in Index.to_pandas (#12268) @galipremsagar

Rework nvtext::detokenize to use indexalator for row indices (#12267) @davidwendt

Fix reduction gtests coded in namespace cudf::test (#12257) @davidwendt

Remove default parameters from cudf::detail::sort function declarations (#12254) @davidwendt

Add duplicated support for Series, DataFrame and Index (#12246) @galipremsagar

Replace column/table test utilities with macros (#12242) @PointKernel

Rework cudf::strings::pad and zfill to use make_strings_children (#12238) @davidwendt

Fix sort gtests coded in namespace cudf::test (#12237) @davidwendt

Wrapping concat and file writes in @acquire_spill_lock() (#12232) @madsbk

Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia

Cover parsing to decimal types in read_json tests (#12229) @vuule

Spill Statistics (#12223) @madsbk

Use CUDF_JNI_ENABLE_PROFILING to conditionally enable profiling support. (#12221) @bdice

Clean up of test_spilling.py (#12220) @madsbk

Simplify repetitive boolean logic (#12218) @vuule

Add Series.hasnans and Index.hasnans (#12214) @galipremsagar

Add cudf::strings::udf::replace function (#12210) @davidwendt

Add new Java APIs for appending byte arrays to host columnar data (#12208) @revans2

Remove Python dependencies from Java CI. (#12193) @bdice

Fix null order in sort-based groupby and improve groupby tests (#12191) @divyegala

Move strings children functions from cudf/strings/detail/utilities.cuh to new header (#12185) @davidwendt

Clean up existing JNI scalar to column code (#12173) @revans2

Remove JIT type names, refactor id_to_type. (#12158) @bdice

Update JNI version to 23.02.0-SNAPSHOT (#12129) @pxLi

Minor refactor of cpp/src/io/parquet/page_data.cu (#12126) @etseidl

Add codespell as a linter (#12097) @benfred

Enable specifying exceptions in error macros (#12078) @vyasr

Move _label_encoding from Series to Column (#12040) @shwina

Add GitHub Actions Workflows (#12002) @ajschmidt8

Consolidate dask-cudf groupby_agg calls in one place (#10835) @charlesbluca

v22.10.01(Nov 3, 2022)
🚨 Breaking Changes

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule

Disable nvCOMP DEFLATE integration (#11811) @vuule

Fix return type of Index.isna & Index.notna (#11769) @galipremsagar

Remove kwargs in read_csv & to_csv (#11762) @galipremsagar

Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia

Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt

Update zfill to match Python output (#11634) @davidwendt

Upgrade pandas to 1.5 (#11617) @galipremsagar

Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar

Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt

Adding optional parquet reader schema (#11524) @hyperbolic2346

Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar

Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec

Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar

Disable Arrow S3 support by default. (#11470) @bdice

Convert thrust::optional usages to std::optional (#11455) @robertmaynard

Remove unused is_struct trait. (#11450) @bdice

Refactor the Buffer class (#11447) @madsbk

Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule

Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt

Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice

Use the new JSON parser when the experimental reader is selected (#11364) @vuule

Remove deprecated Series.applymap. (#11031) @bdice

Remove deprecated expand parameter from str.findall. (#11030) @bdice
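
For the zfill entry above (#11634), the stated target is the output of Python's built-in `str.zfill`, which keeps a leading sign in front of the padding zeros. The sketch below uses plain Python only, not cuDF; the helper name `pad_all` and the sample values are illustrative assumptions, and the exact cuDF behavior should be checked against the cuDF documentation.

```python
# Plain-Python reference for sign-aware zero padding (built-in str.zfill).
# This does not call cuDF; it only shows the Python output that the
# changelog entry names as the target behavior.

def pad_all(values, width):
    """Zero-pad each string to `width` using Python's built-in str.zfill."""
    return [v.zfill(width) for v in values]

# Hypothetical sample values: unsigned, signed, already wide enough, and empty.
samples = ["42", "-42", "+42", "123456", ""]
padded = pad_all(samples, 5)
print(padded)
```

Strings that are already at least `width` characters long are returned unchanged, and a leading `+` or `-` stays at the front rather than being buried behind the zeros.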

🐛 Bug Fixes

Update cuda-python dependency to 11.7.1 (#11994) @shwina

Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina

Handle ptx file paths during strings_udf import (#11862) @galipremsagar

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule

Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller

Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora

Fix is_valid checks in Scalar._binaryop (#11818) @wence-

Fix operator NotImplemented issue with numpy (#11816) @galipremsagar

Disable nvCOMP DEFLATE integration (#11811) @vuule

Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller

Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora

Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt

Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller

Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec

Fix return type of Index.isna & Index.notna (#11769) @galipremsagar

Fix issue with set-item in case of list and struct types (#11760) @galipremsagar

Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr

Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora

Fix ORC string sum statistics (#11740) @vuule

Add strings_udf package for python 3.9 (#11730) @brandon-b-miller

Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr

Don't assume stream is a compile-time constant expression (#11725) @vyasr

Fix get_thrust.cmake format at patch command (#11715) @davidwendt

Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia

Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt

Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule

Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar

Fix compile error due to missing header (#11697) @ttnghia

Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule

Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar

Transfer correct dtype to exploded column (#11687) @wence-

Ignore protobuf generated files in mypy checks (#11685) @galipremsagar

Maintain the index name after .loc (#11677) @shwina

Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar

Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard

Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard

Fix multi-file remote datasource bug (#11655) @rjzamora

Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt

Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk

Fix overflows in benchmarks (#11649) @elstehle

Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt

Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt

Update zfill to match Python output (#11634) @davidwendt

Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346

Fix host scalars construction of nested types (#11612) @galipremsagar

Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt

Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar

Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller

Add is_timestamp test for leap second (60) (#11594) @davidwendt

Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar

Fix exception in segmented-reduce benchmark (#11588) @davidwendt

Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule

Correct distribution data type in quantiles benchmark (#11584) @vuule

Fix multibyte_split benchmark for host buffers (#11583) @upsj

xfail custreamz display test for now (#11567) @shwina

Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe

Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar

Fix groupby failures in dask_cudf CI (#11561) @rjzamora

Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian

find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard

Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346

Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian

Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt

Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia

Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice

Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar

Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard

Handle some zero-sized corner cases in dlpack interop (#11449) @wence-

Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule

libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard

Fix regex quantifier check to include capture groups (#11373) @davidwendt

Fix read_text when byte_range is aligned with field (#11371) @upsj

Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt

column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

Update guide-to-udfs notebook (#11861) @brandon-b-miller

Update docstring for cudf.read_text (#11799) @GregoryKimball

Add doc section for list & struct handling (#11770) @galipremsagar

Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard

Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt

Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller

Enable more Pydocstyle rules (#11582) @bdice

Remove unused cpp/img folder (#11554) @davidwendt

Publish C++ developer docs (#11475) @vyasr

Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar

Update contributing doc to include links to the developer guides (#11390) @davidwendt

Fix table_view_base doxygen format (#11340) @davidwendt

Create main developer guide for Python (#11235) @vyasr

Add developer documentation for benchmarking (#11122) @vyasr

cuDF error handling document (#7917) @isVoid

🚀 New Features

Add hasNull statistic reading ability to ORC (#11747) @devavret

Add istitle to string UDFs (#11738) @brandon-b-miller

JSON Column creation in GPU (#11714) @karthikeyann

Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle

Add BGZIP data_chunk_reader (#11652) @upsj

Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks

changing version of cmake to 3.23.3 (#11619) @hyperbolic2346

Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life

Generic type casting to support the new nested JSON reader (#11613) @elstehle

JSON tree traversal (#11610) @karthikeyann

Add casting operators to masked UDFs (#11578) @brandon-b-miller

Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle

Add strings 'like' function (#11558) @davidwendt

Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt

Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule

Adds support for json lines format to the nested JSON reader (#11534) @elstehle

Adding optional parquet reader schema (#11524) @hyperbolic2346

Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann

Add gdb pretty-printers for simple types (#11499) @upsj

Add create_random_column function to the data generator (#11490) @vuule

Add fluent API builder to data_profile (#11479) @vuule

Adds Nested Json benchmark (#11466) @karthikeyann

Convert thrust::optional usages to std::optional (#11455) @robertmaynard

Python API for the future experimental JSON reader (#11426) @vuule

Return schema info from JSON reader (#11419) @vuule

Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt

Truncate parquet column indexes (#11403) @etseidl

Adds the end-to-end JSON parser implementation (#11388) @elstehle

Use the new JSON parser when the experimental reader is selected (#11364) @vuule

Add placeholder for the experimental JSON reader (#11334) @vuule

Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller

Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian

Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard

Adds JSON tokenizer (#11264) @elstehle

List lexicographic comparator (#11129) @devavret

Add generic type inference for cuIO (#11121) @PointKernel

Fully support nested types in cudf::contains (#10656) @ttnghia

Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

Pin dask and distributed for release (#11822) @galipremsagar

Add examples for Nested JSON reader (#11814) @GregoryKimball

Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora

Update strings udf version updater script (#11772) @galipremsagar

Remove kwargs in read_csv & to_csv (#11762) @galipremsagar

Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar

Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar

Add ability to construct ListColumn when size is None (#11745) @galipremsagar

Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle

Add missing copyright headers. (#11712) @bdice

Fix copyright check issues in pre-commit (#11711) @bdice

Include decimal in supported types for range window order-by columns (#11710) @mythrocks

Disable very large column gtest for contiguous-split (#11706) @davidwendt

Drop split_out=None test from groupby.agg (#11704) @wence-

Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall

Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt

Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers

Special-case multibyte_split for single-byte delimiter (#11681) @upsj

Remove isort exclusions (#11680) @bdice

Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel

Check conda recipe headers with pre-commit (#11669) @bdice

Remove redundant style check for clang-format. (#11668) @bdice

Add support for group_keys in groupby (#11659) @galipremsagar

Fix pandoc pinning. (#11658) @bdice

Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec

Update git metadata (#11647) @bdice

Call set_null_count on a returning column if null-count is known (#11646) @davidwendt

Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt

Update to mypy 0.971 (#11640) @wence-

Refactor strings strip functor to details header (#11635) @davidwendt

Fix incorrect nullCount in get_json_object (#11633) @trxcllnt

Simplify hostdevice_vector (#11631) @upsj

Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel

Rework contains_scalar to check nulls at runtime (#11622) @davidwendt

Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks

Upgrade pandas to 1.5 (#11617) @galipremsagar

Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt

Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel

Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice

Use stream in Java API. (#11601) @bdice

Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice

Improve ORC writer benchmark with nvbench (#11598) @PointKernel

Tune multibyte_split kernel (#11587) @upsj

Move split_utils.cuh to strings/detail (#11585) @davidwendt

Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia

Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl

Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora

Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt

Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora

Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar

Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj

JNI support for writing binary columns in parquet (#11556) @revans2

Support additional dictionary bit widths in Parquet writer (#11547) @etseidl

Refactor string/numeric conversion utilities (#11545) @davidwendt

Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346

Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel

Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice

Add hexadecimal value separators (#11527) @bdice

Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar

Struct support for NULL_EQUALS binary operation (#11520) @rwlee

Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]

Fix Feather test warning. (#11511) @bdice

copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard

Upgrade to arrow-9.x (#11507) @galipremsagar

Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec

Single-pass multibyte_split (#11500) @upsj

Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam

Unpin dask and distributed for development (#11492) @galipremsagar

Move SparkMurmurHash3_32 functor. (#11489) @bdice

Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt

Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar

Add reduction distinct_count benchmark (#11473) @ttnghia

Add groupby nunique aggregation benchmark (#11472) @ttnghia

Disable Arrow S3 support by default. (#11470) @bdice

Add groupby max aggregation benchmark (#11464) @ttnghia

Extract Dremel encoding code from Parquet (#11461) @vyasr

Add missing Thrust #includes. (#11457) @bdice

Make CMake hooks verbose (#11456) @vyasr

Control Parquet page size through Python API (#11454) @etseidl

Add control of Parquet column index creation to python (#11453) @etseidl

Remove unused is_struct trait. (#11450) @bdice

Refactor the Buffer class (#11447) @madsbk

Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt

Update to Thrust 1.17.0 (#11437) @bdice

Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2

Convert byte_array_view to use std::byte (#11424) @hyperbolic2346

Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam

Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice

Add Spark list hashing Java tests (#11379) @bdice

Move cmake to the build section. (#11376) @vyasr

Remove use of CUDA driver API calls from libcudf (#11370) @shwina

Add column constructor from device_uvector&& (#11356) @SrikarVanavasam

Remove unused custreamz thirdparty directory (#11343) @vyasr

Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi

Enable using upstream jitify2 (#11287) @shwina

Cache cudf.Scalar (#11246) @shwina

Remove deprecated Series.applymap. (#11031) @bdice

Remove deprecated expand parameter from str.findall. (#11030) @bdice

v22.10.00(Oct 12, 2022)
🚨 Breaking Changes

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule

Disable nvCOMP DEFLATE integration (#11811) @vuule

Fix return type of Index.isna & Index.notna (#11769) @galipremsagar

Remove kwargs in read_csv & to_csv (#11762) @galipremsagar

Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia

Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt

Update zfill to match Python output (#11634) @davidwendt

Upgrade pandas to 1.5 (#11617) @galipremsagar

Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar

Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt

Adding optional parquet reader schema (#11524) @hyperbolic2346

Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar

Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec

Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar

Disable Arrow S3 support by default. (#11470) @bdice

Convert thrust::optional usages to std::optional (#11455) @robertmaynard

Remove unused is_struct trait. (#11450) @bdice

Refactor the Buffer class (#11447) @madsbk

Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule

Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt

Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice

Use the new JSON parser when the experimental reader is selected (#11364) @vuule

Remove deprecated Series.applymap. (#11031) @bdice

Remove deprecated expand parameter from str.findall. (#11030) @bdice

🐛 Bug Fixes

Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina

Handle ptx file paths during strings_udf import (#11862) @galipremsagar

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule

Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller

Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora

Fix is_valid checks in Scalar._binaryop (#11818) @wence-

Fix operator NotImplemented issue with numpy (#11816) @galipremsagar

Disable nvCOMP DEFLATE integration (#11811) @vuule

Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller

Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora

Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt

Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller

Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec

Fix return type of Index.isna & Index.notna (#11769) @galipremsagar

Fix issue with set-item in case of list and struct types (#11760) @galipremsagar

Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr

Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora

Fix ORC string sum statistics (#11740) @vuule

Add strings_udf package for python 3.9 (#11730) @brandon-b-miller

Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr

Don't assume stream is a compile-time constant expression (#11725) @vyasr

Fix get_thrust.cmake format at patch command (#11715) @davidwendt

Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia

Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt

Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule

Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar

Fix compile error due to missing header (#11697) @ttnghia

Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule

Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar

Transfer correct dtype to exploded column (#11687) @wence-

Ignore protobuf generated files in mypy checks (#11685) @galipremsagar

Maintain the index name after .loc (#11677) @shwina

Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar

Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard

Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard

Fix multi-file remote datasource bug (#11655) @rjzamora

Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt

Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk

Fix overflows in benchmarks (#11649) @elstehle

Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt

Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt

Update zfill to match Python output (#11634) @davidwendt

Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346

Fix host scalars construction of nested types (#11612) @galipremsagar

Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt

Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar

Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller

Add is_timestamp test for leap second (60) (#11594) @davidwendt

Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar

Fix exception in segmented-reduce benchmark (#11588) @davidwendt

Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule

Correct distribution data type in quantiles benchmark (#11584) @vuule

Fix multibyte_split benchmark for host buffers (#11583) @upsj

xfail custreamz display test for now (#11567) @shwina

Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe

Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar

Fix groupby failures in dask_cudf CI (#11561) @rjzamora

Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian

find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard

Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346

Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian

Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt

Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia

Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice

Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar

Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard

Handle some zero-sized corner cases in dlpack interop (#11449) @wence-

Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule

libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard

Fix regex quantifier check to include capture groups (#11373) @davidwendt

Fix read_text when byte_range is aligned with field (#11371) @upsj

Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt

column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

Update guide-to-udfs notebook (#11861) @brandon-b-miller

Update docstring for cudf.read_text (#11799) @GregoryKimball

Add doc section for list & struct handling (#11770) @galipremsagar

Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard

Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt

Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller

Enable more Pydocstyle rules (#11582) @bdice

Remove unused cpp/img folder (#11554) @davidwendt

Publish C++ developer docs (#11475) @vyasr

Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar

Update contributing doc to include links to the developer guides (#11390) @davidwendt

Fix table_view_base doxygen format (#11340) @davidwendt

Create main developer guide for Python (#11235) @vyasr

Add developer documentation for benchmarking (#11122) @vyasr

cuDF error handling document (#7917) @isVoid

🚀 New Features

Add hasNull statistic reading ability to ORC (#11747) @devavret

Add istitle to string UDFs (#11738) @brandon-b-miller

JSON Column creation in GPU (#11714) @karthikeyann

Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle

Add BGZIP data_chunk_reader (#11652) @upsj

Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks

changing version of cmake to 3.23.3 (#11619) @hyperbolic2346

Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life

Generic type casting to support the new nested JSON reader (#11613) @elstehle

JSON tree traversal (#11610) @karthikeyann

Add casting operators to masked UDFs (#11578) @brandon-b-miller

Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle

Add strings 'like' function (#11558) @davidwendt

Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt

Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule

Adds support for json lines format to the nested JSON reader (#11534) @elstehle

Adding optional parquet reader schema (#11524) @hyperbolic2346

Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann

Add gdb pretty-printers for simple types (#11499) @upsj

Add create_random_column function to the data generator (#11490) @vuule

Add fluent API builder to data_profile (#11479) @vuule

Adds Nested Json benchmark (#11466) @karthikeyann

Convert thrust::optional usages to std::optional (#11455) @robertmaynard

Python API for the future experimental JSON reader (#11426) @vuule

Return schema info from JSON reader (#11419) @vuule

Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt

Truncate parquet column indexes (#11403) @etseidl

Adds the end-to-end JSON parser implementation (#11388) @elstehle

Use the new JSON parser when the experimental reader is selected (#11364) @vuule

Add placeholder for the experimental JSON reader (#11334) @vuule

Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller

Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian

Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard

Adds JSON tokenizer (#11264) @elstehle

List lexicographic comparator (#11129) @devavret

Add generic type inference for cuIO (#11121) @PointKernel

Fully support nested types in cudf::contains (#10656) @ttnghia

Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

Pin dask and distributed for release (#11822) @galipremsagar

Add examples for Nested JSON reader (#11814) @GregoryKimball

Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora

Update strings udf version updater script (#11772) @galipremsagar

Remove kwargs in read_csv & to_csv (#11762) @galipremsagar

Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar

Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar

Add ability to construct ListColumn when size is None (#11745) @galipremsagar

Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle

Add missing copyright headers. (#11712) @bdice

Fix copyright check issues in pre-commit (#11711) @bdice

Include decimal in supported types for range window order-by columns (#11710) @mythrocks

Disable very large column gtest for contiguous-split (#11706) @davidwendt

Drop split_out=None test from groupby.agg (#11704) @wence-

Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall

Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt

Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers

Special-case multibyte_split for single-byte delimiter (#11681) @upsj

Remove isort exclusions (#11680) @bdice

Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel

Check conda recipe headers with pre-commit (#11669) @bdice

Remove redundant style check for clang-format. (#11668) @bdice

Add support for group_keys in groupby (#11659) @galipremsagar

Fix pandoc pinning. (#11658) @bdice

Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec

Update git metadata (#11647) @bdice

Call set_null_count on a returning column if null-count is known (#11646) @davidwendt

Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt

Update to mypy 0.971 (#11640) @wence-

Refactor strings strip functor to details header (#11635) @davidwendt

Fix incorrect nullCount in get_json_object (#11633) @trxcllnt

Simplify hostdevice_vector (#11631) @upsj

Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel

Rework contains_scalar to check nulls at runtime (#11622) @davidwendt

Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks

Upgrade pandas to 1.5 (#11617) @galipremsagar

Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt

Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel

Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice

Use stream in Java API. (#11601) @bdice

Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice

Improve ORC writer benchmark with nvbench (#11598) @PointKernel

Tune multibyte_split kernel (#11587) @upsj

Move split_utils.cuh to strings/detail (#11585) @davidwendt

Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia

Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl

Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora

Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt

Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora

Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar

Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj

JNI support for writing binary columns in parquet (#11556) @revans2

Support additional dictionary bit widths in Parquet writer (#11547) @etseidl

Refactor string/numeric conversion utilities (#11545) @davidwendt

Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346

Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel

Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice

Add hexadecimal value separators (#11527) @bdice

Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar

Struct support for NULL_EQUALS binary operation (#11520) @rwlee

Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]

Fix Feather test warning. (#11511) @bdice

copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard

Upgrade to arrow-9.x (#11507) @galipremsagar

Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec

Single-pass multibyte_split (#11500) @upsj

Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam

Unpin dask and distributed for development (#11492) @galipremsagar

Move SparkMurmurHash3_32 functor. (#11489) @bdice

Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt

Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar

Add reduction distinct_count benchmark (#11473) @ttnghia

Add groupby nunique aggregation benchmark (#11472) @ttnghia

Disable Arrow S3 support by default. (#11470) @bdice

Add groupby max aggregation benchmark (#11464) @ttnghia

Extract Dremel encoding code from Parquet (#11461) @vyasr

Add missing Thrust #includes. (#11457) @bdice

Make CMake hooks verbose (#11456) @vyasr

Control Parquet page size through Python API (#11454) @etseidl

Add control of Parquet column index creation to python (#11453) @etseidl

Remove unused is_struct trait. (#11450) @bdice

Refactor the Buffer class (#11447) @madsbk

Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt

Update to Thrust 1.17.0 (#11437) @bdice

Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2

Convert byte_array_view to use std::byte (#11424) @hyperbolic2346

Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam

Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice

Add Spark list hashing Java tests (#11379) @bdice

Move cmake to the build section. (#11376) @vyasr

Remove use of CUDA driver API calls from libcudf (#11370) @shwina

Add column constructor from device_uvector&& (#11356) @SrikarVanavasam

Remove unused custreamz thirdparty directory (#11343) @vyasr

Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi

Enable using upstream jitify2 (#11287) @shwina

Cache cudf.Scalar (#11246) @shwina

Remove deprecated Series.applymap. (#11031) @bdice

Remove deprecated expand parameter from str.findall. (#11030) @bdice

v22.08.01 (Sep 29, 2022)
🚨 Breaking Changes

Pin numpy to <1.23 (#11824) @galipremsagar

Remove legacy join APIs (#11274) @vyasr

Remove lists::drop_list_duplicates (#11236) @ttnghia

Remove Index.replace API (#11131) @vyasr

Remove deprecated Index methods from Frame (#11073) @vyasr

Remove public API of cudf.merge_sorted. (#11032) @bdice

Drop python 3.7 in code-base (#11029) @galipremsagar

Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule

Remove Arrow CUDA IPC code (#10995) @shwina

Buffer: make .ptr read-only (#10872) @madsbk

🐛 Bug Fixes

Fix out-of-bound access in cudf::detail::label_segments (#11497) @ttnghia

Fix distributed error related to loop_in_thread (#11428) @galipremsagar

Fix atomic operations on NaN values (#11420) @ttnghia

Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14

Revert "Allow CuPy 11" (#11409) @jakirkham

Fix moto timeouts (#11369) @galipremsagar

Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia

Fix memory_usage() for ListSeries (#11355) @thomcom

Fix constructing Column from column_view with expired mask (#11354) @shwina

Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec

Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar

Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt

Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia

Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar

Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333

Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia

Fix issue related to numpy array and category dtype (#11282) @galipremsagar

Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr

Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec

Returns DataFrame When Concatenating Along Axis 1 (#11263) @isVoid

Fix compile error due to missing header (#11257) @ttnghia

Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec

Fix tests/rolling/empty_input_test (#11238) @ttnghia

Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia

Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule

Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule

Fix cumulative count index behavior (#11188) @brandon-b-miller

Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora

Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life

Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr

Ensure cuco export set is installed in cmake build (#11147) @jlowe

Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar

Fix compile error due to missing header (#11126) @ttnghia

Fix __cuda_array_interface__ failures (#11113) @galipremsagar

Support octal and hex within regex character class pattern (#11112) @davidwendt

Fix split_re matching logic for word boundaries (#11106) @davidwendt

Handle multiple files metadata in read_parquet (#11105) @galipremsagar

Fix index alignment for Series objects with repeated index (#11103) @shwina

FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard

Fix regex word boundary logic to include underline (#11099) @davidwendt

Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe

Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar

Maintain the input index in the result of a groupby-transform (#11068) @shwina

Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec

Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt

Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi

Fix warn_unused_result error in parquet test (#11026) @karthikeyann

Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule

Fix small error in page row count limiting (#10991) @etseidl

Fix a row index entry error in ORC writer issue (#10989) @vuule

Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice
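
The entry on identity values for min and max (#11357) rests on a general property of reductions: the seed must leave every input unchanged, which for floating-point `min` is +infinity and for `max` is -infinity. The sketch below shows that property in plain Python. It is not the libcudf device code, and the `0.0` seed is only a hypothetical example of a wrong identity; the release notes do not say what value was used before the fix.

```python
import math
from functools import reduce


def reduce_min(values):
    # +inf is the identity for min: min(+inf, x) == x for any non-NaN float x.
    return reduce(min, values, math.inf)


def reduce_max(values):
    # -inf is the identity for max: max(-inf, x) == x for any non-NaN float x.
    return reduce(max, values, -math.inf)


positives = [3.0, 5.0]
negatives = [-3.0, -5.0]

correct_min = reduce_min(positives)
correct_max = reduce_max(negatives)

# A hypothetical wrong seed of 0.0 silently replaces the true minimum.
wrong_min = reduce(min, positives, 0.0)

print(correct_min, correct_max, wrong_min)
```

With the infinite seeds, an empty input reduces to the identity itself, so callers can tell "no data" apart from a real extreme value.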

📖 Documentation

Defer loading of custom.js (#11465) @galipremsagar

Fix issues with day & night modes in python docs (#11400) @galipremsagar

Update missing data handling APIs in docs (#11345) @galipremsagar

Add lists filtering APIs to doxygen group. (#11336) @bdice

Remove unused import in README sample (#11318) @vyasr

Note null behavior in where docs (#11276) @brandon-b-miller

Update docstring for spans in get_row_data_range (#11271) @vyasr

Update nvCOMP integration table (#11231) @vuule

Add dev docs for documentation writing (#11217) @vyasr

Documentation fix for concatenate (#11187) @dagardner-nv

Fix unresolved links in markdown (#11173) @karthikeyann

Fix cudf version in README.md install commands (#11164) @jvanstraten

Switch language from None to "en" in docs build (#11133) @galipremsagar

Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice

Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar

Add docs to rolling var, std, count. (#11035) @bdice

Fix docs for Numba UDFs. (#11020) @bdice

Replace column comparison utilities functions with macros (#11007) @karthikeyann

Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann

Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann

Fix Doxygen warnings in table header files (#10964) @karthikeyann

Fix Doxygen warnings in column header files (#10963) @karthikeyann

Fix Doxygen warnings in strings / header files (#10937) @karthikeyann

Generate Doxygen Tag File for Libcudf (#10932) @isVoid

Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann

Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann

Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann

Fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann

Fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann

Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann

Add missing documentation in aggregation.hpp (#10887) @karthikeyann

Revise PR template. (#10774) @bdice

🚀 New Features

Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14

Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346

Adding byte array view structure (#11322) @hyperbolic2346

Adding byte_array statistics (#11303) @hyperbolic2346

Add column indexes to Parquet writer (#11302) @etseidl

Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid

FST benchmark (#11243) @karthikeyann

Adds the Finite-State Transducer algorithm (#11242) @elstehle

Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia

Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333

Add 24 bit dictionary support to Parquet writer (#11216) @devavret

Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang

JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks

Add JNI bindings for extractAllRecord (#11196) @anthony-chang

Add cudf.options (#11193) @isVoid

Add thrift support for parquet column and offset indexes (#11178) @etseidl

Adding binary read/write as options for parquet (#11160) @hyperbolic2346

Support nth_element for window functions (#11158) @mythrocks

Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia

Implement Groupby pct_change (#11144) @skirui-source

Add JNI for set operations (#11143) @ttnghia

Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333

Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri

Feature/python benchmarking (#11125) @vyasr

Support nan_equality in cudf::distinct (#11118) @ttnghia

Added JNI for getMapValueForKeys (#11104) @razajafri

Refactor semi_anti_join (#11100) @ttnghia

Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333

Adds the Logical Stack algorithm (#11078) @elstehle

Add doxygen-check pre-commit hook (#11076) @karthikeyann

Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule

Add Doxygen CI check (#11057) @karthikeyann

Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia

Support set operations (#11043) @ttnghia

Support for ZLIB compression in ORC writer (#11036) @vuule

Adding feature swaplevels (#11027) @VamsiTallam95

Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule

Function for bfill, ffill #9591 (#11022) @Sreekiran096

Generate group offsets from element labels (#11017) @ttnghia

Feature axes (#10979) @VamsiTallam95

Generate group labels from offsets (#10945) @ttnghia

Add missing cuIO benchmark coverage for duration types (#10933) @vuule

Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller

Reindex Improvements (#10815) @brandon-b-miller

Implement value_counts for DataFrame (#10813) @martinfalisse
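
The "Implement Groupby pct_change" entry (#11144) adds a grouped percent-change operation. The sketch below illustrates what such an operation computes, following the pandas definition that cuDF's API is modeled on: for each row, the fractional change from the previous row in the same group, with no value for the first row of each group. It is plain Python rather than cuDF code, the function name `grouped_pct_change` is hypothetical, and it assumes a period of one row and nonzero previous values.

```python
def grouped_pct_change(keys, values):
    """Per-group fractional change from the previous row of the same group.

    The first row of each group has no predecessor and yields None.
    Assumes previous values are nonzero (no division-by-zero handling).
    """
    last_seen = {}  # most recent value observed for each group key
    out = []
    for key, value in zip(keys, values):
        if key in last_seen:
            prev = last_seen[key]
            out.append((value - prev) / prev)
        else:
            out.append(None)
        last_seen[key] = value
    return out


keys = ["a", "a", "b", "a", "b"]
values = [10.0, 15.0, 4.0, 30.0, 5.0]
result = grouped_pct_change(keys, values)
print(result)  # [None, 0.5, None, 1.0, 0.25]
```

The output keeps the original row order, which mirrors how pandas aligns the result of a grouped transform with the input index.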

🛠️ Improvements

Pin numpy to <1.23 (#11824) @galipremsagar

Make Index Join Tests on Default Precisions Deterministic (#11451) @isVoid

Pin dask & distributed for release (#11433) @galipremsagar

Use documented header template for doxygen (#11430) @galipremsagar

Relax arrow version in dev env (#11418) @galipremsagar

Added Java bindings for Parquet options for binary read (#11410) @razajafri

Allow CuPy 11 (#11393) @jakirkham

Improve multibyte_split performance (#11347) @cwharris

Switch death test to use explicit trap. (#11326) @vyasr

Add --output-on-failure to ctest args. (#11321) @vyasr

Consolidate remaining DataFrame/Series APIs (#11315) @vyasr

Add JNI support for the join_strings API (#11309) @revans2

Add cupy version to setup.py install_requires (#11306) @vyasr

Removing some unused code (#11305) @hyperbolic2346

Add test of wildcard selection (#11300) @vyasr

Update parquet reader to take stream parameter (#11294) @PointKernel

Spark list hashing (#11292) @bdice

Remove legacy join APIs (#11274) @vyasr

Fix cudf recipes syntax (#11273) @ajschmidt8

Fix cudf recipe (#11267) @ajschmidt8

Cleanup config files (#11266) @vyasr

Run mypy on all packages (#11265) @vyasr

Update to isort 5.10.1. (#11262) @vyasr

Consolidate flake8 and pydocstyle configuration (#11260) @vyasr

Remove redundant black config specifications. (#11258) @vyasr

Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-

Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec

Move rolling impl details to detail/ directory. (#11250) @mythrocks

Remove lists::drop_list_duplicates (#11236) @ttnghia

Use cudf::lists::distinct in Python binding (#11234) @ttnghia

Use cudf::lists::distinct in Java binding (#11233) @ttnghia

Use cudf::distinct in Java binding (#11232) @ttnghia

Pin dask-cuda in dev environment (#11229) @galipremsagar

Remove cruft in map_lookup (#11221) @mythrocks

Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar

Remove Frame._index (#11210) @vyasr

Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia

Document why Development component is needed for CMake. (#11200) @vyasr

Clean up unused code in rolling_test.hpp (#11195) @karthikeyann

Standardize join internals around DataFrame (#11184) @vyasr

Move character case table declarations from src to detail (#11183) @davidwendt

Remove usage of Frame in StringMethods (#11181) @vyasr

Expose get_json_object_options to Python (#11180) @SrikarVanavasam

Fix decimal128 stats in parquet writer (#11179) @etseidl

Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl

Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling

Refactor and optimize Frame.where (#11168) @vyasr

Add npos const static member to cudf::string_view (#11166) @davidwendt

Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr

Clean up _copy_type_metadata (#11156) @vyasr

Add nvcc conda package in dev environment (#11154) @galipremsagar

Struct binary comparison op functionality for spark rapids (#11153) @rwlee

Refactor inline conditionals. (#11151) @bdice

Refactor Spark hashing tests (#11145) @bdice

Add new _from_data_like_self factory (#11140) @vyasr

Update get_cucollections to use rapids-cmake (#11139) @vyasr

Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr

Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam

Remove Index.replace API (#11131) @vyasr

Move char-type table function declarations from src to detail (#11127) @davidwendt

Clean up repo root (#11124) @bdice

Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec

Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt

Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice

Take iterators by value in clamp.cu. (#11084) @bdice

Performance improvements for row to column conversions (#11075) @hyperbolic2346

Remove deprecated Index methods from Frame (#11073) @vyasr

Use per-page max compressed size estimate for compression (#11066) @devavret

Column to row refactor for performance (#11063) @hyperbolic2346

Include skbuild directory into build.sh clean operation (#11060) @galipremsagar

Unpin dask & distributed for development (#11058) @galipremsagar

Add support for Series.between (#11051) @galipremsagar

Fix groupby include (#11046) @bwyogatama

Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt

Remove public API of cudf.merge_sorted. (#11032) @bdice

Drop python 3.7 in code-base (#11029) @galipremsagar

Addition & integration of the integer power operator (#11025) @AtlantaPepsi

Refactor lists::contains (#11019) @ttnghia

Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr

Clean up parquet unit test (#11005) @PointKernel

Add missing #pragma once to header files (#11004) @karthikeyann

Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia

Refactor cudf::contains (#10997) @ttnghia

Remove Arrow CUDA IPC code (#10995) @shwina

Change file extension for groupby benchmark (#10985) @ttnghia

Sort recipe include checks. (#10984) @bdice

Update cuCollections for thrust upgrade (#10983) @PointKernel

Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora

Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt

Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam

Fix license families to match all-caps expected by conda-verify. (#10931) @bdice

Include <optional> for GCC 11 compatibility. (#10927) @bdice

Enable builds with scikit-build (#10919) @vyasr

Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel

Update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi

Improve the capture of fatal cuda error (#10884) @sperlingxx

Cleanup regex compiler operators and operands source (#10879) @davidwendt

Buffer: make .ptr read-only (#10872) @madsbk

Configurable NaN handling in device_row_comparators (#10870) @rwlee

Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller

Upgrade to arrow-8 (#10816) @galipremsagar

Remove getattr method in RangeIndex class (#10538) @skirui-source

Adding bins to value counts (#8247) @marlenezw

v22.08.00 (Aug 17, 2022)
🚨 Breaking Changes

Remove legacy join APIs (#11274) @vyasr

Remove lists::drop_list_duplicates (#11236) @ttnghia

Remove Index.replace API (#11131) @vyasr

Remove deprecated Index methods from Frame (#11073) @vyasr

Remove public API of cudf.merge_sorted. (#11032) @bdice

Drop python 3.7 in code-base (#11029) @galipremsagar

Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule

Remove Arrow CUDA IPC code (#10995) @shwina

Buffer: make .ptr read-only (#10872) @madsbk

🐛 Bug Fixes

Fix distributed error related to loop_in_thread (#11428) @galipremsagar

Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14

Revert "Allow CuPy 11" (#11409) @jakirkham

Fix moto timeouts (#11369) @galipremsagar

Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia

Fix memory_usage() for ListSeries (#11355) @thomcom

Fix constructing Column from column_view with expired mask (#11354) @shwina

Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec

Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar

Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt

Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia

Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar

Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333

Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia

Fix issue related to numpy array and category dtype (#11282) @galipremsagar

Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr

Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec

Returns DataFrame When Concatenating Along Axis 1 (#11263) @isVoid

Fix compile error due to missing header (#11257) @ttnghia

Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec

Fix tests/rolling/empty_input_test (#11238) @ttnghia

Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia

Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule

Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule

Fix cumulative count index behavior (#11188) @brandon-b-miller

Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora

Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life

Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr

Ensure cuco export set is installed in cmake build (#11147) @jlowe

Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar

Fix compile error due to missing header (#11126) @ttnghia

Fix __cuda_array_interface__ failures (#11113) @galipremsagar

Support octal and hex within regex character class pattern (#11112) @davidwendt

Fix split_re matching logic for word boundaries (#11106) @davidwendt

Handle multiple files metadata in read_parquet (#11105) @galipremsagar

Fix index alignment for Series objects with repeated index (#11103) @shwina

FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard

Fix regex word boundary logic to include underline (#11099) @davidwendt

Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe

Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar

Maintain the input index in the result of a groupby-transform (#11068) @shwina

Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec

Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt

Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi

Fix warn_unused_result error in parquet test (#11026) @karthikeyann

Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule

Fix small error in page row count limiting (#10991) @etseidl

Fix a row index entry error in ORC writer issue (#10989) @vuule

Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice

📖 Documentation

Fix issues with day & night modes in python docs (#11400) @galipremsagar

Update missing data handling APIs in docs (#11345) @galipremsagar

Add lists filtering APIs to doxygen group. (#11336) @bdice

Remove unused import in README sample (#11318) @vyasr

Note null behavior in where docs (#11276) @brandon-b-miller

Update docstring for spans in get_row_data_range (#11271) @vyasr

Update nvCOMP integration table (#11231) @vuule

Add dev docs for documentation writing (#11217) @vyasr

Documentation fix for concatenate (#11187) @dagardner-nv

Fix unresolved links in markdown (#11173) @karthikeyann

Fix cudf version in README.md install commands (#11164) @jvanstraten

Switch language from None to "en" in docs build (#11133) @galipremsagar

Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice

Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar

Add docs to rolling var, std, count. (#11035) @bdice

Fix docs for Numba UDFs. (#11020) @bdice

Replace column comparison utilities functions with macros (#11007) @karthikeyann

Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann

Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann

Fix Doxygen warnings in table header files (#10964) @karthikeyann

Fix Doxygen warnings in column header files (#10963) @karthikeyann

Fix Doxygen warnings in strings / header files (#10937) @karthikeyann

Generate Doxygen Tag File for Libcudf (#10932) @isVoid

Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann

Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann

Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann

Fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann

Fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann

Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann

Add missing documentation in aggregation.hpp (#10887) @karthikeyann

Revise PR template. (#10774) @bdice

🚀 New Features

Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14

Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346

Adding byte array view structure (#11322) @hyperbolic2346

Adding byte_array statistics (#11303) @hyperbolic2346

Add column indexes to Parquet writer (#11302) @etseidl

Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid

FST benchmark (#11243) @karthikeyann

Adds the Finite-State Transducer algorithm (#11242) @elstehle

Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia

Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333

Add 24 bit dictionary support to Parquet writer (#11216) @devavret

Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang

JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks

Add JNI bindings for extractAllRecord (#11196) @anthony-chang

Add cudf.options (#11193) @isVoid

Add thrift support for parquet column and offset indexes (#11178) @etseidl

Adding binary read/write as options for parquet (#11160) @hyperbolic2346

Support nth_element for window functions (#11158) @mythrocks

Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia

Implement Groupby pct_change (#11144) @skirui-source

Add JNI for set operations (#11143) @ttnghia

Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333

Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri

Feature/python benchmarking (#11125) @vyasr

Support nan_equality in cudf::distinct (#11118) @ttnghia

Added JNI for getMapValueForKeys (#11104) @razajafri

Refactor semi_anti_join (#11100) @ttnghia

Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333

Adds the Logical Stack algorithm (#11078) @elstehle

Add doxygen-check pre-commit hook (#11076) @karthikeyann

Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule

Add Doxygen CI check (#11057) @karthikeyann

Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia

Support set operations (#11043) @ttnghia

Support for ZLIB compression in ORC writer (#11036) @vuule

Adding feature swaplevels (#11027) @VamsiTallam95

Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule

Function for bfill, ffill #9591 (#11022) @Sreekiran096

Generate group offsets from element labels (#11017) @ttnghia

Feature axes (#10979) @VamsiTallam95

Generate group labels from offsets (#10945) @ttnghia

Add missing cuIO benchmark coverage for duration types (#10933) @vuule

Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller

Reindex Improvements (#10815) @brandon-b-miller

Implement value_counts for DataFrame (#10813) @martinfalisse

🛠️ Improvements

Pin dask & distributed for release (#11433) @galipremsagar

Use documented header template for doxygen (#11430) @galipremsagar

Relax arrow version in dev env (#11418) @galipremsagar

Allow CuPy 11 (#11393) @jakirkham

Improve multibyte_split performance (#11347) @cwharris

Switch death test to use explicit trap. (#11326) @vyasr

Add --output-on-failure to ctest args. (#11321) @vyasr

Consolidate remaining DataFrame/Series APIs (#11315) @vyasr

Add JNI support for the join_strings API (#11309) @revans2

Add cupy version to setup.py install_requires (#11306) @vyasr

Removing some unused code (#11305) @hyperbolic2346

Add test of wildcard selection (#11300) @vyasr

Update parquet reader to take stream parameter (#11294) @PointKernel

Spark list hashing (#11292) @bdice

Remove legacy join APIs (#11274) @vyasr

Fix cudf recipes syntax (#11273) @ajschmidt8

Fix cudf recipe (#11267) @ajschmidt8

Cleanup config files (#11266) @vyasr

Run mypy on all packages (#11265) @vyasr

Update to isort 5.10.1. (#11262) @vyasr

Consolidate flake8 and pydocstyle configuration (#11260) @vyasr

Remove redundant black config specifications. (#11258) @vyasr

Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-

Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec

Move rolling impl details to detail/ directory. (#11250) @mythrocks

Remove lists::drop_list_duplicates (#11236) @ttnghia

Use cudf::lists::distinct in Python binding (#11234) @ttnghia

Use cudf::lists::distinct in Java binding (#11233) @ttnghia

Use cudf::distinct in Java binding (#11232) @ttnghia

Pin dask-cuda in dev environment (#11229) @galipremsagar

Remove cruft in map_lookup (#11221) @mythrocks

Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar

Remove Frame._index (#11210) @vyasr

Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia

Document why Development component is needed for CMake. (#11200) @vyasr

cleanup unused code in rolling_test.hpp (#11195) @karthikeyann

Standardize join internals around DataFrame (#11184) @vyasr

Move character case table declarations from src to detail (#11183) @davidwendt

Remove usage of Frame in StringMethods (#11181) @vyasr

Expose get_json_object_options to Python (#11180) @SrikarVanavasam

Fix decimal128 stats in parquet writer (#11179) @etseidl

Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl

Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling

Refactor and optimize Frame.where (#11168) @vyasr

Add npos const static member to cudf::string_view (#11166) @davidwendt

Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr

Clean up _copy_type_metadata (#11156) @vyasr

Add nvcc conda package in dev environment (#11154) @galipremsagar

Struct binary comparison op functionality for spark rapids (#11153) @rwlee

Refactor inline conditionals. (#11151) @bdice

Refactor Spark hashing tests (#11145) @bdice

Add new _from_data_like_self factory (#11140) @vyasr

Update get_cucollections to use rapids-cmake (#11139) @vyasr

Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr

Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam

Remove Index.replace API (#11131) @vyasr

Move char-type table function declarations from src to detail (#11127) @davidwendt

Clean up repo root (#11124) @bdice

Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec

Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt

Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice

Take iterators by value in clamp.cu. (#11084) @bdice

Performance improvements for row to column conversions (#11075) @hyperbolic2346

Remove deprecated Index methods from Frame (#11073) @vyasr

Use per-page max compressed size estimate for compression (#11066) @devavret

column to row refactor for performance (#11063) @hyperbolic2346

Include skbuild directory into build.sh clean operation (#11060) @galipremsagar

Unpin dask & distributed for development (#11058) @galipremsagar

Add support for Series.between (#11051) @galipremsagar

Fix groupby include (#11046) @bwyogatama

Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt

Remove public API of cudf.merge_sorted. (#11032) @bdice

Drop python 3.7 in code-base (#11029) @galipremsagar

Addition & integration of the integer power operator (#11025) @AtlantaPepsi

Refactor lists::contains (#11019) @ttnghia

Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr

Clean up parquet unit test (#11005) @PointKernel

Add missing #pragma once to header files (#11004) @karthikeyann

Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia

Refactor cudf::contains (#10997) @ttnghia

Remove Arrow CUDA IPC code (#10995) @shwina

Change file extension for groupby benchmark (#10985) @ttnghia

Sort recipe include checks. (#10984) @bdice

Update cuCollections for thrust upgrade (#10983) @PointKernel

Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora

Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt

Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam

Fix license families to match all-caps expected by conda-verify. (#10931) @bdice

Include <optional> for GCC 11 compatibility. (#10927) @bdice

Enable builds with scikit-build (#10919) @vyasr

Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel

update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi

Improve the capture of fatal cuda error (#10884) @sperlingxx

Cleanup regex compiler operators and operands source (#10879) @davidwendt

Buffer: make .ptr read-only (#10872) @madsbk

Configurable NaN handling in device_row_comparators (#10870) @rwlee

Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller

Upgrade to arrow-8 (#10816) @galipremsagar

Remove getattr method in RangeIndex class (#10538) @skirui-source

Adding bins to value counts (#8247) @marlenezw

v22.10.00a(Nov 3, 2022)
🔗 Links

Development Branch

Compare with main branch

🚨 Breaking Changes

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule

Disable nvCOMP DEFLATE integration (#11811) @vuule

Fix return type of Index.isna & Index.notna (#11769) @galipremsagar

Remove kwargs in read_csv & to_csv (#11762) @galipremsagar

Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia

Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt

Update zfill to match Python output (#11634) @davidwendt

Upgrade pandas to 1.5 (#11617) @galipremsagar

Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar

Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt

Adding optional parquet reader schema (#11524) @hyperbolic2346

Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar

Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec

Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar

Disable Arrow S3 support by default. (#11470) @bdice

Convert thrust::optional usages to std::optional (#11455) @robertmaynard

Remove unused is_struct trait. (#11450) @bdice

Refactor the Buffer class (#11447) @madsbk

Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule

Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt

Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice

Use the new JSON parser when the experimental reader is selected (#11364) @vuule

Remove deprecated Series.applymap. (#11031) @bdice

Remove deprecated expand parameter from str.findall. (#11030) @bdice
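
The zfill entry above (#11634) says the output now matches Python. As a point of reference only, the sketch below shows how Python's built-in `str.zfill` pads strings. It uses plain Python strings rather than cuDF, and the sample values are illustrative assumptions, not taken from the pull request.

```python
# Reference behaviour of Python's built-in str.zfill (not cuDF itself).
# The sample values are made up for illustration.
samples = ["42", "-42", "+7", "abc", "12345"]

# Pad each sample to width 5. A leading sign stays in front of the zeros,
# and strings already at least as long as the width are returned unchanged.
padded = [s.zfill(5) for s in samples]

for original, result in zip(samples, padded):
    print(f"{original!r:>8} -> {result!r}")
```

Code that relied on the earlier cuDF padding of signed strings may want to compare its results against this reference behaviour after upgrading.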

🐛 Bug Fixes

Force using old fmt in nvbench. (#12064) @vyasr

Update cuda-python dependency to 11.7.1 (#11994) @shwina

Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina

Handle ptx file paths during strings_udf import (#11862) @galipremsagar

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule

Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller

Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora

Fix is_valid checks in Scalar._binaryop (#11818) @wence-

Fix operator NotImplemented issue with numpy (#11816) @galipremsagar

Disable nvCOMP DEFLATE integration (#11811) @vuule

Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller

Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora

Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt

Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller

Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec

Fix return type of Index.isna & Index.notna (#11769) @galipremsagar

Fix issue with set-item in case of list and struct types (#11760) @galipremsagar

Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr

Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora

Fix ORC string sum statistics (#11740) @vuule

Add strings_udf package for python 3.9 (#11730) @brandon-b-miller

Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr

Don't assume stream is a compile-time constant expression (#11725) @vyasr

Fix get_thrust.cmake format at patch command (#11715) @davidwendt

Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia

Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt

Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule

Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar

Fix compile error due to missing header (#11697) @ttnghia

Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule

Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar

Transfer correct dtype to exploded column (#11687) @wence-

Ignore protobuf generated files in mypy checks (#11685) @galipremsagar

Maintain the index name after .loc (#11677) @shwina

Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar

Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard

Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard

Fix multi-file remote datasource bug (#11655) @rjzamora

Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt

Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk

fixes overflows in benchmarks (#11649) @elstehle

Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt

Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt

Update zfill to match Python output (#11634) @davidwendt

Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346

Fix host scalars construction of nested types (#11612) @galipremsagar

Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt

Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar

Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller

Add is_timestamp test for leap second (60) (#11594) @davidwendt

Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar

Fix exception in segmented-reduce benchmark (#11588) @davidwendt

Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule

Correct distribution data type in quantiles benchmark (#11584) @vuule

Fix multibyte_split benchmark for host buffers (#11583) @upsj

xfail custreamz display test for now (#11567) @shwina

Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe

Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar

Fix groupby failures in dask_cudf CI (#11561) @rjzamora

Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian

find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard

Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346

Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian

Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt

Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia

Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice

Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar

Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard

Handle some zero-sized corner cases in dlpack interop (#11449) @wence-

Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule

libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard

Fix regex quantifier check to include capture groups (#11373) @davidwendt

Fix read_text when byte_range is aligned with field (#11371) @upsj

Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt

column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

Update guide-to-udfs notebook (#11861) @brandon-b-miller

Update docstring for cudf.read_text (#11799) @GregoryKimball

Add doc section for list & struct handling (#11770) @galipremsagar

Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard

Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt

Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller

Enable more Pydocstyle rules (#11582) @bdice

Remove unused cpp/img folder (#11554) @davidwendt

Publish C++ developer docs (#11475) @vyasr

Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar

Update contributing doc to include links to the developer guides (#11390) @davidwendt

Fix table_view_base doxygen format (#11340) @davidwendt

Create main developer guide for Python (#11235) @vyasr

Add developer documentation for benchmarking (#11122) @vyasr

cuDF error handling document (#7917) @isVoid

🚀 New Features

Add hasNull statistic reading ability to ORC (#11747) @devavret

Add istitle to string UDFs (#11738) @brandon-b-miller

JSON Column creation in GPU (#11714) @karthikeyann

Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle

Add BGZIP data_chunk_reader (#11652) @upsj

Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks

changing version of cmake to 3.23.3 (#11619) @hyperbolic2346

Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life

Generic type casting to support the new nested JSON reader (#11613) @elstehle

JSON tree traversal (#11610) @karthikeyann

Add casting operators to masked UDFs (#11578) @brandon-b-miller

Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle

Add strings 'like' function (#11558) @davidwendt

Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt

Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule

Adds support for json lines format to the nested JSON reader (#11534) @elstehle

Adding optional parquet reader schema (#11524) @hyperbolic2346

Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann

Add gdb pretty-printers for simple types (#11499) @upsj

Add create_random_column function to the data generator (#11490) @vuule

Add fluent API builder to data_profile (#11479) @vuule

Adds Nested Json benchmark (#11466) @karthikeyann

Convert thrust::optional usages to std::optional (#11455) @robertmaynard

Python API for the future experimental JSON reader (#11426) @vuule

Return schema info from JSON reader (#11419) @vuule

Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt

Truncate parquet column indexes (#11403) @etseidl

Adds the end-to-end JSON parser implementation (#11388) @elstehle

Use the new JSON parser when the experimental reader is selected (#11364) @vuule

Add placeholder for the experimental JSON reader (#11334) @vuule

Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller

Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian

Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard

Adds JSON tokenizer (#11264) @elstehle

List lexicographic comparator (#11129) @devavret

Add generic type inference for cuIO (#11121) @PointKernel

Fully support nested types in cudf::contains (#10656) @ttnghia

Support nested types in lists::contains (#10548) @ttnghia

🛠️ Improvements

Pin dask and distributed for release (#11822) @galipremsagar

Add examples for Nested JSON reader (#11814) @GregoryKimball

Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora

Update strings udf version updater script (#11772) @galipremsagar

Remove kwargs in read_csv & to_csv (#11762) @galipremsagar

Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar

Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar

Add ability to construct ListColumn when size is None (#11745) @galipremsagar

Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle

Add missing copyright headers. (#11712) @bdice

Fix copyright check issues in pre-commit (#11711) @bdice

Include decimal in supported types for range window order-by columns (#11710) @mythrocks

Disable very large column gtest for contiguous-split (#11706) @davidwendt

Drop split_out=None test from groupby.agg (#11704) @wence-

Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall

Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt

Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers

Special-case multibyte_split for single-byte delimiter (#11681) @upsj

Remove isort exclusions (#11680) @bdice

Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel

Check conda recipe headers with pre-commit (#11669) @bdice

Remove redundant style check for clang-format. (#11668) @bdice

Add support for group_keys in groupby (#11659) @galipremsagar

Fix pandoc pinning. (#11658) @bdice

Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec

Update git metadata (#11647) @bdice

Call set_null_count on a returning column if null-count is known (#11646) @davidwendt

Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt

Update to mypy 0.971 (#11640) @wence-

Refactor strings strip functor to details header (#11635) @davidwendt

Fix incorrect nullCount in get_json_object (#11633) @trxcllnt

Simplify hostdevice_vector (#11631) @upsj

Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel

Rework contains_scalar to check nulls at runtime (#11622) @davidwendt

Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks

Upgrade pandas to 1.5 (#11617) @galipremsagar

Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt

Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel

Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice

Use stream in Java API. (#11601) @bdice

Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice

Improve ORC writer benchmark with nvbench (#11598) @PointKernel

Tune multibyte_split kernel (#11587) @upsj

Move split_utils.cuh to strings/detail (#11585) @davidwendt

Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia

Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl

Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora

Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt

Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora

Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar

Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj

JNI support for writing binary columns in parquet (#11556) @revans2

Support additional dictionary bit widths in Parquet writer (#11547) @etseidl

Refactor string/numeric conversion utilities (#11545) @davidwendt

Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346

Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel

Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice

Add hexadecimal value separators (#11527) @bdice

Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar

Struct support for NULL_EQUALS binary operation (#11520) @rwlee

Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]

Fix Feather test warning. (#11511) @bdice

copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard

Upgrade to arrow-9.x (#11507) @galipremsagar

Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec

Single-pass multibyte_split (#11500) @upsj

Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam

Unpin dask and distributed for development (#11492) @galipremsagar

Move SparkMurmurHash3_32 functor. (#11489) @bdice

Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt

Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar

Add reduction distinct_count benchmark (#11473) @ttnghia

Add groupby nunique aggregation benchmark (#11472) @ttnghia

Disable Arrow S3 support by default. (#11470) @bdice

Add groupby max aggregation benchmark (#11464) @ttnghia

Extract Dremel encoding code from Parquet (#11461) @vyasr

Add missing Thrust #includes. (#11457) @bdice

Make CMake hooks verbose (#11456) @vyasr

Control Parquet page size through Python API (#11454) @etseidl

Add control of Parquet column index creation to python (#11453) @etseidl

Remove unused is_struct trait. (#11450) @bdice

Refactor the Buffer class (#11447) @madsbk

Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt

Update to Thrust 1.17.0 (#11437) @bdice

Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2

Convert byte_array_view to use std::byte (#11424) @hyperbolic2346

Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam

Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice

Add Spark list hashing Java tests (#11379) @bdice

Move cmake to the build section. (#11376) @vyasr

Remove use of CUDA driver API calls from libcudf (#11370) @shwina

Add column constructor from device_uvector&& (#11356) @SrikarVanavasam

Remove unused custreamz thirdparty directory (#11343) @vyasr

Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi

Enable using upstream jitify2 (#11287) @shwina

Cache cudf.Scalar (#11246) @shwina

Remove deprecated Series.applymap. (#11031) @bdice

Remove deprecated expand parameter from str.findall. (#11030) @bdice

v22.06.01(Jul 6, 2022)

v22.06.01
v22.06.00(Jun 7, 2022)
🚨 Breaking Changes

Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule

Rename sliced_child to get_sliced_child. (#10885) @bdice

Add parameters to control page size in Parquet writer (#10882) @etseidl

Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec

Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt

Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia

Generic serialization of all column types (#10784) @wence-

Return per-file metadata from readers (#10782) @vuule

HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov

Update groupby::hash to use new row operators for keys (#10770) @PointKernel

update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann

Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice

Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar

Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr

Add default= kwarg to .list.get() accessor method (#10547) @shwina

Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule

Support nvCOMP 2.3 if local, otherwise use nvCOMP 2.2 (#10513) @robertmaynard

Fix findall_record to return empty list for no matches (#10491) @davidwendt

Namespace/Docstring Fixes for Reduction (#10471) @isVoid

Additional refactoring of hash functions (#10462) @bdice

Fix default value of str.split expand parameter. (#10457) @bdice

Remove deprecated code. (#10450) @vyasr

🐛 Bug Fixes

Fix single column MultiIndex issue in sort_index (#10957) @galipremsagar

Make SerializedTableHeader(numRows) public (#10949) @gerashegalov

Fix gcc_linux version pinning in dev environment (#10943) @galipremsagar

Fix an issue with reading raw string in cudf.read_json (#10924) @galipremsagar

Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec

Fix segmented_reduce on empty column with non-empty offsets (#10876) @davidwendt

Fix dask-cudf groupby handling when grouping by all columns (#10866) @charlesbluca

Fix a bug in distinct: using nested nulls logic (#10848) @PointKernel

Fix constness / references in weak ordering operator() signatures. (#10846) @bdice

Suppress sizeof-array-div warnings in thrust found by gcc-11 (#10840) @robertmaynard

Add handling for string by-columns in dask-cudf groupby (#10830) @charlesbluca

Fix compile warning in search.cu (#10827) @davidwendt

Fix element access const correctness in hostdevice_vector (#10804) @vuule

Update cuco git tag (#10788) @PointKernel

HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov

Fixing deprecation warnings in test_orc.py (#10772) @hyperbolic2346

Enable writing to s3 storage in chunked parquet writer (#10769) @galipremsagar

Fix construction of nested structs with EMPTY child (#10761) @shwina

Fix replace error when regex has only zero match quantifiers (#10760) @davidwendt

Fix an issue with one_level_list schemas in parquet reader. (#10750) @nvdbaranec

update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann

Fix cupy function in notebook (#10737) @ajschmidt8

Fix fillna to retain columns when it is MultiIndex (#10729) @galipremsagar

Fix scatter for all-empty-string column case (#10724) @davidwendt

Retain series name in Series.apply (#10716) @brandon-b-miller

Correct build dir cudf-config dependency issues for static builds (#10704) @robertmaynard

Fix list of testing requirements in setup.py. (#10678) @bdice

Fix rounding to zero error in stod on very small float numbers (#10672) @davidwendt

cuco isn't a cudf dependency when we are built shared (#10662) @robertmaynard

Fix to_timestamps to support Z for %z format specifier (#10617) @davidwendt

Verify compression type in Parquet reader (#10610) @vuule

Fix struct row comparator's exception on empty structs (#10604) @sperlingxx

Fix strings strip() to accept only str Scalar for to_strip parameter (#10597) @davidwendt

Fix has_atomic_support check in can_use_hash_groupby() (#10588) @jbrennan333

Revert Thrust 1.16 to Thrust 1.15 (#10586) @bdice

Fix missing RMM_STATIC_CUDART define when compiling JNI with static CUDA runtime (#10585) @jlowe

pin more cmake versions (#10570) @robertmaynard

Re-enable Build Metrics Report (#10562) @davidwendt

Remove statically linked CUDA runtime check in Java build (#10532) @jlowe

Fix temp data cleanup in test_text.py (#10524) @brandon-b-miller

Update pre-commit to run black 22.3.0 (#10523) @vyasr

Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule

Fix findall_record to return empty list for no matches (#10491) @davidwendt

Allow users to specify data types for a subset of columns in read_csv (#10484) @vuule

Fix default value of str.split expand parameter. (#10457) @bdice

Improve coverage of dask-cudf's groupby aggregation, add tests for dropna support (#10449) @charlesbluca

Allow string aggs for dask_cudf.CudfDataFrameGroupBy.aggregate (#10222) @charlesbluca

In-place updates with loc or iloc don't work correctly when the LHS has more than one column (#9918) @skirui-source

📖 Documentation

Clarify append deprecation notice. (#10930) @bdice

Use full name of GPUDirect Storage SDK in docs (#10904) @vuule

Update Dask + Pandas to Dask + cuDF path (#10897) @miguelusque

Add missing documentation in cudf/types.hpp (#10895) @karthikeyann

Add strong index iterator docs. (#10888) @bdice

spell check fixes (#10865) @karthikeyann

Add missing documentation in scalar/ headers (#10861) @karthikeyann

Remove typo in ngram documentation (#10859) @miguelusque

fix doxygen warnings (#10842) @karthikeyann

Add a library_design.md file documenting the core Python data structures and their relationship (#10817) @vyasr

Add NumPy to intersphinx references. (#10809) @bdice

Add a section to the docs that compares cuDF with Pandas (#10796) @shwina

Mention 2 cpp-reviewer requirement in pull request template (#10768) @davidwendt

Enable pydocstyle for all packages. (#10759) @bdice

Enable pydocstyle rules involving quotes (#10748) @vyasr

Revise 10 minutes notebook. (#10738) @bdice

Reorganize cuDF Python docs (#10691) @shwina

Fix sphinx/jupyter heading issue in UDF notebook (#10690) @brandon-b-miller

Migrated user guide notebooks to MyST-NB and added sphinx extension (#10685) @mmccarty

add data generation to benchmark documentation (#10677) @karthikeyann

Fix some docs build warnings (#10674) @galipremsagar

Update UDF notebook in User Guide. (#10668) @bdice

Improve User Guide docs (#10663) @bdice

Fix some docstrings formatting (#10660) @galipremsagar

Remove implementation details from apply docstrings (#10651) @brandon-b-miller

Revise CONTRIBUTING.md (#10644) @bdice

Add missing APIs to documentation. (#10643) @bdice

Use cudf.read_json as documented API name. (#10640) @bdice

Fix docstring section headings. (#10639) @bdice

Document cudf.read_text and cudf.read_avro. (#10638) @bdice

Fix typo in docstring for json_reader_options (#10627) @dagardner-nv

Update guide to UDFs with notes about Series.applymap deprecation and related changes (#10607) @brandon-b-miller

Fix doxygen Modules page for cudf::lists::sequences (#10561) @davidwendt

Add Replace Backreferences section to Regex Features page (#10560) @davidwendt

Introduce deprecation policy to developer guide. (#10252) @vyasr

🚀 New Features

Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule

Handle nested types in cudf::concatenate_rows() (#10890) @nvdbaranec

Strong index types for equality comparator (#10883) @ttnghia

Add parameters to control page size in Parquet writer (#10882) @etseidl

Support for Zstandard decompression in ORC reader (#10873) @vuule

Use pre-built nvcomp 2.3 binaries by default (#10851) @robertmaynard

Support for Zstandard decompression in Parquet reader (#10847) @vuule

Add JNI support for apply_boolean_mask (#10812) @res-life

Segmented Min/Max for Fixed Point Types (#10794) @isVoid

Return per-file metadata from readers (#10782) @vuule

Segmented apply_boolean_mask for LIST columns (#10773) @mythrocks

Update groupby::hash to use new row operators for keys (#10770) @PointKernel

Support purging non-empty null elements from LIST/STRING columns (#10701) @mythrocks

Add detail::hash_join (#10695) @PointKernel

Persist string statistics data across multiple calls to orc chunked write (#10694) @hyperbolic2346

Add .list.astype() to cast list leaves to specified dtype (#10693) @shwina

JNI: Add generateListOffsets API (#10683) @sperlingxx

Support args in groupby apply (#10682) @brandon-b-miller

Enable segmented_gather in Java package (#10669) @sperlingxx

Add row hasher with nested column support (#10641) @devavret

Add support for numeric_only in DataFrame._reduce (#10629) @martinfalisse

First step toward statistics in ORC files with chunked writes (#10567) @hyperbolic2346

Add support for struct columns to the random table generator (#10566) @vuule

Enable passing a sequence for the index argument to .list.get() (#10564) @shwina

Add python bindings for cudf::list::index_of (#10549) @ChrisJar

Add default= kwarg to .list.get() accessor method (#10547) @shwina

Add cudf.DataFrame.applymap (#10542) @brandon-b-miller

Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard

Add column field ID control in parquet writer (#10504) @PointKernel

Deprecate Series.applymap (#10497) @brandon-b-miller

Add option to drop cache in cuIO benchmarks (#10488) @vuule

move benchmark input generation in device in reduction nvbench (#10486) @karthikeyann

Support Segmented Min/Max Reduction on String Type (#10447) @isVoid

List element Equality comparator (#10289) @devavret

Implement all methods of groupby rank aggregation in libcudf, python (#9569) @karthikeyann

Implement DataFrame.eval using libcudf ASTs (#8022) @vyasr

🛠️ Improvements

Use conda compilers in env file (#10915) @galipremsagar

Remove C style artifacts in cuIO (#10886) @vuule

Rename sliced_child to get_sliced_child. (#10885) @bdice

Replace defaulted stream value for libcudf APIs that use NVCOMP (#10877) @jbrennan333

Add more unit tests for cudf::distinct for nested types with sliced input (#10860) @ttnghia

Changing list_view.cuh to list_view.hpp (#10854) @ttnghia

More error checking in from_dlpack (#10850) @wence-

Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt

Adds the JNI call for Cuda.deviceSynchronize (#10839) @abellina

Add missing cuda-python dependency to cudf (#10833) @bdice

Change std::string parameters in cudf::strings APIs to std::string_view (#10832) @davidwendt

Split up search.cu to improve compile time (#10831) @davidwendt

Add tests for null scalar binaryops (#10828) @brandon-b-miller

Cleanup regex compile optimize functions (#10825) @davidwendt

Use ThreadedMotoServer instead of subprocess in spinning up s3 server (#10822) @galipremsagar

Import NA from missing rather than using cudf.NA everywhere (#10821) @brandon-b-miller

Refactor regex builtin character-class identifiers (#10814) @davidwendt

Change pattern parameter for regex APIs from std::string to std::string_view (#10810) @davidwendt

Make the JNI API to get list offsets as a view public. (#10807) @revans2

Add cudf JNI docker build github action (#10806) @pxLi

Removed mr parameter from inplace bitmask operations (#10805) @AtlantaPepsi

Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia

Handle closed property in IntervalDtype.from_pandas (#10798) @wence-

Return weak orderings from device_row_comparator. (#10793) @rwlee

Rework Scalar imports (#10791) @brandon-b-miller

Enable ccache for cudfjni build in Docker (#10790) @gerashegalov

Generic serialization of all column types (#10784) @wence-

simplifying skiprows test in test_orc.py (#10783) @hyperbolic2346

Use column_views instead of column_device_views in binary operations. (#10780) @bdice

Add struct utility functions. (#10776) @bdice

Add multiple rows to subword tokenizer benchmark (#10767) @davidwendt

Refactor host decompression in ORC reader (#10764) @vuule

Flush output streams before creating a process to drop caches (#10762) @vuule

Refactor binaryop/compiled/util.cpp (#10756) @bdice

Use warp per string for long strings in cudf::strings::contains() (#10739) @davidwendt

Use generator expressions in any/all functions. (#10736) @bdice

Use canonical "magic methods" (replace x.__repr__() with repr(x)). (#10735) @bdice

Improve use of isinstance. (#10734) @bdice

Rename tests from multiIndex to multiindex. (#10732) @bdice

Two-table comparators with strong index types (#10730) @bdice

Replace std::make_pair with std::pair (C++17 CTAD) (#10727) @karthikeyann

Use structured bindings instead of std::tie (#10726) @karthikeyann

Missing f prefix on f-strings fix (#10721) @code-review-doctor

Add max_file_size parameter to chunked parquet dataset writer (#10718) @galipremsagar

Deprecate merge_sorted, change dask cudf usage to internal method (#10713) @isVoid

Prepare dask_cudf test_parquet.py for upcoming API changes (#10709) @rjzamora

Remove or simplify various utility functions (#10705) @vyasr

Allow building arrow with parquet and not python (#10702) @revans2

Partial cuIO GPU decompression refactor (#10699) @vuule

Cython API refactor: merge.pyx (#10698) @isVoid

Fix random string data length to become variable (#10697) @galipremsagar

Add bindings for index_of with column search key (#10696) @ChrisJar

Deprecate index merging (#10689) @vyasr

Remove cudf::strings::string namespace (#10684) @davidwendt

Standardize imports. (#10680) @bdice

Standardize usage of collections.abc. (#10679) @bdice

Cython API Refactor: transpose.pyx, sort.pyx (#10675) @isVoid

Add device_memory_resource parameter to create_string_vector_from_column (#10673) @davidwendt

Split up mixed-join kernels source files (#10671) @davidwendt

Use std::filesystem for temporary directory location and deletion (#10664) @vuule

cleanup benchmark includes (#10661) @karthikeyann

Use upstream clang-format pre-commit hook. (#10659) @bdice

Clean up C++ includes to use <> instead of "". (#10658) @bdice

Handle RuntimeError thrown by CUDA Python in validate_setup (#10653) @shwina

Rework JNI CMake to leverage rapids_find_package (#10649) @jlowe

Use conda to build python packages during GPU tests (#10648) @Ethyling

Deprecate various functions that don't need to be defined for Index. (#10647) @vyasr

Update pinning to allow newer CMake versions. (#10646) @vyasr

Bump hadoop-common from 3.1.4 to 3.2.3 in /java (#10645) @dependabot[bot]

Remove concurrent_unordered_multimap. (#10642) @bdice

Improve parquet dictionary encoding (#10635) @PointKernel

Improve cudf::cuda_error (#10630) @sperlingxx

Add support for null and non-numeric types in Series.diff and DataFrame.diff (#10625) @Matt711

Branch 22.06 merge 22.04 (#10624) @vyasr

Unpin dask & distributed for development (#10623) @galipremsagar

Slightly improve accuracy of stod in to_floats (#10622) @davidwendt

Allow libcudfjni to be built as a static library (#10619) @jlowe

Change stack-based regex state data to use global memory (#10600) @davidwendt

Resolve Forward merging of branch-22.04 into branch-22.06 (#10598) @galipremsagar

KvikIO as an alternative GDS backend (#10593) @madsbk

Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice

Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar

Refactor binary ops for timedelta and datetime columns (#10581) @vyasr

Refactor cudf::strings::count_re API to use count_matches utility (#10580) @davidwendt

Update Programming Language :: Python Versions to 3.8 & 3.9 (#10579) @madsbk

Automate Java cudf jar build with statically linked dependencies (#10578) @gerashegalov

Add patch for thrust-cub 1.16 to fix sort compile times (#10577) @davidwendt

Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr

Cleanup libcudf strings regex classes (#10573) @davidwendt

Simplify preprocessing of arguments for DataFrame binops (#10563) @vyasr

Reduce kernel calls to build strings findall results (#10559) @davidwendt

Forward-merge branch-22.04 to branch-22.06 (#10557) @bdice

Update strings contains benchmark to measure varying match rates (#10555) @davidwendt

JNI: throw CUDA errors more specifically (#10551) @sperlingxx

Enable building static libs (#10545) @trxcllnt

Remove pip requirements files. (#10543) @bdice

Remove Click pinnings that are unnecessary after upgrading black. (#10541) @vyasr

Refactor memory_usage to improve performance (#10537) @galipremsagar

Adjust the valid range of group index for replace_with_backrefs (#10530) @sperlingxx

add accidentally removed comment. (#10526) @vyasr

Update conda environment. (#10525) @vyasr

Remove ColumnBase.getitem (#10516) @vyasr

Optimize left_semi_join by materializing the gather mask (#10511) @cheinger

Define proper binary operation APIs for columns (#10509) @vyasr

Upgrade arrow-cpp & pyarrow to 7.0.0 (#10503) @galipremsagar

Update to Thrust 1.16 (#10489) @bdice

Namespace/Docstring Fixes for Reduction (#10471) @isVoid

Update cudfjni 22.06.0-SNAPSHOT (#10467) @pxLi

Use Lists of Columns for Various Files (#10463) @isVoid

Additional refactoring of hash functions (#10462) @bdice

Fix Series.str.findall behavior for expand=False. (#10459) @bdice

Remove deprecated code. (#10450) @vyasr

Update cmake-format version. (#10440) @vyasr

Consolidate C++ conda recipes and add libcudf-tests package (#10326) @ajschmidt8

Use conda compilers (#10275) @Ethyling

Add row bitmask as a detail::hash_join member (#10248) @PointKernel

v22.04.00 (Apr 6, 2022)
🚨 Breaking Changes

Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice

Refactor stream compaction APIs (#10370) @PointKernel

Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec

Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar

Rewrites sample API (#10262) @isVoid

Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel

Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar

Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia

Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt

Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule

Remove deprecated code (#10124) @vyasr

Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice

Optimize compaction operations (#10030) @PointKernel

Remove deprecated method Series.set_index. (#9945) @bdice

Add cudf::strings::findall_record API (#9911) @davidwendt

Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

🐛 Bug Fixes

Fix an issue with tdigest merge aggregations. (#10506) @nvdbaranec

Batch of fixes for index overflows in grid stride loops. (#10448) @nvdbaranec

Update dask_cudf imports to be compatible with latest dask (#10442) @rlratzel

Fix for integer overflow in contiguous-split (#10437) @jbrennan333

Fix has_null predicate for drop_list_duplicates on nested structs (#10436) @sperlingxx

Fix empty reduce with List output and non-List input (#10435) @sperlingxx

Fix list and struct meta generation issue in dask-cudf (#10434) @galipremsagar

Fix error in cudf.to_numeric when a bool input is passed (#10431) @galipremsagar

Support cupy array in quantile input (#10429) @galipremsagar

Fix benchmarks to work with new aggregation types (#10428) @davidwendt

Fix cudf::shift to handle offset greater than column size (#10414) @davidwendt

Fix lifespan of the temporary directory that holds cuFile configuration file (#10403) @vuule

Fix error thrown in compiled-binaryop benchmark (#10398) @davidwendt

Limiting async allocator using alignment of 512 (#10395) @rongou

Include <optional> in multibyte split. (#10385) @bdice

Fix issue with column and scalar re-assignment (#10377) @galipremsagar

Fix floating point data generation in benchmarks (#10372) @vuule

Avoid overflow in fused_concatenate_kernel output_index (#10344) @abellina

Remove is_relationally_comparable for table device views (#10342) @davidwendt

Fix debug compile error in device_span to column_view conversion (#10331) @davidwendt

Add Pascal support to JCUDF transcode (row_conversion) (#10329) @mythrocks

Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317) @ttnghia

Fixes up the overflowed fixed-point round on nullable column (#10316) @sperlingxx

Fix DataFrame slicing issues for empty cases (#10310) @brandon-b-miller

Fix documentation issues (#10307) @ajschmidt8

Allow Java bindings to use default decimal precisions when writing columns (#10276) @sperlingxx

Fix incorrect slicing of GDS read/write calls (#10274) @vuule

Fix out-of-memory error in compiled-binaryop benchmark (#10269) @davidwendt

Add tests of reflected ufuncs and fix behavior of logical reflected ufuncs (#10261) @vyasr

Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel

Fix out-of-memory error in UrlDecode benchmark (#10258) @davidwendt

Fix groupby reductions that perform operations on source type instead of target type (#10250) @ttnghia

Fix small leak in explode (#10245) @revans2

Yet another small JNI memory leak (#10238) @revans2

Fix regex octal parsing to limit to 3 characters (#10233) @davidwendt

Fix string to decimal128 conversion handling large exponents (#10231) @davidwendt

Fix JNI leak on copy to device (#10229) @revans2

Fix the data generator element size for decimal types (#10225) @vuule

Fix decimal metadata in parquet writer (#10224) @galipremsagar

Fix strings handling of hex in regex pattern (#10220) @davidwendt

Fix docs builds (#10216) @ajschmidt8

Fix a leftover _has_nulls change from Nullate (#10211) @devavret

Fix bitmask of the output for JNI of lists::drop_list_duplicates (#10210) @ttnghia

Fix compile error in binaryop/compiled/util.cpp (#10209) @ttnghia

Skip ORC and Parquet readers' benchmark cases that are not currently supported (#10194) @vuule

Fix JNI leak of a cudf::column_view native class. (#10171) @revans2

Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar

Convert Column Name to String Before Using Struct Column Factory (#10156) @isVoid

Preserve the correct ListDtype while creating an identical empty column (#10151) @galipremsagar

benchmark fixture - static object pointer fix (#10145) @karthikeyann

Fix UDF Caching (#10133) @brandon-b-miller

Raise duplicate column error in DataFrame.rename (#10120) @galipremsagar

Fix flaky memory usage test by guaranteeing array size. (#10114) @vyasr

Encode values from python callback for C++ (#10103) @jdye64

Add check for regex instructions causing an infinite-loop (#10095) @davidwendt

Remove metadata singleton from nvtext normalizer (#10090) @davidwendt

Column equality testing fixes (#10011) @brandon-b-miller

Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9847) @charlesbluca

📖 Documentation

Fix documentation for DataFrame.corr and Series.corr. (#10493) @bdice

Add cut to API docs (#10479) @shwina

Remove documentation for methods removed in #10124. (#10366) @bdice

Fix documentation issues (#10306) @ajschmidt8

Fix fixed_point binary operation documentation (#10198) @codereport

Remove cleaned up methods from docs (#10189) @galipremsagar

Update developer guide to recommend no default stream parameter. (#10136) @bdice

Update benchmarking guide to use NVBench. (#10093) @bdice

🚀 New Features

Add StringIO support to read_text (#10465) @cwharris

Add support for tdigest and merge_tdigest aggregations through cudf::reduce (#10433) @nvdbaranec

JNI support for Collect Ops in Reduction (#10427) @sperlingxx

Enable read_text with dask_cudf using byte_range (#10407) @ChrisJar

Add cudf::stable_sort_by_key (#10387) @PointKernel

Implement maps_column_view abstraction over LIST<STRUCT<K,V>> (#10380) @mythrocks

Support Java bindings for Avro reader (#10373) @HaoYang670

Refactor stream compaction APIs (#10370) @PointKernel

Support collect aggregations in reduction (#10353) @sperlingxx

Refactor array_ufunc for Index and unify across all classes (#10346) @vyasr

Add JNI for extract_list_element with index column (#10341) @firestarman

Support min and max operations for structs in rolling window (#10332) @ttnghia

Add device create_sequence_table for benchmarks (#10300) @karthikeyann

Enable numpy ufuncs for DataFrame (#10287) @vyasr

move input generation for json benchmark to device (#10281) @karthikeyann

move input generation for type dispatcher benchmark to device (#10280) @karthikeyann

move input generation for copy benchmark to device (#10279) @karthikeyann

generate url decode benchmark input in device (#10278) @karthikeyann

device input generation in join bench (#10277) @karthikeyann

Add nvtext::byte_pair_encoding API (#10270) @davidwendt

Prevent internal usage of expensive APIs (#10263) @vyasr

Column to JCUDF row for tables with strings (#10235) @hyperbolic2346

Support percent_rank() aggregation (#10227) @mythrocks

Refactor Series.array_ufunc (#10217) @vyasr

Reduce pytest runtime (#10203) @brandon-b-miller

Add regex flags parameter to python cudf strings split (#10185) @davidwendt

Support for MOD, PMOD and PYMOD for decimal32/64/128 (#10179) @codereport

Adding string row size iterator for row to column and column to row conversion (#10157) @hyperbolic2346

Add file size counter to cuIO benchmarks (#10154) @vuule

byte_range support for multibyte_split/read_text (#10150) @cwharris

Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia

Add maxSplit parameter to Java binding for strings:split (#10137) @ttnghia

Add libcudf strings split API that accepts regex pattern (#10128) @davidwendt

generate benchmark input in device (#10109) @karthikeyann

Avoid nan_as_null op if nan_count is 0 (#10082) @galipremsagar

Add Dataframe and Index nunique (#10077) @martinfalisse

Support nanosecond timestamps in parquet (#10063) @PointKernel

Java bindings for mixed semi and anti joins (#10040) @jlowe

Implement mixed equality/conditional semi/anti joins (#10037) @vyasr

Optimize compaction operations (#10030) @PointKernel

Support args= in Series.apply (#9982) @brandon-b-miller

Add cudf::strings::findall_record API (#9911) @davidwendt

Add covariance for sort groupby (python) (#9889) @mayankanand007

Implement DataFrame diff() (#9817) @skirui-source

Implement DataFrame pct_change (#9805) @skirui-source

Support segmented reductions and null mask reductions (#9621) @isVoid

Add 'spearman' correlation method for dataframe.corr and series.corr (#7141) @dominicshanshan

🛠️ Improvements

Add scipy skip for a test (#10502) @galipremsagar

Temporarily disable new ops-bot functionality (#10496) @ajschmidt8

Include <cstddef> to fix compilation of parquet reader on GCC 11. (#10483) @bdice

Pin dask and distributed (#10481) @galipremsagar

MD5 refactoring. (#10445) @bdice

Remove or split up Frame methods that use the index (#10439) @vyasr

Centralization of tdigest aggregation code. (#10422) @nvdbaranec

Simplify column binary operations (#10421) @vyasr

Add .github/ops-bot.yaml config file (#10420) @ajschmidt8

Use list of columns for methods in Groupby.pyx (#10419) @isVoid

Remove warnings in test_timedelta.py (#10418) @galipremsagar

Fix some warnings in test_parquet.py (#10416) @galipremsagar

JNI support for segmented reduce (#10413) @revans2

Clean up null mask after purging null entries (#10412) @sperlingxx

Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice

Use str instead of builtins.str. (#10410) @bdice

Fix warnings in test_rolling (#10405) @bdice

Enable codecov github-check in CI (#10404) @galipremsagar

Fix warnings in test_cuda_apply, test_numerical, test_pickling, test_unaops. (#10402) @bdice

Set column names in _from_columns_like_self factory (#10400) @isVoid

Refactor nvtx annotations in cudf & dask-cudf (#10396) @galipremsagar

Consolidate .cov and .corr for sort groupby (#10386) @skirui-source

Consolidate some Frame APIs (#10381) @vyasr

Refactor hash functions and hash_combine (#10379) @bdice

Add nvtx annotations for Series and Index (#10374) @galipremsagar

Refactor filling.repeat API (#10371) @isVoid

Move standalone UTF8 functions from string_view.hpp to utf8.hpp (#10369) @davidwendt

Remove doc for deprecated function one_hot_encoding (#10367) @isVoid

Refactor array function (#10364) @vyasr

Fix warnings in test_csv.py. (#10362) @bdice

Implement a mixin for binops (#10360) @vyasr

Refactor cython interface: copying.pyx (#10359) @isVoid

Implement a mixin for scans (#10358) @vyasr

Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec

Add cleanup of python artifacts (#10355) @galipremsagar

Fix warnings in test_categorical.py. (#10354) @bdice

Create a dispatcher for invoking regex kernel functions (#10349) @davidwendt

Fix codecov in CI (#10347) @galipremsagar

Enable caching for memory_usage calculation in Column (#10345) @galipremsagar

C++17 cleanup: traits replace std::enable_if<>::type with std::enable_if_t (#10343) @karthikeyann

JNI: Support appending DECIMAL128 into ColumnBuilder in terms of byte array (#10338) @sperlingxx

multibyte_split test improvements (#10328) @vuule

Fix warnings in test_binops.py. (#10327) @bdice

Fix warnings from pandas in test_array_ufunc.py. (#10324) @bdice

Update upload script (#10321) @ajschmidt8

Move hash type declarations to hashing.hpp (#10320) @davidwendt

C++17 cleanup: traits replace ::value with _v (#10319) @karthikeyann

Remove internal columns usage (#10315) @vyasr

Remove extraneous build.sh parameter (#10313) @ajschmidt8

Add const qualifier to MurmurHash3_32::hash_combine (#10311) @davidwendt

Remove TODO in libcudf_kafka recipe (#10309) @ajschmidt8

Add conversions between column_view and device_span<T const>. (#10302) @bdice

Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar

Deprecate DataFrame.iteritems and introduce .items (#10298) @galipremsagar

Explicitly request CMake use gnu++17 over c++17 (#10297) @robertmaynard

Add copyright check as pre-commit hook. (#10290) @vyasr

DataFrame insert and creation optimizations (#10285) @galipremsagar

Improve hash join detail functions (#10273) @PointKernel

Replace custom cached_property implementation with functools (#10272) @shwina

Rewrites sample API (#10262) @isVoid

Bump hadoop-common from 3.1.0 to 3.1.4 in /java (#10259) @dependabot[bot]

Remove making redundant copy across code-base (#10257) @galipremsagar

Add more nvtx annotations (#10256) @galipremsagar

Add copyright check in cudf (#10253) @galipremsagar

Remove redundant copies in fillna to improve performance (#10241) @galipremsagar

Remove std::numeric_limit specializations for timestamp & durations (#10239) @codereport

Optimize DataFrame creation across code-base (#10236) @galipremsagar

Change pytest distribution algorithm and increase parallelism in CI (#10232) @galipremsagar

Add environment variables for I/O thread pool and slice sizes (#10218) @vuule

Add regex flags to strings findall functions (#10208) @davidwendt

Update dask-cudf parquet tests to reflect upstream bugfixes to _metadata (#10206) @charlesbluca

Remove unnecessary nunique function in Series. (#10205) @martinfalisse

Refactor DataFrame tests. (#10204) @bdice

Rewrites column.__setitem__, Use boolean_mask_scatter (#10202) @isVoid

Java utilities to aid in accelerating aggregations on 128-bit types (#10201) @jlowe

Fix docstrings alignment in Frame methods (#10199) @galipremsagar

Fix cuco pair issue in hash join (#10195) @PointKernel

Replace dask groupby .index usages with .by (#10193) @galipremsagar

Add regex flags to strings extract function (#10192) @davidwendt

Forward-merge branch-22.02 to branch-22.04 (#10191) @bdice

Add CMake install rule for tests (#10190) @ajschmidt8

Unpin dask & distributed (#10182) @galipremsagar

Add comments to explain test validation (#10176) @galipremsagar

Reduce warnings in pytest output (#10168) @bdice

Some consolidation of indexed frame methods (#10167) @vyasr

Refactor isin implementations (#10165) @vyasr

Faster struct row comparator (#10164) @devavret

Refactor groupby::get_groups. (#10161) @bdice

Deprecate decimal_cols_as_float in ORC reader (C++ layer) (#10152) @vuule

Replace ccache with sccache (#10146) @ajschmidt8

Murmur3 hash kernel cleanup (#10143) @rwlee

Deprecate decimal_cols_as_float in ORC reader (#10142) @galipremsagar

Run pyupgrade 2.31.0. (#10141) @bdice

Remove drop_nan from internal IndexedFrame._drop_na_rows. (#10140) @bdice

Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt

Update cmake-format script for branch 22.04. (#10132) @bdice

Accept r-value references in convert_table_for_return(): (#10131) @mythrocks

Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule

Remove deprecated code (#10124) @vyasr

Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice

Remove benchmarks suffix (#10112) @bdice

Update cudf java binding version to 22.04.0-SNAPSHOT (#10084) @pxLi

Remove unnecessary docker files. (#10069) @vyasr

Limit benchmark iterations using environment variable (#10060) @karthikeyann

Add timing chart for libcudf build metrics report page (#10038) @davidwendt

JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder (#10025) @sperlingxx

Reduce redundant code in CUDF JNI (#10019) @mythrocks

Make snappy decompress check more efficient (#9995) @cheinger

Remove deprecated method Series.set_index. (#9945) @bdice

Implement a mixin for reductions (#9925) @vyasr

JNI: Push back decimal utils from spark-rapids (#9907) @sperlingxx

Add assert_column_memory_* (#9882) @isVoid

Add CUDF_UNREACHABLE macro. (#9727) @bdice

Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

v22.02.00 (Feb 2, 2022)
🚨 Breaking Changes

ORC writer API changes for granular statistics (#10058) @mythrocks

decimal128 Support for to/from_arrow (#9986) @codereport

Remove deprecated method one_hot_encoding (#9977) @isVoid

Remove str.subword_tokenize (#9968) @VibhuJawa

Remove deprecated method parameter from merge and join. (#9944) @bdice

Remove deprecated method DataFrame.hash_columns. (#9943) @bdice

Remove deprecated method Series.hash_encode. (#9942) @bdice

Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007

Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar

Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt

Break tie for top categorical columns in Series.describe (#9867) @isVoid

Add partitioning support in parquet writer (#9810) @devavret

Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid

Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar

Change default dtype of all nulls column from float to object (#9803) @galipremsagar

Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller

Pick smallest decimal type with required precision in ORC reader (#9775) @vuule

Add decimal128 support to Parquet reader and writer (#9765) @vuule

Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe

Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule

Match pandas scalar result types in reductions (#9717) @brandon-b-miller

Add parameters to control row group size in Parquet writer (#9677) @vuule

Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice

Add support for decimal128 in cudf python (#9533) @galipremsagar

Implement lists::index_of() to find positions in list rows (#9510) @mythrocks

Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🐛 Bug Fixes

Add check for negative stripe index in ORC reader (#10074) @vuule

Update Java tests to expect DECIMAL128 from Arrow (#10073) @jlowe

Avoid index materialization when DataFrame is created with un-named Series objects (#10071) @galipremsagar

fix gcc 11 compilation errors (#10067) @rongou

Fix columns ordering issue in parquet reader (#10066) @galipremsagar

Fix dataframe setitem with ndarray types (#10056) @galipremsagar

Remove implicit copy due to conversion from cudf::size_type and size_t (#10045) @robertmaynard

Include <optional> in headers that use std::optional (#10044) @robertmaynard

Fix repr and concat of StructColumn (#10042) @galipremsagar

Include row group level stats when writing ORC files (#10041) @vuule

build.sh respects the --build_metrics and --incl_cache_stats flags (#10035) @robertmaynard

Fix memory leaks in JNI native code. (#10029) @mythrocks

Update JNI to use new arena mr constructor (#10027) @rongou

Fix null check when comparing structs in arg_min operation of reduction/groupby (#10026) @ttnghia

Wrap CI script shell variables in quotes to fix local testing. (#10018) @bdice

cudftestutil no longer propagates compiler flags to external users (#10017) @robertmaynard

Remove CUDA_DEVICE_CALLABLE macro usage (#10015) @hyperbolic2346

Add missing list filling header in meta.yaml (#10007) @devavret

Fix conda recipes for custreamz & cudf_kafka (#10003) @ajschmidt8

Fix matching regex word-boundary (\b) in strings replace (#9997) @davidwendt

Fix null check when comparing structs in min and max reduction/groupby operations (#9994) @ttnghia

Fix octal pattern matching in regex string (#9993) @davidwendt

decimal128 Support for to/from_arrow (#9986) @codereport

Fix groupby shift/diff/fill after selecting from a GroupBy (#9984) @shwina

Fix the overflow problem of decimal rescale (#9966) @sperlingxx

Use default value for decimal precision in parquet writer when not specified (#9963) @devavret

Fix cudf java build error. (#9958) @firestarman

Use gpuci_mamba_retry to install local artifacts. (#9951) @bdice

Fix regression HostColumnVectorCore requiring native libs (#9948) @jlowe

Rename aggregate_metadata in writer to fix name collision (#9938) @devavret

Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. (#9931) @nvdbaranec

Resolve racecheck errors in ORC kernels (#9916) @vuule

Fix the java build after parquet partitioning support (#9908) @revans2

Fix compilation of benchmark for parquet writer. (#9905) @bdice

Fix a memcheck error in ORC writer (#9896) @vuule

Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar

Fix fallback to sort aggregation for grouping only hash aggregate (#9891) @abellina

Add zlib to cudfjni link when using static libcudf library dependency (#9890) @jlowe

TimedeltaIndex constructor raises an AttributeError. (#9884) @skirui-source

Fix cudf.Scalar string datetime construction (#9875) @brandon-b-miller

Load libcufile.so with RTLD_NODELETE flag (#9872) @vuule

Break tie for top categorical columns in Series.describe (#9867) @isVoid

Fix null handling for structs min and arg_min in groupby, groupby scan, reduction, and inclusive_scan (#9864) @ttnghia

Add one-level list encoding support in parquet reader (#9848) @PointKernel

Fix an out-of-bounds read in validity copying in contiguous_split. (#9842) @nvdbaranec

Fix join of MultiIndex to Index with one column and overlapping name. (#9830) @vyasr

Fix caching in Series.applymap (#9821) @brandon-b-miller

Enforce boolean ascending for dask-cudf sort_values (#9814) @charlesbluca

Fix ORC writer crash with empty input columns (#9808) @vuule

Change default dtype of all nulls column from float to object (#9803) @galipremsagar

Load native dependencies when Java ColumnView is loaded (#9800) @jlowe

Fix dtype-argument bug in dask_cudf read_csv (#9796) @rjzamora

Fix overflow for min calculation in strings::from_timestamps (#9793) @revans2

Fix memory error due to lambda return type deduction limitation (#9778) @karthikeyann

Revert regex $/EOL end-of-string new-line special case handling (#9774) @davidwendt

Fix missing streams (#9767) @karthikeyann

Fix make_empty_scalar_like on list_type (#9759) @sperlingxx

Update cmake and conda to 22.02 (#9746) @devavret

Fix out-of-bounds memory write in decimal128-to-string conversion (#9740) @davidwendt

Match pandas scalar result types in reductions (#9717) @brandon-b-miller

Fix regex non-multiline EOL/$ matching strings ending with a new-line (#9715) @davidwendt

Fixed build by adding more checks for int8, int16 (#9707) @razajafri

Fix null handling when boolean dtype is passed (#9691) @galipremsagar

Fix stream usage in segmented_gather() (#9679) @mythrocks
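
Two entries in the list above (#9774 and #9715) concern how a non-multiline `$` treats a string that ends with a new-line. As a point of comparison only, the sketch below shows how Python's standard `re` module handles that case. It does not use cuDF and makes no claim about what libcudf does in any particular release.

```python
import re

text = "abc\n"

# In Python's re, `$` without re.MULTILINE matches at the very end of the
# string and also just before a single trailing new-line.
dollar_match = re.search(r"abc$", text) is not None

# `\Z` matches only at the very end of the string, so the trailing
# new-line prevents a match here.
strict_match = re.search(r"abc\Z", text) is not None

print(dollar_match, strict_match)
```

If the distinction matters to a workflow, it may be worth testing the pattern against a string with a trailing new-line on the cuDF version in use.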

📖 Documentation

Update decimal dtypes related docs entries (#10072) @galipremsagar

Fix regex doc describing hexadecimal escape characters (#10009) @davidwendt

Fix cudf compilation instructions. (#9956) @esoha-nvidia

Fix see also links for IO APIs (#9895) @galipremsagar

Fix build instructions for libcudf doxygen (#9837) @davidwendt

Fix some doxygen warnings and add missing documentation (#9770) @karthikeyann

update cuda version in local build (#9736) @karthikeyann

Fix doxygen for enum types in libcudf (#9724) @davidwendt

Spell check fixes (#9682) @karthikeyann

Fix links in C++ Developer Guide. (#9675) @bdice

🚀 New Features

Remove libcudacxx patch needed for nvcc 11.4 (#10057) @robertmaynard

Allow CuPy 10 (#10048) @jakirkham

Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016) @revans2

Add groupby.transform (only support for aggregations) (#10005) @shwina

Add partitioning support to Parquet chunked writer (#10000) @devavret

Add jni for sequences (#9972) @wbo4958

Java bindings for mixed left, inner, and full joins (#9941) @jlowe

Java bindings for JSON reader support (#9940) @wbo4958

Enable transpose for string columns in cudf python (#9937) @galipremsagar

Support structs for cudf::contains with column/scalar input (#9929) @ttnghia

Implement mixed equality/conditional joins (#9917) @vyasr

Add cudf::strings::extract_all API (#9909) @davidwendt

Implement JNI for cudf::scatter APIs (#9903) @ttnghia

JNI: Function to copy and set validity from bool column. (#9901) @mythrocks

Add dictionary support to cudf::copy_if_else (#9887) @davidwendt

add run_benchmarks target for running benchmarks with json output (#9879) @karthikeyann

Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt

Add add_suffix and add_prefix for DataFrames and Series (#9846) @mayankanand007

Add JNI for cudf::drop_duplicates (#9841) @ttnghia

Implement per-list sequence (#9839) @ttnghia

Add Series.transpose (#9835) @mayankanand007

Adding support for Series.autocorr (#9833) @mayankanand007

Support round operation on datetime64 datatypes (#9820) @mayankanand007

Add partitioning support in parquet writer (#9810) @devavret

Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar

Add decimal128 support to Parquet reader and writer (#9765) @vuule

Optimize groupby::scan (#9754) @PointKernel

Add sample JNI API (#9728) @res-life

Support min and max in inclusive scan for structs (#9725) @ttnghia

Add first and last method to IndexedFrame (#9710) @isVoid

Support min and max reduction for structs (#9697) @ttnghia

Add parameters to control row group size in Parquet writer (#9677) @vuule

Run compute-sanitizer in nightly build (#9641) @karthikeyann

Implement Series.datetime.floor (#9571) @skirui-source

ceil/floor for DatetimeIndex (#9554) @mayankanand007

Add support for decimal128 in cudf python (#9533) @galipremsagar

Implement lists::index_of() to find positions in list rows (#9510) @mythrocks

custreamz oauth callback for kafka (librdkafka) (#9486) @jdye64

Add Pearson correlation for sort groupby (python) (#9166) @skirui-source

Interchange dataframe protocol (#9071) @iskode

Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🛠️ Improvements

Prepare upload scripts for Python 3.7 removal (#10092) @Ethyling

Simplify custreamz and cudf_kafka recipes files (#10065) @Ethyling

ORC writer API changes for granular statistics (#10058) @mythrocks

Remove python constraints in custreamz and cudf_kafka recipes (#10052) @Ethyling

Unpin dask and distributed in CI (#10028) @galipremsagar

Add _from_column_like_self factory (#10022) @isVoid

Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings (#10008) @shwina

Use cuda::std::is_arithmetic in cudf::is_numeric trait. (#9996) @bdice

Clean up CUDA stream use in cuIO (#9991) @vuule

Use address-ordered first fit for the pinned memory pool (#9989) @rongou

Add strings tests to transpose_test.cpp (#9985) @davidwendt

Use gpuci_mamba_retry on Java CI. (#9983) @bdice

Remove deprecated method one_hot_encoding (#9977) @isVoid

Minor cleanup of unused Python functions (#9974) @vyasr

Use new efficient partitioned parquet writing in cuDF (#9971) @devavret

Remove str.subword_tokenize (#9968) @VibhuJawa

Forward-merge branch-21.12 to branch-22.02 (#9947) @bdice

Remove deprecated method parameter from merge and join. (#9944) @bdice

Remove deprecated method DataFrame.hash_columns. (#9943) @bdice

Remove deprecated method Series.hash_encode. (#9942) @bdice

use ninja in java ci build (#9933) @rongou

Add build-time publish step to cpu build script (#9927) @davidwendt

Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007

Remove various unused functions (#9922) @vyasr

Raise in query if dtype is not supported (#9921) @brandon-b-miller

Add missing imports tests (#9920) @Ethyling

Spark Decimal128 hashing (#9919) @rwlee

Replace thrust/std::get with structured bindings (#9915) @codereport

Upgrade thrust version to 1.15 (#9912) @robertmaynard

Remove conda envs for CUDA 11.0 and 11.2. (#9910) @bdice

Return count of set bits from inplace_bitmask_and. (#9904) @bdice

Use dynamic nullate for join hasher and equality comparator (#9902) @davidwendt

Update ucx-py version on release using rvc (#9897) @Ethyling

Remove IncludeCategories from .clang-format (#9876) @codereport

Support statically linking CUDA runtime for Java bindings (#9873) @jlowe

Add clang-tidy to libcudf (#9860) @codereport

Remove deprecated methods from Java Table class (#9853) @jlowe

Add test for map column metadata handling in ORC writer (#9852) @vuule

Use pandas to_offset to parse frequency string in date_range (#9843) @isVoid

add templated benchmark with fixture (#9838) @karthikeyann

Use list of column inputs for apply_boolean_mask (#9832) @isVoid

Added a few more tests for Decimal to String cast (#9818) @razajafri

Run doctests. (#9815) @bdice

Avoid overflow for fixed_point round (#9809) @sperlingxx

Move drop_duplicates, drop_na, _gather, take to IndexedFrame and create their _base_index counterparts (#9807) @isVoid

Use vector factories for host-device copies. (#9806) @bdice

Refactor host device macros (#9797) @vyasr

Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller

Allow custom sort functions for dask-cudf sort_values (#9789) @charlesbluca

Improve build time of libcudf iterator tests (#9788) @davidwendt

Copy Java native dependencies directly into classpath (#9787) @jlowe

Add decimal types to cuIO benchmarks (#9776) @vuule

Pick smallest decimal type with required precision in ORC reader (#9775) @vuule

Avoid overflow for fixed_point cudf::cast and performance optimization (#9772) @codereport

Use CTAD with Thrust function objects (#9768) @codereport

Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe

Use Java classloader to find test resources (#9760) @jlowe

Allow cast decimal128 to string and add tests (#9756) @razajafri

Load balance optimization for contiguous_split (#9755) @nvdbaranec

Consolidate and improve reset_index (#9750) @isVoid

Update to UCX-Py 0.24 (#9748) @pentschev

Skip cufile tests in JNI build script (#9744) @pxLi

Enable string to decimal128 cast (#9742) @razajafri

Use stop instead of stop_. (#9735) @bdice

Forward-merge branch-21.12 to branch-22.02 (#9730) @bdice

Improve cmake format script (#9723) @vyasr

Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule

Add directory-partitioned data support to cudf.read_parquet (#9720) @rjzamora

Use stream allocator adaptor for hash join table (#9704) @PointKernel

Update check for inf/nan strings in libcudf float conversion to ignore case (#9694) @davidwendt

Update cudf JNI to 22.02.0-SNAPSHOT (#9681) @pxLi

Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins (#9666) @vyasr

Some improvements to parse_decimal function and bindings for is_fixed_point (#9658) @razajafri

Add utility to format ninja-log build times (#9631) @davidwendt

Allow runtime has_nulls parameter for row operators (#9623) @davidwendt

Use fsspec.parquet for improved read_parquet performance from remote storage (#9589) @rjzamora

Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice

Use List of Columns as Input for drop_nulls, gather and drop_duplicates (#9558) @isVoid

Simplify merge internals and reduce overhead (#9516) @vyasr

Add struct generation support in datagenerator & fuzz tests (#9180) @galipremsagar

Simplify write_csv by removing unnecessary writer/impl classes (#9089) @cwharris
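
Entry #9775 above mentions picking the smallest decimal type with the required precision. The sketch below illustrates that idea in plain Python. The precision limits (9, 18 and 38 digits) are the usual maxima for 32-, 64- and 128-bit decimals. The function name and the assumption that all three widths are candidates are hypothetical and not taken from the ORC reader itself.

```python
# Maximum number of decimal digits each storage width can represent.
MAX_PRECISION = {"decimal32": 9, "decimal64": 18, "decimal128": 38}


def smallest_decimal_type(precision):
    """Return the narrowest type name whose limit covers `precision`.

    Hypothetical helper for illustration; not a cuDF API.
    """
    # Dicts preserve insertion order, so widths are tried narrowest first.
    for name, limit in MAX_PRECISION.items():
        if precision <= limit:
            return name
    raise ValueError(f"precision {precision} exceeds decimal128")


chosen = [smallest_decimal_type(p) for p in (5, 12, 30)]
print(chosen)
```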

v21.12.02(Dec 16, 2021)

v21.12.01(Dec 9, 2021)

v21.12.00(Dec 3, 2021)
🚨 Breaking Changes

Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel

Remove sizeof and standardize on memory_usage (#9544) @vyasr

Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt

Refactor sorting APIs (#9464) @vyasr

Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333

Support Python UDFs written in terms of rows (#9343) @brandon-b-miller

JNI: Support nested types in ORC writer (#9334) @firestarman

Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks

Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel

Various internal MultiIndex improvements (#9243) @vyasr
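
The first entry above (#9616) changes bitmask_and and bitmask_or to return the resulting mask together with a count of unset bits. In libcudf validity masks, a set bit marks a valid row and an unset bit marks a null. The sketch below shows the shape of such a return value using Python integers as bitmasks. The function name and signature are hypothetical and are not the libcudf API.

```python
from functools import reduce


def bitmask_and_sketch(masks, num_bits):
    """AND integer bitmasks and return (mask, unset_count).

    Only the low `num_bits` bits are considered. Hypothetical helper for
    illustration; not the libcudf signature.
    """
    full = (1 << num_bits) - 1
    result = reduce(lambda a, b: a & b, masks, full)
    # Bits that are 0 in the result correspond to null rows.
    unset_count = num_bits - bin(result).count("1")
    return result, unset_count


mask, null_count = bitmask_and_sketch([0b1011, 0b1110], num_bits=4)
print(bin(mask), null_count)
```

Returning the count alongside the mask means a caller does not need a second pass over the mask to learn how many rows are null.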

🐛 Bug Fixes

Fix read_parquet bug for bytes input (#9669) @rjzamora

Use _gather internal for sort_* (#9668) @isVoid

Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr

Don't recompute output size if it is already available (#9649) @abellina

Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora

add const when getting data from a JNI data wrapper (#9637) @wjxiz1992

Fix debrotli issue on CUDA 11.5 (#9632) @vuule

Use std::size_t when computing join output size (#9626) @jlowe

Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar

Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar

Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora

Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec

Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard

Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule

Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec

Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar

compile libnvcomp with PTDS if requested (#9540) @jbrennan333

Fix segmented_gather() for null LIST rows (#9537) @mythrocks

Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice

Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec

Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann

Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt

Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice

Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca

Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev

Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann

Fix regex handling of embedded null characters (#9470) @davidwendt

Fix memcheck error in copy-if-else (#9467) @davidwendt

Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora

Preserve the decimal scale when creating a default scalar (#9449) @revans2

Push down parent nulls when flattening nested columns. (#9443) @mythrocks

Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt

Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca

Allow int-like objects for the decimals argument in round (#9428) @shwina

Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia

Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid

Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar

Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard

Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe

Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel

Fix the crash in stats code (#9368) @devavret

Make Series.hash_encode results reproducible. (#9366) @bdice

Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt

Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller

Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice

Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt

Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt

Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar

Use f-string in join helper warning message. (#9325) @bdice

Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora

Fix null count in statistics for parquet (#9303) @devavret

Potential overflow of decimal32 when casting to int64_t (#9287) @codereport

Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca

Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard

Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid

BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source
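
Entry #9417 above switches drop_duplicates to a stable sort. A minimal sketch of why stability matters for sort-based de-duplication follows, using Python's built-in `sorted`, which is stable. It illustrates the general technique only and does not reflect how libcudf implements it.

```python
# Each row is (key, original_position).
rows = [("b", 0), ("a", 1), ("b", 2), ("a", 3)]

# A stable sort keeps rows with equal keys in their original relative order.
ordered = sorted(rows, key=lambda row: row[0])

# Keeping the first row of each run of equal keys therefore keeps the
# earliest original occurrence of every key.
first_per_key = []
for key, position in ordered:
    if not first_per_key or first_per_key[-1][0] != key:
        first_per_key.append((key, position))

print(first_per_key)
```

With an unstable sort, the row kept for each key could be any of its duplicates, so a "keep first" result would not be reproducible.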

📖 Documentation

Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport

Add dedicated page for StringHandling in python docs (#9624) @galipremsagar

Update docstring of DataFrame.merge (#9572) @galipremsagar

Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice

Add example to docstrings in rolling.apply (#9522) @isVoid

Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice

Improve Python docstring formatting. (#9493) @bdice

Update table of I/O supported types (#9476) @vuule

Document invalid regex patterns as undefined behavior (#9473) @davidwendt

Miscellaneous documentation fixes to cudf (#9471) @galipremsagar

Fix many documentation errors in libcudf. (#9355) @karthikeyann

Fixing SubwordTokenizer docs issue (#9354) @mayankanand007

Improved deprecation warnings. (#9347) @bdice

Docs: reorder "mr, stream" to "stream, mr" (#9308) @karthikeyann

Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice

Added deprecation warning for .label_encoding() (#9289) @mayankanand007

🚀 New Features

Enable Series.divide and DataFrame.divide (#9630) @vyasr

Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel

Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar

Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt

Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia

Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia

Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard

Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt

Support args= in apply (#9514) @brandon-b-miller

Add groupby scan min/max support for strings values (#9502) @davidwendt

Add list output option to character_ngrams() function (#9499) @davidwendt

More granular column selection in ORC reader (#9496) @vuule

Add min_periods and ddof to groupby covariance and correlation aggregations (#9492) @karthikeyann

Implement Series.datetime.floor (#9488) @skirui-source

Enable linting of CMake files using pre-commit (#9484) @vyasr

Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt

Augment order_by to Accept a List of null_precedence (#9455) @isVoid

Add format API for list column of strings (#9454) @davidwendt

Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller

Add cudf python groupby.diff (#9446) @karthikeyann

Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia

add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann

Support Unary Operations in Masked UDF (#9409) @isVoid

Move Several Series Function to Frame (#9394) @isVoid

MD5 Python hash API (#9390) @bdice

Add cudf strings is_title API (#9380) @davidwendt

Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr

Add support for writing ORC with map columns (#9369) @vuule

extract_list_elements() with column_view indices (#9367) @mythrocks

Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia

Support Python UDFs written in terms of rows (#9343) @brandon-b-miller

JNI: Support nested types in ORC writer (#9334) @firestarman

Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks

Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann

Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou

Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule

Add na_position param to dask-cudf sort_values (#9264) @charlesbluca

Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca

New array conversion methods (#9236) @vyasr

Series apply method backed by masked UDFs (#9217) @brandon-b-miller

Grouping by frequency and resampling (#9178) @shwina

Pure-python masked UDFs (#9174) @brandon-b-miller

Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann

Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina

🛠️ Improvements

Followup to PR 9088 comments (#9659) @cwharris

Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard

Add 11.5 dev.yml to cudf (#9617) @galipremsagar

Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar

remove deprecated Rmm.initialize method (#9607) @rongou

Use HostColumnVectorCore for child columns in JCudfSerialization.unpackHostColumnVectors (#9596) @sperlingxx

Set RMM pool to a fixed size in JNI (#9583) @rongou

Use nvCOMP for Snappy compression/decompression (#9582) @vuule

Build CUDA version agnostic packages for dask-cudf (#9578) @Ethyling

Fixed tests warning: "TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE" (#9574) @ttnghia

Enable CMake format in CI and fix style (#9570) @vyasr

Add NVTX Start/End Ranges to JNI (#9563) @abellina

Add librdkafka and python-confluent-kafka to dev conda environments s… (#9562) @jdye64

Add offsets_begin/end() to strings_column_view (#9559) @davidwendt

remove alignment options for RMM jni (#9550) @rongou

Add axis parameter passthrough to DataFrame and Series take for pandas API compatibility (#9549) @dantegd

Remove sizeof and standardize on memory_usage (#9544) @vyasr

Adds cudaProfilerStart/cudaProfilerStop in JNI api (#9543) @abellina

Generalize comparison binary operations (#9542) @vyasr

Expose APIs to wrap CUDA or RMM allocations with a Java device buffer instance (#9538) @jlowe

Add scan sum support for duration types to libcudf (#9536) @davidwendt

Force inlining to improve AST performance (#9530) @vyasr

Generalize some more indexed frame methods (#9529) @vyasr

Add Java bindings for rolling window stddev aggregation (#9527) @razajafri

catch rmm::out_of_memory exceptions in jni (#9525) @rongou

Add an overload of make_empty_column with type_id parameter (#9524) @ttnghia

Accelerate conditional inner joins with larger right tables (#9523) @vyasr

Initial pass of generalizing decimal support in cudf python layer (#9517) @galipremsagar

Cleanup for flattening nested columns (#9509) @rwlee

Enable running tests using RMM arena and async memory resources (#9506) @rongou

Remove dependency on six. (#9495) @bdice

Cleanup some libcudf strings gtests (#9489) @davidwendt

Rename strings/array_tests.cu to strings/array_tests.cpp (#9480) @davidwendt

Refactor sorting APIs (#9464) @vyasr

Implement DataFrame.hash_values, deprecate DataFrame.hash_columns. (#9458) @bdice

Deprecate Series.hash_encode. (#9457) @bdice

Update conda recipes for Enhanced Compatibility effort (#9456) @ajschmidt8

Small clean up to simplify column selection code in ORC reader (#9444) @vuule

add missing stream to scalar.is_valid() wherever stream is available (#9436) @karthikeyann

Adds Deprecation Warnings to one_hot_encoding and Implement get_dummies with Cython API (#9435) @isVoid

Update pre-commit hook URLs. (#9433) @bdice

Remove pyarrow import in dask_cudf.io.parquet (#9429) @charlesbluca

Miscellaneous improvements for UDFs (#9422) @isVoid

Use pre-commit for CI (#9412) @vyasr

Update to UCX-Py 0.23 (#9407) @pentschev

Expose OutOfBoundsPolicy in JNI for Table.gather (#9406) @abellina

Improvements to tdigest aggregation code. (#9403) @nvdbaranec

Add Java API to deserialize a table to host columns (#9402) @jlowe

Frame copy to use class instead of type() (#9397) @madsbk

Change all DeprecationWarnings to FutureWarning. (#9392) @bdice

Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333

Add IndexedFrame class and move SingleColumnFrame to a separate module (#9378) @vyasr

Support Arrow NativeFile and PythonFile for remote ORC storage (#9377) @rjzamora

Use Arrow PythonFile for remote CSV storage (#9376) @rjzamora

Add multi-threaded writing to GDS writes (#9372) @devavret

Miscellaneous column cleanup (#9370) @vyasr

Use single kernel to extract all groups in cudf::strings::extract (#9358) @davidwendt

Consolidate binary ops into Frame (#9357) @isVoid

Move rank scan implementations from scan_inclusive.cu to rank_scan.cu (#9351) @davidwendt

Remove usage of deprecated thrust::host_space_tag. (#9350) @bdice

Use Default Memory Resource for Temporaries in reduction.cpp (#9344) @isVoid

Fix Cython compilation warnings. (#9327) @bdice

Fix some unused variable warnings in libcudf (#9326) @davidwendt

Use optional-iterator for copy-if-else kernel (#9324) @davidwendt

Remove Table class (#9315) @vyasr

Unpin dask and distributed in CI (#9307) @galipremsagar

Add optional-iterator support to indexalator (#9306) @davidwendt

Consolidate more methods in Frame (#9305) @vyasr

Add Arrow-NativeFile and PythonFile support to read_parquet and read_csv in cudf (#9304) @rjzamora

Pin mypy in .pre-commit-config.yaml to match conda environment pinning. (#9300) @bdice

Use gather.hpp when gather-map exists in device memory (#9299) @davidwendt

Fix Automerger for Branch-21.12 from branch-21.10 (#9285) @galipremsagar

Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel

Change strings copy_if_else to use optional-iterator instead of pair-iterator (#9266) @davidwendt

Update cudf java bindings to 21.12.0-SNAPSHOT (#9248) @pxLi

Various internal MultiIndex improvements (#9243) @vyasr

Add detail interface for split and slice(table_view), refactors both function with host_span (#9226) @isVoid

Refactor MD5 implementation. (#9212) @bdice

Update groupby result_cache to allow sharing intermediate results based on column_view instead of requests. (#9195) @karthikeyann

Use nvcomp's snappy decompressor in avro reader (#9181) @devavret

Add isocalendar API support (#9169) @marlenezw

Simplify read_json by removing unnecessary reader/impl classes (#9088) @cwharris

Simplify read_csv by removing unnecessary reader/impl classes (#9041) @cwharris

Refactor hash join with cuCollections multimap (#8934) @PointKernel
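
Entry #9392 above changes all DeprecationWarnings to FutureWarning. Python's documentation describes DeprecationWarning as aimed at other developers and ignored by default unless triggered by code in `__main__`, while FutureWarning is aimed at end users and shown by default. The sketch below shows a deprecated function emitting a FutureWarning. The function name `old_api` and its message are hypothetical and are not cuDF code.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, for illustration only."""
    warnings.warn(
        "old_api is deprecated; use new_api instead",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return 42


# Record warnings so they can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_api()

categories = [w.category for w in caught]
print(value, categories)
```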

v21.10.01(Oct 12, 2021)

v21.10.00(Oct 6, 2021)
🚨 Breaking Changes

Remove Cython APIs for table view generation (#9199) @vyasr

Upgrade pandas version in cudf (#9147) @galipremsagar

Make AST operators nullable (#9096) @vyasr

Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule

Update JNI java CSV APIs to not use deprecated API (#9066) @revans2

Support additional format specifiers in from_timestamps (#9047) @davidwendt

Expose expression base class publicly and simplify public AST API (#9045) @vyasr

Add support for struct type in ORC writer (#9025) @vuule

Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr

Java bindings for conditional join output sizes (#9002) @jlowe

Move compute_column API out of ast namespace (#8957) @vyasr

cudf.dtype function (#8949) @shwina

Refactor Frame reductions (#8944) @vyasr

Add nested column selection to parquet reader (#8933) @devavret

JNI Aggregation Type Changes (#8919) @revans2

Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec

Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule

Change cudf docs theme to pydata theme (#8746) @galipremsagar

Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann

Make groupby transform-like op order match original data order (#8720) @isVoid

🐛 Bug Fixes

fixed_point cudf::groupby for mean aggregation (#9296) @codereport

Fix interleave_columns when the input string lists column has an empty child column (#9292) @ttnghia

Update nvcomp to include fixes for installation of headers (#9276) @devavret

Fix Java column leak in testParquetWriteMap (#9271) @jlowe

Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt

Fix crash on empty input to getMapValue (#9262) @hyperbolic2346

Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar

DataFrame.sort_index optimizations (#9238) @galipremsagar

Temporarily disabling problematic test in parquet writer (#9230) @devavret

Explicitly disable groupby on unsupported key types. (#9227) @mythrocks

Fix gather for sliced input structs column (#9218) @ttnghia

Fix JNI code for left semi and anti joins (#9207) @jlowe

Only install thrust when using a non 'system' version (#9206) @robertmaynard

Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard

Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt

Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks

get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard

rapids-export correctly references build code block and doc strings (#9186) @robertmaynard

Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg

Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca

Approximate overflow detection in ORC statistics (#9163) @vuule

Use decimal precision metadata when reading from parquet files (#9162) @shwina

Fix variable name in Java build script (#9161) @jlowe

Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard

Fix conditional joins with empty left table (#9146) @vyasr

Fix joining on indexes with duplicate level names (#9137) @shwina

Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu

Apply type metadata after column is slice-copied (#9131) @isVoid

Fix a bug: inner_join_size return zero if build table is empty (#9128) @PointKernel

Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora

Support null literals in expressions (#9117) @vyasr

Fix cudf::hash_join output size for struct joins (#9107) @jlowe

Import fix (#9104) @shwina

Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt

Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks

Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe

Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec

Preserve float16 upscaling (#9069) @galipremsagar

Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt

Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu

Various multiindex related fixes (#9036) @shwina

Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller

Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar

Resolve nvcc 11.0 compiler crashes during codegen in cudf (#9028) @robertmaynard

Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar

Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick

Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca

Fix DataFrame indexer setitem when array is passed (#9006) @galipremsagar

Fix ORC reading of files with struct columns that have null values (#9005) @vuule

Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe

Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt

Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt

Fix debug compile error for csv_test.cpp (#8981) @davidwendt

Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt

Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar

Java conditional joins should not require matching column counts (#8955) @jlowe

Fix concatenate empty structs (#8947) @sperlingxx

Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt

Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca

cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca

Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick

Support casting with Pandas dtype aliases (#8920) @sarahyurick

Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick

Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller

Fix libcudf memory errors (#8884) @karthikeyann

Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt

replace auto with auto& ref for cast<&> (#8866) @karthikeyann

Add missing include<optional> in binops (#8864) @karthikeyann

Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick

Re-enable JSON tests (#8843) @vuule

Support header with embedded delimiter in csv writer (#8798) @davidwendt

📖 Documentation

Add IO docs page in cudf documentation (#9145) @galipremsagar

use correct namespace in cuio code examples (#9037) @cwharris

Restructuring Contributing doc (#9026) @iskode

Update stable version in readme (#9008) @galipremsagar

Add spans and more include guidelines to libcudf developer guide (#8931) @harrism

Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe

List GDS-enabled formats in the docs (#8805) @vuule

Change cudf docs theme to pydata theme (#8746) @galipremsagar

🚀 New Features

Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann

Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller

Add struct type support for drop_list_duplicates (#9202) @ttnghia

support CUDA async memory resource in JNI (#9201) @rongou

Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann

Superimpose null masks for STRUCT columns. (#9144) @mythrocks

Implemented bindings for ceil timestamp operation (#9141) @shaneding

Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu

Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia

Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid

Make AST operators nullable (#9096) @vyasr

Java bindings for approx_percentile (#9094) @andygrove

Add dseries.struct.explode (#9086) @isVoid

Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar

Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule

Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca

Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester

Support nested types for nth_element reduction (#9043) @sperlingxx

Update sort groupby to use non-atomic operation (#9035) @karthikeyann

Add support for struct type in ORC writer (#9025) @vuule

Implement interleave_columns for structs columns (#9012) @ttnghia

Add groupby first and last aggregations (#9004) @shwina

Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid

Python/Cython bindings for multibyte_split (#8998) @jdye64

Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid

Added Series.dt.is_month_end (#8989) @TravisHester

Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec

Support "unflatten" of columns flattened via flatten_nested_columns() (#8956) @mythrocks

Implement timestamp ceil (#8942) @shaneding

Add nested column selection to parquet reader (#8933) @devavret

Expose conditional join size calculation (#8928) @vyasr

Support Nulls in Timeseries Generator (#8925) @isVoid

Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca

Add dot product binary op (#8909) @charlesbluca

Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid

Series string repeat (#8882) @sarahyurick

Python binding for quarters (#8862) @shaneding

Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule

Add Java bindings for AST transform (#8846) @jlowe

Series datetime is_month_start (#8844) @sarahyurick

Support bracket syntax for cudf::strings::replace_with_backrefs group index values (#8841) @davidwendt

Support VARIANCE and STD aggregation in rolling op (#8809) @isVoid

Add quarters to libcudf datetime (#8779) @shaneding

Linear Interpolation of nans via cupy (#8767) @brandon-b-miller

Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann

Make groupby transform-like op order match original data order (#8720) @isVoid

multibyte_split (#8702) @cwharris

Implement JNI for strings::repeat_strings that repeats each string separately by different numbers of times (#8572) @ttnghia

🛠️ Improvements

Pin max dask and distributed versions to 2021.09.1 (#9286) @galipremsagar

Optimized fsspec data transfer for remote file-systems (#9265) @rjzamora

Skip dask-cudf tests on arm64 (#9252) @Ethyling

Use nvcomp's snappy compressor in ORC writer (#9242) @devavret

Only run imports tests on x86_64 (#9241) @Ethyling

Remove unnecessary call to device_uvector::release() (#9237) @harrism

Use nvcomp's snappy decompression in ORC reader (#9235) @devavret

Add grouped_rolling test with STRUCT groupby keys. (#9228) @mythrocks

Optimize cudf.concat for axis=0 (#9222) @galipremsagar

Fix some libcudf calls not passing the stream parameter (#9220) @davidwendt

Add min and max bounds for random dataframe generator numeric types (#9211) @galipremsagar

Improve performance of expression evaluation (#9210) @vyasr

Misc optimizations in cudf (#9203) @galipremsagar

Remove Cython APIs for table view generation (#9199) @vyasr

Add JNI support for drop_list_duplicates (#9198) @revans2

Update pandas versions in conda recipes and requirements.txt files (#9197) @galipremsagar

Minor C++17 cleanup of groupby.cu: structured bindings, more concise lambda, etc (#9193) @codereport

Be explicit about bitwidth difference between cudf boolean and arrow boolean (#9192) @isVoid

Remove _source_index from MultiIndex (#9191) @vyasr

Fix typo in the name of cudf-testing-targets.cmake (#9190) @trxcllnt

Add support for single-digits in cudf::to_timestamps (#9173) @davidwendt

Fix cufilejni build include path (#9168) @pxLi

dask_cudf dispatch registering cleanup (#9160) @galipremsagar

Remove unneeded stream/mr from a cudf::make_strings_column (#9148) @davidwendt

Upgrade pandas version in cudf (#9147) @galipremsagar

make data chunk reader return unique_ptr (#9129) @cwharris

Add backend for percentile_lookup dispatch (#9118) @galipremsagar

Refactor implementation of column setitem (#9110) @vyasr

Fix compile warnings found using nvcc 11.4 (#9101) @davidwendt

Update to UCX-Py 0.22 (#9099) @pentschev

Simplify read_avro by removing unnecessary writer/impl classes (#9090) @cwharris

Allowing %f in format to return nanoseconds (#9081) @marlenezw

Java bindings for cudf::hash_join (#9080) @jlowe

Remove stale code in ColumnBase._fill (#9078) @isVoid

Add support for get_group in GroupBy (#9070) @galipremsagar

Remove remaining "support" methods from DataFrame (#9068) @vyasr

Update JNI java CSV APIs to not use deprecated API (#9066) @revans2

Added method to remove null_masks if the column has no nulls (#9061) @razajafri

Consolidate Several Series and Dataframe Methods (#9059) @isVoid

Remove usage of string based set_dtypes for csv & json readers (#9049) @galipremsagar

Remove some debug print statements from gtests (#9048) @davidwendt

Support additional format specifiers in from_timestamps (#9047) @davidwendt

Expose expression base class publicly and simplify public AST API (#9045) @vyasr

move filepath and mmap logic out of json/csv up to functions.cpp (#9040) @cwharris

Refactor Index hierarchy (#9039) @vyasr

cudf now leverages rapids-cmake to reduce CMake boilerplate (#9030) @robertmaynard

Add support for STRUCT input to groupby (#9024) @mythrocks

Refactor Frame scans (#9021) @vyasr

Remove duplicate set_categories code (#9018) @isVoid

Map support for ParquetWriter (#9013) @razajafri

Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr

Java bindings for conditional join output sizes (#9002) @jlowe

Remove _copy_construct factory (#8999) @vyasr

ENH Allow arbitrary CMake config options in build.sh (#8996) @dillon-cullinan

A small optimization for JNI copy column view to column vector (#8985) @revans2

Fix nvcc warnings in ORC writer (#8975) @devavret

Support nested structs in rank and dense rank (#8962) @rwlee

Move compute_column API out of ast namespace (#8957) @vyasr

Series datetime is_year_end and is_year_start (#8954) @marlenezw

Make Java AstNode public (#8953) @jlowe

Replace allocate with device_uvector for subword_tokenize internal tables (#8952) @davidwendt

cudf.dtype function (#8949) @shwina

Refactor Frame reductions (#8944) @vyasr

Add deprecation warning for Series.set_mask API (#8943) @galipremsagar

Move AST evaluator into a separate header (#8930) @vyasr

JNI Aggregation Type Changes (#8919) @revans2

Move template parameter to function parameter in cudf::detail::left_semi_anti_join (#8914) @davidwendt

Upgrade arrow & pyarrow to 5.0.0 (#8908) @galipremsagar

Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec

Move structs_column_tests.cu to .cpp. (#8902) @mythrocks

Add stream and memory-resource parameters to struct-scalar copy ctor (#8901) @davidwendt

Combine linearizer and ast_plan (#8900) @vyasr

Add Java bindings for conditional join gather maps (#8888) @jlowe

Remove max version pin for dask & distributed on development branch (#8881) @galipremsagar

fix cufilejni build w/ c++17 (#8877) @pxLi

Add struct accessor to dask-cudf (#8874) @NV-jpt

Migrate dask-cudf CudfEngine to leverage ArrowDatasetEngine (#8871) @rjzamora

Add JNI for extract_quarter, add_calendrical_months, and is_leap_year (#8863) @revans2

Change cudf::scalar copy and move constructors to protected (#8857) @davidwendt

Replace is_same<>::value with is_same_v<> (#8852) @codereport

Add min pytorch version to importorskip in pytest (#8851) @galipremsagar

Java bindings for regex replace (#8847) @jlowe

Remove make strings children with null mask (#8830) @davidwendt

Refactor conditional joins (#8815) @vyasr

Small cleanup (unused headers / commented code removals) (#8799) @codereport

ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#8770) @dillon-cullinan

Update cudf java bindings to 21.10.0-SNAPSHOT (#8765) @pxLi

Refactor and improve join benchmarks with nvbench (#8734) @PointKernel

Refactor Python factories and remove usage of Table for libcudf output handling (#8687) @vyasr

Optimize URL Decoding (#8622) @gaohao95

Parquet writer dictionary encoding refactor (#8476) @devavret

Use nvcomp's snappy decompression in parquet reader (#8252) @devavret

Use nvcomp's snappy compressor in parquet writer (#8229) @devavret

v21.08.03 (Sep 16, 2021)

v21.08.02 (Aug 6, 2021)

v21.08.01 (Aug 6, 2021)

v21.08.00 (Aug 4, 2021)
🚨 Breaking Changes

Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec

Remove unused cudf::strings::create_offsets (#8663) @davidwendt

Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt

Change default datetime index resolution to ns to match pandas (#8611) @vyasr

Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt

Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia

String-to-boolean conversion is different from Pandas (#8549) @skirui-source

Add accurate hash join size functions (#8453) @PointKernel

Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source

Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar

Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism

Remove special Index class from the general index class hierarchy (#8309) @vyasr

Add first-class dtype utilities (#8308) @vyasr

ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64

Upgrade arrow to 4.0.1 (#7495) @galipremsagar

🐛 Bug Fixes

Fix contains check in string column (#8834) @galipremsagar

Remove unused variable from row_bit_count_test. (#8829) @mythrocks

Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu

Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr

Handle empty child columns in row_bit_count() (#8791) @mythrocks

Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard

Fix isort error in utils.pyx (#8771) @charlesbluca

Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec

Fix issues with _CPackedColumns.serialize() handling of host and device data (#8759) @charlesbluca

Fix issues with MultiIndex in dropna, stack & reset_index (#8753) @galipremsagar

Write pandas extension types to parquet file metadata (#8749) @devavret

Fix where to handle DataFrame & Series input combination (#8747) @galipremsagar

Fix replace to handle null values correctly (#8744) @galipremsagar

Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec

Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec

Fix cudf.Series constructor to handle list of sequences (#8735) @galipremsagar

Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann

Fix orc reader assert on create data_type in debug (#8706) @davidwendt

Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt

JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx

Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu

Bug fix: replace_nulls_policy functor not returning correct indices for gathermap (#8699) @isVoid

Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec

Add post-processing steps to dask_cudf.groupby.CudfSeriesGroupby.aggregate (#8694) @charlesbluca

JNI build no longer looks for Arrow in conda environment (#8686) @jlowe

Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec

Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel

Pin *arrow to use *cuda in run (#8651) @jakirkham

Add proper support for tolerances in testing methods. (#8649) @vyasr

Support multi-char case conversion in capitalize function (#8647) @davidwendt

Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann

Temporarily disable libcudf example build tests (#8642) @isVoid

Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid

Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca

Fix bug where columns are initialized only once when both columns and index are specified in the DataFrame constructor (#8628) @isVoid

Propagate **kwargs through to as_*_column methods (#8618) @shwina

Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt

Fix missed renumbering of Aggregation values (#8600) @revans2

Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu

Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt

Apply metadata to keys before returning in Frame._encode (#8560) @charlesbluca

Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec

Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt

String-to-boolean conversion is different from Pandas (#8549) @skirui-source

Fix __repr__ output with display.max_rows is None (#8547) @galipremsagar

Fix size passed to column constructors in _with_type_metadata (#8539) @shwina

Properly retrieve last column when -1 is specified for column index (#8529) @isVoid

Fix importing apply from dask (#8517) @galipremsagar

Fix offset of the string dictionary length stream (#8515) @vuule

Fix double counting of selected columns in CSV reader (#8508) @ochan1

Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov

replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard

Disallow groupby aggs for StructColumns (#8499) @charlesbluca

Fixes out-of-bounds access for small files in unzip (#8498) @elstehle

Adding support for writing empty dataframe (#8490) @shaneding

Fix exclusive scan when including nulls and improve testing (#8478) @harrism

Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt

Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard

Add nightly version for ucx-py in ci script (#8419) @galipremsagar

Fix null_equality config of rolling_collect_set (#8415) @sperlingxx

CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx

Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec

Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt

Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt

BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source

Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca

Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca

📖 Documentation

Update Python UDFs notebook (#8810) @brandon-b-miller

Fix dask.dataframe API docs links after reorg (#8772) @jsignell

Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina

Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr

Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann

Custom Sphinx Extension: PandasCompat (#8643) @isVoid

Fix README.md (#8535) @ajschmidt8

Change namespace contains_nulls to struct (#8523) @davidwendt

Add info about NVTX ranges to dev guide (#8461) @jrhemstad

Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar

🚀 New Features

Fix concatenating structs (#8811) @shaneding

Implement JNI for groupby aggregations M2 and MERGE_M2 (#8763) @ttnghia

Bump isort to 5.6.4 and remove isort overrides made for 5.0.7 (#8755) @charlesbluca

Implement __setitem__ for StructColumn (#8737) @shaneding

Add is_leap_year to DateTimeProperties and DatetimeIndex (#8736) @isVoid

Add struct.explode() method (#8729) @shwina

Add DataFrame.to_struct() method to convert a DataFrame to a struct Series (#8728) @shwina

Add support for list type in ORC writer (#8723) @vuule

Fix slicing from struct columns and accessing struct columns (#8719) @shaneding

Add datetime::is_leap_year (#8711) @isVoid

Accessing struct columns from dask_cudf (#8675) @shaneding

Added pct_change to Series (#8650) @TravisHester

Add strings support to cudf::shift function (#8648) @davidwendt

Support Scatter struct_scalar (#8630) @isVoid

Struct scalar from host dictionary (#8629) @shaneding

Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick

JNI support for capitalize (#8624) @firestarman

Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt

Add NVBench in CMake (#8619) @PointKernel

Change default datetime index resolution to ns to match pandas (#8611) @vyasr

ListColumn __setitem__ (#8606) @brandon-b-miller

Implement groupby aggregations M2 and MERGE_M2 (#8605) @ttnghia

Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt

Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu

Benchmark for strings::repeat_strings APIs (#8589) @ttnghia

Nested scalar support for copy if else (#8588) @gerashegalov

User specified decimal columns to float64 (#8587) @jdye64

Add get_element for struct column (#8578) @isVoid

Python changes for adding __getitem__ for struct (#8577) @shaneding

Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia

Refactor tests/iterator_utilities.hpp functions (#8540) @ttnghia

Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx

Decimal support csv reader (#8511) @elstehle

Add column type tests (#8505) @isVoid

Warn when downscaling decimal columns (#8492) @ChrisJar

Add JNI for strings::repeat_strings (#8491) @ttnghia

Add Index.get_loc for Numerical, String Index support (#8489) @isVoid

Expose half_up rounding in cuDF (#8477) @shwina

Java APIs to fetch CUDA runtime info (#8465) @sperlingxx

Add str.edit_distance_matrix (#8463) @isVoid

Support constructing cudf.Scalar objects from host side lists (#8459) @brandon-b-miller

Add accurate hash join size functions (#8453) @PointKernel

Add cudf::strings::integer_to_hex convert API (#8450) @davidwendt

Create objects from iterables that contain cudf.NA (#8442) @brandon-b-miller

JNI bindings for sort_lists (#8439) @sperlingxx

Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source

Replace all_null() and all_valid() by iterator_all_nulls() and iterator_no_null() in tests (#8437) @ttnghia

Implement groupby MERGE_LISTS and MERGE_SETS aggregates (#8436) @ttnghia

Add public libcudf match_dictionaries API (#8429) @davidwendt

Add move constructors for string_scalar and struct_scalar (#8428) @ttnghia

Implement strings::repeat_strings (#8423) @ttnghia

STRUCT column support for cudf::merge. (#8422) @nvdbaranec

Implement reverse in libcudf (#8410) @shaneding

Support multiple input files/buffers for read_json (#8403) @jdye64

Improve test coverage for struct search (#8396) @ttnghia

Add groupby.fillna (#8362) @isVoid

Enable AST-based joining (#8214) @vyasr

Generalized null support in user defined functions (#8213) @brandon-b-miller

Add compiled binary operation (#8192) @karthikeyann

Implement .describe() for DataFrameGroupBy (#8179) @skirui-source

ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64

Add Python bindings for lists::concatenate_list_elements and expose them as .list.concat() (#8006) @shwina

Use Arrow URI FileSystem backed instance to retrieve remote files (#7709) @jdye64

Example to build custom application and link to libcudf (#7671) @isVoid

Upgrade arrow to 4.0.1 (#7495) @galipremsagar

🛠️ Improvements

Provide a better error message when CUDA::cuda_driver not found (#8794) @robertmaynard

Remove anonymous namespace from null_mask.cuh (#8786) @nvdbaranec

Allow cudf to be built without libcuda.so existing (#8751) @robertmaynard

Pin mimesis to <4.1 (#8745) @galipremsagar

Update conda environment name for CI (#8692) @ajschmidt8

Remove flatbuffers dependency (#8671) @Ethyling

Add options to build Arrow with Python and Parquet support (#8670) @trxcllnt

Remove unused cudf::strings::create_offsets (#8663) @davidwendt

Update GDS lib version to 1.0.0 (#8654) @pxLi

Support for groupby/scan rank and dense_rank aggregations (#8652) @rwlee

Fix usage of deprecated arrow ipc API (#8632) @revans2

Use absolute imports in cudf (#8631) @galipremsagar

ENH Add Java CI build script (#8627) @dillon-cullinan

Add DeprecationWarning to ser.str.subword_tokenize (#8603) @VibhuJawa

Rewrite binary operations for improved performance and additional type support (#8598) @vyasr

Fix mypy errors surfacing because of numpy-1.21.0 (#8595) @galipremsagar

Remove unneeded includes from cudf::string_view headers (#8594) @davidwendt

Use cmake 3.20.1 as it is now required by rmm (#8586) @robertmaynard

Remove device debug symbols from cmake CUDF_CUDA_FLAGS (#8584) @davidwendt

Dask-CuDF: use default Dask Dataframe optimizer (#8581) @madsbk

Remove checking if an unsigned value is less than zero (#8579) @robertmaynard

Remove strings_count parameter from cudf::strings::detail::create_chars_child_column (#8576) @davidwendt

Make cudf.api.types imports consistent (#8571) @galipremsagar

Modernize libcudf basic example CMakeFile; updates CI build tests (#8568) @isVoid

Rename concatenate_tests.cu to .cpp (#8555) @davidwendt

enable window lead/lag test on struct (#8548) @wbo4958

Add Java methods to split and write column views (#8546) @razajafri

Small cleanup (#8534) @codereport

Unpin dask version in CI (#8533) @galipremsagar

Added optional flag for building Arrow with S3 filesystem support (#8531) @jdye64

Minor clean up of various internal column and frame utilities (#8528) @vyasr

Rename some copying_test source files .cu to .cpp (#8527) @davidwendt

Correct the last warnings and issues when using newer cuda versions (#8525) @robertmaynard

Correct unused parameter warnings in transform and unary ops (#8521) @robertmaynard

Correct unused parameter warnings in string algorithms (#8509) @robertmaynard

Add in JNI APIs for scan, replace_nulls, group_by.scan, and group_by.replace_nulls (#8503) @revans2

Fix 21.08 forward-merge conflicts (#8502) @ajschmidt8

Fix Cython formatting command in Contributing.md. (#8496) @marlenezw

Correct unused parameters in reshape and text (#8495) @robertmaynard

Correct unused parameter warnings in partitioning and stream compact (#8494) @robertmaynard

Correct unused parameter warnings in labelling and list algorithms (#8493) @robertmaynard

Refactor index construction (#8485) @vyasr

Correct unused parameter warnings in replace algorithms (#8483) @robertmaynard

Correct unused parameter warnings in reduction algorithms (#8481) @robertmaynard

Correct unused parameter warnings in io algorithms (#8480) @robertmaynard

Correct unused parameter warnings in interop algorithms (#8479) @robertmaynard

Correct unused parameter warnings in filling algorithms (#8468) @robertmaynard

Correct unused parameter warnings in groupby (#8467) @robertmaynard

use libcu++ time_point as timestamp (#8466) @karthikeyann

Modify reprog_device::extract to return groups in a single pass (#8460) @davidwendt

Update minimum Dask requirement to 2021.6.0 (#8458) @pentschev

Fix failures when performing binary operations on DataFrames with empty columns (#8452) @ChrisJar

Fix conflicts in 8447 (#8448) @ajschmidt8

Add serialization methods for List and StructDtype (#8441) @charlesbluca

Replace make_empty_strings_column with make_empty_column (#8435) @davidwendt

JNI bindings for get_element (#8433) @revans2

Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar

Unpin dask version on CI (#8425) @galipremsagar

Add benchmark for strings/fixed_point convert APIs (#8417) @davidwendt

Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism

Add benchmark for strings/integers convert APIs (#8402) @davidwendt

Enable multi-file partitioning in dask_cudf.read_parquet (#8393) @rjzamora

Correct unused parameter warnings in rolling algorithms (#8390) @robertmaynard

Correct unused parameters in column round and search (#8389) @robertmaynard

Add functionality to apply Dtype metadata to ColumnBase (#8373) @charlesbluca

Refactor setting stack size in regex code (#8358) @davidwendt

Update Java bindings to 21.08-SNAPSHOT (#8344) @pxLi

Replace remaining uses of device_vector (#8343) @harrism

Statically link libnvcomp into libcudfjni (#8334) @jlowe

Resolve auto merge conflicts for Branch 21.08 from branch 21.06 (#8329) @galipremsagar

Minor code refactor for sorted_order (#8326) @wbo4958

Remove special Index class from the general index class hierarchy (#8309) @vyasr

Add first-class dtype utilities (#8308) @vyasr

Add option to link Java bindings with Arrow dynamically (#8307) @jlowe

Refactor ColumnMethods and its subclasses to remove column argument and require parent argument (#8306) @shwina

Refactor scatter for list columns (#8255) @isVoid

Expose pack/unpack API to Python (#8153) @charlesbluca

Adding cudf.cut method (#8002) @marlenezw

Optimize string gather performance for large strings (#7980) @gaohao95

Add peak memory usage tracking to cuIO benchmarks (#7770) @devavret

Updating Clang Version to 11.0.0 (#6695) @codereport

v21.06.01 (Jun 17, 2021)

v21.06.00 (Jun 9, 2021)
🚨 Breaking Changes

Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar

Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt

Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr

Update ORC statistics API to use C++17 standard library (#8241) @vuule

Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid

Groupby.shift c++ API refactor and python binding (#8131) @isVoid

🐛 Bug Fixes

Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia

Compilation fix: Remove redefinition for std::is_same_v() (#8369) @mythrocks

Add backward compatibility for dask-cudf to work with other versions of dask (#8368) @galipremsagar

Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec

Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec

Raise error when unsupported arguments are passed to dask_cudf.DataFrame.sort_values (#8349) @galipremsagar

Raise NotImplementedError for axis=1 in rank (#8347) @galipremsagar

Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar

Update Java string concatenate test for single column (#8330) @tgravescs

Use empty_like in scatter (#8314) @revans2

Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx

Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt

COLLECT_LIST support returning empty output columns. (#8279) @mythrocks

Update io util to convert path like object to string (#8275) @ayushdg

Fix result column types for empty inputs to rolling window (#8274) @mythrocks

Actually test equality in assert_groupby_results_equal (#8272) @shwina

CMake always explicitly specify a source files extension (#8270) @robertmaynard

Fix struct binary search and struct flattening (#8268) @ttnghia

Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris

upgrade dlpack to 0.5 (#8262) @cwharris

Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle

Fix incorrect assertion in Java concat (#8258) @sperlingxx

Copy nested types upon construction (#8244) @isVoid

Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid

Clip decimal binary op precision at max precision (#8194) @ChrisJar

📖 Documentation

Add docstring for dask_cudf.read_csv (#8355) @galipremsagar

Fix cudf release version in readme (#8331) @galipremsagar

Fix structs column description in dev docs (#8318) @isVoid

Update readme with correct CUDA versions (#8315) @raydouglass

Add description of the cuIO GDS integration (#8293) @vuule

Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard

🚀 New Features

Add support for merging between categorical data (#8332) @galipremsagar

Java: Support struct scalar (#8327) @sperlingxx

added _is_homogeneous property (#8299) @shaneding

Added decimal writing for CSV writer (#8296) @kaatish

Java: Support creating a scalar from utf8 string (#8294) @firestarman

Add Java API for Concatenate strings with separator (#8289) @tgravescs

strings::join_list_elements options for empty list inputs (#8285) @ttnghia

Return python lists for getitem calls to list type series (#8265) @brandon-b-miller

add unit tests for lead/lag on list for row window (#8259) @wbo4958

Create a String column from UTF8 String byte arrays (#8257) @firestarman

Support scattering list_scalar (#8256) @isVoid

Implement lists::concatenate_list_elements (#8231) @ttnghia

Support for struct scalars. (#8220) @nvdbaranec

Add support for decimal types in ORC writer (#8198) @vuule

Support create lists column from a list_scalar (#8185) @isVoid

Groupby.shift c++ API refactor and python binding (#8131) @isVoid

Add groupby::replace_nulls(replace_policy) api (#7118) @isVoid

🛠️ Improvements

Support Dask + Distributed 2021.05.1 (#8392) @jakirkham

Add aliases for string methods (#8353) @shwina

Update environment variable used to determine cuda_version (#8321) @ajschmidt8

JNI: Refactor the code of making column from scalar (#8310) @firestarman

Update CHANGELOG.md links for calver (#8303) @ajschmidt8

Merge branch-0.19 into branch-21.06 (#8302) @ajschmidt8

use address and length for GDS reads/writes (#8301) @rongou

Update cudfjni version to 21.06.0 (#8292) @pxLi

Update docs build script (#8284) @ajschmidt8

Make device_buffer streams explicit and enforce move construction (#8280) @harrism

Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr

Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec

Enable implicit casting when concatenating mixed types (#8276) @ChrisJar

Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt

Update cudfjni version to 21.06 (#8267) @pxLi

support RMM aligned resource adapter in JNI (#8266) @rongou

Pass compiler environment variables to conda python build (#8260) @Ethyling

Remove abc inheritance from Serializable (#8254) @vyasr

Move more methods into SingleColumnFrame (#8253) @vyasr

Update ORC statistics API to use C++17 standard library (#8241) @vuule

Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard

Correct unused parameters in the copying algorithms (#8232) @robertmaynard

IO statistics cleanup (#8191) @kaatish

Refactor of rolling_window implementation. (#8158) @nvdbaranec

Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec

Column refactoring 2 (#8130) @vyasr

support space in workspace (#7956) @jolorunyomi

Support collect_set on rolling window (#7881) @sperlingxx

v0.19.2 (Apr 28, 2021)
🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee

Allow merging index column with data column using keyword "on" (#7736) @skirui-source

Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2

Replace device_vector with device_uvector in null_mask (#7715) @harrism

Don't identify decimals as strings. (#7710) @vyasr

Fix Java Parquet write after writer API changes (#7655) @revans2

Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism

Update missing docstring examples in python public APIs (#7546) @galipremsagar

Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt

Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt

Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller

Add struct support to parquet writer (#7461) @devavret

Join APIs that return gathermaps (#7454) @shwina

fixed_point + cudf::binary_operation API Changes (#7435) @codereport

Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source

Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt

Refactor strings column factories (#7397) @harrism

Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard

Upgrade pandas to 1.2 (#7375) @galipremsagar

Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia

Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

unsnap: busy wait a number of cycles (#8073) @vuule

Fix returned column type when extracting from an empty list column (#8031) @jlowe

Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr

Fix a NameError in meta dispatch API (#7996) @galipremsagar

Reindex in DataFrame.__setitem__ (#7957) @galipremsagar

jitify direct-to-cubin compilation and caching. (#7919) @cwharris

Use dynamic cudart for nvcomp in java build (#7896) @abellina

fix "incompatible redefinition" warnings (#7894) @cwharris

cudf consistently specifies the cuda runtime (#7887) @robertmaynard

disable verbose output for jitify_preprocess (#7886) @cwharris

CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard

Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller

cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard

Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard

Sort by index in groupby tests more consistently (#7802) @shwina

Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass

Add decimal column handling in copy_type_metadata (#7788) @shwina

Add column names validation in parquet writer (#7786) @galipremsagar

Fix Java explode outer unit tests (#7782) @jlowe

Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad

User resource fix for replace_nulls (#7769) @magnatelee

Fix type dispatch for columnar replace_nulls (#7768) @jlowe

Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar

Fix slicing and arrow representations of decimal columns (#7755) @vyasr

Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346

Implement scatter for struct columns (#7752) @ttnghia

Fix data corruption in string columns (#7746) @galipremsagar

Fix string length in stripe dictionary building (#7744) @kaatish

Update conda recipes pinning of repo dependencies (#7743) @mike-wendt

Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller

Fix dictionary size computation in ORC writer (#7737) @vuule

Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport

Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2

Disable column_view data accessors for unsupported types (#7725) @jrhemstad

Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar

Don't identify decimals as strings. (#7710) @vyasr

Fix return type of DataFrame.argsort (#7706) @galipremsagar

Fix/correct cudf installed package requirements (#7688) @robertmaynard

Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe

Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu

Fix Java Parquet write after writer API changes (#7655) @revans2

Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346

Fix internal compiler error during JNI Docker build (#7645) @jlowe

Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks

Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec

Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu

Fix specifying GPU architecture in JNI build (#7612) @jlowe

Fix ORC writer OOM issue (#7605) @vuule

Fix 0.18 --> 0.19 automerge (#7589) @kkraus14

Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule

Fix missing Dask imports (#7580) @kkraus14

CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard

Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia

Fix ORC writer output corruption with string columns (#7565) @vuule

Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia

FIX Fix Anaconda upload args (#7558) @dillon-cullinan

Fix index mismatch issue in equality related APIs (#7555) @galipremsagar

FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan

Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia

Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17

Update missing docstring examples in python public APIs (#7546) @galipremsagar

Decimal32 Build Fix (#7544) @razajafri

FIX Retry conda output location (#7540) @dillon-cullinan

fix missing renames of dask git branches from master to main (#7535) @kkraus14

Remove detail from device_span (#7533) @rwlee

Change dask and distributed branch to main (#7532) @dantegd

Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe

Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard

Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec

Change jit launch to safe_launch (#7510) @devavret

Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller

Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe

Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2

Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller

Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe

Correctly compile benchmarks (#7485) @robertmaynard

Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu

Fix __repr__ for categorical dtype (#7476) @galipremsagar

Java cleaner synchronization (#7474) @abellina

Fix java float/double parsing tests (#7473) @revans2

Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee

Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora

Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport

fix cuFile JNI compile errors (#7445) @rongou

Support Series.__setitem__ with key to a new row (#7443) @isVoid

Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source

Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee

Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks

Fix string to double conversion and row equivalent comparison (#7410) @ttnghia

Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia

Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt

Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu

fix Arrow CMake file (#7358) @rongou

Fix lists::contains() for NaN and Decimals (#7349) @mythrocks

Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar

Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt

FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina

Add Resources to README. (#7697) @bdice

Add isin examples in Docstring (#7479) @galipremsagar

Resolving unlinked type shorthands in cudf doc (#7416) @isVoid

Fix typo in regex.md doc page (#7363) @davidwendt

Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar

Enable join on decimal columns (#7764) @ChrisJar

Allow merging index column with data column using keyword "on" (#7736) @skirui-source

Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller

Add support for unique groupby aggregation (#7726) @shwina

Expose libcudf's label_bins function to cudf (#7724) @vyasr

Adding support for equi-join on struct (#7720) @hyperbolic2346

Add decimal column comparison operations (#7716) @isVoid

Implement scan operations for decimal columns (#7707) @ChrisJar

Enable typecasting between decimal and int (#7691) @ChrisJar

Enable decimal support in parquet writer (#7673) @devavret

Adds list.unique API (#7664) @isVoid

Fix NaN handling in drop_list_duplicates (#7662) @ttnghia

Add lists.sort_values API (#7657) @isVoid

Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia

Adds explode API (#7607) @isVoid

Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid

Implement cudf::label_bins() (#7554) @vyasr

Add Python bindings for lists::contains (#7547) @skirui-source

cudf::row_bit_count() support. (#7534) @nvdbaranec

Implement drop_list_duplicates (#7528) @ttnghia

Add Python bindings for lists::extract_lists_element (#7505) @skirui-source

Add explode_outer and explode_outer_position (#7499) @hyperbolic2346

Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller

Add struct support to parquet writer (#7461) @devavret

Enable type conversion from float to decimal type (#7450) @ChrisJar

Add cython for converting strings/fixed-point functions (#7429) @davidwendt

Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann

Implement groupby collect_set (#7420) @ttnghia

Merge branch-0.18 into branch-0.19 (#7411) @raydouglass

Refactor strings column factories (#7397) @harrism

Add groupby scan operations (sort groupby) (#7387) @karthikeyann

Add cudf::explode_position (#7376) @hyperbolic2346

Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt

Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann

Add Series.drop api (#7304) @isVoid

get_json_object() implementation (#7286) @nvdbaranec

Python API for ListMethods.len() (#7283) @isVoid

Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks

Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt

Fix inplace update of data and add Series.update (#7201) @galipremsagar

Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport

Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou

Update dask + distributed to 2021.4.0 (#7858) @jakirkham

Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar

Add USE_GDS as an option in build script (#7833) @pxLi

add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou

Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina

Revert dask versioning of concat dispatch (#7823) @galipremsagar

add copy methods in Java memory buffer (#7791) @rongou

Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard

Allow hash_partition to take a seed value (#7771) @magnatelee

Turn on NVTX by default in java build (#7761) @tgravescs

Add Java bindings to join gather map APIs (#7751) @jlowe

Add replacements column support for Java replaceNulls (#7750) @jlowe

Add Java bindings for row_bit_count (#7749) @jlowe

Remove unused JVM array creation (#7748) @jlowe

Added JNI support for new is_integer (#7739) @revans2

Create and promote library aliases in libcudf installations (#7734) @trxcllnt

Support groupby operations for decimal dtypes (#7731) @vyasr

Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule

Replace device_vector with device_uvector in null_mask (#7715) @harrism

Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe

Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt

Use stream in groupby calls (#7705) @karthikeyann

Update codeowners file (#7701) @ajschmidt8

Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann

Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt

Misc Python/Cython optimizations (#7686) @shwina

Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt

Add column_device_view to orc writer (#7676) @kaatish

cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard

Add gbenchmark for nvtext normalize functions (#7668) @davidwendt

Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr

Feature/optimize accessor copy (#7660) @vyasr

Fix find_package(cudf) (#7658) @trxcllnt

Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt

Add in JNI support for count_elements (#7651) @revans2

Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar

Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard

Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt

Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina

Add in JNI support for table partition (#7637) @revans2

Add explicit fixed_point merge test (#7635) @codereport

Add JNI support for IDENTITY hash partitioning (#7626) @revans2

Java support on explode_outer (#7625) @sperlingxx

Java support of casting string from/to decimal (#7623) @sperlingxx

Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism

Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt

Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard

Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule

Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe

Add gbenchmarks for string substrings functions (#7603) @davidwendt

Refactor string conversion check (#7599) @ttnghia

JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman

Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt

ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt

Fix auto-detecting GPU architectures (#7593) @trxcllnt

Reduce cudf library size (#7583) @robertmaynard

Optimize cudf::make_strings_column for long strings (#7576) @davidwendt

Always build and export the cudf::cudftestutil target (#7574) @trxcllnt

Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism

Add gbenchmark for strings::concatenate (#7560) @davidwendt

Update Changelog Link (#7550) @ajschmidt8

Add gbenchmarks for strings replace regex functions (#7541) @davidwendt

Add __repr__ for Column and ColumnAccessor (#7531) @shwina

Support Decimal DIV changes in cudf (#7527) @razajafri

Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt

Use device_uvector, device_span in sort groupby (#7523) @karthikeyann

Add gbenchmarks for strings extract function (#7522) @davidwendt

Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt

Reduce compile time/size for scan.cu (#7516) @davidwendt

Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt

Removed unneeded includes from traits.hpp (#7509) @davidwendt

FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan

xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar

JNI bit cast (#7493) @revans2

Combine rolling window function tests (#7480) @mythrocks

Prepare Changelog for Automation (#7477) @ajschmidt8

Java support for explode position (#7471) @sperlingxx

Update 0.18 changelog entry (#7463) @ajschmidt8

JNI: Support skipping nulls for collect aggregation (#7457) @firestarman

Join APIs that return gathermaps (#7454) @shwina

Remove dependence on managed memory for multimap test (#7451) @jrhemstad

Use cuFile for Parquet IO when available (#7444) @vuule

Statistics cleanup (#7439) @kaatish

Add gbenchmarks for strings filter functions (#7438) @davidwendt

fixed_point + cudf::binary_operation API Changes (#7435) @codereport

Improve string gather performance (#7433) @jlowe

Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee

Detail APIs for datetime functions (#7430) @magnatelee

Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt

Add gbenchmark for strings split/split_record functions (#7427) @davidwendt

Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe

Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt

Simplify type dispatch with device_storage_dispatch (#7419) @codereport

Java support for casting of nested child columns (#7417) @razajafri

Improve scalar string replace performance for long strings (#7415) @jlowe

Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt

bitmask_or implementation with bitmask refactor (#7406) @rwlee

Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt

Clean up included headers in device_operators.cuh (#7401) @codereport

Move nullable index iterator to indexalator factory (#7399) @davidwendt

ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling

upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou

Add gbenchmark for strings find/contains functions (#7392) @davidwendt

Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard

Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt

Added in JNI support for out of core sort algorithm (#7381) @revans2

Upgrade pandas to 1.2 (#7375) @galipremsagar

Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia

jitify 2 support (#7372) @cwharris

compile_udf: Cache PTX for similar functions (#7371) @gmarkall

Add string scalar replace benchmark (#7369) @jlowe

Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt

Update orc reader and writer fuzz tests (#7357) @galipremsagar

Improve url_decode performance for long strings (#7353) @jlowe

cudf::ast Small Refactorings (#7352) @codereport

Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia

Use cudf::detail::make_counting_transform_iterator (#7338) @codereport

Change block size parameter from a global to a template param. (#7333) @nvdbaranec

Partial clean up of ORC writer (#7324) @vuule

Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt

Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi

Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport

Use string literals in fixed_point release_asserts (#7303) @codereport

Fix merge conflicts for #7295 (#7297) @ajschmidt8

Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt

Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu

Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard

Refactor dictionary support for reductions any/all (#7242) @davidwendt

Replace stream.value() with stream for stream_view args (#7236) @karthikeyann

Interval index and interval_range (#7182) @marlenezw

avro reader integration tests (#7156) @cwharris

Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

Adding Interval Dtype (#6984) @marlenezw

Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

v0.19.1 (Apr 22, 2021)
🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee

Allow merging index column with data column using keyword "on" (#7736) @skirui-source

Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2

Replace device_vector with device_uvector in null_mask (#7715) @harrism

Don't identify decimals as strings. (#7710) @vyasr

Fix Java Parquet write after writer API changes (#7655) @revans2

Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism

Update missing docstring examples in python public APIs (#7546) @galipremsagar

Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt

Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt

Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller

Add struct support to parquet writer (#7461) @devavret

Join APIs that return gathermaps (#7454) @shwina

fixed_point + cudf::binary_operation API Changes (#7435) @codereport

Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source

Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt

Refactor strings column factories (#7397) @harrism

Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard

Upgrade pandas to 1.2 (#7375) @galipremsagar

Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia

Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

Fix returned column type when extracting from an empty list column (#8031) @jlowe

Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr

Fix a NameError in meta dispatch API (#7996) @galipremsagar

Reindex in DataFrame.__setitem__ (#7957) @galipremsagar

jitify direct-to-cubin compilation and caching. (#7919) @cwharris

Use dynamic cudart for nvcomp in java build (#7896) @abellina

fix "incompatible redefinition" warnings (#7894) @cwharris

cudf consistently specifies the cuda runtime (#7887) @robertmaynard

disable verbose output for jitify_preprocess (#7886) @cwharris

CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard

Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller

cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard

Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard

Sort by index in groupby tests more consistently (#7802) @shwina

Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass

Add decimal column handling in copy_type_metadata (#7788) @shwina

Add column names validation in parquet writer (#7786) @galipremsagar

Fix Java explode outer unit tests (#7782) @jlowe

Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad

User resource fix for replace_nulls (#7769) @magnatelee

Fix type dispatch for columnar replace_nulls (#7768) @jlowe

Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar

Fix slicing and arrow representations of decimal columns (#7755) @vyasr

Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346

Implement scatter for struct columns (#7752) @ttnghia

Fix data corruption in string columns (#7746) @galipremsagar

Fix string length in stripe dictionary building (#7744) @kaatish

Update conda recipes pinning of repo dependencies (#7743) @mike-wendt

Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller

Fix dictionary size computation in ORC writer (#7737) @vuule

Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport

Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2

Disable column_view data accessors for unsupported types (#7725) @jrhemstad

Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar

Don't identify decimals as strings. (#7710) @vyasr

Fix return type of DataFrame.argsort (#7706) @galipremsagar

Fix/correct cudf installed package requirements (#7688) @robertmaynard

Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe

Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu

Fix Java Parquet write after writer API changes (#7655) @revans2

Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346

Fix internal compiler error during JNI Docker build (#7645) @jlowe

Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks

Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec

Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu

Fix specifying GPU architecture in JNI build (#7612) @jlowe

Fix ORC writer OOM issue (#7605) @vuule

Fix 0.18 --> 0.19 automerge (#7589) @kkraus14

Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule

Fix missing Dask imports (#7580) @kkraus14

CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard

Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia

Fix ORC writer output corruption with string columns (#7565) @vuule

Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia

FIX Fix Anaconda upload args (#7558) @dillon-cullinan

Fix index mismatch issue in equality related APIs (#7555) @galipremsagar

FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan

Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia

Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17

Update missing docstring examples in python public APIs (#7546) @galipremsagar

Decimal32 Build Fix (#7544) @razajafri

FIX Retry conda output location (#7540) @dillon-cullinan

fix missing renames of dask git branches from master to main (#7535) @kkraus14

Remove detail from device_span (#7533) @rwlee

Change dask and distributed branch to main (#7532) @dantegd

Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe

Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard

Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec

Change jit launch to safe_launch (#7510) @devavret

Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller

Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe

Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2

Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller

Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe

Correctly compile benchmarks (#7485) @robertmaynard

Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu

Fix __repr__ for categorical dtype (#7476) @galipremsagar

Java cleaner synchronization (#7474) @abellina

Fix java float/double parsing tests (#7473) @revans2

Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee

Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora

Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport

fix cuFile JNI compile errors (#7445) @rongou

Support Series.__setitem__ with key to a new row (#7443) @isVoid

Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source

Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee

Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks

Fix string to double conversion and row equivalent comparison (#7410) @ttnghia

Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia

Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt

Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu

fix Arrow CMake file (#7358) @rongou

Fix lists::contains() for NaN and Decimals (#7349) @mythrocks

Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar

Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt

FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina

Add Resources to README. (#7697) @bdice

Add isin examples in Docstring (#7479) @galipremsagar

Resolving unlinked type shorthands in cudf doc (#7416) @isVoid

Fix typo in regex.md doc page (#7363) @davidwendt

Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar

Enable join on decimal columns (#7764) @ChrisJar

Allow merging index column with data column using keyword "on" (#7736) @skirui-source

Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller

Add support for unique groupby aggregation (#7726) @shwina

Expose libcudf's label_bins function to cudf (#7724) @vyasr

Adding support for equi-join on struct (#7720) @hyperbolic2346

Add decimal column comparison operations (#7716) @isVoid

Implement scan operations for decimal columns (#7707) @ChrisJar

Enable typecasting between decimal and int (#7691) @ChrisJar

Enable decimal support in parquet writer (#7673) @devavret

Adds list.unique API (#7664) @isVoid

Fix NaN handling in drop_list_duplicates (#7662) @ttnghia

Add lists.sort_values API (#7657) @isVoid

Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia

Adds explode API (#7607) @isVoid

Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid

Implement cudf::label_bins() (#7554) @vyasr

Add Python bindings for lists::contains (#7547) @skirui-source

cudf::row_bit_count() support. (#7534) @nvdbaranec

Implement drop_list_duplicates (#7528) @ttnghia

Add Python bindings for lists::extract_lists_element (#7505) @skirui-source

Add explode_outer and explode_outer_position (#7499) @hyperbolic2346

Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller

Add struct support to parquet writer (#7461) @devavret

Enable type conversion from float to decimal type (#7450) @ChrisJar

Add cython for converting strings/fixed-point functions (#7429) @davidwendt

Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann

Implement groupby collect_set (#7420) @ttnghia

Merge branch-0.18 into branch-0.19 (#7411) @raydouglass

Refactor strings column factories (#7397) @harrism

Add groupby scan operations (sort groupby) (#7387) @karthikeyann

Add cudf::explode_position (#7376) @hyperbolic2346

Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt

Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann

Add Series.drop api (#7304) @isVoid

get_json_object() implementation (#7286) @nvdbaranec

Python API for ListMethods.len() (#7283) @isVoid

Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks

Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt

Fix inplace update of data and add Series.update (#7201) @galipremsagar

Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport

Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou

Update dask + distributed to 2021.4.0 (#7858) @jakirkham

Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar

Add USE_GDS as an option in build script (#7833) @pxLi

add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou

Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina

Revert dask versioning of concat dispatch (#7823) @galipremsagar

add copy methods in Java memory buffer (#7791) @rongou

Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard

Allow hash_partition to take a seed value (#7771) @magnatelee

Turn on NVTX by default in java build (#7761) @tgravescs

Add Java bindings to join gather map APIs (#7751) @jlowe

Add replacements column support for Java replaceNulls (#7750) @jlowe

Add Java bindings for row_bit_count (#7749) @jlowe

Remove unused JVM array creation (#7748) @jlowe

Added JNI support for new is_integer (#7739) @revans2

Create and promote library aliases in libcudf installations (#7734) @trxcllnt

Support groupby operations for decimal dtypes (#7731) @vyasr

Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule

Replace device_vector with device_uvector in null_mask (#7715) @harrism

Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe

Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt

Use stream in groupby calls (#7705) @karthikeyann

Update codeowners file (#7701) @ajschmidt8

Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann

Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt

Misc Python/Cython optimizations (#7686) @shwina

Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt

Add column_device_view to orc writer (#7676) @kaatish

cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard

Add gbenchmark for nvtext normalize functions (#7668) @davidwendt

Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr

Feature/optimize accessor copy (#7660) @vyasr

Fix find_package(cudf) (#7658) @trxcllnt

Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt

Add in JNI support for count_elements (#7651) @revans2

Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar

Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard

Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt

Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina

Add in JNI support for table partition (#7637) @revans2

Add explicit fixed_point merge test (#7635) @codereport

Add JNI support for IDENTITY hash partitioning (#7626) @revans2

Java support on explode_outer (#7625) @sperlingxx

Java support of casting string from/to decimal (#7623) @sperlingxx

Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism

Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt

Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard

Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule

Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe

Add gbenchmarks for string substrings functions (#7603) @davidwendt

Refactor string conversion check (#7599) @ttnghia

JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman

Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt

ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt

Fix auto-detecting GPU architectures (#7593) @trxcllnt

Reduce cudf library size (#7583) @robertmaynard

Optimize cudf::make_strings_column for long strings (#7576) @davidwendt

Always build and export the cudf::cudftestutil target (#7574) @trxcllnt

Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism

Add gbenchmark for strings::concatenate (#7560) @davidwendt

Update Changelog Link (#7550) @ajschmidt8

Add gbenchmarks for strings replace regex functions (#7541) @davidwendt

Add __repr__ for Column and ColumnAccessor (#7531) @shwina

Support Decimal DIV changes in cudf (#7527) @razajafri

Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt

Use device_uvector, device_span in sort groupby (#7523) @karthikeyann

Add gbenchmarks for strings extract function (#7522) @davidwendt

Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt

Reduce compile time/size for scan.cu (#7516) @davidwendt

Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt

Removed unneeded includes from traits.hpp (#7509) @davidwendt

FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan

xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar

JNI bit cast (#7493) @revans2

Combine rolling window function tests (#7480) @mythrocks

Prepare Changelog for Automation (#7477) @ajschmidt8

Java support for explode position (#7471) @sperlingxx

Update 0.18 changelog entry (#7463) @ajschmidt8

JNI: Support skipping nulls for collect aggregation (#7457) @firestarman

Join APIs that return gathermaps (#7454) @shwina

Remove dependence on managed memory for multimap test (#7451) @jrhemstad

Use cuFile for Parquet IO when available (#7444) @vuule

Statistics cleanup (#7439) @kaatish

Add gbenchmarks for strings filter functions (#7438) @davidwendt

fixed_point + cudf::binary_operation API Changes (#7435) @codereport

Improve string gather performance (#7433) @jlowe

Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee

Detail APIs for datetime functions (#7430) @magnatelee

Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt

Add gbenchmark for strings split/split_record functions (#7427) @davidwendt

Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe

Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt

Simplify type dispatch with device_storage_dispatch (#7419) @codereport

Java support for casting of nested child columns (#7417) @razajafri

Improve scalar string replace performance for long strings (#7415) @jlowe

Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt

bitmask_or implementation with bitmask refactor (#7406) @rwlee

Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt

Clean up included headers in device_operators.cuh (#7401) @codereport

Move nullable index iterator to indexalator factory (#7399) @davidwendt

ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling

upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou

Add gbenchmark for strings find/contains functions (#7392) @davidwendt

Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard

Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt

Added in JNI support for out of core sort algorithm (#7381) @revans2

Upgrade pandas to 1.2 (#7375) @galipremsagar

Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia

jitify 2 support (#7372) @cwharris

compile_udf: Cache PTX for similar functions (#7371) @gmarkall

Add string scalar replace benchmark (#7369) @jlowe

Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt

Update orc reader and writer fuzz tests (#7357) @galipremsagar

Improve url_decode performance for long strings (#7353) @jlowe

cudf::ast Small Refactorings (#7352) @codereport

Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia

Use cudf::detail::make_counting_transform_iterator (#7338) @codereport

Change block size parameter from a global to a template param. (#7333) @nvdbaranec

Partial clean up of ORC writer (#7324) @vuule

Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt

Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi

Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport

Use string literals in fixed_point release_asserts (#7303) @codereport

Fix merge conflicts for #7295 (#7297) @ajschmidt8

Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt

Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu

Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard

Refactor dictionary support for reductions any/all (#7242) @davidwendt

Replace stream.value() with stream for stream_view args (#7236) @karthikeyann

Interval index and interval_range (#7182) @marlenezw

avro reader integration tests (#7156) @cwharris

Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

Adding Interval Dtype (#6984) @marlenezw

Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

v0.19.0 (Apr 21, 2021)
🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee

Allow merging index column with data column using keyword "on" (#7736) @skirui-source

Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2

Replace device_vector with device_uvector in null_mask (#7715) @harrism

Don't identify decimals as strings. (#7710) @vyasr

Fix Java Parquet write after writer API changes (#7655) @revans2

Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism

Update missing docstring examples in python public APIs (#7546) @galipremsagar

Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt

Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt

Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller

Add struct support to parquet writer (#7461) @devavret

Join APIs that return gathermaps (#7454) @shwina

fixed_point + cudf::binary_operation API Changes (#7435) @codereport

Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source

Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt

Refactor strings column factories (#7397) @harrism

Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard

Upgrade pandas to 1.2 (#7375) @galipremsagar

Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia

Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

Fix a NameError in meta dispatch API (#7996) @galipremsagar

Reindex in DataFrame.__setitem__ (#7957) @galipremsagar

jitify direct-to-cubin compilation and caching. (#7919) @cwharris

Use dynamic cudart for nvcomp in java build (#7896) @abellina

fix "incompatible redefinition" warnings (#7894) @cwharris

cudf consistently specifies the cuda runtime (#7887) @robertmaynard

disable verbose output for jitify_preprocess (#7886) @cwharris

CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard

Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller

cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard

Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard

Sort by index in groupby tests more consistently (#7802) @shwina

Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass

Add decimal column handling in copy_type_metadata (#7788) @shwina

Add column names validation in parquet writer (#7786) @galipremsagar

Fix Java explode outer unit tests (#7782) @jlowe

Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad

User resource fix for replace_nulls (#7769) @magnatelee

Fix type dispatch for columnar replace_nulls (#7768) @jlowe

Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar

Fix slicing and arrow representations of decimal columns (#7755) @vyasr

Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346

Implement scatter for struct columns (#7752) @ttnghia

Fix data corruption in string columns (#7746) @galipremsagar

Fix string length in stripe dictionary building (#7744) @kaatish

Update conda recipes pinning of repo dependencies (#7743) @mike-wendt

Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller

Fix dictionary size computation in ORC writer (#7737) @vuule

Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport

Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2

Disable column_view data accessors for unsupported types (#7725) @jrhemstad

Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar

Don't identify decimals as strings. (#7710) @vyasr

Fix return type of DataFrame.argsort (#7706) @galipremsagar

Fix/correct cudf installed package requirements (#7688) @robertmaynard

Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe

Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu

Fix Java Parquet write after writer API changes (#7655) @revans2

Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346

Fix internal compiler error during JNI Docker build (#7645) @jlowe

Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks

Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec

Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu

Fix specifying GPU architecture in JNI build (#7612) @jlowe

Fix ORC writer OOM issue (#7605) @vuule

Fix 0.18 --> 0.19 automerge (#7589) @kkraus14

Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule

Fix missing Dask imports (#7580) @kkraus14

CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard

Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia

Fix ORC writer output corruption with string columns (#7565) @vuule

Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia

FIX Fix Anaconda upload args (#7558) @dillon-cullinan

Fix index mismatch issue in equality related APIs (#7555) @galipremsagar

FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan

Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia

Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17

Update missing docstring examples in python public APIs (#7546) @galipremsagar

Decimal32 Build Fix (#7544) @razajafri

FIX Retry conda output location (#7540) @dillon-cullinan

fix missing renames of dask git branches from master to main (#7535) @kkraus14

Remove detail from device_span (#7533) @rwlee

Change dask and distributed branch to main (#7532) @dantegd

Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe

Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard

Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec

Change jit launch to safe_launch (#7510) @devavret

Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller

Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe

Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2

Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller

Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe

Correctly compile benchmarks (#7485) @robertmaynard

Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu

Fix __repr__ for categorical dtype (#7476) @galipremsagar

Java cleaner synchronization (#7474) @abellina

Fix java float/double parsing tests (#7473) @revans2

Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee

Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora

Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport

fix cuFile JNI compile errors (#7445) @rongou

Support Series.__setitem__ with key to a new row (#7443) @isVoid

Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source

Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee

Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks

Fix string to double conversion and row equivalent comparison (#7410) @ttnghia

Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia

Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt

Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu

fix Arrow CMake file (#7358) @rongou

Fix lists::contains() for NaN and Decimals (#7349) @mythrocks

Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar

Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt

FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina

Add Resources to README. (#7697) @bdice

Add isin examples in Docstring (#7479) @galipremsagar

Resolving unlinked type shorthands in cudf doc (#7416) @isVoid

Fix typo in regex.md doc page (#7363) @davidwendt

Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar

Enable join on decimal columns (#7764) @ChrisJar

Allow merging index column with data column using keyword "on" (#7736) @skirui-source

Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller

Add support for unique groupby aggregation (#7726) @shwina

Expose libcudf's label_bins function to cudf (#7724) @vyasr

Adding support for equi-join on struct (#7720) @hyperbolic2346

Add decimal column comparison operations (#7716) @isVoid

Implement scan operations for decimal columns (#7707) @ChrisJar

Enable typecasting between decimal and int (#7691) @ChrisJar

Enable decimal support in parquet writer (#7673) @devavret

Adds list.unique API (#7664) @isVoid

Fix NaN handling in drop_list_duplicates (#7662) @ttnghia

Add lists.sort_values API (#7657) @isVoid

Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia

Adds explode API (#7607) @isVoid

Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid

Implement cudf::label_bins() (#7554) @vyasr

Add Python bindings for lists::contains (#7547) @skirui-source

cudf::row_bit_count() support. (#7534) @nvdbaranec

Implement drop_list_duplicates (#7528) @ttnghia

Add Python bindings for lists::extract_lists_element (#7505) @skirui-source

Add explode_outer and explode_outer_position (#7499) @hyperbolic2346

Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller

Add struct support to parquet writer (#7461) @devavret

Enable type conversion from float to decimal type (#7450) @ChrisJar

Add cython for converting strings/fixed-point functions (#7429) @davidwendt

Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann

Implement groupby collect_set (#7420) @ttnghia

Merge branch-0.18 into branch-0.19 (#7411) @raydouglass

Refactor strings column factories (#7397) @harrism

Add groupby scan operations (sort groupby) (#7387) @karthikeyann

Add cudf::explode_position (#7376) @hyperbolic2346

Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt

Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann

Add Series.drop api (#7304) @isVoid

get_json_object() implementation (#7286) @nvdbaranec

Python API for ListMethods.len() (#7283) @isVoid

Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks

Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt

Fix inplace update of data and add Series.update (#7201) @galipremsagar

Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport

Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou

Update dask + distributed to 2021.4.0 (#7858) @jakirkham

Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar

Add USE_GDS as an option in build script (#7833) @pxLi

add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou

Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina

Revert dask versioning of concat dispatch (#7823) @galipremsagar

add copy methods in Java memory buffer (#7791) @rongou

Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard

Allow hash_partition to take a seed value (#7771) @magnatelee

Turn on NVTX by default in java build (#7761) @tgravescs

Add Java bindings to join gather map APIs (#7751) @jlowe

Add replacements column support for Java replaceNulls (#7750) @jlowe

Add Java bindings for row_bit_count (#7749) @jlowe

Remove unused JVM array creation (#7748) @jlowe

Added JNI support for new is_integer (#7739) @revans2

Create and promote library aliases in libcudf installations (#7734) @trxcllnt

Support groupby operations for decimal dtypes (#7731) @vyasr

Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule

Replace device_vector with device_uvector in null_mask (#7715) @harrism

Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe

Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt

Use stream in groupby calls (#7705) @karthikeyann

Update codeowners file (#7701) @ajschmidt8

Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann

Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt

Misc Python/Cython optimizations (#7686) @shwina

Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt

Add column_device_view to orc writer (#7676) @kaatish

cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard

Add gbenchmark for nvtext normalize functions (#7668) @davidwendt

Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr

Feature/optimize accessor copy (#7660) @vyasr

Fix find_package(cudf) (#7658) @trxcllnt

Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt

Add in JNI support for count_elements (#7651) @revans2

Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar

Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard

Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt

Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina

Add in JNI support for table partition (#7637) @revans2

Add explicit fixed_point merge test (#7635) @codereport

Add JNI support for IDENTITY hash partitioning (#7626) @revans2

Java support on explode_outer (#7625) @sperlingxx

Java support of casting string from/to decimal (#7623) @sperlingxx

Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism

Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt

Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard

Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule

Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe

Add gbenchmarks for string substrings functions (#7603) @davidwendt

Refactor string conversion check (#7599) @ttnghia

JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman

Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt

ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt

Fix auto-detecting GPU architectures (#7593) @trxcllnt

Reduce cudf library size (#7583) @robertmaynard

Optimize cudf::make_strings_column for long strings (#7576) @davidwendt

Always build and export the cudf::cudftestutil target (#7574) @trxcllnt

Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism

Add gbenchmark for strings::concatenate (#7560) @davidwendt

Update Changelog Link (#7550) @ajschmidt8

Add gbenchmarks for strings replace regex functions (#7541) @davidwendt

Add __repr__ for Column and ColumnAccessor (#7531) @shwina

Support Decimal DIV changes in cudf (#7527) @razajafri

Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt

Use device_uvector, device_span in sort groupby (#7523) @karthikeyann

Add gbenchmarks for strings extract function (#7522) @davidwendt

Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt

Reduce compile time/size for scan.cu (#7516) @davidwendt

Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt

Removed unneeded includes from traits.hpp (#7509) @davidwendt

FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan

xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar

JNI bit cast (#7493) @revans2

Combine rolling window function tests (#7480) @mythrocks

Prepare Changelog for Automation (#7477) @ajschmidt8

Java support for explode position (#7471) @sperlingxx

Update 0.18 changelog entry (#7463) @ajschmidt8

JNI: Support skipping nulls for collect aggregation (#7457) @firestarman

Join APIs that return gathermaps (#7454) @shwina

Remove dependence on managed memory for multimap test (#7451) @jrhemstad

Use cuFile for Parquet IO when available (#7444) @vuule

Statistics cleanup (#7439) @kaatish

Add gbenchmarks for strings filter functions (#7438) @davidwendt

fixed_point + cudf::binary_operation API Changes (#7435) @codereport

Improve string gather performance (#7433) @jlowe

Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee

Detail APIs for datetime functions (#7430) @magnatelee

Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt

Add gbenchmark for strings split/split_record functions (#7427) @davidwendt

Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe

Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt

Simplify type dispatch with device_storage_dispatch (#7419) @codereport

Java support for casting of nested child columns (#7417) @razajafri

Improve scalar string replace performance for long strings (#7415) @jlowe

Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt

bitmask_or implementation with bitmask refactor (#7406) @rwlee

Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt

Clean up included headers in device_operators.cuh (#7401) @codereport

Move nullable index iterator to indexalator factory (#7399) @davidwendt

ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling

upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou

Add gbenchmark for strings find/contains functions (#7392) @davidwendt

Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard

Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt

Added in JNI support for out of core sort algorithm (#7381) @revans2

Upgrade pandas to 1.2 (#7375) @galipremsagar

Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia

jitify 2 support (#7372) @cwharris

compile_udf: Cache PTX for similar functions (#7371) @gmarkall

Add string scalar replace benchmark (#7369) @jlowe

Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt

Update orc reader and writer fuzz tests (#7357) @galipremsagar

Improve url_decode performance for long strings (#7353) @jlowe

cudf::ast Small Refactorings (#7352) @codereport

Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia

Use cudf::detail::make_counting_transform_iterator (#7338) @codereport

Change block size parameter from a global to a template param. (#7333) @nvdbaranec

Partial clean up of ORC writer (#7324) @vuule

Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt

Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi

Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport

Use string literals in fixed_point release_asserts (#7303) @codereport

Fix merge conflicts for #7295 (#7297) @ajschmidt8

Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt

Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu

Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard

Refactor dictionary support for reductions any/all (#7242) @davidwendt

Replace stream.value() with stream for stream_view args (#7236) @karthikeyann

Interval index and interval_range (#7182) @marlenezw

avro reader integration tests (#7156) @cwharris

Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

Adding Interval Dtype (#6984) @marlenezw

Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport

v0.18.1 (Mar 15, 2021)

v0.18.0 (Feb 24, 2021)
Breaking Changes 🚨

Default groupby to sort=False (#7180) @isVoid

Add libcudf API for parsing of ORC statistics (#7136) @vuule

Replace ORC writer api with class (#7099) @rgsl888prabhu

Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec

Replace parquet writer api with class (#7058) @rgsl888prabhu

Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt

Fix default parameter values of write_csv and write_parquet (#6967) @vuule

Align Series.groupby API to match Pandas (#6964) @kkraus14

Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

Bug Fixes 🐛

Remove incorrect std::move call on return variable (#7319) @davidwendt

Fix failing CI ORC test (#7313) @vuule

Disallow constructing frames from a ColumnAccessor (#7298) @shwina

fix java cuFile tests (#7296) @rongou

Fix style issues related to NumPy (#7279) @shwina

Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid

Fix copying dtype metadata after calling libcudf functions (#7271) @shwina

Move lists utility function definition out of header (#7266) @mythrocks

Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule

Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid

Remove floating point types from cudf::sort fast-path (#7250) @davidwendt

Disallow picking output columns from nested columns. (#7248) @devavret

Fix loc for Series with a MultiIndex (#7243) @shwina

Fix Arrow column test leaks (#7241) @tgravescs

Fix test column vector leak (#7238) @kuhushukla

Fix some bugs in java scalar support for decimal (#7237) @revans2

Improve assert_eq handling of scalar (#7220) @isVoid

Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec

Remove floating point types from radix sort fast-path (#7215) @davidwendt

Fixing parquet benchmarks (#7214) @rgsl888prabhu

Handle various parameter combinations in replace API (#7207) @galipremsagar

Export mock aws credentials for s3 tests (#7176) @ayushdg

Add MultiIndex.rename API (#7172) @isVoid

Fix importing list & struct types in from_arrow (#7162) @galipremsagar

Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346

Update s3 tests to use moto_server (#7144) @ayushdg

Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret

Fix compilation errors in libcudf (#7138) @galipremsagar

Fix compilation failure caused by -Wall addition. (#7134) @codereport

Add informative error message for sep in CSV writer (#7095) @galipremsagar

Add JIT cache per compute capability (#7090) @devavret

Implement __hash__ method for ListDtype (#7081) @galipremsagar

Only upload packages that were built (#7077) @raydouglass

Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller

Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar

Add unstack() support for non-multiindexed dataframes (#7054) @isVoid

Fix read_orc for decimal type (#7034) @rgsl888prabhu

Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar

Decimal casts in JNI became a NOOP (#7032) @revans2

Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina

Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt

Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt

Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass

Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar

Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1

Skip Thrust sort patch if already applied (#7009) @harrism

Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport

Fix Thrust unroll patch command (#7002) @harrism

Fix loc behaviour when key of incorrect type is used (#6993) @shwina

Fix int to datetime conversion in csv_read (#6991) @kaatish

fix excluding cufile tests by default (#6988) @rongou

Fix java cufile tests when cufile is not installed (#6987) @revans2

Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport

Fix type comparison for java (#6970) @revans2

Fix default parameter values of write_csv and write_parquet (#6967) @vuule

Align Series.groupby API to match Pandas (#6964) @kkraus14

Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule

Fix typo in numerical.py (#6957) @rgsl888prabhu

fixed_point_value double-shifts in fixed_point construction (#6950) @codereport

fix libcu++ include path for jni (#6948) @rongou

Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina

Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346

Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt

Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller

Fix N/A detection for empty fields in CSV reader (#6922) @vuule

Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt

Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish

Correct the sampling range when sampling with replacement (#6884) @ChrisJar

Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec

Fix columns & index handling in dataframe constructor (#6838) @galipremsagar
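
The HALF_EVEN fix listed above (#7014) concerns rounding negative integers to a position left of the decimal point. As a CPU-side illustration of the rounding rule only, the sketch below uses Python's standard `decimal` module; it is not libcudf code, and the helper name `round_half_even` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: int, decimal_places: int) -> int:
    """Round an integer using round-half-to-even (banker's rounding).

    A negative decimal_places rounds to the left of the decimal point,
    e.g. -1 rounds to the nearest ten.
    """
    # Build the rounding quantum: 10 ** (-decimal_places) as a Decimal.
    quantum = Decimal(1).scaleb(-decimal_places)
    return int(Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN))


# Ties go to the even neighbor, for negative inputs as well as positive ones.
for v in (-15, -25, -35, 25):
    print(v, "->", round_half_even(v, -1))
```

Under this rule a tie such as -25 rounded to tens resolves toward the even multiple (-20) rather than always away from zero, which is the behavior a HALF_EVEN implementation is expected to reproduce for negative inputs.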

Documentation 📖

Update readme (#7318) @shwina

Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie

Update doxyfile project number (#7161) @davidwendt

Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar

Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8

Add documentation for support dtypes in all IO formats (#7139) @galipremsagar

Add groupby docs (#7100) @shwina

Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar

Make Doxygen comments formatting consistent (#7041) @vuule

Add docs for working with missing data (#7010) @galipremsagar

Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque

libcudf Developer Guide (#6977) @harrism

Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

New Features 🚀

Support numeric_only field for rank() (#7213) @isVoid

Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport

Implement COLLECT rolling window aggregation (#7189) @mythrocks

Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar

Default groupby to sort=False (#7180) @isVoid

Add libcudf lists column count_elements API (#7173) @davidwendt

Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport

Add encoding and compression argument to CSV writer (#7168) @VibhuJawa

cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport

Adding support for explode to cuDF (#7140) @hyperbolic2346

Add libcudf API for parsing of ORC statistics (#7136) @vuule

update GDS/cuFile location for 0.9 release (#7131) @rongou

Add Segmented sort (#7122) @karthikeyann

Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport

Add scale and value methods to fixed_point (#7109) @codereport

Replace ORC writer api with class (#7099) @rgsl888prabhu

Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec

Improve digitize API (#7071) @isVoid

Add List types support in data generator (#7064) @galipremsagar

cudf::scan support for decimal32 and decimal64 (#7063) @codereport

cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport

Replace parquet writer api with class (#7058) @rgsl888prabhu

Support contains() on lists of primitives (#7039) @mythrocks

Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport

Add ffill and bfill to string columns (#7036) @isVoid

Enable round in cudf for DataFrame and Series (#7022) @ChrisJar

Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid

Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann

Add method field to fillna for fixed width columns (#6998) @isVoid

Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina

Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport

Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa

Add pytest-xdist to dev environment.yml (#6958) @galipremsagar

Add Index.set_names api (#6929) @galipremsagar

Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid

Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

Implement update() function (#6883) @skirui-source

Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann

Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport

Implement cudf.DateOffset for months (#6775) @brandon-b-miller

Add Python DecimalColumn (#6715) @shwina

Add dictionary support to libcudf groupby functions (#6585) @davidwendt

Improvements 🛠️

Update stale GHA with exemptions & new labels (#7395) @mike-wendt

Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling

Unpin from numpy < 1.20 (#7335) @shwina

Prepare Changelog for Automation (#7309) @galipremsagar

Prepare Changelog for Automation (#7272) @ajschmidt8

Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs

Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar

Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller

Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora

Add dictionary column support to rolling_window (#7186) @davidwendt

Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule

Adding unit tests for fixed_point with extremely large scales (#7178) @codereport

Fast path single column sort (#7167) @davidwendt

Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt

Refactor cudf::string_view host and device code (#7159) @davidwendt

Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov

Java bindings for Fixed-point type support for Parquet (#7153) @razajafri

Add Java interface for the new API 'explode' (#7151) @firestarman

Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule

Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt

Update JNI for contiguous_split packed results (#7127) @jlowe

Add JNI and Java bindings for list_contains (#7125) @kuhushukla

Add Java unit tests for window aggregate 'collect' (#7121) @firestarman

verify window operations on decimal with java tests (#7120) @sperlingxx

Adds in JNI support for creating a list column from existing columns (#7112) @revans2

Build libcudf with -Wall (#7105) @trxcllnt

Add column_device_view pointers to EncColumnDesc (#7097) @kaatish

Add pyorc to dev environment (#7085) @galipremsagar

JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2

Fastpath single strings column in cudf::sort (#7075) @davidwendt

Upgrade nvcomp to 1.2.1 (#7069) @rongou

Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule

Add Java tests for decimal casts (#7051) @sperlingxx

Auto-label PRs based on their content (#7044) @jolorunyomi

Create sort gbenchmark for strings column (#7040) @davidwendt

Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar

Spark Murmur3 hash functionality (#7024) @rwlee

Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt

Adding decimal writing support to parquet (#7017) @hyperbolic2346

Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora

Correct ORC docstring; other minor cuIO improvements (#7012) @vuule

Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret

Check output size overflow on strings gather (#6997) @davidwendt

Improve representation of MultiIndex (#6992) @galipremsagar

Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt

Minor cudf::round internal refactoring (#6976) @codereport

Add Java bindings for URL conversion (#6972) @jlowe

Enable strict_decimal_types in parquet reading (#6969) @sperlingxx

Add in basic support to JNI for logical_cast (#6954) @revans2

Remove duplicate file array_tests.cpp (#6953) @karthikeyann

Add null mask fixed_point_column_wrapper constructors (#6951) @codereport

Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe

Use simplified rmm::exec_policy (#6939) @harrism

Add null count test for apply_boolean_mask (#6903) @harrism

Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar

Remove **kwargs from string/categorical methods (#6750) @shwina

Refactor rolling.cu to reduce compile time (#6512) @mythrocks

Add static type checking via Mypy (#6381) @shwina

Update to official libcu++ on Github (#6275) @trxcllnt

v0.17.0 (Dec 10, 2020)

v0.17.0 Release
v0.18.0a (Mar 12, 2021)
🚨 Breaking Changes

Default groupby to sort=False (#7180) @isVoid

Add libcudf API for parsing of ORC statistics (#7136) @vuule

Replace ORC writer api with class (#7099) @rgsl888prabhu

Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec

Replace parquet writer api with class (#7058) @rgsl888prabhu

Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt

Fix default parameter values of write_csv and write_parquet (#6967) @vuule

Align Series.groupby API to match Pandas (#6964) @kkraus14

Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

🐛 Bug Fixes

Fix null-bounds calculation for ranged window queries (#7568) @mythrocks

Remove incorrect std::move call on return variable (#7319) @davidwendt

Fix failing CI ORC test (#7313) @vuule

Disallow constructing frames from a ColumnAccessor (#7298) @shwina

fix java cuFile tests (#7296) @rongou

Fix style issues related to NumPy (#7279) @shwina

Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid

Fix copying dtype metadata after calling libcudf functions (#7271) @shwina

Move lists utility function definition out of header (#7266) @mythrocks

Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule

Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid

Remove floating point types from cudf::sort fast-path (#7250) @davidwendt

Disallow picking output columns from nested columns. (#7248) @devavret

Fix loc for Series with a MultiIndex (#7243) @shwina

Fix Arrow column test leaks (#7241) @tgravescs

Fix test column vector leak (#7238) @kuhushukla

Fix some bugs in java scalar support for decimal (#7237) @revans2

Improve assert_eq handling of scalar (#7220) @isVoid

Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec

Remove floating point types from radix sort fast-path (#7215) @davidwendt

Fixing parquet benchmarks (#7214) @rgsl888prabhu

Handle various parameter combinations in replace API (#7207) @galipremsagar

Export mock aws credentials for s3 tests (#7176) @ayushdg

Add MultiIndex.rename API (#7172) @isVoid

Fix importing list & struct types in from_arrow (#7162) @galipremsagar

Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346

Update s3 tests to use moto_server (#7144) @ayushdg

Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret

Fix compilation errors in libcudf (#7138) @galipremsagar

Fix compilation failure caused by -Wall addition. (#7134) @codereport

Add informative error message for sep in CSV writer (#7095) @galipremsagar

Add JIT cache per compute capability (#7090) @devavret

Implement __hash__ method for ListDtype (#7081) @galipremsagar

Only upload packages that were built (#7077) @raydouglass

Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller

Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar

Add unstack() support for non-multiindexed dataframes (#7054) @isVoid

Fix read_orc for decimal type (#7034) @rgsl888prabhu

Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar

Decimal casts in JNI became a NOOP (#7032) @revans2

Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina

Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt

Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt

Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass

Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar

Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1

Skip Thrust sort patch if already applied (#7009) @harrism

Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport

Fix Thrust unroll patch command (#7002) @harrism

Fix loc behaviour when key of incorrect type is used (#6993) @shwina

Fix int to datetime conversion in csv_read (#6991) @kaatish

fix excluding cufile tests by default (#6988) @rongou

Fix java cufile tests when cufile is not installed (#6987) @revans2

Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport

Fix type comparison for java (#6970) @revans2

Fix default parameter values of write_csv and write_parquet (#6967) @vuule

Align Series.groupby API to match Pandas (#6964) @kkraus14

Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule

Fix typo in numerical.py (#6957) @rgsl888prabhu

fixed_point_value double-shifts in fixed_point construction (#6950) @codereport

fix libcu++ include path for jni (#6948) @rongou

Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina

Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346

Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt

Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller

Fix N/A detection for empty fields in CSV reader (#6922) @vuule

Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt

Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish

Correct the sampling range when sampling with replacement (#6884) @ChrisJar

Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec

Fix columns & index handling in dataframe constructor (#6838) @galipremsagar

📖 Documentation

Update readme (#7318) @shwina

Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie

Update doxyfile project number (#7161) @davidwendt

Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar

Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8

Add documentation for support dtypes in all IO formats (#7139) @galipremsagar

Add groupby docs (#7100) @shwina

Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar

Make Doxygen comments formatting consistent (#7041) @vuule

Add docs for working with missing data (#7010) @galipremsagar

Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque

libcudf Developer Guide (#6977) @harrism

Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

🚀 New Features

Support numeric_only field for rank() (#7213) @isVoid

Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport

Implement COLLECT rolling window aggregation (#7189) @mythrocks

Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar

Default groupby to sort=False (#7180) @isVoid

Add libcudf lists column count_elements API (#7173) @davidwendt

Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport

Add encoding and compression argument to CSV writer (#7168) @VibhuJawa

cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport

Adding support for explode to cuDF (#7140) @hyperbolic2346

Add libcudf API for parsing of ORC statistics (#7136) @vuule

update GDS/cuFile location for 0.9 release (#7131) @rongou

Add Segmented sort (#7122) @karthikeyann

Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport

Add scale and value methods to fixed_point (#7109) @codereport

Replace ORC writer api with class (#7099) @rgsl888prabhu

Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec

Improve digitize API (#7071) @isVoid

Add List types support in data generator (#7064) @galipremsagar

cudf::scan support for decimal32 and decimal64 (#7063) @codereport

cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport

Replace parquet writer api with class (#7058) @rgsl888prabhu

Support contains() on lists of primitives (#7039) @mythrocks

Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport

Add ffill and bfill to string columns (#7036) @isVoid

Enable round in cudf for DataFrame and Series (#7022) @ChrisJar

Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid

Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann

Add method field to fillna for fixed width columns (#6998) @isVoid

Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina

Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport

Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa

Add pytest-xdist to dev environment.yml (#6958) @galipremsagar

Add Index.set_names api (#6929) @galipremsagar

Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid

Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

Implement update() function (#6883) @skirui-source

Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann

Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport

Implement cudf.DateOffset for months (#6775) @brandon-b-miller

Add Python DecimalColumn (#6715) @shwina

Add dictionary support to libcudf groupby functions (#6585) @davidwendt

🛠️ Improvements

Update stale GHA with exemptions & new labels (#7395) @mike-wendt

Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling

Unpin from numpy < 1.20 (#7335) @shwina

Prepare Changelog for Automation (#7309) @galipremsagar

Prepare Changelog for Automation (#7272) @ajschmidt8

Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs

Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar

Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller

Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora

Add dictionary column support to rolling_window (#7186) @davidwendt

Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule

Adding unit tests for fixed_point with extremely large scales (#7178) @codereport

Fast path single column sort (#7167) @davidwendt

Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt

Refactor cudf::string_view host and device code (#7159) @davidwendt

Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov

Java bindings for Fixed-point type support for Parquet (#7153) @razajafri

Add Java interface for the new API 'explode' (#7151) @firestarman

Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule

Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt

Update JNI for contiguous_split packed results (#7127) @jlowe

Add JNI and Java bindings for list_contains (#7125) @kuhushukla

Add Java unit tests for window aggregate 'collect' (#7121) @firestarman

verify window operations on decimal with java tests (#7120) @sperlingxx

Adds in JNI support for creating a list column from existing columns (#7112) @revans2

Build libcudf with -Wall (#7105) @trxcllnt

Add column_device_view pointers to EncColumnDesc (#7097) @kaatish

Add pyorc to dev environment (#7085) @galipremsagar

JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2

Fastpath single strings column in cudf::sort (#7075) @davidwendt

Upgrade nvcomp to 1.2.1 (#7069) @rongou

Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule

Add Java tests for decimal casts (#7051) @sperlingxx

Auto-label PRs based on their content (#7044) @jolorunyomi

Create sort gbenchmark for strings column (#7040) @davidwendt

Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar

Spark Murmur3 hash functionality (#7024) @rwlee

Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt

Adding decimal writing support to parquet (#7017) @hyperbolic2346

Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora

Correct ORC docstring; other minor cuIO improvements (#7012) @vuule

Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret

Check output size overflow on strings gather (#6997) @davidwendt

Improve representation of MultiIndex (#6992) @galipremsagar

Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt

Minor cudf::round internal refactoring (#6976) @codereport

Add Java bindings for URL conversion (#6972) @jlowe

Enable strict_decimal_types in parquet reading (#6969) @sperlingxx

Add in basic support to JNI for logical_cast (#6954) @revans2

Remove duplicate file array_tests.cpp (#6953) @karthikeyann

Add null mask fixed_point_column_wrapper constructors (#6951) @codereport

Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe

Use simplified rmm::exec_policy (#6939) @harrism

Add null count test for apply_boolean_mask (#6903) @harrism

Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar

Remove **kwargs from string/categorical methods (#6750) @shwina

Refactor rolling.cu to reduce compile time (#6512) @mythrocks

Add static type checking via Mypy (#6381) @shwina

Update to official libcu++ on Github (#6275) @trxcllnt


Owner

RAPIDS

Open GPU Data Science

GitHub http://rapids.ai
