A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

Overview

swifter

Documentation

To learn about the latest improvements, please check the changelog.

Further documentation on swifter is available here.

Check out the examples notebook, along with the speed benchmark notebook. The benchmarks are created using the library perfplot.

Installation:

$ pip install -U pandas # upgrade pandas
$ pip install swifter # first time installation

$ pip install -U swifter # upgrade to latest version if already installed

alternatively, to install on Anaconda:

conda install -c conda-forge swifter

...after installing, import swifter into your code along with pandas using:

import pandas as pd
import swifter

...alternatively, swifter can be used with modin dataframes in the same manner:

import modin.pandas as pd
import swifter

NOTE: if you import swifter before modin, you will have to additionally register modin: swifter.register_modin()

Easy to use

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]})

# plain pandas apply: runs on a single core
df['x2'] = df['x'].apply(lambda x: x**2)
# swifter apply: vectorized when possible, otherwise dask parallel processing or pandas apply
df['x2'] = df['x'].swifter.apply(lambda x: x**2)

# use swifter apply on whole dataframe (axis=1 gives one value per row, matching the new column)
df['agg'] = df.swifter.apply(lambda row: row.sum() - row.min(), axis=1)

# use swifter apply on specific columns
# (my_func, the column names, and the extra arguments below are placeholders)
df['outCol'] = df[['inCol1', 'inCol2']].swifter.apply(my_func)
df['outCol'] = df[['inCol1', 'inCol2', 'inCol3']].swifter.apply(my_func,
             positional_arg, keyword_arg=keyword_argval)

Vectorizes your function, when possible

When vectorization is not possible, automatically decides which is faster: to use dask parallel processing or a simple pandas apply
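The decision described above can be illustrated with a small sketch. This is not swifter's implementation: `choose_strategy`, `sample_size`, and `time_budget_s` are hypothetical names, and the logic is a simplified guess at the idea of testing a sample first. It tries the function on a whole sample at once, compares the result with an element-wise apply, and otherwise extrapolates the sample timing to pick between a parallel run and a plain pandas apply.

```python
import time

import pandas as pd


def choose_strategy(series, func, sample_size=1000, time_budget_s=1.0):
    """Toy illustration only: return the name of the strategy a sample suggests."""
    sample = series.iloc[:sample_size]

    # Reference result and timing: element-wise apply on the sample.
    start = time.perf_counter()
    expected = sample.apply(func)
    elapsed = time.perf_counter() - start

    try:
        # Candidate: call the function once on the whole sample.
        candidate = func(sample)
        vectorizable = isinstance(candidate, pd.Series) and candidate.equals(expected)
    except Exception:
        # Functions written for scalars often fail on a whole Series.
        vectorizable = False

    if vectorizable:
        return "vectorized"

    # Extrapolate the sample timing to the full series (hypothetical threshold).
    estimated_total = elapsed / max(len(sample), 1) * len(series)
    return "parallel" if estimated_total > time_budget_s else "pandas apply"


s = pd.Series(range(5000))
print(choose_strategy(s, lambda x: x ** 2))
print(choose_strategy(s, lambda x: x if x > 2 else 0))
```

The second call cannot be vectorized because the conditional raises when given a whole Series, so the sketch falls back to the timing estimate.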

Notes

  1. The function is documented in the .py file. In Jupyter Notebooks, you can see the docs by pressing Shift+Tab(x3). Also, check out the complete documentation here along with the changelog.

  2. Please upgrade your version of pandas, as the pandas extension api used in this module is a recent addition to pandas.

  3. Import modin before importing swifter, if you wish to use modin with swifter. Otherwise, use swifter.register_modin() to access it.

  4. Do not use swifter to apply a function that modifies external variables. Under the hood, swifter does sample applies to optimize performance. These sample applies will modify the external variable in addition to the final apply. Thus, you will end up with an erroneously modified external variable.

  5. It is advised to disable the progress bar if calling swifter from a forked process as the progress bar may get confused between various multiprocessing modules.

  6. If swifter returns a different result than pandas, try explicitly casting the return type, e.g.: df.swifter.apply(lambda x: float(np.angle(x)))
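Note 4 can be illustrated without swifter itself. The sketch below uses plain pandas and an explicit extra apply on a slice as a stand-in for the sample applies the note describes; the names `seen` and `square_and_record` are made up for the example, and the size of the sample is an assumption.

```python
import pandas as pd

seen = []  # external state mutated from inside the applied function


def square_and_record(x):
    seen.append(x)  # side effect: the pattern note 4 warns against
    return x ** 2


s = pd.Series([1, 2, 3, 4])

# Stand-in for a sample apply performed before the real one.
_ = s.iloc[:2].apply(square_and_record)
# The "final" apply over the full series.
result = s.apply(square_and_record)

# The list now holds the sample values plus the full pass, so it is longer than s.
print(len(seen), len(s))

# Side-effect-free alternative: return what you need and collect it afterwards.
collected = s.apply(lambda x: x ** 2).tolist()
print(collected)
```

Returning values from the function and aggregating them after the apply keeps the result independent of how many times the function was invoked.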

Comments
  • Slow Performance of Swifter for Text Preprocessing

    Hi @jmcarpenter2,

    Dear Swifter Folks,

    Recently, I found that swifter is 5-10x slower than a vanilla pandas apply when the function cannot be vectorized (in my case, text preprocessing).

    The experiment is like this:

    
    import pandas as pd
    import swifter
    
    def clean_text(text):
        text = text.strip()
        text = text.replace(' ', '_')
        return text
    
    N_rows = 7000000
    df_data = pd.DataFrame([["i want to break free"]] * N_rows, columns=["text"])
    
    %time df_data['text'] = df_data['text'].swifter.apply(clean_text)
    
    %time df_data['text'] = df_data['text'].apply(clean_text)
    

    Is this expected? Let's have a discussion to make sure I'm not missing something. Thank you!
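For a cleaning function this simple, pandas' built-in vectorized string methods are one alternative worth timing against both apply variants. The sketch below is a suggestion rather than part of the original report; it uses a smaller frame than the 7,000,000 rows above and only checks that both approaches produce the same values.

```python
import pandas as pd


def clean_text(text):
    text = text.strip()
    text = text.replace(' ', '_')
    return text


# Smaller than the frame in the report, to keep the example quick.
df_data = pd.DataFrame([["i want to break free"]] * 1000, columns=["text"])

# Element-wise apply, as in the report.
via_apply = df_data["text"].apply(clean_text)

# Same transformation expressed with the .str accessor; regex=False makes
# the replacement a literal string substitution.
via_str = df_data["text"].str.strip().str.replace(" ", "_", regex=False)

print(via_apply.tolist() == via_str.tolist())
```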

    opened by hadyan-tvlk 26
  • Swifter using only single core

    I am applying swifter to a function which takes several values in addition to a datetime variable. After running the code, I saw it using only a single core (6 cores are available). The data has 476k rows. With a single core, it takes about 7.5 minutes.

    I added set_npartitions(16), which reduced the processing time to 3.5 minutes, but it still uses only a single core.

    Any reason why it can't use all the cores?

    opened by raghu1121 13
  • Swifter Restarting Script

    Hi,

    I have attempted to speed up some data processing involving data frames with over 4 million rows with swifter on Python 3.6.

    I have added swifter to some of my pandas apply calls; however, it seems to completely restart the script multiple times (printing out debug information over and over) and create multiple threads within the call stack, crashing the program.

    I have been unable to trace which apply causes the error at this point in time.

    I understand that Python 3 is experimental; if you'd like me to share my Anaconda environment, let me know.

    opened by Jack-McKew 12
  • swifter install is stuck

    Hey guys, using Python 3.9 here on my local machine (macOS). I tried a simple pip install swifter in my venv and have not been able to get past this:

    INFO: pip is looking at multiple versions of ipykernel to determine which version is compatible with other requirements. This could take a while.
    Collecting ipykernel>=4.5.1
      Using cached ipykernel-5.4.1-py3-none-any.whl (119 kB)
      Using cached ipykernel-5.4.0-py3-none-any.whl (119 kB)
      Using cached ipykernel-5.3.4-py3-none-any.whl (120 kB)
      Using cached ipykernel-5.3.3-py3-none-any.whl (120 kB)
      Using cached ipykernel-5.3.2-py3-none-any.whl (120 kB)
      Using cached ipykernel-5.3.1-py3-none-any.whl (120 kB)
      Using cached ipykernel-5.3.0-py3-none-any.whl (119 kB)

    I believe someone else also had this issue and documented it in this Stack Overflow post: https://stackoverflow.com/questions/65238819/failed-to-install-swifter-via-pip-info-pip-is-looking-at-multiple-versions

    installation issue 
    opened by amankagarwal 11
  • swifter apply for resample groups

    I've used swifter to speed up apply calls on DataFrames, but this isn't the only context apply is used in pandas. Would it be simple to implement for resample objects also?

    See: pandas.DataFrame.resample

    Can we go from: series.resample('3T').apply(custom_resampler) to: series.resample('3T').swifter.apply(custom_resampler)?

    enhancement 
    opened by harahu 11
  • Question: Swifter with lambda functions

    Hi-

    Sorry to leave a question here, but I didn't see any other way to reach you. I am loving swifter and would like to figure out how to apply it in a double-apply that I'm doing between two dataframes. I'm using hamming distance to calculate the distance between two strings from columns of two different data frames as follows:

    df1
    id    |     Target
    12   |     AATTGG
    57   |     GGAACC
    
    df2
    id    |     ngram
    22   |     AATTGC
    42   |     AATTGA
    
    import distance
    df1.Target.apply(lambda bc: df2.ngram.apply(lambda x: distance.hamming(bc, x)))
    

    Is there a way to do something like this in swifter?

    Thanks!
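Independently of swifter, one way to avoid the nested apply is to compute all pairwise distances at once with NumPy broadcasting. This is a sketch under the assumption that every string has the same length, as in the sample data; `pairwise_hamming` is a hypothetical helper and does not use the `distance` package from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"id": [12, 57], "Target": ["AATTGG", "GGAACC"]})
df2 = pd.DataFrame({"id": [22, 42], "ngram": ["AATTGC", "AATTGA"]})


def pairwise_hamming(left, right):
    """Hamming distance between every string in `left` and every string in `right`."""
    # One row per string, one column per character (requires equal lengths).
    a = np.array([list(s) for s in left])
    b = np.array([list(s) for s in right])
    # Broadcast to shape (len(left), len(right), string_length) and count mismatches.
    mismatches = a[:, None, :] != b[None, :, :]
    return pd.DataFrame(mismatches.sum(axis=2), index=left.index, columns=right.index)


distances = pairwise_hamming(df1["Target"], df2["ngram"])
print(distances)
```

The intermediate boolean array grows with the product of both frame sizes and the string length, so for large inputs it may need to be processed in chunks.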

    opened by summerela 10
  • Python 3.10 support?

    I am trying to install swifter via conda in a new virtual environment based on Python 3.10 and it fails with some dependency issues. Is Python 3.10 not supported, or perhaps something else is going on with my environment?

    Thank you.

    installation issue 
    opened by borice 9
  • TypeError: TypeError('encode() argument 1 must be string, not bool',) while `apply`ing to dataframe

    Python 2.7.15, Swifter 0.281, Pandas 0.24.1, Numpy 1.16.1

    Trying to switch a relatively simple currently working dataframe .apply to use this package, I ran into this exception. Here is the code I am running

    def score_stuff(df_to_score, months, predictor):
        def compute_value(row):
            return predictor.compute_n_month_values(row.plan, row.length_in_months, row.months)
    
        df_to_score['output'] = df_to_score.apply(compute_value, axis=1)
    

    (row.plan is a string/object, row.length_in_months is a float, and row.months is an int. There are other cols in the df_to_score of many types but they are not referenced in the compute_value() method)

    Here's the stack trace.

    File "/opt/airflow/repo/dags/scripts/models/score_stuff.py", line 128, in score_stuff
       df_to_score['output'] = df_to_score.apply(compute_value, axis=1)
    File "/var/lib/venv/airflow/local/lib/python2.7/site-packages/swifter/swifter.py", line 285, in apply
      **kwds
    File "/var/lib/venv/airflow/local/lib/python2.7/site-packages/tqdm/_tqdm.py", line 657, in inner
      t = tclass(*targs, total=total, **tkwargs)
    File "/var/lib/venv/airflow/local/lib/python2.7/site-packages/tqdm/_tqdm.py", line 945, in __init__
      self.display()
    File "/var/lib/venv/airflow/local/lib/python2.7/site-packages/tqdm/_tqdm.py", line 1315, in display
      self.sp(self.__repr__() if msg is None else msg)
    File "/var/lib/venv/airflow/local/lib/python2.7/site-packages/tqdm/_tqdm.py", line 250, in print_status
      fp_write('\r' + s + (' ' * max(last_len[0] - len_s, 0)))
    File "/var/lib/venv/airflow/local/lib/python2.7/site-packages/tqdm/_tqdm.py", line 243, in fp_write
      fp.write(_unicode(s))
    File "/var/lib/venv/airflow/local/lib/python2.7/site-packages/tqdm/_utils.py", line 160, in write
      self, 'encoding')))
    TypeError: encode() argument 1 must be string, not bool
    Exception TypeError: TypeError('encode() argument 1 must be string, not bool',) in <bound method tqdm.__del__ of Pandas Apply:   0%|          | 0/14168 [00:00<?, ?it/s]> ignored
    

    Interestingly, using .swifter was fine on my local OS X machine on a small dataset, but failing on a 16 core EC2 instance with a larger dataset.

    I tried passing raw=True just in case that might help, but it did not. Am I just doing something dumb?

    opened by apurvis 9
  • swifter using single core only

    Hi, I have tried to use swifter.apply() on a pandas dataframe and can't get it to use more than one core.

    I'm running swifter version 1.0.9 on a CentOS 8 server with 20 cores and 202 GB RAM, using Jupyter notebooks. Everything was installed using conda.

    Information on the DataFrame:

    DatetimeIndex: 1950000 entries, 2016-11-10 06:32:00.000030+00:00 to 2016-11-10 06:44:59.999630+00:00
    Columns: 741 entries, 2700.321045 to 3199.975098
    dtypes: uint16(741)
    memory usage: 2.8 GB

    The code to run the swifter.apply() is:

    import datetime as dt  # needed for the dt.timedelta calls below

    def rail_break(amplitude_ser):
        amplitude_max_ser = amplitude_ser.rolling(window=1000, min_periods=1).max()
        alarm_amp_threshold = 1.05
        alarm_time_threshold = dt.timedelta(minutes=5)
        background_amp_time = dt.timedelta(seconds=5)
        mean_background_amp = amplitude_max_ser[amplitude_max_ser.index <= (amplitude_max_ser.index[0] + 
                                                                            background_amp_time)].mean()
        alarm_amp_threshold = mean_background_amp * alarm_amp_threshold
        try:
            alarm_start_time = amplitude_max_ser.index[ amplitude_max_ser >= alarm_amp_threshold ][0]
        except(IndexError):
            alarm = False
            return alarm
        alarm_end_time = alarm_start_time + alarm_time_threshold
        alarm = amplitude_max_ser[ (amplitude_max_ser.index >= alarm_start_time) & 
                          (amplitude_max_ser.index <= alarm_end_time)] <= alarm_amp_threshold
        alarm = not(alarm.any())
        return alarm
    
    alarm_distances = amplitude_df.columns[amplitude_df.swifter.apply(rail_break, axis=0)]
    alarm_df = amplitude_df.loc[:,alarm_distances]
    

    I have tried the following but it still only uses one core:

    1. used amplitude_df.swifter.set_dask_scheduler('processes').apply(rail_break, axis=0)
    2. transposed the DataFrame to use axis=1
    opened by malapradej 8
  • Kernel dies after importing swifter

    Hello!

    I am experiencing an issue when trying to import swifter in Jupyter Notebook: the kernel dies right after the import.

    Python version: 3.7.4, pandas version: 1.0.1, swifter version: 0.301

    I'm on MacOS Mojave (v 10.14) and 8 GB RAM.

    I am using Anaconda 3 and I've tried both installing via pip and via conda.

    Also tried using virtualenv just in case there was any incompatibility issue but still ran into the same problem.

    Thank you in advance for your attention, if you need any other details please just ask for them!

    opened by Mmoncadaisla 8
  • Configurable progress bar instances

    I didn't really like how I was forced to have "Pandas Apply" or "Dask Apply" as my output. So I did a thing.

    The Dask progress bar enforces its own total argument, so I had it override anything sent into it.

    import pandas as pd
    import swifter
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'c': 3, 'd': 4}])
    df.swifter.apply(print)
    Pandas Apply: 100%|█████████████████████████| 4/4 [00:00<00:00, 1561.11it/s]
    
    df.swifter.progress_bar(desc='testy!', total=2).apply(print)
    testy!: 100%|████████████████████████████████| 2/2 [00:00<00:00, 768.75it/s]
    
    opened by rlynch-ironnet 8
  • Why does "force_parallel(enable=True)" not work?

    In this code, dask works:

    def has_inter(x_cat_set, now_set):
        inter = x_cat_set.intersection(now_set)
        return len(inter) == 0 
    
    def get_negs2(now_set,si_doc, num, df3):
        negs_set = set(df3[df3.loc[:,'s_cat'].swifter.progress_bar(False).apply(has_inter, args=(now_set, ))].s_id)
        negs = list(negs_set)
        return negs
    
    neg_dict = df2.loc[:, 's_cat'].swifter.force_parallel(enable=True).apply(get_negs2, args=(si_doc, n_neg, df3,))
    
    

    This is the result: image

    In this code, dask doesn't work:

    
    def get_negs(line, si_doc, num, df3):
        now_set = line['s_cat']
        negs_set = set(df3[df3.loc[:,'s_cat'].swifter.progress_bar(False).apply(has_inter, args=(now_set, ))].s_id)
        negs = list(negs_set)
        return negs
    
    neg_dict = df2.swifter.force_parallel(enable=True).allow_dask_on_strings(enable=True).apply(get_negs,args=(si_doc,n_neg, df3,),axis=1)
    

    This is the result: image

    Why are there different results? I want to use the second method, because I need to use two columns of data in other cases.

    opened by kongbo96 0
  • Swifter With GroupBy - Crashing Python

    Using Swifter with groupby, I am running into an error that crashes the Python instance. The error is below; please let me know if there is anything more that would help get to the bottom of this.

    2022-10-06 15:07:26,013 INFO worker.py:1518 -- Started a local Ray instance.
    [failure_signal_handler.cc : 171] RAW: sigaltstack() failed with errno=1
    /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '

    opened by SamPetherbridge 0
  • Inform the user whether multiprocessing was used

    Hi, thanks for this cool library.

    One thing that would be nice would be to tell the user what swifter decided to do, e.g.:

    • was it able to vectorize?
    • did it choose to apply multiprocessing with Dask?

    Right now it seems everything is totally transparent to the user; I cannot easily tell if swifter is even using more than one core.

    opened by tadamcz 0
  • swifter.groupby() does not support with dropna=False

    I found that a swifter groupby-apply chain raises an error when it tries to sort the index if I set dropna to False in the groupby step.

    Here is the error log:

    Traceback (most recent call last):
      File "/paedyl01/disk1/yangyxt/ngs_scripts/acmg_automated_anno.py", line 76, in wrapper
        result = func(*args, **kwargs)
      File "/paedyl01/disk1/yangyxt/ngs_scripts/acmg_automated_anno.py", line 484, in BP2_PM3_compound_with_patho
        return df.swifter.groupby([gene_col], as_index=False, dropna=False).apply(check_compound_per_gene,
      File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/swifter/swifter.py", line 661, in apply
        return self._ray_apply(func, *args, **kwds)
      File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/swifter/swifter.py", line 650, in _ray_apply
        return pd.concat(ray.get(apply_chunks), axis=self._axis).sort_index()
      File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
        return func(*args, **kwargs)
      File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/frame.py", line 6447, in sort_index
        return super().sort_index(
      File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/generic.py", line 4685, in sort_index
        indexer = get_indexer_indexer(
      File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/sorting.py", line 94, in get_indexer_indexer
        indexer = nargsort(
      File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/sorting.py", line 417, in nargsort
        indexer = non_nan_idx[non_nans.argsort(kind=kind)]
    TypeError: '<' not supported between instances of 'int' and 'tuple'
    ERROR:2022-09-28 13:40:29,310:wrapper:83:Exception raised in main_anno_process. exception: '<' not supported between instances of 'int' and 'tuple'

    The dataframe passed to swifter.groupby() has a plain numerical index, from 0 to len(df). The groupby column might have some rows with NA values, and I do wish to keep them. I guess that's why this issue happened. I'm not sure whether this can be fixed or optimized. Please take a look.

    opened by yangyxt 1
  • IndexError: tuple index out of range (when using dask_apply)

    Hi,

    swifter version: 1.1.3, dask version: 2022.05.0, pandas version: 1.4.2, python version: 3.9.12

    When I use dataframe[col].apply(func), it works.

    When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a SMALL sample (10), it uses pandas apply under the hood and it works.

    When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a BIGGER sample (1000), it uses dask apply under the hood and it does NOT work. There seems to be a problem when switching to dask apply. Here is the complete error:

    Traceback (most recent call last):
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 241, in apply
        self._validate_apply(
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/base.py", line 50, in _validate_apply
        raise ValueError(error_message)
    ValueError: Vectorized function sample doesn't match pandas apply sample.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "", line 1, in
      File "", line 47, in transform
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 255, in apply
        return self._dask_apply(func, convert_dtype, *args, **kwds)
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 173, in _dask_apply
        dd.from_pandas(sample, npartitions=self._npartitions)
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 292, in compute
        (result,) = compute(self, traverse=False, **kwargs)
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in compute
        return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in
        return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 129, in finalize
        return _concat(results)
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 110, in _concat
        return da.core.concatenate3(args)
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 5124, in concatenate3
        chunks = chunks_from_arrays(arrays)
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in chunks_from_arrays
        result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
      File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in
        result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
    IndexError: tuple index out of range

    opened by CoteDave 1
  • Swifter "progress_bar" Not Working

    I just started experimenting with Swifter a few minutes ago and have been struggling to get the progress bar to show.

    I have the code snippet below, which was adapted from the example code provided.

    Why is the progress_bar(enable=True) option not working? Is there something wrong with my code?

    var_unza_dspace_dataframe["subjectMistakes"] = var_unza_dspace_dataframe["subject"].str.split("=").swifter.allow_dask_on_strings(enable=True).progress_bar(
        enable=True, desc='Subjects Mistakes'
    ).apply(fxn_subject_spellchecker)
    
    opened by lightonphiri 11
Owner
Jason Carpenter
Accelerating AI development for leading companies