295 Repositories
Python sklearn-pandas Libraries
A Python module for creating Excel XLSX files.
XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula
Statistical package in Python based on Pandas
Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. F
scikit-learn inspired API for CRFsuite
sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i
A library of sklearn compatible categorical variable encoders
Categorical Encoding Methods A set of scikit-learn-style transformers for encoding categorical variables into numeric by means of different techniques
cuDF - GPU DataFrame Library
cuDF - GPU DataFrames NOTE: For the latest stable README.md ensure you are on the main branch. Resources cuDF Reference Documentation: Python API refe
Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.
Auto_TS: Auto_TimeSeries Automatically build multiple Time Series models using a Single Line of Code. Now updated with Dask. Auto_timeseries is a comp
Common financial technical indicators implemented in Pandas.
FinTA (Financial Technical Analysis) Common financial technical indicators implemented in Pandas. This is work in progress, bugs are expected and resu
Supply a wrapper ``StockDataFrame`` based on the ``pandas.DataFrame`` with inline stock statistics/indicators support.
Stock Statistics/Indicators Calculation Helper VERSION: 0.3.2 Introduction Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline s
Performance analysis of predictive (alpha) stock factors
Alphalens Alphalens is a Python Library for performance analysis of predictive (alpha) stock factors. Alphalens works great with the Zipline open sour
Technical Analysis Library using Pandas and Numpy
Technical Analysis Library in Python It is a Technical Analysis library useful to do feature engineering from financial time series datasets (Open, Cl
A python wrapper for Alpha Vantage API for financial data.
alpha_vantage Python module to get stock data/cryptocurrencies from the Alpha Vantage API Alpha Vantage delivers a free API for real time financial da
Yahoo! Finance market data downloader (+faster Pandas Datareader)
Yahoo! Finance market data downloader Ever since Yahoo! finance decommissioned their historical data API, many programs that relied on it to stop work
Repository to hold code for the cap-bot varient that is being presented at the SIIC Defence Hackathon 2021.
capbot-siic Repository to hold code for the cap-bot varient that is being presented at the SIIC Defence Hackathon 2021. Problem Inspiration A plethora
Performance analysis of predictive (alpha) stock factors
Alphalens Alphalens is a Python Library for performance analysis of predictive (alpha) stock factors. Alphalens works great with the Zipline open sour
Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
weightedcalcs weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more. Features Plays we
Supply a wrapper ``StockDataFrame`` based on the ``pandas.DataFrame`` with inline stock statistics/indicators support.
Stock Statistics/Indicators Calculation Helper VERSION: 0.3.2 Introduction Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline s
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
An extension to pandas dataframes describe function.
pandas_summary An extension to pandas dataframes describe function. The module contains DataFrameSummary object that extend describe() with: propertie
SigOpt wrappers for scikit-learn methods
SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In
Use evolutionary algorithms instead of gridsearch in scikit-learn
sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter
Hyper-parameter optimization for sklearn
hyperopt-sklearn Hyperopt-sklearn is Hyperopt-based model selection among machine learning algorithms in scikit-learn. See how to use hyperopt-sklearn
Genetic feature selection module for scikit-learn
sklearn-genetic Genetic feature selection module for scikit-learn Genetic algorithms mimic the process of natural selection to search for optimal valu
scikit-learn inspired API for CRFsuite
sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i
Supervised domain-agnostic prediction framework for probabilistic modelling
A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data
A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
Master status: Development status: Package information: MDR A scikit-learn-compatible Python implementation of Multifactor Dimensionality Reduction (M
a feature engineering wrapper for sklearn
Few Few is a Feature Engineering Wrapper for scikit-learn. Few looks for a set of feature transformations that work best with a specified machine lear
Directions overlay for working with pandas in an analysis environment
dovpanda Directions OVer PANDAs Directions are hints and tips for using pandas in an analysis environment. dovpanda is an overlay companion for workin
A Python toolkit for processing tabular data
meza: A Python toolkit for processing tabular data Index Introduction | Requirements | Motivation | Hello World | Usage | Interoperability | Installat
Clean APIs for data cleaning. Python implementation of R package Janitor
pyjanitor pyjanitor is a Python implementation of the R package janitor, and provides a clean API for cleaning data. Why janitor? Originally a port of
Pandas integration with sklearn
Sklearn-pandas This module provides a bridge between Scikit-Learn's machine learning methods and pandas-style Data Frames. In particular, it provides
functional data manipulation for pandas
pandas-ply: functional data manipulation for pandas pandas-ply is a thin layer which makes it easier to manipulate data with pandas. In particular, it
Easy pipelines for pandas DataFrames.
pdpipe ˨ Easy pipelines for pandas DataFrames (learn how!). Website: https://pdpipe.github.io/pdpipe/ Documentation: https://pdpipe.github.io/pdpipe/d
The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common functions that add additional logs
pandas-log The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common funct
The easy way to write your own flavor of Pandas
Pandas Flavor The easy way to write your own flavor of Pandas Pandas 0.23 added a (simple) API for registering accessors with Pandas objects. Pandas-f
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
swifter A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner. Blog posts Release 1.0.0 Fir
Modin: Speed up your Pandas workflows by changing a single line of code
Scale your pandas workflows by changing one line of code To use Modin, replace the pandas import: # import pandas as pd import modin.pandas as pd Inst
Koalas: pandas API on Apache Spark
pandas API on Apache Spark Explore Koalas docs » Live notebook · Issues · Mailing list Help Thirsty Koalas Devastated by Recent Fires The Koalas proje
High performance datastore for time series and tick data
Arctic TimeSeries and Tick store Arctic is a high performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-
Universal 1d/2d data containers with Transformers functionality for data analysis.
XPandas (extended Pandas) implements 1D and 2D data containers for storing type-heterogeneous tabular data of any type, and encapsulates feature extra
Pandas Google BigQuery
pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda
sqldf for pandas
pandasql pandasql allows you to query pandas DataFrames using SQL syntax. It works similarly to sqldf in R. pandasql seeks to provide a more familiar
NumPy and Pandas interface to Big Data
Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte
cuDF - GPU DataFrame Library
cuDF - GPU DataFrames NOTE: For the latest stable README.md ensure you are on the main branch. Built based on the Apache Arrow columnar memory format,
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext
Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models
Highly interpretable, sklearn-compatible classifier based on decision rules This is a scikit-learn compatible wrapper for the Bayesian Rule List class
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
[Due to the time taken @ uni, work + hell breaking loose in my life, since things have calmed down a bit, will continue commiting!!!] [By the way, I'm
daily report of @arkinvest ETF activity + data collection
ark_invest daily weekday report of @arkinvest ETF activity + data collection This script was created to: Extract and save daily csv's from ARKInvest's
🍊 :bar_chart: :bulb: Orange: Interactive data analysis
Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program
xarray: N-D labeled arrays and datasets
xarray is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!
Interactive plotting for Pandas using Vega-Lite
pdvega: Vega-Lite plotting for Pandas Dataframes pdvega is a library that allows you to quickly create interactive Vega-Lite plots from Pandas datafra
Draw interactive NetworkX graphs with Altair
nx_altair Draw NetworkX graphs with Altair nx_altair offers a similar draw API to NetworkX but returns Altair Charts instead. If you'd like to contrib
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
AutoViz Automatically Visualize any dataset, any size with a single line of code. AutoViz performs automatic visualization of any dataset with one lin
Visualize and compare datasets, target values and associations, with one line of code.
In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat
Joyplots in Python with matplotlib & pandas :chart_with_upwards_trend:
JoyPy JoyPy is a one-function Python package based on matplotlib + pandas with a single purpose: drawing joyplots (a.k.a. ridgeline plots). The code f
Bokeh Plotting Backend for Pandas and GeoPandas
Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of
A GUI for Pandas DataFrames
PandasGUI A GUI for analyzing Pandas DataFrames. Demo Installation Install latest release from PyPi: pip install pandasgui Install directly from Githu
A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?
Productivity Tools for Plotly + Pandas
Cufflinks This library binds the power of plotly with the flexibility of pandas for easy plotting. This library is available on https://github.com/san
Missing data visualization module for Python.
missingno Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities tha
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
Statistical data visualization using matplotlib
seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing
Interactive plotting for Pandas using Vega-Lite
pdvega: Vega-Lite plotting for Pandas Dataframes pdvega is a library that allows you to quickly create interactive Vega-Lite plots from Pandas datafra
Draw interactive NetworkX graphs with Altair
nx_altair Draw NetworkX graphs with Altair nx_altair offers a similar draw API to NetworkX but returns Altair Charts instead. If you'd like to contrib
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
AutoViz Automatically Visualize any dataset, any size with a single line of code. AutoViz performs automatic visualization of any dataset with one lin
Visualize and compare datasets, target values and associations, with one line of code.
In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat
Joyplots in Python with matplotlib & pandas :chart_with_upwards_trend:
JoyPy JoyPy is a one-function Python package based on matplotlib + pandas with a single purpose: drawing joyplots (a.k.a. ridgeline plots). The code f
Bokeh Plotting Backend for Pandas and GeoPandas
Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of
A GUI for Pandas DataFrames
PandasGUI A GUI for analyzing Pandas DataFrames. Demo Installation Install latest release from PyPi: pip install pandasgui Install directly from Githu
A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?
Productivity Tools for Plotly + Pandas
Cufflinks This library binds the power of plotly with the flexibility of pandas for easy plotting. This library is available on https://github.com/san
Missing data visualization module for Python.
missingno Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities tha
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
Statistical data visualization using matplotlib
seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing
A plugin for Flake8 that checks pandas code
pandas-vet pandas-vet is a plugin for flake8 that provides opinionated linting for pandas code. It began as a project during the PyCascades 2019 sprin
Mypy stubs, i.e., type information, for numpy, pandas and matplotlib
Mypy type stubs for NumPy, pandas, and Matplotlib This is a PEP-561-compliant stub-only package which provides type information for matplotlib, numpy
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana
Pandas Google BigQuery
pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda
A pandas-like deferred expression system, with first-class SQL support
Ibis: Python data analysis framework for Hadoop and SQL engines Service Status Documentation Conda packages PyPI Azure Coverage Ibis is a toolbox to b
TuShare is a utility for crawling historical data of China stocks
TuShare Tushare Pro版已发布,请访问新的官网了解和查询数据接口! https://tushare.pro TuShare是实现对股票/期货等金融数据从数据采集、清洗加工 到 数据存储过程的工具,满足金融量化分析师和学习数据分析的人在数据获取方面的需求,它的特点是数据覆盖范围广,接口
Soda SQL Data testing, monitoring and profiling for SQL accessible data.
Soda SQL Data testing, monitoring and profiling for SQL accessible data. What does Soda SQL do? Soda SQL allows you to Stop your pipeline when bad dat
A compendium of useful, interesting, inspirational usage of pandas functions, each example will be an ipynb file
Pandas_by_examples A compendium of useful/interesting/inspirational usage of pandas functions, each example will be an ipynb file What is this reposit
An end-to-end machine learning web app to predict rugby scores (Pandas, SQLite, Keras, Flask, Docker)
Rugby score prediction An end-to-end machine learning web app to predict rugby scores Overview An demo project to provide a high-level overview of the
Python SDK for IEX Cloud
iexfinance Python SDK for IEX Cloud. Architecture mirrors that of the IEX Cloud API (and its documentation). An easy-to-use toolkit to obtain data for
A Python module for creating Excel XLSX files.
XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext
Sparkling Pandas
SparklingPandas SparklingPandas aims to make it easy to use the distributed computing power of PySpark to scale your data analysis with Pandas. Sparkl
The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction
windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr
NumPy and Pandas interface to Big Data
Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte
A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.
Neuraxle Pipelines Code Machine Learning Pipelines - The Right Way. Neuraxle is a Machine Learning (ML) library for building machine learning pipeline
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext
a delightful machine learning tool that allows you to train, test and use models without writing code
igel A delightful machine learning tool that allows you to train/fit, test and use models without writing code Note I'm also working on a GUI desktop
Statistical data visualization using matplotlib
seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing
NumPy and Pandas interface to Big Data
Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana