3094 Python Swahili-nlp-data Libraries

Koalas: pandas API on Apache Spark

pandas API on Apache Spark Explore Koalas docs » Live notebook · Issues · Mailing list Help Thirsty Koalas Devastated by Recent Fires The Koalas proje

3.2k Jan 4, 2023

A Python package for manipulating 2-dimensional tabular data structures

datatable This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame

1.6k Jan 5, 2023

High performance datastore for time series and tick data

Arctic TimeSeries and Tick store Arctic is a high performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-

2.9k Dec 23, 2022

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

pysparkling Pysparkling provides a faster, more responsive way to develop programs for PySpark. It enables code intended for Spark applications to exe

254 Dec 6, 2022

Universal 1d/2d data containers with Transformers functionality for data analysis.

XPandas (extended Pandas) implements 1D and 2D data containers for storing type-heterogeneous tabular data of any type, and encapsulates feature extra

25 Mar 14, 2022

Pandas Google BigQuery

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

348 Jan 3, 2023

NumPy and Pandas interface to Big Data

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

3.1k Jan 1, 2023

Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 1, 2023

Logging MXNet data for visualization in TensorBoard.

Logging MXNet Data for Visualization in TensorBoard Overview MXBoard provides a set of APIs for logging MXNet data for visualization in TensorBoard. T

327 Dec 5, 2022

Interpretability and explainability of data and machine learning models

AI Explainability 360 (v0.2.1) The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datase

1.2k Dec 29, 2022

A library for debugging/inspecting machine learning classifiers and explaining their predictions

ELI5 ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions. It provides support for the following m

2.6k Dec 30, 2022

An intuitive library to add plotting functionality to scikit-learn objects.

Welcome to Scikit-plot Single line functions for detailed visualizations The quickest and easiest way to go from analysis... ...to this. Scikit-plot i

2.3k Dec 31, 2022

A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Documentation | External Resources | Research Paper Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble. The

187 Dec 27, 2022

With Holoviews, your data visualizes itself.

HoloViews Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data a

2.3k Jan 4, 2023

How on earth can I ever think of a solution like that in an interview?!

fuck-coding-interviews This repository is created by an awkward programmer who always struggles with coding problems on LeetCode, even with some Easy

613 Jan 8, 2023

Algorithms and data structures for educational, demonstrational and experimental purposes.

Algorithms and Data Structures (ands) Introduction This project was created for personal use mostly while studying for an exam (starting in the month

50 Dec 6, 2022

:computer: Data Structures and Algorithms in Python

Algorithms in Python Implementations of a few algorithms and datastructures for fun and profit! Completed Karatsuba Multiplication Basic Sorting Rabin

2.9k Jan 1, 2023

Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

3.2k Jan 1, 2023

NLP made easy

GluonNLP: Your Choice of Deep Learning for NLP GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you l

Distributed (Deep) Machine Learning Community

2.5k Jan 4, 2023

Keras community contributions

keras-contrib : Keras community contributions Keras-contrib is deprecated. Use TensorFlow Addons. The future of Keras-contrib: We're migrating to tens

1.6k Dec 21, 2022

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

tensorlm Generate Shakespeare poems with 4 lines of code. Installation tensorlm is written in / for Python 3.4+ and TensorFlow 1.1+ pip3 install tenso

63 May 22, 2021

Machine Learning Platform for Kubernetes

Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica

3.2k Dec 23, 2022

A simplified framework and utilities for PyTorch

Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne

534 Dec 17, 2022

Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.datasets: The raw text iterators for common NLP datasets torchtext.data: Some basic NLP building bloc

3.2k Jan 8, 2023

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

6.9k Jan 5, 2023

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

14.5k Jan 7, 2023

Python-based implementations of algorithms for learning on imbalanced data.

ND DIAL: Imbalanced Algorithms Minimalist Python-based implementations of algorithms for imbalanced learning. Includes deep and representational learn

220 Dec 13, 2022

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

imbalanced-learn imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-cla

6.2k Jan 1, 2023

MLBox is a powerful Automated Machine Learning python library.

MLBox is a powerful Automated Machine Learning python library. It provides the following features: Fast reading and distributed data preprocessing/cle

1.4k Jan 6, 2023

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

8.9k Jan 9, 2023

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

15.4k Jan 7, 2023

A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

2.3k Jan 5, 2023

[HELP REQUESTED] Generalized Additive Models in Python

pyGAM Generalized Additive Models in Python. Documentation Official pyGAM Documentation: Read the Docs Building interpretable models with Generalized

747 Jan 5, 2023

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

[Due to the time taken @ uni, work + hell breaking loose in my life, since things have calmed down a bit, will continue commiting!!!] [By the way, I'm

1.4k Jan 1, 2023

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

4.2k Dec 29, 2022

A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

11.6k Jan 2, 2023

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

3k Jan 8, 2023

daily report of @arkinvest ETF activity + data collection

ark_invest daily weekday report of @arkinvest ETF activity + data collection This script was created to: Extract and save daily csv's from ARKInvest's

27 Jan 2, 2023

test

Lidar-data-decode In this project, you can decode your lidar data frame(pcap file) and make your own datasets(test dataset) in Windows without any hug

46 Dec 5, 2022

DaCy: The State of the Art Danish NLP pipeline using SpaCy

DaCy: A SpaCy NLP Pipeline for Danish DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing it has achieved State-of-the-Ar

71 Jan 6, 2023

An automated tool that fetches information about your crypto stake and generates historical data in time.

Introduction Yield explorer is a WIP! I needed a tool that would show me historical data and performance of my staked crypto but was unable to find a

42 Nov 26, 2022

Deep Illuminator is a data augmentation tool designed for image relighting.

Deep Illuminator Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently genera

52 Nov 29, 2022

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly Code for this paper Ultra-Data-Efficient GAN Tra

77 Oct 5, 2022

🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program

3.9k Jan 5, 2023

3D visualization of scientific data in Python

Mayavi: 3D visualization of scientific data in Python Mayavi docs: http://docs.enthought.com/mayavi/mayavi/ TVTK docs: http://docs.enthought.com/mayav

1.1k Jan 6, 2023

Data intensive science for everyone.

The latest information about Galaxy can be found on the Galaxy Community Hub. Community support is available at Galaxy Help. Galaxy Quickstart Galaxy

1k Jan 8, 2023

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

CKAN: The Open Source Data Portal Software CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work

3.6k Dec 27, 2022

An interactive explorer for single-cell transcriptomics data

an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra

424 Dec 15, 2022

A Python library that helps data scientists to infer causation rather than observing correlation.

1.7k Jan 4, 2023

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

Collaborate This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to s

86 Oct 18, 2022

🦉Data Version Control | Git for Data & Models

Website • Docs • Blog • Twitter • Chat (Community & Support) • Tutorial • Mailing List Data Version Control or DVC is an open-source tool for data sci

10.9k Jan 5, 2023

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

Collaborate This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to s

86 Oct 18, 2022

Backend, modern REST API for obtaining match and odds data crawled from multiple sites. Using FastAPI, MongoDB as database, Motor as async MongoDB client, Scrapy as crawler and Docker.

Introduction Apiestas is a project composed of a backend powered by the awesome framework FastAPI and a crawler powered by Scrapy. This project has fo

54 Dec 13, 2022

API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.

This open source project serves two purposes. Collection and evaluation of a Question Answering dataset to improve existing QA/search methods - COVID-

329 Nov 10, 2022

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ranging from simulation engineering up to agent development, training and deployment.

222 Dec 24, 2022

Data Visualizer for Super Mario Kart (SNES)

21 Nov 20, 2022

💻 Algo-Phantoms-Backend is an Application that provides pathways and quizzes along with a code editor to help you towards your DSA journey.📰🔥 This repository contains the REST APIs of the application.✨

Algo-Phantom-Backend 💻 Algo-Phantoms-Backend is an Application that provides pathways and quizzes along with a code editor to help you towards your D

44 Nov 15, 2022

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis

5.4k Feb 20, 2021

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing"

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf 2021). Abstract In this work we propose Pathfind

49 Dec 1, 2022

The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.

SeaLion is designed to teach today's aspiring ml-engineers the popular machine learning concepts of today in a way that gives both intuition and ways of application. We do this through concise algorithms that do the job in the least jargon possible and examples to guide you through every step of the way.

324 Dec 27, 2022

Contrastive Learning Inverts the Data Generating Process

Official code to reproduce the results and data presented in the paper Contrastive Learning Inverts the Data Generating Process.

71 Nov 25, 2022

ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.

ForecastGA is a tool that combines a couple of popular libraries, Atspy and googleanalytics, with a few enhancements.

36 Jan 3, 2023

Aggregating gridded data (xarray) to polygons

A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons. Check out the binder link above for a sample code run!

42 Nov 9, 2022

A simple worker for OpenClubhouse to sync data.

OpenClubhouse-Worker This is a simple worker for OpenClubhouse to sync CH channel data.

100 Dec 17, 2022

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

137 Feb 1, 2021

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

1.5k Feb 17, 2021

Yet another Python binding for fastText

pyfasttext Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresea

228 Feb 17, 2021

Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

1.8k Feb 18, 2021

Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

11.7k Feb 18, 2021

Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

160 Feb 9, 2021

NLP library designed for reproducible experimentation management

Welcome to the Transfer NLP library, a framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP You can

287 Feb 14, 2021

🏖 Easy training and deployment of seq2seq models.

Headliner Headliner is a sequence modeling library that eases the training and in particular, the deployment of custom sequence models for both resear

220 Feb 10, 2021

Textpipe: clean and extract metadata from text

textpipe: clean and extract metadata from text textpipe is a Python package for converting raw text in to clean, readable text and extracting metadata

272 Feb 16, 2021

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

718 Feb 18, 2021

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you

2k Feb 9, 2021

Scikit-learn style model finetuning for NLP

Scikit-learn style model finetuning for NLP Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide vari

631 Feb 2, 2021

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

(Framework for Adapting Representation Models) What is it? FARM makes Transfer Learning with BERT & Co simple, fast and enterprise-ready. It's built u

1.1k Feb 14, 2021

DELTA is a deep learning based natural language and speech processing platform.

DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p

1.4k Feb 17, 2021

Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

2.1k Feb 13, 2021

NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

2.4k Feb 18, 2021

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

2.1k Feb 17, 2021

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

2.6k Feb 18, 2021

Super easy library for BERT based NLP models

Fast-Bert New - Learning Rate Finder for Text Classification Training (borrowed with thanks from https://github.com/davidtvs/pytorch-lr-finder) Suppor

1.5k Feb 18, 2021

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

1.5k Feb 17, 2021

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

1.9k Feb 18, 2021

Snips Python library to extract meaning from text

Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur

3.5k Feb 17, 2021

A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

831 Feb 17, 2021

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

903 Feb 17, 2021

🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

1.2k Feb 17, 2021

✨Fast Coreference Resolution in spaCy with Neural Networks

✨ NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks. NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolv

2.2k Feb 18, 2021

Extract Keywords from sentence or Replace keywords in sentences.

FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install

4.7k Feb 17, 2021

The Classical Language Toolkit

Notice: This Git branch (dev) contains the CLTK's upcoming major release (v. 1.0.0). See https://github.com/cltk/cltk/tree/master and https://docs.clt

629 Feb 17, 2021

Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

1.4k Feb 17, 2021

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

fastNLP fastNLP是一款轻量级的自然语言处理（NLP）工具包，目标是快速实现NLP任务以及构建复杂模型。 fastNLP具有如下的特性：统一的Tabular式数据容器，简化数据预处理过程；内置多种数据集的Loader和Pipe，省去预处理代码; 各种方便的NLP工具，例如Embedd

2k Feb 14, 2021

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim

2.5k Feb 17, 2021

Official Stanford NLP Python Library for Many Human Languages

Stanza: A Python NLP Library for Many Human Languages The Stanford NLP Group's official Python NLP library. It contains support for running various ac

5.2k Feb 17, 2021

:mag: End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

1.4k Feb 18, 2021

Python Swahili-nlp-data Resources

Python swahili-nlp-data Libraries

Koalas: pandas API on Apache Spark

A Python package for manipulating 2-dimensional tabular data structures

High performance datastore for time series and tick data

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Universal 1d/2d data containers with Transformers functionality for data analysis.

Pandas Google BigQuery

NumPy and Pandas interface to Big Data

Create HTML profiling reports from pandas DataFrame objects

Logging MXNet data for visualization in TensorBoard.

Interpretability and explainability of data and machine learning models

A library for debugging/inspecting machine learning classifiers and explaining their predictions

An intuitive library to add plotting functionality to scikit-learn objects.

A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

With Holoviews, your data visualizes itself.

How on earth can I ever think of a solution like that in an interview?!

Algorithms and data structures for educational, demonstrational and experimental purposes.

:computer: Data Structures and Algorithms in Python

Python library that makes it easy for data scientists to create charts.

NLP made easy

Keras community contributions

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

Machine Learning Platform for Kubernetes

A simplified framework and utilities for PyTorch

Data loaders and abstractions for text and NLP

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Python-based implementations of algorithms for learning on imbalanced data.

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

MLBox is a powerful Automated Machine Learning python library.

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

A machine learning toolkit dedicated to time-series data

[HELP REQUESTED] Generalized Additive Models in Python

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

A library of extension and helper modules for Python's data analysis and machine learning libraries.

A toolkit for making real world machine learning and data analysis applications in C++

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

daily report of @arkinvest ETF activity + data collection

test

DaCy: The State of the Art Danish NLP pipeline using SpaCy

An automated tool that fetches information about your crypto stake and generates historical data in time.

Deep Illuminator is a data augmentation tool designed for image relighting.

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly

🍊 :bar_chart: :bulb: Orange: Interactive data analysis

3D visualization of scientific data in Python

Data intensive science for everyone.

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

An interactive explorer for single-cell transcriptomics data

A Python library that helps data scientists to infer causation rather than observing correlation.

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

🦉Data Version Control | Git for Data & Models

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

Backend, modern REST API for obtaining match and odds data crawled from multiple sites. Using FastAPI, MongoDB as database, Motor as async MongoDB client, Scrapy as crawler and Docker.

API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

Data Visualizer for Super Mario Kart (SNES)

💻 Algo-Phantoms-Backend is an Application that provides pathways and quizzes along with a code editor to help you towards your DSA journey.📰🔥 This repository contains the REST APIs of the application.✨

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing"

The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.

Contrastive Learning Inverts the Data Generating Process

ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.

Aggregating gridded data (xarray) to polygons

A simple worker for OpenClubhouse to sync data.

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

Yet another Python binding for fastText

Multilingual text (NLP) processing toolkit

Topic Modelling for Humans

Text vectorization tool to outperform TFIDF for classification tasks

NLP library designed for reproducible experimentation management

🏖 Easy training and deployment of seq2seq models.

Textpipe: clean and extract metadata from text

Unsupervised text tokenizer focused on computational efficiency

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Scikit-learn style model finetuning for NLP

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

DELTA is a deep learning based natural language and speech processing platform.