25 Repositories
Python cleaning Libraries
Data Inspector is an open-source python library that brings 15++ types of different functions to make EDA, data cleaning easier.
Data Inspector Data Inspector is an open-source python library that brings 15 types of different functions to make EDA, data cleaning easier. Author:
LotteryBuyPredictionWebApp - Lottery Purchase Prediction Model
Lottery Purchase Prediction Model Objective and Goal Predict the lottery type th
Eureka is a Rest-API framework scraper based on FastAPI for cleaning and organizing data, designed for the Eureka by Turing project of the National University of Colombia
Eureka is a Rest-API framework scraper based on FastAPI for cleaning and organizing data, designed for the Eureka by Turing project of the National University of Colombia
Jupyter notebook and datasets from the pandas Q&A video series
Python pandas Q&A video series Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas. Jupyter Note
General Assembly's 2015 Data Science course in Washington, DC
DAT8 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15). Instructor: Kevin Markham (
Data cleaning, missing value handle, EDA use in this project
Lending Club Case Study Project Brief Solving this assignment will give you an idea about how real business problems are solved using EDA. In this cas
SAT Project - The first project I had done at General Assembly, performed EDA, data cleaning and created data visualizations
Project 1: Standardized Test Analysis by Adam Klesc Overview This project covers: Basic statistics and probability Many Python programming concepts Pr
Cleaning-utils - a collection of small Python functions and classes which make cleaning pipelines shorter and easier
cleaning-utils [] [] [] cleaning-utils is a collection of small Python functions
Cleaning and analysing aggregated UK political polling data.
Analysing aggregated UK polling data The tweet collection & storage pipeline used in email-service is used to also collect tweets from @britainelects.
Built for cleaning purposes in military institutions
Ferramenta do AL Construído para fins de limpeza em instituições militares. Instalação Requer python = 3.2 pip install -r requirements.txt Usagem Exe
Converts a Bangla numeric string to literal words.
Bangla Number in Words Converts a Bangla numeric string to literal words. Install $ pip install banglanum2words Usage
CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological images.
cleanX CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological
cleanlab is the data-centric ML ops package for machine learning with noisy labels.
cleanlab is the data-centric ML ops package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and lear
dirty_cat is a Python module for machine-learning on dirty categorical variables.
dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.
An open-source NLP library: fast text cleaning and preprocessing.
An open-source NLP library: fast text cleaning and preprocessing
Data cleaning tools for Business analysis
Datacleaning datacleaning tools for Business analysis This program is made for Vicky's work. You can use it, too. 数据清洗 该数据清洗工具是为了商业分析 这个程序是为了Vicky的工作而
A method for cleaning and classifying text using transformers.
NLP Translation and Classification The repository contains a method for classifying and cleaning text using NLP transformers. Overview The input data
DataPrep — The easiest way to prepare data in Python
DataPrep — The easiest way to prepare data in Python
This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.
normalizer This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch
A framework for cleaning Chinese dialog data
A framework for cleaning Chinese dialog data
PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning
PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning Warning: This is a rapidly evolving research prototype.
Clean APIs for data cleaning. Python implementation of R package Janitor
pyjanitor pyjanitor is a Python implementation of the R package janitor, and provides a clean API for cleaning data. Why janitor? Originally a port of
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
trafilatura: Web scraping tool for text discovery and retrieval Description Trafilatura is a Python package and command-line tool which seamlessly dow
:truck: Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
To launch a live notebook server to test optimus using binder or Colab, click on one of the following badges: Optimus is the missing framework to prof