2909 Python Website-to-Json-Data Libraries

Explorative Data Analysis Guidelines

Explorative Data Analysis Get data into a usable format! Find out if the following predictive modeling phase will be successful! Combine everything in

18 Dec 26, 2022

cleanlab is the data-centric ML ops package for machine learning with noisy labels.

cleanlab is the data-centric ML ops package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and lear

51 Nov 28, 2022

Data imputations library to preprocess datasets with missing data

Impyute is a library of missing data imputation algorithms. This library was designed to be super lightweight, here's a sneak peek at what impyute can do.

329 Dec 5, 2022
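Here is a minimal column-mean imputation sketch in plain NumPy that shows what a missing-data imputation routine does. It is not Impyute's API. The function name `mean_impute` is hypothetical, and mean imputation is only one of the simplest strategies.

```python
import numpy as np

def mean_impute(data: np.ndarray) -> np.ndarray:
    """Return a copy of `data` with each NaN replaced by the mean of its column.

    Hypothetical helper for illustration; a column that is entirely NaN stays NaN
    (NumPy emits a RuntimeWarning in that case).
    """
    filled = np.array(data, dtype=float, copy=True)
    col_means = np.nanmean(filled, axis=0)      # means over non-missing values only
    rows, cols = np.where(np.isnan(filled))     # coordinates of every missing cell
    filled[rows, cols] = col_means[cols]        # fill each cell with its column mean
    return filled

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
print(mean_impute(X))
```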

Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis.

Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis. It is distributed under the MIT License.

720 Dec 25, 2022

Skoot is a lightweight python library of machine learning transformer classes that interact with scikit-learn and pandas.

Skoot is a lightweight python library of machine learning transformer classes that interact with scikit-learn and pandas. Its objective is to ex

54 Aug 20, 2022

dirty_cat is a Python module for machine-learning on dirty categorical variables.

dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.

637 Dec 29, 2022
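One approach to dirty categories is similarity encoding. Each value is represented by its string similarity to a set of reference categories, so typo variants end up close together. The sketch below assumes Jaccard similarity of character 3-grams. It is a from-scratch illustration, not dirty_cat's own encoder or its exact similarity formula, and every function name in it is hypothetical.

```python
def ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two strings' n-gram sets (assumed measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0

def similarity_encode(values: list[str], prototypes: list[str], n: int = 3) -> list[list[float]]:
    """Encode each value as a vector of similarities to the prototype categories."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]

# "Polce Officer" (a typo) stays close to "Police Officer" instead of becoming a new category.
print(similarity_encode(["Police Officer", "Polce Officer", "Accountant"],
                        ["Police Officer", "Accountant"]))
```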

Pypeln is a simple yet powerful Python library for creating concurrent data pipelines.

Pypeln Pypeln (pronounced as "pypeline") is a simple yet powerful Python library for creating concurrent data pipelines. Main Features Simple: Pypeln

1.4k Dec 31, 2022
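A concurrent pipeline chains stages, and each stage processes items with its own pool of workers. The sketch below assumes thread-based stages built on the standard library's `concurrent.futures`. It is not Pypeln's API, and `map_stage` and `filter_stage` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def map_stage(fn: Callable[[T], U], items: Iterable[T], workers: int = 4) -> Iterator[U]:
    """Apply `fn` to each item on a thread pool, yielding results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every item up front, so unlike a queue-based pipeline
        # library this sketch does not bound the number of in-flight items.
        yield from pool.map(fn, items)

def filter_stage(pred: Callable[[T], bool], items: Iterable[T]) -> Iterator[T]:
    """Lazily keep only the items for which `pred` is true."""
    return (x for x in items if pred(x))

# Three chained stages: add one, keep evens, multiply by ten.
stage1 = map_stage(lambda x: x + 1, range(10))
stage2 = filter_stage(lambda x: x % 2 == 0, stage1)
stage3 = map_stage(lambda x: x * 10, stage2, workers=2)
print(list(stage3))
```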

A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.

Feature Engineering & Feature Selection A comprehensive guide [pdf] [markdown] for Feature Engineering and Feature Selection, with implementations and

968 Dec 29, 2022

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly.

Please consider citing the manuscript if you use apricot in your academic work! You can find more thorough documentation here. apricot implements subm

457 Dec 20, 2022
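Facility location is one common submodular objective for picking a representative subset. It is f(S) = sum over i of max over j in S of sim[i, j], and it is usually maximized greedily. The sketch below assumes a precomputed, non-negative similarity matrix. It does not use apricot's API, and `greedy_facility_location` is a hypothetical name.

```python
import numpy as np

def greedy_facility_location(sim: np.ndarray, k: int) -> list[int]:
    """Greedily select up to k indices maximizing sum_i max_{j in S} sim[i, j].

    `sim` is an (n, n) non-negative similarity matrix (assumption).
    """
    n = sim.shape[0]
    selected: list[int] = []
    coverage = np.zeros(n)  # best similarity of each point to the current subset
    for _ in range(min(k, n)):
        # Marginal gain of candidate j: sum_i max(0, sim[i, j] - coverage[i])
        gains = np.maximum(sim - coverage[:, None], 0.0).sum(axis=0)
        if selected:
            gains[selected] = -np.inf  # never pick the same element twice
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, sim[:, j])
    return selected

# Three well-separated 1-D clusters; greedy selection should take one point from each.
points = np.array([0.0, 0.1, 5.0, 5.1, 10.0])
sim = np.exp(-(points[:, None] - points[None, :]) ** 2)  # RBF similarity
print(greedy_facility_location(sim, 3))
```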

MCML is a toolkit for semi-supervised dimensionality reduction and quantitative analysis of Multi-Class, Multi-Label data

MCML is a toolkit for semi-supervised dimensionality reduction and quantitative analysis of Multi-Class, Multi-Label data. We demonstrate its use

26 Nov 29, 2022

Dump Data from FTDI Serial Port to Binary File on MacOS

1 Nov 24, 2021

Crypto Stats and Tweets Data Pipeline using Airflow

Crypto Stats and Tweets Data Pipeline using Airflow Introduction Project Overview This project came about through Udacity's nanodegree program.

1 Nov 24, 2021

A package to predict protein inter-residue geometries from sequence data

trRosetta This package is a part of trRosetta protein structure prediction protocol developed in: Improved protein structure prediction using predicte

185 Jan 7, 2023

An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.

An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models. Hyperactive: is very easy to lear

422 Jan 4, 2023

68 keypoint annotations for COFW test data

68 keypoint annotations for COFW test data This repository contains manually annotated 68 keypoints for COFW test data (original annotation of COFW da

31 Dec 6, 2022

Centralized whale instance using github actions, sourcing metadata from bigquery-public-data.

Whale Demo Instance: Bigquery Public Data This is a fully-functioning demo instance of the whale data catalog, actively scraping data from Bigquery's

17 Dec 14, 2022

Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)

FFT-accelerated Interpolation-based t-SNE (FIt-SNE) Introduction t-Stochastic Neighborhood Embedding (t-SNE) is a highly successful method for dimensi

547 Dec 21, 2022

An interactive UMAP visualization of the MNIST data set.

Code for an interactive UMAP visualization of the MNIST data set. Demo at https://grantcuster.github.io/umap-explorer/. You can read more about the de

70 Dec 27, 2022

A high-performance topological machine learning toolbox in Python

giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G

632 Dec 29, 2022

Single-Cell Analysis in Python. Scales to 1M cells.

Scanpy – Single-Cell Analysis in Python Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It inc

1.4k Jan 5, 2023

3D rendered visualization of the Austrian monuments registry

Visualization of the Austrian Monuments Visualization of the monument landscape of the Austrian monuments registry (Bundesdenkmalamt Denkmalverzeichni

3 Oct 24, 2019

Falcon: Interactive Visual Analysis for Big Data

Falcon: Interactive Visual Analysis for Big Data Crossfilter millions of records without latencies. This project is work in progress and not documente

803 Dec 27, 2022

A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.

Visdom A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Python. Overview Concepts Setup Usage API To

9.4k Jan 7, 2023

Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns.

Make Complex Heatmaps Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. H

973 Jan 9, 2023

A set of useful perceptually uniform colormaps for plotting scientific data

Colorcet: Collection of perceptually uniform colormaps What is it? Colorcet is a collecti

590 Dec 31, 2022

Streamlit — The fastest way to build data apps in Python

Welcome to Streamlit 👋 The fastest way to build and share data apps. Streamlit lets you turn data scripts into sharable web apps in minutes, not week

22k Jan 6, 2023

The purpose of this project is to share knowledge on how awesome Streamlit is and can be

Awesome Streamlit The fastest way to build Awesome Tools and Apps! Powered by Python! The purpose of this project is to share knowledge on how Awesome

1.5k Jan 7, 2023

Select, weight and analyze complex sample data

Sample Analytics In large-scale surveys, often complex random mechanisms are used to select samples. Estimates derived from such samples must reflect

37 Dec 15, 2022
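Estimates from complex samples usually weight each unit by the inverse of its inclusion probability. Below is a minimal sketch of that idea, the Hájek weighted mean, assuming the inclusion probabilities are already known. It is not this project's API, and the function name is hypothetical.

```python
import numpy as np

def design_weighted_mean(values, inclusion_probs) -> float:
    """Hájek estimator of a population mean: weight each unit by 1 / inclusion probability."""
    y = np.asarray(values, dtype=float)
    w = 1.0 / np.asarray(inclusion_probs, dtype=float)  # design weights
    return float(np.sum(w * y) / np.sum(w))

# Urban units sampled with probability 0.5, the rural unit with probability 0.1.
values = [10.0, 12.0, 20.0]
probs = [0.5, 0.5, 0.1]
print(design_weighted_mean(values, probs))  # differs from the unweighted mean of 14.0
```

The rural unit stands in for ten population units rather than two, so it pulls the estimate well above the naive sample mean.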

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

2.9k Jan 6, 2023

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.

An open-source, low-code machine learning library in Python 🚀 Version 2.3.5 out now!

6.7k Jan 8, 2023

Visualization ideas for data science

Nuance I use Nuance to curate varied visualization thoughts during my data scientist career. It is not yet a package but a list of small ideas. Welcom

16 Nov 3, 2022

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

416 Jan 6, 2023

Approximate Nearest Neighbor Search for Sparse Data in Python!

Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents).

906 Jan 1, 2023

Project investigating the use of autoencoders for semi-supervised learning and comparing their data requirements with those of supervised learning.

2 Dec 17, 2021

A program that analyzes data from inertial measurement units installed in aircraft and generates g-exceedance curves

1 Nov 23, 2021

What if home automation was homoiconic? Just transformations of data? No more YAML!

radiale what if home-automation was also homoiconic? The upper or proximal row contains three bones, to which Gegenbaur has applied the terms radiale,

21 Mar 26, 2022

Steganography Image/Data Injector.

Byte Steganography Image/Data Injector. For artists or people to inject their own print/data into their images. TODO Add more file formats to support.

4 Nov 16, 2022

Python module for data science and machine learning users.

dsnk-distributions package dsnk distribution is a Python module for data science and machine learning that was created with the goal of reducing calcu

1 Nov 23, 2021

Use a Flask API to wrap Facebook data and scrape Facebook public pages without an API key.

Facebook Scraper Use a Flask API to wrap Facebook data and scrape Facebook public pages without an API key. (Currently working 2021) Setup Befo

2 Dec 27, 2021

Python beta calculator that retrieves stock and market data and provides linear regressions.

Stock and Index Beta Calculator Python script that calculates the beta (β) of a stock against the chosen index. The script retrieves the data and resa

4 Jul 29, 2022
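Beta is the covariance of a stock's returns with the market's returns divided by the variance of the market's returns. This equals the slope of an ordinary least-squares regression of stock returns on market returns. Below is a NumPy sketch, assuming the returns were already computed from aligned price series. Data retrieval is left out, and the function names are hypothetical.

```python
import numpy as np

def simple_returns(prices) -> np.ndarray:
    """Period-over-period simple returns from a price series."""
    p = np.asarray(prices, dtype=float)
    return p[1:] / p[:-1] - 1.0

def beta(stock_returns, market_returns) -> float:
    """Cov(stock, market) / Var(market), i.e. the OLS slope of stock on market."""
    s = np.asarray(stock_returns, dtype=float)
    m = np.asarray(market_returns, dtype=float)
    cov = np.cov(s, m)  # 2x2 sample covariance matrix (ddof=1 for both terms)
    return float(cov[0, 1] / cov[1, 1])

market = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
stock = 1.5 * market + 0.002  # a stock that moves 1.5x the market
print(beta(stock, market))
```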

Nick Craig-Wood's Website

Nick Craig-Wood's public website This directory tree is used to build all the different docs for Nick Craig-Wood's website. The content here is (c) Ni

2 Sep 2, 2022

Lightweight library for accessing data and configuration

accsr This lightweight library contains utilities for managing, loading, uploading, opening and generally wrangling data and configurations. It was ba

7 Mar 9, 2022

BErt-like Neurophysiological Data Representation

BENDR BErt-like Neurophysiological Data Representation This repository contains the source code for reproducing, or extending the BERT-like self-super

114 Dec 23, 2022

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021) PyTorch implementation of SnapMix | paper Method Overview Cite

126 Dec 30, 2022

ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

ClearML - Auto-Magical Suite of tools to streamline your ML workflow Experiment Manager, MLOps and Data-Management ClearML Formerly known as Allegro T

4k Jan 9, 2023

Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

Rainbow 🌈 An implementation of Rainbow DQN which outperforms the paper's (Hessel et al. 2017) results on 40% of tested games while using 20x less dat

31 Dec 21, 2022

Deep Learning with PyTorch made easy 🚀 !

Deep Learning with PyTorch made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. It also provides a c

381 Dec 22, 2022

Python package for missing-data imputation with deep learning

MIDASpy Overview MIDASpy is a Python package for multiply imputing missing data using deep learning methods. The MIDASpy algorithm offers significant

77 Dec 3, 2022

Create a database, insert data and easily select it with Sqlite

sqliteBasics create a database, insert data and easily select it with Sqlite Watch on YouTube a step by step tutorial explaining this code: https://yo

27 Dec 27, 2022
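The create/insert/select cycle can be sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical. The example uses an in-memory database so it leaves no file behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the database
with conn:  # the connection context manager commits on success, rolls back on error
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
    )
    # Parameter placeholders (?) avoid building SQL strings by hand
    conn.executemany(
        "INSERT INTO users (name, age) VALUES (?, ?)", [("Ada", 36), ("Linus", 28)]
    )

rows = conn.execute(
    "SELECT name, age FROM users WHERE age > ? ORDER BY name", (30,)
).fetchall()
print(rows)  # [('Ada', 36)]
conn.close()
```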

A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

0 Dec 9, 2021

An open source utility for creating publication-quality LaTeX figures generated from OpenFOAM data files.

foamTEX An open source utility for creating publication-quality LaTeX figures generated from OpenFOAM data files.

1 Dec 19, 2021

Data Applications Project

DBMS project- Hotel Franchise Data and application project By TEAM Kurukunda Bhargavi Pamulapati Pallavi Greeshma Amaraneni What is this project about

1 Nov 28, 2021

Sheet Data Image/PDF-to-CSV Converter

5 Nov 22, 2021

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

906 Dec 30, 2022

A single model for shaping, creating, accessing, storing data within a Database

'db' within pydantic - A single model for shaping, creating, accessing, storing data within a Database Key Features Integrated Redis Caching Support A

178 Dec 16, 2022

Image classification for projects and research

This is a tool to help you quickly solve classification problems, including data analysis, training, reporting results and model explanation.

2 Dec 27, 2021

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble

datasketch: Big Data Looks Small datasketch gives you probabilistic data structures that can process and search very large amounts of data super fast,

1.9k Jan 7, 2023
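MinHash estimates Jaccard similarity. For each hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity. This from-scratch sketch is not datasketch's API. Using salted BLAKE2b as the hash family is an assumption made for simplicity, and the function names are hypothetical.

```python
import hashlib

def _hash(token: str, seed: int) -> int:
    """One member of a hash family: BLAKE2b salted with the seed, truncated to 64 bits."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(tokens: set[str], num_perm: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    if not tokens:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]

def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where the signatures agree estimates |A & B| / |A | B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

a = {f"w{i}" for i in range(100)}
b = {f"w{i}" for i in range(50, 150)}  # true Jaccard similarity is 50 / 150 = 1/3
print(estimate_jaccard(minhash_signature(a, 256), minhash_signature(b, 256)))
```

More hash functions (`num_perm`) reduce the estimator's variance at the cost of larger signatures.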

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

AugMix Introduction We propose AugMix, a data processing technique that mixes augmented images and enforces consistent embeddings of the augmented ima

876 Dec 17, 2022
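The core AugMix mixing step combines several chains of random augmentations using convex weights drawn from a Dirichlet distribution. It then blends that mix with the original image using a Beta-distributed weight. The NumPy sketch below assumes a hypothetical list of simple ops and a fixed chain depth. It omits the Jensen-Shannon consistency loss used during training.

```python
import numpy as np

def augmix(image, ops, rng, width=3, depth=2, alpha=1.0):
    """Mix `width` random augmentation chains, then blend the mix with the original."""
    image = np.asarray(image, dtype=float)
    chain_weights = rng.dirichlet([alpha] * width)  # convex weights over chains
    skip_weight = rng.beta(alpha, alpha)            # weight of the original image
    mixed = np.zeros_like(image)
    for w in chain_weights:
        augmented = image.copy()
        for _ in range(depth):  # fixed depth here (assumption)
            op = ops[rng.integers(len(ops))]
            augmented = op(augmented)
        mixed += w * augmented
    return skip_weight * image + (1.0 - skip_weight) * mixed

# Hypothetical ops for float images in [0, 1] with shape (H, W, C)
ops = [
    np.fliplr,                                # horizontal flip
    lambda x: np.roll(x, shift=2, axis=1),    # small horizontal translation
    lambda x: np.clip(x * 1.2, 0.0, 1.0),     # brightness increase
]

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
out = augmix(img, ops, rng)
print(out.shape)
```

Because every weight is convex, the output stays within the value range of the augmented images.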

How to use TensorLayer

How to use TensorLayer While research in Deep Learning continues to improve the world, we use a bunch of tricks to implement algorithms with TensorLay

349 Dec 7, 2022

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

21.1k Dec 29, 2022

This project uses the YouTube Data API to analyze YouTube tags based on viewCount, comments, etc.

Youtube video details analyser Steps to run this project Please set the AuthKey, which you can fetch from the Google Developer Console, and paste it in the

1 Nov 21, 2021

zeus is a Python implementation of the Ensemble Slice Sampling method.

zeus is a Python implementation of the Ensemble Slice Sampling method. Fast & Robust Bayesian Inference, Efficient Markov Chain Monte Carlo (MCMC), Bl

197 Dec 4, 2022

Source code and notebooks to reproduce experiments and benchmarks on Bias Faces in the Wild (BFW).

Face Recognition: Too Bias, or Not Too Bias? Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition:

41 Dec 12, 2022

Python module for performing linear regression for data with measurement errors and intrinsic scatter

Linear regression for data with measurement errors and intrinsic scatter (BCES) Python module for performing robust linear regression on (X,Y) data po

56 Sep 27, 2022
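The BCES(Y|X) slope corrects the ordinary least-squares slope for measurement error. It subtracts the summed X/Y error covariances from the numerator and the summed X error variances from the denominator. The sketch below computes only that point estimate. It is not this module's API and omits its other BCES variants and uncertainty estimates. The function name is hypothetical.

```python
import numpy as np

def bces_y_given_x(x, y, x_err, cov_xy_err=0.0):
    """BCES(Y|X) slope and intercept for data with per-point measurement errors.

    x_err: 1-sigma errors on x (scalar or per point); cov_xy_err: per-point x/y error covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var_x_err = np.broadcast_to(np.asarray(x_err, dtype=float), x.shape) ** 2
    c_xy = np.broadcast_to(np.asarray(cov_xy_err, dtype=float), x.shape)
    dx, dy = x - x.mean(), y - y.mean()
    slope = (np.sum(dx * dy) - np.sum(c_xy)) / (np.sum(dx ** 2) - np.sum(var_x_err))
    intercept = y.mean() - slope * x.mean()
    return float(slope), float(intercept)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
print(bces_y_given_x(x, y, x_err=0.0))  # no errors: reduces to ordinary least squares
print(bces_y_given_x(x, y, x_err=0.3))  # errors in x: slope corrected for attenuation
```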

Project: Machine Learning: Programming Languages 2004-2001

Project: Machine Learning: Programming Languages 2004-2001 A Data Science and Machine Learning project analyzing programming languages from 2

0 Jun 29, 2021

Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.

Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da

3 Nov 27, 2022

Tutorials, assignments, and competitions for MIT Deep Learning related courses.

MIT Deep Learning This repository is a collection of tutorials for MIT Deep Learning courses. More added as courses progress. Tutorial: Deep Learning

9.5k Jan 7, 2023

Python solutions to solve practical business problems.

Python Business Analytics Also instead of "watching" you can join the link-letter, it's already being sent out to about 90 people and you are free to

357 Dec 26, 2022

Python Data Science Handbook: full text in Jupyter Notebooks

Python Data Science Handbook This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. How to Use th

36.9k Dec 28, 2022

Design and build a wrapper for the Open Weather API current weather data service

Design and build a wrapper for the Open Weather API current weather data service that returns a city's temperature, with caching. It also lets you retrieve the temperatures of the most recently queried cities whose cache entries are still valid.

1 Jun 27, 2022
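The caching behaviour described above can be sketched as a small time-to-live (TTL) cache. The `fetch` callable stands in for the actual Open Weather API request, which is left out. The class and method names and the 10-minute default TTL are hypothetical.

```python
import time
from typing import Callable

class CachedTemperatureClient:
    """Wrap a temperature-fetching function with a per-city TTL cache (hypothetical design)."""

    def __init__(self, fetch: Callable[[str], float], ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # e.g. a function calling the weather API (not shown)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._cache: dict[str, tuple[float, float]] = {}  # city -> (temperature, fetched_at)

    def temperature(self, city: str) -> float:
        """Return the cached temperature if still valid, otherwise fetch and cache it."""
        now = self._clock()
        hit = self._cache.get(city)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]
        temp = self._fetch(city)
        self._cache[city] = (temp, now)
        return temp

    def cached_cities(self) -> dict[str, float]:
        """Temperatures of all cities with a still-valid entry, most recently fetched first."""
        now = self._clock()
        valid = [(c, t, ts) for c, (t, ts) in self._cache.items() if now - ts < self._ttl]
        valid.sort(key=lambda item: item[2], reverse=True)
        return {c: t for c, t, _ in valid}
```

Injecting the clock keeps expiry logic deterministic in tests, and a real client would pass `fetch` a function that performs the HTTP request and parses the temperature.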

Empyrial is a Python-based open-source quantitative investment library dedicated to financial institutions and retail investors

By Investors, For Investors. Empyrial is a Python-based open-source quantitative investment library dedicated

640 Dec 31, 2022

Clinica is a software platform for clinical research studies involving patients with neurological and psychiatric diseases and the acquisition of multimodal data

Clinica Software platform for clinical neuroimaging studies Homepage | Documentation | Paper | Forum | See also: AD-ML, AD-DL ClinicaDL About The Proj

165 Dec 29, 2022

MRQy is a quality assurance and checking tool for quantitative assessment of magnetic resonance imaging (MRI) data.

Center for Computational Imaging and Personalized Diagnostics

58 Dec 2, 2022

Python Automated Machine Learning library for tabular data.

Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Scie

47 Dec 17, 2022

Data Version Control or DVC is an open-source tool for data science and machine learning projects

Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj

2 Jul 29, 2021

Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.

Trading Tesla with Machine Learning and Sentiment Analysis An interactive program to train a Random Forest Classifier to predict Tesla daily prices us

31 Nov 17, 2022

Introducing neural networks to predict stock prices

IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o

637 Jan 4, 2023

PyExplainer: A Local Rule-Based Model-Agnostic Technique (Explainable AI)

PyExplainer PyExplainer is a local rule-based model-agnostic technique for generating explanations (i.e., why a commit is predicted as defective) of J

AI Wizards for Software Management (AWSM) Research Group

14 Nov 13, 2022

An interactive dashboard for visualisation, integration and classification of data using Active Learning.

AstronomicAL An interactive dashboard for visualisation, integration and classification of data using Active Learning. AstronomicAL is a human-in-the-

45 Nov 28, 2022

This repository contains code for building education startup.

Learning Management System Overview It's the code for EssayBrain, a tool for teachers that automatically grades and validates essays. In order to valid

1 Nov 21, 2021

Full automated data pipeline using docker images

Create postgres tables from CSV files This first section only relates to creating tables from CSV files using the postgres container alone. Just one of

1 Nov 21, 2021

Built for a specific purpose, for my own convenience: reminds the owner to share data with another DITO user.

1 Dec 14, 2021

Building house price data pipelines with Apache Beam and Spark on GCP

This project covers the whole process, from building a web crawler that extracts raw house-price data to creating ETL pipelines with Google Cloud Platform services.

1 Nov 22, 2021

The Json2Xml tool helps you convert COCO-format JSON annotations to VOC-format XML for object detection problems.

JSON 2 XML All code assumes it is run from the root directory. Please update the sys path at the beginning of the code before running. Overview Json2Xml t

6 Aug 22, 2022
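The core of such a conversion maps each COCO box to a VOC box inside a per-image XML file. COCO stores a box as `[x, y, width, height]`, with (x, y) as the top-left corner, and VOC stores its corners as `xmin`, `ymin`, `xmax`, `ymax`. Below is a standard-library sketch, assuming COCO-style dicts already loaded from JSON. The function name is hypothetical, the image depth is assumed to be 3, and no 1-based pixel offset is applied.

```python
import xml.etree.ElementTree as ET

def coco_image_to_voc(image: dict, annotations: list[dict], categories: dict[int, str]) -> str:
    """Build a VOC-style XML string for one COCO image entry and its annotations."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image["file_name"]
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(image["width"])
    ET.SubElement(size, "height").text = str(image["height"])
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB images
    for ann in annotations:
        if ann["image_id"] != image["id"]:
            continue  # annotation belongs to a different image
        x, y, w, h = ann["bbox"]  # COCO: top-left corner plus width and height
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = categories[ann["category_id"]]
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h)):
            ET.SubElement(box, tag).text = str(int(round(value)))
    return ET.tostring(root, encoding="unicode")

image = {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}
anns = [{"image_id": 1, "category_id": 3, "bbox": [10.0, 20.0, 100.0, 50.0]},
        {"image_id": 2, "category_id": 3, "bbox": [0.0, 0.0, 5.0, 5.0]}]
print(coco_image_to_voc(image, anns, {3: "car"}))
```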

Topic modeling on unstructured data in space news articles retrieved from the Guardian (UK) newspaper using its API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

1 Jan 3, 2022

Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

10 May 15, 2022

This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and Twitter sentiment regarding COVID-19.

4 Oct 15, 2022
