651 Repositories
Python materials-science Libraries
Glue is a python project to link visualizations of scientific datasets across many files.
Glue Glue is a python project to link visualizations of scientific datasets across many files. Click on the image for a quick demo: Features Interacti
Transform-Invariant Non-Negative Matrix Factorization
Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn
PsychoPy is an open-source package for creating experiments in behavioral science.
PsychoPy is an open-source package for creating experiments in behavioral science. It aims to provide a single package that is: precise enoug
A collection of neat and practical data science and machine learning projects
Data Science A collection of neat and practical data science and machine learning projects Explore the docs Β» Report Bug Β· Request Feature Table of Co
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard.
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard. Features π₯ Article monitor: helps you capture the trend at a
π Economics Observatory Visualisation Repository
Economics Observatory Visualisation Repository Website | Visualisations | Data | Here you will find all the data visualisations and infographics attac
wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information
Python based Wikidata framework for easy dataframe extraction wikirepo is a Python package that provides a framework to easily source and leverage sta
PyChemia, Python Framework for Materials Discovery and Design
PyChemia, Python Framework for Materials Discovery and Design PyChemia is an open-source Python Library for materials structural search. The purpose o
The Multi-Mission Maximum Likelihood framework (3ML)
PyPi Conda The Multi-Mission Maximum Likelihood framework (3ML) A framework for multi-wavelength/multi-messenger analysis for astronomy/astrophysics.
Python-based Space Physics Environment Data Analysis Software
pySPEDAS pySPEDAS is an implementation of the SPEDAS framework for Python. The Space Physics Environment Data Analysis Software (SPEDAS) framework is
DataPrep β The easiest way to prepare data in Python
DataPrep β The easiest way to prepare data in Python
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Supplementary materials for ISMIR 2021 LBD paper "Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes"
Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes Supplementary materials for ISMIR 2021 LBD submission: K. N. W
A lightweight, hub-and-spoke dashboard for multi-account Data Science projects
A lightweight, hub-and-spoke dashboard for cross-account Data Science Projects Introduction Modern Data Science environments often involve many indepe
Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)
Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)
Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.
Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.
Write maintainable, production-ready pipelines using Jupyter or your favorite text editor. Develop locally, deploy to the cloud. βοΈ
Write maintainable, production-ready pipelines using Jupyter or your favorite text editor. Develop locally, deploy to the cloud. βοΈ
Efficient Python Tricks and Tools for Data Scientists
Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.
This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their coordinates and detected labels.
This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models tabular data.
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.
D2L.ai: Interactive Deep Learning Book with Multi-Framework Code, Math, and Discussions Book website | STAT 157 Course at UC Berkeley | Latest version
Using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" π§ (ICLR 2019)
Hierarchical neural-net interpretations (ACD) π§ Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Offic
This repo contains research materials released by members of the Google Brain team in Tokyo.
Brain Tokyo Workshop π§ πΌ This repo contains research materials released by members of the Google Brain team in Tokyo. Past Projects Weight Agnostic
A library for using chemistry in your applications
Chemistry in python Resources Used The following items are not made by me! Click the words to go to the original source Periodic Tab Json - Used in -
A Tools that help Data Scientists and ML engineers train and deploy ML models.
Domino Research This repo contains projects under active development by the Domino R&D team. We build tools that help Data Scientists and ML engineers
Labelling platform for text using distant supervision
With DataQA, you can label unstructured text documents using rule-based distant supervision.
Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores
Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores
collection of interesting Computer Science resources
collection of interesting Computer Science resources
Obsidian tools - a Python package for analysing an Obsidian.md vault
obsidiantools is a Python package for getting structured metadata about your Obsidian.md notes and analysing your vault.
A demo of a data science project using Kedro
iris Overview This is your new Kedro project, which was generated using Kedro 0.17.4. Take a look at the Kedro documentation to get started. Rules and
Rafael Project- Classifying rockets to different types using data science algorithms.
Rocket-Classify Rafael Project- Classifying rockets to different types using data science algorithms. In this project we received data base with data
Scitizen - Help scientific research for the benefit of mankind and humanity π¬
Scitizen - Help scientific research for the benefit of mankind and humanity π¬ Scitizen has been built from the ground up to give everyone the possibi
CS 506 - Computational Tools for Data Science
CS 506 - Computational Tools for Data Science Code, slides, and notes for Boston University CS506 Fall 2021 The Final Project Repository can be found
Learn how to responsibly deliver value with ML.
Made With ML Applied ML Β· MLOps Β· Production Join 30K+ developers in learning how to responsibly deliver value with ML. π₯ Among the top MLOps reposit
Visions provides an extensible suite of tools to support common data analysis operations
Visions And these visions of data types, they kept us up past the dawn. Visions provides an extensible suite of tools to support common data analysis
An open-source Python project series where beginners can contribute and practice coding.
Python Mini Projects A collection of easy Python small projects to help you improve your programming skills. Table Of Contents Aim Of The Project Cont
Python library for generating CycloneDX SBOMs
Python Library for generating CycloneDX This CycloneDX module for Python can generate valid CycloneDX bill-of-material document containing an aggregat
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
A Python framework for creating reproducible, maintainable and modular data science code.
Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.
time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium
Lux Academy & Data Science East Africa Python Boot Camp, Building and Deploying Flask Application Using Docker Demo App.
Flask and Docker Application Demo A Docker image is a read-only, inert template that comes with instructions for deploying containers. In Docker, ever
β¨Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.
β¨A Python framework to explore, label, and monitor data for NLP projects
FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically
FLAML - Fast and Lightweight AutoML
skimpy is a light weight tool that provides summary statistics about variables in data frames within the console.
skimpy Welcome Welcome to skimpy! skimpy is a light weight tool that provides summary statistics about variables in data frames within the console. Th
Pipeline for fast building text classification TF-IDF + LogReg baselines.
Text Classification Baseline Pipeline for fast building text classification TF-IDF + LogReg baselines. Usage Instead of writing custom code for specif
Natural Language Processing library built with AllenNLP π²π±
Custom Natural Language Processing with big and small models π²π±
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Slack bot for monitoring your Metaflow flows!
Metaflowbot - Slack Bot for your Metaflow flows! Metaflowbot makes it fun and easy to monitor your Metaflow runs, past and present. Imagine starting a
A Python package implementing various colour checker detection algorithms and related utilities.
A Python package implementing various colour checker detection algorithms and related utilities.
Evidently helps analyze machine learning models during validation or production monitoring
Evidently helps analyze machine learning models during validation or production monitoring. The tool generates interactive visual reports and JSON profiles from pandas DataFrame or csv files. Currently 6 reports are available.
Rubrix is a free and open-source tool for exploring and iterating on data for artificial intelligence projects.
Open-source tool for exploring, labeling, and monitoring data for AI projects
MachineLearningStocks is designed to be an intuitive and highly extensible template project applying machine learning to making stock predictions.
Using python and scikit-learn to make stock predictions
The MLOps platform for innovators π
β DS2.ai is an integrated AI operation solution that supports all stages from custom AI development to deployment. It is an AI-specialized platform service that collects data, builds a training dataset through data labeling, and enables automatic development of artificial intelligence and easy deployment and operation.
Pre-trained model, code, and materials from the paper "Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation" (MICCAI 2019).
Adaptive Segmentation Mask Attack This repository contains the implementation of the Adaptive Segmentation Mask Attack (ASMA), a targeted adversarial
A community run, 5-day PyTorch Deep Learning Bootcamp
Deep Learning Winter School, November 2107. Tel Aviv Deep Learning Bootcamp : http://deep-ml.com. About Tel-Aviv Deep Learning Bootcamp is an intensiv
π All-in-one web-based IDE specialized for machine learning and data science.
All-in-one web-based development environment for machine learning Getting Started β’ Features & Screenshots β’ Support β’ Report a Bug β’ FAQ β’ Known Issu
XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.
XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson curriculum all about Machine Learning
Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson curriculum all about Machine Learning
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
This repository contains all the code and materials distributed in the 2021 Q-Programming Summer of Qode.
Q-Programming Summer of Qode This repository contains all the code and materials distributed in the Q-Programming Summer of Qode. If you want to creat
A data preprocessing package for time series data. Design for machine learning and deep learning.
A data preprocessing package for time series data. Design for machine learning and deep learning.
Meerkat provides fast and flexible data structures for working with complex machine learning datasets.
Meerkat makes it easier for ML practitioners to interact with high-dimensional, multi-modal data. It provides simple abstractions for data inspection, model evaluation and model training supported by efficient and robust IO under the hood.
An intelligent, flexible grammar of machine learning.
An english representation of machine learning. Modify what you want, let us handle the rest. Overview Nylon is a python library that lets you customiz
flexible time-series processing & feature extraction
tsflex is a toolkit for flexible time-series processing & feature extraction, making few assumptions about input data. Useful links Documentation Exam
fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.
Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc
π§ͺ Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.
π§ͺπ π. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python and HoloViz Panel.
A site devoted to celebrating to matching books with readers and readers with books. Inspired by the Readers' Advisory process in library science, Literati, and Stitch Fix.
Welcome to Readers' Advisory Greetings, fellow book enthusiasts! Visit Readers' Advisory! Menu Technologies Key Features Database Schema Front End Rou
A Curated Collection of Awesome Python Scripts
A Curated Collection of Awesome Python Scripts that will make you go wow. This repository will help you in getting those green squares. Hop in and enjoy the journey of open source. π
All course materials for the Zero to Mastery Deep Learning with TensorFlow course.
All course materials for the Zero to Mastery Deep Learning with TensorFlow course.
A New, Interactive Approach to Learning Python
This is the repository for The Python Workshop, published by Packt. It contains all the supporting project files necessary to work through the course from start to finish.
pyprobables is a pure-python library for probabilistic data structures
pyprobables is a pure-python library for probabilistic data structures. The goal is to provide the developer with a pure-python implementation of common probabilistic data-structures to use in their work.
leafmap - A Python package for geospatial analysis and interactive mapping in a Jupyter environment.
A Python package for geospatial analysis and interactive mapping with minimal coding in a Jupyter environment
Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data
federated is the source code for the Bachelor's Thesis Privacy-Preserving Federated Learning Applied to Decentralized Data (Spring 2021, NTNU) Federat
Orchest is a browser based IDE for Data Science.
Orchest is a browser based IDE for Data Science. It integrates your favorite Data Science tools out of the box, so you donβt have to. The application is easy to use and can run on your laptop as well as on a large scale cloud cluster.
An Unsupervised Graph-based Toolbox for Fraud Detection
An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s
Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way
Apache Liminals goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
A Lucid Framework for Transparent and Interpretable Machine Learning Models.
Currently a Beta-Version lucidmode is an open-source, low-code and lightweight Python framework for transparent and interpretable machine learning mod
A computer algebra system written in pure Python
SymPy See the AUTHORS file for the list of authors. And many more people helped on the SymPy mailing list, reported bugs, helped organize SymPy's part
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext
Ray provides a simple, universal API for building distributed applications.
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
hyppo is an open-source software package for multivariate hypothesis testing.
hyppo (HYPothesis Testing in PythOn, pronounced "Hippo") is an open-source software package for multivariate hypothesis testing.
A scikit-learn-compatible module for estimating prediction intervals.
|Anaconda|_ MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals using your favourite sklearn
π The Most Comprehensive List of Kaggle Solutions and Ideas π
π Collection of Kaggle Solutions and Ideas π
skweak: A software toolkit for weak supervision applied to NLP tasks
Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels without pre-existing datasets. The only available option is often to collect and annotate texts by hand, which is expensive and time-consuming.
Diffgram - Supervised Learning Data Platform
Data Annotation, Data Labeling, Annotation Tooling, Training Data for Machine Learning
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
Algorithmic trading using machine learning.
Algorithmic Trading This machine learning algorithm was built using Python 3 and scikit-learn with a Decision Tree Classifier. The program gathers sto
Introducing neural networks to predict stock prices
IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o
Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.
Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi
Using python and scikit-learn to make stock predictions
MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni
Satellite imagery for dummies.
felicette Satellite imagery for dummies. What can you do with this tool? TL;DR: Generate JPEG earth imagery from coordinates/location name with public
Processing and interpolating spatial data with a twist of machine learning
Documentation | Documentation (dev version) | Contact | Part of the Fatiando a Terra project About Verde is a Python library for processing spatial da
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Superset is a Data Visualization and Data Exploration Platform
IPython: Productive Interactive Computing
IPython: Productive Interactive Computing Overview Welcome to IPython. Our full documentation is available on ipython.readthedocs.io and contains info
π© Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as β and yet conspicuously mis
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Viewflow Viewflow is a framework built on the top of Airflow that enables data scientists to create materialized views. It allows data scientists to f
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning
The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. I
A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilis
Probabilistic reasoning and statistical analysis in TensorFlow
TensorFlow Probability TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFl