659 Repositories
Python computational-science Libraries
Epidemiology analysis package
zEpid zEpid is an epidemiology analysis package, providing easy to use tools for epidemiologists coding in Python 3.5+. The purpose of this library is
Skoot is a lightweight python library of machine learning transformer classes that interact with scikit-learn and pandas.
Skoot is a lightweight python library of machine learning transformer classes that interact with scikit-learn and pandas. Its objective is to ex
dirty_cat is a Python module for machine-learning on dirty categorical variables.
dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.
Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks.
pyradiomics v3.0.1 Build Status Linux macOS Windows Radiomics feature extraction in Python This is an open-source python package for the extraction of
apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly.
Please consider citing the manuscript if you use apricot in your academic work! You can find more thorough documentation here. apricot implements subm
Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization
Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models. Hyperactive: is very easy to lear
A high-performance topological machine learning toolbox in Python
giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G
Single-Cell Analysis in Python. Scales to 1M cells.
Scanpy – Single-Cell Analysis in Python Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It inc
Streamlit — The fastest way to build data apps in Python
Welcome to Streamlit 👋 The fastest way to build and share data apps. Streamlit lets you turn data scripts into sharable web apps in minutes, not week
The purpose of this project is to share knowledge on how awesome Streamlit is and can be
Awesome Streamlit The fastest way to build Awesome Tools and Apps! Powered by Python! The purpose of this project is to share knowledge on how Awesome
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.
An open-source, low-code machine learning library in Python 🚀 Version 2.3.5 out now! Check out the release notes here. Official • Docs • Install • Tu
Visualization ideas for data science
Nuance I use Nuance to curate varied visualization thoughts during my data scientist career. It is not yet a package but a list of small ideas. Welcom
Python module for data science and machine learning users.
dsnk-distributions package dsnk distribution is a Python module for data science and machine learning that was created with the goal of reducing calcu
Scheme for training and applying a label propagation framework
Factorisation-based Image Labelling Overview This is a scheme for training and applying the factorisation-based image labelling (FIL) framework. Some
Deep Learning with PyTorch made easy 🚀 !
Deep Learning with PyTorch made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. It also provides a c
BookNLP, a natural language processing pipeline for books
BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin
How to use TensorLayer
How to use TensorLayer While research in Deep Learning continues to improve the world, we use a bunch of tricks to implement algorithms with TensorLay
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •
Projeto: Machine Learning: Linguagens de Programacao 2004-2001
Projeto: Machine Learning: Linguagens de Programacao 2004-2001 Projeto de Data Science e Machine Learning de análise de linguagens de programação de 2
Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.
Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da
Tutorials, assignments, and competitions for MIT Deep Learning related courses.
MIT Deep Learning This repository is a collection of tutorials for MIT Deep Learning courses. More added as courses progress. Tutorial: Deep Learning
Python solutions to solve practical business problems.
Python Business Analytics Also instead of "watching" you can join the link-letter, it's already being sent out to about 90 people and you are free to
Python Data Science Handbook: full text in Jupyter Notebooks
Python Data Science Handbook This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. How to Use th
Python Automated Machine Learning library for tabular data.
Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Scie
Data Version Control or DVC is an open-source tool for data science and machine learning projects
Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj
Introducing neural networks to predict stock prices
IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o
This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.
Twitter COVID-19 Sentiment Analysis Members: Christopher Bach | Khalid Hamid Fallous | Jay Hirpara | Jing Tang | Graham Thomas | David Wetherhold Pro
Accelerating model creation and evaluation.
EmeraldML A machine learning library for streamlining the process of (1) cleaning and splitting data, (2) training, optimizing, and testing various mo
A benchmark of data-centric tasks from across the machine learning lifecycle.
A benchmark of data-centric tasks from across the machine learning lifecycle.
QED-C: The Quantum Economic Development Consortium provides these computer programs and software for use in the fields of quantum science and engineering.
Application-Oriented Performance Benchmarks for Quantum Computing This repository contains a collection of prototypical application- or algorithm-cent
Learning infinite-resolution image processing with GAN and RL from unpaired image datasets, using a differentiable photo editing model.
Exposure: A White-Box Photo Post-Processing Framework ACM Transactions on Graphics (presented at SIGGRAPH 2018) Yuanming Hu1,2, Hao He1,2, Chenxi Xu1,
A scikit-learn-compatible module for estimating prediction intervals.
MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals (or prediction sets) using your favourit
PyClustering is a Python, C++ data mining library.
pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each algorithm or model. C++ pyclustering library is a part of pyclustering and supported for Linux, Windows and MacOS operating systems.
Code-free deep segmentation for computational pathology
NoCodeSeg: Deep segmentation made easy! This is the official repository for the manuscript "Code-free development and deployment of deep segmentation
A collection of online resources to help you on your Tech journey.
Everything Tech Resources & Projects About The Project Coming from an engineering background and looking to up skill yourself on a new field can be di
ObsPy: A Python Toolbox for seismology/seismological observatories.
ObsPy is an open-source project dedicated to provide a Python framework for processing seismological data. It provides parsers for common file formats
Time Series Cross-Validation -- an extension for scikit-learn
TSCV: Time Series Cross-Validation This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the traini
Deep Survival Machines - Fully Parametric Survival Regression
Package: dsm Python package dsm provides an API to train the Deep Survival Machines and associated models for problems in survival analysis. The under
A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.
Telemanom (v2.0) v2.0 updates: Vectorized operations via numpy Object-oriented restructure, improved organization Merge branches into single branch fo
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
DoWhy | An end-to-end library for causal inference Amit Sharma, Emre Kiciman Introducing DoWhy and the 4 steps of causal inference | Microsoft Researc
StackNet is a computational, scalable and analytical Meta modelling framework
StackNet This repository contains StackNet Meta modelling methodology (and software) which is part of my work as a PhD Student in the computer science
Responsible Machine Learning with Python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice,
LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric.
moDel Agnostic Language for Exploration and eXplanation
moDel Agnostic Language for Exploration and eXplanation Overview Unverified black box model is the path to the failure. Opaqueness leads to distrust.
An open source AutoML toolkit for automate machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
NNI Doc | 简体中文 NNI (Neural Network Intelligence) is a lightweight but powerful toolkit to help users automate Feature Engineering, Neural Architecture
🌊 River is a Python library for online machine learning.
River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data.
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
sklearn-porter Transpile trained scikit-learn estimators to C, Java, JavaScript and others. It's recommended for limited embedded systems and critical
ModelChimp is an experiment tracker for Deep Learning and Machine Learning experiments.
ModelChimp What is ModelChimp? ModelChimp is an experiment tracker for Deep Learning and Machine Learning experiments. ModelChimp provides the followi
An orchestration platform for the development, production, and observation of data assets.
Dagster An orchestration platform for the development, production, and observation of data assets. Dagster lets you define jobs in terms of the data f
Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects
Metaflow Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
List of Data Science Cheatsheets to rule the world
Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat
A demo without 🚀 science, just simple UTXO spending logic.
Stuck TX Demo Docker container that runs 4 dogecoind to demonstrate "the stuck tx problem". Scenario A wallet sends out 3 transactions to a recipient
Doom o’clock is a website/project that features a countdown of “when will the earth end” and a greenhouse gas effect emission prediction that’s predicted
Doom o’clock is a website/project that features a countdown of “when will the earth end” and a greenhouse gas effect emission prediction that’s predicted
Bodywork deploys machine learning projects developed in Python, to Kubernetes.
Bodywork deploys machine learning projects developed in Python, to Kubernetes. It helps you to: serve models as microservices execute batch jobs run r
MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
MetPy MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data. MetPy follows semantic versioni
nbsafety adds a layer of protection to computational notebooks by solving the stale dependency problem when executing cells out-of-order
nbsafety adds a layer of protection to computational notebooks by solving the stale dependency problem when executing cells out-of-order
Generate Gaussian 09 input files for the rotamers of an input compound.
Rotapy Purpose Generate Gaussian 09 input files for the rotamers of an input compound. Distance to the axis of rotation remains constant throughout th
A Scheil-Gulliver simulation tool using pycalphad.
scheil A Scheil-Gulliver simulation tool using pycalphad. import matplotlib.pyplot as plt from pycalphad import Database, variables as v from scheil i
Fitting thermodynamic models with pycalphad
ESPEI ESPEI, or Extensible Self-optimizing Phase Equilibria Infrastructure, is a tool for thermodynamic database development within the CALPHAD method
Powerful, efficient particle trajectory analysis in scientific Python.
freud Overview The freud Python library provides a simple, flexible, powerful set of tools for analyzing trajectories obtained from molecular dynamics
This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.
POS-Tagger This repository details the creation of a Part-of-Speech tagger using Trigram Hidden Markov Models to predict word tags in a word sequence.
This package is a python library with tools for the Molecular Simulation - Software Gromos.
This package is a python library with tools for the Molecular Simulation - Software Gromos. It allows you to easily set up, manage and analyze simulations in python.
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code
Python package provinding tools for artistic interactive applications using AI
Documentation redrawing Python package provinding tools for artistic interactive applications using AI Created by ReDrawing Campinas team for the Open
edaSQL is a library to link SQL to Exploratory Data Analysis and further more in the Data Engineering.
edaSQL is a python library to bridge the SQL with Exploratory Data Analysis where you can connect to the Database and insert the queries. The query results can be passed to the EDA tool which can give greater insights to the user.
ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.
ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. It has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstractions that are catered towards ML workflows.
A primitive Python wrapper around the Gromacs tools.
README: GromacsWrapper A primitive Python wrapper around the Gromacs tools. The library is tested with GROMACS 4.6.5, 2018.x, 2019.x, 2020.x, and 2021
A collection of tools for biomedical research assay analysis in Python.
waltlabtools A collection of tools for biomedical research assay analysis in Python. Key Features Analysis for assays such as digital ELISA, including
Data science, Data manipulation and Machine learning package.
duality Data science, Data manipulation and Machine learning package. Use permitted according to the terms of use and conditions set by the attached l
SW components and demos for visual kinship recognition. An emphasis is put on the FIW dataset-- data loaders, benchmarks, results in summary.
FIW Data Development Kit Table of Contents Introduction Families In the Wild Database Publications Organization To Do License Getting Involved Introdu
Planning Algorithms in AI and Robotics. MSc course at Skoltech Data Science program
Planning Algorithms in AI and Robotics course T2 2021-22 The Planning Algorithms in AI and Robotics course at Skoltech, MS in Data Science, during T2,
Algorithms covered in the Bioinformatics Course part of the Cambridge Computer Science Tripos
Bioinformatics This is a repository of all the algorithms covered in the Bioinformatics Course part of the Cambridge Computer Science Tripos Algorithm
A large-scale database for graph representation learning
A large-scale database for graph representation learning
Python and data science snippets on the command line
Python Snippet Tool A tool to get Python and data science snippets at Data Science Simplified on the command line. You can read my article to learn ho
A framework for feature exploration in Data Science
Beehive A framework for feature exploration in Data Science Background What do we do when we finish one episode of feature exploration in a jupyter no
Improving your data science workflows with
Make Better Defaults Author: Kjell Wooding [email protected] This is the git repo for Makefiles: One great trick for making your conda environments mo
Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.
python-is-cool A gentle guide to the Python features that I didn't know existed or was too afraid to use. This will be updated as I learn more and bec
A Snakemake workflow for standardised sc/snRNAseq analysis
single_snake_sequencing - sc/snRNAseq Snakemake Workflow A Snakemake workflow for standardised sc/snRNAseq analysis. Every single cell analysis is sli
These data visualizations were created for my introductory computer science course using Python
Homework 2: Matplotlib and Data Visualization Overview These data visualizations were created for my introductory computer science course using Python
Convolutional neural network web app trained to track our infant’s sleep schedule using our Google Nest camera.
Machine Learning Sleep Schedule Tracker What is it? Convolutional neural network web app trained to track our infant’s sleep schedule using our Google
Differentiable Quantum Chemistry (only Differentiable Density Functional Theory and Hartree Fock at the moment)
DQC: Differentiable Quantum Chemistry Differentiable quantum chemistry package. Currently only support differentiable density functional theory (DFT)
Data Science Environment Setup in single line
datascienv is package that helps your to setup your environment in single line of code with all dependency and it is also include pyforest that provide single line of import all required ml libraries
A multi-page streamlit app for the geospatial community.
A multi-page streamlit app for the geospatial community.
A short non 100% Accurate Solar System in pygame
solar-system-pygame Controls UP/DOWN for Emulation Speed Control ESC for Pause/Unpause q to Quit c or ESC again to Continue LEFT CLICK to Add an orbit
Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships.
feature-set-comp Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships. Reposito
PyGMT - A Python interface for the Generic Mapping Tools
PyGMT A Python interface for the Generic Mapping Tools Documentation (development version) | Contact | Try Online Why PyGMT? A beautiful map is worth
Deep Learning Interviews book: Hundreds of fully solved job interview questions from a wide range of key topics in AI.
This book was written for you: an aspiring data scientist with a quantitative background, facing down the gauntlet of the interview process in an increasingly competitive field. For most of you, the interview process is the most significant hurdle between you and a dream job.
A Python package implementing various HDRI / Radiance image processing algorithms.
Colour - HDRI A Python package implementing various HDRI / Radiance image processing algorithms. It is open source and freely available under the New
A Python package implementing various CFA (Colour Filter Array) demosaicing algorithms and related utilities.
Colour - Demosaicing A Python package implementing various CFA (Colour Filter Array) demosaicing algorithms and related utilities. It is open source a
High-quality implementations of standard and SOTA methods on a variety of tasks.
Uncertainty Baselines The goal of Uncertainty Baselines is to provide a template for researchers to build on. The baselines can be a starting point fo
A Web API for automatic background removal using Deep Learning. App is made using Flask and deployed on Heroku.
Automatic_Background_Remover A Web API for automatic background removal using Deep Learning. App is made using Flask and deployed on Heroku. 👉 https:
This repository requires you to solve a problem by writing some basic python code.
Can You Solve a Problem? A beginner friendly repository that requires you to solve familiar problems with python. This could be as simple as implement
The implementation of the submitted paper "Deep Multi-Behaviors Graph Network for Voucher Redemption Rate Prediction" in SIGKDD 2021 Applied Data Science Track.
DMBGN: Deep Multi-Behaviors Graph Networks for Voucher Redemption Rate Prediction The implementation of the accepted paper "Deep Multi-Behaviors Graph
Tools for making image cutouts from sets of TESS full frame images
Cutout tools for astronomical images Astrocut provides tools for making cutouts from sets of astronomical images with shared footprints. It is under a
Lale is a Python library for semi-automated data science.
Lale is a Python library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-safe fashion.
Glue is a python project to link visualizations of scientific datasets across many files.
Glue Glue is a python project to link visualizations of scientific datasets across many files. Click on the image for a quick demo: Features Interacti
Transform-Invariant Non-Negative Matrix Factorization
Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn