3176 Repositories
Python exploratory-data-analysis Libraries
Introducing neural networks to predict stock prices
IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o
PyExplainer: A Local Rule-Based Model-Agnostic Technique (Explainable AI)
PyExplainer PyExplainer is a local rule-based model-agnostic technique for generating explanations (i.e., why a commit is predicted as defective) of J
An interactive dashboard for visualisation, integration and classification of data using Active Learning.
AstronomicAL An interactive dashboard for visualisation, integration and classification of data using Active Learning. AstronomicAL is a human-in-the-
Full automated data pipeline using docker images
Create postgres tables from CSV files This first section is only relate to creating tables from CSV files using postgres container alone. Just one of
For specific function. For my own convenience. Remind owner to share data to another DITO user.
For specific function. For my own convenience. Remind owner to share data to another DITO user.
Building house price data pipelines with Apache Beam and Spark on GCP
This project contains the process from building a web crawler to extract the raw data of house price to create ETL pipelines using Google Could Platform services.
topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API
NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour
Real-time analysis of intracranial neurophysiology recordings.
py_neuromodulation Click this button to run the "Tutorial ML with py_neuro" notebooks: The py_neuromodulation toolbox allows for real time capable pro
Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.
Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.
This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.
Twitter COVID-19 Sentiment Analysis Members: Christopher Bach | Khalid Hamid Fallous | Jay Hirpara | Jing Tang | Graham Thomas | David Wetherhold Pro
Accelerating model creation and evaluation.
EmeraldML A machine learning library for streamlining the process of (1) cleaning and splitting data, (2) training, optimizing, and testing various mo
This Repository consists of my solutions in Python 3 to various problems in Data Structures and Algorithms
Problems and it's solutions. Problem solving, a great Speed comes with a good Accuracy. The more Accurate you can write code, the more Speed you will
Conversational text Analysis using various NLP techniques
Conversational text Analysis using various NLP techniques
Analyse japanese ebooks using MeCab to determine the difficulty level for japanese learners
japanese-ebook-analysis This aim of this project is to make analysing the contents of a japanese ebook easy and streamline the process for non-technic
BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalanced Tongue Data
Balanced-Evolutionary-Semi-Stacking Code for the paper ''BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalan
go-cqhttp API typing annoations, return data models and utils for nonebot
go-cqhttp API typing annoations, return data models and utils for nonebot
Herramienta para transferir eventos de Sucuri WAF hacia Azure Data Tables.
Transfiere eventos de Sucuri hacia Azure Data Tables Script para transferir eventos del Sucuri Web Application Firewall (WAF) hacia Azure Data Tables,
An easy-to-use library for emulating code in minidump files.
dumpulator Note: This is a work-in-progress prototype, please treat it as such. An easy-to-use library for emulating code in minidump files. Example T
FakeDataGen is a Full Valid Fake Data Generator.
FakeDataGen is a Full Valid Fake Data Generator. This tool helps you to create fake accounts (in Spanish format) with fully valid data. Within this in
LinkML based SPARQL template library and execution engine
sparqlfun LinkML based SPARQL template library and execution engine modularized core library of SPARQL templates generic templates using common vocabs
Convert your JSON data to a valid Python object to allow accessing keys with the member access operator(.)
JSONObjectMapper Allows you to transform JSON data into an object whose members can be queried using the member access operator. Unlike json.dumps in
A benchmark of data-centric tasks from across the machine learning lifecycle.
A benchmark of data-centric tasks from across the machine learning lifecycle.
Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.
counterfactual-tpp This is a repository containing code and real data for the paper Counterfactual Temporal Point Processes. Pre-requisites This code
Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"
Low Precision Decentralized Training with Heterogenous Data Official PyTorch implementation for "Low Precision Decentralized Distributed Training with
Machine Learning Privacy Meter: A tool to quantify the privacy risks of machine learning models with respect to inference attacks, notably membership inference attacks
ML Privacy Meter Machine learning is playing a central role in automated decision making in a wide range of organization and service providers. The da
Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA results for single-image motion deblurring, image deraining, image denoising (synthetic and real data), and dual-pixel defocus deblurring.
Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,
Self-attentive task GAN for space domain awareness data augmentation.
SATGAN TODO: update the article URL once published. Article about this implemention The self-attentive task generative adversarial network (SATGAN) le
A multi-platform GUI for bit-based analysis, processing, and visualization
A multi-platform GUI for bit-based analysis, processing, and visualization
Render tokei's output to interactive sunburst chart.
Render tokei's output to interactive sunburst chart.
This library is an ongoing effort towards bringing the data exchanging ability between Java/Scala and Python
PyJava This library is an ongoing effort towards bringing the data exchanging ability between Java/Scala and Python
A number of methods in order to perform Natural Language Processing on live data derived from Twitter
A number of methods in order to perform Natural Language Processing on live data derived from Twitter
A PyTorch implementation of "CoAtNet: Marrying Convolution and Attention for All Data Sizes".
CoAtNet Overview This is a PyTorch implementation of CoAtNet specified in "CoAtNet: Marrying Convolution and Attention for All Data Sizes", arXiv 2021
Project issue to website data transformation toolkit
braintransform Project issue to website data transformation toolkit. Introduction The purpose of these scripts is to be able to dynamically generate t
PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection".
Long Short-Term Transformer for Online Action Detection Introduction This is a PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short
Simulation code and tutorial for BBHnet training data
Simulation Dataset for BBHnet NOTE: OLD README, UPDATE IN PROGRESS We generate simulation dataset to train BBHnet, our deep learning framework for det
Clip Bing Maps backgound as RGB geotif image using center-point from vector data of a shapefile and Bing Maps zoom
Clip Bing Maps backgound as RGB geotif image using center-point from vector data of a shapefile and Bing Maps zoom. Also, rasterize shapefile vectors as corresponding label image.
PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.
PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.
GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data
GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data By Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, W
Training, generation, and analysis code for Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics
Location-Aware Generative Adversarial Networks (LAGAN) for Physics Synthesis This repository contains all the code used in L. de Oliveira (@lukedeo),
Code and data for paper "Deep Photo Style Transfer"
deep-photo-styletransfer Code and data for paper "Deep Photo Style Transfer" Disclaimer This software is published for academic and non-commercial use
A data preprocessing and feature engineering script for a machine learning pipeline is prepared.
FEATURE ENGINEERING Business Problem: A data preprocessing and feature engineering script for a machine learning pipeline needs to be prepared. It is
Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared
Feature-Engineering Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared. When the dataset
Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer
Sentiment Analysis Project This project contains two sentiment analysis programs for Hotel Reviews using a Hotel Reviews dataset from Datafiniti. The
The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)
nvdiffmodeling [origin_code] Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Autom
Extract rooms type, door, neibour rooms, rooms corners nad bounding boxes, and generate graph from rplan dataset
Housegan-data-reader House-GAN++ (data-reader) Code and instructions for converting rplan dataset (raster images) to housegan++ data format. House-GAN
A Semi-Intelligent ChatBot filled with statistical and economical data for the Premier League.
MONEYBALL - ChatBot Module: 4006CEM, Class: B, Group: 5 Contributors: Jonas Djondo Roshan Kc Cole Samson Daniel Rodrigues Ihteshaam Naseer Kind remind
Data Utilities e.g. for importing files to onetask
Use this repository to easily convert your source files (csv, txt, excel, json, html) into record-oriented JSON files that can be uploaded into onetask.
Finding Label and Model Errors in Perception Data With Learned Observation Assertions
Finding Label and Model Errors in Perception Data With Learned Observation Assertions This is the project page for Finding Label and Model Errors in P
🐦 Quickly annotate data from the comfort of your Jupyter notebook
🐦 pigeon - Quickly annotate data on Jupyter Pigeon is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort
This is the code used in the paper "Entity Embeddings of Categorical Variables".
This is the code used in the paper "Entity Embeddings of Categorical Variables". If you want to get the original version of the code used for the Kagg
A scikit-learn-compatible module for estimating prediction intervals.
MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals (or prediction sets) using your favourit
A high performance implementation of HDBSCAN clustering.
HDBSCAN HDBSCAN - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates
PyClustering is a Python, C++ data mining library.
pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each algorithm or model. C++ pyclustering library is a part of pyclustering and supported for Linux, Windows and MacOS operating systems.
The Fundamental Clustering Problems Suite (FCPS) summaries 54 state-of-the-art clustering algorithms, common cluster challenges and estimations of the number of clusters as well as the testing for cluster tendency.
FCPS Fundamental Clustering Problems Suite The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning pu
t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology.
tree-SNE t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology. Building on recent advances in s
Subpopulation detection in high-dimensional single-cell data
PhenoGraph for Python3 PhenoGraph is a clustering method designed for high-dimensional single-cell data. It works by creating a graph ("network") repr
Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
Description Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis. Ti
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se
Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.
GENDIS GENetic DIscovery of Shapelets In the time series classification domain, shapelets are small subseries that are discriminative for a certain cl
Calling Julia from Python - an experiment on data loading
Calling Julia from Python - an experiment on data loading See the slides. TLDR After reading Patrick's blog post, we decided to try to replace C++ wit
An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing
An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing
The audio-video synchronization of MKV Container Format is exploited to achieve data hiding
The audio-video synchronization of MKV Container Format is exploited to achieve data hiding, where the hidden data can be utilized for various management purposes, including hyper-linking, annotation, and authentication
Urban Big Data Centre Housing Sensor Project
Housing Sensor Project The Urban Big Data Centre is conducting a study of indoor environmental data in Scottish houses. We are using Raspberry Pi devi
[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data
Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data (NeurIPS 2021) This repository will provide the official PyTorch implementa
A deep-learning pipeline for segmentation of ambiguous microscopic images.
Welcome to Official repository of deepflash2 - a deep-learning pipeline for segmentation of ambiguous microscopic images. Quick Start in 30 seconds se
Full-featured Decision Trees and Random Forests learner.
CID3 This is a full-featured Decision Trees and Random Forests learner. It can save trees or forests to disk for later use. It is possible to query tr
Repository containing detailed experiments related to the paper "Memotion Analysis through the Lens of Joint Embedding".
Memotion Analysis Through The Lens Of Joint Embedding This repository contains the experiments conducted as described in the paper 'Memotion Analysis
This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer Capacitor domain using text similarity indexes: An experimental analysis "
kwd-extraction-study This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer
Stochastic gradient descent with model building
Stochastic Model Building (SMB) This repository includes a new fast and robust stochastic optimization algorithm for training deep learning models. Th
Scrutinizing XAI with linear ground-truth data
This repository contains all the experiments presented in the corresponding paper: "Scrutinizing XAI using linear ground-truth data with suppressor va
Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.
counterfactual-tpp This is a repository containing code and real data for the paper Counterfactual Temporal Point Processes. Pre-requisites This code
Code and data accompanying our SVRHM'21 paper.
Code and data accompanying our SVRHM'21 paper. Requires tensorflow 1.13, python 3.7, scikit-learn, and pytorch 1.6.0 to be installed. Python scripts i
Automated detection of anomalous exoplanet transits in light curve data.
Automatically detecting anomalous exoplanet transits This repository contains the source code for the paper "Automatically detecting anomalous exoplan
DataCLUE: 国内首个以数据为中心的AI测评(含模型分析报告)
DataCLUE: A Benchmark Suite for Data-centric NLP You can get the english version of README. 以数据为中心的AI测评(DataCLUE) 内容导引 章节 描述 简介 介绍以数据为中心的AI测评(DataCLUE
Stochastic Extragradient: General Analysis and Improved Rates
Stochastic Extragradient: General Analysis and Improved Rates This repository is the official implementation of the paper "Stochastic Extragradient: G
Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings
Text2Music Emotion Embedding Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings Reference Emotion Embedding Spaces for Matching
A collection of online resources to help you on your Tech journey.
Everything Tech Resources & Projects About The Project Coming from an engineering background and looking to up skill yourself on a new field can be di
SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning
SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning We propose a SASE mode
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets Datasets Used: Iris dataset,
FLIR/DJI IR Camera Data Parser, Python Version
FLIR/DJI IR Camera Data Parser, Python Version Parser infrared camera data as NumPy data. Usage Clone this respository and cd thermal_parser. Run pip
Scikit learn library models to account for data and concept drift.
liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d
Python plugin/extra to load data files from an external source (such as AWS S3) to a local directory
Data Loader Plugin - Python Table of Content (ToC) Data Loader Plugin - Python Table of Content (ToC) Overview References Python module Python virtual
Service for working with open data of the State Duma of the Russian Federation
Сервис для работы с открытыми данными Госдумы РФ Исходные данные из API Госдумы РФ извлекаются с помощью Apache Nifi и приземляются в хранилище Clickh
MASS (Mueen's Algorithm for Similarity Search) - a python 2 and 3 compatible library used for searching time series sub-sequences under z-normalized Euclidean distance for similarity.
Introduction MASS allows you to search a time series for a subquery resulting in an array of distances. These array of distances enable you to identif
ObsPy: A Python Toolbox for seismology/seismological observatories.
ObsPy is an open-source project dedicated to provide a Python framework for processing seismological data. It provides parsers for common file formats
sktime companion package for deep learning based on TensorFlow
NOTE: sktime-dl is currently being updated to work correctly with sktime 0.6, and wwill be fully relaunched over the summer. The plan is Refactor and
Luminaire is a python package that provides ML driven solutions for monitoring time series data.
A hands-off Anomaly Detection Library Table of contents What is Luminaire Quick Start Time Series Outlier Detection Workflow Anomaly Detection for Hig
Time Series Cross-Validation -- an extension for scikit-learn
TSCV: Time Series Cross-Validation This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the traini
Python library to download market data via Bloomberg, Eikon, Quandl, Yahoo etc.
findatapy findatapy creates an easy to use Python API to download market data from many sources including Quandl, Bloomberg, Yahoo, Google etc. using
Portfolio analytics for quants, written in Python
QuantStats: Portfolio analytics for quants QuantStats Python library that performs portfolio profiling, allowing quants and portfolio managers to unde
scikit-survival is a Python module for survival analysis built on top of scikit-learn.
scikit-survival scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizi
Library of Stan Models for Survival Analysis
survivalstan: Survival Models in Stan author: Jacki Novik Overview Library of Stan Models for Survival Analysis Features: Variety of standard survival
PySurvival is an open source python package for Survival Analysis modeling
PySurvival What is Pysurvival ? PySurvival is an open source python package for Survival Analysis modeling - the modeling concept used to analyze or p
Deep Survival Machines - Fully Parametric Survival Regression
Package: dsm Python package dsm provides an API to train the Deep Survival Machines and associated models for problems in survival analysis. The under
Banpei is a Python package of the anomaly detection.
Banpei Banpei is a Python package of the anomaly detection. Anomaly detection is a technique used to identify unusual patterns that do not conform to
A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.
Telemanom (v2.0) v2.0 updates: Vectorized operations via numpy Object-oriented restructure, improved organization Merge branches into single branch fo
A Python package for modular causal inference analysis and model evaluations
Causal Inference 360 A Python package for inferring causal effects from observational data. Description Causal inference analysis enables estimating t
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
DoWhy | An end-to-end library for causal inference Amit Sharma, Emre Kiciman Introducing DoWhy and the 4 steps of causal inference | Microsoft Researc
Responsible Machine Learning with Python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.