1190 Repositories
Python analysis-pipeline Libraries
Lazarus analysis tools and research report
Lazarus Research This repository publishes analysis reports and analysis tools for Operation Dream Job and Operation JTrack for Lazarus. Tools Python
Project under the certification "Data Analysis with Python" on FreeCodeCamp
Sea Level Predictor Assignment You will anaylize a dataset of the global average sea level change since 1880. You will use the data to predict the sea
Tools for the analysis, simulation, and presentation of Lorentz TEM data.
ltempy ltempy is a set of tools for Lorentz TEM data analysis, simulation, and presentation. Features Single Image Transport of Intensity Equation (SI
A paper list for aspect based sentiment analysis.
Aspect-Based-Sentiment-Analysis A paper list for aspect based sentiment analysis. Survey [IEEE-TAC-20]: Issues and Challenges of Aspect-based Sentimen
This is a Docker-based pipeline for preparing sextractor-ready multiwavelength images
Pipeline for creating NB422-detected (ODI) catalog The repository contains a Docker-based pipeline for preprocessing observational data. The pipeline
Pipeline to convert a haploid assembly into diploid
HapDup (haplotype duplicator) is a pipeline to convert a haploid long read assembly into a dual diploid assembly. The reconstructed haplotypes
WAL enables programmable waveform analysis.
This repro introcudes the Waveform Analysis Language (WAL). The initial paper on WAL will appear at ASPDAC'22 and can be downloaded here: https://www.
Scikit-Learn useful pre-defined Pipelines Hub
Scikit-Pipes Scikit-Learn useful pre-defined Pipelines Hub Usage: Install scikit-pipes It's advised to install sklearn-genetic using a virtual env, in
π Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.
papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you: parameterize notebooks execute notebooks This
Automatically visualize your pandas dataframe via a single print! π π‘
A Python API for Intelligent Visual Discovery Lux is a Python library that facilitate fast and easy data exploration by automating the visualization a
Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Intel(R) Extension for Scikit-learn* Installation | Documentation | Examples | Support | FAQ With Intel(R) Extension for Scikit-learn you can accelera
A DSL for data-driven computational pipelines
"Dataflow variables are spectacularly expressive in concurrent programming" Henri E. Bal , Jennifer G. Steiner , Andrew S. Tanenbaum Quick overview Ne
This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis
This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis, accepted at ACMMM 2021.
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis This is a PyTorch implementation of the model described in our pape
A unified 3D Transformer Pipeline for visual synthesis
Overview This is the official repo for the paper: "NΓWA: Visual Synthesis Pre-training for Neural visUal World creAtion". NΓWA is a unified multimodal
Portfolio Optimization and Quantitative Strategic Asset Allocation in Python
Riskfolio-Lib Quantitative Strategic Asset Allocation, Easy for Everyone. Description Riskfolio-Lib is a library for making quantitative strategic ass
π§ͺ Cutting-edge experimental spaCy components and features
spacy-experimental: Cutting-edge experimental spaCy components and features This package includes experimental components and features for spaCy v3.x,
Web scraped S&P 500 Data from Wikipedia using Pandas and performed Exploratory Data Analysis on the data.
Web scraped S&P 500 Data from Wikipedia using Pandas and performed Exploratory Data Analysis on the data. Then used Yahoo Finance to get the related stock data and displayed them in the form of charts.
Graph parsing approach to structured sentiment analysis.
Fine-grained Sentiment Analysis as Dependency Graph Parsing This repository contains the code and datasets described in following paper: Fine-grained
Real-time ground filtering algorithm of cloud points acquired using Terrestrial Laser Scanner (TLS)
This repository contains tools to simulate the ground filtering process of a registered point cloud. The repository contains two filtering methods. The first method uses a normal vector, and fit to plane. The second method utilizes voxel adjacency, and fit to plane.
Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System
Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System The possibilities to involve
Epidemiology analysis package
zEpid zEpid is an epidemiology analysis package, providing easy to use tools for epidemiologists coding in Python 3.5+. The purpose of this library is
Tutorial in Python targeted at Epidemiologists. Will discuss the basics of analysis in Python 3
Python-for-Epidemiologists This repository is an introduction to epidemiology analyses in Python. Additionally, the tutorials for my library zEpid are
Explorative Data Analysis Guidelines
Explorative Data Analysis Get data into a usable format! Find out if the following predictive modeling phase will be successful! Combine everything in
Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis.
Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis. It is distributed under the MIT License.
MCML is a toolkit for semi-supervised dimensionality reduction and quantitative analysis of Multi-Class, Multi-Label data
MCML is a toolkit for semi-supervised dimensionality reduction and quantitative analysis of Multi-Class, Multi-Label data. We demonstrate its use
An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model
pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS
Crypto Stats and Tweets Data Pipeline using Airflow
Crypto Stats and Tweets Data Pipeline using Airflow Introduction Project Overview This project was brought upon through Udacity's nanodegree program.
Python factor analysis library (PCA, CA, MCA, MFA, FAMD)
Prince is a library for doing factor analysis. This includes a variety of methods including principal component analysis (PCA) and correspondence anal
A high-performance topological machine learning toolbox in Python
giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G
Single-Cell Analysis in Python. Scales to 1M cells.
Scanpy β Single-Cell Analysis in Python Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It inc
Falcon: Interactive Visual Analysis for Big Data
Falcon: Interactive Visual Analysis for Big Data Crossfilter millions of records without latencies. This project is work in progress and not documente
Streamlit β The fastest way to build data apps in Python
Welcome to Streamlit π The fastest way to build and share data apps. Streamlit lets you turn data scripts into sharable web apps in minutes, not week
Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.
Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.
Introduction to Geospatial Analysis in Python
Introduction to Geospatial Analysis in Python This repository is in support of a talk on geospatial data. Data To recreate all of the examples, the da
Infomap is a network clustering algorithm based on the Map equation.
Infomap Infomap is a network clustering algorithm based on the Map equation. For detailed documentation, see mapequation.org/infomap. For a list of re
Code to reproduce the results in the paper "Tensor Component Analysis for Interpreting the Latent Space of GANs".
Tensor Component Analysis for Interpreting the Latent Space of GANs [ paper | project page ] Code to reproduce the results in the paper "Tensor Compon
Python beta calculator that retrieves stock and market data and provides linear regressions.
Stock and Index Beta Calculator Python script that calculates the beta (Ξ²) of a stock against the chosen index. The script retrieves the data and resa
A Powerful Serverless Analysis Toolkit That Takes Trial And Error Out of Machine Learning Projects
KXY: A Seemless API to 10x The Productivity of Machine Learning Engineers Documentation https://www.kxy.ai/reference/ Installation From PyPi: pip inst
Detectron2 for Document Layout Analysis
Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det
Python Blood Vessel Topology Analysis
Python Blood Vessel Topology Analysis This repository is not being updated anymore. The new version of PyVesTo is called PyVaNe and is available at ht
This repo contains implementation of different architectures for emotion recognition in conversations.
Emotion Recognition in Conversations Updates π₯ π₯ π₯ Date Announcements 03/08/2021 π π We have released a new dataset M2H2: A Multimodal Multiparty
BookNLP, a natural language processing pipeline for books
BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin
Image classification for projects and researches
This is a tool to help you quickly solve classification problems including: data analysis, training, report results and model explanation.
Single cell current best practices tutorial case study for the paper:Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"
Scripts for "Current best-practices in single-cell RNA-seq: a tutorial" This repository is complementary to the publication: M.D. Luecken, F.J. Theis,
This project uses Youtube data API's to do youtube tags analysis based on viewCount, comments etc.
Youtube video details analyser Steps to run this project Please set the AuthKey which you can fetch from google developer console and paste it in the
zeus is a Python implementation of the Ensemble Slice Sampling method.
zeus is a Python implementation of the Ensemble Slice Sampling method. Fast & Robust Bayesian Inference, Efficient Markov Chain Monte Carlo (MCMC), Bl
Source code and notebooks to reproduce experiments and benchmarks on Bias Faces in the Wild (BFW).
Face Recognition: Too Bias, or Not Too Bias? Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition:
Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.
Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da
Optimal Randomized Canonical Correlation Analysis
ORCCA Optimal Randomized Canonical Correlation Analysis This project is for the python version of ORCCA algorithm. It depends on Numpy for matrix calc
Empyrial is a Python-based open-source quantitative investment library dedicated to financial institutions and retail investors
By Investors, For Investors. Want to read this in Chinese? Click here Empyrial is a Python-based open-source quantitative investment library dedicated
A machine learning model for analyzing text for user sentiment and determine whether its a positive, neutral, or negative review.
Sentiment Analysis on Yelp's Dataset Author: Roberto Sanchez, Talent Path: D1 Group Docker Deployment: Deployment of this application can be found her
Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.
Trading Tesla with Machine Learning and Sentiment Analysis An interactive program to train a Random Forest Classifier to predict Tesla daily prices us
Full automated data pipeline using docker images
Create postgres tables from CSV files This first section is only relate to creating tables from CSV files using postgres container alone. Just one of
Building house price data pipelines with Apache Beam and Spark on GCP
This project contains the process from building a web crawler to extract the raw data of house price to create ETL pipelines using Google Could Platform services.
Real-time analysis of intracranial neurophysiology recordings.
py_neuromodulation Click this button to run the "Tutorial ML with py_neuro" notebooks: The py_neuromodulation toolbox allows for real time capable pro
Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.
Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.
This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.
Twitter COVID-19 Sentiment Analysis Members: Christopher Bach | Khalid Hamid Fallous | Jay Hirpara | Jing Tang | Graham Thomas | David Wetherhold Pro
A simple python application for running a CI pipeline locally
A simple python application for running a CI pipeline locally This app currently supports GitLab CI scripts
Conversational text Analysis using various NLP techniques
Conversational text Analysis using various NLP techniques
Analyse japanese ebooks using MeCab to determine the difficulty level for japanese learners
japanese-ebook-analysis This aim of this project is to make analysing the contents of a japanese ebook easy and streamline the process for non-technic
An easy-to-use library for emulating code in minidump files.
dumpulator Note: This is a work-in-progress prototype, please treat it as such. An easy-to-use library for emulating code in minidump files. Example T
A multi-platform GUI for bit-based analysis, processing, and visualization
A multi-platform GUI for bit-based analysis, processing, and visualization
Render tokei's output to interactive sunburst chart.
Render tokei's output to interactive sunburst chart.
PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection".
Long Short-Term Transformer for Online Action Detection Introduction This is a PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short
Two phase pipeline + StreamlitTwo phase pipeline + Streamlit
Two phase pipeline + Streamlit This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. In betw
Training, generation, and analysis code for Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics
Location-Aware Generative Adversarial Networks (LAGAN) for Physics Synthesis This repository contains all the code used in L. de Oliveira (@lukedeo),
A data preprocessing and feature engineering script for a machine learning pipeline is prepared.
FEATURE ENGINEERING Business Problem: A data preprocessing and feature engineering script for a machine learning pipeline needs to be prepared. It is
Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared
Feature-Engineering Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared. When the dataset
Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer
Sentiment Analysis Project This project contains two sentiment analysis programs for Hotel Reviews using a Hotel Reviews dataset from Datafiniti. The
A simple python application for running a CI pipeline locally This app currently supports GitLab CI scripts
π Simple Local CI Runner π A simple python application for running a CI pipeline locally This app currently supports GitLab CI scripts βοΈ Setup Inst
A high performance implementation of HDBSCAN clustering.
HDBSCAN HDBSCAN - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates
The Fundamental Clustering Problems Suite (FCPS) summaries 54 state-of-the-art clustering algorithms, common cluster challenges and estimations of the number of clusters as well as the testing for cluster tendency.
FCPS Fundamental Clustering Problems Suite The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning pu
t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology.
tree-SNE t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology. Building on recent advances in s
Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
Description Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis. Ti
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se
Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.
GENDIS GENetic DIscovery of Shapelets In the time series classification domain, shapelets are small subseries that are discriminative for a certain cl
An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing
An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing
A deep-learning pipeline for segmentation of ambiguous microscopic images.
Welcome to Official repository of deepflash2 - a deep-learning pipeline for segmentation of ambiguous microscopic images. Quick Start in 30 seconds se
Full-featured Decision Trees and Random Forests learner.
CID3 This is a full-featured Decision Trees and Random Forests learner. It can save trees or forests to disk for later use. It is possible to query tr
Repository containing detailed experiments related to the paper "Memotion Analysis through the Lens of Joint Embedding".
Memotion Analysis Through The Lens Of Joint Embedding This repository contains the experiments conducted as described in the paper 'Memotion Analysis
This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer Capacitor domain using text similarity indexes: An experimental analysis "
kwd-extraction-study This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer
Stochastic gradient descent with model building
Stochastic Model Building (SMB) This repository includes a new fast and robust stochastic optimization algorithm for training deep learning models. Th
Stochastic Extragradient: General Analysis and Improved Rates
Stochastic Extragradient: General Analysis and Improved Rates This repository is the official implementation of the paper "Stochastic Extragradient: G
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets Datasets Used: Iris dataset,
MASS (Mueen's Algorithm for Similarity Search) - a python 2 and 3 compatible library used for searching time series sub-sequences under z-normalized Euclidean distance for similarity.
Introduction MASS allows you to search a time series for a subquery resulting in an array of distances. These array of distances enable you to identif
Portfolio analytics for quants, written in Python
QuantStats: Portfolio analytics for quants QuantStats Python library that performs portfolio profiling, allowing quants and portfolio managers to unde
scikit-survival is a Python module for survival analysis built on top of scikit-learn.
scikit-survival scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizi
Library of Stan Models for Survival Analysis
survivalstan: Survival Models in Stan author: Jacki Novik Overview Library of Stan Models for Survival Analysis Features: Variety of standard survival
PySurvival is an open source python package for Survival Analysis modeling
PySurvival What is Pysurvival ? PySurvival is an open source python package for Survival Analysis modeling - the modeling concept used to analyze or p
Deep Survival Machines - Fully Parametric Survival Regression
Package: dsm Python package dsm provides an API to train the Deep Survival Machines and associated models for problems in survival analysis. The under
Banpei is a Python package of the anomaly detection.
Banpei Banpei is a Python package of the anomaly detection. Anomaly detection is a technique used to identify unusual patterns that do not conform to
A Python package for modular causal inference analysis and model evaluations
Causal Inference 360 A Python package for inferring causal effects from observational data. Description Causal inference analysis enables estimating t
moDel Agnostic Language for Exploration and eXplanation
moDel Agnostic Language for Exploration and eXplanation Overview Unverified black box model is the path to the failure. Opaqueness leads to distrust.
Code for our paper "Multi-scale Guided Attention for Medical Image Segmentation"
Medical Image Segmentation with Guided Attention This repository contains the code of our paper: "'Multi-scale self-guided attention for medical image
A combination of autoregressors and autoencoders using XLNet for sentiment analysis
A combination of autoregressors and autoencoders using XLNet for sentiment analysis Abstract In this paper sentiment analysis has been performed in or
Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.
Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The project retrains itself after every prediction, making it more robust and generalized over time.
Sentiment analysis on streaming twitter data using Spark Structured Streaming & Python
Sentiment analysis on streaming twitter data using Spark Structured Streaming & Python This project is a good starting point for those who have little
Important dataframe statistics with a single command
quick_eda Receiving dataframe statistics with one command Project description A python package for Data Scientists, Students, ML Engineers and anyone