389 Repositories
The Python Extract, Load, Transform Studio is an application for performing ELT (and ETL) tasks. Under the hood the application consists of a two parts.
A 2-dimensional physics engine written in Cairo
About Unsub is a collection analysis tool that assists libraries in analyzing their journal subscriptions. The tool provides rich data and a summary g
PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra The purpose of this project is to demonstrate a structured streaming pipeline with Apache
How Long Does It Take Ordinary People To "Get Good" At Chess? TL;DR: According to 5.5 years of data from 2.3 million players and 450 million games, mo
This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons. Check out the binder link above for a sample code run!
Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine Intro This repo contains the python/stan version of the Statistical Rethinking
Elicited Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations. Credit to Brett Hoove
Allen Institute Visual Behavior Analysis package This repository contains code for analyzing behavioral data from the Allen Brain Observatory: Visual
Working Time Statistics of working hours and working conditions by industry and company
AptaMAT Purpose AptaMat is a simple script which aims to measure differences between DNA or RNA secondary structures. The method is based on the compa
Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials
EMGDecomp Package for decomposing EMG signals into motor unit firings, created for Formento et al 2021. Based heavily on Negro et al, 2016. Supports G
h1stats - h1 Program Stats Scraper This python3 script will call out to HackerOne's graphql API and scrape all currently active programs for informati
Revolut Stocks calculator for Bulgarian National Revenue Agency Information Processing and calculating the required information about stock possession
DefAP is a program developed to facilitate the exploration of a material's defect chemistry. A large number of features are provided and rapid exploration is supported through the use of autoplotting with carefully considered automatic data labelling and simplification options enabling production of publication quality plots.
Contents LDP LinUCB Description Model Architecture Dataset Environment Requirements Script Description Script and Sample Code Script Parameters Launch
API to fetch elements of the periodic table in JSON format. Uses Pandas for dumping .csv data to .json and Flask for API Integration. Deployed on "pyt
Tweetmetric Tweetmetric allows you to track various metrics on your most recent tweets, such as impressions, retweets and clicks on your profile. The
GBiStat package A python package to assist programmers with data analysis. This package could be used to plot : Binomial Distribution of the dataset p
Minimal Pandas API for Polars Install From PyPI: pip install minimal-pandas-api-for-polars Example Usage (see tests/test_minimal_pandas_api_for_polars
signac-flow - manage workflows with signac The signac framework helps users manage and scale file-based workflows, facilitating data reuse, sharing, a
PCAfold is an open-source Python library for generating, analyzing and improving low-dimensional manifolds obtained via Principal Component Analysis (PCA).
napari-time-slicer A meta plugin for processing timelapse data timepoint by timepoint. It enables a list of napari plugins to process 2D+t or 3D+t dat
Data Analyst with PYTHON Data Analyst berperan dalam menghasilkan analisa data serta mempresentasikan insight untuk membantu proses pengambilan keputu
Candlestick-Pattern-Recognition-with-Python-and-TA-Lib Goal Look at the S&P500 to try and get a better understanding of these candlestick patterns and
WhiteBox Utilities Toolkit: Tools to make your life easier Fancy data functions that will make your life as a data scientist easier. Installing To ins
NSE-timeseries-form-CSV-file-creator-and-SQL-appender- This creates a ohlc timeseries from downloaded CSV files from National Stock Exchange India (NS
PyChemia, Python Framework for Materials Discovery and Design PyChemia is an open-source Python Library for materials structural search. The purpose o
Retailer's Sale Data Simulation Retail-Sim is python package to easily create synthetic dataset of retaile store. Simulation Model Simulator consists
Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database, using a set of "harvesters", whose job it
ped-crash-techvol: Texas Ped Crash Tech Volume Pack In conjunction with the Final Report "Identifying Risk Factors that Lead to Increase in Fatal Pede
X-news - Pipeline data use scrapy, kafka, spark streaming, spark ML and elasticsearch, Kibana
qg-edgeofchaos Analysis scripts for QG equations FIle/Folder Structure eigensolvers.py - Spectral and finite-difference solvers for Rossby wave eigenf
PyEmits, a python package for easy manipulation in time-series data. Time-series data is very common in real life. Engineering FSI industry (Financial
Flenser Have you ever been handed a dataset you've never seen before? Flenser is a simple, minimal, automated exploratory data analysis tool. It runs
CubingB is a timer/analyzer for speedsolving Rubik's cubes (and related puzzles). It focuses on supporting "smart cubes" (i.e. bluetooth cubes) for recording the exact moves of a solve in real time.
similarpanda A module to check for differences between pandas Dataframes, and generate a report in Excel format. This is helpful in a workplace settin
ESPEI ESPEI, or Extensible Self-optimizing Phase Equilibria Infrastructure, is a tool for thermodynamic database development within the CALPHAD method
Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is
Quickstart Hyenas is a curde Hy handle written on top of Pandas API to allow for more elegant access to data-scientist's powerhouse that is Pandas. In
Statistical Analysis 📈 This repository focuses on statistical analysis and the exploration used on various data sets for personal and professional pr
Hippolyzer Hippolyzer is a revival of Linden Lab's PyOGP library targeting modern Python 3, with a focus on debugging issues in Second Life-compatible
timeranges Time ranges. Read the Docs Installation pip timeranges is available on pip: pip install timeranges GitHub You can also install the latest v
Web Trader Web Trader is a trading website that consolidates data from Nasdaq, allowing the user to search up the ticker symbol and price of any stock
Backtesting the "Cramer Effect" & Recommendations from Cramer Recommendations from Cramer: On the show Mad-Money (CNBC) Jim Cramer picks stocks which
Stitch together Nanopore tiled amplicon data using a reference guided approach Tiled amplicon data, like those produced from primers designed with pri
Data Analysis for First Year Laboratory at Imperial College, London. For personal reference only, and to reference in lab reports and lab books.
Pandas Tab Implementation of Stata's tabulate command in Pandas for extremely easy to type one-way and two-way tabulations. Support: Python 3.7 and 3.
A Numba-based two-point correlation function (2PCF) calculator using a grid decomposition. Like Corrfunc, but written in Numba, with simplicity and hackability in mind.
A real data analysis and modeling project - restaurant inspections Jafar Pourbemany 9/27/2021 This project represents data analysis and modeling of re
Hatchet Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data. It is intended for analyzing
ShallowSeismicImaging Tools to imaging the shallow seismic structure, above 10 km, based on the ZH ratio measured from the ambient seismic noise, and