3035 Repositories
Python AWS-Serverless-Data-Engineering-Pipeline Libraries
A terminal spreadsheet multitool for discovering and arranging data
VisiData v2.6.1 A terminal interface for exploring and arranging tabular data. VisiData supports tsv, csv, sqlite, json, xlsx (Excel), hdf5, and many
A large-scale database for graph representation learning
In this project, an ETL pipeline is built on a data warehouse hosted on AWS Redshift.
ETL Pipeline for AWS Project Description In this project, an ETL pipeline is built on a data warehouse hosted on AWS Redshift. The data is loaded from S3 t
MLOps will help you to understand how to build a Continuous Integration and Continuous Delivery pipeline for an ML/AI project.
Code which demonstrates how to set up and ope
AWS Lambda Fast API starter application
AWS Lambda Fast API Fast API starter application compatible with API Gateway and Lambda Function. How to deploy it? Terraform AWS Lambda API is a reus
Data on Free Food at MIT
MIT Free Food Timing Procrastinating research by plotting data on how long it takes emails on the free-food at mit edu mailing list to go through. Dat
Minimal working example of data acquisition with nidaqmx python API
Data Acquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It
A toolkit for geo ML data processing and model evaluation (fork of solaris)
An open source ML toolkit for overhead imagery. This is a beta version of lunular which may continue to develop. Please report any bugs through issues
Data cleaning tools for Business analysis
Datacleaning datacleaning tools for Business analysis This program is made for Vicky's work. You can use it, too. Data cleaning: this data cleaning tool is intended for business analysis. This program was made for Vicky's work and
Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021
Twitch Revenues Using this script, based on the 125 GB of data leaked from Twitch, the 2019-2021 monthly revenues of the streamers you choose
fMRIprep Pipeline To Machine Learning
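fMRIprep Pipeline To Machine Learning (Demo) All configuration is defined in the config.py file. Prerequisites (lilab): Docker is installed on every node, with the fmripre image available. The conda base environment can be used (the corresponding third-party packages will be updated later). 1. fmriprep scr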
This repo represents all we learned and are learning in Data Structure course.
DataStructure Journey This repo represents all we learned and are learning in Data Structure course which is based on CLRS book and is being taught by
Enhancing Knowledge Tracing via Adversarial Training
Enhancing Knowledge Tracing via Adversarial Training This repository contains source code for the paper "Enhancing Knowledge Tracing via Adversarial T
Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions
Aquarius Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions NOTE: We are currently going through the open-source process requir
Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation
deeptime Releases: Installation via conda recommended. conda install -c conda-forge deeptime pip install deeptime Documentation: deeptime-ml.github.io
How to Leverage Multimodal EHR Data for Better Medical Predictions?
How to Leverage Multimodal EHR Data for Better Medical Predictions? This repository contains the code of the paper: How to Leverage Multimodal EHR Dat
A command line tool for visualizing CSV/spreadsheet-like data
PerfPlotter Read data from CSV files using pandas and generate interactive plots using bokeh, which can then be embedded into HTML pages and served by
Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly
Table of contents Introduction Dataset Model & Metrics How to Run Quickstart Install Training Evaluation Detection DATA COMPETITION The COVID-19 pande
Scraping web pages to get data
Scraping Data Get public data and save it in a database This project uses Python How to run the project 1 - Clone the repository 2 - Install beautifulsoup4
Python + AWS Lambda Hands On
Python + AWS Lambda Hands On Python Created in 1990 by Guido Van Rossum. A "silver bullet" (almost). Widely used in: Automation - Selenium, Beau
[IROS2021] NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences
NYU-VPR This repository provides the experiment code for the paper Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymiza
PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline from a configuration file.
PdpCLI Quick Links Introduction Installation Tutorial Basic Usage Data Reader / Writer Plugins Introduction PdpCLI is a pandas DataFrame processing CL
Python package with library and CLI tool for analyzing SeaFlow data
Seaflowpy A Python package for SeaFlow flow cytometer data. Table of Contents Install Read EVT/OPP/VCT Files Command-line Interface Configuration Inte
Python and data science snippets on the command line
Python Snippet Tool A tool to get Python and data science snippets at Data Science Simplified on the command line. You can read my article to learn ho
AthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.
Introduction AthenaCLI is a command line interface (CLI) for the Athena service that can do auto-completion and syntax highlighting, and is a proud me
Splitgraph command line client and python library
Splitgraph Overview Splitgraph is a tool for building, versioning and querying reproducible datasets. It's inspired by Docker and Git, so it feels fam
Cloudkeeper is “housekeeping for clouds” - find leaky resources, manage quota limits, detect drift and clean up.
Cloudkeeper Housekeeping for Clouds! Table of contents Overview Docker based quick start Cloning this repository Component list Contact License Overvi
TAug :: Time Series Data Augmentation using Deep Generative Models
TAug :: Time Series Data Augmentation using Deep Generative Models Note!!! The package is under development, so be careful about using it in production! Fea
Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently generate a wide range of illumination variants of a single image.
Deep Illuminator Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently generate a wide
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au
A general, feasible, and extensible framework for classification tasks.
Pytorch Classification A general, feasible and extensible framework for 2D image classification. Features Easy to configure (model, hyperparameters) T
This provides the R code and data to replicate results in "The USS Trustee’s risky strategy"
USSBriefs2021 This provides the R code and data to replicate results in "The USS Trustee’s risky strategy" by Neil M Davies, Jackie Grant and Chin Yan
[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
MosaicKD Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data" 1. Motivation Natural images share common l
Codes and Data Processing Files for our paper.
Code Scripts and Processing Files for EEG Sleep Staging Paper 1. Folder Tree ./src_preprocess (data preprocessing files for SHHS and Sleep EDF) sleepE
Simple data balancing baselines for worst-group-accuracy benchmarks.
BalancingGroups Code to replicate the experimental results from Simple data balancing baselines achieve competitive worst-group-accuracy. Replicating
Official project repository for 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination'
NCAE_UAD Official project repository of 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination' Abstract In this p
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst
Driver Buddy Reloaded is an IDA Pro Python plugin that helps automate some tedious Windows Kernel Drivers reverse engineering tasks.
Driver Buddy Reloaded Quickstart Table of Contents Installation Usage About Driver Buddy Reloaded Finding DispatchDeviceControl Labelling WDM & WDF St
Data visualization of the electromagnetic spectrum
Datenvisualisierung-Elektromagnetischen-Spektrum Using the matplotlib module, the data of the electromagnetic spectrum is to be visualized. D
A framework for feature exploration in Data Science
Beehive A framework for feature exploration in Data Science Background What do we do when we finish one episode of feature exploration in a jupyter no
Experimental proxy for dumping the unencrypted packet data from Brawl Stars (WIP)
Brawl Stars Proxy Experimental proxy for version 39.99 of Brawl Stars. It allows you to capture the packets being sent between the Brawl Stars client
DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data.
DWIPrep: A Robust Preprocessing Pipeline for dMRI Data DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transp
A tutorial for people to run synthetic data replicas from source healthcare datasets
Synthetic-Data-Replica-for-Healthcare Description What is this? A tailored hands-on tutorial showing how to use Python to create synthetic data replic
Rover is a command line interface application that lets users browse mission data, images, and metadata from the NASA Official Website
🤖 rover Rover is a command line interface application that lets users browse mission data, images, and metadata from the NASA Official Websit
Data, model training, and evaluation code for "PubTables-1M: Towards a universal dataset and metrics for training and evaluating table extraction models".
PubTables-1M This repository contains training and evaluation code for the paper "PubTables-1M: Towards a universal dataset and metrics for training a
Spatial Interpolation Toolbox is a Python-based GUI that is able to interpolate spatial data in vector format.
Spatial Interpolation Toolbox This is the home to Spatial Interpolation Toolbox, a graphical user interface (GUI) for interpolating geographic vector
Improving your data science workflows with
Make Better Defaults Author: Kjell Wooding This is the git repo for Makefiles: One great trick for making your conda environments mo
A framework for attentive explainable deep learning on tabular data
🧠 kendrite A framework for attentive explainable deep learning on tabular data 💨 Quick start kedro run 🧱 Built upon Technology Description Links ke
AWS Enumeration and Footprinting Tool
Quiet Riot 🎶 C'mon, Feel The Noise 🎶 An enumeration tool for scalable, unauthenticated validation of AWS principals; including AWS Account IDs, roo
Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data.
aws ec2.py companion script to generate sshconfigs with auto bastion host discovery
ec2-bastion-sshconfig This script will iterate over instances found by ec2.py and if those instances are not publicly accessible it will search the
Pytorch implementation of Cut-Thumbnail in the paper Cut-Thumbnail:A Novel Data Augmentation for Convolutional Neural Network.
Cut-Thumbnail (Accepted at ACM MULTIMEDIA 2021) Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng, Tao Zhou, Ming Liu This is the officia
An example of repository data as bundles
Bundles This repository is just an example of how we can host Git bundles in a way that supports fetching data from precomputed bundles without the or
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
ColossalAI An integrated large-scale model training system with efficient parallelization techniques Installation PyPI pip install colossalai Install
Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.
python-is-cool A gentle guide to the Python features that I didn't know existed or was too afraid to use. This will be updated as I learn more and bec
This Lambda will pull propagated routes from TGW and update the VPC route table
AWS-Transitgateway-Route-Propagation This Lambda will pull propagated routes from TGW and update the VPC route table. Tested on python 3.8 Lambda AWS INST
My project contrasts K-Nearest Neighbors and Random Forest regressors on real-world data
kNN-vs-RFR My project contrasts K-Nearest Neighbors and Random Forest regressors on real-world data In many areas, rental bikes have been launched to
Tool for working with Y-chromosome data from YFull and FTDNA
ycomp ycomp is a tool for working with Y-chromosome data from YFull and FTDNA. Run ycomp -h for information on how to use the program. Installation Th
K-means clustering is a method used for clustering analysis, especially in data mining and statistics.
K Means Algorithm What is K Means This algorithm is an iterative algorithm that partitions the dataset according to their features into K number of pr
Homework 2: Matplotlib and Data Visualization
Homework 2: Matplotlib and Data Visualization Overview These data visualizations were created for my introductory computer science course using Python
PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate
Anomaly Detection Based on Hierarchical Clustering of Mobile Robot Data
We propose a new approach to detecting anomalies in mobile robot data. We examine each data set separately with two clustering methods, hierarchical and k-means. We use two sub-methods to produce anomaly scores, then merge the two scores into a single merged anomaly score.
Can we visualize a large scientific data set with a surrogate model? We're building a GAN for the Earth's Mantle Convection data set to see if we can!
EarthGAN - Earth Mantle Surrogate Modeling Can a surrogate model of the Earth’s Mantle Convection data set be built such that it can be readily run in
This repo is all about different data structures and algorithms.
Data Structure and Algorithm : Want to learn data structures and algorithms? Then stop thinking and start learning today. This repo will help y
Demo to explain how to use AWS Chalice to connect to twitter API and change profile picture at scheduled times.
chalice-twitter-demo Demo to explain how to use AWS Chalice to connect to twitter API and change profile picture at scheduled times. Video Demo Click
Data visualization using matplotlib
Data visualization using matplotlib project instructions Top 5 Most Common Coffee Origins In this visualization I used data from Ankur Chavda on Kaggl
Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples
Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples This repository is the official implementation of paper [Qimera: Data-free Q
Leyna's Visualizing Data With Python
Leyna's Visualizing Data Below is information on the number of bilingual students in three school districts in Massachusetts. You will also find infor
HW_02 Data visualisation task
HW_02 Data visualisation and Matplotlib practice Instructions for HW_02 Idea for data analysis As I was brainstorming ideas and running through databa
Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.
light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F
Building and deploying AWS Lambda Shared Layers
AWS Lambda Shared Layers This repository is hosting the code from the following blog post: AWS Lambda & Shared layers for Python. The goal of this rep
Dumps to CSV all the resources in an organization's member accounts
AWS Org Inventory Dumps to CSV all the resources in an organization's member accounts. Set your environment's AWS_PROFILE and AWS_DEFAULT_REGION varia
AWS Workmail Migration Tool
WMigrate A tool for migrating AWS Workmail Users and Groups cross region and cross accounts. It also creates user and group aliases and adds the users
Rick and Morty Data Visualization with python
Rick and Morty Data Visualization For this project I looked at data for the TV show Rick and Morty Number of Episodes at a Certain Location Here is th
Keir's Visualizing Data on Life Expectancy
Keir's Visualizing Data on Life Expectancy Below is information on life expectancy in the United States from 1900-2017. You will also find information
This is a small repository for me to implement my simple Data Visualisation skills through Python.
Data Visualisations This is a small repository for me to implement my simple Data Visualisation skills through Python. Steam Population Chart from 10/
TrackTech: Real-time tracking of subjects and objects on multiple cameras
TrackTech: Real-time tracking of subjects and objects on multiple cameras This project is part of the 2021 spring bachelor final project of the Bachel
Generate a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in just under 30 minutes!
Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti
These data visualizations were created as homework for my CS40 class. I hope you enjoy!
Data Visualizations These data visualizations were created as homework for my CS40 class. I hope you enjoy! Nobel Laureates by their Country of Birth
Code for your Python IDE or notebook to generate a basic Kepler.gl data visualization
geospatial-data-analysis [readme] Use this code in your Python IDE or notebook to generate a basic Kepler.gl data visualization, without pre-configura
These data visualizations were created for my introductory computer science course using Python
Homework 2: Matplotlib and Data Visualization Overview These data visualizations were created for my introductory computer science course using Python
A script that trains a model to recognize handwritten digits using the MNIST data set.
handwritten-digits-recognition A script that trains a model to recognize handwritten digits using the MNIST data set. Then it loads external files and
This repository contains code and data for "On the Multimodal Person Verification Using Audio-Visual-Thermal Data"
trimodal_person_verification This repository contains the code, and preprocessed dataset featured in "A Study of Multimodal Person Verification Using
code for generating data set ES-ImageNet with corresponding training code
es-imagenet-master code for generating data set ES-ImageNet with corresponding training code dataset generator some codes of ODG algorithm The variabl
A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!
CoVA: Context-aware Visual Attention for Webpage Information Extraction Abstract Webpage information extraction (WIE) is an important step to create k
Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".
CoProtector Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".
DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks
DECAF (DEbiasing CAusal Fairness) Code Author: Trent Kyono This repository contains the code used for the "DECAF: Generating Fair Synthetic Data Using
A novel pipeline framework for the multi-hop complex KGQA task. Paper title: Improving Multi-hop Embedded Knowledge Graph Question Answering by Introducing Relational Chain Reasoning
Rce-KGQA A novel pipeline framework for the multi-hop complex KGQA task. This framework mainly contains two modules, answering_filtering_module and relati
The Research PACS on AWS solution facilitates researchers' access to medical images stored in the clinical PACS in a secure and seamless manner
Research PACS on AWS Challenge to solve Solution presentation Deploy the solution Further reading Releases License Challenge to solve The rise of new
Code accompanying the paper "How Tight Can PAC-Bayes be in the Small Data Regime?"
How Tight Can PAC-Bayes be in the Small Data Regime? This is the code to reproduce all experiments for the following paper: @inproceedings{Foong:2021:
One-Stop Destination for codes of all Data Structures & Algorithms
CodingSimplified_GK This repository is aimed at creating a One stop Destination of codes of all Data structures and Algorithms along with basic explai
Spectacular AI SDK fuses data from cameras and IMU sensors and outputs an accurate 6-degree-of-freedom pose of a device.
Spectacular AI SDK examples Spectacular AI SDK fuses data from cameras and IMU sensors (accelerometer and gyroscope) and outputs an accurate 6-degree-
Django e-commerce website with Advanced Features and SEO Friendly
MyTech® - Your Technology Django e-commerce website with Advanced Features and SEO Friendly Images and Prices are only used for Demo purpose and does
Visualize the electric field of a point charge network.
ElectriPy ⚡ Visualize the electric field of a point charge network. 🔌 Installation Install ElectriPy package: $ pip install electripy You are all d
PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data.
Anti-Backdoor Learning PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data. Check the unlearning effect