3255 Repositories
Python probabilistic-data-analysis Libraries
Astrostatistics class for the MSc degree in Astrophysics at the University of Milan-Bicocca (Italy)
Astrostatistics Davide Gerosa, University of Milano-Bicocca, 2022. Schedule Introduction Probability and Statistics I Probabi
An open-source outlier detection package by Getcontact Data Team
pyfbad The pyfbad library supports anomaly detection projects. An end-to-end anomaly detection application can be written using the source codes of th
Pizza Orders Data Pipeline Usecase Solved by SQL, Sqoop, HDFS, Hive, Airflow.
PizzaOrders_DataPipeline There is a Tony who is owning a New Pizza shop. He knew that pizza alone was not going to help him get seed funding to expand
The final project of "Applying AI to 2D Medical Imaging Data" of "AI for Healthcare" nanodegree - Udacity.
Pneumonia Detection from X-Rays Project Overview In this project, you will apply the skills that you have acquired in this 2D medical imaging course t
Credit EDA Case Study Using Python
This case study aims to identify patterns that indicate whether a client has difficulty paying their installments; these patterns may be used for actions such as denying the loan, reducing the loan amount, lending (to risky applicants) at a higher interest rate, etc.
The final project for "Applying AI to Wearable Device Data" course from "AI for Healthcare" - Udacity.
Motion Compensated Pulse Rate Estimation Overview This project has 2 main parts. Develop a Pulse Rate Algorithm on the given training data. Then Test
Project for the discipline of Visual Data Analysis at EMAp FGV.
Analysis of the dissemination of fake news about COVID-19 on Twitter This project was the final work for the discipline of Visual Data Analysis of the
The final project of "Applying AI to EHR Data" of "AI for Healthcare" nanodegree - Udacity.
Patient Selection for Diabetes Drug Testing Project Overview EHR data is becoming a key source of real-world evidence (RWE) for the pharmaceutical ind
Use graph-based analysis to re-classify stocks and to improve Markowitz portfolio optimization
Dynamic Stock Industrial Classification Use graph-based analysis to re-classify stocks and experiment with different re-classification methodologies to imp
The final project of "Applying AI to 3D Medical Imaging Data" from "AI for Healthcare" nanodegree - Udacity.
Quantifying Hippocampus Volume for Alzheimer's Progression Background Alzheimer's disease (AD) is a progressive neurodegenerative disorder that result
Using Global Fishing Watch's data to build a machine learning model that can identify illegal fishing and poaching activities through satellite and geo-location data.
Self-supervised spatio-spectro-temporal representation learning for EEG analysis
EEG-Oriented Self-Supervised Learning and Cluster-Aware Adaptation This repository provides a tensorflow implementation of a submitted paper: EEG-Orie
A list of Python Bots used to extract data from several websites
A list of Python Bots used to extract data from several websites. Data extraction is for products on e-commerce (ecommerce) websites. Data fetched i
Madanalysis5 - A package for event file analysis and recasting of LHC results
Welcome to MadAnalysis 5 Outline What is MadAnalysis 5? Requirements Downloading
MOOSE (Multi-organ objective segmentation) is a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research
MOOSE (Multi-organ objective segmentation) is a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research. The pipeline is based on nn-UNet and can segment 120 unique tissue classes from a whole-body 18F-FDG PET/CT image.
A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022)
DFC2022 Baseline A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022) This repository uses TorchGeo, PyTorch Lightning, and Segmenta
Calendar heatmaps from Pandas time series data
Note: See MarvinT/calmap for the maintained version of the project. That is also the version that gets published to PyPI and it has received several f
Aircraft design optimization made fast through modern automatic differentiation
Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
Unauthenticated SQL injection that dumps the database; this one also impersonates Admin and drops an interactive shell
Med to csv - A simple way to parse MedAssociate output files into tidy data
MedAssociates to CSV file A simple way to parse MedAssociate output file in tidy
Stock-Prediction - prediction of stock market movements using sentiment analysis and deep learning.
Stock-Prediction- In this project, we aim to enhance the prediction of stock market movements using sentiment analysis and deep learning. We divide th
This project concerns a NoSQL database whose data describe autochthonous botanical species
This project concerns a NoSQL database whose data describe autochthonous botanical species. The final goal is to create a function that estimates the ornamental value, given the specific characteristics of a single species.
CellRank's reproducibility repository.
CellRank's reproducibility repository We believe that reproducibility is key and have made it as simple as possible to reproduce our results. Please e
Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection
An official implementation of the paper "Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection"
ALSPAC data analysis studying links between screen-usage and mental health issues in children. Provided data has been synthesised.
ADSMH - Mental Health and Screen Time Group coursework for Applied Data Science at the University of Bristol. Overview The data set that you have was
The repository is my code for various types of data visualization cases based on the Matplotlib library.
ScienceGallery The repository is my code for various types of data visualization cases based on the Matplotlib library. It summarizes the code and cas
STARCH computes regional extreme storm physical characteristics and moisture balance based on spatiotemporal precipitation data from reanalysis or climate model data.
STARCH (Storm Tracking And Regional CHaracterization) STARCH computes regional extreme storm physical and moisture balance characteristics based on sp
El Niño - Southern Oscillation analysis compared to minimum flow rates of rivers in northeast Brazil
ENSO (El Niño - Southern Oscillation) analysis in northeast Brazil It is established that the El Niño and La Niña phenomena influence droughts in the northeast of Bra
Data aggregated from the reports found at the MCPS COVID Dashboard into a set of visualizations.
Montgomery County Public Schools COVID-19 Visualizer Contents About this project Data Support this project About this project Data All data we use can
This is the course project of AI3602: Data Mining of SJTU
This is the course project of AI3602: Data Mining of SJTU. Group Members include Jinghao Feng, Mingyang Jiang and Wenzhong Zheng.
Automatically pick a winner who Retweeted, Commented, and Followed your Twitter account!
AutomaticTwitterGiveaways automates selecting winners for "Retweet, Comment, Follow" type Twitter giveaways.
EchoDNS - Analyze your DNS traffic super easy, shows all requested DNS traffic
K Closest Points and Maximum Clique Pruning for Efficient and Effective 3D Laser Scan Matching (To appear in RA-L 2022)
KCP The official implementation of KCP: k Closest Points and Maximum Clique Pruning for Efficient and Effective 3D Laser Scan Matching, accepted for p
This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures using receptive field analysis (RFA) and create graph visualizations of your architecture.
ReceptiveFieldAnalysisToolbox This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures usin
Python data loader for Solar Orbiter's (SolO) Energetic Particle Detector (EPD).
Data loader (and downloader) for Solar Orbiter/EPD energetic charged particle sensors EPT, HET, and STEP. Supports level 2 and low latency data provided by ESA's Solar Orbiter Archive.
Data Analytics on Genomes and Genetics
Data Analytics performed on the Genomes and Genetics dataset to predict genetic disorder and disorder subclass. Done by TEAM SIGMA!
Sentiment-Analysis and EDA on the IMDB Movie Review Dataset
Sentiment-Analysis and EDA on the IMDB Movie Review Dataset The main part of the work focuses on the exploration and study of different approaches whi
ioztat is a storage load analysis tool for OpenZFS
ioztat is a storage load analysis tool for OpenZFS. It provides iostat-like statistics at an individual dataset/zvol level.
Video Games Web Scraper is a project that crawls websites and APIs and extracts video game related data from their pages.
Video Games Web Scraper Video Games Web Scraper is a project that crawls websites and APIs and extracts video game related data from their pages. This
A Python Bytecode Disassembler helping reverse engineers in dissecting Python binaries
A Python Bytecode Disassembler helping reverse engineers in dissecting Python binaries by disassembling and analyzing the compiled Python bytecode (.pyc) files across all Python versions (including Python 3.10.*)
Predict the Site EUI, given the characteristics of the building and the weather data for the location of the building.
wids_datathon_2022 Description: Contains a data pipeline used to predict energy EUI Goals: Dataset exploration Automating the parameter fitting, gener
In this project, I play with the YouTube data API and extract trending videos in Nigeria on a particular day
YouTubeTrendingVideosAnalysis In this project, I played with the YouTube data API and extracted trending videos in Nigeria on a particular day. This
This repository structures data in title, summary, tags, sentiment given a fragment of a conversation
Understand-conversation-AI This repository structures data in title, summary, tags, sentiment given a fragment of a conversation How to install: pip i
Kinetics-Data-Preprocessing
Kinetics-Data-Preprocessing Kinetics-400 and Kinetics-600 are common video recognition datasets used by popular video understanding projects like Slow
Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity
PyPortfolioOpt has recently been published in the Journal of Open Source Software 🎉 PyPortfolioOpt is a library that implements portfolio optimizatio
Python Machine Learning Jupyter Notebooks (ML website)
Python Machine Learning Jupyter Notebooks (ML website) Dr. Tirthajyoti Sarkar, Fremont, California (Please feel free to connect on LinkedIn here) Also
Practical Machine Learning with Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Semi-Automated Data Processing
Perform semi-automated exploratory data analysis, feature engineering, and feature selection on a provided dataset by visualizing every possibility at each step and assisting the user in making a meaningful decision to achieve a low-bias, low-variance model.
The bidirectional mapping library for Python.
bidict The bidirectional mapping library for Python. Status bidict: has been used for many years by several teams at Google, Venmo, CERN, Bank of Amer
Python Library to get fast extensive Dummy Data for testing
Dumda Python Library to get fast extensive Dummy Data for testing https://pypi.org/project/dumda/ Installation pip install dumda Usage: Cities from d
Displays a plot of death rates from past years in Poland. The data source for these years is listed in the README
Average-Death-Rate Displaying plot of death rates from past years in Poland The goal collect the data from a CSV file count the ADR (Average Death Rat
Advanced raster and geometry manipulations
buzzard In a nutshell, the buzzard library provides powerful abstractions to manipulate together images and geometries that come from different kind o
Security audit Python project dependencies against security advisory databases.
Download and process GOES-16 and GOES-17 data from NOAA's archive on AWS using Python.
Download and display GOES-East and GOES-West data GOES-East and GOES-West satellite data are made available on Amazon Web Services through NOAA's Big
Shelf DB is a tiny document database for Python to store documents or JSON-like data
Shelf DB Introduction Shelf DB is a tiny document database for Python to store documents or JSON-like data. Get it $ pip install shelfdb shelfquery S
Exploratory Data Analysis of the 2019 Indian General Elections using a dataset from Kaggle.
2019-indian-election-eda Exploratory Data Analysis of the 2019 Indian General Elections using a dataset from Kaggle. This project is a part of the Cou
DCM is a set of tools that helps you to keep your data in your Django Models consistent.
Django Consistency Model DCM is a set of tools that helps you to keep your data in your Django Models consistent. Motivation You have a lot of legacy
API Server for VoIP analysis (CDR + Audio CODECs)
Swagger generated server Overview This server was generated by the swagger-codegen project. By using the OpenAPI-Spec from a remote server, you can ea
Design and implement the identification of Iris flower species using machine learning with Python and scikit-learn.
The RAP community of practice includes all analysts and data scientists who are interested in adopting the working practices included in reproducible analytical pipelines (RAP) at NHS Digital.
AB-test-analyzer - Python class to perform AB test analysis
AB-test-analyzer Python class to perform AB test analysis Overview This repo con
Data Science Course at Dept. of Computer Engineering, Chula 2022
2110446 Data Science Course at Chula 2022 Short links for exercises: Week1: Intro to Numpy, Pandas Numpy: https://colab.research.google.com/github/kao
Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products
Repo for The Crown: Exploratory Analysis of Nim Malware DEF CON 615 talk
Repo for "The Crown: Exploratory Analysis of Nim Malware" DEF CON 615 talk
Udacity - Data Analyst Nanodegree - Project 4 - Wrangle and Analyze Data
WeRateDogs Twitter Data from 2015 to 2017 Udacity - Data Analyst Nanodegree - Project 4 - Wrangle and Analyze Data Table of Contents Introduction Proj
BSDotPy, A module to get a bombsquad player's account data.
BSDotPy BSDotPy, A module to get a bombsquad player's account data from bombsquad's servers. Badges Provided By: shields.io Acknowledgements Issues Pu
Using knowledge-informed machine learning on the PRONOSTIA (FEMTO) and IMS bearing data sets. Predict remaining-useful-life (RUL).
Knowledge Informed Machine Learning using a Weibull-based Loss Function Exploring the concept of knowledge-informed machine learning with the use of a
Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis
Introduction This is an implementation of our paper Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis.
Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation
Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation This repository contains code and data f
Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks
Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks arXiv preprint: https://arxiv.org/abs/2201.02143. Architec
HuSpaCy: industrial-strength Hungarian natural language processing
HuSpaCy: Industrial-strength Hungarian NLP HuSpaCy is a spaCy model and a library providing industrial-strength Hungarian language processing faciliti
Repository for the AugmentedPCA Python package.
Overview This Python package provides implementations of Augmented Principal Component Analysis (AugmentedPCA) - a family of linear factor models that
Specification language for generating Generalized Linear Models (with or without mixed effects) from conceptual models
tisane Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships TL;DR: Analysts can use Tisane to author gener
Source code for the plant extraction workflow introduced in the paper “Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision”
Plant extraction workflow Source code for the plant extraction workflow introduced in the paper "Agricultural Plant Cataloging and Establishment of a
A framework for multi-step probabilistic time-series/demand forecasting models
JointDemandForecasting.py A framework for multi-step probabilistic time-series/demand forecasting models File structure JointDemandForecasting contains
FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data
FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data. Flexible EM-Inspired Discriminant Analysis is a robust supervised classification algorithm that performs well in noisy and contaminated datasets.
Active Transport Analytics Model: A new strategic transport modelling and data visualization framework
{ATAM} Active Transport Analytics Model Active Transport Analytics Model (“ATAM”
Historic weather - Home Assistant custom component for accessing historic weather data
Historic Weather for Home Assistant (CC) 2022 by Andreas Frisch github@fraxinas.
In this repo, I will put all the code related to data science using python libraries like Numpy, Pandas, Matplotlib, Seaborn and many more.
Python-for-DS In this repo, I will put all the code related to data science using python libraries like Numpy, Pandas, Matplotlib, Seaborn and many mo
This repository contains answers to the Shopify Summer 2022 Data Science Intern Challenge.
Data-Science-Intern-Challenge This repository contains answers to the Shopify Summer 2022 Data Science Intern Challenge. Summer 2022 Data Science Inte
Active Transport Analytics Model (ATAM) is a new strategic transport modelling and data visualization framework for Active Transport as well as emerging micro-mobility modes
{ATAM} Active Transport Analytics Model Active Transport Analytics Model (“ATAM”) is a new strategic transport modelling and data visualization framew
Sentinel-1 SAR time series analysis for OSINT use
SARveillance Sentinel-1 SAR time series analysis for OSINT use. Description Generates a time lapse GIF of the Sentinel-1 satellite images for the loca
Validate arbitrary image uploads from incoming data urls while preserving file integrity but removing EXIF and unwanted artifacts and RCE exploit potential
Validate arbitrary base64-encoded image uploads as incoming data urls while preserving image integrity but removing EXIF and unwanted artifacts and mitigating RCE-exploit potential.
This is a simple website crawler which asks for a website link from the user to crawl and find specific data from the given website address.
Deep learning with TensorFlow and earth observation data.
Deep Learning with TensorFlow and EO Data Complete file set for Jupyter Book Author: Development Seed Date: 04 October 2021 ISBN: (to come) Notebook tu
Big Data & Cloud Computing for Oceanography
DS2 Class 2022, Big Data & Cloud Computing for Oceanography Home of the 2022 ISblue Big Data & Cloud Computing for Oceanography class (IMT-A, ENSTA, I
Generating new names based on trends in data using GPT2 (Transformer network)
MLOpsNameGenerator Overall Goal The goal of the project is to develop a model that is capable of creating Pokémon names based on its description, usin
Official git for "CTAB-GAN: Effective Table Data Synthesizing"
CTAB-GAN This is the official git for the paper CTAB-GAN: Effective Table Data Synthesizing. The paper is published at the Asian Conference on Machine Learning (A
A Python package that can be used to download post and comment data from Reddit.
Reddit Data Collector Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of
A practical ML pipeline for data labeling with experiment tracking using DVC.
Auto Label Pipeline A practical ML pipeline for data labeling with experiment tracking using DVC Goals: Demonstrate reproducible ML Use DVC to build a
A vanilla 3D face modeling on pose-invariant and multi-lightning image data
3D-Face-Modeling A vanilla 3D face modeling on pose-invariant and multi-lightning image data Table of Contents Background Install Usage Contributing B
🎁 3,000,000+ Unsplash images made available for research and machine learning
The Unsplash Dataset The Unsplash Dataset is made up of over 250,000+ contributing global photographers and data sourced from hundreds of millions of
People tracker on the Internet: OSINT analysis and research tool by Jose Pino
trape (stable) v2.0 People tracker on the Internet: Learn to track the world, to avoid being traced. Trape is an OSINT analysis and research tool, whi
A collection of machine learning examples and tutorials.
machine_learning_examples A collection of machine learning examples and tutorials.
Always know what to expect from your data.
Great Expectations Always know what to expect from your data. Introduction Great Expectations helps data teams eliminate pipeline debt, through data t
Jupyter notebook and datasets from the pandas Q&A video series
Python pandas Q&A video series Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas. Jupyter Note
Code and data accompanying Natural Language Processing with PyTorch
Natural Language Processing with PyTorch Build Intelligent Language Applications Using Deep Learning By Delip Rao and Brian McMahan Welcome. This is a
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
100 pandas puzzles Puzzles notebook Solutions notebook Inspired by 100 Numpy exercises, here are 100* short puzzles for testing your knowledge of panda
FMA: A Dataset For Music Analysis
FMA: A Dataset For Music Analysis Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson. International Society for Music Information