2836 Repositories
Python data-science-at-scale Libraries
Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project
Semantic Code Search Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project. The model
Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"
Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data
A classification model capable of accurately predicting the price of secondhand cars
The purpose of this project is create a classification model capable of accurately predicting the price of secondhand cars. The data used for model building is open source and has been added to this repository. Most packages used are usually pre-installed in most developed environments and tools like collab, jupyter, etc. This can be useful for people looking to enhance the way the code their predicitve models and efficient ways to deal with tabular data!
Analyzing Earth Observation (EO) data is complex and solutions often require custom tailored algorithms.
eo-grow Earth observation framework for scaled-up processing in Python. Analyzing Earth Observation (EO) data is complex and solutions often require c
A small scale relica of bank management system using the MySQL queries in the python language.
Bank_Management_system This is a Bank Management System Database Project. Abstract: The main aim of the Bank Management Mini project is to keep record
Statistical & Probabilistic Analysis of Store Sales, University Survey, & Manufacturing data
Statistical_Modelling Statistical & Probabilistic Analysis of Store Sales, University Survey, & Manufacturing data Statistical Methods for Decision Ma
Melanoma Skin Cancer Detection using Convolutional Neural Networks and Transfer Learning🕵🏻♂️
This is a Kaggle competition in which we have to identify if the given lesion image is malignant or not for Melanoma which is a type of skin cancer.
Final Project for Practical Python Programming and Algorithms for Data Analysis
Final Project for Practical Python Programming and Algorithms for Data Analysis (PHW2781L, Summer 2020) Redlining, Race-Exclusive Deed Restriction Lan
Fetch fund data from avanza.se using Python and some web scraping with bs4
Py(A)vanza Fetch fund data from avanza.se using Python and some web scraping with bs4. The default way is to display the data in the terminal, apply -
Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.
open(LARGE) Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included. Motivation Oftentimes, especi
This library provides an abstraction to perform Model Versioning using Weight & Biases.
Description This library provides an abstraction to perform Model Versioning using Weight & Biases. Features Version a new trained model Promote a mod
Aalto-cs-msc-theses - Listing of M.Sc. Theses of the Department of Computer Science at Aalto University
Aalto-CS-MSc-Theses Listing of M.Sc. Theses of the Department of Computer Scienc
Python-geoarrow - Storing geometry data in Apache Arrow format
geoarrow Storing geometry data in Apache Arrow format Installation $ pip install
Scikit-event-correlation - Event Correlation and Forecasting over High Dimensional Streaming Sensor Data algorithms
scikit-event-correlation Event Correlation and Changing Detection Algorithm Theo
Graviti-python-sdk - Graviti Data Platform Python SDK
Graviti Python SDK Graviti Python SDK is a python library to access Graviti Data
DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.
DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.
OptiPLANT is a cloud-based based system that empowers professional and non-professional data scientists to build high-quality predictive models
OptiPLANT OptiPLANT is a cloud-based based system that empowers professional and non-professional data scientists to build high-quality predictive mod
Dynamic vae - Dynamic VAE algorithm is used for anomaly detection of battery data
Dynamic VAE frame Automatic feature extraction can be achieved by probability di
This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models"
GreaseLM: Graph REASoning Enhanced Language Models This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language
Clustering is a popular approach to detect patterns in unlabeled data
Visual Clustering Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a data
SpyQL - SQL with Python in the middle
SpyQL SQL with Python in the middle Concept SpyQL is a query language that combines: the simplicity and structure of SQL with the power and readabilit
Data-sets from the survey and analysis
bachelor-thesis "Umfragewerte.xlsx" contains the orginal survey results. "umfrage_alle.csv" contains the survey results but one participant is cancele
Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.
Hello from magnus Magnus provides four capabilities for data teams: Compute execution plan: A DAG representation of work that you want to get done. In
Es-schema - Common Data Schemas for Elasticsearch
Common Data Schemas for Elasticsearch The Common Data Schema for Elasticsearch i
Data analysis and visualisation projects from a range of individual projects and applications
Python-Data-Analysis-and-Visualisation-Projects Data analysis and visualisation projects from a range of individual projects and applications. Python
Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network
Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network The performances of tree ensemb
A program made in PYTHON🐍 that automatically performs data insertions into a POSTGRES database 🐘 , using as base a .CSV file 📁 , useful in mass data insertions
A program made in PYTHON🐍 that automatically performs data insertions into a POSTGRES database 🐘 , using as base a .CSV file 📁 , useful in mass data insertions.
Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications
Labelbox Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications. Use this github repository to help you s
Continuous Security Group Rule Change Detection & Response at scale
Introduction Get notified of Security Group Changes across all AWS Accounts & Regions in an AWS Organization, with the ability to respond/revert those
FairLens is an open source Python library for automatically discovering bias and measuring fairness in data
FairLens FairLens is an open source Python library for automatically discovering bias and measuring fairness in data. The package can be used to quick
This is a curated list of medical data for machine learning
Medical Data for Machine Learning This is a curated list of medical data for machine learning. This list is provided for informational purposes only,
Segment axon and myelin from microscopy data using deep learning
Segment axon and myelin from microscopy data using deep learning. Written in Python. Using the TensorFlow framework. Based on a convolutional neural network architecture. Pixels are classified as either axon, myelin or background.
Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model
Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model. Designed sample dashboard with insights and recommendation for customers.
A Celery application to collect data, download media and extract information from social media APIs
Project IBEX A Celery application to collect data, download media and extract information from social media APIs. Requirements You must have a Redis D
Data processing with Pandas.
Processing-data-with-python This is a simple example showing how to use Pandas to create a dataframe and the processing data with python. The jupyter
Nobel Data Analysis
Nobel_Data_Analysis This project is for analyzing a set of data about people who have won the Nobel Prize in different fields and different countries
Investigating EV charging data
Investigating EV charging data Introduction: Got an opportunity to work with a home monitoring technology company over the last 6 months whose goal wa
Analyze the Gravitational wave data stored at LIGO/VIRGO observatories
Gravitational-Wave-Analysis This project showcases how to analyze the Gravitational wave data stored at LIGO/VIRGO observatories, using Python program
Contains modeling practice materials and homework for the Computational Neuroscience course at Okinawa Institute of Science and Technology
A310 Computational Neuroscience - Okinawa Institute of Science and Technology, 2022 This repository contains modeling practice materials and homework
This python script allows you to manipulate the audience data from Sl.ido surveys
Slido-Automated-VoteBot This python script allows you to manipulate the audience data from Sl.ido surveys Since Slido blocks interference from automat
Cormen-Lib - An academic tool for data structures and algorithms courses
The Cormen-lib module is an insular data structures and algorithms library based on the Thomas H. Cormen's Introduction to Algorithms Third Edition. This library was made specifically for administering and grading assignments related to data structure and algorithms in computer science.
Collapse by Conditioning: Training Class-conditional GANs with Limited Data
Collapse by Conditioning: Training Class-conditional GANs with Limited Data Moha
GoogleFormSpammer - A simple CLI script to spam Google Forms used by Crypto Wallet scammers to collect stolen data
GoogleFormSpammer - A simple CLI script to spam Google Forms used by Crypto Wallet scammers to collect stolen data
Implement the Perspective open source code in preparation for data visualization
Task Overview | Installation Instructions | Link to Module 2 Introduction Experience Technology at JP Morgan Chase Try out what real work is like in t
PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j.
PostQF Copyright © 2022 Ralph Seichter PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j. See the ma
Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences
Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences. Copula and functional Principle Component Analysis (fPCA) are statistical models that allow these properties to be simulated (Joe 2014). As such, copula generated data have shown potential to improve the generalization of machine learning (ML) emulators (Meyer et al. 2021) or anonymize real-data datasets (Patki et al. 2016).
An easy-to-use feature store
A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.
Learn Basic to advanced level Data visualisation techniques from this Repository
Data visualisation Hey, You can learn Basic to advanced level Data visualisation techniques from this Repository. Data visualization is the graphic re
Create charts with Python in a very similar way to creating charts using Chart.js
Create charts with Python in a very similar way to creating charts using Chart.js. The charts created are fully configurable, interactive and modular and are displayed directly in the output of the the cells of your jupyter notebook environment.
visualize_ML is a python package made to visualize some of the steps involved while dealing with a Machine Learning problem
visualize_ML visualize_ML is a python package made to visualize some of the steps involved while dealing with a Machine Learning problem. It is build
Equibles Stocks API for Python
Equibles Stocks API for Python Requirements. Python 2.7 and 3.4+ Installation & Usage pip install If the python package is hosted on Github, you can i
A server and client for passing data between computercraft computers/turtles across dimensions or even servers.
ccserver A server and client for passing data between computercraft computers/turtles across dimensions or even servers. pastebin get zUnE5N0v client
Feature engineering and machine learning: together at last
Feature engineering and machine learning: together at last! Lambdo is a workflow engine which significantly simplifies data analysis by unifying featu
INFO-H515 - Big Data Scalable Analytics
INFO-H515 - Big Data Scalable Analytics Jacopo De Stefani, Giovanni Buroni, Théo Verhelst and Gianluca Bontempi - Machine Learning Group Exercise clas
This module is used to create Convolutional AutoEncoders for Variational Data Assimilation
VarDACAE This module is used to create Convolutional AutoEncoders for Variational Data Assimilation. A user can define, create and train an AE for Dat
Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining
**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.** S
Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.
Sentiment Analyzer The goal of this project is to perform sentiment analysis on textual data that people generally post on websites like social networ
Dive into Machine Learning
Dive into Machine Learning Hi there! You might find this guide helpful if: You know Python or you're learning it 🐍 You're new to Machine Learning You
TensorDebugger (TDB) is a visual debugger for deep learning. It extends TensorFlow with breakpoints + real-time visualization of the data flowing through the computational graph
TensorDebugger (TDB) is a visual debugger for deep learning. It extends TensorFlow (Google's Deep Learning framework) with breakpoints + real-time visualization of the data flowing through the computational graph.
Python for Data Analysis, 2nd Edition
Python for Data Analysis, 2nd Edition Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media Buy
INF42 - Topological Data Analysis
TDA INF421(Conception et analyse d'algorithmes) Projet : Topological Data Analysis SphereMin Etant donné un nuage des points, ce programme contient de
Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials
Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials
sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code
sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ
Google AI Open Images - Object Detection Track: Open Solution
Google AI Open Images - Object Detection Track: Open Solution This is an open solution to the Google AI Open Images - Object Detection Track 😃 More c
TGS Salt Identification Challenge
TGS Salt Identification Challenge This is an open solution to the TGS Salt Identification Challenge. Note Unfortunately, we can no longer provide supp
Airbus Ship Detection Challenge
Airbus Ship Detection Challenge This is an open solution to the Airbus Ship Detection Challenge. Our goals We are building entirely open solution to t
This is an analysis and prediction project for house prices in King County, USA based on certain features of the house
This is a project for analysis and estimation of House Prices in King County USA The .csv file contains the data of the house and the .ipynb file con
Python Sreamlit Duplicate Records Finder Remover
Python-Sreamlit-Duplicate-Records-Finder-Remover Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom w
Program that predicts the NBA mvp based on data from previous years.
NBA MVP Predictor A machine learning model using RandomForest Regression that predicts NBA MVP's using player data. Explore the docs » View Demo · Rep
Simple Translator in Python
Simple Translator in Python Project Description: In this project, we'll be making a very simple translator in Python using some libraries. Requirement
Open solution to the Toxic Comment Classification Challenge
Starter code: Kaggle Toxic Comment Classification Challenge More competitions 🎇 Check collection of public projects 🎁 , where you can find multiple
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
dbd: database prototyping tool dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL d
Data-driven Computer Science UoB
COMS20011_2021 Data-driven Computer Science UoB Staff Laurence Aitchison [[email protected]] (unit director) Majid Mirmehdi [m.mirmehdi
Colab notebook and additional materials for Python-driven analysis of redlining data in Philadelphia
RedliningExploration The Google Colaboratory file contained in this repository contains work inspired by a project on educational inequality in the Ph
An interactive App to play with Spotify data, both from the Spotify Web API and from CSV datasets.
An interactive App to play with Spotify data, both from the Spotify Web API and from CSV datasets.
Airborne magnetic data of the Osborne Mine and Lightning Creek sill complex, Australia
Osborne Mine, Australia - Airborne total-field magnetic anomaly This is a section of a survey acquired in 1990 by the Queensland Government, Australia
This a classic fintech problem that introduces real life difficulties such as data imbalance. Check out the notebook to find out more!
Credit Card Fraud Detection Introduction Online transactions have become a crucial part of any business over the years. Many of those transactions use
FFCV: Fast Forward Computer Vision (and other ML workloads!)
Fast Forward Computer Vision: train models at a fraction of the cost with accele
ServiceX Transformer that converts flat ROOT ntuples into columnwise data
ServiceX_Uproot_Transformer ServiceX Transformer that converts flat ROOT ntuples into columnwise data Usage You can invoke the transformer from the co
This repo is dedicated to the data extraction and manipulation of the World Bank's database called STEP.
Overview Welcome to the Step-X repository. This repo is dedicated to the data extraction and manipulation of the World Bank's database called STEP. Be
GNES enables large-scale index and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content form
GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.
ECLARE: Extreme Classification with Label Graph Correlations
ECLARE ECLARE: Extreme Classification with Label Graph Correlations @InProceedings{Mittal21b, author = "Mittal, A. and Sachdeva, N. and Agrawal
NeurIPS workshop paper 'Counter-Strike Deathmatch with Large-Scale Behavioural Cloning'
Counter-Strike Deathmatch with Large-Scale Behavioural Cloning Tim Pearce, Jun Zhu Offline RL workshop, NeurIPS 2021 Paper: https://arxiv.org/abs/2104
CO2Ampel - This RaspberryPi project uses weather data to estimate the share of renewable energy in the power grid
CO2Ampel This RaspberryPi project uses weather data to estimate the share of ren
The scope of this project will be to build a data ware house on Google Cloud Platform that will help answer common business questions as well as powering dashboards
The scope of this project will be to build a data ware house on Google Cloud Platform that will help answer common business questions as well as powering dashboards.
RedlineSpam - Python tool to spam Redline Infostealer panels with legit looking data
RedlineSpam Python tool to spam Redline Infostealer panels with legit looking da
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift
Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2
Conditional Generative Adversarial Networks (CGAN) for Mobility Data Fusion
This code implements the paper, Kim et al. (2021). Imputing Qualitative Attributes for Trip Chains Extracted from Smart Card Data Using a Conditional Generative Adversarial Network. Transportation Research Part C. Under Review.
Import, connect and transform data into Excel
xlwings_query Import, connect and transform data into Excel. Description The concept is to apply data transformations to a main query object. When the
Code for Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? (SDM 2022)
Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? (SDM 2022) We consider how a user of a web servi
A combination between python-flask, that fetch and send data from league client during champion select thanks to LCU
A combination between python-flask, that fetch data and send from league client during champion select thanks to LCU and compare picked champs to the gamesDataBase that we need to collect using my other python script and then send the games result to localhost:5000/members that will be read by electron-reactJS script to present the results as a GUI on browser (localhost:5000)
RedisJSON - a JSON data type for Redis
RedisJSON is a Redis module that implements ECMA-404 The JSON Data Interchange Standard as a native data type. It allows storing, updating and fetching JSON values from Redis keys (documents).
Qcover is an open source effort to help exploring combinatorial optimization problems in Noisy Intermediate-scale Quantum(NISQ) processor.
Qcover is an open source effort to help exploring combinatorial optimization problems in Noisy Intermediate-scale Quantum(NISQ) processor. It is devel
Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization
Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization 📥 Download Datasets 📥 Download Trained Models INTRODUCTION TH2ZH (
100 Days of Code Learning program to keep a habit of coding daily and learn things at your own pace with help from our remote community.
100 Days of Code Learning program to keep a habit of coding daily and learn things at your own pace with help from our remote community.
Elkeid HUB - A rule/event processing engine maintained by the Elkeid Team that supports streaming/offline data processing
Elkeid HUB - A rule/event processing engine maintained by the Elkeid Team that supports streaming/offline data processing
Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning
Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning This repository provides an implementation of the paper Beta S
Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.
Call of Duty World League: Search & Destroy Outcome Predictions Growing up as an avid Call of Duty player, I was always curious about what factors led
converts nominal survey data into a numerical value based on a dictionary lookup.
SWAP RATE Converts nominal survey data into a numerical values based on a dictionary lookup. It allows the user to switch nominal scale data from text
This is a place where I'm playing around with pandas to analyze data in a csv/excel file.
pandas-csv-excel-analysis This is a place where I'm playing around with pandas to analyze data in a csv/excel file. 0-start A very simple cheat sheet