3238 Repositories
Python open-house-london-data Libraries
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training [Arxiv] VideoMAE: Masked Autoencoders are Data-Efficient Learne
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
Dataset Distillation by Matching Training Trajectories Project Page | Paper This repo contains code for training expert trajectories and distilling sy
Open-Source CI/CD platform for ML teams. Deliver ML products, better & faster. ⚡️🧑🔧
Deliver ML products, better & faster Giskard is an Open-Source CI/CD platform for ML teams. Inspect ML models visually from your Python notebook 📗 Re
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Streamify A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! Description Objective The project will stre
Rank 3 : Source code for OPPO 6G Data Generation Challenge
OPPO 6G Data Generation with an E2E Framework Homepage of OPPO 6G Data Generation Challenge Datasets H1_32T4R.mat H2_32T4R.mat Please put the original
Pocsploit is a lightweight, flexible and novel open source poc verification framework
Pocsploit is a lightweight, flexible and novel open source poc verification framework
Course materials for: Geospatial Data Science
Course materials for: Geospatial Data Science These course materials cover the lectures for the course held for the first time in spring 2022 at IT Un
✨ Real-life Data Analysis and Model Training Workshop by Global AI Hub.
🎓 Data Analysis and Model Training Course by Global AI Hub Syllabus: Day 1 What is Data? Multimedia Structured and Unstructured Data Data Types Data
Specification for storing geospatial vector data (point, line, polygon) in Parquet
GeoParquet About This repository defines how to store geospatial vector data (point, lines, polygons) in Apache Parquet, a popular columnar storage fo
Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).
What is judgyprophet? judgyprophet is a Bayesian forecasting algorithm based on Prophet, that enables forecasting while using information known by the
Display your data in an attractive way in your notebook!
Bloxs Bloxs is a simple python package that helps you display information in an attractive way (formed in blocks). Perfect for building dashboards, re
Free Data Engineering course!
Data Engineering Zoomcamp Register in DataTalks.Club's Slack Join the #course-data-engineering channel The videos are published to DataTalks.Club's Yo
This repository contains the best Data Science free hand-picked resources to equip you with all the industry-driven skills and interview preparation kit.
Best Data Science Resources Hey, Data Enthusiasts out there! Finally, after lots of requests from the community I finally came up with the best free D
Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud
Google Cloud Vertex AI Samples Welcome to the Google Cloud Vertex AI sample repository. Overview The repository contains notebooks and community conte
Write python locally, execute SQL in your data warehouse
RasgoQL Write python locally, execute SQL in your data warehouse ≪ Read the Docs · Join Our Slack » RasgoQL is a Python package that enables you to ea
Data types specify the different sizes and values that can be stored in the variable. For example, Python stores numbers, strings, and a list of values using different data types. Learn different types of Python data types along with their respective in-built functions and methods.
02_Python_Datatypes Introduction 👋 Data types specify the different sizes and values that can be stored in the variable. For example, Python stores n
A short course on Julia and open-source software development
Advanced Scientific Computing: producing better code This course is taught as a 6-session "nanocourse" at Washington University in St. Louis. See the
A Python package develop for transportation spatio-temporal big data processing, analysis and visualization.
English 中文版 TransBigData Introduction TransBigData is a Python package developed for transportation spatio-temporal big data processing, analysis and
This repository contains all the data analytics projects that I've worked on in python.
93_Python_Data_Analytics_Projects This repository contains all the data analytics projects that I've worked on in python. No. Name 01 001_Cervical_Can
Data stream analytics: Implement online learning methods to address concept drift in data streams using the River library. Code for the paper entitled "PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams" accepted in IEEE GlobeCom 2021.
PWPAE-Concept-Drift-Detection-and-Adaptation This is the code for the paper entitled "PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT
🙌Kart of 210+ projects based on machine learning, deep learning, computer vision, natural language processing and all. Show your support by ✨ this repository.
ML-ProjectKart 📌 Repository This kart showcases the finest collection of all projects based on machine learning, deep learning, computer vision, natu
Seaborn is one of the go-to tools for statistical data visualization in python. It has been actively developed since 2012 and in July 2018, the author released version 0.9. This version of Seaborn has several new plotting features, API changes and documentation updates which combine to enhance an already great library. This article will walk through a few of the highlights and show how to use the new scatter and line plot functions for quickly creating very useful visualizations of data.
12_Python_Seaborn_Module Introduction 👋 From the website, “Seaborn is a Python data visualization library based on matplotlib. It provides a high-lev
This repository contains implementations of all Machine Learning Algorithms from scratch in Python. Mathematics required for ML and many projects have also been included.
👏 Pre- requisites to Machine Learning
My Solutions to 120 commonly asked data science interview questions.
Data_Science_Interview_Questions Introduction 👋 Here are the answers to 120 Data Science Interview Questions The above answer some is modified based
I've demonstrated the working of the decision tree-based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample. All the steps have been explained in detail with graphics for better understanding.
Python Decision Tree and Random Forest Decision Tree A Decision Tree is one of the popular and powerful machine learning algorithms that I have learne
Implementation of RITA (Real Intelligence Threat Analytics) in Jupyter Notebook with improved scoring algorithm.
RITA (Real Intelligence Threat Analytics) in Jupyter Notebook RITA is an open source framework for network traffic analysis sponsored by Active Counte
In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results from as little as 16 seconds of target data.
Neural Instrument Cloning In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results fro
Sionna: An Open-Source Library for Next-Generation Physical Layer Research
Sionna: An Open-Source Library for Next-Generation Physical Layer Research Sionna™ is an open-source Python library for link-level simulations of digi
StorSeismic: An approach to pre-train a neural network to store seismic data features
StorSeismic: An approach to pre-train a neural network to store seismic data features This repository contains codes and resources to reproduce experi
A workshop with several modules to help learn Feast, an open-source feature store
Workshop: Learning Feast This workshop aims to teach users about Feast, an open-source feature store. We explain concepts & best practices by example,
Data Visualizations for the #30DayChartChallenge
The #30DayChartChallenge This repository contains all the charts made for the #30DayChartChallenge during the month of April. This project aims to exp
BankNote-Net: Open dataset and encoder model for assistive currency recognition
BankNote-Net: Open Dataset for Assistive Currency Recognition Millions of people around the world have low or no vision. Assistive software applicatio
Repositório para o #alurachallengedatascience1
1° Challenge de Dados - Alura A Alura Voz é uma empresa de telecomunicação que nos contratou para atuar como cientistas de dados na equipe de vendas.
A platform for intelligent agent learning based on a 3D open-world FPS game developed by Inspir.AI.
Wilderness Scavenger: 3D Open-World FPS Game AI Challenge This is a platform for intelligent agent learning based on a 3D open-world FPS game develope
Georeferencing large amounts of data for free.
Geolocate Georeferencing large amounts of data for free. Special thanks to @brunodepauloalmeida and the whole team for the contributions. How? It's us
gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases.
gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.
PyPOTS - A Python Toolbox for Data Mining on Partially-Observed Time Series
A python toolbox/library for data mining on partially-observed time series, supporting tasks of forecasting/imputation/classification/clustering on incomplete multivariate time series with missing values.
Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
Coreferee Author: Richard Paul Hudson, Explosion AI 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 French 1.2.3 German 1.2
Easily pull telemetry data and create beautiful visualizations for analysis.
This repository is a work in progress. Anything and everything is subject to change. Porpo Table of Contents Porpo Table of Contents General Informati
Regress.me is an easy to use data visualization tool powered by Dash/Plotly.
Regress.me Regress.me is an easy to use data visualization tool powered by Dash/Plotly. Regress.me.-.Google.Chrome.2022-05-10.15-58-59.mp4 Get Started
Use the state-of-the-art m2m100 to translate large data on CPU/GPU/TPU. Super Easy!
Easy-Translate is a script for translating large text files in your machine using the M2M100 models from Facebook/Meta AI. We also privide a script fo
Automatic packaging of the open-composite libs for OvGME
OvGME Packager for OpenXR – OpenComposite for DCS Note This repository is currently unsupported and needs to be migrated to the upstream OpenComposite
An attempt to map the areas with active conflict in Ukraine using open source twitter data.
Live Action Map (LAM) An attempt to use open source data on Twitter to map areas with active conflict. Right now it is used for the Ukraine-Russia con
Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis
Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis [Paper] [Online Demo] The following results are obtained by our SCUNet with purely syn
Detecting silent model failure. NannyML estimates performance with an algorithm called Confidence-based Performance estimation (CBPE), developed by core contributors. It is the only open-source algorithm capable of fully capturing the impact of data drift on performance.
Website • Docs • Community Slack 💡 What is NannyML? NannyML is an open-source python library that allows you to estimate post-deployment model perfor
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:
Squirrel Core Share, load, and transform data in a collaborative, flexible, and efficient way What is Squirrel? Squirrel is a Python library that enab
The most Advanced yet simple Multi Cloud tool to transfer Your Data from any cloud to any cloud remotely based on Rclone.⚡
Multi Cloud Transfer (Advanced!) 🔥 1.Setup and Start using Rclone on Google Colab and Create/Edit/View and delete your Rclone config file and keep th
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
English | 简体中文 Easy Parallel Library Overview Easy Parallel Library (EPL) is a general and efficient library for distributed model training. Usability
Access Undenied parses AWS AccessDenied CloudTrail events, explains the reasons for them, and offers actionable remediation steps. Open-sourced by Ermetic.
Access Undenied on AWS Access Undenied parses AWS AccessDenied CloudTrail events, explains the reasons for them, and offers actionable fixes. Access U
Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296
Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions This repo contains the dataset and code for the paper Benchmarking Ro
My Implementation for the paper EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks using Tensorflow
Easy Data Augmentation Implementation This repository contains my Implementation for the paper EDA: Easy Data Augmentation Techniques for Boosting Per
An introduction to free, automated web scraping with GitHub’s powerful new Actions framework.
An introduction to free, automated web scraping with GitHub’s powerful new Actions framework Published at palewi.re/docs/first-github-scraper/ Contrib
[Arxiv preprint] Causality-inspired Single-source Domain Generalization for Medical Image Segmentation (code&data-processing pipeline)
Causality-inspired Single-source Domain Generalization for Medical Image Segmentation Arxiv preprint Repository under construction. Might still be bug
Ana's Portfolio
Ana's Portfolio ✌️ Welcome to my Portfolio! You will find here different Projects I have worked on (from scratch) 💪 Projects 💻 1️⃣ Hangman game (Mad
clock_plot provides a simple way to visualize timeseries data, mapping 24 hours onto the 360 degrees of a polar plot
clock_plot clock_plot provides a simple way to visualize timeseries data mapping 24 hours onto the 360 degrees of a polar plot. For usage, please see
A Power BI/Google Studio Dashboard to analyze previous OTC CatchUps
OTC CatchUp Dashboard A Power BI/Google Studio dashboard analyzing OTC CatchUps. File Contents * ├───data ├───old summaries ─── *.md ├
A python package for generating, analyzing and visualizing building shadows
pybdshadow Introduction pybdshadow is a python package for generating, analyzing and visualizing building shadows from large scale building geographic
Containerized Demo of Apache Spark MLlib on a Data Lakehouse (2022)
Spark-DeltaLake-Demo Reliable, Scalable Machine Learning (2022) This project was completed in an attempt to become better acquainted with the latest b
Open-source code for Generic Grouping Network (GGN, CVPR 2022)
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity Pytorch implementation for "Open-World Instance Segmen
torchlm is aims to build a high level pipeline for face landmarks detection, it supports training, evaluating, exporting, inference(Python/C++) and 100+ data augmentations
💎A high level pipeline for face landmarks detection, supports training, evaluating, exporting, inference and 100+ data augmentations, compatible with torchvision and albumentations, can easily install with pip.
Material for my PyConDE & PyData Berlin 2022 Talk "5 Steps to Speed Up Your Data-Analysis on a Single Core"
5 Steps to Speed Up Your Data-Analysis on a Single Core Material for my talk at the PyConDE & PyData Berlin 2022 Description Your data analysis pipeli
Resources complimenting the Machine Learning Course led in the Faculty of mathematics and informatics part of Sofia University.
Machine Learning and Data Mining, Summer 2021-2022 How to learn data science and machine learning? Programming. Learn Python. Basic Statistics. Take a
Bot developed in python, 100% open-source, compatible with Windows and Linux.
Bombcrypto Bot [Family JOW] Bot desenvolvido em python, 100% do código é aberto, para aqueles que tenham conhecimento validarem que não existe nenhum
An open source development framework to help you build data workflows and modern data architecture on AWS.
AWS DataOps Development Kit (DDK) The AWS DataOps Development Kit is an open source development framework for customers that build data workflows and
Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogramas anuais com spark, em pyspark e SQL!
Olá! Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogr
On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification
Understanding Bayesian Classification This repository hosts the code to reproduce the results presented in the paper On Uncertainty, Tempering, and Da
This repository contains helper functions which can help you generate additional data points depending on your NLP task.
NLP Albumentations For Data Augmentation This repository contains helper functions which can help you generate additional data points depending on you
MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data
MarcoPolo is a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering Overview MarcoPolo
Sample data associated with the Aurora-BP study
The Aurora-BP Study and Dataset This repository contains sample code, sample data, and explanatory information for working with the Aurora-BP dataset
IEEE-WIE presents WIE Week of Code (WIEWoC), a 3-days long open-source contribution event starting from 1st March, highlighting different python spaces including web development, machine learning, game development, data structures and algorithms, and substantially more! We are creating an open source repository on Python along with all its applications!
WIE-WoC IEEE-WIE presents WIE Week of Code (WIEWoC), a 3-days long open-source contribution event starting from 1st March, highlighting different pyth
Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"
CodeFill This repository contains the code for our paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Namin
A spatial genome aligner for analyzing multiplexed DNA-FISH imaging data.
jie jie is a spatial genome aligner. This package parses true chromatin imaging signal from noise by aligning signals to a reference DNA polymer model
The FLARE team's open-source library to disassemble Common Intermediate Language (CIL) instructions.
dncil is a Common Intermediate Language (CIL) disassembly library written in Python that supports parsing the header, instructions, and exception hand
Download & Install mods for your favorit game with a few simple clicks
Husko's SteamWorkshop Downloader 🔴 IMPORTANT ❗ 🔴 The Tool is currently being rewritten so updates will be slow and only on the dev branch until it i
HTTP API for TON (The Open Network)
HTTP API for The Open Network Since TON nodes uses its own ADNL binary transport protocol, a intermediate service is needed for an HTTP connection. TO
This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"
StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image
Machine learning beginner to Kaggle competitor in 30 days. Non-coders welcome. The program starts Monday, August 2, and lasts four weeks. It's designed for people who want to learn machine learning.
30-Days-of-ML-Kaggle 🔥 About the Hands On Program 💻 Machine learning beginner → Kaggle competitor in 30 days. Non-coders welcome The program starts
A workshop on data visualization in Python with notebooks and exercises for following along.
Beyond the Basics: Data Visualization in Python The human brain excels at finding patterns in visual representations, which is why data visualizations
Tutorial repo for an end-to-end Data Science project
End-to-end Data Science project This is the repo with the notebooks, code, and additional material used in the ITI's workshop. The goal of the session
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
Distributed Deep Learning in Open Collaborations This repository contains the code for the NeurIPS 2021 paper "Distributed Deep Learning in Open Colla
AI Summer's complete catalog of articles
Learn Deep Learning with AI Summer A collection of all articles (almost 100) written for the AI Summer blog organized by topic. Deep Learning Theory M
1000+ ready code templates to kickstart your next AI experiment
AI Seed Projects Start with ready code for your next AI experiment. Choose from 1000+ code templates, across a wide variety of use cases. All examples
Open-source data observability for modern data teams
Use cases Monitor your data warehouse in minutes: Data anomalies monitoring as dbt tests Data lineage made simple, reliable, and automated dbt operati
StyleGAN-Human: A Data-Centric Odyssey of Human Generation
StyleGAN-Human: A Data-Centric Odyssey of Human Generation Abstract: Unconditional human image generation is an important task in vision and graphics,
ipyvizzu - Jupyter notebook integration of Vizzu
ipyvizzu - Jupyter notebook integration of Vizzu. Tutorial · Examples · Repository About The Project ipyvizzu is the Jupyter Notebook integration of V
ML powered analytics engine for outlier detection and root cause analysis.
Website • Docs • Blog • LinkedIn • Community Slack ML powered analytics engine for outlier detection and root cause analysis ✨ What is Chaos Genius? C
This is a repo of basic Machine Learning!
Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource
An open-source Deep Learning Engine for Healthcare that aims to treat & prevent major diseases
AlphaCare Background AlphaCare is a work-in-progress, open-source Deep Learning Engine for Healthcare that aims to treat and prevent major diseases. T
Data, notebooks, and articles associated with the RSNA AI Deep Learning Lab at RSNA 2021
RSNA AI Deep Learning Lab 2021 Intro Welcome Deep Learners! This document provides all the information you need to participate in the RSNA AI Deep Lea
Red Team Toolkit is an Open-Source Django Offensive Web-App which is keeping the useful offensive tools used in the red-teaming together.
RedTeam Toolkit Note: Only legal activities should be conducted with this project. Red Team Toolkit is an Open-Source Django Offensive Web-App contain
Hacktoberfest 2021 contribution repository✨
🎃 HacktoberFest-2021 🎃 Repository for Hacktoberfest Note: Although, We are actively focusing on Machine Learning, Data Science and Tricky Python pro
Make your first PR. A beginner friendly repository made specifically for open source beginners. Add any program under any language (it can be anything from a simple program to a complex data structure algorithm). Happy coding...
Hacktober Fest 2021 Upload Different Types of Programs in any Language Use this project to make your first contribution to an open source project on G
Data labels and scripts for fastMRI.org
fastMRI+: Clinical pathology annotations for the fastMRI dataset The fastMRI dataset is a publicly available MRI raw (k-space) dataset. It has been us
⚓ Eurybia monitor model drift over time and securize model deployment with data validation
View Demo · Documentation · Medium article 🔍 Overview Eurybia is a Python library which aims to help in : Detecting data drift and model drift Valida
As we all know the BGMI Loot Crate comes with so many resources for the gamers, this ML Crate will be the hub of various ML projects which will be the resources for the ML enthusiasts! Open Source Program: SWOC 2021 and JWOC 2022.
Machine Learning Loot Crate 💻 🧰 🔴 Welcome contributors! As we all know the BGMI Loot Crate comes with so many resources for the gamers, this ML Cra
Learn Data Science with focus on adding value with the most efficient tech stack.
DataScienceWithPython Get started with Data Science with Python An engaging journey to become a Data Scientist with Python TL;DR Download all Jupyter
Package towards building Explainable Forecasting and Nowcasting Models with State-of-the-art Deep Neural Networks and Dynamic Factor Model on Time Series data sets with single line of code. Also, provides utilify facility for time-series signal similarities matching, and removing noise from timeseries signals.
DeepXF: Explainable Forecasting and Nowcasting with State-of-the-art Deep Neural Networks and Dynamic Factor Model Also, verify TS signal similarities
An end-to-end framework for mixed-integer optimization with data-driven learned constraints.
OptiCL OptiCL is an end-to-end framework for mixed-integer optimization (MIO) with data-driven learned constraints. We address a problem setting in wh
Data Inspector is an open-source python library that brings 15++ types of different functions to make EDA, data cleaning easier.
Data Inspector Data Inspector is an open-source python library that brings 15 types of different functions to make EDA, data cleaning easier. Author:
Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)
anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba