7091 Repositories
Python databricks-data-scientist-learning-path Libraries
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop
Sleep staging from ECG, assisted with EEG
Sleep_Staging_Knowledge Distillation This codebase implements knowledge distillation approach for ECG based sleep staging assisted by EEG based sleep
pandas: powerful Python data analysis toolkit
pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
A mutable set that remembers the order of its entries. One of Python's missing data types.
An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index number that can be looked up.
Type4Py: Deep Similarity Learning-Based Type Inference for Python
Type4Py: Deep Similarity Learning-Based Type Inference for Python This repository contains the implementation of Type4Py and instructions for re-produ
A GitHub action that suggests type annotations for Python using machine learning.
Typilus: Suggest Python Type Annotations A GitHub action that suggests type annotations for Python using machine learning. This action makes suggestio
AI4Good project for detecting waste in the environment
Detect waste AI4Good project for detecting waste in environment. www.detectwaste.ml. Our latest results were published in Waste Management journal in
Estoult - a Python toolkit for data mapping with an integrated query builder for SQL databases
Estoult Estoult is a Python toolkit for data mapping with an integrated query builder for SQL databases. It currently supports MySQL, PostgreSQL, and
A powerful framework for decentralized federated learning with user-defined communication topology
Scatterbrained Decentralized Federated Learning Scatterbrained makes it easy to build federated learning systems. In addition to traditional federated
Coursera learning course Python the basics. Programming exercises and tasks
HSE_Python_the_basics Welcome to BAsics programming Python! You’re joining thousands of learners currently enrolled in the course. I'm excited to have
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.
Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is
Learning Lightweight Low-Light Enhancement Network using Pseudo Well-Exposed Images
Learning Lightweight Low-Light Enhancement Network using Pseudo Well-Exposed Images This repository contains the implementation of the following paper
This is a graphql api build using ariadne python that serves a graphql-endpoint at port 3002 to perform language translation and identification using deep learning in python pytorch.
Language Translation and Identification this machine/deep learning api that will be served as a graphql-api using ariadne, to perform the following ta
Pyspark project that able to do joins on the spark data frames.
SPARK JOINS This project is to perform inner, all outer joins and semi joins. create_df.py: load_data.py : helps to put data into Spark data frames. d
Implementation of parameterized soft-exponential activation function.
Soft-Exponential-Activation-Function: Implementation of parameterized soft-exponential activation function. In this implementation, the parameters are
Self-Supervised Learning
Self-Supervised Learning Features self_supervised offers features like modular framework support for multi-gpu training using PyTorch Lightning easy t
yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data.
The yt Project yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data. yt supports structured, varia
MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.
The collaboration platform for Machine Learning MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine
GebPy is a Python-based, open source tool for the generation of geological data of minerals, rocks and complete lithological sequences.
GebPy is a Python-based, open source tool for the generation of geological data of minerals, rocks and complete lithological sequences. The data can be generated randomly or with respect to user-defined constraints, for example a specific element concentration within minerals and rocks or the order of units within a complete lithological profile.
A python interface for training Reinforcement Learning bots to battle on pokemon showdown
The pokemon showdown Python environment A Python interface to create battling pokemon agents. poke-env offers an easy-to-use interface for creating ru
A lightweight, pure-Python mobile robot simulator designed for experiments in Artificial Intelligence (AI) and Machine Learning, especially for Jupyter Notebooks
aitk.robots A lightweight Python robot simulator for JupyterLab, Notebooks, and other Python environments. Goals A lightweight mobile robotics simulat
Predicting Baseball Metric Clusters: Clustering Application in Python Using scikit-learn
Clustering Clustering Application in Python Using scikit-learn This repository contains the prediction of baseball metric clusters using MLB Statcast
PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).
PyGAD: Genetic Algorithm in Python PyGAD is an open-source easy-to-use Python 3 library for building the genetic algorithm and optimizing machine lear
Official implementation of the paper "Backdoor Attacks on Self-Supervised Learning".
SSL-Backdoor Abstract Large-scale unlabeled data has allowed recent progress in self-supervised learning methods that learn rich visual representation
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.
Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts
DataSelection-NMT Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts Quick update: The paper got accepted o
Learning with Subset Stacking
Learning with Subset Stacking (LESS) LESS is a new supervised learning algorithm that is based on training many local estimators on subsets of a given
Implementation of "Semi-supervised Domain Adaptive Structure Learning"
Semi-supervised Domain Adaptive Structure Learning - ASDA This repo contains the source code and dataset for our ASDA paper. Illustration of the propo
These are the materials for the paper "Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations"
Few-shot-NLEs These are the materials for the paper "Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations". You can find the smal
Supervised & unsupervised machine-learning techniques are applied to the database of weighted P4s which admit Calabi-Yau hypersurfaces.
Weighted Projective Spaces ML Description: The database of 5-vectors describing 4d weighted projective spaces which admit Calabi-Yau hypersurfaces are
Twin-deep neural network for semi-supervised learning of materials properties
Deep Semi-Supervised Teacher-Student Material Synthesizability Prediction Citation: Semi-supervised teacher-student deep neural network for materials
FinRL-Meta: A Universe for Data-Driven Financial Reinforcement Learning. 🔥
FinRL-Meta: A Universe of Market Environments. FinRL-Meta is a universe of market environments for data-driven financial reinforcement learning. Users
Centroid-UNet is deep neural network model to detect centroids from satellite images.
Centroid UNet - Locating Object Centroids in Aerial/Serial Images Introduction Centroid-UNet is deep neural network model to detect centroids from Aer
Semantically Contrastive Learning for Low-light Image Enhancement
Semantically Contrastive Learning for Low-light Image Enhancement Here, we propose an effective semantically contrastive learning paradigm for Low-lig
Hypernetwork-Ensemble Learning of Segmentation Probability for Medical Image Segmentation with Ambiguous Labels
Hypernet-Ensemble Learning of Segmentation Probability for Medical Image Segmentation with Ambiguous Labels The implementation of Hypernet-Ensemble Le
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability
This is the official repository of the paper: CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability A private copy of the
This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".
BiCAT This is our TensorFlow implementation for the paper: "BiCAT: Sequential Recommendation with Bidirectional Chronological Augmentation of Transfor
The code of "Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer".
Code data_preprocess.py: preprocess data for Dependent-T5. parameters.py: define parameters of Dependent-T5. train_tools.py: traning and evaluation co
Code of the paper "Shaping Visual Representations with Attributes for Few-Shot Learning (ASL)".
Shaping Visual Representations with Attributes for Few-Shot Learning This code implements the Shaping Visual Representations with Attributes for Few-S
Active learning for Mask R-CNN in Detectron2
MaskAL - Active learning for Mask R-CNN in Detectron2 Summary MaskAL is an active learning framework that automatically selects the most-informative i
Pydantic model generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.
datamodel-code-generator This code generator creates pydantic model from an openapi file and others. Help See documentation for more details. Supporte
A Django app that creates automatic web UIs for Python scripts.
Wooey is a simple web interface to run command line Python scripts. Think of it as an easy way to get your scripts up on the web for routine data anal
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
RDFLib RDFLib is a pure Python package for working with RDF. RDFLib contains most things you need to work with RDF, including: parsers and serializers
WebRTC and ORTC implementation for Python using asyncio
aiortc What is aiortc? aiortc is a library for Web Real-Time Communication (WebRTC) and Object Real-Time Communication (ORTC) in Python. It is built o
PyG (PyTorch Geometric) - A library built upon PyTorch to easily write and train Graph Neural Networks (GNNs)
PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.
Superset custom path for python
It is a common requirement to have superset running under a base url, (https://mydomain.at/analytics/ instead of https://mydomain.at/). I created the
A repository with exploration into using transformers to predict DNA ↔ transcription factor binding
Transcription Factor binding predictions with Attention and Transformers A repository with exploration into using transformers to predict DNA ↔ transc
Titanic data analysis for python
Titanic-data-analysis This Repo is an analysis on Titanic_mod.csv This csv file contains some assumed data of the Titanic ship after sinking This full
The Python-Weather-App is a service that provides weather data
The Python-Weather-App is a service that provides weather data, including current weather data to the developers of web services and mobile applications.
Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
Official implementation of the article "Unsupervised JPEG Domain Adaptation For Practical Digital Forensics"
Unsupervised JPEG Domain Adaptation for Practical Digital Image Forensics @WIFS2021 (Montpellier, France) Rony Abecidan, Vincent Itier, Jeremie Boulan
[TNNLS 2021] The official code for the paper "Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement"
CSDNet-CSDGAN this is the code for the paper "Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement" Environment Preparing pyt
A PyTorch implementation of deep-learning-based registration
DiffuseMorph Implementation A PyTorch implementation of deep-learning-based registration. Requirements OS : Ubuntu / Windows Python 3.6 PyTorch 1.4.0
Code for the paper "Learning-Augmented Algorithms for Online Steiner Tree"
Learning-Augmented Algorithms for Online Steiner Tree This is the code for the paper "Learning-Augmented Algorithms for Online Steiner Tree". Requirem
Code basis for the paper "Camera Condition Monitoring and Readjustment by means of Noise and Blur" (2021)
Camera Condition Monitoring and Readjustment by means of Noise and Blur This repository contains the source code of the paper: Wischow, M., Gallego, G
Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)
Blockwise Sequential Model Learning Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022) For ins
How the Deep Q-learning method works and discuss the new ideas that makes the algorithm work
Deep Q-Learning Recommend papers The first step is to read and understand the method that you will implement. It was first introduced in a 2013 paper
Big data on k8s
# microsoft azure # https://docs.microsoft.com/en-us/cli/azure/install-azure-cli az account set --subscription [] az aks get-credentials --resource-g
A super lightweight Lagrangian model for calculating millions of trajectories using ERA5 data
Easy-ERA5-Trck Easy-ERA5-Trck Galleries Install Usage Repository Structure Module Files Version iteration Easy-ERA5-Trck is a super lightweight Lagran
A PyTorch library and evaluation platform for end-to-end compression research
CompressAI CompressAI (compress-ay) is a PyTorch library and evaluation platform for end-to-end compression research. CompressAI currently provides: c
Jiminy, fast and portable Python/C++ simulator of poly-articulated systems with OpenAI Gym interface for reinforcement learning.
Jiminy is a fast and portable cross-platform open-source simulator for poly-articulated systems. It was built with two ideas in mind: provide a fast y
The geospatial toolkit for redistricting data.
maup maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts
A piece of software that shows a traceroute of a URL redirect path
Tracing URL redirects has never been easier! Usage • Download 🚩 Use Cases To see where an affiliate link ends up To see what affiliate network is bei
Delta Sharing: An Open Protocol for Secure Data Sharing
Delta Sharing: An Open Protocol for Secure Data Sharing Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enabl
PyIOmica (pyiomica) is a Python package for omics analyses.
PyIOmica (pyiomica) This repository contains PyIOmica, a Python package that provides bioinformatics utilities for analyzing (dynamic) omics datasets.
A part of HyRiver software stack for accessing hydrology data through web services
Package Description Status PyNHD Navigate and subset NHDPlus (MR and HR) using web services Py3DEP Access topographic data through National Map's 3DEP
Random dataframe and database table generator
Random database/dataframe generator Authored and maintained by Dr. Tirthajyoti Sarkar, Fremont, USA Introduction Often, beginners in SQL or data scien
A 3D Slicer Extension to view data from the flywheel heirarchy
flywheel-connect A 3D Slicer Extension to view, select, and download images from a Flywheel instance to 3D Slicer and storing Slicer outputs back to F
Streaming parser for multipart/form-data written in Python
Streaming multipart/form-data parser streaming_form_data provides a Python parser for parsing multipart/form-data input chunks (the encoding used when
A python module for retrieving and parsing WHOIS data
pythonwhois A WHOIS retrieval and parsing library for Python. Dependencies None! All you need is the Python standard library. Instructions The manual
Libextract: extract data from websites
Libextract is a statistics-enabled data extraction library that works on HTML and XML documents and written in Python
Persistent/Immutable/Functional data structures for Python
Pyrsistent Pyrsistent is a number of persistent collections (by some referred to as functional data structures). Persistent in the sense that they are
Persistent dict, backed by sqlite3 and pickle, multithread-safe.
sqlitedict -- persistent dict, backed-up by SQLite and pickle A lightweight wrapper around Python's sqlite3 database with a simple, Pythonic dict-like
A mutable set that remembers the order of its entries. One of Python's missing data types.
An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index nu
Python tree data library
Links Documentation PyPI GitHub Changelog Issues Contributors If you enjoy anytree Getting started Usage is simple. Construction from anytree impo
A JSON-friendly data structure which allows both object attributes and dictionary keys and values to be used simultaneously and interchangeably.
A JSON-friendly data structure which allows both object attributes and dictionary keys and values to be used simultaneously and interchangeably.
This is an implementation of PEP 557, Data Classes.
This is an implementation of PEP 557, Data Classes. It is a backport for Python 3.6. Because dataclasses will be included in Python 3.7, any discussio
D2LV: A Data-Driven and Local-Verification Approach for Image Copy Detection
Facebook AI Image Similarity Challenge: Matching Track —— Team: imgFp This is the source code of our 3rd place solution to matching track of Image Sim
Microsoft Azure provides a wide number of services for managing and storing data
Microsoft Azure provides a wide number of services for managing and storing data. One product is Microsoft Azure SQL. Which gives us the capability to create and manage instances of SQL Servers hosted in the cloud. This project, demonstrates how to use these services to manage data we collect from different sources.
Upload an Excel/CSV file ( 200 MB) and produce a short summary of the data.
Data-Analysis-Report Deployed App 1. What is this app? Upload an excel/csv file and produce a summary report of the data. 2. Where to upload? How to p
Computer Vision Script to recognize first person motion, developed as final project for the course "Machine Learning and Deep Learning"
Overview of The Code BaseColab/MLDL_FPAR.pdf: it contains the full explanation of our work Base Colab: it contains the base colab used to perform all
1st Solution For NeurIPS 2021 Competition on ML4CO Dual Task
KIDA: Knowledge Inheritance in Data Aggregation This project releases our 1st place solution on NeurIPS2021 ML4CO Dual Task. Slide and model weights a
PyTorch implementation of a collections of scalable Video Transformer Benchmarks.
PyTorch implementation of Video Transformer Benchmarks This repository is mainly built upon Pytorch and Pytorch-Lightning. We wish to maintain a colle
A `Neural = Symbolic` framework for sound and complete weighted real-value logic
Logical Neural Networks LNNs are a novel Neuro = symbolic framework designed to seamlessly provide key properties of both neural nets (learning) and s
Official implementation of the AAAI 2022 paper "Learning Token-based Representation for Image Retrieval"
Token: Token-based Representation for Image Retrieval PyTorch training code for Token-based Representation for Image Retrieval. We propose a joint loc
easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.
easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.
Search for files under the specified directory. Extract the file name and file path and import them as data.
Search for files under the specified directory. Extract the file name and file path and import them as data. Based on that, search for the file, select it and open it.
A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal
A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal, but extensible training loop which is flexible enough to handle the majority of use cases, and capable of utilizing different hardware options with no code changes required.
pandas-gbq is a package providing an interface to the Google BigQuery API from pandas
pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda
Utils for streaming large files (S3, HDFS, gzip, bz2...)
smart_open — utils for streaming large files in Python What? smart_open is a Python 3 library for efficient streaming of very large files from/to stor
A system for quickly generating training data with weak supervision
Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat
Extract data from a wide range of Internet sources into a pandas DataFrame.
pandas-datareader Up to date remote data access for pandas, works for multiple versions of pandas. Installation Install using pip pip install pandas-d
Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of.
Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of. Th
Synthetic Data Generation for tabular, relational and time series data.
An Open Source Project from the Data to AI Lab, at MIT Website: https://sdv.dev Documentation: https://sdv.dev/SDV User Guides Developer Guides Github
Flexible HDF5 saving/loading and other data science tools from the University of Chicago
deepdish Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also host a Deep Learning blog: htt
Python library for reading and writing tabular data via streams.
tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame
A common, beautiful interface to tabular data, no matter the format
rows No matter in which format your tabular data is: rows will import it, automatically detect types and give you high-level Python objects so you can
Tools for test driven data-wrangling and data validation.
datatest: Test driven data-wrangling and data validation Datatest helps to speed up and formalize data-wrangling and data validation tasks. It impleme
Tools for parsing messy tabular data.
Parsing for messy tables A library for dealing with messy tabular data in several formats, guessing types and detecting headers. See the documentation