3321 Repositories
Python Iris-Data-Set-Classification Libraries
Codes for our IJCAI21 paper: Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization
DDAMS This is the pytorch code for our IJCAI 2021 paper Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization [Arxiv Pr
Procedural 3D data generation pipeline for architecture
Synthetic Dataset Generator Authors: Stanislava Fedorova Alberto Tono Meher Shashwat Nigam Jiayao Zhang Amirhossein Ahmadnia Cecilia bolognesi Dominik
Image data augmentation scheduler for albumentations transforms
albu_scheduler Scheduler for albumentations transforms based on PyTorch schedulers interface Usage TransformMultiStepScheduler import albumentations a
An Unsupervised Graph-based Toolbox for Fraud Detection
An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s
[CVPR 2021] Exemplar-Based Open-Set Panoptic Segmentation Network (EOPSN)
EOPSN: Exemplar-Based Open-Set Panoptic Segmentation Network (CVPR 2021) PyTorch implementation for EOPSN. We propose open-set panoptic segmentation t
Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way
Apache Liminals goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
A simple demonstration of how a django-based website can be set up for local development with microk8s
Django with MicroK8s Start Building Your Project This project provides a Django web app running as a single node Kubernetes cluster in microk8s. It is
A powerful bot to copy your google drive data to your team drive
⚛️ Clonebot - Heroku version ⚡ CloneBot is a telegram bot that allows you to copy folder/team drive to team drives. One of the main advantage of this
A Lucid Framework for Transparent and Interpretable Machine Learning Models.
Currently a Beta-Version lucidmode is an open-source, low-code and lightweight Python framework for transparent and interpretable machine learning mod
Cloud-native, data onboarding architecture for the Google Cloud Public Datasets program
Public Datasets Pipelines Cloud-native, data pipeline architecture for onboarding datasets to the Google Cloud Public Datasets Program. Overview Requi
This is a repository for the Duke University Cloud Computing course project on Serveless Data Engineering Pipeline. For this project, I recreated the below pipeline.
AWS Data Engineering Pipeline This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For
Donors data of Tamil Nadu Chief Ministers Relief Fund scrapped from https://ereceipt.tn.gov.in/cmprf/Interface/CMPRF/MonthWiseReport
Tamil Nadu Chief Minister's Relief Fund Donors Scrapped data from https://ereceipt.tn.gov.in/cmprf/Interface/CMPRF/MonthWiseReport Scrapper scrapper.p
Python utility function to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed
iterable-subprocess Python utility function to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)
Python Streaming Anomaly Detection (PySAD) PySAD is an open-source python framework for anomaly detection on streaming multivariate data. Documentatio
Aggregating gridded data (xarray) to polygons
A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons.
Graphsignal Logger
Graphsignal Logger Overview Graphsignal is an observability platform for monitoring and troubleshooting production machine learning applications. It h
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once
Show Data: Show your dataset in web browser!
Show Data is to generate html tables for large scale image dataset, especially for the dataset in remote server. It provides some useful commond line tools and fully customizeble API reference to generate html table different tasks.
Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stream Framework Activity Streams & Newsfeeds Stream Framework is a Python library which allows you to build activity streams & newsfeeds using Cassan
Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision This repo contains PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision. Usage : impo
The programm for collecting data from Tinkoff API and building Excel table.
tinkproject The program for portfolio analysis via Tinkoff API Hello! This is my first project, please, don't judge me. This project was developed for
tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.
Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai
Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.
Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,
Purge your likes and wall comments from VKontakte. Set yourself free from your digital footprint.
vk_liberator Regain liberty in the cruel social media world. This program assists you with purging your metadata from Russian social network VKontakte
Implementation of ResMLP, an all MLP solution to image classification, in Pytorch
ResMLP - Pytorch Implementation of ResMLP, an all MLP solution to image classification out of Facebook AI, in Pytorch Install $ pip install res-mlp-py
A curated list of programmatic weak supervision papers and resources
A curated list of programmatic weak supervision papers and resources
Deep learning toolbox based on PyTorch for hyperspectral data classification.
Deep learning toolbox based on PyTorch for hyperspectral data classification.
Code & Data for Enhancing Photorealism Enhancement
Code & Data for Enhancing Photorealism Enhancement
A framework for cleaning Chinese dialog data
A framework for cleaning Chinese dialog data
Unofficial Implementation of MLP-Mixer in TensorFlow
mlp-mixer-tf Unofficial Implementation of MLP-Mixer [abs, pdf] in TensorFlow. Note: This project may have some bugs in it. I'm still learning how to i
Simple HTML and PDF document generator for Python - with built-in support for popular data analysis and plotting libraries.
Esparto is a simple HTML and PDF document generator for Python. Its primary use is for generating shareable single page reports with content from popular analytics and data science libraries.
Simple but maybe too simple config management through python data classes. We use it for machine learning.
👩✈️ Coqpit Simple, light-weight and no dependency config handling through python data classes with to/from JSON serialization/deserialization. Curre
Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch
30 Days Of Machine Learning Using Pytorch Objective of the repository is to learn and build machine learning models using Pytorch. List of Algorithms
Repository for XLM-T, a framework for evaluating multilingual language models on Twitter data
This is the XLM-T repository, which includes data, code and pre-trained multilingual language models for Twitter. XLM-T - A Multilingual Language Mode
Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models
Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models. A paraphrase framework is more than just a paraphrasing model.
Hue Editor: Open source SQL Query Assistant for Databases/Warehouses
Hue Editor: Open source SQL Query Assistant for Databases/Warehouses
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Texar-PyTorch is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar
Unsupervised Language Modeling at scale for robust sentiment classification
** DEPRECATED ** This repo has been deprecated. Please visit Megatron-LM for our up to date Large-scale unsupervised pretraining and finetuning code.
Data manipulation and transformation for audio signal processing, powered by PyTorch
torchaudio: an audio library for PyTorch The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the
Statsmodels: statistical modeling and econometrics in Python
About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an
Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions.
A set of high-level abstractions for Django forms
django-formtools Django's "formtools" is a set of high-level abstractions for Django forms. Currently for form previews and multi-step forms. This cod
Ray provides a simple, universal API for building distributed applications.
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
erdantic is a simple tool for drawing entity relationship diagrams (ERDs) for Python data model classes
erdantic is a simple tool for drawing entity relationship diagrams (ERDs) for Python data model classes. Diagrams are rendered using the venerable Graphviz library.
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP] Unofficial Pytorch implementation of AdaSpeech 2. Requirements : All code written i
CLEAR algorithm for multi-view data association
CLEAR: Consistent Lifting, Embedding, and Alignment Rectification Algorithm The Matlab, Python, and C++ implementation of the CLEAR algorithm, as desc
Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER"
LADA This repo contains codes for the following paper: Jiaao Chen*, Zhenghui Wang*, Ran Tian, Zichao Yang, Diyi Yang: Local Additivity Based Data Augm
Codes for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)
SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge Introduction SentiLARE is a sentiment-aware pre-trained language
Learning Representational Invariances for Data-Efficient Action Recognition
Learning Representational Invariances for Data-Efficient Action Recognition Official PyTorch implementation for Learning Representational Invariances
Domain Generalization with MixStyle, ICLR'21.
MixStyle This repo contains the code of our ICLR'21 paper, "Domain Generalization with MixStyle". The OpenReview link is https://openreview.net/forum?
Official implementation of "Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets" (CVPR2021)
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets This is the official implementation of "Towards Good Pract
Django application and library for importing and exporting data with admin integration.
django-import-export django-import-export is a Django application and library for importing and exporting data with included admin integration. Featur
Packages of Example Data for The Effect
causaldata This repository will contain R, Stata, and Python packages, all called causaldata, which contain data sets that can be used to implement th
hyppo is an open-source software package for multivariate hypothesis testing.
hyppo (HYPothesis Testing in PythOn, pronounced "Hippo") is an open-source software package for multivariate hypothesis testing.
Implementation of different ML Algorithms from scratch, written in Python 3.x
Implementation of different ML Algorithms from scratch, written in Python 3.x
A command line tool for memorizing algorithms in Python by typing them.
Algo Drills A command line tool for memorizing algorithms in Python by typing them. In alpha and things will change. How it works Type out an algorith
TorchFlare is a simple, beginner-friendly, and easy-to-use PyTorch Framework train your models effortlessly.
TorchFlare TorchFlare is a simple, beginner-friendly and an easy-to-use PyTorch Framework train your models without much effort. It provides an almost
Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)
Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu,
Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised GeometryEstimation"
SharinGAN Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised GeometryEstimation" The official project we
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Research shows Google collects 20x more data from Android than Apple collects from iOS. Block this non-consensual telemetry using pihole blocklists.
pihole-antitelemetry Research shows Google collects 20x more data from Android than Apple collects from iOS. Block both using these pihole lists. Proj
Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.
SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom
CLI and Streamlit applications to create APIs from Excel data files within seconds, using FastAPI
FastAPI-Wrapper CLI & APIness Streamlit App Arvindra Sehmi, Oxford Economics Ltd. | Website | LinkedIn (Updated: 21 April, 2021) fastapi-wrapper is mo
Xarray backend to Copernicus Sentinel-1 satellite data products
xarray-sentinel WARNING: this product is a "technology preview" / pre-Alpha Xarray backend to explore and load Copernicus Sentinel-1 satellite data pr
A scikit-learn-compatible module for estimating prediction intervals.
|Anaconda|_ MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals using your favourite sklearn
Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation
Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C
CorNet Correlation Networks for Extreme Multi-label Text Classification
CorNet Correlation Networks for Extreme Multi-label Text Classification Prerequisites python==3.6.3 pytorch==1.2.0 torchgpipe==0.0.5 click==7.0 ruamel
🏅 The Most Comprehensive List of Kaggle Solutions and Ideas 🏅
🏅 Collection of Kaggle Solutions and Ideas 🏅
Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2020)`
Human Attention for Text Classification Re-implementation of the paper Human Attention Maps for Text Classification: Do Humans and Neural Networks Foc
Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(2021) paper
ImageNet-21K Pretraining for the Masses Paper | Pretrained models Official PyTorch Implementation Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelni
Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages
Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1
stock data on eink with raspberry
small python skript to display tradegate data on a waveshare e-ink important you need locale "de_AT.UTF-8 UTF-8" installed. do so in raspi-config's Lo
Attack classification models with transferability, black-box attack; unrestricted adversarial attacks on imagenet
Attack classification models with transferability, black-box attack; unrestricted adversarial attacks on imagenet, CVPR2021 安全AI挑战者计划第六期:ImageNet无限制对抗攻击 决赛第四名(team name: Advers)
Y. Zhang, Q. Yao, W. Dai, L. Chen. AutoSF: Searching Scoring Functions for Knowledge Graph Embedding. IEEE International Conference on Data Engineering (ICDE). 2020
AutoSF The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding" and this paper has been accepted by ICDE2020. News:
Focus on Algorithm Design, Not on Data Wrangling
The dataTap Python library is the primary interface for using dataTap's rich data management tools. Create datasets, stream annotations, and analyze model performance all with one library.
The repo contains the code of the ACL2020 paper `Dice Loss for Data-imbalanced NLP Tasks`
Dice Loss for NLP Tasks This repository contains code for Dice Loss for Data-imbalanced NLP Tasks at ACL2020. Setup Install Package Dependencies The c
skweak: A software toolkit for weak supervision applied to NLP tasks
Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels without pre-existing datasets. The only available option is often to collect and annotate texts by hand, which is expensive and time-consuming.
Draw datasets from within Jupyter.
drawdata This small python app allows you to draw a dataset in a jupyter notebook. This should be very useful when teaching machine learning algorithm
Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.
ETL Pipeline with Airflow, Spark, s3, MongoDB and Amazon Redshift
[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021
Compact Transformers Preprint Link: Escaping the Big Data Paradigm with Compact Transformers By Ali Hassani[1]*, Steven Walton[1]*, Nikhil Shah[1], Ab
Simple, light-weight config handling through python data classes with to/from JSON serialization/deserialization.
Simple but maybe too simple config management through python data classes. We use it for machine learning.
PyTorch implementation of neural style randomization for data augmentation
README Augment training images for deep neural networks by randomizing their visual style, as described in our paper: https://arxiv.org/abs/1809.05375
Weakly supervised medical named entity classification
Trove Trove is a research framework for building weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers
Implementation of Convolutional enhanced image Transformer
CeiT : Convolutional enhanced image Transformer This is an unofficial PyTorch implementation of Incorporating Convolution Designs into Visual Transfor
Diffgram - Supervised Learning Data Platform
Data Annotation, Data Labeling, Annotation Tooling, Training Data for Machine Learning
addon for blender to import mocap data from tools like easymocap, frankmocap and Vibe
b3d_mocap_import addon for blender to import mocap data from tools like easymocap, frankmocap and Vibe ==================VIBE================== To use
The Django Leaflet Admin List package provides an admin list view featured by the map and bounding box filter for the geo-based data of the GeoDjango.
The Django Leaflet Admin List package provides an admin list view featured by the map and bounding box filter for the geo-based data of the GeoDjango. It requires a django-leaflet package.
📼Command line tool based on youtube-dl to easily download selected channels from your subscriptions.
youtube-cdl Command line tool based on youtube-dl to easily download selected channels from your subscriptions. This tool is very handy if you want to
Policy and data administration, distribution, and real-time updates on top of Open Policy Agent
⚡ OPAL ⚡ Open Policy Administration Layer OPAL is an administration layer for Open Policy Agent (OPA), detecting changes to both policy and policy dat
Visualize classified time series data with interactive Sankey plots in Google Earth Engine
sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P
Code related to "Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity" paper
DataTuner You have just found the DataTuner. This repository provides tools for fine-tuning language models for a task. See LICENSE.txt for license de
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
Introduction PyTorch3D provides efficient, reusable components for 3D Computer Vision research with PyTorch. Key features include: Data structure for
An end-to-end PyTorch framework for image and video classification
What's New: March 2021: Added RegNetZ models November 2020: Vision Transformers now available, with training recipes! 2020-11-20: Classy Vision v0.5 R
Sandbox for training deep learning networks
Deep learning networks This repo is used to research convolutional networks primarily for computer vision tasks. For this purpose, the repo contains (
Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)
Image Classification Project Killer in PyTorch This repo is designed for those who want to start their experiments two days before the deadline and ki
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
DABO: Data Augmentation with Bilevel Optimization
DABO: Data Augmentation with Bilevel Optimization [Paper] The goal is to automatically learn an efficient data augmentation regime for image classific