3466 Python Data-pre-processing Libraries

📜 GPT-2 Rhyming Limerick and Haiku models using data augmentation

Well-formed Limericks and Haikus with GPT2 📜 GPT-2 Rhyming Limerick and Haiku models using data augmentation In collaboration with Matthew Korahais &

2 May 26, 2022

A desktop application developed in Python with PyQt5 to predict demand and help monitor and schedule brewing processes for Barnaby's Brewhouse.

brewhouse-management A desktop application developed in Python with PyQt5 to predict demand and help monitor and schedule brewing processes for Barnab

2 Jul 9, 2022

Resources related to our paper "CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain"

CLIN-X (CLIN-X-ES) & (CLIN-X-EN) This repository holds the companion code for the system reported in the paper: "CLIN-X: pre-trained language models a

4 Dec 5, 2022

Data Platform com AWS CDK

Welcome to your CDK Python project! This is a blank project for Python development with CDK. The cdk.json file tells the CDK Toolkit how to execute yo

8 Jul 2, 2022

Get MODBUS data from Sofar (K-TLX) inverter through LSW-3 or LSE module

SOFAR Inverter + LSW-3/LSE Small utility to read data from SOFAR K-TLX inverters through the Solarman (LSW-3/LSE) datalogger. Two scripts to get inver

58 Dec 29, 2022

Data and analysis relating to the 5.8M Melbourne quake of 2021

quake2021 Data and analysis relating to the 5.8M Melbourne quake of 2021 Monash University Woodside Living Lab Building The building is located here T

6 May 16, 2022

Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing

Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing. It can be mainly used for non-English language to get accurate and relevant scraped text.

10 Jul 6, 2022

Automatic data visualization in atom with the nteract data-explorer

Data Explorer Interactively explore your data directly in atom with hydrogen! The nteract data-explorer provides automatic data visualization, so you

65 Dec 1, 2022

Tidy data structures, summaries, and visualisations for missing data

naniar naniar provides principled, tidy ways to summarise, visualise, and manipulate missing data with minimal deviations from the workflows in ggplot

611 Dec 22, 2022

Catch-all collection of generative art made using processing

Generative art with Processing.py Some art I have created for fun. Dependencies Processing for Python, see how to download/use here Packages contained

2 Mar 12, 2022

Streaming over lightweight data transformations

Description Data augmentation libarary for Deep Learning, which supports images, segmentation masks, labels and keypoints. Furthermore, SOLT is fast a

Research Unit of Medical Imaging, Physics and Technology

256 Jan 8, 2023

A knowledge base construction engine for richly formatted data

Fonduer is a Python package and framework for building knowledge base construction (KBC) applications from richly formatted data. Note that Fonduer is

386 Dec 5, 2022

computer vision, image processing and machine learning on the web browser or node.

Image processing and Machine learning labs computer vision, image processing and machine learning on the web browser or node note Fast Fourier Trans

487 Nov 11, 2022

Automatic labeling, conversion of different data set formats, sample size statistics, model cascade

Simple Gadget Collection for Object Detection Tasks Automatic image annotation Conversion between different annotation formats Obtain statistical info

4 Aug 24, 2022

This is a web crawler that works on employ email data by gmane.org and visualizes it in different ways.

crawler_to_visual_gmane Analyzing an EMAIL Archive from gmane and vizualizing the data using the D3 JavaScript library. This is a set of tools that al

1 Dec 20, 2021

Base on browser-time to get har from network, and use python to analyze the data .

base on browser-time to get har from network, and use python to analyze the data

1 Dec 20, 2021

A custom qq-plot for two sample data comparision

QQ-Plot 2 Sample Just a gist to include the custom code to draw a qq-plot in python when dealing with a "two sample problem". This means when u try to

1 Dec 20, 2021

This script provides LIVE feedback for On-The-Fly data collection with RELION

README This script provides LIVE feedback for On-The-Fly data collection with RELION (very useful to explore already processed datasets too!) Creating

6 Jul 14, 2022

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

Table of contents Introduction Using BARTpho with fairseq Using BARTpho with transformers Notes BARTpho: Pre-trained Sequence-to-Sequence Models for V

58 Dec 23, 2022

Fast methods to work with hydro- and topography data in pure Python.

PyFlwDir Intro PyFlwDir contains a series of methods to work with gridded DEM and flow direction datasets, which are key to many workflows in many ear

27 Dec 7, 2022

Pytorch library for seismic data augmentation

27 Nov 22, 2022

A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data

A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data Overview Clustering analysis is widely utilized in single-cell RNA-seque

3 May 8, 2022

Turning images into '9-pan' palettes using KMeans clustering from sklearn.

img2palette Turning images into '9-pan' palettes using KMeans clustering from sklearn. Requirements We require: Pillow, for opening and processing ima

2 Jan 1, 2022

Highly decentralized and censorship-resistant way to store key data

Beacon coin Beacon coin is a Chia singelton coin that can store data that needs to be: always available censorship resistant versioned potentially imm

24 Oct 4, 2022

Convert Table data to approximate values with GUI

Table_Editor Convert Table data to approximate values with GUIs... usage - Import methods for extension Tables. Imported method supposed to have only

1 Jan 10, 2022

Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.

LUSID® Python SDK This is the Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilit

6 Dec 24, 2022

simple way to build the declarative and destributed data pipelines with python

unipipeline simple way to build the declarative and distributed data pipelines. Why you should use it Declarative strict config Scaffolding Fully type

0 Jan 26, 2022

Python library for creating data pipelines with chain functional programming

PyFunctional Features PyFunctional makes creating data pipelines easy by using chained functional operators. Here are a few examples of what it can do

2.1k Jan 5, 2023

Automation that uses Github Actions, Google Drive API, YouTube Data API and youtube-dl together to feed BackJam app with new music

1 Nov 21, 2021

Evaluation of file formats in the context of geo-referenced 3D geometries.

Geo-referenced Geometry File Formats Classic geometry file formats as .obj, .off, .ply, .stl or .dae do not support the utilization of coordinate syst

Advanced Information Systems and Technology

11 Mar 2, 2022

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

asammdf is a fast parser and editor for ASAM (Association for Standardization of Automation and Measuring Systems) MDF (Measurement Data Format) files

440 Dec 31, 2022

🌎 The Modern Declarative Data Flow Framework for the AI Empowered Generation.

🌎 JSONClasses JSONClasses is a declarative data flow pipeline and data graph framework. Official Website: https://www.jsonclasses.com Official Docume

53 Dec 9, 2022

A PyTorch-based model pruning toolkit for pre-trained language models

English | 中文说明 TextPruner是一个为预训练语言模型设计的模型裁剪工具包，通过轻量、快速的裁剪方法对模型进行结构化剪枝，从而实现压缩模型体积、提升模型速度。其他相关资源：知识蒸馏工具TextBrewer：https://github.com/airaria/TextBrewe

231 Jan 8, 2023

A full pipeline AutoML tool for tabular data

HyperGBM Doc | 中文 We Are Hiring！ Dear folks,we are offering challenging opportunities located in Beijing for both professionals and students who are k

240 Jan 3, 2023

Feature Store for Machine Learning

Overview Feast is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and

3.8k Dec 30, 2022

A distributed block-based data storage and compute engine

Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.

131 Dec 26, 2022

AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation

AtlasNet [Project Page] [Paper] [Talk] AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation Thibault Groueix, Matthew Fisher, Vladimir

577 Dec 17, 2022

Use .csv files to record, play and evaluate motion capture data.

Purpose These scripts allow you to record mocap data to, and play from .csv files. This approach facilitates parsing of body movement data in statisti

21 Dec 12, 2022

Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

75 Dec 19, 2022

Python package for analyzing sensor-collected human motion data

71 Nov 5, 2022

Motion Reconstruction Code and Data for Skills from Videos (SFV)

Motion Reconstruction Code and Data for Skills from Videos (SFV) This repo contains the data and the code for motion reconstruction component of the S

268 Dec 1, 2022

demir.ai Dataset Operations

demir.ai Dataset Operations With this application, you can have the empty values (nan/null) deleted or filled before giving your dataset to machine le

8 Nov 1, 2022

pyo is a Python module written in C to help digital signal processing script creation.

1.1k Jan 1, 2023

Itchio Downloader Tool with python

Itchio Downloader Tool Install pip install git+https://github.com/emersont1/itchio Download All Games in library from account python -m itchio.downloa

69 Dec 5, 2022

Process RunGap output file of a workout and load data into Apple Numbers Spreadsheet and my website with API calls

1 Jan 3, 2022

Howell County, Missouri, COVID-19 data and (unofficial) estimates

COVID-19 in Howell County, Missouri This repository contains the daily data files used to generate my COVID-19 dashboard for Howell County, Missouri,

0 Jun 18, 2022

BERT, LDA, and TFIDF based keyword extraction in Python

BERT, LDA, and TFIDF based keyword extraction in Python kwx is a toolkit for multilingual keyword extraction based on Google's BERT and Latent Dirichl

41 Dec 27, 2022

Functions for easily making publication-quality figures with matplotlib.

Data-viz utils 📈 Functions for data visualization in matplotlib 📚 API Can be installed using pip install dvu and then imported with import dvu. You

16 Sep 15, 2022

BERT-based Financial Question Answering System

BERT-based Financial Question Answering System In this example, we use Jina, PyTorch, and Hugging Face transformers to build a production-ready BERT-b

61 Sep 18, 2022

Continuously evaluated, functional, incremental, time-series forecasting

timemachines Autonomous, univariate, k-step ahead time-series forecasting functions assigned Elo ratings You can: Use some of the functionality of a s

343 Jan 4, 2023

Code for: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification Prerequisite PyTorch = 1.2.0 Python3 torch

16 Dec 14, 2022

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021))

PTvsBT On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021) Citation Please cite a

10 Nov 25, 2022

ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

ARKitScenes This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D

371 Jan 5, 2023

Code for TIP 2017 paper --- Illumination Decomposition for Photograph with Multiple Light Sources.

Illumination_Decomposition Code for TIP 2017 paper --- Illumination Decomposition for Photograph with Multiple Light Sources. This code implements the

7 Nov 15, 2020

A DiY holiday project to demonstrate how you can send data from adafruitIO cloud to a balena edge device

holiday-star balena ❤️ adafruitIO Introduction A DiY holiday project to demonstrate how you can send data from adafruitIO cloud to a balena edge devic

3 Dec 20, 2021

Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature

Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature Q. Wan, L. Gao, X. Li and L. Wen, "Industrial Image Anomaly

6 Dec 25, 2022

A hangman game that I created. Thanks to Data Flair for giving me the code!

Hangman A hangman game that I created. Thanks to Data Flair for giving me the code! Run python3 hangman.py in a terminal if you have Python 3. Please

0 Dec 24, 2022

A Streamlit web-app for a data-science project that aims to evaluate if the answer to a question is helpful.

How useful is the aswer? A Streamlit web-app for a data-science project that aims to evaluate if the answer to a question is helpful. If you want to l

1 Dec 17, 2021

DataShare - Simple library for data sharing between scripts and public functions calling

DataShare - Simple library for data sharing between scripts and public functions calling. Installation. Install code, Delete LICENSE, README, readme.t

1 Dec 17, 2021

User-friendly Voice Cloning Application

Multi-Language-RTVC stands for Multi-Language Real Time Voice Cloning and is a Voice Cloning Tool capable of transfering speaker-specific audio featur

19 Dec 30, 2022

Ensembling Off-the-shelf Models for GAN Training

Data-Efficient GANs with DiffAugment project | paper | datasets | video | slides Generated using only 100 images of Obama, grumpy cats, pandas, the Br

1.2k Dec 26, 2022

Some pre-commit hooks for OpenMMLab projects

pre-commit-hooks Some pre-commit hooks for OpenMMLab projects. Using pre-commit-hooks with pre-commit Add this to your .pre-commit-config.yaml - rep

16 Nov 29, 2022

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

About The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficien

2k Dec 29, 2022

Fit models to your data in Python with Sherpa.

Table of Contents Sherpa License How To Install Sherpa Using Anaconda Using pip Building from source History Release History Sherpa Sherpa is a modeli

134 Jan 7, 2023

Download candlestick data fast & easy for analysis

crypto-candlesticks 📈 The goal behind this project is to facilitate downloading cryptocurrency candlestick data fast & simple. Currently only the Bit

31 Dec 11, 2022

Label data using HuggingFace's transformers and automatically get a prediction service

Label Studio for Hugging Face's Transformers Website • Docs • Twitter • Join Slack Community Transfer learning for NLP models by annotating your textu

135 Dec 29, 2022

Synchrosqueezing, wavelet transforms, and time-frequency analysis in Python

Synchrosqueezing is a powerful reassignment method that focuses time-frequency representations, and allows extraction of instantaneous amplitudes and frequencies

382 Jan 6, 2023

🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software

479 Jan 1, 2023

Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

Codes-for-Algorithms Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

1 Apr 12, 2022

A model that attempts to learn and benefit from data collected on card counting.

A model that attempts to learn and benefit from data collected on card counting. A decision tree like model is built to win more often than loose and increase the bet of the player appropriately to come out winning as much money as possible.

1 Dec 17, 2021

Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Overcooked-AI We suppose to apply traditional offline reinforcement learning technique to multi-agent algorithm. In this repository, we implemented be

14 Sep 16, 2022

For radiometrically calibrating and PSF deconvolving IRIS data

irispreppy For radiometrically calibrating and PSF deconvolving IRIS data. I dislike how I need to own proprietary software (IDL) just to simply prepa

4 Nov 1, 2022

Data 25 Star Wars Project With Python

Data 25 Star Wars Project Instructions The character data in your MongoDB database has been pulled from https://swapi.tech/. As well as 'people', the

1 Dec 24, 2021

UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing UniSpeech (ICML 202

281 Dec 15, 2022

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

SEW (Squeezed and Efficient Wav2vec) The repo contains the code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speec

67 Dec 1, 2022

Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE)

[中文|English] Rotary Transformer Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE). The RoPE is a relative

211 Dec 8, 2021

ByT5: Towards a token-free future with pre-trained byte-to-byte models

ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 is a tokenizer-free extension of the mT5 model. Instead of using a subword

409 Jan 6, 2023

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation

564 Jan 8, 2023

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

342 Jan 5, 2023

Tools to download and cleanup Common Crawl data

cc_net Tools to download and clean Common Crawl as introduced in our paper CCNet. If you found these resources useful, please consider citing: @inproc

483 Jan 2, 2023

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained models

77.1k Dec 31, 2022

Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

PEGASUS library Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised

1.4k Dec 22, 2022

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine

7.8k Jan 9, 2023

Unsupervised Language Model Pre-training for French

FlauBERT and FLUE FlauBERT is a French BERT trained on a very large and heterogeneous French corpus. Models of different sizes are trained using the n

212 Dec 10, 2022

Large-scale pretraining for dialogue

A State-of-the-Art Large-scale Pretrained Response Generation Model (DialoGPT) This repository contains the source code and trained model for a large-

1.8k Jan 7, 2023

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Introduction Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduc

197 Dec 11, 2022

Code for EMNLP20 paper: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training"

ProphetNet-X This repo provides the code for reproducing the experiments in ProphetNet. In the paper, we propose a new pre-trained language model call

394 Dec 17, 2022

CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT This repo provides the code for reproducing the experiments in CodeBERT: A Pre-Trained Model for Programming and Natural Languages. CodeBERT

1k Jan 3, 2023

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

2.1k Dec 28, 2022

Awesome Treasure of Transformers Models Collection

💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️

577 Jan 7, 2023

A dashboard for Raspberry Pi to display environmental weather data, rain radar, weather forecast, etc. written in Python

Weather Clock for Raspberry PI This project is a dashboard for Raspberry Pi to display environmental weather data, rain radar, weather forecast, etc.

1 May 1, 2022

Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data

FTLNet_Pytorch Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data 1. Introduction This repo is an unofficial

1 Nov 4, 2020

A Pytorch reproduction of Range Loss, which is proposed in paper 《Range Loss for Deep Face Recognition with Long-Tailed Training Data》

RangeLoss Pytorch This is a Pytorch reproduction of Range Loss, which is proposed in paper 《Range Loss for Deep Face Recognition with Long-Tailed Trai

7 Nov 27, 2021

Python Data-pre-processing Resources

Python data-pre-processing Libraries

📜 GPT-2 Rhyming Limerick and Haiku models using data augmentation

A desktop application developed in Python with PyQt5 to predict demand and help monitor and schedule brewing processes for Barnaby's Brewhouse.

Resources related to our paper "CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain"

Data Platform com AWS CDK

Get MODBUS data from Sofar (K-TLX) inverter through LSW-3 or LSE module

Data and analysis relating to the 5.8M Melbourne quake of 2021

Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing

Automatic data visualization in atom with the nteract data-explorer

Tidy data structures, summaries, and visualisations for missing data

Catch-all collection of generative art made using processing

Streaming over lightweight data transformations

A knowledge base construction engine for richly formatted data

computer vision, image processing and machine learning on the web browser or node.

Automatic labeling, conversion of different data set formats, sample size statistics, model cascade

This is a web crawler that works on employ email data by gmane.org and visualizes it in different ways.

Base on browser-time to get har from network, and use python to analyze the data .

A custom qq-plot for two sample data comparision

This script provides LIVE feedback for On-The-Fly data collection with RELION

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

Fast methods to work with hydro- and topography data in pure Python.

Pytorch library for seismic data augmentation

A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data

Turning images into '9-pan' palettes using KMeans clustering from sklearn.

Highly decentralized and censorship-resistant way to store key data

Convert Table data to approximate values with GUI

Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.

simple way to build the declarative and destributed data pipelines with python

Python library for creating data pipelines with chain functional programming

Automation that uses Github Actions, Google Drive API, YouTube Data API and youtube-dl together to feed BackJam app with new music

Evaluation of file formats in the context of geo-referenced 3D geometries.

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

🌎 The Modern Declarative Data Flow Framework for the AI Empowered Generation.

A PyTorch-based model pruning toolkit for pre-trained language models

A full pipeline AutoML tool for tabular data

Feature Store for Machine Learning

A distributed block-based data storage and compute engine

AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation

Use .csv files to record, play and evaluate motion capture data.

Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Python package for analyzing sensor-collected human motion data

Motion Reconstruction Code and Data for Skills from Videos (SFV)

demir.ai Dataset Operations

pyo is a Python module written in C to help digital signal processing script creation.

Itchio Downloader Tool with python

Process RunGap output file of a workout and load data into Apple Numbers Spreadsheet and my website with API calls

Howell County, Missouri, COVID-19 data and (unofficial) estimates

BERT, LDA, and TFIDF based keyword extraction in Python

Functions for easily making publication-quality figures with matplotlib.

BERT-based Financial Question Answering System

Continuously evaluated, functional, incremental, time-series forecasting

Code for: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021))

ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

Code for TIP 2017 paper --- Illumination Decomposition for Photograph with Multiple Light Sources.

A DiY holiday project to demonstrate how you can send data from adafruitIO cloud to a balena edge device

Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature

A hangman game that I created. Thanks to Data Flair for giving me the code!

A Streamlit web-app for a data-science project that aims to evaluate if the answer to a question is helpful.

DataShare - Simple library for data sharing between scripts and public functions calling

User-friendly Voice Cloning Application

Ensembling Off-the-shelf Models for GAN Training

Some pre-commit hooks for OpenMMLab projects

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

Fit models to your data in Python with Sherpa.

Download candlestick data fast & easy for analysis

Label data using HuggingFace's transformers and automatically get a prediction service

Synchrosqueezing, wavelet transforms, and time-frequency analysis in Python

🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

A model that attempts to learn and benefit from data collected on card counting.

Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

For radiometrically calibrating and PSF deconvolving IRIS data

Data 25 Star Wars Project With Python

UniSpeech - Large Scale Self-Supervised Learning for Speech

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE)

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.