3255 Repositories
Python probabilistic-data-analysis Libraries
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI) and Reading Comprehension Question-Answering (RCQA)
An tiny CLI to load data from a JSON File during development.
JSON Server - An tiny CLI to load data from a JSON File during development.
flexible time-series processing & feature extraction
tsflex is a toolkit for flexible time-series processing & feature extraction, making few assumptions about input data. Useful links Documentation Exam
LifeSaver automatically, periodically saves USB flash drive data into the PC
LifeSaver automatically, periodically saves USB flash drive data into the PC. Theoriticaly it will work with any any connected drive ex - Hard Disk ,SSD ... But, can't handle Backing up multipatition drives. I can guess, but cannot be sure of, how it will react to multipartiton system.
Save data from Instagram takeout to a SQLite database
instagram-to-sqlite Save data from a Instagram takeout to a SQLite database. Mise En Place git clone https://github.com/gavindsouza/instagram-to-sqlit
List of short Codeforces problems with a statement of 1000 characters or less. Python script and data files included.
Shortest problems on Codeforces List of Codeforces problems with a short problem statement of 1000 characters or less. Sorted for each rating level. B
fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.
Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc
Performance data for WASM SIMD instructions.
WASM SIMD Data This repository contains code and data which can be used to generate a JSON file containing information about the WASM SIMD proposal. F
A tool for the creation of rooms used in maps in the game Wastelands
Wastelands Room Data editor A tool for the creation of rooms used in maps in the game Wastelands Creates .wrd files, that get loaded by the map genera
🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.
🧪📈 🐍. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python and HoloViz Panel.
spafe: Simplified Python Audio-Features Extraction
spafe aims to simplify features extractions from mono audio files. The library can extract of the following features: BFCC, LFCC, LPC, LPCC, MFCC, IMFCC, MSRCC, NGCC, PNCC, PSRCC, PLP, RPLP, Frequency-stats etc. It also provides various filterbank modules (Mel, Bark and Gammatone filterbanks) and other spectral statistics.
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙♀️
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙♀️
Python code for "Machine learning: a probabilistic perspective" (2nd edition)
Python code for "Machine learning: a probabilistic perspective" (2nd edition)
Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.
Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN
Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN If you use this code for your research, please cite ou
This websocket program is for data transmission between server and client. Data transmission is for Federated Learning in Edge computing environment.
websocket-for-data-transmission This websocket program is for data transmission between server and client. Data transmission is for Federated Learning
MDAnalysis tool to calculate membrane curvature.
The MDAkit for membrane curvature analysis is part of the Google Summer of Code program and it is linked to a Code of Conduct.
NBEATSx: Neural basis expansion analysis with exogenous variables
NBEATSx: Neural basis expansion analysis with exogenous variables We extend the NBEATS model to incorporate exogenous factors. The resulting method, c
[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable
Unlearnable Examples Code for ICLR2021 Spotlight Paper "Unlearnable Examples: Making Personal Data Unexploitable " by Hanxun Huang, Xingjun Ma, Sarah
A gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor.
OpenHands OpenHands is a gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor. Currently the system can iden
AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations
AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations. Each modality’s augmentations are contained within its own sub-library. These sub-libraries include both function-based and class-based transforms, composition operators, and have the option to provide metadata about the transform applied, including its intensity.
A set of functions and analysis classes for solvation structure analysis
SolvationAnalysis The macroscopic behavior of a liquid is determined by its microscopic structure. For ionic systems, like batteries and many enzymes,
The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)
We propose a hierarchical core-fringe learning framework to measure fine-grained domain relevance of terms – the degree that a term is relevant to a broad (e.g., computer science) or narrow (e.g., deep learning) domain.
A data-driven maritime port simulator
PySeidon - A Data-Driven Maritime Port Simulator 🌊 Extendable and modular software for maritime port simulation. This software uses entity-component
IMGUR5K handwriting set. It is a handwritten in-the-wild dataset, which contains challenging real world handwritten samples from different writers.The dataset is shared as a set of image urls with annotations. This code downloads the images and verifies the hash to the image to avoid data contamination.
IMGUR5K Handwriting Dataset To run the code for downloading the urls and generate corresponding annotations : Usage: python download_imgur5k.py --data
A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms
Denoising Diffusion Probabilistic Model for Proteins Implementation of Denoising Diffusion Probabilistic Model in Pytorch. It is a new approach to gen
resultados (data) de elecciones 2021 y código para extraer data de la ONPE
elecciones-peru-2021-ONPE Resultados (data) de elecciones 2021 y código para extraer data de la ONPE Data Licencia liberal, pero si vas a usarlo por f
Source code and data from the RecSys 2020 article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" by W. Bendada, G. Salha and T. Bontempelli
Carousel Personalization in Music Streaming Apps with Contextual Bandits - RecSys 2020 This repository provides Python code and data to reproduce expe
Code and data for the paper "Hearing What You Cannot See"
Hearing What You Cannot See: Acoustic Vehicle Detection Around Corners Public repository of the paper "Hearing What You Cannot See: Acoustic Vehicle D
Vent domain information retrieval tool, which is capable of retrieving customer information
Vent domain information retrieval tool, which is capable of retrieving customer information. This tool has been created for the purpose of complete education, Iam not responsible for any illegal activities.
Provide Prometheus url_sd compatible API Endpoint with data from Netbox
netbox-plugin-prometheus-sd Provide Prometheus http_sd compatible API Endpoint with data from Netbox. HTTP SD is a new feature in Prometheus and not a
The official PyTorch implementation of recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
This repository is the official PyTorch implementation of SAINT. Find the paper on arxiv SAINT: Improved Neural Networks for Tabular Data via Row Atte
Source codes of CenterTrack++ in 2021 ICME Workshop on Big Surveillance Data Processing and Analysis
MOT Tracked object bounding box association (CenterTrack++) New association method based on CenterTrack. Two new branches (Tracked Size and IOU) are a
Code implementation of "Sparsity Probe: Analysis tool for Deep Learning Models"
Sparsity Probe: Analysis tool for Deep Learning Models This repository is a limited implementation of Sparsity Probe: Analysis tool for Deep Learning
Vehicle Identification Speed Detection (VISD) extracts vehicle information like License Plate number, Manufacturer and colour from a video and provides this data in the form of a CSV file
Vehicle Identification Speed Detection (VISD) extracts vehicle information like License Plate number, Manufacturer and colour from a video and provides this data in the form of a CSV file. VISD can also perform vehicle speed detection on a video. All these features of VSID are provided to the user using a Web Application which is created using Flask
Code and data of the ACL 2021 paper: Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision
MetaAdaptRank This repository provides the implementation of meta-learning to reweight synthetic weak supervision data described in the paper Few-Shot
State-of-the-art data augmentation search algorithms in PyTorch
MuarAugment Description MuarAugment is a package providing the easiest way to a state-of-the-art data augmentation pipeline. How to use You can instal
This is django-import-export module that exports data into many formats
django-import-export This is django-import-export module which exports data into many formats, you can implement this in your admin panel. - Dehydrat
InsightFace: 2D and 3D Face Analysis Project on MXNet and PyTorch
InsightFace: 2D and 3D Face Analysis Project on MXNet and PyTorch
ESP32 python application to read data from a Tilt™ Hydrometer for homebrewing
TitlESP32 ESP32 MicroPython application to read and log data from a Tilt™ Hydrometer. Requirements A board with an ESP32 chip USB cable - USB A / micr
Official PyTorch implementation for FastDPM, a fast sampling algorithm for diffusion probabilistic models
Official PyTorch implementation for "On Fast Sampling of Diffusion Probabilistic Models". FastDPM generation on CIFAR-10, CelebA, and LSUN datasets. S
LinkedIn connections analyzer
LinkedIn Connections Analyzer 🔗 https://linkedin-analzyer.herokuapp.com Hey hey 👋 , welcome to my LinkedIn connections analyzer. I recently found ou
Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data.
Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data. So, if you don't need the whole corpus, but just a suitable subset (indeed, a cor(pus sub)set, this is what Corset will do for you--and the reason of the name of the tool.
Probabilistic Gradient Boosting Machines
PGBM Probabilistic Gradient Boosting Machines (PGBM) is a probabilistic gradient boosting framework in Python based on PyTorch/Numba, developed by Air
A Curated Collection of Awesome Python Scripts
A Curated Collection of Awesome Python Scripts that will make you go wow. This repository will help you in getting those green squares. Hop in and enjoy the journey of open source. 🚀
Streamlit component for Let's-Plot visualization library
streamlit-letsplot This is a work-in-progress, providing a convenience function to plot charts from the Lets-Plot visualization library. Example usage
The BCNet related data and inference model.
BCNet This repository includes the some source code and related dataset of paper BCNet: Learning Body and Cloth Shape from A Single Image, ECCV 2020,
A framework for analyzing computer vision models with simulated data
3DB: A framework for analyzing computer vision models with simulated data Paper Quickstart guide Blog post Installation Follow instructions on: https:
This code reproduces the results of the paper, "Measuring Data Leakage in Machine-Learning Models with Fisher Information"
Fisher Information Loss This repository contains code that can be used to reproduce the experimental results presented in the paper: Awni Hannun, Chua
Adversarial Attacks on Probabilistic Autoregressive Forecasting Models.
Attack-Probabilistic-Models This is the source code for Adversarial Attacks on Probabilistic Autoregressive Forecasting Models. This repository contai
Training data extraction on GPT-2
Training data extraction from GPT-2 This repository contains code for extracting training data from GPT-2, following the approach outlined in the foll
This is the official Pytorch implementation of "Lung Segmentation from Chest X-rays using Variational Data Imputation", Raghavendra Selvan et al. 2020
README This is the official Pytorch implementation of "Lung Segmentation from Chest X-rays using Variational Data Imputation", Raghavendra Selvan et a
[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression
Delving into Deep Imbalanced Regression This repository contains the implementation code for paper: Delving into Deep Imbalanced Regression Yuzhe Yang
Learn Blockchains by Building One, A simple Blockchain in Python using Flask as a micro web framework.
Blockchain ✨ Learn Blockchains by Building One Yourself Installation Make sure Python 3.6+ is installed. Install Flask Web Framework. Clone this repos
Paddle implementation for "Highly Efficient Knowledge Graph Embedding Learning with Closed-Form Orthogonal Procrustes Analysis" (NAACL 2021)
ProcrustEs-KGE Paddle implementation for Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis 🙈 A more detailed re
Efficient Lottery Ticket Finding: Less Data is More
The lottery ticket hypothesis (LTH) reveals the existence of winning tickets (sparse but critical subnetworks) for dense networks, that can be trained in isolation from random initialization to match the latter’s accuracies.
Using context-free grammar formalism to parse English sentences to determine their structure to help computer to better understand the meaning of the sentence.
Sentance Parser Executing the Program Make sure Python 3.6+ is installed. Install requirements $ pip install requirements.txt Run the program:
Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX
SYMPAIS: Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis Overview | Installation | Documentation | Examples | Notebo
Simple Python application to transform Serial data into OSC messages
SerialToOSC-Bridge Simple Python application to transform Serial data into OSC messages. The current purpose is to be a compatibility layer between ha
Python implementation of the Density Line Chart by Moritz & Fisher.
PyDLC - Density Line Charts with Python Python implementation of the Density Line Chart (Moritz & Fisher, 2018) to visualize large collections of time
Boosted neural network for tabular data
XBNet - Xtremely Boosted Network Boosted neural network for tabular data XBNet is an open source project which is built with PyTorch which tries to co
code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"
SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext
This is the research repository for Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition.
Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition This is the research repository for Vid2
A standalone package to scrape financial data from listed Vietnamese companies via Vietstock
Scrape Financial Data of Vietnamese Listed Companies - Version 2 A standalone package to scrape financial data from listed Vietnamese companies via Vi
Repository for the "Gotta Go Fast When Generating Data with Score-Based Models" paper
Gotta Go Fast When Generating Data with Score-Based Models This repo contains the official implementation for the paper Gotta Go Fast When Generating
🔬 A curated list of awesome machine learning strategies & tools in financial market.
🔬 A curated list of awesome machine learning strategies & tools in financial market.
APKLeaks - Scanning APK file for URIs, endpoints & secrets.
APKLeaks - Scanning APK file for URIs, endpoints & secrets.
Registration Loss Learning for Deep Probabilistic Point Set Registration
RLLReg This repository contains a Pytorch implementation of the point set registration method RLLReg. Details about the method can be found in the 3DV
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"
Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh
Lumen provides a framework for visual analytics, which allows users to build data-driven dashboards from a simple yaml specification
Lumen project provides a framework for visual analytics, which allows users to build data-driven dashboards from a simple yaml specification
Home Assistant for Opendata CWB. Get the weather forecast of the city in Taiwan.
Home assistant support for Opendata CWB. The readme in Traditional Chinese. This integration is based on OpenWeatherMap (@csparpa, pyowm) to develop.
Solve various integral equations using numerical methods in Python
Solve Volterra and Fredholm integral equations This Python package estimates Volterra and Fredholm integral equations using known techniques. Installa
A New, Interactive Approach to Learning Python
This is the repository for The Python Workshop, published by Packt. It contains all the supporting project files necessary to work through the course from start to finish.
Keras-1D-ACGAN-Data-Augmentation
Keras-1D-ACGAN-Data-Augmentation What is the ACGAN(Auxiliary Classifier GANs) ? Related Paper : [Abstract : Synthesizing high resolution photorealisti
A Python Module That Uses ANN To Predict A Stocks Price And Also Provides Accurate Technical Analysis With Many High Potential Implementations!
Stox A Module to predict the "close price" for the next day and give "technical analysis". It uses a Neural Network and the LSTM algorithm to predict
Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving
SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving Abstract In this paper, we introduce SalsaNext f
The implementation of the algorithm in the paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020.
DS3L This is the code for paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020. Setups The code is implem
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling For Official repo of NU-Wave: A Diffusion Probabilistic Model for Neural Audio Up
Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"
Self-Supervised-MVS This repository is the official PyTorch implementation of our AAAI 2021 paper: "Self-supervised Multi-view Stereo via Effective Co
Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts The rapid progress in 3D scene understanding has come with growing dem
Deep functional residue identification
DeepFRI Deep functional residue identification Citing @article {Gligorijevic2019, author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Koscio
Data & Code for ACCENTOR Adding Chit-Chat to Enhance Task-Oriented Dialogues
ACCENTOR: Adding Chit-Chat to Enhance Task-Oriented Dialogues Overview ACCENTOR consists of the human-annotated chit-chat additions to the 23.8K dialo
pyprobables is a pure-python library for probabilistic data structures
pyprobables is a pure-python library for probabilistic data structures. The goal is to provide the developer with a pure-python implementation of common probabilistic data-structures to use in their work.
leafmap - A Python package for geospatial analysis and interactive mapping in a Jupyter environment.
A Python package for geospatial analysis and interactive mapping with minimal coding in a Jupyter environment
SiamMOT is a region-based Siamese Multi-Object Tracking network that detects and associates object instances simultaneously.
SiamMOT: Siamese Multi-Object Tracking
Code for the paper "Adapting Monolingual Models: Data can be Scarce when Language Similarity is High"
Wietse de Vries • Martijn Bartelds • Malvina Nissim • Martijn Wieling Adapting Monolingual Models: Data can be Scarce when Language Similarity is High
EOReader is a multi-satellite reader allowing you to open optical and SAR data.
Remote-sensing opensource python library reading optical and SAR sensors, loading and stacking bands, clouds, DEM and index.
Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data
federated is the source code for the Bachelor's Thesis Privacy-Preserving Federated Learning Applied to Decentralized Data (Spring 2021, NTNU) Federat
Orchest is a browser based IDE for Data Science.
Orchest is a browser based IDE for Data Science. It integrates your favorite Data Science tools out of the box, so you don’t have to. The application is easy to use and can run on your laptop as well as on a large scale cloud cluster.
Code for reproducing our analysis in the paper titled: Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency
Image Crop Analysis This is a repo for the code used for reproducing our Image Crop Analysis paper as shared on our blog post. If you plan to use this
Codes for our IJCAI21 paper: Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization
DDAMS This is the pytorch code for our IJCAI 2021 paper Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization [Arxiv Pr
Procedural 3D data generation pipeline for architecture
Synthetic Dataset Generator Authors: Stanislava Fedorova Alberto Tono Meher Shashwat Nigam Jiayao Zhang Amirhossein Ahmadnia Cecilia bolognesi Dominik
Image data augmentation scheduler for albumentations transforms
albu_scheduler Scheduler for albumentations transforms based on PyTorch schedulers interface Usage TransformMultiStepScheduler import albumentations a
Deep Probabilistic Programming Course @ DIKU
Deep Probabilistic Programming Course @ DIKU
An Unsupervised Graph-based Toolbox for Fraud Detection
An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s
Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way
Apache Liminals goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
A powerful bot to copy your google drive data to your team drive
⚛️ Clonebot - Heroku version ⚡ CloneBot is a telegram bot that allows you to copy folder/team drive to team drives. One of the main advantage of this
A Lucid Framework for Transparent and Interpretable Machine Learning Models.
Currently a Beta-Version lucidmode is an open-source, low-code and lightweight Python framework for transparent and interpretable machine learning mod
Cloud-native, data onboarding architecture for the Google Cloud Public Datasets program
Public Datasets Pipelines Cloud-native, data pipeline architecture for onboarding datasets to the Google Cloud Public Datasets Program. Overview Requi
This is a repository for the Duke University Cloud Computing course project on Serveless Data Engineering Pipeline. For this project, I recreated the below pipeline.
AWS Data Engineering Pipeline This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For