3252 Python Crime-Data-Batch-Processing Libraries

This tool parses log data and allows to define analysis pipelines for anomaly detection.

logdata-anomaly-miner This tool parses log data and allows to define analysis pipelines for anomaly detection. It was designed to run the analysis wit

32 Nov 27, 2022

Tools for analyzing data collected with a custom unity-based VR for insects.

unityvr Tools for analyzing data collected with a custom unity-based VR for insects. Organization: The unityvr package contains the following submodul

1 Dec 14, 2022

A tool for batch processing large fasta files and accompanying metadata table to upload to repositories via API

Fasta Uploader A tool for batch processing large fasta files and accompanying metadata table to repositories via API The python fasta_uploader.py scri

Centre for Infectious Disease and One Health

1 Dec 9, 2021

A collection of tools for biomedical research assay analysis in Python.

waltlabtools A collection of tools for biomedical research assay analysis in Python. Key Features Analysis for assays such as digital ELISA, including

1 Apr 18, 2022

A learning-based data collection tool for human segmentation

FullBodyFilter A Learning-Based Data Collection Tool For Human Segmentation Contents Documentation Source Code and Scripts Overview of Project Usage O

4 Jun 24, 2022

A collection of robust and fast processing tools for parsing and analyzing web archive data.

ChatNoir Resiliparse A collection of robust and fast processing tools for parsing and analyzing web archive data. Resiliparse is part of the ChatNoir

24 Nov 29, 2022

Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.

Documentation OWPHydroTools GitHub pages documentation Motivation We developed OWPHydroTools with data scientists in mind. We attempted to ensure the

36 Dec 11, 2022

A tool and a library for SVG path data transformations.

SVG path data transformation toolkit A tool and a library for SVG path data transformations. Currently it supports a translation and a scaling. Usage

2 Mar 7, 2022

A python tool for batch importing excel/csv files into mysql database.

PySimpleGUI numpy pandas pymysql chardet

30 Jan 8, 2023

NLP Overview

NLP-Overview Introduction The field of NPL encompasses a variety of topics which involve the computational processing and understanding of human langu

1 Jan 13, 2022

An Open-Source Toolkit for Prompt-Learning.

An Open-Source Framework for Prompt-learning. Overview • Installation • How To Use • Docs • Paper • Citation • What's New? Nov 2021: Now we have relea

2.3k Jan 7, 2023

Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.

13 Oct 7, 2022

Implementation of popular bandit algorithms in batch environments.

batch-bandits Implementation of popular bandit algorithms in batch environments. Source code to our paper "The Impact of Batch Learning in Stochastic

2 Sep 11, 2022

Create shortcuts on Windows to your Python, EXE, Batch files or any other file using a GUI made with PySimpleGUI

shor Windows Shortcut Creation Create Windows Shortcuts to your python programs and any other file easily using this application created using PySimpl

7 Nov 16, 2021

python's memory-saving dictionary data structure

ConstDict python代替的Dict数据结构若字典不会增加字段，只读/原字段修改使用ConstDict可节省内存 Dict()内存主要消耗的地方： 1、Dict扩容机制，预留内存空间 2、Dict也是一个对象，内部会动态维护__dict__，增加slot类属性可以节省内容节省内存大小

1 Nov 3, 2021

This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

1 Nov 3, 2021

LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language

LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language ⚖️ The library of Natural Language Processing for Brazilian legal lang

125 Dec 20, 2022

Public repo for automation scripts

Script_Quickies Public repo for automation scripts Dependencies Chrome webdriver .exe (make sure it matches the version of chrome you are using) Selen

1 Nov 4, 2021

Streamz helps you build pipelines to manage continuous streams of data

Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.

1.1k Dec 28, 2022

MongoDB utility to inflate the contents of small collection to a new larger collection

MongoDB Data Inflater ("data-inflater") The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using

3 Nov 28, 2021

Visualization Data Drug in thailand during 2014 to 2020

Visualization Data Drug in thailand during 2014 to 2020 Data sorce from ข้อมูลเปิดภาครัฐ สำนักงาน ป.ป.ส Inttroducing program Using tkinter module for

1 Jan 5, 2022

Official implementation of Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021).

Generalized Data Weighting via Class-level Gradient Manipulation This repository is the official implementation of Generalized Data Weighting via Clas

9 Nov 3, 2021

nrgpy is the Python package for processing NRG Data Files

nrgpy nrgpy is the Python package for processing NRG Data Files Website and source: https://github.com/nrgpy/nrgpy Documentation: https://nrgpy.github

23 Dec 8, 2022

The Metabolomics Integrator (MINT) is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based metabolomics.

MINT (Metabolomics Integrator) The Metabolomics Integrator (MINT) is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based m

0 May 4, 2022

Data science, Data manipulation and Machine learning package.

duality Data science, Data manipulation and Machine learning package. Use permitted according to the terms of use and conditions set by the attached l

3 Oct 19, 2022

Extract the temperature data of each wire from the thermal imager raw data.

Wire-Tempurature-Detection Extract the temperature data of each wire from the thermal imager raw data. The motivation of this computer vision project

1 Nov 3, 2021

VevestaX is an open source Python package for ML Engineers and Data Scientists.

VevestaX Track failed and successful experiments as well as features. VevestaX is an open source Python package for ML Engineers and Data Scientists.

24 Dec 14, 2022

Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.

sensecap_api_checker_HELIUM Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.

1 Nov 3, 2021

scikit-multimodallearn is a Python package implementing algorithms multimodal data.

scikit-multimodallearn is a Python package implementing algorithms multimodal data. It is compatible with scikit-learn, a popul

12 Jun 29, 2022

Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

DistMIS Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation. DistriMIS Distributing Deep Learning Hyperparameter Tuning

2 Sep 9, 2022

Generalized Data Weighting via Class-level Gradient Manipulation

Generalized Data Weighting via Class-level Gradient Manipulation This repository is the official implementation of Generalized Data Weighting via Clas

18 Nov 12, 2022

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Thank you for you

671 Jan 3, 2023

Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec

Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec This repo

Building and Urban Data Science (BUDS) Group

5 Dec 2, 2022

SW components and demos for visual kinship recognition. An emphasis is put on the FIW dataset-- data loaders, benchmarks, results in summary.

FIW Data Development Kit Table of Contents Introduction Families In the Wild Database Publications Organization To Do License Getting Involved Introdu

12 Jun 4, 2022

[NeurIPS 2021] “Improving Contrastive Learning on Imbalanced Data via Open-World Sampling”

Improving Contrastive Learning on Imbalanced Data via Open-World Sampling Introduction Contrastive learning approaches have achieved great success in

4 Nov 3, 2021

Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available for research purposes.

Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available f

5 Nov 10, 2022

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity Indic TTS Samples can be found at https://peter-yh-wu.github.io/cross-

1 Nov 12, 2022

Code for a seq2seq architecture with Bahdanau attention designed to map stereotactic EEG data from human brains to spectrograms, using the PyTorch Lightning.

stereoEEG2speech We provide code for a seq2seq architecture with Bahdanau attention designed to map stereotactic EEG data from human brains to spectro

15 Nov 11, 2022

Bayes-Newton—A Gaussian process library in JAX, with a unifying view of approximate Bayesian inference as variants of Newton's algorithm.

Bayes-Newton Bayes-Newton is a library for approximate inference in Gaussian processes (GPs) in JAX (with objax), built and actively maintained by Wil

165 Nov 27, 2022

The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualizing NFT data from OpenSea, using PostgreSQL and TimescaleDB.

Timescale NFT Starter Kit The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualiz

102 Dec 24, 2022

This is the accompanying repository for the Bloomberg Global Coal Countdown website.

This is the accompanying repository for the Bloomberg Global Coal Countdown (BGCC) website. Data Sources Dashboard Data Schema and Validation License

7 Jun 1, 2022

Privacy as Code for DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests.

Meet Fidesops: Privacy as Code for DSAR Orchestration A part of the greater Fides ecosystem. ⚡ Overview Fidesops (fee-dez-äps, combination of the Lati

44 Dec 6, 2022

A tool for hiding data inside of images

Stegenography-tool a tool for hiding data inside of images Quick test: do python steg-encode.py test/message.txt test/covid19.png to generate the test

2 Nov 2, 2021

Planning Algorithms in AI and Robotics. MSc course at Skoltech Data Science program

Planning Algorithms in AI and Robotics course T2 2021-22 The Planning Algorithms in AI and Robotics course at Skoltech, MS in Data Science, during T2,

6 Sep 21, 2022

PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1) This repository contains the PyTorch implementation of "A Two

10 Aug 19, 2022

EOD Historical Data Python Library (Unofficial)

EOD Historical Data Python Library (Unofficial) https://eodhistoricaldata.com Installation python3 -m pip install eodhistoricaldata Note Demo API key

20 Dec 22, 2022

A selection of SQLite3 databases to practice querying from.

Dummy SQL Databases This is a collection of dummy SQLite3 databases, for learning and practicing SQL querying, generated with the VS Code extension Ge

1 Feb 26, 2022

This is an auto-ML tool specialized in detecting of outliers

Auto-ML tool specialized in detecting of outliers Description This tool will allows you, with a Dash visualization, to compare 10 models of machine le

1 Nov 3, 2021

Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.

Official repository of the paper (UAI 2021) "A Variational Approximation for Analyzing the Dynamics of Panel Data", Mixed Effect Neural ODE. Panel dat

7 Nov 26, 2022

Data Poisoning based on Adversarial Attacks using Non-Robust Features

Data Poisoning based on Adversarial Attacks using Non-Robust Features Usage python main.py [-h] [--gpu | -g GPU] [--eps |-e EPSILON] [--pert | -p PER

1 Nov 2, 2021

this repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

1 Nov 2, 2021

A terminal spreadsheet multitool for discovering and arranging data

VisiData v2.6.1 A terminal interface for exploring and arranging tabular data. VisiData supports tsv, csv, sqlite, json, xlsx (Excel), hdf5, and many

6.2k Jan 4, 2023

DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021]

DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021] Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng

Automation and Intelligence for Civil Engineering (AI4CE) Lab @ NYU

98 Dec 21, 2022

Book on Julia for Data Science

349 Dec 25, 2022

A large-scale database for graph representation learning

29 Nov 25, 2022

In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift.

ETL Pipeline for AWS Project Description In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift. The data is loaded from S3 t

1 Nov 1, 2021

Data on Free Food at MIT

MIT Free Food Timing Procrastinating research by plotting data on how long it takes emails on the free-food at mit edu mailing list to go through. Dat

2 Nov 1, 2021

Minimal working example of data acquisition with nidaqmx python API

Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It

1 Nov 5, 2021

A toolkit for geo ML data processing and model evaluation (fork of solaris)

An open source ML toolkit for overhead imagery. This is a beta version of lunular which may continue to develop. Please report any bugs through issues

4 Nov 4, 2021

Batch Python Program Verify

Batch Python Program Verify About As a TA(teaching assistant) of Programming Class, it is very annoying to test students' homework assignments one by

7 Dec 20, 2022

Data cleaning tools for Business analysis

Datacleaning datacleaning tools for Business analysis This program is made for Vicky's work. You can use it, too. 数据清洗该数据清洗工具是为了商业分析这个程序是为了Vicky的工作而

3 Nov 16, 2021

Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

Twitch Revenues Bu script'i kullanarak istediğiniz yayıncıların, Twitch'den sızdırılan 125 GB'lik veriye dayanarak, 2019-2021 arası aylık gelirlerini

4 Nov 11, 2021

This repo represents all we learned and are learning in Data Structure course.

DataStructure Journey This repo represents all we learned and are learning in Data Structure course which is based on CLRS book and is being taught by

3 Jan 22, 2022

Enhancing Knowledge Tracing via Adversarial Training

Enhancing Knowledge Tracing via Adversarial Training This repository contains source code for the paper "Enhancing Knowledge Tracing via Adversarial T

14 Oct 24, 2022

convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format.

convert-to-opus-cli convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format. Installation Must have installed ffmp

4 Dec 21, 2022

Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)

Discovering Non-monotonic Autoregressive Orderings with Variational Inference Description This package contains the source code implementation of the

10 Dec 29, 2022

Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions

Aquarius Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions NOTE: We are currently going through the open-source process requir

0 Jun 2, 2022

Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation

deeptime Releases: Installation via conda recommended. conda install -c conda-forge deeptime pip install deeptime Documentation: deeptime-ml.github.io

495 Dec 28, 2022

How to Leverage Multimodal EHR Data for Better Medical Predictions?

How to Leverage Multimodal EHR Data for Better Medical Predictions? This repository contains the code of the paper: How to Leverage Multimodal EHR Dat

13 Dec 13, 2022

A command line tool for visualizing CSV/spreadsheet-like data

PerfPlotter Read data from CSV files using pandas and generate interactive plots using bokeh, which can then be embedded into HTML pages and served by

0 Jun 25, 2022

Use case: quick JSON processing/restructuring with jq without terminal

alfred-jq Alfred workflow to conveniently process JQ on clipboard based on a jq query Also available at: packal/jq Use case: quick JSON processing/res

5 Sep 30, 2022

Natural Language Processing

NLP Natural Language Processing apps Multilingual_NLP.py start #This script is demonstartion of Mul

1 Oct 31, 2021

Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly

Table of contents Introduction Dataset Model & Metrics How to Run Quickstart Install Training Evaluation Detection DATA COMPETITION The COVID-19 pande

1 Feb 27, 2022

Terminal-based audio-to-text converter

att Terminal-based audio-to-text converter Project description A terminal-based audio-to-text converter written in python, enabling you to convert .wa

4 Dec 15, 2022

Scraping web pages to get data

Scraping Data Get public data and save in database This is project use Python How to run a project 1 - Clone the repository 2 - Install beautifulsoup4

2 Nov 1, 2021

Python Processing Tool for Vasp Ipnut/Output

PivotPy A Python Processing Tool for Vasp Input/Output. A CLI is available in Powershell, see Vasp2Visual. stylea{text-decoration: none !important;c

5 Aug 16, 2022

[IROS2021] NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

NYU-VPR This repository provides the experiment code for the paper Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymiza

22 Sep 28, 2022

PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline from a configuration file.

PdpCLI Quick Links Introduction Installation Tutorial Basic Usage Data Reader / Writer Plugins Introduction PdpCLI is a pandas DataFrame processing CL

15 Jan 7, 2022

Python package with library and CLI tool for analyzing SeaFlow data

Seaflowpy A Python package for SeaFlow flow cytometer data. Table of Contents Install Read EVT/OPP/VCT Files Command-line Interface Configuration Inte

3 Nov 3, 2021

Python and data science snippets on the command line

Python Snippet Tool A tool to get Python and data science snippets at Data Science Simplified on the command line. You can read my article to learn ho

19 Dec 21, 2022

Splitgraph command line client and python library

Splitgraph Overview Splitgraph is a tool for building, versioning and querying reproducible datasets. It's inspired by Docker and Git, so it feels fam

313 Dec 24, 2022

Ralph is a command-line tool to fetch, extract, convert and push your tracking logs from various storage backends to your LRS or any other compatible storage or database backend.

Ralph is a command-line tool to fetch, extract, convert and push your tracking logs (aka learning events) from various storage backends to your

18 Jan 5, 2023

TAug :: Time Series Data Augmentation using Deep Generative Models

TAug :: Time Series Data Augmentation using Deep Generative Models Note!!! The package is under development so be careful for using in production! Fea

35 Dec 6, 2022

Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently generate a wide range of illumination variants of a single image.

Deep Illuminator Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently generate a wide

52 Nov 29, 2022

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au

14 Nov 28, 2022

IndoNLI: A Natural Language Inference Dataset for Indonesian

IndoNLI: A Natural Language Inference Dataset for Indonesian This is a repository for data and code accompanying our EMNLP 2021 paper "IndoNLI: A Natu

15 Feb 10, 2022

A general, feasible, and extensible framework for classification tasks.

Pytorch Classification A general, feasible and extensible framework for 2D image classification. Features Easy to configure (model, hyperparameters) T

26 Nov 22, 2022

This provides the R code and data to replicate results in "The USS Trustee’s risky strategy"

USSBriefs2021 This provides the R code and data to replicate results in "The USS Trustee’s risky strategy" by Neil M Davies, Jackie Grant and Chin Yan

1 Oct 30, 2021

Open source single image super-resolution toolbox containing various functionality for training a diverse number of state-of-the-art super-resolution models. Also acts as the companion code for the IEEE signal processing letters paper titled 'Improving Super-Resolution Performance using Meta-Attention Layers’.

Deep-FIR Codebase - Super Resolution Meta Attention Networks About This repository contains the main coding framework accompanying our work on meta-at

17 Jun 17, 2022

[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

MosaicKD Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data" 1. Motivation Natural images share common l

37 Nov 10, 2022

Codes and Data Processing Files for our paper.

Code Scripts and Processing Files for EEG Sleep Staging Paper 1. Folder Tree ./src_preprocess (data preprocessing files for SHHS and Sleep EDF) sleepE

18 Dec 12, 2022

Easily Process a Batch of Cox Models

ezcox: Easily Process a Batch of Cox Models The goal of ezcox is to operate a batch of univariate or multivariate Cox models and return tidy result. ⏬

15 May 23, 2022

Simple data balancing baselines for worst-group-accuracy benchmarks.

BalancingGroups Code to replicate the experimental results from Simple data balancing baselines achieve competitive worst-group-accuracy. Replicating

29 Dec 2, 2022

Official project repository for 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination'

NCAE_UAD Official project repository of 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination' Abstract In this p

2 Feb 10, 2022

Python Crime-Data-Batch-Processing Resources

Python Crime-Data-Batch-Processing Libraries

This tool parses log data and allows to define analysis pipelines for anomaly detection.

Tools for analyzing data collected with a custom unity-based VR for insects.

A tool for batch processing large fasta files and accompanying metadata table to upload to repositories via API

A collection of tools for biomedical research assay analysis in Python.

A learning-based data collection tool for human segmentation

A collection of robust and fast processing tools for parsing and analyzing web archive data.

Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.

A tool and a library for SVG path data transformations.

A python tool for batch importing excel/csv files into mysql database.

NLP Overview

An Open-Source Toolkit for Prompt-Learning.

Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.

Implementation of popular bandit algorithms in batch environments.

Create shortcuts on Windows to your Python, EXE, Batch files or any other file using a GUI made with PySimpleGUI

python's memory-saving dictionary data structure

This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language

Public repo for automation scripts

Streamz helps you build pipelines to manage continuous streams of data

MongoDB utility to inflate the contents of small collection to a new larger collection

Visualization Data Drug in thailand during 2014 to 2020

Official implementation of Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021).

nrgpy is the Python package for processing NRG Data Files

The Metabolomics Integrator (MINT) is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based metabolomics.

Data science, Data manipulation and Machine learning package.

Extract the temperature data of each wire from the thermal imager raw data.

VevestaX is an open source Python package for ML Engineers and Data Scientists.

Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.

scikit-multimodallearn is a Python package implementing algorithms multimodal data.

Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Generalized Data Weighting via Class-level Gradient Manipulation

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec

SW components and demos for visual kinship recognition. An emphasis is put on the FIW dataset-- data loaders, benchmarks, results in summary.

[NeurIPS 2021] “Improving Contrastive Learning on Imbalanced Data via Open-World Sampling”

Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available for research purposes.

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Code for a seq2seq architecture with Bahdanau attention designed to map stereotactic EEG data from human brains to spectrograms, using the PyTorch Lightning.

Bayes-Newton—A Gaussian process library in JAX, with a unifying view of approximate Bayesian inference as variants of Newton's algorithm.

The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualizing NFT data from OpenSea, using PostgreSQL and TimescaleDB.

This is the accompanying repository for the Bloomberg Global Coal Countdown website.

Privacy as Code for DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests.

A tool for hiding data inside of images

Planning Algorithms in AI and Robotics. MSc course at Skoltech Data Science program

PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

EOD Historical Data Python Library (Unofficial)

A selection of SQLite3 databases to practice querying from.

This is an auto-ML tool specialized in detecting of outliers

Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.

Data Poisoning based on Adversarial Attacks using Non-Robust Features

this repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

A terminal spreadsheet multitool for discovering and arranging data

DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021]

Book on Julia for Data Science

A large-scale database for graph representation learning

In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift.

Data on Free Food at MIT

Minimal working example of data acquisition with nidaqmx python API

A toolkit for geo ML data processing and model evaluation (fork of solaris)

Batch Python Program Verify

Data cleaning tools for Business analysis

Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

This repo represents all we learned and are learning in Data Structure course.

Enhancing Knowledge Tracing via Adversarial Training

convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format.

Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)

Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions

Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation

How to Leverage Multimodal EHR Data for Better Medical Predictions?

A command line tool for visualizing CSV/spreadsheet-like data

Use case: quick JSON processing/restructuring with jq without terminal

Natural Language Processing

Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly

Terminal-based audio-to-text converter

Scraping web pages to get data

Python Processing Tool for Vasp Ipnut/Output

[IROS2021] NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline from a configuration file.