566 Repositories
Python awesome-public-datasets Libraries
Active Learning demo using two small datasets
ActiveLearningDemo How to run step one put the dataset folder and use command below to split the dataset to the required structure run utils.py For ea
STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The selection of datasets include text from image captions, news headlines and user forums.
stsb_multi_mt_en STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 an
Instant search for and access to many datasets in Pyspark.
SparkDataset Provides instant access to many datasets right from Pyspark (in Spark DataFrame structure). Drop a star if you like the project. 😃 Motiv
this repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here
uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain
Clint is a module filled with a set of awesome tools for developing commandline applications.
Clint: Python Command-line Interface Tools Clint is a module filled with a set of awesome tools for developing commandline applications. C ommand L in
🐍 Python CLI tool to get public information from a GitHub account
🐍 Gitter 🐍 Python CLI tool to get public information from a GitHub account 🤔 What's this? Gitter is a open-source project created to easily uses th
img-proof (IPA) provides a command line utility to test images in the Public Cloud
overview img-proof (IPA) provides a command line utility to test images in the Public Cloud (AWS, Azure, GCE, etc.). With img-proof you can now test c
A tutorial for people to run synthetic data replica's from source healthcare datasets
Synthetic-Data-Replica-for-Healthcare Description What is this? A tailored hands-on tutorial showing how to use Python to create synthetic data replic
Official Datasets and Implementation from our Paper "Video Class Agnostic Segmentation in Autonomous Driving".
Video Class Agnostic Segmentation [Method Paper] [Benchmark Paper] [Project] [Demo] Official Datasets and Implementation from our Paper "Video Class A
HW 2: Visualizing interesting datasets
HW 2: Visualizing interesting datasets Check out the project instructions here! Mean Earnings per Hour for Males and Females My first graph uses data
Asterisk is a framework to generate high-quality training datasets at scale
Asterisk is a framework to generate high-quality training datasets at scale
A dataset handling library for computer vision datasets in LOST-fromat
A dataset handling library for computer vision datasets in LOST-fromat
HM02: Visualizing Interesting Datasets
HM02: Visualizing Interesting Datasets This is a homework assignment for CSCI 40 class at Claremont McKenna College. Go to the project page to learn m
Multivariate Time Series Transformer, public version
Multivariate Time Series Transformer Framework This code corresponds to the paper: George Zerveas et al. A Transformer-based Framework for Multivariat
🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.
🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.
Awesome Spectral Indices in Python.
Awesome Spectral Indices in Python: Numpy | Pandas | GeoPandas | Xarray | Earth Engine | Planetary Computer | Dask GitHub: https://github.com/davemlz/
Efficient Training of Visual Transformers with Small Datasets
Official codes for "Efficient Training of Visual Transformers with Small Datasets", NerIPS 2021.
Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)
Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy
an elegant datasets factory
rawbuilder an elegant datasets factory Free software: MIT license Documentation: https://rawbuilder.readthedocs.io. Features Schema oriented datasets
Build, deploy and extract satellite public constellations with one command line.
SatExtractor Build, deploy and extract satellite public constellations with one command line. Table of Contents About The Project Getting Started Stru
Public Implementation of ChIRo from "Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations"
Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations This directory contains the model architectures and experimental
I made this so I can control my Tapo L510 light bulb and Govee H6159 light strip using the PyP100 module and the Govee public API
TAPO-And-Govee-Controller I made this so I can control my Tapo L510 light bulb and Govee H6159 light strip using the PyP100 module and the Govee publi
Extremely simple and fast extreme multi-class and multi-label classifiers.
napkinXC napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification, that focus of implementing various m
Code and datasets for the paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction"
KnowPrompt Code and datasets for our paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction" Requireme
Pytorch implementation for "Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets" (ECCV 2020 Spotlight)
Distribution-Balanced Loss [Paper] The implementation of our paper Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets (
Rhythm bot clone for discord written in Python and uses YouTube to get media files.
Tunebot About Rhythm bot clone for discord written in Python and uses YouTube to get media files. Usage You need a .env file within the same directory
QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.
QuakeLabeler Quake Labeler was born from the need for seismologists and developers who are not AI specialists to easily, quickly, and independently bu
A python app which aggregates and splits costs from multiple public cloud providers into a csv
Cloud Billing This project aggregates the costs public cloud resources by accounts, services and tags by importing the invoices from public cloud prov
An NLP library with Awesome pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.
简体中文 | English News [2021-10-12] PaddleNLP 2.1版本已发布!新增开箱即用的NLP任务能力、Prompt Tuning应用示例与生成任务的高性能推理! 🎉 更多详细升级信息请查看Release Note。 [2021-08-22]《千言:面向事实一致性的生
The "breathing k-means" algorithm with datasets and example notebooks
The Breathing K-Means Algorithm (with examples) The Breathing K-Means is an approximation algorithm for the k-means problem that (on average) is bette
Glue is a python project to link visualizations of scientific datasets across many files.
Glue Glue is a python project to link visualizations of scientific datasets across many files. Click on the image for a quick demo: Features Interacti
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard.
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard. Features 🔥 Article monitor: helps you capture the trend at a
Show Public IP Information In Linux Taskbar
IP Information In Linux Taskbar 📍 How Use IP Script? 🤔 Download ip.py script and save somewhere in your system. Add command applet in your taskbar a
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Tensor2Tensor Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and ac
Public repo for the ICCV2021-CVAMD paper "Is it Time to Replace CNNs with Transformers for Medical Images?"
Is it Time to Replace CNNs with Transformers for Medical Images? Accepted at ICCV-2021: Workshop on Computer Vision for Automated Medical Diagnosis (C
The first online catalogue for Arabic NLP datasets.
Masader The first online catalogue for Arabic NLP datasets. This catalogue contains 200 datasets with more than 25 metadata annotations for each datas
A repository for collating all the resources such as articles, blogs, papers, and books related to Bayesian Statistics.
A repository for collating all the resources such as articles, blogs, papers, and books related to Bayesian Statistics.
A curated list of awesome resources combining Transformers with Neural Architecture Search
A curated list of awesome resources combining Transformers with Neural Architecture Search
A public API written in Python using the Flask web framework to determine the direction of a road sign using AI
python-public-API This repository is a public API for solving the problem of the final of the AIIJC competition. The task is to create an AI for the c
Benchmark datasets, data loaders, and evaluators for graph machine learning
Overview The Open Graph Benchmark (OGB) is a collection of benchmark datasets, data loaders, and evaluators for graph machine learning. Datasets cover
Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers.
ConditionalQA Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. Disclaimer This dataset
Better GitHub statistics images for your profile, with stats from private and public repos
Better GitHub statistics images for your profile, with stats from private and public repos
Lite version of my Gatekeeper backdoor for public use.
MayorSec Backdoor Fully functioning bind-type backdoor This backdoor is a fully functioning bind shell and lite version of my full functioning Gatekee
Python Package for DataHerb: create, search, and load datasets.
The Python Package for DataHerb A DataHerb Core Service to Create and Load Datasets.
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets that can be described as multidimensional arrays o
Acid's Utilities is a bot for my Discord server that alerts when I go live, welcomes new users, has some awesome games and so much more!
Acid's Utilities Acid's Utilities is a bot for my Discord server that alerts when I go live, welcomes new users, has some awesome games and so much mo
Python suite to construct benchmark machine learning datasets from the MIMIC-III clinical database.
MIMIC-III Benchmarks Python suite to construct benchmark machine learning datasets from the MIMIC-III clinical database. Currently, the benchmark data
Download all games from a public Itch.io Game Jam
Itch Jam Downloader Downloads all games from a public Itch.io Game Jam. What you'll need: Python 3.8+ pip install -r requirements.txt For site mirrori
🐱🏍 A curated list of awesome things related to Hugo themes.
awesome-hugo-themes Automated deployment @ 2021-10-12 06:24:07 Asia/Shanghai &sorted=updated Theme Author License GitHub Stars Updated Blonde wamo MIT
Awesome Weak-Shot Learning
Awesome Weak-Shot Learning In weak-shot learning, all categories are split into non-overlapped base categories and novel categories, in which base cat
Algorithmic and AI MIDI Drums Generator Implementation
Algorithmic and AI MIDI Drums Generator Implementation
This is a public repo where code samples are stored for the book Practical MLOps.
[Book-2021] Practical MLOps O'Reilly Book
A set of examples around hub for creating and processing datasets
Examples for Hub - Dataset Format for AI A repository showcasing examples of using Hub Uploading Dataset Places365 Colab Tutorials Notebook Link Getti
Download all games from a public Itch.io Game Jam
Itch Jam Downloader Downloads all games from a public Itch.io Game Jam. What you'll need: Python 3.8+ pip install -r requirements.txt For site mirrori
Kaggle G2Net Gravitational Wave Detection : 2nd place solution
Kaggle G2Net Gravitational Wave Detection : 2nd place solution
Lazy package to start your project using FastAPI✨
Fastapi-lazy 🦥 Utilities that you use in various projects made in FastAPI. Source Code: https://github.com/yezz123/fastapi-lazy Install the project:
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service. This tool can help a digital forensic investigator to know the context, origin of specific files during a digital forensic investigation.
The first public PyTorch implementation of Attentive Recurrent Comparators
arc-pytorch PyTorch implementation of Attentive Recurrent Comparators by Shyam et al. A blog explaining Attentive Recurrent Comparators Visualizing At
TorchXRayVision: A library of chest X-ray datasets and models.
torchxrayvision A library for chest X-ray datasets and models. Including pre-trained models. ( 🎬 promo video about the project) Motivation: While the
A curated list of neural rendering resources.
Awesome-of-Neural-Rendering A curated list of neural rendering and related resources. Please feel free to pull requests or open an issue to add papers
A curated list of awesome mathematics resources
A curated list of awesome mathematics resources
Awesome Long-Tailed Learning
Awesome Long-Tailed Learning This repo pays specially attention to the long-tailed distribution, where labels follow a long-tailed or power-law distri
A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approaches for achieving this in this repo.
multitask-learning-transformers A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You
This is a gentle introductin on how to start using an awesome library called Weights and Biases.
🪄 W&B Minimal PyTorch Tutorial This tutorial is also accompanied with a PyTorch source code, it can be found in src folder. Furthermore, all plots an
C++ Implementation of PyTorch Tutorials for Everyone
C++ Implementation of PyTorch Tutorials for Everyone OS (Compiler)\LibTorch 1.9.0 macOS (clang 10.0, 11.0, 12.0) Linux (gcc 8, 9, 10, 11) Windows (msv
Create Fast and easy image datasets using reddit
Reddit-Image-Scraper Reddit Reddit is an American Social news aggregation, web content rating, and discussion website. Reddit has been devided by topi
A collection of resources/tools and analyses for the angr binary analysis framework.
Awesome angr A collection of resources/tools and analyses for the angr binary analysis framework. This page does not only collect links and external r
The tool to make NLP datasets ready to use
chazutsu photo from Kaikado, traditional Japanese chazutsu maker chazutsu is the dataset downloader for NLP. import chazutsu r = chazutsu.data
TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.
TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.
S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.
S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.
Detects members having unicode names. Public bot: @scarletwitchprobot
✨ Scarletwitch bot ✨ Detects unicode names members in a tg chat & provides a option to take action on that user ! Public bot: @scarletwitchprobot Supp
collection of interesting Computer Science resources
collection of interesting Computer Science resources
A ready-to-use curated list of Spectral Indices for Remote Sensing applications.
A ready-to-use curated list of Spectral Indices for Remote Sensing applications. GitHub: https://github.com/davemlz/awesome-ee-spectral-indices Docume
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless. This is the official Roboflow python package that interfaces with the Roboflow API.
System Design course at HSE (2021)
System Design course at HSE (2021) Wiki-страница курса Структура репозитория: slides - директория с презентациями с занятий tasks - материалы для выпо
Signatures and IoCs from public Volexity blog posts.
threat-intel This repository contains IoCs related to Volexity public threat intelligence blog posts. They are organised by year, and within each year
The public discord bot, created by: primitt, further developed by: duino-coin team.
Duino Stats Mini A public Duino-Stats Discord bot. Click this link to invite the bot to your server. License Duino Stats Mini distributed under the MI
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Combating Embedding Barrier in Multilingual Models for Low-Resource Language Understanding".
BanglaBERT This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced i
An awesome script to convert the University Of Oviedo web calendar to Google or Outlook calendars.
autoUniCalendar Un script en Python para convertir el calendario de la intranet de la Universidad de Oviedo en un calendario de Outlook o Google Calen
This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.
normalizer This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch
An easy way to build PyTorch datasets. Modularly build datasets and automatically cache processed results
EasyDatas An easy way to build PyTorch datasets. Modularly build datasets and automatically cache processed results Installation pip install git+https
A tiny Python library for generating public IDs from integers
pids Create short public identifiers based on integer IDs. Installation pip install pids Usage from pids import pid public_id = pid.from_int(1234) #
TorchIO is a Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
OpenCVのGrabCut()を利用したセマンティックセグメンテーション向けアノテーションツール(Annotation tool using GrabCut() of OpenCV. It can be used to create datasets for semantic segmentation.)
[Japanese/English] GrabCut-Annotation-Tool GrabCut-Annotation-Tool.mp4 OpenCVのGrabCut()を利用したアノテーションツールです。 セマンティックセグメンテーション向けのデータセット作成にご使用いただけます。 ※Grab
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm
Collection of NLP model explanations and accompanying analysis tools
Thermostat is a large collection of NLP model explanations and accompanying analysis tools. Combines explainability methods from the captum library wi
Docker is an open platform for developing, shipping, and running applications OS-level virtualization to deliver software in packages called containers However, 'security' is a top request on Docker's public roadmap This project aims at vulnerability check for such docker containers. New contributions are accepted
Docker-Vulnerability-Check Docker is an open platform for developing, shipping, and running applications OS-level virtualization to deliver software i
The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.
Codebase for training transformers on systematic generalization datasets. The official repository for our EMNLP 2021 paper The Devil is in the Detail:
GraphGT: Machine Learning Datasets for Graph Generation and Transformation
GraphGT: Machine Learning Datasets for Graph Generation and Transformation Dataset Website | Paper Installation Using pip To install the core environm
Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
This is a playground for pytorch beginners, which contains predefined models on popular dataset. Currently we support mnist, svhn cifar10, cifar100 st
Minimal But Practical Image Classifier Pipline Using Pytorch, Finetune on ResNet18, Got 99% Accuracy on Own Small Datasets.
PyTorch Image Classifier Updates As for many users request, I released a new version of standared pytorch immage classification example at here: http:
Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics.
Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics. Jury offers a smooth and easy-to-use interface. It uses datasets for underlying metric computation, and hence adding custom metric is easy as adopting datasets.Metric.
Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.
GoGettr GoGettr is an API client for GETTR, a "non-bias [sic] social network." (We will not reward their domain with a hyperlink.) GoGettr is built an
A Survey of Natural Language Generation in Task-Oriented Dialogue System (TOD): Recent Advances and New Frontiers
A Survey of Natural Language Generation in Task-Oriented Dialogue System (TOD): Recent Advances and New Frontiers
UIUCTF 2021 Public Challenge Repository
UIUCTF-2021-Public UIUCTF 2021 Public Challenge Repository Notes: every challenge folder contains a challenge.yml file in the format for ctfcli, CTFd'
Tool to check whether a GCP bucket is public or not.
Tool to check publicly accessible GCP bucket. Blog https://justm0rph3u5.medium.com/gcp-inspector-auditing-publicly-exposed-gcp-bucket-ac6cad55618c Wha
An awesome Python wrapper for an awesome Docker CLI!
An awesome Python wrapper for an awesome Docker CLI!
We evaluate our method on different datasets (including ShapeNet, CUB-200-2011, and Pascal3D+) and achieve state-of-the-art results, outperforming all the other supervised and unsupervised methods and 3D representations, all in terms of performance, accuracy, and training time.
An Effective Loss Function for Generating 3D Models from Single 2D Image without Rendering Papers with code | Paper Nikola Zubić Pietro Lio University
User-related REST API based on the awesome Django REST Framework
Django REST Registration User registration REST API, based on Django REST Framework. Documentation Full documentation for the project is available at
一套完整的微博舆情分析流程代码,包括微博爬虫、LDA主题分析和情感分析。
已经将项目的关键文件上传,包含微博爬虫、LDA主题分析和情感分析三个部分。 1.微博爬虫 实现微博评论爬取和微博用户信息爬取,一天大概十万条。 2.LDA主题分析 实现文档主题抽取,包括数据清洗及分词、主题数的确定(主题一致性和困惑度)和最优主题模型的选择(暴力搜索)。 3.情感分析 实现评论文本的