2769 Repositories
Python data-analytics-projects Libraries
WRENCH: Weak supeRvision bENCHmark
🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021
Pro Football Reference Game Data Webscraper
Pro Football Reference Game Data Webscraper Code Copyright Yeetzsche This is a simple Pro Football Reference Webscraper that can either collect all ga
A framework for launching new Django Rest Framework projects quickly.
DRFx A framework for launching new Django Rest Framework projects quickly. Comes with a custom user model, login/logout/signup, social authentication
Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.
Cookiecutter Django Powered by Cookiecutter, Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly. Documentati
A Tools that help Data Scientists and ML engineers train and deploy ML models.
Domino Research This repo contains projects under active development by the Domino R&D team. We build tools that help Data Scientists and ML engineers
Code and data form the paper BERT Got a Date: Introducing Transformers to Temporal Tagging
BERT Got a Date: Introducing Transformers to Temporal Tagging Satya Almasian*, Dennis Aumiller*, and Michael Gertz Heidelberg University Contact us vi
Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.
pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir
Meli Data Challenge 2021 - First Place Solution
My solution for the Meli Data Challenge 2021
The PASS dataset: pretrained models and how to get the data - PASS: Pictures without humAns for Self-Supervised Pretraining
The PASS dataset: pretrained models and how to get the data - PASS: Pictures without humAns for Self-Supervised Pretraining
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.
A python module to manipulate XCode projects
This module can read, modify, and write a .pbxproj file from an Xcode 4+ projects. The file is usually called project.pbxproj and can be found inside the .xcodeproj bundle. Because some task cannot be done by clicking on an UI or opening Xcode to do it for you, this python module lets you automate the modification process.
An esoteric data type built entirely of NaNs.
NaNsAreNumbers An esoteric data type built entirely of NaNs. Installation pip install nans_are_numbers Explanation A floating point number is just co
Labelling platform for text using distant supervision
With DataQA, you can label unstructured text documents using rule-based distant supervision.
Implementation of Squeezenet in pytorch, pretrained models on Cifar 10 data to come
Pytorch Squeeznet Pytorch implementation of Squeezenet model as described in https://arxiv.org/abs/1602.07360 on cifar-10 Data. The definition of Sque
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch
Pytorch Recurrent Variational Autoencoder Model: This is the implementation of Samuel Bowman's Generating Sentences from a Continuous Space with Kim's
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Anchored CorEx: Hierarchical Topic Modeling with Minimal Domain Knowledge Correlation Explanation (CorEx) is a topic model that yields rich topics tha
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP
TextAttack 🐙 Generating adversarial examples for NLP models [TextAttack Documentation on ReadTheDocs] About • Setup • Usage • Design About TextAttack
Faker is a Python package that generates fake data for you.
Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in yo
Multi-user accounts for Django projects
django-organizations Summary Groups and multi-user account management Author Ben Lopatin (http://benlopatin.com) Status Separate individual user ident
Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores
Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores
PySpark Cheat Sheet - learn PySpark and develop apps faster
This cheat sheet will help you learn PySpark and write PySpark apps faster. Everything in here is fully functional PySpark code you can run or adapt to your programs.
payu payment gateway integration for django projects
Django-PayU This package provides integration between Django and PayU Payment Gateway. Quick start Install 'django-payu' using the following command:
Small projects for python beginners.
Python Mini Projects For Beginners I recently started doing the #100DaysOfCode Challenge in Python. I've used Python before, but I had switched to JS
TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.
TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.
The project is investigating methods to extract human-marked data from document forms such as surveys and tests.
The project is investigating methods to extract human-marked data from document forms such as surveys and tests. They can read questions, multiple-choice exam papers, and grade.
Python reader for Linked Data in HDF5 files
Linked Data are becoming more popular for user-created metadata in HDF5 files.
This is a repository for "100 days of code challenge" projects. You can reach all projects from beginner to professional which are written in Python.
100 Days of Code It's a challenge that aims to gain code practice and enhance programming knowledge. Day #1 Create a Band Name Generator It's actually
Gnosis-py includes a set of libraries to work with Ethereum and Gnosis projects
Gnosis-py Gnosis-py includes a set of libraries to work with Ethereum and Gnosis projects: EthereumClient, a wrapper over Web3.py Web3 client includin
Django package to log request values such as device, IP address, user CPU time, system CPU time, No of queries, SQL time, no of cache calls, missing, setting data cache calls for a particular URL with a basic UI.
django-web-profiler's documentation: Introduction: django-web-profiler is a django profiling tool which logs, stores debug toolbar statistics and also
Analytics services for Django projects
django-analytical The django-analytical application integrates analytics services into a Django project. Using an analytics service with a Django proj
Data from popular CS:GO website hltv.org
Welcome to hltv-data 👋 🎮 Data from popular CS:GO website hltv.org Install pip install hltv-data Usage The public methods can be reached using HLTVCl
Obsidian tools - a Python package for analysing an Obsidian.md vault
obsidiantools is a Python package for getting structured metadata about your Obsidian.md notes and analysing your vault.
A demo of a data science project using Kedro
iris Overview This is your new Kedro project, which was generated using Kedro 0.17.4. Take a look at the Kedro documentation to get started. Rules and
django-dashing is a customisable, modular dashboard application framework for Django to visualize interesting data about your project. Inspired in the dashboard framework Dashing
django-dashing django-dashing is a customisable, modular dashboard application framework for Django to visualize interesting data about your project.
A command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, VueJS projects.
Cookiecutter A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python
Complete Two-Factor Authentication for Django providing the easiest integration into most Django projects.
Django Two-Factor Authentication Complete Two-Factor Authentication for Django. Built on top of the one-time password framework django-otp and Django'
:couple: Multi-user accounts for Django projects
django-organizations Summary Groups and multi-user account management Author Ben Lopatin (http://benlopatin.com) Status Separate individual user ident
An improved django-admin-tools dashboard for Django projects
django-fluent-dashboard The fluent_dashboard module offers a custom admin dashboard, built on top of django-admin-tools (docs). The django-admin-tools
Produces a summary CSV report of an Amber Electric customer's energy consumption and cost data.
Amber Electric Usage Summary This is a command line tool that produces a summary CSV report of an Amber Electric customer's energy consumption and cos
Rafael Project- Classifying rockets to different types using data science algorithms.
Rocket-Classify Rafael Project- Classifying rockets to different types using data science algorithms. In this project we received data base with data
[ArXiv 2021] Data-Efficient Instance Generation from Instance Discrimination
InsGen - Data-Efficient Instance Generation from Instance Discrimination Data-Efficient Instance Generation from Instance Discrimination Ceyuan Yang,
Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ)
Real2CAD-3DV Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ) Group Member: Yue Pan, Yuanwen Yue, Bingxin Ke, Yujie He
A daily updated JSON dataset of all the Open House London venues, events, and metadata
Open House London listings data All of it. Automatically scraped hourly with updates committed to git, autogenerated per-day CSV's, and autogenerated
lightweight, fast and robust columnar dataframe for data analytics with online update
streamdf Streamdf is a lightweight data frame library built on top of the dictionary of numpy array, developed for Kaggle's time-series code competiti
Collections of python projects
nppy, mostly contains projects written in Python. Some projects are very simple while some are a bit lenghty and difficult(for beginners) Requirements
CS 506 - Computational Tools for Data Science
CS 506 - Computational Tools for Data Science Code, slides, and notes for Boston University CS506 Fall 2021 The Final Project Repository can be found
An all-inclusive Python framework for the Riot Games League of Legends API. We focus on making the data easy and fun to work with, while providing all the tools necessary to create a website or do data analysis.
Cassiopeia A Python adaptation of the Riot Games League of Legends API (https://developer.riotgames.com/). Cassiopeia is the sister library to Orianna
Learn how to responsibly deliver value with ML.
Made With ML Applied ML · MLOps · Production Join 30K+ developers in learning how to responsibly deliver value with ML. 🔥 Among the top MLOps reposit
Visions provides an extensible suite of tools to support common data analysis operations
Visions And these visions of data types, they kept us up past the dawn. Visions provides an extensible suite of tools to support common data analysis
An open-source Python project series where beginners can contribute and practice coding.
Python Mini Projects A collection of easy Python small projects to help you improve your programming skills. Table Of Contents Aim Of The Project Cont
An unofficial python API for trading on the DeGiro platform, with the ability to get real time data and historical data.
DegiroAPI An unofficial API for the trading platform Degiro written in Python with the ability to get real time data and historical data for products.
jsoooooooon derulo - Make sure your 'jason derulo' is featured as the first part of your json data
jsonderulo Make sure your 'jason derulo' is featured as the first part of your json data Install: # python pip install jsonderulo poetry add jsonderul
Tools to help record data from Qiskit jobs
archiver4qiskit Tools to help record data from Qiskit jobs. Install with pip install git+https://github.com/NCCR-SPIN/archiver4qiskit.git Import the
This reporistory contains the test-dev data of the paper "xGQA: Cross-lingual Visual Question Answering".
This reporistory contains the test-dev data of the paper "xGQA: Cross-lingual Visual Question Answering".
“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)
Data Augmentation for Cross-Domain Named Entity Recognition Authors: Shuguang Chen, Gustavo Aguilar, Leonardo Neves and Thamar Solorio This repository
Python solutions to Codeforces problems
CodeForces This repository is dedicated to my Python solutions for CodeForces problems. Feel free to copy, contribute and/or comment. If you find any
Python code written to utilize the Korlan usb2can hardware to send and receive data over the can-bus on a 2008 Nissan 350z
nissan_ecu_hacking Python code written to utilize the Korlan usb2can hardware to send and receive data over the can-bus on a 2008 Nissan 350z My goal
Pokemon catch events project to demonstrate data pipeline on AWS
Pokemon Catches Data Pipeline This is a sample project to practice end-to-end data project; Terraform is used to deploy infrastructure; Kafka is the t
Sample project showing reliable data ingestion application using FastAPI and dramatiq
Create and deploy a reliable data ingestion service with FastAPI, SQLModel and Dramatiq This is the source code for the data ingestion service explain
A collection of simple python mini projects to enhance your python skills
A collection of simple python mini projects to enhance your python skills
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
A Python framework for creating reproducible, maintainable and modular data science code.
Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).
GD-VCR Code for Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning (EMNLP 2021). Research Questions and Aims: How well can a model perform o
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c
Related resources for our EMNLP 2021 paper Plan-then-Generate: Controlled Data-to-Text Generation via Planning
Plan-then-Generate: Controlled Data-to-Text Generation via Planning Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier Code
A visualization of people a user follows on Twitter
Twitter-Map This software allows the user to create maps of Twitter accounts. Installation git clone [email protected]:OGreenwood672/Twitter-Map.git cd T
DYA ( Ditch YouTube API ) is a package created to power the user with YouTube Data API functionality without any API Key
Ditch YouTubeAPI (BETA) DYA ( Ditch YouTube API ) is a package created to power the user with YouTube Data API functionality without any API Key Detai
Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data.
PremiershipPlayerAnalysis Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data. No
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.
Home Assistant integration for energy consumption data from UK SMETS (Smart) meters using the Hildebrand Glow API.
Hildebrand Glow (DCC) Integration Home Assistant integration for energy consumption data from UK SMETS (Smart) meters using the Hildebrand Glow API. T
Fix datetime EXIF data in photos downloaded from Google Takeout
fix-google-takeout Warning Use at your own risk. Backup your photos. Overview Google takeout for photos
DataCLUE: 国内首个以数据为中心的AI测评(含模型分析报告)
DataCLUE 以数据为中心的AI测评(DataCLUE) DataCLUE: A Chinese Data-centric Language Evaluation Benchmark 内容导引 章节 描述 简介 介绍以数据为中心的AI测评(DataCLUE)的背景 任务描述 任务描述 实验结果
load .txt to train YOLOX, same as Yolo others
YOLOX train your data you need generate data.txt like follow format (per line- one image). prepare one data.txt like this: img_path1 x1,y1,x2,y2,clas
A simple Python library to integrate with the Heron Data API
Heron Python This library provides easy access to the Heron Data API from applications written in Python. Documentation No language-specific docs are
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Real-ESRGAN Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data Ported from https://github.com/xinntao/Real-ESRGAN Depend
G-NIA model from "Single Node Injection Attack against Graph Neural Networks" (CIKM 2021)
Single Node Injection Attack against Graph Neural Networks This repository is our Pytorch implementation of our paper: Single Node Injection Attack ag
Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.
time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium
Lux Academy & Data Science East Africa Python Boot Camp, Building and Deploying Flask Application Using Docker Demo App.
Flask and Docker Application Demo A Docker image is a read-only, inert template that comes with instructions for deploying containers. In Docker, ever
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"
A library for interacting with Path of Exile game and economy data, and a unique loot filter generation framework.
wraeblast A library for interfacing with Path of Exile game and economy data, and a set of item filters geared towards trade league players. Filter Ge
We protect the privacy of the data on your computer by using the camera of your Debian based Pardus operating system. 🕵️
Pardus Lookout We protect the privacy of the data on your computer by using the camera of your Debian based Pardus operating system. The application i
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"
Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific
Functional Data Analysis, or FDA, is the field of Statistics that analyses data that depend on a continuous parameter.
Functional Data Analysis Python package
The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data
Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"
txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.
txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.
This repository contains Python Projects for Beginners as well as for Intermediate Developers built by Contributors.
Python Projects {Open Source} Introduction The repository was built with a tree-like structure in mind, it contains collections of Python Projects. Mo
Bot for Telegram data Analysis
Bot Scraper for telegram This bot use an AI to Work powered by BOG Team you must do the following steps to make the bot functional: Install the requir
Kyrie Eleison - The best and unique way to encrypt some data or a file safely
Encrypt your important data and files easily and safely with Kyrie Eleison.
This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Technique for Text Classification
The baseline code is for EDA: Easy Data Augmentation techniques for boosting performance on text classification tasks
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
img2dataset Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine. Also supports
✨Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.
✨A Python framework to explore, label, and monitor data for NLP projects
Data augmentation for NLP, accepted at EMNLP 2021 Findings
AEDA: An Easier Data Augmentation Technique for Text Classification This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Techni
CLabel is a terminal-based cluster labeling tool that allows you to explore text data interactively and label clusters based on reviewing that data.
CLabel is a terminal-based cluster labeling tool that allows you to explore text data interactively and label clusters based on reviewing that
TorchIO is a Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically
FLAML - Fast and Lightweight AutoML
The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT Speech@FIT Reverb Database.
Add_noise_and_rir_to_speech The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal
skimpy is a light weight tool that provides summary statistics about variables in data frames within the console.
skimpy Welcome Welcome to skimpy! skimpy is a light weight tool that provides summary statistics about variables in data frames within the console. Th
The Devils Eye is an OSINT tool that searches the Darkweb for onion links and descriptions that match with the users query without requiring the use for Tor.
The Devil's Eye searches the darkweb for information relating to the user's query and returns the results including .onion links and their description
Python based tool to extract forensic info from EventTranscript.db (Windows Diagnostic Data)
EventTranscriptParser EventTranscriptParser is python based tool to extract forensically useful details from EventTranscript.db (Windows Diagnostic Da