3285 Repositories
Python data-flow-analysis Libraries
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.
Adaptive Graph Convolution for Point Cloud Analysis
Adaptive Graph Convolution for Point Cloud Analysis This repository contains the implementation of AdaptConv for point cloud analysis. Adaptive Graph
Home Assistant integration for energy consumption data from UK SMETS (Smart) meters using the Hildebrand Glow API.
Hildebrand Glow (DCC) Integration Home Assistant integration for energy consumption data from UK SMETS (Smart) meters using the Hildebrand Glow API. T
Fix datetime EXIF data in photos downloaded from Google Takeout
fix-google-takeout Warning Use at your own risk. Backup your photos. Overview Google takeout for photos
DataCLUE: 国内首个以数据为中心的AI测评(含模型分析报告)
DataCLUE 以数据为中心的AI测评(DataCLUE) DataCLUE: A Chinese Data-centric Language Evaluation Benchmark 内容导引 章节 描述 简介 介绍以数据为中心的AI测评(DataCLUE)的背景 任务描述 任务描述 实验结果
Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.
Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.
load .txt to train YOLOX, same as Yolo others
YOLOX train your data you need generate data.txt like follow format (per line- one image). prepare one data.txt like this: img_path1 x1,y1,x2,y2,clas
A simple Python library to integrate with the Heron Data API
Heron Python This library provides easy access to the Heron Data API from applications written in Python. Documentation No language-specific docs are
[ICCV 2021] Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
Amplitude-Phase Recombination (ICCV'21) Official PyTorch implementation of "Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neur
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Real-ESRGAN Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data Ported from https://github.com/xinntao/Real-ESRGAN Depend
G-NIA model from "Single Node Injection Attack against Graph Neural Networks" (CIKM 2021)
Single Node Injection Attack against Graph Neural Networks This repository is our Pytorch implementation of our paper: Single Node Injection Attack ag
Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.
time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium
Lux Academy & Data Science East Africa Python Boot Camp, Building and Deploying Flask Application Using Docker Demo App.
Flask and Docker Application Demo A Docker image is a read-only, inert template that comes with instructions for deploying containers. In Docker, ever
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"
A library for interacting with Path of Exile game and economy data, and a unique loot filter generation framework.
wraeblast A library for interfacing with Path of Exile game and economy data, and a set of item filters geared towards trade league players. Filter Ge
We protect the privacy of the data on your computer by using the camera of your Debian based Pardus operating system. 🕵️
Pardus Lookout We protect the privacy of the data on your computer by using the camera of your Debian based Pardus operating system. The application i
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"
Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific
Functional Data Analysis, or FDA, is the field of Statistics that analyses data that depend on a continuous parameter.
Functional Data Analysis Python package
The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data
Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.
txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.
Bot for Telegram data Analysis
Bot Scraper for telegram This bot use an AI to Work powered by BOG Team you must do the following steps to make the bot functional: Install the requir
Kyrie Eleison - The best and unique way to encrypt some data or a file safely
Encrypt your important data and files easily and safely with Kyrie Eleison.
This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Technique for Text Classification
The baseline code is for EDA: Easy Data Augmentation techniques for boosting performance on text classification tasks
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
img2dataset Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine. Also supports
✨Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.
✨A Python framework to explore, label, and monitor data for NLP projects
Data augmentation for NLP, accepted at EMNLP 2021 Findings
AEDA: An Easier Data Augmentation Technique for Text Classification This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Techni
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks A Transformer-based library for SocialNLP classification tasks. Currently
A scalable implementation of WobblyStitcher for 3D microscopy images
WobblyStitcher Introduction A scalable implementation of WobblyStitcher Dependencies $ python -m pip install numpy scikit-image Visualization ImageJ
CLabel is a terminal-based cluster labeling tool that allows you to explore text data interactively and label clusters based on reviewing that data.
CLabel is a terminal-based cluster labeling tool that allows you to explore text data interactively and label clusters based on reviewing that
TorchIO is a Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically
FLAML - Fast and Lightweight AutoML
The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT Speech@FIT Reverb Database.
Add_noise_and_rir_to_speech The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal
skimpy is a light weight tool that provides summary statistics about variables in data frames within the console.
skimpy Welcome Welcome to skimpy! skimpy is a light weight tool that provides summary statistics about variables in data frames within the console. Th
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
The Devils Eye is an OSINT tool that searches the Darkweb for onion links and descriptions that match with the users query without requiring the use for Tor.
The Devil's Eye searches the darkweb for information relating to the user's query and returns the results including .onion links and their description
Python based tool to extract forensic info from EventTranscript.db (Windows Diagnostic Data)
EventTranscriptParser EventTranscriptParser is python based tool to extract forensically useful details from EventTranscript.db (Windows Diagnostic Da
A Python interface between Earth Engine and xarray for processing weather and climate data
wxee What is wxee? wxee was built to make processing gridded, mesoscale time series weather and climate data quick and easy by integrating the data ca
A data parser for the internal syncing data format used by Fog of World.
A data parser for the internal syncing data format used by Fog of World. The parser is not designed to be a well-coded library with good performance, it is more like a demo for showing the data structure.
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm
code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.
Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning Overview This code is for paper: Not All Unlabeled Data are Equa
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
MultiModal-InfoMax This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Informa
Collection of NLP model explanations and accompanying analysis tools
Thermostat is a large collection of NLP model explanations and accompanying analysis tools. Combines explainability methods from the captum library wi
Goblyn is a Python tool focused to enumeration and capture of website files metadata.
Goblyn Metadata Enumeration What's Goblyn? Goblyn is a tool focused to enumeration and capture of website files metadata. How it works? Goblyn will se
Ray-based parallel data preprocessing for NLP and ML.
Wrangl Ray-based parallel data preprocessing for NLP and ML. pip install wrangl # for latest pip install git+https://github.com/vzhong/wrangl See exa
Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!
ToxiChat Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Install depen
Framework for creating efficient data processing pipelines
Aqueduct Framework for creating efficient data processing pipelines. Contact Feel free to ask questions in telegram t.me/avito-ml Key Features Increas
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Multimodal Deep Learning 🎆 🎆 🎆 Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
DeepConsensus DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS)
Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad to your characters in Modo.
Applicator Kit for Modo Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad with a TrueDepth camera to
Deep Learning ❤️ OneFlow
Deep Learning with OneFlow made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. User Side Computer V
A Red Team tool for exfiltrating sensitive data from Jira tickets.
Jir-thief This Module will connect to Jira's API using an access token, export to a word .doc, and download the Jira issues that the target has access
TextDescriptives - A Python library for calculating a large variety of statistics from text
A Python library for calculating a large variety of statistics from text(s) using spaCy v.3 pipeline components and extensions. TextDescriptives can be used to calculate several descriptive statistics, readability metrics, and metrics related to dependency distance.
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)
Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)
Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P
Code in PyTorch for the convex combination linear IAF and the Householder Flow, J.M. Tomczak & M. Welling
VAE with Volume-Preserving Flows This is a PyTorch implementation of two volume-preserving flows as described in the following papers: Tomczak, J. M.,
WRENCH: Weak supeRvision bENCHmark
🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development
Official Python wrapper for the Quantel Finance API
Quantel is a powerful financial data and insights API. It provides easy access to world-class financial information. Quantel goes beyond just financial statements, giving users valuable information like insider transactions, major shareholder transactions, share ownership, peers, and so much more.
Evidence enables analysts to deliver a polished business intelligence system using SQL and markdown.
Evidence enables analysts to deliver a polished business intelligence system using SQL and markdown
prettymaps - A minimal Python library to draw customized maps from OpenStreetMap data.
A small set of Python functions to draw pretty maps from OpenStreetMap data. Based on osmnx, matplotlib and shapely libraries.
Auto-updating data to assist in investment to NEPSE
Symbol Ratios Summary Sector LTP Undervalued Bonus % MEGA Strong Commercial Banks 368 5 10 JBBL Strong Development Banks 568 5 10 SIFC Strong Finance
Abusing Microsoft 365 OAuth Authorization Flow for Phishing Attack
Microsoft365_devicePhish Abusing Microsoft 365 OAuth Authorization Flow for Phishing Attack This is a simple proof-of-concept script that allows an at
Cross-platform MachO/ObjC Static binary analysis tool & library. class-dump + otool + lipo + more
ktool Static Mach-O binary metadata analysis tool / information dumper pip3 install k2l Development is currently taking place on the @python3.10 branc
Pipeline for fast building text classification TF-IDF + LogReg baselines.
Text Classification Baseline Pipeline for fast building text classification TF-IDF + LogReg baselines. Usage Instead of writing custom code for specif
Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.
GoGettr GoGettr is an API client for GETTR, a "non-bias [sic] social network." (We will not reward their domain with a hyperlink.) GoGettr is built an
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.
Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req
Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'
Filtration Curves for Graph Representation This repository provides the code from the KDD'21 paper Filtration Curves for Graph Representation. Depende
We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.
Multi-Modal Self-Supervision using GDT and StiCa This is an official pytorch implementation of papers: Multi-modal Self-Supervision from Generalized D
Generative Models as a Data Source for Multiview Representation Learning
GenRep Project Page | Paper Generative Models as a Data Source for Multiview Representation Learning Ali Jahanian, Xavier Puig, Yonglong Tian, Phillip
Import, visualize, and analyze SpiderFoot OSINT data in Neo4j, a graph database
SpiderFoot Neo4j Tools Import, visualize, and analyze SpiderFoot OSINT data in Neo4j, a graph database Step 1: Installation NOTE: This installs the sf
Low code JSON to extract data in one line
JSON Inline Low code JSON to extract data in one line ENG RU Installation pip install json-inline Usage Rules Modificator Description ?key:value Searc
A Pythonic Data Catalog powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
DeltaCAT DeltaCAT is a Pythonic Data Catalog powered by Ray. Its data storage model allows you to define and manage fast, scalable, ACID-compliant dat
Data Preparation, Processing, and Visualization for MoVi Data
MoVi-Toolbox Data Preparation, Processing, and Visualization for MoVi Data, https://www.biomotionlab.ca/movi/ MoVi is a large multipurpose dataset of
Part-Aware Data Augmentation for 3D Object Detection in Point Cloud
Part-Aware Data Augmentation for 3D Object Detection in Point Cloud This repository contains a reference implementation of our Part-Aware Data Augment
Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).
Hurdles to Progress in Long-form Question Answering This repository contains the official scripts and datasets accompanying our NAACL 2021 paper, "Hur
Parametric Contrastive Learning (ICCV2021)
Parametric-Contrastive-Learning This repository contains the implementation code for ICCV2021 paper: Parametric Contrastive Learning (https://arxiv.or
Generative Flow Networks
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation Implementation for our paper, submitted to NeurIPS 2021 (also chec
Malcolm is a powerful, easily deployable network traffic analysis tool suite for full packet capture artifacts (PCAP files) and Zeek logs.
Malcolm is a powerful, easily deployable network traffic analysis tool suite for full packet capture artifacts (PCAP files) and Zeek logs.
Phishing Abusing Microsoft 365 OAuth Authorization Flow
Microsoft365_devicePhish Abusing Microsoft 365 OAuth Authorization Flow for Phishing Attack This is a simple proof-of-concept script that allows an at
Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and other financial information. This repository provides an SDK for developing applications to access the NCDS.
Nasdaq Cloud Data Service (NCDS) Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and ot
Using python-binance to provide websocket data to freqtrade
The goal of this project is to provide an alternative way to get realtime data from Binance and use it in freqtrade despite the exchange used. It also uses talipp for computing
LOC-FLOW is an “hands-free” earthquake location workflow to process continuous seismic records
LOC-FLOW is an “hands-free” earthquake location workflow to process continuous seismic records: from raw waveforms to well located earthquakes with magnitude calculations. The package assembles several popular routines for sequential earthquake location refinements, suitable for catalog building ranging from local to regional scales.
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data arXiv This is the code base for weakly supervised NER. We provide a
Script de monitoramento de telemetria para missões espaciais, cansat e foguetemodelismo.
Aeroespace_GroundStation Script de monitoramento de telemetria para missões espaciais, cansat e foguetemodelismo. Imagem 1 - Dashboard realizando moni
Home Assistant integration for spanish electrical data providers (e.g., datadis)
homeassistant-edata Esta integración para Home Assistant te permite seguir de un vistazo tus consumos y máximas potencias alcanzadas. Para ello, se ap
This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, reading data from PubSub.
Sample streaming Dataflow pipeline written in Python This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, readin
This is the code for ACL2021 paper A Unified Generative Framework for Aspect-Based Sentiment Analysis
This is the code for ACL2021 paper A Unified Generative Framework for Aspect-Based Sentiment Analysis Install the package in the requirements.txt, the
Natural Language Processing library built with AllenNLP 🌲🌱
Custom Natural Language Processing with big and small models 🌲🌱
Python ELT Studio, an application for building ELT (and ETL) data flows.
The Python Extract, Load, Transform Studio is an application for performing ELT (and ETL) tasks. Under the hood the application consists of a two parts.
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
VisualGPT Our Paper VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning Main Architecture of Our VisualGPT Downloa
Test django schema and data migrations, including migrations' order and best practices.
django-test-migrations Features Allows to test django schema and data migrations Allows to test both forward and rollback migrations Allows to test th
PyTorch Lightning Optical Flow models, scripts, and pretrained weights.
PyTorch Lightning Optical Flow models, scripts, and pretrained weights.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
A light-weight image labelling tool for Python designed for creating segmentation data sets.
An image labelling tool for creating segmentation data sets, for Django and Flask.
一套完整的微博舆情分析流程代码,包括微博爬虫、LDA主题分析和情感分析。
已经将项目的关键文件上传,包含微博爬虫、LDA主题分析和情感分析三个部分。 1.微博爬虫 实现微博评论爬取和微博用户信息爬取,一天大概十万条。 2.LDA主题分析 实现文档主题抽取,包括数据清洗及分词、主题数的确定(主题一致性和困惑度)和最优主题模型的选择(暴力搜索)。 3.情感分析 实现评论文本的
Download history data from binance and save to dataframe or csv file
Binance history data downloader Download history data from binance and save to dataframe or csv file
Slack bot for monitoring your Metaflow flows!
Metaflowbot - Slack Bot for your Metaflow flows! Metaflowbot makes it fun and easy to monitor your Metaflow runs, past and present. Imagine starting a
Creates a C array from a hex-string or a stream of binary data.
hex2array-c Creates a C array from a hex-string. Usage Usage: python3 hex2array_c.py HEX_STRING [-h|--help] Use '-' to read the hex string from STDIN.
Implementation of "Selection via Proxy: Efficient Data Selection for Deep Learning" from ICLR 2020.
Selection via Proxy: Efficient Data Selection for Deep Learning This repository contains a refactored implementation of "Selection via Proxy: Efficien