1190 Repositories
Python analysis-pipeline Libraries
Pipelines de datos, 2021.
Este repo ilustra un proceso sencillo de automatización de transformación y modelado de datos, a través de un pipeline utilizando Luigi. Stack princip
Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.
Data lineage made simple, reliable, and automated. Effortlessly track the flow of data, understand dependencies and analyze impact. Features Visualiza
Shotgrid Toolkit Engine for Gaffer
Shotgun toolkit engine for Gaffer Contact : Diego Garcia Huerta Overview Implementation of a shotgun engine for Gaffer. It supports the classic bootst
ThePhish: an automated phishing email analysis tool
ThePhish ThePhish is an automated phishing email analysis tool based on TheHive, Cortex and MISP. It is a web application written in Python 3 and base
Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"
Ancient Greek BERT The first and only available Ancient Greek sub-word BERT model! State-of-the-art post fine-tuning on Part-of-Speech Tagging and Mor
Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning
Manifold-SCA Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning The repo is org
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)
Sampling profiler for Python programs
py-spy: Sampling profiler for Python programs py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spe
Know your customer pipeline in apache air flow
KYC_pipline Know your customer pipeline in apache air flow For a successful pipeline run take these steps: Run you Airflow server Admin - connection
An adaptable Snakemake workflow which uses GATKs best practice recommendations to perform germline mutation calling starting with BAM files
Germline Mutation Calling This Snakemake workflow follows the GATK best-practice recommandations to call small germline variants. The pipeline require
A collection of resources/tools and analyses for the angr binary analysis framework.
Awesome angr A collection of resources/tools and analyses for the angr binary analysis framework. This page does not only collect links and external r
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.
Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It
A production-ready pipeline for text mining and subject indexing
A production-ready pipeline for text mining and subject indexing
Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.
Airflow on Docker in EC2 + GitLab's CI/CD Personal project for simple data pipeline using Airflow. Airflow will be installed inside Docker container,
Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment Analysis with Affective Knowledge. Proceedings of EMNLP 2021
AAGCN-ACSA EMNLP 2021 Introduction This repository was used in our paper: Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment An
Pyan3 - Offline call graph generator for Python 3
Pyan takes one or more Python source files, performs a (rather superficial) static analysis, and constructs a directed graph of the objects in the combined source, and how they define or use each other. The graph can be output for rendering by GraphViz or yEd.
Sampling profiler for Python programs
py-spy: Sampling profiler for Python programs py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spe
Django package to log request values such as device, IP address, user CPU time, system CPU time, No of queries, SQL time, no of cache calls, missing, setting data cache calls for a particular URL with a basic UI.
django-web-profiler's documentation: Introduction: django-web-profiler is a django profiling tool which logs, stores debug toolbar statistics and also
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)
Django REST Pandas Django REST Framework + pandas = A Model-driven Visualization API Django REST Pandas (DRP) provides a simple way to generate and se
Obsidian tools - a Python package for analysing an Obsidian.md vault
obsidiantools is a Python package for getting structured metadata about your Obsidian.md notes and analysing your vault.
Pipeline is an asset packaging library for Django.
Pipeline Pipeline is an asset packaging library for Django, providing both CSS and JavaScript concatenation and compression, built-in JavaScript templ
A spaCy wrapper of OpenTapioca for named entity linking on Wikidata
spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu
CDLA: A Chinese document layout analysis (CDLA) dataset
CDLA: A Chinese document layout analysis (CDLA) dataset 介绍 CDLA是一个中文文档版面分析数据集,面向中文文献类(论文)场景。包含以下10个label: 正文 标题 图片 图片标题 表格 表格标题 页眉 页脚 注释 公式 Text Title
Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! Very tiny! Stock Market Financial Technical Analysis Python library . Quant Trading automation or cryptocoin exchange
MyTT Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! to Stock Market Financial Technical Analysis Python
An all-inclusive Python framework for the Riot Games League of Legends API. We focus on making the data easy and fun to work with, while providing all the tools necessary to create a website or do data analysis.
Cassiopeia A Python adaptation of the Riot Games League of Legends API (https://developer.riotgames.com/). Cassiopeia is the sister library to Orianna
Visions provides an extensible suite of tools to support common data analysis operations
Visions And these visions of data types, they kept us up past the dawn. Visions provides an extensible suite of tools to support common data analysis
An open-source Python project series where beginners can contribute and practice coding.
Python Mini Projects A collection of easy Python small projects to help you improve your programming skills. Table Of Contents Aim Of The Project Cont
Klara is a static analysis tools to automatic generate test case, based on SMT (z3) solver, with a powerful ast level inference system.
Automatic test case generation for python and static analysis library
Some usefull scripts for the Nastran's 145 solution (Flutter Analysis) using the pyNastran package.
nastran-aero-flutter This project is intended to analyse the Supersonic Panel Flutter using the NASTRAN software. The project uses the pyNastran and t
TidyPy is a tool that encapsulates a number of other static analysis tools and makes it easy to configure, execute, and review their results.
TidyPy Contents Overview Features Usage Docker Configuration Ignoring Issues Included Tools Included Reporters Included Integrations Extending TidyPy
Goddard Image Analysis and Navigation Tool
Copyright 2021 United States Government as represented by the Administrator of the National Aeronautics and Space Administration. No copyright is clai
Pokemon catch events project to demonstrate data pipeline on AWS
Pokemon Catches Data Pipeline This is a sample project to practice end-to-end data project; Terraform is used to deploy infrastructure; Kafka is the t
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
A Python framework for creating reproducible, maintainable and modular data science code.
Collection of useful (to me) python scripts for interacting with napari
Napari scripts A collection of napari related tools in various state of disrepair/functionality. Browse_LIF_widget.py This module can be imported, for
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.
Adaptive Graph Convolution for Point Cloud Analysis
Adaptive Graph Convolution for Point Cloud Analysis This repository contains the implementation of AdaptConv for point cloud analysis. Adaptive Graph
[ICCV 2021] Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
Amplitude-Phase Recombination (ICCV'21) Official PyTorch implementation of "Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neur
Functional Data Analysis, or FDA, is the field of Statistics that analyses data that depend on a continuous parameter.
Functional Data Analysis Python package
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Bot for Telegram data Analysis
Bot Scraper for telegram This bot use an AI to Work powered by BOG Team you must do the following steps to make the bot functional: Install the requir
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks A Transformer-based library for SocialNLP classification tasks. Currently
A scalable implementation of WobblyStitcher for 3D microscopy images
WobblyStitcher Introduction A scalable implementation of WobblyStitcher Dependencies $ python -m pip install numpy scikit-image Visualization ImageJ
TorchIO is a Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.
skimpy is a light weight tool that provides summary statistics about variables in data frames within the console.
skimpy Welcome Welcome to skimpy! skimpy is a light weight tool that provides summary statistics about variables in data frames within the console. Th
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
MultiModal-InfoMax This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Informa
Collection of NLP model explanations and accompanying analysis tools
Thermostat is a large collection of NLP model explanations and accompanying analysis tools. Combines explainability methods from the captum library wi
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Multimodal Deep Learning 🎆 🎆 🎆 Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model
TextDescriptives - A Python library for calculating a large variety of statistics from text
A Python library for calculating a large variety of statistics from text(s) using spaCy v.3 pipeline components and extensions. TextDescriptives can be used to calculate several descriptive statistics, readability metrics, and metrics related to dependency distance.
Official Python wrapper for the Quantel Finance API
Quantel is a powerful financial data and insights API. It provides easy access to world-class financial information. Quantel goes beyond just financial statements, giving users valuable information like insider transactions, major shareholder transactions, share ownership, peers, and so much more.
Cross-platform MachO/ObjC Static binary analysis tool & library. class-dump + otool + lipo + more
ktool Static Mach-O binary metadata analysis tool / information dumper pip3 install k2l Development is currently taking place on the @python3.10 branc
Pipeline for fast building text classification TF-IDF + LogReg baselines.
Text Classification Baseline Pipeline for fast building text classification TF-IDF + LogReg baselines. Usage Instead of writing custom code for specif
Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.
GoGettr GoGettr is an API client for GETTR, a "non-bias [sic] social network." (We will not reward their domain with a hyperlink.) GoGettr is built an
Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'
Filtration Curves for Graph Representation This repository provides the code from the KDD'21 paper Filtration Curves for Graph Representation. Depende
CPOST is a CLI tool to assist with the proper sizing of Clara Deploy pipelines
CPOST (Clara Pipeline Operator Sizing Tool) Tool to measure resource usage of Clara Platform pipeline operators Cpost is a tool that will help you run
Malcolm is a powerful, easily deployable network traffic analysis tool suite for full packet capture artifacts (PCAP files) and Zeek logs.
Malcolm is a powerful, easily deployable network traffic analysis tool suite for full packet capture artifacts (PCAP files) and Zeek logs.
This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, reading data from PubSub.
Sample streaming Dataflow pipeline written in Python This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, readin
This is the code for ACL2021 paper A Unified Generative Framework for Aspect-Based Sentiment Analysis
This is the code for ACL2021 paper A Unified Generative Framework for Aspect-Based Sentiment Analysis Install the package in the requirements.txt, the
一套完整的微博舆情分析流程代码,包括微博爬虫、LDA主题分析和情感分析。
已经将项目的关键文件上传,包含微博爬虫、LDA主题分析和情感分析三个部分。 1.微博爬虫 实现微博评论爬取和微博用户信息爬取,一天大概十万条。 2.LDA主题分析 实现文档主题抽取,包括数据清洗及分词、主题数的确定(主题一致性和困惑度)和最优主题模型的选择(暴力搜索)。 3.情感分析 实现评论文本的
A robust pointcloud registration pipeline based on correlation.
PHASER: A Robust and Correspondence-Free Global Pointcloud Registration Ubuntu 18.04+ROS Melodic: Overview Pointcloud registration using correspondenc
Procedural 3D data generation pipeline for architecture
Synthetic Dataset Generator Authors: Stanislava Fedorova Alberto Tono Meher Shashwat Nigam Jiayao Zhang Amirhossein Ahmadnia Cecilia bolognesi Dominik
Yuyu Scanner is a Web Reconnaissance & Web Analysis Scanner to find assets and information about targets.
Yuyu Scanner Yuyu Scanner is a Web Reconnaissance & Web Analysis Scanner to find assets and information about targets. installation ! run as root
Scraping and analysis of leetcode-compensations page.
Leetcode compensations report Scraping and analysis of leetcode-compensations page.
Finetuning Pipeline
KLUE Baseline Korean(한국어) KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper fo
This demo showcase the use of onnxruntime-rs with a GPU on CUDA 11 to run Bert in a data pipeline with Rust.
Demo BERT ONNX pipeline written in rust This demo showcase the use of onnxruntime-rs with a GPU on CUDA 11 to run Bert in a data pipeline with Rust. R
PyTorch implementation of the wavelet analysis from Torrence & Compo
Continuous Wavelet Transforms in PyTorch This is a PyTorch implementation for the wavelet analysis outlined in Torrence and Compo (BAMS, 1998). The co
LogDeep is an open source deeplearning-based log analysis toolkit for automated anomaly detection.
LogDeep is an open source deeplearning-based log analysis toolkit for automated anomaly detection.
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
Performance Ninja Class This is an online course where you can learn to find and fix low-level performance issues, for example CPU cache misses and br
Anomaly detection on SQL data warehouses and databases
With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases. Getting Started Install via Docker docker run -p 300
TLA - Twitter Linguistic Analysis
TLA - Twitter Linguistic Analysis Tool for linguistic analysis of communities TLA is built using PyTorch, Transformers and several other State-of-the-
Greppin' Logs: Leveling Up Log Analysis
This repo contains sample code and example datasets from Jon Stewart and Noah Rubin's presentation at the 2021 SANS DFIR Summit titled Greppin' Logs. The talk was centered around the idea that Forensics is Data Engineering and Data Science, and should be approached as such. Jon and Noah focused on the core (Unix) command line tools useful to anyone analyzing datasets from a terminal, purpose-built tools for handling structured tabular and JSON data, Stroz Friedberg's open source multipattern search tool Lightgrep, and scaling with AWS.
Newsemble is an API that provides easy access to the current news for programmatic analysis
Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.
DRIFT is a tool for Diachronic Analysis of Scientific Literature.
About DRIFT is a tool for Diachronic Analysis of Scientific Literature. The application offers user-friendly and customizable utilities for two modes:
Squidpy is a tool for the analysis and visualization of spatial molecular data.
Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverages the spatial coordinates of the data, as well as tissue images if available.
Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"
Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] ‘‘Exploiting Document Knowledge for Aspect-level Sentiment Classification’’
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening Introduction This is an implementation of the model used for breast
The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.
The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.
🛠 All-in-one web-based IDE specialized for machine learning and data science.
All-in-one web-based development environment for machine learning Getting Started • Features & Screenshots • Support • Report a Bug • FAQ • Known Issu
Manage your exceptions in Python like a PRO
A linter to manage all your python exceptions and try/except blocks (limited only for those who like dinosaurs).
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
A linter to manage all your python exceptions and try/except blocks (limited only for those who like dinosaurs).
Manage your exceptions in Python like a PRO Currently in BETA. Inspired by this blog post. I shared the building process of this tool here. “For those
Pose Detection and Machine Learning for real-time body posture analysis during exercise to provide audiovisual feedback on improvement of form.
Posture: Pose Tracking and Machine Learning for prescribing corrective suggestions to improve posture and form while exercising. This repository conta
Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.
Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.
Distributed DataLoader For Pytorch Based On Ray
Dpex——用户无感知分布式数据预处理组件 一、前言 随着GPU与CPU的算力差距越来越大以及模型训练时的预处理Pipeline变得越来越复杂,CPU部分的数据预处理已经逐渐成为了模型训练的瓶颈所在,这导致单机的GPU配置的提升并不能带来期望的线性加速。预处理性能瓶颈的本质在于每个GPU能够使用的C
Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation in PyTorch
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Ima
Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code
simple_diarizer Simplified diarization pipeline using some pretrained models. Made to be a simple as possible to go from an input audio file to diariz
PyTorch implementation of ARM-Net: Adaptive Relation Modeling Network for Structured Data.
A ready-to-use framework of latest models for structured (tabular) data learning with PyTorch. Applications include recommendation, CRT prediction, healthcare analytics, and etc.
A Python Module That Uses ANN To Predict A Stocks Price And Also Provides Accurate Technical Analysis With Many High Potential Implementations!
Stox ⚡ A Python Module For The Stock Market ⚡ A Module to predict the "close price" for the next day and give "technical analysis". It uses a Neural N
Gamestonk Terminal is an awesome stock and crypto market terminal
Gamestonk Terminal is an awesome stock and crypto market terminal. A FOSS alternative to Bloomberg Terminal.
30 Days Of Machine Learning Using Pytorch
Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch
Code for our TKDE paper "Understanding WeChat User Preferences and “Wow” Diffusion"
wechat-wow-analysis Understanding WeChat User Preferences and “Wow” Diffusion. Fanjin Zhang, Jie Tang, Xueyi Liu, Zhenyu Hou, Yuxiao Dong, Jing Zhang,
TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline.
TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline
frida-based ceserver. iOS analysis is possible with Cheat Engine.
frida-ceserver frida-based ceserver. iOS analysis is possible with Cheat Engine. Original by Dark Byte. Usage Install frida on iOS. python main.py Cyd
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI) and Reading Comprehension Question-Answering (RCQA)
🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.
🧪📈 🐍. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python and HoloViz Panel.
spafe: Simplified Python Audio-Features Extraction
spafe aims to simplify features extractions from mono audio files. The library can extract of the following features: BFCC, LFCC, LPC, LPCC, MFCC, IMFCC, MSRCC, NGCC, PNCC, PSRCC, PLP, RPLP, Frequency-stats etc. It also provides various filterbank modules (Mel, Bark and Gammatone filterbanks) and other spectral statistics.
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙♀️
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙♀️
Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.
Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.