503 Repositories
Python extraction-engine Libraries
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
micrograd A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural
Manim is an animation engine for explanatory math videos.
A community-maintained Python framework for creating mathematical animations.
BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
A lightweight, GPU accelerated, SQL engine built on the RAPIDS.ai ecosystem. Get Started on app.blazingsql.com Getting Started | Documentation | Examp
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
NVIDIA DALI The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provi
Automatic extraction of relevant features from time series:
tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis
:mag_right: :chart_with_upwards_trend: :snake: :moneybag: Backtest trading strategies in Python.
Backtesting.py Backtest trading strategies with Python. Project website Documentation the project if you use it. Installation $ pip install backtestin
PURE: End-to-End Relation Extraction
PURE: End-to-End Relation Extraction This repository contains (PyTorch) code and pre-trained models for PURE (the Princeton University Relation Extrac
"Very simple but works well" Computer Vision based ID verification solution provided by LibraX.
ID Verification by LibraX.ai This is the first free Identity verification in the market. LibraX.ai is an identity verification platform for developers
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Open Semantic Search https://opensemanticsearch.org Integrated search server, ETL framework for document processing (crawling, text extraction, text a
A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集 シーンテキストの位置認識と識別のための論文リソースの要約
Scene Text Localization & Recognition Resources Read this institute-wise: English, 简体中文. Read this year-wise: English, 简体中文. Tags: [STL] (Scene Text L
OCR engine for all the languages
Description kraken is a turn-key OCR system optimized for historical and non-Latin script material. kraken's main features are: Fully trainable layout
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra
Line based ATR Engine based on OCRopy
OCR Engine based on OCRopy and Kraken using python3. It is designed to both be easy to use from the command line but also be modular to be integrated
Tesseract Open Source OCR Engine (main repository)
Tesseract OCR About This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)
CUTIE TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor." Xiaohu
Python library to extract tabular data from images and scanned PDFs
Overview ExtractTable - API to extract tabular data from images and scanned PDFs The motivation is to make it easy for developers to extract tabular d
Camelot: PDF Table Extraction for Humans
Camelot: PDF Table Extraction for Humans Camelot is a Python library that makes it easy for anyone to extract tables from PDF files! Note: You can als
Table Extraction Tool
Tree Structure - Table Extraction Fonduer has been successfully extended to perform information extraction from richly formatted data such as tables.
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"
TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
LAREX LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule based connected components approach which
Make your own game in a font!
Project structure. Included is a suite of tools to create font games. Tutorial: For a quick tutorial about how to make your own game go here For devel
AntroPy: entropy and complexity of (EEG) time-series in Python
AntroPy is a Python 3 package providing several time-efficient algorithms for computing the complexity of time-series. It can be used for example to e
An unofficial Blender add-on for Autodesk's Arnold render engine.
Arnold for Blender Arnold for Blender (or BtoA) provides a bridge to the Arnold renderer from within Blender's standard interface. BtoA is an unoffici
Privacy-respecting metasearch engine
Privacy-respecting, hackable metasearch engine / pronunciation səːks. If you are looking for running instances, ready to use, then visit searx.space.
Malware Configuration And Payload Extraction
CAPE: Malware Configuration And Payload Extraction CAPE is a malware sandbox. It is derived from Cuckoo and is designed to automate the process of mal
Malware Configuration And Payload Extraction
CAPEv2 (Python3) has now been released CAPEv2 With the imminent end-of-life for Python 2 (January 1 2020), CAPEv1 will be phased out. Please upgrade t
StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, security responses, troubleshooting, deployments, and more. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html. Questions? https://forum.stackstorm.com/.
StackStorm is a platform for integration and automation across services and tools, taking actions in response to events. Learn more at www.stackstorm.
Code Review, hosted on Google App Engine
Welcome to Rietveld GitHub Wiki: https://github.com/rietveld-codereview/rietveld/wiki Google Group: http://groups.google.com/group/codereview-discuss
A Python Static Website Generator
Version 0.8.9 Overview Hyde starter kit by merlinrebrovic is a really nice way to get started with hyde. Hyde layout for bootstrap by auzigog is also
Feature extraction made simple with torchextractor
torchextractor: PyTorch Intermediate Feature Extraction Introduction Too many times some model definitions get remorselessly copy-pasted just because
Hermes Bytecode Reverse Engineering Tool (Assemble/Disassemble Hermes Bytecode)
hbctool A command-line interface for disassembling and assembling the Hermes Bytecode. Since the React Native team created their own JavaScript engine
LibXtract is a simple, portable, lightweight library of audio feature extraction functions.
LibXtract LibXtract is a simple, portable, lightweight library of audio feature extraction functions. The purpose of the library is to provide a relat
a library for audio and music analysis
aubio aubio is a library to label music and sounds. It listens to audio signals and attempts to detect events. For instance, when a drum is hit, at wh
Automatic extraction of relevant features from time series:
tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
BatchFlow BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflo
Abilian Social Business Engine - an enterprise social networking / collaboration platform.
About Abilian SBE (Social Business Engine) is a platform for social business applications, and more specifically collaborative / enterprise 2.0 busine
Animation engine for explanatory math videos
Manim is an engine for precise programatic animations, designed for creating explanatory math videos. Note, there are two versions of manim. This repo
Data intensive science for everyone.
The latest information about Galaxy can be found on the Galaxy Community Hub. Community support is available at Galaxy Help. Galaxy Quickstart Galaxy
MoinMoin Wiki Development (2.0+), unstable, for production please use 1.9.x.
MoinMoin - a wiki engine in Python MoinMoin is an easy to use, full-featured and extensible wiki software package written in Python. It can fulfill a
A free & open modern, fast email client with user-friendly encryption and privacy features
Welcome to Mailpile! Introduction Mailpile (https://www.mailpile.is/) is a modern, fast web-mail client with user-friendly encryption and privacy feat
Abilian Social Business Engine - an enterprise social networking / collaboration platform.
About Abilian SBE (Social Business Engine) is a platform for social business applications, and more specifically collaborative / enterprise 2.0 busine
:mag: Ambar: Document Search Engine
🔍 Ambar: Document Search Engine Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search. Am
Knowledge Management for Humans using Machine Learning & Tags
HyperTag HyperTag helps humans intuitively express how they think about their files using tags and machine learning.
Minecraft clone using Python Ursina game engine!
Minecraft clone using Python Ursina game engine!
Knowledge Management for Humans using Machine Learning & Tags
HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your text documents (yes, even PDF's) and images.
An Open-Source Package for Neural Relation Extraction (NRE)
OpenNRE We have a DEMO website (http://opennre.thunlp.ai/). Try it out! OpenNRE is an open-source and extensible toolkit that provides a unified frame
Snips Python library to extract meaning from text
Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur
Extract Keywords from sentence or Replace keywords in sentences.
FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install
Python implementation of TextRank for phrase extraction and summarization of text documents
PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document
Module for automatic summarization of text documents and HTML pages.
Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim
:mag: End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!
Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want
State of the Art Natural Language Processing
Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide
An open source library for deep learning end-to-end dialog systems and chatbots.
DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It
ChatterBot is a machine learning, conversational dialog engine for creating chat bots
ChatterBot ChatterBot is a machine-learning based conversational dialog engine build in Python which makes it possible to generate responses based on
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
A Python library for audio feature extraction, classification, segmentation and applications This doc contains general info. Click here for the comple
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Project DeepSpeech DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Spee
:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
SpeechPy Official Project Documentation Table of Contents Documentation Which Python versions are supported Citation How to Install? Local Installatio
Audio features extraction
Yaafe Yet Another Audio Feature Extractor Build status Branch master : Branch dev : Anaconda : Install Conda Yaafe can be easily install with conda. T
CPU inference engine that delivers unprecedented performance for sparse models
The DeepSparse Engine is a CPU runtime that delivers unprecedented performance by taking advantage of natural sparsity within neural networks to reduce compute required as well as accelerate memory bound workloads. It is focused on model deployment and scaling machine learning pipelines, fitting seamlessly into your existing deployments as an inference backend.
An Open-Source Package for Neural Relation Extraction (NRE)
OpenNRE We have a DEMO website (http://opennre.thunlp.ai/). Try it out! OpenNRE is an open-source and extensible toolkit that provides a unified frame
Snips Python library to extract meaning from text
Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur
Extract Keywords from sentence or Replace keywords in sentences.
FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install
Python implementation of TextRank for phrase extraction and summarization of text documents
PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document
Module for automatic summarization of text documents and HTML pages.
Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim
State of the Art Natural Language Processing
Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide
An open source library for deep learning end-to-end dialog systems and chatbots.
DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It
ChatterBot is a machine learning, conversational dialog engine for creating chat bots
ChatterBot ChatterBot is a machine-learning based conversational dialog engine build in Python which makes it possible to generate responses based on
Apache Spark - A unified analytics engine for large-scale data processing
Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op
A Python library for the Docker Engine API
Docker SDK for Python A Python library for the Docker Engine API. It lets you do anything the docker command does, but from within Python apps – run c
Reusable workflow library for Django
django-viewflow Viewflow is a lightweight reusable workflow library that helps to organize people collaboration business logic in django applications.
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
trafilatura: Web scraping tool for text discovery and retrieval Description Trafilatura is a Python package and command-line tool which seamlessly dow
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Newspaper3k: Article scraping & curation Inspired by requests for its simplicity and powered by lxml for its speed: "Newspaper is an amazing python li
:incoming_envelope: IMAP/SMTP sync system with modern APIs
Nylas Sync Engine The Nylas Sync Engine provides a RESTful API on top of a powerful email sync platform, making it easy to build apps on top of email.
ASGI support for the Tartiflette GraphQL engine
tartiflette-asgi is a wrapper that provides ASGI support for the Tartiflette Python GraphQL engine. It is ideal for serving a GraphQL API over HTTP, o
tartiflette-aiohttp is a wrapper of aiohttp which includes the Tartiflette GraphQL Engine, do not hesitate to take a look of the Tartiflette project.
tartiflette-aiohttp is a wrapper of aiohttp which includes the Tartiflette GraphQL Engine. You can take a look at the Tartiflette API documentation. U
GraphQL Engine built with Python 3.6+ / asyncio
Tartiflette is a GraphQL Server implementation built with Python 3.6+. Summary Motivation Status Usage Installation Installation dependencies Tartifle
GraphQL is a query language and execution engine tied to any backend service.
GraphQL The GraphQL specification is edited in the markdown files found in /spec the latest release of which is published at https://graphql.github.io
Python binding to Modest engine (fast HTML5 parser with CSS selectors).
A fast HTML5 parser with CSS selectors using Modest engine. Installation From PyPI using pip: pip install selectolax Development version from github:
🛠️ Learn a technology X by doing a project - Search engine of project-based learning
Learn X by doing Y 🛠️ Learn a technology X by doing a project Y Website You can contribute by adding projects to the CSV file.
This is a repository for a No-Code object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operating systems.
OpenVINO Inference API This is a repository for an object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operati
Jinja is a fast, expressive, extensible templating engine.
Jinja is a fast, expressive, extensible templating engine. Special placeholders in the template allow writing code similar to Python syntax.
Tesseract Open Source OCR Engine (main repository)
Tesseract OCR About This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM
Embed the Duktape JS interpreter in Python
Introduction Pyduktape is a python wrapper around Duktape, an embeddable Javascript interpreter. On top of the interpreter wrapper, pyduktape offers e
A Python library for the Docker Engine API
Docker SDK for Python A Python library for the Docker Engine API. It lets you do anything the docker command does, but from within Python apps – run c
Module for automatic summarization of text documents and HTML pages.
Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Newspaper3k: Article scraping & curation Inspired by requests for its simplicity and powered by lxml for its speed: "Newspaper is an amazing python li
An open source library for deep learning end-to-end dialog systems and chatbots.
DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re
Snips Python library to extract meaning from text
Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur
A Comparative Framework for Multimodal Recommender Systems
Cornac Cornac is a comparative framework for multimodal recommender systems. It focuses on making it convenient to work with models leveraging auxilia
Best Practices on Recommendation Systems
Recommenders What's New (February 4, 2021) We have a new relase Recommenders 2021.2! It comes with lots of bug fixes, optimizations and 3 new algorith
Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).
Crab - A Recommendation Engine library for Python Crab is a flexible, fast recommender engine for Python that integrates classic information filtering r
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
Couler What is Couler? Couler aims to provide a unified interface for constructing and managing workflows on different workflow engines, such as Argo
A powerful workflow engine implemented in pure Python
Spiff Workflow Summary Spiff Workflow is a workflow engine implemented in pure Python. It is based on the excellent work of the Workflow Patterns init
The easiest way to automate your data
Hello, world! 👋 We've rebuilt data engineering for the data science era. Prefect is a new workflow management system, designed for modern infrastruct
Small, fast HTTP client library for Python. Features persistent connections, cache, and Google App Engine support. Originally written by Joe Gregorio, now supported by community.
Introduction httplib2 is a comprehensive HTTP client library, httplib2.py supports many features left out of other HTTP libraries. HTTP and HTTPS HTTP
Safely add untrusted strings to HTML/XML markup.
MarkupSafe MarkupSafe implements a text object that escapes characters so it is safe to use in HTML and XML. Characters that have special meanings are
ASGI support for the Tartiflette GraphQL engine
tartiflette-asgi is a wrapper that provides ASGI support for the Tartiflette Python GraphQL engine. It is ideal for serving a GraphQL API over HTTP, o
tartiflette-aiohttp is a wrapper of aiohttp which includes the Tartiflette GraphQL Engine, do not hesitate to take a look of the Tartiflette project.
tartiflette-aiohttp is a wrapper of aiohttp which includes the Tartiflette GraphQL Engine. You can take a look at the Tartiflette API documentation. U