4428 Repositories
Python segmentation-based-text-recognition Libraries
Sequential model-based optimization with a `scipy.optimize` interface
Scikit-Optimize Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements
Sequential Model-based Algorithm Configuration
SMAC v3 Project Copyright (C) 2016-2018 AutoML Group Attention: This package is a reimplementation of the original SMAC tool (see reference below). Ho
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
Master status: Development status: Package information: scikit-rebate This package includes a scikit-learn-compatible Python implementation of ReBATE,
scikit-learn addon to operate on set/"group"-based features
skl-groups skl-groups is a package to perform machine learning on sets (or "groups") of features in Python. It extends the scikit-learn library with s
Gluon CV Toolkit
Gluon CV Toolkit | Installation | Documentation | Tutorials | GluonCV provides implementations of the state-of-the-art (SOTA) deep learning models in
📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow
tensorlm Generate Shakespeare poems with 4 lines of code. Installation tensorlm is written in / for Python 3.4+ and TensorFlow 1.1+ pip3 install tenso
Data loaders and abstractions for text and NLP
torchtext This repository consists of: torchtext.datasets: The raw text iterators for common NLP datasets torchtext.data: Some basic NLP building bloc
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
Python-based implementations of algorithms for learning on imbalanced data.
ND DIAL: Imbalanced Algorithms Minimalist Python-based implementations of algorithms for imbalanced learning. Includes deep and representational learn
A scikit-learn based module for multi-label et. al. classification
scikit-multilearn scikit-multilearn is a Python module capable of performing multi-label learning tasks. It is built on-top of various scientific Pyth
Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch
COCO LM Pretraining (wip) Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch. They were a
一款高性能敏感词(非法词/脏字)检测过滤组件,附带繁体简体互换,支持全角半角互换,汉字转拼音,模糊搜索等功能。
一款高性能非法词(敏感词)检测组件,附带繁体简体互换,支持全角半角互换,获取拼音首字母,获取拼音字母,拼音模糊搜索等功能。
SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats
SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats Note Neither this, or PyTgCalls are fully
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
LightSpeech UnOfficial PyTorch implementation of LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search.
[CVPR 2021] Released code for Counterfactual Zero-Shot and Open-Set Visual Recognition
Counterfactual Zero-Shot and Open-Set Visual Recognition This project provides implementations for our CVPR 2021 paper Counterfactual Zero-S
Ella is a CMS based on Python web framework Django with a main focus on high-traffic news websites and Internet magazines.
Ella CMS Ella is opensource CMS based on Django framework, designed for flexibility. It is composed from several modules: Ella core is the main module
The docker-based Open edX distribution designed for peace of mind
Tutor: the docker-based Open edX distribution designed for peace of mind Tutor is a docker-based Open edX distribution, both for production and local
Web based localization tool with tight version control integration.
Weblate is a copylefted libre software web-based continuous localization system, used by over 1150 libre projects and companies in more than 115 count
Plugin-based, unopinionated membership administration software
byro is a membership administration tool for small and medium sized clubs/NGOs/associations of all kinds, with a focus on the DACH region. While it is
Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
Archivematica By Artefactual Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term
:mag: Ambar: Document Search Engine
🔍 Ambar: Document Search Engine Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search. Am
a lightweight web framework based on fastapi
start-fastapi Version 2021, based on FastAPI, an easy-to-use web app developed upon Starlette Framework Version 2020 中文文档 Requirements python 3.6+ (fo
ReST based network device broker
The Open API Platform for Network Devices netpalm makes it easy to push and pull state from your apps to your network by providing multiple southbound
The template for building scalable web APIs based on FastAPI, Tortoise ORM and other.
FastAPI and Tortoise ORM. Powerful but simple template for web APIs w/ FastAPI (as web framework) and Tortoise-ORM (for working via database without h
JSON-RPC server based on fastapi
Description JSON-RPC server based on fastapi: https://fastapi.tiangolo.com Motivation Autogenerated OpenAPI and Swagger (thanks to fastapi) for JSON-R
FastAPI Admin Dashboard based on FastAPI and Tortoise ORM.
FastAPI ADMIN 中文文档 Introduction FastAPI-Admin is a admin dashboard based on fastapi and tortoise-orm. FastAPI-Admin provide crud feature out-of-the-bo
A Prometheus Python client library for asyncio-based applications
aioprometheus aioprometheus is a Prometheus Python client library for asyncio-based applications. It provides metrics collection and serving capabilit
Windows Stack Based Auto Buffer Overflow Exploiter
Autoflow - Windows Stack Based Auto Buffer Overflow Exploiter Autoflow is a tool that exploits windows stack based buffer overflow automatically.
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"
Medical-Transformer Pytorch Code for the paper "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation" About this repo: This repo
Repository to create Ascii art in CMD based on video file.
Made to take any file format, and transform it into ascii art, displayed as a video in the cmd. If the cmd formatting is wrong, try zooming a little and remember to make cmd fullscreen. I made my cmd fullscreen, and zoomed out one tic. Written in Python 3.9
Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.
SETR - Pytorch Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official
This is a new web-based photo management application. Run it on your home server and it will let you find the right photo from your collection on any device. Smart filtering is made possible by object recognition, location awareness, color analysis and other ML algorithms.
Photonix Photo Manager This is a photo management application based on web technologies. Run it on your home server and it will let you find what you
Python-based tools for document analysis and OCR
ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so
Qt based ebook reader
Qt based ebook reader Currently supports: pdf epub djvu fb2 mobi azw / azw3 / azw4 cbr / cbz md Contribute Paypal Bitcoin: 17jaxj26vFJNqQ2hEVerbBV5fpT
Gaphor is the simple modeling tool
Gaphor Gaphor is a UML and SysML modeling application written in Python. It is designed to be easy to use, while still being powerful. Gaphor implemen
Full text search for flask.
flask-msearch Installation To install flask-msearch: pip install flask-msearch # when MSEARCH_BACKEND = "whoosh" pip install whoosh blinker # when MSE
Python library for creating PEG parsers
PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t
This repo holds code for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
TransUNet This repo holds code for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation Usage
Implementation of TimeSformer, a pure attention-based solution for video classification
TimeSformer - Pytorch Implementation of TimeSformer, a pure and simple attention-based solution for reaching SOTA on video classification.
Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks
Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipBERT is designed based on 2D CNNs and transformers, and uses a sparse sampling strategy to enable efficient end-to-end video-and-language learning.
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals.
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals This repo contains the Pytorch implementation of our paper: Unsupervised Seman
A Bot Which Can generate Random Account Based On Your Hits.
AccountGenBot This Bot Can Generate Account With Hits You Save (Randomly) Keyfeatures Join To Use Support Limit Account Generation Using Sql Customiza
This program is to make a video based on Deep Dream
This program is to make a video based on Deep Dream. The program is modified from DeepDreamAnim and DeepDreamVideo with additional functions for bleding two frames based on the optical flows. It also supports the image division to apply the Deep Dream algorithm to a large image.
Knowledge Management for Humans using Machine Learning & Tags
HyperTag HyperTag helps humans intuitively express how they think about their files using tags and machine learning.
Reviving Iterative Training with Mask Guidance for Interactive Segmentation
This repository provides the source code for training and testing state-of-the-art click-based interactive segmentation models with the official PyTorch implementation
Puzzle-CAM: Improved localization via matching partial and full features.
Puzzle-CAM The official implementation of "Puzzle-CAM: Improved localization via matching partial and full features".
Simulation-Based Inference Benchmark
This repository contains a simulation-based inference benchmark framework, sbibm, which we describe in the associated manuscript "Benchmarking Simulation-based Inference".
Knowledge Management for Humans using Machine Learning & Tags
HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your text documents (yes, even PDF's) and images.
The template for building scalable web APIs based on FastAPI, Tortoise ORM and other.
FastAPI and Tortoise ORM. Powerful but simple template for web APIs w/ FastAPI (as web framework) and Tortoise-ORM (for working via database without h
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.
NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta
Get list of common stop words in various languages in Python
Python Stop Words Table of contents Overview Available languages Installation Basic usage Python compatibility Overview Get list of common stop words
Facilitating the design, comparison and sharing of deep text matching models.
MatchZoo Facilitating the design, comparison and sharing of deep text matching models. MatchZoo 是一个通用的文本匹配工具包,它旨在方便大家快速的实现、比较、以及分享最新的深度文本匹配模型。 🔥 News
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam
Multilingual text (NLP) processing toolkit
polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p
Text vectorization tool to outperform TFIDF for classification tasks
WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth
Textpipe: clean and extract metadata from text
textpipe: clean and extract metadata from text textpipe is a Python package for converting raw text in to clean, readable text and extracting metadata
Unsupervised text tokenizer focused on computational efficiency
YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Sockeye This package contains the Sockeye project, an open-source sequence-to-sequence framework for Neural Machine Translation based on Apache MXNet
DELTA is a deep learning based natural language and speech processing platform.
DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p
Text preprocessing, representation and visualization from zero to hero.
Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co
NeMo: a toolkit for conversational AI
NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts
gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical
Python package for performing Entity and Text Matching using Deep Learning.
DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides
Super easy library for BERT based NLP models
Fast-Bert New - Learning Rate Finder for Text Classification Training (borrowed with thanks from https://github.com/davidtvs/pytorch-lr-finder) Suppor
Beautiful visualizations of how language differs among document types.
Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t
Snips Python library to extract meaning from text
Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur
🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.
pySBD: Python Sentence Boundary Disambiguation (SBD) pySBD - python Sentence Boundary Disambiguation (SBD) - is a rule-based sentence boundary detecti
Extract Keywords from sentence or Replace keywords in sentences.
FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install
Python implementation of TextRank for phrase extraction and summarization of text documents
PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
fastNLP fastNLP是一款轻量级的自然语言处理(NLP)工具包,目标是快速实现NLP任务以及构建复杂模型。 fastNLP具有如下的特性: 统一的Tabular式数据容器,简化数据预处理过程; 内置多种数据集的Loader和Pipe,省去预处理代码; 各种方便的NLP工具,例如Embedd
Module for automatic summarization of text documents and HTML pages.
Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
textgenrnn Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code, or quickly tr
Official Stanford NLP Python Library for Many Human Languages
Stanza: A Python NLP Library for Many Human Languages The Stanford NLP Group's official Python NLP library. It contains support for running various ac
A natural language modeling framework based on PyTorch
Overview PyText is a deep-learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapi
State of the Art Natural Language Processing
Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide
VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
VADER-Sentiment-Analysis VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifica
Fixes mojibake and other glitches in Unicode text, after the fact.
ftfy: fixes text for you print(fix_encoding("(ง'⌣')ง")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li
Making text a first-class citizen in TensorFlow.
TensorFlow Text - Text processing in Tensorflow IMPORTANT: When installing TF Text with pip install, please note the version of TensorFlow you are run
An open source library for deep learning end-to-end dialog systems and chatbots.
DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It
Data loaders and abstractions for text and NLP
torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto
A very simple framework for state-of-the-art Natural Language Processing (NLP)
A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models
Library for fast text representation and classification.
fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme
Unsupervised text tokenizer for Neural Network-based text generation.
SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual
💫 Industrial-strength Natural Language Processing (NLP) in Python
spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc
A Python toolbox for gaining geometric insights into high-dimensional data
"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview
Speech recognition module for Python, supporting several engines and APIs, online and offline.
SpeechRecognition Library for performing speech recognition, with support for several engines and APIs, online and offline. Speech recognition engine/
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
A Python library for audio feature extraction, classification, segmentation and applications This doc contains general info. Click here for the comple
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Project DeepSpeech DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Spee
:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
SpeechPy Official Project Documentation Table of Contents Documentation Which Python versions are supported Citation How to Install? Local Installatio
Gammatone-based spectrograms, using gammatone filterbanks or Fourier transform weightings.
Gammatone Filterbank Toolkit Utilities for analysing sound using perceptual models of human hearing. Jason Heeris, 2013 Summary This is a port of Malc
An audio digital processing toolbox based on a workflow/pipeline principle
AudioTK Audio ToolKit is a set of audio filters. It helps assembling workflows for specific audio processing workloads. The audio workflow is split in
Python library for handling audio datasets.
AUDIOMATE Audiomate is a library for easy access to audio datasets. It provides the datastructures for accessing/loading different datasets in a gener
A Flask extension that enables or disables features based on configuration.
Flask FeatureFlags This is a Flask extension that adds feature flagging to your applications. This lets you turn parts of your site on or off based on