842 Python Nlp-pipeline Libraries

MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

25 Dec 26, 2022

Behavioral Testing of Clinical NLP Models

Behavioral Testing of Clinical NLP Models This repository contains code for testing the behavior of clinical prediction models based on patient letter

2 Sep 20, 2022

PowerApps-docstring is a console based, pipeline ready application that automatically generates user and technical documentation for Power Apps.

powerapps-docstring PowerApps-docstring is a console based, pipeline ready application that automatically generates user and technical documentation f

30 Nov 23, 2022

Source Code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

Description The source code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chin

3 Jun 28, 2022

APIlocal_dbAWS_RDS - Disclaimer! All data used is for educational purposes only.

APIlocal_dbAWS_RDS Disclaimer! All data used is for educational purposes only. ETL pipeline diagram. Aim of project By creating a fully working pipe

0 Apr 25, 2022

WSDM‘2022: Knowledge Enhanced Sports Game Summarization

Knowledge Enhanced Sports Game Summarization Cooming Soon! :) Data will be released after approval process. Code will be published once the author of

14 Jul 13, 2022

A procedural Blender pipeline for photorealistic training image generation

BlenderProc2 A procedural Blender pipeline for photorealistic rendering. Documentation | Tutorials | Examples | ArXiv paper | Workshop paper Features

1.8k Jan 2, 2023

Haystack is an open source NLP framework that leverages Transformer models.

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

2.7k Nov 28, 2021

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

799 Dec 28, 2022

Conversational text Analysis using various NLP techniques

PyConverse Let me try first Installation pip install pyconverse Usage Please try this notebook that demos the core functionalities: basic usage noteb

158 Dec 25, 2022

A Facebook Messenger Chatbot using NLP

A Facebook Messenger Chatbot using NLP This project is about creating a messenger chatbot using basic NLP techniques and models like Logistic Regressi

6 Nov 20, 2022

Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

MAUVE MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE

182 Jan 2, 2023

Make differentially private training of transformers easy for everyone

private-transformers This codebase facilitates fast experimentation of differentially private training of Hugging Face transformers. What is this? Why

73 Dec 28, 2022

This is a Docker-based pipeline for preparing sextractor-ready multiwavelength images

Pipeline for creating NB422-detected (ODI) catalog The repository contains a Docker-based pipeline for preprocessing observational data. The pipeline

1 Sep 1, 2022

A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).

189 Dec 28, 2022

Pipeline to convert a haploid assembly into diploid

HapDup (haplotype duplicator) is a pipeline to convert a haploid long read assembly into a dual diploid assembly. The reconstructed haplotypes

50 Jan 5, 2023

Scikit-Learn useful pre-defined Pipelines Hub

Scikit-Pipes Scikit-Learn useful pre-defined Pipelines Hub Usage: Install scikit-pipes It's advised to install sklearn-genetic using a virtual env, in

1 Apr 26, 2022

📚 Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you: parameterize notebooks execute notebooks This

5.1k Jan 3, 2023

A DSL for data-driven computational pipelines

"Dataflow variables are spectacularly expressive in concurrent programming" Henri E. Bal , Jennifer G. Steiner , Andrew S. Tanenbaum Quick overview Ne

1.9k Jan 3, 2023

Automated question generation and question answering from Turkish texts using text-to-text transformers

Turkish Question Generation Offical source code for "Automated question generation & question answering from Turkish texts using text-to-text transfor

29 Dec 14, 2022

A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: "NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion". NÜWA is a unified multimodal

2.6k Jan 3, 2023

🧪 Cutting-edge experimental spaCy components and features

spacy-experimental: Cutting-edge experimental spaCy components and features This package includes experimental components and features for spaCy v3.x,

65 Dec 30, 2022

An easy to use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Welcome to AdaptNLP A high level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models

407 Jan 3, 2023

Crypto Stats and Tweets Data Pipeline using Airflow

Crypto Stats and Tweets Data Pipeline using Airflow Introduction Project Overview This project was brought upon through Udacity's nanodegree program.

1 Nov 24, 2021

State of the art faster Natural Language Processing in Tensorflow 2.0 .

tf-transformers: faster and easier state-of-the-art NLP in TensorFlow 2.0 ****************************************************************************

74 Dec 5, 2022

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

2.9k Jan 6, 2023

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.

An open-source, low-code machine learning library in Python 🚀 Version 2.3.5 out now! Check out the release notes here. Official • Docs • Install • Tu

6.7k Jan 8, 2023

A fast, efficient universal vector embedding utility package.

Magnitude: a fast, simple vector embedding utility library A feature-packed Python package and vector storage file format for utilizing vector embeddi

1.5k Jan 2, 2023

Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP"

DiLBERT Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP" Pretrained Model The pretrained model presented in the paper is

2 Dec 15, 2022

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models. Solve a variety of tasks with pre-trained models or finetune them in

227 Dec 10, 2022

BookNLP, a natural language processing pipeline for books

BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin

654 Jan 2, 2023

Stanza: A Python NLP Library for Many Human Languages

Official Stanford NLP Python Library for Many Human Languages

5.8k Nov 22, 2021

Utilities for preprocessing text for deep learning with Keras

Note: This utility is really old and is no longer maintained. You should use keras.layers.TextVectorization instead of this. Utilities for pre-process

180 Dec 9, 2022

How to use TensorLayer

How to use TensorLayer While research in Deep Learning continues to improve the world, we use a bunch of tricks to implement algorithms with TensorLay

349 Dec 7, 2022

Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.

Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da

3 Nov 27, 2022

Multiple implementations for abstractive text summurization , using google colab

Text Summarization models if you are able to endorse me on Arxiv, i would be more than glad https://arxiv.org/auth/endorse?x=FRBB89 thanks This repo i

463 Dec 26, 2022

NLP and Text Generation Experiments in TensorFlow 2.x / 1.x

Code has been run on Google Colab, thanks Google for providing computational resources Contents Natural Language Processing（自然语言处理） Text Classificati

1.5k Nov 14, 2022

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Introduction XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective.

6k Jan 7, 2023

TensorFlow code and pre-trained models for BERT

BERT ***** New March 11th, 2020: Smaller BERT Models ***** This is a release of 24 smaller BERT models (English only, uncased, trained with WordPiece

32.9k Jan 8, 2023

Deep learning for NLP crash course at ABBYY.

Deep NLP Course at ABBYY Deep learning for NLP crash course at ABBYY. Suggested textbook: Neural Network Methods in Natural Language Processing by Yoa

597 Dec 18, 2022

nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch

nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch. Most of the models in NLP were implemented with less than 100 lines of code.(except comments or blank lines)

11.9k Jan 8, 2023

Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

49 Dec 30, 2022

Binary LSTM model for text classification

Text Classification The purpose of this repository is to create a neural network model of NLP with deep learning for binary classification of texts re

1 Mar 11, 2022

Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.

Trading Tesla with Machine Learning and Sentiment Analysis An interactive program to train a Random Forest Classifier to predict Tesla daily prices us

31 Nov 17, 2022

AI-powered literature discovery and review engine for medical/scientific papers

AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me

819 Dec 30, 2022

Full automated data pipeline using docker images

Create postgres tables from CSV files This first section is only relate to creating tables from CSV files using postgres container alone. Just one of

1 Nov 21, 2021

Building house price data pipelines with Apache Beam and Spark on GCP

This project contains the process from building a web crawler to extract the raw data of house price to create ETL pipelines using Google Could Platform services.

1 Nov 22, 2021

topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

1 Jan 3, 2022

Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

10 May 15, 2022

A simple python application for running a CI pipeline locally

A simple python application for running a CI pipeline locally This app currently supports GitLab CI scripts

0 Jan 11, 2022

Conversational text Analysis using various NLP techniques

159 Jan 6, 2023

Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques"

THESIS_CAIRONE_FIORENTINO Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques" GENERATE TOKE

1 Dec 10, 2021

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

AliceMind AliceMind: ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab This repository provides pre-trained encode

1.4k Jan 4, 2023

Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 A repository part of the MarIA project. Corpora 📃 Corpora Number of documents Number of tokens Size (GB) BNE 201,080,084

Plan de Tecnologías del Lenguaje - Gobierno de España

203 Dec 20, 2022

Maha is a text processing library specially developed to deal with Arabic text.

An Arabic text processing library intended for use in NLP applications Maha is a text processing library specially developed to deal with Arabic text.

184 Nov 27, 2022

Two phase pipeline + StreamlitTwo phase pipeline + Streamlit

Two phase pipeline + Streamlit This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. In betw

1 Nov 17, 2021

Code for training and evaluation of the model from "Language Generation with Recurrent Generative Adversarial Networks without Pre-training"

Language Generation with Recurrent Generative Adversarial Networks without Pre-training Code for training and evaluation of the model from "Language G

253 Sep 14, 2022

A data preprocessing and feature engineering script for a machine learning pipeline is prepared.

FEATURE ENGINEERING Business Problem: A data preprocessing and feature engineering script for a machine learning pipeline needs to be prepared. It is

7 Dec 18, 2021

Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared

Feature-Engineering Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared. When the dataset

5 Apr 21, 2022

A simple python application for running a CI pipeline locally This app currently supports GitLab CI scripts

🏃 Simple Local CI Runner 🏃 A simple python application for running a CI pipeline locally This app currently supports GitLab CI scripts ⚙️ Setup Inst

0 Jan 11, 2022

🛠️ Tools for Transformers compression using Lightning ⚡

Bert-squeeze is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their latency at inference time.

66 Dec 11, 2022

Python library for Serbian Natural language processing (NLP)

SrbAI - Python biblioteka za procesiranje srpskog jezika SrbAI je projekat prikupljanja algoritama i modela za procesiranje srpskog jezika u jedinstve

3 Nov 22, 2022

A deep-learning pipeline for segmentation of ambiguous microscopic images.

Welcome to Official repository of deepflash2 - a deep-learning pipeline for segmentation of ambiguous microscopic images. Quick Start in 30 seconds se

39 Dec 19, 2022

PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages

PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages Abstract NLP applications for code-mixed (CM) or mix-li

1 Nov 12, 2021

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition | paper | dataset | pretrained detection model | Authors: Yi-Chang Che

1 Aug 23, 2022

Improving the robustness and performance of biomedical NLP models through adversarial training

RobustBioNLP Improving the robustness and performance of biomedical NLP models through adversarial training In this repository you can find suppliment

3 Sep 20, 2022

DataCLUE: 国内首个以数据为中心的AI测评（含模型分析报告）

DataCLUE: A Benchmark Suite for Data-centric NLP You can get the english version of README. 以数据为中心的AI测评(DataCLUE) 内容导引章节描述简介介绍以数据为中心的AI测评(DataCLUE

135 Dec 22, 2022

Source code for our EMNLP'21 paper 《Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning》

Child-Tuning Source code for EMNLP 2021 Long paper: Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning. 1. Environ

46 Dec 12, 2022

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning Authors: Yixuan Su, Fangyu Liu, Zaiqiao Meng, Lei Shu, Ehsan Shareghi, and Nig

21 Nov 17, 2021

🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code

Knock Knock A small library to get a notification when your training is complete or when it crashes during the process with two additional lines of co

2.5k Jan 7, 2023

Python package for Turkish Language.

PyTurkce Python package for Turkish Language. Documentation: https://pyturkce.readthedocs.io. Installation pip install pyturkce Usage from pyturkce im

14 Oct 9, 2022

Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The project retrains itself after every prediction, making it more robust and generalized over time.

2 Jul 1, 2022

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

797 Dec 26, 2022

Journey is a NLP-Powered Developer assistant

Journey Journey is a NLP-Powered Developer assistant Using on the powerful Natural Language Processing library Mindmeld, this projects aims to assist

21 Dec 11, 2022

ML for NLP and Computer Vision.

Sparrow is our open-source ML product. It runs on Skipper MLOps infrastructure.

2 Nov 28, 2021

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

2 Nov 20, 2021

We have built a Voice based Personal Assistant for people to access files hands free in their device using natural language processing.

Voice Based Personal Assistant We have built a Voice based Personal Assistant for people to access files hands free in their device using natural lang

2 Nov 13, 2021

Python Nlp-pipeline Resources

Python nlp-pipeline Libraries

MapReader: A computer vision pipeline for the semantic exploration of maps at scale

Behavioral Testing of Clinical NLP Models

PowerApps-docstring is a console based, pipeline ready application that automatically generates user and technical documentation for Power Apps.

Source Code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

APIlocal_dbAWS_RDS - Disclaimer! All data used is for educational purposes only.

WSDM‘2022: Knowledge Enhanced Sports Game Summarization

A procedural Blender pipeline for photorealistic training image generation

Haystack is an open source NLP framework that leverages Transformer models.

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

Conversational text Analysis using various NLP techniques

A Facebook Messenger Chatbot using NLP

Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

Make differentially private training of transformers easy for everyone

This is a Docker-based pipeline for preparing sextractor-ready multiwavelength images

A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).

Pipeline to convert a haploid assembly into diploid

Scikit-Learn useful pre-defined Pipelines Hub

📚 Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

A DSL for data-driven computational pipelines

Automated question generation and question answering from Turkish texts using text-to-text transformers

A unified 3D Transformer Pipeline for visual synthesis

🧪 Cutting-edge experimental spaCy components and features

An easy to use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Crypto Stats and Tweets Data Pipeline using Airflow

State of the art faster Natural Language Processing in Tensorflow 2.0 .

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.

A fast, efficient universal vector embedding utility package.

Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP"

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

BookNLP, a natural language processing pipeline for books

Stanza: A Python NLP Library for Many Human Languages

Utilities for preprocessing text for deep learning with Keras

How to use TensorLayer

Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.

Multiple implementations for abstractive text summurization , using google colab

NLP and Text Generation Experiments in TensorFlow 2.x / 1.x

XLNet: Generalized Autoregressive Pretraining for Language Understanding

TensorFlow code and pre-trained models for BERT

Deep learning for NLP crash course at ABBYY.

nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch

Biterm Topic Model (BTM): modeling topics in short texts

Binary LSTM model for text classification

Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.

AI-powered literature discovery and review engine for medical/scientific papers

Full automated data pipeline using docker images

Building house price data pipelines with Apache Beam and Spark on GCP

topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

A simple python application for running a CI pipeline locally

Conversational text Analysis using various NLP techniques

Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques"

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Maha is a text processing library specially developed to deal with Arabic text.

Two phase pipeline + StreamlitTwo phase pipeline + Streamlit

Code for training and evaluation of the model from "Language Generation with Recurrent Generative Adversarial Networks without Pre-training"

A data preprocessing and feature engineering script for a machine learning pipeline is prepared.

Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared

A simple python application for running a CI pipeline locally This app currently supports GitLab CI scripts

🛠️ Tools for Transformers compression using Lightning ⚡

Python library for Serbian Natural language processing (NLP)

A deep-learning pipeline for segmentation of ambiguous microscopic images.

PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Improving the robustness and performance of biomedical NLP models through adversarial training

DataCLUE: 国内首个以数据为中心的AI测评（含模型分析报告）

Source code for our EMNLP'21 paper 《Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning》

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code

Python package for Turkish Language.

Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

Journey is a NLP-Powered Developer assistant

ML for NLP and Computer Vision.

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

We have built a Voice based Personal Assistant for people to access files hands free in their device using natural language processing.

Bodywork deploys machine learning projects developed in Python, to Kubernetes.