917 Python Nlp-toolkit Libraries

A modified Sequential and NLP based Bot

A modified Sequential and NLP based Bot I improvised this bot a bit with some implementations as a part of my own hobby project :) Note: I do not own

2 Jan 7, 2022

A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python

LineFlow: Framework-Agnostic NLP Data Loader in Python LineFlow is a simple text dataset loader for NLP deep learning tasks. LineFlow was designed to

177 Jan 4, 2023

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

Welcome to Spokestack Python! This library is intended for developing voice interfaces in Python. This can include anything from Raspberry Pi applicat

133 Sep 20, 2022

PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.

68 Dec 14, 2022

Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

Py2neo v3 Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The core library ha

64 Oct 14, 2022

A toolkit to generate MR sequence diagrams

mrsd: a toolkit to generate MR sequence diagrams mrsd is a Python toolkit to generate MR sequence diagrams, as shown below for the basic FLASH sequenc

3 Dec 25, 2021

Fast and robust date extraction from web pages, with Python or on the command-line

Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are included.

60 Dec 14, 2022

MMRazor: a model compression toolkit for model slimming and AutoML

Documentation: https://mmrazor.readthedocs.io/ English | 简体中文 Introduction MMRazor is a model compression toolkit for model slimming and AutoML, which

899 Jan 2, 2023

Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP)

Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP) predictions: part-of-speech (POS) tags, chunking (CHK), name entity recognition (NER), semantic role labeling (SRL) and syntactic parsing (PSG) with skip-gram all in Python and still more features will be added. The website give is for downlarding Senna tool

20 Apr 30, 2022

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Simplemma: a simple multilingual lemmatizer for Python Purpose Lemmatization is the process of grouping together the inflected forms of a word so they

70 Dec 29, 2022

Faster, modernized fork of the language identification tool langid.py

py3langid py3langid is a fork of the standalone language identification tool langid.py by Marco Lui. Original license: BSD-2-Clause. Fork license: BSD

12 Nov 5, 2022

Quick insights from Zoom meeting transcripts using Graph + NLP

Transcript Analysis - Graph + NLP This program extracts insights from Zoom Meeting Transcripts (.vtt) using TigerGraph and NLTK. In order to run this

7 Sep 17, 2022

Unofficial Python library for using the Polish Wordnet (plWordNet / Słowosieć)

Polish Wordnet Python library Simple, easy-to-use and reasonably fast library for using the Słowosieć (also known as PlWordNet) - a lexico-semantic da

12 Dec 23, 2022

Evaluation toolkit of the informative tracking benchmark comprising 9 scenarios, 180 diverse videos, and new challenges.

Informative-tracking-benchmark Informative tracking benchmark (ITB) higher diversity. It contains 9 representative scenarios and 180 diverse videos. m

15 Nov 26, 2022

A Japanese tokenizer based on recurrent neural networks

Nagisa is a python module for Japanese word segmentation/POS-tagging. It is designed to be a simple and easy-to-use tool. This tool has the following

325 Jan 5, 2023

Crawl BookCorpus

These are scripts to reproduce BookCorpus by yourself.

590 Jan 3, 2023

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

BERT-of-Theseus Code for paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing". BERT-of-Theseus is a new compressed BERT by progre

284 Nov 25, 2022

Basic yet complete Machine Learning pipeline for NLP tasks

Basic yet complete Machine Learning pipeline for NLP tasks This repository accompanies the article on building basic yet complete ML pipelines for sol

20 Aug 22, 2022

Code for the project carried out fulfilling the course requirements for Fall 2021 NLP at NYU

Introduction Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization,

1 Apr 25, 2022

Sentiment Analysis web app using Streamlit - American Airlines Tweets

Analyse des sentiments à partir des Tweets L'application est développée par Streamlit L'analyse sentimentale est effectuée sur l'ensemble de données d

2 Feb 4, 2022

The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)

Language Models are Few-shot Multilingual Learners Paper This is the source code of the paper [Arxiv] [ACL Anthology]: This code has been written usin

45 Nov 21, 2022

Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English

Breame ( British English and American English) Breame is a lightweight Python package with a number of utility tools to aid in the detection of words

8 Oct 10, 2022

Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).

FREE_7773 Repo containing material for the NYU class (Master of Engineering) I teach on NLP, ML Sys etc. For context on what the class is trying to ac

90 Dec 19, 2022

Metasploit Multi Purpose Exploiting Toolkit For Termux

MSF-EXPLOIT MSF-ANDRO is a Metasploit Multi Purpose Exploiting Toolkit For Termux . Only a Basic Script , Still in Development . FEATURES : Install Me

22 Dec 29, 2022

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

Table of contents Introduction Using BARTpho with fairseq Using BARTpho with transformers Notes BARTpho: Pre-trained Sequence-to-Sequence Models for V

58 Dec 23, 2022

Contains links to publicly available datasets for modeling health outcomes using speech and language.

speech-nlp-datasets Contains links to publicly available datasets for modeling various health outcomes using speech and language. Speech-based Corpora

77 Dec 7, 2022

A PyTorch-based model pruning toolkit for pre-trained language models

English | 中文说明 TextPruner是一个为预训练语言模型设计的模型裁剪工具包，通过轻量、快速的裁剪方法对模型进行结构化剪枝，从而实现压缩模型体积、提升模型速度。其他相关资源：知识蒸馏工具TextBrewer：https://github.com/airaria/TextBrewe

231 Jan 8, 2023

Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit

STORM Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit [Install Instructions] [Paper] [Website] This package contains code

101 Dec 12, 2022

BERT, LDA, and TFIDF based keyword extraction in Python

BERT, LDA, and TFIDF based keyword extraction in Python kwx is a toolkit for multilingual keyword extraction based on Google's BERT and Latent Dirichl

41 Dec 27, 2022

BERT-based Financial Question Answering System

BERT-based Financial Question Answering System In this example, we use Jina, PyTorch, and Hugging Face transformers to build a production-ready BERT-b

61 Sep 18, 2022

Chinese NER with albert/electra or other bert descendable model (keras)

Chinese NLP (albert/electra with Keras) Named Entity Recognization Project Structure ./ ├── NER │ ├── __init__.py │ ├── log

2 Nov 20, 2022

ONT Analysis Toolkit (OAT)

A toolkit for monitoring ONT MinION sequencing, followed by data analysis, for viral genomes amplified with tiled amplicon sequencing.

6 Jun 14, 2022

Label data using HuggingFace's transformers and automatically get a prediction service

Label Studio for Hugging Face's Transformers Website • Docs • Twitter • Join Slack Community Transfer learning for NLP models by annotating your textu

135 Dec 29, 2022

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow. Documentation Proper documentation is available at

151 Jan 5, 2023

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation

564 Jan 8, 2023

The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo

Subformer This repository contains the code for the Subformer. To help overcome this we propose the Subformer, allowing us to retain performance while

10 Dec 27, 2022

Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

PART 2: CHAIN LINKING AUDIO-TO-TEXT NLP TASKS 2A: TRANSCRIBE-TRANSLATE-SENTIMENT-ANALYSIS In notebook3.0, I demo a simple workflow to: transcribe a lo

30 Jul 13, 2022

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained models

77.1k Dec 31, 2022

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine

7.8k Jan 9, 2023

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

2.1k Dec 28, 2022

Awesome Treasure of Transformers Models Collection

💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️

577 Jan 7, 2023

The projects lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

Unsupervised technique to Glossary and Definition Extraction Code Files GPT2-DefinitionModel.ipynb - GPT-2 model for definition generation. Data_Gener

28 May 25, 2021

Powerful unsupervised domain adaptation method for dense retrieval.

Powerful unsupervised domain adaptation method for dense retrieval

191 Dec 28, 2022

Converts a Bangla numeric string to literal words.

Bangla Number in Words Converts a Bangla numeric string to literal words. Install $ pip install banglanum2words Usage

3 Aug 29, 2022

Text completion with Hugging Face and TensorFlow.js running on Node.js

Katana ML Text Completion 🤗 Description Runs with with Hugging Face DistilBERT and TensorFlow.js on Node.js distilbert-model - converter from Hugging

2 Nov 4, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

159 Dec 22, 2022

Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)

Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model

2 Jan 18, 2022

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

AliceMind AliceMind: ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab This repository provides pre-trained encode

1.4k Jan 1, 2023

pandas: powerful Python data analysis toolkit

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.

36.4k Jan 3, 2023

SimpleITK is an image analysis toolkit with a large number of components supporting general filtering operations, image segmentation and registration

672 Jan 5, 2023

MIT version of the PyMca XRF Toolkit

PyMca This is the MIT version of the PyMca XRF Toolkit. Please read the LICENSE file for details. Installation Ready-to-use packages are available for

43 Nov 23, 2022

Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

364 Jan 3, 2023

Estoult - a Python toolkit for data mapping with an integrated query builder for SQL databases

Estoult Estoult is a Python toolkit for data mapping with an integrated query builder for SQL databases. It currently supports MySQL, PostgreSQL, and

15 Dec 29, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022

Evaluation toolkit of the informative tracking benchmark comprising 9 scenarios, 180 diverse videos, and new challenges.

Informative-tracking-benchmark Informative tracking benchmark (ITB) higher diversity. It contains 9 representative scenarios and 180 diverse videos. m

15 Nov 26, 2022

Multiview Dataset Toolkit

Multiview Dataset Toolkit Using multi-view cameras is a natural way to obtain a complete point cloud. However, there is to date only one multi-view 3D

11 Dec 22, 2022

Dnspython is a DNS toolkit for Python.

dnspython is a DNS toolkit for Python. It supports almost all record types. It can be used for queries, zone transfers, and dynamic updates. It supports TSIG authenticated messages and EDNS0.

2.1k Jan 6, 2023

RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

Randomised controlled trial abstract result tabulator RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into

2 Sep 16, 2022

The geospatial toolkit for redistricting data.

maup maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts

Metric Geometry and Gerrymandering Group

60 Dec 5, 2022

Intercepting proxy + analysis toolkit for Second Life compatible virtual worlds

Hippolyzer Hippolyzer is a revival of Linden Lab's PyOGP library targeting modern Python 3, with a focus on debugging issues in Second Life-compatible

6 Sep 1, 2022

wxPython's Project Phoenix. A new implementation of wxPython, better, stronger, faster than he was before.

wxPython Project Phoenix Introduction Welcome to wxPython's Project Phoenix! Phoenix is the improved next-generation wxPython, "better, stronger, fast

1.9k Jan 7, 2023

NLP tool to extract emotional phrase from tweets 🤩

Emotional phrase extractor Extract phrase in the given text that is used to express the sentiment. Capturing sentiment in language is important in the

38 Oct 17, 2022

An efficient toolkit for Face Stylization based on the paper "AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning"

MMGEN-FaceStylor English | 简体中文 Introduction This repo is an efficient toolkit for Face Stylization based on the paper "AgileGAN: Stylizing Portraits

182 Dec 27, 2022

Python Toolkit containing different Cyber Attacks Tools

Helikopter Python Toolkit containing different Cyber Attacks Tools. Tools in Helikopter Toolkit 1. FattyNigger (PYTHON WORM) 2. Taxes (PYTHON PASS EXT

22 Dec 4, 2022

Continuously update some NLP practice based on different tasks.

NLP_practice We will continuously update some NLP practice based on different tasks. prerequisites Software pytorch = 1.10 torchtext = 0.11.0 sklear

0 Jan 5, 2022

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h

42 Sep 21, 2022

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

AliceMind AliceMind: ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab This repository provides pre-trained encode

922 Dec 10, 2021

Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

25 Dec 28, 2022

This is a project of data parallel that running on NLP tasks.

2 Dec 12, 2021

Hashformers is a framework for hashtag segmentation with transformers.

Hashtag segmentation is the task of automatically inserting the missing spaces between the words in a hashtag. Hashformers applies Transformer models

41 Nov 9, 2022

Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

20 May 17, 2022

Logan is a toolkit for building standalone Django applications

Logan Logan is a toolkit for running standalone Django applications. It provides you with tools to create a CLI runner, manage settings, and the abili

206 Jan 3, 2023

Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

20 May 17, 2022

This repository contains Python scripts for extracting linguistic features from Filipino texts.

Filipino Text Linguistic Feature Extractors This repository contains scripts for extracting linguistic features from Filipino texts. The scripts were

1 Oct 5, 2021

Chatbot in 200 lines of code using TensorLayer

Seq2Seq Chatbot This is a 200 lines implementation of Twitter/Cornell-Movie Chatbot, please read the following references before you read the code: Pr

820 Dec 17, 2022

End-To-End Memory Network using Tensorflow

MemN2N Implementation of End-To-End Memory Networks with sklearn-like interface using Tensorflow. Tasks are from the bAbl dataset. Get Started git clo

339 Oct 27, 2022

Tensorflow implementation of Character-Aware Neural Language Models.

Character-Aware Neural Language Models Tensorflow implementation of Character-Aware Neural Language Models. The original code of author can be found h

751 Dec 26, 2022

Machine Learning Toolkit for Kubernetes

Kubeflow the cloud-native platform for machine learning operations - pipelines, training and deployment. Documentation Please refer to the official do

12.1k Jan 3, 2023

A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This

3 May 25, 2022

Course project of NLP@UCAS

NaiveMT Prepare Clone this repository git clone [email protected]:Poeroz/NaiveMT.git Install Please first install PyTorch = 1.5.0, then type the followi

2 Apr 24, 2022

Simple bots or Simbots is a library designed to create simple bots using the power of python. This library utilises Intent, Entity, Relation and Context model to create bots .

Simple bots or Simbots is a library designed to create simple chat bots using the power of python. This library utilises Intent, Entity, Relation and

14 Dec 15, 2021

QML: A Python Toolkit for Quantum Machine Learning

QML is a Python2/3-compatible toolkit for representation learning of properties of molecules and solids.

176 Dec 9, 2022

A Chinese to English Neural Model Translation Project

ZH-EN NMT Chinese to English Neural Machine Translation This project is inspired by Stanford's CS224N NMT Project Dataset used in this project: News C

29 Nov 26, 2022

An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch

NLP-Pytorch-Assignment An assignment from my grad-level data mining course (before I started personal projects) demonstrating some experience with NLP

0 Feb 6, 2022

PyTorch toolkit for biomedical imaging

farabio is a minimal PyTorch toolkit for out-of-the-box deep learning support in biomedical imaging. For further information, see Wikis and Docs.

47 Dec 28, 2022

Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU

GPU Docker NLP Application Deployment Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU, to setup the enviroment on

9 Oct 14, 2022

Natural Language Processing Tasks and Examples.

Natural Language Processing Tasks and Examples With the advancement of A.I. technology in recent years, natural language processing technology has bee

53 Dec 20, 2022

Pipeline for training LSA models using Scikit-Learn.

Latent Semantic Analysis Pipeline for training LSA models using Scikit-Learn. Usage Instead of writing custom code for latent semantic analysis, you j

23 Sep 5, 2022

This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

FALLABOUT-SRMMIC 21 POETRY-GENERATION HINGLISH DESCRIPTION We have developed a NLP(natural language processing) model which automatically generates a

7 Sep 28, 2021

Pre-Training with Whole Word Masking for Chinese BERT

7.7k Dec 31, 2022

Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

English|简体中文 ERNIE是百度开创性提出的基于知识增强的持续学习语义理解框架，该框架将大数据预训练与多源丰富知识相结合，通过持续学习技术，不断吸收海量文本数据中词汇、结构、语义等方面的知识，实现模型效果不断进化。ERNIE在累积 40 余个典型 NLP 任务取得 SOTA 效果，并在 G

5.4k Jan 3, 2023

Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP 2020)

This repository contains the resources in our paper "Revisiting Pre-trained Models for Chinese Natural Language Processing", which will be published i

463 Dec 30, 2022

SentAugment is a data augmentation technique for semi-supervised learning in NLP.

SentAugment SentAugment is a data augmentation technique for semi-supervised learning in NLP. It uses state-of-the-art sentence embeddings to structur

363 Dec 30, 2022

Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",

K-BERT Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph", which is implemented based on the UER framework. R

834 Jan 9, 2023

Toolkit for developing and maintaining ML models

modelkit Python framework for production ML systems. modelkit is a minimalist yet powerful MLOps library for Python, built for people who want to depl

140 Dec 27, 2022

Python toolkit for defining+simulating+visualizing+analyzing attractors, dynamical systems, iterated function systems, roulette curves, and more

Attractors A small module that provides functions and classes for very efficient simulation and rendering of iterated function systems; dynamical syst

1 Aug 4, 2021

Modified GPT using average pooling to reduce the softmax attention memory constraints.

The official implementation of "BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?, ACL 2021 main conference"

BERT is to NLP what AlexNet is to CV This is the official implementation of BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Iden

20 Nov 3, 2022

Python Nlp-toolkit Resources

Python nlp-toolkit Libraries

A modified Sequential and NLP based Bot

A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.

Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

A toolkit to generate MR sequence diagrams

Fast and robust date extraction from web pages, with Python or on the command-line

MMRazor: a model compression toolkit for model slimming and AutoML

Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP)

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Faster, modernized fork of the language identification tool langid.py

Quick insights from Zoom meeting transcripts using Graph + NLP

Unofficial Python library for using the Polish Wordnet (plWordNet / Słowosieć)

Evaluation toolkit of the informative tracking benchmark comprising 9 scenarios, 180 diverse videos, and new challenges.

A Japanese tokenizer based on recurrent neural networks

Crawl BookCorpus

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

Basic yet complete Machine Learning pipeline for NLP tasks

Code for the project carried out fulfilling the course requirements for Fall 2021 NLP at NYU

Sentiment Analysis web app using Streamlit - American Airlines Tweets

The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)

Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English

Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).

Metasploit Multi Purpose Exploiting Toolkit For Termux

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

Contains links to publicly available datasets for modeling health outcomes using speech and language.

A PyTorch-based model pruning toolkit for pre-trained language models

Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit

BERT, LDA, and TFIDF based keyword extraction in Python

BERT-based Financial Question Answering System

Chinese NER with albert/electra or other bert descendable model (keras)

ONT Analysis Toolkit (OAT)

Label data using HuggingFace's transformers and automatically get a prediction service

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo

Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Awesome Treasure of Transformers Models Collection

The projects lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

Powerful unsupervised domain adaptation method for dense retrieval.

Converts a Bangla numeric string to literal words.

Text completion with Hugging Face and TensorFlow.js running on Node.js

A toolkit for document-level event extraction, containing some SOTA model implementations

Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

pandas: powerful Python data analysis toolkit

SimpleITK is an image analysis toolkit with a large number of components supporting general filtering operations, image segmentation and registration

MIT version of the PyMca XRF Toolkit

Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

Estoult - a Python toolkit for data mapping with an integrated query builder for SQL databases

A toolkit for document-level event extraction, containing some SOTA model implementations

Evaluation toolkit of the informative tracking benchmark comprising 9 scenarios, 180 diverse videos, and new challenges.

Multiview Dataset Toolkit

Dnspython is a DNS toolkit for Python.

RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

The geospatial toolkit for redistricting data.

Intercepting proxy + analysis toolkit for Second Life compatible virtual worlds

wxPython's Project Phoenix. A new implementation of wxPython, better, stronger, faster than he was before.

NLP tool to extract emotional phrase from tweets 🤩

An efficient toolkit for Face Stylization based on the paper "AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning"

Python Toolkit containing different Cyber Attacks Tools

Continuously update some NLP practice based on different tasks.

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

This is a project of data parallel that running on NLP tasks.

Hashformers is a framework for hashtag segmentation with transformers.

Suite of 500 procedurally-generated NLP tasks to study language model adaptability

Logan is a toolkit for building standalone Django applications

Suite of 500 procedurally-generated NLP tasks to study language model adaptability

This repository contains Python scripts for extracting linguistic features from Filipino texts.

Chatbot in 200 lines of code using TensorLayer

End-To-End Memory Network using Tensorflow

Tensorflow implementation of Character-Aware Neural Language Models.

Machine Learning Toolkit for Kubernetes