802 Repositories
Python CS-332-NLP-Tutorials-Notes Libraries
Python library for parsing resumes using natural language processing and machine learning
CVParser Python library for parsing resumes using natural language processing and machine learning. Setup Installation on Linux and Mac OS Follow the
NLP project that works with news (NER, context generation, news trend analytics)
СоАвтор СоАвтор – платформа и открытый набор инструментов для редакций и журналистов-фрилансеров, который призван сделать процесс создания контента ма
For storing the complete exploration of Visual Question Answering for our B.Tech Project
Multi-Image vqa @authors: Akhilesh, Janhavi, Harsh Paper summary, Ideas tried and their corresponding results: on wiki Other discussions: on discussio
Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications
Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications
txtai: Build AI-powered semantic search applications in Go
txtai: Build AI-powered semantic search applications in Go txtai executes machine-learning workflows to transform data and build AI-powered semantic s
Fast, DB Backed pretrained word embeddings for natural language processing.
Embeddings Embeddings is a python package that provides pretrained word embeddings for natural language processing and machine learning. Instead of lo
Multilingual word vectors in 78 languages
Aligning the fastText vectors of 78 languages Facebook recently open-sourced word vectors in 89 languages. However these vectors are monolingual; mean
Simple notes app backend using Python's FastAPI framework.
my-notes-app Simple notes app backend using Python's FastAPI framework. Route "/": User login (GET): return 200, list of all of their notes; User sign
OB_Template is a vault template reference for using Obsidian.
Obsidian Template OB_Template is a vault template reference for using Obsidian. If you've tested out Obsidian. and worked through the "Obsidian Help"
An automated program that helps customers of Pizza Palour place their pizza orders
PIzza_Order_Assistant Introduction An automated program that helps customers of Pizza Palour place their pizza orders. The program uses voice commands
jiant is an NLP toolkit
🚨 Update 🚨 : As of 2021/10/17, the jiant project is no longer being actively maintained. This means there will be no plans to add new models, tasks,
A modified Sequential and NLP based Bot
A modified Sequential and NLP based Bot I improvised this bot a bit with some implementations as a part of my own hobby project :) Note: I do not own
A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python
LineFlow: Framework-Agnostic NLP Data Loader in Python LineFlow is a simple text dataset loader for NLP deep learning tasks. LineFlow was designed to
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.
Welcome to Spokestack Python! This library is intended for developing voice interfaces in Python. This can include anything from Raspberry Pi applicat
Fast and robust date extraction from web pages, with Python or on the command-line
Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are included.
Meaningful and minimalist release notes for developers
Managing manual release notes is hard. Therefore, everyone tends to generate release notes from commit messages. But, you won't get a meaningful release note at the end.
Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP)
Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP) predictions: part-of-speech (POS) tags, chunking (CHK), name entity recognition (NER), semantic role labeling (SRL) and syntactic parsing (PSG) with skip-gram all in Python and still more features will be added. The website give is for downlarding Senna tool
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Simplemma: a simple multilingual lemmatizer for Python Purpose Lemmatization is the process of grouping together the inflected forms of a word so they
Faster, modernized fork of the language identification tool langid.py
py3langid py3langid is a fork of the standalone language identification tool langid.py by Marco Lui. Original license: BSD-2-Clause. Fork license: BSD
Quick insights from Zoom meeting transcripts using Graph + NLP
Transcript Analysis - Graph + NLP This program extracts insights from Zoom Meeting Transcripts (.vtt) using TigerGraph and NLTK. In order to run this
Unofficial Python library for using the Polish Wordnet (plWordNet / Słowosieć)
Polish Wordnet Python library Simple, easy-to-use and reasonably fast library for using the Słowosieć (also known as PlWordNet) - a lexico-semantic da
A Japanese tokenizer based on recurrent neural networks
Nagisa is a python module for Japanese word segmentation/POS-tagging. It is designed to be a simple and easy-to-use tool. This tool has the following
Crawl BookCorpus
These are scripts to reproduce BookCorpus by yourself.
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
BERT-of-Theseus Code for paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing". BERT-of-Theseus is a new compressed BERT by progre
Basic yet complete Machine Learning pipeline for NLP tasks
Basic yet complete Machine Learning pipeline for NLP tasks This repository accompanies the article on building basic yet complete ML pipelines for sol
Code for the project carried out fulfilling the course requirements for Fall 2021 NLP at NYU
Introduction Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization,
Sentiment Analysis web app using Streamlit - American Airlines Tweets
Analyse des sentiments à partir des Tweets L'application est développée par Streamlit L'analyse sentimentale est effectuée sur l'ensemble de données d
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
Language Models are Few-shot Multilingual Learners Paper This is the source code of the paper [Arxiv] [ACL Anthology]: This code has been written usin
Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English
Breame ( British English and American English) Breame is a lightweight Python package with a number of utility tools to aid in the detection of words
Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).
FREE_7773 Repo containing material for the NYU class (Master of Engineering) I teach on NLP, ML Sys etc. For context on what the class is trying to ac
BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese
Table of contents Introduction Using BARTpho with fairseq Using BARTpho with transformers Notes BARTpho: Pre-trained Sequence-to-Sequence Models for V
Simple tutorials using Google's TensorFlow Framework
TensorFlow-Tutorials Introduction to deep learning based on Google's TensorFlow framework. These tutorials are direct ports of Newmu's Theano Tutorial
Tensorflow Tutorials using Jupyter Notebook
Tensorflow Tutorials using Jupyter Notebook TensorFlow tutorials written in Python (of course) with Jupyter Notebook. Tried to explain as kindly as po
Simple and ready-to-use tutorials for TensorFlow
TensorFlow World To support maintaining and upgrading this project, please kindly consider Sponsoring the project developer. Any level of support is a
TensorFlow tutorials and best practices.
Effective TensorFlow 2 Table of Contents Part I: TensorFlow 2 Fundamentals TensorFlow 2 Basics Broadcasting the good and the ugly Take advantage of th
Contains links to publicly available datasets for modeling health outcomes using speech and language.
speech-nlp-datasets Contains links to publicly available datasets for modeling various health outcomes using speech and language. Speech-based Corpora
Convert complex chord names to midi notes
ezchord Simple python script that can convert complex chord names to midi notes Prerequisites pip install midiutil Usage ./ezchord.py Dmin7 G7 C timi
A PyTorch-based model pruning toolkit for pre-trained language models
English | 中文说明 TextPruner是一个为预训练语言模型设计的模型裁剪工具包,通过轻量、快速的裁剪方法对模型进行结构化剪枝,从而实现压缩模型体积、提升模型速度。 其他相关资源: 知识蒸馏工具TextBrewer:https://github.com/airaria/TextBrewe
Robot Hacking Manual (RHM). From robotics to cybersecurity. Papers, notes and writeups from a journey into robot cybersecurity.
RHM: Robot Hacking Manual Download in PDF RHM v0.4 ┃ Read online The Robot Hacking Manual (RHM) is an introductory series about cybersecurity for robo
BERT, LDA, and TFIDF based keyword extraction in Python
BERT, LDA, and TFIDF based keyword extraction in Python kwx is a toolkit for multilingual keyword extraction based on Google's BERT and Latent Dirichl
BERT-based Financial Question Answering System
BERT-based Financial Question Answering System In this example, we use Jina, PyTorch, and Hugging Face transformers to build a production-ready BERT-b
Chinese NER with albert/electra or other bert descendable model (keras)
Chinese NLP (albert/electra with Keras) Named Entity Recognization Project Structure ./ ├── NER │ ├── __init__.py │ ├── log
Label data using HuggingFace's transformers and automatically get a prediction service
Label Studio for Hugging Face's Transformers Website • Docs • Twitter • Join Slack Community Transfer learning for NLP models by annotating your textu
The program converts Swiss notes into American notes
Informatik-Programmieren Einleitung: Das Programm rechnet Schweizer Noten in das Amerikanische Noten um. Der Benutzer kann seine Note eingeben und der
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation
The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo
Subformer This repository contains the code for the Subformer. To help overcome this we propose the Subformer, allowing us to retain performance while
Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.
PART 2: CHAIN LINKING AUDIO-TO-TEXT NLP TASKS 2A: TRANSCRIBE-TRANSLATE-SENTIMENT-ANALYSIS In notebook3.0, I demo a simple workflow to: transcribe a lo
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained models
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using
Awesome Treasure of Transformers Models Collection
💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️
The projects lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques
Unsupervised technique to Glossary and Definition Extraction Code Files GPT2-DefinitionModel.ipynb - GPT-2 model for definition generation. Data_Gener
Powerful unsupervised domain adaptation method for dense retrieval.
Powerful unsupervised domain adaptation method for dense retrieval
Converts a Bangla numeric string to literal words.
Bangla Number in Words Converts a Bangla numeric string to literal words. Install $ pip install banglanum2words Usage
Text completion with Hugging Face and TensorFlow.js running on Node.js
Katana ML Text Completion 🤗 Description Runs with with Hugging Face DistilBERT and TensorFlow.js on Node.js distilbert-model - converter from Hugging
Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)
Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
AliceMind AliceMind: ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab This repository provides pre-trained encode
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.
keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea
A toolkit for document-level event extraction, containing some SOTA model implementations
Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.
Randomised controlled trial abstract result tabulator RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into
This is a small notes web app, with python and flask microframework. Using sqlite3
Python Notes App. This is a small web application maked with flask-python for add notes easily and quickly. Dependencies. You can create a virtual env
NLP tool to extract emotional phrase from tweets 🤩
Emotional phrase extractor Extract phrase in the given text that is used to express the sentiment. Capturing sentiment in language is important in the
Continuously update some NLP practice based on different tasks.
NLP_practice We will continuously update some NLP practice based on different tasks. prerequisites Software pytorch = 1.10 torchtext = 0.11.0 sklear
State-of-the-art NLP through transformer models in a modular design and consistent APIs.
Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h
Athens: a great tool for taking notes and organising knowldge
AthensSyncer Athens is a great tool for taking notes and organising knowldge. But it is a bummer that you cannot use it accross multiple devices. Well
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
AliceMind AliceMind: ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab This repository provides pre-trained encode
This is a project of data parallel that running on NLP tasks.
This is a project of data parallel that running on NLP tasks.
Hashformers is a framework for hashtag segmentation with transformers.
Hashtag segmentation is the task of automatically inserting the missing spaces between the words in a hashtag. Hashformers applies Transformer models
Suite of 500 procedurally-generated NLP tasks to study language model adaptability
TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas
Suite of 500 procedurally-generated NLP tasks to study language model adaptability
TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas
This repository contains Python scripts for extracting linguistic features from Filipino texts.
Filipino Text Linguistic Feature Extractors This repository contains scripts for extracting linguistic features from Filipino texts. The scripts were
An ongoing curated list of frameworks, libraries, learning tutorials, software and resources in Python Language.
Python Development Welcome to the world of Python. An ongoing curated list of frameworks, libraries, learning tutorials, software and resources in Pyt
Grab the changelog from releases on Github
release-notes-scraper This simple script can be used to grab the release notes for projects from github that do not keep a CHANGELOG, but publish thei
A project designed to make taking notes easier than ever - by doing it all on command line
A project designed to make taking notes easier than ever - by doing it all on command line! Yes, all of your files are easily accessible through one command interface, and can be written to at any time! #ad #sponsored
Chatbot in 200 lines of code using TensorLayer
Seq2Seq Chatbot This is a 200 lines implementation of Twitter/Cornell-Movie Chatbot, please read the following references before you read the code: Pr
A Simple Example for Imitation Learning with Dataset Aggregation (DAGGER) on Torcs Env
Imitation Learning with Dataset Aggregation (DAGGER) on Torcs Env This repository implements a simple algorithm for imitation learning: DAGGER. In thi
End-To-End Memory Network using Tensorflow
MemN2N Implementation of End-To-End Memory Networks with sklearn-like interface using Tensorflow. Tasks are from the bAbl dataset. Get Started git clo
Tensorflow implementation of Character-Aware Neural Language Models.
Character-Aware Neural Language Models Tensorflow implementation of Character-Aware Neural Language Models. The original code of author can be found h
Spatial Transformer Nets in TensorFlow/ TensorLayer
MOVED TO HERE Spatial Transformer Networks Spatial Transformer Networks (STN) is a dynamic mechanism that produces transformations of input images (or
Useful guides, tutorials, and FAQs related to LEGO Universe and Darkflame Universe.
Awesome Lego Universe A curated list of awesome things related to LEGO Universe. LEGO Universe was a kid-friendly massively-multiplayer online role pl
A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.
MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This
Course project of NLP@UCAS
NaiveMT Prepare Clone this repository git clone [email protected]:Poeroz/NaiveMT.git Install Please first install PyTorch = 1.5.0, then type the followi
Simple bots or Simbots is a library designed to create simple bots using the power of python. This library utilises Intent, Entity, Relation and Context model to create bots .
Simple bots or Simbots is a library designed to create simple chat bots using the power of python. This library utilises Intent, Entity, Relation and
Tutorials on advanced python topics, and literate programming framework to write them.
Advanced course on Python3 This course covers several topics Python decorators The python object system / meta classes Also see my text on Python impo
NotesToCommands - a fully customizable notes / command template program, allowing users to instantly execute terminal commands
NotesToCommands is a fully customizable notes / command template program, allowing users to instantly execute terminal commands with dynamic arguments grouped into sections in their notes/files. It was originally created for pentesting uses, to avoid the needed remembrance and retyping of sets of commands for various attacks.
A note taker for NVDA. Allows the user to create, edit, view, manage and export notes to different formats.
Quick Notetaker add-on for NVDA The Quick Notetaker add-on is a wonderful tool which allows writing notes quickly and easily anytime and from any app
A Chinese to English Neural Model Translation Project
ZH-EN NMT Chinese to English Neural Machine Translation This project is inspired by Stanford's CS224N NMT Project Dataset used in this project: News C
An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch
NLP-Pytorch-Assignment An assignment from my grad-level data mining course (before I started personal projects) demonstrating some experience with NLP
Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU
GPU Docker NLP Application Deployment Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU, to setup the enviroment on
Natural Language Processing Tasks and Examples.
Natural Language Processing Tasks and Examples With the advancement of A.I. technology in recent years, natural language processing technology has bee
Pipeline for training LSA models using Scikit-Learn.
Latent Semantic Analysis Pipeline for training LSA models using Scikit-Learn. Usage Instead of writing custom code for latent semantic analysis, you j
This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.
FALLABOUT-SRMMIC 21 POETRY-GENERATION HINGLISH DESCRIPTION We have developed a NLP(natural language processing) model which automatically generates a
Pre-Training with Whole Word Masking for Chinese BERT
Pre-Training with Whole Word Masking for Chinese BERT
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
English|简体中文 ERNIE是百度开创性提出的基于知识增强的持续学习语义理解框架,该框架将大数据预训练与多源丰富知识相结合,通过持续学习技术,不断吸收海量文本数据中词汇、结构、语义等方面的知识,实现模型效果不断进化。ERNIE在累积 40 余个典型 NLP 任务取得 SOTA 效果,并在 G
Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP 2020)
This repository contains the resources in our paper "Revisiting Pre-trained Models for Chinese Natural Language Processing", which will be published i
SentAugment is a data augmentation technique for semi-supervised learning in NLP.
SentAugment SentAugment is a data augmentation technique for semi-supervised learning in NLP. It uses state-of-the-art sentence embeddings to structur
Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",
K-BERT Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph", which is implemented based on the UER framework. R
My notes on Data structure and Algos in golang implementation and python
My notes on DS and Algo Table of Contents Arrays LinkedList Trees Types of trees: Tree/Graph Traversal Algorithms Heap Priorty Queue Trie Graphs Graph
This an Anki add on that automatically converts Notion notes into Anki flash cards. Currently in development!
NotionFlash This is an Anki add on in development that will allow automatically convert your Notion study notes into Anki flash cards. The Anki deck c
Modified GPT using average pooling to reduce the softmax attention memory constraints.
NLP-GPT-Upsampling This repository contains an implementation of Open AI's GPT Model. In particular, this implementation takes inspiration from the Ny