Home / Python Text Data & NLP
1361 Repositories
Sortby
Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua
Global Prosody Style Transfer Without Text Transcriptions This repository provides a PyTorch implementation of AutoPST, which enables unsupervised glo
Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur
天池中药说明书实体识别挑战冠军方案;中文命名实体识别;NER; BERT-CRF & BERT-SPAN & BERT-MRC;Pytorch
PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art
NERDA Not only is NERDA a mesmerizing muppet-like character. NERDA is also a python package, that offers a slick easy-to-use interface for fine-tuning
Paradigm Shift in NLP Welcome to the webpage for "Paradigm Shift in Natural Language Processing". Some resources of the paper are constantly maintaine
GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据,将清华新闻数据、搜狗新闻数据等新闻数据集,以及开源的一些摘要数据进行整理清洗,构建一个较完善的中文摘要数据集。 数据集清洗时,仅进行了简单地规则清洗。
7 Python code smells This repository contains the code for the Python code smells video on the ArjanCodes channel (watch the video here). The example
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
ftfy: fixes text for you print(fix_encoding("(ง'⌣')ง")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li
Takker - 一个普通的QQ机器人 此项目为基于 Nonebot2 和 go-cqhttp 开发,以 Sqlite 作为数据库的QQ群娱乐机器人 关于 纯兴趣开发,部分功能借鉴了大佬们的代码,作为Q群的娱乐+功能性Bot 声明 此项目仅用于学习交流,请勿用于非法用途 这是开发者的第一个Pytho
spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines spaCy-wrap is minimal library intended for wrapping fine-tuned transformers from t
Augmenty: The cherry on top of your NLP pipeline Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of high
Introduction This is a PyTorch implementation of the following research papers: (1) Hierarchical Text Generation and Planning for Strategic Dialogue (
Negative Sampling for NER Unlabeled entity problem is prevalent in many NER scenarios (e.g., weakly supervised NER). Our paper in ICLR-2021 proposes u
Label Studio for Hugging Face's Transformers Website • Docs • Twitter • Join Slack Community Transfer learning for NLP models by annotating your textu
Phonemizer -- foʊnmaɪzɚ The phonemizer allows simple phonemization of words and texts in many languages. Provides both the phonemize command-line tool
TearingNet: Point Cloud Autoencoder to Learn Topology-Friendly Representations Created by Jiahao Pang, Duanshun Li, and Dong Tian from InterDigital In
NAME inflect.py - Correctly generate plurals, singular nouns, ordinals, indefinite articles; convert numbers to words. SYNOPSIS import inflect p = in
pkuseg:一个多领域中文分词工具包 (English Version) pkuseg 是基于论文[Luo et. al, 2019]的工具包。其简单易用,支持细分领域分词,有效提升了分词准确度。 目录 主要亮点 编译和安装 各类分词工具包的性能对比 使用方式 论文引用 作者 常见问题及解答 主要
CIF-PyTorch This is a PyTorch based implementation of continuous integrate-and-fire (CIF) module for end-to-end (E2E) automatic speech recognition (AS
Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"
WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth
Recasing and punctuation model based on Bert Benoit Favre 2021 This system converts a sequence of lowercase tokens without punctuation to a sequence o
ReaLiSe ReaLiSe is a multi-modal Chinese spell checking model. This the office code for the paper Read, Listen, and See: Leveraging Multimodal Informa
Poincaré Embeddings for Learning Hierarchical Representations PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations
Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec
Flaxformer: transformer architectures in JAX/Flax Flaxformer is a transformer library for primarily NLP and multimodal research at Google. It is used
minnn by Graham Neubig, Zhisong Zhang, and Divyansh Kaushik This is an exercise in developing a minimalist neural network toolkit for NLP, part of Car
WordDumb A calibre plugin that generates Word Wise and X-Ray files then sends them to Kindle. Supports KFX, AZW3 and MOBI eBooks. Languages X-Ray supp
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python) 日本語は以下に続きます (Japanese follows) English: This book is written in Japanese and primaril
chazutsu photo from Kaikado, traditional Japanese chazutsu maker chazutsu is the dataset downloader for NLP. import chazutsu r = chazutsu.data
파이썬 비트코인 투자 자동화 강의 코드 by 유튜브 조코딩 채널 pyupbit 라이브러리를 활용하여 upbit 거래소에서 비트코인 자동매매를 하는 코드입니다. 파일 구성 test.py : 잔고 조회 (1강) backtest.py : 백테스팅 코드 (2강) bestK.p
Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)
8bit breadboard computer simulator This is an assembler + simulator/emulator of Ben Eater's 8bit breadboard computer. For a version with its RAM upgra
🗣 VoiceMe: Personalized voice generation in TTS Abstract Novel text-to-speech systems can generate entirely new voices that were not seen during trai
phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,
Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you
Channel Auto-Post Bot This bot can send all new messages from one channel, directly to another channel (or group, just in case), without the forwarded
Content What is deepaudio-speaker? Installation Get Started Model Architecture How to contribute to deepaudio-speaker? Acknowledge What is deepaudio-s
Write Better Python Code This repository contains the code examples used in my Write Better Python Code series published on YouTube: https:/
本项目是参考 TPLinker 中HandshakingTagging思想,将TPLinker由原来的关系抽取(RE)模型修改为命名实体识别(NER)模型。
医学实体抽取_GlobalPointer_torch 介绍 思想来自于苏神 GlobalPointer,原始版本是基于keras实现的,模型结构实现参考现有 pytorch 复现代码【感谢!】,基于torch百分百复现苏神原始效果。 数据集 中文医学命名实体数据集 点这里申请,很简单,共包含九类医学
UIS-RNN Overview This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm. UIS-RNN solves the problem of s
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper
Trex is a tool to match semantically similar functions based on transfer learning.
End to end text to speech system using gruut and onnx
Large Models of Source Code I occasionally train and publicly release large neural language models on programs, including PolyCoder. Here, I describe
Active Learning allows you to efficiently label training data in a small-data scenario.
🚨 Update 🚨 : As of 2021/10/17, the jiant project is no longer being actively maintained. This means there will be no plans to add new models, tasks,
Unsupervised Machine Translation This repository contains the original implementation of the unsupervised PBSMT and NMT models presented in Phrase-Bas
DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util
Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev