1394 Repositories
Python deduplicate-text-datasets Libraries
Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",
K-BERT Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph", which is implemented based on the UER framework. R
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
UNITER: UNiversal Image-TExt Representation Learning This is the official repository of UNITER (ECCV 2020). This repository currently supports finetun
Oscar and VinVL
Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks VinVL: Revisiting Visual Representations in Vision-Language Models Updates
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition
Wav2Vec2 STT Python Beta Software Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 mode
This is the code for our paper "Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text"
Iconary This is the code for our paper "Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text". It includes the
Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation
Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation Woncheol Shin1, Gyubok Lee1, Jiyoung Lee1, Joonseok Lee2,3, Edward Ch
Official Pytorch implementation of "CLIPstyler:Image Style Transfer with a Single Text Condition"
CLIPstyler Official Pytorch implementation of "CLIPstyler:Image Style Transfer with a Single Text Condition" Environment Pytorch 1.7.1, Python 3.6 $ c
PLUR is a collection of source code datasets suitable for graph-based machine learning.
PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. We provide scripts for downloading, processing, and loading the datasets. This is done by offering a unified API and data structures for all datasets.
Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'
Re-implementation of the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets' Paper Original paper can be found here Data
This program ingests a Cisco "sh ip arp" as a text file and produces the list of vendors seen in the file
IP-ARP-Vendor_lookup This program ingests a Cisco "sh ip arp" as a text file and produces the list of vendors seen in the file Why? Answers the questi
Open source annotation tool for machine learning practitioners.
doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ
A Model for Natural Language Attack on Text Classification and Inference
TextFooler A Model for Natural Language Attack on Text Classification and Inference This is the source code for the paper: Jin, Di, et al. "Is BERT Re
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using
Optimizing Deeper Transformers on Small Datasets
DT-Fixup Optimizing Deeper Transformers on Small Datasets Paper published in ACL 2021: arXiv Detailed instructions to replicate our results in the pap
This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
Pattern-Exploiting Training (PET) This repository contains the code for Exploiting Cloze Questions for Few-Shot Text Classification and Natural Langua
DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
DANeS - Open-source E-newspaper dataset Source: Technology vector created by macrovector - www.freepik.com. DANeS is an open-source E-newspaper datase
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Website • Docs • Twitter • Join Slack Community What is Label Studio? Label Studio is an open source data labeling tool. It lets you label data types
This is used to convert a string to an Image with Handwritten Characters.
Text-to-Handwriting-using-python This is used to convert a string to an Image with Handwritten Characters. text_to_handwriting(string: str, save_to: s
Searches for MAC addresses in a text file of a Cisco "show IP arp" in any address format
show-ip-arp-mac-lookup Searches for MAC addresses in a text file of a Cisco "show IP arp" in any address format What it does: Takes a text file with t
Line-level Handwritten Text Recognition (HTR) system implemented with TensorFlow.
Line-level Handwritten Text Recognition with TensorFlow This model is an extended version of the Simple HTR system implemented by @Harald Scheidl and
FOTS Pytorch Implementation
News!!! Recognition branch now is added into model. The whole project has beed optimized and refactored. ICDAR Dataset SynthText 800K Dataset detectio
DALLE-tools provided useful dataset utilities to improve you workflow with WebDatasets.
DALLE tools DALLE-tools is a github repository with useful tools to categorize, annotate or check the sanity of your datasets. Installation Just clone
Various Algorithms for Short Text Mining
Short Text Mining in Python Introduction This package shorttext is a Python package that facilitates supervised and unsupervised learning for short te
Tightness-aware Evaluation Protocol for Scene Text Detection
TIoU-metric Release on 27/03/2019. This repository is built on the ICDAR 2015 evaluation code. If you propose a better metric and require further eval
Datasets, Transforms and Models specific to Computer Vision
vision Datasets, Transforms and Models specific to Computer Vision Installation First install the nightly version of OneFlow python3 -m pip install on
Official implementation of VQ-Diffusion: Vector Quantized Diffusion Model for Text-to-Image Synthesis
Official implementation of VQ-Diffusion: Vector Quantized Diffusion Model for Text-to-Image Synthesis
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r
A leetcode scraper to compile all questions in leetcode free tier to text file. pdf also available.
A leetcode scraper to compile all questions in leetcode free tier to text file, pdf also available. if new questions get added, run again to get new questions.
A benchmark dataset for emulating atmospheric radiative transfer in weather and climate models with machine learning (NeurIPS 2021 Datasets and Benchmarks Track)
ClimART - A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models Official PyTorch Implementation Using deep le
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r
Official implementation for: Blended Diffusion for Text-driven Editing of Natural Images.
Blended Diffusion for Text-driven Editing of Natural Images Blended Diffusion for Text-driven Editing of Natural Images Omri Avrahami, Dani Lischinski
Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
SynthText Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Ved
WSDM‘2022: Knowledge Enhanced Sports Game Summarization
Knowledge Enhanced Sports Game Summarization Cooming Soon! :) Data will be released after approval process. Code will be published once the author of
Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data
Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data Authors: Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang and Yi-Ren Ye
PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition The unofficial code of CDistNet. Now, we ha
Text-Based zombie apocalyptic decision-making game in Python
Inspiration We shared university first year game coursework.[to gauge previous experience and start brainstorming] Adapted a particular nuclear fallou
Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically.
Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically. The collected data will then be used to train a deep neural network that can detect enemy player models in real time, during gameplay. Finally, a virtual input device will adjust the player's crosshair based on live detections for greater accuracy.
Combine Tacotron2 and Hifi GAN to generate speech from text
EndToEndTextToSpeech Combine Tacotron2 and Hifi GAN to generate speech from text Download weights Hifi GAN - hifi_gan/checkpoint/ : pretrain 2.5M ste
A unofficial pytorch implementation of PAN(PSENet2): Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network Requirements pytorch 1.1+ torchvision 0.3+ pyclipper opencv3 gcc
A simple telegram bot to recognize lengthy voice files to text and vice versa with multiple language support.
Voicebot A simple Telegram bot to convert lengthy voice clips to text and vice versa with supporting languages. Mandatory Variables API_HASH - Yo
Veri Setinizi Yolov5 Formatına Dönüştürün
Veri Setinizi Yolov5 Formatına Dönüştürün! Bu Repo da Neler Var? Xml Formatındaki Veri Setini .Txt Formatına Çevirme Xml Formatındaki Dosyaları Silme
Translate U is capable of translating the text present in an image from one language to the other.
Translate U is capable of translating the text present in an image from one language to the other. The app uses OCR and Google translate to identify and translate across 80+ languages.
Conversational text Analysis using various NLP techniques
PyConverse Let me try first Installation pip install pyconverse Usage Please try this notebook that demos the core functionalities: basic usage noteb
Text-to-Speech for Belarusian language
title emoji colorFrom colorTo sdk app_file pinned Belarusian TTS 🐸 green green gradio app.py false Belarusian TTS 📢 🤖 Belarusian TTS (text-to-speec
Generate text captions for images from their CLIP embeddings. Includes PyTorch model code and example training script.
clip-text-decoder Generate text captions for images from their CLIP embeddings. Includes PyTorch model code and example training script. Example Predi
Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.
MAUVE MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE
This tool will help you convert your text to handwriting xD
So your teacher asked you to upload written assignments? Hate writing assigments? This tool will help you convert your text to handwriting xD
Text Extraction Formulation + Feedback Loop for state-of-the-art WSD (EMNLP 2021)
ConSeC is a novel approach to Word Sense Disambiguation (WSD), accepted at EMNLP 2021. It frames WSD as a text extraction task and features a feedback loop strategy that allows the disambiguation of a target word to be conditioned not only on its context but also on the explicit senses assigned to nearby words.
Simple python program to auto credit your code, text, book, whatever!
Credit Simple python program to auto credit your code, text, book, whatever! Setup First change credit_text to whatever text you would like to credit
A neat little program to read the text from the "All Ten Fingers" program, and write them back.
ATFTyper A neat little program to read the text from the "All Ten Fingers" program, and write them back. How does it work? This program uses the Pillo
Some example code for using a raspberry pi to draw text (including emojis) and twitch emotes to a HUB75 RGB matrix via an HTTP post endpoint.
Some example code for using a raspberry pi to draw text (including emojis) and twitch emotes to a HUB75 RGB matrix via an HTTP post endpoint.
Convert Video Files To Text And Audio
Video-To-Text Convert Video Files To Text And Audio Convert To Audio 1: open dvtt folder in cmd 2: run this command in cmd = main.py Audio Convert To
Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.
Box_Discretization_Network This repository is built on the pytorch [maskrcnn_benchmark]. The method is the foundation of our ReCTs-competition method
Metrics to evaluate quality and efficacy of synthetic datasets.
An Open Source Project from the Data to AI Lab, at MIT Metrics for Synthetic Data Generation Projects Website: https://sdv.dev Documentation: https://
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
A hilarious program that translates text to Google's
Bork, bork, bork! A hilarious program that translates text in a file to Google's "bork bork bork' language. How to use Download the main.py file. Chan
Code for text augmentation method leveraging large-scale language models
HyperMix Code for our paper GPT3Mix and conducting classification experiments using GPT-3 prompt-based data augmentation. Getting Started Installing P
Automated question generation and question answering from Turkish texts using text-to-text transformers
Turkish Question Generation Offical source code for "Automated question generation & question answering from Turkish texts using text-to-text transfor
A python gui program to generate reddit text to speech videos from the id of any post.
Reddit text to speech generator A python gui program to generate reddit text to speech videos from the id of any post. Current functionality Generate
Romanian Automatic Speech Recognition from the ROBIN project
RobinASR This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, tog
code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
PreSumm This code is for EMNLP 2019 paper Text Summarization with Pretrained Encoders Updates Jan 22 2020: Now you can Summarize Raw Text Input!. Swit
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi
VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets
VGGVox models for speaker identification and verification This directory contains code to import and evaluate the speaker identification and verificat
An easy to use, user-friendly and efficient code for extracting OpenAI CLIP (Global/Grid) features from image and text respectively.
Extracting OpenAI CLIP (Global/Grid) Features from Image and Text This repo aims at providing an easy to use and efficient code for extracting image &
R interface to fast.ai
R interface to fastai The fastai package provides R wrappers to fastai. The fastai library simplifies training fast and accurate neural nets using mod
Creating multimodal multitask models
Fusion Brain Challenge The English version of the document can be found here. Обновления 01.11 Мы выкладываем пример данных, аналогичных private test
Data imputations library to preprocess datasets with missing data
Impyute is a library of missing data imputation algorithms. This library was designed to be super lightweight, here's a sneak peak at what impyute can do.
OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages
OCR-Streamlit-App OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages OCR app gets an image a
State of the art faster Natural Language Processing in Tensorflow 2.0 .
tf-transformers: faster and easier state-of-the-art NLP in TensorFlow 2.0 ****************************************************************************
Create beautiful diagrams just by typing mathematical notation in plain text.
Penrose Penrose is an early-stage system that is still in development. Our system is not ready for contributions or public use yet, but hopefully will
An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R
largeVis This is an implementation of the largeVis algorithm described in (https://arxiv.org/abs/1602.00370). It also incorporates: A very fast algori
Extract countries, regions and cities from a URL or text
This project is no longer being maintained and has been archived. Please check the Forks list for newer versions. Forks We are aware of two 3rd party
Raven is a tool written in Python3 allowing you to generate an unique image with some text.
🐦 Raven is a tool written in Python3 allowing you to generate an unique image with some text. It does it by searching the text on Google, do
German Text-To-Speech Engine using Tacotron and Griffin-Lim
jotts JoTTS is a German text-to-speech engine using tacotron and griffin-lim. The synthesizer model has been trained on my voice using Tacotron1. Due
Detectron2 for Document Layout Analysis
Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models. Solve a variety of tasks with pre-trained models or finetune them in
This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.
slue-toolkit We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE
Deep Learning with PyTorch made easy 🚀 !
Deep Learning with PyTorch made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. It also provides a c
Convert Text-to Handwriting Using Python
Convert Text-to Handwriting Using Python Description In this project we'll use python library that's "pywhatkit" for converting text to handwriting. t
Supervised Classification from Text (P)
MSc-Thesis Module: Masters Research Thesis Language: Python Grade: 75 Title: An investigation of supervised classification of therapeutic process from
Simple package for Sublime Text 4; download URL's for local viewing and editing
URLDownloader This is a simple example package that allows you to easily download the contents of any web URL to edit locally. Given a URL, the packag
L-Verse: Bidirectional Generation Between Image and Text
Far beyond learning long-range interactions of natural language, transformers are becoming the de-facto standard for many vision tasks with their power and scalabilty
NeurIPS 2021 Datasets and Benchmarks Track
AP-10K: A Benchmark for Animal Pose Estimation in the Wild Introduction | Updates | Overview | Download | Training Code | Key Questions | License Intr
A simple text editor for linux
wolf-editor A simple text editor for linux Installing using Deb Package Download newest package from releases CD into folder where the downloaded acka
Utilities for preprocessing text for deep learning with Keras
Note: This utility is really old and is no longer maintained. You should use keras.layers.TextVectorization instead of this. Utilities for pre-process
Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.
Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da
Multiple implementations for abstractive text summurization , using google colab
Text Summarization models if you are able to endorse me on Arxiv, i would be more than glad https://arxiv.org/auth/endorse?x=FRBB89 thanks This repo i
BERT score for text generation
BERTScore Automatic Evaluation Metric described in the paper BERTScore: Evaluating Text Generation with BERT (ICLR 2020). News: Features to appear in
NLP and Text Generation Experiments in TensorFlow 2.x / 1.x
Code has been run on Google Colab, thanks Google for providing computational resources Contents Natural Language Processing(自然语言处理) Text Classificati
Python Data Science Handbook: full text in Jupyter Notebooks
Python Data Science Handbook This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. How to Use th
Binary LSTM model for text classification
Text Classification The purpose of this repository is to create a neural network model of NLP with deep learning for binary classification of texts re
A machine learning model for analyzing text for user sentiment and determine whether its a positive, neutral, or negative review.
Sentiment Analysis on Yelp's Dataset Author: Roberto Sanchez, Talent Path: D1 Group Docker Deployment: Deployment of this application can be found her
Apply different text recognition services to images of handwritten documents.
Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume
Python: Wrangled and unpivoted gaming datasets. Tableau: created dashboards - Market Beacon and Player’s Shopping Guide.
Created two information products for GameStop. Using Python, wrangled and unpivoted datasets, and created Tableau dashboards.
This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.
Twitter COVID-19 Sentiment Analysis Members: Christopher Bach | Khalid Hamid Fallous | Jay Hirpara | Jing Tang | Graham Thomas | David Wetherhold Pro
rst2pdf: Use a text editor. Make a PDF.
rst2pdf: Use a text editor. Make a PDF.
Conversational text Analysis using various NLP techniques
Conversational text Analysis using various NLP techniques
GUI Python Auto text bot.
Bagel Text Bot Auto text bot. Written in Python. Contributing Pull requests are welcome. For major changes, please open an issue first to discuss what
Shows Odin Lang errors in Sublime Text.
OdinErrors Shows Odin Lang errors in Sublime Text. Config Collections and defines are stored in ols.json (Hijacked from ols). { "collections": [
Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets.
Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets. Introduction We propose our dataloader API for loading and