1759 Repositories
Python language-files Libraries
Extended refactoring capabilities for Python LSP Server using Rope.
pylsp-rope Extended refactoring capabilities for Python LSP Server using Rope. This is a plugin for Python LSP Server, so you also need to have it ins
Simple tools to make/dump CPC+ CPR cartridge files
Simple tools to make/dump CPC+ CPR cartridge files mkcpr.py: make a CPR file from files (one chunk per file); see notes cprdump.py: dump the chunks of
Pre-training BERT masked language models with custom vocabulary
Pre-training BERT Masked Language Models (MLM) This repository contains the method to pre-train a BERT model using custom vocabulary. It was used to p
Python Marlin Configurator to make valid configuration files to be used to compile Marlin with.
marlin-configurator Concept originally imagined by The-EG using PowerShell Build Script for Marlin Configurations The purpose of this project is to pa
A CLI tools to get you started on any project in any language
Any Template A faster easier to Quick start any programming project. Installation pip3 install any-template Features No third party dependencies. Tem
A tool written in python to generate basic repo files from github
A tool written in python to generate basic repo files from github
django Filer is a file management application for django that makes handling of files and images a breeze.
django Filer is a file management application for django that makes handling of files and images a breeze.
Using AWS Batch jobs to bulk copy/sync files in S3
Using AWS Batch jobs to bulk copy/sync files in S3
produces PCA on genotypes from fasta files (popPhyl's ID format)
popPhyl_PCA Performs PCA of genotypes. Works in two steps. 1. Input file A single fasta file containing different loci, in different populations/speci
Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls
password-generator Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls Password generator
Build a translation program similar to Google Translate with Python programming language and QT library
google-translate Build a translation program similar to Google Translate with Python programming language and QT library Different parts of the progra
Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls
guess-the-numbers Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls Number guessing game
ChessCoach is a neural network-based chess engine capable of natural-language commentary.
ChessCoach is a neural network-based chess engine capable of natural-language commentary.
Google's Meena transformer chatbot implementation
Here's my attempt at recreating Meena, a state of the art chatbot developed by Google Research and described in the paper Towards a Human-like Open-Domain Chatbot.
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑🎓 👩⚖️ Dataset Summary Inspired by the recent widespread use of th
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation
PolyGlot, a fuzzing framework for language processors
PolyGlot, a fuzzing framework for language processors Build We tested PolyGlot on Ubuntu 18.04. Get the source code: git clone https://github.com/s3te
DziriBERT: a Pre-trained Language Model for the Algerian Dialect
DziriBERT DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect. It handles Algerian
An esoteric programming language that supports concurrency, regex, and web requests.
The Hofstadter Esoteric Programming Language Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's La
How to Create a YouTube Bot that Increases Views using Python Programming Language
YouTube-Bot-in-Python-Selenium How to Create a YouTube Bot that Increases Views using Python Programming Language. The app is for educational purpose
Config files for my GitHub profile.
Config files for my GitHub profile.
Confirm that files have been uploaded to Backblaze Cloud Backup successfully
Backblaze Backup Checker This Python script compares metadata captured from files within source folders against data parsed from Backblaze Cloud Backu
This program can encrypt and decrypt your files so that they can no longer be identified.
File_Cryptographer Table of Contents: About the Program Features Requirements Preview Credits Reach Me See Also About the Program: with this program,
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service. This tool can help a digital forensic investigator to know the context, origin of specific files during a digital forensic investigation.
LSTM and QRNN Language Model Toolkit for PyTorch
LSTM and QRNN Language Model Toolkit This repository contains the code used for two Salesforce Research papers: Regularizing and Optimizing LSTM Langu
PyGo custom language, New but similar language programming
New but similar language programming. Now we are capable to program in a very similar language to Python but at the same time get the efficiency of Go.
Wagtail CLIP allows you to search your Wagtail images using natural language queries.
Wagtail CLIP allows you to search your Wagtail images using natural language queries.
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries.
A topology optimization framework written in Taichi programming language, which is embedded in Python.
Taichi TopOpt (Under Active Development) Intro A topology optimization framework written in Taichi programming language, which is embedded in Python.
Google and Stanford University released a new pre-trained model called ELECTRA
Google and Stanford University released a new pre-trained model called ELECTRA, which has a much compact model size and relatively competitive performance compared to BERT and its variants. For further accelerating the research of the Chinese pre-trained model, the Joint Laboratory of HIT and iFLYTEK Research (HFL) has released the Chinese ELECTRA models based on the official code of ELECTRA. ELECTRA-small could reach similar or even higher scores on several NLP tasks with only 1/10 parameters compared to BERT and its variants.
A Simple Telegram Bot that can Download Files From Mega.nz and Upload It to Telegram
MegaDL-Bot A Simple Telegram Bot By @mrkpbots to Download Files From Mega.nz and Upload It to Telegram Features No Login Required All Mega.nz File Lin
A Chinese to English Neural Model Translation Project
ZH-EN NMT Chinese to English Neural Machine Translation This project is inspired by Stanford's CS224N NMT Project Dataset used in this project: News C
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.
Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)
Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)
Paradigm Shift in NLP - "Paradigm Shift in Natural Language Processing".
Paradigm Shift in NLP Welcome to the webpage for "Paradigm Shift in Natural Language Processing". Some resources of the paper are constantly maintaine
CLIPort: What and Where Pathways for Robotic Manipulation
CLIPort CLIPort: What and Where Pathways for Robotic Manipulation Mohit Shridhar, Lucas Manuelli, Dieter Fox CoRL 2021 CLIPort is an end-to-end imitat
A simple API to upload notes or files to KBFS
This API can be used to upload either secure notes or files to a secure KeybaseFS folder.
This is a tool to automate the icons generation from sets of svg files into fonts and atlases.
SVG Icon processing tool for C++
Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"
Ancient Greek BERT The first and only available Ancient Greek sub-word BERT model! State-of-the-art post fine-tuning on Part-of-Speech Tagging and Mor
DziriBERT: a Pre-trained Language Model for the Algerian Dialect
DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect.
Differentiable architecture search for convolutional and recurrent networks
Differentiable Architecture Search Code accompanying the paper DARTS: Differentiable Architecture Search Hanxiao Liu, Karen Simonyan, Yiming Yang. arX
Wonk is a tool for combining a set of AWS policy files into smaller compiled policy sets.
Wonk is a tool for combining a set of AWS policy files into smaller compiled policy sets.
Sign Language is detected in realtime using video sequences. Our approach involves MediaPipe Holistic for keypoints extraction and LSTM Model for prediction.
RealTime Sign Language Detection using Action Recognition Approach Real-Time Sign Language is commonly predicted using models whose architecture consi
The Official interpreter for the Pix programming language.
The official interpreter for the Pix programming language. Pix Pix is a programming language dedicated to readable syntax and usability Q) Is Pix the
Pytorch implementations of various Deep NLP models in cs-224n(Stanford Univ)
DeepNLP-models-Pytorch Pytorch implementations of various Deep NLP models in cs-224n(Stanford Univ: NLP with Deep Learning) This is not for Pytorch be
An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.
Table of Contents: Introduction to Torch's Tensor Library Computation Graphs and Automatic Differentiation Deep Learning Building Blocks: Affine maps,
C++ Implementation of PyTorch Tutorials for Everyone
C++ Implementation of PyTorch Tutorials for Everyone OS (Compiler)\LibTorch 1.9.0 macOS (clang 10.0, 11.0, 12.0) Linux (gcc 8, 9, 10, 11) Windows (msv
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.
D2L.ai: Interactive Deep Learning Book with Multi-Framework Code, Math, and Discussions Book website | STAT 157 Course at UC Berkeley | Latest version
Code2flow generates call graphs for dynamic programming language. Code2flow supports Python, Javascript, Ruby, and PHP.
Code2flow generates call graphs for dynamic programming language. Code2flow supports Python, Javascript, Ruby, and PHP.
the project for the most brutal and effective language learning technique
- "The project for the most brutal and effective language learning technique" (c) Alex Kay The langflow project was created especially for language le
Repository containing the project files for CEN4020's Team Utah.
inCollege-Team-Utah Repository containing the project files for CEN4020's Team Utah. Contributors: Deepak Putta Jose Ramirez Fuentes Jaason Raudales C
Automating the process of sorting files in my downloads folder by file type.
downloads-folder-automation Automating the process of sorting files in a user's downloads folder on Windows by file type. This script iterates through
Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.
Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.
Parser manager for parsing DOC, DOCX, PDF or HTML files
Parser manager Description Parser gets PDF, DOC, DOCX or HTML file via API and saves parsed data to the database. Implemented in Ruby 3.0.1 using Acti
BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).
BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).
Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming
Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification.
📢 Video Chat Stream Telegram Bot. Can ⏳ Stream Live Videos, Radios, YouTube Videos & Telegram Video Files On Your Video Chat Of Channels & Groups !
Telegram Video Chat Bot (Beta) 📢 Video Chat Stream Telegram Bot 🤖 Can Stream Live Videos, Radios, YouTube Videos & Telegram Video Files On Your Vide
An adaptable Snakemake workflow which uses GATKs best practice recommendations to perform germline mutation calling starting with BAM files
Germline Mutation Calling This Snakemake workflow follows the GATK best-practice recommandations to call small germline variants. The pipeline require
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation
GPT2-Pytorch with Text-Generator Better Language Models and Their Implications Our model, called GPT-2 (a successor to GPT), was trained simply to pre
PyTorch original implementation of Cross-lingual Language Model Pretraining.
XLM NEW: Added XLM-R model. PyTorch original implementation of Cross-lingual Language Model Pretraining. Includes: Monolingual language model pretrain
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Code in both PyTorch and TensorFlow
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.
English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp
Pyccel stands for Python extension language using accelerators.
Pyccel stands for Python extension language using accelerators.
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.
Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021)
Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021) Overview of paths used in DIG and IG. w is the word being attributed. The
Labelling platform for text using distant supervision
With DataQA, you can label unstructured text documents using rule-based distant supervision.
wireguard-config-benchmark is a python script that benchmarks the download speeds for the connections defined in one or more wireguard config files
wireguard-config-benchmark is a python script that benchmarks the download speeds for the connections defined in one or more wireguard config files. If multiple configs are benchmarked it will output a file ranking them from fastest to slowest.
Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.
Tevatron Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models. The toolkit has a modularized
Easy to use Python module to extract Exif metadata from digital image files.
Easy to use Python module to extract Exif metadata from digital image files.
Simple embedding based text classifier inspired by fastText, implemented in tensorflow
FastText in Tensorflow This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish V
PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
This is the official implementation of the following paper: Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau. PICARD - Parsing Incrementally for Con
A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format
RITA DSL This is a language, loosely based on language Apache UIMA RUTA, focused on writing manual language rules, which compiles into either spaCy co
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.
Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It
Accurately generate all possible forms of an English word e.g "election" -- "elect", "electoral", "electorate" etc.
Accurately generate all possible forms of an English word Word forms can accurately generate all possible forms of an English word. It can conjugate v
The tool to make NLP datasets ready to use
chazutsu photo from Kaikado, traditional Japanese chazutsu maker chazutsu is the dataset downloader for NLP. import chazutsu r = chazutsu.data
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP
TextAttack 🐙 Generating adversarial examples for NLP models [TextAttack Documentation on ReadTheDocs] About • Setup • Usage • Design About TextAttack
Strict separation of config from code.
Python Decouple: Strict separation of settings from code Decouple helps you to organize your settings so that you can change parameters without having
EMNLP'2021: Can Language Models be Biomedical Knowledge Bases?
BioLAMA BioLAMA is biomedical factual knowledge triples for probing biomedical LMs. The triples are collected and pre-processed from three sources: CT
This repo in the implementation of EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osokin
SPARQLing Database Queries from Intermediate Question Decompositions This repo is the implementation of the following paper: SPARQLing Database Querie
Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding
Wav2Vec2CTC With KenLM Using KenLM ARPA language model with beam search to decode audio files and show the most probable transcription. Assuming you'v
Python reader for Linked Data in HDF5 files
Linked Data are becoming more popular for user-created metadata in HDF5 files.
The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models
Graformer The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models Graformer (also named BridgeTransformer in t
Serve files with Django.
django-downloadview django-downloadview makes it easy to serve files with Django: you manage files with Django (permissions, filters, generation, ...)
A Django application that provides country choices for use with forms, flag icons static files, and a country field for models.
Django Countries A Django application that provides country choices for use with forms, flag icons static files, and a country field for models. Insta
Django-Audiofield is a simple app that allows Audio files upload, management and conversion to different audio format (mp3, wav & ogg), which also makes it easy to play audio files into your Django application.
Django-Audiofield Description: Django Audio Management Tools Maintainer: Areski Contributors: list of contributors Django-Audiofield is a simple app t
AnnIE - Annotation Platform, tool for open information extraction annotations using text files.
AnnIE - Annotation Platform, tool for open information extraction annotations using text files.
Natural Language Processing with transformers
we want to create a repo to illustrate usage of transformers in chinese
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.
Simple DDL Parser Build with ply (lex & yacc in python). A lot of samples in 'tests/. Is it Stable? Yes, library already has about 5000+ usage per day
Search Git commits in natural language
NaLCoS - NAtural Language COmmit Search Search commit messages in your repository in natural language. NaLCoS (NAtural Language COmmit Search) is a co
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021
IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)
IndoBERTweet 🐦 🇮🇩 1. Paper Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effe
[EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction
LM-Critic: Language Models for Unsupervised Grammatical Error Correction This repo provides the source code & data of our paper: LM-Critic: Language M
Code for the paper "Flexible Generation of Natural Language Deductions"
Code for the paper "Flexible Generation of Natural Language Deductions"
Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".
A django compressor tool that bundles css, js files to a single css, js file with webpack and updates your html files with respective css, js file path.
django-webpacker's documentation: Introduction: django-webpacker is a django compressor tool which bundles css, js files to a single css, js file with
Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica
Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica. It is free both as in "free beer" and as in "freedom".
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation
CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener