133 Repositories
Python sentence-encoding Libraries
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.
Python_Natural_Language_Processing This repository contains tutorials on important topics related to Natural Language Processing (NPL). No. Name 01 01
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple
Process text, including tokenizing and representing sentences as vectors and Applying some concepts like RNN, LSTM and GRU to create a classifier can detect the language in which a sentence is written from among 17 languages.
Language Identifier What is this ? The goal of this project is to create a model that is able to predict a given sentence language through text proces
Library for converting from RGB / GrayScale image to base64 and back.
Library for converting RGB / Grayscale numpy images from to base64 and back. Installation pip install -U image_to_base_64 Conversion RGB to base 64 b
NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings
MCSE: Multimodal Contrastive Learning of Sentence Embeddings This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multi
In this project we predict the forest cover type using the cartographic variables in the training/test datasets.
Kaggle Competition: Forest Cover Type Prediction In this project we predict the forest cover type (the predominant kind of tree cover) using the carto
PBN Obfuscator: A overpowered obfuscator for python, which will help you protect your source code
PBN Obfuscator PBN Obfuscator is a overpowered obfuscator for python, which will
CATE: Computation-aware Neural Architecture Encoding with Transformers
CATE: Computation-aware Neural Architecture Encoding with Transformers Code for paper: CATE: Computation-aware Neural Architecture Encoding with Trans
hashily is a Python module that provides a variety of text decoding and encoding operations.
hashily is a python module that performs a variety of text decoding and encoding functions. It also various functions for encrypting and decrypting text using various ciphers.
Estimation of the CEFR complexity score of a given word, sentence or text.
NLP-Swedish … allows to estimate CEFR (Common European Framework of References) complexity score of a given word, sentence or text. CEFR scores come f
SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples
SNCSE SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples This is the repository for SNCSE. SNCSE aims to allev
For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training.
LongScientificFormer For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training. Some code
BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions
BERTopic BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable
The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.
Neural Machine Translation communication system The model is basically direct to convert one source language to another targeted language using encode
Official implementation for the paper "SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization".
SAPE Project page Paper Official implementation for the paper "SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization". Environment Cre
A natural language processing model for sequential sentence classification in medical abstracts.
NLP PubMed Medical Research Paper Abstract (Randomized Controlled Trial) A natural language processing model for sequential sentence classification in
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings Results on STS Tasks Model STS12 STS13 STS14 STS15 STS16 STSb SICK-R Avg. unsup-prompt-be
The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network.
UNet-SIDE The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network. For Super Reso
Mapping a variable-length sentence to a fixed-length vector using BERT model
Are you looking for X-as-service? Try the Cloud-Native Neural Search Framework for Any Kind of Data bert-as-service Using BERT model as a sentence enc
Semi-Automated Data Processing
Perform semi automated exploratory data analysis, feature engineering and feature selection on provided dataset by visualizing every possibilities on each step and assisting the user to make a meaningful decision to achieve a low-bias and low-variance model.
Convolutional Neural Networks for Sentence Classification
Convolutional Neural Networks for Sentence Classification Code for the paper Convolutional Neural Networks for Sentence Classification (EMNLP 2014). R
NICE-GAN — Official PyTorch Implementation Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation
NICE-GAN-pytorch - Official PyTorch implementation of NICE-GAN: Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation
Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images
Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets ========================================================================
The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.
Neural Machine Translation communication system The model is basically direct to convert one source language to another targeted language using encode
Self-Guided Contrastive Learning for BERT Sentence Representations
Self-Guided Contrastive Learning for BERT Sentence Representations This repository is dedicated for releasing the implementation of the models utilize
Utilize Korean BERT model in sentence-transformers library
ko-sentence-transformers 이 프로젝트는 KoBERT 모델을 sentence-transformers 에서 보다 쉽게 사용하기 위해 만들어졌습니다. Ko-Sentence-BERT-SKTBERT 프로젝트에서는 KoBERT 모델을 sentence-trans
jiant is an NLP toolkit
🚨 Update 🚨 : As of 2021/10/17, the jiant project is no longer being actively maintained. This means there will be no plans to add new models, tasks,
Korean Sentence Embedding Repository
Korean-Sentence-Embedding 🍭 Korean sentence embedding repository. You can download the pre-trained models and inference right away, also it provides
PyTorch implementation of Rethinking Positional Encoding in Language Pre-training
TUPE PyTorch implementation of Rethinking Positional Encoding in Language Pre-training. Quickstart Clone this repository. git clone https://github.com
SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model
SEOVER-Master This code is the implementation of paper: SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model
Python codecs extension featuring CLI tools for encoding/decoding anything
CodExt Encode/decode anything. This library extends the native codecs library (namely for adding new custom encodings and character mappings) and prov
Eth brownie struct encoding example
eth-brownie struct encoding example Overview This repository contains an example of encoding a struct, so that it can be used in a function call, usin
Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.
MTM This is the official repository of the paper: Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Cla
Malware-Related Sentence Classification
Malware-Related Sentence Classification This repo contains the code for the ICTAI 2021 paper "Enrichment of Features for Malware-Related Sentence Clas
Fast Base64 encoding/decoding in Python
Fast Base64 implementation This project is a wrapper on libbase64. It aims to provide a fast base64 implementation for base64 encoding/decoding. Insta
🔎 Like Chardet. 🚀 Package for encoding & language detection. Charset detection.
Charset Detection, for Everyone 👋 The Real First Universal Charset Detector A library that helps you read text from an unknown charset encoding. Moti
Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)
Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model
Proof of concept for CVE-2021-31166, a remote HTTP.sys use-after-free triggered remotely.
CVE-2021-31166: HTTP Protocol Stack Remote Code Execution Vulnerability This is a proof of concept for CVE-2021-31166 ("HTTP Protocol Stack Remote Cod
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
PyTorch implementation of Continuous Augmented Positional Embeddings (CAPE), by Likhomanenko et al. Enhance your Transformer positional embeddings with easy-to-use augmentations!
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con
Evaluating Cross-lingual Sentence Representations
XNLI: The Cross-Lingual NLI Corpus XNLI is an evaluation corpus for language transfer and cross-lingual sentence classification in 15 languages. New:
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con
A PyTorch Implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization", Proc. IEEE ISM 2021
PGL-SUM: Combining Global and Local Attention with Positional Encoding for Video Summarization PyTorch Implementation of PGL-SUM From "PGL-SUM: Combin
Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models'
Introduction This repository contains the code for the paper Sentence Bottleneck Autoencoders from Transformer Language Models by Ivan Montero, Nikola
Code for the AAAI 2022 paper "Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-Sentence Dependency Graph".
multilingual-mrc-isdg Code for the AAAI 2022 paper "Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-Sentence Dependency Graph". This r
Official Implementation for Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation We present a generic image-to-image translation framework, pixel2style2pixel (pSp
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.
FENSE The metric, Fluency ENhanced Sentence-bert Evaluation (FENSE), for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evalua
Parallel Latent Tree-Induction for Faster Sequence Encoding
FastTrees This repository contains the experimental code supporting the FastTrees paper by Bill Pung. Software Requirements Python 3.6, NLTK and PyTor
Encoding Causal Macrovariables
Encoding Causal Macrovariables Data Natural climate data ('El Nino') Self-generated data ('Simulated') Experiments Detecting macrovariables through th
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations Code repo for paper Trans-Encoder: Unsupervised sentence-pa
Auto-Encoding Score Distribution Regression for Action Quality Assessment
DAE-AQA It is an open source program reference to paper Auto-Encoding Score Distribution Regression for Action Quality Assessment. 1.Introduction DAE
Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.
WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf
Collapse a set of redundant kmers to use IUPAC degenerate bases
kmer-collapse Collapse a set of redundant kmers to use IUPAC degenerate bases Overview Given an input set of kmers, find the smallest set of kmers tha
A telegram bot for compressing/encoding videos in h264 format.
Video-Encoder-Bot a telegram bot for compressing/encoding videos in h264 format. Configuration Add values in environment variables or add them in conf
Code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".
Code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".
Pytorch implementation of Rosca, Mihaela, et al. "Variational Approaches for Auto-Encoding Generative Adversarial Networks."
alpha-GAN Unofficial pytorch implementation of Rosca, Mihaela, et al. "Variational Approaches for Auto-Encoding Generative Adversarial Networks." arXi
Fully Convolutional Refined Auto Encoding Generative Adversarial Networks for 3D Multi Object Scenes
Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi Object Scenes This repository contains the source code for Full
This is the code used in the paper "Entity Embeddings of Categorical Variables".
This is the code used in the paper "Entity Embeddings of Categorical Variables". If you want to get the original version of the code used for the Kagg
PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages
PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages Abstract NLP applications for code-mixed (CM) or mix-li
A CV toolkit for my papers.
PyTorch-Encoding created by Hang Zhang Documentation Please visit the Docs for detail instructions of installation and usage. Please visit the link to
Very simple encoding scheme that will encode data as a series of OwOs or UwUs.
OwO Encoder Very simple encoding scheme that will encode data as a series of OwOs or UwUs. The encoder is a simple state machine. Still needs a decode
BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes.
BaseCrack Decoder For Base Encoding Schemes BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes. This tool ca
Algorithmic encoding of protected characteristics and its implications on disparities across subgroups
Algorithmic encoding of protected characteristics and its implications on disparities across subgroups This repository contains the code for the paper
A PyTorch implementation of unsupervised SimCSE
A PyTorch implementation of unsupervised SimCSE
A sentence search engine that fetches examples from trusted news/media organisations. Great for writing better English.
A sentence search engine that fetches examples from trusted news/media websites. Great for improving writing & speaking better English.
Bootstrapped Unsupervised Sentence Representation Learning (ACL 2021)
Install first pip3 install -e . Training python3 training/unsupervised_tuning.py python3 training/supervised_tuning.py python3 training/multilingual_
The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu
Burp Suite extension for encoding/decoding EVM calldata
unblocker Burp Suite extension for encoding/decoding EVM calldata 0x00_prerequisites Burp Suite Java 8+ Python 2.7 0x01_installation clone this reposi
EMNLP'2021: SimCSE: Simple Contrastive Learning of Sentence Embeddings
SimCSE: Simple Contrastive Learning of Sentence Embeddings This repository contains the code and pre-trained models for our paper SimCSE: Simple Contr
The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu
Transformer Based Korean Sentence Spacing Corrector
TKOrrector Transformer Based Korean Sentence Spacing Corrector License Summary This solution is made available under Apache 2 license. See the LICENSE
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations)
Graph Neural Networks with Learnable Structural and Positional Representations Source code for the paper "Graph Neural Networks with Learnable Structu
You can encode and decode base85, ascii85, base64, base32, and base16 with this tool.
You can encode and decode base85, ascii85, base64, base32, and base16 with this tool.
Cross-platform command-line AV1 / VP9 / HEVC / H264 encoding framework with per scene quality encoding
Av1an A cross-platform framework to streamline encoding Easy, Fast, Efficient and Feature Rich An easy way to start using AV1 / HEVC / H264 / VP9 / VP
BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia.
BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing.
Pytorch implementation of CVPR2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”
VectorNet Re-implementation This is the unofficial pytorch implementation of CVPR2020 paper "VectorNet: Encoding HD Maps and Agent Dynamics from Vecto
AirCode: A Robust Object Encoding Method
AirCode This repo contains source codes for the arXiv preprint "AirCode: A Robust Object Encoding Method" Demo Object matching comparison when the obj
A Structured Self-attentive Sentence Embedding
Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR
CNNs for Sentence Classification in PyTorch
Introduction This is the implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in PyTorch. Kim's implementation of t
Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE
smaller-LaBSE LaBSE(Language-agnostic BERT Sentence Embedding) is a very good method to get sentence embeddings across languages. But it is hard to fi
Search Git commits in natural language
NaLCoS - NAtural Language COmmit Search Search commit messages in your repository in natural language. NaLCoS (NAtural Language COmmit Search) is a co
Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization
Line as a Visual Sentence with LineTR This repository contains the inference code, pretrained model, and demo scripts of the following paper. It suppo
This repository contains the PyTorch implementation of the paper STaCK: Sentence Ordering with Temporal Commonsense Knowledge appearing at EMNLP 2021.
STaCK: Sentence Ordering with Temporal Commonsense Knowledge This repository contains the pytorch implementation of the paper STaCK: Sentence Ordering
Simple integer-valued time series bit packing
Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.
🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴
PAUSE: Positive and Annealed Unlabeled Sentence Embedding Sentence embedding refers to a set of effective and versatile techniques for converting raw
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
[ACMMM 2021 Oral] Enhanced Invertible Encoding for Learned Image Compression
InvCompress Official Pytorch Implementation for "Enhanced Invertible Encoding for Learned Image Compression", ACMMM 2021 (Oral) Figure: Our framework
Korean Simple Contrastive Learning of Sentence Embeddings using SKT KoBERT and kakaobrain KorNLU dataset
KoSimCSE Korean Simple Contrastive Learning of Sentence Embeddings implementation using pytorch SimCSE Installation git clone https://github.com/BM-K/
Code for technical report "An Improved Baseline for Sentence-level Relation Extraction".
RE_improved_baseline Code for technical report "An Improved Baseline for Sentence-level Relation Extraction". Requirements torch = 1.8.1 transformers
This project is a sample demo of Arxiv search related to AI/ML Papers built using Streamlit, sentence-transformers and Faiss.
This project is a sample demo of Arxiv search related to AI/ML Papers built using Streamlit, sentence-transformers and Faiss.
A Structured Self-attentive Sentence Embedding
Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR
REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.
MUSE stands for Multilingual Universal Sentence Encoder - multilingual extension (supports 16 languages) of Universal Sentence Encoder (USE).
Relative Positional Encoding for Transformers with Linear Complexity
Stochastic Positional Encoding (SPE) This is the source code repository for the ICML 2021 paper Relative Positional Encoding for Transformers with Lin
Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch
Rotary Embeddings - Pytorch A standalone library for adding rotary embeddings to transformers in Pytorch, following its success as relative positional
Distance Encoding for GNN Design
Distance-encoding for GNN design This repository is the official PyTorch implementation of the DEGNN and DEAGNN framework reported in the paper: Dista
[ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction
REval Table of Contents Introduction Overview Requirements Installation Probing Usage Citation License 🎓 Introduction REval is a simple framework for
Shared code for training sentence embeddings with Flax / JAX
flax-sentence-embeddings This repository will be used to share code for the Flax / JAX community event to train sentence embeddings on 1B+ training pa
中文无监督SimCSE Pytorch实现
A PyTorch implementation of unsupervised SimCSE SimCSE: Simple Contrastive Learning of Sentence Embeddings 1. 用法 无监督训练 python train_unsup.py ./data/ne
一个关于摩斯密码解密与加密的库 / A library about encoding and decoding Morse code.
Morsecoder By Lemonix 介绍 一个关于摩斯密码解密与加密的库
Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings
Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings Trong bài viết này mình sẽ sử dụng pretrain model SimCS