212 Repositories
Python universal-embeddings Libraries
Python package to generate image embeddings with CLIP without PyTorch/TensorFlow
imgbeddings A Python package to generate embedding vectors from images, using OpenAI's robust CLIP model via Hugging Face transformers. These image em
Curso práctico: NLP de cero a cien 🤗
Curso Práctico: NLP de cero a cien Comprende todos los conceptos y arquitecturas clave del estado del arte del NLP y aplícalos a casos prácticos utili
SeqTR: A Simple yet Universal Network for Visual Grounding
SeqTR This is the official implementation of SeqTR: A Simple yet Universal Network for Visual Grounding, which simplifies and unifies the modelling fo
A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision
pytorch-lifestream a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-si
Sapiens is a human antibody language model based on BERT.
Sapiens: Human antibody language model ____ _ / ___| __ _ _ __ (_) ___ _ __ ___ \___ \ / _` | '_ \| |/ _ \ '
This is a repo of basic Machine Learning!
Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource
Code for "MetaMorph: Learning Universal Controllers with Transformers", Gupta et al, ICLR 2022
MetaMorph: Learning Universal Controllers with Transformers This is the code for the paper MetaMorph: Learning Universal Controllers with Transformers
Hyperbolic Image Segmentation, CVPR 2022
Hyperbolic Image Segmentation, CVPR 2022 This is the implementation of paper Hyperbolic Image Segmentation (CVPR 2022). Repository structure assets :
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE is accpeted by ACL22 main conference as a long pape
NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings
MCSE: Multimodal Contrastive Learning of Sentence Embeddings This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multi
Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark
Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark Yong
UFT - Universal File Transfer With Python
UFT 2.0.0 UFT (Universal File Transfer) is a CLI tool , which can be used to upl
UMPNet: Universal Manipulation Policy Network for Articulated Objects
UMPNet: Universal Manipulation Policy Network for Articulated Objects Zhenjia Xu, Zhanpeng He, Shuran Song Columbia University Robotics and Automation
Some embedding layer implementation using ivy library
ivy-manual-embeddings Some embedding layer implementation using ivy library. Just for fun. It is based on NYCTaxiFare dataset from kaggle (cut down to
A small tool to test and visualize protein embeddings and amino acid proportions.
polyprotein_stats A small tool to test and visualize protein embeddings and amino acid proportions. Currently deployed on streamlit.io. Given a set of
nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.
nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.
Author Disambiguation using Knowledge Graph Embeddings with Literals
Author Name Disambiguation with Knowledge Graph Embeddings using Literals This is the repository for the master thesis project on Knowledge Graph Embe
CSKG is a commonsense knowledge graph that combines seven popular sources into a consolidated representation
CSKG: The CommonSense Knowledge Graph CSKG is a commonsense knowledge graph that combines seven popular sources into a consolidated representation: AT
BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions
BERTopic BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable
Built a deep neural network (DNN) that functions as an end-to-end machine translation pipeline
Built a deep neural network (DNN) that functions as an end-to-end machine translation pipeline. The pipeline accepts english text as input and returns the French translation.
Keras Image Embeddings using Contrastive Loss
Image to Embedding projection in vector space. Implementation in keras and tensorflow of batch all triplet loss for one-shot/few-shot learning.
Keras Image Embeddings using Contrastive Loss
Keras-Image-Embeddings-using-Contrastive-Loss Image to Embedding projection in vector space. Implementation in keras and tensorflow for custom data. B
Informal Persian Universal Dependency Treebank
Informal Persian Universal Dependency Treebank (iPerUDT) Informal Persian Universal Dependency Treebank, consisting of 3000 sentences and 54,904 token
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings Results on STS Tasks Model STS12 STS13 STS14 STS15 STS16 STSb SICK-R Avg. unsup-prompt-be
Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth
Instance segmentation by jointly optimizing spatial embeddings and clustering bandwidth This codebase implements the loss function described in: Insta
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed
Code for the paper "Asymptotics of ℓ2 Regularized Network Embeddings"
README Code for the paper Asymptotics of L2 Regularized Network Embeddings. Requirements Requires Stellargraph 1.2.1, Tensorflow 2.6.0, scikit-learm 0
HuSpaCy: industrial-strength Hungarian natural language processing
HuSpaCy: Industrial-strength Hungarian NLP HuSpaCy is a spaCy model and a library providing industrial-strength Hungarian language processing faciliti
Script and models for clustering LAION-400m CLIP embeddings.
clustering-laion400m Script and models for clustering LAION-400m CLIP embeddings. Models were fit on the first million or so image embeddings. A subje
DeepSpamReview: Detection of Fake Reviews on Online Review Platforms using Deep Learning Architectures. Summer Internship project at CoreView Systems.
Detection of Fake Reviews on Online Review Platforms using Deep Learning Architectures Dataset: https://s3.amazonaws.com/fast-ai-nlp/yelp_review_polar
GNEE - GAT Neural Event Embeddings
GNEE - GAT Neural Event Embeddings This repository contains source code for the GNEE (GAT Neural Event Embeddings) method introduced in the paper: "Se
Augmented CLIP - Training simple models to predict CLIP image embeddings from text embeddings, and vice versa.
Train aug_clip against laion400m-embeddings found here: https://laion.ai/laion-400-open-dataset/ - note that this used the base ViT-B/32 CLIP model. S
Automatic library of congress classification, using word embeddings from book titles and synopses.
Automatic Library of Congress Classification The Library of Congress Classification (LCC) is a comprehensive classification system that was first deve
Code for Universal Semi-Supervised Semantic Segmentation models paper accepted in ICCV 2019
USSS_ICCV19 Code for Universal Semi Supervised Semantic Segmentation accepted to ICCV 2019. Full Paper available at https://arxiv.org/abs/1811.10323.
Unsupervised Attributed Multiplex Network Embedding (AAAI 2020)
Unsupervised Attributed Multiplex Network Embedding (DMGI) Overview Nodes in a multiplex network are connected by multiple types of relations. However
Code for hyperboloid embeddings for knowledge graph entities
Implementation for the papers: Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao,
Fast, DB Backed pretrained word embeddings for natural language processing.
Embeddings Embeddings is a python package that provides pretrained word embeddings for natural language processing and machine learning. Instead of lo
An universal file format tool kit. At present will handle the ico format problem.
An universal file format tool kit. At present will handle the ico format problem.
Noto fonts go universal! Download Noto fonts combined to suit your region (South Asia, SE Asia, Africa-MiddleEast, Europe-Americas).
Go Noto Universal Noto fonts go universal! Download Noto fonts combined to suit your region (South Asia, SE Asia, East Asia, Africa-MiddleEast, Europe
PyTorch implementation of DeepUME: Learning the Universal Manifold Embedding for Robust Point Cloud Registration (BMVC 2021)
DeepUME: Learning the Universal Manifold Embedding for Robust Point Cloud Registration [video] [paper] [supplementary] [data] [thesis] Introduction De
A universal memory dumper using Frida
Fridump Fridump (v0.1) is an open source memory dumping tool, primarily aimed to penetration testers and developers. Fridump is using the Frida framew
CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022)
CMUA-Watermark The official code for CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022) arxiv. It is bas
BERT, LDA, and TFIDF based keyword extraction in Python
BERT, LDA, and TFIDF based keyword extraction in Python kwx is a toolkit for multilingual keyword extraction based on Google's BERT and Latent Dirichl
Noto fonts go universal! Download Noto fonts combined to suit your region
noto-cjk Noto CJK fonts Noto Serif CJK update was released on 25 October 2021. We moved the release history and other notes into both Sans and Serif s
LUKE -- Language Understanding with Knowledge-based Embeddings
LUKE (Language Understanding with Knowledge-based Embeddings) is a new pre-trained contextualized representation of words and entities based on transf
Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2
Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2 Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexan
Text Classification in Turkish Texts with Bert
You can watch the details of the project on my youtube channel Project Interface Project Second Interface Goal= Correctly guessing the classification
Train emoji embeddings based on emoji descriptions.
emoji2vec This is my attempt to train, visualize and evaluate emoji embeddings as presented by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
WECHSEL Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. arXiv: https://arx
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
WECHSEL Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. arXiv: https://arx
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
PyTorch implementation of Continuous Augmented Positional Embeddings (CAPE), by Likhomanenko et al. Enhance your Transformer positional embeddings with easy-to-use augmentations!
Scripts and outputs related to the paper Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings.
Knowledge Graph Embeddings and Chemical Effect Prediction, 2020. Scripts and outputs related to the paper Prediction of Adverse Biological Effects of
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
VarCLR: Variable Representation Pre-training via Contrastive Learning New: Paper accepted by ICSE 2022. Preprint at arXiv! This repository contain
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.
A spider for Universal Online Judge(UOJ) system, converting problem pages to PDFs.
Universal Online Judge Spider Introduction This is a spider for Universal Online Judge (UOJ) system (https://uoj.ac/). It also works for all other Onl
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
VarCLR: Variable Representation Pre-training via Contrastive Learning New: Paper accepted by ICSE 2022. Preprint at arXiv! This repository contain
BERTMap: A BERT-Based Ontology Alignment System
BERTMap: A BERT-based Ontology Alignment System Important Notices The relevant paper was accepted in AAAI-2022. Arxiv version is available at: https:/
100+ Chinese Word Vectors 上百种预训练中文词向量
Chinese Word Vectors 中文词向量 中文 This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse),
LUKE -- Language Understanding with Knowledge-based Embeddings
LUKE (Language Understanding with Knowledge-based Embeddings) is a new pre-trained contextualized representation of words and entities based on transf
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration This is the official repository for the EMNLP 2021 long pa
A single model that parses Universal Dependencies across 75 languages.
A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology tags, lemmas, and dependency trees.
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
UNITER: UNiversal Image-TExt Representation Learning This is the official repository of UNITER (ECCV 2020). This repository currently supports finetun
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Ro
Code for this paper The Lottery Ticket Hypothesis for Pre-trained BERT Networks.
The Lottery Ticket Hypothesis for Pre-trained BERT Networks Code for this paper The Lottery Ticket Hypothesis for Pre-trained BERT Networks. [NeurIPS
Official Implementation of "Learning Disentangled Behavior Embeddings"
DBE: Disentangled-Behavior-Embedding Official implementation of Learning Disentangled Behavior Embeddings (NeurIPS 2021). Environment requirement The
OpenL3: Open-source deep audio and image embeddings
OpenL3 OpenL3 is an open-source Python library for computing deep audio and image embeddings. Please refer to the documentation for detailed instructi
Generate text captions for images from their CLIP embeddings. Includes PyTorch model code and example training script.
clip-text-decoder Generate text captions for images from their CLIP embeddings. Includes PyTorch model code and example training script. Example Predi
Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"
FAME: Feature-based Adversarial Meta-Embeddings This is the companion code for the experiments reported in the paper "FAME: Feature-Based Adversarial
Switch spaces for knowledge graph embeddings
SwisE Switch spaces for knowledge graph embeddings. Requirements: python3 pytorch numpy tqdm Reproduce the results To reproduce the reported results,
Code release for Universal Domain Adaptation(CVPR 2019)
Universal Domain Adaptation Code release for Universal Domain Adaptation(CVPR 2019) Requirements python 3.6+ PyTorch 1.0 pip install -r requirements.t
Universal Probability Distributions with Optimal Transport and Convex Optimization
Sylvester normalizing flows for variational inference Pytorch implementation of Sylvester normalizing flows, based on our paper: Sylvester normalizing
Exploring dimension-reduced embeddings
sleepwalk Exploring dimension-reduced embeddings This is the code repository. See here for the Sleepwalk web page. License and disclaimer This program
A fast, efficient universal vector embedding utility package.
Magnitude: a fast, simple vector embedding utility library A feature-packed Python package and vector storage file format for utilizing vector embeddi
Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP"
DiLBERT Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP" Pretrained Model The pretrained model presented in the paper is
Stanza: A Python NLP Library for Many Human Languages
Official Stanford NLP Python Library for Many Human Languages
Code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".
Code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".
Source for the paper "Universal Activation Function for machine learning"
Universal Activation Function Tensorflow and Pytorch source code for the paper Yuen, Brosnan, Minh Tu Hoang, Xiaodai Dong, and Tao Lu. "Universal acti
Code for "Universal inference meets random projections: a scalable test for log-concavity"
How to use this repository This repository contains code to replicate the results of "Universal inference meets random projections: a scalable test fo
GemNet model in PyTorch, as proposed in "GemNet: Universal Directional Graph Neural Networks for Molecules" (NeurIPS 2021)
GemNet: Universal Directional Graph Neural Networks for Molecules Reference implementation in PyTorch of the geometric message passing neural network
Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
Spanish Language Models 💃🏻 A repository part of the MarIA project. Corpora 📃 Corpora Number of documents Number of tokens Size (GB) BNE 201,080,084
This is the code used in the paper "Entity Embeddings of Categorical Variables".
This is the code used in the paper "Entity Embeddings of Categorical Variables". If you want to get the original version of the code used for the Kagg
Code base for reproducing results of I.Schubert, D.Driess, O.Oguz, and M.Toussaint: Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics. NeurIPS (2021)
Learning to Execute (L2E) Official code base for completely reproducing all results reported in I.Schubert, D.Driess, O.Oguz, and M.Toussaint: Learnin
Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings
Text2Music Emotion Embedding Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings Reference Emotion Embedding Spaces for Matching
Żmija is a simple universal code generation tool.
Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta
Code repository for EMNLP 2021 paper 'Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods'
Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods This is the code repository to accompany the EMNLP 2021 paper on ad
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy floret is an extended version of fastText that can produce word repr
gtfs2vec - Learning GTFS Embeddings for comparing PublicTransport Offer in Microregions
gtfs2vec This is a companion repository for a gtfs2vec - Learning GTFS Embeddings for comparing PublicTransport Offer in Microregions publication. Vis
[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"
Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems Introduction Multi-agent control i
The code of paper ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs. Zhanqiu Zhang, Jie Wang, Jiajun Chen, Shuiwang Ji, Feng Wu. NeurIPS 2021.
ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs This is the code of paper ConE: Cone Embeddings for Multi-Hop Reasoning over Knowl
A PyTorch implementation of unsupervised SimCSE
A PyTorch implementation of unsupervised SimCSE
Data, model training, and evaluation code for "PubTables-1M: Towards a universal dataset and metrics for training and evaluating table extraction models".
PubTables-1M This repository contains training and evaluation code for the paper "PubTables-1M: Towards a universal dataset and metrics for training a
ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs
ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs This is the code of paper ConE: Cone Embeddings for Multi-Hop Reasoning over Knowl
Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data.
Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data.
Language Models for the legal domain in Spanish done @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
Spanish legal domain Language Model ⚖️ This repository contains the page for two main resources for the Spanish legal domain: A RoBERTa model: https:/
Bootstrapped Unsupervised Sentence Representation Learning (ACL 2021)
Install first pip3 install -e . Training python3 training/unsupervised_tuning.py python3 training/supervised_tuning.py python3 training/multilingual_
Code for reproducing our paper: LMSOC: An Approach for Socially Sensitive Pretraining
LMSOC: An Approach for Socially Sensitive Pretraining Code for reproducing the paper LMSOC: An Approach for Socially Sensitive Pretraining to appear a
A Word Level Transformer layer based on PyTorch and 🤗 Transformers.
Transformer Embedder A Word Level Transformer layer based on PyTorch and 🤗 Transformers. How to use Install the library from PyPI: pip install transf
[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.
[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.