clustering-laion400m

Script and models for clustering LAION-400m CLIP embeddings.

Overview

Models were fit on roughly the first million image embeddings (i.e. the first embedding file). A subjective description of what each cluster appears to contain is included in cluster-labels.txt, along with label counts for those same embeddings.

Precomputed labels are available here: https://archive.org/details/laion400m-64-clustering-labels.tar

Run Fit Clusters.ipynb to reproduce the labels or to create your own clusters and models. This requires the CLIP embeddings from the LAION-400m open dataset, which can be found here: https://laion.ai/laion-400-open-dataset/
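
For orientation, here is a minimal sketch of what fitting a mixture model on one embedding file might look like. The file name, the normalization step, and the covariance type are assumptions; the 64-component count is only suggested by the name of the precomputed label archive above.

```python
# Sketch: cluster one file of LAION-400m CLIP image embeddings with a GMM.
# "img_emb_0.npy" is a guess at the download's file naming; adjust as needed.
import numpy as np
from sklearn.mixture import GaussianMixture

# Load ~1M CLIP ViT-B/32 image embeddings (512-dim rows).
embeddings = np.load("img_emb_0.npy").astype(np.float32)

# L2-normalize so Euclidean distances track cosine similarity.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# 64 components matches the precomputed label archive; diagonal covariance
# keeps the fit tractable at this dimensionality.
gmm = GaussianMixture(n_components=64, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(embeddings)

np.save("labels_0.npy", labels)
```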

Comments
  • Reasoning behind FastICA/PCA vectors for GMM

    Just out of curiosity, I was wondering what the reasoning was behind using combined PCA and FastICA vectors for Gaussian mixture models. I wasn't able to find this kind of approach anywhere else. Does it offer some specific benefits for CLIP feature vectors?

    Thanks!

    opened by njanakiev 3
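
For readers unfamiliar with the approach the question refers to, here is a hedged sketch of reducing the embeddings two ways and concatenating the projections before fitting the GMM. The component counts and every other setting are illustrative, not the repository's actual configuration.

```python
# Sketch: concatenated PCA + FastICA projections as GMM input features.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.mixture import GaussianMixture

def combined_features(x: np.ndarray, n_pca: int = 32, n_ica: int = 32) -> np.ndarray:
    """Stack PCA and FastICA projections of the same embeddings side by side."""
    pca = PCA(n_components=n_pca, random_state=0)
    ica = FastICA(n_components=n_ica, random_state=0, max_iter=500)
    return np.hstack([pca.fit_transform(x), ica.fit_transform(x)])

x = np.load("img_emb_0.npy").astype(np.float32)  # same assumed file as above
features = combined_features(x)
labels = GaussianMixture(n_components=64, random_state=0).fit_predict(features)
```

One plausible motivation: PCA captures directions of maximal variance while FastICA seeks statistically independent, non-Gaussian directions, so the concatenation gives the mixture model two complementary views of the same CLIP space.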

Owner: Peter Baylies