A fast hierarchical dimensionality reduction algorithm.

Overview

h-NNE: Hierarchical Nearest Neighbor Embedding

h-NNE is a general-purpose dimensionality reduction algorithm, like t-SNE and UMAP. It stands out for its speed, its simplicity, and the fact that it provides a hierarchy of clusterings as part of its projection process. The algorithm is inspired by the FINCH clustering algorithm. For more information on the structure of the algorithm, please see our corresponding paper on arXiv:

M. Saquib Sarfraz*, Marios Koulakis*, Constantin Seibold, Rainer Stiefelhagen. Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction. CVPR 2022.

More details are available in the project documentation.
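To give a feel for where a "hierarchy of clusterings" can come from, here is a minimal pure-Python sketch of the first-neighbor idea described in the FINCH paper: every point is linked to its nearest neighbor, connected points form a cluster, and the step is repeated on the cluster centroids. This is an illustration only, not the code used inside hnne; the function names `first_neighbor_clusters` and `cluster_hierarchy` are made up for this example, and the brute-force neighbor search stands in for the approximate search a real implementation would need.

```python
import math


def first_neighbor_clusters(points):
    """Link every point to its nearest neighbor and label the linked groups."""
    n = len(points)
    if n < 2:
        return [0] * n
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Brute-force nearest neighbor; fine for a toy example only.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        parent[find(i)] = find(nearest)

    # Relabel the union-find roots as 0, 1, 2, ... in order of appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def cluster_hierarchy(points):
    """Repeat the first-neighbor step on centroids until one cluster is left."""
    levels = []
    current = [tuple(p) for p in points]
    while len(current) > 1:
        labels = first_neighbor_clusters(current)
        levels.append(labels)
        n_clusters = max(labels) + 1
        dim = len(current[0])
        sums = [[0.0] * dim for _ in range(n_clusters)]
        counts = [0] * n_clusters
        for point, label in zip(current, labels):
            counts[label] += 1
            for d in range(dim):
                sums[label][d] += point[d]
        # The next level clusters the centroids of the current clusters.
        current = [
            tuple(value / count for value in row)
            for row, count in zip(sums, counts)
        ]
    return levels


toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_levels = cluster_hierarchy(toy_points)
```

Each entry of `toy_levels` holds the labels for one level: the first entry labels the input points, and each later entry labels the clusters of the level before it. Since every point is linked to at least one other point, the number of clusters at least halves per level, so the loop stops after a few rounds.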

Installation

The project is available on PyPI. To install it, run:

pip install hnne

How to use h-NNE

The HNNE class implements the common methods of the sklearn interface.

Simple projection example

import numpy as np
from hnne import HNNE

data = np.random.random(size=(1000, 256))

hnne = HNNE(dim=2)
projection = hnne.fit_transform(data)

Projecting on new points

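# Fit on the original data, then reuse the fitted model for unseen points.
hnne = HNNE()
projection = hnne.fit_transform(data)

# new_data must have the same number of features as data (256 here).
new_data = np.random.random(size=(100, 256))
new_data_projection = hnne.transform(new_data)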
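The fit-then-transform pattern above can be wrapped in a small helper. The sketch below assumes only that the reducer exposes `fit_transform` and `transform`, as the README states for HNNE. `fit_then_project` and `TruncatingReducer` are hypothetical names introduced for this example; the stand-in reducer simply keeps the first `dim` coordinates so that the snippet runs without hnne installed.

```python
class TruncatingReducer:
    """Hypothetical stand-in for a reducer: keeps the first `dim` coordinates."""

    def __init__(self, dim=2):
        self.dim = dim

    def fit_transform(self, data):
        # Nothing to learn for this toy reducer, so fitting is a no-op.
        return self.transform(data)

    def transform(self, data):
        return [tuple(row[: self.dim]) for row in data]


def fit_then_project(reducer, data, n_new):
    """Fit on all but the last `n_new` rows, then project those rows."""
    if not 0 < n_new < len(data):
        raise ValueError("n_new must be between 1 and len(data) - 1")
    split = len(data) - n_new
    fitted = reducer.fit_transform(data[:split])
    projected_new = reducer.transform(data[split:])
    return fitted, projected_new


rows = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)]
fitted, projected_new = fit_then_project(TruncatingReducer(dim=2), rows, n_new=1)
```

Because the helper relies only on `len`, slicing, and the two reducer methods, passing `HNNE()` and a NumPy array in place of the stand-in and the list should work the same way, although that has not been verified here.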

Demos

The following demo notebooks are available:

  1. Basic Usage
  2. Multiple Projections
  3. Clustering for Free
  4. Monitor Quality of Network Embeddings

Citation

If you make use of this project in your work, please cite the h-NNE paper:

@inproceedings{hnne,
  title={Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction},
  author={M. Saquib Sarfraz and Marios Koulakis and Constantin Seibold and Rainer Stiefelhagen},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022}
}

If you make use of the clustering properties of the algorithm, please also cite:

@inproceedings{finch,
  author = {M. Saquib Sarfraz and Vivek Sharma and Rainer Stiefelhagen},
  title = {Efficient Parameter-free Clustering Using First Neighbor Relations},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = {8934--8943},
  year = {2019}
}
Comments
  • TypingError on `fit_transform`

    I'm getting the error below when running fit_transform. Could I please get some help with it?

    The input size is (213136, 1024) with float16.

    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    No implementation of function Function(<built-in function arange>) found for signature:
     
     >>> arange(int64)
     
    During: resolving callee type: Function(<built-in function arange>)
    During: typing of call at ~/anaconda3/envs/torch111-cuda113/lib/python3.9/site-packages/pynndescent/rp_trees.py (764)
    
    
    File "../../../../anaconda3/envs/torch111-cuda113/lib/python3.9/site-packages/pynndescent/rp_trees.py", line 764:
    def make_dense_tree(data, rng_state, leaf_size=30, angular=False):
        indices = np.arange(data.shape[0]).astype(np.int32)
    
    opened by y27choi 4
  • Could not import hnne in Win10

    Thanks for developing hnne! I tried to reproduce the examples, but, strangely, I could not import hnne.

    (base) C:\Users\Ci>pip install numba
    Defaulting to user installation because normal site-packages is not writeable
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Requirement already satisfied: numba in d:\anaconda3\lib\site-packages (0.55.2)
    Requirement already satisfied: numpy<1.23,>=1.18 in c:\users\ci\appdata\roaming\python\python39\site-packages (from numba) (1.20.0)
    Requirement already satisfied: setuptools in d:\anaconda3\lib\site-packages (from numba) (63.4.2)
    Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in d:\anaconda3\lib\site-packages (from numba) (0.38.1)
    
    (base) C:\Users\Ci>pip install pynndescent
    Defaulting to user installation because normal site-packages is not writeable
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Requirement already satisfied: pynndescent in d:\anaconda3\lib\site-packages (0.5.7)
    Requirement already satisfied: llvmlite>=0.30 in d:\anaconda3\lib\site-packages (from pynndescent) (0.38.1)
    Requirement already satisfied: scipy>=1.0 in d:\anaconda3\lib\site-packages (from pynndescent) (1.9.0)
    Requirement already satisfied: joblib>=0.11 in c:\users\ci\appdata\roaming\python\python39\site-packages (from pynndescent) (0.14.1)
    Requirement already satisfied: numba>=0.51.2 in d:\anaconda3\lib\site-packages (from pynndescent) (0.55.2)
    Requirement already satisfied: scikit-learn>=0.18 in c:\users\ci\appdata\roaming\python\python39\site-packages (from pynndescent) (1.0.2)
    Requirement already satisfied: setuptools in d:\anaconda3\lib\site-packages (from numba>=0.51.2->pynndescent) (63.4.2)
    Requirement already satisfied: numpy<1.23,>=1.18 in c:\users\ci\appdata\roaming\python\python39\site-packages (from numba>=0.51.2->pynndescent) (1.20.0)
    Requirement already satisfied: threadpoolctl>=2.0.0 in d:\anaconda3\lib\site-packages (from scikit-learn>=0.18->pynndescent) (3.1.0)
    (base) C:\Users\Ci>pip install hnne
    Defaulting to user installation because normal site-packages is not writeable
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Requirement already satisfied: hnne in d:\anaconda3\lib\site-packages (0.1.7)
    Requirement already satisfied: typer in d:\anaconda3\lib\site-packages (from hnne) (0.6.1)
    Requirement already satisfied: pandas in d:\anaconda3\lib\site-packages (from hnne) (1.4.3)
    Requirement already satisfied: pynndescent in d:\anaconda3\lib\site-packages (from hnne) (0.5.7)
    Requirement already satisfied: numpy==1.20 in c:\users\ci\appdata\roaming\python\python39\site-packages (from hnne) (1.20.0)
    Requirement already satisfied: cython in d:\anaconda3\lib\site-packages (from hnne) (0.29.32)
    Requirement already satisfied: scipy in d:\anaconda3\lib\site-packages (from hnne) (1.9.0)
    Requirement already satisfied: tqdm in d:\anaconda3\lib\site-packages (from hnne) (4.64.0)
    Requirement already satisfied: sklearn in c:\users\ci\appdata\roaming\python\python39\site-packages (from hnne) (0.0)
    Requirement already satisfied: python-dateutil>=2.8.1 in d:\anaconda3\lib\site-packages (from pandas->hnne) (2.8.2)
    Requirement already satisfied: pytz>=2020.1 in d:\anaconda3\lib\site-packages (from pandas->hnne) (2022.1)
    Requirement already satisfied: joblib>=0.11 in c:\users\ci\appdata\roaming\python\python39\site-packages (from pynndescent->hnne) (0.14.1)
    Requirement already satisfied: llvmlite>=0.30 in d:\anaconda3\lib\site-packages (from pynndescent->hnne) (0.38.1)
    Requirement already satisfied: numba>=0.51.2 in d:\anaconda3\lib\site-packages (from pynndescent->hnne) (0.55.2)
    Requirement already satisfied: scikit-learn>=0.18 in c:\users\ci\appdata\roaming\python\python39\site-packages (from pynndescent->hnne) (1.0.2)
    Requirement already satisfied: colorama in d:\anaconda3\lib\site-packages (from tqdm->hnne) (0.4.5)
    Requirement already satisfied: click<9.0.0,>=7.1.1 in d:\anaconda3\lib\site-packages (from typer->hnne) (8.1.3)
    Requirement already satisfied: setuptools in d:\anaconda3\lib\site-packages (from numba>=0.51.2->pynndescent->hnne) (63.4.2)
    Requirement already satisfied: six>=1.5 in d:\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas->hnne) (1.16.0)
    Requirement already satisfied: threadpoolctl>=2.0.0 in d:\anaconda3\lib\site-packages (from scikit-learn>=0.18->pynndescent->hnne) (3.1.0)
    
    (base) C:\Users\Ci>
    

    opened by Ci-TJ 2
  • Segmentation fault if the number of samples exceeds 80k x 1024

    Calling hnne's fit_transform results in a segmentation fault for inputs larger than 80k x 1024. The segmentation fault occurs in the call to the NNDescent function, at the point where the RP-trees are being built and the descent steps are about to start.

    opened by deepakanand14 0
Owner
Marios Koulakis