Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

Overview

Neighbourhood Retrieval with Distance Correlation

Assign Pseudo class labels to datapoints in the latent space.

  • NNDC is a slim wrapper around FAISS.
  • NNDC transforms the space such that the Inner Product Index in FAISS (IndexFlatIP) computes the Distance Correlation.
  • Support for KernelPCA (non-linear PCA) for dimensionality reduction.

Installation

pip install git+https://github.com/The-Learning-Machines/nndc

Usage

dim = 128 
n = 20000

import nndc
import numpy as np

index = nndc.DCIndex(
    in_dim=dim, # Dimensionality of the input vectors
    threshold=0.2, # How far away from a vector is the neighbourhood
    out_dim=32, # Dimensionality of the vectors after PCA (only needed if using PCA)
    use_pca=True, # Use KernelPCA
    verbose=True,
    kernel="rbf" # Use Radial Basis Function as the kernel for KernelPCA
)

# Generate Random data
np.random.seed(1234)             
xb = np.random.random((n, dim)).astype('float32')
xb[:, 0] += np.arange(n) / 1000.
xq = np.random.random((100, dim)).astype('float32')
xq[:, 0] += np.arange(100) / 1000.

# Fit KernelPCA
index.add_pca_training_data(xb[:1000, :])
index.fit_pca()

# Add vectors to the Index
vector_ids = np.arange(xb.shape[0])
index.add(xb, vector_ids)

# Build a nerighbourhood graph
index.build_neighbourhood()

# Query the neighbours of vector with ID=0
neighbour_ids, neighbour_similarity = index[0]   
You might also like...
Fully Automated YouTube Channel ▶️with Added Extra Features.

Fully Automated Youtube Channel ▒█▀▀█ █▀▀█ ▀▀█▀▀ ▀▀█▀▀ █░░█ █▀▀▄ █▀▀ █▀▀█ ▒█▀▀▄ █░░█ ░░█░░ ░▒█░░ █░░█ █▀▀▄ █▀▀ █▄▄▀ ▒█▄▄█ ▀▀▀▀ ░░▀░░ ░▒█░░ ░▀▀▀ ▀▀▀░

GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

GPU implementation of kNN and SNN GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors Supported by numba cuda and faiss library E

Scikit-event-correlation - Event Correlation and Forecasting over High Dimensional Streaming Sensor Data algorithms

scikit-event-correlation Event Correlation and Changing Detection Algorithm Theo

Neighbourhood - A python-django web app to help the residence of a given neighborhood know their surrounding better
Neighbourhood - A python-django web app to help the residence of a given neighborhood know their surrounding better

Neighbourhood A python-django web app to help the residence of a given neighborh

Open source software for image correlation, distance and analysis
Open source software for image correlation, distance and analysis

Douglas-Quaid Project Open source software for image correlation, distance and analysis. Strongly related to : Carl-Hauser Problem statement (@CIRCL)

A way to analyse how malware and/or goodware samples vary from each other using Shannon Entropy, Hausdorff Distance and Jaro-Winkler Distance
A way to analyse how malware and/or goodware samples vary from each other using Shannon Entropy, Hausdorff Distance and Jaro-Winkler Distance

A way to analyse how malware and/or goodware samples vary from each other using Shannon Entropy, Hausdorff Distance and Jaro-Winkler Distance

Hand-distance-measurement-game - Hand Distance Measurement Game
Hand-distance-measurement-game - Hand Distance Measurement Game

Hand Distance Measurement Game This is program is made to calculate the distance

Book Recommender System Using Sci-kit learn N-neighbours

Model-Based-Recommender-Engine I created a book Recommender System using Sci-kit learn's N-neighbours algorithm for my model and the streamlit library

Personal implementation of paper "Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval"

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval This repo provides personal implementation of paper Approximate Ne

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.

Dual Encoding for Video Retrieval by Text Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding

Fast Image Retrieval is an open source image retrieval framework

Fast Image Retrieval is an open source image retrieval framework release by Center of Image and Signal Processing Lab (CISiP Lab), Universiti Malaya. This framework implements most of the major binary hashing methods, together with both popular backbone networks and public datasets.

Fast Image Retrieval (FIRe) is an open source image retrieval project

Fast Image Retrieval (FIRe) is an open source image retrieval project release by Center of Image and Signal Processing Lab (CISiP Lab), Universiti Malaya. This project implements most of the major binary hashing methods to date, together with different popular backbone networks and public datasets.

Vector space based Information Retrieval System for Text Processing - Information retrieval

Information Retrieval: Text Processing Group 13 Sequence of operations Install Requirements Add given wikipedia files to the corpus directory. Downloa

Image-retrieval-baseline - MUGE Multimodal Retrieval Baseline

MUGE Multimodal Retrieval Baseline This repo is implemented based on the open_cl

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Annoy Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given quer

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Annoy Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given quer

TorchPQ is a python library for Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) on GPU using Product Quantization (PQ) algorithm.
Owner
The Learning Machines
The Learning Machines
A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

Nicholas Monath 31 Nov 3, 2022
My project contrasts K-Nearest Neighbors and Random Forrest Regressors on Real World data

kNN-vs-RFR My project contrasts K-Nearest Neighbors and Random Forrest Regressors on Real World data In many areas, rental bikes have been launched to

null 1 Oct 28, 2021
Implementation of K-Nearest Neighbors Algorithm Using PySpark

KNN With Spark Implementation of KNN using PySpark. The KNN was used on two separate datasets (https://archive.ics.uci.edu/ml/datasets/iris and https:

Zachary Petroff 4 Dec 30, 2022
Anomaly Detection and Correlation library

luminol Overview Luminol is a light weight python library for time series data analysis. The two major functionalities it supports are anomaly detecti

LinkedIn 1.1k Jan 1, 2023
Optimal Randomized Canonical Correlation Analysis

ORCCA Optimal Randomized Canonical Correlation Analysis This project is for the python version of ORCCA algorithm. It depends on Numpy for matrix calc

Yinsong Wang 1 Nov 21, 2021
A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and A* Search (Manhattan Distance Heuristic)

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and the A* Search (using the Manhattan Distance Heuristic)

null 17 Aug 14, 2022
SIMD-accelerated bitwise hamming distance Python module for hexidecimal strings

hexhamming What does it do? This module performs a fast bitwise hamming distance of two hexadecimal strings. This looks like: DEADBEEF = 1101111010101

Michael Recachinas 12 Oct 14, 2022
ml4ir: Machine Learning for Information Retrieval

ml4ir: Machine Learning for Information Retrieval | changelog Quickstart → ml4ir Read the Docs | ml4ir pypi | python ReadMe ml4ir is an open source li

Salesforce 77 Jan 6, 2023
A gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor.

OpenHands OpenHands is a gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor. Currently the system can iden

Paul Treanor 12 Jan 10, 2022