The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

IDEAS Lab

Last update: Dec 18, 2022


Overview

Joint t-sne

This is the implementation of the paper "Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets".

Abstract:

We present Joint t-Stochastic Neighbor Embedding (Joint t-SNE), a technique to generate comparable projections of multiple high-dimensional datasets. Although t-SNE has been widely employed to visualize high-dimensional datasets from various domains, it is limited to projecting a single dataset. When a series of high-dimensional datasets, such as datasets changing over time, is projected independently using t-SNE, misaligned layouts are obtained. Even items with identical features across datasets are projected to different locations, making the technique unsuitable for comparison tasks. To tackle this problem, we introduce edge similarity, which captures the similarities between two adjacent time frames based on the Graphlet Frequency Distribution (GFD). We then integrate a novel loss term into the t-SNE loss function, which we call vector constraints, to preserve the vectors between projected points across the projections, allowing these points to serve as visual landmarks for direct comparisons between projections. Using synthetic datasets whose ground-truth structures are known, we show that Joint t-SNE outperforms existing techniques, including Dynamic t-SNE, in terms of local coherence error, Kullback-Leibler divergence, and neighborhood preservation. We also showcase a real-world use case to visualize and compare the activation of different layers of a neural network.
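To make the idea of vector constraints more concrete, here is a minimal NumPy sketch of a penalty that grows when the vector between two projected points changes from one projection to the next. It is an illustration only, not the loss from the paper or from this repository: the function name, the edge list, and the per-edge weights (a hypothetical stand-in for edge similarity) are all assumptions.

```python
import numpy as np

def vector_constraint_penalty(y_prev, y_cur, edges, weights):
    """Weighted squared change of the vectors between projected points.

    y_prev, y_cur: (n, 2) embeddings of the same items in two adjacent frames.
    edges: (m, 2) integer pairs (i, j) whose connecting vector should be kept.
    weights: (m,) non-negative weights (hypothetical stand-in for edge similarity).
    """
    y_prev = np.asarray(y_prev, dtype=float)
    y_cur = np.asarray(y_cur, dtype=float)
    edges = np.asarray(edges, dtype=int)
    weights = np.asarray(weights, dtype=float)
    i, j = edges[:, 0], edges[:, 1]
    # Difference between the current and the previous vector of each pair.
    diff = (y_cur[i] - y_cur[j]) - (y_prev[i] - y_prev[j])
    return float(np.sum(weights * np.sum(diff ** 2, axis=1)))
```

Because only the vectors between points enter the penalty, translating the whole layout leaves it at zero, while moving a single point relative to its neighbors increases it.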

Environment:

The project is written in a mix of C++ and Python, and the individual steps are tied together by shell scripts.
It requires Qt, Python 3.6, NumPy and scikit-learn.

How to use:

Put the directory containing your data sequence, e.g. "YOUR_DATA", in ./data. The data must be formatted and organized as follows:
- Each data frame is named f_i.txt, where i is the time step/index of that data frame in the sequence.
- The j-th row of a data frame contains the feature vector and the label of the j-th item, separated by tabs (\t), with the label in the last position.
- All data frames must have the same number of rows, and the same item must be on the same row in every data frame, so that node similarities can be computed item by item.
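The layout described above can be produced and checked with a few lines of Python. This is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes that every field on a row (each feature value and the label) is separated by a single tab and that labels can be treated as plain strings.

```python
import os

def write_frame(path, features, labels):
    """Write one data frame: one item per row, tab-separated, label last."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of rows")
    with open(path, "w") as fh:
        for row, label in zip(features, labels):
            fields = [repr(float(v)) for v in row] + [str(label)]
            fh.write("\t".join(fields) + "\n")

def read_frame(path):
    """Read a data frame back as (features, labels)."""
    features, labels = [], []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue  # skip empty lines
            *values, label = line.split("\t")
            features.append([float(v) for v in values])
            labels.append(label)
    return features, labels

def check_sequence(data_dir, ids):
    """Check that every f_i.txt has the same number of rows; return that number."""
    if not ids:
        raise ValueError("no frame ids given")
    counts = {i: len(read_frame(os.path.join(data_dir, f"f_{i}.txt"))[1]) for i in ids}
    if len(set(counts.values())) > 1:
        raise ValueError(f"row counts differ across frames: {counts}")
    return counts[ids[0]]
```

For example, calling write_frame once per time step with paths such as data/YOUR_DATA/f_0.txt, and then check_sequence("data/YOUR_DATA", [0, 1, 2]), catches frames with mismatched row counts before the pipeline is run.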
Create a configuration file, e.g. "YOUR_DATA.json", in ./config, structured as JSON:

 
  
    {
      "algo": {
        "k_closest_count": 3,
        "perplexity": 70,
        "bfs_level": 1,
        "gamma": 0.1
      },
      "thesne": {
        "data_name": "YOUR_DATA",
        "pts_size": 2000,
        "norm": false,
        "data_ids": [1, 3, 6, 9],
        "data_dims": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
        "data_titles": [
          "t=0",
          "t=1",
          "t=2",
          "t=3",
          "t=4",
          "t=5",
          "t=6",
          "t=7",
          "t=8",
          "t=9"
        ]
      }
    }

In this file, algo holds the hyperparameters of our algorithm; the exception is bfs_level, which is not tunable and should always be 1. thesne describes the input data. Remember that data_name must match the directory name used in the previous step.
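If you prefer to generate the configuration file rather than write it by hand, something along these lines may help. It is a sketch under assumptions: make_config and save_config are hypothetical helpers, not part of this repository, the default values are simply copied from the example above, and the check that data_dims and data_titles have the same length is inferred from that example rather than documented behavior.

```python
import json

def make_config(data_name, data_ids, data_dims, data_titles, pts_size,
                perplexity=70, k_closest_count=3, gamma=0.1, norm=False):
    """Build a configuration dict with the same keys as the example above."""
    # Assumption: one dimension and one title per data frame, as in the example.
    if len(data_dims) != len(data_titles):
        raise ValueError("data_dims and data_titles should have the same length")
    return {
        "algo": {
            "k_closest_count": k_closest_count,
            "perplexity": perplexity,
            "bfs_level": 1,  # always 1, as stated above
            "gamma": gamma,
        },
        "thesne": {
            "data_name": data_name,  # must match the directory name in ./data
            "pts_size": pts_size,
            "norm": norm,
            "data_ids": list(data_ids),
            "data_dims": list(data_dims),
            "data_titles": list(data_titles),
        },
    }

def save_config(config, path):
    """Write the configuration as indented JSON."""
    with open(path, "w") as fh:
        json.dump(config, fh, indent=2)
```

A call such as save_config(make_config("YOUR_DATA", [1, 3, 6, 9], [100] * 10, [f"t={i}" for i in range(10)], pts_size=2000), "config/YOUR_DATA.json") reproduces the example configuration.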

Create a shell script, e.g. "YOUR_DATA.sh" in ./scripts as below:

 
  
    #!/bin/bash
    # 1. specify the path of the configuration file
    config_path="config/YOUR_DATA.json"

    workdir=$(pwd)

    # 2. build knn graph for each data frame
    python3 codes/graphBuild/run.py "$config_path"

    # 3. compute edge similarities between each two adjacent data frames
    buildDir="codes/graphSim/build"
    if [ ! -d "$buildDir" ]; then
        mkdir -p "$buildDir"
        echo "create directory ${buildDir}"
    else
        echo "directory ${buildDir} already exists."
    fi
    cd "$buildDir"
    qmake ../
    make

    cd "$workdir"

    # bin is dependent on your operating system
    bin=$buildDir/graphSim.app/Contents/MacOS/graphSim
    "$bin" "$config_path"


    # 4. run t-sne optimization
    python3 codes/thesne/run.py "$config_path"

A few things to watch out for:

- config_path must match the name of the configuration file created in the previous step.
- bin depends on your operating system. The path in the script is the macOS application bundle layout; on Linux you will probably need to change it to
```
  bin=$buildDir/graphSim
```
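If the pipeline is driven from Python instead of a shell script, the platform-dependent path can be chosen at run time. The helper below is a hypothetical sketch, not part of the repository; it only covers the two layouts mentioned above (macOS and Linux) and falls back to the Linux layout on any other system, which is an assumption.

```python
import platform
import posixpath

def graphsim_binary(build_dir="codes/graphSim/build", system=None):
    """Return the expected path of the graphSim binary for the given OS."""
    system = system or platform.system()
    if system == "Darwin":
        # macOS: the binary sits inside an application bundle.
        return posixpath.join(build_dir, "graphSim.app", "Contents", "MacOS", "graphSim")
    # Linux (and, by assumption, anything else): the binary sits in the build directory.
    return posixpath.join(build_dir, "graphSim")
```

The returned path can then be passed to subprocess.run together with the configuration path, mirroring the `$bin $config_path` line of the shell script.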

From the root directory of the repository, run

 
  
  sh scripts/YOUR_DATA.sh

The final embeddings will be generated in ./results/YOUR_DATA.

Optionally, you can use codes/draw/run.py to plot the embeddings.

Example:

You can find an example in ./scripts/10_cluster_contract.sh.


Comments

Missing header file in codes/graphSim/math_utils.cpp

The header files "assert.h" and "math.h" are not included in codes/graphSim/math_utils.cpp, which leaves some functions undeclared and causes compilation errors. I don't know whether this is a platform compatibility issue, an oversight, or intentional, so I propose modifying the file in question.

opened by HazardTrigger
