Interactive dimensionality reduction for large datasets

Last update: Dec 14, 2022

BlosSOM 🌼

BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimensional datasets and produce great-looking 2-dimensional visualizations.

WARNING: BlosSOM is still under development; some features may not work correctly, and behavior may change without notice. Feel free to open an issue if something looks wrong.

❓ Overview
🔧 Compiling and running
➡️ How-To 💡
📘 Documentation

BlosSOM was developed at the MFF UK Prague, in cooperation with IOCB Prague.

Overview

BlosSOM creates a landmark-based model of the dataset and dynamically projects all dataset points to your screen (using EmbedSOM). Several other algorithms and tools are provided to manage the landmarks; a quick overview follows:

- High-dimensional landmark positioning
  - Self-organizing maps
  - k-Means
- 2D landmark positioning
  - k-NN graph generation (only adds edges, not vertices)
  - Force-based graph layout
  - Dynamic t-SNE
- Dimensionality reduction
  - EmbedSOM
  - CUDA EmbedSOM (with roughly 500x speedup, enabling smooth display of a few million points)
- Manual landmark position optimization
- Visualization settings (colors, transparencies, cluster coloring, ...)
- Dataset transformations and dimension scaling
- Import from matrix-like data files
  - FCS3.0 (Flow Cytometry Standard files)
  - TSV (tab-separated values)
- Export of the data for plotting
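
To illustrate the landmark-based idea, here is a minimal Python sketch that places a high-dimensional point in 2D by taking an inverse-distance-weighted average of the 2D positions of its k nearest landmarks. This is a deliberately simplified stand-in and not the EmbedSOM algorithm itself, which is more elaborate. The function name `project_point` and the parameters `k` and `eps` are hypothetical and do not come from the BlosSOM source code.

```python
import math


def project_point(point, landmarks_hd, landmarks_2d, k=3, eps=1e-9):
    """Project one high-dimensional point to 2D using its k nearest landmarks.

    Simplified illustration only: an inverse-distance-weighted average of
    the landmarks' 2D positions, not the actual EmbedSOM computation.
    """
    if len(landmarks_hd) != len(landmarks_2d) or not landmarks_hd:
        raise ValueError("need equally many (and at least one) HD and 2D landmarks")

    # Distance from the point to every landmark in the high-dimensional space.
    dists = [math.dist(point, lm) for lm in landmarks_hd]

    # Indices of the k nearest landmarks.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]

    # Closer landmarks get larger weights; eps avoids division by zero.
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)

    x = sum(w * landmarks_2d[i][0] for w, i in zip(weights, nearest)) / total
    y = sum(w * landmarks_2d[i][1] for w, i in zip(weights, nearest)) / total
    return (x, y)
```

In a model of this kind, dragging a landmark on screen only changes its entry in `landmarks_2d`; re-running the projection then moves all points that depend on that landmark, which is what makes interactive editing of the layout possible.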

Compiling and running BlosSOM

You will need the CMake build system and SDL2.

For CUDA EmbedSOM to work, you need the NVIDIA CUDA toolkit. Append -DBUILD_CUDA=1 to cmake options to enable the CUDA version.

Windows (Visual Studio 2019)

Dependencies

The project requires SDL2 as an external dependency:

- Install the vcpkg tool and remember your vcpkg directory.
- Install SDL: vcpkg install SDL2:x64-windows

Compilation

git submodule init
git submodule update

mkdir build
cd build

# You need to fix the path to vcpkg in the following command:
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX=./inst -DCMAKE_TOOLCHAIN_FILE=your-vcpkg-clone-directory/scripts/buildsystems/vcpkg.cmake

cmake --build . --config Release
cmake --install . --config Release

Running

Open the Visual Studio solution BlosSOM.sln, set blossom as the startup project, set the configuration to Release, and run the project.

Linux (and possibly other unix-like systems)

Dependencies

The project requires SDL2 as an external dependency. Install libsdl2-dev (on Debian-based systems), SDL2-devel (on Red Hat-based systems), or the equivalent package for your Linux distribution. You should be able to install the cmake package the same way.

Compilation

git submodule init
git submodule update

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=./inst    # or any other directory
make install                              # use -j option to speed up the build

Running

./inst/bin/blossom

Documentation

Basic usage of the software and the description of the user interface is available in HOWTO.md.
Some technical details about the code may be found in src/README.md.
Doxygen-generated documentation of the source code can be found at https://molnsona.github.io/blossom/

Quickstart

1. Click on the "plus" button on the bottom right side of the window.
2. Choose Open file (the first button from the top) and open a file from the demo_data/ directory.
3. You can now add and delete landmarks using ctrl+mouse click, and drag them around.
4. Use the tools and settings available under the "plus" button to optimize the landmark positions and get a better visualization.

See the HOWTO for more details and hints.

Performance and CUDA

If you pass -DBUILD_CUDA=1 to the cmake command, you will get an extra executable called blossom_cuda (or blossom_cuda.exe on Windows).

The two versions of the BlosSOM executable differ mainly in the performance of the EmbedSOM projection, which is more than 100× faster on GPUs than on CPUs. If the dataset gets large, only a fixed-size slice of the dataset is processed each frame to keep the framerate in a usable range. The defaults in BlosSOM should work smoothly for many use cases (1k points per frame on CPU and 50k points per frame on GPU).

If required (e.g., if you have a really fast GPU), you may modify the constants in the corresponding source files, around the call sites of clean_range(), which is the function that manages the round-robin refreshing of the data. Functionality that dynamically chooses the best data-crunching rate is being implemented and should be available soon.
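
The round-robin refreshing described above can be sketched in a few lines of Python. This is only an illustration of the general technique under stated assumptions, not the actual implementation of clean_range(); the class name `RoundRobinRefresher` and its methods are hypothetical. The batch size plays the role of the per-frame constants mentioned above (1k on CPU, 50k on GPU).

```python
class RoundRobinRefresher:
    """Hands out a fixed-size slice of point indices each frame, wrapping around."""

    def __init__(self, n_points, batch_size):
        if n_points < 0 or batch_size <= 0:
            raise ValueError("n_points must be >= 0 and batch_size must be > 0")
        self.n_points = n_points
        self.batch_size = batch_size
        self.cursor = 0  # index of the next point to refresh

    def next_batch(self):
        """Return the indices of the points to recompute in this frame."""
        if self.n_points == 0:
            return []
        # Never process the same point twice within one frame.
        count = min(self.batch_size, self.n_points)
        batch = [(self.cursor + i) % self.n_points for i in range(count)]
        # Remember where to continue in the next frame.
        self.cursor = (self.cursor + count) % self.n_points
        return batch
```

With 10 points and a batch size of 4, three consecutive calls to `next_batch()` cover indices 0 to 3, then 4 to 7, then 8, 9, 0, 1, so every point is refreshed regularly while the per-frame cost stays bounded. A larger batch size refreshes the whole dataset in fewer frames at the price of more work per frame.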

License

BlosSOM is licensed under GPLv3 or later. Several small libraries bundled in the repository are licensed with MIT-style licenses.

Comments

speed up compilation by turning off unused magnum parts

From a conversation with @mosra (thanks!!!)

set(WITH_INTERCONNECT OFF CACHE BOOL "")
set(WITH_PLUGINMANAGER OFF CACHE BOOL "")
set(WITH_TESTSUITE OFF CACHE BOOL "")
set(WITH_MAIN OFF CACHE BOOL "") # maybe needed for windows

set(WITH_DEBUGTOOLS OFF CACHE BOOL "")
set(WITH_MESHTOOLS OFF CACHE BOOL "")
set(WITH_PRIMITIVES OFF CACHE BOOL "")
set(WITH_SCENEGRAPH OFF CACHE BOOL "")
set(WITH_SCENETOOLS OFF CACHE BOOL "")
set(WITH_SHADERTOOLS OFF CACHE BOOL "")
set(WITH_VK OFF CACHE BOOL "")
set(WITH_TEXT OFF CACHE BOOL "")
set(WITH_TEXTURETOOLS OFF CACHE BOOL "")
set(WITH_TRADE OFF CACHE BOOL "")

opened by exaexa 1

Fix `asinh` transformation issues

Manipulating the asinh transformations in trans_data sometimes results in scaling problems, where all data become Inf, then NaNs, and reload is needed.

I guess it's because of some missing checks in the dynamic computation of means&variances.
bug

opened by exaexa 0

Check OSX compatibility

AFAIK there was no attempt to compile BlosSOM on Macs, but at least the slow CUDA-less part should work well.

We should check the compatibility (if someone finds a mac :D ) and add build instructions to README.

opened by exaexa 0

Dynamic data throughput

Currently the amount of computation done each frame is fixed (see #3 ). It would be much better to collect timing information and try to hit the nice 50fps with maximum data processing speed.
enhancement

opened by exaexa 0
