MiniSom is a minimalistic implementation of the Self Organizing Maps

Overview

MiniSom

Self Organizing Maps

MiniSom is a minimalistic and Numpy based implementation of the Self Organizing Maps (SOM). SOM is a type of Artificial Neural Network able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. Minisom is designed to allow researchers to easily build on top of it and to give students the ability to quickly grasp its details.

Updates about MiniSom are posted on Twitter.

Installation

Just use pip:

pip install minisom

or download MiniSom to a directory of your choice and use the setup script:

git clone https://github.com/JustGlowing/minisom.git
python setup.py install

How to use it

In order to use MiniSom you need your data organized as a Numpy matrix where each row corresponds to an observation or as list of lists like the following:

data = [[ 0.80,  0.55,  0.22,  0.03],
        [ 0.82,  0.50,  0.23,  0.03],
        [ 0.80,  0.54,  0.22,  0.03],
        [ 0.80,  0.53,  0.26,  0.03],
        [ 0.79,  0.56,  0.22,  0.03],
        [ 0.75,  0.60,  0.25,  0.03],
        [ 0.77,  0.59,  0.22,  0.03]]      

Then you can train MiniSom just as follows:

from minisom import MiniSom    
som = MiniSom(6, 6, 4, sigma=0.3, learning_rate=0.5) # initialization of 6x6 SOM
som.train(data, 100) # trains the SOM with 100 iterations

You can obtain the position of the winning neuron on the map for a given sample as follows:

som.winner(data[0])

For an overview of all the features implemented in minisom you can browse the following examples: https://github.com/JustGlowing/minisom/tree/master/examples

Export a SOM and load it again

A model can be saved using pickle as follows

import pickle
som = MiniSom(7, 7, 4)

# ...train the som here

# saving the som in the file som.p
with open('som.p', 'wb') as outfile:
    pickle.dump(som, outfile)

and can be loaded as follows

with open('som.p', 'rb') as infile:
    som = pickle.load(infile)

Note that if a lambda function is used to define the decay factor MiniSom will not be pickable anymore.

Explore parameters

You can use this dashboard to explore the effect of the parameters on a sample dataset: https://share.streamlit.io/justglowing/minisom/dashboard/dashboard.py

Examples

Here are some of the charts you'll see how to generate in the examples:

Seeds map Class assignment
Handwritteng digits mapping Hexagonal Topology som hexagonal toplogy
Color quantization Outliers detection

Other tutorials

How to cite MiniSom

@misc{vettigliminisom,
  title={MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map},
  author={Giuseppe Vettigli},
  year={2018},
  url={https://github.com/JustGlowing/minisom/},
}

Who uses Minisom?

Guidelines to contribute

  1. In the description of your Pull Request explain clearly what does it implements/fixes and your changes. Possibly give an example in the description of the PR. In cases that the PR is about a code speedup, report a reproducible example and quantify the speedup.
  2. Give your pull request a helpful title that summarises what your contribution does.
  3. Write unit tests for your code and make sure the existing tests are up to date. pytest can be used for this:
pytest minisom.py
  1. Make sure that there a no stylistic issues using pycodestyle:
pycodestyle minisom.py
  1. Make sure your code is properly commented and documented. Each public method needs to be documented as the existing ones.
Comments
  • Introducing possibility to train the SOM so that learning_rate and sigma are constant during one epoch.

    Introducing possibility to train the SOM so that learning_rate and sigma are constant during one epoch.

    This pull request introduces the possibility to train the SOM so that learning_rate and sigma are only being decreased after each epoch. During one epoch the SOM is updated once per given input vector (=len(data) times) with constant learning_rate and sigma. This should lead to a greater independence between the order of the input vectors and the resulting SOM.

    In order to use this feature, one only has to use train_epochs() instead of train().

    learning_rate and sigma could (should?) technically be updated only once every epoch but in order to change as little code as possible those parameters are still updated every time update() gets called (but with constant paramters during one epoch). This could be 'optimised' if desired.

    opened by jriege555 22
  • Fixed topographic_error() and quantization_error()

    Fixed topographic_error() and quantization_error()

    Problems:

    • The previous topographic_error() method is incorrect. bmu_1 and bmu_2 are not the coordinates of the best two matching units.
    • The previous topographic_error() and quantization_error() uses explicit for-loops, which is very slow.

    Fixes:

    • Fixed incorrect implementation of topographic_error() method.
    • Changed the topographic_error() and quantization_error() methods with vectorized implementation.
    opened by wei-zhang-thz 17
  • quantization error (theoretical question)

    quantization error (theoretical question)

    I have a question about the interpretability of the quantization error.

    How can we know that the SOM is reliable ? does the quantization error need to be lower than a certain value ?

    For exemple, in my case, i have a quantization errror of 7.0 which is quite high in comparison to the exemple given in the documentation. Does that mean my som is not reliable ?

    question 
    opened by lachhebo 13
  • Do you know why nodes change completely when I reran the same setup with varying number of iterations?

    Do you know why nodes change completely when I reran the same setup with varying number of iterations?

    Hey :-)

    First of all thank you for providing this tool, it seems very handy! I am using SOM with geopotential height anomalies over a given region as input variables to cluster meteorological circulation patterns (ca. 2000 observations). What is really strange is that the SOM nodes differ completely when I rerun the same setup with more iterations (e.g. doubling from 10000 to 20000). It produces nodes not only in a different order, but also such that have no analogue in the new SOM... Is there anything I am doing wrong?

    Thank you very much - below some details about the setup

    The example I am using most often is sigma=1 (Gaussian), lr=0.5, SOM sizes between 2x4 to 4x5. The problem occurs no matter the initialization (pca or random) and no matter the training (single, batch, random). My code is basically only:

    SOM

    som = MiniSom(som_m, som_n, ndims, sigma=sigma, learning_rate=lr, neighborhood_function='gaussian') som.pca_weights_init(somarr) som.train_batch(somarr,10000,verbose=True)

    ...

    plot

    for m in range(som_m): for n in range(som_n): ax... pltarr = som.get_weights()[m,n,:].reshape((nsomlat,nsomlon)) p = ax.contourf(somlons,somlats,pltarr,cmap='seismic', transform=ccrs.PlateCarree())

    question 
    opened by michel039 12
  • Vectorized the _activate function

    Vectorized the _activate function

    Great library, but I noticed that the training code for your SOMs is not vectorized. You use the fast_norm function a lot, which may be faster than linalg.norm for 1D arrays, but iterating over every spot in the SOM is a lot slower than just calling linalg.norm.

    This pull request replaces fast_norm with linalg.norm in 2 places where I saw iteration over the whole SOM. Some simple testing with a 100x100 SOM showed ~40x speedup on my laptop.

    After making the changes, the unit tests failed, which I believe is caused by incorrectly setting up the testing weights as a 2D array rather than a 3D array. So I changed that too, and now the unit tests pass. I also did a few rough tests of my own, and the results of self.winner(x) and the training seem to be the same as before.

    opened by AustinT 11
  • Time Series

    Time Series

    Hello! I am trying to use my time series data for the example uploaded, but I encounter this error when initializing pca. Also, the second image is the error that I encounter when I use random initialization.

    image image

    opened by jaybhiesantos 10
  • How to cluster images?

    How to cluster images?

    I would like to know how to cluster images instead of reading CSV I want to read all images from disk and cluster those images using SOM.

    Can you please share some examples?

    opened by balavenkatesh3322 10
  • Example: Hexagonal Topology bokeh

    Example: Hexagonal Topology bokeh

    Summary

    This branch actions on https://github.com/JustGlowing/minisom/issues/86 by adding to the existing examples/HexagonalTopology.ipynb notebook an interactive bokeh example of the equivalent matplotlib plot.

    The purpose of adding interactivity was so that further exploration could be conducted on the plot to see where the original data points are mapped to in the SOM space.

    Check

    • [x] This branch adds value to the main repository, so it is worthwhile to include.
    • [x] The bokeh plot is equivalent to the matplotlib plot.
    • [ ] The code is error free and works on your machine.
    • [x] The logic of showing data points in the hover tooltip is sound.

    Note

    This "closes #86".

    opened by avisionh 10
  • speed up in update method

    speed up in update method

    Hi! Thanks for sharing the library! I noticed that if you change the loop in the update method with an einsum operation you can speed up the training by some amount. Hope you find it useful. Christos

    opened by Sourmpis 10
  • Add topographic error calculation for hexagonal grid

    Add topographic error calculation for hexagonal grid

    This PR adds the functionality for Topographic Error calculation, computed by finding the first-best-matching and second-best-matching neurons in the hexagonal grid.

    Screenshot 2022-04-12 005139

    The topographic error calculation is based on the above equation, which considers if the first-best-matching and second-best-matching neurons are neighbors in the SOM grid.

    opened by TharindaDilshan 9
  • new visualizations

    new visualizations

    Hi, I have implemented a number of visualizations in the BasicUsage file. Addionally, I did some minor changes (mainly typos) in some other files. As this is my first use of github, I do not know how to separate both topics and make two pull requests... I hope this works out!

    opened by bijae 9
  • Topographic error wrong for hexagonal topography with rectangular grid

    Topographic error wrong for hexagonal topography with rectangular grid

    Hi,

    I am trying to get the topographic error from a SOM with 11x7 neurons, hexagonal topography.

    When I do, I get this error:

         21     return (-1, -1)
         22 y = som._weights.shape[1]
    ---> 23 coords = som.convert_map_to_euclidean((index % y, int(index/y)))
         24 return coords
    
    File ~/.local/lib/python3.8/site-packages/minisom.py:243, in MiniSom.convert_map_to_euclidean(self, xy)
        237 def convert_map_to_euclidean(self, xy):
        238     """Converts map coordinates into euclidean coordinates
        239     that reflects the chosen topology.
        240 
        241     Only useful if the topology chosen is not rectangular.
        242     """
    --> 243     return self._xx.T[xy], self._yy.T[xy]
    
    IndexError: index 8 is out of bounds for axis 1 with size 7
    

    I don't think this line of code makes sense:

    coords = som.convert_map_to_euclidean((index % y, int(index/y)))

    Shouldn't the parameters be inverted, e.g.:

    coords = som.convert_map_to_euclidean((int(index/y), index % y))

    Anyway, thanks for the amazing work!

    bug 
    opened by mbarison 6
  • Matching Matlab hyperparameters

    Matching Matlab hyperparameters

    Hi there!Thank you for this great work!

    I switched to using python from the Matlab, version of SOM However I found the result was quite different. Where I could have a perfect 100% in MatLab but somehow only get 19% in f1-score here.

    The only thing I changed from the default setting in Matlab is using a 10*10. som = MiniSom(10, 10, 4096, sigma=1.5, learning_rate=0.7,activation_distance='euclidean', neighborhood_function='gaussian', topology='hexagonal', random_seed=10) And this is what I had for my settings using minisom.

    Any suggestions so I could maybe recreate the result from Matlab?

    Thank you in advance!

    question 
    opened by AmousQiu 3
  • Is there a way to obtain a distance of each point to its BMU?

    Is there a way to obtain a distance of each point to its BMU?

    Hi, first and foremost thank you for your great work and allowing to implement SOM algorithm in such convienent way. I wanted to ask if there is a possibility to obtain a kind of list with the distances between each point and its Best Matching Unit (Node) on trained SOM grid? I have read the documentation and saw different attributes for the SOM object, however it appears to me that none of them allow to return the (euclidean) distance to BMU. Thanks in advance for support!

    question 
    opened by JMiklaszewski 1
  • Is there an option to obtain the BMU value directly?

    Is there an option to obtain the BMU value directly?

    Hi there,

    I am trying to use BMU values a metric to classify my data. Features are seismic attributes. Your function “distance_from_weights” was my first guess but it´s not exporting BMUS directly. We do have to manipulate it to remove the second BMU.

    np.argsort(distance_from_weights(data), axis=1)[:, :2] -----> np.argsort(distance_from_weights(data), axis=1)[:, :1]

    Do you mind to build that function?

    question 
    opened by akol67 1
  • Wrong value in topographic error function?

    Wrong value in topographic error function?

    So a topographic error occurs when the two bmu of a sample are not adjacent. Shouldn't then t = 1? If the bmu are two hops apart in a corner, their euclidean distance is sqrt(2) = 1.4142 . So with distance > 1.42 this doesn't count as an error. Or am I missing something?

    question 
    opened by SandroMartens 0
  • Example spatio-temporal climate data

    Example spatio-temporal climate data

    This pull request is to load a SOM example on climate data notebook, which is usually 2D (time, lat, lon).

    I've been looking a lot into SOM examples, and it's hard to find examples on climate data...so I hope this notebook can help future users (and also me, if you find something wrong on the use).

    For the example, I've used the tutorial dataset from Xarray.

    opened by carocamargo 2
Releases(2.3.0)
Owner
Giuseppe Vettigli
Data Scientist, teaching fellow, Python enthusiast, fearless visionarist, lateral thinker.
Giuseppe Vettigli
The most simple and minimalistic navigation dashboard.

Navigation This project follows a goal to have simple and lightweight dashboard with different links. I use it to have my own self-hosted service dash

Yaroslav 23 Dec 23, 2022
Implementation of "Fast and Flexible Temporal Point Processes with Triangular Maps" (Oral @ NeurIPS 2020)

Fast and Flexible Temporal Point Processes with Triangular Maps This repository includes a reference implementation of the algorithms described in "Fa

Oleksandr Shchur 20 Dec 2, 2022
Pytorch implementation of CVPR2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”

VectorNet Re-implementation This is the unofficial pytorch implementation of CVPR2020 paper "VectorNet: Encoding HD Maps and Agent Dynamics from Vecto

null 120 Jan 6, 2023
Image morphing without reference points by applying warp maps and optimizing over them.

Differentiable Morphing Image morphing without reference points by applying warp maps and optimizing over them. Differentiable Morphing is machine lea

Alex K 380 Dec 19, 2022
PyTorch implementations of the paper: "Learning Independent Instance Maps for Crowd Localization"

IIM - Crowd Localization This repo is the official implementation of paper: Learning Independent Instance Maps for Crowd Localization. The code is dev

tao han 91 Nov 10, 2022
Spatial Intention Maps for Multi-Agent Mobile Manipulation (ICRA 2021)

spatial-intention-maps This code release accompanies the following paper: Spatial Intention Maps for Multi-Agent Mobile Manipulation Jimmy Wu, Xingyua

Jimmy Wu 70 Jan 2, 2023
Spatial Action Maps for Mobile Manipulation (RSS 2020)

spatial-action-maps Update: Please see our new spatial-intention-maps repository, which extends this work to multi-agent settings. It contains many ne

Jimmy Wu 27 Nov 30, 2022
Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2020)`

Human Attention for Text Classification Re-implementation of the paper Human Attention Maps for Text Classification: Do Humans and Neural Networks Foc

Shunsuke KITADA 15 Dec 13, 2021
Range Image-based LiDAR Localization for Autonomous Vehicles Using Mesh Maps

Range Image-based 3D LiDAR Localization This repo contains the code for our ICRA2021 paper: Range Image-based LiDAR Localization for Autonomous Vehicl

Photogrammetry & Robotics Bonn 208 Dec 15, 2022
Deep Compression for Dense Point Cloud Maps.

DEPOCO This repository implements the algorithms described in our paper Deep Compression for Dense Point Cloud Maps. How to get started (using Docker)

Photogrammetry & Robotics Bonn 67 Dec 6, 2022
Neural Surface Maps

Neural Surface Maps Official implementation of Neural Surface Maps - Luca Morreale, Noam Aigerman, Vladimir Kim, Niloy J. Mitra [Paper] [Project Page]

Luca Morreale 49 Dec 13, 2022
Riemannian Convex Potential Maps

Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited by representational and computational tradeoffs. We propose and study a class of flows that uses convex potentials from Riemannian optimal transport. These are universal and can model distributions on any compact Riemannian manifold without requiring domain knowledge of the manifold to be integrated into the architecture. We demonstrate that these flows can model standard distributions on spheres, and tori, on synthetic and geological data.

Facebook Research 61 Nov 28, 2022
OrienMask: Real-time Instance Segmentation with Discriminative Orientation Maps

OrienMask This repository implements the framework OrienMask for real-time instance segmentation. It achieves 34.8 mask AP on COCO test-dev at the spe

null 45 Dec 13, 2022
A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

Larger Google Sat2Map dataset This dataset extends the aerial ⟷ Maps dataset used in pix2pix (Isola et al., CVPR17). The provide script download_sat2m

null 34 Dec 28, 2022
constructing maps of intellectual influence from publication data

Influencemap Project @ ANU Influence in the academic communities has been an area of interest for researchers. This can be seen in the popularity of a

CS Metrics 13 Jun 18, 2022
Approaches to modeling terrain and maps in python

topography ?? Contains different approaches to modeling terrain and topographic-style maps in python Features Inverse Distance Weighting (IDW) A given

John Gutierrez 1 Aug 10, 2022
Faune proche - Retrieval of Faune-France data near a google maps location

faune_proche Récupération des données de Faune-France près d'un lieu google maps

null 4 Feb 15, 2022
Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms

LESA Introduction This repository contains the official implementation of Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Cont

Chenglin Yang 20 Dec 31, 2021
This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP".

Self-Diagnosis and Self-Debiasing This repository contains the source code for Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based

Timo Schick 62 Dec 12, 2022