A Python Library for Self Organizing Map (SOM)

Overview

SOMPY

As far as possible, the structure of SOMPY mirrors the SOM Toolbox (somtoolbox) for MATLAB. It has the following functionality:

  1. Batch training only, which is faster than online training. There is an sklearn-style parallel processing option that can speed up training, depending on the data size and mainly on the size of the SOM grid. However, I couldn't solve a memory problem with it, so for now I recommend single-core processing. Nevertheless, the important matrix calculations are implemented carefully, for example using scipy sparse matrices and numexpr to compute Euclidean distances.
  2. PCA initialization (RandomPCA by default, via sklearn) or random initialization.
  3. Component plane visualization (several modes).
  4. Hitmap.
  5. U-Matrix visualization.
  6. 1-D or 2-D SOM, with only a rectangular, planar grid (this worked well compared with the hexagonal shape when I checked it in MATLAB with somtoolbox).
  7. Several methods for function approximation and prediction (mostly using sklearn).
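The batch update in item 1 and the PCA initialization in item 2 can be sketched in plain NumPy as follows. This is a minimal illustration, not SOMPY's implementation, which relies on scipy sparse matrices and numexpr. The function names, the Gaussian neighborhood, the shrinking radius schedule, and the mapping of the grid onto the first two principal components are assumptions made for the example. The data are also assumed to have at least two features.

```python
import numpy as np


def grid_coords(rows, cols):
    """(row, col) coordinates of every node of a rectangular, planar grid, row-major."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.column_stack([r.ravel(), c.ravel()]).astype(float)


def pca_init(data, rows, cols):
    """Spread the initial codebook over the plane of the first two principal components."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions; s / sqrt(n - 1) are the standard deviations.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s[:2] / np.sqrt(max(len(data) - 1, 1))
    coords = grid_coords(rows, cols)
    # Rescale each grid axis to [-1, 1] (an axis with a single node stays at 0).
    scaled = np.zeros_like(coords)
    for k, n in enumerate((rows, cols)):
        if n > 1:
            scaled[:, k] = 2 * coords[:, k] / (n - 1) - 1
    return mean + scaled @ (std[:, None] * vt[:2])


def train_batch(data, codebook, coords, radii):
    """Batch SOM: in each epoch every node becomes a neighborhood-weighted mean of the data."""
    # Squared lattice distances between all pairs of nodes.
    lattice_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    for radius in radii:
        # Best-matching unit (BMU) of each sample under Euclidean distance.
        d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        bmu = d2.argmin(axis=1)
        # h[i, j]: Gaussian neighborhood weight between sample i's BMU and node j.
        h = np.exp(-lattice_d2[bmu] / (2 * radius ** 2))
        codebook = (h.T @ data) / h.sum(axis=0)[:, None]
    return codebook


def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its BMU."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1)).mean())


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
coords = grid_coords(5, 4)
initial = pca_init(data, 5, 4)
trained = train_batch(data, initial, coords, radii=np.linspace(2.0, 0.5, 10))
print(trained.shape)  # (20, 3)
print(quantization_error(data, initial), quantization_error(data, trained))
```

Each batch epoch replaces every node with a neighborhood-weighted mean of the data. Unlike online training, there is therefore no learning rate to tune.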

Dependencies:

SOMPY has the following dependencies:

  • numpy
  • scipy
  • scikit-learn
  • numexpr
  • matplotlib
  • pandas
  • ipdb

Installation:

python setup.py install

Many thanks to @sebastiandev: the library now follows Pythonic conventions. Below are some basic examples showing how to use the library, but I recommend going through the code, since several features are already implemented but not yet documented. I would be very happy to add your examples here.

Basic Example

Citation

There is no published paper about this library. However, if possible, please cite it as follows:

@misc{moosavi2014sompy,
  title={SOMPY: A Python Library for Self Organizing Map (SOM)},
  author={Moosavi, V and Packmann, S and Vall{\'e}s, I},
  note={GitHub. [Online]. Available: https://github.com/sevamoo/SOMPY},
  year={2014}
}

For more information, you can contact me via [email protected] or [email protected], but please report an issue first.

Thanks a lot.

Best,
Vahid Moosavi

Comments
  • Clusters in HitMapView

    Hi,

    I have a question about the HitMapView. I do clustering as follows and the output is perfectly fine:

    cl = som.cluster(n_clusters=3)

    But the HitMapView always uses the default number of clusters, which is 8, and I could not find a way to change it:

    h = hitmap.HitMapView(10, 10, 'hitmap', text_size=8, show_text=True)

    How can I set the number of clusters in a HitMapView to a different value?
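For context on what an n_clusters argument controls: clustering a SOM usually means clustering its codebook (prototype) vectors, so that every node gets a cluster label a hit-map view can color. This assumes SOMPY's cluster() works that way, presumably with k-means on the codebook, which is not confirmed here. Under that assumption, the idea can be sketched with plain NumPy. kmeans_nodes and its farthest-point initialization are illustrative choices, not SOMPY code.

```python
import numpy as np


def kmeans_nodes(codebook, n_clusters, n_iter=100, seed=0):
    """Lloyd's k-means over SOM codebook vectors; returns one cluster label per node."""
    codebook = np.asarray(codebook, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: a random first node, then repeatedly the node
    # farthest from all centers chosen so far.
    centers = [codebook[rng.integers(len(codebook))]]
    for _ in range(1, n_clusters):
        gaps = ((codebook[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1)
        centers.append(codebook[gaps.min(axis=1).argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((codebook[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its nodes (an empty cluster keeps its center).
        new_centers = np.array([
            codebook[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy codebook: 10 nodes near (0, 0) and 10 nodes near (10, 10).
rng = np.random.default_rng(1)
codebook = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(10, 0.1, (10, 2))])
labels = kmeans_nodes(codebook, n_clusters=2)
print(labels)
```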

    Regards Amin

    opened by anjomshoaa 15
  • Custom labels allowed in U-Matrix

    If labels is True, labels are generated using build_data_labels(); if labels is a list of strings, those are used instead.

    So I added the following to the show() method of UMatrixView:

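    if labels:
        if labels is True:
            # Generate default labels from the SOM itself.
            labels = som.build_data_labels()
        # coord is assumed to hold one (row, col) grid position per data sample.
        for label, x, y in zip(labels, coord[:, 1], coord[:, 0]):
            plt.annotate(str(label), xy=(x, y),
                         horizontalalignment='center', verticalalignment='center')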
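The snippet above appears to expect coord to hold one (row, col) grid position per data sample. A minimal NumPy sketch of how such positions can be obtained follows. It also computes the per-node hit counts that a hit map displays. bmu_grid_coords and hit_counts are hypothetical helpers, and nodes are assumed to be stored row by row on a rectangular grid.

```python
import numpy as np


def bmu_grid_coords(data, codebook, n_cols):
    """Each sample's best-matching unit (BMU) and that node's (row, col) on the grid."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    bmu = d2.argmin(axis=1)
    # Nodes are assumed to be stored row by row (row-major order).
    rows, cols = np.divmod(bmu, n_cols)
    return bmu, np.column_stack([rows, cols])


def hit_counts(bmu, n_rows, n_cols):
    """Number of samples whose BMU is each node, laid out on the grid (a hit map)."""
    return np.bincount(bmu, minlength=n_rows * n_cols).reshape(n_rows, n_cols)


# A 2 x 3 map whose six nodes hold the 1-D values 0, 1, ..., 5.
codebook = np.arange(6, dtype=float).reshape(6, 1)
data = np.array([[0.1], [4.9], [5.2], [2.0]])
bmu, coord = bmu_grid_coords(data, codebook, n_cols=3)
print(bmu.tolist())                    # [0, 5, 5, 2]
print(coord.tolist())                  # [[0, 0], [1, 2], [1, 2], [0, 2]]
print(hit_counts(bmu, 2, 3).tolist())  # [[1, 0, 1], [0, 0, 2]]
```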
    
    opened by oliviaguest 13
  • Add option for denormalizing the codebook when showing the components map

    I added a parameter to the show() method of the View2D class to denormalize the component scales, which makes the results far easier to interpret. Example usage:

    from sompy.visualization.mapview import View2D

    view2D = View2D(5, 1, "rand data", text_size=7)
    view2D.show(sm, col_sz=3, desnormalize=True)

    opened by ivallesp 8
  • ImportError: cannot import name 'SOMFactory'

    Hello,

    I have installed sompy without any errors (https://github.com/sevamoo/SOMPY). However, when I try:

    import sompy

    I get this error:

    File "C:\Anaconda3\lib\site-packages\sompy-1.0-py3.5.egg\sompy\__init__.py", line 30, in <module>
        from sompy import SOMFactory
    ImportError: cannot import name 'SOMFactory'

    I am using Python 3.5.1 (Anaconda 4.1.1 for Windows). Any help or suggestions would be greatly appreciated.

    Stefan

    opened by stef8310 6
  • length error

    Hello, I'm trying to use some of the visualization types, but I always run into the same error.

    Model:

    mapsize = [10, 10]
    som = sompy.SOMFactory.build(diabetes, mapsize, mask=None, mapshape='planar',
                                 lattice='rect', normalization='var', initialization='pca',
                                 neighborhood='gaussian', training='batch', name='sompy')
    som.train(n_job=1, verbose='info')

    MapView:

    from sompy.visualization.mapview import View2D

    view2D = View2D(10, 10, "", text_size=7)
    view2D.show(som)
    plt.show()

    opened by Hudson11 4
  • Rect visualization bug

    Good day,

    There seems to be another bug, specifically when training in rect mode, then running k-means and visualizing the result. This is the output of the rect visualization using HitMapView.

    The colors seem to be out of place relative to the labels: [image]

    This looks like the color mapping associated with the hexa plot. The following is the result of training in hexa mode and using k-means: [image]

    I was initially using an older version (where the rect visualization works), and I believe a bug may have been introduced since then. Below is the rect mode plotted with the older version: [image]

    opened by germayneng 4
  • Access Neuron values?

    Hi there,

    After the map has been trained, is there any way to access the neuron weight values? I would like to check what is stored at specific indices to see which clusters have been identified in my training set.

    Thanks, Jack

    opened by JackActon 4
  • Examples do not work

    Could you please make the examples work? None of them can be run because of changes in the sompy library (parameter/method names, etc.).

    In the example at http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/sevamoo/8f26d64470e00960684a/raw/SOMPY_example

    the constructor call has the wrong parameters:

    sm = SOM.SOM('sm', Data, mapsize = [msz0, msz1],norm_method = 'var',initmethod='pca')

    and so on.

    opened by zxweed 4
  • Correct hex lattice and implement hex u-matrix

    Hi there! First of all, thank you for developing SOMPY! I have been using it for some time now, and one feature I think it is missing is a hexagonal grid for the u-matrix plot, so I decided to implement it myself and share it here. Let me know what you think and whether this is a feature you want to add to the package. I am also planning to build an interactive u-matrix visualization with Bokeh in the future; let me know if you would be interested in adding it to the package.


    Changed generate_hex_lattice() in sompy/codebook.py so that the coordinates correspond to a regular hexagonal grid (which wasn't the case previously).
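One way to lay out a regular hexagonal grid like the one described above is sketched below. hex_lattice and its odd-row offset convention are assumptions for illustration and not necessarily what the PR's generate_hex_lattice() does.

```python
import math


def hex_lattice(n_rows, n_cols):
    """(x, y) centers of a regular hexagonal grid with unit spacing between neighbors.

    Odd rows are shifted right by half a unit and rows are sqrt(3)/2 apart, so an
    interior node has six neighbors, all at distance exactly 1.
    """
    return [(j + 0.5 * (i % 2), i * math.sqrt(3) / 2)
            for i in range(n_rows) for j in range(n_cols)]


coords = hex_lattice(3, 3)
center = coords[4]  # row 1, column 1: an interior node
neighbors = [k for k, p in enumerate(coords) if abs(math.dist(p, center) - 1) < 1e-9]
print(neighbors)  # [1, 2, 3, 5, 7, 8]
```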

    Changed sompy/visualization/umatrix.py so that, if the som object has a hexagonal lattice, the u-matrix is plotted on a hexagonal grid. All additional plot features (e.g. contour, blob, etc.) are compatible with the hexagonal grid. I also added a colorbar to the plot.
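A U-matrix only needs each node's lattice neighbors, so once neighbors are defined from node coordinates the same computation can serve both rectangular and hexagonal grids. The NumPy sketch below assumes unit spacing between neighboring nodes. That holds for a unit-spaced rectangular grid or a hexagonal grid built the same way. For each node, the sketch reports the mean codebook distance to its neighbors. u_matrix is a hypothetical name, not the PR's code.

```python
import numpy as np


def u_matrix(codebook, coords, tol=1e-6):
    """Mean codebook distance from each node to its lattice neighbors.

    Neighbors are nodes exactly one unit away on the lattice (within tol): 4 on a
    unit-spaced rectangular grid, 6 on a unit-spaced hexagonal one. Every node is
    assumed to have at least one neighbor.
    """
    codebook = np.asarray(codebook, dtype=float)
    coords = np.asarray(coords, dtype=float)
    lattice_d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    neighbors = np.abs(lattice_d - 1.0) < tol
    weight_d = np.linalg.norm(codebook[:, None, :] - codebook[None, :, :], axis=-1)
    return (weight_d * neighbors).sum(axis=1) / neighbors.sum(axis=1)


# A 1 x 3 rectangular map (coordinates are (row, col)) with 1-D codebook values 0, 1, 3.
heights = u_matrix([[0.0], [1.0], [3.0]], [(0, 0), (0, 1), (0, 2)])
print(heights.tolist())  # [1.0, 1.5, 2.0]
```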

    opened by dfhssilva 3
  • Add function for calculating the topographic error

    Hi again,

    I just added something I needed! I wrote a function to calculate the topographic error, which in my opinion is as important as, or more important than, the quantization error. It measures how well the map preserves topology, i.e. whether units that model similar prototypes are contiguous. It is calculated as the proportion of all data vectors for which the first and second BMUs are not adjacent units.
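The definition above translates directly into code. Below is a minimal NumPy sketch, assuming node coordinates on a unit-spaced lattice and counting only nodes one unit apart as adjacent. On a rectangular grid this excludes diagonals, which other tools may treat differently. topographic_error here is a hypothetical stand-alone function, not the one added in this PR.

```python
import numpy as np


def topographic_error(data, codebook, coords, tol=1e-6):
    """Share of samples whose first and second BMUs are not adjacent on the lattice.

    Adjacent is taken to mean one unit apart on a unit-spaced grid, so diagonal
    nodes of a rectangular grid do not count as adjacent (an assumption).
    """
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    # Indices of the closest and second-closest node for every sample.
    first, second = np.argsort(d2, axis=1)[:, :2].T
    gap = np.linalg.norm(coords[first] - coords[second], axis=1)
    return float(np.mean(gap > 1.0 + tol))


# A twisted 1 x 3 map: node values 0, 10, 1 at grid columns 0, 1, 2.
codebook = [[0.0], [10.0], [1.0]]
coords = [(0, 0), (0, 1), (0, 2)]
# 0.2 -> BMUs at columns 0 and 2 (not adjacent); 9.0 -> columns 1 and 2 (adjacent).
print(topographic_error([[0.2], [9.0]], codebook, coords))  # 0.5
```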

    The general strategy I follow for training reliable SOM is:

    • Generate several maps, varying the seed and the accessible parameters such as training_length and radius.
    • Calculate the quantization and topographic errors
    • Among the maps with very low or zero topographic error, choose the one with the lowest quantization error. This way I pick a map that will be very interpretable. Hint: once all the maps have been trained, it is a good idea to draw a scatter plot of quantization error versus topographic error.
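The selection rule in the last step can be written as a small helper. The tuples below are made-up numbers. pick_map is hypothetical, and its te_threshold parameter stands in for "very low or zero".

```python
def pick_map(candidates, te_threshold=0.0):
    """Lowest-quantization-error map among those with topographic error <= te_threshold.

    candidates: iterable of (name, quantization_error, topographic_error) tuples.
    Returns None when no map meets the threshold.
    """
    eligible = [c for c in candidates if c[2] <= te_threshold]
    return min(eligible, key=lambda c: c[1]) if eligible else None


maps = [("seed0", 0.40, 0.00), ("seed1", 0.35, 0.02), ("seed2", 0.45, 0.00)]
print(pick_map(maps))                     # ('seed0', 0.4, 0.0)
print(pick_map(maps, te_threshold=0.05))  # ('seed1', 0.35, 0.02)
```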

    I modified the find_som function and related functions to allow calculating the second BMU. If it is not requested, they work as before.

    opened by ivallesp 3
  • Implemented hexagonal lattice

    Hi!

    I have just implemented the hex lattice and its visualization tools. Please find a teaser attached!

    Apart from that, my contribution includes:

    • Minor refactoring throughout
    • A function to correctly calculate the quantization error
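For reference, the quantization error is commonly taken as the mean distance between each data vector and its BMU. A minimal NumPy version is sketched below. It is not necessarily the exact formula in this PR, and some tools average squared distances instead.

```python
import numpy as np


def quantization_error(data, codebook):
    """Mean Euclidean distance between each sample and its best-matching unit."""
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())


# (0, 0) sits on a node (distance 0); (3, 4) is 5 away from its nearest node.
print(quantization_error([[0, 0], [3, 4]], [[0, 0], [10, 10]]))  # 2.5
```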

    In the next few days, I will probably add more support for the hex lattice and clean up some parts of the library! I am also planning to write a tutorial Jupyter notebook with a hex lattice example.

    [image]

    Best, Iván Vallés

    opened by ivallesp 2
  • Get the Codebooks

    Hello Vahid Moosavi,

    Thank you for developing this great sompy package. It has been very useful to me. I have been using it for an article I am publishing, and it will be duly referenced.

    However, I have one problem understanding the component planes. I have many input variables, so deciphering patterns and associations just by looking at the component planes is very cumbersome. I need help with the sompy command I can use to extract the values behind the component planes, so that I can do statistical inference on them.

    Is there a way to resolve this problem? I would be glad to receive some clarification.

    Counting on your cooperation. Thank you.

    Best Regards, John Owusu
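A component plane is one column of the codebook laid out on the map grid. The numbers behind the plots can therefore be extracted and compared directly, for example by correlating planes: strongly correlated planes indicate variables that vary together across the map.

Below is a minimal NumPy sketch. It assumes the codebook is an (n_nodes, n_features) array in row-major node order. In SOMPY it is usually reachable as sm.codebook.matrix; please check this against your installed version. The helper names are hypothetical. Pearson correlation is unchanged by per-variable shifting and positive rescaling, so it does not matter whether the codebook is normalized or denormalized.

```python
import numpy as np


def component_planes(codebook, n_rows, n_cols):
    """One (n_rows, n_cols) plane per input variable, assuming row-major node order."""
    codebook = np.asarray(codebook, dtype=float)
    return codebook.T.reshape(codebook.shape[1], n_rows, n_cols)


def plane_correlations(codebook):
    """Pearson correlation between every pair of component planes (variables)."""
    return np.corrcoef(np.asarray(codebook, dtype=float), rowvar=False)


# A 2 x 2 map with three variables: v1 = 2 * v0 and v2 = 5 - v0.
codebook = [[1, 2, 4], [2, 4, 3], [3, 6, 2], [4, 8, 1]]
planes = component_planes(codebook, 2, 2)
print(planes[0].tolist())  # [[1.0, 2.0], [3.0, 4.0]]
print(plane_correlations(codebook).round(6).tolist())
```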

    Originally posted by @nanakayhacker in https://github.com/sevamoo/SOMPY/issues/128#issuecomment-1366372339

    opened by sevamoo 2
  • Distance between hexagon centers

    When running:

    import matplotlib.pyplot as plt
    from sompy.sompy import SOMFactory
    from sompy.visualization.mapview import View2D

    # data: 2-D array of samples; names: one label per column (both defined elsewhere)
    sm = SOMFactory().build(data, mapsize=[10, 10], normalization='var', initialization='random',
                            component_names=names, lattice="hexa")
    sm.train(n_job=1, verbose=False, train_rough_len=2, train_finetune_len=5)

    view2D = View2D(10, 10, "", text_size=10)
    view2D.show(sm, col_sz=5, which_dim="all", denormalize=True)
    plt.tight_layout()
    plt.show()
    

    I get overlapping hexagons. I'd like to adjust their spacing so that they just touch or even have a little space between them.

    I have reviewed the source code and can't find where to make that adjustment.
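Whatever the place in SOMPY where the patch size is set, the geometry itself is simple. Consider pointy-top hexagons whose centers are `spacing` apart along a row, with alternate rows offset by half a spacing and rows spacing * sqrt(3)/2 apart. A circumradius of spacing / sqrt(3) makes neighbors touch exactly, and anything larger makes them overlap. The sketch below assumes that layout; hexagon_vertices and its gap parameter are hypothetical. If the hexagons are drawn with matplotlib.patches.RegularPolygon, this circumradius is what its radius argument (the center-to-vertex distance) should be.

```python
import math


def hexagon_vertices(cx, cy, spacing=1.0, gap=0.0):
    """Vertices of a pointy-top hexagon centered at (cx, cy).

    A circumradius of spacing / sqrt(3) makes hexagons whose centers are `spacing`
    apart touch exactly; gap (0 <= gap < 1) shrinks each one by that fraction.
    """
    radius = (1.0 - gap) * spacing / math.sqrt(3)
    # Vertices at 90, 150, ..., 390 degrees: one vertex points straight up.
    return [(cx + radius * math.cos(math.radians(90 + 60 * k)),
             cy + radius * math.sin(math.radians(90 + 60 * k)))
            for k in range(6)]


left, right = hexagon_vertices(0.0, 0.0), hexagon_vertices(1.0, 0.0)
print(max(x for x, _ in left), min(x for x, _ in right))  # both about 0.5: they touch

left, right = hexagon_vertices(0.0, 0.0, gap=0.1), hexagon_vertices(1.0, 0.0, gap=0.1)
print(max(x for x, _ in left) < min(x for x, _ in right))  # True: a small space between them
```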

    opened by wayneking517 0
  • setup.py error

    When installing the requirements with the setup file (python setup.py install), I get:

    An exception has occurred, use %tb to see the full traceback.

    Traceback (most recent call last):
      File "C:\Users\IPCS\anaconda3\lib\distutils\core.py", line 134, in setup
        ok = dist.parse_command_line()
      File "C:\Users\IPCS\anaconda3\lib\site-packages\setuptools\dist.py", line 707, in parse_command_line
        result = _Distribution.parse_command_line(self)
      File "C:\Users\IPCS\anaconda3\lib\distutils\dist.py", line 501, in parse_command_line
        raise DistutilsArgError("no commands supplied")
    DistutilsArgError: no commands supplied

    During handling of the above exception, another exception occurred:

    SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: setup.py --help [cmd1 cmd2 ...]
       or: setup.py --help-commands
       or: setup.py cmd --help

    error: no commands supplied

    opened by lobe814 0
  • How many parameters in SOMPY?

    I have a dataset of 645K rows and 25 features, and I trained models with sompy like this:

    import random

    import joblib
    from sompy.sompy import SOMFactory

    # data, names and path_out are defined elsewhere.
    # Train some models with random map sizes and save them to compare later.
    for i in range(7):
        mapsize = [random.choice(range(15, 25)), random.choice(range(10, 15))]
        sm = SOMFactory().build(data, mapsize=mapsize,
                                normalization='var', initialization='random',
                                component_names=names, lattice="hexa")
        sm.train(n_job=4, verbose=False, train_rough_len=16, train_finetune_len=16)
        joblib.dump(sm, path_out + "Models/model_{}.joblib".format(i))
    

    The joblib file is about 750 MB. How is this possible?
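A back-of-the-envelope check helps here. The dataset alone, 645K rows x 25 features stored as float64, takes about 129 MB. A file several times that size would therefore be consistent with the pickled model holding several data-sized arrays, for example copies of the training data. That is an assumption to verify, not a confirmed description of SOMPY's internals. The helpers below (hypothetical names) compute the estimate and list the largest NumPy array attributes of any object, such as a trained map.

```python
import numpy as np


def array_megabytes(n_rows, n_cols, bytes_per_value=8):
    """Size in MB (10**6 bytes) of a dense n_rows x n_cols array; 8 bytes = float64."""
    return n_rows * n_cols * bytes_per_value / 1e6


def largest_arrays(obj, top=5):
    """(attribute name, MB) for the biggest NumPy array attributes stored directly on obj.

    Arrays held by nested objects (an attribute that is itself an object) are not
    inspected; call the function on those objects separately.
    """
    sizes = [(name, value.nbytes / 1e6) for name, value in vars(obj).items()
             if isinstance(value, np.ndarray)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


print(array_megabytes(645_000, 25))  # 129.0


class Demo:
    def __init__(self):
        self.data = np.zeros((1_000, 25))  # 200,000 bytes = 0.2 MB
        self.bmu = np.zeros((2, 1_000))    # 16,000 bytes = 0.016 MB
        self.name = "demo"                 # not an array, so it is skipped


print(largest_arrays(Demo()))  # [('data', 0.2), ('bmu', 0.016)]
```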

    opened by abdoulsn 0
  • SAMMON Map

    Hi,

    I was wondering whether sompy has a Sammon map feature like the one in the MATLAB SOM Toolbox. I looked through the visualization folder but couldn't find anything that plots the relative locations of the nodes. I also checked the sompy output for something that could be used to build a Sammon map; the closest thing I found was "lattice_distances", but using it turned out to be a really convoluted process.

    Thanks in advance!
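For reference, Sammon's mapping places points in 2-D so that their pairwise distances approximate the original ones, with small distances weighted more heavily. Applied to a SOM, the inputs are the codebook (prototype) vectors. The relevant distances are therefore between codebook vectors in data space, not between grid positions. Below is a minimal NumPy sketch. Instead of Sammon's original pseudo-Newton update, it uses gradient descent with step halving, starting from a PCA projection. The function names are hypothetical, and the codebook vectors are assumed to be distinct.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def sammon_stress(D, d):
    """Sammon's stress of output distances d against input distances D."""
    iu = np.triu_indices(len(D), k=1)
    return float(np.sum((D[iu] - d[iu]) ** 2 / D[iu]) / np.sum(D[iu]))


def sammon(x, n_iter=300, lr=1.0):
    """Map the rows of x (e.g. codebook vectors) to 2-D by minimizing Sammon's stress.

    Gradient descent with step halving, started from the PCA projection. Assumes at
    least two features and distinct rows, so every input distance is positive.
    """
    x = np.asarray(x, dtype=float)
    D = pairwise_distances(x)
    c = D[np.triu_indices(len(x), k=1)].sum()
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    y = centered @ vt[:2].T
    history = [sammon_stress(D, pairwise_distances(y))]
    for _ in range(n_iter):
        d = np.maximum(pairwise_distances(y), 1e-12)
        coef = (D - d) / (np.where(D > 0, D, 1.0) * d)
        np.fill_diagonal(coef, 0.0)
        grad = -(2.0 / c) * (coef[:, :, None] * (y[:, None, :] - y[None, :, :])).sum(axis=1)
        # Halve the step until the stress does not increase.
        while lr > 1e-12:
            y_new = y - lr * grad
            new_stress = sammon_stress(D, pairwise_distances(y_new))
            if new_stress <= history[-1]:
                break
            lr *= 0.5
        else:
            break  # no non-increasing step found: stop
        y = y_new
        history.append(new_stress)
        lr = min(lr * 1.2, 10.0)  # let the step grow again, up to a cap
    return y, history


rng = np.random.default_rng(0)
codebook = rng.normal(size=(20, 5))  # stand-in for trained codebook vectors
y, history = sammon(codebook)
print(y.shape)                  # (20, 2)
print(history[0], history[-1])  # stress at the PCA start and at the end
```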

    opened by aspassiani 2
  • Plot data label in UMatrix

    I was using the somoclu library, but it isn't working for Python 3, so I just started working with SOMPY. I'd like to do the same thing: plot each input as a point, marked by its class. This is with somoclu: [image]

    And this is with SOMPY: [image]

    Is that possible?

    Thank you in advance!

    opened by deiry 1
Owner
Vahid Moosavi
Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Somoclu Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, it is able to rely on MPI for distributing

Peter Wittek 239 Nov 10, 2022
Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies

py-self-organizing-map Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies. A SOM is a simple unsuperv

Jonas Grebe 1 Feb 10, 2022
A central task in drug discovery is searching, screening, and organizing large chemical databases

A central task in drug discovery is searching, screening, and organizing large chemical databases. Here, we implement clustering on molecular similarity. We support multiple methods to provide an interactive exploration of chemical space.

NVIDIA Corporation 124 Jan 7, 2023
basemap - Plot on map projections (with coastlines and political boundaries) using matplotlib.

Basemap Plot on map projections (with coastlines and political boundaries) using matplotlib. ⚠️ Warning: this package is being deprecated in favour of

Matplotlib Developers 706 Dec 28, 2022
Script to create an animated data visualisation for categorical timeseries data - GIF choropleth map with annotations.

choropleth_ldn Simple script to create a choropleth map of London with categorical timeseries data. The script in main.py creates a gif of the most f

null 1 Oct 7, 2021
Time series visualizer is a flexible extension that fills a world map by country from real data.

Time-series-visualizer Time series visualizer is a flexible extension that fills a world map by country from a csv or json file. You can know d

Long Ng 3 Jul 9, 2021
By default, networkx has problems with drawing self-loops in graphs.

By default, networkx has problems with drawing self-loops in graphs. It makes it hard to draw a graph with self-loops or to make a nicely looking chord diagram. This repository provides some code to draw self-loops nicely

Vladimir Shitov 5 Jan 6, 2022
Declarative statistical visualization library for Python

Altair http://altair-viz.github.io Altair is a declarative statistical visualization library for Python. With Altair, you can spend more time understa

Altair 8k Jan 5, 2023
Cartopy - a cartographic python library with matplotlib support

Cartopy is a Python package designed to make drawing maps for data analysis and visualisation easy. Table of contents Overview Get in touch License an

null 1.2k Jan 1, 2023
a plotting library for python, based on D3

Hello August 2013 Hello! Maybe you're looking for a nice Python interface to build interactive, javascript based plots that look as nice as all those

Mike Dewar 1.4k Dec 28, 2022
Multi-class confusion matrix library in Python

Table of contents Overview Installation Usage Document Try PyCM in Your Browser Issues & Bug Reports Todo Outputs Dependencies Contribution References

Sepand Haghighi 1.3k Dec 31, 2022
NorthPitch is a python soccer plotting library that sits on top of Matplotlib

NorthPitch is a python soccer plotting library that sits on top of Matplotlib.

Devin Pleuler 30 Feb 22, 2022
The interactive graphing library for Python (includes Plotly Express) :sparkles:

plotly.py Latest Release User forum PyPI Downloads License Data Science Workspaces Our recommended IDE for Plotly’s Python graphing library is Dash En

Plotly 12.7k Jan 5, 2023
🎨 Python Echarts Plotting Library

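pyecharts Python ❤️ ECharts = pyecharts English README Introduction: Apache ECharts (incubating) is a data visualization library open-sourced by Baidu that has won recognition from many developers for its good interactivity and well-crafted chart design. And Python is a language rich in expressive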

pyecharts 13.1k Jan 3, 2023
Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

Spotify 3.2k Jan 4, 2023
:small_red_triangle: Ternary plotting library for python with matplotlib

python-ternary This is a plotting library for use with matplotlib to make ternary plots in the two dimensional simplex projected onto a two dime

Marc 611 Dec 29, 2022