Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Peter Wittek

Last update: Nov 10, 2022

Overview

Somoclu

Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, it is able to rely on MPI for distributing the workload in a cluster, and it can be accelerated by CUDA. A sparse kernel is also included, which is useful for training maps on vector spaces generated in text mining processes.

Key features:

Fast execution by parallelization: OpenMP, MPI, and CUDA are supported.
Multi-platform: Linux, macOS, and Windows are supported.
Planar and toroid maps.
Rectangular and hexagonal grids.
Gaussian and bubble neighborhood functions.
Both dense and sparse input data are supported.
Large maps of several hundred thousand neurons are feasible.
Integration with Databionic ESOM Tools.
Python, R, Julia, and MATLAB interfaces for the dense CPU and GPU kernels.

For more information, refer to the manuscript about the library [1].

Usage

Basic Command Line Use

Somoclu takes a plain text input file -- either dense or sparse data. Example files are included.

$ [mpirun -np NPROC] somoclu [OPTIONs] INPUT_FILE OUTPUT_PREFIX

Arguments:

-c FILENAME              Specify an initial codebook for the map.
-d NUMBER                Coefficient in the Gaussian neighborhood function
                         exp(-||x-y||^2/(2*(coeff*radius)^2)) (default: 0.5)
-e NUMBER                Maximum number of epochs
-g TYPE                  Grid type: square or hexagonal (default: square)
-h, --help               This help text
-k NUMBER                Kernel type
                            0: Dense CPU
                            1: Dense GPU
                            2: Sparse CPU
-l NUMBER                Starting learning rate (default: 0.1)
-L NUMBER                Finishing learning rate (default: 0.01)
-m TYPE                  Map type: planar or toroid (default: planar)
-n FUNCTION              Neighborhood function (bubble or gaussian, default: gaussian)
-p NUMBER                Compact support for Gaussian neighborhood
                         (0: false, 1: true, default: 0)
-r NUMBER                Start radius (default: half of the map in direction min(x,y))
-R NUMBER                End radius (default: 1)
-s NUMBER                Save interim files (default: 0):
                            0: Do not save interim files
                            1: Save U-matrix only
                            2: Also save codebook and best matching
-t STRATEGY              Radius cooling strategy: linear or exponential (default: linear)
-T STRATEGY              Learning rate cooling strategy: linear or exponential (default: linear)
-v NUMBER                Verbosity level, 0-2 (default: 0)
-x, --columns NUMBER     Number of columns in map (size of SOM in direction x)
-y, --rows    NUMBER     Number of rows in map (size of SOM in direction y)
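
The -d, -r/-R, and -t options work together: the radius shrinks from its start value to its end value over the epochs, and the Gaussian neighborhood weight is computed from the current radius. The following Python sketch illustrates this. The Gaussian formula is the one quoted in the help text above; the function names and the exact form of the two cooling schedules are assumptions made for illustration and may differ from what Somoclu does internally.

```python
import math


def gaussian_neighborhood(grid_distance, radius, coeff=0.5):
    """Weight exp(-d^2 / (2 * (coeff * radius)^2)), as quoted for the -d option."""
    return math.exp(-grid_distance ** 2 / (2.0 * (coeff * radius) ** 2))


def linear_cooling(start, end, n_epochs, epoch):
    """Assumed linear schedule: equals `start` at epoch 0 and `end` at the last epoch."""
    if n_epochs < 2:
        return start
    return start - (start - end) * epoch / (n_epochs - 1)


def exponential_cooling(start, end, n_epochs, epoch):
    """Assumed exponential schedule: geometric decay from `start` to `end`."""
    if n_epochs < 2:
        return start
    return start * (end / start) ** (epoch / (n_epochs - 1))


n_epochs = 10
for epoch in range(n_epochs):
    r_lin = linear_cooling(10.0, 1.0, n_epochs, epoch)
    r_exp = exponential_cooling(10.0, 1.0, n_epochs, epoch)
    weight = gaussian_neighborhood(2.0, r_lin)
    print(f"epoch {epoch}: linear r={r_lin:.2f}, exponential r={r_exp:.2f}, "
          f"weight at grid distance 2: {weight:.3f}")
```

Under these assumptions, the exponential schedule drops the radius quickly in the first epochs and spends more time near the end radius, whereas the linear schedule narrows the neighborhood at a constant rate.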

Examples:

$ somoclu data/rgbs.txt data/rgbs
$ mpirun -np 4 somoclu -k 0 --rows 20 --columns 20 data/rgbs.txt data/rgbs

With random initialization, the initial codebook will be filled with random numbers ranging from 0 to 1. Either supply your own initial codebook or normalize your data to fall in this range.
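
One way to bring the data into the 0 to 1 range is per-feature min-max scaling. The following is a minimal NumPy sketch; the function name minmax_scale is made up for this example and is not part of Somoclu.

```python
import numpy as np


def minmax_scale(data):
    """Rescale every feature (column) linearly to the range [0, 1]."""
    data = np.asarray(data, dtype=np.float32)
    lowest = data.min(axis=0)
    span = data.max(axis=0) - lowest
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (data - lowest) / span


raw = np.array([[-3.0, 10.0], [0.0, 20.0], [3.0, 40.0]])
scaled = minmax_scale(raw)
print(scaled)
```

The scaled array can then be written out as a dense input file or passed to one of the interfaces.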

If the range of the values of the features includes negative numbers, the codebook will eventually adjust. It is, however, not advised to have negative values, especially if the codebook is initialized from 0 to 1. This comes from the batch training nature of the parallel implementation. The batch update rule will change the codebook values with weighted averages of the data points, and with negative values, the updates can cancel out.
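
To make the cancellation effect concrete, here is a sketch of a generic batch update step in NumPy for a planar square grid with a Gaussian neighborhood. It follows the textbook batch rule (each neuron becomes a neighborhood-weighted average of the data) and is not taken from Somoclu's source; the function name batch_epoch and all parameter names are assumptions.

```python
import numpy as np


def batch_epoch(codebook, data, radius, coeff=0.5):
    """One batch update; codebook is (n_rows, n_cols, dim), data is (n, dim)."""
    n_rows, n_cols, dim = codebook.shape
    weights = codebook.reshape(-1, dim)
    # Best matching unit (BMU) of every data point, by squared Euclidean distance.
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    bmus = d2.argmin(axis=1)
    # Grid coordinates of the neurons, in the same row-major order as `weights`.
    rows, cols = np.divmod(np.arange(n_rows * n_cols), n_cols)
    coords = np.stack([rows, cols], axis=1).astype(float)
    # Gaussian neighborhood weight between each neuron and each point's BMU.
    grid_d2 = ((coords[:, None, :] - coords[bmus][None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-grid_d2 / (2.0 * (coeff * radius) ** 2))
    numer = h @ data
    denom = h.sum(axis=1, keepdims=True)
    # Neurons that receive no neighborhood weight keep their old value.
    new_weights = np.where(denom > 0, numer / np.maximum(denom, 1e-300), weights)
    return new_weights.reshape(codebook.shape)


rng = np.random.default_rng(0)
updated = batch_epoch(rng.random((4, 5, 3)), rng.random((200, 3)), radius=2.0)
print(updated.shape)

# Two data points of opposite sign and a wide neighborhood: the weighted
# averages nearly cancel, so both neurons are pulled close to zero.
cancelled = batch_epoch(np.array([[[0.2], [0.8]]]), np.array([[-1.0], [1.0]]), radius=100.0)
print(cancelled.ravel())
```

In the second call each neuron receives almost equal weight from both data points, so the positive and negative contributions largely cancel. With data scaled to the 0 to 1 range this cannot happen, because every weighted average stays inside that range.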

The maps generated by the GPU and the CPU kernels are likely to be different. For computational efficiency, Somoclu uses single-precision floats. This occasionally results in identical distances between a data instance and the neurons. The CPU version will pick the best matching unit with the lowest coordinate values. Such sequentiality cannot be guaranteed in the reduction kernel of the GPU variant. This is not a bug, but it is better to be aware of it.
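
The effect of single precision on ties can be reproduced in a few lines of NumPy. This is only an illustration of the rounding behaviour, not Somoclu code: two distances that differ in double precision become identical as single-precision floats, and a sequential scan then returns the first of the tied entries.

```python
import numpy as np

# Distances from one data point to two neurons, differing by 1e-8.
d64 = np.array([1.0 + 1e-8, 1.0], dtype=np.float64)
d32 = d64.astype(np.float32)  # both values round to exactly 1.0

print(d64[0] == d64[1])  # False
print(d32[0] == d32[1])  # True

# argmin scans sequentially and returns the first minimum it finds.
print(np.argmin(d64), np.argmin(d32))  # 1 0
```

In double precision the second neuron wins; in single precision the two are tied and the lowest index is returned, which mirrors the behaviour described for the CPU kernel. A parallel reduction has no such guaranteed order, so it may return either of the tied neurons.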

Efficient Parallel and Distributed Execution

The CPU kernels use OpenMP to load multicore processors. On a single node, this is more efficient than launching tasks with MPI to match the number of cores. The MPI tasks replicate the codebook, which is especially inefficient for large maps.

For instance, given a single node with eight cores, the following execution will use 1/8th of the memory, and will run 10-20% faster:

$ somoclu -x 200 -y 200 data/rgbs.txt data/rgbs

Or, equivalently:

$ OMP_NUM_THREADS=8 somoclu -x 200 -y 200 data/rgbs.txt data/rgbs

Avoid the following on a single node:

$ OMP_NUM_THREADS=1 mpirun -np 8 somoclu -x 200 -y 200 data/rgbs.txt data/rgbs

The same caveats apply for the sparse CPU kernel.
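
A back-of-the-envelope calculation shows why replicating the codebook matters for large maps. The sketch below counts only the codebook, assumes 4 bytes per value (single precision, as noted earlier), and uses a made-up dimensionality of 1000; the function name is hypothetical.

```python
def codebook_bytes(n_columns, n_rows, n_dimensions, bytes_per_value=4):
    """Size of one copy of the codebook in bytes."""
    return n_columns * n_rows * n_dimensions * bytes_per_value


one_copy = codebook_bytes(200, 200, 1000)  # 1000 dimensions is an assumed figure
for label, copies in (("1 process with 8 OpenMP threads", 1), ("8 MPI processes", 8)):
    print(f"{label}: {copies * one_copy / 2**20:.0f} MiB of codebook")
```

The data itself is split between MPI processes rather than replicated, so the real ratio depends on how the size of the codebook compares with the size of the data.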

Visualisation

The primary purpose of generating a map is visualisation. Apart from the Python interface, Somoclu does not come with its own functions for visualisation, since there are numerous generic tools that are capable of plotting high-quality figures. The R version integrates with kohonen and the MATLAB version with somtoolbox.

The U-matrix and codebook output files of the command-line version are compatible with Databionic ESOM Tools for more advanced visualisation.

Input File Formats

One sparse and two dense data formats are supported. All of them are plain text files. The entries can be separated by any white-space character. One row represents one data instance across all formats. Comment lines starting with a hash mark are ignored.

The sparse format follows the libsvm guidelines. The first feature is zero-indexed. For instance, the vector [ 1.2 0 0 3.4] is represented as the following line in the file: 0:1.2 3:3.4. The file is parsed twice: once to get the number of instances and features, and the second time to read the data in the individual threads.
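
As an illustration of the sparse format, the following Python sketch converts a dense vector to a zero-indexed index:value line and back. The helper names are invented for this example; Somoclu itself only reads such files.

```python
def to_sparse_line(vector):
    """Format a dense vector as zero-indexed index:value pairs, skipping zeros."""
    return " ".join(f"{i}:{float(v)!r}" for i, v in enumerate(vector) if v != 0)


def from_sparse_line(line, n_features):
    """Rebuild the dense vector from a line of index:value pairs."""
    vector = [0.0] * n_features
    for token in line.split():
        index, value = token.split(":")
        vector[int(index)] = float(value)
    return vector


line = to_sparse_line([1.2, 0, 0, 3.4])
print(line)  # 0:1.2 3:3.4
print(from_sparse_line(line, 4))
```

Note that the number of features cannot be recovered from a single line, which is why the file has to be scanned once before the data is read.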

The basic dense format includes the coordinates of the data vectors, separated by a white-space. Just like the sparse format, this file is parsed twice to get the basic dimensions right.
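
A dense input file can be produced from a NumPy array with np.savetxt, as in the sketch below. The file name and the header text are placeholders; the header is written as a comment line starting with a hash mark, which is ignored according to the description above.

```python
import os
import tempfile

import numpy as np

data = np.array([[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]], dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "example_dense.txt")  # placeholder name

# savetxt prefixes the header with "# ", turning it into a comment line.
np.savetxt(path, data, fmt="%.6f", delimiter=" ", header="two example vectors")

with open(path) as handle:
    print(handle.read())

loaded = np.loadtxt(path)  # loadtxt skips "#" comment lines as well
```

Each row of the array becomes one line of the file, that is, one data instance.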

The .lrn file of Databionic ESOM Tools is also accepted and it is parsed only once. The format is described as follows:

% n

% m

% s1 s2 .. sm

% var_name1 var_name2 .. var_namem

x11 x12 .. x1m

x21 x22 .. x2m

. . . .

xn1 xn2 .. xnm

Here n is the number of rows in the file, that is, the number of data instances. Parameter m defines the number of columns in the file. The next row defines the column mask: the value 1 for a column means the column should be used in the training. Note that the first column in this format is always a unique key, so this should have the value 9 in the column mask. The row with the variable names is ignored by Somoclu. The elements of the matrix follow -- from here, the file is identical to the basic dense format, with the addition of the first column as the unique key.
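
The header layout described above can be generated with a few lines of Python. This is a sketch under the stated rules (key column first, mask value 9 for the key and 1 for training columns); the function name, the default variable names, and the use of single spaces as separators are assumptions.

```python
def format_lrn(rows, names=None):
    """Return the text of an .lrn file for a list of equal-length vectors."""
    n, dim = len(rows), len(rows[0])
    m = dim + 1                 # columns in the file, including the key
    mask = ["9"] + ["1"] * dim  # 9 marks the key, 1 marks a training column
    names = ["key"] + list(names or (f"c{j + 1}" for j in range(dim)))
    lines = [f"% {n}", f"% {m}", "% " + " ".join(mask), "% " + " ".join(names)]
    for key, row in enumerate(rows, start=1):
        # The unique key comes first, followed by the data values.
        lines.append(" ".join([str(key)] + [repr(float(v)) for v in row]))
    return "\n".join(lines) + "\n"


print(format_lrn([[0.1, 0.2], [0.3, 0.4]]))
```

For two instances with two features each, the header reports n = 2 and m = 3, because the key column counts towards the number of columns.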

If the input file is sparse, but a dense kernel is invoked, Somoclu will execute, but the results will be incorrect. Invoking a sparse kernel on a dense input file is likely to lead to a segmentation fault.
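
Since a mismatch between file format and kernel is not detected, it can be worth checking the file before choosing the -k value. The following heuristic is a sketch, not a Somoclu feature: it inspects the first data line and treats the file as sparse if any token contains a colon. The function name is hypothetical.

```python
def looks_sparse(lines):
    """Guess from the first data line whether index:value tokens are used."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        return any(":" in token for token in stripped.split())
    return False


sparse_example = ["# a comment", "0:1.2 3:3.4"]
dense_example = ["1.2 0 0 3.4"]
for example in (sparse_example, dense_example):
    kernel = 2 if looks_sparse(example) else 0  # 2: sparse CPU, 0: dense CPU
    print(f"suggested -k value: {kernel}")
```

The check only looks at one line, so it will not catch a file that mixes formats.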

Interfaces

Python, Julia, R, and MATLAB interfaces are available for the dense CPU and GPU kernels. MPI and the sparse kernel are not supported through the interfaces. For respective examples, see the folders in src.

The Python version is also available in PyPI. You can install it with

$ pip install somoclu

Alternatively, it is also available on conda-forge:

$ conda install somoclu

Some pre-built binaries in the wheel format or as Windows installers are provided at PyPI Downloads; they are tested with Anaconda distributions. If you encounter errors like ImportError: DLL load failed: The specified module could not be found when importing somoclu, you may need to run Dependency Walker on _somoclu_wrap.pyd to find the missing DLLs and put them in the right place. Usually the right version (32/64-bit) of vcomp90.dll, msvcp90.dll, and msvcr90.dll should be placed in C:\Windows\System32 or C:\Windows\SysWOW64.

The wheel binaries for macOS are compiled with the system clang++, which means that by default they are not parallelized. To use the parallel version on a Mac, you can either use the version on conda-forge or compile it from source with your favourite OpenMP-friendly compiler. To get it working with the GPU kernel, you might have to follow the instructions at the Somoclu - Python Interface.

The R version is available on CRAN. You can install it with

install.packages("Rsomoclu")

To get it working with the GPU kernel, download the source zip file and specify your CUDA directory the following way:

R CMD INSTALL src/Rsomoclu_version.tar.gz --configure-args=/path/to/cuda

The Julia version is available on GitHub. The standard Pkg.add("Somoclu") should work.

To use the MATLAB toolbox, install SOM-Toolbox following the instructions at ilarinieminen/SOM-Toolbox and pass the location of your MATLAB installation to the configure script:

./configure --without-mpi --with-matlab=/usr/local/MATLAB/R2014a

For the GPU kernel, specify the location of your CUDA library for the configure script. More detailed instructions are in the MATLAB source folder.

Compilation & Installation

These are the instructions for compiling the core library and the command line interface. The only dependency is a C++ compiler chain -- GCC, ICC, clang, and VC were tested.

Multicore execution is supported through OpenMP -- the compiler must support this. Distributed systems are supported through MPI. The package was tested with OpenMPI. It should also work with other MPI flavours. CUDA support is optional.

Linux or macOS

If you have just cloned the git repository, first run

$ ./autogen.sh

Then follow the standard POSIX procedure:

$ ./configure [options]
$ make
$ make install

Options for configure

--prefix=PATH           Set directory prefix for installation

By default Somoclu is installed into /usr/local. If you prefer a different location, use this option to select an installation directory.

--without-mpi           Disregard any MPI installation found.
--with-mpi=MPIROOT      Use MPI root directory.
--with-mpi-compilers=DIR or --with-mpi-compilers=yes
                          use MPI compiler (mpicxx) found in directory DIR, or
                          in your PATH if =yes
--with-mpi-libs="LIBS"  MPI libraries [default "-lmpi"]
--with-mpi-incdir=DIR   MPI include directory [default MPIROOT/include]
--with-mpi-libdir=DIR   MPI library directory [default MPIROOT/lib]

The above flags allow the identification of the correct MPI library the user wishes to use. The flags are especially useful if MPI is installed in a non-standard location, or when multiple MPI libraries are available.

--with-cuda=/path/to/cuda           Set path for CUDA

Somoclu looks for CUDA in /usr/local/cuda. If your installation is not there, then specify the path with this parameter. If you do not want CUDA enabled, pass --without-cuda instead.

Windows

Use the somoclu.sln under src/Windows/somoclu as an example Visual Studio 2015 solution. Modify the CUDA version or VC compiler version according to your needs.

The default solution enables all of OpenMP, MPI, and CUDA. The default MPI installation path is C:\Program Files (x86)\Microsoft SDKs\MPI\; modify the settings if yours is in a different path. The default CUDA version in the configuration is 9.1. Disable MPI by removing the HAVE_MPI macro in the project properties (Properties -> Configuration Properties -> C/C++ -> Preprocessor). Disable CUDA by removing the CUDA macro in the solution properties and unchecking CUDA in Project -> Custom Build Rules. If you open the solution without CUDA installed, please remove the following sections in somoclu.vcxproj:

  <ImportGroup Label="ExtensionSettings">
    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 9.1.props" />
  </ImportGroup>

and

  <ImportGroup Label="ExtensionTargets">
    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 9.1.targets" />
  </ImportGroup>

or change the version number to match the CUDA version you installed.

The usage is identical to the Linux version through command line (see the relevant section).

Acknowledgment

This work was supported by the European Commission Seventh Framework Programme under Grant Agreement Number FP7-601138 PERICLES and by the AWS in Education Machine Learning Grant award.

Citation

Peter Wittek, Shi Chao Gao, Ik Soo Lim, Li Zhao (2017). Somoclu: An Efficient Parallel Library for Self-Organizing Maps. Journal of Statistical Software, 78(9), pp.1--21. DOI:10.18637/jss.v078.i09. arXiv:1305.1422.

Comments

wrap_train

Hello.

While using somoclu (windows7, python3.4) and calling the som.train() command (with and without args), I get the following error:

    som.train(epochs=epochs, radius0=radius0, scale0=scale0)
  File "C:\Python34\lib\site-packages\somoclu\train.py", line 158, in train
    wrap_train(np.ravel(self._data), epochs, self._n_columns, self._n_rows,
NameError: name 'wrap_train' is not defined

Any ideas?

Thank you!
bug

opened by dinos66 67
Activation on the surface of the map
I would like to obtain the activation of every SOM unit for a given input, i.e., not just the MAU/BMU.

I see from the comment here https://github.com/peterwittek/somoclu/issues/39#issuecomment-235973972 by @ghost that obtaining the BMUs is relatively simple. This makes me assume/infer that the activation across the whole map is calculated by the codebook times the input - is this correct?

For example:

def get_surface_state(som, X):
    D = np.dot(som.codebook.reshape((som.codebook.shape[0] * som.codebook.shape[1], som.codebook.shape[2])), X.T).T
    return D

If yes, can D be used as the input to another SOM? Or would that be meaningless, in your opinion?
enhancement question
opened by oliviaguest 46
Cannot import without error -- libiomp no longer in homebrew, etc.

In [2]: import somoclu
Warning: training function cannot be imported. Only pre-trained maps can be analyzed. If you installed Somoclu with pip on OS X, this typically means missing libiomp. Please refer to the documentation and to this issue: https://github.com/peterwittek/somoclu/issues/28

It seems this issue was fixed, but I am still running into it. Any help?

opened by kevglynn 24
NameError: global name 'wrap_train' is not defined

I installed the whole package according to the following file: https://somoclu.readthedocs.io/en/stable/download.html

I am running a sample example from the following. https://somoclu.readthedocs.io/en/stable/example.html

I am on ubuntu 16 and python 2.7

I am getting the following error

NameError: global name 'wrap_train' is not defined

This is the following code where I am getting the error: som.train(data)

Help me with this error.
duplicate question

opened by bharadwaj509 18
Differing results between R and CLI

I had been using the CLI version of Somoclu and getting results consistent with other implementations of batch-trained SOMs (R-Kohonen and Matlab Neural Network Toolbox). When you responded to my request for the inclusion of a bubble neighborhood function, I decided to use the R package to test it rather than recompile for CLI (which was initially done for me by a colleague). Following your instructions, I compiled and tested the new version of the R package. I found that I was getting much higher quantization errors than in CLI. In order to determine whether the difference was due to R or the requested changes, I installed the current, unmodified R package and compared the same input file using the same initial codebook. Using CLI I got a quantization error of 5.73, but with R the quantization error was 18.95.

Here is the CLI command:

somoclu -c T7_init_weights_nospace_CRend.wts -e 100 -k 1 -m planar -t linear -r 9 -R 1 -T linear -l 1 -L 0.01 -s 0 -x 18 -y 15 t7_norow_somoclu T_Opt_6

Here is the R script:

dataTemp <- data.frame(fread("t7_norow_somoclu"))
dataSource <- as.matrix(dataTemp)
initTemp <- data.frame(fread("T7_init_weights_nospace_CRend.wts"))
initSource <- as.matrix(initTemp)
nSomX <- 18
nSomY <- 15
nEpoch <- 100
radius0 <- 9
radiusN <- 1
radiusCooling <- "linear"
scale0 <- 1
scaleN <- 0.01
scaleCooling <- "linear"
kernelType <- 0
mapType <- "planar"
gridType <- "rectangular"
compactSupport <- FALSE
codebook <- initSource
res <- Rsomoclu.train(dataSource, nEpoch, nSomX, nSomY, radius0, radiusN, radiusCooling, scale0, scaleN, scaleCooling, kernelType, mapType, gridType, compactSupport, codebook)
head(res$globalBmus)
bug

opened by brogie62 14
Neighborhood Function Selection

Could you allow for user selection of neighborhood function beyond Gaussian? I would like the option of bubble but there are other functions that different users may prefer.

Thanks for the great work!
enhancement

opened by brogie62 14
Visual Studio 2015 - project load failed

Project load fails when trying to open src\Windows\somoclu\somoclu.sln (1.6.1) in Visual Studio Community 2015 Update 3 on Windows 10. The error message in Solution Explorer is: "The project requires user input. Reload the project for more information." What am I missing?
enhancement

opened by pegasone 13
Batch training vs online training
Is there any way to update the SOM after a single pattern is presented? I tried to send a pattern set with only a single member, but I get the following error, presumably because it needs more than a single pattern:

  File "/somoclu/train.py", line 224, in update_data
    self.n_vectors, self.n_dim = data.shape
ValueError: need more than 1 value to unpack

Is there an easy way around this I am missing? Shall I just edit the function to allow a single pattern or will that break other things?
question
opened by oliviaguest 12
some make problems in fedora20 - incl a way to get it to work, somehow

hi, in order to make it was necessary to edit io.cpp and add "#include " because its dependency in iostream has been removed with gcc 4.3. further more i comment the line with setDevice because of the error "undefined reference to `setDevice'", which allowed me to make it. BUT now i have problems with CUDA. if i try the gpu kernel, i get following error:

$somoclu -x 100 -y 200 file folder -e 20 -k 1
--> nVectors: 417 nVectorsPerRank: 417 nDimensions: 0
Epoch: 0 Radius: 50
** On entry to SGEMM parameter number 8 had an illegal value
!!!! kernel execution error.
Aborted
terminate called after throwing an instance of 'thrust::system::system_error'
what(): unload of CUDA runtime failed
Aborted (core dumped)

would you have any suggestions? thanks a lot!

opened by standfest 12

BMU inconsistencies (Python)

[Using GPU, hexagonal lattice, toroid config]

I'm not sure what causes it but there appear to be some BMU inconsistencies when comparing the BMUs returned by training and the BMUs you compute yourself using the codebook and the data.

To reproduce:

import numpy as np
import somoclu
import matplotlib.pyplot as plt

SAMPLES = 50000
DIMS = 21

train_data = np.random.uniform(low=0.0, high=1.0, size=SAMPLES*DIMS).reshape((SAMPLES,DIMS)).astype(np.float32)

som = somoclu.Somoclu(
    32, 32,
    data=train_data,
    maptype="toroid",
    gridtype="hexagonal",
    kerneltype=0
)

som.train(epochs=32, radius0=min(som._n_rows, som._n_columns)/2, radiusN=1, scale0=0.1, scaleN=0.01)

W = som.codebook.reshape((som.codebook.shape[0] * som.codebook.shape[1], som.codebook.shape[2]))
X = train_data

D = -2*np.dot(W, X.T) + (W**2).sum(1)[:, None] + (X**2).sum(1)[:, None].T
BMU = (D==D.min(0)[None,:]).astype("float32").T
NBMU =  BMU.reshape((X.shape[0], som.codebook.shape[0], som.codebook.shape[1]))
new_bmus = np.vstack(NBMU.nonzero()[1:][::-1]).T

hitmap = BMU.sum(axis=0).reshape((som.codebook.shape[0], som.codebook.shape[1])).T
hitmap2 = np.zeros((som.umatrix.shape[0], som.umatrix.shape[1]))
for x in range(0, som.bmus.shape[0]):
    hitmap2[som.bmus[x][0], som.bmus[x][1]] += 1

print(np.absolute(new_bmus - som.bmus).sum())

plt.imshow(hitmap - hitmap2)
plt.show()

bug

opened by ghost 11

Preprocessor Macro usage should be limited

The code has a large number of pre-processor macros that may not be necessary and make the code confusing.

Guarding pragma within #defines for omp may not be needed as unrecognized pragmas are ignored by compiler.

opened by sambitdash 10
Batch mode and learning rate

If somoclu always uses the batch training mode, how is the learning rate used? If the update is done according to the batch training equation given in Wittek et al, 2017 (Somoclu: An Efficient Parallel Library for Self-Organizing Maps), learning rate is not used at all.

opened by jtorppa 3
Numpy requirement in setup.py

I'm trying to use somoclu in a project. When I make a PR of my project to a github repo, it failed the codecov test and the readthedoc build because to install somoclu I need numpy in the first place (as there is a line import numpy in the setup.py file of somoclu)

This is not a problem of somoclu itself, but that codecov and Read the Docs do not have numpy. I'm wondering if any small modification could be made to handle this problem? Thanks in advance!

opened by yanzastro 3

Get bmu of testing data

I can use som.bmus to get the coordinates of bmus of the training data, but I want to calculate the coordinates of bmus of testing data from a pre-trained SOM. I can define something like:

def f(i, data):
    dmap = np.sum(((data[i] - som.codebook)**2)**0.5, axis=2)
    return np.asarray(np.unravel_index(np.argmin(dmap), dmap.shape, order='F'))

def get_test_bmus(som, data):    
    with Pool() as p:
        bmus = p.map(lambda i: f(i, data), np.arange(len(data)))  # use multiprocess since the testing data can be very large
    return np.asarray(bmus)

but I'm wondering if there is built-in method in a somoclu class that can do such job?

opened by yanzastro 2

Attempting to use an MPI routine before initializing MPI

Hi guys,

I'm struggling in my attempt to build from source code in a windows environment and I did almost everything "right", but when I tried to run a example, this error appeared:

Attempting to use an MPI routine before initializing MPI

It is just one line of sadness and desolation, lol The version of somoclu was the 1.7.5, python 3.8.6, conda env, CUDA 10.2, Win10, MPI version from microsoft and VS2019.

In my work PC with ubuntu, every thing just worked fine with the conda install script.

Considering that I am a big noob in building complex stuff from source, I probably did something wrong and didn't even realize it..

Could anyone please help me on this one?

opened by joaoponte 4
(core dumped)

I received a core dumped error. My data size is 382776x174688. I submitted a job on a high-performance computing cluster using the script mpirun -np 8 somoclu -g hexagonal -m toroid --rows 22 --columns 17 psl_n.txt psl_DJF

*** Error in `somoclu': munmap_chunk(): invalid ======= Backtrace: ========= /lib64/libc.so.6(+0x7ada4)[0x2b1d72d3cda4] somoclu[0x437528] somoclu[0x4070b3] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b1d72ce3b35 somoclu[0x40751c] ======= Memory map: ======== 00400000-005e8000 r-xp 00000000 00:2a 113822987310 007e8000-007e9000 r--p 001e8000 00:2a 113822987310 007e9000-007ea000 rw-p 001e9000 00:2a 113822987310 007ea000-007eb000 rw-p 00000000 00:00 0 01783000-01b25000 rw-p 00000000 00:00 0 2b1d6ef41000-2b1d6ef61000 r-xp 00000000 00:24 201029694 2b1d6ef61000-2b1d6ef63000 rw-p 00000000 00:00 0 2b1d6ef9b000-2b1d6efa3000 rw-p 00000000 00:00 0 2b1d6f160000-2b1d6f161000 r--p 0001f000 00:24 201029694 2b1d6f161000-2b1d6f162000 rw-p 00020000 00:24 201029694 2b1d6f162000-2b1d6f163000 rw-p 00000000 00:00 0 2b1d6f163000-2b1d6f165000 r-xp 00000000 00:24 201299306 2b1d6f165000-2b1d6f365000 ---p 00002000 00:24 201299306 2b1d6f365000-2b1d6f366000 r--p 00002000 00:24 201299306 2b1d6f366000-2b1d6f367000 rw-p 00003000 00:24 201299306 2b1d6f367000-2b1d6f3c9000 r-xp 00000000 00:2b 340587319 2b1d6f3c9000-2b1d6f5c9000 ---p 00062000 00:2b 340587319 2b1d6f5c9000-2b1d6f5cc000 rw-p 00062000 00:2b 340587319 2b1d6f5cc000-2b1d6f5cd000 rw-p 00000000 00:00 0 2b1d6f5cd000-2b1d71d52000 r-xp 00000000 00:2b 340587313 2b1d71d52000-2b1d71f51000 ---p 02785000 00:2b 340587313 2b1d71f51000-2b1d71f6f000 rw-p 02784000 00:2b 340587313 2b1d71f6f000-2b1d71f7d000 rw-p 00000000 00:00 0 2b1d71f7d000-2b1d72146000 r-xp 00000000 00:2a 49401166130 2b1d72146000-2b1d72345000 ---p 001c9000 00:2a 49401166130 2b1d72345000-2b1d72350000 r--p 001c8000 00:2a 49401166130 2b1d72350000-2b1d72353000 rw-p 001d3000 00:2a 49401166130 2b1d72353000-2b1d72356000 rw-p 00000000 00:00 0 2b1d72356000-2b1d72456000 r-xp 00000000 00:24 201413623 2b1d72456000-2b1d72656000 ---p 00100000 00:24 201413623 2b1d72656000-2b1d72657000 r--p 00100000 00:24 201413623 2b1d72657000-2b1d72658000 rw-p 00101000 00:24 201413623 2b1d72658000-2b1d7268c000 r-xp 
00000000 00:2a 49393205273 2b1d7268c000-2b1d7288c000 ---p 00034000 00:2a 49393205273 2b1d7288c000-2b1d7288d000 r--p 00034000 00:2a 49393205273 2b1d7288d000-2b1d7288e000 rw-p 00035000 00:2a 49393205273 2b1d7288e000-2b1d728a5000 r-xp 00000000 00:2a 49401166125 2b1d728a5000-2b1d72aa4000 ---p 00017000 00:2a 49401166125 2b1d72aa4000-2b1d72aa5000 r--p 00016000 00:2a 49401166125 2b1d72aa5000-2b1d72aa6000 rw-p 00017000 00:2a 49401166125 2b1d72aa6000-2b1d72abd000 r-xp 00000000 00:24 201413908 2b1d72abd000-2b1d72cbc000 ---p 00017000 00:24 201413908 2b1d72cbc000-2b1d72cbd000 r--p 00016000 00:24 201413908 2b1d72cbd000-2b1d72cbe000 rw-p 00017000 00:24 201413908 2b1d72cbe000-2b1d72cc2000 rw-p 00000000 00:00 0 2b1d72cc2000-2b1d72e78000 r-xp 00000000 00:24 201299203 2b1d72e78000-2b1d73078000 ---p 001b6000 00:24 201299203 2b1d73078000-2b1d7307c000 r--p 001b6000 00:24 201299203 2b1d7307c000-2b1d7307e000 rw-p 001ba000 00:24 201299203 2b1d7307e000-2b1d73083000 rw-p 00000000 00:00 0 2b1d73083000-2b1d7308a000 r-xp 00000000 00:24 201427121 2b1d7308a000-2b1d73289000 ---p 00007000 00:24 201427121 2b1d73289000-2b1d7328a000 r--p 00006000 00:24 201427121 2b1d7328a000-2b1d7328b000 rw-p 00007000 00:24 201427121 7fff63ab4000-7fff63ad6000 rw-p 00000000 00:00 0 7fff63bd2000-7fff63bd4000 r-xp 00000000 00:00 0 ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 /var/opt/ud/torque-4.2.10/mom_priv/jobs/285503.hpc12.hpc.SC: pointer: 0x0000000001807310 *** ] /home/bcheneka/Build_WRF/LIBRARIES/somoclu/src/somoclu /home/bcheneka/Build_WRF/LIBRARIES/somoclu/src/somoclu /home/bcheneka/Build_WRF/LIBRARIES/somoclu/src/somoclu [heap] /usr/lib64/ld-2.17.so /usr/lib64/ld-2.17.so /usr/lib64/ld-2.17.so /usr/lib64/libdl-2.17.so /usr/lib64/libdl-2.17.so /usr/lib64/libdl-2.17.so /usr/lib64/libdl-2.17.so /opt/ud/cuda-8.0/lib64/libcudart.so.8.0.44 /opt/ud/cuda-8.0/lib64/libcudart.so.8.0.44 /opt/ud/cuda-8.0/lib64/libcudart.so.8.0.44 /opt/ud/cuda-8.0/lib64/libcublas.so.8.0.45 
/opt/ud/cuda-8.0/lib64/libcublas.so.8.0.45 /opt/ud/cuda-8.0/lib64/libcublas.so.8.0.45 /home/bcheneka/gcc-9.2.0/lib64/libstdc++.so.6.0.27 /home/bcheneka/gcc-9.2.0/lib64/libstdc++.so.6.0.27 /home/bcheneka/gcc-9.2.0/lib64/libstdc++.so.6.0.27 /home/bcheneka/gcc-9.2.0/lib64/libstdc++.so.6.0.27 /usr/lib64/libm-2.17.so /usr/lib64/libm-2.17.so /usr/lib64/libm-2.17.so /usr/lib64/libm-2.17.so /home/bcheneka/gcc-9.2.0/lib64/libgomp.so.1.0.0 /home/bcheneka/gcc-9.2.0/lib64/libgomp.so.1.0.0 /home/bcheneka/gcc-9.2.0/lib64/libgomp.so.1.0.0 /home/bcheneka/gcc-9.2.0/lib64/libgomp.so.1.0.0 /home/bcheneka/gcc-9.2.0/lib64/libgcc_s.so.1 /home/bcheneka/gcc-9.2.0/lib64/libgcc_s.so.1 /home/bcheneka/gcc-9.2.0/lib64/libgcc_s.so.1 /home/bcheneka/gcc-9.2.0/lib64/libgcc_s.so.1 /usr/lib64/libpthread-2.17.so /usr/lib64/libpthread-2.17.so /usr/lib64/libpthread-2.17.so /usr/lib64/libpthread-2.17.so /usr/lib64/libc-2.17.so /usr/lib64/libc-2.17.so /usr/lib64/libc-2.17.so /usr/lib64/libc-2.17.so /usr/lib64/librt-2.17.so /usr/lib64/librt-2.17.so /usr/lib64/librt-2.17.so /usr/lib64/librt-2.17.so [stack] [vdso] [vsyscall] line 28: 331675 Aborted (core dumped) MP_NUM_THREADS=8 somoclu -g hexagonal -m toroid --rows 22 --columns 17 psl_n.txt psl_DJF

opened by bedassa 3

Releases(1.7.6)

1.7.6(Oct 31, 2021)
new release for easier installation

What's Changed

don't write intermediate U-matrix when s=0 by @yoch in https://github.com/peterwittek/somoclu/pull/116

Fix a serious bug by @yao531441 in https://github.com/peterwittek/somoclu/pull/125

Provisional fix for Issue #130 by @MattWenham in https://github.com/peterwittek/somoclu/pull/131

Add possibility to choose the order of the returned numpy array in Somoclu.get_bmus() by @giacomolanciano in https://github.com/peterwittek/somoclu/pull/146

src/Makefile.in: fix compilation rule for libsomoclu so that make cor… by @tomcucinotta in https://github.com/peterwittek/somoclu/pull/150

Update LICENSE by @achapkowski in https://github.com/peterwittek/somoclu/pull/152

Minor fixes by @tomcucinotta in https://github.com/peterwittek/somoclu/pull/158

Add norm-p as distance metric (with positive real p) by @tomcucinotta in https://github.com/peterwittek/somoclu/pull/160

New Contributors

@yao531441 made their first contribution in https://github.com/peterwittek/somoclu/pull/125

@MattWenham made their first contribution in https://github.com/peterwittek/somoclu/pull/131

@giacomolanciano made their first contribution in https://github.com/peterwittek/somoclu/pull/146

@tomcucinotta made their first contribution in https://github.com/peterwittek/somoclu/pull/150

@achapkowski made their first contribution in https://github.com/peterwittek/somoclu/pull/152

Full Changelog: https://github.com/peterwittek/somoclu/compare/1.7.5...1.7.6
Source code(tar.gz)
Source code(zip)
libsomoclu.so.zip(40.65 KB)
python_dist_wheel.zip(247.65 KB)
Rsomoclu.zip(12.44 KB)
1.7.5(Mar 1, 2018)
New: A Makefile for mingw to build on Windows.

Changed: PR #94 added a much more efficient sparse kernel.

Changed: boilerplate code for Julia greatly improved.

Changed: Code cleanup, pre-processor macros simplified.

Changed: Adapted to Seaborn API changes in plotting heatmaps.

Source code(tar.gz)
Source code(zip)
somoclu-1.7.5.tar.gz(1.55 MB)
1.7.4(Jun 6, 2017)
Fixed: The random seed was set to 0 for testing purposes. This is now changed to a wall-time based initialization.

Source code(tar.gz)
Source code(zip)
somoclu-1.7.4-cp36-cp36m-macosx_10_7_x86_64.whl(41.31 KB)
somoclu-1.7.4.tar.gz(1.51 MB)
somoclu-python-1.7.4.tar.gz(1.51 MB)
1.7.3(Apr 25, 2017)
New: Verbosity parameter in the command-line, Python, MATLAB, and Julia interfaces.

Changed: Calculation of U-matrix parallelized.

Changed: Moved feeding data to train method in the Python interface.

Fixed: Sparse matrix reader made more robust.

Fixed: Compatibility with kohonen 3 resolved.

Fixed: Compatibility with Matplotlib 2 resolved.

Source code(tar.gz)
Source code(zip)
somoclu-1.7.3.tar.gz(1.51 MB)
1.7.2(Nov 24, 2016)
New: The coefficient of the Gaussian neighborhood function exp(-||x-y||^2/(2*(coeff*radius)^2)) is now exposed in all interfaces as a parameter.

New: get_bmu function in the Python interface to get the best matching units given an activation map.

Changed: Updated PCA initialization in the Python interface to work with sk-learn 0.18 onwards.

Changed: Radii can be float values.

Fixed: Only positive values were written back to codebook during update.

Fixed: Sparse data is read correctly when there are class labels.

Source code(tar.gz)
Source code(zip)
somoclu-1.7.2.tar.gz(1.51 MB)
1.7.1(Oct 2, 2016)
Fixed: macOS build works again.

Source code(tar.gz)
Source code(zip)
somoclu-1.7.1.tar.gz(1.51 MB)
1.7.0(Sep 30, 2016)
New: Julia interface is available (https://github.com/peterwittek/Somoclu.jl).

New: Method get_surface_state of the Somoclu object in Python calculates the activation map for all data instances.

New: Method view_activation_map of the Somoclu object in Python allows plotting the activation map for the training data instances or for a new data instance.

New: Method view_similarity_matrix of the Somoclu object in Python visualizes the similarity matrix of data points according to their distance to the nodes in the map.

Fixed: CRAN-friendliness improved.
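
As a rough picture of what an activation map holds, the sketch below computes the distance from every data instance to every node and then reads the best matching unit off each row. It is a plain-Python illustration, not the library's implementation: the Euclidean distance, the row-by-row codebook layout, the (column, row) ordering of the result, and the helper names activation_map and best_matching_units are all assumptions.

```python
import math


def activation_map(data, codebook):
    """Return one row per data instance with its distance to every node."""
    return [[math.dist(x, w) for w in codebook] for x in data]


def best_matching_units(activations, n_columns):
    """Map the smallest distance in each row to a (column, row) position."""
    bmus = []
    for row in activations:
        flat = min(range(len(row)), key=row.__getitem__)
        bmus.append((flat % n_columns, flat // n_columns))
    return bmus


# Hypothetical 2x2 map stored row by row, with 2-dimensional weight vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
data = [[0.1, 0.1], [0.9, 0.2], [0.2, 0.8]]

acts = activation_map(data, codebook)
print(best_matching_units(acts, n_columns=2))
```

Each row of the result has one entry per node, so a map with many neurons yields a wide matrix. Plotting a single row reshaped to the grid gives the kind of picture the activation map view is meant to show.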

somoclu-1.7.0.tar.gz(1.51 MB)
1.6.2(Aug 9, 2016)
Changed: In-place codebook updates when compiled without MPI. This improves update speed and substantially cuts memory use.

Changed: Compatible with Visual Studio 15.

Fixed: The BMUs returned after training were from before the last epoch. Now another round of BMU search is done.

Fixed: Training can continue on the same data in the Python wrapper.

Fixed: GPU memory allocation problem on Windows.

Rsomoclu_1.6.2.tar.gz(58.86 KB)
somoclu-1.6.2-cp27-cp27m-win32.whl(28.28 KB)
somoclu-1.6.2-cp27-cp27m-win_amd64.whl(32.64 KB)
somoclu-1.6.2-cp35-cp35m-win32.whl(32.52 KB)
somoclu-1.6.2-cp35-cp35m-win_amd64.whl(38.51 KB)
somoclu-1.6.2.tar.gz(1.51 MB)
somoclu-1.6.2.win-amd64-py2.7.exe(142.50 KB)
somoclu-1.6.2.win-amd64-py3.5.exe(171.92 KB)
somoclu-1.6.2.win32-py2.7.exe(220.20 KB)
somoclu-1.6.2.win32-py3.5.exe(158.95 KB)
somoclu-python-1.6.2.tar.gz(5.30 MB)
1.6.1(Feb 22, 2016)
New: Option for PCA initialization is added to the Python interface.

New: Clustering of the codebook with arbitrary clustering algorithm in scikit-learn is now possible in the Python interface.

somoclu-1.6.1.tar.gz(1.50 MB)
1.6(Jan 10, 2016)
New: R wrapper integrates with kohonen package.

New: MATLAB wrapper integrates with somtoolbox.

New: Better handling of CUDA compilation in the Python interface.

Changed: Throws an exception if GPU kernel is requested, but it was compiled without it. The earlier behaviour quietly defaulted to the CPU kernel.

somoclu-1.6-cp27-none-win32.whl(27.16 KB)
somoclu-1.6-cp27-none-win_amd64.whl(31.76 KB)
somoclu-1.6-cp34-none-win32.whl(29.01 KB)
somoclu-1.6-cp34-none-win_amd64.whl(191.19 KB)
somoclu-1.6.tar.gz(1.16 MB)
somoclu-1.6.win-amd64-py2.7.exe(250.93 KB)
somoclu-1.6.win-amd64-py3.4.exe(251.38 KB)
somoclu-1.6.win32-py2.7.exe(219.34 KB)
somoclu-1.6.win32-py3.4.exe(216.20 KB)
1.5.1(Dec 2, 2015)
New: Neighborhood function can be chosen between Gaussian and bubble.

Fixed: R wrapper passes arrays with correct orientation.

Fixed: io.cpp is no longer required in the wrappers. An exception is thrown when needed.
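
The difference between the two neighborhood choices can be sketched as follows. The bubble definition used here (full weight inside the radius, none outside) is the textbook form and is an assumption about the library's behaviour. The function names are hypothetical, and the Gaussian uses the documented default coefficient of 0.5.

```python
import math


def gaussian_weight(d, radius, coeff=0.5):
    """Smoothly decaying weight for a node at grid distance d."""
    return math.exp(-(d ** 2) / (2.0 * (coeff * radius) ** 2))


def bubble_weight(d, radius):
    """Assumed definition: weight 1 within the radius, 0 beyond it."""
    return 1.0 if d <= radius else 0.0


radius = 2.0
for d in range(5):
    print(d, bubble_weight(d, radius), round(gaussian_weight(d, radius), 4))
```

The bubble function updates every node within the radius equally and ignores the rest, while the Gaussian assigns a nonzero weight at every distance.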

somoclu-1.5.1.tar.gz(1.17 MB)
1.5(Sep 30, 2015)
New: Python interface has visual capabilities.

New: Option for hexagonal grid.

New: Option for requesting compact support in updating the map.

New: Python, R, and MATLAB interfaces now allow passing an initial codebook.

Changed: Reduced memory use in calculating U-matrices.

Changed: Build system rebuilt and simplified.

somoclu-1.5.tar.gz(1.42 MB)
1.4.1(Jan 28, 2015)
Better support for ICC.

Faster code when compiling with GCC.

Building instructions and documentation improved.

Bug fixes: portability for R, using native R random number generator.

Owner

Peter Wittek

GitHub https://peterwittek.github.io/somoclu/
