Parallel t-SNE implementation with Python and Torch wrappers.

Overview

Multicore t-SNE

This is a multicore modification of Barnes-Hut t-SNE by L. Van der Maaten, with Python and Torch CFFI-based wrappers. This code also runs faster than sklearn.TSNE on a single core.

What to expect

Barnes-Hut t-SNE is done in two steps.

  • First step: an efficient data structure for nearest-neighbour search is built and used to compute probabilities. This can be done in parallel for each point in the dataset, which is why we can expect a good speed-up from using more cores.

  • Second step: the embedding is optimized using gradient descent. This part is essentially sequential, so we can only parallelize within each iteration. Some parts can in fact be parallelized effectively, but not all of them are parallelized for now. That is why the speed-up of the second step is less significant than that of the first, but there is still room for improvement.

So when can you benefit from parallelization? The second step's computation time is roughly independent of D and depends mostly on N, while the first step's time depends strongly on D. So for small D, time(Step 1) << time(Step 2), and for large D, time(Step 1) >> time(Step 2). As we are only good at parallelizing step 1, we benefit most when D is large enough (MNIST's D = 784 is large; D = 10 is not much, even for N = 1000000). I originally wrote the multicore modification for the Springleaf competition, where my data table was about 300000 x 3000 and only several days were left until the end of the competition, so any speed-up was handy.
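The first step parallelizes well because each point's neighbour search is independent of every other point's. Below is a minimal Python sketch of that idea. It assumes brute-force distances in NumPy and a thread pool, rather than the library's tree-based search in C++ (with OpenMP where available). `knn_rows` and `parallel_knn` are hypothetical names. Threads are used because NumPy's BLAS-backed matrix product typically releases the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def knn_rows(X, sq_norms, rows, k):
    """Brute-force k nearest neighbours (excluding self) for the given rows."""
    # Squared Euclidean distances via |a|^2 - 2 a.b + |b|^2.
    d = sq_norms[rows, None] - 2.0 * (X[rows] @ X.T) + sq_norms[None, :]
    d[np.arange(len(rows)), rows] = np.inf  # a point is not its own neighbour
    cand = np.argpartition(d, k - 1, axis=1)[:, :k]  # k smallest, unordered
    order = np.argsort(np.take_along_axis(d, cand, axis=1), axis=1)
    return np.take_along_axis(cand, order, axis=1)  # nearest first


def parallel_knn(X, k, n_jobs=4):
    """Split the points into n_jobs chunks and search each chunk concurrently.

    Hypothetical helper: illustrates per-point independence, not the
    library's actual data structure.
    """
    X = np.asarray(X, dtype=np.float64)
    sq_norms = (X ** 2).sum(axis=1)
    chunks = np.array_split(np.arange(X.shape[0]), n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        parts = pool.map(lambda rows: knn_rows(X, sq_norms, rows, k), chunks)
        return np.vstack(list(parts))
```

For example, `parallel_knn(X, k=90, n_jobs=4)` fetches 90 neighbours per point. Barnes-Hut t-SNE typically keeps about 3 * perplexity neighbours, which is 90 for perplexity 30. Brute force costs O(N^2 * D), which makes the strong D dependence of step 1 easy to see.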

Benchmark

1 core

Interestingly, this code beats other implementations. We compare it to sklearn (Barnes-Hut, of course), L. Van der Maaten's bhtsne, and the py_bh_tsne repo (a Cython wrapper for bhtsne with QuadTree), using perplexity = 30 and theta = 0.5 for every run. In fact, py_bh_tsne runs at the same speed as this code when more compiler optimization flags are used.

This is a benchmark for 70000x784 MNIST data:

Method                    Step 1 (sec)   Step 2 (sec)
MulticoreTSNE(n_jobs=1)   912            350
bhtsne                    4257           1233
py_bh_tsne                1232           367
sklearn(0.18)             ~5400          ~20920

I did my best to find out what is wrong with the sklearn numbers, but this is the best benchmark I could do (you can find the test script in the python/tests folder).
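To repeat this kind of comparison on your own data, you could use a minimal wall-clock harness like the one below. It is a sketch, not the repository's test script. `benchmark_n_jobs` and `relative_speedups` are hypothetical helpers, and they time the whole `fit_transform` call rather than step 1 and step 2 separately, as the tables here do.

```python
import time


def benchmark_n_jobs(make_model, X, n_jobs_values=(1, 2, 4, 8)):
    """Return {n_jobs: seconds} for one fit_transform call per setting.

    make_model(n_jobs) must return an object exposing fit_transform(X),
    e.g. lambda n: MulticoreTSNE(n_jobs=n).
    """
    timings = {}
    for n in n_jobs_values:
        model = make_model(n)
        start = time.perf_counter()
        model.fit_transform(X)
        timings[n] = time.perf_counter() - start
    return timings


def relative_speedups(timings):
    """Speed-up of every setting relative to the smallest n_jobs tried."""
    base = timings[min(timings)]
    return {n: base / t for n, t in timings.items()}
```

For instance, after `from MulticoreTSNE import MulticoreTSNE`, calling `relative_speedups(benchmark_n_jobs(lambda n: MulticoreTSNE(n_jobs=n), X))` yields an overall speed-up per n_jobs value.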

Multicore

This table shows the speed-up relative to 1 core when using n cores.

n_jobs   Step 1   Step 2
1        1x       1x
2        1.54x    1.05x
4        2.6x     1.2x
8        5.6x     1.65x
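Because only step 1 scales well, the whole-run gain is smaller than the step 1 column suggests. Below is a worked sketch of that arithmetic (Amdahl-style). It assumes the 1-core MulticoreTSNE timings from the MNIST benchmark (912 s and 350 s) apply to the speed-ups in this table, which the table itself does not state. `overall_speedup` is a hypothetical helper, and the small-D timing is invented for illustration.

```python
def overall_speedup(t1, t2, s1, s2):
    """Whole-run speed-up when step 1 (t1 seconds on 1 core) gets s1x faster
    and step 2 (t2 seconds on 1 core) gets s2x faster."""
    return (t1 + t2) / (t1 / s1 + t2 / s2)


# 1-core MulticoreTSNE timings on 70000x784 MNIST, from the benchmark table.
T1, T2 = 912.0, 350.0
# (step 1, step 2) speed-ups from the multicore table.
SPEEDUPS = {1: (1.0, 1.0), 2: (1.54, 1.05), 4: (2.6, 1.2), 8: (5.6, 1.65)}

for n_jobs, (s1, s2) in SPEEDUPS.items():
    print(f"n_jobs={n_jobs}: ~{overall_speedup(T1, T2, s1, s2):.2f}x overall")

# Hypothetical small-D run where step 1 is cheap: the gain shrinks toward s2.
print(f"small D, n_jobs=8: ~{overall_speedup(50.0, 350.0, 5.6, 1.65):.2f}x")
```

With these numbers, n_jobs=8 works out to roughly 3.4x overall. The hypothetical small-D run reaches only about 1.8x, which is why large D benefits most.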

How to use

Python and Torch wrappers are available.

Python

Install

Directly from PyPI

pip install MulticoreTSNE

From source

Make sure cmake is installed on your system. You will also need a sensible C++ compiler, such as gcc or llvm-clang. On macOS, you can get both via Homebrew.

To install the package, please do:

git clone https://github.com/DmitryUlyanov/Multicore-TSNE.git
cd Multicore-TSNE/
pip install .

Tested with Python 2.7 and 3.6 (conda) on Ubuntu 14.04.

Run

You can use it as a near drop-in replacement for sklearn.manifold.TSNE.

from MulticoreTSNE import MulticoreTSNE as TSNE

tsne = TSNE(n_jobs=4)
Y = tsne.fit_transform(X)

Please refer to the sklearn TSNE documentation for an explanation of the parameters.
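The benchmark fixes perplexity at 30. This parameter controls how step 1 turns distances into probabilities. Below is a minimal sketch of the standard t-SNE calibration: for each point, binary-search a Gaussian precision so that exp(entropy) of its neighbour distribution equals the perplexity. It follows the textbook procedure and is not a transcript of this library's C++ code. `calibrate_row` is a hypothetical name.

```python
import numpy as np


def calibrate_row(sq_dists, perplexity=30.0, tol=1e-5, max_iter=200):
    """Binary-search the Gaussian precision beta = 1 / (2 sigma^2) for one point
    so that exp(entropy) of its neighbour distribution equals `perplexity`."""
    target = np.log(perplexity)  # entropy target in nats
    d = np.asarray(sq_dists, dtype=np.float64)
    d = d - d.min()  # shift for numerical stability; normalized p is unchanged
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        p = np.exp(-beta * d)
        p /= p.sum()
        entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # distribution too flat: sharpen it
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:  # too peaked: widen it
            hi = beta
            beta = (beta + lo) / 2.0
    return p, beta
```

The perplexity must be smaller than the number of distances passed in, because a uniform distribution over n neighbours has perplexity n.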

This implementation assumes n_components=2, which is the most common case (use Barnes-Hut t-SNE or sklearn otherwise). Also note that some parameters exist only for compatibility with sklearn and are otherwise ignored. See the MulticoreTSNE class docstring for more info.
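The sketch below shows one way to respect the n_components=2 restriction while keeping sklearn as a fallback. It is an assumption-laden helper, not part of either package: `pick_backend`, `make_tsne`, and `as_float64` are hypothetical names. The float64 conversion is a precaution based on the double-only limitation listed under Future work. Note that sklearn's TSNE accepts n_jobs only from version 0.22 onward, and with its default Barnes-Hut method it requires n_components < 4. Beyond that, pass method="exact".

```python
import numpy as np


def pick_backend(n_components):
    """Return which t-SNE implementation to use for a target dimensionality."""
    # MulticoreTSNE is aimed at 2-D embeddings; defer to sklearn otherwise.
    return "MulticoreTSNE" if n_components == 2 else "sklearn"


def make_tsne(n_components=2, n_jobs=4, **kwargs):
    """Build a TSNE estimator, importing the chosen backend lazily."""
    if pick_backend(n_components) == "MulticoreTSNE":
        from MulticoreTSNE import MulticoreTSNE as TSNE
    else:
        from sklearn.manifold import TSNE
    return TSNE(n_components=n_components, n_jobs=n_jobs, **kwargs)


def as_float64(X):
    """Hand the estimator a C-contiguous float64 array (the core works in double)."""
    return np.ascontiguousarray(X, dtype=np.float64)
```

Typical use: `Y = make_tsne(n_components=2, n_jobs=4).fit_transform(as_float64(X))`.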

Digits example (sklearn's load_digits, a small MNIST-like dataset)

from sklearn.datasets import load_digits
from MulticoreTSNE import MulticoreTSNE as TSNE
from matplotlib import pyplot as plt

digits = load_digits()
embeddings = TSNE(n_jobs=4).fit_transform(digits.data)
vis_x = embeddings[:, 0]
vis_y = embeddings[:, 1]
plt.scatter(vis_x, vis_y, c=digits.target, cmap=plt.get_cmap("jet", 10), marker='.')
plt.colorbar(ticks=range(10))
plt.clim(-0.5, 9.5)
plt.show()

Test

You can test it on the MNIST dataset with the following command:

python MulticoreTSNE/examples/test.py <n_jobs>

Note on jupyter use

To make the computation log visible in Jupyter, install wurlitzer (pip install wurlitzer) and execute this line in any cell beforehand:

%load_ext wurlitzer

Memory leaks are possible if you interrupt the process. It should be fine if you let it run until the end.

Torch

To install, execute the following command from the repository folder:

luarocks make torch/tsne-1.0-0.rockspec

or

luarocks install https://raw.githubusercontent.com/DmitryUlyanov/Multicore-TSNE/master/torch/tsne-1.0-0.rockspec

You can run t-SNE like this:

tsne = require 'tsne'

Y = tsne(X, n_components, perplexity, n_iter, angle, n_jobs)

Only the torch.DoubleTensor type is supported for now.

License

Inherited from the original repo's license.

Future work

  • Allow types other than double
  • Improve step 2 performance (possible)

Citation

Please cite this repository if it was useful for your research:

@misc{Ulyanov2016,
  author = {Ulyanov, Dmitry},
  title = {Multicore-TSNE},
  year = {2016},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/DmitryUlyanov/Multicore-TSNE}},
}

Of course, do not forget to cite L. Van der Maaten's paper.

Comments
  • installation fails on macOS

    I followed the instructions on this link to get a version of gcc that supports openmp.

    but it looks like the install script isn't using gcc:

    -- The C compiler identification is AppleClang 8.0.0.8000038
    -- The CXX compiler identification is AppleClang 8.0.0.8000038
    -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
    

    help?

    opened by sg-s 26
  • Will there be a version for Windows?

    Whilst the compilation and installation worked fine on Windows 8.1, running the code in Python results in

    OSError: cannot load library \lib\site-packages\MulticoreTSNE/libtsne_multicore.so: error 0x7e

    I guess that's since windows rather expects a DLL than a .so library. Unfortunately my CMAKE skills are not sufficient to adjust the current build instructions to also produce a .dll on Windows - so here's me hoping that someone might fix that.

    opened by Quasimondo 10
  • A range of refreshments ...

    fixes https://github.com/DmitryUlyanov/Multicore-TSNE/issues/27
    fixes https://github.com/DmitryUlyanov/Multicore-TSNE/issues/24
    fixes https://github.com/DmitryUlyanov/Multicore-TSNE/issues/14
    fixes https://github.com/DmitryUlyanov/Multicore-TSNE/issues/13
    closes https://github.com/DmitryUlyanov/Multicore-TSNE/pull/16

    opened by kernc 8
  • Improve Windows compatibility and test on AppVeyor

    This PR enables (not merely improves) building on Windows, as shown by an example run on AppVeyor. If you like it, consider enabling AppVeyor for this project. Two tests needed to be adapted a bit, I guess because of their imprecise, arbitrarily set constraints. I don't know of a better fix.

    Updated the readme to what seem to be the current build instructions. I don't think anything is needed besides cmake and a compiler. OpenMP is optional, pending compiler availability. Please feel free to edit as you see fit.

    The OpenMP requirement was loosened because recent MSVC compilers only support version 2.0 of that standard, from 2002.

    AppVeyor example run.


    Would you be interested in maintaining this package also on PyPI? I'd be happy to follow up with a small addendum to Travis and AppVeyor recipes, which would result in you just needing to issue the following few steps to publish a new release, making it accessible to install on all (three) platforms:

    git commit setup.py -m "bump version to 1.1 or whatever"
    git tag 1.1
    git push --tags
    # The tagged commit build results would be automatically pushed to PyPI
    
    opened by kernc 7
  • package is now pip-installable

    Hi Dmitry!

    This PR fixes problem #6

    I did this by using the setuptools install function instead of the one in distutils. In addition, I slightly rewrote the custom install function in setup.py. This was necessary because the original did not abort the installation process when cmake wasn't installed. For completeness, I also added a .gitignore file.

    opened by guenteru 7
  • Unable to install using Pip

    I am unable to install using pip install . It fails with "Running setup.py bdist_wheel for MulticoreTSNE ... error", then "CMake Error at CMakeLists.txt:1 ... Failed to run MSBuild command", and then a build-failed error. I am using cmake version 3.11.0-rc4 and Microsoft .NET Framework v4.0.30319. This is the error I get:

    $ pip install .
    Processing c:\users\deep chatterjee\multicore-tsne
    Requirement already satisfied: numpy in e:\anaconda3\lib\site-packages (from MulticoreTSNE==0.1) (1.14.5)
    Requirement already satisfied: cffi in e:\anaconda3\lib\site-packages (from MulticoreTSNE==0.1) (1.10.0)
    Requirement already satisfied: pycparser in e:\anaconda3\lib\site-packages (from cffi->MulticoreTSNE==0.1) (2.18)
    Building wheels for collected packages: MulticoreTSNE
      Running setup.py bdist_wheel for MulticoreTSNE: started
      Running setup.py bdist_wheel for MulticoreTSNE: finished with status 'error'
      Complete output from command E:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-wheel-c0r7vyex --python-tag cp36:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-3.6
      creating build\lib.win-amd64-3.6\MulticoreTSNE
      copying MulticoreTSNE\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE
      creating build\lib.win-amd64-3.6\MulticoreTSNE\tests
      copying MulticoreTSNE\tests\test_base.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
      copying MulticoreTSNE\tests\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
      running egg_info
      creating MulticoreTSNE.egg-info
      writing MulticoreTSNE.egg-info\PKG-INFO
      writing dependency_links to MulticoreTSNE.egg-info\dependency_links.txt
      writing requirements to MulticoreTSNE.egg-info\requires.txt
      writing top-level names to MulticoreTSNE.egg-info\top_level.txt
      writing manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
      reading manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      writing manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
      running build_ext
      cmake version 3.11.0-rc4

    CMake suite maintained and supported by Kitware (kitware.com/cmake).
    -- Building for: Visual Studio 10 2010
    CMake Error at CMakeLists.txt:1 (PROJECT): Failed to run MSBuild command:

      C:/Windows/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe
    
    to get the value of VCTargetsPath:
    
      Microsoft (R) Build Engine version 4.6.1055.0
      [Microsoft .NET Framework, version 4.0.30319.42000]
      Copyright (C) Microsoft Corporation. All rights reserved.
    
      Build started 02-08-2018 17:04:52.
      Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" on node 1 (default targets).
      C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.
      Done Building Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default targets) -- FAILED.
    
      Build FAILED.
    
      "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default target) (1) ->
        C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.
    
          0 Warning(s)
          1 Error(s)
    
      Time Elapsed 00:00:00.06
    
    
    Exit code: 1
    

    -- Configuring incomplete, errors occurred! See also "C:/Users/Public/Documents/Wondershare/CreatorTemp/pip-req-build-fc9af9iu/build/temp.win-amd64-3.6/Release/CMakeFiles/CMakeOutput.log".

    ERROR: Cannot generate Makefile. See above errors.


    Failed building wheel for MulticoreTSNE
      Running setup.py clean for MulticoreTSNE
    Failed to build MulticoreTSNE
    Installing collected packages: MulticoreTSNE
      Running setup.py install for MulticoreTSNE: started
      Running setup.py install for MulticoreTSNE: finished with status 'error'
      Complete output from command E:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-record-rni883bh\install-record.txt --single-version-externally-managed --compile:
      running install
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-3.6
      creating build\lib.win-amd64-3.6\MulticoreTSNE
      copying MulticoreTSNE\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE
      creating build\lib.win-amd64-3.6\MulticoreTSNE\tests
      copying MulticoreTSNE\tests\test_base.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
      copying MulticoreTSNE\tests\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
      running egg_info
      writing MulticoreTSNE.egg-info\PKG-INFO
      writing dependency_links to MulticoreTSNE.egg-info\dependency_links.txt
      writing requirements to MulticoreTSNE.egg-info\requires.txt
      writing top-level names to MulticoreTSNE.egg-info\top_level.txt
      reading manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      writing manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
      running build_ext
      cmake version 3.11.0-rc4

    CMake suite maintained and supported by Kitware (kitware.com/cmake).
    -- Building for: Visual Studio 10 2010
    CMake Error at CMakeLists.txt:1 (PROJECT):
      Failed to run MSBuild command:
    
        C:/Windows/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe
    
      to get the value of VCTargetsPath:
    
        Microsoft (R) Build Engine version 4.6.1055.0
        [Microsoft .NET Framework, version 4.0.30319.42000]
        Copyright (C) Microsoft Corporation. All rights reserved.
    
        Build started 02-08-2018 17:04:57.
        Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" on node 1 (default targets).
        C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.
        Done Building Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default targets) -- FAILED.
    
        Build FAILED.
    
        "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default target) (1) ->
          C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.
    
            0 Warning(s)
            1 Error(s)
    
        Time Elapsed 00:00:00.04
    
    
      Exit code: 1
    
    
    
    -- Configuring incomplete, errors occurred!
    See also "C:/Users/Public/Documents/Wondershare/CreatorTemp/pip-req-build-fc9af9iu/build/temp.win-amd64-3.6/Release/CMakeFiles/CMakeOutput.log".
    
    ERROR: Cannot generate Makefile. See above errors.
    
    ----------------------------------------
    

    Command "E:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-record-rni883bh\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\

    Thanks in advance

    opened by deepchatterjeevns 5
  • Support for n_components==1, others support it.

    Looks like the failure might be happening in: Multicore-TSNE/multicore_tsne/tsne.cpp with this function: evaluateError where it is producing nans.

    Willing to send $100USD in Bitcoin to the first person that can demonstrate a solution before I do.

    opened by metaheuristic 5
  • Use option `random_state` / Set seed via `srand (random_state);`

    Hi Dmitry,

    Currently, the option random_state is ignored, so every t-SNE plot looks different. Would you consider setting a seed for the initialization as described above, if random_state != None? If you want, I can make a pull request for that.

    Cheers, Alex

    opened by falexwolf 4
  • remove verbose from example

    Running the example as-is, I get:

    $python MulticoreTSNE/examples/test.py 2
    downloading MNIST
    downloaded
    Traceback (most recent call last):
      File "MulticoreTSNE/examples/test.py", line 67, in <module>
        mnist_tsne = tsne.fit_transform(mnist, verbose=1)
    TypeError: fit_transform() got an unexpected keyword argument 'verbose'
    

    This is not unexpected, as the fit_transform method does not take a verbose flag :)

    opened by jona-sassenhagen 3
  • adding early exaggeration iterations as argument

    Adding argument to allow setting of the number of iterations that the algorithm stays in the early exaggeration phase.

    This argument sets the values for two algorithm params that aren't currently exposed: stop_lying_iter and mom_switch_iter. It doesn't seem like these two variables ever get set to different values, thus a single argument to control them both should be fine in practice.

    Controlling the learning rate in concert with the number of early exaggeration iterations and total iterations can produce a high quality embedding in a shorter time interval. This interplay is discussed at https://doi.org/10.1101/451690.

    opened by cciccole 2
  • test fails with "OSError: cannot load library"

    OS: Ubuntu
    Python version: Python 3.6.4 :: Anaconda, Inc.

    steps to reproduce:

    1. python MulticoreTSNE/examples/test.py

    full error:

    downloading MNIST
    downloaded
    Traceback (most recent call last):
      File "MulticoreTSNE/examples/test.py", line 81, in <module>
        tsne = TSNE(n_jobs=int(args.n_jobs), verbose=1, n_components=args.n_components, random_state=660)
      File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/__init__.py", line 63, in __init__
        self.C = self.ffi.dlopen(path + "/libtsne_multicore.so")
      File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/cffi/api.py", line 141, in dlopen
        lib, function_cache = _make_ffi_library(self, name, flags)
      File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/cffi/api.py", line 802, in _make_ffi_library
        backendlib = _load_backend_lib(backend, libname, flags)
      File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/cffi/api.py", line 797, in _load_backend_lib
        raise OSError(msg)
    OSError: cannot load library '/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/libtsne_multicore.so': /home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/libtsne_multicore.so: undefined symbol: _ZNSt8ios_base4InitD1Ev.  Additionally, ctypes.util.find_library() did not manage to locate a library called '/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/libtsne_multicore.so'
    
    
    opened by sg-s 2
  • why can't Multicore-TSNE speed up my experiment with mnist

    My test code for Multicore-TSNE is below:
    %%time
    import numpy as np
    from MulticoreTSNE import MulticoreTSNE as TSNE
    from matplotlib import pyplot as plt
    if __name__ == "__main__":
        data=np.load("/DATA1/zhangjingxiao/yxk/dataset/mnist/mnist.npz")
        for item in data.files:
            print(item)# find keys,the you can get value
            print(data[item].shape)
            print("=================")
        ## follow this
    
        X=data["x_train"].reshape(60000,-1)
        y=data["y_train"]
        print(X.shape)
    
        embeddings=TSNE(n_jobs=8).fit_transform(X)
        vis_x=embeddings[:,0]
        vis_y=embeddings[:,1]
        plt.scatter(vis_x,vis_y,c=y)
        plt.show()
    

    My result was attached as an image.

    My test code for sklearn's TSNE is:

    %%time
    import numpy as np
    from sklearn.manifold import TSNE 
    #from MulticoreTSNE import MulticoreTSNE as TSNE
    from matplotlib import pyplot as plt
    
    data=np.load("/DATA1/zhangjingxiao/yxk/dataset/mnist/mnist.npz")
    for item in data.files:
        print(item)# find keys,the you can get value
        print(data[item].shape)
        print("=================")
    ## follow this
    
    X=data["x_train"].reshape(60000,-1)
    y=data["y_train"]
    print(X.shape)
    
    tsne = TSNE(n_components=2, init='pca', random_state=0)
    embeddings= tsne.fit_transform(X)
    vis_x=embeddings[:,0]
    vis_y=embeddings[:,1]
    plt.scatter(vis_x,vis_y,c=y)
    plt.show()
    

    My result was attached as an image. It seems that Multicore-TSNE didn't speed up my experiment. Could you give me some advice about it? Thank you.

    opened by yuxiaokang-source 0
  • Fix compiling

    I had trouble getting this package to install, and have realized that if line 55 in setup.py is changed from: self.cmake_args or "--", to self.cmake_args or "-S",

    then multicore-TSNE compiles correctly. If this change could be made, that'd be awesome!

    opened by domergal16 3
  • signature of sklearn.datasets.make_blobs has changed

    When running the test suite while packaging this package for openSUSE/Factory, test_base.TestMulticoreTSNE.setUpClass errors:

    [   54s] ======================================================================
    [   54s] ERROR: setUpClass (test_base.TestMulticoreTSNE)
    [   54s] ----------------------------------------------------------------------
    [   54s] Traceback (most recent call last):
    [   54s]   File "/home/abuild/rpmbuild/BUILD/MulticoreTSNE-0.1/MulticoreTSNE/tests/test_base.py", line 24, in setUpClass
    [   54s]     cls.Xy = make_blobs(20, 100, 2, shuffle=False)
    [   54s] TypeError: make_blobs() takes from 0 to 2 positional arguments but 3 positional arguments (and 2 keyword-only arguments) were given
    [   54s] 
    [   54s] ----------------------------------------------------------------------
    

    The problem is that the signature of the function has changed, and only the first two parameters are positional now. This patch seems to help:

    ---
     MulticoreTSNE/tests/test_base.py |    2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    
    --- a/MulticoreTSNE/tests/test_base.py
    +++ b/MulticoreTSNE/tests/test_base.py
    @@ -21,7 +21,7 @@ def pdist(X):
     class TestMulticoreTSNE(unittest.TestCase):
         @classmethod
         def setUpClass(cls):
    -        cls.Xy = make_blobs(20, 100, 2, shuffle=False)
    +        cls.Xy = make_blobs(20, 100, centers=2, shuffle=False)
    
         def test_tsne(self):
             X, y = self.Xy
    
    opened by mcepl 0
  • cmake version requirement.txt

    Hi,

    In the requirement.txt it's cmake>=3.17.0. However, with Python 3.6 and the newest cmake==3.22.0, the MulticoreTSNE installation fails. When I use cmake==3.17.1, the installation works. Maybe that can be updated in the requirement.txt?

    opened by flde 1
  • Enabling verbose causes kernel crash (likely a division by zero)

    I have found the most peculiar bug: if verbose is enabled, with some inputs, the library causes a jupyter kernel crash.

    I believe it to be caused by a divide by zero, as when I print the output of the dmesg command I get:

    [ 8324.933431]  in libtsne_multicore.so[7fcdffd99000+26000]
    [ 8324.933434] traps: ZMQbg/1[7595] trap divide error ip:7fcdffdac3bc sp:7fcdfeb699a0 error:0
    

    You can reproduce the error by running:

    import numpy as np
    from MulticoreTSNE import MulticoreTSNE
    
    a = np.array([
        [0, 1, 1, 1, 1],
        [1, 0, 1, 2, 1],
        [1, 1, 0, 1, 2],
        [1, 2, 1, 0, 1],
        [1, 1, 2, 1, 0]
    ])
    
    MulticoreTSNE(
        verbose=True
    ).fit_transform(a)
    

    You can immediately test it on COLAB here:

    COLAB link: https://colab.research.google.com/drive/1uM0GFpLm1Ln_VJRsMLEjyuG0GsnQvZng?usp=sharing

    opened by LucaCappelletti94 0
Owner
Dmitry Ulyanov
Co-Founder at in3D, PhD @ Skoltech
Extensible, parallel implementations of t-SNE

openTSNE openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) [1], a popular dimensionality-reduction al

Pavlin Poličar 1.1k Jan 3, 2023
Extensible, parallel implementations of t-SNE

openTSNE openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) [1], a popular dimensionality-reduction al

Pavlin Poličar 751 Feb 15, 2021
Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Somoclu Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, it is able to rely on MPI for distributing

Peter Wittek 239 Nov 10, 2022
Python implementation of the Density Line Chart by Moritz & Fisher.

PyDLC - Density Line Charts with Python Python implementation of the Density Line Chart (Moritz & Fisher, 2018) to visualize large collections of time

Charles L. Bérubé 10 Jan 6, 2023
Simple python implementation with matplotlib to manually fit MIST isochrones to Gaia DR2 color-magnitude diagrams

Karl Jaehnig 7 Oct 22, 2022
Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies

py-self-organizing-map Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies. A SOM is a simple unsuperv

Jonas Grebe 1 Feb 10, 2022
Simple plotting for Python. Python wrapper for D3xter - render charts in the browser with simple Python syntax.

PyDexter Simple plotting for Python. Python wrapper for D3xter - render charts in the browser with simple Python syntax. Setup $ pip install PyDexter

D3xter 31 Mar 6, 2021
A high performance implementation of HDBSCAN clustering. http://hdbscan.readthedocs.io/en/latest/

HDBSCAN Now a part of scikit-learn-contrib HDBSCAN - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over va

Leland McInnes 91 Dec 29, 2022
The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information".

The HIST framework for stock trend forecasting The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining C

Wentao Xu 111 Jan 3, 2023
Implementation of SOMs (Self-Organizing Maps) with neighborhood-based map topologies.

py-self-organizing-maps Simple implementation of self-organizing maps (SOMs) A SOM is an unsupervised method for learning a mapping from a discrete ne

Jonas Grebe 6 Nov 22, 2022
A Python Binder that merge 2 files with any extension by creating a new python file and compiling it to exe which runs both payloads.

Update ! ANONFILE MIGHT NOT WORK ! About A Python Binder that merge 2 files with any extension by creating a new python file and compiling it to exe w

Vesper 15 Oct 12, 2022
Debugging, monitoring and visualization for Python Machine Learning and Data Science

Welcome to TensorWatch TensorWatch is a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Micr

Microsoft 3.3k Dec 27, 2022
Python module for drawing and rendering beautiful atoms and molecules using Blender.

Batoms is a Python package for editing and rendering atoms and molecules objects using blender. A Python interface that allows for automating workflows.

Xing Wang 1 Jul 6, 2022
eoplatform is a Python package that aims to simplify Remote Sensing Earth Observation by providing actionable information on a wide swath of RS platforms and provide a simple API for downloading and visualizing RS imagery

An Earth Observation Platform Earth Observation made easy. Report Bug | Request Feature About eoplatform is a Python package that aims to simplify Rem

Matthew Tralka 4 Aug 11, 2022
Python scripts for plotting audiograms and related data from Interacoustics Equinox audiometer and Otoaccess software.

audiometry Python scripts for plotting audiograms and related data from Interacoustics Equinox 2.0 audiometer and Otoaccess software. Maybe similar sc

Hamilton Lab at UT Austin 2 Jun 15, 2022
Drug design and development team HackBio internship is a virtual bioinformatics program that introduces students and professional to advanced practical bioinformatics and its applications globally.

-Nyokong. Drug design and development team HackBio internship is a virtual bioinformatics program that introduces students and professional to advance

null 4 Aug 4, 2022
The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualizing NFT data from OpenSea, using PostgreSQL and TimescaleDB.

Timescale NFT Starter Kit The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualiz

Timescale 102 Dec 24, 2022
Plot and save the ground truth and predicted results of human 3.6 M and CMU mocap dataset.

Visualization-of-Human3.6M-Dataset Plot and save the ground truth and predicted results of human 3.6 M and CMU mocap dataset. human-motion-prediction

Gaurav Kumar Yadav 5 Nov 18, 2022
Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.

Visual Python 564 Jan 3, 2023