Extensible, parallel implementations of t-SNE

Overview

openTSNE


openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) [1], a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings [2], massive speed improvements [3][4] that enable t-SNE to scale to millions of data points, and various tricks to improve the global alignment of the resulting visualizations [5].

Macosko 2015 mouse retina t-SNE embedding

A visualization of 44,808 single-cell transcriptomes obtained from the mouse retina [6], embedded using the multiscale kernel trick to better preserve the global alignment of the clusters.

Installation

openTSNE requires Python 3.6 or higher.

Conda

openTSNE can be easily installed from conda-forge with

conda install --channel conda-forge opentsne

Conda package

PyPI

openTSNE is also available through pip and can be installed with

pip install opentsne

PyPI package

Installing from source

If you wish to install openTSNE from source, please run

python setup.py install

in the root directory to install the appropriate dependencies and compile the necessary binary files.

Please note that openTSNE requires a C/C++ compiler to be available on the system. Additionally, numpy must be pre-installed in the active environment.

In order for openTSNE to utilize multiple threads, the C/C++ compiler must support OpenMP. In practice, almost all compilers implement this, with the exception of older versions of clang on macOS systems.

To squeeze the most out of openTSNE, you may also consider installing FFTW3 prior to installation. FFTW3 implements the Fast Fourier Transform, which is heavily used in openTSNE. If FFTW3 is not available, openTSNE will use numpy’s implementation of the FFT, which is slightly slower than FFTW. The difference is only noticeable with large data sets containing millions of data points.

A hello world example

Getting started with openTSNE is very simple. First, we'll load up some data using scikit-learn

from sklearn import datasets

iris = datasets.load_iris()
x, y = iris["data"], iris["target"]

then, we'll import and run

from openTSNE import TSNE

embedding = TSNE().fit(x)

Citation

If you make use of openTSNE in your work, we would appreciate it if you cited the paper

@article {Poli{\v c}ar731877,
    author = {Poli{\v c}ar, Pavlin G. and Stra{\v z}ar, Martin and Zupan, Bla{\v z}},
    title = {openTSNE: a modular Python library for t-SNE dimensionality reduction and embedding},
    year = {2019},
    doi = {10.1101/731877},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2019/08/13/731877},
    eprint = {https://www.biorxiv.org/content/early/2019/08/13/731877.full.pdf},
    journal = {bioRxiv}
}

openTSNE implements two efficient algorithms for t-SNE. Please consider citing the original authors of the algorithm that you use: if you use FIt-SNE (the default), the citation is [4] below; if you use Barnes-Hut, the citation is [3].

References

[1] Van Der Maaten, Laurens, and Hinton, Geoffrey. “Visualizing data using t-SNE.” Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
[2] Poličar, Pavlin G., Martin Stražar, and Blaž Zupan. “Embedding to Reference t-SNE Space Addresses Batch Effects in Single-Cell Classification.” BioRxiv (2019): 671404.
[3] (1, 2) Van Der Maaten, Laurens. “Accelerating t-SNE using tree-based algorithms.” Journal of Machine Learning Research 15.1 (2014): 3221-3245.
[4] (1, 2) Linderman, George C., et al. "Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data." Nature Methods 16.3 (2019): 243.
[5] Kobak, Dmitry, and Berens, Philipp. “The art of using t-SNE for single-cell transcriptomics.” Nature Communications 10, 5416 (2019).
[6] Macosko, Evan Z., et al. “Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets.” Cell 161.5 (2015): 1202-1214.
Issues
  • A bunch of comments and questions

    A bunch of comments and questions

    Hi Pavlin! Great work. I did not know about Orange but I am working with scRNA-seq data myself (cf. your Zeisel2018 example) and I am using Python, so it's interesting to see developments in that direction.

    I have a couple of scattered comments/questions that I will just dump here. This isn't a real "issue".

    1. You say that BH is much faster than FFT for smaller datasets. That's interesting; I did not notice this. What kind of numbers are you talking about here? I was under the impression that with n<10k both methods are so fast (I guess all 1000 iterations under 1 min?) that the exact time does not really matter...

    2. Any specific reason to use "Python/Numba implementation of nearest neighbor descent" for approximate nearest neighbours? There are some popular libraries, e.g. annoy. Is your implementation much faster than that? Because otherwise it could be easier to use a well-known established library... I think Leland McInnes is using something similar (Numba implementation of nearest neighbor descent) in his UMAP; did you follow him here?

    3. I did not look at the actual code, but from the description on the main page it sounds that you don't have a vanilla t-SNE implementation in here. Is it true? I think it would be nice to have vanilla t-SNE in here too. For datasets with n=1k-2k it's pretty fast and I guess many people would prefer to use vanilla t-SNE if possible.

    4. I noticed you writing this in one of the closed issues:

      we allow new data to be added into the existing embedding by direct optimization. To my knowledge, no other library does this. It's sometimes difficult to get nice embeddings like this, but it may have potential.

      That's interesting. How exactly are you doing this? You fix the existing embedding, compute all the affinities for the extended dataset (original data + new data) and then optimize the cost by allowing only the positions of the new points to change? Something like that?

    5. George sped up his code quite a bit by adding multithreading to the F_attr computations. He is now implementing multithreading for the repulsive forces too. See https://github.com/KlugerLab/FIt-SNE/pull/32, and the discussion there. This might be interesting for you too. Or are you already using multithreading during gradient descent?

    6. I am guessing that your Zeisel2018 plot is colored using the same 16 "megaclusters" that Zeisel et al. use in Figure 1B (https://www.cell.com/cms/attachment/f1754f20-890c-42f5-aa27-bbb243127883/gr1_lrg.jpg). If so, it would be great if you used the same colors as in their figure; this would ease the comparison. Of course you are not trying to make comparisons here, but this is something that would be interesting to me personally :)

    opened by dkobak 37
  • Runtime and RAM usage compared to FIt-SNE

    Runtime and RAM usage compared to FIt-SNE

    I understand that openTSNE is expected to be slower than FIt-SNE, but I'd like to understand how much slower it is in typical situations. As I reported earlier, when I run it on 70000x50 PCA-reduced MNIST data with default parameters and n_jobs=-1, I get ~60 seconds with FIt-SNE and ~120 seconds with openTSNE. Every 50 iterations take around 2s vs around 4s.

    I did not check for this specific case, but I suspect that FFT takes only a small fraction of this time, and the computational bottleneck is formed by the attractive forces. Can one profile openTSNE and see how much time is taken by different steps, such as repulsive/attractive computations?

    Apart from that, and possibly even more worryingly, I replicated the data 6x and added some noise, to get a 420000x50 data matrix. It takes FIt-SNE around 1Gb of RAM to allocate the space for the kNN matrix, so it works just fine on my laptop. However, openTSNE rapidly took >7Gb of RAM and crashed the kernel (I have 16 Gb but around half was taken by other processes). This happened in the first seconds, so I assume it happens during the kNN search. Does pynndescent eat up so much memory in this case?

    discussion 
    opened by dkobak 25
  • Why does transform() have exaggeration=2 by default?

    Why does transform() have exaggeration=2 by default?

    The parameters of the transform function are

    def transform(self, X, perplexity=5, initialization="median", k=25,
                  learning_rate=100, n_iter=100, exaggeration=2,
                  momentum=0, max_grad_norm=0.05):
    

    so it has exaggeration=2 by default. Why? This looks unintuitive to me: exaggeration is a slightly "weird" trick that can arguably be very useful for huge data sets, but I would expect the out-of-sample embedding to work just fine without it. Am I missing something?

    I am also curious why momentum is set to 0 (unlike in normal tSNE optimization), but here I don't have any intuition for what it should be.

    Another question is: will this function work with n_iter=0 if one just wants to get an embedding using medians of k nearest neighbours? That would be handy. Or is there another way to get this? Perhaps from prepare_partial?

    And lastly, when transform() is applied to points from a very different data set (imagine positioning Smart-seq2 cells onto a 10x Chromium reference), I prefer to use correlation distances because I suspect Euclidean distances might be completely off (even when the original tSNE was done using Euclidean distances). I think openTSNE currently does not support this, right? Did you have any problems with that? One could perhaps allow transform() to take a metric argument (is correlation among the supported metrics, btw?). The downside is that if this metric is different from the metric used to prepare the embedding, then the nearest neighbours object will have to be recomputed, so it will suddenly become much slower. Let me know if I should post it as a separate issue.

    question 
    opened by dkobak 25
  • Add spectral initialization using diffusion maps

    Add spectral initialization using diffusion maps

    Description of changes

    Fixes #110.

    I ended up implementing only diffusion maps because, computationally, computing the leading eigenvectors is much faster than computing the smallest eigenvectors, and of the various spectral methods, diffusion maps are the only ones that require the leading ones. I checked what UMAP does - it uses the symmetric normalized Laplacian for initialization - but they manually set a limit on the number of Lanczos iterations, which I don't understand. This seemed like the better option.

    @dkobak Do you want to take a look at this? I implemented this using scipy.sparse.linalg.svds because it turns out to be faster than scipy.sparse.linalg.eigsh; eigsh also seemed to produce strange results when I increased the error tolerance, while the svds results seemed reasonable.

    Includes
    • [X] Code changes
    • [ ] Tests
    • [ ] Documentation
    opened by pavlin-policar 22
  • `pynndescent` has recently changed

    `pynndescent` has recently changed

    Expected behaviour

    Return the embedding

    Actual behaviour

    Return the embedding, with one warning:

    .../miniconda3/lib/python3.7/site-packages/openTSNE/nearest_neighbors.py:181: UserWarning: pynndescent has recently changed which distance metrics are supported, and openTSNE.nearest_neighbors has not been updated. Please notify the developers of this change.

    Steps to reproduce the behavior

    Hello World steps

    opened by VallinP 18
  • Added Annoy support

    Added Annoy support

    Added Annoy support as per #101. Annoy is used by default if it supports the given metric and if the input data is not scipy.sparse (otherwise pynndescent is used).

    This needs Annoy installed (I used https://anaconda.org/conda-forge/python-annoy), but I wasn't sure where to add this dependency.

    opened by dkobak 16
  • FFT parameters and runtime for very expanded embeddings

    FFT parameters and runtime for very expanded embeddings

    I have been doing some experiments on convergence and running t-SNE for many more iterations than I normally do. And I again noticed something that I used to see every now and then: the runtime jumps wildly between "epochs" of 50 iterations. This only happens when the embedding is very expanded and so FFT gets really slow. Look:

    Iteration   50, KL divergence 4.8674, 50 iterations in 1.8320 sec
    Iteration  100, KL divergence 4.3461, 50 iterations in 1.8760 sec
    Iteration  150, KL divergence 4.0797, 50 iterations in 2.6252 sec
    Iteration  200, KL divergence 3.9082, 50 iterations in 4.5062 sec
    Iteration  250, KL divergence 3.7864, 50 iterations in 5.4258 sec
    Iteration  300, KL divergence 3.6957, 50 iterations in 7.2500 sec
    Iteration  350, KL divergence 3.6259, 50 iterations in 9.0705 sec
    Iteration  400, KL divergence 3.5711, 50 iterations in 10.1077 sec
    Iteration  450, KL divergence 3.5271, 50 iterations in 12.2412 sec
    Iteration  500, KL divergence 3.4909, 50 iterations in 13.6440 sec
    Iteration  550, KL divergence 3.4604, 50 iterations in 14.6127 sec
    Iteration  600, KL divergence 3.4356, 50 iterations in 17.2364 sec
    Iteration  650, KL divergence 3.4143, 50 iterations in 17.6973 sec
    Iteration  700, KL divergence 3.3986, 50 iterations in 27.9720 sec
    Iteration  750, KL divergence 3.3914, 50 iterations in 34.0480 sec
    Iteration  800, KL divergence 3.3863, 50 iterations in 34.4572 sec
    Iteration  850, KL divergence 3.3820, 50 iterations in 36.9247 sec
    Iteration  900, KL divergence 3.3779, 50 iterations in 47.0994 sec
    Iteration  950, KL divergence 3.3737, 50 iterations in 40.8424 sec
    Iteration 1000, KL divergence 3.3696, 50 iterations in 62.1549 sec
    Iteration 1050, KL divergence 3.3653, 50 iterations in 30.6310 sec
    Iteration 1100, KL divergence 3.3613, 50 iterations in 44.9781 sec
    Iteration 1150, KL divergence 3.3571, 50 iterations in 36.9257 sec
    Iteration 1200, KL divergence 3.3531, 50 iterations in 66.3830 sec
    Iteration 1250, KL divergence 3.3493, 50 iterations in 37.7215 sec
    Iteration 1300, KL divergence 3.3457, 50 iterations in 33.7942 sec
    Iteration 1350, KL divergence 3.3421, 50 iterations in 33.7507 sec
    Iteration 1400, KL divergence 3.3387, 50 iterations in 59.2065 sec
    Iteration 1450, KL divergence 3.3354, 50 iterations in 36.3713 sec
    Iteration 1500, KL divergence 3.3323, 50 iterations in 39.1894 sec
    Iteration 1550, KL divergence 3.3293, 50 iterations in 67.3239 sec
    Iteration 1600, KL divergence 3.3265, 50 iterations in 33.9837 sec
    Iteration 1650, KL divergence 3.3238, 50 iterations in 63.5015 sec
    

    For the record, this is on full MNIST with uniform k=15 affinity, n_jobs=-1. Note that after it gets to 30 seconds / 50 iterations, it starts fluctuating between 30 and 60. This does not make sense.

    I suspect it may be related to how interpolation params are chosen depending on the grid size. Can it be that those heuristics may need improvement?

    Incidentally, can it be that the interpolation params can be relaxed once the embedding becomes very large (e.g. span larger than [-100,100]) so that optimisation runs faster without -- perhaps! -- compromising the approximation too much?

    CCing to @linqiaozhi.

    opened by dkobak 15
  • Pynndescent build/query

    Pynndescent build/query

    We discussed this before, but I've been playing around with some sparse data now and wanted to report some runtimes.

    When using pynndescent, openTSNE runs build() with n_neighbors=15 and then query() with n_neighbors=3*perplexity. At the same time, Leland said that this is not efficient, and that the recommended way to use pynndescent is to run build() with the desired number of neighbors and then simply take its constructed kNN graph without querying. You said that you ran some benchmarks and found your way to be faster. Here are the runtimes I got on a sparse X of size (100000, 9630).

    nn = NNDescent(X, metric='cosine', n_neighbors=15)    # Wall time: 39 s
    nn.query(X, k=15)                                     # Wall time: 1min 57s
    nn.query(X, k=90)                                     # Wall time: 3min 21s
    nn90 = NNDescent(X, metric='cosine', n_neighbors=90)  # Wall time: 7min 45s
    nn90.query(X, k=90)                                   # Wall time: 57min 53s
    

    For k=90 it is indeed faster to build with k=15 and then query with k=90, so I can confirm your observation.

    My only suggestion would be to modify the NNDescent class so that if the desired k is less than some threshold, the build is done with k+1 and the constructed graph is returned without querying. We can simply use 15 as the threshold. I did this locally and can PR.

    opened by dkobak 15
  • Cannot pass random_state to PerplexityBasedNN when using Annoy

    Cannot pass random_state to PerplexityBasedNN when using Annoy

    Hi Pavlin,

    this is quite a minuscule bug, but I noticed that PerplexityBasedNN fails when you pass it a numpy RandomState instance, because it uses it directly in the AnnoyIndex(...).set_seed(seed) call. Since the documentation says that it accepts both an integer and a numpy random state, I guess this is a (tiny) bug.

    Expected behaviour

    It sets a seed for the internal random state of annoy.

    Actual behaviour

    It crashes with a TypeError:

      File "/home/jnb/dev/openTSNE/openTSNE/nearest_neighbors.py", line 276, in build
        self.index.set_seed(self.random_state)
    TypeError: an integer is required (got type numpy.random.mtrand.RandomState)
    
    Steps to reproduce the behavior
    import numpy as np
    from openTSNE import PerplexityBasedNN
    
    random_state = np.random.default_rng(333)
    data = random_state.uniform(size=(10000,10))
    PerplexityBasedNN(data, random_state=random_state)
    

    Fix

    in nearest_neighbors.py line 275 can be changed from self.index.set_seed(self.random_state) to

            if isinstance(self.random_state, int):
                self.index.set_seed(self.random_state)
            else: # has to be a numpy RandomState
                self.index.set_seed(self.random_state.randint(-(2 ** 31), 2 ** 31))
    

    Let me know if it should come as a pull request or if you'll just incorporate it like this. Cheers
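    A more general way to express the same fix is to normalize the seed up front. The sketch below uses a hypothetical helper, `annoy_seed`, which mirrors scikit-learn's `check_random_state` pattern rather than openTSNE's actual code:

```python
# Normalize an int / numpy RandomState / None into an integer seed that
# libraries like Annoy (which only accept ints) can consume.
from sklearn.utils import check_random_state


def annoy_seed(random_state):
    # check_random_state accepts None, ints, and RandomState instances.
    rs = check_random_state(random_state)
    return int(rs.randint(0, 2**31 - 1))
```

    With this, set_seed(annoy_seed(self.random_state)) works regardless of which form the caller passed in. Note that check_random_state does not accept the newer numpy Generator objects (as returned by np.random.default_rng), so those would still need handling separately.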

    opened by jnboehm 14
  • Workaround for -1 in pynndescent index

    Workaround for -1 in pynndescent index

    Fixes #130 .

    Changes:

    • Query() is only used for k>15.
    • n_jobs fixed to 1 for sparse inputs to avoid a pynndescent bug
    • find all points where index contains -1 values, and let them randomly attract each other.
    opened by dkobak 14
  • Don't see any effect of n_jobs

    Don't see any effect of n_jobs

    Parameter n_jobs does not seem to influence the speed at all: I get exactly the same speed with n_jobs=1 and n_jobs=-1 (as well as other values) on my n=23k data set. It's weird -- what can I do to debug this?

    opened by dkobak 14
  • The docstring on pickling Annoy objects seems to be out of date

    The docstring on pickling Annoy objects seems to be out of date

    The docstring here https://github.com/pavlin-policar/openTSNE/blob/master/openTSNE/nearest_neighbors.py#L213 (about pickling Annoy objects into a separate file) seems obsolete: that's not the way this is currently done in the code.

    opened by dkobak 0
  • [Windows] save TSNEEmbedding to binary, Directory error

    [Windows] save TSNEEmbedding to binary, Directory error

    Expected behaviour

    On Windows OS, when trying to save the TSNEEmbedding object (or affinities), I tried

    pickle.dump(embeddings, open(os.path.join(self.models_path, "tsne_global_embeddings.sav"), "wb"))

    and also tried saving as an array, to reconstruct the object later, using

    numpy.save("file.npy", affinities)

    Both lines work just fine under the Linux distributions I tried, but loading them back on Windows breaks with the same error as the save methods, in both scenarios.

    Actual behaviour

    Windows can't find/create the temporary directory/files when trying to touch the file. Unfortunately, I haven't had time yet to look deeper into what could cause this behaviour.

    *** NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\tomcs\\AppData\\Local\\Temp\\tmp7biujwdz\\tmp.ann

    Steps to reproduce the behavior

    opentsne==0.6.2

    I think this would be the same with most settings, although I am using the following settings to train before trying to save.

    affinities = openTSNE.affinity.PerplexityBasedNN(X, perplexity=500, n_jobs=32, random_state=0)
    init = openTSNE.initialization.pca(X, n_components=3, random_state=42)
    tsne = openTSNE.TSNE(3, exaggeration=None, n_jobs=16, verbose=True, negative_gradient_method="bh")
    embeddings = tsne.fit(affinities=affinities, initialization=init)
    pickle.dump(embeddings, open("tsne_global_embeddings.sav", "wb"))

    opened by tomcsojn 1
  • pandas DataFrames KeyError

    pandas DataFrames KeyError

    Hi Pavlin,

    I have been using openTSNE with pandas DataFrames as input for a while and it worked fine. Suddenly I got a KeyError. It seems that openTSNE breaks down for larger DataFrames (unlike sklearn). This minimal example will reproduce the error:

    from openTSNE import TSNE
    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame(np.random.randint(0,100,size=(2000, 4)), 
                      columns=list('ABCD'))
    
    tsne = TSNE()
    
    ## adding this line solves the problem entirely
    # df  = df.to_numpy()
    
    ## this works
    tsne.fit(df[:500])
    
    ## this doesn't work
    tsne.fit(df)
    

    Do you have an idea what could cause this, and whether it's worthwhile to extend openTSNE for pandas DataFrames?

    bug 
    opened by fsvbach 8
  • Memory collapses with precomputed block matrix

    Memory collapses with precomputed block matrix

    Expected behaviour

    When I run t-SNE on a symmetric 200x200 block distance matrix such as the attached one (distancematrix0), I expect TSNE to return 4 distinct clusters (actually 4 points only). Sklearn yields this.

    Actual behaviour

    Using openTSNE, the terminal crashes with full memory (50% of the time). If it survives, the clusters are visible; however, the result is not as satisfying.

    Steps to reproduce the behavior

    matrix = ...  # the block matrix described above
    tsne = TSNE(metric='precomputed', initialization='spectral', negative_gradient_method='bh')
    embedding = tsne.fit(matrix)

    NOTE: I am using the direct installation from GitHub this morning.

    bug 
    opened by fsvbach 17
Releases (v0.6.2)
  • v0.6.2(Mar 18, 2022)

    Changes

    • By default, we now use the MultiscaleMixture affinity model, enabling us to pass in a list of perplexities instead of a single perplexity value. This is fully backwards compatible.
    • Previously, perplexity values would be adjusted according to the dataset: e.g., if we passed in perplexity=100 with N=150, TSNE.perplexity would be set to 50. Instead, we now keep this value as is and add an effective_perplexity_ attribute (following the convention from scikit-learn), which holds the corrected perplexity value.
    • Fix bug where interpolation grid was being prepared even when using BH optimization during transform.
    • Enable calling .transform with precomputed distances. In this case, the data matrix will be assumed to be a distance matrix.

    Build changes

    • Build with oldest-supported-numpy
    • Build linux wheels on manylinux2014 instead of manylinux2010, following numpy's example
    • Build MacOS wheels on macOS-10.15 instead of macos-10.14 Azure VM
    • Fix potential problem with clang-13, which actually does optimization with infinities using the -ffast-math flag
  • v0.6.0(Apr 25, 2021)

    Changes:

    • Remove affinities from TSNE construction; allow custom affinities and initialization in the .fit method. This improves the API when dealing with non-tabular data. This is not backwards compatible.
    • Add metric="precomputed". This includes the addition of openTSNE.nearest_neighbors.PrecomputedDistanceMatrix and openTSNE.nearest_neighbors.PrecomputedNeighbors.
    • Add knn_index parameter to openTSNE.affinity classes.
    • Add (less-than-ideal) workaround for pickling Annoy objects.
    • Extend the range of recommended FFTW boxes up to 1000.
    • Remove deprecated openTSNE.nearest_neighbors.BallTree.
    • Remove deprecated openTSNE.callbacks.ErrorLogger.
    • Remove deprecated TSNE.neighbors_method property.
    • Add and set as default negative_gradient_method="auto".
  • v0.5.0(Dec 24, 2020)

  • v0.4.0(May 4, 2020)

    Major changes:

    • Remove numba dependency, switch over to using Annoy nearest neighbor search. Pynndescent is now optional and can be used if installed manually.
    • Massively speed-up transform by keeping reference interpolation grid fixed. Limit new points to circle centered around reference embedding.
    • Implement variable degrees of freedom.

    Minor changes:

    • Add spectral initialization using diffusion maps.
    • Replace cumbersome ErrorLogger callback with the verbose flag.
    • Change the default number of iterations to 750.
    • Add learning_rate="auto" option.
    • Remove the min_grad_norm parameter.

    Bugfixes:

    • Fix case where KL divergence was sometimes reported as NaN.
  • v0.2.0(Sep 11, 2018)

    In order to make usage as simple as possible and remove the external dependency on FFTW (which previously needed to be installed locally), this update replaces FFTW with numpy's FFT.

Owner
Pavlin Poličar
PhD student working on applying machine learning methods to biomedical and scRNA-seq data.