Dimensionality reduction in very large datasets using Siamese Networks

Overview

ivis

Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets. Ivis is designed to reduce dimensionality of very large datasets using a siamese neural network trained on triplets. Both unsupervised and supervised modes are supported.
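
To make the triplet idea concrete, here is a minimal sketch of a margin-based triplet loss in plain Python. It illustrates the general technique only and is not necessarily the loss that ivis implements; the function names and the `margin` default are assumptions made for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


anchor, neighbour, distant = (0.0, 0.0), (0.0, 1.0), (3.0, 4.0)

# Well-separated triplet: the neighbour is much closer than the distant point.
good = triplet_margin_loss(anchor, neighbour, distant)
# Swapped roles: the "positive" is farther away than the "negative".
bad = triplet_margin_loss(anchor, distant, neighbour)
print(good, bad)
```

In a siamese setup the three points would first pass through the same network, and the loss would be computed on the resulting embeddings rather than on the raw inputs.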

ivis 10M data points

Installation

Ivis runs on top of TensorFlow. To install the latest ivis release from PyPI together with the CPU TensorFlow package, run:

# TensorFlow 2 packages require a pip version >19.0.
pip install --upgrade pip
pip install ivis[cpu]

If you have CUDA installed and want ivis to use the tensorflow-gpu package, run:

pip install ivis[gpu]

The development version can be installed directly from GitHub:

git clone https://github.com/beringresearch/ivis
cd ivis
pip install -e '.[cpu]'

The following optional dependencies are needed if using the visualization callbacks while training the Ivis model:

  • matplotlib
  • seaborn

Upgrading

The ivis Python package is updated frequently! To upgrade, run:

pip install ivis --upgrade

Features

  • Scalable: ivis is fast and easily extends to millions of observations and thousands of features.
  • Versatile: numpy arrays, sparse matrices, and hdf5 files are supported out of the box. Additionally, both categorical and continuous features are handled well, making it easy to apply ivis to heterogeneous problems including clustering and anomaly detection.
  • Accurate: ivis excels at preserving both local and global features of a dataset. Often, ivis performs better at preserving global structure of the data than t-SNE, making it easy to visualise and interpret high-dimensional datasets.
  • Generalisable: ivis supports addition of new data points to original embeddings via a transform method, making it easy to incorporate ivis into standard sklearn Pipelines.

And many more! See the ivis README for the latest additions and examples.

Examples

from ivis import Ivis
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
X_scaled = MinMaxScaler().fit_transform(X)

model = Ivis(embedding_dims=2, k=15)

embeddings = model.fit_transform(X_scaled)

Copyright 2021 Bering Limited

Comments
  • Bug with index.build(ntrees)

    Hello,

    I'm trying to run the ivis examples (both the simple iris one and the MNIST one), and I keep getting this error whenever model fitting is called (running this on Debian). Any thoughts?

    In [7]: embeddings = ivis.fit_transform(mnist.data)
    
    Error truncating file: Invalid argument
    ---------------------------------------------------------------------------
    Exception                                 Traceback (most recent call last)
    <ipython-input-7-d5f1692c2b85> in <module>
    ----> 1 embeddings = ivis.fit_transform(mnist.data)
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in fit_transform(self, X, Y, shuffle_mode)
        289         """
        290
    --> 291         self.fit(X, Y, shuffle_mode)
        292         return self.transform(X)
        293
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in fit(self, X, Y, shuffle_mode)
        269         """
        270
    --> 271         self._fit(X, Y, shuffle_mode)
        272         return self
        273
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in _fit(self, X, Y, shuffle_mode)
        146                 print('Building KNN index')
        147             build_annoy_index(X, self.annoy_index_path,
    --> 148                               ntrees=self.ntrees, verbose=self.verbose)
        149
        150         datagen = generator_from_index(X, Y,
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/data/knn.py in build_annoy_index(X, path, ntrees, verbose)
         28
         29     # Build n trees
    ---> 30     index.build(ntrees)
         31     if platform.system() == 'Windows':
         32         index.save(path)
    
    Exception: Invalid argument
    
    opened by sadatnfs 15
  • Windows compatibility?

    Really excited to compare Ivis to UMAP on a project I am currently working on.

    The server I have access to is a Windows 10 machine, with a Python 3.7 Anaconda environment.

    Following the install instructions and trying to run the MNIST example, I am seeing the following error: TypeError: can't pickle annoy.Annoy objects

    enhancement help wanted 
    opened by paul-harambee 13
  • Ivis seems to provoke errors when composing a sklearn.pipeline.Pipeline passed to sklearn.model_selection.GridSearchCV and executed in parallel

    The problem

    I noticed that when Ivis is a step in a sklearn.pipeline.Pipeline that is passed to sklearn.model_selection.GridSearchCV to fine-tune hyper-parameters across all estimators/transformers, and GridSearchCV has n_jobs=-1 (i.e., when executions within GridSearchCV are parallel), errors are thrown. This does not happen when n_jobs=1 (i.e., when the executions within GridSearchCV are sequential).

    Since n_jobs is set once on GridSearchCV for the whole Pipeline, parallelizing only specific steps is not possible. This problem therefore forces the global use of n_jobs=1, which noticeably slows down the fine-tuning process by underusing the computational power of the setup in which the script is being executed (even in parts where n_jobs=-1 would work).

    Environment

    A virtual environment was created specifically for this repository, in which all modules listed in requirements.txt were installed. My setup runs an up-to-date version of Windows 10 (no WSL).

    Runtime

    python=3.8.4
    

    Relevant modules

    ivis=2.0.3
    tensorflow=2.5.0
    

    Minimal reproducible example

    Code

    if __name__ == "__main__":
        import tempfile
        import ivis
    
        from sklearn import datasets, ensemble, model_selection, pipeline, preprocessing
        from os import environ
    
        environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
    
        X, y = datasets.load_iris(return_X_y=True)
    
        pipeline_with_ivis = pipeline.Pipeline([
            ("normalize", preprocessing.MinMaxScaler()),
            ("project", ivis.Ivis()),
            ("classify", ensemble.RandomForestClassifier()),
        ], memory=tempfile.mkdtemp())
    
        parameter_grid = {
            "project__k": (15,),
            "project__verbose": (True,),
    
            "classify__random_state": (2021,)
        }
    
        grid_search = model_selection.GridSearchCV(pipeline_with_ivis, parameter_grid, scoring="accuracy", cv=10, n_jobs=-1,
                                                   return_train_score=True, verbose=3).fit(X, y)
    

    Error

    <REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:615: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
    Traceback (most recent call last):
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 212, in extract_knn
        process.start()
      File "C:\Python38\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Python38\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\process.py", line 39, in _Popen
        return Popen(process_obj)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\popen_loky_win32.py", line 70, in __init__
        child_env.update(process_obj.env)
    AttributeError: 'KnnWorker' object has no attribute 'env'
    
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 598, in _fit_and_score
        estimator.fit(X_train, y_train, **fit_params)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 341, in fit
        Xt = self._fit(X, y, **fit_params_steps)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 303, in _fit
        X, fitted_transformer = fit_transform_one_cached(
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 591, in __call__
        return self._cached_call(args, kwargs)[0]
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 534, in _cached_call
        out, metadata = self.call(*args, **kwargs)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 761, in call
        output = self.func(*args, **kwargs)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 754, in _fit_transform_one
        res = transformer.fit_transform(X, y, **fit_params)
      File "<REPOSITORY_ROOT>\ivis\ivis.py", line 350, in fit_transform
        self.fit(X, Y, shuffle_mode)
      File "<REPOSITORY_ROOT>\ivis\ivis.py", line 328, in fit
        self._fit(X, Y, shuffle_mode)
      File "<REPOSITORY_ROOT>\ivis\ivis.py", line 190, in _fit
        self.neighbour_matrix = AnnoyKnnMatrix.build(X, path=self.annoy_index_path,
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 63, in build
        return cls(index, X.shape, path, k, search_k, precompute, include_distances, verbose)
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 48, in __init__
        self.precomputed_neighbours = self.get_neighbour_indices()
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 96, in get_neighbour_indices
        return extract_knn(
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 236, in extract_knn
        process.terminate()
      File "C:\Python38\lib\multiprocessing\process.py", line 133, in terminate
        self._popen.terminate()
    AttributeError: 'NoneType' object has no attribute 'terminate'
      warnings.warn("Estimator fit failed. The score on this train-test"
    
    [...]
    
    <REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the test scores are non-finite: [nan]
      warnings.warn(
    

    Discussion

    From coding and experimenting with the example above, my understanding is that, since sklearn uses joblib and ivis uses multiprocessing, these modules might not be playing well with each other for some reason.

    I would rule out the hypothesis that nested estimators/transformers with parallel routines are the problem: estimators like sklearn.ensemble.RandomForestClassifier can be set to have n_jobs=-1 without problem within the Pipeline passed to GridSearchCV.

    I am particularly affected by this issue because I want to employ ivis in projects that involve hyper-parameter fine-tuning using cross-validation via GridSearchCV with concurrent executions. I attempted to diagnose the problem, but to no avail, which is why I bring this issue to your attention.

    Observation: another part of this problem is a design choice that is not adherent to the sklearn API guidelines, whose solution I propose and detail in #95. This issue does not cause the aforementioned error, but might cause other errors that could affect the same use scenario (Pipeline in GridSearchCV running in parallel).
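
    The second AttributeError in the traceback above is raised by terminate() on a worker whose start() had already failed. Below is a minimal standard-library sketch of a defensive cleanup. `safe_terminate` is a hypothetical helper, not part of ivis, and it only addresses this secondary error, not the underlying joblib/loky incompatibility.

```python
import multiprocessing


def safe_terminate(processes):
    """Terminate only the workers that actually started; return how many."""
    terminated = 0
    for process in processes:
        # is_alive() is False for a Process whose start() never succeeded,
        # so terminate() is never called on a worker without an OS process.
        if process.is_alive():
            process.terminate()
            terminated += 1
    return terminated


# A Process object that is created but never started.
never_started = multiprocessing.Process(target=print, args=("unused",))

try:
    never_started.terminate()
    raised = False
except AttributeError:
    # Same failure mode as in the traceback: no underlying process handle.
    raised = True

print(raised, safe_terminate([never_started]))
```

    With a guard like this, a failure in start() would surface as the original exception instead of being masked by a second one during cleanup.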

    opened by imatheussm 10
  • attempt to apply non-function

    I want to install ivis in R, but I get the error shown in the title. My computer runs Windows, so I installed conda before running the code. Can anyone help me solve this problem? Thank you!

    library(reticulate)
    devtools::install_github("beringresearch/ivis/R-package")
    library(ivis)
    model <- ivis(k = 3)
    Error in ivis_object$Ivis(embedding_dims = embedding_dims, k = k, distance = distance, : attempt to apply non-function

    opened by Feifei0511 9
  • Issue installing and running ivis R package in RStudio

    Hello,

    For JOSS review.

    The installation instructions fail when run in the RStudio environment:

    > devtools::install_github("beringresearch/ivis/R-package", force=TRUE)
    Downloading GitHub repo beringresearch/ivis@master
    ✔  checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpud6pnU/remotesbe4d59017fdb/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’ ...
    ─  preparing ‘ivis’:
    ✔  checking DESCRIPTION meta-information ...
    ─  checking for LF line-endings in source and make files and shell scripts
    ─  checking for empty or unneeded directories
    ─  building ‘ivis_1.1.3.tar.gz’
       
    * installing *source* package ‘ivis’ ...
    ** using staged installation
    ** R
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    Error: package or namespace load failed for ‘ivis’:
     .onLoad failed in loadNamespace() for 'ivis', details:
      call: path.expand(path)
      error: invalid 'path' argument
    Error: loading failed
    Execution halted
    ERROR: loading failed
    * removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/ivis’
    * restoring previous ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/ivis’
    Error: Failed to install 'ivis' from GitHub:
      (converted from warning) installation of package ‘/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T//Rtmpud6pnU/filebe4d71713083/ivis_1.1.3.tar.gz’ had non-zero exit status
    

    However, it does work fine when run in the console (Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64 x86_64):

    > devtools::install_github("beringresearch/ivis/R-package", force=TRUE)
    Downloading GitHub repo beringresearch/ivis@master
       checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpvj2CT3/remotesc3827327cfb8/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’✔  checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpvj2CT3/remotesc3827327cfb8/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’
    ─  preparing ‘ivis’:
    ✔  checking DESCRIPTION meta-information ...
    ─  checking for LF line-endings in source and make files and shell scripts
    ─  checking for empty or unneeded directories
    ─  building ‘ivis_1.1.3.tar.gz’
       
    * installing *source* package ‘ivis’ ...
    ** using staged installation
    ** R
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    ** testing if installed package can be loaded from final location
    ** testing if installed package keeps a record of temporary installation path
    * DONE (ivis)
    

    Moreover, the ivis package (installed from the terminal) can be loaded from an R console in a terminal, but throws the following error when loaded in RStudio:

    > library(ivis)
    Error: package or namespace load failed for ‘ivis’:
     .onLoad failed in loadNamespace() for 'ivis', details:
      call: path.expand(path)
      error: invalid 'path' argument
    

    This is most likely due to conda not being on the PATH in RStudio:

    # RStudio
    > system("echo $PATH")
    /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/ncbi/igblast/bin:/Library/TeX/texbin:/opt/X11/bin:/opt/local/bin
    # Console
    > system("echo $PATH")
    /Users/kevin/miniconda3/bin:/Users/kevin/miniconda3/condabin:/usr/local/opt/imagemagick@6/bin:/Users/kevin/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/ncbi/igblast/bin:/Library/TeX/texbin:/opt/X11/bin
    

    Is there a recommended way to set up an environment to run ivis in RStudio, or are users only expected to run it from a terminal R console?

    Thanks!

    opened by kevinrue 6
  • Enable registration or passing of a custom triplet loss function

    In Python, Ivis.__init__ accepts a distance: str keyword argument, which selects a predefined triplet loss function for that distance metric from a dictionary. Currently, one of the ways to provide a custom distance function is to monkeypatch ivis.nn.losses.get_loss_functions. Other ways to accomplish the same are even messier from the perspectives of usage and implementation.

    The nature of dimensionality reduction, especially when dealing with one-hot-encoded categorical features, sometimes requires custom ways to calculate loss. Under the hood, ivis has the ability to enable custom loss functions, but any such offerings need to be implemented in a clean and API-idiomatic manner.


    A custom distance function requires its own triplet loss implementation. Ivis.__init__ could support an additional keyword argument (e.g. triplet_loss: Callable[..., ...] = ...) for users to be able to pass their own.

    Alternatively, it could simply be passed inside the existing distance kwarg, with its signature changing to distance: Union[str, Callable[..., ...]].

    Another way would be to make the losses dictionary built by ivis.nn.losses.get_loss_functions a module-level loss function registrar.

    Additionally, docs and examples need to be updated on how to correctly implement a custom loss function. With all currently available distance metrics, the triplet loss implementation follows a very similar pattern, and should not be too daunting to attempt to implement.
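
    The two proposals above (a callable accepted in place of a name, and a module-level registrar) can be sketched together in plain Python. Every name here (`register_triplet_loss`, `resolve_triplet_loss`, `_TRIPLET_LOSSES`) is hypothetical and does not reflect the actual ivis.nn.losses module; the loss operates on plain sequences rather than tensors to keep the sketch self-contained.

```python
from typing import Callable, Dict, Sequence, Union

Point = Sequence[float]
TripletLoss = Callable[[Point, Point, Point], float]

# Hypothetical module-level registrar; not the actual ivis.nn.losses structure.
_TRIPLET_LOSSES: Dict[str, TripletLoss] = {}


def register_triplet_loss(name: str) -> Callable[[TripletLoss], TripletLoss]:
    """Decorator that stores a loss function under a distance name."""
    def decorator(fn: TripletLoss) -> TripletLoss:
        _TRIPLET_LOSSES[name] = fn
        return fn
    return decorator


def resolve_triplet_loss(distance: Union[str, TripletLoss]) -> TripletLoss:
    """Accept either a registered name or a user-supplied callable."""
    if callable(distance):
        return distance
    if distance not in _TRIPLET_LOSSES:
        raise ValueError(f"Unknown distance {distance!r}; known: {sorted(_TRIPLET_LOSSES)}")
    return _TRIPLET_LOSSES[distance]


@register_triplet_loss("manhattan")
def manhattan_triplet_loss(anchor: Point, positive: Point, negative: Point) -> float:
    """Margin-based triplet loss on L1 distance (margin fixed at 1.0 for brevity)."""
    def l1(u: Point, v: Point) -> float:
        return sum(abs(a - b) for a, b in zip(u, v))
    return max(l1(anchor, positive) - l1(anchor, negative) + 1.0, 0.0)


by_name = resolve_triplet_loss("manhattan")
by_callable = resolve_triplet_loss(manhattan_triplet_loss)
print(by_name is by_callable)
```

    A constructor could call something like `resolve_triplet_loss(distance)` once, so that existing string arguments keep working while callables become a supported input.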

    opened by mihajenko 5
  • Add a vignette to the R package

    Hello,

    For JOSS review.

    Is your feature request related to a problem? Please describe.

    The R package lacks documentation of an application to a real-life dataset.

    Describe the solution you'd like

    Please add a vignette in the R package demonstrating at least an example application to a single-cell dataset. Basically, the equivalent of the scanpy workflow here.

    A convenient way to use the pbmc3k dataset for demonstration purposes is the Bioconductor TENxPBMCData package.

    Suggested code:

    library(TENxPBMCData)
    tenx_pbmc3k <- TENxPBMCData(dataset = "pbmc3k")
    

    Ideally, consider using the vignette (or a separate one) to also give an introduction to the functionality of the R package. It is not necessary to duplicate information already described in the documentation of the Python package (DRY principle); you may simply include a link to the main page.

    Describe alternatives you've considered

    A working example of an R workflow could also be included in the documentation of the Python package, although this is probably unnecessarily difficult to maintain. Ideally, that example would be run and tested for every new release of the Python and R source code.

    Additional context

    Once you have an R vignette written, you should also consider using pkgdown to automatically create a GitHub website including the full package documentation.

    opened by kevinrue 5
  • Extremely slow extraction of KNN neighbours on 100k samples

    I'm using ivis[cpu] on a dataset of about 100k samples with around 200k sparse features. My training dataset is stored in an h5 file and I use the following code to fit and transform the dataset:

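    import h5py
    import pandas as pd
    from ivis import Ivis
    
    # `filename` and `meta_df` are assumed to be defined earlier in the session.
    with h5py.File(filename, 'r') as f:
        X = f['data']
        Y = pd.Categorical(meta_df["label"]).codes
        model = Ivis(epochs=5, k=15)
        model.fit(X, Y, shuffle_mode='batch')  # Shuffle batches when using h5 files
    
        embeddings = model.transform(X)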
    

    However, it takes a very long time:

    Building KNN index
    100%|██████████| 105942/105942 [55:07<00:00, 32.03it/s]
    Extracting KNN neighbours
      0%|          | 262/105942 [7:16:38<2935:20:19, 99.99s/it]
    

    2935 hours!! Am I missing something, or is this expected? Should I switch to GPU?

    By the way, I'm using a Google Colab system with 8 CPU cores, 50 GB RAM, and an SSD disk.

    opened by adavoudi 4
  • How to get stable results?

    Hello Folks,

    thank you for all the work on this lib. I have a question about reproducibility: Is there a way to set a random seed or random state and get stable results?

    I'm trying to achieve this with:

    import random
    import numpy
    random.seed(42)
    numpy.random.seed(42)
    

    I'm aware that these are not thread-safe, which may be why the results are not reproducible. Anyway, is there any way to enforce this?
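
    A slightly broader version of the seeding attempt above is sketched here. `seed_everything` is a hypothetical helper, not part of ivis. It also seeds TensorFlow when that package is importable. Whether this is sufficient for fully reproducible ivis embeddings is an assumption that is not confirmed here, since KNN index construction and multi-threaded training may add their own nondeterminism.

```python
import random


def seed_everything(seed):
    """Seed every RNG that can be imported; return the names that were seeded."""
    seeded = []
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow 2 global seed
        seeded.append("tensorflow")
    except ImportError:
        pass
    # Python's own RNG is seeded last so that import side effects cannot advance it.
    random.seed(seed)
    seeded.append("random")
    return seeded


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

    Calling the helper once at the top of a script, before any model is built, keeps all seeds in one place.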

    opened by rsarai 4
  • model_save: optimizer is not compatible with pickle

    When I call save_model after fitting a supervised Ivis instance, I get an error. It looks like some part of the optimizer cannot be pickled in Python.

    Replicate:

    import ivis
    i = ivis.Ivis(embedding_dims=10, n_epochs_without_progress=5)
    i.fit(X, y)
    i.save_model("model.ivis")
    
    Traceback (most recent call last):
      File "src/ivis_persist.py", line 69, in <module>
        ivises[output].save_model(f"models/{output}.ivis")
      File "/Users/pbaumgartner/anaconda3/envs/env/lib/python3.7/site-packages/ivis/ivis.py", line 404, in save_model
        pkl.dump(self.model_.optimizer, f)
    AttributeError: Can't pickle local object 'make_gradient_clipnorm_fn.<locals>.<lambda>'
    

    System Info: Running ivis==2.0.0 on macOS with python 3.7.
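
    The error message points at a lambda created inside a function, which pickle cannot serialise because it has no importable name. The following standard-library sketch reproduces that failure mode and shows one common workaround (functools.partial over an importable function). The helper names are hypothetical, and this is not presented as the fix applied in ivis or TensorFlow.

```python
import functools
import operator
import pickle


def make_scaler_with_lambda(factor):
    # A lambda defined inside a function: pickle cannot look it up by name.
    return lambda value: value * factor


def make_scaler_with_partial(factor):
    # functools.partial over an importable function is pickled by reference.
    return functools.partial(operator.mul, factor)


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # The exception type for local objects differs between Python versions.
        return False


lambda_ok = can_pickle(make_scaler_with_lambda(0.5))
partial_ok = can_pickle(make_scaler_with_partial(0.5))
print(lambda_ok, partial_ok)
```

    Both callables compute the same product; only the partial survives a round trip through pickle.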

    bug 
    opened by pmbaumgartner 4
  • R pkg fit() call finishes but subprocess doesn't terminate

    This model consistently feels like a magic trick, thanks for contributing!

    Bug

    I'm running the ivis R package (v1.7.1) (more system details below). I can get model$fit() and model$transform() working just fine and producing substantive results. However, when the R process finishes and returns the fitted model, I'm seeing continued sky-high system usage. The R process calling ivis is definitely completed and back to a command prompt, but in htop I can see the RStudio GUI process (parent of the rsession process) occupying at least 2 full cores. Some process further down is not stopping when the R process gets the returned value. (Restarting the R session does kill it.)

    I don't understand enough of the ivis-through-reticulate toolchain to provide more helpful diagnostics in this first report, but happy to run experiments and document further.

    Environment

    • ivis R package (v1.7.1), installed from GitHub (beringresearch/ivis@56a8479) 14 Apr 2020
    • reticulate (v1.15), 2020-04-02 CRAN (R 3.6.2)
    • R 3.6.2 on MacOS 10.14.6 (18G4032)
    platform       x86_64-apple-darwin15.6.0   
    arch           x86_64                      
    os             darwin15.6.0                
    system         x86_64, darwin15.6.0        
    status                                     
    major          3                           
    minor          6.2                         
    year           2019                        
    month          12                          
    day            12                          
    svn rev        77560                       
    language       R                           
    version.string R version 3.6.2 (2019-12-12)
    nickname       Dark and Stormy Night  
    
    opened by sheffe 4
  • InternalError: Graph execution error:

    Hello, I want to use ivis to do the analysis for my scRNA-seq data.

    Here is my code:

    def getReduction(X):
        #X = PCA(n_components=4, copy=True, random_state=1).fit_transform(X)
        from ivis import Ivis
        model = Ivis(embedding_dims=4, k=15)
        X = model.fit_transform(X)
        print(X.shape)
        return X
    

    but I got some errors:

    ---------------------------------------------------------------------------
    InternalError                             Traceback (most recent call last)
    Input In [9], in <cell line: 1>()
    ----> 1 multi_train_x = getReduction(train_x)
    
    Input In [8], in getReduction(X)
          3 from ivis import Ivis
          4 model = Ivis(embedding_dims=6, k=15)
    ----> 5 X = model.fit_transform(X)
          6 print(X.shape)
          7 return X
    
    File /opt/conda/lib/python3.8/site-packages/ivis/ivis.py:368, in Ivis.fit_transform(self, X, Y, shuffle_mode)
        349 def fit_transform(self, X, Y=None, shuffle_mode=True):
        350     """Fit to data then transform
        351 
        352     Parameters
       (...)
        365         Embedding of the data in low-dimensional space.
        366     """
    --> 368     self.fit(X, Y, shuffle_mode)
        369     return self.transform(X)
    
    File /opt/conda/lib/python3.8/site-packages/ivis/ivis.py:346, in Ivis.fit(self, X, Y, shuffle_mode)
        328 def fit(self, X, Y=None, shuffle_mode=True):
        329     """Fit an ivis model.
        330 
        331     Parameters
       (...)
        343         Returns estimator instance.
        344     """
    --> 346     self._fit(X, Y, shuffle_mode)
        347     return self
    
    File /opt/conda/lib/python3.8/site-packages/ivis/ivis.py:318, in Ivis._fit(self, X, Y, shuffle_mode)
        315 if self.verbose > 0:
        316     print('Training neural network')
    --> 318 hist = self.model_.fit(
        319     datagen,
        320     epochs=self.epochs,
        321     callbacks=self.callbacks_ + [EarlyStopping(monitor='loss',
        322                                                patience=self.n_epochs_without_progress)],
        323     shuffle=shuffle_mode,
        324     steps_per_epoch=int(np.ceil(X.shape[0] / self.batch_size)),
        325     verbose=self.verbose)
        326 self.loss_history_ += hist.history['loss']
    
    File /opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
         65 except Exception as e:  # pylint: disable=broad-except
         66   filtered_tb = _process_traceback_frames(e.__traceback__)
    ---> 67   raise e.with_traceback(filtered_tb) from None
         68 finally:
         69   del filtered_tb
    
    File /opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
         52 try:
         53   ctx.ensure_initialized()
    ---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
         55                                       inputs, attrs, num_outputs)
         56 except core._NotOkStatusException as e:
         57   if name is not None:
    
    InternalError: Graph execution error:
    
    Detected at node 'model_1/model/dense/MatMul' defined at (most recent call last):
        File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
          exec(code, run_globals)
        File "/opt/conda/lib/python3.8/site-packages/ipykernel_launcher.py", line 17, in <module>
          app.launch_new_instance()
        File "/opt/conda/lib/python3.8/site-packages/traitlets/config/application.py", line 846, in launch_instance
          app.start()
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 712, in start
          self.io_loop.start()
        File "/opt/conda/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 199, in start
          self.asyncio_loop.run_forever()
        File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
          self._run_once()
        File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
          handle._run()
        File "/opt/conda/lib/python3.8/asyncio/events.py", line 81, in _run
          self._context.run(self._callback, *self._args)
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 504, in dispatch_queue
          await self.process_one()
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 493, in process_one
          await dispatch(*args)
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 400, in dispatch_shell
          await result
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 724, in execute_request
          reply_content = await reply_content
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/ipkernel.py", line 383, in do_execute
          res = shell.run_cell(
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/zmqshell.py", line 528, in run_cell
          return super().run_cell(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2880, in run_cell
          result = self._run_cell(
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2935, in _run_cell
          return runner(coro)
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
          coro.send(None)
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3134, in run_cell_async
          has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3337, in run_ast_nodes
          if await self.run_code(code, result, async_=asy):
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3397, in run_code
          exec(code_obj, self.user_global_ns, self.user_ns)
        File "/tmp/ipykernel_1917/2291785529.py", line 1, in <cell line: 1>
          multi_train_x = getReduction(train_x)
        File "/tmp/ipykernel_1917/2290316524.py", line 5, in getReduction
          X = model.fit_transform(X)
        File "/opt/conda/lib/python3.8/site-packages/ivis/ivis.py", line 368, in fit_transform
          self.fit(X, Y, shuffle_mode)
        File "/opt/conda/lib/python3.8/site-packages/ivis/ivis.py", line 346, in fit
          self._fit(X, Y, shuffle_mode)
        File "/opt/conda/lib/python3.8/site-packages/ivis/ivis.py", line 318, in _fit
          hist = self.model_.fit(
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1409, in fit
          tmp_logs = self.train_function(iterator)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1051, in train_function
          return step_function(self, iterator)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1040, in step_function
          outputs = model.distribute_strategy.run(run_step, args=(data,))
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1030, in run_step
          outputs = model.train_step(data)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 889, in train_step
          y_pred = self(x, training=True)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 490, in __call__
          return super().__call__(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1014, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 458, in call
          return self._run_internal_graph(
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 596, in _run_internal_graph
          outputs = node.layer(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 490, in __call__
          return super().__call__(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1014, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 458, in call
          return self._run_internal_graph(
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 596, in _run_internal_graph
          outputs = node.layer(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1014, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/layers/core/dense.py", line 221, in call
          outputs = tf.matmul(a=inputs, b=self.kernel)
    Node: 'model_1/model/dense/MatMul'
    Attempting to perform BLAS operation using StreamExecutor without BLAS support
    	 [[{{node model_1/model/dense/MatMul}}]] [Op:__inference_train_function_1703]
    
    

    Thanks !!!

    opened by bitcometz 3
  • Add conda-forge package

    In addition to the pypi package, please add a conda-forge package (https://conda-forge.org).

    I can give support if needed.

    You can easily create a boilerplate conda recipe with grayskull (starting from the pypi package): https://github.com/conda-incubator/grayskull (note: the "annoy" package is called "python-annoy" in conda-forge).

    opened by candalfigomoro 0
  • Distance-weighted random sampling of non-neighbor negatives

    Not a fully-baked feature request, just a directional hunch. I've found the conclusions from this paper Sampling Matters in Deep Embedding Learning pretty intuitive -- (1) the method for choosing negative samples is critical to the overall embedding, maybe more than the specific loss function, and (2) a distance-weighted sampling of negatives had some nice properties during training and better results compared to uniform random sampling or oversampling hard cases.

    I'm brand-new to Annoy, not confident on the implementation details or performance changes here, but I suspect that the prebuilt index could be used for both positive and negative sampling. An example: the current approach draws random negatives in sequence and chooses the first index not in a neighbor list. A distance-weighted approach for choosing a negative for each triplet might work like this:

    • Draw a random set of candidate negatives
    • Drop any candidate negatives already in the neighbor list
    • Choose from the remaining set of candidates with probabilities proportional to f(dist(i, j)), where f(dist) could be just 1/dist, 1/sqrt(dist), etc., so that closer candidates are more likely to be picked

    Annoy gives us the dist(i, j) without much of a performance hit. Weighted choice of the candidate negatives puts a (tunable) thumb on the scale for triplets that contain closer/harder-negative matches.

    This idea probably does increase some hyperparameter selection headaches. I think the impactful choices here are the size of the initial set of candidate negatives and (especially) f(dist).
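The three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not ivis code: `sample_weighted_negative`, `n_candidates`, and `weight_fn` are hypothetical names, and a brute-force Euclidean distance stands in for the distance that an Annoy index would return. The default weight of 1/dist decreases with distance, so closer (harder) candidates are favoured.

```python
import numpy as np


def sample_weighted_negative(anchor, embeddings, neighbour_ids, n_candidates=50,
                             weight_fn=lambda d: 1.0 / d, rng=None):
    """Pick one negative index for `anchor`, favouring closer candidates.

    Hypothetical helper for illustration; distances are computed by brute
    force here, whereas the proposal would read them from the Annoy index.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_points = embeddings.shape[0]

    # 1. Draw a random set of candidate negatives (without replacement).
    candidates = rng.choice(n_points, size=min(n_candidates, n_points), replace=False)

    # 2. Drop the anchor itself and anything already in its neighbour list.
    excluded = np.append(np.asarray(neighbour_ids, dtype=int), anchor)
    candidates = candidates[~np.isin(candidates, excluded)]
    if candidates.size == 0:
        raise ValueError("no valid candidates left; increase n_candidates")

    # 3. Choose one candidate with probability proportional to weight_fn(dist).
    dists = np.linalg.norm(embeddings[candidates] - embeddings[anchor], axis=1)
    weights = weight_fn(np.maximum(dists, 1e-12))  # guard against zero distance
    return int(rng.choice(candidates, p=weights / weights.sum()))
```

The two tunable choices mentioned in the post map onto `n_candidates` (size of the initial candidate set) and `weight_fn` (the function f); passing `weight_fn=lambda d: np.ones_like(d)` would fall back to uniform sampling among the non-neighbours.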

    opened by sheffe 2
  • Custom generator for training on out-of-memory datasets

    In https://bering-ivis.readthedocs.io/en/latest/oom_datasets.html, for out-of-memory datasets, you say to train on h5 files that exist on disk.

    In my case, I can't use h5 files, but I could use a custom generator which yields numpy array batched data.

    Is there a way to provide batched data through a custom generator function? Something like keras' fit_generator.

    Thank you
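For concreteness, the kind of generator described in the question might look like the sketch below. It is a hypothetical illustration only: `batch_generator` is not part of ivis, and nothing here shows how (or whether) ivis would consume such a generator. The only assumption about `X` is that it supports row slicing, for example a `numpy.memmap` or another lazily loaded array.

```python
import numpy as np


def batch_generator(X, batch_size):
    """Yield consecutive row batches of X as in-memory numpy arrays.

    Hypothetical helper: X can be any object supporting row slicing, so only
    one batch needs to be held in memory at a time.
    """
    n_rows = X.shape[0]
    for start in range(0, n_rows, batch_size):
        # np.asarray materialises just this slice of the underlying data.
        yield np.asarray(X[start:start + batch_size])


# Example with a small in-memory array standing in for on-disk data.
data = np.arange(20).reshape(10, 2)
batches = list(batch_generator(data, batch_size=4))
```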

    opened by candalfigomoro 5
Releases(2.08)
  • 2.07(Mar 10, 2022)

    • Added ability to save/load ivis models that have not been trained. This also fixes an issue when using GridSearchCV in conjunction with ivis
    • Bugfix for triplet generator when used in conjunction with a dataset exposing the custom get_triplet_data method
  • 2.06(Oct 17, 2021)

    New features:

    • ivis models are now serializable via pickle/dill/joblib. Thanks to @imatheussm for his contributions toward this.
    • The save_model method now accepts an optional "save_format" argument. Setting it to "tfs" will export ivis models in the TensorFlow SavedModel format, which integrates well with other TensorFlow libraries.
  • 2.0.5rc1(Jun 4, 2021)

    • KNN retrieval made more efficient by switching from multi-processing to multi-threading. Memory savings depend on OS and core count.
    • Fixed issue where saved ivis models would attempt to load the index at the path they were saved with - this can't be relied on when the index is temporary and deleted after use.
    • Fixed issue where Annoy Index metric parameter was not passed to an index that was loaded from disk.
    • Other changes include better error handling, cleaner code, and support for saving AnnoyKnnMatrix via pickle.
  • 2.0.5(Jul 13, 2021)

    Highlights:

    • Improved training speed for numpy array inputs thanks to a faster triplet generator.
    • Batched retrieval capabilities that make ivis much faster when training on out-of-memory data that is retrieved in parallel.
    • Improved performance when using Ivis with precompute=False option by using multi-threading when retrieving batches of KNN on-demand.
    • Added deprecation notices for minor upcoming changes to API for consistency and adherence to sklearn API.
  • 2.0.3(May 26, 2021)

  • 2.02(Apr 15, 2021)

  • 2.0.1(Jan 6, 2021)

  • 2.0.0(Dec 8, 2020)

    Major ivis release!

    Version 2.0 features:

    • Unsupervised, semi-supervised, and fully supervised dimensionality reduction
    • Support for arbitrary datasets:
      • N-dimensional arrays
      • Image files on disk
      • Custom data connectors
    • In- and out-of-memory data ingestion
    • Resumable training
    • Arbitrary neural network backbones
    • Customizable neighbour retrieval
    • Callbacks and Tensorboard integration
  • 1.8.4(Nov 2, 2020)

  • 1.8.3(Oct 28, 2020)

  • 1.8.2(Oct 28, 2020)

  • 1.8.1(Jun 11, 2020)

  • 1.8.0(May 13, 2020)

    • Introducing neighbour_matrix parameter for provision of arbitrary KNNs.
    • Transition to tf.Datasets, improving memory efficiency and overall stability
  • 1.7.0(Jan 7, 2020)

  • 1.6.0(Oct 29, 2019)

    Major features:

    • Support for semi-supervised dimensionality reduction
    • Switch from using fit_generator to fit for training the Keras model
    • Address eager execution issues with TF 2.0
    • User-configurable on-disk-building of Annoy index.
    • Tidy handling of interrupted multi-thread processes

    Minor features:

    • Tests for semi-supervised DR
    • Improved input validation
    • Better hyperparameter validation
    • Slight changes to default hyperparameters
    • Bug fixes

  • 1.5.3(Oct 3, 2019)

    • Control eager execution
    • R package updates and improvements
    • Save ivis object with a custom model
    • Bug squashes and performance improvements
  • 1.5.0(Oct 1, 2019)

  • 1.4.1(Sep 5, 2019)

  • 1.4.0(Aug 19, 2019)

    A number of major additions:

    • Support for both classification- and regression-type supervision
    • Access to all Keras losses for supervised dimensionality reduction
    • Bug fixes and performance improvements
  • 1.3.0(Aug 6, 2019)

    This release introduces a number of new features into ivis:

    • Windows support
    • Code changes to support ivis on Python2
    • R package received a major facelift - with big thanks to JOSS reviewers
    • Added cosine distance metric in triplet loss function
    • Minor bug fixes and performance improvements
  • 1.2.4(Aug 5, 2019)

  • 1.2.3-joss(Aug 5, 2019)

  • 1.2.3(Jul 4, 2019)

  • 1.2.2(Jul 2, 2019)

  • 1.2.1(Jul 2, 2019)

  • 1.2.0(Jul 2, 2019)

    Supervised mode added to ivis. Additional features:

    • Add classification_weight parameter to allow users to tune balance between classification vs. triplet loss.
    • Add Ivis callbacks module for ivis-specific callbacks such as checkpointing during training. Ivis object code changed to deal with provided callbacks.
    • Tensorboard callbacks
    • Sparse matrix support in supervised mode
  • 1.1.5(Jun 25, 2019)

    Significant improvement in processing speed for both the precompute=True and precompute=False options, achieved by using a Keras Sequence generator. Addresses #21.

  • 1.1.4(Jun 20, 2019)

A central task in drug discovery is searching, screening, and organizing large chemical databases

A central task in drug discovery is searching, screening, and organizing large chemical databases. Here, we implement clustering on molecular similarity. We support multiple methods to provide an interactive exploration of chemical space.

NVIDIA Corporation 124 Jan 7, 2023
Visualizations for machine learning datasets

Introduction The facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive

PAIR code 7.1k Jan 7, 2023
Visualize and compare datasets, target values and associations, with one line of code.

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat

Francois Bertrand 2.3k Jan 5, 2023
The open-source tool for building high-quality datasets and computer vision models

The open-source tool for building high-quality datasets and computer vision models. Website • Docs • Try it Now • Tutorials • Examples • Blog • Commun

Voxel51 2.4k Jan 7, 2023
Draw datasets from within Jupyter.

drawdata This small python app allows you to draw a dataset in a jupyter notebook. This should be very useful when teaching machine learning algorithm

vincent d warmerdam 505 Nov 27, 2022
Glue is a python project to link visualizations of scientific datasets across many files.

Glue Glue is a python project to link visualizations of scientific datasets across many files. Click on the image for a quick demo: Features Interacti

null 675 Dec 9, 2022
HM02: Visualizing Interesting Datasets

HM02: Visualizing Interesting Datasets This is a homework assignment for CSCI 40 class at Claremont McKenna College. Go to the project page to learn m

Qiaoling Chen 11 Oct 26, 2021
HW 2: Visualizing interesting datasets

HW 2: Visualizing interesting datasets Check out the project instructions here! Mean Earnings per Hour for Males and Females My first graph uses data

null 7 Oct 27, 2021
Learning Convolutional Neural Networks with Interactive Visualization.

CNN Explainer An interactive visualization system designed to help non-experts learn about Convolutional Neural Networks (CNNs) For more information,

Polo Club of Data Science 6.3k Jan 1, 2023
Dipto Chakrabarty 7 Sep 6, 2022
Statistical data visualization using matplotlib

seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing

Michael Waskom 10.2k Dec 30, 2022
Interactive plotting for Pandas using Vega-Lite

pdvega: Vega-Lite plotting for Pandas Dataframes pdvega is a library that allows you to quickly create interactive Vega-Lite plots from Pandas datafra

Altair 342 Oct 26, 2022
basemap - Plot on map projections (with coastlines and political boundaries) using matplotlib.

Basemap Plot on map projections (with coastlines and political boundaries) using matplotlib. ⚠️ Warning: this package is being deprecated in favour of

Matplotlib Developers 706 Dec 28, 2022