Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Overview


Non-Metric Space Library (NMSLIB)

Important Notes

  • NMSLIB is generic but fast; see the results of ANN benchmarks.
  • A standalone implementation of our fastest method, HNSW, also exists as a header-only library.
  • All the documentation (including using the Python bindings and the query server, descriptions of methods and spaces, building the library, etc.) can be found on this page.
  • For generic questions/inquiries, please use the Gitter chat; the GitHub issues page is for bugs and feature requests.

Objectives

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does not have any third-party dependencies. The library has been gaining popularity; in particular, it has become a part of Amazon Elasticsearch Service.

The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular on methods for non-metric spaces. NMSLIB is possibly the first library with principled support for non-metric space searching.

NMSLIB is an extendible library, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is also possible to build a query server, which can be used from Java (or other languages supported by Apache Thrift, version 0.12). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.
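
As a quick illustration of the Python bindings, here is a minimal sketch of building and querying an HNSW index (the data and parameter values are illustrative; the API calls match the examples in the issues below):

    import numpy as np
    import nmslib

    # 1000 random 32-dimensional vectors serve as the indexed data
    data = np.random.randn(1000, 32).astype(np.float32)

    index = nmslib.init(method='hnsw', space='cosinesimil')
    index.addDataPointBatch(data)
    index.createIndex({'M': 16, 'efConstruction': 200}, print_progress=False)
    index.setQueryTimeParams({'ef': 50})

    # ids and distances of the 5 approximate nearest neighbors of the first point
    ids, distances = index.knnQuery(data[0], k=5)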

Authors: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak. With contributions from Ben Frederickson, Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, @gregfriedland, Scott Gigante, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

Brief History

NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base, the Python bindings, and participated in earlier evaluations. The most successful class of methods, neighborhood/proximity graphs, is represented by the Hierarchical Navigable Small World graph (HNSW) due to Malkov and Yashunin (see the publications below). Other particularly useful methods include a modification of the VP-tree due to Boytsov and Naidan (2013), a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and improved by David Novak, as well as a vanilla uncompressed inverted file.

Credits and Citing

If you find this library useful, feel free to cite our SISAP paper [BibTex] as well as other papers listed at the end. One crucial contribution to cite is the fast Hierarchical Navigable Small World graph (HNSW) method [BibTex]. Please also check out the stand-alone HNSW implementation by Yury Malkov, which is released as the header-only HNSWLib library.

License

The code is released under the Apache License Version 2.0 (http://www.apache.org/licenses/). Older versions of the library included additional components, which have different licenses (this does not apply to NMSLIB 2.x):

  • The LSHKIT, which is embedded in our library, is distributed under the GNU General Public License, see http://www.gnu.org/licenses/.
  • The k-NN graph construction algorithm NN-Descent due to Dong et al. 2011 (see the links below), which is also embedded in our library, appears to be covered by a free-to-use license similar to Apache 2.
  • The FALCONN library's license is MIT.

Funding

Leonid Boytsov was supported by the Open Advancement of Question Answering Systems (OAQA) group and by NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". Bileg was supported by the iAd Center.

Related Publications

The most important related papers are listed below in chronological order:

Comments
  • Add support to build aarch64 wheels

    Travis-CI allows for the creation of aarch64 wheels.

    Build: https://travis-ci.com/github/janaknat/nmslib/builds/205780637

    There are 8-9 failures when testing hnsw. Any suggestions on how to fix these? A majority of the failures are due to expected=0.99 and calculated=~0.98.

    Tagging @jmazanec15 since he added ARM compatibility.

    opened by janaknat 33
  • Speed up pip install

    Currently pip installing is slow, since there is a compile step. Is there any way to speed it up? On my macbook:

    time pip install --no-cache nmslib
    Collecting nmslib
      Downloading https://files.pythonhosted.org/packages/e1/95/1f7c90d682b79398c5ee3f9296be8d2640fa41de24226bcf5473c801ada6/nmslib-1.7.3.6.tar.gz (255kB)
        100% |████████████████████████████████| 256kB 8.8MB/s 
    Requirement already satisfied: pybind11>=2.0 in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (2.2.4)
    Requirement already satisfied: numpy in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (1.15.4)
    Installing collected packages: nmslib
      Running setup.py install for nmslib ... -
    done
    Successfully installed nmslib-1.7.3.6
    
    real	3m11.091s
    

    Would it be a good idea to provide pre-compiled wheels over pip? That would also simplify the process of finding the pybind11 headers (I had to do something special to copy them in for pip when running with a --target dir).

    opened by matthen 33
  • Can't load index?

    Hi, this might be more of a question than a problem in the library. I have created an index with NAPP and saved it using saveIndex. However, when I load it with loadIndex I get the following error:

    Check failed: A previously saved index is apparently used with a different data set, a different data set split, and/or a different gold standard file! (detected an object index >= #of data points

    Am I doing something wrong?

    Thanks for the help.

    EDIT: The message doesn't make sense to me because I'm not "using the index with a data set", I'm just loading it.

    EDIT2: I'm using the Python interface.

    enhancement 
    opened by zommerfelds 31
  • Custom Metrics

    Hello,

    I wanted to perform NN search on a dataset of genomes. For this task, the distance between two data points is calculated by a custom script. Is there a way I can incorporate this without having to create the entire NN search algorithm myself and only modify some parts of your code?

    opened by Chokerino 30
  • Python process crashes: 'pybind11::error_already_set'

    nmslib is the only lib in our project that relies on pybind11, and we could narrow the crash down to the Dask nodes that use nmslib. When we disable the nodes that use nmslib, it doesn't crash.

    terminate called after throwing an instance of 'pybind11::error_already_set'
      what():  TypeError: '>=' not supported between instances of 'int' and 'NoneType'
    
    At:
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1546): isEnabledFor
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1293): debug
    
    /usr/local/bin/entrypoint.sh: line 46:    21 Aborted                 (core dumped) python scripts/cli.py "${@:2}"
    

    Version:

    - nmslib~=1.7.2
    - pybind11=2.2
    
    opened by lukin0110 28
  • Make failed in linking Boost library

    Hello,

    I am facing an error in this step:

    [ 75%] Linking CXX executable ../release/experiment

    All of the errors look like this:

    undefined reference to `boost::program_options:

    I installed the latest library versions and checked that libboost 1.58 is compatible with g++ 4.9. I think it may be related to C++11; however, it returns errors with both g++ 4.9 and 4.7.

    This is my system information:

    -- The C compiler identification is GNU 4.9.3
    -- The CXX compiler identification is GNU 4.9.3
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Build type: Release
    -- GSL using gsl-config /usr/bin/gsl-config
    -- Using GSL from /usr
    -- Found GSL.
    -- Found Eigen3: /usr/include/eigen3 (Required is at least version "3")
    -- Found Eigen3.
    -- Boost version: 1.58.0
    -- Found the following Boost libraries:
    --   system
    --   filesystem
    --   program_options
    -- Found BOOST.

    I also installed Clang and LLDB 3.6. I tried searching for possible solutions but could not fix it :(.

    opened by nguyenv7 26
  • Python wrapper crashes while retrieving nearest neighbors when M>100

    Hi, I am working on a problem where I need to retrieve ~500 nearest neighbors out of a million points. I am using the Python wrapper for the HNSW method. The code works perfectly well if I set the value of the parameter M <= 100, but with M greater than 100 the code crashes while retrieving nearest neighbors (there are no issues while building the model) with an "invalid next size" error. Any idea why this might be happening? Thanks, Himanshu

    bug 
    opened by hjain689 25
  • Incorrect distances returned for all-zero query

    An all-zero query vector will result in NMSLib incorrectly reporting a distance of zero for its nearest neighbours (see example below). Is this related to #187? Is there a suggested workaround?

    # Training set (CSR sparse matrix)
    X.todense()
    # Out:
    # matrix([[4., 2., 3., 1., 0., 0., 0., 0., 0.],
    #         [2., 1., 0., 0., 3., 0., 1., 2., 1.],
    #         [4., 2., 0., 0., 3., 1., 0., 0., 0.]], dtype=float32)
    
    # Query vector (CSR sparse matrix)
    r.todense()
    # Out:
    # matrix([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
    
    # Train and query
    import nmslib
    index = nmslib.init(
        method='hnsw',
        space='cosinesimil_sparse_fast',
        data_type=nmslib.DataType.SPARSE_VECTOR,
        dtype=nmslib.DistType.FLOAT)
    index.addDataPointBatch(X)
    index.createIndex()
    index.knnQueryBatch(r, k=3)
    # Out:
    # [(array([2, 1, 0], dtype=int32), array([0., 0., 0.], dtype=float32))]
    
    # Note that distances are all 0, which is incorrect!
    # Same result for dense training & query vectors.
    
    bug 
    opened by lsorber 24
  • Jaccard for the HNSW method with sparse features

    Hi,

    I want to know if HNSW provides Jaccard (similarity or distance, it does not matter), besides cosine, for sparse features. There are scenarios in which Jaccard outperforms cosine.

    The Python notebooks provided show the following metrics: l2, l2sqr_sift, cosinesimil_sparse.

    According to space_sparse_scalar.h, the following metrics seem to be implemented, or in preparation, for sparse features:

    #define SPACE_SPARSE_COSINE_SIMILARITY "cosinesimil_sparse"
    #define SPACE_SPARSE_ANGULAR_DISTANCE "angulardist_sparse"
    #define SPACE_SPARSE_NEGATIVE_SCALAR "negdotprod_sparse"
    #define SPACE_SPARSE_QUERY_NORM_NEGATIVE_SCALAR "querynorm_negdotprod_sparse"
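
    For reference, standard definitions consistent with these names and with the v1.6 release notes below (the library's exact conventions may differ slightly):

    d_{\mathrm{cosinesimil}}(x,q) = 1 - \frac{x \cdot q}{\lVert x\rVert\,\lVert q\rVert}
    d_{\mathrm{angulardist}}(x,q) = \arccos\frac{x \cdot q}{\lVert x\rVert\,\lVert q\rVert}
    d_{\mathrm{negdotprod}}(x,q) = -\,x \cdot q
    d_{\mathrm{querynorm\_negdotprod}}(x,q) = -\,\frac{x \cdot q}{\lVert q\rVert}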

    What does each of these metrics mean? I also saw cosinesimil_sparse_fast in a few files. What is it, and how does it compare to cosinesimil_sparse? Is it ready for use?

    I can provide a Jaccard implementation for sparse vectors, given two vectors implemented as hash tables, but I haven't found out how to integrate it into the code. It would also be preferable to check which metrics are already available. The closest clue I got was to expand the following files: distcomp_scalar.cc, hnsw.cc, and hnsw_distfunc_opt.cc, but I am not sure what steps to take. I saw some mentions of Jaccard in space_sparse_jaccard.cc and distcomp.h, but no examples are given.

    Thanks in advance.

    opened by icarocd 24
  • pybind11.h not found when installing using pip

    I'm trying to install the Python bindings on an Ubuntu 16.04 machine:

    $ pip3 install pybind11 nmslib
    Collecting nmslib
      Using cached https://files.pythonhosted.org/packages/de/eb/28b2060bb1750426c5618e3ad6ce830ac3cfd56cb3eccfb799e52d6064db/nmslib-1.7.2.tar.gz
    Requirement already satisfied: pybind11>=2.0 in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (2.2.2)
    Requirement already satisfied: numpy in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (1.14.2)
    Building wheels for collected packages: nmslib
      Running setup.py bdist_wheel for nmslib ... error
      Complete output from command /homes/alexandrov/.virtualenvs/pytorch/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-0y71oxa4/nmslib/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-916r1rr9 --python-tag cp35:
      running bdist_wheel
      running build
      running build_ext
      creating tmp
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpwekdswov.cpp -o tmp/tmpwekdswov.o -std=c++14
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpyyphh022.cpp -o tmp/tmpyyphh022.o -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      building 'nmslib' extension
      creating build
      creating build/temp.linux-x86_64-3.5
      creating build/temp.linux-x86_64-3.5/nmslib
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/method
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/space
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I./nmslib/similarity_search/include -Iinclude -Iinclude -I/homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c nmslib.cc -o build/temp.linux-x86_64-3.5/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO="1.7.2" -std=c++14 -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      nmslib.cc:16:31: fatal error: pybind11/pybind11.h: No such file or directory
      compilation terminated.
      error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    

    Clearly, pybind11 headers were not installed on my machine. This library is not packaged for apt-get (at least not for Ubuntu 16.04), so I needed to manually install from source.

    It would be nice if the nmslib install script took care of this.

    opened by taketwo 23
  • Optimized index raises RuntimeError on load when saved with `negdotprod` space

    Basically, this is what I am trying to do:

    import nmslib
    
    space = 'negdotprod'
    
    vectors = [[1, 2], [3, 4], [5, 6]]
    
    index = nmslib.init(space=space, method='hnsw')
    index.addDataPointBatch(vectors)
    index.createIndex(
        {'M': 15, 'efConstruction': 200, 'skip_optimized_index': 0, 'post': 0}
    )
    index.saveIndex('test.index')
    
    new_index = nmslib.init(space=space, method='hnsw')
    new_index.loadIndex('test.index')
    

    and it raises

    Check failed: totalElementsStored_ == this->data_.size() The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    Traceback (most recent call last):
      File "8.py", line 15, in <module>
        new_index.loadIndex('test.index')
    RuntimeError: Check failed: The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    

    If I change the space variable to cosinesimil, it works just fine. It seems that the data points are not stored, even though the hnsw method with skip_optimized_index=0 is used.
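
    A hedged workaround sketch, assuming nmslib >= 1.8, where saveIndex/loadIndex accept flags to persist the raw data together with the index (see the v1.8 release notes below):

    index.saveIndex('test.index', save_data=True)    # also store the data points

    new_index = nmslib.init(space=space, method='hnsw')
    new_index.loadIndex('test.index', load_data=True)  # re-load the stored data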

    opened by chomechome 22
  • Unable to pip install nmslib, including historic versions

    Hey sorry to bother you,

    I've been trying to download scispacy via pip on Windows 10 using Python 3.10.0 today, and it keeps failing due to errors about nmslib. I've tried pip installing nmslib versions 1.7.3.6, 1.8, and 2.1.1.

    None of them have worked, though, curiously. I've had a long look around scispacy's GitHub and yours, but nothing I've read has given me any solutions.

    I've also flagged it with scispacy on their GitHub. Anyway, I have no idea what's going on but just thought I'd let you know. Cheers and kind regards, Chris

    opened by Cbezz 5
  • Strict typing is needed: Using wrong input can cause distances to be all one, e.g., with cosinesimil_sparse/HNSW when calling knnQueryBatch on a dense array

    Hey, I'm trying to use nmslib's HNSW with a csr_matrix containing sparse vectors.

    Creating the index works fine, adding the data and setting query time params too:

        items = ["foo is a kind of thing", "bar is another one", "this bar is a real one!", "I prefer to use a foo"] # etc, len=3000
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        similar_items_index.addDataPointBatch(embeddings)
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        similar_items_index.setQueryTimeParams({"ef": 512})
    

    But when I search with knnQueryBatch, all the returned distances are equal to 1:

    similar_items_index.knnQueryBatch([query_embedding], 5)[0]
    

    -> Knn results: ids, with distances all set to 1

    Am I missing something in the proper usage of HNSW with sparse vector data?

    Setup for reproduction
    • This uses the text-similarity data from Kaggle, downloaded in /tmp/. Any other text dataset should be fine, as computing similarity scores is not required to see the problem with returned distances.
    
    import csv
    from typing import Dict
    
    import nmslib
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    
    CSV_PATH = "/tmp/data/"
    
    
    def main():
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        items = set()
        ids: Dict[str, int] = {}
        rids: Dict[int, str] = {}
        similarities = {}
        for file in [
            f"{CSV_PATH}/similarity-test.csv",
            f"{CSV_PATH}/similarity-train.csv",
        ]:
            with open(file) as f:
                reader = csv.reader(f, delimiter=",", quotechar="|")
                header = next(reader)
                for i, l in enumerate(reader):
                    desc_x = l[header.index("description_x")]
                    desc_y = l[header.index("description_y")]
                    similar = bool(l[header.index("same_security")])
                    id = len(items)
                    if desc_x not in items:
                        items.add(desc_x)
                        ids[desc_x] = id
                        rids[id] = desc_x
                        id_x = id
                        id += 1
                    else:
                        id_x = ids[desc_x]
                    if desc_y not in items:
                        items.add(desc_y)
                        ids[desc_y] = id
                        rids[id] = desc_y
                        id_y = id
                        id += 1
                    else:
                        id_y = ids[desc_y]
                    if similar:
                        similarities[id_x] = id_y
                        similarities[id_y] = id_x
        print(f"Loaded {len(items)}, total {len(similarities)/2} pairs of similar queries.")
        vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        print("Embedded items, adding datapoints..")
        similar_items_index.addDataPointBatch(embeddings)
        print("Creating index..")
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        print("Setting index query params..")
        similar_items_index.setQueryTimeParams({"ef": 512})
        print("Searching...")
        score = 0
        total_similar = 0
        for item_id, item in enumerate(items):
            query_embedding = vectorizer.transform([item]).getrow(0).toarray()
            top_50, distances = similar_items_index.knnQueryBatch([query_embedding], 50)[0]
            top_50_texts = [rids[t] for t in top_50]
            try:
                expected = similarities[item_id]
                expected_text = rids[expected]
                if expected:
                    score += 1 if expected in top_50 else 0
            except KeyError:
                continue  # No similar noted on this item.
            total_similar += 1
        print(
            f"After querying {len(items)} of which {total_similar}, we found the similar item in the top50 {score} times."
        )
    
    
    if __name__ == "__main__":
        main()
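
    A hedged sketch of the fix the title suggests: keep the query a sparse csr_matrix (matching DataType.SPARSE_VECTOR) instead of densifying it with toarray():

    # query_embedding stays a 1xN csr_matrix, as the sparse space expects
    query_embedding = vectorizer.transform([item])
    top_50, distances = similar_items_index.knnQueryBatch(query_embedding, k=50)[0]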
    
    opened by PLNech 6
  • More encompassing approach for Mac M1 chips

    On a Mac, platform.processor may return i386 even on a Mac M1. The code below should be more accurate. See this Stack Overflow comment, another Stack Overflow comment, and a Stack Overflow post for some more information / validation that the uname approach is more all-encompassing.

    I was personally running into this problem and the following fix solved it for me.

    This PR is a slightly edited solution to what is contained in https://github.com/nmslib/nmslib/pull/485 with many thanks to @netj for getting this started.
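
    A minimal sketch of the uname-based check described above (illustrative; not the PR's exact diff):

    import platform

    # platform.processor() can report "i386" even on Apple Silicon (e.g., under
    # Rosetta), while platform.uname().machine reports "arm64" on an M1 Mac.
    def is_apple_silicon() -> bool:
        u = platform.uname()
        return u.system == "Darwin" and u.machine == "arm64"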

    opened by JewlsIOB 3
  • Calling setQueryTimeParams results in a SIGSEGV

    Hi there! Trying to perform knnQuery on an indexed csr_matrix, I got the issue reported in #480 from this code:

            model = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
            embeddings = model.fit_transform(corpus_tfidf)
            logger.info(f"Creating vector index from a {len(corpus_tfidf)} corpus embedded as {embeddings.shape}...")
            index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
            logger.info("Adding datapoints to index...")
            index.addDataPointBatch(embeddings)
            logger.info("Creating final index...")
            index.createIndex()
    
        logger.info(f"Searching neighbors for first embedding {embeddings[0]}")
            index.knnQuery(embeddings[0])
    

    As described in #480, this results in an IndexError: tuple index out of range.

    When trying to apply the index.setQueryTimeParams({'efSearch': efS, 'algoType': 'old'}) workaround mentioned in another issue, it results in a segmentation fault.

    I can reproduce it with the following minimal example; it looks like the call errors even without arguments:

    index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
    print("Setting index queryParams...")
    index.setQueryTimeParams()
    print("Adding datapoints to index...")
    

    ->

    Setting index queryParams...
    Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
    

    Env info

    • python -V -> Python 3.7.11
    • pip freeze | grep nmslib -> nmslib==2.1.1
    opened by PLNech 3
  • NMSLIB doesn't work on Windows 11

    Hello,

    We use nmslib as the default engine for TensorFlow Similarity due to its broad compatibility with various OSes. We got multiple reports, which I was able to confirm, that nmslib doesn't install on Windows 11, potentially related to issue #498.

    Do you have any idea if/when you will be able to take a look at this? With the increased adoption of Win11, it is becoming problematic for us.

    Thanks :)

    opened by ebursztein 15
Releases(v2.1.1)
  • v2.1.1(Feb 3, 2021)

    Note: We unfortunately had deployment issues. As a result, we had to delete several versions between 2.0.6 and 2.1.1. If you installed one of these versions, please delete it and install a more recent version (>=2.1.1).

    The current build focuses on:

    1. Providing more efficient ("optimized") implementations for the spaces negdotprod, l1, and linf (see the sketch after this list).
    2. Binaries for ARM 64 (aarch64).
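
    A hedged sketch of selecting one of these spaces from the Python bindings (the method choice is illustrative):

    import nmslib

    # any of the newly optimized dense-vector spaces can be requested by name
    index = nmslib.init(method='hnsw', space='l1')  # likewise 'linf' or 'negdotprod'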
  • v2.0.6(Apr 16, 2020)

  • v2.0.5(Nov 7, 2019)

    The main objective of this release is to provide binary wheels. For compatibility reasons, we need to stick to basic SSE2 instructions. However, when the Python library is imported, it prints a message suggesting that a more efficient version can be installed from sources (and explains how to do this).
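
    Assuming the standard pip flag for forcing a source build, installing the optimized version from sources would look like this (the import-time message remains the authoritative instruction):

    pip install --no-binary :all: nmslib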

    Furthermore, this release removes a lot of old code, which speeds up compilation by 70%:

    1. Non-performing methods
    2. Double-indices

    This is a step towards a more lightweight NMSLIB library.

  • v1.8.1(Jun 23, 2019)

  • v1.8(Jun 6, 2019)

    This is a clean-up release focusing on several important issues:

    1. Fixing a bug with knnQuery #370
    2. Added the possibility to save/load data efficiently from the Python bindings (and the query server) #356; the Python notebooks are updated accordingly
    3. We now have a bit Jaccard space (many thanks, @gregfriedland)
    4. Upgraded the query server to use a recent Apache Thrift
    5. Importantly, the documentation is reorganized quite a bit:
      1. There is now a single entry point for all the docs.
      2. Most of the docs are now online; only the fairly technical description of search spaces and methods remains in the PDF manual.
  • v1.7.3.6(Oct 4, 2018)

  • v1.7.3.4(Aug 6, 2018)

  • v1.7.3.2(Jul 13, 2018)

  • v1.7.3.1(Jul 9, 2018)

  • v1.7.2(Feb 20, 2018)

    1. Improving concurrency in Python (preventing hanging in a certain situation https://github.com/searchivarius/nmslib/issues/291)
    2. Improving ParallelFor: passing the thread ID and not starting threads in single-thread mode.
  • v1.7(Feb 4, 2018)

  • v1.6(Dec 15, 2016)

    Here is the list of changes for version 1.6 (the manual isn't updated yet):

    We especially thank the following people for the fixes:

    • Bileg Naidan (@bileg)
    • Bob Poekert (@bobpoekert)
    • @orgoro
    1. We simplified the build by excluding the code that required third-party libraries from the core library. In other words, the core library does not have any third-party dependencies (not even Boost). To build the full version of the library, run cmake as follows: cmake . -DWITH_EXTRAS=1
    2. It should now be possible to build on a Mac.
    3. We improved the Python bindings (thanks to @bileg) and their installation process (thanks to @bobpoekert):
      1. We merged our generic and vector bindings into a single module. We upgraded to a more standard installation process via distutils. You can run: python setup.py build and then sudo python setup.py install.
      2. We improved our support for sparse spaces: you can pass data in the form of a SciPy sparse matrix!
      3. There are now batch multi-threaded querying and addition of data.
      4. addDataPoint* functions return the position of an inserted entry. This can be useful if you use the function getDataPoint.
      5. For examples of using the Python API, please see the *.py files in the folder python_bindings.
      6. Note that to execute unit tests you need: python-numpy, python-scipy, and python-pandas.
    4. Because we got rid of Boost, we, unfortunately, do not support command-line options WITHOUT arguments. Instead, you have to pass the values 0 or 1.
    5. However, the utility experiment (experiment.exe) now accepts the option recallOnly. If this option has the argument 1, then the only effectiveness metric computed is recall. This is useful for the evaluation of HNSW, because (for efficiency reasons) HNSW does not return proper distance values (e.g., for L2 it returns a squared distance, not the original one). This makes it impossible to compute effectiveness metrics other than recall (returning wrong distance values would also lead to the experiment terminating with an error message).
    6. Additional spaces:
      1. negdotprod_sparse: negative inner (dot) product. This is a sparse space.
      2. querynorm_negdotprod_sparse: query-normalized inner (dot) product, which is the dot product divided by the query norm.
      3. renyi_diverg: Rényi divergence. It has the parameter alpha (see the formula after this list).
      4. ab_diverg: α-β-divergence. It has two parameters: alpha and beta.
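
      For reference, a standard form of the Rényi divergence with parameter alpha (the library's exact normalization may differ):

      D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \sum_{i} p_i^{\alpha}\, q_i^{\,1-\alpha}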
    7. Additional search methods:
      1. simple_invindx: A classical inverted index with document-at-a-time processing (via a priority queue). It doesn't have parameters, but works only with the sparse space negdotprod_sparse.
      2. falconn: we ported (created a wrapper for) the June 2016 version of the FALCONN library.
        1. Unlike the original implementation, our wrapper works directly with sparse vector spaces as well as with dense vector spaces.
        2. However, our wrapper has to store the data twice, so this method is useful mostly as a benchmark.
        3. Our wrapper directly supports a data centering trick, which can boost performance sometimes.
        4. Most parameters (hash_family, cross_polytope, hyperplane, storage_hash_table, num_hash_bits, num_hash_tables, num_probes, num_rotations, seed, feature_hashing_dimension) merely map to FALCONN parameters.
        5. Setting the additional parameters norm_data and center_data tells us to center and normalize the data. Our implementation of the centering for sparse data (which is unfortunately done before the hashing trick is applied) is horribly inefficient, so we wouldn't recommend using it. Besides, it doesn't seem to improve results. Just in case, the number of sparse dimensions used for centering is controlled by the parameter max_sparse_dim_to_center.
        6. Our FALCONN wrapper would normally use the distance provided by NMSLIB, but you can force the use of FALCONN's distance function implementation by setting use_falconn_dist to 1.
  • v1.5.3(Jul 11, 2016)

  • v1.5.2(Jul 2, 2016)

  • v1.5.1(Jun 1, 2016)

  • v1.5(May 20, 2016)

    1. A new efficient method: a hierarchical (navigable) small-world graph (HNSW), contributed by Yury Malkov (@yurymalkov). Works with g++, Visual Studio, Intel Compiler, but doesn't work with Clang yet.
    2. A query server, which can have clients in C++, Java, Python, and other languages supported by Apache Thrift
    3. Python bindings for vector and non-vector spaces
    4. Improved performance of two core methods SW-graph and NAPP
    5. Better handling of the gold standard data in the benchmarking utility experiment
    6. Updated API that permits search methods to serialize indices
    7. Improved documentation (e.g., we added tuning guidelines for best methods)
This is the repository for CVPR2021 Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales

Intro This is the repository for CVPR2021 Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales Vehicle Sam

null 39 Jul 21, 2022
Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Zhengzhong Tu 5 Sep 16, 2022
TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

null 912 Jan 8, 2023
Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

Sharpened Cosine Similarity A layer implementation for PyTorch Install At your c

Brandon Rohrer 203 Nov 30, 2022
Paper: Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification

Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification T M Feroz Ali, Subhasis Chaudhuri, ICVGIP-20-21

T M Feroz Ali 3 Jun 17, 2022
Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

DenseNAS The code of the CVPR2020 paper Densely Connected Search Space for More Flexible Neural Architecture Search. Neural architecture search (NAS)

Jamin Fong 291 Nov 18, 2022
Evaluation and Benchmarking of Speech Super-resolution Methods

Speech Super-resolution Evaluation and Benchmarking What this repo do: A toolbox for the evaluation of speech super-resolution algorithms. Unify the e

Haohe Liu (刘濠赫) 84 Dec 20, 2022
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.

Knodle (Knowledge-supervised Deep Learning Framework) - a new framework for weak supervision with neural networks. It provides a modularization for se

null 93 Nov 6, 2022
Deep Image Search is an AI-based image search engine that includes deep transfer learning feature extraction and tree-based vectorized search.

Deep Image Search - AI-Based Image Search Engine Deep Image Search is an AI-based image search engine that includes deep transfer learning features Ex

null 139 Jan 1, 2023
Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Mingrui Yu 3 Jan 7, 2022
Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

This repository is the official PyTorch implementation of Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

hippopmonkey 4 Dec 11, 2022
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

Bayesian Methods for Hackers Using Python and PyMC The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chap

Cameron Davidson-Pilon 25.1k Jan 2, 2023
A variational Bayesian method for similarity learning in non-rigid image registration (CVPR 2022)

A variational Bayesian method for similarity learning in non-rigid image registration We provide the source code and the trained models used in the re

daniel grzech 14 Nov 21, 2022
A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.

A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.

null 224 Jan 4, 2023
A deep learning based semantic search platform that computes similarity scores between provided query and documents

semanticsearch This is a deep learning based semantic search platform that computes similarity scores between provided query and documents. Documents

null 1 Nov 30, 2021
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

FuseDream This repo contains code for our paper (paper link): FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimizat

XCL 191 Dec 31, 2022
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

This repository is a toolkit to do machine learning for programming languages. It implements tokenization, dataset preprocessing, model training and m

Facebook Research 408 Jan 1, 2023
An evaluation toolkit for voice conversion models.

Voice-conversion-evaluation An evaluation toolkit for voice conversion models. Sample test pair Generate the metadata for evaluating models. The direc

null 30 Aug 29, 2022