An Indexer that works out-of-the-box when you have less than 100K stored Documents

Jina AI

Last update: Mar 15, 2022

Related tags

Data Analysis executor-U100KIndexer

Overview

U100KIndexer

An Indexer that works out-of-the-box when you have less than 100K stored Documents. U100K means under 100K. At 100K stored Documents with 768-dim embeddings, you can expect 300ms for single query or 20~120QPS for batch queries. Results are full Documents.

U100KIndexer leverages jina.DocumenetArrayMemmap as the storage backend and .match() to conduct nearest neighbours search. It returns the full Documents as-is, hence no need to concatenate it with another key-value indexer to retrieve Documents.

Pros & cons

Pros

Exhaustive search: highest recall
Fast indexing
Acceptable query performance under 100K
Always return full Documents
No extra dependencies

Cons

Slow query time

Performance

The indexing and query performance on 768-dim embeddings is as follows (unit is second):

Stored data	Indexing time	Query size=1	Query size=8	Query size=64
10000	0.256	0.019	0.029	0.086
50000	1.156	0.147	0.177	0.314
100000	2.329	0.297	0.332	0.536
200000	4.704	0.656	0.744	1.050
400000	11.105	1.289	1.536	2.793

Benchmark script can be found in benchmark.py.

Tips

To change workspace,

U100KIndexer(metas={'workspace': './my'})

Or .add(..., uses_metas={'workspace': './my'}) when you use it in a Flow.

You might also like...

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.

pySBD: Python Sentence Boundary Disambiguation (SBD) pySBD - python Sentence Boundary Disambiguation (SBD) - is a rule-based sentence boundary detecti

277 Feb 18, 2021

Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

Less is More: Pay Less Attention in Vision Transformers Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers. By

73 Jan 1, 2023

A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

Larger Google Sat2Map dataset This dataset extends the aerial ⟷ Maps dataset used in pix2pix (Isola et al., CVPR17). The provide script download_sat2m

34 Dec 28, 2022

A (completely native) python3 wifi brute-force attack using the 100k most common passwords (2021)

wifi-bf [LINUX ONLY] A (completely native) python3 wifi brute-force attack using the 100k most common passwords (2021) This script is purely for educa

20 Nov 12, 2022

Deploy a ML inference service on a budget in less than 10 lines of code.

BudgetML is perfect for practitioners who would like to quickly deploy their models to an endpoint, but not waste a lot of time, money, and effort trying to figure out how to do this end-to-end.

1.3k Dec 25, 2022

A minimalist GUI frontend for the youtube-dl. Takes up less than 4 KB.

📥 libre-DL A minimalist GUI wrapper for youtube-dl. Written in python. Total size less than 4 KB. Contributions welcome. You don't need youtube-dl pr

40 Sep 23, 2022

Create or join a private chatroom without any third-party middlemen in less than 30 seconds, available through an AES encrypted password protected link.

PY-CHAT Create or join a private chatroom without any third-party middlemen in less than 30 seconds, available through an AES encrypted password prote

1 Nov 24, 2021

Run CodeServer on Google Colab using Inlets in less than 60 secs using your own domain.

Inlets Colab Run CodeServer on Colab using Inlets in less than 60 secs using your own domain. Features Optimized for Inlets/InletsPro Use your own Cus

2 Dec 30, 2021

List Less Than Ten with python

0 Feb 7, 2022

easySpeech is an open-source Python wrapper for google speech to text API that doesn't require PyAudio(So you especially windows user don't have to deal with the errors while installing PyAudio) and also works with hugging face transformers

easySpeech easySpeech is an open source python wrapper for google speech to text api that doesn't require PyAaudio(So you specially windows user don't

14 May 24, 2022

Much faster than SORT(Simple Online and Realtime Tracking), a little worse than SORT

QSORT QSORT(Quick + Simple Online and Realtime Tracking) is a simple online and realtime tracking algorithm for 2D multiple object tracking in video s

8 Jul 27, 2022

Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Updates (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training. Pyr

1.3k Jan 4, 2023

Have you ever wondered how cool it would be to have your own A.I

Have you ever wondered how cool it would be to have your own A.I. assistant Imagine how easier it would be to send emails without typing a single word, doing Wikipedia searches without opening web browsers, and performing many other daily tasks like playing music with the help of a single voice command.

1 Nov 9, 2021

Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling.

sshuttle: where transparent proxy meets VPN meets ssh As far as I know, sshuttle is the only program that solves the following common case: Your clien

9.4k Jan 4, 2023

A Regex based linter tool that works for any language and works exclusively with custom linting rules.

renag Documentation Available Here Short for Regex (re) Nag (like "one who complains"). Now also PEGs (Parsing Expression Grammars) compatible with py

12 Oct 20, 2022

PyQt QGraphicsView with selection box. User can move vertical border of the box horizontally.

pyqt-horizontal-selection-square-graphics-view PyQt QGraphicsView with selection box. User can move vertical border of the box horizontally. Requireme

3 Nov 7, 2022

Black-Box-Tuning - Black-Box Tuning for Language-Model-as-a-Service

Black-Box-Tuning Source code for paper "Black-Box Tuning for Language-Model-as-a

149 Jan 4, 2023

You can easily send campaigns, e-marketing have actually account using cash will thank you for using our tools, and you can support our Vodafone Cash +201090788026

*** Welcome User Sorry I Mean Hello Brother ✓ Devolper and Design : Mokhtar Abdelkreem ========================================== You Can Follow Us O

1 Nov 3, 2021

Spam your friends and famly and when you do your famly will disown you and you will have no friends.

SpamBot9000 Spam your friends and family and when you do your family will disown you and you will have no friends. Terms of Use Disclaimer: Please onl

0 Jun 9, 2022

An Indexer that works out-of-the-box when you have less than 100K stored Documents

Related tags

Overview

U100KIndexer

Pros & cons

Pros

Cons

Performance

Tips

You might also like...

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.

Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

A (completely native) python3 wifi brute-force attack using the 100k most common passwords (2021)

Deploy a ML inference service on a budget in less than 10 lines of code.

A minimalist GUI frontend for the youtube-dl. Takes up less than 4 KB.

Create or join a private chatroom without any third-party middlemen in less than 30 seconds, available through an AES encrypted password protected link.

Run CodeServer on Google Colab using Inlets in less than 60 secs using your own domain.

List Less Than Ten with python

easySpeech is an open-source Python wrapper for google speech to text API that doesn't require PyAudio(So you especially windows user don't have to deal with the errors while installing PyAudio) and also works with hugging face transformers

Much faster than SORT(Simple Online and Realtime Tracking), a little worse than SORT

Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Have you ever wondered how cool it would be to have your own A.I

Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling.

A Regex based linter tool that works for any language and works exclusively with custom linting rules.

PyQt QGraphicsView with selection box. User can move vertical border of the box horizontally.

Black-Box-Tuning - Black-Box Tuning for Language-Model-as-a-Service

You can easily send campaigns, e-marketing have actually account using cash will thank you for using our tools, and you can support our Vodafone Cash +201090788026

Spam your friends and famly and when you do your famly will disown you and you will have no friends.

Owner

Jina AI

Parses data out of your Google Takeout (History, Activity, Youtube, Locations, etc...)

The Spark Challenge Student Check-In/Out Tracking Script

A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

Bearsql allows you to query pandas dataframe with sql syntax.

Show you how to integrate Zeppelin with Airflow

Datashredder is a simple data corruption engine written in python. You can corrupt anything text, images and video.

A utility for functional piping in Python that allows you to access any function in any scope as a partial.

Streamz helps you build pipelines to manage continuous streams of data

AnonStress-Stored-XSS-Exploit - An exploit and demonstration on how to exploit a Stored XSS vulnerability in anonstress

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.