Semantic search through Wikipedia with the Weaviate vector search engine
Weaviate is an open source vector search engine with built-in vectorization and question answering modules. We imported the complete English-language Wikipedia article dataset into a single Weaviate instance to run semantic search queries through the Wikipedia articles. In addition, we imported all the graph relations between the articles. We have made the import scripts, pre-processed articles, and backup available so that you can run the complete setup yourself.
In this repository, you'll find the three steps needed to replicate the import, but there are also downloads available that let you skip the first two steps.
If you like what you see, a ⭐ on GitHub would be appreciated!
Additional links:
- 💡 Live Demo: Weaviate GraphQL front-end
- 💡 Live Demo: Weaviate RESTful endpoint
- Weaviate documentation
- Weaviate on GitHub
- PyTorch-BigGraph search with the Weaviate vector search engine (similar project)
- [BLOG] Semantic search through Wikipedia with Weaviate (GraphQL, Sentence-BERT, and BERT Q&A)
- [VIDEO] Wikipedia Vector Search Demo with Weaviate (Henry AI Labs)
Frequently Asked Questions

| Q | A |
|---|---|
| Can I run this setup with a non-English dataset? | Yes – first, you need to go through the whole process (i.e., start with Step 1). E.g., if you want French, you can download the French version of Wikipedia like this: https://dumps.wikimedia.org/frwiki/latest/frwiki-latest-pages-articles.xml.bz2 (note that `en` is replaced with `fr`). Next, you need to change the Weaviate vectorizer module to an appropriate language model. You can choose one of the out-of-the-box language models or add your own model, as outlined in the Weaviate documentation (see the sketch below this table for what the swap looks like). |
| Can I run this setup with all languages? | Yes – you can follow two strategies: use a multilingual model, or extend the Weaviate schema to store different languages in different classes. The latter has the upside that you can use multiple vectorizers (e.g., one per language) or a more elaborate sharding strategy. But in the end, both are possible. |
| Can I run this with Kubernetes? | Of course. You need to start from Step 2, but if you follow the Kubernetes setup in the docs you should be good :-) |
| Can I run this with my own data? | Yes! This is just a demo dataset; you can use any data you have and like. Go to the Weaviate docs or join our Slack to get started. |
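As a rough idea of what swapping the vectorizer looks like, below is a minimal docker-compose sketch for a multilingual model. The image tag and service name are assumptions; check the Weaviate documentation for the currently published inference images.

# Hypothetical docker-compose excerpt: replace the English inference
# container with a multilingual sentence-transformers model.
services:
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-paraphrase-multilingual-MiniLM-L12-v2
    environment:
      ENABLE_CUDA: "1"  # set to "0" when no GPU is available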
Acknowledgments
- The `t2v-transformers` module used contains the sentence-transformers `multi-qa-MiniLM-L6-cos-v1` transformer created by the SBERT team
- Thanks to the team of Obsei for sharing the idea on our Slack channel
Stats

| description | value |
|---|---|
| Articles imported | 11,348,257 |
| Paragraphs imported | 27,377,159 |
| Graph cross-references | 125,447,595 |
| Wikipedia version | truthy October 9th, 2021 |
| Machine for inference | 12 CPU – 100 GB RAM – 250 GB SSD – 1 x NVIDIA Tesla P4 |
| Weaviate version | v1.7.2 |
| Dataset size | 122 GB |
Import
There are three steps in the import process. You can also skip the first two and directly import the backup (Step 3).
Step 1: Process the Wikipedia dump
In this step, the Wikipedia dump is processed and cleaned (wiki markup and HTML tags are removed, etc.). The output is a JSON Lines file that will be used in the next step.
Process from the Wikimedia dump:
$ cd step-1
$ wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
$ bzip2 -d enwiki-latest-pages-articles.xml.bz2
$ pip3 install -r requirements.txt
$ python3 process.py
Processing takes a few hours, so you probably want to run it in the background:
$ nohup python3 -u process.py &
You can also download the processed file from October 9th, 2021, and skip the above steps:
$ wget https://storage.googleapis.com/semi-technologies-public-data/wikipedia-en-articles.json.gz
$ gunzip wikipedia-en-articles.json.gz
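Once you have wikipedia-en-articles.json, you can sanity-check it with a few lines of Python. This is only a sketch: the exact per-line field names (e.g., title) are assumptions and may differ from what process.py emits.

# Peek at the first few records of the processed JSON Lines file.
import json

with open("wikipedia-en-articles.json", "r", encoding="utf-8") as f:
    for i, line in enumerate(f):
        article = json.loads(line)   # one article per line
        print(article.get("title"))  # assumed field name
        if i == 4:                   # only inspect the first five records
            break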
Step 2: Import the dataset and vectorize the content
Weaviate takes care of the complete import and vectorization process, but you'll need some GPU and CPU muscle to achieve this. Bear in mind that this is only needed at import time. If you don't want to spend the resources on the import, you can skip to the next step and download the Weaviate backup instead. The machine needed for inference is far cheaper.
We will be using a single Weaviate instance and four Tesla T4 GPUs, each loaded with 8 models. To use them efficiently, we place an NGINX load balancer between Weaviate and the vectorizers, as sketched below.
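A hypothetical docker-compose excerpt of that idea is shown here; the actual configuration ships in step-2, and the service names, ports, and number of replicas are assumptions.

# Several transformer containers behind an NGINX reverse proxy, which
# Weaviate addresses as a single inference endpoint.
services:
  weaviate:
    image: semitechnologies/weaviate:1.7.2
    environment:
      TRANSFORMERS_INFERENCE_API: "http://loadbalancer:8080"
  loadbalancer:
    image: nginx
    # nginx.conf round-robins requests over the t2v-transformers replicas
  t2v-transformers-1:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: "1"
  # ... more t2v-transformers replicas, pinned to the available GPUs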
- Every Weaviate text2vec-transformers module will use the multi-qa-MiniLM-L6-cos-v1 sentence transformer.
- The volume is mounted outside the container to `/var/weaviate`. This allows us to use that folder as a backup that can be imported in the next step.
- Make sure to have Docker Compose with GPU support installed.
- The import script assumes that the JSON file is called `wikipedia-en-articles.json`.
$ cd step-2
$ docker-compose up -d
$ pip3 install -r requirements.txt
$ python3 import.py
The import takes a few hours, so you probably want to run it in the background:
$ nohup python3 -u import.py &
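The heavy lifting is done by import.py. Purely for reference, a batched import with the Weaviate Python client looks roughly like the sketch below; the structure of the JSON Lines records and the omission of Article objects and cross-references are simplifications, so treat the details as assumptions.

# Minimal sketch of a batched Paragraph import with the Python client (v3 API).
import json
import weaviate

client = weaviate.Client("http://localhost:8080")
client.batch.configure(batch_size=512, dynamic=True)

with open("wikipedia-en-articles.json", "r", encoding="utf-8") as f, client.batch as batch:
    for line in f:
        article = json.loads(line)
        for i, paragraph in enumerate(article.get("paragraphs", [])):
            batch.add_data_object(
                data_object={
                    "title": paragraph.get("title"),
                    "content": paragraph.get("content"),
                    "order": i,
                },
                class_name="Paragraph",
            )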
After the import is done, you can shut down the Docker containers by running `docker-compose down`.
You can now query the dataset!
Step 3: Load from backup
Start here if you want to work with a backup of the dataset without importing it yourself.
You can now run the dataset! We advise running it with 1 GPU, but you can also run it on CPU only (without the Q&A module). The machine you need for inference is significantly smaller.
Note that Weaviate needs some time to restore the backup (roughly 15 minutes with the setup mentioned above). You can follow the status of the restore in the Docker logs of the Weaviate container.
# clone this repository
$ git clone https://github.com/semi-technologies/semantic-search-through-Wikipedia-with-Weaviate/
# go into the backup dir
$ cd step-3
# download the Weaviate backup
$ curl https://storage.googleapis.com/semi-technologies-public-data/weaviate-1.8.0-rc.2-backup-wikipedia-py-en-multi-qa-MiniLM-L6-cos.tar.gz -O
# untar the backup (112G unpacked)
$ tar -xvzf weaviate-1.8.0-rc.2-backup-wikipedia-py-en-multi-qa-MiniLM-L6-cos.tar.gz
# get the unpacked directory
$ echo $(pwd)/var/weaviate
# use the above result (e.g., /home/foobar/var/weaviate)
# update volumes in docker-compose.yml (NOT PERSISTENCE_DATA_PATH!) to the above output
# (e.g.,
# volumes:
# - /home/foobar/var/weaviate:/var/lib/weaviate
# )
#
# With 12 CPUs this process takes about 12 to 15 minutes to complete.
# The Weaviate instance will be available right away, but the cache is still pre-filling during this time
With GPU
$ cd step-3
$ docker-compose -f docker-compose-gpu.yml up -d
Without GPU
$ cd step-3
$ docker-compose -f docker-compose-no-gpu.yml up -d
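To follow the restore progress mentioned above, you can tail the logs of the Weaviate container. The service name `weaviate` is an assumption based on the compose file; swap in the no-GPU compose file if that is what you started.

$ docker-compose -f docker-compose-gpu.yml logs -f weaviate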
Example queries
"Where is the States General of The Netherlands located?" try it live!
##
# Using the Q&A module I
##
{
Get {
Paragraph(
ask: {
question: "Where is the States General of The Netherlands located?"
properties: ["content"]
}
limit: 1
) {
_additional {
answer {
result
certainty
}
}
content
title
}
}
}
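The same question can also be asked from the Python client. A minimal sketch, assuming a local instance on http://localhost:8080:

# Run the Q&A query above with the Weaviate Python client (v3 API).
import weaviate

client = weaviate.Client("http://localhost:8080")

result = (
    client.query
    .get("Paragraph", ["content", "title", "_additional { answer { result certainty } }"])
    .with_ask({
        "question": "Where is the States General of The Netherlands located?",
        "properties": ["content"],
    })
    .with_limit(1)
    .do()
)
print(result)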
"What was the population of the Dutch city Utrecht in 2019?" try it live!
##
# Using the Q&A module II
##
{
Get {
Paragraph(
ask: {
question: "What was the population of the Dutch city Utrecht in 2019?"
properties: ["content"]
}
limit: 1
) {
_additional {
answer {
result
certainty
}
}
content
title
}
}
}
About the concept "Italian food" try it live!
##
# Generic question about Italian food
##
{
Get {
Paragraph(
nearText: {
concepts: ["Italian food"]
}
limit: 50
) {
content
order
title
inArticle {
... on Article {
title
}
}
}
}
}
"What was Michael Brecker's first saxophone?" in the Wikipedia article about "Michael Brecker" try it live!
##
# Mixing scalar queries and semantic search queries
##
{
Get {
Paragraph(
ask: {
question: "What was Michael Brecker's first saxophone?"
properties: ["content"]
}
where: {
operator: Equal
path: ["inArticle", "Article", "title"]
valueString: "Michael Brecker"
}
limit: 1
) {
_additional {
answer {
result
}
}
content
order
title
inArticle {
... on Article {
title
}
}
}
}
}
Get all Wikipedia graph connections for "jazz saxophone players" try it live!
##
# Mixing semantic search queries with graph connections
##
{
Get {
Paragraph(
nearText: {
concepts: ["jazz saxophone players"]
}
limit: 25
) {
content
order
title
inArticle {
... on Article { # <== Graph connection I
title
hasParagraphs { # <== Graph connection II
... on Paragraph {
title
}
}
}
}
}
}
}
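All of the queries above can also be posted to the RESTful GraphQL endpoint directly. A minimal curl sketch, assuming a local instance on port 8080:

$ curl http://localhost:8080/v1/graphql \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"query": "{ Get { Paragraph(nearText: {concepts: [\"jazz saxophone players\"]} limit: 5) { title content } } }"}'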