Weaviate demo with the text2vec-openai module

Overview

Weaviate demo with the text2vec-openai module

This repository contains an example of how to use the Weaviate text2vec-openai module. When using this demo dataset, Weaviate will vectorize the data and the queries based on OpenAI's Babbage model.

What is Weaviate?

Weaviate is an open-source, modular vector search engine. It works like any other database you're used to (it has full CRUD support, it's cloud-native, etc), but it is created around the concept of storing all data objects based on the vector representations (i.e., embeddings) of these data objects. Within Weaviate you can mix traditional, scalar search filters with vector search filters through its GraphQL-API.

Weaviate modules can be used to -among other things- vectorize the data objects you add to Weaviate. In this demo, the text2vec-openai module is used to vectorize all data using OpenAI's Babbage model.

You can read about Weaviate in more detail in the software docs.

About the Dataset

This dataset contains descriptions of 34,886 movies from around the world. The dataset is taken from Kaggle.

Run the setup

Before running this setup, make sure you have an OpenAPI ready, you can create one here.

0. Update you OpenAI API key

$ export OPENAI_APIKEY=YOUR_API_KEY

1. Run the container

Run the container:

$ docker-compose up -d

2. Import the data

After the container starts up, you can import the data by running:

# Install the Weaviate Python client
$ pip3 install -r requirements.txt
# Import the data with the format `./import.py {URL} {OPENAI RATE LIMIT}`
$ ./import.py http://localhost:8080 550

Note: because the OpenAI API comes with a rate limit, we have taken this into account for this demo dataset. If you work with your own dataset and you've requested an increase/removal of your rate limit, you can increase the import speed. You can read here how to do this.

3. Query the data

You can query the data via the GraphQL interface that's available in the Weaviate Console (under "Self Hosted Weaviate").

Or you can test the example queries below.

Example Query

Learn how to use the Get{} function of the Weaviate GraphQL-API here.

{
  Get {
    Movie(
      nearText: {
        concepts: ["Movie about Venice"]
      }
      where: {
        path: ["year"]
        operator: LessThan
        valueInt: 1950
      }
      limit: 5
    ) {
      title
      plot
      year
      director {
        ... on Director {
          name
        }
      }
      genre {
        ... on Genre {
          name
        }
      }
    }
  }
}
You might also like...
Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.
Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is trained (finetuned) on a curated list of approximately 45K Python (~470MB) files gathered from the Github. Currently, it just works properly on Python but not bad at other languages (thanks to GPT-2's power).

Converts python code into c++ by using OpenAI CODEX.

🦾 codex_py2cpp 🤖 OpenAI Codex Python to C++ Code Generator Your Python Code is too slow? 🐌 You want to speed it up but forgot how to code in C++? ⌨

Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.
Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

Demo Code for "Talking Head Anime from a Single Image 2: More Expressive" This repository contains demo programs for the Talking Head Anime

A demo for end-to-end English and Chinese text spotting using ABCNet.
A demo for end-to-end English and Chinese text spotting using ABCNet.

ABCNet_Chinese A demo for end-to-end English and Chinese text spotting using ABCNet. This is an old model that was trained a long ago, which serves as

A demo of chinese asr
A demo of chinese asr

chinese_asr_demo 一个端到端的中文语音识别模型训练、测试框架 具备数据预处理、模型训练、解码、计算wer等等功能 训练数据 训练数据采用thchs_30,

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Summarization module based on KoBART
Summarization module based on KoBART

KoBART-summarization Install KoBART pip install git+https://github.com/SKT-AI/KoBART#egg=kobart Requirements pytorch==1.7.0 transformers==4.0.0 pytor

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim

Python module (C extension and plain python) implementing Aho-Corasick algorithm

pyahocorasick pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search meaning that you can find mult

Owner
SeMI Technologies
SeMI Technologies creates database software like the Weaviate vector search engine
SeMI Technologies
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical

Max Woolf 3.1k Jan 7, 2023
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical

Max Woolf 2.5k Feb 17, 2021
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Phil Wang 5k Jan 2, 2023
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

GPT2-Pytorch with Text-Generator Better Language Models and Their Implications Our model, called GPT-2 (a successor to GPT), was trained simply to pre

Tae-Hwan Jung 775 Jan 8, 2023
A python project made to generate code using either OpenAI's codex or GPT-J (Although not as good as codex)

CodeJ A python project made to generate code using either OpenAI's codex or GPT-J (Although not as good as codex) Install requirements pip install -r

TheProtagonist 1 Dec 6, 2021
An easy to use, user-friendly and efficient code for extracting OpenAI CLIP (Global/Grid) features from image and text respectively.

Extracting OpenAI CLIP (Global/Grid) Features from Image and Text This repo aims at providing an easy to use and efficient code for extracting image &

Jianjie(JJ) Luo 13 Jan 6, 2023
Bot to connect a real Telegram user, simulating responses with OpenAI's davinci GPT-3 model.

AI-BOT Bot to connect a real Telegram user, simulating responses with OpenAI's davinci GPT-3 model.

Thempra 2 Dec 21, 2022
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

Fredrik Carlsson 481 Dec 30, 2022
Use Tensorflow2.7.0 Build OpenAI'GPT-2

TF2_GPT-2 Use Tensorflow2.7.0 Build OpenAI'GPT-2 使用最新tensorflow2.7.0构建openai官方的GPT-2 NLP模型 优点 使用无监督技术 拥有大量词汇量 可实现续写(堪比“xx梦续写”) 实现对话后续将应用于FloatTech的Bot

Watermelon 9 Sep 13, 2022
This is a modification of the OpenAI-CLIP repository of moein-shariatnia

This is a modification of the OpenAI-CLIP repository of moein-shariatnia

Sangwon Beak 2 Mar 4, 2022