A desktop GUI providing an audio interface for GPT3.

Last update: Nov 27, 2022

Jabberwocky

neil_degrasse_tyson_with_audio.mp4

Project Description

This GUI provides an audio interface to GPT-3. My main goal was to provide a convenient way to interact with various experts or public figures: imagine discussing physics with Einstein or hip hop with Kanye (or hip hop with Einstein? 🤔 ). I often find writing and speaking to be wildly different experiences and I imagined the same would be true when interacting with GPT-3. This turned out to be partially true - the default Mac text-to-speech functionality I'm using here is certainly not as lifelike as I'd like. Perhaps more powerful audio generation methods will pop up in a future release...

There is also a Task Mode with built-in prompts for a number of sample tasks:

Summarization
Explain like I'm 5
Translation
How To (step by step instructions for performing everyday tasks)
Writing Style Analysis
Explain machine learning concepts in simple language
Generate ML paper abstracts
MMA Fight Analysis and Prediction

Getting Started

Clone the repo.

git clone https://github.com/hdmamin/jabberwocky.git

Install the necessary packages. I recommend using a virtual environment of some kind (virtualenv, conda, etc). If you're not on macOS, you could try installing portaudio with whatever package manager your system uses, but app behavior on other systems is unknown.

brew install portaudio
pip install -r requirements.txt
python -m nltk.downloader punkt

If you have make installed, you can simply use this command instead:

make install

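Add your OpenAI API key somewhere the program can access it. There are two ways to do this. Either save it to a file named .openai in your home directory:

echo your_openai_api_key > ~/.openai

Or set it as an environment variable:

export OPENAI_API_KEY=your_openai_api_key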

(Make sure to use your actual key, not the literal text your_openai_api_key.)
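
The two options above could be resolved in code along the following lines. This is a minimal sketch, not the app's actual loading logic: the function name `load_openai_api_key`, its parameters, and the choice to check the environment variable before the file are all assumptions made for illustration.

```python
import os
from pathlib import Path


def load_openai_api_key(env=None, key_file=None):
    """Return the key from OPENAI_API_KEY or, failing that, from ~/.openai.

    Hypothetical helper: the lookup order is an assumption and may not
    match what the app does internally.
    """
    env = os.environ if env is None else env
    key_file = Path.home() / ".openai" if key_file is None else Path(key_file)

    # Option 1: environment variable.
    key = env.get("OPENAI_API_KEY", "").strip()
    if key:
        return key

    # Option 2: plain-text file containing only the key.
    if key_file.is_file():
        key = key_file.read_text().strip()
        if key:
            return key

    raise RuntimeError(
        "No OpenAI API key found. Set OPENAI_API_KEY or write the key to ~/.openai."
    )
```

Stripping whitespace matters here because `echo` writes a trailing newline into the file.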

Run the app.

python gui/main.py

Or with make:

make run

Usage

Conversation Mode

In conversation mode, you can chat with a number of pre-defined personas or add new ones. New personas can be autogenerated or defined manually.

See data/conversation_personas for examples of autogenerated prompts. You can likely achieve better results using custom prompts though.

Conversation mode only supports spoken input, though you can edit flawed transcriptions manually. Querying GPT-3 with nonsensical or ungrammatical text will negatively affect response quality.

Task Mode

In task mode, you can ask GPT-3 to perform a number of pre-defined tasks. Written and spoken input are both supported. By default, GPT-3's response is both typed out and read aloud.

Transcripts of responses from a small subset of non-conversation tasks can be found in the data/completions directory. You can also save your own completions while using the app.

Usage Notes

The first time you speak, the speech transcription back end will take a few seconds to calibrate to the level of ambient noise in your environment. You will know it's ready for transcription when you see a "Listening..." message appear below the Record button. Calibration only occurs once to save time.
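
The calibrate-once behavior described above can be illustrated with a small sketch. It does not use the app's real transcription back end: `Recorder`, `calibrate_fn`, and `listen_fn` are hypothetical names, and the microphone calls are replaced by plain callables so the snippet runs without audio hardware.

```python
class Recorder:
    """Runs a (slow) calibration step on the first recording only."""

    def __init__(self, calibrate_fn, listen_fn):
        self._calibrate_fn = calibrate_fn  # stand-in for measuring ambient noise
        self._listen_fn = listen_fn        # stand-in for capturing and transcribing speech
        self._calibrated = False

    def record(self):
        # Calibrate on the first call, then skip it on every later call.
        if not self._calibrated:
            self._calibrate_fn()
            self._calibrated = True
        print("Listening...")
        return self._listen_fn()


# Stand-in callables so the example is self-contained.
calls = []
recorder = Recorder(
    calibrate_fn=lambda: calls.append("calibrate"),
    listen_fn=lambda: "hello world",
)
first = recorder.record()
second = recorder.record()
```

After both calls, `calls` holds a single `"calibrate"` entry, which is why only the first recording has a noticeable delay.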

Hotkeys

CTRL + SHIFT: Start recording audio (same as pressing the "Record" button).
CTRL + a: Get GPT-3's response to whatever input you've recorded (same as pressing the "Get Response" button).

Project Members

Harrison Mamin

Repo Structure

jabberwocky/
├── data         # Raw and processed data. Some files are excluded from github but the ones needed to run the app are there.
├── notes        # Miscellaneous notes from the development process stored as raw text files.
├── notebooks    # Jupyter notebooks for experimentation and exploratory analysis.
├── reports      # Markdown reports (performance reports, blog posts, etc.)
├── gui          # GUI scripts. The main script should be run from the project root directory. 
└── lib          # Python package. Code can be imported in analysis notebooks, py scripts, etc.

The docker and setup dirs contain remnants from previous attempts to package the app. While I ultimately decided to go with a simpler approach, I left them in the repo so I have the option of picking up where I left off if I decide to work on a new version.


