Covid-ChatBot - A Rapid Response Virtual Agent for Covid-19 Queries

Overview

COVID-19 CHatBot

A Rapid Response Virtual Agent for Covid-19 Queries

Contents

  1. What is ChatBot
  2. Types of ChatBots
  3. About the Project
  4. Dataset
  5. Prerequisites
  6. Building a Chatbot

1. What is ChatBot

A chatbot is an intelligent piece of software that is capable of communicating and performing actions similar to a human. Chatbots are used a lot in customer interaction, marketing on social network sites and instantly messaging the client.second

2. Types of CHatBots

There are two basic types of chatbot models based on how they are built; Retrieval based and Generative based models.

  • Retrieval based Chatbots
    A retrieval-based chatbot uses predefined input patterns and responses. It then uses some type of heuristic approach to select the appropriate response. It is widely used in the industry to make goal-oriented chatbots where we can customize the tone and flow of the chatbot to drive our customers with the best experience.
  • Generative based Chatbots
    Generative models are not based on some predefined responses.
    They are based on seq 2 seq neural networks. It is the same idea as machine translation. In machine translation, we translate the source code from one language to another language but here, we are going to transform input into an output. It needs a large amount of data and it is based on Deep Neural networks.

3. About the Project

In this Python and NLP project, we have build a chatbot using deep learning techniques. The chatbot will be trained on the dataset which contains categories (intents), pattern and responses. We used a special recurrent neural network (LSTM) to classify which category the user’s message belongs to and then we will give a random response from the list of responses. Let’s see how to create a retrieval based chatbot using NLTK, Keras, Python, etc.

4. Dataset

The dataset we have used is ‘intents.json’. This is a JSON file that contains the patterns we need to find and the responses we want to return to the user.

5. Prerequisites

The project requires you to have good knowledge of Python, Keras, and Natural language processing (NLTK). Along with them, we will use some helping modules which you can download using the python-pip command. Some basic Git commands are:

pip install tensorflow, keras, pickle, nltk

6. Building a ChatBot

  • Intents.json – The data file which has predefined patterns and responses.
  • train_chatbot.py – In this Python file, we wrote a script to build the model and train our chatbot.
  • Words.pkl – This is a pickle file in which we store the words Python object that contains a list of our vocabulary.
  • Classes.pkl – The classes pickle file contains the list of categories.
  • Chatbot_model.h5 – This is the trained model that contains information about the model and has weights of the neurons.
  • Chatgui.py – This is the Python script in which we implemented GUI for our chatbot. Users can easily interact with the bot.

Please follow the following steps to create the project

1. Import and load the data file

First, make a file name as train_chatbot.py. We import the necessary packages for our chatbot and initialize the variables we will use in our Python project. The data file is in JSON format so we used the json package to parse the JSON file into Python.
intents-data-file

2. Preprocess data

When working with text data, we need to perform various preprocessing on the data before we make a machine learning or a deep learning model. Based on the requirements we need to apply various operations to preprocess the data.
Tokenizing is the most basic and first thing you can do on text data. Tokenizing is the process of breaking the whole text into small parts like words.
Here we iterate through the patterns and tokenize the sentence using nltk.word_tokenize() function and append each word in the words list. We also create a list of classes for our tags.

for intent in intents['intents']:
    for pattern in intent['patterns']:

        #tokenize each word
        w = nltk.word_tokenize(pattern)
        words.extend(w)
        #add documents in the corpus
        documents.append((w, intent['tag']))

        # add to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

Now we will lemmatize each word and remove duplicate words from the list. Lemmatizing is the process of converting a word into its lemma form and then creating a pickle file to store the Python objects which we will use while predicting.

# lemmatize, lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# classes = intents
print (len(classes), "classes", classes)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)

pickle.dump(words,open('words.pkl','wb'))
pickle.dump(classes,open('classes.pkl','wb'))

3. Create training and testing data

Now, we will create the training data in which we will provide the input and the output. Our input will be the pattern and output will be the class our input pattern belongs to. But the computer doesn’t understand text so we will convert text into numbers.

# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # lemmatize each word - create base word, in attempt to represent related words
    pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
    # create our bag of words array with 1, if word match found in current pattern
    for w in words:
        bag.append(1) if w in pattern_words else bag.append(0)

    # output is a '0' for each tag and '1' for current tag (for each pattern)
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1

    training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
training = np.array(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")

4. Build the model

We have our training data ready, now we will build a deep neural network that has 3 layers. We use the Keras sequential API for this. After training the model for 200 epochs, we achieved 100% accuracy on our model.

# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))

# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

#fitting and saving the model 
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5', hist)

print("model created")

5. Predict the response( Graphical User Interface)

To predict the sentences and get a response from the user to let us create a new file ‘chatapp.py’.

We will load the trained model and then use a graphical user interface that will predict the response from the bot. The model will only tell us the class it belongs to, so we will implement some functions which will identify the class and then retrieve us a random response from the list of responses.

Again we import the necessary packages and load the ‘words.pkl’ and ‘classes.pkl’ pickle files which we have created when we trained our model:

import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import pickle
import numpy as np

from keras.models import load_model
model = load_model('chatbot_model.h5')
import json
import random
intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl','rb'))
classes = pickle.load(open('classes.pkl','rb'))

To predict the class, we will need to provide input in the same way as we did while training. So we will create some functions that will perform text preprocessing and then predict the class.

def clean_up_sentence(sentence):
    # tokenize the pattern - split words into array
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word - create short form for word
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words
# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence

def bow(sentence, words, show_details=True):
    # tokenize the pattern
    sentence_words = clean_up_sentence(sentence)
    # bag of words - matrix of N words, vocabulary matrix
    bag = [0]*len(words) 
    for s in sentence_words:
        for i,w in enumerate(words):
            if w == s: 
                # assign 1 if current word is in the vocabulary position
                bag[i] = 1
                if show_details:
                    print ("found in bag: %s" % w)
    return(np.array(bag))

def predict_class(sentence, model):
    # filter out predictions below a threshold
    p = bow(sentence, words,show_details=False)
    res = model.predict(np.array([p]))[0]
    ERROR_THRESHOLD = 0.25
    results = [[i,r] for i,r in enumerate(res) if r>ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
    return return_list

After predicting the class, we will get a random response from the list of intents.

def getResponse(ints, intents_json):
    tag = ints[0]['intent']
    list_of_intents = intents_json['intents']
    for i in list_of_intents:
        if(i['tag']== tag):
            result = random.choice(i['responses'])
            break
    return result

def chatbot_response(text):
    ints = predict_class(text, model)
    res = getResponse(ints, intents)
    return res

Now we will develop a graphical user interface. Let’s use Tkinter library which is shipped with tons of useful libraries for GUI. We will take the input message from the user and then use the helper functions we have created to get the response from the bot and display it on the GUI.

6. Run the chatbot

To run the chatbot, we have two main files; train_chatbot.py and chatapp.py.
First, we train the model using the command in the terminal:

python train_chatbot.py

If we don’t see any error during training, we have successfully created the model. Then to run the app, we run the second file.

python chatgui.py

The program will open up a GUI window within a few seconds. With the GUI you can easily chat with the bot.

You might also like...
Blender addon for executing the operator in response to the received OSC message.
Blender addon for executing the operator in response to the received OSC message.

I/F Joiner 受信したOSCメッセージに応じてオペレータ(bpy.ops)を実行するアドオンです. OSC通信に対応したコントローラやアプリをインストールしたスマートフォンを使用してBlenderを操作することが可能になります. 同時開発しているAndroidコントローラ化アプリMocopa

MiniJVM is simple java virtual machine written by python language, it can load class file from file system and run it.

MiniJVM MiniJVM是一款使用python编写的简易JVM,能够从本地加载class文件并且执行绝大多数指令。 支持的功能 1.从本地磁盘加载class并解析 2.支持绝大多数指令集的执行 3.支持虚拟机内存分区以及对象的创建 4.支持方法的调用和参数传递 5.支持静态代码块的初始化 不支

This is a vscode extension with a Virtual Assistant that you can play with when you are bored or you need help..

VS Code Virtual Assistant This is a vscode extension with a Virtual Assistant that you can play with when you are bored or you need help. Its currentl

TeamFleming is a multicultural group of 20 young bioinformatics enthusiasts participating in the 2021 HackBio Virtual Summer Internship
TeamFleming is a multicultural group of 20 young bioinformatics enthusiasts participating in the 2021 HackBio Virtual Summer Internship

💻 Welcome to Team Fleming's Repo! #TeamFleming is a multicultural group of 20 young bioinformatics enthusiasts participating in the 2021 HackBio Virt

This is a practice on Airflow, which is building virtual env, installing Airflow and constructing data pipeline (DAGs)

airflow-test This is a practice on Airflow, which is Builing virtualbox env and setting Airflow on that env Installing Airflow using python virtual en

🤖🤖 Jarvis is an virtual assistant which can some tasks easy for you like surfing on web opening an app and much more... 🤖🤖

Jarvis 🤖 🤖 Jarvis is an virtual assistant which can some tasks easy for you like surfing on web opening an app and much more... 🤖 🤖 Developer : su

Virtual Assistant Using Python

-Virtual-Assistant-Using-Python Virtual desktop assistant is an awesome thing. If you want your machine to run on your command like Jarvis did for Ton

Powerful virtual assistant in python

Virtual assistant in python Powerful virtual assistant in python Set up Step 1: download repo and unzip Step 2: pip install requirements.txt (if py au

pyToledo is a Python library to interact with the common virtual learning environment for the Association KU Leuven (Toledo).

pyToledo pyToledo is a Python library to interact with the common virtual learning environment for the Association KU Leuven a.k.a Toledo. Motivation

Owner
NelakurthiSudheer
Data Scientist | Kaggle Participant | Web Developer | MachineHack
NelakurthiSudheer
Ontario-Covid-Screening - An automated Covid-19 School Screening Tool for Ontario

Ontario-Covid19-Screening An automated Covid-19 School Screening Tool for Ontari

Rayan K 0 Feb 20, 2022
BloodCheck enables Red and Blue Teams to manage multiple Neo4j databases and run Cypher queries against a BloodHound dataset.

BloodCheck BloodCheck enables Red and Blue Teams to manage multiple Neo4j databases and run Cypher queries against a BloodHound dataset. Installation

Mr B0b 16 Nov 5, 2021
A simple panel with IP, CNPJ, CEP and PLACA queries

Painel mpm Um painel simples com consultas de IP, CNPJ, CEP e PLACA Início ?? apt update && apt upgrade -y pkg i python git pip install requests Insta

MrDiniz 4 Nov 4, 2022
It is a personal assistant chatbot, capable to perform many tasks same as Google Assistant plus more extra features...

PersonalAssistant It is an Personal Assistant, capable to perform many tasks with some unique features, that you haven'e seen yet.... Features / Tasks

Roshan Kumar 95 Dec 21, 2022
Incident Response Process and Playbooks | Goal: Playbooks to be Mapped to MITRE Attack Techniques

PURPOSE OF PROJECT That this project will be created by the SOC/Incident Response Community Develop a Catalog of Incident Response Playbook for every

Austin Songer 987 Jan 2, 2023
Yunqi Chen 7 Oct 30, 2022
A simple chatbot that I made for school project

Chatbot: Python A simple chatbot that I made for school Project. Tho this chatbot is dumb sometimes, but it's not too bad lol. Check it Out! FAQ How t

Prashant 2 Nov 13, 2021
Modify version of impacket wmiexec.py, get output(data,response) from registry, don't need SMB connection, also bypassing antivirus-software in lateral movement like WMIHACKER.

wmiexec-RegOut Modify version of impacket wmiexec.py,wmipersist.py. Got output(data,response) from registry, don't need SMB connection, but I'm in the

小离 228 Jan 4, 2023
Template (v0) do Sistema Chatbot - atividade síncrona - INE5404

ine-5404-sistema-chatbot-template Template (v0) do Sistema Chatbot - atividade síncrona - INE5404 Veja abaixo um exemplo de funcionamento do sistema:

null 0 Dec 7, 2021
Headless chatbot that detects spam and posts links to it to chatrooms for quick deletion.

SmokeDetector Headless chatbot that detects spam and posts it to chatrooms. Uses ChatExchange, takes questions from the Stack Exchange realtime tab, a

Charcoal 421 Dec 21, 2022