EfficientWord-Net
Hotword detection based on one-shot learning
Home assistants require special phrases called hotwords to be activated (e.g. "OK Google").
EfficientWord-Net is a hotword detection engine based on one-shot learning, inspired by FaceNet's Siamese Network architecture. It works much like face recognition: it only needs a few samples of your own custom hotword to get going. No extra training or huge datasets required! This allows developers to add custom hotwords to their programs without breaking a sweat or paying extra charges. Just like Google Assistant's hotword detector, the engine performs best when 3-4 hotword samples are collected directly from the user. This repository is the authors' official implementation of EfficientWord-Net as a Python library.
The library is written purely in Python and uses Google's TFLite implementation for faster real-time inference.
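To make the one-shot idea concrete, here is a generic sketch of how Siamese-style matching usually decides: an embedding network (not shown) maps audio to a vector, and a new clip is accepted if it is close enough to one of the reference embeddings collected from the user. This is an illustration under assumptions, not EfficientWord-Net's code: cosine similarity, the 0.8 threshold and the toy vectors are placeholders, and the library may use a different distance measure or scoring rule.

```python
from typing import Sequence

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_score(candidate: np.ndarray, references: Sequence[np.ndarray]) -> float:
    """Best similarity between a candidate embedding and any reference embedding."""
    return max(cosine_similarity(candidate, ref) for ref in references)


def is_hotword(candidate: np.ndarray, references: Sequence[np.ndarray],
               threshold: float = 0.8) -> bool:
    """One-shot decision: no retraining, just compare against stored references.

    The threshold value is an assumed placeholder, not the library's setting.
    """
    return match_score(candidate, references) >= threshold


if __name__ == "__main__":
    # Toy 3-D "embeddings"; real embeddings would come from a trained network.
    refs = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])]
    print(is_hotword(np.array([1.0, 0.05, 0.0]), refs))  # points the same way as the references
    print(is_hotword(np.array([0.0, 0.0, 1.0]), refs))   # orthogonal to the references
```

Adding a new hotword in this scheme only means storing new reference embeddings, which is why no retraining is needed.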
Demo of EfficientWord-Net on a Raspberry Pi
Access preprint
The research paper is currently under review at IEEE, and a preprint is available. The training code will be made publicly available once the paper is published.
Python Version Requirements
This library works with Python versions 3.6 to 3.9.
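A quick way to check the interpreter before installing is sketched below. The helper name is_supported is hypothetical and not part of the library; the bounds simply encode the 3.6 to 3.9 range stated above.

```python
import sys
from typing import Tuple

# Supported range stated in this README: Python 3.6 to 3.9 (inclusive).
MIN_VERSION = (3, 6)
MAX_VERSION = (3, 9)


def is_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the (major, minor) part of `version` is in the supported range."""
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    if not is_supported():
        print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
              "is outside the 3.6-3.9 range this library targets.")
```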
Dependencies Installation
Before running the pip installation command for the library, a few dependencies need to be installed manually:
- PyAudio (depends on PortAudio)
- TFLite (TensorFlow Lite binaries)
- Librosa (binaries might not be available for certain systems)
macOS M* and Raspberry Pi users might have to compile these dependencies.
The tflite package cannot be listed in requirements.txt, so it is installed automatically when the package is first initialized on the system.
The librosa package is not required for inference-only use; however, it is installed automatically when generate_reference is called.
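Before running pip, you can check which of these dependencies are already importable. This is a hypothetical helper, not part of the library; the import names pyaudio, tflite_runtime and librosa are assumptions about how those packages are usually imported, and librosa only matters if you plan to generate reference files.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names for the dependencies listed above.
DEFAULT_MODULES = ("pyaudio", "tflite_runtime", "librosa")


def check_modules(names: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:15s} {'found' if found else 'MISSING'}")
```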
Package Installation
Run the following pip command:
```
pip install EfficientWord-Net
```
and import the library by running:
```python
import eff_word_net
```
Demo
After installing the packages, you can run the demo script built into the library (make sure you have a working microphone).
Access the documentation at: https://ant-brain.github.io/EfficientWord-Net/
Command to run the demo:
```
python -m eff_word_net.engine
```
Generating Custom Wakewords
For any new hotword, the library needs information about that hotword, which it obtains from a file called {wakeword}_ref.json. For example, for the wakeword 'alexa', the library would need a file called alexa_ref.json.
These files can be generated with the following procedure:
Collect 4 to 10 distinct-sounding pronunciations of the wakeword, then put them into a separate folder that contains nothing else.
Finally, run the following command. It will ask for the input folder's location (containing the audio files) and the output folder (where the _ref.json file will be stored).
```
python -m eff_word_net.generate_reference
```
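Before running generate_reference, a small sanity check on the sample folder can catch common mistakes such as too few recordings or stray files. The function below is a hypothetical helper, not part of the library; the 4 to 10 range comes from the procedure above, while the set of accepted audio extensions is an assumption, since this README does not list supported formats.

```python
import os
from typing import List

# Assumed audio extensions; adjust to match the formats of your recordings.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def validate_sample_folder(folder: str, min_count: int = 4, max_count: int = 10) -> List[str]:
    """Return a list of problems found in `folder`; an empty list means it looks fine."""
    problems = []
    entries = sorted(os.listdir(folder))
    # Anything without a recognised audio extension (including subfolders) is a stray entry.
    non_audio = [e for e in entries
                 if os.path.splitext(e)[1].lower() not in AUDIO_EXTENSIONS]
    if non_audio:
        problems.append(f"folder should contain only audio files, found: {non_audio}")
    count = len(entries) - len(non_audio)
    if not (min_count <= count <= max_count):
        problems.append(f"expected {min_count}-{max_count} samples, found {count}")
    return problems
```

Usage: `print(validate_sample_folder("/path/to/hello_samples") or "looks good")`, where the path is a placeholder.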
The path of the generated reference file needs to be passed to the HotwordDetector instance:
```python
from eff_word_net.engine import HotwordDetector

hello_hw = HotwordDetector(
    hotword="hello",
    reference_file="/full/path/name/of/hello_ref.json",
    activation_count=3,  # 2 by default
)
```
For a few wakewords (Mycroft, Google, Firefox, Alexa, Mobile, Siri), the library ships predefined embeddings in its installation directory. Their location is available in the following variable:
```python
from eff_word_net import samples_loc
```
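Because reference files follow the {wakeword}_ref.json naming convention, the wakewords available in a directory can be listed from the filenames alone. A minimal sketch with a hypothetical helper name; it only inspects filenames and assumes every file ending in _ref.json is a reference file.

```python
import glob
import os
from typing import List


def available_wakewords(directory: str) -> List[str]:
    """List wakeword names for which a {wakeword}_ref.json file exists in `directory`."""
    suffix = "_ref.json"
    files = glob.glob(os.path.join(directory, "*" + suffix))
    return sorted(os.path.basename(f)[: -len(suffix)] for f in files)


# Usage with the bundled embeddings (requires the library to be installed):
# from eff_word_net import samples_loc
# print(available_wakewords(samples_loc))
```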
Try your first single hotword detection script
```python
import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

# Detector built from the bundled "Mycroft" reference file
mycroft_hw = HotwordDetector(
    hotword="Mycroft",
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    activation_count=3,
)

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft")
while True:
    frame = mic_stream.getFrame()  # next chunk of microphone audio
    result = mycroft_hw.checkFrame(frame)  # truthy when the hotword is detected
    if result:
        print("Wakeword uttered")
```
Detecting Multiple Hotwords from audio streams
The library provides a computation-friendly way to detect multiple hotwords in a given stream, instead of running checkFrame() for each wakeword individually:
```python
import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector, MultiHotwordDetector
from eff_word_net import samples_loc

print(samples_loc)

alexa_hw = HotwordDetector(
    hotword="Alexa",
    reference_file=os.path.join(samples_loc, "alexa_ref.json"),
)

siri_hw = HotwordDetector(
    hotword="Siri",
    reference_file=os.path.join(samples_loc, "siri_ref.json"),
)

mycroft_hw = HotwordDetector(
    hotword="mycroft",
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    activation_count=3,
)

# Evaluate all detectors together instead of calling checkFrame() on each one
multi_hw_engine = MultiHotwordDetector(
    detector_collection=[
        alexa_hw,
        siri_hw,
        mycroft_hw,
    ],
)

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft / Alexa / Siri")
while True:
    frame = mic_stream.getFrame()
    result = multi_hw_engine.findBestMatch(frame)
    # result[0]: matched hotword, result[1]: confidence; a None entry means no match
    if None not in result:
        print(result[0], f",Confidence {result[1]:0.4f}")
```
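The saving presumably comes from doing the expensive per-frame work once and then comparing the result against every hotword's references. The sketch below covers only the final selection step and is not the library's implementation: it assumes each hotword already has a similarity score for the current frame, and the function name, threshold and score dictionary are hypothetical. The (None, None) return mirrors how the loop above checks `None not in result`.

```python
from typing import Dict, Optional, Tuple


def find_best_match(scores: Dict[str, float],
                    threshold: float = 0.8) -> Tuple[Optional[str], Optional[float]]:
    """Return (hotword, score) for the highest-scoring hotword at or above `threshold`,
    or (None, None) when no hotword qualifies."""
    if not scores:
        return (None, None)
    best_word = max(scores, key=scores.get)  # hotword with the largest score
    best_score = scores[best_word]
    if best_score < threshold:
        return (None, None)
    return (best_word, best_score)
```

Picking a single best match, rather than reporting every detector above threshold, avoids announcing two hotwords for one utterance.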
Access the library's documentation here: https://ant-brain.github.io/EfficientWord-Net/
About activation_count in HotwordDetector
Documentation with a detailed explanation of the activation_count parameter in HotwordDetector is in the making. For now, note that 3 is advisable for long hotwords and 2 for shorter ones. If the detector gives multiple triggers for a single utterance, try increasing activation_count. When experimenting, begin with smaller values. The default value is 2.
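Since the exact semantics of activation_count are not yet documented, the following is only one plausible interpretation, written as a hypothetical ActivationCounter class rather than the library's code: a trigger fires once the per-frame match has been positive for activation_count consecutive frames, and the counter then resets. Under this rule a larger activation_count produces fewer repeated triggers while the same word is still being heard, which is consistent with the advice above.

```python
class ActivationCounter:
    """Hypothetical debounce rule: fire after `activation_count` consecutive matching frames."""

    def __init__(self, activation_count: int = 2):
        self.activation_count = activation_count
        self.streak = 0  # number of consecutive matching frames seen so far

    def update(self, frame_is_match: bool) -> bool:
        """Feed one per-frame decision; return True when the detector should trigger."""
        if not frame_is_match:
            self.streak = 0  # any miss breaks the streak
            return False
        self.streak += 1
        if self.streak >= self.activation_count:
            self.streak = 0  # reset so a long utterance triggers less often
            return True
        return False
```

With activation_count=2, six consecutive matching frames trigger three times; with 3, they trigger twice.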
FAQ:
- Hotword performance is bad: if you are having an issue like this, feel free to ask about it in Discussions.
CONTRIBUTION:
- If you have ideas to make the project better, feel free to ping us in Discussions.
- The current logmelcalc.tflite graph can convert only one audio frame to a log-Mel spectrogram at a time. It would be a great help if TensorFlow gurus out there could help us out with this.
TODO :
- Add an audio file handler in streams. PRs are welcome.
- Remove the librosa requirement to encourage generating reference files directly on edge devices
- Add more detailed documentation explaining the sliding window concept
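Until that documentation exists, here is a generic sketch of a sliding window over an audio stream: the detector looks at the most recent fixed-length window and re-evaluates it every hop, so consecutive windows overlap. The SlidingWindow class and the window/hop sizes are hypothetical and are not taken from the library.

```python
from collections import deque
from typing import Deque, Iterable, List


class SlidingWindow:
    """Keep the most recent `window_size` samples and emit a full window
    every time `hop_size` new samples have arrived (sizes are placeholders)."""

    def __init__(self, window_size: int, hop_size: int):
        self.window_size = window_size
        self.hop_size = hop_size
        self.buffer: Deque[float] = deque(maxlen=window_size)  # oldest samples drop off
        self.since_last = 0  # samples received since the last emitted window

    def push(self, samples: Iterable[float]) -> List[List[float]]:
        """Append new samples; return the windows completed by this push."""
        windows = []
        for s in samples:
            self.buffer.append(s)
            self.since_last += 1
            if len(self.buffer) == self.window_size and self.since_last >= self.hop_size:
                windows.append(list(self.buffer))
                self.since_last = 0
        return windows
```

With window_size=4 and hop_size=2, pushing the samples 0 to 7 yields the windows [0, 1, 2, 3], [2, 3, 4, 5] and [4, 5, 6, 7]; each pair overlaps by two samples, which is why one spoken word can show up in several consecutive windows.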
SUPPORT US:
Our hotword detector's performance is notably lower than Porcupine's. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project, so your support and encouragement will motivate us to keep developing the engine. If you loved this project, recommend it to your peers, give us a