Production First and Production Ready End-to-End Keyword Spotting Toolkit

Last update: Dec 30, 2022

Overview

WeKws

Production First and Production Ready End-to-End Keyword Spotting Toolkit.

The goal of this toolkit is to...

Small-footprint keyword spotting (KWS), or more specifically wake-up word (WuW) detection, is a typical and important module in Internet of Things (IoT) devices. It lets users control IoT devices hands-free. A WuW detection system usually runs locally and persistently on the device, so it requires low power consumption, few model parameters, and low computational complexity, and it must detect predefined keywords in a streaming fashion, i.e., with low latency.
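To make the streaming, low-latency requirement concrete, here is a minimal Python sketch of frame-level detection: per-frame keyword scores (assumed to come from some acoustic model, which is not shown) are smoothed over a short sliding window, and the keyword fires at the first frame whose smoothed score reaches a threshold. The function name `first_detection`, the window size, and the threshold are illustrative assumptions, not WeKws's actual runtime logic.

```python
from collections import deque
from typing import Iterable, Optional


def first_detection(frame_scores: Iterable[float],
                    threshold: float = 0.5,
                    window: int = 5) -> Optional[int]:
    """Return the index of the first frame whose moving-average score
    reaches `threshold`, or None if the keyword is never detected.

    Hypothetical helper: scores are consumed one frame at a time, so
    memory stays O(window) and a decision is made as soon as the
    smoothed evidence is strong enough.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    recent = deque(maxlen=window)  # last `window` frame scores
    running_sum = 0.0
    for i, score in enumerate(frame_scores):
        if len(recent) == window:
            running_sum -= recent[0]  # oldest score is about to be evicted
        recent.append(score)
        running_sum += score
        if running_sum / len(recent) >= threshold:
            return i
    return None


# Usage with made-up scores: two low frames, then the keyword.
scores = [0.1, 0.1, 0.9, 0.9, 0.9]
print(first_detection(scores, threshold=0.6, window=3))  # prints 3
```

Smoothing trades a few frames of latency for robustness against single-frame spikes; a larger window lowers false alarms but delays the trigger.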

Typical Scenario

We plan to support the following typical wake-up word applications:

Single wake-up word
Multiple wake-up words
Customizable wake-up word
Personalized wake-up word, i.e., a combination of wake-up word detection and voiceprint recognition

Installation

Clone the repo

git clone https://github.com/wenet-e2e/wekws.git
cd wekws

Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
Create a Conda env:

conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch=1.10.0 torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
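After installation, a quick way to confirm that the pinned PyTorch and torchaudio versions actually landed in the active env is to read their package metadata. This is a minimal standard-library sketch; `check_versions` is a hypothetical helper, and it assumes the conda packages expose Python distribution metadata, which PyTorch builds normally do but which is not guaranteed for every channel.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions pinned by the conda command above.
EXPECTED = {"torch": "1.10.0", "torchaudio": "0.10.0"}


def check_versions(expected: Dict[str, str] = EXPECTED
                   ) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected, installed-or-None, matches?)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have: Optional[str] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        # Drop a local build tag such as "+cu111" before comparing.
        ok = have is not None and have.split("+")[0] == want
        report[pkg] = (want, have, ok)
    return report


if __name__ == "__main__":
    for pkg, (want, have, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{pkg}: expected {want}, found {have} -> {status}")
```

`importlib.metadata` is available from Python 3.8, which matches the env created above.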

Dataset

We plan to support a variety of open-source wake-up word datasets, including but not limited to:

All models trained on these datasets will be made publicly available.

Runtime

We plan to support a variety of hardware and platforms, including:

Web browser
x86
Android
Raspberry Pi

Comments

hi_xiaowen-mdtc performance can't be reproduced
I followed the stages in "$root_path/examples/hi_xiaowen/s0/run.sh" and tried to reproduce the network's performance using config/mdtc.yaml with num_average=10 and max_epoch=80. After training, I checked the results (score.txt, stats.0.txt, stats.1.txt), and the scores, fa, and recall look very strange:

The highest score comes out in the first 1/2 frame, before the keyword has even been spoken.

Most scores are lower than 0.5.

We configured two keywords, but most positive WAVs get high scores at both output points, which is strange. When I switch to the ds_tcn config, the scores look correct.
opened by sugarcase 7
Overtraining and Bias towards keywords
Hi,

I'm facing the following issues using wekws:

The model overfits after only 2-4 epochs, even when trained on hundreds or thousands of hours of data.

High false-positive rate. With many keywords (e.g., 20), the model confuses them with one another and is biased towards the keyword classes rather than the free-text (-1) class.

It confuses similar-sounding words and predicts free text as a keyword.

Can you please suggest any solution for it?

Thanks
opened by csetanmayjain 4
Plotted DET Curve is empty

Hi, I trained a model on my custom dataset with a single keyword. Using compute_accuracy.py, I get 97% accuracy on the test set.

But when I tried to plot the DET curve using plot_det_curve.py, the output DET curve image is empty. (Before running this, I had already computed score.txt and stats.txt.)

Thanks

opened by csetanmayjain 4
How to prepare a dataset for RIR & Musan augmentation

Hi, I would like to know how to prepare a dataset for RIR & Musan augmentation. I went through the script and understand that it needs data in .mdb format inside an lmdb folder. I have raw audio files; how do I prepare the data for it? I would also like to know whether there is a flag in the configuration file that I can use to turn augmentation on or off.

Thanks

opened by csetanmayjain 3
Server socket error when training while another task is already running

Describe the bug

I have a server with four GTX 1080 GPUs running Ubuntu 16.04.

When I start training with run.sh while another training task is already running, the following error occurs:

Start training ...
[W socket.cpp:401] [c10d] The server socket has failed to bind to [::]:29400 (errno: 98 - Address already in use).
[W socket.cpp:401] [c10d] The server socket has failed to bind to 0.0.0.0:29400 (errno: 98 - Address already in use).
[E socket.cpp:435] [c10d] The server socket has failed to listen on any local network address.

How can I solve this?

opened by robotnc 3
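Regarding the port-conflict issue above: errno 98 (Address already in use) means the first job's launcher already holds port 29400. A common workaround, suggested here as an assumption rather than an official WeKws fix, is to launch the second job on a different rendezvous port. The minimal sketch below asks the OS for an unused TCP port by binding to port 0; `find_free_port` is a hypothetical helper, not part of WeKws.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on `host`.

    Binding to port 0 makes the kernel pick a free ephemeral port; the
    socket is closed on return, so the port is free but not reserved.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The printed port could then be passed to `torchrun` as `--rdzv_backend=c10d --rdzv_endpoint=localhost:<port>` (or `--master_port=<port>` with the older `torch.distributed.launch`). Check how run.sh invokes the launcher first: in some PyTorch versions `--standalone` pins the endpoint to localhost:29400 and overrides these flags, so it would need to be dropped. The port is only free at the moment of the check, so another process could still claim it before training starts.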
MDTC config missing 'causal' causes training to fail

Traceback (most recent call last):
  File "kws/bin/train.py", line 230, in <module>
    main()
  File "kws/bin/train.py", line 141, in main
    model = init_model(configs['model'])
  File "/home/pengteng.spt/wekws-master/kws/model/kws_model.py", line 125, in init_model
    causal = configs['backbone']['causal']
KeyError: 'causal'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 101738) of binary: /home/pengteng.spt/miniconda2/envs/wenet/bin/python

opened by sugarcase 3
Finetuning support

Hi, is there any way to train a KWS model for a new wake word without much data?

Or can I fine-tune an existing model, already trained on a good amount of data, on the new wake word? If so, how?

Thanks

opened by csetanmayjain 2
FFT result differs from Kaldi SRFFT

Hello, I compared the output of the WeKws FFT with Kaldi's SRFFT and found that the results differ. Could you tell me which FFT method is used? The project is clean and lightweight; I like it!

Thanks

opened by mahuichao 2
Fix ONNX model export: backbone has no attribute 'padding'

AttributeError: 'MDTC' object has no attribute 'padding'

The model backbone in wekws no longer has a 'padding' attribute, and every model's input cache defaults to torch.zeros(0, 0, 0, dtype=torch.float) in train.py.

opened by xiaoqiang306 2
When will MDTC add cache support?

opened by jldeng3 2
Add the Common Voice Single Word Dataset

https://commonvoice.mozilla.org/en/datasets

Single-word datasets for English are hard to find, and the Common Voice Single Word Dataset is rare in being multinational. It also includes a very useful 'Hey' sample that can be concatenated with other keywords. I used SoX to sort by pitch and trim, then concatenated 'Hey' from Common Voice with 'Marvin' from the Google dataset to create 'Hey Marvin', a phonetically distinctive keyword.

opened by StuartIanNaylor 2
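The 'Hey Marvin' concatenation described in the Common Voice issue above could also be done with Python's standard `wave` module instead of SoX. This is a minimal sketch under stated assumptions, not the commenter's actual pipeline: `concat_wavs` is a hypothetical helper, inputs must be uncompressed PCM WAV files that already share channel count, sample width, and sample rate, and the pitch sorting and trimming done with SoX are out of scope.

```python
import wave
from typing import BinaryIO, Sequence, Union

WavSource = Union[str, BinaryIO]  # a file path or an open binary file


def concat_wavs(inputs: Sequence[WavSource], output: WavSource) -> int:
    """Concatenate PCM WAV files end to end; return total frames written.

    All inputs must share channel count, sample width and sample rate;
    no resampling or trimming is done here.
    """
    if not inputs:
        raise ValueError("need at least one input")
    fmt = None
    chunks = []
    for src in inputs:
        with wave.open(src, "rb") as w:
            cur = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = cur
            elif cur != fmt:
                raise ValueError(f"format mismatch: {cur} != {fmt}")
            chunks.append(w.readframes(w.getnframes()))
    nchannels, sampwidth, framerate = fmt
    with wave.open(output, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        for data in chunks:
            out.writeframes(data)  # header sizes are patched as data grows
    return sum(len(d) for d in chunks) // (nchannels * sampwidth)


# Hypothetical file names, for illustration only:
# concat_wavs(["hey.wav", "marvin.wav"], "hey_marvin.wav")
```

Refusing mismatched formats, rather than converting silently, keeps the sketch small; clips from different datasets may need to be converted to a common format first (for example with SoX).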
WeKws Roadmap 2.0
WeKws is a community-driven project and we love your feedback and proposals on where we should be heading. Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).

The following items are in 2.0:

[ ] Robustness: improve robustness by learning acoustic features rather than other features of the keywords.

[ ] Support for various on-device chips.

[ ] Unsupervised model or pretrained model exploration.
opened by robin1001 0
Error during testing after training with the 80 Fbins hyperparameter

Hi, I trained a new model after changing Fbins to 80. Training succeeded, but when I tested the model on some test cases, I got the following error:

RuntimeError: The size of tensor a (80) must match the size of tensor b (40) at non-singleton dimension 2

I only changed the hyperparameter in the conf file. Apart from this, do I need to change anything else in the code? Is something hard-coded?

opened by csetanmayjain 2
GRU-based model?

I wonder why there isn't any GRU-based model implemented here.

Reading your paper, especially Fig. 1-4, I have the impression that the GRU-based model performed better than its TCN-based counterpart. I understand that a production system has more considerations and constraints (model size, FLOPS/energy, etc.) than just FAR/FRR. I just want to make sure I didn't miss anything.

opened by nuance1979 9
memory leak using LookupCustomMetadataMap

Describe the bug

The pointer returned by LookupCustomMetadataMap must be released using allocator.Free() (https://github.com/wenet-e2e/wekws/blob/main/runtime/core/kws/keyword_spotting.cc#L38-L41).

opened by deyituo 1
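Prone to overfitting

Hello, your project is excellent: it brings together small, high-quality wake-up word models and proposes the innovative max_pooling loss. Running your models on our own data, we find they overfit quite easily. Specifically:

1. The training loss converges too fast, and training accuracy exceeds 95% too quickly, within roughly two steps.

2. If the validation data is even slightly inconsistent with the training data, the loss is fairly large and validation acc = 0. If we instead split off part of the same kind of dataset as the validation set and train on the rest, the loss looks normal and acc also exceeds 95%.

3. On test sets fairly similar to the validation set (including purely clean data), results are also poor: activation is very weak, and some have an activation rate of 0.

4. Judging from our experiments, the final test set has to resemble the training set as closely as possible; even a small gap makes the results one-sided, with single-digit recognition rates.

5. Have you run into this, or is there some technical point we have missed? Are there any solutions? Thanks, and we look forward to your reply.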

opened by Sundy1219 4

Owner

Production First and Production Ready End-to-End Speech Toolkit

GitHub

Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

146 Dec 18, 2022

🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️

5.6k Jan 3, 2023

You can encode and decode base85, ascii85, base64, base32, and base16 with this tool.

8 Dec 20, 2022

StealBit1.1 and earlier strings and config extraction scripts

StealBit1.1 and earlier scripts Use strings_decryptor.py to extract RC4 encrypted strings from a StealBit1.1 sample(s). Use config_extractor.py to ext

5 Dec 29, 2022

Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you >>> print(fix_encoding("(à¸‡'âŒ£')à¸‡")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

3.4k Jan 8, 2023

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

1.2k Dec 16, 2022

Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

hashids for Python 2.7 & 3 A python port of the JavaScript hashids implementation. It generates YouTube-like hashes from one or many numbers. Use hash

1.4k Jan 2, 2023

A generator library for concise, unambiguous and URL-safe UUIDs.

Description shortuuid is a simple python library that generates concise, unambiguous, URL-safe UUIDs. Often, one needs to use non-sequential IDs in pl

1.8k Dec 31, 2022

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

1.3k Dec 22, 2022

Format Covid values to ASCII-Table (Only for Germany and Austria)

Text to ASCII and ASCII to text

Hspell, the free Hebrew spellchecker and morphology engine.

REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

Hotpotato is a recipe portfolio App that assists users to discover and comment new recipes.

Etranslate is a free and unlimited python library for transiting your texts

An experimental Fang Song style Chinese font generated with skeleton-tracing and pix2pix

The project is investigating methods to extract human-marked data from document forms such as surveys and tests.

a python package that lets you add custom colors and text formatting to your scripts in a very easy way!

Build a translation program similar to Google Translate with Python programming language and QT library