Production First and Production Ready End-to-End Keyword Spotting Toolkit

Overview

WeKws

The goal of this toolkit is to...

Small-footprint keyword spotting (KWS), or more specifically wake-up word (WuW) detection, is a typical and important module in Internet of Things (IoT) devices. It gives users hands-free control of IoT devices. A WuW detection system usually runs locally and persistently on the device, so it needs low power consumption, few model parameters, and low computational complexity, and it must detect predefined keywords in a streaming fashion, i.e., with low latency.
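
To make the streaming requirement concrete, here is a minimal sketch of frame-by-frame detection. It is not taken from WeKws: the function name `detect_streaming`, the moving-average smoothing, and the default threshold and window are all assumptions chosen for illustration. The point is only that a decision is made as each frame arrives, without waiting for the end of the utterance.

```python
from collections import deque
from typing import Iterable, Optional


def detect_streaming(posteriors: Iterable[float],
                     threshold: float = 0.5,
                     window: int = 3) -> Optional[int]:
    """Return the index of the first frame whose smoothed keyword score
    reaches `threshold`, or None if the keyword is never detected.

    `posteriors` is a hypothetical stream of per-frame keyword
    probabilities produced by some acoustic model.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for index, score in enumerate(posteriors):
        recent.append(score)
        smoothed = sum(recent) / len(recent)  # moving average over the window
        if smoothed >= threshold:
            return index  # fire immediately; later frames are not needed
    return None
```

Memory use is bounded by `window`, and latency is bounded by how many frames the smoothed score needs to cross the threshold, which is why small windows are attractive on constrained devices.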

Typical Scenario

We plan to support the following typical wake-up word applications:

  • Single wake-up word
  • Multiple wake-up words
  • Customizable wake-up word
  • Personalized wake-up word, i.e., a combination of wake-up word detection and voiceprint recognition

Installation

  • Clone the repo
git clone https://github.com/wenet-e2e/wekws.git
  • Create a Conda environment and install the dependencies
conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch=1.10.0 torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge

Dataset

We plan to support a variety of open-source wake-up word datasets.

All well-trained models on these datasets will be made publicly available.

Runtime

We plan to support a variety of hardware and platforms, including:

  • Web browser
  • x86
  • Android
  • Raspberry Pi
Comments
  • hi_xiaowen-mdtc performance can't be reproduced

    I followed the stages in "$root_path/examples/hi_xiaowen/s0/run.sh" and tried to reproduce the network performance using config/mdtc.yaml with num_average=10 and max_epoch=80. After training, I checked the results (score.txt, stats.0.txt, stats.1.txt), and the scores, false alarms, and recall look very strange:

    1. The highest score appears within the first one or two frames, before the keyword has been spoken.
    2. Most scores are lower than 0.5.
    3. We configured two keywords, but most positive wavs get a high score at both output points. This is strange; when I switch to the ds_tcn config, the scores seem correct.
    opened by sugarcase 7
  • Overtraining and Bias towards keywords

    Hi,

    I'm facing the following issues using wekws:

    1. It overfits after only 2-4 epochs, even when trained on hundreds or thousands of hours of data.
    2. High false-positive rate. With many keywords (e.g., 20), the model confuses them with each other and is biased towards keywords over the free-text (-1) class.
    3. It confuses similar-sounding words and predicts free text as a keyword.

    Can you please suggest any solution for it?

    Thanks

    opened by csetanmayjain 4
  • Plotted DET Curve is empty

    Hi, I trained a model on my custom dataset with a single keyword. Using compute_accuracy.py, I get 97% accuracy on the test dataset.

    But when I tried to plot the DET curve using plot_det_curve.py, the output DET curve image is empty. (Before running this, I had already computed score.txt and stats.txt.)

    Thanks

    opened by csetanmayjain 4
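
    When a DET plot comes out empty, one generic sanity check is to recompute a few operating points by hand and confirm that they fall inside the plotted axis ranges. The sketch below is an assumption-laden illustration, not the logic of plot_det_curve.py: the function name `det_points`, its inputs (plain lists of scores for keyword and non-keyword utterances, plus the total duration of the non-keyword audio in hours), and the evenly spaced thresholds are all hypothetical.

```python
from typing import List, Sequence, Tuple


def det_points(pos_scores: Sequence[float],
               neg_scores: Sequence[float],
               neg_hours: float,
               steps: int = 10) -> List[Tuple[float, float, float]]:
    """Return (threshold, false alarms per hour, false reject rate)
    for `steps + 1` evenly spaced thresholds in [0, 1]."""
    points = []
    for k in range(steps + 1):
        threshold = k / steps
        # A keyword utterance scoring below the threshold is a false reject.
        false_rejects = sum(1 for s in pos_scores if s < threshold)
        # A non-keyword utterance scoring at or above it is a false alarm.
        false_alarms = sum(1 for s in neg_scores if s >= threshold)
        points.append((threshold,
                       false_alarms / neg_hours,
                       false_rejects / len(pos_scores)))
    return points
```

    If every computed point lies outside the x or y limits used by the plot, the curve can appear empty even though the statistics themselves are valid.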
  • How to prepare dataset for RIR & Musan Augmentation

    Hi, I would like to know how to prepare a dataset for RIR and MUSAN augmentation. I went through the script and understand that it needs data in .mdb format inside an lmdb folder. I have raw audio files; how do I prepare the data? Also, is there a flag in the configuration file to turn augmentation on or off?

    Thanks

    opened by csetanmayjain 3
  • Server socket error when training while another task is already running

    Describe the bug

    I have a server with four GTX 1080 GPUs running Ubuntu 16.04.

    When I run the training process using run.sh while another training task is already running, the following error occurs:

    Start training ...
    [W socket.cpp:401] [c10d] The server socket has failed to bind to [::]:29400 (errno: 98 - Address already in use).
    [W socket.cpp:401] [c10d] The server socket has failed to bind to 0.0.0.0:29400 (errno: 98 - Address already in use).
    [E socket.cpp:435] [c10d] The server socket has failed to listen on any local network address.

    How can I solve this?

    opened by robotnc 3
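
    The log shows two jobs on the same machine competing for port 29400. One common workaround, sketched here under the assumption that the training is launched through torchrun (or `python -m torch.distributed.run`) and that the launch command can be edited, is to give each job its own port, for example via torchrun's `--rdzv_endpoint` option. The helper name `find_free_port` is hypothetical; it only asks the operating system for a currently unused port.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a TCP port that is currently unused on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


if __name__ == "__main__":
    # The printed value could be passed to the launcher, e.g.
    #   torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:<port> ...
    print(find_free_port())
```

    The port is released as soon as the helper returns, so another process could in principle grab it before the launcher does; for a handful of jobs on one server this race is usually acceptable.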
  • MDTC causal config is missing and causes a failure

    Traceback (most recent call last):
      File "kws/bin/train.py", line 230, in <module>
        main()
      File "kws/bin/train.py", line 141, in main
        model = init_model(configs['model'])
      File "/home/pengteng.spt/wekws-master/kws/model/kws_model.py", line 125, in init_model
        causal = configs['backbone']['causal']
    KeyError: 'causal'
    Traceback (most recent call last):
      File "kws/bin/train.py", line 230, in <module>
        main()
      File "kws/bin/train.py", line 141, in main
        model = init_model(configs['model'])
      File "/home/pengteng.spt/wekws-master/kws/model/kws_model.py", line 125, in init_model
        causal = configs['backbone']['causal']
    KeyError: 'causal'
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 101738) of binary: /home/pengteng.spt/miniconda2/envs/wenet/bin/python

    opened by sugarcase 3
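
    The traceback comes from indexing `configs['backbone']['causal']` when the YAML has no `causal` key. Besides adding the key to the config file, a defensive read with a default avoids the crash. The sketch below is illustrative only: `read_causal` is a hypothetical helper, and treating a missing key as non-causal (`False`) is an assumption, not something the source confirms.

```python
from typing import Any, Dict


def read_causal(model_configs: Dict[str, Any], default: bool = False) -> bool:
    """Return backbone.causal from a model config dict, falling back to
    `default` when the backbone section or the key is absent."""
    backbone = model_configs.get("backbone", {})
    return bool(backbone.get("causal", default))


# A config without the key no longer raises KeyError.
print(read_causal({"backbone": {"type": "mdtc"}}))
# An explicit setting is still respected.
print(read_causal({"backbone": {"type": "mdtc", "causal": True}}))
```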
  • Finetuning support

    Hi, is there any way to train a KWS model for a new wake word without much data?

    Or can I fine-tune an existing model, trained on a good amount of data, on the new wake word? If so, how?

    Thanks

    opened by csetanmayjain 2
  • FFT result differs from Kaldi SRFFT

    Hello, I compared the results of the WeKws FFT and the Kaldi SRFFT and found that they differ. Could you tell me which FFT method is used? The project is clean and light; I like it!

    Thanks

    opened by mahuichao 2
  • fix export onnx model, backbone has no attribute 'padding'

    AttributeError: 'MDTC' object has no attribute 'padding'. The model backbone in wekws currently has no 'padding' attribute, and the default input cache for all models is torch.zeros(0, 0, 0, dtype=torch.float) in train.py.

    opened by xiaoqiang306 2
  • When will MDTC add cache?

    opened by jldeng3 2
  • Add the Common Voice Single Word Dataset

    https://commonvoice.mozilla.org/en/datasets

    Finding single-word datasets for English is hard, and the Single Word Dataset from Common Voice is a rarity in being multinational. It also has a very useful sample, 'Hey', that can be concatenated with other keywords. I used SoX to arrange by pitch and trim, then concatenated 'Hey' from Common Voice with 'Marvin' from the Google dataset to create 'Hey Marvin', a good phonetically unique keyword.

    opened by StuartIanNaylor 2
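
    The concatenation step described above was done with SoX; the same idea can be sketched with Python's standard `wave` module. This is only an illustration under stated assumptions: both inputs are uncompressed PCM WAV files with identical channel count, sample width, and sample rate, and the function name `concat_wavs` and the file names in the usage note are hypothetical. No trimming or pitch sorting is attempted.

```python
import wave


def concat_wavs(first_path: str, second_path: str, out_path: str) -> int:
    """Append the audio of `second_path` to `first_path`, write the
    result to `out_path`, and return the number of frames written."""
    with wave.open(first_path, "rb") as first, \
            wave.open(second_path, "rb") as second:
        fmt_first = (first.getnchannels(), first.getsampwidth(),
                     first.getframerate())
        fmt_second = (second.getnchannels(), second.getsampwidth(),
                      second.getframerate())
        if fmt_first != fmt_second:
            # Resampling is out of scope for this sketch.
            raise ValueError("WAV formats differ: %s vs %s"
                             % (fmt_first, fmt_second))
        frames = (first.readframes(first.getnframes())
                  + second.readframes(second.getnframes()))

    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt_first[0])
        out.setsampwidth(fmt_first[1])
        out.setframerate(fmt_first[2])
        out.writeframes(frames)

    bytes_per_frame = fmt_first[0] * fmt_first[1]
    return len(frames) // bytes_per_frame
```

    A call such as `concat_wavs("hey.wav", "marvin.wav", "hey_marvin.wav")` would produce one combined sample; recordings from different corpora often differ in sample rate, so they may need resampling first.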
  • WeKws Roadmap 2.0

    WeKws is a community-driven project, and we love your feedback and proposals on where we should be heading. Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).

    The following items are in 2.0:

    • [ ] Robustness: improve robustness by learning acoustic features rather than other features of the keywords.
    • [ ] Various on-device chips support.
    • [ ] Unsupervised model or pretrained model exploration.
    opened by robin1001 0
  • Issue during testing while training with 80 Fbins hyperparameter

    Hi, I trained a new model after changing Fbins to 80. Training succeeded, but when I tested the model on some test cases, I got the following error:

    RuntimeError: The size of tensor a (80) must match the size of tensor b (40) at non-singleton dimension 2

    I only changed the hyperparameter in the conf file. Do we need to change anything else in the code? Is something hard-coded?

    opened by csetanmayjain 2
  • GRU-based model?

    I wonder why there isn't any GRU-based model implemented here.

    Reading your paper, especially Fig. 1-4, I have the impression that the GRU-based model performed better than its TCN-based counterpart. I understand that a production system has more considerations and constraints (model size, FLOPS/energy, etc.) than just FAR/FRR. I just want to make sure I didn't miss anything.

    opened by nuance1979 9
  • memory leak using LookupCustomMetadataMap

    Describe the bug: the pointer returned by LookupCustomMetadataMap must be released using allocator.Free(). (https://github.com/wenet-e2e/wekws/blob/main/runtime/core/kws/keyword_spotting.cc#L38-L41)

    opened by deyituo 1
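  • Prone to overfitting

    Hello, your project is great: it brings together small, high-quality wake-word models and proposes the innovative max_pooling loss. When we run your models on our own data, they overfit rather easily. Specifically:

    1. The training loss converges too quickly, and training accuracy exceeds 95% too quickly, within roughly two steps.

    2. When the validation data differs even slightly from the training data, the loss is large and validation accuracy is 0. If we instead hold out part of the same dataset as the validation set and train on the rest, the loss looks normal and accuracy also reaches above 95%.

    3. On test sets similar to the validation set (including purely clean data), results are also poor: activation is very weak, and in some cases the activation rate is 0.

    4. From our experiments, the final test set has to be as similar to the training set as possible; even a fairly small gap makes the results collapse, with single-digit recognition rates.

    5. Have you encountered this, or is there some technical point we have missed? Are there any solutions? Thank you; we look forward to your reply.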

    opened by Sundy1219 4