Production First and Production Ready End-to-End Keyword Spotting Toolkit

Overview

wenet-kws

The goal of this toolkit is to...

Small-footprint keyword spotting (KWS), or more specifically wake-up word (WuW) detection, is a typical and important module in Internet of Things (IoT) devices. It lets users control IoT devices hands-free. A WuW detection system usually runs locally and persistently on the device. It therefore requires low power consumption, few model parameters, and low computational complexity, and it must detect predefined keywords in a streaming fashion, i.e., with low latency.
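The streaming requirement can be illustrated with a minimal sketch of a frame-synchronous decision rule. It assumes a model that emits one keyword posterior per frame. The detector smooths the most recent posteriors with a moving average and fires when the smoothed score reaches a threshold. It then ignores a few frames so that one utterance triggers only once. The class name `StreamingKeywordDetector` and the `window`, `threshold`, and `refractory` values are hypothetical and are not part of the wekws API.

```python
from collections import deque


class StreamingKeywordDetector:
    """Hypothetical frame-synchronous wake-up word detector (illustration only)."""

    def __init__(self, window: int = 5, threshold: float = 0.8, refractory: int = 10) -> None:
        self.scores = deque(maxlen=window)  # most recent per-frame keyword posteriors
        self.threshold = threshold          # smoothed score needed to fire
        self.refractory = refractory        # frames to ignore after a detection
        self.cooldown = 0                   # frames left in the current refractory period

    def push(self, posterior: float) -> bool:
        """Consume one frame posterior; return True if the keyword fires on this frame."""
        self.scores.append(posterior)
        if self.cooldown > 0:
            # Still inside the refractory period: suppress repeated triggers.
            self.cooldown -= 1
            return False
        smoothed = sum(self.scores) / len(self.scores)  # moving-average smoothing
        if smoothed >= self.threshold:
            self.cooldown = self.refractory
            return True
        return False


if __name__ == "__main__":
    detector = StreamingKeywordDetector(window=3, threshold=0.7, refractory=3)
    posteriors = [0.1, 0.2, 0.6, 0.9, 0.95, 0.9, 0.3, 0.1]
    print([t for t, p in enumerate(posteriors) if detector.push(p)])  # prints [4]
```

Each call touches only a fixed-size window, so memory and per-frame compute stay constant. That property is what makes this style of decision suitable for always-on devices.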

Typical Scenarios

We are going to support the following typical wake-up word applications:

  • Single wake-up word
  • Multiple wake-up words
  • Customizable wake-up word
  • Personalized wake-up word, i.e., a combination of wake-up word detection and voiceprint (speaker) verification
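The personalized scenario can be sketched as a two-stage cascade. The keyword score is checked first. Only if it passes is a speaker embedding compared with the enrolled user's voiceprint by cosine similarity. This is a hypothetical illustration, not a wekws API: the embeddings are assumed to come from a separate speaker-verification model, and both thresholds are placeholder values.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalized_wakeup(
    kws_score: float,
    speaker_embedding: Sequence[float],
    enrolled_embedding: Sequence[float],
    kws_threshold: float = 0.8,  # hypothetical keyword threshold
    spk_threshold: float = 0.7,  # hypothetical voiceprint threshold
) -> bool:
    """Hypothetical cascade: keyword detection first, then speaker verification."""
    # Stage 1: reject early if the wake-up word itself was not detected.
    if kws_score < kws_threshold:
        return False
    # Stage 2: accept only if the voice matches the enrolled user.
    return cosine_similarity(speaker_embedding, enrolled_embedding) >= spk_threshold
```

Because the cheap keyword check runs first, the speaker model only needs to run after a candidate wake-up. This keeps average compute low.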

Dataset

We plan to support a variety of open-source wake-up word datasets, including but not limited to:

All models trained on these datasets will be made publicly available.

Runtime

We plan to support a variety of hardware and platforms, including:

  • Web browser
  • x86
  • Android
  • Raspberry Pi
Comments
  • hi_xiaowen-mdtc performance can't be reproduced

    I followed the stages in "$root_path/examples/hi_xiaowen/s0/run.sh" and tried to reproduce the network performance using config/mdtc.yaml with num_average=10 and max_epoch=80. After training, I checked the results (score.txt, stats.0.txt, stats.1.txt), and the scores, false alarms, and recall look very strange:

    1. The highest score occurs within the first 1/2 frames, before the keyword has been spoken.
    2. Most scores are lower than 0.5.
    3. We configured two keywords, but most positive wavs get high scores at both output nodes. This is strange, because when I switch to the ds_tcn config, the scores look correct.
    opened by sugarcase 7
  • Overtraining and Bias towards keywords

    Hi,

    I'm facing the following issues using wekws:

    1. The model overfits after only 2-4 epochs, even when trained on hundreds or thousands of hours of data.
    2. High false-positive rate: with many keywords (e.g., 20), the model confuses them with each other and is biased towards the keywords rather than the freetext (-1) class.
    3. It confuses similar-sounding words and predicts freetext as a keyword.

    Could you please suggest a solution?

    Thanks

    opened by csetanmayjain 4
  • Plotted DET Curve is empty

    Hi, I trained a model on my custom dataset with a single keyword. Using compute_accuracy.py, I get 97% accuracy on the test set.

    But when I plot the DET curve with plot_det_curve.py, the output image is empty. (Before running it, I had already computed score.txt and stats.txt.)

    Thanks

    opened by csetanmayjain 4
  • How to prepare dataset for RIR & Musan Augmentation

    Hi, I would like to know how to prepare the dataset for RIR and MUSAN augmentation. I went through the script and understand that it needs the data in .mdb format inside an lmdb folder. I have raw audio files, so how do I prepare the data? I would also like to know whether there is a flag in the configuration file that controls whether augmentation is applied.

    Thanks

    opened by csetanmayjain 3
  • Server socket error when training while another task is already running

    Describe the bug

    I have a server with 4 GTX 1080 GPUs running Ubuntu 16.04.

    When I start training with run.sh while another training task is already running, the following error occurs:

    Start training ...
    [W socket.cpp:401] [c10d] The server socket has failed to bind to [::]:29400 (errno: 98 - Address already in use).
    [W socket.cpp:401] [c10d] The server socket has failed to bind to 0.0.0.0:29400 (errno: 98 - Address already in use).
    [E socket.cpp:435] [c10d] The server socket has failed to listen on any local network address.

    How can I solve this?

    opened by robotnc 3
  • Missing MDTC causal config causes training to fail

    Traceback (most recent call last):
      File "kws/bin/train.py", line 230, in <module>
        main()
      File "kws/bin/train.py", line 141, in main
        model = init_model(configs['model'])
      File "/home/pengteng.spt/wekws-master/kws/model/kws_model.py", line 125, in init_model
        causal = configs['backbone']['causal']
    KeyError: 'causal'
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 101738) of binary: /home/pengteng.spt/miniconda2/envs/wenet/bin/python

    opened by sugarcase 3
  • Finetuning support

    Hi, is there any way to train a KWS model for a new wake word without much data?

    Alternatively, can I fine-tune an existing model, trained on a large amount of data, on the new wake word? If so, how?

    Thanks

    opened by csetanmayjain 2
  • FFT result differs from Kaldi SRFFT

    Hello, I compared the output of the WeKws FFT with Kaldi's split-radix FFT (SRFFT) and found that the results differ. Could you tell me which FFT method WeKws uses? The project is clean and lightweight, and I like it!

    Thanks

    opened by mahuichao 2
  • Fix ONNX model export: backbone has no attribute 'padding'

    AttributeError: 'MDTC' object has no attribute 'padding'

    The model backbones in wekws no longer have a 'padding' attribute, and every model's input cache now defaults to torch.zeros(0, 0, 0, dtype=torch.float) in train.py.

    opened by xiaoqiang306 2
  • When will MDTC support a cache?

    opened by jldeng3 2
  • Add the Common Voice Single Word Dataset

    https://commonvoice.mozilla.org/en/datasets

    Single-word datasets for English are hard to find, and the Common Voice Single Word dataset is a rarity in being multinational. It also includes a very useful sample, 'Hey', that can be concatenated with other keywords. I used SoX to sort by pitch and trim. I then concatenated 'Hey' from Common Voice with 'Marvin' from the Google dataset to create 'Hey Marvin', a phonetically distinctive keyword.

    opened by StuartIanNaylor 2
  • WeKws Roadmap 2.0

    WeKws is a community-driven project, and we love your feedback and proposals on where we should be heading. Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).

    The following items are in 2.0:

    • [ ] Robustness: improve robustness by learning the acoustic features of the keywords rather than other features.
    • [ ] Support for various on-device chips.
    • [ ] Unsupervised model or pretrained model exploration.
    opened by robin1001 0
  • Error during testing after training with 80 fbank bins

    Hi, I trained a new model after changing the number of fbank bins (Fbins) to 80. Training succeeded, but when I test the model on some test cases, I get the following error:

    RuntimeError: The size of tensor a (80) must match the size of tensor b (40) at non-singleton dimension 2

    I only changed the hyperparameter in the config file. Apart from this, do I need to change anything else in the code? Is the value hard-coded somewhere?

    opened by csetanmayjain 2
  • GRU-based model?

    I wonder why there isn't any GRU-based model implemented here.

    Reading your paper, especially Figs. 1-4, I have the impression that the GRU-based model performed better than its TCN-based counterpart. I understand that a production system has more considerations and constraints (model size, FLOPS/energy, etc.) than just FAR/FRR. I just want to make sure I didn't miss anything.

    opened by nuance1979 9
  • Memory leak when using LookupCustomMetadataMap

    Describe the bug

    The pointer returned by LookupCustomMetadataMap must be released using allocator.Free() (see https://github.com/wenet-e2e/wekws/blob/main/runtime/core/kws/keyword_spotting.cc#L38-L41).

    opened by deyituo 1
  • Prone to overfitting

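    Hello, your project is excellent: it brings together small, high-quality wake-up word models and introduces the innovative max_pooling loss. However, when we run your models on our own data, they overfit easily. Specifically:

    1. The training loss converges too quickly, and training accuracy exceeds 95% too fast, within about two steps.

    2. If the validation data differ even slightly from the training data, the validation loss is large and validation accuracy is 0. If we instead hold out part of the same dataset as the validation set and train on the rest, the loss looks normal and accuracy also exceeds 95%.

    3. On a test set similar to the validation set (including completely clean data), the results are also poor: activation is weak, and some activation rates are 0.

    4. Our experiments suggest that the final test set has to resemble the training set as closely as possible. Even a fairly small mismatch makes the results lopsided, with single-digit recognition rates.

    5. Have you encountered this, or is there some technique we have missed? Are there any solutions? Thank you, and we look forward to your reply.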

    opened by Sundy1219 4
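For the server socket error above, one possible workaround is to give each concurrent job its own rendezvous port instead of the shared port 29400. The sketch below assumes run.sh launches training with `torchrun --standalone`. In the PyTorch releases of that period, that flag fixed the c10d rendezvous endpoint at port 29400. The helper asks the OS for a free port, which could then be passed as `torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:<port>` in place of `--standalone`. The function name `find_free_port` is hypothetical.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))          # port 0 lets the kernel pick any free port
        return sock.getsockname()[1]  # the port that was actually assigned


if __name__ == "__main__":
    print(find_free_port())
```

There is a small race: another process could grab the port between this check and torchrun binding it. The helper therefore reduces collisions rather than ruling them out.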
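The 'Hey Marvin' recipe from the Common Voice issue above can be sketched with Python's standard `wave` module. It assumes both clips have already been trimmed and converted to the same channel count, sample width, and sample rate, for example with SoX (which can also concatenate files directly). The function name `concat_wavs` and the file names in the usage line are hypothetical.

```python
import wave


def concat_wavs(first_path: str, second_path: str, out_path: str) -> None:
    """Hypothetical helper: append the audio of second_path to first_path."""
    with wave.open(first_path, "rb") as a, wave.open(second_path, "rb") as b:
        pa, pb = a.getparams(), b.getparams()
        # Raw frame bytes can only be joined if the sample formats match exactly.
        if (pa.nchannels, pa.sampwidth, pa.framerate) != (pb.nchannels, pb.sampwidth, pb.framerate):
            raise ValueError("inputs must share channels, sample width and sample rate")
        frames = a.readframes(pa.nframes) + b.readframes(pb.nframes)
    with wave.open(out_path, "wb") as out:
        out.setnchannels(pa.nchannels)
        out.setsampwidth(pa.sampwidth)
        out.setframerate(pa.framerate)
        out.writeframes(frames)  # the header frame count is filled in on close
```

For example, `concat_wavs("hey.wav", "marvin.wav", "hey_marvin.wav")` writes the combined keyword clip.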
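The max_pooling loss praised in the overfitting issue above trains on a single frame per utterance. For a keyword utterance, the frame with the highest keyword posterior is pushed towards 1. For a non-keyword utterance, the most keyword-like frame is pushed towards 0. The sketch below is a simplified, hypothetical illustration for one utterance and one keyword in pure Python. It omits batching, padding masks, and multiple keywords, and it is not the wekws implementation.

```python
import math
from typing import Sequence


def max_pooling_loss(posteriors: Sequence[float], is_keyword: bool, eps: float = 1e-7) -> float:
    """Binary cross-entropy applied only to the frame with the highest keyword posterior."""
    peak = max(posteriors)                # max-pool the frame posteriors over time
    peak = min(max(peak, eps), 1.0 - eps)  # clamp to avoid log(0)
    if is_keyword:
        return -math.log(peak)        # positive: push the best frame towards 1
    return -math.log(1.0 - peak)      # negative: push the most confusable frame towards 0
```

Only the peak frame contributes, so the loss does not require frame-level keyword alignments.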