Keyword spotting on Arm Cortex-M Microcontrollers

Arm Software

Last update: Dec 30, 2022

Overview

Keyword spotting for Microcontrollers

This repository contains the TensorFlow models and training scripts used in the paper Hello Edge: Keyword Spotting on Microcontrollers. The scripts are adapted from the TensorFlow examples, and some are repeated here so that the repository is self-contained.

To train a DNN with 3 fully-connected layers with 128 neurons in each layer, run:

python train.py --model_architecture dnn --model_size_info 128 128 128

The command-line argument --model_size_info passes the neural network layer dimensions (such as the number of layers and the convolution filter size and stride) as a list to models.py, which builds the TensorFlow graph from the given model architecture and layer dimensions. For details on model_size_info for each network architecture, see models.py. The training commands, with all the hyperparameters needed to reproduce the models shown in the paper, are given in train_commands.txt.
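
As a rough illustration of how a flat --model_size_info list can be unpacked, the sketch below assumes that the DS-CNN layout is the number of layers followed by five values per layer (filters, kernel size in time, kernel size in frequency, stride in time, stride in frequency). That layout and the helper name parse_ds_cnn_size_info are assumptions made for illustration; models.py remains the authoritative reference. For the dnn architecture the list is simply one width per fully-connected layer, as in 128 128 128.

```python
from typing import List, NamedTuple


class ConvLayer(NamedTuple):
    filters: int
    kernel_t: int   # kernel size along the time axis (assumed order)
    kernel_f: int   # kernel size along the feature axis (assumed order)
    stride_t: int
    stride_f: int


def parse_ds_cnn_size_info(size_info: List[int]) -> List[ConvLayer]:
    """Split a flat size list into per-layer tuples (hypothetical helper)."""
    num_layers = size_info[0]
    values = size_info[1:]
    if len(values) != 5 * num_layers:
        raise ValueError("expected 5 values per layer after the layer count")
    return [ConvLayer(*values[5 * i:5 * i + 5]) for i in range(num_layers)]


# A DS-CNN size list with one strided first layer and four 3x3 layers.
args = "5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1"
layers = parse_ds_cnn_size_info([int(v) for v in args.split()])
for index, layer in enumerate(layers):
    print(index, layer)
```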

To run inference with a trained model checkpoint on the training, validation and test sets, run:

python test.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint <checkpoint path>

To freeze the trained model checkpoint into a .pb file, run:

python freeze.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint <checkpoint path> --output_file dnn.pb

Pretrained models

Trained models (.pb files) for the neural network architectures shown in the arXiv paper (DNN, CNN, Basic LSTM, LSTM, GRU, CRNN and DS-CNN) are available in Pretrained_models. The accuracy of the models on the validation set, their memory requirements and their operations per inference are summarized in the repository's README table.

To run an audio file through a trained model (e.g. a DNN) and get the top prediction, run:

python label_wav.py --wav <audio file> --graph Pretrained_models/DNN/DNN_S.pb --labels Pretrained_models/labels.txt --how_many_labels 1

Quantization Guide and Deployment on Microcontrollers

A quick guide to quantizing the KWS neural network models is available in the repository, along with example code for running a DNN model on a Cortex-M development board.

Comments

Calculate a negative value in bias_shift (shift right)

At the final FC layer, the bias has 10 fractional (dec) bits. For the last layer I calculated wx = 8 and bias = 10, so bias_shift = -2, which means the bias needs to be shifted right. But there is no option for a right shift. What should I do in this situation?

opened by ccnankai 15
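
One possible way around a negative bias_shift, sketched here under assumptions, is to re-quantize the bias offline so that it has no more fractional bits than the accumulator (input fractional bits plus weight fractional bits). The bias is then right-shifted once in Python instead of on the device, and the shift passed to the kernel becomes zero. The function names and the sample bias values are hypothetical; only the numbers 8 and 10 come from the question above.

```python
from typing import List


def bias_left_shift(acc_dec: int, bias_dec: int) -> int:
    """Left shift needed to align the bias with the accumulator format."""
    return acc_dec - bias_dec


def requantize_bias(bias_q: List[int], bias_dec: int, target_dec: int) -> List[int]:
    """Re-express integer bias values with fewer fractional bits.

    Uses a rounded arithmetic right shift, so precision below
    2**-target_dec is lost.
    """
    shift = bias_dec - target_dec
    if shift < 0:
        raise ValueError("target_dec must not exceed bias_dec")
    if shift == 0:
        return list(bias_q)
    half = 1 << (shift - 1)
    return [(b + half) >> shift for b in bias_q]


acc_dec, bias_dec = 8, 10                  # values from the question
print(bias_left_shift(acc_dec, bias_dec))  # negative, so not expressible as a left shift

example_bias = [5, 6, -3, -6]              # hypothetical quantized bias values
new_bias = requantize_bias(example_bias, bias_dec, acc_dec)
print(new_bias, bias_left_shift(acc_dec, acc_dec))
```

The trade-off in this sketch is a loss of the two least significant bias bits, which may or may not matter for accuracy and would need to be checked on the validation set.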
Change of wanted_words resulting very low recognition accuracy (only 24%)

Hi, there, I changed several wanted_words in "train.py" as: "up,down,left,right,stop,go,follow,forward,backward,on,off"

I then trained the ds_cnn_s model using the instructions in "train_commands.txt":

python train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DS_CNN/DS_CNN1/retrain_logs --train_dir work/DS_CNN/DS_CNN1/training

I changed the wanted_words in "test.py" to the same ones as above and ran:

python test.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --checkpoint=/Users/michaelhe/ml/asr/ML-KWS-for-MCU/work/DS_CNN/DS_CNN1/training/best/ds_cnn_9294.ckpt-27200

INFO:tensorflow:set_size=33946
INFO:tensorflow:Confusion Matrix:
[[2837 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 60 2174 0 0 0 0 0 654 0 0 0 0 0]
 [ 163 1902 22 0 0 0 0 800 0 0 0 3 0]
 [ 48 1225 0 35 0 0 0 1909 0 0 0 0 0]
 [ 52 2636 0 0 12 7 0 409 0 0 0 0 0]
 [ 36 2420 0 0 0 345 0 180 0 0 0 0 0]
 [ 86 1204 0 0 0 0 0 1750 0 0 0 0 0]
 [ 34 190 0 0 0 0 0 2884 0 0 0 0 0]
 [ 42 811 0 0 0 0 0 364 72 0 0 0 0]
 [ 23 989 0 0 0 0 0 181 4 1 0 0 0]
 [ 17 1257 0 0 0 0 0 98 0 0 0 0 0]
 [ 55 2400 0 0 0 0 0 486 0 0 0 136 0]
 [ 122 1668 0 0 0 0 0 1141 0 0 0 1 1]]
INFO:tensorflow:Training accuracy = 25.10% (N=33946)
INFO:tensorflow:set_size=3999
INFO:tensorflow:Confusion Matrix:
[[334 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 4 251 0 0 0 0 0 79 0 0 0 0 0]
 [ 19 220 1 0 0 0 0 110 0 0 0 0 0]
 [ 3 102 0 4 0 0 0 268 0 0 0 0 0]
 [ 6 304 0 0 1 1 0 40 0 0 0 0 0]
 [ 4 302 0 0 0 36 0 21 0 0 0 0 0]
 [ 12 128 0 0 0 0 0 210 0 0 0 0 0]
 [ 3 30 0 0 0 0 0 339 0 0 0 0 0]
 [ 1 80 0 0 0 0 0 44 7 0 0 0 0]
 [ 0 118 0 0 0 0 0 27 0 1 0 0 0]
 [ 3 140 0 0 0 0 0 10 0 0 0 0 0]
 [ 7 272 0 0 0 0 0 75 0 0 0 9 0]
 [ 13 209 1 0 0 0 0 149 0 0 0 1 0]]
INFO:tensorflow:Validation accuracy = 24.58% (N=3999)
INFO:tensorflow:set_size=4492
INFO:tensorflow:Confusion Matrix:
[[375 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 5 300 0 0 0 1 0 69 0 0 0 0 0]
 [ 25 274 2 0 0 0 0 124 0 0 0 0 0]
 [ 6 160 0 4 0 0 0 236 0 0 0 0 0]
 [ 5 352 0 0 0 0 0 55 0 0 0 0 0]
 [ 3 331 0 0 0 20 0 42 0 0 0 0 0]
 [ 11 165 0 0 0 0 0 235 0 0 0 0 0]
 [ 5 28 0 0 0 0 0 369 0 0 0 0 0]
 [ 2 124 0 0 0 0 0 40 6 0 0 0 0]
 [ 3 135 0 0 0 0 0 16 1 0 0 0 0]
 [ 2 146 0 0 0 0 0 17 0 0 0 0 0]
 [ 9 297 0 0 0 0 0 73 2 0 0 15 0]
 [ 13 227 0 0 0 0 0 161 1 0 0 0 0]]
INFO:tensorflow:Test accuracy = 24.29% (N=4492)

24.29% accuracy seems very low.

Did I make any mistake somewhere in the process?

Thank you so much.

opened by auvilink 15
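
To make the log above easier to read, the following sketch recomputes the validation accuracy from the posted confusion matrix and counts how often each class index is predicted. It assumes that rows are true labels and columns are predictions; the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def predicted_counts(cm: Matrix) -> List[int]:
    """Column sums: how many samples were assigned to each class index."""
    return [sum(row[j] for row in cm) for j in range(len(cm[0]))]


# Validation confusion matrix copied from the log above.
VALIDATION_CM = [
    [334, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [4, 251, 0, 0, 0, 0, 0, 79, 0, 0, 0, 0, 0],
    [19, 220, 1, 0, 0, 0, 0, 110, 0, 0, 0, 0, 0],
    [3, 102, 0, 4, 0, 0, 0, 268, 0, 0, 0, 0, 0],
    [6, 304, 0, 0, 1, 1, 0, 40, 0, 0, 0, 0, 0],
    [4, 302, 0, 0, 0, 36, 0, 21, 0, 0, 0, 0, 0],
    [12, 128, 0, 0, 0, 0, 0, 210, 0, 0, 0, 0, 0],
    [3, 30, 0, 0, 0, 0, 0, 339, 0, 0, 0, 0, 0],
    [1, 80, 0, 0, 0, 0, 0, 44, 7, 0, 0, 0, 0],
    [0, 118, 0, 0, 0, 0, 0, 27, 0, 1, 0, 0, 0],
    [3, 140, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0],
    [7, 272, 0, 0, 0, 0, 0, 75, 0, 0, 0, 9, 0],
    [13, 209, 1, 0, 0, 0, 0, 149, 0, 0, 0, 1, 0],
]

print(f"accuracy = {100 * accuracy(VALIDATION_CM):.2f}%")
print("predictions per class index:", predicted_counts(VALIDATION_CM))
```

Under that reading, nearly all predictions fall into class indices 1 and 7, which suggests the network has collapsed onto two outputs rather than learned the thirteen classes.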
Reasons for the Random keyword detection on board ?

Hello All,

I have deployed the DS-CNN model on an S32K148 (with an external analog microphone connected), and it detects random keywords without any input from the user. If we say a keyword loudly and multiple times, then it works. After tuning frame_length, frame_shift, num_frames and the mel lower and upper frequencies, it detects silence and unknown words with maximum accuracy, but the other keywords are not detected properly.

It would be really helpful if somebody knows the answer to this problem.

Is the problem with the model?

The parameters (frame_length, frame_shift, num_frames, etc.)?

The microphone (currently I am using an analog one; do I need to switch to a digital one)?

Thanks

opened by saichand07 11
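
One common way to suppress spurious single-frame detections, independent of the microphone question, is to average the network outputs over a few consecutive inferences and report a keyword only when the averaged score crosses a threshold. The sketch below illustrates the idea; the class name, the window length, the threshold and the assumption that indices 0 and 1 are silence and unknown are all hypothetical and would need tuning on the target board.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class SmoothedDetector:
    """Average per-class scores over a sliding window before deciding."""

    def __init__(self, window: int = 3, threshold: float = 0.7,
                 ignore: Sequence[int] = (0, 1)) -> None:
        self.history: Deque[List[float]] = deque(maxlen=window)
        self.threshold = threshold
        self.ignore = set(ignore)  # class indices that never trigger

    def update(self, probs: Sequence[float]) -> Optional[int]:
        """Add one inference result; return a class index or None."""
        self.history.append(list(probs))
        if len(self.history) < self.history.maxlen:
            return None  # wait until the window is full
        n = len(self.history)
        avg = [sum(frame[c] for frame in self.history) / n
               for c in range(len(probs))]
        best = max(range(len(avg)), key=avg.__getitem__)
        if best in self.ignore or avg[best] < self.threshold:
            return None
        return best


detector = SmoothedDetector(window=2, threshold=0.6, ignore=(0,))
for scores in ([0.1, 0.9, 0.0], [0.2, 0.8, 0.0], [0.9, 0.1, 0.0]):
    print(detector.update(scores))
```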
Changing the wav files to 1500 ms gives poor results

Hi, have you already run ds_cnn on an MCU? I have only 200 training samples, and the results on the MCU are very poor. My wav files are 1500 ms long. Can I run the program by changing only these three parameters: FRAME_LEN_MS, FRAME_SHIFT_MS and NUM_FRAMES? Is that right?

opened by ccnankai 11
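
The relationship between clip length, frame length, frame shift and the number of frames can be sketched as follows. It assumes 16 kHz audio and the usual framing rule of one frame plus one more for every full stride that still fits in the clip; the function name is hypothetical.

```python
def num_frames(clip_ms: int, window_ms: int, stride_ms: int,
               sample_rate: int = 16000) -> int:
    """Number of analysis frames that fit in a clip (no padding assumed)."""
    clip = sample_rate * clip_ms // 1000
    window = sample_rate * window_ms // 1000
    stride = sample_rate * stride_ms // 1000
    if clip < window:
        return 0
    return 1 + (clip - window) // stride


for clip_ms in (1000, 1500):
    print(clip_ms, "ms ->", num_frames(clip_ms, window_ms=40, stride_ms=20), "frames")
```

Because the number of frames sets the size of the network input, a model trained on one clip length would most likely need to be retrained, with a matching clip duration, before the on-device constants are changed.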
provided DS-CNN accuracy (on board)

Hi,

When testing the provided DS-CNN on my board, the accuracy seems to be very low. Especially in the real-time example, the model almost always predicts "left" as soon as there is some sound. Everything seems fine when listening to the audio loopback, and the DNN gives much better results. Are there specific parameters to use with DS-CNN in order to get high accuracy?

Thanks

opened by 4p0pt0Z 11
The message says "... might take a minute", but it has taken an hour and still has not succeeded.

C:\Users\Administrator\Desktop\ML-KWS-for-MCU\Deployment>mbed new kws_simple_test --mbedlib
[mbed] Creating new program "kws_simple_test" (git)
[mbed] Adding library "mbed" from "https://mbed.org/users/mbed_official/code/mbed/builds" at latest revision in the current branch
[mbed] Downloading library build "5aab5a7997ee" (might take a minute)

opened by oneway3124 10
Training the model with my own data

Hi, as train.py describes, the program supports training with my own data. However, when I use my own .wav files via --data_dir, it says IOError: CRC check failed 0x7668da3b != 0x316d6d75L. Have you ever tested the program with a dataset other than the Google recordings? Thanks.

opened by zhaoforever 9
There is an error when I try to deploy on target.

1. Build and run a simple KWS inference
2. mbed new kws_simple_test --mbedlib
3. ERROR An error occurred while unpacking library archive ".bld.rev-5aab5a7997ee.zip" in "c:\Users\Administrator\Desktop\ML-KWS-for-MCU\Deployment\kws_simple_test\mbed"

opened by oneway3124 8
Kernel size of DS-CNN's Conv1
Hi, I read through the paper, but the kernel size is still not clear to me, so let me ask:

How did you decide on the kernel size of DS-CNN (4 x 10)?

Is it "time by feature" or "feature by time"? It looks like the size for feature is 4 and the size for time is 10 in your code. But to me, it would be more reasonable if 10 were for feature and 4 for time.

I ask because it is not clear what the best kernel representation is, especially for speech, as questioned by this article: https://towardsdatascience.com/whats-wrong-with-spectrograms-and-cnns-for-audio-processing-311377d7ccd

I would appreciate it if you could share any of your thoughts from when you were designing your fine DS-CNN. Thank you.
opened by blmd-niz 8
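
For orientation, this sketch works out what a 10 (time) by 4 (feature) first-layer kernel with stride 2 in both directions does to a 49 x 10 input (49 frames of 10 coefficients, as produced by a 40 ms window with a 20 ms stride on a one-second clip). It assumes "SAME" padding and a single input channel, which are assumptions rather than something stated in the question, and the helper names are hypothetical.

```python
import math
from typing import Tuple


def same_conv_output(in_t: int, in_f: int,
                     stride_t: int, stride_f: int) -> Tuple[int, int]:
    """Output size of a strided convolution with SAME padding."""
    return math.ceil(in_t / stride_t), math.ceil(in_f / stride_f)


def conv_weight_count(kernel_t: int, kernel_f: int,
                      in_channels: int, filters: int) -> int:
    """Number of kernel weights, excluding biases."""
    return kernel_t * kernel_f * in_channels * filters


print(same_conv_output(49, 10, stride_t=2, stride_f=2))
print(conv_weight_count(10, 4, in_channels=1, filters=64))
```

With these assumptions the kernel spans 10 of the 49 time steps but only 4 of the 10 coefficients, so swapping the two sizes would change how much context is seen along each axis without changing the weight count.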

Quantization and scaling

@navsuda In the case of the DNN, --act_max should have 5 parameters, such as --act_max 32 30 30 30 30, while the DNN structure is made of 4 layers:

void DNN::run_nn(q7_t* in_data, q7_t* out_data)
{
	// Run all layers

	// IP1
	arm_fully_connected_q7(in_data, ip1_wt, IN_DIM, IP1_OUT_DIM, 1, 7, ip1_bias, ip1_out, vec_buffer);
	// RELU1
	arm_relu_q7(ip1_out, IP1_OUT_DIM);

	// IP2
	arm_fully_connected_q7(ip1_out, ip2_wt, IP1_OUT_DIM, IP2_OUT_DIM, 2, 8, ip2_bias, ip2_out, vec_buffer);
	// RELU2
	arm_relu_q7(ip2_out, IP2_OUT_DIM);

	// IP3
	arm_fully_connected_q7(ip2_out, ip3_wt, IP2_OUT_DIM, IP3_OUT_DIM, 2, 9, ip3_bias, ip3_out, vec_buffer);
	// RELU3
	arm_relu_q7(ip3_out, IP3_OUT_DIM);

	// IP4
	arm_fully_connected_q7(ip3_out, ip4_wt, IP3_OUT_DIM, OUT_DIM, 0, 6, ip4_bias, out_data, vec_buffer);
}

How should I use these 5 --act_max values to scale 4 arm_fully_connected_q7() functions?

opened by pooyaww 7
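
One plausible reading, sketched below, is that the five values describe the five activation tensors around the four layers: the network input plus the output of each layer. Each layer then gets a bias left shift and an output right shift from the fractional bits of its input, weights, bias and output. The weight and bias fractional bits used here are hypothetical, chosen only so that the result matches the shift arguments in the run_nn listing above; real values would come from the trained weights.

```python
import math
from typing import List, Tuple


def dec_bits(max_abs: float, total_bits: int = 8) -> int:
    """Fractional bits left after covering the range [-max_abs, max_abs)."""
    int_bits = math.ceil(math.log2(max_abs))
    return (total_bits - 1) - int_bits


def layer_shifts(act_max: List[float], weight_dec: List[int],
                 bias_dec: List[int]) -> List[Tuple[int, int]]:
    """Return (bias_shift, out_shift) per layer.

    act_max[i] bounds the input of layer i and act_max[i + 1] its output,
    so N layers need N + 1 values.
    """
    act_dec = [dec_bits(m) for m in act_max]
    shifts = []
    for i, (w_dec, b_dec) in enumerate(zip(weight_dec, bias_dec)):
        acc_dec = act_dec[i] + w_dec          # accumulator fractional bits
        shifts.append((acc_dec - b_dec, acc_dec - act_dec[i + 1]))
    return shifts


# Hypothetical weight/bias formats; act_max taken from the question.
print(layer_shifts([32, 30, 30, 30, 30],
                   weight_dec=[7, 8, 9, 6],
                   bias_dec=[8, 8, 9, 8]))
```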

Quantization

Hi,

The conclusion of the paper states: "We quantized representative trained 32-bit floating-point KWS models into 8-bit fixed-point versions... ".

Are the pretrained models already in 8-bit fixed point?

opened by pribadihcr 7
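
For readers unfamiliar with the term, the following sketch shows what converting floating-point values to 8-bit fixed point can look like: pick the number of fractional bits from the largest magnitude, scale, round and clip to the signed 8-bit range. It is a generic illustration with hypothetical function names and sample values, not the repository's quantization script.

```python
import math
from typing import List, Tuple


def quantize_q7(values: List[float]) -> Tuple[List[int], int]:
    """Quantize floats to signed 8-bit integers with a shared scale.

    Returns the integers and the number of fractional bits used.
    """
    max_abs = max(abs(v) for v in values)
    int_bits = math.ceil(math.log2(max_abs)) if max_abs > 0 else 0
    frac_bits = 7 - int_bits
    scale = 2.0 ** frac_bits
    # Round to nearest, then clip to the representable range.
    quantized = [max(-128, min(127, int(round(v * scale)))) for v in values]
    return quantized, frac_bits


def dequantize(quantized: List[int], frac_bits: int) -> List[float]:
    """Map the integers back to floats to inspect the rounding error."""
    return [q / (2.0 ** frac_bits) for q in quantized]


weights = [0.5, -0.25, 0.1]          # hypothetical float weights
q, frac = quantize_q7(weights)
print(q, frac, dequantize(q, frac))
```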
reproduction in keras doesn't give same accuracy

Hi,

I have been trying to reproduce the same DS-CNN architecture in Keras in Edge Impulse Studio. I implemented exactly the same architecture and tried to give the system the same dataset for training and validation, but I see an overfitting issue (training accuracy goes to 100 percent, but validation accuracy stays around 88 percent). Do you have any idea why reproducing it in Keras gives me this overfitting issue?

Thank you in advance,

opened by asadisina 1
How to run a DNN model with 436 neurons on the MCU?

Hi.

I've tried to train, quantize and compile the DNN model with 3 layers and 436 neurons in each layer. I can launch the command:

python train.py --model_architecture dnn --model_size_info 436 436 436 --window_size_ms 40 --window_stride_ms 40 --dct_coefficient_count 10

without any problem, and the model is trained. After that I run:

python quant_test.py --model_architecture dnn --model_size_info 436 436 436 --window_size_ms 40 --window_stride_ms 40 --dct_coefficient_count 10 --checkpoint --act_max 32 32 32 32 32

The main problem is that the last command generates a weights file that is too large for my MCU (about 800KB, against 300KB for the model with 144 neurons per layer). How can I solve this kind of problem? I know that I need to quantize, but the result doesn't fit in the memory of my MCU (I use the DISCO_F746NG, as in the example). Another question is related to the file quant_models.py: I guess it is possible to train the NN with it, but what command do I need to launch?

opened by Bosch936 0
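
A quick way to sanity-check whether a DNN of this size can fit is to count its parameters. The sketch below assumes a one-second clip (25 frames of 10 coefficients with a 40 ms window and 40 ms stride, so 250 inputs) and 12 output labels; both figures and the function name are assumptions for illustration.

```python
from typing import Sequence


def dnn_param_count(input_dim: int, hidden: Sequence[int], num_labels: int) -> int:
    """Weights plus biases of a fully-connected network."""
    dims = [input_dim] + list(hidden) + [num_labels]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(dims[:-1], dims[1:]))


for width in (144, 436):
    params = dnn_param_count(250, [width] * 3, 12)
    # With 8-bit weights, one parameter takes one byte.
    print(width, "neurons:", params, "parameters,", round(params / 1024), "KiB at 8 bits")
```

Under these assumptions the 8-bit footprint is much smaller than the size of a text file that lists the same weights as decimal numbers, so the size of a generated header file is not necessarily the amount of flash the model needs.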
link

Failed to link kws_simple_test after compiling. The problem is described as follows:

arm-none-eabi-cpp: fatal error: '-c' is not a valid option to the preprocessor
compilation terminated.

opened by lisem777 0
mbed error

$ mbed new kws_simple_test --mbedlib
[mbed] Working path "E:\Zbit\AI\KWS\Git\ML-KWS-for-MCU\Deployment" (program)
[mbed] Creating new program "kws_simple_test" (git)
[mbed] Adding library "mbed" from "https://mbed.org/users/mbed_official/code/mbed/builds" at branch/tag "tip"
[mbed] Unpacking library build "65be27845400" in "E:\Zbit\AI\KWS\Git\ML-KWS-for-MCU\Deployment\kws_simple_test\mbed"
[mbed] Unpacking library build "65be27845400" in "E:\Zbit\AI\KWS\Git\ML-KWS-for-MCU\Deployment\kws_simple_test\mbed"
[mbed] Updating reference "mbed" -> "https://mbed.org/users/mbed_official/code/mbed/builds/65be27845400"
[mbed] Couldn't find build tools in your program. Downloading the mbed 2.0 SDK tools...

opened by aktoey 1
Completely wrong predictions on own wav files

When I try to run my own audio recordings using label_wav.py, I always get random predictions. I trimmed the audio files to only contain the keyword and still receive false outputs. Does anyone experience the same?

opened by oezguensi 1
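
A mismatch between the recording format and what the model was trained on is one possible cause worth ruling out. The sketch below uses Python's standard wave module to check a file against the format of the Speech Commands recordings (16 kHz, mono, 16-bit PCM, about one second long); treating those as the expected values for this model is an assumption, and the function name and file name are hypothetical.

```python
import wave
from typing import List


def check_wav(path: str, expected_rate: int = 16000,
              max_ms: int = 1000) -> List[str]:
    """Return a list of format problems found in a PCM wav file."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
        if wav.getframerate() != expected_rate:
            problems.append(f"{wav.getframerate()} Hz, expected {expected_rate} Hz")
        duration_ms = 1000 * wav.getnframes() / wav.getframerate()
        if duration_ms > max_ms:
            problems.append(f"{duration_ms:.0f} ms long, expected at most {max_ms} ms")
    return problems


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python check_wav.py my_recording.wav
    for name in sys.argv[1:]:
        print(name, check_wav(name) or "looks compatible")
```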

error to run label_wav

I'm getting this error.

(py37) C:\projects\audio\ML-KWS-for-MCU>python label_wav.py --wav yesno.wav --graph Pretrained_models/DNN/DNN_S.pb --labels Pretrained_models/labels.txt --how_many_labels 1
C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
  File "label_wav.py", line 37, in <module>
    import tensorflow as tf
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\__init__.py", line 83, in <module>
    from tensorflow.python import keras
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\__init__.py", line 26, in <module>
    from tensorflow.python.keras import activations
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\activations.py", line 24, in <module>
    from tensorflow.python.keras.utils.generic_utils import deserialize_keras_object
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\utils\__init__.py", line 39, in <module>
    from tensorflow.python.keras.utils.multi_gpu_utils import multi_gpu_model
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\utils\multi_gpu_utils.py", line 22, in <module>
    from tensorflow.python.keras.engine.training import Model
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\engine\training.py", line 40, in <module>
    from tensorflow.python.keras.engine import network
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\engine\network.py", line 39, in <module>
    from tensorflow.python.keras import saving
  File "C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\saving\__init__.py", line 33, in <module>
    from tensorflow.python.keras.saving.saved_model import export_saved_model
ImportError: cannot import name 'export_saved_model' from 'tensorflow.python.keras.saving.saved_model' (C:\Users\MasterRoot\Anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\saving\saved_model\__init__.py)

I need some help to fix it, please.

opened by TheMasterRoot 1
