☢️ Audiomer ☢️
Audiomer: A Convolutional Transformer for Keyword Spotting
[ arXiv ] | [ Previous SOTA ] | [ Model Architecture ]
Results on SpeechCommands

Model Architecture

Performer Conv-Attention

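The block below is a minimal, hypothetical sketch of the Conv-Attention idea: a depthwise 1D convolution for local context paired with linear-complexity Performer self-attention (from the performer_pytorch package listed under requirements) for global context. Module and parameter names here are illustrative, not taken from the paper or this repository.

```python
import torch
import torch.nn as nn
from performer_pytorch import SelfAttention

class ConvAttentionBlock(nn.Module):
    """Hypothetical sketch: depthwise conv front-end + Performer attention."""

    def __init__(self, dim=64, heads=4, kernel_size=3):
        super().__init__()
        # Depthwise 1D convolution mixes local context along the time axis.
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)
        self.norm = nn.LayerNorm(dim)
        # FAVOR+ self-attention: linear in sequence length.
        self.attn = SelfAttention(dim=dim, heads=heads, causal=False)

    def forward(self, x):
        # x: (batch, time, dim)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local mixing
        return x + self.attn(self.norm(y))                # global attention + residual

block = ConvAttentionBlock()
print(block(torch.randn(2, 128, 64)).shape)  # torch.Size([2, 128, 64])
```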
Usage
To reproduce the results in the paper, follow these instructions:
- To download the Speech Commands v2 dataset, run the command below (a torchaudio-based alternative is sketched after this list):
python3 datamodules/SpeechCommands12.py
- To train Audiomer-S and Audiomer-L on all three datasets (three runs each), run:
python3 run_expts.py
- To evaluate a model on a dataset, run the command below (a programmatic checkpoint-loading sketch also follows this list):
python3 evaluate.py --checkpoint_path /path/to/checkpoint.ckpt --model <model type> --dataset <name of dataset>
  For example:
python3 evaluate.py --checkpoint_path ./epoch=300.ckpt --model S --dataset SC20
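As an alternative to the repo's SpeechCommands12 datamodule, the raw Speech Commands v2 data can be fetched directly through torchaudio's standard dataset API; this sketch assumes nothing about the repository itself.

```python
# Sketch: fetching Speech Commands v2 directly with torchaudio,
# independent of the datamodules/SpeechCommands12.py script above.
import torchaudio

dataset = torchaudio.datasets.SPEECHCOMMANDS(
    root=".",                     # download destination
    url="speech_commands_v0.02",  # v2 of the dataset
    download=True,
)
waveform, sample_rate, label, *_ = dataset[0]
print(waveform.shape, sample_rate, label)
```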
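For programmatic evaluation outside evaluate.py, a pytorch_lightning checkpoint can usually be restored as shown below. AudiomerClassifier is a hypothetical stand-in for whatever LightningModule this repository actually defines, and the import path is assumed.

```python
# Hypothetical sketch: restoring a trained checkpoint via pytorch_lightning.
# AudiomerClassifier is a placeholder name, not the repo's actual class.
import torch
from audiomer import AudiomerClassifier  # hypothetical import path

model = AudiomerClassifier.load_from_checkpoint("epoch=300.ckpt")
model.eval()
with torch.no_grad():
    waveform = torch.randn(1, 16000)  # one second of 16 kHz audio
    logits = model(waveform)
print(logits.argmax(dim=-1))          # predicted keyword index
```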
System requirements
- NVIDIA GPU with CUDA
- Python 3.6 or higher
- pytorch_lightning
- torchaudio
- performer_pytorch
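The Python dependencies can be installed from PyPI (package names assumed to match the import names), e.g.:
pip install pytorch-lightning torchaudio performer-pytorch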