# StrengthNet

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

https://arxiv.org/abs/2110.03156
## Dependencies

Ubuntu 18.04.5 LTS

- GPU: Quadro RTX 6000
- Driver version: 450.80.02
- CUDA version: 11.0

Python 3.5

- tensorflow-gpu 2.0.0b1 (cudnn=7.6.0)
- scipy
- pandas
- matplotlib
- librosa
## Environment set-up

For example:

```bash
conda create -n strengthnet python=3.5
conda activate strengthnet
pip install -r requirements.txt
conda install cudnn=7.6.0
```
## Usage

- Run `python utils.py` to extract the `.wav` files to `.h5` features;
- Run `python train.py` to train a CNN-BLSTM based StrengthNet.
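The extraction step turns each waveform into a spectral feature matrix before it is stored as `.h5`. The repository's `utils.py` relies on librosa for this; the numpy-only sketch below illustrates the same mel-spectrogram computation (the 512-sample frame, 256-sample hop, and 80 mel bands are illustrative assumptions, not necessarily the repo's settings).

```python
import numpy as np

def mel_filterbank(sr=16000, n_fft=512, n_mels=80):
    """Triangular mel filterbank (numpy-only stand-in for librosa.filters.mel)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):            # rising slope of triangle i-1
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):           # falling slope of triangle i-1
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def wav_to_mel(y, sr=16000, n_fft=512, hop=256, n_mels=80):
    """Windowed STFT magnitudes projected onto the mel filterbank -> (T, n_mels)."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, max(len(y) - n_fft, 1), hop):
        frame = y[start:start + n_fft]
        if len(frame) < n_fft:                   # zero-pad the last short frame
            frame = np.pad(frame, (0, n_fft - len(frame)))
        frames.append(np.abs(np.fft.rfft(frame * window)))
    spec = np.asarray(frames)                    # (T, n_fft // 2 + 1)
    return np.log1p(spec @ mel_filterbank(sr, n_fft, n_mels).T)
```

The resulting per-utterance matrices would then be written to `.h5` (e.g. with h5py) for training.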
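A CNN-BLSTM strength predictor along the lines of the paper can be sketched in `tf.keras` as follows. The layer sizes, the 80-band mel input, and the five-class auxiliary emotion head are illustrative assumptions, not the repo's exact architecture.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_strengthnet(num_mels=80, num_emotions=5):
    """CNN encoder -> BLSTM -> frame-level strength scores averaged to an
    utterance-level score, plus an auxiliary emotion classifier (sizes assumed)."""
    mel = layers.Input(shape=(None, num_mels), name="mel")        # (T, mels)
    x = layers.Reshape((-1, num_mels, 1))(mel)                    # (T, mels, 1)
    for filters in (16, 32, 64):                                  # downsample mel axis
        x = layers.Conv2D(filters, 3, strides=(1, 2),
                          padding="same", activation="relu")(x)
    x = layers.Reshape((-1, (num_mels // 8) * 64))(x)             # (T, features)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    frame = layers.TimeDistributed(layers.Dense(1))(x)            # per-frame scores
    strength = layers.GlobalAveragePooling1D(name="strength")(frame)
    emotion = layers.Dense(num_emotions, activation="softmax", name="emotion")(
        layers.GlobalAveragePooling1D()(x))
    return Model(mel, [strength, emotion])
```

In the paper the network is trained with a multi-task objective (strength regression plus emotion classification); here the two heads are simply returned together.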
## Evaluating new samples

- Put the waveforms you wish to evaluate in a folder, e.g. `<wav_dir>/*.wav`;
- Run `python test.py --rootdir <wav_dir>`.

This script will evaluate all the `.wav` files in `<wav_dir>` and write the results to a text file. By default, the `output/strengthnet.h5` pretrained model is used.
## Citation

If you find this work useful in your research, please consider citing:

```bibtex
@misc{liu2021strengthnet,
  title={StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis},
  author={Rui Liu and Berrak Sisman and Haizhou Li},
  year={2021},
  eprint={2110.03156},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}
```
## Resources

The ESD corpus is released by the HLT lab, NUS, Singapore.

The strength scores for the English samples of the ESD corpus are available here.

## Acknowledgements

- MOSNet: https://github.com/lochenchou/MOSNet
- Relative Attributes
## License

This work is released under the MIT License (see the LICENSE file for details).