Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Arya Aftab

Last update: Nov 12, 2022


Overview

Light-SERNet

This is the TensorFlow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition", submitted to ICASSP 2022.

In this paper, we propose an efficient and lightweight fully convolutional neural network (FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps the deep convolutional blocks extract high-level features while ensuring sufficient separability. The extracted features are then used to classify the emotion of the input speech segment. Although our model is smaller than state-of-the-art models, it achieves higher performance on the IEMOCAP and EMO-DB datasets.
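To make the parallel-path idea concrete, here is a minimal tf.keras sketch of a fully convolutional classifier that runs three convolution paths with different filter shapes side by side, concatenates their feature maps, and passes them through deeper convolutional blocks. Everything specific here is an assumption for illustration: the function name `build_parallel_fcnn`, the kernel shapes, channel counts, number of blocks, the 40-coefficient input features, and the 1x1-conv plus global-average-pooling head. It is not the paper's exact architecture or the repository's model code.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_bn_relu(x, filters, kernel_size):
    """Conv2D -> BatchNorm -> ReLU; 'same' padding keeps the spatial size."""
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)


def build_parallel_fcnn(num_classes=7, n_features=40):
    """Hypothetical parallel-path FCNN; all sizes are illustrative only."""
    # Assumed input: (time frames, n_features, 1); the time length is left variable.
    inputs = layers.Input(shape=(None, n_features, 1))

    # Three parallel paths with different filter shapes:
    # tall along time, wide along frequency, and square.
    paths = [conv_bn_relu(inputs, 32, k) for k in [(9, 1), (1, 11), (3, 3)]]
    x = layers.Concatenate(axis=-1)(paths)  # stack the feature maps channel-wise

    # Deeper convolutional blocks on the merged feature maps.
    for filters in (64, 96, 128):
        x = conv_bn_relu(x, filters, (3, 3))
        x = layers.MaxPooling2D(pool_size=2, padding="same")(x)

    # Fully convolutional head: a 1x1 conv gives one map per class, and global
    # averaging turns each map into a score, so no Dense layer is needed.
    x = layers.Conv2D(num_classes, 1)(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Softmax()(x)
    return tf.keras.Model(inputs, outputs)
```

Because there is no Dense layer, the same model accepts inputs with different numbers of time frames; calling `build_parallel_fcnn().summary()` shows how the three paths merge before the deeper blocks.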

Run

1. Clone Repository

$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git
$ cd LIGHT-SERNET/

2. Requirements

TensorFlow >= 2.3.0
NumPy >= 1.19.2
tqdm >= 4.50.2
Matplotlib >= 3.3.1
scikit-learn >= 0.23.2

$ pip install -r requirements.txt
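If you want to confirm that an existing environment meets the minimum versions listed above, a small standard-library check could look like the following. This script and its helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of the repository; it assumes Python 3.8+ for `importlib.metadata`.

```python
import re
from importlib import metadata

# Minimum versions from the requirements list; keys are pip distribution names.
MINIMUMS = {
    "tensorflow": "2.3.0",
    "numpy": "1.19.2",
    "tqdm": "4.50.2",
    "matplotlib": "3.3.1",
    "scikit-learn": "0.23.2",
}


def parse_version(text):
    """Keep the leading numeric fields: '2.10.0rc1' -> (2, 10, 0).

    Simplified on purpose: pre-release and local suffixes are ignored.
    """
    parts = []
    for field in text.split("."):
        match = re.match(r"\d+", field)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(field):  # a suffix such as 'rc1' or '+cpu' ends parsing
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions after padding both to the same number of fields."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return {distribution: status string} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

The parser treats a pre-release such as `2.3.0rc1` as equal to `2.3.0`; if the `packaging` library is available, `packaging.version.Version` gives a strict comparison instead.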

3. Data

Download the EMO-DB and IEMOCAP datasets (IEMOCAP requires access permission) and extract them into the data folder.
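EMO-DB stores its metadata in each file name (for example `03a01Fa.wav`: speaker `03`, text `a01`, emotion letter `F`, version `a`), which is handy for sanity-checking an extracted copy. The sketch below decodes that public naming convention; it is not code from this repository, and the default root `data/EMO-DB` is an assumption that simply mirrors the path used in the segmentation command.

```python
from pathlib import Path

# EMO-DB emotion letters are the initials of the German emotion names.
EMODB_EMOTIONS = {
    "W": "anger",      # Wut / Aerger
    "L": "boredom",    # Langeweile
    "E": "disgust",    # Ekel
    "A": "fear",       # Angst
    "F": "happiness",  # Freude
    "T": "sadness",    # Trauer
    "N": "neutral",
}


def parse_emodb_name(filename):
    """Return (speaker_id, text_id, emotion) for an EMO-DB file name.

    Layout of the 7-character stem: chars 0-1 speaker, 2-4 text,
    5 emotion letter, 6 version letter.
    """
    stem = Path(filename).stem
    if len(stem) != 7 or stem[5] not in EMODB_EMOTIONS:
        raise ValueError(f"not an EMO-DB file name: {filename!r}")
    return stem[0:2], stem[2:5], EMODB_EMOTIONS[stem[5]]


def index_emodb(root="data/EMO-DB"):
    """Map every .wav file under `root` (assumed location) to its parsed metadata."""
    return {str(p): parse_emodb_name(p.name) for p in sorted(Path(root).rglob("*.wav"))}
```

Since the speaker id is the first field, grouping files by it is one straightforward way to build speaker-independent train/test splits.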

4. Prepare Datasets

Use the following command to split each dataset into segments of the desired length (in seconds):

$ python utils/segment/segment_dataset.py -dp data/{dataset_folder} -ip utils/DATASET_INFO.json -d {datasetname_in_jsonfile} -l {desired_size(seconds)}

For example, for the EMO-DB dataset:

$ python utils/segment/segment_dataset.py -dp data/EMO-DB -ip utils/DATASET_INFO.json -d EMO-DB -l 3
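As a rough picture of what fixed-length segmentation means, the sketch below cuts a waveform into consecutive, non-overlapping windows of `seconds` each and zero-pads the last partial window. The function name `segment_signal` and the padding policy are assumptions for illustration; the repository's `segment_dataset.py` may overlap, drop, or pad segments differently.

```python
def segment_signal(samples, sample_rate, seconds, pad_value=0.0):
    """Split `samples` into non-overlapping windows of `seconds` each.

    Hypothetical policy: the final partial window is padded with `pad_value`
    up to full length; an empty input yields no segments.
    """
    width = int(round(seconds * sample_rate))  # samples per segment
    if width < 1:
        raise ValueError("segment length must cover at least one sample")
    segments = []
    for start in range(0, len(samples), width):
        chunk = list(samples[start:start + width])
        chunk.extend([pad_value] * (width - len(chunk)))  # pad the last window
        segments.append(chunk)
    return segments
```

EMO-DB is recorded at 16 kHz, so under this sketch `-l 3` would correspond to 48,000 samples per segment, and a 7-second utterance would yield three segments (the third mostly padding).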

5. Set Hyperparameters and Training Config

To set the hyperparameters and the training configuration, you only need to change the constants in hyperparameters.py.

6. Start Training

Use the following command to train the model on the chosen dataset with the chosen cost function.

Note 1: The dataset name is the name of the dataset folder after segmentation.
Note 2: The confusion-matrix results are saved in the result folder.

$ python train.py -dn {dataset_name_after_segmentation} -ln {cost_function_name}
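Speech emotion recognition results are commonly summarized from the confusion matrix as weighted accuracy (WA, the overall fraction correct) and unweighted accuracy (UA, the mean per-class recall). The pure-Python sketch below shows how both follow from the matrix; the function names are hypothetical, and whether the repository reports exactly these quantities is an assumption.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns predicted, ordered by `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def weighted_accuracy(matrix):
    """WA: fraction of all samples classified correctly (trace / total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def unweighted_accuracy(matrix):
    """UA: mean per-class recall, so each class counts equally regardless of size."""
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(recalls) / len(recalls)


labels = ["angry", "sad"]
m = confusion_matrix(["angry", "angry", "angry", "sad"],
                     ["angry", "angry", "sad", "sad"], labels)
print(m)                       # [[2, 1], [0, 1]]
print(weighted_accuracy(m))    # 0.75
print(unweighted_accuracy(m))  # about 0.833
```

UA is the better guide on imbalanced data such as IEMOCAP. Since scikit-learn is already a requirement, `sklearn.metrics.confusion_matrix`, `accuracy_score`, and `balanced_accuracy_score` compute the same three quantities.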

For example, for the EMO-DB dataset:

$ python train.py -dn EMO-DB_3s_Segmented -ln focal
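The `focal` cost-function name presumably selects focal loss (Lin et al., 2017), which scales cross-entropy by a factor that shrinks for confidently correct predictions: FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below is the standard per-sample form written in plain Python; the function names and the defaults `gamma=2.0`, `alpha=1.0` are assumptions, not the repository's implementation or settings.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0, eps=1e-7):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    probs: predicted class probabilities (e.g. a softmax output).
    target: integer index of the true class.
    """
    p_t = min(max(probs[target], eps), 1.0 - eps)  # clip so log(p_t) stays finite
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(batch_probs, targets, **kwargs):
    """Average the per-sample focal loss over a batch."""
    losses = [focal_loss(p, t, **kwargs) for p, t in zip(batch_probs, targets)]
    return sum(losses) / len(losses)
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy, -log(p_t); larger `gamma` values down-weight easy, well-classified examples so training focuses on harder ones.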

Citation

If you find our code useful for your research, please consider citing:

@article{aftab2021light,
  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},
  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},
  journal={arXiv preprint arXiv:2110.03435},
  year={2021}
}
