Identify the emotion of multiple speakers in an Audio Segment

Overview




MevonAI - Speech Emotion Recognition

Identify the emotion of multiple speakers in an Audio Segment
Report Bug · Request Feature

Try the Demo Here

Open In Colab

Table of Contents

  • About The Project
    • Built With
  • Getting Started
    • Installation
    • Running the Application
  • Contributing
  • License
  • Acknowledgements
  • FAQ

About The Project


The main aim of the project is to identify the emotions of multiple speakers in a call audio recording, as an application for customer-satisfaction feedback in call centres.

Built With

  • Python 3
  • TensorFlow / Keras

Getting Started

Follow the instructions below to set up the project on your local machine.

Installation

  1. Create a Python virtual environment:
sudo apt install python3-venv
mkdir mevonAI
cd mevonAI
python3 -m venv mevon-env
source mevon-env/bin/activate
  2. Clone the repo:
git clone https://github.com/SuyashMore/MevonAI-Speech-Emotion-Recognition.git
  3. Install dependencies:
cd MevonAI-Speech-Emotion-Recognition/
cd src/
sudo chmod +x setup.sh
./setup.sh

Running the Application

  1. Add audio files in .wav format for analysis to the src/input/ folder

  2. Run Speech Emotion Recognition using

python3 speechEmotionRecognition.py
  3. By default, the application will use the pretrained model available in src/model/

  4. Diarized files will be stored in the src/output/ folder

  5. Predicted emotions will be stored in a separate .csv file in the src/ folder

Here's how it works:

Speaker Diarization

  • Speaker diarisation (or diarization) is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question "who spoke when?" Speaker diarisation is a combination of speaker segmentation and speaker clustering. The first aims at finding speaker change points in an audio stream. The second aims at grouping together speech segments on the basis of speaker characteristics.
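
To make the two stages concrete, here is a minimal, self-contained sketch of segmentation plus clustering. It uses per-window mean MFCC vectors as crude stand-in speaker embeddings and spectral clustering from scikit-learn; the project's own speakerDiarization.py uses a dedicated embedding model instead, so treat this only as an illustration of the idea.

import librosa
import numpy as np
from sklearn.cluster import SpectralClustering

def toy_diarize(wav_path, n_speakers=2, win_s=1.0):
    y, sr = librosa.load(wav_path, sr=16000)
    hop = int(win_s * sr)
    # Segmentation: cut the stream into fixed one-second windows.
    windows = [y[i:i + hop] for i in range(0, len(y) - hop, hop)]
    # Embedding: mean MFCC vector per window (a crude speaker descriptor).
    embeds = np.array([librosa.feature.mfcc(y=w, sr=sr, n_mfcc=13).mean(axis=1)
                       for w in windows])
    # Clustering: group windows that sound like the same speaker.
    labels = SpectralClustering(n_clusters=n_speakers,
                                affinity="nearest_neighbors").fit_predict(embeds)
    # Return (start_s, end_s, speaker_id) triples: "who spoke when?"
    return [(i * win_s, (i + 1) * win_s, int(lab)) for i, lab in enumerate(labels)]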


Feature Extraction

  • For speech recognition tasks, MFCCs have been the state-of-the-art feature since they were introduced in the 1980s. The shape of the vocal tract determines what sound comes out; if we can determine the shape accurately, this should give us an accurate representation of the phoneme being produced. The shape of the vocal tract manifests itself in the envelope of the short-time power spectrum, and the job of MFCCs is to accurately represent this envelope.

[Figure: audio waveform (top) and the corresponding MFCC output (bottom)]

The top image shows the audio waveform; the bottom image shows the converted MFCC output, on which we run our CNN model.
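
A hedged sketch of how such MFCC features can be computed with librosa: 13 coefficients over 216 frames, matching the (13, 216, 1) input shape of the CNN below. The sample rate, hop length, and padding here are assumptions; the project's actual settings may differ.

import librosa
import numpy as np

def extract_mfcc(wav_path, n_mfcc=13, n_frames=216):
    y, sr = librosa.load(wav_path, sr=16000)  # assumed sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad with zeros or truncate along the time axis to exactly n_frames.
    if mfcc.shape[1] < n_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, n_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :n_frames]
    return mfcc[..., np.newaxis]  # (13, 216, 1), ready for the Conv2D input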

CNN Model

  • Use a convolutional neural network to recognize emotion from the MFCCs, with the following architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, BatchNormalization, Flatten, Dense

model = Sequential()

# Input layer: 13 MFCC coefficients x 216 frames, single channel
model.add(Conv2D(32, 5, strides=2, padding='same',
                 input_shape=(13, 216, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())

# Hidden layer 1
model.add(Conv2D(64, 5, strides=2, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())

# Hidden layer 2
model.add(Conv2D(64, 5, strides=2, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())

# Flatten conv output
model.add(Flatten())

# Output layer: one unit per emotion class
model.add(Dense(7))
model.add(Activation('softmax'))

Training the Model
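
The repository does not spell out its training setup here, so the following is only a minimal compile-and-fit sketch for the model defined above. The random stand-in data, optimizer, batch size, epoch count, and save path are all illustrative assumptions, not the project's actual settings.

import numpy as np

# Stand-in data: real training would load labeled MFCC features from an
# emotion dataset; shapes match the model's (13, 216, 1) input and 7 classes.
x_train = np.random.rand(100, 13, 216, 1).astype('float32')
y_train = np.random.randint(0, 7, size=100)

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer labels 0-6
              metrics=['accuracy'])

model.fit(x_train, y_train, validation_split=0.2, batch_size=32, epochs=50)

# Save in HDF5 format so it can later be restored with load_model().
model.save('model/emotion_cnn.h5')  # hypothetical filename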

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements

FAQ

  • How do I do specifically so and so?
    • Create an issue in this repo and we will respond to your query.
Comments
  • Not Running speechEmotionRecognition.py

    When I try to run the app, I get this error:

    Using TensorFlow backend.
    /usr/local/lib/python3.6/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
      warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
    Traceback (most recent call last):
      File "./src/speechEmotionRecognition.py", line 21, in <module>
        model = tf.keras.models.load_model('model/lstm_cnn_rectangular_lowdropout_trainedoncustomdata.h5')
      File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
        loader_impl.parse_saved_model(filepath)
      File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 83, in parse_saved_model
        constants.SAVED_MODEL_FILENAME_PB))
    OSError: SavedModel file does not exist at: model/lstm_cnn_rectangular_lowdropout_trainedoncustomdata.h5/{saved_model.pbtxt|saved_model.pb}

    Can you help me?

    bug 
    opened by lenoirdevinci 10
  • AttributeError: module 'tensorflow._api.v1.compat.v2' has no attribute '__internal__'

    Hello Suyash, I am trying to run your project on Colab as I am new to this, but I am facing some issues.

    1) When running:

    !chmod +x setup.sh
    !./setup.sh

    I receive the following errors:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    tensorflow-probability 0.12.1 requires gast>=0.3.2, but you have gast 0.2.2 which is incompatible.
    kapre 0.3.5 requires tensorflow>=2.0.0, but you have tensorflow 1.15.0 which is incompatible.

    2) After this, when I run !python3 speechEmotionRecognition.py I receive this error:

    Using TensorFlow backend.
    Traceback (most recent call last):
      File "speechEmotionRecognition.py", line 2, in <module>
        import keras
      File "/usr/local/lib/python3.7/dist-packages/keras/__init__.py", line 3, in <module>
        from . import utils
      File "/usr/local/lib/python3.7/dist-packages/keras/utils/__init__.py", line 26, in <module>
        from .vis_utils import model_to_dot
      File "/usr/local/lib/python3.7/dist-packages/keras/utils/vis_utils.py", line 7, in <module>
        from ..models import Model
      File "/usr/local/lib/python3.7/dist-packages/keras/models.py", line 10, in <module>
        from .engine.input_layer import Input
      File "/usr/local/lib/python3.7/dist-packages/keras/engine/__init__.py", line 3, in <module>
        from .input_layer import Input
      File "/usr/local/lib/python3.7/dist-packages/keras/engine/input_layer.py", line 7, in <module>
        from .base_layer import Layer
      File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 12, in <module>
        from .. import initializers
      File "/usr/local/lib/python3.7/dist-packages/keras/initializers/__init__.py", line 124, in <module>
        populate_deserializable_objects()
      File "/usr/local/lib/python3.7/dist-packages/keras/initializers/__init__.py", line 49, in populate_deserializable_objects
        LOCAL.GENERATED_WITH_V2 = tf.__internal__.tf2.enabled()
      File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
        attr = getattr(self._tfmw_wrapped_module, name)
    AttributeError: module 'tensorflow._api.v1.compat.v2' has no attribute '__internal__'

    bug 
    opened by r-a-17 2
  • IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed

    When I use a wav file, I get this error:

    Traceback (most recent call last):
      File "D:/Projects/Audio/Emotion/SER/MevonAI-Speech-Emotion-Recognition/src/speakerDiarization.py", line 253, in <module>
        main("filter_"+FILE_N, embedding_per_second=0.6, overlap_rate=0.4)
      File "D:/Projects/Audio/Emotion/SER/MevonAI-Speech-Emotion-Recognition/src/speakerDiarization.py", line 170, in main
        feats = np.array(feats)[:,0,:].astype(float)  # [splits, embedding dim]
    IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed

    help wanted 
    opened by WestbrookZero 2
  • error for adding audio folder

    Hey there, I tried a lot to add an audio folder but it throws an error; even when I made one folder for one file and changed the code, it still shows this error:

    Traceback (most recent call last):
      File "speechEmotionRecognition.py", line 62, in <module>
        bk.diarizeFromFolder(f'{INPUT_FOLDER_PATH}{subdir}{"/"}',(f'{OUTPUT_FOLDER_PATH}{subdir}{"/"}'))
      File "/content/emotion/src/bulkDiarize.py", line 29, in diarizeFromFolder
        diarizeAudio(TOTAL_PATH,TOTAL_OUTPUT_PATH,expectedSpeakers=2)
      File "/content/emotion/src/speakerDiarization.py", line 242, in diarizeAudio
        main("filterTemp.wav", embedding_per_second=0.6, overlap_rate=0.4,exportFile=exportFile,expectedSpeakers=expectedSpeakers)
      File "/content/emotion/src/speakerDiarization.py", line 170, in main
        feats = np.array(feats)[:,0,:].astype(float)  # [splits, embedding dim]
    IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed

    Please could you explain?

    help wanted 
    opened by har790 1
  • Running instructions

    Hi @SuyashMore,

    Can you please tell me on which OS you ran this app? On Mac there is no Python 3.6.9 version to download. Could you please provide more detailed instructions to run it, because there are a lot of compatibility issues?

    Great job btw!

    help wanted 
    opened by IvanLjubicic 1