TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

凌逆战

Last update: Dec 30, 2022

You might also like...

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network This repository is the official implementation of Speech Separati

116 Nov 9, 2022

An implementation of paper `Real-time Convolutional Neural Networks for Emotion and Gender Classification` with PaddlePaddle.

Introduction: This project uses the PaddlePaddle framework to reproduce the two models proposed in the paper Real-time Convolutional Neural Networks for Emotion and Gender Classification, namely SimpleCNN and MiniXception. Using imdb_crop

8 Mar 11, 2022

Python codes for Lite Audio-Visual Speech Enhancement.

Lite Audio-Visual Speech Enhancement (Interspeech 2020) Introduction This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE

85 Dec 1, 2022

Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

This repository is used to suspend the results of our paper "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement"

19 Sep 30, 2022

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

DeepFilterNet A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering. libDF contains Rust code used for dat

292 Dec 25, 2022

Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks

Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks Abstract Facial expression recognition in video

103 Dec 29, 2022

Spectral Temporal Graph Neural Network (StemGNN in short) for Multivariate Time-series Forecasting

Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting This repository is the official implementation of Spectral Temporal Gr

306 Dec 29, 2022

Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

1 Feb 15, 2022

Unofficial implementation of "TTNet: Real-time temporal and spatial video analysis of table tennis" (CVPR 2020)

TTNet-Pytorch The implementation for the paper "TTNet: Real-time temporal and spatial video analysis of table tennis" An introduction of the project c

438 Dec 29, 2022

Comments

Overfitting Problem

Hello. Thanks for this model architecture code. I am a beginner in this area and not very familiar with standard deep learning procedures.

I converted this code, https://github.com/haoxiangsnr/IRM-based-Speech-Enhancement-using-LSTM, to work in the time domain and then used your model architecture code with it. I am running it on Google Colab.

I used the TIMIT dataset, which has 8732 utterances, randomly mixed with UrbanSound8K noises at -5 dB, -4 dB, -3 dB, -2 dB, -1 dB, 0 dB and 1 dB, so I have 8732 noisy utterances. I then convert each one into overlapping frames, and the output is produced from those frames.
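For readers following along, the mixing and framing steps described above can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the poster's actual code: the function names `mix_at_snr` and `frame_signal`, the frame length of 512 and the hop of 256 are all illustrative choices that do not come from the post.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db` (dB), then add it."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Loop the noise if it is shorter than the speech, then trim to the same length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    # Required noise power is clean_power / 10^(snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def frame_signal(signal, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    signal = np.asarray(signal)
    # Zero-pad signals that are shorter than a single frame.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]
```

With `hop` smaller than `frame_len`, consecutive frames share samples; trailing samples that do not fill a whole frame are dropped in this sketch.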

But I am not sure whether it is overfitting. After 600 epochs of training, the average PESQ score on validation is 2.13. The average PESQ of the noisy speech (UrbanSound8K noise mixed with TIMIT clean speech) measured against the clean speech is around 1.8. (In each epoch 900 noisy utterances are used for training; for the next epoch the utterances are shuffled and training continues on another 900 utterances.)

But for testing I use the NOIZEUS database, which the TCNN network has not seen. After loading the checkpoint I get a very low PESQ score of 1.42. Also, when I run the same inference script on the same model checkpoint, I get different PESQ scores each time, such as 1.3, 1.4 or 1.5.
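One thing that may be worth ruling out when scores vary between runs on the same checkpoint is a random step in the evaluation itself. The post does not say whether the inference script has one, so the sketch below is only an assumption: it supposes the script samples a subset of test files, and shows how a private, seeded generator makes that choice repeatable. The function name `pick_eval_subset`, the seed value and the file names are hypothetical.

```python
import random


def pick_eval_subset(file_names, k, seed=1234):
    """Return the same k files on every run by using a private, seeded RNG."""
    rng = random.Random(seed)  # does not touch the global random state
    # Sort first so the result does not depend on directory listing order.
    return rng.sample(sorted(file_names), k)


# Hypothetical file names, for illustration only.
files = [f"utt_{i:02d}.wav" for i in range(30)]
subset_a = pick_eval_subset(files, 5)
subset_b = pick_eval_subset(list(reversed(files)), 5)
print(subset_a == subset_b)  # the two runs select identical files
```

If the evaluation set is already fixed and the scores still change, the randomness is coming from somewhere else; in PyTorch, for example, layers such as dropout stay active unless the model is switched to evaluation mode with `model.eval()`.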

Any suggestions as to why this is the case? It would be very helpful.

Thanks.

opened by HardeyPandya 7

Overview

TCNN

Network architecture diagram

Input and output of each layer

Owner

凌逆战

Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network."

PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder

Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis

CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhancement

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

Implementation for paper "STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement" (ICCV 2021).

This is a model made out of Neural Network specifically a Convolutional Neural Network model

CVPR2021: Temporal Context Aggregation Network for Temporal Action Proposal Refinement