Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

Overview
Comments
  • about ERB band filter bank

    opened by jzi040941 3
  • CUDA out of memory when using the network to train

    Hello,

    First of all, thank you for providing the implementation. It was very helpful for understanding the paper.

    I had one question, though. When I try to train the network on 30-second 48 kHz audio, I always run into a CUDA out-of-memory error, even with the batch size set to 1. Have you seen this in your experiments, or do you have any advice?

    Anything will be greatly appreciated!

    opened by nubma 3
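    One common workaround (a sketch under the assumption of fixed-length segment training, not the authors' official recipe) is to crop long clips into short random segments, so activation memory stays bounded regardless of clip length:

    ```python
    import numpy as np

    # Assumed setup: 48 kHz audio, 4 s training segments (tune to your GPU).
    SR = 48_000
    SEG_LEN = 4 * SR

    def random_crop(noisy, clean, seg_len, rng=np.random.default_rng(0)):
        """Cut the same random segment from a noisy/clean pair."""
        n = noisy.shape[-1]
        if n <= seg_len:
            return noisy, clean
        start = int(rng.integers(0, n - seg_len + 1))
        return noisy[..., start:start + seg_len], clean[..., start:start + seg_len]

    noisy = np.random.randn(30 * SR)  # stand-in for a 30 s clip
    clean = np.random.randn(30 * SR)
    noisy_seg, clean_seg = random_crop(noisy, clean, SEG_LEN)
    print(noisy_seg.shape)  # (192000,)
    ```

    Gradient checkpointing or mixed-precision training are other common levers, but cropping alone usually brings a single-clip batch within memory.
    
    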
  • erb.py reports an error: expected np.ndarray (got tuple)

    The report is as follows:

      File "E:/code_paper/MTFAA-Net-main/erb.py", line 24, in __init__
        filter = th.from_numpy(filter).float()
    TypeError: expected np.ndarray (got tuple)
    

    The error occurs on line 24 of erb.py, at filter = th.from_numpy(filter).float(): here filter is a tuple with two members.

    My Python version is 3.9.12. My spafe version is 0.2.0. My torch version is 1.12.1.

    I hope you can advise me.

    invalid 
    opened by FragrantRookie 2
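    A minimal sketch of the unpacking fix, assuming the newer spafe API returns a tuple with the filter-bank array as its first member, as the report suggests (the array shapes below are placeholders, not spafe's real output):

    ```python
    import numpy as np

    # Stand-in for the return value of spafe's ERB filter-bank function,
    # which in spafe 0.2.x reportedly returns a two-member tuple instead
    # of a bare ndarray.
    result = (np.zeros((20, 257)), np.linspace(0, 24_000, 20))

    if isinstance(result, tuple):  # newer spafe API
        fbank = result[0]
    else:                          # older API returned the array directly
        fbank = result

    # filter = th.from_numpy(fbank).float()  # now succeeds in erb.py
    print(fbank.shape)
    ```

    Pinning spafe to the version the repo was developed against is the other obvious fix if the return type keeps changing across releases.
    
    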
  • License

    Hi @echocatzh

    I think many people would like to use your awesome work for both commercial and non-commercial purposes. It would be great for them, and of course for you as well, if you could add an explicit license. Would you be able to add a license file?

    opened by jzi040941 1
  • Did you normalize signals when you calculate loss?

    Hi, thanks for your great work. I find that the loss hardly decreases when I train your MTFAA. I don't normalize the signals when I calculate the loss; maybe I should normalize them as in "Data augmentation and loss normalization for deep noise suppression". I'd like to know how you calculate the loss.

    opened by YangangCao 1
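    A minimal sketch of level normalization before the loss, in the spirit of the paper cited above; the target RMS value and the choice to scale both signals by the clean signal's gain are assumptions, not the repo's confirmed method:

    ```python
    import numpy as np

    def normalize_pair(clean, estimate, target_rms=0.1):
        """Scale a clean/estimate pair by the clean signal's RMS so the
        loss is independent of the absolute input level."""
        rms = np.sqrt(np.mean(clean ** 2)) + 1e-8
        gain = target_rms / rms
        return clean * gain, estimate * gain

    sr = 48_000
    clean = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # toy 440 Hz tone
    estimate = clean + 0.01 * np.random.randn(sr)               # toy enhanced output
    c, e = normalize_pair(clean, estimate)
    print(round(float(np.sqrt(np.mean(c ** 2))), 3))  # ~0.1
    ```

    Applying the same gain to both signals preserves the relative error while removing level dependence, which is what stabilizes the loss scale across utterances.
    
    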
  • question about the network

    Thanks for your code. One point still confuses me: the input to the U-Net structure is the magnitude after the phase encoder, but the U-Net outputs a two-stage mask, where one stage is a magnitude mask and the other is a phase mask plus a magnitude mask. Since no phase information is fed into the U-Net, how can it produce a correct phase mask? Or does the phase encoder's output, although nominally a magnitude, still carry phase information?

    opened by wendongj 1
Owner
Shimin Zhang
Speech Enhancement
SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning We propose a SASE mode

Tower 1 Nov 20, 2021
Sequence modeling benchmarks and temporal convolutional networks

Sequence Modeling Benchmarks and Temporal Convolutional Networks (TCN) This repository contains the experiments done in the work An Empirical Evaluati

CMU Locus Lab 3.5k Jan 3, 2023
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification of Chinese long and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supporting multi-class and multi-label classification of Chinese long and short text, and sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

null 186 Dec 24, 2022
multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search

hellonlp 30 Dec 12, 2022
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

Ryuichi Yamamoto 1.8k Dec 30, 2022
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

pyannote 2.2k Jan 9, 2023
Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech transformer Training: this model uses CTC loss and Cross Entropy loss for training. Download pretrained mo

Uyghur 11 Nov 17, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov 3.2k Dec 31, 2022
PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Chung-Ming Chien 1k Dec 30, 2022
Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Option 1: download this repository and extract it to the desired location. Option 2: if you are already famil

Habib Abdurrasyid 5 Dec 28, 2021
A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

Snm Logic 1 Dec 20, 2021
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
Lumped-element impedance calculator and frequency-domain plotter.

fastZ: Lumped-Element Impedance Calculator fastZ is a small tool for calculating and visualizing electrical impedance in Python. Features include: Sup

Wesley Hileman 47 Nov 18, 2022
Count the frequency of letters or words in a text file and show a graph.

Word Counter By EBUS Coding Club Count the frequency of letters or words in a text file and show a graph. Requirements Python 3.9 or higher matplotlib

EBUS Coding Club 0 Apr 9, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B

Maluuba Inc. 309 Oct 19, 2022
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT for each task independently.

VinAI Research 109 Dec 2, 2022