Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

Overview
Comments
  • about ERB band filter bank

    opened by jzi040941 3
  • CUDA out of memory when using the network to train

    Hello,

    First of all, thank you for providing the implementation. It was very helpful for understanding the paper.

    I had one question, though. When I try to train the network on 30-second 48 kHz audio, I always run into a CUDA out-of-memory error, even with the batch size set to 1. Have you seen this in your experiments, or do you have any advice?

    Anything will be greatly appreciated!

    opened by nubma 3
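    One common workaround (a sketch under the assumption of fixed-length segment training, not the authors' official recipe) is to crop long clips into short random segments, so activation memory stays bounded regardless of clip length:

    ```python
    import numpy as np

    # Assumed setup: 48 kHz audio, 4 s training segments (tune to your GPU).
    SR = 48_000
    SEG_LEN = 4 * SR

    def random_crop(noisy, clean, seg_len, rng=np.random.default_rng(0)):
        """Cut the same random segment from a noisy/clean pair."""
        n = noisy.shape[-1]
        if n <= seg_len:
            return noisy, clean
        start = int(rng.integers(0, n - seg_len + 1))
        return noisy[..., start:start + seg_len], clean[..., start:start + seg_len]

    noisy = np.random.randn(30 * SR)  # stand-in for a 30 s clip
    clean = np.random.randn(30 * SR)
    noisy_seg, clean_seg = random_crop(noisy, clean, SEG_LEN)
    print(noisy_seg.shape)  # (192000,)
    ```

    Gradient checkpointing or mixed-precision training are other common levers, but cropping alone usually brings a single-clip batch within memory.
    
    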
  • erb.py reports an error: expected np.ndarray (got tuple)

    The report is as follows:

      File "E:/code_paper/MTFAA-Net-main/erb.py", line 24, in __init__
        filter = th.from_numpy(filter).float()
    TypeError: expected np.ndarray (got tuple)
    

    The error occurs on line 24 of erb.py, at filter = th.from_numpy(filter).float(): here filter is a tuple with two members.

    My Python version is 3.9.12. My spafe version is 0.2.0. My torch version is 1.12.1.

    I hope you can advise me.

    invalid 
    opened by FragrantRookie 2
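    A minimal sketch of the unpacking fix, assuming the newer spafe API returns a tuple with the filter-bank array as its first member, as the report suggests (the array shapes below are placeholders, not spafe's real output):

    ```python
    import numpy as np

    # Stand-in for the return value of spafe's ERB filter-bank function,
    # which in spafe 0.2.x reportedly returns a two-member tuple instead
    # of a bare ndarray.
    result = (np.zeros((20, 257)), np.linspace(0, 24_000, 20))

    if isinstance(result, tuple):  # newer spafe API
        fbank = result[0]
    else:                          # older API returned the array directly
        fbank = result

    # filter = th.from_numpy(fbank).float()  # now succeeds in erb.py
    print(fbank.shape)
    ```

    Pinning spafe to the version the repo was developed against is the other obvious fix if the return type keeps changing across releases.
    
    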
  • License

    Hi @echocatzh

    I think many people would like to use your awesome work for both commercial and non-commercial purposes. It would be great for them, and of course for you as well, if you could add an explicit license. Would you be able to add a license file?

    opened by jzi040941 1
  • Did you normalize signals when you calculate loss?

    Hi, thanks for your great work. I find that the loss hardly decreases when I train your MTFAA. I don't normalize the signals when I calculate the loss; maybe I should normalize them as in "Data augmentation and loss normalization for deep noise suppression". I'd like to know how you calculate the loss.

    opened by YangangCao 1
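    A minimal sketch of level normalization before the loss, in the spirit of the paper cited above; the target RMS value and the choice to scale both signals by the clean signal's gain are assumptions, not the repo's confirmed method:

    ```python
    import numpy as np

    def normalize_pair(clean, estimate, target_rms=0.1):
        """Scale a clean/estimate pair by the clean signal's RMS so the
        loss is independent of the absolute input level."""
        rms = np.sqrt(np.mean(clean ** 2)) + 1e-8
        gain = target_rms / rms
        return clean * gain, estimate * gain

    sr = 48_000
    clean = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # toy 440 Hz tone
    estimate = clean + 0.01 * np.random.randn(sr)               # toy enhanced output
    c, e = normalize_pair(clean, estimate)
    print(round(float(np.sqrt(np.mean(c ** 2))), 3))  # ~0.1
    ```

    Applying the same gain to both signals preserves the relative error while removing level dependence, which is what stabilizes the loss scale across utterances.
    
    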
  • question about the network

    Thanks for your code. One point still confuses me: the input to the U-Net structure is the magnitude after the phase encoder, but the U-Net outputs a two-stage mask, where one stage is a magnitude mask and the other is a phase mask plus a magnitude mask. Since no phase information is fed into the U-Net, how can it produce a correct phase mask? Or does the phase encoder's output, although nominally a magnitude, still carry phase information?

    opened by wendongj 1
Owner
Shimin Zhang
Speech Enhancement
SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning We propose a SASE mode

Tower 1 Nov 20, 2021
Sequence modeling benchmarks and temporal convolutional networks

Sequence Modeling Benchmarks and Temporal Convolutional Networks (TCN) This repository contains the experiments done in the work An Empirical Evaluati

CMU Locus Lab 3.5k Jan 3, 2023
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification of Chinese long and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supporting multi-class and multi-label classification of Chinese long and short text, and sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

null 186 Dec 24, 2022
multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search

hellonlp 30 Dec 12, 2022
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

Ryuichi Yamamoto 1.8k Dec 30, 2022
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

pyannote 2.2k Jan 9, 2023
Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech transformer Training: this model uses CTC loss and Cross Entropy loss for training. Download pretrained mo

Uyghur 11 Nov 17, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov 3.2k Dec 31, 2022
PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Chung-Ming Chien 1k Dec 30, 2022
Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Option 1: download this repository and extract it to the desired location. Option 2: if you are already famil

Habib Abdurrasyid 5 Dec 28, 2021
A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

Snm Logic 1 Dec 20, 2021
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
Lumped-element impedance calculator and frequency-domain plotter.

fastZ: Lumped-Element Impedance Calculator fastZ is a small tool for calculating and visualizing electrical impedance in Python. Features include: Sup

Wesley Hileman 47 Nov 18, 2022
Count the frequency of letters or words in a text file and show a graph.

Word Counter By EBUS Coding Club Count the frequency of letters or words in a text file and show a graph. Requirements Python 3.9 or higher matplotlib

EBUS Coding Club 0 Apr 9, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B

Maluuba Inc. 309 Oct 19, 2022
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT for each task independently.

VinAI Research 109 Dec 2, 2022