DB-AIAT: A Dual-branch Attention-in-Attention Transformer for single-channel SE

This is the repo of the manuscript "Dual-branch Attention-In-Attention Transformer for Speech Enhancement", accepted by ICASSP 2022 (paper: https://arxiv.org/abs/2110.06467).
Abstract: Curriculum learning begins to thrive in the speech enhancement area, decoupling the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by that, we propose a dual-branch attention-in-attention transformer-based module dubbed DB-AIAT to handle both coarse- and fine-grained regions of the spectrum in parallel. From a complementary perspective, a magnitude masking branch is proposed to estimate the overall spectral magnitude, while a complex refining branch is designed to compensate for the missing complex spectral details and implicitly derive phase information. Within each branch, we propose a novel attention-in-attention transformer-based module to replace conventional RNNs and temporal convolutional networks for temporal sequence modeling. Specifically, the proposed attention-in-attention transformer consists of adaptive temporal-frequency attention transformer blocks and an adaptive hierarchical attention module, which can capture long-term time-frequency dependencies and further aggregate global hierarchical contextual information. Experimental results on the VoiceBank + Demand dataset show that DB-AIAT yields state-of-the-art performance (e.g., 3.31 PESQ, 95.6% STOI and 10.79 dB SSNR) over previous advanced systems with a relatively light model size (2.81M).
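To make the dual-branch idea concrete, here is a minimal PyTorch sketch (not the repo's code; function and tensor names are illustrative) of how the two branches' outputs can be combined: the estimated mask scales the noisy magnitude, the coarse complex spectrum reuses the noisy phase, and the refining branch adds a residual in the real/imaginary (RI) domain.

import torch

def combine_branches(noisy_ri, mag_mask, ri_refinement):
    # noisy_ri: (B, 2, T, F) noisy spectrum as stacked real/imag parts
    # mag_mask: (B, T, F) magnitude mask from the masking branch
    # ri_refinement: (B, 2, T, F) residual from the complex refining branch
    noisy_mag = torch.norm(noisy_ri, dim=1)                    # (B, T, F)
    noisy_phase = torch.atan2(noisy_ri[:, 1], noisy_ri[:, 0])  # noisy phase
    coarse_mag = mag_mask * noisy_mag                          # coarse magnitude
    # Coarse complex spectrum: masked magnitude coupled with the noisy phase
    coarse_ri = torch.stack((coarse_mag * torch.cos(noisy_phase),
                             coarse_mag * torch.sin(noisy_phase)), dim=1)
    # The refining branch compensates the missing complex details
    # (and implicitly corrects the phase), as described in the abstract.
    return coarse_ri + ri_refinement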
Code:
You can use dual_aia_trans_merge_crm() in aia_trans.py for dual-branch SE, while aia_complex_trans_mag() and aia_complex_trans_ri() are single-branch approaches. The trained weights on the VoiceBank (VB) dataset are also provided; you can directly perform inference or finetune the model using vb_aia_merge_new.pth.tar.
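A minimal usage sketch (assumptions: the checkpoint stores the weights under a 'model_state_dict' key and lives in BEST_MODEL; inspect the .tar file or solver_merge.py if your copy differs):

import torch
from aia_trans import dual_aia_trans_merge_crm

# Build the dual-branch model and load the pretrained VoiceBank weights.
model = dual_aia_trans_merge_crm()
checkpoint = torch.load('BEST_MODEL/vb_aia_merge_new.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])  # key name is an assumption
model.eval()  # switch to eval mode for inference; omit when finetuning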
Requirements:
CUDA 10.1
torch == 1.8.0
pesq == 0.0.1
librosa == 0.7.2
SoundFile == 0.10.3
How to train
Step 1
Prepare your data: run json_extract.py to generate JSON files, which record the utterance file names for both the training and validation sets (a sketch of the idea follows the command).
# Run json_extract.py
python json_extract.py
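For orientation, a sketch of what such a script might do (the actual schema written by json_extract.py may differ; directory paths are placeholders):

import json
import os

def extract_file_names(wav_dir, json_path):
    # Record the utterance file names found in wav_dir into a JSON file.
    names = sorted(f for f in os.listdir(wav_dir) if f.endswith('.wav'))
    with open(json_path, 'w') as f:
        json.dump(names, f, indent=2)

# Placeholder paths -- point these at your training and validation sets.
extract_file_names('./train/noisy', './json/train_noisy.json')
extract_file_names('./cv/noisy', './json/cv_noisy.json')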
Step 2
Change the parameter settings according to your directories (within config_merge.py).
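The variable names below are illustrative only (the actual names live in config_merge.py); the point is that the dataset directories and the JSON directory from Step 1 must agree:

# Hypothetical settings -- mirror whatever config_merge.py actually defines.
train_noisy_dir = '/path/to/voicebank/train/noisy'
train_clean_dir = '/path/to/voicebank/train/clean'
cv_noisy_dir = '/path/to/voicebank/cv/noisy'
cv_clean_dir = '/path/to/voicebank/cv/clean'
json_dir = './json'  # where json_extract.py wrote its file lists in Step 1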
Step 3
Network training (you can also use the aia_complex_trans_mag() and aia_complex_trans_ri() networks in aia_trans.py for single-branch SE; see the sketch after the command).
# Run main_merge.py to begin network training
# solver_merge.py and train_merge.py contain the detailed training process
python main_merge.py
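Switching to a single-branch model is a one-line change wherever the network is constructed; the three constructors below are the ones exported by aia_trans.py:

from aia_trans import (
    dual_aia_trans_merge_crm,  # dual-branch (default)
    aia_complex_trans_mag,     # single-branch, magnitude domain
    aia_complex_trans_ri,      # single-branch, real/imaginary domain
)

# Pick one; the surrounding pipeline (solver_merge.py / train_merge.py)
# stays the same.
model = aia_complex_trans_ri()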
Inference:
The trained weights vb_aia_merge_new.pth.tar on the VB dataset are provided in BEST_MODEL.
# Run enhance.py to enhance the noisy speech samples
python enhance.py
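enhance.py handles this end to end; for intuition, here is a sketch of the usual enhancement pipeline (STFT settings, tensor layout, and the checkpoint key are assumptions, not the repo's exact values; consult enhance.py and config_merge.py):

import soundfile as sf
import torch
from aia_trans import dual_aia_trans_merge_crm

model = dual_aia_trans_merge_crm()
ckpt = torch.load('BEST_MODEL/vb_aia_merge_new.pth.tar', map_location='cpu')
model.load_state_dict(ckpt['model_state_dict'])  # key name is an assumption
model.eval()

noisy, sr = sf.read('noisy.wav')  # placeholder input file
noisy = torch.from_numpy(noisy).float().unsqueeze(0)  # (1, samples)

n_fft, hop = 320, 160  # assumed settings; use the values in config_merge.py
win = torch.hann_window(n_fft)
spec = torch.stft(noisy, n_fft, hop_length=hop, window=win,
                  return_complex=False)  # (1, freq, frames, 2)
spec = spec.permute(0, 3, 2, 1)          # to (1, 2, frames, freq)

with torch.no_grad():
    est = model(spec)  # exact input/output layout: see enhance.py

est = est.permute(0, 3, 2, 1)                    # back to (1, freq, frames, 2)
est_c = torch.complex(est[..., 0], est[..., 1])  # complex (1, freq, frames)
wav = torch.istft(est_c, n_fft, hop_length=hop, window=win)
sf.write('enhanced.wav', wav.squeeze(0).numpy(), sr)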
Comparison with SOTA:
On VoiceBank + Demand, DB-AIAT reaches 3.31 PESQ, 95.6% STOI and 10.79 dB SSNR with 2.81M parameters; see the paper for the full comparison table.
Citation
If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.
@article{yu2021dual,
  title={Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement},
  author={Yu, Guochen and Li, Andong and Wang, Yutian and Guo, Yinuo and Wang, Hui and Zheng, Chengshi},
  journal={arXiv preprint arXiv:2110.06467},
  year={2021}
}