Multistream CNN for Robust Acoustic Modeling

ASAPP Research

Last update: Sep 21, 2022

Related tags

Deep Learning multistream-cnn

Overview

Multistream Convolutional Neural Network (CNN)

A multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recognition tasks. It processes input speech with diverse resolutions by applying different dilation rates to convolutional neural networks across multiple streams to achieve the robustness. The dilation rate of 3 are selected from the multiples of a sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and output embedding vectors from the streams are concatenated then projected to the final layer, as illustrated below:

References

Multistream CNN for Robust Acoustic Modeling [paper]

{
  @inproceedings{han2021multistream-cnn,
    title={Multistream CNN for Robust Acoustic Modeling},
    author={Kyu J. Han and Jing Pan and Venkata Krishna Naveen Tadala and Tao Ma and Dan Povey},
    booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year={2021}
}

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition [paper]

{
  @inproceedings{pan2020asapp-asr,
    title={ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition},
    author={Jing Pan and Joshua Shapiro and Jeremy Wohlwend and Kyu J. Han and Tao Lei and Tao Ma},
    booktitle={Interspeech},
    year={2020}
}

Installation

Please follow the original Kaldi build sequence, as below.

>> cd tools; make; cd ../src; ./configure; make clean; make -j clean depend; make -j all

Recipes and Results

LibriSpeech

>> egs/librispeech/s5/local/chain/run_multistream_cnn_1a.sh

	dev-clean	dev-other	test-clean	test-other
tdnn_1d	3.29	8.71	3.80	8.76
multistream_cnn_1a	3.20	7.68	3.54	7.87

Fisher-SWBD

>> egs/fisher_swbd/s5/local/chain/run_multistream_cnn_1a.sh

	eval2000	swbd	callhm
tdnn_7d	12.6	8.8	16.3
multistream_cnn_1a	12.6	9.2	15.7

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

CoTr: Efficient 3D Medical Image Segmentation by bridging CNN and Transformer This is the official pytorch implementation of the CoTr: Paper: CoTr: Ef

218 Dec 25, 2022

git《Beta R-CNN: Looking into Pedestrian Detection from Another Perspective》(NeurIPS 2020) GitHub:[fig3]

Beta R-CNN: Looking into Pedestrian Detection from Another Perspective This is the pytorch implementation of our paper "[Beta R-CNN: Looking into Pede

35 Sep 8, 2021

A spherical CNN for weather forecasting

DeepSphere-Weather - Deep Learning on the sphere for weather/climate applications. The code in this repository provides a scalable and flexible framew

47 Dec 25, 2022

BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

BossNAS This repository contains PyTorch evaluation code, retraining code and pretrained models of our paper: BossNAS: Exploring Hybrid CNN-transforme

127 Dec 26, 2022

《A-CNN: Annularly Convolutional Neural Networks on Point Clouds》(2019)

A-CNN: Annularly Convolutional Neural Networks on Point Clouds Created by Artem Komarichev, Zichun Zhong, Jing Hua from Department of Computer Science

44 Feb 24, 2022

Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

Drone Detection using Thermal Signature This repository highlights the work for night-time drone detection using a using an Optris PI Lightweight ther

6 Dec 31, 2022

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN If you use this code for your research, please cite ou

41 Dec 8, 2022

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

1.2k Jan 4, 2023

LiDAR R-CNN: An Efficient and Universal 3D Object Detector

LiDAR R-CNN: An Efficient and Universal 3D Object Detector Introduction This is the official code of LiDAR R-CNN: An Efficient and Universal 3D Object

295 Jan 5, 2023

Comments

Question about the tensor size between single-stream and multi-stream
Thanks for your gorgeous work again; it is a very impressive result.

When I read the script "run_multistream_cnn_1a.sh", I have a question about the size.

The lines 144-150 show the single-stream and the last one is:

conv-relu-batchnorm-layer name=cnn5 $cnn_opts height-in=10 height-out=10 time-offsets=-1,0,1 height-offsets=-1,0,1 num-filters-out=256

I imagine that the size of the output should be [length_of_seq, height, num_filters] (assume batch size = 1). A spectrum is like a image: length_of_image = based on real case, height = 10, num_filters=256.

Next step, the output is imported in multi-stream( lines 152~207), the first line of this part:

relu-batchnorm-dropout-layer name=tdnn6a $affine_opts input=cnn5 dim=512

It looks like the affine transformation occurs here, and [length_of_seq, 10, 256] is affined to [length_of_seq, 10, 512]. The remaining part would always follow the dim=512.

Am I right? Thanks so much.
opened by kaiyikang 0

Multistream CNN for Robust Acoustic Modeling

Related tags

Overview

Multistream Convolutional Neural Network (CNN)

References

Installation

Recipes and Results

You might also like...

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

git《Beta R-CNN: Looking into Pedestrian Detection from Another Perspective》(NeurIPS 2020) GitHub:[fig3]

A spherical CNN for weather forecasting

BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

《A-CNN: Annularly Convolutional Neural Networks on Point Clouds》(2019)

Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

LiDAR R-CNN: An Efficient and Universal 3D Object Detector

Comments

Question about the tensor size between single-stream and multi-stream

Owner

ASAPP Research

NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

This Repostory contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.

Code release for paper: The Boombox: Visual Reconstruction from Acoustic Vibrations

Pytorch Implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension)

ICML 21 - Voice2Series: Reprogramming Acoustic Models for Time Series Classification

This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

This repository contains the implementation of the paper: "Towards Frequency-Based Explanation for Robust CNN"

Adversarial-Information-Bottleneck - Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck (NeurIPS21)

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.