Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Themos Stafylakis

Last update: Apr 30, 2022


Overview

Speaker-Embeddings-Correlation-Pooling

This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations" by T. Stafylakis, J. Rohdin, and L. Burget (Interspeech 2021). The paper is the result of a collaboration between Omilia - Conversational Intelligence and Brno University of Technology (BUT).

The code is written for TensorFlow 1 (TF1), but it should work with TF2 too. I provide only the code that creates the network, together with the required hyperparameters. The training hyperparameters we used can be found in the paper.

The code is well commented, at least the parts and (hyper-)parameters required for correlation pooling.

Beyond the experiments reported in the paper, the code allows the user to: (a) combine standard statistics pooling with correlation pooling by concatenating the two pooling layers into a single one, and (b) extract correlation pooling from the outputs of all 4 internal ResNet blocks (aka stages) and concatenate them in the pooling layer.
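To make options (a) and (b) concrete, here is a minimal NumPy sketch. It is not the repository's TensorFlow code. It assumes a single utterance with feature maps of shape (T, F, C), treats the whole frequency axis as one range, and takes "correlation pooling" to mean the upper triangle of the channel-by-channel correlation matrix. The function names and the stage shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

def stats_pooling(x, eps=1e-5):
    """Mean and std over time of every frequency-channel pair: (T, F, C) -> (2*F*C,)."""
    T, F, C = x.shape
    flat = x.reshape(T, F * C)
    return np.concatenate([flat.mean(axis=0), np.sqrt(flat.var(axis=0) + eps)])

def correlation_pooling(x, eps=1e-5):
    """Channel-wise correlations: (T, F, C) -> (C*(C-1)/2,).

    Assumption: every (time, frequency) position is one sample of a C-dim vector.
    """
    T, F, C = x.shape
    flat = x.reshape(T * F, C)
    # Mean/variance-normalize each channel, then average the outer products.
    z = (flat - flat.mean(axis=0)) / np.sqrt(flat.var(axis=0) + eps)
    corr = z.T @ z / (T * F)
    # Keep only the strictly upper triangle (the matrix is symmetric, diagonal ~1).
    return corr[np.triu_indices(C, k=1)]

rng = np.random.default_rng(0)
# Hypothetical outputs of the 4 ResNet stages for one utterance (shapes are made up).
stage_outputs = [rng.standard_normal((50, f, c))
                 for f, c in [(16, 8), (8, 16), (4, 32), (2, 64)]]
last = stage_outputs[-1]

# (a) statistics pooling and correlation pooling concatenated into one layer
combined = np.concatenate([stats_pooling(last), correlation_pooling(last)])

# (b) correlation pooling from all 4 stages, concatenated
multi_stage = np.concatenate([correlation_pooling(s) for s in stage_outputs])
```

Note how the correlation part grows quadratically with the number of channels, so the pooled vector is dominated by the deepest stage in this toy setup.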

The code could be written more efficiently using tensor-only operators. However, to facilitate research we have implemented it using lists of tensors, e.g. after merging frequency bins into frequency ranges. Despite this inefficiency, we observe no difference in training speed between correlation pooling and standard stats pooling.
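The trade-off between a list-of-tensors and a tensor-only formulation can be illustrated with a small NumPy sketch. It is an illustration under assumptions, not the repository's code: frequency bins are split into equally sized contiguous ranges, and all (time, bin) positions inside a range are treated as samples when computing channel correlations. All function names are hypothetical.

```python
import numpy as np

def corr_upper(block, eps=1e-5):
    """block: (N, C) samples -> upper-triangular channel correlations."""
    z = (block - block.mean(axis=0)) / np.sqrt(block.var(axis=0) + eps)
    corr = z.T @ z / block.shape[0]
    return corr[np.triu_indices(block.shape[1], k=1)]

def pool_with_lists(x, n_ranges):
    """List-based version: easy to inspect or modify one frequency range at a time."""
    T, F, C = x.shape
    ranges = np.split(x, n_ranges, axis=1)  # list of (T, F // n_ranges, C) arrays
    return np.concatenate([corr_upper(r.reshape(-1, C)) for r in ranges])

def pool_tensor_only(x, n_ranges, eps=1e-5):
    """Same computation with a single batched operation over all ranges."""
    T, F, C = x.shape
    w = F // n_ranges  # bins per range (F must be divisible by n_ranges)
    g = x.reshape(T, n_ranges, w, C).transpose(1, 0, 2, 3).reshape(n_ranges, T * w, C)
    z = (g - g.mean(axis=1, keepdims=True)) / np.sqrt(g.var(axis=1, keepdims=True) + eps)
    corr = np.einsum("rnc,rnd->rcd", z, z) / (T * w)  # (n_ranges, C, C)
    iu = np.triu_indices(C, k=1)
    return corr[:, iu[0], iu[1]].reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 8, 6))  # (T, F, C), made-up sizes
list_based = pool_with_lists(x, n_ranges=4)
tensor_based = pool_tensor_only(x, n_ranges=4)
```

Both functions return the same vector. The list-based form keeps each frequency range as a separate object, which is convenient when experimenting with per-range variations.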

Start with the file train_resnet.py, which creates the ResNet (with the pooling mechanism) and sets its parameters. All parameters are set so that you reproduce our best-performing experiment (P7 in the paper).

So, try it and let us know what you get!

Themos


Comments

Question about stats pooling in 2D convnet paper
Hi there!

Great paper and great repo. My question is rather related to your paper. In the paper you mention:

The statistics pooling layer in speaker embeddings networks with 2D CNN architectures is a concatenation of the mean and std of each of the F × C frequency-channel pairs

I am a bit confused on this end. In PyTorch terms, if my ResNet output is B x C x T x F, how exactly do I implement stats pooling?

Would it be:

    x = x.permute(0, 2, 3, 1)    # B x T x F x C
    x = x.reshape(B, T, F * C)   # B x T x (F*C)

followed by a stats pooling layer?

Thank you for the help!
opened by Sreyan88
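
The reshaping proposed in the question appears consistent with the quoted sentence: once the frequency and channel axes are merged, the mean and std are taken over time for each of the F × C pairs. Below is a minimal NumPy sketch of that reading, assuming a (B, C, T, F) layout. The function name is hypothetical, and this is not taken from the repository. In PyTorch the same steps would use x.permute(0, 2, 3, 1), x.reshape(B, T, F * C), and then mean and std over dim=1.

```python
import numpy as np

def stats_pooling_2d(x):
    """Statistics pooling for a 2D-CNN output of shape (B, C, T, F).

    Returns an array of shape (B, 2 * F * C): the mean over time followed by
    the std over time of every frequency-channel pair.
    """
    B, C, T, F = x.shape
    x = x.transpose(0, 2, 3, 1)   # (B, T, F, C)
    x = x.reshape(B, T, F * C)    # (B, T, F*C); pair (f, c) sits at index f*C + c
    mean = x.mean(axis=1)         # (B, F*C)
    std = x.std(axis=1)           # (B, F*C), population std (divides by T)
    return np.concatenate([mean, std], axis=1)

rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 4, 30, 5))  # B=2, C=4, T=30, F=5 (made-up sizes)
pooled = stats_pooling_2d(feats)
```

One detail to keep in mind when porting: NumPy's std divides by T by default, whereas torch.std uses the unbiased estimate (divides by T - 1) unless told otherwise.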
