Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Overview

Speaker-Embeddings-Correlation-Pooling

This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations" by T. Stafylakis, J. Rohdin, and L. Burget (Interspeech 2021), a result of the collaboration between Omilia - Conversational Intelligence and Brno University of Technology (BUT), which you may find here.

The code is in TensorFlow1 (TF1) but it should work with TF2 too. I only provide the code for creating the network and the required hyperparameters. The training hyperparameters we used can be found in the paper.

The code is well-commented, at least the part and (hyper-)parameters required for the correlation pooling.

Apart from the experiments provided in the paper, the code allows the user to: (a) Combine standard statistics pooling with correlation pooling, by concatenating the two pooling layers into a single one, and (b) Extract correlation pooling from outputs of all 4 internal ResNet blocks (aka stages) and concatenate them in the pooling layer.

The code can be more efficiently written using tensor-only operators. However, to facilitate research we have implemented it using lists of tensors, e.g. after merging frequency bins to frequency ranges. Despite this inefficiency, we observe no differences between correlation pooling and standard stats pooling in training speed.

Start with the file train_resnet.py, which creates the ResNet (with the pooling mechanism) and sets its parameters. All parameters are set so that you reproduce our best performing experiment (P7 in the paper).

So, try it and let us know what you'll get! Themos

You might also like...
Ceaser-Cipher - The Caesar Cipher technique is one of the earliest and simplest method of encryption technique

Ceaser-Cipher The Caesar Cipher technique is one of the earliest and simplest me

LCG T-TEST USING EUCLIDEAN METHOD

This project has been created for statistical usage, purposing for determining ATL takers and nontakers using LCG ttest and Euclidean Method, especially for internal business case in Telkomsel.

SAINT PyTorch implementation
SAINT PyTorch implementation

SAINT-pytorch A Simple pyTorch implementation of "Towards an Appropriate Query, Key, and Value Computation for Knowledge Tracing" based on https://arx

An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

GPT-NeoX An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hun

Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Python implementation of TextRank for phrase extraction and summarization of text documents
Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Python implementation of TextRank for phrase extraction and summarization of text documents
Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

Comments
  • Question about stats pooling in 2D convnet paper

    Question about stats pooling in 2D convnet paper

    Hi there!

    Great paper and great repo. My question is rather related to your paper. In the paper you mention:

    The statistics pooling layer in speaker embeddings networks with 2D CNN architectures is a concatenation of the mean and std of each of the F × C frequency-channel pairs
    

    I am a bit confused on this end. In pytorch terms if my resent output is B x C x T x F, how exactly do I implement stats pooling?

    would it be:

    x = x.permute(0,2,3,1) #B x T x F x C
    x = x.reshape(B,T,F x C) # B x T x (FxC)
    

    followed by a stats pooling layer?

    Thank You for the help!

    opened by Sreyan88 2
Owner
Themos Stafylakis
Themos Stafylakis
Modified GPT using average pooling to reduce the softmax attention memory constraints.

NLP-GPT-Upsampling This repository contains an implementation of Open AI's GPT Model. In particular, this implementation takes inspiration from the Ny

WD 1 Dec 3, 2021
PyTorch original implementation of Cross-lingual Language Model Pretraining.

XLM NEW: Added XLM-R model. PyTorch original implementation of Cross-lingual Language Model Pretraining. Includes: Monolingual language model pretrain

Facebook Research 2.7k Dec 27, 2022
A method to generate speech across multiple speakers

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Facebook Archive 873 Dec 15, 2022
ThinkTwice: A Two-Stage Method for Long-Text Machine Reading Comprehension

ThinkTwice ThinkTwice is a retriever-reader architecture for solving long-text machine reading comprehension. It is based on the paper: ThinkTwice: A

Walle 4 Aug 6, 2021
A CRM department in a local bank works on classify their lost customers with their past datas. So they want predict with these method that average loss balance and passive duration for future.

Rule-Based-Classification-in-a-Banking-Case. A CRM department in a local bank works on classify their lost customers with their past datas. So they wa

ÖMER YILDIZ 4 Mar 20, 2022
A NLP program: tokenize method, PoS Tagging with deep learning

IRIS NLP SYSTEM A NLP program: tokenize method, PoS Tagging with deep learning Report Bug · Request Feature Table of Contents About The Project Built

Zakaria 7 Dec 13, 2022
A method for cleaning and classifying text using transformers.

NLP Translation and Classification The repository contains a method for classifying and cleaning text using NLP transformers. Overview The input data

Ray Chamidullin 0 Nov 15, 2022
Code for text augmentation method leveraging large-scale language models

HyperMix Code for our paper GPT3Mix and conducting classification experiments using GPT-3 prompt-based data augmentation. Getting Started Installing P

NAVER AI 47 Dec 20, 2022
Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Accelerated Sparse Neural Training: A Provable and Efficient Method to FindN:M Transposable Masks Recently, researchers proposed pruning deep neural n

itay hubara 4 Feb 23, 2022