Discriminative Condition-Aware PLDA

Overview

DCA-PLDA

This repository implements the Discriminative Condition-Aware Backend described in the paper:

L. Ferrer, M. McLaren, and N. Brümmer, "A Speaker Verification Backend with Robust Performance across Conditions", in Computer Speech and Language, volume 71, 2021

This backend has the same functional form as the usual probabilistic discriminant analysis (PLDA) backend which is commonly used for speaker verification, including the preprocessing stages. It also integrates the calibration stage as part of the backend, where the calibration parameters depend on an estimated condition for the signal. The condition is internally represented by a very low dimensional vector. See the paper for more details on the mathematical formulation of the backend.

We have found this system to provide great out-of-the-box performance across a very wide range of conditions, when training the backend with a variety of data including Voxceleb, SRE (from the NIST speaker recognition evaluations), Switchboard, Mixer 6, RATS and FVC Australian datasets, as described in the above paper.

The code can also be used to train and evaluate a standard PLDA pipeline. Basically, the initial model before any training epochs is identical to a PLDA system, with an option for weighting the samples during training to compensate for imbalance across training domains.

Further, the current version of the code can also be used to do language detection. In this case, we have not yet explored the use of condition-awereness, but rather focused on a novel hierachical approach, which is described in the following paper:

L. Ferrer, D. Castan, M. McLaren, and A. Lawson, "A Hierarchical Model for Spoken Language Recognition", arXiv:2201.01364, 2021

Example scripts and configuration files to do both speaker verification and language detection are provided in the examples directory.

This code was written by Luciana Ferrer. We thank Niko Brummer for his help with the calibration code in the calibration.py file and for providing the code to do heavy-tail PLDA. The pre-computed embeddings provided to run the example were computed using SRI's software and infrastructure.

We will appreciate any feedback about the code or the approaches. Also, please let us know if you find bugs.

How to install

  1. Clone this repository:

    git clone https://github.com/luferrer/DCA-PLDA.git

  2. Install the requirements:

    pip install -r requirements.txt

  3. If you want to run the example code, download the pre-computed embeddings for the task you want to run from:

    https://sftp.speech.sri.com/forms/DCA-DPLDA

    Untar the file and move (or link) the resulting data/ dir inside the example dir for the task you want to run.

  4. You can then run the run_all script which runs several experiments using different configuration files and training sets. You can edit it to just try a single configuration, if you want. Please, see the top of that script for an explanation on what is run and where the output results end up. The run_all scripts will take a few hours to run (on a GPU) if all configurations are run. A RESULTS file is also provided for comparison. The run_all script should generate similar numbers to those in that file if all goes well.

About the examples

The example dir contains two example recipes, one for speaker verification and one for language detection.

Speaker Verification

The example provided with the repository includes the Voxceleb and FVC Australian subsets of the training data used in the paper, since the other datasets are not freely available. As such, the resulting system will only work well on conditions similar to those present in that data. For this reason, we test the resulting model on SITW and Voxceleb2 test dataset, which are very similar in nature to the Voxceleb data used for training. We also test on a set of FVC speakers which are held-out from training.

Language Detection

The example uses the Voxlingua107 dataset which contains a large number of languages.

How to change the examples to use your own data and embeddings

The example scripts run using embeddings for each task extracted at SRI International using standard x-vector architectures. See the papers cited above for a description of the characteristics of the corresponding embedding extractors. Unfortunately, we are unable to release the embedding extractors, but you should be able to replace these embeddings with any type of speaker or language embeddings (eg, those that can be extracted with Kaldi).

The audio files corresponding to the databases used in the speaker verification example above can be obtained for free:

For the language detection example, the Voxlingua107 audio samples can be obtained from http://bark.phon.ioc.ee/voxlingua107/.

Once you have extracted embeddings for all that data using your own procedure, you can set up all the lists and embeddings in the same way and with the same format (hdf5 or npz in the case of embeddings) as in the example data dir for your task of interest and use the run_all script.

Note on scoring multi-sample enrollment models

For now, for speaker verification, the DCA-PLDA model only knows how to calibrate trials that are given by a comparison of two individual speech waveforms since that is the way we create trials during training. The code in this repo can still score trials with multi-file enrollment models, but it does it in a hacky way. Basically, it scores each enrollment sample against the test sample for the trial and then averages the scores. This works reasonably well but it is not ideal. A generalization to scoring multi-sample enrollment trials within the model is left as future work.

You might also like...
Official Repo for Ground-aware Monocular 3D Object Detection for Autonomous Driving

Visual 3D Detection Package: This repo aims to provide flexible and reproducible visual 3D detection on KITTI dataset. We expect scripts starting from

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

A PyTorch implementation of Sharpness-Aware Minimization for Efficiently Improving Generalization

sam.pytorch A PyTorch implementation of Sharpness-Aware Minimization for Efficiently Improving Generalization ( Foret+2020) Paper, Official implementa

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Master Docs License Apache MXNet (incubating) is a deep learning framework designed for both efficiency an

Official implementation of our paper
Official implementation of our paper "LLA: Loss-aware Label Assignment for Dense Pedestrian Detection" in Pytorch.

LLA: Loss-aware Label Assignment for Dense Pedestrian Detection This project provides an implementation for "LLA: Loss-aware Label Assignment for Dens

An official implementation of
An official implementation of "SFNet: Learning Object-aware Semantic Correspondence" (CVPR 2019, TPAMI 2020) in PyTorch.

PyTorch implementation of SFNet This is the implementation of the paper "SFNet: Learning Object-aware Semantic Correspondence". For more information,

[ICLR 2021] HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark
[ICLR 2021] HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark Accepted as a spotlight paper at ICLR 2021. Table of content File structure Prerequi

[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Comments
  • Error in eigh of decomp.py

    Error in eigh of decomp.py

    Hi, Thank you for your paper and repos. I think this is an astonish innovation for speech verification. I am an undergraduate student trying to train the model with our mother langue dataset. But we have an error and cannot figure it out why. Could you guys have look at the errors and have some suggestion how to fix it for us?

    Traceback (most recent call last): File "expe.py", line 196, in trn_loss = model.init_params_with_data(trn_dataset, config_trn, device=device, subset=init_subset) File "/home/thule/Project/voice-verification/DCA-PLDA-master/examples/modules.py", line 183, in init_params_with_data self.lda_stage.init_with_lda(x, speaker_ids, init_params, sec_ids=domain_ids) File "/home/thule/Project/voice-verification/DCA-PLDA-master/examples/modules.py", line 598, in init_with_lda evals, evecs = linalg.eigh(BCov, WCov) File "/home/thule/anaconda3/envs/Hope/lib/python3.8/site-packages/scipy/linalg/decomp.py", line 581, in eigh raise LinAlgError('The leading minor of order {} of B is not ' numpy.linalg.LinAlgError: The leading minor of order 20 of B is not positive definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

    Our info in eigh function was 532 when it's supposed to be 0. For x vector, we tried both MFCC and PNCC, but we had the same errors. Our xvector was ways smaller than yours though (about a hundredth of yours).

    We are really lost. We really appreciate your helps. Thank you for your time.

    opened by Jul1999 9
  • downloading the sample data

    downloading the sample data

    I am having some issues downloading the sample data you provided. It seems the download always gets stuck after 460 MB for me. Not sure what's the issue, I have a stable internet connection.

    opened by zabir-nabil 0
  • Bug in ROCCH.Bayes_error_rate

    Bug in ROCCH.Bayes_error_rate

    Try this:

    scores = np.array([1.0,2.0,3.0])
    labels = np.array([0,0,1])
    rocch = ROCCH(PAV(scores,labels))
    ber, pmiss, pfa = rocch.Bayes_error_rate(-np.inf,True)
    

    This gives ber, pmiss, pfa = (0,0,0), but pfa should be 1. The ROCCH is correctly computed and so is ber. But sometimes pmiss or pfa is not. The problem is here. The relevant vertex of the ROCCH is found by minimization. But sometimes, the minimum is not unique. The value of the minimum is the Bayes error-rate ber, which is then correct. But the wrong index sometimes happens to be chosen, returning the wrong pmiss or pfa.

    I will figure out how to fix this.

    Niko

    opened by bsxfan 0
Owner
Luciana Ferrer
Luciana Ferrer
Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation

Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation (AAAI 2021) Official pytorch implementation of our paper: Discriminative

Beom 74 Dec 27, 2022
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Discriminative Sounding Objects Localization Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovis

null 51 Dec 11, 2022
OrienMask: Real-time Instance Segmentation with Discriminative Orientation Maps

OrienMask This repository implements the framework OrienMask for real-time instance segmentation. It achieves 34.8 mask AP on COCO test-dev at the spe

null 45 Dec 13, 2022
Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral)

Joint Discriminative and Generative Learning for Person Re-identification [Project] [Paper] [YouTube] [Bilibili] [Poster] [Supp] Joint Discriminative

NVIDIA Research Projects 1.2k Dec 30, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022

LDL Paper | Supplementary Material Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Jie Liang*, Hu

null 150 Dec 26, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Jan 8, 2023
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Jan 5, 2023
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

Chao Ma 3k Jan 3, 2023