This repository contains code to run (i.e., train, perform inference with, and evaluate) a diarization method called EEND-vector-clustering.

Overview

EEND-vector clustering

EEND-vector clustering (End-to-End Neural Diarization with vector clustering) is a speaker diarization framework that integrates two complementary major diarization approaches, i.e., traditional clustering-based and emerging end-to-end neural network-based approaches, to get the best of both worlds. In [1] it is shown that EEND-vector clustering outperforms EEND when the recording is long (e.g., more than 5 min), while [2] shows on CALLHOME data that it outperforms x-vector clustering and EEND-EDA, especially when the number of speakers in a recording is large.

This repository contains an example implementation of EEND-vector clustering based on PyTorch to reproduce the results in [2], i.e., the CALLHOME experiments. For the trainer, we use Padertorch. This repository is implemented based on EEND and relies on some useful functions provided therein.
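
As a rough conceptual sketch of this two-stage design (not the repository's actual API; eend_forward_chunk and the variable names below are hypothetical), the model first diarizes each fixed-length chunk locally and emits one embedding per local speaker; clustering those embeddings across chunks then links local speaker indices to global speaker identities:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def diarize(chunks, eend_forward_chunk, n_speakers):
    # Stage 1: per-chunk EEND. eend_forward_chunk is a hypothetical
    # callable returning frame-level speech activities of shape
    # (frames, n_local_speakers) plus one embedding per local speaker
    # of shape (n_local_speakers, dim).
    activities, spk_vectors = [], []
    for chunk in chunks:
        act, vecs = eend_forward_chunk(chunk)
        activities.append(act)
        spk_vectors.append(vecs)

    # Stage 2: cluster all chunk-level speaker vectors. The actual
    # method uses constrained clustering (vectors from the same chunk
    # must not share a cluster); plain agglomerative clustering is
    # shown here for simplicity.
    X = np.concatenate(spk_vectors, axis=0)
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(X)

    # Relabel each chunk's local speakers with their global cluster IDs.
    out, k = [], 0
    for act in activities:
        n_local = act.shape[1]
        out.append((act, labels[k:k + n_local]))
        k += n_local
    return out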

References

[1] Keisuke Kinoshita, Marc Delcroix, and Naohiro Tawara, "Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds," Proc. ICASSP, pp. 7198–7202, 2021

[2] Keisuke Kinoshita, Marc Delcroix, and Naohiro Tawara, "Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech," Proc. Interspeech, 2021 (to appear)

Citation

@inproceedings{eend-vector-clustering,
  author    = {Keisuke Kinoshita and Marc Delcroix and Naohiro Tawara},
  title     = {Integrating End-to-End Neural and Clustering-Based Diarization: Getting the Best of Both Worlds},
  booktitle = {{ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},
  pages     = {7198--7202},
  year      = {2021}
}

Install tools

Requirements

  • NVIDIA CUDA GPU
  • CUDA Toolkit (version == 9.2, 10.1 or 10.2)

Install kaldi and python environment

cd tools
make
  • This command builds kaldi at tools/kaldi
    • If you want to use a pre-built kaldi:
      cd tools
      make KALDI=<existing_kaldi_root>
      This option makes a symlink at tools/kaldi
  • This command also extracts miniconda3 at tools/miniconda3 and creates a conda environment named 'eend'
  • It then installs PyTorch and Padertorch into the 'eend' environment
  • Finally, it clones EEND so that the symbolic links stored under eend/, egs/, and utils/ can reference it

Test recipe (mini_librispeech)

Configuration

  • Modify egs/mini_librispeech/v1/cmd.sh according to your job scheduler. If you use your local machine, use "run.pl" (default). If you use Grid Engine, use "queue.pl". If you use SLURM, use "slurm.pl". For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.

Run data preparation, training, inference, and scoring

cd egs/mini_librispeech/v1
CUDA_VISIBLE_DEVICES=0 ./run.sh
  • See RESULT.md and compare with your result.

CALLHOME experiment

Configuration

  • Modify egs/callhome/v1/cmd.sh according to your job scheduler. If you use your local machine, use "run.pl" (default). If you use Grid Engine, use "queue.pl". If you use SLURM, use "slurm.pl". For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.

Run data preparation, training, inference, and scoring

cd egs/callhome/v1
CUDA_VISIBLE_DEVICES=0 ./run.sh --db_path <db_path>
# <db_path> is the absolute path of the directory where the necessary LDC corpora are stored.
  • See RESULT.md and compare with your result.
  • If you want to run multi-GPU training, simply set CUDA_VISIBLE_DEVICES appropriately (e.g., CUDA_VISIBLE_DEVICES=0,1,2,3 ./run.sh). This environment variable may be set automatically by your job scheduler, such as SLURM.
Comments
  • Requesting the CALLHOME result

    I would like to compare our algorithm with the CALLHOME results in your paper. Could you kindly provide the RTTM hypotheses for CALLHOME from the original paper? Thanks a lot.

    opened by liutaocode 2
  • Which file should I pass to spkv_lab when resuming training with initmodel?

    Hi, for some reason my training process was interrupted. I want to resume training from the latest checkpoint and continue training on the old data. There is a parameter --spkv_lab: "file path of speaker vector with label and speaker ID conversion table for adaptation". Which file exactly does this mean? I tried featlab_chunk_indices.txt but failed, and I cannot find another suitable file... please help. Thanks

    opened by kli017 1
  • Performance of different net architectures

    Hello, I was wondering whether you have evaluated different net architectures. I modified the net according to the Transformer paper (number of layers, number of heads, and hidden unit size), and I found that the results do not get better (and are even worse on some unseen test wavs) as the net becomes more complicated.

    opened by kli017 0
  • Possible to train with audios containing different numbers of speakers?

    Hello, I found there is a parameter named num_speakers in train.yaml. Does that mean the number of speakers in an audio file should equal num_speakers?

    opened by kli017 0
  • Invalid input shape after modifying the layer and head num in train.yaml

    Hello, when I modified the layer and head num of the transformer in train.yaml, I got a RuntimeError: shape '[128, -1, 12, 21]' is invalid for input of size 4915200

    spk_loss_ratio: 0.03
    spkv_dim: 256
    max_epochs: 120
    input_transform: logmel23_mn
    lr: 0.001
    optimizer: noam
    num_speakers: 3
    gradclip: 5
    chunk_size: 150
    batchsize: 128
    num_workers: 8
    hidden_size: 256
    context_size: 7
    subsampling: 10
    frame_size: 200
    frame_shift: 80
    sampling_rate: 8000
    noam_scale: 1.0
    noam_warmup_steps: 25000
    transformer_encoder_n_heads: 12
    transformer_encoder_n_layers: 8
    transformer_encoder_dropout: 0.1
    seed: 777
    feature_nj: 100
    batchsize_per_gpu: 8
    test_run: 0
    

    I haven't gone through the model structure code yet. So these two parameters cannot be modified arbitrarily? Or are they related to other parameters (such as context_size)?

    opened by kli017 0
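
    A likely cause, for reference: standard multi-head attention reshapes the hidden dimension into (n_heads, hidden_size // n_heads), so hidden_size must be divisible by the number of heads, and 256 is not divisible by 12. The numbers in the error match that reading exactly, as this quick check shows:

    batchsize, chunk_size, hidden_size, n_heads = 128, 150, 256, 12

    total = batchsize * chunk_size * hidden_size     # 4915200: the reported input size
    head_dim = hidden_size // n_heads                # 21: the reported last dimension

    # view(batch, -1, n_heads, head_dim) must factor `total` exactly,
    # which fails because hidden_size is not divisible by n_heads:
    print(hidden_size % n_heads)                     # 4 -> 256 is not divisible by 12
    print(total % (batchsize * n_heads * head_dim))  # 12288 -> the reshape is invalid

    Choosing a head count that divides hidden_size (e.g., 4, 8, or 16 for hidden_size 256) should avoid this particular reshape error.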
  • Potential issue excluding silent speakers

    Hello there,

    Thanks for your efforts in open-sourcing the code; it's vital for those of us trying to reproduce the results presented in the paper.

    Problem

    I've come across a RuntimeError when adapting the model with our private data, which shows:

    /*/EEND-vector-clustering/eend/pytorch_backend/train.py:186: RuntimeWarning: invalid value encountered in true_divide
      fet_arr[spk] = org / norm
    ...
    Traceback (most recent call last):
    ...
    RuntimeError: The loss (nan) is not finite.
    

    Detail

    After some debugging, I found the problem actually happens during the backpropagation step when an entry in the embedding layer is left as all zeros: https://github.com/nttcslab-sp/EEND-vector-clustering/blob/b3649eed02fe4f0239f2000fb895120d3c549631/eend/pytorch_backend/train.py#L173-L186

    Since the embeddings are loaded from the dumped speaker embeddings generated by the save_spkv_lab.py script when adapting the model, I suspect there might be an issue in the save_spkv_lab function.

    After some careful step-by-step checking with pdb, I found that silent-speaker labels are in fact added to the all_labels variable when dumping the speaker embeddings: https://github.com/nttcslab-sp/EEND-vector-clustering/blob/b3649eed02fe4f0239f2000fb895120d3c549631/eend/pytorch_backend/infer.py#L349-L355

    Even when torch.sum(t_chunked_t[sigma[i]]) > 0, lab can still be -1, which is considered a silent speaker according to the code in: https://github.com/nttcslab-sp/EEND-vector-clustering/blob/b3649eed02fe4f0239f2000fb895120d3c549631/eend/pytorch_backend/diarization_dataset.py#L94-L99. (This is what confuses me, since it should not happen: both lab and T/t_chunked are produced with info from kaldi_obj.utt2spk.)

    Since these silent-speaker labels are -1 and Python lists support negative indexing, this issue is silently ignored when dumping the embeddings but causes exceptions once training begins.

    Question

    I could simply fix this issue by adding a speaker label to all_labels only if lab >= 0 when saving the speaker embeddings, and the subsequent training process then continues smoothly, resulting in a well-performing model.

    But before opening any PR, I would like to know whether you have ever come across such an issue, or whether you have any idea why this happens.

    Thanks!

    opened by Zenglinxiao 0
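
    A minimal sketch of the guard described above (a hypothetical helper, not the repository's exact code), assuming embs is an (n, d) array of chunk-level speaker embeddings and labels a length-n list of speaker IDs with -1 marking a silent speaker:

    import numpy as np

    def drop_silent_speakers(embs, labels):
        # Keep only entries with a valid (non-negative) speaker label, so a
        # -1 label can neither index the last row via Python's negative
        # indexing nor leave an all-zero embedding row that later yields a
        # NaN loss.
        keep = [i for i, lab in enumerate(labels) if lab >= 0]
        return embs[keep], [labels[i] for i in keep]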
  • Fix Agg. Clustering ValueError with sample < 2

    Currently, the following error may arise when a trained EEND-vector model is used for inference on an audio recording with only a single speaker:

    ValueError: Found array with 1 sample(s) (shape=(1, 1)) while a minimum of 2 is required by AgglomerativeClustering.
    

    This PR fixes the issue by adding an extra verification to make sure min_n_samples is always at least two, which avoids running clustering on a single sample.

    opened by Zenglinxiao 0
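
    A minimal sketch of such a guard (illustrative, not the PR's exact code):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def cluster_speaker_vectors(X, n_clusters):
        # X: (n, d) array of speaker vectors. AgglomerativeClustering
        # requires at least two samples, so with fewer we return trivial
        # labels instead of clustering.
        if len(X) < 2:
            return np.zeros(len(X), dtype=int)
        n_clusters = min(n_clusters, len(X))  # no more clusters than samples
        return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)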