Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)

Overview

Introduction

This repository contains my unofficial reimplementation of the standard ECAPA-TDNN for speaker recognition, trained on the VoxCeleb2 dataset.

This repository is adapted from voxceleb_trainer.

Best Performance in this project (with AS-norm)

Dataset    Vox1_O    Vox1_E    Vox1_H
EER (%)    0.86      1.18      2.17
minDCF     0.0686    0.0765    0.1295
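
For readers unfamiliar with the metric, the EER is the operating point at which the false-accept rate equals the false-reject rate. The following is a minimal pure-Python sketch of that definition, not the evaluation code used in this repository; the function name compute_eer and its arguments are hypothetical.

```python
def compute_eer(scores, labels):
    """Return the equal error rate as a fraction (multiply by 100 for percent).

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Sweep the threshold from the highest score downwards: accepting one more
    # trial either removes a false reject or adds a false accept.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    accepted_targets = 0
    accepted_nontargets = 0
    best_gap, eer = float("inf"), 1.0
    for i in order:
        if labels[i] == 1:
            accepted_targets += 1
        else:
            accepted_nontargets += 1
        far = accepted_nontargets / n_nontarget  # false-accept rate
        frr = 1.0 - accepted_targets / n_target  # false-reject rate
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

On a real trial list the two rates rarely cross exactly, so the sketch reports their average at the closest point; tied scores are not treated specially.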

System Description

I will write a technical report describing this system and all its details later. Please wait.

Dependencies

Note: These settings are based on my device; you can change the torch and torchaudio versions to suit yours.

Start from building the environment

conda create -n ECAPA python=3.7.9 anaconda
conda activate ECAPA
pip install -r requirements.txt

Start from the existing environment

pip install -r requirements.txt

Data preparation

Please follow the 'Data preparation' part of the official voxceleb_trainer repository to prepare your VoxCeleb2 dataset.

Datasets used for training:

  1. VoxCeleb2 training set;

  2. MUSAN dataset;

  3. RIR dataset.

Datasets used for evaluation:

  1. VoxCeleb1 test set for Vox1_O

  2. VoxCeleb1 train set for Vox1_E and Vox1_H (Optional)

Training

Then change the data paths in trainECAPAModel.py and train the ECAPA-TDNN model end-to-end with:

python trainECAPAModel.py --save_path exps/exp1 

Every test_step epochs, the system is evaluated on the Vox1_O set and the EER is printed.

The results are saved in exps/exp1/score.txt and the model is saved in exps/exp1/model.

In my case, I trained for 80 epochs on one 3090 GPU. Each epoch takes 37 minutes, so the total training time is about 48 hours.

Pretrained model

Our pretrained model achieves an EER of 0.96 on the Vox1_O set without AS-norm. You can check it with:

python trainECAPAModel.py --eval --initial_model exps/pretrain.model

With AS-norm, this system achieves an EER of 0.86. We will release the AS-norm code later.
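
Until that code is released, the general idea of adaptive s-norm can be sketched as follows: each side of a trial is scored against a cohort of embeddings, the mean and standard deviation of its top-k cohort scores are computed, and the raw trial score is normalised with both sets of statistics. This is a textbook-style sketch, not the author's implementation; the function names, the cohort argument and the default top_k=300 are all assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_stats(embedding, cohort, top_k):
    """Mean and std of the top_k highest cohort scores for one embedding."""
    scores = sorted((cosine(embedding, c) for c in cohort), reverse=True)
    scores = scores[:top_k]
    return mean(scores), pstdev(scores)


def as_norm(enroll, test, cohort, top_k=300, eps=1e-8):
    """Adaptive s-norm of the cosine score between enroll and test.

    cohort: list of embeddings from speakers outside the trial (assumed here
    to be plain lists of floats); top_k and eps are illustrative defaults.
    """
    raw = cosine(enroll, test)
    mu_e, sd_e = top_k_stats(enroll, cohort, top_k)
    mu_t, sd_t = top_k_stats(test, cohort, top_k)
    # Average of the two z-normalised scores, one per side of the trial.
    return 0.5 * ((raw - mu_e) / (sd_e + eps) + (raw - mu_t) / (sd_t + eps))
```

The normalised score is symmetric in enroll and test, so swapping the two sides of a trial does not change the result.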

We also provide the score file at exps/pretrain_score.txt. It contains the training loss, training accuracy and Vox1_O EER for each epoch, for your reference.


Reference

@inproceedings{desplanques2020ecapa,
  title={{ECAPA-TDNN: Emphasized Channel Attention, propagation and aggregation in TDNN based speaker verification}},
  author={Desplanques, Brecht and Thienpondt, Jenthe and Demuynck, Kris},
  booktitle={Interspeech 2020},
  pages={3830--3834},
  year={2020}
}
@inproceedings{chung2020in,
  title={In defence of metric learning for speaker recognition},
  author={Chung, Joon Son and Huh, Jaesung and Mun, Seongkyu and Lee, Minjae and Heo, Hee Soo and Choe, Soyeon and Ham, Chiheon and Jung, Sunghwan and Lee, Bong-Jin and Han, Icksang},
  booktitle={Interspeech},
  year={2020}
}

Acknowledgements

We studied many useful projects while writing this code, including:

clovaai/voxceleb_trainer

lawlict/ECAPA-TDNN

speechbrain/speechbrain

ranchlai/speaker-verification

Thanks to these authors for open-sourcing their code!

Notes

If you run into problems with this repository, please open an issue on GitHub (in English) instead of messaging me on bilibili, so that others can also benefit from it. Thanks for your understanding!

If you improve the results of this repository with other methods, please let me know. Thanks!

Comments
  • Accelerating evaluation speed

    During evaluation, the current implementation calculates the similarity scores one by one in a for loop, which could be slow when "lines" gets large. Is there an elegant way to vectorize it?

    opened by dopiwoo 8
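
One possible way to vectorize the scoring asked about above is to stack all embeddings into a matrix, L2-normalise the rows once, and compute every trial score with a single row-wise dot product. The sketch below uses NumPy and assumes one embedding vector per utterance, which is a simplification (the evaluation code quoted in another issue keeps two embeddings per file); score_trials and its arguments are hypothetical names.

```python
import numpy as np


def score_trials(embeddings, trials):
    """Cosine scores for all trials at once.

    embeddings: dict mapping an utterance id to a 1-D embedding vector.
    trials: list of (enroll_id, test_id) pairs.
    """
    ids = sorted(embeddings)
    index = {utt: i for i, utt in enumerate(ids)}
    matrix = np.stack([np.asarray(embeddings[utt], dtype=np.float64) for utt in ids])
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)  # L2-normalise rows

    # Gather the two sides of every trial with integer indexing.
    left = matrix[[index[a] for a, _ in trials]]
    right = matrix[[index[b] for _, b in trials]]
    return np.sum(left * right, axis=1)  # row-wise dot product = cosine
```

The per-trial Python loop is reduced to two index lookups, and the arithmetic runs inside NumPy. For very long trial lists the gathered arrays can be processed in chunks to limit memory use.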
  • Inconsistent model input?

    I see this in the inference code:

    with torch.no_grad():
        embedding_1 = self.speaker_encoder.forward(data_1, aug = False)
        embedding_1 = F.normalize(embedding_1, p=2, dim=1)
        embedding_2 = self.speaker_encoder.forward(data_2, aug = False)
        embedding_2 = F.normalize(embedding_2, p=2, dim=1)
    embeddings[file] = [embedding_1, embedding_2]

    Here data_1 is the full utterance, and data_2 is the utterance split into segments and then stacked. For utterances of different lengths, is there no fixed length for data_1 and data_2? Can both be passed to self.speaker_encoder.forward to compute embeddings?

    opened by JJ-Guo1996 7
  • can not prepare the dataset

    When I followed the Data preparation part in the link and ran python3 dataprep.py --save_path data --download --user USERNAME --password PASSWORD, I got the following error.

    --2021-11-26 14:04:56-- http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
    Resolving www.robots.ox.ac.uk (www.robots.ox.ac.uk)... 129.67.94.2
    Connecting to www.robots.ox.ac.uk (www.robots.ox.ac.uk)|129.67.94.2|:80... connected.
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa [following]
    --2021-11-26 14:04:58-- https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
    Connecting to www.robots.ox.ac.uk (www.robots.ox.ac.uk)|129.67.94.2|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2021-11-26 14:04:59 ERROR 404: Not Found.

    Traceback (most recent call last):
      File "Downloads/voxceleb_trainer-master/dataprep.py", line 176, in <module>
        download(args,fileparts)
      File "Downloads/voxceleb_trainer-master/dataprep.py", line 58, in download
        raise ValueError('Download failed %s. If download fails repeatedly, use alternate URL on the VoxCeleb website.'%url)
    ValueError: Download failed http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa. If download fails repeatedly, use alternate URL on the VoxCeleb website.

    How can I solve this problem? Thanks!

    opened by chonghaozhang1998 7
  • How to use pretrain.model for continuing training?

    I want to add some Chinese audio to the training data.

    Can I start from your pretrain.model and continue training with my data,

    or do I have to download all of the VoxCeleb1 data, add my data, and train from scratch?

    Thank you for your reply.

    opened by youyou098888 7
  • Questions about reproduced ECAPA-Tdnn paper

    Hi

    I found some differences between the configuration in your code and the original ECAPA configuration.

    The most important one is that your code randomly chooses 1 of the 6 noise types to add, whereas ECAPA uses all 6 noise methods, which gives them a larger dataset.

    I trained the 512-channel model, which only reaches 1.16 EER (1.01 in ECAPA), but your 1024-channel result is even better than ECAPA's. Is there a training trick you have not shared, or did you change the configuration in the uploaded code? (I just copied your project and changed the number of channels; everything else stays the same.) Or do the small differences in your code make it work better on a larger model?

    And thank you for your excellent work! Any help will be appreciated!

    Best

    opened by sliver-7 5
  • About the training time

    Hello, thank you so much for contributing this project. I have been training this model recently. I also use one 3090 and the same settings as you, but each epoch takes me about 20 hours. Do you know what the reason might be? Thank you in advance for your answer.

    opened by KAI-LI-JAIST 5
  • How to evaluate your nn

    Hi! I'm new to neural networks and I'm having trouble working out how to evaluate your implementation. For now I'm using an audio dataset that is different from your --eval_path and --eval_list, so I'm running this command:

    python trainECAPAModel.py --eval --initial_model exps/pretrain.model --eval_list /eval_list_directory --eval_path /eval_path_directory

    Is this the correct way to evaluate your implementation? Should I use any different argument? The point is I don't think I understand what exps/pretrain.model is, so I don't know how to use it.

    Looking forward to your response! Thanks

    opened by rosana-sc7 3
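  • Question about AS-norm

    Hi Ruijie, I have been following you on Bilibili for a long time! I have recently been working on the SASVC challenge and found that it uses your repository as the ASV baseline code. Your README says that the ECAPA-TDNN result is obtained after AS-norm, but I could not find any backend normalization in your code. Is this a typo, or have you not added that code to this repository?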

    opened by ikou-austin 3
  • GPU utilization error!

    Hi, author. I am training the ECAPA-TDNN model end-to-end with python trainECAPAModel.py. However, the training time per epoch is very long. After checking, I found that the GPU memory is occupied but its utilization is 0. I manually set model.cuda(), but it does not help. Which part of the program should I change so that the model loads successfully?

    opened by daiyuuu 2
  • Do you have any open source plans for the Stage II in your latest paper?

    I have read your great work, SELF-SUPERVISED SPEAKER RECOGNITION WITH LOSS-GATED LEARNING (ICASSP 2022).

    I am trying to reproduce Stage II of your paper, which shows large gains in your experiments.

    If that part of the code were available, it would help a lot.

    Thanks a lot.

    opened by seacj 2
  • training set is not 5 times bigger after augmentation

    I notice that in the dataloader, the training set has the same size after augmentation as the original audio set.

    So, is augmentation meant only to increase the diversity of the training data, not its amount?

    opened by youyou098888 2
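
The behaviour described in this question is what on-the-fly augmentation usually looks like: an augmentation is drawn at random each time an item is read, so the number of items per epoch stays the same while the content varies between epochs. The toy sketch below illustrates that pattern only; it is not the repository's dataloader, and AugmentedDataset and the string "augmentations" are hypothetical stand-ins for real audio processing.

```python
import random


class AugmentedDataset:
    """Toy dataset: one randomly chosen augmentation per item, applied on access."""

    def __init__(self, utterances, augmentations, seed=None):
        self.utterances = list(utterances)
        self.augmentations = list(augmentations)
        self.rng = random.Random(seed)

    def __len__(self):
        # Augmentation happens per access, so the epoch length is unchanged.
        return len(self.utterances)

    def __getitem__(self, i):
        augment = self.rng.choice(self.augmentations)
        return augment(self.utterances[i])


# Stand-ins for real audio transforms (assumed, for illustration only).
augmentations = [
    lambda x: x,              # clean audio
    lambda x: x + "+noise",   # stand-in for additive MUSAN noise
    lambda x: x + "+reverb",  # stand-in for RIR reverberation
]
dataset = AugmentedDataset(["utt1", "utt2", "utt3"], augmentations, seed=0)
```

Reading dataset[0] repeatedly can return different variants of the same utterance, while len(dataset) stays at 3.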
  • ECAPA-TDNN

    Hi!

    I'm having trouble understanding ECAPA-TDNN architecture.

    To be specific, I don't understand what the elements of ECAPA-TDNN (PreEmphasis, MelSpectrogram, FBankAug, conv1d, relu, batchNorm1d, bottleneck, Attention...) do in the context of speaker verification.

    What about the classifier AAMsoftmax, the optimizer Adam and the scheduler StepLR?

    Thanks for your attention and time!

    opened by rosana-sc7 1
Owner
Tao Ruijie
NUS ECE PhD student