Introduction
This repository is an unofficial implementation of the SpeakerGAN paper by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with thanks to TongFeng for advice.
SpeakerGAN paper
SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, and Haiyong Xie.
Usage
For train / test / generate:
python speakergan.py
You may need to change the path to the VAD-preprocessed wav files before running.
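As a purely hypothetical illustration (the actual variable or argument name inside speakergan.py may differ):

```python
# Hypothetical: point the script at your own VAD-preprocessed data before running.
# The real constant/argument name inside speakergan.py may be different.
WAV_DIR = "/path/to/librispeech/train-clean-100_vad"  # adjust to your local path
```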
Our results
acc: 94.27% with a randomly sampled test set.
acc: 93.21% with a fixed-start sampled test set.
Model file used: model/49_D.pkl
acc: 98.44% classification accuracy on real training samples.
Our test-set accuracy is about 4% lower than the paper's result. We have not been able to find the reason and would appreciate your help!
Details of paper
The following are details from the paper and notes on our implementation.
================ input ==================

- feature: fbank, 8000 Hz, 25 ms frame, 10 ms overlap, shape (160, 64) (see the feature-extraction sketch after this list)
- dataset: LibriSpeech train-clean-100, POI (speakers): 251
- data preprocess: VAD, mean and variance normalization, shuffled
- split: 60% train, 40% test
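Below is a minimal feature-extraction sketch of this input pipeline, not the repository code: it assumes torchaudio's Kaldi-compatible fbank and webrtcvad, treats the 10 ms figure as the frame shift, and the function name extract_fbank is illustrative.

```python
import torch
import torchaudio
import webrtcvad

# VAD as in this repo: webrtcvad with aggressiveness mode 3.
# (is_speech() expects 10/20/30 ms frames of 16-bit mono PCM.)
vad = webrtcvad.Vad(3)

def extract_fbank(wav_path, num_frames=160, num_mel_bins=64):
    """8 kHz audio -> 64-dim fbank, 25 ms window, 10 ms shift, CMVN, (160, 64) chunk."""
    waveform, sr = torchaudio.load(wav_path)              # (channels, samples)
    if sr != 8000:
        waveform = torchaudio.functional.resample(waveform, sr, 8000)
    feat = torchaudio.compliance.kaldi.fbank(
        waveform,
        sample_frequency=8000.0,
        frame_length=25.0,      # 25 ms analysis window
        frame_shift=10.0,       # assuming the "10 ms" above is the frame shift
        num_mel_bins=num_mel_bins,
    )                                                      # (T, 64)
    # per-utterance mean and variance normalization
    feat = (feat - feat.mean(dim=0)) / (feat.std(dim=0) + 1e-8)
    # fixed-length chunk of 160 frames; shorter utterances are handled separately
    return feat[:num_frames]
```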
================ model architecture ==================

- dataflow: data -> feature extraction -> G & D
- model architecture (see the block sketch after this list):
  G: gated CNN, encoder-decoder; Huber loss + adversarial loss
  D: ResNet blocks, temporal average pooling, FC, softmax; cross-entropy loss + adversarial loss
- G: shuffler layer, GLU
- D: ReLU
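Below is a hedged PyTorch sketch of these building blocks; channel counts, kernel sizes, and the classifier head are placeholders rather than the exact configuration from the paper or this repo, and the adversarial (real/fake) output of D is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv2d(nn.Module):
    """Gated CNN block as used in G: the GLU halves the channels, so the
    convolution emits 2x the desired output channels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 2 * out_ch, kernel_size, stride, padding)

    def forward(self, x):
        return F.glu(self.conv(x), dim=1)      # gated linear unit over channels

class ResBlock(nn.Module):
    """Residual block for D with ReLU activations."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        h = F.relu(self.conv1(x))
        return F.relu(x + self.conv2(h))

class SpeakerHead(nn.Module):
    """Temporal average pooling + FC; softmax / cross-entropy over 251 speakers."""
    def __init__(self, ch, num_speakers=251):
        super().__init__()
        self.fc = nn.Linear(ch, num_speakers)

    def forward(self, feat_map):               # (B, C, T, F)
        pooled = feat_map.mean(dim=(2, 3))     # average over time (and frequency)
        return self.fc(pooled)                 # logits for cross-entropy
```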
================ training ==================

- lr: epochs 0-9: 0.0005 | epochs 9-49: 0.0002 (see the sketch after this list)
- L(D): λ1 = λ2 = 1
- batch_size: 64
- D train steps / G train steps = 4
- L_adv loss: label smoothing, 1 -> 0.7 ~ 1.0, 0 -> 0 ~ 0.3
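Below is a small runnable sketch of the schedule and label-smoothing settings above; the helper names learning_rate and adversarial_targets are illustrative and not taken from speakergan.py.

```python
import torch

def learning_rate(epoch: int) -> float:
    """lr schedule from above: 0.0005 for epochs 0-9, 0.0002 for epochs 9-49."""
    return 5e-4 if epoch < 9 else 2e-4

def adversarial_targets(batch_size: int = 64):
    """Label smoothing for L_adv: real 1 -> U(0.7, 1.0), fake 0 -> U(0.0, 0.3)."""
    real = torch.empty(batch_size, 1).uniform_(0.7, 1.0)
    fake = torch.empty(batch_size, 1).uniform_(0.0, 0.3)
    return real, fake

# Per epoch, the optimizer learning rates would be refreshed, e.g.:
#   for group in optimizer.param_groups:
#       group["lr"] = learning_rate(epoch)
# Within each step (batch_size = 64), D is updated 4 times for every G update,
# and the two loss terms are weighted equally (λ1 = λ2 = 1).
```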
======== not sure or differences with paper ========

- weight/bias initialization: xavier_uniform for weights, zeros for biases (see the sketch after this list)
- PyTorch Huber loss: adding 0.5 would match the paper's definition, but this is not implemented here
- shorter wavs: the paper pads with zeros; we pad by repeating the feature
- gated CNN architecture
- we use webrtcvad mode(3) for VAD preprocessing
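Below is a small illustrative sketch of three of these points (initialization, repeat padding, and the "+ 0.5" Huber adjustment); the function names are ours, and the Huber variant is shown only for reference since it is not implemented in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def init_weights(m: nn.Module) -> None:
    """Initialization as noted above: xavier_uniform_ for weights, zeros for biases."""
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def pad_by_repeating(feat: torch.Tensor, num_frames: int = 160) -> torch.Tensor:
    """Our padding for short utterances: tile the (T, 64) feature along time
    until it reaches num_frames frames (the paper zero-pads instead)."""
    reps = (num_frames + feat.size(0) - 1) // feat.size(0)
    return feat.repeat(reps, 1)[:num_frames]

def paper_style_huber(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """One reading of the note above: add 0.5 to PyTorch's smooth_l1_loss so the
    value matches the paper's Huber definition (not applied in this repo)."""
    return F.smooth_l1_loss(pred, target) + 0.5
```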