LDNet
Author: Wen-Chin Huang (Nagoya University) Email: [email protected]
This is the official implementation of the paper "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech". LDNet is a model that takes a synthetic speech sample as input and outputs a simulated human rating.
Usage
Currently we support only the VCC2018 dataset. We plan to release the BVCC dataset in the near future.
Requirements
- PyTorch 1.9 (other reasonably recent versions should also be fine)
- librosa
- pandas
- h5py
- scipy
- matplotlib
- tqdm
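If you start from a clean environment, a pip one-liner along these lines should cover the list above (a suggested command, not from the repository; for PyTorch itself, following the official installation instructions for your CUDA version is safer):
pip install torch librosa pandas h5py scipy matplotlib tqdm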
Data preparation
# Download the VCC2018 dataset.
cd data
./download.sh vcc2018
Training
We provide configs that correspond to the following systems (the row labels refer to the results table in the paper):
- (a): MBNet.yaml
- (d): LDNet_MobileNetV3_RNN_5e-3.yaml
- (e): LDNet_MobileNetV3_FFN_1e-3.yaml
- (f): LDNet-MN_MobileNetV3_RNN_FFN_1e-3_lamb4.yaml
- (g): LDNet-ML_MobileNetV3_FFN_1e-3.yaml
python train.py --config configs/<config_name> --tag <tag_name>
By default, the experimental results will be stored in exp/<tag_name>, including:
- model-<steps>.pt: model checkpoints.
- config.yml: the config file.
- idtable.pkl: the dictionary that maps each listener to an ID (see the sketch after this list).
- training_<inference_mode>: the validation results generated along the training. This file is useful for model selection. Note that the inference_mode in the config file decides which mode is used for validation during training.
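As a quick sanity check, the listener mapping can be inspected with a short Python snippet. This is only a sketch: it assumes idtable.pkl is a plain pickled dictionary, as described above, and reuses the example tag name from the Inference section below:

import pickle

# Load the listener-to-ID mapping saved during training.
with open("exp/LDNet-ML_MobileNetV3_FFN_1e-3/idtable.pkl", "rb") as f:
    idtable = pickle.load(f)
print(len(idtable), "listeners")
print(list(idtable.items())[:5])  # a few (listener, ID) pairs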
There are some arguments that can be changed:
- --exp_dir: The directory for storing the experimental results.
- --data_dir: The data directory. Default is data/vcc2018.
- seed: random seed.
- update_freq: This is very important. See below.
Batch size and update_freq
By default, all LDNet models are trained with a batch size of 60. In my experiments, I used a single NVIDIA GeForce RTX 3090 with 24 GB of memory for training. The whole batch cannot fit on the GPU at once, so I accumulate gradients over update_freq forward passes and then do one backward update. Before training, please check the train_batch_size in the config file and set update_freq accordingly. For instance, in configs/LDNet_MobileNetV3_FFN_1e-3.yaml the train_batch_size is 20, so update_freq should be set to 3 (20 x 3 = 60).
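For reference, here is a minimal, self-contained sketch of gradient accumulation in PyTorch. It is not the training loop of this repository; the model, data, and loss below are stand-ins, and only the accumulation pattern itself matters:

import torch

# Stand-ins for the real model, data, and loss; only the pattern matters.
update_freq = 3                        # as above: 20 x 3 = 60
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()
batches = [(torch.randn(20, 10), torch.randn(20, 1)) for _ in range(6)]

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = criterion(model(x), y)
    (loss / update_freq).backward()    # scale so the accumulated gradient matches one large batch
    if (step + 1) % update_freq == 0:
        optimizer.step()               # one parameter update per update_freq forward passes
        optimizer.zero_grad()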
Inference
python inference.py --tag LDNet-ML_MobileNetV3_FFN_1e-3 --mode mean_listener
Use --mode to specify which inference mode to use. Choices are: mean_net, all_listeners, and mean_listener. By default, all checkpoints in the exp directory will be evaluated.
There are some arguments that can be changed:
- --ep: if you want to evaluate one model checkpoint, say, model-10000.pt, then simply pass --ep 10000.
- --start_ep: if you want to evaluate all model checkpoints from a certain step on, say, from 10000 steps, then simply pass --start_ep 10000.
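For example, to evaluate only the checkpoint saved at 10000 steps, or every checkpoint from 10000 steps on:
python inference.py --tag LDNet-ML_MobileNetV3_FFN_1e-3 --mode mean_listener --ep 10000
python inference.py --tag LDNet-ML_MobileNetV3_FFN_1e-3 --mode mean_listener --start_ep 10000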
There are some files you can inspect after the evaluation:
- <dataset_name>_<inference_mode>.csv: the validation and test set results.
- <dataset_name>_<inference_mode>_<test/valid>/: figures that visualize the prediction distributions.
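Since pandas is already a dependency, the results CSV can be inspected programmatically. The sketch below makes two assumptions: that the CSV is written under the exp directory of the tag, and that the dataset name is vcc2018 (the default data); the column names are simply whatever inference.py wrote:

import pandas as pd

# Assumed location/name: exp/<tag_name>/<dataset_name>_<inference_mode>.csv
df = pd.read_csv("exp/LDNet-ML_MobileNetV3_FFN_1e-3/vcc2018_mean_listener.csv")
print(df.columns.tolist())  # see which metrics are reported
print(df.head())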
Acknowledgement
This repository builds on a great unofficial MBNet implementation.
Citation
If you find this recipe useful, please consider citing the following paper:
@article{huang2021ldnet,
  title={LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech},
  author={Huang, Wen-Chin and Cooper, Erica and Yamagishi, Junichi and Toda, Tomoki},
  journal={arXiv preprint arXiv:2110.09103},
  year={2021}
}