Pytorch-MBNet

A PyTorch implementation of MBNET: MOS PREDICTION FOR SYNTHESIZED SPEECH WITH MEAN-BIAS NETWORK.

Training

To train a new model, run train.py. The input arguments are listed below; an example command follows the list.

  • --data_path: The path to the directory containing all .wav files of VCC-2018 and the train/dev/test split files (the files in ./data).
  • --save_dir: The path to the directory where trained models are saved. Please create this directory before training.
  • --total_steps: The total number of training steps.
  • --valid_steps: Run validation every valid_steps training updates.
  • --log_steps: Write TensorBoard logs every log_steps training updates.
  • --update_freq: The number of gradient accumulation steps; the default value is 1 (no accumulation).
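
For example, a training run might be launched as follows. The paths and step counts below are placeholders for illustration, not values prescribed by the repository; --data_path must point at a directory holding both the VCC-2018 .wav files and the split files.

    python train.py \
        --data_path ./data \
        --save_dir ./checkpoints \
        --total_steps 50000 \
        --valid_steps 1000 \
        --log_steps 500 \
        --update_freq 1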

Testing

To test on VCC-2018, run test.py. The input arguments are listed below; an example command follows the list.

  • --model_path: The path to the saved model.
  • --idtable_path: The path to the judge-ID-to-number mapping table file used during training.
  • --step: The step value used for the TensorBoard log, which can be set to the number of training steps.
  • --split: The data split (valid or test) to use for testing.
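
For example (the checkpoint and id-table file names below are placeholders; use whatever your training run saved under --save_dir):

    python test.py \
        --model_path ./checkpoints/model_50000.pt \
        --idtable_path ./checkpoints/idtable.pkl \
        --step 50000 \
        --split test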

Inference

After training on the VCC data, the model can be used for inference on other data. The input arguments are --data_path, --model_path, and --save_dir, which have the same meanings as above. Note that the bias net is not used here, since this code assumes the ground-truth judge IDs are unavailable for new data.
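
The inference entry point is not named above; assuming it is inference.py and that the pre-trained checkpoint file is named model.pt (both are guesses and should be adjusted to the actual files in the repository), a call might look like:

    python inference.py \
        --data_path ./my_wav_dir \
        --model_path ./pre_trained/model.pt \
        --save_dir ./results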

The pre-trained model can be found in ./pre_trained.

Comments
  • Is the frame-level loss used?

    Hi,

    In the paper "MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network", the authors state that the frame-level loss borrowed from MOSNet is used, but I can't find it in your code. I reproduced the frame-level loss and found that it does not work well. Did you use this loss, and how effective was it?

    Liu

    opened by liushenme 6
  • Some examples of the input parameters

    Sorry to disturb you, I am very new to ML and tried to run the demo first, but I don't know what the input parameters other than the model path should look like. Could you give some examples of these input parameters?

    opened by zhangshengnan96 1
  • Should ReLU be used in mean_net_dnn/bias_net_dnn?

    Hi, thanks for the implementation! I am not sure it is reasonable to use ReLU in both mean_net_dnn and bias_net_dnn. It may be reasonable for mean_net_dnn, because the MOS values lie in {1, ..., 5}, but the bias should be allowed to take negative values. Maybe tanh would be a better choice?

    opened by unilight 3