Finetune SSL models for MOS prediction

This is the code for our paper under review for ICASSP 2022:

"Generalization Ability of MOS Prediction Networks"
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi
https://arxiv.org/abs/2110.02635

Please cite this preprint if you use this code.

Dependencies:

Fairseq toolkit (https://github.com/pytorch/fairseq). Make sure you can import fairseq in Python.
torch, numpy, scipy, torchaudio
I have exported my conda environment for this project to environment.yml.
You also need to download a pretrained wav2vec2 model checkpoint (available at https://github.com/pytorch/fairseq/tree/main/examples/wav2vec). Please choose wav2vec_small.pt, w2v_large_lv_fsh_swbd_cv.pt, or xlsr_53_56k.pt.
You also need a MOS dataset. Datasets for the MOS prediction challenge will be released once the challenge starts. TODO update with a link.
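Before editing any paths, it can help to confirm that the packages listed above are importable and that the downloaded checkpoint is one of the three supported ones. The sketch below uses only the Python standard library; the helpers `missing_packages` and `ssl_output_dim` and the `SUPPORTED_CHECKPOINTS` table are hypothetical, not part of this repository. The feature sizes assume the standard wav2vec 2.0 architectures (768 for BASE, 1024 for LARGE and XLSR-53).

```python
import importlib.util
import os

# Packages named in the dependency list above.
REQUIRED_PACKAGES = ["fairseq", "torch", "numpy", "scipy", "torchaudio"]

# Assumption: transformer output size per supported checkpoint.
# wav2vec 2.0 BASE produces 768-dim features; LARGE and XLSR-53 produce 1024.
SUPPORTED_CHECKPOINTS = {
    "wav2vec_small.pt": 768,
    "w2v_large_lv_fsh_swbd_cv.pt": 1024,
    "xlsr_53_56k.pt": 1024,
}


def missing_packages(names):
    """Return the top-level package names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ssl_output_dim(checkpoint_path):
    """Look up the assumed feature size of a checkpoint from its file name."""
    name = os.path.basename(checkpoint_path)
    if name not in SUPPORTED_CHECKPOINTS:
        raise ValueError(f"unsupported checkpoint: {name}")
    return SUPPORTED_CHECKPOINTS[name]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    print("missing packages:", missing or "none")
    print("xlsr_53_56k.pt feature dim:", ssl_output_dim("./ssl/xlsr_53_56k.pt"))
```

The size lookup matters because a regression head placed on top of the SSL features must match the model's output dimension.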

How to use

Modify the paths in mos_fairseq.py to point to your own data and SSL checkpoints.
Run python mos_fairseq.py to finetune an SSL model on the data.
Modify the variables in predict.py to point to your favorite checkpoint.
Run python predict.py to run inference using that checkpoint.
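MOS predictors are commonly judged by comparing predicted and listener scores at two levels: per utterance, and per system (averaging all utterances from one synthesis system). Typical metrics are mean squared error (MSE), the linear correlation coefficient (LCC, i.e. Pearson) and Spearman's rank correlation (SRCC). The stdlib-only sketch below illustrates these metrics; it is not taken from predict.py. It assumes file names of the form `<system>-<utterance>.wav`, so the system ID is the text before the first hyphen. Since scipy is already a dependency, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` could replace the hand-written correlations.

```python
import math
from collections import defaultdict


def mse(pred, true):
    """Mean squared error between two equal-length score lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def lcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(x), average_ranks(y))


def system_means(scores):
    """Average scores per system, assuming '<system>-<utterance>.wav' names."""
    groups = defaultdict(list)
    for wav_name, score in scores.items():
        groups[wav_name.split("-")[0]].append(score)
    return {sys_id: sum(v) / len(v) for sys_id, v in groups.items()}


def evaluate(pred, true):
    """Compute MSE, LCC and SRCC at utterance and system level.

    pred and true map the same wav file names to MOS values.
    """
    names = sorted(true)
    utt_level = ([pred[n] for n in names], [true[n] for n in names])
    sys_pred, sys_true = system_means(pred), system_means(true)
    systems = sorted(sys_true)
    sys_level = ([sys_pred[s] for s in systems], [sys_true[s] for s in systems])
    return {
        level: {"MSE": mse(p, t), "LCC": lcc(p, t), "SRCC": srcc(p, t)}
        for level, (p, t) in (("utterance", utt_level), ("system", sys_level))
    }
```

System-level scores average away per-utterance noise, so system-level correlations are often higher than utterance-level ones for the same predictions.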

Acknowledgments

This study is supported by JST CREST grants JPMJCR18A6, JPMJCR20D3, and JPMJCR19A3, and by MEXT KAKENHI grants 21K11951 and 21K19808. Thanks to the organizers of the Blizzard Challenge and Voice Conversion Challenge, and to Zhenhua Ling, Zhihang Xie, and Zhizheng Wu for answering our questions about past challenges. Thanks also to the Fairseq team for making their code and models available.

License

BSD 3-Clause License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Comments

How to get the labels of the BVCC test wavs?

How can I get the BVCC test dataset? I only found the data and labels for the train and val sets on the VoiceMOS CodaLab page, but I don't know how to get the labels of the test wavs. Are they open source?

Opened by jiusansan222
Loading pre-trained wav2vec2.0 model error

Hi, have you encountered an error when loading pre-trained wav2vec 2.0 models?

The code is as follows:

    import fairseq

    cp_path = './ssl/xlsr_53_56k.pt'
    model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])
    model = model[0]
    model.eval()

It works with hubert_large_ll60k.pt, but not with xlsr_53_56k.pt.

The error is:

    omegaconf.errors.ConfigKeyError: Key 'eval_wer' not in 'AudioPretrainingConfig'
        full_key: eval_wer
        reference_type=Optional[AudioPretrainingConfig]
        object_type=AudioPretrainingConfig

Thank you!

Opened by Kristopher-Chen
Val loss varies greatly during training

When I used this code and the BVCC dataset to train a MOS prediction model, I found that the results of several runs were different. I plotted the train and val loss curves, and they show that the val loss varies greatly from the very beginning of training. Do you have the same findings? Is it overfitting? Thanks!!

Opened by jiusansan222
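One common guard against the overfitting asked about above is early stopping: keep the checkpoint with the lowest validation loss and stop once it has not improved for a fixed number of epochs (the patience). The plain-Python sketch below shows only that bookkeeping; the `EarlyStopper` class, its `patience` and `min_delta` parameters, and the loss values are illustrative assumptions, not code from mos_fairseq.py.

```python
class EarlyStopper:
    """Track the best validation loss and signal when to stop (illustrative)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs allowed without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # a real training loop would save the checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Hypothetical, noisy validation losses for illustration only.
    losses = [1.2, 0.9, 0.95, 0.8, 0.85, 0.83, 0.9, 0.88]
    stopper = EarlyStopper(patience=3)
    for epoch, loss in enumerate(losses):
        if stopper.step(epoch, loss):
            break
    print("best epoch:", stopper.best_epoch, "loss:", stopper.best_loss)
```

With a noisy validation loss, a patience of several epochs keeps one bad epoch from ending training, while selecting the best-scoring checkpoint (rather than the last one) limits the effect of late overfitting.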

Owner

Yamagishi and Echizen Laboratories, National Institute of Informatics
