UT-Sarulab MOS prediction system using SSL models

sarulab-speech

Last update: Nov 22, 2022

Overview

UTMOS: UTokyo-SaruLab MOS Prediction System

Official implementation of "UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022" submitted to INTERSPEECH 2022.

Abstract:
We present the UTokyo-SaruLab mean opinion score (MOS) prediction system submitted to VoiceMOS Challenge 2022. The challenge is to predict the MOS values of speech samples collected from previous Blizzard Challenges and Voice Conversion Challenges for two tracks: a main track for in-domain prediction and an out-of-domain (OOD) track for which there is less labeled data from different listening tests. Our system is based on ensemble learning of strong and weak learners. Strong learners incorporate several improvements to the previous fine-tuning models of self-supervised learning (SSL) models, while weak learners use basic machine-learning methods to predict scores from SSL features. In the Challenge, our system had the highest score on several metrics for both the main and OOD tracks. In addition, we conducted ablation studies to investigate the effectiveness of our proposed methods.

Demo for UTMOS is available:

How to use

Environment setup

This repo uses poetry as the Python environment manager. Install poetry first, following its installation instructions.
Install the required Python packages with poetry install, then enter the Python environment with poetry shell. All following operations must be run inside the poetry shell environment.
Next, download the necessary fairseq checkpoints using download_strong_checkpoints.sh for the strong learners and download_stacking_checkpoints.sh for stacking.
Finally, run the following command to exclude bad wav files from the main track training set. The original data will be saved with a .bak suffix.

python remove_silenceWav.py --path_to_dataset path-to-dataset/phase1-main/
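
This README does not say how remove_silenceWav.py decides that a wav file is bad. As a rough illustration only, the sketch below shows one way a silent file could be flagged: compute the RMS amplitude of a 16-bit PCM file and compare it with a threshold. The function names (wav_rms, is_silent, write_wav), the threshold value, and the 16-bit assumption are all hypothetical and are not taken from the repository.

```python
import array
import math
import os
import sys
import tempfile
import wave


def wav_rms(path):
    """Return the RMS amplitude of a 16-bit PCM wav file (0.0 if empty)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(path, threshold=1.0):
    """Hypothetical criterion: an RMS below `threshold` counts as silent."""
    return wav_rms(path) < threshold


def write_wav(path, samples, rate=16000):
    """Helper for the demo: write mono 16-bit PCM samples to `path`."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())


# Demo on two synthetic files (made-up data, not challenge audio).
with tempfile.TemporaryDirectory() as tmp:
    quiet = os.path.join(tmp, "quiet.wav")
    loud = os.path.join(tmp, "loud.wav")
    write_wav(quiet, [0] * 1600)
    write_wav(loud, [1000, -1000] * 800)
    quiet_is_silent = is_silent(quiet)
    loud_is_silent = is_silent(loud)
    loud_rms = wav_rms(loud)
```

A check like this can be useful for inspecting a dataset before training, but the repository script remains the reference for which files are actually excluded.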

Model training

Our system predicts MOS with small errors by stacking strong and weak learners.

To run training and inference with a single strong learner, see strong/README.md.
To run stacking, see stacking/ensemble_multidomain_scripts/README.md.
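
For readers unfamiliar with stacking, here is a toy sketch of the general idea, not the UTMOS implementation. Two stand-in base learners (a one-feature linear fit and a mean predictor) produce out-of-fold predictions, and a meta-learner fits a single blend weight on those predictions by least squares. The learners, the scalar feature, the labels, and all function names are made up for illustration; the actual system uses fine-tuned SSL models and SSL-feature regressors as described in the abstract.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Stand-in base learner: fit y = a*x + b by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def fit_mean(xs, ys):
    """Stand-in base learner: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def out_of_fold(fit, xs, ys, k):
    """Predict each sample with a model trained on the other folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            preds[i] = model(xs[i])
    return preds


def fit_blend_weight(p1, p2, ys):
    """Meta-learner: w minimising sum((w*p1 + (1-w)*p2 - y)^2)."""
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5
    return sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys)) / den


def train_stack(xs, ys, k=3):
    """Fit the blend weight on out-of-fold predictions, then refit the
    base learners on all data and return the stacked predictor."""
    oof_a = out_of_fold(fit_linear, xs, ys, k)
    oof_b = out_of_fold(fit_mean, xs, ys, k)
    w = fit_blend_weight(oof_a, oof_b, ys)
    model_a, model_b = fit_linear(xs, ys), fit_mean(xs, ys)
    return lambda x: w * model_a(x) + (1 - w) * model_b(x)


# Toy data: a made-up scalar feature and made-up MOS-like labels.
features = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [1.2, 1.6, 2.0, 2.4, 2.8, 3.2, 3.6, 4.0, 4.4]
predict = train_stack(features, labels)
```

Fitting the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from rewarding a base learner that has simply memorised its training data.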

If you encounter any problems running the code, feel free to submit an issue. The code is not fully tested.

Comments

Bug when installing Fairseq in the Poetry environment

I am using Colab with Python 3.8.15. I always get this error when I try to install fairseq:

Building wheels for collected packages: ffmpy, future, python-multipart, fairseq
Building wheel for ffmpy (setup.py) ... done
Created wheel for ffmpy: filename=ffmpy-0.3.0-py3-none-any.whl size=4693 sha256=e964fb758c4a97846c3b56538ea665fc5536d48ec6109e691aac10e255c8e542
Stored in directory: /root/.cache/pip/wheels/c7/a7/3e/a6b4408a53b4de8176071a885ed909562c2e4e9422ef7622fe
Building wheel for future (setup.py) ... done
Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491058 sha256=8623669d05d3d4ee11fcffd479b6f6ac05aab5781d9052cfadeb985316b2ba43
Stored in directory: /root/.cache/pip/wheels/01/49/0c/4e0a697824c7bd6516afb22e1af9d51427ccd36c74b05a297e
Building wheel for python-multipart (setup.py) ... done
Created wheel for python-multipart: filename=python_multipart-0.0.5-py3-none-any.whl size=31671 sha256=446c48e412d72913da9539bc0f08abb900cac1222e21526de244bf8326261785
Stored in directory: /root/.cache/pip/wheels/bf/98/35/8ff0b7838d6311008ca83f447b67df38d2d40f55aedadaf332
error: subprocess-exited-with-error

× Building wheel for fairseq (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for fairseq (pyproject.toml) ... error
ERROR: Failed building wheel for fairseq
Successfully built ffmpy future python-multipart
Failed to build fairseq
ERROR: Could not build wheels for fairseq, which is required to install pyproject.toml-based projects

le --inp_path /path/to/wav/file.wav --out_path /path/to/csv/file.csv
2022-11-08 06:18:29 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
Traceback (most recent call last):
File "predict.py", line 86, in <module>
main()
File "predict.py", line 54, in main
assert args.inp_path.exists()
AssertionError

opened by YaseenEltahir 5

Bug on making fold files in stacking

Both https://github.com/sarulab-speech/UTMOS22/blob/756fdf28785386fbfdc082c3fef0f35ab1b59c3c/stacking/ensemble_multidomain_scripts/make_ensemble_dataset_wotest.py#L76 and https://github.com/sarulab-speech/UTMOS22/blob/756fdf28785386fbfdc082c3fef0f35ab1b59c3c/stacking/ensemble_multidomain_scripts/make_ensemble_testphase.py#L53 write "test-0.csv" to the same directory. This causes an error when training the stacking models.

opened by hyama5 2
