UT-Sarulab MOS prediction system using SSL models

Overview

UTMOS: UTokyo-SaruLab MOS Prediction System

Official implementation of "UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022" submitted to INTERSPEECH 2022.

Abstract:
We present the UTokyo-SaruLab mean opinion score (MOS) prediction system submitted to VoiceMOS Challenge 2022. The challenge is to predict the MOS values of speech samples collected from previous Blizzard Challenges and Voice Conversion Challenges for two tracks: a main track for in-domain prediction and an out-of-domain (OOD) track for which there is less labeled data from different listening tests. Our system is based on ensemble learning of strong and weak learners. Strong learners incorporate several improvements to the previous fine-tuning models of self-supervised learning (SSL) models, while weak learners use basic machine-learning methods to predict scores from SSL features. In the Challenge, our system had the highest score on several metrics for both the main and OOD tracks. In addition, we conducted ablation studies to investigate the effectiveness of our proposed methods.
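The abstract says the weak learners predict scores from SSL features using basic machine-learning methods. Below is a minimal sketch of that idea. It assumes each utterance's SSL features arrive as a list of frame vectors, which are mean-pooled into one vector and passed to ridge regression. The pooling, the choice of ridge regression, the toy data, and every function name are illustrative assumptions, not the repository's code. A real setup would take its features from a pretrained SSL model.

```python
# Illustrative sketch: mean pooling + ridge regression are ASSUMED stand-ins
# for the weak learners' "basic machine-learning methods"; not UTMOS code.


def mean_pool(frames):
    """Average a list of equal-length frame vectors into one utterance vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(xs, ys, lam=1e-3):
    """Return weights w (last entry = bias) minimizing ||Xw - y||^2 + lam * ||w[:-1]||^2."""
    rows = [list(x) + [1.0] for x in xs]  # append a bias column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d - 1):  # leave the bias term unregularized
        xtx[i][i] += lam
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, x):
    """Apply the fitted linear model (with bias) to one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


# Toy data: each "utterance" is a few 2-D frame vectors with a MOS label.
utterances = [
    [[0.1, 0.2], [0.3, 0.0]],
    [[0.9, 0.8], [0.7, 1.0]],
    [[0.5, 0.4], [0.5, 0.6]],
    [[0.2, 0.9], [0.4, 0.7]],
]
mos = [1.5, 4.5, 3.0, 3.2]
features = [mean_pool(u) for u in utterances]
w = fit_ridge(features, mos)
print([round(predict(w, f), 2) for f in features])
```

Ridge regression was chosen only because its closed-form solution fits in a few lines of standard-library Python. Any regressor that maps a fixed-length vector to a score could play the same role.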

A demo of UTMOS is available on Hugging Face Spaces.

How to use

Environment setup

  1. This repo uses Poetry as the Python environment manager. Install Poetry first by following its installation instructions.
  2. Install the required Python packages with poetry install, then enter the Python environment with poetry shell. All of the following steps must be run inside the Poetry shell environment.
  3. Download the required fairseq checkpoints using download_strong_checkpoints.sh for the strong learners and download_stacking_checkpoints.sh for stacking.
  4. Run the following command to exclude bad WAV files from the main-track training set. The original data is saved with a .bak suffix.
python remove_silenceWav.py --path_to_dataset path-to-dataset/phase1-main/
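Step 4 edits the training data in place and keeps the original under a .bak suffix. The backup-then-filter pattern behind that step can be sketched as follows. The sketch assumes a hypothetical CSV list file whose first column is a WAV file name. The function exclude_wavs, the file layout, and the example names are illustrative only; this is not the logic of remove_silenceWav.py.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def exclude_wavs(list_path, bad_names):
    """Hypothetical helper (not the actual remove_silenceWav.py logic).

    Copies list_path to <name>.bak, then rewrites list_path without rows whose
    first column (assumed to be a WAV file name) appears in bad_names.
    Returns the number of rows kept.
    """
    list_path = Path(list_path)
    backup = list_path.with_name(list_path.name + ".bak")
    shutil.copy2(list_path, backup)  # preserve the original data

    with backup.open(newline="") as src:
        rows = [row for row in csv.reader(src) if row and row[0] not in bad_names]

    with list_path.open("w", newline="") as dst:
        csv.writer(dst, lineterminator="\n").writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file names.
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "train_list.csv"
        p.write_text("a.wav,3.0\nsilent.wav,1.0\nb.wav,4.2\n")
        print(exclude_wavs(p, {"silent.wav"}), repr(p.read_text()))
```

Writing the backup before touching the original means a failed run can always be undone by restoring the .bak file.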

Model training

Our system predicts MOS with small errors by stacking strong and weak learners.
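Stacking combines learners by training a meta-learner on their predictions. A common way to do this is to fit the meta-learner on out-of-fold predictions, so it only sees a base learner's output for samples that learner was not trained on. This is assumed here and not confirmed as this repository's exact procedure. In the sketch below, a global-mean predictor and a one-feature least-squares line are hypothetical stand-ins for the strong and weak learners. The meta-learner is a single blending weight fitted by least squares.

```python
from statistics import mean

# Hypothetical stand-ins for the strong/weak learners; not the UTMOS models.


def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base learner 2: one-feature least-squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def out_of_fold(fit, xs, ys, k):
    """Predict each sample with a model trained on the other k-1 folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):  # samples held out in this fold
            preds[i] = model(xs[i])
    return preds


def fit_blend(p1, p2, ys):
    """Meta-learner: alpha in [0, 1] minimizing sum((alpha*p1 + (1-alpha)*p2 - y)^2)."""
    num = sum((y - b) * (a - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return min(1.0, max(0.0, num / den)) if den else 0.5


xs = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.5]  # toy 1-D "features"
ys = [1.8, 2.9, 2.7, 4.3, 3.6, 4.5, 2.2, 3.1]   # toy MOS labels
k = 4
alpha = fit_blend(out_of_fold(fit_mean, xs, ys, k), out_of_fold(fit_line, xs, ys, k), ys)

# Refit the base learners on all data and blend them with the learned weight.
m_mean, m_line = fit_mean(xs, ys), fit_line(xs, ys)


def stacked(x):
    return alpha * m_mean(x) + (1 - alpha) * m_line(x)


print(round(alpha, 3), round(stacked(0.7), 2))
```

Using out-of-fold rather than in-sample predictions keeps the meta-learner from over-trusting a base learner that has memorized its training data.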

If you encounter any problems running the code, feel free to open an issue. Note that the code is not fully tested.

Comments
  • Bug when installing Fairseq in Poetry environment

    I am using Colab with Python 3.8.15. I always get this error when I try to install fairseq:

    Building wheels for collected packages: ffmpy, future, python-multipart, fairseq
      Building wheel for ffmpy (setup.py) ... done
      Created wheel for ffmpy: filename=ffmpy-0.3.0-py3-none-any.whl size=4693 sha256=e964fb758c4a97846c3b56538ea665fc5536d48ec6109e691aac10e255c8e542
      Stored in directory: /root/.cache/pip/wheels/c7/a7/3e/a6b4408a53b4de8176071a885ed909562c2e4e9422ef7622fe
      Building wheel for future (setup.py) ... done
      Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491058 sha256=8623669d05d3d4ee11fcffd479b6f6ac05aab5781d9052cfadeb985316b2ba43
      Stored in directory: /root/.cache/pip/wheels/01/49/0c/4e0a697824c7bd6516afb22e1af9d51427ccd36c74b05a297e
      Building wheel for python-multipart (setup.py) ... done
      Created wheel for python-multipart: filename=python_multipart-0.0.5-py3-none-any.whl size=31671 sha256=446c48e412d72913da9539bc0f08abb900cac1222e21526de244bf8326261785
      Stored in directory: /root/.cache/pip/wheels/bf/98/35/8ff0b7838d6311008ca83f447b67df38d2d40f55aedadaf332
    error: subprocess-exited-with-error

    × Building wheel for fairseq (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> See above for output.

    note: This error originates from a subprocess, and is likely not a problem with pip.
      Building wheel for fairseq (pyproject.toml) ... error
      ERROR: Failed building wheel for fairseq
    Successfully built ffmpy future python-multipart
    Failed to build fairseq
    ERROR: Could not build wheels for fairseq, which is required to install pyproject.toml-based projects

    le --inp_path /path/to/wav/file.wav --out_path /path/to/csv/file.csv
    2022-11-08 06:18:29 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
    Traceback (most recent call last):
      File "predict.py", line 86, in <module>
        main()
      File "predict.py", line 54, in main
        assert args.inp_path.exists()
    AssertionError

    opened by YaseenEltahir 5
  • Bug on making fold files in stacking

    Both https://github.com/sarulab-speech/UTMOS22/blob/756fdf28785386fbfdc082c3fef0f35ab1b59c3c/stacking/ensemble_multidomain_scripts/make_ensemble_dataset_wotest.py#L76 and https://github.com/sarulab-speech/UTMOS22/blob/756fdf28785386fbfdc082c3fef0f35ab1b59c3c/stacking/ensemble_multidomain_scripts/make_ensemble_testphase.py#L53 write "test-0.csv" to the same directory. This causes an error when training the stacking model.

    opened by hyama5 2
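The stacking issue above is a path collision. Two scripts write test-0.csv into the same directory, so one output can clobber the other. One way to rule this out is to build output paths through a helper that gives each phase its own subdirectory and refuses to overwrite an existing file. This sketch assumes each script can be given its own phase-specific subdirectory. The helper name fold_csv_path and the phase labels "wotest" and "testphase" are hypothetical, not names from the repository.

```python
import tempfile
from pathlib import Path


def fold_csv_path(out_dir, phase, split, fold, overwrite=False):
    """Hypothetical helper: return <out_dir>/<phase>/<split>-<fold>.csv.

    Keeping each phase in its own subdirectory means two scripts that both
    write "test-0.csv" no longer land on the same file. Unless overwrite=True,
    an existing file raises FileExistsError instead of being silently replaced.
    """
    path = Path(out_dir) / phase / f"{split}-{fold}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists() and not overwrite:
        raise FileExistsError(f"refusing to overwrite {path}")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = fold_csv_path(d, "wotest", "test", 0)
        b = fold_csv_path(d, "testphase", "test", 0)
        print(a != b)  # the two scripts now get distinct files
```

Failing loudly on an existing file trades a little convenience for catching this class of bug at the moment it happens, rather than later during stacking training.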
Owner
sarulab-speech
Speech group, Saruwatari-Koyama Lab, the University of Tokyo, Japan.