Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)


This repo implements several kinds of PPG-based VC models. Pretrained models are available. More models are on the way.

Notes:

  • The PPG model provided in conformer_ppg_model is based on a hybrid CTC-attention phoneme recognizer, trained on LibriSpeech (960 hours). PPGs have a frame shift of 10 ms and a dimensionality of 144. This model is very similar to the one used in this paper.

  • This repo uses HiFi-GAN V1 as the vocoder model; the sampling rate of the synthesized audio is 24 kHz.
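
To make the numbers in the notes concrete, here is a minimal sketch of the frame arithmetic implied by a 10 ms frame shift and 144-dimensional PPGs. The helper names are illustrative and not part of this repo, and the frame count is only approximate: the exact value depends on windowing and padding inside the PPG model, which are not described here.

```python
def samples_per_frame(sample_rate_hz: int, frame_shift_ms: int = 10) -> int:
    """Number of audio samples spanned by one frame shift."""
    return sample_rate_hz * frame_shift_ms // 1000


def approx_ppg_shape(num_samples: int, sample_rate_hz: int,
                     frame_shift_ms: int = 10, ppg_dim: int = 144) -> tuple:
    """Approximate (frames, dims) of a PPG matrix for a waveform.

    Assumption: one frame per full frame shift, ignoring any padding.
    """
    hop = samples_per_frame(sample_rate_hz, frame_shift_ms)
    return (num_samples // hop, ppg_dim)


# One 10 ms frame shift at 24 kHz covers 240 samples.
print(samples_per_frame(24000))        # 240
# A 3-second waveform at 24 kHz maps to roughly 300 frames of 144 dims.
print(approx_ppg_shape(72000, 24000))  # (300, 144)
```

Because the frame shift is defined in time rather than in samples, features extracted at different sampling rates can still line up frame by frame as long as they share the same 10 ms shift.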

Highlights

  • Any-to-many VC
  • Any-to-Any VC (a.k.a. few/one-shot VC)

How to use

Data preprocessing

  • Please run 1_compute_ctc_att_bnf.py to compute PPG features.
  • Please run 2_compute_f0.py to compute fundamental frequency.
  • Please run 3_compute_spk_dvecs.py to compute speaker d-vectors.

Training

  • Please refer to run.sh

Conversion

  • Please refer to test.sh

TODO

  • Upload pretrained models.

Citations

@ARTICLE{liu2021any,
  author={Liu, Songxiang and Cao, Yuewen and Wang, Disong and Wu, Xixin and Liu, Xunying and Meng, Helen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling}, 
  year={2021},
  volume={29},
  number={},
  pages={1717-1728},
  doi={10.1109/TASLP.2021.3076867}
}

@inproceedings{Liu2018,
  author={Songxiang Liu and Jinghua Zhong and Lifa Sun and Xixin Wu and Xunying Liu and Helen Meng},
  title={Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={496--500},
  doi={10.21437/Interspeech.2018-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1504}
}
Comments
  • question about the training of encoder-decoder

    Hi, the paper mentions an MSE loss between the predicted mel-spectrogram and the ground-truth mel-spectrogram. I am wondering whether the following example is correct. A, our source speaker, has an audio clip saying "12345". B, our target speaker, also has an audio clip saying "12345", plus some other clips. During training, A's "12345" is converted to B's voice using any one of B's audio clips. The output is then compared with B's "12345" to compute the MSE loss.

    opened by jardnzm 8
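
For reference, the MSE loss mentioned in the question is the mean of the squared differences between predicted and ground-truth mel-spectrogram values. The sketch below only illustrates that formula in plain Python; the function name mel_mse and the toy values are hypothetical, and it says nothing about which utterance pairs this repo actually uses during training.

```python
def mel_mse(predicted, target):
    """Mean squared error between two equal-shaped mel-spectrograms.

    Each spectrogram is a list of frames; each frame is a list of floats.
    """
    if len(predicted) != len(target):
        raise ValueError("spectrograms must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same number of mel bins")
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("spectrograms must not be empty")
    return total / count


# Toy 2-frame, 2-bin example: only one value differs, by 2.0.
pred = [[0.0, 1.0], [2.0, 3.0]]
truth = [[0.0, 1.0], [2.0, 5.0]]
print(mel_mse(pred, truth))  # 1.0
```

Note that the loss requires both spectrograms to have the same number of frames, which is why the predicted and ground-truth utterances must be time-aligned.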
  • ModuleNotFoundError: No module named 'nnsp.layers'

    I ran pip install nnsp, which reported a successful installation, but when I run test.sh it fails because nnsp.layers cannot be found. The error output is:

    Traceback (most recent call last):
      File "convert_from_wav.py", line 11, in <module>
        from src.mel_decoder_mol_encAddlf0 import MelDecoderMOL
      File "/code/ppg-vc-main/src/__init__.py", line 4, in <module>
        from .mel_decoder_lsa import MelDecoderLSA
      File "/code/ppg-vc-main/src/mel_decoder_lsa.py", line 20, in <module>
        from .rnn_decoder_lsa import Decoder
      File "/code/ppg-vc-main/src/rnn_decoder_lsa.py", line 4, in <module>
        from .lsa_attention import LocationSensitiveAttention
      File "/code/ppg-vc-main/src/lsa_attention.py", line 4, in <module>
        from nnsp.layers.basic_layers import Conv1d, Linear
    ModuleNotFoundError: No module named 'nnsp.layers'

    What is causing this?

    opened by dslllu 1
  • Bump pyyaml from 5.3.1 to 5.4

    Bumps pyyaml from 5.3.1 to 5.4.

    Changelog

    Sourced from pyyaml's changelog.

    5.4 (2021-01-19)

    Commits
    • 58d0cb7 5.4 release
    • a60f7a1 Fix compatibility with Jython
    • ee98abd Run CI on PR base branch changes
    • ddf2033 constructor.timezone: _copy & deepcopy
    • fc914d5 Avoid repeatedly appending to yaml_implicit_resolvers
    • a001f27 Fix for CVE-2020-14343
    • fe15062 Add 3.9 to appveyor file for completeness sake
    • 1e1c7fb Add a newline character to end of pyproject.toml
    • 0b6b7d6 Start sentences and phrases for capital letters
    • c976915 Shell code improvements
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 1
  • Unable to run convert_from_wav.py; Module nnsp is missing a submodule?

    (base) F:\ppg-vc>python convert_from_wav.py --ppg2mel_model_train_config .\bilstm-vctk-libritts460-oneshot\bilstm_ppg2mel_vctk_libri_oneshotvc.yaml --ppg2mel_model_file .\bilstm-vctk-libritts460-oneshot\step_250000.pth --src_wav_dir F:\p5_characters\Ann\audio --ref_wav_path F:\p5_characters\Akechi\audio\ve370_002_00083.wav -o .\output_ann_to_akechi
    Traceback (most recent call last):
      File "convert_from_wav.py", line 11, in <module>
        from src.mel_decoder_mol_encAddlf0 import MelDecoderMOL
      File "F:\ppg-vc\src\__init__.py", line 4, in <module>
        from .mel_decoder_lsa import MelDecoderLSA
      File "F:\ppg-vc\src\mel_decoder_lsa.py", line 20, in <module>
        from .rnn_decoder_lsa import Decoder
      File "F:\ppg-vc\src\rnn_decoder_lsa.py", line 5, in <module>
        from nnsp.ctc_seq2seq_ppg_vc.lsa_attention import LocationSensitiveAttention
    ModuleNotFoundError: No module named 'nnsp.ctc_seq2seq_ppg_vc'

    I'm on Python 3.8, latest version of nnsp is installed via pip.

    opened by Iamgoofball 1
  • ZeroDivisionError: float division by zero

    I'm trying to run test.sh but keep getting the same error; it seems I can't get the right utterances. Can anyone kindly give me some advice?

    Experiment name: seq2seq_mol_ppg2mel_vctk_libri_oneshotvc_r4_normMel_v2
    Load PPG-model, PPG2Mel-model, Vocoder-model...
    Removing weight norm...
    Loaded the voice encoder model on cuda in 0.02 seconds.
    Number of source utterances: 0.
    0it [00:00, ?it/s]
    RTF:
    Traceback (most recent call last):
      File "convert_from_wav.py", line 216, in <module>
        main()
      File "convert_from_wav.py", line 212, in main
        convert(args)
      File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
        return func(*args, **kwargs)
      File "convert_from_wav.py", line 167, in convert
        print(total_rtf / cnt)
    ZeroDivisionError: float division by zero
    
    opened by johnny7861532 0
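
The log above reports "Number of source utterances: 0.", so the counter is still zero when total_rtf / cnt is evaluated, which is what raises the ZeroDivisionError. The sketch below shows one way such an average could be guarded. It is not the repo's code: average_rtf is a hypothetical helper, and only the names total_rtf and cnt are taken from the traceback.

```python
from typing import Optional


def average_rtf(total_rtf: float, cnt: int) -> Optional[float]:
    """Return the mean real-time factor, or None if nothing was converted."""
    if cnt == 0:
        return None
    return total_rtf / cnt


# Reproduce the situation from the log: no utterances were processed.
mean_rtf = average_rtf(total_rtf=0.0, cnt=0)
if mean_rtf is None:
    print("No source utterances were found; check the source wav directory.")
else:
    print(f"RTF: {mean_rtf:.3f}")
```

With a guard like this the script would report the empty input directly, pointing to the real problem: the source directory passed to the script did not yield any utterances.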
  • Why is sampling rate not consistent for different feature extraction?

    Thanks for your work first of all. I've found the sampling rate set to 16k in bnf and spk embeddings and 24k for f0 and mel computation during training. May I know what is the purpose?

    opened by Quadcore1010 0
  • Bump numpy from 1.19.2 to 1.22.0

    Bumps numpy from 1.19.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • missing files

    What are /home/shaunxliu/data/vctk/fidlists/train_fidlist.new, /home/shaunxliu/data/vctk/fidlists/dev_fidlist.new, and /home/shaunxliu/data/vctk/fidlists/eval_fidlist.txt?

    How can I create such files for my own dataset?

    opened by danielkrisp 2
Owner
Liu Songxiang
Spoken language processing