Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

Overview

MediumVC

MediumVC is an utterance-level method for any-to-any VC. Before it, we propose SingleVC to perform the any-to-one (A2O) task (Xi → Ŷi, where Xi is utterance i spoken by speaker X). The converted utterances Ŷi serve as the SSIF (synthetic specific-speaker speeches used as intermedium features). To build SingleVC, we employ a novel data-augmentation strategy, pitch-shifted and duration-remained (PSDR), to produce paired asymmetrical training data. Then, based on the pre-trained SingleVC, MediumVC performs an asymmetrical reconstruction task (Ŷi → X̂i). Owing to this asymmetrical reconstruction mode, MediumVC achieves more efficient feature decoupling and fusion. Experiments demonstrate that MediumVC is strongly robust to unseen speakers across multiple public datasets. This is the official implementation of the paper, MediumVC.
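
For intuition, here is a minimal sketch of PSDR-style augmentation (pitch is shifted while duration is preserved), assuming librosa and soundfile; the file names and semitone shifts are illustrative, and the exact shift range used for training is not specified here.

import librosa
import soundfile as sf

# Load a source utterance at the 22.05 kHz rate used throughout this repo.
wav, sr = librosa.load("source.wav", sr=22050)

for n_steps in (-3, 3):  # illustrative semitone shifts
    # Pitch changes, duration stays the same: the shifted copy and the original
    # form a content-matched but pitch-mismatched (asymmetrical) training pair.
    shifted = librosa.effects.pitch_shift(wav, sr=sr, n_steps=n_steps)
    sf.write(f"source_shift{n_steps:+d}.wav", shifted, sr)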

The figure below shows the overall model architecture.

Model architecture

For audio samples, please refer to our demo page. More converted speeches can be found in "Demo/ConvertedSpeeches/".

Envs

You can install the dependencies with

pip install -r requirements.txt

Speaker Encoder

Dvector is a robust speaker verification (SV) system pre-trained on VoxCeleb1 with the GE2E loss; it produces 256-dimensional speaker embeddings. In our evaluation on multiple datasets (VCTK with 30,000 pairs, LibriSpeech with 30,000 pairs, and VCC2020 with 10,000 pairs), the equal error rates (EERs) and thresholds (THRs) are listed in the table below. Dvector with these THRs is also used to compute the speaker verification accuracy (ACC) of pairs produced by MediumVC and the contrast methods for objective evaluation. More details are available in the paper.

Dataset     | VCTK       | LibriSpeech | VCC2020
EER(%)/THR  | 7.71/0.462 | 7.95/0.337  | 1.06/0.432
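
As a rough illustration, the pretrained Dvector can score a converted/target pair against the thresholds above. This sketch assumes the JIT checkpoints from the dvector project (wav2mel.pt, dvector-step250000.pt) at the paths used elsewhere in this README; the wav file names are illustrative.

import torch
import torchaudio
import torch.nn.functional as F

wav2mel = torch.jit.load("Any2Any/model/dvector/pre_model/wav2mel.pt")
dvector = torch.jit.load("Any2Any/model/dvector/pre_model/dvector-step250000.pt").eval()

def embed(path):
    wav, sr = torchaudio.load(path)
    with torch.no_grad():
        return dvector.embed_utterance(wav2mel(wav, sr))  # 256-dim speaker embedding

score = F.cosine_similarity(embed("converted.wav"), embed("target_reference.wav"), dim=-1).item()
print("same speaker" if score > 0.462 else "different speaker")  # 0.462 = VCTK THR from the table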

Vocoder

The HiFi-GAN vocoder is employed to convert log mel-spectrograms to waveforms. The model has 13.93M parameters and is trained on universal datasets. In our evaluation, it can synthesize 22.05 kHz high-fidelity speech with a MOS above 4.0, even in cross-language or noisy conditions.
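
A minimal sketch of mel-to-waveform synthesis with the pretrained universal HiFi-GAN checkpoint, assuming the module layout of the official HiFi-GAN code (models.Generator, env.AttrDict); in this repo the vocoder code lives under "hifivoice/", so the imports may need adjusting, and the log-mel input below is a placeholder.

import json
import torch
import soundfile as sf
from env import AttrDict          # from the HiFi-GAN code
from models import Generator      # from the HiFi-GAN code

with open("hifivoice/pretrained/UNIVERSAL_V1/config.json") as f:
    h = AttrDict(json.load(f))

generator = Generator(h)
state = torch.load("hifivoice/pretrained/UNIVERSAL_V1/g_02500000", map_location="cpu")
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()

log_mel = torch.randn(1, 80, 200)          # placeholder: (batch, num_mels, frames)
with torch.no_grad():
    audio = generator(log_mel).squeeze()   # waveform in [-1, 1]
sf.write("synthesized.wav", audio.numpy(), 22050)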

Infer

You can download the pretrained model and then edit "Any2Any/infer/infer_config.yaml". Test samples should be organized as "wav22050/$figure$/*.wav".

python Any2Any/infer/infer.py
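
For example, a test-sample folder following the "wav22050/$figure$/*.wav" pattern might look like this (speaker and file names are illustrative; each $figure$ subdirectory holds one speaker's 22.05 kHz utterances):

wav22050/
    speaker_A/
        utt_001.wav
        utt_002.wav
    speaker_B/
        utt_001.wav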

Train from scratch

Preprocessing

The corpus should be organized as "VCTK22050/$figure$/*.wav"; then edit the config file "Any2Any/pre_feature/preprocess_config.yaml". The output "spk_emb_mel_label.pkl" will be used for training.

python Any2Any/pre_feature/figure_spkemb_mel.py
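
Roughly, preprocessing pairs each utterance with a speaker embedding and a log mel-spectrogram. The sketch below is an assumption about those per-utterance features (not the exact script), using the pretrained wav2mel/dvector models and the mel settings printed in the inference config (22050 Hz, 80 mels, n_fft 1024, hop 256, fmin 0, fmax 8000); the paths and the pickle layout are illustrative.

import pickle
import librosa
import numpy as np
import torch

wav2mel = torch.jit.load("Any2Any/model/dvector/pre_model/wav2mel.pt")
dvector = torch.jit.load("Any2Any/model/dvector/pre_model/dvector-step250000.pt").eval()

def utterance_features(path, speaker_label):
    wav, sr = librosa.load(path, sr=22050)
    with torch.no_grad():  # 256-dim speaker embedding from the pretrained Dvector
        spk_emb = dvector.embed_utterance(wav2mel(torch.from_numpy(wav).unsqueeze(0), sr))
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=1024, hop_length=256, win_length=1024,
        n_mels=80, fmin=0, fmax=8000)
    log_mel = np.log(np.clip(mel, 1e-5, None))  # 80-bin log mel-spectrogram
    return spk_emb.numpy(), log_mel.T, speaker_label

features = [utterance_features("VCTK22050/p225/p225_001.wav", 0)]  # illustrative entry
with open("spk_emb_mel_label.pkl", "wb") as f:
    pickle.dump(features, f)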

Training

Please first edit the paths of the pretrained HiFi-GAN model, wav2mel, dvector, and SingleVC in the config file "Any2Any/config.yaml".

python Any2Any/solver.py
Comments
  • Inference Error

    Hello, when I try to run the inference code, I get the following error:

      File "Any2Any/infer/infer.py", line 11, in <module>
        from Any2Any import util
    ModuleNotFoundError: No module named 'Any2Any'

    I get this error when running on Google Colab and locally on Windows. I believe this means the code doesn't recognize the Any2Any folder as a package, which should be solved when __init__.py exists in the directory. Unfortunately, it still gives the error even when __init__.py exists.

    opened by AhmedHashish123 6
  • RuntimeError: Error(s) in loading state_dict for MagicModel:

    Hello, I have a problem when I try to run the inference code with the pretrained model; I get the following error:

    【Solver】
    *********  [load]   ***********
    01/28 07:21:45 PM (Elapsed: 00:00:03) loading the model from /content/MediumVC/Any2Any/model/checkpoint-3000.pt
    Traceback (most recent call last):
      File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/content/MediumVC/Any2Any/infer/infer.py", line 95, in <module>
        solver = Solver(config)
      File "/content/MediumVC/Any2Any/infer/infer.py", line 28, in __init__
        self.resume_model(self.config['resume_path'])
      File "/content/MediumVC/Any2Any/infer/infer.py", line 56, in resume_model
        self.Generator.load_state_dict(checkpoint['Generator'])
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for MagicModel:
    	Missing key(s) in state_dict: "any2one.encoder.pre_block.0.conv_block1.conv_block.conv0.bias", "any2one.encoder.pre_block.0.conv_block1.conv_block.conv0.weight", "any2one.encoder.pre_block.0.conv_block2.conv_block.conv0.bias", "any2one.encoder.pre_block.0.conv_block2.conv_block.conv0.weight", "any2one.encoder.pre_block.0.adjust_dim_layer.bias", "any2one.encoder.pre_block.0.adjust_dim_layer.weight", "any2one.encoder.pre_block.1.conv_block1.conv_block.conv0.bias", "any2one.encoder.pre_block.1.conv_block1.conv_block.conv0.weight", "any2one.encoder.pre_block.1.conv_block2.conv_block.conv0.bias", "any2one.encoder.pre_block.1.conv_block2.conv_block.conv0.weight", "any2one.encoder.pre_block.1.adjust_dim_layer.bias", "any2one.encoder.pre_block.1.adjust_dim_layer.weight", "any2one.encoder.pre_block.2.conv_block1.conv_block.conv0.bias", "any2one.encoder.pre_block.2.conv_block1.conv_block.conv0.weight", "any2one.encoder.pre_block.2.conv_block2.conv_block.conv0.bias", "any2one.encoder.pre_block.2.conv_block2.conv_block.conv0.weight", "any2one.encoder.pre_block.2.adjust_dim_layer.bias", "any2one.encoder.pre_block.2.adjust_dim_layer.weight", "any2one.encoder.post_block.0.cross_attn.in_proj_weight", "any2one.encoder.post_block.0.cross_attn.in_proj_bias", "any2one.encoder.post_block.0.cross_attn.out_proj.weight", "any2one.encoder.post_block.0.cross_attn.out_proj.bias", "any2one.decoder.pre_conv_block.0.conv_block1.conv_block.conv0.bias", "any2one.decoder.pre_conv_block.0.conv_block1.conv_block.conv0.weight", "any2one.decoder.pre_conv_block.0.conv_block2.conv_block.conv0.bias", "any2one.decoder.pre_conv_block.0.conv_block2.conv_block.conv0.weight", "any2one.decoder.pre_conv_block.0.adjust_dim_layer.bias", "any2one.decoder.pre_conv_block.0.adjust_dim_layer.weight", "any2one.decoder.pre_attention_block.0.cross_attn.in_proj_weight", "any2one.decoder.pre_attention_block.0.cross_attn.in_proj_bias", "any2one.decoder.pre_attention_block.0.cross_attn.out_proj.weight", "any2one.decoder.pre_attention_block.0.cross_attn.out_proj.bias", "any2one.decoder.mel_linear1.weight", "any2one.decoder.mel_linear1.bias", "any2one.decoder.mel_linear2.weight", "any2one.decoder.mel_linear2.bias", "any2one.decoder.smoothers.0.self_attn.in_proj_weight", "any2one.decoder.smoothers.0.self_attn.in_proj_bias", "any2one.decoder.smoothers.0.self_attn.out_proj.weight", "any2one.decoder.smoothers.0.self_attn.out_proj.bias", "any2one.decoder.smoothers.0.conv0.bias", "any2one.decoder.smoothers.0.conv0.weight", "any2one.decoder.smoothers.0.conv1.bias", "any2one.decoder.smoothers.0.conv1.weight", "any2one.decoder.smoothers.1.self_attn.in_proj_weight", "any2one.decoder.smoothers.1.self_attn.in_proj_bias", "any2one.decoder.smoothers.1.self_attn.out_proj.weight", "any2one.decoder.smoothers.1.self_attn.out_proj.bias", "any2one.decoder.smoothers.1.conv0.bias", "any2one.decoder.smoothers.1.conv0.weight", "any2one.decoder.smoothers.1.conv1.bias", "any2one.decoder.smoothers.1.conv1.weight", "any2one.decoder.smoothers.2.self_attn.in_proj_weight", "any2one.decoder.smoothers.2.self_attn.in_proj_bias", "any2one.decoder.smoothers.2.self_attn.out_proj.weight", "any2one.decoder.smoothers.2.self_attn.out_proj.bias", "any2one.decoder.smoothers.2.conv0.bias", "any2one.decoder.smoothers.2.conv0.weight", "any2one.decoder.smoothers.2.conv1.bias", "any2one.decoder.smoothers.2.conv1.weight", "any2one.decoder.post_block.0.conv_block1.conv_block.conv0.bias", "any2one.decoder.post_block.0.conv_block1.conv_block.conv0.weight", 
"any2one.decoder.post_block.0.conv_block2.conv_block.conv0.bias", "any2one.decoder.post_block.0.conv_block2.conv_block.conv0.weight", "any2one.decoder.post_block.0.adjust_dim_layer.bias", "any2one.decoder.post_block.0.adjust_dim_layer.weight", "any2one.decoder.post_block.1.conv_block1.conv_block.conv0.bias", "any2one.decoder.post_block.1.conv_block1.conv_block.conv0.weight", "any2one.decoder.post_block.1.conv_block2.conv_block.conv0.bias", "any2one.decoder.post_block.1.conv_block2.conv_block.conv0.weight", "any2one.decoder.post_block.1.adjust_dim_layer.bias", "any2one.decoder.post_block.1.adjust_dim_layer.weight", "any2one.decoder.post_block.2.conv_block1.conv_block.conv0.bias", "any2one.decoder.post_block.2.conv_block1.conv_block.conv0.weight", "any2one.decoder.post_block.2.conv_block2.conv_block.conv0.bias", "any2one.decoder.post_block.2.conv_block2.conv_block.conv0.weight", "any2one.decoder.post_block.2.adjust_dim_layer.bias", "any2one.decoder.post_block.2.adjust_dim_layer.weight", "any2one.decoder.post_block.3.conv_block1.conv_block.conv0.bias", "any2one.decoder.post_block.3.conv_block1.conv_block.conv0.weight", "any2one.decoder.post_block.3.conv_block2.conv_block.conv0.bias", "any2one.decoder.post_block.3.conv_block2.conv_block.conv0.weight", "any2one.decoder.post_block.3.adjust_dim_layer.bias", "any2one.decoder.post_block.3.adjust_dim_layer.weight", "cont_encoder.conv_block0.0.conv_block1.conv_block.conv0.bias", "cont_encoder.conv_block0.0.conv_block1.conv_block.conv0.weight_g", "cont_encoder.conv_block0.0.conv_block1.conv_block.conv0.weight_v", "cont_encoder.conv_block0.0.conv_block2.conv_block.conv0.bias", "cont_encoder.conv_block0.0.conv_block2.conv_block.conv0.weight_g", "cont_encoder.conv_block0.0.conv_block2.conv_block.conv0.weight_v", "cont_encoder.conv_block0.0.adjust_dim_layer.bias", "cont_encoder.conv_block0.0.adjust_dim_layer.weight_g", "cont_encoder.conv_block0.0.adjust_dim_layer.weight_v", "cont_encoder.conv_block0.1.conv_block1.conv_block.conv0.bias", "cont_encoder.conv_block0.1.conv_block1.conv_block.conv0.weight_g", "cont_encoder.conv_block0.1.conv_block1.conv_block.conv0.weight_v", "cont_encoder.conv_block0.1.conv_block2.conv_block.conv0.bias", "cont_encoder.conv_block0.1.conv_block2.conv_block.conv0.weight_g", "cont_encoder.conv_block0.1.conv_block2.conv_block.conv0.weight_v", "cont_encoder.conv_block0.1.adjust_dim_layer.bias", "cont_encoder.conv_block0.1.adjust_dim_layer.weight_g", "cont_encoder.conv_block0.1.adjust_dim_layer.weight_v", "cont_encoder.attention_norm0.0.cross_attn.in_proj_weight", "cont_encoder.attention_norm0.0.cross_attn.in_proj_bias", "cont_encoder.attention_norm0.0.cross_attn.out_proj.weight", "cont_encoder.attention_norm0.0.cross_attn.out_proj.bias", "cont_encoder.conv_block1.0.conv_block1.conv_block.conv0.bias", "cont_encoder.conv_block1.0.conv_block1.conv_block.conv0.weight_g", "cont_encoder.conv_block1.0.conv_block1.conv_block.conv0.weight_v", "cont_encoder.conv_block1.0.conv_block2.conv_block.conv0.bias", "cont_encoder.conv_block1.0.conv_block2.conv_block.conv0.weight_g", "cont_encoder.conv_block1.0.conv_block2.conv_block.conv0.weight_v", "cont_encoder.conv_block1.0.adjust_dim_layer.bias", "cont_encoder.conv_block1.0.adjust_dim_layer.weight_g", "cont_encoder.conv_block1.0.adjust_dim_layer.weight_v", "cont_encoder.conv_block1.1.conv_block1.conv_block.conv0.bias", "cont_encoder.conv_block1.1.conv_block1.conv_block.conv0.weight_g", "cont_encoder.conv_block1.1.conv_block1.conv_block.conv0.weight_v", 
"cont_encoder.conv_block1.1.conv_block2.conv_block.conv0.bias", "cont_encoder.conv_block1.1.conv_block2.conv_block.conv0.weight_g", "cont_encoder.conv_block1.1.conv_block2.conv_block.conv0.weight_v", "cont_encoder.conv_block1.1.adjust_dim_layer.bias", "cont_encoder.conv_block1.1.adjust_dim_layer.weight_g", "cont_encoder.conv_block1.1.adjust_dim_layer.weight_v", "cont_encoder.attention_norm1.0.cross_attn.in_proj_weight", "cont_encoder.attention_norm1.0.cross_attn.in_proj_bias", "cont_encoder.attention_norm1.0.cross_attn.out_proj.weight", "cont_encoder.attention_norm1.0.cross_attn.out_proj.bias", "generator.pre_block0.0.conv_block1.conv_block.conv0.bias", "generator.pre_block0.0.conv_block1.conv_block.conv0.weight_g", "generator.pre_block0.0.conv_block1.conv_block.conv0.weight_v", "generator.pre_block0.0.conv_block2.conv_block.conv0.bias", "generator.pre_block0.0.conv_block2.conv_block.conv0.weight_g", "generator.pre_block0.0.conv_block2.conv_block.conv0.weight_v", "generator.pre_block0.0.adjust_dim_layer.bias", "generator.pre_block0.0.adjust_dim_layer.weight_g", "generator.pre_block0.0.adjust_dim_layer.weight_v", "generator.pre_block0.1.conv_block1.conv_block.conv0.bias", "generator.pre_block0.1.conv_block1.conv_block.conv0.weight_g", "generator.pre_block0.1.conv_block1.conv_block.conv0.weight_v", "generator.pre_block0.1.conv_block2.conv_block.conv0.bias", "generator.pre_block0.1.conv_block2.conv_block.conv0.weight_g", "generator.pre_block0.1.conv_block2.conv_block.conv0.weight_v", "generator.pre_block0.1.adjust_dim_layer.bias", "generator.pre_block0.1.adjust_dim_layer.weight_g", "generator.pre_block0.1.adjust_dim_layer.weight_v", "generator.attention0.cross_attn.in_proj_weight", "generator.attention0.cross_attn.in_proj_bias", "generator.attention0.cross_attn.out_proj.weight", "generator.attention0.cross_attn.out_proj.bias", "generator.pre_block1.0.conv_block1.conv_block.conv0.bias", "generator.pre_block1.0.conv_block1.conv_block.conv0.weight_g", "generator.pre_block1.0.conv_block1.conv_block.conv0.weight_v", "generator.pre_block1.0.conv_block2.conv_block.conv0.bias", "generator.pre_block1.0.conv_block2.conv_block.conv0.weight_g", "generator.pre_block1.0.conv_block2.conv_block.conv0.weight_v", "generator.pre_block1.0.adjust_dim_layer.bias", "generator.pre_block1.0.adjust_dim_layer.weight_g", "generator.pre_block1.0.adjust_dim_layer.weight_v", "generator.pre_block1.1.conv_block1.conv_block.conv0.bias", "generator.pre_block1.1.conv_block1.conv_block.conv0.weight_g", "generator.pre_block1.1.conv_block1.conv_block.conv0.weight_v", "generator.pre_block1.1.conv_block2.conv_block.conv0.bias", "generator.pre_block1.1.conv_block2.conv_block.conv0.weight_g", "generator.pre_block1.1.conv_block2.conv_block.conv0.weight_v", "generator.pre_block1.1.adjust_dim_layer.bias", "generator.pre_block1.1.adjust_dim_layer.weight_g", "generator.pre_block1.1.adjust_dim_layer.weight_v", "generator.attention1.cross_attn.in_proj_weight", "generator.attention1.cross_attn.in_proj_bias", "generator.attention1.cross_attn.out_proj.weight", "generator.attention1.cross_attn.out_proj.bias", "generator.smoothers.0.self_attn.in_proj_weight", "generator.smoothers.0.self_attn.in_proj_bias", "generator.smoothers.0.self_attn.out_proj.weight", "generator.smoothers.0.self_attn.out_proj.bias", "generator.smoothers.0.conv0.bias", "generator.smoothers.0.conv0.weight_g", "generator.smoothers.0.conv0.weight_v", "generator.smoothers.0.conv1.bias", "generator.smoothers.0.conv1.weight_g", "generator.smoothers.0.conv1.weight_v", 
"generator.smoothers.1.self_attn.in_proj_weight", "generator.smoothers.1.self_attn.in_proj_bias", "generator.smoothers.1.self_attn.out_proj.weight", "generator.smoothers.1.self_attn.out_proj.bias", "generator.smoothers.1.conv0.bias", "generator.smoothers.1.conv0.weight_g", "generator.smoothers.1.conv0.weight_v", "generator.smoothers.1.conv1.bias", "generator.smoothers.1.conv1.weight_g", "generator.smoothers.1.conv1.weight_v", "generator.smoothers.2.self_attn.in_proj_weight", "generator.smoothers.2.self_attn.in_proj_bias", "generator.smoothers.2.self_attn.out_proj.weight", "generator.smoothers.2.self_attn.out_proj.bias", "generator.smoothers.2.conv0.bias", "generator.smoothers.2.conv0.weight_g", "generator.smoothers.2.conv0.weight_v", "generator.smoothers.2.conv1.bias", "generator.smoothers.2.conv1.weight_g", "generator.smoothers.2.conv1.weight_v", "generator.post_block.0.conv_block1.conv_block.conv0.bias", "generator.post_block.0.conv_block1.conv_block.conv0.weight_g", "generator.post_block.0.conv_block1.conv_block.conv0.weight_v", "generator.post_block.0.conv_block2.conv_block.conv0.bias", "generator.post_block.0.conv_block2.conv_block.conv0.weight_g", "generator.post_block.0.conv_block2.conv_block.conv0.weight_v", "generator.post_block.0.adjust_dim_layer.bias", "generator.post_block.0.adjust_dim_layer.weight_g", "generator.post_block.0.adjust_dim_layer.weight_v", "generator.post_block.1.conv_block1.conv_block.conv0.bias", "generator.post_block.1.conv_block1.conv_block.conv0.weight_g", "generator.post_block.1.conv_block1.conv_block.conv0.weight_v", "generator.post_block.1.conv_block2.conv_block.conv0.bias", "generator.post_block.1.conv_block2.conv_block.conv0.weight_g", "generator.post_block.1.conv_block2.conv_block.conv0.weight_v", "generator.post_block.1.adjust_dim_layer.bias", "generator.post_block.1.adjust_dim_layer.weight_g", "generator.post_block.1.adjust_dim_layer.weight_v", "generator.post_block.2.conv_block1.conv_block.conv0.bias", "generator.post_block.2.conv_block1.conv_block.conv0.weight_g", "generator.post_block.2.conv_block1.conv_block.conv0.weight_v", "generator.post_block.2.conv_block2.conv_block.conv0.bias", "generator.post_block.2.conv_block2.conv_block.conv0.weight_g", "generator.post_block.2.conv_block2.conv_block.conv0.weight_v", "generator.post_block.2.adjust_dim_layer.bias", "generator.post_block.2.adjust_dim_layer.weight_g", "generator.post_block.2.adjust_dim_layer.weight_v", "generator.post_block.3.conv_block1.conv_block.conv0.bias", "generator.post_block.3.conv_block1.conv_block.conv0.weight_g", "generator.post_block.3.conv_block1.conv_block.conv0.weight_v", "generator.post_block.3.conv_block2.conv_block.conv0.bias", "generator.post_block.3.conv_block2.conv_block.conv0.weight_g", "generator.post_block.3.conv_block2.conv_block.conv0.weight_v", "generator.post_block.3.adjust_dim_layer.bias", "generator.post_block.3.adjust_dim_layer.weight_g", "generator.post_block.3.adjust_dim_layer.weight_v", "generator.post_block.4.conv_block1.conv_block.conv0.bias", "generator.post_block.4.conv_block1.conv_block.conv0.weight_g", "generator.post_block.4.conv_block1.conv_block.conv0.weight_v", "generator.post_block.4.conv_block2.conv_block.conv0.bias", "generator.post_block.4.conv_block2.conv_block.conv0.weight_g", "generator.post_block.4.conv_block2.conv_block.conv0.weight_v", "generator.post_block.4.adjust_dim_layer.bias", "generator.post_block.4.adjust_dim_layer.weight_g", "generator.post_block.4.adjust_dim_layer.weight_v". 
    	Unexpected key(s) in state_dict: "encoder.pre_block.0.conv_block1.conv_block.conv0.bias", "encoder.pre_block.0.conv_block1.conv_block.conv0.weight_g", "encoder.pre_block.0.conv_block1.conv_block.conv0.weight_v", "encoder.pre_block.0.conv_block2.conv_block.conv0.bias", "encoder.pre_block.0.conv_block2.conv_block.conv0.weight_g", "encoder.pre_block.0.conv_block2.conv_block.conv0.weight_v", "encoder.pre_block.0.adjust_dim_layer.bias", "encoder.pre_block.0.adjust_dim_layer.weight_g", "encoder.pre_block.0.adjust_dim_layer.weight_v", "encoder.pre_block.1.conv_block1.conv_block.conv0.bias", "encoder.pre_block.1.conv_block1.conv_block.conv0.weight_g", "encoder.pre_block.1.conv_block1.conv_block.conv0.weight_v", "encoder.pre_block.1.conv_block2.conv_block.conv0.bias", "encoder.pre_block.1.conv_block2.conv_block.conv0.weight_g", "encoder.pre_block.1.conv_block2.conv_block.conv0.weight_v", "encoder.pre_block.1.adjust_dim_layer.bias", "encoder.pre_block.1.adjust_dim_layer.weight_g", "encoder.pre_block.1.adjust_dim_layer.weight_v", "encoder.pre_block.2.conv_block1.conv_block.conv0.bias", "encoder.pre_block.2.conv_block1.conv_block.conv0.weight_g", "encoder.pre_block.2.conv_block1.conv_block.conv0.weight_v", "encoder.pre_block.2.conv_block2.conv_block.conv0.bias", "encoder.pre_block.2.conv_block2.conv_block.conv0.weight_g", "encoder.pre_block.2.conv_block2.conv_block.conv0.weight_v", "encoder.pre_block.2.adjust_dim_layer.bias", "encoder.pre_block.2.adjust_dim_layer.weight_g", "encoder.pre_block.2.adjust_dim_layer.weight_v", "encoder.post_block.0.cross_attn.in_proj_weight", "encoder.post_block.0.cross_attn.in_proj_bias", "encoder.post_block.0.cross_attn.out_proj.weight", "encoder.post_block.0.cross_attn.out_proj.bias", "decoder.pre_conv_block.0.conv_block1.conv_block.conv0.bias", "decoder.pre_conv_block.0.conv_block1.conv_block.conv0.weight_g", "decoder.pre_conv_block.0.conv_block1.conv_block.conv0.weight_v", "decoder.pre_conv_block.0.conv_block2.conv_block.conv0.bias", "decoder.pre_conv_block.0.conv_block2.conv_block.conv0.weight_g", "decoder.pre_conv_block.0.conv_block2.conv_block.conv0.weight_v", "decoder.pre_conv_block.0.adjust_dim_layer.bias", "decoder.pre_conv_block.0.adjust_dim_layer.weight_g", "decoder.pre_conv_block.0.adjust_dim_layer.weight_v", "decoder.pre_attention_block.0.cross_attn.in_proj_weight", "decoder.pre_attention_block.0.cross_attn.in_proj_bias", "decoder.pre_attention_block.0.cross_attn.out_proj.weight", "decoder.pre_attention_block.0.cross_attn.out_proj.bias", "decoder.mel_linear1.weight", "decoder.mel_linear1.bias", "decoder.mel_linear2.weight", "decoder.mel_linear2.bias", "decoder.smoothers.0.self_attn.in_proj_weight", "decoder.smoothers.0.self_attn.in_proj_bias", "decoder.smoothers.0.self_attn.out_proj.weight", "decoder.smoothers.0.self_attn.out_proj.bias", "decoder.smoothers.0.conv0.bias", "decoder.smoothers.0.conv0.weight_g", "decoder.smoothers.0.conv0.weight_v", "decoder.smoothers.0.conv1.bias", "decoder.smoothers.0.conv1.weight_g", "decoder.smoothers.0.conv1.weight_v", "decoder.smoothers.1.self_attn.in_proj_weight", "decoder.smoothers.1.self_attn.in_proj_bias", "decoder.smoothers.1.self_attn.out_proj.weight", "decoder.smoothers.1.self_attn.out_proj.bias", "decoder.smoothers.1.conv0.bias", "decoder.smoothers.1.conv0.weight_g", "decoder.smoothers.1.conv0.weight_v", "decoder.smoothers.1.conv1.bias", "decoder.smoothers.1.conv1.weight_g", "decoder.smoothers.1.conv1.weight_v", "decoder.smoothers.2.self_attn.in_proj_weight", "decoder.smoothers.2.self_attn.in_proj_bias", 
"decoder.smoothers.2.self_attn.out_proj.weight", "decoder.smoothers.2.self_attn.out_proj.bias", "decoder.smoothers.2.conv0.bias", "decoder.smoothers.2.conv0.weight_g", "decoder.smoothers.2.conv0.weight_v", "decoder.smoothers.2.conv1.bias", "decoder.smoothers.2.conv1.weight_g", "decoder.smoothers.2.conv1.weight_v", "decoder.post_block.0.conv_block1.conv_block.conv0.bias", "decoder.post_block.0.conv_block1.conv_block.conv0.weight_g", "decoder.post_block.0.conv_block1.conv_block.conv0.weight_v", "decoder.post_block.0.conv_block2.conv_block.conv0.bias", "decoder.post_block.0.conv_block2.conv_block.conv0.weight_g", "decoder.post_block.0.conv_block2.conv_block.conv0.weight_v", "decoder.post_block.0.adjust_dim_layer.bias", "decoder.post_block.0.adjust_dim_layer.weight_g", "decoder.post_block.0.adjust_dim_layer.weight_v", "decoder.post_block.1.conv_block1.conv_block.conv0.bias", "decoder.post_block.1.conv_block1.conv_block.conv0.weight_g", "decoder.post_block.1.conv_block1.conv_block.conv0.weight_v", "decoder.post_block.1.conv_block2.conv_block.conv0.bias", "decoder.post_block.1.conv_block2.conv_block.conv0.weight_g", "decoder.post_block.1.conv_block2.conv_block.conv0.weight_v", "decoder.post_block.1.adjust_dim_layer.bias", "decoder.post_block.1.adjust_dim_layer.weight_g", "decoder.post_block.1.adjust_dim_layer.weight_v", "decoder.post_block.2.conv_block1.conv_block.conv0.bias", "decoder.post_block.2.conv_block1.conv_block.conv0.weight_g", "decoder.post_block.2.conv_block1.conv_block.conv0.weight_v", "decoder.post_block.2.conv_block2.conv_block.conv0.bias", "decoder.post_block.2.conv_block2.conv_block.conv0.weight_g", "decoder.post_block.2.conv_block2.conv_block.conv0.weight_v", "decoder.post_block.2.adjust_dim_layer.bias", "decoder.post_block.2.adjust_dim_layer.weight_g", "decoder.post_block.2.adjust_dim_layer.weight_v", "decoder.post_block.3.conv_block1.conv_block.conv0.bias", "decoder.post_block.3.conv_block1.conv_block.conv0.weight_g", "decoder.post_block.3.conv_block1.conv_block.conv0.weight_v", "decoder.post_block.3.conv_block2.conv_block.conv0.bias", "decoder.post_block.3.conv_block2.conv_block.conv0.weight_g", "decoder.post_block.3.conv_block2.conv_block.conv0.weight_v", "decoder.post_block.3.adjust_dim_layer.bias", "decoder.post_block.3.adjust_dim_layer.weight_g", "decoder.post_block.3.adjust_dim_layer.weight_v". 
    
    opened by ahmadsab95 0
  • Issue with inference

    Hi there, I believe I have my folders set up correctly and the dependencies installed (an output folder is generated at the start of inference), but shortly after starting I receive the error:

    11/04 03:09:35 AM (Elapsed: 00:00:04) loading the model from Any2Any/model/checkpoint-3900.pt
    11/04 03:09:36 AM (Elapsed: 00:00:04) config = {'hifi_model_path': 'hifivoice/pretrained/UNIVERSAL_V1/g_02500000', 'hifi_config_path': 'hifivoice/pretrained/UNIVERSAL_V1/config.json', 'wav2mel_model_path': 'Any2Any/model/dvector/pre_model/wav2mel.pt', 'dvector_model_path': 'Any2Any/model/dvector/pre_model/dvector-step250000.pt', 'pre_train_singlevc': True, 'singlevc_model_path': 'Any2Any/model/checkpoint-3000.pt', 'test_wav_dir': 'Any2Any/audio/in/', 'out_dir': 'Any2Any/audio/out/', 'batch_size': 1, 'resume_path': 'Any2Any/model/checkpoint-3900.pt', 'num_mels': 80, 'num_freq': 1025, 'n_fft': 1024, 'hop_size': 256, 'win_size': 1024, 'sampling_rate': 22050, 'fmin': 0, 'fmax': 8000, 'num_workers': 1}
    param Generator size = 26.418132M
    Traceback (most recent call last):
      File "Any2Any/infer/infer.py", line 96, in <module>
        solver.infer()
      File "Any2Any/infer/infer.py", line 60, in infer
        test_data_loader = self.get_test_data_loaders()
      File "Any2Any/infer/infer.py", line 43, in get_test_data_loaders
        test_filelist = get_infer_dataset_filelist(self.config["test_wav_dir"])
      File "/content/drive/MyDrive/MediumVC/Any2Any/meldataset.py", line 109, in get_infer_dataset_filelist
        source_file = random.choice(source_file_list)
      File "/usr/lib/python3.7/random.py", line 261, in choice
        raise IndexError('Cannot choose from an empty sequence') from None
    IndexError: Cannot choose from an empty sequence

    Would greatly appreciate help here, thanks :)

    opened by corranmac 1
Owner
谷下雨