Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Last update: Jan 2, 2023

Overview

❗ We now provide inference code and pre-trained models, so you can synthesize speech for any text you like.

⭐ The model is trained only on a corpus of neutral-emotion speech; no strongly emotional speech is used.

⭐ Out-of-domain style transfer remains very challenging. Limited by the training corpus, speaker-embedding and unsupervised style-learning methods (such as GST) struggle to imitate unseen data.

⭐ With the help of a Unet network and AdaIN layers, our proposed algorithm has powerful speaker and style transfer capabilities.
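
For readers unfamiliar with AdaIN (adaptive instance normalization), the NumPy sketch below shows the standard operation: content features are normalized per channel, then rescaled with the per-channel mean and standard deviation of the style features. It illustrates the general technique only. The (time, channels) array layout, the function name `adain`, and the `eps` value are assumptions, and how Unet-TTS wires AdaIN into its network is not shown here.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Standard AdaIN for arrays shaped (time, channels)."""
    # Per-channel statistics of the content features.
    c_mean = content.mean(axis=0, keepdims=True)
    c_std = content.std(axis=0, keepdims=True)
    # Per-channel statistics of the style (reference) features.
    s_mean = style.mean(axis=0, keepdims=True)
    s_std = style.std(axis=0, keepdims=True)
    # Remove the content statistics, then impose the style statistics.
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean

# Random stand-ins for content and reference features (illustrative only).
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(50, 4))
style = rng.normal(3.0, 2.0, size=(80, 4))
out = adain(content, style)
print(out.shape)
```

After the call, each channel of `out` carries the mean and (approximately) the standard deviation of the corresponding `style` channel, while the temporal pattern still comes from `content`.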

Infer code or Colab notebook

Demo results

Paper link

😄 The authors are preparing a simple, clear, and well-documented training process for Unet-TTS based on Aishell3. It will contain:

MFA-based duration alignment
Multi-speaker TTS with speaker_embedding-Instance-Normalization; this model provides the pre-trained Content Encoder
Unet-TTS training
One-shot voice cloning inference
C++ inference

Stay tuned!

Install Requirements

Install the TensorFlow and tensorflow-addons versions that match your CUDA version.
The defaults are TensorFlow 2.6 and tensorflow-addons 0.14.0.

pip install TensorFlowTTS
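
After installing, it can help to confirm which versions actually ended up in the environment. The sketch below compares the installed versions against the defaults stated above, using only the standard library. The helper names `installed_version` and `check_versions` are hypothetical, and the lookup assumes the packages are installed under the distribution names `tensorflow` and `tensorflow-addons`; a build installed under another name (for example `tensorflow-gpu`) would need that name instead.

```python
from importlib import metadata

# Defaults stated in this README; adjust them to match your CUDA setup.
EXPECTED = {"tensorflow": "2.6", "tensorflow-addons": "0.14.0"}

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def check_versions(expected, lookup=installed_version):
    """Map each package to (installed_version, matches_expected)."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Accept an exact match or a longer version with the same prefix.
        ok = found is not None and (found == wanted or found.startswith(wanted + "."))
        report[package] = (found, ok)
    return report

for name, (found, ok) in check_versions(EXPECTED).items():
    print(f"{name}: installed={found} expected={EXPECTED[name]} ok={ok}")
```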

Usage

See UnetTTS_syn.py or the notebook.

CUDA_VISIBLE_DEVICES=0 python UnetTTS_syn.py

from UnetTTS_syn import UnetTTS

models_and_params = {"duration_param": "train/configs/unetts_duration.yaml",
                    "duration_model": "models/duration4k.h5",
                    "acous_param": "train/configs/unetts_acous.yaml",
                    "acous_model": "models/acous12k.h5",
                    "vocoder_param": "train/configs/multiband_melgan.yaml",
                    "vocoder_model": "models/vocoder800k.h5"}

feats_yaml = "train/configs/unetts_preprocess.yaml"

text2id_mapper = "models/unetts_mapper.json"

Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml)

# text: input text
# src_audio: reference audio
# dur_stat: phoneme duration statistics used to control the speaking rate
syn_audio, _, _ = Tts_handel.one_shot_TTS(text, src_audio, dur_stat)

Reference

https://github.com/TensorSpeech/TensorFlowTTS

https://github.com/CorentinJ/Real-Time-Voice-Cloning

Comments

Error: AttributeError: 'dict' object has no attribute '__NUMPY_SETUP__'

After changing into the One-Shot-Voice-Cloning-master\TensorFlowTTS folder, running python setup.py install fails with the following error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\CHOPY\Downloads\One-Shot-Voice-Cloning-master\TensorFlowTTS\setup.py", line 74, in <module>
    setup(
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\dist.py", line 985, in run_command
    cmd_obj.run()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\install.py", line 67, in run
    self.do_egg_install()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\install.py", line 117, in do_egg_install
    cmd.run(show_deprecation=False)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 408, in run
    self.easy_install(spec, not self.no_deps)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 650, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 697, in install_item
    self.process_distribution(spec, dist, deps)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 744, in process_distribution
    distros = WorkingSet([]).resolve(
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\pkg_resources\__init__.py", line 766, in resolve
    dist = best[req.key] = env.best_match(
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\pkg_resources\__init__.py", line 1051, in best_match
    return self.obtain(req, installer)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\pkg_resources\__init__.py", line 1063, in obtain
    return installer(requirement)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 669, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 695, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 890, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 1162, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py", line 1146, in run_setup
    run_setup(setup_script, args)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 262, in run_setup
    raise
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 198, in setup_context
    yield
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 169, in save_modules
    saved_exc.resume()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 143, in resume
    raise exc.with_traceback(self._tb)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 156, in save_modules
    yield saved
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 198, in setup_context
    yield
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 259, in run_setup
    _execfile(setup_script, ns)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\sandbox.py", line 46, in _execfile
    exec(code, globals, locals)
  File "C:\Users\CHOPY\AppData\Local\Temp\easy_install-f13cm3xd\pysptk-0.1.20\setup.py", line 136, in <module>
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\dist.py", line 985, in run_command
    cmd_obj.run()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\bdist_egg.py", line 155, in run
    self.run_command("egg_info")
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\dist.py", line 985, in run_command
    cmd_obj.run()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\egg_info.py", line 299, in run
    self.find_sources()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\egg_info.py", line 306, in find_sources
    mm.run()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\egg_info.py", line 541, in run
    self.add_defaults()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\egg_info.py", line 578, in add_defaults
    sdist.add_defaults(self)
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\command\sdist.py", line 228, in add_defaults
    self._add_defaults_ext()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\command\sdist.py", line 311, in _add_defaults_ext
    build_ext = self.get_finalized_command('build_ext')
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\cmd.py", line 299, in get_finalized_command
    cmd_obj.ensure_finalized()
  File "C:\Users\CHOPY\AppData\Local\Programs\Python\Python39\lib\distutils\cmd.py", line 107, in ensure_finalized
    self.finalize_options()
  File "C:\Users\CHOPY\AppData\Local\Temp\easy_install-f13cm3xd\pysptk-0.1.20\setup.py", line 77, in finalize_options
    url="https://github.com/tensorspeech/TensorFlowTTS",

opened by Chopin68 4

Is English text supported?

duration model load finished.
acoustics model load finished.
vocode model load finished.
['hello']
phoneme seq: sil hello sil

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-40-71c2e65db483> in <module>
     29 text = "hello"
     30 
---> 31 syn_audio, _, _ = Tts_handel.one_shot_TTS(text, ref_audio)
     32 
     33 ipd.Audio(syn_audio, rate=16000)

2 frames

/usr/local/lib/python3.7/dist-packages/tensorflow_tts/processor/multispk_voiceclone.py in text_to_sequence(self, text, inference)
    798         #print("text",text)
    799         for symbol in text.split():
--> 800             idx = self.symbol_to_id[symbol]
    801             sequence.append(idx)
    802 

KeyError: 'hello'
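
The KeyError above is raised because text_to_sequence looks up every whitespace-separated symbol in symbol_to_id, and hello is not one of its keys. A small pre-check can report unsupported symbols before synthesis is attempted. This is only a sketch: the helper name `unknown_symbols` and the toy symbol table are hypothetical, and the real table would come from the mapper file (models/unetts_mapper.json), whose exact layout is not shown in this document.

```python
def unknown_symbols(phoneme_seq, symbol_to_id):
    """Return the symbols of a space-separated sequence that have no ID."""
    return [s for s in phoneme_seq.split() if s not in symbol_to_id]

# Toy table for illustration only; the real one is loaded from the mapper file.
toy_symbol_to_id = {"sil": 0, "n": 1, "i3": 2, "h": 3, "ao3": 4}

print(unknown_symbols("sil n i3 h ao3 sil", toy_symbol_to_id))  # []
print(unknown_symbols("sil hello sil", toy_symbol_to_id))       # ['hello']
```

An empty result means every symbol can be converted to an ID; anything else lists the symbols that would trigger the KeyError.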

opened by netrunner-exe 2

Python 3.9 is incompatible with llvmlite

Running pip install One-Shot-Voice-Cloning/TensorFlowTTS fails. The system is Windows 10.

I looked up a similar error: https://github.com/numba/llvmlite/issues/669

Following that issue, I installed numba===0.53.0rc1.post1 and llvmlite===0.36.0rc1.

But it still fails:

Building wheels for collected packages: llvmlite, pyworld, audioread, resampy
Building wheel for llvmlite (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [24 lines of output]
    running bdist_wheel
    C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\python.exe C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py
    Trying generator 'Visual Studio 14 2015 Win64'
    Traceback (most recent call last):
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 168, in <module>
        main()
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 156, in main
        main_win32()
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 88, in main_win32
        generator = find_win32_generator()
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 76, in find_win32_generator
        try_cmake(cmake_dir, build_dir, generator)
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 28, in try_cmake
        subprocess.check_call(['cmake', '-G', generator, cmake_dir])
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 368, in check_call
        retcode = call(*popenargs, **kwargs)
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 349, in call
        with Popen(*popenargs, **kwargs) as p:
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 951, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 1420, in _execute_child
        hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
    FileNotFoundError: [WinError 2] The system cannot find the file specified.
    error: command 'C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\python.exe' failed with exit code 1
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llvmlite
Running setup.py clean for llvmlite
Building wheel for pyworld (pyproject.toml) ... done
Created wheel for pyworld: filename=pyworld-0.3.0-cp39-cp39-win_amd64.whl size=165598 sha256=aa377fa3596f5069cd4e2935db7cda64cf55825fe8d6a3e1f6f4bebbb579b3c5
Stored in directory: c:\users\1135053672\appdata\local\pip\cache\wheels\52\e9\41\dfd518c392d2c9fbf54cec8b7067afb83e759eda086a39aee4
Building wheel for audioread (setup.py) ... done
Created wheel for audioread: filename=audioread-2.1.9-py3-none-any.whl size=23153 sha256=1512e3e485965eb7973b0f43c950fe574fe5b73845401013c8faa26577a53b5c
Stored in directory: c:\users\1135053672\appdata\local\pip\cache\wheels\d2\1c\42\1c961e1d65429e9edffdd5fa1b69cae92a1082133abbf39835
Building wheel for resampy (setup.py) ... done
Created wheel for resampy: filename=resampy-0.2.2-py3-none-any.whl size=320732 sha256=97a16689ba045017b1f15aa3b3d9a1da59e987f6a228f75902120c5f9461168a
Stored in directory: c:\users\1135053672\appdata\local\pip\cache\wheels\17\74\46\c6570ed50edb542a09fb2e88fb135939178f11a0754ceb9752
Successfully built pyworld audioread resampy
Failed to build llvmlite
Installing collected packages: wrapt, typing-extensions, textgrid, termcolor, tensorflow-estimator, tensorboard-plugin-wit, pyasn1, llvmlite, keras, jamo, flatbuffers, distance, dataclasses, clang, certifi, audioread, appdirs, zipp, wheel, werkzeug, urllib3, unidecode, typeguard, threadpoolctl, tensorboard-data-server, six, setuptools, rsa, regex, PyYAML, pypinyin, pyparsing, pycparser, pyasn1-modules, protobuf, pillow, oauthlib, numpy, kiwisolver, joblib, inflect, idna, gast, g2pM, fonttools, filelock, decorator, cython, cycler, colorama, charset-normalizer, cachetools, tqdm, tensorflow-addons, scipy, requests, pyworld, python-dateutil, packaging, opt-einsum, numba, keras-preprocessing, importlib-metadata, h5py, grpcio, google-pasta, google-auth, click, cffi, astunparse, absl-py, soundfile, scikit-learn, resampy, requests-oauthlib, pooch, nltk, matplotlib, markdown, huggingface-hub, librosa, google-auth-oauthlib, g2p-en, tensorboard, tensorflow-gpu, TensorFlowTTS
Running setup.py install for llvmlite ... error
error: subprocess-exited-with-error

× Running setup.py install for llvmlite did not run successfully.
│ exit code: 1
╰─> [27 lines of output]
    running install
    running build
    got version from file C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\llvmlite/_version.py {'version': '0.31.0', 'full': 'fe7d985f6421d87f613bd414479d29d912771562'}
    running build_ext
    C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\python.exe C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py
    Trying generator 'Visual Studio 14 2015 Win64'
    Traceback (most recent call last):
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 168, in <module>
        main()
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 156, in main
        main_win32()
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 88, in main_win32
        generator = find_win32_generator()
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 76, in find_win32_generator
        try_cmake(cmake_dir, build_dir, generator)
      File "C:\Users\1135053672\AppData\Local\Temp\pip-install-jlnsqgwl\llvmlite_9c5622c13c624a7da37c2a9f225418b3\ffi\build.py", line 28, in try_cmake
        subprocess.check_call(['cmake', '-G', generator, cmake_dir])
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 368, in check_call
        retcode = call(*popenargs, **kwargs)
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 349, in call
        with Popen(*popenargs, **kwargs) as p:
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 951, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\lib\subprocess.py", line 1420, in _execute_child
        hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
    FileNotFoundError: [WinError 2] The system cannot find the file specified.
    error: command 'C:\Environment\Anaconda\Anaconda3\envs\virtual_xjs\python.exe' failed with exit code 1
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> llvmlite

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

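I am not sure whether the LLVM mentioned above is required, or how to point the build at a specific path. Or is downgrading Python the only option?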

opened by SwordMasterJS 1

Other language support

Hi, @CMsmartvoice

Thanks for your great work, I would like to use your method to train Korean/English data for my research. Would you please give some advice? Are there any plans to provide training steps in other languages?

Thanks in advance

opened by ziyigit 1

Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml) raises an error

Environment: Google Colab with a GPU, using the notebook from the official documentation. Execution fails at Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml):

UnknownError                              Traceback (most recent call last)
<ipython-input-16-2776df11a7fe> in <module>()
----> 1 Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml)

23 frames
/content/One-Shot-Voice-Cloning/UnetTTS_syn.py in __init__(self, models_and_params, text2id_mapper, feats_yaml)
     22         self.phone_dur_min          = 5
     23         self.phone_dur_max          = 20
---> 24         self.__init_models()
     25 
     26     def one_shot_TTS(self, text, src_audio, duration_stats=None, is_wrap_txt=True):

/content/One-Shot-Voice-Cloning/UnetTTS_syn.py in __init_models(self)
     72         self.duration_model = TFAutoModel.from_pretrained(config=AutoConfig.from_pretrained(self.models_and_params["duration_param"]), 
     73                                       pretrained_path=self.models_and_params["duration_model"],
---> 74                                       name="Normalized_duration_predictor")
     75         print("duration model load finished.")
     76 

/usr/local/lib/python3.7/dist-packages/tensorflow_tts/inference/auto_model.py in from_pretrained(cls, config, pretrained_path, **kwargs)
     59                 model = model_class(config=config, **kwargs)
     60                 if is_build:
---> 61                     model._build()
     62                 if pretrained_path is not None and ".h5" in pretrained_path:
     63                     model.load_weights(pretrained_path)

/usr/local/lib/python3.7/dist-packages/tensorflow_tts/models/unetts.py in _build(self)
     55         char_ids = tf.convert_to_tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], tf.int32)
     56         duration_stat = tf.convert_to_tensor([[1., 1., 1., 1.]], tf.float32)
---> 57         self(char_ids, duration_stat)
     58 
     59     def call(

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1035         with autocast_variable.enable_auto_cast_variables(
   1036             self._compute_dtype_object):
-> 1037           outputs = call_fn(inputs, *args, **kwargs)
   1038 
   1039         if self._activity_regularizer:

/usr/local/lib/python3.7/dist-packages/tensorflow_tts/models/unetts.py in call(self, char_ids, duration_stat, training, **kwargs)
     72         embedding_output = self.embeddings(char_ids)
     73 
---> 74         encoder_output             = self.encoder([embedding_output, attention_mask], training=training)
     75         last_encoder_hidden_states = encoder_output[0]
     76 

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1035         with autocast_variable.enable_auto_cast_variables(
   1036             self._compute_dtype_object):
-> 1037           outputs = call_fn(inputs, *args, **kwargs)
   1038 
   1039         if self._activity_regularizer:

/usr/local/lib/python3.7/dist-packages/tensorflow_tts/models/moduls/core.py in call(self, inputs, training)
    377 
    378             layer_outputs = layer_module(
--> 379                 [hidden_states, attention_mask], training=training
    380             )
    381             hidden_states = layer_outputs[0]

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1035         with autocast_variable.enable_auto_cast_variables(
   1036             self._compute_dtype_object):
-> 1037           outputs = call_fn(inputs, *args, **kwargs)
   1038 
   1039         if self._activity_regularizer:

/usr/local/lib/python3.7/dist-packages/tensorflow_tts/models/moduls/core.py in call(self, inputs, training)
    339         attention_output = attention_outputs[0]
    340         intermediate_output = self.intermediate(
--> 341             [attention_output, attention_mask], training=training
    342         )
    343         layer_output = self.bert_output(

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1035         with autocast_variable.enable_auto_cast_variables(
   1036             self._compute_dtype_object):
-> 1037           outputs = call_fn(inputs, *args, **kwargs)
   1038 
   1039         if self._activity_regularizer:

/usr/local/lib/python3.7/dist-packages/tensorflow_tts/models/moduls/core.py in call(self, inputs)
    290         hidden_states, attention_mask = inputs
    291 
--> 292         hidden_states = self.conv1d_1(hidden_states)
    293         hidden_states = self.intermediate_act_fn(hidden_states)
    294         hidden_states = self.conv1d_2(hidden_states)

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
   1035         with autocast_variable.enable_auto_cast_variables(
   1036             self._compute_dtype_object):
-> 1037           outputs = call_fn(inputs, *args, **kwargs)
   1038 
   1039         if self._activity_regularizer:

/usr/local/lib/python3.7/dist-packages/keras/layers/convolutional.py in call(self, inputs)
    247       inputs = tf.pad(inputs, self._compute_causal_padding(inputs))
    248 
--> 249     outputs = self._convolution_op(inputs, self.kernel)
    250 
    251     if self.use_bias:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    204     """Call target, and fall back on dispatchers if there is a TypeError."""
    205     try:
--> 206       return target(*args, **kwargs)
    207     except (TypeError, ValueError):
    208       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/nn_ops.py in convolution_v2(input, filters, strides, padding, data_format, dilations, name)
   1136       data_format=data_format,
   1137       dilations=dilations,
-> 1138       name=name)
   1139 
   1140 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/nn_ops.py in convolution_internal(input, filters, strides, padding, data_format, dilations, name, call_from_convolution, num_spatial_dims)
   1266           data_format=data_format,
   1267           dilations=dilations,
-> 1268           name=name)
   1269     else:
   1270       if channel_index == 1:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    204     """Call target, and fall back on dispatchers if there is a TypeError."""
    205     try:
--> 206       return target(*args, **kwargs)
    207     except (TypeError, ValueError):
    208       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    615                   func.__module__, arg_name, arg_value, 'in a future version'
    616                   if date is None else ('after %s' % date), instructions)
--> 617       return func(*args, **kwargs)
    618 
    619     doc = _add_deprecated_arg_value_notice_to_docstring(

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    615                   func.__module__, arg_name, arg_value, 'in a future version'
    616                   if date is None else ('after %s' % date), instructions)
--> 617       return func(*args, **kwargs)
    618 
    619     doc = _add_deprecated_arg_value_notice_to_docstring(

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/nn_ops.py in conv1d(value, filters, stride, padding, use_cudnn_on_gpu, data_format, name, input, dilations)
   2009           data_format=data_format,
   2010           dilations=dilations,
-> 2011           name=name)
   2012     else:
   2013       result = squeeze_batch_dims(

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name)
    930       return _result
    931     except _core._NotOkStatusException as e:
--> 932       _ops.raise_from_not_ok_status(e, name)
    933     except _core._FallbackException:
    934       pass

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6939   message = e.message + (" name: " + name if name is not None else "")
   6940   # pylint: disable=protected-access
-> 6941   six.raise_from(core._status_to_exception(e.code, message), None)
   6942   # pylint: enable=protected-access
   6943 

/usr/local/lib/python3.7/dist-packages/six.py in raise_from(value, from_value)

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
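
One commonly suggested thing to try for this cuDNN initialization failure is to let TensorFlow allocate GPU memory on demand before any model is built. A mismatch between the installed TensorFlow build and the CUDA/cuDNN libraries is another possible cause, which this sketch does not address. The helper name `enable_memory_growth` is hypothetical, and it takes `tf.config` as an argument so the logic can be read and exercised without TensorFlow installed; `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth` are TensorFlow 2 APIs.

```python
def enable_memory_growth(tf_config):
    """Turn on on-demand GPU memory allocation for every visible GPU.

    Pass `tf.config`. Call this before any model or tensor is created on
    the GPU, because TensorFlow rejects the change once the device has
    been initialized. Returns the number of GPUs that were configured.
    """
    gpus = tf_config.list_physical_devices("GPU")
    for gpu in gpus:
        tf_config.experimental.set_memory_growth(gpu, True)
    return len(gpus)

# Intended usage, placed before UnetTTS(...) is constructed:
#   import tensorflow as tf
#   enable_memory_growth(tf.config)
```

If the error persists with memory growth enabled, the warning messages printed before the traceback are the next place to look, as the error text itself suggests.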

opened by xiaocao666tzh 2

How to generate the duration statistics, such as the test_wavs/*.npy files

The .npy files in */test_wavs are generated with the MFA tool, which requires the phoneme sequence of each audio file to be known in advance.

Other approaches work as well: any tool that can predict phoneme durations can be used, such as the acoustic model of an ASR system.

The MFA-based method estimates the duration information of the reference audio accurately. For cloning, however, the duration information does not need to be very precise, and a coarse manual estimate can achieve the same effect. For example, phoneme durations can be estimated by eye and ear with a spectrogram viewer or another audio annotation tool.
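
Whichever tool produces the phoneme boundaries, they eventually have to be expressed as frame counts. The sketch below converts (start, end) times in seconds into per-phoneme frame counts. It is an illustration under assumptions: the function name `intervals_to_frames` and the example intervals are hypothetical, the 16000 Hz sample rate matches the playback rate used in the notebook output quoted earlier on this page, the hop size of 200 samples is a placeholder to be replaced by the value in the preprocessing config, and the exact layout of the test_wavs/*.npy files is not documented here.

```python
def intervals_to_frames(intervals, sample_rate=16000, hop_size=200):
    """Convert (start_sec, end_sec) phoneme intervals to frame counts.

    Boundaries are rounded to frame indices first, so the counts always
    sum to the rounded total length and rounding errors do not accumulate.
    """
    frames_per_second = sample_rate / hop_size
    counts = []
    for start, end in intervals:
        start_frame = round(start * frames_per_second)
        end_frame = round(end * frames_per_second)
        counts.append(end_frame - start_frame)
    return counts

# Hypothetical hand-annotated intervals for four phonemes, in seconds.
intervals = [(0.00, 0.10), (0.10, 0.25), (0.25, 0.40), (0.40, 0.70)]
durations = intervals_to_frames(intervals)
mean_duration = sum(durations) / len(durations)
print(durations, mean_duration)
```

Summary statistics such as the mean can then be computed from `durations` and stored, for example with `numpy.save`, once the statistics expected by the model are known.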

The Style_Encoder in this model is equivalent to an audio frame encoder, where the final output of the network is related to the content only, with phoneme position information embedded in the results. Based on these temporal position encodings, a simple estimation of the phoneme duration of the reference audio can be performed using the Style_Encoder. Better yet, the Style_Encoder method does not require knowledge of the phoneme sequence corresponding to the audio. https://github.com/CMsmartvoice/One-Shot-Voice-Cloning/blob/6beec14888be82ade5164cc9e534f0a0c1ee38f9/TensorFlowTTS/tensorflow_tts/models/moduls/core.py#L700-L705

Originally posted by @CMsmartvoice in https://github.com/CMsmartvoice/One-Shot-Voice-Cloning/issues/3#issuecomment-1046414407

documentation

opened by CMsmartvoice 1

import error

I was following your guide to run inference, but I encountered an import error:

----> 2 from tensorflow_tts.audio_process import preprocess_wav
      3 from UnetTTS_syn import UnetTTS

ModuleNotFoundError: No module named 'tensorflow_tts.audio_process'

I tried moving the folder outside TensorFlowTTS, and the module worked, but then other errors appeared. Could you help?
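
One possible cause (an assumption, not confirmed in this thread) is that Python is importing a different tensorflow_tts package than the copy shipped in this repository's TensorFlowTTS folder. The standard-library sketch below reports which file each module name would be loaded from, or None if it cannot be found. The helper name `module_origin` is hypothetical.

```python
import importlib.util

def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # ModuleNotFoundError (a subclass of ImportError) is raised when a
        # parent package of a dotted name is missing; ValueError can be
        # raised for a module whose import metadata is malformed.
        return None
    return None if spec is None else spec.origin

for name in ("tensorflow_tts", "tensorflow_tts.audio_process"):
    print(name, "->", module_origin(name))
```

If the first line prints a path outside the cloned repository while the second prints None, the installed package is probably not the modified copy that contains audio_process.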

opened by ireneb612 2

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

2.2k Jan 9, 2023

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

237 Jan 2, 2023

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

138 Oct 28, 2022

vits chinese, tts chinese, tts mandarin

vits chinese, tts chinese, tts mandarin 史上训练最简单，音质最好的语音合成系统

12 Dec 14, 2022

Ukrainian TTS (text-to-speech) using Coqui TTS

title emoji colorFrom colorTo sdk app_file pinned Ukrainian TTS ?? green green gradio app.py false Ukrainian TTS ?? ?? Ukrainian TTS (text-to-speech)

85 Dec 26, 2022

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Styleformer A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/cas

431 Dec 19, 2022

This code extends the neural style transfer image processing technique to video by generating smooth transitions between several reference style images

Neural Style Transfer Transition Video Processing By Brycen Westgarth and Tristan Jogminas Description This code extends the neural style transfer ima

110 Jan 7, 2023

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

829 Jan 7, 2023

A multi-voice TTS system trained with an emphasis on quality

TorToiSe Tortoise is a text-to-speech program built with the following priorities: Strong multi-voice capabilities. Highly realistic prosody and inton

2.1k Jan 1, 2023

Repository for the paper: VoiceMe: Personalized voice generation in TTS

VoiceMe: Personalized voice generation in TTS Abstract Novel text-to-speech systems can generate entirely new voices that were not seen during trai

80 Dec 29, 2022

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS)

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented. Mostly I would recommend giving a quick look to the figures beyond the introduction.

38.5k Jan 3, 2023

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

15.3k Dec 30, 2022

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

15.3k Jan 3, 2023

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

10.8k Feb 18, 2021

Comments

AttributeError: 'dict' object has no attribute '__NUMPY_SETUP__' error

Is English text supported?

Python 3.9 is incompatible with llvmlite

Other language support

Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml) raises an error

How to generate the duration statistics, like the test_wavs/*.npy files?

import error

Owner

Python functions for summarizing and improving voice dictation input.

This project converts your human voice input to its text transcript and to an automated voice too.

Global Rhythm Style Transfer Without Text Transcriptions

A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow