Global Rhythm Style Transfer Without Text Transcriptions

Overview

Global Prosody Style Transfer Without Text Transcriptions

This repository provides a PyTorch implementation of AutoPST, which enables unsupervised global prosody conversion without text transcriptions.

This is a short video that explains the main concepts of our work. If you find this work useful and use it in your research, please consider citing our paper.

@InProceedings{pmlr-v139-qian21b,
  title = 	 {Global Prosody Style Transfer Without Text Transcriptions},
  author =       {Qian, Kaizhi and Zhang, Yang and Chang, Shiyu and Xiong, Jinjun and Gan, Chuang and Cox, David and Hasegawa-Johnson, Mark},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {8650--8660},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  url = 	 {http://proceedings.mlr.press/v139/qian21b.html}
}

Audio Demo

The audio demo for AutoPST can be found here

Dependencies

  • Python 3.6
  • Numpy
  • Scipy
  • PyTorch == v1.6.0
  • librosa
  • pysptk
  • soundfile
  • wavenet_vocoder (install with pip install wavenet_vocoder==0.1.1; for more information, please refer to https://github.com/r9y9/wavenet_vocoder)
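
As a quick sanity check before running the demo, the list above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name find_missing is hypothetical, and the import names (for example torch for PyTorch) are assumed to be the usual ones for these packages. It only checks that each package can be located, not that the installed versions match.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above ("torch" is PyTorch).
REQUIRED_MODULES = [
    "numpy",
    "scipy",
    "torch",
    "librosa",
    "pysptk",
    "soundfile",
    "wavenet_vocoder",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

If any package is reported missing, install it with pip before continuing.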

To Run Demo

Download pre-trained models to assets

Download the same WaveNet vocoder model as in AutoVC to assets

Please refer to AutoVC if you have any problems with the vocoder part, because they share the same vocoder scripts.

Run demo.ipynb

To Train

Download training data to assets. The provided training data is very small and is intended for code verification only. Please use the scripts to prepare your own data for training.

  1. Prepare training data: python prepare_train_data.py

  2. Train 1st Stage: python main_1.py

  3. Train 2nd Stage: python main_2.py
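
The three steps above can also be chained from a single script. The following is a minimal sketch under the assumption that it is run from the repository root, where the three scripts named above live; the helper names build_commands and run_all are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script names are taken from the steps above; labels are for logging only.
STEPS = [
    ("Prepare training data", "prepare_train_data.py"),
    ("Train 1st stage", "main_1.py"),
    ("Train 2nd stage", "main_2.py"),
]


def build_commands(python=sys.executable):
    """Build one command line per step, in the order listed above."""
    return [[python, script] for _, script in STEPS]


def run_all(python=sys.executable):
    """Run every step in order and stop at the first failure."""
    for (label, _), cmd in zip(STEPS, build_commands(python)):
        print(f"==> {label}: {' '.join(cmd)}")
        # check=True raises CalledProcessError if a step exits with an error,
        # so a later stage never starts after an earlier one has failed.
        subprocess.run(cmd, check=True)
```

Calling run_all() executes the steps back to back. Stopping on the first error matters here because the second stage is listed after the first and is not expected to be useful without it.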

Final Words

This project is part of ongoing research. We hope this repo is useful for your research. If you need any help or have any suggestions on improving the framework, please raise an issue and we will do our best to get back to you as soon as possible.

Comments
  • How to test AutoPST in other languages?

    I have not succeeded in running the final vocoder section of the AutoPST code. In my conda environment, all dependencies are installed, but an error like "from synthesis cannot import build_model" always shows. Do I need to train on the AutoVC speakers? I want to use my own recordings, in a language other than English.

    I just want to clone the voice of one recording onto another, for different speakers. Do these recordings need to be the same length and contain the same spoken sentences to be compared during training?

    BTW, I have an RTX 3060, and this card is not supported by PyTorch 1.6.0. I first installed the onmt Python package, then PyTorch 1.7.0 with CUDA 11. Thank you

    opened by zelenooki87 6
  • Missing basic execution with different set of speakers.

    Hi there, I am trying to follow the code with my own dataset and was able to run main_1.py and main_2.py to get the xxx-A.ckpt and xxx-B.ckpt files. Now I do not understand how to run the demo file and prepare a dictionary of specific speakers for conversion. Any help with a little more direction on the steps to follow is appreciated.

    opened by sainishalini 4
  • Unable to reproduce results

    @auspicious3000 We tried reproducing results using your codebase and the dataset found here https://datashare.ed.ac.uk/handle/10283/3443 (that you use) but unfortunately, we were unable to. The outputs that we have so far are extremely noisy (even if source and target speakers are the same). Could you please share a working code with us that might help us reproduce the results? I would greatly appreciate your inputs!

    opened by avanitanna 1
  • Issue with stop prediction for longer utterances.

    Hi @auspicious3000,

    First, thanks for releasing this repository! I've been trying to compare AutoPST to some upcoming work, but I'm having an issue with the stop token prediction when converting utterances longer than 1 or 2 seconds. I noticed that you clipped some of the VCTK files for your demo page (and in the test dictionary you provided) so that they're much shorter. How did you use the test utterances in your evaluations? Do you have any recommendations so that I can make as fair a comparison as possible?

    Thanks, Benjamin

    opened by bshall 1
  • SpeechSplit actually better than AutoPST for seen speakers?

    Hello, I'm reading D.6. SPEECHSPLIT Baseline in the paper.

    Am I understanding this correctly that SpeechSplit performs better for rhythm transfer when doing conversion to a seen speaker?

    opened by skol101 1
  • How can we generate test_vctk.meta?

    Hi @auspicious3000, can you please provide me with the code to generate test_vctk.meta? The format of train_vctk.meta (which is generated using prepare_train_data.py) is very different from test. I would like to test your code on new data.

    opened by avanitanna 5
  • How to solve SEA model problem

    Hello. I have referred to your paper. Based on your experiment, I conducted an experiment on accent transformation using English accent data from different countries. But the result is very unsatisfactory; I can't even hear the transformed voice clearly.

    I think there may be a problem in the process of training the SEA model, but I don't know exactly where the problem is.

    The images show my code for training SEA. Could you help me with this issue? What is the best approach to solve this?

    opened by sera920 0
  • KeyError when running prepare_train_data.py

    Hi, I got the following error when running prepare_train_data.py. Does spk2emb have a vctk16-train-wav key?

    vctk16-train-wav
    Traceback (most recent call last):
      File "prepare_train_data.py", line 52, in <module>
        submeta.append(spk2emb[subdir])
    KeyError: 'vctk16-train-wav'

    opened by Jwaminju 2
  • ModuleNotFoundError: No module named 'onmt'

    Hi, I ran into an error about onmt:

    ModuleNotFoundError Traceback (most recent call last) in
          5 import torch.nn.functional as F
          6 from collections import OrderedDict
    ----> 7 from onmt.utils.misc import sequence_mask
          8 from model_autopst import Generator_2 as Predictor
          9 from hparams_autopst import hparams

    ModuleNotFoundError: No module named 'onmt'

    However, I see that the folder onmt_modules exists.

    I then installed onmt (pip install onmt) and noticed that it installs torch 1.3.0, although the requirements say PyTorch == v1.6.0.

    Could you help me with this issue? What is the best approach to solve this?

    opened by stalevna 1
  • Inference with new input audio

    Hi and thank you for this amazing project!

    I was trying to create a notebook in Colab that would allow me to input an audio file, then select the speaker and produce an output accordingly.

    Here is the code. It works, but I am missing the part on how to change the speaker timbre. Do you have any tips on that?

    Thanks a lot in advance!

    opened by shoegazerstella 4
Owner
Kaizhi Qian