ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library


ERISHA is a multilingual multispeaker expressive speech synthesis framework. It can transfer expressivity to the voice of a speaker for whom no expressive speech corpus is available. The term ERISHA means speech in Sanskrit. The framework includes various deep learning architectures, such as Global Style Token (GST), Variational Autoencoder (VAE), and Gaussian Mixture Variational Autoencoder (GMVAE), as well as x-vectors, for building the prosody encoder.
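As a rough illustration of the Global Style Token idea (all names and dimensions below are hypothetical, not ERISHA's actual API): a reference encoding attends over a bank of learned style tokens, and the prosody embedding is the attention-weighted sum of those tokens.

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gst_embedding(reference, tokens):
    """Sketch of a GST-style prosody embedding.

    reference: list[float], encoding of the reference utterance
    tokens: list of list[float], the learned style-token bank
    """
    # similarity of the reference encoding to each style token (dot product)
    scores = [sum(r * t for r, t in zip(reference, tok)) for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    # prosody embedding = attention-weighted sum of style tokens
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
```

In the real model the tokens are trainable parameters and the attention is multi-head; this sketch only shows the weighted-sum structure.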

Currently, the library is in its initial stage of development and will be updated frequently in the coming days.

Stay tuned for more updates; we are open to collaboration!

Installation and Training

Refer to INSTALL for the initial setup.

Available recipes

Available Features

  • Resampling of speech waveforms to the target sampling rate in recipes
  • Support for training a TTS system in other languages
  • Support for training a multilingual TTS system in other languages
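To illustrate the resampling step, here is a minimal linear-interpolation resampler in pure Python. Actual recipes would typically use a proper polyphase resampler (e.g. librosa or torchaudio); this sketch only shows the idea of mapping samples to a new rate.

```python
def resample(wave, sr_in, sr_out):
    """Resample a waveform (list of floats) from sr_in to sr_out Hz
    by linear interpolation between neighboring samples."""
    n_out = int(round(len(wave) * sr_out / sr_in))
    out = []
    for i in range(n_out):
        # fractional position of output sample i in the input signal
        pos = i * (len(wave) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(wave) - 1)
        frac = pos - lo
        out.append(wave[lo] * (1 - frac) + wave[hi] * frac)
    return out
```

For example, resampling a 4-sample waveform from 4 kHz to 8 kHz doubles its length while preserving the endpoints.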

Upcoming updates

Acknowledgements

This implementation uses code from the following repositories: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, and Jhosimar George Arias Figueroa.


Comments
  • EntropyLoss is incorrect

    The code in loss_fuction.py defines it as follows:

    def entropy(self, logits, targets):
        log_q = F.log_softmax(logits, dim=-1)
        return -torch.mean(torch.sum(targets * log_q, dim=-1))

    In module.py: cat_prob = F.softmax(self.categorical_layer(z), dim=-1)

    1. In the entropy function, the first argument is the model's output and the second is the target, but get_encoder_loss calls it with the arguments swapped: cat_lambda*(-self.entropy(cat_target, prob_)).
    2. The entropy function already computes an NLL loss, so it should not be negated again in get_encoder_loss: cat_lambda*(-self.entropy(cat_target, prob_)).
    3. F.log_softmax expects raw logits, but module.py applies a softmax activation, so the input is already probabilities; log_q should therefore be torch.log(logits).

    So the corrected loss is as follows:

    def entropy(self, logits, targets):
        log_q = torch.log(logits)
        return -torch.mean(torch.sum(targets * log_q, dim=-1))

    def get_encoder_loss(self, id_, prob_, classes_, cat_lambda, kl_lambda, encoder_type):
        cat_target = self.indices_to_one_hot(id_, classes_)

        if (encoder_type == 'gst' or encoder_type == 'x-vector') and cat_lambda != 0.0:
            loss = cat_lambda * (self.entropy(prob_, cat_target) - np.log(0.1))
        elif (encoder_type == 'vae' or encoder_type == 'gst_vae') and (cat_lambda != 0.0 or kl_lambda != 0.0):
            loss = cat_lambda * (self.entropy(prob_[2], cat_target) - np.log(0.1)) + kl_lambda * self.KL_loss(prob_[0], prob_[1])
        elif encoder_type == 'gmvae' and (cat_lambda != 0.0 or kl_lambda != 0.0):
            loss = self.gaussian_loss(prob_[0], prob_[1], prob_[2], prob_[3], prob_[4]) * kl_lambda + (self.entropy(prob_[5], cat_target) - np.log(0.1)) * cat_lambda
        else:
            loss = 0.0

        return loss
    
    opened by BridgetteSong 0
  • About the GMVAE loss

    The loss for "gmvae" is defined as "gaussian_loss", but in the GMVAE paper it is defined as E_{q(y|x)}[KL(q(z|x) || p(z|y))], while "gaussian_loss" computes log(q(z|x)) - log(p(z|y)). My question is: should "gaussian_loss" be defined as p(z|y) * (log(q(z|x)) - log(p(z|y)))? Thanks!
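    For reference, the expectation in that ELBO term expands as a q(y|x)-weighted sum over mixture components, which is the standard form the implementation can be checked against:

    ```latex
    \mathbb{E}_{q(y\mid x)}\!\left[\mathrm{KL}\!\left(q(z\mid x)\,\|\,p(z\mid y)\right)\right]
      = \sum_{y} q(y\mid x)\,\mathbb{E}_{q(z\mid x)}\!\left[\log q(z\mid x) - \log p(z\mid y)\right]
    ```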

    opened by BridgetteSong 1
Owner
Ajinkya Kulkarni
Ph.D. student at the University of Lorraine.