PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

Overview




This repo contains the official PyTorch implementation of:

  • PortaSpeech: Portable and High-Quality Generative Text-to-Speech (NeurIPS 2021)
  • DiffSpeech (DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism)

Key Features

We implement the following features in this framework:

  • Data processing for non-autoregressive Text-to-Speech using Montreal Forced Aligner.
  • Convenient and scalable framework for training and inference.
  • Simple but efficient random-access dataset implementation (a sketch of the idea follows below).
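
A minimal sketch of the random-access (indexed) dataset idea, illustrative only; the repo's actual implementation lives in utils/commons/indexed_datasets.py and may differ in detail. Each item is pickled into a single data file while its byte offset is recorded, so any item can be read directly without loading the whole dataset:

import pickle

class SimpleIndexedDataset:
    """Illustrative random-access dataset: one pickled blob per item, indexed by byte offsets."""
    def __init__(self, path):
        # offsets[i] .. offsets[i + 1] delimit the i-th pickled item in the data file
        self.offsets = pickle.load(open(f'{path}.idx', 'rb'))
        self.data_file = open(f'{path}.data', 'rb')

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        self.data_file.seek(self.offsets[i])  # jump straight to item i
        return pickle.loads(self.data_file.read(self.offsets[i + 1] - self.offsets[i]))

class SimpleIndexedDatasetBuilder:
    """Writes items sequentially and records their byte offsets."""
    def __init__(self, path):
        self.path = path
        self.data_file = open(f'{path}.data', 'wb')
        self.offsets = [0]

    def add_item(self, item):
        blob = pickle.dumps(item)
        self.data_file.write(blob)
        self.offsets.append(self.offsets[-1] + len(blob))

    def finalize(self):
        self.data_file.close()
        pickle.dump(self.offsets, open(f'{self.path}.idx', 'wb'))

During preprocessing the builder is called once per utterance and then finalized; during training, items can be fetched in arbitrary order.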

Install Dependencies

# We tested on Linux/Ubuntu 18.04.
# Install Python 3.6+ first (Anaconda recommended).

export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tool
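
As an optional sanity check that the core Python dependencies import correctly (a minimal sketch, assuming the virtual env above is active; check_env.py is a hypothetical helper, not part of the repo):

# check_env.py -- hypothetical helper, not part of the repo
import torch, numpy, librosa
print('torch', torch.__version__, 'CUDA available:', torch.cuda.is_available())
print('numpy', numpy.__version__, 'librosa', librosa.__version__)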

Documents

Citation

If you find this useful for your research, please cite the following papers:

  • PortaSpeech
@article{ren2021portaspeech,
  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
  • DiffSpeech
@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
 }

Acknowledgments

Our code is influenced by the following repos:

Comments
  • setting `uv = f0 = 0` before normalize?


    Hey guys, I found that you set uv = f0 = 0 at line 57; what's the intention behind this? https://github.com/NATSpeech/NATSpeech/blob/aef3aa8899c82e40a28e4f59d559b46b18ba87e8/utils/audio/pitch/utils.py#L52-L58

    BTW, I found the same issue in DiffSinger: https://github.com/MoonInTheRiver/DiffSinger/issues/47

    opened by Cescfangs 1
  • a question about code?


    class FFTBlocks(nn.Module):
        def __init__(self, hidden_size, num_layers, ffn_kernel_size=9, dropout=0.0,
                     num_heads=2, use_pos_embed=True, use_last_norm=True,
                     use_pos_embed_alpha=True):
            super().__init__()
            self.num_layers = num_layers
            embed_dim = self.hidden_size = hidden_size
            self.dropout = dropout
            self.use_pos_embed = use_pos_embed
            self.use_last_norm = use_last_norm
            if use_pos_embed:
                self.max_source_positions = DEFAULT_MAX_TARGET_POSITIONS
                self.padding_idx = 0
                self.pos_embed_alpha = nn.Parameter(torch.Tensor([1])) if use_pos_embed_alpha else 1
                self.embed_positions = SinusoidalPositionalEmbedding(
                    embed_dim, self.padding_idx, init_size=DEFAULT_MAX_TARGET_POSITIONS,
                )

            self.layers = nn.ModuleList([])
            self.layers.extend([
                TransformerEncoderLayer(self.hidden_size, self.dropout,
                                        kernel_size=ffn_kernel_size, num_heads=num_heads)
                for _ in range(self.num_layers)
            ])
            if self.use_last_norm:
                self.layer_norm = nn.LayerNorm(embed_dim)
            else:
                self.layer_norm = None

        def forward(self, x, padding_mask=None, attn_mask=None, return_hiddens=False):
            """
            :param x: [B, T, C]
            :param padding_mask: [B, T]
            :return: [B, T, C] or [L, B, T, C]
            """
            padding_mask = x.abs().sum(-1).eq(0).data if padding_mask is None else padding_mask
            nonpadding_mask_TB = 1 - padding_mask.transpose(0, 1).float()[:, :, None]  # [T, B, 1]
            if self.use_pos_embed:
                positions = self.pos_embed_alpha * self.embed_positions(x[..., 0])
                x = x + positions
                x = F.dropout(x, p=self.dropout, training=self.training)
            # B x T x C -> T x B x C
            x = x.transpose(0, 1) * nonpadding_mask_TB
            hiddens = []
            for layer in self.layers:
                x = layer(x, encoder_padding_mask=padding_mask, attn_mask=attn_mask) * nonpadding_mask_TB
                hiddens.append(x)
            if self.use_last_norm:
                x = self.layer_norm(x) * nonpadding_mask_TB
            if return_hiddens:
                x = torch.stack(hiddens, 0)  # [L, T, B, C]
                x = x.transpose(1, 2)  # [L, B, T, C]
            else:
                x = x.transpose(0, 1)  # [B, T, C]
            return x
    class FastSpeechEncoder(FFTBlocks):
        def __init__(self, dict_size, hidden_size=256, num_layers=4, kernel_size=9, num_heads=2,
                     dropout=0.0):
    
            super().__init__(hidden_size, num_layers, kernel_size, num_heads=num_heads,
                             use_pos_embed=False, dropout=dropout)  # use_pos_embed_alpha for compatibility
            self.embed_tokens = Embedding(dict_size, hidden_size, 0)
            self.embed_scale = math.sqrt(hidden_size)
            self.padding_idx = 0
            self.embed_positions = SinusoidalPositionalEmbedding(
                hidden_size, self.padding_idx, init_size=DEFAULT_MAX_TARGET_POSITIONS,
            )
    
        def forward(self, txt_tokens, attn_mask=None):
            """
    
            :param txt_tokens: [B, T]
            :return: {
                'encoder_out': [B x T x C]
            }
            """
            encoder_padding_mask = txt_tokens.eq(self.padding_idx).data
            x = self.forward_embedding(txt_tokens)  # [B, T, H]
            if self.num_layers > 0:
                x = super(FastSpeechEncoder, self).forward(x, encoder_padding_mask, attn_mask=attn_mask)
            return x
    
        def forward_embedding(self, txt_tokens):
            # embed tokens and positions
            x = self.embed_scale * self.embed_tokens(txt_tokens)
            if self.use_pos_embed:
                positions = self.embed_positions(txt_tokens)
                x = x + positions
            x = F.dropout(x, p=self.dropout, training=self.training)
            return x

    I see that the position embedding is used twice in the encoder, and I don't understand the role of the second one (the part I tried to highlight in the code). Can you explain it to me? Looking forward to your reply.

    opened by awmmmm 1
  • Meet error when using MFA align data


    When I try to align the data (LJSpeech), I follow the readme, but when I run python data_gen/tts/runs/train_mfa_align.py --config $CONFIG_NAME I get the following error. Does anyone know how to solve this problem?

    | Unknow hparams:  []
    | Run MFA for ljspeech. Env vars: CORPUS=ljspeech NUM_JOB=10 MFA_OUTPUTS=mfa_outputs MFA_INPUTS=mfa_inputs MFA_CMD=train
    | Training MFA using 10 cores.
    ERROR - There was an error in the run, please see the log.
    DictionaryError:
    
      Error parsing line 0 of data/processed/ljspeech/mfa_dict.txt: Did not find any tabs, please ensure that your 
        dictionary has tabs between words and their pronunciations.
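
    For reference, and as the error message itself states, MFA expects each dictionary line to be the word, a literal tab character (written as <TAB> below), and the pronunciation; illustrative entries:

    HELLO<TAB>HH AH0 L OW1
    WORLD<TAB>W ER1 L D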
    
    opened by yangdongchao 0
  • Need steps required my custom data


    I have audio data and corresponding phoneme data along with syllable boundaries and stress information. How should I tokenize and encode phoneme and syllable level information along with stress information?

    opened by kafan1986 0
  • fix multiprocessing bug


    Hi @RayeRen, today I tried your preprocess.py code on the LJSpeech dataset, and I realized that some processed items in metadata.json don't have the ph_token key, or len(ph.split()) is not equal to len(ph_token). So I checked your code and found the problem at line 123 in utils/commons/multiprocess_utils.py.

    https://github.com/NATSpeech/NATSpeech/blob/e7e68d68f3ee70c8d13a1d689b6d69b79331825d/utils/commons/multiprocess_utils.py#L120-L125

    From my understanding, you tried to return the indices and results in the order they were passed instead of the order they were processed, so I think i_now should be yielded instead of job_i. I also wrote a small snippet to debug your code; you can use it as a reference.

    def test_map_func(idx):
        import time
        time.sleep(0.2)
        return {"number": idx*2, "id": idx}
    
    if __name__ == "__main__":
        args = [{"idx": idx} for idx in range(100)]
        ids = []
        for idx, x in multiprocess_run_tqdm(test_map_func, args):
            # args[idx].update(x)
            ids.append(idx)
        print(ids)
        # print [0, 1, 2, 3, 4, 5, 5, 7, 7, 7, 7, 7, ...] with yield job_i
        # print [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, ...] with yield i_now
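
    To illustrate the proposed fix (a minimal sketch of in-order yielding, not the repo's actual multiprocess_run_tqdm): results arrive in completion order and are buffered until the next expected index i_now becomes contiguous, and it is i_now that gets yielded.

    def ordered_results(completions):
        """completions: iterable of (job_i, result) pairs in completion order.
        Yields (index, result) pairs in submission order."""
        buffer = {}
        i_now = 0
        for job_i, result in completions:
            buffer[job_i] = result
            while i_now in buffer:                  # release every result that is now contiguous
                yield i_now, buffer.pop(i_now)      # yield i_now (submission order), not job_i
                i_now += 1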
    
    opened by leminhnguyen 0
  • Project dependencies may have API risk issues


    Hi, in NATSpeech, inappropriate dependency versioning constraints can cause risks.

    Below are the dependencies and version constraints that the project is using:

    matplotlib
    librosa==0.8.0
    tqdm
    pandas
    numba==0.53.1
    numpy==1.19.2
    scipy==1.3
    PyYAML==5.3.1
    tensorboardX
    pyloudnorm
    setuptools>=41.0.0
    g2p_en
    resemblyzer
    webrtcvad
    tensorboard==2.6.0
    scikit-learn==0.24.1
    scikit-image==0.16.2
    textgrid
    jiwer
    pycwt
    PyWavelets
    praat-parselmouth==0.3.3
    jieba
    einops
    chardet
    

    The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict. The version constraints "no upper bound" and * introduce a risk of missing-API errors, because the latest version of a dependency may remove some APIs.

    After further analysis, in this project the version constraint of dependency tqdm can be changed to >=4.36.0,<=4.64.0, the version constraint of dependency numpy can be changed to >=1.16.0rc1,<=1.18.5, the version constraint of dependency setuptools can be changed to >=51.3.0,<=54.1.1, and the version constraint of dependency scikit-image can be changed to >=0.9.0,<=0.9.3.
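
    In requirements.txt syntax, those suggested ranges read:

    tqdm>=4.36.0,<=4.64.0
    numpy>=1.16.0rc1,<=1.18.5
    setuptools>=51.3.0,<=54.1.1
    scikit-image>=0.9.0,<=0.9.3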

    These modifications would reduce dependency conflicts as much as possible while introducing the latest versions that do not break any API calls in the project.

    The project invokes all of the following methods.

    The calling methods from the tqdm
    tqdm.tqdm
    tqdm.tqdm.set_postfix
    
    The calling methods from the numpy
    numpy.linalg.qr
    numpy.linalg.pinv
    c
    
    The calling methods from the setuptools
    b
    packaging.version.parse
    glob.glob
    
    The calling methods from the scikit-image
    six.iteritems
    
    The calling methods from all methods: a long auto-generated list of every method invocation in the project (omitted here).
    output.pop
    librosa.stft
    self.metrics_to_scalars
    self.dataset_cls
    num_params
    self.forward_style_embed
    modules.vocoder.hifigan.hifigan.generator_loss
    self.n_split.self.n_split.torch.FloatTensor.normal_
    preprocessor.txt_to_ph
    self.in_layers.append
    self.pe.size
    x.size.size
    shutil.rmtree
    eval
    i.self.norm_layers_2
    numpy.load
    skimage.transform.resize
    self.norm
    self.pos_bias_v.q.transpose
    super.on_train_start
    numpy.arange
    torch.cuda.device_count
    attn.size.attn.size.attn.new.bool
    self.get_task_ref.cuda
    self.gamma.view
    q.transpose.transpose
    len
    src_padding.dur.self.length_regulator.detach.float
    txt_tokens.eq.float
    _build_mel_basis
    g.self.dataset.len.torch.randperm.tolist
    y.cpu.numpy
    self.get_task_ref.validation_start
    ph_lengths.append
    lengths.device.maxlen.lengths.len.torch.ones.to.cumsum
    self.conv1d
    numpy.argmin
    txt.split.split
    self.lstm
    utils.commons.trainer.Trainer
    modules.commons.normalizing_flow.res_flow.ResFlow
    filecmp.cmp
    attn_logits.torch.stack.transpose
    q.contiguous.view
    torch.IntTensor
    matplotlib.pyplot.figure
    self.forward_dur
    new_config.items
    layers.EncoderLayer
    utils.nn.seq_utils.weights_nonzero_speech
    x.transpose.self.LayerNorm.super.forward.transpose
    matplotlib.pyplot.gca.twinx
    self.g_prenet.apply
    scores.torch.softmax.masked_fill
    pickle.dumps
    self.dropout
    f.read
    IndexedDatasetBuilder
    resemblyzer.VoiceEncoder.cuda
    mask.torch.cumsum.type_as
    functools.partial
    k.contiguous.view
    self.block_length.scores.torch.ones_like.triu.tril
    torch.distributions.kl_divergence
    numpy.append
    binarization_args.update
    sampler_cls
    self.sin_pos
    montreal_forced_aligner.corpus.align_corpus.AlignableCorpus
    self.predict
    json.load
    uv.to.to
    window_size.window_size.channel._2D_window.expand.contiguous
    SinusoidalPosEmb
    torch.randn
    self.reducer.prepare_for_forward
    yaml.safe_load.update
    sample.cpu.numpy.float
    dur_pred.np.cumsum.astype
    lengths.tolist.unsqueeze
    torch.randperm
    logs.torch.exp.m.view
    torch.nn.InstanceNorm1d
    torch.multiprocessing.spawn
    torch.nn.parallel.distributed.logging.info
    token_mask.long
    wav2spec_dict.astype.astype
    super
    HighwayNetwork
    ConvReluNorm
    modules.commons.nar_tts_modules.PitchPredictor
    run_task
    os.path.normpath.startswith
    utils.commons.indexed_datasets.IndexedDataset
    self.convs1.apply
    pycwt.wavelet.cwt
    query.self.linear_q.view
    self.build_optimizer
    wav.np.abs.max
    webrtcvad.Vad.is_speech
    montreal_forced_aligner.helper.setup_logger.info
    torch.Generator
    outputs.cpu.numpy
    modules.tts.portaspeech.portaspeech_flow.PortaSpeechFlow.eval
    self.self_attn
    montreal_forced_aligner.config.load_basic_align.update_from_args
    self.data_file.close
    torch.cuda.set_device
    super.__getitem__.get
    get_padding
    torch.set_grad_enabled
    k.startswith
    token_gen
    seq_range.unsqueeze.expand
    self.conv_2
    self.p_mean_variance
    utils.commons.tensor_utils.tensors_to_scalars.items
    copy.copy
    modules.commons.layers.LayerNorm
    dur.shape.torch.arange.to
    self.in_proj_v
    tgt_padding_mask.float.size
    list
    timestep.view
    self.module.test_step
    input_lengths.cpu.numpy
    modules.commons.layers.Embedding.sin
    torch.istft
    hparams.DIFF_DECODERS
    torch.nn.AvgPool1d
    montreal_forced_aligner.utils.get_available_lm_languages
    c.self.model.view
    os.environ.get.split
    x.transpose
    modules.vocoder.hifigan.hifigan.feature_loss
    torch.float.num_embeddings.torch.arange.unsqueeze
    torch.nn.parallel.distributed._tree_unflatten_with_rref
    BinarizationError
    x.transpose.self.conv1d.transpose
    matplotlib.pyplot.figure.savefig
    MultiprocessManager.close
    y2word.x2word.long
    numpy.ndim
    utils.commons.ddp_utils.DDP.build_model
    self.Conv1d1x1.super.__init__
    modules.commons.rnn.TacotronEncoder
    pe.unsqueeze.to
    tasks.tts.fs.FastSpeechTask.build_scheduler
    self.stdout.write
    decoded_txt.len.mel2ph.torch.LongTensor.mel2token_to_dur.numpy
    padding_mask.transpose.float
    torch.nn.GroupNorm
    CouplingBlock
    modules.tts.fs2_orig.FastSpeech2Orig
    utils.audio.align.mel2token_to_dur.new_zeros
    super.validation_step
    f.store_inverse
    torch.nn.GRU
    numpy.array
    self.ConvolutionModule.super.__init__
    data_gen.tts.txt_processors.base_text_processor.get_txt_processor_cls
    self.cwt2f0_norm
    lf0_rec.sum
    list.cuda
    torch.nn.MaxPool1d
    window_size.gaussian.unsqueeze.t
    self.load_ckpt
    x_padded.view.view_as
    numpy.concatenate.cpu
    bytes
    self.input_to_batch.max
    self.token_encoder.decode.split
    diagonal_focus_rate.mean
    matplotlib.pyplot.gca
    cls
    AcousticModel2.adaptation_config
    librosa.istft
    spec.transpose.pow
    wav_gt.abs.max
    tasks.tts.tts_utils.parse_dataset_configs
    torch.nn.functional.dropout
    lengths.tolist.max
    img1.get_device
    x.abs.sum.ne
    modules.commons.nar_tts_modules.DurationPredictor
    self.plot_cwt
    re.split
    t_s.self.k_channels.self.n_heads.b.value.view.transpose.view
    mel.exp.sum.sqrt
    param.grad.float.torch.isnan.any
    txt_processor.process
    torch.autograd.profiler.record_function
    self.head_dim.self.num_heads.bsz.tgt_len.q.contiguous.view.transpose.contiguous
    getattr
    x_sqz.permute.contiguous.view.permute
    txt.cls.preprocess_text.strip
    self.num_heads.attn_mask.repeat.reshape
    utils.commons.ddp_utils.DDP.configure_optimizers
    self.W1.bias.data.fill_
    utils.nn.seq_utils.get_incremental_state
    p.isalpha
    numpy.round
    self.build_spk_map
    MultiHeadAttention
    win_size.torch.hann_window.to
    t.float.reshape
    self.run_text_encoder
    make_pad_mask
    SinusoidalPositionalEmbedding
    modules.tts.portaspeech.portaspeech.PortaSpeech
    self.highways.append
    librosa.filters.mel
    param.grad.float
    ValueError
    f0.clamp.log
    x.transpose.self.w_1.torch.relu.transpose
    wav_pred.view.cpu.float.numpy
    montreal_forced_aligner.command_line.train_dictionary.run_train_dictionary
    abs
    numpy.fromiter
    prior_dist.log_prob
    max_len.torch.arange.to
    self._orig_exit
    c.transpose.transpose
    self.SpectralConvergengeLoss.super.__init__
    EnG2p
    self._sync_buffers
    IndexedDatasetBuilder.add_item
    latent_shape.torch.randn.to
    self.out_proj.transpose
    k.self.training_losses_meter.update
    spec.abs.sum
    torch.full
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • How to solve find_unused_parameters warning?

    How to solve find_unused_parameters warning?

    Dear author,

    I am trying to train FastSpeech and FastSpeech 2 with the shared code.

    After "sanity val", the find_unused_parameter warning is appeared like below images. image

    This warning does not stop the code from running, but it causes a memory issue after about 50,000 steps when I set find_unused_parameters=False. Is there any solution to this issue?
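    For reference, find_unused_parameters is a flag of torch.nn.parallel.DistributedDataParallel: with True, DDP scans the autograd graph every iteration for parameters that received no gradient, which costs extra time and memory; with False, that scan is skipped, but training errors out if any registered parameter really does go unused in some forward pass. The toy sketch below (illustrative only; TwoBranch and use_extra are made-up names, not this repo's code) shows the kind of module that forces the choice:

        import torch.nn as nn

        class TwoBranch(nn.Module):
            """Toy module in which one branch's parameters may receive no gradient."""
            def __init__(self):
                super().__init__()
                self.used = nn.Linear(8, 8)
                self.maybe_unused = nn.Linear(8, 8)  # e.g. a branch gated by a config flag

            def forward(self, x, use_extra=False):
                y = self.used(x)
                if use_extra:  # while this stays False, maybe_unused never gets gradients
                    y = y + self.maybe_unused(x)
                return y

        model = TwoBranch()
        # With an initialized process group, the flag is passed at wrapping time:
        # ddp_model = nn.parallel.DistributedDataParallel(
        #     model.cuda(), device_ids=[local_rank], find_unused_parameters=True)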

    opened by jisang93 0
  • unable to open shared memory object  in read-write mode

    unable to open shared memory object in read-write mode

    When I run binarize.py, it returns this error:

    Traceback (most recent call last):
      File "/home/jiangbingyu/miniconda3/envs/synta/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
        obj = _ForkingPickler.dumps(obj)
      File "/home/jiangbingyu/miniconda3/envs/synta/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
      File "/home/jiangbingyu/miniconda3/envs/synta/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
        metadata = storage._share_filename_()
    RuntimeError: unable to open shared memory object </torch_14375_33168349> in read-write mode
    

    I have tried several values of num_workers > 0, and all of them produce this error. If I set num_workers=0, the program hangs.

    So, how should I deal with this?
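    A workaround that is commonly suggested for this family of shared-memory failures in PyTorch worker processes (a general remedy, not something taken from this repo) is to switch the tensor sharing strategy from file descriptors to the file system, and optionally to raise the process's open-file limit:

        import resource
        import torch.multiprocessing as mp

        # The default "file_descriptor" strategy can exhaust file descriptors when many
        # workers exchange tensors; "file_system" uses named files in /dev/shm instead.
        mp.set_sharing_strategy('file_system')

        # Optionally raise the soft open-file limit up to the hard limit.
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))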

    opened by leon2milan 0
  • checkpoint is loaded twice

    checkpoint is loaded twice

    It seems that the checkpoint is loaded twice.

    The checkpoint is loaded according to the path specified by hparams['load_checkpoint']:
    https://github.com/NATSpeech/NATSpeech/blob/238165e8cd430531b69c484cabb032c1313ee73b/utils/commons/trainer.py#L150

    However, the checkpoint is loaded again here: https://github.com/NATSpeech/NATSpeech/blob/238165e8cd430531b69c484cabb032c1313ee73b/utils/commons/trainer.py#L153-L155

    This can overwrite the model parameters with the last checkpoint, which makes hparams['load_checkpoint'] useless.
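    If the behavior is as described, one possible shape of a fix is to return early once the explicitly requested checkpoint has been restored, so that the auto-resumed last checkpoint can no longer overwrite it. A minimal sketch under that assumption (load_ckpt and find_last_checkpoint below are placeholders, not the repo's actual helpers):

        def load_ckpt(task, path):
            # placeholder: restore model/optimizer weights from `path` into `task`
            print(f"restoring weights from {path}")

        def find_last_checkpoint(work_dir):
            # placeholder: return the newest auto-saved checkpoint path, or None
            return None

        def restore_weights(task, hparams):
            explicit_ckpt = hparams.get('load_checkpoint', '')
            if explicit_ckpt != '':
                # An explicit checkpoint was requested: load it and stop here.
                load_ckpt(task, explicit_ckpt)
                return
            last_ckpt = find_last_checkpoint(hparams['work_dir'])
            if last_ckpt is not None:
                load_ckpt(task, last_ckpt)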

    opened by unrea1-sama 0
  • How to compute negative log-likelihood?

    How to compute negative log-likelihood?

    Dear author,

    In your code, you use `prior_dist = dist.Normal(0, 1)` to define a standard normal distribution and `-prior_dist.log_prob(z_postflow).mean()` to compute the NLL (z_postflow is the training output of the post-net). However, I don't understand how this method works.

    Moreover, VITS uses `torch.sum(0.5 * (math.log(2*math.pi)+(z**2)) * x_mask, [1, 2])` to compute the NLL, while WaveGlow uses `z ** 2 / (2 * sigma**2)`. My second question is how their methods differ from yours.

    Wish you an early reply. Thank you.
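    For context on the question above: the log-density of a standard normal is log N(z; 0, 1) = -0.5 * (log 2π + z²), so the `dist.Normal(0, 1).log_prob` formulation and the VITS-style expression compute the same quantity up to whether a mean or a masked sum is taken, while WaveGlow's z² / (2σ²) drops the constant term (which has zero gradient) and keeps a general σ. A small numerical check, illustrative only and not code from any of these repositories:

        import math
        import torch
        import torch.distributions as dist

        z = torch.randn(4, 80, 100)  # stand-in for z_postflow

        # Negative log-density under a standard normal, averaged over all elements.
        nll_a = -dist.Normal(0.0, 1.0).log_prob(z).mean()

        # VITS-style closed form of the same density, averaged the same way.
        nll_b = (0.5 * (math.log(2 * math.pi) + z ** 2)).mean()

        print(torch.allclose(nll_a, nll_b))  # True: identical up to floating point

        # WaveGlow-style z**2 / (2 * sigma**2) omits the 0.5*log(2*pi) constant
        # and rescales by sigma; for sigma = 1 it differs only by that constant.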
    
    opened by hongchengzhu 0
Releases
  pretrained_models

Owner
  Advanced Non-Autoregressive Text-to-Speech Research