PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

Overview




This repo contains the official PyTorch implementation of:

  • PortaSpeech: Portable and High-Quality Generative Text-to-Speech (NeurIPS 2021)
  • DiffSpeech (DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism)

Key Features

We implement the following features in this framework:

  • Data processing for non-autoregressive Text-to-Speech using Montreal Forced Aligner.
  • Convenient and scalable framework for training and inference.
  • Simple but efficient random-access dataset implementation (a sketch of the idea follows below).
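
A minimal sketch of the random-access (indexed) dataset idea, illustrative only; the repo's actual implementation lives in utils/commons/indexed_datasets.py and may differ in detail. Each item is pickled into a single data file while its byte offset is recorded, so any item can be read directly without loading the whole dataset:

import pickle

class SimpleIndexedDataset:
    """Illustrative random-access dataset: one pickled blob per item, indexed by byte offsets."""
    def __init__(self, path):
        # offsets[i] .. offsets[i + 1] delimit the i-th pickled item in the data file
        self.offsets = pickle.load(open(f'{path}.idx', 'rb'))
        self.data_file = open(f'{path}.data', 'rb')

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        self.data_file.seek(self.offsets[i])  # jump straight to item i
        return pickle.loads(self.data_file.read(self.offsets[i + 1] - self.offsets[i]))

class SimpleIndexedDatasetBuilder:
    """Writes items sequentially and records their byte offsets."""
    def __init__(self, path):
        self.path = path
        self.data_file = open(f'{path}.data', 'wb')
        self.offsets = [0]

    def add_item(self, item):
        blob = pickle.dumps(item)
        self.data_file.write(blob)
        self.offsets.append(self.offsets[-1] + len(blob))

    def finalize(self):
        self.data_file.close()
        pickle.dump(self.offsets, open(f'{self.path}.idx', 'wb'))

During preprocessing the builder is called once per utterance and then finalized; during training, items can be fetched in arbitrary order.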

Install Dependencies

# We tested on Linux/Ubuntu 18.04.
# Install Python 3.6+ first (Anaconda recommended).

export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tool
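
As an optional sanity check that the core Python dependencies import correctly (a minimal sketch, assuming the virtual env above is active; check_env.py is a hypothetical helper, not part of the repo):

# check_env.py -- hypothetical helper, not part of the repo
import torch, numpy, librosa
print('torch', torch.__version__, 'CUDA available:', torch.cuda.is_available())
print('numpy', numpy.__version__, 'librosa', librosa.__version__)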

Documents

Citation

If you find this useful for your research, please cite the following papers:

  • PortaSpeech
@article{ren2021portaspeech,
  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
  • DiffSpeech
@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
 }

Acknowledgments

Our code is influenced by the following repos:

Comments
  • setting `uv = f0 = 0` before normalize?


    Hey guys, I found that you set uv = f0 = 0 at line 57; what's the intention behind this? https://github.com/NATSpeech/NATSpeech/blob/aef3aa8899c82e40a28e4f59d559b46b18ba87e8/utils/audio/pitch/utils.py#L52-L58

    BTW, I found the same issue in DiffSinger: https://github.com/MoonInTheRiver/DiffSinger/issues/47

    opened by Cescfangs 1
  • a question about code?


    class FFTBlocks(nn.Module):
        def __init__(self, hidden_size, num_layers, ffn_kernel_size=9, dropout=0.0,
                     num_heads=2, use_pos_embed=True, use_last_norm=True,
                     use_pos_embed_alpha=True):
            super().__init__()
            self.num_layers = num_layers
            embed_dim = self.hidden_size = hidden_size
            self.dropout = dropout
            self.use_pos_embed = use_pos_embed
            self.use_last_norm = use_last_norm
            if use_pos_embed:
                self.max_source_positions = DEFAULT_MAX_TARGET_POSITIONS
                self.padding_idx = 0
                self.pos_embed_alpha = nn.Parameter(torch.Tensor([1])) if use_pos_embed_alpha else 1
                self.embed_positions = SinusoidalPositionalEmbedding(
                    embed_dim, self.padding_idx, init_size=DEFAULT_MAX_TARGET_POSITIONS,
                )

            self.layers = nn.ModuleList([])
            self.layers.extend([
                TransformerEncoderLayer(self.hidden_size, self.dropout,
                                        kernel_size=ffn_kernel_size, num_heads=num_heads)
                for _ in range(self.num_layers)
            ])
            if self.use_last_norm:
                self.layer_norm = nn.LayerNorm(embed_dim)
            else:
                self.layer_norm = None

        def forward(self, x, padding_mask=None, attn_mask=None, return_hiddens=False):
            """
            :param x: [B, T, C]
            :param padding_mask: [B, T]
            :return: [B, T, C] or [L, B, T, C]
            """
            padding_mask = x.abs().sum(-1).eq(0).data if padding_mask is None else padding_mask
            nonpadding_mask_TB = 1 - padding_mask.transpose(0, 1).float()[:, :, None]  # [T, B, 1]
            if self.use_pos_embed:
                positions = self.pos_embed_alpha * self.embed_positions(x[..., 0])
                x = x + positions
                x = F.dropout(x, p=self.dropout, training=self.training)
            # B x T x C -> T x B x C
            x = x.transpose(0, 1) * nonpadding_mask_TB
            hiddens = []
            for layer in self.layers:
                x = layer(x, encoder_padding_mask=padding_mask, attn_mask=attn_mask) * nonpadding_mask_TB
                hiddens.append(x)
            if self.use_last_norm:
                x = self.layer_norm(x) * nonpadding_mask_TB
            if return_hiddens:
                x = torch.stack(hiddens, 0)  # [L, T, B, C]
                x = x.transpose(1, 2)  # [L, B, T, C]
            else:
                x = x.transpose(0, 1)  # [B, T, C]
            return x
    class FastSpeechEncoder(FFTBlocks):
        def __init__(self, dict_size, hidden_size=256, num_layers=4, kernel_size=9, num_heads=2,
                     dropout=0.0):
    
            super().__init__(hidden_size, num_layers, kernel_size, num_heads=num_heads,
                             use_pos_embed=False, dropout=dropout)  # use_pos_embed_alpha for compatibility
            self.embed_tokens = Embedding(dict_size, hidden_size, 0)
            self.embed_scale = math.sqrt(hidden_size)
            self.padding_idx = 0
            self.embed_positions = SinusoidalPositionalEmbedding(
                hidden_size, self.padding_idx, init_size=DEFAULT_MAX_TARGET_POSITIONS,
            )
    
        def forward(self, txt_tokens, attn_mask=None):
            """
    
            :param txt_tokens: [B, T]
            :return: {
                'encoder_out': [B x T x C]
            }
            """
            encoder_padding_mask = txt_tokens.eq(self.padding_idx).data
            x = self.forward_embedding(txt_tokens)  # [B, T, H]
            if self.num_layers > 0:
                x = super(FastSpeechEncoder, self).forward(x, encoder_padding_mask, attn_mask=attn_mask)
            return x
    
        def forward_embedding(self, txt_tokens):
            # embed tokens and positions
            x = self.embed_scale * self.embed_tokens(txt_tokens)
            if self.use_pos_embed:
                positions = self.embed_positions(txt_tokens)
                x = x + positions
            x = F.dropout(x, p=self.dropout, training=self.training)
            return x

    I see that the position embedding is used twice in the encoder, and I don't understand the role of the second one (the part I tried to highlight in the code). Can you explain it to me? Looking forward to your reply.

    opened by awmmmm 1
  • Meet error when using MFA align data


    When I try to align the data (LJSpeech), I follow the readme, but when I run python data_gen/tts/runs/train_mfa_align.py --config $CONFIG_NAME I get the following error. Does anyone know how to solve this problem?

    | Unknow hparams:  []
    | Run MFA for ljspeech. Env vars: CORPUS=ljspeech NUM_JOB=10 MFA_OUTPUTS=mfa_outputs MFA_INPUTS=mfa_inputs MFA_CMD=train
    | Training MFA using 10 cores.
    ERROR - There was an error in the run, please see the log.
    DictionaryError:
    
      Error parsing line 0 of data/processed/ljspeech/mfa_dict.txt: Did not find any tabs, please ensure that your 
        dictionary has tabs between words and their pronunciations.
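
    For reference, and as the error message itself states, MFA expects each dictionary line to be the word, a literal tab character (written as <TAB> below), and the pronunciation; illustrative entries:

    HELLO<TAB>HH AH0 L OW1
    WORLD<TAB>W ER1 L D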
    
    opened by yangdongchao 0
  • Need steps required my custom data


    I have audio data and corresponding phoneme data along with syllable boundaries and stress information. How should I tokenize and encode phoneme and syllable level information along with stress information?

    opened by kafan1986 0
  • fix multiprocessing bug


    Hi @RayeRen, today I tried your preprocess.py code on the LJSpeech dataset, and I realized that some processed items in metadata.json don't have the ph_token key, or len(ph.split()) is not equal to len(ph_token). So I checked your code and found the problem at line 123 in utils/commons/multiprocess_utils.py.

    https://github.com/NATSpeech/NATSpeech/blob/e7e68d68f3ee70c8d13a1d689b6d69b79331825d/utils/commons/multiprocess_utils.py#L120-L125

    From my understanding, you tried to return the indices and results in the order they were passed instead of the order they were processed, so I think i_now should be yielded instead of job_i. I also wrote a small snippet to debug your code; you can use it as a reference.

    def test_map_func(idx):
        import time
        time.sleep(0.2)
        return {"number": idx*2, "id": idx}
    
    if __name__ == "__main__":
        args = [{"idx": idx} for idx in range(100)]
        ids = []
        for idx, x in multiprocess_run_tqdm(test_map_func, args):
            # args[idx].update(x)
            ids.append(idx)
        print(ids)
        # print [0, 1, 2, 3, 4, 5, 5, 7, 7, 7, 7, 7, ...] with yield job_i
        # print [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, ...] with yield i_now
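
    To illustrate the proposed fix (a minimal sketch of in-order yielding, not the repo's actual multiprocess_run_tqdm): results arrive in completion order and are buffered until the next expected index i_now becomes contiguous, and it is i_now that gets yielded.

    def ordered_results(completions):
        """completions: iterable of (job_i, result) pairs in completion order.
        Yields (index, result) pairs in submission order."""
        buffer = {}
        i_now = 0
        for job_i, result in completions:
            buffer[job_i] = result
            while i_now in buffer:                  # release every result that is now contiguous
                yield i_now, buffer.pop(i_now)      # yield i_now (submission order), not job_i
                i_now += 1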
    
    opened by leminhnguyen 0
  • Project dependencies may have API risk issues


    Hi, in NATSpeech, inappropriate dependency versioning constraints can cause risks.

    Below are the dependencies and version constraints that the project is using:

    matplotlib
    librosa==0.8.0
    tqdm
    pandas
    numba==0.53.1
    numpy==1.19.2
    scipy==1.3
    PyYAML==5.3.1
    tensorboardX
    pyloudnorm
    setuptools>=41.0.0
    g2p_en
    resemblyzer
    webrtcvad
    tensorboard==2.6.0
    scikit-learn==0.24.1
    scikit-image==0.16.2
    textgrid
    jiwer
    pycwt
    PyWavelets
    praat-parselmouth==0.3.3
    jieba
    einops
    chardet
    

    The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict. The version constraints "no upper bound" and * introduce a risk of missing-API errors, because the latest version of a dependency may remove some APIs.

    After further analysis, in this project the version constraint of dependency tqdm can be changed to >=4.36.0,<=4.64.0, the version constraint of dependency numpy can be changed to >=1.16.0rc1,<=1.18.5, the version constraint of dependency setuptools can be changed to >=51.3.0,<=54.1.1, and the version constraint of dependency scikit-image can be changed to >=0.9.0,<=0.9.3.
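
    In requirements.txt syntax, those suggested ranges read:

    tqdm>=4.36.0,<=4.64.0
    numpy>=1.16.0rc1,<=1.18.5
    setuptools>=51.3.0,<=54.1.1
    scikit-image>=0.9.0,<=0.9.3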

    These modifications would reduce dependency conflicts as much as possible while introducing the latest versions that do not break any API calls in the project.

    The project invokes all of the following methods.

    The calling methods from the tqdm
    tqdm.tqdm
    tqdm.tqdm.set_postfix
    
    The calling methods from the numpy
    numpy.linalg.qr
    numpy.linalg.pinv
    c
    
    The calling methods from the setuptools
    b
    packaging.version.parse
    glob.glob
    
    The calling methods from the scikit-image
    six.iteritems
    
    The calling methods from all methods: a long auto-generated list of every method invocation in the project (omitted here).
    output.pop
    librosa.stft
    self.metrics_to_scalars
    self.dataset_cls
    num_params
    self.forward_style_embed
    modules.vocoder.hifigan.hifigan.generator_loss
    self.n_split.self.n_split.torch.FloatTensor.normal_
    preprocessor.txt_to_ph
    self.in_layers.append
    self.pe.size
    x.size.size
    shutil.rmtree
    eval
    i.self.norm_layers_2
    numpy.load
    skimage.transform.resize
    self.norm
    self.pos_bias_v.q.transpose
    super.on_train_start
    numpy.arange
    torch.cuda.device_count
    attn.size.attn.size.attn.new.bool
    self.get_task_ref.cuda
    self.gamma.view
    q.transpose.transpose
    len
    src_padding.dur.self.length_regulator.detach.float
    txt_tokens.eq.float
    _build_mel_basis
    g.self.dataset.len.torch.randperm.tolist
    y.cpu.numpy
    self.get_task_ref.validation_start
    ph_lengths.append
    lengths.device.maxlen.lengths.len.torch.ones.to.cumsum
    self.conv1d
    numpy.argmin
    txt.split.split
    self.lstm
    utils.commons.trainer.Trainer
    modules.commons.normalizing_flow.res_flow.ResFlow
    filecmp.cmp
    attn_logits.torch.stack.transpose
    q.contiguous.view
    torch.IntTensor
    matplotlib.pyplot.figure
    self.forward_dur
    new_config.items
    layers.EncoderLayer
    utils.nn.seq_utils.weights_nonzero_speech
    x.transpose.self.LayerNorm.super.forward.transpose
    matplotlib.pyplot.gca.twinx
    self.g_prenet.apply
    scores.torch.softmax.masked_fill
    pickle.dumps
    self.dropout
    f.read
    IndexedDatasetBuilder
    resemblyzer.VoiceEncoder.cuda
    mask.torch.cumsum.type_as
    functools.partial
    k.contiguous.view
    self.block_length.scores.torch.ones_like.triu.tril
    torch.distributions.kl_divergence
    numpy.append
    binarization_args.update
    sampler_cls
    self.sin_pos
    montreal_forced_aligner.corpus.align_corpus.AlignableCorpus
    self.predict
    json.load
    uv.to.to
    window_size.window_size.channel._2D_window.expand.contiguous
    SinusoidalPosEmb
    torch.randn
    self.reducer.prepare_for_forward
    yaml.safe_load.update
    sample.cpu.numpy.float
    dur_pred.np.cumsum.astype
    lengths.tolist.unsqueeze
    torch.randperm
    logs.torch.exp.m.view
    torch.nn.InstanceNorm1d
    torch.multiprocessing.spawn
    torch.nn.parallel.distributed.logging.info
    token_mask.long
    wav2spec_dict.astype.astype
    super
    HighwayNetwork
    ConvReluNorm
    modules.commons.nar_tts_modules.PitchPredictor
    run_task
    os.path.normpath.startswith
    utils.commons.indexed_datasets.IndexedDataset
    self.convs1.apply
    pycwt.wavelet.cwt
    query.self.linear_q.view
    self.build_optimizer
    wav.np.abs.max
    webrtcvad.Vad.is_speech
    montreal_forced_aligner.helper.setup_logger.info
    torch.Generator
    outputs.cpu.numpy
    modules.tts.portaspeech.portaspeech_flow.PortaSpeechFlow.eval
    self.self_attn
    montreal_forced_aligner.config.load_basic_align.update_from_args
    self.data_file.close
    torch.cuda.set_device
    super.__getitem__.get
    get_padding
    torch.set_grad_enabled
    k.startswith
    token_gen
    seq_range.unsqueeze.expand
    self.conv_2
    self.p_mean_variance
    utils.commons.tensor_utils.tensors_to_scalars.items
    copy.copy
    modules.commons.layers.LayerNorm
    dur.shape.torch.arange.to
    self.in_proj_v
    tgt_padding_mask.float.size
    list
    timestep.view
    self.module.test_step
    input_lengths.cpu.numpy
    modules.commons.layers.Embedding.sin
    torch.istft
    hparams.DIFF_DECODERS
    torch.nn.AvgPool1d
    montreal_forced_aligner.utils.get_available_lm_languages
    c.self.model.view
    os.environ.get.split
    x.transpose
    modules.vocoder.hifigan.hifigan.feature_loss
    torch.float.num_embeddings.torch.arange.unsqueeze
    torch.nn.parallel.distributed._tree_unflatten_with_rref
    BinarizationError
    x.transpose.self.conv1d.transpose
    matplotlib.pyplot.figure.savefig
    MultiprocessManager.close
    y2word.x2word.long
    numpy.ndim
    utils.commons.ddp_utils.DDP.build_model
    self.Conv1d1x1.super.__init__
    modules.commons.rnn.TacotronEncoder
    pe.unsqueeze.to
    tasks.tts.fs.FastSpeechTask.build_scheduler
    self.stdout.write
    decoded_txt.len.mel2ph.torch.LongTensor.mel2token_to_dur.numpy
    padding_mask.transpose.float
    torch.nn.GroupNorm
    CouplingBlock
    modules.tts.fs2_orig.FastSpeech2Orig
    utils.audio.align.mel2token_to_dur.new_zeros
    super.validation_step
    f.store_inverse
    torch.nn.GRU
    numpy.array
    self.ConvolutionModule.super.__init__
    data_gen.tts.txt_processors.base_text_processor.get_txt_processor_cls
    self.cwt2f0_norm
    lf0_rec.sum
    list.cuda
    torch.nn.MaxPool1d
    window_size.gaussian.unsqueeze.t
    self.load_ckpt
    x_padded.view.view_as
    numpy.concatenate.cpu
    bytes
    self.input_to_batch.max
    self.token_encoder.decode.split
    diagonal_focus_rate.mean
    matplotlib.pyplot.gca
    cls
    AcousticModel2.adaptation_config
    librosa.istft
    spec.transpose.pow
    wav_gt.abs.max
    tasks.tts.tts_utils.parse_dataset_configs
    torch.nn.functional.dropout
    lengths.tolist.max
    img1.get_device
    x.abs.sum.ne
    modules.commons.nar_tts_modules.DurationPredictor
    self.plot_cwt
    re.split
    t_s.self.k_channels.self.n_heads.b.value.view.transpose.view
    mel.exp.sum.sqrt
    param.grad.float.torch.isnan.any
    txt_processor.process
    torch.autograd.profiler.record_function
    self.head_dim.self.num_heads.bsz.tgt_len.q.contiguous.view.transpose.contiguous
    getattr
    x_sqz.permute.contiguous.view.permute
    txt.cls.preprocess_text.strip
    self.num_heads.attn_mask.repeat.reshape
    utils.commons.ddp_utils.DDP.configure_optimizers
    self.W1.bias.data.fill_
    utils.nn.seq_utils.get_incremental_state
    p.isalpha
    numpy.round
    self.build_spk_map
    MultiHeadAttention
    win_size.torch.hann_window.to
    t.float.reshape
    self.run_text_encoder
    make_pad_mask
    SinusoidalPositionalEmbedding
    modules.tts.portaspeech.portaspeech.PortaSpeech
    self.highways.append
    librosa.filters.mel
    param.grad.float
    ValueError
    f0.clamp.log
    x.transpose.self.w_1.torch.relu.transpose
    wav_pred.view.cpu.float.numpy
    montreal_forced_aligner.command_line.train_dictionary.run_train_dictionary
    abs
    numpy.fromiter
    prior_dist.log_prob
    max_len.torch.arange.to
    self._orig_exit
    c.transpose.transpose
    self.SpectralConvergengeLoss.super.__init__
    EnG2p
    self._sync_buffers
    IndexedDatasetBuilder.add_item
    latent_shape.torch.randn.to
    self.out_proj.transpose
    k.self.training_losses_meter.update
    spec.abs.sum
    torch.full
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • How to solve find_unused_parameters warning?

    How to solve find_unused_parameters warning?

    Dear author,

    I am trying to train FastSpeech and FastSpeech 2 with the shared code.

    After "sanity val", the find_unused_parameter warning is appeared like below images. image

    This warning does not stop the code from running, but it causes a memory issue after about 50,000 steps when I set find_unused_parameters=False. Is there any solution to this issue?
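    For reference, find_unused_parameters is a flag of torch.nn.parallel.DistributedDataParallel: with True, DDP scans the autograd graph every iteration for parameters that received no gradient, which costs extra time and memory; with False, that scan is skipped, but training errors out if any registered parameter really does go unused in some forward pass. The toy sketch below (illustrative only; TwoBranch and use_extra are made-up names, not this repo's code) shows the kind of module that forces the choice:

        import torch.nn as nn

        class TwoBranch(nn.Module):
            """Toy module in which one branch's parameters may receive no gradient."""
            def __init__(self):
                super().__init__()
                self.used = nn.Linear(8, 8)
                self.maybe_unused = nn.Linear(8, 8)  # e.g. a branch gated by a config flag

            def forward(self, x, use_extra=False):
                y = self.used(x)
                if use_extra:  # while this stays False, maybe_unused never gets gradients
                    y = y + self.maybe_unused(x)
                return y

        model = TwoBranch()
        # With an initialized process group, the flag is passed at wrapping time:
        # ddp_model = nn.parallel.DistributedDataParallel(
        #     model.cuda(), device_ids=[local_rank], find_unused_parameters=True)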

    opened by jisang93 0
  • unable to open shared memory object  in read-write mode

    unable to open shared memory object in read-write mode

    When I run binarize.py, it returns this error:

    Traceback (most recent call last):
      File "/home/jiangbingyu/miniconda3/envs/synta/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
        obj = _ForkingPickler.dumps(obj)
      File "/home/jiangbingyu/miniconda3/envs/synta/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
      File "/home/jiangbingyu/miniconda3/envs/synta/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
        metadata = storage._share_filename_()
    RuntimeError: unable to open shared memory object </torch_14375_33168349> in read-write mode
    

    I have tried several values of num_workers > 0, and all of them produce this error. If I set num_workers=0, the program hangs.

    So, how should I deal with this?
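    A workaround that is commonly suggested for this family of shared-memory failures in PyTorch worker processes (a general remedy, not something taken from this repo) is to switch the tensor sharing strategy from file descriptors to the file system, and optionally to raise the process's open-file limit:

        import resource
        import torch.multiprocessing as mp

        # The default "file_descriptor" strategy can exhaust file descriptors when many
        # workers exchange tensors; "file_system" uses named files in /dev/shm instead.
        mp.set_sharing_strategy('file_system')

        # Optionally raise the soft open-file limit up to the hard limit.
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))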

    opened by leon2milan 0
  • checkpoint is loaded twice

    checkpoint is loaded twice

    It seems that the checkpoint is loaded twice.

    The checkpoint is loaded according to the path specified by hparams['load_checkpoint']:
    https://github.com/NATSpeech/NATSpeech/blob/238165e8cd430531b69c484cabb032c1313ee73b/utils/commons/trainer.py#L150

    However, the checkpoint is loaded again here: https://github.com/NATSpeech/NATSpeech/blob/238165e8cd430531b69c484cabb032c1313ee73b/utils/commons/trainer.py#L153-L155

    This can overwrite the model parameters with the last checkpoint, which makes hparams['load_checkpoint'] useless.
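    If the behavior is as described, one possible shape of a fix is to return early once the explicitly requested checkpoint has been restored, so that the auto-resumed last checkpoint can no longer overwrite it. A minimal sketch under that assumption (load_ckpt and find_last_checkpoint below are placeholders, not the repo's actual helpers):

        def load_ckpt(task, path):
            # placeholder: restore model/optimizer weights from `path` into `task`
            print(f"restoring weights from {path}")

        def find_last_checkpoint(work_dir):
            # placeholder: return the newest auto-saved checkpoint path, or None
            return None

        def restore_weights(task, hparams):
            explicit_ckpt = hparams.get('load_checkpoint', '')
            if explicit_ckpt != '':
                # An explicit checkpoint was requested: load it and stop here.
                load_ckpt(task, explicit_ckpt)
                return
            last_ckpt = find_last_checkpoint(hparams['work_dir'])
            if last_ckpt is not None:
                load_ckpt(task, last_ckpt)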

    opened by unrea1-sama 0
  • How to compute negative log-likelihood?

    How to compute negative log-likelihood?

    Dear author,

    In your code, you use `prior_dist = dist.Normal(0, 1)` to define a standard normal distribution and `-prior_dist.log_prob(z_postflow).mean()` to compute the NLL (z_postflow is the training output of the post-net). However, I don't understand how this method works.

    Moreover, VITS uses `torch.sum(0.5 * (math.log(2*math.pi)+(z**2)) * x_mask, [1, 2])` to compute the NLL, while WaveGlow uses `z ** 2 / (2 * sigma**2)`. My second question is how their methods differ from yours.

    Wish you an early reply. Thank you.
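    For context on the question above: the log-density of a standard normal is log N(z; 0, 1) = -0.5 * (log 2π + z²), so the `dist.Normal(0, 1).log_prob` formulation and the VITS-style expression compute the same quantity up to whether a mean or a masked sum is taken, while WaveGlow's z² / (2σ²) drops the constant term (which has zero gradient) and keeps a general σ. A small numerical check, illustrative only and not code from any of these repositories:

        import math
        import torch
        import torch.distributions as dist

        z = torch.randn(4, 80, 100)  # stand-in for z_postflow

        # Negative log-density under a standard normal, averaged over all elements.
        nll_a = -dist.Normal(0.0, 1.0).log_prob(z).mean()

        # VITS-style closed form of the same density, averaged the same way.
        nll_b = (0.5 * (math.log(2 * math.pi) + z ** 2)).mean()

        print(torch.allclose(nll_a, nll_b))  # True: identical up to floating point

        # WaveGlow-style z**2 / (2 * sigma**2) omits the 0.5*log(2*pi) constant
        # and rescales by sigma; for sigma = 1 it differs only by that constant.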
    
    opened by hongchengzhu 0
Releases
  pretrained_models

Owner
  Advanced Non-Autoregressive Text-to-Speech Research