Hello,
I think "unispeech_sat.th" is wrong.
I have just cloned the repository and tried speaker verification with UniSpeech-SAT. When I launch the example:
python verification.py --model_name unispeech_sat --wav1 vox1_data/David_Faustino/hn8GyCJIfLM_0000012.wav --wav2 vox1_data/Josh_Gad/HXUqYaOwrxA_0000015.wav --checkpoint UniSpeech-SAT-Large.pt
I get the following error (end of the traceback):
File "/data/coros1/ddallon/workspace/UniSpeech/UniSpeech-SAT/fairseq/models/__init__.py", line 88, in build_model
    assert model is not None, (
AssertionError: Could not infer model type from {'_name': 'bc_m_hubert', 'label_rate': 50, 'extractor_mode': 'layer_norm', 'structure_type': 'transformer', 'encoder_layers': 24, 'encoder_embed_dim': 1024, 'encoder_ffn_embed_dim': 4096, 'encoder_attention_heads': 16, 'activation_fn': 'gelu', 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 768, 'untie_final_proj': True, 'layer_norm_first': True, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 1.0, 'boundary_mask': False, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': 'static', 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': 'static', 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'relative_position_embedding': False, 'num_buckets': 320, 'max_distance': 1280, 'gru_rel_pos': False, 'expand_attention_head_size': -1, 'streaming': False, 'chunk_size': 0, 'left_chunk': 0, 'num_negatives': 0, 'negatives_from_everywhere': False, 'cross_sample_negatives': 100, 'codebook_negatives': 0, 'quantize_targets': True, 'latent_vars': 320, 'latent_groups': 2, 'latent_dim': 0, 'spk_layer': 12, 'mixing_max_len': -1, 'mixing_prob': 0.5, 'mixing_num': 1, 'pretrained_path': ''}.
Available models: dict_keys(['wav2vec', 'wav2vec2', 'wav2vec_ctc', 'wav2vec_seq2seq', 'wav2vec_transducer', 'hubert', 'hubert_ctc', 'transformer_lm', 'unispeech_sat'])
Requested model type: bc_m_hubert
I also noticed that "bc_m_hubert" appears only in "unispeech_sat.th".
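The mismatch can be shown without loading the model at all. Below is a minimal sketch that uses only the values printed in the error message; `check_model_name` is a hypothetical helper written for illustration, not a fairseq or UniSpeech function, and the config dict is truncated to two of its keys.

```python
# Hypothetical helper for illustration only; not part of fairseq or UniSpeech.
def check_model_name(model_cfg, available):
    """Return the config's '_name' and whether it is among the available models."""
    name = model_cfg.get("_name")
    return name, name in available


# Model names copied from "Available models" in the error message.
available = [
    "wav2vec", "wav2vec2", "wav2vec_ctc", "wav2vec_seq2seq",
    "wav2vec_transducer", "hubert", "hubert_ctc",
    "transformer_lm", "unispeech_sat",
]

# First two entries of the config dict shown in the error message.
model_cfg = {"_name": "bc_m_hubert", "label_rate": 50}

name, is_registered = check_model_name(model_cfg, available)
print(f"requested model type: {name}, registered: {is_registered}")
```

The requested name is not in the list of available models, which is the condition the assertion in `build_model` reports.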
Could you check it or help me? :-)