HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

Overview

I have no intention of building a very complex tool here. I just want an easy-to-use toolkit for my speech-related experiments. I hope this library can be helpful for someone else too :)

Requirements

  • Python 3.7+

Installation

$ pip install huggingsound

How to use it?

I'll try to summarize the usage of this toolkit here, but a lot is still missing from the documentation below. I promise to improve it soon. For now, you can open an issue if you have questions, or look at the source code to see how things work. You can find more usage examples in the repository's examples folder.

Speech recognition

For speech recognition you can use any CTC model hosted on the Hugging Face Hub, where you can browse the available models.

Inference

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

transcriptions = model.transcribe(audio_paths)

print(transcriptions)

# transcriptions format (a list of dicts, one for each audio file):
# [
#  {
#   "transcription": "extraordinary claims require extraordinary evidence", 
#   "start_timestamps": [100, 120, 140, 180, ...],
#   "end_timestamps": [120, 140, 180, 200, ...],
#   "probabilities": [0.95, 0.88, 0.9, 0.97, ...]
# },
# ...]
#
# as you can see, not only the transcription is returned but also the timestamps (in milliseconds) 
# and probabilities of each character of the transcription.
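
Since the timestamps are per character, you can derive word-level segments from them. Below is a minimal sketch (to_word_segments is a hypothetical helper, not part of the library) that groups each result's characters into words with start/end times:

# hypothetical helper: group character-level timestamps into word segments
def to_word_segments(result):
    words, current = [], None
    for char, start, end in zip(result["transcription"],
                                result["start_timestamps"],
                                result["end_timestamps"]):
        if char == " ":  # a space closes the current word
            current = None
            continue
        if current is None:
            current = {"word": "", "start": start, "end": end}
            words.append(current)
        current["word"] += char
        current["end"] = end
    return words

for result in transcriptions:
    print(to_word_segments(result))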

Inference (boosted by a language model)

from huggingsound import SpeechRecognitionModel, KenshoLMDecoder

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

# The LM format used by the LM decoders is the KenLM format (arpa or binary file).
# You can download some LM files examples from here: https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english/tree/main/language_model
lm_path = "path/to/your/lm_files/lm.binary"
unigrams_path = "path/to/your/lm_files/unigrams.txt"

# We implemented three different decoders for LM-boosted decoding: KenshoLMDecoder, ParlanceLMDecoder, and FlashlightLMDecoder
# In this example, we'll use the KenshoLMDecoder
# To use this decoder you'll need to install Kensho's pyctcdecode first (https://github.com/kensho-technologies/pyctcdecode)
decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)

transcriptions = model.transcribe(audio_paths, decoder=decoder)

print(transcriptions)
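
If you need to build the unigrams file yourself, it's typically just the LM's vocabulary with one word per line. Here's a minimal sketch (the corpus path is a placeholder, and you should double-check the exact format your decoder expects):

# sketch: derive unigrams.txt (one unique word per line) from the text
# corpus the KenLM model was trained on
corpus_path = "path/to/your/lm_files/corpus.txt"  # placeholder path

words = set()
with open(corpus_path, encoding="utf-8") as corpus_file:
    for line in corpus_file:
        words.update(line.strip().split())

with open("path/to/your/lm_files/unigrams.txt", "w", encoding="utf-8") as unigrams_file:
    unigrams_file.write("\n".join(sorted(words)))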

Evaluation

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")

references = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

evaluation = model.evaluate(references)

print(evaluation)

# evaluation format: {"wer": 0.08, "cer": 0.02}
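
For context, WER (word error rate) and CER (character error rate) are edit distances normalized by the reference length, computed at the word and character level respectively. A minimal sketch of the standard WER definition (this is not huggingsound's internal implementation):

# classic Levenshtein distance via dynamic programming
def edit_distance(ref, hyp):
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# one substituted word out of five -> WER = 0.2
print(wer("extraordinary claims require extraordinary evidence",
          "extraordinary claims need extraordinary evidence"))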

Fine-tuning

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")
output_dir = "my/finetuned/model/output/dir"

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
tokens = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
token_set = TokenSet(tokens)

# define your train/eval data
train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]
eval_data = [
    {"path": "/path/to/sagan2.mp3", "transcription": "absence of evidence is not evidence of absence"},
    {"path": "/path/to/asimov2.wav", "transcription": "the true delight is in the finding out rather than in the knowing"},
]

# and finally, fine-tune your model
model.finetune(
    output_dir, 
    train_data=train_data, 
    eval_data=eval_data, # the eval_data is optional
    token_set=token_set,
)
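
You can also customize the training by passing a TrainingArguments object to finetune(); that's what the TrainingArguments and ModelArguments imports above are for. A short sketch (attribute names follow the usage shown in the issue reports below; check the library source for the full set of options):

# sketch: tweak some common training options before fine-tuning
training_args = TrainingArguments()
training_args.overwrite_output_dir = True
training_args.per_device_train_batch_size = 16
training_args.per_device_eval_batch_size = 16

model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,
    token_set=token_set,
    training_args=training_args,
)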

Troubleshooting

  • If you are having trouble loading MP3 files: $ sudo apt-get install ffmpeg (see the quick check below)
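
A quick way to check whether your environment can decode MP3s at all is a sketch like the following (huggingsound loads audio through librosa, as the tracebacks in the comments below show):

import librosa

# if this raises audioread.exceptions.NoBackendError, install ffmpeg
# using the command above and try again
waveform, sampling_rate = librosa.load("/path/to/sagan.mp3", sr=16_000)
print(waveform.shape, sampling_rate)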

Want to help?

See the contribution guidelines if you'd like to contribute to the HuggingSound project.

You don't even need to know how to code to contribute to the project. Even improving our documentation is an outstanding contribution.

If this project has been useful for you, please share it with your friends. This project could be helpful for them too.

If you like this project and want to motivate the maintainers, give us a ⭐. This kind of recognition will make us very happy with the work that we've done with ❤️

You can also "Buy Me A Coffee"

Citation

If you want to cite the tool you can use this:

@misc{grosman2022huggingsound,
  title={HuggingSound},
  author={Grosman, Jonatas},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/jonatasgrosman/huggingsound}},
  year={2022}
}
Comments
  • Compatibility with Python 3.10

    This package cannot be installed with python 3.10.

    Trying to install the wheel manually, it complains about the correct numba version not being available. Is numba<0.54.0,>=0.53.1 really required, instead of e.g. numba==0.55.0 or any other version that is available for Python 3.10?

    I would love to use this library, but currently it does not seem to be possible to install it on Ubuntu 22.04.

    opened by FredHaa 5
  • 'CTCTrainer' object has no attribute 'use_amp'

    I'm using the latest huggingsound:

    #!pip list | grep huggingsound
    huggingsound 0.1.4
    

    An AttributeError occurs when finetune is performed as shown in the sample at https://github.com/jonatasgrosman/huggingsound#fine-tuning

    /usr/local/lib/python3.7/dist-packages/huggingsound/trainer.py in training_step(self, model, inputs)
        432     inputs = self._prepare_inputs(inputs)
        433
    --> 434     if self.use_amp:
        435         with torch.cuda.amp.autocast():
        436             loss = self.compute_loss(model, inputs)

    AttributeError: 'CTCTrainer' object has no attribute 'use_amp'

    Can you find the cause?

    opened by its-ogawa 5
  • Solved issue related to prediction padding and pad_token_id

    This is a relatively hidden bug and took me quite some time to debug :)

    Issue: _compute_metrics() in the evaluation loop calculates wrong CER & WER metrics when using a TokenSet where the pad_token_id is not equal to 0. This doesn't affect the loss calculation / training as such, but the metrics logged during training will be wrong and won't match the metrics calculated using model.evaluate() after training.

    Reason: Similar to the label_ids, the prediction logits are passed to _compute_metrics() as a matrix padded with -100 values. Currently, the argmax call which maps logits to token IDs converts these -100 values to 0, so after the argmax, pred_ids will be 0-padded. For most wav2vec2 models this is not an issue, because their vocab.json assigns the ID 0 to the <pad> token. However, if you use a custom TokenSet for fine-tuning, <pad> will most probably not be mapped to 0, so the obtained "0-padding" values will wrongly correspond to another token. See the relevant code here: https://github.com/jonatasgrosman/huggingsound/blob/main/huggingsound/trainer.py#L599

    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)
    
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    

    Proposed solution: Save a padding_mask that stores the locations of the padded -100 values in the prediction logits. Then, after applying the argmax, use the mask to set the padded entries to the ID corresponding to the padding token (see the sketch below).
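
    A minimal sketch of that idea (variable names follow the snippet above; this is illustrative, not the final patch):

    import numpy as np

    pred_logits = pred.predictions
    # remember which positions are padding (-100 across the whole vocab axis)
    padding_mask = np.all(pred_logits == -100, axis=-1)

    pred_ids = np.argmax(pred_logits, axis=-1)
    # map padded positions to the real pad token instead of token id 0
    pred_ids[padding_mask] = processor.tokenizer.pad_token_id

    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id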

    opened by nkaenzig 2
  • Question about '1b' model

    Dear Jonatas,

    A question, not a bug report: the jonatasgrosman/wav2vec2-xls-r-1b-german model removes all numbers. Is there a way to recognize numbers?

    Thank you for your great models! Best wishes from Vienna Markus

    Test case: output.zip
    Expected meaning: etwa 20000 euro - ungefähr 12000 euro
    1b model result: etwa euro - ungefähr euro

    import torch, transformers, librosa
    filepath = 'output.wav'
    for MODEL_ID in ['jonatasgrosman/wav2vec2-large-xlsr-53-german','jonatasgrosman/wav2vec2-xls-r-1b-german']:
        processor = transformers.Wav2Vec2Processor.from_pretrained(MODEL_ID)
        model = transformers.Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
        speech_array, sampling_rate = librosa.load(filepath, sr=16_000)
        inputs = processor(speech_array, sampling_rate=16_000, return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        predicted_sentences = processor.batch_decode(predicted_ids)
        print( MODEL_ID, predicted_sentences[0] )
    
    opened by doublex 2
  • Getting error during training

    RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

    This means that the data was not moved to the GPU.

    My code:

    torch.device("cuda")

    model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-spanish", device="cuda")
    processor_ref = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
    token_list = list(processor_ref.tokenizer.encoder.keys())
    token_set = TokenSet(token_list)
    
    train_set = []
    eval_set = []
    
    train_set, eval_set = add_sealed_data_set(train_set, eval_set, config[environment][SAMPLES_DIR])
    
    training_arguments = TrainingArguments()
    training_arguments.overwrite_output_dir = True
    training_arguments.per_device_train_batch_size = 128
    training_arguments.per_device_eval_batch_size = 128
    
    model.finetune(
        config[environment][MODEL_OUTPUT_DIR],
        train_data=train_set,
        eval_data=eval_set,  # the eval_data is optional
        token_set=token_set,
        training_args=training_arguments
    )
    

    I managed to work around this by moving my dataset to CUDA inside the huggingsound code. If I can make it work, I'll create a PR.

    opened by arikhalperin 2
  • Pre-trained uppercase models don't work

    First of all thanks for this great library, it's really helpful :)

    I just tried to fine-tune a model by facebook that they previously fine-tuned on English transcription tasks: facebook/wav2vec2-large-960h-lv60-self.

    During training I get WERs of 100%, and after training, model.transcribe() returns empty results.

    The issue seems to be that this model was trained with an upper-case character vocabulary.

    To overcome this, I found this very easy fix, which just converts the vocabulary of the encoder/decoder to lower case:

    from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
    model = SpeechRecognitionModel(model_name, device='cuda')
    
    model.processor.tokenizer.encoder = {k.lower(): v for k, v in model.processor.tokenizer.encoder.items()}
    model.processor.tokenizer.decoder = {k: v.lower() for k, v in model.processor.tokenizer.decoder.items()} 
    

    Would be great to integrate this somehow into the library.

    opened by nkaenzig 2
  • raise NoBackendError() audioread.exceptions.NoBackendError

    First of all, thank you for your work! I am not able to run the transcribe() method.

    This is my code:

    from huggingsound import SpeechRecognitionModel
    
    model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-german")
    path = r"C:/Users/johndoe/PycharmProjects/kedro_pipeline/data/01_raw/Bond_ueber_das_wetter_und_berlin.mp3"
    audio_paths = [path]
    
    transcriptions = model.transcribe(audio_paths)
    

    I assume my path is not correct, but I already tried different formats:

    r"C:\\Users\\johndoe\\..."
    r"C:\Users\johndoe\..."
    

    -> did not work either.

    This is the output:

    02/24/2022 11:40:11 - INFO - huggingsound.speech_recognition.model - Loading model...
      0%|          | 0/1 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 149, in load
        with sf.SoundFile(path) as sf_desc:
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 629, in __init__
        self._file = self._open(file, mode_int, closefd)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 1183, in _open
        _error_check(_snd.sf_error(file_ptr),
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 1357, in _error_check
        raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
    RuntimeError: Error opening 'C:/Users/johndoe/PycharmProjects/kedro_pipeline/data/01_raw/Bond_ueber_das_wetter_und_berlin.mp3': File contains data in an unknown format.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:/Users/johndoe/PycharmProjects/main.py", line 7, in <module>
        transcriptions = model.transcribe(audio_paths)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\huggingsound\speech_recognition\model.py", line 108, in transcribe
        waveforms = get_waveforms(paths_batch, sampling_rate)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\huggingsound\utils.py", line 52, in get_waveforms
        waveform, sr = librosa.load(path, sr=sampling_rate)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 166, in load
        y, sr_native = __audioread_load(path, offset, duration, dtype)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 190, in __audioread_load
        with audioread.audio_open(path) as input_file:
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\audioread\__init__.py", line 116, in audio_open
        raise NoBackendError()
    audioread.exceptions.NoBackendError
    
    Process finished with exit code 1
    
    opened by DanielGuo1 2
  • Fine-tuned version of model - a raised exception

    Hello.

    I am getting the following exception:

    ValueError: Not fine-tuned model! Please, fine-tune the model first.
    

    I have looked into the code and see that it requires Wav2Vec2ForPreTraining (from self.model_config.architectures) to be in the ctc_finetuded_architectures variable.

    Now this variable has these values:

    {'WavLMForCTC', 'HubertForCTC', 'UniSpeechSatForCTC', 'Wav2Vec2ForCTC', 'UniSpeechForCTC', 'SEWForCTC', 'SEWDForCTC'}

    I am running the code with this model - https://huggingface.co/Yehor/wav2vec2-xls-r-300m-uk-with-lm

    I disabled the code that raises that exception and it seems there is no issue.

    I would like to use some type of configuration to be able to run the code without changing the library code.

    opened by egorsmkv 2
  • Bump datasets from 1.18.3 to 2.6.1

    Bumps datasets from 1.18.3 to 2.6.1.

    Release notes

    Sourced from datasets's releases.

    2.6.1

    Bug fixes

    New Contributors

    Full Changelog: https://github.com/huggingface/datasets/compare/2.6.0...2.6.1

    2.6.0

    Important

    Datasets features

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Bump transformers from 4.16.2 to 4.23.1

    Bumps transformers from 4.16.2 to 4.23.1.

    Release notes

    Sourced from transformers's releases.

    v4.23.1 Patch release

    Fix a revert introduced by mistake that broke the "automatic-speech-recognition" pipeline for Whisper.

    v4.23.0: Whisper, Deformable DETR, Conditional DETR, MarkupLM, MSN, safetensors

    Whisper

    The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

    Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting, in multiple languages.

    Deformable DETR

    The Deformable DETR model was proposed in Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

    Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original DETR by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.

    Conditional DETR

    The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

    Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.

    Time Series Transformer

    The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

    The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

    :warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs or slight breaking changes to fix it in the future. If you see something strange, file a Github Issue.

    Masked Siamese Networks

    The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

    MSN (masked siamese networks) consists of a joint-embedding architecture to match the prototypes of masked patches with that of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Bump datasets from 1.18.3 to 2.5.2

    Bumps datasets from 1.18.3 to 2.5.2.

    Release notes

    Sourced from datasets's releases.

    2.5.2

    Bug fixes

    • Revert task removal in folder-based builders (#5051)
    • Support hfh 0.10 implicit auth (#5031)

    Full Changelog: https://github.com/huggingface/datasets/compare/2.5.1...2.5.2

    2.5.1

    Bug fixes

    Full Changelog: https://github.com/huggingface/datasets/compare/2.5.0...2.5.1

    2.5.0

    Important

    Datasets features

    No-code loaders

    Dataset methods

    Parquet support

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Error during fine-tuning

    I have this code:

    from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
    from transformers import Wav2Vec2Processor
    
    processor_ref = Wav2Vec2Processor.from_pretrained("/my/dir/wav2vec2-large-xlsr-53-kalmyk")
    token_list = list(processor_ref.tokenizer.encoder.keys())
    print(len(token_list))
    
    model = SpeechRecognitionModel("/my/dir/wav2vec2-large-xlsr-53-kalmyk")
    output_dir = "/my/dir/tuned"
    
    token_set = TokenSet(token_list)
    
    model.finetune(
        output_dir, 
        train_data=train_data,
        token_set=token_set
    )
    

    I have a list of dicts like this in my train_data:

    train_data = [
        {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
        {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
    ]
    

    Then I get some errors. Can someone help me with that?

    size mismatch for lm_head.weight: copying a param with shape torch.Size([41, 1024]) from checkpoint, the shape in current model is torch.Size([45, 1024]).
    size mismatch for lm_head.bias: copying a param with shape torch.Size([41]) from checkpoint, the shape in current model is torch.Size([45]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
    opened by utnasun 0
  • Bump transformers from 4.23.1 to 4.24.0

    Bumps transformers from 4.23.1 to 4.24.0.

    Release notes

    Sourced from transformers's releases.

    v4.24.0: ESM-2/ESMFold, LiLT, Flan-T5, Table Transformer and Contrastive search decoding

    ESM-2/ESMFold

    ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, from 8 million parameters up to a huge 15 billion parameter model.

    ESMFold is a state-of-the-art single sequence protein folding model which produces high accuracy predictions significantly faster. Unlike previous protein folding tools like AlphaFold2 and openfold, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without requiring any external databases or search/alignment tools to be present at inference time. This hugely reduces the time and compute requirements for folding.

    Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

    ESMFold was introduced in the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

    LiLT

    LiLT allows to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, to enable LayoutLM-like document understanding for many languages.

    It was proposed in LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.

    Flan-T5

    FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

    It was released in the paper Scaling Instruction-Finetuned Language Models by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

    Table Transformer

    Table Transformer is a model that can perform table extraction and table structure recognition from unstructured documents based on the DETR architecture.

    It was proposed in PubTables-1M: Towards comprehensive table extraction from unstructured documents by Brandon Smock, Rohith Pesala, Robin Abraham.

    Contrastive search decoding

    Contrastive search decoding is a new state-of-the-art generation method which aims at reducing the repetitive patterns in which generation models often fall.

    It was introduced in A Contrastive Framework for Neural Text Generation by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

    • Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by @​gmftbyGMFTBY in #19477

    Safety and security

    We continue to explore the new serialization format not using Pickle via the safetensors library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Bump pytest from 5.4.3 to 7.2.0

    Bumps pytest from 5.4.3 to 7.2.0.

    Release notes

    Sourced from pytest's releases.

    7.2.0

    pytest 7.2.0 (2022-10-23)

    Deprecations

    • #10012: Update pytest.PytestUnhandledCoroutineWarning to a deprecation; it will raise an error in pytest 8.

    • #10396: pytest no longer depends on the py library. pytest provides a vendored copy of py.error and py.path modules but will use the py library if it is installed. If you need other py.* modules, continue to install the deprecated py library separately, otherwise it can usually be removed as a dependency.

    • #4562: Deprecate configuring hook specs/impls using attributes/marks.

      Instead use pytest.hookimpl and pytest.hookspec. For more details, see the deprecation docs.

    • #9886: The functionality for running tests written for nose has been officially deprecated.

      This includes:

      • Plain setup and teardown functions and methods: this might catch users by surprise, as setup() and teardown() are not pytest idioms, but part of the nose support.
      • Setup/teardown using the @with_setup decorator.

      For more details, consult the nose deprecation docs.

    Features

    • #9897: Added shell-style wildcard support to testpaths.

    Improvements

    • #10218: @pytest.mark.parametrize() (and similar functions) now accepts any Sequence[str] for the argument names, instead of just list[str] and tuple[str, ...].

      (Note that str, which is itself a Sequence[str], is still treated as a comma-delimited name list, as before).

    • #10381: The --no-showlocals flag has been added. This can be passed directly to tests to override --showlocals declared through addopts.

    • #3426: Assertion failures with strings in NFC and NFD forms that normalize to the same string now have a dedicated error message detailing the issue, and their utf-8 representation is expressed instead.

    • #7337: A warning is now emitted if a test function returns something other than None. This prevents a common mistake among beginners who expect that returning a bool (for example return foo(a, b) == result) would cause a test to pass or fail, instead of using assert.

    • #8508: Introduce multiline display for warning matching via pytest.warns and enhance match comparison for _pytest._code.ExceptionInfo.match as returned by pytest.raises.

    • #8646: Improve pytest.raises. Previously passing an empty tuple would give a confusing error. We now raise immediately with a more helpful message.

    • #9741: On Python 3.11, use the standard library's tomllib to parse TOML.

      tomli is no longer a dependency on Python 3.11.

    • #9742: Display assertion message without escaped newline characters with -vv.

    • #9823: Improved error message that is shown when no collector is found for a given file.

    ... (truncated)

    Commits
    • 3af3f56 Prepare release version 7.2.0
    • bc2c3b6 Merge pull request #10408 from NateMeyvis/patch-2
    • d84ed48 Merge pull request #10409 from pytest-dev/asottile-patch-1
    • ffe49ac Merge pull request #10396 from pytest-dev/pylib-hax
    • d352098 allow jobs to pass if codecov.io fails
    • c5c562b Fix typos in CONTRIBUTING.rst
    • d543a45 add deprecation changelog for py library vendoring
    • f341a5c Merge pull request #10407 from NateMeyvis/patch-1
    • 1027dc8 [pre-commit.ci] auto fixes from pre-commit.com hooks
    • 6b905ee Add note on tags to CONTRIBUTING.rst
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump torch from 1.10.2 to 1.13.0

    Bumps torch from 1.10.2 to 1.13.0.

    Release notes

    Sourced from torch's releases.

    PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

    Pytorch 1.13 Release Notes

    • Highlights
    • Backwards Incompatible Changes
    • New Features
    • Improvements
    • Performance
    • Documentation
    • Developers

    Highlights

    We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

    Summary:

    • The BetterTransformer feature set supports fastpath execution for common Transformer models during Inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models and Nested Tensors is now enabled by default.

    • Timely deprecating older CUDA versions allows us to proceed with introducing the latest CUDA version as they are introduced by Nvidia®, and hence allows support for C++17 in PyTorch and new NVIDIA Open GPU Kernel Modules.

    • Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package.

    • PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.

    Stable: Better Transformer; CUDA 10.2 and 11.3 CI/CD deprecation
    Beta: Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs; Extend NNC to support channels last and bf16; Functorch now in PyTorch Core Library; Beta support for M1 devices
    Prototype: Arm® Compute Library backend support for AWS Graviton; CUDA Sanitizer

    You can check the blogpost that shows the new features here.

    Backwards Incompatible changes

    Python API

    uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)

    Prior to 1.13, key_padding_mask could be set to uint8 or other integer dtypes in TransformerEncoder and MultiheadAttention, which might generate unexpected results. In this release, these dtypes are not allowed for the mask anymore. Please convert them to torch.bool before using.

    1.12.1

    >>> layer = nn.TransformerEncoderLayer(2, 4, 2)
    >>> encoder = nn.TransformerEncoder(layer, 2)
    >>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
    >>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
    # works before 1.13
    >>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
    

    ... (truncated)

    Changelog

    Sourced from torch's changelog.

    Releasing PyTorch

    General Overview

    Releasing a new version of PyTorch generally entails 3 major steps:

    1. Cutting a release branch preparations
    2. Cutting a release branch and making release branch specific changes
    3. Drafting RCs (Release Candidates), and merging cherry picks
    4. Promoting RCs to stable and performing release day tasks

    Cutting a release branch preparations

    The following requirements need to be met prior to the final RC cut:

    • Resolve all outstanding issues in the milestones (for example 1.11.0) before the first RC cut is completed. After the RC cut is completed, the following script should be executed from the builder repo in order to validate the presence of the fixes in the release branch: python github_analyze.py --repo-path ~/local/pytorch --remote upstream --branch release/1.11 --milestone-id 26 --missing-in-branch

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Different evaluation results on HuggingFace and locally

    Hello, I encountered a problem.

    Model: jonatasgrosman/wav2vec2-xls-r-1b-russian

    Example 1

    On HuggingFace using Hosted inference API (good):

    рекомендуем при обращении в контактный центр использовать код клиента

    Locally using the huggingsound library (bad, missing whitespace):

    рекомендуем приобращение в контактный центр использовать кодклиента

    Example 2

    On HuggingFace using Hosted inference API (good):

    в настоящий момент по техническим причинам купюры номиналом пять тысяч рублей действительно не принимаются в некоторых банкоматах

    Locally using the huggingsound library (bad: misspelling and wrong word ending):

    в настоящий момент по техническим причинам купюра номеналом пять тысяч рублей действительно не принимаются в некоторых банкоматах

    opened by kirillrybin 0
  • Bump coverage from 5.5 to 6.5.0

    Bumps coverage from 5.5 to 6.5.0.

    Release notes

    Sourced from coverage's releases.

    coverage-5.6b1

    • Third-party packages are now ignored in coverage reporting. This solves a few problems:
      • Coverage will no longer report about other people’s code (issue 876). This is true even when using --source=. with a venv in the current directory.
      • Coverage will no longer generate “Already imported a file that will be measured” warnings about coverage itself (issue 905).
    • The HTML report uses j/k to move up and down among the highlighted chunks of code. They used to highlight the current chunk, but 5.0 broke that behavior. Now the highlighting is working again.
    • The JSON report now includes percent_covered_display, a string with the total percentage, rounded to the same number of decimal places as the other reports’ totals.
    Changelog

    Sourced from coverage's changelog.

    Version 6.5.0 — 2022-09-29

    • The JSON report now includes details of which branches were taken, and which are missing for each file. Thanks, Christoph Blessing (pull 1438). Closes issue 1425.

    • Starting with coverage.py 6.2, class statements were marked as a branch. This wasn't right, and has been reverted, fixing issue 1449. Note this will very slightly reduce your coverage total if you are measuring branch coverage.

    • Packaging is now compliant with PEP 517, closing issue 1395.

    • A new debug option --debug=pathmap shows details of the remapping of paths that happens during combine due to the [paths] setting.

    • Fix an internal problem with caching of invalid Python parsing. Found by OSS-Fuzz, fixing their bug 50381.

    References: bug 50381: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50381; PEP 517: https://peps.python.org/pep-0517/; issue 1395: nedbat/coveragepy#1395; issue 1425: nedbat/coveragepy#1425; pull 1438: nedbat/coveragepy#1438; issue 1449: nedbat/coveragepy#1449

    Version 6.4.4 — 2022-08-16

    • Wheels are now provided for Python 3.11.

    Version 6.4.3 — 2022-08-06

    • Fix a failure when combining data files if the file names contained glob-like patterns (pull 1405_). Thanks, Michael Krebs and Benjamin Schubert.

    • Fix a messaging failure when combining Windows data files on a different drive than the current directory. (pull 1430, fixing issue 1428). Thanks, Lorenzo Micò.

    • Fix path calculations when running in the root directory, as you might do in

    ... (truncated)

    Commits
    • 0ac2453 docs: sample html report
    • 0954c85 build: prep for 6.5.0
    • 95195b1 docs: changelog for json report branch details
    • 789f175 fix: keep negative arc values
    • aabc540 feat: include branches taken and missed in JSON report. #1425
    • a59fc44 docs: minor tweaks to db docs
    • d296083 docs: add a note to the class-branch change
    • 7f07df6 chore: make upgrade
    • 6bc29a9 build: use the badge action coloring
    • fd36918 fix: class statements shouldn't be branches. #1449
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
Releases: v0.1.6
Owner
Jonatas Grosman
PhD Student in Computer Science at Pontifical Catholic University of Rio de Janeiro
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT for each task independently.

VinAI Research 109 Dec 2, 2022
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, multi-microphone signal processing and many others.

SpeechBrain 5.1k Jan 9, 2023
skweak: A software toolkit for weak supervision applied to NLP tasks

Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels without pre-existing datasets. The only available option is often to collect and annotate texts by hand, which is expensive and time-consuming.

Norsk Regnesentral (Norwegian Computing Center) 850 Dec 28, 2022
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

null 297 Dec 29, 2022
End-to-End Speech Processing Toolkit

ESPnet is an end-to-end speech processing toolkit.

ESPnet 5.9k Jan 3, 2023
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library…

Yiming Wang 919 Jan 3, 2023
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and recipes for three languages to perform automatic speech recognition tasks…

Soohwan Kim 26 Dec 14, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and recipes for three languages to perform automatic speech recognition tasks…

Soohwan Kim 86 Jun 11, 2021
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Contributing to OpenSpeech: OpenSpeech provides reference implementations of various ASR modeling papers and recipes for three languages to perform automatic speech recognition tasks…

Openspeech TEAM 513 Jan 3, 2023
ExKaldi-RT: An Online Speech Recognition Extension Toolkit of Kaldi

ExKaldi-RT is an online ASR toolkit for Python. It reads realtime streaming audio and does online feature extraction, probability computation, and online decoding.

Wang Yu 31 Aug 16, 2021
IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

IMS-Toucan is a toolkit to train state-of-the-art speech synthesis models. Everything is pure Python and PyTorch based, to keep it as simple and beginner-friendly yet powerful as possible.

Digital Phonetics at the University of Stuttgart 247 Jan 5, 2023
text to speech toolkit. An easy-to-use Chinese speech synthesis toolbox, including a speech encoder, speech synthesizer, vocoder, and visualization modules.

ttskit Text To Speech Toolkit: a speech synthesis toolbox. Install: pip install -U ttskit. Note: the torch dependency may need to be installed separately (torch>=1.6.0,<=1.7.1); install a CUDA or CPU build of torch appropriate for your environment. …

KDD 483 Jan 4, 2023
PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for…

null 922 Dec 31, 2022
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Check out the develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase, a Python-first API…

pyannote 2.2k Jan 9, 2023
Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech Transformer. Training: this model uses CTC loss and cross-entropy loss for training. Download the pretrained mo…

Uyghur 11 Nov 17, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple


Alexander Veysov 3.2k Dec 31, 2022