A PyTorch Implementation of End-to-End Models for Speech-to-Text

Overview

speech

Speech is an open-source package to build end-to-end models for automatic speech recognition. Sequence-to-sequence models with attention, Connectionist Temporal Classification and the RNN Sequence Transducer are currently supported.

The goal of this software is to facilitate research in end-to-end models for speech recognition. The models are implemented in PyTorch.

The software has only been tested with Python 3.6.

We will not be providing backward compatibility for Python 2.7.

Install

We recommend creating a virtual environment and installing the Python requirements there.

virtualenv <path_to_your_env>
source <path_to_your_env>/bin/activate
pip install -r requirements.txt

Then follow the installation instructions for a version of PyTorch which works for your machine.

After all the Python requirements are installed, from the top level directory, run:

make

The build process requires CMake as well as Make.

After that, source the setup.sh from the repo root.

source setup.sh

Consider adding this to your bashrc.

You can verify the install was successful by running the tests from the tests directory.

cd tests
pytest
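
If the tests fail at collection time with import errors, a quick look at the active environment can help narrow things down. The following is a minimal sketch and not part of the repository: the helper names `check_modules` and `in_virtualenv` are hypothetical, and it assumes that PyTorch is importable as `torch` and this package as `speech` once setup.sh has been sourced.

```python
import importlib.util
import sys


def check_modules(names):
    """Map each top-level module name to True if the interpreter can find it."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for malformed names or half-initialized modules
            found[name] = False
    return found


def in_virtualenv():
    """Return True when the interpreter runs inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; venv changes sys.prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    print("virtual environment active:", in_virtualenv())
    # Module names below are assumptions, see the note above.
    for name, ok in check_modules(["torch", "speech"]).items():
        print(name, "->", "found" if ok else "MISSING")
```

A module reported as MISSING usually means it was installed into a different interpreter than the one running pytest, or that setup.sh has not been sourced in the current shell.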

Run

To train a model, run:

python train.py <path_to_config>

After the model is done training, you can evaluate it with:

python eval.py <path_to_model> <path_to_data_json>

To see the available options for each script use -h:

python train.py -h
python eval.py -h
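
The training script takes a path to a configuration file. As a rough sketch of how such a file might be read defensively, assume it is JSON with a top-level `data` section. The file layout, the helpers `load_config` and `data_options`, and the default value are illustrative assumptions rather than the repository's actual code; only the `start_and_end` key name is taken from a traceback reported in the comments.

```python
import json


def load_config(path):
    """Read a JSON configuration file into a dict."""
    with open(path, "r") as fid:
        return json.load(fid)


def data_options(config, defaults=None):
    """Return the data section, falling back to defaults for missing keys."""
    # Hypothetical default; check which value the chosen model really needs.
    merged = {"start_and_end": False}
    merged.update(defaults or {})
    # Values present in the config always win over the defaults.
    merged.update(config.get("data", {}))
    return merged
```

With this pattern, an older configuration file that lacks an optional key yields the default instead of raising a KeyError, at the cost of silently hiding a misspelled key.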

Examples

For examples of model configurations and datasets, visit the examples directory. Each example dataset should have instructions and/or scripts for downloading and preparing the data. There should also be one or more model configurations available. The results for each configuration will be documented in each example's corresponding README.md.

Comments
  • A question about the TRAINING SET used in timit script

    Normally we use the standard 462-speaker data as the training set, while this TIMIT example uses 556-speaker data (including some data from the full test set) in train.json. Although the WER results in this repo seem pretty promising, are the methods you use here really convincing or comparable?

    opened by wolverineq 6
  • KeyError: 'start_and_end'

    When I try to run the "train.py", I get the following error:

    (venv-speech) sroca@nx2:~/speech>> python train.py examples/librispeech/config.json
    
    Traceback (most recent call last):
      File "train.py", line 145, in <module>
        run(config)
      File "train.py", line 80, in run
        start_and_end=data_cfg["start_and_end"])
    KeyError: 'start_and_end'
    srun: error: c8: task 0: Exited with exit code 1
    

    It seems that the object 'start_and_end' is not defined anywhere, so it can't be found.

    How can I fix it?

    opened by sroca8 5
  • RNN Transducer training problem

    Hi, it seems that your implementation of the RNN Transducer loss function is right. But when I train Graves2012 TIMIT, the loss decreases while the PER increases, no matter how I adjust the learning rate. (With a small learning rate, the PER first decreases, then increases from there on.)

    In your training procedure, the RNNT loss does decrease, but if you output the PER, it is increasing! So what's wrong?

    opened by HawkAaron 5
  • pytest failure

    Environment

    Titan Xp CUDA 9.0 cuDNN 7.1.3

    Ubuntu 16.04 Python 2.7 PyTorch 0.4.0

    Code to reproduce the issue

    git clone https://github.com/awni/speech.git
    cd speech
    conda create -n asr -y python=2.7
    source activate asr
    pip install -r requirements
    pip install http://download.pytorch.org/whl/cu90/torch-0.4.0-cp27-cp27mu-linux_x86_64.whl 
    pip install torchvision 
    make
    source setup.sh
    cd test
    pytest
    

    When I run training on my own data (or run pytest), it fails with the following error:

    ERROR: TypeError: activations must be <type 'torch.FloatTensor'>
    

    Does anyone have an idea what is happening? This issue persists with or without a GPU.

    ============================= test session starts ==============================
    platform linux2 -- Python 2.7.15, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
    rootdir: /data2/colosseum/test-speech2/speech/tests, inifile:
    collected 9 items
    
    ctc_test.py F.
    io_test.py .
    loader_test.py ..
    model_test.py .
    seq2seq_test.py .
    wave_test.py ..
    
    =================================== FAILURES ===================================
    
    ________________________________ test_ctc_model ________________________________
    
        def test_ctc_model():
            freq_dim = 40
            vocab_size = 10
    
            batch = shared.gen_fake_data(freq_dim, vocab_size)
            batch_size = len(batch[0])
    
            model = CTC(freq_dim, vocab_size, shared.model_config)
            out = model(batch)
    
            assert out.size()[0] == batch_size
    
            # CTC model adds the blank token to the vocab
            assert out.size()[2] == (vocab_size + 1)
    
            assert len(out.size()) == 3
    
    >       loss = model.loss(batch)
    
    ctc_test.py:26:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../speech/models/ctc_model.py:39: in loss
        loss = loss_fn(out, y, x_lens, y_lens)
    ../libs/warp-ctc/pytorch_binding/functions/ctc.py:77: in forward
        costs = parent.forward(*args)
    ../libs/warp-ctc/pytorch_binding/functions/ctc.py:41: in forward
        certify_inputs(activations, labels, lengths, label_lengths)
    ../libs/warp-ctc/pytorch_binding/functions/ctc.py:107: in certify_inputs
        check_type(activations, torch.FloatTensor, "activations")
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    var = tensor([[[-0.0090,  0.4523,  0.0716,  ...,  0.0900, -0.0668,  0.1392],
           ...1443],
             [ 0.1413, -0.0695,  0.0591,  ..., -0.3491, -0.0151, -0.0068]]])
    t = <type 'torch.FloatTensor'>, name = 'activations'
    
        def check_type(var, t, name):
            if type(var) is not t:
    >           raise TypeError("{} must be {}".format(name, t))
    E           TypeError: activations must be <type 'torch.FloatTensor'>
    
    ../libs/warp-ctc/pytorch_binding/functions/ctc.py:92: TypeError
    ====================== 1 failed, 8 passed in 3.60 seconds ======================
    
    opened by JeremyCCHsu 3
  • Availability of pretrained model for RNN Transducer & Seq2Seq Attention Model

    Hey, I wanted to inquire if there are any plans to open source the pretrained models, for the RNN Transducer and Seq2Seq Model? If there are such pretrained models, can anyone please share the link?

    opened by kdatta03 3
  • TIMIT PER

    With the recommended Seq2seq config, I get a TIMIT PER of 28% on the test set (instead of the reported 18.7%). Has anyone else had a similar experience, or does anyone know what could be going wrong?

    Thank you!

    opened by ankitapasad 2
  • Errors with Installation

    Hi,

    I have successfully installed the following:

    virtualenv e2e_awni
    source e2e_awni/bin/activate
    cd speech
    pip install -r requirements.txt

    As the next step, should I install pytorch while virtualenv is activated or not?

    The following errors occur if I install pytorch when virtualenv is activated:

    (e2e_awni)kevin@DEVBOX2:~$ pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp27-cp27mu-linux_x86_64.whl
    torch-0.4.1-cp27-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.
    Storing debug log for failure in /home/zhme/.pip/pip.log

    (e2e_awni)kevin@DEVBOX2:~$ pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp27-cp27m-linux_x86_64.whl
    torch-0.4.1-cp27-cp27m-linux_x86_64.whl is not a supported wheel on this platform.
    Storing debug log for failure in /home/zhme/.pip/pip.log

    I can successfully install pytorch when virtualenv is deactivated. But the following errors occur when I run pytest under speech/tests after "make".

    (e2e_awni)kevin@DEVBOX2:~/speech/tests$ pytest

    ==================================== ERRORS ====================================
    _________________________ ERROR collecting ctc_test.py _________________________
    ImportError while importing test module '/home/kevin/speech/tests/ctc_test.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    ctc_test.py:2: in <module>
        import torch
    E   ImportError: No module named torch
    _________________________ ERROR collecting io_test.py __________________________
    ImportError while importing test module '/home/kevin/speech/tests/io_test.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    io_test.py:3: in <module>
        import speech.models
    E   ImportError: No module named speech.models
    ________________________ ERROR collecting loader_test.py _______________________
    ImportError while importing test module '/home/kevin/speech/tests/loader_test.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    loader_test.py:3: in <module>
        from speech import loader
    E   ImportError: No module named speech
    ________________________ ERROR collecting model_test.py ________________________
    ImportError while importing test module '/home/kevin/speech/tests/model_test.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    model_test.py:3: in <module>
        import torch
    E   ImportError: No module named torch
    _______________________ ERROR collecting seq2seq_test.py _______________________
    ImportError while importing test module '/home/kevin/speech/tests/seq2seq_test.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    seq2seq_test.py:3: in <module>
        import torch
    E   ImportError: No module named torch
    ________________________ ERROR collecting wave_test.py _________________________
    ImportError while importing test module '/home/kevin/speech/tests/wave_test.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    wave_test.py:4: in <module>
        import speech.utils.wave as wave
    E   ImportError: No module named speech.utils.wave
    !!!!!!!!!!!!!!!!!!! Interrupted: 6 errors during collection !!!!!!!!!!!!!!!!!!!!
    =========================== 6 error in 0.23 seconds ============================

    Could you help me out?

    Thank you.

    opened by Energyquantum 2
  • Results on LibriSpeech very bad

    Hi,

    I used this tool to train a seq2seq speech system on LibriSpeech data, but the results are very bad. Did you have similar results? Do you know how I can fix this issue?

    Thank you, Sahar

    opened by ghost 2
  • Attention explanation

    Is there any paper or tutorial that describes exactly the same attention mechanism that is used in this repository? I mean the fact that attention values are added, not concatenated, the usage of LinearND, and the fact that there is a convolution. Is there any place with the theory given? Thank you

    opened by smolendawid 2
  • Bump protobuf from 3.4.0 to 3.15.0

    Bumps protobuf from 3.4.0 to 3.15.0.

    Release notes

    Sourced from protobuf's releases.

    Protocol Buffers v3.15.0

    Protocol Compiler

    • Optional fields for proto3 are enabled by default, and no longer require the --experimental_allow_proto3_optional flag.

    C++

    • MessageDifferencer: fixed bug when using custom ignore with multiple unknown fields
    • Use init_seg in MSVC to push initialization to an earlier phase.
    • Runtime no longer triggers -Wsign-compare warnings.
    • Fixed -Wtautological-constant-out-of-range-compare warning.
    • DynamicCastToGenerated works for nullptr input for even if RTTI is disabled
    • Arena is refactored and optimized.
    • Clarified/specified that the exact value of Arena::SpaceAllocated() is an implementation detail users must not rely on. It should not be used in unit tests.
    • Change the signature of Any::PackFrom() to return false on error.
    • Add fast reflection getter API for strings.
    • Constant initialize the global message instances
    • Avoid potential for missed wakeup in UnknownFieldSet
    • Now Proto3 Oneof fields have "has" methods for checking their presence in C++.
    • Bugfix for NVCC
    • Return early in _InternalSerialize for empty maps.
    • Adding functionality for outputting map key values in proto path logging output (does not affect comparison logic) and stop printing 'value' in the path. The modified print functionality is in the MessageDifferencer::StreamReporter.
    • Fixed protocolbuffers/protobuf#8129
    • Ensure that null char symbol, package and file names do not result in a crash.
    • Constant initialize the global message instances
    • Pretty print 'max' instead of numeric values in reserved ranges.
    • Removed remaining instances of std::is_pod, which is deprecated in C++20.
    • Changes to reduce code size for unknown field handling by making uncommon cases out of line.
    • Fix std::is_pod deprecated in C++20 (#7180)
    • Fix some -Wunused-parameter warnings (#8053)
    • Fix detecting file as directory on zOS issue #8051 (#8052)
    • Don't include sys/param.h for _BYTE_ORDER (#8106)
    • remove CMAKE_THREAD_LIBS_INIT from pkgconfig CFLAGS (#8154)
    • Fix TextFormatMapTest.DynamicMessage issue#5136 (#8159)
    • Fix for compiler warning issue#8145 (#8160)
    • fix: support deprecated enums for GCC < 6 (#8164)
    • Fix some warning when compiling with Visual Studio 2019 on x64 target (#8125)

    Python

    • Provided an override for the reverse() method that will reverse the internal collection directly instead of using the other methods of the BaseContainer.
    • MessageFactory.CreateProtoype can be overridden to customize class creation.

    ... (truncated)

    Commits
    • ae50d9b Update protobuf version
    • 8260126 Update protobuf version
    • c741c46 Resovled issue in the .pb.cc files
    • eef2764 Resolved an issue where NO_DESTROY and CONSTINIT were in incorrect order
    • 0040102 Updated collect_all_artifacts.sh for Ubuntu Xenial
    • 26cb6a7 Delete root-owned files in Kokoro builds
    • 1e924ef Update port_def.inc
    • 9a80cf1 Update coded_stream.h
    • a97c4f4 Merge pull request #8276 from haberman/php-warning
    • 44cd75d Merge pull request #8282 from haberman/changelog
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Bump pyyaml from 3.12 to 5.1

    Bumps pyyaml from 3.12 to 5.1.

    Changelog

    Sourced from pyyaml's changelog.

    5.1 (2019-03-13)

    3.13 (2018-07-05)

    • Resolved issues around PyYAML working in Python 3.7.
    Commits
    • e471e86 Updates for 5.1 release
    • 9141e90 Windows Appveyor build
    • d6cbff6 Skip certain unicode tests when maxunicode not > 0xffff
    • 69103ba Update .travis.yml to use libyaml 0.2.2
    • 91c9435 Squash/merge pull request #105 from nnadeau/patch-1
    • 507a464 Make default_flow_style=False
    • 07c88c6 Allow to turn off sorting keys in Dumper
    • 611ba39 Include license file in the generated wheel package
    • 857dff1 Apply FullLoader/UnsafeLoader changes to lib3
    • 0cedb2a Deprecate/warn usage of yaml.load(input)
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 1
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Bump protobuf from 3.4.0 to 3.18.3

    Bumps protobuf from 3.4.0 to 3.18.3.

    Release notes

    Sourced from protobuf's releases.

    Protocol Buffers v3.18.3

    C++

    Protocol Buffers v3.16.1

    Java

    • Improve performance characteristics of UnknownFieldSet parsing (#9371)

    Protocol Buffers v3.18.2

    Java

    • Improve performance characteristics of UnknownFieldSet parsing (#9371)

    Protocol Buffers v3.18.1

    Python

    • Update setup.py to reflect that we now require at least Python 3.5 (#8989)
    • Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

    Ruby

    • Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

    Protocol Buffers v3.18.0

    C++

    • Fix warnings raised by clang 11 (#8664)
    • Make StringPiece constructible from std::string_view (#8707)
    • Add missing capability attributes for LLVM 12 (#8714)
    • Stop using std::iterator (deprecated in C++17). (#8741)
    • Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)
    • Fix #7047 Safely handle setlocale (#8735)
    • Remove deprecated version of SetTotalBytesLimit() (#8794)
    • Support arena allocation of google::protobuf::AnyMetadata (#8758)
    • Fix undefined symbol error around SharedCtor() (#8827)
    • Fix default value of enum(int) in json_util with proto2 (#8835)
    • Better Smaller ByteSizeLong
    • Introduce event filters for inject_field_listener_events
    • Reduce memory usage of DescriptorPool
    • For lazy fields copy serialized form when allowed.
    • Re-introduce the InlinedStringField class
    • v2 access listener
    • Reduce padding in the proto's ExtensionRegistry map.
    • GetExtension performance optimizations
    • Make tracker a static variable rather than call static functions
    • Support extensions in field access listener
    • Annotate MergeFrom for field access listener
    • Fix incomplete types for field access listener
    • Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.
    • Reduce binary size due to fieldless proto messages
    • TextFormat: ParseInfoTree supports getting field end location in addition to start.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.13.3 to 1.22.0

    Bumps numpy from 1.13.3 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Make Requires Cuda

    When we follow the installation instructions, the "make" command throws us the error, "CUDA_TOOKIT_ROOT_DIR not found". How do we build this repo on a machine without a GPU? Thanks!

    opened by goldblum 0
  • Fix Librispeech + Support Python3.6

    The LibriSpeech config was out of date, so it has been updated, and the seed was changed to 2019 as proof of the change. Changed train.py to add batch=list(batch) in two places, because zipped objects terminate after 1 epoch.

    opened by thethiny 0
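
The last item above notes that zipped objects are used up after one epoch. In Python 3, zip returns a one-shot iterator rather than a list, so a second pass over it yields nothing. A small illustration with made-up data (the variable names are hypothetical and not taken from train.py):

```python
inputs = [[0.1, 0.2], [0.3, 0.4]]
labels = [["a"], ["b"]]

# In Python 3, zip returns a lazy iterator that can be consumed only once.
batch = zip(inputs, labels)
first_pass = [pair for pair in batch]
second_pass = [pair for pair in batch]  # the iterator is already exhausted

# Materializing the pairs in a list makes them reusable across epochs.
batch = list(zip(inputs, labels))
epochs = [[pair for pair in batch] for _ in range(2)]

print(len(first_pass), len(second_pass))  # 2 0
print([len(epoch) for epoch in epochs])   # [2, 2]
```

Wrapping the zip in list() trades a little memory for the ability to iterate repeatedly, which is usually acceptable for a list of batch descriptors.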
Owner
Awni Hannun
Research Scientist at Facebook AI Research
A PyTorch Implementation of End-to-End Models for Speech-to-Text

speech Speech is an open-source package to build end-to-end models for automatic speech recognition. Sequence-to-sequence models with attention, Conne

Awni Hannun 647 Dec 25, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov 3.2k Dec 31, 2022
Rhasspy 673 Dec 28, 2022
Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipBERT is designed based on 2D CNNs and transformers, and uses a sparse sampling strategy to enable efficient end-to-end video-and-language learning.

Jie Lei 雷杰 612 Jan 4, 2023
Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Option 1 Download this repository, extract it to the desired location Option 2 If you are already famil

Habib Abdurrasyid 5 Dec 28, 2021
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 86 Jun 11, 2021
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Contributing to OpenSpeech OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform ta

Openspeech TEAM 513 Jan 3, 2023
Athena is an open-source implementation of an end-to-end speech processing engine.

Athena is an open-source implementation of an end-to-end speech processing engine. Our vision is to empower both industrial applications and academic research on end-to-end models for speech processing. To make speech processing available to everyone, we're also releasing example implementations and recipes on some open-source datasets for various tasks (Automatic Speech Recognition, Speech Synthesis, Voice Conversion, Speaker Recognition, etc.).

Ke Technologies 34 Sep 8, 2022
glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Glow-Speak glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end. Installation git clone https://g

Rhasspy 8 Dec 25, 2022
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

ESPnet 5.9k Jan 3, 2023
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning libra

Yiming Wang 919 Jan 3, 2023
End-2-end speech synthesis with recurrent neural networks

Introduction New: Interactive demo using Google Colaboratory can be found here TTS-Cube is an end-2-end speech synthesis system that provides a full p

Tiberiu Boros 214 Dec 7, 2022
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

Machine Translation @ UPC 21 Dec 20, 2022
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

Ryuichi Yamamoto 1.8k Dec 30, 2022
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit

Pytorch-NLU is a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of long and short Chinese text, as well as sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

null 186 Dec 24, 2022
A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

Snm Logic 1 Dec 20, 2021
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022