A PyTorch Implementation of End-to-End Models for Speech-to-Text

Awni Hannun

Last update: Dec 25, 2022

Overview

speech

Speech is an open-source package to build end-to-end models for automatic speech recognition. Sequence-to-sequence models with attention, Connectionist Temporal Classification and the RNN Sequence Transducer are currently supported.

The goal of this software is to facilitate research in end-to-end models for speech recognition. The models are implemented in PyTorch.

The software has only been tested with Python 3.6.

We will not provide backward compatibility with Python 2.7.

Install

We recommend creating a virtual environment and installing the Python requirements there.

virtualenv <path_to_your_env>
source <path_to_your_env>/bin/activate
pip install -r requirements.txt

Then follow the installation instructions for a version of PyTorch which works for your machine.

After all the Python requirements are installed, run the following from the top-level directory:

make

The build process requires CMake as well as Make.

After that, source the setup.sh from the repo root.

source setup.sh

Consider adding this command (with the full path to setup.sh) to your .bashrc.

You can verify the install was successful by running the tests from the tests directory.

cd tests
pytest

Run

To train a model, run:

python train.py <path_to_config>

After the model is done training, you can evaluate it with:

python eval.py <path_to_model> <path_to_data_json>

To see the available options for each script use -h:

python train.py -h
python eval.py -h

Examples

For examples of model configurations and datasets, visit the examples directory. Each example dataset should have instructions and/or scripts for downloading and preparing the data. There should also be one or more model configurations available. The results for each configuration will be documented in each example's corresponding README.md.

Comments

A question about the TRAINING SET used in the TIMIT script

Normally we use the standard 462-speaker data as the training set, while this TIMIT example uses 556-speaker data (including some data from the full test set) in train.json. Although the WER results in this repo seem promising, are the methods used here really convincing or comparable?

opened by wolverineq 6

KeyError: 'start_and_end'

When I try to run train.py, I get the following error:

(venv-speech) sroca@nx2:~/speech>> python train.py examples/librispeech/config.json

Traceback (most recent call last):
  File "train.py", line 145, in <module>
    run(config)
  File "train.py", line 80, in run
    start_and_end=data_cfg["start_and_end"])
KeyError: 'start_and_end'
srun: error: c8: task 0: Exited with exit code 1

It seems that the key 'start_and_end' is not defined anywhere in the config, so it can't be found.

How can I fix it?
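One way to narrow down this kind of error is to check the config for the keys the training script reads before launching a run. The sketch below assumes the config is JSON with a top-level "data" section (inferred only from the data_cfg name in the traceback; the real layout may differ). REQUIRED_DATA_KEYS and missing_data_keys are hypothetical names used for illustration, not part of this repository.

```python
import json

# Hypothetical list of keys read from the "data" section of the config.
# Only "start_and_end" is confirmed by the traceback above.
REQUIRED_DATA_KEYS = ("start_and_end",)


def missing_data_keys(config, required=REQUIRED_DATA_KEYS):
    """Return the required keys that are absent from config["data"]."""
    data_cfg = config.get("data", {})
    return [key for key in required if key not in data_cfg]


# Illustrative config fragment, not copied from the examples directory.
example = json.loads('{"data": {"batch_size": 8}}')
print(missing_data_keys(example))  # ['start_and_end']
```

If the key is reported missing, the config file is probably older than the version of train.py being run, and the value to add has to come from the repository's current example configs.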

opened by sroca8 5

RNN Transducer training problem

Hi, it seems that your implementation of the RNN Transducer loss function is right. But when I train Graves2012 TIMIT, the loss decreases while the PER increases, no matter how I adjust the learning rate. (With a small learning rate, the PER first decreases, then increases for the rest of training.)

In your training procedure, the RNNT loss is indeed decreasing, but if you output the PER, it is increasing! So what's wrong?
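Because the question turns on loss and PER moving in opposite directions, it can help to compute PER independently of the training code. This is a minimal sketch, assuming PER is defined in the usual way: the total edit distance between predicted and reference phoneme sequences, divided by the total reference length. edit_distance and phoneme_error_rate are illustrative names, not functions from this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by total reference length over a corpus."""
    total_dist = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_len = sum(len(r) for r in refs)
    return total_dist / total_len


# Made-up phoneme sequences: one substitution and one deletion.
refs = [["a", "b", "c", "d"]]
hyps = [["a", "x", "c"]]
print(phoneme_error_rate(refs, hyps))  # 0.5
```

Running a check like this on decoded output from a few checkpoints would show whether the PER increase comes from the model or from the decoding and scoring path.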

opened by HawkAaron 5

pytest failure

Environment

Titan Xp CUDA 9.0 cuDNN 7.1.3

Ubuntu 16.04 Python 2.7 PyTorch 0.4.0

Code to reproduce the issue

git clone https://github.com/awni/speech.git
cd speech
conda create -n asr -y python=2.7
source activate asr
pip install -r requirements
pip install http://download.pytorch.org/whl/cu90/torch-0.4.0-cp27-cp27mu-linux_x86_64.whl 
pip install torchvision 
make
source setup.sh
cd test
pytest

When I ran training on my own data (or ran pytest), it failed with the following error:

ERROR: TypeError: activations must be <type 'torch.FloatTensor'>

Does anyone have an idea what is happening? This issue persists with or without a GPU.

============================= test session starts ==============================
platform linux2 -- Python 2.7.15, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
rootdir: /data2/colosseum/test-speech2/speech/tests, inifile:
collected 9 items

ctc_test.py F.
io_test.py .
loader_test.py ..
model_test.py .
seq2seq_test.py .
wave_test.py ..

=================================== FAILURES ===================================

________________________________ test_ctc_model ________________________________

    def test_ctc_model():
        freq_dim = 40
        vocab_size = 10

        batch = shared.gen_fake_data(freq_dim, vocab_size)
        batch_size = len(batch[0])

        model = CTC(freq_dim, vocab_size, shared.model_config)
        out = model(batch)

        assert out.size()[0] == batch_size

        # CTC model adds the blank token to the vocab
        assert out.size()[2] == (vocab_size + 1)

        assert len(out.size()) == 3

>       loss = model.loss(batch)

ctc_test.py:26:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../speech/models/ctc_model.py:39: in loss
    loss = loss_fn(out, y, x_lens, y_lens)
../libs/warp-ctc/pytorch_binding/functions/ctc.py:77: in forward
    costs = parent.forward(*args)
../libs/warp-ctc/pytorch_binding/functions/ctc.py:41: in forward
    certify_inputs(activations, labels, lengths, label_lengths)
../libs/warp-ctc/pytorch_binding/functions/ctc.py:107: in certify_inputs
    check_type(activations, torch.FloatTensor, "activations")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

var = tensor([[[-0.0090,  0.4523,  0.0716,  ...,  0.0900, -0.0668,  0.1392],
       ...1443],
         [ 0.1413, -0.0695,  0.0591,  ..., -0.3491, -0.0151, -0.0068]]])
t = <type 'torch.FloatTensor'>, name = 'activations'

    def check_type(var, t, name):
        if type(var) is not t:
>           raise TypeError("{} must be {}".format(name, t))
E           TypeError: activations must be <type 'torch.FloatTensor'>

../libs/warp-ctc/pytorch_binding/functions/ctc.py:92: TypeError
====================== 1 failed, 8 passed in 3.60 seconds ======================

opened by JeremyCCHsu 3

Availability of pretrained model for RNN Transducer & Seq2Seq Attention Model

Hey, I wanted to inquire if there are any plans to open source the pretrained models, for the RNN Transducer and Seq2Seq Model? If there are such pretrained models, can anyone please share the link?

opened by kdatta03 3

TIMIT PER

With the recommended seq2seq config, I get a TIMIT PER of 28% on the test set (instead of the reported 18.7%). Has anyone else had a similar experience, or does anyone know what could be going wrong?

Thank you!

opened by ankitapasad 2

Errors with Installation

Hi,

I have successfully run the following commands:

virtualenv e2e_awni
source e2e_awni/bin/activate
cd speech
pip install -r requirements.txt

As the next step, should I install pytorch while virtualenv is activated or not?

The following errors occur if I install PyTorch while the virtualenv is activated:

(e2e_awni)kevin@DEVBOX2:~$ pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp27-cp27mu-linux_x86_64.whl
torch-0.4.1-cp27-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.
Storing debug log for failure in /home/zhme/.pip/pip.log

(e2e_awni)kevin@DEVBOX2:~$ pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp27-cp27m-linux_x86_64.whl
torch-0.4.1-cp27-cp27m-linux_x86_64.whl is not a supported wheel on this platform.
Storing debug log for failure in /home/zhme/.pip/pip.log

I can successfully install pytorch when virtualenv is deactivated. But the following errors occur when I run pytest under speech/tests after "make".

(e2e_awni)kevin@DEVBOX2:~/speech/tests$ pytest

=================================== ERRORS ====================================
________________________ ERROR collecting ctc_test.py _________________________
ImportError while importing test module '/home/kevin/speech/tests/ctc_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ctc_test.py:2: in <module>
    import torch
E   ImportError: No module named torch
_________________________ ERROR collecting io_test.py _________________________
ImportError while importing test module '/home/kevin/speech/tests/io_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
io_test.py:3: in <module>
    import speech.models
E   ImportError: No module named speech.models
_______________________ ERROR collecting loader_test.py _______________________
ImportError while importing test module '/home/kevin/speech/tests/loader_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
loader_test.py:3: in <module>
    from speech import loader
E   ImportError: No module named speech
_______________________ ERROR collecting model_test.py ________________________
ImportError while importing test module '/home/kevin/speech/tests/model_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
model_test.py:3: in <module>
    import torch
E   ImportError: No module named torch
______________________ ERROR collecting seq2seq_test.py _______________________
ImportError while importing test module '/home/kevin/speech/tests/seq2seq_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
seq2seq_test.py:3: in <module>
    import torch
E   ImportError: No module named torch
________________________ ERROR collecting wave_test.py ________________________
ImportError while importing test module '/home/kevin/speech/tests/wave_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
wave_test.py:4: in <module>
    import speech.utils.wave as wave
E   ImportError: No module named speech.utils.wave
!!!!!!!!!!!!!!!!!!! Interrupted: 6 errors during collection !!!!!!!!!!!!!!!!!!!
=========================== 6 error in 0.23 seconds ===========================

Could you help me out?

Thank you.

opened by Energyquantum 2

Results on LibriSpeech very bad

Hi,

I used this tool to train a seq2seq speech system on LibriSpeech data; however, the results are very bad. Have you had similar results? Do you know how I can fix this issue?

Thank you,
Sahar

opened by ghost 2

Attention explanation

Is there any paper or tutorial that describes exactly the same attention mechanism that is used in this repository? I mean the fact that attention values are added, not concatenated, the usage of LinearND, and the fact that there is a convolution. Is there any place with the theory given? Thank you

opened by smolendawid 2

Bump protobuf from 3.4.0 to 3.15.0

Bumps protobuf from 3.4.0 to 3.15.0.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.15.0

Protocol Compiler

Optional fields for proto3 are enabled by default, and no longer require the --experimental_allow_proto3_optional flag.

C++

MessageDifferencer: fixed bug when using custom ignore with multiple unknown fields

Use init_seg in MSVC to push initialization to an earlier phase.

Runtime no longer triggers -Wsign-compare warnings.

Fixed -Wtautological-constant-out-of-range-compare warning.

DynamicCastToGenerated works for nullptr input even if RTTI is disabled

Arena is refactored and optimized.

Clarified/specified that the exact value of Arena::SpaceAllocated() is an implementation detail users must not rely on. It should not be used in unit tests.

Change the signature of Any::PackFrom() to return false on error.

Add fast reflection getter API for strings.

Constant initialize the global message instances

Avoid potential for missed wakeup in UnknownFieldSet

Now Proto3 Oneof fields have "has" methods for checking their presence in C++.

Bugfix for NVCC

Return early in _InternalSerialize for empty maps.

Adding functionality for outputting map key values in proto path logging output (does not affect comparison logic) and stop printing 'value' in the path. The modified print functionality is in the MessageDifferencer::StreamReporter.

Fixed protocolbuffers/protobuf#8129

Ensure that null char symbol, package and file names do not result in a crash.

Constant initialize the global message instances

Pretty print 'max' instead of numeric values in reserved ranges.

Removed remaining instances of std::is_pod, which is deprecated in C++20.

Changes to reduce code size for unknown field handling by making uncommon cases out of line.

Fix std::is_pod deprecated in C++20 (#7180)

Fix some -Wunused-parameter warnings (#8053)

Fix detecting file as directory on zOS issue #8051 (#8052)

Don't include sys/param.h for _BYTE_ORDER (#8106)

remove CMAKE_THREAD_LIBS_INIT from pkgconfig CFLAGS (#8154)

Fix TextFormatMapTest.DynamicMessage issue#5136 (#8159)

Fix for compiler warning issue#8145 (#8160)

fix: support deprecated enums for GCC < 6 (#8164)

Fix some warning when compiling with Visual Studio 2019 on x64 target (#8125)

Python

Provided an override for the reverse() method that will reverse the internal collection directly instead of using the other methods of the BaseContainer.

MessageFactory.CreatePrototype can be overridden to customize class creation.

... (truncated)

Commits

ae50d9b Update protobuf version

8260126 Update protobuf version

c741c46 Resovled issue in the .pb.cc files

eef2764 Resolved an issue where NO_DESTROY and CONSTINIT were in incorrect order

0040102 Updated collect_all_artifacts.sh for Ubuntu Xenial

26cb6a7 Delete root-owned files in Kokoro builds

1e924ef Update port_def.inc

9a80cf1 Update coded_stream.h

a97c4f4 Merge pull request #8276 from haberman/php-warning

44cd75d Merge pull request #8282 from haberman/changelog

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1

Bump pyyaml from 3.12 to 5.1

Bumps pyyaml from 3.12 to 5.1.

Changelog

Sourced from pyyaml's changelog.

5.1 (2019-03-13)

yaml/pyyaml#35 -- Some modernization of the test running

yaml/pyyaml#42 -- Install tox in a virtualenv

yaml/pyyaml#45 -- Allow colon in a plain scalar in a flow context

yaml/pyyaml#48 -- Fix typos

yaml/pyyaml#55 -- Improve RepresenterError creation

yaml/pyyaml#59 -- Resolves #57, update readme issues link

yaml/pyyaml#60 -- Document and test Python 3.6 support

yaml/pyyaml#61 -- Use Travis CI built in pip cache support

yaml/pyyaml#62 -- Remove tox workaround for Travis CI

yaml/pyyaml#63 -- Adding support to Unicode characters over codepoint 0xffff

yaml/pyyaml#65 -- Support unicode literals over codepoint 0xffff

yaml/pyyaml#75 -- add 3.12 changelog

yaml/pyyaml#76 -- Fallback to Pure Python if Compilation fails

yaml/pyyaml#84 -- Drop unsupported Python 3.3

yaml/pyyaml#102 -- Include license file in the generated wheel package

yaml/pyyaml#105 -- Removed Python 2.6 & 3.3 support

yaml/pyyaml#111 -- Remove commented out Psyco code

yaml/pyyaml#129 -- Remove call to ord in lib3 emitter code

yaml/pyyaml#143 -- Allow to turn off sorting keys in Dumper

yaml/pyyaml#149 -- Test on Python 3.7-dev

yaml/pyyaml#158 -- Support escaped slash in double quotes "\/"

yaml/pyyaml#181 -- Import Hashable from collections.abc

yaml/pyyaml#256 -- Make default_flow_style=False

yaml/pyyaml#257 -- Deprecate yaml.load and add FullLoader and UnsafeLoader classes

yaml/pyyaml#263 -- Windows Appveyor build

3.13 (2018-07-05)

Resolved issues around PyYAML working in Python 3.7.

Commits

e471e86 Updates for 5.1 release

9141e90 Windows Appveyor build

d6cbff6 Skip certain unicode tests when maxunicode not > 0xffff

69103ba Update .travis.yml to use libyaml 0.2.2

91c9435 Squash/merge pull request #105 from nnadeau/patch-1

507a464 Make default_flow_style=False

07c88c6 Allow to turn off sorting keys in Dumper

611ba39 Include license file in the generated wheel package

857dff1 Apply FullLoader/UnsafeLoader changes to lib3

0cedb2a Deprecate/warn usage of yaml.load(input)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1

CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. When extract() or extractall() is called on a tarfile object without sanitizing input, a maliciously crafted .tar file can perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
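The check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch from the pull request: safe_extractall and is_within_directory are hypothetical helper names, and the sketch validates member paths only (it does not inspect symlink targets).

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar, path="."):
    """Extract all members, refusing archives with members that escape path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError("Unsafe path in tar file: " + member.name)
    tar.extractall(path)
```

A caller would open the archive with tarfile.open(...) and pass the resulting object to safe_extractall in place of calling extractall directly. The archive is rejected as a whole before anything is written, so a single bad member name leaves the destination untouched.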

opened by TrellixVulnTeam 0

Bump protobuf from 3.4.0 to 3.18.3

Bumps protobuf from 3.4.0 to 3.18.3.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.18.3

C++

Reduce memory consumption of MessageSet parsing

This release addresses a Security Advisory for C++ and Python users

Protocol Buffers v3.16.1

Java

Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.2

Java

Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.1

Python

Update setup.py to reflect that we now require at least Python 3.5 (#8989)

Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

Ruby

Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

Protocol Buffers v3.18.0

C++

Fix warnings raised by clang 11 (#8664)

Make StringPiece constructible from std::string_view (#8707)

Add missing capability attributes for LLVM 12 (#8714)

Stop using std::iterator (deprecated in C++17). (#8741)

Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)

Fix #7047 Safely handle setlocale (#8735)

Remove deprecated version of SetTotalBytesLimit() (#8794)

Support arena allocation of google::protobuf::AnyMetadata (#8758)

Fix undefined symbol error around SharedCtor() (#8827)

Fix default value of enum(int) in json_util with proto2 (#8835)

Better Smaller ByteSizeLong

Introduce event filters for inject_field_listener_events

Reduce memory usage of DescriptorPool

For lazy fields copy serialized form when allowed.

Re-introduce the InlinedStringField class

v2 access listener

Reduce padding in the proto's ExtensionRegistry map.

GetExtension performance optimizations

Make tracker a static variable rather than call static functions

Support extensions in field access listener

Annotate MergeFrom for field access listener

Fix incomplete types for field access listener

Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.

Reduce binary size due to fieldless proto messages

TextFormat: ParseInfoTree supports getting field end location in addition to start.

... (truncated)

Commits

a902b39 No-op whitespace change

ae62acd Updating version.json and repo version numbers to: 18.3

f43ac49 Merge pull request #10542 from deannagarcia/3.18.x

9efdf55 Add missing includes

d1635e1 Apply patch

5b37c91 Update version.json with "lts": true (#10534)

c39d622 Merge pull request #10529 from protocolbuffers/deannagarcia-patch-5

f77d3b6 Update version.json

8178b06 Merge pull request #10503 from deannagarcia/3.18.x

24ca839 Add version file

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

Bump numpy from 1.13.3 to 1.22.0

Bumps numpy from 1.13.3 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update __init__.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

Make Requires Cuda

When we follow the installation instructions, the "make" command fails with the error "CUDA_TOOLKIT_ROOT_DIR not found". How do we build this repo on a machine without a GPU? Thanks!

opened by goldblum 0

Fix Librispeech + Support Python3.6

The LibriSpeech config was out of date, so I updated it and changed the seed to 2019 as proof of the change. I also changed train.py, adding batch=list(batch) in two places, because zip objects are exhausted after one epoch.
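The list(batch) change reflects general Python 3 behavior: zip returns a one-shot iterator, so a second pass over the same object yields nothing. A small illustration with made-up data (the variable names below are not taken from train.py):

```python
inputs = [[0.1, 0.2], [0.3, 0.4]]
labels = ["a", "b"]

# In Python 3, zip() returns an iterator that the first loop consumes.
batch = zip(inputs, labels)
first_pass = [pair for pair in batch]
second_pass = [pair for pair in batch]  # empty: the iterator is exhausted

# Materializing the pairs once makes them reusable across epochs.
batch = list(zip(inputs, labels))
epochs = [[pair for pair in batch] for _ in range(2)]

print(len(first_pass), len(second_pass))  # 2 0
print([len(epoch) for epoch in epochs])   # [2, 2]
```

In Python 2, zip returned a list, which is why code written for Python 2.7 can silently train on empty data after the first epoch when run under Python 3.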

opened by thethiny 0

Owner

Awni Hannun

Research Scientist at Facebook AI Research

GitHub

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

3.2k Dec 31, 2022

In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

End to End Automatic Speech Recognition In this repository, I have developed an end to end Automatic speech recognition project. I have developed the

22 Nov 13, 2022

End-to-end text to speech system using gruut and onnx. There are 40 voices available across 8 languages.

End to end text to speech system using gruut and onnx

673 Dec 28, 2022

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipBERT is designed based on 2D CNNs and transformers, and uses a sparse sampling strategy to enable efficient end-to-end video-and-language learning.

612 Jan 4, 2023

Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Option 1 Download this repository and extract it to the desired location Option 2 If you are already famil

5 Dec 28, 2021

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

26 Dec 14, 2022

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

86 Jun 11, 2021

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Contributing to OpenSpeech OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform ta

513 Jan 3, 2023

Athena is an open-source implementation of an end-to-end speech processing engine.

Athena is an open-source implementation of an end-to-end speech processing engine. Our vision is to empower both industrial applications and academic research on end-to-end models for speech processing. To make speech processing available to everyone, we're also releasing example implementations and recipes on some open-source datasets for various tasks (Automatic Speech Recognition, Speech Synthesis, Voice Conversion, Speaker Recognition, etc).

34 Sep 8, 2022

glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Glow-Speak glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end. Installation git clone https://g

8 Dec 25, 2022

End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

5.9k Jan 3, 2023

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning libra

919 Jan 3, 2023

End-2-end speech synthesis with recurrent neural networks

Introduction New: Interactive demo using Google Colaboratory can be found here TTS-Cube is an end-2-end speech synthesis system that provides a full p

214 Dec 7, 2022

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

21 Dec 20, 2022

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

1.8k Dec 30, 2022

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification tasks for long and short Chinese text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

Pytorch-NLU is a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification tasks for long and short Chinese text, as well as sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

186 Dec 24, 2022

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

1 Dec 20, 2021

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

29 Oct 16, 2022
