Replication Package for AequeVox: Automated Fairness Testing for Speech Recognition Systems

Sai Sathiesh

Last update: Aug 28, 2022

AequeVox

Replication Package for AequeVox: Automated Fairness Testing for Speech Recognition Systems

README under development.

Python Packages Required

numpy
scipy
math
librosa
random
time
json
threading
re
nltk

ASR Specific Packages

Google Cloud

speech
storage

Microsoft Azure

azure.cognitiveservices.speech

IBM Cloud

ibm_watson
ibm_watson.websocket
ibm_cloud_sdk_core.authenticators
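
Before running the scripts, it may help to check which of the modules listed above can be imported. The following is a minimal sketch and not part of the replication package: the helper name `missing_modules` is hypothetical, and the full dotted paths for the Google Cloud modules (`google.cloud.speech`, `google.cloud.storage`) are assumed from the entries under "Google Cloud". Standard-library modules such as math, json and re are left out because they ship with Python.

```python
import importlib.util

# Third-party modules from the lists above, written in lowercase as Python
# imports them. The "google.cloud." prefix is an assumption.
REQUIRED = [
    "numpy", "scipy", "librosa", "nltk",
    "google.cloud.speech", "google.cloud.storage",
    "azure.cognitiveservices.speech",
    "ibm_watson", "ibm_cloud_sdk_core.authenticators",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when a parent package such as `google`
            # is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print(f"missing: {name}")
```

Only the packages for the ASR services you intend to query need to be present; the other entries can be ignored.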

The code is separated into two sections: Generation and Analysis.

Generation:

transGen.py

Lists all transformation types and magnitudes to be used. Can be modified as necessary.
Requires the file names of all the original speech files to be specified.

Generates transformed speech files named in the form {Original File Name}{Transformation Type Abbreviation}{Magnitude of Transformation Parameter, theta}.wav

List of Abbreviations.

A - Amplitude
C - Clipping
D - Drop
F - Frame
HP - Highpass
LP - Lowpass
N - Noise
S - Scale
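
The naming scheme above can be sketched as a small helper. This is an illustration rather than the code in transGen.py: the function name `transformed_name`, the sample file name `speaker01.wav`, the stripping of an existing extension, and the rendering of the magnitude as plain text are all assumptions. The README fixes only the order of the three parts and the .wav extension.

```python
from pathlib import Path

# Abbreviations from the list above.
ABBREVIATIONS = {"A", "C", "D", "F", "HP", "LP", "N", "S"}


def transformed_name(original, abbreviation, theta):
    """Build {Original File Name}{Abbreviation}{theta}.wav."""
    if abbreviation not in ABBREVIATIONS:
        raise ValueError(f"unknown transformation: {abbreviation}")
    # Assumption: an existing extension on the original name is dropped.
    stem = Path(original).stem
    return f"{stem}{abbreviation}{theta}.wav"


print(transformed_name("speaker01.wav", "N", 10))  # speaker01N10.wav
```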

GCP_Recog.py

Requires Google cloud client libraries and associated keys.

Takes a group name and the list of all original files in the group to generate transcripts.

MS_Recog.py

Requires Microsoft Azure client libraries and associated key and region.

Takes a group name and the list of all original files in the group to generate transcripts.

IBM_Recog.py

Requires IBM client libraries and associated key and service URL.

Takes a group name and the list of all original files in the group to generate transcripts.

compASR.py

Takes the names of two ASR systems and the group names, and generates distance metrics. The output is a set of text files containing the distance metrics for the specified groups.

Users are requested to use the distance metrics to calculate the D values for each transformation.
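
The README does not say which distance metric compASR.py computes. As an illustration only, the sketch below assumes a word-level edit (Levenshtein) distance between the transcripts that two ASR systems produce for the same file. The function name `word_distance` and the sample transcripts are hypothetical.

```python
def word_distance(transcript_a, transcript_b):
    """Word-level Levenshtein distance between two transcripts."""
    a = transcript_a.lower().split()
    b = transcript_b.lower().split()
    # prev[j] is the distance between the words of `a` processed so far
    # and the first j words of `b`.
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(word_distance("the cat sat on the mat", "the cat sat on a mat"))  # 1
```

A distance of 0 means the two systems produced the same word sequence for the file; larger values mean more word-level disagreement.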
