A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models
This repository accompanies the
How to contribute
(Mostly identical to the huggingface/datasets contributing guide)
Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
Clone your fork to your local disk, and add the base repository as a remote:
git clone git@github.com:<your Github handle>/wav2vec-toolkit.git
cd wav2vec-toolkit
git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git
Create a new branch to hold your development changes:
git checkout -b a-descriptive-name-for-my-changes
Do not work on the main branch.
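As a quick sanity check before committing, a small shell helper can confirm that you are on a feature branch. This is only a sketch: is_feature_branch is a hypothetical name that is not part of this repository, and it assumes the base branch is called main, as in the upstream/main rebase step.

```shell
# Hypothetical helper (not part of the repository): succeeds only when the
# given branch name is non-empty and is not the base branch.
# Assumption: the base branch is called "main", as in upstream/main.
is_feature_branch() {
    [ -n "$1" ] && [ "$1" != "main" ]
}

# Example use inside your clone (git 2.22+ provides --show-current):
#   is_feature_branch "$(git branch --show-current)" || echo "Create a feature branch first."
```

The helper takes the branch name as an argument, so it can be used with whichever git command you prefer for reading the current branch.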
Set up a development environment by running the following command in a virtual environment:
pip install -e ".[dev]"
(If wav2vec-toolkit was already installed in the virtual environment, remove it with pip uninstall wav2vec_toolkit before reinstalling it in editable mode with the -e flag.)
Develop the features on your branch.
Format your code. Run black and isort with the following commands so that your newly added files look nice:
black --line-length 119 --target-version py36 src scripts
isort src scripts
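If you only want to verify the formatting without modifying any files, both tools offer a check mode. The sketch below wraps them in a hypothetical check_format function (the name is an assumption, not something this repository provides) and reuses the options from the formatting step above.

```shell
# Hypothetical helper: report formatting problems without rewriting any file.
# --check (black) and --check-only (isort) make each tool exit non-zero
# when a file would be changed.
check_format() {
    black --check --line-length 119 --target-version py36 src scripts &&
        isort --check-only src scripts
}
```

Run check_format from the repository root inside the virtual environment where the dev dependencies are installed. A non-zero exit status means at least one file still needs formatting.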
Once you're happy with your implementation, add your changes and make a commit to record your changes locally:
git add .
git commit
It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:
git fetch upstream
git rebase upstream/main
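The sync step fails if the upstream remote was never added. A minimal sketch of a guard, assuming the remote is named upstream and the base branch is main as in the commands above (sync_with_upstream is a hypothetical name):

```shell
# Hypothetical helper: fetch and rebase only if an "upstream" remote exists.
sync_with_upstream() {
    # git remote get-url exits non-zero when the remote is not configured.
    if ! git remote get-url upstream >/dev/null 2>&1; then
        echo "No 'upstream' remote found; add it with 'git remote add upstream <url>'." >&2
        return 1
    fi
    git fetch upstream && git rebase upstream/main
}
```

Call sync_with_upstream from inside your clone. It returns a non-zero status without touching your branch when the remote is missing.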
Push the changes to your account using:
git push -u origin a-descriptive-name-for-my-changes
Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.