Official implementation of A cappella: Audio-visual Singing Voice Separation, from BMVC21

Overview

Y-Net

Official implementation of A cappella: Audio-visual Singing Voice Separation, British Machine Vision Conference 2021

Project page: ipcv.github.io/Acappella/
Paper: Arxiv, BMVC (not available yet)

Running a demo / Y-Net Inference

We provide simple functions to load models with pre-trained weights. Steps:

  1. Clone the repo or download y-net>VnBSS>models (models can run as a standalone package)
  2. Load a model:
from VnBSS import y_net_gr # or from models import y_net_gr 
model = y_net_gr(n=1)

Check out a fully working demo:
Open In Colab

Citation

@inproceedings{acappella,
    author    = {Juan F. Montesinos and
                 Venkatesh S. Kadandale and
                 Gloria Haro},
    title     = {A cappella: Audio-visual Singing Voice Separation},
    booktitle = {British Machine Vision Conference (BMVC)},
    year      = {2021},

}

Repository under construction.

Training / Using DEV code

Training

The most difficult part is preparing the dataset, as everything is built upon a very specific format.
To run training:
python run.py -m model_name --workname experiment_name --arxiv_path directory_of_experiments --pretrained_from path_pret_weights
You can inspect the available arguments in default.py>argparse_default.
Possible model names are: y_net_g, y_net_gr, y_net_m, y_net_r, u_net, llcp.
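The training command can also be assembled from Python, for example to launch several experiments in a loop. This is a minimal sketch, not part of the repository: build_train_command is a hypothetical helper, and treating every flag except -m as optional relies on the debugging section of this README, where python run.py -m y_net_gr runs on defaults.

```python
import shlex

# Model names listed in this README (run.py remains the authority).
MODEL_NAMES = ("y_net_g", "y_net_gr", "y_net_m", "y_net_r", "u_net", "llcp")


def build_train_command(model, workname=None, arxiv_path=None, pretrained_from=None):
    """Return the run.py training command as an argv list (hypothetical helper).

    Optional flags are only added when a value is given, so run.py falls back
    to its own defaults (see default.py>argparse_default) for the rest.
    """
    if model not in MODEL_NAMES:
        raise ValueError(f"unknown model {model!r}; expected one of {MODEL_NAMES}")
    argv = ["python", "run.py", "-m", model]
    for flag, value in (("--workname", workname),
                        ("--arxiv_path", arxiv_path),
                        ("--pretrained_from", pretrained_from)):
        if value is not None:
            argv += [flag, value]
    return argv


cmd = build_train_command("y_net_gr", workname="experiment_name",
                          arxiv_path="directory_of_experiments")
print(shlex.join(cmd))
# python run.py -m y_net_gr --workname experiment_name --arxiv_path directory_of_experiments
```

Passing the list to subprocess.run(cmd, check=True) from the repository root would start training; keeping the command as an argv list rather than a single shell string avoids quoting problems with paths that contain spaces.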

Testing

  1. Go to manuscript_scripts and replace the checkpoint paths in the testing scripts with your own.
  2. Run: bash manuscript_scripts/test_gr_r.sh
  3. Replace the paths in manuscript_scripts/auto_metrics.py with your experiment_directory path.
  4. Run: python manuscript_scripts/auto_metrics.py to visualise results.
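Step 1 amounts to a find-and-replace over the testing scripts. A minimal sketch, assuming the checkpoint paths appear verbatim in the .sh files; replace_checkpoint_paths and the example paths are hypothetical, since this README does not show the scripts' contents.

```python
from pathlib import Path


def replace_checkpoint_paths(script_dir, old_path, new_path, pattern="*.sh"):
    """Replace every literal occurrence of old_path with new_path (hypothetical helper).

    Only files matching `pattern` in script_dir are touched; the list of
    modified files is returned so you can review what changed.
    """
    changed = []
    for script in sorted(Path(script_dir).glob(pattern)):
        text = script.read_text()
        if old_path in text:
            script.write_text(text.replace(old_path, new_path))
            changed.append(script)
    return changed


# Example (both paths are placeholders):
# replace_checkpoint_paths("manuscript_scripts", "/authors/ckpt/y_net_gr.pth", "/my/ckpt/y_net_gr.pth")
```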

It's a complicated framework. HELP!

The best way to get to know the framework is to debug it! Having runnable code lets you inspect input shapes and dataflow and step through the code line by line. Download the "The circle of life" demo with the files already processed; it acts as a dataset of 6 samples. You can download it from Google Drive (1.1 GB).

  1. Unzip the file
  2. Run python run.py -m y_net_gr (for example).

Everything is configured to run this way by default.

The model

Each actual model is wrapped by an nn.Module that takes care of computing the STFT and the mask, returning the waveform, and so on. This wrapper can be found at VnBSS>models>y_net.py>YNet. To get rid of it, you can simply inherit from the class, keep the minimum set of layers, and keep the core_forward method, which is the inference step without the miscellaneous pre- and post-processing.
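The split between the wrapper and core_forward can be illustrated with a toy NumPy sketch. It is not the repository's YNet: MaskingWrapper and CoreOnly are hypothetical names, the STFT settings are arbitrary, and core_forward returns an all-ones mask instead of running a network, so the example runs without the repository or PyTorch.

```python
import numpy as np


class MaskingWrapper:
    """Toy wrapper: waveform -> STFT -> mask from core_forward -> waveform."""

    def __init__(self, n_fft=256, hop=64):
        self.n_fft = n_fft
        self.hop = hop
        n = np.arange(n_fft)
        self.window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)  # periodic Hann

    def stft(self, x):
        # Zero-pad by one frame on each side so every input sample is covered
        # by several overlapping frames.
        x = np.pad(x, (self.n_fft, self.n_fft))
        n_frames = 1 + (len(x) - self.n_fft) // self.hop
        frames = np.stack([x[i * self.hop:i * self.hop + self.n_fft]
                           for i in range(n_frames)])
        return np.fft.rfft(frames * self.window, axis=-1)  # (frames, n_fft // 2 + 1)

    def istft(self, spec, length):
        # Weighted overlap-add: divide by the summed squared window.
        frames = np.fft.irfft(spec, n=self.n_fft, axis=-1) * self.window
        total = self.n_fft + self.hop * (len(frames) - 1)
        out = np.zeros(total)
        norm = np.zeros(total)
        for i, frame in enumerate(frames):
            start = i * self.hop
            out[start:start + self.n_fft] += frame
            norm[start:start + self.n_fft] += self.window ** 2
        out /= np.maximum(norm, 1e-8)
        return out[self.n_fft:self.n_fft + length]  # drop the padding

    def core_forward(self, magnitude):
        """Placeholder for the network: an all-pass mask of ones."""
        return np.ones_like(magnitude)

    def forward(self, mixture):
        spec = self.stft(mixture)
        mask = self.core_forward(np.abs(spec))
        return self.istft(spec * mask, len(mixture))


class CoreOnly(MaskingWrapper):
    """Reuses core_forward but skips the STFT/iSTFT plumbing."""

    def forward(self, magnitude):
        return self.core_forward(magnitude)


x = np.random.default_rng(0).standard_normal(1000)
print(np.allclose(MaskingWrapper().forward(x), x))  # True
```

With an all-ones mask the weighted overlap-add returns the input, which makes the final line a quick sanity check of the STFT round trip; a real core_forward would instead predict the mask from its inputs, and CoreOnly shows how a subclass can expose just that step.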

FAQs

  1. How to change the optimizer's hyperparameters?
    Go to config>optimizer.json
  2. How to change clip duration, video framerate, STFT parameters or audio samplerate?
    Go to config>__init__.py
  3. How to change the batch size or the number of epochs?
    Go to config>hyptrs.json
  4. How to dump predictions from the training and test sets?
    Go to default.py and modify DUMP_FILES (it can be controlled at a subset level). The force argument skips the iteration-wise conditions and dumps every single network prediction.
  5. Is tensorboard enabled?
    Yes, you will find tensorboard records at your_experiment_directory/used_workname/tensorboard
  6. Can I resume an experiment?
    Yes, if you set exactly the same experiment folder and workname, the system will detect it and will resume from there.
  7. I'm trying to resume but get an AssertionError.
    If there is an exception before running the model
  8. How to change the number of layers of the U-Net?
    The U-Net is built dynamically from a list of layers per block, ordered from outer to inner blocks, as shown in models>__init__.py.
  9. How to modify the default network values?
    The json file config>net_cfg.json overwrites any default configuration from the model.
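FAQ 9 says that config>net_cfg.json overwrites the model's defaults. A minimal sketch of that kind of override, assuming the file holds a JSON object whose keys match the default configuration; whether the repository merges nested entries recursively, as merge_config does here, is an assumption, load_net_cfg is a hypothetical helper, and the keys in the example are made up.

```python
import json


def merge_config(defaults, overrides):
    """Return a new dict where values from `overrides` win over `defaults`.

    Nested dicts are merged key by key; any other value is replaced outright.
    `defaults` itself is left unchanged.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_net_cfg(defaults, path="config/net_cfg.json"):
    """Apply the JSON file at `path` on top of `defaults` (hypothetical helper)."""
    with open(path) as f:
        return merge_config(defaults, json.load(f))


# Made-up keys, for illustration only:
defaults = {"dropout": 0.1, "blocks": {"depth": 6, "channels": 32}}
print(merge_config(defaults, json.loads('{"blocks": {"depth": 4}}')))
# {'dropout': 0.1, 'blocks': {'depth': 4, 'channels': 32}}
```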

Comments
  • Missing required arguments ?

    Hello, thanks for this great work and dataset. When I tried to run the code, I got the error below:

    Traceback (most recent call last):
      File "Desktop/Acappella-YNet/run.py", line 27, in
        iter_param, model, model_kwargs = VnBSS.ModelConstructor(
      File "/Desktop/Acappella-YNet/VnBSS/models/__init__.py", line 134, in build
        return self._build_dev()
      File "/Desktop/Acappella-YNet/VnBSS/models/__init__.py", line 140, in _build_dev
        model = constructor(**self.common_kwargs)
    TypeError: __init__() missing 6 required keyword-only arguments: 'remix_input', 'remix_coef', 'video_enabled', 'llcp_enabled', 'skeleton_enabled', and 'activation'

    opened by Enescigdem 9
  • Metrics

    Hi,

    I ran run.py and it gives me some metrics like sdr/ds, etc. Are these metrics equal to the ones without ds? I searched for the term ds in the code but couldn't find what it means exactly. For example, I got sdr/ds=13.4 in my training; can I say sdr=13.4?

    opened by EmreOzkose 2
  • There is no dataset config ?

    Hi, the README.md says "Download the code and set your dataset paths at config>dataset_paths.json", but there is no such file in the code directory, and the format of that JSON file is not described. Thanks in advance.

    opened by Enescigdem 2
  • UnboundLocalError: local variable 'warped_img' referenced before assignment

    Hi, thanks for releasing your code and dataset. I encountered this error when executing preprocess.py:

    
    Exception Handled:  'NoneType' object is not iterable
      0%|                                                                                                                                           |
     samples processed:   0%|
    Traceback (most recent call last):
      File "preprocess.py", line 314, in <module>
        mean_face)
      File "preprocess.py", line 254, in process_samples
        del warped_img, landmarks, bbox, vid, stacked_frames, stacked_landmarks
    UnboundLocalError: local variable 'warped_img' referenced before assignment
    

    The error log says that 'warped_img' isn't assigned, so I checked 'warped_img' and the error-handling code.

    At line 231, in the preprocess_sample function:

    with tqdm(total=num_frames) as pbar:
        for num_frame in range(num_frames):
            try:
                img_raw = vid.get_data(num_frame)
            except IndexError as e:
                print("Processing FAILED for sample: " + sample_id)
                break
            if img_raw.shape[1] >= MAX_IMAGE_WIDTH:
                asp_ratio = img_raw.shape[0] / img_raw.shape[1]
                dim = (MAX_IMAGE_WIDTH, int(MAX_IMAGE_WIDTH * asp_ratio))
                new_img = cv2.resize(img_raw, dim, interpolation=cv2.INTER_AREA)
                img = np.asarray(new_img)
            else:
                img = img_raw
            try:
                warped_img, landmarks, bbox = fp.process_image(img)
                _, aligned_landmarks, _ = fp.process_image(warped_img)
                good_frame_ids.append(num_frame)
            except Exception as e:
                print("Exception Handled: ", e)
                continue
    

    I think the face detector can't find a face, but I don't know why. If you could give me some advice, I would really appreciate it.

    opened by yddr 2
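In the preprocess.py report, the log is consistent with fp.process_image returning None: unpacking None raises TypeError, the broad except clause swallows it, and if no frame ever gets through, warped_img is never bound, so the later del fails with UnboundLocalError. A minimal sketch of a guard, assuming process_image returns either a (warped_img, landmarks, bbox) tuple or None; process_frames and the stand-in detector are hypothetical, and this is not the maintainers' fix.

```python
def process_frames(frames, process_image):
    """Mirror of the frame loop in preprocess.py with the cleanup made safe.

    process_image stands in for fp.process_image and is assumed to return
    (warped_img, landmarks, bbox), or None when no face is found.
    """
    # Bind the names before the loop so the final `del` works even if
    # every frame fails.
    warped_img = landmarks = bbox = None
    good_frame_ids = []
    for num_frame, img in enumerate(frames):
        try:
            # Unpacking None raises TypeError, which lands in the except below.
            warped_img, landmarks, bbox = process_image(img)
            _, aligned_landmarks, _ = process_image(warped_img)
            good_frame_ids.append(num_frame)
        except Exception as e:
            print("Exception Handled: ", e)
            continue
    del warped_img, landmarks, bbox  # no longer raises UnboundLocalError
    if not good_frame_ids:
        print("No face found in any frame; skipping sample")
    return good_frame_ids
```

Checking for an empty good_frame_ids lets the caller skip such a sample explicitly instead of crashing; why the detector finds no face in the first place (resolution, crop, lighting) still has to be investigated separately.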
Owner
Juan F. Montesinos
PhD student at Pompeu Fabra University, Barcelona