
ReTiSAR

Implementation of the Real-Time Spherical Microphone Renderer for binaural reproduction in Python [1][2].
Mentioned in Awesome Python for Scientific Audio



Contents: | Requirements | Setup | Quickstart | Execution parameters | Execution modes | Remote Control | Validation - Setup and Execution | Benchmark - Setup and Execution | References | Change Log | Contributing | Credits | License |


Requirements

  • macOS (tested on 10.14 Mojave and 10.15 Catalina) or Linux (tested on 5.9.1-1-rt19-MANJARO)
    (Windows is not supported due to an incompatibility with the current multiprocessing implementation)
  • JACK library (prebuilt installers / binaries are available)
  • Conda installation (miniconda is sufficient; provides an easy way to get an Intel MKL or alternatively OpenBLAS optimized numpy version, which is highly recommended)
  • Python installation (tested with 3.7 to 3.9; recommended way to get Python is to use Conda as described in the setup section)
  • Installation of the required Python packages (recommended way is to use Conda as described in the setup section)
  • Optional: Download of publicly available measurement data for alternative execution modes (always check the command line output or log files in case the rendering pipeline does not initialize successfully!)
  • Optional: Install an OSC client for real-time feedback and remote control options during runtime

Setup

  • Clone repository with command line or any other git client:
    git clone https://github.com/AppliedAcousticsChalmers/ReTiSAR.git
    • Alternative: Download and extract a snapshot manually from the provided URL (not recommended, since you will not be able to pull updates)
    • Alternative: Update your local copy with changes from the repository (if you have cloned it in the past):
      git pull
  • Navigate into the repository (the directory containing setup.py):
    cd ReTiSAR/
  • Install the required Python packages; using Conda is recommended:
    • Make sure that Conda is up to date:
      conda update conda
    • Create new Conda environment from the specified requirements (--force to overwrite potentially existing outdated environment):
      conda env create --file environment.yml --force
    • Activate created Conda environment:
      source activate ReTiSAR

Quickstart

  • Follow requirements and setup instructions
  • During the first execution, a small amount of mandatory additional external measurement data will be downloaded automatically, see the remark in execution modes (requires an Internet connection)
  • Start JACK server with desired sampling rate (all demo configurations are in 48 kHz):
    jackd -d coreaudio -r 48000 [macOS]
    jackd -d alsa -r 48000 [Linux]
    Remark: Check e.g. the jackd -d coreaudio -d -l command to specify the audio interface that should be used!
  • Run package with [default] parameters to hear a binaural rendering of a raw Eigenmike recording:
    python -m ReTiSAR
  • Option 1: Modify the configuration by changing the default parameters in config.py (prepared block comments for the specific execution modes below exist).
  • Option 2: Modify the configuration by command line arguments (like in the following examples showing different execution parameters and modes, see --help).

JACK initialization — In case you have never started the JACK audio server on your system, or want to make sure it initializes with appropriate values, open the JackPilot application and set your system-specific default settings.
At this point, the only relevant JACK audio server setting is the sampling frequency, which has to match the sampling frequency of your rendered audio source file or stream (no resampling will be applied for that specific file).
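Since no resampling is applied to the source file, it can help to verify its sampling frequency before starting the JACK server with a matching rate. A minimal sketch using only the Python standard library (the probe file below is purely illustrative):

```python
import wave

def wav_sample_rate(path):
    """Return the sampling frequency of a WAV file in Hz."""
    with wave.open(path, "rb") as f:
        return f.getframerate()

# For illustration, write a short 48 kHz mono file and read its rate back
# (the file name "probe_48k.wav" is arbitrary).
with wave.open("probe_48k.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)                  # 16 bit samples
    f.setframerate(48000)
    f.writeframes(b"\x00\x00" * 480)   # 10 ms of silence

assert wav_sample_rate("probe_48k.wav") == 48000
```

The same check applies to any source file passed via -s; the JACK server must then be started with the rate the check reports.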

FFTW optimization — In case the rendering takes very long to start (after the message "initializing FFTW DFT optimization ..."), you might want to endure this long computation time once (per rendering configuration) or lower your FFTW planner effort (see --help).

Rendering performance — Consider the following remarks to achieve continuous and artifact-free rendering:

  • Optional components like array pre-rendering, headphone equalization, noise generation, etc. save processing power when they are not deployed.
  • Extended IR lengths (particularly for modes with an array IR pre-rendering) will massively increase the computational load depending on the chosen block size (partitioned convolution).
  • Currently, there is no partitioned convolution for the main binaural renderer with SH based processing, hence the FIR taps of the applied HRIR, Modal Radial Filters and further compensations (e.g. Spherical Head Filter) need to cumulatively fit inside the chosen block size.
  • Higher block size means lower computational load in real-time rendering but also increased system latency, most relevant for modes with array live-stream rendering, but also all other modes in terms of a slightly "smeared" head-tracking experience (noticeable at 4096 samples).
  • Adjust output levels of all rendering components (default parameters chosen accordingly) to prevent signal clipping (indicated by warning messages during execution).
  • Check JACK system load (e.g. JackPilot or OSC_Remote_Demo.pd) to be below approx. 95% load, in order to prevent dropouts (i.e. the OS reported overall system load is not a good indicator).
  • Check JACK detected dropouts ("xruns" indicated during execution).
  • Most of all, use your ears! If something sounds strange, there is probably something going wrong... ;)

Always check the command line output or generated log files in case the rendering pipeline does not initialize successfully!
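The latency contribution of the block size can be estimated directly from the sampling rate: at 48 kHz, one block of 4096 samples already amounts to roughly 85 ms per buffering stage, which is consistent with the "smeared" head-tracking impression mentioned above. A small helper to compare the available block sizes:

```python
def block_latency_ms(block_size, fs=48000):
    """Duration of one processing block in milliseconds."""
    return 1000 * block_size / fs

for b in (256, 1024, 4096):
    print(f"{b:5d} samples -> {block_latency_ms(b):6.2f} ms per block")
# 4096 samples correspond to ~85.33 ms per block at 48 kHz.
```

This is only the per-block duration; the effective round-trip latency additionally depends on the JACK buffer configuration and the audio interface.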

Execution parameters

The following parameters are all optional and can be combined with the execution modes named subsequently:

  • Run with a specific processing block size (choose the value according to the individual rendering configuration and performance of your system)
    • The largest block size (the best performance but noticeable input latency):
      python -m ReTiSAR -b=4096 [default]
    • Try smaller block sizes according to the specific rendering configuration and individual system performance:
      python -m ReTiSAR -b=1024
      python -m ReTiSAR -b=256
  • Run with a specific processing word length
    • Single precision 32 bit (better performance):
      python -m ReTiSAR -SP=TRUE [default]
    • Double precision 64 bit (no configuration with an actual benefit is known):
      python -m ReTiSAR -SP=FALSE
  • Run with a specific IR truncation cutoff level (applied to all IRs)
    • Cutoff -60 dB under peak (better performance and perceptually irrelevant in most cases):
      python -m ReTiSAR -irt=-60 [default]
    • No cutoff to render the entire IR (this constitutes tough performance requirements in the case of rendering array IRs with long reverberation):
      python -m ReTiSAR -irt=0 [applied in all scientific evaluations]
  • Run with a specific head-tracking device (paths are system dependent!)
    • No tracking (head movement can be remote controlled):
      python -m ReTiSAR -tt=NONE [default]
    • Automatic rotation:
      python -m ReTiSAR -tt=AUTO_ROTATE
    • Tracker Razor AHRS:
      python -m ReTiSAR -tt=RAZOR_AHRS -t=/dev/tty.usbserial-AH03F9XC
    • Tracker Polhemus Patriot:
      python -m ReTiSAR -tt=POLHEMUS_PATRIOT -t=/dev/tty.UC-232AC
    • Tracker Polhemus Fastrack:
      python -m ReTiSAR -tt=POLHEMUS_FASTRACK -t=/dev/tty.UC-232AC
  • Run with a specific HRTF dataset as MIRO [6] or SOFA [7] files
    • Neumann KU100 artificial head from [6] as SOFA:
      python -m ReTiSAR -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA [default]
    • Neumann KU100 artificial head from [6] as MIRO:
      python -m ReTiSAR -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir_struct.mat -hrt=HRIR_MIRO
    • Neumann KU100 artificial head from [8] as SOFA:
      python -m ReTiSAR -hr=res/HRIR/KU100_SADIE2/48k_24bit_256tap_8802dir.sofa -hrt=HRIR_SOFA
    • GRAS KEMAR artificial head from [8] as SOFA:
      python -m ReTiSAR -hr=res/HRIR/KEMAR_SADIE2/48k_24bit_256tap_8802dir.sofa -hrt=HRIR_SOFA
    • FABIAN artificial head from [9] as SOFA:
      python -m ReTiSAR -hr=res/HRIR/FABIAN_TUB/44k_32bit_256tap_11950dir_HATO_0.sofa -hrt=HRIR_SOFA
    • Employ an arbitrary (artificial or individual) dataset by providing a relative / absolute path!
    • The length of the employed HRIR dataset constrains the minimum usable rendering block size!
    • Mismatched IRs with a sampling frequency different to the source material will be resampled!
  • Run with specific headphone equalization / compensation filters (arbitrary filter length). The compensation filter should match the headphone model in use, or even the individual headphone. Ideally, the filter was also measured on the same (artificial or individual) head as the employed HRIR.
    • No individual headphone compensation:
      python -m ReTiSAR -hp=NONE [default]
    • Beyerdynamic DT990 headphone on Neumann KU100 artificial head from [8]:
      python -m ReTiSAR -hp=res/HPCF/KU100_SADIE2/48k_24bit_1024tap_Beyerdynamic_DT990.wav
    • Beyerdynamic DT990 headphone on GRAS KEMAR artificial head from [8]:
      python -m ReTiSAR -hp=res/HPCF/KEMAR_SADIE2/48k_24bit_1024tap_Beyerdynamic_DT990.wav
    • AKG K701 headphone on FABIAN artificial head from [9]:
      python -m ReTiSAR -hp=res/HPCF/FABIAN_TUB/44k_32bit_4096tap_AKG_K701.wav
    • Sennheiser HD800 headphone on FABIAN artificial head from [9]:
      python -m ReTiSAR -hp=res/HPCF/FABIAN_TUB/44k_32bit_4096tap_Sennheiser_HD800.wav
    • Sennheiser HD600 headphone on GRAS KEMAR artificial head from TU Rostock:
      python -m ReTiSAR -hp=res/HPCF/KEMAR_TUR/44k_24bit_2048tap_Sennheiser_HD600.wav
    • Check the res/HPCF/. directory for numerous other headphone models or employ arbitrary (artificial or individual) compensation filters by providing a relative / absolute path!
    • Mismatched IRs with a sampling frequency different to the source material will be resampled!
  • Run with specific SH processing compensation techniques (relevant for rendering modes utilizing spherical harmonics)
    • Modal Radial Filters [always applied] with an individual amplification soft-limiting in dB according to [3]:
      python -m ReTiSAR -arr=18 [default]
    • Spherical Head Filter according to [4]:
      python -m ReTiSAR -sht=SHF
    • Spherical Harmonics Tapering in combination with the Spherical Head Filter according to [5]:
      python -m ReTiSAR -sht=SHT+SHF [default]
  • Run with some specific emulated self-noise as additive component to each microphone array sensor (the performance requirements increase according to channel count)
    • No noise (yielding the best performance):
      python -m ReTiSAR -gt=NONE [default]
    • White noise (also setting the initial output level and mute state of the rendering component):
      python -m ReTiSAR -gt=NOISE_WHITE -gl=-30 -gm=FALSE
    • Pink noise by IIR filtering (higher performance requirements):
      python -m ReTiSAR -gt=NOISE_IIR_PINK -gl=-30 -gm=FALSE
    • Eigenmike noise coloration by IIR filtering from [10]:
      python -m ReTiSAR -gt=NOISE_IIR_EIGENMIKE -gl=-30 -gm=FALSE
  • For further configuration parameters, check Alternative 1 and Alternative 2 above!
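The effect of the -irt parameter can be illustrated with a synthetic impulse response: the IR is truncated after the last sample whose magnitude still exceeds the given level relative to the peak. A simplified sketch of this idea (not the package's actual implementation):

```python
def truncate_ir(ir, cutoff_db):
    """Truncate an IR after the last sample above `cutoff_db` relative to its peak.

    A cutoff of 0 dB disables truncation, i.e. the full IR is kept.
    """
    if cutoff_db >= 0:
        return list(ir)
    peak = max(abs(x) for x in ir)
    threshold = peak * 10 ** (cutoff_db / 20)
    last = max(i for i, x in enumerate(ir) if abs(x) >= threshold)
    return list(ir[: last + 1])

# Exponentially decaying synthetic IR: 1.0, 0.5, 0.25, ...
ir = [0.5 ** n for n in range(32)]
short = truncate_ir(ir, -60)  # keeps samples down to 1/1000 of the peak
full = truncate_ir(ir, 0)     # keeps all 32 samples
```

For measured array IRs with long reverberation tails, such a cutoff shortens the convolution considerably, which is why -irt=-60 is the performance-friendly default.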

Execution modes

This section lists all the conceptually different rendering modes of the pipeline. Most of the execution parameters introduced above can be combined with the mode-specific parameters. In case no value is provided manually for a specific rendering parameter (as in the following examples), its respective default value will be used.

Most execution modes require additional external measurement data, which cannot be republished here. However, all provided examples are based on publicly available research data. Respective files are represented here by provided source reference files (see res/), containing a source URL and potentially further instructions. In case the respective resource data file is not yet available on your system, download instructions will be shown in the command line output and generated log files.

  • Run as array recording renderer

    • Eigenmike at Chalmers lab space with speaker moving horizontally around the array:
      python -m ReTiSAR -sh=4 -tt=NONE -s=res/record/EM32ch_lab_voice_around.wav -ar=res/ARIR/RT_calib_EM32ch_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA [default]
    • Eigenmike at Chalmers lab space with speaker moving vertically in front of the array:
      python -m ReTiSAR -sh=4 -tt=NONE -s=res/record/EM32ch_lab_voice_updown.wav -ar=res/ARIR/RT_calib_EM32ch_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • Zylia ZM-1 at TH Cologne office (recording file not provided!):
      python -m ReTiSAR -b=512 -sh=3 -tt=NONE -s=res/record/ZY19_off_around.wav -sl=9 -ar=res/ARIR/RT_calib_ZY19_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • HØSMA-7N at TH Cologne lecture hall (recording file not provided!):
      python -m ReTiSAR -b=2048 -sh=7 -tt=NONE -s=res/record/HOS64_hall_lecture.wav -sp="[(90,0)]" -sl=9 -ar=res/ARIR/RT_calib_HOS64_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
  • Run as array live-stream renderer with minimum latency (e.g. Eigenmike with the respective channel calibration provided by the manufacturer)

    • Eigenmike Chalmers EM32 (SN 28):
      python -m ReTiSAR -b=512 -sh=4 -tt=NONE -s=None -ar=res/ARIR/RT_calib_EM32ch_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • Eigenmike Facebook Reality Labs EM32 (SN ??):
      python -m ReTiSAR -b=512 -sh=4 -tt=NONE -s=None -ar=res/ARIR/RT_calib_EM32frl_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • Zylia ZM-1:
      python -m ReTiSAR -b=512 -sh=3 -tt=NONE -s=None -ar=res/ARIR/RT_calib_ZY19_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • TH Cologne HØSMA-7N:
      python -m ReTiSAR -b=2048 -sh=7 -tt=NONE -s=None -ar=res/ARIR/RT_calib_HOS64_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
  • Run as array IR renderer, e.g. Eigenmike

    • Simulated plane wave:
      python -m ReTiSAR -sh=4 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_sim_EM32_PW_struct.mat -art=ARIR_MIRO -arl=-6 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • Anechoic measurement:
      python -m ReTiSAR -sh=4 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_anec_EM32ch_S_struct.mat -art=ARIR_MIRO -arl=0 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
  • Run as array IR renderer, e.g. sequential VSA measurements from [11] at the maximum respective SH order (different room, source positions and array configurations are available in res/ARIR/)

    • 50ch (sh5), LBS center:
      python -m ReTiSAR -sh=5 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_LBS_VSA_50RS_PAC.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • 86ch (sh7), SBS center:
      python -m ReTiSAR -sh=7 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_SBS_VSA_86RS_PAC.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • 110ch (sh8), CR1 left:
      python -m ReTiSAR -sh=8 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -sp="[(-37,0)]" -ar=res/ARIR/DRIR_CR1_VSA_110RS_L.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • 194ch (sh11, open sphere, cardioid microphones), LBS center:
      python -m ReTiSAR -sh=11 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_LBS_VSA_194OSC_PAC.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA
    • 1202ch (truncated sh12), CR7 left:
      python -m ReTiSAR -sh=12 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -sp="[(-37,0)]" -ar=res/ARIR/DRIR_CR7_VSA_1202RS_L.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/48k_32bit_128tap_2702dir.sofa -hrt=HRIR_SOFA

Note that the rendering performance is mostly determined by the chosen combination of the following parameters: number of microphones (ARIR channels), room reverberation time (ARIR length), IR truncation cutoff level and rendering block size.
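The -sh orders in the configurations above are tied to the channel counts of the respective arrays: processing at SH order N uses (N + 1)² coefficients, which must not exceed the number of microphones. (The usable order additionally depends on the sampling grid, which is why some of the measured sets above specify a lower order than this bound allows.) A quick sanity check:

```python
def max_sh_order(num_channels):
    """Highest SH order N satisfying (N + 1)**2 <= num_channels."""
    n = 0
    while (n + 2) ** 2 <= num_channels:
        n += 1
    return n

assert max_sh_order(32) == 4   # Eigenmike EM32 -> -sh=4
assert max_sh_order(19) == 3   # Zylia ZM-1 -> -sh=3
assert max_sh_order(64) == 7   # HØSMA-7N -> -sh=7
```
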

  • Run as BRIR renderer (partitioned convolution in frequency domain) for any BRIR compatible to the SoundScape Renderer, e.g. pre-processed array IRs by [12]:
    python -m ReTiSAR -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -art=NONE -hr=res/HRIR/KU100_THK/BRIR_CR1_VSA_110RS_L_SSR_SFA_-37_SOFA_RFI.wav -hrt=BRIR_SSR -hrl=-12
  • Run as "binauralizer" for an arbitrary number of virtual sound sources via HRTF (partitioned convolution in frequency domain) for any HRIR compatible to the SoundScape Renderer:
    python -m ReTiSAR -tt=AUTO_ROTATE -s=res/source/PinkMartini_Lilly_44.wav -sp="[(30, 0),(-30, 0)]" -art=NONE -hr=res/HRIR/FABIAN_TUB/hrirs_fabian.wav -hrt=HRIR_SSR (provide respective source file and source positions!)
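The -sp argument used above passes source positions as a Python-style list of (azimuth, elevation) tuples in degrees. Such a string can be parsed safely with the standard library; this is a hedged sketch, not necessarily the package's own parsing code:

```python
import ast

def parse_positions(sp):
    """Parse an '-sp' style string like "[(30, 0), (-30, 0)]" into float tuples."""
    positions = ast.literal_eval(sp)  # safe evaluation of literals only
    return [(float(az), float(el)) for az, el in positions]

pos = parse_positions("[(30, 0), (-30, 0)]")  # two sources at +/-30 deg azimuth
```
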

Remote Control

  • During runtime, certain parameters of the application can be remote controlled via Open Sound Control. Individual clients can be accessed by targeting them with specific OSC commands on port 5005 [default].
    Depending on the current configuration and rendering mode, different commands are available, i.e., arbitrary combinations of the following targets and values:
    /generator/volume 0, /generator/volume -12 (set any client output volume in dBFS),
    /prerenderer/mute 1, /prerenderer/mute 0, /prerenderer/mute -1, /prerenderer/mute (set/toggle any client mute state),
    /hpeq/passthrough true, /hpeq/passthrough false, /hpeq/passthrough toggle (set/toggle any client passthrough state)
  • For all commands, the target name is derived from the individual JACK client name. The order of target client and command can be swapped, and further commands might be available:
    /renderer/crossfade, /crossfade/renderer (set/toggle the crossfade state),
    /renderer/delay 350.0 (set an additional input delay in ms),
    /renderer/order 0, /renderer/order 4 (set the SH rendering order),
    /tracker/zero (calibrate the tracker), /tracker/azimuth 45 (set the tracker orientation),
    /player/stop, /player/play, /quit (quit all rendering components)
  • During runtime, individual JACK clients with their respective "target" name also report real-time feedback or analysis data on port 5006 [default] in the exemplary data format specified below (the number of values depends on the output ports), i.e., arbitrary combinations of the name and parameters:
    /player/rms 0.0, /generator/peak 0.0 0.0 0.0 0.0 (the current audio output metrics),
    /renderer/load 100 (the current client load),
    /tracker/AzimElevTilt 0.0 0.0 0.0 (the current head orientation),
    /load 100 (the current JACK system load)
  • The package includes an example remote control client implemented for "vanilla" PD; see further instructions in OSC_Remote_Demo.pd.
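Any OSC client can send these commands, e.g. the python-osc package provides a simple UDP client. For illustration, a minimal OSC message can even be assembled with the standard library alone; the sketch below only covers int32 arguments and assumes the default port 5005:

```python
import socket
import struct

def osc_message(address, *int_args):
    """Encode a minimal OSC message carrying only int32 arguments."""
    def pad(b):
        # OSC strings are null-terminated and padded to a multiple of 4 bytes.
        return b + b"\x00" * (4 - len(b) % 4)
    msg = pad(address.encode("ascii")) + pad(b"," + b"i" * len(int_args))
    for arg in int_args:
        msg += struct.pack(">i", arg)  # big-endian int32
    return msg

packet = osc_message("/prerenderer/mute", 1)
# Sending is a single UDP datagram, e.g.:
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, ("127.0.0.1", 5005))
```
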

Validation - Setup and Execution

  • Download and build the required ecasound library for signal playback and capture with JACK support:
    in the source directory, run ./configure, make and sudo make install while having JACK installed
  • Optional: Install sendosc tool to be used for automation in shell scripts:
    brew install yoggy/tap/sendosc
  • Remark: Make sure all subsequent rendering configurations are able to start up properly before recording starts (particularly FFTW optimization might take a long time, see above)
  • Validate impulse responses by comparing against a reference implementation, in this case the output of sound_field_analysis-py [12]
    • Execute recording script, consecutively starting the package and capturing impulse responses in different rendering configurations:
      ./res/research/validation/record_ir.sh
      Remark: Both implementations compensate the source being at an incidence angle of -37 degrees in the measurement IR set
    • Run package in validation mode, executing a comparison of all beforehand captured IRs in res/research/validation/ against the provided reference IRs:
      python -m ReTiSAR --VALIDATION_MODE=res/HRIR/KU100_THK/BRIR_CR1_VSA_110RS_L_SSR_SFA_-37_SOFA_RFI.wav
  • Validate signal-to-noise-ratio by comparing input and output signals of the main binaural renderer for wanted target signals and emulated sensor self-noise respectively
    • Execute recording script consecutively starting the package and capturing target-noise as well as self-noise input and output signals in different rendering configurations:
      ./res/research/validation/record_snr.sh
    • Open (and run) MATLAB analysis script to execute an SNR comparison of beforehand captured signals:
      open ./res/research/validation/calculate_snr.m
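The SNR comparison ultimately reduces to a power ratio in dB between the captured target and noise signals. The same computation, sketched on synthetic data (this is not the analysis script itself):

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from two sample sequences."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

sig = [1.0, -1.0] * 100  # full-scale square wave, mean power 1.0
noi = [0.1, -0.1] * 100  # noise at a tenth of the amplitude, mean power 0.01
# The amplitude ratio of 10 corresponds to an SNR of 20 dB.
```
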

Benchmark - Setup and Execution

  • Install additionally required Python packages into Conda environment:
    conda env update --file environment_dev.yml
  • Run the JACK server with arbitrary sampling rate via JackPilot or in a new command line window ([CMD]+[T]):
    jackd -d coreaudio
  • Run in benchmark mode, instantiating one rendering JACK client with as many convolver instances as possible (40-60 minutes):
    python -m ReTiSAR --BENCHMARK_MODE=PARALLEL_CONVOLVERS
  • Run in benchmark mode, instantiating as many rendering JACK clients as possible with one convolver instance (10-15 minutes):
    python -m ReTiSAR --BENCHMARK_MODE=PARALLEL_CLIENTS
  • Find generated results in the specified files at the end of the script.

References

[1] H. Helmholz, C. Andersson, and J. Ahrens, “Real-Time Implementation of Binaural Rendering of High-Order Spherical Microphone Array Signals,” in Fortschritte der Akustik -- DAGA 2019, 2019, pp. 1462–1465.
[2] H. Helmholz, T. Lübeck, J. Ahrens, S. V. A. Garí, D. Lou Alon, and R. Mehra, “Updates on the Real-Time Spherical Array Renderer (ReTiSAR),” in Fortschritte der Akustik -- DAGA 2020, 2020, pp. 1169–1172.
[3] B. Bernschütz, C. Pörschmann, S. Spors, and S. Weinzierl, “SOFiA Sound Field Analysis Toolbox,” in International Conference on Spatial Audio, 2011, pp. 7–15.
[4] C. Hold, H. Gamper, V. Pulkki, N. Raghuvanshi, and I. J. Tashev, “Improving Binaural Ambisonics Decoding by Spherical Harmonics Domain Tapering and Coloration Compensation,” in International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 261–265, doi: 10.1109/ICASSP.2019.8683751.
[5] Z. Ben-Hur, F. Brinkmann, J. Sheaffer, S. Weinzierl, and B. Rafaely, “Spectral equalization in binaural signals represented by order-truncated spherical harmonics,” J. Acoust. Soc. Am., vol. 141, no. 6, pp. 4087–4096, 2017, doi: 10.1121/1.4983652.
[6] B. Bernschütz, “A spherical far field HRIR/HRTF compilation of the Neumann KU 100,” in Fortschritte der Akustik -- AIA/DAGA 2013, 2013, pp. 592–595.
[7] P. Majdak et al., “Spatially Oriented Format for Acoustics: A Data Exchange Format Representing Head-Related Transfer Functions,” in AES Convention 134, 2013, pp. 262–272.
[8] C. Armstrong, L. Thresh, D. Murphy, and G. Kearney, “A Perceptual Evaluation of Individual and Non-Individual HRTFs: A Case Study of the SADIE II Database,” Appl. Sci., vol. 8, no. 11, pp. 1–21, 2018, doi: 10.3390/app8112029.
[9] F. Brinkmann et al., “The FABIAN head-related transfer function data base.” Technische Universität Berlin, Berlin, Germany, 2017, doi: 10.14279/depositonce-5718.5.
[10] H. Helmholz, D. Lou Alon, S. V. A. Garí, and J. Ahrens, “Instrumental Evaluation of Sensor Self-Noise in Binaural Rendering of Spherical Microphone Array Signals,” in Forum Acusticum, 2020, pp. 1349–1356, doi: 10.48465/fa.2020.0074.
[11] P. Stade, B. Bernschütz, and M. Rühl, “A Spatial Audio Impulse Response Compilation Captured at the WDR Broadcast Studios,” in 27th Tonmeistertagung -- VDT International Convention, 2012, pp. 551–567.
[12] C. Hohnerlein and J. Ahrens, “Spherical Microphone Array Processing in Python with the sound_field_analysis-py Toolbox,” in Fortschritte der Akustik -- DAGA 2017, 2017, pp. 1033–1036.
[13] H. Helmholz, J. Ahrens, D. Lou Alon, S. V. A. Garí, and R. Mehra, “Evaluation of Sensor Self-Noise In Binaural Rendering of Spherical Microphone Array Signals,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 161–165, doi: 10.1109/ICASSP40776.2020.9054434.

Change Log

  • v2021.03.30

    • Addition of Zylia ZM-1 array example recording and live-stream configurations
  • v2020.11.23

    • Consolidation of Python 3.9 compatibility
    • Consolidation of Linux compatibility (no modifications were required; tested with Jack 1.9.16 on kernel 5.9.1-1-rt19-MANJARO)
  • v2020.10.21

    • Improvement of determining the JACK and/or OS specific client name length limitation of JackClient
  • v2020.10.15 (v2020.FA)

    • Addition of references to data set for Forum Acusticum [10] publication
  • v2020.9.10

    • Enforcement of Black >= 20.8b1 code style
  • v2020.8.20

    • Extension of JackPlayer to make auto-play behaviour configurable (via config parameter or command line argument)
  • v2020.8.13

    • Update of OSC Remote Demo to reset (after not receiving data) and clip displayed RMS values
    • Improvement of FFTW wisdom verification to be more error proof
  • v2020.7.16

    • Addition of experimental SECTORIAL_DEGREE_SELECTION and EQUATORIAL_DEGREE_SELECTION SH weighting techniques (partial elimination of HRIR elevation cues)
    • Update of plotting during and after application of SH compensation techniques
  • v2020.7.7

    • Addition of HRIR and HPCF source files to SADIE II database [8]
    • Extension of DataRetriever to automatically extract requested resources from downloaded *.zip archives
  • v2020.7.4

    • Introduction of FFTW wisdom file signature verification (in order to update any already accumulated wisdom run with --PYFFTW_LEGACY_FILE=log/pyfftw_wisdom.bin once)
    • Fixes for further SonarLint security and code style recommendations
  • v2020.7.1

    • Update and addition of further WDR Cologne ARIR source files (linking to Zenodo data set)
    • Hack for Modal Radial Filters generation in open / cardioid SMA configurations (unfortunately this metadata is not directly available in the SOFA ARIR files)
  • v2020.4.8

    • Improvement of IIR pink noise generation (continuous utilization of internal filter delay conditions)
    • Improvement of IIR pink noise generation (employment of SOS instead of BA coefficients)
    • Addition of IIR Eigenmike coloration noise generation according to [10]
  • v2020.4.3

    • Improvement of white noise generation (vastly improved performance due to numpy SFC64 generator)
    • Enabling of JackGenerator (and derivatives) to operate in single precision for improved performance
  • v2020.3.3

    • Addition of further simulated array data sets
  • v2020.2.24

    • Consolidation of Python 3.8 compatibility
    • Introduction of multiprocessing context for compatibility
    • Enforcement of Black code style
  • v2020.2.14

    • Addition of TH Cologne HØSMA-7N array configuration
  • v2020.2.10

    • Addition of project community information (contributing, code of conduct, issue templates)
  • v2020.2.7

    • Extension of DataRetriever to automatically download data files
    • Addition of missing ignored project resources
  • v2020.2.2

    • Change of default rendering configuration to contained Eigenmike recording
    • Update of README structure (including Quickstart section)
  • v2020.1.30

    • First publication of code
  • Pre-release (v2020.ICASSP)

    • Contains the older original code state for the ICASSP [13] publication
  • Pre-release (v2019.DAGA)

    • Contains the older original code state for the initial DAGA [1] publication

Contributing

See CONTRIBUTING for full details.

Credits

Written by Hannes Helmholz.

Scientific supervision by Jens Ahrens.

Contributions by Carl Andersson and Tim Lübeck.

This work was funded by Facebook Reality Labs.

License

This software is licensed under a Non-Commercial Software License (see LICENSE for full details).

Copyright (c) 2018
Division of Applied Acoustics
Chalmers University of Technology

Releases
  • v2021.TASLP(Jan 9, 2022)

    H. Helmholz, D. Lou Alon, S. V. Amengual Garí, and J. Ahrens, “Effects of Additive Noise in Binaural Rendering of Spherical Microphone Array Signals,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 29, pp. 3642–3653, 2021, doi: 10.1109/TASLP.2021.3129359.

    The code, data structures and included resources are directly based on the continuously developed master branch around the time of this publication.

    The procedures utilized to capture rendered ear signals, the resulting raw data, the scripts realizing the Composite Loudness Levels analysis, the intermediate ILD, CLL and CLLgd plots, as well as the scripts to generate the respective result plots are available at: http://doi.org/10.5281/zenodo.3842305

  • v2020.FA(Oct 15, 2020)

    H. Helmholz, D. Lou Alon, S. V. Amengual Garí, and J. Ahrens, “Instrumental Evaluation of Sensor Self-Noise in Binaural Rendering of Spherical Microphone Array Signals,” in Forum Acusticum, 2020, pp. 1349–1356, doi: 10.48465/fa.2020.0074.

    The code, data structures and included resources are directly based on the continuously developed master branch around the time of this publication.

    The methods for the instrumental SNR evaluation, the gathered results and corresponding visualizations, as well as the mh acoustics Eigenmike 32 self-noise measurement scripts, recorded results, and corresponding visualizations utilized in the publication are available at: http://doi.org/10.5281/zenodo.3711626

- v2020.ICASSP (May 6, 2020)

    H. Helmholz, J. Ahrens, D. Lou Alon, S. V. Amengual Garí, and R. Mehra, “Evaluation of Sensor Self-Noise In Binaural Rendering of Spherical Microphone Array Signals,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 161–165, doi: 10.1109/ICASSP40776.2020.9054434.

    Note that this release represents an early state of the project at the time of publication; the code base, data structures, and included resources may differ considerably from current releases. Accordingly, the original project structure is preserved here for reference.

    The methods for the instrumental evaluation, the gathered results, and the corresponding visualizations utilized in the publication are available at: http://doi.org/10.5281/zenodo.3661422

- v2019.DAGA (May 6, 2020)

    H. Helmholz, C. Andersson, and J. Ahrens, “Real-Time Implementation of Binaural Rendering of High-Order Spherical Microphone Array Signals,” in Fortschritte der Akustik -- DAGA 2019, 2019, pp. 1462–1465.

    Note that this release represents an early state of the project at the time of publication; the code base, data structures, and included resources may differ considerably from current releases. Accordingly, the original project structure is preserved here for reference.

    The methods for the instrumental evaluation, the gathered results, and the corresponding visualizations utilized in the publication are included here.

Owner

Division of Applied Acoustics at Chalmers University of Technology