where `my_test_config.json` is a test configuration file in JSON format, with the fields explained in the next section.
### 2.2. Test configuration file
This is the main file to edit in order to change global configuration options, detailed below.
*NOTE: Paths specified in the JSON file are relative to the working directory where the script is executed from, NOT the location of the JSON file itself. It is possible (and recommended!) to use absolute paths instead to avoid confusion.*
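One way to follow the recommendation above is to make the path entries absolute before the configuration is used. The following is a minimal sketch, not part of the scripts: the helper name `absolutize_paths` is hypothetical, and it assumes that `input_path` and `output_path` are the path-valued keys of interest.

```python
from pathlib import Path


def absolutize_paths(config, keys=("input_path", "output_path"), base=None):
    """Return a copy of config in which the given path entries are absolute.

    Hypothetical helper: relative paths are interpreted against `base`,
    which defaults to the current working directory (the same rule the
    note above describes for the scripts).
    """
    base = Path(base) if base is not None else Path.cwd()
    resolved = dict(config)
    for key in keys:
        value = resolved.get(key)
        if value is None:
            continue  # key absent or null: nothing to resolve
        path = Path(value)
        if not path.is_absolute():
            path = base / path
        resolved[key] = str(path.resolve())
    return resolved


example = absolutize_paths({"input_path": "./my_items/"})
print(example["input_path"])
```

Resolving paths once, up front, means the result no longer depends on the directory the script is later started from.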
| key | values (example) | default | description |
|---|---|---|---|
| input_path | ./my_items/ | Required | Input directory with *.WAV, *.PCM or *.TXT files to process |
| preproc_input | True/False | False | Whether to execute preprocessing on the input files |
| in_format | HOA3 | Required | Input format for the conditions to generate, see spatial_audio_format |
| in_fs | 32000 | 48000 | Input sampling rate for conditions to generate (assumed to be sampling-rate of input PCM files to process) |
| input_select | ["in", "file2"] | Required | Filenames to filter in the input directory, can be a single value, an array or null. Only compares filenames (therefore "in" in this array would match both "in.wav" and "in.pcm") |
| | | | |
| concatenate_input | True/False | False | Whether to (horizontally) concatenate files in the input directory |
| concat_silence_ms | [1000, 1000] | [0, 0] | Specifies the pre- and post-silence duration to pad concatenation with in ms. If a single value is specified it will be used for BOTH pre- and post-padding |
| preproc_loudness | -26 | | Loudness to preprocess input to (dBov / LKFS depending on tool). Only processed if preproc_input is True. |
| out_fs | 48000 | 48000 | Output sampling rate for conditions to generate |
| output_loudness | -26 | | Loudness level for output file (dBov / LKFS depending on tool). |
| | | | |
| renderer_format | 7_1_4 or CICP19 | Required | Format to be rendered (using offline rendering, will be bypassed if = out_format) |
| binaural_rendered | True/False | False | Extra binauralization of the rendered outputs (using offline rendering) |
| include_LFE | True/False | False | Whether to include LFE in binaural rendering |
| gain_factor | float value | 1.0 | Gain factor to be applied to LFE channel |
| loudness_tool | "sv56demo" | "bs1770demo" | Tool to use for loudness adjustment. Currently only sv56demo and bs1770demo are supported for appropriate format configurations. Optionally can be a path to the binary. |
| | | | |
| lt_mode | "MUSHRA" | | Automatically generates a NAME.ltg file with generate_lt_file.py in output_path according to the specified mode |
| conditions_to_generate | ["ref", "ivas"] | Required | List of conditions to be generated. For ivas and evs, multiple conditions can be specified with an \_ separator (e.g. "ivas_branch", "ivas_trunk" etc.) |
| | | | |
| ref | | | |
| - out_fc | 32000 | 48000 | Cut-off frequency to be applied to the reference condition in postprocessing |
| ivas | | | |
| - bitrates | [16400, 128000] | Required | Bitrate(s) used for IVAS encoder |
| - enc_fs | 48000 | 48000 | Sampling rate for input to the encoder (pre-processing) |
| - max_band | wb, swb, fb etc. | FB | Maximum encoded bandwidth |
| - out_format | 7_1_4 or CICP19 | Required | Output format for IVAS, see spatial_audio_format |
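To make the table above more concrete, the sketch below builds a small configuration in Python and writes it out as JSON. It is an illustration under assumptions: the per-condition options (`bitrates`, `out_format`) are assumed to be nested under a top-level key named after the condition, and the helpers `missing_required` and `normalize_silence_ms` are hypothetical, not functions of the scripts.

```python
import json

# Top-level keys marked "Required" in the table above.
REQUIRED_KEYS = [
    "input_path",
    "in_format",
    "input_select",
    "renderer_format",
    "conditions_to_generate",
]

config = {
    "input_path": "./my_items/",
    "in_format": "HOA3",
    "input_select": None,  # null: no filename filtering
    "renderer_format": "7_1_4",
    "conditions_to_generate": ["ref", "ivas"],
    # Assumption: options of a condition live under a key with its name.
    "ivas": {"bitrates": [16400, 128000], "out_format": "7_1_4"},
}


def missing_required(cfg):
    """List the required top-level keys that are absent from cfg."""
    return [key for key in REQUIRED_KEYS if key not in cfg]


def normalize_silence_ms(value):
    """Expand concat_silence_ms to [pre, post]; a single value is used for both."""
    if isinstance(value, (int, float)):
        return [value, value]
    pre, post = value
    return [pre, post]


print(json.dumps(config, indent=4))
```

Checking for missing required keys before starting a run is cheap and catches typos in key names early.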
| ISM or ISMx | yes/no/no | Objects with metadata, description using renderer metadata |
| MASA or MASAx | yes/no/no | Mono or stereo signals with spatial metadata. NOTE: the metadata file must share the same basename as the waveform file, but with a .met extension |
| FOA/HOA2/HOA3 or PLANAR(FOA/HOAx) | yes/yes/yes | Ambisonic signals or planar ambisonic signals |
| 5_1/5_1_2/5_1_4/7_1/7_1_4 or CICP[6/12/14/16/19] | yes/yes/yes | Multi-channel signals for predefined loudspeaker layout |
| META | yes/yes/no | Audio scene described by a renderer config |
---
### 2.5. Processing
The processing chain is as follows:
1. Preprocessing
    - **Condition**: `preproc_input == true`
    - Input files converted to `in_format`
2. Processing
    - **Condition**: Performed depending on key in `conditions_to_generate`
    - Coding/decoding from `in_format` to `out_format`
3. Postprocessing
    1. Rendering to `renderer_format`
        - **Condition**: `out_format != renderer_format`
        - Output files converted from `out_format` to `renderer_format`
    2. Binaural Rendering
        - **Condition**: `binaural_rendered == true` and `out_format` is not a BINAURAL type
        - Output files converted from `out_format` to `BINAURAL`
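The conditions of the chain above can be sketched as a small decision function. This is an illustration only, not the scripts' implementation: the function `planned_stages` is hypothetical, `out_format` is passed in explicitly, and a "BINAURAL type" is assumed to be any format whose name starts with `BINAURAL`.

```python
def planned_stages(cfg, condition, out_format):
    """Return the stages that would run for one condition, in order."""
    stages = []

    # 1. Preprocessing: only if preproc_input is set.
    if cfg.get("preproc_input", False):
        stages.append("preprocessing")

    # 2. Processing: only for conditions listed in conditions_to_generate.
    if condition in cfg.get("conditions_to_generate", []):
        stages.append("processing")

    # 3.1 Rendering: bypassed if out_format already equals renderer_format.
    if out_format != cfg.get("renderer_format"):
        stages.append("rendering")

    # 3.2 Binaural rendering: requested and output not already binaural.
    # Assumption: binaural formats are recognised by their name prefix.
    if cfg.get("binaural_rendered", False) and not out_format.startswith("BINAURAL"):
        stages.append("binaural_rendering")

    return stages


cfg = {
    "preproc_input": True,
    "conditions_to_generate": ["ref", "ivas"],
    "renderer_format": "7_1_4",
    "binaural_rendered": True,
}
print(planned_stages(cfg, "ivas", "HOA3"))
```

With `out_format` equal to `renderer_format` and `binaural_rendered` unset, only the processing stage (and preprocessing, if enabled) remains.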
---
### 2.6. Renderer Metadata definition
To run, the renderer requires a config file describing the input scene. The expected format of the config file is as follows:
- Line 1: Path to a "multitrack" audio file. This should be a single multichannel wav/pcm file that contains all input audio. For example channels 1-4 can be an FOA scene, channel 5 - an object and channels 6-11 - a 5.1 channel bed. If the path is not absolute, it is considered relative to the renderer executable, not the config file. This path has lower priority than the one given on the command line: *The path in the config file is ignored if the --inputAudio argument to the renderer executable is specified.*
- Line 2: Contains number of inputs. An input can either be an Ambisonics scene, an object or a channel bed. This is NOT the total number of channels in the input audio file. The renderer currently supports the following simultaneously (these limits can be freely changed with pre-processor macros, if needed):
    - Up to 2 SBA inputs
    - Up to 2 MC inputs
    - Up to 16 ISM inputs
- Following lines: Define each of the inputs. Inputs can be listed in any order - they are NOT required to be listed in the same order as in the audio file.
- Input definitions: First line of an input definition contains the input type: SBA, MC or ISM. Following lines depend on the input type:
    - SBA
        - Index of the first channel of this input in the multitrack file (1-indexed)
        - Ambisonics order
    - MC
        - Index of the first channel of this input in the multitrack file (1-indexed)
        - CICP index of the speaker layout
    - ISM
        - Index of this input's audio in the multitrack file (1-indexed)
        - Path to ISM metadata file (if not absolute, relative to executable location)
    - OR ISM
        - Index of this input's audio in the multitrack file (1-indexed)
        - Number N of positions defined, followed by N lines in form: `stay in position for x frames, azimuth, elevation` (ISM position metadata defined this way is looped if there are more frames of audio than given positions)
---
Example config
The following example defines a scene with 4 inputs:
- ISM with trajectory defined in a separate file. Channel 12 in the input file.
- Ambisonics, order 1. Channels 1-4 in the input audio file.
- CICP6 channel bed. Channels 5-10 in the input audio file.
- ISM with 2 defined positions (-90,0) and (90,0). Channel 11 in the input file. The object will start at position (-90,0) and stay there for 5 frames, then move to (90,0) and stay there for 5 frames. This trajectory is looped over the duration of the input audio file.
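The scene described above can be written down and read back with a few lines of Python. The sketch below is a reading aid, not the renderer's own parser. It rests on several assumptions: the file names `scene.wav` and `ism_trajectory.csv` are placeholders, the position lines are taken to be comma-separated, and an ISM entry whose second line is an integer is treated as the "N positions" variant.

```python
EXAMPLE = """\
scene.wav
4
ISM
12
ism_trajectory.csv
SBA
1
1
MC
5
6
ISM
11
2
5,-90,0
5,90,0
"""


def parse_renderer_config(text):
    """Parse a renderer scene description into a dictionary (sketch)."""
    lines = iter(ln.strip() for ln in text.splitlines() if ln.strip())
    scene = {"audio_path": next(lines), "inputs": []}
    num_inputs = int(next(lines))
    for _ in range(num_inputs):
        kind = next(lines).upper()
        if kind == "SBA":
            first = int(next(lines))
            order = int(next(lines))
            entry = {"type": "SBA", "first_channel": first, "order": order}
        elif kind == "MC":
            first = int(next(lines))
            cicp = int(next(lines))
            entry = {"type": "MC", "first_channel": first, "cicp": cicp}
        elif kind == "ISM":
            entry = {"type": "ISM", "channel": int(next(lines))}
            second = next(lines)
            try:
                count = int(second)
            except ValueError:
                # Not a number: treat it as the path to a metadata file.
                entry["metadata_path"] = second
            else:
                positions = []
                for _ in range(count):
                    frames, azimuth, elevation = next(lines).split(",")
                    positions.append((int(frames), float(azimuth), float(elevation)))
                entry["positions"] = positions
        else:
            raise ValueError(f"unknown input type: {kind}")
        scene["inputs"].append(entry)
    return scene


scene = parse_renderer_config(EXAMPLE)
for item in scene["inputs"]:
    print(item)
```

Note that the inputs appear in the order they are listed in the config, which need not match the channel order in the audio file.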
## 3. Script for converting formats and binauralizing
The script audio3dtools.py can convert between different input and output formats and binauralize signals.
Execute `python -m pyaudio3dtools.audio3dtools --help` for usage.
### 3.1. Binauralizing with head rotation
This example binauralizes a HOA3 signal with a head-rotation trajectory. Head rotation is performed in SHD. It is supported for HOA3 and META input formats. For META input format, the audio scene is first prerendered to HOA3 and then rotated and binauralized.
Currently MC input signals are supported. The reference processing can be activated by selecting `BINAURAL[_ROOM]_REF` as output format. The signals are generated by convolving the channels with the filters from the database that are closest to the current position of the virtual loudspeaker (LS). Any interpolation method supported by numpy can be chosen for interpolating between the measured points along the trajectory.
### 3.3. Rendering ISM to Custom loudspeakers with auxiliary binaural output
ISM metadata can be specified either via an input text file in the Renderer Metadata definition format, or via the command line using the same style as IVAS: