Merge branch 'main' of... (1cd40f94) · Commits · IVAS Codec Public Collaboration / IVAS Processing Scripts

.gitlab-ci.yml

+2 −2

Original line number	Diff line number	Diff line
		@@ -37,10 +37,10 @@ stages:
		# NOTE: CODEC_DIR has to be in PATH
		- cd $CODEC_DIR
		# make sure that we are at latest main
		# TODO: temporarily use the RC1b tag
		- git restore .
		- git clean -fx .
		- git fetch
		- git checkout 20230616-selection-prerelease
		- git checkout '20240522_delivery_SA4#128_final'
		- echo "--------------------------------------------"
		- echo "Building codec on commit $(git rev-parse HEAD --short)"
		- echo "--------------------------------------------"

README.md

+58 −32

		<!---
		****<!---

		(C) 2022-2025 IVAS codec Public Collaboration with portions copyright Dolby International AB, Ericsson AB,
		Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Huawei Technologies Co. LTD.,
		@@ -67,16 +67,18 @@ To facilitate the preparation of items for P800-{X} listening tests, it is possi

		The YAML configuration file (`scene_description_config_file.yml`) defines how individual mono files should be spatially positioned and combined into the target format. For advanced formats like OMASA or OSBA, note that additional SBA items may be required. Refer to the `examples/` folder for template `.yml` files demonstrating the expected structure and usage.

		Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under Filename conventions.
		Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under _Filename conventions_.

		Each entry under `scenes:` describes one test item, specifying:

		* `output`: output file name
		* `description`: human-readable description
		* `input`: list of mono `.wav` files
		* `azimuth` / `elevation`: spatial placement (°)
		* `level`: loudness in dB
		* `shift`: timing offsets in seconds
		- `output`: output file name
		- `description`: human-readable description
		- `input`: list of mono `.wav` files
		- `azimuth` / `elevation`: spatial placement (°)
		- `level`: loudness in dB
		- `shift`: timing offsets in seconds
		- `background`: background noise file (applicable to STEREO and SBA only)
		- `background_level`: level of the background noise (applicable to STEREO and SBA only)

		Dynamic positioning (e.g., `"-20:1.0:360"`) means the source will move over time, stepping every 20 ms.
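To make the `start:step:stop` notation concrete, here is a minimal Python sketch that expands such a string into one value per 20 ms frame. The function name `expand_trajectory` is hypothetical and is not taken from the scripts. Treating the stop value as inclusive is an assumption; the README only states that the source steps every 20 ms.

```python
FRAME_MS = 20  # frame length stated in the README


def expand_trajectory(spec: str) -> list[float]:
    """Expand "start:step:stop" into one position per frame.

    Assumption: the stop value is inclusive.
    """
    start, step, stop = (float(part) for part in spec.split(":"))
    if step == 0 or (stop - start) / step < 0:
        raise ValueError(f"step {step} never reaches {stop} from {start}")
    # Small epsilon guards against floating-point round-off in the division.
    n_frames = int((stop - start) / step + 1e-9) + 1
    return [start + i * step for i in range(n_frames)]


azimuths = expand_trajectory("-20:1.0:360")
duration_s = len(azimuths) * FRAME_MS / 1000
print(len(azimuths), azimuths[0], azimuths[-1], duration_s)
```

Under these assumptions, `"-20:1.0:360"` yields 381 frames, so the movement from -20° to 360° takes about 7.6 seconds.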

		@@ -84,6 +86,8 @@ The total duration of the output signal can be controlled using the `duration`

		Start by running a single scene to verify settings. Output includes both audio and optional metadata files. You can enable multiprocessing by setting `multiprocessing: true`.

		The addition of custom background noise at a specific level is supported for the STEREO and SBA formats only. It is not applicable to ISM formats. For OMASA and OSBA formats, the background noise is expected to be provided in FOA/HOA2/HOA3 format as the first item in the `input` list.

		### Item processing

		The input has to be in the folder `experiments/selection/P800-{X}/proc_input_{l}`. If item generation was performed before this step, the corresponding files are already in the right folder.
		@@ -271,6 +275,10 @@ input:
		# fmt: "7_1_4"
		### Define mask (HP50 or 20KBP) for input signal filtering; default = null
		# mask: "HP50"
		### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
		# gain_pre: 10 dB
		### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
		# gain_post: 3.1622776602
		### Target sampling rate in Hz for resampling; default = null (no resampling)
		# fs: 16000
		### Target loudness in LKFS; default = null (no loudness change applied)
		@@ -373,6 +381,8 @@ input:
		### mono_dmx generate mono downmix condition
		### evs generate an EVS coded condition (see below examples for additional required keys)
		### ivas generate an IVAS coded condition (see below examples for additional required keys)
		### ivas_combined generate a combined-format IVAS coded condition using two IVAS instances for each part
		### (see below examples for additional required keys)
		conditions_to_generate:
		### Reference and anchor conditions ##########################
		c01:
		@@ -401,7 +411,7 @@ conditions_to_generate:
		c06:
		### REQUIRED: type of condition
		type: ivas
		### REQUIRED: Bitrates to use for coding
		### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
		bitrates:
		- 160000
		# - 32000
		@@ -413,6 +423,8 @@ conditions_to_generate:
		# fs: 32000
		### Additional commandline options; default = null
		# opts: ["-q", "-dtx", 4]
		### Extended metadata flag for ISM > 64kbps, ignored otherwise; default = false
		# extended_metadata: true
		### Decoder options
		dec:
		### Path to decoder binary; default search for IVAS_dec in bin folder (primary) and PATH (secondary)
		@@ -423,12 +435,19 @@ conditions_to_generate:
		# fs: 48000
		### Additional commandline options; default = null
		# opts: ["-q", "-no_delay_cmp"]
		### Per-item renderer configuration. Set to true to search for a file with suffix .cfg; default = false
		# render_config: true
		### Head-tracking trajectory file for binaural output OR 'true' which will search for a file with the suffix .ht.csv next to the input; default = null
		### NOTE: this automatically configures the '-T' argument to the decoder, so may conflict if also specified in `opts`
		# trajectory: "path/to/file"
		### Limit the trajectory to 3DoF via truncation; default = false
		# only_3dof: false

		### IVAS condition ###############################
		c07:
		### REQUIRED: type of condition
		type: ivas
		### REQUIRED: Bitrates to use for coding
		### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
		bitrates:
		- 160000
		# - 32000
		@@ -490,6 +509,10 @@ postprocessing:
		fmt: "BINAURAL"
		### REQUIRED: Target sampling rate in Hz for resampling; default = null (no resampling)
		fs: 48000
		### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
		# gain_pre: 10 dB
		### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
		# gain_post: 3.1622776602
		### Low-pass cut-off frequency in Hz; default = null (no filtering)
		# lp_cutoff: 24000
		### Target loudness in LKFS; default = null (no loudness change applied)
		@@ -504,8 +527,10 @@ postprocessing:
		# bin_lfe_gain: 1
		### Flag whether output should be limited to avoid clipping (can alter target loudness); default = true
		# limit: false
		### Head-tracking trajectory file for binaural output; default = null
		### Head-tracking trajectory file for binaural output OR 'true' which will search for a file with the suffix .ht.csv in the input dir; default = null
		# trajectory: "path/to/file"
		### Limit the trajectory to 3DoF via truncation; default = false
		# only_3dof: false
		```

		</details>
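The `gain_pre` and `gain_post` comments in the configuration above accept either a linear factor or a value with a dB suffix. The following Python sketch shows how the two forms relate. The helper `parse_gain` is hypothetical, and the amplitude convention (factor = 10^(dB/20)) is an assumption; it is consistent with the two sample values in the comments, since 10 dB then maps to roughly 3.1622776602.

```python
def parse_gain(value) -> float:
    """Return a linear amplitude factor from a number or a string like "10 dB".

    Assumption: dB values follow the amplitude convention, 10 ** (dB / 20).
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    if text.lower().endswith("db"):
        db = float(text[:-2])  # float() tolerates the space before the suffix
        return 10.0 ** (db / 20.0)
    return float(text)


print(parse_gain("10 dB"))
print(parse_gain(3.1622776602))
```

Both calls print approximately the same factor, which illustrates why `10 dB` and `3.1622776602` can be read as two spellings of the same gain.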
		@@ -517,7 +542,7 @@ postprocessing:
		The following values may be used for the `type` key of a condition:

		\| Supported conditions \| Description \|
		\| :------------------: \| ----------------------------------------------------------- \|
		\| :------------------: \| ----------------------------------------------------------------- \|
		\| ref \| Uncoded (reference) \|
		\| lp3k5 \| Uncoded low-passed at 3.5 kHz (anchor) \|
		\| lp7k \| Uncoded low-passed at 7 kHz (anchor) \|
		@@ -526,6 +551,7 @@ The following values may be used for the `type` key of a condition:
		\| mono_dmx \| Uncoded mono downmix \|
		\| evs \| Coded with multi-stream EVS codec (metadata not coded!) \|
		\| ivas \| Coded with IVAS codec \|
		\| ivas_combined \| Combined format coding with two IVAS instances (only OMASA, OSBA) \|

		### Configuration of conditions

examples/ITEM_GENERATION_5_1_4.yml

0 → 100644

+177 −0

		---
		################################################
		# Item generation - General configuration
		################################################

		### Any relative paths will be interpreted relative to the working directory the script is called from!
		### Usage of absolute paths is recommended.
		### Do not use file names with dots "." in them! This is not supported, use "_" instead
		### For Windows users: please use double back slash '\\' in paths and add '.exe' to executable definitions

		### Output format
		format: "5_1_4"
		# masa_tc: 1 # applicable only to MASA/OMASA format
		# masa_dirs: 1 # applicable only to MASA/OMASA format
		# sba_order: 2 # applicable only to OSBA format

		### Output sampling rate in Hz
		fs: 48000

		### Generate BINAURAL output (_BINAURAL will be appended to the output filename)
		binaural_output: true

		### Normalize target loudness to X LKFS
		loudness: -26

		### Apply pre-amble and post-amble in X seconds
		preamble: 0.0
		postamble: 0.0

		### Apply fade-in and fade-out of X seconds
		fade_in_out: 0.5

		### Trim the output such that the total duration is X seconds
		duration: 8

		### Add low-level random background noise (amplitude +-4) instead of silence; default = false (silence)
		add_low_level_random_noise: false

		### Process with parallel streams
		multiprocessing: false

		################################################
		### Item generation - Filename conventions
		################################################

		### Naming convention for the input mono files
		### The input filenames are represented by:
		### lLLeeettszz.wav
		### where:
		### l stands for the listening lab designator: a (Force Technology), b (HEAD acoustics), c (MQ University), d (Mesaqin.com)
		### LL stands for the language: JP, FR, GE, MA, DA, EN
		### eee stands for the experiment designator: p01, p02, p04, p05, p06, p07, p08, p09
		### tt stands for the talker ID: f1, f2, f3, m1, m2, m3
		### s stands for 'sample' and zz is the sample number; 01, ..., 14

		### Naming convention for the generated output files
		### The output filenames are represented by:
		### leeeayszz.wav
		### The filenames of the accompanying output metadata files (applicable to metadata-assisted spatial audio, object-based audio) are represented by:
		### leeeayszz.met for metadata-assisted spatial audio
		### leeeayszz.wav.o.csv for object-based audio
		### where:
		### l stands for the listening lab designator: a (Force Technology), b (HEAD acoustics), c (MQ University), d (Mesaqin.com)
		### eee stands for the experiment designator: p01, p02, p04, p05, p06, p07, p08, p09
		### a stands for 'audio'
		### y is the per-experiment category according to IVAS-8a: 01, 02, 03, 04, 05, 06
		### s stands for sample and zz is the sample number; 01, 02, 03, 04, 05, 06, 07 (07 is the preliminary sample)
		### o stands for the object number; 0, 1, 2, 3

		### File designators, default is "l" for listening lab, "EN" for language, "p07" for experiment and "g" for company
		listening_lab: "l"
		language: "EN"
		exp: "p01"
		provider: "va"

		### Insert prefix for all input filenames (default: "")
		### l stands for the 'listening_lab' designator, L stands for the 'language', e stands for the 'experiment'
		### the number of consecutive letters defines the length of each field
		# use_input_prefix: "lLLeee"

		### Insert prefix for all output filenames (default: "")
		### l stands for the 'listening_lab' designator, L stands for the 'language', e stands for the 'experiment'
		### the number of consecutive letters defines the length of each field
		# use_output_prefix: "leee"

		################################################
		### Item generation - Scene description
		################################################

		### Each scene shall be described using the following parameters/properties:
		### output: output filename
		### description: textual description of the scene
		### input: input filename(s)
		### IR: filename(s) of the input IRs
		### azimuth: azimuth in the range [-180,180]; positive values point to the left
		### elevation: elevation in the range [-90,90]; positive values indicate up
		### shift: time adjustment of the input signal (negative value delays the signal)
		### background: background noise filename (if used, the 'add_low_level_random_noise' parameter is ignored)
		### background_level: normalized background noise loudness to X dB LKFS
		###
		### Note 0: you can use relative paths in filenames (the program assumes that the root directory is the parent directory of the ivas_processing_scripts subfolder)
		### Note 1: use brackets [val1, val2, ...] when specifying multiple values
		### Note 2: use the "start:step:stop" notation for moving sources, where step will be applied in 20ms frames
		### Note 3: we're using right-handed coordinate system with azimuth = 0 pointing from the nose to the screen

		scenes:
		"01":
		output: "out/s01.wav"
		description: "Car with AB microphone pickup, no overlap between the talkers, car noise."
		input: ["items_mono/untrimmed/f1s4b_Talker2.wav", "items_mono/untrimmed/f2s1a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_01_01_FOA.wav", "IRs/IR_do_p04_e_02_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"02":
		output: "out/s02.wav"
		description: "Car with AB microphone pickup, overlap between the talkers, car noise."
		input: ["items_mono/untrimmed/f1s6a_Talker2.wav", "items_mono/untrimmed/f2s3b_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_03_01_FOA.wav", "IRs/IR_do_p04_e_04_01_FOA.wav"]
		shift: [0.0, +1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"03":
		output: "out/s03.wav"
		description: "Car with AB microphone pickup, no overlap between the talkers, car noise."
		input: ["items_mono/untrimmed/f3s3a_Talker2.wav", "items_mono/untrimmed/f3s10b_Talker2.wav"]
		IR: ["IRs/IR_do_p04_e_05_01_FOA.wav", "IRs/IR_do_p04_e_06_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"04":
		output: "out/s04.wav"
		description: "Car with AB microphone pickup, no overlap between the talkers, car noise."
		input: ["items_mono/untrimmed/f2s7b_Talker1.wav", "items_mono/untrimmed/f5s15a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_07_01_FOA.wav", "IRs/IR_do_p04_e_08_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"05":
		output: "out/s05.wav"
		description: "Car with AB microphone pickup, no overlap between the talkers, car noise."
		input: ["items_mono/untrimmed/m2s15a_Talker2.wav", "items_mono/untrimmed/m1s4a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_07_01_FOA.wav", "IRs/IR_do_p04_e_01_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"06":
		output: "out/s06.wav"
		description: "Car with AB microphone pickup, no overlap between the talkers."
		input: ["items_mono/untrimmed/m3s8a_Talker2.wav", "items_mono/untrimmed/m4s13a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_03_01_FOA.wav", "IRs/IR_do_p04_e_01_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"07":
		output: "out/s07.wav"
		description: "Preliminary: Car with AB microphone pickup, no overlap between the talkers."
		input: ["items_mono/untrimmed/f1s20a_Talker2.wav", "items_mono/untrimmed/f5s15b_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_02_01_FOA.wav", "IRs/IR_do_p04_e_07_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"08":
		output: "out/s08.wav"
		description: "Car with AB microphone pickup, overlap between the talkers."
		input: ["items_mono/untrimmed/m2s6b_Talker2.wav", "items_mono/untrimmed/f5s14a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_08_01_FOA.wav", "IRs/IR_do_p04_e_04_01_FOA.wav"]
		shift: [0.0, +1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46
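The input naming convention `lLLeeettszz.wav` documented in the comments of this example file can be checked mechanically. The Python sketch below splits a name into its fields with a regular expression. The pattern, the function `parse_input_name`, and the sample filename `aENp01f1s03.wav` are all hypothetical illustrations built from the field descriptions, not code from the scripts.

```python
import re

# One named group per field of lLLeeettszz.wav, as described in the comments.
INPUT_NAME = re.compile(
    r"^(?P<lab>[a-d])"          # l: listening lab designator
    r"(?P<language>[A-Z]{2})"   # LL: language, e.g. EN
    r"(?P<experiment>p\d{2})"   # eee: experiment designator, e.g. p01
    r"(?P<talker>[fm]\d)"       # tt: talker ID, e.g. f1
    r"s(?P<sample>\d{2})"       # s + zz: sample number
    r"\.wav$"
)


def parse_input_name(filename: str) -> dict[str, str]:
    """Split a filename into its convention fields or raise ValueError."""
    match = INPUT_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow lLLeeettszz.wav")
    return match.groupdict()


print(parse_input_name("aENp01f1s03.wav"))
```

The pattern only checks the shape of each field; it does not verify that the language or experiment designator is one of the values listed in the comments.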

examples/ITEM_GENERATION_FOA.yml

+11 −1

		@@ -95,6 +95,8 @@ use_output_prefix: "leee"
		### azimuth: azimuth in the range [-180,180]; positive values point to the left
		### elevation: elevation in the range [-90,90]; positive values indicate up
		### shift: time adjustment of the input signal (negative value delays the signal)
		### background: background noise filename (if used, the 'add_low_level_random_noise' parameter is ignored)
		### background_level: normalized background noise loudness to X dB LKFS
		###
		### Note 0: you can use relative paths in filenames (the program assumes that the root directory is the parent directory of the ivas_processing_scripts subfolder)
		### Note 1: use brackets [val1, val2, ...] when specifying multiple values
		@@ -109,6 +111,8 @@ scenes:
		input: ["items_mono/untrimmed/f1s4b_Talker2.wav", "items_mono/untrimmed/f2s1a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_01_01_FOA.wav", "IRs/IR_do_p04_e_02_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"02":
		output: "out/s02.wav"
		@@ -116,6 +120,8 @@ scenes:
		input: ["items_mono/untrimmed/f1s6a_Talker2.wav", "items_mono/untrimmed/f2s3b_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_03_01_FOA.wav", "IRs/IR_do_p04_e_04_01_FOA.wav"]
		shift: [0.0, +1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"03":
		output: "out/s03.wav"
		@@ -130,6 +136,8 @@ scenes:
		input: ["items_mono/untrimmed/f2s7b_Talker1.wav", "items_mono/untrimmed/f5s15a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_07_01_FOA.wav", "IRs/IR_do_p04_e_08_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"05":
		output: "out/s05.wav"
		@@ -137,6 +145,8 @@ scenes:
		input: ["items_mono/untrimmed/m2s15a_Talker2.wav", "items_mono/untrimmed/m1s4a_Talker1.wav"]
		IR: ["IRs/IR_do_p04_e_07_01_FOA.wav", "IRs/IR_do_p04_e_01_01_FOA.wav"]
		shift: [0.0, -1.0]
		background: "items_background/Dolby_BG_do_p05_a_01_FOA.wav"
		background_level: -46

		"06":
		output: "out/s06.wav"

examples/ITEM_GENERATION_MASA.yml

0 → 100644

+177 −0

File added.

Preview size limit exceeded, changes collapsed.