Add support for discretely coded IVAS combined formats with two IVAS instances (68aa8f90) · IVAS Codec Public Collaboration / IVAS Processing Scripts

README.md

+42 −31

		<!---
		****<!---

		(C) 2022-2025 IVAS codec Public Collaboration with portions copyright Dolby International AB, Ericsson AB,
		Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Huawei Technologies Co. LTD.,
		@@ -67,16 +67,16 @@ To facilitate the preparation of items for P800-{X} listening tests, it is possi

		The YAML configuration file (`scene_description_config_file.yml`) defines how individual mono files should be spatially positioned and combined into the target format. For advanced formats like OMASA or OSBA, note that additional SBA items may be required. Refer to the `examples/` folder for template `.yml` files demonstrating the expected structure and usage.

		Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under Filename conventions.
		Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under _Filename conventions_.

		Each entry under `scenes:` describes one test item, specifying:

		* `output`: output file name
		* `description`: human-readable description
		* `input`: list of mono `.wav` files
		* `azimuth` / `elevation`: spatial placement (°)
		* `level`: loudness in dB
		* `shift`: timing offsets in seconds
		- `output`: output file name
		- `description`: human-readable description
		- `input`: list of mono `.wav` files
		- `azimuth` / `elevation`: spatial placement (°)
		- `level`: loudness in dB
		- `shift`: timing offsets in seconds

		Dynamic positioning (e.g., `"-20:1.0:360"`) means the source will move over time, stepping every 20 ms.

		@@ -271,6 +271,10 @@ input:
		# fmt: "7_1_4"
		### Define mask (HP50 or 20KBP) for input signal filtering; default = null
		# mask: "HP50"
		### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
		# gain_pre: 10 dB
		### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
		# gain_post: 3.1622776602
		### Target sampling rate in Hz for resampling; default = null (no resampling)
		# fs: 16000
		### Target loudness in LKFS; default = null (no loudness change applied)
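The `gain_pre` and `gain_post` comments in this hunk say a gain may be given as a linear factor or with a dB suffix. The two example values (`10 dB` and `3.1622776602`) describe the same amplitude gain, since 10^(10/20) ≈ 3.1622776602. A minimal Python sketch of that conversion follows. `to_linear_gain` is a hypothetical helper written for illustration, not the repository's `parse_gain`, and it assumes amplitude decibels (20·log10).

```python
def to_linear_gain(value):
    """Convert a gain given as a number or a string such as '10 dB' to a linear factor.

    Hypothetical helper for illustration; assumes amplitude dB (factor = 10 ** (dB / 20)).
    """
    # Plain numbers are taken as linear factors already
    if isinstance(value, (int, float)):
        return float(value)

    text = str(value).strip()
    # A trailing "dB" (any letter case, optional space) marks a decibel value
    if text.lower().endswith("db"):
        db = float(text[:-2].strip())
        return 10.0 ** (db / 20.0)

    # Otherwise the string is read as a linear factor
    return float(text)


print(to_linear_gain("10 dB"))
print(to_linear_gain(3.1622776602))
```

Under these assumptions both calls yield approximately the same factor, which matches the pairing of the two commented example values.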
		@@ -373,6 +377,8 @@ input:
		### mono_dmx generate mono downmix condition
		### evs generate an EVS coded condition (see below examples for additional required keys)
		### ivas generate an IVAS coded condition (see below examples for additional required keys)
		### ivas_combined generate a combined-format IVAS coded condition using two IVAS instances for each part
		### (see below examples for additional required keys)
		conditions_to_generate:
		### Reference and anchor conditions ##########################
		c01:
		@@ -401,7 +407,7 @@ conditions_to_generate:
		c06:
		### REQUIRED: type of condition
		type: ivas
		### REQUIRED: Bitrates to use for coding
		### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
		bitrates:
		- 160000
		# - 32000
		@@ -428,7 +434,7 @@ conditions_to_generate:
		c07:
		### REQUIRED: type of condition
		type: ivas
		### REQUIRED: Bitrates to use for coding
		### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
		bitrates:
		- 160000
		# - 32000
		@@ -490,6 +496,10 @@ postprocessing:
		fmt: "BINAURAL"
		### REQUIRED: Target sampling rate in Hz for resampling; default = null (no resampling)
		fs: 48000
		### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
		# gain_pre: 10 dB
		### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
		# gain_post: 3.1622776602
		### Low-pass cut-off frequency in Hz; default = null (no filtering)
		# lp_cutoff: 24000
		### Target loudness in LKFS; default = null (no loudness change applied)
		@@ -517,7 +527,7 @@ postprocessing:
		The following values may be used for the `type` key of a condition:

		| Supported conditions | Description |
		| :------------------: | ----------------------------------------------------------- |
		| :------------------: | ----------------------------------------------------------------- |
		| ref | Uncoded (reference) |
		| lp3k5 | Uncoded low-passed at 3.5 kHz (anchor) |
		| lp7k | Uncoded low-passed at 7 kHz (anchor) |
		@@ -526,6 +536,7 @@ The following values may be used for the `type` key of a condition:
		| mono_dmx | Uncoded mono downmix |
		| evs | Coded with multi-stream EVS codec (metadata not coded!) |
		| ivas | Coded with IVAS codec |
		| ivas_combined | Combined format coding with two IVAS instances (only OMASA, OSBA) |

		### Configuration of conditions

ivas_processing_scripts/constants.py

+1 −0

		@@ -49,6 +49,7 @@ SUPPORTED_CONDITIONS = {
		"esdru",
		"evs",
		"ivas",
		"ivas_combined",
		"mono_dmx",
		"spatial_distortion",
		}

ivas_processing_scripts/processing/chains.py

+47 −38

		@@ -41,13 +41,14 @@ from ivas_processing_scripts.audiotools.audiofile import read, write
		from ivas_processing_scripts.processing.config import TestConfig
		from ivas_processing_scripts.processing.evs import EVS
		from ivas_processing_scripts.processing.ivas import IVAS, IVAS_rend
		from ivas_processing_scripts.processing.tx import get_tx_cfg
		from ivas_processing_scripts.processing.ivas_combined import IVASCombined
		from ivas_processing_scripts.processing.postprocessing import Postprocessing
		from ivas_processing_scripts.processing.preprocessing import Preprocessing
		from ivas_processing_scripts.processing.preprocessing_2 import Preprocessing2
		from ivas_processing_scripts.processing.processing_splitting_scaling import (
		Processing_splitting_scaling,
		)
		from ivas_processing_scripts.processing.tx import get_tx_cfg
		from ivas_processing_scripts.utils import get_abs_path, list_audio, parse_gain


		@@ -63,39 +64,45 @@ def init_processing_chains(cfg: TestConfig) -> None:
		# other processing chains
		for cond_name, cond_cfg in cfg.conditions_to_generate.items():
		bitrates = cond_cfg.get("bitrates")
		if bitrates is not None and len(bitrates) > 1:
		multiple_bitrates_flag = True

		if not bitrates:
		# non coding condition
		cfg.proc_chains.append(get_processing_chain(cond_name, cfg))
		else:
		# bitrates can be a value, list or list of lists
		# for EVS, this is a per-channel bitrate
		# for IVAS combined, this is a per-instance bitrate
		# otherwise it is regular IVAS with multiple bitrate conditions
		multiple_bitrates_flag = isinstance(bitrates, list) and len(bitrates) > 1

		# pass the list for IVAS Combined, per-instance bitrate
		if cond_cfg.get("type") == "ivas_combined":
		cfg.proc_chains.append(
		get_processing_chain(
		cond_name,
		cfg,
		bitrates,
		)
		)
		else:
		multiple_bitrates_flag = False
		if bitrates:
		for bitrate in bitrates:
		# check if a list was specified
		if isinstance(bitrate, list) and cond_name.startswith("ivas"):
		# flatten the list of lists for IVAS
		if isinstance(bitrate, list) and cond_cfg.get("type") == "ivas":
		# flatten the list by adding multiple conditions
		[
		cfg.proc_chains.append(
		get_processing_chain(
		cond_name,
		cond_cfg,
		extend_br,
		multiple_bitrates=multiple_bitrates_flag,
		cond_name, cfg, extend_br, multiple_bitrates_flag
		)
		)
		for extend_br in bitrate
		]
		else:
		# otherwise pass the list; EVS will interpret as per-channel bitrate
		# EVS, interpreted as per-channel bitrate
		cfg.proc_chains.append(
		get_processing_chain(
		cond_name,
		cfg,
		bitrate,
		multiple_bitrates=multiple_bitrates_flag,
		cond_name, cfg, bitrate, multiple_bitrates_flag
		)
		)
		else:
		# non coding condition
		cfg.proc_chains.append(get_processing_chain(cond_name, cfg))

		# list items in input directory
		cfg.items_list = list_audio(
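The way the `bitrates` key is expanded into processing chains in the hunk above can be sketched in isolation as below. This is a simplified reading of the diff, not the project's code: `expand_bitrates` is a hypothetical helper, it assumes `bitrates` is already a list (or empty/`None`), and it returns the bitrate argument each chain would receive instead of building the chains themselves.

```python
def expand_bitrates(cond_type, bitrates):
    """Return one entry per processing chain, holding the bitrate argument for that chain.

    Hypothetical sketch of the dispatch in init_processing_chains; assumes a list input.
    """
    if not bitrates:
        # Non-coding condition: a single chain without a bitrate
        return [None]

    if cond_type == "ivas_combined":
        # The whole list goes to one chain: one bitrate per IVAS instance
        return [bitrates]

    chains = []
    for bitrate in bitrates:
        if isinstance(bitrate, list) and cond_type == "ivas":
            # Nested list for IVAS: flatten into one chain per bitrate
            chains.extend(bitrate)
        else:
            # Scalar bitrate, or a list that EVS reads as per-channel bitrates
            chains.append(bitrate)
    return chains


print(expand_bitrates("ivas", [[13200, 16400], 24400]))
print(expand_bitrates("ivas_combined", [64000, 96000]))
```

The bitrate values in the usage lines are arbitrary placeholders. The point of the sketch is the difference in shape: `ivas` yields one chain per bitrate, while `ivas_combined` yields a single chain that receives the full list.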
		@@ -219,8 +226,7 @@ def get_processing_chain(
		) -> dict:
		"""Mapping from test configuration to condition and postprocessing keyword arguments"""
		name = f"{condition}"
		if bitrate:
		if multiple_bitrates:
		if bitrate and multiple_bitrates:
		if isinstance(bitrate, list):
		name += f"_{sum(bitrate)}"
		else:
		@@ -364,7 +370,7 @@ def get_processing_chain(
		)
		# update values to reflect decoder output
		tmp_in_fs = dec_cfg.get("fs", tmp_in_fs)
		elif cond_cfg["type"] == "ivas":
		elif cond_cfg["type"] == "ivas" or cond_cfg["type"] == "ivas_combined":
		cod_cfg = cond_cfg["cod"]
		dec_cfg = cond_cfg["dec"]

		@@ -402,7 +408,7 @@ def get_processing_chain(
		if hasattr(cfg, "preprocessing_2"):
		preamble = cfg.preprocessing_2.get("preamble", 0)

		# if the encoding format differs from the format after the preprocessing, add format conversion stuff
		# if the encoding format differs from the format after the preprocessing, add format conversion
		if (cod_fmt := cod_cfg.get("fmt", tmp_in_fmt)) != tmp_in_fmt:
		chain["processes"].append(
		Postprocessing(
		@@ -425,8 +431,11 @@ def get_processing_chain(
		cond_fmt.extend(tmp_out_fmt)
		tmp_out_fmt = tmp_out_fmt[0]

		ivas_cls = IVAS
		if cond_cfg["type"] == "ivas_combined":
		ivas_cls = IVASCombined
		chain["processes"].append(
		IVAS(
		ivas_cls(
		{
		"in_fmt": tmp_in_fmt,
		"in_fs": tmp_in_fs,

ivas_processing_scripts/processing/config.py

+1 −1

		@@ -230,7 +230,7 @@ class TestConfig:
		raise KeyError(
		f"The following key(s) must be specified for EVS: {REQUIRED_KEYS_EVS}"
		)
		elif type == "ivas":
		elif type == "ivas" or type == "ivas_combined":
		merged_cfg = get_default_config_for_codecs("IVAS", codec_bin_extension)
		merge_dicts(merged_cfg, cond_cfg)
		cfg["conditions_to_generate"][cond_name] = merged_cfg

ivas_processing_scripts/processing/ivas.py

+14 −16

		@@ -54,13 +54,13 @@ from ivas_processing_scripts.utils import run, use_wine
		class IVAS(Processing):
		def __init__(self, attrs):
		super().__init__(attrs)
		self._validate()
		self.name = "ivas"
		self.in_fmt = audio.fromtype(self.in_fmt)
		self.out_fmt = audio.fromtype(self.out_fmt)
		if not hasattr(self, "dec_opts"):
		self.dec_opts = None
		self._use_wine = use_wine(self.use_windows_codec_binaries)
		self._validate()

		def _validate(self):
		need_exe_suffix = (
		@@ -146,21 +146,19 @@ class IVAS(Processing):
		logger.debug(f"IVAS encoder {in_file} -> {bitstream}")

		# Only resample and convert if wav, otherwise supposed pcm to be sampled at self.in_fs
		metadata_files = []
		metadata_files = in_meta if in_meta else []

		# for MASA suppose that metadata file has same basename and location as input file
		if not metadata_files:
		if isinstance(self.in_fmt, audio.MetadataAssistedSpatialAudio):
		md_file = in_file.parent / (in_file.name + ".met")
		metadata_files.append(md_file)
		metadata_files = [md_file]
		elif isinstance(self.in_fmt, (audio.ObjectBasedAudio, audio.OSBAAudio)):
		metadata_files = in_meta
		elif isinstance(self.in_fmt, audio.OMASAAudio):
		metadata_files = in_meta
		# TODO treffehn: check and maybe change here and for masa
		# if len(metadata_files) != number of ism channels plus one
		# md_file = in_file.parent / (in_file.name + ".met")
		# metadata_files.append(md_file)
		pass
		if len(metadata_files) != self.in_fmt.num_ism_channels + 1:
		md_file = in_file.parent / (in_file.name + ".met")
		metadata_files.append(md_file)
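Because this diff view drops indentation, the exact nesting of the metadata branches is not recoverable here. One plausible reading of the new logic is sketched below: metadata passed in is kept, MASA input falls back to a `.met` file next to the audio file, and OMASA input gets that `.met` file appended when the list does not yet hold one entry per object plus one. `resolve_metadata` and the plain-string format kinds are assumptions made for illustration; the real code tests `self.in_fmt` against the `audio` classes.

```python
from pathlib import Path


def resolve_metadata(in_file, fmt_kind, in_meta=None, num_ism_channels=0):
    """Return the metadata file list for an encoder call (hypothetical sketch)."""
    # Keep whatever metadata the caller supplied; copy to avoid mutating the input
    metadata_files = list(in_meta) if in_meta else []

    # MASA metadata is assumed to sit next to the input file: <name>.met
    masa_md = in_file.parent / (in_file.name + ".met")

    if fmt_kind == "MASA":
        if not metadata_files:
            metadata_files = [masa_md]
    elif fmt_kind == "OMASA":
        # Expect one file per object plus one MASA metadata file
        if len(metadata_files) != num_ism_channels + 1:
            metadata_files.append(masa_md)

    return metadata_files


print(resolve_metadata(Path("items/scene1.wav"), "MASA"))
```

The file names in the usage line are placeholders.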

		# Support input file wav, pcm and txt (metadata iis)
		if in_file.suffix == ".wav":