Merge branch... (e499d545) · Commits · IVAS Codec Public Collaboration / IVAS Processing Scripts

README.md

+42 −31

-<!---
+****<!---
 
 (C) 2022-2025 IVAS codec Public Collaboration with portions copyright Dolby International AB, Ericsson AB,
 Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Huawei Technologies Co. LTD.,
@@ -67,16 +67,16 @@ To facilitate the preparation of items for P800-{X} listening tests, it is possi
 
 The YAML configuration file (`scene_description_config_file.yml`) defines how individual mono files should be spatially positioned and combined into the target format. For advanced formats like OMASA or OSBA, note that additional SBA items may be required. Refer to the `examples/` folder for template `.yml` files demonstrating the expected structure and usage.
 
-Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under Filename conventions.
+Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under _Filename conventions_.
 
 Each entry under `scenes:` describes one test item, specifying:
 
-* `output`: output file name
-* `description`: human-readable description
-* `input`: list of mono `.wav` files
-* `azimuth` / `elevation`: spatial placement (°)
-* `level`: loudness in dB
-* `shift`: timing offsets in seconds
+- `output`: output file name
+- `description`: human-readable description
+- `input`: list of mono `.wav` files
+- `azimuth` / `elevation`: spatial placement (°)
+- `level`: loudness in dB
+- `shift`: timing offsets in seconds
 
 Dynamic positioning (e.g., `"-20:1.0:360"`) means the source will move over time, stepping every 20 ms.
 
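The README describes two conventions for the scene YAML. Relative paths resolve against the working directory, and a position string such as `"-20:1.0:360"` makes a source move in 20 ms steps. The sketch below illustrates both under stated assumptions. `resolve_input` and `expand_position` are hypothetical helpers, not part of the scripts. Reading the string as `start:step:stop` in degrees, with one step per 20 ms frame and the value held once `stop` is reached, is an assumption about the format, not its documented definition.

```python
from pathlib import Path

FRAME_MS = 20  # README: dynamic positions step every 20 ms


def resolve_input(path_str: str, cwd: Path) -> Path:
    """Hypothetical helper: resolve a scene path as the README describes.

    Relative paths are joined to the working directory, not to the folder
    containing the YAML file; absolute paths are returned unchanged.
    """
    p = Path(path_str)
    return p if p.is_absolute() else cwd / p


def expand_position(spec: str, n_frames: int) -> list[float]:
    """Hypothetical helper: expand 'start:step:stop' into one value per 20 ms frame.

    ASSUMPTION: the value starts at `start`, moves by `step` degrees per frame,
    and is held at `stop` once it is reached.
    """
    start, step, stop = (float(v) for v in spec.split(":"))
    values = []
    for k in range(n_frames):
        v = start + k * step
        # hold at the end point (handles rising and falling trajectories)
        if (step >= 0 and v > stop) or (step < 0 and v < stop):
            v = stop
        values.append(v)
    return values
```

For example, `expand_position("-20:1.0:360", 3)` gives the azimuths for the first 60 ms under this reading: -20, -19 and -18 degrees.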
@@ -271,6 +271,10 @@ input:
 # fmt: "7_1_4"
 ### Define mask (HP50 or 20KBP) for input signal filtering; default = null
 # mask: "HP50"
+### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
+# gain_pre: 10 dB
+### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
+# gain_post: 3.1622776602
 ### Target sampling rate in Hz for resampling; default = null (no resampling)
 # fs: 16000
 ### Target loudness in LKFS; default = null (no loudness change applied)
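The new `gain_pre` and `gain_post` keys accept either a linear factor or a value with a `dB` suffix. The two commented examples fit an amplitude (20·log10) convention: 10 dB corresponds to a linear factor of 10^(10/20) ≈ 3.1622776602. This minimal sketch assumes that convention. `gain_to_linear` and `apply_gains` are hypothetical stand-ins, not the repository's `parse_gain` (imported in `chains.py`), whose exact behaviour this diff does not show.

```python
def gain_to_linear(value) -> float:
    """Hypothetical: convert a gain given as a number or as '<x> dB' to a linear factor.

    ASSUMPTION: dB values are amplitude gains, i.e. linear = 10 ** (dB / 20).
    """
    if isinstance(value, str):
        text = value.strip()
        if text.lower().endswith("db"):
            return 10 ** (float(text[:-2]) / 20)
        return float(text)
    return float(value)


def apply_gains(samples, process, gain_pre=1.0, gain_post=1.0):
    """Apply gain_pre BEFORE and gain_post AFTER an arbitrary processing step."""
    pre = gain_to_linear(gain_pre)
    post = gain_to_linear(gain_post)
    processed = process([s * pre for s in samples])
    return [s * post for s in processed]
```

The sketch keeps the two factors separate because folding them into one is only valid when the middle step is linear in its input. The test case, which adds a constant, shows the difference.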
@@ -373,6 +377,8 @@ input:
 ### mono_dmx generate mono downmix condition
 ### evs generate an EVS coded condition (see below examples for additional required keys)
 ### ivas generate an IVAS coded condition (see below examples for additional required keys)
+### ivas_combined generate a combined-format IVAS coded condition using two IVAS instances for each part
+### (see below examples for additional required keys)
 conditions_to_generate:
 ### Reference and anchor conditions ##########################
 c01:
@@ -401,7 +407,7 @@ conditions_to_generate:
 c06:
 ### REQUIRED: type of condition
 type: ivas
-### REQUIRED: Bitrates to use for coding
+### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
 bitrates:
 - 160000
 # - 32000
@@ -430,7 +436,7 @@ conditions_to_generate:
 c07:
 ### REQUIRED: type of condition
 type: ivas
-### REQUIRED: Bitrates to use for coding
+### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
 bitrates:
 - 160000
 # - 32000
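For `ivas_combined`, the updated comment says the first bitrate is used for the object part and the second for the spatial part (MASA for OMASA, SBA for OSBA). `init_processing_chains` passes the whole list to a single chain. Below is a minimal sketch of splitting such a pair. `split_combined_bitrates` is hypothetical, and rejecting lists that do not have exactly two entries is an assumption, not a rule stated in the template.

```python
def split_combined_bitrates(bitrates: list[int]) -> dict[str, int]:
    """Hypothetical: map an ivas_combined bitrate pair onto its two IVAS instances.

    Per the template comment, entry 0 is for the objects and entry 1 for the
    spatial part. ASSUMPTION: any other list length is a configuration error.
    """
    if len(bitrates) != 2:
        raise ValueError(f"ivas_combined expects exactly two bitrates, got {bitrates!r}")
    objects_br, spatial_br = bitrates
    return {"objects": objects_br, "spatial": spatial_br, "total": objects_br + spatial_br}
```

For example, `[64000, 96000]` gives the object instance 64 kbps and the spatial instance 96 kbps, 160 kbps in total.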
@@ -492,6 +498,10 @@ postprocessing:
 fmt: "BINAURAL"
 ### REQUIRED: Target sampling rate in Hz for resampling; default = null (no resampling)
 fs: 48000
+### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
+# gain_pre: 10 dB
+### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
+# gain_post: 3.1622776602
 ### Low-pass cut-off frequency in Hz; default = null (no filtering)
 # lp_cutoff: 24000
 ### Target loudness in LKFS; default = null (no loudness change applied)
@@ -519,7 +529,7 @@ postprocessing:
 The following values may be used for the `type` key of a condition:
 
 | Supported conditions | Description |
-| :------------------: | ----------------------------------------------------------- |
+| :------------------: | ----------------------------------------------------------------- |
 | ref | Uncoded (reference) |
 | lp3k5 | Uncoded low-passed at 3.5 kHz (anchor) |
 | lp7k | Uncoded low-passed at 7 kHz (anchor) |
@@ -528,6 +538,7 @@ The following values may be used for the `type` key of a condition:
 | mono_dmx | Uncoded mono downmix |
 | evs | Coded with multi-stream EVS codec (metadata not coded!) |
 | ivas | Coded with IVAS codec |
+| ivas_combined | Combined format coding with two IVAS instances (only OMASA, OSBA) |
 
 ### Configuration of conditions
 
examples/TEMPLATE.yml

+13 −11

@@ -154,7 +154,7 @@ input:
 ### REQUIRED: either error_pattern (and errpatt_late_loss_rate or errpatt_delay) or error_profile
 ### delay error profile file
 # error_pattern: ".../dly_error_profile.dat"
-### Late loss rate in precent or EVS
+### Late loss rate in precent for EVS
 # errpatt_late_loss_rate: 1
 ### Constant JBM delay in milliseconds for EVS
 # errpatt_delay: 200
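The template lists the JBM keys as alternatives. A config supplies either `error_pattern` together with `errpatt_late_loss_rate` or `errpatt_delay` (both described as EVS settings), or an `error_profile`. Below is a sketch of a check for that rule. `check_jbm_keys` is hypothetical, and rejecting a config that sets both alternatives is an assumption the template does not state.

```python
def check_jbm_keys(tx: dict, is_evs: bool) -> None:
    """Hypothetical validator for the JBM alternatives listed in the template.

    Either `error_profile`, or `error_pattern`; for EVS an `error_pattern`
    additionally needs `errpatt_late_loss_rate` or `errpatt_delay`.
    ASSUMPTION: giving both alternatives at once is rejected.
    """
    has_pattern = tx.get("error_pattern") is not None
    has_profile = tx.get("error_profile") is not None
    if has_pattern == has_profile:
        raise ValueError("set exactly one of error_pattern or error_profile")
    if has_pattern and is_evs:
        if tx.get("errpatt_late_loss_rate") is None and tx.get("errpatt_delay") is None:
            raise ValueError(
                "EVS with error_pattern needs errpatt_late_loss_rate or errpatt_delay"
            )
```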
@@ -186,6 +186,8 @@ input:
 ### mono_dmx generate mono downmix condition
 ### evs generate an EVS coded condition (see below examples for additional required keys)
 ### ivas generate an IVAS coded condition (see below examples for additional required keys)
+### ivas_combined generate a combined-format IVAS coded condition using two IVAS instances for each part
+### (see below examples for additional required keys)
 conditions_to_generate:
 ### Reference and anchor conditions ##########################
 c01:
@@ -226,7 +228,7 @@ conditions_to_generate:
 c06:
 ### REQUIRED: type of condition
 type: ivas
-### REQUIRED: Bitrates to use for coding
+### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
 bitrates:
 - 160000
 # - 32000
@@ -264,7 +266,7 @@ conditions_to_generate:
 c07:
 ### REQUIRED: type of condition
 type: ivas
-### REQUIRED: Bitrates to use for coding
+### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
 bitrates:
 - 160000
 # - 32000

ivas_processing_scripts/audiotools/convert/objectbased.py

+1 −0

@@ -141,6 +141,7 @@ def render_oba_to_binaural(
 
     # sum results over all objects
     bin.audio = np.sum(np.stack(result, axis=2), axis=2)
+    bin.fs = 48000
 
     # compensate delay from binaural dataset
     bin.audio = delay(bin.audio, bin.fs, -latency_smp, samples=True)
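This hunk sums the per-object binaural renders and then removes the latency of the binaural dataset by calling `delay` with a negative sample count. The added line sets `bin.fs = 48000` before that call, which takes `bin.fs` as an argument. The pure-Python sketch below covers the same two steps. It assumes a negative delay advances the signal: the first `latency` samples are dropped and zeros are padded at the end, so the length is unchanged. `sum_objects` and `advance` are hypothetical stand-ins for `np.sum(np.stack(...))` and `delay`.

```python
Frame = list[float]  # one sample per output channel, e.g. [left, right]


def sum_objects(renders: list[list[Frame]]) -> list[Frame]:
    """Sum per-object binaural renders sample by sample and channel by channel.

    Same result as np.sum(np.stack(result, axis=2), axis=2) when every
    render has shape (n_samples, n_channels).
    """
    n_samples = len(renders[0])
    n_channels = len(renders[0][0])
    return [
        [sum(r[i][c] for r in renders) for c in range(n_channels)]
        for i in range(n_samples)
    ]


def advance(audio: list[Frame], latency: int) -> list[Frame]:
    """ASSUMPTION: a negative delay of `latency` samples shifts the signal
    earlier and keeps its length by zero-padding the tail."""
    n_channels = len(audio[0])
    pad = [[0.0] * n_channels for _ in range(min(latency, len(audio)))]
    return audio[latency:] + pad
```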

ivas_processing_scripts/constants.py

+1 −0

@@ -49,6 +49,7 @@ SUPPORTED_CONDITIONS = {
     "esdru",
     "evs",
     "ivas",
+    "ivas_combined",
     "mono_dmx",
     "spatial_distortion",
 }
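Adding `"ivas_combined"` to `SUPPORTED_CONDITIONS` is what lets a condition of that `type` be accepted. Below is a sketch of such a check. The set contains only the names visible in this commit (README table plus this hunk); the real constant has entries the diff does not show. The OMASA/OSBA restriction uses the README's names as placeholder format labels, which may not match the format strings the scripts actually use. `check_condition` is hypothetical.

```python
# Subset of condition types visible in this commit (README table + constants.py hunk);
# the real SUPPORTED_CONDITIONS contains entries not shown here.
VISIBLE_CONDITIONS = {
    "ref", "lp3k5", "lp7k", "esdru", "evs", "ivas", "ivas_combined",
    "mono_dmx", "spatial_distortion",
}
# README: ivas_combined applies only to OMASA and OSBA.
# ASSUMPTION: these labels stand in for the scripts' actual format strings.
COMBINED_FORMATS = {"OMASA", "OSBA"}


def check_condition(cond_name: str, cond_cfg: dict, in_fmt: str) -> None:
    """Hypothetical check of a condition's `type` before its chain is built."""
    cond_type = cond_cfg.get("type")
    if cond_type not in VISIBLE_CONDITIONS:
        raise ValueError(f"{cond_name}: unsupported condition type {cond_type!r}")
    if cond_type == "ivas_combined" and in_fmt not in COMBINED_FORMATS:
        raise ValueError(f"{cond_name}: ivas_combined needs OMASA or OSBA input, got {in_fmt!r}")
```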

ivas_processing_scripts/processing/chains.py

+57 −110

@@ -41,12 +41,14 @@ from ivas_processing_scripts.audiotools.audiofile import read, write
 from ivas_processing_scripts.processing.config import TestConfig
 from ivas_processing_scripts.processing.evs import EVS
 from ivas_processing_scripts.processing.ivas import IVAS, IVAS_rend
+from ivas_processing_scripts.processing.ivas_combined import IVASCombined
 from ivas_processing_scripts.processing.postprocessing import Postprocessing
 from ivas_processing_scripts.processing.preprocessing import Preprocessing
 from ivas_processing_scripts.processing.preprocessing_2 import Preprocessing2
 from ivas_processing_scripts.processing.processing_splitting_scaling import (
     Processing_splitting_scaling,
 )
+from ivas_processing_scripts.processing.tx import get_tx_cfg
 from ivas_processing_scripts.utils import get_abs_path, list_audio, parse_gain
 
 
@@ -62,39 +64,45 @@ def init_processing_chains(cfg: TestConfig) -> None:
     # other processing chains
     for cond_name, cond_cfg in cfg.conditions_to_generate.items():
         bitrates = cond_cfg.get("bitrates")
-        if bitrates is not None and len(bitrates) > 1:
-            multiple_bitrates_flag = True
+
+        if not bitrates:
+            # non coding condition
+            cfg.proc_chains.append(get_processing_chain(cond_name, cfg))
+        else:
+            # bitrates can be a value, list or list of lists
+            # for EVS, this is a per-channel bitrate
+            # for IVAS combined, this is a per-instance bitrate
+            # otherwise it is regular IVAS with multiple bitrate conditions
+            multiple_bitrates_flag = isinstance(bitrates, list) and len(bitrates) > 1
+
+            # pass the list for IVAS Combined, per-instance bitrate
+            if cond_cfg.get("type") == "ivas_combined":
+                cfg.proc_chains.append(
+                    get_processing_chain(
+                        cond_name,
+                        cfg,
+                        bitrates,
+                    )
+                )
             else:
-            multiple_bitrates_flag = False
-        if bitrates:
                 for bitrate in bitrates:
                     # check if a list was specified
-                if isinstance(bitrate, list) and cond_name.startswith("ivas"):
-                    # flatten the list of lists for IVAS
+                    if isinstance(bitrate, list) and cond_cfg.get("type") == "ivas":
+                        # flatten the list by adding multiple conditions
                         [
                             cfg.proc_chains.append(
                                 get_processing_chain(
-                                cond_name,
-                                cond_cfg,
-                                extend_br,
-                                multiple_bitrates=multiple_bitrates_flag,
+                                    cond_name, cfg, extend_br, multiple_bitrates_flag
                                 )
                             )
                             for extend_br in bitrate
                         ]
                     else:
-                    # otherwise pass the list; EVS will interpret as per-channel bitrate
+                        # EVS, interpreted as per-channel bitrate
                         cfg.proc_chains.append(
                             get_processing_chain(
-                            cond_name,
-                            cfg,
-                            bitrate,
-                            multiple_bitrates=multiple_bitrates_flag,
+                                cond_name, cfg, bitrate, multiple_bitrates_flag
                             )
                         )
-        else:
-            # non coding condition
-            cfg.proc_chains.append(get_processing_chain(cond_name, cfg))
 
     # list items in input directory
     cfg.items_list = list_audio(
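After this change, chain creation depends on the condition `type` rather than on the condition name prefix (the old code tested `cond_name.startswith("ivas")`):

- `ivas_combined` gets a single chain with the whole bitrate list.
- A nested list under `ivas` is expanded into one chain per entry.
- Any other list (EVS) is passed through as per-channel bitrates.

The sketch below restates that logic as a pure function so it can be checked in isolation. `plan_chains` is hypothetical. It returns the planned arguments instead of calling `get_processing_chain`, and like the loop in the diff it expects `bitrates` to be a list.

```python
from typing import Any, Optional


def plan_chains(cond_type: str, bitrates: Any) -> list[tuple[Any, Optional[bool]]]:
    """Hypothetical mirror of the new dispatch in init_processing_chains.

    Returns one (bitrate, multiple_bitrates) pair per chain that would be
    created; None means the argument is not passed (left to the default of
    get_processing_chain).
    """
    if not bitrates:
        # non-coding condition: one chain without a bitrate
        return [(None, None)]
    multiple = isinstance(bitrates, list) and len(bitrates) > 1
    if cond_type == "ivas_combined":
        # one chain receives the whole per-instance list
        return [(bitrates, None)]
    chains = []
    for bitrate in bitrates:
        if isinstance(bitrate, list) and cond_type == "ivas":
            # a nested list is flattened into one IVAS chain per entry
            chains.extend((br, multiple) for br in bitrate)
        else:
            # a plain value, or a list kept whole (EVS per-channel bitrates)
            chains.append((bitrate, multiple))
    return chains
```

For instance, `plan_chains("ivas", [[13200, 16400], 32000])` plans three IVAS chains, all flagged as part of a multi-bitrate condition.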
@@ -218,8 +226,7 @@ def get_processing_chain(
 ) -> dict:
     """Mapping from test configuration to condition and postprocessing keyword arguments"""
     name = f"{condition}"
-    if bitrate:
-        if multiple_bitrates:
+    if bitrate and multiple_bitrates:
         if isinstance(bitrate, list):
             name += f"_{sum(bitrate)}"
         else:
@@ -312,52 +319,17 @@ def get_processing_chain(
         evs_lfe_9k6bps_nb = cond_cfg.get("evs_lfe_9k6bps_nb", None)
 
         # Frame error pattern bitstream modification
         tx_cfg = None
         if "tx" in cond_cfg.keys() or hasattr(cfg, "tx"):
             # postprocess also signal without error if there is loudness scaling
             if post_cfg.get("loudness"):
                 tx_condition = True
-            # local specification overwrites global one
-            if "tx" in cond_cfg.keys():
-                tx_cfg_tmp = cond_cfg["tx"]
-            else:
-                tx_cfg_tmp = cfg.tx
-
-            if tx_cfg_tmp.get("type", None) == "FER":
-                tx_cfg = {
-                    "type": tx_cfg_tmp.get("type", None),
-                    "error_pattern": get_abs_path(
-                        tx_cfg_tmp.get("error_pattern", None)
-                    ),
-                    "error_rate": tx_cfg_tmp.get("error_rate", None),
-                    "master_seed": cfg.master_seed,
-                    "prerun_seed": cfg.prerun_seed,
-                }
-            elif tx_cfg_tmp.get("type", None) == "JBM":
-                tx_cfg = {
-                    "type": tx_cfg_tmp.get("type", None),
-                    "error_pattern": get_abs_path(
-                        tx_cfg_tmp.get("error_pattern", None)
-                    ),
-                    "errpatt_late_loss_rate": tx_cfg_tmp.get(
-                        "errpatt_late_loss_rate", None
-                    ),
-                    "errpatt_delay": tx_cfg_tmp.get("errpatt_delay", None),
-                    "errpatt_seed": tx_cfg_tmp.get("errpatt_seed", None),
-                    "error_profile": tx_cfg_tmp.get("error_profile", None),
-                    "n_frames_per_packet": tx_cfg_tmp.get("n_frames_per_packet", None),
-                    "master_seed": cfg.master_seed,
-                }
-            else:
-                raise ValueError(
-                    "Type of bitstream procesing either missing or not valid"
-                )
-        else:
-            tx_cfg = None
+            tx_cfg = get_tx_cfg(cfg, cond_cfg, is_EVS=True)
 
         preamble = 0
         if hasattr(cfg, "preprocessing_2"):
             preamble = cfg.preprocessing_2.get("preamble", 0)
-        else:
-            preamble = 0
 
         # if the encoding format differs from the format after the preprocessing, add format conversion stuff
         if (cod_fmt := cod_cfg.get("fmt", tmp_in_fmt)) != tmp_in_fmt:
@@ -398,7 +370,7 @@ def get_processing_chain(
         )
         # update values to reflect decoder output
         tmp_in_fs = dec_cfg.get("fs", tmp_in_fs)
-    elif cond_cfg["type"] == "ivas":
+    elif cond_cfg["type"] == "ivas" or cond_cfg["type"] == "ivas_combined":
         cod_cfg = cond_cfg["cod"]
         dec_cfg = cond_cfg["dec"]
 
@@ -423,49 +395,20 @@ def get_processing_chain(
         )
 
         # Frame error pattern bitstream modification
         tx_cfg = None
         if "tx" in cond_cfg.keys() or hasattr(cfg, "tx"):
             # postprocess also signal without error if there is loudness scaling
             if post_cfg.get("loudness"):
                 tx_condition = True
-            # local specification overwrites global one
-            if "tx" in cond_cfg.keys():
-                tx_cfg_tmp = cond_cfg["tx"]
-            else:
-                tx_cfg_tmp = cfg.tx
-
-            if tx_cfg_tmp.get("type", None) == "FER":
-                tx_cfg = {
-                    "type": tx_cfg_tmp.get("type", None),
-                    "error_pattern": get_abs_path(
-                        tx_cfg_tmp.get("error_pattern", None)
-                    ),
-                    "error_rate": tx_cfg_tmp.get("error_rate", None),
-                    "master_seed": cfg.master_seed,
-                    "prerun_seed": cfg.prerun_seed,
-                }
-            elif tx_cfg_tmp.get("type", None) == "JBM":
-                tx_cfg = {
-                    "type": tx_cfg_tmp.get("type", None),
-                    "error_pattern": tx_cfg_tmp.get("error_pattern", None),
-                    "error_profile": tx_cfg_tmp.get("error_profile", None),
-                    "n_frames_per_packet": tx_cfg_tmp.get("n_frames_per_packet", None),
-                    "master_seed": cfg.master_seed,
-                    "errpatt_seed": tx_cfg_tmp.get("errpatt_seed", None),
-                }
-                ivas_jbm = True
-            else:
-                raise ValueError(
-                    "Type of bitstream procesing either missing or not valid"
-                )
-        else:
-            tx_cfg = None
+            tx_cfg = get_tx_cfg(cfg, cond_cfg)
+            ivas_jbm = tx_cfg.get("type") == "JBM"
 
         preamble = 0
         if hasattr(cfg, "preprocessing_2"):
             preamble = cfg.preprocessing_2.get("preamble", 0)
-        else:
-            preamble = 0
 
-        # if the encoding format differs from the format after the preprocessing, add format conversion stuff
+        # if the encoding format differs from the format after the preprocessing, add format conversion
         if (cod_fmt := cod_cfg.get("fmt", tmp_in_fmt)) != tmp_in_fmt:
             chain["processes"].append(
                 Postprocessing(
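Both removed blocks built `tx_cfg` the same way:

- A condition-level `tx` overrides the global one.
- `FER` and `JBM` get different key sets.
- Anything else raises `ValueError`.

They differ only in the EVS-specific JBM keys and the path handling. The new `get_tx_cfg(cfg, cond_cfg, is_EVS=...)`, imported from `ivas_processing_scripts.processing.tx`, centralises this, but its body is not part of this diff. The sketch below is a hypothetical reconstruction from the removed lines only. `GlobalCfg`, `_abs` and `get_tx_cfg_sketch` are illustrative names, `os.path.abspath` stands in for `get_abs_path`, and the real helper may differ.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobalCfg:
    """Hypothetical stand-in for the parts of TestConfig used below."""
    master_seed: int = 0
    prerun_seed: int = 0
    tx: Optional[dict] = None


def _abs(path: Optional[str]) -> Optional[str]:
    # stand-in for get_abs_path; ASSUMPTION: None passes through unchanged
    return os.path.abspath(path) if path is not None else None


def get_tx_cfg_sketch(cfg: GlobalCfg, cond_cfg: dict, is_EVS: bool = False) -> dict:
    """Rebuild tx_cfg the way the removed inline code did, in one place."""
    # local specification overwrites the global one
    tx = cond_cfg["tx"] if "tx" in cond_cfg else cfg.tx
    tx_type = tx.get("type") if tx else None
    if tx_type == "FER":
        return {
            "type": tx_type,
            "error_pattern": _abs(tx.get("error_pattern")),
            "error_rate": tx.get("error_rate"),
            "master_seed": cfg.master_seed,
            "prerun_seed": cfg.prerun_seed,
        }
    if tx_type == "JBM":
        out = {
            "type": tx_type,
            "error_pattern": tx.get("error_pattern"),
            "error_profile": tx.get("error_profile"),
            "n_frames_per_packet": tx.get("n_frames_per_packet"),
            "errpatt_seed": tx.get("errpatt_seed"),
            "master_seed": cfg.master_seed,
        }
        if is_EVS:
            # the removed EVS branch also resolved the path and read EVS-only keys
            out["error_pattern"] = _abs(out["error_pattern"])
            out["errpatt_late_loss_rate"] = tx.get("errpatt_late_loss_rate")
            out["errpatt_delay"] = tx.get("errpatt_delay")
        return out
    raise ValueError("Type of bitstream processing either missing or not valid")
```

Returning a plain dict also fits the new IVAS caller, which derives `ivas_jbm` with `tx_cfg.get("type") == "JBM"` instead of setting it inside the JBM branch.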
@@ -488,8 +431,11 @@ def get_processing_chain(
             cond_fmt.extend(tmp_out_fmt)
             tmp_out_fmt = tmp_out_fmt[0]
 
+        ivas_cls = IVAS
+        if cond_cfg["type"] == "ivas_combined":
+            ivas_cls = IVASCombined
         chain["processes"].append(
-            IVAS(
+            ivas_cls(
                 {
                     "in_fmt": tmp_in_fmt,
                     "in_fs": tmp_in_fs,
@@ -592,6 +538,7 @@ def get_processing_chain(
                 }
             )
         )
+
     # add splitting and scaling for all conditions
     chain["processes"].append(
         Processing_splitting_scaling(