Commit 68aa8f90 authored by Archit Tamarapu

add support for coding IVAS combined formats discretely with two IVAS instances

parent 7091fd04
+42 −31
-<!---
+****<!---

   (C) 2022-2025 IVAS codec Public Collaboration with portions copyright Dolby International AB, Ericsson AB,
   Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Huawei Technologies Co. LTD.,
@@ -67,16 +67,16 @@ To facilitate the preparation of items for P800-{X} listening tests, it is possi

The YAML configuration file (`scene_description_config_file.yml`) defines how individual mono files should be spatially positioned and combined into the target format. For advanced formats like OMASA or OSBA, note that additional SBA items may be required. Refer to the `examples/` folder for template `.yml` files demonstrating the expected structure and usage.

-Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under *Filename conventions*.
+Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under _Filename conventions_.

Each entry under `scenes:` describes one test item, specifying:

-* `output`: output file name
-* `description`: human-readable description
-* `input`: list of mono `.wav` files
-* `azimuth` / `elevation`: spatial placement (°)
-* `level`: loudness in dB
-* `shift`: timing offsets in seconds
+- `output`: output file name
+- `description`: human-readable description
+- `input`: list of mono `.wav` files
+- `azimuth` / `elevation`: spatial placement (°)
+- `level`: loudness in dB
+- `shift`: timing offsets in seconds

Dynamic positioning (e.g., `"-20:1.0:360"`) means the source will move over time, stepping every 20 ms.
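The scene keys and naming rules above can be checked mechanically. The sketch below is not part of the repository. `check_scene` and `expand_position` are hypothetical names, and the sketch makes two assumptions. First, per-source keys are lists aligned with `input`. Second, a dynamic value `"start:step:stop"` adds `step` degrees per 20 ms frame, starting at `start` and holding at `stop`.

```python
from pathlib import Path

FRAME_S = 0.02  # assumption: dynamic positions advance once per 20 ms frame


def check_scene(scene: dict) -> None:
    """Hypothetical sanity checks for one entry under `scenes:`."""
    inputs = scene["input"]
    for name in inputs:
        # the README advises against extra dots in file names
        if "." in Path(name).stem:
            raise ValueError(f"avoid dots in file name: {name}")
    # assumption: each per-source key holds one value per input file
    for key in ("azimuth", "elevation", "level", "shift"):
        values = scene.get(key)
        if isinstance(values, list) and len(values) != len(inputs):
            raise ValueError(f"'{key}' needs {len(inputs)} values, got {len(values)}")


def expand_position(value, duration_s: float) -> list:
    """Expand a static angle or a 'start:step:stop' string into one value per frame."""
    n_frames = round(duration_s / FRAME_S)
    if isinstance(value, (int, float)):
        return [float(value)] * n_frames
    start, step, stop = (float(part) for part in str(value).split(":"))
    positions, pos = [], start
    for _ in range(n_frames):
        positions.append(pos)
        # move by `step` each frame; clamp to `stop` instead of overshooting (assumption)
        if (step > 0 and pos + step <= stop) or (step < 0 and pos + step >= stop):
            pos += step
        else:
            pos = stop
    return positions


# a source starting at -20 degrees and moving 1 degree per frame, for 100 ms
print(expand_position("-20:1.0:360", 0.1))  # [-20.0, -19.0, -18.0, -17.0, -16.0]
```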

@@ -271,6 +271,10 @@ input:
  # fmt: "7_1_4"
  ### Define mask (HP50 or 20KBP) for input signal filtering; default = null
  # mask: "HP50"
+  ### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
+  # gain_pre: 10 dB
+  ### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
+  # gain_post: 3.1622776602
  ### Target sampling rate in Hz for resampling; default = null (no resampling)
  # fs: 16000
  ### Target loudness in LKFS; default = null (no loudness change applied)
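YAML delivers `gain_pre: 10 dB` as the string `"10 dB"` and `gain_post: 3.1622776602` as a float. The two commented examples describe the same gain if dB is read as an amplitude ratio, because 10^(10/20) ≈ 3.1622776602. Below is a minimal parser under that assumption. `gain_to_linear` is a hypothetical helper, not the repository's `parse_gain`, whose behaviour is not shown in this diff.

```python
import re

_GAIN = re.compile(r"\s*([-+]?\d*\.?\d+)\s*(dB)?\s*", re.IGNORECASE)


def gain_to_linear(value) -> float:
    """Turn 3.16, "3.16" or "10 dB" into a linear amplitude factor (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _GAIN.fullmatch(str(value))
    if match is None:
        raise ValueError(f"cannot parse gain: {value!r}")
    number = float(match.group(1))
    # assumption: dB values are amplitude gains, i.e. factor = 10 ** (dB / 20)
    return 10 ** (number / 20) if match.group(2) else number


print(gain_to_linear("10 dB"))  # about 3.1623, matching the gain_post example
```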
@@ -373,6 +377,8 @@ input:
###     mono_dmx        generate mono downmix condition
###     evs             generate an EVS coded condition (see below examples for additional required keys)
###     ivas            generate an IVAS coded condition (see below examples for additional required keys)
+###     ivas_combined   generate a combined-format IVAS coded condition using two IVAS instances for each part
+###                     (see below examples for additional required keys)
conditions_to_generate:
  ### Reference and anchor conditions ##########################
  c01:
@@ -401,7 +407,7 @@ conditions_to_generate:
  c06:
    ### REQUIRED: type of condition
    type: ivas
-    ### REQUIRED: Bitrates to use for coding
+    ### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
    bitrates:
      - 160000
      # - 32000
@@ -428,7 +434,7 @@ conditions_to_generate:
  c07:
    ### REQUIRED: type of condition
    type: ivas
-    ### REQUIRED: Bitrates to use for coding
+    ### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
    bitrates:
      - 160000
      # - 32000
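According to the updated comment, an `ivas_combined` condition reads `bitrates` as one value per IVAS instance: first the objects part, then the spatial part. The hypothetical helper below makes that split explicit. The dictionary keys and the requirement of exactly two entries are assumptions for illustration. The `total` field only shows the combined rate.

```python
def split_combined_bitrates(bitrates) -> dict:
    """Assign an ivas_combined `bitrates` list to its two IVAS instances (hypothetical)."""
    if not isinstance(bitrates, (list, tuple)) or len(bitrates) != 2:
        raise ValueError("ivas_combined expects two bitrates: [objects, spatial]")
    objects_br, spatial_br = (int(b) for b in bitrates)
    return {"objects": objects_br, "spatial": spatial_br, "total": objects_br + spatial_br}


print(split_combined_bitrates([48000, 96000]))
# {'objects': 48000, 'spatial': 96000, 'total': 144000}
```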
@@ -490,6 +496,10 @@ postprocessing:
  fmt: "BINAURAL"
  ### REQUIRED: Target sampling rate in Hz for resampling; default = null (no resampling)
  fs: 48000
+  ### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
+  # gain_pre: 10 dB
+  ### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
+  # gain_post: 3.1622776602
  ### Low-pass cut-off frequency in Hz; default = null (no filtering)
  # lp_cutoff: 24000
  ### Target loudness in LKFS; default = null (no loudness change applied)
@@ -517,7 +527,7 @@ postprocessing:
The following values may be used for the `type` key of a condition:

| Supported conditions | Description                                                       |
-| :------------------: | ----------------------------------------------------------- |
+| :------------------: | ----------------------------------------------------------------- |
|         ref          | Uncoded (reference)                                               |
|        lp3k5         | Uncoded low-passed at 3.5 kHz (anchor)                            |
|         lp7k         | Uncoded low-passed at 7 kHz (anchor)                              |
@@ -526,6 +536,7 @@ The following values may be used for the `type` key of a condition:
|       mono_dmx       | Uncoded mono downmix                                              |
|         evs          | Coded with multi-stream EVS codec (**metadata not coded!**)       |
|         ivas         | Coded with IVAS codec                                             |
+|    ivas_combined     | Combined format coding with two IVAS instances (only OMASA, OSBA) |

### Configuration of conditions

+1 −0
@@ -49,6 +49,7 @@ SUPPORTED_CONDITIONS = {
    "esdru",
    "evs",
    "ivas",
    "ivas_combined",
    "mono_dmx",
    "spatial_distortion",
}
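Adding `"ivas_combined"` to `SUPPORTED_CONDITIONS` lets the new type pass validation, and the README table restricts it to OMASA and OSBA. The validation sketch below combines both rules. The set copies only the entries visible in this hunk. The prefix test assumes that format names begin with `OMASA` or `OSBA`, which is a hypothetical convention.

```python
# Partial copy: only the entries visible in the diff hunk above
VISIBLE_CONDITIONS = {"esdru", "evs", "ivas", "ivas_combined", "mono_dmx", "spatial_distortion"}


def check_condition(cond_type: str, in_fmt: str) -> None:
    """Reject unknown condition types and ivas_combined on other formats (sketch)."""
    if cond_type not in VISIBLE_CONDITIONS:
        raise ValueError(f"unsupported condition type: {cond_type}")
    # assumption: combined formats are named with an OMASA or OSBA prefix
    if cond_type == "ivas_combined" and not in_fmt.upper().startswith(("OMASA", "OSBA")):
        raise ValueError("ivas_combined is only defined for OMASA and OSBA input")
```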
+47 −38
@@ -41,13 +41,14 @@ from ivas_processing_scripts.audiotools.audiofile import read, write
from ivas_processing_scripts.processing.config import TestConfig
from ivas_processing_scripts.processing.evs import EVS
from ivas_processing_scripts.processing.ivas import IVAS, IVAS_rend
-from ivas_processing_scripts.processing.tx import get_tx_cfg
+from ivas_processing_scripts.processing.ivas_combined import IVASCombined
from ivas_processing_scripts.processing.postprocessing import Postprocessing
from ivas_processing_scripts.processing.preprocessing import Preprocessing
from ivas_processing_scripts.processing.preprocessing_2 import Preprocessing2
from ivas_processing_scripts.processing.processing_splitting_scaling import (
    Processing_splitting_scaling,
)
+from ivas_processing_scripts.processing.tx import get_tx_cfg
from ivas_processing_scripts.utils import get_abs_path, list_audio, parse_gain


@@ -63,39 +64,45 @@ def init_processing_chains(cfg: TestConfig) -> None:
    # other processing chains
    for cond_name, cond_cfg in cfg.conditions_to_generate.items():
        bitrates = cond_cfg.get("bitrates")
-        if bitrates is not None and len(bitrates) > 1:
-            multiple_bitrates_flag = True
+
+        if not bitrates:
+            # non coding condition
+            cfg.proc_chains.append(get_processing_chain(cond_name, cfg))
+        else:
+            # bitrates can be a value, list or list of lists
+            # for EVS, this is a per-channel bitrate
+            # for IVAS combined, this is a per-instance bitrate
+            # otherwise it is regular IVAS with multiple bitrate conditions
+            multiple_bitrates_flag = isinstance(bitrates, list) and len(bitrates) > 1
+
+            # pass the list for IVAS Combined, per-instance bitrate
+            if cond_cfg.get("type") == "ivas_combined":
+                cfg.proc_chains.append(
+                    get_processing_chain(
+                        cond_name,
+                        cfg,
+                        bitrates,
+                    )
+                )
            else:
-            multiple_bitrates_flag = False
-        if bitrates:
                for bitrate in bitrates:
-                # check if a list was specified
-                if isinstance(bitrate, list) and cond_name.startswith("ivas"):
                    # flatten the list of lists for IVAS
+                    if isinstance(bitrate, list) and cond_cfg.get("type") == "ivas":
+                        # flatten the list by adding multiple conditions
                        [
                            cfg.proc_chains.append(
                                get_processing_chain(
-                                cond_name,
-                                cond_cfg,
-                                extend_br,
-                                multiple_bitrates=multiple_bitrates_flag,
+                                    cond_name, cfg, extend_br, multiple_bitrates_flag
                                )
                            )
                            for extend_br in bitrate
                        ]
                    else:
-                    # otherwise pass the list; EVS will interpret as per-channel bitrate
+                        # EVS, interpreted as per-channel bitrate
                        cfg.proc_chains.append(
                            get_processing_chain(
-                            cond_name,
-                            cfg,
-                            bitrate,
-                            multiple_bitrates=multiple_bitrates_flag,
+                                cond_name, cfg, bitrate, multiple_bitrates_flag
                            )
                        )
-        else:
-            # non coding condition
-            cfg.proc_chains.append(get_processing_chain(cond_name, cfg))

    # list items in input directory
    cfg.items_list = list_audio(
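The rewritten loop in `init_processing_chains` handles three shapes of `bitrates`. The pure function below mirrors that branching and returns the bitrate argument for each processing chain the loop would create. `expand_bitrates` is a hypothetical name. The real loop calls `get_processing_chain` directly and passes `multiple_bitrates_flag` only in the per-bitrate branch.

```python
def expand_bitrates(cond_type: str, bitrates) -> list:
    """One entry per processing chain, mirroring the branching of the rewritten loop."""
    if not bitrates:
        return [None]  # non-coding condition: a single chain without a bitrate
    if cond_type == "ivas_combined":
        return [bitrates]  # whole list in one chain: one bitrate per IVAS instance
    chains = []
    for bitrate in bitrates:  # like the loop, this assumes `bitrates` is a list
        if isinstance(bitrate, list) and cond_type == "ivas":
            chains.extend(bitrate)  # IVAS list of lists: flattened into separate chains
        else:
            chains.append(bitrate)  # scalar, or an EVS per-channel list kept intact
    return chains


print(expand_bitrates("ivas", [[13200, 16400], 32000]))  # [13200, 16400, 32000]
```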
@@ -219,8 +226,7 @@ def get_processing_chain(
) -> dict:
    """Mapping from test configuration to condition and postprocessing keyword arguments"""
    name = f"{condition}"
-    if bitrate:
-        if multiple_bitrates:
+    if bitrate and multiple_bitrates:
        if isinstance(bitrate, list):
            name += f"_{sum(bitrate)}"
        else:
@@ -364,7 +370,7 @@ def get_processing_chain(
        )
        # update values to reflect decoder output
        tmp_in_fs = dec_cfg.get("fs", tmp_in_fs)
-    elif cond_cfg["type"] == "ivas":
+    elif cond_cfg["type"] == "ivas" or cond_cfg["type"] == "ivas_combined":
        cod_cfg = cond_cfg["cod"]
        dec_cfg = cond_cfg["dec"]

@@ -402,7 +408,7 @@ def get_processing_chain(
        if hasattr(cfg, "preprocessing_2"):
            preamble = cfg.preprocessing_2.get("preamble", 0)

-        # if the encoding format differs from the format after the preprocessing, add format conversion stuff
+        # if the encoding format differs from the format after the preprocessing, add format conversion
        if (cod_fmt := cod_cfg.get("fmt", tmp_in_fmt)) != tmp_in_fmt:
            chain["processes"].append(
                Postprocessing(
@@ -425,8 +431,11 @@ def get_processing_chain(
            cond_fmt.extend(tmp_out_fmt)
            tmp_out_fmt = tmp_out_fmt[0]

+        ivas_cls = IVAS
+        if cond_cfg["type"] == "ivas_combined":
+            ivas_cls = IVASCombined
        chain["processes"].append(
-            IVAS(
+            ivas_cls(
                {
                    "in_fmt": tmp_in_fmt,
                    "in_fs": tmp_in_fs,
+1 −1
@@ -230,7 +230,7 @@ class TestConfig:
                    raise KeyError(
                        f"The following key(s) must be specified for EVS: {REQUIRED_KEYS_EVS}"
                    )
-            elif type == "ivas":
+            elif type == "ivas" or type == "ivas_combined":
                merged_cfg = get_default_config_for_codecs("IVAS", codec_bin_extension)
                merge_dicts(merged_cfg, cond_cfg)
                cfg["conditions_to_generate"][cond_name] = merged_cfg
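`ivas_combined` conditions now receive the IVAS defaults before the user's keys are merged in. `merge_dicts` itself is not shown in this diff. The function below is a hypothetical recursive merge that illustrates the assumed effect: user values win, and nested dictionaries merge key by key. The default values are placeholders.

```python
import copy


def merge_into(base: dict, override: dict) -> None:
    """Recursively copy `override` into `base` in place (assumed merge semantics)."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge_into(base[key], value)
        else:
            base[key] = copy.deepcopy(value)


# placeholder defaults; only the merge behaviour matters here
defaults = {"cod": {"bin": "<encoder>", "opts": []}, "dec": {"bin": "<decoder>"}}
user_cond = {"type": "ivas_combined", "bitrates": [48000, 96000], "dec": {"fs": 48000}}
merged = copy.deepcopy(defaults)
merge_into(merged, user_cond)
print(merged["dec"])  # {'bin': '<decoder>', 'fs': 48000}
```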
+14 −16
@@ -54,13 +54,13 @@ from ivas_processing_scripts.utils import run, use_wine
class IVAS(Processing):
    def __init__(self, attrs):
        super().__init__(attrs)
-        self._validate()
        self.name = "ivas"
        self.in_fmt = audio.fromtype(self.in_fmt)
        self.out_fmt = audio.fromtype(self.out_fmt)
        if not hasattr(self, "dec_opts"):
            self.dec_opts = None
        self._use_wine = use_wine(self.use_windows_codec_binaries)
+        self._validate()

    def _validate(self):
        need_exe_suffix = (
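`IVAS.__init__` now calls `_validate()` last instead of right after `super().__init__`. The commit does not state why. A plausible reason is that validation reads attributes such as `_use_wine` that only exist once construction has finished. The toy class below is hypothetical, not the repository code. It shows the failure the old order would cause under that assumption.

```python
class Codec:
    """Toy wrapper; `_validate` reads an attribute that __init__ sets (hypothetical)."""

    def __init__(self, attrs: dict, validate_early: bool = False):
        for key, value in attrs.items():
            setattr(self, key, value)
        if validate_early:
            self._validate()  # old order: _use_wine does not exist yet
        self._use_wine = bool(attrs.get("use_windows_codec_binaries", False))
        if not validate_early:
            self._validate()  # new order: every attribute is already set

    def _validate(self):
        suffix = ".exe" if self._use_wine else ""
        self.exe = f"{self.bin}{suffix}"
```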
@@ -146,21 +146,19 @@ class IVAS(Processing):
        logger.debug(f"IVAS encoder {in_file} -> {bitstream}")

        # Only resample and convert if wav, otherwise supposed pcm to be sampled at self.in_fs
-        metadata_files = []
+        metadata_files = in_meta if in_meta else []

        # for MASA suppose that metadata file has same basename and location as input file
+        if not metadata_files:
            if isinstance(self.in_fmt, audio.MetadataAssistedSpatialAudio):
                md_file = in_file.parent / (in_file.name + ".met")
-            metadata_files.append(md_file)
+                metadata_files = [md_file]
            elif isinstance(self.in_fmt, (audio.ObjectBasedAudio, audio.OSBAAudio)):
                metadata_files = in_meta
            elif isinstance(self.in_fmt, audio.OMASAAudio):
                metadata_files = in_meta
-            # TODO treffehn: check and maybe change here and for masa
-            # if len(metadata_files) != number of ism channels plus one
-            # md_file = in_file.parent / (in_file.name + ".met")
-            # metadata_files.append(md_file)
-            pass
+                if len(metadata_files) != self.in_fmt.num_ism_channels + 1:
+                    md_file = in_file.parent / (in_file.name + ".met")
+                    metadata_files.append(md_file)

        # Support input file wav, pcm and txt (metadata iis)
        if in_file.suffix == ".wav":
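The metadata handling in this hunk now starts from the files passed in (`in_meta`). It only falls back to a `<input name>.met` file next to the input when no files were given. The sketch below shows that fallback with the format reduced to a plain string and `in_meta` normalised to a list. `resolve_metadata`, its parameters and the format labels are hypothetical stand-ins for the repository's `audio` format classes.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_metadata(in_file: Path, fmt_family: str, num_ism: int = 0,
                     in_meta: Optional[Sequence[Path]] = None) -> list:
    """Return metadata files for the encoder, falling back to '<input>.met' (sketch)."""
    files = list(in_meta) if in_meta else []
    sidecar = in_file.parent / (in_file.name + ".met")  # e.g. item.wav -> item.wav.met
    if not files:
        if fmt_family == "MASA":
            files = [sidecar]
        elif fmt_family == "OMASA" and len(files) != num_ism + 1:
            # as in the diff, this check is only reached when no files were passed;
            # ISM metadata plus one MASA .met file are expected in total
            files.append(sidecar)
    return files
```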