Commit e499d545 authored by Archit Tamarapu

Merge branch '82-characterization-add-support-for-discretely-coding-combined-formats-with-ivas' into 'main'

Resolve "[characterization] Add support for discretely coding combined formats with IVAS"

See merge request !173
parents 994731cb 8ce339a5
+42 −31
<!---
****<!---

   (C) 2022-2025 IVAS codec Public Collaboration with portions copyright Dolby International AB, Ericsson AB,
   Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Huawei Technologies Co. LTD.,
@@ -67,16 +67,16 @@ To facilitate the preparation of items for P800-{X} listening tests, it is possi

The YAML configuration file (`scene_description_config_file.yml`) defines how individual mono files should be spatially positioned and combined into the target format. For advanced formats like OMASA or OSBA, note that additional SBA items may be required. Refer to the `examples/` folder for template `.yml` files demonstrating the expected structure and usage.

Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under *Filename conventions*.
Relative paths are resolved from the working directory (not the YAML file location). Use absolute paths if you're unsure. Avoid using dots `.` in file names (e.g., use `item_xxa3s1.wav`, not `item.xx.a3s1.wav`). Windows users: Use double backslashes `\\` and add `.exe` to executables if needed. Input and output files follow structured naming conventions to encode metadata like lab, language, speaker ID, etc. These are explained in detail in the file under _Filename conventions_.

Each entry under `scenes:` describes one test item, specifying:

* `output`: output file name
* `description`: human-readable description
* `input`: list of mono `.wav` files
* `azimuth` / `elevation`: spatial placement (°)
* `level`: loudness in dB
* `shift`: timing offsets in seconds
- `output`: output file name
- `description`: human-readable description
- `input`: list of mono `.wav` files
- `azimuth` / `elevation`: spatial placement (°)
- `level`: loudness in dB
- `shift`: timing offsets in seconds

Dynamic positioning (e.g., `"-20:1.0:360"`) means the source will move over time, stepping every 20 ms.

@@ -271,6 +271,10 @@ input:
  # fmt: "7_1_4"
  ### Define mask (HP50 or 20KBP) for input signal filtering; default = null
  # mask: "HP50"
  ### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
  # gain_pre: 10 dB
  ### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
  # gain_post: 3.1622776602
  ### Target sampling rate in Hz for resampling; default = null (no resampling)
  # fs: 16000
  ### Target loudness in LKFS; default = null (no loudness change applied)
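The `gain_pre` / `gain_post` comments above say a gain may be given either as a linear factor or with a dB suffix. A minimal sketch of that conversion follows, assuming the usual amplitude convention (factor = 10^(dB/20)), which matches the two example values in the template (`10 dB` and `3.1622776602`). The helper name `gain_to_linear` is hypothetical and is not the project's `parse_gain`, whose implementation is not shown in this diff.

```python
def gain_to_linear(value):
    """Hypothetical helper: turn a gain setting into a linear amplitude factor.

    Accepts a number (taken as already linear) or a string that may carry
    a "dB" suffix, e.g. "10 dB". Assumes amplitude dB: 10 ** (dB / 20).
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    if text.lower().endswith("db"):
        # strip the two-character suffix, then convert dB to a linear factor
        return 10.0 ** (float(text[:-2]) / 20.0)
    return float(text)


print(gain_to_linear("10 dB"))
print(gain_to_linear(3.1622776602))
```

Under that assumption, both calls print approximately the same factor, which is why the template can show either form.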
@@ -373,6 +377,8 @@ input:
###     mono_dmx        generate mono downmix condition
###     evs             generate an EVS coded condition (see below examples for additional required keys)
###     ivas            generate an IVAS coded condition (see below examples for additional required keys)
###     ivas_combined   generate a combined-format IVAS coded condition using two IVAS instances for each part
###                     (see below examples for additional required keys)
conditions_to_generate:
  ### Reference and anchor conditions ##########################
  c01:
@@ -401,7 +407,7 @@ conditions_to_generate:
  c06:
    ### REQUIRED: type of condition
    type: ivas
    ### REQUIRED: Bitrates to use for coding
    ### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
    bitrates:
      - 160000
      # - 32000
@@ -430,7 +436,7 @@ conditions_to_generate:
  c07:
    ### REQUIRED: type of condition
    type: ivas
    ### REQUIRED: Bitrates to use for coding
    ### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
    bitrates:
      - 160000
      # - 32000
@@ -492,6 +498,10 @@ postprocessing:
  fmt: "BINAURAL"
  ### REQUIRED: Target sampling rate in Hz for resampling; default = null (no resampling)
  fs: 48000
  ### Gain factor to be applied BEFORE any other processing (linear, or add dB suffix)
  # gain_pre: 10 dB
  ### Gain factor to be applied AFTER any other processing (linear, or add dB suffix)
  # gain_post: 3.1622776602
  ### Low-pass cut-off frequency in Hz; default = null (no filtering)
  # lp_cutoff: 24000
  ### Target loudness in LKFS; default = null (no loudness change applied)
@@ -519,7 +529,7 @@ postprocessing:
The following values may be used for the `type` key of a condition:

| Supported conditions | Description                                                       |
| :------------------: | ----------------------------------------------------------- |
| :------------------: | ----------------------------------------------------------------- |
|         ref          | Uncoded (reference)                                               |
|        lp3k5         | Uncoded low-passed at 3.5 kHz (anchor)                            |
|         lp7k         | Uncoded low-passed at 7 kHz (anchor)                              |
@@ -528,6 +538,7 @@ The following values may be used for the `type` key of a condition:
|       mono_dmx       | Uncoded mono downmix                                              |
|         evs          | Coded with multi-stream EVS codec (**metadata not coded!**)       |
|         ivas         | Coded with IVAS codec                                             |
|    ivas_combined     | Combined format coding with two IVAS instances (only OMASA, OSBA) |

### Configuration of conditions

+13 −11
@@ -154,7 +154,7 @@ input:
    ### REQUIRED: either error_pattern (and errpatt_late_loss_rate or errpatt_delay) or error_profile
    ### delay error profile file
    # error_pattern: ".../dly_error_profile.dat"
    ### Late loss rate in precent or EVS
    ### Late loss rate in percent for EVS
    # errpatt_late_loss_rate: 1
    ### Constant JBM delay in milliseconds for EVS
    # errpatt_delay: 200
@@ -186,6 +186,8 @@ input:
###     mono_dmx        generate mono downmix condition
###     evs             generate an EVS coded condition (see below examples for additional required keys)
###     ivas            generate an IVAS coded condition (see below examples for additional required keys)
###     ivas_combined   generate a combined-format IVAS coded condition using two IVAS instances for each part
###                     (see below examples for additional required keys)
conditions_to_generate:
  ### Reference and anchor conditions ##########################
  c01:
@@ -226,7 +228,7 @@ conditions_to_generate:
  c06:
      ### REQUIRED: type of condition
      type: ivas
      ### REQUIRED: Bitrates to use for coding
      ### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
      bitrates:
          - 160000
          # - 32000
@@ -264,7 +266,7 @@ conditions_to_generate:
  c07:
      ### REQUIRED: type of condition
      type: ivas
      ### REQUIRED: Bitrates to use for coding
      ### REQUIRED: Bitrates to use for coding, for ivas_combined, first and second bitrates are for objects and spatial parts respectively
      bitrates:
          - 160000
          # - 32000
+1 −0
@@ -141,6 +141,7 @@ def render_oba_to_binaural(

        # sum results over all objects
        bin.audio = np.sum(np.stack(result, axis=2), axis=2)
        bin.fs = 48000

        # compensate delay from binaural dataset
        bin.audio = delay(bin.audio, bin.fs, -latency_smp, samples=True)
+1 −0
@@ -49,6 +49,7 @@ SUPPORTED_CONDITIONS = {
    "esdru",
    "evs",
    "ivas",
    "ivas_combined",
    "mono_dmx",
    "spatial_distortion",
}
+57 −110
@@ -41,12 +41,14 @@ from ivas_processing_scripts.audiotools.audiofile import read, write
from ivas_processing_scripts.processing.config import TestConfig
from ivas_processing_scripts.processing.evs import EVS
from ivas_processing_scripts.processing.ivas import IVAS, IVAS_rend
from ivas_processing_scripts.processing.ivas_combined import IVASCombined
from ivas_processing_scripts.processing.postprocessing import Postprocessing
from ivas_processing_scripts.processing.preprocessing import Preprocessing
from ivas_processing_scripts.processing.preprocessing_2 import Preprocessing2
from ivas_processing_scripts.processing.processing_splitting_scaling import (
    Processing_splitting_scaling,
)
from ivas_processing_scripts.processing.tx import get_tx_cfg
from ivas_processing_scripts.utils import get_abs_path, list_audio, parse_gain


@@ -62,39 +64,45 @@ def init_processing_chains(cfg: TestConfig) -> None:
    # other processing chains
    for cond_name, cond_cfg in cfg.conditions_to_generate.items():
        bitrates = cond_cfg.get("bitrates")
        if bitrates is not None and len(bitrates) > 1:
            multiple_bitrates_flag = True

        if not bitrates:
            # non coding condition
            cfg.proc_chains.append(get_processing_chain(cond_name, cfg))
        else:
            # bitrates can be a value, list or list of lists
            # for EVS, this is a per-channel bitrate
            # for IVAS combined, this is a per-instance bitrate
            # otherwise it is regular IVAS with multiple bitrate conditions
            multiple_bitrates_flag = isinstance(bitrates, list) and len(bitrates) > 1

            # pass the list for IVAS Combined, per-instance bitrate
            if cond_cfg.get("type") == "ivas_combined":
                cfg.proc_chains.append(
                    get_processing_chain(
                        cond_name,
                        cfg,
                        bitrates,
                    )
                )
            else:
            multiple_bitrates_flag = False
        if bitrates:
                for bitrate in bitrates:
                # check if a list was specified
                if isinstance(bitrate, list) and cond_name.startswith("ivas"):
                    # flatten the list of lists for IVAS
                    if isinstance(bitrate, list) and cond_cfg.get("type") == "ivas":
                        # flatten the list by adding multiple conditions
                        [
                            cfg.proc_chains.append(
                                get_processing_chain(
                                cond_name,
                                cond_cfg,
                                extend_br,
                                multiple_bitrates=multiple_bitrates_flag,
                                    cond_name, cfg, extend_br, multiple_bitrates_flag
                                )
                            )
                            for extend_br in bitrate
                        ]
                    else:
                    # otherwise pass the list; EVS will interpret as per-channel bitrate
                        # EVS, interpreted as per-channel bitrate
                        cfg.proc_chains.append(
                            get_processing_chain(
                            cond_name,
                            cfg,
                            bitrate,
                            multiple_bitrates=multiple_bitrates_flag,
                                cond_name, cfg, bitrate, multiple_bitrates_flag
                            )
                        )
        else:
            # non coding condition
            cfg.proc_chains.append(get_processing_chain(cond_name, cfg))

    # list items in input directory
    cfg.items_list = list_audio(
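Because this page shows the hunk without +/- markers, the new control flow of `init_processing_chains` is hard to read. The sketch below restates how the added lines appear to expand `bitrates` into processing chains. It is a standalone illustration, not the project's code: `expand_conditions` is a hypothetical name, each chain is reduced to a `(name, bitrate, multiple_bitrates)` tuple, and the flag for `ivas_combined` is assumed to fall back to `False` because the call in the diff does not pass it.

```python
def expand_conditions(conditions):
    """Hypothetical sketch: one (name, bitrate, multiple_bitrates) tuple per chain."""
    chains = []
    for name, cond in conditions.items():
        bitrates = cond.get("bitrates")
        if not bitrates:
            # non-coding condition: a single chain without a bitrate
            chains.append((name, None, False))
            continue

        multiple = isinstance(bitrates, list) and len(bitrates) > 1

        if cond.get("type") == "ivas_combined":
            # the whole list goes to one chain (per-instance bitrates);
            # the flag is assumed to stay at its default here
            chains.append((name, bitrates, False))
            continue

        for bitrate in bitrates:
            if isinstance(bitrate, list) and cond.get("type") == "ivas":
                # a nested list for IVAS becomes one chain per entry
                chains.extend((name, br, multiple) for br in bitrate)
            else:
                # a nested list for EVS stays whole (per-channel bitrates)
                chains.append((name, bitrate, multiple))
    return chains


# Illustrative values only; they are not taken from the repository's examples.
example = {
    "c01": {"type": "ref"},
    "c06": {"type": "ivas", "bitrates": [160000, 32000]},
    "c07": {"type": "ivas_combined", "bitrates": [32000, 64000]},
}
for chain in expand_conditions(example):
    print(chain)
```

The point of the sketch is the asymmetry: a two-element list yields two chains for `ivas` but a single chain for `ivas_combined`, where the first and second entries are the object and spatial bitrates.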
@@ -218,8 +226,7 @@ def get_processing_chain(
) -> dict:
    """Mapping from test configuration to condition and postprocessing keyword arguments"""
    name = f"{condition}"
    if bitrate:
        if multiple_bitrates:
    if bitrate and multiple_bitrates:
        if isinstance(bitrate, list):
            name += f"_{sum(bitrate)}"
        else:
@@ -312,52 +319,17 @@ def get_processing_chain(
        evs_lfe_9k6bps_nb = cond_cfg.get("evs_lfe_9k6bps_nb", None)

        # Frame error pattern bitstream modification
        tx_cfg = None
        if "tx" in cond_cfg.keys() or hasattr(cfg, "tx"):
            # postprocess also signal without error if there is loudness scaling
            if post_cfg.get("loudness"):
                tx_condition = True
            # local specification overwrites global one
            if "tx" in cond_cfg.keys():
                tx_cfg_tmp = cond_cfg["tx"]
            else:
                tx_cfg_tmp = cfg.tx

            if tx_cfg_tmp.get("type", None) == "FER":
                tx_cfg = {
                    "type": tx_cfg_tmp.get("type", None),
                    "error_pattern": get_abs_path(
                        tx_cfg_tmp.get("error_pattern", None)
                    ),
                    "error_rate": tx_cfg_tmp.get("error_rate", None),
                    "master_seed": cfg.master_seed,
                    "prerun_seed": cfg.prerun_seed,
                }
            elif tx_cfg_tmp.get("type", None) == "JBM":
                tx_cfg = {
                    "type": tx_cfg_tmp.get("type", None),
                    "error_pattern": get_abs_path(
                        tx_cfg_tmp.get("error_pattern", None)
                    ),
                    "errpatt_late_loss_rate": tx_cfg_tmp.get(
                        "errpatt_late_loss_rate", None
                    ),
                    "errpatt_delay": tx_cfg_tmp.get("errpatt_delay", None),
                    "errpatt_seed": tx_cfg_tmp.get("errpatt_seed", None),
                    "error_profile": tx_cfg_tmp.get("error_profile", None),
                    "n_frames_per_packet": tx_cfg_tmp.get("n_frames_per_packet", None),
                    "master_seed": cfg.master_seed,
                }
            else:
                raise ValueError(
                    "Type of bitstream procesing either missing or not valid"
                )
        else:
            tx_cfg = None
            tx_cfg = get_tx_cfg(cfg, cond_cfg, is_EVS=True)

        preamble = 0
        if hasattr(cfg, "preprocessing_2"):
            preamble = cfg.preprocessing_2.get("preamble", 0)
        else:
            preamble = 0

        # if the encoding format differs from the format after the preprocessing, add format conversion stuff
        if (cod_fmt := cod_cfg.get("fmt", tmp_in_fmt)) != tmp_in_fmt:
@@ -398,7 +370,7 @@ def get_processing_chain(
        )
        # update values to reflect decoder output
        tmp_in_fs = dec_cfg.get("fs", tmp_in_fs)
    elif cond_cfg["type"] == "ivas":
    elif cond_cfg["type"] == "ivas" or cond_cfg["type"] == "ivas_combined":
        cod_cfg = cond_cfg["cod"]
        dec_cfg = cond_cfg["dec"]

@@ -423,49 +395,20 @@ def get_processing_chain(
            )

        # Frame error pattern bitstream modification
        tx_cfg = None
        if "tx" in cond_cfg.keys() or hasattr(cfg, "tx"):
            # postprocess also signal without error if there is loudness scaling
            if post_cfg.get("loudness"):
                tx_condition = True
            # local specification overwrites global one
            if "tx" in cond_cfg.keys():
                tx_cfg_tmp = cond_cfg["tx"]
            else:
                tx_cfg_tmp = cfg.tx

            if tx_cfg_tmp.get("type", None) == "FER":
                tx_cfg = {
                    "type": tx_cfg_tmp.get("type", None),
                    "error_pattern": get_abs_path(
                        tx_cfg_tmp.get("error_pattern", None)
                    ),
                    "error_rate": tx_cfg_tmp.get("error_rate", None),
                    "master_seed": cfg.master_seed,
                    "prerun_seed": cfg.prerun_seed,
                }
            elif tx_cfg_tmp.get("type", None) == "JBM":
                tx_cfg = {
                    "type": tx_cfg_tmp.get("type", None),
                    "error_pattern": tx_cfg_tmp.get("error_pattern", None),
                    "error_profile": tx_cfg_tmp.get("error_profile", None),
                    "n_frames_per_packet": tx_cfg_tmp.get("n_frames_per_packet", None),
                    "master_seed": cfg.master_seed,
                    "errpatt_seed": tx_cfg_tmp.get("errpatt_seed", None),
                }
                ivas_jbm = True
            else:
                raise ValueError(
                    "Type of bitstream procesing either missing or not valid"
                )
        else:
            tx_cfg = None
            tx_cfg = get_tx_cfg(cfg, cond_cfg)
            ivas_jbm = tx_cfg.get("type") == "JBM"

        preamble = 0
        if hasattr(cfg, "preprocessing_2"):
            preamble = cfg.preprocessing_2.get("preamble", 0)
        else:
            preamble = 0

        # if the encoding format differs from the format after the preprocessing, add format conversion stuff
        # if the encoding format differs from the format after the preprocessing, add format conversion
        if (cod_fmt := cod_cfg.get("fmt", tmp_in_fmt)) != tmp_in_fmt:
            chain["processes"].append(
                Postprocessing(
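Both removed blocks above apply the same rule before building the `tx` dictionary: a condition-level `tx` entry overrides the global one, and only the types `FER` and `JBM` are accepted. The sketch below isolates that rule, as a rough guide to what the refactoring consolidates. It is not the project's `get_tx_cfg`, whose body is not part of this diff. The name `resolve_tx_settings` and its plain-dictionary arguments are assumptions, and the per-type key selection and path resolution from the removed code are left out.

```python
def resolve_tx_settings(global_tx, cond_cfg):
    """Hypothetical stand-in showing local-over-global resolution of "tx"."""
    # a condition-level specification overrides the global one
    tx = cond_cfg["tx"] if "tx" in cond_cfg else global_tx
    if tx is None:
        return None
    if tx.get("type") not in ("FER", "JBM"):
        raise ValueError("Type of bitstream processing either missing or not valid")
    # return a copy so callers cannot mutate the configuration in place
    return dict(tx)


# Illustrative values only.
print(resolve_tx_settings({"type": "FER", "error_rate": 5}, {"tx": {"type": "JBM"}}))
print(resolve_tx_settings({"type": "FER", "error_rate": 5}, {}))
```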
@@ -488,8 +431,11 @@ def get_processing_chain(
            cond_fmt.extend(tmp_out_fmt)
            tmp_out_fmt = tmp_out_fmt[0]

        ivas_cls = IVAS
        if cond_cfg["type"] == "ivas_combined":
            ivas_cls = IVASCombined
        chain["processes"].append(
            IVAS(
            ivas_cls(
                {
                    "in_fmt": tmp_in_fmt,
                    "in_fs": tmp_in_fs,
@@ -592,6 +538,7 @@ def get_processing_chain(
            }
        )
    )

    # add splitting and scaling for all conditions
    chain["processes"].append(
        Processing_splitting_scaling(