merged main (68381df7) · Commits · IVAS Codec Public Collaboration / IVAS Processing Scripts

README.md

+19 −15

		@@ -193,8 +193,8 @@ input:
		# preprocessing:
		### Target format used in rendering from input format; default = null (no rendering)
		# fmt: "7_1_4"
		### Flag for application of 50Hz high-pass filter; default = false
		# hp50: true
		### Define mask (HP50 or 20KBP) for input signal filtering; default = null
		# mask: "HP50"
		### Target sampling rate in Hz for resampling; default = null (no resampling)
		# fs: 16000
		### Target loudness in LKFS; default = null (no loudness change applied)
		@@ -386,6 +386,8 @@ conditions_to_generate:
		bitrates:
		# - 9600
		- [13200, 13200, 8000, 13200, 9600]
		### for multi-channel configs, code LFE with 9.6 kbps NB (as mandated by IVAS-3)
		evs_lfe_9k6bps_nb: true
		cod:
		### Path to encoder binary; default search for EVS_cod in bin folder (primary) and PATH (secondary)
		bin: ~/git/ivas-codec/EVS_cod
		@@ -468,6 +470,7 @@ This configuration has to match the channel configuration. If the provided list
		For the encoding stage `cod` and the decoding stage `dec`, the path to the IVAS_cod and IVAS_dec binaries can be specified under the key `bin`.
		Additionally some resampling can be applied by using the key `fs` followed by the desired sampling rate.
		The general bitstream processing configuration can be locally overwritten for each EVS and IVAS condition with the key `tx`.
		The additional key `evs_lfe_9k6bps_nb` is only available for EVS conditions and ensures a bitrate of 9.6 kbps and narrowband (NB) processing of the LFE channel(s).
		#### IVAS
		The configuration of the IVAS condition is similar to the EVS condition. However, only one bitrate for all channels (and metadata) can be specified.
		In addition to that, the encoder and decoder take some additional arguments defined by the key `opts`.
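The effect of `evs_lfe_9k6bps_nb` on a per-channel EVS bitrate list can be pictured with a small sketch. This is not the repository's implementation: the helper name `per_channel_settings`, the `lfe_indices` argument, and the choice of channel index 3 as the LFE in the usage example are assumptions made purely for illustration.

```python
from typing import Iterable, List, Optional, Tuple

LFE_BITRATE = 9600     # 9.6 kbps, as stated for evs_lfe_9k6bps_nb
LFE_BANDWIDTH = "NB"   # narrowband


def per_channel_settings(
    bitrates: List[int],
    lfe_indices: Iterable[int],
    evs_lfe_9k6bps_nb: bool,
) -> List[Tuple[int, Optional[str]]]:
    """Return one (bitrate, forced_bandwidth) pair per channel.

    Hypothetical helper: when the flag is set, LFE channels are forced to
    9.6 kbps NB; all other channels keep their configured bitrate and no
    forced bandwidth (None).
    """
    lfe = set(lfe_indices)
    settings = []
    for channel, bitrate in enumerate(bitrates):
        if evs_lfe_9k6bps_nb and channel in lfe:
            settings.append((LFE_BITRATE, LFE_BANDWIDTH))
        else:
            settings.append((bitrate, None))
    return settings


# Usage with the bitrate list from the template (LFE index 3 is assumed)
example = per_channel_settings([13200, 13200, 8000, 13200, 9600], [3], True)
```

With the flag disabled, the configured bitrate list is passed through unchanged.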
		@@ -508,20 +511,21 @@ The processing chain is as follows:
		The following additional executables are needed for the different processing steps:

		\| Processing step \| Executable \| Where to find \|
		\|---------------------------------\|-----------------------\|-------------------------------------------------------------------------------------------------------------\|
		\| Loudness adjustment \| bs1770demo \| https://github.com/openitu/STL \|
		\|-------------------------------------------------\|-----------------------\|-------------------------------------------------------------------------------------------------------------\|
		\| Loudness measurement and adjustment \| bs1770demo \| https://github.com/openitu/STL \|
		\| MNRU \| p50fbmnru \| https://github.com/openitu/STL \|
		\| ESDRU \| esdru \| https://github.com/openitu/STL \|
		\| Frame error pattern application \| eid-xor \| https://github.com/openitu/STL \|
		\| Error pattern generation \| gen-patt \| https://www.itu.int/rec/T-REC-G.191-201003-S/en (Note: Version in https://github.com/openitu/STL is buggy!) \|
		\| Filtering, Resampling \| filter \| https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip \|
		\| Random offset/seed generation \| random \| https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip \|
		\| Random offset/seed generation (necessary for background noise and FER bitstream processing) \| random \| https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip \|
		\| JBM network simulator \| networkSimulator_g192 \| https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip \|
		\| MASA rendering \| masaRenderer \| https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip \|
		\| MASA rendering (also used in loudness measurement of MASA items) \| masaRenderer \| https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip \|
		\| EVS reference conditions \| EVS_cod, EVS_dec \| https://www.3gpp.org/ftp/Specs/archive/26_series/26.443/26443-h00.zip \|

		The necessary binaries have to be placed in the [ivas_processing_scripts/bin](./ivas_processing_scripts/bin) folder.
		For most of the tools it is sufficient to copy the binaries while it is necessary to add some additional files for the MASA renderer.
		The necessary binaries must either be placed in the [ivas_processing_scripts/bin](./ivas_processing_scripts/bin) folder, or their paths must be specified in
		[ivas_processing_scripts/binary_paths.yml](./ivas_processing_scripts/binary_paths.yml).
		For most tools it is sufficient to copy the binaries; the MASA renderer additionally requires its associated *.bin files.
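A lookup along these lines could be sketched as follows. The function name `find_binary`, the `configured_paths` mapping (standing in for the parsed contents of `binary_paths.yml`), and the search order (configured path first, then the bin folder, then `PATH`) are assumptions for illustration; the scripts may resolve binaries differently.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_binary(
    name: str,
    bin_dir: Path,
    configured_paths: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Locate an executable by name; return None if it cannot be found.

    Hypothetical search order:
    1. an explicit path from a configuration mapping,
    2. the bin folder,
    3. the system PATH.
    """
    configured_paths = configured_paths or {}

    # 1. explicitly configured path
    if name in configured_paths:
        candidate = Path(configured_paths[name]).expanduser()
        if candidate.is_file():
            return candidate

    # 2. bin folder
    candidate = bin_dir / name
    if candidate.is_file():
        return candidate

    # 3. system PATH
    found = shutil.which(name)
    return Path(found) if found else None
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing tool is fatal for the requested processing step.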

		---

examples/TEMPLATE.yml

+6 −2

		@@ -64,8 +64,8 @@ input:
		# preprocessing:
		### Target format used in rendering from input format; default = null (no rendering)
		# fmt: "7_1_4"
		### Flag for application of 50Hz high-pass filter; default = false
		# hp50: true
		### Define mask (HP50 or 20KBP) for input signal filtering; default = null
		# mask: "HP50"
		### Target sampling rate in Hz for resampling; default = null (no resampling)
		# fs: 16000
		### Target loudness in LKFS; default = null (no loudness change applied)
		@@ -247,11 +247,15 @@ conditions_to_generate:
		bitrates:
		# - 9600
		- [13200, 13200, 8000, 13200, 9600]
		### for multi-channel configs, code LFE with 9.6 kbps NB (as mandated by IVAS-3)
		evs_lfe_9k6bps_nb: true
		### Encoder options
		cod:
		### Path to encoder binary; default search for EVS_cod in bin folder (primary) and PATH (secondary)
		bin: EVS_cod
		### Encoder input sampling rate in Hz (resampling performed in case of mismatch); default = null (no resampling)
		# fs: 32000
		### Decoder options
		dec:
		### Path to decoder binary; default search for EVS_dec in bin folder (primary) and PATH (secondary)
		bin: EVS_dec

ivas_processing_scripts/audiotools/__init__.py

+5 −4

		@@ -79,10 +79,11 @@ def add_processing_args(group, input=True):
		default=None,
		)
		group.add_argument(
		f"-{ps}hp",
		f"--{p}_hp50",
		help="Apply 50 Hz high-pass filtering (default = %(default)s)",
		action="store_true",
		f"-{ps}mk",
		f"--{p}_mask",
		type=str,
		help="Apply filtering with mask (HP50, 20KBP or None; default = %(default)s)",
		default=None,
		)
		group.add_argument(
		f"-{ps}w",
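As a standalone illustration of the new option, the following sketch builds a parser with a string-valued mask argument. The concrete option strings `-imk` and `--in_mask` assume that `ps` and `p` expand to `i` and `in` for the input group, and the `choices` restriction is an addition for illustration that the diff itself does not show.

```python
import argparse

# Mask names taken from the help text in the diff
ALLOWED_MASKS = ("HP50", "20KBP")


def build_parser() -> argparse.ArgumentParser:
    """Build a minimal parser that exposes only the mask option."""
    parser = argparse.ArgumentParser(description="Mask option sketch")
    parser.add_argument(
        "-imk",
        "--in_mask",
        type=str,
        choices=ALLOWED_MASKS,  # assumption: reject unknown mask names early
        default=None,
        help="Apply filtering with mask (HP50, 20KBP or None; default = %(default)s)",
    )
    return parser


args = build_parser().parse_args(["--in_mask", "HP50"])
```

Unlike the former `store_true` flag, the option now takes a value, so existing command lines that passed the high-pass flag without an argument need to be updated.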

ivas_processing_scripts/audiotools/constants.py

+1 −0

		@@ -692,6 +692,7 @@ DELAY_COMPENSATION_FOR_FILTERING = {
		"down": 145,
		},
		"MSIN": 92,
		"20KBP": 200,
		"LP1p5": 322,
		"LP35": 232,
		"LP7": 117,
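How such a table entry might be used can be sketched as below. This assumes that the values are filter delays expressed in samples and that compensation means discarding the leading samples and zero-padding the end; neither is confirmed by the diff, and `compensate_delay` is a hypothetical helper, not a function from the scripts.

```python
from typing import Dict, List

# Subset of the entries visible in the diff; the unit (samples) is an assumption
DELAYS: Dict[str, int] = {
    "MSIN": 92,
    "20KBP": 200,
    "LP1p5": 322,
    "LP35": 232,
    "LP7": 117,
}


def compensate_delay(samples: List[float], delay: int) -> List[float]:
    """Shift a signal forward by `delay` samples, keeping its length.

    The first `delay` samples are dropped and the same number of zeros
    is appended at the end.
    """
    if delay <= 0:
        return list(samples)
    kept = samples[delay:]
    return kept + [0.0] * (len(samples) - len(kept))


# Usage: a 300-sample ramp shifted by the assumed 20KBP delay
shifted = compensate_delay([float(n) for n in range(300)], DELAYS["20KBP"])
```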

ivas_processing_scripts/audiotools/convert/__init__.py

+10 −10

		@@ -43,8 +43,8 @@ from ivas_processing_scripts.audiotools.convert.scenebased import convert_sceneb
		from ivas_processing_scripts.audiotools.wrappers.bs1770 import loudness_norm
		from ivas_processing_scripts.audiotools.wrappers.esdru import esdru
		from ivas_processing_scripts.audiotools.wrappers.filter import (
		hp50filter_itu,
		lpfilter_itu,
		maskfilter_itu,
		resample_itu,
		)
		from ivas_processing_scripts.audiotools.wrappers.p50fbmnru import p50fbmnru
		@@ -133,7 +133,7 @@ def convert(
		in_delay: Optional[float] = None,
		in_fs: Optional[int] = None,
		in_cutoff: Optional[int] = None,
		in_hp50: Optional[bool] = None,
		in_mask: Optional[str] = None,
		in_window: Optional[list] = None,
		in_loudness: Optional[float] = None,
		in_loudness_fmt: Optional[str] = None,
		@@ -142,7 +142,7 @@ def convert(
		out_delay: Optional[float] = None,
		out_fs: Optional[int] = None,
		out_cutoff: Optional[int] = None,
		out_hp50: Optional[bool] = None,
		out_mask: Optional[str] = None,
		out_window: Optional[list] = None,
		out_loudness: Optional[float] = None,
		out_loudness_fmt: Optional[str] = None,
		@@ -162,7 +162,7 @@ def convert(
		delay=in_delay,
		fs=in_fs,
		fc=in_cutoff,
		hp50=in_hp50,
		mask=in_mask,
		window=in_window,
		loudness=in_loudness,
		loudness_fmt=in_loudness_fmt,
		@@ -180,7 +180,7 @@ def convert(
		delay=out_delay,
		fs=out_fs,
		fc=out_cutoff,
		hp50=out_hp50,
		mask=out_mask,
		window=out_window,
		loudness=out_loudness,
		loudness_fmt=out_loudness_fmt,
		@@ -198,7 +198,7 @@ def process_audio(
		delay: Optional[float] = None,
		fs: Optional[int] = None,
		fc: Optional[int] = None,
		hp50: Optional[bool] = False,
		mask: Optional[str] = None,
		window: Optional[float] = None,
		loudness: Optional[float] = None,
		loudness_fmt: Optional[str] = None,
		@@ -232,11 +232,11 @@ def process_audio(
		logger.debug(f"Windowing audio with {window} ms Hann window")
		x.audio = audioarray.window(x.audio, x.fs, window)

		"""high-pass (50 Hz) filtering"""
		if hp50:
		"""mask filtering"""
		if mask is not None:
		if logger:
		logger.debug("Applying 50 Hz high-pass filter using ITU STL filter")
		x.audio = hp50filter_itu(x)
		logger.debug("Applying mask filter using ITU STL filter")
		x.audio = maskfilter_itu(x, mask)

		"""resampling"""
		if x.fs != fs:
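Because the boolean `hp50` parameter is replaced by the string-valued `mask`, callers and configurations that still carry the old flag need a mapping onto the new one. A minimal sketch of such a mapping follows; the helper `resolve_mask` is hypothetical and not part of this change, and the scripts may not offer any backward compatibility of this kind.

```python
from typing import Optional

# Mask names mentioned in the README and template comments
VALID_MASKS = {"HP50", "20KBP"}


def resolve_mask(mask: Optional[str] = None, hp50: Optional[bool] = None) -> Optional[str]:
    """Translate the old boolean flag into the new mask name.

    An explicit `mask` wins; otherwise `hp50=True` maps to "HP50" and
    anything else means that no mask filtering is applied (None).
    """
    if mask is not None:
        if mask not in VALID_MASKS:
            raise ValueError(f"Unknown mask {mask!r}; expected one of {sorted(VALID_MASKS)}")
        return mask
    return "HP50" if hp50 else None


# Usage: the old setting `hp50: true` corresponds to `mask: "HP50"`
legacy = resolve_mask(hp50=True)
```

The result can then be passed on as the `mask` argument, which triggers filtering only when it is not `None`.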