Merge branch '75-sba-to-binaural_room-rendering-does-not-use-custom-brirs' of... (4a5cf87f) · Commits · IVAS Codec Public Collaboration / IVAS Processing Scripts

README.md

+283 −267

Original line number	Diff line number	Diff line
		@@ -59,28 +59,28 @@ item generation and item processing. The two steps can be applied independent of

		### Item generation

		To set up the P800-{X} listening test (X = 1, 2, ...9) copy your mono input files to 'experiments/selection/P800-{X}/gen_input/items_mono'.
		These files have to follow the naming scheme '{l}{LL}p0{X}{name_of_item}' where 'l' stands for the listening lab designator: a (Force Technology),
		To set up the P800-{X} listening test (X = 1, 2, ...9) copy your mono input files to `experiments/selection/P800-{X}/gen_input/items_mono`.
		These files have to follow the naming scheme `{l}{LL}p0{X}{name_of_item}` where 'l' stands for the listening lab designator: a (Force Technology),
		b (HEAD acoustics), c (MQ University), d (Mesaqin.com), and 'LL' stands for the language: EN, GE, JP, MA, DK, FR.
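The naming scheme above can be checked mechanically before files are copied. The following is a minimal sketch, not part of the repository: `parse_item_name` is a hypothetical helper, the example file name is made up, and it is an assumption that the file extension does not belong to `{name_of_item}`.

```python
import re
from pathlib import Path

# Assumed reading of the scheme {l}{LL}p0{X}{name_of_item}:
#   l  = lab designator a-d, LL = language code, X = experiment number 1-9
ITEM_NAME = re.compile(
    r"^(?P<lab>[a-d])(?P<lang>EN|GE|JP|MA|DK|FR)p0(?P<exp>[1-9])(?P<item>.+)$"
)


def parse_item_name(filename):
    """Return the parts of an item file name, or None if it does not match.

    Hypothetical helper for illustration only.
    """
    match = ITEM_NAME.match(Path(filename).stem)  # extension is ignored
    return match.groupdict() if match else None
```

A file such as `aENp01item1.wav` (made-up name) would be split into lab `a`, language `EN`, experiment `1` and item name `item1`; a name starting with an unknown lab letter yields `None`.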

		The impulse responses have to be copied to `experiments/selection/P800-{X}/gen_input/IRs`.

		To generate the items run `python -m ivas_processing_scripts.generation experiments/selection/P800-{X}/config/item_gen_P800-{X}_{l}.yml` from the root folder of the repository.
		The resulting files can be found in 'experiments/selection/P800-{X}/proc_input_{l}' sorted by category.
		The resulting files can be found in `experiments/selection/P800-{X}/proc_input_{l}` sorted by category.

		For P800-3 the input files for the processing are already provided by the listening lab. This means this step can be skipped.
		For tests with ISM input format (P800-6 and P800-7) no IRs are needed, only mono sentences.

		### Item processing

		The input has to be in the folder 'experiments/selection/P800-{X}/proc_input_{l}'. If item generation is performed previous to this step, the corresponding files are already in the right folder.
		The input has to be in the folder `experiments/selection/P800-{X}/proc_input_{l}`. If item generation is performed previous to this step, the corresponding files are already in the right folder.
		If this step is performed independently of the previous one the input files have to be copied to the respective folder sorted by category.

		If the test includes background noise, the corresponding files have to be copied to 'experiments/selection/P800-{X}/background_noise'.
		For most tests the naming has to follow the scheme 'background_noise_cat{c}.wav' where 'c' denotes the category with a number between one and six. For the P800-2 test, the naming has to follow 'background_noise_cat{c}-lab_{l}.wav'
		If the test includes background noise, the corresponding files have to be copied to `experiments/selection/P800-{X}/background_noise`.
		For most tests the naming has to follow the scheme `background_noise_cat{c}.wav` where 'c' denotes the category with a number between one and six. For the P800-2 test, the naming has to follow `background_noise_cat{c}-lab_{l}.wav`

		To process the items run `python generate_test.py P800-{X},{l}` from the root folder of the repository.
		The results can be found in 'experiments/selection/P800-{X}/proc_output_{l}'.
		The results can be found in `experiments/selection/P800-{X}/proc_output_{l}`.

		For more information about this processing step see
		[How to generate the configs and process items for the selection test experiments](#how-to-generate-the-configs-and-process-items-for-the-selection-test-experiments).
		@@ -91,10 +91,10 @@ The set up for the MUSHRA test only consists of the item processing.

		### Item processing

		To process a BS1534-{X}{x} (X = 1, 2, ...7, x = a, b) listening test, the input files have to be placed in the folder 'experiments/selection/BS1534-{X}{x}/proc_input_{l}' and the command
		To process a BS1534-{X}{x} (X = 1, 2, ...7, x = a, b) listening test, the input files have to be placed in the folder `experiments/selection/BS1534-{X}{x}/proc_input_{l}` and the command
		`python generate_test.py BS1534-{X}{x},{l}` has to be run from the root of the repository. 'l' stands for the listening lab designator: a (Force Technology), b (HEAD acoustics), c (MQ University), d (Mesaqin.com).

		The output can then be found in 'experiments7selection/BS1534-{X}{x}/proc_output_{l}'
		The output can then be found in `experiments/selection/BS1534-{X}{x}/proc_output_{l}`.

		The BS1534-7a and BS1534-7b tests are MASA experiments with FOA and HOA2 inputs. Therefore the input folder contains two subfolders called 'FOA' and 'HOA2'.
		The input files have to be placed in the folders according to their format.
		@@ -332,7 +332,6 @@ input:
		# tx:
		### REQUIRED: Type of bitstream processing; possible types: "JBM" or "FER"
		#type: "JBM"

		### JBM
		### REQUIRED: either error_pattern or error_profile
		### delay error profile file
		@@ -341,7 +340,6 @@ input:
		# error_profile: 5
		## nFramesPerPacket parameter for the network simulator (optional); default = 1
		# n_frames_per_packet: 2

		### FER
		### REQUIRED: either error_pattern or error_rate
		### Frame error pattern file
		@@ -349,8 +347,8 @@ input:
		### Error rate in percent
		# error_rate: 5
		```
		</details>

		</details>

		### Configuration of conditions under test

		@@ -513,7 +511,7 @@ postprocessing:
		The following values may be used for the `type` key of a condition:

		\| Supported conditions \| Description \|
		\|:--------------------:\|-----------------------------------------------------------\|
		\| :------------------: \| ----------------------------------------------------------- \|
		\| ref \| Uncoded (reference) \|
		\| lp3k5 \| Uncoded low-passed at 3.5 kHz (anchor) \|
		\| lp7k \| Uncoded low-passed at 7 kHz (anchor) \|
		@@ -526,23 +524,34 @@ The following values may be used for the `type` key of a condition:
		### Configuration of conditions

		#### Reference ref

		No required arguments but the `type` key. An additional LP filtering can be specified with `out_fc`.

		#### Low-pass Filter lp3k5 and lp7k

		No required arguments but the `type` key.

		#### MNRU and ESDRU

		The MNRU and ESDRU conditions each take one additional required argument. For MNRU the value `q`, which represents the ratio of speech power to modulated noise power in dB,
		has to be specified. <br />
		has to be specified.
		For the ESDRU the spatial degradation value `alpha` in the range [0, 1] has to be defined.

		#### Mono downmix mono_dmx

		No required arguments but the `type` key.

		#### EVS

		For EVS a list of at least one bitrate has to be specified with the key `bitrates`. The entries in this list can also be lists containing the bitrates used for the processing of the individual channels.
		This configuration has to match the channel configuration. If the provided list is shorter, the last value will be repeated.
		For the encoding stage `cod` and the decoding stage `dec`, the path to the IVAS_cod and IVAS_dec binaries can be specified under the key `bin`.
		Additionally some resampling can be applied by using the key `fs` followed by the desired sampling rate.
		The general bitstream processing configuration can be locally overwritten for each EVS and IVAS condition with the key `tx`.
		The additional key `evs_lfe_9k6bps_nb` is only available for EVS conditions and ensures a bitrate of 9.6kbps and narrow band processing of the LFE channel(s).
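The rule that a too-short per-channel bitrate list is padded with its last value can be sketched as follows. This is an illustration of the described behaviour only, not the repository's implementation: `expand_bitrates` is a hypothetical name, and raising an error for a list that is longer than the channel count is an assumption.

```python
def expand_bitrates(per_channel, n_channels):
    """Pad a per-channel bitrate list to n_channels by repeating the last value.

    Hypothetical helper; rejecting overly long lists is an assumption.
    """
    if not per_channel:
        raise ValueError("at least one bitrate is required")
    if len(per_channel) > n_channels:
        raise ValueError("more bitrates than channels")
    missing = n_channels - len(per_channel)
    return list(per_channel) + [per_channel[-1]] * missing
```

For example, `expand_bitrates([13200, 9600], 4)` gives `[13200, 9600, 9600, 9600]`.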

		#### IVAS

		The configuration of the IVAS condition is similar to the EVS condition. However, only one bitrate for all channels (and metadata) can be specified.
		In addition to that, the encoder and decoder take some additional arguments defined by the key `opts`.
		For the decoder an output format can be set. If this argument is not defined the format specified in postprocessing is used.
		@@ -557,12 +566,14 @@ postprocessing.
		## Supported audio formats

		\| spatial_audio_format \| Input/Output/Rendered \| Description \|
		\|--------------------------------------------------\|----------------------\|------------------------------------------------\|
		\| --------------------------------- \| -------------------- \| ----------------------------------------------------------------------------------------------------------------------------- \|
		\| MONO \| yes/yes/yes \| mono signals \|
		\| STEREO \| yes/yes/yes \| stereo signals \|
		\| ISMx \| yes/yes/no \| Objects with metadata, description using renderer metadata \|
		\| MASAx \| yes/no/no \| mono or stereo signals with spatial metadata !!!metadata must share same basename as waveform file but with .met extension!!! \|
		\| ISMxMASAy \| yes/no/no \| Combined objects with MASA \|
		\| FOA/HOA2/HOA3 or PLANAR(FOA/HOAx) \| yes/yes/yes \| Ambisonic signals or planar ambisonic signals \|
		\| ISMxSBAy \| yes/no/no \| Combined objects with ambisonics \|
		\| BINAURAL/BINAURAL_ROOM \| no/yes/yes \| Binaural signals \|
		\| 5_1/5_1_2/5_1_4/7_1/7_1_4 \| yes/yes/yes \| Multi-channel signals for predefined loudspeaker layout \|
		\| META \| yes/no/no \| Audio scene described by a renderer config (work in progress) \|
		@@ -582,27 +593,28 @@ The processing chain is as follows:
		- The postprocessing stage performs a final conversion from the output of the previous stage if necessary and applies the specified processing

		---

		## Additional Executables

		The following additional executables are needed for the different processing steps:

		\| Processing step \| Executable \| Where to find \|
		\|-------------------------------------------------\|-----------------------\|-------------------------------------------------------------------------------------------------------------\|
		\| Loudness measurement and adjustment \| bs1770demo \| https://github.com/ErikNorvell-Ericsson/STL (Note branch) \|
		\| MNRU \| p50fbmnru \| https://github.com/openitu/STL \|
		\| ESDRU \| esdru \| https://github.com/openitu/STL \|
		\| Frame error pattern application \| eid-xor \| https://github.com/openitu/STL \|
		\| Error pattern generation \| gen-patt \| https://www.itu.int/rec/T-REC-G.191-201003-S/en (Note: Version in https://github.com/openitu/STL is buggy!) \|
		\| Reverberation module \| reverb \| https://github.com/openitu/STL \|
		\| Filtering, Resampling \| filter \| https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip \|
		\| Random offset/seed generation (necessary for background noise and FER bitstream processing) \| random \| https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip \|
		\| JBM network simulator \| networkSimulator_g192 \| https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip \|
		\| MASA rendering (also used in loudness measurement of MASA items) \| masaRenderer, masaAnalyzer \| https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip \|
		\| EVS reference conditions \| EVS_cod, EVS_dec \| https://www.3gpp.org/ftp/Specs/archive/26_series/26.443/26443-h00.zip \|
		\| ------------------------------------------------------------------------------------------- \| -------------------------- \| --------------------------------------------------------------------------------------------------------------- \|
		\| Loudness measurement and adjustment \| bs1770demo \| <https://github.com/ErikNorvell-Ericsson/STL> (Note branch) \|
		\| MNRU \| p50fbmnru \| <https://github.com/openitu/STL> \|
		\| ESDRU \| esdru \| <https://github.com/openitu/STL> \|
		\| Frame error pattern application \| eid-xor \| <https://github.com/openitu/STL> \|
		\| Error pattern generation \| gen-patt \| <https://www.itu.int/rec/T-REC-G.191-201003-S/en> (Note: Version in <https://github.com/openitu/STL> is buggy!) \|
		\| Reverberation module \| reverb \| <https://github.com/openitu/STL> \|
		\| Filtering, Resampling \| filter \| <https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip> \|
		\| Random offset/seed generation (necessary for background noise and FER bitstream processing) \| random \| <https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip> \|
		\| JBM network simulator \| networkSimulator_g192 \| <https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip> \|
		\| MASA rendering (also used in loudness measurement of MASA items) \| masaRenderer, masaAnalyzer \| <https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip> \|
		\| EVS reference conditions \| EVS_cod, EVS_dec \| <https://www.3gpp.org/ftp/Specs/archive/26_series/26.443/26443-h00.zip> \|

		The necessary binaries have to be either placed in the [ivas_processing_scripts/bin](./ivas_processing_scripts/bin) folder or the path has to be specified in
		[ivas_processing_scripts/binary_paths.yml](./ivas_processing_scripts/binary_paths.yml).
		For most of the tools it is sufficient to copy the binaries while it is necessary to add the associated *.bin files for the MASA renderer.
		For most of the tools it is sufficient to copy the binaries while it is necessary to add the associated \*.bin files for the MASA renderer.

		---

		@@ -651,26 +663,26 @@ First line of an input definition contains the input type: SBA, MC or ISM.

		Following lines depend on the input type:

		```
		```text
		SBA
		Index of the first channel of this input in the multitrack file (1-indexed)
		Ambisonics order
		```

		```
		```text
		MC
		Index of the first channel of this input in the multitrack file (1-indexed)
		Name of speaker layout (X_Y_Z or CICPx format)
		```

		```
		```text
		MASA
		Index of the first channel of this input in the multitrack file (1-indexed)
		Number of transport channels
		Path to MASA metadata file (must be relative to config file location)
		```

		```
		```text
		ISM
		Index of this input's audio in the multitrack file (1-indexed)
		Path to ISM metadata file (must be relative to config file location)
		@@ -678,7 +690,7 @@ Path to ISM metadata file (must be relative to config file location)

		OR

		```
		```text
		ISM
		Index of this input's audio in the multitrack file (1-indexed)
		Number N of positions defined, followed by N lines in form:
		@@ -699,7 +711,7 @@ Each key-value pair should be placed on a separate line.
		Currently the following key-value pairs are supported:

		\| key \| value type \|
		\|---------------------\|--------------------------------------\|
		\| ------- \| ---------- \|
		\| gain_dB \| float \|

		## Example configuration
		@@ -715,7 +727,7 @@ The following example defines a scene with 4 inputs:
		move to (90,0) and stay there for 5 frames. This trajectory is looped over the
		duration of the input audio file.

		```
		```text
		./input_audio.wav
		4
		ISM
		@@ -745,7 +757,8 @@ Please refer to [the notebook](./examples/audiotools.ipynb) for an overview.
		# How to generate the configs and process items for the selection test experiments

		The script `generate_test.py` is used to generate config files and process items for the selection test experiments:
		```

		```text
		usage: generate_test.py [-h] [--no_parallel] [--create_cfg_only] exp_lab_pairs [exp_lab_pairs ...]

		Generate config files and process files for selecton experiments. Experiment names and lab ids must be given as comma-separated pairs (e.g. 'P800-5,b BS1534-4a,d ...')
		@@ -758,13 +771,16 @@ options:
		--no_parallel If given, configs will not be run in parallel
		--create_cfg_only If given, only create the configs and folder structure without processing items
		```

		Before running the script, one needs to put the input files in the respective input folder (including the background noise files, see below). If input files are missing, the script will complain and stop. For example, to process tests P800-3 and BS1534-4a for labs b and d, respectively, the command line would look like this (no whitespace around the commas!):
		```

		```bash
		python3 generate_test.py P800-3,b BS1534-4a,d
		```
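Why whitespace matters becomes clear when looking at how a shell splits the arguments: `P800-3, b` arrives as the two arguments `P800-3,` and `b`. A minimal sketch of the pair splitting, assuming a hypothetical helper `parse_exp_lab_pair` (this is not the code used in `generate_test.py`):

```python
def parse_exp_lab_pair(pair):
    """Split one 'experiment,lab' argument into its two parts.

    Hypothetical helper for illustration only.
    """
    parts = pair.split(",")
    # Exactly two non-empty parts without surrounding whitespace are expected.
    if len(parts) != 2 or not all(parts) or any(p != p.strip() for p in parts):
        raise ValueError(f"expected 'experiment,lab', got {pair!r}")
    return parts[0], parts[1]


def parse_exp_lab_pairs(args):
    """Parse a list of command line arguments into (experiment, lab) tuples."""
    return [parse_exp_lab_pair(arg) for arg in args]
```

With the arguments from the command above, `parse_exp_lab_pairs(["P800-3,b", "BS1534-4a,d"])` returns `[("P800-3", "b"), ("BS1534-4a", "d")]`, while the stray argument `P800-3,` is rejected.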

		Tests are processed separately per category and per lab (as some values in the configs are dependent on category and lab). For each experiment, a static base config is stored from which the actual configs are generated (identified by the suffix `catX-lab_Y.yml`). For P800 tests, there are 6 categories each. The BS1534 experiments do not define categories, except for the MASA ones (BS1534-7a/b): there one might mix FOA and HOA2 input material, so there are 2 categories for those in the scripts (category 1 for FOA, category 2 for HOA2). In `experiments/selection/` there is a folder structure prepared for all selection experiments, in which you have to put the input files for your test. For example, for P800-1:
		```

		```text
		experiments/selection/P800-1/
		├── background_noise <--- put your background files in here and name them as background_noisecatX.wav. Not all experiments use background noise
		├── config <--- contains base config, generated configs will be stored here, too

ivas_processing_scripts/audiotools/binaural_datasets/README.md

0 → 100755

+41 −0

		# Binaural Datasets

		Files in this directory should contain impulse responses for use in rendering, stored in Matlab .mat format.
		A sampling rate of 48 kHz is assumed.

		## Naming scheme

		Files should adhere to the following naming scheme:

		`{HRIR\|BRIR}_{DATASETNAME}_{FULL\|LS\|SBA(1-3)}.mat`

		- `HRIR or BRIR`
		specifies the type of impulse response which will be used for
		either `BINAURAL` or `BINAURAL_ROOM` output respectively
		- `DATASETNAME`
		specifies the name used with the binaural_dataset commandline argument
		or YAML key to enable selection of this dataset
		- `FULL or LS or SBA(1-3)`
		specifies the subset of impulse responses in the file:
		- `FULL`: all available measurements on the sphere
		- `LS`: superset of supported loudspeaker layouts
		_(see `audiotools.constants.CHANNEL_BASED_AUDIO_FORMATS["LS"]`)_
		- `SBA{1,2,3}`: impulse responses transformed to ambisonics by external conversion
		- if available SBA1 is used for FOA, SBA2 for HOA2 and SBA3 for HOA3
		- if not available SBA3 is used and truncated for all Ambisonic formats
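The naming scheme and the SBA fallback rule described above can be sketched as follows. This is an illustration under assumptions, not the loader used by the scripts: `parse_dataset_filename` and `pick_sba_subset` are hypothetical names, and `EXAMPLE` is a made-up dataset name.

```python
import re

# {HRIR|BRIR}_{DATASETNAME}_{FULL|LS|SBA(1-3)}.mat
DATASET_FILE = re.compile(
    r"^(?P<kind>HRIR|BRIR)_(?P<name>.+)_(?P<subset>FULL|LS|SBA[1-3])\.mat$"
)

# Preferred SBA subset per Ambisonic format, as stated in the list above.
SBA_FOR_FORMAT = {"FOA": "SBA1", "HOA2": "SBA2", "HOA3": "SBA3"}


def parse_dataset_filename(filename):
    """Return kind, dataset name and subset, or None if the name does not match."""
    match = DATASET_FILE.match(filename)
    return match.groupdict() if match else None


def pick_sba_subset(audio_format, available):
    """Choose the SBA subset for a format, falling back to SBA3 if needed."""
    preferred = SBA_FOR_FORMAT[audio_format]
    if preferred in available:
        return preferred
    if "SBA3" in available:
        return "SBA3"  # to be truncated to the lower order by the caller
    raise LookupError(f"no SBA dataset available for {audio_format}")
```

If only `BRIR_EXAMPLE_SBA3.mat` exists, `pick_sba_subset("FOA", {"SBA3"})` selects `SBA3`; once an `SBA1` file is added, that one is preferred for FOA.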

		## File Contents

		Each Matlab file should contain the following variables:

		- `IR`
		Impulse responses with dimensions [ir_length x n_ears x n_channels]
		- `SourcePosition`
		array of {azimuth, elevation, radius} of dimensions [n_channels x 3]
		required for FULL, optional otherwise
		- `latency_s`
		latency of the dataset in samples
		optional, will be estimated if not provided

		LICENSES:
		Please see [HRIR.txt](../../../thirdPartyLegalNotices/HRIR.txt) and [BRIR.txt](../../../thirdPartyLegalNotices/BRIR.txt) for license info

ivas_processing_scripts/audiotools/binaural_datasets/README.txt

deleted100755 → 0

+0 −34

		Files in this directory should contain impulse responses for use in rendering in Matlab .mat format
		Samplingrate of 48kHz is assumed

		Files should adhere to the following naming scheme:

		{HRIR\|BRIR}_{DATASETNAME}_{FULL\|LS\|SBA(1-3)}.mat

		- HRIR or BRIR
		specifies the type of impulse response which will be used
		for either BINAURAL or BINAURAL_ROOM output respectively
		- DATASETNAME
		specifies the name used with the binaural_dataset commandline argument
		or YAML key to enable selection of this dataset
		- FULL or LS or SBA3
		specifies the subset of impulse responses in the file:
		FULL: all available measurements on the sphere
		LS: superset of supported loudspeaker layouts
		(see audiotools.constants.CHANNEL_BASED_AUDIO_FORMATS["LS""])
		SBA(1-3): impulse responses transformed to ambisonics by external conversion
		if available SBA1 is used for FOA, SBA2 for HOA2 and SBA3 for HOA3
		if not available SBA3 is used and truncated for all Ambisonic formats

		Each Matlab file should contain the following variables:
		- IR
		Impulse responses with dimensions [ir_length x n_ears x n_channels]
		- SourcePosition
		array of {azimuth, elevation, radius} of dimensions [n_channels x 3]
		required for FULL, optional otherwise
		- latency_s
		latency of the dataset in samples
		optional, will be estimated if not provided

		LICENSES:
		Please see HRIR.txt and BRIR.txt for license info
		No newline at end of file

ivas_processing_scripts/audiotools/convert/objectbased.py

+1 −1

		@@ -113,7 +113,7 @@ def render_oba_to_binaural(

		render_oba_to_cba(oba, cba_tmp)

		render_cba_to_binaural(cba_tmp, bin, trajectory)
		render_cba_to_binaural(cba_tmp, bin, trajectory, bin_dataset, **kwargs)
		else:
		IR, SourcePosition, latency_smp = load_ir(oba.name, bin.name, bin_dataset)

ivas_processing_scripts/audiotools/convert/omasa.py

+1 −1

		@@ -61,7 +61,7 @@ def convert_omasa(
		omasa.audio[:, : omasa.num_ism_channels],
		omasa.fs,
		)
		oba.metadata_files = copy(omasa.metadata_files)
		oba.metadata_files = omasa.metadata_files[:-1]
		oba.object_pos = copy(omasa.object_pos)
		masa = audio.fromarray(
		"MASA"