Merge branch '402-remove-cleanup-of-pyaudio3dtools-scripts-phase1' into 'main' (4a2c526e) · Commits · IVAS Codec Public Collaboration / IVAS Codec

scripts/README.md

+0 −224

		@@ -38,7 +38,6 @@ title: Python scripts for Testing the IVAS code and Generating test items
		- [Python scripts for Testing the IVAS code and Generating test items](#python-scripts-for-testing-the-ivas-code-and-generating-test-items)
		- [Contents](#contents)
		- [0. Requirements](#0-requirements)
		- [- numpy and scipy for `generate_test_items.py`, `testBitexact.py` and `self_test.py`](#--numpy-and-scipy-for-generate_test_itemspy-testbitexactpy-and-self_testpy)
		- [1. Scripts and classes for testing IVAS code](#1--scripts-and-classes-for-testing-ivas-code)
		- [1.1 Classes](#11-classes)
		- [1.2 Output directory structure](#12-output-directory-structure)
		@@ -49,16 +48,6 @@ title: Python scripts for Testing the IVAS code and Generating test items
		- [`IvasBuildAndRunChecks.py`](#ivasbuildandruncheckspy)
		- [`testBitexact.py`](#testbitexactpy)
		- [`self_test.py`](#self_testpy)
		- [2. Script for generating listening test items](#2-script-for-generating-listening-test-items)
		- [2.1. `generate_test_items.py`](#21-generate_test_itemspy)
		- [2.2. Test configuration file](#22-test-configuration-file)
		- [2.3. Supported test conditions](#23-supported-test-conditions)
		- [2.4. Supported input/output/rendered audio formats](#24-supported-inputoutputrendered-audio-formats)
		- [2.5. Processing](#25-processing)
		- [2.6. Renderer Metadata definition](#26-renderer-metadata-definition)
		- [3. Script for converting formats and binauralizing](#3-script-for-converting-formats-and-binauralizing)
		- [3.1. Binauralizing with head rotation](#31-binauralizing-with-head-rotation)
		- [3.2. Generating binaural reference signals](#32-generating-binaural-reference-signals)

		---

		@@ -441,216 +430,3 @@ Missing reference conditions and the test conditions are then generated and
		the reference and test conditions are compared.

		-----


		## 2. Script for generating listening test items

		The `generate_test_items.py` Python script helps to quickly set up listening tests with multiple (pre-)processing and post-processing options.

		### 2.1. `generate_test_items.py`

		Script for generating (listening) test items.

		```
		usage: generate_test_items.py [-h] -i INFILE [INFILE ...]

		Generate test items

		optional arguments:
		-h, --help show this help message and exit
		-i INFILE [INFILE ...], --infile INFILE [INFILE ...]
		Configuration file(s): FILE1.json FILE2.json ...
		```

		Example call:

		```
		python3 .\generate_test_items.py -i .\examples\my_test_config.json
		```

		Here `my_test_config.json` is a test configuration file in JSON format whose fields are explained in the next section.

		### 2.2. Test configuration file

		This is the main file to edit in order to change global configuration options, detailed below.

		NOTE: Paths specified in the JSON file are relative to the working directory where the script is executed from, NOT the location of the JSON file itself. It is possible (and recommended!) to use absolute paths instead to avoid confusion.

		| key | values (example) | default | description |
		|---------------------------|:------------------:|:-------------:|-----------------------------------------------|
		| name | "my_test" | Required | Name of the test session |
		| author | "myself" | | Author of the configuration file (optional) |
		| date | 20210205 | | Date of creation (optional) |
		| | | | |
		| enable_multiprocessing | True/False | True | Enables multiprocessing; setting this to True is recommended for speed. |
		| delete_tmp | True/False | False | Enables deletion of temporary directories (containing intermediate processing files, bitstreams, per-item logfiles etc.). |
		| | | | |
		| input_path | ./my_items/ | Required | Input directory with .WAV, .PCM or .TXT files to process |
		| preproc_input | True/False | False | Whether to execute preprocessing on the input files |
		| in_format | HOA3 | Required | Input format for the conditions to generate, see spatial_audio_format |
		| in_fs | 32000 | 48000 | Input sampling rate for conditions to generate (assumed to be the sampling rate of the input PCM files to process) |
		| input_select | ["in", "file2"] | Required | Filenames to filter in the input directory; can be a single value, an array or null. Only filenames are compared (therefore "in" in this array would match both "in.wav" and "in.pcm") |
		| | | | |
		| concatenate_input | True/False | False | Whether to (horizontally) concatenate files in the input directory |
		| concat_silence_ms | [1000, 1000] | [0, 0] | Pre- and post-silence duration in ms to pad the concatenation with. If a single value is specified, it is used for BOTH pre- and post-padding |
		| preproc_loudness | -26 | | Loudness to preprocess input to (dBov / LKFS depending on tool). Only processed if preproc_input is True. |
		| | | | |
		| output_path | ./out/ | | Output root directory hosting generated items & log |
		| out_fs | 48000 | 48000 | Output sampling rate for conditions to generate |
		| output_loudness | -26 | | Loudness level for output file (dBov / LKFS depending on tool). |
		| | | | |
		| renderer_format | 7_1_4 or CICP19 | Required | Format to be rendered (using offline rendering, bypassed if equal to out_format) |
		| binaural_rendered | True/False | False | Extra binauralization of the rendered outputs (using offline rendering) |
		| include_LFE | True/False | False | Whether to include LFE in binaural rendering |
		| gain_factor | float value | 1.0 | Gain factor to be applied to LFE channel |
		| loudness_tool | "sv56demo" | "bs1770demo" | Tool to use for loudness adjustment. Currently only sv56demo and bs1770demo are supported for appropriate format configurations. Can optionally be a path to the binary. |
		| | | | |
		| lt_mode | "MUSHRA" | | Automatically generates a NAME.ltg file with generate_lt_file.py in output_path according to the specified mode |
		| conditions_to_generate | ["ref", "ivas"] | Required | List of conditions to be generated. For ivas and evs, multiple conditions can be specified with an `_` separator (e.g. "ivas_branch", "ivas_trunk" etc.) |
		| | | | |
		| ref | | | |
		| - out_fc | 32000 | 48000 | Cut-off frequency to be applied to the reference condition in post-processing |
		| ivas | | | |
		| - bitrates | [16400, 128000] | Required | Bitrate(s) used for IVAS encoder |
		| - enc_fs | 48000 | 48000 | Sampling rate for input to the encoder (pre-processing) |
		| - max_band | wb, swb, fb etc. | FB | Maximum encoded bandwidth |
		| - out_format | 7_1_4 or CICP19 | Required | Output format for IVAS, see spatial_audio_format |
		| - dec_fs | 48000 | 48000 | Sampling rate for decoder output |
		| - dtx | True/False | False | Enable DTX mode |
		| - head_tracking | True/False | False | Enable head tracking |
		| - ht_file | | "./trajectories/full_circle_in_15s" | Head rotation file |
		| - plc | True/False | False | Enables forward error correction `IVAS_dec -FEC X` |
		| - plc_rate | 0-10 | 10 | Percentage of erased frames |
		| - cod_bin | "../../../IVAS_cod" | "../IVAS_cod" | Path to encoder binary |
		| - dec_bin | "../../../IVAS_dec" | "../IVAS_dec" | Path to decoder binary |
		| - cod_opt | ["-ucct", "1"] | | List of additional encoder options |
		| - dec_opt | ["-q"] | | List of additional decoder options |
		| evs | | | |
		| - bitrates | [13200, 164000] | Required | Bitrate used for multi-stream EVS condition per stream/channel |
		| - enc_fs | 48000 | 48000 | Sampling rate for input to the encoder (pre-processing) |
		| - max_band | wb, swb, fb etc. | FB | Maximum encoded bandwidth |
		| - dec_fs | 48000 | 48000 | Sampling rate for decoder output |
		| - dtx | True/False | False | Enable DTX mode |
		| - cod_bin | ../../../IVAS_cod | "../IVAS_cod" | Path to encoder binary |
		| - dec_bin | ../../../IVAS_dec | "../IVAS_dec" | Path to decoder binary |
		| | | | |
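
A configuration file can also be written from Python, which makes it easy to store absolute paths as recommended in the note above. The following is only a minimal sketch: the key names come from the table, while the helper name `build_config`, the directory names and the chosen values are illustrative assumptions, and the script may expect further keys.

```python
import json
from pathlib import Path


def build_config(input_dir: Path, output_dir: Path) -> dict:
    """Assemble a small test configuration (hypothetical helper)."""
    return {
        "name": "my_test",
        "enable_multiprocessing": True,  # serialized as JSON `true`
        # Absolute paths avoid the working-directory pitfall.
        "input_path": str(input_dir.resolve()),
        "in_format": "HOA3",
        "in_fs": 48000,
        "input_select": ["in", "file2"],
        "output_path": str(output_dir.resolve()),
        "renderer_format": "7_1_4",
        "conditions_to_generate": ["ref", "ivas"],
        # Per-condition options live in a nested object.
        "ref": {"out_fc": 48000},
        "ivas": {"bitrates": [16400, 128000], "out_format": "7_1_4"},
    }


config = build_config(Path("my_items"), Path("out"))
config_text = json.dumps(config, indent=4)
print(config_text)
```

Writing `config_text` to a `.json` file gives a file that can be passed to the script with `-i`.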

		---
		### 2.3. Supported test conditions

		`generate_test_items.py` can currently generate the following conditions.

		| Supported conditions | Description |
		|:--------------------:|-----------------------------------------------------------|
		| ref | Uncoded (reference) |
		| lp3k5 | Uncoded low-passed at 3.5 kHz (anchor) |
		| lp7k | Uncoded low-passed at 7 kHz (anchor) |
		| evs_mono | Coded with multi-stream EVS codec, !!metadata not coded!! |
		| ivas | Coded with IVAS codec |


		Multiple conditions for evs_mono and ivas can be specified by using underscore separators e.g. `"ivas_1" : {...}, "ivas_2" : {...}`
		(also see `test_SBA.json` for an example)
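
One way to read such suffixed names is to match them against the supported condition names from the table. The sketch below is an assumption about how this mapping could work, not the script's actual logic; `base_condition` is a hypothetical helper.

```python
SUPPORTED_CONDITIONS = ("ref", "lp3k5", "lp7k", "evs_mono", "ivas")


def base_condition(name: str) -> str:
    """Return the supported condition that a (possibly suffixed) name refers to."""
    # Check longer names first so that "evs_mono_1" resolves to "evs_mono".
    for base in sorted(SUPPORTED_CONDITIONS, key=len, reverse=True):
        if name == base or name.startswith(base + "_"):
            return base
    raise ValueError(f"unsupported condition: {name!r}")


for key in ("ivas_1", "ivas_2", "evs_mono_1", "ref"):
    print(key, "->", base_condition(key))
```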

		---

		### 2.4. Supported input/output/rendered audio formats

		| spatial_audio_format | Input/Output/Rendered | Description |
		|--------------------------------------------------|----------------------|------------------------------------------------|
		| MONO | yes/yes/yes | Mono signals |
		| STEREO | yes/yes/yes | Stereo signals |
		| ISM or ISMx | yes/no/no | Objects with metadata, description using renderer metadata |
		| MASA or MASAx | yes/no/no | Mono or stereo signals with spatial metadata !!!metadata must share same basename as waveform file but with .met extension!!! |
		| FOA/HOA2/HOA3 or PLANAR(FOA/HOAx) | yes/yes/yes | Ambisonic signals or planar ambisonic signals |
		| BINAURAL/BINAURAL_ROOM | no/yes/yes | Binaural signals |
		| 5_1/5_1_2/5_1_4/7_1/7_1_4 or CICP[6/12/14/16/19] | yes/yes/yes | Multi-channel signals for predefined loudspeaker layout |
		| META | yes/yes/no | Audio scene described by a renderer config |

		---

		### 2.5. Processing

		The processing chain is as follows:

		1. Preprocessing
		    - Condition: `preproc_input == true`
		    - Input files converted to `in_format`
		2. Processing
		    - Condition: Performed depending on key in `conditions_to_generate`
		    - Coding/decoding from `in_format` to `out_format`
		3. Postprocessing
		    1. Rendering to `renderer_format`
		        - Condition: `out_format != renderer_format`
		        - Output files converted from `out_format` to `renderer_format`
		    2. Binaural Rendering
		        - Condition: `binaural_rendered == true` and `out_format` is not a BINAURAL type
		        - Output files converted from `out_format` to `BINAURAL`
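
The conditions in this chain can be summarized as a small decision function. This is a hedged sketch of the logic as described, not the script's implementation; `planned_steps` and the step labels are hypothetical, and "BINAURAL type" is assumed to mean `BINAURAL` or `BINAURAL_ROOM`.

```python
# Assumption: these are the formats counted as "a BINAURAL type".
BINAURAL_FORMATS = {"BINAURAL", "BINAURAL_ROOM"}


def planned_steps(preproc_input, conditions, out_format,
                  renderer_format, binaural_rendered):
    """List the processing steps that would run for one configuration."""
    steps = []
    if preproc_input:
        steps.append("preprocess")
    for condition in conditions:
        steps.append(f"process:{condition}")
    # Rendering is bypassed when the formats already match.
    if out_format != renderer_format:
        steps.append(f"render:{out_format}->{renderer_format}")
    # Binauralization is skipped for outputs that are already binaural.
    if binaural_rendered and out_format not in BINAURAL_FORMATS:
        steps.append(f"binauralize:{out_format}->BINAURAL")
    return steps


print(planned_steps(True, ["ref", "ivas"], "HOA3", "7_1_4", True))
```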

		---

		### 2.6. Renderer Metadata definition

		To run, the renderer requires a config file describing the input scene. The expected format of the config file is as follows:

		---

		- Line 1: Path to a "multitrack" audio file. This should be a single multichannel wav/pcm file that contains all input audio. For example, channels 1-4 can be an FOA scene, channel 5 an object, and channels 6-11 a 5.1 channel bed. If the path is not absolute, it is considered relative to the renderer executable, not the config file. This path has lower priority than the one given on the command line: the path in the config file is ignored if the `--inputAudio` argument to the renderer executable is specified.

		---

		- Line 2: Contains the number of inputs. An input can be an Ambisonics scene, an object or a channel bed. This is NOT the total number of channels in the input audio file. The renderer currently supports simultaneously:
		    - Up to 2 SBA inputs
		    - Up to 2 MC inputs
		    - Up to 16 ISM inputs

		  These limits can be freely changed with pre-processor macros, if needed.

		---
		- Following lines: Define each of the inputs. Inputs can be listed in any order - they are NOT required to be listed in the same order as in the audio file.

		  Input definitions: The first line of an input definition contains the input type: SBA, MC or ISM. The following lines depend on the input type:
		    - SBA
		        - Index of the first channel of this input in the multitrack file (1-indexed)
		        - Ambisonics order
		    - MC
		        - Index of the first channel of this input in the multitrack file (1-indexed)
		        - CICP index of the speaker layout
		    - ISM
		        - Index of this input's audio in the multitrack file (1-indexed)
		        - Path to ISM metadata file (if not absolute, relative to executable location)
		    - OR ISM
		        - Index of this input's audio in the multitrack file (1-indexed)
		        - Number N of positions defined, followed by N lines in the form: stay in position for x frames, azimuth, elevation (ISM position metadata defined this way is looped if there are more frames of audio than given positions)

		---
		Example config

		The following example defines a scene with 4 inputs:

		- ISM with trajectory defined in a separate file. Channel 12 in the input file.
		- Ambisonics, order 1. Channels 1-4 in the input audio file.
		- CICP6 channel bed. Channels 5-10 in the input audio file.
		- ISM with 2 defined positions (-90,0) and (90,0). Channel 11 in the input file. The object will start at position (-90,0) and stay there for 5 frames, then move to (90,0) and stay there for 5 frames. This trajectory is looped over the duration of the input audio file.

		```
		./input_audio.wav
		4
		ISM
		12
		path/to/IVAS_ISM_metadata.csv
		SBA
		1
		1
		MC
		5
		6
		ISM
		11
		2
		5,-90,0
		5,90,0
		```
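
Because the config format is purely line-based, it can also be generated programmatically. The sketch below writes the four-input scene described in this section; `render_config` and the dictionary keys (`type`, `channel`, `order`, `cicp`, `metadata_file`, `positions`) are hypothetical names chosen for illustration and are not part of the renderer.

```python
def render_config(audio_path, inputs):
    """Build renderer config text from a list of input descriptions."""
    lines = [audio_path, str(len(inputs))]  # line 1: audio file, line 2: number of inputs
    for inp in inputs:
        kind = inp["type"]
        lines.append(kind)
        lines.append(str(inp["channel"]))  # 1-indexed channel in the multitrack file
        if kind == "SBA":
            lines.append(str(inp["order"]))  # Ambisonics order
        elif kind == "MC":
            lines.append(str(inp["cicp"]))  # CICP index of the speaker layout
        elif kind == "ISM":
            if "metadata_file" in inp:
                lines.append(inp["metadata_file"])
            else:
                positions = inp["positions"]
                lines.append(str(len(positions)))
                # One line per position: frames to stay, azimuth, elevation.
                lines.extend(f"{frames},{az},{el}" for frames, az, el in positions)
        else:
            raise ValueError(f"unknown input type: {kind!r}")
    return "\n".join(lines) + "\n"


scene = [
    {"type": "ISM", "channel": 12, "metadata_file": "path/to/IVAS_ISM_metadata.csv"},
    {"type": "SBA", "channel": 1, "order": 1},
    {"type": "MC", "channel": 5, "cicp": 6},
    {"type": "ISM", "channel": 11, "positions": [(5, -90, 0), (5, 90, 0)]},
]
config_text = render_config("./input_audio.wav", scene)
print(config_text, end="")
```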

		## 3. Script for converting formats and binauralizing

		The script `audio3dtools.py` can convert between different input and output formats and binauralize signals.

		Execute `python -m pyaudio3dtools.audio3dtools --help` for usage.

		### 3.1. Binauralizing with head rotation

		This example binauralizes an HOA3 signal with a head-rotation trajectory. Head rotation is performed in the spherical harmonic domain (SHD). It is supported for the HOA3 and META input formats. For the META input format, the audio scene is first prerendered to HOA3 and then rotated and binauralized.

		```
		python -m pyaudio3dtools.audio3dtools -i hoa3_input.wav -o . -F BINAURAL -T .\trajectories\full_circle_in_15s
		```
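
When the tool is driven from another Python script, building the argument list explicitly avoids shell quoting and platform-specific path separators. This is a minimal sketch that uses only the flags shown in the command above (`-i`, `-o`, `-F`, `-T`); `audio3dtools_command` is a hypothetical helper and the file names are placeholders.

```python
import sys
from pathlib import Path


def audio3dtools_command(infile, out_dir, out_format, trajectory=None):
    """Return the argument list for one audio3dtools invocation."""
    cmd = [
        sys.executable, "-m", "pyaudio3dtools.audio3dtools",
        "-i", str(infile),
        "-o", str(out_dir),
        "-F", out_format,
    ]
    if trajectory is not None:
        cmd += ["-T", str(trajectory)]
    return cmd


cmd = audio3dtools_command(
    "hoa3_input.wav", ".", "BINAURAL",
    trajectory=Path("trajectories") / "full_circle_in_15s",
)
print(cmd)
# To execute it: subprocess.run(cmd, check=True)
```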

		### 3.2. Generating binaural reference signals

		Currently, MC input signals are supported. The reference processing can be activated by selecting BINAURAL[_ROOM]_REF as the output format. The signals are generated by convolving the channels with the filters from the database that are closest to the current position of the virtual LS. Any interpolation method supported by numpy can be chosen for interpolating between the measured points along the trajectory.

		```
		python -m pyaudio3dtools.audio3dtools -i cicp6_input.wav -o . -F BINAURAL_REF -T .\trajectories\full_circle_in_15s
		```

		### 3.3. Rendering ISM to Custom loudspeakers with auxiliary binaural output
		ISM metadata can be specified either via an input text file in the Renderer Metadata definition format, or via the command line using the same style as IVAS:
		```
		python -m pyaudio3dtools.audio3dtools -i ism2.wav -f ISM2 -m ism1.csv NULL -F 7_1_4 -o . -b -T .\trajectories\full_circle_in_15s
		```