Commit 4a2c526e authored by Jan Kiene

Merge branch '402-remove-cleanup-of-pyaudio3dtools-scripts-phase1' into 'main'

Resolve "Remove/cleanup of pyaudio3dtools scripts" - Phase 1 cleanup

See merge request !606
parents 18624e76 dddf9a0f
+0 −224
@@ -38,7 +38,6 @@ title: Python scripts for Testing the IVAS code and Generating test items
- [Python scripts for Testing the IVAS code and Generating test items](#python-scripts-for-testing-the-ivas-code-and-generating-test-items)
  - [Contents](#contents)
  - [0. Requirements](#0-requirements)
  - [- numpy and scipy for `generate_test_items.py`, `testBitexact.py` and `self_test.py`](#--numpy-and-scipy-for-generate_test_itemspy-testbitexactpy-and-self_testpy)
  - [1.  Scripts and classes for testing IVAS code](#1--scripts-and-classes-for-testing-ivas-code)
    - [1.1 Classes](#11-classes)
    - [1.2 Output directory structure](#12-output-directory-structure)
@@ -49,16 +48,6 @@ title: Python scripts for Testing the IVAS code and Generating test items
      - [`IvasBuildAndRunChecks.py`](#ivasbuildandruncheckspy)
      - [`testBitexact.py`](#testbitexactpy)
      - [`self_test.py`](#self_testpy)
  - [2. Script for generating listening test items](#2-script-for-generating-listening-test-items)
    - [2.1. `generate_test_items.py`](#21-generate_test_itemspy)
    - [2.2. Test configuration file](#22-test-configuration-file)
    - [2.3. Supported test conditions](#23-supported-test-conditions)
    - [2.4. Supported input/output/rendered audio formats](#24-supported-inputoutputrendered-audio-formats)
    - [2.5. Processing](#25-processing)
    - [2.6. Renderer Metadata definition](#26-renderer-metadata-definition)
  - [3. Script for converting formats and binauralizing](#3-script-for-converting-formats-and-binauralizing)
    - [3.1. Binauralizing with head rotation](#31-binauralizing-with-head-rotation)
    - [3.2. Generating binaural reference signals](#32-generating-binaural-reference-signals)
    
---

@@ -441,216 +430,3 @@ Missing reference conditions and the test conditions are then generated and
the reference and test conditions are compared.

-----


## 2. Script for generating listening test items

The `generate_test_items.py` Python script helps to quickly set up listening tests with multiple (pre-)processing and post-processing options.

### 2.1. `generate_test_items.py`

Script for generating (listening) test items.

```
usage: generate_test_items.py [-h] -i INFILE [INFILE ...]

Generate test items

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE [INFILE ...], --infile INFILE [INFILE ...]
                        Configuration file(s): FILE1.json FILE2.json ...
```

Example call:

```
   python3 .\generate_test_items.py -i .\examples\my_test_config.json
```

Here `my_test_config.json` is a test configuration file in JSON format; its fields are explained in the next section.

### 2.2. Test configuration file

This is the main file to edit in order to change global configuration options, detailed below.

*NOTE: Paths specified in the JSON file are relative to the working directory where the script is executed from, NOT the location of the JSON file itself. It is possible (and recommended!) to use absolute paths instead to avoid confusion.*

| key                       | values (example)   |    default    |   description                                 |
|---------------------------|:------------------:|:-------------:|-----------------------------------------------|
| name                      | "my_test"          | Required      | name of the test session                      |
| author                    | "myself"           |               | Author of the configuration file (optional)   |
| date                      | 20210205           |               | Date of creation (optional)                   |
|                           |                    |               |                                               |
| enable_multiprocessing    | True/False         | True          | Enables multiprocessing, recommended to set to True to make things fast. |
| delete_tmp                | True/False         | False         | Enables deletion of temporary directories (containing intermediate processing files, bitstreams and per-item logfiles etc.). |
|                           |                    |               |                                               |
| input_path                | ./my_items/        | Required      | Input directory with *.WAV, *.PCM or *.TXT files to process |
| preproc_input             | True/False         | False         | Whether to execute preprocessing on  the input files |
| in_format                 | HOA3               | Required      | Input format for the conditions to generate, see spatial_audio_format |
| in_fs                     | 32000              | 48000         | Input sampling rate for conditions to generate (assumed to be sampling-rate of input PCM files to process) |
| input_select              | ["in", "file2"]    | Required      | Filenames to filter in the input directory, can be a single value, an array or null. Only compares filenames (therefore "in" in this array would match both "in.wav" and "in.pcm") |
|                           |                    |               |                                               |
| concatenate_input         | True/False         | False         | Whether to (horizontally) concatenate files in the input directory |
| concat_silence_ms         | [1000, 1000]       | [0, 0]        | Specifies the pre- and post-silence duration to pad concatenation with in ms. If a single value is specified it will be used for BOTH pre- and post-padding |
| preproc_loudness          | -26                |               | Loudness to preprocess input to (dBov / LKFS depending on tool). Only processed if preproc_input is True. |
|                           |                    |               |                                               |
| output_path               | ./out/             |               | Output root directory hosting generated items & log |
| out_fs                    | 48000              | 48000         | Output sampling rate for conditions to generate |
| output_loudness           | -26                |               | Loudness level for output file (dBov / LKFS depending on tool). |
|                           |                    |               |                                               |
| renderer_format           | 7_1_4 or CICP19    | Required      | Format to be rendered (using offline rendering, will be bypassed if = out_format) |
| binaural_rendered         | True/False         | False         | Extra binauralization of the rendered outputs (using offline rendering) |
| include_LFE               | True/False         | False         | Whether to include LFE in binaural rendering  |
| gain_factor               | float value        | 1.0           | Gain factor to be applied to LFE channel      |
| loudness_tool             | "sv56demo"         | "bs1770demo"  | Tool to use for loudness adjustment. Currently only sv56demo and bs1770demo are supported for appropriate format configurations. Optionally can be a path to the binary.  |
|                           |                    |               |                                               |
| lt_mode                   | "MUSHRA"           |               | Automatically generates a NAME.ltg file with generate_lt_file.py in output_path according to the specified mode |
| conditions_to_generate    | ["ref", "ivas"]    | Required      | list of conditions to be generated, for ivas and evs, multiple conditions can be specified with an \_ separator (i.e. "ivas_branch", "ivas_trunk" etc.)            |
|                           |                    |               |                                               |
| ref                       |                    |               |                                               |
| - out_fc                  | 32000              | 48000         | cut-off frequency to be applied to the reference condition in post |
| ivas                      |                    |               |                                               |
| - bitrates                | [16400, 128000]    | Required      | Bitrate(s) used for IVAS encoder              |
| - enc_fs                  | 48000              | 48000         | Sampling rate for input to the encoder (pre-processing) |
| - max_band                | wb, swb, fb etc.   | FB            | Maximum encoded bandwidth                     |
| - out_format              | 7_1_4 or CICP19    | Required      | Output format for IVAS, see spatial_audio_format |
| - dec_fs                  | 48000              | 48000         | Sampling rate for decoder output              |
| - dtx                     | True/False         | False         | Enable DTX mode                               |
| - head_tracking           | True/False         | False         | Enable head tracking                          |
| - ht_file                 |                    | "./trajectories/full_circle_in_15s" | Head rotation file                            |
| - plc                     | True/False         | False         | Enables forward error correction  `IVAS_dec -FEC X` |
| - plc_rate                | 0-10               | 10            | Percentage of erased frames                   |
| - cod_bin                 | "../../../IVAS_cod"| "../IVAS_cod" | path to encoder binary                        |
| - dec_bin                 | "../../../IVAS_dec"| "../IVAS_dec" | path to decoder binary                        |
| - cod_opt                 | ["-ucct", "1"]     |               | list of additional encoder options            |
| - dec_opt                 | ["-q"]             |               | list of additional decoder options            |
| evs                       |                    |               |                                               |
| - bitrates                | [13200, 164000]    | Required      | Bitrate used for multi-stream EVS condition per stream/channel |
| - enc_fs                  | 48000              | 48000         | Sampling rate for input to the encoder (pre-processing) |
| - max_band                | wb, swb, fb etc.   | FB            | Maximum encoded bandwidth                     |
| - dec_fs                  | 48000              | 48000         | Sampling rate for decoder output              |
| - dtx                     | True/False         | False         | Enable DTX mode                               |
| - cod_bin                 | ../../../IVAS_cod  | "../IVAS_cod" | path to binary                                |
| - dec_bin                 | ../../../IVAS_dec  | "../IVAS_dec" | path to binary                                |
|                           |                    |               |                                               |

---
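As a rough illustration of the table above, the sketch below builds a minimal configuration containing only the keys marked "Required" (plus a few optional ones) and checks that none are missing. The helper `missing_keys` is hypothetical and not part of `generate_test_items.py`. The layout, with per-condition settings nested under a top-level key named after the condition, is an assumption based on the `"ivas_1" : {...}` notation used in this document.

```python
import json

# Keys marked "Required" in the configuration table.
REQUIRED_TOP_LEVEL = [
    "name", "input_path", "in_format", "input_select",
    "renderer_format", "conditions_to_generate",
]
# Required keys per condition family (assumed nesting, see lead-in).
REQUIRED_PER_CONDITION = {"ivas": ["bitrates", "out_format"], "evs": ["bitrates"]}


def missing_keys(config):
    """Return the required keys that are absent from `config` (hypothetical helper)."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    for condition in config.get("conditions_to_generate", []):
        # "ivas_branch" and "ivas_trunk" both fall under the "ivas" family.
        family = condition.split("_")[0]
        section = config.get(condition, {})
        for key in REQUIRED_PER_CONDITION.get(family, []):
            if key not in section:
                missing.append(f"{condition}.{key}")
    return missing


config = {
    "name": "my_test",
    "input_path": "./my_items/",
    "in_format": "HOA3",
    "input_select": ["in", "file2"],
    "output_path": "./out/",
    "renderer_format": "7_1_4",
    "conditions_to_generate": ["ref", "ivas"],
    "ref": {"out_fc": 32000},
    "ivas": {"bitrates": [16400, 128000], "out_format": "7_1_4"},
}

print(missing_keys(config))          # list of missing required keys
print(json.dumps(config, indent=2))  # JSON text that could be saved as a config file
```

Running such a check before starting a long batch can catch an incomplete configuration early; the script itself may validate its input differently.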
### 2.3. Supported test conditions

`generate_test_items.py` can currently generate the following conditions.

| Supported conditions | Description                                               |
|:--------------------:|-----------------------------------------------------------|
|        ref           | Uncoded (reference)                                       |
|       lp3k5          | Uncoded low-passed at 3.5 kHz (anchor)                    |
|        lp7k          | Uncoded low-passed at 7 kHz (anchor)                      |
|      evs_mono        | Coded with multi-stream EVS codec, !!metadata not coded!! |
|        ivas          | Coded with IVAS codec                                     |


Multiple conditions for evs_mono and ivas can be specified by using underscore separators e.g. `"ivas_1" : {...}, "ivas_2" : {...}`
(also see `test_SBA.json` for an example)

---

### 2.4. Supported input/output/rendered audio formats

| spatial_audio_format                             | Input/Output/Rendered | Description                                    |
|--------------------------------------------------|----------------------|------------------------------------------------|
| MONO                                             |    yes/yes/yes       | mono signals                                   |
| STEREO                                           |    yes/yes/yes       | stereo signals                                 |
| ISM or ISMx                                      |    yes/no/no         | Objects with metadata, description using renderer metadata |
| MASA or MASAx                                    |    yes/no/no         | mono or stereo signals with spatial metadata !!!metadata must share same basename as waveform file but with .met extension!!! |
| FOA/HOA2/HOA3 or PLANAR(FOA/HOAx)                |    yes/yes/yes       | Ambisonic signals or planar ambisonic signals  |
| BINAURAL/BINAURAL_ROOM                           |    no/yes/yes        | Binaural signals                               |
| 5_1/5_1_2/5_1_4/7_1/7_1_4 or CICP[6/12/14/16/19] |    yes/yes/yes       | Multi-channel signals for predefined loudspeaker layout |
| META                                             |    yes/yes/no        | Audio scene described by a renderer config     |

---

### 2.5. Processing

The processing chain is as follows:

1. Preprocessing
   - **Condition**: `preproc_input == true`
   - Input files converted to `in_format`
2. Processing
   - **Condition**: Performed depending on key in `conditions_to_generate`
   - Coding/decoding from `in_format` to `out_format`
3. Postprocessing
   1. Rendering to `renderer_format`
      - **Condition**: `out_format != renderer_format`
      - output files converted from `out_format` to `renderer_format`
   1. Binaural Rendering
      - **Condition**: `binaural_rendered == true` and `out_format` is not a BINAURAL type
      - output files converted from `out_format` to `BINAURAL`

---
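The conditions in the processing chain can be summarised as a small decision function. This is only a sketch: `planned_stages` and its stage labels are hypothetical, and treating any format name starting with `BINAURAL` as a "BINAURAL type" is an assumption.

```python
def planned_stages(preproc_input, conditions, out_format, renderer_format, binaural_rendered):
    """List the stages that would run for one configuration (illustrative only)."""
    stages = []
    # 1. Preprocessing runs only when preproc_input is true.
    if preproc_input:
        stages.append("preprocessing")
    # 2. One processing pass per entry in conditions_to_generate.
    for condition in conditions:
        stages.append(f"processing:{condition}")
    # 3.1 Rendering is bypassed when the formats already match.
    if out_format != renderer_format:
        stages.append(f"render:{out_format}->{renderer_format}")
    # 3.2 Binaural rendering is skipped for outputs that are already binaural.
    if binaural_rendered and not out_format.startswith("BINAURAL"):
        stages.append(f"binauralize:{out_format}->BINAURAL")
    return stages


for stage in planned_stages(True, ["ref", "ivas"], "HOA3", "7_1_4", True):
    print(stage)
```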

### 2.6. Renderer Metadata definition

To run, the renderer requires a config file describing the input scene. The expected format of the config file is as follows:

---

- Line 1: Path to a "multitrack" audio file. This should be a single multichannel wav/pcm file that contains all input audio. For example, channels 1-4 can be an FOA scene, channel 5 an object and channels 6-11 a 5.1 channel bed. If the path is not absolute, it is considered relative to the renderer executable, not the config file. This path has lower priority than the one given on the command line: *The path in the config file is ignored if the --inputAudio argument to the renderer executable is specified.*

---

- Line 2: Contains the number of inputs. An input can be an Ambisonics scene, an object or a channel bed. This is NOT the total number of channels in the input audio file. The renderer currently supports simultaneously:
  - Up to 2 SBA inputs
  - Up to 2 MC inputs
  - Up to 16 ISM inputs

  These limits can be freely changed with pre-processor macros, if needed.

---
- Following lines: Define each of the inputs. Inputs can be listed in any order; they are NOT required to be listed in the same order as in the audio file.

  Input definitions: The first line of an input definition contains the input type: SBA, MC or ISM. The following lines depend on the input type:
  - SBA
    - Index of the first channel of this input in the multitrack file (1-indexed)
    - Ambisonics order
  - MC
    - Index of the first channel of this input in the multitrack file (1-indexed)
    - CICP index of the speaker layout
  - ISM
    - Index of this input's audio in the multitrack file (1-indexed)
    - Path to ISM metadata file (if not absolute, relative to executable location)
  - OR ISM
    - Index of this input's audio in the multitrack file (1-indexed)
    - Number N of positions defined, followed by N lines in the form: stay in position for x frames, azimuth, elevation (ISM position metadata defined this way is looped if there are more frames of audio than given positions)

---
Example config

The following example defines a scene with 4 inputs:
- ISM with trajectory defined in a separate file. Channel 12 in the input file.
- Ambisonics, order 1. Channels 1-4 in the input audio file.
- CICP6 channel bed. Channels 5-10 in the input audio file.
- ISM with 2 defined positions (-90,0) and (90,0). Channel 11 in the input file. The object will start at position (-90,0) and stay there for 5 frames, then move to (90,0) and stay there for 5 frames. This trajectory is looped over the duration of the input audio file.

```
./input_audio.wav
4
ISM
12
path/to/IVAS_ISM_metadata.csv
SBA
1
1
MC
5
6
ISM
11
2
5,-90,0
5,90,0
```
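To make the line-by-line layout concrete, here is a minimal parsing sketch for the scene description format. `parse_scene_config` is a hypothetical helper, not part of the renderer or of pyaudio3dtools. It assumes that a purely numeric third line of an ISM definition is a position count and that anything else is a metadata file path; the example text is the four-input scene described above.

```python
def parse_scene_config(text):
    """Parse a renderer scene description into a dict (illustrative only)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    audio_path, num_inputs = lines[0], int(lines[1])
    pos = 2
    inputs = []
    for _ in range(num_inputs):
        # Every definition starts with: type, 1-indexed channel, one more field.
        kind, first_channel, third = lines[pos], int(lines[pos + 1]), lines[pos + 2]
        pos += 3
        if kind == "SBA":
            inputs.append({"type": kind, "channel": first_channel, "order": int(third)})
        elif kind == "MC":
            inputs.append({"type": kind, "channel": first_channel, "cicp": int(third)})
        elif kind == "ISM" and third.isdigit():
            # Assumption: a numeric field is N, followed by N "frames,azimuth,elevation" lines.
            count = int(third)
            positions = [
                tuple(float(value) for value in lines[pos + i].split(","))
                for i in range(count)
            ]
            pos += count
            inputs.append({"type": kind, "channel": first_channel, "positions": positions})
        elif kind == "ISM":
            inputs.append({"type": kind, "channel": first_channel, "metadata_file": third})
        else:
            raise ValueError(f"unknown input type: {kind}")
    return {"audio": audio_path, "inputs": inputs}


EXAMPLE = """./input_audio.wav
4
ISM
12
path/to/IVAS_ISM_metadata.csv
SBA
1
1
MC
5
6
ISM
11
2
5,-90,0
5,90,0
"""

scene = parse_scene_config(EXAMPLE)
for item in scene["inputs"]:
    print(item)
```

The numeric-versus-path test is a simplification: a metadata file whose name consists only of digits would be misread as a position count.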

## 3. Script for converting formats and binauralizing

The script `audio3dtools.py` can convert between different input and output formats and binauralize signals.

Execute `python -m pyaudio3dtools.audio3dtools --help` for usage.

### 3.1. Binauralizing with head rotation

This example binauralizes an HOA3 signal with a head-rotation trajectory. Head rotation is performed in SHD. It is supported for HOA3 and META input formats. For the META input format, the audio scene is first prerendered to HOA3 and then rotated and binauralized.

```
python -m pyaudio3dtools.audio3dtools -i hoa3_input.wav  -o . -F BINAURAL -T .\trajectories\full_circle_in_15s
```

### 3.2. Generating binaural reference signals

Currently, MC input signals are supported. The reference processing can be activated by selecting BINAURAL[_ROOM]_REF as output format. The signals are generated by convolving the channels with the filters from the database that are closest to the current position of the virtual LS. Any interpolation method supported by numpy can be chosen for interpolating between the measured points along the trajectory.

```
python -m pyaudio3dtools.audio3dtools -i cicp6_input.wav  -o . -F BINAURAL_REF -T .\trajectories\full_circle_in_15s
```

### 3.3. Rendering ISM to Custom loudspeakers with auxiliary binaural output
ISM metadata can be specified either via an input text file in the Renderer Metadata definition format, or via the command line using the same style as IVAS:
```
python -m pyaudio3dtools.audio3dtools -i ism2.wav -f ISM2 -m ism1.csv NULL -F 7_1_4 -o . -b -T .\trajectories\full_circle_in_15s 
```
+0 −3
version https://git-lfs.github.com/spec/v1
oid sha256:b41a527b6ba22b4c100265655ca801ee4d2dba3c3e03dc58f7cc5d99e397d2c3
size 11795531
+0 −3
version https://git-lfs.github.com/spec/v1
oid sha256:081a9053c8b04831d97e6f18d641d4737b2c23b076778a9b41c7b3a41d954c32
size 6348446
+0 −3
version https://git-lfs.github.com/spec/v1
oid sha256:0544d1cf80a7cceb156760107d81b10fd787807bb0ea1e74e9aeb552474b3373
size 13233924

scripts/pyaudio3dtools/EFAP.py

deleted100644 → 0
+0 −929

File deleted.
