Commit 4a5cf87f authored by Devansh Kandpal

Merge branch '75-sba-to-binaural_room-rendering-does-not-use-custom-brirs' of https://forge.3gpp.org/rep/ivas-codec-pc/ivas-processing-scripts into 75-sba-to-binaural_room-rendering-does-not-use-custom-brirs
parents cd5b55a5 8e13546f
+283 −267
@@ -59,28 +59,28 @@ item generation and item processing. The two steps can be applied independent of

### Item generation

To set up the P800-{X} listening test (X = 1, 2, ...9) copy your mono input files to `experiments/selection/P800-{X}/gen_input/items_mono`.
These files have to follow the naming scheme `{l}{LL}p0{X}{name_of_item}` where 'l' stands for the listening lab designator: a (Force Technology),
b (HEAD acoustics), c (MQ University), d (Mesaqin.com), and 'LL' stands for the language: EN, GE, JP, MA, DK, FR.
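
A minimal sketch of how such a file name can be built and checked. `p800_item_name` and `parse_p800_item_name` are hypothetical helpers, not part of the repository; the item name part is taken verbatim, and including the `.wav` extension in it is an assumption made for illustration.

```python
import re

# Lab designators and language codes as listed in this README
LABS = {"a": "Force Technology", "b": "HEAD acoustics", "c": "MQ University", "d": "Mesaqin.com"}
LANGUAGES = ("EN", "GE", "JP", "MA", "DK", "FR")

# {l}{LL}p0{X}{name_of_item}
ITEM_NAME = re.compile(
    r"^(?P<lab>[a-d])(?P<lang>" + "|".join(LANGUAGES) + r")p0(?P<exp>[1-9])(?P<name>.+)$"
)


def p800_item_name(lab: str, language: str, experiment: int, name: str) -> str:
    """Build an input file name following the P800 naming scheme (hypothetical helper)."""
    if lab not in LABS:
        raise ValueError(f"unknown lab designator: {lab!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")
    if not 1 <= experiment <= 9:
        raise ValueError("experiment number must be between 1 and 9")
    return f"{lab}{language}p0{experiment}{name}"


def parse_p800_item_name(filename: str) -> dict:
    """Split a file name into lab, language, experiment number and item name."""
    match = ITEM_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the P800 naming scheme")
    parts = match.groupdict()
    parts["exp"] = int(parts["exp"])
    return parts
```

For example, a German item from HEAD acoustics for P800-2 would be named `bGEp02s01.wav` under these assumptions.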

The impulse responses have to be copied to `experiments/selection/P800-{X}/gen_input/IRs`.

To generate the items run `python -m ivas_processing_scripts.generation experiments/selection/P800-{X}/config/item_gen_P800-{X}_{l}.yml` from the root folder of the repository.
The resulting files can be found in `experiments/selection/P800-{X}/proc_input_{l}` sorted by category.

For P800-3 the input files for the processing are already provided by the listening lab. This means this step can be skipped.
For tests with ISM input format (P800-6 and P800-7), no IRs are needed, only mono sentences.

### Item processing

The input has to be in the folder `experiments/selection/P800-{X}/proc_input_{l}`. If item generation is performed previous to this step, the corresponding files are already in the right folder.
If this step is performed independently of the previous one, the input files have to be copied to the respective folder, sorted by category.

If the test includes background noise, the corresponding files have to be copied to `experiments/selection/P800-{X}/background_noise`.
For most tests the naming has to follow the scheme `background_noise_cat{c}.wav`, where 'c' denotes the category with a number between one and six. For the P800-2 test, the naming has to follow `background_noise_cat{c}-lab_{l}.wav`.
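
The expected location of a background noise file can be sketched as below. `background_noise_path` is a hypothetical helper that only encodes the two naming rules stated above; it is not how the repository itself locates these files.

```python
from pathlib import Path

SELECTION_ROOT = Path("experiments/selection")


def background_noise_path(experiment: str, category: int, lab: str) -> Path:
    """Expected path of the background noise file for one category (and lab)."""
    if not 1 <= category <= 6:
        raise ValueError("category must be between 1 and 6")
    if experiment == "P800-2":
        # P800-2 uses lab-specific background noise files
        name = f"background_noise_cat{category}-lab_{lab}.wav"
    else:
        name = f"background_noise_cat{category}.wav"
    return SELECTION_ROOT / experiment / "background_noise" / name
```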

To process the items run `python generate_test.py P800-{X},{l}` from the root folder of the repository.
The results can be found in `experiments/selection/P800-{X}/proc_output_{l}`.

For more information about this processing step see
[How to generate the configs and process items for the selection test experiments](#how-to-generate-the-configs-and-process-items-for-the-selection-test-experiments).
@@ -91,10 +91,10 @@ The set up for the MUSHRA test only consists of the item processing.

### Item processing

To process a BS1534-{X}{x} (X = 1, 2, ...7, x = a, b) listening test, the input files have to be placed in the folder `experiments/selection/BS1534-{X}{x}/proc_input_{l}` and the command
`python generate_test.py BS1534-{X}{x},{l}` has to be run from the root of the repository. 'l' stands for the listening lab designator: a (Force Technology), b (HEAD acoustics), c (MQ University), d (Mesaqin.com).

The output can then be found in `experiments/selection/BS1534-{X}{x}/proc_output_{l}`.

The BS1534-7a and BS1534-7b tests are MASA experiments with FOA and HOA2 inputs. Therefore the input folder contains two subfolders called 'FOA' and 'HOA2'.
The input files have to be placed in the folders according to their format.
@@ -332,7 +332,6 @@ input:
# tx:
  ### REQUIRED: Type of bitstream processing; possible types: "JBM" or "FER"
  #type: "JBM"
    
  ### JBM
  ### REQUIRED: either error_pattern or error_profile
  ### delay error profile file
@@ -341,7 +340,6 @@ input:
  # error_profile: 5
  ## nFramesPerPacket parameter for the network simulator (optional); default = 1
  # n_frames_per_packet: 2
    
  ### FER
  ### REQUIRED: either error_pattern or error_rate
  ### Frame error pattern file
@@ -349,8 +347,8 @@ input:
  ### Error rate in percent
  # error_rate: 5
```
</details>

</details>
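
The "either ... or ..." rules of the `tx` block can be checked with a small sketch like the following. `check_tx` is a hypothetical helper; it reads "either X or Y" as "exactly one of X and Y", which is an assumption, and the key names are taken from the commented example above.

```python
def check_tx(tx: dict) -> dict:
    """Check a `tx` block against the required-key rules of the example config."""
    alternatives = {
        "JBM": ("error_pattern", "error_profile"),
        "FER": ("error_pattern", "error_rate"),
    }
    tx_type = tx.get("type")
    if tx_type not in alternatives:
        raise ValueError('type must be "JBM" or "FER"')
    given = [key for key in alternatives[tx_type] if key in tx]
    if len(given) != 1:
        raise ValueError(f"{tx_type} needs exactly one of {alternatives[tx_type]}")
    checked = dict(tx)
    if tx_type == "JBM":
        # nFramesPerPacket for the network simulator defaults to 1
        checked.setdefault("n_frames_per_packet", 1)
    return checked
```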

### Configuration of conditions under test

@@ -513,7 +511,7 @@ postprocessing:
The following values may be used for the `type` key of a condition:

| Supported conditions | Description                                                 |
| :------------------: | ----------------------------------------------------------- |
|         ref          | Uncoded (reference)                                         |
|        lp3k5         | Uncoded low-passed at 3.5 kHz (anchor)                      |
|         lp7k         | Uncoded low-passed at 7 kHz (anchor)                        |
@@ -526,23 +524,34 @@ The following values may be used for the `type` key of a condition:
### Configuration of conditions

#### Reference ref

No arguments are required apart from the `type` key. Additional low-pass filtering can be specified with `out_fc`.

#### Low-pass Filter lp3k5 and lp7k

No arguments are required apart from the `type` key.

#### MNRU and ESDRU

The MNRU and ESDRU conditions each take one additional required argument. For MNRU the value `q`, which represents the ratio of speech power to modulated noise power in dB,
has to be specified.  
For the ESDRU the spatial degradation value `alpha` in the range [0, 1] has to be defined.
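
To illustrate what `q` means, here is a minimal sketch of the basic MNRU relation from ITU-T P.810, y(i) = x(i)(1 + 10^(-Q/20) N(i)) with unit-power white noise N. It is not a replacement for the `p50fbmnru` tool used by the scripts: band-limiting filters are omitted, and `mnru` and `speech_to_noise_db` are hypothetical helpers. ESDRU is not sketched here.

```python
import math
import random


def mnru(x, q_db, seed=0):
    """Apply y(i) = x(i) * (1 + 10**(-Q/20) * N(i)) with unit-variance Gaussian noise N."""
    rng = random.Random(seed)
    noise_gain = 10.0 ** (-q_db / 20.0)
    return [s * (1.0 + noise_gain * rng.gauss(0.0, 1.0)) for s in x]


def speech_to_noise_db(x, y):
    """Measured ratio of signal power to modulated-noise power in dB."""
    signal_power = sum(s * s for s in x) / len(x)
    noise_power = sum((b - a) ** 2 for a, b in zip(x, y)) / len(x)
    return 10.0 * math.log10(signal_power / noise_power)
```

Because the noise term is multiplied by the signal itself, its power scales with the signal power, so the measured ratio stays close to `q` regardless of the input level.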

#### Mono downmix mono_dmx

No arguments are required apart from the `type` key.

#### EVS

For EVS a list of at least one bitrate has to be specified with the key `bitrates`. The entries in this list can also be lists containing the bitrates used for the processing of the individual channels.
This configuration has to match the channel configuration. If the provided list is shorter, the last value will be repeated.
For the encoding stage `cod` and the decoding stage `dec`, the path to the IVAS_cod and IVAS_dec binaries can be specified under the key `bin`.
Additionally some resampling can be applied by using the key `fs` followed by the desired sampling rate.
The general bitstream processing configuration can be locally overwritten for each EVS and IVAS condition with the key `tx`.
The additional key `evs_lfe_9k6bps_nb` is only available for EVS conditions and ensures a bitrate of 9.6 kbps and narrowband processing of the LFE channel(s).
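
The per-channel bitrate rule can be sketched as follows. `expand_bitrates` is a hypothetical helper; that a single value applies to every channel, and that a per-channel list longer than the number of channels is an error, are assumptions not stated above.

```python
def expand_bitrates(bitrates, n_channels):
    """Expand each entry of `bitrates` into one bitrate per channel."""
    expanded = []
    for entry in bitrates:
        if isinstance(entry, (list, tuple)):
            if not entry or len(entry) > n_channels:
                raise ValueError(f"per-channel list {entry} does not match {n_channels} channels")
            # shorter list: the last value is repeated for the remaining channels
            per_channel = list(entry) + [entry[-1]] * (n_channels - len(entry))
        else:
            # single value: same bitrate for every channel (assumption)
            per_channel = [entry] * n_channels
        expanded.append(per_channel)
    return expanded
```

For a 4-channel item, `[13200, [9600, 24400]]` would yield one condition at 13.2 kbps on all channels and one with 9.6 kbps on the first channel and 24.4 kbps on the other three.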

#### IVAS

The configuration of the IVAS condition is similar to the EVS condition. However, only one bitrate for all channels (and metadata) can be specified.
In addition to that, the encoder and decoder take some additional arguments defined by the key `opts`.
For the decoder, an output format can be set. If this argument is not defined, the format specified in postprocessing is used.
@@ -557,12 +566,14 @@ postprocessing.
## Supported audio formats

| spatial_audio_format              | Input/Output/Rendered | Description                                                                                                                   |
| --------------------------------- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| MONO                              | yes/yes/yes          | mono signals                                                                                                                  |
| STEREO                            | yes/yes/yes          | stereo signals                                                                                                                |
| ISMx                              | yes/yes/no           | Objects with metadata, description using renderer metadata                                                                    |
| MASAx                             | yes/no/no            | mono or stereo signals with spatial metadata (the metadata file must have the same basename as the waveform file, with a `.met` extension) |
| ISMxMASAy                         | yes/no/no            | Combined objects with MASA                                                                                                    |
| FOA/HOA2/HOA3 or PLANAR(FOA/HOAx) | yes/yes/yes          | Ambisonic signals or planar ambisonic signals                                                                                 |
| ISMxSBAy                          | yes/no/no            | Combined objects with ambisonics                                                                                              |
| BINAURAL/BINAURAL_ROOM            | no/yes/yes           | Binaural signals                                                                                                              |
| 5_1/5_1_2/5_1_4/7_1/7_1_4         | yes/yes/yes          | Multi-channel signals for predefined loudspeaker layout                                                                       |
| META                              | yes/no/no            | Audio scene described by a renderer config (work in progress)                                                                 |
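
For the formats whose channel count follows directly from the name, the number of audio channels can be sketched as below. `num_audio_channels` is a hypothetical helper; combined formats (ISMxMASAy, ISMxSBAy), planar ambisonics and META are deliberately left out, and metadata is not counted.

```python
import re

FIXED_CHANNELS = {"MONO": 1, "STEREO": 2, "BINAURAL": 2, "BINAURAL_ROOM": 2, "FOA": 4}


def num_audio_channels(fmt: str) -> int:
    """Number of audio channels implied by a format name (metadata not counted)."""
    if fmt in FIXED_CHANNELS:
        return FIXED_CHANNELS[fmt]
    match = re.fullmatch(r"HOA([23])", fmt)
    if match:
        order = int(match.group(1))
        return (order + 1) ** 2  # full-sphere ambisonics
    match = re.fullmatch(r"(ISM|MASA)(\d)", fmt)
    if match:
        # ISMx: x objects, MASAx: x transport channels
        return int(match.group(2))
    if re.fullmatch(r"\d(_\d)+", fmt):
        # loudspeaker layouts, e.g. 7_1_4 -> 7 + 1 + 4 = 12
        return sum(int(part) for part in fmt.split("_"))
    raise ValueError(f"format {fmt!r} not covered by this sketch")
```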
@@ -582,27 +593,28 @@ The processing chain is as follows:
   - The postprocessing stage performs a final conversion from the output of the previous stage if necessary and applies the specified processing

---

## Additional Executables

The following additional executables are needed for the different processing steps:

| Processing step                                                                             | Executable                 | Where to find                                                                                                   |
| ------------------------------------------------------------------------------------------- | -------------------------- | --------------------------------------------------------------------------------------------------------------- |
| Loudness measurement and adjustment                                                         | bs1770demo                 | <https://github.com/ErikNorvell-Ericsson/STL> (Note branch)                                                     |
| MNRU                                                                                        | p50fbmnru                  | <https://github.com/openitu/STL>                                                                                |
| ESDRU                                                                                       | esdru                      | <https://github.com/openitu/STL>                                                                                |
| Frame error pattern application                                                             | eid-xor                    | <https://github.com/openitu/STL>                                                                                |
| Error pattern generation                                                                    | gen-patt                   | <https://www.itu.int/rec/T-REC-G.191-201003-S/en> (Note: Version in <https://github.com/openitu/STL> is buggy!) |
| Reverberation module                                                                        | reverb                     | <https://github.com/openitu/STL>                                                                                |
| Filtering, Resampling                                                                       | filter                     | <https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip>                                         |
| Random offset/seed generation (necessary for background noise and FER bitstream processing) | random                     | <https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip>                                         |
| JBM network simulator                                                                       | networkSimulator_g192      | <https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip>                                         |
| MASA rendering (also used in loudness measurement of MASA items)                            | masaRenderer, masaAnalyzer | <https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip>                                 |
| EVS reference conditions                                                                    | EVS_cod, EVS_dec           | <https://www.3gpp.org/ftp/Specs/archive/26_series/26.443/26443-h00.zip>                                         |

The necessary binaries have to be either placed in the [ivas_processing_scripts/bin](./ivas_processing_scripts/bin) folder or the path has to be specified in
[ivas_processing_scripts/binary_paths.yml](./ivas_processing_scripts/binary_paths.yml).
For most tools it is sufficient to copy the binaries; for the MASA renderer, the associated \*.bin files have to be added as well.
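
A minimal sketch of the two lookup options, assuming `binary_paths.yml` has already been loaded into a dictionary (e.g. with a YAML parser) and that an explicit entry takes precedence over the `bin` folder. `find_binary` and this precedence order are assumptions, not the repository's actual lookup code; platform-specific suffixes such as `.exe` are ignored.

```python
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_BIN_DIR = Path("ivas_processing_scripts/bin")


def find_binary(name: str, binary_paths: Mapping[str, Optional[str]],
                bin_dir: Path = DEFAULT_BIN_DIR) -> Optional[Path]:
    """Return the path of an executable, or None if it cannot be found."""
    configured = binary_paths.get(name)
    if configured:
        # explicit entry from binary_paths.yml (assumed to take precedence)
        return Path(configured)
    candidate = bin_dir / name
    if candidate.is_file():
        return candidate
    return None
```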

---

@@ -651,26 +663,26 @@ First line of an input definition contains the input type: SBA, MC or ISM.

Following lines depend on the input type:

```text
SBA
Index of the first channel of this input in the multitrack file (1-indexed)
Ambisonics order
```

```text
MC
Index of the first channel of this input in the multitrack file (1-indexed)
Name of speaker layout (X_Y_Z or CICPx format)
```

```text
MASA
Index of the first channel of this input in the multitrack file (1-indexed)
Number of transport channels
Path to MASA metadata file (must be relative to config file location)
```

```text
ISM
Index of this input's audio in the multitrack file (1-indexed)
Path to ISM metadata file (must be relative to config file location)
@@ -678,7 +690,7 @@ Path to ISM metadata file (must be relative to config file location)

OR

```text
ISM
Index of this input's audio in the multitrack file (1-indexed)
Number N of positions defined, followed by N lines in form:
@@ -699,7 +711,7 @@ Each key-value pair should be placed on a separate line.
Currently the following key-value pairs are supported:

| key     | value type |
| ------- | ---------- |
| gain_dB | float      |
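
The line-based input definitions described above (SBA, MC, MASA and the file-based ISM variant) can be parsed with a sketch like the following. `parse_input_definition` is a hypothetical helper that assumes each definition has already been split out into its own list of lines; key-value lines such as `gain_dB` and the trajectory-based ISM variant are not handled.

```python
def parse_input_definition(lines):
    """Parse one SBA, MC, MASA or file-based ISM input definition."""
    fields = [line.strip() for line in lines if line.strip()]
    kind = fields[0]
    first = int(fields[1])  # 1-indexed channel in the multitrack file
    if kind == "SBA":
        order = int(fields[2])
        return {"type": "SBA", "first_channel": first, "order": order,
                "num_channels": (order + 1) ** 2}
    if kind == "MC":
        return {"type": "MC", "first_channel": first, "layout": fields[2]}
    if kind == "MASA":
        return {"type": "MASA", "first_channel": first,
                "num_transport_channels": int(fields[2]), "metadata": fields[3]}
    if kind == "ISM":
        # only the variant with a metadata file path is handled here
        return {"type": "ISM", "first_channel": first, "metadata": fields[2]}
    raise ValueError(f"unknown input type: {kind!r}")
```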

## Example configuration
@@ -715,7 +727,7 @@ The following example defines a scene with 4 inputs:
  move to (90,0) and stay there for 5 frames. This trajectory is looped over the
  duration of the input audio file.

```text
./input_audio.wav
4
ISM
@@ -745,7 +757,8 @@ Please refer to [the notebook](./examples/audiotools.ipynb) for an overview.
# How to generate the configs and process items for the selection test experiments

The script `generate_test.py` is used to generate config files and process items for the selection test experiments:

```text
usage: generate_test.py [-h] [--no_parallel] [--create_cfg_only] exp_lab_pairs [exp_lab_pairs ...]

Generate config files and process files for selecton experiments. Experiment names and lab ids must be given as comma-separated pairs (e.g. 'P800-5,b BS1534-4a,d ...')
@@ -758,13 +771,16 @@ options:
  --no_parallel      If given, configs will not be run in parallel
  --create_cfg_only  If given, only create the configs and folder structure without processing items
```

Before running the script, the input files have to be placed in the respective input folder (including the background noise files, see below). If input files are missing, the script will complain and stop. For example, to process tests P800-3 and BS1534-4a for labs b and d, respectively, the command line would look like this (no whitespace around the commas!):

```bash
python3 generate_test.py P800-3,b BS1534-4a,d
```
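
The reason for the whitespace rule is that the shell splits `P800-3, b` into two separate arguments. The pair format can be sketched as below; `parse_exp_lab_pairs` and `proc_input_folder` are hypothetical helpers and not the parsing code inside `generate_test.py`.

```python
from pathlib import Path

SELECTION_ROOT = Path("experiments/selection")
LAB_IDS = ("a", "b", "c", "d")


def parse_exp_lab_pairs(pairs):
    """Split arguments such as 'P800-3,b' into (experiment, lab) tuples."""
    parsed = []
    for pair in pairs:
        experiment, sep, lab = pair.partition(",")
        if not sep or not experiment or any(ch.isspace() for ch in pair):
            raise ValueError(f"expected 'EXPERIMENT,LAB' without whitespace, got {pair!r}")
        if lab not in LAB_IDS:
            raise ValueError(f"unknown lab id {lab!r} in {pair!r}")
        parsed.append((experiment, lab))
    return parsed


def proc_input_folder(experiment, lab):
    """Folder holding the processing input for one experiment/lab pair."""
    return SELECTION_ROOT / experiment / f"proc_input_{lab}"
```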

Tests are processed separately per category and per lab (as some values in the configs depend on category and lab). For each experiment, a static base config is stored from which the actual configs are generated (identified by the suffix `catX-lab_Y.yml`). For P800 tests, there are 6 categories each. The BS1534 experiments do not define categories, except for the MASA ones (BS1534-7a/b): there, FOA and HOA2 input material may be mixed, so there are 2 categories for those in the scripts (category 1 for FOA, category 2 for HOA2). In `experiments/selection/` there is a folder structure prepared for all selection experiments, in which you have to put the input files for your test. For example, for P800-1:

```text
experiments/selection/P800-1/
├── background_noise    <--- put your background noise files in here and name them background_noise_catX.wav. Not all experiments use background noise
├── config              <--- contains base config, generated configs will be stored here, too
+41 −0
# Binaural Datasets

Files in this directory should contain impulse responses for use in rendering, in Matlab `.mat` format.
A sampling rate of 48kHz is assumed.

## Naming scheme

Files should adhere to the following naming scheme:

`{HRIR|BRIR}_{DATASETNAME}_{FULL|LS|SBA(1-3)}.mat`

- `HRIR or BRIR`  
  specifies the type of impulse response which will be used for  
  either `BINAURAL` or `BINAURAL_ROOM` output respectively
- `DATASETNAME`  
  specifies the name used with the `binaural_dataset` command-line argument  
  or YAML key to enable selection of this dataset
- `FULL or LS or SBA{1,2,3}`  
  specifies the subset of impulse responses in the file:
  - `FULL`: all available measurements on the sphere
  - `LS`: superset of supported loudspeaker layouts  
    _(see `audiotools.constants.CHANNEL_BASED_AUDIO_FORMATS["LS"]`)_
  - `SBA{1,2,3}`: impulse responses transformed to ambisonics by external conversion
    - if available SBA1 is used for FOA, SBA2 for HOA2 and SBA3 for HOA3
    - if not available SBA3 is used and truncated for all Ambisonic formats
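
The naming scheme and the SBA fallback rule can be sketched as follows. `dataset_filename` and `select_sba_dataset` are hypothetical helpers, the dataset name `custom` is made up, and "truncated" is interpreted here as keeping the first (order + 1)^2 channels, which assumes ACN channel ordering.

```python
AMBISONIC_ORDER = {"FOA": 1, "HOA2": 2, "HOA3": 3}
IR_TYPE = {"BINAURAL": "HRIR", "BINAURAL_ROOM": "BRIR"}


def dataset_filename(ir_type, dataset, subset):
    """{HRIR|BRIR}_{DATASETNAME}_{FULL|LS|SBA1|SBA2|SBA3}.mat"""
    if ir_type not in ("HRIR", "BRIR"):
        raise ValueError(f"unknown IR type: {ir_type!r}")
    if subset not in ("FULL", "LS", "SBA1", "SBA2", "SBA3"):
        raise ValueError(f"unknown subset: {subset!r}")
    return f"{ir_type}_{dataset}_{subset}.mat"


def select_sba_dataset(sba_format, output_format, dataset, available_files):
    """Pick the IR file for an ambisonic input and the number of IR channels to use."""
    order = AMBISONIC_ORDER[sba_format]
    ir_type = IR_TYPE[output_format]
    n_channels = (order + 1) ** 2
    for subset in (f"SBA{order}", "SBA3"):
        filename = dataset_filename(ir_type, dataset, subset)
        if filename in available_files:
            # with the SBA3 fallback only the first (order + 1)^2 channels are kept
            return filename, n_channels
    raise FileNotFoundError(f"no SBA dataset for {sba_format} in {dataset!r}")
```

For `BINAURAL_ROOM` output of an HOA2 item with only `BRIR_custom_SBA3.mat` present, this returns that file together with 9 channels.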

## File Contents

Each Matlab file should contain the following variables:

- `IR`  
  Impulse responses with dimensions [ir_length x n_ears x n_channels]
- `SourcePosition`  
  array of {azimuth, elevation, radius} of dimensions [n_channels x 3]  
  required for FULL, optional otherwise
- `latency_s`  
  latency of the dataset in samples  
  optional, will be estimated if not provided
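
A sketch of the documented shape requirements. `check_ir_variables` is a hypothetical helper that takes the array shapes, e.g. the `.shape` of each array in the dictionary returned by `scipy.io.loadmat`. Requiring exactly two ears is an assumption for binaural IRs, and `latency_s` is not checked because it is optional.

```python
def check_ir_variables(variables, subset):
    """Check variable shapes of a dataset file; return the number of IR channels.

    `variables` maps variable names to array shapes, e.g.
    {"IR": (ir_length, n_ears, n_channels), "SourcePosition": (n_channels, 3)}.
    """
    if "IR" not in variables:
        raise ValueError("IR is required")
    ir_shape = tuple(variables["IR"])
    if len(ir_shape) != 3:
        raise ValueError("IR must have dimensions [ir_length x n_ears x n_channels]")
    _, n_ears, n_channels = ir_shape
    if n_ears != 2:
        # assumption: binaural IRs have one channel per ear
        raise ValueError("expected 2 ears")
    position_shape = variables.get("SourcePosition")
    if position_shape is None:
        if subset == "FULL":
            raise ValueError("SourcePosition is required for FULL datasets")
    elif tuple(position_shape) != (n_channels, 3):
        raise ValueError("SourcePosition must have dimensions [n_channels x 3]")
    return n_channels
```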

LICENSES:  
Please see [HRIR.txt](../../../thirdPartyLegalNotices/HRIR.txt) and [BRIR.txt](../../../thirdPartyLegalNotices/BRIR.txt) for license info
+0 −34
Files in this directory should contain impulse responses for use in rendering in Matlab .mat format
Samplingrate of 48kHz is assumed

Files should adhere to the following naming scheme:

{HRIR|BRIR}_{DATASETNAME}_{FULL|LS|SBA(1-3)}.mat

- HRIR or BRIR
    specifies the type of impulse response which will be used
    for either BINAURAL or BINAURAL_ROOM output respectively
- DATASETNAME
    specifies the name used with the binaural_dataset commandline argument
    or YAML key to enable selection of this dataset
- FULL or LS or SBA3
    specifies the subset of impulse responses in the file:
    FULL:       all available measurements on the sphere
    LS:         superset of supported loudspeaker layouts
                (see audiotools.constants.CHANNEL_BASED_AUDIO_FORMATS["LS""])
    SBA(1-3):   impulse responses transformed to ambisonics by external conversion
                if available SBA1 is used for FOA, SBA2 for HOA2 and SBA3 for HOA3
                if not available SBA3 is used and truncated for all Ambisonic formats

Each Matlab file should contain the following variables:
- IR
    Impulse responses with dimensions [ir_length x n_ears x n_channels]
- SourcePosition 
    array of {azimuth, elevation, radius} of dimensions [n_channels x 3]
    required for FULL, optional otherwise
- latency_s
    latency of the dataset in samples
    optional, will be estimated if not provided
    
LICENSES:
Please see HRIR.txt and BRIR.txt for license info
 No newline at end of file
+1 −1
@@ -113,7 +113,7 @@ def render_oba_to_binaural(

        render_oba_to_cba(oba, cba_tmp)

        render_cba_to_binaural(cba_tmp, bin, trajectory, bin_dataset, **kwargs)
    else:
        IR, SourcePosition, latency_smp = load_ir(oba.name, bin.name, bin_dataset)

+1 −1
@@ -61,7 +61,7 @@ def convert_omasa(
        omasa.audio[:, : omasa.num_ism_channels],
        omasa.fs,
    )
    oba.metadata_files = omasa.metadata_files[:-1]
    oba.object_pos = copy(omasa.object_pos)
    masa = audio.fromarray(
        "MASA"