diff --git a/README.md b/README.md index 4ee1686015acf91454571afdbd32f889b01e0947..43efa62e7ca76203fa0b4aefe38cff8f886946a6 100755 --- a/README.md +++ b/README.md @@ -59,30 +59,30 @@ item generation and item processing. The two steps can be applied independent of ### Item generation -To set up the P800-{X} listening test (X = 1, 2, ...9) copy your mono input files to 'experiments/selection/P800-{X}/gen_input/items_mono'. -These files have to follow the naming scheme '{l}{LL}p0{X}{name_of_item}' where 'l' stands for the listening lab designator: a (Force Technology), +To set up the P800-{X} listening test (X = 1, 2, ...9) copy your mono input files to `experiments/selection/P800-{X}/gen_input/items_mono`. +These files have to follow the naming scheme `{l}{LL}p0{X}{name_of_item}` where 'l' stands for the listening lab designator: a (Force Technology), b (HEAD acoustics), c (MQ University), d (Mesaqin.com), and 'LL' stands for the language: EN, GE, JP, MA, DK, FR. The impulse responses have to be copied to experiments/selection/P800-{X}/gen_input/IRs. To generate the items run `python -m ivas_processing_scripts.generation experiments/selection/P800-{X}/config/item_gen_P800-{X}_{l}.yml` from the root folder of the repository. -The resulting files can be found in 'experiments/selection/P800-{X}/proc_input_{l}' sorted by category. +The resulting files can be found in `experiments/selection/P800-{X}/proc_input_{l}` sorted by category. For P800-3 the input files for the processing are already provided by the listening lab. This means this step can be skipped. For tests with ISM input format (P800-6 and P800-7) no IRs are needed, only mono sentences. ### Item processing -The input has to be in the folder 'experiments/selection/P800-{X}/proc_input_{l}'. If item generation is performed previous to this step, the corresponding files are already in the right folder. +The input has to be in the folder `experiments/selection/P800-{X}/proc_input_{l}`. 
If item generation is performed previous to this step, the corresponding files are already in the right folder. If this step is performed independently of the previous one the input files have to be copied to the respective folder sorted by category. -If the test includes background noise, the corresponding files have to be copied to 'experiments/selection/P800-{X}/background_noise'. -For most tests the naming has to follow the scheme 'background_noise_cat{c}.wav' where 'c' denotes the category with a number between one and six. For the P800-2 test, the naming has to follow 'background_noise_cat{c}-lab_{l}.wav' +If the test includes background noise, the corresponding files have to be copied to `experiments/selection/P800-{X}/background_noise`. +For most tests the naming has to follow the scheme `background_noise_cat{c}.wav` where 'c' denotes the category with a number between one and six. For the P800-2 test, the naming has to follow `background_noise_cat{c}-lab_{l}.wav` To process the items run `python generate_test.py P800-{X},{l}` from the root folder of the repository. -The results can be found in 'experiments/selection/P800-{X}/proc_output_{l}'. +The results can be found in `experiments/selection/P800-{X}/proc_output_{l}`. -For more information about this processing step see +For more information about this processing step see [How to generate the configs and process items for the selection test experiments](#how-to-generate-the-configs-and-process-items-for-the-selection-test-experiments). ## MUSHRA @@ -91,10 +91,10 @@ The set up for the MUSHRA test only consists of the item processing. 
### Item processing -To process a BS1534-{X}{x} (X = 1, 2, ...7, x = a, b) listening test, the input files have to be placed in the folder 'experiments/selection/BS1534-{X}{x}/proc_input_{l}' and the command +To process a BS1534-{X}{x} (X = 1, 2, ...7, x = a, b) listening test, the input files have to be placed in the folder `experiments/selection/BS1534-{X}{x}/proc_input_{l}` and the command `python generate_test.py BS1534-{X}{x},{l}` has to be run from the root of the repository. 'l' stands for the listening lab designator: a (Force Technology), b (HEAD acoustics), c (MQ University), d (Mesaqin.com). -The output can then be found in 'experiments7selection/BS1534-{X}{x}/proc_output_{l}' +The output can then be found in `experiments/selection/BS1534-{X}{x}/proc_output_{l}`. The BS1534-7a and BS1534-7b tests are MASA experiments with FOA and HOA2 inputs. Therefore the input folder contains two subfolders called 'FOA' and 'HOA2'. The input files have to be placed in the folders according to their format. 
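The folder layout and command for one BS1534 test can be sketched in Python as follows. This is only an illustration of the paths named in the text above; the helper `bs1534_layout` and its `root` parameter are hypothetical and are not part of the repository.

```python
from pathlib import Path

# Lab designators as listed in the README text.
LABS = {
    "a": "Force Technology",
    "b": "HEAD acoustics",
    "c": "MQ University",
    "d": "Mesaqin.com",
}


def bs1534_layout(test: str, lab: str, root: str = "experiments/selection"):
    """Return (input folders, output folder, command) for one BS1534 test and lab.

    `test` is the suffix after "BS1534-", e.g. "4a" or "7b" (hypothetical helper).
    """
    if lab not in LABS:
        raise ValueError(f"unknown lab designator: {lab!r}")
    base = Path(root) / f"BS1534-{test}"
    proc_input = base / f"proc_input_{lab}"
    # The MASA experiments (7a, 7b) keep FOA and HOA2 items in separate subfolders.
    if test in ("7a", "7b"):
        inputs = [proc_input / "FOA", proc_input / "HOA2"]
    else:
        inputs = [proc_input]
    output = base / f"proc_output_{lab}"
    command = f"python generate_test.py BS1534-{test},{lab}"
    return inputs, output, command
```

Calling `bs1534_layout("7a", "b")` yields the two input subfolders, the output folder, and the command string to run from the repository root.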
@@ -143,33 +143,33 @@ input_path: "~/ivas/items/HOA3" output_path: "./tmp_output" input: - fmt: "HOA3" + fmt: "HOA3" conditions_to_generate: c01: - type: ref + type: ref c02: - type: lp3k5 + type: lp3k5 c03: - type: ivas - bitrates: - - 160000 - cod: - bin: ~/git/ivas-codec/IVAS_cod - dec: - bin: ~/git/ivas-codec/IVAS_dec - fmt: HOA3 + type: ivas + bitrates: + - 160000 + cod: + bin: ~/git/ivas-codec/IVAS_cod + dec: + bin: ~/git/ivas-codec/IVAS_dec + fmt: HOA3 c05: - type: evs - bitrates: - - 9600 - cod: - bin: ~/git/ivas-codec/EVS_cod - dec: - bin: ~/git/ivas-codec/EVS_dec + type: evs + bitrates: + - 9600 + cod: + bin: ~/git/ivas-codec/EVS_cod + dec: + bin: ~/git/ivas-codec/EVS_dec postprocessing: - fmt: "BINAURAL" - fs: 48000 + fmt: "BINAURAL" + fs: 48000 ``` @@ -189,7 +189,7 @@ postprocessing: ### Whether to use multiprocessing; default = true # multiprocessing: false -### Deletion of temporary directories containing +### Deletion of temporary directories containing ### intermediate processing files, bitstreams etc.; default = false # delete_tmp: true ### Master seed for random processes like bitstream error pattern generation; default = 0 @@ -212,18 +212,18 @@ output_path: "./tmp_output" ### If input format is ISM{1-4} a path for the metadata files can be specified; ### default = null (for ISM search for item_name.{wav, raw, pcm}.{0-3}.csv in input folder, otherwise ignored) # metadata_path: - ### Path can be set for all items with the 'all_items' key (automatic search for item_name.{wav, raw, pcm}.{0-3}.csv within this folder) - # all_items: ".../metadata_folder" - ### Path can be set for all items individually with 'item{1-4}' keys - ### 'item{1-4}' keys can also be renamed to the input file names including extension {wav, raw, pcm} - ### Either list individual files for all objects or name folder for automatic search for one item - # item1: - # - ".../meta_all_obj" - # item2: - # - ".../meta_obj1.csv" - # - ".../meta_ob2.csv" - # noise.wav: - # - 
".../metadata_folder_for_noise_item" + ### Path can be set for all items with the 'all_items' key (automatic search for item_name.{wav, raw, pcm}.{0-3}.csv within this folder) + # all_items: ".../metadata_folder" + ### Path can be set for all items individually with 'item{1-4}' keys + ### 'item{1-4}' keys can also be renamed to the input file names including extension {wav, raw, pcm} + ### Either list individual files for all objects or name folder for automatic search for one item + # item1: + # - ".../meta_all_obj" + # item2: + # - ".../meta_obj1.csv" + # - ".../meta_ob2.csv" + # noise.wav: + # - ".../metadata_folder_for_noise_item" ### Select only a subset of items ### searches for the specified substring in found filenames; default = null @@ -244,10 +244,10 @@ output_path: "./tmp_output" ```yaml input: - ### REQUIRED: Input format - fmt: "HOA3" - ### Input sampling rate in Hz needed for headerless audio files; default = 48000 - # fs: 32000 + ### REQUIRED: Input format + fmt: "HOA3" + ### Input sampling rate in Hz needed for headerless audio files; default = 48000 + # fs: 32000 ``` @@ -261,27 +261,27 @@ input: ### Pre-processing step performed prior to core processing for all conditions ### If not defined, preprocessing step is skipped # preprocessing: - ### Target format used in rendering from input format; default = null (no rendering) - # fmt: "7_1_4" - ### Define mask (HP50 or 20KBP) for input signal filtering; default = null - # mask: "HP50" - ### Target sampling rate in Hz for resampling; default = null (no resampling) - # fs: 16000 - ### Target loudness in LKFS; default = null (no loudness change applied) - # loudness: -26 - ### Spatial audio format in which loudness is adjusted (only used if preprocessing loudness is not null); - ### default = null (uses postprocessing fmt) - # loudness_fmt: "BINAURAL" - ### Pre-/post-trim individual signal(s) (ms) (negative values pad silence); default = 0 - # trim: - # - 50 - # - -50 - ### Flag for using noise 
(amplitude +-4) instead of silence for padding; default = false (silence) - # pad_noise: true - ### Value for application of delay (ms) (negative values advance); default = 0 - # delay: 20 - ### Length of window used at start/end of signal (ms); default = 0 - # window: 100 + ### Target format used in rendering from input format; default = null (no rendering) + # fmt: "7_1_4" + ### Define mask (HP50 or 20KBP) for input signal filtering; default = null + # mask: "HP50" + ### Target sampling rate in Hz for resampling; default = null (no resampling) + # fs: 16000 + ### Target loudness in LKFS; default = null (no loudness change applied) + # loudness: -26 + ### Spatial audio format in which loudness is adjusted (only used if preprocessing loudness is not null); + ### default = null (uses postprocessing fmt) + # loudness_fmt: "BINAURAL" + ### Pre-/post-trim individual signal(s) (ms) (negative values pad silence); default = 0 + # trim: + # - 50 + # - -50 + ### Flag for using noise (amplitude +-4) instead of silence for padding; default = false (silence) + # pad_noise: true + ### Value for application of delay (ms) (negative values advance); default = 0 + # delay: 20 + ### Length of window used at start/end of signal (ms); default = 0 + # window: 100 ``` @@ -293,28 +293,28 @@ input: ```yaml # preprocessing_2: - ### Options for processing of the concatenated item (concatenate_input: true) or - ### the individual items (concatenate_input: false) after previous pre-processing step - ### Horizontally concatenate input items into one long file; default = false - # concatenate_input: true - ### Specify the concatenation order in a list of strings. 
If not specified, the concatenation order would be - ### as per the filesystem on the users' device - ### Should only be used if concatenate_input = true - # concatenation_order: [] - ### Specify preamble duration in ms; default = 0 - # preamble: 10000 - ### Flag wheter to use noise (amplitude +-4) for the preamble or silence; default = false (silence) - # preamble_noise: true - ### Additive background noise - # background_noise: - ### SNR for background noise in dB; REQUIRED for prerecorded background noise and ignored for low level noise - # snr: 10 - ### REQUIRED: Either background noise path or low level noise flag - ### Path to background noise, must have same format and sampling rate as input signal(s); default = null - # background_noise_path: ".../noise.wav" - ### Flag for using low level [-4,+4] background noise; default = false - # low_level_noise: true - ### Flag for repeating the whole signal once and discarding the first half after processing + ### Options for processing of the concatenated item (concatenate_input: true) or + ### the individual items (concatenate_input: false) after previous pre-processing step + ### Horizontally concatenate input items into one long file; default = false + # concatenate_input: true + ### Specify the concatenation order in a list of strings. 
If not specified, the concatenation order would be + ### as per the filesystem on the users' device + ### Should only be used if concatenate_input = true + # concatenation_order: [] + ### Specify preamble duration in ms; default = 0 + # preamble: 10000 + ### Flag whether to use noise (amplitude +-4) for the preamble or silence; default = false (silence) + # preamble_noise: true + ### Additive background noise + # background_noise: + ### SNR for background noise in dB; REQUIRED for prerecorded background noise and ignored for low level noise + # snr: 10 + ### REQUIRED: Either background noise path or low level noise flag + ### Path to background noise, must have same format and sampling rate as input signal(s); default = null + # background_noise_path: ".../noise.wav" + ### Flag for using low level [-4,+4] background noise; default = false + # low_level_noise: true + ### Flag for repeating the whole signal once and discarding the first half after processing # repeat_signal: true ``` @@ -330,27 +330,25 @@ input: ### e.g. 
frame error insertion or transport simulation for JBM testing ### can be given globally or in individual conditions of type ivas or evs # tx: - ### REQUIRED: Type of bitstream processing; possible types: "JBM" or "FER" - #type: "JBM" - - ### JBM - ### REQUIRED: either error_pattern or error_profile - ### delay error profile file - # error_pattern: ".../dly_error_profile.dat" - ### Index of one of the existing delay error profile files to use (1-11) - # error_profile: 5 - ## nFramesPerPacket parameter for the network simulator (optional); default = 1 - # n_frames_per_packet: 2 - - ### FER - ### REQUIRED: either error_pattern or error_rate - ### Frame error pattern file - # error_pattern: "path/pattern.192" - ### Error rate in percent - # error_rate: 5 + ### REQUIRED: Type of bitstream processing; possible types: "JBM" or "FER" + #type: "JBM" + ### JBM + ### REQUIRED: either error_pattern or error_profile + ### delay error profile file + # error_pattern: ".../dly_error_profile.dat" + ### Index of one of the existing delay error profile files to use (1-11) + # error_profile: 5 + ## nFramesPerPacket parameter for the network simulator (optional); default = 1 + # n_frames_per_packet: 2 + ### FER + ### REQUIRED: either error_pattern or error_rate + ### Frame error pattern file + # error_pattern: "path/pattern.192" + ### Error rate in percent + # error_rate: 5 ``` - + ### Configuration of conditions under test @@ -372,103 +370,103 @@ input: conditions_to_generate: ### Reference and anchor conditions ########################## c01: - ### REQUIRED: type of condition - type: ref - ### optional low-pass cut-off frequency in Hz; default = null - # out_fc: 22500 + ### REQUIRED: type of condition + type: ref + ### optional low-pass cut-off frequency in Hz; default = null + # out_fc: 22500 c02: - ### REQUIRED: type of condition - type: lp3k5 + ### REQUIRED: type of condition + type: lp3k5 c03: - ### REQUIRED: type of condition - type: mnru - ### REQUIRED: the ratio of speech 
power to modulated noise power in dB - q: 20 + ### REQUIRED: type of condition + type: mnru + ### REQUIRED: the ratio of speech power to modulated noise power in dB + q: 20 c04: - ### REQUIRED: type of condition - type: esdru - ### REQUIRED: spatial degradation value between 0 and 1 - alpha: 0.5 + ### REQUIRED: type of condition + type: esdru + ### REQUIRED: spatial degradation value between 0 and 1 + alpha: 0.5 c05: - ### REQUIRED: type of condition - type: mono_dmx - + ### REQUIRED: type of condition + type: mono_dmx + ### IVAS condition ############################### c06: - ### REQUIRED: type of condition - type: ivas - ### REQUIRED: Bitrates to use for coding - bitrates: - - 160000 - # - 32000 - ### Encoder options - cod: - ### Path to encoder binary; default search for IVAS_cod in bin folder (primary) and PATH (secondary) - bin: ~/git/ivas-codec/IVAS_cod - ### Encoder input sampling rate in Hz (resampling performed in case of mismatch); default = null (no resampling) - # fs: 32000 - ### Additional commandline options; default = null - # opts: ["-q", "-dtx", 4] - ### Decoder options - dec: - ### Path to decoder binary; default search for IVAS_dec in bin folder (primary) and PATH (secondary) - bin: ~/git/ivas-codec/IVAS_dec - ### Decoder output format; default = postprocessing fmt - fmt: "HOA3" - ### Decoder output sampling rate; default = null (same as input) - # fs: 48000 - ### Additional commandline options; default = null - # opts: ["-q", "-no_delay_cmp"] + ### REQUIRED: type of condition + type: ivas + ### REQUIRED: Bitrates to use for coding + bitrates: + - 160000 + # - 32000 + ### Encoder options + cod: + ### Path to encoder binary; default search for IVAS_cod in bin folder (primary) and PATH (secondary) + bin: ~/git/ivas-codec/IVAS_cod + ### Encoder input sampling rate in Hz (resampling performed in case of mismatch); default = null (no resampling) + # fs: 32000 + ### Additional commandline options; default = null + # opts: ["-q", "-dtx", 4] + ### 
Decoder options + dec: + ### Path to decoder binary; default search for IVAS_dec in bin folder (primary) and PATH (secondary) + bin: ~/git/ivas-codec/IVAS_dec + ### Decoder output format; default = postprocessing fmt + fmt: "HOA3" + ### Decoder output sampling rate; default = null (same as input) + # fs: 48000 + ### Additional commandline options; default = null + # opts: ["-q", "-no_delay_cmp"] ### IVAS condition ############################### c07: - ### REQUIRED: type of condition - type: ivas - ### REQUIRED: Bitrates to use for coding - bitrates: - - 160000 - # - 32000 - ### Encoder options - cod: - ### Path to encoder binary; default search for IVAS_cod in bin folder (primary) and PATH (secondary) - bin: ~/git/ivas-codec/IVAS_cod - ### Encoder input sampling rate in Hz (resampling performed in case of mismatch); default = null (no resampling) - # fs: 32000 - ### Additional commandline options; default = null - # opts: ["-q", "-dtx", 4] - ### Decoder options - dec: - ### Path to decoder binary; default search for IVAS_dec in bin folder (primary) and PATH (secondary) - bin: ~/git/ivas-codec/IVAS_dec - ### Decoder output format; default = postprocessing fmt - fmt: "7_1_4" - ### Decoder output sampling rate; default = null (same as input) - # fs: 48000 - ### Additional commandline options; default = null - # opts: ["-q", "-no_delay_cmp"] - + ### REQUIRED: type of condition + type: ivas + ### REQUIRED: Bitrates to use for coding + bitrates: + - 160000 + # - 32000 + ### Encoder options + cod: + ### Path to encoder binary; default search for IVAS_cod in bin folder (primary) and PATH (secondary) + bin: ~/git/ivas-codec/IVAS_cod + ### Encoder input sampling rate in Hz (resampling performed in case of mismatch); default = null (no resampling) + # fs: 32000 + ### Additional commandline options; default = null + # opts: ["-q", "-dtx", 4] + ### Decoder options + dec: + ### Path to decoder binary; default search for IVAS_dec in bin folder (primary) and PATH (secondary) + 
bin: ~/git/ivas-codec/IVAS_dec + ### Decoder output format; default = postprocessing fmt + fmt: "7_1_4" + ### Decoder output sampling rate; default = null (same as input) + # fs: 48000 + ### Additional commandline options; default = null + # opts: ["-q", "-no_delay_cmp"] + ### EVS condition ################################ c08: - ### REQUIRED: type of condition - type: evs - ### REQUIRED: Bitrates to use for coding - ### For EVS mono, this may be a per-channel bitrate configuration (must match input/preprocessing format!) - ### the last value will be repeated if too few are specified - bitrates: - # - 9600 - - [13200, 13200, 8000, 13200, 9600] - ### for multi-channel configs, code LFE with 9.6 kbps NB (as mandated by IVAS-3) - evs_lfe_9k6bps_nb: true - cod: - ### Path to encoder binary; default search for EVS_cod in bin folder (primary) and PATH (secondary) - bin: ~/git/ivas-codec/EVS_cod - ### Encoder input sampling rate in Hz (resampling performed in case of mismatch); default = null (no resampling) - # fs: 32000 - dec: - ### Path to encoder binary; default search for EVS_dec in bin folder (primary) and PATH (secondary) - bin: ~/git/ivas-codec/EVS_dec - ### Decoder output sampling rate; default = null (same as input) - # fs: 48000 + ### REQUIRED: type of condition + type: evs + ### REQUIRED: Bitrates to use for coding + ### For EVS mono, this may be a per-channel bitrate configuration (must match input/preprocessing format!) 
+ ### the last value will be repeated if too few are specified + bitrates: + # - 9600 + - [13200, 13200, 8000, 13200, 9600] + ### for multi-channel configs, code LFE with 9.6 kbps NB (as mandated by IVAS-3) + evs_lfe_9k6bps_nb: true + cod: + ### Path to encoder binary; default search for EVS_cod in bin folder (primary) and PATH (secondary) + bin: ~/git/ivas-codec/EVS_cod + ### Encoder input sampling rate in Hz (resampling performed in case of mismatch); default = null (no resampling) + # fs: 32000 + dec: + ### Path to decoder binary; default search for EVS_dec in bin folder (primary) and PATH (secondary) + bin: ~/git/ivas-codec/EVS_dec + ### Decoder output sampling rate; default = null (same as input) + # fs: 48000 ``` @@ -482,26 +480,26 @@ conditions_to_generate: ### Post-processing step performed after core processing for all conditions ### Post-processing is required and can not be omitted postprocessing: - ### REQUIRED: Target format for output - fmt: "BINAURAL" - ### REQUIRED: Target sampling rate in Hz for resampling; default = null (no resampling) - fs: 48000 - ### Low-pass cut-off frequency in Hz; default = null (no filtering) - # lp_cutoff: 24000 - ### Target loudness in LKFS; default = null (no loudness change applied) - # loudness: -26 - ### Spatial audio format in which loudness is adjusted (only used if preprocessing loudness is not null); - ### default = null (uses postprocessing fmt if possible) - # loudness_fmt: null - ### Name of custom binaural dataset (without prefix or suffix); - ### default = null (ORANGE53(_Dolby) for BINAURAL, IISofficialMPEG222UC for BINAURAL_ROOM) - # bin_dataset: SADIE - ### Render LFE to binaural output with the specified gain (only valid for channel-based input); default = null - # bin_lfe_gain: 1 - ### Flag whether output should be limited to avoid clipping (can alter target loudness); default = true - # limit: false - ### Head-tracking trajectory file for binaural output; default = null - # trajectory: "path/to/file" + 
### REQUIRED: Target format for output + fmt: "BINAURAL" + ### REQUIRED: Target sampling rate in Hz for resampling; default = null (no resampling) + fs: 48000 + ### Low-pass cut-off frequency in Hz; default = null (no filtering) + # lp_cutoff: 24000 + ### Target loudness in LKFS; default = null (no loudness change applied) + # loudness: -26 + ### Spatial audio format in which loudness is adjusted (only used if preprocessing loudness is not null); + ### default = null (uses postprocessing fmt if possible) + # loudness_fmt: null + ### Name of custom binaural dataset (without prefix or suffix); + ### default = null (ORANGE53(_Dolby) for BINAURAL, IISofficialMPEG222UC for BINAURAL_ROOM) + # bin_dataset: SADIE + ### Render LFE to binaural output with the specified gain (only valid for channel-based input); default = null + # bin_lfe_gain: 1 + ### Flag whether output should be limited to avoid clipping (can alter target loudness); default = true + # limit: false + ### Head-tracking trajectory file for binaural output; default = null + # trajectory: "path/to/file" ``` @@ -512,60 +510,73 @@ postprocessing: The following values may be used for the `type` key of a condition: -| Supported conditions | Description | -|:--------------------:|-----------------------------------------------------------| -| ref | Uncoded (reference) | -| lp3k5 | Uncoded low-passed at 3.5 kHz (anchor) | -| lp7k | Uncoded low-passed at 7 kHz (anchor) | -| mnru | Uncoded MNRU | -| esdru | Uncoded ESDRU | -| mono_dmx | Uncoded mono downmix | -| evs | Coded with multi-stream EVS codec (**metadata not coded!**) | -| ivas | Coded with IVAS codec | +| Supported conditions | Description | +| :------------------: | ----------------------------------------------------------- | +| ref | Uncoded (reference) | +| lp3k5 | Uncoded low-passed at 3.5 kHz (anchor) | +| lp7k | Uncoded low-passed at 7 kHz (anchor) | +| mnru | Uncoded MNRU | +| esdru | Uncoded ESDRU | +| mono_dmx | Uncoded mono downmix | +| evs | Coded 
with multi-stream EVS codec (**metadata not coded!**) | +| ivas | Coded with IVAS codec | ### Configuration of conditions #### Reference ref + No required arguments but the `type` key. An additional LP filtering can be specified with `out_fc`. + #### Low-pass Filter lp3k5 and lp7k + No required arguments but the `type` key. + #### MNRU and ESDRU + The MNRU and ESDRU conditions each take one additional required argument. For MNRU the value `q`, which represents the ratio of speech power to modulated noise power in dB, -has to be specified.
+has to be specified. For the ESDRU the spatial degradation value `alpha` in the range [0, 1] has to be defined. + #### Mono downmix mono_dmx + No required arguments but the `type` key. + #### EVS + For EVS a list of at least one bitrate has to be specified with the key `bitrates`. The entries in this list can also be lists containing the bitrates used for the processing of the individual channels. This configuration has to match the channel configuration. If the provided list is shorter, the last value will be repeated. For the encoding stage `cod` and the decoding stage `dec`, the path to the EVS_cod and EVS_dec binaries can be specified under the key `bin`. Additionally some resampling can be applied by using the key `fs` followed by the desired sampling rate. The general bitstream processing configuration can be locally overwritten for each EVS and IVAS condition with the key `tx`. The additional key `evs_lfe_9k6bps_nb` is only available for EVS conditions and ensures a bitrate of 9.6 kbps and narrowband processing of the LFE channel(s). + #### IVAS + The configuration of the IVAS condition is similar to the EVS condition. However, only one bitrate for all channels (and metadata) can be specified. In addition to that, the encoder and decoder take some additional arguments defined by the key `opts`. For the decoder an output format can be set. If this argument is not defined the format specified in postprocessing is used. ### IVAS External Renderer -The key `ivas_rend` can be added in each condition to apply the IVAS_rend external renderer after the condition (e.g. after encoding and decoding for `ivas`) but previous to the +The key `ivas_rend` can be added in each condition to apply the IVAS_rend external renderer after the condition (e.g. after encoding and decoding for `ivas`) but prior to the postprocessing. 
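The rule for per-channel EVS `bitrates` described above (a list shorter than the channel count is padded by repeating its last value) can be illustrated with a short Python sketch. The function `expand_bitrates` is hypothetical and not taken from the scripts; in particular, rejecting a list that is longer than the channel count is an assumption made here for illustration.

```python
def expand_bitrates(bitrates, n_channels):
    """Pad a per-channel bitrate list to n_channels by repeating its last value.

    Hypothetical helper; a single integer is treated as a one-element list.
    """
    if isinstance(bitrates, int):
        bitrates = [bitrates]
    if not bitrates:
        raise ValueError("at least one bitrate is required")
    if len(bitrates) > n_channels:
        # Assumption: a list longer than the channel count is treated as an error.
        raise ValueError("more bitrates than channels")
    missing = n_channels - len(bitrates)
    return list(bitrates) + [bitrates[-1]] * missing
```

For example, the five-entry list from the EVS condition in the YAML example, applied to a hypothetical six-channel input, would get its last entry (9600) repeated once.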
--- ## Supported audio formats -| spatial_audio_format | Input/Ouput/Rendered | Description | -|--------------------------------------------------|----------------------|------------------------------------------------| -| MONO | yes/yes/yes | mono signals | -| STEREO | yes/yes/yes | stereo signals | -| ISMx | yes/yes/no | Objects with metadata, description using renderer metadata | -| MASAx | yes/no/no | mono or stereo signals with spatial metadata !!!metadata must share same basename as waveform file but with .met extension!!! | -| FOA/HOA2/HOA3 or PLANAR(FOA/HOAx) | yes/yes/yes | Ambisonic signals or planar ambisonic signals | -| BINAURAL/BINAURAL_ROOM | no/yes/yes | Binaural signals | -| 5_1/5_1_2/5_1_4/7_1/7_1_4 | yes/yes/yes | Multi-channel signals for predefined loudspeaker layout | -| META | yes/no/no | Audio scene described by a renderer config (work in progress) | +| spatial_audio_format | Input/Output/Rendered | Description | +| --------------------------------- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------- | +| MONO | yes/yes/yes | mono signals | +| STEREO | yes/yes/yes | stereo signals | +| ISMx | yes/yes/no | Objects with metadata, description using renderer metadata | +| MASAx | yes/no/no | mono or stereo signals with spatial metadata !!!metadata must share same basename as waveform file but with .met extension!!! 
| +| ISMxMASAy | yes/no/no | Combined objects with MASA | +| FOA/HOA2/HOA3 or PLANAR(FOA/HOAx) | yes/yes/yes | Ambisonic signals or planar ambisonic signals | +| ISMxSBAy | yes/no/no | Combined objects with ambisonics | +| BINAURAL/BINAURAL_ROOM | no/yes/yes | Binaural signals | +| 5_1/5_1_2/5_1_4/7_1/7_1_4 | yes/yes/yes | Multi-channel signals for predefined loudspeaker layout | +| META | yes/no/no | Audio scene described by a renderer config (work in progress) | --- @@ -582,27 +593,28 @@ The processing chain is as follows: - The postprocessing stage performs a final conversion from the output of the previous stage if necessary and applies the specified processing --- + ## Additional Executables The following additional executables are needed for the different processing steps: -| Processing step | Executable | Where to find | -|-------------------------------------------------|-----------------------|-------------------------------------------------------------------------------------------------------------| -| Loudness measurement and adjustment | bs1770demo | https://github.com/ErikNorvell-Ericsson/STL (Note branch) | -| MNRU | p50fbmnru | https://github.com/openitu/STL | -| ESDRU | esdru | https://github.com/openitu/STL | -| Frame error pattern application | eid-xor | https://github.com/openitu/STL | -| Error pattern generation | gen-patt | https://www.itu.int/rec/T-REC-G.191-201003-S/en (Note: Version in https://github.com/openitu/STL is buggy!) 
| -| Reverberation module | reverb | https://github.com/openitu/STL | -| Filtering, Resampling | filter | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | -| Random offset/seed generation (necessary for background noise and FER bitstream processing) | random | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | -| JBM network simulator | networkSimulator_g192 | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | -| MASA rendering (also used in loudness measurement of MASA items) | masaRenderer, masaAnalyzer | https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip | -| EVS reference conditions | EVS_cod, EVS_dec | https://www.3gpp.org/ftp/Specs/archive/26_series/26.443/26443-h00.zip | - -The necessary binaries have to be either placed in the [ivas_processing_scripts/bin](./ivas_processing_scripts/bin) folder or the path has to be specified in +| Processing step | Executable | Where to find | +| ------------------------------------------------------------------------------------------- | -------------------------- | --------------------------------------------------------------------------------------------------------------- | +| Loudness measurement and adjustment | bs1770demo | https://github.com/ErikNorvell-Ericsson/STL (Note branch) | +| MNRU | p50fbmnru | https://github.com/openitu/STL | +| ESDRU | esdru | https://github.com/openitu/STL | +| Frame error pattern application | eid-xor | https://github.com/openitu/STL | +| Error pattern generation | gen-patt | https://www.itu.int/rec/T-REC-G.191-201003-S/en (Note: Version in https://github.com/openitu/STL is buggy!) 
| +| Reverberation module | reverb | https://github.com/openitu/STL | +| Filtering, Resampling | filter | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | +| Random offset/seed generation (necessary for background noise and FER bitstream processing) | random | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | +| JBM network simulator | networkSimulator_g192 | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | +| MASA rendering (also used in loudness measurement of MASA items) | masaRenderer, masaAnalyzer | https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip | +| EVS reference conditions | EVS_cod, EVS_dec | https://www.3gpp.org/ftp/Specs/archive/26_series/26.443/26443-h00.zip | + +The necessary binaries have to be either placed in the [ivas_processing_scripts/bin](./ivas_processing_scripts/bin) folder or the path has to be specified in [ivas_processing_scripts/binary_paths.yml](./ivas_processing_scripts/binary_paths.yml). -For most of the tools it is sufficient to copy the binaries while it is necessary to add the associated *.bin files for the MASA renderer. +For most of the tools it is sufficient to copy the binaries while it is necessary to add the associated \*.bin files for the MASA renderer. --- @@ -651,26 +663,26 @@ First line of an input definition contains the input type: SBA, MC or ISM. 
Following lines depend on the input type: -``` +```text SBA Index of the first channel of this input in the multitrack file (1-indexed) Ambisonics order ``` -``` +```text MC Index of the first channel of this input in the multitrack file (1-indexed) Name of speaker layout (X_Y_Z or CICPx format) ``` -``` +```text MASA Index of the first channel of this input in the multitrack file (1-indexed) Number of transport channels Path to MASA metadata file (must be relative to config file location) ``` -``` +```text ISM Index of this input's audio in the multitrack file (1-indexed) Path to ISM metadata file (must be relative to config file location) @@ -678,7 +690,7 @@ Path to ISM metadata file (must be relative to config file location) OR -``` +```text ISM Index of this input's audio in the multitrack file (1-indexed) Number N of positions defined, followed by N lines in form: @@ -698,9 +710,9 @@ Each key-value pair should be placed on a separate line. 
Currently the following key-value pairs are supported: -| key | value type | -|---------------------|--------------------------------------| -| gain_dB | float | +| key | value type | +| ------- | ---------- | +| gain_dB | float | ## Example configuration @@ -715,7 +727,7 @@ The following example defines a scene with 4 inputs: move to (90,0) and stay there for 5 frames. This trajectory is looped over the duration of the input audio file. -``` +```text ./input_audio.wav 4 ISM @@ -745,7 +757,8 @@ Please refer to [the notebook](./examples/audiotools.ipynb) for an overview. # How to generate the configs and process items for the selection test experiments The script `generate_test.py` is used to generate config files and process items for the selection test experiments: -``` + +```text usage: generate_test.py [-h] [--no_parallel] [--create_cfg_only] exp_lab_pairs [exp_lab_pairs ...] Generate config files and process files for selecton experiments. Experiment names and lab ids must be given as comma-separated pairs (e.g. 'P800-5,b BS1534-4a,d ...') @@ -758,13 +771,16 @@ options: --no_parallel If given, configs will not be run in parallel --create_cfg_only If given, only create the configs and folder structure without processing items ``` + Before running the script, one needs to put the input files in the respective input folder (including the background noise files, see below). If input files are missing, the script will complain and stop. For example, for processing tests P800-3 and BS1534-4a for labs b and d, respectively, command line would look like this (no whitespace between the commas!): -``` + +```bash python3 generate_test.py P800-3,b BS1534-4a,d ``` -Tests are processed separately per category and per lab (as some values in the configs are dependent on category and lab). For each experiment, a static base config is stored from which the actual configs are generated (identfied by the suffix `catX-lab_Y.yml`). For P800 tests, there are 6 categories each. 
The BS1534 experiments do not define categories, except for the MASA ones (BAS534-7a/b) - there one might mix FOA and HOA2 input material, so ther eare 2 categories for those in the scripts (category 1 for FOA, category 2 for HOA2). In `experiments/selection/` there is a folder structure prepared for all selection experiments, in which you have to put the input files for your test. For example, for P800-1: -``` +Tests are processed separately per category and per lab (as some values in the configs are dependent on category and lab). For each experiment, a static base config is stored from which the actual configs are generated (identified by the suffix `catX-lab_Y.yml`). For P800 tests, there are 6 categories each. The BS1534 experiments do not define categories, except for the MASA ones (BS1534-7a/b) - there one might mix FOA and HOA2 input material, so there are 2 categories for those in the scripts (category 1 for FOA, category 2 for HOA2). In `experiments/selection/` there is a folder structure prepared for all selection experiments, in which you have to put the input files for your test. For example, for P800-1: + +```text experiments/selection/P800-1/ ├── background_noise <--- put your background files in here and name them as background_noise_catX.wav. 
Not all experiments use background noise ├── config <--- contains base config, generated configs will be stored here, too @@ -773,4 +789,4 @@ experiments/selection/P800-1/ │ ├── catX <--- put your input files for cat X in here └── proc_output <--- collect your output from here, example subfolder below │ ├── catX-lab_Y <--- NOTE: this is only generated by the script and not checked in in the repository -``` \ No newline at end of file +``` diff --git a/ivas_processing_scripts/audiotools/binaural_datasets/README.md b/ivas_processing_scripts/audiotools/binaural_datasets/README.md new file mode 100755 index 0000000000000000000000000000000000000000..5028d5da31adb9b135a142b318b429c1a97ac43f --- /dev/null +++ b/ivas_processing_scripts/audiotools/binaural_datasets/README.md @@ -0,0 +1,41 @@ +# Binaural Datasets + +Files in this directory should contain impulse responses for use in rendering in Matlab .mat format. +A sampling rate of 48kHz is assumed. + +## Naming scheme + +Files should adhere to the following naming scheme: + +`{HRIR|BRIR}_{DATASETNAME}_{FULL|LS|SBA(1-3)}.mat` + +- `HRIR or BRIR` + specifies the type of impulse response which will be used for + either `BINAURAL` or `BINAURAL_ROOM` output respectively +- `DATASETNAME` + specifies the name used with the binaural_dataset commandline argument + or YAML key to enable selection of this dataset +- `FULL or LS or SBA(1-3)` + specifies the subset of impulse responses in the file: + - `FULL`: all available measurements on the sphere + - `LS`: superset of supported loudspeaker layouts + _(see `audiotools.constants.CHANNEL_BASED_AUDIO_FORMATS["LS"]`)_ + - `SBA{1,2,3}`: impulse responses transformed to ambisonics by external conversion + - if available SBA1 is used for FOA, SBA2 for HOA2 and SBA3 for HOA3 + - if not available SBA3 is used and truncated for all Ambisonic formats + +## File Contents + +Each Matlab file should contain the following variables: + +- `IR` + Impulse responses with dimensions [ir_length x n_ears x 
n_channels] +- `SourcePosition` + array of {azimuth, elevation, radius} of dimensions [n_channels x 3] + required for FULL, optional otherwise +- `latency_s` + latency of the dataset in samples + optional, will be estimated if not provided + +LICENSES: +Please see [HRIR.txt](../../../thirdPartyLegalNotices/HRIR.txt) and [BRIR.txt](../../../thirdPartyLegalNotices/BRIR.txt) for license info diff --git a/ivas_processing_scripts/audiotools/binaural_datasets/README.txt b/ivas_processing_scripts/audiotools/binaural_datasets/README.txt deleted file mode 100755 index 9fd37c966abf95f652245ae9ff1ae8573754b570..0000000000000000000000000000000000000000 --- a/ivas_processing_scripts/audiotools/binaural_datasets/README.txt +++ /dev/null @@ -1,34 +0,0 @@ -Files in this directory should contain impulse responses for use in rendering in Matlab .mat format -Samplingrate of 48kHz is assumed - -Files should adhere to the following naming scheme: - -{HRIR|BRIR}_{DATASETNAME}_{FULL|LS|SBA(1-3)}.mat - -- HRIR or BRIR - specifies the type of impulse response which will be used - for either BINAURAL or BINAURAL_ROOM output respectively -- DATASETNAME - specifies the name used with the binaural_dataset commandline argument - or YAML key to enable selection of this dataset -- FULL or LS or SBA3 - specifies the subset of impulse responses in the file: - FULL: all available measurements on the sphere - LS: superset of supported loudspeaker layouts - (see audiotools.constants.CHANNEL_BASED_AUDIO_FORMATS["LS""]) - SBA(1-3): impulse responses transformed to ambisonics by external conversion - if available SBA1 is used for FOA, SBA2 for HOA2 and SBA3 for HOA3 - if not available SBA3 is used and truncated for all Ambisonic formats - -Each Matlab file should contain the following variables: -- IR - Impulse responses with dimensions [ir_length x n_ears x n_channels] -- SourcePosition - array of {azimuth, elevation, radius} of dimensions [n_channels x 3] - required for FULL, optional otherwise -- 
latency_s - latency of the dataset in samples - optional, will be estimated if not provided - -LICENSES: -Please see HRIR.txt and BRIR.txt for license info \ No newline at end of file diff --git a/ivas_processing_scripts/bin/README.md b/ivas_processing_scripts/bin/README.md index 494ed5ed1c0c00d2babb1d6d3411d5634718a34b..10b28b3e6bf0ae69aa2296ec6f7e47513da20d2e 100755 --- a/ivas_processing_scripts/bin/README.md +++ b/ivas_processing_scripts/bin/README.md @@ -1,17 +1,17 @@ - Necessary additional executables: -| Processing step | Executable | Where to find | -|-------------------------------------------------|-----------------------|-------------------------------------------------------------------------------------------------------------| -| Loudness measurement and adjustment | bs1770demo | https://github.com/ErikNorvell-Ericsson/STL (Note branch) | -| MNRU | p50fbmnru | https://github.com/openitu/STL | -| ESDRU | esdru | https://github.com/openitu/STL | -| Frame error pattern application | eid-xor | https://github.com/openitu/STL | -| Reverberation module | reverb | https://github.com/openitu/STL | -| Error pattern generation | gen-patt | https://www.itu.int/rec/T-REC-G.191-201003-S/en (Note: Version in https://github.com/openitu/STL is buggy!) 
| -| Filtering, Resampling | filter | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | -| Random offset/seed generation (necessary for background noise and FER bitstream processing) | random | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | -| JBM network simulator | networkSimulator_g192 | https://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_76/docs/S4-131277.zip | -| MASA rendering (also used in loudness measurement of MASA items) | masaRenderer | https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_122_Athens/Docs/S4-230221.zip | -| EVS reference conditions | EVS_cod, EVS_dec | https://www.3gpp.org/ftp/Specs/archive/26_series/26.443/26443-h00.zip | -| EVS JBM conditions | dlyerr_2_errpat | http://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_70/Docs/S4-121077.zip | \ No newline at end of file +| Processing step | Executable | Where to find | +| ------------------------------------------------------------------------------------------- | --------------------- | --------------------------------------------------------------------------------------------------------------- | +| Loudness measurement and adjustment | bs1770demo | (Note branch) | +| MNRU | p50fbmnru | | +| ESDRU | esdru | | +| Frame error pattern application | eid-xor | | +| Reverberation module | reverb | | +| Error pattern generation | gen-patt | (Note: Version in is buggy!) 
| +| Filtering, Resampling | filter | | +| Random offset/seed generation (necessary for background noise and FER bitstream processing) | random | | +| JBM network simulator | networkSimulator_g192 | | +| MASA analyzer (used for SBA to MASA conversion) | masaAnalyzer | | +| MASA rendering (also used in loudness measurement of MASA items) | masaRenderer | | +| EVS reference conditions | EVS_cod, EVS_dec | | +| EVS JBM conditions | dlyerr_2_errpat | | diff --git a/tests/README.md b/tests/README.md index 808224257c879803213cb7f5f736a03ac43c059b..aad10a34ae3850ec2688597c01e7274eeef4b20e 100644 --- a/tests/README.md +++ b/tests/README.md @@ -35,4 +35,3 @@ The files contained in this folder are not part of the processing scripts themselves. They are used only for testing the scripts during development and are not required to use or run the processing scripts for generating listening tests. Additional dependencies for the files in this folder are not required for running the scripts and are thus no dependencies of the processing scripts themselves. -
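The executables listed in `ivas_processing_scripts/bin/README.md` have to be available before any processing starts. As a minimal sketch, the following checks a folder and the system `PATH` for a subset of the tool names from that table. The helper `find_missing_executables` is hypothetical and not part of the processing scripts, and it does not read `binary_paths.yml`.

```python
import shutil
from pathlib import Path

# Subset of the executable names from the bin/README.md table.
REQUIRED_TOOLS = ["bs1770demo", "p50fbmnru", "esdru", "eid-xor", "filter", "random"]


def find_missing_executables(names, bin_dir):
    """Return the names found neither in bin_dir nor on the system PATH.

    Hypothetical helper for illustration only.
    """
    bin_dir = Path(bin_dir)
    missing = []
    for name in names:
        # Accept the plain name or a Windows-style ".exe" variant in bin_dir.
        in_bin_dir = any(
            (bin_dir / candidate).is_file() for candidate in (name, name + ".exe")
        )
        on_path = shutil.which(name) is not None
        if not (in_bin_dir or on_path):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_executables(REQUIRED_TOOLS, "ivas_processing_scripts/bin")
    print("Missing executables:", ", ".join(missing) if missing else "none")
```

Run from the root folder of the repository, this prints the tools that still need to be copied into the bin folder (or otherwise made available) for the chosen subset.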