[WIP] Resolve "Enable rendering to all output formats for EVS mono and IVAS Stereo bitstreams"

  • Related issues: #1419
  • Requested reviewers:

Reason why this change is needed

  • Currently, the decoder crashes with a segfault if an IVAS commandline with OutputConf is specified for an EVS mono bitstream. For IVAS stereo, rendering to ambisonics or binaural outputs returns an "Invalid output format" error.
  • Switching between bitstream formats in the decoder cannot preserve the output configuration, since not all output formats are supported for all input (bitstream) formats.

Description of the change

  • Rendering is enabled for:
    • Mono to all output formats [1]
    • Stereo to all output formats (i.e. extends now to ambisonics and binaural variants)
  • By default, the upmix for mono/stereo is non-spatial; this means:
    • Ambisonics output: mono is routed to W; stereo is routed to W and Y (mid/side)
    • Binaural output: mono uses a passive upmix (non-diegetic panning to center); stereo is passed through unchanged
  • To enable a spatial upmix, a renderer configuration file must be supplied specifying channel positions (chapter name MSUPMIX):
    • This file contains 3 parameters: AZIMUTH[], ELEVATION[] and RADIUS[].
    • Radius 0 will be interpreted as omni/non-spatial
    • Any other values are used as a spatial position for rendering (1 value required for mono, 2 for stereo)
  • Technical details:
    • Existing renderers are reused for implementation of this functionality. Overall the changes in this MR are really only "plumbing" to get things to work.
    • LS setup conversion renderer is used for multichannel outputs
    • Ambisonics spherical response (ivas_mc2sba()) or passive upmix is used for ambisonics outputs
    • In case of binaural rendering the precedent set by high bitrate multichannel is followed:
      • TD object renderer is used for everything except BRIRs, with channel positions set and propagated through hTransSetup.
      • CRend is used for BINAURAL_ROOM_IR; if head rotation is enabled, the input sources are rotated using EFAP on the pseudo 7.1+4 layout prior to rendering.
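
The default (non-spatial) routing to ambisonics can be written out explicitly. The stereo expressions are the ones given in the rendering-path overview; the inverse and the numeric example are added here for illustration only. This is a sketch that assumes unity gain (any normalisation applied by the decoder is not shown) and that all ambisonics channels not listed stay silent.

```latex
% m: decoded mono signal; L, R: decoded stereo channels
% W: omnidirectional channel (index 0); Y: left-right channel
\begin{align*}
\text{Mono input } m: \quad & W = m \\
\text{Stereo input } (L, R): \quad & W = \frac{L+R}{2}, \qquad Y = \frac{L-R}{2} \\
\text{Inverse:} \quad & L = W + Y, \qquad R = W - Y \\
\text{Example } (L, R) = (0.8,\ 0.2): \quad & W = 0.5, \qquad Y = 0.3
\end{align*}
```

Because the mapping is invertible, the original stereo pair can be recovered from W and Y, which is why this routing is described as non-spatial rather than as a positional upmix.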

⚠️ This work is still in progress! Outstanding tasks:

  • Split rendering Mono with LC3plus to BINAURAL_SPLIT_CODED has clicks (_PCM is OK)
  • External renderer implementation is missing -> will be dealt with in another issue
  • VoIP mode untested
  • Validation of user-input positions in render config file missing
  • Split rendering outputs are currently crashing
  • Rendering mono/stereo to ambisonics needs to be changed to default to a passive upmix (currently the spatial option is enabled by default)
  • Radius functionality is not yet verified/implemented
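
As a starting point for the missing input validation, the rules stated in the description (one position for mono, two for stereo, radius 0 meaning non-spatial) could be checked as in the sketch below. The function name `classify_upmix` is hypothetical, and treating the radius per channel is an assumption; the sketch does not parse the render config file and does not check azimuth or elevation ranges.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the IVAS code base.
# Usage: classify_upmix INPUT RADIUS...
#   INPUT is "mono" (expects 1 radius) or "stereo" (expects 2 radii).
# Prints one line per channel: "non-spatial" for radius 0, "spatial" otherwise.
classify_upmix() {
    local input="$1"
    shift
    local expected
    case "$input" in
        mono)   expected=1 ;;
        stereo) expected=2 ;;
        *)      echo "unknown input format: $input" >&2; return 2 ;;
    esac
    # The description requires exactly 1 value for mono and 2 for stereo.
    if [ "$#" -ne "$expected" ]; then
        echo "expected $expected position(s) for $input, got $#" >&2
        return 1
    fi
    local r
    for r in "$@"; do
        # awk handles the floating-point comparison; exit status 0 means r == 0.
        if awk -v r="$r" 'BEGIN { exit !(r + 0 == 0) }'; then
            echo "non-spatial"
        else
            echo "spatial"
        fi
    done
}

classify_upmix mono 0          # prints: non-spatial
classify_upmix stereo 1.5 1.5  # prints: spatial (twice)
```

Note that a non-numeric radius is treated as 0 by awk in this sketch; a real implementation would need to reject such input explicitly.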

Affected operating points

  • Decoding a mono bitstream when specifying output format
  • Decoding a stereo bitstream to binaural variants or ambisonics

Overview of rendering paths

| Input format | Output format | Render config supplied? | Rendering path |
|--------------|---------------|-------------------------|----------------|
| Mono | Mono | N/A | Not changed in this MR (direct decoding) |
| Mono | Stereo | N/A | Not changed in this MR (non-diegetic upmix) |
| Mono | Multichannel [2] | N/A | Mixing matrices (ivas_ls_setup_conversion()) |
| Mono | Ambisonics [3] | N/A | Passthrough to channel index 0 (W/omni) |
| Mono | Binaural [4] | NO | Non-diegetic upmix |
| Mono | Binaural (ROOM_IR) | NO | Non-diegetic upmix |
| Mono | Binaural [4] | YES | TD object renderer, position specified via render config |
| Mono | Binaural (ROOM_IR) | YES | CRend, position specified via render config, but only 0,0 is supported |
| Stereo | Mono | N/A | Not changed in this MR (directly handled by CPE decoding) |
| Stereo | Stereo | N/A | Not changed in this MR (passthrough/direct decoding) |
| Stereo | Multichannel [2] | N/A | Not changed in this MR (ivas_ls_setup_conversion()) |
| Stereo | Ambisonics [3] | N/A | M/S routing to W and Y (W/mid: $W = \frac{L+R}{2}$; Y/side: $Y = \frac{L-R}{2}$) |
| Stereo | Binaural [4] | NO | Passthrough as stereo |
| Stereo | Binaural (ROOM_IR) | NO | Passthrough as stereo |
| Stereo | Binaural [4] | YES | TD object renderer, positions specified via render config |
| Stereo | Binaural (ROOM_IR) | YES | CRend, positions specified via render config, but they will snap to ±30 or ±90 azimuth with zero elevation |
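
The overview can also be read as a lookup from (input format, output family, render config supplied) to a rendering path. The sketch below encodes it as a bash `case` statement for quick reference while reviewing; `rendering_path` and its lowercase argument names are hypothetical and do not correspond to any function or option in the decoder.

```shell
#!/usr/bin/env bash
# Hypothetical lookup that mirrors the overview; not part of the decoder.
# Usage: rendering_path INPUT OUTPUT [HAS_CONFIG]
#   INPUT:      mono | stereo
#   OUTPUT:     mono | stereo | multichannel | ambisonics | binaural | binaural_room_ir
#   HAS_CONFIG: yes | no (default: no); only matters for the binaural outputs
rendering_path() {
    local input="$1" output="$2" has_config="${3:-no}"
    case "$input:$output:$has_config" in
        mono:mono:*)                echo "direct decoding" ;;
        mono:stereo:*)              echo "non-diegetic upmix" ;;
        mono:multichannel:*)        echo "mixing matrices (ivas_ls_setup_conversion())" ;;
        mono:ambisonics:*)          echo "passthrough to W (channel index 0)" ;;
        mono:binaural:no)           echo "non-diegetic upmix" ;;
        mono:binaural_room_ir:no)   echo "non-diegetic upmix" ;;
        stereo:mono:*)              echo "CPE decoding" ;;
        stereo:stereo:*)            echo "passthrough" ;;
        stereo:multichannel:*)      echo "ivas_ls_setup_conversion()" ;;
        stereo:ambisonics:*)        echo "M/S routing to W and Y" ;;
        stereo:binaural:no)         echo "stereo passthrough" ;;
        stereo:binaural_room_ir:no) echo "stereo passthrough" ;;
        mono:binaural:yes|stereo:binaural:yes)
                                    echo "TD object renderer" ;;
        mono:binaural_room_ir:yes|stereo:binaural_room_ir:yes)
                                    echo "CRend" ;;
        *)  echo "no rendering path for $input -> $output (config: $has_config)" >&2
            return 1 ;;
    esac
}

rendering_path mono binaural        # prints: non-diegetic upmix
rendering_path stereo binaural yes  # prints: TD object renderer
```

The render-config argument is ignored for non-binaural outputs, matching the N/A entries in the overview.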

Commandlines for testing and review (also in zip file below).

Test script
#!/usr/bin/bash
set -euxo pipefail

### Mono upmix testing ###
# generate bitstream
../IVAS_cod 128000 48 ../scripts/testv/stv48c.wav mono.192
# mono to multichannel
../IVAS_dec -evs 5_1 48 mono.192 mono_to_51.wav
# mono to ambisonics
../IVAS_dec -evs FOA 48 mono.192 mono_to_FOA.wav
# mono to binaural formats (nonspatial by default)
../IVAS_dec -evs BINAURAL 48 mono.192 mono_to_hrir_dry.wav
../IVAS_dec -evs BINAURAL_ROOM_REVERB 48 mono.192 mono_to_reverb_dry.wav
../IVAS_dec -evs BINAURAL_ROOM_IR 48 mono.192 mono_to_brir_dry.wav
# mono to binaural formats with spatial upmix
../IVAS_dec -evs -render_config mono.txt BINAURAL 48 mono.192 mono_to_hrir_spatial.wav
../IVAS_dec -evs -render_config mono.txt BINAURAL_ROOM_REVERB 48 mono.192 mono_to_reverb_spatial.wav
../IVAS_dec -evs -render_config mono.txt BINAURAL_ROOM_IR 48 mono.192 mono_to_brir_spatial.wav

### Mono split rendering ###
## BINAURAL_SPLIT_CODED
# LCLD
# 0DOF@256kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LCLD_0dof.txt BINAURAL_SPLIT_CODED 48 mono.192 mono_split_LCLD_0dof.192
../ISAR_post_rend -i mono_split_LCLD_0dof.192 -if BINAURAL_SPLIT_CODED -o mono_split_0dof_LCLD_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LCLD_2dof.txt BINAURAL_SPLIT_CODED 48 mono.192 mono_split_LCLD_2dof.192
../ISAR_post_rend -i mono_split_LCLD_2dof.192 -if BINAURAL_SPLIT_CODED -o mono_split_2dof_LCLD_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LCLD_3dofhq.txt BINAURAL_SPLIT_CODED 48 mono.192 mono_split_LCLD_3dofhq.192
../ISAR_post_rend -i mono_split_LCLD_3dofhq.192 -if BINAURAL_SPLIT_CODED -o mono_split_3dofhq_LCLD_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# LC3plus
# 0DOF@256kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LC3plus_0dof.txt BINAURAL_SPLIT_CODED 48 mono.192 mono_split_LC3plus_0dof.192
../ISAR_post_rend -i mono_split_LC3plus_0dof.192 -if BINAURAL_SPLIT_CODED -o mono_split_0dof_LC3plus_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LC3plus_2dof.txt BINAURAL_SPLIT_CODED 48 mono.192 mono_split_LC3plus_2dof.192
../ISAR_post_rend -i mono_split_LC3plus_2dof.192 -if BINAURAL_SPLIT_CODED -o mono_split_2dof_LC3plus_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LC3plus_3dofhq.txt BINAURAL_SPLIT_CODED 48 mono.192 mono_split_LC3plus_3dofhq.192
../ISAR_post_rend -i mono_split_LC3plus_3dofhq.192 -if BINAURAL_SPLIT_CODED -o mono_split_3dofhq_LC3plus_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
## BINAURAL_SPLIT_PCM
# LCLD
# 0DOF@256kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LCLD_0dof.txt -om mono_split_LCLD_0dof.md BINAURAL_SPLIT_PCM 48 mono.192 mono_split_LCLD_0dof.192
../ISAR_post_rend -i mono_split_LCLD_0dof.192 -im mono_split_LCLD_0dof.md -if BINAURAL_SPLIT_PCM -o mono_split_0dof_LCLD_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LCLD_2dof.txt -om mono_split_LCLD_2dof.md BINAURAL_SPLIT_PCM 48 mono.192 mono_split_LCLD_2dof.192
../ISAR_post_rend -i mono_split_LCLD_2dof.192 -im mono_split_LCLD_2dof.md -if BINAURAL_SPLIT_PCM -o mono_split_2dof_LCLD_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LCLD_3dofhq.txt -om mono_split_LCLD_3dofhq.md BINAURAL_SPLIT_PCM 48 mono.192 mono_split_LCLD_3dofhq.192
../ISAR_post_rend -i mono_split_LCLD_3dofhq.192 -im mono_split_LCLD_3dofhq.md -if BINAURAL_SPLIT_PCM -o mono_split_3dofhq_LCLD_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# LC3plus
# 0DOF@256kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LC3plus_0dof.txt -om mono_split_LC3plus_0dof.md BINAURAL_SPLIT_PCM 48 mono.192 mono_split_LC3plus_0dof.192
../ISAR_post_rend -i mono_split_LC3plus_0dof.192 -im mono_split_LC3plus_0dof.md -if BINAURAL_SPLIT_PCM -o mono_split_0dof_LC3plus_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LC3plus_2dof.txt -om mono_split_LC3plus_2dof.md BINAURAL_SPLIT_PCM 48 mono.192 mono_split_LC3plus_2dof.192
../ISAR_post_rend -i mono_split_LC3plus_2dof.192 -im mono_split_LC3plus_2dof.md -if BINAURAL_SPLIT_PCM -o mono_split_2dof_LC3plus_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -evs -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config mono_split_LC3plus_3dofhq.txt -om mono_split_LC3plus_3dofhq.md BINAURAL_SPLIT_PCM 48 mono.192 mono_split_LC3plus_3dofhq.192
../ISAR_post_rend -i mono_split_LC3plus_3dofhq.192 -im mono_split_LC3plus_3dofhq.md -if BINAURAL_SPLIT_PCM -o mono_split_3dofhq_LC3plus_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48

### Stereo upmix testing ###
../IVAS_cod -stereo 256000 48 ../scripts/testv/stvST48c.wav stereo.192
# stereo to ambisonics
../IVAS_dec FOA 48 stereo.192 stereo_to_FOA.wav
# stereo to binaural formats (nonspatial by default)
../IVAS_dec BINAURAL 48 stereo.192 stereo_to_hrir_dry.wav
../IVAS_dec BINAURAL_ROOM_REVERB 48 stereo.192 stereo_to_reverb_dry.wav
../IVAS_dec BINAURAL_ROOM_IR 48 stereo.192 stereo_to_brir_dry.wav
# stereo to binaural formats with spatial upmix
../IVAS_dec -render_config stereo.txt BINAURAL 48 stereo.192 stereo_to_hrir_spatial.wav
../IVAS_dec -render_config stereo.txt BINAURAL_ROOM_REVERB 48 stereo.192 stereo_to_reverb_spatial.wav
../IVAS_dec -render_config stereo.txt BINAURAL_ROOM_IR 48 stereo.192 stereo_to_brir_spatial.wav

### Stereo split rendering ###
## BINAURAL_SPLIT_CODED
# LCLD
# 0DOF@256kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LCLD_0dof.txt BINAURAL_SPLIT_CODED 48 stereo.192 stereo_split_LCLD_0dof.192
../ISAR_post_rend -i stereo_split_LCLD_0dof.192 -if BINAURAL_SPLIT_CODED -o stereo_split_0dof_LCLD_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LCLD_2dof.txt BINAURAL_SPLIT_CODED 48 stereo.192 stereo_split_LCLD_2dof.192
../ISAR_post_rend -i stereo_split_LCLD_2dof.192 -if BINAURAL_SPLIT_CODED -o stereo_split_2dof_LCLD_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LCLD_3dofhq.txt BINAURAL_SPLIT_CODED 48 stereo.192 stereo_split_LCLD_3dofhq.192
../ISAR_post_rend -i stereo_split_LCLD_3dofhq.192 -if BINAURAL_SPLIT_CODED -o stereo_split_3dofhq_LCLD_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# LC3plus
# 0DOF@256kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LC3plus_0dof.txt BINAURAL_SPLIT_CODED 48 stereo.192 stereo_split_LC3plus_0dof.192
../ISAR_post_rend -i stereo_split_LC3plus_0dof.192 -if BINAURAL_SPLIT_CODED -o stereo_split_0dof_LC3plus_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LC3plus_2dof.txt BINAURAL_SPLIT_CODED 48 stereo.192 stereo_split_LC3plus_2dof.192
../ISAR_post_rend -i stereo_split_LC3plus_2dof.192 -if BINAURAL_SPLIT_CODED -o stereo_split_2dof_LC3plus_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LC3plus_3dofhq.txt BINAURAL_SPLIT_CODED 48 stereo.192 stereo_split_LC3plus_3dofhq.192
../ISAR_post_rend -i stereo_split_LC3plus_3dofhq.192 -if BINAURAL_SPLIT_CODED -o stereo_split_3dofhq_LC3plus_coded.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
## BINAURAL_SPLIT_PCM
# LCLD
# 0DOF@256kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LCLD_0dof.txt -om stereo_split_LCLD_0dof.md BINAURAL_SPLIT_PCM 48 stereo.192 stereo_split_LCLD_0dof.192
../ISAR_post_rend -i stereo_split_LCLD_0dof.192 -im stereo_split_LCLD_0dof.md -if BINAURAL_SPLIT_PCM -o stereo_split_0dof_LCLD_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LCLD_2dof.txt -om stereo_split_LCLD_2dof.md BINAURAL_SPLIT_PCM 48 stereo.192 stereo_split_LCLD_2dof.192
../ISAR_post_rend -i stereo_split_LCLD_2dof.192 -im stereo_split_LCLD_2dof.md -if BINAURAL_SPLIT_PCM -o stereo_split_2dof_LCLD_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LCLD_3dofhq.txt -om stereo_split_LCLD_3dofhq.md BINAURAL_SPLIT_PCM 48 stereo.192 stereo_split_LCLD_3dofhq.192
../ISAR_post_rend -i stereo_split_LCLD_3dofhq.192 -im stereo_split_LCLD_3dofhq.md -if BINAURAL_SPLIT_PCM -o stereo_split_3dofhq_LCLD_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# LC3plus
# 0DOF@256kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LC3plus_0dof.txt -om stereo_split_LC3plus_0dof.md BINAURAL_SPLIT_PCM 48 stereo.192 stereo_split_LC3plus_0dof.192
../ISAR_post_rend -i stereo_split_LC3plus_0dof.192 -im stereo_split_LC3plus_0dof.md -if BINAURAL_SPLIT_PCM -o stereo_split_0dof_LC3plus_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 2DOF@512kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LC3plus_2dof.txt -om stereo_split_LC3plus_2dof.md BINAURAL_SPLIT_PCM 48 stereo.192 stereo_split_LC3plus_2dof.192
../ISAR_post_rend -i stereo_split_LC3plus_2dof.192 -im stereo_split_LC3plus_2dof.md -if BINAURAL_SPLIT_PCM -o stereo_split_2dof_LC3plus_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48
# 3DOFHQ@768kbps
../IVAS_dec -T ../scripts/trajectories/full_circle_in_15s_delayed.csv -render_config stereo_split_LC3plus_3dofhq.txt -om stereo_split_LC3plus_3dofhq.md BINAURAL_SPLIT_PCM 48 stereo.192 stereo_split_LC3plus_3dofhq.192
../ISAR_post_rend -i stereo_split_LC3plus_3dofhq.192 -im stereo_split_LC3plus_3dofhq.md -if BINAURAL_SPLIT_PCM -o stereo_split_3dofhq_LC3plus_pcm.wav -T ../scripts/trajectories/full_circle_in_15s.csv -fs 48

Zip file with scripts and input render config files. Unzip in IVAS root or adjust paths accordingly.
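
The split-rendering commands in the test script follow a single pattern over output format, codec and DOF setting, so they could also be generated with nested loops. The sketch below is a dry run that only prints the command pairs; the function name `print_split_cmds` is hypothetical, and the file names simply follow the naming used in the script above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the IVAS_dec / ISAR_post_rend command pairs
# for one input ("mono" or "stereo") instead of executing them.
print_split_cmds() {
    local input="$1"
    local traj="../scripts/trajectories"
    local evs="" fmt codec dof base md_out md_in suffix
    # Mono (EVS) bitstreams need the -evs flag on the IVAS commandline.
    if [ "$input" = "mono" ]; then evs="-evs "; fi
    for fmt in BINAURAL_SPLIT_CODED BINAURAL_SPLIT_PCM; do
        for codec in LCLD LC3plus; do
            for dof in 0dof 2dof 3dofhq; do
                base="${input}_split_${codec}_${dof}"
                md_out=""; md_in=""; suffix="coded"
                # The PCM variant additionally writes and reads a metadata file.
                if [ "$fmt" = "BINAURAL_SPLIT_PCM" ]; then
                    md_out="-om ${base}.md "
                    md_in="-im ${base}.md "
                    suffix="pcm"
                fi
                echo "../IVAS_dec ${evs}-T ${traj}/full_circle_in_15s_delayed.csv -render_config ${base}.txt ${md_out}${fmt} 48 ${input}.192 ${base}.192"
                echo "../ISAR_post_rend -i ${base}.192 ${md_in}-if ${fmt} -o ${input}_split_${dof}_${codec}_${suffix}.wav -T ${traj}/full_circle_in_15s.csv -fs 48"
            done
        done
    done
}

print_split_cmds mono
print_split_cmds stereo
```

Each call prints 24 lines (2 output formats × 2 codecs × 3 DOF settings × 2 commands). Keeping it as a dry run makes it easy to compare the output against the explicit commandlines before executing anything.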

  1. The -evs flag is used with an IVAS commandline so that EVS bitstreams can be decoded with an OutputConf. This means that backward compatibility (BC) with the legacy EVS commandline is preserved. Example: IVAS_dec -evs BINAURAL 48 bit out.wav

  2. 5.1, 7.1, 5.1+2, 5.1+4, 7.1+4

  3. FOA, HOA2, HOA3

  4. BINAURAL, BINAURAL_ROOM_REVERB, BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM

Edited by Archit Tamarapu
