Draft: [non-be][rend-non-be][split-non-be][fix] float issue 1521: SBA loudness for binaural and stereo

  • Related issues: #1521
  • Requested reviewers:

Reason why this change is needed

  • SBA loudness is inconsistent across output formats; BINAURAL output in particular is too loud
  • Stereo output, on the other hand, is too quiet, leading to a 5-6 LKFS gap between the two output formats
  • See latest presentation of loudness results

An analysis over 65 ambisonics test items (non-normalized, varying content) was performed. To remove the renderer as a variable, the omni / W channel was used as the anchor (reference loudness level) instead of the binaural or 7.1+4 rendering that the processing scripts currently use. For the scope of this issue, Mono, Stereo and Binaural output loudness was measured.

Mono is important as a reference/anchor ("rendering" is a passthrough of W) and Stereo <=> Binaural switching is also an important scenario to consider.

This analysis revealed the following over all bitrates and ambisonics orders:

∆ = output LKFS - W LKFS from source ambisonics item

| Output Format | Mean ∆ (LKFS) | Median ∆ (LKFS) |
|---------------|---------------|-----------------|
| Mono          | +0.28         | +0.61           |
| Stereo        | -1.28         | -1.05           |
| Binaural      | +3.79         | +4.07           |

While mono is quite reasonable, the jump between stereo and binaural is 5.07 LKFS (mean).

This demonstrates that stereo is consistently too quiet and binaural is consistently too loud relative to the W channel.

To ensure consistency between all three of these formats without some kind of content dependent loudness equalization, empirical broadband compensation factors are currently the simplest solution.

Reasoning for stereo gain adjustment

For the stereo case, a simple example with a pink noise source at the front center was checked. The FOA SH response for such a source is:

\begin{array}{cccc} W & Y & Z & X \\ \hline 1 & 0 & 0 & 1 \end{array}

Considering the current rendering of \frac{W \pm Y}{2}, the output is L = R = \frac{W}{2} \implies L^2 + R^2 = \frac{W^2}{2}.

This means that compared to mono there is an attenuation of 3dB.

To match channel energy (see footnote 1), a constant power gain of \frac{1}{\sqrt{2}} should be used instead, such that L = R = \frac{W}{\sqrt{2}} \implies L^2 + R^2 = W^2.

Applying the same logic to a fully lateral source (SH gain is instead 1 for Y), the modification results in L^2 + R^2 = \frac{W^2 + Y^2}{2} \to W^2 + Y^2.

If W = Y, the output energy therefore changes from W^2 to 2W^2. In the general case, with the Y gain normalized to W as y \in [-1, 1], the new output energy relative to mono is 1 + y^2, which ranges from 0dB to +3dB.

In other words: the old gain produced 0dB for lateral sources with a -3dB at front center. The new gain ensures the front direction has 0dB (same level as mono content being upmixed) and the lateral direction has +3dB.
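
The front-center and fully lateral cases above can be checked numerically. The following is a minimal sketch, not the codec implementation: the helper names `stereo_energy` and `to_db` are hypothetical, and a single static source with W = 1 is assumed.

```python
import math


def stereo_energy(w, y, gain):
    """Return L^2 + R^2 for L = gain * (W + Y) and R = gain * (W - Y)."""
    left = gain * (w + y)
    right = gain * (w - y)
    return left ** 2 + right ** 2


def to_db(energy_ratio):
    """Convert an energy (power) ratio to decibels."""
    return 10.0 * math.log10(energy_ratio)


OLD_GAIN = 0.5                   # current rendering: (W +/- Y) / 2
NEW_GAIN = 1.0 / math.sqrt(2.0)  # proposed rendering: (W +/- Y) / sqrt(2)

W = 1.0  # assumed omni gain of the source
for label, y in (("front center", 0.0), ("fully lateral", 1.0)):
    mono_energy = W ** 2  # mono output is a passthrough of W
    old_db = to_db(stereo_energy(W, y, OLD_GAIN) / mono_energy)
    new_db = to_db(stereo_energy(W, y, NEW_GAIN) / mono_energy)
    print(f"{label}: old {old_db:+.2f} dB, new {new_db:+.2f} dB re mono")
```

Real content is a mix of directions, so the measured average lies somewhere between the two extremes.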

Over varying spatial content, this change now produces an average delta of +1.71 LKFS over mono.

Reasoning for binaural gain adjustment

Comparing with the stereo case which depends on the ambisonics Y component and lateral source energy, binaural energy is more complex due to the HRTF. Binaural loudness values do not seem to be ambisonics order dependent.

The SD to SHD conversion was inspected and preserves the diffuse field equalization. A potential contributing factor is also stage 1 of the K-weighting filter in BS.1770 which applies a high shelf to compensate for head effects (Annex 1), which in combination with the HRTF is a "double head effect" that may inflate the reported LKFS value.
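
To illustrate the high shelf mentioned above, the sketch below evaluates the magnitude response of the BS.1770 stage 1 pre-filter using the 48 kHz biquad coefficients given in the Recommendation. It only shows the shape of the shelf (about 0 dB at low frequencies, about +4 dB at high frequencies). It does not quantify how much the combination with the HRTF contributes to the measured binaural loudness, and the function name `shelf_gain_db` is hypothetical.

```python
import cmath
import math

FS = 48000.0  # sampling rate the coefficients below are specified for

# Stage 1 (high-shelf) biquad coefficients from BS.1770 for 48 kHz
B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
A = (1.0, -1.69065929318241, 0.73248077421585)


def shelf_gain_db(freq_hz):
    """Magnitude response of the stage 1 filter at freq_hz, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / FS)  # z^-1 on the unit circle
    z2 = z1 * z1                                  # z^-2
    num = B[0] + B[1] * z1 + B[2] * z2
    den = A[0] + A[1] * z1 + A[2] * z2
    return 20.0 * math.log10(abs(num / den))


for f in (100.0, 1000.0, 4000.0, 20000.0):
    print(f"{f:7.0f} Hz: {shelf_gain_db(f):+.2f} dB")
```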

Given the complexity of this rendering path, a full in-depth analysis of every contributing factor was not possible. The proposed empirical gain reduces the mean difference from +3.79 LKFS to +0.83 LKFS and the delta between binaural and stereo is also brought to a reasonable level.

Description of the change

  • Bring stereo and binaural output in line with each other by:
    • Applying constant power panning instead of amplitude panning for Ambisonics to Stereo output, i.e. \frac{W \pm Y}{2} \to \frac{W \pm Y}{\sqrt{2}}
    • Applying an empirical gain of \frac{1}{\sqrt{2}} (-3.01dB) to binaural output

Post-fix measurements

| Output Format    | Mean ∆ (LKFS) | Median ∆ (LKFS) |
|------------------|---------------|-----------------|
| Mono (no change) | +0.28         | +0.61           |
| Stereo           | +1.71         | +1.96           |
| Binaural         | +0.83         | +1.09           |

Mean ∆ between stereo and binaural is now 0.88 LKFS.
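
As a rough plausibility check, the post-fix means can be compared with what a pure broadband gain change would predict from the pre-fix means. The sketch below assumes that a broadband gain of g dB shifts the measured LKFS by the same g dB. It uses only the mean values reported in this description; the small residuals are not analysed here.

```python
import math

# Magnitude in dB of a sqrt(2) amplitude gain change (about 3.01 dB)
GAIN_DB = 20.0 * math.log10(math.sqrt(2.0))

# Mean deltas (output LKFS - W LKFS) as reported in this description
pre_fix_mean = {"Stereo": -1.28, "Binaural": +3.79}
post_fix_mean = {"Stereo": +1.71, "Binaural": +0.83}

# Stereo gain goes from 1/2 to 1/sqrt(2) (+3.01 dB); binaural gets -3.01 dB
gain_change_db = {"Stereo": +GAIN_DB, "Binaural": -GAIN_DB}

residuals = {}
for fmt in pre_fix_mean:
    predicted = pre_fix_mean[fmt] + gain_change_db[fmt]
    residuals[fmt] = post_fix_mean[fmt] - predicted
    print(f"{fmt}: predicted {predicted:+.2f}, measured {post_fix_mean[fmt]:+.2f}, "
          f"residual {residuals[fmt]:+.2f} LKFS")
```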

Affected operating points

  • SBA binaural (all non-IR variants) and stereo output for decoder and external renderer

  1. Loudness as measured according to BS.1770 is essentially mean squared energy.

Edited by Archit Tamarapu
