[non-be][rend-non-be][split-non-be][fix] float issue 1521: SBA loudness for binaural and stereo (!2607)

Related issues: #1521
Reason why this change is needed

SBA loudness is inconsistent across output formats; BINAURAL output in particular is too loud.
Stereo output, on the other hand, is too quiet, leading to a 5-6 LKFS gap between the two output formats.
See the latest presentation of loudness results.

An analysis of 65 ambisonics test items (non-normalized, varying content) was performed. To remove the renderer as a variable, the omni / W channel was used as the anchor (reference loudness level) instead of the binaural or 7.1+4 rendering that is currently used (processing scripts). For the scope of this issue, Mono, Stereo and Binaural output loudness were measured.

Mono is important as a reference/anchor ("rendering" is a passthrough of W) and Stereo <=> Binaural switching is also an important scenario to consider.

This analysis revealed the following over all bitrates and ambisonics orders:

∆ = output LKFS - W LKFS from source ambisonics item

Output Format	Mean ∆ (LKFS)	Median ∆ (LKFS)
Mono	+0.28	+0.61
Stereo	-1.28	-1.05
Binaural	+3.79	+4.07

While mono is quite reasonable, the jump between stereo and binaural is 5.07 LKFS (mean).

This demonstrates that stereo is consistently too quiet and binaural consistently too loud.

To ensure consistency between all three of these formats without some kind of content-dependent loudness equalization, empirical broadband compensation factors are currently the simplest solution.

Reasoning for stereo gain adjustment

For the stereo case, a simple example with a pink noise source at the front center was checked. The FOA SH response for such a source is:

\begin{array}{cccc} W & Y & Z & X \\ \hline 1 & 0 & 0 & 1 \end{array}

Considering the current rendering of \frac{W \pm Y}{2}, the output is L = R = \frac{W}{2} \implies L^2 + R^2 = \frac{W^2}{2}.

This means that compared to mono there is an attenuation of 3dB.

To match channel energy¹, a constant power gain of \frac{1}{\sqrt{2}} should be used instead, such that L = R = \frac{W}{\sqrt{2}} \implies L^2 + R^2 = W^2.

Applying the same logic to a fully lateral source (SH gain is instead 1 for Y), the modification results in L^2 + R^2 = \frac{W^2 + Y^2}{2} \to W^2 + Y^2.

If W = Y, the energy changes from W^2 \to 2W^2. In the general case Y = yW with y \in [-1, 1], the energy relative to mono is 1 + y^2, which ranges from 0dB (y = 0) to +3dB (|y| = 1).
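
The general case can be written out as a short derivation. This is a sketch only: the symbol g for the rendering gain is introduced here for illustration, and Y = yW assumes a single source whose Y component is a scaled copy of W.

```latex
\begin{aligned}
L &= g\,(W + Y), \qquad R = g\,(W - Y), \qquad Y = yW,\; y \in [-1, 1] \\
L^2 + R^2 &= g^2\left[(W + Y)^2 + (W - Y)^2\right] = 2g^2\,(W^2 + Y^2) = 2g^2\,(1 + y^2)\,W^2 \\
g = \tfrac{1}{2}: \quad \frac{L^2 + R^2}{W^2} &= \frac{1 + y^2}{2}
  \;\Rightarrow\; 10\log_{10}\frac{1 + y^2}{2} \in [-3.01,\ 0]\ \text{dB} \\
g = \tfrac{1}{\sqrt{2}}: \quad \frac{L^2 + R^2}{W^2} &= 1 + y^2
  \;\Rightarrow\; 10\log_{10}(1 + y^2) \in [0,\ +3.01]\ \text{dB}
\end{aligned}
```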

In other words: the old gain produced 0dB for lateral sources and -3dB at front center. The new gain ensures the front direction is at 0dB (the same level as mono content being upmixed) and the lateral direction is at +3dB.
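
The front-to-lateral behaviour of both gains can be checked numerically. The following is a minimal sketch, not the codec's rendering code: `stereo_energy_db` is a hypothetical helper, a 1 kHz sine stands in for the pink noise source, and the source is modelled as Y = y * W.

```python
import math


def stereo_energy_db(y, gain, n=4800):
    """Return L^2 + R^2 relative to mono W^2, in dB, for a source with Y = y * W."""
    # Assumption: a 1 kHz sine at 48 kHz replaces the pink noise test signal.
    w = [math.sin(2 * math.pi * 1000 * i / 48000) for i in range(n)]
    left = [gain * (s + y * s) for s in w]   # L = gain * (W + Y)
    right = [gain * (s - y * s) for s in w]  # R = gain * (W - Y)
    e_mono = sum(s * s for s in w)
    e_stereo = sum(s * s for s in left) + sum(s * s for s in right)
    return 10 * math.log10(e_stereo / e_mono)


# Sweep from front center (y = 0) to fully lateral (y = 1).
for y in (0.0, 0.5, 1.0):
    old = stereo_energy_db(y, 0.5)
    new = stereo_energy_db(y, 1 / math.sqrt(2))
    print(f"y={y:.1f}  old gain: {old:+.2f} dB  new gain: {new:+.2f} dB")
```

The old gain spans roughly -3dB to 0dB relative to mono and the new gain roughly 0dB to +3dB, i.e. a constant 3.01dB shift for every source direction.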

Over varying spatial content, this change now produces an average positive delta of 1.71 LKFS over the W reference (the mono anchor).

Reasoning for binaural gain adjustment

Compared with the stereo case, which depends on the ambisonics Y component and lateral source energy, binaural energy is more complex due to the HRTF. Binaural loudness values do not seem to depend on the ambisonics order.

The SD to SHD conversion was inspected and preserves the diffuse field equalization. A potential contributing factor is stage 1 of the K-weighting filter in BS.1770, which applies a high shelf to compensate for head effects (Annex 1). Combined with the HRTF, this amounts to a "double head effect" that may inflate the reported LKFS value.
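
To give a sense of the size of that shelf, the sketch below evaluates the magnitude response of the stage 1 pre-filter. The biquad coefficients are the 48 kHz values as tabulated in BS.1770, quoted from memory here and worth checking against the recommendation; `shelf_gain_db` is a hypothetical helper and not part of the codec or the measurement scripts.

```python
import cmath
import math

# Assumption: stage 1 (high shelf) biquad coefficients for fs = 48 kHz,
# as tabulated in ITU-R BS.1770; verify against the recommendation.
B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
A = (1.0, -1.69065929318241, 0.73248077421585)


def shelf_gain_db(freq_hz, fs=48000.0):
    """Magnitude response of the stage 1 pre-filter at freq_hz, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    num = B[0] + B[1] * z1 + B[2] * z1 * z1
    den = A[0] + A[1] * z1 + A[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


for f in (100, 1000, 4000, 10000, 24000):
    print(f"{f:>5} Hz: {shelf_gain_db(f):+.2f} dB")
```

The response is flat at low frequencies and approaches about +4dB toward Nyquist, which is the boost that would stack on top of the HRTF's own high-frequency shaping.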

Given the complexity of this rendering path, a full in-depth analysis of every contributing factor was not possible. The proposed empirical gain reduces the mean difference from +3.79 LKFS to +0.83 LKFS and the delta between binaural and stereo is also brought to a reasonable level.

Description of the change

Bring stereo and binaural output in line with each other by:
- Applying constant power panning instead of amplitude panning for Ambisonics to Stereo output, i.e. \frac{W \pm Y}{2} \to \frac{W \pm Y}{\sqrt{2}}
- Applying an empirical gain of \frac{1}{\sqrt{2}} (-3.01dB) to binaural output
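
As an illustration of the two changes, here is a minimal sketch operating on plain sample lists. All names (`foa_to_stereo`, `scale_binaural`, the gain constants) are hypothetical and do not correspond to functions in the IVAS code base.

```python
import math

OLD_STEREO_GAIN = 0.5                # amplitude panning: (W +/- Y) / 2
NEW_STEREO_GAIN = 1 / math.sqrt(2)   # constant power panning: (W +/- Y) / sqrt(2)
BINAURAL_GAIN = 1 / math.sqrt(2)     # empirical broadband gain for binaural output


def to_db(gain):
    """Convert a linear amplitude gain to dB."""
    return 20 * math.log10(gain)


def foa_to_stereo(w, y, gain=NEW_STEREO_GAIN):
    """Render stereo from the W and Y channels: L = gain*(W+Y), R = gain*(W-Y)."""
    left = [gain * (a + b) for a, b in zip(w, y)]
    right = [gain * (a - b) for a, b in zip(w, y)]
    return left, right


def scale_binaural(left, right, gain=BINAURAL_GAIN):
    """Apply the broadband gain to an already rendered binaural pair."""
    return [gain * s for s in left], [gain * s for s in right]


print(f"stereo gain change: {to_db(NEW_STEREO_GAIN / OLD_STEREO_GAIN):+.2f} dB")
print(f"binaural gain:      {to_db(BINAURAL_GAIN):+.2f} dB")
```

Both changes are scalar gains of equal magnitude and opposite sign: stereo moves up by 3.01dB and binaural moves down by 3.01dB.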

Post-fix measurements

Output Format	Mean ∆ (LKFS)	Median ∆ (LKFS)
Mono (no change)	+0.28	+0.61
Stereo	+1.71	+1.96
Binaural	+0.83	+1.09

Mean ∆ between stereo and binaural is now 0.88 LKFS.
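
The reported gaps can be cross-checked against the two tables. The sketch below only redoes the arithmetic on the mean values quoted in this description; the "predicted" values assume that a broadband gain shifts every item's LKFS by the same number of dB, which ignores any gating effects.

```python
import math

# Mean deltas (LKFS, relative to W) copied from the tables in this description.
pre_fix = {"Mono": 0.28, "Stereo": -1.28, "Binaural": 3.79}
post_fix = {"Mono": 0.28, "Stereo": 1.71, "Binaural": 0.83}


def stereo_binaural_gap(means):
    """Absolute difference between the binaural and stereo mean deltas."""
    return abs(means["Binaural"] - means["Stereo"])


# Assumption: a scalar gain of sqrt(2) shifts loudness by a constant 3.01 dB.
gain_db = 20 * math.log10(math.sqrt(2))
predicted = {
    "Stereo": pre_fix["Stereo"] + gain_db,      # +3.01 dB from constant power panning
    "Binaural": pre_fix["Binaural"] - gain_db,  # -3.01 dB from the empirical gain
}

print(f"gap before fix: {stereo_binaural_gap(pre_fix):.2f} LKFS")
print(f"gap after fix:  {stereo_binaural_gap(post_fix):.2f} LKFS")
for fmt in ("Stereo", "Binaural"):
    print(f"{fmt}: predicted {predicted[fmt]:+.2f}, measured {post_fix[fmt]:+.2f} LKFS")
```

The scalar gains alone predict about +1.73 LKFS for stereo and +0.78 LKFS for binaural, close to the measured +1.71 and +0.83.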

Affected operating points

SBA binaural (all non-IR variants) and stereo output for decoder and external renderer

¹ Loudness as measured according to BS.1770 is essentially mean squared energy.

Edited Apr 29, 2026 by Archit Tamarapu
