
Energy calculation in quantization gain estimation for MDCT-Stereo stereo mode decision not working correctly

Basic info

  • Fixed point:

Bug description

I investigated the M/S stereo mode decision in MDCT-Stereo and found that the BASOP scalar quantization (SQ) gain estimation can behave quite differently from the floating-point reference. The bit estimates for the stereo bands used in MsStereoDecision_fx() can differ greatly. The plot below shows, per frame, the maximum, median and minimum difference in the stereo-band bit estimates between the BASOP and float code. The test signal was ltv48_STEREO.wav, and the plot is zoomed in on the frame with the largest deviations (frame 2827). It covers the L channel only; the logging code is here: https://forge.3gpp.org/rep/sa4/audio/ivas-basop/-/blob/ivas-float-update/lib_enc/ivas_stereo_mdct_stereo_enc.c?ref_type=heads#L1074.

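[Figure: maximum, median and minimum difference in stereo-band bit estimates between BASOP and float, L channel, zoomed in on frame 2827 (Screenshot 2025-04-09 at 08.42.52.png)]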
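The statistic in the plot (minimum, median and maximum over the stereo bands of the BASOP-minus-float bit-estimate difference, for one frame) can be sketched as below. This is a hypothetical logging helper, not the linked logging code: band_diff_stats, cmp_double and MAX_BANDS are illustrative names, and the bit estimates are assumed to be available as double arrays for both implementations.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BANDS 64 /* assumed upper bound on the number of stereo bands */

/* qsort comparator for doubles (ascending order). */
static int cmp_double(const void *a, const void *b)
{
    const double x = *(const double *)a;
    const double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: min, median and max of the per-band difference
 * bits_fx[i] - bits_flt[i] (BASOP minus float) for one frame. */
static void band_diff_stats(const double *bits_flt, const double *bits_fx, int nbands,
                            double *min_d, double *med_d, double *max_d)
{
    double diff[MAX_BANDS];

    assert(nbands >= 1 && nbands <= MAX_BANDS);
    for (int i = 0; i < nbands; i++) {
        diff[i] = bits_fx[i] - bits_flt[i];
    }
    qsort(diff, (size_t)nbands, sizeof diff[0], cmp_double);

    *min_d = diff[0];
    *max_d = diff[nbands - 1];
    /* median: middle element, or mean of the two middle elements */
    *med_d = (nbands % 2 != 0) ? diff[nbands / 2]
                               : 0.5 * (diff[nbands / 2 - 1] + diff[nbands / 2]);
}
```

With per-band differences of 1, -2, 0, 7 and -1 bits, for example, it reports a minimum of -2, a median of 0 and a maximum of 7.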

In this frame, the value of GLR_fx differs between the float and BASOP code. It is returned by this call:

    GLR_fx = SQ_gain_estimate_stereo_fx( specL_fx, specL_e, specR_fx, specR_e, nBitsAvailable, length, &e_GLR ); /* Q31-e_GLR */
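To compare a BASOP value such as GLR_fx with the float reference, the mantissa/exponent pair has to be mapped back to a real number. A minimal sketch, assuming the "Q31-e_GLR" comment means value = GLR_fx * 2^(e_GLR - 31) and that exponents are 16-bit; q31e_to_double and rel_dev are hypothetical helpers, not part of the IVAS code base:

```c
#include <math.h>
#include <stdint.h>

/* Mantissa in Q(31-e) -> real value: mant * 2^(e - 31).
 * Assumes the BASOP convention of a 32-bit mantissa with a separate exponent. */
static double q31e_to_double(int32_t mant, int16_t e)
{
    return ldexp((double)mant, e - 31);
}

/* Relative deviation of a BASOP mantissa/exponent value from a float
 * reference (absolute deviation if the reference is zero). */
static double rel_dev(int32_t mant, int16_t e, double ref)
{
    const double v = q31e_to_double(mant, e);
    return (ref != 0.0) ? fabs(v - ref) / fabs(ref) : fabs(v);
}
```

Converting both sides to double before plotting also removes scaling differences caused by differing exponents.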

Inside this function (lib_enc/ivas_stereo_mdct_stereo_enc_fx.c:863), the input spectrum xL_fx looks similar in float and BASOP for frame 2827 (the scaling difference may be an artifact of my plotting):

[Figures: input spectrum xL_fx in frame 2827, float and BASOP (Screenshot 2025-04-08 at 17.50.36.png, Screenshot 2025-04-08 at 17.51.21.png)]

The first loop computes the energy of this spectrum. After the loop, the energy array (ener_fx) looks quite different in BASOP compared to float:

[Figures: energy array ener_fx after the first loop, float (left) and BASOP (right) (Screenshot 2025-04-08 at 17.46.19.png, Screenshot 2025-04-08 at 17.46.42.png)]

In the first few bins the shape is still similar, but something appears to go wrong in the BASOP energy calculation (right plot). This incorrect energy array presumably leads to the deviating GLR_fx value.
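For reference, a fixed-point energy computation in which the squared spectrum and the 0.01 floor share one Q format could look like the minimal sketch below. It is hypothetical, not the IVAS implementation: it assumes block-floating-point input (mantissas x[k] sharing exponent e_x, value = x[k] * 2^(e_x - 31)), a per-bin energy x[k]^2 + 0.01, and a Q26 output (the format named in the declaration comment of en_fx). The real loop may group bins or use a different target format, and energy_q26, sat32, ENER_Q and FLOOR_Q26 are illustrative names.

```c
#include <stdint.h>

#define ENER_Q    26     /* assumed Q format of the energy array */
#define FLOOR_Q26 671089 /* 0.01 * 2^26 = 671088.64, rounded */

/* Saturate a 64-bit intermediate to the int32_t range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical sketch: ener[k] = x[k]^2 + 0.01 in Q26, where x[k] are
 * mantissas sharing exponent e_x, value = x[k] * 2^(e_x - 31). */
static void energy_q26(const int32_t *x, int16_t e_x, int32_t *ener, int n)
{
    const int e2 = 2 * e_x;             /* exponent of x^2, computed once */
    const int shift = 62 - e2 - ENER_Q; /* x*x carries 62 - e2 fractional bits */

    for (int k = 0; k < n; k++) {
        const int64_t sq = (int64_t)x[k] * x[k]; /* exact, at most 2^62 */
        int64_t q;

        if (sq == 0 || shift >= 63) {
            q = 0;                        /* zero or below Q26 resolution */
        } else if (shift >= 0) {
            q = sq >> shift;              /* truncating right shift */
        } else if (-shift >= 32 || sq > ((int64_t)INT32_MAX >> -shift)) {
            q = INT32_MAX;                /* would overflow Q26: saturate */
        } else {
            q = sq << -shift;             /* result fits in int32_t */
        }
        ener[k] = sat32(q + FLOOR_Q26);   /* floor added in the same Q26 format */
    }
}
```

For example, a mantissa of 1 << 30 with e_x = 0 (i.e. 0.5) yields 0.25 + 0.01, which is 16777216 + 671089 = 17448305 in Q26. If a floor constant in another Q format were added instead (e.g. 21474836, which is 0.01 in Q31), the floor of a Q26 array would become about 0.32 rather than 0.01.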

Some observations in the code:

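  • en_fx array: the comment at its declaration says “Q(26)”, but in line 872 it is initialized with 21474836, commented as “0.01 in Q31”. 21474836 equals 0.01 only in Q31 (0.01 * 2^31 ≈ 21474836); interpreted as Q26 it corresponds to about 0.32
  • in line 865, the same value (21474836) is commented as “/* 0.01 in Q15 */”, which is inconsistent: 0.01 in Q15 would be 328
  • possible optimization: e_xL * 2 and e_xR * 2 can be precomputed
  • the input arguments xL_fx, e_xL, xR_fx, e_xR can be declared const, as they are not modified in the function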
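The Q-format arithmetic behind the first two observations can be double-checked with a small conversion sketch. to_q and from_q are hypothetical helpers, not BASOP or IVAS functions; round-to-nearest and saturation to the 32-bit range are assumed.

```c
#include <stdint.h>

/* Real value -> signed Qn integer (n = 0..31): value * 2^n,
 * rounded to nearest (halves away from zero), saturated to int32_t. */
static int32_t to_q(double v, int n)
{
    const double s = v * (double)((int64_t)1 << n);
    const double r = (s >= 0.0) ? s + 0.5 : s - 0.5;

    if (r >= 2147483647.0) return INT32_MAX;
    if (r <= -2147483648.0) return INT32_MIN;
    return (int32_t)r; /* conversion truncates toward zero */
}

/* Signed Qn integer -> real value: q / 2^n. */
static double from_q(int32_t q, int n)
{
    return (double)q / (double)((int64_t)1 << n);
}
```

to_q(0.01, 31) returns 21474836, to_q(0.01, 26) returns 671089 and to_q(0.01, 15) returns 328, while from_q(21474836, 26) is about 0.32 (a factor of 2^31 / 2^26 = 32 too large). If en_fx really holds Q26 data, a Q26 constant such as 671089 would presumably be the intended floor.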

Ways to reproduce

The DEBUGGING switch must be activated to use the core forcing mechanism (-force):

    ./IVAS_cod -force tcx20 -stereo 96000 48 ltv48_STEREO.wav bit