Amplitude of the downmixed signal (MC to SCE or MC to CPE) is too high for the EVS fixed-point core

The problem is the following: the input to the SCE core (a similar observation holds for the CPE) should be within the range expected by EVS. At lower bitrates, SCE and CPE rely mainly on EVS functions that were designed for 15-bit input (i.e. 16-bit signed samples), not for the 17-bit input shown in the figures below:

In the figure below, from top to bottom:

IVAS downmixed signal as seen by the SCE module: amplitude = ±85000
Front left section of LTV_MC51_10dB: amplitude = ±32767
Front right section of LTV_MC51_10dB: amplitude = ±32767
Center section of LTV_MC51_10dB: amplitude = ±32767
LFE section of LTV_MC51_10dB: amplitude = ±32767
Surround left section of LTV_MC51_10dB: amplitude = ±32767
Surround right section of LTV_MC51_10dB: amplitude = ±32767
Spectra

The next figure shows, from top to bottom:

IVAS downmixed signal as seen by the SCE module: amplitude = ±85000
Passive downmix of a section of LTV_MC51_10dB: amplitude = ±12000
Spectra

Note that the passive downmix includes the LFE.
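To put the amplitudes above in terms of word length, the small helper below counts the magnitude bits (sign excluded) a peak value needs and how many bits exceed the 15 available in a 16-bit signed word. It is a minimal illustrative sketch; `magnitude_bits` and `missing_headroom_bits` are hypothetical names, not functions from the IVAS or EVS code.

```c
#include <stdint.h>

/* Number of magnitude bits (sign bit excluded) needed to represent |peak|.
 * A 16-bit signed word, as used by the EVS fixed-point core, provides 15.
 * Note: -32768 is reported as 16 bits although it fits in 16-bit signed;
 * that asymmetric edge case is ignored here. */
static int magnitude_bits(int32_t peak)
{
    /* Widen before negating so that INT32_MIN does not overflow. */
    uint64_t mag = (peak < 0) ? (uint64_t)(-(int64_t)peak) : (uint64_t)peak;
    int bits = 0;
    while (mag > 0) {
        bits++;
        mag >>= 1;
    }
    return bits;
}

/* Bits beyond the 15-bit magnitude range (0 if the peak already fits). */
static int missing_headroom_bits(int32_t peak)
{
    int excess = magnitude_bits(peak) - 15;
    return excess > 0 ? excess : 0;
}
```

With the amplitudes from the figures, `magnitude_bits` gives 17 for ±85000, 15 for ±32767 and 14 for ±12000: the IVAS downmix is 2 bits beyond what 16-bit EVS routines expect, while the passive downmix still leaves 1 bit of headroom.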

The floating-point EVS encoder may cope with this discrepancy, although many quantization tables will no longer be optimal. For the fixed-point encoder (and, to a similar extent, the decoder), however, the issue is much more problematic.

When the floating-point downmixed signal, with an amplitude of ±85000 (about 2.6 times the 16-bit limit of 32767), is converted to 15 bits, it is saturated so heavily that it may no longer resemble the original signal at all. Alternatively, if a negative exponent is used to represent it on 15 bits, this causes issues in all the energy and noise estimation modules, where everything becomes de-normalised.
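The two options can be contrasted with a minimal sketch, assuming a plain float-to-16-bit conversion with round-to-nearest. The `Word16` typedef mirrors the type name of the ITU-T/3GPP basic operators, but `round_sat16`, `convert_saturate`, `convert_scaled` and `energy16` are hypothetical helpers for illustration, not the actual IVAS/EVS conversion routines.

```c
#include <stddef.h>
#include <stdint.h>

typedef int16_t Word16; /* stand-in for the EVS fixed-point sample type */

/* Round one float to the nearest 16-bit value, saturating outside
 * [-32768, 32767]. Sets *clipped to 1 when saturation happened. */
static Word16 round_sat16(float v, int *clipped)
{
    if (v >= 32767.5f) { *clipped = 1; return 32767; }
    if (v <= -32768.5f) { *clipped = 1; return -32768; }
    *clipped = 0;
    return (Word16)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Option 1: direct conversion; samples beyond 16 bits are clipped.
 * Returns the number of clipped samples. */
static size_t convert_saturate(const float *x, Word16 *y, size_t len)
{
    size_t clipped = 0;
    for (size_t n = 0; n < len; n++) {
        int c;
        y[n] = round_sat16(x[n], &c);
        clipped += (size_t)c;
    }
    return clipped;
}

/* Option 2: scale by 2^-shift first (a block exponent of -shift) so the
 * signal fits in 16 bits. Returns the number of clipped samples. */
static size_t convert_scaled(const float *x, Word16 *y, size_t len, int shift)
{
    const float scale = 1.0f / (float)(1 << shift);
    size_t clipped = 0;
    for (size_t n = 0; n < len; n++) {
        int c;
        y[n] = round_sat16(x[n] * scale, &c);
        clipped += (size_t)c;
    }
    return clipped;
}

/* Energy in the scaled domain. The true energy is this value times
 * 2^(2*shift); a module that ignores the exponent is off by that factor. */
static int64_t energy16(const Word16 *y, size_t len)
{
    int64_t e = 0;
    for (size_t n = 0; n < len; n++) {
        e += (int64_t)y[n] * y[n];
    }
    return e;
}
```

For a peak of ±85000, a shift of 2 (exponent -2) is enough, since 85000 / 4 = 21250 fits in 16 bits, whereas the direct conversion clips every such sample. The energy measured on the scaled samples is then 2^4 = 16 times smaller than the true energy, which is the kind of de-normalisation every energy and noise estimator would have to compensate for.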

This has some similarities to float #1134, so it may already have been discussed in the floating-point forum and I missed it.

@vaclav, @multrus, @kiene, @malenovsky, @fotopoulou, @vasilache