Unified Stereo: Difference in actively coded frames between DTX on and off is much bigger in BASOP than in float
Basic info
Observations
When assessing the ltv signals, I noticed that the differences in the actively coded parts of the signal (that is, normal coding, no DTX frames) between DTX on and DTX off are in general much bigger in BASOP than they are in float. I could make this observation at all unified stereo bitrates (<=32kbps). At 24.4kbps, the float reference output with DTX on is in large parts almost bitexact to the one with DTX off (at least switching between the waveforms has no visual impact and the differences are also not audible). It is not expected that DTX on and off are completely BE in the actively coded parts (that is already not the case in EVS). But with the BASOP encoder, there are sometimes audible differences which do not seem to stem only from e.g. different comfort noise being added. The nature of these audible differences varies: to me, neither DTX on nor DTX off is systematically better, and the subjective difference is small.
The general observation is that for the float codec, the differences between DTX on and off are mainly in the BWE region and sometimes due to different comfort noise addition (my judgement without debugging), while in BASOP the core is usually also different. Maybe this is simply a result of the different precision and is to be expected for BASOP. IMO, the behaviour of the EVS fx and flt code should be checked and compared to get a sense of what is "normal" here.
I used the following commands:
./IVAS_cod -stereo -dtx 24400 32 ltv32_STEREO.wav bit_dtx
./IVAS_cod -stereo 24400 32 ltv32_STEREO.wav bit
./IVAS_dec stereo 32 bit_dtx out_dtx.wav
./IVAS_dec stereo 32 bit out.wav
with fx enc -> flt dec and flt enc -> flt dec.
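To put numbers on the comparison instead of judging waveforms visually, the two decoder outputs can be compared frame by frame. The following is only a rough sketch, not part of the codec tooling: the helper names (`read_pcm16`, `frame_snr_db`, `compare`) are made up for illustration, and it assumes 16-bit PCM output files, 20 ms frames, and that both outputs are time-aligned (same decoder, same configuration). It does not know which frames were DTX frames, so the per-frame values still have to be matched against the encoder's frame types by hand.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Return (sample_rate, channels, interleaved samples) of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        return wf.getframerate(), wf.getnchannels(), samples


def frame_snr_db(ref, test, frame_len):
    """Per-frame SNR in dB of `test` against `ref`; +inf marks bit-exact frames."""
    n = min(len(ref), len(test))
    snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        sig = 0
        err = 0
        for i in range(start, start + frame_len):
            sig += ref[i] * ref[i]
            err += (ref[i] - test[i]) ** 2
        if err == 0:
            snrs.append(math.inf)    # identical frame
        elif sig == 0:
            snrs.append(-math.inf)   # reference is silent, test is not
        else:
            snrs.append(10.0 * math.log10(sig / err))
    return snrs


def compare(path_dtx_off, path_dtx_on):
    """Print one SNR value per assumed 20 ms frame (all channels together)."""
    rate, channels, off = read_pcm16(path_dtx_off)
    rate_on, channels_on, on = read_pcm16(path_dtx_on)
    if (rate, channels) != (rate_on, channels_on):
        raise ValueError("sample rate or channel count mismatch")
    frame_len = rate // 50 * channels  # 20 ms of interleaved samples (assumption)
    for idx, snr in enumerate(frame_snr_db(off, on, frame_len)):
        print(idx, f"{snr:.1f}")


# Example call with the files from the commands above:
# compare("out.wav", "out_dtx.wav")
```

Running this once for the fx enc -> flt dec pair and once for the flt enc -> flt dec pair should show whether the actively coded frames have a consistently lower SNR in the BASOP case. It says nothing about whether the difference sits in the core or in the BWE region; that would need a band-wise comparison.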