Basop Encoder: Stereo DTX Front VAD Flag Mismatch
# Basic info

<!--- Add commit SHA used to reproduce -->

I do not have a specific SHA for the initial analysis; both builds were from commits dated April 3rd. However, I have tested some of the items with these versions:

- Float reference:
  - Encoder (float): b63c033f6c392878104cf9af73b00159945c7d32
  - Decoder (float): b63c033f6c392878104cf9af73b00159945c7d32
- Fixed point:
  - Encoder (fixed): 1ccc3c58692b0b2dda571d435957ae21e220f67a
  - Decoder (fixed):

# Bug description

The front VAD flag decisions of the floating-point and fixed-point encoders were compared for low-bitrate (13.2 kbps) Stereo DTX using 71 test items. Most of the items contain different types of background noise (such as train station or office). Some test items contain both speech and background noise.

The percentage of frames with different front VAD flags (float vs. fixed) for each test item is shown below:

![DifferentFramesPercentage71.svg](/uploads/ba8930027be788c13c3ccf054bbf2055/DifferentFramesPercentage71.svg)

The highest difference is 6.5%, for a car sound item without speech. The plot below shows the difference between the left channels of the two outputs together with the front VAD difference:

![4VWGolf90spclab](/uploads/ea87fab495e1a4e2f71f9daf4d7237f5/4VWGolf90spclab.png)

The differing front VAD decisions affect the whole signal, and the outputs sound very different. There is already an open issue about this specific item: https://forge.3gpp.org/rep/sa4/audio/ivas-basop/-/issues/1410#note_69661

For a speech+background item with a 2.4% difference, the plots look like this:

![SpeechTrain2spclab](/uploads/617bda0e7a078fbfab73013204858e26/SpeechTrain2spclab.png)

The difference after 10 seconds causes the fixed-point output to suppress the train sound in the background.

Floating point:

![RefSpTr2](/uploads/5bd3d582e802d8f5b75aabf6d30f0ea8/RefSpTr2.png)

Fixed:

![DutSpTr2](/uploads/d0c996bd76637df48d9ffaf6dc8ae63b/DutSpTr2.png)

For the same item, there is an energy increase at higher frequencies around 17 seconds, where the front VAD decisions differ.
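
For anyone who wants to recompute the per-item mismatch percentage, a minimal sketch follows. It is not the script used for the plots above. It assumes the per-frame front VAD flags of both encoders have already been dumped into two equal-length sequences of 0/1 values (how they are dumped is not covered here), and it assumes 20 ms frames (50 frames per second) when converting frame indices to seconds. The function names `mismatch_percentage` and `mismatch_segments` are hypothetical helpers introduced for illustration.

```python
FRAMES_PER_SEC = 50  # assumption: 20 ms codec frames


def mismatch_percentage(ref_flags, dut_flags):
    """Percentage of frames whose front VAD flag differs between two runs."""
    if len(ref_flags) != len(dut_flags):
        raise ValueError("flag sequences must cover the same number of frames")
    if not ref_flags:
        return 0.0
    differing = sum(1 for r, d in zip(ref_flags, dut_flags) if r != d)
    return 100.0 * differing / len(ref_flags)


def mismatch_segments(ref_flags, dut_flags, frames_per_sec=FRAMES_PER_SEC):
    """Return (start_s, end_s) ranges where the flags differ.

    Consecutive mismatching frames are merged into one range; the end
    time is exclusive (start of the first matching frame after the run).
    """
    segments = []
    start = None  # index of the first frame in the current mismatch run
    for i, (r, d) in enumerate(zip(ref_flags, dut_flags)):
        if r != d and start is None:
            start = i
        elif r == d and start is not None:
            segments.append((start / frames_per_sec, i / frames_per_sec))
            start = None
    if start is not None:  # mismatch run extends to the last frame
        end = min(len(ref_flags), len(dut_flags))
        segments.append((start / frames_per_sec, end / frames_per_sec))
    return segments


# Toy data only: float (ref) vs fixed (dut) flags for eight frames.
ref = [0, 1, 1, 0, 0, 1, 1, 1]
dut = [0, 1, 0, 0, 0, 1, 0, 0]
print(mismatch_percentage(ref, dut))
print(mismatch_segments(ref, dut))
```

Listing the mismatch ranges in seconds makes it easier to line them up with the audible differences, such as the regions after 10 seconds and around 17 seconds mentioned above.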
# Ways to reproduce

Box folder: ...\Box_EXTERNAL_IVAS_BASOP_VERIFICATION\issues\issue-1487

<!-- Commandline or script -->

```bash
./IVAS_cod_ref.exe -dtx -stereo 13200 32 spTr2.wav bitref
./IVAS_dec_ref.exe stereo 32 bitref refSpTr2.wav
./IVAS_cod.exe -dtx -stereo 13200 32 spTr2.wav bit
./IVAS_dec_ref.exe stereo 32 bit dutSpTr2.wav
```

There are a few speech+background noise items where the front VAD difference results in distortion or spectral differences between speech segments. The issue is labeled as medium because, for the test items mentioned here, the differences mainly occur in background sound segments and not in speech segments.

<!--- Below are labels that will be added but are not shown in description. This is a template to help fill them. Add further information to the first row and remove and add labels as necessary. -->
issue