Binaural renderer flush in OMASA VoIP

This issue is a spin-off from !2052 (merged) and follows the comment at !2052 (comment 67323):

When the OMASA bitrate switches from Disc mode to another mode with binaural output in VoIP decoding, the rendering granularity changes from 5 ms to 1.25 ms. In this case, buffered samples may need to be flushed in the function ivas_jbm_dec_flush_renderer(). In the current main, only the samples from the ISM channels are flushed, while samples from both the MASA and ISM channels should likely be flushed.
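For orientation, the two granularities can be converted to samples per channel. The sketch below is only illustrative arithmetic: the helper name samples_per_block is hypothetical and not part of the codec, and the 48 kHz rate is taken from the decoder command line used in this issue.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the IVAS code base): number of samples
 * per channel in one render block. The duration is given in microseconds
 * so that 1.25 ms (1250 us) stays an integer. */
static uint32_t samples_per_block( uint32_t sample_rate_hz, uint32_t duration_us )
{
    return (uint32_t) ( ( (uint64_t) sample_rate_hz * duration_us ) / 1000000u );
}
```

At 48 kHz, samples_per_block( 48000, 5000 ) gives 240 samples and samples_per_block( 48000, 1250 ) gives 60 samples, so one 5 ms block spans four 1.25 ms blocks.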

I did a short experiment with OMASA 1ISM and bitrate switching between 32 kbps and 256 kbps, with the ISM channel set to all zeros to emphasize the flushing effect. The bitstream is attached (bit.voip), and the decoder command line is:

ivas_dec.exe -voip binaural 48 bit.voip syn.dec

The figure below then shows three outputs:

  • reference: constant bitrate 256 kbps decoding (i.e. no bitrate switching)
  • baseline: decoding using the command line above (i.e. bitrate switching)
  • new: "fixed" decoding using the command line above (i.e. bitrate switching)

[image: reference, baseline, and new outputs at frame 110]

The figure corresponds to frame 110, which is a typical example of the issue: in the baseline, there is a 1.25 ms segment of zeros. These zeros are not present in the "fixed" version. However, both the baseline and the new synthesis appear to be shifted relative to the correct (reference) synthesis, so the issue seems to be trickier.
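A simple way to locate such zero segments outside of a waveform viewer is to scan one output channel for a run of consecutive zero samples. The following is a minimal sketch, not part of the codec: the function name find_zero_run is hypothetical, and it assumes the decoder output has already been loaded as 16-bit PCM and de-interleaved into a single channel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the first sample of the first
 * run of at least min_len consecutive zero samples in x[0..n-1],
 * or -1 if no such run exists (or if min_len is 0). */
static long find_zero_run( const int16_t *x, size_t n, size_t min_len )
{
    size_t run = 0; /* length of the zero run ending at the current sample */

    for ( size_t i = 0; i < n; i++ )
    {
        run = ( x[i] == 0 ) ? run + 1 : 0;
        if ( min_len > 0 && run == min_len )
        {
            return (long) ( i + 1 - min_len );
        }
    }

    return -1;
}
```

With 48 kHz output, a 1.25 ms gap corresponds to min_len = 60. Note that a run of exact zeros can also occur in legitimately silent passages, so a hit only marks a candidate position to inspect.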

The "fix" consists of flushing MASA and ISM channels instead of ISM ones only in ivas_jbm_dec_flush_renderer() (it replicates what is done in OSBA in !2052 (merged)):

else if ( st_ivas->ivas_format == MASA_ISM_FORMAT || st_ivas->ivas_format == MASA_FORMAT )
        {
            if ( ism_mode_old == ISM_MASA_MODE_DISC )
            {
                float *tc_local[MAX_NUM_OBJECTS];

                for ( ch_idx = 0; ch_idx < st_ivas->nchan_ism; ch_idx++ )
                {
                    tc_local[ch_idx] = &st_ivas->hTcBuffer->tc[ch_idx + 2][hTcBuffer->n_samples_rendered];
                    mvr2r( st_ivas->hMasaIsmData->delayBuffer[ch_idx], tc_local[ch_idx], st_ivas->hMasaIsmData->delayBuffer_size );
                }

#ifdef FIX
                uint16_t nSamplesAvailableNext;
                ISM_MODE ism_mode_orig;
                RENDERER_TYPE renderer_type_orig;
                int32_t ivas_total_brate;
                ivas_total_brate = st_ivas->hDecoderConfig->ivas_total_brate;
                renderer_type_orig = st_ivas->renderer_type;
                ism_mode_orig = st_ivas->ism_mode;
                st_ivas->ism_mode = ism_mode_old;
                st_ivas->renderer_type = renderer_type_old;
                st_ivas->hDecoderConfig->ivas_total_brate = st_ivas->hDecoderConfig->last_ivas_total_brate;


                st_ivas->hSpatParamRendCom->nb_subframes = 1;
                st_ivas->hSpatParamRendCom->subframes_rendered = 0;
                //st_ivas->hSpatParamRendCom->subframe_nbslots[0] = JBM_CLDFB_SLOTS_IN_SUBFRAME;
                //st_ivas->hSpatParamRendCom->slots_rendered = 0;
                st_ivas->hSpatParamRendCom->num_slots = JBM_CLDFB_SLOTS_IN_SUBFRAME;

                if ( ( error = ivas_omasa_dirac_td_binaural_jbm( st_ivas, (uint16_t) hTcBuffer->n_samples_granularity, nSamplesRendered, &nSamplesAvailableNext, CPE_CHANNELS, p_output ) ) != IVAS_ERR_OK )
                {
                    return error;
                }

                st_ivas->ism_mode = ism_mode_orig;
                st_ivas->renderer_type = renderer_type_orig;
                st_ivas->hDecoderConfig->ivas_total_brate = ivas_total_brate;
#else
                if ( ( error = ivas_td_binaural_renderer_sf( st_ivas, p_output, hTcBuffer->n_samples_granularity ) ) != IVAS_ERR_OK )
                {
                    return error;
                }
#endif
            }
        }

Tagging @laitinenmik and @pihlajakuja for awareness.