Quality issue for ISM1 MDCT core at 32 kbps

When analyzing the selection test results, we noticed a quality issue for ISM1 at 32 kbps when the MDCT core was selected. The issue seems to be related to the MDCT classifier.

In particular, the sample dm6ba1s06 in IVAS scores worse than EVS in the ISM4 test, namely conditions c01 (4xEVS at 32 kbps) and c07 (IVAS at 128 kbps).

We tried to narrow down the issue and found that it is mainly related to object #3. When object #3 is individually coded in the ISM1 system using

ivas_cod.exe -ism 1 NULL 32000 48 dm6ba1s06_C_obj3.wav bit
ivas_dec.exe EXT 48 bit syn.ism1

and compared to the EVS coding using

ivas_cod.exe 32000 48 dm6ba1s06_C_obj3.wav bit
ivas_dec.exe 48 bit syn.evs

it was noticed that the ISM1 synthesis suffers from several quality issues. The two syntheses can be compared using the following signals. Note that the input sentence is repeated twice to follow the selection test processing, so please listen to the second sentence, starting at 10 s.

dm6ba1s06_C_obj3 syn.evs syn.ism1

Observation 1: In EVS, the HQ core is selected for the majority of the frames while the TCX core dominates in IVAS (again, focus on the second sentence). From the top: input, core decision for EVS, core decision for IVAS

image

The differences come from ivas_decision_matrix_enc():

    else /* sp_aud_decision1 == 1 && *sp_aud_decision2 == 1 */
    {
        /* music w. TCX or HQ core */
        st->core = TCX_20_CORE;

        if ( ... || st->sp_aud_decision0 == 0 )
        {
            st->core = TCX_20_CORE;
        }
        else
        {
            /* select TCX core or HQ core using bits_frame_nominal to match the TCX configuration bitrate */
            st->core = mdct_classifier( st, fft_buff, enerBuffer, st->bits_frame_nominal * FRAMES_PER_SEC );
        }
    }

while in EVS, there is just

st->core = mdct_classifier( st, fft_buff, enerBuffer, st->total_brate );

Note the bitrate threshold difference: st->bits_frame_nominal * 50 in IVAS (==28.25 kbps) vs. st->total_brate in EVS (== 32 kbps).
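
The arithmetic behind the two thresholds can be checked with a minimal sketch. The value of 565 bits per frame is an assumption, back-computed from the 28.25 kbps quoted above, and both helper names are hypothetical (they are not part of the codec source).

```c
#include <stdint.h>

#define FRAMES_PER_SEC 50 /* 20 ms frames, matching the "* 50" above */

/* Hypothetical helper: bitrate handed to mdct_classifier() in the IVAS path,
   i.e. bits_frame_nominal * FRAMES_PER_SEC. */
static int32_t ivas_classifier_brate( int32_t bits_frame_nominal )
{
    return bits_frame_nominal * FRAMES_PER_SEC;
}

/* Hypothetical helper: number of bits per frame for a given bitrate in bps. */
static int32_t bits_per_frame( int32_t brate )
{
    return brate / FRAMES_PER_SEC;
}
```

Under that assumption, `ivas_classifier_brate( 565 )` gives 28250 bps, while `bits_per_frame( 32000 )` gives 640 bits, so the classifier is called with a bitrate 3750 bps lower in IVAS than in EVS.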

Observation 2: Temporal events (attacks) are diffused/smeared in the IVAS TCX core compared to EVS. The most obvious subjective problem is at frame 687. From the top: input, EVS synthesis, IVAS synthesis:

image

Similar onset issues can be observed e.g. in frame 553

image

or in frame 598

image

but it seems to be a general issue (at least frames 553, 598, 682, 687, 697). When looking at the output from the transient detector, there are no significant differences between EVS and IVAS though.
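
To locate these frames when listening, a frame index can be converted to a time offset. This is a minimal sketch with a hypothetical helper, assuming 20 ms frames (50 frames per second) and a zero-based frame counter. The frame numbering convention is not stated in this report.

```c
#define FRAMES_PER_SEC 50 /* assumed: 20 ms frames */

/* Hypothetical helper: start time of a frame in milliseconds,
   assuming frame 0 starts at 0 ms. */
static int frame_start_ms( int frame_idx )
{
    return frame_idx * ( 1000 / FRAMES_PER_SEC );
}
```

With these assumptions, frame 687 starts at 13.74 s, frame 553 at 11.06 s, and the second sentence (from 10 s) begins at frame 500.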

Observation 3:

When forcing the HQ core instead of the TCX core by hacking `ivas_decision_matrix_enc()` as

```
    else /* sp_aud_decision1 == 1 && *sp_aud_decision2 == 1 */
    {
        /* music w. TCX or HQ core */
        st->core = TCX_20_CORE;

        if ( st->element_mode == IVAS_CPE_TD || st->sp_aud_decision0 == 0 )
        {
            st->core = TCX_20_CORE;
            st->core = HQ_CORE; // fix
        }
        else
        {
            /* select TCX core or HQ core using bits_frame_nominal to match the TCX configuration bitrate */
            //st->core = mdct_classifier( st, fft_buff, enerBuffer, st->bits_frame_nominal * FRAMES_PER_SEC );
            st->core = mdct_classifier( st, fft_buff, enerBuffer, st->element_brate );
        }
    }
```

the coding of onsets and the overall quality improved significantly:

syn.hq

From the top: input, EVS synthesis, IVAS synthesis, IVAS synthesis with forced HQ core

image

As for the general quality, e.g. in the segment around frames 530-600, the TCX core variant sounds more like "broadband noise" and lacks the fine details present in the HQ core variant.

The original binaural ISM4 selection samples are below:

Edited by jelinek