Quality issue for ISM1 MDCT core at 32 kbps
When analyzing the selection test results, we noticed a quality issue for ISM1 at 32 kbps when the MDCT core is selected. The issue seems to be related to the MDCT classifier.
In particular, the sample dm6ba1s06 scores worse with IVAS than with EVS in the ISM4 test, namely in conditions c01 (4xEVS at 32 kbps) and c07 (IVAS at 128 kbps).
We tried to narrow down the issue and found that it is mainly related to object #3. When object #3 is individually coded in the ISM1 system using
ivas_cod.exe -ism 1 NULL 32000 48 dm6ba1s06_C_obj3.wav bit
ivas_dec.exe EXT 48 bit syn.ism1
and compared to the EVS coding using
ivas_cod.exe 32000 48 dm6ba1s06_C_obj3.wav bit
ivas_dec.exe 48 bit syn.evs
we noticed that the ISM1 synthesis suffers from several quality issues. The two syntheses can be compared using the following signals. Note that the input sentence is repeated twice, following the selection test processing, so please listen to the second sentence, which starts at 10 sec.
dm6ba1s06_C_obj3 syn.evs syn.ism1
Observation 1: In EVS, the HQ core is selected for the majority of the frames while the TCX core dominates in IVAS (again, focus on the second sentence). From the top: input, core decision for EVS, core decision for IVAS
The differences come from `ivas_decision_matrix_enc()`:
else /* sp_aud_decision1 == 1 && *sp_aud_decision2 == 1 */
{
/* music w. TCX or HQ core */
st->core = TCX_20_CORE;
if ( ... || st->sp_aud_decision0 == 0 )
{
st->core = TCX_20_CORE;
}
else
{
/* select TCX core or HQ core using bits_frame_nominal to match the TCX configuration bitrate */
st->core = mdct_classifier( st, fft_buff, enerBuffer, st->bits_frame_nominal * FRAMES_PER_SEC );
}
}
while in EVS, there is just
st->core = mdct_classifier( st, fft_buff, enerBuffer, st->total_brate );
Note the bitrate threshold difference: `st->bits_frame_nominal * 50` in IVAS (== 28.25 kbps) vs. `st->total_brate` in EVS (== 32 kbps).
Observation 2: Temporal events (attacks) are diffused/smeared in the IVAS TCX core compared to EVS. The most obvious subjective problem is at frame 687. From the top: input, EVS synthesis, IVAS synthesis:
Similar onset issues can be observed e.g. in frame 553
or in frame 598
but it seems to be a general issue (at least frames 553, 598, 682, 687, 697). When looking at the output from the transient detector, there are no significant differences between EVS and IVAS though.
Observation 3:
When forcing the HQ core instead of the TCX core by hacking `ivas_decision_matrix_enc()` as
```
else /* sp_aud_decision1 == 1 && *sp_aud_decision2 == 1 */
{
    /* music w. TCX or HQ core */
    st->core = TCX_20_CORE;
    if ( st->element_mode == IVAS_CPE_TD || st->sp_aud_decision0 == 0 )
    {
        st->core = TCX_20_CORE;
        st->core = HQ_CORE; // fix: force the HQ core
    }
    else
    {
        /* select TCX core or HQ core using bits_frame_nominal to match the TCX configuration bitrate */
        //st->core = mdct_classifier( st, fft_buff, enerBuffer, st->bits_frame_nominal * FRAMES_PER_SEC );
        st->core = mdct_classifier( st, fft_buff, enerBuffer, st->element_brate );
    }
}
```
the coding of onsets and the overall quality improved significantly:
From the top: input, EVS synthesis, IVAS synthesis, IVAS with forced HQ core synthesis
As for the general quality, e.g. in the segment of approximately frames 530-600, the TCX core variant sounds more like "broadband noise" and lacks the fine details present in the HQ core variant.
The original binaural ISM4 selection samples are below:
- direct: dm6ba1s06.c01
- EVS at 4x32 kbps: dm6ba1s06.c04
- IVAS at 128 kbps: dm6ba1s06.c07