Incorrect scaling of MDCT power spectrum when updating background noise msNoiseEst[]
In MDCT stereo mode, during inactive frames, the power spectrum power_spec[]
is used to update the background noise spectrum in the run_min_stats()
function, around lines 621-650:
```c
/* Compute power spectrum twice if estimation runs on both channels.
   If only on one channel, it's calculated once (for ch == 0) as the result will be the same. */
if ( ( will_estimate_noise_on_channel[0] == will_estimate_noise_on_channel[1] ) || ch == 0 )
{
    /* Calculate power spectrum from MDCT coefficients and estimated MDST coefficients */
    power_spec[0] = spec_in[0] * spec_in[0];
    power_spec[L_FRAME16k - 1] = spec_in[L_FRAME16k - 1] * spec_in[L_FRAME16k - 1];
    for ( int16_t i = 1; i < L_FRAME16k - 1; i++ )
    {
        float mdst = spec_in[i + 1] - spec_in[i - 1];
        power_spec[i] = spec_in[i] * spec_in[i] + mdst * mdst;
    }
}

noisy_speech_detection( st->hFdCngDec, st->VAD && st->m_frame_type == ACTIVE_FRAME, power_spec );
...
ApplyFdCng( NULL, st->bfi ? NULL : power_spec, NULL, NULL, st, st->bfi, 0 );
```
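To make the computation above easier to inspect in isolation, here is a minimal standalone sketch of the same power-spectrum estimate. The function name `mdct_power_spectrum` and the `n` parameter are hypothetical (the decoder works in place with `L_FRAME16k` bins). Each interior bin combines the MDCT energy with an MDST estimate taken as the difference of the two neighbouring MDCT bins, while the first and last bins use only the MDCT energy. Note that no normalization of any kind is applied.

```c
/* Hypothetical standalone version of the excerpt above (not the IVAS API).
 * Interior bins: MDCT energy plus squared MDST estimate, where the MDST
 * estimate is the difference of the two neighbouring MDCT coefficients.
 * Edge bins have only one neighbour, so only the MDCT energy is used.
 * No 1/N^2 (or any other) normalization is applied, just as in the excerpt. */
static void mdct_power_spectrum(const float *spec_in, float *power_spec, int n)
{
    power_spec[0] = spec_in[0] * spec_in[0];
    power_spec[n - 1] = spec_in[n - 1] * spec_in[n - 1];
    for (int i = 1; i < n - 1; i++)
    {
        const float mdst = spec_in[i + 1] - spec_in[i - 1];
        power_spec[i] = spec_in[i] * spec_in[i] + mdst * mdst;
    }
}
```

For example, with `spec_in = {1, 2, 0, -1, 3, 0.5, 0, 2}`, bin 1 becomes 2² + (0 − 1)² = 5 and bin 4 becomes 3² + (0.5 − (−1))² = 11.25.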
In ApplyFdCng(), the power spectrum is used to update the periodogram periodog[] and, consequently, msNoiseEst[]. This happens inside perform_noise_estimation_dec() (line 904):
```c
if ( element_mode == IVAS_CPE_MDCT && power_spectrum != NULL )
{
    /* Use MDCT-domain power spectrum instead of recalculating */
    periodog = power_spectrum;
}
...
/* Adjust to desired frequency resolution by averaging over spectral partitions for SID transmission */
bandcombinepow( periodog, stopFFTbin - startBand, part, npart, psize_inv, msPeriodog );

/* Compress MS inputs */
compress_range( msPeriodog, msLogPeriodog, npart );

/* Perform minimum statistics noise estimation */
minimum_statistics( npart, nFFTpart, psize, msLogPeriodog, hFdCngDec->msNoiseFloor, msLogNoiseEst,
                    hFdCngDec->msAlpha, hFdCngDec->msPsd, hFdCngDec->msPsdFirstMoment,
                    hFdCngDec->msPsdSecondMoment, hFdCngDec->msMinBuf, hFdCngDec->msBminWin,
                    hFdCngDec->msBminSubWin, hFdCngDec->msCurrentMin, hFdCngDec->msCurrentMinOut,
                    hFdCngDec->msCurrentMinSubWindow, hFdCngDec->msLocalMinFlag,
                    hFdCngDec->msNewMinFlag, hFdCngDec->msPeriodogBuf,
                    &( hFdCngDec->msPeriodogBufPtr ), hFdCngDec->hFdCngCom, DEC, element_mode );

/* Expand MS outputs */
expand_range( msLogNoiseEst, msNoiseEst, npart );
```
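Because periodog simply aliases power_spectrum in the MDCT stereo path, any gain error in the power spectrum is carried straight into msPeriodog: averaging bins over partitions is a linear operation. The sketch below is a hypothetical, simplified stand-in for the partition averaging done by bandcombinepow(). The name `band_average_power` and the `part_start` boundary layout are assumptions, and the real function organizes its `part` and `psize_inv` arguments differently. It only shows that scaling the input by a constant scales every partition by the same constant.

```c
/* Hypothetical, simplified stand-in for the partition averaging performed
 * by bandcombinepow(); NOT the IVAS function or its argument layout.
 * Partition p covers bins [part_start[p], part_start[p + 1]) and its
 * output is the mean power of those bins. */
static void band_average_power(const float *periodog, const int *part_start,
                               int npart, float *ms_periodog)
{
    for (int p = 0; p < npart; p++)
    {
        const int width = part_start[p + 1] - part_start[p];
        float sum = 0.0f;
        for (int k = part_start[p]; k < part_start[p + 1]; k++)
        {
            sum += periodog[k];
        }
        ms_periodog[p] = sum / (float) width;
    }
}
```

If compress_range() maps powers to a logarithmic domain (assumed here, not checked against the source), a constant gain c on the input becomes a constant additive offset log(c) in msLogPeriodog before it enters minimum_statistics().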
However, no scaling factor such as 1/N^2 is applied to the MDCT power spectrum, even though periodog[], and consequently msNoiseEst[], expect a power spectrum normalized in this way. As a result, the estimated background noise spectrum has an excessively high amplitude. This estimate matters because it is used in several places, such as CNG or stereo CNA. In the example below, it leads to a severe artifact in the first DFT stereo frame after switching from MDCT stereo frames, as shown in the following graph:
The issue can be reproduced with:
```
./IVAS_cod -dtx -stereo sw_13k2_to_128k_10fr.bin 48 p800-2-c1d_short.wav bit
./IVAS_dec STEREO 48 bit syn
```
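The missing normalization could be expressed as a small helper applied to power_spec[] before it reaches the noise estimation. This is a minimal sketch under stated assumptions: `scale_power_spectrum` and `norm_len` are hypothetical names, and this issue does not pin down which N is correct or exactly where the scaling should be applied (for example, whether noisy_speech_detection() should see the scaled or the unscaled spectrum).

```c
/* Hypothetical helper, not part of the IVAS code base: normalizes an
 * MDCT-domain power spectrum by 1/N^2 so that its level matches what the
 * FD-CNG periodogram path expects. The choice of N (norm_len) is an
 * assumption and must be checked against the FFT-based periodogram. */
static void scale_power_spectrum(float *power_spec, int n_bins, int norm_len)
{
    const float scale = 1.0f / ((float) norm_len * (float) norm_len);
    for (int i = 0; i < n_bins; i++)
    {
        power_spec[i] *= scale;
    }
}
```

Since the factor is a pure gain, the missing normalization amounts to a level error of 20·log10(N) dB in the noise estimate.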