Accuracy of spatial metadata in ivas_mcmasa_enc_fx compared to the floating point equivalent
Basic info
Bug description
BASOP encoder verification showed that something in the McMASA path causes differences against the floating-point encoder. As the MASA path (which is used for McMASA encoding) is already in relatively good shape, the source is probably the spatial metadata estimation in ivas_mcmasa_enc_fx. To analyze this, I added binary dump code at the end of the ivas_mcmasa_enc_fx and ivas_mcmasa_enc functions for the variables
Word32 elevation_m_values_fx[MAX_PARAM_SPATIAL_SUBFRAMES][MASA_FREQUENCY_BANDS]; // Q22
Word32 azimuth_m_values_fx[MAX_PARAM_SPATIAL_SUBFRAMES][MASA_FREQUENCY_BANDS]; // Q22
Word32 energyRatio_fx[MAX_PARAM_SPATIAL_SUBFRAMES][MASA_FREQUENCY_BANDS]; // Q31
Word32 spreadCoherence_fx[MAX_PARAM_SPATIAL_SUBFRAMES][MASA_FREQUENCY_BANDS]; // Q30
Word32 surroundingCoherence_fx[MAX_PARAM_SPATIAL_SUBFRAMES][MASA_FREQUENCY_BANDS]; // Q31
and their float counterparts. I then processed the dumps in Matlab to create diff plots. The diff plot (fx-fl) for the command IVAS_cod -mc 7_1_4 96000 48 ltv48_MC714.wav out.bit is below (note: figure and description edited, as the old figure could not be reproduced).
In this plot, negative differences are red and positive differences are green. Ideally, the plots should be completely black. The differences are mostly small, but the two highest bands show large estimation errors. The diff is calculated in the corresponding Q-format (i.e., the float value is first converted to Q-format), and the diff values are converted back to float for plotting purposes.
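The diff computation described above can be sketched in C as follows. This is only an illustration of the idea, not the Matlab script itself: the helper names float_to_q and q_diff are hypothetical, and the rounding and saturation behaviour is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a float to a 32-bit fixed-point word with
   q fractional bits, rounding half away from zero and saturating. */
static int32_t float_to_q(float x, int q)
{
    assert(q >= 0 && q <= 31);
    double scaled = (double)x * (double)((int64_t)1 << q);
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled >= 2147483648.0) return INT32_MAX;   /* e.g. 1.0 in Q31 */
    if (scaled <= -2147483649.0) return INT32_MIN;
    return (int32_t)scaled;                         /* truncates toward zero */
}

/* Hypothetical helper: fx-fl difference taken in the Q domain, then
   converted back to float for plotting. */
static float q_diff(int32_t fx, float fl, int q)
{
    int64_t d = (int64_t)fx - (int64_t)float_to_q(fl, q);
    return (float)((double)d / (double)((int64_t)1 << q));
}
```

For example, q_diff(azimuth_m_values_fx[i][j], azimuth_m_values[i][j], 22) would give the azimuth difference in degrees. Note that 1.0 is not representable in Q31, so a float reference of exactly 1.0 saturates to the largest Q31 value in this sketch.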
As the spatial metadata estimation is done before any encoding, it should have maximal accuracy, and very large differences between the fixed-point and floating-point values are not expected. Obviously, quantization and encoding will reduce the accuracy of the values in the end, but many decisions in parameter encoding are based on the initial quantized parameter values. Thus, these differences change the bit usage considerably.
Secondly, the Q-formats of the values should be verified. For example, spread coherence uses Q30 whereas surround coherence uses Q31. Both values should always be between 0 and 1, so the difference is surprising. There is also a very peculiar line of code at line 740 of ivas_mcmasa_enc_fx.c:
hQMeta->q_direction[0].band_data[i].energy_ratio_fx[j] = energyRatio_fx[k][i]; // Q30
Here, the Q-format is marked as Q30. However, the local value energyRatio_fx[k][i] is commented as Q31. The target value in band_data is Q30. Based on the values themselves, Q31 is probably correct, but it would make sense to verify that this is honored everywhere.
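To illustrate why such a mismatch matters, the sketch below shows what happens when a raw Q31 word is read as Q30 without rescaling: the interpreted value doubles. The helper names are hypothetical, and plain C integer operations are used instead of BASOP operators; this is not a claim about what the codec actually does at that line.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a raw fixed-point word as a real number
   with q fractional bits. */
static double q_value(int32_t raw, int q)
{
    return (double)raw / (double)((int64_t)1 << q);
}

/* Hypothetical helper: rescale a non-negative Q31 value to Q30.
   One LSB of precision is lost. Energy ratios are assumed to be >= 0,
   so the right shift is well defined. */
static int32_t q31_to_q30(int32_t raw_q31)
{
    return raw_q31 >> 1;
}
```

With raw = 0x40000000 (0.5 in Q31), q_value(raw, 31) is 0.5, q_value(raw, 30) is 1.0, and q_value(q31_to_q30(raw), 30) is 0.5 again. If the assignment copies a Q31 value into a field that later code treats as Q30, every energy ratio would be read as twice its intended value.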
Ways to inspect data
The comparison was done by dumping for BASOP (at the end of ivas_mcmasa_enc_fx) with
{
    static FILE* dumpFile = NULL;
    if (dumpFile == NULL)
    {
        dumpFile = fopen("./mcmasa_basop_dump.bin", "wb");
    }
    for (i = 0; i < MAX_PARAM_SPATIAL_SUBFRAMES; i++)
    {
        fwrite(azimuth_m_values_fx[i], sizeof(Word32), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(elevation_m_values_fx[i], sizeof(Word32), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(energyRatio_fx[i], sizeof(Word32), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(spreadCoherence_fx[i], sizeof(Word32), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(surroundingCoherence_fx[i], sizeof(Word32), MASA_FREQUENCY_BANDS, dumpFile);
    }
}
and for float (at the end of ivas_mcmasa_enc) with
{
    static FILE* dumpFile = NULL;
    if (dumpFile == NULL)
    {
        dumpFile = fopen("./mcmasa_float_dump.bin", "wb");
    }
    for (i = 0; i < MAX_PARAM_SPATIAL_SUBFRAMES; i++)
    {
        fwrite(azimuth_m_values[i], sizeof(float), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(elevation_m_values[i], sizeof(float), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(energyRatio[i], sizeof(float), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(spreadCoherence[i], sizeof(float), MASA_FREQUENCY_BANDS, dumpFile);
        fwrite(surroundingCoherence[i], sizeof(float), MASA_FREQUENCY_BANDS, dumpFile);
    }
}
and my Matlab script for comparison (and plotting) is here
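For anyone without Matlab, the two dumps can also be compared with a small C reader. The sketch below is not the Matlab script. It assumes MAX_PARAM_SPATIAL_SUBFRAMES is 4 and MASA_FREQUENCY_BANDS is 24 (take the real values from the codec headers), uses the Q-formats from the variable comments above, and relies on both dumps having the same layout because Word32 and float are both 4 bytes. The function names are hypothetical. Unlike the plots, it takes the difference in the float domain and tracks only the maximum absolute difference per parameter and band.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values; take the real ones from the codec headers. */
#define N_SUBFRAMES 4  /* MAX_PARAM_SPATIAL_SUBFRAMES */
#define N_BANDS 24     /* MASA_FREQUENCY_BANDS */
#define N_PARAMS 5
#define FRAME_LEN (N_SUBFRAMES * N_PARAMS * N_BANDS)

/* Q-format per parameter in dump order: azimuth, elevation, energy ratio,
   spread coherence, surrounding coherence. */
static const int param_q[N_PARAMS] = { 22, 22, 31, 30, 31 };

/* Update max_abs[p][b] with the largest |fx - fl| of one dumped frame. */
static void accumulate_frame(const int32_t *fx, const float *fl,
                             double max_abs[N_PARAMS][N_BANDS])
{
    for (int s = 0; s < N_SUBFRAMES; s++)
        for (int p = 0; p < N_PARAMS; p++)
            for (int b = 0; b < N_BANDS; b++)
            {
                int idx = (s * N_PARAMS + p) * N_BANDS + b;
                double d = (double)fx[idx] / (double)((int64_t)1 << param_q[p])
                           - (double)fl[idx];
                if (d < 0.0) d = -d;
                if (d > max_abs[p][b]) max_abs[p][b] = d;
            }
}

/* Read both dumps frame by frame. Returns the number of frames compared,
   or -1 if either file cannot be opened. */
static long compare_dumps(const char *fx_path, const char *fl_path,
                          double max_abs[N_PARAMS][N_BANDS])
{
    FILE *f_fx = fopen(fx_path, "rb");
    FILE *f_fl = fopen(fl_path, "rb");
    long frames = -1;
    if (f_fx != NULL && f_fl != NULL)
    {
        int32_t fx[FRAME_LEN];
        float fl[FRAME_LEN];
        frames = 0;
        while (fread(fx, sizeof(int32_t), FRAME_LEN, f_fx) == (size_t)FRAME_LEN &&
               fread(fl, sizeof(float), FRAME_LEN, f_fl) == (size_t)FRAME_LEN)
        {
            accumulate_frame(fx, fl, max_abs);
            frames++;
        }
    }
    if (f_fx != NULL) fclose(f_fx);
    if (f_fl != NULL) fclose(f_fl);
    return frames;
}
```

Calling compare_dumps("./mcmasa_basop_dump.bin", "./mcmasa_float_dump.bin", max_abs) with a zero-initialized max_abs array would then show which bands carry the largest errors for each parameter.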