Bugfixes for SNS: fix spectral tilt values and convert SNS VQ codebooks to fixed-point
The spectral tilt values used in the function sns_compute_scf in lib_com/ivas_sns_com.c are wrong. There should be only one tilt value per core sampling rate, but currently every sub-frame length has its own value. As a result, short, long and transition blocks of the same core fs use different spectral tilts, which is incorrect. The fix sets the values per (whole) frame length only. The new tilt values are also tuned to be more similar across the different core sampling rates.
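To illustrate the idea of the fix, here is a minimal sketch of a tilt lookup keyed only on the whole-frame length. Everything in it is hypothetical: the names example_tilt_for_frame and example_tilt_for_block, the frame lengths, and the tilt numbers are placeholders chosen for illustration, not the identifiers or tuned values from ivas_sns_com.c.

```c
#include <stddef.h>

/* Hypothetical table: one tilt per whole-frame length (i.e. per core fs).
 * Frame lengths and tilt values are PLACEHOLDERS, not the tuned values. */
typedef struct
{
    int frame_len;
    float tilt;
} example_tilt_entry_t;

static const example_tilt_entry_t example_tilt_table[] = {
    { 320, 1.0f },
    { 512, 2.0f },
    { 640, 3.0f },
};

/* Return the tilt for a whole-frame length, or -1.0f if it is not listed. */
static float example_tilt_for_frame( int frame_len )
{
    size_t n = sizeof( example_tilt_table ) / sizeof( example_tilt_table[0] );
    for ( size_t i = 0; i < n; i++ )
    {
        if ( example_tilt_table[i].frame_len == frame_len )
        {
            return example_tilt_table[i].tilt;
        }
    }
    return -1.0f;
}

/* The sub-frame length is deliberately ignored, so short, long and
 * transition blocks of the same frame length share one tilt value. */
static float example_tilt_for_block( int frame_len, int subframe_len )
{
    (void) subframe_len;
    return example_tilt_for_frame( frame_len );
}
```

With a lookup of this shape, a block with a sub-frame length of 640 and one with 320 inside the same 640-sample frame receive the same tilt, which is the behaviour described above.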
Changing the spectral tilt values makes it necessary to re-train the VQ codebooks. In the same run, we followed a suggestion from the crosscheck of the SNS MSVQ and converted the VQ codebooks to be fixed-point-compatible. Specifically, the SNS MSVQ tables are now pre-rounded so that every entry is representable in (signed) Q4.12, and the tables for the split VQ are now actually stored as int16_t values. Having BASOP-compatible codebooks already in the floating-point code should help minimize the risk of compatibility/conformance problems between the float and fixed-point implementations.
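As a rough sketch of what "pre-rounded to be Q4.12 representable" means, the snippet below rounds a float to the nearest multiple of 2^-12 that fits into a 16-bit signed word. It assumes Q4.12 denotes a 16-bit value with 12 fractional bits; the function names are illustrative and not taken from the codebase, and the actual table generation may differ.

```c
#include <stdint.h>

#define Q12_SCALE 4096.0f /* 2^12 steps per unit (assumed 12 fractional bits) */

/* Round a float to the nearest signed Q4.12 value, saturating to int16_t. */
static int16_t float_to_q12( float x )
{
    float scaled = x * Q12_SCALE;
    /* Round half away from zero; the cast below truncates toward zero. */
    scaled += ( scaled >= 0.0f ) ? 0.5f : -0.5f;
    if ( scaled >= 32767.0f )
    {
        return INT16_MAX;
    }
    if ( scaled <= -32768.0f )
    {
        return INT16_MIN;
    }
    return (int16_t) scaled;
}

/* Convert a Q4.12 value back to float; exact for every int16_t input. */
static float q12_to_float( int16_t q )
{
    return (float) q / Q12_SCALE;
}

/* Pre-round a trained float codebook entry so that it is Q4.12 representable. */
static float pre_round_q12( float x )
{
    return q12_to_float( float_to_q12( x ) );
}
```

A table entry that went through pre_round_q12 converts to Q4.12 and back without any further change, so a float and a fixed-point implementation can work from the same codebook values.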
Perceptual impact is negligible and was checked by informal listening.
Complexity impact is as follows (measured with the ltv vectors):
TROM: decrease by 16 WORDS in lib_com.
PROM: same for all operating points. Below is the diff between switch on and off:
conf                             enc   dec   com  rend total
Stereo@48 kbps WB to STEREO      -29     8    -2     0   -23
RAM: No impact on max RAM demand.
WMOPS: the diff between switch on and off is shown below for the stereo modes. Complexity-wise, the only real addition is the conversion from integer tables to float values in the split VQ encoder. Most split VQ bitrates nevertheless show a decrease. This is most likely because the changed quantization leads to different decisions later in the core coder, which affects the complexity numbers and changes the worst-case frame. Overall, there are no big changes in computational complexity.
conf                               enc     dec   total
Stereo@48 kbps WB to STEREO      0.266  -0.049   0.217
Stereo@64 kbps WB to STEREO     -0.004   0.015   0.011
Stereo@80 kbps WB to STEREO      0.093   0.091   0.184
Stereo@96 kbps WB to STEREO      0.845   0.245   1.090
Stereo@128 kbps WB to STEREO     0.083   0.019   0.102
Stereo@160 kbps WB to STEREO    -0.402   0.018  -0.384
Stereo@192 kbps WB to STEREO    -0.639  -0.124  -0.763
Stereo@256 kbps WB to STEREO    -0.040  -0.126  -0.166
Stereo@48 kbps SWB to STEREO     0.034  -0.366  -0.330
Stereo@64 kbps SWB to STEREO    -0.183   0.311   0.130
Stereo@80 kbps SWB to STEREO    -0.641  -0.234  -0.870
Stereo@96 kbps SWB to STEREO    -0.138   0.455   0.320
Stereo@128 kbps SWB to STEREO   -0.002   0.089   0.080
Stereo@160 kbps SWB to STEREO   -0.575  -0.076  -0.650
Stereo@192 kbps SWB to STEREO   -0.777  -0.058  -0.840
Stereo@256 kbps SWB to STEREO    0.250   0.062   0.320
Stereo@48 kbps FB to STEREO     -0.381   0.192  -0.180
Stereo@64 kbps FB to STEREO     -0.130   0.336   0.200
Stereo@80 kbps FB to STEREO     -1.820   0.117  -1.700
Stereo@96 kbps FB to STEREO     -0.010   0.228   0.220
Stereo@128 kbps FB to STEREO    -0.012  -0.088  -0.100
Stereo@160 kbps FB to STEREO    -0.034  -0.183  -0.220
Stereo@192 kbps FB to STEREO    -0.041  -0.232  -0.270
Stereo@256 kbps FB to STEREO     0.207  -0.388  -0.180
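For context on the added encoder cost mentioned above, the following is a minimal sketch of a full-search VQ over an int16_t codebook, where each entry is converted to float before the distance computation. It is not the actual split VQ encoder: the function name, the row-major table layout, and the scale parameter (the Q format of the split VQ tables is not stated here) are assumptions made for illustration.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* Full search over an integer codebook stored row-major as n_vec x dim.
 * 'scale' maps integer entries to float (e.g. 1/4096 for 12 fractional
 * bits); the real table scaling is an assumption here.
 * Returns the index of the codevector closest to 'target' (squared error). */
static size_t example_vq_search_int16( const float *target, const int16_t *cb,
                                       size_t n_vec, size_t dim, float scale )
{
    size_t best_idx = 0;
    float best_dist = FLT_MAX;

    for ( size_t i = 0; i < n_vec; i++ )
    {
        float dist = 0.0f;
        for ( size_t k = 0; k < dim; k++ )
        {
            /* Integer-to-float conversion per entry: the only extra work
             * compared to searching a float codebook. */
            float c = (float) cb[i * dim + k] * scale;
            float d = target[k] - c;
            dist += d * d;
        }
        if ( dist < best_dist )
        {
            best_dist = dist;
            best_idx = i;
        }
    }
    return best_idx;
}
```

The extra cost is one conversion and one multiplication per codebook entry visited, which is consistent with the small differences seen in the encoder column.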