ISM: Binaural rendering gain is too high -- TD renderer matches HRIRs without additional normalization
EDIT: When obtaining the filter model for the TD renderer, no normalization is applied. Hence, any level attenuation or amplification that is part of the HR filter set is applied unchanged in the TD renderer, and the level of the TD renderer will be close to that of the Python reference renderer. The default filter set has already been normalized to 0 dB amplification on average, but it appears that another level normalization is required.
Next step: clarify what level normalization is needed so that it can be added to the modeling code for the TD renderer. Does anyone know what needs to be applied? @ramoa, @laitinenmik, @tamarapu, @emerit?
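To make the question concrete, here is a minimal sketch of how the average gain of an HR filter set could be inspected under two different conventions. It is not taken from the TD renderer modeling code. The function names, the `(left, right)` tap-list layout and the `sum_ears` switch are all assumptions made for illustration, and the sketch does not decide which convention, if any, is the one the renderer needs.

```python
import math


def mean_energy_gain_db(hrirs, sum_ears=False):
    """Average energy gain of an HRIR set in dB (hypothetical helper).

    hrirs: list of (left, right) pairs, each a list of filter taps.
    sum_ears=False: average the energy over directions and over both ears.
    sum_ears=True: sum the two ear energies per direction, then average
    over directions.
    """
    total = 0.0
    for left, right in hrirs:
        e_left = sum(t * t for t in left)
        e_right = sum(t * t for t in right)
        total += (e_left + e_right) if sum_ears else 0.5 * (e_left + e_right)
    return 10.0 * math.log10(total / len(hrirs))


def normalization_scale(hrirs, target_db=0.0, sum_ears=False):
    """Linear tap scale that moves the set's mean gain to target_db."""
    gain_db = mean_energy_gain_db(hrirs, sum_ears)
    return 10.0 ** ((target_db - gain_db) / 20.0)


# Toy set (assumed data): two directions, unit-energy impulse in each ear.
toy = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
per_ear_db = mean_energy_gain_db(toy)
two_ear_db = mean_energy_gain_db(toy, sum_ears=True)
scale = normalization_scale(toy, sum_ears=True)
```

For this toy set the two conventions differ by 10*log10(2), roughly 3 dB. Whether that difference is related to the offset reported in this issue is not established here; it would need to be checked against the actual filter set and measurement.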
Original issue filed by @ramoa:
When testing oMASA, we found that ISM rendering to binaural is about 3 dB too loud. This makes listening test processing impossible, since e.g. IVAS_rend and oMASA (enc+dec) produce very different balances between ISM and MASA when rendered for BINAURAL listening.
As far as we can see, the gains in oMASA rendering of MASA + ISM are currently about correct. However, when using IVAS_rend with oMASA, or IVAS_dec with ISMs, the BINAURAL rendered signal is too loud for the object part. We also checked STEREO and 7.1.4 outputs with ISMs and they appear to have correct gains. We have not yet checked other output formats such as MONO, other MC modes or SBA.
The measurement was done by analysing the input objects with bs1770 and then comparing the rendered output with the same tool. Naturally the HRTFs and directions affect the gains somewhat, but we used widely varying directions for the ISM metadata when doing the measurement. See the attached Excel sheet ISM_levels.xlsx for the results.
It shows the average of ~15 different scenarios with 3-4 ISMs. Bitrates from 24.4 to 160 kbit/s as well as IVAS_rend are shown. I think it is reasonable that lower bitrates have somewhat lower output volume, but the general level of ISM BINAURAL rendering is too loud. This also causes the audio limiter of the decoder to be active much of the time if the ISM input signals are relatively hot.
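For anyone who wants to reproduce the input-versus-output comparison without the bs1770 tool, the following is a rough sketch of the idea. It is a deliberate simplification and an assumption on my part: it sums the mean-square energy over channels but applies no K-weighting and no gating, so the numbers are not BS.1770 loudness values. The function names and the toy signal are hypothetical.

```python
import math


def energy_level_db(channels):
    """Level in dB of the channel-summed mean-square energy.

    channels: list of equal-length sample lists.
    Simplification: no K-weighting and no gating are applied, so this is
    only a stand-in for a real BS.1770 measurement.
    """
    energy = sum(sum(s * s for s in ch) / len(ch) for ch in channels)
    return 10.0 * math.log10(energy)


def level_offset_db(input_objects, rendered):
    """Rendered output level minus input object level, in dB."""
    return energy_level_db(rendered) - energy_level_db(input_objects)


# Toy data (assumed): one sine object, rendered with full gain to the
# left ear and half gain to the right ear.
n = 480
obj = [math.sin(2.0 * math.pi * k / 48.0) for k in range(n)]
left = [1.0 * s for s in obj]
right = [0.5 * s for s in obj]
offset = level_offset_db([obj], [left, right])
```

A positive `offset` means the rendered output carries more summed energy than the input objects. Averaging such offsets over many scenarios and directions would mirror the kind of comparison summarized in the spreadsheet, within the limits of the simplification.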