Incorrect ITD detection when running DFT stereo on binaural audio items generated with the pre-renderer tool

Basic info

Commit SHA: 47d5fc30
Platform: Linux/Windows

Bug description

When combining clean mono speakers in a HOA3 scene and rendering them to BINAURAL output using the pre-renderer tool, the ITD estimation module based on the GCC-PHAT function inside DFT stereo fails to detect the correct ITD.

Ways to reproduce

The binaural audio items are generated with the pre-renderer tool as follows:

python -m pyaudio3dtools.audio3dtools -b -i prerenderer_config.txt -s 48000 -f META -o output.wav -F HOA3

The pre-renderer configuration is in prerenderer_config.txt. The input file contains two clean speech objects (ISMs), each at its own position. After pre-rendering, the output file should be named output_BINAURAL.wav.

The IVAS codec is then run on output_BINAURAL.wav with:

./IVAS_cod.exe -dtx -max_band SWB -stereo 24400 48 output_BINAURAL.wav bitstream

With DEBUG_DFT_STEREO switched on, the GCC-PHAT function is dumped automatically and can be found in res/gcc_phat. The estimated ITD can be found in res/itd.

The two mono speakers are non-overlapping and located at -80° and 90° in the horizontal plane, respectively. Therefore, they both have a non-zero ITD. However, the GCC-PHAT function contains a strong peak in every frame at ITD=0, which is wrong. As a consequence, there is a strong spatial degradation after decoding. See the graph below (legend: left channel, right channel, GCC-PHAT):

It's unclear whether the problem is in the GCC-PHAT routine or in the pre-renderer.
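One way to narrow this down is to analyse output_BINAURAL.wav outside the codec. The sketch below is not the IVAS implementation: it is a generic textbook GCC-PHAT in NumPy plus a rough reference ITD from Woodworth's spherical-head formula. The function names, the head radius of 8.75 cm, the speed of sound of 343 m/s and the frame handling are all assumptions made for illustration. The HRTF set used by the pre-renderer may give somewhat different values.

```python
import math
import numpy as np

HEAD_RADIUS_M = 0.0875    # assumed average head radius, not taken from IVAS
SPEED_OF_SOUND = 343.0    # m/s, assumed


def woodworth_itd_samples(azimuth_deg, fs=48000):
    """Rough ITD in samples from Woodworth's spherical-head model.

    Valid for |azimuth| <= 90 degrees. The sign depends on the azimuth
    convention in use, so compare magnitudes first.
    """
    theta = math.radians(azimuth_deg)
    itd_seconds = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))
    return itd_seconds * fs


def gcc_phat(left, right, max_lag):
    """Return (lags, gcc) for one frame, using the PHAT weighting."""
    n = len(left) + len(right)               # zero-pad to avoid circular wrap-around
    spec_l = np.fft.rfft(left, n=n)
    spec_r = np.fft.rfft(right, n=n)
    cross = spec_l * np.conj(spec_r)
    cross /= np.maximum(np.abs(cross), 1e-12)    # PHAT: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the lag axis runs from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


def estimate_itd(left, right, max_lag):
    """Lag of the GCC-PHAT peak; positive means the left channel is delayed."""
    lags, cc = gcc_phat(left, right, max_lag)
    return int(lags[np.argmax(cc)])
```

For the two positions in this report the spherical model gives an ITD magnitude of roughly 30 samples at 48 kHz. If frames of output_BINAURAL.wav processed with `estimate_itd` show a peak near that lag while res/gcc_phat peaks at zero, the codec-side routine is the more likely culprit. If the independent estimate also peaks at zero, the pre-rendered file itself is suspect.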

Edited Nov 10, 2022 by Ghost User