Incorrect ITD detection when running DFT stereo on binaural audio items generated with the pre-renderer tool
Basic info
- Commit SHA: 47d5fc30
- Platform: Linux/Windows
Bug description
When combining clean mono speakers in a HOA3 scene and rendering them to BINAURAL output using the pre-renderer tool, the ITD estimation module based on the GCC-PHAT function inside DFT stereo fails to detect the correct ITD.
Ways to reproduce
The binaural audio items are generated with the pre-renderer tool as follows:
python -m pyaudio3dtools.audio3dtools -b -i prerenderer_config.txt -s 48000 -f META -o output.wav -F HOA3
Here's the prerenderer_config.txt. The input file contains two clean speech objects (ISMs), each at its own position. After pre-rendering, the output file should be named output_BINAURAL.wav.
The IVAS codec is then run on output_BINAURAL.wav with:
./IVAS_cod.exe -dtx -max_band SWB -stereo 24400 48 output_BINAURAL.wav bitstream
The GCC-PHAT function is extracted automatically when DEBUG_DFT_STEREO is ON and can be found in res/gcc_phat. The ITD can be found in res/itd.
The two mono speakers are non-overlapping and located at -80° and 90° in the horizontal plane, respectively, so both should have a non-zero ITD. However, the GCC-PHAT function contains a strong peak at ITD=0 in every frame, which is wrong. As a consequence, there is a strong spatial degradation after decoding. See the graph below (legend: left channel, right channel, GCC-PHAT):
It's unclear whether the problem is in the GCC-PHAT routine or in the pre-renderer.
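One way to narrow this down would be to compute GCC-PHAT offline on output_BINAURAL.wav, independently of the codec, and compare it with the dump in res/gcc_phat. Below is a minimal sketch of a textbook GCC-PHAT estimator in Python/NumPy. It is not the codec's implementation: the function name `gcc_phat_itd`, the 1 ms search range and the sign convention are assumptions made for illustration, and the input here is synthetic noise rather than the actual item.

```python
import numpy as np


def gcc_phat_itd(left, right, fs, max_itd_s=1e-3):
    """Estimate the inter-channel delay of one frame with GCC-PHAT.

    Returns (lag, cc). lag is in samples and is positive when the right
    channel is delayed relative to the left one (assumed convention).
    cc is the GCC-PHAT function for lags in [-max_lag, +max_lag].
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)

    # Zero-pad to avoid circular wrap-around in the correlation.
    n = len(left) + len(right)
    nfft = 1 << (n - 1).bit_length()

    spec_l = np.fft.rfft(left, nfft)
    spec_r = np.fft.rfft(right, nfft)

    # Cross-spectrum with PHAT weighting: keep the phase, drop the magnitude.
    cross = spec_r * np.conj(spec_l)
    cross /= np.abs(cross) + 1e-12

    cc = np.fft.irfft(cross, nfft)

    # Reorder so that index 0 corresponds to lag = -max_lag.
    max_lag = int(round(max_itd_s * fs))
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))

    lag = int(np.argmax(cc)) - max_lag
    return lag, cc


# Synthetic check: the right channel is the left one delayed by 30 samples.
fs = 48000
rng = np.random.default_rng(0)
src = rng.standard_normal(fs // 10)
delay = 30
left = src
right = np.concatenate((np.zeros(delay), src[:-delay]))

lag, _ = gcc_phat_itd(left, right, fs)
print("estimated lag (samples):", lag, "ITD (ms):", 1e3 * lag / fs)
```

Running this frame by frame on the two channels of output_BINAURAL.wav (skipping silent frames, where the estimate is meaningless) should help separate the two hypotheses. If the offline estimate also peaks at 0, the pre-rendered file itself probably carries no ITD and the pre-renderer is the more likely culprit. If it peaks at a clearly non-zero lag (for sources near ±90°, a spherical-head approximation suggests something around 0.6 to 0.7 ms, i.e. roughly 30 samples at 48 kHz), the problem is more likely in the GCC-PHAT routine of DFT stereo.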
