
[non-BE][allow regression] Optimize ivas_dirac_dec_binaural_formulate_input_covariance_matrices

Closes #2157

Summary

The ivas_dirac_dec_binaural_formulate_input_covariance_matrices_fx function makes extensive use of float-like (mantissa/exponent) operations such as BASOP_Util_Add_Mant32Exp, which are computationally expensive and not strictly necessary here. These operations can be replaced with cheaper low-level 64-bit operations.
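To illustrate the kind of rewrite involved, here is a simplified sketch, not the actual IVAS code: the toy add_mant_exp helper below only mimics the behaviour of BASOP_Util_Add_Mant32Exp, and the data values are invented. It contrasts a float-like mantissa/exponent accumulation, which pays for exponent alignment and overflow handling at every step, with a plain 64-bit accumulation of 32x32-bit products.

```c
/* Illustrative sketch only, not the actual IVAS code. */
#include <stdint.h>
#include <stdio.h>

/* Toy mantissa/exponent add: value = m * 2^(e-31), with m a Q31 mantissa.
 * Only mimics BASOP_Util_Add_Mant32Exp; a real BASOP implementation also
 * renormalises the result. */
static int32_t add_mant_exp(int32_t a_m, int16_t a_e,
                            int32_t b_m, int16_t b_e, int16_t *res_e)
{
    int16_t e = (a_e > b_e) ? a_e : b_e;          /* align to the larger exponent */
    int64_t sum = ((int64_t)a_m >> (e - a_e)) + ((int64_t)b_m >> (e - b_e));
    if (sum > INT32_MAX || sum < INT32_MIN) {     /* keep the mantissa in 32 bits */
        sum >>= 1;
        e = (int16_t)(e + 1);
    }
    *res_e = e;
    return (int32_t)sum;
}

int main(void)
{
    const int32_t x[4] = { 123456789, -987654, 456789123, 777 };
    const int32_t y[4] = { 222222, 333333333, -4444, 55555555 };

    /* Float-like accumulation: every step pays for exponent alignment,
     * overflow handling and (in the real code) renormalisation. */
    int32_t acc_m = 0;
    int16_t acc_e = 0;
    for (int i = 0; i < 4; i++) {
        int64_t p = (int64_t)x[i] * y[i];                           /* Q62 product */
        acc_m = add_mant_exp(acc_m, acc_e, (int32_t)(p >> 31), 31, &acc_e);
    }

    /* 64-bit accumulation: one multiply-accumulate per step, nothing
     * normalised or truncated until the result is consumed. */
    int64_t acc64 = 0;
    for (int i = 0; i < 4; i++) {
        acc64 += (int64_t)x[i] * y[i];
    }

    printf("mant/exp: m=%d e=%d   64-bit: %lld\n", acc_m, (int)acc_e, (long long)acc64);
    return 0;
}
```

The 64-bit path needs a single multiply-accumulate per element and defers any normalisation to the point where the result is actually consumed, which is where the cycle savings come from.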

Complexity analysis

Good improvement: the worst-case total drops from 138.376 WMOPS to 135.285 WMOPS (the average from 128.456 to 126.295 WMOPS).

Before:

 --- Complexity analysis [WMOPS] ---  

                                           |------  SELF  ------|   |---  CUMULATIVE  ---|
                        routine    calls     min     max     avg      min     max     avg 
                ---------------   ------   ------  ------  ------   ------  ------  ------
                ivas_jbm_dec_tc     1.00    1.891   1.915   1.915   26.345  38.997  29.507
               ivas_spar_decode     1.00    1.010   1.041   1.019    2.544   2.684   2.628
               ivas_spar_dec_MD     1.00    1.526   1.665   1.609    1.526   1.665   1.609
                   ivas_sce_dec     1.00    0.246   0.246   0.246   21.794  34.495  24.964
                  ivas_core_dec     1.00    3.152  11.390   7.909   21.548  34.249  24.718
                 acelp_core_dec     0.61   12.830  19.132  14.543   12.830  19.132  14.543
      ivas_dec_prepare_renderer     1.00    7.068   8.724   7.173    7.068   8.724   7.173
                ivas_dec_render     1.00   77.336  88.004  87.518   81.487  92.262  91.776
    ivas_sba_prototype_renderer     4.00    3.922   4.258   4.258    3.922   4.258   4.258
            stereo_tcx_core_dec     0.39   17.651  31.015  20.324   17.651  31.015  20.324
                ---------------   ------   ------  ------  ------
                          total  1000.00  119.955 138.376 128.456

After:

 --- Complexity analysis [WMOPS] ---  

                                           |------  SELF  ------|   |---  CUMULATIVE  ---|
                        routine    calls     min     max     avg      min     max     avg 
                ---------------   ------   ------  ------  ------   ------  ------  ------
                ivas_jbm_dec_tc     1.00    1.891   1.915   1.915   26.345  38.997  29.507
               ivas_spar_decode     1.00    1.010   1.041   1.019    2.544   2.684   2.628
               ivas_spar_dec_MD     1.00    1.526   1.665   1.609    1.526   1.665   1.609
                   ivas_sce_dec     1.00    0.246   0.246   0.246   21.794  34.495  24.964
                  ivas_core_dec     1.00    3.152  11.390   7.909   21.548  34.249  24.718
                 acelp_core_dec     0.61   12.830  19.132  14.543   12.830  19.132  14.543
      ivas_dec_prepare_renderer     1.00    7.068   8.724   7.173    7.068   8.724   7.173
                ivas_dec_render     1.00   75.226  86.621  85.357   79.377  90.879  89.615
    ivas_sba_prototype_renderer     4.00    3.922   4.258   4.258    3.922   4.258   4.258
            stereo_tcx_core_dec     0.39   17.651  31.015  20.324   17.651  31.015  20.324
                ---------------   ------   ------  ------  ------
                          total  1000.00  117.845 135.285 126.295

Accuracy analysis

The optimised implementation does not normalise or truncate 64-bit integers and tries to be as precise as possible when performing summations and multiplications on them. However, because it does not operate on normalised values (as the current implementation does), it is slightly less accurate than the current one when processing "tiny" input values.

30/10/2024

Implemented "unit test" which compares the computation of the current fixed-point implementation vs optimised fixed-point implementation vs single-precision floating-point implementation vs double-precision floating-point implementation.

I have discovered several aspects that make the results of the optimised fixed-point implementation diverge slightly from those of the current fixed-point implementation. One important cause of the divergence is that the current implementation operates on normalised values (because of the pseudo-float operations) and is therefore more accurate when processing tiny quantities. The optimised implementation does not normalise any value, performs all operations on 64-bit integers, and is more accurate when processing large quantities. In the specific test I am running, the inputs to the function do not use the full 32-bit range, which makes the current fixed-point implementation slightly more accurate than the optimised one when computing IIReneLimiter. The optimised implementation is always more accurate than the current one when computing SubFrameTotalEne.
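A minimal sketch of the mechanism (illustrative only, not the actual IIReneLimiter/SubFrameTotalEne code; the 1e-4 input value is invented): squaring a tiny Q31 value and forcing the result back onto a fixed binary point leaves only a handful of significant bits, whereas the normalised mantissa/exponent path keeps roughly 31 of them.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double ref = 1.0e-4 * 1.0e-4;                   /* exact x*x            */
    int32_t x_q31  = (int32_t)(1.0e-4 * 2147483648.0);    /* tiny input in Q31    */
    int64_t sq_q62 = (int64_t)x_q31 * x_q31;              /* exact 64-bit product */

    /* Fixed binary point: force the result back to Q31 -> only a few
     * significant bits survive for a tiny input.                        */
    int32_t sq_q31 = (int32_t)(sq_q62 >> 31);
    double fixed_q = (double)sq_q31 / 2147483648.0;

    /* Normalised mantissa/exponent: shift out the leading zeros, keep a
     * 31-bit mantissa plus an exponent -> ~31 significant bits survive. */
    int lead = 0;
    int64_t m = sq_q62;
    while (m < (1LL << 61)) { m <<= 1; lead++; }
    int32_t mant  = (int32_t)(m >> 31);
    double norm_q = (double)mant / pow(2.0, 31 + lead);

    printf("fixed-Q    rel. error: %.2e\n", fabs(fixed_q - ref) / ref);
    printf("normalised rel. error: %.2e\n", fabs(norm_q  - ref) / ref);
    return 0;
}
```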

There is a way to make the optimised implementation more accurate for small values, but it would increase computation somewhat (hard to quantify by how much) and memory usage: implement a small int96_t/int128_t library. With that, ivas_dirac_dec_binaural_formulate_input_covariance_matrices_fx would be as accurate as the double-precision floating-point implementation.
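A hypothetical sketch of what such a helper could look like (names and layout invented here, not taken from the IVAS code base): a 128-bit signed accumulator built from two 64-bit words, which can keep summing Q62 products exactly even after the total has outgrown a plain int64_t.

```c
/* Invented wide-accumulator sketch, not IVAS code. */
#include <stdint.h>
#include <stdio.h>

typedef struct { int64_t hi; uint64_t lo; } acc128_t;   /* hi:lo = 128-bit two's complement */

static void acc128_add(acc128_t *a, int64_t v)
{
    uint64_t lo = a->lo + (uint64_t)v;            /* low word, wraps mod 2^64   */
    int64_t carry = (lo < a->lo) ? 1 : 0;         /* carry out of the low add   */
    a->hi += (v < 0 ? -1 : 0) + carry;            /* sign-extend v, add carry   */
    a->lo  = lo;
}

int main(void)
{
    acc128_t acc = { 0, 0 };
    /* Sum many large Q62-style products; the running total would overflow
     * a plain int64_t long before the loop finishes. */
    for (int i = 0; i < 1000; i++) {
        int64_t prod = (int64_t)2000000000 * 2000000000;   /* about 2^61.8 */
        acc128_add(&acc, prod);
    }
    printf("accumulator = 0x%016llx%016llx\n",
           (unsigned long long)acc.hi, (unsigned long long)acc.lo);
    return 0;
}
```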

Another solution would be to keep the optimised implementation as it is and make sure the inputs to the function use the full range of values (i.e. use a better Q format). If the inputs do not occupy the full range of the fixed-point format, there are unused or "wasted" leading bits, which reduce the effective precision of the representation and lead to poor results (as is happening in this case).
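For illustration (a simplified sketch with an invented input value; the BASOP library already provides norm_l() for this kind of head-room count): the number of unused leading bits of an input directly tells how many significant bits a fixed binary-point pipeline is throwing away, and rescaling the input to a better Q format recovers them.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified head-room count: redundant leading bits of a 32-bit value. */
static int headroom(int32_t x)
{
    uint32_t mag = (x < 0) ? (uint32_t)(-(int64_t)x) : (uint32_t)x;
    int n = 0;
    if (mag == 0) return 31;
    while (mag < 0x40000000u) { mag <<= 1; n++; }   /* until bit 30 is set */
    return n;
}

int main(void)
{
    int32_t tiny = 214748;                    /* roughly 1e-4 in Q31 (invented) */
    int hr = headroom(tiny);
    printf("headroom = %d bits -> only %d significant bits left\n", hr, 31 - hr);
    /* Left-shifting by the headroom moves the value from Q31 to Q(31+hr),
     * so the same 32-bit word now carries full precision. */
    printf("after rescaling to Q%d: %d (headroom = %d)\n",
           31 + hr, tiny << hr, headroom(tiny << hr));
    return 0;
}
```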
