[Complexity] Optimize v_add_inc_fx()
# Basic info

This is a sub-task of issue #1009.

# Bug description

The function `v_add_inc_fx()` is essentially only used for interleaved input (`x_inc == 2`, `x2_inc == 2`, `y_inc == 1`, `&x1[1] == &x2[0]`). In this case, the pointer/array addresses do not need to be computed with BASOP operators; they are simple arithmetic that needs no instrumentation. It is proposed to treat this as a special case in the function in order to save complexity.

```
void v_add_inc_fx(
    const Word32 x1[],   /* i  : Input vector 1                                   Qx*/
    const Word16 x_inc,  /* i  : Increment for input vector 1                     Q0*/
    const Word32 x2[],   /* i  : Input vector 2                                   Qx*/
    const Word16 x2_inc, /* i  : Increment for input vector 2                     Q0*/
    Word32 y[],          /* o  : Output vector that contains vector 1 + vector 2  Qx*/
    const Word16 y_inc,  /* i  : increment for vector y[]                         Q0*/
    const Word16 N       /* i  : Vector length                                    Q0*/
)
{
#ifndef PATCH
    Word16 i;
    Word16 ix1 = 0;
    Word16 ix2 = 0;
    Word16 iy = 0;
#else
    Word16 i, ix1, ix2, iy;

    /* The use of this function is currently always for the interleaved input format, */
    /* that means, the following conditions are always true and thus obsolete.        */
    test();
    test();
    test();
    test();
    IF( ( sub( x_inc, 2 ) == 0 ) && ( sub( x2_inc, 2 ) == 0 ) && ( sub( y_inc, 1 ) == 0 ) && ( &x1[1] == &x2[0] ) )
    {
        /* Interleaved input case, linear output */
        FOR( i = 0; i < N; i++ )
        {
            y[i] = L_add( x1[2 * i + 0], x1[2 * i + 1] ); /*Qx*/
            move32();
        }
        return;
    }

    ix1 = 0;
    ix2 = 0;
    iy = 0;
#endif
    move16();
    move16();
    move16();
    FOR( i = 0; i < N; i++ )
    {
        y[iy] = L_add( x1[ix1], x2[ix2] ); /*Qx*/
        move32();

        ix1 = add( ix1, x_inc );  /*Q0*/
        ix2 = add( ix2, x2_inc ); /*Q0*/
        iy = add( iy, y_inc );    /*Q0*/
    }

    return;
}
```

This saves 3 cycles per `FOR` iteration in the regular (interleaved) case, since the three `add()` index updates are skipped.

# Ways to reproduce

(Clear steps or refer to a failing automated test, e.g. with a pipeline link)
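One possible way to check that the proposed interleaved fast path is bit-exact with the generic strided loop is a small standalone harness. The sketch below is an assumption-laden illustration, not the codec's test setup: `L_add_ref()` is a simplified stand-in for the BASOP `L_add()` (saturating, but without overflow flag or complexity instrumentation), and the names `v_add_inc_generic()`, `v_add_interleaved()` and `check_interleaved_equivalence()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Assumption: simplified stand-in for BASOP L_add(), i.e. a 32-bit
   saturating addition with no overflow flag and no instrumentation. */
Word32 L_add_ref( Word32 a, Word32 b )
{
    int64_t s = (int64_t) a + (int64_t) b;
    if ( s > INT32_MAX )
    {
        return INT32_MAX;
    }
    if ( s < INT32_MIN )
    {
        return INT32_MIN;
    }
    return (Word32) s;
}

/* Generic strided loop, mirroring the unpatched function body. */
void v_add_inc_generic( const Word32 x1[], Word16 x_inc, const Word32 x2[], Word16 x2_inc,
                        Word32 y[], Word16 y_inc, Word16 N )
{
    Word16 i, ix1 = 0, ix2 = 0, iy = 0;
    for ( i = 0; i < N; i++ )
    {
        y[iy] = L_add_ref( x1[ix1], x2[ix2] );
        ix1 = (Word16) ( ix1 + x_inc );
        ix2 = (Word16) ( ix2 + x2_inc );
        iy = (Word16) ( iy + y_inc );
    }
}

/* Interleaved fast path: plain index arithmetic, linear output. */
void v_add_interleaved( const Word32 x[], Word32 y[], Word16 N )
{
    Word16 i;
    for ( i = 0; i < N; i++ )
    {
        y[i] = L_add_ref( x[2 * i + 0], x[2 * i + 1] );
    }
}

/* Returns 1 if both variants produce identical output on a test vector
   that also exercises positive and negative saturation, 0 otherwise. */
int check_interleaved_equivalence( void )
{
    enum { N = 8 };
    Word32 x[2 * N], y_ref[N], y_fast[N];
    Word16 i;

    for ( i = 0; i < N; i++ )
    {
        x[2 * i + 0] = (Word32) ( i * 100000 - 300000 );
        x[2 * i + 1] = (Word32) ( i * 7 + 1 );
    }
    x[0] = INT32_MAX; /* pair 0 saturates upwards   */
    x[1] = 1;
    x[2] = INT32_MIN; /* pair 1 saturates downwards */
    x[3] = -1;

    /* Interleaved call pattern: x_inc == x2_inc == 2, y_inc == 1, x2 == &x1[1] */
    v_add_inc_generic( x, 2, &x[1], 2, y_ref, 1, N );
    v_add_interleaved( x, y_fast, N );

    for ( i = 0; i < N; i++ )
    {
        if ( y_ref[i] != y_fast[i] )
        {
            return 0;
        }
    }
    return 1;
}
```

Calling `check_interleaved_equivalence()` from a test `main()` and asserting that it returns 1 would confirm equivalence for this vector only; it says nothing about the cycle count, which still has to be measured with the instrumented BASOP build.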