[Complexity] Optimize v_add_inc_fx()
Basic info
This is a sub-task of issue #1009 (closed).
Bug description
The function v_add_inc_fx() is essentially only used for interleaved input (x_inc == x2_inc == 2, y_inc == 1, &x1[1] == &x2[0]). In this case the array indices do not need to be computed with BASOP operators; they are simple arithmetic that requires no instrumentation. It is proposed to treat this as a special case inside the function in order to save complexity.
void v_add_inc_fx(
    const Word32 x1[],   /* i  : Input vector 1                                   Qx*/
    const Word16 x_inc,  /* i  : Increment for input vector 1                     Q0*/
    const Word32 x2[],   /* i  : Input vector 2                                   Qx*/
    const Word16 x2_inc, /* i  : Increment for input vector 2                     Q0*/
    Word32 y[],          /* o  : Output vector that contains vector 1 + vector 2  Qx*/
    const Word16 y_inc,  /* i  : increment for vector y[]                         Q0*/
    const Word16 N       /* i  : Vector length                                    Q0*/
)
{
#ifndef PATCH
    Word16 i;
    Word16 ix1 = 0;
    Word16 ix2 = 0;
    Word16 iy = 0;
#else
    Word16 i, ix1, ix2, iy;

    /* The use of this function is currently always for the interleaved input format, */
    /* that means, the following conditions are always true and thus obsolete.        */
    test();
    test();
    test();
    test();
    IF( ( sub( x_inc, 2 ) == 0 ) && ( sub( x2_inc, 2 ) == 0 ) && ( sub( y_inc, 1 ) == 0 ) && ( &x1[1] == &x2[0] ) )
    {
        /* Interleaved input case, linear output */
        FOR( i = 0; i < N; i++ )
        {
            y[i] = L_add( x1[2 * i + 0], x1[2 * i + 1] ); /*Qx*/
            move32();
        }
        return;
    }

    ix1 = 0;
    ix2 = 0;
    iy = 0;
#endif
    move16();
    move16();
    move16();

    FOR( i = 0; i < N; i++ )
    {
        y[iy] = L_add( x1[ix1], x2[ix2] ); /*Qx*/
        move32();
        ix1 = add( ix1, x_inc );  /*Q0*/
        ix2 = add( ix2, x2_inc ); /*Q0*/
        iy = add( iy, y_inc );    /*Q0*/
    }

    return;
}
Saves 3 cycles per FOR iteration in the regular case.
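For illustration only, here is a minimal sketch in plain C (no BASOP headers or instrumentation) that compares the general strided loop with the interleaved special case. The names sat_add32(), add_strided() and add_interleaved() are hypothetical and not part of the codebase. sat_add32() is a stand-in that assumes L_add() behaves as a saturating 32-bit addition. The returned counter only tallies the three per-iteration index updates that the patch avoids; it is not a measurement of actual WMOPS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for L_add(): 32-bit addition with saturation (assumed). */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* General path: three running indices, each updated once per iteration.
 * Returns the number of index updates performed (3 per iteration). */
static long add_strided(const int32_t *x1, int x_inc,
                        const int32_t *x2, int x2_inc,
                        int32_t *y, int y_inc, int n)
{
    long index_updates = 0;
    int ix1 = 0, ix2 = 0, iy = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        y[iy] = sat_add32(x1[ix1], x2[ix2]);
        ix1 += x_inc;
        ix2 += x2_inc;
        iy += y_inc;
        index_updates += 3;
    }
    return index_updates;
}

/* Special case: x2 == x1 + 1, both input strides 2, output stride 1.
 * The indices follow directly from the loop counter. */
static void add_interleaved(const int32_t *x, int32_t *y, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        y[i] = sat_add32(x[2 * i], x[2 * i + 1]);
    }
}
```

Calling add_strided( x, 2, x + 1, 2, ya, 1, n ) and add_interleaved( x, yb, n ) on the same interleaved buffer x should fill ya and yb identically, while the strided variant performs 3 * n index updates that the special case does not need.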
Ways to reproduce
(Clear steps or refer to a failing automated test, e.g. with a pipeline link)
Edited by multrus