Simplification of arithmetic codec subfunctions

The arithmetic codec combines some subroutines with comparison operators in a rather inefficient way.

Example:

IF( GT_32( L_multi31x16_X2( range_h, range_l, p[8] ), cum ) ) 
{ 
  p = p + 8; 
}

First, the Word32 range is split into the Word16 values range_h and range_l at a cost of 4 operations. This split is not needed at all.
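A plain-C model of the split, to show that it is lossless. The helper names split_range and join_range are hypothetical, and the shift/mask form of the split (range_h = range >> 15, range_l = range & 0x7FFF) is an assumption about how the existing code does it. range is assumed to lie in [0, 2^30) so that range_h fits into a Word16.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Hypothetical plain-C model of the split: range = range_h * 2^15 + range_l.
   Assumes 0 <= range < 2^30 so that range_h fits into a signed Word16. */
static void split_range(Word32 range, Word16 *range_h, Word16 *range_l)
{
    assert(range >= 0 && range < ((Word32)1 << 30));
    *range_h = (Word16)(range >> 15);      /* upper part, at most 15 bits */
    *range_l = (Word16)(range & 0x7FFF);   /* lower 15 bits */
}

/* Recombining the halves returns the original value: the split carries
   no information that range itself does not already hold. */
static Word32 join_range(Word16 range_h, Word16 range_l)
{
    return ((Word32)range_h << 15) + range_l;
}
```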

The function L_multi31x16_X2 then costs 3 operations, where a single 32x16 multiplication would cost only 1.
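A sketch of where the 3 operations come from. It assumes L_multi31x16_X2 rebuilds the product from the split halves as L_mult0, L_shl by 15 and L_mac0; this is an assumption, modeled here in plain C with 64-bit intermediates and no saturation. Under that assumption it returns the same value as a single 32x16 integer multiplication of the unsplit range. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Assumed behaviour of L_multi31x16_X2, three steps on the split operand. */
static Word32 multi31x16_split(Word16 xh, Word16 xl, Word16 y)
{
    int64_t z = (int64_t)xh * y;   /* step 1: like L_mult0(xh, y) */
    z *= 32768;                    /* step 2: like L_shl(z, 15) */
    z += (int64_t)xl * y;          /* step 3: like L_mac0(z, xl, y) */
    assert(z >= INT32_MIN && z <= INT32_MAX);  /* model assumes no saturation */
    return (Word32)z;
}

/* The same product from the unsplit Word32 operand in a single step. */
static Word32 mult32x16(Word32 x, Word16 y)
{
    int64_t prod = (int64_t)x * y;
    assert(prod >= INT32_MIN && prod <= INT32_MAX);  /* caller keeps it in range */
    return (Word32)prod;
}
```

For example, range = 40000 splits into range_h = 1 and range_l = 7232, and both functions give 40000000 for y = 1000.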

The comparison of the product with the Word32 value cum could be merged into an MSU (multiply-subtract) operation at no additional cost.
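A sketch of the merged MSU under stated assumptions: msu_32_16_model is a hypothetical stand-in for the proposed L_msui_32_16, taken to return cum - range * y as a plain integer product (no fractional left shift). Since cum and the product are both non-negative Word32 values, their difference always fits into a Word32, so the sign test on the result is equivalent to the original GT_32 comparison.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Hypothetical model of the merged operation: cum - range * y.
   Assumes cum >= 0 and 0 <= range * y <= INT32_MAX, so the difference
   always fits into a Word32 and no saturation is needed. */
static Word32 msu_32_16_model(Word32 cum, Word32 range, Word16 y)
{
    int64_t prod = (int64_t)range * y;
    assert(cum >= 0 && prod >= 0 && prod <= INT32_MAX);
    return (Word32)((int64_t)cum - prod);
}

/* Original decision, GT_32(product, cum): is the product greater than cum? */
static int greater_than_cum(Word32 cum, Word32 range, Word16 y)
{
    return (int64_t)range * y > cum;
}
```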

The costly IF() clause could then be replaced by a plain lower-case "if".

The new solution (naming still t.b.d.) could then look like this:

if (L_msui_32_16(cum, range, p[8]) < 0) 
{ 
  p = p + 8;
  move16();
}
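For context, a hypothetical usage sketch: a symbol search with steps 8, 4, 2 and 1 over a descending 17-entry cumulative frequency table, using the same plain-C stand-in for L_msui_32_16 (repeated so the block compiles on its own). The table layout, the step sequence beyond p[8] and the name find_symbol are assumptions for illustration, not the actual codec code.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Hypothetical stand-in for the proposed L_msui_32_16: cum - range * y. */
static Word32 msu_32_16_model(Word32 cum, Word32 range, Word16 y)
{
    int64_t prod = (int64_t)range * y;
    assert(cum >= 0 && prod >= 0 && prod <= INT32_MAX);
    return (Word32)((int64_t)cum - prod);
}

/* Hypothetical symbol search over a descending table with cum_freq[16] == 0.
   Steps 8, 4, 2, 1 find the largest index s in [0, 15] with
   range * cum_freq[s] > cum; only indices 1..15 are ever read. */
static int find_symbol(const Word16 *cum_freq, Word32 cum, Word32 range)
{
    const Word16 *p = cum_freq;
    int step;
    for (step = 8; step > 0; step >>= 1) {
        if (msu_32_16_model(cum, range, p[step]) < 0) {  /* product > cum */
            p = p + step;
        }
    }
    return (int)(p - cum_freq);
}
```

With cum_freq = {16384, 15000, 14000, ..., 1000, 0} (steps of 1000 after the first entry) and range = 4, cum = 31999 gives symbol 8, since 4 * 8000 > 31999 >= 4 * 7000.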
Edited by Arthur Tritthart