
Philips Semiconductors
Custom Operations for Multimedia
File: cstm.fm5, modified 7/26/99
PRELIMINARY INFORMATION
4-7
ations, and the quadavg and dspuquadaddui operations
have replaced the bulk of the loop code. Finally,
Figure 4-8 shows a more compact expression of the loop
code, eliminating the temporary variable (Note that
TM100 C compiler does the optimization by itself).
assumes that the character arrays are 32-bit word
aligned and padded if necessary to fill an integral number
of 32-bit words.
The original code required three additions, one shift, two
tests, three loads, and one store per pixel. The new code
using custom operations requires only two custom oper-
ations, three loads, and one store for
four pixels, which is
more than a factor of six improvement. The actual perfor-
mance improvement can be even greater depending on
how well the compiler is able to deal with the branches in
the original version of the code, which depends in part on
the surrounding code. Reducing the number of branches
void reconstruct (unsigned char *back,
unsigned char *forward,
char *idct,
unsigned char *destination)
{
int i, temp0, temp1, temp2, temp3;
for (i = 0; i < 64; i += 4)
{
temp0 = ((back[i+0] + forward[i+0] + 1) >> 1);
temp1 = ((back[i+1] + forward[i+1] + 1) >> 1);
temp2 = ((back[i+2] + forward[i+2] + 1) >> 1);
temp3 = ((back[i+3] + forward[i+3] + 1) >> 1);
temp0 += idct[i+0];
if (temp0 > 255) temp0 = 255;
else if (temp0 < 0) temp0 = 0;
temp1 += idct[i+1];
if (temp1 > 255) temp1 = 255;
else if (temp1 < 0) temp1 = 0;
temp2 += idct[i+2];
if (temp2 > 255) temp2 = 255;
else if (temp2 < 0) temp2 = 0;
temp3 += idct[i+3];
if (temp3 > 255) temp3 = 255;
else if (temp3 < 0) temp3 = 0;
destination[i+0] = temp0;
destination[i+1] = temp1;
destination[i+2] = temp2;
destination[i+3] = temp3;
}
void reconstruct (unsigned char *back,
unsigned char *forward,
char *idct,
unsigned char *destination)
{
int i, temp;
int *i_back
= (int *) back;
int *i_forward = (int *) forward;
int *i_idct
= (int *) idct;
int *i_dest
= (int *) destination;
for (i = 0; i < 16; i += 1)
{
temp = QUADAVG(i_back[i], i_forward[i]);
temp = DSPUQUADADDUI(temp, i_idct[i]);
i_dest[i] = temp;
}
Figure 4-7. Using the custom operation dspquadaddui to speed up the loop of Figure 4-6.