
Reed Solomon Encoder/Decoder on the StarCore SC140/SC1400 Cores, With Extended Examples, Rev. 1
16
Freescale Semiconductor
Implementation on the SC140 Core
terms in the same cycle and performing the MAC operations. If the number of terms in the input vector does not
divide by four, the input vector is
a priori
zero-appended to bring the number of elements to an integer multiple of
four.
In the multi-sample method, all result vector terms are calculated simultaneously, step by step, in each iteration.
This is performed by loading the matrix elements column-wise instead of row-wise. Zero-appending is done if
needed. The inner loop of code
Example 1
is implemented by the split-summation method in the following steps:
1.
Load four matrix and four input vector terms.
2.
From the four pairs of exponents, get the four exponents of the products. Those exponents are the off-
sets from the base address of the exponential-to-binary conversion table.
3.
Retrieve the four table entries at those four offsets and XOR them with the accumulator.
The theoretical minimal number of cycles needed to realize these steps is as follows:
1.
Loading the matrix and vector elements requires two MOVE.2L instructions.
2.
The table offsets are addresses of four bytes. After their calculation, they must be transferred into four
AGU registers. This requires four MOVE.L instructions.
3.
The four table entries are bytes that are separately accumulated using four MOVEU.B (rx) instruc-
tions.
Thus, the total cycle count for the inner loop is five cycles per polynomial term, for four field points. In other
words, under full parallelization,
one MAC operation
consumes 1.25 cycles. The theoretical minimal cycle count
C
min
is given by:
where
denotes
rounding up to the nearest integer. If the input polynomial is represented in binary form, the
theoretical minimal cycle count becomes:
In addition to this basic cycle count, a small overhead is required in the actual implementation.
4.5 Cycle Count of the Reed-Solomon Routines
The theory enables estimation of the cycle count required to execute the C code for each routine. Following are
summary estimates for each routine. The C code for the decoder routines is presented in
Appendix A
on page 20:
Encoding routine
. A series of two concatenated polynomial evaluations, with
M
equal to 2
T
and
D
equal to 255 or 2
T–
1, respectively.
Therefore, the encoder routine requires at least 680
Τ
+ 544 cycles,
not including overhead.
Syndromes calculation
. A simple polynomial evaluation, with
M
equal to
2T
and
D
equal to 255.
Therefore, the lower bound on the cycle count is 640
T
+ 512 cycles, not including overhead.
Berlekamp-Massey Algorithm
. This algorithm is highly serial and parallelism cannot be applied. The
error-location polynomial
Λ
(x)
is calculated iteratively and bytewise. There are 2
T
iterations during
which mostly MAC operations are performed. The exact number of MACs is data-dependent, but it
can be approximated by
T +
n
errors
(2/
T).
In this implementation, optimization was performed on the
compiled code.
Since it is not possible to perform these MAC operations with inputs
a priori
in
exponential form, an order of 10 cycles is needed for each MAC instruction.
C
min
5
M
D
1
+
4
(
------------------
)
≈
C
min
5
M
D
1
+
4
------------------
)
2
D
(
1
+
)
+
≈