
Chapter 6. Instruction Pipeline and Timing
Operand Execution Pipeline (OEP)
NOTE:
Ry indicates a Dy or Ay source register. Rx indicates a Dx or
Ax destination register available either to the
OagComputeEngine on subsequent instructions as a base
register with no stall or as an index register with a scale factor
of 1 and no stall. If the destination register is then used as an
index with a different scale factor, the 3-cycle stall described
above occurs.
V4 CPU measurements indicate a degradation of 0.12 cycles per instruction associated
with change/use stalls across the embedded benchmark suite. Approximately 6% of the
dynamic instruction stream encounters this type of stall, and the average stall is about
2 cycles per register-busy event.
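The three figures quoted above are roughly consistent with one another: the per-instruction penalty is approximately the fraction of instructions that stall multiplied by the average stall length. A minimal sketch of that arithmetic in C follows. The function name stall_cpi_penalty is hypothetical, and the inputs are the rounded values from the text, not additional measurements.

```c
#include <assert.h>

/* Hypothetical helper (not from the manual): estimates the
 * cycles-per-instruction penalty contributed by one type of stall.
 *
 * stall_fraction    fraction of dynamic instructions that stall (0.0 to 1.0)
 * avg_stall_cycles  average length of one stall, in cycles
 */
static double stall_cpi_penalty(double stall_fraction, double avg_stall_cycles)
{
    assert(stall_fraction >= 0.0 && stall_fraction <= 1.0);
    assert(avg_stall_cycles >= 0.0);

    /* Expected stall cycles added to each instruction on average. */
    return stall_fraction * avg_stall_cycles;
}
```

Calling stall_cpi_penalty(0.06, 2.0) with the rounded values from the text gives approximately 0.12 cycles per instruction, matching the quoted degradation factor.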
6.3.4 EMAC-Specific OEP Sequence Stalls
The ColdFire family supports two multiply-accumulate implementations that provide
different levels of performance and capability for differing silicon costs. The EMAC
features a four-stage execution pipeline, optimized for 32-bit operands with a
fully-pipelined 32 x 32 multiply array and four 48-bit accumulators. A MAC or EMAC can
be attached to any version of the ColdFire core, as determined by application requirements.
The EMAC execution pipeline overlaps the EX stage of the OEP; that is, the first stage of
the EMAC pipeline is the last stage of the basic OEP. EMAC units are designed for
sustained, fully-pipelined operation on accumulator load, copy, and multiply-accumulate
instructions. However, instructions that store contents of the multiply-accumulate
programming model can generate OEP stalls that expose the EMAC execution pipeline
depth, as in the following sequence:
mac.w Ry,Rx,Acc0
mov.l Acc0,Rz
The mov.l instruction that stores the accumulator to an integer register (Rz) stalls until the
program-visible copy of the accumulator is available. Figure 6-4 shows EMAC timing.
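One way to picture this stall is a simple occupancy model. The sketch below assumes, for illustration only, that the accumulator becomes program-visible once the multiply-accumulate has passed through every EMAC stage, that the first EMAC stage coincides with the OEP EX stage, and that each independent instruction placed between the mac and the store hides one cycle. The function name, its parameters, and the cycle counts it produces are hypothetical; Figure 6-4 remains the authoritative timing.

```c
#include <assert.h>

/* Hypothetical model, not taken from the manual's timing figure:
 * estimates how many cycles an accumulator store stalls when it
 * follows a multiply-accumulate instruction.
 *
 * emac_stages          depth of the EMAC execution pipeline (4 per the text)
 * independent_between  number of unrelated instructions scheduled
 *                      between the mac and the store
 */
static int emac_store_stall_cycles(int emac_stages, int independent_between)
{
    assert(emac_stages >= 1);
    assert(independent_between >= 0);

    /* The first EMAC stage overlaps the OEP EX stage, so only the
     * remaining stages are exposed to a back-to-back store. */
    int exposed = emac_stages - 1;

    /* Assumption: each intervening instruction hides one exposed cycle. */
    int stall = exposed - independent_between;
    return stall > 0 ? stall : 0;
}
```

Under these assumptions, the estimate is largest for a back-to-back mac.w and mov.l pair and falls toward zero as independent instructions are scheduled between the two.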
Freescale Semiconductor, Inc.