Chapter 6. Instruction Timing
Execution Unit Timings
1. In clock cycle 1, instructions 0 and 1 execute and complete. Instructions 2 and 3
enter the dispatch entries in the IQ. Instruction 4 (a second bc instruction) and 5 are
fetched. The second bc instruction is predicted as taken. It can be folded, but it
cannot be resolved until instruction 3 writes back.
2. In clock cycle 2, instruction 4 has been folded and instruction 5 has been flushed
from the IQ. The two target instructions, T0 and T1, are both in the BTIC, so they
are fetched in this cycle. Note that even though the first bc instruction may not have
resolved by this point (we can assume it has), the MPC7400 allows fetching from a
second predicted branch stream. However, these instructions could not be
dispatched until the previous branch has resolved.
3. In clock cycle 3, target instructions T2–T5 are fetched as T0 and T1 are dispatched.
4. In clock cycle 4, instruction 3, on which the second branch instruction depended,
writes back and the branch prediction is proven incorrect. Even though T0 is in CQ1,
from which it could be written back, it is not written back because the branch
prediction was incorrect. All target instructions are flushed from their positions in
the pipeline at the end of this clock cycle, as are any results in the rename registers.
After one clock cycle required to refetch the original instruction stream, instruction 5, the
same instruction that was fetched in clock cycle 1, is brought back into the IQ from the
instruction cache, along with three others (not all of which are shown).
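The flush behavior in the sequence above can be sketched in a few lines of Python.
This is an illustrative model only, not a description of the MPC7400 hardware: the
function name resolve_branch and the dictionary layout are assumptions made for this
example. It captures one rule from the text: instructions fetched past an unresolved
predicted branch are speculative, and all of them are discarded if the prediction
proves incorrect.

```python
# Hypothetical sketch: what happens to in-flight instructions when a
# predicted branch resolves. Names and data layout are illustrative only.

def resolve_branch(pipeline, prediction_correct):
    """Return (kept, flushed) lists of in-flight instructions.

    pipeline: list of dicts with keys 'name' and 'speculative', where
    'speculative' marks instructions fetched past the unresolved branch.
    """
    if prediction_correct:
        # Speculative instructions become ordinary instructions.
        kept = [dict(instr, speculative=False) for instr in pipeline]
        return kept, []
    # Misprediction: every speculative instruction is flushed.
    kept = [instr for instr in pipeline if not instr["speculative"]]
    flushed = [instr for instr in pipeline if instr["speculative"]]
    return kept, flushed


# State at the end of clock cycle 4 in the example: instruction 3 is older
# than the branch, and T0 through T5 come from the predicted target stream.
in_flight = [{"name": "3", "speculative": False}] + [
    {"name": f"T{n}", "speculative": True} for n in range(6)
]
kept, flushed = resolve_branch(in_flight, prediction_correct=False)
print([i["name"] for i in kept])     # ['3']
print([i["name"] for i in flushed])  # ['T0', 'T1', 'T2', 'T3', 'T4', 'T5']
```

The sketch does not model rename registers or the one-cycle refetch delay; it only
shows which instructions survive branch resolution.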
6.4.2 Integer Unit Execution Timing
The MPC7400 has two integer units. IU1 can execute all integer instructions; IU2 can
execute all integer instructions except multiply and divide instructions. As shown in
Figure 6-2, each integer unit has one execute pipeline stage; thus, when a multicycle
integer instruction is being executed, no other integer instruction can begin to
execute in that unit. Most integer instructions have an execution latency of one clock
cycle. Table 6-6 lists integer instruction latencies.
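The two rules above (which unit may execute an instruction, and how a single execute
stage serializes multicycle instructions) can be sketched as follows. This is a
simplified illustration, not a cycle-accurate model: the names eligible_units and
start_cycles, the instruction classes "mul" and "div", and the latencies in the usage
lines are assumptions made for this example. Actual latencies are in Table 6-6.

```python
# Hypothetical sketch of the integer-unit rules. All names and example
# latencies are illustrative; see Table 6-6 for real values.

MUL_DIV = {"mul", "div"}  # assumed instruction classes, not real mnemonics


def eligible_units(kind):
    """Return the integer units that may execute an instruction class."""
    # IU1 executes all integer instructions; IU2 excludes multiply/divide.
    return ["IU1"] if kind in MUL_DIV else ["IU1", "IU2"]


def start_cycles(latencies):
    """Cycle in which each instruction begins executing on ONE unit.

    The unit has a single execute stage, so an instruction cannot begin
    until the previous instruction has used all of its execute cycles.
    """
    starts = []
    cycle = 1
    for latency in latencies:
        starts.append(cycle)
        cycle += latency  # the unit stays busy for the full latency
    return starts


print(eligible_units("mul"))    # ['IU1']
print(eligible_units("add"))    # ['IU1', 'IU2']
print(start_cycles([1, 4, 1]))  # [1, 2, 6]
```

In the last line, an assumed four-cycle instruction starting in cycle 2 keeps the unit
busy through cycle 5, so the following single-cycle instruction starts in cycle 6. The
sketch ignores data dependencies and dispatch limits.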
6.4.3 Floating-Point Unit Execution Timing
The floating-point unit on the MPC7400 executes all floating-point instructions. Execution
of most floating-point instructions is pipelined within the FPU, allowing up to three
instructions to be executing in the FPU concurrently. Although most floating-point
instructions execute with three-cycle latency and one-cycle throughput, three instructions
(fdivs, fdiv, and fres) execute with latencies of 17 to 31 cycles. The fdivs, fdiv, fres,
mcrfs, mtfsb0, mtfsb1, mtfsfi, mffs, and mtfsf instructions block the floating-point unit
pipeline until they complete execution, and thereby inhibit the dispatch of additional
floating-point instructions. See Table 6-7 for floating-point instruction execution timing.
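The difference between pipelined and blocking floating-point instructions can be
sketched with a small scheduling function. This is an illustrative model under stated
assumptions, not the actual FPU behavior: the function name fpu_schedule is
hypothetical, the latency of 31 cycles used for fdiv is simply the upper bound quoted
above (Table 6-7 gives the per-instruction values), and the model ignores data
dependencies and whether the pipeline must drain before a blocking instruction starts.

```python
# Hypothetical sketch: start and finish cycles for a sequence of
# floating-point instructions issued back to back.

def fpu_schedule(instrs):
    """Return a list of (name, start_cycle, finish_cycle) tuples.

    instrs: list of (name, latency, blocks) tuples, where 'blocks' is True
    for instructions that hold the FPU pipeline until they complete.
    """
    schedule = []
    next_start = 1
    for name, latency, blocks in instrs:
        start = next_start
        finish = start + latency - 1
        schedule.append((name, start, finish))
        if blocks:
            # A blocking instruction keeps later ones out until it finishes.
            next_start = finish + 1
        else:
            # One-cycle throughput: the next instruction enters a cycle later.
            next_start = start + 1
    return schedule


# Three pipelined instructions overlap; all three are in flight in cycle 3.
print(fpu_schedule([("fadd", 3, False)] * 3))
# [('fadd', 1, 3), ('fadd', 2, 4), ('fadd', 3, 5)]

# An assumed 31-cycle fdiv delays the following fadd until cycle 32.
print(fpu_schedule([("fdiv", 31, True), ("fadd", 3, False)]))
# [('fdiv', 1, 31), ('fadd', 32, 34)]
```

The first call shows the three-cycle latency with one-cycle throughput; the second
shows how a blocking instruction inhibits dispatch of the instruction behind it.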