MPC7400 RISC Microprocessor User's Manual
Instruction Timing Overview
The instruction pipeline stages are described as follows:
• Instruction fetch: Includes the clock cycles necessary to request instructions from
the memory system and the time the memory system takes to respond to the request.
Instruction fetch timing depends on many variables, such as whether the instruction
is in the branch target instruction cache (BTIC), the on-chip instruction cache, or the L2
cache. More variables come into play when instructions must be fetched from system
memory, including the processor-to-bus clock ratio, the amount of bus traffic, and
whether any cache coherency operations are required.
Because there are so many variables, the instruction timing examples below assume
optimal performance unless otherwise specified; they show only the portion of the
fetch stage in which the instruction is already in the instruction queue. The fetch
stage ends when the instruction is dispatched.
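The following sketch restates the fetch-timing factors above as a small latency model. All cycle counts in `FetchCosts` are hypothetical placeholders, because this section gives no numbers. The names `FetchCosts` and `fetch_latency` are also illustrative. The point of the model is structural: a hit in the BTIC, instruction cache, or L2 cache costs a fixed amount. A system-memory fetch also depends on the processor-to-bus clock ratio, bus traffic, and coherency activity.

```python
from dataclasses import dataclass


@dataclass
class FetchCosts:
    # Placeholder latencies in processor clocks. These values are hypothetical;
    # the manual does not specify them in this section.
    btic: int = 0         # hit in the branch target instruction cache (BTIC)
    icache: int = 1       # hit in the on-chip instruction cache
    l2: int = 6           # hit in the L2 cache
    bus_cycles: int = 10  # bus clocks for a system-memory access


def fetch_latency(level: str, costs: FetchCosts, bus_ratio: float = 1.0,
                  traffic_stall: int = 0, coherency_stall: int = 0) -> float:
    """Illustrative fetch latency, in processor clocks, for an instruction
    found at `level` ("btic", "icache", "l2", or "memory")."""
    cached = {"btic": costs.btic, "icache": costs.icache, "l2": costs.l2}
    if level in cached:
        return cached[level]
    if level == "memory":
        # Only a system-memory fetch picks up the extra factors: bus clocks
        # scaled by the processor-to-bus clock ratio, plus any stall cycles
        # caused by bus traffic and cache coherency operations.
        return costs.bus_cycles * bus_ratio + traffic_stall + coherency_stall
    raise ValueError(f"unknown fetch level: {level!r}")
```

Under these placeholder costs, `fetch_latency("memory", FetchCosts(), bus_ratio=4.0)` evaluates to 40.0, while an instruction-cache hit costs 1. The ratio is a float because processor-to-bus ratios need not be whole numbers.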
• Decode/dispatch: Consists of the time it takes to fully decode the instruction
and dispatch it from the instruction queue (IQ) to the appropriate execution unit.
Instruction dispatch requires the following:
  - Instructions can be dispatched only from the two lowest instruction queue
    entries, IQ0 and IQ1.
  - A maximum of two instructions can be dispatched per clock cycle (although an
    additional branch instruction can be handled by the BPU).
  - Only one instruction can be dispatched to each execution unit (IU1, IU2, FPU,
    LSU, SRU, VPU, and VALU) per clock cycle.
  - There must be a vacancy in the specified execution unit.
  - A rename register must be available for each destination operand specified by the
    instruction.
  - The appropriate execution unit must be available and there must be an open
    position in the completion queue (CQ). If no CQ entry is available, the
    instruction remains in the IQ.
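The dispatch requirements above can be expressed as a per-cycle check. This is a minimal sketch, not a cycle-exact model of the MPC7400, and it rests on several stated simplifications:

- Each instruction arrives already assigned to one execution unit.
- Branches are left out, because they are handled by the BPU.
- Rename registers are counted per register file.
- Dispatch is assumed to be in order, so a stalled IQ0 also holds IQ1.

The names `Instr` and `dispatch_cycle` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    unit: str                                   # target unit, e.g. "IU1", "LSU", "FPU"
    dests: list = field(default_factory=list)   # register file of each destination, e.g. ["gpr"]


def dispatch_cycle(iq, vacant_units, free_renames, cq_free):
    """Return the non-branch instructions dispatched in one clock cycle.

    iq           -- instruction queue, iq[0] is IQ0
    vacant_units -- execution units that currently have a vacancy
    free_renames -- free rename registers per register file, e.g. {"gpr": 6}
    cq_free      -- number of open completion queue (CQ) entries
    """
    dispatched = []
    used_units = set()
    renames = dict(free_renames)      # copy; only dispatched instructions consume entries
    for instr in iq[:2]:              # only IQ0 and IQ1 are eligible, so at most two
        need = Counter(instr.dests)   # rename registers needed per register file
        blocked = (
            cq_free == 0                                              # no open CQ position
            or instr.unit not in vacant_units                         # no vacancy in the unit
            or instr.unit in used_units                               # one dispatch per unit per cycle
            or any(renames.get(rf, 0) < n for rf, n in need.items())  # no rename register
        )
        if blocked:
            break  # assumption: in-order dispatch, so a stall here also blocks IQ1
        for rf, n in need.items():
            renames[rf] -= n
        used_units.add(instr.unit)
        cq_free -= 1
        dispatched.append(instr)
    return dispatched
```

Consider a queue holding an `add` to IU1, an `lwz` to the LSU, and an `fadd` to the FPU. With all units vacant and ample rename and CQ entries, only `add` and `lwz` dispatch, because only IQ0 and IQ1 are eligible. If both IQ0 and IQ1 target IU1, only the first dispatches.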
• Execute: Consists of the time between dispatch to the execution unit (or
reservation station) and the point at which the instruction vacates the execution unit.
Most integer instructions have a one-cycle latency; results of these instructions can
be used in the clock cycle after an instruction enters the execution unit. However,
integer multiply and divide instructions take multiple clock cycles to complete. The
IU1 can process all integer instructions; the IU2 can process all integer instructions
except multiply and divide instructions.
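The IU1/IU2 split and the one-cycle integer latency described above can be sketched as follows. The multiply/divide set lists only the base PowerPC mnemonics (the `o` and `.` record variants are omitted). Preferring IU2 for simple instructions is a hypothetical policy chosen for illustration, not a rule stated in this section. The function names are also hypothetical.

```python
# Base-form integer multiply/divide mnemonics; "o" and "." variants omitted.
MULDIV = {"mulli", "mullw", "mulhw", "mulhwu", "divw", "divwu"}


def pick_integer_unit(mnemonic, iu1_free, iu2_free):
    """Return "IU1", "IU2", or None if the instruction must wait this cycle."""
    if mnemonic in MULDIV:
        # Only IU1 executes integer multiply and divide (multi-cycle operations).
        return "IU1" if iu1_free else None
    # Either unit can execute the remaining integer instructions. Preferring
    # IU2 is a hypothetical policy that keeps IU1 open for multiply/divide.
    if iu2_free:
        return "IU2"
    if iu1_free:
        return "IU1"
    return None


def result_ready_cycle(enter_cycle, latency=1):
    """Cycle in which a dependent instruction can use the result. The default
    reflects the one-cycle latency of most integer instructions; multiply and
    divide latencies are not modeled here."""
    return enter_cycle + latency
```

With this policy, a `divw` stalls whenever IU1 is busy, even if IU2 is idle. An `add` that enters a unit in cycle 5 makes its result available to a dependent instruction in cycle 6.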
The LSU, FPU, VCIU, and VFPU units are pipelined, as shown in Figure 6-2.
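To illustrate what pipelining buys, the sketch below compares two cases for a run of independent instructions:

- A fully pipelined unit that accepts a new instruction every cycle.
- A unit that must empty before the next instruction enters.

It assumes no data or structural stalls. The stage count is a free parameter, not a value taken from Figure 6-2, and the function names are hypothetical.

```python
def pipelined_cycles(n: int, stages: int) -> int:
    """Cycles for n independent instructions to pass through a fully pipelined
    unit that accepts a new instruction every cycle, with no stalls."""
    if n <= 0:
        return 0
    # The first result appears after `stages` cycles, then one more per cycle.
    return stages + n - 1


def unpipelined_cycles(n: int, stages: int) -> int:
    """Cycles if each instruction must leave the unit before the next enters."""
    return max(n, 0) * stages


# Hypothetical 3-stage unit processing 4 independent instructions.
print(pipelined_cycles(4, 3), unpipelined_cycles(4, 3))  # 6 12
```

The latency of any single instruction is the same in both cases. Only throughput improves, and the advantage grows with the length of the run of independent instructions.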