Chapter 6. Instruction Timing
The PTE table search assumes a hit in the first entry of the primary PTEG and no RC
updates.
6.6 Instruction Scheduling Guidelines
The performance of the MPC7400 can be improved by avoiding resource conflicts and
scheduling instructions to take fullest advantage of the parallel execution units. Instruction
scheduling on the MPC7400 can be improved by observing the following guidelines:
• To reduce mispredictions, separate the instruction that sets CR bits from the branch
  instruction that evaluates them. Because there can be no more than 12 instructions
  in the processor (with the instruction that sets CR in CQ0 and the dependent branch
  instruction in IQ5), there is no advantage to having more than 10 instructions
  between them.
• Likewise, when branching to a location specified by the CTR or LR, separate the
  mtspr instruction that initializes the CTR or LR from the dependent branch
  instruction. This ensures the register values are immediately available to the branch
  instruction.
• Schedule instructions such that two can be dispatched at a time.
• Schedule instructions to minimize stalls due to busy execution units.
• Avoid scheduling high-latency instructions close together. Interspersing
  single-cycle latency instructions between longer-latency instructions minimizes the
  effect that instructions such as integer divide and multiply can have on throughput.
• Avoid using serializing instructions.
• Schedule instructions to avoid dispatch stalls:
  - Eight instructions can be tracked in the CQ; therefore, eight instructions can be
    in the execute stages at any one time.
  - There are six GPR rename registers; therefore, only six GPRs can be specified as
    destination operands at any time. If no rename registers are available,
    instructions cannot enter the execute stage and remain in the reservation station
    or instruction queue until rename registers become available.
    Note that load with update address instructions use two destination registers.
  - Similarly, there are six FPR rename registers and six VR rename registers, so
    only six FPR and six VR destination operands can be in the execute and complete
    stages at any time.
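The GPR rename-register limit described above can be checked by hand for a short instruction window. The following Python sketch is an illustration only, not part of the MPC7400 documentation: the function names and the mnemonic sets are assumptions chosen for the example, any mnemonic not listed is treated as an integer instruction that writes one GPR, and the model pessimistically assumes that no earlier instruction has retired and released its rename register.

```python
# Illustrative bookkeeping model; names and mnemonic sets are assumptions.
GPR_RENAMES = 6  # six GPR rename registers (figure taken from the text)

# Integer load-with-update forms write two GPRs: the load target and the
# updated base register.
LOAD_WITH_UPDATE = {"lbzu", "lhzu", "lhau", "lwzu",
                    "lbzux", "lhzux", "lhaux", "lwzux"}

# A few instructions that write no GPR (plain stores, compares, branches).
NO_GPR_DEST = {"stb", "sth", "stw", "cmpw", "cmpwi", "b", "bc"}


def gpr_renames_needed(mnemonic):
    """Number of GPR rename registers one instruction needs in this model."""
    if mnemonic in LOAD_WITH_UPDATE:
        return 2
    if mnemonic in NO_GPR_DEST:
        return 0
    return 1  # assumption: everything else writes exactly one GPR


def first_rename_stall(mnemonics, limit=GPR_RENAMES):
    """Index of the first instruction that cannot obtain its GPR rename
    registers, assuming none of the earlier instructions has retired.
    Returns None if the whole window fits within the limit."""
    in_use = 0
    for index, mnemonic in enumerate(mnemonics):
        need = gpr_renames_needed(mnemonic)
        if in_use + need > limit:
            return index
        in_use += need
    return None
```

For example, under these assumptions `first_rename_stall(["lwzu", "lwzu", "lwzu", "add"])` reports index 3: the three load-with-update instructions already hold all six rename registers, so the add has to wait.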
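The guideline about keeping high-latency instructions apart can be illustrated with a deliberately simplified timing model. The sketch below is not the MPC7400 pipeline: it assumes in-order dispatch of one instruction per cycle and a single non-pipelined unit for long-latency operations, and it ignores dual dispatch, reservation stations, and the completion queue. The function name and the latency value of 10 are hypothetical placeholders, not MPC7400 figures.

```python
def cycles_to_finish(latencies, long_threshold=2):
    """Toy in-order model. One instruction dispatches per cycle. An
    instruction whose latency is >= long_threshold needs the single
    non-pipelined long-latency unit and blocks dispatch until that unit
    is free. Returns the cycle in which the last result is available."""
    cycle = 0           # earliest cycle in which the next dispatch can occur
    long_unit_free = 0  # first cycle in which the long-latency unit is idle
    finish = 0
    for latency in latencies:
        if latency >= long_threshold:
            start = max(cycle, long_unit_free)
            long_unit_free = start + latency
        else:
            start = cycle
        finish = max(finish, start + latency)
        cycle = start + 1
    return finish


LONG = 10  # placeholder latency for a divide-like operation (assumption)

# Two long operations back to back, followed by 20 single-cycle operations.
clustered = [LONG, LONG] + [1] * 20
# The same work with the single-cycle operations interspersed.
interleaved = [LONG] + [1] * 10 + [LONG] + [1] * 10

clustered_cycles = cycles_to_finish(clustered)      # 31 in this toy model
interleaved_cycles = cycles_to_finish(interleaved)  # 22 in this toy model
```

In this model the second long operation in the clustered order stalls dispatch behind the busy unit, so the single-cycle instructions queued after it also wait. In the interleaved order they run while the unit is busy.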
Table 6-2. Effect of TLB Miss on Performance (Continued)

Cache Hit/Miss                                                         Latency
100% L1 & L2 cache miss with bus running at 4:1 with 5:2:2:2 memory    33 cycles
100% L1 & L2 cache miss with bus running at 4:1 with 11:1:1:1 memory   57 cycles