
PRELIMINARY
Integer Unit
1.2.5 Branch Control
Branch instructions occur on average every
four to six instructions in x86-compatible
programs. When the normal sequential flow of a
program changes due to a branch instruction,
the pipeline stages may stall while waiting for
the CPU to calculate, retrieve, and decode the
new instruction stream. The M II CPU minimizes
the performance degradation and latency of
branch instructions through the use of branch
prediction and speculative execution.
1.2.5.1 Branch Prediction
The M II CPU uses a 512-entry, 4-way
set-associative Branch Target Buffer (BTB) to
store branch target addresses. The M II CPU also
has a 1024-entry branch history table. During
the fetch stage, the instruction stream is
checked for the presence of branch instructions.
If an unconditional branch instruction is
encountered, the M II CPU accesses the BTB to
check for the branch instruction’s target
address. If the target address is found in the
BTB, the M II CPU begins fetching at the target
address specified by the BTB.
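
The BTB lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the M II implementation: the text gives only the size (512 entries) and associativity (4-way), so the index/tag split, the oldest-first replacement, and the names BranchTargetBuffer, lookup, update, and next_fetch_address are all assumptions made for the example.

```python
# Sketch of a 512-entry, 4-way set-associative branch target buffer.
# Only the size and associativity come from the text; the index/tag
# split and the oldest-first replacement policy are assumptions.

NUM_ENTRIES = 512
NUM_WAYS = 4
NUM_SETS = NUM_ENTRIES // NUM_WAYS  # 128 sets of 4 ways each


class BranchTargetBuffer:
    def __init__(self):
        # Each set holds up to NUM_WAYS (tag, target) pairs, oldest first.
        self.sets = [[] for _ in range(NUM_SETS)]

    @staticmethod
    def _split(branch_addr):
        # Hypothetical split: low bits pick the set, the rest form the tag.
        return branch_addr % NUM_SETS, branch_addr // NUM_SETS

    def lookup(self, branch_addr):
        """Return the cached target address, or None on a BTB miss."""
        index, tag = self._split(branch_addr)
        for entry_tag, target in self.sets[index]:
            if entry_tag == tag:
                return target
        return None

    def update(self, branch_addr, target):
        """Record a branch target, evicting the oldest way if the set is full."""
        index, tag = self._split(branch_addr)
        ways = [entry for entry in self.sets[index] if entry[0] != tag]
        ways.append((tag, target))
        self.sets[index] = ways[-NUM_WAYS:]


def next_fetch_address(btb, branch_addr, fall_through_addr):
    """On a BTB hit fetch from the cached target, otherwise fetch sequentially."""
    target = btb.lookup(branch_addr)
    return target if target is not None else fall_through_addr
```

With 512 entries and 4 ways there are 128 sets, so in this model a fifth branch that maps to an already full set displaces the oldest entry in that set.
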
In the case of conditional branches, the BTB also
provides history information to indicate
whether the branch is more likely to be taken
or not taken. If the conditional branch
instruction is found in the BTB, the M II CPU
begins fetching instructions at the predicted
target address. If the conditional branch misses
in the BTB, the M II CPU predicts that the branch
will not be taken, and instruction fetching
continues with the next sequential instruction.
The decision to fetch the taken or not-taken
target address is based on a four-state branch
prediction algorithm.
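
The text does not spell out the four states or their transitions. A common four-state scheme is the 2-bit saturating counter, and the sketch below assumes that scheme purely for illustration; the state names, the starting state, and the functions predict_taken, next_state, and count_correct are hypothetical and are not taken from the M II documentation.

```python
# Four-state predictor modeled as a 2-bit saturating counter.
# ASSUMPTION: the M II's actual state machine is not given in the text.
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)


def predict_taken(state):
    """Predict taken in the two 'taken' states, not taken otherwise."""
    return state >= WEAK_TAKEN


def next_state(state, taken):
    """Move one step toward the observed outcome, saturating at the ends."""
    if taken:
        return min(state + 1, STRONG_TAKEN)
    return max(state - 1, STRONG_NOT_TAKEN)


def count_correct(outcomes, state=WEAK_NOT_TAKEN):
    """Count correct predictions over a sequence of actual branch outcomes."""
    correct = 0
    for taken in outcomes:
        if predict_taken(state) == taken:
            correct += 1
        state = next_state(state, taken)
    return correct


# A loop branch taken nine times and then falling through once:
# the first iteration and the loop exit are mispredicted.
loop_outcomes = [True] * 9 + [False]
print(count_correct(loop_outcomes))  # prints 8
```

Under this assumed scheme a single contrary outcome does not flip a strongly biased prediction, which is why the loop branch in the example is mispredicted only on entry and on exit.
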
Once fetched, a conditional branch instruction
is first decoded and then dispatched to the X
pipeline only. The conditional branch
instruction proceeds through the X pipeline and
is then resolved in either the EX stage or the
WB stage. The conditional branch is resolved in
the EX stage if the instruction responsible for
setting the condition codes is completed prior
to the execution of the branch. If the
instruction that sets the condition codes is
executed in parallel with the branch, the
conditional branch instruction is resolved in
the WB stage.
Correctly predicted branch instructions
execute in a single core clock. If resolution of a
branch indicates that a misprediction has
occurred, the M II CPU flushes the pipeline
and starts fetching from the correct target
address. The M II CPU prefetches both the
predicted and the non-predicted path for each
conditional branch, thereby eliminating the
cache access cycle on a misprediction. If the
branch is resolved in the EX stage, the
resulting misprediction latency is four cycles.
If the branch is resolved in the WB stage, the
latency is five cycles.
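
The resolution rule and the latencies above can be combined into a rough cost model. The sketch below is illustrative only: it treats the quoted misprediction latency as the total cycle cost of a mispredicted branch, which is an assumption, and the accuracy and WB-fraction figures in the usage line are made-up inputs, not measured M II data. All function names are hypothetical.

```python
# Cycle counts quoted in the text.
CORRECT_BRANCH_CYCLES = 1
MISPREDICT_CYCLES = {"EX": 4, "WB": 5}


def resolution_stage(flags_set_in_parallel):
    """WB if the condition-code setter runs in parallel with the branch, else EX."""
    return "WB" if flags_set_in_parallel else "EX"


def branch_cycles(predicted_correctly, flags_set_in_parallel):
    """Cycles charged to one conditional branch under this simple model."""
    if predicted_correctly:
        return CORRECT_BRANCH_CYCLES
    return MISPREDICT_CYCLES[resolution_stage(flags_set_in_parallel)]


def average_branch_cycles(accuracy, wb_fraction):
    """Expected cycles per branch.

    accuracy:    fraction of branches predicted correctly (assumed input)
    wb_fraction: fraction of branches resolved in WB (assumed input)
    """
    mispredict = (
        (1 - wb_fraction) * MISPREDICT_CYCLES["EX"]
        + wb_fraction * MISPREDICT_CYCLES["WB"]
    )
    return accuracy * CORRECT_BRANCH_CYCLES + (1 - accuracy) * mispredict


# Illustrative inputs only: 90% accuracy, half of branches resolved in WB.
print(round(average_branch_cycles(0.9, 0.5), 2))  # prints 1.35
```

Because branches occur every four to six instructions, even a small change in the assumed accuracy moves the average noticeably in this model.
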
Since the target address of return (RET)
instructions is dynamic rather than static, the
M II CPU caches target addresses for RET
instructions in an eight-entry return stack
rather than in the BTB. The return address is
pushed onto the return stack during a CALL
instruction and popped during the corresponding
RET instruction.
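
The push-on-CALL, pop-on-RET behavior can be modeled with a bounded stack. This is a minimal sketch: the eight-entry depth comes from the text, but the overflow behavior (silently dropping the oldest entry) and the empty-stack behavior (no prediction) are assumptions, and the ReturnStack class with its on_call and on_ret methods is hypothetical.

```python
from collections import deque

RETURN_STACK_DEPTH = 8  # eight entries, per the text


class ReturnStack:
    def __init__(self):
        # ASSUMPTION: on overflow the oldest return address is discarded.
        # deque(maxlen=...) does exactly that when appending to a full deque.
        self.entries = deque(maxlen=RETURN_STACK_DEPTH)

    def on_call(self, return_addr):
        """Push the address of the instruction following the CALL."""
        self.entries.append(return_addr)

    def on_ret(self):
        """Pop the predicted return address, or None if the stack is empty."""
        return self.entries.pop() if self.entries else None


stack = ReturnStack()
stack.on_call(0x1005)   # outer CALL
stack.on_call(0x2010)   # nested CALL
print(hex(stack.on_ret()))  # prints 0x2010
print(hex(stack.on_ret()))  # prints 0x1005
```

In this model, call chains nested more than eight deep lose their outermost return addresses, so the corresponding RET instructions receive no prediction from the stack.
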