![](http://datasheet.mmic.net.cn/230000/INSTRUCTIN603EWP_datasheet_15584964/INSTRUCTIN603EWP_19.png)
RISC Microprocessor Division
The branch processing unit (BPU) can fold certain branches out of the instruction queue (IQ). A folded
branch is removed from the IQ before it is dispatched, which lets the dispatcher handle other
instructions and frees space in both the instruction queue and the completion queue. Frequently,
instruction flow can continue as if the branch had not occurred.
The BPU can fold all unconditional branches, as well as conditional branches that do not involve the
CTR or LR. Conditional branches that do involve these registers cannot be folded: the CTR and LR
have corresponding rename registers, and these can be tracked only if the branches that use them are
dispatched and thereby recorded in the completion queue.
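The folding rule above can be written as a small predicate. This is only an illustrative sketch: the `Branch` record and its field names are hypothetical and do not correspond to any actual hardware signal or instruction encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical description of a branch, for illustration only."""
    conditional: bool
    uses_ctr: bool = False  # assumed flag: the branch involves the CTR
    uses_lr: bool = False   # assumed flag: the branch involves the LR


def is_foldable(branch: Branch) -> bool:
    """Apply the rule from the text: all unconditional branches fold;
    conditional branches fold only if they involve neither CTR nor LR."""
    if not branch.conditional:
        return True
    return not (branch.uses_ctr or branch.uses_lr)
```

For example, `is_foldable(Branch(conditional=True, uses_ctr=True))` is false, because such a branch has to be dispatched so that the CTR rename register can be tracked through the completion queue.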
Consider the left two columns of diagrams. We start with four instructions in the instruction queue.
Instruction C is a branch. In the second column, we see that instructions A and B have been
dispatched and have entries in the completion queue, and that instruction C has been folded out by the
BPU. Instructions E and F have also been fetched in.
Because superscalar processors feature multiple units that are attempting to flow instructions through
their pipelines as quickly as possible, race conditions between various resources can occasionally
arise. One race condition occurs in the instruction queue: if the dispatcher can tag a branch for
dispatch before the BPU can fold it out of the instruction queue, then the branch is not folded; it is
dispatched and an entry is created for it in the completion queue. This situation typically occurs if the
IQ is empty or nearly empty and the foldable branch is fetched directly into one of the bottom two slots
(i.e., the slots from which instructions are dispatched). However, the performance impact of this race
condition is negligible.
The right two columns illustrate the branch race condition. Instructions A and B have just been fetched
into the instruction queue, with A being a branch. In this case, the dispatcher grabs A before it can be
folded, and we see it in the completion queue in the next cycle.
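Both scenarios can be reproduced with a toy one-cycle model. The sketch below is not a description of the real pipeline timing. It assumes that the dispatcher tags whatever occupies the bottom two IQ slots at the start of a cycle, that the BPU folds any foldable branch sitting above those slots in the same cycle, and that newly fetched instructions are appended afterwards. The names `Instr`, `cycle`, `fold_scenario` and `race_scenario` are hypothetical.

```python
from dataclasses import dataclass

DISPATCH_WIDTH = 2  # the bottom two IQ slots feed the dispatcher (per the text)


@dataclass(frozen=True)
class Instr:
    name: str
    foldable_branch: bool = False


def cycle(iq, cq, fetched=()):
    """Advance the toy model by one cycle; iq[0] is the bottom (oldest) slot.

    Returns (new_iq, new_cq, folded).
    """
    # Assumption: the dispatcher tags the bottom slots first, so a branch
    # sitting there loses the race and is dispatched like any other instruction.
    tagged = iq[:DISPATCH_WIDTH]
    rest = iq[DISPATCH_WIDTH:]
    # The BPU folds foldable branches that the dispatcher has not tagged.
    folded = [i for i in rest if i.foldable_branch]
    remaining = [i for i in rest if not i.foldable_branch]
    # Every tagged instruction receives a completion queue entry.
    return remaining + list(fetched), cq + tagged, folded


def names(instrs):
    return [i.name for i in instrs]


def fold_scenario():
    """Left two columns: A, B, C, D queued, with C a foldable branch."""
    a, b, d, e, f = (Instr(n) for n in "ABDEF")
    c = Instr("C", foldable_branch=True)
    iq, cq, folded = cycle([a, b, c, d], [], fetched=[e, f])
    return names(iq), names(cq), names(folded)


def race_scenario():
    """Right two columns: branch A is fetched straight into a bottom slot."""
    a = Instr("A", foldable_branch=True)
    b = Instr("B")
    iq, cq, folded = cycle([a, b], [])
    return names(iq), names(cq), names(folded)
```

Under these assumptions, `fold_scenario()` leaves A and B in the completion queue with C folded out, while `race_scenario()` puts branch A in the completion queue and folds nothing, matching the behavior described for the two pairs of columns.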