
22
Evaluating and Programming the 29K RISC Family
Write–Read Dependency
Even if a processor is able to support out–of–order instruction completion, it
still must deal with the data dependencies that flow through a program’s execution.
These flow dependencies (often known as true dependencies) represent the movement
of operands between instructions in a program. Examine the code below:
    mul   gr96,lr2,lr5    ;write gr96, gr96 = lr2 * lr5
    add   gr97,gr96,1     ;read gr96, write–read dependency
The first instruction would be issued to the integer multiply unit; this will have
(according to AMD’s product overview) two cycles of latency. The result is written
to register gr96. The second instruction would be issued to a different integer
handling unit. However, it has a source operand supplied in gr96. If the second
instruction had no data dependencies on the first, it would be easy to issue the
instruction while the first was still in execute. However, execution of the first
instruction must complete before the second instruction can start execution. Steps
must be taken to deal with the data dependency. This kind of dependency is also
known as a write–read dependency, because gr96 must be written by an earlier
instruction before a later one can read the result.
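
The write–read check for the two-instruction example can be sketched in a few lines of Python. This is only an illustrative model, not a description of any 29K implementation: the Instr record, the function names, the one-cycle latency given to add, and the assumption that an independent instruction may start in the same cycle as its predecessor (in a different unit) are all hypothetical. Only the two-cycle multiply latency comes from the text.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical record for one instruction (register operands only)."""
    op: str
    dest: str                # register written by the instruction
    srcs: Tuple[str, ...]    # registers read by the instruction
    latency: int             # cycles in execute before the result is available


def has_write_read_dependency(earlier: Instr, later: Instr) -> bool:
    """True when `later` reads the register that `earlier` writes."""
    return earlier.dest in later.srcs


def earliest_start(earlier: Instr, earlier_start: int, later: Instr) -> int:
    """Earliest cycle at which `later` can begin execution.

    Assumption: with no dependency, `later` may start in the same cycle
    as `earlier`, using a different execution unit.
    """
    if has_write_read_dependency(earlier, later):
        # Must wait until the earlier result has been produced.
        return earlier_start + earlier.latency
    return earlier_start


# mul gr96,lr2,lr5 ; two-cycle latency, as stated in the text
mul = Instr("mul", "gr96", ("lr2", "lr5"), latency=2)
# add gr97,gr96,1  ; the immediate 1 is not a register, so it is omitted;
#                    the one-cycle latency is an assumption
add = Instr("add", "gr97", ("gr96",), latency=1)

print(has_write_read_dependency(mul, add))
print(earliest_start(mul, 0, add))
```

Under these assumptions the add cannot begin until the cycle in which the multiply result becomes available, whereas an add reading only unrelated registers would not be delayed.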
Some superscalar processors, such as the Intel i960 CA, use a
reduced–scoreboarding mechanism to resolve data dependencies [Thorton 1970].
When a register is required for a result, a one–bit flag is set to indicate the register is in
use. Instructions currently in execute set the scoreboard bit for their result registers.
Before an instruction is issued, the scoreboard bits for its source operands are examined.
Further instructions are not issued if the scoreboard indicates that an in–execute
instruction intends to write a register which supplies a source operand for the
instruction waiting for issue. When an instruction completes, the relevant scoreboard
bit is cleared. This may result in a currently stalled instruction being issued.
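
The one-bit scoreboard described above can be modelled with a short Python sketch. It is a simplified illustration under stated assumptions, not the i960 CA's actual logic: the Scoreboard class, the simulate function, in-order issue of at most one instruction per cycle, and the latency figures other than the two-cycle multiply are all hypothetical. As in the description, only source operands are checked; a busy destination register does not stall issue.

```python
class Scoreboard:
    """One busy bit per register, set while an in-execute
    instruction intends to write that register."""

    def __init__(self):
        self.busy = set()

    def can_issue(self, srcs):
        # Stall if any source register is still awaiting a result.
        return not any(reg in self.busy for reg in srcs)

    def issue(self, dest):
        self.busy.add(dest)       # set the scoreboard bit

    def complete(self, dest):
        self.busy.discard(dest)   # clear the scoreboard bit


def simulate(program):
    """Return the issue cycle of each instruction.

    `program` is a list of (name, dest, srcs, latency) tuples.
    Assumptions: in-order issue, at most one instruction per cycle.
    """
    scoreboard = Scoreboard()
    in_flight = []      # (completion_cycle, dest) for in-execute instructions
    issue_cycles = []
    cycle = 0
    for name, dest, srcs, latency in program:
        while True:
            # Clear the bits of instructions that have completed by now.
            for done, reg in list(in_flight):
                if done <= cycle:
                    scoreboard.complete(reg)
                    in_flight.remove((done, reg))
            if scoreboard.can_issue(srcs):
                break
            cycle += 1  # instruction stalls for a cycle
        scoreboard.issue(dest)
        in_flight.append((cycle + latency, dest))
        issue_cycles.append(cycle)
        cycle += 1
    return issue_cycles


dependent = [
    ("mul", "gr96", ("lr2", "lr5"), 2),  # two-cycle latency, per the text
    ("add", "gr97", ("gr96",), 1),       # reads gr96: must wait for mul
]
print(simulate(dependent))
```

In this model the dependent add is held at issue until the multiply clears the gr96 bit, while an add whose sources are unrelated registers would issue in the cycle immediately after the multiply.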
It is unlikely that a 29K processor will use scoreboarding, and even less likely that
it will use a reduced–scoreboarding mechanism such as that of the i960 CA, which only
detects data dependencies arising from out–of–order instruction completion. A
superscalar 29K processor will support out–of–order instruction issue, which is
described shortly. Scoreboarding can resolve the resulting data dependencies. However,
other techniques, such as register renaming, enable instructions to be decoded and
issued further ahead than is possible with scoreboarding. This will be described in
more detail as we proceed.
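[Figure: data-flow diagram of the example code. Labels: mul with lr2 and lr5, producing gr96; add with gr96 and 1, producing gr97.]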