
October 13 1995, Draft 1
Addendum to –– Evaluating and Programming the 29K RISC Family
A processed trace line contains the instruction which was in write-back during
the captured trace cycle, or data which was transferred during the cycle. Put another
way, if an instruction is fetched from memory, then its op-code is reported in the
processed trace during its write-back cycle (if it reaches execute). Let’s look at the
algorithm used with the Am29040 Traceable Cache preprocessor. The DATA and ADDR
labels in the raw trace indicate the instruction which was fetched during the traced
cycle, or data which was accessed during the cycle. The algorithm changes the values
of these labels to reflect the instruction which was executed during the traced cycle.
If no data access or instruction execution occurs in a cycle, then there is no
processed trace line corresponding to the raw trace line. MonTIP only reports lines
which are considered valid.
The algorithm operates in two stages: first data accesses are processed, then
instruction flow is determined. Data accesses are examined to determine if any
repeat accesses are reported due to the use of Scalable Clocking. Trace information
is captured at the processor's internal speed, while the memory system may be
running at half this speed. When it does, each access to memory is captured twice in
adjacent trace cycles. Only the final access is considered valid.
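The repeat-access check can be illustrated with a short Python sketch. This is not
the MonTIP implementation; the Access record, its fields, and the function name
mark_valid_accesses are hypothetical, and the sketch assumes a repeated access shows
up as an identical record in the very next captured cycle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Access:
    """Hypothetical record of one data access seen in a captured cycle."""
    addr: int
    data: int
    read: bool  # True for a load, False for a store


def mark_valid_accesses(cycles: List[Optional[Access]]) -> List[bool]:
    """Return one flag per captured cycle; True means the access is kept.

    Assumption: with the memory system at half the processor speed, one
    transfer appears as identical records in adjacent cycles, and only
    the last copy of such a run counts.
    """
    valid = []
    for i, access in enumerate(cycles):
        if access is None:
            # No data access was captured in this cycle.
            valid.append(False)
            continue
        following = cycles[i + 1] if i + 1 < len(cycles) else None
        # An identical access in the next cycle marks this one as the
        # early copy of the same transfer, so it is not reported.
        valid.append(following != access)
    return valid
```

A sketch this simple cannot tell a repeated capture from two genuinely identical
back-to-back accesses; a real preprocessor would need extra bus-status information
to separate the two cases.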
A data transfer, due to a load or store instruction, can occur during the same cycle
in which another instruction is executed. When this happens, the algorithm moves the
reporting of the data access to a future trace cycle which contains no valid trace
information. If another data transfer occurs before the previous one is reported,
then the previous data value will not be reported. The R/_W and I/_D information is
repositioned where necessary and possible, so as to report the data accesses which
occurred. Note that LOADM and STOREM data transfers are reported before the
instruction execution is reported; this reflects the correct operation of a 29K
processor. Currently, the algorithm is being enhanced so that multiple instruction
executions or data accesses occurring in the same captured trace cycle can be
reported on different processed trace lines. This will eliminate the need to
reposition or drop data access reports. These algorithm enhancements are required by
superscalar processors.
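The repositioning rule described above can be sketched in Python as follows. This
is an illustration under stated assumptions, not the preprocessor's actual code:
the (instruction, data) tuple layout, the function name reposition_data, and the
handling of a value still pending when the trace ends are all hypothetical, and the
LOADM/STOREM ordering is not modelled.

```python
from typing import List, Optional, Tuple

# Hypothetical view of one captured cycle: (instruction in write-back, data moved).
# Either element is None when nothing of that kind happened in the cycle.
Cycle = Tuple[Optional[str], Optional[int]]
Line = Tuple[int, str, object]  # (cycle number, "insn" or "data", value)


def reposition_data(cycles: List[Cycle]) -> Tuple[List[Line], List[int]]:
    """Emit at most one processed line per cycle, deferring clashing data."""
    lines: List[Line] = []
    dropped: List[int] = []
    pending: Optional[int] = None  # data value waiting for an empty cycle

    for n, (insn, data) in enumerate(cycles):
        if data is not None and pending is not None:
            # A new transfer arrived before the old one found a free
            # cycle, so the old value is never reported.
            dropped.append(pending)
            pending = None
        if insn is not None:
            lines.append((n, "insn", insn))
            if data is not None:
                pending = data  # the line is taken; defer the data report
        elif data is not None:
            lines.append((n, "data", data))
        elif pending is not None:
            # Empty cycle: use it to report the deferred access.
            lines.append((n, "data", pending))
            pending = None

    if pending is not None:
        dropped.append(pending)  # assumption: the trace ended first
    return lines, dropped
```

For example, a data value that shares a cycle with an executing instruction is
reported in the next cycle that has neither an instruction nor a data transfer, and
it appears in the dropped list if a second transfer arrives first.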
Figure 7-15. Path Taken By Am29040 Recursive Trace Processing Algorithm
[Figure not reproduced. Labels: target, branch, delay-slot, target instruction,
sequential instruction sequence]