
Evaluating and Programming the 29K RISC Family
All processors in the 29K family, except the original Am29000 (pre rev–D, 1990), support byte and half–word sized read and write access to data memory. The original Am29000 supported only word–sized data access, so modifying a sub–word object required a read–modify–write sequence: read the containing word, update the byte or half–word within it, and write the word back. The processor provides insert– and extract–byte and half–word instructions to assist with such sub–word operations, but these instructions are little used today.
The processor has a Branch Target Cache (BTC) memory which supplies the first four instructions of previously taken branches. Successful (taken) branches make up about 20% of a typical instruction mix. Using burst–mode and interleaving techniques, memory systems can sustain the high bandwidth required to keep the instruction–hungry RISC fed. However, when a branch occurs, memory systems can present considerable latency before supplying the first instruction of the branch target. For example, consider an instruction memory system which has a 3–cycle first–access latency but can sustain 1–cycle access in burst–mode. Typically every 5th instruction is a branch, and in this example the branch instruction would effectively take 5 cycles to complete its execution (the pipeline would be stalled for 4 cycles; see section 1.13). If all other instructions executed in a single cycle, the average number of cycles per instruction would be 1.8 (9 cycles for every 5 instructions), not the desired sustained single–cycle operation. The BTC can hide all 3 cycles of memory access latency and enable the branch instruction to execute in a single cycle: while the BTC supplies the first four target instructions, the memory system has time to start fetching the instructions that follow them.
The programmer has little control over BTC operation; it is maintained internally by processor hardware. There are 32 cache entries (known as cache blocks) of four instructions each, configured in a 2–way set associative arrangement (16 sets of two blocks). Entries are tagged to distinguish between accesses made in User mode and Supervisor mode; they are also tagged to differentiate between virtual addresses and physical addresses. Because the address in the program counter is presented to the BTC at the same time as it is presented to the MMU, the BTC cannot use the translated physical address; when address translation is enabled, it is accessed with the virtual address. Entries are not tagged with per–process identifiers; consequently, the BTC cannot distinguish between identical virtual addresses belonging to different processes operating with virtual addressing. Systems which run multiple tasks using virtual addressing must therefore invalidate the cache when a user–task context switch occurs. The IRETINV (interrupt return and invalidate) instruction is one convenient way of doing this.
The BTC is able to hold the instructions of frequently taken trap handler routines, but there is no means of locking code sequences into the cache. Entries are replaced on a random basis: the most recently taken branches replace current entries when necessary, with the entry to be displaced chosen at random, so even a frequently used handler can be evicted by other branches.
The 3–bus members of the 29K family (separate instruction and data buses sharing a single address bus) can operate the shared address bus in a pipelined mode. If a memory system is able to latch an address before an instruction or data transfer is complete, the address bus can be freed to start a subsequent access.