
they do not need to be invalidated on a task context switch; they do not need extra tag
information to distinguish virtual from physical access and Supervisor from User
mode access; and, importantly, cache coherence problems are more easily solved with
a physically addressed cache. It is somewhat more difficult to implement a physically
addressed data cache: virtual data addresses must first be translated to physical
addresses before the cache access can begin. This address translation, followed by
the cache access itself, adds latency before the cache can respond with the requested
data. As internal processor speeds increase, the cache may not be able to respond
within a single cycle, introducing the potential for pipeline stalling if load
instructions are not scheduled away from the instructions which first use their data.
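As a source-level sketch of the idea (in practice the compiler performs this scheduling, and all names below are purely illustrative), separating a load from the first use of its data gives the cache time to respond:

    /* A sketch of load scheduling at the source level; names are
       illustrative and the compiler normally does this reordering. */
    int unscheduled(const int *p, int a, int b, int *prod)
    {
        int sum = *p + 1;   /* loaded value used immediately: a cache
                               delay stalls the pipeline at this add */
        *prod = a * b;
        return sum;
    }

    int scheduled(const int *p, int a, int b, int *prod)
    {
        int v = *p;         /* load issued early                          */
        *prod = a * b;      /* independent work overlaps the cache access */
        return v + 1;       /* loaded value first used after the delay    */
    }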
The data cache is enabled by clearing the Data Cache Disable (DD) bit in the
CFG configuration register. Data caches support accesses to byte and half-word
sized objects within a cached word. Cache tag information is associated with each
block (or cache entry), and the block size is four words (16 bytes). A 2K byte data
cache therefore has 64 sets, each containing two blocks (a total of 128 blocks, one
for each of the two columns in a set). Individual cache entries can be accessed via
the Cache Interface (CIR) and Cache Data (CDR) registers. These registers enable
the data and tags of a cache block to be read and written directly.
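As an illustration of this geometry, the C fragment below (a sketch; the field names are my own, not taken from the 29K documentation) shows how a data address divides into byte, word, set and tag fields for such a 2K byte, two-column cache:

    #include <stdio.h>

    /* Field decomposition for the 2K byte data cache described above:
       16-byte blocks, 64 sets, two columns. Field names are illustrative. */
    static void decompose(unsigned addr)
    {
        unsigned byte_in_word  =  addr       & 0x3;   /* bits 1..0               */
        unsigned word_in_block = (addr >> 2) & 0x3;   /* bits 3..2               */
        unsigned set_index     = (addr >> 4) & 0x3F;  /* bits 9..4: 64 sets      */
        unsigned tag           =  addr >> 10;         /* compared in each column */

        printf("addr 0x%08x: tag 0x%x set %u word %u byte %u\n",
               addr, tag, set_index, word_in_block, byte_in_word);
    }

    int main(void)
    {
        /* Two addresses 1K bytes apart (one column's worth of cache) select
           the same set but carry different tags, so they compete for the
           set's two columns. */
        decompose(0x40001234);
        decompose(0x40001634);
        return 0;
    }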
There is only one Valid (V) bit for each block. This means blocks are never
partially filled and marked valid. A 29K data cache only allocates a cache block
when a miss occurs during a data load operation. This is known as a
“read-allocate” policy. When a data store is performed and no address match is
found in the cache, no cache block is allocated. This “no write-allocation”
policy has some advantages. It simplifies the cache design: an “allocate on write”
policy may require a currently valid block to be written back to memory before the
block can be reallocated to the data causing the cache miss. This would be a
complicated process, as the reload and write-back activities both require access to
the system busses. Additionally, the instructions following the store may also
require access to the system bus if they are not being supplied by the instruction
cache. Implementing an “allocate on write” policy which avoided this potentially
severe pipeline stalling would be expensive in terms of on-chip (silicon) resources.
Typically, when data is written out to memory it is no longer required, as compilers
prefer to keep critical data in registers. Thus, typical patterns of data access
indicate that written-out data should not cause block allocation, as the data is
somewhat less likely to be accessed again in the near future.
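The C fragment below is a behavioral sketch of the two policies, modeling a single set with two columns; it is an illustration under assumed names, not the processor's actual control logic:

    #include <stdbool.h>
    #include <stdio.h>

    #define COLUMNS 2

    struct set {
        bool     valid[COLUMNS];
        unsigned tag[COLUMNS];
    };

    static bool lookup(struct set *s, unsigned tag)
    {
        for (int c = 0; c < COLUMNS; c++)
            if (s->valid[c] && s->tag[c] == tag)
                return true;
        return false;
    }

    static void on_load(struct set *s, unsigned tag)
    {
        if (!lookup(s, tag)) {
            /* load miss: allocate a block (column 0 here for simplicity)
               and reload all four words from memory */
            s->valid[0] = true;
            s->tag[0]   = tag;
            printf("load  tag %u: miss, block allocated\n", tag);
        } else
            printf("load  tag %u: hit\n", tag);
    }

    static void on_store(struct set *s, unsigned tag)
    {
        /* store miss: no allocation; the data simply goes to memory
           through the write-through buffer */
        printf("store tag %u: %s, no allocation on miss\n",
               tag, lookup(s, tag) ? "hit" : "miss");
    }

    int main(void)
    {
        struct set s = {0};
        on_store(&s, 7);   /* miss: nothing allocated */
        on_load(&s, 7);    /* miss: block allocated   */
        on_load(&s, 7);    /* hit                     */
        return 0;
    }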
When stores are performed to data which is not currently in the cache, or to data
which is supported with a “write-through” policy, a write-through buffer is used to
assist the operation. The buffer is two words deep and holds store-data which is
waiting for access to the memory bus. This enables the processor to continue
executing new instructions rather than wait until the store is complete. The pipeline only
stalls when there are more than two outstanding stores waiting to be written into