PowerPC 440 Core
09/21/1999
Instruction and Data Caches
Processor Local Bus (PLB) Memory Access
The PPC440 has three independent 128-bit Processor Local Bus (PLB) master interfaces, one for
instruction fetches, one for data reads, and a third for data writes. Memory accesses are performed through
the PLB interfaces to/from the instruction cache (I-Cache) or data cache (D-Cache) units. Having three
independent bus interfaces for the cache units provides maximum flexibility for designs to optimize
system throughput. Memory accesses (loads/stores) which hit in the cache achieve single-cycle
throughput.
Cache Configuration
The PPC440 has separate instruction and data caches with 8-word (32-byte) cache lines. The instruction and
data cache sizes are factory-configurable, in any combination, to 0KB, 8KB, 16KB, 32KB, or 64KB.
Configurable cache sizes give designers a parameter for tuning the PPC440 to the
desired price-performance point for a particular application. The caches are highly associative, with
associativity varying with cache size as shown in Table 3. High associativity enables advanced cache
functions such as locking and transient memory regions (see “Cache Partitioning” below).
Cache Size    8KB    16KB    32KB    64KB
Ways          32     64      64      128
Table 3 – Number of Ways for Different PPC440 Cache Sizes
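The associativity figures in Table 3 imply a number of sets for each configuration, since sets = cache size / (ways x line size). The following is a minimal sketch of that arithmetic, using the 32-byte line size stated above; the function and variable names are illustrative and do not come from the PPC440 documentation.

```python
LINE_BYTES = 32  # 8-word (32-byte) cache line, as stated in the text

# Ways per cache size (in KB), copied from Table 3
WAYS_BY_SIZE_KB = {8: 32, 16: 64, 32: 64, 64: 128}

def num_sets(size_kb: int) -> int:
    """Return the number of sets implied by cache size, ways, and line size."""
    ways = WAYS_BY_SIZE_KB[size_kb]
    return (size_kb * 1024) // (ways * LINE_BYTES)

if __name__ == "__main__":
    for size_kb in sorted(WAYS_BY_SIZE_KB):
        ways = WAYS_BY_SIZE_KB[size_kb]
        print(f"{size_kb:>2}KB: {ways:>3} ways, {num_sets(size_kb)} sets")
```

For the 32KB configuration this works out to 16 sets of 2KB each (64 ways x 32 bytes per set).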
The cache arrays are non-blocking, which allows the PPC440 to continue executing
load/store instructions while instruction fetches take place over the PLB. The caches therefore continue
supplying data and instructions to the pipeline without interruption. The PPC440 replaces cache lines
according to a round-robin replacement policy.
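Round-robin replacement can be illustrated with a toy model of a single set: a pointer names the next way to be replaced and advances by one on every line fill, wrapping after the last way. This is only a sketch of the general policy; the class and method names are hypothetical, and the model ignores valid bits, locking, and the details of the actual hardware.

```python
class RoundRobinSet:
    """Toy model of one cache set with round-robin replacement."""

    def __init__(self, ways: int):
        self.tags = [None] * ways  # tag held by each way (None = empty)
        self.victim = 0            # index of the next way to replace

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, fill the victim way and advance."""
        if tag in self.tags:
            return True
        self.tags[self.victim] = tag
        self.victim = (self.victim + 1) % len(self.tags)
        return False
```

Note that a hit does not move the victim pointer, so a frequently used line is replaced on the same schedule as any other line. This is what distinguishes round-robin from a least-recently-used policy.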
The initial PPC440A4 core offering will include a 32KB instruction cache and 32KB data cache. These
caches are physically constructed using two 16KB CAMRAM macros, each consisting of eight 2KB
sub-banks (or “sets”). This organization facilitates low-power operation and fast hit/miss determination.
Cache Partitioning
The PPC440 caches have the ability to be separated into “normal”, “transient”, and “locked” regions.
Normal regions behave like a conventional cache with respect to line replacement. Transient regions are
intended for data that is used briefly and then not needed again, such as the data in a particular JPEG
image. A separate transient region avoids castouts of more frequently accessed code in the normal region.
The locked region holds code that must not be cast out of the cache; it consists of whatever part of the
cache is not included in the normal or transient regions. The regions are set via “victim” ceiling and floor pointers,
as shown in Figure 4. Figure 4 shows two examples of cache partitioning: the left side shows separate
transient and normal regions, and the right side shows part of the normal region overlapping with the
transient region. The normal ceiling is defined as the top of the cache.
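One way to picture the floor and ceiling pointers is as bounds on a round-robin victim pointer: each region replaces lines only in the ways between its floor and its ceiling, and any way that lies in neither range is never chosen as a victim, so it is effectively locked. The sketch below assumes inclusive bounds expressed as way indices, and the partition values are made up for illustration; none of the names correspond to actual PPC440 registers.

```python
class RegionVictim:
    """Round-robin victim pointer confined to ways floor..ceiling (inclusive)."""

    def __init__(self, floor: int, ceiling: int):
        assert 0 <= floor <= ceiling
        self.floor, self.ceiling = floor, ceiling
        self.next_way = floor

    def pick(self) -> int:
        """Return the way to replace, then advance, wrapping ceiling -> floor."""
        way = self.next_way
        self.next_way = self.floor if way == self.ceiling else way + 1
        return way

WAYS = 64  # ways in the 32KB configuration (Table 3)

# Hypothetical partition: ways 0-7 locked, 8-23 transient, 24-63 normal.
transient = RegionVictim(floor=8, ceiling=23)
normal = RegionVictim(floor=24, ceiling=WAYS - 1)  # normal ceiling = top of cache

def locked_ways(regions, ways=WAYS):
    """Ways no region can select as a victim, i.e. the locked region."""
    covered = set()
    for region in regions:
        covered.update(range(region.floor, region.ceiling + 1))
    return sorted(set(range(ways)) - covered)
```

Lowering the normal floor to 8 in this model would make the normal region overlap the transient region, as in the right-hand example of Figure 4, without changing which ways are locked.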