MPC7400 RISC Microprocessor User's Manual
MPC7400 Microprocessor Overview
• Two vector units that support AltiVec instructions:
  - Vector permute unit (VPU)
  - Vector arithmetic logic unit (VALU), which consists of the following
    independent subunits:
    - Vector simple integer unit (VSIU)
    - Vector complex integer unit (VCIU)
    - Vector floating-point unit (VFPU)
The ability to execute several instructions in parallel and the use of simple instructions with
rapid execution times yield high efficiency and throughput for MPC7400-based systems.
Most integer instructions (including VSIU instructions) have a one-clock cycle execution
latency.
The FPU and VFPU are pipelined; that is, the tasks they perform are broken into subtasks
executed in successive stages. Typically, a floating-point instruction occupies only one of
the three FPU stages at a time, freeing the previous stage to work on the next floating-point
instruction. Thus, three floating-point instructions can be in the FPU execute stage at a time
and one floating-point instruction can finish executing per processor clock cycle.
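The overlap described above can be illustrated with a small stage-occupancy model. This is a minimal sketch, not a description of the actual hardware: it assumes independent instructions, one instruction entering the pipeline per cycle, and no stalls. The function name simulate_pipeline and its parameters are hypothetical.

```python
def simulate_pipeline(n_instructions, n_stages=3):
    """Return, for each cycle, which instruction occupies each stage.

    Assumption (not from the manual): one independent instruction enters
    stage 0 every cycle and nothing ever stalls.
    """
    timeline = []
    total_cycles = n_instructions + n_stages - 1
    for cycle in range(total_cycles):
        stages = [None] * n_stages
        for stage in range(n_stages):
            # The instruction that entered `stage` cycles ago is now here.
            instr = cycle - stage
            if 0 <= instr < n_instructions:
                stages[stage] = instr
        timeline.append(stages)
    return timeline


timeline = simulate_pipeline(5)
for cycle, stages in enumerate(timeline):
    print(cycle, stages)
```

In this model, five instructions take seven cycles in total. From cycle 2 onward all three stages are occupied and one instruction leaves the last stage every cycle.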
The VFPU has four pipeline stages when executing in non-Java mode and five when
executing in Java mode.
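To give a feel for what the additional Java-mode stage costs, the following sketch compares the two stage counts for a batch of independent operations and for a chain of dependent ones. It assumes a fully pipelined unit that accepts one operation per cycle and that a dependent operation cannot start until its predecessor leaves the last stage; both are modeling assumptions, and the function names are hypothetical.

```python
NON_JAVA_STAGES = 4  # from the text above
JAVA_STAGES = 5      # from the text above


def cycles_for_batch(n_ops, n_stages):
    """Cycles to finish n_ops independent operations (assumed: no stalls)."""
    return n_stages + n_ops - 1 if n_ops > 0 else 0


def cycles_for_chain(n_ops, n_stages):
    """Cycles when each operation depends on the previous result (assumed)."""
    return n_ops * n_stages


for n in (1, 8, 64):
    print(
        n,
        cycles_for_batch(n, NON_JAVA_STAGES),
        cycles_for_batch(n, JAVA_STAGES),
        cycles_for_chain(n, NON_JAVA_STAGES),
        cycles_for_chain(n, JAVA_STAGES),
    )
```

Under these assumptions the extra stage adds a single cycle to a batch of independent operations, whatever its size, but adds one cycle per operation to a dependent chain.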
Note that for the MPC7400, double- and single-precision versions of floating-point
instructions have the same latency. For example, a floating-point multiply-add instruction
takes three cycles to execute, regardless of whether it is single- (fmadds) or
double-precision (fmadd).
Figure 1-1 shows the parallel organization of the execution units (shaded in the diagram).
The instruction unit fetches, dispatches, and predicts branch instructions. Note that this is
a conceptual model that shows basic features rather than attempting to show how features
are implemented physically.
The MPC7400 has independent on-chip, 32-Kbyte, eight-way set-associative, physically-
addressed L1 (level-one) caches for instructions and data and independent instruction and
data memory management units (MMUs). Each MMU has a 128-entry, two-way
set-associative translation lookaside buffer (DTLB and ITLB) that saves recently used page
address translations. Block address translation is implemented with the four-entry
instruction and data block address translation (IBAT and DBAT) arrays, defined by the
PowerPC architecture. During block translation, effective addresses are compared
simultaneously with all four BAT entries, as described in Chapter 5, "Memory
Management." For information about the L1 caches, see Chapter 3, "L1 and L2 Cache
Operation."
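The two translation structures can be sketched in a few lines. This is a simplified model under stated assumptions, not the register-level behavior: it assumes 4-Kbyte pages, assumes the TLB set is selected by the low-order bits of the effective page number, and represents each BAT entry as a hypothetical (base, size, physical base) tuple with a power-of-two size. The hardware compares all four BAT entries at once; the loop here only models the result of that comparison.

```python
PAGE_SHIFT = 12  # assumption: 4-Kbyte pages


def tlb_set_index(ea, entries=128, ways=2):
    """Select a TLB set for an effective address.

    128 entries in two ways gives 64 sets. Using the low bits of the
    page number as the index is an assumption made for illustration.
    """
    sets = entries // ways
    return (ea >> PAGE_SHIFT) % sets


def bat_lookup(ea, bats):
    """Return the physical address if any BAT entry covers ea, else None.

    bats: list of hypothetical (base, size, phys_base) tuples, where size
    is a power of two and base is aligned to size.
    """
    for base, size, phys_base in bats:
        if (ea & ~(size - 1)) == base:
            # Keep the offset within the block, swap in the physical base.
            return phys_base | (ea & (size - 1))
    return None


# Hypothetical four-entry array with only two entries populated.
bats = [
    (0x0000_0000, 0x1000_0000, 0x0000_0000),  # 256-Mbyte block
    (0x8000_0000, 0x0002_0000, 0xF000_0000),  # 128-Kbyte block
]

print(tlb_set_index(0x0001_2345))
print(hex(bat_lookup(0x8001_2345, bats)))
print(bat_lookup(0x9000_0000, bats))
```

In this sketch an address that misses every BAT entry returns None; the page-level path involving the TLB is not modeled beyond the set index.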
The L2 cache is implemented with an on-chip, two-way, set-associative tag memory, and
with external, synchronous SRAMs for data storage. The external SRAMs are accessed
through a dedicated L2 cache port that supports a single bank of 0.5, 1, or 2 Mbyte of