Chapter 6. Instruction Timing
Execution Unit Timings
If a memory access is designated as transient, that cache block is marked not to be cast
out to the L2 unless it has been modified in the L1 data cache. If it is modified in the L1,
the block is not allocated in the L2 cache when it is victimized from the L1 data cache.
Instead, the block is written directly to main memory, bypassing the L2 cache.
The following instructions are interpreted to be transient:
• dstt and dststt (transient forms of the two data stream touch instructions)
• lvxl and stvxl
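The transient handling described above can be sketched as a small C model. This is only an illustration, not the MPC7400's actual logic: the names victim_dest_t, l1_victim_destination, and is_transient_mnemonic are hypothetical, and the VICTIM_TO_L2 path for non-transient blocks is an assumption, since this section describes only the transient case.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Where a block goes when it is victimized from the L1 data cache. */
typedef enum {
    VICTIM_NOT_CAST_OUT,   /* unmodified transient block: not cast out to the L2 */
    VICTIM_TO_MAIN_MEMORY, /* modified transient block: written to memory, bypassing the L2 */
    VICTIM_TO_L2           /* ASSUMPTION: non-transient path, not described in this section */
} victim_dest_t;

/* True if the mnemonic is one of the four instructions listed as transient. */
bool is_transient_mnemonic(const char *mnemonic)
{
    static const char *const transient[] = { "dstt", "dststt", "lvxl", "stvxl" };

    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof transient / sizeof transient[0]; i++) {
        if (strcmp(mnemonic, transient[i]) == 0)
            return true;
    }
    return false;
}

/* Destination of an L1 victim, given its transient mark and modified state. */
victim_dest_t l1_victim_destination(bool transient, bool modified)
{
    if (!transient)
        return VICTIM_TO_L2; /* assumed default behavior */
    return modified ? VICTIM_TO_MAIN_MEMORY : VICTIM_NOT_CAST_OUT;
}
```

Under this model, a block touched by lvxl and later modified would map to VICTIM_TO_MAIN_MEMORY, while the same block left unmodified would map to VICTIM_NOT_CAST_OUT. In neither case is the block allocated in the L2.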
6.4.8 AltiVec Instructions
The MPC7400 implements all instructions in the AltiVec specification. The AltiVec
instruction set has no optional instructions; however, a few instructions associated with the
load/store model are defined to allow significant differences between implementations. The
following sections describe the MPC7400's implementation of these options.
6.4.8.1 AltiVec Permute Unit (VPU) Execution Timing
All AltiVec permute instructions are executed in a single cycle.
6.4.8.2 AltiVec Arithmetic Logical Unit (VALU) Execution Timing
The AltiVec arithmetic logical unit (VALU) contains the following three independent
execution units for vector computations:
• Vector simple integer unit (VSIU)
• Vector complex integer unit (VCIU)
• Vector floating-point unit (VFPU)
Execution timing for these units is described in the following sections.
6.4.8.2.1 Vector Simple Integer Unit (VSIU) Execution Timing
Except for mtvscr and mfvscr, the VSIU executes all AltiVec simple integer instructions and
all AltiVec floating-point compare, minimum, and maximum instructions, all of which have
single-cycle latency.
6.4.8.2.2 Vector Complex Integer Unit (VCIU) Execution Timing
The VCIU executes all AltiVec complex integer instructions, which have a three-cycle
latency.
6.4.8.2.3 Vector Floating-Point Unit (VFPU) Execution Timing
In non-Java mode, all AltiVec floating-point instructions (except for the floating-point
compare, minimum, and maximum instructions, which are executed in the VSIU) have a
four-cycle latency.
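The latencies given in the sections above can be collected into a small lookup table for rough cycle estimates. The following C sketch is illustrative only: the type and function names are hypothetical, the VFPU figure applies to non-Java mode, and the chain estimate assumes that each instruction depends on the result of the previous one and that no dispatch, resource, or memory stalls occur.

```c
#include <assert.h>
#include <stddef.h>

/* AltiVec execution units named in this section. */
typedef enum { UNIT_VPU, UNIT_VSIU, UNIT_VCIU, UNIT_VFPU, UNIT_COUNT } altivec_unit_t;

/* Latencies in cycles as stated in the text; the VFPU value is for non-Java mode. */
static const unsigned unit_latency[UNIT_COUNT] = {
    [UNIT_VPU]  = 1, /* permute instructions: single cycle */
    [UNIT_VSIU] = 1, /* simple integer, FP compare/min/max */
    [UNIT_VCIU] = 3, /* complex integer */
    [UNIT_VFPU] = 4  /* remaining floating-point instructions */
};

/* Latency in cycles of one instruction executed in the given unit. */
unsigned altivec_latency(altivec_unit_t unit)
{
    assert((unsigned)unit < UNIT_COUNT);
    return unit_latency[unit];
}

/* ASSUMPTION: a fully dependent chain with no stalls, so latencies simply add. */
unsigned dependent_chain_latency(const altivec_unit_t *units, size_t n)
{
    unsigned total = 0;

    assert(units != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += altivec_latency(units[i]);
    return total;
}
```

For instance, a dependent chain of one VSIU, one VCIU, and one VFPU instruction comes to 1 + 3 + 4 = 8 cycles under these assumptions. Independent instructions can overlap, so this figure is a latency estimate for the chain, not a throughput estimate.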