MPC7400 RISC Microprocessor User's Manual
AltiVec Technology and the Programming Model
7.1.2.5.5 Differences Between dst/dstt and dstst/dststt Instructions
The only difference between touch-for-load (dst/dstt) and touch-for-store (dstst/dststt)
streams is that touch-for-load streams are subdivided into line fetches that are treated
identically to individual dcbt fetches, while touch-for-store streams are subdivided into
line fetches that are treated identically to individual dcbtst fetches.
Note that if a touch-for-store stream instruction is mapped to a write-through page, that
stream is terminated. The use of the touch-for-store streams is not recommended when
store-miss merging is enabled, which is the default case.
Although the MPC7400 implements touch-for-store stream instructions, their use is
discouraged. If dstst is used to prefetch a 32-byte cache block that would eventually be
fully consumed by 32 bytes worth of stores (that is, two back-to-back stvx instructions),
the inclusion of touch-for-store can reduce performance for systems with limited bandwidth.
This is because a touch-for-store must perform both a 32-byte coherency operation on the
address bus (two or more bus cycles) and 32 bytes of data transfer (four or more 64-bit bus
cycles). On the other hand, cacheable write-back stores that merge to 32 bytes require only
a 32-byte coherency operation (two or more bus cycles) because of the store-miss-merging
mechanism. Because these store misses are already fully pipelined on the MPC7400,
placing a touch-for-store before a series of adjacent stores that merge naturally anyway can
degrade performance.
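The bandwidth argument above can be put into numbers with a small model. This is only a sketch: it uses the minimum cycle counts quoted in the text ("two or more" address-bus cycles, "four or more" 64-bit data beats), and the names BusCost, touch_for_store_cost, and merged_store_cost are hypothetical, chosen for illustration.

```python
from typing import NamedTuple

BLOCK_BYTES = 32        # one cache block
DATA_BUS_BYTES = 8      # 64-bit data bus
MIN_ADDRESS_CYCLES = 2  # minimum for one 32-byte coherency operation


class BusCost(NamedTuple):
    address_cycles: int  # address-bus cycles (minimum)
    data_cycles: int     # data-bus beats (minimum)


def data_beats(num_bytes: int) -> int:
    """Minimum number of 64-bit beats needed to move num_bytes."""
    return -(-num_bytes // DATA_BUS_BYTES)  # ceiling division


def touch_for_store_cost() -> BusCost:
    """dstst line fetch: coherency operation plus a full block transfer."""
    return BusCost(MIN_ADDRESS_CYCLES, data_beats(BLOCK_BYTES))


def merged_store_cost() -> BusCost:
    """Two stvx stores merged to 32 bytes: coherency operation only."""
    return BusCost(MIN_ADDRESS_CYCLES, 0)


touch = touch_for_store_cost()
merged = merged_store_cost()
print("touch-for-store:", touch)
print("merged stores:  ", merged)
print("extra data beats:", touch.data_cycles - merged.data_cycles)
```

Under these assumptions the touch-for-store adds at least four data-bus beats per block that the merged stores never need, which is the cost that matters on a bandwidth-limited system.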
7.1.2.5.6 dss and dssall Instructions
The Data Stream Stop instruction (dss) is never executed speculatively. Instead, dss
instructions flow into a four-entry dss queue (DSSQ) in which one entry is dedicated to
each possible tag. If another dss is dispatched with a tag that matches a non-completed but
valid DSSQ entry, the new dss remains in a hold queue and waits for the previous dss in
the DSSQ to be completed.
If a subsequent dstx is queued in the VTQ, it cancels an older dss entry in the DSSQ (same
tag).
When a given DSSQ entry completes, the valid bit for the VTQ entry corresponding to that
tag is immediately cleared.
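The queue interactions described above can be sketched as a small state model. It is illustrative only, not a description of the hardware implementation: the class and method names are hypothetical, the behavior of a held dss once the DSSQ entry frees up is an assumption, and the model does not cover how a later dstx affects a dss still waiting in the hold queue, since the text does not say.

```python
from collections import deque

NUM_TAGS = 4  # one DSSQ entry and one VTQ entry per stream tag


class StreamQueues:
    """Simplified model of VTQ valid bits, the DSSQ, and the dss hold queue."""

    def __init__(self):
        self.vtq_valid = [False] * NUM_TAGS   # stream active for this tag
        self.dssq_valid = [False] * NUM_TAGS  # dss pending for this tag
        self.hold = deque()                   # dss tags waiting on a busy entry

    def queue_dst(self, tag):
        """A dstx enters the VTQ and cancels an older dss with the same tag."""
        self.dssq_valid[tag] = False
        self.vtq_valid[tag] = True

    def dispatch_dss(self, tag):
        """A dss takes its tag's DSSQ entry, or waits if that entry is busy."""
        if self.dssq_valid[tag]:
            self.hold.append(tag)
        else:
            self.dssq_valid[tag] = True

    def complete_dss(self, tag):
        """Completing a DSSQ entry clears the matching VTQ valid bit."""
        if not self.dssq_valid[tag]:
            return
        self.dssq_valid[tag] = False
        self.vtq_valid[tag] = False
        # Assumption: a held dss with this tag now moves into the freed entry.
        if tag in self.hold:
            self.hold.remove(tag)
            self.dssq_valid[tag] = True


q = StreamQueues()
q.queue_dst(2)      # start a stream on tag 2
q.dispatch_dss(2)   # occupies the DSSQ entry for tag 2
q.dispatch_dss(2)   # same tag, entry busy: goes to the hold queue
q.complete_dss(2)   # stream 2 stopped; the held dss moves into the DSSQ
print(q.vtq_valid, q.dssq_valid, list(q.hold))
```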
If a dssall instruction is executed, the DSSQ queues up all four queue entries in order to
terminate all four VT streams when the dssall instruction is the oldest. The dssall opcode
differs from dss in that bit 6 (the A field) is set and bits 7-10 are ignored.
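The opcode difference can be illustrated with a short encoder. This sketch uses PowerPC big-endian bit numbering (bit 0 is the most significant bit of the 32-bit word). Only the A bit in bit 6 and the ignored bits 7-10 come from this section; the remaining field values (primary opcode 31, extended opcode 822, STRM in bits 9-10) are assumptions taken from the general AltiVec instruction format and should be checked against the instruction set reference. The function names are hypothetical.

```python
PRIMARY_OPCODE = 31     # bits 0-5 (assumed from the AltiVec instruction format)
EXTENDED_OPCODE = 822   # bits 21-30 (assumed from the AltiVec instruction format)
A_BIT = 1 << (31 - 6)   # bit 6, the A field
STRM_SHIFT = 31 - 10    # STRM occupies bits 9-10 (assumed)
IGNORED_BITS = 0xF << STRM_SHIFT  # bits 7-10, ignored when A is set

BASE = (PRIMARY_OPCODE << 26) | (EXTENDED_OPCODE << 1)


def encode_dss(strm: int) -> int:
    """Encode dss for stream tag 0-3 (A bit clear)."""
    if not 0 <= strm <= 3:
        raise ValueError("stream tag must be 0-3")
    return BASE | (strm << STRM_SHIFT)


def encode_dssall() -> int:
    """Encode dssall: same opcode as dss with the A bit set."""
    return BASE | A_BIT


def is_dssall(word: int) -> bool:
    """True if word decodes as dssall; bits 7-10 do not affect the result."""
    return (word & ~IGNORED_BITS & 0xFFFFFFFF) == encode_dssall()


print(hex(encode_dss(1)))
print(hex(encode_dssall()))
print(is_dssall(encode_dssall() | (3 << STRM_SHIFT)))
```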
Note that line fetches in progress for a given dstx stream are not canceled by the dss
instruction. Only subsequent line fetches are prevented. To ensure that all line fetches from
a dstx are completed, a sync instruction must be issued after the dss instruction.
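The distinction between fetches already in progress and fetches not yet started can be sketched with a toy model. It is purely illustrative: the Stream class, its methods, and the line addresses are hypothetical and do not represent the MPC7400 bus logic.

```python
class Stream:
    """Toy model of one dstx stream's line fetches."""

    def __init__(self, line_addresses):
        self.pending = list(line_addresses)  # line fetches not yet started
        self.in_flight = []                  # line fetches already issued
        self.fetched = []                    # line fetches that have completed

    def issue(self):
        """Start the next pending line fetch, if any."""
        if self.pending:
            self.in_flight.append(self.pending.pop(0))

    def dss(self):
        """Stop the stream: no new fetches start, in-flight ones continue."""
        self.pending.clear()

    def sync(self):
        """Wait until every in-flight line fetch has completed."""
        self.fetched.extend(self.in_flight)
        self.in_flight.clear()


s = Stream([0x000, 0x020, 0x040, 0x060])
s.issue()
s.issue()
s.dss()    # the two issued fetches are still outstanding
s.sync()   # only now are they guaranteed complete
print(s.fetched, s.pending, s.in_flight)
```

In this model, dss alone leaves two fetches outstanding; only the following sync guarantees they have completed.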