Chapter 7. The AltiVec Technology Implementation
7-7
AltiVec Technology and the Programming Model
The operation of a VT data stream engine does not consume any dispatch or completion
resources. A VT is an asynchronous line-fetch or line-touch engine that can prefetch data
in units of 32-byte cache blocks by inserting touch requests into the normal load/store
pipeline.
After the dstx is queued in the VTQ, the VTQ begins to unroll the stream into 32-byte line
touches. As early as the second cycle after the LSU sends its request to the VTQ, the VTQ
could make its first line-fetch touch request to the data cache.
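The unrolling step can be pictured with a small C model. It is a sketch under stated assumptions, not the actual VTQ logic: the stream is described by a hypothetical `stream_desc` (start effective address, block size in bytes, block count, and signed stride between block starts) rather than by the real dstx operand format, and the hypothetical `unroll_stream` function lists the 32-byte-aligned line addresses such a stream would touch, in order, skipping a line that repeats the one just emitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u /* cache block (line) size used by the VT engines */

/* Hypothetical stream description; NOT the real dstx control-word format. */
typedef struct {
    uint32_t ea;          /* effective address of the first block */
    uint32_t block_bytes; /* bytes per block (must be > 0) */
    uint32_t count;       /* number of blocks in the stream */
    int32_t  stride;      /* signed distance between block start addresses */
} stream_desc;

/* Writes the 32-byte-aligned line addresses touched by the stream into
 * out[] (at most max entries), skipping a line equal to the one just
 * emitted. Returns the number of line touches generated. */
size_t unroll_stream(const stream_desc *s, uint32_t *out, size_t max)
{
    size_t n = 0;
    int have_last = 0;
    uint32_t last = 0;

    assert(s->block_bytes > 0);
    for (uint32_t b = 0; b < s->count; b++) {
        /* Start of block b; negative strides wrap modulo 2^32. */
        uint32_t start = s->ea + (uint32_t)((int64_t)b * s->stride);
        uint32_t end_line = (start + s->block_bytes - 1) & ~(LINE_BYTES - 1);

        /* Walk every line the block overlaps, first to last inclusive. */
        for (uint32_t line = start & ~(LINE_BYTES - 1);; line += LINE_BYTES) {
            if (!(have_last && line == last) && n < max) {
                out[n++] = line;
                last = line;
                have_last = 1;
            }
            if (line >= end_line)
                break;
        }
    }
    return n;
}
```

For example, three 48-byte blocks starting at 0x1010 with a 64-byte stride produce six line touches, 0x1000 through 0x10A0.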
Note that a data stream engine bases its accesses on effective addresses. This means that
each line fetch within a stream accesses the data MMU simultaneously with the L1 data
cache and performs a normal translation. There are no arbitrary address boundaries that
affect the progress of a given stream.
In addition, if a VTQ line touch accesses a page whose translation does not reside in the
data MMU, a table search operation is performed to load that page's PTE into the data TLB.
The TLB is non-blocking during a VTQ-initiated table search operation, meaning that normal
loads and stores can hit in the TLB (and in the data cache) during the table search.
7.1.2.4 Stream Engine Tags
The STRM field in the dstx instruction designates which of the four data stream engines
(VT0, VT1, VT2, or VT3) is used by a given instruction, as described in Table 7-4.
Bits 7 and 8 of the dstx opcode are reserved. If bit 7 is set, it is ignored. If bit 8 is set, the
VTQ does not queue up the stream and that dstx instruction is ignored.
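The field handling described above can be sketched in C. This assumes the standard PowerPC/AltiVec encoding of the dst-form instructions (primary opcode 31 in bits 0-5, T in bit 6, reserved bits 7 and 8, STRM in bits 9-10, rA in bits 11-15, rB in bits 16-20, and an extended opcode in bits 21-30, such as 342 for dst), with big-endian bit numbering in which bit 0 is the most significant bit. `decode_dst`, `encode_dst`, `IBM_BIT`, and `vt_name` are hypothetical helpers; only the treatment of bits 7 and 8 and the STRM-to-VT mapping come from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian bit n of a 32-bit word corresponds to C shift (31 - n). */
#define IBM_BIT(n) (31u - (n))

/* STRM value -> data stream engine, as listed in Table 7-4. */
static const char *const vt_name[4] = {"VT0", "VT1", "VT2", "VT3"};

typedef struct {
    unsigned strm;      /* 0..3, selects VT0..VT3 */
    unsigned transient; /* T bit (bit 6), under the assumed encoding */
    int queued;         /* 0 if the VTQ does not queue the stream */
} dst_decode;

static dst_decode decode_dst(uint32_t insn)
{
    dst_decode d;
    d.strm = (insn >> IBM_BIT(10)) & 0x3u;  /* bits 9-10 */
    d.transient = (insn >> IBM_BIT(6)) & 0x1u;
    /* Bit 7 is reserved and ignored if set, so it is not examined. */
    /* Bit 8 is reserved; if set, the stream is not queued. */
    d.queued = ((insn >> IBM_BIT(8)) & 0x1u) == 0;
    return d;
}

/* Hypothetical helper that assembles a dst-form word (reserved bits 0). */
static uint32_t encode_dst(unsigned t, unsigned strm, unsigned ra,
                           unsigned rb, unsigned xo)
{
    assert(t < 2 && strm < 4 && ra < 32 && rb < 32 && xo < 1024);
    return (31u << IBM_BIT(5))    /* primary opcode, bits 0-5 */
         | (t << IBM_BIT(6))      /* T, bit 6 */
         | (strm << IBM_BIT(10))  /* STRM, bits 9-10 */
         | (ra << IBM_BIT(15))    /* rA, bits 11-15 */
         | (rb << IBM_BIT(20))    /* rB, bits 16-20 */
         | (xo << IBM_BIT(30));   /* extended opcode, bits 21-30 */
}
```

Setting bit 7 in an otherwise valid word leaves the decode unchanged, while setting bit 8 clears `queued`, mirroring the two rules stated above.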
7.1.2.4.1 Speculative Execution and Pipeline Stalls for Data Stream Instructions
Like a load miss instruction or a dcbt/dcbtst instruction, a dstx instruction is executed
speculatively. If the target of a particular dstx line fetch is mapped G = 1 (guarded), any
reload for that line fetch is under the same constraints as a guarded load. If any of the four
data stream engines encounters a TLB miss, all four pause until the dstx access that caused
the TLB miss is retired from the completion queue or is the oldest instruction in the queue.
The dstx then initiates a table search and completes its current cache access.
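A simplified model of the TLB-miss rule follows. It is hypothetical: `vtq_state`, `may_issue_touch`, `may_start_table_search`, and `resolve_miss` are illustrative names, and the model assumes the other engines stay paused until the table search has completed, a detail the text above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VT 4 /* VT0-VT3 */

/* Hypothetical per-engine state: true while an engine's line touch is
 * waiting on a TLB miss. */
typedef struct {
    bool tlb_miss[NUM_VT];
} vtq_state;

static bool any_tlb_miss(const vtq_state *q)
{
    for (int i = 0; i < NUM_VT; i++)
        if (q->tlb_miss[i])
            return true;
    return false;
}

/* A TLB miss in any engine pauses all four engines. */
static bool may_issue_touch(const vtq_state *q, int vt)
{
    assert(vt >= 0 && vt < NUM_VT);
    return !any_tlb_miss(q);
}

/* The engine that missed may start its table search only once its dstx
 * is retired or is the oldest instruction in the completion queue. */
static bool may_start_table_search(const vtq_state *q, int vt,
                                   bool dst_retired, bool dst_is_oldest)
{
    assert(vt >= 0 && vt < NUM_VT);
    return q->tlb_miss[vt] && (dst_retired || dst_is_oldest);
}

/* Assumption: the pause ends once the table search has loaded the PTE. */
static void resolve_miss(vtq_state *q, int vt)
{
    assert(vt >= 0 && vt < NUM_VT);
    q->tlb_miss[vt] = false;
}
```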
If a dstx instruction to a given data stream is dispatched and the VTQ is processing a
previous dstx to the same data stream, the second dstx to that tag supersedes the first one,
Table 7-4. DST[STRM] Description

Value of STRM Field in dstx Instruction    Data Stream Engine (VT)
00                                         VT0
01                                         VT1
10                                         VT2
11                                         VT3