[Figure: multiply-accumulate data flow; labels: vA, vB, vC, Prod, +, vD]
What Is AltiVec Technology?
Freescale’s AltiVec technology is a powerful
tool for software developers who want to add
efficiency and speed to their applications.
Starting with the e600 PowerPC core (also
known as the G4), the architecture includes a
128-bit vector execution unit. This
engine operates concurrently with the
existing integer and floating-point units and
enables highly parallel operations: up to
16 operations in a single clock cycle. By
leveraging AltiVec technology, developers
can see dramatic acceleration in
performance-driven, high-bandwidth
computing and communications applications.
AltiVec technology uses a separate register
file containing 32 entries, each of which is
128 bits wide. Each value within an AltiVec
register is a vector that is made up of
elements. AltiVec instructions perform
simultaneous operations on all elements
within an AltiVec vector register. This is often
referred to as SIMD (Single Instruction,
Multiple Data) parallel processing. Depending
on element size, a vector holds 4 (32-bit),
8 (16-bit) or 16 (8-bit) elements. There is
virtually no performance penalty for mixing
integer, floating-point and AltiVec operations.
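The relationship between element size and element count can be sketched in portable C. This is only a scalar model of the idea, not AltiVec code: the type `vec128_model` and the helpers `lanes_for` and `add_u8` are hypothetical names introduced for illustration, and the loop stands in for what the vector unit would do in a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for one 128-bit vector register.
 * Assumption: this is an illustrative model, NOT an AltiVec type. */
typedef union {
    uint8_t  u8[16];  /* 16 elements of  8 bits */
    uint16_t u16[8];  /*  8 elements of 16 bits */
    uint32_t u32[4];  /*  4 elements of 32 bits */
} vec128_model;

/* Number of elements in a 128-bit vector for a given element size
 * in bytes (1, 2 or 4). */
size_t lanes_for(size_t elem_bytes)
{
    assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4);
    return sizeof(vec128_model) / elem_bytes;
}

/* Element-wise add of sixteen 8-bit elements with wrap-around
 * (modulo 256). The loop models one SIMD instruction performing
 * 16 additions at once. */
vec128_model add_u8(vec128_model a, vec128_model b)
{
    vec128_model d;
    for (size_t i = 0; i < 16; i++)
        d.u8[i] = (uint8_t)(a.u8[i] + b.u8[i]);
    return d;
}
```

Calling `lanes_for` with 1, 2 and 4 gives the 16, 8 and 4 element counts mentioned above; on real hardware the per-element loop in `add_u8` would collapse into one vector add.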
AltiVec technology is an extension to the
PowerPC instruction set, adding 162 new
“vector” instructions. These include useful
operations such as sum-across (sums all the
elements in a vector) and multiply-sum
(multiplies elements of two vectors and adds
the products to elements of a third). In
addition, data manipulation instructions have
been augmented to include operations such as
permute (fills a register with bytes selected
from two other registers), merge (interleaves
elements of two vectors into one), and “splat”
(duplicates one element across all elements in
a vector). The AltiVec instructions have one,
two or three source operands and are
non-destructive: the result is written to a
separate destination register.
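The data manipulation operations named above can be illustrated with a scalar model in portable C. The sketch below is an assumption-laden illustration rather than AltiVec code: the types `bytes16`, `halves8` and `words4` and the `model_*` functions are hypothetical names, each loop stands in for a single vector instruction, and the multiply-sum variant shown is one plausible form (unsigned 16-bit elements, wrap-around 32-bit sums).

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins for a 128-bit register viewed at different
 * element sizes. Assumption: illustrative models, not AltiVec types. */
typedef struct { uint8_t  b[16]; } bytes16;
typedef struct { uint16_t h[8];  } halves8;
typedef struct { uint32_t w[4];  } words4;

/* Splat: copy element `idx` of src into every element of the result. */
bytes16 model_splat(bytes16 src, unsigned idx)
{
    assert(idx < 16);
    bytes16 d;
    for (int i = 0; i < 16; i++)
        d.b[i] = src.b[idx];
    return d;
}

/* Merge (high half): interleave the first 8 bytes of a and b. */
bytes16 model_merge_high(bytes16 a, bytes16 b)
{
    bytes16 d;
    for (int i = 0; i < 8; i++) {
        d.b[2 * i]     = a.b[i];
        d.b[2 * i + 1] = b.b[i];
    }
    return d;
}

/* Permute: the low 5 bits of each control byte select one of the
 * 32 bytes formed by a followed by b. */
bytes16 model_permute(bytes16 a, bytes16 b, bytes16 control)
{
    bytes16 d;
    for (int i = 0; i < 16; i++) {
        unsigned sel = control.b[i] & 0x1Fu;
        d.b[i] = (sel < 16) ? a.b[sel] : b.b[sel - 16];
    }
    return d;
}

/* Multiply-sum: for each 32-bit word, multiply two pairs of 16-bit
 * elements and add both products to the accumulator word
 * (wrap-around, modulo 2^32). */
words4 model_multiply_sum(halves8 a, halves8 b, words4 acc)
{
    words4 d;
    for (int i = 0; i < 4; i++) {
        uint32_t p0 = (uint32_t)a.h[2 * i]     * b.h[2 * i];
        uint32_t p1 = (uint32_t)a.h[2 * i + 1] * b.h[2 * i + 1];
        d.w[i] = acc.w[i] + p0 + p1;
    }
    return d;
}
```

Passing every operand by value and returning a fresh result mirrors the non-destructive, separate-destination form described above: none of the source operands is modified.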
Other advantages provided by the AltiVec
instructions include:
> Fully pipelined with single-cycle throughput
    Simple ops: 1-cycle latency
    Compound ops: 3–4 cycle latency
    No restriction on issue with scalar instructions
> Enhanced cache/memory interface
    Software hints for data reuse probability
    Prefetch support (stride-N access)
> Simplified load/store architecture
    Simple byte, halfword, word and quadword loads and stores
    Virtually no unaligned accesses (software managed via permute instruction)
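Software-managed alignment, the last item in the list above, can be sketched as two aligned quadword loads combined by a permute. The following portable C is a scalar model under stated assumptions, not AltiVec code: `qword` and the `qw_*` functions are hypothetical names, offsets are measured from a buffer whose start is treated as 16-byte aligned, and the buffer must extend at least 32 bytes past the aligned address being read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for a 128-bit register (not an AltiVec type). */
typedef struct { uint8_t b[16]; } qword;

/* Aligned quadword load: the low 4 bits of the offset are ignored. */
qword qw_load_aligned(const uint8_t *base, size_t offset)
{
    qword d;
    memcpy(d.b, base + (offset & ~(size_t)15), 16);
    return d;
}

/* Control vector {sh, sh+1, ..., sh+15}, where sh = offset mod 16. */
qword qw_shift_control(size_t offset)
{
    qword d;
    for (int i = 0; i < 16; i++)
        d.b[i] = (uint8_t)((offset & 15) + (size_t)i);
    return d;
}

/* Permute: the low 5 bits of each control byte select one of the
 * 32 bytes formed by a followed by b. */
qword qw_permute(qword a, qword b, qword control)
{
    qword d;
    for (int i = 0; i < 16; i++) {
        unsigned sel = control.b[i] & 0x1Fu;
        d.b[i] = (sel < 16) ? a.b[sel] : b.b[sel - 16];
    }
    return d;
}

/* Read 16 bytes starting at an arbitrary offset using only aligned
 * loads plus one permute. `size` is the buffer length in bytes. */
qword qw_load_unaligned(const uint8_t *base, size_t offset, size_t size)
{
    /* Both aligned quadwords must lie inside the buffer. */
    assert((offset & ~(size_t)15) + 32 <= size);
    qword lo = qw_load_aligned(base, offset);
    qword hi = qw_load_aligned(base, offset + 16);
    return qw_permute(lo, hi, qw_shift_control(offset));
}
```

For a 48-byte buffer `mem`, `qw_load_unaligned(mem, 5, sizeof mem)` yields bytes 5 through 20 of the buffer. The always-load-two-quadwords approach keeps the sketch short; a production routine would need to avoid reading past the end of the data when the address is already aligned.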
ALTIVEC EXECUTION OF MULTIPLY-ACCUMULATE
[Figure labels: Instruction Stream; Dispatch; IU with GPRs (32 bits); FPU with FPRs (64 bits); Vector Unit with Vector Register File (128 bits); Cache/Memory]
ALTIVEC VECTOR UNIT BLOCK DIAGRAM