Chapter 1
3DNow! Technology
21928G/0—March 2000
3DNow! Technology Manual
Execution Resources on AMD-K6 Processors
The register operations of all 3DNow! floating-point
instructions are executed by either the register X unit or the
register Y unit. One operation can be issued to each register
unit each clock cycle, for a maximum issue and execution rate
of two 3DNow! operations per cycle. All 3DNow! operations
have an execution latency of two clock cycles and are fully
pipelined.
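
The issue rate and latency figures above can be turned into a rough cycle estimate. The sketch below is an idealized model, not a description of the actual hardware scheduler: it assumes every operation is independent, two operations issue every cycle, and nothing stalls. The function name ideal_cycles and the constant names are hypothetical, chosen for illustration.

```python
import math

LATENCY_CYCLES = 2  # execution latency stated in the text
ISSUE_WIDTH = 2     # one operation per register unit (X and Y) per cycle


def ideal_cycles(num_ops: int) -> int:
    """Cycles until the last result is ready for num_ops independent operations.

    Assumption: operations issue two per cycle with no stalls of any kind.
    """
    if num_ops <= 0:
        return 0
    issue_cycles = math.ceil(num_ops / ISSUE_WIDTH)
    # The last group issues in cycle (issue_cycles - 1), counting from 0,
    # and its results are ready LATENCY_CYCLES later.
    return issue_cycles - 1 + LATENCY_CYCLES
```

Under these assumptions, eight independent operations finish in five cycles: four cycles of issue, plus one extra cycle for the last pair to drain the two-stage pipeline. Real code rarely reaches this bound because of dependencies and resource contention.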
Even though 3DNow! execution resources are not duplicated in
both register units (for example, there are not two pairs of
3DNow! multipliers, just one shared pair of multipliers), there
are no instruction-decode or operation-issue pairing
restrictions. When, for example, a 3DNow! multiply operation
starts execution in a register unit, that unit grabs and uses the
one shared pair of 3DNow! multipliers. Only when actual
contention occurs between two 3DNow! operations starting
execution at the same time is one of the operations held up for
one cycle in its first execution pipe stage while the other
proceeds. The delay is never more than one cycle.
For code optimization purposes, 3DNow! operations are
grouped into two categories. These categories are based on
execution resources and are important when creating properly
scheduled code. As long as two 3DNow! operations that start
execution simultaneously do not fall into the same category,
both operations will start execution without delay.
The first category of instructions contains the operations for the
following 3DNow! instructions: PFADD, PFSUB, PFSUBR,
PFACC, PFCMPx, PFMIN, PFMAX, PI2FD, PF2ID, PFRCP, and
PFRSQRT.
The second category contains the operations for the following
3DNow! instructions: PFMUL, PFRCPIT1, PFRSQIT1, and
PFRCPIT2.
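
The two categories above can be captured in a small lookup table. The following is a minimal sketch for checking a pair of instructions by hand or in a script; the helper names category and starts_without_delay are hypothetical, and PFCMPx is expanded here into the three compare instructions PFCMPEQ, PFCMPGE, and PFCMPGT.

```python
# First category, as listed in the text (PFCMPx expanded).
CATEGORY_1 = {
    "PFADD", "PFSUB", "PFSUBR", "PFACC",
    "PFCMPEQ", "PFCMPGE", "PFCMPGT",
    "PFMIN", "PFMAX", "PI2FD", "PF2ID", "PFRCP", "PFRSQRT",
}

# Second category, as listed in the text.
CATEGORY_2 = {"PFMUL", "PFRCPIT1", "PFRSQIT1", "PFRCPIT2"}


def category(mnemonic: str) -> int:
    """Return 1 or 2 for a 3DNow! floating-point mnemonic."""
    name = mnemonic.upper()
    if name in CATEGORY_1:
        return 1
    if name in CATEGORY_2:
        return 2
    raise ValueError(f"not a categorized 3DNow! instruction: {mnemonic}")


def starts_without_delay(op_a: str, op_b: str) -> bool:
    """True if two operations starting in the same cycle do not contend."""
    return category(op_a) != category(op_b)
```

For example, starts_without_delay("PFADD", "PFMUL") is True, while starts_without_delay("PFMUL", "PFRCPIT1") is False because both operations fall into the second category.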
Note: 3DNow! add and multiply operations, among other
combinations, can execute simultaneously.
In high-performance 3DNow! code, 3DNow! instructions in the
same category are normally scheduled apart from each other to
avoid delays due to execution resource contention, while also
taking dependencies and execution latencies into account.
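
One way to sanity-check a hand-written schedule is to count how many cycles start two operations from the same category. The sketch below is a deliberately simplified model and an assumption on our part: it takes the schedule as a list of per-cycle groups of category numbers (1 or 2), charges one delay per same-category pair, and ignores any knock-on effect a delayed operation may have on the following cycle. The name contention_delays is hypothetical.

```python
def contention_delays(schedule):
    """Count one-cycle delays in a schedule.

    schedule: list of tuples, one per cycle, holding the category (1 or 2)
    of each operation that starts execution in that cycle (at most two).
    """
    delays = 0
    for group in schedule:
        if len(group) > 2:
            raise ValueError("at most two operations can start per cycle")
        # Two operations of the same category need the same shared
        # resource, so one of them is held up for one cycle.
        if len(group) == 2 and group[0] == group[1]:
            delays += 1
    return delays


# Alternating categories: no contention.
well_scheduled = [(1, 2), (2, 1), (1,)]
# Same-category pairs in the first two cycles: two delays.
poorly_scheduled = [(1, 1), (2, 2), (1,)]
```

Here contention_delays(well_scheduled) returns 0 and contention_delays(poorly_scheduled) returns 2. A full scheduler would also have to track dependencies and the two-cycle execution latency, which this check leaves out.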