
Chapter 6: x87 Floating-Point Programming
AMD 64-Bit Technology
24593—Rev. 3.09—September 2003
whatever other value the tag bits indicated prior to the save).
To invalidate the contents of the x87 data registers after
FXSAVE, software must explicitly execute an FINIT
instruction. Also, FXSAVE (like FNSAVE) and FXRSTOR do
not check for pending unmasked x87 floating-point exceptions.
Software can execute an FWAIT instruction beforehand to check
for such exceptions.
The architecture supports two memory formats for FXSAVE
and FXRSTOR: a 512-byte 32-bit legacy format and a 512-byte
64-bit format that is used in 64-bit mode. The effective operand
size of the FXSAVE or FXRSTOR instruction selects between the
32-bit and 64-bit formats. For details, see “Saving
Media and x87 Processor State” in Volume 2.
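The following C fragment is a minimal sketch of the save sequence described above, not a definitive implementation. It assumes a GCC- or Clang-style compiler with inline assembly on an x86 target; on any other target the helper simply reports failure. The names fxsave_area, save_x87_state, read_x87_control_word, and saved_control_word are hypothetical and do not come from this manual. The sketch executes FWAIT before FXSAVE so that pending unmasked x87 exceptions are checked first, and it uses the default operand size rather than attempting to select the 64-bit format. It relies on two facts that are not stated in this section: the save area must be aligned on a 16-byte boundary, and the x87 control word occupies the first two bytes of the image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC- or Clang-style inline assembly on an x86 target. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define FXSAVE_SKETCH_SUPPORTED 1
#else
#define FXSAVE_SKETCH_SUPPORTED 0
#endif

/* Hypothetical container for the 512-byte save image.
   FXSAVE requires the area to be 16-byte aligned. */
struct fxsave_area {
    _Alignas(16) unsigned char bytes[512];
};

/* Checks for pending unmasked x87 exceptions (FWAIT), then saves the
   x87 and media state. Returns 0 on success, -1 if unsupported here. */
static int save_x87_state(struct fxsave_area *area)
{
    assert(((uintptr_t)area & 15u) == 0);
#if FXSAVE_SKETCH_SUPPORTED
    __asm__ volatile("fwait\n\tfxsave %0" : "=m"(*area));
    return 0;
#else
    return -1;
#endif
}

/* Reads the live x87 control word with FNSTCW (0 if unsupported). */
static uint16_t read_x87_control_word(void)
{
    uint16_t cw = 0;
#if FXSAVE_SKETCH_SUPPORTED
    __asm__ volatile("fnstcw %0" : "=m"(cw));
#endif
    return cw;
}

/* Extracts the control word from bytes 0-1 of the saved image. */
static uint16_t saved_control_word(const struct fxsave_area *area)
{
    uint16_t cw;
    memcpy(&cw, area->bytes, sizeof cw);
    return cw;
}
```

Because FXSAVE leaves the x87 state intact, the control word in the saved image should match the live control word read immediately afterward. Software that needs the registers cleared must still execute FINIT separately.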
6.11 Performance Considerations
In addition to typical code optimization techniques, such as
those affecting loops and the inlining of function calls, the
following considerations may help improve the performance of
application programs written with x87 floating-point
instructions.
These are implementation-independent performance
considerations. Other considerations depend on the hardware
implementation. For information about such implementation-
dependent considerations and for more information about
application performance in general, see the data sheets and the
software-optimization guides relating to particular hardware
implementations.
6.11.1 Replace x87 Code with 128-Bit Media Code
Code written with 128-bit media floating-point instructions can
operate in parallel on four times as many single-precision
floating-point operands as can x87 floating-point code. This
achieves potentially four times the computational work of x87
instructions that use single-precision operands. Also, the higher
density of 128-bit media floating-point operands may make it
possible to remove local temporary variables that would
otherwise be needed in x87 floating-point code. 128-bit media
code is easier to write than x87 floating-point code because the
XMM register file is flat rather than stack-oriented and, in
64-bit mode, there are twice as many XMM registers as x87
registers.
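As an illustration of the four-operand parallelism described above, the following C fragment is a minimal sketch that adds two single-precision arrays four elements at a time using compiler intrinsics from xmmintrin.h. The function name add_f32 and the macro SSE_SKETCH_SUPPORTED are hypothetical. The sketch assumes a compiler that provides the SSE intrinsics; where they are unavailable, it falls back to the scalar loop alone.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the compiler exposes SSE intrinsics via xmmintrin.h. */
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SSE_SKETCH_SUPPORTED 1
#else
#define SSE_SKETCH_SUPPORTED 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
static void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && out != NULL);
#if SSE_SKETCH_SUPPORTED
    /* Each iteration performs four single-precision additions with
       one packed add on a 128-bit XMM register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the remaining elements (or all of them when
       the intrinsics are unavailable). */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load and store forms are used so that the sketch makes no assumption about array alignment. Whether the packed loop actually approaches four times the throughput of equivalent x87 code depends on the hardware implementation.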
6.11.2 Use FCOMI-FCMOVx Branching
Depending on the hardware implementation of the
architecture, the combination of FCOMI and FCMOVcc is often