
Chapter 3: General-Purpose Programming
AMD 64-Bit Technology
24593—Rev. 3.09—September 2003
used by the program, or won’t be used again for a long time.
Applications can use the CLFLUSH instruction to remove stale
data from the cache.
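As a rough illustration of the point above, the following C sketch flushes every cache line that overlaps a buffer. It assumes an x86-64 compiler that provides the SSE2 intrinsics `_mm_clflush` and `_mm_mfence` in `<emmintrin.h>`. The function name `flush_buffer` and the 64-byte `CACHE_LINE_SIZE` are assumptions made for this example; real code would query the line size through CPUID.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2 intrinsics) */

/* Assumed cache-line size for this sketch; not taken from the manual. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical helper: flush every cache line that overlaps
 * [buf, buf + len) and return the number of lines flushed.
 */
size_t flush_buffer(const void *buf, size_t len)
{
    if (len == 0) {
        return 0;
    }

    /* Round the start address down to a cache-line boundary. */
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE_SIZE) {
        _mm_clflush((const void *)p);   /* emits CLFLUSH for this line */
        lines++;
    }

    /* Fence so the flushes are ordered before later memory accesses. */
    _mm_mfence();
    return lines;
}
```

A caller might invoke `flush_buffer(buffer, size)` once it has finished with a large buffer that it does not expect to touch again soon.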
3.9.6 Cache-Control Instructions
General control and management of the caches is performed by
system software and not application software. System software
uses special registers to assign memory types to physical-address
ranges, and page-attribute tables are used to assign memory
types to virtual address ranges. Memory types define the
cacheability characteristics of memory regions and how
coherency is maintained with main memory. See “Memory
System” in Volume 2 for additional information on memory
typing.
Instructions are available that allow application software to
control the cacheability of data it uses on a more limited basis.
These instructions can be used to boost an application’s
performance by prefetching data into the cache, and by
avoiding cache pollution. Run-time analysis tools and compilers
may be able to suggest the use of cache-control instructions for
critical sections of application code.
Cache Prefetching. Applications can prefetch entire cache lines
into the caching hierarchy using one of the prefetch
instructions. The prefetch should be performed in advance, so
that the data is available in the cache when needed. Load
instructions can mimic the prefetch function, but they do not
offer the same performance advantage: a load instruction may
cause a subsequent instruction to stall until the load completes,
whereas a prefetch instruction never causes such a stall. A load
instruction also ties up a register unnecessarily; a prefetch
instruction does not.
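The idea of prefetching ahead of use can be sketched in C as follows. This is a minimal example, assuming GCC or Clang, whose `__builtin_prefetch` hint is typically lowered to a prefetch instruction on x86-64 targets. The function name `sum_with_prefetch` and the `PREFETCH_DISTANCE` value are hypothetical; a suitable distance depends on the processor and the loop body, and should be found by measurement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/*
 * Sum an array while hinting that elements a short distance ahead
 * will be read soon. The prefetch needs no destination register and
 * does not change the result; it only affects cache contents.
 */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, rw = 0 (read), locality = 3 (keep close). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += data[i];
    }
    return sum;
}
```

The bounds check keeps the address computation within the array, which is what the C language requires; it is not there to prevent a fault from the prefetch itself.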
The instructions available in the AMD64 architecture for cache-
line prefetching include one SSE instruction and two 3DNow!
instructions:
PREFETCHlevel—(an SSE instruction) Prefetches read/write
data into a specific level of the cache hierarchy. If the
requested data is already in the desired cache level or closer
to the processor (lower cache-hierarchy level), the data is not
prefetched. If the operand specifies an invalid memory
address, no exception occurs, and the instruction has no
effect. Attempts to prefetch data from non-cacheable
memory, such as video frame buffers, or data from write-
combining memory, are also ignored. The exact actions