
Application Note
AMD Alchemy Solutions Au1100 Processor LCD Performance
Rev. 30274A April 2003
Accesses to static bus peripherals can have an unusually large transfer time, which directly translates
into a dramatic increase in system bus latency. System designers must carefully consider the timing of
all peripherals on the static bus and optimize the timings to consume the least amount of time
possible. The prime example is the PCMCIA interface, where card transfer times can vary from
150ns to 250ns depending upon the card inserted. In addition, the card can also assert PWAIT# to
extend the cycle time indefinitely.
5.2 Software Design Considerations
The LCD controller merely fetches pixel data from the framebuffer residing in SDRAM; it is the
responsibility of software executing on the Au1 core to perform all graphics operations. The graphics
driver for the Au1100 processor LCD controller can optimize framebuffer caching and mapping to
improve overall system performance.
5.2.1 Framebuffer Caching
Generally speaking, caching data improves overall performance. However, a framebuffer presents a
unique challenge in that it is a large, infrequently referenced data structure. For even a small display
panel with resolution 320x240 at 16bpp, the resulting framebuffer of 153,600 bytes easily exceeds the
16KB data cache of the Au1 core. As a direct result, caching the framebuffer displaces other useful,
non-framebuffer data (such as working variables, data-sets, stack, etc.) from the cache. Furthermore,
the cache is best utilized when the memory is referenced frequently; framebuffer pixels are typically
only written once by graphics operations and remain unchanged until a subsequent graphics operation
changes the pixel.
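The size comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of any Au1100 driver; the names fb_size_bytes, fb_cache_multiples, and AU1_DCACHE_BYTES are hypothetical and chosen only for illustration. It assumes a packed pixel format whose depth is a whole number of bytes.

```c
#include <assert.h>
#include <stdint.h>

/* 16KB data cache of the Au1 core, as stated in the text above. */
#define AU1_DCACHE_BYTES (16u * 1024u)

/* Bytes occupied by a packed framebuffer.
 * Assumption: bpp is a multiple of 8 (for example 8, 16, or 32). */
static uint32_t fb_size_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    assert(bpp % 8u == 0u);
    return width * height * (bpp / 8u);
}

/* Number of data-cache-sized chunks needed to hold the framebuffer,
 * rounded up. Any value above 1 means the framebuffer cannot fit. */
static uint32_t fb_cache_multiples(uint32_t fb_bytes)
{
    return (fb_bytes + AU1_DCACHE_BYTES - 1u) / AU1_DCACHE_BYTES;
}
```

For the 320x240 panel at 16bpp, fb_size_bytes(320, 240, 16) gives 153,600 bytes, which is more than nine times the size of the data cache.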
The net result is that it is undesirable to have the framebuffer occupy the entire cache since it reduces
overall cache hit rate and in turn reduces overall system performance. However, it is still desirable
to access the framebuffer as efficiently as possible. The Au1100
processor offers several options for improving framebuffer accesses.
If using the translation look-aside buffers (TLB) to access the framebuffer (that is, KSEG0 or
KSEG1 spaces are not used exclusively to access the framebuffer), then the framebuffer cache setting
in the TLB should be one of the following, in order of preference:
1. CCA=6 (cached into way 0), with the data cache way 0 locked
2. CCA=6 (cached into way 0), without the data cache way 0 locked
3. CCA=7 (non-cached, write buffer merging and gathering)
4. CCA=2 (non-cached, no write buffer merging and gathering)
5. CCA=3 (cached, uses entire data cache)
CCA (cache coherency attributes) is a field in the MIPS TLB. See the Alchemy Au1100 Processor
from AMD Data Book, “2.4 Virtual Memory”, for more information. CCA values are provided in
“Table 2. CCA Values” of the data book.
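One way a graphics driver might encode the preference order listed above is as a small table that is searched from most to least preferred. The following is a sketch under stated assumptions, not the actual Au1100 driver: the structure, the table, and the function fb_pick_cache_option are hypothetical names, and the caller is assumed to know which CCA values its platform permits and whether data cache way 0 can be locked. Only the CCA values and their order are taken from the list above; programming the TLB entry and locking the cache way are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry of the preference list: a CCA value plus whether
 * data cache way 0 is locked. (Hypothetical type for illustration.) */
struct fb_cache_option {
    unsigned cca;
    bool lock_way0;
    const char *description;
};

/* Options in order of preference, copied from the list in the text. */
static const struct fb_cache_option fb_cache_prefs[] = {
    { 6u, true,  "cached into way 0, data cache way 0 locked" },
    { 6u, false, "cached into way 0, data cache way 0 not locked" },
    { 7u, false, "non-cached, write buffer merging and gathering" },
    { 2u, false, "non-cached, no write buffer merging and gathering" },
    { 3u, false, "cached, uses entire data cache" },
};

/* Return the most preferred option that the caller allows, or NULL.
 * Bit n of allowed_cca_mask set means CCA=n may be used.
 * CCA is a 3-bit field, so only bits 0..7 of the mask are meaningful. */
static const struct fb_cache_option *
fb_pick_cache_option(unsigned allowed_cca_mask, bool can_lock_way0)
{
    size_t i;

    assert(allowed_cca_mask < 256u);
    for (i = 0; i < sizeof fb_cache_prefs / sizeof fb_cache_prefs[0]; i++) {
        const struct fb_cache_option *opt = &fb_cache_prefs[i];

        if (opt->lock_way0 && !can_lock_way0)
            continue;               /* way 0 locking is not available */
        if (allowed_cca_mask & (1u << opt->cca))
            return opt;             /* first match is the most preferred */
    }
    return NULL;
}
```

A caller that can use CCA=6 and CCA=7 but cannot lock way 0 would pass ((1u << 6) | (1u << 7), false) and receive the second entry, CCA=6 without way 0 locked.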