
Chapter 8. System Considerations
This approach requires no state in the L2 (or higher level cache) and relies upon simply
gathering the various reservation indications at any level and passing them down as a
unified signal to lower levels to allow reservation influencing operations to propagate back
up the tree where they can be selected by a branch that is interested in such operations.
While the hardware requirements are simple in this approach, performance is affected in
the following ways:
• The available system bus bandwidth is reduced while operations are retried pending
  a response from the top of the tree.
• Intermediate buses are tied up so other processors cannot have higher level misses
  serviced.
• Many read operations are cached as shared instead of exclusive, which generates
  unnecessary bus traffic later when stores are performed to those addresses. For a
  large multiprocessor system, this could cause significant loss in total bandwidth.
Implementation of this scheme only requires timely assertion of the RSRV signal by a
processor. If RSRV were asserted by the end of the cycle after AACK assertion for
operations where a read-atomic operation is required, or the next three-state after setting
the reservation for a cache hit, then there is adequate time for the L2 controller to prepare
for future system bus operations.
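
The stateless scheme described in this section can be pictured with a small behavioral model. The sketch below is illustrative only and is not taken from any hardware description: the function names unified_rsrv and forward_snoop_upward are hypothetical, and each RSRV indication is modeled as a plain boolean. It shows why this approach costs bandwidth: with no address state at the L2, any outstanding reservation forces every reservation-influencing operation to be forwarded.

```python
def unified_rsrv(upstream_rsrv):
    """Combine the RSRV indications seen at one level into a single signal.

    upstream_rsrv is an iterable of booleans, one per processor or
    higher-level cache attached above this level (a modeling assumption).
    """
    return any(upstream_rsrv)


def forward_snoop_upward(upstream_rsrv, snoop_addr):
    """Stateless filter: this level holds no reservation address, so the
    snooped address cannot be compared against anything. The operation is
    forwarded whenever any reservation exists above this level."""
    del snoop_addr  # intentionally unused: no address register at this level
    return unified_rsrv(upstream_rsrv)
```

For example, forward_snoop_upward([False, True], 0x2000) returns True even if the reservation held above is for an unrelated address.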
8.8.2.2 Improved Reservation Snooping
A more hardware-intensive approach to filtering is to require L2 caches to contain registers
and comparators for the address associated with a specific processor’s reservation. The
controller only passes on reservation-modifying cycles from the system bus side to the
processor bus side and can participate directly in reservation-influenced cycles. Thus, only
addresses with actual outstanding reservations cause accesses to be retried on the system
bus, intermediate buses to become unavailable, and data to be placed in other caches as
shared, and then only when necessary to maintain a reservation.
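
As a rough illustration of the registers and comparators described above, the following sketch models an L2 controller that keeps one reservation address register per attached processor and compares each snooped address against them. The class name L2ReservationFilter, its method names, and the 32-byte GRANULE_BYTES value are assumptions made for this example; the actual reservation granule is implementation-specific.

```python
GRANULE_BYTES = 32  # assumed reservation granule size, for illustration only


class L2ReservationFilter:
    """Behavioral model of per-processor reservation address registers."""

    def __init__(self, num_processors):
        # One register per processor; None means no reservation is held.
        self.regs = [None] * num_processors

    def set_reservation(self, cpu, addr):
        # Store the granule index so any address in the granule matches.
        self.regs[cpu] = addr // GRANULE_BYTES

    def clear_reservation(self, cpu):
        self.regs[cpu] = None

    def snoop_targets(self, addr):
        """Comparator step: return the processors whose reservation
        matches the snooped address. An empty list means the operation
        need not be passed to the processor bus side."""
        granule = addr // GRANULE_BYTES
        return [cpu for cpu, reg in enumerate(self.regs) if reg == granule]
```

In this model a snoop to an address with no matching register is filtered out entirely, in contrast to the stateless scheme, where any outstanding reservation causes all such operations to be forwarded.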
To provide this level of support, a processor must always ensure lower levels can snoop
addresses on which a reservation is placed. In the case of either noncacheable or cacheable
miss operations, the address is transmitted during the read-atomic operation that acquires
the data. For cacheable snoop hits, an address-only bus operation should be performed, to
allow the reservation address to be passed cleanly from the processor to any L2 caches.
This additional bus operation type is proposed since there are problems in using the current
read-atomic operation in the face of a cache hit. While there are many ways of trying to use
the current data transferring read-atomic operation, there are problems with both the L1 and
higher level caches dealing with the case of modified data already resident in the cache. For
these reasons it is cleaner to require a new bus operation type which would transmit a
reservation address down from one level in the hierarchy to the next below.
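
The choice of bus operation described above can be summarized in a short sketch. This is a simplified reading of the text, not a definition of the bus protocol: the enum member RESERVE_ADDRESS_ONLY is a hypothetical name for the proposed address-only operation, and the function op_for_reserving_load and its parameters are likewise assumptions.

```python
from enum import Enum


class BusOp(Enum):
    READ_ATOMIC = "read-atomic with data transfer"
    RESERVE_ADDRESS_ONLY = "address-only reservation broadcast"  # hypothetical name


def op_for_reserving_load(cacheable, l1_hit):
    """Pick the bus operation that exposes the reservation address to
    the next level down.

    On a noncacheable access or a cacheable miss, the read-atomic
    operation that fetches the data already carries the address. On a
    cacheable hit no data transfer is needed, so the address is sent
    with an address-only operation instead.
    """
    if cacheable and l1_hit:
        return BusOp.RESERVE_ADDRESS_ONLY
    return BusOp.READ_ATOMIC
```

Keeping the hit case address-only avoids the question of what a data-transferring read-atomic should do when modified data is already resident in the cache.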
Additionally, the reservation address needs to be cleared so higher levels of the memory
hierarchy can stop snooping for reservations. This stwcx. address-only operation is an
optimization, and is not required. The cost of not clearing the reservation address is that a