Each cache is implemented as a four-way set-associative cache that is virtually indexed and physically addressed. The size of each cache is configurable in the range 4KB to 64KB. Both the Instruction Cache and the Data Cache can provide two words per cycle to all requesting sources.
Each cache way is architecturally limited to 16KB in size, because of the limitations of the virtually indexed, physically addressed implementation. The number of cache ways is fixed at four, but the cache way size can be varied between 1KB and 16KB in powers of 2. The line length is not configurable and is fixed at eight words per line.
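The sizing rules above can be checked with a short calculation. The following is a minimal sketch, not part of the processor documentation: the function name cache_geometry and the returned field names are hypothetical, and the only inputs taken from the text are the four ways, the eight-word line, and the 1KB to 16KB way size.

```python
LINE_WORDS = 8                         # fixed line length, from the text
WORD_BYTES = 4                         # 32-bit words
NUM_WAYS = 4                           # fixed associativity, from the text
LINE_BYTES = LINE_WORDS * WORD_BYTES   # 32 bytes per cache line


def cache_geometry(way_size_kb):
    """Derive cache parameters from the way size in KB (hypothetical helper)."""
    if way_size_kb not in (1, 2, 4, 8, 16):
        raise ValueError("way size must be a power of 2 between 1KB and 16KB")
    way_bytes = way_size_kb * 1024
    sets = way_bytes // LINE_BYTES              # number of lines in each way
    offset_bits = LINE_BYTES.bit_length() - 1   # bits selecting a byte in a line
    index_bits = sets.bit_length() - 1          # bits selecting a set
    return {
        "total_kb": NUM_WAYS * way_size_kb,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
    }
```

For example, cache_geometry(16) describes the largest configuration: a 64KB cache with 512 sets per way, 5 offset bits and 9 index bits.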
Write operations must occur after the Tag RAM reads and associated address comparisons have completed. A three-entry write buffer is included in the cache to hold the written words until a gap in cache usage allows them to be written. One or two words can be written in a single store operation. The addresses of these outstanding writes provide an additional input into the Tag RAM comparison for reads.
To avoid a critical path from the Tag RAM comparison to the enable signals for the data RAMs, there is a minimum of one cycle of latency between the determination of a hit to a particular way, and the start of writing to the data RAM of that way. This requires the cache write buffer to be able to hold three entries, for back-to-back writes. Accesses that read the dirty bits must also check the cache write buffer for pending writes that result in dirty bits being set. The cache dirty bits for the Data Cache are updated when the cache write buffer data is written to the RAM. This requires the dirty bits to be held as a separate storage array (significantly, the tag arrays cannot be written, because the arrays are not accessed during the data RAM writes), but permits the dirty bits to be implemented as a small RAM.
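The behaviour of the cache write buffer can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, the data RAM is modelled as a dictionary keyed by address, the dirty bits are modelled as a set of line numbers, and the rejection of a store when the buffer is full is an assumption, because the text does not say how a full buffer is handled.

```python
from collections import deque

LINE_BYTES = 32  # eight 32-bit words per line, from the text


class CacheWriteBuffer:
    """Behavioural model of the three-entry cache write buffer (hypothetical)."""

    CAPACITY = 3  # three entries, from the text

    def __init__(self):
        self._entries = deque()  # oldest entry on the left

    def push(self, address, words):
        """Queue a store of one or two words. Returns False if the buffer is full."""
        if not 1 <= len(words) <= 2:
            raise ValueError("a store writes one or two words")
        if len(self._entries) == self.CAPACITY:
            return False  # assumption: the caller has to wait and retry
        self._entries.append((address, tuple(words)))
        return True

    def pending(self, address):
        """Return the newest pending words for this address, or None."""
        for addr, words in reversed(self._entries):
            if addr == address:
                return words
        return None

    def line_dirty(self, address, dirty_lines):
        """A line counts as dirty if its bit is set or a write to it is pending."""
        line = address // LINE_BYTES
        if line in dirty_lines:
            return True
        return any(addr // LINE_BYTES == line for addr, _ in self._entries)

    def drain_one(self, data_ram, dirty_lines):
        """Write the oldest entry to the data RAM and set the dirty bit then."""
        if not self._entries:
            return False
        address, words = self._entries.popleft()
        data_ram[address] = words
        dirty_lines.add(address // LINE_BYTES)
        return True
```

In this model, a read first calls pending() so that it sees data that has not yet reached the data RAM, and an access that reads the dirty bits calls line_dirty() so that pending writes are taken into account.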
The other main operations performed by the cache are cache line refills and write-back. These occur to particular cache ways, which are determined at the point of the detection of the cache miss by the victim selection logic.
To reduce overall power consumption, the number of full cache reads is reduced by the sequential nature of many cache operations, especially on the instruction side. On a cache read that is sequential to the previous cache read, only the data RAM Set that was previously read is accessed, if the read is within the same cache line. The Tag RAM is not accessed at all during this sequential operation.
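The effect of the sequential-read optimization can be estimated with a simple count of Tag RAM lookups. The following is a minimal sketch: the function names are hypothetical, and the definition of sequential used here (the address equals the previous address plus a fixed step) is an assumption made for illustration.

```python
LINE_BYTES = 32  # eight 32-bit words per line, from the text


def needs_full_lookup(prev_address, address, step=4):
    """Return True if the read needs a Tag RAM access and a read of all ways."""
    if prev_address is None:
        return True  # nothing to be sequential to
    sequential = address == prev_address + step
    same_line = address // LINE_BYTES == prev_address // LINE_BYTES
    return not (sequential and same_line)


def count_tag_lookups(addresses, step=4):
    """Count the reads in an address stream that need a Tag RAM access."""
    lookups = 0
    prev = None
    for address in addresses:
        if needs_full_lookup(prev, address, step):
            lookups += 1
        prev = address
    return lookups
```

Under these assumptions, fetching sixteen consecutive words that span two cache lines needs only two Tag RAM lookups, one at the start of each line.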
To reduce unnecessary power consumption further, only the addressed words within a cache line are read at any time. With the required 64-bit read interface, this is achieved by disabling half of the RAMs on occasions when only a 32-bit value is required. The implementation uses two 32-bit wide RAMs to implement the cache data RAM shown in Figure 7.1, with the words of each line folded into the RAMs on an odd and even basis. This means that a cache refill takes several cycles. The number of cycles depends on the cache line length, which is fixed at eight words.
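The odd and even folding of words into the two data RAMs can be sketched as an address mapping. This is an illustration only: the function names are hypothetical, the choice of RAM 0 for even words and RAM 1 for odd words is an assumption, and the refill estimate assumes that two words are written per cycle with no other stalls.

```python
LINE_WORDS = 8  # fixed line length, from the text


def ram_location(word_in_line):
    """Map a word index 0..7 to (ram, row). Assumes even words are in RAM 0."""
    if not 0 <= word_in_line < LINE_WORDS:
        raise ValueError("word index must be in the range 0 to 7")
    return word_in_line & 1, word_in_line >> 1


def rams_enabled(word_in_line, width_bits):
    """Return the set of RAMs that must be enabled for a read of this width."""
    if width_bits == 64:
        return {0, 1}                 # both halves are read
    if width_bits == 32:
        return {word_in_line & 1}     # the other RAM stays disabled
    raise ValueError("width must be 32 or 64 bits")


def refill_cycles(words_per_cycle=2):
    """Data RAM write cycles for one line, assuming no other stalls."""
    return LINE_WORDS // words_per_cycle
```

With this mapping, word 5 of a line is held in row 2 of RAM 1, a 32-bit read of that word enables only RAM 1, and a full line refill needs four data RAM write cycles.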
The control of the level one memory system and the associated functionality, together with other system-wide control attributes, is handled through the system control coprocessor, CP15. This is described in About the system control coprocessor.
The block diagram of the cache subsystem is shown in Figure 7.1. This diagram does not show the cache refill paths.