1.3.6 Caches in Fast Models

Fast Models can optionally model cache state. With cache-state modeling enabled, a model replicates the programmer-visible correctness effects of caches, but not their timing effects.

The effects of caches are programmer visible because they can cause a single memory location to exist as multiple inconsistent copies. If caches are not correctly maintained, reads can observe stale copies of a location, and invalidating a dirty line without first cleaning it can cause writes to be lost.

There are three ways in which incorrect cache maintenance can be programmer visible:

From the D-side interface of a single processor
The only way of detecting the presence of caches is to create aliases in the memory map, so that the same range of physical addresses can be observed as both cached and non-cached memory.
From the D-side of a single processor to its I-side
Stale instruction data can be fetched when new instructions have been written by the D-side. This can either be because of deliberate self-modifying code, or as a consequence of incorrect OS demand paging.
Between one processor and another device
For example, another processor in a non-coherent MP system, or an external DMA device.
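As an illustration of the D-side to I-side case, the following is a sketch of the usual Armv8-A maintenance sequence for making newly written instructions visible to instruction fetch. It assumes `x0` holds the virtual address of a modified instruction; on a real system the sequence must be applied to every affected cache line.

```
DC CVAU, x0    // Clean the D-cache line to the Point of Unification
DSB ISH        // Wait for the clean to complete
IC IVAU, x0    // Invalidate the I-cache line for the same address
DSB ISH        // Wait for the invalidate to complete
ISB            // Flush the pipeline so instructions are refetched
```

A model with cache-state modeling enabled can expose the failure that results if this sequence is omitted: the I-side continues to execute the stale instructions.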

Fast Models with cache-state modeling enabled can replicate all of these failure-cases. However, they do not attempt to reproduce the following effects of caches:

  • Changes to timing behavior of a program because of cache hits/misses (because the timing of memory accesses is not modeled).
  • Ordering of line-fill and eviction operations.
  • Cache usage statistics (because the models do not generate accurate bus traffic).
  • Device-accurate allocation of cache victim lines (which is not possible without accurate bus traffic modeling).
  • Write-streaming behavior, where a cache detects a pattern of writes to consecutive addresses and automatically turns off its write-allocation policy.

The Cortex®‑A9 and Cortex‑A5 models do not model device-accurate MESI behavior. The Cortex‑A15 and Cortex‑A7 models do simulate hardware MOESI state handling, and can handle cache-to-cache snoops. In addition, they model the AMBA® 4 ACE cache-coherency protocols over their PVBus ports, so can be connected to a model of an ACE Coherent Interconnect (such as the CCI-400 model) to simulate coherent sharing of cache contents between processors.

It is not possible to insert devices between a processor and its L1 caches, so L1 traffic cannot be observed or modeled. You can, however, configure the model not to model the state of the L1 caches.
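For example, on FVPs built from Fast Models, cache-state modeling is typically controlled through a model parameter passed on the command line. The platform name and the exact parameter path below are illustrative assumptions; check the parameters exposed by your specific model.

```shell
# Illustrative: run with cache-state modeling disabled
# (the parameter name and path depend on the platform model)
FVP_Base_Cortex-A15x1 -C cache_state_modelled=0 application.axf
```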

Non-Confidential 100964_1110_00_en
Copyright © 2014–2020 Arm Limited or its affiliates. All rights reserved.