14.2. Cache coherency

Chapter 11 Caches considers only the effect of the caches within a single processor. The Cortex-A53 and Cortex-A57 processors support coherency management between the different cores in a cluster. This requires address regions to be marked with the correct shareable attribute. These processors also permit systems containing multiple multi-core clusters to be built, in which coherency can be maintained for data shared between clusters. Such system-level coherency requires a cache coherent interconnect, such as the ARM CCI-400, which implements the AMBA 4 ACE bus specification. See Figure 14.2.

Figure 14.2. Cache coherency groups

The coherency support in a system depends on hardware design decisions, and many configurations are possible. For example, coherency might be supported only within a single cluster. Alternatively, a dual-cluster big.LITTLE system is possible in which the inner domain includes the cores of both clusters, or a multi-cluster system in which the inner domain includes one cluster and the outer domain includes the other clusters. For more information about big.LITTLE systems, see Chapter 16 big.LITTLE Technology.

In addition to the hardware that maintains data coherency between caches, you must be able to broadcast cache maintenance activity performed by code running on one core to other parts of the system. Hardware configuration signals, sampled at reset, control whether inner, outer, or both inner and outer cache maintenance operations are broadcast, and whether system barrier instructions are broadcast. The AMBA 4 ACE protocol allows barriers to be signaled to other masters, so that the ordering of maintenance and coherency operations is maintained. The interconnect logic might require initialization by boot code.
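As an illustration of the kind of interconnect initialization that boot code might perform, the following is a minimal sketch of enabling snoop and DVM requests on one CCI-400 slave interface. The register offsets and bit positions are assumptions recalled from the CCI-400 programmers model and must be checked against the interconnect's Technical Reference Manual. The function name `cci_enable_coherency` is hypothetical, and the base pointer can refer to ordinary memory so that the logic can be exercised off target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CCI-400 layout (verify against the TRM before use). */
#define CCI_SLAVE_IFACE_OFFSET(n) (0x1000u + 0x1000u * (n)) /* S0..S4 */
#define CCI_SNOOP_CTRL      0x0u        /* offset within a slave interface */
#define CCI_STATUS          0xCu        /* offset from the CCI base */
#define CCI_SNOOP_EN        (1u << 0)   /* accept snoop requests */
#define CCI_DVM_EN          (1u << 1)   /* accept DVM (maintenance) messages */
#define CCI_CHANGE_PENDING  (1u << 0)   /* a control change is still in flight */

/* Hypothetical helper: enable snoops and DVM messages on one slave interface. */
static inline void cci_enable_coherency(volatile uint32_t *cci_base,
                                        unsigned slave_if)
{
    assert(slave_if <= 4);

    volatile uint32_t *snoop_ctrl =
        cci_base + (CCI_SLAVE_IFACE_OFFSET(slave_if) + CCI_SNOOP_CTRL) / 4;

    *snoop_ctrl |= CCI_SNOOP_EN | CCI_DVM_EN;

    /* On real hardware a DSB would normally precede this poll so that the
       register write has reached the interconnect. */
    while (cci_base[CCI_STATUS / 4] & CCI_CHANGE_PENDING) {
        /* wait for the interconnect to apply the change */
    }
}
```

On a target, `cci_base` would be the mapped address of the interconnect's register block, which is platform-specific and not given here.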

Software must define which address regions are to be used by which group of masters, that is, which other masters share a given address, by creating appropriate translation table entries. For Normal cacheable regions, this means setting the shareable attribute to one of Non-shareable, Inner Shareable, or Outer Shareable. For non-cacheable regions, the shareable attribute is ignored.
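The shareable attribute can be sketched as a field update on a translation table entry. The example below assumes the ARMv8-A stage 1 block or page descriptor format, in which the SH field occupies bits [9:8] with the encodings 0b00 (Non-shareable), 0b10 (Outer Shareable), and 0b11 (Inner Shareable). The helper names are hypothetical and are not taken from this guide.

```c
#include <assert.h>
#include <stdint.h>

/* SH[1:0] encodings in a stage 1 block or page descriptor. */
enum shareability {
    SH_NON_SHAREABLE   = 0x0,
    SH_OUTER_SHAREABLE = 0x2,
    SH_INNER_SHAREABLE = 0x3
};

#define DESC_SH_SHIFT 8u                               /* SH is bits [9:8] */
#define DESC_SH_MASK  (UINT64_C(3) << DESC_SH_SHIFT)

/* Hypothetical helper: return a copy of desc with its SH field set to sh. */
static inline uint64_t desc_set_shareability(uint64_t desc,
                                             enum shareability sh)
{
    assert(sh != 0x1);                                 /* 0b01 is reserved */
    return (desc & ~DESC_SH_MASK) | ((uint64_t)sh << DESC_SH_SHIFT);
}

/* Hypothetical helper: extract the SH field from a descriptor. */
static inline enum shareability desc_get_shareability(uint64_t desc)
{
    return (enum shareability)((desc & DESC_SH_MASK) >> DESC_SH_SHIFT);
}
```

A Normal cacheable region shared by all cores running one SMP operating system would typically be marked Inner Shareable. After changing a live descriptor, software also has to perform the usual TLB maintenance, which this sketch omits.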

In a multi-core system it is not possible to know whether a specific core has a line covering a particular address in one of its caches (especially where the interconnect features caches, such as CCN-50x).

Maintenance operations might therefore need to be broadcast through the interconnect. This means that software on one core can issue a cache clean or invalidate operation for an address that might currently be held in the data cache of a different core. When a maintenance operation is broadcast as shown in Figure 14.3, the operation is performed by all the cores in a particular shareability domain.

Figure 14.3. Broadcasting cache operations to other cores

SMP operating systems typically rely on being able to broadcast cache and TLB maintenance operations. Consider the situation where an external DMA engine is able to modify the contents of external memory.

The SMP operating system running on a particular core does not know which core has which data. It simply requires an address range to be invalidated wherever it is in the cluster. If operations are not broadcast, the operating system must issue the clean or invalidate operations locally on each core. A DSB barrier instruction makes a core wait for the broadcast operations that it has issued to complete. The barrier does not force operations received by broadcast to complete. For more information about barrier instructions, see Chapter 13 Memory Ordering.
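The pattern described above, maintenance by address over a range followed by a single DSB, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive routine: the function and macro names are hypothetical, the line size is passed in as a parameter (on real hardware it would be derived from CTR_EL0.DminLine), and the real instructions are compiled only when the hypothetical `USE_REAL_CACHE_OPS` macro is defined on an AArch64 target running at an Exception level that permits them. Otherwise, counting stubs stand in so that the loop logic can be checked on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(USE_REAL_CACHE_OPS)
/* Clean and invalidate one line by address to the Point of Coherency. */
static inline void dc_civac(uintptr_t va)
{
    __asm__ volatile("dc civac, %0" : : "r"(va) : "memory");
}
/* Wait for the maintenance operations issued by this core to complete. */
static inline void dsb_sy(void)
{
    __asm__ volatile("dsb sy" : : : "memory");
}
#else
/* Host-side stand-ins (assumption): count calls instead of touching caches. */
static size_t sim_line_ops, sim_barriers;
static inline void dc_civac(uintptr_t va) { (void)va; sim_line_ops++; }
static inline void dsb_sy(void) { sim_barriers++; }
#endif

/* Hypothetical helper: clean and invalidate [start, start + len) by address.
   Returns the number of cache lines visited. */
static inline size_t clean_invalidate_range(uintptr_t start, size_t len,
                                            size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0)
        return 0;

    uintptr_t addr = start & ~(uintptr_t)(line_size - 1); /* align down */
    uintptr_t end = start + len;
    size_t lines = 0;

    for (; addr < end; addr += line_size) {
        dc_civac(addr);   /* broadcast by hardware where Table 14.1 says so */
        lines++;
    }
    dsb_sy();             /* one barrier covers all operations issued above */
    return lines;
}
```

Because the by-address operation is broadcast, the issuing core does not need to know which core holds each line. Only the issuing core executes the DSB.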

Table 14.1 lists the cache maintenance operations described in Chapter 11 and whether they are broadcast.

Table 14.1. Instructions with broadcast

Instruction    Description                                                       Broadcast?
IC IALLUIS     I-cache invalidate all to Point of Unification, Inner Shareable   Yes (inner only)
IC IALLU       I-cache invalidate all to Point of Unification                    No[a]
IC IVAU, Xt    I-cache invalidate by address to Point of Unification             Maybe[b]
DC ZVA, Xt     D-cache zero by address                                           No
DC IVAC, Xt    D-cache invalidate by address to Point of Coherency               Yes
DC ISW, Xt     D-cache invalidate by Set/Way                                     No
DC CVAC, Xt    D-cache clean by address to Point of Coherency                    Maybe[b]
DC CSW, Xt     D-cache clean by Set/Way                                          No
DC CVAU, Xt    D-cache clean by address to Point of Unification                  Maybe[b]
DC CIVAC, Xt   D-cache clean and invalidate by address to Point of Coherency     Yes
DC CISW, Xt    D-cache clean and invalidate by Set/Way                           No

[a] Broadcast in Non-secure EL1 if HCR/HCR_EL2 FB bit is set, overriding normal behavior. This bit causes the following instructions to be broadcast within the Inner shareable domain when executed from Non-secure:


[b] Broadcast determined by shareability of memory region
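The rules in Table 14.1 can be modeled as a small lookup, which is sometimes useful in test harnesses or simulators. The sketch below transcribes the table and its two footnotes. All names are hypothetical. It also assumes that "Maybe[b]" means broadcast for Inner or Outer Shareable regions and not for Non-shareable regions, and that footnote [a] applies only when the FB bit is set and the instruction executes in Non-secure EL1.

```c
#include <assert.h>
#include <stdbool.h>

/* How the "Broadcast?" column of Table 14.1 is interpreted. */
enum bcast_rule {
    BCAST_NO,             /* "No" */
    BCAST_YES,            /* "Yes" and "Yes (inner only)" */
    BCAST_IF_SHAREABLE,   /* "Maybe[b]": depends on region shareability */
    BCAST_IF_FB           /* "No[a]": only when HCR_EL2.FB forces broadcast */
};

enum maint_op {
    OP_IC_IALLUIS, OP_IC_IALLU, OP_IC_IVAU, OP_DC_ZVA, OP_DC_IVAC, OP_DC_ISW,
    OP_DC_CVAC, OP_DC_CSW, OP_DC_CVAU, OP_DC_CIVAC, OP_DC_CISW, OP_COUNT
};

static const enum bcast_rule broadcast_rule[OP_COUNT] = {
    [OP_IC_IALLUIS] = BCAST_YES,
    [OP_IC_IALLU]   = BCAST_IF_FB,
    [OP_IC_IVAU]    = BCAST_IF_SHAREABLE,
    [OP_DC_ZVA]     = BCAST_NO,
    [OP_DC_IVAC]    = BCAST_YES,
    [OP_DC_ISW]     = BCAST_NO,
    [OP_DC_CVAC]    = BCAST_IF_SHAREABLE,
    [OP_DC_CSW]     = BCAST_NO,
    [OP_DC_CVAU]    = BCAST_IF_SHAREABLE,
    [OP_DC_CIVAC]   = BCAST_YES,
    [OP_DC_CISW]    = BCAST_NO
};

/* region_shareable: the target address is Inner or Outer Shareable.
   fb_in_ns_el1: FB bit set and the instruction executes in Non-secure EL1. */
static inline bool is_broadcast(enum maint_op op, bool region_shareable,
                                bool fb_in_ns_el1)
{
    switch (broadcast_rule[op]) {
    case BCAST_YES:          return true;
    case BCAST_IF_SHAREABLE: return region_shareable;
    case BCAST_IF_FB:        return fb_in_ns_el1;
    default:                 return false;
    }
}
```

Note that the Set/Way operations are never broadcast in this model, which matches the table: they act only on the caches of the core that executes them.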

For IC instructions, that is, the instruction cache maintenance operations, the IS suffix indicates that the operation applies to all instruction caches within the Inner Shareable domain.

Copyright © 2015 ARM. All rights reserved. ARM DEN0024A