A7.1 About the L2 memory system

In most configurations, the L2 memory system consists of an integrated SCU that connects the cores in a cluster, an optional, tightly-coupled L2 cache, and an optional ACP interface. In single core, AXI configurations that do not include CPU cache protection, ACP, or an L2 cache, the SCU is replaced with a more area-efficient mini-SCU.

The same system register control bit enables the L1 data cache and the L2 cache.

SCU

The SCU maintains coherency between the L1 and L2 data caches in the processor. It also arbitrates requests for the L2 cache and the AXI, ACE, or CHI master interface.

A coherent request from a core is one that checks for data in the L1 data caches and, if an L2 cache is present, in the L2 cache. Depending on the type of coherent request, the SCU might send a request to another core to retrieve data, invalidate data, or both. This request is referred to as a snoop request. If the processor is implemented with an ACE or CHI master interface, the SCU can issue coherent requests on the master interface, which might result in snoop requests being sent to other masters in the system. The SCU might also receive snoop requests from other masters.

The SCU can handle direct cache-to-cache transfers between cores without having to read or write any data to the external memory system. Cache line migration enables dirty cache lines to be moved between cores, and there is no requirement to write back transferred cache line data to the external memory system.

Each core has tag and dirty RAMs that contain the state of the cache line in the L1 data cache. Rather than sending a snoop request to each core to access these for each coherent request, the SCU contains a set of duplicate tags that allows it to check the contents of each L1 data cache. The duplicate tags filter coherent requests so that a snoop request is only sent to a core if the coherent request hits in the corresponding duplicate tags. The duplicate tags are also used to filter snoop requests from the external memory system. This allows the cores and the system to function efficiently even with a high volume of requests.
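
The filtering role of the duplicate tags can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of the hardware: the class `ToySCU` and its methods are hypothetical names, the duplicate tags are modeled as one set of line addresses per core, and the L1 line length is assumed to be 64 bytes.

```python
LINE_BYTES = 64  # assumption: L1 lines modeled with a 64-byte line length


class ToySCU:
    """Toy model of duplicate-tag snoop filtering (hypothetical, not the real design)."""

    def __init__(self, num_cores):
        # One duplicate-tag set per core, mirroring the lines held in its L1 data cache.
        self.dup_tags = [set() for _ in range(num_cores)]

    def record_fill(self, core, addr):
        # Called when a core's L1 data cache allocates the line containing addr.
        self.dup_tags[core].add(addr // LINE_BYTES)

    def record_evict(self, core, addr):
        # Called when a core's L1 data cache evicts the line containing addr.
        self.dup_tags[core].discard(addr // LINE_BYTES)

    def snoop_targets(self, requester, addr):
        # A snoop goes only to other cores whose duplicate tags hit for this line.
        line = addr // LINE_BYTES
        return [core for core, tags in enumerate(self.dup_tags)
                if core != requester and line in tags]


scu = ToySCU(num_cores=4)
scu.record_fill(1, 0x1000)
print(scu.snoop_targets(0, 0x1020))  # same 64-byte line as 0x1000, so only core 1 is snooped
print(scu.snoop_targets(0, 0x2000))  # no duplicate-tag hit, so no snoop is sent
```

In this model a request that misses in every duplicate-tag set generates no snoop traffic at all, which is the efficiency benefit the paragraph above describes.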

The SCU does not support hardware management of coherency of the instruction caches. Instruction cache linefills perform coherent reads; however, there is no coherency management of data held in the instruction cache.

mini-SCU

The mini-SCU replaces the SCU in certain uniprocessor configurations that do not require data cache coherency with other masters in the system, that is, implementations configured with a single CPU, no L2 cache, no CPU cache protection, and an AXI interface. The mini-SCU bridges between the master interface of the core and the AXI master interface of the processor.

L2 cache

Data cache lines are allocated to the L2 cache only when evicted from the L1 memory system, not when first fetched from the system. The only exceptions to this rule are for memory marked with the inner transient hint, or for non-temporal loads that are only ever allocated to the L2 cache. The L1 cache can prefetch data from the system, without data being evicted from the L2 cache.
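
The allocate-on-eviction rule for data lines can be sketched as a toy model. The names `ToyHierarchy`, `data_fill`, and `l2_only` are hypothetical, the L1 replacement order is assumed to be simple FIFO for brevity, and `l2_only` stands in for the two exceptions named above (the inner transient hint and non-temporal loads).

```python
from collections import OrderedDict


class ToyHierarchy:
    """Toy model: data lines reach the L2 only as L1 victims (hypothetical sketch)."""

    def __init__(self, l1_lines):
        self.l1_lines = l1_lines
        self.l1 = OrderedDict()  # assumption: FIFO replacement, for illustration only
        self.l2 = set()

    def data_fill(self, line, l2_only=False):
        if l2_only:
            # Exception case: allocated to the L2 cache only, never to the L1.
            self.l2.add(line)
            return
        # Normal case: the fetched line goes to the L1 and is not allocated to the L2.
        self.l1[line] = True
        if len(self.l1) > self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            self.l2.add(victim)  # the evicted L1 line is allocated to the L2


h = ToyHierarchy(l1_lines=2)
h.data_fill(1)
h.data_fill(2)
print(sorted(h.l2))  # empty: nothing has been evicted from the L1 yet
h.data_fill(3)
print(sorted(h.l2))  # line 1 was evicted from the L1 and is now in the L2
```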

Instruction cache lines are allocated to the L2 cache when fetched from the system and can be invalidated during maintenance operations.

The L2 cache is 8-way set associative. The L2 cache tags are looked up in parallel with the SCU duplicate tags. If both the L2 tag and SCU duplicate tag hit, a read accesses the L2 cache in preference to snooping one of the other cores.
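
The preference described above can be written as a small decision function. This is a hedged sketch: `read_source` and its return labels are hypothetical, and the outcome when neither lookup hits (forwarding the read to the master interface) is an assumption that this section does not state explicitly.

```python
def read_source(l2_tag_hit, dup_tag_hit):
    """Pick where a coherent read is serviced from (illustrative only)."""
    if l2_tag_hit:
        # Covers the case where both hit: the L2 cache is preferred over a snoop.
        return "l2_cache"
    if dup_tag_hit:
        # Only a duplicate tag hit: the line is held in another core's L1 data cache.
        return "snoop_core"
    # Assumption: with no hit in either lookup, the read goes to the external memory system.
    return "master_interface"


for l2_hit in (True, False):
    for dup_hit in (True, False):
        print(l2_hit, dup_hit, read_source(l2_hit, dup_hit))
```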

L2 RAMs are invalidated automatically at reset unless the L2RSTDISABLE signal is set HIGH when the nL2RESET signal is deasserted.

Further features of the L2 cache are:

  • Configurable size of 128KB, 256KB, 512KB, or 1MB.
  • Fixed line length of 64 bytes.
  • Physically indexed and tagged.
  • Optional ECC protection.
  • A pseudo-LRU replacement policy.
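
The listed parameters determine the cache geometry. The following sketch works through the arithmetic for each configurable size, assuming the textbook split of a physical address into offset, set index, and tag, and a 40-bit physical address. The function names are hypothetical, and the actual RAM organization (for example banking or the stored tag width) may differ.

```python
LINE_BYTES = 64  # fixed line length
WAYS = 8         # 8-way set associative
PA_BITS = 40     # assumption: 40-bit physical address range


def l2_geometry(size_kb):
    """Return (sets, offset_bits, index_bits, tag_bits) for a given L2 size."""
    if size_kb not in (128, 256, 512, 1024):
        raise ValueError("unsupported L2 size")
    sets = size_kb * 1024 // (LINE_BYTES * WAYS)
    offset_bits = LINE_BYTES.bit_length() - 1  # log2(64) = 6
    index_bits = sets.bit_length() - 1         # log2(sets); sets is a power of two
    tag_bits = PA_BITS - offset_bits - index_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(pa, size_kb):
    """Split a physical address into (tag, set index, byte offset)."""
    _, offset_bits, index_bits, _ = l2_geometry(size_kb)
    offset = pa & (LINE_BYTES - 1)
    index = (pa >> offset_bits) & ((1 << index_bits) - 1)
    tag = pa >> (offset_bits + index_bits)
    return tag, index, offset


for size_kb in (128, 256, 512, 1024):
    print(size_kb, l2_geometry(size_kb))
```

For example, a 128KB cache has 128 × 1024 / (64 × 8) = 256 sets, so under these assumptions 8 address bits select the set, 6 select the byte within the line, and the remaining 26 form the tag.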

ACP

The optional ACP is a 128-bit-wide, I/O-coherent interface that can allocate to the L2 cache.

Master memory interface

The SCU connects the cores to the external memory system through a 128-bit-wide master memory interface that uses ACE, CHI, or AXI technology. The memory interface supports integer ratios of the processor clock period up to and including 1:1 and a 40-bit physical address range.

The L2 memory system has two abort mechanisms, a synchronous one and an asynchronous one.

Non-Confidential. ARM 100241_0001_00_en
Copyright © 2016, 2017 ARM Limited or its affiliates. All rights reserved.