7.4 L2 cache prefetcher

The Cortex-A72 processor includes a hardware L2 prefetcher that generates prefetches for instruction fetch and table walk (TBW) descriptor accesses.

Note

The Load/store unit handles prefetch generation for Load/store accesses targeting both the L1D cache and L2 cache.

Some of the key features are:
  • A software-programmable number of prefetches (0, 1, 2, or 3) on any instruction fetch that misses in the L2 cache. All prefetches are allocated into the L2 cache.
  • Separate mechanisms to detect and prefetch:
    • Instruction fetch streams, to fetch consecutive cache lines on an L2 instruction fetch access.
    • Table walk descriptor, to fetch the consecutive cache line on an L2 table walk descriptor access.

    Note

    The prefetcher is limited to prefetch within the 4KB page of the current request, if the page has been mapped at a 4KB granularity.
  • Support for forwarding from prefetched requests. If a prefetch request has caused a read request to be sent over AXI, and a demand access for the same line is then received, the read data can be forwarded from the internal data buffers to the demand request without waiting for the line to be allocated to the cache.
You can program the CPUECTLR register to indicate the maximum number of prefetches to be allocated in the PRQ on the following:
  • An instruction fetch miss in the L2 cache by programming CPUECTLR_EL1[36:35].
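The field update can be sketched in C as follows. This is a minimal illustration, not ARM-supplied code: the helper names and macros are hypothetical, and it assumes that the value written to bits [36:35] is the prefetch count itself (0 to 3). Only the bit manipulation on a 64-bit copy of the register is shown. Reading and writing the real system register requires privileged code and is outside this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Position and mask of the instruction fetch prefetch field,
 * CPUECTLR_EL1[36:35], as given in the text. */
#define IFETCH_PF_SHIFT 35u
#define IFETCH_PF_MASK  (UINT64_C(0x3) << IFETCH_PF_SHIFT)

/* Hypothetical helper: return a copy of `cpuectlr` with bits [36:35]
 * replaced by `count`. All other bits are preserved.
 * Assumption: the field value equals the number of prefetches (0 to 3). */
static uint64_t cpuectlr_set_ifetch_prefetches(uint64_t cpuectlr, unsigned count)
{
    assert(count <= 3u);
    return (cpuectlr & ~IFETCH_PF_MASK) | ((uint64_t)count << IFETCH_PF_SHIFT);
}

/* Hypothetical helper: extract the programmed value from bits [36:35]. */
static unsigned cpuectlr_get_ifetch_prefetches(uint64_t cpuectlr)
{
    return (unsigned)((cpuectlr & IFETCH_PF_MASK) >> IFETCH_PF_SHIFT);
}
```

Clearing the field before inserting the new value keeps the update a pure read-modify-write, so unrelated control bits in the register are left unchanged.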
The programmed distance is also used as the skip distance for any instruction fetch read with a stride match that hits in the L2 cache. In these cases, a single prefetch request is allocated in the PRQ as:
prefetch address = current address + (stride × programmed distance)

Note

The stride for an instruction fetch access is always one cache line.
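
As a worked illustration of the prefetch address formula, the following C sketch computes the target address for a given programmed distance and checks whether it stays within the 4KB page of the current request. The function names and macros are hypothetical. The sketch assumes a 64-byte cache line as the stride and a page mapped at 4KB granularity. It models only the arithmetic, not the hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u    /* assumed stride: one 64-byte cache line */
#define PAGE_BYTES       4096u  /* assumed 4KB mapping granularity */

/* prefetch address = current address + (stride x programmed distance) */
static uint64_t prefetch_address(uint64_t current, unsigned distance)
{
    assert(distance <= 3u);  /* programmed distance is 0, 1, 2, or 3 */
    return current + (uint64_t)CACHE_LINE_BYTES * distance;
}

/* True when both addresses fall within the same 4KB page. */
static bool same_4k_page(uint64_t a, uint64_t b)
{
    return (a / PAGE_BYTES) == (b / PAGE_BYTES);
}
```

For example, with a current address of 0x1000 and a programmed distance of 2, the computed prefetch address is 0x1080. With a current address of 0x1FC0 and a distance of 1, the result is 0x2000, which lies in the next 4KB page, so under the page limit described earlier that prefetch would not be expected to be issued.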
Non-Confidential ARM 100095_0002_04_en
Copyright © 2014-2016 ARM. All rights reserved.