A6.5 Data prefetching

The following section describes the software and hardware data prefetching behavior of the Cortex®-A55 core.

Hardware data prefetcher

The Cortex-A55 core has a data prefetch mechanism that looks for cache line fetches with regular patterns. If the data prefetcher detects a pattern, then it signals to the memory system that memory accesses from a specified address are likely to occur soon. The memory system responds by starting new linefills to fetch the predicted addresses ahead of the demand loads.

The Cortex-A55 core can track multiple streams in parallel.

A prefetch stream ends when any of the following occurs:

  • The pattern is broken.
  • A DSB is executed.
  • A WFI or WFE is executed.
  • A data cache maintenance operation is executed.
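
As a rough illustration of how a regular pattern might be detected and how a broken pattern ends a stream, the following C sketch models one stream that trains on a constant cache-line stride. It is a toy model only, not the Cortex-A55 algorithm, which this section does not specify. The 64-byte line size, the confirmation threshold, and the names stream_t and stream_observe are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 64u        /* assumed cache line size */
#define CONFIRM_THRESHOLD 2u  /* hypothetical: repeats needed before predicting */

/* State for one tracked stream (illustrative only). */
typedef struct {
    uint64_t last_line;      /* line index of the most recent access */
    int64_t  stride;         /* last observed line-to-line delta */
    unsigned confirmations;  /* how many times the stride has repeated */
    bool     valid;          /* false until the first access is seen */
} stream_t;

/*
 * Feed one demand access to the model. Returns true and writes the
 * predicted next address when the stride has repeated often enough.
 */
static bool stream_observe(stream_t *s, uint64_t addr, uint64_t *predicted_addr)
{
    uint64_t line = addr / LINE_BYTES;

    if (!s->valid) {
        s->last_line = line;
        s->stride = 0;
        s->confirmations = 0;
        s->valid = true;
        return false;
    }

    int64_t delta = (int64_t)(line - s->last_line);
    if (delta == 0) {
        return false;            /* same line: no new linefill to learn from */
    }

    if (delta == s->stride) {
        s->confirmations++;      /* pattern continues */
    } else {
        s->stride = delta;       /* pattern broken: restart training */
        s->confirmations = 0;
    }
    s->last_line = line;

    if (s->confirmations >= CONFIRM_THRESHOLD) {
        *predicted_addr = (line + (uint64_t)s->stride) * LINE_BYTES;
        return true;
    }
    return false;
}
```

In this model, a run of accesses to consecutive lines produces a prediction once the stride has repeated twice, and an access that breaks the stride resets the training count, which mirrors the first condition in the list above.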

For read streams, the prefetcher is based on virtual addresses. A read stream can continue prefetching across a page boundary as long as the new page is cacheable and has read permission. Write streams are based on physical addresses and so cannot cross page boundaries. However, if full cache line writes are performed, the prefetcher does not activate and write streaming mode is used instead.

For some types of pattern, when the prefetcher is confident in the stream, it can progressively increase the prefetch distance ahead of the current accesses. These longer-distance prefetches allocate to the L3 cache rather than the L1 cache. Allocating to the L3 cache allows better utilization of the larger resources available at L3. It also reduces pollution of the L1 cache if the stream ends or is incorrectly predicted. If the prefetch to L3 was accurate, the line is removed from L3 and allocated to L1 when the stream reaches that address.

The CPUECTLR register allows you to:

  • Deactivate the prefetcher.
  • Alter the number of outstanding requests that the prefetcher can make.

Preload instructions

The Cortex-A55 core supports the PLD and PRFM instructions. These instructions perform a lookup in the cache and, if they miss and are to a cacheable address, start a linefill. PRFM instructions can also target a prefetch to the L2 or L3 cache. A request is sent to L2 to start a linefill, and the instruction can then retire without any data being returned to L1. PLI, PLIL1KEEP, and PLIL1STRM are implemented as a prefetch to L2.

Use the PLD or PRFM instruction for data prefetching where short sequences or irregular pattern fetches are required. For more information about prefetch memory and preloading caches, see the Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile.
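
The following C sketch illustrates software prefetching for an irregular access pattern, an index-driven gather that a stride-based hardware prefetcher is unlikely to predict. It uses the GCC and Clang builtin __builtin_prefetch, which compilers targeting AArch64 typically lower to a PRFM instruction. The function name gather_sum and the prefetch distance PREFETCH_AHEAD are assumptions for the example, and a suitable distance would need to be found by measurement.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD 8  /* hypothetical distance in elements; tune by measurement */

/*
 * Sum table[idx[i]] for i in [0, n). The index sequence is irregular,
 * so each target address is prefetched a few iterations early.
 */
static uint64_t gather_sum(const uint64_t *table, const uint32_t *idx, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0: prefetch for read. locality = 3: keep in the closest cache. */
            __builtin_prefetch(&table[idx[i + PREFETCH_AHEAD]], 0, 3);
        }
        sum += table[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result. A lower locality argument asks the compiler for a hint that targets a cache level further from the core, which corresponds to the L2 or L3 targeting described above, but the exact instruction emitted depends on the compiler.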

Data Cache Zero

The Data Cache Zero by Virtual Address (DC ZVA) instruction sets a 64-byte block of memory, aligned to 64 bytes, to zero. The DCZID_EL0 register reports this block size.

The DC ZVA instruction allocates the zeroed data into the data cache using the same method as a normal store instruction.
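
As a minimal sketch of how software might use this, the following C code decodes DCZID_EL0 and zeroes a buffer with DC ZVA where possible. It relies on the architectural layout of DCZID_EL0 (BS in bits [3:0] as log2 of the block size in words, DZP in bit 4). The helper names dczva_block_bytes, read_dczid, and zero_region are hypothetical, the buffer is assumed to be Normal memory, and on a non-AArch64 host the code falls back to memset so that it still compiles and runs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a DCZID_EL0 value: block size in bytes, or 0 if DC ZVA is prohibited. */
static size_t dczva_block_bytes(uint64_t dczid)
{
    if (dczid & (1u << 4)) {            /* DZP set: DC ZVA must not be used */
        return 0;
    }
    return (size_t)4 << (dczid & 0xF);  /* BS is log2 of the size in 4-byte words */
}

/* Read DCZID_EL0 on AArch64; elsewhere report DC ZVA as unavailable. */
static uint64_t read_dczid(void)
{
#if defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(v));
    return v;
#else
    return 1u << 4;
#endif
}

/* Zero len bytes at dst, using DC ZVA for whole aligned blocks. */
static void zero_region(void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t block = dczva_block_bytes(read_dczid());

    if (block == 0) {
        memset(p, 0, len);
        return;
    }

    /* Zero the unaligned head with ordinary stores. */
    size_t head = (block - ((uintptr_t)p & (block - 1))) & (block - 1);
    if (head > len) {
        head = len;
    }
    memset(p, 0, head);
    p += head;
    len -= head;

    /* Zero whole aligned blocks. */
    while (len >= block) {
#if defined(__aarch64__)
        __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
#else
        memset(p, 0, block);
#endif
        p += block;
        len -= block;
    }

    /* Zero the remaining tail. */
    memset(p, 0, len);
}
```

With the 64-byte block size described above, the BS field reads as 4, so dczva_block_bytes returns 64.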

Non-Confidential 100442_0200_00_en
Copyright © 2016–2018 Arm Limited or its affiliates. All rights reserved.