The following section describes the software and hardware data prefetching behavior of the Cortex®-A55 core.
The Cortex-A55 core has a data prefetch mechanism that looks for cache line fetches with regular patterns. If the data prefetcher detects a pattern, then it signals to the memory system that memory accesses from a specified address are likely to occur soon. The memory system responds by starting new linefills to fetch the predicted addresses ahead of the demand loads.
The Cortex-A55 core can track multiple streams in parallel.
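The detection logic used by the core is not described in this section. As a purely illustrative toy model of what "looking for cache line fetches with regular patterns" can mean, the following sketch tracks a single stream and predicts the next line once the same stride has repeated. The names `stream_t`, `stream_observe`, and `CONFIRMATIONS_NEEDED`, the training threshold, and the single-stream limit are all assumptions made for this example and do not reflect the actual hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 64u           /* cache line size assumed for this model */
#define CONFIRMATIONS_NEEDED 2u  /* hypothetical training threshold */

/* State for one tracked stream (hypothetical, for illustration only). */
typedef struct {
    uint64_t last_line;   /* line index of the previous access */
    int64_t  stride;      /* last observed stride, in cache lines */
    unsigned confidence;  /* consecutive times the stride repeated */
    bool     valid;       /* false until the first access is seen */
} stream_t;

/*
 * Feed one demand access into the model. Returns true and writes the
 * predicted next address to *prefetch_addr once the same non-zero
 * stride has repeated CONFIRMATIONS_NEEDED times in a row.
 */
static bool stream_observe(stream_t *s, uint64_t addr, uint64_t *prefetch_addr)
{
    uint64_t line = addr / LINE_BYTES;

    if (!s->valid) {
        s->last_line = line;
        s->stride = 0;
        s->confidence = 0;
        s->valid = true;
        return false;
    }

    /* Repeated accesses to the same line carry no new pattern information. */
    if (line == s->last_line)
        return false;

    int64_t stride = (int64_t)(line - s->last_line);
    if (stride == s->stride) {
        if (s->confidence < CONFIRMATIONS_NEEDED)
            s->confidence++;
    } else {
        /* Pattern broken: retrain on the new stride. */
        s->stride = stride;
        s->confidence = 0;
    }
    s->last_line = line;

    if (s->confidence >= CONFIRMATIONS_NEEDED) {
        *prefetch_addr = (line + (uint64_t)s->stride) * LINE_BYTES;
        return true;
    }
    return false;
}
```

In this model, accesses to addresses 0, 64, 128, and 192 train the stream, and the fourth access yields a prediction for address 256. An access that breaks the stride resets the confidence counter.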
Prefetch streams end when either:
Read streams are based on virtual addresses. A read stream can therefore cross page boundaries and prefetch through multiple pages, as long as each new page is cacheable and has read permission. Write streams are based on physical addresses and so cannot cross page boundaries. However, if full cache line writes are performed, the prefetcher does not activate and write streaming mode is used instead.
For some types of pattern, when the prefetcher is confident in the stream, it can start progressively increasing the prefetch distance ahead of the current accesses. These accesses start to allocate to the L3 cache rather than L1. Allocating to the L3 cache allows better utilization of the larger resources available at L3. Also, utilizing the L3 cache reduces the amount of pollution of the L1 cache if the stream ends or is incorrectly predicted. If the prefetching to L3 was accurate, the line will be removed from L3 and allocated to L1 when the stream reaches that address.
The CPUECTLR register allows you to:
The Cortex-A55 core supports PRFM instructions. These instructions perform a lookup in the cache and, if they miss and are to a cacheable address, start a linefill. PRFM instructions can also target a prefetch to the L2 or L3 cache. A request is sent to L2 to start a linefill, and the instruction can then retire without any data being returned to L1. The PLIL1STRM operation is implemented as a prefetch to L2.
Use the PRFM instruction for data prefetching where short sequences or irregular pattern fetches are required. For more information about prefetch memory and preloading caches, see the Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile.
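As an illustration of software prefetching for an irregular pattern, the following minimal sketch sums elements selected through an index array, which a stride-based hardware prefetcher is unlikely to predict. It assumes a GCC or Clang toolchain, where `__builtin_prefetch` is available and, when targeting AArch64, is typically compiled to a PRFM instruction. The function name `gather_sum` and the distance `PREFETCH_AHEAD` are hypothetical; a suitable distance must be found by measurement.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD 8  /* hypothetical distance, tune by measurement */

/* Sum data[idx[i]] for i in [0, n): an indirect, irregular access pattern. */
static uint64_t gather_sum(const uint32_t *data, const uint32_t *idx, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* idx[] is read with a normal load, so stay inside its bounds. */
        if (i + PREFETCH_AHEAD < n) {
            /* Second argument 0 = prefetch for read.
             * Third argument 3 = high temporal locality.
             * The prefetch is a hint and does not change program results. */
            __builtin_prefetch(&data[idx[i + PREFETCH_AHEAD]], 0, 3);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The result is identical with or without the prefetch hint. Only the latency of the later demand loads is affected.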
The Data Cache Zero by Virtual Address (DC ZVA) instruction sets a block of 64 bytes in memory, aligned to 64 bytes, to 0. The DCZID_EL0 register reports this block size. The DC ZVA instruction allocates the zeroed data into the data cache using the same method as a normal store instruction.
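The following sketch shows one way software might use DCZID_EL0 and DC ZVA together. It assumes the Armv8-A field layout of DCZID_EL0, in which bits [3:0] (BS) hold log2 of the block size in 4-byte words and bit 4 (DZP) indicates that DC ZVA is prohibited. The names `dczid_info`, `dczid_decode`, and `zero_region` are hypothetical, and the inline assembly assumes a GCC or Clang toolchain. On targets other than AArch64, the sketch falls back to `memset`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of DCZID_EL0 (hypothetical helper type). */
typedef struct {
    size_t block_bytes;  /* bytes zeroed by one DC ZVA */
    bool   permitted;    /* false when DZP (bit 4) prohibits DC ZVA */
} dczid_info;

static dczid_info dczid_decode(uint64_t dczid)
{
    dczid_info info;
    /* BS, bits [3:0], is log2 of the block size in 4-byte words. */
    info.block_bytes = (size_t)4 << (dczid & 0xFu);
    info.permitted = ((dczid >> 4) & 1u) == 0;
    return info;
}

/*
 * Zero n bytes at p. DC ZVA is used only when it is permitted and both
 * p and n are multiples of the block size; otherwise memset is used.
 * The memory must be Normal memory, such as ordinary heap or stack.
 */
static void zero_region(void *p, size_t n)
{
#if defined(__aarch64__)
    uint64_t raw;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(raw));
    dczid_info info = dczid_decode(raw);

    if (info.permitted &&
        ((uintptr_t)p % info.block_bytes) == 0 &&
        (n % info.block_bytes) == 0) {
        char *end = (char *)p + n;
        for (char *q = p; q < end; q += info.block_bytes)
            __asm__ volatile("dc zva, %0" : : "r"(q) : "memory");
        return;
    }
#endif
    memset(p, 0, n);  /* portable fallback */
}
```

A BS value of 4 decodes to a 64-byte block, which matches the block size stated above.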