A6.4.1 Memory system implementation

This section describes the implementation of the L1 memory system.

Limited Order Regions

The core supports four limited ordering region descriptors, as introduced by the Armv8.1 Limited Ordering Regions extension.

Atomic instructions

The Cortex®‑A76 core supports the atomic instructions added in the Armv8.1 architecture.

Atomic instructions to cacheable memory can be performed as either near atomics or far atomics, depending on where the cache line containing the data resides.

When an atomic instruction hits in the L1 data cache in a unique state, it is performed as a near atomic in the L1 memory system. If the atomic operation misses in the L1 data cache, or the line is shared with another core, the atomic is sent as a far atomic on the core CHI interface.

If the operation misses everywhere within the cluster, and the interconnect supports far atomics, then the atomic is passed on to the interconnect to perform the operation.

When the operation hits anywhere inside the cluster, or when an interconnect does not support atomics, the L3 memory system performs the atomic operation. If the line is not already there, it allocates the line into the L3 cache. This depends on whether the DSU is configured with an L3 cache.
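The routing described above can be summarized as a small decision function. The following C sketch is an illustrative software model only, not a description of the hardware logic. The type and function names are hypothetical, and the inputs stand for internal cache state that software cannot observe directly.

```c
#include <stdbool.h>

/* Where an atomic to cacheable memory is performed (illustrative model). */
typedef enum {
    ATOMIC_NEAR_L1,          /* performed in the L1 memory system */
    ATOMIC_FAR_L3,           /* far atomic performed by the L3 memory system */
    ATOMIC_FAR_INTERCONNECT  /* far atomic passed on to the interconnect */
} atomic_site_t;

/* Hypothetical inputs; none of these are software-visible. */
typedef struct {
    bool l1_hit_unique;            /* hits in the L1 data cache in a unique state */
    bool hit_in_cluster;           /* line is held somewhere inside the cluster */
    bool interconnect_far_atomics; /* interconnect supports far atomics */
} atomic_ctx_t;

static atomic_site_t classify_atomic(atomic_ctx_t c)
{
    /* L1 hit in a unique state: near atomic. */
    if (c.l1_hit_unique)
        return ATOMIC_NEAR_L1;

    /* Otherwise the atomic leaves the core as a far atomic. It reaches the
       interconnect only if it misses everywhere within the cluster and the
       interconnect supports far atomics. */
    if (!c.hit_in_cluster && c.interconnect_far_atomics)
        return ATOMIC_FAR_INTERCONNECT;

    /* Hit inside the cluster, or no interconnect support. */
    return ATOMIC_FAR_L3;
}
```

For example, an atomic that misses in the L1 data cache but hits elsewhere in the cluster is classified as ATOMIC_FAR_L3, regardless of whether the interconnect supports far atomics.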

If software prefers that an atomic is performed as a near atomic, precede the atomic instruction with a PLDW or PRFM PSTL1KEEP instruction, which requests the line into the L1 data cache in preparation for a write.
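A minimal C sketch of this prefetch-then-atomic pattern follows. It assumes a GCC or Clang toolchain, where `__builtin_prefetch(addr, 1, 3)` is expected to lower to PRFM PSTL1KEEP on AArch64, and a build that enables the Armv8.1 atomics (for example, `-mcpu=cortex-a76`). The function name is hypothetical, and whether the atomic actually executes near remains up to the hardware.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: hint the line into L1 for writing, then do the atomic. */
static uint64_t near_atomic_fetch_add(_Atomic uint64_t *counter, uint64_t delta)
{
    /* rw = 1 (prepare for write), locality = 3 (keep in cache).
       Assumption: on AArch64 this becomes PRFM PSTL1KEEP. */
    __builtin_prefetch((const void *)counter, 1, 3);

    /* With the Armv8.1 atomics enabled, this is expected to compile to an
       LDADD-family instruction; otherwise the compiler may emit an exclusive
       load/store loop or a library call instead. */
    return atomic_fetch_add_explicit(counter, delta, memory_order_relaxed);
}
```

The prefetch is only a hint. If the line is lost between the prefetch and the atomic, the operation can still be sent as a far atomic.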

Alternatively, the CPUECTLR can be programmed such that different types of atomic instructions attempt to execute as a near atomic. In that case, one cache fill is made for the atomic. If the cache line is lost before the atomic operation can be performed, the atomic is sent as a far atomic.

The Cortex‑A76 core supports atomics to device or non-cacheable memory; however, this relies on the interconnect also supporting atomics. If such an atomic instruction is executed when the interconnect does not support atomics, it results in an abort.

For more information on the CPUECTLR register, see B2.32 CPUECTLR_EL1, CPU Extended Control Register, EL1.

LDAPR instructions

The core supports Load-Acquire instructions adhering to the RCpc consistency semantic introduced in the Armv8.3 extension for the A profile. This is reflected in register ID_AA64ISAR1_EL1, where bits[23:20] are set to 0b0001 to indicate that the core supports the LDAPRB, LDAPRH, and LDAPR instructions in AArch64.
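As an illustration of the field check, the following C sketch extracts bits[23:20] from a previously obtained ID_AA64ISAR1_EL1 value. The macro and function names are hypothetical. How the value is read (for example, with an MRS instruction at EL1 or through an operating system interface) is outside the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the bits[23:20] field described above. */
#define ID_AA64ISAR1_LRCPC_SHIFT 20u
#define ID_AA64ISAR1_LRCPC_MASK  0xFu

/* Return the raw 4-bit field value from an ID_AA64ISAR1_EL1 reading. */
static unsigned isar1_lrcpc(uint64_t id_aa64isar1)
{
    return (unsigned)((id_aa64isar1 >> ID_AA64ISAR1_LRCPC_SHIFT)
                      & ID_AA64ISAR1_LRCPC_MASK);
}

/* 0b0001 indicates LDAPRB, LDAPRH, and LDAPR. Assumption: larger values,
   as used by later architecture versions, also include these instructions,
   so the check is "at least 1". */
static bool has_ldapr(uint64_t id_aa64isar1)
{
    return isar1_lrcpc(id_aa64isar1) >= 1u;
}
```

On this core the field reads 0b0001, so `has_ldapr()` returns true for the register value.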

Transient memory region

The core has a specific behavior for memory regions that are marked as write-back cacheable and transient, as defined in the Armv8.0 architecture.

For any load or store that is targeted at a memory region that is marked as transient, the following occurs:

  • If the memory access misses in the L1 data cache, the returned cache line is allocated in the L1 data cache but is marked as transient.
  • When the line is evicted from the L1 data cache, the transient hint is passed to the L2 cache so that the replacement policy will not attempt to retain the line. When the line is subsequently evicted from the L2 cache, it will bypass the next level cache entirely.
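For context, a region is marked Write-Back Transient through its memory attribute encoding. The following C sketch builds an 8-bit attribute value, assuming the AArch64 MAIR_ELx encoding in which each 4-bit Normal-memory field of the form 0b01RW (with R and W not both 0) selects Write-Back Transient with the given read-allocate and write-allocate hints. The function names are hypothetical, and the encoding should be checked against the Arm Architecture Reference Manual before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One 4-bit field: 0b01RW = Normal memory, Write-Back Transient.
   Assumed MAIR_ELx encoding; R is bit 1, W is bit 0. */
static uint8_t wb_transient_nibble(bool read_alloc, bool write_alloc)
{
    /* 0b0100 (R = W = 0) encodes Non-cacheable, not Write-Back Transient. */
    assert(read_alloc || write_alloc);
    return (uint8_t)(0x4u | ((unsigned)read_alloc << 1) | (unsigned)write_alloc);
}

/* Same policy for the outer (bits[7:4]) and inner (bits[3:0]) fields. */
static uint8_t mair_attr_wb_transient(bool read_alloc, bool write_alloc)
{
    uint8_t n = wb_transient_nibble(read_alloc, write_alloc);
    return (uint8_t)((n << 4) | n);
}
```

Under this assumed encoding, read-allocate plus write-allocate gives the attribute byte 0x77, compared with 0xFF for the Non-transient Write-Back equivalent.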

Non-temporal loads

Non-temporal loads indicate to the caches that the data is likely to be used for only short periods, for example when streaming single-use read data that is then discarded. In addition to non-temporal loads, there are also prefetch-memory (PRFM) hint instructions with the STRM qualifier.

Non-temporal loads to memory that is designated as Write-Back are treated the same as loads to Transient memory.
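The streaming prefetch hint can be sketched in C as follows. This example uses the PRFM route rather than a non-temporal load instruction, because standard C has no portable way to request a non-temporal load. It assumes a GCC or Clang toolchain, where `__builtin_prefetch(addr, 0, 0)` is expected to lower to PRFM PLDL1STRM on AArch64. The function name, the prefetch distance, and the 64-byte line size are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define ELEMS_PER_LINE 16u  /* assumed 64-byte cache line / 4-byte element */
#define PREFETCH_AHEAD 64u  /* hypothetical distance in elements; tune per workload */

/* Sum a buffer that is read once and then discarded. */
static uint64_t streaming_sum(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Once per assumed cache line, hint a line further ahead.
           rw = 0 (read), locality = 0 (no temporal locality).
           Assumption: on AArch64 this becomes PRFM PLDL1STRM. */
        if ((i % ELEMS_PER_LINE) == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);

        sum += data[i];
    }
    return sum;
}
```

The hint does not change the result of the computation. It only advises the memory system that the prefetched data is unlikely to be reused.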

Non-Confidential 100798_0400_00_en
Copyright © 2016–2019 Arm Limited or its affiliates. All rights reserved.