A6.4.1 Memory system implementation

This section describes the implementation of the L1 memory system.

Limited Order Regions

The Cortex®-A55 core supports a single limited order range that includes the entire memory space.

Atomic instructions

The Cortex-A55 core supports the atomic instructions added in the Arm®v8.1‑A architecture.

Atomic instructions to cacheable memory can be performed as either near atomics or far atomics, depending on where the cache line containing the data resides. If the instruction hits in the L1 data cache in a unique state then it will be performed as a near atomic in the L1 memory system. If the atomic operation misses in the L1 cache, or the line is shared with another core then the atomic is sent as a far atomic out to the L3 cache. If the operation misses everywhere within the cluster, and the master interface is configured as CHI, and the interconnect supports far atomics, then the atomic will be passed on to the interconnect to perform the operation. If the operation hits anywhere inside the cluster, or the interconnect does not support atomics, then the L3 memory system will perform the atomic operation and allocate the line into the L3 cache if it is not already there.

The Cortex-A55 core supports atomics to device or non-cacheable memory, however this relies on the interconnect also supporting atomics. If such an atomic instruction is executed when the interconnect does not support them, it will result in a synchronous Data Abort (for load atomics) or an asynchronous Data Abort (for store atomics). The behavior of the atomic instructions can be modified by the CPUECTLR register settings.

For more information on the CPUECTLR register, see B2.30 CPUECTLR_EL1, CPU Extended Control Register, EL1.

LDAPR instructions

The core supports Load acquire instructions adhering to the RCpc consistency semantic introduced in the Armv8.3‑A extensions. This is reflected in register ID_AA64ISAR1_EL1 where bits[23:20] are set to 0b0001 to indicate that the core supports LDAPRB, LDAPRH, and LDAPR instructions implemented in AArch64.

For more information on the ID_AA64ISAR1_EL1 register, see B2.58 ID_AA64ISAR0_EL1, AArch64 Instruction Set Attribute Register 0, EL1.

Transient memory region

The core has a specific behavior for memory regions that are marked as Write-Back cacheable and transient, as defined in the Armv8‑A architecture.

For any load that is targeted at a memory region that is marked as transient, the following occurs:

  • If the memory access misses in the L1 data cache, the returned cache line is allocated in the L1 data cache but is marked as transient.
  • On eviction, if the line is clean and marked as transient, it is not allocated into the L2 cache but is marked as invalid.

For stores that are targeted at a memory region that is marked as transient, if the store misses in the L1 data cache, the line is allocated into the L2 cache.

Non-temporal loads

Non-temporal loads indicate to the caches that the data is likely to be used for only short periods. For example, when streaming single-use read data that is then discarded. In addition to non-temporal loads, there are also prefetch-memory (PRFM) hint instructions with the STRM qualifier.

Non-temporal loads cause allocation into the L1 data cache, with the same performance as normal loads. However, when a later linefill is allocated into the cache, the cacheline marked as non-temporal has higher priority to be be replaced. To prevent pollution of the L2 cache, a non-temporal line that is evicted from L1, is not allocated to L2 as would happen for a normal line.

Note:

The line is only marked as non-temporal in the cache if the core has the line in a unique state. If shared with other cores, the line is treated normally.

Non-temporal stores are treated the same as stores to a memory region that is marked as transient.

Non-ConfidentialPDF file icon PDF version100442_0200_00_en
Copyright © 2016–2018 Arm Limited or its affiliates. All rights reserved.