D.1. Memory ordering

The ARM architecture requires that transactions to locations in Device-type memory be ordered. The Cortex-R5 processor has an in-order pipeline, so any non-cached read blocks the pipeline: no subsequent read or write can start until that read is complete. On an AXI bus, as used by the Cortex-R5 processor, a series of writes issued in order is kept in order by using the same ID for all the transactions.

To maintain ordering between a write and a subsequent read, the Cortex-R5 processor waits for the write transaction to complete before starting the read. The writes that the processor must wait for are any Device-type writes still in its write buffer or bus interface, and any writes for which the bus has accepted the address and data but has not yet returned a response, that is, outstanding AXI writes. The latency of a Device read therefore depends on how many writes must complete before it can start.

The architectural ordering requirements apply only within an individual peripheral, so, for example, an outstanding write to a UART does not have to complete before a read from an interrupt controller can start. However, the Cortex-R5 processor treats the memory attached to each interface as flat, so it preserves ordering for all accesses to a given interface. Accesses to different Cortex-R5 interfaces are not ordered with respect to each other, so choosing which interface a peripheral is attached to can reduce the latency of critical Device reads.

For example, if the processor has several write transactions outstanding on the AXI master interface, a read from an interrupt controller attached to that interface must wait for those writes to complete, and the resulting latency might degrade interrupt-handling performance. If instead the interrupt controller were attached to the AXI peripheral interface, the read could start without waiting for the outstanding writes on the AXI master interface. It would, however, still have to wait for any outstanding writes on the AXI peripheral interface or in its buffers.


  • The transaction ordering provided by Device memory is useful where an access has side effects. For example, if the processor writes to a memory-mapped FIFO and then reads a different memory-mapped register that indicates whether the FIFO is full, the value read must reflect the state of the FIFO after the write; otherwise a further write could be performed that overflows the FIFO.

  • If a write to a peripheral on one interface causes a side effect on a peripheral on a different interface, there is no implicit ordering to ensure that a subsequent access to the second peripheral observes the side effect, even if both peripherals are in Device-type memory. In this situation, you must read from the first peripheral to ensure that the write has completed, and then execute a DMB to ensure ordering before accessing the second peripheral. On the Cortex-R5 processor a DMB alone is sufficient to force this ordering, but this behavior is not architectural and cannot be relied on in general.

Writes to Device-type memory always drain from the Cortex-R5 buffers as quickly as possible. If the memory system attached to a port is perfect, that is, if it returns the write response in the cycle after it receives the address and data, outstanding accesses cannot accumulate. In such a system, attaching different peripherals to different interfaces does not improve read latency.

Copyright © 2010-2011 ARM. All rights reserved. ARM DDI 0460C