4.3. Replacing ARMv5 barriers with equivalent ARMv7 barriers

In ARMv7, the IMB barrier is deprecated, and you must replace it with equivalent ARMv7 barriers.

A memory barrier is an instruction that requires the core to apply an ordering constraint between memory operations that occur before and after the memory barrier instruction in the program. Other architectures call such instructions memory fences.

The term memory barrier also refers to a compiler mechanism that prevents the compiler from scheduling data access instructions across the barrier when performing optimizations. In GCC, for example, you can use the inline assembler memory clobber to indicate that the instruction changes memory, so the optimizer cannot reorder memory accesses across the barrier. The syntax is as follows:

asm volatile("" ::: "memory");

ARM Compiler, armcc, includes a similar intrinsic, called __schedule_barrier().
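As an illustration of the compiler-only barrier described above, the following minimal sketch keeps two stores in source order in the generated code. The macro name COMPILER_BARRIER and the variables payload and ready are hypothetical names chosen for this example, and the code assumes a GCC-compatible compiler. It does not emit any barrier instruction, so the core itself might still reorder the accesses.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC-compatible compilers assumed). It emits no
 * instruction; it only stops the optimizer from moving memory accesses
 * across this point. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

static uint32_t payload;   /* hypothetical shared data */
static uint32_t ready;     /* hypothetical "data valid" flag */

/* Store the payload, then the flag. The barrier keeps the two stores in
 * this order in the generated code. The hardware can still reorder them,
 * which is what the hardware barriers discussed next are for. */
static void publish(uint32_t value)
{
    payload = value;
    COMPILER_BARRIER();
    ready = 1;
}
```

Calling publish(7) leaves payload equal to 7 and ready equal to 1. What the barrier adds is a guarantee about the order of the two store instructions in the compiled output.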

However, this document focuses on hardware memory barriers, provided through dedicated ARM assembly language instructions. Core optimizations can result in memory operations occurring in a different order from that specified in the executing code.

Normally, this reordering is invisible to you, and you do not have to worry about memory barriers. However, there are cases where you must take care of such ordering issues. For example, you must consider these issues in device drivers or when you have multiple observers of the data that must be synchronized.

The ARM architecture provides memory barrier instructions that enable you to force the core to wait for memory accesses to complete. These instructions are available in both ARM and Thumb code, in both user and privileged modes. In older versions of the architecture, barriers were performed using CP15 operations, in ARM code only. These CP15 operations are now deprecated, although they are preserved for compatibility.
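The replacement of the CP15 operations can be sketched as a set of C macros that select the dedicated instructions when building for ARMv7 or later. The macro names and the helper write_control_and_sync are hypothetical. The CP15 encodings shown are the ones the architecture documents for its CP15 data memory barrier, data synchronization barrier, and instruction synchronization barrier operations. Verify them against the Technical Reference Manual for the core you are migrating from, because an older core might not implement all three. The final branch is an assumption added only so that the sketch also builds on a non-ARM host.

```c
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/* ARMv7 and later: dedicated barrier instructions, ARM or Thumb code. */
#define BARRIER_DMB() __asm__ volatile("dmb sy" ::: "memory")
#define BARRIER_DSB() __asm__ volatile("dsb sy" ::: "memory")
#define BARRIER_ISB() __asm__ volatile("isb" ::: "memory")
#elif defined(__arm__)
/* Older cores, ARM state only: deprecated CP15 barrier operations.
 * The value written to the register is ignored. */
#define BARRIER_DMB() __asm__ volatile("mcr p15, 0, %0, c7, c10, 5" :: "r"(0) : "memory")
#define BARRIER_DSB() __asm__ volatile("mcr p15, 0, %0, c7, c10, 4" :: "r"(0) : "memory")
#define BARRIER_ISB() __asm__ volatile("mcr p15, 0, %0, c7, c5, 4" :: "r"(0) : "memory")
#else
/* Assumption for non-ARM hosts: a full fence from the GCC builtin stands
 * in for DMB and DSB, and a compiler barrier stands in for ISB. */
#define BARRIER_DMB() __sync_synchronize()
#define BARRIER_DSB() __sync_synchronize()
#define BARRIER_ISB() __asm__ volatile("" ::: "memory")
#endif

/* Hypothetical helper: write a control word, wait for the write to
 * complete, then make following instructions see its effect. */
static void write_control_and_sync(volatile unsigned int *ctrl,
                                   unsigned int value)
{
    *ctrl = value;
    BARRIER_DSB();
    BARRIER_ISB();
}
```

Keeping the barriers behind macros means that code shared between an ARMv5 or ARMv6 build and an ARMv7 build only changes in one place.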

Data Synchronization Barrier (DSB)

This instruction forces the core to wait for all pending explicit data accesses to complete before any further instructions can execute. It has no effect on the prefetching of instructions.

Data Memory Barrier (DMB)

This instruction ensures that all explicit memory accesses that appear in the program order preceding the DMB instruction are observed before any explicit memory accesses that appear in the program order after the DMB instruction. It does not affect the ordering of any other instructions executing on the core, or of instruction fetches.

Instruction Synchronization Barrier (ISB)

This instruction ensures that the effects of all context-altering operations preceding the ISB are recognized by subsequent instructions. This results in a flushing of the instruction pipeline, with the instruction following the ISB being refetched.

You can specify the following options with the DMB or DSB instructions to select the type of access and the shareability domain that the barrier applies to.

SY

The barrier applies to the full system, including all cores and peripherals. This is the default.

ST

The barrier only waits for stores to complete.

ISH

The barrier only applies to the Inner Shareable domain.

ISHST

The barrier combines ST and ISH. That is, it only waits for stores to complete, and only within the Inner Shareable domain.

NSH

The barrier only applies to the Point of Unification (PoU).

NSHST

The barrier only waits for stores to complete, and only out to the Point of Unification (PoU).

OSH

The barrier operation only applies to the Outer Shareable domain.

OSHST

The barrier operation only waits for stores to complete, and only to the Outer Shareable domain.
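The choice of option can be illustrated with a simple message-passing sketch. The DMB(opt) macro and the names message, flag, send_message, and try_receive are hypothetical. The sketch assumes a GCC-compatible compiler and that both observers are cores in the same Inner Shareable domain. On a non-ARM host, a full fence is assumed as a stand-in for every option.

```c
#include <stdint.h>

#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/* Build "dmb <option>" from the option name given to the macro. */
#define DMB(opt) __asm__ volatile("dmb " #opt ::: "memory")
#else
/* Assumption for non-ARM hosts: a full fence replaces every option. */
#define DMB(opt) __sync_synchronize()
#endif

static uint32_t message;          /* hypothetical shared payload */
static volatile uint32_t flag;    /* hypothetical "message valid" flag */

/* Producer: only the order of two stores matters, so the store-only,
 * Inner Shareable option ISHST is sufficient. */
static void send_message(uint32_t value)
{
    message = value;
    DMB(ishst);
    flag = 1;
}

/* Consumer: two loads must be ordered. ST options do not order loads,
 * so the consumer uses ISH instead. Returns 1 if a message was read. */
static int try_receive(uint32_t *out)
{
    if (flag == 0)
        return 0;
    DMB(ish);
    *out = message;
    return 1;
}
```

A narrower option such as ISHST can cost less than SY, but it is only correct when every observer that must see the ordering is inside the selected domain and only stores need to be ordered.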

The DMB instruction enforces memory access ordering within a shareable domain. All processors within the shareable domain are guaranteed to observe all explicit memory accesses preceding the DMB instruction, before they observe any of the explicit memory accesses following the DMB instruction.

The DSB instruction has the same effect as the DMB instruction. In addition, it synchronizes the memory accesses with the full instruction stream, not only with other memory accesses. This means that when a DSB is issued, execution stops until all outstanding explicit memory accesses are completed. When all outstanding reads have completed and the write buffer is drained, execution resumes as normal.
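One case where the difference between DMB and DSB matters is an instruction that is not itself a memory access. The following sketch releases a simple lock and then signals waiting cores with SEV. Because SEV is not a memory access, a DMB does not order it against the preceding store, so a DSB is used. The function lock_release and the macro names are hypothetical, the lock is assumed to be a plain word where 0 means free, and the non-ARM branch is an assumption added so that the sketch builds on other hosts.

```c
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
#define DMB_ISH() __asm__ volatile("dmb ish" ::: "memory")
#define DSB_ISH() __asm__ volatile("dsb ish" ::: "memory")
#define SEV()     __asm__ volatile("sev" ::: "memory")
#else
/* Assumption for non-ARM hosts: full fences, and no event to send. */
#define DMB_ISH() __sync_synchronize()
#define DSB_ISH() __sync_synchronize()
#define SEV()     ((void)0)
#endif

/* Release a lock word (0 means free) and wake cores waiting in WFE. */
static void lock_release(volatile unsigned int *lock)
{
    DMB_ISH();   /* accesses made while holding the lock are observed
                    before the lock is seen as free */
    *lock = 0;
    DSB_ISH();   /* the unlocking store completes before SEV executes */
    SEV();
}
```

If the DSB were replaced with a DMB, a waiting core could receive the event, read the lock while the store is still pending, and go back to sleep.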

It might be easier to understand the effect of the barriers by considering an example. Consider the case of a quad core Cortex-A9 cluster. The cluster forms a single Inner Shareable domain. When a single core within the cluster executes a DMB instruction, that core ensures that all data memory accesses that precede the barrier in program order complete before any explicit memory accesses that follow the barrier in program order. This barrier ensures that all cores within the cluster see the accesses on either side of that barrier in the same order as the core that performs them. If the DMB ISH variant is used, the same cannot be guaranteed for external observers, such as DMA controllers or DSPs.

Copyright © 2014 ARM. All rights reserved. ARM DAI0425
Non-Confidential ID080414