13.2. Barriers

The ARM architecture includes barrier instructions to force access ordering and access completion at a specific point. In some architectures, similar instructions are known as fences.

If you are writing code where ordering is important, see Appendix J7, Barrier Litmus Tests, in the ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile, and Appendix G, Barrier Litmus Tests, in the ARM Architecture Reference Manual ARMv7-A/R Edition. Both appendices include many worked examples.

The ARM Architecture Reference Manual defines certain key words, in particular, the terms observe and must be observed. In typical systems, these terms define how the bus interface of a master (for example, a core or GPU) and the interconnect must handle bus transactions. Only masters are able to observe transfers, and all bus transactions are initiated by a master. The order in which a master performs transactions is not necessarily the order in which those transactions complete at the slave device, because the interconnect might re-order them unless some ordering is explicitly enforced.

A simple way to describe observability is to say that “I have observed your write when I can read what you wrote and I have observed your read when I can no longer change the value you read” where both I and you refer to cores or other masters in the system.

There are three types of barrier instruction provided by the architecture:

Instruction Synchronization Barrier (ISB)

This guarantees that all instructions after the ISB are fetched again, from the cache or memory, so that privilege and access are checked with the current MMU configuration. It ensures that any previously executed context-changing operations, such as writes to system control registers, have completed by the time the ISB completes. In hardware terms, this might mean, for example, that the instruction pipeline is flushed. Typical uses are in memory management, cache control, and context switching code, or where code is being moved about in memory.

Data Memory Barrier (DMB)

This prevents re-ordering of data access instructions across the barrier instruction. All data accesses (that is, loads or stores, but not instruction fetches) performed by this processor before the DMB are visible to all other masters within the specified shareability domain before any of the data accesses after the DMB.

For example:

LDR x0, [x1]     // Must be seen by the memory system before the STR below.
DMB ISHLD
ADD x2, x2, #1   // May be executed before or after the memory system sees LDR.
STR x3, [x4]     // Must be seen by the memory system after the LDR above.

It also ensures that any explicit preceding data or unified cache maintenance operations have completed before any subsequent data accesses are executed.

DC CSW, x5     // Data clean by set/way
LDR x0, [x1]   // Effect of data cache clean might not be seen by this instruction
DMB ISH
LDR x2, [x3]   // Effect of data cache clean will be seen by this instruction
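The message-passing pattern that DMB exists for can be sketched in portable C11. This is a minimal sketch, assuming a C11 compiler with <stdatomic.h> and POSIX threads; the names payload, ready, writer() and run_message_passing() are hypothetical. The writer stores data and then a flag; the reader waits for the flag and then reads the data, so that once the reader has observed the write to ready it must also observe the write to payload. The two fences mark where the barriers go: GCC and Clang typically compile a release fence to DMB ISH and an acquire fence to DMB ISHLD on AArch64, but check your compiler's output rather than relying on this mapping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* ordinary data published by the writer */
static atomic_int ready;   /* flag: 0 = not yet published, 1 = published */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release fence: orders the payload store (and any earlier loads) before
     * the flag store. Typically DMB ISH on AArch64. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Starts the writer, waits until the flag is observed, then reads the payload. */
int run_message_passing(void)
{
    pthread_t thread;
    payload = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&thread, NULL, writer, NULL) != 0)
        return -1;

    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;   /* spin until the flag store is observed */
    /* Acquire fence: Load - Load and Load - Store ordering (typically DMB ISHLD
     * on AArch64), so the payload load cannot be satisfied before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    int value = payload;
    assert(value == 42);   /* guaranteed by the release/acquire fence pairing */

    pthread_join(thread, NULL);
    return value;
}
```

Calling run_message_passing() returns 42. Without the two fences, nothing would order the payload accesses against the flag accesses, and on a weakly ordered core the reader could see the flag set but a stale payload.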
Data Synchronization Barrier (DSB)

This enforces the same ordering as the Data Memory Barrier, but has the additional effect of blocking execution of any further instructions, not only loads and stores, until synchronization is complete. For example, it can be used to prevent execution of a SEV instruction, which signals to other cores that an event occurred, until the preceding accesses have completed. It also waits until all cache, TLB and branch predictor maintenance operations issued by this processor have completed for the specified shareability domain.

For example:

DC ISW, x5       // Operation must have completed before DSB can complete
STR x0, [x1]     // Access must have completed before DSB can complete
DSB SY
ADD x2, x2, #3   // Cannot be executed until DSB completes
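To make the SEV case concrete, here is a minimal sketch, assuming GCC or Clang inline assembly on AArch64; the lock word demo_lock and the routines dsb_then_sev() and demo_unlock() are hypothetical. The store that releases the lock must have completed before SEV wakes other cores, otherwise a woken core could still read the old value and go back to waiting. On other targets the sketch falls back to a C11 fence so that it still compiles, but no event is sent.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word for illustration: 1 = held, 0 = free. */
static atomic_int demo_lock = 1;

/* Completes earlier memory accesses, then signals an event to other cores. */
static inline void dsb_then_sev(void)
{
#if defined(__aarch64__)
    /* DSB blocks the SEV until the preceding store has completed. */
    __asm__ volatile("dsb ish\n\tsev" ::: "memory");
#else
    /* Portable stand-in so the sketch compiles elsewhere; no event is sent. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical unlock: release the lock, then wake cores waiting in WFE. */
void demo_unlock(void)
{
    /* The caller is expected to hold the lock. */
    assert(atomic_load_explicit(&demo_lock, memory_order_relaxed) == 1);
    atomic_store_explicit(&demo_lock, 0, memory_order_release);
    dsb_then_sev();
}
```

A waiting core would pair this with WFE in a loop that re-reads demo_lock; that side is omitted here.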

As you can see from the above examples, the DMB and DSB instructions take a parameter that specifies the types of access the barrier orders (before and after it) and the shareability domain to which it applies.

The available options are listed in the table.

Table 13.1. Barrier parameters

<option>   Ordered Accesses (before - after)   Shareability Domain
OSHLD      Load - Load, Load - Store           Outer shareable
OSHST      Store - Store                       Outer shareable
OSH        Any - Any                           Outer shareable
NSHLD      Load - Load, Load - Store           Non-shareable
NSHST      Store - Store                       Non-shareable
NSH        Any - Any                           Non-shareable
ISHLD      Load - Load, Load - Store           Inner shareable
ISHST      Store - Store                       Inner shareable
ISH        Any - Any                           Inner shareable
LD         Load - Load, Load - Store           Full system
ST         Store - Store                       Full system
SY         Any - Any                           Full system

The ordered access field specifies which classes of accesses the barrier operates on. There are three options.

Load - Load/Store

This means that the barrier requires all loads to complete before the barrier but does not require stores to complete. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.

Store - Store

This means that the barrier only affects store accesses and that loads can still be freely re-ordered around the barrier.

Any - Any

This means that both loads and stores must complete before the barrier. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.
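Table 13.1 is a grid of four shareability domains by three access classes, so choosing an option amounts to picking a domain and the narrowest class that covers the orderings your code relies on. The following is a minimal sketch of that choice; the helper weakest_option() and the types around it are hypothetical and only encode the table and the three class descriptions above.

```c
#include <assert.h>
#include <stdbool.h>

/* Shareability-domain prefixes from Table 13.1 (DOM_SY = full system). */
typedef enum { DOM_OSH, DOM_NSH, DOM_ISH, DOM_SY } share_domain;

/* Ordered-access classes from Table 13.1. */
typedef enum { CLS_LD, CLS_ST, CLS_ANY } access_class;

/* Table 13.1 as a lookup: rows are domains, columns are access classes. */
static const char *const table_13_1[4][3] = {
    [DOM_OSH] = {"OSHLD", "OSHST", "OSH"},
    [DOM_NSH] = {"NSHLD", "NSHST", "NSH"},
    [DOM_ISH] = {"ISHLD", "ISHST", "ISH"},
    [DOM_SY]  = {"LD",    "ST",    "SY"},
};

/* Orderings (earlier access - later access) that the code relies on. */
typedef struct {
    bool load_load, load_store, store_load, store_store;
} needed_order;

/* Hypothetical helper: the narrowest option in Table 13.1 that covers `need`. */
const char *weakest_option(share_domain dom, needed_order need)
{
    assert(dom <= DOM_SY);
    access_class cls;
    if (!need.store_load && !need.store_store)
        cls = CLS_LD;    /* only Load - Load and/or Load - Store (an empty request also lands here) */
    else if (!need.load_load && !need.load_store && !need.store_load)
        cls = CLS_ST;    /* only Store - Store */
    else
        cls = CLS_ANY;   /* Store - Load, or a mix of load and store orderings */
    return table_13_1[dom][cls];
}
```

For example, weakest_option(DOM_ISH, (needed_order){.store_store = true}) returns "ISHST", but also requiring .load_store = true, as a release operation does, moves the result to "ISH", because the Store - Store class does not order earlier loads.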

Barriers are used to prevent unsafe optimizations from occurring and to enforce a specific memory ordering. Use of unnecessary barrier instructions can therefore reduce software performance. Consider carefully whether a barrier is necessary in a specific situation, and if so, which is the correct barrier to use.

A more subtle effect of the ordering rules is that the instruction interface, data interface, and MMU table walker of a core are considered as separate observers. This means that you might need, for example, to use DSB instructions to ensure that an access on one interface is guaranteed to be observable on a different interface.

If you execute a data cache clean or invalidate instruction, for example DC CVAU, X0, you must insert a DSB instruction after it to be sure that subsequent page table walks, modifications to translation table entries, instruction fetches, or updates to instructions in memory can all see the new values.

For example, consider an update of the translation tables:

  STR X0, [X1]          // update a translation table entry
  DSB ISHST             // ensure write has completed
  TLBI VAE1IS, X2       // invalidate the TLB entry for the entry that changes
  DSB ISH               // ensure TLB invalidation is complete
  ISB                   // synchronize context on this processor 

A DSB is required to ensure that the maintenance operations complete and an ISB is required to ensure that the effects of those operations are seen by the instructions that follow.
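The same DSB-then-ISB requirement applies when a program writes instructions to memory, the case where the data interface and instruction interface act as separate observers. This is a minimal sketch, assuming GCC or Clang; the helper publish_code() is hypothetical. It copies new instructions with ordinary stores and then calls the compiler builtin __builtin___clear_cache(), which on AArch64 typically performs DC CVAU over the range, DSB ISH, IC IVAU over the range, DSB ISH and ISB, and which generates no code on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy freshly generated instructions into dst and make
 * them visible to instruction fetch on this core. */
void publish_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);   /* data-side stores of the new instructions */
    /* Typically DC CVAU, DSB ISH, IC IVAU, DSB ISH, ISB on AArch64. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

Mapping the buffer with execute permission (for example with mmap and PROT_EXEC) and branching to it are outside this sketch.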

The processor might speculatively access an address marked as Normal at any time. So when considering whether barriers are required, don’t just consider explicit accesses generated by load or store instructions.

Copyright © 2015 ARM. All rights reserved. ARM DEN0024A