6.3.8. Non-temporal load and store pair

A new concept in ARMv8 is the non-temporal load and store. These are the LDNP and STNP instructions, which read or write a pair of register values. They also give the memory system a hint that caching is not useful for this data. The hint does not prohibit memory system activity such as caching of the address, preload, or gathering. It only indicates that caching is unlikely to increase performance. A typical use case is streaming data, but note that using these instructions effectively requires an approach specific to the microarchitecture.
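As a sketch of the streaming use case, the following C function copies a buffer in 16-byte pairs with LDNP and STNP, using GCC/Clang extended inline assembly. The name `stream_copy` and its signature are hypothetical, chosen for this example. It assumes the buffers do not overlap, copies any tail shorter than 16 bytes with `memcpy`, and falls back to `memcpy` entirely on non-AArch64 targets. It makes no claim to be faster than `memcpy` on any particular core, because that depends on the microarchitecture.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an ARM API): copy n bytes from src to dst,
 * using non-temporal pair loads/stores for the 16-byte-aligned bulk.
 * Assumes dst and src do not overlap. */
void stream_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t bulk = n & ~(size_t)15;   /* bytes handled as 16-byte pairs */

#if defined(__aarch64__)
    for (size_t i = 0; i < bulk; i += 16) {
        uint64_t lo, hi;
        /* LDNP/STNP only have a signed-offset addressing form (no
         * writeback), so the C loop advances the addresses. Early-clobber
         * ("=&r") keeps lo/hi out of the register holding d + i. */
        __asm__ volatile(
            "ldnp %[lo], %[hi], [%[s]]\n\t"
            "stnp %[lo], %[hi], [%[d]]"
            : [lo] "=&r"(lo), [hi] "=&r"(hi)
            : [s] "r"(s + i), [d] "r"(d + i)
            : "memory");
    }
#else
    memcpy(d, s, bulk);   /* portable fallback on non-AArch64 hosts */
#endif
    memcpy(d + bulk, s + bulk, n - bulk);   /* tail bytes, copied normally */
}
```

For example, `stream_copy(out, in, 40)` copies 32 bytes with two LDNP/STNP pairs and the last 8 bytes with an ordinary `memcpy`.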

Non-temporal loads and stores relax the memory ordering requirements. In the following sequence, the LDNP instruction might be observed before the preceding LDR instruction, even though the LDNP takes its address from X0, which the LDR writes. The LDNP can therefore read from an uncertain address in X0:

  LDR X0, [X3]
  LDNP X2, X1, [X0]      // X0 might not be loaded when this instruction executes!

To correct this, insert an explicit load barrier between the two loads:

  LDR X0, [X3]
  DMB nshld
  LDNP X2, X1, [X0]
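As a sketch, the corrected sequence can be wrapped in a C function with GCC/Clang inline assembly. The function name `load_pair_via_pointer` and its parameters are hypothetical; the LDR, DMB NSHLD, and LDNP instructions are the ones from the corrected sequence. On non-AArch64 targets the function falls back to plain C loads so that it still compiles and runs.

```c
#include <stdint.h>

/* Hypothetical helper: read a pointer from *slot (LDR), then load the two
 * 64-bit words it points to (LDNP). DMB NSHLD orders the LDR before the
 * LDNP, so the LDNP cannot use a stale address. */
void load_pair_via_pointer(const uint64_t *const *slot, uint64_t out[2])
{
#if defined(__aarch64__)
    const uint64_t *p;
    uint64_t a, b;
    __asm__ volatile(
        "ldr  %[p], [%[slot]]\n\t"
        "dmb  nshld\n\t"
        "ldnp %[a], %[b], [%[p]]"
        : [p] "=&r"(p), [a] "=&r"(a), [b] "=&r"(b)
        : [slot] "r"(slot)
        : "memory");
    out[0] = a;
    out[1] = b;
#else
    /* Portable fallback: ordinary loads, no non-temporal hint. */
    const uint64_t *p = *slot;
    out[0] = p[0];
    out[1] = p[1];
#endif
}
```

The `"memory"` clobber only stops the compiler from moving other memory accesses across the asm statement. It is the DMB that orders the two loads in hardware.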

Copyright © 2015 ARM. All rights reserved. ARM DEN0024A