ARM Technical Support Knowledge Articles

How do I avoid the compiler generating LDM/STM instructions?

Applies to: DS-5

Answer

In some situations it might not be desirable to generate multi-word access instructions such as LDM and STM. For example, a rare processor integration issue [1] could result in a particular AHB slave being unable to support INCR burst accesses.

An ARM processor might be capable of executing a number of instructions resulting in burst memory accesses. For example, a Cortex-R5 processor can generate AHB INCR (incremental) bursts for the following instructions:

The set of instructions that generate AHB INCR bursts depends heavily on the processor and the system, so there may be more or fewer instructions than the ones listed above.

Of the above instructions, LDM/STM and LDRD/STRD are common. For example, a compiler may generate one LDM/STM instruction instead of multiple LDR/STR instructions. Such code may be generated for a memory copy operation where the compiler can assume that the pointers are aligned to an appropriate boundary. The compiler may also generate load and store multiple or double instructions to access an AHB slave or peripheral register that is larger than one word (32 bits).

Currently there is no ARM Compiler option to prevent multi-word access instructions such as LDM/STM from being generated. Therefore the safest way to avoid LDM/STM is to inspect the code that the compiler has generated. If an LDM/STM instruction has been generated, the programmer must find a solution for each case individually.
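The inspection described above can be partly automated. The following is a minimal sketch in C, not an ARM-provided tool: it assumes you already have a text disassembly of the image with one instruction per line and with comments introduced by ';', as in the listings in this article. The function name line_has_multiword_access() is hypothetical. It flags only mnemonics that begin with LDM, STM, LDRD or STRD, so it does not cover aliases such as PUSH and POP or any other burst-generating instruction of your processor, and a symbol name made only of letters that starts with one of these prefixes would be reported as a false positive.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if the token looks like one of the flagged mnemonics. */
static int is_flagged_mnemonic(const char *tok, size_t len)
{
    static const char *const prefixes[] = { "LDM", "STM", "LDRD", "STRD" };
    size_t i, k;

    /* Reject tokens containing anything other than letters or '.',
       such as addresses, register lists and most symbol names. */
    for (i = 0; i < len; i++) {
        if (!isalpha((unsigned char)tok[i]) && tok[i] != '.') {
            return 0;
        }
    }

    /* Case-insensitive prefix match, so LDMIA, STMDB, strd and so on match. */
    for (k = 0; k < sizeof prefixes / sizeof prefixes[0]; k++) {
        const char *p = prefixes[k];
        for (i = 0; p[i] != '\0' && i < len; i++) {
            if (toupper((unsigned char)tok[i]) != p[i]) {
                break;
            }
        }
        if (p[i] == '\0') {
            return 1;
        }
    }
    return 0;
}

/* Return 1 if one disassembly line contains a flagged mnemonic.
   Text after ';' is treated as a comment and ignored. */
int line_has_multiword_access(const char *line)
{
    const char *p = line;

    while (*p != '\0' && *p != ';') {
        const char *start;

        while (isspace((unsigned char)*p)) {
            p++;
        }
        start = p;
        while (*p != '\0' && *p != ';' && !isspace((unsigned char)*p)) {
            p++;
        }
        if (is_flagged_mnemonic(start, (size_t)(p - start))) {
            return 1;
        }
    }
    return 0;
}
```

A small driver could read the disassembly line by line, for example with fgets(), and print every line for which the function returns 1. Each reported line still has to be reviewed by hand to decide whether it accesses the affected slave.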

Section 15.11 - "Qualifiers" from the armcc User Guide states the following:

"An object that has a volatile-qualified type is accessed as a word, halfword, or byte as determined by its size and alignment. For volatile objects larger than a word, the order of accesses to the parts of the object is undefined. Updates to volatile bitfields generally require a read-modify-write. Accesses to aligned word, halfword and byte types are atomic. Other volatile accesses are not necessarily atomic.

Otherwise, reads and writes to volatile qualified objects occur as directly implied by the source code, in the order implied by the source code."

This means that the volatile keyword suppresses merging of word-sized and smaller loads and stores. However, there is no guarantee that it prevents all burst access instructions from being generated. This depends on several factors, including the type of the variable. For example, if you access a 64-bit type, there is no guarantee about the order of the accesses to the individual words, or that those accesses are not merged.

Further information about using the volatile keyword can be found in section 5.8 - "Compiler optimization and the volatile keyword" from the armcc User Guide.
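As an illustration of the word-sized case, the following minimal sketch copies a buffer to a destination one word at a time through a volatile-qualified pointer. The function name copy_words_volatile() and its parameters are hypothetical and are not part of any ARM library. Based on the User Guide text quoted above, each store to the destination is expected to be issued as a separate word access, but only the destination is volatile here, so the compiler remains free to read the source buffer with multi-word loads. The generated code should still be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy 'count' 32-bit words to 'dst' using one volatile word store per word.
   'dst' is assumed to be word-aligned, for example a region in the slave
   that cannot accept INCR bursts. */
void copy_words_volatile(volatile uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i];  /* volatile store: performed as written in the source */
    }
}
```

If the source buffer is also located in the affected slave, the source pointer can be volatile-qualified in the same way.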

An example of where the volatile keyword does not help is shown below, in a piece of code that accesses a 64-bit peripheral:

void perip_64bitaccess(void)
{
  unsigned long long value = 0x0123456789ABCDEFULL;
  *((volatile unsigned long long*) (0xA0000000)) = value;
}

In this example, the compiler is likely to generate a store doubleword (STRD) instruction or a store multiple (STM) instruction. There are different approaches to prevent the compiler from generating such an instruction. The following C example splits the store into two word-sized volatile accesses:

void perip_64bitaccess(void)
{
  unsigned long long value = 0x0123456789ABCDEFULL;
  /* Low word to the lower address, high word to the next word address */
  *((volatile unsigned int*) (0xA0000000)) = (unsigned int)value;
  *((volatile unsigned int*) (0xA0000000+4)) = (unsigned int)(value>>32);
}

When the above program is compiled for Cortex-R5, the ARM Compiler generates the following instruction sequence:

    MOV r0,#0xa0000000
    LDR r1,[pc,#12] ; [0x18] = 0x89abcdef
    STR r1,[r0,#0]
    LDR r1,[pc,#8] ; [0x1c] = 0x1234567
    STR r1,[r0,#4]
    BX lr

    DCD 2309737967
    DCD 19088743
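The same approach can be wrapped in small helper functions so that every 64-bit register access in a driver goes through word-sized volatile accesses. This is a sketch only: the names write64_as_words() and read64_as_words() are hypothetical, and the word order (low word at the lower address) simply follows the example above and must match what the peripheral expects.

```c
#include <stdint.h>

/* Write a 64-bit value as two separate 32-bit volatile stores.
   'base' is assumed to be the word-aligned address of the 64-bit register. */
void write64_as_words(volatile uint32_t *base, uint64_t value)
{
    base[0] = (uint32_t)value;          /* low word, lower address */
    base[1] = (uint32_t)(value >> 32);  /* high word, next word address */
}

/* Read a 64-bit value as two separate 32-bit volatile loads. */
uint64_t read64_as_words(const volatile uint32_t *base)
{
    uint32_t lo = base[0];
    uint32_t hi = base[1];

    return ((uint64_t)hi << 32) | lo;
}
```

For the peripheral in the example, a call might look like write64_as_words((volatile uint32_t *)0xA0000000, value). As with the original example, the generated code should be inspected to confirm that two STR instructions are produced.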

An alternative solution is to use assembly language or the embedded assembler, for example:

__asm void stm_llout(unsigned long long* addr, unsigned long long value)
{
  STR r2, [r0]
  STR r3, [r0,#4]
  BX lr
}

In the examples above, the two word stores are separate instructions, so an interrupt can be taken between them. It might therefore also be necessary to disable interrupts (or certain exceptions) before calling the function stm_llout(), and to enable them again after it returns. For example, this could be achieved by using the __disable_irq and __enable_irq compiler intrinsics.
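A minimal sketch of such a guarded call is shown below. The wrapper name stm_llout_guarded() and the IRQ_DISABLE/IRQ_ENABLE macros are hypothetical. When the code is built with armcc (detected here through the __CC_ARM predefined macro), the macros map to the intrinsics named above and stm_llout() is assumed to be the embedded assembler routine from the earlier example. Otherwise, host-side stand-ins are used so that the sketch can be compiled and exercised off-target. The sketch also assumes that IRQs were enabled on entry, because it re-enables them unconditionally.

```c
#if defined(__CC_ARM)
/* armcc build: use the compiler intrinsics; any return value is ignored. */
#define IRQ_DISABLE() ((void)__disable_irq())
#define IRQ_ENABLE()  __enable_irq()
/* Embedded assembler routine shown earlier, assumed to be defined elsewhere. */
extern void stm_llout(unsigned long long *addr, unsigned long long value);
#else
/* Hypothetical host stand-ins, for off-target compilation and testing only. */
static int host_irq_masked = 0;        /* 1 while "interrupts" are masked */
static int host_store_was_masked = 0;  /* mask state seen during the store */
#define IRQ_DISABLE() (host_irq_masked = 1)
#define IRQ_ENABLE()  (host_irq_masked = 0)
static void stm_llout(unsigned long long *addr, unsigned long long value)
{
    host_store_was_masked = host_irq_masked;
    *addr = value;
}
#endif

/* Perform the two-word store with IRQs masked, so that no IRQ handler can
   run between the two STR instructions.
   Assumption: IRQs were enabled when this function was called. */
void stm_llout_guarded(unsigned long long *addr, unsigned long long value)
{
    IRQ_DISABLE();
    stm_llout(addr, value);
    IRQ_ENABLE();
}
```

If the function can be called with interrupts already disabled, the previous interrupt state should be saved and restored instead of being enabled unconditionally.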

[1] Before considering the software solution above, if there is a slave in the system that is not compliant with the Advanced Microcontroller Bus Architecture (AMBA), check the technical documentation for the ARM processor being used in the system to see if a workaround is available. If it is unclear whether a workaround exists, contact ARM Technical Support by raising a support case.

Article last edited on: 2014-02-28 09:16:56

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential