4.17 Block copy with LDM and STM

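When you copy several words between consecutive memory locations, you can often make code more efficient by using LDM and STM, which transfer multiple registers in a single instruction, instead of a sequence of LDR and STR instructions.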

Example of block copy without LDM and STM

The following example is an ARM code routine that copies a set of words from a source location to a destination a single word at a time:

        AREA  Word, CODE, READONLY  ; name the block of code
num     EQU   20                    ; set number of words to be copied
        ENTRY                       ; mark the first instruction called
start
        LDR   r0, =src              ; r0 = pointer to source block
        LDR   r1, =dst              ; r1 = pointer to destination block
        MOV   r2, #num              ; r2 = number of words to copy
wordcopy
        LDR   r3, [r0], #4          ; load a word from the source and
        STR   r3, [r1], #4          ; store it to the destination
        SUBS  r2, r2, #1            ; decrement the counter
        BNE   wordcopy              ; ... copy more
stop
        MOV   r0, #0x18             ; angel_SWIreason_ReportException
        LDR   r1, =0x20026          ; ADP_Stopped_ApplicationExit
        SVC   #0x123456             ; ARM semihosting (formerly SWI)

        AREA  BlockData, DATA, READWRITE
src     DCD   1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4
dst     DCD   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
        END

You can make this module more efficient by using LDM and STM for as much of the copying as possible. Eight is a sensible number of words to transfer at a time, given the number of available registers. You can find the number of eight-word multiples in the block to be copied (if R2 = number of words to be copied) using:

    MOVS   r3, r2, LSR #3    ; number of eight-word multiples

You can use this value to control the number of iterations through a loop that copies eight words per iteration. When there are fewer than eight words left, you can find the number of words left (assuming that R2 has not been corrupted) using:

    ANDS   r2, r2, #7        ; number of words left over
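
As a worked illustration (an addition to the original text, not part of the routine itself), take the value num = 20 that the examples in this section use. Assuming R2 holds 20 on entry, the two instructions behave as annotated in this sketch:

    MOV    r2, #20           ; r2 = 20 words to copy (binary 10100)
    MOVS   r3, r2, LSR #3    ; r3 = 20 >> 3 = 2 eight-word multiples, Z clear
    ANDS   r2, r2, #7        ; r2 = 20 AND 7 = 4 words left over, Z clear

Because both instructions set the condition flags, a BEQ placed directly after either one skips the corresponding loop when the result is zero. For 20 words, the eight-word loop therefore runs twice and the single-word loop four times, instead of twenty single-word iterations.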

Example of block copy using LDM and STM

The following example lists the block copy module rewritten to use LDM and STM for copying:

      AREA   Block, CODE, READONLY ; name this block of code
num   EQU    20                    ; set number of words to be copied
      ENTRY                        ; mark the first instruction called
start
      LDR    r0, =src              ; r0 = pointer to source block
      LDR    r1, =dst              ; r1 = pointer to destination block
      MOV    r2, #num              ; r2 = number of words to copy
      MOV    sp, #0x400            ; Set up stack pointer (sp)
blockcopy
      MOVS   r3, r2, LSR #3        ; Number of eight-word multiples
      BEQ    copywords             ; Fewer than eight words to move?
      PUSH   {r4-r11}              ; Save some working registers
octcopy
      LDM    r0!, {r4-r11}         ; Load 8 words from the source
      STM    r1!, {r4-r11}         ; and put them at the destination
      SUBS   r3, r3, #1            ; Decrement the counter
      BNE    octcopy               ; ... copy more
      POP    {r4-r11}              ; Don't require these now - restore
                                   ; originals
copywords
      ANDS   r2, r2, #7            ; Number of leftover words to copy
      BEQ    stop                  ; No words left to copy?
wordcopy
      LDR    r3, [r0], #4          ; Load a word from the source and
      STR    r3, [r1], #4          ; store it to the destination
      SUBS   r2, r2, #1            ; Decrement the counter
      BNE    wordcopy              ; ... copy more
stop
      MOV    r0, #0x18             ; angel_SWIreason_ReportException
      LDR    r1, =0x20026          ; ADP_Stopped_ApplicationExit
      SVC    #0x123456             ; ARM semihosting (formerly SWI)

      AREA   BlockData, DATA, READWRITE
src   DCD    1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4
dst   DCD    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
      END

Note:

The purpose of this example is to show the use of the LDM and STM instructions. There are other ways to perform bulk copy operations, the most efficient of which depends on many factors and is outside the scope of this document.

Non-Confidential ARM DUI0473M
Copyright © 2010-2016 ARM Limited or its affiliates. All rights reserved.