B.3. Load and store instructions

Load and store instructions are classed as:

- single load and store operations
- load multiple and store multiple operations.

For load multiple and store multiple instructions, the number of registers in the register list usually determines the number of cycles the instruction requires.

The Cortex-R7 MPCore processor has an optimized path from a load instruction to a subsequent data processing instruction, saving 1 cycle on the load-use penalty.
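To make the one-cycle saving concrete, the following Python sketch counts how long the first instruction that consumes a loaded value waits. It is a deliberately simplified model, not the Cortex-R7 pipeline. It assumes that one instruction issues per cycle, in order, and that a value with result latency L is usable L cycles after the load issues. The latencies 2 and 3 are the fast forward and other-case values that Table B.3 lists for LDM. `load_use_stall` is a hypothetical helper.

```python
def load_use_stall(result_latency: int, independent_between: int) -> int:
    """Hypothetical helper: stall cycles for the first consumer of a loaded value.

    Simplified in-order model (an assumption, not the documented pipeline):
    the load issues in cycle 0, each later instruction issues one cycle after
    the previous one, and the loaded value is usable from cycle result_latency.
    """
    # Without a stall, the consumer would issue in cycle 1 + independent_between.
    natural_issue = 1 + independent_between
    return max(0, result_latency - natural_issue)


FAST_FORWARD_LATENCY = 2  # Table B.3, LDM, fast forward case
OTHER_LATENCY = 3         # Table B.3, LDM, other cases

for gap in range(3):
    ff = load_use_stall(FAST_FORWARD_LATENCY, gap)
    other = load_use_stall(OTHER_LATENCY, gap)
    print(f"{gap} independent instruction(s): fast forward stalls {ff}, other case stalls {other}")
```

Under this model, a consumer placed directly after the load stalls 1 cycle on the fast forward path and 2 cycles otherwise. That matches the one-cycle saving described above.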

This path is used when the following conditions are met:

Table B.2 shows cycle timings for single load and store operations. The result latency is the latency of the first loaded register.

Table B.2. Single load and store operation cycle timings

Instruction cycles    AGU cycles    Result latency
                                    Fast forward cases    Other cases
LDR ,[reg]
LDR ,[reg imm]
LDR ,[reg reg]
LDR ,[reg reg LSL #2]
LDR ,[reg reg LSL #3]

LDR ,[reg reg LSL reg]
LDR ,[reg reg LSR reg]
LDR ,[reg reg ASR reg]
LDR ,[reg reg ROR reg]
LDR ,[reg reg, RRX]

LDRB ,[reg]
LDRB ,[reg imm]
LDRB ,[reg reg]
LDRB ,[reg reg LSL #2]
LDRB ,[reg reg LSL #3]
LDRH ,[reg]
LDRH ,[reg imm]
LDRH ,[reg reg]
LDRH ,[reg reg LSL #2]
LDRH ,[reg reg LSL #3]

LDRB ,[reg reg LSL reg]
LDRB ,[reg reg ASR reg]
LDRB ,[reg reg LSL reg]
LDRB ,[reg reg ASR reg]
LDRH ,[reg reg LSL reg]
LDRH ,[reg reg ASR reg]
LDRH ,[reg reg LSL reg]
LDRH ,[reg reg ASR reg]

The Cortex-R7 MPCore processor can load or store two 32-bit registers in each cycle. However, to access 64 bits, the address must be 64-bit aligned.

This scheduling is done in the Address Generation Unit (AGU). The number of cycles required by the AGU to process the load multiple or store multiple operations depends on the length of the register list and the 64-bit alignment of the address. The resulting latency is the latency of the first loaded register. Table B.3 shows the cycle timings for load multiple operations.
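The pairing behind these AGU cycle counts can be sketched as follows. This Python model is an assumption inferred from the counts in Tables B.3 and B.4, not a description taken from the processor documentation. Each AGU cycle moves at most two registers, and two registers can share a cycle only when the current address is on a 64-bit boundary. A list that starts at an address that is 32-bit but not 64-bit aligned therefore moves its first register alone. `agu_beats` and the register names are illustrative.

```python
from __future__ import annotations


def agu_beats(registers: list[str], aligned_64: bool) -> list[tuple[str, ...]]:
    """Hypothetical model: group an LDM/STM register list into per-cycle AGU beats.

    Assumption inferred from Tables B.3 and B.4: a beat moves at most two
    32-bit registers, and only from a 64-bit aligned address.
    """
    beats: list[tuple[str, ...]] = []
    i = 0
    if not aligned_64 and registers:
        # The first word moves alone and brings the address to a 64-bit boundary.
        beats.append((registers[0],))
        i = 1
    while i < len(registers):
        # From here on the address is 64-bit aligned: move up to two registers.
        beats.append(tuple(registers[i:i + 2]))
        i += 2
    return beats


regs = ["r4", "r5", "r6", "r7"]
print(agu_beats(regs, aligned_64=True))   # [('r4', 'r5'), ('r6', 'r7')]
print(agu_beats(regs, aligned_64=False))  # [('r4',), ('r5', 'r6'), ('r7',)]
```

For the four-register list, this model gives 2 and 3 cycles. These are the values that Table B.3 lists for an aligned and an unaligned LDM of 4 registers.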

Table B.3. Load multiple operations cycle timings

Instruction          AGU cycles to process the instruction   Resulting latency
                     Aligned on a        Not aligned on a    Fast forward  Other
                     64-bit boundary     64-bit boundary     case          cases
LDM ,{1 register}    1                   1                   2             3
LDM ,{2 registers}
LDM ,{3 registers}   2                   2                   2             3
LDM ,{4 registers}   2                   3                   2             3
LDM ,{5 registers}   3                   3                   2             3
LDM ,{6 registers}   3                   4                   2             3
LDM ,{7 registers}   4                   4                   2             3
LDM ,{8 registers}   4                   5                   2             3
LDM ,{9 registers}   5                   5                   2             3
LDM ,{10 registers}  5                   6                   2             3
LDM ,{11 registers}  6                   6                   2             3
LDM ,{12 registers}  6                   7                   2             3
LDM ,{13 registers}  7                   7                   2             3
LDM ,{14 registers}  7                   8                   2             3
LDM ,{15 registers}  8                   8                   2             3
LDM ,{16 registers}  8                   9                   2             3
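For 1 and 3 to 16 registers, the rows of Table B.3 follow a simple pattern, which the Python sketch below captures. An aligned list takes ceil(n / 2) AGU cycles and an unaligned list takes floor(n / 2) + 1. The first-register latency stays at 2 (fast forward) or 3 (other cases). This closed form is fitted to the table and is not a formula from the documentation. `ldm_timing` and `LdmTiming` are hypothetical names, and the value returned for two registers is an extrapolation.

```python
from typing import NamedTuple


class LdmTiming(NamedTuple):
    agu_aligned: int    # AGU cycles, address aligned on a 64-bit boundary
    agu_unaligned: int  # AGU cycles, address not aligned on a 64-bit boundary
    latency_fast: int   # latency of the first loaded register, fast forward case
    latency_other: int  # latency of the first loaded register, other cases


def ldm_timing(num_registers: int) -> LdmTiming:
    """Hypothetical closed form fitted to Table B.3 (rows for 1 and 3 to 16 registers).

    The result for 2 registers is an extrapolation of the same pattern.
    """
    if not 1 <= num_registers <= 16:
        raise ValueError("an LDM register list holds 1 to 16 registers")
    n = num_registers
    return LdmTiming((n + 1) // 2, n // 2 + 1, 2, 3)


print(ldm_timing(7))  # LdmTiming(agu_aligned=4, agu_unaligned=4, latency_fast=2, latency_other=3)
```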

Table B.4 shows the cycle timings of store multiple operations.

Table B.4. Store multiple operations cycle timings

Instruction          AGU cycles
                     Aligned on a        Not aligned on a
                     64-bit boundary     64-bit boundary
STM ,{1 register}    1                   1
STM ,{2 registers}
STM ,{3 registers}   2                   2
STM ,{4 registers}   2                   3
STM ,{5 registers}   3                   3
STM ,{6 registers}   3                   4
STM ,{7 registers}   4                   4
STM ,{8 registers}   4                   5
STM ,{9 registers}   5                   5
STM ,{10 registers}  5                   6
STM ,{11 registers}  6                   6
STM ,{12 registers}  6                   7
STM ,{13 registers}  7                   7
STM ,{14 registers}  7                   8
STM ,{15 registers}  8                   8
STM ,{16 registers}  8                   9
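The AGU columns of Table B.4 match those of Table B.3 for every register count listed with values, so a single helper can cover both LDM and STM. The Python sketch below uses such a helper to estimate what misalignment costs in a block copy built from LDM/STM pairs. The cost model is hypothetical. It assumes that each chunk is one LDM followed by one STM of the same register list and that source and destination share the same 64-bit alignment. It also assumes that AGU cycles simply add up with no overlap. `agu_cycles` and `copy_agu_cycles` are illustrative names.

```python
def agu_cycles(num_registers: int, aligned_64: bool) -> int:
    """Hypothetical helper: AGU cycles for an LDM or STM of num_registers registers.

    Closed form observed in Tables B.3 and B.4 for 1 and 3 to 16 registers;
    not a documented formula.
    """
    n = num_registers
    return (n + 1) // 2 if aligned_64 else n // 2 + 1


def copy_agu_cycles(words: int, regs_per_chunk: int, aligned_64: bool) -> int:
    """Hypothetical estimate of AGU cycles to copy `words` 32-bit words.

    Assumed cost model: one LDM plus one STM per chunk, matching alignment on
    both sides, and AGU cycles that add up without overlapping.
    """
    if regs_per_chunk < 2 or regs_per_chunk > 16 or regs_per_chunk % 2:
        # An even chunk size keeps every chunk starting at the same alignment.
        raise ValueError("regs_per_chunk must be an even number from 2 to 16")
    total = 0
    remaining = words
    while remaining > 0:
        chunk = min(regs_per_chunk, remaining)
        total += 2 * agu_cycles(chunk, aligned_64)  # one LDM plus one STM
        remaining -= chunk
    return total


for aligned in (True, False):
    print(f"64 words, 8-register chunks, aligned_64={aligned}: "
          f"{copy_agu_cycles(64, 8, aligned)} AGU cycles")
```

Under this model, a 64-word copy spends 64 AGU cycles when aligned and 80 when misaligned. The difference is one extra cycle for each of the sixteen 8-register LDM and STM instructions.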

Copyright © 2012, 2014 ARM. All rights reserved. ARM DDI 0458C