Load and store instructions are classed as:
- single load and store instructions, such as LDR instructions
- load and store multiple instructions, such as LDM instructions.
For load multiple and store multiple instructions, the number of registers in the register list usually determines the number of cycles required to execute a load or store instruction.
The Cortex-A9 processor has special fast-forward paths that immediately forward data from a load instruction to a subsequent data-processing instruction in the execution units.
A fast-forward path is used when both of the following conditions are met:
- the data-processing instruction is one of: SUB, RSB, ADD, ADC, SBC, RSC, CMN, MVN, or CMP
- the forwarded source register is not part of a shift operation.
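The two conditions above can be expressed as a simple predicate. The following is a minimal sketch, not a description of the hardware logic: the function name `uses_fast_forward` and its parameters are hypothetical, and it encodes only the two conditions listed in this section.

```python
# Mnemonics listed in this section as eligible for the fast-forward path.
FAST_FORWARD_OPS = frozenset(
    {"SUB", "RSB", "ADD", "ADC", "SBC", "RSC", "CMN", "MVN", "CMP"}
)


def uses_fast_forward(mnemonic: str, in_shift_operation: bool) -> bool:
    """Return True if a consumer of loaded data meets both listed conditions.

    mnemonic: base mnemonic of the data-processing instruction (assumed to
        have any condition code or S suffix already stripped).
    in_shift_operation: True if the forwarded source register is part of a
        shift operation in the consuming instruction.
    """
    return mnemonic.upper() in FAST_FORWARD_OPS and not in_shift_operation
```

Under these assumptions, `uses_fast_forward("ADD", False)` is true, while `uses_fast_forward("ADD", True)` and `uses_fast_forward("AND", False)` are false.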
Table B.2 shows cycle timing for single load and store operations. The result latency is the latency of the first loaded register.
Table B.2. Single load and store operation cycle timings
| Instruction cycles | AGU cycles | Result latency (fast forward cases) | Result latency (other cases) |
|---|---|---|---|
| | 1 | 2 | 3 |
| | 1 | 3 | 4 |
| | 2 | 3 | 4 |
| | 2 | 4 | 5 |
The Cortex-A9 processor can load or store two 32-bit registers in each cycle. However, to access 64 bits, the address must be 64-bit aligned.
This scheduling is done in the Address Generation Unit (AGU). The number of cycles required by the AGU to process a load multiple or store multiple operation depends on the length of the register list and the 64-bit alignment of the address. The resulting latency is the latency of the first loaded register. Table B.3 shows the cycle timings for load multiple operations.
Table B.3. Load multiple operations cycle timings
| Instruction | AGU cycles (address 64-bit aligned) | AGU cycles (address not 64-bit aligned) | Resulting latency (fast forward case) | Resulting latency (other cases) |
|---|---|---|---|---|
| LDM ,{1 register} | 1 | 1 | 2 | 3 |
| LDM ,{2 registers} | 1 | 2 | 2 | 3 |
| LDM ,{3 registers} | 2 | 2 | 2 | 3 |
| LDM ,{4 registers} | 2 | 3 | 2 | 3 |
| LDM ,{5 registers} | 3 | 3 | 2 | 3 |
| LDM ,{6 registers} | 3 | 4 | 2 | 3 |
| LDM ,{7 registers} | 4 | 4 | 2 | 3 |
| LDM ,{8 registers} | 4 | 5 | 2 | 3 |
| LDM ,{9 registers} | 5 | 5 | 2 | 3 |
| LDM ,{10 registers} | 5 | 6 | 2 | 3 |
| LDM ,{11 registers} | 6 | 6 | 2 | 3 |
| LDM ,{12 registers} | 6 | 7 | 2 | 3 |
| LDM ,{13 registers} | 7 | 7 | 2 | 3 |
| LDM ,{14 registers} | 7 | 8 | 2 | 3 |
| LDM ,{15 registers} | 8 | 8 | 2 | 3 |
| LDM ,{16 registers} | 8 | 9 | 2 | 3 |
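The AGU cycle counts in Table B.3 follow a regular pattern. With a 64-bit aligned address the counts match registers being processed in pairs, and with an unaligned address they match one register processed alone followed by pairs. The sketch below reproduces the two AGU columns under that reading. The function name `ldm_agu_cycles` is hypothetical, and the closed-form expressions are inferred from the table values rather than stated in the text.

```python
def ldm_agu_cycles(num_registers: int, aligned_64: bool) -> int:
    """AGU cycles for a load multiple, as a formula fitted to Table B.3.

    num_registers: length of the register list (1 to 16).
    aligned_64: True if the address is aligned on a 64-bit boundary.
    """
    if not 1 <= num_registers <= 16:
        raise ValueError("register list must contain 1 to 16 registers")
    if aligned_64:
        # Two 32-bit registers per cycle: ceil(n / 2).
        return (num_registers + 1) // 2
    # Fitted to the unaligned column: floor(n / 2) + 1.
    return num_registers // 2 + 1
```

For example, `ldm_agu_cycles(4, True)` gives 2 and `ldm_agu_cycles(4, False)` gives 3, matching the LDM ,{4 registers} row.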
Table B.4 shows the cycle timings of store multiple operations.
Table B.4. Store multiple operations cycle timings
| Instruction | AGU cycles (address 64-bit aligned) | AGU cycles (address not 64-bit aligned) |
|---|---|---|
| STM ,{1 register} | 1 | 1 |
| STM ,{2 registers} | 1 | 2 |
| STM ,{3 registers} | 2 | 2 |
| STM ,{4 registers} | 2 | 3 |
| STM ,{5 registers} | 3 | 3 |
| STM ,{6 registers} | 3 | 4 |
| STM ,{7 registers} | 4 | 4 |
| STM ,{8 registers} | 4 | 5 |
| STM ,{9 registers} | 5 | 5 |
| STM ,{10 registers} | 5 | 6 |
| STM ,{11 registers} | 6 | 6 |
| STM ,{12 registers} | 6 | 7 |
| STM ,{13 registers} | 7 | 7 |
| STM ,{14 registers} | 7 | 8 |
| STM ,{15 registers} | 8 | 8 |
| STM ,{16 registers} | 8 | 9 |
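Table B.4 can also be read as a misalignment penalty. The sketch below transcribes the table's two AGU columns, taking the rows in order as register lists of 1 to 16 registers, and reports which list lengths cost an extra AGU cycle when the address is not 64-bit aligned. The names `STM_AGU_CYCLES` and `stm_alignment_penalty` are hypothetical and exist only for illustration.

```python
# (aligned, unaligned) AGU cycles transcribed from Table B.4,
# rows taken in order as register lists of 1 to 16 registers.
STM_AGU_CYCLES = [
    (1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 5),
    (5, 5), (5, 6), (6, 6), (6, 7), (7, 7), (7, 8), (8, 8), (8, 9),
]


def stm_alignment_penalty(num_registers: int) -> int:
    """Extra AGU cycles for a store multiple at a non-64-bit-aligned address."""
    if not 1 <= num_registers <= 16:
        raise ValueError("register list must contain 1 to 16 registers")
    aligned, unaligned = STM_AGU_CYCLES[num_registers - 1]
    return unaligned - aligned


# Register-list lengths that cost an extra cycle when unaligned.
penalised = [n for n in range(1, 17) if stm_alignment_penalty(n) > 0]
```

In this table, only even-length register lists incur the extra cycle, and the penalty is never more than one AGU cycle.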