6.20. Instruction speed summary

Because the CPU is pipelined, instructions overlap considerably. In a typical cycle, one instruction can be using the data path while the next is being decoded and the one after that is being fetched. For this reason Table 6.23 presents the incremental number of cycles required by an instruction, rather than the total number of cycles for which the instruction uses part of the processor. The elapsed time, in cycles, for a routine can be calculated by summing the figures listed in Table 6.23 for each instruction in the routine.

These figures assume that the instruction is actually executed. If the condition is not met, the instruction is not executed and takes one S-cycle, whatever its type. The cycle types N, S, I, and C are described in Bus cycle types.

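In Table 6.23:

n is the number of words transferred.

m is the number of 8-bit multiplier array cycles required to complete the multiply, which depends on the value of the multiplier operand.

b is the number of cycles spent in the coprocessor busy-wait loop.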

Table 6.23. ARM instruction speed summary

Instruction        Cycle count      Additional
-----------------  ---------------  --------------------------------------
Data Processing    S                +I for SHIFT(Rs); +S+N if R15 written
MSR, MRS           S                -
LDR                S+N+I            +S+N if R15 loaded
STR                2N               -
LDM                nS+N+I           +S+N if R15 loaded
STM                (n-1)S+2N        -
SWP                S+2N+I           -
B, BL              2S+N             -
SWI, trap          2S+N             -
MUL                S+mI             -
MLA                S+(m+1)I         -
MULL               S+(m+1)I         -
MLAL               S+(m+2)I         -
CDP                S+bI             -
LDC, STC           (n-1)S+2N+bI     -
MCR                N+bI+C           -
MRC                S+(b+1)I+C       -
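As an illustration of how the figures in Table 6.23 can be summed for a routine, the following is a minimal Python sketch, not part of the original manual. The function names, the "DP" label for data processing instructions, and the keyword arguments (shift_by_reg, pc_written, executed) are hypothetical names chosen for this example. The variables n, m, and b that appear in the table's formulas are supplied by the caller.

```python
def _counts(S=0, N=0, I=0, C=0):
    """Build a cycle-count record with one entry per bus cycle type."""
    return {"S": S, "N": N, "I": I, "C": C}


def instruction_cycles(op, n=1, m=1, b=0,
                       shift_by_reg=False, pc_written=False, executed=True):
    """Incremental cycles for one instruction, transcribed from Table 6.23.

    op is a mnemonic from the table; "DP" (a label assumed here) stands
    for any data processing instruction.
    """
    if not executed:
        # Condition not met: one S-cycle regardless of instruction type.
        return _counts(S=1)

    table = {
        "DP":   _counts(S=1),
        "MSR":  _counts(S=1),
        "MRS":  _counts(S=1),
        "LDR":  _counts(S=1, N=1, I=1),
        "STR":  _counts(N=2),
        "LDM":  _counts(S=n, N=1, I=1),
        "STM":  _counts(S=n - 1, N=2),
        "SWP":  _counts(S=1, N=2, I=1),
        "B":    _counts(S=2, N=1),
        "BL":   _counts(S=2, N=1),
        "SWI":  _counts(S=2, N=1),
        "MUL":  _counts(S=1, I=m),
        "MLA":  _counts(S=1, I=m + 1),
        "MULL": _counts(S=1, I=m + 1),
        "MLAL": _counts(S=1, I=m + 2),
        "CDP":  _counts(S=1, I=b),
        "LDC":  _counts(S=n - 1, N=2, I=b),
        "STC":  _counts(S=n - 1, N=2, I=b),
        "MCR":  _counts(N=1, I=b, C=1),
        "MRC":  _counts(S=1, I=b + 1, C=1),
    }
    counts = dict(table[op])

    # "Additional" column: register-specified shift on data processing.
    if op == "DP" and shift_by_reg:
        counts["I"] += 1
    # "Additional" column: R15 written (data processing) or loaded (LDR, LDM).
    if pc_written and op in ("DP", "LDR", "LDM"):
        counts["S"] += 1
        counts["N"] += 1
    return counts


def routine_cycles(instructions):
    """Sum the incremental cycles over a list of (op, kwargs) pairs."""
    total = _counts()
    for op, kwargs in instructions:
        for cycle_type, count in instruction_cycles(op, **kwargs).items():
            total[cycle_type] += count
    return total


# Example: LDR, one data processing instruction, STR, then a branch.
routine = [("LDR", {}), ("DP", {}), ("STR", {}), ("B", {})]
total = routine_cycles(routine)
print(total, sum(total.values()))
```

For this routine the sketch adds (S+N+I) + S + 2N + (2S+N), giving 4S+4N+I, or nine cycles in total. Keeping the four cycle types separate, rather than returning a single number, leaves room for a caller to weight each type differently if the memory system makes some cycle types longer than others.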

Copyright © 1994-2001. All rights reserved. ARM DDI 0029G
Non-Confidential