3.3.1. Cortex-M3 instructions

The processor implements the ARMv7-M Thumb instruction set. Table 3.1 shows the Cortex-M3 instructions and their cycle counts. The cycle counts are based on a system with zero wait states.

Within the assembler syntax, depending on the operation, the <op2> field can be replaced with an immediate value or with a register, optionally shifted by an immediate amount.

For brevity, not all load and store addressing modes are shown. See the ARMv7-M Architecture Reference Manual for more information.

Table 3.1 uses the following abbreviations in the Cycles column:

P: The number of cycles required for a pipeline refill. This ranges from 1 to 3 depending on the alignment and width of the target instruction, and whether the processor can speculate the target address early.

B: The number of cycles required to perform the barrier operation. For DSB and DMB, the minimum number of cycles is zero. For ISB, the minimum is the number of cycles required for a pipeline refill.

N: The number of registers in the register list to be loaded or stored, including PC or LR.

W: The number of cycles spent waiting for an appropriate event.
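Each Cycles entry in Table 3.1 combines a constant with these terms. The C below is a minimal sketch that evaluates the best and worst case for entries built from N and P. It treats the pipeline refill P as 1 to 3 cycles and N as the number of registers in the list, including PC or LR. The type and function names are hypothetical. B and W are left out because the text gives no upper bound for them.

```c
#include <assert.h>

/* Hypothetical model of a Cycles entry of the form base [+ N] [+ P],
 * assuming zero wait states. B and W are not modeled. */
#define P_MIN 1u /* fastest pipeline refill */
#define P_MAX 3u /* slowest pipeline refill */

typedef struct {
    unsigned base; /* constant part, e.g. the 1 in "1 + N + P" */
    int adds_n;    /* nonzero if the entry includes + N */
    int adds_p;    /* nonzero if the entry includes + P */
} cycle_formula;

/* n_regs counts every register in the list, including PC or LR. */
unsigned cycles_min(cycle_formula f, unsigned n_regs)
{
    assert(!f.adds_n || n_regs > 0u); /* a register list is never empty */
    return f.base + (f.adds_n ? n_regs : 0u) + (f.adds_p ? P_MIN : 0u);
}

unsigned cycles_max(cycle_formula f, unsigned n_regs)
{
    assert(!f.adds_n || n_regs > 0u);
    return f.base + (f.adds_n ? n_regs : 0u) + (f.adds_p ? P_MAX : 0u);
}
```

For POP {r4-r7, PC}, listed as 1 + N + P with N = 5, this gives 7 to 9 cycles. For LDM Rn, {r0-r2}, listed as 1 + N, it gives exactly 4.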

Table 3.1. Cortex-M3 instruction set summary

Operation     Description                  Assembler                    Cycles
------------  ---------------------------  ---------------------------  -------------
Move          Register                     MOV Rd, <op2>                1
              16-bit immediate             MOVW Rd, #<imm>              1
              Immediate into top           MOVT Rd, #<imm>              1
              To PC                        MOV PC, Rm                   1 + P
Add           Add                          ADD Rd, Rn, <op2>            1
              Add to PC                    ADD PC, PC, Rm               1 + P
              Add with carry               ADC Rd, Rn, <op2>            1
              Form address                 ADR Rd, <label>              1
Subtract      Subtract                     SUB Rd, Rn, <op2>            1
              Subtract with borrow         SBC Rd, Rn, <op2>            1
              Reverse                      RSB Rd, Rn, <op2>            1
Multiply      Multiply                     MUL Rd, Rn, Rm               1
              Multiply accumulate          MLA Rd, Rn, Rm, Ra           2
              Multiply subtract            MLS Rd, Rn, Rm, Ra           2
              Long signed                  SMULL RdLo, RdHi, Rn, Rm     3 to 5[a]
              Long unsigned                UMULL RdLo, RdHi, Rn, Rm     3 to 5[a]
              Long signed accumulate       SMLAL RdLo, RdHi, Rn, Rm     4 to 7[a]
              Long unsigned accumulate     UMLAL RdLo, RdHi, Rn, Rm     4 to 7[a]
Divide        Signed                       SDIV Rd, Rn, Rm              2 to 12[b]
              Unsigned                     UDIV Rd, Rn, Rm              2 to 12[b]
Saturate      Signed                       SSAT Rd, #<imm>, <op2>       1
              Unsigned                     USAT Rd, #<imm>, <op2>       1
Compare       Compare                      CMP Rn, <op2>                1
              Negative                     CMN Rn, <op2>                1
Logical       AND                          AND Rd, Rn, <op2>            1
              Exclusive OR                 EOR Rd, Rn, <op2>            1
              OR                           ORR Rd, Rn, <op2>            1
              OR NOT                       ORN Rd, Rn, <op2>            1
              Bit clear                    BIC Rd, Rn, <op2>            1
              Move NOT                     MVN Rd, <op2>                1
              AND test                     TST Rn, <op2>                1
              Exclusive OR test            TEQ Rn, <op2>                1
Shift         Logical shift left           LSL Rd, Rn, #<imm>           1
              Logical shift left           LSL Rd, Rn, Rs               1
              Logical shift right          LSR Rd, Rn, #<imm>           1
              Logical shift right          LSR Rd, Rn, Rs               1
              Arithmetic shift right       ASR Rd, Rn, #<imm>           1
              Arithmetic shift right       ASR Rd, Rn, Rs               1
Rotate        Rotate right                 ROR Rd, Rn, #<imm>           1
              Rotate right                 ROR Rd, Rn, Rs               1
              With extension               RRX Rd, Rn                   1
Count         Leading zeroes               CLZ Rd, Rn                   1
Load          Word                         LDR Rd, [Rn, <op2>]          2[c]
              To PC                        LDR PC, [Rn, <op2>]          2[c] + P
              Halfword                     LDRH Rd, [Rn, <op2>]         2[c]
              Byte                         LDRB Rd, [Rn, <op2>]         2[c]
              Signed halfword              LDRSH Rd, [Rn, <op2>]        2[c]
              Signed byte                  LDRSB Rd, [Rn, <op2>]        2[c]
              User word                    LDRT Rd, [Rn, #<imm>]        2[c]
              User halfword                LDRHT Rd, [Rn, #<imm>]       2[c]
              User byte                    LDRBT Rd, [Rn, #<imm>]       2[c]
              User signed halfword         LDRSHT Rd, [Rn, #<imm>]      2[c]
              User signed byte             LDRSBT Rd, [Rn, #<imm>]      2[c]
              PC relative                  LDR Rd, [PC, #<imm>]         2[c]
              Doubleword                   LDRD Rt, Rt2, [Rn, #<imm>]   1 + N
              Multiple                     LDM Rn, {<reglist>}          1 + N
              Multiple including PC        LDM Rn, {<reglist>, PC}      1 + N + P
Store         Word                         STR Rd, [Rn, <op2>]          2[c]
              Halfword                     STRH Rd, [Rn, <op2>]         2[c]
              Byte                         STRB Rd, [Rn, <op2>]         2[c]
              User word                    STRT Rd, [Rn, #<imm>]        2[c]
              User halfword                STRHT Rd, [Rn, #<imm>]       2[c]
              User byte                    STRBT Rd, [Rn, #<imm>]       2[c]
              Doubleword                   STRD Rt, Rt2, [Rn, #<imm>]   1 + N
              Multiple                     STM Rn, {<reglist>}          1 + N
Push          Push                         PUSH {<reglist>}             1 + N
              Push with link register      PUSH {<reglist>, LR}         1 + N
Pop           Pop                          POP {<reglist>}              1 + N
              Pop and return               POP {<reglist>, PC}          1 + N + P
Semaphore     Load exclusive               LDREX Rd, [Rn, #<imm>]       2
              Load exclusive half          LDREXH Rd, [Rn]              2
              Load exclusive byte          LDREXB Rd, [Rn]              2
              Store exclusive              STREX Rd, Rt, [Rn, #<imm>]   2
              Store exclusive half         STREXH Rd, Rt, [Rn]          2
              Store exclusive byte         STREXB Rd, Rt, [Rn]          2
              Clear exclusive monitor      CLREX                        1
Branch        Conditional                  B<cc> <label>                1 or 1 + P[d]
              Unconditional                B <label>                    1 + P
              With link                    BL <label>                   1 + P
              With exchange                BX Rm                        1 + P
              With link and exchange       BLX Rm                       1 + P
              Branch if zero               CBZ Rn, <label>              1 or 1 + P[d]
              Branch if non-zero           CBNZ Rn, <label>             1 or 1 + P[d]
              Byte table branch            TBB [Rn, Rm]                 2 + P
              Halfword table branch        TBH [Rn, Rm, LSL #1]         2 + P
State change  Supervisor call              SVC #<imm>                   -
              If-then-else                 IT... <cond>                 1[e]
              Disable interrupts           CPSID <flags>                1 or 2
              Enable interrupts            CPSIE <flags>                1 or 2
              Read special register        MRS Rd, <specreg>            1 or 2
              Write special register       MSR <specreg>, Rn            1 or 2
              Breakpoint                   BKPT #<imm>                  -
Extend        Signed halfword to word      SXTH Rd, <op2>               1
              Signed byte to word          SXTB Rd, <op2>               1
              Unsigned halfword            UXTH Rd, <op2>               1
              Unsigned byte                UXTB Rd, <op2>               1
Bit field     Extract unsigned             UBFX Rd, Rn, #<imm>, #<imm>  1
              Extract signed               SBFX Rd, Rn, #<imm>, #<imm>  1
              Clear                        BFC Rd, #<imm>, #<imm>       1
              Insert                       BFI Rd, Rn, #<imm>, #<imm>   1
Reverse       Bytes in word                REV Rd, Rm                   1
              Bytes in both halfwords      REV16 Rd, Rm                 1
              Signed bottom halfword       REVSH Rd, Rm                 1
              Bits in word                 RBIT Rd, Rm                  1
Hint          Send event                   SEV                          1
              Wait for event               WFE                          1 + W
              Wait for interrupt           WFI                          1 + W
              No operation                 NOP                          1
Barriers      Instruction synchronization  ISB                          1 + B
              Data memory                  DMB                          1 + B
              Data synchronization         DSB <flags>                  1 + B

[a] UMULL, SMULL, UMLAL, and SMLAL instructions use early termination depending on the size of the source values. These instructions are interruptible, that is, they can be abandoned and restarted, with a worst-case latency of one cycle.

[b] Division operations use early termination to minimize the number of cycles required based on the number of leading ones and zeroes in the input operands.

[c] Neighboring load and store single instructions can pipeline their address and data phases. This enables these instructions to complete in a single execution cycle.

[d] A conditional branch completes in a single cycle if the branch is not taken, and in 1 + P cycles if it is taken.

[e] An IT instruction can be folded onto a preceding 16-bit Thumb instruction, enabling execution in zero cycles.
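Footnotes [c], [d], and [e] describe cases where an instruction can take fewer cycles than its worst-case entry. The C below is a rough sketch of three such estimates. The function names are hypothetical, and the assumptions are marked in the comments. It covers a run of neighboring single loads or stores, the closing conditional branch of a counted loop, and a short IT block whose IT instruction may be folded. It is meant for back-of-the-envelope reasoning, not as a cycle-accurate model.

```c
#include <assert.h>

/* Hypothetical estimates for footnotes [c], [d], and [e] of Table 3.1,
 * assuming zero wait states. */

/* [c] k neighboring single loads or stores, each listed as 2 cycles.
 * ASSUMPTION: when pipelining applies, every access after the first
 * completes in one cycle; real instruction pairs may not pipeline. */
unsigned ldst_run_cycles(unsigned k, int pipelined)
{
    if (k == 0u)
        return 0u;
    return pipelined ? 2u + (k - 1u) : 2u * k;
}

/* [d] Closing conditional branch of a loop that runs `iterations` times:
 * taken (1 + P) on every pass but the last, not taken (1) on the last.
 * p is the pipeline-refill cost P, 1 to 3 cycles. */
unsigned loop_branch_cycles(unsigned iterations, unsigned p)
{
    assert(p >= 1u && p <= 3u);
    if (iterations == 0u)
        return 0u;
    return (iterations - 1u) * (1u + p) + 1u;
}

/* [e] A 16-bit instruction, then IT, then n_cond conditional instructions
 * (an IT block covers 1 to 4 instructions).
 * ASSUMPTION: the leading instruction and each conditional instruction
 * take 1 cycle; IT takes 0 cycles when folded and 1 cycle otherwise. */
unsigned it_sequence_cycles(unsigned n_cond, int folded)
{
    assert(n_cond >= 1u && n_cond <= 4u);
    unsigned it_cost = folded ? 0u : 1u;
    return 1u + it_cost + n_cond;
}
```

Under these assumptions, four back-to-back LDR instructions take 5 cycles if every access after the first pipelines and 8 if none do. A 10-iteration loop spends 19 to 37 cycles in its closing branch as P ranges from 1 to 3.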

Copyright © 2005-2008, 2010 ARM Limited. All rights reserved. ARM DDI 0337H