B.7. Multiplies

Most multiply operations cannot forward their result early, except as the accumulate value for a subsequent multiply. For a subsequent multiply accumulate, the result is available one cycle earlier than for all other uses of the result.

Certain multiplies require more than one cycle to execute, as the Cycles column of Table B.9 shows.

The multiplicand and multiplier are required as Early Regs because they are both required at the end of the Iss stage.

Flag-setting multiplies followed by a conditional instruction interlock the conditional instruction for one cycle, or two cycles if the instruction is a conditional multiply. Flag-setting multiplies followed by a flag-setting instruction interlock the flag-setting instruction for one cycle, unless the instruction is a flag-setting multiply in which case there is no interlock.
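The interlock rules above can be sketched as a small lookup function. This is only an illustrative model, not part of the processor documentation: the function name and its parameters are hypothetical, and the text does not say how the two rules combine when the following instruction is both conditional and flag-setting, so the sketch assumes the larger of the two penalties applies.

```python
def flag_interlock_cycles(conditional: bool, sets_flags: bool, is_multiply: bool) -> int:
    """Interlock cycles applied to the instruction that follows a
    flag-setting multiply (hypothetical helper, for illustration only)."""
    # Rule 1: a conditional instruction waits one cycle,
    # or two cycles if it is a conditional multiply.
    cond_penalty = 0
    if conditional:
        cond_penalty = 2 if is_multiply else 1

    # Rule 2: a flag-setting instruction waits one cycle,
    # unless it is itself a flag-setting multiply (no interlock).
    flag_penalty = 0
    if sets_flags and not is_multiply:
        flag_penalty = 1

    # Assumption: when both rules apply, the larger penalty wins.
    return max(cond_penalty, flag_penalty)


# MULS followed by ADDEQ (conditional, not a multiply)
print(flag_interlock_cycles(conditional=True, sets_flags=False, is_multiply=False))
# MULS followed by MULEQ (conditional multiply)
print(flag_interlock_cycles(conditional=True, sets_flags=False, is_multiply=True))
# MULS followed by MULS (flag-setting multiply)
print(flag_interlock_cycles(conditional=False, sets_flags=True, is_multiply=True))
```

Separating the two rules keeps each one readable against the sentence it comes from; only the final `max` encodes the assumption.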

Table B.9 shows the cycle timing behavior of example multiply instructions.

Table B.9. Example multiply instruction cycle timing behavior

Example instruction  Cycles  Early Reg   Late Reg        Result latency
MUL(S)               2       <Rn>, <Rm>  -               3
MLA(S), MLS          2       <Rn>, <Rm>  <Ra>            3
SMULL(S)             2       <Rn>, <Rm>  -               3, 3
UMULL(S)             2       <Rn>, <Rm>  -               3, 3
SMLAL(S)             2       <Rn>, <Rm>  <RdLo>, <RdHi>  3, 3
UMLAL(S)             2       <Rn>, <Rm>  <RdLo>, <RdHi>  3, 3
SMULxy               1       <Rn>, <Rm>  -               2
SMLAxy               1       <Rn>, <Rm>  -               2
SMULWy               1       <Rn>, <Rm>  -               2
SMLAWy               1       <Rn>, <Rm>  -               2
SMLALxy              2       <Rn>, <Rm>  <RdLo>, <RdHi>  3, 3
SMUAD, SMUADX        1       <Rn>, <Rm>  -               2
SMLAD, SMLADX        1       <Rn>, <Rm>  -               2
SMUSD, SMUSDX        1       <Rn>, <Rm>  -               2
SMLSD, SMLSDX        1       <Rn>, <Rm>  -               2
SMMUL, SMMULR        2       <Rn>, <Rm>  -               3
SMMLA, SMMLAR        2       <Rn>, <Rm>  <Ra>            3
SMMLS, SMMLSR        2       <Rn>, <Rm>  <Ra>            3
SMLALD, SMLALDX      1       <Rn>, <Rm>  -               2, 2
SMLSLD, SMLSLDX      1       <Rn>, <Rm>  -               2, 2
UMAAL                2       <Rn>, <Rm>  <RdLo>, <RdHi>  3, 3


Result latency is one less if the result is used as the accumulate value for a subsequent multiply accumulate. This only applies if the result is the same width as the accumulate value, that is, 32 or 64 bits.
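As a rough illustration of the note above, the following sketch adjusts a result latency taken from Table B.9 according to how the result is consumed. The dictionary holds only a few rows copied from the table, and the names `RESULT_LATENCY` and `effective_latency` are hypothetical. Applying the one-cycle reduction to both halves of a 64-bit result is an assumption of this sketch; the text does not spell that out.

```python
# A few Result latency entries copied from Table B.9.
# 64-bit results carry two latency values.
RESULT_LATENCY = {
    "MUL": (3,),
    "MLA": (3,),
    "SMULL": (3, 3),
    "SMULxy": (2,),
    "SMLALD": (2, 2),
    "UMAAL": (3, 3),
}


def effective_latency(producer: str, used_as_accumulate: bool,
                      same_width: bool = True) -> tuple:
    """Latency seen by the consumer of a multiply result.

    The latency is one less only when the result feeds the accumulate
    operand of a following multiply accumulate and has the same width
    (32 or 64 bits) as that accumulate value.
    """
    reduction = 1 if (used_as_accumulate and same_width) else 0
    # Assumption: the reduction applies to each listed latency value.
    return tuple(lat - reduction for lat in RESULT_LATENCY[producer])


# MUL r0, r1, r2 followed by MLA r4, r5, r6, r0 (r0 is the accumulate value)
print(effective_latency("MUL", used_as_accumulate=True))
# MUL r0, r1, r2 followed by ADD r4, r0, r5 (ordinary use of r0)
print(effective_latency("MUL", used_as_accumulate=False))
```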

Copyright © 2010-2011 ARM. All rights reserved. ARM DDI 0460C