17.7. Multiplies

The multiplier is a three-cycle pipeline. Early result forwarding is not possible, except to the internal accumulate path. As a result, a subsequent multiply accumulate that uses the result as its accumulate operand can use it one cycle earlier than any other use of the result.

Certain multiplies require:

Multiplies with 64-bit results take two cycles to write their results. Consequently, they have two result latencies, with the low half of the result always available first. The multiplicand and multiplier are Early registers because both are required at the start of MAC1.
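The two result latencies of a 64-bit multiply can be represented as a pair of values. The following is a minimal sketch, not part of the processor documentation. It assumes that an entry written as "4/5" in the result latency column gives the low-half latency first and the high-half latency second, which matches the statement that the low half is always available first. The names `ResultLatency` and `parse_result_latency` are hypothetical.

```python
from typing import NamedTuple, Optional


class ResultLatency(NamedTuple):
    """Result latency in cycles (hypothetical helper type)."""
    low: int             # low half of a 64-bit result, or the whole 32-bit result
    high: Optional[int]  # high half of a 64-bit result, None for 32-bit results


def parse_result_latency(entry: str) -> ResultLatency:
    """Parse a result latency entry such as "4" or "4/5".

    Assumption: in an "a/b" entry, a is the low-half latency and b is the
    high-half latency, because the low half is always available first.
    """
    parts = [int(p) for p in entry.split("/")]
    if len(parts) == 1:
        return ResultLatency(low=parts[0], high=None)
    if len(parts) == 2:
        return ResultLatency(low=parts[0], high=parts[1])
    raise ValueError(f"unexpected latency entry: {entry!r}")


# Example: the SMULL(S) entry from the table of multiply timings.
smull = parse_result_latency("4/5")
```

Under this reading, an instruction that consumes only the low half of an SMULL result can be scheduled one cycle earlier than one that consumes the high half.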

Table 17.10 shows the cycle timing behavior of example multiply instructions.

Table 17.10. Example multiply instruction cycle timing behavior

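| Example instruction | Cycles | Cycles if sets flags | Early register | Late register | Result latency |
|---|---|---|---|---|---|
| MUL(S) | 2 | 5 | <Rm>, <Rs> | - | 4 |
| MLA(S), MLS | 2 | 5 | <Rm>, <Rs> | <Rn> | 4 |
| SMULL(S) | 3 | 6 | <Rm>, <Rs> | - | 4/5 |
| UMULL(S) | 3 | 6 | <Rm>, <Rs> | - | 4/5 |
| SMLAL(S) | 3 | 6 | <Rm>, <Rs> | <RdLo> | 4/5 |
| UMLAL(S) | 3 | 6 | <Rm>, <Rs> | <RdLo> | 4/5 |
| SMULxy | 1 | - | <Rm>, <Rs> | - | 3 |
| SMLAxy | 1 | - | <Rm>, <Rs> | - | 3 |
| SMULWy | 1 | - | <Rm>, <Rs> | - | 3 |
| SMLAWy | 1 | - | <Rm>, <Rs> | - | 3 |
| SMLALxy | 2 | - | <Rm>, <Rs> | <RdHi> | 3/4 |
| SMUAD, SMUADX | 1 | - | <Rm>, <Rs> | - | 3 |
| SMLAD, SMLADX | 1 | - | <Rm>, <Rs> | - | 3 |
| SMUSD, SMUSDX | 1 | - | <Rm>, <Rs> | - | 3 |
| SMLSD, SMLSDX | 1 | - | <Rm>, <Rs> | - | 3 |
| SMMUL, SMMULR | 2 | - | <Rm>, <Rs> | - | 4 |
| SMMLA, SMMLAR | 2 | - | <Rm>, <Rs> | <Rn> | 4 |
| SMMLS, SMMLSR | 2 | - | <Rm>, <Rs> | <Rn> | 4 |
| SMLALD, SMLALDX | 2 | - | <Rm>, <Rs> | <RdHi> | 3/4 |
| SMLSLD, SMLSLDX | 2 | - | <Rm>, <Rs> | <RdHi> | 3/4 |
| UMAAL | 3 | - | <Rm>, <Rs> | <RdLo> | 4/5 |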

Note

Result latency is one less if the result is used as the accumulate register for a subsequent multiply accumulate.
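As an illustration of the note, the adjustment can be written as a small lookup. This is a sketch under stated assumptions, not a timing model of the processor: the dictionary holds only a few single-latency rows copied from Table 17.10, and the names `RESULT_LATENCY` and `effective_latency` are hypothetical.

```python
# Result latencies in cycles for a few rows of Table 17.10
# (only instructions with a single 32-bit result are listed here).
RESULT_LATENCY = {
    "MUL": 4,
    "MLA": 4,
    "SMULxy": 3,
    "SMMLA": 4,
}


def effective_latency(producer: str, feeds_accumulate: bool) -> int:
    """Return the result latency seen by the consuming instruction.

    feeds_accumulate is True only when the consumer is a subsequent
    multiply accumulate that uses the result as its accumulate register.
    In that case the latency is one less than the table value.
    """
    base = RESULT_LATENCY[producer]
    return base - 1 if feeds_accumulate else base


# A MUL result used as the accumulate register of a following MLA,
# compared with the same result used as a multiplicand.
as_accumulate = effective_latency("MUL", feeds_accumulate=True)
as_multiplicand = effective_latency("MUL", feeds_accumulate=False)
```

The reduction applies only to the accumulate operand. A result used as the multiplicand or multiplier of the next multiply still incurs the full latency from the table.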

Copyright © 2005-2007 ARM Limited. All rights reserved. ARM DDI 0290G
Non-Confidential