The multiply unit in the ARM7EJ-S processor operates in both the Execute and Memory stages of the pipeline. For this reason, the multiplier result is not available until the end of the Memory stage of the pipeline. If the following instruction requires the use of the multiplier result, then it must be interlocked so that the correct value is available. This applies to all instructions that require the multiply result for the first Execute cycle or first Memory cycle of the instruction, except for multiply accumulate instructions using the previous multiply result as the accumulator operand.
For example, the following sequence incurs a single-cycle interlock:
    MUL r0, r1, r2
    SUB r4, r0, r3
The following sequence also incurs a single-cycle interlock:
    MLA r0, r1, r2, r3
    STR r0, [r8]
The following example does not incur an interlock:
    MLA r0, r1, r2, r0
    MLA r0, r3, r4, r0
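The three sequences above can be checked with a small model of the interlock rule. This is a minimal sketch, not a description of the processor's implementation: the `Instr` record and the `interlock_cycles` function are hypothetical names introduced for illustration, only `MUL` and `MLA` are modelled as producers, and every register listed in `sources` is assumed to be needed in the first Execute or first Memory cycle of the consuming instruction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Multiplies whose result can stall a following instruction. Only the two
# mnemonics used in the examples are modelled (an assumption made for brevity).
INTERLOCKING_MULTIPLIES = {"MUL", "MLA"}


@dataclass(frozen=True)
class Instr:
    mnemonic: str
    dest: Optional[str] = None         # register written, if any
    sources: Tuple[str, ...] = ()      # registers read as ordinary operands
    accumulator: Optional[str] = None  # accumulator operand of an MLA


def interlock_cycles(prev: Instr, nxt: Instr) -> int:
    """Return 1 if nxt must stall for the multiply result of prev, else 0."""
    if prev.mnemonic not in INTERLOCKING_MULTIPLIES or prev.dest is None:
        return 0
    # An ordinary operand that reads the multiply result forces the interlock.
    if prev.dest in nxt.sources:
        return 1
    # Reading the result only as an accumulator operand is the stated exception.
    return 0


mul = Instr("MUL", dest="r0", sources=("r1", "r2"))
sub = Instr("SUB", dest="r4", sources=("r0", "r3"))
mla = Instr("MLA", dest="r0", sources=("r1", "r2"), accumulator="r3")
store = Instr("STR", sources=("r0", "r8"))
mla_a = Instr("MLA", dest="r0", sources=("r1", "r2"), accumulator="r0")
mla_b = Instr("MLA", dest="r0", sources=("r3", "r4"), accumulator="r0")

print(interlock_cycles(mul, sub))      # 1: SUB reads r0 as an operand
print(interlock_cycles(mla, store))    # 1: STR stores r0
print(interlock_cycles(mla_a, mla_b))  # 0: r0 is only the accumulator
```

Keeping the accumulator in its own field, separate from `sources`, is what lets the model express the exception for back-to-back multiply accumulate instructions.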
Table 9.10 shows the cycle timing for MUL and MLA instructions with and without interlocks.
Table 9.10. Cycle timing for MUL and MLA
| | Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|---|
| Normal | 1 | pc+3i | (pc+2i) | I cycle |
| | 2 | pc+3i | - | S cycle |
| | | | (pc+3i) | |
| Interlock | 1 | pc+3i | (pc+2i) | I cycle |
| | 2 | pc+3i | - | I cycle |
| | 3 | pc+3i | - | S cycle |
| | | | (pc+3i) | |
The MULS and MLAS instructions always take four cycles to execute, and cannot generate interlocks in following instructions.
Table 9.11 shows the cycle timing for MULS and MLAS instructions.
Table 9.11. Cycle timings for MULS and MLAS
| Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|
| 1 | pc+3i | (pc+2i) | I cycle |
| 2 | pc+3i | - | I cycle |
| 3 | pc+3i | - | I cycle |
| 4 | pc+3i | - | S cycle |
| | | (pc+3i) | |
Table 9.12 shows the cycle timing for SMULL, UMULL, SMLAL, and UMLAL instructions with and without interlocks.
Table 9.12. Cycle timing for SMULL, UMULL, SMLAL, and UMLAL
| | Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|---|
| Normal | 1 | pc+3i | (pc+2i) | I cycle |
| | 2 | pc+3i | - | I cycle |
| | 3 | pc+3i | - | S cycle |
| | | | (pc+3i) | |
| Interlock | 1 | pc+3i | (pc+2i) | I cycle |
| | 2 | pc+3i | - | I cycle |
| | 3 | pc+3i | - | I cycle |
| | 4 | pc+3i | - | S cycle |
| | | | (pc+3i) | |
The SMULLS, UMULLS, SMLALS, and UMLALS instructions always take five cycles to execute, and cannot generate interlocks in following instructions.
Table 9.13 shows the cycle timing for the SMULLS, UMULLS, SMLALS, and UMLALS instructions.
Table 9.13. Cycle timings for SMULLS, UMULLS, SMLALS, and UMLALS
| Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|
| 1 | pc+3i | (pc+2i) | I cycle |
| 2 | pc+3i | - | I cycle |
| 3 | pc+3i | - | I cycle |
| 4 | pc+3i | - | I cycle |
| 5 | pc+3i | - | S cycle |
| | | (pc+3i) | |
Table 9.14 shows the cycle timings for SMULxy, SMLAxy, SMULWy, and SMLAWy instructions with and without interlocks.
Table 9.14. Cycle timings for SMULxy, SMLAxy, SMULWy, and SMLAWy
| | Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|---|
| Normal | 1 | pc+3i | (pc+2i) | S cycle |
| | | | (pc+3i) | |
| Interlock | 1 | pc+3i | (pc+2i) | I cycle |
| | 2 | pc+3i | - | S cycle |
| | | | (pc+3i) | |
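The cycle counts in Tables 9.10 to 9.14 can be gathered into a single lookup, which is convenient when estimating the cost of a code sequence by hand. The following is a minimal sketch under stated assumptions: `CYCLES` and `multiply_cycles` are hypothetical names introduced here, the counts are simply the number of rows each table lists per case, and the `SMLALxy` forms are left out because their timing is given separately in Table 9.15.

```python
# (cycles without interlock, cycles with interlock) per instruction family,
# transcribed from Tables 9.10 to 9.14. The S-suffixed forms cannot generate
# an interlock, so both entries are the same for them.
HALVES = [x + y for x in "BT" for y in "BT"]  # BB, BT, TB, TT

_FAMILIES = {
    (2, 3): ["MUL", "MLA"],
    (4, 4): ["MULS", "MLAS"],
    (3, 4): ["SMULL", "UMULL", "SMLAL", "UMLAL"],
    (5, 5): ["SMULLS", "UMULLS", "SMLALS", "UMLALS"],
    (1, 2): (
        ["SMUL" + h for h in HALVES]        # SMULxy
        + ["SMLA" + h for h in HALVES]      # SMLAxy
        + ["SMULW" + y for y in "BT"]       # SMULWy
        + ["SMLAW" + y for y in "BT"]       # SMLAWy
    ),
}

CYCLES = {
    mnemonic: counts
    for counts, mnemonics in _FAMILIES.items()
    for mnemonic in mnemonics
}


def multiply_cycles(mnemonic: str, interlocked: bool = False) -> int:
    """Return the cycle count for a multiply, optionally with an interlock."""
    normal, stalled = CYCLES[mnemonic.upper()]
    return stalled if interlocked else normal


print(multiply_cycles("MUL"))                       # 2
print(multiply_cycles("MLA", interlocked=True))     # 3
print(multiply_cycles("MULS", interlocked=True))    # 4
print(multiply_cycles("SMULBT"))                    # 1
```

Enumerating the mnemonics explicitly, rather than testing for a trailing `S`, avoids misclassifying names such as `SMLALS` and keeps the unlisted `SMLALxy` forms out of the table so that looking them up raises a `KeyError`.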
Table 9.15 shows the cycle timing for SMLALxy instructions with and without interlocks.