9.9.1. Interlocks

The multiply unit in the ARM7EJ-S processor operates in both the Execute and Memory stages of the pipeline, so a multiply result is not available until the end of the Memory stage. If the following instruction uses that result, it must be interlocked until the correct value is available. This applies to every instruction that requires the multiply result in its first Execute cycle or first Memory cycle, except a multiply-accumulate instruction that uses the previous multiply result as its accumulator operand.

For example, the following sequence incurs a single-cycle interlock:

    MUL   r0, r1, r2
    SUB   r4, r0, r3

The following sequence also incurs a single-cycle interlock:

    MLA   r0, r1, r2, r3
    STR   r0, [r8]

The following example does not incur an interlock:

    MLA   r0, r1, r2, r0
    MLA   r0, r3, r4, r0
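The three sequences above can be checked mechanically. The following Python sketch is a simplified model of the rule, not a description of the ARM7EJ-S hardware. The names Instr and incurs_interlock are hypothetical. It assumes that only MUL and MLA act as producers, that only MLA is treated as the multiply-accumulate consumer, and that any other read of the result register is needed in the first Execute or first Memory cycle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: only the non-flag-setting MUL and MLA are modeled as producers.
MULTIPLY_PRODUCERS = {"MUL", "MLA"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical, minimal description of one instruction."""
    mnemonic: str              # for example "MUL", "MLA", "SUB", "STR"
    dest: Optional[str]        # register written, or None (for example STR)
    sources: Tuple[str, ...]   # registers read, in operand order


def incurs_interlock(first: Instr, second: Instr) -> bool:
    """Return True if `second` is interlocked on the result of `first`."""
    if first.mnemonic not in MULTIPLY_PRODUCERS or first.dest is None:
        return False
    if first.dest not in second.sources:
        return False
    if second.mnemonic == "MLA":
        # MLA Rd, Rm, Rs, Rn: the last source (Rn) is the accumulator.
        # Using the previous result only as the accumulator is the exception.
        multiplicands = second.sources[:-1]
        return first.dest in multiplicands
    return True


mul = Instr("MUL", "r0", ("r1", "r2"))
sub = Instr("SUB", "r4", ("r0", "r3"))
mla_a = Instr("MLA", "r0", ("r1", "r2", "r3"))
store = Instr("STR", None, ("r0", "r8"))
mla_b = Instr("MLA", "r0", ("r1", "r2", "r0"))
mla_c = Instr("MLA", "r0", ("r3", "r4", "r0"))

print(incurs_interlock(mul, sub))
print(incurs_interlock(mla_a, store))
print(incurs_interlock(mla_b, mla_c))
```

Under these assumptions the model reports an interlock for MUL followed by SUB and for MLA followed by STR, and none for the back-to-back MLA accumulation.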

Table 9.10 shows the cycle timing for MUL and MLA instructions with and without interlocks.

Table 9.10. Cycle timing for MUL and MLA

Case       Cycle  ADDR   RDATA    TRANS
Normal     1      pc+3i  (pc+2i)  I cycle
           2      pc+3i  -        S cycle
                         (pc+3i)
Interlock  1      pc+3i  (pc+2i)  I cycle
           2      pc+3i  -        I cycle
           3      pc+3i  -        S cycle
                         (pc+3i)

The MULS and MLAS instructions always take four cycles to execute, and cannot generate interlocks in following instructions.

Table 9.11 shows the cycle timing for MULS and MLAS instructions.

Table 9.11. Cycle timings for MULS and MLAS

Cycle  ADDR   RDATA    TRANS
1      pc+3i  (pc+2i)  I cycle
2      pc+3i  -        I cycle
3      pc+3i  -        I cycle
4      pc+3i  -        S cycle
              (pc+3i)

Table 9.12 shows the cycle timing for SMULL, UMULL, SMLAL, and UMLAL instructions with and without interlocks.

Table 9.12. Cycle timing for SMULL, UMULL, SMLAL, and UMLAL

Case       Cycle  ADDR   RDATA    TRANS
Normal     1      pc+3i  (pc+2i)  I cycle
           2      pc+3i  -        I cycle
           3      pc+3i  -        S cycle
                         (pc+3i)
Interlock  1      pc+3i  (pc+2i)  I cycle
           2      pc+3i  -        I cycle
           3      pc+3i  -        I cycle
           4      pc+3i  -        S cycle
                         (pc+3i)

The SMULLS, UMULLS, SMLALS, and UMLALS instructions always take five cycles to execute, and cannot generate interlocks in following instructions.

Table 9.13 shows the cycle timing for the SMULLS, UMULLS, SMLALS, and UMLALS instructions.

Table 9.13. Cycle timings for SMULLS, UMULLS, SMLALS, and UMLALS

Cycle  ADDR   RDATA    TRANS
1      pc+3i  (pc+2i)  I cycle
2      pc+3i  -        I cycle
3      pc+3i  -        I cycle
4      pc+3i  -        I cycle
5      pc+3i  -        S cycle
              (pc+3i)

Table 9.14 shows the cycle timings for SMULxy, SMLAxy, SMULWy, and SMLAWy instructions with and without interlocks.

Table 9.14. Cycle timings for SMULxy, SMLAxy, SMULWy, and SMLAWy

Case       Cycle  ADDR   RDATA    TRANS
Normal     1      pc+3i  (pc+2i)  S cycle
                         (pc+3i)
Interlock  1      pc+3i  (pc+2i)  I cycle
           2      pc+3i  -        S cycle
                         (pc+3i)

Table 9.15 shows the cycle timing for SMLALxy instructions with and without interlocks.

Table 9.15. Cycle timings for SMLALxy

Case       Cycle  ADDR   RDATA    TRANS
Normal     1      pc+3i  (pc+2i)  I cycle
           2      pc+3i  -        S cycle
                         (pc+3i)
Interlock  1      pc+3i  (pc+2i)  I cycle
           2      pc+3i  -        I cycle
           3      pc+3i  -        S cycle
                         (pc+3i)
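
The TRANS columns of Tables 9.10 to 9.15 all follow one pattern: every cycle except the last is an I cycle, and the last is an S cycle. The following Python sketch records the cycle counts transcribed from those tables and rebuilds the TRANS sequence from them. The names CYCLES and trans_sequence and the group labels are hypothetical, chosen only for this illustration. The sketch is a summary of the tables, not a timing model of the processor.

```python
# (normal cycles, interlocked cycles) transcribed from Tables 9.10 to 9.15.
# None means the table lists no interlocked case for that group.
CYCLES = {
    "MUL/MLA": (2, 3),                        # Table 9.10
    "MULS/MLAS": (4, None),                   # Table 9.11
    "SMULL/UMULL/SMLAL/UMLAL": (3, 4),        # Table 9.12
    "SMULLS/UMULLS/SMLALS/UMLALS": (5, None), # Table 9.13
    "SMULxy/SMLAxy/SMULWy/SMLAWy": (1, 2),    # Table 9.14
    "SMLALxy": (2, 3),                        # Table 9.15
}


def trans_sequence(group, interlocked=False):
    """Return the TRANS value for each cycle of one instruction group."""
    normal, with_interlock = CYCLES[group]
    if interlocked:
        if with_interlock is None:
            raise ValueError(group + " has no interlocked case in the tables")
        count = with_interlock
    else:
        count = normal
    # All cycles but the last are I cycles; the last is an S cycle.
    return ["I cycle"] * (count - 1) + ["S cycle"]


for group in CYCLES:
    print(group, trans_sequence(group))
```

Keeping the counts in one table makes it easy to compare groups, for example to see that the interlocked case adds exactly one I cycle wherever the tables list one.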

Copyright © 2001 ARM Limited. All rights reserved.
ARM DDI 0214B
Non-Confidential