9.11.1. Interlocks

Unaligned word loads, load byte (LDRB), and load halfword (LDRH) instructions use the byte rotate unit in the Write stage of the pipeline. This introduces a single-cycle load-use interlock that can affect the two instructions immediately following the load instruction.

The following example incurs a single-cycle interlock:

    LDRB r0, [r1, #1]
    ADD r2, r0, r3
    ORR r4, r4, r5

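The following example also incurs a single-cycle interlock, even though an unrelated instruction separates the load from the instruction that uses its result: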

    LDRB r0, [r1, #1]
    ORR r4, r4, r5
    ADD r2, r0, r3

When an interlock has been incurred for one instruction, it does not have to be incurred again for a later instruction. For example, the following sequence incurs a single-cycle interlock on the first ADD instruction, but the second ADD does not incur any interlocks:

    LDRB r0, [r1, #1]
    ADD r2, r0, r3
    ADD r4, r0, r5

A single-cycle interlock refers to the number of unwaited clock cycles to which the interlock applies. If a multi-cycle instruction separates a load instruction and the instruction using the result of the load, then no interlock can apply. The following example does not incur an interlock:

    LDRB r0, [r1]
    MUL r6, r7, r8
    ADD r4, r0, r5
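The rules illustrated by the four examples above can be captured in a small model. The following Python sketch is an illustration only, not a description of the hardware. The names Insn, ROTATE_LOADS, MULTI_CYCLE, and interlock_cycles are hypothetical. MUL is the only instruction treated as multi-cycle because it is the only one the text shows, unaligned word loads are not modelled, and the interlock cycle is always attributed to the first instruction that uses the loaded register.

```python
from dataclasses import dataclass

# Loads that go through the byte rotate unit (assumption: unaligned
# word loads are not modelled in this sketch).
ROTATE_LOADS = {"LDRB", "LDRH"}

# Instructions treated as multi-cycle (assumption: only MUL, because it
# is the only multi-cycle instruction used in the examples).
MULTI_CYCLE = {"MUL"}


@dataclass
class Insn:
    op: str             # mnemonic, for example "LDRB" or "ADD"
    dest: str           # destination register
    srcs: tuple = ()    # source registers


def interlock_cycles(seq):
    """Return the interlock cycles attributed to each instruction in seq."""
    stalls = [0] * len(seq)
    for i, load in enumerate(seq):
        if load.op not in ROTATE_LOADS:
            continue
        # Only the two instructions immediately after the load can be affected.
        for j in range(i + 1, min(i + 3, len(seq))):
            user = seq[j]
            if load.dest in user.srcs:
                stalls[j] = 1   # the first user incurs the single cycle
                break           # later users do not incur it again
            if user.op in MULTI_CYCLE:
                break           # a multi-cycle instruction removes the interlock
    return stalls
```

For instance, passing the LDRB, ORR, ADD sequence from the second example returns [0, 0, 1], and the LDRB, MUL, ADD sequence returns [0, 0, 0].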

Table 9.17 shows the cycle timing for basic load register operations.

Table 9.17. Cycle timings for basic load register operations

                        Cycle  ADDR     RDATA      TRANS
Normal case             1      da       (pc+2i)    N cycle
                        2      pc+3i    (da)       N cycle
                                        (pc+3i)
Scaled offset           1      pc+3i    (pc+2i)    I cycle
                        2      da       -          N cycle
                        3      pc+3i    (da)       N cycle
                                        (pc+3i)
dest=pc                 1      da       (pc+2i)    N cycle
                        2      pc+3i    (da)       I cycle
                        3      pc'      -          N cycle
                        4      pc'+i    (pc')      S cycle
                        5      pc'+2i   (pc'+i)    S cycle
                                        (pc'+2i)
Scaled offset dest=pc   1      pc+3i    (pc+2i)    I cycle
                        2      da       -          N cycle
                        3      pc+3i    (da)       I cycle
                        4      pc'      -          N cycle
                        5      pc'+i    (pc')      S cycle
                        6      pc'+2i   (pc'+i)    S cycle
                                        (pc'+2i)
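
The four cases in Table 9.17 differ only in how many cycles they add to the normal case. As a reading aid, the following Python sketch reproduces the cycle counts in the table. The function name load_register_cycles and its parameters are hypothetical, and the additive breakdown is an observation about the four rows shown, not a rule stated by the table.

```python
def load_register_cycles(scaled_offset=False, dest_is_pc=False):
    """Return the number of cycles listed in Table 9.17 for a basic load."""
    cycles = 2          # normal case: cycles 1 and 2
    if scaled_offset:
        cycles += 1     # one leading I cycle before the data access
    if dest_is_pc:
        cycles += 3     # I, N, and S cycles added when the load writes the pc
    return cycles
```

Under these assumptions, the normal case takes 2 cycles, a scaled offset takes 3, a load to the pc takes 5, and a scaled-offset load to the pc takes 6.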

Table 9.18 shows the cycle timing for load operations resulting in simple interlocks.

Table 9.18. Cycle timings for load operations resulting in simple interlocks

                        Cycle  ADDR     RDATA      TRANS
Single-cycle interlock  1      da       (pc+2i)    N cycle
                        2      pc+3i    (da)       I cycle
                        3      pc+3i    -          N cycle
                                        (pc+3i)

In more complicated interlock cases, you cannot consider the load instruction in isolation. This is because the load instruction has vacated the Execute stage of the pipeline and a later instruction has occupied it.

Table 9.19 shows the one-cycle interlock incurred for the following sequence of instructions:

    LDRB r0, [r1]
    ADD r6, r6, r3
    ADD r2, r0, r1

Table 9.19. Cycle timings for an example LDRB, ADD and ADD sequence

                    Cycle  ADDR     RDATA      TRANS
LDRB r0, [r1]       1      da       (pc+2i)    N cycle
                    2      pc+3i    (da)       N cycle
ADD r6, r6, r3      3      pc+4i    (pc+3i)    I cycle
                    4      pc+4i    -          S cycle
ADD r2, r0, r1      5      pc+5i    (pc+4i)    S cycle
                                    (pc+5i)

Table 9.20 shows the cycle timing for the following code sequence:

    LDRB r0, [r2]
    STMIA r3, {r0-r1}

Table 9.20. Cycle timings for an example LDRB and STMIA sequence

                    Cycle  ADDR     RDATA      TRANS    WDATA
LDRB r0, [r2]       1      da       (pc+2i)    N cycle
                    2      pc+3i    (da)       N cycle
STMIA r3, {r0-r1}   3      pc+4i    (pc+3i)    I cycle
                    4      r3       -          N cycle
                    5      r3+4     -          S cycle  r0
                    6      pc+4i    -          N cycle  r1
                                    (pc+4i)

Copyright © 2001 ARM Limited. All rights reserved. ARM DDI 0214B
Non-Confidential