Unaligned word loads, load byte (LDRB), and load halfword (LDRH) instructions use the byte rotate unit in the Write stage of the pipeline. This introduces a single-cycle load-use interlock that can affect the two instructions immediately following the load instruction.
The following example incurs a single-cycle interlock:
    LDRB r0, [r1, #1]
    ADD r2, r0, r3
    ORR r4, r4, r5
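The following example also incurs a single-cycle interlock, because the ADD that uses r0 is still one of the two instructions immediately following the load:
    LDRB r0, [r1, #1]
    ORR r4, r4, r5
    ADD r2, r0, r3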
When an interlock has been incurred for one instruction, it does not have to be incurred again for a later instruction. For example, the following sequence incurs a single-cycle interlock on the first ADD instruction, but the second ADD does not incur any interlocks:
    LDRB r0, [r1, #1]
    ADD r2, r0, r3
    ADD r4, r0, r5
A single-cycle interlock refers to the number of unwaited clock cycles to which the interlock applies. If a multi-cycle instruction separates a load instruction and the instruction using the result of the load, then no interlock can apply. The following example does not incur an interlock:
    LDRB r0, [r1]
    MUL r6, r7, r8
    ADD r4, r0, r5
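The four examples above can be checked against a small model of the rule. The following Python sketch is illustrative only: the `Instr` record, the `multi_cycle` flag, and the `interlock_cycles` function are hypothetical names introduced here, not part of any ARM tool. It assumes a single LDRB or LDRH at a time and ignores unaligned word loads, because it does not model addresses.

```python
from dataclasses import dataclass

# Loads that, per the text above, use the byte rotate unit.
# Unaligned word loads also qualify, but this sketch cannot see
# addresses, so it models only LDRB and LDRH (an assumption).
ROTATED_LOADS = {"LDRB", "LDRH"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified instruction record for illustration."""
    op: str
    dest: str
    sources: tuple
    multi_cycle: bool = False  # for example, MUL in the sequence above


def interlock_cycles(seq):
    """Return {instruction index: interlock cycles} for a list of Instr."""
    stalls = {}
    for i, load in enumerate(seq):
        if load.op not in ROTATED_LOADS:
            continue
        # Only the two instructions immediately after the load can be affected.
        for j in (i + 1, i + 2):
            if j >= len(seq):
                break
            if load.dest in seq[j].sources:
                stalls[j] = 1  # single-cycle load-use interlock
                break          # later users of the result do not stall again
            if seq[j].multi_cycle:
                break          # a multi-cycle instruction separates load and use
    return stalls


example = [
    Instr("LDRB", "r0", ("r1",)),
    Instr("ORR", "r4", ("r4", "r5")),
    Instr("ADD", "r2", ("r0", "r3")),
]
print(interlock_cycles(example))  # {2: 1}
```

The returned dictionary maps the position of the stalled instruction to the number of interlock cycles, so an empty dictionary means the sequence runs without a load-use interlock under this simplified model.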
Table 9.17 shows the cycle timing for basic load register operations.
Table 9.17. Cycle timings for basic load register operations
| | Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|---|
| Normal case | 1 | da | (pc+2i) | N cycle |
| | 2 | pc+3i | (da) | N cycle |
| | | | (pc+3i) | |
| Scaled offset | 1 | pc+3i | (pc+2i) | I cycle |
| | 2 | da | - | N cycle |
| | 3 | pc+3i | (da) | N cycle |
| | | | (pc+3i) | |
| dest=pc | 1 | da | (pc+2i) | N cycle |
| | 2 | pc+3i | (da) | I cycle |
| | 3 | pc' | - | N cycle |
| | 4 | pc'+i | (pc') | S cycle |
| | 5 | pc'+2i | (pc'+i) | S cycle |
| | | | (pc'+2i) | |
| Scaled offset dest=pc | 1 | pc+3i | (pc+2i) | I cycle |
| | 2 | da | - | N cycle |
| | 3 | pc+3i | (da) | I cycle |
| | 4 | pc' | - | N cycle |
| | 5 | pc'+i | (pc') | S cycle |
| | 6 | pc'+2i | (pc'+i) | S cycle |
| | | | (pc'+2i) | |
Table 9.18 shows the cycle timing for load operations resulting in simple interlocks.
Table 9.18. Cycle timings for load operations resulting in simple interlocks
| | Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|---|
| Single-cycle interlock | 1 | da | (pc+2i) | N cycle |
| | 2 | pc+3i | (da) | I cycle |
| | 3 | pc+3i | - | N cycle |
| | | | (pc+3i) | |
With more complicated interlock cases you cannot consider the load instruction in isolation. This is because in these cases the load instruction has vacated the Execute stage of the pipeline and a later instruction has occupied it.
Table 9.19 shows the one-cycle interlock incurred for the following sequence of instructions:
    LDRB r0, [r1]
    ADD r6, r6, r3
    ADD r2, r0, r1
Table 9.19. Cycle timings for an example LDRB, ADD and ADD sequence
| | Cycle | ADDR | RDATA | TRANS |
|---|---|---|---|---|
| LDRB r0, [r1] | 1 | da | (pc+2i) | N cycle |
| | 2 | pc+3i | (da) | N cycle |
| ADD r6, r6, r3 | 3 | pc+4i | (pc+3i) | I cycle |
| | 4 | pc+4i | - | S cycle |
| ADD r2, r0, r1 | 5 | pc+5i | (pc+4i) | S cycle |
| | | | (pc+5i) | |
Table 9.20 shows the cycle timing for the following code sequence:
    LDRB r0, [r2]
    STMIA r3, {r0-r1}