16.4.2. Memory system effects on instruction timings

Because the processor is a statically scheduled design, any stall from the memory system can result in a minimum 8-cycle delay. This 8-cycle minimum is balanced with the minimum number of cycles needed to receive data from the L2 cache in the case of an L1 load miss. Table 16.16 lists the most common cases that can result in an instruction replay because of a memory system stall.

Table 16.16. Memory system effects on instruction timings

Replay event | Delay | Description
Load data miss | 8 cycles
  1. A load instruction misses in the L1 data cache.

  2. A request is then made to the L2 data cache.

  3. If a miss also occurs in the L2 data cache, then a second replay occurs. The number of stall cycles depends on the external system memory timing. The time required to receive the critical word for an L2 cache miss is 18 core cycles plus the number of cycles required by the external memory system. The external system requires a minimum of 2 additional cycles, making the minimum total 20 cycles. However, 20 cycles is likely to be optimistic because it can occur only in a system with a 1:1 bus ratio and zero wait-state memory.

Data TLB miss | 24 cycles
  1. A table walk because of a miss in the L1 TLB causes a 24-cycle delay, assuming the translation table entries are found in the L2 cache.

  2. If the translation table entries are not present in the L2 cache, the number of stall cycles depends on the external system memory timing.

Store buffer full | 8 cycles plus latency to drain fill buffer

  1. A store instruction miss does not result in any stalls unless the store buffer is full.

  2. In the case of a full store buffer, the delay is at least 8 cycles. The delay can be longer if it takes longer to drain some entries from the store buffer.

Unaligned load or store request | 8 cycles
  1. If a load instruction address is unaligned and the full access is not contained within a 128-bit boundary, there is an 8-cycle penalty.

  2. If a store instruction address is unaligned and the full access is not contained within a 64-bit boundary, there is an 8-cycle penalty.
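
The unaligned load and store conditions in Table 16.16 can be checked with simple address arithmetic. The following is a minimal sketch, not part of the processor documentation. The helper names are hypothetical, and it assumes that "unaligned" means the address is not a multiple of the access size, and that "contained within a boundary" means the whole access lies inside one naturally aligned 16-byte block (loads) or 8-byte block (stores).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REPLAY_PENALTY_CYCLES 8u  /* penalty given in Table 16.16 */
#define LOAD_BOUNDARY_BYTES  16u  /* 128-bit boundary for loads */
#define STORE_BOUNDARY_BYTES  8u  /* 64-bit boundary for stores */

/* Assumption: "unaligned" means addr is not a multiple of the access size.
   size must be at least 1. */
static bool is_unaligned(uint32_t addr, uint32_t size)
{
    return (addr % size) != 0u;
}

/* True when the bytes [addr, addr + size) touch more than one
   boundary-sized block. The arithmetic is done in 64 bits so that
   addr + size cannot wrap. size must be at least 1 and boundary nonzero. */
static bool crosses_boundary(uint32_t addr, uint32_t size, uint32_t boundary)
{
    uint64_t first_block = (uint64_t)addr / boundary;
    uint64_t last_block = ((uint64_t)addr + size - 1u) / boundary;
    return first_block != last_block;
}

/* Hypothetical helper: penalty in cycles for a load of size bytes at addr. */
static unsigned load_penalty_cycles(uint32_t addr, uint32_t size)
{
    return (is_unaligned(addr, size) &&
            crosses_boundary(addr, size, LOAD_BOUNDARY_BYTES))
               ? REPLAY_PENALTY_CYCLES : 0u;
}

/* Hypothetical helper: penalty in cycles for a store of size bytes at addr. */
static unsigned store_penalty_cycles(uint32_t addr, uint32_t size)
{
    return (is_unaligned(addr, size) &&
            crosses_boundary(addr, size, STORE_BOUNDARY_BYTES))
               ? REPLAY_PENALTY_CYCLES : 0u;
}
```

Under these assumptions, a 4-byte access at address 0x1006 is penalized as a store, because it straddles the 64-bit boundary at 0x1008, but not as a load, because it stays inside one 128-bit block.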

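The cycle counts in Table 16.16 can also be combined into rough lower-bound estimates. The sketch below is illustrative only: the function and parameter names are hypothetical, and the external memory and drain latencies are inputs that must come from the system being modeled.

```c
#include <stdint.h>

#define L2_MISS_CORE_CYCLES 18u  /* core cycles to the critical word on an L2 miss */
#define EXT_MEM_MIN_CYCLES   2u  /* minimum cycles added by the external system */
#define STORE_FULL_CYCLES    8u  /* minimum delay when the store buffer is full */

/* Cycles to receive the critical word for an L2 cache miss:
   18 core cycles plus the external memory cycles. Values below the
   documented 2-cycle minimum are clamped to that minimum. */
static uint32_t l2_miss_latency(uint32_t ext_cycles)
{
    if (ext_cycles < EXT_MEM_MIN_CYCLES) {
        ext_cycles = EXT_MEM_MIN_CYCLES;
    }
    return L2_MISS_CORE_CYCLES + ext_cycles;
}

/* Delay for a store that finds the store buffer full: 8 cycles plus
   whatever extra time (drain_cycles, an assumed input) is needed to
   drain entries. */
static uint32_t store_buffer_full_delay(uint32_t drain_cycles)
{
    return STORE_FULL_CYCLES + drain_cycles;
}
```

With the minimum external latency, `l2_miss_latency(2)` gives the 20-cycle figure from the table. Any realistic bus ratio or wait-state count raises the result.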

Copyright © 2006-2009 ARM Limited. All rights reserved. ARM DDI 0344I
Non-Confidential