ARM Technical Support Knowledge Articles

What is the true interrupt latency of Cortex-M3 and Cortex-M4 for interrupt entry and exit?

Applies to: Cortex-M3, Cortex-M4


The interrupt latency for interrupt entry is the number of processor clock cycles between an interrupt signal arriving at the processor and the processor executing the first instruction of the interrupt handler. Conversely, the interrupt exit latency is the number of processor clock cycles between execution of the interrupt return instruction and execution of the next instruction in the interrupted execution context.

The Cortex-M4 Technical Reference Manual (TRM) states that the interrupt latency on entry is 12 cycles, plus a possible additional 17 cycles for Cortex-M4 with Floating Point Unit (FPU) implemented, and the latency on exit is ten cycles, plus a possible additional 17 cycles for Cortex-M4 with FPU.

The Cortex-M3 TRM (for example, revision 'I') states that the interrupt latency on entry is 12 cycles and the latency on exit is also 12 cycles. The exit figure is a typographical error in the Cortex-M3 TRM. The Cortex-M3 has a latency on exit of ten cycles, just like the Cortex-M4.


The basic interrupt entry latency of 12 cycles depends upon a number of conditions relating to both the chip design and the software programming of the processor.

The processor has three main physical interfaces to its memory system: I-Code and D-Code, each accessing addresses below 0x20000000, and System, accessing addresses 0x20000000 and higher. The 12 cycle latency requires that a nine cycle stack push can take place on one interface (typically the System interface) in parallel with a six cycle vector table read and interrupt handler fetch on other interfaces (typically I-Code). If these operations cannot be performed in parallel, they will have to be performed one after the other, increasing the latency. The ability to perform these memory accesses in parallel depends upon both the hardware design providing the relevant memory blocks at suitable addresses, and software programming of the location of the vector table, interrupt handler code, and stack in those memories.
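The placement rule above can be sketched as a small host-side check. This is a coarse heuristic rather than a model of the bus matrix: it treats I-Code and D-Code as one group because both serve addresses below 0x20000000, and the names `bus_t`, `bus_for_address` and `entry_accesses_can_overlap` are hypothetical, introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grouping of the interfaces by address range. */
typedef enum {
    BUS_CODE,   /* I-Code / D-Code: addresses below 0x20000000 */
    BUS_SYSTEM  /* System: addresses 0x20000000 and higher */
} bus_t;

static bus_t bus_for_address(uint32_t addr)
{
    return (addr < 0x20000000u) ? BUS_CODE : BUS_SYSTEM;
}

/* Returns true when the stack lives in a different address group from both
 * the vector table and the handler code, so the stack push has a chance of
 * overlapping the vector read and handler fetch. Assumption: one group per
 * address range; a real device may add further constraints. */
static bool entry_accesses_can_overlap(uint32_t stack_addr,
                                       uint32_t vector_table_addr,
                                       uint32_t handler_addr)
{
    bus_t stack_bus = bus_for_address(stack_addr);
    return stack_bus != bus_for_address(vector_table_addr) &&
           stack_bus != bus_for_address(handler_addr);
}
```

For example, a stack at 0x20001000 with the vector table at 0x00000000 and a handler at 0x00000400 passes the check, whereas relocating the vector table to 0x20000000 places it in the same group as the stack and fails it.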

The 12 cycle latency requires that there are no wait-states, either explicit or implicit, in the memory system. The procedure described above consists of many memory accesses. Wait-states cause memory accesses to take additional cycles, increasing the latency. Explicit wait-states are related to the design of the memory system, and correspond to the memory system indicating "not ready" when the processor tries to access memory. Implicit wait-states are caused by optional features of the processor design and software programming.

The processor has an optional "bit-band" feature where a programmed store operation is converted into a read-modify-write operation to change a single bit in memory. Use of this feature adds an implicit wait-state, which is added to the interrupt latency if the interrupt signal coincides with a bit-band write.

The processor is also able to perform unaligned memory accesses, where the address being accessed is not a multiple of the size of the access. Since the bus protocol used by the processor does not support this type of access, the access is converted into two or three smaller aligned accesses on the bus. This can add one or two implicit wait-states to the access. In addition, the chip designer may have implemented a feature called CONST_AHB_CTRL (or, in some documents, AHB_CONST_CTRL), which can add one further implicit wait-state to unaligned accesses.
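When auditing code for accesses that can incur implicit wait-states, it may help to recognise the two cases by address. The sketch below assumes the standard Cortex-M3/M4 SRAM bit-band mapping (1 MB region at 0x20000000, alias region at 0x22000000) and an access size of 1, 2 or 4 bytes; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An access is unaligned when its address is not a multiple of its size.
 * size_bytes is assumed to be 1, 2 or 4. */
static bool is_unaligned(uint32_t addr, uint32_t size_bytes)
{
    return (addr % size_bytes) != 0u;
}

/* Alias word address for one bit of a byte in the SRAM bit-band region:
 * alias = alias_base + byte_offset * 32 + bit_number * 4. */
static uint32_t sram_bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    return 0x22000000u + ((byte_addr - 0x20000000u) * 32u) + (bit * 4u);
}

/* True when a store to this address targets the SRAM bit-band alias region
 * and is therefore converted into a read-modify-write. */
static bool is_sram_bitband_alias(uint32_t addr)
{
    return addr >= 0x22000000u && addr <= 0x23FFFFFFu;
}
```

A word access at 0x20000001 is flagged as unaligned, and bit 3 of the byte at 0x20000004 maps to alias address 0x2200008C.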

It is software programmable whether the Cortex-M4 with FPU will stack seventeen floating point registers during interrupt entry. It is generally preferable to use the "lazy stacking" option, which defers this additional stacking until it becomes necessary, and in many cases avoids the need to stack these registers at all.
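As a minimal sketch of that software choice: on Cortex-M4 with FPU the behaviour is selected by the ASPEN (bit 31) and LSPEN (bit 30) fields of the Floating-Point Context Control Register, FPCCR, at address 0xE000EF34. The helper names below are hypothetical, and the register is passed as a pointer so that the logic can be exercised on a host against an ordinary variable.

```c
#include <stdint.h>

#define FPCCR_ASPEN (UINT32_C(1) << 31) /* automatic FP state preservation */
#define FPCCR_LSPEN (UINT32_C(1) << 30) /* lazy FP state preservation */

/* Reserve space for the FP registers on entry, but defer the actual
 * stacking until the handler first uses the FPU. */
static void fpu_select_lazy_stacking(volatile uint32_t *fpccr)
{
    *fpccr |= FPCCR_ASPEN | FPCCR_LSPEN;
}

/* Stack the FP registers immediately on every entry that interrupts an
 * active floating point context (the additional 17 cycles). */
static void fpu_select_immediate_stacking(volatile uint32_t *fpccr)
{
    uint32_t value = *fpccr;
    value |= FPCCR_ASPEN;
    value &= ~FPCCR_LSPEN;
    *fpccr = value;
}
```

On a target this would be called with `(volatile uint32_t *)0xE000EF34`, or the same bits can be set through the CMSIS `FPU->FPCCR` register definition where the device header provides it.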

For both processors, the notional 12 cycle interrupt latency can be reduced in the case of a late-arriving interrupt. This occurs when a higher priority interrupt arrives during the interrupt entry sequence for a lower priority interrupt. Since the processor has already started to push the interrupted context onto the stack, the latency for the new interrupt can be reduced, but it must still include at least the two cycles required to recognize the new interrupt and the six cycle vector table read and interrupt handler fetch, so it cannot be less than eight cycles.

Consequently, in a device with no wait-states in the external memory system, and with the stack, vector table and handler code memory located in ideal memory locations, the interrupt latency may actually be between eight and 15 cycles if the features which have implicit wait-states are used, and between eight and 32 cycles if the floating point context in a Cortex-M4 with FPU is stacked immediately as well.
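The arithmetic behind these bounds can be written out as follows. The figure of three implicit wait-states is our reading of the text above (up to two for an unaligned access plus one for CONST_AHB_CTRL); the constant and function names are hypothetical.

```c
#include <stdbool.h>

enum {
    ENTRY_BASE_CYCLES        = 12, /* zero wait-state, ideal memory layout */
    RECOGNIZE_CYCLES         = 2,  /* recognize a late-arriving interrupt */
    VECTOR_AND_FETCH_CYCLES  = 6,  /* vector table read + handler fetch */
    IMPLICIT_WAIT_MAX_CYCLES = 3,  /* unaligned (up to 2) + CONST_AHB_CTRL (1) */
    FPU_STACKING_CYCLES      = 17  /* immediate floating point stacking */
};

/* Best case: late arrival, stack push already under way. */
static unsigned entry_latency_min(void)
{
    return RECOGNIZE_CYCLES + VECTOR_AND_FETCH_CYCLES;
}

/* Worst case with no explicit wait-states in the memory system. */
static unsigned entry_latency_max(bool fp_stacked_immediately)
{
    unsigned cycles = ENTRY_BASE_CYCLES + IMPLICIT_WAIT_MAX_CYCLES;
    if (fp_stacked_immediately) {
        cycles += FPU_STACKING_CYCLES;
    }
    return cycles;
}
```

This gives 8 cycles for the minimum, 15 for the maximum without floating point stacking, and 32 with it.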

These calculations do not include any synchronizer external to the processor, which would add further cycles to the effective interrupt latency. It is recommended that chip designers include an external synchronizer on any interrupt signal arriving from an asynchronous clock domain.

The interrupt exit latency is similarly affected by which bus interfaces are required for the stack pop and the fetch of the interrupted code stream. Note that this may not be the same memory as the interrupt handler code fetch. Interrupt exit latency is also similarly affected by explicit wait-states in the memory system.

Interrupt exit latency is not affected by implicit wait-states, as the processor will (by definition) be executing a "return from exception" instruction rather than a bit-band or unaligned memory access at the point where it recognizes the return from the interrupt handler.

In a Cortex-M4 with FPU, if the floating point registers were stacked on exception entry or if they were stacked later via lazy stacking, the exception return will include the additional 17 cycles. This means that the additional 17 cycles will occur more often on interrupt exit than on interrupt entry, if lazy stacking is used.

Consequently, in a device with no wait-states in the external memory system, and with the stack, vector table and the code for the interrupted context located in ideal memory locations, the interrupt exit latency can be either ten cycles, or for Cortex-M4 with FPU and with an interrupted floating point context in the stack frame, 27 cycles.
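The asymmetry between entry and exit under lazy stacking can be illustrated with a short sketch. It assumes zero wait-states and an ideal memory layout, as in the paragraph above; the function names are hypothetical.

```c
#include <stdbool.h>

enum {
    EXIT_BASE_CYCLES  = 10,
    FPU_EXTRA_CYCLES  = 17
};

/* Entry pays the additional cycles only when the floating point registers
 * are stacked immediately, not when stacking is deferred. */
static unsigned entry_fp_extra_cycles(bool fp_stacked_on_entry)
{
    return fp_stacked_on_entry ? FPU_EXTRA_CYCLES : 0u;
}

/* Exit pays the additional cycles whenever the registers ended up on the
 * stack, whether at entry or later through lazy stacking. */
static unsigned exit_latency_cycles(bool fp_stacked_on_entry,
                                    bool fp_stacked_lazily)
{
    unsigned cycles = EXIT_BASE_CYCLES;
    if (fp_stacked_on_entry || fp_stacked_lazily) {
        cycles += FPU_EXTRA_CYCLES;
    }
    return cycles;
}
```

With lazy stacking in use and a handler that touches the FPU, entry adds no extra cycles while exit takes 27 cycles.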

In a memory system with explicit wait-states (meaning that the interconnect or slaves can add stall cycles to the data phase of a transfer), an interrupt may be recognized during the wait-states in the data phase of a memory access. This data phase must complete before the interrupt entry sequence can begin, extending the interrupt entry latency. Additionally, according to the AMBA 3 AHB-Lite protocol, if a further address phase has been indicated on the bus for the next memory access in program order, this additional memory access must also complete before the interrupt can be taken. The chip designer has the option to configure the processor to respect this protocol rule, using the CONST_AHB_CTRL feature, or to ignore this protocol rule by deselecting CONST_AHB_CTRL.

So in a memory system with wait-states, calculation of the worst-case additional interrupt latency depends upon wait-states in the data phase of a current memory access, a possible additional set of wait-states for a further memory access if CONST_AHB_CTRL is used, wait-states on the multiple memory accesses caused by bit-band or unaligned data accesses, wait-states on the stack push of eight or possibly twenty-five registers, and wait-states on the vector table read and handler code fetch. There is also a capability for software to disable interruption of multi-cycle instructions (ACTLR.DISMCYCINT), which could further extend the number of memory accesses that must be completed before the interrupt entry procedure starts.
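When reviewing a system for this last effect, one quick check is whether software has set DISMCYCINT. The sketch below assumes the Cortex-M3/M4 Auxiliary Control Register layout, in which DISMCYCINT is bit 0 of ACTLR (address 0xE000E008); the function takes the register value as an argument so that it can be tested on a host, and its name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACTLR_DISMCYCINT (UINT32_C(1) << 0) /* disable interruption of multi-cycle instructions */

/* Returns true when multi-cycle instructions (such as load and store
 * multiples) will run to completion before an interrupt is taken. */
static bool actlr_dismcycint_is_set(uint32_t actlr_value)
{
    return (actlr_value & ACTLR_DISMCYCINT) != 0u;
}
```

On a target the argument would be read from `*(volatile uint32_t *)0xE000E008`.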

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential