ARM Technical Support Knowledge Articles

How can I count the number of instructions executed by the processor in a given time interval?

Applies to: Cortex-M3, Cortex-M4



The processor contains an optional Data Watchpoint and Trace (DWT) unit, which provides a cycle counter and a set of profiling counters.

The basic cycle counter DWT_CYCCNT increments on each clock cycle when the processor is not halted in debug state.
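As an illustration, starting DWT_CYCCNT and measuring an interval might look like the sketch below. The function names cyccnt_start and cycles_elapsed are hypothetical. The bit positions and the addresses quoted in the comments are those given in the ARMv7-M Architecture Reference Manual. The registers are passed in as pointers, which is only a convenience so that the same logic can be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

#define DEMCR_TRCENA       (1u << 24) /* DEMCR: enables the DWT and ITM units */
#define DWT_CTRL_CYCCNTENA (1u << 0)  /* DWT_CTRL: enables DWT_CYCCNT */
#define DWT_CTRL_NOCYCCNT  (1u << 25) /* DWT_CTRL: reads 1 if there is no cycle counter */

/* Register addresses on ARMv7-M:
 *   DEMCR      0xE000EDFC
 *   DWT_CTRL   0xE0001000
 *   DWT_CYCCNT 0xE0001004
 */

/* Hypothetical helper: returns 1 if the cycle counter was started,
 * 0 if the implementation does not include one. */
int cyccnt_start(volatile uint32_t *demcr,
                 volatile uint32_t *dwt_ctrl,
                 volatile uint32_t *dwt_cyccnt)
{
    assert(demcr && dwt_ctrl && dwt_cyccnt);
    *demcr |= DEMCR_TRCENA;            /* the DWT must be enabled first */
    if (*dwt_ctrl & DWT_CTRL_NOCYCCNT) {
        return 0;
    }
    *dwt_cyccnt = 0;                   /* start counting from zero */
    *dwt_ctrl |= DWT_CTRL_CYCCNTENA;
    return 1;
}

/* Unsigned subtraction gives the right answer across one 32-bit wrap. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On a target, the three pointers would be the addresses listed in the comment, cast to volatile uint32_t *. The interval must be shorter than 2^32 cycles for the subtraction to be meaningful.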

A variety of performance monitor counters are provided, which count the number of clock cycles during which the processor diverges from its usual behavior of executing one instruction per cycle. Most of these counters record cycles in which no additional instruction is executed, for one of the following reasons:

  DWT_CPICNT - additional cycles required to execute multi-cycle instructions and instruction fetch stalls

  DWT_EXCCNT - cycles spent performing exception entry and exit procedures

  DWT_SLEEPCNT - cycles spent sleeping

  DWT_LSUCNT - cycles spent waiting for loads and stores to complete

There is also a performance monitor for cycles saved by "folded" instructions:

  DWT_FOLDCNT - cycles saved by instructions which execute in zero cycles
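The counters listed above are enabled through DWT_CTRL. The following is a minimal sketch of that step, assuming DEMCR.TRCENA has already been set. The type dwt_regs_t and the function dwt_profiling_enable are hypothetical names. The register offsets and bit positions are those of the ARMv7-M Architecture Reference Manual.

```c
#include <assert.h>
#include <stdint.h>

/* First seven DWT registers; the block starts at 0xE0001000 on ARMv7-M. */
typedef struct {
    volatile uint32_t CTRL;     /* 0x00 control register          */
    volatile uint32_t CYCCNT;   /* 0x04 32-bit cycle counter      */
    volatile uint32_t CPICNT;   /* 0x08 8-bit count in bits [7:0] */
    volatile uint32_t EXCCNT;   /* 0x0C 8-bit count in bits [7:0] */
    volatile uint32_t SLEEPCNT; /* 0x10 8-bit count in bits [7:0] */
    volatile uint32_t LSUCNT;   /* 0x14 8-bit count in bits [7:0] */
    volatile uint32_t FOLDCNT;  /* 0x18 8-bit count in bits [7:0] */
} dwt_regs_t;

#define DWT_CTRL_CPIEVTENA   (1u << 17)
#define DWT_CTRL_EXCEVTENA   (1u << 18)
#define DWT_CTRL_SLEEPEVTENA (1u << 19)
#define DWT_CTRL_LSUEVTENA   (1u << 20)
#define DWT_CTRL_FOLDEVTENA  (1u << 21)
#define DWT_CTRL_NOPRFCNT    (1u << 24) /* reads 1 if profiling counters are absent */

#define DWT_PROFILING_ENABLES (DWT_CTRL_CPIEVTENA | DWT_CTRL_EXCEVTENA |     \
                               DWT_CTRL_SLEEPEVTENA | DWT_CTRL_LSUEVTENA |   \
                               DWT_CTRL_FOLDEVTENA)

/* Hypothetical helper: returns 1 if the five profiling counters were
 * enabled, 0 if the implementation does not provide them. */
int dwt_profiling_enable(dwt_regs_t *dwt)
{
    assert(dwt);
    if (dwt->CTRL & DWT_CTRL_NOPRFCNT) {
        return 0;
    }
    dwt->CTRL |= DWT_PROFILING_ENABLES;
    return 1;
}
```

On a target, the argument would be (dwt_regs_t *)0xE0001000. Because these five counters are only 8 bits wide, they wrap after 256 events, so software that polls them needs to sample frequently.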

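So if the processor includes the DWT profiling counters, the instruction count can be calculated as:

  instructions executed = DWT_CYCCNT - DWT_CPICNT - DWT_EXCCNT - DWT_SLEEPCNT - DWT_LSUCNT + DWT_FOLDCNT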


This result is architecturally defined to be approximate. See section "Profiling counter accuracy" in the ARMv7-M Architecture Reference Manual for details.
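The calculation (cycle count, minus the four counters for cycles in which no instruction completes, plus the folded-instruction count) could be sketched in C as follows. The names dwt_sample_t and dwt_instructions are hypothetical. The sketch assumes that the counters were read at the start and end of the interval, and that each 8-bit profiling counter wrapped at most once between the two readings. Longer intervals would need the counter overflow events to be accounted for separately.

```c
#include <assert.h>
#include <stdint.h>

/* Raw register values read at one point in time (hypothetical type). */
typedef struct {
    uint32_t cyccnt;   /* DWT_CYCCNT, 32-bit          */
    uint32_t cpicnt;   /* DWT_CPICNT, bits [7:0] used */
    uint32_t exccnt;   /* DWT_EXCCNT                  */
    uint32_t sleepcnt; /* DWT_SLEEPCNT                */
    uint32_t lsucnt;   /* DWT_LSUCNT                  */
    uint32_t foldcnt;  /* DWT_FOLDCNT                 */
} dwt_sample_t;

/* Difference of two 8-bit counter readings, tolerating one wrap. */
static uint32_t delta8(uint32_t start, uint32_t end)
{
    return (uint8_t)(end - start);
}

/* Approximate number of instructions executed between two samples. */
uint32_t dwt_instructions(const dwt_sample_t *start, const dwt_sample_t *end)
{
    assert(start && end);
    uint32_t cycles = end->cyccnt - start->cyccnt;

    return cycles
         - delta8(start->cpicnt,   end->cpicnt)
         - delta8(start->exccnt,   end->exccnt)
         - delta8(start->sleepcnt, end->sleepcnt)
         - delta8(start->lsucnt,   end->lsucnt)
         + delta8(start->foldcnt,  end->foldcnt);
}
```

For example, with 500 elapsed cycles, 10 CPI cycles, 12 exception cycles, no sleep cycles, 30 LSU cycles and 3 folded instructions, the function returns 500 - 10 - 12 - 0 - 30 + 3 = 451.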

For a finished, packaged chip that includes an ETM for instruction trace, a debugger connected to the trace port output should be able to count instructions exactly, because every instruction is reported in the trace stream exported on the trace port. However, depending on the processor clock speed and the trace channel bandwidth, there may be intermittent gaps in the trace stream when the trace channel capacity is exceeded.

For chip designers running a logic simulation of the chip design, using either the RTL description of the processor or the Design Simulation Model (DSM), the exact instruction count can be observed in one of two ways. The first is to enable the "tarmac" logging feature, which generates a text log file recording the processor activity during the simulation run. The second is to enable the ETM interface of the processor (whether or not the ETM option is implemented in the design) and count the cycles in which the ETMIVALID signal is asserted High.

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential