ARM Technical Support Knowledge Articles
The event counting hardware on ARM11, Cortex-R and Cortex-A cores can be used for benchmarking and profiling code. This article explains how a particular function (that may call child functions) can be profiled to give chosen statistics, including cycle and instruction counts.
The above-mentioned cores contain:
a cycle counter, which can be configured to increment on every core cycle or on every 64th core cycle
some configurable event counters, each of which can be set to count a chosen event (for example, instructions executed or mispredicted branches).
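As a rough illustration of the two counter types, the sketch below builds a control-register value and converts a raw cycle count back into core cycles. It assumes the ARMv7 PMU control register (PMCR) bit layout used by Cortex-A and Cortex-R cores, and the two event numbers shown are the ARMv7 architectural ones; ARM11 encodings differ, so check the TRM for your core. The helper names `pmcr_start_value` and `core_cycles` are hypothetical.

```c
#include <stdint.h>

/* PMCR bits, assuming the ARMv7 PMU layout (verify against your core's TRM). */
#define PMCR_E (1u << 0)  /* enable all counters */
#define PMCR_P (1u << 1)  /* reset all event counters */
#define PMCR_C (1u << 2)  /* reset the cycle counter */
#define PMCR_D (1u << 3)  /* cycle counter increments every 64th core cycle */

/* Two ARMv7 architectural event numbers, shown as examples only. */
#define EVT_INSTR_EXECUTED 0x08u  /* instruction architecturally executed */
#define EVT_BRANCH_MISPRED 0x10u  /* mispredicted or not predicted branch */

/* Value to write to PMCR to reset and enable the counters. */
uint32_t pmcr_start_value(int divide_by_64)
{
    uint32_t v = PMCR_E | PMCR_P | PMCR_C;
    if (divide_by_64)
        v |= PMCR_D;
    return v;
}

/* Convert a raw cycle-counter reading into core cycles. */
uint64_t core_cycles(uint32_t raw_count, int divide_by_64)
{
    return divide_by_64 ? (uint64_t)raw_count * 64u : (uint64_t)raw_count;
}
```

With the divider enabled, each counter tick stands for 64 core cycles, so the result is only accurate to within 64 cycles.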
The performance counters can be configured and accessed through software calls, or via ARM's debug tools.
Using Performance Counters via Software
The following procedure should be followed:
It may then be useful to repeat the process from step 4 onwards, this time calling an empty function in place of the function being profiled. The results of this second run can be subtracted from the original results to cancel out the overhead of enabling and disabling the performance counters and of calling the profiled function.
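The overhead correction described above amounts to a single subtraction per counter. A minimal sketch, with a hypothetical helper name and a clamp at zero in case measurement noise makes the empty-function run the larger of the two:

```c
#include <stdint.h>

/* Subtract the count measured around an empty function (the baseline)
 * from the count measured around the profiled function.
 * Clamps at zero rather than wrapping if the baseline is larger. */
uint32_t net_count(uint32_t profiled, uint32_t baseline)
{
    return (profiled > baseline) ? (profiled - baseline) : 0u;
}
```

For example, a profiled run of 1500 cycles with an empty-function baseline of 120 cycles gives a net figure of 1380 cycles.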
On ARM11 cores, the performance counters are configured via the Performance Monitor Control Register in CP15. On Cortex-R and Cortex-A cores, they are configured via the CP15 Performance Monitoring Unit (PMU) registers.
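The following sketch shows what CP15 access to the PMU might look like from C with GCC-style inline assembly. It assumes the ARMv7 encodings (PMCR at c9, c12, 0; PMCNTENSET at c9, c12, 1; PMCCNTR at c9, c13, 0) and privileged execution; ARM11 cores use different encodings, which the TRM lists. The accessor names are hypothetical, and the `#else` branch is only a stand-in so the sketch compiles on a non-ARM host.

```c
#include <stdint.h>

#define PMCR_E (1u << 0)        /* enable all counters */
#define PMCR_P (1u << 1)        /* reset event counters (reads as zero) */
#define PMCR_C (1u << 2)        /* reset cycle counter (reads as zero) */
#define CNTEN_CYCLE (1u << 31)  /* cycle-counter bit in PMCNTENSET */

#if defined(__GNUC__) && defined(__arm__)
/* ARMv7-A/R CP15 encodings, assumed; must run in a privileged mode. */
static inline void pmu_write_pmcr(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(v));
}
static inline uint32_t pmu_read_pmcr(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(v));
    return v;
}
static inline void pmu_write_cntenset(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(v));
}
static inline uint32_t pmu_read_cycle_count(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
    return v;
}
#else
/* Host stand-in for illustration only: plain variables, no real counting. */
static uint32_t sim_pmcr, sim_cnten, sim_cycles;
static inline void pmu_write_pmcr(uint32_t v)
{
    if (v & PMCR_C)
        sim_cycles = 0;                    /* model the cycle-counter reset */
    sim_pmcr = v & ~(PMCR_P | PMCR_C);     /* reset bits read back as zero */
}
static inline uint32_t pmu_read_pmcr(void) { return sim_pmcr; }
static inline void pmu_write_cntenset(uint32_t v) { sim_cnten |= v; }
static inline uint32_t pmu_read_cycle_count(void) { return sim_cycles; }
#endif
```

A typical start sequence under these assumptions would be `pmu_write_pmcr(PMCR_E | PMCR_P | PMCR_C)` followed by `pmu_write_cntenset(CNTEN_CYCLE)`, since the cycle counter has its own enable bit in addition to the global enable.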
Refer to the Technical Reference Manual (TRM) for the core that you are using for details of how to perform each of the steps above, and for a full list of events with their corresponding event numbers. Some reference code is provided with this FAQ (see bottom of page).
Next, call the function that you wish to profile, ensuring that any variables that would ordinarily be set prior to calling this function have been set correctly.
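One way to structure the call is to sample a counter immediately before and after it and report the difference. The sketch below is an assumption-laden illustration: `measure_delta` and the function-pointer types are hypothetical names, and the `fake_*` functions merely stand in for a real counter read and a real workload so the example can run anywhere.

```c
#include <stdint.h>

typedef uint32_t (*counter_read_fn)(void);  /* reads a running counter */
typedef void (*workload_fn)(void);          /* the function being profiled */

/* Sample the counter around a single call to work().
 * Unsigned subtraction gives the right difference provided fewer than
 * 2^32 counts elapsed; the overflow flags should still be checked. */
uint32_t measure_delta(workload_fn work, counter_read_fn read_counter)
{
    uint32_t before = read_counter();
    work();
    uint32_t after = read_counter();
    return after - before;
}

/* Stand-ins for illustration: a fake counter that the fake workload
 * advances by a fixed amount. */
static uint32_t fake_ticks;
static uint32_t fake_read(void) { return fake_ticks; }
static void fake_work(void) { fake_ticks += 250u; }
```

Calling `measure_delta(fake_work, fake_read)` exercises the wrapper with the stand-ins; on a target, `read_counter` would be replaced by a routine that reads the cycle counter or an event counter.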
Always check after benchmarking that the performance counters have not overflowed. If a performance counter has overflowed, its result is invalid. The cycle counter can be configured to tick only once every 64 core cycles to help prevent overflows. The event counters cannot be scaled in this way.
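The overflow check can be sketched as a test on the overflow status flags. This assumes the ARMv7 PMU layout, in which the overflow flag status register (PMOVSR) holds the cycle-counter flag in bit 31 and the flag for event counter n in bit n; ARM11 cores report overflow differently, so consult the TRM. The function names are hypothetical, and the register value is passed in rather than read from hardware.

```c
#include <stdint.h>
#include <stdbool.h>

#define PMOVSR_CYCLE (1u << 31)  /* cycle-counter overflow flag (ARMv7, assumed) */

/* True if the cycle counter has overflowed. */
bool cycle_counter_overflowed(uint32_t pmovsr)
{
    return (pmovsr & PMOVSR_CYCLE) != 0u;
}

/* True if event counter n has overflowed (n = 0..30). */
bool event_counter_overflowed(uint32_t pmovsr, unsigned n)
{
    return n < 31u && ((pmovsr >> n) & 1u) != 0u;
}

/* Core cycles a 32-bit cycle counter can cover before wrapping. */
uint64_t max_core_cycles(bool divide_by_64)
{
    uint64_t span = 1ull << 32;
    return divide_by_64 ? span * 64u : span;
}
```

The last helper shows why the divider helps: it extends the measurable span from 2^32 to 2^38 core cycles.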
Example PMU code
Basic example code used to be available as a download from the ARM website. An improved version is now shipped as part of DS-5 (from version 5.8 onwards). Within the DS-5 Bare-metal example package, the code is included as part of the "optimization3" example.
A free evaluation version of DS-5 is available from https://silver.arm.com/
Article last edited on: 2012-02-22 15:47:14