ARM Technical Support Knowledge Articles

Performance Monitor Unit example code for ARM11 and Cortex-A/R

Applies to: ARM11 processors, Cortex-A15, Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R5, DS-5

Answer

The event counting hardware on ARM11, Cortex-R and Cortex-A cores can be used for benchmarking and profiling code. This article explains how to profile a particular function (which may itself call other functions) to collect chosen statistics, including cycle and instruction counts.

Each of these cores contains:

  • a cycle counter, which can be configured to increment on every core cycle or once every 64 core cycles

  • a number of configurable event counters, each of which can be set to count a chosen event (for example, instructions executed or mispredicted branches):

    • ARM11 cores have 2 configurable event counters
    • Cortex-R4 cores have 3 configurable event counters
    • Cortex-A5 cores have 2 configurable event counters
    • Cortex-A8 cores have 4 configurable event counters
    • Cortex-A9 cores have 6 configurable event counters

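On the Cortex-A and Cortex-R cores the number of event counters does not have to be hard-coded: it is reported in the N field (bits [15:11]) of the Performance Monitors Control Register (PMCR), which is read with `MRC p15, 0, <Rt>, c9, c12, 0`. ARM11 cores have no such field. The following sketch decodes N from a PMCR value that has already been read, and builds the matching enable mask (bit 31 for the cycle counter, bit i for event counter i). The helper names are hypothetical.

```c
#include <stdint.h>

#define PMU_CCNT (1u << 31) /* cycle counter bit in PMCNTENSET/PMOVSR (ARMv7) */

/* Number of event counters implemented: PMCR.N, bits [15:11] (ARMv7 only). */
static unsigned pmcr_num_event_counters(uint32_t pmcr) {
    return (pmcr >> 11) & 0x1Fu;
}

/* Mask selecting the cycle counter plus event counters 0..n-1. */
static uint32_t pmu_all_counters_mask(unsigned n) {
    if (n > 31) n = 31;                 /* bit 31 belongs to the cycle counter */
    return PMU_CCNT | ((1u << n) - 1u);
}
```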
The performance counters can be configured and read either by software running on the core, or through ARM's debug tools.

Using Performance Counters via Software

Use the following procedure:

  1. Disable performance counters
  2. Set what each event counter will count
  3. Set cycle counter tick rate
  4. Reset performance counters
  5. Enable performance counters
  6. Call function to profile
  7. Disable performance counters
  8. Read out performance counters
  9. Check that performance counters did not overflow
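The nine steps above can be sketched for the ARMv7 Cortex-A and Cortex-R cores as follows. This is a minimal sketch, not the DS-5 example code: it assumes bare-metal code running in a privileged mode (so the CP15 c9 PMU registers are accessible), selects each event through PMSELR and PMXEVTYPER, and does not apply to ARM11, whose registers differ. The event numbers passed in must come from your core's TRM. `pmu_profile` and `empty_function` are hypothetical names. On non-ARM hosts a plain in-memory register file stands in for the PMU so that the control flow compiles and runs; it records writes but counts nothing.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMCR_E   (1u << 0)  /* enable all counters */
#define PMCR_P   (1u << 1)  /* reset all event counters */
#define PMCR_C   (1u << 2)  /* reset the cycle counter */
#define PMCR_D   (1u << 3)  /* cycle counter ticks once every 64 core cycles */
#define PMU_CCNT (1u << 31) /* cycle counter bit in PMCNTENSET/CLR and PMOVSR */
#define PMU_MAX_EVENTS 4u   /* assumption: this sketch profiles at most 4 events */

#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* Bare-metal ARMv7: the PMU registers live in CP15 c9 (privileged access). */
#define CP15_RD(name, crm, op2) static inline uint32_t rd_##name(void) { \
    uint32_t v; __asm__ volatile("mrc p15, 0, %0, c9, " #crm ", " #op2 : "=r"(v)); return v; }
#define CP15_WR(name, crm, op2) static inline void wr_##name(uint32_t v) { \
    __asm__ volatile("mcr p15, 0, %0, c9, " #crm ", " #op2 :: "r"(v) : "memory"); }
CP15_RD(pmcr, c12, 0)     CP15_WR(pmcr, c12, 0)     /* PMCR */
CP15_WR(cntenset, c12, 1) CP15_WR(cntenclr, c12, 2) /* PMCNTENSET, PMCNTENCLR */
CP15_RD(ovsr, c12, 3)     CP15_WR(ovsr, c12, 3)     /* PMOVSR */
CP15_WR(selr, c12, 5)     CP15_WR(evtyper, c13, 1)  /* PMSELR, PMXEVTYPER */
CP15_RD(ccnt, c13, 0)     CP15_RD(evcntr, c13, 2)   /* PMCCNTR, PMXEVCNTR */
static inline void isb(void) { __asm__ volatile("isb" ::: "memory"); }
#else
/* Host stand-in: records register writes, counts nothing. */
static uint32_t sim_pmcr, sim_cnten, sim_ovsr, sim_selr, sim_evtype[32];
static inline uint32_t rd_pmcr(void) { return sim_pmcr; }
static inline void wr_pmcr(uint32_t v) { sim_pmcr = v; }
static inline void wr_cntenset(uint32_t v) { sim_cnten |= v; }
static inline void wr_cntenclr(uint32_t v) { sim_cnten &= ~v; }
static inline uint32_t rd_ovsr(void) { return sim_ovsr; }
static inline void wr_ovsr(uint32_t v) { sim_ovsr &= ~v; } /* write 1 to clear */
static inline void wr_selr(uint32_t v) { sim_selr = v & 31u; }
static inline void wr_evtyper(uint32_t v) { sim_evtype[sim_selr] = v; }
static inline uint32_t rd_ccnt(void) { return 0; }
static inline uint32_t rd_evcntr(void) { return 0; }
static inline void isb(void) {}
#endif

typedef struct {
    uint32_t cycles;                 /* raw PMCCNTR value */
    uint32_t events[PMU_MAX_EVENTS]; /* raw event counter values */
    bool overflow;                   /* true if any counter used here wrapped */
} pmu_result;

/* Hypothetical helper following steps 1-9; ev[] holds event numbers from the TRM. */
static pmu_result pmu_profile(void (*fn)(void), const uint32_t *ev, unsigned n, bool div64) {
    pmu_result r = {0};
    if (n > PMU_MAX_EVENTS) n = PMU_MAX_EVENTS;
    uint32_t mask = PMU_CCNT | ((1u << n) - 1u);

    wr_pmcr(rd_pmcr() & ~PMCR_E);                /* 1. disable counters */
    for (unsigned i = 0; i < n; i++) {           /* 2. choose each event */
        wr_selr(i); isb(); wr_evtyper(ev[i]);
    }
    uint32_t pmcr = rd_pmcr() & ~(PMCR_E | PMCR_D);
    if (div64) pmcr |= PMCR_D;                   /* 3. cycle counter tick rate */
    wr_pmcr(pmcr | PMCR_P | PMCR_C);             /* 4. reset counters */
    wr_ovsr(mask);                               /*    clear stale overflow flags */
    wr_cntenset(mask);
    wr_pmcr(pmcr | PMCR_E); isb();               /* 5. enable counters */
    fn();                                        /* 6. run the code under test */
    wr_pmcr(pmcr); isb();                        /* 7. disable (E = 0) */
    r.cycles = rd_ccnt();                        /* 8. read out */
    for (unsigned i = 0; i < n; i++) { wr_selr(i); isb(); r.events[i] = rd_evcntr(); }
    r.overflow = (rd_ovsr() & mask) != 0;        /* 9. check for overflow */
    wr_cntenclr(mask);
    return r;
}

/* Baseline: an empty function, used to measure the profiling overhead. */
static void empty_function(void) {}
```

A typical use is to call `pmu_profile` once with the function of interest and once with `empty_function`, using the same events and tick rate, and then compare the two results.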

You may then want to repeat the process from step 4 onwards, this time calling an empty function in place of the function to profile. Subtracting the results of this baseline run from the original results removes the overhead of enabling and disabling the performance counters and of calling the function being profiled.
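The baseline correction is a per-counter subtraction. The sketch below (hypothetical names) subtracts the empty-function run from the real run. It clamps at zero because, for very short functions, run-to-run variation can make the baseline larger than the measurement, and unsigned subtraction would otherwise wrap to a huge value. Both runs are assumed to use the same events and the same cycle counter tick rate.

```c
#include <stdint.h>

#define PMU_MAX_EVENTS 6u /* assumption: largest counter count in the list above */

typedef struct {
    uint32_t cycles;
    uint32_t events[PMU_MAX_EVENTS];
} pmu_counts;

/* a - b, clamped at zero instead of wrapping. */
static uint32_t sub_clamped(uint32_t a, uint32_t b) {
    return a > b ? a - b : 0u;
}

/* Remove the empty-function baseline from the first n event counters and the cycle count. */
static pmu_counts pmu_subtract_baseline(pmu_counts measured, pmu_counts baseline, unsigned n) {
    pmu_counts net = {0};
    net.cycles = sub_clamped(measured.cycles, baseline.cycles);
    for (unsigned i = 0; i < n && i < PMU_MAX_EVENTS; i++)
        net.events[i] = sub_clamped(measured.events[i], baseline.events[i]);
    return net;
}
```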

In ARM11 cores, the performance counters are configured through the Performance Monitor Control Register (PMNC) in CP15 c15. In the Cortex-R and Cortex-A cores, they are configured through the Performance Monitoring Unit (PMU) registers in CP15 c9.

Refer to the Technical Reference Manual (TRM) for your core for details of how to perform each of the steps above, and for the full list of events and their event numbers. Reference code is also available; see Example PMU code at the end of this article.

When calling the function to profile (step 6), make sure that any variables that would normally be set before the function is called have been set correctly, so that the profiled run is representative of normal use.

Always check that the performance counters have not overflowed after benchmarking: if any counter has overflowed, its result is invalid. To reduce the chance of overflow, the cycle counter can be configured to tick only once every 64 core cycles. The event counters cannot be scaled in this way.
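The cycle counter and event counters on these cores are 32 bits wide, so the benefit of the 64-cycle divider is easy to quantify. At an assumed core clock of 1 GHz, the cycle counter wraps after 2^32 cycles, about 4.3 seconds; with the divider enabled it wraps after about 275 seconds, at the cost of 64-cycle resolution. The sketch below (hypothetical helper names) computes this, converts a divided reading back to approximate core cycles, and checks overflow flags, assuming the ARMv7 PMOVSR layout: bit 31 for the cycle counter and bit i for event counter i.

```c
#include <stdbool.h>
#include <stdint.h>

/* Seconds until the 32-bit cycle counter wraps at a given core clock. */
static double pmu_seconds_to_overflow(double core_hz, bool div64) {
    const double ticks = 4294967296.0;               /* 2^32 counter values */
    return ticks * (div64 ? 64.0 : 1.0) / core_hz;
}

/* Convert a cycle counter reading to core cycles (approximate when div64). */
static uint64_t pmu_core_cycles(uint32_t ccnt, bool div64) {
    return div64 ? (uint64_t)ccnt * 64u : (uint64_t)ccnt;
}

/* True if the cycle counter or any of event counters 0..n_events-1 overflowed. */
static bool pmu_overflowed(uint32_t pmovsr, unsigned n_events) {
    if (n_events > 31) n_events = 31;                /* bit 31 is the cycle counter */
    uint32_t mask = (1u << 31) | ((1u << n_events) - 1u);
    return (pmovsr & mask) != 0;
}
```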


Example PMU code

Basic example code was previously available as a download from the ARM website. An improved version now ships as part of DS-5 (version 5.8 onwards), in the "optimization3" example within the DS-5 Bare-metal example package.

A free evaluation version of DS-5 is available from https://silver.arm.com/

Article last edited on: 2012-02-22 15:47:14

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential