C2.3 Performance monitoring events

The Performance Monitor Unit (PMU) counts events that occur in the processor. Each significant event is identified by an event reference number.

The following table shows the bit position of each event on the event bus. Event reference numbers that are not listed are reserved.

Table C2-3 Performance monitoring events

Number  Event mnemonic  PMU event bus (to external)  PMU event bus (to trace)  Event name
0x00 SW_INCR - - Software increment. The register is incremented only on writes to the Software Increment Register.
0x01 L1I_CACHE_REFILL [0] [0] L1 Instruction cache refill.
0x02 L1I_TLB_REFILL [1] [1] L1 Instruction TLB refill.
0x03 L1D_CACHE_REFILL [2] [2] L1 Data cache refill.
0x04 L1D_CACHE [3] [3] L1 Data cache access.
0x05 L1D_TLB_REFILL [4] [4] L1 Data TLB refill.
0x06 LD_RETIRED [5] [5] Instruction that is architecturally executed, condition check pass - load.
0x07 ST_RETIRED [6] [6] Instruction that is architecturally executed, condition check pass - store.
0x08 INST_RETIRED [7] [7] Instruction that is architecturally executed.
- - [8] [8] Two instructions are architecturally executed. Counts every cycle in which two instructions are architecturally retired. Event 0x08, INST_RETIRED, always counts when this event counts.
0x09 EXC_TAKEN [9] [9] Exception taken.
0x0A EXC_RETURN [10] [10] Exception return.
0x0B CID_WRITE_RETIRED [11] [11] Change to Context ID retired.
0x0C PC_WRITE_RETIRED [12] [12] Instruction that is architecturally executed, condition check pass, software change of the PC.
0x0D BR_IMMED_RETIRED [13] [13] Instruction that is architecturally executed, immediate branch.
0x0E BR_RETURN_RETIRED - - Instruction that is architecturally executed, condition code check pass, procedure return.
0x0F UNALIGNED_LDST_RETIRED [14] [14] Instruction that is architecturally executed, condition check pass, unaligned load or store.
0x10 BR_MIS_PRED [15] [15] Mispredicted or not predicted branch that is speculatively executed.
0x11 CPU_CYCLES - - Cycle.
0x12 BR_PRED [16] [16] Predictable branch that is speculatively executed.
0x13 MEM_ACCESS [17] [17] Data memory access.
0x14 L1I_CACHE [18] [18] L1 Instruction cache access.
0x15 L1D_CACHE_WB [19] [19] L1 Data cache writeback.
0x16 L2D_CACHE [20] [20] L2 Data cache access.
0x17 L2D_CACHE_REFILL [21] [21] L2 Data cache refill.
0x18 L2D_CACHE_WB [22] [22] L2 Data cache writeback.
0x19 BUS_ACCESS - - Bus access.
0x1A MEMORY_ERROR - - Local memory error.
0x1D BUS_CYCLES - - Bus cycle.
0x1E CHAIN - - Odd performance counter chain mode.
0x60 BUS_ACCESS_LD - - Bus access - Read.
0x61 BUS_ACCESS_ST - - Bus access - Write.
0x7A BR_INDIRECT_SPEC - - Branch that is speculatively executed - Indirect branch.
0x86 EXC_IRQ - - Exception taken, IRQ.
0x87 EXC_FIQ - - Exception taken, FIQ.
0xC0 - - - External memory request.
0xC1 - - - Non-cacheable external memory request.
0xC2 - - - Linefill because of prefetch.
0xC4 - - - Entering read allocate mode.
0xC5 - - - Read allocate mode.
0xC6 - - - Pre-decode error.
0xC7 - - - Data Write operation that stalls the pipeline because the store buffer is full.
0xC8 - - - SCU Snooped data from another core for this core.
0xC9 - - - Conditional branch that is executed.
0xCA - - - Indirect branch that is mispredicted.
0xCB - - - Indirect branch that is mispredicted because of address miscompare.
0xCC - - - Conditional branch that is mispredicted.
0xD0 - [23] [23] L1 Instruction Cache (data or tag) memory error.
0xD1 - [24] [24] L1 Data Cache (data, tag, or dirty) memory error, correctable or non-correctable.
0xD2 - [25] [25] TLB memory error.
0xE0 - - - Attributable Performance Impact Event. Counts every cycle that the DPU IQ is empty and that is not because of a recent micro-TLB miss, an instruction cache miss, or a pre-decode error.
0xE1 - - - Attributable Performance Impact Event. Counts every cycle the DPU IQ is empty and there is an instruction cache miss being processed.
0xE2 - - - Attributable Performance Impact Event. Counts every cycle the DPU IQ is empty and there is an instruction micro-TLB miss being processed.
0xE3 - - - Attributable Performance Impact Event. Counts every cycle the DPU IQ is empty and there is a pre-decode error being processed.
0xE4 - - - Attributable Performance Impact Event. Counts every cycle there is an interlock that is not because of an Advanced SIMD or floating-point instruction, and not because of a load/store instruction waiting for data to calculate the address in the AGU. Stall cycles because of a stall in the Wr stage, typically awaiting load data, are excluded.
0xE5 - - - Attributable Performance Impact Event. Counts every cycle there is an interlock that is because of a load/store instruction waiting for data to calculate the address in the AGU. Stall cycles because of a stall in the Wr stage, typically awaiting load data, are excluded.
0xE6 - - - Attributable Performance Impact Event. Counts every cycle there is an interlock that is because of an Advanced SIMD or floating-point instruction. Stall cycles because of a stall in the Wr stage, typically awaiting load data, are excluded.
0xE7 - - - Attributable Performance Impact Event. Counts every cycle there is a stall in the Wr stage because of a load miss.
0xE8 - - - Attributable Performance Impact Event. Counts every cycle there is a stall in the Wr stage because of a store.
- - [26] [26] L2 (data or tag) memory error, correctable or non-correctable.
- - [27] [27] SCU snoop filter memory error, correctable or non-correctable.
- - [28] - Advanced SIMD and floating-point retention active.
- - [29] - Core retention active.
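
As an illustration of how the bit positions in Table C2-3 might be used, the following Python sketch decodes a snapshot of the PMU event bus into the names of the events that are asserted. It is not part of the processor documentation. The names `EVENT_BUS_BITS`, `decode_event_bus`, and `dual_retire_consistent` are hypothetical, and the sketch assumes that the snapshot is available to software as a plain integer in which bit n corresponds to bus position [n]. Events with no mnemonic in the table are labeled with a shortened form of their event name.

```python
# Hypothetical transcription of the event bus columns of Table C2-3:
# bus bit position -> event mnemonic, or a shortened event name when
# the table gives no mnemonic.
EVENT_BUS_BITS = {
    0: "L1I_CACHE_REFILL",
    1: "L1I_TLB_REFILL",
    2: "L1D_CACHE_REFILL",
    3: "L1D_CACHE",
    4: "L1D_TLB_REFILL",
    5: "LD_RETIRED",
    6: "ST_RETIRED",
    7: "INST_RETIRED",
    8: "Two instructions architecturally executed",
    9: "EXC_TAKEN",
    10: "EXC_RETURN",
    11: "CID_WRITE_RETIRED",
    12: "PC_WRITE_RETIRED",
    13: "BR_IMMED_RETIRED",
    14: "UNALIGNED_LDST_RETIRED",
    15: "BR_MIS_PRED",
    16: "BR_PRED",
    17: "MEM_ACCESS",
    18: "L1I_CACHE",
    19: "L1D_CACHE_WB",
    20: "L2D_CACHE",
    21: "L2D_CACHE_REFILL",
    22: "L2D_CACHE_WB",
    23: "L1 Instruction Cache memory error",
    24: "L1 Data Cache memory error",
    25: "TLB memory error",
    26: "L2 memory error",
    27: "SCU snoop filter memory error",
    28: "Advanced SIMD and floating-point retention active",
    29: "Core retention active",
}

# Bits [28] and [29] are listed for the external bus only, so the
# trace view of the bus is assumed to be two bits narrower.
EXTERNAL_BUS_WIDTH = 30
TRACE_BUS_WIDTH = 28


def decode_event_bus(sample, to_trace=False):
    """Return the names of all events asserted in one bus snapshot."""
    width = TRACE_BUS_WIDTH if to_trace else EXTERNAL_BUS_WIDTH
    if sample < 0 or (sample >> width) != 0:
        raise ValueError("sample has bits set outside the %d-bit bus" % width)
    return [EVENT_BUS_BITS[bit] for bit in range(width) if (sample >> bit) & 1]


def dual_retire_consistent(sample):
    """Check the rule stated for bit [8]: INST_RETIRED (bit [7]) always
    counts when the two-instruction event counts."""
    dual = (sample >> 8) & 1
    single = (sample >> 7) & 1
    return not dual or bool(single)
```

For example, `decode_event_bus((1 << 7) | (1 << 8))` returns the INST_RETIRED entry followed by the two-instruction entry, and passing a value with bit [29] set together with `to_trace=True` raises `ValueError`, because the table lists no trace bus position for that event.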
Non-Confidential ARM 100241_0001_00_en
Copyright © 2016, 2017 ARM Limited or its affiliates. All rights reserved.