9.2.1. Cortex-A9 specific events

Table 9.2 shows the Cortex-A9 specific events. In the Value column of Table 9.2, Precise means that the event is counted precisely. Events related to stalls and speculative instructions are marked Approximate in this column.
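Software that selects these events can carry the Value column alongside the event numbers, so that results from Approximate events are treated with appropriate care. The following C sketch transcribes Table 9.2 into a sorted lookup table. The type `a9_event`, the array `a9_events` and the function `a9_find_event()` are hypothetical names introduced for illustration. The sketch does not show how an event number is programmed into the Performance Monitor Unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor for one row of Table 9.2. */
typedef struct {
    unsigned code;     /* event number */
    const char *name;  /* short description from the table */
    int precise;       /* 1 = Precise, 0 = Approximate */
} a9_event;

/* Rows of Table 9.2 in ascending event-number order (bsearch needs this). */
static const a9_event a9_events[] = {
    {0x40, "Java bytecode executed", 0},
    {0x41, "Software Java bytecode executed", 0},
    {0x42, "Jazelle backward branches executed", 0},
    {0x50, "Coherent linefill miss", 1},
    {0x51, "Coherent linefill hit", 1},
    {0x60, "Instruction cache dependent stall cycles", 0},
    {0x61, "Data cache dependent stall cycles", 0},
    {0x62, "Main TLB miss stall cycles", 0},
    {0x63, "STREX passed", 1},
    {0x64, "STREX failed", 1},
    {0x65, "Data eviction", 1},
    {0x66, "Issue does not dispatch any instruction", 1},
    {0x67, "Issue is empty", 1},
    {0x68, "Instructions coming out of the core renaming stage", 0},
    {0x6E, "Predictable function returns", 0},
    {0x70, "Main execution unit instructions", 0},
    {0x71, "Second execution unit instructions", 0},
    {0x72, "Load/Store instructions", 0},
    {0x73, "Floating-point instructions", 0},
    {0x74, "NEON instructions", 0},
    {0x80, "Processor stalls because of PLDs", 0},
    {0x81, "Processor stalled because of a write to memory", 0},
    {0x82, "Processor stalled because of instruction side main TLB miss", 0},
    {0x83, "Processor stalled because of data side main TLB miss", 0},
    {0x84, "Processor stalled because of instruction micro TLB miss", 0},
    {0x85, "Processor stalled because of data micro TLB miss", 0},
    {0x86, "Processor stalled because of DMB", 0},
    {0x8A, "Integer clock enabled", 0},
    {0x8B, "Data Engine clock enabled", 0},
    {0x90, "ISB instructions", 1},
    {0x91, "DSB instructions", 1},
    {0x92, "DMB instructions", 0},
    {0x93, "External interrupts", 0},
    {0xA0, "PLE cache line request completed", 1},
    {0xA1, "PLE cache line request skipped", 1},
    {0xA2, "PLE FIFO flush", 1},
    {0xA3, "PLE request completed", 1},
    {0xA4, "PLE FIFO overflow", 1},
    {0xA5, "PLE request programmed", 1},
};

#define A9_NUM_EVENTS (sizeof a9_events / sizeof a9_events[0])

/* bsearch comparator: the first argument is the key, the second an element. */
static int a9_cmp(const void *key, const void *elem) {
    unsigned k = *(const unsigned *)key;
    unsigned c = ((const a9_event *)elem)->code;
    return (k > c) - (k < c);
}

/* Returns 1 if the table is strictly ascending by event number. */
static int a9_table_sorted(void) {
    for (size_t i = 1; i < A9_NUM_EVENTS; i++)
        if (a9_events[i - 1].code >= a9_events[i].code)
            return 0;
    return 1;
}

/* Returns the row for an event number, or NULL if it is not in Table 9.2. */
const a9_event *a9_find_event(unsigned code) {
    assert(a9_table_sorted()); /* debug-build check of the bsearch precondition */
    return bsearch(&code, a9_events, A9_NUM_EVENTS, sizeof a9_events[0], a9_cmp);
}
```

For example, `a9_find_event(0x63)` returns the STREX passed row with `precise` set to 1, `a9_find_event(0x68)` returns a row with `precise` set to 0, and an event number not listed in the table returns `NULL`.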

Table 9.2. Cortex-A9 specific events

Event Description Value
0x40

Java bytecode executed[a]

Counts the number of Java bytecodes being decoded, including speculative ones.

Approximate
0x41

Software Java bytecode executed[a]

Counts the number of software Java bytecodes being decoded, including speculative ones.

Approximate
0x42

Jazelle backward branches executed[a]

Counts the number of Jazelle taken branches executed. This includes branches that are flushed because a previous load/store aborts late.

Approximate
0x50

Coherent linefill miss[b]

Counts the number of coherent linefill requests performed by the Cortex-A9 processor that also miss in all other Cortex-A9 processors, meaning that the request is sent to external memory.

Precise
0x51

Coherent linefill hit[b]

Counts the number of coherent linefill requests performed by the Cortex-A9 processor that hit in another Cortex-A9 processor, meaning that the linefill data is fetched directly from that processor's cache.

Precise
0x60

Instruction cache dependent stall cycles

Counts the number of cycles where the processor is ready to accept new instructions but does not receive any, because the instruction side cannot provide any and the instruction cache is performing at least one linefill.

Approximate
0x61

Data cache dependent stall cycles

Counts the number of cycles where the core has some instructions that it cannot issue to any pipeline, the Load/Store unit has at least one pending linefill request, and there are no pending TLB requests.

Approximate
0x62

Main TLB miss stall cycles

Counts the number of cycles where the processor is stalled waiting for translation table walks from the main TLB to complete. The stall can occur because the instruction side cannot provide instructions, or because the data side cannot provide the necessary data, while either is waiting for a main TLB translation table walk to complete.

Approximate
0x63

STREX passed

Counts the number of STREX instructions architecturally executed and passed.

Precise
0x64

STREX failed

Counts the number of STREX instructions architecturally executed and failed.

Precise
0x65

Data eviction

Counts the number of eviction requests caused by linefills in the data cache.

Precise
0x66

Issue does not dispatch any instruction

Counts the number of cycles where the issue stage does not dispatch any instruction because it is empty or cannot dispatch any instructions.

Precise
0x67

Issue is empty

Counts the number of cycles where the issue stage is empty.

Precise
0x68

Instructions coming out of the core renaming stage

Counts the number of instructions going through the Register Renaming stage. This number approximates the total number of instructions speculatively executed, and gives an even rougher approximation of the total number of instructions architecturally executed. The accuracy of the approximation depends mainly on the branch misprediction rate.

The renaming stage can handle two instructions in the same cycle, so the event is two bits long:

  • 0b00 no instructions coming out of the core renaming stage

  • 0b01 one instruction coming out of the core renaming stage

  • 0b10 two instructions coming out of the core renaming stage.

See Table A.17 for a description of how these values map to the PMUEVENT bus bits.

Approximate
0x6E

Predictable function returns

Counts the number of procedure returns whose condition codes do not fail, excluding all returns from exception. This count includes procedure returns that are flushed because a previous load/store aborts late.

Only the following instructions are reported:

  • BX R14

  • MOV PC, LR

  • POP {..,pc}

  • LDR pc,[sp],#offset.

The following instructions are not reported:

  • LDMIA R9!,{..,PC} (ThumbEE state only)

  • LDR PC,[R9],#offset (ThumbEE state only)

  • BX R0 (Rm != R14)

  • MOV PC,R0 (Rm != R14)

  • LDM SP,{...,PC} (writeback not specified)

  • LDR PC,[SP,#offset] (wrong addressing mode).

Approximate
0x70

Main execution unit instructions

Counts the number of instructions being executed in the main execution pipeline of the processor, that is, the multiply pipeline and the arithmetic logic unit (ALU) pipeline. The counted instructions are still speculative.

Approximate
0x71

Second execution unit instructions

Counts the number of instructions being executed in the second execution pipeline (ALU) of the processor. The counted instructions are still speculative.

Approximate
0x72

Load/Store Instructions

Counts the number of instructions being executed in the Load/Store unit. The counted instructions are still speculative.

Approximate
0x73

Floating-point instructions

Counts the number of floating-point instructions going through the Register Rename stage. Instructions are still speculative in this stage.

Two floating-point instructions can be renamed in the same cycle, so the event is two bits long:

  • 0b00 no floating-point instruction renamed

  • 0b01 one floating-point instruction renamed

  • 0b10 two floating-point instructions renamed.

See Table A.17 for a description of how these values map to the PMUEVENT bus bits.

Approximate
0x74

NEON instructions

Counts the number of NEON instructions going through the Register Rename stage. Instructions are still speculative in this stage.

Two NEON instructions can be renamed in the same cycle, so the event is two bits long:

  • 0b00 no NEON instruction renamed

  • 0b01 one NEON instruction renamed

  • 0b10 two NEON instructions renamed.

See Table A.17 for a description of how these values map to the PMUEVENT bus bits.

Approximate
0x80

Processor stalls because of PLDs

Counts the number of cycles where the processor is stalled because all PLD slots are full.

Approximate
0x81

Processor stalled because of a write to memory

Counts the number of cycles where the processor is stalled and the data side is also stalled because it is full and executing writes to external memory.

Approximate
0x82

Processor stalled because of instruction side main TLB miss

Counts the number of stall cycles because of main TLB misses on requests issued by the instruction side.

Approximate
0x83

Processor stalled because of data side main TLB miss

Counts the number of stall cycles because of main TLB misses on requests issued by the data side.

Approximate
0x84

Processor stalled because of instruction micro TLB miss

Counts the number of stall cycles because of micro TLB misses on the instruction side. This event does not include main TLB miss stall cycles that are already counted in the corresponding main TLB event.

Approximate
0x85

Processor stalled because of data micro TLB miss

Counts the number of stall cycles because of micro TLB misses on the data side. This event does not include main TLB miss stall cycles that are already counted in the corresponding main TLB event.

Approximate
0x86

Processor stalled because of DMB

Counts the number of stall cycles because of the execution of a DMB memory barrier. This includes all DMB instructions being executed, even speculatively.

Approximate
0x8A

Integer clock enabled

Counts the number of cycles during which the integer core clock is enabled.

Approximate
0x8B

Data Engine clock enabled

Counts the number of cycles during which the Data Engine clock is enabled.

Approximate
0x90

ISB instructions

Counts the number of ISB instructions architecturally executed.

Precise
0x91

DSB instructions

Counts the number of DSB instructions architecturally executed.

Precise
0x92

DMB instructions

Counts the number of DMB instructions speculatively executed.

Approximate
0x93

External interrupts

Counts the number of external interrupts taken by the processor.

Approximate
0xA0

PLE cache line request completed[c]

Precise
0xA1

PLE cache line request skipped[c]

Precise
0xA2

PLE FIFO flush[c]

Precise
0xA3

PLE request completed[c]

Precise
0xA4

PLE FIFO overflow[c]

Precise
0xA5

PLE request programmed[c]

Precise

[a] Only when the design implements the Jazelle extensions. Otherwise reads as 0.

[b] For use with Cortex-A9 multiprocessor variants.

[c] Active only when the PLE is present. Otherwise reads as 0.
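Several events in Table 9.2 are most informative in combination. The following C sketch derives a few figures from counter deltas collected over the same interval: the STREX failure ratio from 0x63 and 0x64, the coherent linefill hit ratio from 0x50 and 0x51, the cycles where the issue stage holds instructions but dispatches none (0x66 minus 0x67), and the total TLB stall cycles on one side (0x82 plus 0x84, or 0x83 plus 0x85). The TLB sum does not double-count because the micro TLB events exclude cycles already counted by the main TLB events. The sketch also decodes the two-bit values of events 0x68, 0x73 and 0x74. All function names are hypothetical. The sketch assumes that the two-bit values have already been extracted from the PMUEVENT bus as described in Table A.17. Treating 0b11 as invalid is also an assumption, because the table lists only 0b00, 0b01 and 0b10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* STREX failure ratio: failed (0x64) / (passed (0x63) + failed (0x64)).
 * Returns 0.0 when no STREX instructions were counted. */
double a9_strex_fail_ratio(uint64_t passed_0x63, uint64_t failed_0x64) {
    uint64_t total = passed_0x63 + failed_0x64;
    return total ? (double)failed_0x64 / (double)total : 0.0;
}

/* Share of coherent linefills served by another Cortex-A9 cache:
 * hit (0x51) / (miss (0x50) + hit (0x51)). Returns 0.0 with no linefills. */
double a9_coherent_hit_ratio(uint64_t miss_0x50, uint64_t hit_0x51) {
    uint64_t total = miss_0x50 + hit_0x51;
    return total ? (double)hit_0x51 / (double)total : 0.0;
}

/* Cycles where the issue stage is not empty but dispatches nothing.
 * 0x66 counts "empty or cannot dispatch" and 0x67 counts "empty", so the
 * difference isolates the blocked case. Saturates at 0 in case the two
 * counters were not read over exactly the same interval. */
uint64_t a9_issue_blocked_cycles(uint64_t no_dispatch_0x66, uint64_t empty_0x67) {
    return no_dispatch_0x66 > empty_0x67 ? no_dispatch_0x66 - empty_0x67 : 0;
}

/* Total TLB stall cycles on one side: main TLB event (0x82 or 0x83) plus
 * micro TLB event (0x84 or 0x85). The micro TLB events exclude main TLB
 * stall cycles, so the sum does not count any cycle twice. */
uint64_t a9_tlb_stall_cycles(uint64_t main_tlb, uint64_t micro_tlb) {
    return main_tlb + micro_tlb;
}

/* Decodes one two-bit value of event 0x68, 0x73 or 0x74 into the number of
 * instructions renamed in that cycle. 0b11 is not listed in Table 9.2 and
 * is treated here as invalid (-1). */
int a9_rename_count(unsigned bits) {
    assert(bits <= 0x3u); /* the event is only two bits wide */
    switch (bits) {
    case 0x0: return 0;
    case 0x1: return 1;
    case 0x2: return 2;
    default:  return -1;
    }
}

/* Sums per-cycle two-bit samples, skipping invalid ones. How the samples
 * are captured from the PMUEVENT bus is outside this sketch. */
uint64_t a9_rename_total(const uint8_t *samples, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        int count = a9_rename_count(samples[i] & 0x3u);
        if (count > 0)
            total += (uint64_t)count;
    }
    return total;
}
```

For example, 3 passed and 1 failed STREX instructions give a failure ratio of 0.25. Figures derived from Approximate events, such as the TLB stall totals and the rename counts, remain approximate.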


Copyright © 2008-2009 ARM. All rights reserved. ARM DDI 0388E
Non-Confidential