4.2.2. Profiling Counters

The Profiling Counters provide information about bus activity over a much longer time span than the AHBMONITOR[32:29] port. Each counter is 32 bits wide and records the number of occurrences of a particular event since the last software or hardware reset. The count values can be read back through the AHB slave interface. Control registers, written through the same AHB interface, enable, disable, and reset the counters and, in some cases, modify the manner in which they count. The counters are used for statistical profiling of software and system setups, for example to count the cache line fills and evictions that occur during a particular algorithm. This permits some assessment of how well the bus fabric is managing competition for slave bandwidth.
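As a rough illustration of the time span that a 32-bit counter can cover, the following sketch estimates how long a counter that increments once per bus cycle (the fastest possible rate) takes to reach its maximum value. The 100 MHz clock is an assumed figure chosen for illustration and is not taken from this manual. The function name is hypothetical.

```python
COUNTER_BITS = 32  # each profiling counter is 32 bits wide


def seconds_until_full(bus_clock_hz: float, events_per_cycle: float = 1.0) -> float:
    """Time for a counter to reach 2**32 - 1 at a steady event rate."""
    if bus_clock_hz <= 0 or events_per_cycle <= 0:
        raise ValueError("clock frequency and event rate must be positive")
    max_count = (1 << COUNTER_BITS) - 1
    return max_count / (bus_clock_hz * events_per_cycle)


if __name__ == "__main__":
    # Assumed 100 MHz bus clock, for illustration only.
    print(f"{seconds_until_full(100e6):.1f} s")
```

Events that occur less often than once per cycle extend this span proportionally.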

Because each monitor layer registers the target bus signals to minimize loading effects, the control and configuration of the counters must be synchronized to match the resulting pipeline delay. This ensures that the monitored activity coincides with the correct configuration settings. The clearest example of this synchronization is the write that resets the counters: the write that causes the reset does not increment the respective profile counters, but any transfers directly after it do.

Accessing the Profiling Counters

The profiling counters of each layer are accessed through an AHB slave port. This port is connected only to the ARM-D and EXPansion bus layers, at a base address of 0x101D0000. The control and configuration registers for the profiling counters are also accessible through this base address. Each register in the AHB monitor is word aligned.
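Because every register is word aligned, a register's byte address follows from the base address and a word index. The sketch below shows that arithmetic only. The word index is a hypothetical parameter, since the individual register offsets are not listed in this section.

```python
AHBMON_BASE = 0x101D0000  # base address stated in this section
WORD_BYTES = 4            # registers are word (32-bit) aligned


def register_address(word_index: int) -> int:
    """Byte address of the register at a given word index (index is hypothetical)."""
    if word_index < 0:
        raise ValueError("word index must be non-negative")
    return AHBMON_BASE + WORD_BYTES * word_index


def is_word_aligned(address: int) -> bool:
    """True if the address is a multiple of four bytes."""
    return address % WORD_BYTES == 0


if __name__ == "__main__":
    print(hex(register_address(3)))
```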

Note

The enabling and clearing of the profiling counters is synchronized to the registered operation of the associated layers to ensure that the data collected is correct.

Counted events

All the layers contain a number of counters for collecting statistical information, such as the number of basic accesses and burst behavior. Certain layers also contain additional counters for layer-specific behavior of interest. For example:

  • the ARM-I and ARM-D layers also contain counters for cache related statistics

  • the ARM-D and EXP layers contain counters to track accesses to the APB bridges

  • for the DMA peripheral layer (DMA0) the transfers to each of the built-in slaves are counted independently as an easy means of determining which component was using the most DMA bandwidth.

There are counters to help determine the mean latency experienced by each bus master in the form of wait states. Latency caused by bus infrastructure or arbitration is typically seen only on the first transfer of a burst. By differentiating between the two causes of wait states for nonsequential transfers, and knowing the number of bursts, you get an indication of the stalling effect of the bus infrastructure.
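The latency reasoning above can be sketched as a small calculation on counter values that have already been read back. The function and key names are hypothetical, and the sample figures in the usage example are invented for illustration.

```python
def latency_summary(transfers: int, bursts: int, wait_total: int,
                    wait_nonseq_slave: int, wait_nonseq_bus: int) -> dict:
    """Derive mean wait-state figures for one layer from its counters."""
    def ratio(numerator: int, denominator: int) -> float:
        # Avoid division by zero when nothing has been counted yet.
        return numerator / denominator if denominator else 0.0

    return {
        # Mean wait states per transfer, from every cause.
        "mean_wait_per_transfer": ratio(wait_total, transfers),
        # Mean first-transfer stall per burst caused by the bus infrastructure.
        "mean_bus_wait_per_burst": ratio(wait_nonseq_bus, bursts),
        # Mean first-transfer stall per burst caused by the slave.
        "mean_slave_wait_per_burst": ratio(wait_nonseq_slave, bursts),
        # Fraction of all wait states attributed to the bus infrastructure.
        "bus_share_of_waits": ratio(wait_nonseq_bus, wait_total),
    }


if __name__ == "__main__":
    # Invented sample values, not measurements.
    print(latency_summary(transfers=1000, bursts=200, wait_total=500,
                          wait_nonseq_slave=150, wait_nonseq_bus=100))
```

Here the number of bursts would be the sum of the layer's burst counters, and the number of transfers the sum of its read and write counters.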

As each layer, with the exception of the GXI, can receive an ERROR response and the ARM-D layer can receive a RETRY response, it is important to define the count behavior with regard to these conditions:

  • read and write counters increment only on the second cycle of an ERROR response and on OKAY responses

  • burst counters increment only on the second cycle of an ERROR response and on OKAY responses

  • no increments occur with any other response.
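One reading of these rules can be expressed as a small predicate. This is a simplified model and not the monitor's implementation: it assumes that an OKAY response here means a completed transfer, and that the two cycles of an ERROR response are numbered 1 and 2. The function name is hypothetical.

```python
def transfer_is_counted(hresp: str, error_cycle: int = 0) -> bool:
    """Return True if a counter increments for this response.

    hresp       -- "OKAY", "ERROR", "RETRY" or "SPLIT"
    error_cycle -- 1 or 2 for the two cycles of an ERROR response
                   (assumed numbering), ignored for other responses
    """
    if hresp == "OKAY":
        return True                 # assumed to be a completed transfer
    if hresp == "ERROR":
        return error_cycle == 2     # only the second ERROR cycle counts
    return False                    # RETRY, SPLIT: no increment


if __name__ == "__main__":
    for resp, cycle in [("OKAY", 0), ("ERROR", 1), ("ERROR", 2), ("RETRY", 0)]:
        print(resp, cycle, transfer_is_counted(resp, cycle))
```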

As the AMBA AHB specification allows for divergent behavior in reaction to an ERROR response, the behavior of each of the AHB masters contained within the ARM926PXP development chip design is described below to help you understand the profile information collected on each layer:

CLCDC

Stops the current burst by issuing a bus idle and enters an error state that raises an interrupt. Once the interrupt is cleared, the controller begins the frame again.

DMAC

Halts the current burst, then halts further bus activity and optionally raises an interrupt.

AHB-AHB bridge

Passes errors back to the source master.

ARM926-I

The burst transaction is always completed before an ABORT exception is raised. Following the completion of the ABORT exception handler, the ARM executes a specific instruction to re-execute the instruction that caused the abort.

ARM926-D

If the error occurs on a SWAP instruction, the write is always attempted. In all other cases the BIU always completes the burst transaction before raising an ABORT exception. Following the completion of the ABORT exception handler, the ARM executes a specific instruction to re-execute the instruction that caused the data abort.

Similarly to the AHB layers, statistical information is also collected for the GXI, such as number of complete read and write transfers and stall trends.

Counted events on ARM-I layer

The ARM926EJ-S ARM-I BIU performs only a small subset of AHB transfer types. It performs no writes at all. All reads are word-sized, even when not in ARM state. For example, two half-word Thumb instructions are fetched by one word transfer. It performs only burst types SINGLE, INCR4, and WRAP8 (I-cache line fills). There are no BUSY transfers on this layer.

Table 4.6 shows all the events that are recorded on the ARM-I layer. These counters are enabled through the counter enable bit in the AHBMONCtrlReg register and are further controlled by DBGACK when the track DBGACK control is set. When set, the track DBGACK control disables the counters during cycles in which DBGACK is asserted. All counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.
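The enable and debug gating described above reduces to a simple boolean condition per cycle. The following sketch models it and applies it to a short DBGACK trace. The function names are hypothetical, and the model ignores the pipeline synchronization of the control writes.

```python
from typing import Iterable


def counter_counts_this_cycle(enable: bool, track_dbgack: bool, dbgack: bool) -> bool:
    """True if the counters may increment in this cycle."""
    # Counting stops only when tracking is on AND DBGACK is asserted.
    return enable and not (track_dbgack and dbgack)


def count_active_cycles(dbgack_trace: Iterable[bool],
                        enable: bool = True,
                        track_dbgack: bool = True) -> int:
    """Number of cycles in the trace during which the counters are active."""
    return sum(1 for dbgack in dbgack_trace
               if counter_counts_this_cycle(enable, track_dbgack, dbgack))


if __name__ == "__main__":
    trace = [False, True, True, False, False]  # invented DBGACK samples
    print(count_active_cycles(trace, track_dbgack=True))
    print(count_active_cycles(trace, track_dbgack=False))
```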

Table 4.6. Event counters for the ARM-I layer

Counter Name            Description
CtArmiRd                Total number of completed read transfers
CtArmiBurstSingle       Number of single word bursts
CtArmiBurstIncr4        Number of 4-word incrementing bursts
CtArmiLineFill          Number of I-cache linefills, that is, CtArmiBurstWrap8: the number of 8-word wrapping bursts
CtArmiWaitTotal         Total number of wait states
CtArmiWaitNonSeqSlave   Number of wait states on the first transfer of a burst that were caused by a slave
CtArmiWaitNonSeqBus     Number of wait states on the first transfer of a burst that were caused by the bus infrastructure
CtArmiWaitThresholdHit  Number of occurrences that a wait-state exceeded a configurable threshold
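Since every ARM-I read is word-sized and only three burst types occur, the burst counters can be cross-checked against the read counter. The sketch below assumes that every burst runs to its full length, which this section does not guarantee, so a nonzero difference may simply reflect bursts that ended early or counters sampled mid-burst. The function names are hypothetical and the sample values are invented.

```python
# Words transferred by one complete burst of each counted type.
WORDS_PER_BURST = {
    "CtArmiBurstSingle": 1,
    "CtArmiBurstIncr4": 4,
    "CtArmiLineFill": 8,   # WRAP8 I-cache linefill
}


def expected_armi_reads(counters: dict) -> int:
    """Reads implied by the burst counters, assuming complete bursts."""
    return sum(words * counters[name] for name, words in WORDS_PER_BURST.items())


def armi_read_discrepancy(counters: dict) -> int:
    """Difference between CtArmiRd and the reads implied by the bursts."""
    return counters["CtArmiRd"] - expected_armi_reads(counters)


if __name__ == "__main__":
    sample = {"CtArmiRd": 1900, "CtArmiBurstSingle": 100,
              "CtArmiBurstIncr4": 50, "CtArmiLineFill": 200}
    print(expected_armi_reads(sample), armi_read_discrepancy(sample))
```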

Counted events on CLCDC layer

The PrimeCell PL110 CLCDC BIU performs only a small subset of AHB transfer types:

  • it performs no writes at all

  • all reads are word-sized

  • it performs only burst types INCR, INCR4, INCR8, and INCR16

  • the unspecified length bursts are used to implement single transfers by the CLCDC.

There are no BUSY transfers on this layer, because the FIFO in this particular implementation is synthesized from flip-flops rather than using a compiled RAM block. However, the RTL does contain busy activity that could affect other configurations.

Table 4.7 shows all the events that are recorded on the CLCDC layer. These counters are enabled through the counter enable bit in the AHBMONCtrlReg register. All counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.

Table 4.7. CLCDC events

Counter Name            Description
CtClcdRd                Total number of completed read transfers
CtClcdBurstIncr         Number of unspecified length bursts (this format is used for 1-word transfers)
CtClcdBurstIncr4        Number of 4-word incrementing bursts
CtClcdBurstIncr8        Number of 8-word incrementing bursts
CtClcdBurstIncr16       Number of 16-word incrementing bursts
CtClcdWaitTotal         Total number of wait states
CtClcdWaitNonSeqSlave   Number of wait states on the first transfer of a burst that were caused by a slave
CtClcdWaitNonSeqBus     Number of wait states on the first transfer of a burst that were caused by the bus infrastructure
CtClcdWaitThresholdHit  Number of occurrences that a wait-state exceeded a configurable threshold

Counted events on DMA-0 layer

The PrimeCell PL080 DMAC BIU can perform all active AHB transfer types, that is, reads and writes of size 8, 16, and 32 bits. It performs only burst types INCR, INCR4, INCR8, and INCR16. DMA master number 0 is used to access the three DMA-capable peripherals within the ARM926PXP development chip (UART, SCI, SSP) or external slaves accessed through the off-chip bridges.

The APB bridge generates the PSEL signal from a direct binary decode of HADDR[15:12]. The resultant decode maps as:

PSEL[15:0] = {PSelExp[10:0], PSelSsp, PSelUart[2:0], PSelSCard}
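The decode above can be modeled as follows. The sketch assumes the usual Verilog concatenation order, in which the leftmost field occupies the most significant bits, and a one-hot decode in which HADDR[15:12] = n asserts PSEL[n]. Under those assumptions PSelSCard is PSEL[0], PSelUart[2:0] is PSEL[3:1], PSelSsp is PSEL[4], and PSelExp[10:0] is PSEL[15:5]. The function names are hypothetical.

```python
def psel_index(haddr: int) -> int:
    """Select number taken from HADDR[15:12]."""
    return (haddr >> 12) & 0xF


def psel_onehot(haddr: int) -> int:
    """16-bit PSEL vector with exactly one bit set (assumed one-hot decode)."""
    return 1 << psel_index(haddr)


def psel_name(haddr: int) -> str:
    """Name of the select line asserted for this address."""
    index = psel_index(haddr)
    if index == 0:
        return "PSelSCard"
    if index <= 3:
        return f"PSelUart[{index - 1}]"
    if index == 4:
        return "PSelSsp"
    return f"PSelExp[{index - 5}]"


if __name__ == "__main__":
    for addr in (0x0000, 0x1000, 0x3000, 0x4000, 0x5000, 0xF000):
        print(f"{addr:#06x} {psel_onehot(addr):#06x} {psel_name(addr)}")
```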

Table 4.8 shows all the events that are recorded on the DMA-0 layer. These counters are enabled through the counter enable bit in the AHBMONCtrlReg register. All counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.

Table 4.8. DMA 0 events

Counter Name            Description
CtDma0Rd                Total number of read transfers
CtDma0Wr                Total number of write transfers
CtDma0RdUart            Number of read transfers from the UART (there are three selects for the UART)
CtDma0WrUart            Number of write transfers to the UART (there are three selects for the UART)
CtDma0RdSci             Number of read transfers from the SCI
CtDma0WrSci             Number of write transfers to the SCI
CtDma0RdSsp             Number of read transfers from the SSP
CtDma0WrSsp             Number of write transfers to the SSP
CtDma0BurstIncr         Number of unspecified length bursts
CtDma0BurstIncr4        Number of 4-beat incrementing bursts
CtDma0BurstIncr8        Number of 8-beat incrementing bursts
CtDma0BurstIncr16       Number of 16-beat incrementing bursts
CtDma0WaitTotal         Total number of wait states
CtDma0WaitNonSeqSlave   Number of wait states on the first transfer of a burst that were caused by a slave
CtDma0WaitNonSeqBus     Number of wait states on the first transfer of a burst that were caused by the bus infrastructure
CtDma0WaitThresholdHit  Number of occurrences that a wait-state exceeded a configurable threshold
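The per-peripheral counters make it possible to estimate how DMA-0 traffic is divided between the UART, SCI, and SSP. The sketch below computes each peripheral's share of the total transfer count. It counts transfers, not bytes, so it is only a proxy for bandwidth when transfer sizes differ. The remainder is labeled "other" on the assumption that it corresponds to slaves reached through the off-chip bridges. The function name is hypothetical and the sample values are invented.

```python
def dma0_transfer_shares(counters: dict) -> dict:
    """Fraction of DMA-0 transfers that went to each peripheral."""
    total = counters["CtDma0Rd"] + counters["CtDma0Wr"]
    if total == 0:
        return {"UART": 0.0, "SCI": 0.0, "SSP": 0.0, "other": 0.0}

    per_peripheral = {
        "UART": counters["CtDma0RdUart"] + counters["CtDma0WrUart"],
        "SCI": counters["CtDma0RdSci"] + counters["CtDma0WrSci"],
        "SSP": counters["CtDma0RdSsp"] + counters["CtDma0WrSsp"],
    }
    shares = {name: count / total for name, count in per_peripheral.items()}
    # Whatever is left is assumed to be traffic to external slaves.
    shares["other"] = (total - sum(per_peripheral.values())) / total
    return shares


if __name__ == "__main__":
    sample = {"CtDma0Rd": 600, "CtDma0Wr": 400,
              "CtDma0RdUart": 100, "CtDma0WrUart": 100,
              "CtDma0RdSci": 50, "CtDma0WrSci": 50,
              "CtDma0RdSsp": 200, "CtDma0WrSsp": 100}
    print(dma0_transfer_shares(sample))
```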

Counted events on DMA-1 layer

The PrimeCell PL080 DMAC BIU can perform all active AHB transfer types, that is, reads and writes of size 8, 16, and 32 bits. It performs only burst types INCR, INCR4, INCR8, and INCR16. DMA master number 1 is used to access memory either through the memory controllers within the ARM926PXP development chip (SMC, MPMC) or external slaves accessed through the off-chip bridges.

Table 4.9 shows all the events that are recorded on the DMA-1 layer. These counters are enabled through the counter enable bit in the AHBMONCtrlReg register. All counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.

Table 4.9. DMA 1 events

Counter Name            Description
CtDma1Rd                Total number of read transfers
CtDma1Wr                Total number of write transfers
CtDma1BurstIncr         Number of unspecified length bursts
CtDma1BurstIncr4        Number of 4-beat incrementing bursts
CtDma1BurstIncr8        Number of 8-beat incrementing bursts
CtDma1BurstIncr16       Number of 16-beat incrementing bursts
CtDma1WaitTotal         Total number of wait states
CtDma1WaitNonSeqSlave   Number of wait states on the first transfer of a burst that were caused by a slave
CtDma1WaitNonSeqBus     Number of wait states on the first transfer of a burst that were caused by the bus infrastructure
CtDma1WaitThresholdHit  Number of occurrences that a wait-state exceeded a configurable threshold

Counted events on EXPansion layer

The expansion layer is connected to an AHB-AHB bridge to enable external AHB components to be interfaced to the ARM926PXP development chip. The AHB-AHB bridge passes all AHB signals between AHB domains, with the exception of SPLIT and RETRY responses, which it replaces with stall cycles on the response signals.

Table 4.10 shows all the events that are recorded on the EXPansion layer. These counters are enabled through the counter enable bit in the AHBMONCtrlReg register. All counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.

Table 4.10. EXP layer events

Counter Name            Description
CtExpRd                 Total number of read transfers
CtExpWr                 Total number of write transfers
CtExpRdApbDma           Total number of read transfers to the DMA peripherals APB Bridge
CtExpWrApbDma           Total number of write transfers to the DMA peripherals APB Bridge
CtExpRdApbCore          Total number of read transfers to the Core peripherals APB Bridge
CtExpWrApbCore          Total number of write transfers to the Core peripherals APB Bridge
CtExpBurstSingle        Number of single beat bursts
CtExpBurstIncr          Number of unspecified length bursts
CtExpBurstWrap4         Number of 4-beat wrapping bursts
CtExpBurstIncr4         Number of 4-beat incrementing bursts
CtExpBurstWrap8         Number of 8-beat wrapping bursts
CtExpBurstIncr8         Number of 8-beat incrementing bursts
CtExpBurstWrap16        Number of 16-beat wrapping bursts
CtExpBurstIncr16        Number of 16-beat incrementing bursts
CtExpWaitTotal          Total number of wait states
CtExpWaitNonSeqSlave    Number of wait states on the first transfer of a burst that were caused by a slave
CtExpWaitNonSeqBus      Number of wait states on the first transfer of a burst that were caused by the bus infrastructure
CtExpWaitThresholdHit   Number of occurrences that a wait-state exceeded a configurable threshold

Counted events on ARM-D layer

The ARM926EJ-S ARM-D BIU can perform all active AHB transfer types, that is, reads and writes of size 8, 16, and 32 bits. It performs only burst types SINGLE, INCR, INCR4, INCR8, and WRAP8 (reads only). Normal data transfers, cache operations, and page table walks can be differentiated by using a combination of transfer direction, burst type, and protection signal values.

There are no BUSY transfers on this layer.

Table 4.11 shows all the events that are recorded on the ARM-D layer. These counters are enabled through the counter enable bit in the AHBMONCtrlReg register (see AHBMONCtrlReg) and are further controlled by DBGACK when the track DBGACK control is set. When set, the track DBGACK control disables the counters during cycles in which DBGACK is asserted. All counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.

Table 4.11. D layer events

Event Counter Name      Description
CtArmdRd                Total number of read transfers, including cache linefills and page table walks
CtArmdWr                Total number of write transfers, including cache writebacks
CtArmdRdApbDma          Total number of read transfers to the DMA peripherals APB Bridge
CtArmdWrApbDma          Total number of write transfers to the DMA peripherals APB Bridge
CtArmdRdApbCore         Total number of read transfers to the Core peripherals APB Bridge
CtArmdWrApbCore         Total number of write transfers to the Core peripherals APB Bridge
CtArmdBurstSingle       Number of single word bursts, including page table walks
CtArmdBurstIncr4        Number of 4-word incrementing bursts, including half line cache writebacks
CtArmdBurstIncr8        Number of 8-word incrementing bursts, including full line cache writebacks
CtArmdLineFill          Number of D-cache linefills, that is, CtArmdRdBurstWrap8: the number of 8-word wrapping read bursts
CtArmdCastOut4          Number of half line cache writebacks, that is, CtArmdWrBurstIncr4ProtCBPD: the number of 4-word incrementing write bursts with a particular HPROT value. This counter can also check that the INCR4 burst is 4-word aligned, which filters out approximately 75% of the STM bursts that would otherwise be incorrectly identified as cast-outs
CtArmdCastOut8          Number of full line cache writebacks, that is, CtArmdWrBurstIncr8ProtCBPD: the number of 8-word incrementing write bursts with a particular HPROT value
CtArmdPageWalkD         Number of read transfers for D-side page table walks, that is, CtArmdRdBurstSingProtCBPD: the number of single word reads with a particular HPROT value
CtArmdPageWalkI         Number of read transfers for I-side page table walks, that is, CtArmdRdBurstSingProtCBPI: the number of single word reads with a particular HPROT value
CtArmdWaitTotal         Total number of wait states
CtArmdWaitNonSeqSlave   Number of wait states on the first transfer of a burst that were caused by a slave
CtArmdWaitNonSeqBus     Number of wait states on the first transfer of a burst that were caused by the bus infrastructure
CtArmdWaitThresholdHit  Number of occurrences that a wait-state exceeded a configurable threshold
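The cache-related counters in Table 4.11 can be turned into an estimate of D-cache traffic in bytes, using the burst lengths given in the descriptions (8 words per linefill and full line writeback, 4 words per half line writeback) and 4 bytes per word. This is an estimate only: it assumes complete bursts and inherits any misidentified STM bursts in the cast-out counters. The function name is hypothetical and the sample values are invented.

```python
WORD_BYTES = 4  # transfers counted here are 32-bit words


def dcache_traffic_bytes(line_fills: int, cast_out4: int, cast_out8: int) -> dict:
    """Estimate bytes moved by D-cache linefills and writebacks."""
    fill_bytes = line_fills * 8 * WORD_BYTES              # WRAP8 read bursts
    writeback_bytes = (cast_out4 * 4 + cast_out8 * 8) * WORD_BYTES
    return {
        "fill_bytes": fill_bytes,
        "writeback_bytes": writeback_bytes,
        "total_bytes": fill_bytes + writeback_bytes,
    }


if __name__ == "__main__":
    # Arguments stand for CtArmdLineFill, CtArmdCastOut4, CtArmdCastOut8.
    print(dcache_traffic_bytes(line_fills=1000, cast_out4=100, cast_out8=200))
```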

Counted Events on the MBX GXI Layer

The MBX connects to the MPMC through a dedicated interconnect. The connection performs single unit read and write transfers over logically disjoint data buses. Further details of the interconnect are contained in the MBX TRM.

Table 4.12 shows all the events that are recorded on the MBX GXI bus. These counters are enabled through the counter enable bit in the AHBMONCtrlReg register. All counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.

Table 4.12. GXI events

Counter Name              Description
CtGxiWr                   Number of completed write transfers.
CtGxiRd                   Number of completed read requests.
CtGxiWrAddrWait           Number of wait cycles suffered by write requests.
CtGxiRdAddrWait           Number of wait cycles suffered by read requests.
CtGxiRdDataWait           Number of wait cycles suffered by pending read transfers.
CtGxiRdAWaitThresholdHit  Number of occurrences that a read request wait-state exceeded a configurable threshold, not a count of how many messages suffer the threshold latency.
CtGxiRdDWaitThresholdHit  Number of occurrences that a read transfer wait-state exceeded a configurable threshold.
CtGxiWrAWaitThresholdHit  Number of occurrences that a write wait-state exceeded a configurable threshold.
CtGxiPageChange           Number of times the transfer crossed a page boundary. The page boundary size can be configured to 2k, 4k, 8k, or 16k.
GxiPageSize               Defines the page size that the CtGxiPageChange counter uses to monitor page boundary changes. 2'b00 = 2k, 2'b01 = 4k, 2'b10 = 8k, and 2'b11 = 16k.
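The GxiPageSize encoding and the idea of a page change can be sketched as follows. The sketch assumes that 2k means 2048 bytes and that a page change is counted whenever two consecutive transfers fall in different pages. The exact hardware criterion is not described in this section, and the function names are hypothetical.

```python
from typing import Iterable


def gxi_page_size(code: int) -> int:
    """Page size in bytes for a 2-bit GxiPageSize value (0..3)."""
    if code not in (0, 1, 2, 3):
        raise ValueError("GxiPageSize is a 2-bit field")
    return 2048 << code   # 2k, 4k, 8k, 16k


def count_page_changes(addresses: Iterable[int], code: int) -> int:
    """Count transitions between pages in a sequence of byte addresses."""
    shift = gxi_page_size(code).bit_length() - 1   # log2 of the page size
    changes = 0
    previous_page = None
    for address in addresses:
        page = address >> shift
        if previous_page is not None and page != previous_page:
            changes += 1
        previous_page = page
    return changes


if __name__ == "__main__":
    trace = [0x0000, 0x07FC, 0x0800, 0x0804, 0x1000]  # invented addresses
    for code in range(4):
        print(gxi_page_size(code), count_page_changes(trace, code))
```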

Miscellaneous Counted Events with the AHB Monitor

Table 4.13 presents three additional counters that enable further evaluation of the system based on the number of cycles elapsed under certain operating conditions. With the exception of CtTotalCycles, these counters are enabled through the counter enable bit in the AHBMONCtrlReg register (see AHBMONCtrlReg). The CtTotalCyclesNonDebug counter is also controlled by DBGACK when the track DBGACK control is set. When set, the track DBGACK control disables the counter during cycles in which DBGACK is asserted. The latter two counters can be reset by writing to the AHBMONRstCntrs register, and preset to their absolute address by writing to the AHBMONPrstCntrs register.

Table 4.13. Other events

Counter Name           Description
CtTotalCycles          Number of bus cycles since the last hardware reset
CtTotalCyclesEn        Number of bus cycles that the profile counters have been enabled since the last hardware reset or counter reset
CtTotalCyclesNonDebug  Number of bus cycles that the profile counters have been enabled since the last hardware reset or counter reset discounting debug cycles, based on DBGACK from the ARM processor
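The three cycle counters can be combined to estimate how much of a run was profiled and how much of the profiled time was spent in debug state. The sketch assumes that the track DBGACK control was set for the whole run and that the counters were not reset or preset after the hardware reset, so that all three values cover the same interval. The function name is hypothetical and the sample values are invented.

```python
def cycle_breakdown(total_cycles: int, enabled_cycles: int,
                    non_debug_cycles: int) -> dict:
    """Derive profiling coverage and debug overhead from the cycle counters."""
    if not 0 <= non_debug_cycles <= enabled_cycles <= total_cycles:
        raise ValueError("expected non_debug <= enabled <= total")
    debug_cycles = enabled_cycles - non_debug_cycles
    return {
        # Share of all bus cycles during which the counters were enabled.
        "enabled_fraction": enabled_cycles / total_cycles if total_cycles else 0.0,
        # Enabled cycles during which DBGACK was asserted.
        "debug_cycles": debug_cycles,
        "debug_fraction": debug_cycles / enabled_cycles if enabled_cycles else 0.0,
    }


if __name__ == "__main__":
    # Arguments stand for CtTotalCycles, CtTotalCyclesEn, CtTotalCyclesNonDebug.
    print(cycle_breakdown(1_000_000, 800_000, 600_000))
```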

Copyright © 2004, 2006 ARM Limited. All rights reserved. ARM DDI 0287B
Non-Confidential