ARM Technical Support Knowledge Articles

Does the ARM Profiler support profiling NEON/VFP code?

Applies to: RealView Development Suite (RVDS)


The profiler supports profiling VFP code (for example on an ARM1136JF-S) and NEON code (for example on a Cortex-A8).

However, the cycle timing information displayed always relates to the integer core. On cores where the NEON/VFP unit is decoupled from the integer pipeline, stalls that occur inside the NEON/VFP unit are not shown.

For example, on an ARM1136JF-S (see attachment arm11.jpg):

The VSTR instruction stalls because the value of s1 has not yet been calculated by the preceding VMUL instruction. The Profiler displays this delay in the interlock (I) column.
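The interlock described above can be illustrated with a small toy model. This is only a sketch of a generic in-order, coupled pipeline: the function name coupled_interlock_cycles and the latency values are hypothetical and are not ARM1136JF-S timings.

```python
def coupled_interlock_cycles(result_latency, independent_between=0):
    """Cycles a dependent instruction waits in a toy coupled pipeline.

    The producer (for example a VMUL) issues at cycle 0 and its result is
    ready at cycle `result_latency`. The consumer (for example a VSTR)
    could otherwise issue at cycle 1 + `independent_between`, where
    `independent_between` is the number of unrelated instructions placed
    between the two. All values are illustrative assumptions.
    """
    earliest_issue = 1 + independent_between
    return max(0, result_latency - earliest_issue)


# Hypothetical 5-cycle multiply followed immediately by a dependent store.
back_to_back = coupled_interlock_cycles(5)
# The same pair with two independent instructions scheduled in between.
with_filler = coupled_interlock_cycles(5, independent_between=2)
```

In this model the wait is charged to the integer pipeline itself, which corresponds to the kind of delay the Profiler reports in the interlock (I) column on a core with a coupled VFP unit.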

For example, on a Cortex-A9 (see attachment neonA9.jpg):

The VST1 instruction stalls in the NEON unit because the value of d0 has not yet been calculated by the preceding VMUL instruction. The Profiler does not show an interlock between these instructions because the NEON unit is decoupled from the integer core. A stall in the integer core would occur in this scenario only if the NEON instruction queue were full.
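The decoupled behaviour can be sketched in the same way. The following toy model assumes an integer core that issues one NEON instruction per cycle into a FIFO, and a NEON unit that takes a fixed number of cycles per instruction. The function name simulate_decoupled, the queue depths, and the cycle counts are hypothetical and do not describe the actual Cortex-A9 NEON queues.

```python
from collections import deque


def simulate_decoupled(num_neon_instrs, neon_cycles_each, queue_depth):
    """Count integer-core stall cycles in a toy decoupled NEON model.

    The integer core tries to push one NEON instruction into a FIFO each
    cycle. The NEON unit pulls an instruction from the FIFO whenever it is
    idle and is then busy for `neon_cycles_each` cycles. The integer core
    stalls only in cycles where the FIFO is full.
    """
    queue = deque()
    issued = 0
    core_stalls = 0
    busy = 0  # cycles left on the instruction currently in the NEON unit
    while issued < num_neon_instrs:
        # NEON side: make progress, then start the next queued instruction.
        if busy > 0:
            busy -= 1
        if busy == 0 and queue:
            queue.popleft()
            busy = neon_cycles_each
        # Integer side: issue if there is room, otherwise stall this cycle.
        if len(queue) < queue_depth:
            queue.append(issued)
            issued += 1
        else:
            core_stalls += 1
    return core_stalls


# A deep queue absorbs the slow NEON instructions: no integer-core stalls.
deep_queue_stalls = simulate_decoupled(4, neon_cycles_each=3, queue_depth=8)
# A one-entry queue fills immediately, so the integer core has to wait.
shallow_queue_stalls = simulate_decoupled(4, neon_cycles_each=3, queue_depth=1)
```

With the deep queue the model reports no integer-core stalls even though the NEON unit is still working through its backlog, which is the situation where the Profiler shows no interlock. With the one-entry queue the integer core stalls for 4 cycles, corresponding to the queue-full case.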

Other considerations

The Real-Time System Models (RTSMs) do not model the instruction queues in the NEON unit. To observe the effects of the instruction and data (I/D) queues becoming full, you must profile on hardware.

The Profiler incorrectly displays integer core stalls for VFP instructions on cores where the VFP unit is decoupled. This issue does not affect NEON instructions and will be addressed in a future release.

Attachments: arm11.jpg, neonA9.jpg

Article last edited on: 2008-11-13 11:44:41

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential