ARM Technical Support Knowledge Articles
This Knowledge Article is intended for engineers who have licensed the Cortex-M3 and Cortex-M4 processors for inclusion in their chip designs.
The Cortex-M3 version r2p1-00rel0 is delivered to licensees together with a simple example system testbench connecting the processor to some memory, using a simple combinatorial multiplexer for the Code memory.
The Cortex-M4 version r0p1-03rel0 is delivered to licensees with a formalized Integration Kit (IK) implementing a simple micro-controller (MCU) design and testbench. In the IK MCU, the processor is connected to memory through an AHB-Lite Bus Matrix component which arbitrates memory accesses from slave ports (driven by bus masters) to master ports (connected to slave devices).
Both the Cortex-M3 "example" system and the Cortex-M4 Integration Kit include a copy of a Dhrystone test, comprising the files "dhry.h", "dhry_1.c" and "dhry_2.c", which can be run in simulation in the testbench provided.
Dhrystone figures for Cortex-M3 and Cortex-M4, based on the Cortex-M3 "example" testbench, are reported in Knowledge Article 16376.
Dhrystone figures for Cortex-M4, based on the IK testbench, can be obtained by running the "dhry" test. The results are not printed by default, but can be obtained by following Knowledge Article 16826.
The results reported for the zero wait-state row in KA16376, and reported by the IK "dhry" test, include one or two overhead cycles resulting from entry to, and exit from, the test. This adds a fractional cycle to the reported result. By examining the tarmac.log files, it can be seen that the actual cycle counts for KA16376 are 460 for Cortex-M3 and 454 for Cortex-M4.
Using the same version of the ARM Compiler, ARM Compiler 5.04 update 1 [build 49] (from DS-5 version 5.18), the Cortex-M4 IK result is reported as 2307 cycles for 5 iterations, or 461.4 cycles per Dhrystone loop. Analysis of the tarmac.log file again shows that the actual figure is exactly 461 cycles through the Dhrystone loop, 7 cycles more than the 454 achieved in the "example" system testbench.
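The relationship between the reported figure and the tarmac figure can be checked with a few lines of arithmetic. The following is a minimal sketch only: the function name is hypothetical, and the overhead of 2 cycles is an assumption chosen from the "one or two overhead cycles" mentioned above because it reconciles 2307 total cycles with exactly 461 cycles per loop.

```python
def cycles_per_loop(total_cycles, iterations, overhead_cycles=0):
    """Average cycles per Dhrystone loop after subtracting fixed test overhead."""
    return (total_cycles - overhead_cycles) / iterations


# Figure as reported by the IK "dhry" test: overhead is still included.
reported = cycles_per_loop(2307, 5)

# Assumption: 2 cycles of entry/exit overhead, which leaves 2305 cycles
# for the 5 iterations themselves.
corrected = cycles_per_loop(2307, 5, overhead_cycles=2)

print(f"reported:  {reported:.1f} cycles per loop")
print(f"corrected: {corrected:.1f} cycles per loop")
```

Because the overhead is a fixed cost, its contribution to the per-loop figure shrinks as the iteration count grows, which is why it appears only as a fractional cycle here.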
Inspection of the source code for the "dhry" test shows that the IK version has been slightly modernized to use ANSI C function declarations, and corresponding prototypes in the header file, which help an ANSI C compiler produce a more efficient result.
As it happens, the modernized "dhry_2.c" file remains compatible with the "dhry.h" and "dhry_1.c" files in the "example" system. (The reverse is not true - the "example" system "dhry_2.c" cannot be run directly in the IK environment.) Using the modernized "dhry_2.c" and rerunning the zero wait-state tests in the "example" system using the same ARM Compiler version, the Dhrystone cycle counts are improved to 454 for Cortex-M3 and 448 for Cortex-M4.
Therefore, to a first-order approximation, the code modification in the IK version results in a 6-cycle improvement in Dhrystone. The memory system in the IK must therefore be adding 13 cycles compared with the memory system in the "example" system, in order to explain the observed result that the Cortex-M4 IK "dhry" test is 7 cycles slower than the "example" system "dhry" test on the Cortex-M4.
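The first-order decomposition above can be written out explicitly. This is an illustrative sketch using only the Cortex-M4 cycle counts quoted in this article; the variable names are hypothetical, and the calculation assumes the compilation effect and the memory-system effect simply add.

```python
# Cortex-M4 cycle counts per Dhrystone loop, as quoted in this article.
example_original = 454     # "example" system, original dhry_2.c
example_modernized = 448   # "example" system, IK dhry_2.c
ik_result = 461            # out-of-box IK (already uses the modernized source)

# Effect of the source modernization alone, measured on the same memory system.
code_gain = example_original - example_modernized

# Effect of the memory system alone, measured with the same source.
memory_penalty = ik_result - example_modernized

# Net difference observed between the two out-of-box configurations.
net_slowdown = memory_penalty - code_gain

print(f"code gain:      {code_gain} cycles")
print(f"memory penalty: {memory_penalty} cycles")
print(f"net slowdown:   {net_slowdown} cycles")
```

The net slowdown matches the 7-cycle difference between 461 and 454, so the two effects account for the whole observed gap under the additive assumption.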
Cycle count summary:
Cortex-M3 out-of-box "example" system - 460
Cortex-M3 "example" with IK "dhry_2.c" - 454
Cortex-M4 out-of-box IK - 461
Cortex-M4 in "example" system - 454
Cortex-M4 "example" with IK "dhry_2.c" - 448
These results illustrate that Dhrystone figures are not an objective test of processor performance, but are heavily influenced by compilation effects and effects of the memory system to which the processor is connected.