ARM Technical Support Knowledge Articles
Applies to: Cortex-M4
This Knowledge article refers to Cortex-M4 versions up to r0p1, the current version at the time of writing.
Chip designers who have licensed the Cortex-M4 processor for inclusion into their chip design also receive a simple example MCU design and testbench with some test programs, together known as the Cortex-M4 Integration Kit.
One of the test programs included is a copy of the standard Dhrystone benchmark test, named "dhry".
When this test is compiled and run without any modification, it results in a completion report such as:
Measured time too small to obtain meaningful results
Please increase number of runs
** TEST PASSED OK **
(Time: 2211180)
The same result is obtained irrespective of the number of Dhrystone loops completed.
The Dhrystone program has run correctly, but this test environment provides no real-time clock function, so the measured time is always zero and the benchmark reports that it is too small to be meaningful.
The Cortex-M4 processor always contains the architecturally defined system timer "SysTick", which can be used for cycle counting.
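As a minimal sketch of SysTick-based cycle counting, the C below shows how elapsed cycles can be derived from two readings of the 24-bit SysTick down-counter. It assumes the architecturally defined ARMv7-M register addresses (SYST_CSR at 0xE000E010, SYST_RVR at 0xE000E014, SYST_CVR at 0xE000E018). The helper names `systick_elapsed`, `systick_start` and `systick_read` are hypothetical and are not taken from the Integration Kit's dhry sources. The register-access part is compiled only for ARMv7E-M targets, and the arithmetic assumes at most one counter reload between the two readings.

```c
#include <stdint.h>

/* Hypothetical helpers, not the Integration Kit's code.
   SysTick register addresses as defined by the ARMv7-M architecture. */
#define SYST_CSR_ADDR 0xE000E010u /* Control and Status Register */
#define SYST_RVR_ADDR 0xE000E014u /* Reload Value Register */
#define SYST_CVR_ADDR 0xE000E018u /* Current Value Register */

/* SysTick is a 24-bit down-counter, so the largest reload value is 2^24 - 1. */
#define SYST_MAX_RELOAD 0x00FFFFFFu

/* Cycles elapsed between two SysTick readings. The counter runs from
   `reload` down to 0 and then reloads, so one full period is reload + 1
   cycles. Assumes at most one reload happened between the readings. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    if (start >= end)
        return start - end;               /* no wrap: counter only went down */
    return start + (reload + 1u) - end;   /* one wrap through 0 and reload */
}

#if defined(__ARM_ARCH_7EM__)
/* Target-only: start SysTick free-running on the processor clock. */
static void systick_start(void)
{
    volatile uint32_t *csr = (volatile uint32_t *)SYST_CSR_ADDR;
    volatile uint32_t *rvr = (volatile uint32_t *)SYST_RVR_ADDR;
    volatile uint32_t *cvr = (volatile uint32_t *)SYST_CVR_ADDR;

    *csr = 0u;              /* disable while configuring */
    *rvr = SYST_MAX_RELOAD; /* longest possible period */
    *cvr = 0u;              /* any write clears the current value */
    *csr = 5u;              /* ENABLE (bit 0) | CLKSOURCE = processor clock (bit 2) */
}

/* Target-only: read the current SysTick count. */
static uint32_t systick_read(void)
{
    return *(volatile uint32_t *)SYST_CVR_ADDR;
}
#endif
```

On the target, reading `systick_read()` before and after the code under test and passing both values to `systick_elapsed(start, end, SYST_MAX_RELOAD)` gives the cycle count, including the cost of the reads themselves.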
If the header file ./integration_kit/validation/tests/dhry.h is modified to include the line:
#define EXPECTED_SYST 1
at the start of the file, and the test program is recompiled, the test program automatically configures itself to measure the SysTick cycle count instead of calling the non-existent real-time function. The program then produces a completion report such as:
Number of cycles for 5 iteration is 2307
** TEST PASSED OK **
(Time: 2129420)
This result includes a 2-cycle overhead for branching into and out of the Dhrystone test code, so each Dhrystone loop takes (2307 - 2) / 5 = 461 cycles.
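The overhead correction is simple arithmetic. The hypothetical helper `cycles_per_loop` below (not part of the test code) subtracts the fixed branch overhead from the SysTick total and divides by the number of iterations.

```c
#include <stdint.h>

/* Hypothetical helper: cycles per Dhrystone loop from the total SysTick
   count reported by the test, after removing the fixed branch in/out
   overhead. Returns 0 when there is nothing meaningful to report. */
static uint32_t cycles_per_loop(uint32_t total_cycles, uint32_t overhead,
                                uint32_t iterations)
{
    if (iterations == 0u || total_cycles < overhead)
        return 0u;
    return (total_cycles - overhead) / iterations; /* integer division truncates */
}
```

With the report shown, `cycles_per_loop(2307, 2, 5)` returns 461, since 2305 divides evenly by 5.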
Alternatively, the exact cycle count can be obtained by examining the ./<simulator>/tarmac.log file generated during the run. The loop cycle count is found by identifying an instruction which is executed once per loop, and measuring the time difference between consecutive executions of this instruction. A good candidate instruction is "MUL":
> grep MUL VCS/tarmac.log
343160 ns IT (000003d4:000024db) 000003d4 4360 T16 MULS r0,r4,r0
352380 ns IT (000003d4:00002614) 000003d4 4360 T16 MULS r0,r4,r0
361600 ns IT (000003d4:0000274d) 000003d4 4360 T16 MULS r0,r4,r0
370820 ns IT (000003d4:00002886) 000003d4 4360 T16 MULS r0,r4,r0
380040 ns IT (000003d4:000029bf) 000003d4 4360 T16 MULS r0,r4,r0
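The same measurement can be sketched programmatically. The hypothetical function `marker_deltas_ns` scans trace lines for a marker string such as "MUL" and records the time difference between consecutive matching lines. It assumes each matching line begins with a `<time> ns` timestamp, as in the grep output of this example; other simulators or trace settings may lay out tarmac lines differently.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. For every line containing `marker`, read the leading
   "<time> ns" timestamp and store the difference from the previous matching
   line in deltas_ns. Lines that contain the marker but do not start with a
   number are skipped. Returns the number of deltas written. */
static size_t marker_deltas_ns(const char *const *lines, size_t n_lines,
                               const char *marker,
                               unsigned long *deltas_ns, size_t max_deltas)
{
    size_t count = 0;
    int have_prev = 0;
    unsigned long prev = 0;

    for (size_t i = 0; i < n_lines; i++) {
        unsigned long t;
        if (strstr(lines[i], marker) == NULL)
            continue;                       /* not the instruction of interest */
        if (sscanf(lines[i], "%lu ns", &t) != 1)
            continue;                       /* no leading timestamp */
        if (have_prev && count < max_deltas)
            deltas_ns[count++] = t - prev;  /* time between consecutive executions */
        prev = t;
        have_prev = 1;
    }
    return count;
}
```

Fed the five MULS lines from this run, it yields four deltas of 9220 ns each.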
The time delta between consecutive executions is 9220 ns. With the 50 MHz clock used in the example (a 20 ns clock period), this corresponds to 9220 / 20 = 461 cycles per Dhrystone loop, matching the SysTick result.
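The conversion from a time delta to clock cycles is cycles = delta_ns * f_clk / 10^9. A minimal C sketch, with a hypothetical helper name, uses 64-bit integer arithmetic so the intermediate product does not overflow for realistic deltas and clock rates:

```c
#include <stdint.h>

/* Hypothetical helper: convert a time delta in nanoseconds to clock cycles
   for a clock of clk_hz Hz, i.e. delta_ns * clk_hz / 1e9. Integer division
   truncates, so a partial clock period is rounded down. */
static uint64_t ns_to_cycles(uint64_t delta_ns, uint64_t clk_hz)
{
    return (delta_ns * clk_hz) / 1000000000u;
}
```

For this example, `ns_to_cycles(9220, 50000000)` gives 461.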
Note that Dhrystone results are highly dependent on compiler version and memory system. A comparison of Dhrystone MIPS for different memory system wait-states can be found here: