ARM Technical Support Knowledge Articles

How does the DMIPS/MHz performance vary with wait-states on the code memory?

Applies to: Cortex-M3, Cortex-M4


Dhrystone MIPS (Million Instructions per Second), or DMIPS, is a measure of computer performance relative to the performance of the DEC VAX-11/780 minicomputer of the 1970s. The Dhrystone benchmark runs a fixed loop of synthetic code intended to mimic the mix of operations found in real programs of that time. DMIPS/MHz remains a widely used performance metric because of its simplicity, despite a number of shortcomings. Note that different compiler versions can produce different results on the same hardware. DMIPS results are meaningful only when compiler optimization options are tightly controlled, because modern compilers can easily eliminate large portions of the test code if allowed to do so.

The Cortex-M3 RTL is delivered to licensees together with an "example" system testbench for simulation of a simple Cortex-M3 system, and a number of test programs including a Dhrystone test called "dhry". This program reports the average number of processor clock cycles per loop of the Dhrystone test. DMIPS/MHz is calculated using the following formula:

   DMIPS/MHz = 10^6 / (1757 * Number of processor clock cycles per Dhrystone loop)
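The formula follows from the definition of DMIPS: the VAX-11/780 runs 1757 Dhrystone loops per second and is defined as a 1 DMIPS machine. Writing f for the processor clock frequency in Hz and N for the measured cycles per loop (symbols introduced here only for this derivation):

```latex
\begin{aligned}
\text{Dhrystone loops per second} &= \frac{f}{N} \\
\text{DMIPS} &= \frac{f / N}{1757} = \frac{f}{1757\,N} \\
\text{DMIPS/MHz} &= \frac{\text{DMIPS}}{f / 10^{6}} = \frac{10^{6}}{1757\,N}
\end{aligned}
```

The clock frequency cancels, so the cycle count reported by "dhry" is all that is needed. For example, N = 460.2 gives 10^6 / (1757 × 460.2) ≈ 1.24 DMIPS/MHz.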

The Cortex-M3 example system includes a configuration file which makes it easy to add wait-states to memory accesses which read from the Code memory space.

By substituting a Cortex-M4 processor for the Cortex-M3 in this example system, similar results can be generated for the Cortex-M4.


The instruction fetch logic in these processors partially hides wait-states when reading from the code image. As a result, for small numbers of wait-states the loss of performance is not proportional to the number of wait-states.

At the time of writing, the current version of the ARM Compiler is:

   Product: DS-5 Professional 5.18
   Component: ARM Compiler 5.04 update 1 (build 49)
   Tool: armcc [5040049]

Using this compiler version and the default version of the Cortex-M3 "example" system with Cortex-M3 r2p1 and Cortex-M4 r0p1 processors, and adding wait-states by adjusting ARM_ICODE_WAITS_RD and ARM_DCODE_WAITS_RD, the following cycle counts and DMIPS/MHz results can be observed:

                    Cortex-M3                    Cortex-M4
   Wait-states      Cycles/loop    DMIPS/MHz     Cycles/loop    DMIPS/MHz

        0              460.2          1.24          454.2          1.25
        1              601.2          0.95          600.2          0.95
        2              826.5          0.69          826.5          0.69
        3             1073.8          0.53         1073.8          0.53
        4             1334.0          0.43         1334.0          0.43
        5             1599.2          0.36         1599.2          0.36
        6             1865.5          0.31         1865.5          0.31
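As a cross-check, the short Python sketch below recomputes the DMIPS/MHz columns from the cycle counts in the table using the formula given earlier, and prints the extra cycles per loop that each additional wait-state costs. The list names and helper functions are illustrative only and are not part of the example system. From these figures, the first wait-state costs about 141 cycles per loop on the Cortex-M3 (146 on the Cortex-M4), while each further wait-state costs between about 225 and 266 cycles, with the increments settling at around 260 to 266 cycles from three wait-states onward. This is the non-linear behavior for small numbers of wait-states described above.

```python
# Cycles per Dhrystone loop from the table above, indexed by number of wait-states (0-6).
CM3_CYCLES = [460.2, 601.2, 826.5, 1073.8, 1334.0, 1599.2, 1865.5]
CM4_CYCLES = [454.2, 600.2, 826.5, 1073.8, 1334.0, 1599.2, 1865.5]

# Dhrystone loops per second of the VAX-11/780 reference (1 DMIPS) machine.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips_per_mhz(cycles_per_loop: float) -> float:
    """DMIPS/MHz = 10^6 / (1757 * cycles per Dhrystone loop)."""
    return 1e6 / (VAX_DHRYSTONES_PER_SECOND * cycles_per_loop)


def marginal_cycles(cycles: list[float]) -> list[float]:
    """Extra cycles per loop added by each successive wait-state."""
    return [round(nxt - prev, 1) for prev, nxt in zip(cycles, cycles[1:])]


if __name__ == "__main__":
    for name, cycles in (("Cortex-M3", CM3_CYCLES), ("Cortex-M4", CM4_CYCLES)):
        # Recompute the DMIPS/MHz column to two decimal places, as in the table.
        ratings = [f"{dmips_per_mhz(c):.2f}" for c in cycles]
        print(f"{name} DMIPS/MHz:", ", ".join(ratings))
        # Difference between consecutive rows: cost of one more wait-state.
        print(f"{name} extra cycles per added wait-state:", marginal_cycles(cycles))
```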

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential