5.6 Optimization of loop termination in C code

Loops are a common construct in most programs. Because a significant amount of execution time is often spent in loops, it is worthwhile paying attention to time-critical loops.

The loop termination condition can cause significant overhead if written without caution. Where possible:

  • Use simple termination conditions.

  • Write count-down-to-zero loops.

  • Use counters of type unsigned int.

  • Test for equality against zero.

Following any or all of these guidelines, separately or in combination, is likely to result in better code.

The following table shows two sample implementations of a routine to calculate n! that together illustrate loop termination overhead. The first implementation calculates n! using an incrementing loop, while the second routine calculates n! using a decrementing loop.

Table 5-1 C code for incrementing and decrementing loops

Incrementing loop Decrementing loop
int fact1(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return (fact);
}
int fact2(int n)
{
    unsigned int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return (fact);
}

The following table shows the corresponding disassembly of the machine code produced by the compiler for each of the sample implementations above, where the C code for both implementations has been compiled using the options -O2 -Otime.

Table 5-2 C Disassembly for incrementing and decrementing loops

Incrementing loop Decrementing loop
fact1 PROC
    MOV      r2, r0
    MOV      r0, #1
    CMP      r2, #1
    MOV      r1, r0
    BXLT     lr
|L1.20|
    MUL      r0, r1, r0
    ADD      r1, r1, #1
    CMP      r1, r2
    BLE      |L1.20|
    BX       lr
    ENDP
fact2 PROC
    MOVS     r1, r0
    MOV      r0, #1
    BXEQ     lr
|L1.12|
    MUL      r0, r1, r0
    SUBS     r1, r1, #1
    BNE      |L1.12|
    BX       lr
    ENDP

Comparing the disassemblies shows that the ADD and CMP instruction pair in the incrementing loop disassembly has been replaced with a single SUBS instruction in the decrementing loop disassembly. This is because a compare with zero can be used instead.

In addition to saving an instruction in the loop, the variable n does not have to be saved across the loop, so the use of a register is also saved in the decrementing loop disassembly. This eases register allocation. It is even more important if the original termination condition involves a function call. For example:

for (...; i < get_limit(); ...);

The technique of initializing the loop counter to the number of iterations required, and then decrementing down to zero, also applies to while and do statements.

Non-ConfidentialPDF file icon PDF versionARM DUI0472M
Copyright © 2010-2016 ARM Limited or its affiliates. All rights reserved.