7.1 Optimization of loop termination in C code

Loops are a common construct in most programs. Because a significant amount of execution time is often spent in loops, it is worthwhile paying attention to time-critical loops.

The loop termination condition can cause significant overhead if written without caution. Where possible:

  • Use simple termination conditions.

  • Write count-down-to-zero loops.

  • Use counters of type unsigned int.

  • Test for equality against zero.

Following any or all of these guidelines, separately or in combination, is likely to result in better code.

The following table shows two sample implementations of a routine to calculate n! that together illustrate loop termination overhead. The first implementation calculates n! using an incrementing loop, while the second routine calculates n! using a decrementing loop.

Table 7-1 C code for incrementing and decrementing loops

Incrementing loop Decrementing loop
int fact1(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return (fact);
}
int fact2(int n)
{
    unsigned int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return (fact);
}

The following table shows the corresponding disassembly of the machine code produced by armclang -Os -S --target=arm-arm-none-eabi -march=armv8-a for each of the sample implementations above.

Table 7-2 C disassembly for incrementing and decrementing loops

Incrementing loop Decrementing loop
fact1:                                  
        mov     r1, r0
        mov     r0, #1
        cmp     r1, #1
        bxlt    lr
        mov     r2, #0
.LBB0_1:                                
        add     r2, r2, #1
        mul     r0, r0, r2
        cmp     r1, r2
        bne     .LBB0_1
        bx      lr
fact2:                                  
        mov     r1, r0
        mov     r0, #1
        cmp     r1, #0
        bxeq    lr
.LBB1_1:                                
        mul     r0, r0, r1
        subs    r1, r1, #1
        bne     .LBB1_1
        bx      lr

Comparing the disassemblies shows that the ADD and CMP instruction pair in the incrementing loop disassembly has been replaced with a single SUBS instruction in the decrementing loop disassembly. Because the SUBS instruction updates the status flags, including the Z flag, there is no requirement for an explicit CMP r1,r2 instruction.

In addition to saving an instruction in the loop, the variable n does not have to be available for the lifetime of the loop, reducing the number of registers that have to be maintained. This eases register allocation. It is even more important if the original termination condition involves a function call. For example:

for (...; i < get_limit(); ...);

The technique of initializing the loop counter to the number of iterations required, and then decrementing down to zero, also applies to while and do statements.

Non-ConfidentialPDF file icon PDF versionDUI0773J
Copyright © 2014–2017, 2019 Arm Limited or its affiliates. All rights reserved.