Illustration of the benefits of using conditional instructions

This illustrates the difference between using branches and using conditional instructions. It uses the Euclid algorithm for the Greatest Common Divisor (gcd) to demonstrate how conditional instructions improve code size and speed.

In C the gcd algorithm can be expressed as:

int gcd(int a, int b)
{
    while (a != b)
      {
        if (a > b)
            a = a - b;
        else
            b = b - a;
      }
    return a;
}

The following examples show implementations of the gcd algorithm with and without conditional instructions.

Note

The detailed analysis of execution speed only applies to an ARM7™ processor. The code density calculations apply to all ARM processors.

Show/hideExample of conditional execution using branches in ARM code

This is an ARM code implementation of the gcd algorithm using branches, without using any other conditional instructions. Conditional execution is achieved by using conditional branches, rather than individual conditional instructions:

gcd     CMP      r0, r1
        BEQ      end
        BLT      less
        SUBS     r0, r0, r1  ; could be SUB r0, r0, r1 for ARM
        B        gcd
less
        SUBS     r1, r1, r0  ; could be SUB r1, r1, r0 for ARM
        B        gcd
end

The code is seven instructions long because of the number of branches. Every time a branch is taken, the processor must refill the pipeline and continue from the new location. The other instructions and non-executed branches use a single cycle each.

The following table shows the number of cycles this implementation uses on an ARM7 processor when R0 equals 1 and R1 equals 2.

Table 19. Conditional branches only

R0: aR1: bInstructionCycles (ARM7)
12CMP r0, r11
12BEQ end1 (not executed)
12BLT less3
12SUB r1, r1, r01
12B gcd3
11CMP r0, r11
11BEQ end3
   Total = 13

Show/hideExample of conditional execution using conditional instructions in ARM code

This is an ARM code implementation of the gcd algorithm using individual conditional instructions in ARM code. The gcd algorithm only takes four instructions:

gcd
        CMP      r0, r1
        SUBGT    r0, r0, r1
        SUBLE    r1, r1, r0
        BNE      gcd

In addition to improving code size, in most cases this code executes faster than the version that uses only branches.

The following table shows the number of cycles this implementation uses on an ARM7 processor when R0 equals 1 and R1 equals 2.

Table 20. All instructions conditional

R0: aR1: bInstructionCycles (ARM7)
12CMP r0, r11
12SUBGT r0,r0,r11 (not executed)
11SUBLT r1,r1,r01
11BNE gcd3
11CMP r0,r11
11SUBGT r0,r0,r11 (not executed)
11SUBLT r1,r1,r01 (not executed)
11BNE gcd1 (not executed)
   Total = 10

Comparing this with the example that uses only branches:

  • Replacing branches with conditional execution of all instructions saves three cycles.

  • Where R0 equals R1, both implementations execute in the same number of cycles. For all other cases, the implementation that uses conditional instructions executes in fewer cycles than the implementation that uses branches only.

Show/hideExample of conditional execution using conditional instructions in Thumb code

In architectures ARMv6T2 and later, you can use the IT instruction to write conditional instructions in Thumb code. The Thumb code implementation of the gcd algorithm using conditional instructions is very similar to the implementation in ARM code. The implementation in Thumb code is:

gcd
        CMP     r0, r1
        ITE     GT 
        SUBGT   r0, r0, r1
        SUBLE   r1, r1, r0
        BNE     gcd

This assembles equally well to ARM or Thumb code. The assembler checks the IT instructions, but omits them on assembly to ARM code.

It requires one more instruction in Thumb code (the IT instruction) than in ARM code, but the overall code size is 10 bytes in Thumb code compared with 16 bytes in ARM code.

Show/hideExample of conditional execution code using branches in Thumb code

In architectures before ARMv6T2, there is no IT instruction and hence Thumb instructions cannot be executed conditionally except for the B branch instruction. The gcd algorithm must be written with conditional branches and is very similar to the ARM code implementation using branches, without conditional instructions.

The Thumb code implementation of the gcd algorithm without conditional instructions requires seven instructions. The overall code size is 14 bytes. This is even less than the ARM implementation that uses conditional instructions, which uses 16 bytes.

In addition, on a system using 16-bit memory this Thumb implementation runs faster than both ARM implementations because only one memory access is required for each 16-bit Thumb instruction, whereas each 32-bit ARM instruction requires two fetches.

Show/hideSee also

Copyright © 2010-2011 ARM. All rights reserved.ARM DUI 0473C
Non-ConfidentialID080411