6.2.2. An example of APCS register usage: 64-bit integer addition

This example illustrates how to use ARM assembly language to code a small function so that it can be used by C modules.

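The function performs a 64-bit integer addition. It uses a two-word data structure to store each 64-bit operand. We will consider the following stages:

  • Writing the function in C

  • Examining the compiler output

  • Modifying the compiler output

  • Looking at the effects of the APCS

  • Revisiting the first implementation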

Writing the function in C

In ARM assembly language, you can code the addition of double-length integers by using the Carry flag from the low word addition in the high word addition. However, in C there is no way of specifying the Carry flag. Example 6.1 shows a workaround.

Example 6.1. 

void add_64(int64 *dest, int64 *src1, int64 *src2)
{ unsigned hibit1=src1->lo >> 31, hibit2=src2->lo >> 31, hibit3;
	dest->lo=src1->lo + src2->lo;
	hibit3=dest->lo >> 31;
	dest->hi=src1->hi + src2->hi +
		((hibit1 & hibit2) || (hibit1!= hibit3));
	return;
}

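The highest bits of the low words in the two operands are extracted (by shifting them into bit 0, which clears the rest of the register). These bits, together with the highest bit of the low-word sum, are then used to derive the carry into the high word. The expression in Example 6.1 does not match the ARM Carry flag for every input: for example, 0x40000000 + 0x40000000 is reported as a carry, and 0x00000001 + 0xFFFFFFFF is not. An exact test is (hibit1 & hibit2) || ((hibit1 != hibit2) && !hibit3).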
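The carry derivation can be tried out in portable C. The following is a minimal sketch, not the code from add64_1.c. It assumes that the operand structure holds two unsigned 32-bit words with the low word first, and the names int64_sketch and add_64_exact are hypothetical. The carry is taken as 1 when both top bits are set, or when exactly one is set and the top bit of the sum is clear.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical name): low word first, then high word. */
typedef struct { uint32_t lo; uint32_t hi; } int64_sketch;

/* 64-bit addition with the carry derived only from the top bits of the
   low-word operands and of the low-word sum. */
void add_64_exact(int64_sketch *dest, const int64_sketch *src1,
                  const int64_sketch *src2)
{
    uint32_t hibit1, hibit2, hibit3, lo, carry;

    assert(dest && src1 && src2);

    hibit1 = src1->lo >> 31;
    hibit2 = src2->lo >> 31;
    lo     = src1->lo + src2->lo;       /* wraps modulo 2^32 */
    hibit3 = lo >> 31;

    /* Carry out of bit 31: both top bits set, or exactly one set and the
       top bit of the sum clear (so a carry must have come in from bit 30). */
    carry = (hibit1 & hibit2) | ((hibit1 ^ hibit2) & (hibit3 ^ 1u));

    /* Read both high words before writing, so dest may alias an operand. */
    dest->hi = src1->hi + src2->hi + carry;
    dest->lo = lo;
}
```

With this form, 0x40000000 + 0x40000000 in the low words leaves the high word unchanged, while 0x00000001 + 0xFFFFFFFF adds 1 to it.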

Examining the compiler output

If the addition routine were to be used a great deal, an implementation such as this would probably be inadequate. To consider the quality of the implementation, examine the code produced by the compiler. Follow these steps to produce an assembly language listing:

  1. Copy file examples/candasm/add64_1.c to your current working directory. This file contains the C code in Example 6.1.

  2. Compile it to ARM assembly language source as follows:

    armcc -li -S add64_1.c

    The -S flag tells the compiler to produce ARM assembly language source (suitable for armasm) instead of object code.

Example 6.2 shows the assembly language output in file add64_1.s. It reveals that this is an inefficient implementation (instructions may vary between compiler releases).

Example 6.2. 

add_64
	STMDB		sp!,{v1,lr}
	LDR		v1,[a2,#0]
	MOV		a4,v1,LSR #31
	LDR		ip,[a3,#0]
	MOV		lr,ip,LSR #31
	ADD		ip,v1,ip
	STR		ip,[a1,#0]
	MOV		ip,ip,LSR #31
	LDR		a2,[a2,#4]
	LDR		a3,[a3,#4]
	ADD		a2,a2,a3
	TST		a4,lr
	CMPEQ		a4,ip
	MOVNE		a3,#1
	MOVEQ		a3,#0
	ADD		a2,a2,a3
	STR		a2,[a1,#4]!
	LDMIA		sp!,{v1,pc}

Modifying the compiler output

Because you cannot specify the Carry flag in C, you must get the compiler to produce almost the right code, and then modify it by hand. Start with (incorrect) code that does not perform the carry addition, as in Example 6.3.

Example 6.3. 

void add_64(int64 *dest, int64 *src1, int64 *src2)
{ dest->lo=src1->lo + src2->lo;
  dest->hi=src1->hi + src2->hi;
  return;
}

Copy file examples/candasm/add64_2.c (which contains the code in Example 6.3) to your current working directory.

Compile it to ARM assembly language source as follows:

armcc -li -S add64_2.c

You can find the assembly language produced by the compiler in the file add64_2.s.

Example 6.4. 

add_64
	LDR		a4,[a2,#0]
	LDR		ip,[a3,#0]
	ADD		a4,a4,ip
	STR		a4,[a1,#0]
	LDR		a2,[a2,#4]
	LDR		a3,[a3,#4]
	ADD		a2,a2,a3
	STR		a2,[a1,#4]
	MOV		pc,lr

Comparing this to the C source, you can see that the first ADD instruction produces the low-order word and the second produces the high-order word, but no carry passes between them. To propagate the carry from the low word to the high word, change:

  • the first ADD to ADDS (add and set flags)

  • the second ADD to ADC (add with carry)

You can find this modified code in the directory examples/candasm as add64_3.s.
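To see why these two changes are sufficient, the effect of ADDS followed by ADC can be modelled in C by treating the Carry flag as an explicit variable. This is an illustrative sketch only, not part of the example files: the names words_model, adds32, adc32 and add_64_model are hypothetical, and the model relies on the fact that the STR and LDR instructions between the two additions leave the flags unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical name): low word first, then high word. */
typedef struct { uint32_t lo; uint32_t hi; } words_model;

/* Model of ADDS: returns a + b and records the carry out in *carry. */
static uint32_t adds32(uint32_t a, uint32_t b, unsigned *carry)
{
    uint32_t sum = a + b;       /* wraps modulo 2^32 */
    *carry = (sum < a);         /* wrap-around means a carry out of bit 31 */
    return sum;
}

/* Model of ADC: returns a + b + carry and leaves the flag alone. */
static uint32_t adc32(uint32_t a, uint32_t b, unsigned carry)
{
    return a + b + carry;
}

/* Same order of operations as the modified listing: low words first,
   then high words plus the carry from the low-word addition. */
void add_64_model(words_model *dest, const words_model *src1,
                  const words_model *src2)
{
    unsigned carry;
    uint32_t lo, hi;

    assert(dest && src1 && src2);

    lo = adds32(src1->lo, src2->lo, &carry);   /* ADDS */
    hi = adc32(src1->hi, src2->hi, carry);     /* ADC  */
    dest->lo = lo;
    dest->hi = hi;
}
```

In the assembly version the carry variable costs nothing, because the flag is set as a side effect of the first addition and consumed by the second.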

Looking at the effects of the APCS

The most obvious effect of the APCS on the example code is the change in register names:

  • a1 holds a pointer to the destination structure.

  • a2 and a3 hold pointers to the operand structures.

  • a4 and ip are used as temporary registers that are not preserved. The conditions under which ip can be corrupted are discussed in A more detailed look at APCS register usage.

This is a simple leaf function that uses few temporary registers, so none are saved to the stack and restored on exit. Therefore you can use a simple MOV pc,lr to return.

If you want to return another result, such as the carry out from the addition, you must load it into a1 before the function exits. You can do this as follows:

  1. Change the second ADD (now ADC) to ADCS (add with carry and set flags).

  2. Add the following instructions after the final STR (which still needs the destination pointer in a1) and before the return. They load a1 with 1 or 0, depending on the carry out from the high-order addition:

	MOV	a1, #0
	ADC	a1, a1, #0

  3. Change the return type in the declaration of add_64() from void to int.
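From the C side, the result of these changes might look like the following sketch. It uses a portable C stand-in for the hand-modified assembly routine so that the calling pattern can be compiled without armasm. The names words_carry and add_64_carry are hypothetical, and the two-word layout with the low word first is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical name): low word first, then high word. */
typedef struct { uint32_t lo; uint32_t hi; } words_carry;

/* Stand-in for the assembly routine. Returns 1 if the high-word
   addition produced a carry out, otherwise 0. */
int add_64_carry(words_carry *dest, const words_carry *src1,
                 const words_carry *src2)
{
    uint64_t lo, hi;

    assert(dest && src1 && src2);

    /* Widen to 64 bits so that bit 32 of each sum holds the carry out. */
    lo = (uint64_t)src1->lo + src2->lo;                 /* ADDS */
    hi = (uint64_t)src1->hi + src2->hi + (lo >> 32);    /* ADCS */

    dest->lo = (uint32_t)lo;
    dest->hi = (uint32_t)hi;

    return (int)(hi >> 32);     /* MOV a1,#0 then ADC a1,a1,#0 */
}
```

A caller can then test the return value to detect overflow of the 64-bit sum, for example `if (add_64_carry(&sum, &x, &y)) { /* handle overflow */ }`.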

Revisiting the first implementation

Although the first C implementation is inefficient, it shows more about the APCS than the hand-modified version.

You have already seen a4 and ip being used as non-preserved temporary registers. However, here v1 and lr are also used as temporary registers. v1 is preserved by being stored (together with lr) on entry. Register lr is corrupted, but a copy is saved onto the stack and reloaded into pc when v1 is restored. This means that there is still only a single exit instruction, but now it is:

	LDMIA		sp!,{v1,pc}

Copyright © 1997, 1998 ARM Limited. All rights reserved. ARM DUI 0040D
Non-Confidential