An example of APCS register usage: 64-bit integer addition
This example illustrates how to use ARM assembly language to code a small function so that it can be used by C modules.
The function performs a 64-bit integer addition. It uses a two-word data structure to store each 64-bit operand. We will consider the following stages:
writing the function in C
examining the compiler output
modifying the compiler output
looking at the effects of the APCS
revisiting the first implementation.
In ARM assembly language, you can code the addition of double-length integers by using the Carry flag from the low word addition in the high word addition. However, in C there is no way of specifying the Carry flag. Example 6.1 shows a workaround.
Example 6.1.
    void add_64(int64 *dest, int64 *src1, int64 *src2)
    {
        unsigned hibit1=src1->lo >> 31, hibit2=src2->lo >> 31, hibit3;

        dest->lo=src1->lo + src2->lo;
        hibit3=dest->lo >> 31;
        /* carry: both top bits set, or either set and the sum's top bit clear */
        dest->hi=src1->hi + src2->hi +
                 ((hibit1 & hibit2) || ((hibit1 | hibit2) && !hibit3));
        return;
    }
The highest bit of the low word of each operand is extracted by shifting it into bit 0, which clears the rest of the register. These bits, together with the highest bit of the low-word sum, determine the value of the carry bit in the same way as the ARM itself does: a carry is produced when both operand bits are set, or when either is set and the corresponding bit of the sum is clear.
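The carry rule can be checked in isolation. The following is a minimal sketch, not the toolkit's code: the helper names carry_from_top_bits and carry_reference are hypothetical, and it assumes a C99 compiler that provides <stdint.h> for the widened reference calculation. It derives the carry out of a 32-bit addition from bit 31 of the operands and of the sum, and offers a 64-bit reference to compare against.

```c
#include <stdint.h>

/* Hypothetical helper: carry out of a + b, derived only from bit 31 of
   a, b and the 32-bit sum. A carry is produced when both top bits are
   set, or when either is set and the top bit of the sum is clear. */
unsigned carry_from_top_bits(uint32_t a, uint32_t b)
{
    unsigned hibit1 = a >> 31;
    unsigned hibit2 = b >> 31;
    unsigned hibit3 = (uint32_t)(a + b) >> 31;

    return (hibit1 & hibit2) | ((hibit1 | hibit2) & (hibit3 ^ 1u));
}

/* Hypothetical reference: widen to 64 bits and read bit 32 of the sum. */
unsigned carry_reference(uint32_t a, uint32_t b)
{
    return (unsigned)(((uint64_t)a + (uint64_t)b) >> 32);
}
```

Operands such as 0x40000000 + 0x40000000 (the sum sets bit 31 without overflowing) and 0x00000001 + 0xFFFFFFFF (only one operand has bit 31 set, yet the sum wraps) are the cases most worth comparing between the two helpers.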
If the addition routine were to be used a great deal, an implementation such as this would probably be inadequate. To consider the quality of the implementation, examine the code produced by the compiler. Follow these steps to produce an assembly language listing:
Copy file examples/candasm/add64_1.c to your current working directory. This file contains the C code in Example 6.1.
Compile it to ARM assembly language source as follows:
    armcc -li -S add64_1.c
The -S flag tells the compiler to produce ARM assembly language source (suitable for armasm) instead of object code.
Example 6.2 shows the assembly language output in file add64_1.s. It reveals that this is an inefficient implementation (instructions may vary between compiler releases).
Because you cannot specify the Carry flag in C, you must get the compiler to produce almost the right code, and then modify it by hand. Start with (incorrect) code that does not perform the carry addition, as in Example 6.3.
Example 6.3.
    void add_64(int64 *dest, int64 *src1, int64 *src2)
    {
        dest->lo=src1->lo + src2->lo;
        dest->hi=src1->hi + src2->hi;
        return;
    }
Copy file examples/candasm/add64_2.c (which contains the code in Example 6.3) to your current working directory.
Compile it to ARM assembly language source as follows:
    armcc -li -S add64_2.c
You can find the assembly language produced by the compiler in the file add64_2.s.
Example 6.4.
add_64
        LDR     a4,[a2,#0]
        LDR     ip,[a3,#0]
        ADD     a4,a4,ip
        STR     a4,[a1,#0]
        LDR     a2,[a2,#4]
        LDR     a3,[a3,#4]
        ADD     a2,a2,a3
        STR     a2,[a1,#4]
        MOV     pc,lr
Comparing this to the C source, you can see that the first ADD instruction produces the low order word, and the second produces the high order word. As it stands, any carry out of the low order addition is lost. To correct this, propagate the carry from the low word to the high word by changing:
the first ADD to ADDS (add and set flags)
the second ADD to ADC (add with carry).
You can find this modified code in the directory examples/candasm as add64_3.s.
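The effect of the ADDS/ADC pair can be modelled in portable C. This is only a sketch under stated assumptions: the int64 layout (two unsigned 32-bit words, low word first) is inferred from the offsets in Example 6.4, the name add_64_model is hypothetical, and the carry is recovered with an unsigned comparison because C cannot read the processor flag.

```c
#include <stdint.h>

/* Assumed layout: low word at offset 0, high word at offset 4. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} int64;

/* Hypothetical C model of the hand-modified routine. */
void add_64_model(int64 *dest, const int64 *src1, const int64 *src2)
{
    /* ADDS: low-word sum, wrapping modulo 2^32 */
    uint32_t lo = src1->lo + src2->lo;
    /* Carry flag: the sum wrapped if it is smaller than an operand */
    uint32_t carry = (lo < src1->lo);
    /* ADC: high words plus the carry from the low-word addition */
    uint32_t hi = src1->hi + src2->hi + carry;

    dest->lo = lo;
    dest->hi = hi;
}
```

Both results are computed into locals before being stored, so the model still behaves correctly if dest points at one of the operands.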
The most obvious effect of the APCS on the example code is the change in register names:
a1 holds a pointer to the destination structure.
a2 and a3 hold pointers to the operand structures.
a4 and ip are used as temporary registers that are not preserved. The conditions under which ip can be corrupted are discussed in A more detailed look at APCS register usage.
This is a simple leaf function that uses few temporary registers, so none are saved to the stack and restored on exit. Therefore you can use a simple MOV pc,lr to return.
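The register assignments above correspond directly to the C view of the function. The following sketch shows the declarations a calling C module might use; the exact typedef is an assumption (the text only says the structure is two words, and the LDR/STR offsets suggest the low word comes first), and the enum names are hypothetical. It also assumes a 4-byte unsigned int, as on ARM.

```c
#include <stddef.h>

/* Assumed declaration of the two-word structure. */
typedef struct {
    unsigned int lo;   /* accessed in the assembly code as [rN,#0] */
    unsigned int hi;   /* accessed in the assembly code as [rN,#4] */
} int64;

/* Prototype a C module would use to call the assembly routine.
   Under the APCS the first three arguments arrive in a1, a2 and a3. */
extern void add_64(int64 *dest,    /* a1 */
                   int64 *src1,    /* a2 */
                   int64 *src2);   /* a3 */

/* Hypothetical constants exposing the offsets the assembly relies on. */
enum {
    INT64_LO_OFFSET = offsetof(int64, lo),
    INT64_HI_OFFSET = offsetof(int64, hi)
};
```

If the structure were ever reordered or padded, these constants would no longer match the #0 and #4 offsets hard-coded in the assembly source, so they are a convenient thing to check at build time.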
If you wish to return another result, such as the carry out from the addition, you must load it into a1 prior to exit. You can do this as follows:
Change the second ADD to ADCS (add with carry and set flags).
Add the following instructions after the final STR (a1 holds the destination pointer until then) to load a1 with 1 or 0 depending on the carry out from the high order addition. MOV without the S suffix leaves the Carry flag unchanged.
        MOV     a1, #0
        ADC     a1, a1, #0
Change the return type in the function declaration of add_64() from void to int.
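From the C side, the carry-returning variant behaves like the sketch below. It is a model under stated assumptions, not the assembly routine itself: the int64 typedef and the name add_64_carry are hypothetical, and the high-word carry is obtained by widening to 64 bits (which requires a C99 <stdint.h>) instead of reading the processor flag.

```c
#include <stdint.h>

/* Assumed layout: low word first, then high word. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} int64;

/* Hypothetical model: stores src1 + src2 in dest and returns the
   carry out of the high-word addition (1 or 0). */
int add_64_carry(int64 *dest, const int64 *src1, const int64 *src2)
{
    /* ADDS: low-word sum and its carry */
    uint32_t lo = src1->lo + src2->lo;
    uint32_t carry_lo = (lo < src1->lo);

    /* ADCS: high-word sum including the low-word carry, kept wide so
       that bit 32 holds the carry out */
    uint64_t wide = (uint64_t)src1->hi + (uint64_t)src2->hi + carry_lo;

    dest->lo = lo;
    dest->hi = (uint32_t)wide;

    /* MOV a1,#0 ; ADC a1,a1,#0: return value is the final carry */
    return (int)(wide >> 32);
}
```

A caller could use the returned value to detect unsigned overflow, or to chain the routine when adding integers longer than 64 bits.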
Although the first C implementation is inefficient, it shows more about the APCS than the hand-modified version.
You have already seen a4 and ip being used as non-preserved temporary registers. However, here v1 and lr are also used as temporary registers. v1 is preserved by being stored (together with lr) on entry. Register lr is corrupted, but a copy is saved onto the stack and reloaded into pc when v1 is restored. This means that there is still only a single exit instruction, but now it is:
LDMIA sp!,{v1,pc}