8.1.6. Examples

The following examples demonstrate some of the ways in which you can use inline assembly language effectively.

Enabling and disabling interrupts

IRQ interrupts are enabled or disabled by reading the CPSR and updating bit 7 (the I bit): clearing the bit enables IRQs, and setting it disables them. Example 8.2 shows how to do this with small functions that can be inlined. These functions work only in a privileged mode, because the control bits of the CPSR and SPSR cannot be changed in User mode.

Example 8.2. 

__inline void enable_IRQ(void)
{
	int tmp;
	__asm
	{
		MRS tmp, CPSR
		BIC tmp, tmp, #0x80
		MSR CPSR_c, tmp
	}
}
__inline void disable_IRQ(void)
{
	int tmp;
	__asm
	{
		MRS tmp, CPSR
		ORR tmp, tmp, #0x80
		MSR CPSR_c, tmp
	}
}
int main(void)
{
	disable_IRQ();
	enable_IRQ();
	return 0;
}
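
The bit manipulation that Example 8.2 performs can be checked on any host by modelling the CPSR as a plain integer. The following is a minimal sketch in portable C and is not part of the toolkit. The names CPSR_IRQ_MASK, cpsr_with_irq_enabled(), cpsr_with_irq_disabled(), and check_irq_mask_model() are hypothetical, and 0xD3 is only an illustrative CPSR value (Supervisor mode with the I and F bits set).

```c
#include <assert.h>

/* Bit 7 of the CPSR is the I bit: IRQs are disabled while it is set. */
#define CPSR_IRQ_MASK 0x80u

/* Hypothetical helper: models "BIC tmp, tmp, #0x80" on a CPSR value. */
unsigned int cpsr_with_irq_enabled(unsigned int cpsr)
{
	return cpsr & ~CPSR_IRQ_MASK;
}

/* Hypothetical helper: models "ORR tmp, tmp, #0x80" on a CPSR value. */
unsigned int cpsr_with_irq_disabled(unsigned int cpsr)
{
	return cpsr | CPSR_IRQ_MASK;
}

/* Hypothetical self-check: only bit 7 changes, all other bits are preserved. */
void check_irq_mask_model(void)
{
	unsigned int cpsr = 0xD3u;	/* illustrative value, an assumption */
	unsigned int enabled = cpsr_with_irq_enabled(cpsr);

	assert(enabled == 0x53u);			/* I bit cleared, F bit and mode kept */
	assert(cpsr_with_irq_disabled(enabled) == cpsr);	/* I bit set again */
}
```

This model does not touch the real CPSR. It only illustrates why the read-modify-write sequence in Example 8.2 leaves the mode bits and the other flags unchanged.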

Dot product

Example 8.3 calculates the dot product of two integer arrays. It demonstrates how inline assembly language can work with C or C++ expressions and data types that are not directly supported by the inline assembler. When the code is compiled for ARM state on a target that supports long multiplies, the inline function mlal() is optimized to a single SMLAL instruction. Otherwise, the C fallback in mlal() is used. Use the -S -fs compiler options to view the assembly language code generated by the compiler.

Example 8.3. 

#include <stdio.h>
#define lo64(a) (((unsigned*) &a)[0])	// low 32 bits of a long long
#define hi64(a) (((int*) &a)[1])	// high 32 bits of a long long
__inline __int64 mlal(__int64 sum, int a, int b)
{
#if !defined(__thumb) && defined(__TARGET_FEATURE_MULTIPLY)
	__asm
	{
		SMLAL lo64(sum), hi64(sum), a, b
	}
#else
	sum += (__int64) a * (__int64) b;
#endif
	return sum;
}
// n must be at least 1, because the do-while loop always executes once
__int64 dotprod(int *a, int *b, unsigned n)
{
	__int64 sum = 0;
	do
		sum = mlal(sum, *a++, *b++);
	while (--n != 0);
	return sum;
}
int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int b[10] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
int main(void)
{
	printf("Dotproduct %lld (should be %d)\n", dotprod(a, b, 10), 220);
	return 0;
}
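
The effect of the SMLAL instruction in Example 8.3 can be modelled in portable C to check the expected result on a host machine. The following is a sketch only, assuming a 32-bit unsigned int and a 64-bit long long. The names smlal_model() and dotprod_model() are hypothetical and are not part of the toolkit. Both halves of the accumulator are held as unsigned values here to keep the arithmetic well defined.

```c
#include <assert.h>

/* Hypothetical model of SMLAL: adds the signed 64-bit product a * b
   to the 64-bit accumulator held in the pair {*hi, *lo}. */
void smlal_model(unsigned int *lo, unsigned int *hi, int a, int b)
{
	unsigned long long acc;

	assert(sizeof(unsigned int) == 4);	/* assumption of this sketch */
	acc = ((unsigned long long) *hi << 32) | *lo;
	/* Converting a negative product to unsigned wraps modulo 2^64,
	   which matches two's complement accumulation. */
	acc += (unsigned long long) ((long long) a * (long long) b);
	*lo = (unsigned int) (acc & 0xFFFFFFFFu);
	*hi = (unsigned int) (acc >> 32);
}

/* Hypothetical dot product built on the model. Unlike the do-while
   loop in Example 8.3, this loop also accepts n == 0. */
long long dotprod_model(const int *a, const int *b, unsigned int n)
{
	unsigned int lo = 0, hi = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		smlal_model(&lo, &hi, a[i], b[i]);
	/* The result is exact whenever the sum fits in a signed 64-bit value. */
	return (long long) (((unsigned long long) hi << 32) | lo);
}
```

Called with the arrays a and b from Example 8.3 and n set to 10, dotprod_model() returns 220, the value that the example prints as the expected result.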

Long multiplies

You can use the inline assembler to optimize long multiplies on processors that support the long multiply (MULL) instructions, such as SMULL. Example 8.4 shows a simple long multiply routine in C.

Example 8.5 shows how you can use inline assembly language to generate optimal code for the same routine. You can use the inline assembler to write the high word and the low word of the long long separately. The compiler optimization routines detect this case and optimize the code as if the address of res was not taken.

Note

The compiler performs this optimization only at the highest compiler optimization level (the -O2 compiler option).

The inline assembly language code depends on the word ordering of long long types, because it assumes that the low 32 bits are at offset 0.
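
The word-ordering assumption can be tested at run time. The following is a minimal sketch in portable C, assuming a 32-bit unsigned int and a 64-bit long long. The names low_word_is_first(), low32(), and high32() are hypothetical and are not part of the toolkit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the low 32 bits of a 64-bit
   integer are stored at word offset 0, as Example 8.5 assumes. */
int low_word_is_first(void)
{
	unsigned long long value = 0x1122334455667788ULL;
	unsigned int words[2];

	/* The check is meaningful only for 32-bit int and 64-bit long long. */
	assert(sizeof words == sizeof value);
	memcpy(words, &value, sizeof words);
	return words[0] == 0x55667788u;
}

/* Shifts and masks operate on the value, not on its layout in memory,
   so these helpers do not depend on the word ordering. */
unsigned int low32(unsigned long long v)
{
	return (unsigned int) (v & 0xFFFFFFFFu);
}

unsigned int high32(unsigned long long v)
{
	return (unsigned int) (v >> 32);
}
```

A program could call low_word_is_first() once at startup and refuse to continue if it returns 0. The shift-based helpers are an alternative for plain C code, but they cannot replace the pointer casts used as operands in the inline assembly language examples.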

Example 8.4. 

Writing the multiply routine in C:

// long multiply routine in C
long long smull(int x, int y)
{
	return (long long) x * (long long) y;
}

The compiler generates the following code:

	MOV a3,a1
	MOV a1,a2
	MOV a2,a3
	SMULL ip,a2,a1,a2
	MOV a1,ip
	MOV pc,lr

Example 8.5. 

Writing the same routine using inline assembly language:

long long smull(int x, int y)
{
	long long res;
	__asm { SMULL ((int*)&res)[0], ((int*)&res)[1], x, y }
	return res;
}

The compiler generates the following code:

	MOV a3,a1
	SMULL a1,a2,a3,a2
	MOV pc,lr
Copyright © 1997, 1998 ARM Limited. All rights reserved. ARM DUI 0040D
Non-Confidential