11.4.3. Changing the source

In addition to the gains from good use of compiler options, you can further improve code size and performance by modifying the source to take advantage of ARM processor features.

Use of shorts

ARM cores that implement an ARM Architecture earlier than version 4 do not have the ability to directly load or store halfword quantities (or short types). This affects code size. Generally, code generated for Architecture 3 that makes use of short is larger than equivalent code that only performs byte or word transfers. Storing a short is particularly expensive, because the ARM processor must make two byte stores. Similarly, loading a short requires a word load, followed by shifting out the unwanted halfword.
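
The following C sketch models the extra work described above. It is an illustration only, not compiler output: the function names are hypothetical, a little-endian byte order is assumed for the store, and the fixed-width types from <stdint.h> are assumed to be available.

```c
#include <stdint.h>

/* Store a 16-bit value as two separate byte stores, as a core without
   halfword store instructions has to do. Little-endian layout is assumed. */
static void store_halfword(uint8_t *p, uint16_t value)
{
    p[0] = (uint8_t)(value & 0xFFu);   /* low byte  */
    p[1] = (uint8_t)(value >> 8);      /* high byte */
}

/* Load a 16-bit value by loading the whole word and shifting out the
   unwanted halfword. 'upper' selects bits 31..16 instead of bits 15..0. */
static uint16_t load_halfword(const uint32_t *word, int upper)
{
    uint32_t w = *word;                        /* single word load */
    if (upper)
        return (uint16_t)(w >> 16);            /* discard low halfword  */
    return (uint16_t)((w << 16) >> 16);        /* discard high halfword */
}
```

Each access that would be a single instruction with halfword support becomes two or three operations here, which is where the extra code size comes from.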

If your processor supports halfwords, use the appropriate -architecture or -processor options (refer to Chapter 2 The ARM Compilers in the ARM Software Development Toolkit Reference Guide). This ensures that the resulting code contains the Architecture 4 halfword instructions. By default the compiler generates halfword instructions.

If you are writing or porting for processors that do not have halfword support, you should minimize the use of short values. However, this is sometimes impossible. C programs ported from x86 or 68k architectures, for example, frequently make heavy use of short. If the code has been written with portability in mind, all you may have to do is change a typedef or #define to use int instead of short. Where this is not the case, you may have to make some functional changes to the code.
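
A minimal sketch of the typedef approach, assuming the code was written with a single portable type name. NO_HALFWORD_SUPPORT is a hypothetical macro that you would define yourself when building for a core without halfword instructions; it is not a compiler-defined symbol. The names number, COUNT, and fill_and_sum are also illustrative.

```c
/* NO_HALFWORD_SUPPORT is a hypothetical, user-defined build macro. */
#ifdef NO_HALFWORD_SUPPORT
typedef int number;      /* word accesses only */
#else
typedef short number;    /* halfword accesses */
#endif

#define COUNT 2000

static number array[COUNT];

/* Fill the array with 0..COUNT-1 and return the sum of its elements.
   Only the typedef changes between the two builds. */
static long fill_and_sum(void)
{
    number i;
    long sum = 0;

    for (i = 0; i < COUNT; i++) {
        array[i] = i;
        sum += array[i];
    }
    return sum;
}
```

Because every declaration goes through number, switching the element type is a one-line change and the rest of the source is untouched.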

You may be able to establish the extent of code size increase resulting from using shorts by compiling the code with:

armcc -Dshort=int

which makes the preprocessor replace every instance of short with int. Be aware that, although the code may compile and link correctly, it may not function as expected, because the size and range of every affected variable change.
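
As one illustration of how behavior can change, the hypothetical function below relies on a 16-bit unsigned short wrapping from 65535 back to 0. The function name is an assumption made for this sketch, and a 16-bit short is assumed.

```c
/* Hypothetical sequence counter that relies on 16-bit wraparound.
   Built normally, 65535 + 1 wraps to 0. Built with -Dshort=int, the
   parameter and the cast both become unsigned int, so the result is
   65536 instead. */
static unsigned int next_sequence(unsigned short seq)
{
    seq = (unsigned short)(seq + 1);
    return seq;
}
```

Code that depends on structure layout or on sizeof(short) can change in a similar way.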

Whatever your approach, you need to weigh the change in code size against the opposite change in data size.

The program below illustrates the effect of using shorts, integers, and the -ARM7T option on code and data size.

#include <stdio.h>
typedef short number;
number array [2000];
number loop;
int main()
{
	for (loop=0; loop < 2000; loop++)
		array[loop] = loop;
	return 0;
}
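
The data side of the trade-off for this program can be estimated with simple arithmetic. The sketch below is a rough estimate only: it assumes a 2-byte short and a 4-byte int, ignores any alignment padding the linker adds, and uses the hypothetical names ELEMENTS and zero_init_bytes.

```c
#include <stddef.h>

#define ELEMENTS 2000   /* matches the array in the program above */

/* Bytes of zero-initialized data needed for the array plus the loop
   counter, for a given element size. Padding is ignored. */
static size_t zero_init_bytes(size_t element_size)
{
    return ELEMENTS * element_size + element_size;
}
```

With these assumptions, zero_init_bytes(2) gives 4002 bytes for short and zero_init_bytes(4) gives 8004 bytes for int, so changing the type to int roughly doubles the data size.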

The results of compiling the program with all three options are shown in the following table:

Table 11.3. Object code and data sizes

                                        code size  inline data  inline strings  const data  RW data  0-init data  debug data
short with hardware support (see note)

Note: See Specifying the target processor and architecture in the ARM Software Development Toolkit Reference Guide for details of hardware support for halfwords.

Other changes

  • Modify performance-critical C source to compile efficiently on the ARM. See Improving performance and code size.

  • Port small, performance-critical routines into ARM assembly language.

Compile with the -S option to produce assembly output without generating object code, and take this as a starting point for your own hand-optimized assembly language. When you specify the -S option you can also specify -fs to write a file containing interleaved C or C++ and assembly language (see Specifying output format in the ARM Software Development Toolkit Reference Guide).

You can make significant performance improvements by using Load and Store Multiple instructions in memory-intensive algorithms. When optimizing the routines:

  • use load/store multiple instructions for memory-intensive algorithms

  • use 64-bit result multiply instructions (where available) for fixed-point arithmetic

  • replace small, performance-critical functions by macros, or use the __inline keyword

  • avoid the use of setjmp() in performance-critical routines (particularly in pcc mode).
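
Two of the points above can be sketched in C. The example assumes a compiler that supports long long and a 32-bit int. The Q16.16 format and the names fix16, FIX16_ONE, FIX16_FROM_INT, and fix16_mul are chosen for illustration only. Whether the compiler actually emits a 64-bit result multiply instruction for the product depends on the target and options.

```c
/* Q16.16 fixed point: 16 integer bits and 16 fraction bits (illustrative). */
typedef int fix16;

#define FIX16_ONE 0x10000

/* A small conversion written as a macro, so no function call is made. */
#define FIX16_FROM_INT(n) ((fix16)((n) * FIX16_ONE))

/* Multiply two Q16.16 values using a 64-bit intermediate product, so the
   upper bits are not lost before rescaling. Right-shifting a negative
   value is implementation-defined in C; an arithmetic shift is assumed. */
static fix16 fix16_mul(fix16 a, fix16 b)
{
    long long product = (long long)a * (long long)b;  /* 32 x 32 -> 64 */
    return (fix16)(product >> 16);                    /* back to Q16.16 */
}
```

For example, fix16_mul(3 * FIX16_ONE / 2, FIX16_FROM_INT(2)) multiplies 1.5 by 2.0 and returns 0x30000, which is 3.0 in this format.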

Copyright © 1997, 1998 ARM Limited. All rights reserved. ARM DUI 0040D