Controlling code generation
Use the options described in this section to control aspects of the code generated by the compiler, such as optimization. See Pragmas for information on additional code generation options that are controlled using pragmas.
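This section describes the options that control the target instruction set, endianness, optimization, code and data sections, pointer alignment, memory alignment, and implementation details.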
These options control the target instruction set:
--arm
Configures the compiler to target the ARM instruction set. This is the default.
--thumb
Configures the compiler to target the Thumb instruction set. This predefines __thumb and __thumb__.
Also, see the descriptions of #pragma arm and #pragma thumb in Pragmas controlling code generation. These pragmas enable you to compile specific functions for ARM or Thumb.
If you are compiling code that is targeted at both ARM and Thumb, you must specify the interworking option --apcs /interwork. See Interworking qualifiers for more details.
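The following is a minimal sketch of per-function instruction-set selection. It assumes that #pragma thumb and #pragma arm apply to the functions defined after them. The function names are hypothetical. Because the file mixes ARM and Thumb code, it would be compiled with --apcs /interwork. file_default_state() relies only on the __thumb__ macro that --thumb predefines. Other compilers typically ignore these pragmas, possibly with a warning.

```c
/* Hypothetical compact, non-critical helper: compiled as Thumb. */
#pragma thumb
int checksum_thumb(const unsigned char *buf, int len)
{
    int sum = 0;
    int i;
    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical performance-critical routine: switched back to ARM. */
#pragma arm
int scale_arm(int x, int factor)
{
    return x * factor;
}

/* Reports whether the file was compiled with --thumb, which predefines
 * __thumb__ (per the --thumb description). */
const char *file_default_state(void)
{
#ifdef __thumb__
    return "Thumb";
#else
    return "not Thumb";
#endif
}
```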
If you enter armcc --thumb --fpu vfp on the command line, the compiler compiles as much of the code as possible using the Thumb instruction set. However, the compiler might generate ARM code for some parts of the compilation. See the description of --fpu name in Specifying the target processor or architecture.
These options control endianness:
--littleend
This option generates code for an ARM processor using little-endian memory. With little-endian memory, the least significant byte of a word has the lowest address. This is the default.
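The little-endian rule can be checked at run time. The sketch below uses a hypothetical helper, not a compiler feature. It copies the 32-bit value 0x01020304 into a byte array and inspects the byte at the lowest address. That byte is 0x04 on a little-endian target and 0x01 on a big-endian one.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of a word is stored at the
 * lowest address (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 0x04;
}
```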
The optimization options can be grouped into multi-optimization options and individual optimization options.
The optimizations described in this section enable you to control multiple optimizations with a single option.
You can also apply the -Onum, -Ospace, and -Otime optimizations to individual functions using pragmas. See Pragmas controlling multiple optimizations for more information.
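For example, a single time-critical function could be compiled at -O3 -Otime while the rest of the file stays size-oriented. This sketch makes three assumptions:

- The pragmas are spelled #pragma Onum, #pragma Otime, and #pragma Ospace, mirroring the command-line options.
- The final pair of pragmas restores the settings wanted for the rest of the file.
- The function names are hypothetical.

Check Pragmas controlling multiple optimizations for the exact forms.

```c
/* Assumed pragma spellings mirroring -O3 and -Otime. */
#pragma O3
#pragma Otime
/* Hypothetical time-critical routine: optimized for speed. */
long dot_product(const int *a, const int *b, int n)
{
    long sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Assumed to return to size-oriented settings for what follows. */
#pragma O2
#pragma Ospace
/* Hypothetical non-critical routine: optimized for size. */
int clamp_int(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}
```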
The optimization options that are prefixed by -O can be specified in lowercase, uppercase, or mixed case. However, the -O prefix itself must be uppercase. For example:
-Ospace -OTIME -OSpace
The multi-optimization options are:
-Onum
Specifies the level of optimization to be used. The optimization levels are:
-O0
Minimum optimization. Turns off most optimizations. It gives the best possible debug view and the lowest level of optimization.
-O1
Restricted optimization. Removes unused inline functions and unused static functions. Turns off optimizations that seriously degrade the debug view. If used with --debug (see Debug table generation options), this option gives a satisfactory debug view with good code density.
-O2
High optimization. If used with --debug (see Debug table generation options), the debug view might be less satisfactory because the mapping of object code to source code is not always clear.
This is the default optimization level.
-O3
Maximum optimization. Compared with -O2, the balance between space and time optimizations in the generated code is more heavily weighted towards space or time. That is:
-O3 -Otime aims to produce faster code than -O2 -Otime, at the risk of increasing your image size
-O3 -Ospace aims to produce smaller code than -O2 -Ospace, but performance might be degraded.
-O3 performs the same optimizations as -O2. In addition, -O3 performs extra optimizations that are more aggressive, such as:
more aggressive inlining and automatic inlining for -O3 -Otime
multifile compilation by default.
For more details on multifile compilation, see Multifile compilation. Also, see the description of the --multifile option.
For floating-point code, -O3 is not necessarily ISO C and C++ standard-compliant. Use -O3 --fpmode=std to ensure ISO compliance. See the description of --fpmode for more information.
Do not rely on the implementation details of these optimizations, because they might change in future releases.
-Ospace
Instructs the compiler to perform optimizations to reduce image size at the expense of a possible increase in execution time. For example, large structure copies are done by out-of-line function calls instead of inline code. Use this option if code size is more critical than performance. This is the default.
-Otime
Instructs the compiler to perform optimizations to reduce execution time at the possible expense of a larger image. Use this option if execution time is more critical than code size. For example, it compiles:
    while (expression) body;
as:
    if (expression) { do body; while (expression); }
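The -Otime loop transformation can be written out by hand to see why it can pay off. In the rotated form, the condition is tested once on entry and then at the bottom of each iteration. This typically saves one branch per iteration compared with testing at the top and jumping back.

The two hypothetical functions below compute the same result. The compiler performs this rewrite itself, so you do not need to write loops this way.

```c
/* Loop as written: the condition is tested before every iteration. */
unsigned sum_while(const unsigned *p, unsigned n)
{
    unsigned sum = 0;
    while (n != 0) {
        sum += *p++;
        n--;
    }
    return sum;
}

/* Rotated form: one test guards entry, and the do-while tests the
 * condition at the bottom of each iteration. */
unsigned sum_rotated(const unsigned *p, unsigned n)
{
    unsigned sum = 0;
    if (n != 0) {
        do {
            sum += *p++;
            n--;
        } while (n != 0);
    }
    return sum;
}
```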
If you specify neither -Otime nor -Ospace, the compiler uses -Ospace. You can compile time-critical parts of your code with -Otime, and the rest with -Ospace.
Do not specify both -Otime and -Ospace in the same compiler invocation. If you do, the last one specified takes effect (see Ordering command-line options).
--feedback filename
Specifies the feedback file created by a previous execution of the ARM linker. The file contains a list of functions that the linker identifies as being unused in your code. The contents of this file are optimization hints only, and the compiler might ignore them. Therefore, this is a safe optimization.
See Linker feedback for more details.
It is recommended that you use this optimization in preference to the --split_sections option (formerly -zo) for removing unused functions, because linker feedback produces smaller code by avoiding the overhead of splitting all sections.
--fpmode model
Specifies the floating-point conformance, and sets library attributes and floating-point optimizations. model can be one of:
ieee_full
All facilities, operations, and representations guaranteed by the IEEE standard are available in single and double precision. Modes of operation can be selected dynamically at runtime.
This defines the symbols __FP_IEEE, __FP_FENV_EXCEPTIONS, __FP_FENV_ROUNDING, and __FP_INEXACT_EXCEPTION.
ieee_fixed
IEEE standard with round-to-nearest and no inexact exception.
This defines the symbols __FP_IEEE and __FP_FENV_EXCEPTIONS.
ieee_no_fenv
IEEE standard with round-to-nearest and no exceptions. This mode is compatible with the Java floating-point arithmetic model.
This defines the symbol __FP_IEEE.
std
IEEE finite values with denormals flushed to zero, round-to-nearest, and no exceptions. It is C and C++ compatible. This is the default option.
Finite values are as predicted by the IEEE standard. However:
NaNs and infinities might not be produced in all circumstances defined by the IEEE model. Also, when they are produced, they might not have the sign that the IEEE model predicts.
The sign of zero might not be that predicted by the IEEE model.
fast
Performs more aggressive floating-point optimizations that might cause a small loss of accuracy, in exchange for a significant performance increase. This option results in behavior that is not fully ISO C and C++ standard-compliant. However, numerically robust floating-point programs behave correctly.
The following optimizations are performed:
Math functions with float arguments, or double arguments that have been converted from float, call the single-precision version if it exists, unless the argument is a constant expression. This is not compliant with ISO C or C++.
Double-precision floating-point expressions that are narrowed to single precision are evaluated in single precision when it is beneficial to do so. For example, float y = (float)(x + 1.0) is evaluated as float y = (float)x + 1.0f.
Division by a floating-point constant is replaced by multiplication with the inverse. For example, x / 3.0 is evaluated as x * (1.0 / 3.0).
This option defines the symbol __FP_FAST.
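The following sketch illustrates two aspects of --fpmode:

- How code can detect the selected model from the symbols listed above.
- The kind of rewrite fast mode applies to division by a constant.

The helper names are hypothetical. The final branch assumes that std defines none of these symbols. The tolerance check illustrates that x / 3.0 and x * (1.0 / 3.0) can differ in the last bits but agree closely.

```c
#include <float.h>

/* Reports the model implied by the symbols each --fpmode defines. */
const char *fpmode_name(void)
{
#if defined(__FP_FAST)
    return "fast";
#elif defined(__FP_IEEE) && defined(__FP_INEXACT_EXCEPTION)
    return "ieee_full";
#elif defined(__FP_IEEE) && defined(__FP_FENV_EXCEPTIONS)
    return "ieee_fixed";
#elif defined(__FP_IEEE)
    return "ieee_no_fenv";
#else
    return "std or unspecified";
#endif
}

/* Division as written in the source. */
double div_exact(double x)
{
    return x / 3.0;
}

/* The rewrite that --fpmode fast may apply: multiply by the inverse. */
double div_by_inverse(double x)
{
    return x * (1.0 / 3.0);
}

/* Nonzero if both forms agree to within a few units in the last place. */
int results_close(double x)
{
    double a = div_exact(x);
    double diff = a - div_by_inverse(x);
    if (diff < 0.0)
        diff = -diff;
    if (a < 0.0)
        a = -a;
    return diff <= 4.0 * DBL_EPSILON * a;
}
```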
--multifile
Enables the compiler to perform optimization across all specified files, instead of on each individual file. The specified files are compiled into a single object file. Using --multifile requires large amounts of memory while compiling. Although there is no limit to the number of files you can specify on the command line, a practical limit is 10 source files. For more details on multifile compilation, see Multifile compilation.
This optimization is on by default for optimization level -O3.
--vfe, --no_vfe
Enables or disables unused virtual function elimination (VFE) in C++ mode. --vfe is the default, except where legacy object files compiled with a pre-RVCT v2.1 compiler do not contain VFE information.
When VFE is enabled, the compiler places the information in special sections with the prefix .arm_vfe_. These sections are harmless to a linker that is not VFE-aware, because they are not referenced by the rest of the code. Therefore, they do not increase the size of the executable. However, they increase the size of the object files. If this is a problem, specify --no_vfe.
For more details on VFE, and the associated linker options, see RealView Developer Kit v2.2 Linker and Utilities Guide. Also, see Calling a pure virtual function for more information on pure virtual functions.
These options enable you to have individual control of the compiler optimizations:
--autoinline, --no_autoinline
Enables or disables automatic inlining. --no_autoinline is the default for optimization levels -O0 and -O1, and --autoinline is the default for optimization levels -O2 and -O3 (see Multi-optimization options).
The compiler automatically inlines functions where it is sensible to do so. The -Ospace and -Otime options influence how the compiler automatically inlines functions. Selecting -Otime increases the likelihood that functions are inlined.
--data_reorder, --no_data_reorder
Enables or disables automatic reordering of top-level data items (globals, for example). The compiler can save memory by eliminating wasted space between data items. However, --data_reorder can break legacy code if the code makes invalid assumptions about the order in which the compiler places data.
The ISO C Standard does not guarantee data order, so you must avoid writing code that depends on any assumed ordering. If you require data ordering, place the data items into a structure.
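As an illustration of that advice, the sketch below contrasts two kinds of data. Separate globals may be reordered by --data_reorder. Members of a structure are laid out by C in declaration order. All names are hypothetical.

```c
#include <stddef.h>

/* Separate globals: with --data_reorder the compiler may place these in
 * any order, so code must not assume &sensor_a < &sensor_b. */
int sensor_a;
int sensor_b;

/* Structure members are laid out in declaration order, so this order
 * can be relied on. */
struct sensor_block {
    int a;
    int b;
    char status;
};

struct sensor_block sensors;

/* Confirms that member offsets follow declaration order. */
int block_order_is_fixed(void)
{
    return offsetof(struct sensor_block, a) < offsetof(struct sensor_block, b)
        && offsetof(struct sensor_block, b) < offsetof(struct sensor_block, status);
}
```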
--forceinline
If this option is used, the compiler always attempts to inline those functions marked as __inline, if possible, regardless of the characteristics of the function. However, the compiler does not inline a function if doing so causes problems. For example, a recursive function is inlined only once.
If you want to force specific functions to be inlined, use
the __forceinline function storage class modifier
(see Function storage class modifiers).
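A wrapper macro is a common way to force inlining of specific functions while keeping the source portable. This sketch makes the following assumptions:

- armcc predefines __ARMCC_VERSION.
- armcc accepts __forceinline static before the return type.

For GCC-compatible compilers, the sketch substitutes the always_inline attribute, which is a different mechanism from armcc's. The macro and function names are hypothetical.

```c
/* Hypothetical portability macro: armcc's __forceinline where available
 * (Clang-based compilers excluded), GCC's always_inline attribute as a
 * stand-in elsewhere, plain static inline as a last resort. */
#if defined(__ARMCC_VERSION) && !defined(__clang__)
#define FORCE_INLINE __forceinline static
#elif defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

FORCE_INLINE int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Each call to max_int is expected to be expanded inline here. */
int max3(int a, int b, int c)
{
    return max_int(max_int(a, b), c);
}
```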
--no_inline
Disables inlining of functions (see --inline). Calls to inline functions are not expanded inline. You can use this option to help debug inline functions.
If a function is declared inline, then it is compiled out-of-line into a common code section.
--inline
Enables the compiler to inline functions. This is the default.
The compiler inlines functions as follows:
Automatically, for optimization levels -O2 and -O3 (see Multi-optimization options), unless you use the option --no_autoinline.
When the function is qualified as an inline function, that is, with the __inline keyword in C, the __forceinline keyword in C and C++, or the inline keyword in C++. This applies for all optimization levels. Functions that are explicitly qualified as inline functions are more likely to be inlined. However, using the inline qualifier does not guarantee that functions are inlined. See Function keywords. Also, see the description of --forceinline.
The compiler changes the criteria for inlining functions depending
on whether you select -Ospace or -Otime.
Select -Otime to increase the likelihood that a
function is inlined. See Multi-optimization options for more details.
Sometimes, an out-of-line copy of an inlined function might remain in an object or image, even though that code is no longer used. Linker feedback enables you to detect and remove any unused code fragments. See Linker feedback.
When you set a breakpoint on an inline function, an ARM debugger attempts to set a breakpoint on each inlined instance of that function. If you are using RVI-ME or other hardware to debug an image in ROM, and the number of inline instances is greater than the number of available hardware breakpoints, the debugger cannot set the additional breakpoints and reports an error.
--lower_ropi, --no_lower_ropi
Enables or disables less restrictive C in ROPI mode. See Position independence qualifiers for details of the /ropi option.
If you compile with --lower_ropi, the static initialization is done at runtime by the C++ constructor mechanism, even for C. This enables these static initializations to work with ROPI code.
--lower_rwpi, --no_lower_rwpi
Enables or disables less restrictive C and C++ in RWPI mode. --lower_rwpi is the default. See Position independence qualifiers for details of the /rwpi option.
If you compile with --lower_rwpi, the static initialization is done at runtime by the C++ constructor mechanism, even for C. This enables these static initializations to work with RWPI code.
--split_ldm
Instructs the compiler to split LDM and STM instructions into two or more LDM or STM instructions, where required, to reduce the maximum number of registers transferred to:
five, for all STMs, and for LDMs that do not load the PC
four, for LDMs that load the PC.
The --split_ldm option has the following effects:
It can reduce interrupt latency on ARM systems that:
do not have a cache or a write buffer
use zero-wait-state, 32-bit memory.
Using --split_ldm increases code size and decreases performance slightly.
It does not split ARM inline assembly LDM or STM instructions, or VFP FLDM or FSTM instructions.
There are some systems that do not benefit from being built
with --split_ldm:
It has no significant benefit for cached systems, or for processors with a write buffer.
It has no benefit for systems with non-zero-wait-state memory, or for systems with slow peripheral devices. Interrupt latency in such systems is determined by the number of cycles required for the slowest memory or peripheral access. Typically, this is much greater than the latency introduced by multiple register transfers.
This option enables you to control code and data sections:
--split_sections
This option generates one ELF section for each function in the source file. Output sections are named with the same name as the function that generates the section, but with an i. prefix. For example:
int f(int x) { return x+1; }
compiled with --split_sections gives:
    AREA ||i.f||, CODE, READONLY
f   PROC
    ADD      r0,r0,#1
    MOV      pc,lr
This option increases code size slightly (typically by a few percent) for some functions because it reduces the potential for sharing addresses, data, and string literals between functions.
If you want to remove unused functions, it is recommended that you use the linker feedback optimization in preference to this option. This is because linker feedback produces smaller code, by avoiding the overhead of splitting all sections. See Linker feedback for more details.
#pragma arm section specifies the code or data section name used for subsequent functions or objects. This includes definitions of anonymous objects that the compiler creates for initializations. See Pragmas controlling code generation for more details.
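The following is a minimal sketch of #pragma arm section. It makes two assumptions:

- The pragma takes the form #pragma arm section code = "name", rodata = "name".
- A bare #pragma arm section restores the default section names.

The section names fast_code and fast_rodata are hypothetical. A scatter-loading description file could then place them in fast memory. Other compilers ignore the pragma.

```c
/* Assumed syntax: name the sections for the code and read-only data
 * that follow (hypothetical section names). */
#pragma arm section code = "fast_code", rodata = "fast_rodata"

const int gain_table[4] = { 1, 2, 4, 8 };

int apply_gain(int sample, int index)
{
    return sample * gain_table[index & 3];
}

/* Assumed to revert to the default section names. */
#pragma arm section
```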
Use a scatter-loading description file to place some functions in fast memory and others in slow memory (see the chapter on using scatter-loading description files in RealView Developer Kit v2.2 Linker and Utilities Guide).
You can also use a scatter-loading file to place a function at a particular address in memory.
This option enables you to control pointer alignment:
--pointer_alignment=num
Specifies the unaligned pointer support required, where num is one of the following:
1
Treats accesses through pointers as having an alignment of one, that is, byte-aligned or unaligned.
2
Treats accesses through pointers as having an alignment of at most two, that is, at most halfword-aligned.
4
Treats accesses through pointers as having an alignment of at most four, that is, at most word-aligned.
8
Accesses through pointers have normal alignment, that is, at most doubleword-aligned.
De-aligning pointers might increase the code size, even on CPUs with unaligned access support, for example, ARMv6 using the UL41 memory access model. This is because only a subset of the load and store instructions benefit from unaligned access support. The compiler cannot use multiple-word transfers or coprocessor-memory transfers, including hardware floating-point loads and stores, directly on unaligned memory objects.
Code size might increase significantly when compiling for CPUs without hardware support for unaligned access.
Unaligned pointer mode does not affect the placement of objects in memory, nor the layout and padding of structures.
This option assists the porting of source code that has been
written for architectures without alignment requirements. You can
achieve finer control of access to unaligned data, with less impact
on the quality of generated code, using the __packed qualifier.
For more details on the __packed qualifier, see Type qualifiers.
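For a single unaligned access, __packed is the finer-grained alternative to --pointer_alignment. The sketch below reads a 32-bit value from an arbitrary byte address. It makes two assumptions:

- armcc predefines __ARMCC_VERSION.
- armcc accepts a __packed-qualified pointer to uint32_t.

On other compilers it uses memcpy, which is a standard C technique rather than an armcc feature. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address that may not be word-aligned. */
uint32_t read_u32_unaligned(const void *p)
{
#if defined(__ARMCC_VERSION) && !defined(__clang__)
    /* armcc: a __packed pointer tells the compiler the access may be
     * unaligned, so it generates a safe sequence. */
    return *(const __packed uint32_t *)p;
#else
    /* Portable fallback: copy the bytes into an aligned local. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#endif
}
```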
These options enable you to control memory alignment:
--memaccess option
This option tells the compiler that the memory in the target system has slightly restricted or expanded capabilities.
If you specify a processor that supports ARMv6 (for example, --cpu ARM1136J-S) or the ARMv6 architecture (that is, --cpu 6), the compiler utilizes ARMv6 unaligned access support to speed up accesses to packed structures. See ARMv6 unaligned accesses for more details.
Specify option to indicate the load and store capability:
-UL41
Disables unaligned mode for code that uses pre-ARMv6 unaligned access behavior.
It is possible that:
the processor has memory access modes available that the physical memory lacks (load aligned halfword, for example)
the physical memory has access modes that the processor cannot use (ARMv3 load aligned halfword, for example).
--min_array_alignment=option
Specifies the minimum alignment of arrays, where option is one of the following:
1
byte alignment, or unaligned
2
two-byte (halfword) alignment
4
four-byte (word) alignment
8
eight-byte (doubleword) alignment.
For example, compiling the following code with --min_array_alignment=8 gives the alignment described in the comments:
char arr_c1[1];     // alignment == 8
char c1;            // alignment == 1
char arr_c2[3];     // alignment == 8
char arr_c3[10];    // alignment == 8
struct st {
    int i1;
} c;                // alignment == 4
char c2;            // alignment == 1
Also, see Keywords specific to the ARM compiler for a description of the __align(n) storage class modifier.
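Where only one array needs stronger alignment, __align(n) avoids raising the minimum for every array. The sketch assumes armcc accepts __align(8) before the declaration. Elsewhere it falls back to C11 _Alignas(8), which is not an armcc feature. The buffer and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical macro: armcc's __align(8), or C11 _Alignas(8) elsewhere. */
#if defined(__ARMCC_VERSION) && !defined(__clang__)
#define ALIGN8 __align(8)
#else
#define ALIGN8 _Alignas(8)
#endif

/* Only this array is forced to doubleword alignment. */
ALIGN8 char dma_buffer[3];

int dma_buffer_is_aligned(void)
{
    return ((uintptr_t)dma_buffer % 8u) == 0;
}
```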
When compiling for ARMv6, the compiler uses ARMv6 unaligned access support by default. This speeds up accesses to packed structures by enabling an LDR instruction to load from, or an STR instruction to store to, a non-word-aligned address. That is, the compiler might generate unaligned word and halfword accesses, and might select a library that supports unaligned accesses. Structures remain unpacked unless you explicitly qualify them with __packed. For more details on the __packed qualifier, see Type qualifiers.
Therefore, code compiled for ARMv6 can run correctly only if you enable unaligned support on the ARM core. To enable unaligned support on an ARMv6 core, you must set the U bit (bit 22) of CP15 register 1 in your initialization code. You can also achieve this in hardware by tying the UBITINIT input to the core HIGH.
To generate code that uses the pre-ARMv6 unaligned access
behavior, use the --memaccess -UL41 compiler option.
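The U bit requirement described earlier can be expressed as a bit mask. This sketch only computes the new control register value. The names are hypothetical. The privileged MRC and MCR instructions that read and write CP15 register 1 in real initialization code are not shown.

```c
#include <stdint.h>

/* Bit 22 of CP15 register 1: the U bit that enables unaligned support. */
#define CP15_CTRL_U_BIT (UINT32_C(1) << 22)

/* Given the current value of CP15 register 1, returns the value to write
 * back so that ARMv6 unaligned access support is enabled. */
uint32_t cp15_ctrl_enable_unaligned(uint32_t ctrl)
{
    return ctrl | CP15_CTRL_U_BIT;
}
```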
These options enable you to specify implementation details:
--enum_is_int
This option forces all enumerations to be stored in integers. This option is off by default, in which case the compiler uses the smallest data type that can hold the values of all enumerators.
The --enum_is_int option is not recommended
for general use and is not required for ISO-compatible source. Code
compiled with this option is not compliant with the ABI
for the ARM Architecture (base standard) [BSABI], and
incorrect use might result in a failure at runtime. This option
is not supported by the C++ libraries.
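The effect of --enum_is_int shows up in sizeof. In the sketch below, the enumerators fit in one byte. The default therefore stores enum color in a one-byte type, while --enum_is_int makes it int-sized. Storing the value in a fixed-width integer field, as in the hypothetical pixel_record, keeps a data layout independent of this option.

```c
#include <stddef.h>
#include <stdint.h>

/* All values fit in one byte. */
enum color { RED, GREEN, BLUE };

/* 1 under the armcc default; sizeof(int) with --enum_is_int. */
size_t color_size(void)
{
    return sizeof(enum color);
}

/* Hypothetical layout that does not depend on enum size. */
struct pixel_record {
    uint8_t color;   /* holds an enum color value */
    uint8_t alpha;
};
```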
--dollar, --no_dollar
Accepts or rejects dollar signs ($) in identifiers. The default is --dollar, except in --strict mode.
--alternative_tokens, --no_alternative_tokens
Enables or disables the recognition of alternative tokens. This controls recognition of the digraphs in C and C++, and controls recognition of the operator keywords, such as and and bitand, in C++.
For more details on digraphs, see The Design and Evolution of C++, or any other book describing the C++ programming language. The default behavior is --alternative_tokens.
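The following C sketch uses digraphs and alternative operator spellings. In C, the spellings and and bitand come from the standard header <iso646.h> as macros. In C++, they are keywords controlled by --alternative_tokens. The names table and both_bits_set are hypothetical.

```c
#include <iso646.h>  /* C: defines and, bitand, ... as macros */

/* Digraphs: <: :> stand for [ ], and <% %> stand for { }. */
int table<:4:> = <% 1, 2, 4, 8 %>;

/* Returns nonzero if every bit of a nonzero mask is set in x. */
int both_bits_set(unsigned x, unsigned mask)
<%
    return (x bitand mask) == mask and mask != 0;
%>
```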
--multibyte_chars, --no_multibyte_chars
Enables or disables processing for multibyte character sequences in comments, string literals, and character constants. Multibyte encodings are used for character sets such as the Japanese Shift-Japanese Industrial Standard (Shift-JIS). The default behavior is --multibyte_chars.
--locale string
Use this option in combination with --multibyte_chars to switch the default locale for source files to the one you specify in string. For example, to compile Japanese source files on an English-based Windows NT workstation, use:
--multibyte_chars --locale japanese
The permitted settings of locale are determined by the host platform.
--loose_implicit_cast
Makes illegal implicit casts legal, such as implicit casts of a nonzero int to a pointer, for example:
int *p = 0x8000;
Without this option, the compiler reports:
Error: #144: a value of type “int” cannot be used to initialize an entity of type “int *”
With this option, the compiler generates the following warning message, which you can suppress (see Suppressing diagnostic messages):
Warning: #152-D: conversion of nonzero integer to pointer
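Instead of relying on --loose_implicit_cast, you can state the integer-to-pointer conversion with an explicit cast. This should compile without either diagnostic. The address 0x8000 is taken from the example above. The function name is hypothetical. In ISO C, the resulting pointer value is implementation-defined.

```c
#include <stdint.h>

/* Explicit cast: the conversion is intentional, so no implicit-cast
 * diagnostic is expected. */
int *fixed_address_pointer(void)
{
    return (int *)0x8000;
}
```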
--restrict, --no_restrict
Enables or disables the use of the restrict keyword. The default is --no_restrict.
See restrict for more details on the restrict keyword.
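The following is a minimal sketch of restrict. Compile it with --restrict under armcc, or as C99 elsewhere. The qualifiers promise that dst, a, and b do not overlap, which lets the compiler reorder loads and stores more freely. Calling the function with overlapping arrays is undefined behavior. The function names are hypothetical.

```c
/* dst, a and b must not overlap. */
void add_arrays(int n, int *restrict dst,
                const int *restrict a, const int *restrict b)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Usage sketch: returns the sum of the element-wise results. */
int add_arrays_total(void)
{
    int a[3] = { 1, 2, 3 };
    int b[3] = { 10, 20, 30 };
    int out[3];
    add_arrays(3, out, a, b);
    return out[0] + out[1] + out[2];
}
```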
--signed_chars, --unsigned_chars
Makes the char type signed or unsigned. The default is --unsigned_chars.
When char is signed, the compiler defines the macro __FEATURE_SIGNED_CHAR.
For --unsigned_chars, any char that is assigned a negative number causes the following warning to be generated:
Warning: #68-D: integer conversion resulted in a change of sign
The --signed_chars option is not recommended
for general use and is not required for ISO-compatible source. Code
compiled with this option is not compliant with the ABI
for the ARM Architecture (base standard) [BSABI], and
incorrect use might result in a failure at runtime. This option
is not supported by the C++ libraries.
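Code can check which char setting is in effect. This sketch uses __FEATURE_SIGNED_CHAR, as described above, and falls back to the standard CHAR_MIN from <limits.h> on other compilers. The helper names are hypothetical. Where negative values are expected, declaring signed char explicitly gives the same behavior under either option.

```c
#include <limits.h>

/* Returns 1 if plain char is signed, 0 if it is unsigned. */
int char_is_signed(void)
{
#ifdef __FEATURE_SIGNED_CHAR
    return 1;               /* defined by armcc when char is signed */
#else
    return CHAR_MIN < 0;    /* standard check for any compiler */
#endif
}

/* Explicit signed char behaves the same under either option. */
int byte_to_int(signed char c)
{
    return c;
}
```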