2.3.12. Controlling code generation

Use the options described in this section to control aspects of the code generated by the compiler such as optimization. See Pragmas for information on other code generation options that are controlled using pragmas.

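This section describes:

  • Targeting the instruction set

  • Setting byte order

  • Defining optimization criteria

  • Controlling symbols

  • Setting pointer alignment options

  • Setting alignment options

  • Controlling implementation details.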

Targeting the instruction set

These options control the target instruction set:

--arm

Configures the compiler to target the ARM instruction set. This is the default.

--thumb

Configures the compiler to target the Thumb instruction set. This predefines __thumb and __thumb__.

Also, see the descriptions of #pragma arm and #pragma thumb in Pragmas controlling code generation. These pragmas enable you to compile specific functions for ARM or Thumb.

If you are compiling code that is intended for mixed ARM/Thumb systems for processors that support ARMv4T or ARMv5, then you must specify the interworking option --apcs /interwork. This is enabled by default for processors that support ARMv5 or above. See Interworking qualifiers for more details. Interworking is described in detail in RealView Compilation Tools v3.0 Developer Guide.

If you enter armcc --thumb on the command line, the compiler compiles as much of the code as possible using the Thumb instruction set. However, the compiler might generate ARM code for some parts of the compilation. In particular, if you are compiling code for a pre-Thumb-2 processor and using VFP (for example, armcc --thumb --fpu vfp), any function containing floating-point operations is compiled for ARM.

See details on the argument --fpu name in Specifying the target processor or architecture.
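As a minimal sketch of how source code can observe the selected instruction set, the following function tests the __thumb__ macro that --thumb predefines. The function name compiled_for_thumb is hypothetical and is used only for illustration.

```c
/* Illustrative only: compiled_for_thumb is a hypothetical helper name.
   __thumb__ is the macro that --thumb is described as predefining. */
int compiled_for_thumb(void)
{
#if defined(__thumb__)
    return 1;   /* this translation unit is being compiled for Thumb */
#else
    return 0;   /* __thumb__ is not defined, for example with --arm */
#endif
}
```

A check of this kind is resolved at compile time, so it adds no runtime cost beyond returning a constant.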

Setting byte order

These options control endianness:

--littleend

Generates code for an ARM processor using little-endian memory. With little-endian memory, the least significant byte of a word has the lowest address. This is the default.

--bigend

Generates code for an ARM processor using big-endian memory. With big-endian memory, the most significant byte of a word has the lowest address.

Choose between Byte Invariant Addressing mode and Word Invariant Addressing mode at link time with the armlink command-line options --be8 and --be32 (see RealView Compilation Tools v3.0 Linker and Utilities Guide for details).
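The byte-order definitions above can be checked from C. The following sketch, with hypothetical helper names, copies a word into a byte array and inspects the byte at the lowest address. It only reports the byte order of the target it runs on and does not change it.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of a 32-bit word. */
unsigned lowest_address_byte(uint32_t word)
{
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);   /* copy the object representation */
    return bytes[0];
}

/* Returns 1 if the least significant byte has the lowest address
   (little-endian memory), 0 otherwise. */
int is_little_endian(void)
{
    return lowest_address_byte(0x11223344u) == 0x44u;
}
```

For the word 0x11223344, lowest_address_byte() returns 0x44 with little-endian memory and 0x11 with big-endian memory.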

Defining optimization criteria

The optimization options can be grouped into multi-optimization options and single-optimization options.

Multi-optimization options

This section describes how to control multiple optimizations with a single option.

You can also apply the -Onum, -Ospace, and -Otime optimizations on individual functions using pragmas. See Pragmas controlling multiple optimizations for more information.

Note

The part of an optimization option that follows the -O prefix, such as space or time, must be specified in lowercase. However, the -O prefix itself must be uppercase.

The multi-optimization options are:

-Onum

Specifies the level of optimization to be used:

-O0

Minimum optimization. Turns off most optimizations. It gives the best possible debug view and the lowest level of optimization.

-O1

Restricted optimization. Removes unused inline functions and unused static functions. Turns off optimizations that seriously degrade the debug view. If used with --debug (see Debug table generation options), this option gives a satisfactory debug view with good code density.

-O2

High optimization. If used with --debug (see Debug table generation options), the debug view might be less satisfactory because the mapping of object code to source code is not always clear.

This is the default optimization level.

-O3

Maximum optimization. -O3 performs the same optimizations as -O2. However, the balance between space and time optimizations in the generated code is more heavily weighted towards space or time than it is with -O2. That is:

  • -O3 -Otime aims to produce faster code than -O2 -Otime, at the risk of increasing your image size

  • -O3 -Ospace aims to produce smaller code than -O2 -Ospace, but performance might be degraded.

In addition, -O3 performs extra optimizations that are more aggressive, such as:

  • High-level scalar optimizations, including loop unrolling. This can give significant performance benefits at a small code size cost, but at the risk of a slower build time.

  • More aggressive inlining and automatic inlining for -O3 -Otime.

  • Multifile compilation by default (see Multifile compilation).

Note

For floating-point code, -O3 is not necessarily ISO C and C++ standard-compliant. Use -O3 --fpmode=std to ensure ISO compliance. See the description of --fpmode for more information.

Note

Do not rely on the implementation details of these optimizations, because they might change in future releases.

-Ospace

Instructs the compiler to perform optimizations to reduce image size at the expense of a possible increase in execution time. For example, large structure copies are done by out-of-line function calls instead of inline code. Use this option if code size is more critical than performance. This is the default.

-Otime

Instructs the compiler to perform optimizations to reduce execution time at the possible expense of a larger image. Use this option if execution time is more critical than code size. For example, it compiles:

while (expression) body;

as:

if (expression) {
    do body;
    while (expression);
}

If you specify neither -Otime nor -Ospace, the compiler uses -Ospace. You can compile time-critical parts of your code with -Otime, and the rest with -Ospace.

If you specify both -Otime and -Ospace in the same compiler invocation, the last one wins (see Ordering command-line options).
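The loop transformation described for -Otime can be written out by hand to show that both forms compute the same result. The following is a sketch with hypothetical function names. It illustrates the source-level equivalence only and makes no claim about the instructions the compiler actually emits.

```c
/* Original form: the condition is tested at the top of every iteration. */
int sum_below_while(int n)
{
    int i = 0, total = 0;
    while (i < n) {
        total += i;
        i++;
    }
    return total;
}

/* Rotated form: one guarding test, then a loop that tests at the bottom. */
int sum_below_rotated(int n)
{
    int i = 0, total = 0;
    if (i < n) {
        do {
            total += i;
            i++;
        } while (i < n);
    }
    return total;
}
```

The rotated form duplicates the test, which costs some code size, but it typically needs only one branch per iteration. This is the kind of space-for-time trade that -Otime permits.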

--feedback filename

Specifies the feedback file created by a previous execution of the ARM linker. The file contains a list of functions that the linker identifies as being unused in your code. The contents of this file are optimization hints only. These hints might be ignored by the compiler. Therefore, this is a safe optimization.

See Linker feedback for more details.

Note

It is recommended that you use linker feedback in preference to the --split_sections option (formerly -zo) for removing unused functions. This is because linker feedback produces smaller code, by avoiding the overhead of splitting all sections.

--fpmode model

Specifies the floating-point conformance, and sets library attributes and floating-point optimizations. model can be one of:

ieee_full

All facilities, operations, and representations guaranteed by the IEEE standard are available in single and double-precision. Modes of operation can be selected dynamically at runtime.

This defines the symbols:


__FP_IEEE
__FP_FENV_EXCEPTIONS
__FP_FENV_ROUNDING
__FP_INEXACT_EXCEPTION

ieee_fixed

IEEE standard with round-to-nearest and no inexact exceptions.

This defines the symbols:


__FP_IEEE
__FP_FENV_EXCEPTIONS

ieee_no_fenv

IEEE standard with round-to-nearest and no exceptions. This mode is stateless and is compatible with the Java floating-point arithmetic model.

This defines the symbol __FP_IEEE.

std

IEEE finite values with denormals flushed to zero, round-to-nearest, and no exceptions. This is compatible with standard C and C++ and is the default option.

Normal finite values are as predicted by the IEEE standard. However:

  • NaNs and infinities might not be produced in all circumstances defined by the IEEE model. Also, when they are produced, they might not have the same sign.

  • The sign of zero might not be that predicted by the IEEE model.

fast

Perform more aggressive floating-point optimizations that might cause a small loss of accuracy to provide a significant performance increase. This option defines the symbol __FP_FAST.

This option results in behavior that is not fully compliant with the ISO C or C++ standard. However, numerically robust floating-point programs are expected to behave correctly.

A number of transformations might be performed, including:

  • Double-precision math functions might be converted to single precision equivalents if all floating-point arguments can be exactly represented as single precision values, and the result is immediately converted to a single-precision value.

    This transformation is only performed when the selected library contains the single-precision equivalent functions, for example, when the selected library is rvct or aeabi_glibc (for more information, see the description of --library_interface in Single-optimization options).

    For example:

    float f(float a) { return sqrt(a); }
    

    is transformed to

    float f(float a) { return sqrtf(a); }
    
  • Double-precision floating-point expressions that are narrowed to single-precision are evaluated in single-precision when it is beneficial to do so. For example, float y = (float)(x + 1.0) is evaluated as float y = (float)x + 1.0f.

  • Division by a floating-point constant is replaced by multiplication with the inverse. For example, x / 3.0 is evaluated as x * (1.0 / 3.0).

  • It is not guaranteed that the value of errno is compliant with the ISO C or C++ standard after math functions have been called. This enables the compiler to inline the VFP square root instructions in place of calls to sqrt() or sqrtf().
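To give a feel for the "small loss of accuracy" mentioned for --fpmode fast, the following sketch writes out the division-by-constant transformation by hand. The function names and the tolerance of 1e-15 are assumptions chosen for illustration. The two forms can differ in the last bits of the result because 1.0 / 3.0 is itself rounded before the multiplication.

```c
/* Exact form: a true division. */
double third_by_division(double x)
{
    return x / 3.0;
}

/* Transformed form: multiplication by the inverse of the constant. */
double third_by_inverse(double x)
{
    return x * (1.0 / 3.0);
}

/* Hypothetical helper: returns 1 if a and b agree to within a relative
   tolerance of 1e-15, that is, a few units in the last place of a double. */
int nearly_equal(double a, double b)
{
    double diff = a - b;
    double mag = (a < 0.0) ? -a : a;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= 1e-15 * mag;
}
```

Code that compares floating-point results with a tolerance, as nearly_equal() does, is unaffected by this transformation. Code that relies on bit-exact results is not.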

--multifile

Enables the compiler to perform optimization across all specified files, instead of on each individual file. The specified files are compiled into one single object file. Using --multifile requires large amounts of memory while compiling. Although there is no limit to the number of files you can specify on the command line, a practical limit is 10 source files.

--multifile is on by default for optimization level -O3.

For more details on multifile compilation, see Multifile compilation.

--vfe --no_vfe

Enables or disables unused virtual function elimination (VFE) in C++ mode. --vfe is the default, except for the case where legacy object files compiled with a pre-RVCT v2.1 compiler do not contain VFE information.

When VFE is enabled, the compiler places the information in special sections with the prefix .arm_vfe_. These sections are harmless to a linker that is not VFE-aware, because they are not referenced by the rest of the code. Therefore, they do not increase the size of the executable. However, they increase the size of the object files. If this is a problem, then specify --no_vfe.

For more details on VFE, and the associated linker options, see RealView Compilation Tools v3.0 Linker and Utilities Guide. Also, see Calling a pure virtual function for more information on pure virtual functions.

Single-optimization options

This section describes how to have individual control of the compiler optimizations:

--autoinline --no_autoinline

Enables or disables automatic inlining. --no_autoinline is the default for optimization levels -O0 and -O1, and --autoinline is the default for optimization levels -O2 and -O3 (see Multi-optimization options).

The compiler automatically inlines functions where it is sensible to do so. The -Ospace and -Otime options influence how the compiler automatically inlines functions. Selecting -Otime increases the likelihood that functions are inlined.

--data_reorder --no_data_reorder

Enables or disables automatic reordering of top-level data items (globals, for example). The compiler can save memory by eliminating wasted space between data items. However, --data_reorder can break legacy code, if the code makes invalid assumptions about ordering of data by the compiler.

The ISO C Standard does not guarantee data order, so you must avoid writing code that depends on any assumed ordering. If you require data ordering, place the data items into a structure.
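The advice to place order-dependent data in a structure can be sketched as follows. The type and variable names are hypothetical. Members of a structure are laid out in declaration order, so their relative order does not depend on how the compiler arranges top-level data.

```c
#include <stddef.h>

/* Hypothetical example type: tag, count, and flags keep this order. */
struct ordered_items {
    char  tag;
    int   count;
    short flags;
};

/* One top-level object instead of three separate globals. */
struct ordered_items items;

/* Returns 1 if the member offsets increase in declaration order. */
int members_keep_order(void)
{
    return offsetof(struct ordered_items, tag) < offsetof(struct ordered_items, count)
        && offsetof(struct ordered_items, count) < offsetof(struct ordered_items, flags);
}
```

Padding between members can still occur, so only the order is guaranteed, not the exact offsets.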

--forceinline

If you specify this option, the compiler attempts to inline every function marked as __inline, regardless of the characteristics of the function. However, it does not inline a function if doing so causes problems. For example, a recursive function is inlined only once.

If you want to force specific functions to be inlined, use the __forceinline function storage class modifier (see Function storage class qualifiers).
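A minimal sketch of forcing one specific function inline is shown below. The macro name ALWAYS_INLINE and the function names are hypothetical. The sketch assumes that the compiler predefines __ARMCC_VERSION, and it falls back to the C99 static inline on other compilers so that the same source still builds elsewhere.

```c
/* ALWAYS_INLINE is a hypothetical portability macro, not a compiler feature.
   __forceinline is the keyword named in the text above. */
#if defined(__ARMCC_VERSION)
#define ALWAYS_INLINE __forceinline static
#else
#define ALWAYS_INLINE static inline
#endif

/* Small helper that is a natural candidate for inlining. */
ALWAYS_INLINE int clamp_u8(int v)
{
    if (v < 0) {
        return 0;
    }
    if (v > 255) {
        return 255;
    }
    return v;
}

/* Adds two values and saturates the result to the range 0 to 255. */
int saturating_add_u8(int a, int b)
{
    return clamp_u8(a + b);
}
```

Marking individual functions in this way is more selective than --forceinline, which applies to every function marked as __inline in the compilation.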

--no_inline

Disables inlining of functions (see --inline). Calls to inline functions are not expanded inline. You can use this option to help debug inline functions.

If a function is declared inline, then it is compiled out-of-line into a common code section. Functions marked as __forceinline are still expanded inline (see Function storage class qualifiers).

--inline

Enables the compiler to inline functions. This is the default.

The compiler inlines functions as follows:

  • Automatically, for optimization levels -O2 and -O3 (see Multi-optimization options), unless you use the option --no_autoinline.

  • When the function is qualified as an inline function, that is, with the __inline keyword in C, the __forceinline keyword in C and C++, or the inline keyword in C++. This applies for all optimization levels. Functions that are explicitly qualified as inline functions are more likely to be inlined. However, using the inline qualifier does not guarantee that functions are inlined. See Function keywords. Also, see the description of --forceinline.

The compiler changes the criteria for inlining functions depending on whether you select -Ospace or -Otime. Select -Otime to increase the likelihood that a function is inlined. See Multi-optimization options for more details.

Sometimes, an out-of-line copy of an inlined function might remain in an object or image, even though that code is no longer used. Linker feedback enables you to detect and remove any unused code fragments. See Linker feedback.

Note

When you set a breakpoint on an inline function, an ARM debugger attempts to set a breakpoint on each inlined instance of that function. If you are using Multi-ICE®, RealView ICE, or other hardware to debug an image in ROM, and the number of inline instances is greater than the number of available hardware breakpoints, the debugger cannot set the additional breakpoints and reports an error.

--lower_ropi --no_lower_ropi

Enables or disables less restrictive C in ROPI mode. See Position independence qualifiers for details of the /ropi option.

Note

If you compile with --lower_ropi, then the static initialization is done at runtime by the C++ constructor mechanism, even for C. This enables these static initializations to work with ROPI code.

--lower_rwpi --no_lower_rwpi

Enables or disables less restrictive C and C++ in RWPI mode. --lower_rwpi is the default. See Position independence qualifiers for details of the /rwpi option.

Note

If you compile with --lower_rwpi, then the static initialization is done at runtime by the C++ constructor mechanism, even for C. This enables these static initializations to work with RWPI code.

--split_ldm

By default, the maximum number of registers that the compiler transfers in a single LDM or STM instruction is:

  • 16, for ARM instructions

  • 15, for 32-bit Thumb-2 instructions

  • eight, for 16-bit Thumb and Thumb-2 instructions.

The --split_ldm option instructs the compiler to split LDM and STM instructions into two or more LDM or STM instructions, where required, to reduce the maximum number of registers transferred to:

  • five, for all STMs, and for LDMs that do not load the PC

  • four, for LDMs that load the PC.

Inline assembler LDM and STM instructions are split by default. However, the compiler might subsequently recombine the separate instructions into an LDM or STM (see Instruction expansion for more details).

The --split_ldm option has the following effects:

  • It can reduce interrupt latency on ARM systems that:

    • do not have a cache or a write buffer (for example, a cacheless ARM7TDMI)

    • use zero-wait-state, 32-bit memory.

    Note

    Using --split_ldm increases code size and decreases performance slightly.

  • It does not split VFP FLDM or FSTM instructions.

There are some systems that do not benefit from being built with --split_ldm:

  • It has no significant benefit for cached systems, or for processors with a write buffer.

  • It has no benefit for systems with nonzero-wait-state memory, or for systems with slow peripheral devices. Interrupt latency in such systems is determined by the number of cycles required for the slowest memory or peripheral access. Typically, this is much greater than the latency introduced by multiple register transfers.

--library_interface=lib

Specifies that the compiler output works with the RVCT libraries or with any AEABI-compliant library. lib can be one of:

rvct

Specifies that the compiler output works with the RVCT runtime libraries. Use this option to exploit the full range of compiler optimizations when linking. This is the default.

aeabi_clib

Specifies that the compiler output works with any AEABI-compliant C library.

aeabi_glibc

Specifies that the compiler output works with an AEABI-compliant version of the GNU C library.

Use this option when linking with any ABI-compliant, third-party, libraries or where your code includes replacement functions, for example, where using an embedded operating system. In this case, use this option to disable the compiler variants, for example, if you are re-implementing any functions such as printf or scanf. This option ensures that the compiler does not generate calls to any optimized functions. See ABI for the ARM Architecture compliance for more details.

--split_sections

Generates one ELF section for each function in the source file. Output sections are named with the same name as the function that generates the section, but with an i. prefix. For example:


int f(int x) { return x+1; }

compiled with --split_sections gives:

        AREA ||i.f||, CODE, READONLY
f PROC
        ADD      r0,r0,#1
        MOV      pc,lr

This option increases code size slightly (typically by a few percent) for some functions because it reduces the potential for sharing addresses, data, and string literals between functions.

Note

If you want to remove unused functions, it is recommended that you use the linker feedback optimization in preference to this option. This is because linker feedback produces smaller code, by avoiding the overhead of splitting all sections. See Linker feedback for more details.

Controlling symbols

This section describes how to control symbol visibility:

--export_defs_implicitly

Enables you to control how dynamic symbols are exported. Use this option to export definitions where the prototype was marked __declspec(dllimport).

See Storage class modifiers for details on __declspec(dllimport).

--dllexport_all

Enables you to control symbol visibility when building DLLs. Use this option to mark all extern definitions as __declspec(dllexport).

See Storage class modifiers for details on __declspec(dllexport).

--no_hide_all

Enables you to control symbol visibility when building SVr4 shared objects. Use this option to mark all extern definitions as __declspec(dllexport), and to import all undefined references.

See Storage class modifiers for details on __declspec(dllexport).

Setting pointer alignment options

This option enables you to control pointer alignment:

--pointer_alignment=num

Specifies the unaligned pointer support required, where num is one of the following:

1

Treats accesses through pointers as having an alignment of one, that is, byte-aligned or unaligned.

2

Treats accesses through pointers as having an alignment of at most two, that is, at most halfword aligned.

4

Treats accesses through pointers as having an alignment of at most four, that is, at most word aligned.

8

Accesses through pointers have normal alignment, that is, at most doubleword aligned.

De-aligning pointers might increase the code size, even on CPUs with unaligned access support. This is because only a subset of the load and store instructions benefit from unaligned access support. The compiler is unable to use multiple-word transfers or coprocessor-memory transfers, including hardware floating-point loads and stores, directly on unaligned memory objects.

Note

  • Code size might increase significantly when compiling for CPUs without hardware support for unaligned access.

  • Unaligned pointer mode does not affect the placement of objects in memory, nor the layout and padding of structures.

This option assists the porting of source code that has been written for architectures without alignment requirements. You can achieve finer control of access to unaligned data, with less impact on the quality of generated code, using the __packed qualifier. For more details on the __packed qualifier, see Type qualifiers.
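To illustrate why de-aligned pointers can increase code size, the following hand-written sketch reads a 32-bit value that might start at any address by combining four byte loads. It is not the compiler's actual code sequence. The helper name load_u32_le is hypothetical, and the data is assumed to be stored in little-endian order.

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value one byte at a time, so the access is
   valid whatever the alignment of p. */
uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A single aligned word load is replaced here by four loads, three shifts, and three ORs, which is the kind of expansion that affects code size on CPUs without hardware support for unaligned access.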

Setting alignment options

These options enable you to control memory alignment:

--unaligned_access --no_unaligned_access

If you specify a processor that supports ARMv6 (for example, --cpu ARM1136J-S) or the ARMv6 architecture (that is, --cpu 6), the compiler assumes that the U bit is set and uses unaligned access support to speed up accesses to packed structures. It does this by allowing an LDR instruction to load from, or an STR instruction to store to, a non-word-aligned address. This means that the compiler might generate unaligned word and halfword accesses, and might select a library that supports unaligned accesses. Structures remain unpacked unless you explicitly qualify them with __packed (see Type qualifiers).

Therefore, code compiled for ARMv6 can run correctly only if you enable unaligned support. To do this, you must set the U bit (bit 22) of CP15 register 1 in your initialization code. This can also be achieved in hardware, by tying the UBITINIT input to the core HIGH.

Use --no_unaligned_access to disable the generation of unaligned accesses on ARMv6 processors.

Note

The --no_unaligned_access option replaces the (now deprecated) --memaccess -UL41. The --memaccess option is deprecated and will be removed in a future release.

--min_array_alignment=option

Specifies the minimum alignment of arrays, where option is one of the following:

1

Byte alignment, or unaligned.

2

Two-byte (halfword) alignment.

4

Four-byte (word) alignment.

8

Eight-byte (doubleword) alignment.

For example, compiling the following code with --min_array_alignment=8, gives the alignment described in the comments:

char arr_c1[1];      // alignment == 8
char c1;             // alignment == 1
char arr_c2[3];      // alignment == 8
char arr_c3[10];     // alignment == 8

struct st {
    int i1;
} c;                 // alignment == 4

char c2;             // alignment == 1

Also, see Storage class modifiers for a description of the __align(n) storage class modifier.

Controlling implementation details

These options enable you to specify implementation details:

--enum_is_int

Forces the size of all enumeration types to be at least 4 bytes. This option is off by default, in which case the compiler uses the smallest data type that can hold the values of all enumerators.

Note

The --enum_is_int option is not recommended for general use and is not required for ISO-compatible source. Code compiled with this option is not compliant with the ABI for the ARM Architecture (base standard) [BSABI], and incorrect use might result in a failure at runtime. This option is not supported by the C++ libraries.
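Because the size of an enumeration type can depend on this option, data layouts that must stay fixed are safer if they do not embed enumeration types directly. The following sketch, with hypothetical type and function names, stores the value in a fixed-width member and converts it on access.

```c
#include <stddef.h>

/* Hypothetical example enumeration whose values all fit in one byte. */
enum color { RED, GREEN, BLUE };

/* Reports the size the compiler chose for enum color in this build. */
size_t color_size(void)
{
    return sizeof(enum color);
}

/* Layout that does not depend on the size of enum color. */
struct pixel_record {
    unsigned char color;   /* holds an enum color value */
    unsigned char level;
};

/* Converts the stored byte back to the enumeration type. */
enum color record_color(const struct pixel_record *r)
{
    return (enum color)r->color;
}
```

With this layout, sizeof(struct pixel_record) is the same whether or not --enum_is_int is used, whereas the value returned by color_size() might differ.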

--dollar --no_dollar

Accepts or rejects dollar signs, $, in identifiers. The default is --dollar, except in --strict mode.

--alternative_tokens --no_alternative_tokens

Enables or disables the recognition of alternative tokens. This controls recognition of the digraphs in C and C++, and controls recognition of the operator keywords, such as and and bitand, in C++. For more details on digraphs, see The Design and Evolution of C++, or any other book describing the C++ programming language. The default behavior is --alternative_tokens.
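As a short illustration of the digraphs that this option controls, the following sketch writes the same function twice, once with ordinary punctuators and once with the digraphs <: :> <% %>, which stand for [ ] { }. The function names are hypothetical. The second function is expected to compile only when alternative tokens are recognized.

```c
/* Ordinary punctuators. */
int second_element_plain(void)
{
    int values[3] = { 10, 20, 30 };
    return values[1];
}

/* The same function written with digraphs. */
int second_element_digraphs(void)
<%
    int values<:3:> = <% 10, 20, 30 %>;
    return values<:1:>;
%>
```

Both functions return the same value, because digraphs are alternative spellings only and do not change the meaning of the program.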

--multibyte_chars --no_multibyte_chars

Enables or disables processing for multibyte character sequences in comments, string literals, and character constants. Multibyte encodings are used for character sets such as the Japanese Shift-Japanese Industrial Standard (Shift-JIS). The default behavior is --no_multibyte_chars.

--locale lang_country

Use this option in combination with --multibyte_chars to switch the default locale for source files to the one you specify in lang_country.

For example, to compile Japanese source files on an English-based Windows workstation, use:


--multibyte_chars --locale japanese

and on a UNIX workstation use:


--multibyte_chars --locale ja_JP

The locale name might be case-sensitive, depending on the host platform.

The permitted settings of locale are determined by the host platform.

Ensure that you have installed the appropriate locale support for the host platform.

--message_locale lang_country --message_locale lang_country.codepage

Use this option to switch the default language for the display of error and warning messages to the one you specify in lang_country or lang_country.codepage.

For example, to display messages in Japanese, use:


--message_locale ja_JP

The locale name might be case-sensitive, depending on the host platform.

Ensure that you have installed the appropriate locale support for the host platform.

The permitted languages are independent of the host platform. The following settings are supported in this release of RVCT:

  • en_US (the default)

  • zh_CN

  • ko_KR

  • ja_JP.

The ability to specify a codepage, and its meaning, depends on the host platform.

If you specify a setting that is not supported, the compiler silently ignores this and uses the default for your environment.

--loose_implicit_cast

Makes illegal implicit casts legal, such as implicit casts of a nonzero int to pointer, for example:


int *p = 0x8000;

Without this option, the compiler reports:


Error:  #144: a value of type “int” cannot be used to initialize an entity of type “int *”

With this option, the compiler generates the following warning message, which you can suppress (see Suppressing diagnostic messages):


Warning:  #152-D: conversion of nonzero integer to pointer
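Where the source can be changed, an explicit cast avoids the need for --loose_implicit_cast. The following is a sketch only. The macro and function names are hypothetical, the address is the one from the example above, and the pointer is never dereferenced here.

```c
#include <stdint.h>

/* Hypothetical name for the address used in the example above. */
#define EXAMPLE_ADDRESS 0x8000u

/* The explicit cast states the intent, so no implicit integer-to-pointer
   conversion is involved. */
int *example_pointer(void)
{
    return (int *)(uintptr_t)EXAMPLE_ADDRESS;
}
```

The option remains useful for legacy code that contains many such initializations and cannot easily be edited.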

--restrict --no_restrict

Enables or disables the use of the C99 restrict keyword. The default is --no_restrict.

See restrict for more details on the restrict keyword.
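A minimal sketch of the restrict keyword in use is shown below. The function name scale_into is hypothetical. Compiling it requires restrict to be recognized, for example with --restrict.

```c
#include <stddef.h>

/* restrict is a promise by the caller that, within this function, dst and
   src do not refer to overlapping storage. The compiler can then avoid
   reloading src elements after each store to dst. */
void scale_into(int * restrict dst, const int * restrict src, size_t n, int factor)
{
    size_t i;
    for (i = 0; i < n; i++) {
        dst[i] = src[i] * factor;
    }
}
```

Calling scale_into() with overlapping arrays breaks that promise and results in undefined behavior, so the qualifier belongs only on interfaces where the caller can guarantee separation.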

--signed_bitfields --unsigned_bitfields

Makes bitfields signed or unsigned. The default is --unsigned_bitfields.

Note

The AAPCS requirement for bitfields to default to unsigned on ARM has been overturned.
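Code that must behave the same under either setting can state the signedness of each bitfield explicitly. The following sketch uses hypothetical names and assumes that the option affects only bitfields declared with plain int.

```c
struct flags {
    int          plain    : 3;   /* signedness depends on the compiler setting */
    signed int   negative : 3;   /* always signed */
    unsigned int positive : 3;   /* always unsigned */
};

/* Returns 1 if a plain int bitfield behaves as signed, 0 if unsigned. */
int plain_bitfield_is_signed(void)
{
    struct flags f;
    f.plain = -1;          /* sets all three bits */
    return f.plain < 0;    /* reads back as -1 if signed, 7 if unsigned */
}

/* An explicitly signed bitfield keeps a negative value. */
int explicit_signed_is_negative(void)
{
    struct flags f;
    f.negative = -1;
    return f.negative < 0;
}

/* An explicitly unsigned 3-bit field can hold values up to 7. */
int explicit_unsigned_value(void)
{
    struct flags f;
    f.positive = 7;
    return (int)f.positive;
}
```

Only plain_bitfield_is_signed() can give different answers in different builds. The other two functions do not depend on the option.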

--signed_chars --unsigned_chars

Makes the char type signed or unsigned. The default is --unsigned_chars.

When char is signed, the macro __FEATURE_SIGNED_CHAR is defined by the compiler.

For --unsigned_chars, any char that is assigned a negative number causes the following warning to be generated:


Warning:  #68-D: integer conversion resulted in a change of sign

Note

The --signed_chars option is not recommended for general use and is not required for ISO-compatible source. Code compiled with this option is not compliant with the ABI for the ARM Architecture (base standard) [BSABI], and incorrect use might result in a failure at runtime. This option is not supported by the C++ libraries.
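Source code that has to work with either setting can test the signedness of char through <limits.h>, or avoid plain char for arithmetic altogether. The following is a sketch with hypothetical function names that assumes 8-bit bytes.

```c
#include <limits.h>

/* Returns 1 if plain char is signed in this build, 0 if it is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Interprets a byte as a two's complement signed value without relying on
   the signedness of plain char. Assumes 8-bit bytes. */
int byte_as_signed(unsigned char b)
{
    return (b >= 128) ? (int)b - 256 : (int)b;
}
```

Using unsigned char or signed char explicitly for small integers, as byte_as_signed() does, keeps the result independent of --signed_chars and --unsigned_chars.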

Copyright © 2002-2006 ARM Limited. All rights reserved.
ARM DUI 0205G
Non-Confidential