3.3.9. Branch inlining

armlink has global visibility of all your program code and so can perform some additional branch optimizations.

armlink uses branch inlining to optimize small function calls in your image. A small function is defined as any one-instruction function that can be inlined into the 4 bytes of a BL or BLX instruction. In this case, there is no branch and, therefore, the return address is redundant.

Note

This branch optimization is off by default because enabling it changes the image such that debug information might be incorrect. If enabled, the linker makes no attempt to correct the debug information.

Use the following command-line options to control branch inlining:

--inline

Enables branch inlining (see Controlling inlining).

--no_branchnop

Prevents a branch being replaced with a NOP (see Controlling inlining).

--tailreorder

Moves tail calling sections immediately before their target, if possible, to optimize function calls (see Handling tail calling sections).

If you enable branch inlining, armlink scans each function call in the image and then inlines where applicable. When armlink inlines a function, it removes the reference to the called function from the caller. armlink applies this optimization before any unused sections are eliminated so that any section that is always inlined can then be removed.
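The ordering described above (inline first, then eliminate unused sections) can be illustrated with a toy model. This is only a sketch and not armlink's actual algorithm. The function name `inline_then_eliminate`, the section names, and the representation of references as a dictionary are all hypothetical.

```python
def inline_then_eliminate(refs, inlinable, entry):
    """Toy model: refs maps a section name to the set of sections it calls."""
    # Step 1: inlining removes the caller's reference to each inlined callee.
    after_inline = {
        sec: {callee for callee in callees if callee not in inlinable}
        for sec, callees in refs.items()
    }
    # Step 2: unused-section elimination keeps only what is still reachable.
    kept, stack = set(), [entry]
    while stack:
        sec = stack.pop()
        if sec not in kept:
            kept.add(sec)
            stack.extend(after_inline.get(sec, ()))
    return kept


# Hypothetical image: two callers of one small function.
refs = {
    "main": {"tiny_getter", "worker"},
    "worker": {"tiny_getter"},
    "tiny_getter": set(),
}
print(sorted(inline_then_eliminate(refs, {"tiny_getter"}, "main")))
# prints ['main', 'worker']
```

Because every call to `tiny_getter` is inlined, nothing refers to its section any longer, so the elimination step in this model drops it.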

Use the --info command-line option to display information about branch inlining:

--info inline

Displays a message each time a function is inlined and gives the total number of inlines, for example:


Small function inlining results

Inlined function __Heap_DescSize from object h1_alloc.o at offset 0x5c in section .text from object malloc.o.
Inlined function __ieee_status from object istatus.o at offset 0x40 in section .text from object _printf_fp_dec.o.
.
Inlined total of 6 calls.
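If you want to post-process this report, a short script can extract the fields from each message. The following is a minimal sketch that assumes the message wording matches the sample above exactly; other armlink versions might word it differently. The function name `parse_inline_report` and the field names are hypothetical.

```python
import re

# Pattern assumed from the sample messages shown above.
INLINE_RE = re.compile(
    r"Inlined function (?P<function>\S+) from object (?P<function_object>\S+) "
    r"at offset (?P<offset>0x[0-9a-fA-F]+) in section (?P<section>\S+) "
    r"from object (?P<section_object>\S+?)\.?$"
)


def parse_inline_report(text):
    """Return one dict per 'Inlined function ...' line found in text."""
    records = []
    for line in text.splitlines():
        match = INLINE_RE.match(line.strip())
        if match:
            record = match.groupdict()
            record["offset"] = int(record["offset"], 16)  # hex string to int
            records.append(record)
    return records


sample = (
    "Inlined function __Heap_DescSize from object h1_alloc.o "
    "at offset 0x5c in section .text from object malloc.o.\n"
    "Inlined function __ieee_status from object istatus.o "
    "at offset 0x40 in section .text from object _printf_fp_dec.o.\n"
)
for record in parse_inline_report(sample):
    print(record["function"], hex(record["offset"]), record["section_object"])
```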

Controlling inlining

If you have enabled branch inlining, there are certain conditions that a function must meet in order to be inlined:

  • armlink handles only the simplest cases and does not inline any instruction that reads or writes to the PC because this depends on the location of the function.

  • The action of the linker also depends on the size of the symbol representing a function and on the caller (ARM or Thumb) and the callee (ARM or Thumb) as shown in Table 3.2.

    Table 3.2. Inlining small functions

    Caller    Callee    Symbol size that can be inlined
    ARM       ARM       4 to 8 bytes
    ARM       Thumb     2 to 6 bytes
    Thumb     Thumb     2 to 6 bytes
    Thumb     ARM       4 to 8 bytes
  • When branch optimization is enabled, the linker replaces with a NOP any branch whose relocation resolves to the next instruction.

    This is the default behavior. However, there are cases where you might want to disable the option, for example, when performing verification or pipeline flushes.

    Use the --no_branchnop option to disable this behavior.

  • In order to be inlined, the last instruction of a function must be either:


    MOV pc, rn

    or


    BX rn

    By default, any function that consists of just a return sequence is inlined as a NOP.

  • A conditional ARM instruction can only be inlined if either the condition on the BL matches the condition on the instruction being inlined, or the BL or instruction to be inlined is unconditional. For example, BLEQ can only inline an unconditional instruction like ADD or an instruction with a matching condition like ADDEQ.

    An unconditional ARM BL can inline any conditional or unconditional instruction that satisfies all the other criteria.

  • A BL that is the last instruction of an IT block cannot inline a 16-bit Thumb instruction or a 32-bit MRS, MSR, or CPS instruction. This is because the IT block changes the behavior of the instructions within its scope so inlining the instruction would change the behavior of the program.

  • If your image contains both ARM and Thumb code, functions that are called from the other state must be built for interworking. An ARM caller might inline a Thumb callee if an equivalent ARM instruction is available. However, a Thumb caller cannot inline an ARM callee. Also, armlink can inline up to two 16-bit Thumb instructions. However, an ARM caller can only inline a single 16-bit Thumb instruction.
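Two of the conditions above lend themselves to a small model: the symbol size ranges from Table 3.2 and the condition-code rule for ARM code. The sketch below covers only those two checks. The remaining criteria (PC use, the return instruction, IT blocks, and interworking) are not modelled, and the names `SIZE_RANGES`, `size_allows`, and `condition_allows` are hypothetical, not part of armlink.

```python
# Size ranges in bytes transcribed from Table 3.2,
# keyed by (caller state, callee state).
SIZE_RANGES = {
    ("ARM", "ARM"): (4, 8),
    ("ARM", "Thumb"): (2, 6),
    ("Thumb", "Thumb"): (2, 6),
    ("Thumb", "ARM"): (4, 8),
}


def size_allows(caller, callee, symbol_size):
    """Check the symbol size against the Table 3.2 range for this pairing."""
    low, high = SIZE_RANGES[(caller, callee)]
    return low <= symbol_size <= high


def condition_allows(bl_cond, insn_cond):
    """ARM condition rule; 'AL' stands for an unconditional instruction."""
    return bl_cond == "AL" or insn_cond == "AL" or bl_cond == insn_cond


print(condition_allows("EQ", "AL"))  # BLEQ inlining ADD: prints True
print(condition_allows("EQ", "EQ"))  # BLEQ inlining ADDEQ: prints True
print(condition_allows("EQ", "NE"))  # BLEQ inlining ADDNE: prints False
print(size_allows("ARM", "Thumb", 8))  # outside 2 to 6 bytes: prints False
```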

Handling tail calling sections

As described in Controlling inlining, the linker replaces with a NOP any branch whose relocation resolves to the next instruction. This means that tail calling sections, that is, sections that finish with a branch instruction, can benefit from this optimization when their target appears immediately after them in the execution region.

You can take advantage of this behavior by using the command-line option --tailreorder to move tail calling sections immediately before their target, where this is possible. Be aware that:

  • armlink can only move one tail calling section for each tail call target. If there are multiple tail calls to a single section, the tail calling section with an identical section name is moved before the target. If no tail calling section has a matching section name, the linker moves the first tail calling section it encounters.

  • armlink cannot move a tail calling section out of its execution region.

  • armlink does not move tail calling sections before inline veneers.
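The selection rule in the first point above can be sketched as follows. This is an illustration under one reading of the text, namely that "identical section name" means the same name as the target section. It does not model execution regions or inline veneers, and the function name `pick_tail_caller` and the tuple layout are hypothetical.

```python
def pick_tail_caller(target_section, callers):
    """Choose which tail calling section to move before the target.

    callers is a list of (section_name, object_name) tuples in the order
    the linker is assumed to encounter them.
    """
    if not callers:
        return None
    # Prefer a tail calling section whose name matches the target's name.
    for caller in callers:
        if caller[0] == target_section:
            return caller
    # Otherwise fall back to the first section encountered.
    return callers[0]


# Hypothetical candidates that all tail call a target section named .text.
candidates = [("!!!main", "__main.o"), (".text", "rt_raise.o")]
print(pick_tail_caller(".text", candidates))
# prints ('.text', 'rt_raise.o')
```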

Use the --info command-line option to display information about tail call optimization. For example, --info tailreorder gives details of any moved tail calling sections:


Tailcall reorder results
Tail calling Section !!!main from object __main.o placed before .text from kernel.o
Tail calling Section .text from object rt_raise.o placed before .text from sys_exit.o
Tail calling Section .text from object plibspace.o placed before .text from libspace.o
Tail calling Section .text from object aeabi_idiv0.o placed before .text from rt_div0.o
......

Copyright © 2002-2005 ARM Limited. All rights reserved. ARM DUI 0206F
Non-Confidential