armlink has global visibility of all your program code and so can perform some additional branch optimizations.
armlink uses branch inlining to optimize small function calls in your image. A small function is any one-instruction function that can be inlined into the 4 bytes of a BL or BLX instruction. Once the function is inlined, there is no branch and, therefore, the return address is redundant.
This branch optimization is off by default because enabling it changes the image such that debug information might be incorrect. If enabled, the linker makes no attempt to correct the debug information.
Use the command-line options to control branch inlining:
--inline Enables branch inlining (see Controlling inlining).
--no_branchnop Prevents a branch being replaced with a NOP (see Controlling inlining).
--tailreorder Moves tail calling sections immediately before their target, if possible, to optimize function calls (see Handling tail calling sections).
If you enable branch inlining, armlink scans each function call in the image and then inlines where applicable. When armlink inlines a function, it removes the reference to the called function from the caller. armlink applies this optimization before any unused sections are eliminated so that any section that is always inlined can then be removed.
Use the --info command-line option to display
information about branch inlining:
--info inline Displays a message each time a function is inlined and gives the total number of inlines, for example:
Small function inlining results
Inlined function __Heap_DescSize from object h1_alloc.o at offset 0x5c in section .text from object malloc.o.
Inlined function __ieee_status from object istatus.o at offset 0x40 in section .text from object _printf_fp_dec.o.
.
Inlined total of 6 calls.
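If you want to post-process this report, for example to list which callers were affected, the messages can be parsed with a short script. The following is a minimal sketch in Python. The regular expression is modeled only on the sample messages above, so the exact wording is an assumption that might not hold for every armlink version, and the names parse_inline_report and SAMPLE are hypothetical.

```python
import re

# Pattern modeled on the sample "--info inline" messages shown above.
# Assumption: the message wording is stable; adjust if your output differs.
INLINE_RE = re.compile(
    r"Inlined function (?P<callee>\S+) from object (?P<callee_obj>\S+) "
    r"at offset (?P<offset>0x[0-9a-fA-F]+) in section (?P<section>\S+) "
    r"from object (?P<caller_obj>\S+?)\.?$"
)


def parse_inline_report(text):
    """Return one dict per 'Inlined function' message found in text."""
    records = []
    for line in text.splitlines():
        match = INLINE_RE.search(line.strip())
        if match:
            record = match.groupdict()
            # Convert the hexadecimal offset string to an integer.
            record["offset"] = int(record["offset"], 16)
            records.append(record)
    return records


# Sample input copied from the report above (hypothetical variable name).
SAMPLE = """Small function inlining results
Inlined function __Heap_DescSize from object h1_alloc.o at offset 0x5c in section .text from object malloc.o.
Inlined function __ieee_status from object istatus.o at offset 0x40 in section .text from object _printf_fp_dec.o.
"""

records = parse_inline_report(SAMPLE)
for record in records:
    print(record["callee"], "inlined into", record["caller_obj"])
```

Lines that do not match the pattern, such as the heading and the final total, are skipped.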
If you have enabled branch inlining, there are certain conditions that a function must meet in order to be inlined:
armlink handles only the simplest cases and does not inline any instruction that reads or writes to the PC because this depends on the location of the function.
The action of the linker also depends on the size of the symbol representing a function and on the caller (ARM or Thumb) and the callee (ARM or Thumb) as shown in Table 3.2.
When branch optimization is enabled, the linker
replaces any branch with a relocation that resolves to the next
instruction with a NOP.
This is the default behavior. However, there are cases where you might want to disable the option, for example, when performing verification or pipeline flushes.
Use the --no_branchnop option to disable
this behavior.
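The rule above can be modeled as a simple address comparison. The following Python sketch illustrates the decision only and is not armlink's implementation. The function name resolve_branch and its parameters are hypothetical, and the branchnop flag stands in for the effect of --no_branchnop.

```python
def resolve_branch(branch_addr, branch_size, target_addr, branchnop=True):
    """Model the NOP replacement rule for a single branch.

    branch_addr: address of the branch instruction
    branch_size: size of the branch instruction in bytes
    target_addr: address the relocation resolves to
    branchnop:   False models linking with --no_branchnop
    """
    next_instruction = branch_addr + branch_size
    # A branch to the instruction that immediately follows it does nothing
    # useful, so it can be replaced with a NOP unless that is disabled.
    if branchnop and target_addr == next_instruction:
        return "NOP"
    return "BRANCH"


print(resolve_branch(0x8000, 4, 0x8004))                   # NOP
print(resolve_branch(0x8000, 4, 0x8004, branchnop=False))  # BRANCH
print(resolve_branch(0x8000, 4, 0x8010))                   # BRANCH
```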
In order to be inlined, the last instruction of a function must be either:
MOV pc, rn
or
BX rn
By default, any function that consists of just a return sequence
is inlined as a NOP.
A conditional ARM instruction can only be inlined
if either the condition on the BL matches the condition
on the instruction being inlined, or the BL or instruction to
be inlined is unconditional. For example, BLEQ can
only inline an unconditional instruction like ADD or
an instruction with a matching condition like ADDEQ.
An unconditional ARM BL can inline any conditional
or unconditional instruction that satisfies all the other criteria.
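The condition rule for ARM code can be expressed as a small predicate. The following Python sketch covers only the condition check described above, not the other inlining criteria. The function name conditions_compatible is hypothetical, and condition codes are represented as plain strings, with "AL" meaning unconditional.

```python
AL = "AL"  # "always": the instruction is unconditional


def conditions_compatible(bl_cond, insn_cond):
    """Return True if the condition rule allows inlining.

    bl_cond:   condition on the BL instruction, for example "EQ" or "AL"
    insn_cond: condition on the instruction to be inlined
    """
    # Inlining is allowed when either side is unconditional,
    # or when both sides carry the same condition.
    return bl_cond == AL or insn_cond == AL or bl_cond == insn_cond


print(conditions_compatible("EQ", AL))    # BLEQ inlining ADD: True
print(conditions_compatible("EQ", "EQ"))  # BLEQ inlining ADDEQ: True
print(conditions_compatible("EQ", "NE"))  # BLEQ inlining ADDNE: False
print(conditions_compatible(AL, "NE"))    # BL inlining ADDNE: True
```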
A BL that is the last instruction of
an IT block cannot inline a 16-bit Thumb instruction or a 32-bit MRS, MSR,
or CPS instruction. This is because the IT block changes
the behavior of the instructions within its scope so inlining the
instruction would change the behavior of the program.
If your image contains both ARM and Thumb code, functions that are called from the other state must be built for interworking. An ARM caller might inline a Thumb callee if an equivalent ARM instruction is available. However, a Thumb caller cannot inline an ARM callee. Also, armlink can inline up to two 16-bit Thumb instructions. However, an ARM caller can only inline a single 16-bit Thumb instruction.
As described in Controlling inlining,
the linker replaces any branch with a relocation that resolves to
the next instruction with a NOP. This means that tail
calling sections, that is, sections that finish with a branch instruction,
might be optimized so that their target appears immediately after
them in the execution region.
You can take advantage of this behavior by using the command-line
option --tailreorder to move tail calling sections
above their target. If this is possible, be aware that:
armlink can only move one tail calling section for each tail call target. If there are multiple tail calls to a single section, the tail calling section with an identical section name is moved before the target. If no tail calling section has a matching name, the linker moves the first section it encounters.
armlink cannot move a tail calling section out of its execution region.
armlink does not move tail calling sections before inline veneers.
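The selection rule in the first restriction can be sketched as follows. This Python model rests on two assumptions: that "identical section name" means a name identical to that of the target section, and that candidates are considered in the order the linker encounters them. It does not model the execution region or inline veneer restrictions. The function name pick_tail_caller and the tuple layout are hypothetical.

```python
def pick_tail_caller(target_name, callers):
    """Choose the one tail calling section to place before a target.

    target_name: section name of the tail call target, for example ".text"
    callers:     list of (section_name, object_name) tuples, in the order
                 they are encountered
    """
    if not callers:
        return None
    # Prefer a tail calling section whose name matches the target's name.
    for caller in callers:
        if caller[0] == target_name:
            return caller
    # Otherwise fall back to the first section encountered.
    return callers[0]


candidates = [("!!!main", "__main.o"), (".text", "rt_raise.o")]
print(pick_tail_caller(".text", candidates))
print(pick_tail_caller(".text", [("a", "x.o"), ("b", "y.o")]))
```

Only one candidate is returned for each target, which reflects the restriction that armlink moves a single tail calling section per tail call target.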
Use the --info command-line option to display
information about tail call optimization. For example, --info
tailreorder gives details of any moved tail calling sections:
Tailcall reorder results
Tail calling Section !!!main from object __main.o placed before .text from kernel.o
Tail calling Section .text from object rt_raise.o placed before .text from sys_exit.o
Tail calling Section .text from object plibspace.o placed before .text from libspace.o
Tail calling Section .text from object aeabi_idiv0.o placed before .text from rt_div0.o
......