3.3. Static linking and relocations

An object producer, normally a compiler or assembler, sees only the source code it is processing and has very limited information about the image layout. In Example 3.1 of the section Symbol definitions and references, the object producer (a C/C++ compiler) knows that it must call the bar function using some form of branch instruction, but it does not know the address that bar will have in the final executable image. Therefore, the compiler cannot generate the offset or address for the branch instruction.

To solve this problem, the object producer generates a relocation that instructs the linker to fill in the required offset or address.

If you are already familiar with relocations and how they work, you can skip the following section. Otherwise, or if you want to refresh your existing knowledge, go to the examples/example_3-1 directory to see how this works with the ARM Compiler toolchain. This directory contains the C source code for Example 3.3 and a build script which, when run, displays the code and relocation information for the ELF object.

Section 4.6.1.2, "Relocation types", of the ELF for the ARM Architecture ABI document can be used to check the type of relocation generated for each example. The following output is generated by the fromelf utility of the ARM Compiler toolchain. The fromelf utility is an ELF reader and image converter.

Example 3.3. Sample output from fromelf -c -r foo.o

** Section #1 '.text' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
    Size :    4 bytes (alignment 4)
    Address: 0x00000000

    $a
    .text
    foo
        0x00000000:    EAFFFFFE    ....    B            bar

** Section #6 '.rel.text' (SHT_REL)
    Size : 8 bytes (alignment 4)
    Symbol table #5 '.symtab'
    1 relocations applied to section #1 '.text'
    #    Offset         Relocation Type    Wrt    Symbol    Defined in

    0    0x00000000    29 R_ARM_JUMP24    7        bar        Ref

This output shows a single jump relocation of type 29, R_ARM_JUMP24. The offset of the relocation is 0x0 bytes into the .text section (section #1), and the relocation is for the reference to the symbol bar, which is symbol number 7 in the symbol table of the object.
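The 8-byte size of the .rel.text section corresponds to a single relocation record. As a rough sketch, assuming the generic 32-bit ELF layout in which each SHT_REL entry is an Elf32_Rel record (a 32-bit r_offset followed by a 32-bit r_info, with the symbol index in the upper 24 bits of r_info and the relocation type in the low 8 bits) and a little-endian object, the following Python fragment decodes one entry. The raw bytes are reconstructed from the values that fromelf prints, not read from foo.o, and the function name is hypothetical.

```python
import struct

def decode_rel_entry(raw):
    """Decode one little-endian Elf32_Rel record into (offset, type, symbol index)."""
    r_offset, r_info = struct.unpack("<II", raw)
    rel_type = r_info & 0xFF   # equivalent of ELF32_R_TYPE
    sym_index = r_info >> 8    # equivalent of ELF32_R_SYM
    return r_offset, rel_type, sym_index

# Bytes rebuilt from the listing: offset 0x0, symbol 7, type 29 (R_ARM_JUMP24).
entry = struct.pack("<II", 0x00000000, (7 << 8) | 29)
print(decode_rel_entry(entry))  # (0, 29, 7)
```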

The operation for the R_ARM_JUMP24 relocation is: ((S + A) | T) - P.

S is the address of the symbol, which equals 0x8000 if you check the address information of the final image.

A is the addend for the relocation. Because this is a REL type relocation, the addend is not stored in the relocation entry but is encoded in the instruction being relocated:

sign_extend (insn[23:0] << 2) = sign_extend (0xFFFFFE << 2)
                              = sign_extend (0x3FFFFF8)
                              = 0xFFFFFFF8
                              = -0x8[1]
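The addend extraction can be sketched in a few lines of Python. This is an illustration only, and the helper names sign_extend and jump24_addend are hypothetical. The 24-bit field becomes a 26-bit quantity after the shift, so the sign extension is applied at bit 25.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def jump24_addend(insn):
    """Extract the REL addend held in bits 23-0 of an ARM B/BL instruction word."""
    imm24 = insn & 0x00FFFFFF
    return sign_extend(imm24 << 2, 26)

print(hex(jump24_addend(0xEAFFFFFE)))  # -0x8
```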

The ELF for the ARM Architecture document states that T is 1 if the target symbol has type STT_FUNC and addresses a Thumb instruction; otherwise T is 0. In the example_3-1 directory a definition for the symbol bar exists in a separate unit, and the build script targets that unit for ARM code. Therefore T is 0.

P is the address of the place being relocated, which equals 0x8018.

((S + A) | T) - P = ((0x8000 + -0x8) | 0) - 0x8018
                  = 0xFFFFFFE0
                  = -0x20

Therefore, an offset of -0x20 is required for the branch instruction in Example 3.3.
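The whole calculation, including writing the result back into the instruction, can be sketched as follows. This is a simplified illustration rather than a description of how any particular linker is implemented: the function name is hypothetical, and the sketch does not check that the offset fits in the branch range or handle a Thumb target.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def apply_jump24(insn, S, P, T=0):
    """Return (offset, patched instruction) for an R_ARM_JUMP24 REL relocation."""
    A = sign_extend((insn & 0x00FFFFFF) << 2, 26)  # addend taken from the instruction
    offset = ((S + A) | T) - P                     # relocation operation
    imm24 = (offset >> 2) & 0x00FFFFFF             # re-encode as a word offset
    return offset, (insn & 0xFF000000) | imm24

offset, patched = apply_jump24(0xEAFFFFFE, S=0x8000, P=0x8018)
print(hex(offset), hex(patched))  # -0x20 0xeafffff8
```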

The fromelf disassembly listing for the final executable image shows that the opcode used for the branch instruction is 0xEAFFFFF8. This opcode has been inserted by the static linker to ensure that the instruction causes the processor program counter to branch to bar. But why has the opcode changed from 0xEAFFFFFE (as seen in the object file) to 0xEAFFFFF8, as shown in the following example?

** Section #1 'ER_RO' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
    Size : 28 bytes (alignment 4)
    Address: 0x00008000
    $a
    .text
    bar
        0x00008000: E59f000C .... LDR r0,[pc,#12] ; [0x8014] = 0x801C
        0x00008004: E5901000 .... LDR r1,[r0,#0]
        0x00008008: E2411001 ..A. SUB r1,r1,#1
        0x0000800C: E5801000 .... STR r1,[r0,#0]
        0x00008010: E12FFF1E ../. BX lr
    $d
        0x00008014: 0000801C .... DCD 32796
    $a
    .text
    foo
        0x00008018: EAFFFFF8 .... B bar ; 0x8000

Under normal circumstances it is not necessary to understand the exact encoding of a specific instruction, but sometimes this information is of interest, for example, when debugging or trying to understand how instructions and relocations work. Also, a linker is responsible for modifying or inserting instructions in an executable image to ensure that it runs correctly, so it is still important to have a good understanding of how an instruction is formed. To explain what opcode 0xEAFFFFF8 means, the full encoding of a branch instruction is shown in Figure 3.2. This is taken directly from the ARM Architecture Reference Manual.

Bits 31-28 of the opcode hold the condition under which the instruction is executed, denoted by 0xE, which is 1110 in binary. This is the AL condition that means always execute or execute unconditionally.[2]

Bits 27-25 are always set to 101 for a branch (B) or branch with link (BL) instruction. Bit 24 is not set, so the instruction is a standard branch (B), and the processor does not store a return address in the link register, R14. No return address is needed here because foo does nothing after the branch, so bar can return directly to the caller of foo. Together, bits 27-24 are therefore 1010, which is the hexadecimal digit 0xA in the opcode.

The other bits, bits 23-0, are used to specify the target address of the ARM branch instruction. The following steps are required to calculate the target address:

  1. Sign extend the 24-bit signed immediate to 30 bits, so the value 0xFFFFF8 becomes 0x3FFFFFF8.

  2. Shift the resulting value left by two bits to form a 32-bit value of 0xFFFFFFE0 (the offset -0x20 calculated above, in two's complement).

  3. Add the value to the contents of the PC, which contains the address of the instruction plus 8 bytes, because of the processor instruction pipeline. Adding 0x8 (8 bytes) to 0x00008018 (the address of the branch instruction) gives: 0x00008020.

    0x00008020 + -0x00000020 = 0x00008000.
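The field breakdown and the three steps above can be checked with a short Python sketch. The function name and the returned dictionary keys are hypothetical, and the sketch only handles the B/BL encoding described here.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def decode_arm_branch(insn, address):
    """Split an ARM B/BL instruction word into its fields and compute the target."""
    offset = sign_extend(insn & 0x00FFFFFF, 24) << 2  # steps 1 and 2
    pc = address + 8                                  # PC reads as instruction + 8
    return {
        "cond": (insn >> 28) & 0xF,                    # bits 31-28
        "is_branch": ((insn >> 25) & 0x7) == 0b101,    # bits 27-25
        "link": bool((insn >> 24) & 1),                # bit 24: set for BL
        "offset": offset,
        "target": (pc + offset) & 0xFFFFFFFF,          # step 3
    }

fields = decode_arm_branch(0xEAFFFFF8, 0x00008018)
print(hex(fields["cond"]), fields["link"], hex(fields["offset"]), hex(fields["target"]))
# 0xe False -0x20 0x8000
```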
    

Example 3.4. Sample output from fromelf -r bar.o

** Section #7 '.rel.text' (SHT_REL)
        Size : 8 bytes (alignment 4)
        Symbol table #6 '.symtab'
        1 relocations applied to section #1 '.text'

        # Offset         Relocation Type    Wrt   Symbol   Defined in

        0 0x0000000C     2 R_ARM_ABS32      4     .data     #4 '.data'

This output shows a single 32-bit absolute relocation of type 2, R_ARM_ABS32. The offset of the relocation is 0xC bytes into the .text section, and the relocation is for the reference to the symbol var. The relocation is expressed with respect to the section symbol .data, which is symbol number 4 in the symbol table of the object, because var is defined in the read/write data section (#4) called .data.

The operation for the R_ARM_ABS32 relocation is: (S + A) | T.

S is 0x8010, if you check the address information of the symbol var in the final image.

A (the addend) is 0, because the word at offset 0xC of the .text section is 0x0 in the ELF object.

T is also 0 because the target symbol is a data object, not a Thumb function.

The result is:

(S + A) | T = ((0x8010 + 0x0) | 0) = 0x8010

Therefore, the value 0x8010 (the address of var) is filled into the literal pool.
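As an illustration, the same operation can be applied to a byte buffer. The sketch below assumes a little-endian object and uses a zero-filled bytearray as a stand-in for the .text section of bar.o. Only the word at offset 0xC matters here, and the function name is hypothetical.

```python
import struct

def apply_abs32(section, offset, S, T=0):
    """Apply an R_ARM_ABS32 REL relocation in place and return the value written."""
    (A,) = struct.unpack_from("<I", section, offset)  # REL: addend is stored in place
    value = ((S + A) | T) & 0xFFFFFFFF
    struct.pack_into("<I", section, offset, value)
    return value

text = bytearray(16)  # stand-in section contents; the word at 0xC is 0x0
print(hex(apply_abs32(text, 0xC, S=0x8010)))  # 0x8010
print(text[0xC:0x10].hex())                   # 10800000
```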

Figure 3.2. Extract from the ARM Architecture Reference Manual (ARM ARM)

[1] The value 0xFFFFFFF8 is negative, but its magnitude is difficult to read directly. One way to find the negative number that this value represents is to invert all of the bits and add 1 to the result (two's complement), giving 0x00000007 + 1, which equals 0x8. Therefore 0xFFFFFFF8 represents -0x8. For more information, see Figure 3.2 for a description of the branch and branch with link instruction, and also Table 4-11, ARM relocation actions by instruction type, in the ELF for the ARM Architecture document.

[2] It is not necessary to specify the AL condition when writing ARM assembly language, which is why it is not displayed in the fromelf disassembly listing.

Copyright © 2010 ARM. All rights reserved. ARM DAI 0242A
Non-Confidential ID011411