9.1.2. Indirect result location

To reiterate, the X8 (XR) register is used to pass the address of the indirect result location. Here is some code:

  // test.c

  struct struct_A
  {
      int i0;
      int i1;
      double d0;
      double d1;
  } AA;

  struct struct_A foo(int i0, int i1, double d0, double d1)
  {
      struct struct_A A1;

      A1.i0 = i0;
      A1.i1 = i1;
      A1.d0 = d0;
      A1.d1 = d1;

      return A1;
  }

  void bar()
  {
      AA = foo(0, 1, 1.0, 2.0);
  }

This can be compiled and disassembled using:

  armclang -target aarch64-arm-none-eabi -c test.c
  fromelf -c test.o

Note

This code is compiled without optimization to demonstrate the mechanisms and principles involved. With optimization enabled, the compiler might simplify or remove much of this code.

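  foo:
      SUB SP, SP, #0x30           // allocate 48 bytes for locals
      STR W0, [SP, #0x2C]         // spill argument i0
      STR W1, [SP, #0x28]         // spill argument i1
      STR D0, [SP, #0x20]         // spill argument d0
      STR D1, [SP, #0x18]         // spill argument d1
      LDR W0, [SP, #0x2C]
      STR W0, [SP, #0]            // A1.i0 = i0
      LDR W0, [SP, #0x28]
      STR W0, [SP, #4]            // A1.i1 = i1
      LDR D0, [SP, #0x20]
      STR D0, [SP, #8]            // A1.d0 = d0
      LDR D0, [SP, #0x18]
      STR D0, [SP, #0x10]         // A1.d1 = d1
      LDR X9, [SP, #0x0]          // copy A1 (24 bytes) to the result
      STR X9, [X8, #0]            // location pointed to by X8 (XR),
      LDR X9, [SP, #8]            // 8 bytes at a time
      STR X9, [X8, #8]
      LDR X9, [SP, #0x10]
      STR X9, [X8, #0x10]
      ADD SP, SP, #0x30           // free the locals
      RET
  bar:
      STP X29, X30, [SP, #-0x10]! // push FP and LR
      MOV X29, SP                 // set up the frame pointer
      SUB SP, SP, #0x20           // reserve stack space for the result
      ADD X8, SP, #8              // X8 (XR) = address of the result buffer
      MOV W0, WZR                 // i0 = 0
      ORR W1, WZR, #1             // i1 = 1
      FMOV D0, #1.00000000        // d0 = 1.0
      FMOV D1, #2.00000000        // d1 = 2.0
      BL foo
      ADRP X8, {PC}, 0x78         // X8 = address of AA (ADRP/ADD pair,
      ADD X8, X8, #0              // completed by relocations at link time)
      LDR X9, [SP, #8]            // copy the returned structure
      STR X9, [X8, #0]            // from the stack into AA
      LDR X9, [SP, #0x10]
      STR X9, [X8, #8]
      LDR X9, [SP, #0x18]
      STR X9, [X8, #0x10]
      MOV SP, X29                 // discard the local area
      LDP X29, X30, [SP], #0x10   // restore FP and LR
      RET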
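The offsets in this listing follow from the layout of struct struct_A. The sketch below, which assumes a 4-byte int and an 8-byte, 8-byte-aligned double (as under AAPCS64), checks at compile time that i0, i1, d0 and d1 sit at offsets 0, 4, 8 and 0x10, and that the whole structure is 24 bytes. These match the three 8-byte copies that foo() makes through X8 at offsets #0, #8 and #0x10.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Same definition as in test.c. */
struct struct_A
{
    int i0;
    int i1;
    double d0;
    double d1;
};

/* Assumes 4-byte int and 8-byte, 8-byte-aligned double (AAPCS64 sizes). */
static_assert(offsetof(struct struct_A, i0) == 0x0, "i0 at offset 0");
static_assert(offsetof(struct struct_A, i1) == 0x4, "i1 at offset 4");
static_assert(offsetof(struct struct_A, d0) == 0x8, "d0 at offset 8");
static_assert(offsetof(struct struct_A, d1) == 0x10, "d1 at offset 0x10");

/* 24 bytes: too large to be returned in registers, so XR is used. */
static_assert(sizeof(struct struct_A) == 24, "struct_A is 24 bytes");
```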

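In this example, the structure is 24 bytes long (two 4-byte ints and two 8-byte doubles), which is more than 16 bytes, and it is not a homogeneous floating-point aggregate because it mixes int and double members. According to the AAPCS for AArch64, such a structure is not returned in registers. Instead, the caller reserves memory for the result and passes its address in XR, and the callee writes the returned object to the memory pointed to by XR.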
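As an illustration only, the effect of the indirect result location can be written out by hand in C: the compiler behaves roughly as if foo() took a hidden pointer argument carried in X8. The functions foo_lowered() and bar_lowered() below are hypothetical names for this hand-lowered form; they are not what the compiler actually emits, just a model of the data flow.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* memcpy */

struct struct_A
{
    int i0;
    int i1;
    double d0;
    double d1;
};

static_assert(sizeof(struct struct_A) > 16, "returned via the indirect result location");

struct struct_A AA;

/* Hypothetical lowered foo(): xr plays the role of X8 (XR), while the
   ordinary arguments still arrive in W0, W1, D0 and D1. */
void foo_lowered(struct struct_A *xr, int i0, int i1, double d0, double d1)
{
    struct struct_A A1;            /* built in foo's own frame */

    A1.i0 = i0;
    A1.i1 = i1;
    A1.d0 = d0;
    A1.d1 = d1;

    memcpy(xr, &A1, sizeof A1);    /* copy out through the result pointer */
}

/* Hypothetical lowered bar(): reserve a temporary on the stack, pass its
   address as the result location, then copy the temporary into AA. */
void bar_lowered(void)
{
    struct struct_A tmp;           /* the space bar() reserves with SUB SP */

    foo_lowered(&tmp, 0, 1, 1.0, 2.0);
    AA = tmp;
}
```

After bar_lowered() runs, AA holds {0, 1, 1.0, 2.0}, just as after the original bar().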

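The generated code shows that bar() reserves space for the result on its own stack and sets X8 to point to it (ADD X8, SP, #8) before calling foo(). foo() builds the structure A1 in its own stack frame and then copies it, 8 bytes at a time, to the location pointed to by X8. After foo() returns, bar() copies the structure from its stack into the global variable AA.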

The advantage of using a dedicated register, X8 (XR), for the result address is that it does not use up any of the argument registers: X0 to X7 all remain available for passing the function's parameters.
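A function that takes eight integer arguments and returns a large structure shows why this matters. Under AAPCS64, a0 to a7 in the sketch below are passed in X0 to X7, and the result address travels separately in X8. If the result address instead occupied one of X0 to X7, a7 would no longer fit in a register and would have to be passed on the stack. The type triple and the function sum8() are hypothetical, chosen only so that the result is larger than 16 bytes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* int64_t */

/* Hypothetical 24-byte result type, like struct struct_A. */
struct triple
{
    int64_t a;
    int64_t b;
    int64_t c;
};

static_assert(sizeof(struct triple) > 16, "returned via the indirect result location");

/* Under AAPCS64: a0..a7 arrive in X0..X7, and the caller passes the
   address of the 24-byte result buffer in X8, so no argument spills. */
struct triple sum8(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
                   int64_t a4, int64_t a5, int64_t a6, int64_t a7)
{
    struct triple t;

    t.a = a0 + a1 + a2;
    t.b = a3 + a4 + a5;
    t.c = a6 + a7;
    return t;    /* written to the memory pointed to by X8 */
}
```

Compiling this for AArch64 and inspecting the output with fromelf -c is one way to confirm the register assignment on a particular toolchain.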

Figure 9.2 shows an AAPCS64 stack frame. The frame pointer (X29) should point to the previous frame pointer saved on the stack, with the saved LR (X30) stored immediately after it. The final frame pointer in the chain should be set to 0. The Stack Pointer must always be aligned on a 16-byte boundary. The exact layout of a stack frame can vary, particularly for variadic or frameless functions. Consult the AAPCS64 document for details.
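Because each saved frame pointer points to the previous one, and the chain ends with a frame pointer of 0, the frames form a linked list that a debugger or profiler can follow. The sketch below models this with a hypothetical frame_record type (saved FP followed by saved LR) and a hypothetical walk_frames() function. It walks records built in ordinary memory rather than the real stack, and assumes every function in the chain maintains a frame record.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t, NULL */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical frame record: saved FP (FP') followed by saved LR (LR'). */
struct frame_record
{
    struct frame_record *fp;   /* caller's frame record, NULL at the end */
    uintptr_t lr;              /* return address into the caller */
};

static_assert(offsetof(struct frame_record, lr) == sizeof(void *),
              "LR' is stored immediately after FP'");

/* Follow the chain from fp, copying up to max return addresses into out[].
   Stops at the terminating zero frame pointer. Returns the count written. */
size_t walk_frames(const struct frame_record *fp, uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        out[n++] = fp->lr;
        fp = fp->fp;
    }
    return n;
}
```

For a three-deep chain inner -> middle -> outer, walk_frames(&inner, buf, 4) returns 3 and fills buf with the three saved LR values, innermost first.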

Figure 9.2. Stack frame

Note

The AAPCS only specifies the layout of the FP/LR block (the frame record) and how these blocks are chained together. Everything else in Figure 9.2 (including the precise location of the boundary between frames of the two functions) is unspecified, and can be freely chosen by the compiler.

Figure 9.2 illustrates a frame that uses two callee-saved registers (X19 and X20) and one temporary variable, with the following layout (the number on the left is the offset from the FP, in bytes):

  40:  <padding>
  32:  temp
  24:  X20
  16:  X19
   8:  LR'
   0:  FP'

The padding is necessary to maintain the 16-byte alignment of the Stack Pointer: the five 8-byte slots occupy 40 bytes, which is rounded up to 48, the next multiple of 16.
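The layout and the padding arithmetic can be checked with a small C model. In the sketch below, example_frame is a hypothetical struct whose members mirror the slots in the table (lowest offset first), and align_up() is a hypothetical helper that rounds a size up to a power-of-two boundary.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint64_t */

/* Hypothetical model of the example frame; offsets are from the FP (X29). */
struct example_frame
{
    uint64_t fp_saved;   /*  0: FP' */
    uint64_t lr_saved;   /*  8: LR' */
    uint64_t x19;        /* 16: callee-saved X19 */
    uint64_t x20;        /* 24: callee-saved X20 */
    uint64_t temp;       /* 32: temporary variable */
    uint64_t padding;    /* 40: keeps the frame a multiple of 16 bytes */
};

static_assert(offsetof(struct example_frame, temp) == 32, "temp at FP+32");
static_assert(sizeof(struct example_frame) == 48, "48-byte frame");

/* Round size up to the next multiple of align (align must be a power of 2). */
size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

Here align_up(40, 16) gives 48, which is the immediate used by the STP and LDP instructions in the prologue and epilogue.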

  function:
          STP X29, X30, [SP, #-48]! // Push down the stack pointer and store FP and LR
          MOV X29, SP               // Set the frame pointer to the bottom of the new
                                    // frame
          STP X19, X20, [X29, #16]  // Save X19 and X20
          // ...
          // Main body of code (temp is at [X29, #32])
          // ...
          LDP X19, X20, [X29, #16]  // Restore X19 and X20
          LDP X29, X30, [SP], #48   // Restore FP' and LR' before setting the stack
                                    // pointer to its original position
          RET                       // Return to caller
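To make the pre-indexed store (the ! form, which updates SP before storing) and the post-indexed load (which updates SP after loading) concrete, the following toy C model replays this prologue and epilogue on a small simulated stack. Everything in it (toy_cpu, store64(), load64(), prologue(), epilogue()) is hypothetical; each STP or LDP pair is modeled as two 64-bit accesses, and the model asserts the 16-byte SP alignment rule on entry.

```c
#include <assert.h>   /* assert */
#include <stdint.h>   /* uint8_t, uint64_t */
#include <string.h>   /* memcpy, memset */

/* Hypothetical machine state: a few registers and a downward-growing stack. */
struct toy_cpu
{
    uint64_t x19, x20, x29, x30;  /* X29 = FP, X30 = LR */
    uint64_t sp;                  /* byte offset into mem[] */
    uint8_t mem[256];
};

static void store64(struct toy_cpu *c, uint64_t addr, uint64_t v)
{
    memcpy(&c->mem[addr], &v, sizeof v);
}

static uint64_t load64(const struct toy_cpu *c, uint64_t addr)
{
    uint64_t v;
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

/* STP X29, X30, [SP, #-48]! ; MOV X29, SP ; STP X19, X20, [X29, #16] */
void prologue(struct toy_cpu *c)
{
    assert(c->sp % 16 == 0 && c->sp >= 48);  /* SP must be 16-byte aligned */
    c->sp -= 48;                      /* pre-index: update SP first */
    store64(c, c->sp, c->x29);        /* FP' at [SP] */
    store64(c, c->sp + 8, c->x30);    /* LR' at [SP, #8] */
    c->x29 = c->sp;                   /* new frame pointer */
    store64(c, c->x29 + 16, c->x19);
    store64(c, c->x29 + 24, c->x20);
}

/* LDP X19, X20, [X29, #16] ; LDP X29, X30, [SP], #48 */
void epilogue(struct toy_cpu *c)
{
    c->x19 = load64(c, c->x29 + 16);
    c->x20 = load64(c, c->x29 + 24);
    c->x29 = load64(c, c->sp);        /* post-index: load first, */
    c->x30 = load64(c, c->sp + 8);
    c->sp += 48;                      /* then update SP */
}
```

If the body clobbers X19, X20 and LR (for example through a BL), the epilogue restores all of them and returns SP and FP to their values on entry, provided the body leaves SP equal to X29.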
Copyright © 2015 ARM. All rights reserved. ARM DEN0024A
Non-Confidential ID050815