5.1.3. Registers

The A64 register bank provides 31 general-purpose 64-bit registers (X0-X30), which helps reduce register pressure in most applications.

The A64 Procedure Call Standard (PCS) passes up to eight parameters in registers (X0-X7). In contrast, A32 and T32 pass only four arguments in registers, with any excess being passed on the stack.
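
The difference in register parameter counts can be sketched in C. This is a simplified model, not the PCS allocation algorithm: it assumes every argument is a plain integer or pointer that fits in one register, and the names arg_register and stack_arg_count are hypothetical.

```c
#include <assert.h>

/* Number of integer argument registers, as stated in the text above. */
enum { A64_ARG_REGS = 8,   /* X0-X7 */
       A32_ARG_REGS = 4 }; /* r0-r3 (also used by T32) */

/* Hypothetical helper: register number that holds the argument at the
 * given 0-based index, or -1 if that argument is passed on the stack.
 * Assumes simple integer/pointer arguments only. */
int arg_register(int index, int reg_count)
{
    assert(index >= 0 && reg_count >= 0);
    return index < reg_count ? index : -1;
}

/* Hypothetical helper: how many of `total` arguments spill to the stack. */
int stack_arg_count(int total, int reg_count)
{
    assert(total >= 0 && reg_count >= 0);
    return total > reg_count ? total - reg_count : 0;
}
```

For a six-argument function, stack_arg_count(6, A64_ARG_REGS) is 0, while stack_arg_count(6, A32_ARG_REGS) is 2.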

The PCS also defines a dedicated Frame Pointer (FP), which makes debugging and call-graph profiling easier by making it possible to reliably unwind the stack. Refer to Chapter 9, The ABI for ARM 64-bit Architecture, for further information.
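
To illustrate why a dedicated frame pointer makes unwinding reliable, the following C sketch walks a chain of frame records. It assumes the layout in which each record holds the caller's saved frame pointer followed by the saved link register, and in which the chain ends with a null frame pointer. The struct and function names are hypothetical, and the records are ordinary C objects rather than a live stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame record layout: X29 (FP) points at one of these. */
struct frame_record {
    const struct frame_record *prev_fp; /* saved X29 of the caller */
    uint64_t saved_lr;                  /* saved X30: return address */
};

/* Hypothetical unwinder: follows the FP chain and collects up to `max`
 * return addresses, innermost frame first. Returns the number collected. */
size_t unwind(const struct frame_record *fp, uint64_t *out, size_t max)
{
    size_t n = 0;
    assert(out != NULL || max == 0);
    while (fp != NULL && n < max) {
        out[n++] = fp->saved_lr;
        fp = fp->prev_fp;   /* step to the caller's frame record */
    }
    return n;
}
```

Because every frame links to its caller in the same way, a debugger or profiler does not need per-function knowledge of the stack layout to recover the call chain.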

A consequence of adopting 64-bit wide integer registers is the varying widths of variables used by programming languages. A number of standard models are currently in use, which differ mainly in the size defined for integers, longs, and pointers:

Table 5.1. Variable width

Type         ILP32   LP64   LLP64
long long    64      64     64

64-bit Linux implementations use LP64 and this is supported by the A64 Procedure Call Standard. Other PCS variants are defined that can be used by other operating systems.
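
A program can check which model it was built for by inspecting type sizes. The sketch below classifies the three common models by the sizes of int, long, and pointers. The enum and function names are hypothetical, and the size combinations are the conventional definitions of ILP32, LP64, and LLP64 rather than values taken from the table above.

```c
#include <assert.h>
#include <stddef.h>

enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_OTHER };

/* Hypothetical classifier: sizes are given in bytes. */
enum data_model classify_data_model(size_t int_bytes, size_t long_bytes,
                                    size_t ptr_bytes)
{
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 4)
        return MODEL_ILP32;   /* int, long, pointer are all 32-bit */
    if (int_bytes == 4 && long_bytes == 8 && ptr_bytes == 8)
        return MODEL_LP64;    /* long and pointer are 64-bit */
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 8)
        return MODEL_LLP64;   /* only long long and pointer are 64-bit */
    return MODEL_OTHER;
}

/* Classifies the model of the compiler that builds this file. */
enum data_model this_build_model(void)
{
    /* long long is 64 bits in all three models (assumes 8-bit bytes). */
    assert(sizeof(long long) == 8);
    return classify_data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

On a 64-bit Linux build, this_build_model() would be expected to return MODEL_LP64.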

Zero register

The zero register (WZR/XZR) enables several encoding tricks. For example, there is no plain multiply encoding, only multiply-add: the instruction MUL W0, W1, W2 is an alias of MADD W0, W1, W2, WZR, which uses the zero register. Not all instructions can use XZR/WZR. As mentioned in Chapter 4, the zero register shares its encoding with the stack pointer. As a result, for certain operands of a small number of instructions, that encoding selects WSP/SP rather than WZR/XZR.
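
The shared encoding can be seen by assembling instructions by hand. The C sketch below builds the 32-bit MADD encoding and the 64-bit ADD (immediate) encoding. The function names are hypothetical, and the opcode constants and field positions are stated from the A64 encoding as the editor understands it, not from this guide, so check them against the architecture reference before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Register number 31: WZR/XZR in some operand positions, WSP/SP in others. */
enum { REG31 = 31 };

/* MADD Wd, Wn, Wm, Wa (32-bit): Wd = Wa + Wn * Wm.
 * Assumed fields: Rm[20:16], Ra[14:10], Rn[9:5], Rd[4:0]. */
uint32_t encode_madd_w(uint32_t rd, uint32_t rn, uint32_t rm, uint32_t ra)
{
    assert(rd < 32 && rn < 32 && rm < 32 && ra < 32);
    return UINT32_C(0x1B000000) | (rm << 16) | (ra << 10) | (rn << 5) | rd;
}

/* MUL Wd, Wn, Wm is MADD with the addend set to WZR (register 31). */
uint32_t encode_mul_w(uint32_t rd, uint32_t rn, uint32_t rm)
{
    return encode_madd_w(rd, rn, rm, REG31);
}

/* ADD Xd|SP, Xn|SP, #imm12 (64-bit, unshifted immediate).
 * In this instruction, register number 31 selects SP, not XZR. */
uint32_t encode_add_x_imm(uint32_t rd, uint32_t rn, uint32_t imm12)
{
    assert(rd < 32 && rn < 32 && imm12 < 4096);
    return UINT32_C(0x91000000) | (imm12 << 10) | (rn << 5) | rd;
}
```

Here encode_mul_w(0, 1, 2) and encode_madd_w(0, 1, 2, REG31) produce the same word, while encode_add_x_imm(REG31, REG31, 256) uses the same register number to mean SP.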

Example 5.1. Using the Zero register to write a zero to memory

In A32:

  mov  r0, #0
  str  r0, [...]

In A64 using the zero register:

  str  wzr, [...]

This needs no spare register. Alternatively, write 16 bytes of zeros using:

  stp xzr, xzr, [...] etc

A convenient side-effect of the zero register is that there are many NOP instructions with large immediate fields. For example, ADR XZR, #<imm> alone gives you 21 bits of data in an instruction with no other side effects. This is very useful for JIT compilers, where code can be patched at runtime.
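
As a sketch of how a JIT might use this, the following C functions pack a 21-bit payload into the immediate of ADR XZR, #<imm> and read it back. The function names are hypothetical, and the field layout (two low immediate bits in bits [30:29], nineteen high bits in bits [23:5], destination register in bits [4:0]) is the editor's reading of the ADR encoding, so treat it as an assumption to verify.

```c
#include <assert.h>
#include <stdint.h>

/* Build ADR XZR, #imm carrying a 21-bit payload in the immediate. */
uint32_t adr_xzr_pack(uint32_t payload)
{
    assert(payload < (UINT32_C(1) << 21));
    uint32_t immlo = payload & 0x3u;   /* payload[1:0]  -> insn[30:29] */
    uint32_t immhi = payload >> 2;     /* payload[20:2] -> insn[23:5]  */
    return UINT32_C(0x10000000) | (immlo << 29) | (immhi << 5) | 31u;
}

/* Recover the 21-bit payload from an instruction built by adr_xzr_pack. */
uint32_t adr_xzr_unpack(uint32_t insn)
{
    uint32_t immlo = (insn >> 29) & 0x3u;
    uint32_t immhi = (insn >> 5) & UINT32_C(0x7FFFF);
    return (immhi << 2) | immlo;
}
```

Because the destination is the zero register, executing the patched word has no architectural effect, so the payload can be rewritten at runtime without changing program behavior.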

Stack pointer

The Stack Pointer (SP) cannot be referenced by most instructions. Some forms of arithmetic instructions can read or write the current stack pointer. This might be done to adjust the stack pointer in a function prologue or epilogue. For example:

  ADD SP, SP, #256        // SP = SP + 256

Program counter

The current Program Counter (PC) cannot be referred to by number as if part of the general register file and therefore cannot be used as the source or destination of arithmetic instructions, or as the base, index or transfer register of load and store instructions.

The only instructions that read the PC are those whose function it is to compute a PC-relative address (ADR, ADRP, literal load, and direct branches), and the branch-and-link instructions that store a return address in the link register (BL and BLR). The only way to modify the program counter is using branch, exception generation and exception return instructions.

Where the PC is read by an instruction to compute a PC-relative address, then its value is the address of that instruction. Unlike A32 and T32, there is no implied offset of 4 or 8 bytes.
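
The difference can be modeled in a few lines of C. The sketch below returns the value an instruction observes when it reads the PC in each instruction set, using offsets of 8 bytes for A32 and 4 bytes for T32, and computes an A64 ADR target from it. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum isa { ISA_A64, ISA_A32, ISA_T32 };

/* Value seen when the instruction at insn_addr reads the PC. */
uint64_t pc_read_value(enum isa isa, uint64_t insn_addr)
{
    switch (isa) {
    case ISA_A32: return insn_addr + 8;  /* implied offset of 8 bytes */
    case ISA_T32: return insn_addr + 4;  /* implied offset of 4 bytes */
    case ISA_A64: break;                 /* no implied offset */
    }
    return insn_addr;
}

/* A64 ADR: target = address of the ADR itself + signed 21-bit offset. */
uint64_t a64_adr_target(uint64_t insn_addr, int32_t offset)
{
    assert(offset >= -(1 << 20) && offset < (1 << 20));
    return pc_read_value(ISA_A64, insn_addr) + (uint64_t)(int64_t)offset;
}
```

For an instruction at address 0x1000, pc_read_value gives 0x1000 in A64, 0x1008 in A32, and 0x1004 in T32.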

FP and NEON registers

The most significant update to the NEON registers is that NEON now has 32 registers, each 16 bytes (128 bits) wide, instead of the 16 registers it had before. The simpler mapping scheme between the different register sizes in the floating-point and NEON register bank makes these registers much easier to use. The mapping is also easier for compilers and optimizers to model and analyze.
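
The two mapping schemes can be contrasted with a small C model. It assumes the A32 scheme in which each 128-bit Q register overlays a pair of 64-bit D registers, and the A64 scheme in which a D register is simply the low 64 bits of the vector register with the same number. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Where a 64-bit D register lives inside the 128-bit register bank. */
struct d_location {
    unsigned wide_reg;  /* number of the containing 128-bit register */
    unsigned half;      /* 0 = low 64 bits, 1 = high 64 bits */
};

/* A32 (assumed): Qn overlays D(2n) and D(2n+1), so two D registers
 * share each Q register and the D number must be halved. */
struct d_location a32_locate_d(unsigned d)
{
    assert(d < 32);
    struct d_location loc = { d / 2, d % 2 };
    return loc;
}

/* A64 (assumed): Dn is always the low 64 bits of Vn, one-to-one. */
struct d_location a64_locate_d(unsigned d)
{
    assert(d < 32);
    struct d_location loc = { d, 0 };
    return loc;
}
```

In this model, D3 is the high half of Q1 in A32 but the low half of V3 in A64, which is why writes to narrow registers are simpler to track in A64.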

Register indexed addressing

Compared with A32, the A64 instruction set provides additional addressing modes that allow a 64-bit index register to be added to the 64-bit base register, with optional scaling of the index by the access size. It also provides sign-extension or zero-extension of a 32-bit value held in an index register, again with optional scaling.
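
The address arithmetic for these modes can be written out in C. The sketch below computes the effective address for a 64-bit index, a zero-extended 32-bit index, and a sign-extended 32-bit index, each with an optional left shift. The enum and function names are hypothetical; the extension names follow the assembler mnemonics LSL, UXTW, and SXTW.

```c
#include <assert.h>
#include <stdint.h>

enum index_extend {
    EXT_LSL,   /* full 64-bit index register */
    EXT_UXTW,  /* zero-extend the low 32 bits of the index */
    EXT_SXTW   /* sign-extend the low 32 bits of the index */
};

/* Effective address = base + (extend(index) << shift).
 * shift is 0 or log2 of the access size in bytes. */
uint64_t indexed_address(uint64_t base, uint64_t index,
                         enum index_extend ext, unsigned shift)
{
    uint64_t low = index & UINT64_C(0xFFFFFFFF);
    uint64_t offset = index;
    assert(shift <= 4);
    if (ext == EXT_UXTW)
        offset = low;
    else if (ext == EXT_SXTW)  /* portable 32-to-64-bit sign extension */
        offset = (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
    return base + (offset << shift);
}
```

For example, indexed_address(0x1000, 5, EXT_LSL, 3) models LDR X0, [X1, X2, LSL #3] with X1 = 0x1000 and X2 = 5, which addresses element 5 of an array of 8-byte values.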

Copyright © 2015 ARM. All rights reserved. ARM DEN0024A