5.1.2. Addressing

When the processor can store 64-bit values in a single register, it becomes much simpler to access large amounts of memory within a program. A single thread executing on a 32-bit core is limited to 4GB of address space, and large parts of that space are reserved for use by the OS kernel, library code, peripherals, and more. As a result, a program with a large data set might need to map parts of it in and out of memory while executing. A larger address space, with 64-bit pointers, avoids this problem. It also makes techniques such as memory-mapped files more attractive and convenient to use: the file contents are mapped into the memory map of a thread, even though the physical RAM might not be large enough to contain the whole file.
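
A minimal sketch of the memory-mapped file technique, assuming a POSIX system that provides mmap. The helper name count_byte_in_file is hypothetical. The file is mapped read-only and scanned through an ordinary pointer. In a 64-bit process the same code can, in principle, map files larger than 4GB, and the kernel brings pages in on demand rather than requiring the whole file to fit in physical RAM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts occurrences of `target` in a file by mapping
 * the whole file into the address space instead of copying it into a
 * buffer. Returns -1 on error. */
static long long count_byte_in_file(const char *path, unsigned char target)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* a zero-length mapping is invalid */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    /* Pages are faulted in on demand as the loop touches them, so the file
     * does not need to fit in physical RAM all at once. */
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;

    long long count = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == target)
            count++;

    munmap((void *)p, len);
    return count;
}
```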

Other improvements to addressing include the following:

Exclusive accesses

Exclusive load-store of a byte, halfword, word and doubleword. Exclusive access to a pair of doublewords permits atomic updates of a pair of pointers, for example circular list inserts. All exclusive accesses must be naturally aligned, and exclusive pair access must be aligned to twice the data size, that is, 128 bits for a pair of 64-bit values.
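
The alignment rule for exclusive accesses can be modelled in a few lines of C. This is a software model of the rule as stated above, not an interface to the hardware, and the function name is hypothetical. On AArch64 the single-register forms are instructions such as LDXR and STXR, and the pair forms are LDXP and STXP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the exclusive-access alignment rule:
 * single accesses must be naturally aligned, and pair accesses must be
 * aligned to twice the size of one element. */
static bool exclusive_access_aligned(uintptr_t addr, unsigned size, bool pair)
{
    /* Single exclusive accesses: byte, halfword, word or doubleword. */
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    /* Pair forms operate on two words or two doublewords. */
    assert(!pair || size == 4 || size == 8);

    uintptr_t required = pair ? 2u * size : size;  /* always a power of two */
    return (addr & (required - 1)) == 0;
}
```

In C11, a structure holding two pointers that is meant to be updated as a pair can be given the required 16-byte alignment with `_Alignas(16)`.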

Increased PC-relative offset addressing

PC-relative literal loads have an offset range of ±1MB. Compared to the ±4KB range of PC-relative loads in A32, this reduces the number of literal pools and increases sharing of literal data between functions. In turn, this reduces I-cache and TLB pollution.

Most conditional branches have a range of ±1MB, expected to be sufficient for the majority of conditional branches that take place within a single function.

Unconditional branches, including branch and link, have a range of ±128MB, expected to be sufficient to span the static code segment of most executable load modules and shared objects, without needing linker-inserted veneers.
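
The ranges quoted above follow from the sizes of the immediate fields in the instruction encodings, each of which stores a signed offset in units of 4-byte instructions. The sketch below checks whether a byte offset is encodable, assuming field widths of 19 bits for LDR (literal), B.cond, CBZ and CBNZ, 26 bits for B and BL, and 14 bits for TBZ and TBNZ. The ±32KB test-and-branch instructions are why the ±1MB figure applies to most, rather than all, conditional branches. The function and enumerator names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A byte offset is encodable if it is a multiple of 4 and offset/4 fits in
 * an imm_bits-wide two's-complement field. */
static bool fits_pc_relative(int64_t byte_offset, unsigned imm_bits)
{
    if (byte_offset % 4 != 0)
        return false;
    int64_t words = byte_offset / 4;
    int64_t limit = INT64_C(1) << (imm_bits - 1);
    return words >= -limit && words < limit;
}

/* Immediate field widths (hypothetical names). */
enum {
    IMM_LDR_LITERAL = 19,  /* LDR (literal): about +/-1MB */
    IMM_COND_BRANCH = 19,  /* B.cond, CBZ, CBNZ: about +/-1MB */
    IMM_TEST_BRANCH = 14,  /* TBZ, TBNZ: about +/-32KB */
    IMM_BRANCH      = 26   /* B, BL: about +/-128MB */
};
```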

Note

Veneers are small pieces of code that are automatically inserted by the linker, for example, when it detects that a branch target is out of range. The original branch then targets the veneer, which in turn branches to the final target address.

The linker can reuse a veneer generated for a previous call for other calls to the same function, provided the veneer is in range of both calls. Occasionally, such veneers can be a performance factor.

If you have a loop that calls multiple functions through veneers, you will get many pipeline flushes and therefore sub-optimal performance. Placing related code together in memory can avoid this.
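
The veneer decision can be sketched as follows. This is a simplified, hypothetical model rather than any real linker's algorithm: it assumes that the veneers passed in all lead to the same callee and can themselves reach it, and it ignores details such as where a new veneer is placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* B and BL encode a signed 26-bit word offset: byte offsets from
 * -128MB up to +128MB - 4, in multiples of 4. */
#define BL_MIN_OFFSET (-(INT64_C(1) << 27))
#define BL_MAX_OFFSET ((INT64_C(1) << 27) - 4)

static bool bl_in_range(uint64_t from, uint64_t to)
{
    int64_t off = (int64_t)(to - from);  /* signed distance via wraparound */
    return off % 4 == 0 && off >= BL_MIN_OFFSET && off <= BL_MAX_OFFSET;
}

/* Hypothetical linker decision for one BL at `call_site`. `veneers` lists
 * veneers already created for this same callee. Returns the callee if it is
 * directly reachable, otherwise the first existing veneer this call site can
 * reach (reuse), or 0 to signal that a new veneer must be placed. */
static uint64_t choose_bl_target(uint64_t call_site, uint64_t callee,
                                 const uint64_t *veneers, size_t n_veneers)
{
    if (bl_in_range(call_site, callee))
        return callee;
    for (size_t i = 0; i < n_veneers; i++)
        if (bl_in_range(call_site, veneers[i]))
            return veneers[i];
    return 0;
}
```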

PC-relative loads, stores, and address generation with a range of ±4GB can be performed inline using only two instructions, typically ADRP followed by an ADD, load, or store, without the need to load an offset from a literal pool.
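
ADRP forms the address of the 4KB page containing the target, and the following instruction supplies the low 12 bits. In GNU assembler syntax this is typically written `adrp x0, sym` followed by `add x0, x0, :lo12:sym`. The sketch below models the arithmetic in C; the function names are hypothetical. The signed 21-bit page delta encoded by ADRP is what gives the ±4GB reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Splits a target into the page delta ADRP would encode and the low 12 bits
 * the following ADD would supply. Returns false if the target is outside the
 * signed 21-bit page range (about +/-4GB) of `pc`. */
static bool adrp_split(uint64_t pc, uint64_t target,
                       int64_t *page_delta, uint32_t *lo12)
{
    int64_t delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    if (delta < -(INT64_C(1) << 20) || delta >= (INT64_C(1) << 20))
        return false;
    *page_delta = delta;
    *lo12 = (uint32_t)(target & 0xFFF);
    return true;
}

/* What ADRP + ADD compute at run time: the low 12 bits of the PC are
 * ignored, the page delta is added, then the low 12 bits of the target. */
static uint64_t adrp_add_result(uint64_t pc, int64_t page_delta, uint32_t lo12)
{
    uint64_t page = (pc & ~UINT64_C(0xFFF)) + ((uint64_t)page_delta << 12);
    return page + lo12;
}
```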

Unaligned address support

Except for exclusive and ordered accesses, all loads and stores support the use of unaligned addresses when accessing normal memory. This simplifies porting code to A64.
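
In C, the portable way to access a value at an address that may be unaligned is through memcpy, because dereferencing a misaligned pointer of a wider type is undefined behaviour. When targeting AArch64, compilers typically reduce a fixed-size memcpy like this to a single load or store, which is permitted for Normal memory. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from any address, aligned or not. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Writes a 32-bit value to any address, aligned or not. */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```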

Bulk transfers

The LDM, STM, PUSH, and POP instructions do not exist in A64. Bulk transfers can be constructed using the LDP and STP instructions. These instructions load or store a pair of independent registers from or to consecutive memory locations.
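
Where A32 code might save registers with a single PUSH, A64 code saves them in pairs with STP and restores them with LDP. The hypothetical generator below prints such a sequence for the frame pointer and link register (x29, x30) plus a number of callee-saved registers starting at x19, rounding the frame up to a multiple of 16 bytes so that SP stays 16-byte aligned. The frame layout is one common convention, assumed here for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: writes an AArch64 prologue that saves x29/x30 plus
 * `nsaved` callee-saved registers starting at x19. Pairs are saved with STP;
 * an odd leftover register uses STR. Returns the frame size in bytes, or -1
 * if `out` is too small. */
static int emit_save_sequence(char *out, size_t cap, int nsaved)
{
    assert(nsaved >= 0 && nsaved <= 10);          /* x19..x28 */
    int slots = 2 + nsaved;                        /* 8-byte save slots */
    int frame = (slots * 8 + 15) & ~15;            /* keep SP 16-byte aligned */

    /* Pre-index writeback allocates the frame and stores x29/x30 at its base. */
    int n = snprintf(out, cap, "stp x29, x30, [sp, #-%d]!\n", frame);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t used = (size_t)n;

    for (int i = 0; i < nsaved; i += 2) {
        int reg = 19 + i;
        int off = 16 + i * 8;                      /* after the x29/x30 pair */
        if (i + 1 < nsaved)
            n = snprintf(out + used, cap - used,
                         "stp x%d, x%d, [sp, #%d]\n", reg, reg + 1, off);
        else
            n = snprintf(out + used, cap - used, "str x%d, [sp, #%d]\n", reg, off);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return frame;
}
```

For three saved registers this produces `stp x29, x30, [sp, #-48]!`, `stp x19, x20, [sp, #16]` and `str x21, [sp, #32]`. A matching epilogue would reload them in reverse order with LDR and LDP, finishing with a post-indexed `ldp x29, x30, [sp], #48`.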

The LDNP and STNP instructions provide a streaming or non-temporal hint that the data does not need to be retained in caches.

The PRFM (prefetch memory) instructions enable a prefetch to be targeted at a specific cache level.
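
From C, GCC and Clang expose prefetching through the `__builtin_prefetch(addr, rw, locality)` built-in, where `rw` of 0 means a read and a `locality` of 0 requests no temporal locality. On AArch64 these compilers typically lower it to a PRFM instruction, although the exact prefetch operation chosen is compiler-dependent. The sketch assumes GCC or Clang; the prefetch distance is a hypothetical tuning value, not a recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DISTANCE 16  /* hypothetical: elements to prefetch ahead */

/* Sums an array while hinting that data a little further ahead will be read
 * once and need not be kept in the caches (locality 0). */
static uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        total += data[i];
    }
    return total;
}
```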

Load/Store

All Load/Store instructions now support consistent addressing modes. This makes it much easier, for example, to treat char, short, int and long long in the same way when loading and storing quantities from memory.

The floating-point and NEON registers now support the same addressing modes as the core registers, making it easier to use the two register banks interchangeably.
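
The consistent addressing modes can be modelled as plain address arithmetic. The hypothetical helpers below compute the address accessed by the base-plus-immediate form `[Xn, #imm]`, the pre-indexed form `[Xn, #imm]!` and the post-indexed form `[Xn], #imm`, plus a register offset scaled by the access size, `[Xn, Xm, LSL #s]`. The same arithmetic applies whether the access is a byte, halfword, word, doubleword, or a 128-bit SIMD and floating-point register.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_OFFSET, MODE_PRE_INDEX, MODE_POST_INDEX } addr_mode;

/* Returns the address accessed with base register *base and immediate imm,
 * applying writeback to *base for the indexed modes:
 *   MODE_OFFSET      [Xn, #imm]    access Xn + imm, Xn unchanged
 *   MODE_PRE_INDEX   [Xn, #imm]!   access Xn + imm, then Xn = Xn + imm
 *   MODE_POST_INDEX  [Xn], #imm    access Xn,       then Xn = Xn + imm */
static uint64_t access_address(uint64_t *base, int64_t imm, addr_mode mode)
{
    uint64_t addr = *base;
    switch (mode) {
    case MODE_OFFSET:
        addr = *base + (uint64_t)imm;
        break;
    case MODE_PRE_INDEX:
        addr = *base + (uint64_t)imm;
        *base = addr;
        break;
    case MODE_POST_INDEX:
        *base += (uint64_t)imm;
        break;
    }
    return addr;
}

/* Register offset scaled by the access size, [Xn, Xm, LSL #log2(size)],
 * for 1, 2, 4, 8 or 16 bytes (byte up to a 128-bit Q register). */
static uint64_t scaled_register_address(uint64_t base, uint64_t index, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8 || size == 16);
    return base + index * size;
}
```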

Alignment checking

When executing in AArch64, additional alignment checking is performed on instruction fetches and on loads or stores using the stack pointer, enabling misalignment checking of the PC or the current SP.

This approach is preferable to forcing the correct alignment of the PC or SP, because a misalignment of the PC or SP commonly indicates a software error, such as corruption of an address in software.

There are a number of types of alignment checking:

  • Program Counter alignment checking generates an exception associated with instruction fetch whenever an attempt is made to execute an instruction fetched with a misaligned PC in AArch64.

    A misaligned PC is defined to be one where bits [1:0] of the PC are not 00.

    A PC misalignment is identified in the exception syndrome register associated with the target Exception level.

    When the exception is handled using AArch64, the associated exception link register holds the entire PC in its misaligned form, as does the Fault Address Register, FAR_ELn, for the Exception level in which the exception is taken.

    PC alignment checking is performed in AArch64, and in AArch32 as part of Data Abort exception handling.

  • Stack Pointer (SP) alignment checking generates an exception associated with data memory access whenever a load or store is attempted in AArch64 using a misaligned stack pointer as the base address.

    A misaligned stack pointer is one where bits [3:0] of the stack pointer, used as the base address of the calculation, are not 0000. The stack pointer must be 16-byte aligned whenever it is used as a base address.

    Stack pointer alignment checking is only performed in AArch64, and can be enabled independently for each Exception level:

    • EL0 and EL1 are controlled by two separate bits in SCTLR_EL1.

    • EL2 is controlled by a bit in SCTLR_EL2.

    • EL3 is controlled by a bit in SCTLR_EL3.
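
The checks above reduce to simple bit tests. The sketch below models them, assuming that the SP alignment enable bits are SCTLR_ELx.SA (bit 3) and, for EL0, SCTLR_EL1.SA0 (bit 4), as described in the Arm Architecture Reference Manual; verify the bit positions against the manual for your target. It follows the simplified picture given here, in which EL0 and EL1 are both controlled by SCTLR_EL1, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (see lead-in). */
#define SCTLR_SA   (UINT64_C(1) << 3)  /* SP alignment check for the EL owning the register */
#define SCTLR_SA0  (UINT64_C(1) << 4)  /* SCTLR_EL1 only: SP alignment check for EL0 */

/* PC is misaligned if bits [1:0] are not 00. */
static bool pc_misaligned(uint64_t pc) { return (pc & 0x3) != 0; }

/* SP used as a base address is misaligned if bits [3:0] are not 0000. */
static bool sp_misaligned(uint64_t sp) { return (sp & 0xF) != 0; }

/* Would a load or store using `sp` as its base fault at Exception level `el`?
 * `sctlr` is the controlling register: SCTLR_EL1 for EL0 and EL1,
 * SCTLR_EL2 for EL2, SCTLR_EL3 for EL3. */
static bool sp_check_faults(unsigned el, uint64_t sctlr, uint64_t sp)
{
    assert(el <= 3);
    uint64_t enable = (el == 0) ? SCTLR_SA0 : SCTLR_SA;
    return (sctlr & enable) != 0 && sp_misaligned(sp);
}
```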

Copyright © 2015 ARM. All rights reserved. ARM DEN0024A
Non-Confidential ID050815