2.1. ARMv8-A

The ARMv8-A architecture is the latest generation ARM architecture targeted at the Applications Profile. The name ARMv8 is used to describe the overall architecture, which now includes both 32-bit execution and 64-bit execution. It introduces the ability to perform execution with 64-bit wide registers, while preserving backwards compatibility with existing ARMv7 software.

Figure 2.1. Development of the ARMv8 architecture


The ARMv8-A architecture introduces a number of changes that enable significantly higher-performance processor implementations:

Large physical address

This enables the processor to access more than 4GB of physical memory.

64-bit virtual addressing

This enables virtual memory beyond the 4GB limit. This is important for modern desktop and server software using memory mapped file I/O or sparse addressing.

Automatic event signaling

This enables power-efficient, high-performance spinlocks.

Larger register files

Thirty-one 64-bit general-purpose registers increase performance and reduce stack use.

Efficient 64-bit immediate generation

There is less need for literal pools.
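
As a rough illustration of why literal pools become less necessary: A64 code can assemble a 64-bit constant in a register from 16-bit pieces, using a MOVZ instruction followed by MOVK instructions. The C sketch below only models that splitting. The function names are hypothetical, and the model ignores other encodings (such as MOVN or bitmask immediates) that a real assembler would also consider.

```c
#include <stdint.h>

/* Hypothetical helper: split a 64-bit constant into four 16-bit chunks.
   chunks[i] holds bits 16*i .. 16*i+15 of the value. */
void split_imm64(uint64_t value, uint16_t chunks[4])
{
    for (int i = 0; i < 4; i++)
        chunks[i] = (uint16_t)(value >> (16 * i));
}

/* Model of the register update: MOVZ writes one chunk and zeroes the
   rest; each MOVK replaces one 16-bit field and keeps the others. */
uint64_t build_imm64(const uint16_t chunks[4])
{
    uint64_t reg = (uint64_t)chunks[0];                 /* MOVZ Xd, #c0 */
    for (int i = 1; i < 4; i++) {
        uint64_t mask = (uint64_t)0xFFFF << (16 * i);
        /* MOVK Xd, #ci, LSL #(16*i) */
        reg = (reg & ~mask) | ((uint64_t)chunks[i] << (16 * i));
    }
    return reg;
}

/* Instructions needed in this simplified model: one per non-zero
   chunk, and at least one (a single MOVZ for the value zero). */
int count_mov_instructions(uint64_t value)
{
    int nonzero = 0;
    for (int i = 0; i < 4; i++)
        if ((uint16_t)(value >> (16 * i)) != 0)
            nonzero++;
    return nonzero == 0 ? 1 : nonzero;
}
```

Under this model, any 64-bit value needs at most four instructions and no load from memory.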

Large PC-relative addressing range

A ±4GB addressing range enables efficient data addressing within shared libraries and position-independent executables.
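
The ±4GB range can be pictured as a two-step address calculation: an ADRP instruction selects a 4KB page relative to the current PC using a signed 21-bit page count, and a following instruction adds the low 12 bits. The sketch below models that arithmetic in C. The function names are hypothetical and nothing here is an actual toolchain interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: split a target address into an ADRP page delta
   and a 12-bit page offset. Returns false when the target page lies
   outside the signed 21-bit page range (roughly +/-4GB from the PC). */
bool adrp_split(uint64_t pc, uint64_t target,
                int64_t *page_delta, uint32_t *low12)
{
    int64_t delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    if (delta < -(INT64_C(1) << 20) || delta >= (INT64_C(1) << 20))
        return false;
    *page_delta = delta;
    *low12 = (uint32_t)(target & 0xFFF);
    return true;
}

/* Recombine: page base of the PC, plus the page delta scaled by 4KB,
   plus the low 12 bits. Unsigned arithmetic wraps, so negative
   deltas work as expected. */
uint64_t adrp_resolve(uint64_t pc, int64_t page_delta, uint32_t low12)
{
    uint64_t page_base = (pc & ~UINT64_C(0xFFF))
                       + ((uint64_t)page_delta << 12);
    return page_base + low12;
}
```

Because the calculation depends only on the distance between the PC and the target, the same code works wherever the image is loaded, which is what position-independent code requires.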

Additional 16KB and 64KB translation granules

Larger granules reduce Translation Lookaside Buffer (TLB) miss rates and the depth of page table walks.
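
The effect on walk depth follows from simple arithmetic. Assuming 8-byte table descriptors, a table that fills one granule holds 2^(page_shift - 3) entries, so each level resolves page_shift - 3 address bits. The C sketch below applies that rule. It is a simplified model with hypothetical function names, and it ignores details such as concatenated or partially filled top-level tables.

```c
/* Entries in one translation table that occupies a full granule,
   assuming 8-byte descriptors. page_shift is log2 of the granule
   size: 12 for 4KB, 14 for 16KB, 16 for 64KB. */
int entries_per_table(int page_shift)
{
    return 1 << (page_shift - 3);
}

/* Number of lookup levels needed to resolve a va_bits-wide virtual
   address in this simplified model (rounded up). */
int lookup_levels(int va_bits, int page_shift)
{
    int bits_per_level = page_shift - 3;
    int index_bits = va_bits - page_shift;
    return (index_bits + bits_per_level - 1) / bits_per_level;
}
```

For a 48-bit virtual address this gives four levels with a 4KB granule and three with a 64KB granule. A larger granule also lets each TLB entry cover more memory.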

New exception model

This reduces OS and hypervisor software complexity.

Efficient cache management

User-space cache operations improve the efficiency of dynamic code generation. The Data Cache Zero instruction (DC ZVA) provides a fast way to zero a block of memory.

Hardware-accelerated cryptography

Provides 3× to 10× better software encryption performance. This is useful for encrypting and decrypting blocks of data that are too small to offload efficiently to a hardware accelerator, for example HTTPS traffic.

Load-Acquire, Store-Release instructions

Designed for the C++11, C11, and Java memory models, these instructions improve the performance of thread-safe code by eliminating explicit memory barrier instructions.
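
A minimal C11 sketch of the pattern these instructions target is shown below: one thread publishes data with a release store and another reads it after an acquire load. The variable and function names are hypothetical. Compilers targeting AArch64 typically map these operations to Store-Release and Load-Acquire instructions rather than to separate barriers, but the exact code generated depends on the toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data guarded by the flag */
static atomic_bool ready;    /* zero-initialized, so starts false */

/* Producer: write the data, then set the flag with release
   semantics so the payload write is visible before the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: read the flag with acquire semantics. If the flag is
   set, the payload written before the release store is visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

No explicit fence appears in the source. The ordering requirement is carried by the load and store themselves.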

NEON double-precision floating-point advanced SIMD

This enables SIMD vectorization to be applied to a much wider set of algorithms, for example, scientific computing, High Performance Computing (HPC) and supercomputers.
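
As a sketch of what double-precision SIMD makes possible, the routine below computes y = a*x + y two doubles at a time using NEON intrinsics from arm_neon.h when compiled for AArch64, and falls back to scalar code elsewhere. The function name daxpy is chosen for illustration, and the loop structure is one possible arrangement rather than a tuned implementation.

```c
#include <stddef.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* y[i] = a * x[i] + y[i] for i in [0, n) */
void daxpy(double a, const double *x, double *y, size_t n)
{
    size_t i = 0;
#if defined(__aarch64__)
    float64x2_t va = vdupq_n_f64(a);          /* broadcast a to both lanes */
    for (; i + 2 <= n; i += 2) {
        float64x2_t vx = vld1q_f64(x + i);    /* load two doubles from x */
        float64x2_t vy = vld1q_f64(y + i);    /* load two doubles from y */
        vy = vfmaq_f64(vy, va, vx);           /* vy + va * vx, per lane */
        vst1q_f64(y + i, vy);                 /* store two results */
    }
#endif
    /* Scalar path: remaining elements, or everything on other targets. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```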

Copyright © 2015 ARM. All rights reserved. ARM DEN0024A
Non-Confidential ID050815