ARM® Cortex®-A Series Programmer’s Guide for ARMv8-A

Version: 1.0

Table of Contents

Connected community
Feedback on this book
1. Introduction
1.1. How to use this book
2. ARMv8-A Architecture and Processors
2.1. ARMv8-A
2.2. ARMv8-A Processor properties
2.2.1. ARMv8 processors
3. Fundamentals of ARMv8
3.1. Execution states
3.2. Changing Exception levels
3.3. Changing execution state
4. ARMv8 Registers
4.1. AArch64 special registers
4.1.1. Zero register
4.1.2. Stack pointer
4.1.3. Program Counter
4.1.4. Exception Link Register (ELR)
4.1.5. Saved Process Status Register
4.2. Processor state
4.3. System registers
4.3.1. The system control register
4.4. Endianness
4.5. Changing execution state (again)
4.5.1. Registers at AArch32
4.5.2. PSTATE at AArch32
4.6. NEON and floating-point registers
4.6.1. Floating-point register organization in AArch64
4.6.2. Scalar register sizes
4.6.3. Vector register sizes
4.6.4. NEON in AArch32 execution state
5. An Introduction to the ARMv8 Instruction Sets
5.1. The ARMv8 instruction sets
5.1.1. Distinguishing between 32-bit and 64-bit A64 instructions
5.1.2. Addressing
5.1.3. Registers
5.2. C/C++ inline assembly
5.3. Switching between the instruction sets
6. The A64 instruction set
6.1. Instruction mnemonics
6.2. Data processing instructions
6.2.1. Arithmetic and logical operations
6.2.2. Multiply and divide instructions
6.2.3. Shift operations
6.2.4. Bitfield and byte manipulation instructions
6.2.5. Conditional instructions
6.3. Memory access instructions
6.3.1. Load instruction format
6.3.2. Store instruction format
6.3.3. Floating-point and NEON scalar loads and stores
6.3.4. Specifying the address for a Load or Store instruction
6.3.5. Accessing multiple memory locations
6.3.6. Unprivileged access
6.3.7. Prefetching memory
6.3.8. Non-temporal load and store pair
6.3.9. Memory access atomicity
6.3.10. Memory barrier and fence instructions
6.3.11. Synchronization primitives
6.4. Flow control
6.5. System control and other instructions
6.5.1. Exception handling instructions
6.5.2. System register access
6.5.3. Debug instructions
6.5.4. Hint instructions
6.5.5. NEON instructions
6.5.6. Floating-point instructions
6.5.7. Cryptographic instructions
7. AArch64 Floating-point and NEON
7.1. New features for NEON and Floating-point in AArch64
7.2. NEON and Floating-Point architecture
7.2.1. Floating-point
7.2.2. Scalar data and NEON
7.2.3. Floating-point parameters
7.3. AArch64 NEON instruction format
7.4. NEON coding alternatives
8. Porting to A64
8.1. Alignment
8.2. Data types
8.2.1. Assembly code
8.3. Issues when porting code from a 32-bit to 64-bit environment
8.3.1. Recompile or rewrite code
8.3.2. ARM Compiler 6 options for ARMv8-A
8.4. Recommendations for new C code
8.4.1. Explicit and implicit type conversions
8.4.2. Bit manipulation operations
8.4.3. Indexes
9. The ABI for ARM 64-bit Architecture
9.1. Register use in the AArch64 Procedure Call Standard
9.1.1. Parameters in general-purpose registers
9.1.2. Indirect result location
9.1.3. Parameters in NEON and floating-point registers
10. AArch64 Exception Handling
10.1. Exception handling registers
10.2. Synchronous and asynchronous exceptions
10.2.1. Synchronous aborts
10.2.2. Handling synchronous exceptions
10.2.3. System calls
10.2.4. System calls to EL2/EL3
10.2.5. Unallocated instructions
10.2.6. The Exception Syndrome Register
10.3. Changes to execution state and Exception level caused by exceptions
10.4. AArch64 exception table
10.5. Interrupt handling
10.6. The Generic Interrupt Controller
10.6.1. Configuration
10.6.2. Initialization
10.6.3. Interrupt handling
11. Caches
11.1. Cache terminology
11.1.1. Set associative caches and ways
11.1.2. Cache tags and Physical Addresses
11.1.3. Inclusive and exclusive caches
11.2. Cache controller
11.3. Cache policies
11.4. Point of coherency and unification
11.5. Cache maintenance
11.6. Cache discovery
12. The Memory Management Unit
12.1. The Translation Lookaside Buffer
12.2. Separation of kernel and application Virtual Address spaces
12.3. Translating a Virtual Address to a Physical Address
12.3.1. Secure and Non-secure addresses
12.3.2. Configuring and enabling the MMU
12.3.3. Operation when the Memory Management Unit is disabled
12.4. Translation tables in ARMv8-A
12.4.1. AArch64 descriptor format
12.4.2. Effect of granule sizes on translation tables
12.4.3. Cache configuration
12.4.4. Cache policies
12.5. Translation table configuration
12.5.1. Virtual Address tagging
12.6. Translations at EL2 and EL3
12.7. Access permissions
12.8. Operating system use of translation table descriptors
12.9. Security and the MMU
12.10. Context switching
12.11. Kernel access with user permissions
13. Memory Ordering
13.1. Memory types
13.1.1. Normal memory
13.1.2. Device memory
13.2. Barriers
13.2.1. One-way barriers
13.2.2. ISB in more detail
13.2.3. Use of barriers in C code
13.2.4. Non-temporal load and store pair
13.3. Memory attributes
13.3.1. Cacheable and shareable memory attributes
14. Multi-core processors
14.1. Multi-processing systems
14.1.1. Determining which core the code is running on
14.1.2. Symmetric multi-processing
14.1.3. Timers
14.1.4. Synchronization
14.1.5. Asymmetric multi-processing
14.1.6. Heterogeneous multi-processing
14.1.7. Exclusive monitor system location
14.2. Cache coherency
14.3. Multi-core cache coherency within a cluster
14.3.1. Snoop Control Unit
14.3.2. Accelerator coherency port
14.3.3. Cache coherency between clusters
14.3.4. Domains
14.4. Bus protocol and the Cache Coherent Interconnect
14.4.1. Compute subsystems and mobile applications
15. Power Management
15.1. Idle management
15.1.1. Power and clocking
15.1.2. Standby
15.1.3. Retention
15.1.4. Power down
15.1.5. Dormant mode
15.1.6. Hotplug
15.2. Dynamic voltage and frequency scaling
15.3. Assembly language power instructions
15.4. Power State Coordination Interface
16. big.LITTLE Technology
16.1. Structure of a big.LITTLE system
16.1.1. big.LITTLE configurations
16.2. Software execution models in big.LITTLE
16.2.1. Cluster migration
16.2.2. CPU migration
16.2.3. Global Task Scheduling
16.3. big.LITTLE MP
16.3.1. Fork migration
16.3.2. Wake migration
16.3.3. Forced migration
16.3.4. Idle pull migration
16.3.5. Offload migration
17. Security
17.1. TrustZone hardware architecture
17.2. Switching security worlds through interrupts
17.3. Security in multi-core systems
17.3.1. Interaction of Normal and Secure worlds
17.3.2. Secure debug
17.4. Switching between Secure and Non-secure state
18. Debug
18.1. ARM debug hardware
18.1.1. Overview
18.1.2. Halting or Self-hosted debug
18.1.3. Debug events
18.1.4. External debug
18.1.5. Halting debug mode
18.1.6. Self-hosted debug
18.1.7. Debugging Linux applications
18.1.8. Debugging Linux kernel
18.1.9. The call stack
18.1.10. Semihosting debug
18.2. ARM trace hardware
18.2.1. CoreSight
18.3. DS-5 debug and trace
18.3.1. Debugging Linux or Android applications using DS-5
18.3.2. Debugging Linux kernel modules
18.3.3. Debugging Linux kernels using DS-5
18.3.4. Debugging a multi-threaded application using DS-5
18.3.5. Debugging shared libraries
18.3.6. Trace support in DS-5
19. ARMv8 Models
19.1. ARM Fast Models
19.1.1. Where to get ARM Fast Models
19.2. ARMv8-A Foundation Platform
19.2.1. Limitations of the Foundation Platform
19.2.2. Software requirements
19.2.3. Where to get the ARM Foundation Platform
19.2.4. Verifying the installation
19.2.5. Running the example program
19.2.6. Troubleshooting the example program
19.2.7. The kernel
19.2.8. Configuring the kernel command line
19.2.9. Choice of root filesystem
19.2.10. Setting up a block device image for a root file system
19.2.11. Starting the Foundation platform
19.2.12. Network connections
19.2.13. Setting up a network connection
19.2.14. Command-line overview
19.2.15. Web interface
19.2.16. UARTs
19.2.17. UART output
19.2.18. Multicore configuration
19.2.19. Semihosting
19.2.20. Semihosting configuration
19.3. The Base Platform FVP
19.3.1. Software requirements
19.3.2. Verifying the installation
19.3.3. Semihosting support
19.3.4. Using a configuration GUI in your debugger
19.3.5. Setting model configuration options from Model Shell
19.3.6. Loading and running an application on the AEMv8-A Base Platform FVP
19.3.7. Running the example program from the command line
19.3.8. Running the example program using Model Debugger
19.3.9. Using the CLCD window
19.3.10. Using Ethernet with the AEMv8-A Base Platform FVP
19.3.11. Compatibility with VE model and platform
19.3.12. Where to get the ARMv8-A Base Platform FVP

List of Figures

2.1. Development of the ARMv8 architecture
2.2. Cortex-A53 processor
2.3. Cortex-A57 processor core
3.1. Exception levels
3.2. ARMv8 Exception levels in the Normal and Secure worlds
3.3. Exception levels in AArch64
3.4. Exception levels in AArch32
3.5. ARMv7 privilege levels
3.6. AArch32 processor modes
3.7. Moving between AArch32 and AArch64
4.1. AArch64 general-purpose registers
4.2. 64-bit register with W and X access
4.3. AArch64 special registers
4.4. SPSR
4.5. SCTLR bit assignments
4.7. The ARMv7 register set showing banked registers
4.8. AArch64 to AArch32 register mapping
4.9. CPSR bit assignments in AArch32
4.10. Arrangement of floating-point values
4.11. Arrangement of ARMv8 registers when holding scalar values
4.12. Vector sizes
4.13. Arrangement of ARMv7 SIMD registers
5.1. Switching between instruction sets
6.1. Shift operations
6.2. Bit manipulation instructions
6.3. REV16 instruction
6.4. REV32 instruction
6.5. Load instructions
6.6. LDP W3, W7, [X0]
6.7. LDP X8, X2, [X0, #0x10]!
7.1. Divisions of the V register
7.2. Divisions of the D register
7.3. Floating-point register divisions
7.4. Inserting an element into a vector (INS V0.S[1], V1.S[0])
7.5. Moving a scalar to a lane (MOV V0.B[3], W0)
7.6. NEON long instructions
7.7. NEON wide instructions
7.8. NEON narrow instructions
7.9. Pairwise operation
7.10. Across all lanes operation
7.11. SADDW2
7.12. XTN2
7.13. SADDL2
9.1. General-purpose register use in the ABI
9.2. Stack frame
9.3. SIMD and floating-point registers in the ABI
10.1. Exception flow
10.2. When exceptions are taken from AArch64
10.3. When exceptions are taken from AArch32
10.4. Exception handling
10.5. Exception to EL1
10.6. Interrupt handler in C code
10.7. Handling nested interrupts
11.1. A basic cache arrangement
11.2. Cache terminology
11.3. A 2-way set-associative cache
11.4. A 32KB 4-way set associative data cache
11.5. Found in the L1 cache
11.6. Found in the L2 cache
11.7. Found in external memory
11.8. Write-back
11.9. Write-through
11.10. Cacheable properties of memory
11.11. Point of Coherency
11.12. Point of Unification
12.1. The Memory Management Unit
12.2. Virtual and physical memory
12.3. Address translation using translation tables
12.4. Kernel and application memory mapping
12.5. Translation table control configuration
12.6. Translation table control register
12.7. Virtual to Physical Address translation for a 512MB block
12.8. Virtual to Physical Address translation for a 64KB page
12.9. Physical Address spaces
12.10. A64 Table descriptor type
12.11. 4KB Granule
12.12. 16KB Granule
12.13. 64KB Granule
12.14. Memory busses and caches
12.15. Two stage translation process
12.16. Maximum IPA space
12.17. Maximum Virtual Address space
12.18. Device regions
12.19. Translation table descriptors
13.1. Type encoding
13.2. One-way barriers
13.3. Stage 1 block memory attributes
13.4. Inner and outer shareable domains
14.1. A typical big.LITTLE system
14.2. Cache coherency groups
14.3. Broadcasting cache operations to other cores
14.4. Cache coherency logic
14.5. Bus master coherency domains
14.6. Multi-cluster system
14.7. CCI snoop request
14.8. Example mobile applications processor with CoreLink IP
16.1. Typical big.LITTLE system
16.2. CPU migration
16.3. Global Task Scheduling
16.4. Migration thresholds
16.5. Wake migration on a big core
16.6. Wake migration on a LITTLE core
16.7. Forced migration
17.1. Non-secure interrupts
17.2. Secure interrupts
17.3. Interaction with Security Extension
17.4. Security model when EL3 is using AArch32
17.5. Security model when EL3 is using AArch64
18.1. Debugging a kernel using DS-5
18.2. Threading call stacks in the DS-5 Debug Control view
18.3. DS-5 Debugger Trace view
19.1. Block diagram of ARMv8-A Foundation Platform
19.2. Installed files
19.3. Visualization window
19.4. Multicore option with number of cores = 4
19.5. AEMv8-A Base Platform FVP directory tree
19.6. CLCD window at startup
19.7. CLCD window active
19.8. Model networking structure block diagram

Proprietary Notice

This document is protected by copyright and other related rights and the practice or implementation of the information contained in this document may be protected by one or more patents or pending patent applications. No part of this document may be reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.

Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations infringe any third party patents.

THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, ARM makes no representation with respect to, and has undertaken no analysis to identify or understand the scope and content of, third party patents, copyrights, trade secrets, or other rights.

This document may include technical inaccuracies or typographical errors.


This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to ARM’s customers is not intended to create or refer to any partnership relationship with any other company. ARM may make changes to this document at any time and without notice.

If any of the provisions contained in these terms conflict with any of the provisions of any signed written agreement covering this document with ARM, then the signed written agreement prevails over and supersedes the conflicting provisions of these terms. This document may be translated into other languages for convenience, and you agree that if there is any conflict between the English version of this document and any translation, the terms of the English version of the Agreement shall prevail.

Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited or its affiliates in the EU and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective owners. Please follow ARM’s trademark usage guidelines at

Copyright © 2015, ARM Limited or its affiliates. All rights reserved.

ARM Limited. Company 02557590 registered in England.

110 Fulbourn Road, Cambridge, England CB1 9NJ.

Confidentiality Status

This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.

Product Status

The information in this document is final, that is, for a developed product.

Revision History
Revision A    24 March 2015    First release

Copyright © 2015 ARM. All rights reserved.
ARM DEN0024A