3.2 The NEON unit

The NEON unit has a register bank of thirty-two 64-bit vector registers that can be operated on in parallel.

The NEON unit can view the register bank as either:

  • Sixteen 128-bit quadword registers, Q0 to Q15.

  • Thirty-two 64-bit doubleword registers, D0 to D31.
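The two views describe the same storage. The following is a minimal sketch of that overlap, assuming the ARMv7 mapping in which quadword register Qn occupies doubleword registers D(2n) and D(2n+1). The type neon_bank_model and the helpers q_low_d and q_high_d are hypothetical names used for illustration only. They are not part of any compiler or NEON API.

```c
#include <stdint.h>

/* Hypothetical software model of the register bank, for illustration only. */
typedef union {
    uint64_t d[32];     /* thirty-two 64-bit doubleword registers, D0 to D31 */
    uint64_t q[16][2];  /* sixteen 128-bit quadword registers, Q0 to Q15 */
} neon_bank_model;

/* Doubleword register that holds the low 64 bits of Qn (assumed mapping). */
static unsigned q_low_d(unsigned qn)
{
    return 2u * qn;
}

/* Doubleword register that holds the high 64 bits of Qn (assumed mapping). */
static unsigned q_high_d(unsigned qn)
{
    return 2u * qn + 1u;
}
```

Under this model, writing to D6 and D7 changes the contents of Q3, because both names refer to the same 128 bits.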

These registers can then be operated on in parallel in the NEON unit. For example, a single vector add instruction can add eight 16-bit integers to eight other 16-bit integers to produce eight 16-bit results. This is known as vectorization or, more specifically for NEON, Single Instruction Multiple Data (SIMD) vectorization.

The NEON unit supports 8-bit, 16-bit, and 32-bit integer operations, and some 64-bit operations, in addition to single-precision (32-bit) floating-point operations. It can operate on elements in groups of 2, 4, 8, or 16. (The Cortex-A9 processor also supports conversion to and from 16-bit floating-point values, which the compiler supports when --fp16_format is specified, in RVCT 4.0 and later, and in ARM Compiler 4.1 and later.)
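The group sizes follow from dividing the register width by the element width. The following sketch shows the arithmetic. The function neon_lanes is a hypothetical helper, not a compiler or NEON API.

```c
/* Number of elements that fit in one vector register.
   reg_bits is 64 for a doubleword (D) register or 128 for a
   quadword (Q) register. elem_bits is 8, 16, or 32. */
static unsigned neon_lanes(unsigned reg_bits, unsigned elem_bits)
{
    return reg_bits / elem_bits;
}
```

For example, neon_lanes(128, 16) gives the eight 16-bit elements of a quadword register, and neon_lanes(64, 32) gives the two 32-bit elements of a doubleword register.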


Vectorization of floating-point code does not always occur automatically. For example, loops that require re-association vectorize only when compiled with --fpmode fast. Compiling with --fpmode fast permits the compiler to perform transformations, such as reordering floating-point additions, that could affect the result.
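A sum reduction is a typical loop that requires re-association. The following sketch contrasts the order of additions written in the source with the order that a four-element vectorized version might use. The function names are hypothetical, and the second function is a hand-written illustration of the reordering, not the code that the compiler generates.

```c
#include <stddef.h>

/* Source order: ((a[0] + a[1]) + a[2]) + ... */
static float sum_in_order(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Re-associated order: four independent partial sums, one per vector
   element, combined after the main loop. Leftover elements are added
   one at a time. */
static float sum_four_lanes(const float *a, size_t n)
{
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += a[i + k];
    float sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

Both functions return the same value when every intermediate sum is exactly representable. Because floating-point addition is not associative, they can differ in the last bits, or by more, for general data.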

The NEON unit is classified as a vector Single Instruction Multiple Data (SIMD) unit that operates on multiple elements in a vector register by using one instruction.

For example, array A is a 16-bit integer array with 8 elements.

Table 3-1 Array A

1 2 3 4 5 6 7 8

Array B has the following 8 elements:

Table 3-2 Array B

80 70 60 50 40 30 20 10

To add these arrays together, fetch each vector into a vector register and use one vector SIMD instruction to obtain the result.

Table 3-3 Result

81 72 63 54 45 36 27 18
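The addition of Array A and Array B can be sketched in C as follows. The function name add_8x16 is hypothetical. When NEON is available, the sketch uses the arm_neon.h intrinsics vld1q_s16, vaddq_s16, and vst1q_s16. Otherwise it falls back to an ordinary loop, which a vectorizing compiler might turn into similar instructions.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Adds two arrays of eight 16-bit integers, element by element. */
static void add_8x16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t va = vld1q_s16(a);        /* fetch A into a vector register */
    int16x8_t vb = vld1q_s16(b);        /* fetch B into a vector register */
    vst1q_s16(out, vaddq_s16(va, vb));  /* one vector add, then store */
#else
    /* Portable fallback for targets without NEON. */
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(a[i] + b[i]);
#endif
}
```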

The NEON unit can only deal with vectors that are stored consecutively in memory, so code that uses indirect addressing cannot be vectorized.
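The following sketch contrasts the two access patterns. The function names are hypothetical. In the first loop, successive iterations read successive memory locations. In the second loop, the location read in each iteration depends on a value loaded from idx, which is the indirect addressing described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Consecutive access: src[i] and src[i + 1] are adjacent in memory,
   so several elements can be fetched as one vector. */
static void scale_consecutive(int16_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(src[i] * 2);
}

/* Indirect access: the element read depends on idx[i], so the
   source elements are not known to be adjacent in memory. */
static void scale_indirect(int16_t *dst, const int16_t *src,
                           const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(src[idx[i]] * 2);
}
```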

When writing structures, be aware that NEON structure loads require the structure to contain equal-sized members.
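The following sketch illustrates the difference. The type and function names are hypothetical. The rgb8 structure contains three members of the same size, so an array of rgb8 has the regular layout that structure loads require. The tagged structure mixes an 8-bit member with a 32-bit member and does not. Whether the compiler uses structure loads for a given loop is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Equal-sized members: every field is 8 bits wide. */
typedef struct {
    uint8_t r;
    uint8_t g;
    uint8_t b;
} rgb8;

/* Mixed-size members: shown for contrast only. */
typedef struct {
    uint8_t  tag;
    uint32_t value;
} tagged;

/* Inverts each channel of each pixel. */
static void invert_pixels(rgb8 *px, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        px[i].r = (uint8_t)(255 - px[i].r);
        px[i].g = (uint8_t)(255 - px[i].g);
        px[i].b = (uint8_t)(255 - px[i].b);
    }
}
```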

Non-Confidential ARM DUI0472M
Copyright © 2010-2016 ARM Limited or its affiliates. All rights reserved.