3.1. The NEON unit

The NEON unit has a register bank of 32 64-bit (8-byte) registers, which can also be viewed as 16 128-bit (16-byte) registers. The NEON unit operates on the elements held in these registers in parallel. For example, a single vector add instruction can add eight 16-bit integers to eight other 16-bit integers to produce eight 16-bit results.

The NEON unit supports 8-bit, 16-bit and 32-bit integer operations, some 64-bit integer operations, and single-precision (32-bit) floating-point operations.
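As a quick check of the arithmetic, a 16-byte (128-bit) register holds 16 8-bit, 8 16-bit, 4 32-bit, or 2 64-bit elements. The sketch below computes these lane counts in C. `Q_REG_BYTES` and `lanes_per_qreg` are hypothetical names introduced only for illustration; they are not part of any ARM header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: width of one NEON quadword (Q) register in bytes. */
#define Q_REG_BYTES 16u

/* Hypothetical helper: how many elements of elem_bytes bytes fit in one
 * Q register (16 for 8-bit, 8 for 16-bit, 4 for 32-bit, 2 for 64-bit). */
static inline size_t lanes_per_qreg(size_t elem_bytes)
{
    return Q_REG_BYTES / elem_bytes;
}

/* Compile-time check of the example in the text: eight 16-bit integers
 * exactly fill one 16-byte register. */
static_assert(Q_REG_BYTES / sizeof(int16_t) == 8,
              "eight 16-bit lanes per Q register");
```

This is why one vector instruction on 16-bit data processes eight elements at once, while the same instruction on 32-bit data processes four.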

Note

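Vectorization of floating-point code does not always occur automatically. For example, loops that require reassociation only vectorize when compiled with --fpmode=fast. Reassociation changes the order in which operations are evaluated, and because floating-point addition is not associative, a reordered computation can produce a different result. Compiling with --fpmode=fast permits the compiler to perform this and other transformations that could affect the result. (See --fpmode=model in the Compiler Reference Guide.)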
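To illustrate why reassociation can change a floating-point result, the following sketch in plain C compares a float sum evaluated in source order with the same sum evaluated in the order a 4-lane vectorized loop would typically use (one partial sum per 32-bit lane, combined at the end). The function names are hypothetical, and the code only models the reordering; it is not how the compiler implements vectorization.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumption: four 32-bit float lanes per 128-bit register */

/* Sum in source order: ((x[0] + x[1]) + x[2]) + ... */
float sum_sequential(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Sum in a vectorized order: lane j accumulates x[j], x[j+4], x[j+8], ...,
 * then the four lane totals are added together. */
float sum_lanewise(const float *x, size_t n)
{
    float lane[LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
    assert(n % LANES == 0); /* keep the sketch simple: no scalar tail loop */
    for (size_t i = 0; i < n; i += LANES)
        for (size_t j = 0; j < LANES; j++)
            lane[j] += x[i + j];
    float acc = lane[0];
    for (size_t j = 1; j < LANES; j++)
        acc += lane[j];
    return acc;
}
```

With the input {1e8f, 3, 3, 3, 3, 3, 3, 3}, the sequential sum stays at 100000000, because each 3 is smaller than half the spacing between adjacent floats near 1e8 and is rounded away. The lane-wise sum first accumulates the small values in separate lanes, so they survive and the result is 100000024. Neither order is "wrong", but they differ, which is why the compiler only reorders such loops when the floating-point model allows it.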

The NEON unit is a vector SIMD (Single Instruction, Multiple Data) unit: a single instruction operates on multiple elements held in a vector register.

For example, suppose array A is a 16-bit integer array with these 8 elements:

Table 3.1. Array A

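+----+----+----+----+----+----+----+----+
|  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |
+----+----+----+----+----+----+----+----+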

Array B has these 8 elements:

Table 3.2. Array B

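+----+----+----+----+----+----+----+----+
| 80 | 70 | 60 | 50 | 40 | 30 | 20 | 10 |
+----+----+----+----+----+----+----+----+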

To add these arrays, load each one into a vector register and use a single vector SIMD add instruction to produce all eight results. Eight 16-bit elements occupy 128 bits, so each array exactly fills one 16-byte register.

Table 3.3. Result

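+----+----+----+----+----+----+----+----+
| 81 | 72 | 63 | 54 | 45 | 36 | 27 | 18 |
+----+----+----+----+----+----+----+----+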
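The addition of arrays A and B can be sketched in C using the NEON intrinsics declared in `arm_neon.h` (`vld1q_s16`, `vaddq_s16`, `vst1q_s16`), assuming a toolchain and target that provide them. The function name `add8_s16` is hypothetical. When NEON is not available, the preprocessor selects a portable scalar loop that computes the same result; a vectorizing compiler may turn such a loop into a single vector add, but that depends on the compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: r[i] = a[i] + b[i] for eight 16-bit elements. */
void add8_s16(const int16_t a[8], const int16_t b[8], int16_t r[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t va = vld1q_s16(a);     /* load all of A into one Q register */
    int16x8_t vb = vld1q_s16(b);     /* load all of B into another */
    vst1q_s16(r, vaddq_s16(va, vb)); /* typically one VADD.I16 for 8 lanes */
#else
    /* Portable scalar fallback with the same result. */
    for (size_t i = 0; i < 8; i++)
        r[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

Calling `add8_s16` with the elements of Table 3.1 and Table 3.2 fills `r` with the values shown in Table 3.3: 81, 72, 63, 54, 45, 36, 27, 18.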

Copyright © 2002-2010 ARM. All rights reserved. ARM DUI 0205J
Non-Confidential ID101213