3.9. NEON

The ARMv7 architecture introduces the Advanced Single Instruction Multiple Data (SIMD) extension as an optional extension to the ARMv7-A and ARMv7-R profiles. It extends the SIMD concept by defining groups of instructions that operate on vectors stored in 64-bit doubleword (D) registers and 128-bit quadword (Q) registers. The implementation of the Advanced SIMD extension in ARM processors is called NEON, and NEON technology is implemented in all current ARM Cortex-A series processors.
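The relationship between the D and Q register views can be modelled in a few lines of C. This is only an illustrative sketch: the type name neon_regfile_model and the helper q_low_d_index are hypothetical, and the sketch assumes 32 D registers and 16 Q registers, with each Qn overlaying the consecutive pair D(2n) and D(2n+1).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; the names and layout are assumptions of this
 * sketch, not definitions taken from any ARM header. */
typedef union {
    uint64_t d[32];                     /* doubleword view: D0..D31 */
    struct { uint64_t lo, hi; } q[16];  /* quadword view:   Q0..Q15 */
} neon_regfile_model;

/* Index of the D register that holds the low half of Qn.
 * The high half is held in the next D register (index + 1). */
unsigned q_low_d_index(unsigned n)
{
    assert(n < 16);
    return 2u * n;
}
```

In this model, writing q[1] changes d[2] and d[3], which mirrors how a write to Q1 is visible through D2 and D3.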

NEON technology can accelerate multimedia and signal processing algorithms, such as video encoding and decoding, 2D and 3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis, delivering at least three times the performance of ARMv5.

NEON is designed as an additional load/store architecture that vectorizing compilers for languages such as C and C++ can target. NEON instructions operate on wide 64-bit and 128-bit vector registers and form part of the normal ARM or Thumb code. As a result, NEON instructions are easy to understand, which also makes hand-coding practical for applications that require the highest performance.

The NEON architecture uses a register file of 32 × 64-bit registers. These are the same registers that the floating-point unit, VFPv3, uses. The compiler can use any NEON or VFP register for floating-point values or NEON data at any point in the code. NEON differs from VFP primarily in the following aspects:

For example, the VADD.I16 Q0, Q1, Q2 instruction performs a parallel addition of eight lanes of 16-bit elements from vectors in Q1 and Q2, and stores the result in Q0, as shown in Figure 3.10.

Figure 3.10. 8-lane 16-bit integer add operation
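The eight-lane addition described above can be sketched in C. This is a minimal illustration, not code from this document: the helper name add_8x16 is hypothetical. When the compiler targets NEON, the sketch uses the arm_neon.h intrinsics vld1q_u16, vaddq_u16, and vst1q_u16; otherwise it falls back to a scalar loop that produces the same lane-by-lane result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

enum { LANES = 8 };  /* eight 16-bit lanes fill one 128-bit Q register */

/* Hypothetical helper: dst[i] = a[i] + b[i] for each of the eight lanes.
 * Each lane wraps around modulo 2^16 on overflow, independently of the
 * other lanes. */
void add_8x16(uint16_t dst[LANES],
              const uint16_t a[LANES],
              const uint16_t b[LANES])
{
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint16x8_t va = vld1q_u16(a);        /* load eight lanes from a */
    uint16x8_t vb = vld1q_u16(b);        /* load eight lanes from b */
    vst1q_u16(dst, vaddq_u16(va, vb));   /* one vector add, then store */
#else
    /* Portable fallback: the same result, computed one lane at a time. */
    for (size_t i = 0; i < LANES; ++i) {
        dst[i] = (uint16_t)(a[i] + b[i]);
    }
#endif
}
```

A caller passes three eight-element arrays. No carry propagates between lanes, so an overflow in one lane does not affect its neighbours.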


Copyright © 2014 ARM. All rights reserved. ARM DAI0425