18.3 NEON intrinsics

NEON intrinsics map closely to NEON instructions.

The documentation for each intrinsic begins with a list of function prototypes, with a comment specifying an equivalent assembler instruction. The compiler selects an instruction that has the required semantics, but there is no guarantee that the compiler produces the listed instruction.

The intrinsics use a naming scheme that is similar to the NEON unified assembler syntax. That is, each intrinsic has the form:

<opname><flags>_<type>

The optional q flag specifies that the intrinsic operates on 128-bit vectors.

For example:

  • vmul_s16 multiplies two vectors of signed 16-bit values.

    This compiles to VMUL.I16 d2, d0, d1.

  • vaddl_u8 is a long add of two 64-bit vectors containing unsigned 8-bit values, producing a 128-bit vector of unsigned 16-bit values.

    This compiles to VADDL.U8 q1, d0, d1.
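The two examples above can be sketched in C as follows. This is a minimal illustration, not code from this guide. The function names mul_s16x4 and addl_u8x8 are hypothetical, and the #else branches are an assumed scalar model of each intrinsic so that the file also builds on hosts without NEON. The sketch assumes that __ARM_NEON or __ARM_NEON__ is defined when NEON is enabled. It also shows the vector types involved: vmul_s16 takes and returns int16x4_t values (one 64-bit D register each), while vaddl_u8 takes two uint8x8_t values and returns a uint16x8_t, which is a 128-bit Q-register vector.

```c
#include <stdint.h>

/* Assumption: __ARM_NEON or __ARM_NEON__ is defined when NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: element-wise multiply of four signed 16-bit lanes.
 * vmul_s16 keeps only the low 16 bits of each product (VMUL.I16). */
void mul_s16x4(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
#if HAVE_NEON
    int16x4_t va = vld1_s16(a);          /* 64-bit D register, 4 lanes */
    int16x4_t vb = vld1_s16(b);
    vst1_s16(out, vmul_s16(va, vb));
#else
    /* Scalar model: the cast truncates to 16 bits
     * (modulo wrap on GCC and Clang; implementation-defined in ISO C). */
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(a[i] * b[i]);
#endif
}

/* Hypothetical helper: long add of eight unsigned 8-bit lanes.
 * Each sum is produced at 16 bits, so 200 + 100 yields 300 rather than
 * wrapping to 44. */
void addl_u8x8(const uint8_t a[8], const uint8_t b[8], uint16_t out[8])
{
#if HAVE_NEON
    uint8x8_t va = vld1_u8(a);           /* two 64-bit inputs ...  */
    uint8x8_t vb = vld1_u8(b);
    uint16x8_t sum = vaddl_u8(va, vb);   /* ... one 128-bit result */
    vst1q_u16(out, sum);
#else
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

Calling addl_u8x8 with 200 and 100 in lane 0 stores 300 in out[0], which an ordinary 8-bit add (vadd_u8) could not represent.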

The compiler might use registers other than those shown in these examples. It might also apply optimizations that change which instruction the source code compiles to.


The intrinsic function prototypes in this documentation use the following type annotations:

__const(n)

The argument n must be a compile-time constant.

__constrange(min, max)

The argument must be a compile-time constant in the range min to max.

__transfersize(n)

The intrinsic loads n lanes from this pointer.
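As an illustration of these annotations, the sketch below uses three intrinsics and assumes they are annotated as follows: vld1_u8 with __transfersize(8) on its pointer, vshr_n_u8 with __constrange(1, 8) on its shift amount, and vget_lane_u8 with __constrange(0, 7) on its lane number. These ranges match what the NEON encodings permit for 8-bit lanes, but check the prototype in the relevant section before relying on them. The function name lane3_shifted and its scalar fallback are hypothetical.

```c
#include <stdint.h>

/* Assumption: __ARM_NEON or __ARM_NEON__ is defined when NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: loads eight bytes, shifts every lane right by 2,
 * and returns lane 3 (the fourth byte). */
uint8_t lane3_shifted(const uint8_t src[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8_t v = vld1_u8(src);   /* __transfersize(8): reads 8 lanes from src   */
    v = vshr_n_u8(v, 2);          /* __constrange(1, 8): shift must be a literal */
    return vget_lane_u8(v, 3);    /* __constrange(0, 7): lane must be a literal  */
#else
    /* Scalar model of the same computation for non-NEON hosts. */
    return (uint8_t)(src[3] >> 2);
#endif
}
```

Replacing the literal 3 with a run-time variable, or with a constant outside 0 to 7, is expected to be rejected at compile time, because the lane number and shift amount are encoded in the instruction itself.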


The NEON intrinsic function prototypes that use __fp16 are only available for targets that have the NEON half-precision VFP extension. To enable use of __fp16, use the --fp16_format command-line option.

Related reference
8.84 --fp16_format=format
18.4 NEON intrinsics for addition
18.5 NEON intrinsics for multiplication
18.6 NEON intrinsics for subtraction
18.7 NEON intrinsics for comparison
18.8 NEON intrinsics for absolute difference
18.9 NEON intrinsics for maximum and minimum
18.10 NEON intrinsics for pairwise addition
18.11 NEON intrinsics for folding maximum
18.12 NEON intrinsics for folding minimum
18.13 NEON intrinsics for reciprocal and sqrt
18.14 NEON intrinsics for shifts by signed variable
18.15 NEON intrinsics for shifts by a constant
18.16 NEON intrinsics for shifts with insert
18.17 NEON intrinsics for loading a single vector or lane
18.18 NEON intrinsics for storing a single vector or lane
18.19 NEON intrinsics for loading an N-element structure
18.20 NEON intrinsics for extracting lanes from a vector into a register
18.21 NEON intrinsics for loading a single lane of a vector from a literal
18.22 NEON intrinsics for initializing a vector from a literal bit pattern
18.23 NEON intrinsics for setting all lanes to the same value
18.24 NEON intrinsics for combining vectors
18.25 NEON intrinsics for splitting vectors
18.26 NEON intrinsics for converting vectors
18.27 NEON intrinsics for table look up
18.28 NEON intrinsics for extended table look up
18.29 NEON intrinsics for operations with a scalar value
18.30 NEON intrinsics for vector extraction
18.31 NEON intrinsics for reversing vector elements (swap endianness)
18.32 NEON intrinsics for other single operand arithmetic
18.33 NEON intrinsics for logical operations
18.34 NEON intrinsics for transposition operations
18.35 NEON intrinsics for vector cast operations
Non-Confidential
ARM DUI0472J
Copyright © 2010-2013 ARM. All rights reserved.