
This chapter provides reference information about programming NEON™ and the VFP coprocessor in assembly language.

See Table 5.1, Table 5.2, and Table 5.3 to locate descriptions of individual instructions.

**Table 5.1. Location of NEON instructions**

Mnemonic | Brief description | Page |
---|---|---|
`VABA`, `VABD` | Absolute difference and Accumulate, Absolute difference | VABA{L} and VABD{L} |
`VABS` | Absolute value | V{Q}ABS and V{Q}NEG |
`VACGE`, `VACGT` | Absolute Compare Greater than or Equal, Greater Than | VACGE and VACGT |
`VACLE`, `VACLT` | Absolute Compare Less than or Equal, Less Than (pseudo-instructions) | VACLE and VACLT |
`VADD` | Add | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
`VADDHN` | Add, select High half | V{R}ADDHN and V{R}SUBHN |
`VAND` | Bitwise AND | VAND, VBIC, VEOR, VORN, and VORR (register) |
`VAND` | Bitwise AND (pseudo-instruction) | VAND and VORN (immediate) |
`VBIC` | Bitwise Bit Clear (register) | VAND, VBIC, VEOR, VORN, and VORR (register) |
`VBIC` | Bitwise Bit Clear (immediate) | VBIC and VORR (immediate) |
`VBIF`, `VBIT`, `VBSL` | Bitwise Insert if False, Insert if True, Select | VBIF, VBIT, and VBSL |
`VCEQ`, `VCLE`, `VCLT` | Compare Equal, Less than or Equal, Less Than | VCEQ, VCGE, VCGT, VCLE, and VCLT |
`VCLE`, `VCLT` | Compare Less than or Equal, Less Than (pseudo-instructions) | VCLE and VCLT |
`VCLS`, `VCLZ`, `VCNT` | Count Leading Sign bits, Count Leading Zeros, Count set bits | VCLS, VCLZ, and VCNT |
`VCVT` | Convert fixed-point or integer to floating-point, floating-point to integer or fixed-point | VCVT |
`VDUP` | Duplicate scalar to all lanes of vector | VDUP |
`VEXT` | Extract | VEXT |
`VCGE`, `VCGT` | Compare Greater than or Equal, Greater Than | VCEQ, VCGE, VCGT, VCLE, and VCLT |
`VEOR` | Bitwise Exclusive OR | VAND, VBIC, VEOR, VORN, and VORR (register) |
`VHADD`, `VHSUB` | Halving Add, Halving Subtract | V{R}HADD and VHSUB |
`VMAX`, `VMIN` | Maximum, Minimum | VMAX, VMIN, VPMAX, and VPMIN |
`VLD` | Vector Load | NEON load / store element and structure instructions |
`VMLA`, `VMLS` | Multiply Accumulate, Multiply Subtract (vector) | VMUL{L}, VMLA{L}, and VMLS{L} |
`VMLA`, `VMLS` | Multiply Accumulate, Multiply Subtract (by scalar) | VMUL{L}, VMLA{L}, and VMLS{L} (by scalar) |
`VMOV` | Move (immediate) | VMOV, VMVN (immediate) |
`VMOV` | Move (register) | VMOV, VMVN (register) |
`VMOVL`, `VMOVN` | Move Long, Move Narrow (register) | VMOVL, V{Q}MOVN, VQMOVUN |
`VMUL` | Multiply (vector) | VMUL{L}, VMLA{L}, and VMLS{L} |
`VMUL` | Multiply (by scalar) | VMUL{L}, VMLA{L}, and VMLS{L} (by scalar) |
`VMVN` | Move Negative (immediate) | VMOV, VMVN (immediate) |
`VNEG` | Negate | V{Q}ABS and V{Q}NEG |
`VORN` | Bitwise OR NOT | VAND, VBIC, VEOR, VORN, and VORR (register) |
`VORN` | Bitwise OR NOT (pseudo-instruction) | VAND and VORN (immediate) |
`VORR` | Bitwise OR (register) | VAND, VBIC, VEOR, VORN, and VORR (register) |
`VORR` | Bitwise OR (immediate) | VBIC and VORR (immediate) |
`VPADD`, `VPADAL` | Pairwise Add, Pairwise Add and Accumulate | VPADD{L}, VPADAL |
`VPMAX`, `VPMIN` | Pairwise Maximum, Pairwise Minimum | VMAX, VMIN, VPMAX, and VPMIN |
`VQABS` | Absolute value, saturate | V{Q}ABS and V{Q}NEG |
`VQADD` | Add, saturate | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
`VQDMLAL`, `VQDMLSL` | Saturating Doubling Multiply Accumulate, Saturating Doubling Multiply Subtract | VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar) |
`VQMOV{U}N` | Saturating Move (register) | VMOVL, V{Q}MOVN, VQMOVUN |
`VQDMULL` | Saturating Doubling Multiply | VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar) |
`VQDMULH` | Saturating Doubling Multiply returning High half | VQ{R}DMULH (by vector or by scalar) |
`VQNEG` | Negate, saturate | V{Q}ABS and V{Q}NEG |
`VQRDMULH` | Saturating Rounding Doubling Multiply returning High half | VQ{R}DMULH (by vector or by scalar) |
`VQRSHL` | Shift Left, Round, saturate (by signed variable) | V{Q}{R}SHL (by signed variable) |
`VQRSHR{U}N` | Shift Right, Round, saturate (by immediate) | VQ{R}SHR{U}N (by immediate) |
`VQSHL` | Shift Left, saturate (by immediate) | VSHL, VQSHL, VQSHLU, and VSHLL (by immediate) |
`VQSHL` | Shift Left, saturate (by signed variable) | V{Q}{R}SHL (by signed variable) |
`VQSHR{U}N` | Shift Right, saturate (by immediate) | VQ{R}SHR{U}N (by immediate) |
`VQSUB` | Subtract, saturate | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
`VRADDHN` | Add, select High half, Round | V{R}ADDHN and V{R}SUBHN |
`VRECPE`, `VRECPS` | Reciprocal Estimate, Reciprocal Step | VRECPE and VRSQRTE |
`VREV` | Reverse elements | VREV |
`VRHADD` | Halving Add, Round | V{R}HADD and VHSUB |
`VRSHR`, `VRSRA` | Shift Right and Round, Shift Right, Round, and Accumulate (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
`VRSUBHN` | Subtract, select High half, Round | V{R}ADDHN and V{R}SUBHN |
`VRSQRTE`, `VRSQRTS` | Reciprocal Square Root Estimate, Reciprocal Square Root Step | VRECPS and VRSQRTS |
`VSHL` | Shift Left (by immediate) | VSHL, VQSHL, VQSHLU, and VSHLL (by immediate) |
`VSHR` | Shift Right (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
`VSLI` | Shift Left and Insert | VSLI and VSRI |
`VSRA` | Shift Right, Accumulate (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
`VSRI` | Shift Right and Insert | VSLI and VSRI |
`VST` | Vector Store | NEON load / store element and structure instructions |
`VSUB` | Subtract | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
`VSUBHN` | Subtract, select High half | V{R}ADDHN and V{R}SUBHN |
`VSWP` | Swap vectors | VSWP |
`VTBL`, `VTBX` | Vector table look-up | VTBL, VTBX |
`VTST` | Test bits | VTST |
`VTRN` | Vector transpose | VTRN |
`VUZP`, `VZIP` | Vector de-interleave, Vector interleave | VUZP, VZIP |
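
Several brief descriptions in Table 5.1 use terms such as "Absolute difference and Accumulate", "Halving", "Round", and "saturate". As a rough illustration only, the following C sketch models what one unsigned 8-bit lane might compute for a few of these operations. It is a scalar model based on the usual meaning of these terms, not the instructions' defining pseudocode, and all function names (`lane_vabd_u8` and so on) are hypothetical. Real NEON instructions apply the same operation to every lane of a vector register at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-lane models for the .U8 forms; names are illustrative. */

/* VABD: absolute difference |a - b|. */
uint8_t lane_vabd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* VABA: add |a - b| to the existing destination lane (wraps modulo 256). */
uint8_t lane_vaba_u8(uint8_t d, uint8_t a, uint8_t b)
{
    return (uint8_t)(d + lane_vabd_u8(a, b));
}

/* VHADD: halving add, (a + b) >> 1, with the sum formed at wider precision. */
uint8_t lane_vhadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* VRHADD: halving add with rounding, (a + b + 1) >> 1. */
uint8_t lane_vrhadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* VQADD: saturating add, clamped to the largest unsigned 8-bit value. */
uint8_t lane_vqadd_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Small usage example exercising each model. */
void check_lane_models(void)
{
    assert(lane_vabd_u8(3, 10) == 7);
    assert(lane_vaba_u8(250, 10, 3) == 1);   /* 250 + 7 wraps to 1 */
    assert(lane_vhadd_u8(1, 2) == 1);        /* truncates */
    assert(lane_vrhadd_u8(1, 2) == 2);       /* rounds up */
    assert(lane_vqadd_u8(200, 100) == 255);  /* saturates */
}
```

The sketch keeps to one lane per call so that the wrap, truncate, round, and saturate behaviours are easy to compare side by side. For the authoritative behaviour of each instruction, see the page listed in the table.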

**Table 5.2. Location of shared NEON and VFP instructions**

Mnemonic | Brief description | Page | Op. | Arch. |
---|---|---|---|---|
`VLDM` | Load multiple | VLDM, VSTM, VPOP, and VPUSH | - | All |
`VLDR` | Load (see also VLDR pseudo-instruction) | VLDR and VSTR | Scalar | All |
`VMOV` | Transfer from one ARM® register to half of double-precision | VMOV (between an ARM register and a NEON scalar) | Scalar | All |
`VMOV` | Transfer from two ARM registers to double-precision | VMOV (between two ARM registers and an extension register) | Scalar | VFPv2 |
`VMOV` | Transfer from half of double-precision to ARM register | VMOV (between an ARM register and a NEON scalar) | Scalar | All |
`VMOV` | Transfer from double-precision to two ARM registers | VMOV (between two ARM registers and an extension register) | Scalar | VFPv2 |
`VMOV` | Transfer from single-precision to ARM register | VMOV (between one ARM register and single precision VFP) | Scalar | All |
`VMOV` | Transfer from ARM register to single-precision | VMOV (between one ARM register and single precision VFP) | Scalar | All |
`VMRS` | Transfer from NEON and VFP system register to ARM register | VMRS and VMSR | - | All |
`VMSR` | Transfer from ARM register to NEON and VFP system register | VMRS and VMSR | - | All |
`VSTM` | Store multiple | VLDM, VSTM, VPOP, and VPUSH | - | All |
`VSTR` | Store | VLDR and VSTR | Scalar | All |
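
The `VMOV` rows above describe transfers between a double-precision register and two ARM registers. As a hedged illustration, the following C sketch models that transfer as a bit-for-bit copy with no numeric conversion. It assumes a 64-bit IEEE 754 `double`, and assumes the first ARM register corresponds to the low 32 bits and the second to the high 32 bits. The function names are hypothetical, and the linked VMOV page remains the reference for the exact operand order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a transfer from a double-precision register
   to two 32-bit ARM registers: rt gets bits [31:0], rt2 gets bits [63:32]. */
void model_vmov_from_double(double dm, uint32_t *rt, uint32_t *rt2)
{
    uint64_t bits;
    memcpy(&bits, &dm, sizeof bits);       /* copy the raw bit pattern */
    *rt  = (uint32_t)(bits & 0xFFFFFFFFu); /* low word */
    *rt2 = (uint32_t)(bits >> 32);         /* high word */
}

/* Hypothetical model of the reverse transfer, from two ARM registers
   to a double-precision register. */
double model_vmov_to_double(uint32_t rt, uint32_t rt2)
{
    uint64_t bits = ((uint64_t)rt2 << 32) | rt;
    double dm;
    memcpy(&dm, &bits, sizeof dm);
    return dm;
}

/* Small usage example: 1.0 has the bit pattern 0x3FF0000000000000. */
void check_vmov_models(void)
{
    uint32_t lo, hi;
    model_vmov_from_double(1.0, &lo, &hi);
    assert(lo == 0x00000000u);
    assert(hi == 0x3FF00000u);
    assert(model_vmov_to_double(lo, hi) == 1.0);
}
```

`memcpy` is used here rather than a pointer cast so that the model stays within defined C behaviour.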

**Table 5.3. Location of VFP instructions**

Mnemonic | Brief description | Page | Op. | Arch. |
---|---|---|---|---|
`VABS` | Absolute value | VABS, VNEG, and VSQRT | Vector | All |
`VADD` | Add | VADD, VSUB, and VDIV | Vector | All |
`VCMP` | Compare | VCMP | Scalar | All |
`VCVT` | Convert between single-precision and double-precision | VCVT (between single-precision and double-precision) | Scalar | All |
`VCVT` | Convert between floating-point and integer | VCVT (between floating-point and integer) | Scalar | All |
`VCVT` | Convert between floating-point and fixed-point | VCVT (between floating-point and fixed-point) | Scalar | VFPv3 |
`VDIV` | Divide | VADD, VSUB, and VDIV | Vector | All |
`VMLA` | Multiply accumulate | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
`VMLS` | Multiply subtract | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
`VMOV` | Insert floating-point constant in single-precision or double-precision register (see also Table 5.2) | VMOV | Scalar | VFPv3 |
`VMUL` | Multiply | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
`VNEG` | Negate | VABS, VNEG, and VSQRT | Vector | All |
`VNMLA` | Negated multiply accumulate | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
`VNMLS` | Negated multiply subtract | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
`VNMUL` | Negated multiply | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
`VSQRT` | Square Root | VABS, VNEG, and VSQRT | Vector | All |
`VSUB` | Subtract | VADD, VSUB, and VDIV | Vector | All |
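
The multiply family in Table 5.3 differs only in which terms are negated, which is easy to misread from the brief descriptions alone. As a rough sketch, the following C functions model the arithmetic of each variant on single-precision values. The sign conventions shown are an assumption based on the usual VFP definitions, and the function names are hypothetical. Rounding, exception behaviour, and vector operation are not modelled, so consult the linked page for the defining description.

```c
#include <assert.h>

/* Hypothetical scalar models; d is the destination's previous value,
   n and m are the two source operands. */

/* VMLA: multiply accumulate, d + (n * m). */
float model_vmla(float d, float n, float m)  { return d + n * m; }

/* VMLS: multiply subtract, d - (n * m). */
float model_vmls(float d, float n, float m)  { return d - n * m; }

/* VNMUL: negated multiply, -(n * m). The destination is not an input. */
float model_vnmul(float n, float m)          { return -(n * m); }

/* VNMLA: negated multiply accumulate, -d - (n * m). */
float model_vnmla(float d, float n, float m) { return -d - n * m; }

/* VNMLS: negated multiply subtract, -d + (n * m). */
float model_vnmls(float d, float n, float m) { return -d + n * m; }

/* Small usage example with values that are exact in single precision. */
void check_multiply_models(void)
{
    assert(model_vmla(10.0f, 2.0f, 3.0f)  ==  16.0f);
    assert(model_vmls(10.0f, 2.0f, 3.0f)  ==   4.0f);
    assert(model_vnmul(2.0f, 3.0f)        ==  -6.0f);
    assert(model_vnmla(10.0f, 2.0f, 3.0f) == -16.0f);
    assert(model_vnmls(10.0f, 2.0f, 3.0f) ==  -4.0f);
}
```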