This chapter provides reference information about programming NEON™ and the VFP coprocessor in assembly language.
See Table 5.1, Table 5.2, and Table 5.3 to locate descriptions of individual instructions.
Table 5.1. Location of NEON instructions
| Mnemonic | Brief description | Page |
|---|---|---|
VABA, VABD | Absolute difference and Accumulate, Absolute difference | VABA{L} and VABD{L} |
VABS | Absolute value | V{Q}ABS and V{Q}NEG |
VACGE, VACGT | Absolute Compare Greater than or Equal, Greater Than | VACGE and VACGT |
VACLE, VACLT | Absolute Compare Less than or Equal, Less Than (pseudo-instructions) | VACLE and VACLT |
VADD | Add | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
VADDHN | Add, select High half | V{R}ADDHN and V{R}SUBHN |
VAND | Bitwise AND | VAND, VBIC, VEOR, VORN, and VORR (register) |
VAND | Bitwise AND (pseudo-instruction) | VAND and VORN (immediate) |
VBIC | Bitwise Bit Clear (register) | VAND, VBIC, VEOR, VORN, and VORR (register) |
VBIC | Bitwise Bit Clear (immediate) | VBIC and VORR (immediate) |
VBIF, VBIT, VBSL | Bitwise Insert if False, Insert if True, Select | VBIF, VBIT, and VBSL |
VCEQ, VCLE, VCLT | Compare Equal, Less than or Equal, Compare Less Than | VCEQ, VCGE, VCGT, VCLE, and VCLT |
VCLE, VCLT | Compare Less than or Equal, Compare Less Than (pseudo-instruction) | VCLE and VCLT |
VCLS, VCLZ, VCNT | Count Leading Sign bits, Count Leading Zeros, and Count set bits | VCLS, VCLZ, and VCNT |
VCVT | Convert fixed-point or integer to floating-point, floating-point to integer or fixed-point | VCVT |
VDUP | Duplicate scalar to all lanes of vector | VDUP |
VEXT | Extract | VEXT |
VCGE, VCGT | Compare Greater than or Equal, Greater Than | VCEQ, VCGE, VCGT, VCLE, and VCLT |
VEOR | Bitwise Exclusive OR | VAND, VBIC, VEOR, VORN, and VORR (register) |
VHADD, VHSUB | Halving Add, Halving Subtract | V{R}HADD and VHSUB |
VMAX, VMIN | Maximum, Minimum | VMAX, VMIN, VPMAX, and VPMIN |
VLD | Vector Load | NEON load / store element and structure instructions |
VMLA, VMLS | Multiply Accumulate, Multiply Subtract (vector) | VMUL{L}, VMLA{L}, and VMLS{L} |
VMLA, VMLS | Multiply Accumulate, Multiply Subtract (by scalar) | VMUL{L}, VMLA{L}, and VMLS{L} (by scalar) |
VMOV | Move (immediate) | VMOV, VMVN (immediate) |
VMOV | Move (register) | VMOV, VMVN (register) |
VMOVL, VMOVN | Move Long, Move Narrow (register) | VMOVL, V{Q}MOVN, VQMOVUN |
VMUL | Multiply (vector) | VMUL{L}, VMLA{L}, and VMLS{L} |
VMUL | Multiply (by scalar) | VMUL{L}, VMLA{L}, and VMLS{L} (by scalar) |
VMVN | Move Negative (immediate) | VMOV, VMVN (immediate) |
VNEG | Negate | V{Q}ABS and V{Q}NEG |
VORN | Bitwise OR NOT | VAND, VBIC, VEOR, VORN, and VORR (register) |
VORN | Bitwise OR NOT (pseudo-instruction) | VAND and VORN (immediate) |
VORR | Bitwise OR (register) | VAND, VBIC, VEOR, VORN, and VORR (register) |
VORR | Bitwise OR (immediate) | VBIC and VORR (immediate) |
VPADD, VPADAL | Pairwise Add, Pairwise Add and Accumulate | VPADD{L}, VPADAL |
VPMAX, VPMIN | Pairwise Maximum, Pairwise Minimum | VMAX, VMIN, VPMAX, and VPMIN |
VQABS | Absolute value, saturate | V{Q}ABS and V{Q}NEG |
VQADD | Add, saturate | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
VQDMLAL, VQDMLSL | Saturating Doubling Multiply Accumulate, and Multiply Subtract | VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar) |
VQMOV{U}N | Saturating Move (register) | VMOVL, V{Q}MOVN, VQMOVUN |
VQDMULL | Saturating Doubling Multiply Long | VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar) |
VQDMULH | Saturating Doubling Multiply returning High half | VQ{R}DMULH (by vector or by scalar) |
VQNEG | Negate, saturate | V{Q}ABS and V{Q}NEG |
VQRDMULH | Saturating Rounding Doubling Multiply returning High half | VQ{R}DMULH (by vector or by scalar) |
VQRSHL | Shift Left, Round, saturate (by signed variable) | V{Q}{R}SHL (by signed variable) |
VQRSHR | Shift Right, Round, saturate (by immediate) | VQ{R}SHR{U}N (by immediate) |
VQSHL | Shift Left, saturate (by immediate) | VSHL, VQSHL, VQSHLU, and VSHLL (by immediate) |
VQSHL | Shift Left, saturate (by signed variable) | V{Q}{R}SHL (by signed variable) |
VQSHR | Shift Right, saturate (by immediate) | VQ{R}SHR{U}N (by immediate) |
VQSUB | Subtract, saturate | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
VRADDHN | Add, select High half, Round | V{R}ADDHN and V{R}SUBHN |
VRECPE, VRECPS | Reciprocal Estimate, Reciprocal Step | VRECPE and VRSQRTE |
VREV | Reverse elements | VREV |
VRHADD | Halving Add, Round | V{R}HADD and VHSUB |
VRSHR, VRSRA | Shift Right and Round, Shift Right, Round, and Accumulate (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
VRSUBHN | Subtract, select High half, Round | V{R}ADDHN and V{R}SUBHN |
VRSQRTE, VRSQRTS | Reciprocal Square Root Estimate, Reciprocal Square Root Step | VRECPS and VRSQRTS |
VSHL | Shift Left (by immediate) | VSHL, VQSHL, VQSHLU, and VSHLL (by immediate) |
VSHR | Shift Right (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
VSLI | Shift Left and Insert | VSLI and VSRI |
VSRA | Shift Right, Accumulate (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
VSRI | Shift Right and Insert | VSLI and VSRI |
VST | Vector Store | NEON load / store element and structure instructions |
VSUB | Subtract | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
VSUBHN | Subtract, select High half | V{R}ADDHN and V{R}SUBHN |
VSWP | Swap vectors | VSWP |
VTBL, VTBX | Vector table look-up | VTBL, VTBX |
VTST | Test bits | VTST |
VTRN | Vector transpose | VTRN |
VUZP, VZIP | Vector de-interleave and interleave | VUZP, VZIP |
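
Several mnemonics in Table 5.1 differ only in a modifier letter: Q marks a saturating form, H a halving form, and R a rounding form. As an illustration only, the following C sketch models what a single signed 8-bit lane of VADD, VQADD, VHADD, and VRHADD computes. The function names are hypothetical, and the code is a scalar model of the lane arithmetic, not NEON code or an ARM-provided API.

```c
#include <stdint.h>

/* Floor division by two, written without shifting a negative value. */
int floor_div2(int s) {
    return s >= 0 ? s / 2 : -((-s + 1) / 2);
}

/* VADD.I8 lane (model): the sum wraps around modulo 256. */
int8_t lane_vadd_i8(int8_t a, int8_t b) {
    unsigned u = (unsigned)((int)a + (int)b) & 0xFFu;
    return (int8_t)(u >= 128u ? (int)u - 256 : (int)u);
}

/* VQADD.S8 lane (model): the sum is clamped to the range -128..127. */
int8_t lane_vqadd_s8(int8_t a, int8_t b) {
    int s = (int)a + (int)b;
    if (s > INT8_MAX) s = INT8_MAX;
    if (s < INT8_MIN) s = INT8_MIN;
    return (int8_t)s;
}

/* VHADD.S8 lane (model): the full-width sum is halved, truncating downwards. */
int8_t lane_vhadd_s8(int8_t a, int8_t b) {
    return (int8_t)floor_div2((int)a + (int)b);
}

/* VRHADD.S8 lane (model): as VHADD, but 1 is added before halving. */
int8_t lane_vrhadd_s8(int8_t a, int8_t b) {
    return (int8_t)floor_div2((int)a + (int)b + 1);
}
```

With inputs 100 and 100, the wrapping add yields -56 while the saturating add yields 127. The halving forms cannot overflow, because the sum is formed at full width before it is halved.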
Table 5.2. Location of shared NEON and VFP instructions
| Mnemonic | Brief description | Page | Operation | Architecture |
|---|---|---|---|---|
VLDM | Load multiple | VLDM, VSTM, VPOP, and VPUSH | - | All |
VLDR | Load (see also VLDR pseudo‑instruction) | VLDR and VSTR | Scalar | All |
VMOV | Transfer from one ARM® register to half of double-precision | VMOV (between an ARM register and a NEON scalar) | Scalar | All |
VMOV | Transfer from two ARM registers to double-precision | VMOV (between two ARM registers and an extension register) | Scalar | VFPv2 |
VMOV | Transfer from half of double-precision to ARM register | VMOV (between an ARM register and a NEON scalar) | Scalar | All |
VMOV | Transfer from double-precision to two ARM registers | VMOV (between two ARM registers and an extension register) | Scalar | VFPv2 |
VMOV | Transfer from single-precision to ARM register | VMOV (between one ARM register and single precision VFP) | Scalar | All |
VMOV | Transfer from ARM register to single-precision | VMOV (between one ARM register and single precision VFP) | Scalar | All |
VMRS | Transfer from NEON and VFP system register to ARM register | VMRS and VMSR | - | All |
VMSR | Transfer from ARM register to NEON and VFP system register | VMRS and VMSR | - | All |
VSTM | Store multiple | VLDM, VSTM, VPOP, and VPUSH | - | All |
VSTR | Store | VLDR and VSTR | Scalar | All |
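
The two-register VMOV forms listed in Table 5.2 move a 64-bit double-precision value to or from a pair of 32-bit ARM registers. The following C sketch models that bit-level split. It assumes that `double` is an IEEE 754 64-bit type and that the first register carries the low word and the second the high word. The function names are hypothetical and stand in for the register transfer only.

```c
#include <stdint.h>
#include <string.h>

/* Model of a transfer from two ARM registers to a double-precision register.
   Assumption: rt supplies bits 31..0 and rt2 supplies bits 63..32. */
double vmov_d_from_rr(uint32_t rt, uint32_t rt2) {
    uint64_t bits = ((uint64_t)rt2 << 32) | (uint64_t)rt;
    double d;
    memcpy(&d, &bits, sizeof d);   /* reinterpret the bits, no numeric conversion */
    return d;
}

/* Model of a transfer from a double-precision register to two ARM registers. */
void vmov_rr_from_d(double d, uint32_t *rt, uint32_t *rt2) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *rt  = (uint32_t)(bits & 0xFFFFFFFFu);  /* low word  */
    *rt2 = (uint32_t)(bits >> 32);          /* high word */
}
```

The transfer copies bits unchanged, so a round trip through the two functions returns the original value exactly.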
Table 5.3. Location of VFP instructions
| Mnemonic | Brief description | Page | Operation | Architecture |
|---|---|---|---|---|
VABS | Absolute value | VABS, VNEG, and VSQRT | Vector | All |
VADD | Add | VADD, VSUB, and VDIV | Vector | All |
VCMP | Compare | VCMP | Scalar | All |
VCVT | Convert between single-precision and double-precision | VCVT (between single-precision and double-precision) | Scalar | All |
VCVT | Convert between floating-point and integer | VCVT (between floating-point and integer) | Scalar | All |
VCVT | Convert between floating-point and fixed-point | VCVT (between floating-point and fixed-point) | Scalar | VFPv3 |
VDIV | Divide | VADD, VSUB, and VDIV | Vector | All |
VMLA | Multiply accumulate | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
VMLS | Multiply subtract | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
VMOV | Insert floating-point constant in single-precision or double-precision register (see also Table 5.2) | VMOV | Scalar | VFPv3 |
VMUL | Multiply | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
VNEG | Negate | VABS, VNEG, and VSQRT | Vector | All |
VNMLA | Negated multiply accumulate | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
VNMLS | Negated multiply subtract | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
VNMUL | Negated multiply | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
VSQRT | Square Root | VABS, VNEG, and VSQRT | Vector | All |
VSUB | Subtract | VADD, VSUB, and VDIV | Vector | All |
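
The multiply family in Table 5.3 is easy to confuse, because the accumulate, subtract, and negated forms differ only in sign. The following C sketch models the single-precision arithmetic of each form, where `fd` is the destination (also the accumulator) and `fn` and `fm` are the operands. The function names are hypothetical, and the model ignores rounding-mode and exception behaviour, so treat it as a reading aid rather than a definition.

```c
/* VMUL (model): product of the two operands. */
float vfp_vmul(float fn, float fm) { return fn * fm; }

/* VMLA (model): add the product to the destination. */
float vfp_vmla(float fd, float fn, float fm) { return fd + fn * fm; }

/* VMLS (model): subtract the product from the destination. */
float vfp_vmls(float fd, float fn, float fm) { return fd - fn * fm; }

/* VNMUL (model): negated product. */
float vfp_vnmul(float fn, float fm) { return -(fn * fm); }

/* VNMLA (model): negate the result of the multiply accumulate. */
float vfp_vnmla(float fd, float fn, float fm) { return -(fd + fn * fm); }

/* VNMLS (model): negate the result of the multiply subtract. */
float vfp_vnmls(float fd, float fn, float fm) { return -(fd - fn * fm); }
```

For example, with `fd` = 1, `fn` = 2, and `fm` = 3, the accumulate form gives 7 and its negated counterpart gives -7, while the subtract form gives -5 and its negated counterpart gives 5.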