Chapter 5. NEON and VFP Programming

This chapter provides reference information about programming NEON™ and the VFP coprocessor in assembly language.

See Table 5.1, Table 5.2, and Table 5.3 to locate descriptions of individual instructions.

Table 5.1. Location of NEON instructions

| Mnemonic | Brief description | Page |
|---|---|---|
| VABA, VABD | Absolute difference and Accumulate, Absolute difference | VABA{L} and VABD{L} |
| VABS | Absolute value | V{Q}ABS and V{Q}NEG |
| VACGE, VACGT | Absolute Compare Greater than or Equal, Greater Than | VACGE and VACGT |
| VACLE, VACLT | Absolute Compare Less than or Equal, Less Than (pseudo-instructions) | VACLE and VACLT |
| VADD | Add | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
| VADDHN | Add, select High half | V{R}ADDHN and V{R}SUBHN |
| VAND | Bitwise AND | VAND, VBIC, VEOR, VORN, and VORR (register) |
| VAND | Bitwise AND (pseudo-instruction) | VAND and VORN (immediate) |
| VBIC | Bitwise Bit Clear (register) | VAND, VBIC, VEOR, VORN, and VORR (register) |
| VBIC | Bitwise Bit Clear (immediate) | VBIC and VORR (immediate) |
| VBIF, VBIT, VBSL | Bitwise Insert if False, Insert if True, Select | VBIF, VBIT, and VBSL |
| VCEQ, VCLE, VCLT | Compare Equal, Less than or Equal, Compare Less Than | VCEQ, VCGE, VCGT, VCLE, and VCLT |
| VCLE, VCLT | Compare Less than or Equal, Compare Less Than (pseudo-instruction) | VCLE and VCLT |
| VCLS, VCLZ, VCNT | Count Leading Sign bits, Count Leading Zeros, and Count set bits | VCLS, VCLZ, and VCNT |
| VCVT | Convert fixed-point or integer to floating-point, floating-point to integer or fixed-point | VCVT |
| VDUP | Duplicate scalar to all lanes of vector | VDUP |
| VEXT | Extract | VEXT |
| VCGE, VCGT | Compare Greater than or Equal, Greater Than | VCEQ, VCGE, VCGT, VCLE, and VCLT |
| VEOR | Bitwise Exclusive OR | VAND, VBIC, VEOR, VORN, and VORR (register) |
| VHADD, VHSUB | Halving Add, Halving Subtract | V{R}HADD and VHSUB |
| VMAX, VMIN | Maximum, Minimum | VMAX, VMIN, VPMAX, and VPMIN |
| VLD | Vector Load | NEON load / store element and structure instructions |
| VMLA, VMLS | Multiply Accumulate, Multiply Subtract (vector) | VMUL{L}, VMLA{L}, and VMLS{L} |
| VMLA, VMLS | Multiply Accumulate, Multiply Subtract (by scalar) | VMUL{L}, VMLA{L}, and VMLS{L} (by scalar) |
| VMOV | Move (immediate) | VMOV, VMVN (immediate) |
| VMOV | Move (register) | VMOV, VMVN (register) |
| VMOVL, VMOV{U}N | Move Long, Move Narrow (register) | VMOVL, V{Q}MOVN, VQMOVUN |
| VMUL | Multiply (vector) | VMUL{L}, VMLA{L}, and VMLS{L} |
| VMUL | Multiply (by scalar) | VMUL{L}, VMLA{L}, and VMLS{L} (by scalar) |
| VMVN | Move Negative (immediate) | VMOV, VMVN (immediate) |
| VNEG | Negate | V{Q}ABS and V{Q}NEG |
| VORN | Bitwise OR NOT | VAND, VBIC, VEOR, VORN, and VORR (register) |
| VORN | Bitwise OR NOT (pseudo-instruction) | VAND and VORN (immediate) |
| VORR | Bitwise OR (register) | VAND, VBIC, VEOR, VORN, and VORR (register) |
| VORR | Bitwise OR (immediate) | VBIC and VORR (immediate) |
| VPADD, VPADAL | Pairwise Add, Pairwise Add and Accumulate | VPADD{L}, VPADAL |
| VPMAX, VPMIN | Pairwise Maximum, Pairwise Minimum | VMAX, VMIN, VPMAX, and VPMIN |
| VQABS | Absolute value, saturate | V{Q}ABS and V{Q}NEG |
| VQADD | Add, saturate | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
| VQDMLAL, VQDMLSL | Saturating Doubling Multiply Accumulate, and Multiply Subtract | VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar) |
| VQMOV{U}N | Saturating Move (register) | VMOVL, V{Q}MOVN, VQMOVUN |
| VQDMULL | Saturating Doubling Multiply | VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar) |
| VQDMULH | Saturating Doubling Multiply returning High half | VQ{R}DMULH (by vector or by scalar) |
| VQNEG | Negate, saturate | V{Q}ABS and V{Q}NEG |
| VQRDMULH | Saturating Rounding Doubling Multiply returning High half | VQ{R}DMULH (by vector or by scalar) |
| VQRSHL | Shift Left, Round, saturate (by signed variable) | V{Q}{R}SHL (by signed variable) |
| VQRSHR | Shift Right, Round, saturate (by immediate) | VQ{R}SHR{U}N (by immediate) |
| VQSHL | Shift Left, saturate (by immediate) | VSHL, VQSHL, VQSHLU, and VSHLL (by immediate) |
| VQSHL | Shift Left, saturate (by signed variable) | V{Q}{R}SHL (by signed variable) |
| VQSHR | Shift Right, saturate (by immediate) | VQ{R}SHR{U}N (by immediate) |
| VQSUB | Subtract, saturate | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
| VRADDHN | Add, select High half, Round | V{R}ADDHN and V{R}SUBHN |
| VRECPE, VRECPS | Reciprocal Estimate, Reciprocal Step | VRECPE and VRSQRTE |
| VREV | Reverse elements | VREV |
| VRHADD | Halving Add, Round | V{R}HADD and VHSUB |
| VRSHR, VRSRA | Shift Right and Round, Shift Right, Round, and Accumulate (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
| VRSUBHN | Subtract, select High half, Round | V{R}ADDHN and V{R}SUBHN |
| VRSQRTE, VRSQRTS | Reciprocal Square Root Estimate, Reciprocal Square Root Step | VRECPS and VRSQRTS |
| VSHL | Shift Left (by immediate) | VSHL, VQSHL, VQSHLU, and VSHLL (by immediate) |
| VSHR | Shift Right (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
| VSLI | Shift Left and Insert | VSLI and VSRI |
| VSRA | Shift Right, Accumulate (by immediate) | V{R}SHR{N}, V{R}SRA (by immediate) |
| VSRI | Shift Right and Insert | VSLI and VSRI |
| VST | Vector Store | NEON load / store element and structure instructions |
| VSUB | Subtract | V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW |
| VSUBHN | Subtract, select High half | V{R}ADDHN and V{R}SUBHN |
| VSWP | Swap vectors | VSWP |
| VTBL, VTBX | Vector table look-up | VTBL, VTBX |
| VTST | Test bits | VTST |
| VTRN | Vector transpose | VTRN |
| VUZP, VZIP | Vector interleave and de-interleave | VUZP, VZIP |
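
Several descriptions in Table 5.1 differ only in how each lane's result is formed: plain, accumulating, saturating, or halving. The following is a minimal portable C sketch of those lane-wise semantics for unsigned 8-bit lanes, not NEON code and not taken from this guide. The function names (`model_vabd_u8` and so on) and the fixed eight-lane layout are illustrative assumptions.

```c
#include <stdint.h>

#define LANES 8 /* assumption: one 64-bit register viewed as eight U8 lanes */

/* Models the "Absolute difference" row (VABD): each lane gets |a - b|. */
static void model_vabd_u8(uint8_t d[LANES], const uint8_t a[LANES],
                          const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; i++)
        d[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* Models "Absolute difference and Accumulate" (VABA): acc += |a - b|.
   The accumulation wraps modulo 256 in this sketch; it does not saturate. */
static void model_vaba_u8(uint8_t acc[LANES], const uint8_t a[LANES],
                          const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; i++) {
        unsigned diff = a[i] > b[i] ? (unsigned)(a[i] - b[i])
                                    : (unsigned)(b[i] - a[i]);
        acc[i] = (uint8_t)(acc[i] + diff);
    }
}

/* Models "Add, saturate" (VQADD): sums above 255 clamp to 255. */
static void model_vqadd_u8(uint8_t d[LANES], const uint8_t a[LANES],
                           const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        d[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Models "Halving Add" (VHADD): (a + b) >> 1, computed in a wider type
   so the intermediate sum cannot overflow; the result is truncated. */
static void model_vhadd_u8(uint8_t d[LANES], const uint8_t a[LANES],
                           const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; i++)
        d[i] = (uint8_t)(((unsigned)a[i] + b[i]) >> 1);
}
```

Calling `model_vqadd_u8` with lanes 200 and 100 yields 255, whereas a wrapping add would yield 44; that difference is what the Q in the mnemonic signals. The pages listed in the table remain the authority on exact behavior, including the saturation flag, which this sketch does not model.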

Table 5.2. Location of shared NEON and VFP instructions

| Mnemonic | Brief description | Page | Op. | Arch. |
|---|---|---|---|---|
| VLDM | Load multiple | VLDM, VSTM, VPOP, and VPUSH | - | All |
| VLDR | Load (see also VLDR pseudo-instruction) | VLDR and VSTR | Scalar | All |
| VMOV | Transfer from one ARM® register to half of double-precision | VMOV (between an ARM register and a NEON scalar) | Scalar | All |
|  | Transfer from two ARM registers to double-precision | VMOV (between two ARM registers and an extension register) | Scalar | VFPv2 |
|  | Transfer from half of double-precision to ARM register | VMOV (between an ARM register and a NEON scalar) | Scalar | All |
|  | Transfer from double-precision to two ARM registers | VMOV (between two ARM registers and an extension register) | Scalar | VFPv2 |
|  | Transfer from single-precision to ARM register | VMOV (between one ARM register and single precision VFP) | Scalar | All |
|  | Transfer from ARM register to single-precision | VMOV (between one ARM register and single precision VFP) | Scalar | All |
| VMRS | Transfer from NEON and VFP system register to ARM register | VMRS and VMSR | - | All |
| VMSR | Transfer from ARM register to NEON and VFP system register | VMRS and VMSR | - | All |
| VSTM | Store multiple | VLDM, VSTM, VPOP, and VPUSH | - | All |
| VSTR | Store | VLDR and VSTR | Scalar | All |
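
The VMOV rows that transfer between two ARM registers and a double-precision register move raw bits, not converted values. As a rough illustration, the C sketch below models that as packing two 32-bit words into one 64-bit pattern. It assumes that `double` is an IEEE 754 binary64 value and that the first ARM register supplies the low word and the second the high word. The function names are hypothetical, and the listed VMOV page remains the authority on operand order.

```c
#include <stdint.h>
#include <string.h>

/* Models a two-registers-to-double transfer: no numeric conversion,
   the two words simply become the low and high halves of the double. */
static double model_vmov_to_double(uint32_t low_word, uint32_t high_word)
{
    uint64_t bits = ((uint64_t)high_word << 32) | low_word;
    double d;
    memcpy(&d, &bits, sizeof d); /* reinterpret the bits, do not convert */
    return d;
}

/* Models the reverse transfer: split a double into its two 32-bit halves. */
static void model_vmov_from_double(double d, uint32_t *low_word,
                                   uint32_t *high_word)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *low_word = (uint32_t)(bits & 0xFFFFFFFFu);
    *high_word = (uint32_t)(bits >> 32);
}
```

For example, a low word of 0x00000000 and a high word of 0x3FF00000 form the bit pattern of the double-precision value 1.0 under the stated IEEE 754 assumption.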

Table 5.3. Location of VFP instructions

| Mnemonic | Brief description | Page | Op. | Arch. |
|---|---|---|---|---|
| VABS | Absolute value | VABS, VNEG, and VSQRT | Vector | All |
| VADD | Add | VADD, VSUB, and VDIV | Vector | All |
| VCMP | Compare | VCMP | Scalar | All |
| VCVT | Convert between single-precision and double-precision | VCVT (between single-precision and double-precision) | Scalar | All |
|  | Convert between floating-point and integer | VCVT (between floating-point and integer) | Scalar | All |
|  | Convert between floating-point and fixed-point | VCVT (between floating-point and fixed-point) | Scalar | VFPv3 |
| VDIV | Divide | VADD, VSUB, and VDIV | Vector | All |
| VMLA | Multiply accumulate | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
| VMLS | Multiply subtract | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
| VMOV | Insert floating-point constant in single-precision or double-precision register (see also Table 5.2) | VMOV | Scalar | VFPv3 |
| VMUL | Multiply | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
| VNEG | Negate | VABS, VNEG, and VSQRT | Vector | All |
| VNMLA | Negated multiply accumulate | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
| VNMLS | Negated multiply subtract | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
| VNMUL | Negated multiply | VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS | Vector | All |
| VSQRT | Square Root | VABS, VNEG, and VSQRT | Vector | All |
| VSUB | Subtract | VADD, VSUB, and VDIV | Vector | All |
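
The multiply rows in Table 5.3 are easy to confuse because they differ only in sign. The C sketch below models the single-precision scalar results as commonly documented for VFP: the "negated" forms negate the final result of the corresponding non-negated operation. The function names are hypothetical. Rounding of the intermediate product, exception flags, and VFP vector mode are not modeled, so treat it as a reading aid, not a specification.

```c
/* fd is the destination's prior value; fn and fm are the two operands. */

static float model_vmul(float fn, float fm)            { return fn * fm; }
static float model_vmla(float fd, float fn, float fm)  { return fd + fn * fm; }
static float model_vmls(float fd, float fn, float fm)  { return fd - fn * fm; }

/* Negated forms: same operation, final result negated. */
static float model_vnmul(float fn, float fm)           { return -(fn * fm); }
static float model_vnmla(float fd, float fn, float fm) { return -(fd + fn * fm); }
static float model_vnmls(float fd, float fn, float fm) { return -(fd - fn * fm); }
```

With fd = 1, fn = 2, and fm = 3, the accumulate form gives 7 and the subtract form gives -5, and the negated forms give -7 and 5 respectively.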

Copyright © 2002-2007 ARM Limited. All rights reserved.
ARM DUI 0204H
Non-Confidential