16.6.5. Advanced SIMD floating-point instructions

Table 16.21 shows the operation of the Advanced SIMD floating-point instructions.

Table 16.21. Advanced SIMD floating-point instructions

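| Instruction | Register format | Cycles | Source1 | Source2 | Source3 | Source4 | Result1 | Result2 |
|---|---|---|---|---|---|---|---|---|
| VADD, VSUB, VABD, VMUL, VCEQ, VCGE, VCGT, VCAGE, VCAGT, VMAX, VMIN | Dd,Dn,Dm | 1 | Dn:N2 | Dm:N2 | - | - | Dd:N5 | - |
| | Qd,Qn,Qm | 1 | QnLo:N2 | QmLo:N2 | - | - | QdLo:N5 | - |
| | | 2 | QnHi:N2 | QmHi:N2 | - | - | QdHi:N5 | - |
| VABS, VNEG, VCEQZ, VCGEZ, VCGTZ, VCLEZ, VCLTZ, VRECPE, VRSQRTE, VCVT | Dd,Dm | 1 | Dm:N2 | - | - | - | Dd:N5 | - |
| | Qd,Qm | 1 | QmLo:N2 | - | - | - | QdLo:N5 | - |
| | | 2 | QmHi:N2 | - | - | - | QdHi:N5 | - |
| VSUM, VFMX, VPMN | Dd,Dn,Dm | 1 | Dn:N1 | Dm:N1 | - | - | Dd:N5 | - |
| VMUL | Dd,Dn,Dm[x] (scalar) | 1 | Dn:N2 | Dm:N1 | - | - | Dd:N5 | - |
| | Qd,Qn,Dm[x] (scalar) | 1 | QnLo:N2 | Dm:N1 | - | - | QdLo:N5 | - |
| | | 2 | QnHi:N2 | - | - | - | QdHi:N5 | - |
| VMLA[1], VMLS[1] | Dd,Dn,Dm | 1 | Dn:N2 | Dm:N2 | Dd:N3 | - | Dd:N9 | - |
| | Qd,Qn,Qm | 1 | QnLo:N2 | QmLo:N2 | QdLo:N3 | - | QdLo:N9 | - |
| | | 2 | QnHi:N2 | QmHi:N2 | QdHi:N3 | - | QdHi:N9 | - |
| | Dd,Dn,Dm[x] (scalar) | 1 | Dn:N2 | Dm:N1 | Dd:N3 | - | Dd:N9 | - |
| | Qd,Qn,Dm[x] (scalar) | 1 | QnLo:N2 | Dm:N1 | QdLo:N3 | - | QdLo:N9 | - |
| | | 2 | QnHi:N2 | - | QdHi:N3 | - | QdHi:N9 | - |
| VRECPS[1], VRSQRTS[1] | Dd,Dn,Dm | 1 | Dn:N2 | Dm:N2 | - | - | Dd:N9 | - |
| | Qd,Qn,Qm | 1 | QnLo:N2 | QmLo:N2 | - | - | QdLo:N9 | - |
| | | 2 | QnHi:N2 | QmHi:N2 | - | - | QdHi:N9 | - |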

[1] The VMLA.F, VMLS.F, VRECPS.F, and VRSQRTS.F instructions begin execution on the floating-point multiply pipeline. The floating-point multiply result is then forwarded to the floating-point add pipeline to complete the accumulate portion of the instruction. These instructions are therefore pipelined across ten stages, N1 through N10, where N10 is the writeback stage.


Note

The VMLA.F and VMLS.F type instructions have additional restrictions that determine when they can be issued:

  • If a VMLA.F is followed by a VMLA.F with no RAW hazard, the second VMLA.F issues with no stalls.

  • If a VMLA.F is followed by a VADD.F or VMUL.F with no RAW hazard, the VADD.F or VMUL.F stalls for 4 cycles before issue. The 4-cycle stall preserves the in-order retirement of the instructions.

  • A VMLA.F followed by any NEON floating-point instruction with a RAW hazard stalls that instruction for 8 cycles.
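
The three issue rules in the Note can be summarized as a small lookup. The following Python sketch is illustrative only: the function name `issue_stall_after_vmla` and its parameters are hypothetical and do not come from this manual, and it models only the cases the Note states explicitly. Any other following instruction without a RAW hazard is rejected rather than guessed at.

```python
# Illustrative model of the issue rules in the Note above.
# The function name and parameters are hypothetical, not part of the manual.

def issue_stall_after_vmla(next_insn: str, raw_hazard: bool) -> int:
    """Return the stall, in cycles, applied to a NEON floating-point
    instruction that immediately follows a VMLA.F.

    next_insn  -- mnemonic of the following instruction, for example "VADD.F"
    raw_hazard -- True if next_insn reads a register the VMLA.F writes
    """
    if raw_hazard:
        # Any NEON floating-point instruction with a RAW hazard: 8 cycles.
        return 8
    if next_insn == "VMLA.F":
        # Back-to-back VMLA.F with no RAW hazard issues without stalling.
        return 0
    if next_insn in ("VADD.F", "VMUL.F"):
        # Stall that preserves in-order retirement.
        return 4
    # The Note does not state a rule for other cases, so do not invent one.
    raise ValueError(f"no rule stated for {next_insn!r} without a RAW hazard")


if __name__ == "__main__":
    print(issue_stall_after_vmla("VMLA.F", raw_hazard=False))
    print(issue_stall_after_vmla("VADD.F", raw_hazard=False))
    print(issue_stall_after_vmla("VMUL.F", raw_hazard=True))
```

The 4-cycle figure equals the difference between the N9 result stage of VMLA.F and the N5 result stage of VADD.F and VMUL.F in Table 16.21. This is consistent with the stated purpose of the stall, although the manual does not give that derivation.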

Copyright © 2006-2009 ARM Limited. All rights reserved. ARM DDI 0344I
Non-Confidential