4.11. Execution timing

Complex instruction dependencies and memory system interactions make it impossible to describe briefly the exact cycle timing of all instructions in all circumstances. The timing shown in Table 4.17 is accurate in most cases. For precise timing, you must use a cycle-accurate model of the ARM1136JF-S processor.

In Table 4.17, throughput is the number of cycles after issue before another instruction can begin execution. Latency is the number of cycles after which the result data is available to another operation. Forwarding reduces the latency by one cycle for operations that depend on floating-point data. Table 4.17 shows the throughput and latency for all VFP11 instructions.
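As a rough illustration of how the two figures combine, the following sketch estimates the cycle count of a short instruction sequence. It uses the double-precision FMUL (throughput 2, latency 9) and FADD (throughput 1, latency 8) values from Table 4.17. The function name `estimate_cycles` and the scheduling model are assumptions made for this example: an instruction that depends on its predecessor waits for the predecessor's latency, and an independent one waits only for its throughput. The model ignores forwarding, hazards other than a dependency on the immediately preceding instruction, and memory system effects, so it is not a substitute for a cycle-accurate model.

```python
def estimate_cycles(sequence):
    """Estimate cycles until every result in the sequence is available.

    sequence: list of (throughput, latency, depends_on_previous) tuples.
    Simplified, hypothetical model: not the processor's real issue logic.
    """
    issue = 0       # cycle in which the current instruction issues
    finish = 0      # latest cycle in which any result becomes available
    previous = None
    for throughput, latency, depends_on_previous in sequence:
        if previous is not None:
            prev_throughput, prev_latency = previous
            if depends_on_previous:
                # Wait until the predecessor's result is available.
                issue += max(prev_throughput, prev_latency)
            else:
                # Wait only until the pipeline can accept another instruction.
                issue += prev_throughput
        finish = max(finish, issue + latency)
        previous = (throughput, latency)
    return finish


# Double-precision FMUL followed by a double-precision FADD.
FMUL_DP = (2, 9)
FADD_DP = (1, 8)

dependent = estimate_cycles([FMUL_DP + (False,), FADD_DP + (True,)])
independent = estimate_cycles([FMUL_DP + (False,), FADD_DP + (False,)])
print(dependent, independent)
```

Under these assumptions the dependent pair is bounded by the sum of the two latencies, while the independent pair is bounded by the FMUL throughput plus the FADD latency.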

Table 4.17. Throughput and latency cycle counts for VFP11 instructions

Instructions                                  Single-precision              Double-precision
                                              Throughput  Latency           Throughput  Latency
FABS, FNEG, FCVT, FCPY                        1           4                 1           4
FCMP, FCMPE, FCMPZ, FCMPEZ                    1           4                 1           4
FSITO, FUITO, FTOSI, FTOUI, FTOUIZ, FTOSIZ    1           8                 1           8
FADD, FSUB                                    1           8                 1           8
FMUL, FNMUL                                   1           8                 2           9
FMAC, FNMAC, FMSC, FNMSC                      1           8                 2           9
FDIV, FSQRT                                   15          19                29          33
FLD [1]                                       1           4                 1           4
FST [1]                                       1 [1]       System-dependent  1           System-dependent
FLDM [1]                                      X [2]       X [2] + 3         X [2]       X [2] + 3
FSTM [1]                                      X [2]       System-dependent  X [2]       System-dependent
FMSTAT                                        1           2                 -           -
FMSR, FMSRR [3]                               1           4                 -           -
FMDHR, FMDLR, FMDRR [3]                       -           -                 1           4
FMRS, FMRRS [3]                               1           2                 -           -
FMRDH, FMRDL, FMRRD [3]                       -           -                 1           2
FMXR [4]                                      1           4                 -           -
FMRX [4]                                      1           2                 -           -

[1] The cycle count for a load instruction assumes that the load data is cached and available to the ARM1136 processor from the cache. The cycle count for a store instruction assumes that the store data is written immediately to the cache, the write buffer, or both. When the data is not cached or the write buffer is unavailable, the number of cycles also depends on the memory subsystem.

[2] The number of cycles represented by X is (N/2) if N is even or (N/2 + 1) if N is odd, where N/2 is rounded down to a whole number. In both cases, X is N/2 rounded up.

[3] FMDRR and FMRRD transfer one double-precision value per transfer. FMSRR and FMRRS transfer two single-precision values per transfer.

[4] FMXR and FMRX are serializing instructions. The latency depends on the register transferred and the current activity in the VFP11 coprocessor when the instruction is issued.
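The rule in footnote [2] can be sketched as a small calculation. The function names `x_cycles` and `fldm_latency` are hypothetical, and the sketch assumes that N/2 means integer division, because a fractional cycle count would not be meaningful. N is passed in as given; this section does not define it further. The FLDM latency of X + 3 is taken from Table 4.17 and holds only under the cached-data condition of footnote [1].

```python
def x_cycles(n):
    """Cycles represented by X in Table 4.17 for a given N (footnote [2])."""
    if n % 2 == 0:
        return n // 2
    # Odd N: integer division rounds down, so add one cycle.
    return n // 2 + 1


def fldm_latency(n):
    """FLDM latency from Table 4.17: X + 3, assuming the load data is cached."""
    return x_cycles(n) + 3


for n in (1, 2, 3, 4, 5):
    print(n, x_cycles(n), fldm_latency(n))
```

FSTM uses the same X for throughput, but its latency is system-dependent and so is not modelled here.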

Copyright © 2002, 2003, 2005-2007 ARM Limited. All rights reserved. ARM DDI 0274H
Non-Confidential