1.4. VFP11 coprocessor pipelines

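The VFP11 coprocessor has three separate instruction pipelines:
- the Multiply and Accumulate (FMAC) pipeline
- the Divide and Square root (DS) pipeline
- the Load/Store (LS) pipeline.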

Each pipeline can operate independently of, and in parallel with, the other pipelines. All three pipelines share the first two pipeline stages, Decode and Issue. These two stages, and the first cycle of the Execute stage of each pipeline, remain in lockstep with the ARM1136 processor pipeline but effectively one cycle behind it. For example, when the ARM1136 processor is in the Issue stage for a particular VFP instruction, the VFP11 coprocessor is in the Decode stage for the same instruction. This lockstep mechanism maintains in-order issue of instructions between the ARM1136 processor and the VFP11 coprocessor.
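The one-cycle offset described above can be sketched as a small timing model. This is an illustration only: the function names, the cycle numbering, and the assumption that no stalls occur are not taken from this manual.

```python
# Stages that the text says stay in lockstep with the ARM1136 pipeline.
VFP_LOCKSTEP_STAGES = ("Decode", "Issue", "Execute 1")


def vfp_stage_cycles(arm_issue_cycle):
    """Map each lockstep VFP11 stage to the cycle in which it is occupied.

    arm_issue_cycle is a hypothetical cycle number in which the ARM1136
    processor holds the instruction in its Issue stage. The VFP11
    coprocessor is in Decode during that same cycle and, assuming no
    stalls, advances one stage per cycle afterwards.
    """
    return {
        stage: arm_issue_cycle + offset
        for offset, stage in enumerate(VFP_LOCKSTEP_STAGES)
    }


def issues_in_order(arm_issue_cycles):
    """Return True if VFP11 Issue cycles preserve the ARM1136 issue order."""
    vfp_issue = [vfp_stage_cycles(c)["Issue"] for c in arm_issue_cycles]
    return all(a < b for a, b in zip(vfp_issue, vfp_issue[1:]))


# ARM1136 in Issue at cycle 3 means VFP11 Decode at 3, Issue at 4, Execute 1 at 5.
print(vfp_stage_cycles(3))
```

Because every VFP11 stage is a fixed offset from the ARM1136 Issue cycle in this model, any instruction sequence that the processor issues in order is also issued in order by the coprocessor.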

The three pipelines can operate in parallel, enabling more than one instruction to be completed per cycle. Instructions issued to the FMAC pipeline can complete out of order with respect to operations in the LS and DS pipelines. This out-of-order completion might be visible to the user when a short vector FMAC or DS operation generates an exception, and an LS operation begins before the exception is detected. In this situation:
- the destination register or memory of the LS operation reflects the completion of the transfer
- the destination register of the exceptional FMAC or DS operation retains the value it had before the operation started.

For more information, see Parallel execution.
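The out-of-order completion described above can be illustrated with a minimal sketch. The latency values, the one-instruction-per-cycle issue rate, and the names used here are assumptions chosen for illustration, not VFP11 timing figures.

```python
# Illustrative latencies in cycles. These are assumptions for the sketch,
# not VFP11 timing figures.
ASSUMED_LATENCY = {"FMAC": 8, "DS": 19, "LS": 4}


def completion_order(program):
    """Return instruction names sorted by completion cycle.

    program is a list of (name, pipeline) tuples in program order. One
    instruction is issued per cycle, in order, and each completes after
    the assumed latency of its pipeline.
    """
    finished = []
    for issue_cycle, (name, pipeline) in enumerate(program):
        done = issue_cycle + ASSUMED_LATENCY[pipeline]
        finished.append((done, issue_cycle, name))
    # Sort by completion cycle; ties fall back to program order.
    return [name for _, _, name in sorted(finished)]


# A load/store issued after an FMAC operation finishes first in this model.
program = [("fmac_op", "FMAC"), ("ls_op", "LS")]
print(completion_order(program))
```

Issue order is preserved in this model, but the shorter assumed LS latency lets the later instruction finish before the earlier FMAC operation, which is the situation in which an exception on the FMAC operation could be detected after the LS transfer has begun.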

Except for divide and square root operations, the pipelines support single-cycle throughput for all single-precision operations and most double-precision operations. Double-precision multiply and multiply-accumulate operations have a two-cycle throughput. The LS pipeline can supply two single-precision operands or one double-precision operand per cycle, which balances the data transfer capability with the operand requirements.
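The throughput figures above can be turned into a rough cycle estimate. The following is a minimal sketch: the function and operation names are hypothetical, divide and square root are deliberately excluded, and every double-precision operation other than multiply and multiply-accumulate is treated as single-cycle, which simplifies the "most double-precision operations" wording.

```python
import math


def issue_interval(precision, op):
    """Cycles between successive issues of the same kind of operation."""
    if precision not in ("single", "double"):
        raise ValueError("precision must be 'single' or 'double'")
    if op in ("divide", "sqrt"):
        raise ValueError("divide and square root are excluded from this model")
    if precision == "double" and op in ("multiply", "multiply_accumulate"):
        return 2  # two-cycle throughput stated in the text
    return 1  # single-cycle throughput (simplified for double precision)


def ls_operands_per_cycle(precision):
    """Operands the LS pipeline can supply per cycle, as stated in the text."""
    return {"single": 2, "double": 1}[precision]


def issue_cycles(n_ops, precision, op):
    """Cycles needed to issue n_ops back-to-back operations."""
    return n_ops * issue_interval(precision, op)


def load_cycles(n_operands, precision):
    """Cycles needed for the LS pipeline to supply n_operands."""
    return math.ceil(n_operands / ls_operands_per_cycle(precision))


# Four multiplies consume eight source operands.
print(issue_cycles(4, "double", "multiply"), load_cycles(8, "double"))
print(issue_cycles(4, "single", "multiply"), load_cycles(8, "single"))
```

In this model, four double-precision multiplies take eight cycles to issue and their eight operands take eight cycles to load, while the single-precision case takes four cycles for each. That matching is one way to read the statement that the LS transfer rate balances the operand requirements.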

Copyright © 2002, 2003, 2005-2007 ARM Limited. All rights reserved.
ARM DDI 0274H
Non-Confidential