22.10. Parallel execution

The VFP11 coprocessor is capable of execution in each of the three pipelines independently of the others and without blocking issue or writeback from any pipeline. Separate LS, FMAC, and DS pipelines allow for parallel operation of CDP and data transfer instructions. Scheduling instructions to take advantage of the parallelism that occurs when multiple instructions execute in the VFP11 pipelines can result in a significant improvement in program execution time.

A data transfer operation can begin execution if:

A CDP can be issued to the FMAC pipeline if:

A divide or square root instruction can be issued to the DS pipeline if:

Example 22.13 shows a case of the VFP11 coprocessor executing instructions in parallel in each of the three pipelines:

In this example, the LEN field contains b011, selecting a vector length of four iterations, and the STRIDE field contains b00, for a vector stride of one.
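The relationship between the field encodings and the resulting vector length and stride can be sketched as follows. This is a minimal illustration, not part of the original manual. The helper name decode_vector_config is hypothetical, and the bit positions (LEN in FPSCR[18:16], STRIDE in FPSCR[21:20]) are taken from the general FPSCR layout rather than from this section.

```python
# Sketch: decode the LEN and STRIDE fields of an FPSCR value.
# Assumed bit positions (not stated in this section):
#   LEN    = FPSCR[18:16]
#   STRIDE = FPSCR[21:20]
LEN_SHIFT, LEN_MASK = 16, 0b111
STRIDE_SHIFT, STRIDE_MASK = 20, 0b11

# Only two STRIDE encodings are treated as valid in this sketch.
STRIDE_ENCODINGS = {0b00: 1, 0b11: 2}


def decode_vector_config(fpscr):
    """Return (vector_length, vector_stride) for a 32-bit FPSCR value."""
    len_field = (fpscr >> LEN_SHIFT) & LEN_MASK
    stride_field = (fpscr >> STRIDE_SHIFT) & STRIDE_MASK
    if stride_field not in STRIDE_ENCODINGS:
        raise ValueError("reserved STRIDE encoding")
    # The vector length is the LEN field value plus one.
    return len_field + 1, STRIDE_ENCODINGS[stride_field]


# The configuration used in this example: LEN = b011, STRIDE = b00.
fpscr = (0b011 << LEN_SHIFT) | (0b00 << STRIDE_SHIFT)
print(decode_vector_config(fpscr))  # (4, 1)
```

With LEN = b011 the sketch returns a length of four, and with STRIDE = b00 a stride of one, matching the values stated above.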

Example 22.13. Parallel execution in all three pipelines

FLDM [R4], {S4-S13}
FDIVS S0, S1, S2
FADDS S16, S20, S24

Table 22.15 shows the pipeline progression of the three instructions.

Table 22.15. Parallel execution in all three pipelines

        Instruction cycle number
        1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
FLDM    D    I    E    M1   M2   W    W    W    W    W    -    -    -    -    -
FDIVS   -    D    I    E1’  E1   E1   E1   E1   E1   E1   E1   E1   E1   E1   E1
FADDS   -    -    D    I    E1   E1   E1   E1   E2   E3   E4   E5   E6   E7   W
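
As a reading aid, the rows of Table 22.15 can be transcribed into a small data structure and queried by stage. The sketch below is illustrative only. The names TABLE and cycles_in_stage are hypothetical, and the stage sequences are copied from the table, with a plain apostrophe standing in for the prime in the first FDIVS E1 cycle.

```python
# Stage occupied by each instruction in cycles 1 to 15,
# transcribed from Table 22.15. "-" means no stage is shown.
TABLE = {
    "FLDM": "D I E M1 M2 W W W W W - - - - -".split(),
    "FDIVS": ["-", "D", "I", "E1'"] + ["E1"] * 11,
    "FADDS": "- - D I E1 E1 E1 E1 E2 E3 E4 E5 E6 E7 W".split(),
}


def cycles_in_stage(instr, stage):
    """Return the 1-based cycle numbers in which instr occupies stage."""
    return [c for c, s in enumerate(TABLE[instr], start=1) if s == stage]


print(cycles_in_stage("FLDM", "W"))    # [6, 7, 8, 9, 10]
print(cycles_in_stage("FADDS", "E1"))  # [5, 6, 7, 8]
```

The two queries show the FLDM writeback cycles and the cycles in which the FADDS occupies the E1 stage.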

In Example 22.13, no data hazards exist between any of the three instructions. The load multiple is able to begin execution immediately, and data is transferred to the register file beginning in cycle 6. Because the destination is in bank 0, the FDIVS is a scalar operation and requires one cycle in the FMAC pipeline E1 stage. If the FDIVS were a short vector operation, the FADDS could not begin execution until the last FDIVS iteration passed the FMAC E1 pipeline stage. The FADDS is a short vector operation and requires the FMAC pipeline E1 stage for cycles 5-8.
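
The absence of data hazards can be checked by listing the registers that each instruction reads and writes. The following sketch is an illustration under stated assumptions, not a description of the hardware hazard logic. It assumes the usual single-precision bank layout (S0-S7, S8-S15, S16-S23, S24-S31), that a destination in bank 0 makes the operation scalar, that a second source operand in bank 0 is treated as a scalar, and that vector register numbers wrap within their bank. All function names are hypothetical.

```python
BANK_SIZE = 8  # assumed banks: S0-S7, S8-S15, S16-S23, S24-S31


def vector_regs(first, length, stride):
    """Registers of a short vector starting at S<first>, wrapping in its bank."""
    base = (first // BANK_SIZE) * BANK_SIZE
    return {base + (first - base + i * stride) % BANK_SIZE
            for i in range(length)}


def cdp_regs(fd, fn, fm, length, stride):
    """Return (reads, writes) for a two-operand single-precision CDP."""
    if fd < BANK_SIZE:
        # Destination in bank 0: the whole operation is scalar.
        return {fn, fm}, {fd}
    # Second source in bank 0 is a scalar operand, otherwise a vector.
    fm_regs = {fm} if fm < BANK_SIZE else vector_regs(fm, length, stride)
    return (vector_regs(fn, length, stride) | fm_regs,
            vector_regs(fd, length, stride))


def hazards(earlier, later):
    """Registers involved in a RAW, WAW, or WAR hazard between two instructions."""
    (r1, w1), (r2, w2) = earlier, later
    return (w1 & r2) | (w1 & w2) | (r1 & w2)


LENGTH, STRIDE = 4, 1  # LEN = b011, STRIDE = b00

fldm = (set(), set(range(4, 14)))              # FLDM [R4], {S4-S13}
fdivs = cdp_regs(0, 1, 2, LENGTH, STRIDE)      # FDIVS S0, S1, S2
fadds = cdp_regs(16, 20, 24, LENGTH, STRIDE)   # FADDS S16, S20, S24

conflicts = hazards(fldm, fdivs) | hazards(fldm, fadds) | hazards(fdivs, fadds)
print(sorted(fadds[1]))   # [16, 17, 18, 19]
print(conflicts)          # set()
```

Under these assumptions the FADDS writes S16-S19 and reads S20-S27, the FDIVS touches only S0-S2, and the FLDM writes S4-S13, so the three register sets do not overlap.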

Note

E1’ is the first cycle in E1 and is in both FMAC and DS blocks. Subsequent E1 cycles represent the iteration cycles and occupy both E1 and E2 stages in the DS block.

Copyright © 2005-2007 ARM Limited. All rights reserved. ARM DDI 0290G
Non-Confidential