1.7. Parallel execution of instructions

The VFP11 coprocessor can execute several floating-point operations in parallel while the ARM1136JF-S processor is executing ARM instructions. Although a short vector operation takes a number of cycles to execute in the VFP11 coprocessor, the ARM1136JF-S processor treats it as a single-cycle instruction and retires it before the VFP11 coprocessor has finished executing it.

The VFP11 coprocessor has three pipelines: the FMAC (multiply and accumulate) pipeline, the DS (divide and square root) pipeline, and the LS (load/store) pipeline. Once initial processing is complete, the three pipelines operate independently of one another. This means you can issue a short vector operation, issue a load or store multiple instruction in the next cycle, and have both execute at the same time, provided no data hazards exist between the two instructions. With this mechanism, you can write double-buffered algorithms that hide much of the time spent transferring data to and from the VFP11 coprocessor behind the arithmetic operations, which can significantly improve performance.

Because the DS pipeline is separate, both data transfer operations and coprocessor data processing (CDP) instructions that do not use the DS pipeline can execute in parallel with a divide. The DS block has a dedicated write port to the register file, so executing operations in parallel with divide or square root instructions requires no special care. For more information, see Parallel execution.
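The double-buffering pattern can be sketched in portable C. This is a minimal sketch under stated assumptions: BLOCK, load_block, compute_block, saxpy_double_buffered, and the cycle-model functions are hypothetical names introduced for illustration. The compiler chooses the actual VFP instructions, so the code shows only the ordering that allows overlap (load the next block into one buffer while computing on the other, so the two never touch the same data), not the hardware overlap itself. The cycle model assumes, hypothetically, that loading a block takes L cycles, computing on a block takes C cycles, and the two overlap fully when they use different buffers.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 4 /* hypothetical block size, e.g. one short vector */

/* Stand-in for a load multiple that moves one block into "registers".
   In this sketch it is an ordinary copy. */
static void load_block(float dst[BLOCK], const float *src)
{
    for (size_t i = 0; i < BLOCK; i++)
        dst[i] = src[i];
}

/* Stand-in for a short vector operation on one block: out += a * x. */
static void compute_block(float *out, const float x[BLOCK], float a)
{
    for (size_t i = 0; i < BLOCK; i++)
        out[i] += a * x[i];
}

/* Double-buffered y += a * x over nblocks blocks.
   While block k is processed from buf[k & 1], block k + 1 is loaded
   into buf[(k + 1) & 1]. The two buffers are distinct, so there is no
   data hazard between the load and the arithmetic. */
void saxpy_double_buffered(float *y, const float *x, float a, size_t nblocks)
{
    float buf[2][BLOCK];

    assert(nblocks == 0 || (x != NULL && y != NULL));
    if (nblocks == 0)
        return;

    load_block(buf[0], x); /* prologue: fill the first buffer */
    for (size_t k = 0; k < nblocks; k++) {
        if (k + 1 < nblocks) /* issue the next transfer first... */
            load_block(buf[(k + 1) & 1], x + (k + 1) * BLOCK);
        compute_block(y + k * BLOCK, buf[k & 1], a); /* ...then compute */
    }
}

/* Hypothetical cycle model: every load and compute runs back to back. */
unsigned long cycles_serial(unsigned long n, unsigned long L, unsigned long C)
{
    return n * (L + C);
}

/* Hypothetical cycle model: the first load, then n - 1 overlapped
   load/compute stages bounded by the slower of the two, then the last
   compute. */
unsigned long cycles_overlapped(unsigned long n, unsigned long L, unsigned long C)
{
    if (n == 0)
        return 0;
    return L + (n - 1) * (L > C ? L : C) + C;
}
```

For example, with 4 blocks, L = 4 and C = 8, the model gives 48 cycles when run serially and 36 when overlapped: only the first load remains exposed, and every later transfer is hidden behind arithmetic.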

Copyright © 2002, 2003, 2005-2007 ARM Limited. All rights reserved.
ARM DDI 0274H
Non-Confidential