5.2. Branch prediction

The PFU normally fetches instructions from sequential addresses. If a branch instruction is fetched, the next instruction to be fetched can only be determined with certainty after the branch has completed execution at the end of the pipeline in the DPU. If the branch is taken, the next instruction to be executed is not sequential. The sequential instructions that the PFU has fetched while the branch instruction was executing must be flushed from the pipeline and the correct instruction fetched. Each such flush reduces the performance of the processor.

The PFU can detect branches in the Pd-stage of the pipeline, predict whether or not the branch is taken, and determine or predict the target address for a taken branch. This enables the PFU to start fetching instructions at the destination of a taken branch before the branch has completed execution in the DPU. The branch instruction is still executed in the DPU to determine the accuracy of the prediction. If the branch was mispredicted, the pipeline must be flushed and the correct instruction fetched. In general, more branches are correctly predicted than mispredicted, so fewer pipeline flushes occur and the performance of the processor is enhanced.
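The effect on flush count can be illustrated with a small model. This is only a sketch, not a description of the hardware: the function names are hypothetical, the actual and predicted outcomes are supplied by the caller, and the target address of a correctly predicted taken branch is assumed to be correct as well.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sequential fetch only: every taken branch invalidates the
 * sequential instructions fetched behind it, so it costs one flush. */
size_t flushes_sequential_only(const bool *taken, size_t n)
{
    size_t flushes = 0;
    for (size_t i = 0; i < n; i++) {
        if (taken[i]) {
            flushes++;
        }
    }
    return flushes;
}

/* With prediction: a flush occurs only when the predicted direction
 * differs from the direction resolved in the DPU.
 * Assumption: a correct direction implies a correct target. */
size_t flushes_with_prediction(const bool *taken, const bool *predicted,
                               size_t n)
{
    size_t flushes = 0;
    for (size_t i = 0; i < n; i++) {
        if (taken[i] != predicted[i]) {
            flushes++;
        }
    }
    return flushes;
}
```

For a loop branch that is taken three times and then falls through, a predictor that always guesses taken flushes once in this model, where sequential-only fetch flushes three times.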

Two major classes of branch are addressed in the processor prediction scheme:

  1. Direct branches, including B, BL, CZB, and BLX immediate, where the target address is a fixed offset, encoded in the instruction, from the program counter. If such an instruction has been fetched, and the program counter is known, predicting the destination of the branch only involves predicting whether the instruction passes or fails its condition code, that is, whether the branch is taken or not taken.

  2. Indirect branches, such as load and Branch and eXchange (BX) instructions that write to the PC, that can be identified as a likely return from a procedure call. Two identifiable cases are:

    • loads to the PC from an address derived from R13

    • BX from R0-R14.

    In these cases, if the calling operation can also be identified, the likely return address can be stored in the return stack. Typical calling operations are BL and BLX instructions.
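The return stack behavior described for indirect branches can be sketched as a small last-in, first-out buffer. This is a minimal illustration under stated assumptions: the depth `RS_DEPTH`, the overwrite-oldest policy on overflow, and all type and function names are hypothetical and are not taken from this manual.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 4u /* hypothetical depth, chosen for illustration */

typedef struct {
    uint32_t entry[RS_DEPTH];
    unsigned top;   /* index of the next free slot, modulo RS_DEPTH */
    unsigned count; /* number of valid entries, at most RS_DEPTH */
} return_stack;

void rs_init(return_stack *rs)
{
    rs->top = 0;
    rs->count = 0;
}

/* Calling operation (for example BL or BLX): store the likely
 * return address. Assumption: when full, the oldest entry is lost. */
void rs_push(return_stack *rs, uint32_t return_addr)
{
    rs->entry[rs->top] = return_addr;
    rs->top = (rs->top + 1u) % RS_DEPTH;
    if (rs->count < RS_DEPTH) {
        rs->count++;
    }
}

/* Likely return (for example a load to the PC based on R13, or BX):
 * pop the most recent return address as the predicted target.
 * Returns false when the stack is empty and no prediction exists. */
bool rs_pop(return_stack *rs, uint32_t *predicted)
{
    if (rs->count == 0) {
        return false;
    }
    rs->top = (rs->top + RS_DEPTH - 1u) % RS_DEPTH;
    rs->count--;
    *predicted = rs->entry[rs->top];
    return true;
}
```

Nested calls push their return addresses in order, and the matching returns pop them in reverse, which is why a stack fits the call and return pattern.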


Unconditional instructions of either class of program flow are always executed, and do not affect prediction history. Unconditional return stack operations always affect the return stack.
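For the direct branches in class 1, the target is fully determined once the instruction and its address are known. The sketch below assumes the ARM-state encoding of B, where the low 24 bits hold a signed word offset and the program counter reads as the instruction address plus 8. The function name is hypothetical, and Thumb encodings are not covered.

```c
#include <stdint.h>

/* Compute the target of an ARM-state B instruction.
 * instr_addr: address of the branch instruction
 * instr:      the 32-bit instruction word
 * Assumption: ARM-state encoding, PC = instr_addr + 8. */
uint32_t arm_b_target(uint32_t instr_addr, uint32_t instr)
{
    uint32_t imm24 = instr & 0x00FFFFFFu;
    uint32_t offset = imm24 << 2;      /* word offset to byte offset */

    if (imm24 & 0x00800000u) {
        offset |= 0xFC000000u;         /* sign-extend from bit 25 */
    }
    /* Unsigned wraparound gives the correct result for negative offsets. */
    return instr_addr + 8u + offset;
}
```

Because this arithmetic needs no register values, the only thing left to predict for such a branch is whether it passes its condition code.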

This section describes:

Copyright © 2010-2011 ARM. All rights reserved. ARM DDI 0460C