Data hazards in Full-compliance mode
Source registers must be protected in the event of an exceptional condition in an instruction or in an iteration of a short vector instruction.
Source registers are cleared in the first Execute 1 cycle of an operation. To enable forwarding to a subsequent instruction, destination registers are cleared in the next-to-last cycle.
The examples that follow illustrate data hazards in Full-compliance mode.
In Example 4.5, the FMSTAT is stalled for four cycles in the Fetch stage until the FCMPS updates the condition codes in the FPSCR register. Two cycles later, FMSTAT updates the ARM CPSR register with the condition codes.
Table 4.6 shows the VFP9-S pipeline stages for Example 4.5.
Table 4.6. Pipeline stages for Example 4.5
| Instruction / cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| FCMPS | F | D | E1 | E2 | E3 | E4 | - | - | - | - |
| FMSTAT | - | F | F | F | F | F | D | E | M | W |
In Example 4.6, the FADDS is stalled in the Fetch stage for nine cycles until the FLDM makes its last transfer to the VFP9-S coprocessor.
Table 4.7 shows the VFP9-S pipeline stages for Example 4.6.
Table 4.7. Pipeline stages for Example 4.6
| Instruction / cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FLDM | F | D | E | M | W | W | W | W | W | W | W | W | - | - | - | - |
| FADDS | - | F | F | F | F | F | F | F | F | F | F | D | E1 | E2 | E3 | E4 |
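The stall length quoted for an example can be cross-checked against its pipeline table. The following is a minimal sketch, not part of the VFP9-S documentation: the helper names `stall_cycles` and `first_cycle` are hypothetical, and the sketch assumes that an unstalled instruction spends exactly one cycle in the Fetch stage, so every additional F entry in a row is a stall cycle.

```python
def stall_cycles(row, stage="F"):
    """Count the extra cycles an instruction spends in `stage`.

    `row` is one table row as a list of per-cycle stage names, with "-"
    marking cycles in which the instruction is not in the pipeline.
    Assumption: an unstalled instruction occupies the stage for one
    cycle, so each further occurrence counts as one stall cycle.
    """
    occupied = sum(1 for s in row if s == stage)
    return max(occupied - 1, 0)


def first_cycle(row, stage):
    """Return the 1-based cycle number in which the row first shows `stage`."""
    return row.index(stage) + 1


# FADDS row of Table 4.7, cycles 1 to 16.
fadds = "- F F F F F F F F F F D E1 E2 E3 E4".split()

print(stall_cycles(fadds))      # 9
print(first_cycle(fadds, "D"))  # 12
```

The same helper applies to the Fetch stage of any row in these tables. It is not meant for the E1 stage of a short vector instruction, where repeated E1 entries are iterations rather than stalls.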
In Example 4.7, the FADDS is stalled for three cycles in the Fetch stage until the FMULS data is written and forwarded in cycle 6 to the Decode stage of the FADDS.
Table 4.8 shows the VFP9-S pipeline stages for Example 4.7.
Table 4.8. Pipeline stages for Example 4.7
| Instruction / cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| FMULS | F | D | E1 | E2 | E3 | E4 | - | - | - | - |
| FADDS | - | F | F | F | F | D | E1 | E2 | E3 | E4 |
In Example 4.8, the short vector FADDS is stalled in the Fetch stage until the FLDM loads all source registers required by the FADDS. In this case, the FADDS is stalled for two cycles. It does not have to wait for completion of the FLDM, because it depends on the FLDM only for one register, S7. The S7 data is forwarded in cycle 5. The vector length is four iterations (LEN = 3), and the stride is one (STRIDE = 0). Notice that the first source vector uses registers S7, S0, S1, and S2, and the only FADDS source register loaded by the FLDM is S7. This example is based on the assumption that the remaining source and destination registers are available to the FADDS in cycle 5.
Table 4.9 shows the VFP9-S pipeline stages for Example 4.8.
Table 4.9. Pipeline stages for first iteration of Example 4.8
| Instruction / cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| FLDM | F | D | E | M | W | W | W | W | - |
| FADDS | - | F | F | F | D | E1 | E2 | E3 | E4 |
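The register sequence S7, S0, S1, S2 follows from the vector wrapping around within its register bank. The sketch below shows one way to compute the registers touched by a short vector operand. It rests on assumptions that this section does not state: single-precision registers are grouped in banks of eight, and a STRIDE field value of 3 selects a stride of two (taken from the VFP architecture, not from this text). The names `BANK_SIZE`, `STRIDE_DECODE`, and `vector_registers` are hypothetical.

```python
BANK_SIZE = 8  # assumption: single-precision registers form banks of eight

# FPSCR field decoding. STRIDE = 0 -> stride one is stated in the text;
# STRIDE = 3 -> stride two is an assumption taken from the VFP architecture.
STRIDE_DECODE = {0: 1, 3: 2}


def vector_registers(first_reg, len_field, stride_field):
    """List the register numbers used by one short vector operand.

    The vector has LEN + 1 iterations (LEN = 3 gives four, as in the
    text) and wraps around inside the bank that holds `first_reg`.
    """
    iterations = len_field + 1
    stride = STRIDE_DECODE[stride_field]
    bank_base = (first_reg // BANK_SIZE) * BANK_SIZE
    offset = first_reg % BANK_SIZE
    return [bank_base + (offset + i * stride) % BANK_SIZE
            for i in range(iterations)]


# First source vector of the FADDS in Example 4.8: starts at S7.
print(["S%d" % r for r in vector_registers(7, len_field=3, stride_field=0)])
# ['S7', 'S0', 'S1', 'S2']
```

Because only S7 in that list is written by the FLDM, the FADDS needs to wait only for the S7 transfer, which matches the two-cycle stall described above.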
In Example 4.9, S25 is a source for the second iteration of the FMULS and a source for the FSTS. The FMULS locks S25, and the FSTS must wait until the FMULS releases it. After the FMULS releases S25, the FSTS can store S25 while the FMULS continues with its third and fourth iterations. The vector length is four iterations (LEN = 3), and the stride is one (STRIDE = 0).
Table 4.10. Pipeline stages for Example 4.9
| Instruction / cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| FMULS | F | D | E1 | E1 | E1 | E1 | E2 | E3 | E4 |
| FSTS | - | F | F | F | D | E | M | W | - |
In Example 4.10, the load multiple FLDMS creates a write-after-read (WAR) hazard on the source registers of the FMULS. The vector length is four iterations (LEN = 3), and the stride is one (STRIDE = 0). The VFP9-S coprocessor stalls the FLDMS until the FMULS clears all the source registers, S16-S19 and S24-S27.
Table 4.11. Pipeline stages for first iteration of Example 4.10
| Instruction / cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FMULS | F | D | E1 | E1 | E1 | E1 | E2 | E3 | E4 | - | - | - | - | - | - | - | - |
| FLDMS | - | F | F | F | F | F | D | E | M | W | W | W | W | W | W | W | W |
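The locking behavior in Examples 4.9 and 4.10 can be modeled as a set of locked source registers that a later instruction must not touch. This is only an illustrative sketch under stated assumptions, not a description of the VFP9-S scoreboard hardware: the class `SourceLocks` is hypothetical, and the FLDMS register list (S16 to S27) is an assumption, because the text names only the FMULS source registers.

```python
class SourceLocks:
    """Toy model of source-register locking in Full-compliance mode."""

    def __init__(self):
        self.locked = set()

    def lock(self, regs):
        """Mark the given register numbers as locked by an earlier instruction."""
        self.locked |= set(regs)

    def clear(self, regs):
        """Release registers that the earlier instruction no longer needs."""
        self.locked -= set(regs)

    def must_stall(self, regs):
        """A later instruction stalls while any register it uses is locked."""
        return not self.locked.isdisjoint(regs)


# Example 4.9: the FMULS locks S25, and the FSTS needs S25.
locks = SourceLocks()
locks.lock({25})
print(locks.must_stall({25}))  # True: FSTS waits
locks.clear({25})
print(locks.must_stall({25}))  # False: FSTS can proceed

# Example 4.10: the FMULS locks S16-S19 and S24-S27.
locks.lock(set(range(16, 20)) | set(range(24, 28)))
fldms_regs = set(range(16, 28))  # hypothetical FLDMS register list
locks.clear(range(16, 20))
print(locks.must_stall(fldms_regs))  # True: S24-S27 are still locked
locks.clear(range(24, 28))
print(locks.must_stall(fldms_regs))  # False: all sources cleared
```

In this model the stall applies to readers as well as writers, which is why the FSTS in Example 4.9 waits even though it only reads S25.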