16.8. Scheduling example

Example 16.6 shows a sample code segment and how the processor might schedule it.

Example 16.6. Dual issue instruction sequence for integer pipeline

Cycle  PC          Opcode     Instruction              Timing description
  1    0x00000ed0: 0xe12fff1e BX r14                   Dual issue pipeline 0
  1    0x00000ee4: 0xe3500000 CMP r0,#0                Dual issue in pipeline 1
  2    0x00000ee8: 0xe3a03003 MOV r3,#3                Dual issue pipeline 0
  2    0x00000eec: 0xe3a00000 MOV r0,#0                Dual issue in pipeline 1
  3    0x00000ef0: 0x05813000 STREQ r3,[r1,#0]         Dual issue in pipeline 0, r3 not needed until E3
  3    0x00000ef4: 0xe3520004 CMP r2,#4                Dual issue in pipeline 1
  4    0x00000ef8: 0x979ff102 LDRLS pc,[pc,r2,LSL #2]  Single issue pipeline 0, +1 cycle for load to pc, no 
                                                       extra cycle for shift since LSL #2
  5    0x00000f2c: 0xe3a00001 MOV r0,#1                Dual issue with 2nd iteration of load in
                                                       pipeline 1
  6    0x00000f30: 0xea000000 B {pc}+8                 Branch to 0xf38, dual issue pipeline 0
  6    0x00000f38: 0xe5810000 STR r0,[r1,#0]           Dual issue pipeline 1
  7    0x00000f3c: 0xe49df004 LDR pc,[r13],#4          Single issue pipeline 0, +1 cycle for load to pc
  8    0x0000017c: 0xe284200c ADD r2,r4,#0xc           Dual issue with 2nd iteration of load in pipeline 1
  9    0x00000180: 0xe5960004 LDR r0,[r6,#4]           Dual issue pipeline 0
  9    0x00000184: 0xe3a0100a MOV r1,#0xa              Dual issue pipeline 1
 12    0x00000188: 0xe5900000 LDR r0,[r0,#0]           Single issue pipeline 0: r0 produced in E3,
                                                       required in E1, so +2 cycle stall
 13    0x0000018c: 0xe5840000 STR r0,[r4,#0]           Single issue pipeline 0 due to LS resource
                                                       hazard, no extra delay for r0 since produced in
                                                       E3 and consumed in E3
 14    0x00000190: 0xe594000c LDR r0,[r4,#0xc]         Single issue pipeline 0 due to LS resource hazard
 15    0x00000194: 0xe8bd4070 LDMFD r13!,{r4-r6,r14}   Load multiple loads r4 in 1st cycle, r5 and r6
                                                       in 2nd cycle, r14 in 3rd cycle, 3 cycles total
 17    0x00000198: 0xea000368 B {pc}+0xda8             Branch to 0xf40, dual issue in pipeline 1 with
                                                       3rd cycle of LDM
 18    0x00000f40: 0xe2800002 ADD r0,r0,#2 ARM         Single issue in pipeline 0
 19    0x00000f44: 0xe0810000 ADD r0,r1,r0 ARM         Single issue in pipeline 0, no dual issue due to
                                                       hazard on r0 produced in E2 and required in E2
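
The stall entries in Example 16.6 follow from comparing the stage in which a register is produced with the stage in which the next instruction requires it. The sketch below is a simplified model of that comparison, not the processor's actual issue logic. The names `stall_cycles` and `next_issue_cycle` are hypothetical, and the model assumes a dependent instruction issues no earlier than the cycle after its producer.

```python
# Execute-stage numbers E1..E3, as named in the timing descriptions of Example 16.6.
E1, E2, E3 = 1, 2, 3


def stall_cycles(produced_in, required_in):
    """Extra cycles a dependent instruction waits behind its producer.

    Assumption for illustration: the wait equals the stage gap and is
    never negative.
    """
    return max(0, produced_in - required_in)


def next_issue_cycle(producer_cycle, produced_in, required_in):
    """Earliest issue cycle of the consumer under the simplified model."""
    return producer_cycle + 1 + stall_cycles(produced_in, required_in)


# LDR r0,[r6,#4] (cycle 9) -> LDR r0,[r0,#0]: produced in E3, required in E1.
print(next_issue_cycle(9, E3, E1))   # 12, a +2 cycle stall

# LDR r0,[r0,#0] (cycle 12) -> STR r0,[r4,#0]: produced in E3, consumed in E3.
print(next_issue_cycle(12, E3, E3))  # 13, no extra delay

# ADD r0,r0,#2 (cycle 18) -> ADD r0,r1,r0: produced in E2, required in E2.
print(next_issue_cycle(18, E2, E2))  # 19, no stall but no dual issue either
```

The three calls reproduce the cycle numbers listed in the table for the dependent load, the store, and the final add. The model does not cover resource hazards or multi-cycle instructions such as the loads to pc and the LDM.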

Example 16.7 shows a sample instruction sequence for the NEON pipeline.

Example 16.7. Instruction sequence for the NEON pipeline

Cycle  PC          Opcode     Instruction                   Timing description
  1    0x00003690: 0xf2dbeac8 VMULL.S16 q15,d27,d0[1]      ;4x16 SIMD multiply
  2    0x00003694: 0xf2daaac8 VMULL.S16 q13,d26,d0[1]      ;independent from previous multiply, issued
                                                            in back-to-back cycles
  2    0x00003698: 0xf4402a5d VST1.16 {d18,d19},[r0@64]!   ;128bit 2-issue cycle store (1st issue cycle
                                                            is dual issued with previous instruction)
  3    0x0000369c: 0xf2d7685a VRSHRN.I32 d22,q5,#9         ;shift operation (dual issued with 2nd issue
                                                            cycle of previous store)
  4    0x000036a0: 0xf2d7785c VRSHRN.I32 d23,q6,#9         ;independent from previous shift, executed
                                                            in back-to-back cycles
  5    0x000036a4: 0xf29caac0 VMULL.S16 q5,d28,d0[0]       ;4x16 SIMD multiply
  6    0x000036a8: 0xf29dcac0 VMULL.S16 q6,d29,d0[0]       ;independent from previous multiply, issued
                                                            in back-to-back cycles
  7    0x000036ac: 0xf26aa8c6 VADD.I32 q13,q13,q3          ;4x32 (128bit) VADD uses result of multiply
                                                            from cycle 2.
  8    0x000036b0: 0xf26ee8c8 VADD.I32 q15,q15,q4          ;4x32 (128bit) independent from previous
                                                            add, issued in back-to-back cycles
  9    0x000036b4: 0xf29e6260 VMLAL.S16 q3,d14,d0[2]       ;independent multiply
  9    0x000036bc: 0xf4004a5d VST1.16 {d4,d5},[r0@64]!     ;128bit 2-issue cycle store (1st issue cycle
                                                            is dual issued with previous instruction)     
 10    0x000036c0: 0xf2d7a87a VRSHRN.I32 d26,q13,#9        ;shift operation (dual issued with 2nd issue
                                                            cycle of previous store) 

Example 16.8 shows an instruction sequence for the VFP pipeline.

Example 16.8. Instruction sequence for VFP pipeline

Cycle  PC          Opcode     Instruction          Timing description
  4    0x00002c44: 0xeeb01a49 FCPYS   s2,s18      ;4 cycle single precision register move
  8    0x00002c48: 0xeef00a68 FCPYS   s1,s17      ;4 cycle single precision register move
 12    0x00002c4c: 0xeeb00a48 FCPYS   s0,s16      ;4 cycle single precision register move
 12    0x00002c50: 0xeb000116 BL      {pc}+0x460  ;branch executed ‘for free’ in ARM pipeline, not
                                                   seen by NEON
 19    0x000030b0: 0xee200a21 FMULS   s0,s0,s3    ;7 cycle single precision multiply operation
 30    0x000030b4: 0xee000a82 FMACS   s0,s1,s4    ;11 cycle single precision multiply accumulate (uses
                                                   NFP multiply and add pipelines with bypassing of add
                                                   format stage)
 41    0x000030b8: 0xee010a22 FMACS   s0,s2,s5    ;11 cycle single precision multiply accumulate (uses
                                                   NFP multiply and add pipelines with bypassing of add
                                                   format stage)
 41    0x000030bc: 0xe12fff1e BX      lr          ;branch executed ‘for free’ in ARM pipeline, not seen
                                                   by NEON
 45    0x00002c54: 0xeeb01a4a FCPYS   s2,s20      ;4 cycle single precision register move
 80    0x00002c58: 0xeeb10ac0 FSQRTS  s0,s0       ;35 cycles to execute single precision square root
                                                   function (number of cycles is data dependent)
112    0x00002c5c: 0xeec10a00 FDIVS   s1,s2,s0    ;32 cycles to execute single precision divide
                                                   function (number of cycles is data dependent)
116    0x00002c60: 0xeeb00a69 FCPYS   s0,s19      ;4 cycle single precision register move
123    0x00002c64: 0xee600a20 FMULS   s1,s0,s1    ;7 cycle single precision multiply operation
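
The Cycle column in Example 16.8 appears consistent with a running total of the per-instruction cycle counts given in the timing descriptions, with the two branches contributing nothing because they execute in the ARM pipeline. The sketch below checks that reading. The list name `sequence` and the choice to charge branches 0 cycles are assumptions made for illustration.

```python
from itertools import accumulate

# (mnemonic, cycles charged to the VFP sequence), read off Example 16.8.
# Assumption: branches are charged 0 cycles because the table describes them
# as executed 'for free' in the ARM pipeline.
sequence = [
    ("FCPYS", 4), ("FCPYS", 4), ("FCPYS", 4), ("BL", 0),
    ("FMULS", 7), ("FMACS", 11), ("FMACS", 11), ("BX", 0),
    ("FCPYS", 4), ("FSQRTS", 35), ("FDIVS", 32), ("FCPYS", 4), ("FMULS", 7),
]

# Running total of cycles after each instruction.
completion_cycles = list(accumulate(cycles for _, cycles in sequence))

for (mnemonic, cycles), total in zip(sequence, completion_cycles):
    print(f"{total:4d}  {mnemonic:<7} (+{cycles})")
```

The running totals match the Cycle column entry for entry, ending at 123. The FSQRTS and FDIVS counts are data dependent according to the table, so the totals after cycle 45 hold only for the operand values used in this example.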

Copyright © 2006-2010 ARM Limited. All rights reserved. ARM DDI 0344K
Non-Confidential ID060510