3.4.9.  Advanced SIMD load-store instructions

Table 3.9 shows the Advanced SIMD load-store instruction timing.

The values in Table 3.9 are the number of issue cycles required within the MPE execution unit. The number of cycles required by the Cortex-A9 integer processor equals the number of 64-bit aligned doublewords that the NEON load or store data overlaps.
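The doubleword rule for the integer processor can be illustrated with a small C sketch. This is only an illustration of the arithmetic, assuming the access is a contiguous byte range; the function name `doublewords_overlapped` and its parameters are hypothetical and do not come from the manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 64-bit aligned doublewords that the
 * byte range [addr, addr + size) overlaps. Returns 0 for an empty range. */
static unsigned doublewords_overlapped(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;

    uintptr_t first = addr >> 3;              /* doubleword holding the first byte */
    uintptr_t last  = (addr + size - 1) >> 3; /* doubleword holding the last byte  */

    return (unsigned)(last - first + 1);
}
```

Under this reading, a 16-byte access that starts on a doubleword boundary overlaps two doublewords, while the same access starting 4 bytes past a boundary overlaps three.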

Table 3.9. Advanced SIMD load-store instructions

Name  Format                  Cycles  Source    Result    Writeback
VLD1  {Dd},[]                 2       -         2         7
VLD1  {Dd},[@]                1       -         1         6
VLD1  {Dd,Dd1},[]             2       -,-       2,2       7,7
VLD1  {Dd,Dd1},[@]            1       -,-       1,1       6,6
VLD1  {Dd,Dd1,Dd2},[]         3       -,-,-     2,2,3     7,7,8
VLD1  {Dd,Dd1,Dd2},[@]        2       -,-,-     1,1,2     6,6,7
VLD1  {Dd,Dd1,Dd2,Dd3},[]     3       -,-,-,-   2,2,3,3   7,7,8,8
VLD1  {Dd,Dd1,Dd2,Dd3},[@]    2       -,-,-,-   1,1,2,2   6,6,7,7
VLD1  {Dd[x]},[]              3       1         4         8
VLD1  {Dd[x]},[@]             2       1         3         7
VLD1  {Dd[]},[]               2       -         3         7
VLD1  {Dd[]},[@]              1       -         2         6
VLD1  {Dd[],Dd1[]},[]         2       -,-       3,3       7,7
VLD1  {Dd[],Dd1[]},[@]        1       -,-       2,2       6,6
VLD2  {Dd,Dd1},[]             2       -,-       3,3       7,7
VLD2  {Dd,Dd1},[@]            1       -,-       2,2       6,6
VLD2  {Dd,Dd1,Dd2,Dd3},[]     3       -,-,-,-   3,4,3,4   7,8,7,8
VLD2  {Dd,Dd1,Dd2,Dd3},[@]    2       -,-,-,-   2,3,2,3   6,7,6,7
VLD2  {Dd[x],Dd1[x]},[]       3       1,1       4,4       8,8
VLD2  {Dd[x],Dd1[x]},[@]      2       1,1       3,3       7,7
VLD2  {Dd[],Dd1[]},[]         2       -,-       3,3       7,7
VLD2  {Dd[],Dd1[]},[@]        1       -,-       2,2       6,6
VLD3  {Dd,Dd1,Dd2},[]         4       -,-,-     4,4,5     8,8,9
VLD3  {Dd,Dd1,Dd2},[@]        3       -,-,-     3,3,4     7,7,8
VLD3  {Dd[x],…,Dd2[x]},[]     5       1,1,2     5,5,6     9,9,10
VLD3  {Dd[],Dd1[],Dd2[]},[]   3       -,-,-     3,3,4     7,7,8
VLD4  {Dd,Dd1,Dd2,Dd3},[]     4       -,-,-,-   4,4,5,5   8,8,9,9
VLD4  {Dd,Dd1,Dd2,Dd3},[@]    3       -,-,-,-   3,3,4,4   7,7,8,8
VLD4  {Dd[x],…,Dd3[x]},[]     5       1,1,2,2   5,5,6,6   9,9,10,10
VLD4  {Dd[x],…,Dd3[x]},[@]    4       1,1,2,2   4,4,5,5   8,8,9,9
VLD4  {Dd[],…,Dd3[]},[]       3       -,-,-,-   3,3,4,4   7,7,8,8
VLD4  {Dd[],…,Dd3[]},[@]      2       -,-,-,-   2,2,3,3   6,6,7,7
VST1  {Dd},[]                 2       1         -         -
VST1  {Dd},[@]                1       1         -         -
VST1  {Dd[x]},[]              2       1         -         -
VST1  {Dd[x]},[@]             1       1         -         -
VST2  {Dd,Dd1},[]             2       1,1       -         -
VST2  {Dd,Dd1},[@]            1       1,1       -         -
VST2  {Dd[x],Dd1[x]},[]       2       1,1       -         -
VST2  {Dd[x],Dd1[x]},[@]      1       1,1       -         -
VST3  {Dd,Dd1,Dd2},[]         3       1,1,2     -         -
VST3  {Dd,Dd1,Dd2},[@]        2       1,1,2     -         -
VST3  {Dd[x],…,Dd2[x]},[]     3       1,1,2     -         -
VST4  {Dd,Dd1,Dd2,Dd3},[]     3       1,1,2,2   -         -
VST4  {Dd,Dd1,Dd2,Dd3},[@]    2       1,1,2,2   -         -
VST4  {Dd[x],…,Dd3[x]},[]     3       1,1,2,2   -         -
VST4  {Dd[x],…,Dd3[x]},[@]    2       1,1,2,2   -         -

Copyright © 2008-2010 ARM. All rights reserved. ARM DDI 0409F
Non-Confidential ID050110