|Non-Confidential||ARM DUI0472M|
What can limit or prevent automatic vectorization
The following table summarizes what can limit or prevent automatic vectorization of loops.
Table 3-5 Factors that limit or prevent automatic vectorization
|Inhibiting factor||Extent to which it applies|
|Not having a valid NEON compiler license.||
You might require a valid NEON compiler license to generate NEON instructions, depending on your compiler version.
RVCT 3.1 or later, and ARM Compiler 4.1, require a valid NEON compiler license.
ARM Compiler 5.01 and later do not require a separate NEON compiler license.
|Source code without loops.||Automatic vectorization involves loop analysis. Without loops, automatic vectorization cannot apply.|
|Target processor.||The target processor (
|Floating-point code.||Vectorization of floating-point code does not
always occur automatically. For example, loops that require re-association
only vectorize when compiled with
||By default, generation of NEON vector instructions directly
from C or C++ code is disabled, and must be enabled with
||The optimization level you set must be
|Risk of incorrect results.||If there is a risk of an incorrect result, vectorization is not applied where that risk occurs. You might have to manually tune your code to make it more suitable for automatic vectorization.|
|Earlier manual optimization attempts.||Automatic vectorization can be impeded by earlier manual optimization attempts, for example manual loop unrolling in the source code or complex array accesses.|
|No vector access pattern.||If variables in a loop lack a vector access pattern, the compiler cannot automatically vectorize the loop.|
|Data dependencies between different iterations of a loop.||If an array element that is stored in one iteration of a loop might be used in a different iteration, there is a data dependency problem. A loop cannot be safely vectorized if the vector order of operations can change the results, so the compiler leaves the loop in its original form or only partially vectorizes the loop.|
|Memory hierarchy.||Performing relatively few arithmetic operations on large data sets retrieved from main memory is limited by the memory bandwidth of the system. Most processors are relatively unbalanced between memory bandwidth and processor capacity. This can adversely affect the automatic vectorization process.|
|Iteration count not fixed at start of loop.||For automatic vectorization, it is generally best to write simple loops with iterations that are fixed at the start of the loop. If a loop does not have a fixed iteration count, automatic addressing is not possible.|
|Conditional loop exits.||It is best to write loops that do not contain conditional exits from the loop.|
|Carry-around scalar variables.||Carry-around scalar variables are a problem for automatic vectorization because the value computed in one pass of the loop is carried forward into the next pass.|
||Failure to use
|Pointer aliasing.||Pointer aliasing prevents the use of automatically vectorized code.|
|Indirect addressing.||Indirect addressing is not vectorizable because the NEON unit can only deal with vectors stored consecutively in memory.|
|Separating access to different parts of a structure into separate loops.||Each part of a structure must be accessed within the same loop for automatic vectorization to occur.|
|Inconsistent length of members within a structure.||If members of a structure are not all the same length, the compiler does not attempt to use vector loads.|
|Calls to non-inline functions.||Calls to non-inline functions from within a
loop inhibit vectorization. If such functions are to be considered
for vectorization, they must be marked with the
||Extensive use of
You can use
obtain compiler diagnostics on what can and cannot be vectorized.
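
Several of the factors in Table 3-5 are easier to recognize in source form. The following is a minimal C99 sketch, not taken from the ARM documentation. All function names (`add_arrays`, `prefix_sum`, `diff_carry`, `diff_indexed`, `gather`) are hypothetical and chosen for illustration only. The comments describe which table row each loop relates to. Whether a particular compiler version actually vectorizes a given loop depends on its options and target, so treat the comments as expectations rather than guarantees.

```c
#include <stddef.h>

/* Vectorization-friendly shape: iteration count fixed at loop entry,
 * no conditional exit, unit-stride accesses, independent iterations.
 * The C99 restrict qualifiers promise that out, a and b do not overlap,
 * which removes the pointer aliasing concern. */
void add_arrays(int *restrict out, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Data dependency between iterations: iteration i uses the value that
 * iteration i - 1 stored, so changing the order of operations would
 * change the result. */
void prefix_sum(int *x, size_t n)
{
    for (size_t i = 1; i < n; i++)
        x[i] = x[i] + x[i - 1];
}

/* Carry-around scalar: prev is computed in one pass of the loop and
 * carried forward into the next pass. */
void diff_carry(int *out, const int *in, size_t n)
{
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - prev;
        prev = in[i];
    }
}

/* The same result without the carried scalar: each iteration reads
 * in[i] and in[i - 1] directly, so iterations are independent. */
void diff_indexed(int *restrict out, const int *restrict in, size_t n)
{
    if (n == 0)
        return;
    out[0] = in[0];
    for (size_t i = 1; i < n; i++)
        out[i] = in[i] - in[i - 1];
}

/* Indirect addressing: src is read through idx, so the elements loaded
 * are not necessarily consecutive in memory. */
void gather(int *restrict out, const int *restrict src,
            const size_t *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = src[idx[i]];
}
```

In this sketch, `add_arrays` and `diff_indexed` are the loops most likely to be candidates for automatic vectorization, while `prefix_sum`, `diff_carry`, and `gather` each illustrate one inhibiting factor from the table. Note that `restrict` is a promise made by the programmer: calling `add_arrays` with overlapping arrays is undefined behavior.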