3.27 What can limit or prevent automatic vectorization

The following table summarizes what can limit or prevent automatic vectorization of loops.

Table 3-5 Factors that limit or prevent automatic vectorization

Inhibiting factor Extent to which it applies
Not having a valid NEON compiler license. You might require a valid NEON compiler license to generate NEON instructions, depending on your compiler version. RVCT 3.1 or later, and ARM Compiler 4.1, require a valid NEON compiler license. ARM Compiler 5.01 and later do not require a separate NEON compiler license.

Source code without loops. Automatic vectorization involves loop analysis. Without loops, automatic vectorization cannot apply.
Target processor. The target processor (--cpu) must have NEON capability if NEON instructions are to be generated. Examples of such processors are Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A12, and Cortex-A15.
Floating-point code. Vectorization of floating-point code does not always occur automatically. For example, loops that require re-association only vectorize when compiled with --fpmode fast.
--no_vectorize by default. By default, generation of NEON vector instructions directly from C or C++ code is disabled, and must be enabled with --vectorize.
-Otime not specified. -Otime must be specified to reduce execution time and enable loops to vectorize.
-Onum not set high enough. The optimization level you set must be -O2 or -O3. Loops do not vectorize at -O0 or -O1.
Risk of incorrect results. If there is a risk of an incorrect result, vectorization is not applied where that risk occurs. You might have to manually tune your code to make it more suitable for automatic vectorization.
Earlier manual optimization attempts. Automatic vectorization can be impeded by earlier manual optimization attempts, for example, manual loop unrolling in the source code or complex array accesses.
No vector access pattern. If variables in a loop lack a vector access pattern, the compiler cannot automatically vectorize the loop.
Data dependencies between different iterations of a loop. Where there is a possibility of the use and storage of arrays overlapping on different iterations of a loop, there is a data dependency problem. A loop cannot be safely vectorized if the vector order of operations can change the results, so the compiler leaves the loop in its original form or only partially vectorizes the loop.
Memory hierarchy. Performing relatively few arithmetic operations on large data sets retrieved from main memory is limited by the memory bandwidth of the system. Most processors are relatively unbalanced between memory bandwidth and processor capacity. This can adversely affect the automatic vectorization process.
Iteration count not fixed at start of loop. For automatic vectorization, it is generally best to write simple loops with iterations that are fixed at the start of the loop. If a loop does not have a fixed iteration count, automatic addressing is not possible.
Conditional loop exits. It is best to write loops that do not contain conditional exits from the loop.
Carry-around scalar variables. Carry-around scalar variables are a problem for automatic vectorization because the value computed in one pass of the loop is carried forward into the next pass.
__promise(expr) not used. If you do not use __promise(expr) in places where it would give the compiler information that helps it vectorize a loop, automatic vectorization can be limited.
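Pointer aliasing. Potential pointer aliasing prevents the use of automatically vectorized code, because the compiler cannot rule out that a store through one pointer changes data that is read through another.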
Indirect addressing. Indirect addressing is not vectorizable because the NEON unit can only deal with vectors stored consecutively in memory.
Separating access to different parts of a structure into separate loops. Each part of a structure must be accessed within the same loop for automatic vectorization to occur.
Inconsistent length of members within a structure. If members of a structure are not all the same length, the compiler does not attempt to use vector loads.
Calls to non-inline functions. Calls to non-inline functions from within a loop inhibit vectorization. If such functions are to be considered for vectorization, they must be marked with the __inline or __forceinline keywords.
if and switch statements. Extensive use of if and switch statements can affect the efficiency of automatic vectorization.
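
Several of the factors in the table can be illustrated with a short sketch in standard C. It is not taken from the compiler documentation, and whether a given loop actually vectorizes depends on the compiler version and options. The function and type names (add_arrays, prefix_sum, struct pixel, clear_pixels) are hypothetical, and the restrict keyword assumes C99 mode (with armcc, this is assumed to mean compiling with --c99 or --restrict).

```c
#include <stddef.h>

/* Likely candidate for vectorization: the iteration count n is fixed
   when the loop starts, the arrays are accessed consecutively, there
   is no conditional exit, and the restrict qualifiers state that the
   arrays do not overlap (no pointer aliasing). */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}

/* Unlikely to vectorize: each iteration reads the element written by
   the previous iteration, which is a data dependency between
   different iterations of the loop. */
void prefix_sum(int *x, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        x[i] = x[i] + x[i - 1];
    }
}

/* Structure whose members all have the same length. */
struct pixel {
    unsigned char r;
    unsigned char g;
    unsigned char b;
};

/* All members of the structure are accessed within the same loop,
   rather than in one loop per member. */
void clear_pixels(struct pixel *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        buf[i].r = 0;
        buf[i].g = 0;
        buf[i].b = 0;
    }
}
```

In this sketch, add_arrays and clear_pixels follow the guidance in the table, while prefix_sum shows the kind of loop that the compiler is expected to leave in its original form.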

You can use --diag_warning=optimizations to obtain compiler diagnostics on what can and cannot be vectorized.
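
As a rough illustration of how the options named in this section might be combined, the following sketch shows a simple loop together with a possible command line in a comment. The file name scale.c, the function scale, and the LOOP_PROMISE wrapper macro are hypothetical. The wrapper is an assumption made so that the code also builds with compilers other than armcc, where it falls back to assert(). The exact command line for your toolchain version might differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: __promise is only available in armcc (ARM Compiler 5 and
   earlier). On other compilers this hypothetical wrapper checks the
   same expression with assert() instead. */
#if defined(__ARMCC_VERSION) && (__ARMCC_VERSION < 6000000)
#define LOOP_PROMISE(e) __promise(e)
#else
#define LOOP_PROMISE(e) assert(e)
#endif

/* Possible armcc invocation (hypothetical file name):
 *
 *   armcc --c99 --cpu=Cortex-A8 --vectorize -O2 -Otime \
 *         --diag_warning=optimizations -c scale.c
 *
 * --vectorize, -O2 (or -O3), and -Otime are the options the table
 * lists as required; --diag_warning=optimizations requests
 * diagnostics on what can and cannot be vectorized. */
void scale(int *restrict dst, const int *restrict src, int n, int k)
{
    int i;

    /* Tell the compiler that n is positive and a multiple of 4. */
    LOOP_PROMISE((n > 0) && ((n & 3) == 0));

    for (i = 0; i < n; i++) {
        dst[i] = src[i] * k;
    }
}
```

The caller is responsible for making the promise true. Calling scale with a value of n that is not a positive multiple of 4 breaks the promise.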

Non-Confidential ARM DUI0472M
Copyright © 2010-2016 ARM Limited or its affiliates. All rights reserved.