3.10 NEON vectorization performance goals

Most applications require tuning to gain the best performance from vectorization. There is always some overhead, so the theoretical maximum performance cannot be reached in practice.

For example, the NEON unit can process four single-precision floating-point values at a time. This means that the theoretical maximum speedup for a single-precision floating-point application is a factor of four over the original scalar, nonvectorized code.
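The factor of four is an upper bound, which the following minimal sketch illustrates. It assumes C99 (for restrict) and a simplified cost model in which each full group of four elements costs one vector iteration and each leftover element costs one scalar iteration. The names LANES, vectorized_iterations, ideal_speedup, and add_floats are hypothetical and are not part of the compiler or any library. The model ignores load, store, alignment, and loop setup costs, so real results are lower still.

```c
#include <stddef.h>

/* Assumption: four 32-bit floats are processed per vector operation. */
#define LANES 4

/* Hypothetical helper: iteration count after ideal vectorization.
   One iteration per full vector, plus one per leftover element. */
size_t vectorized_iterations(size_t n)
{
    return n / LANES + n % LANES;
}

/* Hypothetical helper: best-case speedup over the scalar loop under the
   simplified model above. All other overhead is ignored. */
double ideal_speedup(size_t n)
{
    size_t iters = vectorized_iterations(n);
    return iters == 0 ? 1.0 : (double)n / (double)iters;
}

/* A simple loop of the kind that is a candidate for automatic
   vectorization. restrict states that the arrays do not overlap. */
void add_floats(float * restrict out, const float * restrict a,
                const float * restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Under this model, a trip count of 16 gives the full factor of four, while a trip count of 10 gives only 2.5 because two elements are left over for scalar processing.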

Related concepts
3.6 Automatic vectorization
3.17 Nonvectorization on conditional loop exits
3.16 Nonvectorization on loops containing pointers and indirect addressing
3.15 Vectorization on loops containing pointers
3.14 Reduction of a vector to a scalar
3.13 Carry-around scalar variables and vectorization
3.12 Data dependency conflicts when vectorizing code
3.9 Factors affecting NEON vectorization performance
Related reference
8.189 --vectorize, --no_vectorize
3.11 Recommended loop structure for vectorization
Non-Confidential ARM DUI0472J
Copyright © 2010-2013 ARM. All rights reserved.