3.4.1. General performance issues

Using the command-line options -O3 and -Otime ensures that the code achieves significant performance benefits in addition to those of vectorization.

When optimizing for performance, you must consider the high-level algorithm structure, data element size, array configurations, strict iterative loops, reduction operations, and data dependency issues. Optimizing for performance also requires an understanding of where the program spends most of its time. To gain maximum performance benefits, you might need to profile and benchmark the code under realistic conditions.
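As a rough illustration of benchmarking, the following sketch times a small kernel with the standard C `clock()` function. The names `scale` and `time_scale` are hypothetical and do not come from this guide. A real measurement would normally use a profiler and realistic input data rather than this simple harness.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel to measure: scales n floats in place. */
static void scale(float *data, size_t n, float factor)
{
    size_t i;
    for (i = 0; i < n; i++)
        data[i] *= factor;
}

/* Returns the average CPU time, in seconds, of one call to scale(),
   measured over `repeats` calls. `repeats` must be non-zero.
   A factor of 1.0f is used so that the data is left unchanged. */
static double time_scale(float *data, size_t n, unsigned repeats)
{
    unsigned r;
    clock_t start = clock();
    for (r = 0; r < repeats; r++)
        scale(data, n, 1.0f);
    return (double)(clock() - start) / CLOCKS_PER_SEC / repeats;
}
```

Timing the same kernel at several array sizes and comparing builds with and without vectorization can indicate whether a given loop is worth further attention.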

Prior manual optimization of the code, such as manual loop unrolling in the source code or complex array accesses, can often impede automatic vectorization. For best results, write the code using simple loops so that the compiler can perform all of the optimization. For hand-optimized legacy code, you might find it easier to rewrite critical portions based on the original algorithm using simple loops. Removing manual optimizations in this way can help automatic vectorization.
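The difference between a hand-unrolled loop and a simple loop can be sketched as follows. Both functions are hypothetical examples, not taken from this guide, and compute the same element-wise sum. Whether a particular compiler vectorizes either form depends on the compiler version and options, so treat this as an illustration of the coding style only.

```c
#include <stddef.h>

/* Hand-unrolled by four, with a cleanup loop for the remaining
   elements. This kind of manual optimization can make the loop
   structure harder for a vectorizing compiler to recognize. */
void add_unrolled(int *dst, const int *a, const int *b, size_t n)
{
    size_t i;
    for (i = 0; i + 3 < n; i += 4) {
        dst[i]     = a[i]     + b[i];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* The same algorithm written as one simple loop, leaving unrolling
   and vectorization decisions to the compiler. */
void add_simple(int *dst, const int *a, const int *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The simple form is also shorter and easier to check against the original algorithm, which helps when rewriting legacy code.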

For more information see:

Copyright © 2007 ARM Limited. All rights reserved. ARM DUI 0350A
Non-Confidential