
The compiler can provide diagnostic information that indicates where vectorization optimizations are successfully applied and where they could not be applied. See `--diag_suppress=optimizations` and `--diag_warning=optimizations` for more information.

Example 3.14 shows two functions that together add a constant value `x` to every element of an array. This code does not vectorize.

**Example 3.14. Non-vectorizable code**

```c
int addition(int a, int b)
{
    return a + b;
}

void add_int(int *pa, int *pb, unsigned int n, int x)
{
    unsigned int i;
    /* The out-of-line call to addition() prevents this loop from vectorizing. */
    for (i = 0; i < n; i++)
        *(pa + i) = addition(*(pb + i), x);
}
```

Using the `--diag_warning=optimizations` option produces an optimization warning message for the `addition()` function.

Adding the `__inline` qualifier to the definition of `addition()` enables this code to vectorize, but the result is still not optimal. Using the `--diag_warning=optimizations` option again produces optimization warning messages indicating that the loop vectorizes but that there is a potential pointer aliasing problem.
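To see why aliasing matters, the following sketch compares the element-by-element meaning of the loop with a hypothetical model of a 4-lane vectorized loop, which loads four elements before storing any results. The function names `add_scalar()` and `add_blocks_of_4()` are illustrative only, and the block model is a simplification of what NEON code does, not actual compiler output.

```c
/* Scalar semantics: exactly what the loop in add_int() means in C. */
void add_scalar(int *pa, const int *pb, unsigned int n, int x)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        pa[i] = pb[i] + x;
}

/* Hypothetical model of a 4-lane vectorized loop: load four elements of pb
   into a "register", then add x and store four results to pa.
   Assumes n is a multiple of 4. */
void add_blocks_of_4(int *pa, const int *pb, unsigned int n, int x)
{
    unsigned int i, k;
    for (i = 0; i < n; i += 4) {
        int lane[4];
        for (k = 0; k < 4; k++)   /* model of a vector load */
            lane[k] = pb[i + k];
        for (k = 0; k < 4; k++)   /* model of a vector add and store */
            pa[i + k] = lane[k] + x;
    }
}
```

With a zeroed nine-element buffer `buf`, calling each function as `f(buf + 1, buf, 8, 1)` leaves `0 1 2 3 4 5 6 7 8` for the scalar version but `0 1 1 1 1 2 1 1 1` for the block model, because each store in the scalar loop feeds the next load. Vectorizing is therefore only safe when the compiler can rule out such overlap.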

Because `pa` and `pb` might point to overlapping memory, the compiler must generate a runtime test for aliasing and output both vectorized and scalar copies of the code. Example 3.15 shows how to avoid this by using the `__restrict` keyword when you know that the pointers are not aliased.
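The runtime test can be pictured as the following hand-written sketch, assuming a simple conservative overlap check on the two address ranges. The helper `ranges_overlap()` and the function `add_int_versioned()` are hypothetical; the real test and the NEON code are generated by the compiler and might differ in detail. Both branches contain the same C loop, because the difference lies only in how the compiler translates each copy.

```c
#include <stdint.h>

/* Hypothetical conservative check: do the n-int ranges at a and b overlap? */
static int ranges_overlap(const int *a, const int *b, unsigned int n)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)n * sizeof(int);
    return ua < ub + bytes && ub < ua + bytes;
}

/* Model of the two copies emitted for add_int() without __restrict.
   Returns 1 if the vectorizable copy ran, 0 if the scalar copy ran. */
int add_int_versioned(int *pa, int *pb, unsigned int n, int x)
{
    unsigned int i;
    if (!ranges_overlap(pa, pb, n)) {
        /* Copy 1: the compiler would emit NEON instructions for this loop. */
        for (i = 0; i < n; i++)
            pa[i] = pb[i] + x;
        return 1;
    }
    /* Copy 2: plain scalar code, correct even when pa and pb overlap. */
    for (i = 0; i < n; i++)
        pa[i] = pb[i] + x;
    return 0;
}
```

The return value exists only to make the chosen path observable: separate arrays take the vectorizable copy, while a call such as `add_int_versioned(buf + 1, buf, 8, 1)` falls back to the scalar copy. Declaring the pointers `__restrict` lets the compiler drop both the test and the scalar copy.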

**Example 3.15. Using restrict to improve vectorization performance**

```c
__inline int addition(int a, int b)
{
    return a + b;
}

void add_int(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        *(pa + i) = addition(*(pb + i), x);
}
```

The final improvement concerns the number of loop iterations. In Example 3.15, the iteration count `n` is not fixed and might not be a multiple of the number of elements that fit exactly into a NEON register (four 32-bit `int` values in a 128-bit register). The compiler must therefore generate code that tests for remaining iterations and executes them using non-vectorized code. If you know that your iteration count is a multiple that NEON supports, such as a multiple of 4 here, you can indicate this to the compiler. Example 3.16 shows this final improvement, which obtains the best performance from vectorization.
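The extra work can be modelled by the following sketch of a strip-mined loop: a main loop that handles four elements per iteration, matching four 32-bit lanes, followed by a scalar remainder loop for the last `n & 3` elements. This is an illustrative model written in C99 with `restrict`, not the code that the compiler actually emits, and `add_int_strip_mined()` is a hypothetical name.

```c
/* Illustrative model only: one common loop shape a vectorizing compiler
   produces when it cannot assume that n is a multiple of 4.
   Returns the number of elements handled by the scalar remainder loop. */
unsigned int add_int_strip_mined(int *restrict pa, const int *restrict pb,
                                 unsigned int n, int x)
{
    unsigned int main_count = n & ~3u;  /* largest multiple of 4 <= n */
    unsigned int i, k;

    /* Main loop: in real NEON code, each iteration is one 4-lane operation. */
    for (i = 0; i < main_count; i += 4)
        for (k = 0; k < 4; k++)
            pa[i + k] = pb[i + k] + x;

    /* Remainder loop: at most 3 leftover elements, one at a time. */
    for (; i < n; i++)
        pa[i] = pb[i] + x;

    return n - main_count;              /* same as n & 3u */
}
```

For `n = 10`, the main loop covers eight elements and the remainder loop the last two, so the function returns 2. When `n` is a multiple of 4 the remainder loop never executes, which is the situation that Example 3.16 communicates to the compiler.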

**Example 3.16. Code tuned for best vectorization performance**

```c
__inline int addition(int a, int b)
{
    return a + b;
}

void add_int(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;
    for (i = 0; i < (n & ~3); i++) /* n is a multiple of 4 */
        *(pa + i) = addition(*(pb + i), x);
}
```
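Masking with `n & ~3` makes the loop stop at the largest multiple of 4 that does not exceed `n`, so if a caller passes a count that is not a multiple of 4, the last `n % 4` elements are silently left unprocessed. The sketch below restates Example 3.16 with the C99 spellings `inline` and `restrict` (an assumption for compilers other than armcc) and adds a hypothetical helper, `round_up_to_4()`, that a caller could use to size padded buffers.

```c
/* Example 3.16 restated with C99 keywords (assumes a C99 compiler). */
static inline int addition(int a, int b)
{
    return a + b;
}

void add_int(int *restrict pa, int *restrict pb, unsigned int n, int x)
{
    unsigned int i;
    for (i = 0; i < (n & ~3u); i++)   /* n is assumed to be a multiple of 4 */
        pa[i] = addition(pb[i], x);
}

/* Hypothetical helper: smallest multiple of 4 that is >= n, for sizing
   padded buffers so that every element is processed.
   Does not guard against overflow for n close to UINT_MAX. */
unsigned int round_up_to_4(unsigned int n)
{
    return (n + 3u) & ~3u;
}
```

For example, with `n = 10` only `pa[0]` to `pa[7]` are written. Allocating both arrays with `round_up_to_4(10)`, that is 12, elements and passing that count processes every element, provided the padding elements of `pb` are initialized.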