7.4. NEON coding alternatives

NEON code can be written in several ways: with intrinsics, through automatic vectorization of C code, by using libraries, or directly in assembly language. Each is described briefly here; see the ARM NEON Programmers Guide for details.

Intrinsics are C or C++ pseudo-function calls that the compiler replaces with the appropriate NEON instructions. They give you access to the data types and operations available in the NEON implementation, while leaving instruction scheduling and register allocation to the compiler. These intrinsics are defined in the ARM C Language Extensions document.
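As an illustration of the intrinsics approach, the following is a minimal sketch rather than code taken from this guide. The function name add_f32 and its parameters are hypothetical. It assumes a toolchain that provides <arm_neon.h> and defines __ARM_NEON when NEON is available; when it is not, the plain C loop does all the work, so the same source builds on other targets.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n single-precision floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Process four lanes per iteration using 128-bit vectors. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* add and store 4 results */
    }
#endif

    /* Scalar tail, and the complete fallback when NEON is unavailable. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Note that the code names no registers and fixes no instruction order. The compiler chooses both, which is the division of labor described above.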

In ARM Compiler 6, auto-vectorization is controlled with the -fvectorize option. It is enabled automatically at higher optimization levels (-O2 and above) and is always disabled at -O0, even if you specify -fvectorize. To enable auto-vectorization at -O1, you must therefore specify the option explicitly:

  armclang --target=armv8a-arm-none-eabi -fvectorize -O1 -c file.c 
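The kind of source that an auto-vectorizer can typically handle might look like the sketch below. This is an illustrative assumption, not an example from this guide: the function name saxpy and its parameters are hypothetical, and whether the loop is actually vectorized depends on the compiler version and options in use. The loop has a simple trip count, contains no function calls, and uses the C99 restrict qualifier to state that the two arrays do not overlap.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] = alpha * x[i] + y[i] for n elements.
 * restrict is a promise that x and y do not alias, which removes one
 * common obstacle to vectorization. */
void saxpy(float *restrict y, const float *restrict x, float alpha, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Placed in file.c, this function could be built with the command shown above. The source stays portable C, so it remains correct even when the compiler decides not to vectorize the loop.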

Various libraries are available that make use of NEON code. The status of these libraries changes over time, so current support is not covered in this guide.

Although it is technically possible to optimize NEON assembly by hand, this can be very difficult because the pipeline and memory access timings have complex inter-dependencies. Instead of hand assembly, ARM strongly recommends the use of intrinsics:
