10.1 About the kernel auto-vectorizer and unroller

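The OpenCL compiler includes a kernel auto-vectorizer and a kernel unroller:
  • The kernel auto-vectorizer takes existing code and transforms it into vector code.
  • The kernel unroller merges work-items by unrolling the bodies of the kernels.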

When the compiler can apply these transformations, they can provide substantial performance gains.
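As a conceptual model only (this is not the code the compiler generates), the following C sketch emulates a SAXPY-style kernel on the host. `saxpy_item` is the body that one work-item runs; `saxpy_item_x4` shows what a merged work-item could look like after the bodies of four consecutive work-items are combined, which a vectorizer could then map onto 4-wide vector operations. The factor of 4 and all function names are hypothetical. The emulated merged launch covers every original work-item only when the global size is a multiple of the factor, which illustrates why the NDRange must be a multiple of the vectorization factor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factor, for illustration only; the real
 * factor is chosen by the compiler. saxpy_item_x4 assumes VF == 4. */
#define VF 4

/* Scalar kernel body: the work a single work-item does for global ID gid. */
static void saxpy_item(size_t gid, float alpha, const float *x, float *y)
{
    y[gid] = alpha * x[gid] + y[gid];
}

/* Conceptual merged work-item: one invocation does the work of VF
 * consecutive work-items. A vectorizer could map these four lanes onto a
 * single vector operation; here they are written out as an unrolled body. */
static void saxpy_item_x4(size_t vgid, float alpha, const float *x, float *y)
{
    size_t base = vgid * VF;
    y[base + 0] = alpha * x[base + 0] + y[base + 0];
    y[base + 1] = alpha * x[base + 1] + y[base + 1];
    y[base + 2] = alpha * x[base + 2] + y[base + 2];
    y[base + 3] = alpha * x[base + 3] + y[base + 3];
}

/* Emulates enqueuing an NDRange of global_size scalar work-items. */
void run_scalar(size_t global_size, float alpha, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        saxpy_item(gid, alpha, x, y);
}

/* Emulates the transformed launch: only global_size / VF merged work-items
 * run, so every original work-item is covered only when global_size is a
 * multiple of VF. Returns -1 without touching y otherwise. */
int run_merged(size_t global_size, float alpha, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    if (global_size % VF != 0)
        return -1;
    for (size_t vgid = 0; vgid < global_size / VF; ++vgid)
        saxpy_item_x4(vgid, alpha, x, y);
    return 0;
}
```

For a global size of 8, both launches produce identical results; for a global size of 6, the merged launch is not applicable and `run_merged` returns -1.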
There are several options to control the auto-vectorizer and unroller. The following table shows the basic options.

Table 10-1 Kernel auto-vectorizer and unroller options

Option                    Description
(no option)               Kernel vectorizer and unroller both enabled, with conservative heuristics
-fno-kernel-vectorizer    Disable the kernel vectorizer
-fno-kernel-unroller      Disable the kernel unroller
-fkernel-vectorizer       Enable aggressive heuristics for the kernel vectorizer
-fkernel-unroller         Enable aggressive heuristics for the kernel unroller
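The options in Table 10-1 are build options, so they would be passed in the `options` string argument of `clBuildProgram()`. The following C sketch assembles that string from a per-transformation setting. The `xform_mode` enum and `make_build_options` function are hypothetical helpers, not part of the OpenCL API or the Arm toolchain, and the sketch assumes a vectorizer option and an unroller option can be combined in one string, separated by a space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical setting for one transformation, mirroring Table 10-1. */
enum xform_mode {
    XFORM_DEFAULT,    /* no option: enabled with conservative heuristics */
    XFORM_DISABLED,   /* -fno-kernel-vectorizer / -fno-kernel-unroller */
    XFORM_AGGRESSIVE  /* -fkernel-vectorizer / -fkernel-unroller */
};

/* Hypothetical helper: writes the build-option string for the chosen
 * vectorizer and unroller modes into buf (NUL-terminated). Returns the
 * string length, or -1 if buf is too small. */
int make_build_options(enum xform_mode vec, enum xform_mode unroll,
                       char *buf, size_t size)
{
    static const char *const vec_flags[] = {
        "", "-fno-kernel-vectorizer", "-fkernel-vectorizer"
    };
    static const char *const unroll_flags[] = {
        "", "-fno-kernel-unroller", "-fkernel-unroller"
    };
    assert(buf != NULL);
    assert((unsigned)vec <= XFORM_AGGRESSIVE &&
           (unsigned)unroll <= XFORM_AGGRESSIVE);

    const char *a = vec_flags[vec];
    const char *b = unroll_flags[unroll];
    size_t la = strlen(a), lb = strlen(b);
    size_t sep = (la > 0 && lb > 0) ? 1 : 0; /* space only between two flags */

    if (la + sep + lb + 1 > size)
        return -1;
    memcpy(buf, a, la);
    if (sep)
        buf[la] = ' ';
    memcpy(buf + la + sep, b, lb);
    buf[la + sep + lb] = '\0';
    return (int)(la + sep + lb);
}
```

For example, `make_build_options(XFORM_AGGRESSIVE, XFORM_DISABLED, opts, sizeof opts)` yields `"-fkernel-vectorizer -fno-kernel-unroller"`, which could then be passed as `clBuildProgram(program, 1, &device, opts, NULL, NULL)`. Two default settings yield an empty string, which leaves both transformations at their conservative defaults.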

Note

The kernel auto-vectorizer performs a code transformation that is only possible when all of the following conditions are met:
  • The enqueued NDRange must be a multiple of the vectorization factor.
  • Barriers are not permitted in the kernel.
  • Thread-divergent code (control flow that differs between work-items) is not permitted in the kernel.
  • Global offsets are not permitted in the enqueued NDRange.
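Two of these conditions concern the NDRange rather than the kernel source, so they can be checked on the host before enqueuing. The following C sketch mirrors the `work_dim`, `global_work_offset`, and `global_work_size` arguments of `clEnqueueNDRangeKernel()`. The function name is hypothetical, and so is the `vf` parameter: the vectorization factor is assumed to be known to the caller. Because the note does not say which dimension is vectorized, the sketch conservatively requires every dimension to be a multiple of `vf`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side check of the NDRange conditions from the note:
 * no global offset, and a global size that is a multiple of the
 * vectorization factor vf. Barriers and thread divergence are properties
 * of the kernel source and cannot be checked here. */
bool ndrange_allows_vectorization(size_t work_dim,
                                  const size_t *global_work_offset,
                                  const size_t *global_work_size,
                                  size_t vf)
{
    assert(global_work_size != NULL && vf > 0);
    for (size_t d = 0; d < work_dim; ++d) {
        /* Assumption: check every dimension, since the note does not say
         * which dimension the vectorizer works along. */
        if (global_work_size[d] % vf != 0)
            return false;
        /* A NULL offset array means no global offset. */
        if (global_work_offset != NULL && global_work_offset[d] != 0)
            return false;
    }
    return true;
}
```

A `true` result only means these two preconditions hold; it does not indicate whether the compiler actually vectorized the kernel.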
Non-Confidential  ARM 100614_0300_00_en
Copyright © 2012, 2013, 2015, 2016 ARM. All rights reserved.