10.1 About the kernel auto-vectorizer and unroller

The OpenCL compiler includes a kernel auto-vectorizer and a kernel unroller. You must manually enable these features.

The kernel auto-vectorizer takes existing code and transforms it into vector code.

The unroller merges work-items by unrolling the bodies of the kernels.

If these operations are possible, they can provide substantial performance gains.
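
To picture what merging work-items means, the following is a conceptual sketch in plain C, not the output of the compiler. It simulates an NDRange on the host with a loop. The element-wise add kernel, the function names, and the factor of 4 are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical merge factor, chosen only for illustration. */
#define MERGE_FACTOR 4

/* Original form: one work-item processes one element. */
static void add_one_item(const float *a, const float *b, float *out, size_t gid)
{
    out[gid] = a[gid] + b[gid];
}

/* Merged form: one work-item processes MERGE_FACTOR adjacent elements.
 * A vectorizer could express the inner loop as a single vector operation. */
static void add_merged_item(const float *a, const float *b, float *out, size_t gid)
{
    size_t base = gid * MERGE_FACTOR;
    for (size_t i = 0; i < MERGE_FACTOR; ++i) {
        out[base + i] = a[base + i] + b[base + i];
    }
}

/* Host-side stand-in for enqueuing n scalar work-items. */
void run_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid) {
        add_one_item(a, b, out, gid);
    }
}

/* Host-side stand-in for enqueuing n / MERGE_FACTOR merged work-items.
 * Returns -1 if n is not a multiple of MERGE_FACTOR, 0 otherwise. */
int run_merged(const float *a, const float *b, float *out, size_t n)
{
    if (n % MERGE_FACTOR != 0) {
        return -1;
    }
    for (size_t gid = 0; gid < n / MERGE_FACTOR; ++gid) {
        add_merged_item(a, b, out, gid);
    }
    return 0;
}
```

Both forms produce the same results. The merged form runs fewer work-items, each doing more work, which is why the size of the range must divide evenly by the factor.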

For Bifrost and Valhall GPUs, you manually enable these features by passing the kernel transformation command-line options to the compiler.

There are several options to control the auto-vectorizer and unroller. The following table shows the basic options.

Table 10-1 Kernel auto-vectorizer and unroller options

Option                    Description
no option                 Kernel unroller and vectorizer enabled, with conservative heuristics.
-fno-kernel-vectorizer    Disable the kernel vectorizer.
-fno-kernel-unroller      Disable the kernel unroller.
-fkernel-vectorizer       Enable aggressive heuristics for the kernel vectorizer.
-fkernel-unroller         Enable aggressive heuristics for the kernel unroller.
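
As a minimal sketch, the following C helper assembles an options string from the entries in the table. It assumes that you pass the options to the compiler at run time through the options argument of clBuildProgram(). The enum and function names are hypothetical and are not part of any Arm or Khronos API.

```c
#include <stdio.h>

/* Hypothetical setting for one transformation. */
typedef enum {
    XFORM_DEFAULT,    /* pass no option for this transformation */
    XFORM_DISABLE,    /* -fno-kernel-...                        */
    XFORM_AGGRESSIVE  /* -fkernel-...                           */
} xform_mode;

/* Writes the selected options into buf, separated by a space.
 * Returns 0 on success, or -1 if buf is too small. */
int make_build_options(xform_mode vectorizer, xform_mode unroller,
                       char *buf, size_t len)
{
    const char *v = vectorizer == XFORM_DISABLE    ? "-fno-kernel-vectorizer"
                  : vectorizer == XFORM_AGGRESSIVE ? "-fkernel-vectorizer"
                  : "";
    const char *u = unroller == XFORM_DISABLE    ? "-fno-kernel-unroller"
                  : unroller == XFORM_AGGRESSIVE ? "-fkernel-unroller"
                  : "";
    const char *sep = (v[0] != '\0' && u[0] != '\0') ? " " : "";

    int n = snprintf(buf, len, "%s%s%s", v, sep, u);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;

    /* Assumed usage, not shown in this section:
     *   clBuildProgram(program, 1, &device, buf, NULL, NULL);
     */
}
```

For example, calling make_build_options(XFORM_DISABLE, XFORM_AGGRESSIVE, buf, sizeof buf) fills buf with "-fno-kernel-vectorizer -fkernel-unroller".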


The kernel auto-vectorizer performs a code transformation. For the transformation to be possible, several conditions must be met:

  • The enqueued NDRange must be a multiple of the vectorization factor.
  • Barriers are not permitted in the kernel.
  • Thread-divergent code is not permitted in the kernel.
  • Global offsets are not permitted in the enqueued NDRange.
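
Two of these conditions depend on how you enqueue the kernel, so you can check them on the host before the enqueue call. The following is a minimal sketch of such a check. The function name is hypothetical, and the vectorization factor is a value you supply because this section does not state it. The sketch also assumes that the multiple-of-factor condition applies to the first dimension and that an all-zero offset counts as no offset. The barrier and divergence conditions depend on the kernel source and cannot be checked this way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the enqueue parameters satisfy the two NDRange conditions.
 * work_dim, global_size, and global_offset mirror the arguments of
 * clEnqueueNDRangeKernel(). vec_factor is an assumed, caller-supplied value. */
bool ndrange_allows_vectorization(unsigned work_dim,
                                  const size_t *global_size,
                                  const size_t *global_offset,
                                  size_t vec_factor)
{
    if (work_dim == 0 || global_size == NULL || vec_factor == 0) {
        return false;
    }

    /* Assumption: the first dimension must be a multiple of the factor. */
    if (global_size[0] % vec_factor != 0) {
        return false;
    }

    /* Assumption: NULL or all-zero offsets count as "no global offset". */
    if (global_offset != NULL) {
        for (unsigned d = 0; d < work_dim; ++d) {
            if (global_offset[d] != 0) {
                return false;
            }
        }
    }
    return true;
}
```

If the check fails, one option is to pad the global size up to the next multiple of the factor and guard the extra work-items in the kernel, but such a guard can itself introduce divergent code.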

Non-Confidential 101574_0302_00_en
Copyright © 2019 Arm Limited or its affiliates. All rights reserved.