5.8.3 Determining the local work-group size

You can specify the size of the work-group that OpenCL uses when you enqueue a kernel to execute on a device. To do this, you must know the maximum work-group size permitted by the OpenCL device your work-items execute on. To find the maximum work-group size for a specific kernel, use the clGetKernelWorkGroupInfo() function and request the CL_KERNEL_WORK_GROUP_SIZE property.

If your application is not required to share data among work-items, set the local_work_size parameter of clEnqueueNDRangeKernel() to NULL when enqueuing your kernel. This enables the OpenCL driver to determine an efficient work-group size for your kernel, but this might not be the optimal work-group size.

To get the maximum work-group size in each dimension, call clGetDeviceInfo() with CL_DEVICE_MAX_WORK_ITEM_SIZES. This provides the maximum sizes for the simplest kernel, and the maximum sizes might be lower for more complex kernels. The product of the sizes in all dimensions must not exceed the maximum work-group size, so you might not be able to use the per-dimension maximum in every dimension at the same time.
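The way the per-dimension limits and the overall limit interact can be sketched as a small host-side check. This is a minimal sketch in plain C under stated assumptions, not part of the OpenCL API: the helper name work_group_fits() is hypothetical, and it assumes you have already filled max_item_sizes from clGetDeviceInfo() with CL_DEVICE_MAX_WORK_ITEM_SIZES and max_group_size from clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL function.
 * Returns 1 if the candidate local work size respects both the
 * per-dimension limits and the overall work-group size limit,
 * otherwise returns 0.
 *
 * local[]          candidate work-group size in each dimension
 * max_item_sizes[] values reported for CL_DEVICE_MAX_WORK_ITEM_SIZES
 * dims             number of dimensions in use
 * max_group_size   value reported for CL_KERNEL_WORK_GROUP_SIZE
 */
int work_group_fits(const size_t local[], const size_t max_item_sizes[],
                    size_t dims, size_t max_group_size)
{
    size_t total = 1; /* running product of the sizes checked so far */

    assert(dims >= 1);

    for (size_t d = 0; d < dims; ++d) {
        /* Each dimension must be non-zero and within its own limit. */
        if (local[d] == 0 || local[d] > max_item_sizes[d])
            return 0;

        /* Overflow-safe form of: total * local[d] > max_group_size */
        if (local[d] > max_group_size / total)
            return 0;

        total *= local[d];
    }
    return 1;
}
```

For example, with hypothetical limits of 256 in each dimension and an overall limit of 256, a 16 x 16 x 1 work-group fits, but a 256 x 256 x 1 work-group does not, even though each dimension is individually within its limit.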

Note:

To get the maximum work-group size for a specific kernel, call clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE. If the maximum work-group size for a kernel is lower than 128, performance is reduced. If this is the case, try simplifying the kernel.

The work-group size for each dimension must divide evenly into the total data size for that dimension. For example, the x size of the work-group must divide evenly into the x size of the total data. If meeting this requirement means rounding the total size up and padding it with extra work-items, ensure the additional work-items return immediately and do no work.
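The padding described above can be sketched as a rounding step on the host. This is a minimal sketch in plain C: the helper names padded_global_size() and padding_work_items() are hypothetical and are not part of the OpenCL API, and the kernel argument width mentioned in the comment is likewise an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round the data size in one dimension up to the
 * next multiple of the work-group size in that dimension. The result
 * can be used as the global work size for that dimension. */
size_t padded_global_size(size_t data_size, size_t local_size)
{
    assert(local_size > 0);

    size_t remainder = data_size % local_size;
    return remainder == 0 ? data_size
                          : data_size + (local_size - remainder);
}

/* Hypothetical helper: number of extra work-items the padding adds.
 * These work-items have no data to process. */
size_t padding_work_items(size_t data_size, size_t local_size)
{
    return padded_global_size(data_size, local_size) - data_size;
}

/* In the kernel (OpenCL C), the padded work-items can exit early, for
 * example:
 *
 *     if (get_global_id(0) >= width) return;
 *
 * where width is an assumed kernel argument holding the real data size.
 */
```

For a data size of 1000 and a work-group size of 64, this gives a global size of 1024, so 24 work-items are padding and should return without doing any work.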

Non-Confidential 101574_0301_00_en
Copyright © 2019 Arm Limited or its affiliates. All rights reserved.