You can specify the size of the work-group that OpenCL uses when you enqueue a kernel to execute on a device. To do this, you must know the maximum work-group size permitted by the OpenCL device your work-items execute on. To find the maximum work-group size for a specific kernel, use the clGetKernelWorkGroupInfo() function and request the CL_KERNEL_WORK_GROUP_SIZE property.
If your application is not required to share data among work-items,
set the local_work_size parameter to NULL when
enqueuing your kernel. This enables the OpenCL driver to determine
an efficient work-group size for your kernel, but this might not
be the optimal work-group size.
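To get the maximum work-group size in each dimension, call
clGetDeviceInfo() with CL_DEVICE_MAX_WORK_ITEM_SIZES.
These values apply to the simplest kernels, and the limits might be lower for
more complex kernels. The product of the dimensions of your work-group
must also not exceed the total work-group size, so it might limit the size
you can use in each dimension.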
To get the total work-group size, call
clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE.
If the maximum work-group size for a kernel is lower than 128, performance
is reduced. If this is the case, try simplifying the kernel.
The work-group size for each dimension must divide evenly into the total
data size for that dimension. For example, the x size of the work-group must divide evenly
into the x size of the total data. If meeting this requirement means padding the global work size
with extra work-items, ensure the additional work-items return immediately and do no work.
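One common way to meet the divisibility requirement is to round the global size up to the next multiple of the work-group size and guard the kernel body with a bounds check. The sketch below illustrates this for one dimension. The names padded_global_size, kernel_source, scale, and n are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Rounds data_size up to the next multiple of local_size.
 * Assumes local_size is greater than zero. */
static size_t padded_global_size(size_t data_size, size_t local_size)
{
    size_t remainder = data_size % local_size;
    return remainder == 0 ? data_size : data_size + (local_size - remainder);
}

/* Illustrative OpenCL C kernel, stored as a host-side string.
 * n is the real number of elements; work-items at or beyond n are padding. */
static const char *kernel_source =
    "__kernel void scale(__global float *data, const uint n)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    if (i >= n) return; /* padding work-item: return immediately */\n"
    "    data[i] *= 2.0f;\n"
    "}\n";
```

With 1000 data elements and a work-group size of 64, the padded global size is 1024, so the last 24 work-items take the early return.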