F.7 OpenCL 2.0 optimizations
OpenCL 2.0 includes the following features that can improve performance over
OpenCL 1.2:
- Shared virtual memory
- On a fully coherent platform, shared virtual memory reduces the need to call
the map and unmap API functions when a memory region is used on both the GPU
and the application processor. See F.8 Shared virtual memory.
- Read-write images
- This enables the same kernel to both read from and write to a single image.
When used correctly, this can improve cache efficiency and reduce memory usage.
- Generic address space
- This enables code to be written once, and it works in any address space.
- sRGB images
- If the OpenCL kernel reads from an sRGB image, you do not need to convert
the data to linear RGB before you use it. The read_imagef call performs the
sRGB to linear RGB conversion as part of the read operation.
- Program scope variables
- In some circumstances, program scope variables can avoid passing data from
the host program to multiple kernels. For example, suppose that one kernel
calculates a histogram and stores it in a buffer, the host program passes the
same buffer to a second kernel that uses the histogram for another part of the
work, and the histogram is never used on the host. In this case, a plausible
solution is to make the histogram a global variable in the program. Both
kernels must be part of the same program for this to work correctly. As
always, global variables have drawbacks, particularly when it comes to
understanding which variables can be modified by which parts of the code.
- Pipes and device execution
- It is recommended that you avoid using the OpenCL pipes and device execution
functionality. See F.9 OpenCL 2.0 pipes and device execution.
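
The program scope variable and generic address space items above can be sketched in OpenCL C 2.0 as follows. This is a minimal illustration under stated assumptions, not a tuned implementation. The names NUM_BINS, clear_histogram, build_histogram, cumulative_histogram, count_up_to, and host_gid are hypothetical and chosen only for this example. The #ifndef block contains host-only stand-ins that are not part of OpenCL. They exist so that the logic can be checked with an ordinary C compiler, and an OpenCL compiler skips them.

```c
/* OpenCL C 2.0 sketch. Build the program with -cl-std=CL2.0. */
#ifndef __OPENCL_VERSION__
/* Host-only stand-ins, NOT part of OpenCL. They let an ordinary C compiler
   check the logic by running one "work-item" at a time. */
#include <stddef.h>
#define __kernel
#define __global
typedef unsigned int uint;
typedef unsigned char uchar;
static size_t host_gid;                 /* hypothetical test hook */
static size_t get_global_id(uint dim) { (void)dim; return host_gid; }
static uint atomic_inc(volatile uint *p) { return (*p)++; }
#endif

#define NUM_BINS 256

/* Program scope variable in the global address space (OpenCL 2.0).
   Its contents persist between kernel launches of the same program, so the
   host does not create or pass a buffer for it. */
__global uint histogram[NUM_BINS];

/* Reset kernel: one work-item per bin. Because the variable persists, it
   must be cleared before each new histogram is built. */
__kernel void clear_histogram(void)
{
    histogram[get_global_id(0)] = 0;
}

/* First kernel: one work-item per pixel increments the matching bin. */
__kernel void build_histogram(__global const uchar *pixels)
{
    size_t i = get_global_id(0);
    atomic_inc(&histogram[pixels[i]]);
}

/* Generic address space: the pointer has no address space qualifier, so the
   same helper accepts global, local, or private data. */
uint count_up_to(const uint *bins, uint last_bin)
{
    uint total = 0;
    for (uint b = 0; b <= last_bin; ++b)
        total += bins[b];
    return total;
}

/* Second kernel: one work-item per bin. It reads the histogram directly from
   the program scope variable and writes a cumulative histogram. */
__kernel void cumulative_histogram(__global uint *cdf)
{
    size_t bin = get_global_id(0);
    cdf[bin] = count_up_to(histogram, (uint)bin);
}
```

In this sketch, the host enqueues clear_histogram and cumulative_histogram with a global size of NUM_BINS, and build_histogram with one work-item per pixel. All three kernels must be built into the same program object, because that is the scope in which the histogram variable is visible.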