F.7 OpenCL 2.0 optimizations

OpenCL 2.0 includes the following features that you can use to improve performance over OpenCL 1.2:

Shared virtual memory
On a fully coherent platform, shared virtual memory reduces the need to call the map and unmap API functions when a memory region is used by both the GPU and the application processor. See F.8 Shared virtual memory.
Read-write images
Read-write images enable the same kernel to both read from and write to a single image. When used correctly, this can improve cache efficiency and reduce memory usage.
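The following is a minimal sketch of an in-place image operation, not a definitive implementation. The kernel name, the variable names, and the choice to hold the OpenCL C source in a C string for clCreateProgramWithSource() are illustrative assumptions. The sketch also assumes that the image format supports read-write access on your device, and that the program is built with -cl-std=CL2.0.

```c
/* OpenCL C 2.0 kernel source held as a C string (hypothetical names). */
static const char *const invert_in_place_src =
    "__kernel void invert_in_place(read_write image2d_t img)\n"
    "{\n"
    "    int2 pos = (int2)(get_global_id(0), get_global_id(1));\n"
    "\n"
    "    /* read_write images are read without a sampler. */\n"
    "    float4 px = read_imagef(img, pos);\n"
    "\n"
    "    /* Write the result back to the same texel of the same image. */\n"
    "    write_imagef(img, pos, (float4)(1.0f - px.xyz, px.w));\n"
    "}\n";
```

Each work-item in this sketch reads its texel before writing it and never touches a texel owned by another work-item, so no image memory fence is used. A kernel that reads back texels it has already written would need additional synchronization.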
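Generic address space
The generic address space enables you to write a function once and call it with pointers into the global, local, or private address space. The constant address space is not part of the generic address space.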
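The following sketch shows one helper function being called with pointers into two different address spaces. The function and kernel names are hypothetical, and the source is held in a C string only as an assumption about how the host passes it to clCreateProgramWithSource(). It assumes a build with -cl-std=CL2.0, where a pointer without an address space qualifier is a generic pointer.

```c
/* OpenCL C 2.0 kernel source held as a C string (hypothetical names). */
static const char *const generic_src =
    "/* Unqualified pointer parameter: generic address space. */\n"
    "float sum4(const float *p)\n"
    "{\n"
    "    return p[0] + p[1] + p[2] + p[3];\n"
    "}\n"
    "\n"
    "__kernel void sums(__global const float *in,\n"
    "                   __local float *tile,\n"
    "                   __global float *out)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    size_t lid = get_local_id(0);\n"
    "\n"
    "    /* Copy the four inputs of this work-item into local memory. */\n"
    "    for (int i = 0; i < 4; ++i)\n"
    "        tile[4 * lid + i] = in[4 * gid + i];\n"
    "\n"
    "    /* The same helper accepts both pointers. */\n"
    "    out[2 * gid]     = sum4(in + 4 * gid);   /* __global */\n"
    "    out[2 * gid + 1] = sum4(tile + 4 * lid); /* __local  */\n"
    "}\n";
```

Without the generic address space, sum4() would need one copy per address space. Kernel arguments still require explicit address space qualifiers, as shown.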
sRGB images
If an OpenCL kernel reads from an sRGB image, you do not have to convert the data to linear RGB before using it. The read_imagef call performs the conversion from sRGB to linear RGB as part of the read operation.
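As an illustration, the following sketch computes luminance directly from the values that the image read returns. It assumes that the host created the image with an sRGB channel order such as CL_sRGBA, that the global work size matches the image dimensions, and that the program is built with -cl-std=CL2.0. The kernel, sampler, and buffer names are hypothetical.

```c
/* OpenCL C 2.0 kernel source held as a C string (hypothetical names). */
static const char *const linear_luma_src =
    "__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |\n"
    "                           CLK_ADDRESS_CLAMP_TO_EDGE |\n"
    "                           CLK_FILTER_NEAREST;\n"
    "\n"
    "__kernel void linear_luma(read_only image2d_t src,\n"
    "                          __global float *luma)\n"
    "{\n"
    "    int2 pos = (int2)(get_global_id(0), get_global_id(1));\n"
    "\n"
    "    /* For an sRGB image, px already holds linear RGB values. */\n"
    "    float4 px = read_imagef(src, smp, pos);\n"
    "\n"
    "    /* No manual sRGB decode is needed before the weighted sum. */\n"
    "    luma[pos.y * get_image_width(src) + pos.x] =\n"
    "        dot(px.xyz, (float3)(0.2126f, 0.7152f, 0.0722f));\n"
    "}\n";
```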
Program scope variables
In some circumstances, program scope variables are a useful way to avoid passing data from the host program to multiple kernels. For example, suppose one kernel calculates a histogram and stores it in a buffer, the host program then passes the same buffer to a second kernel that uses the histogram for another part of the work, and the histogram is never used on the host. In this case, a plausible solution is to make the histogram a global variable in the program. Both kernels must be part of the same program for this to work correctly. As always, global variables have some drawbacks, particularly when it comes to understanding which variables can be modified by which parts of the code.
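The histogram case could be sketched as follows. This is one possible arrangement under stated assumptions, not a recommended implementation: the variable and kernel names are hypothetical, the input is assumed to be 8-bit data, and the program is assumed to be built with -cl-std=CL2.0. The source is held in a C string for clCreateProgramWithSource().

```c
/* OpenCL C 2.0 kernel source held as a C string (hypothetical names). */
static const char *const histogram_src =
    "/* Program scope variable, visible to both kernels below. */\n"
    "__global uint histogram[256];\n"
    "\n"
    "__kernel void build_histogram(__global const uchar *pixels)\n"
    "{\n"
    "    /* Count one occurrence of this work-item's pixel value. */\n"
    "    atomic_inc(&histogram[pixels[get_global_id(0)]]);\n"
    "}\n"
    "\n"
    "__kernel void use_histogram(__global const uchar *pixels,\n"
    "                            __global uint *counts)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "\n"
    "    /* Read the histogram without any host-side buffer argument. */\n"
    "    counts[i] = histogram[pixels[i]];\n"
    "}\n";
```

The host enqueues build_histogram and then use_histogram from the same program object, and never creates or passes a histogram buffer. The sketch assumes that the histogram contents persist between kernels in that program, so an application that processes more than one data set would also need a way to reset the counts.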
Pipes and device execution
Arm recommends that you avoid using the OpenCL pipes and device execution functionality. See F.9 OpenCL 2.0 pipes and device execution.
Copyright © 2019 Arm Limited or its affiliates. All rights reserved.