7.2.3 Programming OpenCL for Mali GPUs
There are several differences between programming OpenCL on a Mali™ GPU and a desktop GPU.
On a Mali GPU:
- The global and local OpenCL address spaces are mapped to the same physical
memory and are accelerated by L1 and L2 caches. This means that you are not required to use
explicit data copies or implement the associated barrier synchronization.
- All threads have individual program counters. This means that branch divergence is
not a major problem. Branch divergence is a major issue for warp or wavefront-based
architectures.
Note: In OpenCL, each work-item typically maps to a single thread
on a Mali GPU.
Use the kernel auto-vectorizer.
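
The points above can be illustrated with a minimal sketch. It is not taken from Arm documentation, and every name in it (blur3, blur3_kernel_src, blur3_work_item, blur3_reference) is hypothetical. The kernel source reads its neighbours directly from __global memory, with no staging copy into __local memory and no barrier, and it contains per-work-item branches for edge handling. It is written in scalar form, one element per work-item, on the assumption that any vectorization is left to the compiler. The plain C functions mirror the kernel body on the host so that the arithmetic can be checked without an OpenCL runtime. They do not use the OpenCL host API.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source for a hypothetical kernel named blur3.
 * Assumption: a three-point average with clamped edges. The kernel reads
 * its neighbours straight from __global memory; there is no __local tile
 * and no barrier. */
const char *const blur3_kernel_src =
    "__kernel void blur3(__global const float *in,\n"
    "                    __global float *out,\n"
    "                    const int n)\n"
    "{\n"
    "    int i = (int)get_global_id(0);\n"
    "    if (i >= n) return;               /* padding work-items exit early */\n"
    "    int lo = (i > 0)     ? i - 1 : i; /* clamp at the left edge */\n"
    "    int hi = (i < n - 1) ? i + 1 : i; /* clamp at the right edge */\n"
    "    out[i] = (in[lo] + in[i] + in[hi]) / 3.0f;\n"
    "}\n";

/* Host-side mirror of the kernel body for a single work-item index i. */
void blur3_work_item(const float *in, float *out, int n, int i)
{
    if (i >= n) {
        return;                           /* same early exit as the kernel */
    }
    int lo = (i > 0)     ? i - 1 : i;     /* clamp at the left edge */
    int hi = (i < n - 1) ? i + 1 : i;     /* clamp at the right edge */
    out[i] = (in[lo] + in[i] + in[hi]) / 3.0f;
}

/* Runs the mirrored body once per index, as a sequential stand-in for
 * launching n work-items. */
void blur3_reference(const float *in, float *out, int n)
{
    assert(in != NULL && out != NULL);
    assert(in != out);  /* in-place use would read already-averaged values */
    for (int i = 0; i < n; ++i) {
        blur3_work_item(in, out, n, i);
    }
}
```

On a device, blur3_kernel_src would be passed to the OpenCL runtime in the usual way. On architectures where local memory is a separate on-chip store, the same stencil is often written with a __local tile and a barrier, which is the pattern the first list item says is not required on a Mali GPU.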