Locate and remove device optimizations
Some optimizations written for other compute devices have no effect on Mali™ GPUs, or can reduce performance. To retune OpenCL code for Mali GPUs, you must first remove all device-specific optimizations to create a device-independent reference implementation.
Remove the following types of optimizations if you are targeting Mali™ Bifrost and Valhall GPUs:
Mali GPUs use caches instead of local memories. The OpenCL local and private memories are mapped to main memory. There is therefore no performance advantage in using local or private memories in OpenCL code for Mali GPUs.
You can use local or private memories as temporary storage, but copying data to or from these memories is expensive. Using local or private memories in this way can reduce the performance of OpenCL code on Mali GPUs.
Do not use local or private memories as a cache because this can reduce performance. The processors already contain hardware caches that perform the same job without the overhead of expensive copy operations.
Some code copies data into a local or private memory, processes it, then writes it out again. On Mali GPUs, these copies waste both time and power.
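The pattern described above can be sketched as follows. This is a minimal illustration rather than code from any particular application: the kernel names `blur3_staged` and `blur3_direct` are hypothetical, and the sketch assumes a one-dimensional NDRange that covers the whole buffer, with a local buffer of `local_size + 2` floats supplied by the host. The first kernel stages a tile in local memory before computing a three-point average. The second kernel computes the same result by reading global memory directly and relying on the hardware caches.

```c
/* Staged version: typical of code tuned for devices with dedicated local memory.
 * tile must hold get_local_size(0) + 2 floats (one halo element on each side). */
__kernel void blur3_staged(__global const float *in,
                           __global float *out,
                           __local float *tile)
{
    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);
    const size_t n   = get_global_size(0);

    /* Copy this work-item's element, plus clamped halo elements at the tile edges. */
    tile[lid + 1] = in[gid];
    if (lid == 0)
        tile[0] = (gid > 0) ? in[gid - 1] : in[gid];
    if (lid == lsz - 1)
        tile[lsz + 1] = (gid + 1 < n) ? in[gid + 1] : in[gid];

    /* Every work-item in the work-group must wait until the tile is complete. */
    barrier(CLK_LOCAL_MEM_FENCE);

    out[gid] = (tile[lid] + tile[lid + 1] + tile[lid + 2]) / 3.0f;
}

/* Reference version: no local memory, no copies, no barrier. */
__kernel void blur3_direct(__global const float *in,
                           __global float *out)
{
    const size_t gid   = get_global_id(0);
    const size_t n     = get_global_size(0);
    const size_t left  = (gid > 0) ? gid - 1 : gid;
    const size_t right = (gid + 1 < n) ? gid + 1 : gid;

    out[gid] = (in[left] + in[gid] + in[right]) / 3.0f;
}
```

The direct version is the kind of device-independent reference implementation that retuning starts from. It also removes the need for the host to allocate and pass the local buffer.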
Some code optimizes reads and writes to ensure data fits into cache lines. This is a useful optimization for both increasing performance and reducing power consumption. However, the code is likely to be optimized for cache line sizes that are different from those used by Mali GPUs.
If the code is optimized for the wrong cache line size, there might be unnecessary cache flushes and this can decrease performance.
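One way to avoid carrying over a cache line size from another device is to treat it as a parameter rather than a constant. The sketch below is an assumption about how such code might be organized, not a prescribed method. The helper name `padded_row_bytes` is hypothetical. On a real device, the `line_bytes` value could be obtained by calling `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE`.

```c
#include <stddef.h>

/* Hypothetical helper: round a row length in bytes up to a whole number of
 * cache lines. line_bytes is passed in at run time so that no cache line
 * size from another device is baked into the code. */
static size_t padded_row_bytes(size_t row_bytes, size_t line_bytes)
{
    return (row_bytes + line_bytes - 1) / line_bytes * line_bytes;
}
```

With the size passed in, the same host code produces a different row pitch for each device it runs on, and the reference implementation contains no hard-coded constant to find and remove later.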