To optimize your application, you must first identify the most computationally intensive parts of your application. In an OpenCL application, this means identifying the kernels that take the most time.
To identify the most computationally intensive kernels, you must individually measure the time taken by each kernel:
Go through your kernels one at a time and, for each kernel, time several runs and average the results.
Before you time a kernel, do one dummy run of it to ensure that its memory is allocated. Keep this dummy run outside your timing loop.
In some cases, the allocation of a buffer is delayed until the first time it is used. This can make the first run of a kernel slower than subsequent runs.
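The measurement steps above can be sketched in C as follows. This is a minimal, hypothetical harness, not part of any Mali or OpenCL API: `run_fn`, `time_kernel_avg`, `slowest_kernel`, and `busy_work` are illustrative names, and `busy_work` is a CPU stand-in so that the sketch runs without a GPU. With a real OpenCL kernel, the callback would presumably enqueue the kernel with `clEnqueueNDRangeKernel` and then wait with `clFinish`, so that the timed interval covers kernel completion and not just the enqueue.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback that runs one kernel to completion. For OpenCL it
   would enqueue the kernel and call clFinish() before returning. */
typedef void (*run_fn)(void *ctx);

/* Hypothetical CPU stand-in for a kernel: busy work for *(const int *)ctx
   iterations, so the sketch runs without a GPU. */
void busy_work(void *ctx)
{
    volatile unsigned long sink = 0; /* volatile: the loop cannot be removed */
    int n = *(const int *)ctx;
    for (int i = 0; i < n; ++i)
        sink += (unsigned long)i;
    (void)sink;
}

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average run time of one kernel over `runs` timed runs. */
double time_kernel_avg(run_fn run, void *ctx, int runs)
{
    assert(run != NULL && runs > 0);

    /* Dummy run outside the timing loop: absorbs one-time costs such as
       buffers whose allocation is deferred until first use. */
    run(ctx);

    double start = now_seconds();
    for (int i = 0; i < runs; ++i)
        run(ctx);
    return (now_seconds() - start) / runs;
}

/* Time each kernel individually and return the index of the slowest one.
   avg_out[k] receives the average run time of kernels[k]. */
size_t slowest_kernel(const run_fn *kernels, void *const *ctxs,
                      size_t count, int runs, double *avg_out)
{
    assert(count > 0);
    size_t slowest = 0;
    for (size_t k = 0; k < count; ++k) {
        avg_out[k] = time_kernel_avg(kernels[k], ctxs[k], runs);
        if (avg_out[k] > avg_out[slowest])
            slowest = k;
    }
    return slowest;
}
```

For example, `slowest_kernel(kernels, ctxs, 2, 10, avg)` fills `avg` with each kernel's average time and returns the index of the kernel to optimize first. Timing one loop and dividing by the run count averages over several runs without reading the clock around every individual run.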
Analyze the kernels to see if they contain computationally expensive operations:
For Mali™ GPUs, you can use the Offline Shader Compiler to check the balance between the different pipelines.
If you cannot determine the compute-intensive part of the kernel by analysis, you can isolate it by measuring different parts of the kernel individually.
You can do this by removing different code blocks and measuring the performance difference each time.
The code block whose removal saves the most time is the most intensive.
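The block-removal approach can be sketched the same way. The sketch assumes a hypothetical kernel body, `kernel_body`, with two independently removable blocks selected by a bit mask, and runs it on the CPU for illustration. In a real OpenCL kernel you might instead guard each block with `#ifndef` and rebuild the program with a `-D` build option to remove it. The hypothetical `block_cost` estimates a block's cost as the full run time minus the run time with that block removed.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Bit flags selecting which blocks of the hypothetical kernel run. */
enum { BLOCK_A = 1 << 0, BLOCK_B = 1 << 1, ALL_BLOCKS = BLOCK_A | BLOCK_B };

/* Volatile so the compiler cannot delete the stand-in work. */
static volatile unsigned long sink;

/* Hypothetical CPU stand-in for a kernel with two removable blocks. */
static void kernel_body(unsigned enabled)
{
    if (enabled & BLOCK_A)                /* block A: light work */
        for (unsigned long i = 0; i < 100000UL; ++i)
            sink += i;
    if (enabled & BLOCK_B)                /* block B: heavy work */
        for (unsigned long i = 0; i < 10000000UL; ++i)
            sink += i;
}

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average time of one variant, with a dummy run outside the timing loop. */
static double time_variant(unsigned enabled, int runs)
{
    assert(runs > 0);
    kernel_body(enabled);                 /* dummy run, not timed */
    double start = now_seconds();
    for (int i = 0; i < runs; ++i)
        kernel_body(enabled);
    return (now_seconds() - start) / runs;
}

/* Estimated cost of one block: full time minus time with the block removed. */
double block_cost(unsigned block, int runs)
{
    double full = time_variant(ALL_BLOCKS, runs);
    double without = time_variant(ALL_BLOCKS & ~block, runs);
    return full - without;
}
```

Calling `block_cost` for each block and comparing the results identifies the most intensive section. Treat the numbers as estimates: removing a block can also change how the compiler optimizes the code that remains.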