8.1 The optimization process for OpenCL applications

To optimize your application, you must first identify its most computationally intensive parts. In an OpenCL application, that means identifying the kernels that take the most time.

To identify the most computationally intensive kernels, you must individually measure the time taken by each kernel:

Measure individual kernels

Go through your kernels one at a time and:

  1. Measure the time it takes for several runs.
  2. Average the results.

Note:

Measure the run time of each kernel individually. Timing the application as a whole does not show which kernel the time is spent in.

Do a dummy run of the kernel before you start timing to ensure that the memory is allocated. Keep this run outside of your timing loop.

In certain cases, the allocation of some buffers is delayed until the first time they are used. This can cause the first kernel run to be slower than subsequent runs.
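The measurement steps above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation. The names run_kernel_fn, average_kernel_time_ns, and fake_kernel_run are hypothetical. The callback is assumed to run one kernel once and return its run time in nanoseconds. In an OpenCL host program, such a callback would typically enqueue the kernel on a command queue created with CL_QUEUE_PROFILING_ENABLE, wait for it to finish, and read the CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END timestamps with clGetEventProfilingInfo().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: runs the kernel once and returns its run time
 * in nanoseconds. A real implementation would enqueue the kernel, wait for
 * it to finish, and read its profiling timestamps. */
typedef double (*run_kernel_fn)(void *user_data);

/* Averages the run time of `runs` timed runs after one untimed dummy run. */
double average_kernel_time_ns(run_kernel_fn run_once, void *user_data,
                              size_t runs)
{
    assert(run_once != NULL && runs > 0);

    /* Dummy run outside the timing loop, so that any delayed buffer
     * allocation does not distort the measurement. */
    (void)run_once(user_data);

    double total_ns = 0.0;
    for (size_t i = 0; i < runs; ++i) {
        total_ns += run_once(user_data);
    }
    return total_ns / (double)runs;
}

/* Stand-in for a real kernel, for illustration only: the first call is slow
 * (as if buffers were allocated on first use), later calls are faster. */
double fake_kernel_run(void *user_data)
{
    int *calls = (int *)user_data;
    return ((*calls)++ == 0) ? 5000.0 : 1000.0;
}
```

With the stand-in callback, calling average_kernel_time_ns(fake_kernel_run, &calls, 4) discards the slow first run and averages only the four timed runs.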

Select the kernels that take the most time

Select the kernels that have the longest run time and optimize these. Optimizing any other kernels has little impact on overall performance, because they account for only a small share of the total run time.
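One way to make the selection is to sort the averaged timings and look at each kernel's share of the total. The following C sketch assumes you have already collected an average run time per kernel. The kernel_timing type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record holding the averaged run time of one kernel. */
typedef struct {
    const char *name;
    double avg_time_ns;
} kernel_timing;

/* qsort comparator: longest average run time first. */
static int compare_by_time_desc(const void *a, const void *b)
{
    double ta = ((const kernel_timing *)a)->avg_time_ns;
    double tb = ((const kernel_timing *)b)->avg_time_ns;
    return (ta < tb) - (ta > tb);
}

/* Sorts the timings in place so that the slowest kernel is at index 0. */
void rank_kernels_by_time(kernel_timing *timings, size_t count)
{
    qsort(timings, count, sizeof timings[0], compare_by_time_desc);
}

/* Returns the fraction of the summed run time spent in timings[index]. */
double share_of_total_time(const kernel_timing *timings, size_t count,
                           size_t index)
{
    assert(index < count);
    double total_ns = 0.0;
    for (size_t i = 0; i < count; ++i) {
        total_ns += timings[i].avg_time_ns;
    }
    return total_ns > 0.0 ? timings[index].avg_time_ns / total_ns : 0.0;
}
```

A kernel with a small share limits what you can gain: even removing it entirely cannot save more than that share of the total.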

Analyze the kernels

Analyze the kernels to see if they contain computationally expensive operations:

  • Measure how many reads and writes there are in the kernel. For high performance, do as many computations per memory access as possible.
  • For Mali™ GPUs, you can use the Offline Shader Compiler to check the balance between the different pipelines.
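As a rough illustration of the first point, the ratio of computations to memory accesses can be worked out from counts taken per work-item. The sketch below assumes you count the operations by hand from the kernel source. The kernel_op_counts type and the function name are hypothetical, and the compiler can change the real numbers, so treat the result as a guide only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-item counts, taken by hand from the kernel source. */
typedef struct {
    size_t arithmetic_ops;
    size_t reads;
    size_t writes;
} kernel_op_counts;

/* Returns the number of arithmetic operations per memory access.
 * Higher values mean more computation is done for each access. */
double ops_per_memory_access(kernel_op_counts counts)
{
    size_t accesses = counts.reads + counts.writes;
    assert(accesses > 0);
    return (double)counts.arithmetic_ops / (double)accesses;
}
```

For example, a vector addition does 1 addition for 2 reads and 1 write, a ratio of about 0.33. A 3x3 filter that does 9 multiplications and 8 additions for 9 reads and 1 write has a ratio of 1.7.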

Measure individual parts of the kernel

If you cannot determine the compute-intensive part of the kernel by analysis, you can isolate it by measuring different parts of the kernel individually.

You can do this by removing different code blocks and measuring the performance difference each time.

The section of code that takes the most time is the most intensive.
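The comparison can be sketched in C as follows, assuming you have measured the full kernel once and then measured it again with each code block removed in turn. The function name and the array layout are hypothetical. The results are estimates only, because removing a block can allow the compiler to discard other code that depends on it.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the code block whose removal saved the most time.
 * full_time_ns is the run time of the unmodified kernel, and
 * time_without_block_ns[i] is the run time with block i removed. */
size_t most_expensive_block(double full_time_ns,
                            const double *time_without_block_ns,
                            size_t block_count)
{
    assert(time_without_block_ns != NULL && block_count > 0);

    size_t worst = 0;
    double worst_saving_ns = full_time_ns - time_without_block_ns[0];
    for (size_t i = 1; i < block_count; ++i) {
        double saving_ns = full_time_ns - time_without_block_ns[i];
        if (saving_ns > worst_saving_ns) {
            worst_saving_ns = saving_ns;
            worst = i;
        }
    }
    return worst;
}
```

If the full kernel takes 1000 ns and takes 900 ns, 400 ns, and 950 ns with blocks 0, 1, and 2 removed respectively, block 1 accounts for the largest saving and is the one to examine first.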

Apply optimizations

Consider how the most intensive section of code can be rewritten and what optimizations apply.

Apply a relevant optimization.

Check your results

Whenever you make changes to optimize your code, measure the results so that you can determine whether the optimization was successful. Many changes that are beneficial in one situation might not provide any benefit, or might even reduce performance, under a different set of conditions.
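A simple way to check a change is to compare run times before and after it under each set of conditions you care about, for example different input sizes. The following C sketch assumes you have collected both sets of timings in matching order. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the speedup of a change: above 1.0 is faster, below 1.0 is slower. */
double speedup(double before_ns, double after_ns)
{
    assert(before_ns > 0.0 && after_ns > 0.0);
    return before_ns / after_ns;
}

/* Counts the conditions under which the changed code is slower than the
 * original. before_ns[i] and after_ns[i] are run times measured under the
 * same condition i. */
size_t count_regressions(const double *before_ns, const double *after_ns,
                         size_t count)
{
    size_t regressions = 0;
    for (size_t i = 0; i < count; ++i) {
        if (speedup(before_ns[i], after_ns[i]) < 1.0) {
            ++regressions;
        }
    }
    return regressions;
}
```

A result above zero from count_regressions indicates that the change helps in some cases but reduces performance in others. Small differences can be measurement noise, so compare averaged timings rather than single runs.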

Repeat the process

When you have increased the performance of your code with an optimization, measure it again to find out if there are other areas where you can improve performance. There are typically several such areas, so you might need to repeat the process many times to achieve optimal performance.
Non-Confidential 101574_0301_00_en
Copyright © 2019 Arm Limited or its affiliates. All rights reserved.