6.2.4 Analyze loops

Loops are good targets for parallelization because they repeat computations many times, often independently.

Consider the following types of loops:
Loops that process few elements
If the loop only processes a relatively small number of elements, it might not be appropriate for data parallel processing.
It might be better to parallelize these sorts of loops with task parallelism on one or more application processors.
Nested loops
If the loop is part of a series of nested loops and the total number of iterations is large, this loop is probably appropriate for parallel processing.
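As a rough illustration of the point above, the following C sketch shows a pair of nested loops whose total iteration count is rows × cols, together with the body of one iteration written as a function of its (x, y) position. The image layout and the names `brighten_nested` and `brighten_item` are hypothetical and chosen only for this example. In an OpenCL kernel, the two indices would typically come from `get_global_id(0)` and `get_global_id(1)` instead of loop counters.

```c
#include <stddef.h>

/* Hypothetical example: brighten a row-major 8-bit image, clamping at 255. */

/* Body of a single iteration, identified only by its (x, y) position.
 * No iteration reads or writes a pixel that another iteration touches. */
void brighten_item(unsigned char *pixels, size_t cols,
                   size_t x, size_t y, unsigned char delta)
{
    size_t i = y * cols + x;
    unsigned int v = (unsigned int)pixels[i] + delta;
    pixels[i] = (unsigned char)(v > 255u ? 255u : v);
}

/* Serial version: two nested loops, rows * cols iterations in total. */
void brighten_nested(unsigned char *pixels, size_t rows, size_t cols,
                     unsigned char delta)
{
    for (size_t y = 0; y < rows; ++y) {
        for (size_t x = 0; x < cols; ++x) {
            brighten_item(pixels, cols, x, y, delta);
        }
    }
}
```

Even when each individual loop is short, the combined index space can be large: a 1920 × 1080 image gives roughly two million independent iterations.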
Perfect loops
Look for loops that:
  • Process thousands of items.
  • Have no dependencies on previous iterations.
  • Access data independently in each iteration.
These types of loops are data parallel, so they are ideal for OpenCL.
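The criteria listed above can be illustrated with a minimal C sketch of element-wise vector addition. The function names are hypothetical. Each iteration reads and writes only its own index, so the iterations can run in any order, which is the property that makes a loop data parallel.

```c
#include <stddef.h>

/* Body of one iteration: touches only element i of each array. */
void vec_add_item(const float *a, const float *b, float *c, size_t i)
{
    c[i] = a[i] + b[i];
}

/* Iterations executed in ascending order. */
void vec_add_forward(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        vec_add_item(a, b, c, i);
    }
}

/* Iterations executed in descending order. Because no iteration depends
 * on another, the result is the same as for the ascending version. */
void vec_add_reverse(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = n; i-- > 0;) {
        vec_add_item(a, b, c, i);
    }
}
```

A quick check for independence is to ask whether the loop still produces the same output when its iterations are reordered, as in the two versions above.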
Simple loop parallelization
If the loop includes a variable that is incremented based on a value from the previous iteration, this is a dependency between iterations that prevents parallelization.
See if you can work out a formula that enables you to compute the value of the variable based on the main loop counter.
In OpenCL, work-items are processed in parallel, not in a sequential loop. However, work-item processing acts in a similar way to a loop.
Every work-item has a unique global ID that identifies it, and you can use this value in place of a loop counter.
It is also possible to have loops within work-items, but these are independent of other work-items.
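The following C sketch illustrates the idea of replacing an incremented variable with a formula based on the loop counter. The names `fill_serial` and `fill_item`, and the parameter `gid`, are hypothetical. In an OpenCL kernel, the value of `gid` would normally be obtained with `get_global_id(0)`. Here it is an ordinary argument so that the example runs as plain C.

```c
#include <stddef.h>

/* Serial form: 'offset' is carried from one iteration to the next,
 * so iteration i cannot start until iteration i - 1 has finished. */
void fill_serial(int *out, size_t n, int start, int step)
{
    int offset = start;
    for (size_t i = 0; i < n; ++i) {
        out[i] = offset;
        offset += step;   /* depends on the previous iteration */
    }
}

/* Work-item style form: the value is computed from the index alone,
 * using offset = start + gid * step, so no state is carried over. */
void fill_item(int *out, size_t gid, int start, int step)
{
    out[gid] = start + (int)gid * step;
}
```

Calling `fill_item` once for every index from 0 to n - 1, in any order, gives the same array as `fill_serial`.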
Loops that require data from previous iterations
If your loop involves dependencies based on data processed by a previous iteration, this is a more complex problem.
Can the loop be restructured to remove the dependency? If not, it might not be possible to parallelize the loop.
There are several techniques that help you deal with dependencies. See if you can use these techniques to parallelize the loop.
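One common restructuring, shown here only as a sketch and not necessarily the technique this guide has in mind, is to split an accumulation into independent partial results that are combined afterwards. The function names and the chunking scheme are assumptions made for this example. The code assumes `chunks` is greater than zero and uses integer data, because reordering floating-point additions can change the rounding of the result.

```c
#include <stddef.h>

/* Serial form: 'total' depends on every previous iteration. */
long sum_serial(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

/* Partial sum of chunk c out of 'chunks' equal-sized pieces.
 * Each chunk reads a disjoint range, so chunks are independent. */
long sum_chunk(const int *data, size_t n, size_t c, size_t chunks)
{
    size_t begin = n * c / chunks;
    size_t end = n * (c + 1) / chunks;
    long partial = 0;
    for (size_t i = begin; i < end; ++i) {
        partial += data[i];
    }
    return partial;
}

/* Combine step: only this short loop over the chunks is sequential. */
long sum_chunked(const int *data, size_t n, size_t chunks)
{
    long total = 0;
    for (size_t c = 0; c < chunks; ++c) {
        total += sum_chunk(data, n, c, chunks);
    }
    return total;
}
```

The dependency is not removed entirely, but it is confined to the final combine step, which runs over a handful of partial sums instead of the whole data set.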
Non-parallelizable loops
If the loop contains dependencies that you cannot remove, investigate alternative methods of performing the computation. These might be parallelizable.
Related concepts
6.3 Parallel processing techniques in OpenCL
6.3.1 Use the global ID instead of the loop counter
6.4 Using parallel processing with non-parallelizable code
Non-Confidential, ARM 100614_0300_00_en
Copyright © 2012, 2013, 2015, 2016 ARM. All rights reserved.