6.2.4 Analyze loops
Loops are good targets for parallelization because they repeat computations many times, often independently.
Consider the following types of loops:
- Loops that process few elements
If the loop only processes a relatively small number
of elements, it might not be appropriate for data parallel processing.
It might be better to parallelize these sorts of loops with
task parallelism on one or more application processors.
- Nested loops
If the loop is part of a series of nested loops
and the total number of iterations is large, this loop is probably
appropriate for parallel processing.
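As an illustration, the sketch below shows a pair of nested loops over an image (a hypothetical example; the function names, `WIDTH`, and `HEIGHT` are invented for this sketch). Together the loops give WIDTH * HEIGHT independent iterations, so the pair is a good candidate for parallel processing even though neither loop alone iterates a huge number of times, and the flattened form maps directly onto a one-dimensional range of work-items:

```c
#include <assert.h>

#define WIDTH  640
#define HEIGHT 480

/* Hypothetical example: two nested loops over an image give
 * WIDTH * HEIGHT independent iterations in total. */
static void brighten_nested(unsigned char *pixels)
{
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            int i = y * WIDTH + x;                      /* linear index */
            pixels[i] = (unsigned char)(pixels[i] / 2 + 64);
        }
    }
}

/* The same computation as a single flat loop. Each iteration is
 * independent, so the flat index i corresponds to the id that one
 * work-item would receive in a parallel version. */
static void brighten_flat(unsigned char *pixels)
{
    for (int i = 0; i < WIDTH * HEIGHT; i++) {
        pixels[i] = (unsigned char)(pixels[i] / 2 + 64);
    }
}
```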
- Perfect loops
Look for loops that:
- Process many items, for example, thousands.
- Have no dependencies on previous iterations.
- Access data independently in each iteration.
These types of loops are data parallel, so they are ideal for OpenCL.
- Simple loop parallelization
If the loop includes a variable that is incremented
based on a value from the previous iteration, this is a dependency
between iterations that prevents parallelization.
See if you can work out a formula that enables you to compute
the value of the variable based on the main loop counter.
In OpenCL, work-items are processed in parallel, not in a sequential loop.
However, work-item processing acts in a similar way to a loop.
Every work-item has a unique global id that identifies it, and you can use this value
in place of a loop counter.
It is also possible to have loops within work-items, but these
are independent of other work-items.
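A minimal C sketch of this restructuring (the function names, `start`, and `stride` are illustrative, not from any particular API): the incremented variable is replaced with a formula over the loop counter, which is exactly the value a work-item's global id would supply in an OpenCL kernel:

```c
#include <assert.h>

#define N 1024

/* Sequential version: `offset` is incremented each iteration, which
 * creates a dependency on the previous iteration. */
static void fill_sequential(int *out, int start, int stride)
{
    int offset = start;
    for (int i = 0; i < N; i++) {
        out[i] = offset;
        offset += stride;          /* depends on the previous iteration */
    }
}

/* Restructured version: `offset` is computed directly from the loop
 * counter, so every iteration is independent. In an OpenCL kernel,
 * `i` would be replaced by the work-item's global id. */
static void fill_parallel_friendly(int *out, int start, int stride)
{
    for (int i = 0; i < N; i++) {
        int offset = start + i * stride;   /* formula removes the dependency */
        out[i] = offset;
    }
}
```

Because the formula version has no loop-carried state, the iterations can execute in any order, or all at once.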
- Loops that require data from previous iterations
If your loop involves dependencies based on data processed
by a previous iteration, this is a more complex problem.
Can the loop be restructured to remove the dependency? If
not, it might not be possible to parallelize the loop.
There are several techniques that help you deal with dependencies. See if you can use these
techniques to parallelize the loop.
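One widely used dependency-handling technique is reduction. The sketch below (plain C with illustrative names; it assumes `N` is a power of two) restructures a serial running sum, where every iteration depends on the previous total, into log2(N) passes of independent pairwise additions, so each pass could run as one parallel step:

```c
#include <assert.h>

#define N 1024   /* assumed to be a power of two for this sketch */

/* Serial sum: each iteration depends on the running total. */
static long sum_serial(const int *data)
{
    long total = 0;
    for (int i = 0; i < N; i++) {
        total += data[i];
    }
    return total;
}

/* Tree reduction: the single dependency chain is restructured into
 * log2(N) passes. Within each pass the additions are independent of
 * one another, so each pass could be executed in parallel. */
static long sum_tree(const int *data)
{
    long tmp[N];
    for (int i = 0; i < N; i++) {
        tmp[i] = data[i];
    }
    for (int stride = N / 2; stride > 0; stride /= 2) {
        for (int i = 0; i < stride; i++) {
            tmp[i] += tmp[i + stride];   /* independent within a pass */
        }
    }
    return tmp[0];
}
```

The dependency is not eliminated, but it is shortened from N sequential steps to log2(N) parallel passes.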
- Non-parallelizable loops
If the loop contains dependencies that you cannot
remove, investigate alternative methods of performing the computation.
An alternative method might be parallelizable even if the original loop is not.