If part of your computation is serial, see if it can be removed and performed separately.
For example, the audio synthesis technique Frequency Modulation (FM)
works by reading an audio waveform called the carrier. The rate
the waveform is read at depends on another waveform called the modulator.
In one type of algorithm, the carrier values are read by a
pointer to generate the output waveform. The position of the pointer
is computed by taking the previous value and moving it by an amount
determined by the value of the modulator waveform.
The position of the pointer has a dependency on the previous
value and that value has a dependency on the value before it. This
series of dependencies makes the algorithm difficult or impossible
Another approach is to consider that the pointer is moving
through the carrier waveform at a fixed speed and the modulator
is adding or subtracting an offset. This can be computed in parallel,
but the offsets are incorrect because they do not take account of
the dependencies on previous offsets.
The computation of the correct offsets is a serial process.
If you pre-compute these values, the remaining computation can be
parallelized. The parallel component reads from the generated offset
table and uses this to read the correct value from the carrier waveform.
There is a potential problem with this example. The offset
table must be recomputed every time the modulating waveform changes.
This is an example of Amdahl’s law. The amount of parallel computation
possible is limited by the speed of the serial computation.