9.5 Reducing the effect of serial computations

You can reduce the impact of serial components in your application by reducing the amount of serial computation and optimizing what remains.

Use memory mapping instead of memory copies to transfer data.
Optimize communication code.
To reduce latency, optimize the communication code that sends and receives data.
Keep messages small.
Reduce communication overhead by sending only the data that is required.
Use power-of-two-sized memory blocks for communication.
Ensure the sizes of memory blocks used for communication are a power of two. This makes the data more cacheable.
Send more data in a smaller number of transfers.
Compute values instead of reading them from memory.
A simple computation is likely to be faster than reading from memory.
Do serial computations on the application processors.
Application processors are optimized for low latency tasks.
Use clEnqueueFillBuffer() to fill buffers.
The Mali™ OpenCL driver contains an optimized implementation of clEnqueueFillBuffer(). Use this in place of manually implementing a buffer fill in your application.
Use clEnqueueFillImage() to fill images.
The Mali OpenCL driver contains an optimized implementation of clEnqueueFillImage(). Use this in place of manually implementing an image fill in your application.
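For the power-of-two guideline above, one way to pick a block size is to round the requested size up before allocating. The following is a minimal sketch, not part of the Mali driver or the OpenCL API: the helper name round_up_pow2 is hypothetical, and returning 0 for a zero or overflowing request is a convention assumed here for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the smallest power of two that is
 * greater than or equal to size. Returns 0 if size is 0 or if the
 * result would not fit in a size_t. */
size_t round_up_pow2(size_t size)
{
    size_t p = 1;

    if (size == 0)
        return 0;

    while (p < size) {
        if (p > (size_t)-1 / 2)
            return 0; /* the next doubling would overflow */
        p *= 2;
    }
    return p;
}
```

With this sketch, a request for 5000 bytes would be rounded up to 8192 bytes, and a request that is already a power of two, such as 4096, is returned unchanged. Rounding up allocates extra memory, so weigh that cost against any caching benefit.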

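The advice to compute values instead of reading them from memory can be illustrated with a deliberately simple case: a table of squares. This is a sketch in plain C rather than kernel code, and both function names are hypothetical. Whether the computed form is actually faster depends on the device and the access pattern, so measure before committing to either version.

```c
#include <stdint.h>

/* Lookup version: costs one memory read per call and requires the
 * caller to build and store the table beforehand. */
uint32_t square_from_table(const uint32_t *table, uint32_t i)
{
    return table[i];
}

/* Computed version: one multiply and no memory read. */
uint32_t square_computed(uint32_t i)
{
    return i * i;
}
```

Both functions return the same result for any index covered by the table, so the computed form can replace the lookup without changing behavior, and the table no longer needs to be stored or transferred.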