9.4 Execution optimizations
ARM® recommends some execution optimizations such as optimizing communication code to reduce latency.
ARM also recommends the following:
- If you are building from source, cache binaries on the storage device.
- If you know at initialization which kernels your application uses, call
clCreateKernelsInProgram() to initiate the final compile as soon as possible.
Kernels that you use later then start faster because the existing
finalized binary is used.
- If you use callbacks to prompt the processor to continue processing data
resulting from the execution of a kernel, set the callbacks before you
flush the queue. Otherwise, the callbacks might occur at the end of a
larger batch of work, later than they would if they were based on the
actual completion of the work.
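
The recommendations above could be sketched in C roughly as follows. This is an illustrative sketch under stated assumptions, not ARM reference code: cache_store(), cache_load(), create_all_kernels(), run_with_callback(), on_kernel_done(), and the DEMO_WITH_OPENCL guard are hypothetical names chosen for this example. The file helpers are plain C. In a real application the bytes would typically come from clGetProgramInfo() with CL_PROGRAM_BINARIES and be handed back through clCreateProgramWithBinary(). The OpenCL part is compiled only when the guard is defined, and it assumes a program that is already built and a valid command queue.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write a program binary to a cache file.
   Returns 0 on success, -1 on failure. */
int cache_store(const char *path, const unsigned char *bin, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(bin, 1, len, f);
    int closed = fclose(f);
    return (written == len && closed == 0) ? 0 : -1;
}

/* Hypothetical helper: read a cached binary. Returns a malloc'd buffer
   (the caller frees it) and sets *len. Returns NULL on a cache miss,
   on a read error, or if the file is empty. */
unsigned char *cache_load(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;                      /* cache miss */
    unsigned char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    if (size > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = (unsigned char *)malloc((size_t)size);
        if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf) *len = (size_t)size;
    return buf;
}

#ifdef DEMO_WITH_OPENCL  /* assumption: defined by hand when OpenCL is available */
#include <CL/cl.h>

/* Called by the OpenCL runtime when the kernel command completes. */
static void CL_CALLBACK on_kernel_done(cl_event ev, cl_int status, void *user_data)
{
    (void)ev; (void)status; (void)user_data;
    /* Application-specific: continue processing the kernel's results here. */
}

/* Create every kernel in an already built program up front, so that the
   final compile happens at initialization. Returns the number of kernels
   created, or 0 on error. */
cl_uint create_all_kernels(cl_program program, cl_kernel *kernels, cl_uint max_kernels)
{
    cl_uint n = 0;
    if (clCreateKernelsInProgram(program, max_kernels, kernels, &n) != CL_SUCCESS)
        return 0;
    return n;
}

/* Enqueue the kernel, register the callback, and only then flush. */
cl_int run_with_callback(cl_command_queue queue, cl_kernel kernel, size_t global_size)
{
    cl_event ev;
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size,
                                        NULL, 0, NULL, &ev);
    if (err != CL_SUCCESS) return err;
    err = clSetEventCallback(ev, CL_COMPLETE, on_kernel_done, NULL);
    if (err == CL_SUCCESS)
        err = clFlush(queue);                 /* flush after the callback is set */
    clReleaseEvent(ev);                       /* drop our reference to the event */
    return err;
}
#endif /* DEMO_WITH_OPENCL */
```

On a cache miss, where cache_load() returns NULL, the application would fall back to building from source and then call cache_store() with the resulting binary. Cached binaries are typically tied to a specific device and driver version, so a real cache would probably need to key on both.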