8.3.1 About memory allocation

To avoid making copies, use the OpenCL API to allocate memory buffers and access them with map and unmap operations, clEnqueueMapBuffer() and clEnqueueUnmapMemObject(). These operations enable both the application processor and the Mali™ GPU to access the data without any copies.

OpenCL originated in desktop systems where the application processor and the GPU have separate memories. To use OpenCL in these systems, you must allocate buffers to copy data to and from the separate memories.
Systems with Mali GPUs typically have shared memory, so you are not required to copy data. However, OpenCL assumes that the memories are separate, so buffer allocation can still involve memory copies. This is wasteful because copies take time and consume power.
The following table shows the different cl_mem_flags parameters in clCreateBuffer().

Table 8-1 Parameters for clCreateBuffer()

| Parameter | Description |
|---|---|
| CL_MEM_ALLOC_HOST_PTR | This is a hint to the driver indicating that the buffer is accessed on the host side. To use the buffer on the application processor side, you must map this buffer and write the data into it. This is the only method that does not involve copying data. If you must fill in an image that is processed by the GPU, this is the best way to avoid a copy. |
| CL_MEM_COPY_HOST_PTR | Copies the contents of the host_ptr argument into memory allocated by the driver. |
| CL_MEM_USE_HOST_PTR | Copies the content of the host memory pointer into the buffer when the first kernel using this buffer starts running. This flag enforces memory restrictions that can reduce performance. Avoid using this if possible. When a map is executed, the memory must be copied back to the provided host pointer. This significantly increases the cost of map operations. |
ARM® recommends the following:
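  • Do not use private or local memory to improve memory read performance. On Mali GPUs, local and private memory are physically backed by global memory, so moving data into them typically does not speed up reads.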
  • If your kernel is memory bandwidth bound, try using a simple formula to compute variables instead of reading from memory. This saves memory bandwidth and might be faster.
  • If your kernel is compute bound, try reading from memory instead of computing variables. This saves computations and might be faster.
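The trade-off in the last two recommendations can be illustrated with a small sketch. It is written in plain C rather than as an OpenCL kernel so that it can be compiled anywhere; each loop iteration stands in for one work-item. The linear ramp weight and all function names are hypothetical, chosen only to show the two variants side by side.

```c
#include <stddef.h>

/* Hypothetical per-element weight: a simple linear ramp. */
static float ramp_weight(size_t i)
{
    return 0.5f * (float)i + 1.0f;
}

/* Memory-based variant: reads the weight from a precomputed table.
 * Each element costs two loads, one from data and one from table. */
float weighted_sum_lookup(const float *data, const float *table, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += data[i] * table[i];
    return acc;
}

/* Compute-based variant: recomputes the weight from the index.
 * Each element costs one load plus a little extra arithmetic. */
float weighted_sum_formula(const float *data, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += data[i] * ramp_weight(i);
    return acc;
}
```

Both variants produce the same result when the table holds ramp_weight(i) for each index. Which one is faster depends on whether the kernel is limited by memory bandwidth or by arithmetic, so measure both on your target device.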
Non-Confidential. ARM 100614_0300_00_en
Copyright © 2012, 2013, 2015, 2016 ARM. All rights reserved.