F.8 Shared virtual memory

Shared Virtual Memory (SVM) is a feature of OpenCL 2.0 that enables the same virtual memory address range to be used on both the GPU and the application processor.

There are two types of SVM:

Fine-grained
This is available when your platform supports full coherency.
Coarse-grained
This is for non-coherent or IO-coherent platforms.

SVM has the following advantages:

The following code-fragments are examples that show the difference between using a pointer and using a CL buffer.

Note:

This code only illustrates the difference, between the use of SVM buffer and CL buffer, it is not a complete example.

The first example shows the traditional approach of sharing data with the cl_buffer interface:

/* Create and prepare buffer content */

size_t buffer_size = 100 * 1024;
cl_buffer *buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, buffer_size, NULL, &err);
cl_event buffer_event;
void *buffer_map_ptr = clEnqueueMapBuffer(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, 0, buffer_size, 0, NULL, &buffer_event, &err);
... some other code, perhaps ...
clWaitForEvent(1, &buffer_event);
... use buffer_map_ptr to fill the content ... 
clEnqueueUnmapBuffer(queue, buffer, 0, NULL, &buffer_event);

/* Now do some actual CL kernel work. */
cl_event kernel_event;
clSetKernelArg(kernel, 0, &buffer);
clEnqueueNDRangeKernel(queue, kernel, NULL, work_dim, global_size, local_size, 1, &buffer_event, &kernel_event);
... Ideally do some other work here ... 
clWaitForEvent(1, &kernel_event);

/* make use of the new buffer content */
buffer_map_ptr = clEnqueueMapBuffer(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, 0, buffer_size, 0, NULL, &buffer_event, &err);
... some other code, perhaps ...
clWaitForEvent(1, &buffer_event);
... use buffer_map_ptr to get data out of the buffer ... 
clEnqueueUnmapBuffer(queue, buffer, 0, NULL, &buffer_event);

In a coherent system, you can use SVM to write code like the following:

/* Create and prepare buffer content */
size_t buffer_size = 100 * 1024;
void *buffer = clSvmAlloc(context, CL_MEM_READ_WRITE, buffer_size, 0);
... use buffer to fill the content ... 

/* Now do some actual CL kernel work. */
clSetKernelArgSVMPointer(kernel, 0, buffer);
cl_event kernel_event;
clEnqueueNDRangeKernel(queue, kernel, NULL, work_dim, global_size, local_size, 0, NULL, &kernel_event);
... Ideally do some other work here ... 
clWaitForEvent(1, &kernel_event);

/* make use of the new buffer content */
... use buffer to get the data stored by the kernel ... 
clSVMFree(queue, buffer);

In a non-coherent system, it is still possible to use SVM, but you must use map() and unmap() calls to ensure that the view of the memory content is up to date on the application processor and the GPU.

/* Create and prepare buffer content */
size_t buffer_size = 100 * 1024;
void *buffer = clSvmAlloc(context, CL_MEM_READ_WRITE, buffer_size, 0);
clEnqueueSVMMap(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, buffer_size, 0, NULL, &buffer_event);
clWaitForEvent(1, &buffer_event);

... use buffer to fill the content ... 
clEnqueueSVMUnmap(queue, buffer, 0, NULL, &buffer_event)
/* Now do some actual CL kernel work. */
clSetKernelArgSVMPointer(kernel, 0, buffer);
cl_event kernel_event;
clEnqueueNDRangeKernel(queue, kernel, NULL, work_dim, global_size, local_size, 1, &buffer_event, &kernel_event);
... Ideally do some other work here ... 
clEnqueueSVMMap(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, buffer_size, 0, NULL, &buffer_event);
clWaitForEvent(1, &buffer_event);

/* make use of the new buffer content */
... use buffer to get to the data stored by the kernel ...
 
clEnqueueSVMUnmap(queue, buffer, 0, NULL, &buffer_event)
clWaitForEvent(1, &buffer_event);
clSVMFree(queue, buffer);

Note:

You can use map() and unmap() calls in a coherent system, but there is still some overhead from the API functions.
Non-ConfidentialPDF file icon PDF version101574_0301_00_en
Copyright © 2019 Arm Limited or its affiliates. All rights reserved.