Non-Confidential | ![]() | 101574_0301_00_en | ||
| ||||
Home > OpenCL 2.0 > Shared virtual memory |
Shared Virtual Memory (SVM) is a feature of OpenCL 2.0 that enables the same virtual memory address range to be used on both the GPU and the application processor.
There are two types of SVM:
SVM has the following advantages:
It has lower overhead than the traditional
cl_buffer
interface.
SVM is easier to use in the host program because it is only a pointer to data.
With full coherency, you can use the memory without the overhead of calls to
map()
and unmap()
functions. It is also
possible to use atomic operations on both the GPU and application processor side
to update data that is shared between the two architectures.
The map()
and unmap()
functions
are still required with coarse-grained SVM.
It is easier to share work-loads between the GPU and application processor, because the address of the memory is the same in both GPU and application processor.
This enables you to build data structures that naturally use pointers such as linked lists or binary trees, on the host application processor, and the GPU can traverse these without having to translate the pointer values.
The following code-fragments are examples that show the difference between using a pointer and using a CL buffer.
The first example shows the traditional approach of sharing data with the
cl_buffer
interface:
/* Create and prepare buffer content */ size_t buffer_size = 100 * 1024; cl_buffer *buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, buffer_size, NULL, &err); cl_event buffer_event; void *buffer_map_ptr = clEnqueueMapBuffer(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, 0, buffer_size, 0, NULL, &buffer_event, &err); ... some other code, perhaps ... clWaitForEvent(1, &buffer_event); ... use buffer_map_ptr to fill the content ... clEnqueueUnmapBuffer(queue, buffer, 0, NULL, &buffer_event); /* Now do some actual CL kernel work. */ cl_event kernel_event; clSetKernelArg(kernel, 0, &buffer); clEnqueueNDRangeKernel(queue, kernel, NULL, work_dim, global_size, local_size, 1, &buffer_event, &kernel_event); ... Ideally do some other work here ... clWaitForEvent(1, &kernel_event); /* make use of the new buffer content */ buffer_map_ptr = clEnqueueMapBuffer(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, 0, buffer_size, 0, NULL, &buffer_event, &err); ... some other code, perhaps ... clWaitForEvent(1, &buffer_event); ... use buffer_map_ptr to get data out of the buffer ... clEnqueueUnmapBuffer(queue, buffer, 0, NULL, &buffer_event);
In a coherent system, you can use SVM to write code like the following:
/* Create and prepare buffer content */ size_t buffer_size = 100 * 1024; void *buffer = clSvmAlloc(context, CL_MEM_READ_WRITE, buffer_size, 0); ... use buffer to fill the content ... /* Now do some actual CL kernel work. */ clSetKernelArgSVMPointer(kernel, 0, buffer); cl_event kernel_event; clEnqueueNDRangeKernel(queue, kernel, NULL, work_dim, global_size, local_size, 0, NULL, &kernel_event); ... Ideally do some other work here ... clWaitForEvent(1, &kernel_event); /* make use of the new buffer content */ ... use buffer to get the data stored by the kernel ... clSVMFree(queue, buffer);
In a non-coherent system, it is still possible to use SVM, but you must use
map()
and unmap()
calls to ensure that the view of
the memory content is up to date on the application processor and the GPU.
/* Create and prepare buffer content */ size_t buffer_size = 100 * 1024; void *buffer = clSvmAlloc(context, CL_MEM_READ_WRITE, buffer_size, 0); clEnqueueSVMMap(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, buffer_size, 0, NULL, &buffer_event); clWaitForEvent(1, &buffer_event); ... use buffer to fill the content ... clEnqueueSVMUnmap(queue, buffer, 0, NULL, &buffer_event) /* Now do some actual CL kernel work. */ clSetKernelArgSVMPointer(kernel, 0, buffer); cl_event kernel_event; clEnqueueNDRangeKernel(queue, kernel, NULL, work_dim, global_size, local_size, 1, &buffer_event, &kernel_event); ... Ideally do some other work here ... clEnqueueSVMMap(queue, CL_NON_BLOCKING, CL_MAP_WRITE, buffer, buffer_size, 0, NULL, &buffer_event); clWaitForEvent(1, &buffer_event); /* make use of the new buffer content */ ... use buffer to get to the data stored by the kernel ... clEnqueueSVMUnmap(queue, buffer, 0, NULL, &buffer_event) clWaitForEvent(1, &buffer_event); clSVMFree(queue, buffer);
map()
and unmap()
calls in a coherent
system, but there is still some overhead from the API functions.