Mali™-T600 Series GPU OpenCL Developer Guide

Version 1.1.0

Table of Contents

About this book
Product revision status
Intended audience
Using this book
Typographical conventions
Additional reading
Feedback on this product
Feedback on content
1. Introduction
1.1. About GPU compute
1.2. About OpenCL
1.3. About the Mali-T600 Series Linux OpenCL driver
1.4. About the Mali-T600 Series OpenCL SDK
2. Parallel Processing Concepts
2.1. Types of parallelism
2.2. Concurrency
2.3. Limitations of parallel processing
2.4. Embarrassingly parallel applications
2.5. Mixing different types of parallelism
3. OpenCL Concepts
3.1. About OpenCL
3.2. OpenCL applications
3.3. OpenCL execution model
3.4. OpenCL data processing
3.4.1. Work-items and the NDRange
3.4.2. OpenCL work-groups
3.4.3. Identifiers in OpenCL
3.5. The OpenCL memory model
3.6. The Mali GPU memory model
3.7. Summary
4. Stages in an OpenCL Program
4.1. Software required for OpenCL development
4.2. Development stages
4.3. Finding the available compute devices
4.4. Initializing and creating OpenCL contexts
4.5. Creating a command queue
4.6. Creating program objects
4.7. Building a program executable
4.8. Creating kernel and memory objects
4.8.1. Creating kernel objects
4.8.2. Creating memory objects
4.9. Executing the kernel
4.9.1. Determining the data dimensions
4.9.2. Determining the optimal global work size
4.9.3. Determining the local work-group size
4.9.4. Enqueuing kernel execution
4.9.5. Executing kernels
4.10. Reading the results
4.11. Cleaning up
5. Converting Existing Code to OpenCL
5.1. Profile your application
5.2. Analyzing code for parallelization
5.2.1. About analyzing code for parallelization
5.2.2. Look for data parallel operations
5.2.3. Look for operations with few dependencies
5.2.4. Analyze loops
5.3. Parallel Processing Techniques
5.3.1. Use the global ID instead of the loop counter
5.3.2. Compute values in a loop with a formula instead of using counters
5.3.3. Compute values per frame
5.3.4. Perform computations with dependencies in multiple-passes
5.3.5. Pre-compute values to remove dependencies
5.3.6. Use software pipelining
5.3.7. Use task parallelism
5.4. Using parallel processing with non-parallelizable code
5.5. Dividing data for OpenCL
5.5.1. About dividing data for OpenCL
5.5.2. Use concurrent data structures
5.5.3. Data division examples
6. Retuning Existing OpenCL Code for Mali GPUs
6.1. About optimizing existing OpenCL code for Mali GPUs
6.2. Procedure for optimizing existing OpenCL code for Mali GPUs
6.2.1. Analyze code
6.2.2. Locate and remove device optimizations
6.2.3. Optimizing your OpenCL code for Mali GPUs
7. Optimizing OpenCL for Mali GPUs
7.1. General optimizations
7.2. Code optimizations
7.3. Memory optimizations
7.4. Kernel optimizations
7.5. Execution optimizations
7.6. Reducing the effect of serial computations
8. The Mali OpenCL SDK
A. OpenCL Data Types
B. OpenCL Built-in Functions
B.1. Work-item functions
B.2. Math functions
B.3. half_ and native_ math functions
B.4. Integer functions
B.5. Common functions
B.6. Geometric functions
B.7. Relational functions
B.8. Vector data load and store functions
B.9. Synchronisation
B.10. Asynchronous copy functions
B.11. Atomic functions
B.12. Miscellaneous vector functions
B.13. Image read and write functions
C. OpenCL Extensions

Proprietary Notice

Words and logos marked with ™ or ® are registered trademarks or trademarks of ARM® in the EU and other countries, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

Neither the whole nor any part of the information contained in, or the product described in, this document may be adapted or reproduced in any material form except with the prior written permission of the copyright holder.

The product described in this document is subject to continuous developments and improvements. All particulars of the product and its use contained in this document are given by ARM in good faith. However, all warranties implied or expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.

This document is intended only to assist the reader in the use of the product. ARM shall not be liable for any loss or damage arising from the use of any information in this document, or any error or omission in such information, or any incorrect use of the product.

Where the term ARM is used it means “ARM or any of its subsidiaries as appropriate”.

Confidentiality Status

This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.

Product Status

The information in this document is final, that is, for a developed product.

Revision History
Revision A    12 July 2012        First release for r1p1
Revision D    07 November 2012    First release for r1p2
Revision E    18 February 2013    First release for Mali T600 Series OpenCL SDK

Copyright © 2012-2013 ARM. All rights reserved.
DUI0538E