Mali™ GPU Application Optimization Guide

Version: 1.0

Table of Contents

About this book
Intended audience
Using this book
Additional reading
Feedback on content
1. Introduction
1.1. About optimization
1.1.1. What is optimization?
1.1.2. Why optimize?
1.1.3. Aims of optimization
1.2. The graphics pipeline
1.2.1. OpenGL ES Graphics pipeline overview
1.2.2. Initial processing
1.2.3. Per-vertex operations
1.2.4. Rasterization and fragment shading
1.2.5. Blending and framebuffer operations
1.3. The Mali GPU hardware
1.3.1. Tile based rendering
1.3.2. Mali GPU hardware components
1.3.3. The Geometry processor
1.3.4. The Pixel processors
1.3.5. L2 cache controller
1.4. Differences between desktop systems and mobile devices
1.5. Differences between mobile renderers
1.5.1. Differences with other mobile GPUs
1.5.2. Differences with software renderers
2. Optimization Checklist
2.1. About the optimization checklist
2.2. Check the display settings
2.2.1. About display settings
2.2.2. Data conversions caused by incorrect settings
2.2.3. Configuring display settings to avoid conversions
2.2.4. Ensure your application has the correct drawing surface
2.3. Use direct rendering if possible
2.4. Use the correct tools with the correct settings
2.4.1. Use the latest tools
2.4.2. Rebuild everything after a tools update
2.4.3. Build for the correct architecture
2.4.4. Use the facilities in your hardware
2.4.5. Optimize your release build
2.5. Remove debugging information
2.6. Avoid infinite command lists
2.7. Avoid calls that stall the graphics pipeline
2.8. Do not compile shaders every frame
2.9. Use VSYNC
2.9.1. About VSYNC
2.9.2. Using VSYNC
2.9.3. Potential issues with VSYNC
2.9.4. Triple buffering
2.10. Use graphics assets appropriate for your platform
2.11. Do not use 24-bit textures
2.12. Use mipmapping
2.13. Use texture compression
2.13.1. About texture compression
2.13.2. Suitability of textures for texture compression
2.13.3. Using ETC1 with transparent objects
2.14. Reduce memory bandwidth usage
2.14.1. About reducing bandwidth
2.14.2. Activate back face culling
2.14.3. Utilize view frustum culling
2.14.4. Ensure textures are not too large
2.14.5. Use a texture resolution that fits the object on screen
2.14.6. Use low bit depth textures where possible
2.14.7. Use lower resolution textures if the texture does not contain sharp detail
2.14.8. Textures and lighting maps do not have to be the same size
2.14.9. Consider if tri-linear filtering is necessary for every object
2.14.10. Utilize dynamic level of detail
2.15. Use Vertex Buffer Objects
2.16. Ensure your application is not CPU bound
2.16.1. Determining if your application is CPU bound
2.16.2. Optimize application logic
2.16.3. Use loop optimizations
2.16.4. Align data
2.17. Check system settings
2.18. Final release check list
3. The Optimization Process
3.1. About the optimization process
3.2. General optimization advice
3.2.1. The general principle of try it and see
3.2.2. Use frame time instead of FPS for measurements
3.2.3. Set a computation budget and measure against it
3.2.4. Bottlenecks move between processors
3.3. The optimization process steps
3.3.1. Take measurements
3.3.2. Locate the bottleneck
3.3.3. Determine the optimization
3.3.4. Apply the optimization
3.3.5. Verify the optimization
3.3.6. Repeat the optimization process
3.4. Locating bottlenecks with the Performance Analysis Tool
3.4.1. Taking measurements with the instrumented drivers
3.4.2. About the Performance Analysis Tool
3.4.3. GPU counters
3.4.4. Analyzing graphs
3.4.5. Specific problem areas to look at
3.4.6. Additional counters to examine
3.5. Locating bottlenecks with other tools
3.5.1. Taking measurements without the Performance Analysis Tool
3.5.2. Measurements from other Mali GPU tools
3.5.3. Information from debugging tools
3.6. Finding exact problem areas
3.6.1. Technique for locating the exact problem areas
3.6.2. Areas to investigate
3.6.3. Diagnosing when memory bandwidth is a problem
3.7. Determining the relevant optimization
3.7.1. Optimization lists
3.7.2. Miscellaneous optimizations
4. Optimization Techniques
4.1. Minimize draw calls
4.1.1. About minimizing draw calls
4.1.2. Limitations on combined draw calls
4.1.3. Combining textures in a texture atlas
4.2. Minimize state changes
4.3. Avoid overdraw
4.3.1. Use culling
4.3.2. Sort objects and draw in front to back order
4.3.3. Optimize the use of transparency
4.4. Use approximations to improve performance
4.4.1. General methods of approximation
4.4.2. Specific methods of approximation
4.5. Use dynamic level of detail
4.6. Optimize loops
4.7. Use fast data structures
4.8. Use vector instructions
4.9. Make use of under-used resources
4.9.1. Moving operations from the pixel processor to the geometry processor
4.9.2. Moving operations from the geometry processor to the pixel processor
4.9.3. Using spare resources to increase image quality
4.9.4. Using spare resources to save power
4.10. Ensure the graphics pipeline is kept running
4.11. Application optimizations

Proprietary Notice

Words and logos marked with ™ or ® are registered trademarks or trademarks of ARM in the EU and other countries, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

Neither the whole nor any part of the information contained in, or the product described in, this document may be adapted or reproduced in any material form except with the prior written permission of the copyright holder.

The product described in this document is subject to continuous developments and improvements. All particulars of the product and its use contained in this document are given by ARM in good faith. However, all warranties implied or expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.

This document is intended only to assist the reader in the use of the product. ARM shall not be liable for any loss or damage arising from the use of any information in this document, or any error or omission in such information, or any incorrect use of the product.

Where the term ARM is used it means “ARM or any of its subsidiaries as appropriate”.

Confidentiality Status

This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.

Product Status

The information in this document is final, that is, for a developed product.

Revision History
Revision A    30 March 2011    First release

Copyright © 2011 ARM. All rights reserved.
ARM DUI 0555A