4.4.4 About the Mali GPU pipelines

Mali GPUs contain three types of processing pipeline:
  • The Arithmetic pipeline.
  • The Load/Store pipeline.
  • The Texture pipeline.

The pipelines all run in parallel. Your shaders typically use all three types of pipeline.
The Mali Offline Shader Compiler reports the number of cycles that a shader uses in each pipeline. The pipeline with the highest cycle count limits the speed of the shader, so target your optimizations at that pipeline.
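
As a minimal sketch of this selection step, the following Python snippet picks the limiting pipeline from a set of cycle counts. The numbers are hypothetical values entered by hand for illustration, and the `bottleneck` helper is not part of any ARM tool.

```python
# Hypothetical per-pipeline cycle counts, as you might copy them by hand
# from an offline shader compiler report (illustrative values only).
cycles = {"Arithmetic": 12, "Load/Store": 4, "Texture": 7}


def bottleneck(cycles_by_pipeline):
    """Return the name of the pipeline with the highest cycle count."""
    return max(cycles_by_pipeline, key=cycles_by_pipeline.get)


print(bottleneck(cycles))
```

With these example values the Arithmetic pipeline is the limiting one, so arithmetic optimizations are the ones most likely to pay off.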

The Arithmetic pipeline

All arithmetic operations consume cycles in the Arithmetic pipeline.

You can reduce Arithmetic pipeline usage in the following ways:
  • Avoid using complex arithmetic such as:
    • The inverse matrix function.
    • Modulo operators.
    • Division.
    • Determinant.
    • Sine.
    • Cosine.
  • For integer operands, use shifts and bit masks to compute divisions, modulo operations, and multiplications by powers of two.
  • Use transpose instead of inverse for orthogonal matrices.
  • To avoid computing the transpose, switch the order of operands in a matrix-vector or matrix-matrix multiplication if one of the matrices is transposed. For example:
    Transpose(A)*Vector == Vector * A.
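
The identities behind the last three items can be checked on the host. The following Python sketch assumes non-negative integers and power-of-two divisors for the shift forms, and uses small list-based matrices. All helper names are illustrative and do not correspond to any shader built-in.

```python
def div_pow2(x, k):
    """x divided by 2**k, for a non-negative integer x."""
    return x >> k


def mod_pow2(x, k):
    """x modulo 2**k, for a non-negative integer x."""
    return x & ((1 << k) - 1)


def mul_pow2(x, k):
    """x multiplied by 2**k."""
    return x << k


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def mat_mul(a, b):
    """Matrix times matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# A 90 degree rotation about the z axis is an orthogonal matrix,
# so its transpose is also its inverse.
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# An arbitrary matrix and vector for the operand-swap identity.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
v = [2, -1, 3]

print(mat_mul(transpose(R), R) == IDENTITY)
print(mat_vec(transpose(A), v) == vec_mat(v, A))
```

The operand swap holds for any matrix, not only orthogonal ones. The transpose-as-inverse shortcut is valid only when the matrix is orthogonal, for example a pure rotation.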
You can also reduce load on the Arithmetic pipeline by moving load to the other pipelines:
  • Pass matrices as uniforms instead of computing them. This uses the Load/Store pipeline.
  • Use a texture to store a set of precomputed values that represent a function such as sine or cosine. This moves the load to the Texture pipeline.
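
The lookup-texture idea can be modeled on the host as follows. This is a sketch only: it assumes a 256-entry table that covers one period of sine, repeat wrapping, and linear interpolation between neighbouring entries, and it ignores the half-texel offset of real texture sampling. The names `TABLE_SIZE`, `SINE_TABLE`, and `lut_sin` are hypothetical.

```python
import math

TABLE_SIZE = 256  # assumed width of the hypothetical 1D lookup texture

# Precompute one full period of sine, one entry per "texel".
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE)
              for i in range(TABLE_SIZE)]


def lut_sin(x):
    """Approximate sin(x) from the table with linear interpolation."""
    u = (x / (2.0 * math.pi)) % 1.0       # wrap the angle into one period
    pos = u * TABLE_SIZE
    base = math.floor(pos)
    frac = pos - base                     # blend weight between entries
    i0 = int(base) % TABLE_SIZE
    i1 = (i0 + 1) % TABLE_SIZE            # repeat wrapping at the table edge
    return SINE_TABLE[i0] * (1.0 - frac) + SINE_TABLE[i1] * frac


print(abs(lut_sin(1.0) - math.sin(1.0)) < 1e-3)
```

The table size trades accuracy against memory and cache footprint, so this approach is worth trying only when the shader is Arithmetic pipeline bound and the Texture pipeline has spare capacity.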

The Load/Store pipeline

The Load/Store pipeline is used for reading uniforms, writing varyings, and accessing buffers in the shaders such as Uniform Buffer Objects or Shader Storage Buffer Objects.

If your application is Load/Store pipeline bound, try the following techniques:
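  • Use a texture instead of a buffer object to read data in the shader. This moves load to the Texture pipeline.
  • Compute data with arithmetic operations in the shader instead of loading it. This moves load to the Arithmetic pipeline.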
  • Compress or reduce uniforms and varyings.
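
One way to compress a uniform or varying is to quantize values that lie in the range 0.0 to 1.0 and pack several of them into one 32-bit word. The following Python sketch models packing four such values into 8 bits each, similar in spirit to the `packUnorm4x8` built-in of newer shading language versions. The helper names are illustrative, and whether 8 bits of precision is acceptable depends on your data.

```python
def pack_unorm4x8(r, g, b, a):
    """Pack four values in [0, 1] into one 32-bit word, 8 bits each."""
    def quantize(c):
        c = min(max(c, 0.0), 1.0)         # clamp to the representable range
        return int(round(c * 255.0))
    return (quantize(r)
            | (quantize(g) << 8)
            | (quantize(b) << 16)
            | (quantize(a) << 24))


def unpack_unorm4x8(word):
    """Recover four approximate values in [0, 1] from a packed word."""
    return tuple(((word >> shift) & 0xFF) / 255.0
                 for shift in (0, 8, 16, 24))


colour = (0.2, 0.4, 0.6, 0.8)
packed = pack_unorm4x8(*colour)
print(hex(packed), unpack_unorm4x8(packed))
```

Four 32-bit floats occupy 16 bytes, while the packed form occupies 4 bytes. The cost is a few extra arithmetic operations to unpack the value and a quantization error of at most half a step, that is 0.5/255.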

The Texture pipeline

Texture accesses use cycles in the Texture pipeline and use memory bandwidth. Using large textures can be detrimental because cache misses are more likely and this can cause multiple threads to stall while waiting for data.

To improve the performance of the Texture pipeline, try the following:
  • Use mipmaps. Mipmaps increase the cache hit rate because the GPU selects the mipmap level whose resolution best matches the variation of the texture coordinates.
  • Use texture compression. Compression reduces memory bandwidth and increases the cache hit rate. Each compressed block contains more than one texel, so more texels fit in the same amount of cache.
  • Avoid trilinear or anisotropic filtering. Both increase the number of operations required to fetch texels. Avoid using these techniques unless you absolutely require them.
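
The following Python sketch puts rough numbers on these three points. It is a simplified model, not a description of Mali hardware: the 64-byte cache line is an assumption, the level-of-detail formula is the textbook log2 of the texel footprint, and ETC2 RGB8 at 4 bits per texel is used only as an example compressed format. All names are illustrative.

```python
import math


def mip_level(texels_per_pixel, num_levels):
    """Simplified mipmap choice: log2 of the texel footprint, clamped."""
    lod = math.log2(max(texels_per_pixel, 1.0))
    return min(int(round(lod)), num_levels - 1)


# Texels read per sample for common filter modes. Trilinear filtering
# blends two bilinear samples taken from adjacent mipmap levels.
TEXELS_PER_SAMPLE = {"nearest": 1, "bilinear": 4, "trilinear": 8}


def texels_per_cache_line(bits_per_texel, line_bytes=64):
    """How many texels fit in one cache line of the assumed size."""
    return (line_bytes * 8) // bits_per_texel


print(mip_level(4.0, num_levels=10))      # footprint of 4 texels per pixel
print(texels_per_cache_line(32))          # uncompressed RGBA8888
print(texels_per_cache_line(4))           # ETC2 RGB8, 64 bits per 4x4 block
```

In this model a minified texture is read from a smaller mipmap level, a compressed format fits several times more texels into each cache line, and trilinear filtering doubles the texels read per sample compared with bilinear filtering.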
Non-Confidential. ARM 100140_0201_00_en
Copyright © 2014, 2015 ARM. All rights reserved.