4.4.4 About the Mali GPU pipelines
Mali GPUs contain three types of processing pipeline:
The Arithmetic pipeline.
The Load/Store pipeline.
The Texture pipeline.
The pipelines all run in parallel. Your shaders typically use all three types
of pipeline.
The Mali Offline Shader Compiler reports the number of cycles your shader uses
in each pipeline. Because the pipelines run in parallel, the pipeline with the
highest number of cycles limits the performance of the shader. Target your
optimizations at that pipeline.
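The selection rule above can be sketched in a few lines of Python. The cycle counts and the dictionary keys are hypothetical values chosen for illustration. They are not real Mali Offline Shader Compiler output, and the compiler's actual report format is not assumed here.

```python
def bottleneck_pipeline(cycles):
    """Return the name of the pipeline with the highest cycle count."""
    return max(cycles, key=cycles.get)


def limiting_cycles(cycles):
    """The pipelines run in parallel, so the slowest one bounds the shader."""
    return max(cycles.values())


# Hypothetical per-pipeline cycle counts for one shader (illustrative only).
example_cycles = {"arithmetic": 12.0, "load_store": 4.0, "texture": 2.0}

print(bottleneck_pipeline(example_cycles), limiting_cycles(example_cycles))
```

With these numbers, reducing Load/Store or Texture work would not speed up the shader until the Arithmetic cycle count drops below the next highest pipeline.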
The Arithmetic pipeline
All arithmetic operations consume cycles in the Arithmetic pipeline.
The following are a number of ways you can reduce the Arithmetic pipeline
load:
Avoid using complex arithmetic such as division, modulo, matrix inversion,
sine, and cosine where a cheaper alternative exists.
For integer operands, use operations such as shifts and bit masks to compute
divisions, modulo, and multiplications when the divisor or multiplier is a
power of two.
Use transpose instead of inverse for orthogonal matrices.
To avoid computing the transpose, switch the order of operands in a
matrix-vector or matrix-matrix multiplication if one of the matrices is
transposed. For example:
Transpose(A) * Vector == Vector * A.
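The identities in the list above can be checked on the CPU. The following is a minimal Python sketch, not shader code. All function names are hypothetical, and the integer identities assume non-negative operands and a power-of-two divisor or multiplier of 2**k.

```python
def div_pow2(x, k):
    """x // 2**k for non-negative x, using a right shift."""
    return x >> k


def mod_pow2(x, k):
    """x % 2**k for non-negative x, using a bit mask."""
    return x & ((1 << k) - 1)


def mul_pow2(x, k):
    """x * 2**k, using a left shift."""
    return x << k


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def mat_vec(a, v):
    """A * v, treating v as a column vector."""
    return [sum(a[r][c] * v[c] for c in range(len(v))) for r in range(len(a))]


def vec_mat(v, a):
    """v * A, treating v as a row vector."""
    return [sum(v[r] * a[r][c] for r in range(len(v))) for c in range(len(a[0]))]


def mat_mul(a, b):
    """A * B for matrices stored as lists of rows."""
    return [vec_mat(row, b) for row in a]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [1, 0, 2]
# Swapping the operand order gives the same result without a transpose.
print(mat_vec(transpose(A), v) == vec_mat(v, A))

# A 90-degree rotation is orthogonal, so its transpose is its inverse.
R = [[0, -1], [1, 0]]
print(mat_mul(transpose(R), R))
```

In a shader, the same reasoning means you can write the vector on the left of the matrix instead of calling a transpose function first.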
You can also reduce load on the Arithmetic pipeline by moving load to the
other pipelines:
Pass matrices as uniforms instead of computing them. This uses the
Load/Store pipeline.
Use a texture to store a set of precomputed values that represent a function
such as sine or cosine. This moves the load to the Texture pipeline.
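The lookup-table idea can be sketched on the CPU as follows. This is a simplified Python model, assuming a 256-entry one-dimensional table with linear interpolation and wrap-around addressing, which loosely mimics a texture sampled with linear filtering and repeat wrapping. The table size and function names are assumptions, and real texture sampling differs in details such as texel-center placement and storage precision.

```python
import math

TABLE_SIZE = 256  # assumed table resolution

# Precompute one full period of sine, as you would when filling a texture.
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def sine_lookup(angle):
    """Approximate sin(angle) by interpolating between two table entries."""
    pos = angle / (2.0 * math.pi) * TABLE_SIZE
    i0 = math.floor(pos)
    frac = pos - i0
    a = SINE_TABLE[i0 % TABLE_SIZE]        # wrap like a repeating texture
    b = SINE_TABLE[(i0 + 1) % TABLE_SIZE]
    return a + (b - a) * frac              # linear filtering between texels


for x in (0.5, 2.0, 4.5):
    print(x, sine_lookup(x), math.sin(x))
```

The trade-off is accuracy against table size: a larger table reduces the interpolation error but occupies more texture memory and cache.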
The Load/Store pipeline
The Load/Store pipeline is used for reading uniforms, writing varyings, and accessing buffers in the shaders such as Uniform Buffer Objects or Shader Storage Buffer Objects.
If your application is Load/Store pipeline bound, try the following:
Use a texture instead of a buffer object to read data in the shader.
Compute data using arithmetic operations instead of loading it.
Compress or reduce uniforms and varyings.
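One way to combine the last two suggestions is to pass fewer components and recompute the rest. The following Python sketch assumes a unit-length normal whose z component is non-negative, for example a tangent-space normal. It keeps only x and y and rebuilds z with arithmetic. The function names are hypothetical, and whether this trade is a gain depends on which pipeline your shader is bound by.

```python
import math


def pack_normal(n):
    """Keep only x and y of a unit-length normal (assumes z >= 0)."""
    return (n[0], n[1])


def unpack_normal(xy):
    """Rebuild z from x and y using x*x + y*y + z*z == 1."""
    x, y = xy
    # Clamp to zero so rounding error cannot produce a negative square root.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


normal = (0.6, 0.0, 0.8)
print(unpack_normal(pack_normal(normal)))
```

This moves work from the Load/Store pipeline to the Arithmetic pipeline, so check the cycle counts for both pipelines after making the change.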
The Texture pipeline
Texture accesses use cycles in the Texture pipeline and use memory bandwidth. Using large textures can be detrimental because cache misses are more likely and this can cause multiple threads to stall while waiting for data.
To improve the performance of the Texture pipeline try the following:
- Use mipmaps
- Mipmaps increase the cache hit rate because the GPU
selects the best resolution of the texture to use based on the variation of
texture coordinates.
- Use texture compression
- This also reduces memory bandwidth and increases the
cache hit rate. Each compressed block contains more than one texel, so a single
access brings several neighboring texels into the cache.
- Avoid trilinear or anisotropic filtering
- Trilinear and anisotropic filtering increase the number of operations required to fetch
texels. Avoid using these techniques unless you absolutely require them.
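The effect of the suggestions above can be estimated with simple arithmetic. The following Python sketch makes several assumptions: a 64-byte cache line, a 32 bits per texel uncompressed format, a 4 bits per texel compressed format (the rate of ETC2 RGB8), and a simplified mip selection rule that rounds down the base-2 logarithm of the texel-to-pixel ratio along one axis. Real hardware selects mip levels from texture coordinate derivatives and differs in detail.

```python
import math

CACHE_LINE_BYTES = 64  # assumed cache line size


def mip_level(texels_per_pixel, num_levels):
    """Simplified mip selection: aim for about one texel per pixel."""
    if texels_per_pixel <= 1.0:
        return 0
    level = int(math.floor(math.log2(texels_per_pixel)))
    return min(level, num_levels - 1)


def level0_bytes(width, height, bits_per_texel):
    """Size of the base level of a texture in bytes."""
    return width * height * bits_per_texel // 8


def texels_per_cache_line(bits_per_texel):
    """How many texels fit in one cache line at a given bit rate."""
    return CACHE_LINE_BYTES * 8 // bits_per_texel


def texels_fetched(trilinear):
    """Bilinear reads 4 texels; trilinear reads 4 from each of 2 levels."""
    return 8 if trilinear else 4


print(level0_bytes(1024, 1024, 32), level0_bytes(1024, 1024, 4))
print(texels_per_cache_line(32), texels_per_cache_line(4))
print(mip_level(4.0, 11), texels_fetched(True))
```

Under these assumptions, the compressed texture is one eighth of the size and each cache line holds eight times as many texels, which is why compression and mipmaps both improve the cache hit rate.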