4.4.5 Additional techniques for reducing pipeline cycles
There are a number of additional techniques you can use to reduce the cycles used in each pipeline.
Avoid register spilling
The Mali Offline Shader Compiler reports whether your shader spills registers. In a thread, register spilling is typically caused by a large number of input uniforms that cannot all fit in the register set.
Register spilling forces the Mali GPU to read some uniforms from memory, which
increases the load on the Load/Store unit and reduces performance. To solve this issue,
try to reduce the number and the precision of the uniforms you supply to the shader.
In the Ice Cave demo, some of the shaders suffered from register spilling, for example:
Figure 4-24 Shader with register spilling.
Reducing the number of uniforms solved this problem and increased performance,
for example:
Figure 4-25 Shader with no register spilling.
Reduce the precision of varyings and uniforms
When you write custom shaders, you can specify the floating-point precision of uniforms and varyings as either 32-bit floats or 16-bit half-floats. The precision determines the range of values that the variable can represent and how finely it can represent them.
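To show what this trade-off means for stored values, the following Python sketch round-trips numbers through IEEE 754 32-bit and 16-bit storage with the standard struct module ('f' and 'e' formats). It runs on the CPU and only illustrates the number formats; it assumes that the shader float and half types behave like IEEE 754 binary32 and binary16, which GPU hardware may not match in every detail (for example, denormal handling).

```python
import struct

def to_half(x: float) -> float:
    """Round-trip a value through IEEE 754 binary16 (half-float) storage."""
    # The struct 'e' format packs a value as a 16-bit half-float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x: float) -> float:
    """Round-trip a value through IEEE 754 binary32 (float) storage."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Largest finite half-float value.
print(to_half(65504.0))              # 65504.0
# 0.1 is stored less exactly as a half-float than as a float.
print(to_half(0.1), to_single(0.1))  # 0.0999755859375 0.10000000149011612
# Half-floats have a 10-bit mantissa, so steps near 1.0 are 2**-10 apart.
print(to_half(1.0004))               # 1.0
# Values beyond the half-float range cannot be stored at all.
try:
    to_half(70000.0)
except OverflowError:
    print("70000.0 does not fit in a half-float")
```

Half-floats are usually adequate for colors, normalized directions, and texture coordinates on small textures, but values such as large world-space positions can exceed their range or lose too much precision.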
There are several advantages to using half-floats:
Bandwidth usage is reduced.
The cycles used in the Arithmetic pipeline are reduced because the shader compiler can optimize your code to exploit more parallelism.
The number of uniform registers required is reduced, which in turn reduces the risk of register spilling.
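To make the bandwidth and register points concrete, the following Python sketch counts the storage and registers that a set of uniforms needs at 32-bit and 16-bit precision. It uses a deliberately simplified, hypothetical model: each register is assumed to hold 128 bits and components are assumed to pack densely with no alignment rules. Real register allocation is performed by the Mali shader compiler and is not modeled here, and the uniform names are invented for illustration.

```python
import math
from typing import Mapping

# Assumption for illustration only: one register holds 128 bits,
# that is four 32-bit values or eight 16-bit values.
REGISTER_BITS = 128

def uniform_bits(uniforms: Mapping[str, int], bits_per_component: int) -> int:
    """Total storage in bits for uniforms given as {name: component_count}."""
    return sum(count * bits_per_component for count in uniforms.values())

def registers_needed(uniforms: Mapping[str, int], bits_per_component: int) -> int:
    """Registers needed if components are packed densely (a simplification
    that ignores the compiler's real alignment and allocation rules)."""
    return math.ceil(uniform_bits(uniforms, bits_per_component) / REGISTER_BITS)

# Hypothetical uniform set; these names do not come from the Ice Cave demo.
HYPOTHETICAL_UNIFORMS = {
    "_LightColor": 4,  # RGBA color
    "_LightDir": 3,    # direction vector
    "_FogParams": 4,
    "_Tint": 4,
    "_Glossiness": 1,
}

for bits in (32, 16):
    print(f"{bits}-bit: {uniform_bits(HYPOTHETICAL_UNIFORMS, bits) // 8} bytes, "
          f"{registers_needed(HYPOTHETICAL_UNIFORMS, bits)} registers")
# 32-bit: 64 bytes, 4 registers
# 16-bit: 32 bytes, 2 registers
```

Under this model, halving the component size halves both the bytes transferred and the registers occupied, which leaves more room in the register set before spilling occurs.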
The following examples show a simple fragment shader variant from the Ice Cave
demo, compiled twice with the Mali Offline Shader Compiler.
The first code example is compiled with floats:
Figure 4-26 Shader compiled with Floats
The second code example is compiled with half-floats:
Figure 4-27 Shader compiled with Half floats
The number of Load/Store instructions is reduced in the half-float version.
The number of work and uniform registers used is also reduced, and there is no register spilling.
The code generated with half-floats is also smaller than the code generated with
floats. This improves the cache hit rate on the Mali GPU, which increases performance.
Use world space normal maps for static objects
You can use tangent space normal maps to increase the detail of a model without increasing its geometric detail. Because tangent space normals are defined relative to each triangle of the mesh, you can use tangent space normal maps on animated objects without modifying them.
Unfortunately, these require more arithmetic operations in the shaders to achieve
the correct result. For static objects, these calculations can be avoided.
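The extra arithmetic comes from rotating each sampled tangent-space normal into world space with the tangent, bitangent and normal (TBN) vectors of the surface, which typically also have to be passed to the fragment shader. The following Python sketch, written on the CPU purely for illustration, contrasts the two per-fragment paths. The function names and the [0, 1] to [-1, 1] texel encoding are assumptions for this example, not code from the Ice Cave demo.

```python
import math

def decode(texel):
    """Map a normal-map texel from [0, 1] per channel to [-1, 1]."""
    return tuple(2.0 * c - 1.0 for c in texel)

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def normal_from_tangent_space_map(texel, tangent, bitangent, normal):
    """Tangent-space path: decode, then rotate into world space with the TBN basis."""
    n = decode(texel)
    # Extra per-fragment work: 9 multiplies and 6 adds for the 3x3 transform.
    world = tuple(
        n[0] * tangent[i] + n[1] * bitangent[i] + n[2] * normal[i]
        for i in range(3)
    )
    return normalize(world)

def normal_from_world_space_map(texel):
    """World-space path: the texel already holds the world-space normal."""
    return normalize(decode(texel))

# A floor facing +Y: tangent along +X, bitangent along -Z.
T, B, N = (1.0, 0.0, 0.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0)
print(normal_from_tangent_space_map((0.5, 0.5, 1.0), T, B, N))  # (0.0, 1.0, 0.0)
print(normal_from_world_space_map((0.5, 1.0, 0.5)))             # (0.0, 1.0, 0.0)
```

Both paths produce the same world-space normal, but only the tangent-space path pays for the matrix transform on every fragment.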
Alternatively, you can use local space normal maps or world space normal maps. Local space normal maps reduce the number of calculations performed in the shaders, but any transformation applied to the model must also be applied to the sampled normal. World space normal maps do not require any transformations, but the objects that use them are static and cannot move. In the Ice Cave demo, the cave and other high-quality objects are static, so using world space normal maps considerably reduces the number of ALU operations the shaders require. Most common 3D modeling tools can create world space normal maps, or you can generate them in code as an offline process.
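An offline conversion can be sketched as follows: each tangent-space texel is decoded, rotated into world space with the tangent, bitangent and normal (TBN) basis of the surface it covers, and re-encoded into [0, 1]. This is a minimal Python sketch, assuming the per-texel TBN vectors are already known; a real baking tool would rasterize the static mesh in UV space to find them. The function name and input layout are hypothetical.

```python
import math

def bake_world_space_normal_map(tangent_map, tbn_per_texel):
    """Convert a tangent-space normal map to a world-space one, offline.

    tangent_map: rows of (r, g, b) texels in [0, 1].
    tbn_per_texel: rows of (tangent, bitangent, normal) world-space vectors,
        one per texel (hypothetical input layout).
    Returns rows of (r, g, b) texels in [0, 1] storing world-space normals.
    """
    baked = []
    for texel_row, tbn_row in zip(tangent_map, tbn_per_texel):
        out_row = []
        for texel, (t, b, n) in zip(texel_row, tbn_row):
            x, y, z = (2.0 * c - 1.0 for c in texel)  # decode to [-1, 1]
            w = [x * t[i] + y * b[i] + z * n[i] for i in range(3)]  # TBN rotation
            length = math.sqrt(sum(c * c for c in w))
            w = [c / length for c in w]
            out_row.append(tuple(0.5 * c + 0.5 for c in w))  # re-encode to [0, 1]
        baked.append(out_row)
    return baked

# Hypothetical 1x2 map: the same flat texel on a floor (+Y) and on a wall (+X).
FLOOR_TBN = ((1.0, 0.0, 0.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0))
WALL_TBN = ((0.0, 0.0, -1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0))
flat = (0.5, 0.5, 1.0)
print(bake_world_space_normal_map([[flat, flat]], [[FLOOR_TBN, WALL_TBN]]))
# [[(0.5, 1.0, 0.5), (1.0, 0.5, 0.5)]]
```

Because the object never moves, this rotation runs once at build time instead of once per fragment, and the shader only needs to sample and decode the baked texel.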