Peak throughput depends on the Mali GPU type and on the
configuration of its hardware implementation.
The Mali GPUs contain 1 to 16 identical shader cores. Each shader core supports up
to 384 concurrently executing threads.
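The two figures above give an upper bound on how many threads can be resident on the GPU at once. The following is a minimal sketch of that arithmetic, not part of any Mali API: the function name `max_resident_threads` and the macro are hypothetical, and the 384-thread limit is simply the value quoted in this section. In a real host program the core count would typically come from the device, for example by querying `CL_DEVICE_MAX_COMPUTE_UNITS` with `clGetDeviceInfo`.

```c
/* Per-core thread limit as quoted in the text above (assumed to apply
 * to the GPU being targeted). */
#define MALI_MAX_THREADS_PER_CORE 384u

/* Upper bound on concurrently executing threads across the whole GPU.
 * shader_cores is a hypothetical input in the range 1 to 16. */
unsigned max_resident_threads(unsigned shader_cores)
{
    return shader_cores * MALI_MAX_THREADS_PER_CORE;
}
```

Under these assumptions a single-core configuration holds at most 384 threads and a 16-core configuration at most 6144. A global work size well below this bound is unlikely to keep every shader core occupied.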
Each shader core contains:
- One to four arithmetic pipelines.
- One load-store pipeline.
- One texture pipeline.
Note
OpenCL typically uses only the arithmetic and load-store execution
pipelines. The texture pipeline is used only for reading image data
types.
The Mali GPUs use a VLIW (Very Long Instruction Word)
architecture. Each instruction word contains multiple operations.
The Mali GPUs also use SIMD, so that most arithmetic instructions
operate on multiple data elements simultaneously.
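To illustrate what operating on multiple data elements means, the sketch below models a four-lane floating-point add in portable C. It is only an illustration: `vec4f` and `vec4f_add` are hypothetical names standing in for OpenCL C's built-in `float4` type and its `+` operator, and the loop spells out work that a SIMD arithmetic instruction would be expected to perform across all lanes together.

```c
/* Hypothetical stand-in for OpenCL C's built-in float4 vector type. */
typedef struct {
    float s[4];
} vec4f;

/* Component-wise add. Portable C needs a loop over the four lanes;
 * in an OpenCL C kernel the same operation is written as a + b on
 * float4 operands. */
vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; ++i) {
        r.s[i] = a.s[i] + b.s[i];
    }
    return r;
}
```

Writing kernels in terms of vector types such as `float4`, rather than scalar loops, gives the compiler a direct way to use the SIMD arithmetic described above.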
Each thread uses only one arithmetic or load-store execution pipeline at
any point in time. Two instructions from the same thread execute in sequence.