7.2.2 About Mali™ GPU architectures

Mali™ GPUs use an architecture in which instructions operate on multiple data elements simultaneously.

Peak throughput depends on the type of Mali GPU and on how it is configured in a particular hardware implementation.

Mali GPUs can contain many identical shader cores. Each shader core supports hundreds of concurrently executing threads.

OpenCL typically uses only the arithmetic pipelines (or execution engines, depending on the GPU architecture) and the load-store pipelines. The texture pipeline is used only for reading image data types.

In the execution engines of Mali Bifrost and Valhall GPUs, scalar instructions execute in parallel, so the GPU operates on multiple data elements simultaneously. You do not need to vectorize your code to achieve this.
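As an illustration of code that is left scalar, the following is a minimal sketch written in plain C so that it can be compiled and run on a host. It is not taken from the Mali documentation. The names saxpy_work_item and run_ndrange are hypothetical, and the loop in run_ndrange only stands in for an OpenCL NDRange launch. The point is that each work-item handles a single float with no explicit vector types, and the mapping of work-items onto parallel hardware is left to the GPU.

```c
#include <stddef.h>

/*
 * Scalar kernel body: one work-item processes one element.
 * In OpenCL C the equivalent kernel might look like this:
 *
 *   kernel void saxpy(float a, global const float *x, global float *y)
 *   {
 *       size_t i = get_global_id(0);
 *       y[i] = a * x[i] + y[i];
 *   }
 *
 * Here the work-item index is passed in explicitly (an assumption made
 * so that the sketch compiles as plain C without an OpenCL runtime).
 */
static void saxpy_work_item(size_t gid, float a, const float *x, float *y)
{
    /* No float4 or other vector types: the code stays scalar. */
    y[gid] = a * x[gid] + y[gid];
}

/*
 * Hypothetical host-side stand-in for an NDRange launch.
 * It runs the work-items one after another on the CPU; on a GPU,
 * many work-items would execute in parallel instead.
 */
static void run_ndrange(size_t global_size, float a, const float *x, float *y)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        saxpy_work_item(gid, a, x, y);
    }
}
```

Calling run_ndrange(4, 2.0f, x, y) updates four elements of y in place. In a real OpenCL application, the same scalar body would be enqueued with a global work size of 4 instead of being looped over on the host.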

