A2.1 Components

The Cortex®‑A32 processor consists of:

  • One to four cores, each with its own governor block. The governor block provides functionality that remains required when the core is in retention.
  • An SCU-L2 memory system block. The SCU maintains data coherency between the L1 data caches and the L2 cache. It also connects the cores to an external memory system using an AXI, ACE, or CHI master interface. A mini-SCU replaces the SCU in configurations that do not require the SCU functionality. The mini-SCU is instantiated in implementations that are configured with a single core, no L2 cache, no CPU cache protection, and an AXI master interface.

The processor also integrates CoreSight components, and optionally integrates cache protection and the Cryptographic Extension.

The following figure shows a top-level functional diagram of the Cortex‑A32 processor.

Figure A2-1 Cortex‑A32 processor block diagram

Instruction Fetch Unit (IFU)

The IFU obtains instructions from the instruction cache or from external memory and predicts the outcome of branches in the instruction stream. It passes the instructions to the Data Processing Unit (DPU) for processing.

In implementations with CPU cache protection, parity bits protect the L1 Instruction cache data and tag RAMs by enabling the detection of any single-bit error. If an error is detected, the line is invalidated and fetched again.
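
The single-bit detection property described above can be illustrated with a minimal sketch. The source does not state the parity polarity or how many data bits each parity bit covers, so the even-parity scheme, the 8-bit word, and the function names below are assumptions chosen only for illustration.

```python
def parity_bit(word):
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def store(word):
    """Return the word together with the parity bit written alongside it."""
    return word, parity_bit(word)


def check(word, stored_parity):
    """True when the recomputed parity still matches the stored parity."""
    return parity_bit(word) == stored_parity


data, parity = store(0b10110010)
assert check(data, parity)                    # clean read

single_flip = data ^ (1 << 3)                 # one bit corrupted
assert not check(single_flip, parity)         # detected: line would be refetched

double_flip = data ^ 0b11                     # two bits corrupted
assert check(double_flip, parity)             # an even number of flips goes unnoticed
```

Parity can only detect the error, not locate it, which is consistent with the recovery described above: the line is invalidated and fetched again rather than repaired in place.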

Data Processing Unit (DPU)

The DPU decodes and executes instructions. It executes instructions that require data transfer to or from the memory system by interfacing to the Data Cache Unit (DCU). The DPU includes the Performance Monitor Unit (PMU), the Advanced SIMD and floating-point support, and the Cryptographic Extension.


PMU

The PMU provides six performance monitors that can be configured to gather statistics on the operation of each core and the memory system. The information can be used for debug and code profiling.

Advanced SIMD and floating-point support

Advanced SIMD is a media and signal processing architecture that adds instructions primarily for audio, video, 3-D graphics, image, and speech processing. The floating-point architecture provides support for single-precision and double-precision floating-point operations.


The Advanced SIMD architecture, its associated implementations, and supporting software are also referred to as NEON technology.

Cryptographic Extension

The optional Cortex‑A32 processor Cryptographic Extension supports the ARMv8 Cryptographic Extensions. It can be configured at implementation time and applies to all cores. The Cryptographic Extension adds new instructions to Advanced SIMD that accelerate:
  • Advanced Encryption Standard (AES) encryption and decryption.
  • The Secure Hash Algorithm (SHA) functions SHA-1, SHA-224, and SHA-256.
  • Finite field arithmetic used in algorithms such as Galois/Counter Mode and Elliptic Curve Cryptography.

Memory Management Unit (MMU)

The MMU provides fine-grained memory system control through a set of virtual-to-physical address mappings and memory attributes that are held in translation tables. These are loaded into the Translation Lookaside Buffer (TLB) when a location is accessed. The TLB entries include global and application-specific identifiers so that the TLB does not need to be flushed on a context switch. They also include Virtual Machine Identifiers (VMIDs) so that the TLB does not need to be flushed when the hypervisor switches between virtual machines.
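
To show why tagged entries avoid flushes, the following sketch models a TLB lookup that matches on the identifiers as well as the address. It is an illustration only: the application-specific identifier is modelled as an `asid` field, and the `TlbEntry` and `lookup` names, the field layout, and the linear search are all assumptions rather than a description of the hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TlbEntry:
    vpn: int          # virtual page number
    ppn: int          # physical page number
    vmid: int         # virtual machine identifier
    asid: int         # application-specific identifier (assumed field name)
    is_global: bool   # global entries match regardless of asid


def lookup(tlb: List[TlbEntry], vpn: int, asid: int, vmid: int) -> Optional[int]:
    """Return the physical page number on a hit, or None on a miss."""
    for entry in tlb:
        if (entry.vpn == vpn and entry.vmid == vmid
                and (entry.is_global or entry.asid == asid)):
            return entry.ppn
    return None


# Two processes map the same virtual page to different physical pages.
tlb = [
    TlbEntry(vpn=0x10, ppn=0x80, vmid=1, asid=5, is_global=False),
    TlbEntry(vpn=0x10, ppn=0x90, vmid=1, asid=6, is_global=False),
    TlbEntry(vpn=0x20, ppn=0xA0, vmid=1, asid=0, is_global=True),
]

assert lookup(tlb, 0x10, asid=5, vmid=1) == 0x80    # process 5 sees its own mapping
assert lookup(tlb, 0x10, asid=6, vmid=1) == 0x90    # process 6 coexists, no flush needed
assert lookup(tlb, 0x20, asid=7, vmid=1) == 0xA0    # global entry matches any asid
assert lookup(tlb, 0x10, asid=5, vmid=2) is None    # another virtual machine misses
```

Because entries for different processes and virtual machines can coexist, switching between them only changes the identifiers used for matching.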

Micro TLBs

The first level of caching for the translation table information is a micro TLB of ten entries. A micro TLB is implemented on each of the instruction and data sides. All main TLB related maintenance operations result in flushing both the instruction and data micro TLBs.

Main TLB

A unified main TLB handles misses from the micro TLBs.
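
The two-level arrangement can be sketched as follows. This is a behavioral illustration under stated assumptions: the FIFO replacement policy, the unbounded main TLB, the dictionary that stands in for a translation table walk, and the `MicroTlb` and `Mmu` names are all hypothetical. Only the ten-entry micro TLB size, the separate instruction and data micro TLBs, and the flush-on-maintenance behavior come from the text above.

```python
from collections import OrderedDict

MICRO_TLB_ENTRIES = 10  # size stated in the text


class MicroTlb:
    def __init__(self):
        self.entries = OrderedDict()

    def lookup(self, vpn):
        return self.entries.get(vpn)

    def fill(self, vpn, ppn):
        if vpn not in self.entries and len(self.entries) >= MICRO_TLB_ENTRIES:
            self.entries.popitem(last=False)  # evict oldest (assumed FIFO policy)
        self.entries[vpn] = ppn

    def flush(self):
        self.entries.clear()


class Mmu:
    def __init__(self, page_table):
        self.page_table = page_table  # stands in for a translation table walk
        self.main_tlb = {}            # unified: shared by both sides
        self.itlb = MicroTlb()
        self.dtlb = MicroTlb()

    def translate(self, vpn, is_fetch):
        micro = self.itlb if is_fetch else self.dtlb
        ppn = micro.lookup(vpn)
        if ppn is None:                        # micro TLB miss
            ppn = self.main_tlb.get(vpn)
            if ppn is None:                    # main TLB miss: walk the tables
                ppn = self.page_table[vpn]
                self.main_tlb[vpn] = ppn
            micro.fill(vpn, ppn)
        return ppn

    def main_tlb_invalidate_all(self):
        """A main TLB maintenance operation also flushes both micro TLBs."""
        self.main_tlb.clear()
        self.itlb.flush()
        self.dtlb.flush()


mmu = Mmu({vpn: vpn + 0x100 for vpn in range(32)})
for vpn in range(12):
    mmu.translate(vpn, is_fetch=False)
assert len(mmu.dtlb.entries) == 10     # capped at ten entries
assert len(mmu.itlb.entries) == 0      # instruction side untouched
mmu.main_tlb_invalidate_all()
assert len(mmu.dtlb.entries) == 0 and len(mmu.main_tlb) == 0
```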

In implementations with CPU cache protection, parity bits protect the TLB RAMs by enabling the detection of any single-bit error. If an error is detected, the entry is flushed and fetched again.

L1 data-side memory system

The L1 data-side memory system includes the Data Cache Unit (DCU), the Store Buffer (STB), and the Bus Interface Unit (BIU).


DCU

The DCU manages all load and store operations.

In implementations with CPU cache protection, parity bits protect the L1 Data cache tag RAMs and dirty RAMs. The L1 Data cache data RAMs are protected using Error Correction Codes (ECC). The ECC scheme is Single Error Correct Double Error Detect (SECDED). The DCU includes a combined local and global exclusive monitor that is used by the Load-Exclusive/Store-Exclusive instructions.
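
The SECDED property mentioned above can be demonstrated with a toy code. The sketch below uses an extended Hamming (8,4) code, which protects 4 data bits with 4 check bits. The source does not give the code width or bit layout used for the L1 Data cache data RAMs, so this construction and the `encode` and `decode` names are assumptions that only illustrate the single-correct, double-detect behavior.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits).

    Index 0 holds the overall parity bit; indexes 1..7 form Hamming(7,4)
    with check bits at positions 1, 2, and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1          # make the total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions of all set bits
    odd_total = sum(code) & 1            # 1 when an odd number of bits flipped
    if syndrome == 0 and not odd_total:
        status = "ok"
    elif odd_total:
        # Treated as a single-bit error. A zero syndrome here means the
        # overall parity bit (index 0) itself was hit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even total parity: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
assert decode(word) == ("ok", 0b1011)

one_error = list(word)
one_error[5] ^= 1
assert decode(one_error) == ("corrected", 0b1011)

two_errors = list(word)
two_errors[2] ^= 1
two_errors[6] ^= 1
assert decode(two_errors) == ("uncorrectable", None)
```

Unlike parity, which can only detect an error, this kind of code lets a single-bit error be repaired in place. That matters for data that exists only in the cache and so cannot be refetched.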


STB

The STB holds store operations when they have left the load/store pipeline in the DCU and have been committed by the DPU. The STB can request access to the cache RAMs in the DCU, request the Bus Interface Unit (BIU) to initiate linefills, or request the BIU to write out the data on the external write channel. External data writes are through the SCU.

The STB is also used to queue maintenance operations before they are broadcast to other cores in the processor.

BIU

The BIU contains the SCU interface and buffers to decouple the interface from the L1 Data cache and STB. The BIU and the SCU always operate at the processor frequency.


Governor

The governor block, outside the core, includes all functions that must remain operating while a core is in retention mode.

GIC CPU interface

The GIC CPU interface is a memory-mapped interface through which a core receives an interrupt. The GIC Distributor can read and write the GIC CPU interface registers even while the core is in retention mode.

Generic timer

The Generic Timer has an interface to an external system counter. It provides a consistent view of time, which can be used to schedule events and trigger interrupts. It is also used by the retention circuits in the processor.

L2 Memory System

The L2 memory system contains the L2 cache pipeline and all the logic that maintains memory coherence between the cores in the cluster.

SCU

The SCU connects the cores to the external memory system through the master memory interface. It also maintains data cache coherency between the cores and arbitrates L2 requests from the cores.

The mini-SCU replaces the SCU in certain uniprocessor configurations that do not require data cache coherency with other masters in the system. Specifically, it is used in implementations that are configured with a single core, no L2 cache, no CPU cache protection, and an AXI master interface. The mini-SCU bridges between the master interface of the core and the AXI master interface of the processor.
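
The four conditions for the mini-SCU can be written as a single predicate. The sketch below is illustrative only: `ClusterConfig`, its field names, and `uses_mini_scu` are hypothetical and do not correspond to actual implementation-time configuration parameters.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    num_cores: int                # one to four
    has_l2_cache: bool
    cpu_cache_protection: bool
    master_interface: str         # "AXI", "ACE", or "CHI"


def uses_mini_scu(cfg):
    """True only when all four conditions listed in the text hold."""
    return (cfg.num_cores == 1
            and not cfg.has_l2_cache
            and not cfg.cpu_cache_protection
            and cfg.master_interface == "AXI")


minimal = ClusterConfig(1, has_l2_cache=False, cpu_cache_protection=False,
                        master_interface="AXI")
assert uses_mini_scu(minimal)

# Changing any one condition selects the full SCU instead.
assert not uses_mini_scu(ClusterConfig(2, False, False, "AXI"))
assert not uses_mini_scu(ClusterConfig(1, True, False, "AXI"))
assert not uses_mini_scu(ClusterConfig(1, False, True, "AXI"))
assert not uses_mini_scu(ClusterConfig(1, False, False, "ACE"))
```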
L2 cache

Each Cortex‑A32 cluster can include an optional L2 cache that participates in the coherency protocol. Each L2 cache is 8-way set associative, supports 64-byte cache lines, and has a configurable cache RAM size between 128KB and 1MB.
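
The stated associativity and line size fix the number of sets for any given cache size. The following sketch works through that arithmetic. It assumes power-of-two sizes within the stated range and a conventional split of the address into offset and index bits; the source does not describe the indexing scheme, and `l2_geometry` is a hypothetical helper.

```python
LINE_BYTES = 64   # from the text
WAYS = 8          # from the text


def l2_geometry(size_bytes):
    """Return (sets, offset_bits, index_bits) for a given L2 RAM size."""
    sets = size_bytes // (WAYS * LINE_BYTES)
    offset_bits = LINE_BYTES.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of sets (power of two assumed)
    return sets, offset_bits, index_bits


KB = 1024
assert l2_geometry(128 * KB) == (256, 6, 8)      # smallest configuration
assert l2_geometry(1024 * KB) == (2048, 6, 11)   # largest configuration

for size_kb in (128, 256, 512, 1024):
    sets, offset_bits, index_bits = l2_geometry(size_kb * KB)
    print(f"{size_kb:>4}KB: {sets} sets, {offset_bits} offset bits, {index_bits} index bits")
```

Each doubling of the cache size doubles the number of sets and adds one index bit, while the way count and line size stay fixed.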


ACP

The ACP interface cannot be configured without an L2 cache because it reuses buffering and data paths implemented for the L2 cache to achieve optimal efficiency. The main advantage of the ACP interface is its ability to allocate data in the L2 cache RAMs.

Debug and trace components

Cross trigger

The Cross Trigger Matrix (CTM) combines the CoreSight Cross Trigger Interface (CTI) channel signals from all the cores so that the Cortex‑A32 processor presents a single cross trigger channel interface. The CTM can combine up to four internal channel interfaces, one for each core, along with one external channel interface.

Debug ROM

The Cortex‑A32 processor has a debug ROM, which is a CoreSight feature.

ETM

The ETM trace unit is a build-time configuration option. It performs real-time instruction flow tracing that complies with the ETM architecture.

Non-Confidential. ARM 100241_0001_00_en
Copyright © 2016, 2017 ARM Limited or its affiliates. All rights reserved.