Non-Confidential | PDF version | ARM 100241_0001_00_en
The Cortex®‑A32 processor consists of:
- The Instruction Fetch Unit (IFU).
- The Data Processing Unit (DPU).
- The Memory Management Unit (MMU).
- The L1 data-side memory system.
- The L2 memory system.
- The Generic Interrupt Controller (GIC) CPU interface.
- The governor.
The processor also integrates CoreSight components, and optionally integrates cache protection and the Cryptographic Extension.
The following figure shows a top-level functional diagram of the Cortex‑A32 processor.
The Instruction Fetch Unit (IFU) obtains instructions from the instruction cache or from external memory and predicts the outcome of branches in the instruction stream. It passes the instructions to the Data Processing Unit (DPU) for processing.
In implementations with CPU cache protection, parity bits protect the L1 Instruction cache data and tag RAMs by enabling the detection of any single-bit error. If an error is detected, the line is invalidated and fetched again.
The DPU decodes and executes instructions. It executes instructions that require data transfer to or from the memory system by interfacing to the Data Cache Unit (DCU). The DPU includes the Performance Monitor Unit (PMU), the Advanced SIMD and floating-point support, and the Cryptographic Extension.
The PMU provides six performance monitors that can be configured to gather statistics on the operation of each core and the memory system. The information can be used for debug and code profiling.
Advanced SIMD is a media and signal processing architecture that adds instructions primarily for audio, video, 3-D graphics, image, and speech processing. The floating-point architecture provides support for single-precision and double-precision floating-point operations.
The Memory Management Unit (MMU) provides fine-grained memory system control through a set of virtual-to-physical address mappings and memory attributes that are held in translation tables. These are loaded into the Translation Lookaside Buffer (TLB) when a location is accessed. The TLB entries include global and application-specific identifiers to prevent context-switch TLB flushes. They also include Virtual Machine Identifiers (VMIDs) to prevent TLB flushes when the hypervisor switches between virtual machines.
A unified main TLB handles misses from the instruction and data micro TLBs.
In implementations with CPU cache protection, parity bits protect the TLB RAMs by enabling the detection of any single-bit error. If an error is detected, the entry is flushed and fetched again.
The L1 data-side memory system includes the Data Cache Unit (DCU), the Store Buffer (STB), and the Bus Interface Unit (BIU).
The DCU manages all load and store operations.
In implementations with CPU cache protection, parity bits protect the L1 Data cache tag RAMs and dirty RAMs. The L1 Data cache data RAMs are protected using Error Correction Codes (ECC). The ECC scheme is Single Error Correct Double Error Detect (SECDED). The DCU includes a combined local and global exclusive monitor that is used by the Load-Exclusive/Store-Exclusive instructions.
The STB holds store operations after they have left the load/store pipeline in the DCU and have been committed by the DPU. The STB can request access to the cache RAMs in the DCU, request the Bus Interface Unit (BIU) to initiate linefills, or request the BIU to write out the data on the external write channel. External data writes go through the Snoop Control Unit (SCU).
The STB is also used to queue maintenance operations before they are broadcast to other cores in the processor.
The governor block, outside the core, includes all functions that must remain operating while a core is in retention mode.
The Generic Interrupt Controller (GIC) CPU interface is a memory-mapped interface through which a core receives an interrupt. The GIC Distributor can read and write the GIC CPU interface registers even while the core is in retention mode.
The L2 memory system contains the L2 cache pipeline and all the logic that maintains memory coherence between the cores in the cluster.
Each Cortex‑A32 cluster can include an optional L2 cache that participates in the coherency protocol. Each L2 cache is 8-way set associative, supports 64-byte cache lines, and has a configurable cache RAM size between 128KB and 1MB.
The Accelerator Coherency Port (ACP) interface cannot be configured without an L2 cache because it reuses buffering and data paths implemented for the L2 cache to achieve optimal efficiency. The main advantage of the ACP interface is its ability to allocate data in the L2 cache RAMs.