2.1.1. Components of the processor

The main components of the processor are:

Instruction fetch

The instruction fetch unit fetches instructions from the L1 instruction cache and delivers up to three instructions per cycle to the instruction decode unit. It supports dynamic and static branch prediction. The instruction fetch unit includes:

  • 32KB, 2-way set-associative L1 instruction cache with a 64-byte cache line and optional parity protection per 16 bits

  • 2-level dynamic predictor with a Branch Target Buffer (BTB) for fast target generation

  • return stack

  • static branch predictor

  • indirect predictor

  • 32-entry fully-associative L1 instruction TLB.
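The L1 instruction cache parameters listed above (32KB, 2-way, 64-byte lines) fix the number of sets and the width of the offset and index fields. The following Python sketch works through that arithmetic. The function names are hypothetical, and the tag/index/offset split is the conventional textbook one. This section does not state which address bits the hardware actually uses for indexing, so treat `split_address` as illustrative only.

```python
# Geometry taken from the text: 32KB, 2-way set-associative, 64-byte lines.
CACHE_BYTES = 32 * 1024
WAYS = 2
LINE_BYTES = 64


def cache_geometry(cache_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes every parameter is a power of two.
    """
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set_index, byte_offset).

    Assumption: conventional split with the offset in the low bits and the
    index directly above it. The real indexing scheme is not given here.
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


sets, offset_bits, index_bits = cache_geometry(CACHE_BYTES, WAYS, LINE_BYTES)
print(sets, offset_bits, index_bits)  # 256 6 8
```

Under these assumptions the cache has 256 sets, so 6 address bits select a byte within a line and 8 bits select the set.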

Instruction decode

The instruction decode unit decodes the following instructions:

  • ARM

  • Thumb

  • ThumbEE

  • Advanced SIMD

  • CP14

  • CP15.

The instruction decode unit also performs register renaming to facilitate out-of-order execution by removing Write-After-Write (WAW) and Write-After-Read (WAR) hazards. A loop buffer provides additional power savings while executing small instruction loops.
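As a rough illustration of how renaming removes WAW and WAR hazards, the following Python sketch gives every destination register a fresh physical register. It is a simplified model and not a description of the processor's implementation: the `rename` function, the four-register architectural file, and the unbounded pool of physical registers are all assumptions made for the example, and freeing registers at retirement is not modelled.

```python
def rename(instructions, num_arch_regs=4):
    """Rename (dest, src1, src2) triples of architectural register numbers.

    Each write is given a new physical register, so no two instructions
    share a destination (no WAW) and a later write cannot overwrite a value
    that an earlier instruction still has to read (no WAR). True
    read-after-write dependences are kept through the mapping table.
    """
    mapping = {r: r for r in range(num_arch_regs)}  # architectural -> physical
    next_phys = num_arch_regs                       # hypothetical unbounded pool
    renamed = []
    for dest, src1, src2 in instructions:
        p1, p2 = mapping[src1], mapping[src2]  # look up sources first
        mapping[dest] = next_phys              # then allocate the destination
        renamed.append((next_phys, p1, p2))
        next_phys += 1
    return renamed


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (2, 1, 3),  # r2 = r1 op r3  (RAW on r1; WAR on r2 against the first)
    (1, 0, 3),  # r1 = r0 op r3  (WAW on r1 against the first; WAR against the second)
]

for entry in rename(program):
    print(entry)
# (4, 2, 3)
# (5, 4, 3)
# (6, 0, 3)
```

After renaming, the three instructions write physical registers 4, 5, and 6. Only the true dependence remains: the second instruction reads register 4, which the first one produces.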

Instruction dispatch

The instruction dispatch unit controls when the decoded instructions can be dispatched to the execution pipelines and when the returned results can be retired. It includes:

  • the ARM core general purpose registers

  • the Advanced SIMD and VFP extension register set

  • the CP14 and CP15 registers

  • the APSR and FPSCR flag bits.

Integer execute

The integer execute unit includes:

  • two symmetric Arithmetic Logical Unit (ALU) pipelines

  • integer multiply-accumulate pipeline

  • iterative integer divide hardware

  • branch and instruction condition code resolution logic

  • result forwarding and comparator logic.

Load/Store unit

The load/store unit executes load and store instructions and encompasses the L1 data side memory system. It also services memory coherency requests from the L2 memory system. The load/store unit includes:

  • 32KB, 2-way set-associative L1 data cache with a 64-byte cache line and optional ECC protection per 32 bits

  • two separate 32-entry fully-associative L1 TLBs, one for data loads and one for data stores.

See Chapter 5 Memory Management Unit and Chapter 6 Level 1 Memory System for more information.

L2 memory system

The L2 memory system services L1 instruction and data cache misses from each processor. It handles requests on the AMBA 4 ACE master interface and AXI3 ACP slave interface. The L2 memory system includes:

  • L2 cache that is:

    • configurable in size to 512KB, 1MB, 2MB, or 4MB

    • 16-way set-associative, with optional ECC protection per 64 bits.

  • duplicate copy of L1 data cache tag RAMs from each processor for handling snoop requests

  • 4-way set-associative, 512-entry L2 TLB in each processor

  • automatic hardware prefetcher with programmable instruction fetch and load/store data prefetch distances.

See Chapter 7 Level 2 Memory System for more information.
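To show how the configurable L2 cache size affects its organization, the following Python sketch computes the number of sets for each size at 16 ways, and the number of sets in the 4-way, 512-entry L2 TLB. The 64-byte L2 line size is an assumption carried over from the L1 caches, because this section does not state it for the L2 cache. The helper `num_sets` is hypothetical.

```python
KB = 1024
MB = 1024 * KB

L2_WAYS = 16
L2_SIZES = [512 * KB, 1 * MB, 2 * MB, 4 * MB]
ASSUMED_LINE_BYTES = 64  # assumption: not stated for the L2 cache in this section


def num_sets(capacity, ways, unit=1):
    """Sets in a set-associative structure.

    capacity is in bytes for a cache (unit = line size in bytes) or in
    entries for a TLB (unit = 1).
    """
    return capacity // (ways * unit)


l2_sets = {size: num_sets(size, L2_WAYS, ASSUMED_LINE_BYTES) for size in L2_SIZES}
l2_tlb_sets = num_sets(512, 4)  # 512 entries, 4-way, both from the text

for size, sets in l2_sets.items():
    print(f"{size // KB}KB L2 cache: {sets} sets")
print(f"L2 TLB: {l2_tlb_sets} sets")
```

With these inputs, each doubling of the L2 size doubles the number of sets and so adds one index bit, while the associativity stays at 16 ways.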

NEON and VFP unit

The NEON and VFP unit provides support for the ARMv7 Advanced SIMDv2 and VFPv4 instruction sets. See Chapter 14 NEON and VFP Unit for more information.

Generic Interrupt Controller

The GIC provides support for handling multiple interrupt sources. See Chapter 8 Generic Interrupt Controller for more information.

Generic Timer

The Generic Timer provides the ability to schedule events and trigger interrupts. See Chapter 9 Generic Timer for more information.

Debug and trace

The debug and trace unit includes:

  • support for ARMv7.1 Debug architecture with an APB slave interface for access to the debug registers

  • Performance Monitor Unit based on PMUv2 architecture

  • Program Trace Macrocell based on the CoreSight PFTv1.1 architecture and dedicated ATB interface per processor

  • cross trigger interfaces for multi-processor debugging.

See the following for more information:

Copyright © 2011 ARM. All rights reserved. ARM DDI 0438D