3.4.18 ARMCortexA76CT

ARMCortexA76CT CPU component. This model is written in C++ and models version r0p0 of the RTL.

ARMCortexA76CT - about

This component has a variable number of cores per cluster, specified using the NUM_CORES parameter. For components with a fixed number of cores per cluster, the number of cores (1-4) is given by the component name, for example ARMCortexA76x4CT. The per-core parameters are preceded by cpun, where n identifies the core (0-3).
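
As an illustration of the cpun naming scheme described above, the following Python sketch builds per-core parameter names for a given NUM_CORES value. The helper name per_core_param is hypothetical and is not part of the model. The parameter names it is given (for example semihosting-enable) come from the per-core parameter table in this section.

```python
def per_core_param(core: int, name: str, num_cores: int = 4) -> str:
    """Return a per-core parameter name such as 'cpu0.semihosting-enable'.

    Hypothetical helper: it only applies the 'cpun.' prefix rule and checks
    that the core index is valid for the chosen NUM_CORES (1-4).
    """
    if not 1 <= num_cores <= 4:
        raise ValueError("NUM_CORES must be in the range 1-4")
    if not 0 <= core < num_cores:
        raise ValueError(f"core index {core} is outside 0-{num_cores - 1}")
    return f"cpu{core}.{name}"


# Build the semihosting-enable parameter name for each core of a 2-core cluster.
names = [per_core_param(n, "semihosting-enable", num_cores=2) for n in range(2)]
print(names)
```

How the resulting names are passed to a simulation depends on the platform and on the instance name given to the cluster, which this sketch does not assume.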

The model supports the following features:

  • DynamIQ™ system registers.
  • Per-core L2 cache.
  • A P-Channel for the cluster and for each core.
  • BROADCASTPERSIST pin.
  • Optional peripheral port.
  • L3 cache partition.
  • Per-core clock.

The model does not support the following features:

  • BROADCASTCACHEMAINTPOU pin.
  • COREINSTRRET, COREINSTRRUN, and nPMBIRQ signals.
  • DynamIQ features that are irrelevant to the programmer's view simulation. For example:
    • Automatic CPU retention mode.
    • Level-3 Cache RAM retention.
  • 256-bit wide output transactions.
  • Error correction and detection.
  • Self-test (MBIST).
  • Latency configuration.
  • Snoop filtering.
  • Cache stashing capability.

Table 3-116 Ports

Name Protocol Type Description
AENDMP 2.7.5 Value_64 protocol Slave Port to obtain end address of valid peripheral address range (exclusive).
ASTARTMP 2.7.5 Value_64 protocol Slave Port to obtain start address of valid peripheral address range (inclusive).
CNTHPIRQ[4] 2.7.2 Signal protocol Master Timer signals to SOC.
CNTHVIRQ[4] 2.7.2 Signal protocol Master Timer signals to SOC.
CNTPNSIRQ[4] 2.7.2 Signal protocol Master Timer signals to SOC.
CNTPSIRQ[4] 2.7.2 Signal protocol Master Timer signals to SOC.
CNTVIRQ[4] 2.7.2 Signal protocol Master Timer signals to SOC.
acp_s PVBus Slave AXI ACP slave port.
broadcastatomic 2.7.2 Signal protocol Slave CHI defined pins.
broadcastcachemaint 2.7.2 Signal protocol Slave ACE defined pins.
broadcastouter 2.7.2 Signal protocol Slave ACE defined pins.
broadcastpersist 2.7.2 Signal protocol Slave CHI defined pins.
cfgend[4] 2.7.2 Signal protocol Slave This signal is for EE bit initialisation.
cfgsdisable 2.7.2 Signal protocol Slave This signal disables write access to some secure Interrupt Controller registers.
cfgte[4] 2.7.2 Signal protocol Slave This signal provides default exception handling state.
clk_in ClockSignal Slave The clock signal connected to the clk_in port is used to determine the rate at which the cluster-level components run, for example cluster-level timers, caches, and PMU.
clrexmonack 2.7.2 Signal protocol Master Acknowledge handshake signal for the clrexmonreq signal.
clrexmonreq 2.7.2 Signal protocol Slave Signals the clearing of an external global exclusive monitor.
clusterid 2.7.4 Value protocol Slave The port reads the value in CPU ID register field, bits[11:8] of the MPIDR.
clusterpmuirq 2.7.2 Signal protocol Master DynamIQ PMU IRQ.
cntvalueb 2.6.1 CounterInterface protocol Slave Interface to SoC level counter module.
commirq[4] 2.7.2 Signal protocol Master Interrupt signal from debug communications channel.
core_clk_in[4] ClockSignal Slave The clock signal connected to the core_clk_in port is used to determine the rate at which each core executes instructions.
cp15sdisable[4] 2.7.2 Signal protocol Slave This signal disables write access to some system control processor registers.
cpuporeset[4] 2.7.2 Signal protocol Slave Power on reset. Initializes all the processor logic, including debug logic.
cryptodisable[4] 2.7.2 Signal protocol Slave Disable cryptography extensions after reset.
cti[4] 2.6.4 v8EmbeddedCrossTrigger_controlprotocol protocol Master Cross trigger matrix port.
ctidbgirq[4] 2.7.2 Signal protocol Master Cross Trigger Interface (CTI) interrupt trigger output.
dbgen[4] 2.7.2 Signal protocol Slave External debug interface.
dbgnopwrdwn[4] 2.7.2 Signal protocol Master These signals relate to core power down.
dbgpwrupreq[4] 2.7.2 Signal protocol Master Debug power up request.
dev_debug_s PVBus Slave External debug interface.
event 2.7.2 Signal protocol Peer This peer port of event input (and output) is for wakeup from WFE.
fiq[4] 2.7.2 Signal protocol Slave This signal drives the CPU's fast-interrupt handling.
gicv3_redistributor_s[4] 2.6.2 GICv3Comms protocol Slave GICv3 AXI-stream port.
irq[4] 2.7.2 Signal protocol Slave This signal drives the CPU's interrupt handling.
memorymapped_debug_s PVBus Slave External debug interface.
niden[4] 2.7.2 Signal protocol Slave External debug interface.
pchannel_cluster 2.5.1 PChannel protocol Slave PChannel for cluster.
pchannel_core[4] 2.5.1 PChannel protocol Slave PChannels for cores
pmuirq[4] 2.7.2 Signal protocol Master Interrupt signal from performance monitoring unit.
presetdbg 2.7.2 Signal protocol Slave Initialize the shared debug APB, Cross Trigger Interface (CTI), and Cross Trigger Matrix (CTM) logic.
pvbus_m0 PVBus Master The core will generate bus requests on this port.
pvbus_periph_m PVBus Master The core can generate peripheral bus request on this port.
rei[4] 2.7.2 Signal protocol Slave Per core RAM Error Interrupt.
reset[4] 2.7.2 Signal protocol Slave Raising this signal will put the core into reset mode.
rvbaraddr[4] 2.7.5 Value_64 protocol Slave Reset vector base address.
sci_m SystemCoherencyInterface Master System coherency interface port, which is used to take the whole cluster into or out of the coherency domain.
sei[4] 2.7.2 Signal protocol Slave Per core System Error physical pins.
spiden[4] 2.7.2 Signal protocol Slave External debug interface.
spniden[4] 2.7.2 Signal protocol Slave External debug interface.
sporeset 2.7.2 Signal protocol Slave A single cluster-wide power on reset signal for all resettable registers in DynamIQ.
ticks[4] 2.6.3 InstructionCount protocol Master This port should be connected to one of the two ticks ports on a 'visualisation' component, in order to display a running instruction count.
vcpumntirq[4] 2.7.2 Signal protocol Master Interrupt signal for virtual CPU maintenance IRQ.
vfiq[4] 2.7.2 Signal protocol Slave Virtualised FIQ.
vinithi[4] 2.7.2 Signal protocol Slave This signal controls the location of the exception vectors at reset.
virq[4] 2.7.2 Signal protocol Slave Virtualised IRQ.
virtio_s PVBus Slave The virtio coherent port, hooks directly into the L2 system and becomes coherent (assuming attributes are set correctly).
vsei[4] 2.7.2 Signal protocol Slave Per core virtual System Error physical pins.
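
The ASTARTMP and AENDMP descriptions above define a half-open address range: the start address is inclusive and the end address is exclusive. The Python sketch below shows that convention only. The function name and the example addresses are hypothetical, and the sketch makes no claim about how the model itself routes transactions.

```python
def in_peripheral_range(addr: int, astartmp: int, aendmp: int) -> bool:
    """Return True if addr lies in [astartmp, aendmp).

    astartmp is the inclusive start address and aendmp is the exclusive
    end address, matching the ASTARTMP and AENDMP port descriptions.
    """
    return astartmp <= addr < aendmp


# Hypothetical range values chosen only for illustration.
START = 0x2000_0000
END = 0x3000_0000

print(in_peripheral_range(START, START, END))    # start address is inside the range
print(in_peripheral_range(END - 1, START, END))  # last byte before END is inside
print(in_peripheral_range(END, START, END))      # END itself is outside the range
```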

Table 3-117 Parameters for Cluster_ARM_Cortex-A76

Name Type Default value Description
BROADCASTATOMIC bool 0x1 Enable broadcasting of atomic operations. The broadcastatomic signal will override this value if used.
BROADCASTCACHEMAINT bool 0x1 Enable broadcasting of cache maintenance operations to downstream caches. The broadcastcachemaint signal will override this value if used.
BROADCASTOUTER bool 0x1 Enable broadcasting of Outer Shareable transactions. The broadcastouter signal will override this value if used.
BROADCASTPERSIST bool 0x1 Enable broadcasting of cache clean to the point of persistence operations. The broadcastpersist signal will override this value if used.
CLUSTER_ID int 0x0 Processor cluster ID value
GICDISABLE bool 0x1 Disable the new style GICv3 CPU interface in each core model. Leave this parameter set to true unless the platform contains a GICv3 distributor.
NUM_CORES int 0x1 Number of cores per cluster
cpi_div int 0x1 Divider for calculating CPI (Cycles Per Instruction)
cpi_mul int 0x1 Multiplier for calculating CPI (Cycles Per Instruction)
dcache-hit_latency int 0x0 L1 D-Cache timing annotation latency for hit. Intended to model the tag-lookup time. This is only used when dcache-state_modelled=true.
dcache-maintenance_latency int 0x0 L1 D-Cache timing annotation latency for cache maintenance operations given in total ticks. This is only used when dcache-state_modelled=true.
dcache-miss_latency int 0x0 L1 D-Cache timing annotation latency for miss. Intended to model the time for failed tag-lookup and allocation of intermediate buffers. This is only used when dcache-state_modelled=true.
dcache-prefetch_enabled bool 0x0 Enable simulation of data cache prefetching. This is only used when dcache-state_modelled=true.
dcache-read_access_latency int 0x0 L1 D-Cache timing annotation latency for read accesses given in ticks per access (of size dcache-read_bus_width_in_bytes). If this parameter is non-zero, per-access latencies will be used instead of per-byte even if dcache-read_latency is set. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when dcache-state_modelled=true.
dcache-read_latency int 0x0 L1 D-Cache timing annotation latency for read accesses given in ticks per byte accessed. dcache-read_access_latency must be set to 0 for per-byte latencies to be applied. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when dcache-state_modelled=true.
dcache-snoop_data_transfer_latency int 0x0 L1 D-Cache timing annotation latency for received snoop accesses that perform a data transfer given in ticks per byte accessed. This is only used when dcache-state_modelled=true.
dcache-state_modelled bool 0x0 Set whether D-cache has stateful implementation
dcache-write_access_latency int 0x0 L1 D-Cache timing annotation latency for write accesses given in ticks per access (of size dcache-write_bus_width_in_bytes). If this parameter is non-zero, per-access latencies will be used instead of per-byte even if dcache-write_latency is set. This is only used when dcache-state_modelled=true.
dcache-write_latency int 0x0 L1 D-Cache timing annotation latency for write accesses given in ticks per byte accessed. dcache-write_access_latency must be set to 0 for per-byte latencies to be applied. This is only used when dcache-state_modelled=true.
default_opmode int 0x4 Operating mode of DynamIQ coming out of reset. 0: SFONLY ON, 1: 1/4 CACHE ON, 2: 1/2 CACHE ON, 3: 3/4 CACHE ON, 4: FULL CACHE ON
diagnostics bool 0x0 Enable DynamIQ diagnostic messages
enable_simulation_performance_optimizations bool 0x1 With this option enabled, the model will run more quickly, but be less accurate to exact CPU behavior. The model will still be functionally accurate for software, but may increase differences seen between hardware behavior and model behavior for certain workloads (it changes the micro-architectural value of stage12_tlb_size parameter to 1024).
ext_abort_device_read_is_sync bool 0x0 Synchronous reporting of device-nGnRE read external aborts
ext_abort_device_write_is_sync bool 0x0 Synchronous reporting of device-nGnRE write external aborts
ext_abort_so_read_is_sync bool 0x0 Synchronous reporting of device-nGnRnE read external aborts
ext_abort_so_write_is_sync bool 0x0 Synchronous reporting of device-nGnRnE write external aborts
gicv3.cpuintf-mmap-access-level int 0x0 Allowed values are: 0-mmap access is supported for GICC,GICH,GICV registers. 1-mmap access is supported only for GICV registers. 2-mmap access is not supported.
has_peripheral_port bool 0x0 If true, additional AXI peripheral port is configured.
has_statistical_profiling bool 0x1 Whether Statistical Based Profiling is implemented
icache-hit_latency int 0x0 L1 I-Cache timing annotation latency for hit. Intended to model the tag-lookup time. This is only used when icache-state_modelled=true.
icache-maintenance_latency int 0x0 L1 I-Cache timing annotation latency for cache maintenance operations given in total ticks. This is only used when icache-state_modelled=true.
icache-miss_latency int 0x0 L1 I-Cache timing annotation latency for miss. Intended to model the time for failed tag-lookup and allocation of intermediate buffers. This is only used when icache-state_modelled=true.
icache-prefetch_enabled bool 0x0 Enable simulation of instruction cache prefetching. This is only used when icache-state_modelled=true.
icache-read_access_latency int 0x0 L1 I-Cache timing annotation latency for read accesses given in ticks per access (of size icache-read_bus_width_in_bytes). If this parameter is non-zero, per-access latencies will be used instead of per-byte even if icache-read_latency is set. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when icache-state_modelled=true.
icache-read_latency int 0x0 L1 I-Cache timing annotation latency for read accesses given in ticks per byte accessed. icache-read_access_latency must be set to 0 for per-byte latencies to be applied. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when icache-state_modelled=true.
icache-state_modelled bool 0x0 Set whether I-cache has stateful implementation
l3cache-hit_latency int 0x0 L3 Cache timing annotation latency for hit. Intended to model the tag-lookup time. This is only used when l3cache-state_modelled=true.
l3cache-maintenance_latency int 0x0 L3 Cache timing annotation latency for cache maintenance operations given in total ticks. This is only used when dcache-state_modelled=true.
l3cache-miss_latency int 0x0 L3 Cache timing annotation latency for miss. Intended to model the time for failed tag-lookup and allocation of intermediate buffers. This is only used when l3cache-state_modelled=true.
l3cache-read_access_latency int 0x0 L3 Cache timing annotation latency for read accesses given in ticks per access (of size l3cache-read_bus_width_in_bytes). If this parameter is non-zero, per-access latencies will be used instead of per-byte even if l3cache-read_latency is set. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when l3cache-state_modelled=true.
l3cache-read_latency int 0x0 L3 Cache timing annotation latency for read accesses given in ticks per byte accessed. l3cache-read_access_latency must be set to 0 for per-byte latencies to be applied. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when l3cache-state_modelled=true.
l3cache-size int 0x100000 L3 Cache size in bytes.
l3cache-snoop_data_transfer_latency int 0x0 L3 Cache timing annotation latency for received snoop accesses that perform a data transfer given in ticks per byte accessed. This is only used when dcache-state_modelled=true.
l3cache-snoop_issue_latency int 0x0 L3 Cache timing annotation latency for snoop accesses issued by this cache in total ticks. This is only used when dcache-state_modelled=true.
l3cache-write_access_latency int 0x0 L3 Cache timing annotation latency for write accesses given in ticks per access (of size l3cache-write_bus_width_in_bytes). If this parameter is non-zero, per-access latencies will be used instead of per-byte even if l3cache-write_latency is set. This is only used when l3cache-state_modelled=true.
l3cache-write_latency int 0x0 L3 Cache timing annotation latency for write accesses given in ticks per byte accessed. l3cache-write_access_latency must be set to 0 for per-byte latencies to be applied. This is only used when l3cache-state_modelled=true.
pchannel_treat_simreset_as_poreset bool 0x0 Register core as ON state to cluster with simulation reset.
periph_address_end int 0x0 End address for peripheral port address range, exclusive (corresponds to AENDMP input signal).
periph_address_start int 0x0 Start address for peripheral port address range, inclusive (corresponds to ASTARTMP input signal).
ptw_latency int 0x0 Page table walker latency for TA (Timing Annotation), expressed in simulation ticks
tlb_latency int 0x0 TLB latency for TA (Timing Annotation), expressed in simulation ticks
treat-dcache-cmos-to-pou-as-nop bool 0x0 Whether dcache invalidation to the point of unification is required for instruction to data coherence. true: invalidate operations are not required.
walk_cache_latency int 0x0 Walk cache latency for TA (Timing Annotation), expressed in simulation ticks
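
The read latency parameters in the table above combine as follows: a hit or miss latency is always applied, and the transfer cost is taken per access if the read_access_latency parameter is non-zero, otherwise per byte. The Python sketch below restates that rule under stated assumptions. The function and argument names are hypothetical, and the number of accesses is assumed to be the byte count divided by the bus width, rounded up, which the table does not state explicitly.

```python
def read_latency_ticks(
    num_bytes: int,
    hit: bool,
    hit_latency: int,
    miss_latency: int,
    read_access_latency: int,
    read_latency: int,
    bus_width_in_bytes: int,
) -> int:
    """Estimate timing annotation ticks for one cache read.

    Assumption: accesses = ceil(num_bytes / bus_width_in_bytes).
    """
    # Tag-lookup cost: hit latency on a hit, miss latency on a miss.
    lookup = hit_latency if hit else miss_latency

    if read_access_latency != 0:
        # Per-access latency takes priority, even if read_latency is set.
        accesses = -(-num_bytes // bus_width_in_bytes)  # integer ceiling
        transfer = accesses * read_access_latency
    else:
        # Per-byte latency applies only when the per-access latency is 0.
        transfer = num_bytes * read_latency

    return lookup + transfer


# 64-byte read that hits, 16-byte bus, 3 ticks per access, 2-tick hit latency.
per_access = read_latency_ticks(64, True, 2, 5, 3, 1, 16)
# Same read that misses, with the per-access latency set to 0.
per_byte = read_latency_ticks(64, False, 2, 5, 0, 1, 16)
print(per_access, per_byte)
```

With the default value of 0x0 for every latency parameter, the sketch returns 0 ticks for any read, which corresponds to no timing annotation being applied.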

Table 3-118 Parameters for ARM_Cortex-A76

Name Type Default value Description
cpu4.CFGEND bool 0x0 Endianness configuration at reset. 0, little endian. 1, big endian.
cpu4.CFGTE bool 0x0 Instruction set state when resetting into AArch32. 0, A32. 1, T32.
cpu4.CP15SDISABLE bool 0x0 Initialize to disable access to some CP15 registers
cpu4.CRYPTODISABLE bool 0x0 Disable cryptographic features.
cpu4.RVBARADDR int 0x0 Value of RVBAR_ELx register.
cpu4.VINITHI bool 0x0 Reset value of SCTLR.V.
cpu4.enable_trace_special_hlt_imm16 bool 0x0 Enable usage of parameter trace_special_hlt_imm16
cpu4.l2cache-hit_latency int 0x0 L2 Cache timing annotation latency for hit. Intended to model the tag-lookup time. This is only used when l2cache-state_modelled=true.
cpu4.l2cache-maintenance_latency int 0x0 L2 Cache timing annotation latency for cache maintenance operations given in total ticks. This is only used when dcache-state_modelled=true.
cpu4.l2cache-miss_latency int 0x0 L2 Cache timing annotation latency for miss. Intended to model the time for failed tag-lookup and allocation of intermediate buffers. This is only used when l2cache-state_modelled=true.
cpu4.l2cache-read_access_latency int 0x0 L2 Cache timing annotation latency for read accesses given in ticks per access. If this parameter is non-zero, per-access latencies will be used instead of per-byte even if l2cache-read_latency is set. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when l2cache-state_modelled=true.
cpu4.l2cache-read_latency int 0x0 L2 Cache timing annotation latency for read accesses given in ticks per byte accessed. l2cache-read_access_latency must be set to 0 for per-byte latencies to be applied. This is in addition to the hit or miss latency, and intended to correspond to the time taken to transfer across the cache upstream bus. This is only used when l2cache-state_modelled=true.
cpu4.l2cache-size int 0x80000 L2 Cache size in bytes.
cpu4.l2cache-snoop_data_transfer_latency int 0x0 L2 Cache timing annotation latency for received snoop accesses that perform a data transfer given in ticks per byte accessed. This is only used when dcache-state_modelled=true.
cpu4.l2cache-snoop_issue_latency int 0x0 L2 Cache timing annotation latency for snoop accesses issued by this cache in total ticks. This is only used when dcache-state_modelled=true.
cpu4.l2cache-write_access_latency int 0x0 L2 Cache timing annotation latency for write accesses given in ticks per access. If this parameter is non-zero, per-access latencies will be used instead of per-byte even if l2cache-write_latency is set. This is only used when l2cache-state_modelled=true.
cpu4.l2cache-write_latency int 0x0 L2 Cache timing annotation latency for write accesses given in ticks per byte accessed. l2cache-write_access_latency must be set to 0 for per-byte latencies to be applied. This is only used when l2cache-state_modelled=true.
cpu4.max_code_cache_mb int 0x100 Maximum size of the simulation code cache (MiB). For platforms with more than 2 cores this limit will be scaled down (for example, to 1/8 for 16 or more cores).
cpu4.min_sync_level int 0x0 Force minimum syncLevel (0=off=default,1=syncState,2=postInsnIO,3=postInsnAll)
cpu4.semihosting-A32_HLT int 0xf000 A32 HLT number for semihosting calls.
cpu4.semihosting-A64_HLT int 0xf000 A64 HLT number for semihosting calls.
cpu4.semihosting-ARM_SVC int 0x123456 A32 SVC number for semihosting calls.
cpu4.semihosting-T32_HLT int 0x3c T32 HLT number for semihosting calls.
cpu4.semihosting-Thumb_SVC int 0xab T32 SVC number for semihosting calls.
cpu4.semihosting-cmd_line string "" Command line available to semihosting calls.
cpu4.semihosting-cwd string "" Base directory for semihosting file access.
cpu4.semihosting-enable bool 0x1 Enable semihosting SVC/HLT traps.
cpu4.semihosting-heap_base int 0x0 Virtual address of heap base.
cpu4.semihosting-heap_limit int 0xf000000 Virtual address of top of heap.
cpu4.semihosting-stack_base int 0x10000000 Virtual address of base of descending stack.
cpu4.semihosting-stack_limit int 0xf000000 Virtual address of stack limit.
cpu4.trace_special_hlt_imm16 int 0xf000 For this HLT number, if enable_trace_special_hlt_imm16=true, skip the usual HLT execution and instead call MTI trace if registered.
cpu4.vfp-enable_at_reset bool 0x0 Enable VFP in CPACR, CPPWR, NSACR at reset. Warning: Arm recommends going through the implementation's suggested VFP power-up sequence!
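
The semihosting heap and stack parameters above describe two address regions: a heap from heap_base up to heap_limit, and a descending stack from stack_base down to stack_limit. The Python sketch below is a sanity check of such a layout before it is applied to a platform. The function name is hypothetical, and the rule that the heap must not extend past the stack limit is an assumption made for illustration, not a documented model requirement.

```python
# Default values copied from the parameter table above.
DEFAULTS = {
    "semihosting-heap_base": 0x0,
    "semihosting-heap_limit": 0xF000000,
    "semihosting-stack_base": 0x10000000,
    "semihosting-stack_limit": 0xF000000,
}


def check_semihosting_layout(params: dict) -> list:
    """Return a list of problems found in a semihosting memory layout."""
    problems = []
    if params["semihosting-heap_base"] > params["semihosting-heap_limit"]:
        problems.append("heap_base is above heap_limit")
    # The stack descends, so its limit must not be above its base.
    if params["semihosting-stack_limit"] > params["semihosting-stack_base"]:
        problems.append("stack_limit is above stack_base")
    # Assumption: the heap should end at or below the lowest stack address.
    if params["semihosting-heap_limit"] > params["semihosting-stack_limit"]:
        problems.append("heap extends past stack_limit")
    return problems


print(check_semihosting_layout(DEFAULTS))
```

With the default values, the heap ends exactly where the stack region begins, so the check reports no problems.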
Non-Confidential 100964_1142_00_en
Copyright © 2014–2018 Arm Limited or its affiliates. All rights reserved.