2.4.1 Dynamic power management

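This section describes the following dynamic power management features in the processor:
  • Core Wait for Interrupt.
  • Core Wait for Event.
  • L2 Wait for Interrupt.
  • L2 hardware cache flush.
  • Processor dynamic retention.
  • L2 RAMs dynamic retention.
  • Advanced SIMD and FP clock gating.
  • L2 control and tag banks clock gating.
  • Regional clock gating.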

Core Wait for Interrupt

Wait for Interrupt (WFI) is a feature of the ARMv8-A architecture that puts the core in a low-power state by disabling the clocks in the core while keeping the core powered up. This reduces the power drawn to the static leakage current when the core is in WFI low-power state.

A core enters WFI low-power state by executing the WFI instruction.
When executing the WFI instruction, the core waits for all instructions in the core to retire before entering the idle or low-power state. The WFI instruction ensures that all explicit memory accesses that occurred before the WFI instruction in program order have retired. For example, the WFI instruction ensures that the following instructions receive the required data or responses from the L2 memory system:
  • Load instructions.
  • Cache and TLB maintenance operations.
  • Store-Exclusive instructions.
In addition, the WFI instruction ensures that store instructions update the cache or are issued to the L2 memory system.
While the core is in WFI low-power state, the clocks in the core are temporarily enabled without causing the core to exit WFI low-power state, when any of the following events are detected:
  • An L2 snoop request that must be serviced by the core L1 data cache.
  • A cache, TLB, or BTB maintenance operation that must be serviced by the core L1 instruction cache, data cache, instruction TLB, data TLB, or BTB.
  • An APB access to the debug or trace registers residing in the core power domain.
The core exits from WFI low-power state when it detects a reset or when a WFI wake-up event occurs. See the ARM® Architecture Reference Manual ARMv8, for ARMv8-A architecture profile for information about the various WFI wake-up events.
On entry into WFI low-power state, STANDBYWFI for that core is asserted. STANDBYWFI remains asserted even if the clocks in the core are temporarily enabled because of an L2 snoop request, a cache, TLB, or BTB maintenance operation, or an APB access.
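The WFI behavior described above can be sketched as a small behavioral model. This is an illustration only, not RTL and not cycle accurate. The class and method names (`CoreWfiModel`, `begin_service`, `end_service`, `wake`) are hypothetical and do not appear in the processor documentation.

```python
from dataclasses import dataclass


@dataclass
class CoreWfiModel:
    """Toy model of one core's WFI behavior (not cycle accurate)."""

    in_wfi: bool = False          # drives the STANDBYWFI output
    clocks_enabled: bool = True

    @property
    def standbywfi(self) -> bool:
        return self.in_wfi

    def execute_wfi(self) -> None:
        # Earlier instructions are assumed to have retired already.
        self.in_wfi = True
        self.clocks_enabled = False

    def begin_service(self) -> None:
        # L2 snoop, maintenance operation, or APB debug access:
        # the clocks run temporarily, but the core stays in WFI.
        if self.in_wfi:
            self.clocks_enabled = True

    def end_service(self) -> None:
        if self.in_wfi:
            self.clocks_enabled = False

    def wake(self) -> None:
        # Reset or an architecturally defined WFI wake-up event.
        self.in_wfi = False
        self.clocks_enabled = True


core = CoreWfiModel()
core.execute_wfi()
core.begin_service()
print(core.standbywfi, core.clocks_enabled)   # STANDBYWFI stays asserted
```

The model keeps `in_wfi` and `clocks_enabled` as separate state because the text distinguishes a temporary clock enable from an exit from WFI low-power state.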

Core Wait for Event

Wait for Event (WFE) is a feature of the ARMv8-A architecture that uses a locking mechanism based on events to put the core in a low-power state by disabling the clocks in the core while keeping the core powered up. This reduces the power drawn to the static leakage current when the core is in WFE low-power state.

A core enters WFE low-power state by executing the WFE instruction. When executing the WFE instruction, the core waits for all instructions in the core to complete before entering the idle or low-power state. The WFE instruction ensures that all explicit memory accesses that occurred before the WFE instruction in program order have completed.
While the core is in WFE low-power state, the clocks in the core are temporarily enabled without causing the core to exit WFE low-power state, when any of the following events are detected:
  • An L2 snoop request that must be serviced by the core L1 data cache.
  • A cache, TLB, or BTB maintenance operation that must be serviced by the core L1 instruction cache, data cache, instruction TLB, data TLB, or BTB.
  • An APB access to the debug or trace registers residing in the core power domain.
The core exits from WFE low-power state when:
  • It detects a reset.
  • The EVENTI input signal asserts.
  • The CLREXMONREQ input signal asserts.
  • A WFE wake-up event occurs. See the ARM® Architecture Reference Manual ARMv8, for ARMv8-A architecture profile for information about the various WFE wake-up events.
On entry into WFE low-power state, STANDBYWFE for that core is asserted. STANDBYWFE remains asserted even if the clocks in the core are temporarily enabled because of an L2 snoop request, a cache, TLB, or BTB maintenance operation, or an APB access.
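As a rough illustration of the two lists above, the following sketch separates stimuli that wake the core from stimuli that only need the clocks temporarily. The enumeration and function names are hypothetical, and the assumption that STANDBYWFE deasserts on every exit from WFE low-power state is inferred from the text rather than stated in it.

```python
from enum import Enum, auto
from typing import NamedTuple


class Stimulus(Enum):
    RESET = auto()
    EVENTI = auto()
    CLREXMONREQ = auto()
    ARCH_WAKE_EVENT = auto()      # any architecturally defined WFE wake-up event
    L2_SNOOP = auto()
    MAINTENANCE_OP = auto()       # cache, TLB, or BTB maintenance operation
    APB_DEBUG_ACCESS = auto()


class WfeReaction(NamedTuple):
    exits_wfe: bool
    standbywfe: bool              # level of STANDBYWFE after the stimulus
    clocks_run: bool


WAKE = {Stimulus.RESET, Stimulus.EVENTI,
        Stimulus.CLREXMONREQ, Stimulus.ARCH_WAKE_EVENT}


def react_in_wfe(stimulus: Stimulus) -> WfeReaction:
    """Classify how a core that is in WFE low-power state reacts."""
    if stimulus in WAKE:
        return WfeReaction(exits_wfe=True, standbywfe=False, clocks_run=True)
    # The remaining stimuli only need the clocks temporarily.
    return WfeReaction(exits_wfe=False, standbywfe=True, clocks_run=True)


for s in Stimulus:
    print(s.name, react_in_wfe(s))
```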

Event communication using WFE and SEV instructions

The EVENTI signal enables an external agent to participate in the WFE and SEV event communication. When this signal is asserted, it sends an event message to all the cores in the processor. This is similar to executing an SEV instruction on one core in the processor. This enables the external agent to signal to the core that it has released a semaphore and that the core can leave the WFE low-power state. The EVENTI input signal must remain HIGH for at least one CLK cycle to be visible to the cores.

The external agent can determine that at least one of the cores in the processor has executed an SEV instruction by checking the EVENTO signal. When any of the cores in the processor executes an SEV instruction, an event is signaled to all the cores in the processor, and the EVENTO signal is asserted. This signal is asserted HIGH for three CLK cycles when any of the cores executes an SEV instruction.
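The timing statements above can be illustrated with a short waveform sketch. The three-cycle EVENTO pulse and the one-cycle EVENTI minimum come from the text. The function names are hypothetical, and the alignment of the pulse to the cycle in which SEV executes is an assumption made for the example.

```python
EVENTO_PULSE_CYCLES = 3   # from the text: HIGH for three CLK cycles


def evento_waveform(sev_cycles, total_cycles):
    """Return the EVENTO level (0 or 1) for each CLK cycle.

    sev_cycles lists the cycles in which some core executes SEV.
    Assumption: the pulse starts in the same cycle as the SEV.
    Overlapping pulses are simply merged in this sketch.
    """
    wave = [0] * total_cycles
    for start in sev_cycles:
        for cycle in range(start, min(start + EVENTO_PULSE_CYCLES, total_cycles)):
            wave[cycle] = 1
    return wave


def eventi_is_visible(eventi_samples):
    """EVENTI must be HIGH for at least one sampled CLK cycle to be seen."""
    return any(eventi_samples)


print(evento_waveform([2], 8))   # [0, 0, 1, 1, 1, 0, 0, 0]
```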

CLREXMON request and acknowledge signaling

The CLREXMONREQ signal has a corresponding CLREXMONACK response signal. This forms a standard 2-wire, 4-phase handshake that can be used to signal across the voltage and frequency boundary between the core and system.

When the CLREXMONREQ input is asserted, it signals the clearing of an external global exclusive monitor and acts as a WFE wake-up event to all the cores in the processor.
The following figure shows the CLREXMON request and acknowledge handshake. When the request signal is asserted, it remains asserted until an acknowledge is received. When the request is deasserted, the acknowledge can then deassert.

Note

If a global exclusive monitor does not exist in your system, tie the CLREXMONREQ input LOW.
Figure 2-11 CLREXMON request and acknowledge handshake
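The 2-wire, 4-phase handshake can be sketched as a table of legal transitions between (request, acknowledge) levels. This is a generic model of a 4-phase handshake under the ordering described above, not a description of the processor's internal implementation. The names `NEXT_PHASE`, `is_legal_step`, and `trace_is_legal` are hypothetical.

```python
# Legal next state for each (REQ, ACK) pair in a 4-phase handshake.
NEXT_PHASE = {
    (0, 0): (1, 0),   # requester asserts REQ
    (1, 0): (1, 1),   # responder asserts ACK; REQ is held until then
    (1, 1): (0, 1),   # requester deasserts REQ after seeing ACK
    (0, 1): (0, 0),   # responder deasserts ACK; back to idle
}


def is_legal_step(prev, nxt):
    """A step is legal if nothing changes or the next phase is entered."""
    return nxt == prev or NEXT_PHASE[prev] == nxt


def trace_is_legal(trace):
    """Check every consecutive pair of samples in a trace."""
    return all(is_legal_step(a, b) for a, b in zip(trace, trace[1:]))


good = [(0, 0), (1, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
bad = [(0, 0), (1, 0), (0, 0)]            # REQ dropped before ACK
print(trace_is_legal(good), trace_is_legal(bad))   # True False
```

Because only one signal changes per phase, the handshake is safe to use across asynchronous voltage and frequency boundaries, which is why the text describes it as suitable for signaling between the core and the system.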

L2 Wait for Interrupt

When all the cores are in WFI low-power state, the shared L2 memory system logic that is common to all the cores can also enter a WFI low-power state.

Entry into L2 WFI low-power state can only occur if specific requirements are met and the following sequence is applied:
  1. All cores are in the WFI low-power state, so all the core STANDBYWFI outputs are asserted.
  2. When ACP is present and all outstanding ACP requests are complete, the SoC asserts the AINACTS input to idle the ACP slave interface. When AINACTS has been asserted, the SoC must not assert ARVALIDS, AWVALIDS, or WVALIDS.
  3. If the processor implements:
    • An ACE interface: when all outstanding snoop requests are complete, the SoC asserts the ACINACTM input signal to idle the AXI master snoop interface. This prevents the L2 memory system from accepting any new requests from the AXI master snoop interface. When ACINACTM has been asserted, the SoC must not assert ACVALIDM.
    • A CHI interface: when all outstanding snoop requests are complete, the SoC asserts the SINACT input signal indicating that the processor is removed from the coherency domain and does not receive any more snoops. This triggers the L2 to deactivate the TX and RX links. When the TX and RX links are in their respective stop states, the L2 memory system does not accept any new requests from the CHI interface.
  4. When the L2 memory system completes the outstanding transactions for ACE and CHI interfaces, it can then enter the L2 WFI low-power state. On entry into L2 WFI low-power state, STANDBYWFIL2 is asserted. Assertion of STANDBYWFIL2 guarantees that the L2 is idle and does not accept any new transactions.
  5. The SoC can then choose to deassert the CLKEN input to the processor to stop all remaining internal clocks within the core that are derived from CLK. All clocks in the shared L2 memory system logic, GIC, and Timer, are disabled.
If CLKEN is deasserted, the SoC must assert the CLKEN input on a WFI wake-up event to enable the L2 memory system and potentially the core. There are two classes of wake-up events:
  • An event that requires only the L2 memory system to be enabled.
  • An event that requires both the L2 memory system and the core to be enabled.
The following wake-up events cause both the L2 memory system and the core to exit WFI low-power state:
  • A physical IRQ or FIQ interrupt.
  • A debug event.
  • Powerup or Warm reset.
The following wake-up events cause only the L2 memory system to exit WFI low-power state:
  • If the device is configured to have an ACE interface, deassertion of ACINACTM to service an external snoop request on the AXI master snoop interface.
  • If the device is configured to have a CHI interface:
    • Deassertion of SINACT to service an external snoop request.
    • Activation of TX or RX links.
  • If ACP is present, deassertion of AINACTS to service an ACP transaction on the slave interface.
When the core exits from WFI low-power state, STANDBYWFI for that core is deasserted. When the L2 memory system logic exits from WFI low-power state, STANDBYWFIL2 is deasserted.
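A minimal sketch of the entry preconditions and the two classes of wake-up events, seen from the SoC side, is shown below. It checks only the conditions that the SoC can observe or drive. Completion of outstanding L2 transactions is left to the L2 itself, which reports it through STANDBYWFIL2. The function names and the event strings are hypothetical labels chosen for the example.

```python
def l2_wfi_entry_allowed(standbywfi, interface, snoop_idle_asserted,
                         acp_present=False, ainacts=False):
    """Check the SoC-visible preconditions for L2 WFI entry.

    standbywfi: list of per-core STANDBYWFI levels.
    interface: "ACE" or "CHI"; snoop_idle_asserted is ACINACTM or SINACT.
    """
    if interface not in ("ACE", "CHI"):
        raise ValueError("interface must be 'ACE' or 'CHI'")
    if not all(standbywfi):
        return False                  # step 1: every core must be in WFI
    if acp_present and not ainacts:
        return False                  # step 2: the ACP must be idled
    return bool(snoop_idle_asserted)  # step 3: the snoop interface is idled


WAKES_L2_AND_CORE = {"irq", "fiq", "debug_event",
                     "powerup_reset", "warm_reset"}
WAKES_L2_ONLY = {"acinactm_deasserted", "sinact_deasserted",
                 "chi_link_activation", "ainacts_deasserted"}


def blocks_to_wake(event):
    """Return which blocks leave WFI low-power state for a wake-up event."""
    if event in WAKES_L2_AND_CORE:
        return {"l2", "core"}
    if event in WAKES_L2_ONLY:
        return {"l2"}
    raise ValueError(f"unknown wake-up event: {event}")


print(l2_wfi_entry_allowed([True] * 4, "ACE", True))
print(sorted(blocks_to_wake("irq")), sorted(blocks_to_wake("ainacts_deasserted")))
```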
The following figure shows the L2 WFI timing for a 4-core configuration.
Figure 2-12 L2 Wait For Interrupt timing

L2 hardware cache flush

The processor provides an efficient way to fully clean and invalidate the L2 cache in preparation for powering it down without requiring the waking of a core to perform the clean and invalidate through software.

Use of L2 hardware cache flush can only occur if specific requirements are met and the following sequence is applied:
  1. Disable L2 prefetches by writing zeros to bits[38, 36:35] of the CPU Extended Control Register.
  2. Disable the load-store hardware prefetcher by writing a one to bit[56] of the CPU Auxiliary Control Register.
  3. Execute an ISB instruction to ensure the CPU Extended Control Register and CPU Auxiliary Control Register writes are complete.
  4. Execute a DSB instruction to ensure completion of any prior prefetch requests.
  5. All cores are in the WFI low-power state, so all the core STANDBYWFI outputs are asserted.
  6. When ACP is present and all outstanding ACP transactions are complete, the SoC asserts the AINACTS signal to idle the ACP. This is necessary to prevent ACP transactions from allocating new entries in the L2 cache while the hardware cache flush is occurring. When AINACTS has been asserted, the SoC must not assert ARVALIDS, AWVALIDS, or WVALIDS.
  7. The SoC can now assert the L2FLUSHREQ input.
  8. The L2 performs a series of internal clean and invalidate operations to each set and way of the L2 cache. Any dirty cache lines are written back to the system using WriteBack or WriteNoSnoop operations. Clean cache lines can cause Evict or WriteEvict transactions if the L2 is configured to generate them.
  9. When the L2 completes the clean and invalidate sequence, it asserts the L2FLUSHDONE signal. The SoC can now deassert the L2FLUSHREQ signal and then the L2 deasserts L2FLUSHDONE.
  10. When all outstanding snoop transactions are completed, the SoC can assert the ACINACTM signal in an ACE implementation or the SINACT signal in a CHI implementation. In response, the L2 asserts the STANDBYWFIL2 signal.
It is possible to terminate the L2 hardware cache flush by deasserting the L2FLUSHREQ signal before the L2FLUSHDONE signal is asserted. This causes the L2 to abort the hardware cache flush. This feature can be used when the SoC does not power down the core and must wake up the core quickly.
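The register writes in steps 1 and 2 of the sequence above reduce to two bit masks. The sketch below only computes the values to write. The actual register accesses are system register writes performed by privileged software on the core and are not modeled here. The constant and function names are hypothetical, and the registers are assumed to be 64 bits wide.

```python
MASK64 = (1 << 64) - 1

# Bit positions taken from steps 1 and 2 of the sequence.
CPUECTLR_L2_PREFETCH_BITS = (1 << 38) | (0b11 << 35)   # bits[38, 36:35]
CPUACTLR_DISABLE_LS_PREFETCH = 1 << 56                  # bit[56]


def values_before_flush(cpuectlr, cpuactlr):
    """Return the (CPUECTLR, CPUACTLR) values to write before the flush."""
    new_cpuectlr = cpuectlr & ~CPUECTLR_L2_PREFETCH_BITS & MASK64
    new_cpuactlr = (cpuactlr | CPUACTLR_DISABLE_LS_PREFETCH) & MASK64
    return new_cpuectlr, new_cpuactlr


ectlr, actlr = values_before_flush(MASK64, 0)
print(hex(CPUECTLR_L2_PREFETCH_BITS), hex(ectlr), hex(actlr))
```

Both writes are read-modify-write operations so that unrelated fields keep their values. The ISB and DSB instructions in steps 3 and 4 still have to follow the writes on the real core.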
The following figure shows the L2 hardware cache flush timing.
Figure 2-13 L2 hardware cache flush timing
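The interaction between L2FLUSHREQ and L2FLUSHDONE, including the early-termination case, can be sketched as a three-state model. The class name, the state names, and the idea of one clean and invalidate operation per step are assumptions made for illustration and say nothing about the real duration of a flush.

```python
class L2FlushModel:
    """Toy model of the L2FLUSHREQ / L2FLUSHDONE interaction."""

    def __init__(self, entries):
        if entries < 1:
            raise ValueError("entries must be at least 1")
        self.entries = entries        # set/way entries to clean and invalidate
        self.remaining = 0
        self.state = "idle"           # "idle", "flushing", or "done"

    @property
    def l2flushdone(self):
        return self.state == "done"

    def step(self, l2flushreq):
        """Advance the model by one step with the given L2FLUSHREQ level."""
        if self.state == "idle" and l2flushreq:
            self.state, self.remaining = "flushing", self.entries
        elif self.state == "flushing":
            if not l2flushreq:
                self.state = "idle"   # request withdrawn early: flush aborted
            else:
                self.remaining -= 1   # one clean and invalidate operation
                if self.remaining == 0:
                    self.state = "done"
        elif self.state == "done" and not l2flushreq:
            self.state = "idle"       # L2FLUSHDONE deasserts after L2FLUSHREQ
        return self.l2flushdone


model = L2FlushModel(entries=2)
print([model.step(req) for req in (True, True, True, True, False)])
```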

Processor dynamic retention

When a core is in WFI low-power state or WFE low-power state, the clocks to the core are stopped. During these low-power states, the core might start the clocks for short periods of time to allow it to handle snoops or other short events, but it remains in the low-power state.

Whenever the clocks to a core are stopped, it is possible for an external power controller to place the core in a retention state to reduce leakage power consumption without state loss.
Each core in the processor has a CPU Q-channel interface that allows an external power controller to place the core into a retention state. This interface consists of four pins:
  • CPUQACTIVE.
  • CPUQREQn.
  • CPUQACCEPTn.
  • CPUQDENY.
The operational relationships between these signals are:
  • CPUQREQn can only go LOW if CPUQACCEPTn is HIGH and CPUQDENY is LOW.
  • After CPUQREQn goes LOW, it must remain LOW until either CPUQACCEPTn goes LOW or CPUQDENY goes HIGH.
  • CPUQREQn can then go HIGH, and must remain HIGH until both CPUQACCEPTn is HIGH and CPUQDENY is LOW.
  • Each CPUQREQn request is followed by the assertion of either CPUQACCEPTn or CPUQDENY, but not both. CPUQACCEPTn cannot be asserted LOW at the same time as CPUQDENY is asserted HIGH.
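The signal relationships above lend themselves to a simple trace checker. The sketch below takes a list of sampled (CPUQREQn, CPUQACCEPTn, CPUQDENY) levels and reports violations. It assumes that each change on CPUQREQn is judged against the response levels in the previous sample, which is a modeling choice rather than something the text specifies. The function name is hypothetical.

```python
def check_qchannel_trace(trace):
    """Return a list of (sample_index, message) rule violations.

    Each sample is (cpuqreqn, cpuqacceptn, cpuqdeny) with levels 0 or 1.
    CPUQREQn and CPUQACCEPTn are active LOW; CPUQDENY is active HIGH.
    """
    errors = []
    prev = None
    for i, (reqn, acceptn, deny) in enumerate(trace):
        if acceptn == 0 and deny == 1:
            errors.append((i, "accept and deny asserted together"))
        if prev is not None:
            p_reqn, p_acceptn, p_deny = prev
            if p_reqn == 1 and reqn == 0 and not (p_acceptn == 1 and p_deny == 0):
                errors.append((i, "request made before the last response cleared"))
            if p_reqn == 0 and reqn == 1 and not (p_acceptn == 0 or p_deny == 1):
                errors.append((i, "request withdrawn before accept or deny"))
        prev = (reqn, acceptn, deny)
    return errors


accepted = [(1, 1, 0), (0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)]
denied = [(1, 1, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)]
withdrawn = [(1, 1, 0), (0, 1, 0), (1, 1, 0)]
print(check_qchannel_trace(accepted))
print(check_qchannel_trace(denied))
print(check_qchannel_trace(withdrawn))
```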
A typical sequence of the external power controller successfully placing the core in retention state is:
  1. The core executes a WFI instruction. The clocks in the core are stopped and STANDBYWFI is asserted. After the programmed number of Generic Timer CNTVALUEB ticks specified by the CPUECTLR[2:0] field has elapsed, CPUQACTIVE for that core is deasserted. This hints that retention is possible for that core.
  2. The external power controller asserts CPUQREQn to indicate that it wants to put that core into retention state.
  3. While the core is still in WFI low-power state and the clocks are stopped, the core accepts the retention request by asserting CPUQACCEPTn.
  4. While CPUQREQn and CPUQACCEPTn are both asserted, the core is in quiescent state and the external power controller can safely put the core into retention state.
  5. During retention, if a snoop occurs to access the cache of the quiescent core, the CPUQACTIVE signal is asserted to request exit from retention.
  6. The external power controller brings the core out of retention and deasserts CPUQREQn.
  7. The core deasserts CPUQACCEPTn to complete the handshake.
  8. The clocks in the core are restarted temporarily to allow the snoop request to the core to proceed.
  9. After the snoop access is complete, the core deasserts CPUQACTIVE.
  10. CPUQREQn and CPUQACCEPTn are then asserted. The core has reentered quiescent state and the external power controller can put the core into retention state again.
  11. When the core is ready to exit WFI low-power state, CPUQACTIVE is asserted.
  12. CPUQREQn is then deasserted, the core exits WFI low-power state, and CPUQACCEPTn is deasserted.
The following figure shows a typical sequence where the external power controller successfully places the core in retention state.
Figure 2-14 Successful retention timing

A retention request can also be denied. The core enters WFI low-power state and deasserts CPUQACTIVE, and the external power controller asserts CPUQREQn. If the core cannot safely enter quiescent state, it asserts CPUQDENY instead of CPUQACCEPTn. When this occurs, the external power controller cannot put that core into retention state. The external power controller must then deassert CPUQREQn, after which the core deasserts CPUQDENY.
The following figure shows a sequence where the external power controller attempts to put a core in retention state but the core denies the request.
Figure 2-15 Denied retention timing

Guidelines on the use of core dynamic retention

As cores generally only stay in WFE low-power state for a short period of time, ARM recommends that you only take a core into retention when it is in WFI low-power state.

If the L1 data cache of a core that is in WFI low-power state contains data that is likely to be the target of frequent snoops from other cores, entering quiescent state and retention is likely to be inefficient.
When using the core retention feature, you must consider the following points:
  • During core reset, CPUQREQn must be deasserted HIGH while CPUQACCEPTn is asserted LOW.
  • The Processor dynamic retention control field in the CPU Extended Control Register, CPUECTLR, must be set to a nonzero value to enable this feature. If this field is 0b000, all assertions of CPUQREQn LOW receive CPUQDENY responses.
  • If the core dynamic retention feature is not used, CPUQREQn must be tied HIGH and the CPUECTLR retention control field set to disabled.

Note

If you use the core dynamic retention feature, bits[30:29] of the CPU Auxiliary Control Register, CPUACTLR, must be zero.
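The configuration rules in this guidance can be collected into a small consistency check. The sketch assumes that the retention control field is CPUECTLR[2:0], the same field that sets the tick count in the retention sequence, and that the disabled encoding is 0b000. The function names and message strings are hypothetical.

```python
RETENTION_FIELD_MASK = 0b111        # CPUECTLR[2:0], assumed retention field
CPUACTLR_BITS_30_29 = 0b11 << 29    # must be zero when retention is used


def retention_config_problems(cpuectlr, cpuactlr, use_retention):
    """Return a list of configuration problems; an empty list means none."""
    problems = []
    field = cpuectlr & RETENTION_FIELD_MASK
    if use_retention:
        if field == 0:
            problems.append("retention field is 0b000: every request is denied")
        if cpuactlr & CPUACTLR_BITS_30_29:
            problems.append("CPUACTLR[30:29] must be zero")
    elif field != 0:
        problems.append("retention field should be set to disabled")
    return problems


def response_to_request(cpuectlr, core_can_quiesce):
    """Name the signal the core asserts after CPUQREQn goes LOW."""
    if (cpuectlr & RETENTION_FIELD_MASK) == 0 or not core_can_quiesce:
        return "CPUQDENY"
    return "CPUQACCEPTn"


print(retention_config_problems(0b010, 0, use_retention=True))
print(response_to_request(0b000, core_can_quiesce=True))
```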

L2 RAMs dynamic retention

L2 RAM dynamic retention mode provides a way of saving power in an idle processor while allowing quick wake-up to service a snoop from ACE or CHI. The core supports dynamic retention of the L2 Data, Dirty, Tag, Inclusion PLRU, and Snoop Tag RAMs.

The processor has an L2 Q-channel interface that allows an external power controller to place the L2 RAMs into a retention state.
L2 RAM dynamic retention mode is entered and exited using the following sequence of events:
  1. All cores are in WFI or WFE low-power state and therefore, all the core STANDBYWFI or STANDBYWFE outputs are asserted.
  2. When all pending L2 activity is complete, and the L2 remains idle for the programmed number of Generic Timer CNTVALUEB ticks, as specified by the L2ECTLR[2:0] field, the L2 deasserts L2QACTIVE.
  3. The external power controller asserts L2QREQn to indicate that it wants to put the L2 RAMs into retention state.
  4. If the L2 is still idle, it accepts the retention request by asserting L2QACCEPTn.
  5. While L2QREQn and L2QACCEPTn are both asserted, the power controller can safely put the L2 RAMs into retention state.
  6. If the L2 detects that one or more cores have exited WFI low-power state, that the ACP has become active, or that a snoop request must be serviced, the L2 asserts L2QACTIVE to request exit from retention.
  7. The power controller brings the L2 RAMs out of retention and deasserts L2QREQn.
  8. The L2 deasserts L2QACCEPTn to complete the handshake.
The following figure shows the L2 dynamic retention timing.
Figure 2-16 L2 dynamic retention timing

If the L2 exits idle in step 4, it asserts L2QDENY instead of L2QACCEPTn. In response, the power controller must deassert L2QREQn, causing the L2 to deassert L2QDENY.
The L2 dynamic retention control field in the L2 Extended Control Register, L2ECTLR, must be set to a nonzero value to enable this feature. If this field is 0b000, all assertions of L2QREQn LOW receive L2QDENY responses.
If the L2 dynamic retention feature is not used, L2QREQn must be tied HIGH and the L2ECTLR retention control field set to disabled.

Note

If you use the L2 dynamic retention feature, bits[28:27] of the L2 Auxiliary Control Register, L2ACTLR, must be zero.
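Step 2 of the L2 RAM retention sequence describes an idle timer that gates the L2QACTIVE hint. A minimal sketch follows. The mapping from the L2ECTLR[2:0] encoding to a tick count is not given in this section, so the tick count is passed in directly as a hypothetical parameter. The class name is also hypothetical.

```python
class L2IdleTimer:
    """Toy model of the idle timer that drives L2QACTIVE."""

    def __init__(self, idle_ticks_required):
        if idle_ticks_required < 1:
            raise ValueError("idle_ticks_required must be at least 1")
        self.required = idle_ticks_required   # decoded from L2ECTLR[2:0]
        self.idle_ticks = 0

    def tick(self, l2_idle):
        """Advance one Generic Timer tick and return the L2QACTIVE level.

        l2_idle is True when all cores are in WFI or WFE low-power state
        and no L2 activity is pending. Any activity restarts the count.
        """
        self.idle_ticks = self.idle_ticks + 1 if l2_idle else 0
        return self.idle_ticks < self.required


timer = L2IdleTimer(idle_ticks_required=3)
print([timer.tick(True) for _ in range(3)])   # [True, True, False]
print(timer.tick(False))                      # activity reasserts L2QACTIVE
```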

Advanced SIMD and FP clock gating

The processor supports dynamic high-level clock gating of the Advanced SIMD and FP unit to reduce dynamic power dissipation.

The clock to the Advanced SIMD and FP unit is enabled when an Advanced SIMD or FP instruction is detected in the pipeline, and is disabled otherwise.
You can set bit[29] of the CPU Auxiliary Control Register, CPUACTLR_EL1, to 1 to disable dynamic clock gating of the Advanced SIMD and FP unit.

L2 control and tag banks clock gating

The processor supports dynamic high-level clock gating of the shared L2 control logic and the two L2 tag banks to reduce dynamic power dissipation.

The L2 tag bank clocks are only enabled when a corresponding access to the L2 tag bank is detected in the pipeline.
The L2 control logic is disabled after 256 consecutive idle cycles. It is then enabled when an L2 access is detected, with an additional 4-cycle penalty for the wake up before the access is serviced.
You can set bit[28] of the L2 Auxiliary Control Register, L2ACTLR_EL1, to 1 to disable dynamic clock gating of the L2 tag banks.
You can set bit[27] of the L2 Auxiliary Control Register, L2ACTLR_EL1, to 1 to disable dynamic clock gating of the L2 control logic.
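The cost of L2 control logic clock gating can be estimated with a short calculation. The 256-cycle idle threshold and the 4-cycle wake-up penalty come from the text. The function name and the simplifications listed in its docstring are assumptions made for the example.

```python
IDLE_CYCLES_BEFORE_GATING = 256
WAKE_UP_PENALTY_CYCLES = 4


def wake_up_penalties(access_cycles, gating_disabled=False):
    """Return the extra cycles each L2 access pays for control logic wake-up.

    access_cycles: increasing cycle numbers at which L2 accesses arrive.
    Assumptions: the control logic clock is running at the first access,
    and the penalty itself does not shift later accesses.
    gating_disabled models L2ACTLR_EL1 bit[27] being set to 1.
    """
    penalties = []
    previous = None
    for cycle in access_cycles:
        idle = 0 if previous is None else cycle - previous - 1
        gated = (not gating_disabled) and idle >= IDLE_CYCLES_BEFORE_GATING
        penalties.append(WAKE_UP_PENALTY_CYCLES if gated else 0)
        previous = cycle
    return penalties


print(wake_up_penalties([0, 100, 356, 357, 614]))   # [0, 0, 0, 0, 4]
```

In this example only the last access follows a gap of 256 idle cycles, so it is the only one that pays the penalty. The gap before cycle 356 is 255 idle cycles, which is one short of the threshold.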

Regional clock gating

In addition to extensive local clock gating to register flops, you can configure the processor to include Regional Clock Gates (RCGs) that can perform additional clock gating of logic blocks such as the register banks to reduce dynamic power dissipation.

You can set bit[63] of the CPUACTLR_EL1 to 1 to disable regional clock gating for that core.
You can set bit[26] of the L2ACTLR_EL1 to 1 to disable regional clock gating in the L2, GIC, and Timer.
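The clock gating disable bits named in the clock gating sections above can be gathered into lookup tables, which makes it easy to decode a register value during bring-up or debug. The bit positions are taken from the text. The table and function names are hypothetical.

```python
CPUACTLR_EL1_DISABLE = {
    29: "Advanced SIMD and FP dynamic clock gating",
    63: "regional clock gating",
}
L2ACTLR_EL1_DISABLE = {
    28: "L2 tag bank dynamic clock gating",
    27: "L2 control logic dynamic clock gating",
    26: "regional clock gating in the L2, GIC, and Timer",
}


def disabled_gating(value, table):
    """List the clock gating features that a register value disables."""
    return [name for bit, name in sorted(table.items()) if (value >> bit) & 1]


print(disabled_gating((1 << 28) | (1 << 26), L2ACTLR_EL1_DISABLE))
print(disabled_gating(1 << 63, CPUACTLR_EL1_DISABLE))
```

Results are ordered from the lowest bit position to the highest.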
Non-Confidential. ARM 100095_0002_03_en
Copyright © 2014, 2015 ARM. All rights reserved.