2.4.1. Dynamic power management

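This section describes the following dynamic power management features in the multiprocessor:

  • Processor Wait for Interrupt.

  • Processor Wait for Event.

  • L2 Wait for Interrupt.

  • L2 hardware cache flush.

  • Processor dynamic retention.

  • L2 RAMs dynamic retention.

  • Advanced SIMD and FP clock gating.

  • L2 control and tag banks clock gating.

  • Regional clock gating.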

Processor Wait for Interrupt

Wait for Interrupt (WFI) is a feature of the ARMv8 architecture that puts the processor in a low-power state by disabling the clocks in the processor while keeping the processor powered up. This reduces the power drawn to the static leakage current when the processor is in WFI low-power state.

A processor enters WFI low-power state by executing the WFI instruction.

When executing the WFI instruction, the processor waits for all instructions in the processor to retire before entering the idle or low-power state. The WFI instruction ensures that all explicit memory accesses that occurred before the WFI instruction in program order have retired. For example, the WFI instruction ensures that the following instructions receive the required data or responses from the L2 memory system:

  • Load instructions.

  • Cache and TLB maintenance operations.

  • Store-Exclusive instructions.

In addition, the WFI instruction ensures that store instructions update the cache or are issued to the L2 memory system.

While the processor is in WFI low-power state, the clocks in the processor are temporarily enabled without causing the processor to exit WFI low-power state, when any of the following events are detected:

  • An L2 snoop request that must be serviced by the processor L1 data cache.

  • A cache, TLB, or BTB maintenance operation that must be serviced by the processor L1 instruction cache, data cache, instruction TLB, data TLB, or BTB.

  • An APB access to the debug or trace registers residing in the processor power domain.

The processor exits from WFI low-power state when it detects a reset or when a WFI wake-up event occurs. See the ARM® Architecture Reference Manual ARMv8 for information about the various WFI wake-up events.

On entry into WFI low-power state, STANDBYWFI for that processor is asserted. STANDBYWFI remains asserted even if the clocks in the processor are temporarily enabled because of an L2 snoop request, a cache, TLB, or BTB maintenance operation, or an APB access.
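The relationship between the WFI state, the processor clocks, and STANDBYWFI can be summarized in a small behavioral model. The following Python sketch is illustrative only. The class `WfiModel`, its method names, and the event strings are hypothetical, and the model tracks signal levels rather than real hardware timing.

```python
class WfiModel:
    """Behavioral sketch of one processor in WFI (all names are hypothetical)."""

    # Events serviced with the clocks temporarily enabled, without leaving WFI.
    SERVICE_EVENTS = {"l2_snoop", "maintenance_op", "apb_debug_access"}

    def __init__(self):
        self.in_wfi = False
        self.clocks_enabled = True

    @property
    def standbywfi(self):
        # STANDBYWFI follows the WFI state, not the clock state.
        return self.in_wfi

    def execute_wfi(self):
        # Models the point after all earlier instructions have retired.
        self.in_wfi = True
        self.clocks_enabled = False

    def service(self, event):
        """Service one event and return STANDBYWFI as seen while the clocks run."""
        if not self.in_wfi:
            raise RuntimeError("processor is not in WFI low-power state")
        if event not in self.SERVICE_EVENTS:
            raise ValueError(f"not a clock-enable event: {event}")
        self.clocks_enabled = True    # clocks run only while the event is serviced
        seen = self.standbywfi
        self.clocks_enabled = False   # clocks stop again, WFI state is unchanged
        return seen

    def wake(self):
        """A reset or a WFI wake-up event ends the low-power state."""
        self.in_wfi = False
        self.clocks_enabled = True
```

In this model, `service("l2_snoop")` returns True because STANDBYWFI stays asserted while the snoop is handled, and only `wake()` deasserts it.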

Processor Wait for Event

Wait for Event (WFE) is a feature of the ARMv8 architecture that uses a locking mechanism based on events to put the processor in a low-power state by disabling the clocks in the processor while keeping the processor powered up. This reduces the power drawn to the static leakage current when the processor is in WFE low-power state.

A processor enters WFE low-power state by executing the WFE instruction. When executing the WFE instruction, the processor waits for all instructions in the processor to complete before entering the idle or low-power state. The WFE instruction ensures that all explicit memory accesses that occurred before the WFE instruction in program order have completed.

While the processor is in WFE low-power state, the clocks in the processor are temporarily enabled without causing the processor to exit WFE low-power state, when any of the following events are detected:

  • An L2 snoop request that must be serviced by the processor L1 data cache.

  • A cache, TLB, or BTB maintenance operation that must be serviced by the processor L1 instruction cache, data cache, instruction TLB, data TLB, or BTB.

  • An APB access to the debug or trace registers residing in the processor power domain.

The processor exits from WFE low-power state when:

  • It detects a reset.

  • The EVENTI input signal asserts.

  • The CLREXMONREQ input signal asserts.

  • A WFE wake-up event occurs. See the ARM® Architecture Reference Manual ARMv8 for information about the various WFE wake-up events.

On entry into WFE low-power state, STANDBYWFE for that processor is asserted. STANDBYWFE remains asserted even if the clocks in the processor are temporarily enabled because of an L2 snoop request, a cache, TLB, or BTB maintenance operation, or an APB access.

Event communication using WFE and SEV instructions

The EVENTI signal enables an external agent to participate in the WFE and SEV event communication. When this signal is asserted, it sends an event message to all the processors in the multiprocessor. This is similar to executing an SEV instruction on one processor in the multiprocessor. It enables the external agent to signal to a processor that it has released a semaphore and that the processor can leave the WFE low-power state. The EVENTI input signal must remain HIGH for at least one CLK cycle to be visible to the processors.

The external agent can determine that at least one of the processors in the multiprocessor has executed an SEV instruction by checking the EVENTO signal. When any of the processors in the multiprocessor executes an SEV instruction, an event is signaled to all the processors in the multiprocessor, and the EVENTO signal is asserted. This signal is asserted HIGH for three CLK cycles when any of the processors executes an SEV instruction.
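The event broadcast described above can be sketched as a simple model. The following Python example is an illustration under stated assumptions, not a description of the implementation. The class `EventModel` and its method names are hypothetical. It models a per-processor event register as defined by the architecture, where a WFE instruction that finds the register set clears it and does not enter the low-power state. It does not model processors that are already waiting, and it assumes that the EVENTO pulse starts in the cycle after the SEV instruction.

```python
class EventModel:
    """Sketch of SEV, EVENTI, and EVENTO behavior (names are hypothetical)."""

    EVENTO_PULSE_CYCLES = 3  # EVENTO is HIGH for three CLK cycles per the text

    def __init__(self, num_cpus):
        self.event_reg = [False] * num_cpus  # one event register per processor
        self.evento_cycles_left = 0

    def _broadcast(self):
        # An event is signaled to all the processors in the multiprocessor.
        self.event_reg = [True] * len(self.event_reg)

    def sev(self):
        """Any processor executes SEV: broadcast and pulse EVENTO."""
        self._broadcast()
        self.evento_cycles_left = self.EVENTO_PULSE_CYCLES

    def assert_eventi(self):
        """External agent asserts EVENTI: same broadcast as SEV.

        The text describes EVENTO only for SEV, so EVENTO is not driven here.
        """
        self._broadcast()

    def wfe(self, cpu):
        """Return True if this WFE would enter the low-power state."""
        if self.event_reg[cpu]:
            self.event_reg[cpu] = False  # the pending event is consumed
            return False
        return True

    def clock(self):
        """Advance one CLK cycle and return the EVENTO level for that cycle."""
        level = self.evento_cycles_left > 0
        if level:
            self.evento_cycles_left -= 1
        return level
```

After `sev()`, five calls to `clock()` return True three times and then False, and the next `wfe()` on any processor returns False because its event register is set.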

CLREXMON request and acknowledge signaling

The CLREXMONREQ signal has a corresponding CLREXMONACK response signal. This forms a standard 2-wire, 4-phase handshake that can be used to signal across the voltage and frequency boundary between the processor and system.

When the CLREXMONREQ input is asserted, it signals the clearing of an external global exclusive monitor and acts as a WFE wake-up event to all the processors in the multiprocessor.

Figure 2.11 shows the CLREXMON request and acknowledge handshake. When the request signal is asserted, it remains asserted until an acknowledge is received. When the request is deasserted, the acknowledge can then deassert.

Note

If a global exclusive monitor does not exist in your system, tie the CLREXMONREQ input LOW.
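The 2-wire, 4-phase handshake between CLREXMONREQ and CLREXMONACK can be expressed as a table of legal transitions. The following Python sketch checks a sampled trace against that table. The function name `is_valid_handshake` and the trace representation are assumptions made for illustration, and the sketch assumes that at most one signal changes between consecutive samples.

```python
# Legal next states for each (CLREXMONREQ, CLREXMONACK) pair.
LEGAL_NEXT = {
    (0, 0): {(0, 0), (1, 0)},  # idle, the request can assert
    (1, 0): {(1, 0), (1, 1)},  # the request is held until acknowledged
    (1, 1): {(1, 1), (0, 1)},  # the request can deassert once acknowledged
    (0, 1): {(0, 1), (0, 0)},  # the acknowledge deasserts after the request
}


def is_valid_handshake(trace):
    """Check a list of (req, ack) samples that must start in the idle state."""
    if not trace or trace[0] != (0, 0):
        return False
    return all(nxt in LEGAL_NEXT[cur] for cur, nxt in zip(trace, trace[1:]))
```

For example, a trace that drops the request before the acknowledge arrives, such as `[(0, 0), (1, 0), (0, 0)]`, is rejected.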

Figure 2.11. CLREXMON request and acknowledge handshake


L2 Wait for Interrupt

When all the processors are in WFI low-power state, the shared L2 memory system logic that is common to all the processors can also enter a WFI low-power state.

Entry into L2 WFI low-power state can only occur if specific requirements are met and the following sequence is applied:

  1. All processors are in the WFI low-power state, so all the processor STANDBYWFI outputs are asserted.

  2. When all outstanding ACP requests are complete, the SoC asserts the AINACTS input to idle the ACP slave interface. When AINACTS has been asserted, the SoC must not assert ARVALIDS, AWVALIDS, or WVALIDS.

  3. If the multiprocessor implements:

    An ACE interface

    When all outstanding snoop requests are complete, the SoC asserts the ACINACTM input signal to idle the AXI master snoop interface. This prevents the L2 memory system from accepting any new requests from the AXI master snoop interface.

    A CHI interface

    When all outstanding snoop requests are complete, the SoC asserts the SINACT input signal indicating that the multiprocessor is removed from the coherency domain and does not receive any more snoops. This triggers the L2 to deactivate the TX and RX links. When the TX and RX links are in their respective stop states, the L2 memory system does not accept any new requests from the CHI interface.

  4. When the L2 memory system completes the outstanding transactions for ACE and CHI interfaces, it can then enter the L2 WFI low-power state. On entry into L2 WFI low-power state, STANDBYWFIL2 is asserted. Assertion of STANDBYWFIL2 guarantees that the L2 is idle and does not accept any new transactions.

  5. The SoC can then choose to deassert the CLKEN input to the multiprocessor to stop all remaining internal clocks within the processor that are derived from CLK. All clocks in the shared L2 memory system logic, GIC, and Timer, are disabled.
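The ordering of the entry requirements above can be sketched as a check that reports the first step that is not yet satisfied. The following Python example is a simplified illustration. The names `L2WfiInputs`, `first_unmet_step`, and `standbywfil2` are hypothetical, and `l2_transactions_done` is an assumed flag that stands in for the internal state of the L2 memory system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class L2WfiInputs:
    standbywfi: List[bool]      # one STANDBYWFI output per processor
    ainacts: bool               # SoC has idled the ACP slave interface
    snoop_inactive: bool        # ACINACTM (ACE) or SINACT (CHI) is asserted
    l2_transactions_done: bool  # assumed flag: no outstanding L2 transactions


def first_unmet_step(s: L2WfiInputs) -> Optional[int]:
    """Return the number of the first unmet step, or None if steps 1 to 4 hold."""
    if not all(s.standbywfi):
        return 1
    if not s.ainacts:
        return 2
    if not s.snoop_inactive:
        return 3
    if not s.l2_transactions_done:
        return 4
    return None


def standbywfil2(s: L2WfiInputs) -> bool:
    """STANDBYWFIL2 level in this sketch. CLKEN can be deasserted only when True."""
    return first_unmet_step(s) is None
```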

If CLKEN is deasserted, the SoC must assert the CLKEN input on a WFI wake-up event to enable the L2 memory system and potentially the processor. There are two classes of wake-up events:

  • An event that requires only the L2 memory system to be enabled.

  • An event that requires both the L2 memory system and the processor to be enabled.

The following wake-up events cause both the L2 memory system and the processor to exit WFI low-power state:

  • A physical IRQ or FIQ interrupt.

  • A debug event.

  • Powerup or Warm reset.

The following wake-up events cause only the L2 memory system to exit WFI low-power state:

  • If the device is configured to have an ACE interface, deassertion of ACINACTM to service an external snoop request on the AXI master snoop interface.

  • If the device is configured to have a CHI interface:

    • Deassertion of SINACT to service an external snoop request.

    • Activation of TX or RX links.

  • Deassertion of AINACTS to service an ACP transaction on the slave interface.

When the processor exits from WFI low-power state, STANDBYWFI for that processor is deasserted. When the L2 memory system logic exits from WFI low-power state, STANDBYWFIL2 is deasserted.
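The two classes of wake-up events can be summarized as a lookup. The following Python sketch is illustrative only. The enumeration `WakeEvent` and the function `blocks_to_wake` are hypothetical names, and the sketch does not model CLKEN, which the SoC must assert on any of these events if it was deasserted.

```python
from enum import Enum, auto


class WakeEvent(Enum):
    IRQ = auto()
    FIQ = auto()
    DEBUG_EVENT = auto()
    RESET = auto()                # powerup or Warm reset
    ACINACTM_DEASSERTED = auto()  # ACE: external snoop request
    SINACT_DEASSERTED = auto()    # CHI: external snoop request
    CHI_LINK_ACTIVATED = auto()   # CHI: activation of TX or RX links
    AINACTS_DEASSERTED = auto()   # ACP transaction on the slave interface


# Events that wake the processor as well as the L2 memory system.
WAKES_PROCESSOR = {
    WakeEvent.IRQ,
    WakeEvent.FIQ,
    WakeEvent.DEBUG_EVENT,
    WakeEvent.RESET,
}


def blocks_to_wake(event):
    """Return the blocks that exit WFI low-power state for this event."""
    if event in WAKES_PROCESSOR:
        return {"L2", "processor"}
    return {"L2"}
```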

Figure 2.12 shows the L2 WFI timing for a 4-processor configuration.

Figure 2.12. L2 Wait For Interrupt timing


L2 hardware cache flush

The multiprocessor provides an efficient way to fully clean and invalidate the L2 cache in preparation for powering it down without requiring the waking of a processor to perform the clean and invalidate through software.

Use of L2 hardware cache flush can only occur if specific requirements are met and the following sequence is applied:

  1. Disable L2 prefetches by writing zeros to bits[38, 36:35, 33:32] of the CPU Extended Control Register. See CPU Extended Control Register, EL1 for more information.

  2. Execute an ISB instruction to ensure the CPU Extended Control Register write is complete.

  3. Execute a DSB instruction to ensure completion of any prior prefetch requests.

  4. All processors are in the WFI low-power state, so all the processor STANDBYWFI outputs are asserted.

  5. When all outstanding ACP transactions are complete, the SoC asserts the AINACTS signal to idle the ACP. This is necessary to prevent ACP transactions from allocating new entries in the L2 cache while the hardware cache flush is occurring. When AINACTS has been asserted, the SoC must not assert ARVALIDS, AWVALIDS, or WVALIDS.

  6. The SoC can now assert the L2FLUSHREQ input.

  7. The L2 performs a series of internal clean and invalidate operations to each set and way of the L2 cache. Any dirty cache lines are written back to the system using WriteBack or WriteNoSnoop operations. Clean cache lines can cause Evict or WriteEvict transactions if the L2 is configured to issue them.

  8. When the L2 completes the clean and invalidate sequence, it asserts the L2FLUSHDONE signal. The SoC can now deassert the L2FLUSHREQ signal, and then the L2 deasserts L2FLUSHDONE.

  9. When all outstanding snoop transactions are completed, the SoC can assert the ACINACTM signal in an ACE implementation or the SINACT signal in a CHI implementation. In response, the L2 asserts the STANDBYWFIL2 signal.
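As a worked example of step 1 in the sequence above, the following Python sketch computes the mask for bits[38, 36:35, 33:32] and clears those bits in a register value. It shows the arithmetic only. The helper names are hypothetical, and on hardware the value is read from and written to the CPU Extended Control Register by software running on the processor, followed by the ISB and DSB instructions in steps 2 and 3.

```python
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF


def bit_field(msb, lsb):
    """Return a mask with bits[msb:lsb] set."""
    return ((1 << (msb - lsb + 1)) - 1) << lsb


# Bits[38], [36:35], and [33:32] of the CPU Extended Control Register.
L2_PREFETCH_BITS = bit_field(38, 38) | bit_field(36, 35) | bit_field(33, 32)


def disable_l2_prefetch(cpuectlr):
    """Return the register value with the L2 prefetch bits written as zeros."""
    return cpuectlr & ~L2_PREFETCH_BITS & MASK_64
```

The mask evaluates to 0x5B00000000, so a read-modify-write that clears it leaves all other fields of the register unchanged.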

It is possible to terminate the L2 hardware cache flush by deasserting the L2FLUSHREQ signal before the L2FLUSHDONE signal is asserted. This causes the L2 to abort the hardware cache flush. This feature can be used when the SoC does not power down the multiprocessor and must wake up the processor quickly.
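The L2FLUSHREQ and L2FLUSHDONE handshake, including the abort case, can be sketched as follows. This Python model is a simplification under stated assumptions. The class `L2FlushModel` is hypothetical, `lines_to_flush` is an assumed work counter that stands in for the set and way walk, and the model flushes one line for each call to `step()`.

```python
class L2FlushModel:
    """Sketch of the hardware cache flush handshake (names are hypothetical)."""

    def __init__(self, lines_to_flush):
        self.remaining = lines_to_flush  # assumed stand-in for the set/way walk
        self.l2flushreq = False
        self.l2flushdone = False
        self.aborted = False

    def set_req(self, level):
        """The SoC drives L2FLUSHREQ to the given level."""
        if self.l2flushreq and not level:
            if not self.l2flushdone:
                self.aborted = True   # deasserted before L2FLUSHDONE: abort
            self.l2flushdone = False  # the L2 deasserts L2FLUSHDONE after REQ
        self.l2flushreq = level

    def step(self):
        """Clean and invalidate one line while the request is asserted."""
        if self.l2flushreq and not self.l2flushdone:
            if self.remaining > 0:
                self.remaining -= 1
            if self.remaining == 0:
                self.l2flushdone = True
```

A complete flush ends with L2FLUSHDONE asserted and then cleared when the SoC deasserts the request. Deasserting the request early sets `aborted` and leaves lines unflushed.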

Figure 2.13 shows the L2 hardware cache flush timing.

Figure 2.13. L2 hardware cache flush timing


Processor dynamic retention

When a processor is in WFI low-power state or WFE low-power state, the clocks to the processor are stopped. During these low-power states, the processor might start the clocks for short periods of time to allow it to handle snoops or other short events but it remains in the low-power state.

Whenever the clocks to a processor are stopped, it is possible for an external power controller to place the processor in a retention state to reduce leakage power consumption without state loss.

Each processor in the multiprocessor has a CPU Q-channel interface that allows an external power controller to place the processor into a retention state. This interface consists of four pins:

  • CPUQACTIVE.

  • CPUQREQn.

  • CPUQACCEPTn.

  • CPUQDENY.

The operational relationships of these signals are:

  • CPUQREQn can only go LOW if CPUQACCEPTn is HIGH and CPUQDENY is LOW.

  • After CPUQREQn goes LOW, it must remain LOW until either CPUQACCEPTn goes LOW or CPUQDENY goes HIGH.

  • CPUQREQn can then go HIGH, and must remain HIGH until both CPUQACCEPTn is HIGH and CPUQDENY is LOW.

  • Each CPUQREQn request is followed by the assertion of either CPUQACCEPTn or CPUQDENY, but not both. CPUQACCEPTn cannot be asserted LOW at the same time as CPUQDENY is asserted HIGH.
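The signal relationships listed above can be checked mechanically against a sampled trace. The following Python sketch is a minimal illustration, not a verification component. The function name `check_q_channel` and the trace format are assumptions, and each edge on CPUQREQn is judged against the CPUQACCEPTn and CPUQDENY levels in the previous sample.

```python
def check_q_channel(trace):
    """Return rule violations in a list of (CPUQREQn, CPUQACCEPTn, CPUQDENY) samples.

    Each violation is an (index, message) tuple.
    """
    errors = []
    for i, (reqn, acceptn, deny) in enumerate(trace):
        # Accept (LOW) and deny (HIGH) must never be asserted together.
        if acceptn == 0 and deny == 1:
            errors.append((i, "accept and deny asserted together"))
        if i == 0:
            continue
        p_reqn, p_acceptn, p_deny = trace[i - 1]
        # CPUQREQn can only go LOW if CPUQACCEPTn is HIGH and CPUQDENY is LOW.
        if p_reqn == 1 and reqn == 0 and not (p_acceptn == 1 and p_deny == 0):
            errors.append((i, "CPUQREQn fell before the handshake completed"))
        # CPUQREQn stays LOW until CPUQACCEPTn goes LOW or CPUQDENY goes HIGH.
        if p_reqn == 0 and reqn == 1 and not (p_acceptn == 0 or p_deny == 1):
            errors.append((i, "CPUQREQn rose before accept or deny"))
    return errors
```

An accepted request such as `[(1, 1, 0), (0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)]` produces no violations, and neither does the corresponding denied sequence.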

A typical sequence of the external power controller successfully placing the processor in retention state is:

  1. The processor executes a WFI instruction. The clocks in the processor are stopped and STANDBYWFI is asserted. After the programmed number of Generic Timer CNTVALUEB ticks specified by the CPUECTLR[2:0] field have elapsed, CPUQACTIVE for that processor is deasserted. This hints that retention is possible for that processor.

  2. The external power controller asserts CPUQREQn to indicate that it wants to put that processor into retention state.

  3. While the processor is still in WFI low-power state and the clocks are stopped, the processor accepts the retention request by asserting CPUQACCEPTn.

  4. While CPUQREQn and CPUQACCEPTn are both asserted, the processor is in quiescent state and the external power controller can safely put the processor into retention state.

  5. During retention, if a snoop occurs to access the cache of the quiescent processor, the CPUQACTIVE signal is asserted to request exit from retention.

  6. The external power controller brings the processor out of retention and deasserts CPUQREQn.

  7. The processor deasserts CPUQACCEPTn to complete the handshake.

  8. The clocks in the processor are restarted temporarily to allow the snoop request to the processor to proceed.

  9. After the snoop access is complete, the processor deasserts CPUQACTIVE.

  10. CPUQREQn and CPUQACCEPTn are then asserted. The processor has reentered quiescent state and the external power controller can put the processor into retention state again.

  11. When the processor is ready to exit WFI low-power state, CPUQACTIVE is asserted.

  12. CPUQREQn is then deasserted, the processor exits WFI low-power state, and CPUQACCEPTn is deasserted.

Figure 2.14 shows a typical sequence where the external power controller successfully places the processor in retention state.

Figure 2.14. Successful retention timing


The processor enters WFI low-power state and deasserts CPUQACTIVE. The external power controller asserts CPUQREQn. If the processor cannot safely enter quiescent state, it asserts CPUQDENY instead of CPUQACCEPTn. When this occurs, the external power controller cannot put that processor into retention state. The external power controller must then deassert CPUQREQn, after which the processor deasserts CPUQDENY.

Figure 2.15 shows a sequence where the external power controller attempts to put a processor in retention state but the processor denies the request.

Figure 2.15. Denied retention timing


Guidelines on the use of processor dynamic retention

As processors generally only stay in WFE low-power state for a short period of time, ARM recommends that you only take a processor into retention when it is in WFI low-power state.

If the L1 data cache of a processor that is in WFI low-power state contains data that is likely to be the target of frequent snoops from other processors, entering quiescent state and retention is likely to be inefficient.

When using the processor retention feature, you must consider the following points:

  • During processor reset, CPUQREQn must be deasserted HIGH while CPUQACCEPTn is asserted LOW.

  • The Processor dynamic retention control field in the CPU Extended Control Register, CPUECTLR, must be set to a nonzero value to enable this feature. If this field is 0b000, all assertions of CPUQREQn LOW receive CPUQDENY responses. See CPU Extended Control Register, EL1.

  • If the processor dynamic retention feature is not used, CPUQREQn must be tied HIGH and the CPUECTLR retention control field set to disabled. See CPU Extended Control Register, EL1 for more information.

Note

If you use the processor dynamic retention feature, the CPU Auxiliary Control Register CPUACTLR[30:29] bits must be zero. See CPU Auxiliary Control Register, EL1.
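The configuration requirements for processor dynamic retention can be collected into a simple check. The following Python sketch is illustrative only. The function names are hypothetical, and the sketch assumes that the Processor dynamic retention control field is the CPUECTLR[2:0] field referred to in the retention sequence.

```python
RETENTION_FIELD = 0b111      # CPUECTLR[2:0], assumed to be the retention control field
CPUACTLR_30_29 = 0b11 << 29  # must be zero when retention is used


def retention_enabled(cpuectlr):
    """A nonzero retention control field enables the feature."""
    return (cpuectlr & RETENTION_FIELD) != 0


def config_errors(cpuectlr, cpuactlr):
    """Return a list of configuration problems for a system that uses retention."""
    errors = []
    if not retention_enabled(cpuectlr):
        errors.append("CPUECTLR retention control field is 0b000 (disabled)")
    if cpuactlr & CPUACTLR_30_29:
        errors.append("CPUACTLR[30:29] must be zero")
    return errors


def response(cpuectlr, can_quiesce):
    """Return which response a CPUQREQn LOW request receives in this sketch."""
    if not retention_enabled(cpuectlr) or not can_quiesce:
        return "CPUQDENY"
    return "CPUQACCEPTn"
```

With the field set to 0b000, `response()` always returns `"CPUQDENY"`, which matches the behavior described for a disabled retention control field.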

L2 RAMs dynamic retention

L2 RAM dynamic retention mode provides a way of saving power in an idle multiprocessor while allowing quick wake-up to service a snoop from ACE or CHI. The processor supports dynamic retention of the L2 Data, Dirty, Tag, Inclusion PF, and Snoop Tag RAMs.

The multiprocessor has an L2 Q-channel interface that allows an external power controller to place the L2 RAMs into a retention state.

L2 RAM dynamic retention mode is entered and exited using the following sequence of events:

  1. All processors are in WFI or WFE low-power state and therefore all the processor STANDBYWFI or STANDBYWFE outputs are asserted.

  2. When all pending L2 activity is complete, and the L2 remains idle for the programmed number of Generic Timer CNTVALUEB ticks, as specified by the L2ECTLR[2:0] field, the L2 deasserts L2QACTIVE. See L2 Extended Control Register, EL1 for more information.

  3. The external power controller asserts L2QREQn to indicate that it wants to put the L2 RAMs into retention state.

  4. If the L2 is still idle, it accepts the retention request by asserting L2QACCEPTn.

  5. While L2QREQn and L2QACCEPTn are both asserted, the power controller can safely put the L2 RAMs into retention state.

  6. If the L2 detects that one or more processors have exited WFI low-power state, that the ACP has become active, or that a snoop request must be serviced, the L2 asserts L2QACTIVE to request exit from retention.

  7. The power controller brings the L2 RAMs out of retention and deasserts L2QREQn.

  8. The L2 deasserts L2QACCEPTn to complete the handshake.
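The idle timing in step 2 of the sequence above can be sketched as a counter. The following Python model is a simplified illustration. The class `L2IdleTimer` is hypothetical, and `idle_ticks_required` stands in for the decoded value of the L2ECTLR[2:0] field, whose encoding is not modeled here.

```python
class L2IdleTimer:
    """Sketch of when L2QACTIVE deasserts (names and tick count are hypothetical)."""

    def __init__(self, idle_ticks_required):
        if idle_ticks_required <= 0:
            raise ValueError("retention disabled: L2QACTIVE is not modeled")
        self.idle_ticks_required = idle_ticks_required
        self.idle_ticks = 0
        self.l2qactive = True

    def tick(self, all_cpus_standby, l2_busy):
        """Advance one Generic Timer tick and return the L2QACTIVE level."""
        if all_cpus_standby and not l2_busy:
            self.idle_ticks += 1
        else:
            self.idle_ticks = 0  # any activity restarts the idle count
        self.l2qactive = self.idle_ticks < self.idle_ticks_required
        return self.l2qactive
```

With a requirement of three ticks, L2QACTIVE deasserts on the third consecutive idle tick and reasserts as soon as a processor leaves standby or the L2 becomes busy.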

Figure 2.16 shows the L2 dynamic retention timing.

Figure 2.16. L2 dynamic retention timing


If the L2 exits idle in step 4, it asserts L2QDENY instead of L2QACCEPTn. In response, the power controller must deassert L2QREQn, causing the L2 to deassert L2QDENY.

The L2 dynamic retention control field in the L2 Extended Control Register, L2ECTLR, must be set to a nonzero value to enable this feature. If this field is 0b000, all assertions of L2QREQn LOW receive L2QDENY responses. See L2 Extended Control Register, EL1.

If the L2 dynamic retention feature is not used, L2QREQn must be tied HIGH and the L2ECTLR retention control field set to disabled. See L2 Extended Control Register, EL1 for more information.

Note

If you use the L2 dynamic retention feature, the L2 Auxiliary Control Register L2ACTLR[28:27] bits must be zero. See L2 Auxiliary Control Register, EL1.

Advanced SIMD and FP clock gating

The multiprocessor supports dynamic high-level clock gating of the Advanced SIMD and FP unit to reduce dynamic power dissipation.

The clock to the Advanced SIMD and FP unit is enabled when an Advanced SIMD or FP instruction is detected in the pipeline, and is disabled otherwise.

You can set bit[29] of the CPU Auxiliary Control Register, CPUACTLR_EL1, to 1 to disable dynamic clock gating of the Advanced SIMD and FP unit. See CPU Auxiliary Control Register, EL1.

L2 control and tag banks clock gating

The multiprocessor supports dynamic high-level clock gating of the shared L2 control logic and the two L2 tag banks to reduce dynamic power dissipation.

The L2 tag bank clocks are only enabled when a corresponding access to the L2 tag bank is detected in the pipeline.

The L2 control logic is disabled after 256 consecutive idle cycles. It is then enabled when an L2 access is detected, with an additional 4-cycle penalty for the wake up before the access is serviced.
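The idle threshold and wake-up penalty for the L2 control logic can be illustrated with a cycle-by-cycle model. The following Python sketch is a simplification under stated assumptions. The class `L2ControlClockGate` is hypothetical, each access is treated as occupying a single cycle, and the penalty cycles themselves are not simulated.

```python
IDLE_THRESHOLD = 256  # consecutive idle cycles before the clock is gated
WAKE_PENALTY = 4      # extra cycles before the first access after gating


class L2ControlClockGate:
    """Sketch of L2 control logic clock gating (names are hypothetical)."""

    def __init__(self):
        self.idle_cycles = 0
        self.gated = False

    def cycle(self, access):
        """Advance one cycle and return the wake-up penalty paid by an access."""
        if access:
            penalty = WAKE_PENALTY if self.gated else 0
            self.gated = False
            self.idle_cycles = 0
            return penalty
        self.idle_cycles += 1
        if self.idle_cycles >= IDLE_THRESHOLD:
            self.gated = True
        return 0
```

In this model, an access after 255 idle cycles pays no penalty, while an access after 256 or more idle cycles pays the 4-cycle penalty once.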

You can set bit[28] of the L2 Auxiliary Control Register, L2ACTLR_EL1, to 1 to disable dynamic clock gating of the L2 tag banks. See L2 Auxiliary Control Register, EL1.

You can set bit[27] of the L2 Auxiliary Control Register, L2ACTLR_EL1, to 1 to disable dynamic clock gating of the L2 control logic. See L2 Auxiliary Control Register, EL1.

Regional clock gating

In addition to extensive local clock gating to register flops, you can configure the multiprocessor to include Regional Clock Gates (RCGs) that can perform additional clock gating of logic blocks such as the register banks to reduce dynamic power dissipation.

You can set bit[63] of the CPUACTLR_EL1 to 1 to disable regional clock gating for each processor. See CPU Auxiliary Control Register, EL1.

You can set bit[26] of the L2ACTLR_EL1 to 1 to disable regional clock gating in the L2, GIC, and Timer. See L2 Auxiliary Control Register, EL1.
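The clock gating disable bits named in this section can be collected as constants. The following Python sketch shows only the bit arithmetic. The constant and function names are hypothetical, and on hardware these registers are written by software running on the processor.

```python
# CPU Auxiliary Control Register, CPUACTLR_EL1
CPUACTLR_DISABLE_SIMD_FP_GATING = 1 << 29
CPUACTLR_DISABLE_RCG = 1 << 63

# L2 Auxiliary Control Register, L2ACTLR_EL1
L2ACTLR_DISABLE_TAG_BANK_GATING = 1 << 28
L2ACTLR_DISABLE_CONTROL_GATING = 1 << 27
L2ACTLR_DISABLE_RCG = 1 << 26


def set_bits(value, mask):
    """Return the register value with the bits in mask set to 1."""
    return value | mask


def disable_all_cpu_gating(cpuactlr):
    return set_bits(cpuactlr, CPUACTLR_DISABLE_SIMD_FP_GATING | CPUACTLR_DISABLE_RCG)


def disable_all_l2_gating(l2actlr):
    return set_bits(
        l2actlr,
        L2ACTLR_DISABLE_TAG_BANK_GATING
        | L2ACTLR_DISABLE_CONTROL_GATING
        | L2ACTLR_DISABLE_RCG,
    )
```

Setting CPUACTLR bit[29] or L2ACTLR bits[28:27] conflicts with the dynamic retention features, which require those bits to be zero, so a configuration like this applies only when retention is not used.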

Copyright © 2013, 2014 ARM. All rights reserved. ARM DDI 0488D
Non-Confidential ID012914