8.5.3. Cache error detection and correction

This section describes how the processor detects, handles, reports, and corrects cache memory errors. Memory errors detected with parity or ECC have Fault Status Register (FSR) values to distinguish them from other abort causes.

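This section describes:

  • Error build options

  • Address decoder faults

  • Handling cache parity errors

  • Handling cache ECC errors

  • Errors on instruction cache read

  • Errors on data cache read

  • Errors on data cache write

  • Errors on evictions

  • Errors on cache maintenance operations

  • Errors on ACP coherency maintenance operations.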

Error build options

The caches can detect and correct errors depending on the build options used in the implementation. The build options for the instruction cache can be different from those for the data cache.

If the parity build option is enabled, the cache is protected by parity bits. For both the instruction and data cache, the data RAMs include one parity bit per byte of data. The tag RAM contains one parity bit to cover the tag and valid bit.

If the ECC build option is enabled:

  • The instruction cache is protected by a 64-bit ECC scheme. The data RAMs include eight bits of ECC code for every 64 bits of data. The tag RAMs include seven bits of ECC code to cover the tag and valid bit.

  • The data cache is protected by a 32-bit ECC scheme. The data RAMs include seven bits of ECC code for every 32 bits of data. The tag RAMs include seven bits of ECC code to cover the tag and valid bit. The dirty RAM includes four bits of ECC to cover the dirty bit and the two outer attributes bits of each cache line.
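The check-bit counts listed above are consistent with a single-error-correct, double-error-detect (SECDED) extended Hamming code, although this section does not name the code that is used. As a rough sanity check under that assumption, the following sketch computes the minimum number of SECDED check bits for a given number of protected bits. The helper names `secded_check_bits` and `parity_bits` are hypothetical and are not part of any ARM deliverable.

```python
def secded_check_bits(protected_bits: int) -> int:
    """Minimum check bits for an extended Hamming (SECDED) code.

    Assumption: the ECC scheme is a SECDED Hamming code. The source only
    gives the bit counts, not the code construction.
    """
    if protected_bits < 1:
        raise ValueError("protected_bits must be positive")
    r = 1
    # Hamming bound for single-error correction: 2**r >= m + r + 1
    while 2 ** r < protected_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


def parity_bits(data_bits: int) -> int:
    """Parity build option: one parity bit per byte of data."""
    return data_bits // 8


print(secded_check_bits(64))  # instruction cache data RAM: 8
print(secded_check_bits(32))  # data cache data RAM: 7
print(secded_check_bits(3))   # dirty bit plus two outer attribute bits: 4
print(parity_bits(64))        # parity alternative for 64 bits of data: 8
```

Under the same assumption, seven check bits cover up to 57 protected bits, which leaves room for the tag and valid bit in the tag RAMs.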

Address decoder faults

The error detection schemes described in this section provide protection against errors that occur in the data stored in the cache RAMs. Each RAM normally includes a decoder that enables access to that data and, if an error occurs in this logic, it is not normally detected by these error detection schemes. The processor includes features that enable it to detect some address decoder faults. If you are implementing the processor and require these features, contact ARM to discuss the features and your requirements.

Handling cache parity errors

Table 8.2 shows the behavior of the processor on a cache parity error, depending on bits [5:3] of the Auxiliary Control Register, see c1, Auxiliary Control Register.

Table 8.2. Cache parity error behavior

Value             Behavior
b000, b001, b010  Generate abort on parity errors[a], force write-through, enable hardware recovery
b011              Reserved
b100              Disable parity checking
b101, b110        Do not generate abort on parity errors, force write-through, enable hardware recovery
b111              Reserved

[a] Parity errors caused by ACP coherency maintenance operations do not generate aborts


See Disabling or enabling error checking for information on how to safely change these bits.
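As an illustration of Table 8.2, the following sketch decodes bits [5:3] of an Auxiliary Control Register value for a parity build. It assumes that the blank rows in the table share the behavior of the row above them. `ParityMode` and `decode_parity_mode` are hypothetical names, and the sketch only interprets a value that software has already read. It does not access the register.

```python
from typing import NamedTuple, Optional


class ParityMode(NamedTuple):
    checking: bool
    abort_on_error: bool
    force_write_through: bool
    hardware_recovery: bool


# Transcribed from Table 8.2. Reserved encodings (b011, b111) are omitted.
_PARITY_TABLE = {
    0b000: ParityMode(True, True, True, True),
    0b001: ParityMode(True, True, True, True),
    0b010: ParityMode(True, True, True, True),
    0b100: ParityMode(False, False, False, False),  # parity checking disabled
    0b101: ParityMode(True, False, True, True),
    0b110: ParityMode(True, False, True, True),
}


def decode_parity_mode(actlr: int) -> Optional[ParityMode]:
    """Return the parity behavior for ACTLR bits [5:3], or None if reserved."""
    field = (actlr >> 3) & 0b111
    return _PARITY_TABLE.get(field)


mode = decode_parity_mode(0b101 << 3)
print(mode.abort_on_error, mode.force_write_through)  # False True
```

Returning `None` for the reserved encodings is a design choice of this sketch. The table does not define their behavior.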

Hardware recovery

When parity checking is enabled, hardware recovery is always enabled. Memory marked as write-back write-allocate behaves as write-through. This ensures that cache lines can never be dirty, so the processor can always recover from the error by invalidating the cache line that contains the parity error. The processor automatically performs this invalidation when an error is detected. The correct data can then be re-read from the L2 memory system.

Parity aborts

If aborts on parity errors are enabled, software is notified of the error by a data abort or prefetch abort. The error is still automatically corrected by the hardware even if an abort is generated.

If abort generation is not enabled, the hardware recovery, including the access retry, is invisible to software. If required, software can use events and the Correctable Fault Location Register to monitor the errors that are detected and corrected. See Error detection events and Correctable Fault Location Register.

Parity errors caused by ACP coherency maintenance operations never generate aborts.

Handling cache ECC errors

Table 8.3 shows the behavior of the processor on a cache ECC error, depending on bits [5:3] of the Auxiliary Control Register, see c1, Auxiliary Control Register.

Table 8.3. Cache ECC error behavior

Value       Behavior
b000, b001  Generate abort on ECC errors[a], enable hardware recovery
b010        Generate abort on ECC errors[a], force write-through, enable hardware recovery
b011        Reserved
b100        Disable ECC checking
b101        Do not generate abort on ECC errors, enable hardware recovery
b110        Do not generate abort on ECC errors, force write-through, enable hardware recovery
b111        Reserved

[a] ECC errors caused by ACP coherency maintenance operations do not generate aborts


See Disabling or enabling error checking for information on how to safely change these bits.

When ECC checking is enabled, hardware recovery is always enabled. When an ECC error is detected, the processor tries to evict the cache line containing the error. If the line is clean, it is invalidated, and the correct data is reloaded from the L2 memory system. If the line is dirty, the eviction writes the dirty data out to the L2 memory system, and in the process it corrects any 1-bit errors. The corrected data is then reloaded from the L2 memory system.

If a 2-bit error is detected in a dirty line, the error is not correctable. If the 2-bit error is in the tag or dirty RAM, no data is written to the L2 memory system. If the 2-bit error is in the data RAM, the cache line is written to the L2 memory system, but the AXI master port WSTRBMm signal is LOW for the data that contains the error. If an uncorrectable error is detected, an abort is always generated because data might have been lost. Such an error can be fatal to the software process that is running.

If one of the force write-through settings is enabled, memory marked as write-back write-allocate behaves as write-through. This ensures that cache lines can never be dirty, so the processor can always recover from the error by invalidating the cache line that contains the ECC error.

You can recover from all detectable errors in the instruction cache, because the instruction cache can never contain dirty data.
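The ECC recovery rules described above can be summarized as a small decision model. This is a minimal sketch, not a description of the hardware implementation: `Ram`, `ECC_MODES`, and `ecc_recovery` are hypothetical names, the mode table assumes that the blank b001 row of Table 8.3 shares the b000 behavior, and `line_dirty` is taken as given. The text does not say how a 2-bit dirty RAM error on a line that was actually clean is treated, so the sketch follows the text literally.

```python
from enum import Enum


class Ram(Enum):
    TAG = "tag"
    DIRTY = "dirty"
    DATA = "data"


# ACTLR bits [5:3] -> (abort on correctable errors, force write-through),
# transcribed from Table 8.3. b100 disables checking; b011 and b111 are reserved.
ECC_MODES = {
    0b000: (True, False),
    0b001: (True, False),
    0b010: (True, True),
    0b101: (False, False),
    0b110: (False, True),
}


def ecc_recovery(mode: int, line_dirty: bool, error_bits: int, ram: Ram) -> dict:
    """Model the recovery action for a 1-bit or 2-bit ECC error."""
    if mode == 0b100:
        return {"detected": False, "action": "none", "abort": False}
    if mode not in ECC_MODES:
        raise ValueError("reserved ACTLR[5:3] encoding")
    if error_bits not in (1, 2):
        raise ValueError("model covers 1-bit and 2-bit errors only")
    abort_on_correctable, force_write_through = ECC_MODES[mode]
    # With force write-through, lines can never be dirty.
    dirty = line_dirty and not force_write_through
    if not dirty:
        return {"detected": True, "action": "invalidate and reload",
                "abort": abort_on_correctable}
    if error_bits == 1:
        return {"detected": True, "action": "correct, write back, reload",
                "abort": abort_on_correctable}
    # 2-bit error in a dirty line: uncorrectable, an abort is always generated.
    if ram is Ram.DATA:
        action = "write back with strobes LOW for the bad data"
    else:
        action = "no write back"
    return {"detected": True, "action": action, "abort": True}


print(ecc_recovery(0b101, True, 2, Ram.TAG))
```

The last call shows that an uncorrectable error raises an abort even when aborts on ECC errors are not enabled.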

ECC aborts

If aborts on ECC errors are enabled, software is notified of the error by a data abort or prefetch abort. The error is still automatically corrected by the hardware even if an abort is generated.

If abort generation is not enabled, the hardware recovery of correctable errors, including the access retry, is invisible to software. If required, software can use events and the Correctable Fault Location Register to monitor the errors that are detected and corrected. See Error detection events and Correctable Fault Location Register.

ECC errors caused by ACP coherency maintenance operations never generate aborts.

Errors on instruction cache read

All parity or ECC errors detected on instruction cache reads are correctable. If aborts are enabled, a synchronous prefetch abort exception occurs. The instruction FAR gives the address that caused the error to be detected. The instruction FSR indicates a parity error on a read. The auxiliary FSR indicates that the error was in the cache and which cache Way the error was in.

Errors on data cache read

If parity or ECC aborts are enabled, or an uncorrectable ECC error is detected, a synchronous data abort exception occurs. The data FAR gives the address that caused the error to be detected. The data FSR indicates a synchronous read parity error. The auxiliary FSR indicates that the error was in the cache and which cache Way the error was in.

Errors on data cache write

If parity or ECC aborts are enabled, or an uncorrectable ECC error is detected, an asynchronous data abort exception occurs. Because the abort is asynchronous, the data FAR is Unpredictable. The data FSR indicates an asynchronous write parity error. The auxiliary FSR indicates that the error was in the cache and which cache Way and Index the error was in.

In write-through cache regions the store that caused the error is written to external memory using the L2 memory interface so data is not lost and the error is not fatal.
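The three access types above differ in which abort is taken, whether it is synchronous, and what the fault registers report. The following sketch collects those differences in one function. `cache_error_abort` and its dictionary keys are hypothetical names chosen for illustration, and the function models only the reporting rules stated in this section.

```python
from typing import Optional


def cache_error_abort(access: str, aborts_enabled: bool,
                      uncorrectable: bool = False) -> Optional[dict]:
    """Return the abort that is reported for a cache error, or None if no abort."""
    if access not in ("instruction read", "data read", "data write"):
        raise ValueError("unknown access type: " + access)
    if access == "instruction read":
        # All instruction cache errors are correctable, so only the enable matters.
        if not aborts_enabled:
            return None
        return {"exception": "prefetch abort", "synchronous": True,
                "far_valid": True, "afsr_reports": ("cache", "way")}
    if not (aborts_enabled or uncorrectable):
        return None
    if access == "data read":
        return {"exception": "data abort", "synchronous": True,
                "far_valid": True, "afsr_reports": ("cache", "way")}
    # Data write: the abort is asynchronous, so the data FAR is Unpredictable.
    return {"exception": "data abort", "synchronous": False,
            "far_valid": False, "afsr_reports": ("cache", "way", "index")}


print(cache_error_abort("data write", aborts_enabled=False, uncorrectable=True))
```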

Errors on evictions

If the cache controller has determined that a cache miss has occurred, it might have to do an eviction before a linefill can take place. This can occur on reads, and on writes if write-allocation is enabled for the region. Certain cache maintenance operations also generate evictions. If the line being evicted is a dirty data cache line, an ECC error might be detected on it:

  • if the error is correctable, it is corrected inline before the data is written to the external memory using the L2 memory interface

  • if there is an uncorrectable error in the tag or dirty RAM, the write is not done and an asynchronous abort occurs

  • if there is an uncorrectable error in the data RAM, the AXI master port WSTRBMm signal is deasserted for the words with an error, and an asynchronous abort occurs.

An asynchronous abort can also occur on a correctable error depending on the Auxiliary Control Register bits [5:3], see c1, Auxiliary Control Register. Any detected error is signaled with the appropriate event.
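The eviction cases above can be sketched as a function that derives the write strobes for an evicted line. This is an illustrative model only. It assumes 4-byte words with one strobe bit per byte, and `eviction_write` and its parameters are hypothetical names. The line length used in the example is arbitrary.

```python
from typing import List


def eviction_write(tag_or_dirty_uncorrectable: bool,
                   word_uncorrectable: List[bool],
                   correctable_seen: bool = False,
                   abort_on_correctable: bool = False,
                   bytes_per_word: int = 4) -> dict:
    """Model the write-back of a dirty line that is being evicted."""
    if tag_or_dirty_uncorrectable:
        # Uncorrectable tag or dirty RAM error: the write is not done.
        return {"write_issued": False, "wstrb": [], "async_abort": True}
    all_bytes = (1 << bytes_per_word) - 1
    # Strobes are deasserted for every word with an uncorrectable data error.
    wstrb = [0 if bad else all_bytes for bad in word_uncorrectable]
    uncorrectable = any(word_uncorrectable)
    # A correctable error can also abort, depending on ACTLR bits [5:3].
    abort = uncorrectable or (correctable_seen and abort_on_correctable)
    return {"write_issued": True, "wstrb": wstrb, "async_abort": abort}


result = eviction_write(False, [False, True, False, False])
print([hex(s) for s in result["wstrb"]])  # ['0xf', '0x0', '0xf', '0xf']
```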

Note

When parity checking is enabled, force write-through is always enabled. Therefore the cache lines can never be dirty, and so evictions are not required. Force write-through can also be enabled with ECC checking.

Errors on cache maintenance operations

The following sections describe errors on cache maintenance operations:

Invalidate all instruction cache

This operation ignores all errors in the cache and sets all instruction cache entries to invalid regardless of errors. This operation cannot generate an asynchronous abort, and no error events are signaled.

Invalidate all data cache

This operation ignores all errors in the cache and sets all data cache entries to invalid regardless of errors. This operation cannot generate an asynchronous abort and no error events are signaled.

Invalidate instruction cache by address

This operation requires a cache lookup. Any errors found in the set that was looked up are fixed by invalidating that line and, if the address in question is found in the set, it is invalidated.

This operation cannot generate an asynchronous abort. Any detected error is signaled with the appropriate event.

Invalidate data cache by address

This operation requires a cache lookup. Any correctable errors found in the set that was looked up are fixed and, if the address in question is found in the set, it is invalidated.

Any uncorrectable errors cause an asynchronous abort. An asynchronous abort can also be raised on a correctable error if aborts on RAM errors are enabled in the Auxiliary Control Register.

Any detected error is signaled with the appropriate event.

Invalidate data cache by set/way

This operation does not require a cache lookup. It refers to a particular cache line.

The entry at the given set/way is marked as invalid regardless of any errors. This operation cannot generate an asynchronous abort. Any detected error is signaled with the appropriate event.

Clean data cache by address

This operation requires a cache lookup. Any correctable errors found in the set that was looked up are fixed and, if the address in question is found in the set, the instruction carries on with the clean operation. When the tag lookup is done, the dirty RAM is checked.

Note

When force write-through is enabled, the dirty bit is ignored.

If the tag or dirty RAM has an uncorrectable error, the data is not written to memory.

If the line is dirty, the data is written back to external memory. If the data has an uncorrectable error, the words with the error have their WSTRBMm AXI signal deasserted. If there is a correctable error, the line has the error corrected inline before it is written back to memory.

Any uncorrectable errors cause an asynchronous abort. An asynchronous abort can also be raised on a correctable error if aborts on RAM errors are enabled in the Auxiliary Control Register.

Any detected error is signaled with the appropriate event.

Clean data cache by set/way

This operation does not require a cache lookup. It refers to a particular cache line.

The tag and dirty RAMs for the cache line are checked.

Note

When force write-through is enabled, the dirty bit is ignored.

If the tag or dirty RAM has an uncorrectable error, the data is not written to memory.

If the line is dirty, the data is written back to external memory. If the data has an uncorrectable error, the words with the error have their WSTRBMm AXI signal deasserted. If there is a correctable error, the line has the error corrected inline before it is written back to memory.

Any uncorrectable errors found cause an asynchronous abort. An asynchronous abort can also be raised on a correctable error if aborts on RAM errors are enabled in the Auxiliary Control Register.

Any detected error is signaled with the appropriate event.

Clean and invalidate data cache by address

This operation requires a cache lookup. Any correctable errors found in the set that was looked up are fixed and, if the address in question is found in the set, the instruction carries on with the clean and invalidate operation. When the tag lookup is done, the dirty RAM is checked.

Note

When force write-through is enabled, the dirty bit is ignored.

If the tag or dirty RAM has an uncorrectable error, the data is not written to memory.

If the line is dirty, the data is written back to external memory. If the data has an uncorrectable error, the words with the error have their WSTRBMm AXI signal deasserted. If there is a correctable error, the line has the error corrected inline before it is written back to memory.

Any uncorrectable errors found cause an asynchronous abort. An asynchronous abort can also be raised on a correctable error if aborts on RAM errors are enabled in the Auxiliary Control Register.

Any detected error is signaled with the appropriate event.

Clean and invalidate data cache by set/way

This operation does not require a cache lookup. It refers to a particular cache line.

The tag and dirty RAMs for the cache line are checked.

Note

When force write-through is enabled, the dirty bit is ignored.

If the tag or dirty RAM has an uncorrectable error, the data is not written to memory.

If the line is dirty, the data is written back to external memory. If the data has an uncorrectable error, the words with the error have their WSTRBMm AXI signal deasserted. If there is a correctable error, the line has the error corrected inline before it is written back to memory.

Any uncorrectable errors found cause an asynchronous abort. An asynchronous abort can also be raised on a correctable error if aborts on RAM errors are enabled in the Auxiliary Control Register.

Any detected error is signaled with the appropriate event.
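The per-operation rules in the preceding subsections can be condensed into a lookup table. The following sketch is one way to do that. `OpErrorBehavior`, `MAINTENANCE_OPS`, and `raises_async_abort` are hypothetical names, and a `cache_lookup` value of `None` means that the text does not state whether the operation performs a lookup.

```python
from typing import NamedTuple, Optional


class OpErrorBehavior(NamedTuple):
    cache_lookup: Optional[bool]  # None: not stated in the text
    may_abort: bool               # can generate an asynchronous abort
    signals_events: bool          # detected errors are signaled with an event


MAINTENANCE_OPS = {
    "invalidate all instruction cache": OpErrorBehavior(None, False, False),
    "invalidate all data cache": OpErrorBehavior(None, False, False),
    "invalidate instruction cache by address": OpErrorBehavior(True, False, True),
    "invalidate data cache by address": OpErrorBehavior(True, True, True),
    "invalidate data cache by set/way": OpErrorBehavior(False, False, True),
    "clean data cache by address": OpErrorBehavior(True, True, True),
    "clean data cache by set/way": OpErrorBehavior(False, True, True),
    "clean and invalidate data cache by address": OpErrorBehavior(True, True, True),
    "clean and invalidate data cache by set/way": OpErrorBehavior(False, True, True),
}


def raises_async_abort(op: str, uncorrectable: bool, correctable: bool,
                       aborts_enabled: bool) -> bool:
    """True if the operation raises an asynchronous abort for the given errors."""
    behavior = MAINTENANCE_OPS[op]
    if not behavior.may_abort:
        return False
    # Uncorrectable errors always abort; correctable ones only if enabled.
    return uncorrectable or (correctable and aborts_enabled)


print(raises_async_abort("clean data cache by set/way", False, True, True))  # True
```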

Errors on ACP coherency maintenance operations

Coherency maintenance operations are issued to the data cache controller when the ACP processes coherent write transactions. See Accelerator Coherency Port interface for more information on the ACP.

These operations require data cache lookups. Any correctable errors found in the set that was looked up are fixed and, if the address is found in the set and not marked as dirty, it is invalidated.

Any detected error is signaled with the appropriate event.

Copyright © 2010-2011 ARM. All rights reserved. ARM DDI 0460C
Non-Confidential ID021511