7.2.1. Protection method

The following sections describe how RAM errors are managed:

Detecting errors

The Cortex-R7 MPCore processor uses ECC to detect errors in the RAMs. ECC can also correct errors, because the probability of a single error is much higher than the probability of multiple errors in the same ECC chunk. To keep multiple errors in one chunk unlikely, the RAMs are arranged so that physically contiguous positions do not belong to the same ECC chunk.
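The effect of this arrangement can be illustrated with a small model. The following is a minimal sketch, not the actual Cortex-R7 RAM layout: the number of chunks per row (NUM_CHUNKS), the chunk width (CHUNK_BITS), and the simple modulo interleaving are all assumptions chosen for illustration. It shows that a burst of adjacent physical bit flips places at most one flipped bit in each ECC chunk, provided the burst is no longer than the number of interleaved chunks.

```python
# Toy model of bit interleaving across ECC chunks.
# Assumption: NUM_CHUNKS chunks of CHUNK_BITS bits share one physical row,
# and column c belongs to chunk (c % NUM_CHUNKS). The real layout is
# implementation specific and is not described in this section.

NUM_CHUNKS = 4
CHUNK_BITS = 8


def physical_to_logical(column):
    """Map a physical column index to (chunk index, bit index within the chunk)."""
    return column % NUM_CHUNKS, column // NUM_CHUNKS


def chunks_hit_by_burst(start_column, burst_length):
    """Count how many flipped bits land in each chunk for a burst of adjacent columns."""
    hits = [0] * NUM_CHUNKS
    for column in range(start_column, start_column + burst_length):
        chunk, _bit = physical_to_logical(column)
        hits[chunk] += 1
    return hits


if __name__ == "__main__":
    # Four adjacent columns are upset: each chunk sees a single flipped bit.
    print(chunks_hit_by_burst(5, 4))
```

In this model, an upset covering up to NUM_CHUNKS adjacent columns never produces more than one error in the same chunk, so each affected chunk stays within the single-error case.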

Correcting errors

The Cortex-R7 MPCore processor implements RAM error correction using a clean and invalidate, then retry, mechanism for caches, and a correct, writeback, and retry mechanism for TCMs. When a correctable error is detected, as shown in Table 7.1, the corresponding index/way is cleaned and invalidated. When the clean and invalidate operation has completed, the requester retries its access.

Note

The detection of multiple-bit errors is not synchronous. Therefore, by the time such an error is reported, the corrupted data might not have been contained. Contact ARM for more details about multiple-bit ECC errors.

Instruction side

On the instruction side, lines are always clean, so invalidating the line is sufficient. The retried access then fetches the correct value from the upper level memory.

Data side

On the data side, the cache line can be dirty. The correction of the read contents is done as part of the clean and invalidate operation for caches. This takes place in the eviction buffer and in the cache coherency block. For TCMs, correction of the read contents is done with a correct and writeback operation.
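The cache behavior described for the instruction side and the data side can be sketched as a behavioral model. This is a minimal sketch under stated assumptions, not a description of the hardware: the class ToyCache, its methods, and the way the ECC logic is abstracted (the model simply records which bit was flipped, standing in for the syndrome that locates a correctable error) are all hypothetical. A clean line is only invalidated and refetched, while a dirty line is corrected as it is written back, before the access is retried.

```python
class ToyCache:
    """Toy write-back cache in front of a dict that stands in for upper level memory."""

    def __init__(self, memory):
        self.memory = memory
        # address -> [value, dirty, flipped_bit]; flipped_bit is None when the
        # line is error free. Recording the bit is an abstraction of ECC decode.
        self.lines = {}

    def write(self, address, value):
        """Write into the cache only, so the line becomes dirty."""
        self.lines[address] = [value, True, None]

    def inject_single_bit_error(self, address, bit):
        """Flip one stored bit to emulate a correctable RAM error."""
        line = self.lines[address]
        line[0] ^= 1 << bit
        line[2] = bit

    def read(self, address):
        """Return (value, number of retries needed)."""
        retries = 0
        while True:
            if address not in self.lines:
                # Miss: fill a clean line from upper level memory.
                self.lines[address] = [self.memory[address], False, None]
            value, dirty, flipped_bit = self.lines[address]
            if flipped_bit is None:
                return value, retries
            # Correctable error: clean (correcting the data on the way out)
            # and invalidate the line, then let the requester retry.
            if dirty:
                self.memory[address] = value ^ (1 << flipped_bit)
            del self.lines[address]
            retries += 1


if __name__ == "__main__":
    memory = {0x100: 0xAA}
    cache = ToyCache(memory)
    cache.write(0x100, 0x77)
    cache.inject_single_bit_error(0x100, 0)
    print(cache.read(0x100), hex(memory[0x100]))
```

The model deliberately uses one code path for both sides: because instruction lines are never dirty, the writeback step is skipped for them and only the invalidate and refetch remain.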

SCU

The detection of an error in the duplicate of the tags of a processor causes a clean and invalidate in the corresponding processor tag RAM. When the clean and invalidate is done, the line in the SCU tag RAM is marked as unusable.

Handling permanent errors

Permanent errors are handled as follows:

General behavior

If hard, or permanent, errors occur on the RAMs, the clean/invalidate and retry scheme might cause a deadlock in which the access is continuously replayed. To prevent this, error bank registers are provided to mask the faulty locations as unusable and invalid. When an error is detected, the location is pushed into the error bank, which masks the valid bit of that location both when reading and when allocating a new line. The line is therefore no longer used unless the entry is reset by a CP15 access. There is a short period of time during which the line is still seen by the system, but is removed from the allocation pool.
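The masking role of the error bank can be sketched as follows. This is a minimal illustrative model, assuming that a faulty location is identified by an (index, way) pair; the class ErrorBank, the helper pick_way_for_allocation, and the clear method (which stands in for the CP15 access that resets an entry) are hypothetical names, and the real register layout is not described in this section.

```python
class ErrorBank:
    """Toy error bank that masks faulty (index, way) locations."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = []  # recorded (index, way) pairs

    def push(self, index, way):
        """Record a faulty location. Return False if the bank has no free entry."""
        if (index, way) in self.entries:
            return True
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((index, way))
        return True

    def is_masked(self, index, way):
        """A masked location reads as invalid and is never allocated."""
        return (index, way) in self.entries

    def clear(self):
        """Hypothetical stand-in for the CP15 access that resets the entries."""
        self.entries.clear()


def pick_way_for_allocation(bank, index, num_ways):
    """Return the first way at this index that is not masked, or None."""
    for way in range(num_ways):
        if not bank.is_masked(index, way):
            return way
    return None


if __name__ == "__main__":
    bank = ErrorBank(depth=2)
    bank.push(index=3, way=0)
    # Way 0 of index 3 is skipped, so allocation falls through to way 1.
    print(pick_way_for_allocation(bank, index=3, num_ways=4))
```

Because the masked location is never hit and never allocated, a retried access cannot land on the faulty RAM cells again, which is what breaks the replay loop in this model.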

The depth of the error bank determines how many errors can be supported by the system. When this limit is reached, the system might livelock. So that the error bank status can be monitored before the bank becomes full, a condition that can cause a deadlock, the processor provides a special ECC event indicating the number of corrupted locations. This information is also reported on several pins that signal the usage of the error bank, that is, whether the error bank is empty or at least one error has been encountered. See Error detection notification signals.
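Monitoring software might turn the reported number of corrupted locations into a simple status summary. The following is a minimal sketch under stated assumptions: the function name error_bank_status, its parameters, and the returned field names are hypothetical, and how the count and the bank depth are obtained on a real system is outside the scope of this example.

```python
def error_bank_status(used_entries, depth):
    """Summarize error bank usage from a corrupted-location count.

    used_entries and depth are assumed inputs; this section does not
    specify how software reads them.
    """
    if depth <= 0 or not 0 <= used_entries <= depth:
        raise ValueError("used_entries must be between 0 and depth")
    return {
        "empty": used_entries == 0,
        "at_least_one_error": used_entries > 0,
        "full": used_entries == depth,
        "free_entries": depth - used_entries,
    }


if __name__ == "__main__":
    print(error_bank_status(used_entries=1, depth=2))
```

A monitor built on such a summary could raise an alert while free_entries is still greater than zero, before the bank fills.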

Interaction between SCU and processors

For a processor error, the line is cleaned and invalidated and the ECC error bank prevents any future allocation in this way. However, the line is still seen as present by the SCU, so the SCU requests the line from the processor, which misses or hits depending on whether the line has been reallocated in another cache location.

For an SCU error, the line is marked as unusable by the SCU error bank but the processor still sees the line as usable. Therefore, a processor can request an access to this way to allocate the cache line, but the write fails in the SCU without being reported. Because of this, the error seen by the SCU is sent back to the processor, and stored in the processor data error bank.

Reporting errors

The Cortex-R7 MPCore processor reports the detection of any error using primary output events and by updating performance and statistics counters.

Copyright © 2012, 2014 ARM. All rights reserved. ARM DDI 0458C
Non-Confidential ID112814