Non-Confidential | ARM 100400_0001_03_en
The methods used by the processor to detect, correct, and report RAM errors.
The Cortex®‑R8 processor uses ECC to detect RAM data errors. ECC can also correct a single-bit error in a chunk, where a chunk is typically one or two words of data. However, ECC cannot correct two or more bit errors in the same chunk.
The Cortex®‑R8 processor implements RAM error correction using a clean and invalidate, and retry mechanism for caches, and a correct, writeback, and retry mechanism for TCMs.
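The difference between a correctable single-bit error and an uncorrectable multi-bit error can be illustrated with a small single-error-correct, double-error-detect code. The following is a minimal sketch only: it uses an extended Hamming(8,4) code over a 4-bit chunk, which is an assumption chosen for brevity and is not the chunk width or code layout used by the processor. The function names `encode` and `decode` are hypothetical.

```python
# Illustrative SEC-DED code over a 4-bit chunk (extended Hamming(8,4)).
# Chunk width and bit layout are assumptions for this sketch, not the
# processor's actual ECC scheme.

def encode(nibble):
    """Return an 8-bit codeword as a list of bits (index 0 = overall parity)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits sit at positions 3, 5, 6, 7; check bits at 1, 2, 4.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole codeword even
    return code

def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # An odd number of flips is treated as a single-bit error;
        # the syndrome is its position (0 means the overall parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

Flipping any one bit of a codeword returned by `encode` makes `decode` report `"corrected"` with the original value, while flipping two bits makes it report `"uncorrectable"`.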
When a correctable error is detected, the corresponding index/way is cleaned and invalidated. When the clean and invalidate operation is completed, the requester retries its access.
Bank registers are used to mask faulty RAM locations if a hard error occurs. If a processor error occurs, the line is cleaned and invalidated and the ECC error bank prevents any future allocation. For an SCU error, the line is marked as unusable by the SCU error bank but the processor still sees the line as usable.
Permanent errors are handled as follows:
If hard, or permanent, errors occur on the RAMs, the clean and invalidate, and retry scheme might cause a deadlock in which the access is continuously replayed. To prevent this, error bank registers are provided to mask the faulty locations as unusable and invalid. When an error is detected, the location is pushed into the error bank, which masks the corresponding valid bit of the location both when reading and when allocating a new line. The line is therefore no longer used unless the entry is reset by a CP15 access. There is a short period during which the line is still seen by the system, but is removed from the allocation pool.
The depth of the error bank determines how many errors can be supported by the system. When this limit is reached, the system might deadlock. So that the error bank status can be monitored before the bank becomes full, the processor provides a special ECC event that indicates the number of corrupted locations. This information is reported on several pins that signal the usage of the error bank, that is, whether the error bank is empty or at least one error has been encountered.
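The effect of the error bank on the retry scheme can be sketched with a small behavioral model. This is an illustration under stated assumptions, not a description of the hardware: the class `CacheSet`, its fields, the first-free-way allocation policy, and the bounded replay count are all hypothetical, and a hard error is modeled simply as a way that always fails when accessed.

```python
class CacheSet:
    """Hypothetical model of one cache set with optional error bank masking."""

    def __init__(self, num_ways, faulty_ways, use_error_bank):
        self.valid = [False] * num_ways
        self.faulty = set(faulty_ways)    # ways with a hard (permanent) error
        self.use_error_bank = use_error_bank
        self.error_bank = set()           # ways masked as unusable and invalid

    def _lookup(self):
        # The error bank masks the valid bit when reading.
        for way, is_valid in enumerate(self.valid):
            if is_valid and way not in self.error_bank:
                return way
        return None

    def _allocate(self):
        # The error bank also masks the way when allocating a new line.
        for way in range(len(self.valid)):
            if way not in self.error_bank and not self.valid[way]:
                self.valid[way] = True
                return way
        raise RuntimeError("no usable way left in this set")

    def access(self, max_replays=8):
        """Return the number of replays needed, or None if the access
        is still failing after max_replays (stands in for a deadlock)."""
        for replay in range(max_replays + 1):
            way = self._lookup()
            if way is None:
                way = self._allocate()
            if way not in self.faulty:
                return replay
            # Error detected: clean and invalidate the way, then retry.
            self.valid[way] = False
            if self.use_error_bank:
                self.error_bank.add(way)
        return None
```

In this model, `CacheSet(4, {0}, use_error_bank=False).access()` keeps reallocating the faulty way and returns `None`, whereas the same set with `use_error_bank=True` succeeds after one replay because way 0 has been removed from the allocation pool.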
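The relationship between error bank depth and the reported status can be sketched as follows. This is a hypothetical model: the class `ErrorBank`, the `status` dictionary, and in particular the `full` field are assumptions made for illustration. The text above only states that the number of corrupted locations is indicated and that the pins show whether the bank is empty or has recorded at least one error.

```python
class ErrorBank:
    """Hypothetical fixed-depth error bank with a software-visible status."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = []

    def push(self, location):
        """Record a faulty location. Return False when the bank is full,
        meaning the location cannot be masked and replays may never succeed."""
        if location in self.entries:
            return True
        if len(self.entries) >= self.depth:
            return False
        self.entries.append(location)
        return True

    def status(self):
        return {
            "empty": not self.entries,     # no error encountered yet
            "count": len(self.entries),    # number of corrupted locations
            "full": len(self.entries) >= self.depth,  # assumed derived flag
        }
```

Monitoring software could, for example, poll `status()["count"]` and act before it reaches the bank depth, rather than waiting for `push` to fail.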
The Cortex‑R8 processor is robust to hard errors, but might require software intervention. When a single-bit error occurs in the TCM, the corrected data is written to the error bank and then written back to the TCM. The access is then replayed using the error bank data.
If a second single-bit error occurs in the TCM, the error bank is not written to, but corrected data is still written back to the TCM. This allows errors to be isolated. Isolating a hard error prevents its RAM location becoming a double-bit error that is not correctable.
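The TCM behavior described above can be sketched as a behavioral model. The sketch makes several assumptions that are not taken from the source: the class `Tcm` and its members are hypothetical, the error bank holds a single entry, ECC correction is abstracted as a lookup of the known-good value, and any error arriving while the bank is occupied is treated as a soft error that the writeback repairs.

```python
class Tcm:
    """Hypothetical model of correct, writeback, and retry for a TCM."""

    def __init__(self, data, hard_faults=()):
        self.golden = dict(data)          # stands in for what ECC can reconstruct
        self.soft_errors = set()          # addresses holding a transient bit flip
        self.hard_faults = set(hard_faults)
        self.error_bank = None            # assumed single entry: (address, data)
        self.writebacks = 0

    def inject_soft_error(self, addr):
        self.soft_errors.add(addr)

    def read(self, addr):
        # A location isolated in the error bank is served from the bank.
        if self.error_bank is not None and self.error_bank[0] == addr:
            return self.error_bank[1]
        if addr in self.soft_errors or addr in self.hard_faults:
            corrected = self.golden[addr]     # single-bit error corrected
            if self.error_bank is None:
                # First error: corrected data goes to the error bank.
                self.error_bank = (addr, corrected)
            # Corrected data is always written back to the TCM.
            self.soft_errors.discard(addr)
            self.writebacks += 1
            # Replay: from the bank if it holds this address, otherwise
            # from the TCM, which now holds the corrected data.
            if self.error_bank[0] == addr:
                return self.error_bank[1]
            return corrected
        return self.golden[addr]
```

In this model, a first error at a hard-faulted address fills the error bank, and later reads of that address are served from the bank without touching the faulty RAM location. A later error at a different address triggers a writeback but leaves the bank entry unchanged.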
Because subsequent errors do not overwrite the error bank, the replayed access uses the corrected data from the TCM. However, for soft or hard errors:
For a core error, the line is cleaned and invalidated, and the ECC error bank prevents any future allocation in this way. However, the line is still seen as present by the SCU, so the SCU requests the line from the core, which misses or hits depending on whether the line has been reallocated in another cache location.
For an SCU error, the line is marked as unusable by the SCU error bank but the core still sees the line as usable. Therefore, a core can request an access to this way to allocate the cache line, but the write fails in the SCU without being reported. Because of this, the error seen by the SCU is sent back to the core, and stored in the core data error bank.
The Cortex®‑R8 processor signals the detection of any error through primary output events and by updating performance and statistics counters.