7.2.5 Register slice support for large cache sizes

As the L2 cache size increases, so does the area of the implementation. The larger area adds significant route delays to and from the RAMs, which can reduce the maximum frequency of the implementation.

To counter this, you can insert register slices before and after the RAMs to offset the longer route delays. This enables the frequency target of the implementation to remain high. Additional slices can increase the overall L2 hit latency, but they allow requests to be streamed more efficiently. Alternatively, you can increase the programmed latency values of the RAMs to cover the additional route delays without adding slices. However, this method reduces performance because requests cannot be streamed as efficiently.
The L2 RAMs support one inserted register slice. Each register slice introduces a pair of registers, one before the RAM and one after the RAM.
Bits[12] and [10] of the CP15 L2 Control Register, L2CTLR, indicate the presence of RAM register slices in the design. In addition, the L2CTLR contains bits to program the setup and latency for the L2 Tag and Data RAMs.
Related information
4.3.58 L2 Control Register, EL1

Overall RAM latency calculation

The RAM latency is a function of the following:
  • Programmed latency in the L2 Control Register.
  • Additional strobe clock setup required value in the L2CTLR.
RAM latency = programmed value + strobe setup.
The RAM latency determines the rate at which back-to-back operations to the RAM can be scheduled.
The total effective latency = RAM latency + 2×N, where N is the number of register slices to insert.
The slices are considered pipeline registers and do not affect the throughput rate of RAM accesses.
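The two formulas above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the inputs are cycle counts rather than raw L2CTLR field encodings, and the maximum values that the tables in this section apply to the total effective latency are not modelled.

```python
def ram_latency(programmed_cycles: int, strobe_setup: int) -> int:
    """RAM latency = programmed value + strobe setup, in cycles.

    This value sets the rate at which back-to-back RAM operations
    can be scheduled.
    """
    return programmed_cycles + strobe_setup


def total_effective_latency(programmed_cycles: int, strobe_setup: int,
                            num_slices: int) -> int:
    """Total effective latency = RAM latency + 2 x N, before any maximum.

    Each register slice adds one register before and one after the RAM,
    so N slices add 2 x N cycles without changing the throughput rate.
    """
    return ram_latency(programmed_cycles, strobe_setup) + 2 * num_slices
```

For instance, a programmed value of 3 cycles with one setup cycle gives a RAM latency of 4 cycles whether or not a slice is present; only the total effective latency changes when a slice is inserted.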
The following table shows the total effective L2 Tag latency with the register slice and setup factored in.

Table 7-1 Total effective L2 Tag latency with slice and setup factored in

L2CTLR[8:6]      Total effective Tag latency
register bits    Tag slice = 0    Tag slice = 0    Tag slice = 1    Tag slice = 1
                 Tag setup = 0    Tag setup = 1    Tag setup = 0    Tag setup = 1
000 (a)          2                3                4                5
001              2                3                4                5
010              3                4                5                5
011              4                5                5                5
100              5                5                5                5
1xx, ≥ 4         5                5                5                5

Note

  • The total effective L2 Tag latency is set to a maximum of 5 cycles.
  • Each tag slice adds 2 cycles and affects the L2 Tag, Snoop Tag, Dirty, and Inclusion PLRU RAMs.
  • Setting tag setup to 1 adds 1 cycle.
  • Slice and setup have priority over programmed latency in determining the total effective L2 Tag latency.
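The rules in this note, together with Table 7-1, can be expressed as a small Python function. This is a sketch under stated assumptions rather than a description of the hardware: the function and constant names are hypothetical, and the mapping from the L2CTLR[8:6] encoding to programmed cycles is inferred from the slice = 0, setup = 0 column of the table.

```python
TAG_MAX_CYCLES = 5  # maximum total effective L2 Tag latency (from the note)


def tag_programmed_cycles(field: int) -> int:
    """Map the 3-bit L2CTLR[8:6] encoding to programmed cycles.

    Inferred from Table 7-1: 0b000 and 0b001 give 2 cycles, 0b010 gives 3,
    0b011 gives 4, and 0b100 or above gives 5.
    """
    if not 0 <= field <= 0b111:
        raise ValueError("L2CTLR[8:6] is a 3-bit field")
    return min(max(field + 1, 2), TAG_MAX_CYCLES)


def total_tag_latency(field: int, tag_slice: int, tag_setup: int) -> int:
    """Total effective L2 Tag latency in cycles, capped at the maximum."""
    uncapped = tag_programmed_cycles(field) + tag_setup + 2 * tag_slice
    return min(uncapped, TAG_MAX_CYCLES)
```

Because the cap is applied to the sum, the slice and setup cycles are always counted in full and the programmed latency is what gets reduced, which is one way to read "slice and setup have priority".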
The following example shows a Tag RAM access with 3 cycles total effective Tag latency.

Example 7-1 Tag RAM access with 3 cycles total latency

When tag slice = 0, L2CTLR[9] = 0, L2CTLR[8:6] = 0b010, the following applies:
  • No slice cycle.
  • No setup cycle.
  • 3 cycles Tag RAM access.
  • 3 cycles total effective Tag latency.
The following example shows a Tag RAM access with 4 cycles total effective Tag latency.

Example 7-2 Tag RAM access with 4 cycles total latency

When tag slice = 0, L2CTLR[9] = 1, L2CTLR[8:6] = 0b010, the following applies:
  • No slice cycle.
  • 1 setup cycle.
  • 4 cycles Tag RAM access (Programmed + Setup).
  • 4 cycles total effective Tag latency.
The following example shows a Tag RAM access with 5 cycles total effective Tag latency.

Example 7-3 Tag RAM access with 5 cycles total latency

When tag slice = 1, L2CTLR[9] = 1, L2CTLR[8:6] = 0b010, the following applies:
  • 2 slice cycles.
  • 1 setup cycle.
  • 3 cycles Tag RAM access (Programmed + Setup, capped to 3 because of slices).
  • 5 cycles total effective Tag latency.
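The cycle breakdown used in the three Tag RAM examples can be reproduced from a raw L2CTLR value with a short Python sketch. The names `TagLatency` and `tag_latency_breakdown` are hypothetical, the number of tag slices is passed in as a parameter instead of being read from the register, and the encoding-to-cycles mapping is an assumption inferred from Table 7-1.

```python
from typing import NamedTuple


class TagLatency(NamedTuple):
    slice_cycles: int
    setup_cycles: int
    ram_access_cycles: int
    total_cycles: int


def tag_latency_breakdown(l2ctlr: int, tag_slice: int) -> TagLatency:
    """Split the total effective L2 Tag latency into its components."""
    setup = (l2ctlr >> 9) & 0x1          # L2CTLR[9]: tag setup
    field = (l2ctlr >> 6) & 0x7          # L2CTLR[8:6]: programmed tag latency
    programmed = min(max(field + 1, 2), 5)   # assumed mapping, see Table 7-1
    slice_cycles = 2 * tag_slice
    # Programmed + Setup, capped so that the total never exceeds 5 cycles.
    ram_access = min(programmed + setup, 5 - slice_cycles)
    return TagLatency(slice_cycles, setup, ram_access, slice_cycles + ram_access)
```

With L2CTLR[9] = 1, L2CTLR[8:6] = 0b010, and one tag slice, `tag_latency_breakdown((1 << 9) | (0b010 << 6), 1)` returns 2 slice cycles, 1 setup cycle, 3 RAM access cycles, and a total of 5 cycles.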
The following table shows the total effective L2 Data latency with the register slice and setup factored in.

Table 7-2 Total effective L2 Data latency with slice and setup factored in

L2CTLR[2:0]      Total effective Data latency
register bits    Data slice = 0   Data slice = 0   Data slice = 1   Data slice = 1   Data slice = 2   Data slice = 2
                 Data setup = 0   Data setup = 1   Data setup = 0   Data setup = 1   Data setup = 0   Data setup = 1
000 (b)          2                3                4                5                6                7
001              2                3                4                5                6                7
010              3                4                5                6                7                8
011              4                5                6                6                8                8
100              5                6                6                6                8                8
101, 11x, ≥ 5    6                6                6                6                8                8

Note

  • The total effective L2 Data latency is set to a maximum of 8 cycles for configurations supporting Data slice=2, otherwise the maximum is set to 6 cycles.
  • Each data slice adds 2 cycles and affects the L2 data and data ECC RAMs.
  • Setting data setup to 1 adds 1 cycle.
  • Slice and setup have priority over programmed latency in determining the total effective L2 Data latency.
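Table 7-2 and the rules in this note can be captured in a few lines of Python. The sketch below is illustrative: the function names are hypothetical, the encoding-to-cycles mapping is inferred from the slice = 0, setup = 0 column of the table, and the choice of maximum is assumed to depend only on whether two data slices are present.

```python
def data_programmed_cycles(field: int) -> int:
    """Map the 3-bit L2CTLR[2:0] encoding to programmed cycles.

    Inferred from Table 7-2: 0b000 and 0b001 give 2 cycles, 0b010 gives 3,
    0b011 gives 4, 0b100 gives 5, and 0b101 or above gives 6.
    """
    if not 0 <= field <= 0b111:
        raise ValueError("L2CTLR[2:0] is a 3-bit field")
    return min(max(field + 1, 2), 6)


def total_data_latency(field: int, data_slice: int, data_setup: int) -> int:
    """Total effective L2 Data latency in cycles, capped at the maximum."""
    # Assumption: the 8-cycle maximum applies when Data slice = 2,
    # otherwise the maximum is 6 cycles.
    max_cycles = 8 if data_slice == 2 else 6
    uncapped = data_programmed_cycles(field) + data_setup + 2 * data_slice
    return min(uncapped, max_cycles)
```

Iterating `field` over 0 to 7, `data_slice` over 0 to 2, and `data_setup` over 0 and 1 regenerates every cell of the table.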
The following example shows a Data RAM access with 4 cycles total effective Data latency.

Example 7-4 Data RAM access with 4 cycles total latency

When data slice = 0, L2CTLR[5] = 0, L2CTLR[2:0] = 0b011, the following applies:
  • No slice cycle.
  • No setup cycle.
  • 4 cycles Data RAM access.
  • 4 cycles total effective Data latency.
The following example shows a Data RAM access with 5 cycles total effective Data latency.

Example 7-5 Data RAM access with 5 cycles total latency

When data slice = 0, L2CTLR[5] = 1, L2CTLR[2:0] = 0b011, the following applies:
  • No slice cycle.
  • 1 setup cycle.
  • 5 cycles Data RAM access (Programmed + Setup).
  • 5 cycles total effective Data latency.
The following example shows a Data RAM access with 6 cycles total effective Data latency.

Example 7-6 Data RAM access with 6 cycles total latency

When data slice = 1, L2CTLR[5] = 1, L2CTLR[2:0] = 0b011, the following applies:
  • 2 slice cycles.
  • 1 setup cycle.
  • 4 cycles Data RAM access (Programmed + Setup, capped to 4 due to register slices).
  • 6 cycles total effective Data latency.
The following example shows a Data RAM access with 8 cycles total effective Data latency.

Example 7-7 Data RAM access with 8 cycles total latency

When data slice = 2, L2CTLR[5] = 1, L2CTLR[2:0] = 0b011, the following applies:
  • 4 slice cycles.
  • 1 setup cycle.
  • 4 cycles Data RAM access (Programmed + Setup, capped to 4 due to register slices).
  • 8 cycles total effective Data latency.
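The breakdown given in the four Data RAM examples can be derived from a raw L2CTLR value as sketched below in Python. The names `DataLatency` and `data_latency_breakdown` are hypothetical, the number of data slices is supplied as a parameter instead of being read from the register, and both the encoding-to-cycles mapping and the rule for selecting the 6-cycle or 8-cycle maximum are assumptions inferred from Table 7-2.

```python
from typing import NamedTuple


class DataLatency(NamedTuple):
    slice_cycles: int
    setup_cycles: int
    ram_access_cycles: int
    total_cycles: int


def data_latency_breakdown(l2ctlr: int, data_slice: int) -> DataLatency:
    """Split the total effective L2 Data latency into its components."""
    setup = (l2ctlr >> 5) & 0x1          # L2CTLR[5]: data setup
    field = l2ctlr & 0x7                 # L2CTLR[2:0]: programmed data latency
    programmed = min(max(field + 1, 2), 6)   # assumed mapping, see Table 7-2
    max_cycles = 8 if data_slice == 2 else 6  # assumed selection of the maximum
    slice_cycles = 2 * data_slice
    # Programmed + Setup, capped so that the total never exceeds the maximum.
    ram_access = min(programmed + setup, max_cycles - slice_cycles)
    return DataLatency(slice_cycles, setup, ram_access, slice_cycles + ram_access)
```

With L2CTLR[5] = 1 and L2CTLR[2:0] = 0b011, the call `data_latency_breakdown((1 << 5) | 0b011, 2)` returns 4 slice cycles, 1 setup cycle, 4 RAM access cycles, and a total of 8 cycles.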
a This is the reset value.
b This is the reset value.
Non-Confidential ARM 100095_0002_04_en
Copyright © 2014-2016 ARM. All rights reserved.