7.2.5. Register slice support for large cache sizes

As the L2 cache size increases, so does the area of the implementation. The larger area adds significant route delays to and from the RAMs, which can reduce the maximum frequency of the implementation. To counter this, you can insert register slices before and after the RAMs to offset the longer route delays. This enables the frequency target of the implementation to remain high. Additional slices can impact the overall L2 hit latency, but they enable requests to be streamed more efficiently.

Alternatively, you can increase the programmed latency values of the RAMs to cover the additional route delays without adding slices. However, this method reduces performance because requests cannot be streamed as efficiently.

The L2 Data RAMs support up to two inserted register slices, whereas all other L2 RAMs support only one inserted register slice. Each register slice introduces a pair of registers, one before the RAM and one after the RAM.

Bits[12:10] of the CP15 L2 Control Register, L2CTLR, indicate the number of RAM register slices in the design. In addition, the L2CTLR contains bits to program the setup and latency for the L2 Tag and Data RAMs. See L2 Control Register, EL1 for more information.
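The following is a minimal sketch of how software might extract these fields from a raw L2CTLR value. The setup and latency bit positions follow the examples later in this section. The split of bits[12:10] into a one-bit tag slice field and a two-bit data slice field is an assumption that this section does not state, so check it against the L2CTLR description. The function name and dictionary keys are hypothetical.

```python
def decode_l2ctlr_ram_fields(l2ctlr: int) -> dict:
    """Extract the RAM slice, setup, and latency fields from an L2CTLR value."""
    return {
        # Assumption: bit [12] is the tag slice count, bits [11:10] the data slice count.
        "tag_slice": (l2ctlr >> 12) & 0x1,
        "data_slice": (l2ctlr >> 10) & 0x3,
        # Positions below are the ones used by the examples in this section.
        "tag_setup": (l2ctlr >> 9) & 0x1,
        "tag_latency": (l2ctlr >> 6) & 0x7,
        "data_setup": (l2ctlr >> 5) & 0x1,
        "data_latency": l2ctlr & 0x7,
    }


if __name__ == "__main__":
    # Illustrative value only: one tag slice, tag setup set, tag latency 0b010.
    raw = (1 << 12) | (1 << 9) | (0b010 << 6)
    print(decode_l2ctlr_ram_fields(raw))
```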

Overall RAM latency calculation

The RAM latency is a function of the following:

  • Programmed latency in the L2 Control Register, L2CTLR, see L2 Control Register, EL1.

  • Additional strobe clock setup value programmed in the L2CTLR.

  • Number of slices added.

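RAM latency = programmed value + strobe setup + 2×N, where N is the number of register slices inserted. The total is limited to the maximum latency that each RAM supports, as the notes for Table 7.1 and Table 7.2 state.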

Table 7.1 shows the adjusted L2 Tag RAM latency with the register slice and setup factored in.

Table 7.1. L2 Tag RAM latency with slice and setup factored in

L2CTLR[8:6]     Total adjusted Tag RAM latency (cycles)
register bits   Tag slice = 0   Tag slice = 0   Tag slice = 1   Tag slice = 1
                Tag setup = 0   Tag setup = 1   Tag setup = 0   Tag setup = 1

000[a]          2               3               4               5
001             2               3               4               5
010             3               4               5               5
011             4               5               5               5
100             5               5               5               5
1xx, ≥ 4        5               5               5               5

[a] This is the reset value.


Note

  • The L2 Tag RAM total latency is limited to a maximum of 5 cycles.

  • Each tag slice adds 2 cycles and affects the L2 Tag, Snoop Tag, Dirty, Inclusion PF, and prefetch stride queue RAMs.

  • Setting tag setup to 1 adds 1 cycle.

  • Slice and setup have priority over programmed latency in determining the total adjusted RAM latency.
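The rules in this note can be sketched as a small function that reproduces the rows of Table 7.1. This is an illustration, not a description of the hardware logic. The mapping from the L2CTLR[8:6] encoding to a programmed cycle count (encoding + 1, with a minimum of 2) is inferred from the table, and the function and constant names are hypothetical.

```python
TAG_MAX_LATENCY = 5  # maximum total Tag RAM latency in cycles, from the note


def tag_ram_latency(latency_field: int, setup: int, slices: int) -> int:
    """Total adjusted L2 Tag RAM latency in cycles for the given settings."""
    # Inferred from Table 7.1: 0b000 and 0b001 both select 2 cycles, and
    # higher encodings select (encoding + 1) cycles up to the 5-cycle maximum.
    programmed = min(max(latency_field + 1, 2), TAG_MAX_LATENCY)
    # Each slice adds 2 cycles, setup adds 1 cycle, and the total is capped.
    return min(programmed + setup + 2 * slices, TAG_MAX_LATENCY)


if __name__ == "__main__":
    # Columns follow the table order: (slice, setup) = (0,0), (0,1), (1,0), (1,1).
    for field in range(8):
        row = [tag_ram_latency(field, setup, slices)
               for slices in (0, 1) for setup in (0, 1)]
        print(f"{field:03b}", row)
```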

Example 7.1 shows a Tag RAM access with 3 cycles total RAM latency.

Example 7.1. Tag RAM access with 3 cycles total latency

When tag slice = 0, L2CTLR[9] = 0, L2CTLR[8:6] = 0b010, the following applies:

  • No slice cycle.

  • No setup cycle.

  • 3 cycles Tag RAM access.

  • 3 cycles total Tag RAM latency.


Example 7.2 shows a Tag RAM access with 5 cycles total RAM latency.

Example 7.2. Tag RAM access with 5 cycles total latency

When tag slice = 1, L2CTLR[9] = 1, L2CTLR[8:6] = 0b010, the following applies:

  • 2 slice cycles.

  • 1 setup cycle.

  • 2 cycles Tag RAM access, reduced from the programmed 3 cycles because the slice and setup cycles take priority within the 5-cycle maximum.

  • 5 cycles total Tag RAM latency.


Table 7.2 shows the adjusted L2 Data RAM latency with the register slice and setup factored in.

Table 7.2. L2 Data RAM latency with slice and setup factored in

L2CTLR[2:0]     Total adjusted Data RAM latency (cycles)
register bits   Data slice = 0   Data slice = 0   Data slice = 1   Data slice = 1   Data slice = 2   Data slice = 2
                Data setup = 0   Data setup = 1   Data setup = 0   Data setup = 1   Data setup = 0   Data setup = 1

000[a]          2                3                4                5                6                7
001             2                3                4                5                6                7
010             3                4                5                6                7                8
011             4                5                6                7                8                8
100             5                6                7                8                8                8
101             6                7                8                8                8                8
110             7                8                8                8                8                8
111             8                8                8                8                8                8

[a] This is the reset value.


Note

  • The L2 Data RAM total latency is limited to a maximum of 8 cycles.

  • Each data slice adds 2 cycles and affects the L2 data and data ECC RAMs.

  • Setting data setup to 1 adds 1 cycle.

  • Slice and setup have priority over programmed latency in determining the total adjusted RAM latency.

Example 7.3 shows a Data RAM access with 4 cycles total RAM latency.

Example 7.3. Data RAM access with 4 cycles total latency

When data slice = 0, L2CTLR[5] = 0, L2CTLR[2:0] = 0b011, the following applies:

  • No slice cycle.

  • No setup cycle.

  • 4 cycles Data RAM access.

  • 4 cycles total Data RAM latency.


Example 7.4 shows a Data RAM access with 8 cycles total RAM latency.

Example 7.4. Data RAM access with 8 cycles total latency

When data slice = 2, L2CTLR[5] = 1, L2CTLR[2:0] = 0b011, the following applies:

  • 4 slice cycles.

  • 1 setup cycle.

  • 3 cycles Data RAM access, reduced from the programmed 4 cycles because the slice and setup cycles take priority within the 8-cycle maximum.

  • 8 cycles total Data RAM latency.
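The breakdown used in Examples 7.3 and 7.4 can be sketched as follows. The sketch assumes that the L2CTLR[2:0] encoding maps to a programmed cycle count of encoding + 1 with a minimum of 2, which is inferred from Table 7.2, and that the slice and setup cycles are allocated first with the RAM access taking whatever remains under the 8-cycle maximum. The function name and result keys are hypothetical.

```python
DATA_MAX_LATENCY = 8  # maximum total Data RAM latency in cycles, from the note


def data_ram_breakdown(latency_field: int, setup: int, slices: int) -> dict:
    """Split the total adjusted L2 Data RAM latency into its components."""
    # Inferred from Table 7.2: 0b000 and 0b001 both select 2 cycles.
    programmed = max(latency_field + 1, 2)
    slice_cycles = 2 * slices   # each data slice adds 2 cycles
    setup_cycles = setup        # data setup = 1 adds 1 cycle
    total = min(programmed + setup_cycles + slice_cycles, DATA_MAX_LATENCY)
    # Slice and setup take priority, so the RAM access gets the remainder.
    access_cycles = total - slice_cycles - setup_cycles
    return {
        "slice": slice_cycles,
        "setup": setup_cycles,
        "access": access_cycles,
        "total": total,
    }


if __name__ == "__main__":
    print(data_ram_breakdown(0b011, setup=0, slices=0))  # settings of Example 7.3
    print(data_ram_breakdown(0b011, setup=1, slices=2))  # settings of Example 7.4
```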


Copyright © 2013, 2014 ARM. All rights reserved. ARM DDI 0488D
Non-Confidential ID012914