2.4.12 Quality of Service

The CCI-550 provides a set of QoS regulation and control mechanisms.

The following mechanisms are supported:

QoS value as a priority indicator

The CCI-550 uses the QoS value as a priority indicator for arbitration of requests. The QoS value can come from an input to a slave interface, or a programmed value can override it.

The CCI-550 uses the QoS value when selecting the request to admit into the main transaction queue. Requests with the highest QoS have the highest priority unless an anti-starvation mechanism is activated. The CCI-550 uses a Least Recently Granted (LRG) scheme when two or more transactions share the highest priority. The arbiter has starvation avoidance mechanisms to prevent high bandwidth requests from stalling lower priority requests indefinitely.

The CCI-550 propagates QoS values downstream. Where the downstream interconnect and slave devices are sensitive to the QoS value, the propagated value determines the service rate. The NIC-400 Network Interconnect, for example, is sensitive to the QoS value.
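
The selection rule described above (highest QoS first, Least Recently Granted among equals) can be illustrated with a minimal Python sketch. The `Request` type, the `arbitrate` function, and the `last_granted` bookkeeping are hypothetical names introduced for illustration. They do not model the actual arbiter, and the anti-starvation mechanism is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Request:
    source: int  # hypothetical slave interface index
    qos: int     # 4-bit QoS value, 0-15


def arbitrate(pending, last_granted):
    """Pick the pending request with the highest QoS value.

    Ties are broken in favour of the source that was granted least
    recently. last_granted maps source -> time of its most recent grant;
    a source that was never granted is treated as time -1 (an assumption).
    Anti-starvation behaviour is not modelled.
    """
    top = max(r.qos for r in pending)
    candidates = [r for r in pending if r.qos == top]
    return min(candidates, key=lambda r: last_granted.get(r.source, -1))


pending = [Request(0, 5), Request(1, 12), Request(2, 12)]
winner = arbitrate(pending, last_granted={1: 10, 2: 3})
print(winner.source)  # source 2: same QoS as source 1, granted less recently
```

If every source uses the same QoS value, `candidates` always contains all pending requests and the choice reduces to LRG alone.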

Note

Ensure that you balance the relative priorities of all slave interfaces. For example, setting each one to the highest QoS value reduces the arbitration to LRG, and there is no advantage in using the QoS value.

You can override the ARQOS and AWQOS input signals on each slave interface by using a programmable register. The register value is applied only if the relevant bit of the static input signal, QOSOVERRIDE[6:0], is HIGH. Transactions that the CCI-550 generates use the QoS value of the trigger transaction, or the override value if the QOSOVERRIDE signal is set.

Note

The QOSOVERRIDE signal only applies to transactions for which the ARQOS or AWQOS signals are set to a value of zero. Therefore, each interface can have a mixture of overridden traffic and other traffic, with an unaffected non-zero QoS value.
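
The override rule can be summarized in a short Python sketch. The function and parameter names are hypothetical, and the sketch reflects only the behaviour stated in this section: the programmed value replaces the incoming value when the QOSOVERRIDE bit for the interface is HIGH and the incoming ARQOS or AWQOS value is zero.

```python
def effective_qos(axqos, qos_override, override_value):
    """Return the QoS value used for a transaction on one slave interface.

    axqos          -- incoming ARQOS or AWQOS value (0-15)
    qos_override   -- True if the QOSOVERRIDE bit for this interface is HIGH
    override_value -- value programmed in the override register (0-15)
    """
    if qos_override and axqos == 0:
        return override_value
    return axqos


print(effective_qos(0, True, 9))   # 9: zero-valued traffic is overridden
print(effective_qos(5, True, 9))   # 5: non-zero traffic is unaffected
print(effective_qos(0, False, 9))  # 0: override is disabled
```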

High and low priority requests

You can use the QoS Threshold Register to set a QoS value threshold that classifies requests as high or low priority. A high priority request is a read or write request with an ARQOS or AWQOS value that is equal to or greater than the threshold.

In heavy congestion, high priority requests use a TT reserved slot to take a fast path through the CCI-550.

QoS value regulation based on requested bandwidth

You can configure each CCI-550 slave interface to have a bandwidth regulator for read and write requests. The regulator enables you to modify the QoS value of read and write requests to suit the allocated bandwidth through each slave interface.

To use the regulator, each slave interface has a read and write bandwidth allocation and a QoS value range. You can set the programmable bandwidth_allocation value in bytes per cycle. The CCI-550 has 128-bit interfaces and bandwidth_allocation is a 4-bit value that represents 0-15 bytes per cycle. The following table shows the bandwidth_allocation settings, where the CCI-550 is running at 800MHz.

Table 2-7 bandwidth_allocation settings

bandwidth_allocation Bytes per cycle Bandwidth (GB/s)
0b0000 0 0
0b0001 1 0.8
0b0010 2 1.6
0b0011 3 2.4
0b0100 4 3.2
0b0101 5 4.0
0b0110 6 4.8
0b0111 7 5.6
0b1000 8 6.4
0b1001 9 7.2
0b1010 10 8.0
0b1011 11 8.8
0b1100 12 9.6
0b1101 13 10.4
0b1110 14 11.2
0b1111 15 12.0
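
Each row of the table is the bytes-per-cycle setting multiplied by the clock frequency. The following Python sketch reproduces that arithmetic. The function name and the `clock_mhz` parameter are hypothetical and are introduced only for illustration.

```python
def bandwidth_gb_per_s(bandwidth_allocation, clock_mhz=800):
    """Convert a 4-bit bandwidth_allocation setting to GB/s.

    bytes/cycle * cycles/us gives MB/s; dividing by 1000 gives GB/s.
    """
    if not 0 <= bandwidth_allocation <= 0b1111:
        raise ValueError("bandwidth_allocation is a 4-bit value")
    return bandwidth_allocation * clock_mhz / 1000


for setting in (0b0001, 0b0110, 0b1111):
    print(f"{setting:#06b} -> {bandwidth_gb_per_s(setting):.1f} GB/s")
```

At a different clock frequency the same settings give proportionally different bandwidths, so the table values apply only to the 800MHz case.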
When enabled on an interface, the regulator uses the maximum QoS value when requests are issued at a rate lower than, or equal to, their allocation. If requests are issued at a rate greater than the allocation, the regulator reduces the QoS value until either the request bandwidth reduces or the minimum QoS value is reached. The regulator has an accumulator that tracks the excess requested data, in bytes, and modifies the QoS value according to a programmable value excess_bytes_per_qv. The following table shows the possible excess_bytes_per_qv values.

Table 2-8 excess_bytes_per_qv values

Encoding Excess, in bytes
0b000 256
0b001 512
0b010 1024
0b011 2048
0b100 4096
0b101 8192
0b110 16384
0b111 32768
The regulator has a nominal 64-byte granule size, and most transactions are expected to be of cache line length. Therefore, the size of a transaction that is not a multiple of 64 bytes is rounded up to the next multiple of 64 bytes.
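
The relationship between accumulated excess and QoS value can be sketched in Python as follows. This is an illustration under stated assumptions, not the hardware algorithm: it assumes that the QoS value drops by one for every excess_bytes_per_qv bytes of accumulated excess, and it does not model how the accumulator fills or drains over time. All function names are hypothetical.

```python
GRANULE = 64  # nominal regulator granule size, in bytes


def rounded_size(nbytes):
    """Round a transaction size up to a multiple of the 64-byte granule."""
    return -(-nbytes // GRANULE) * GRANULE


def excess_bytes_per_qv(encoding):
    """Decode the 3-bit excess_bytes_per_qv field: 0b000 -> 256 ... 0b111 -> 32768."""
    if not 0 <= encoding <= 0b111:
        raise ValueError("encoding is a 3-bit value")
    return 256 << encoding


def regulated_qos(excess_bytes, max_qos, min_qos, encoding):
    """Map accumulated excess bytes to a QoS value.

    Assumption: one QoS step is lost per excess_bytes_per_qv bytes of
    excess, and the result never falls below min_qos.
    """
    steps = excess_bytes // excess_bytes_per_qv(encoding)
    return max(min_qos, max_qos - steps)


print(rounded_size(65))                       # 128
print(regulated_qos(0, 14, 8, 0b100))         # 14: within allocation
print(regulated_qos(8 * 1024, 14, 8, 0b100))  # 12: two steps of 4KB each
```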

Example of using the bandwidth regulator

This example system uses a CCI-550 running at 800MHz to connect:

  • Two CPU clusters.
  • A display processor.
  • A GPU.
The example system also has the following characteristics:
  • Each CPU cluster requests read data at an average of 1GB/s. However, at times it can saturate the read data channel, and therefore has a peak bandwidth of 12.8GB/s.
  • The display processor requires an average of 2.8GB/s of read bandwidth and has a 32KB read data buffer that must not underflow.
  • The GPU consumes an average of 6.0GB/s, but peaks at 12.8GB/s. The GPU is more tolerant of bandwidth variations than the other processors.
  • The memory system can provide 16GB/s of read bandwidth.
The following table summarizes the bandwidth requirements for each processor.

Table 2-9 Example system bandwidth allocation

Component Average read bandwidth (GB/s) Peak read bandwidth (GB/s)
Cluster 1 1.0 12.8
Cluster 2 1.0 12.8
Display 2.8 2.8
GPU 6.0 12.8
Total 10.8 41.2
The table demonstrates that the memory cannot provide sufficient bandwidth to cater for the peak rates of each processor. It is therefore necessary to use QoS to manage bandwidth allocations.
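
The totals in Table 2-9 can be checked against the memory bandwidth with a few lines of Python. The dictionary layout and variable names are hypothetical. The figures are the ones given in the example.

```python
# (average, peak) read bandwidth in GB/s, taken from Table 2-9
demand = {
    "Cluster 1": (1.0, 12.8),
    "Cluster 2": (1.0, 12.8),
    "Display": (2.8, 2.8),
    "GPU": (6.0, 12.8),
}
MEMORY_GB_PER_S = 16.0  # read bandwidth the memory system can provide

total_average = round(sum(avg for avg, _ in demand.values()), 1)
total_peak = round(sum(peak for _, peak in demand.values()), 1)

print(total_average, total_average <= MEMORY_GB_PER_S)  # 10.8 True
print(total_peak, total_peak <= MEMORY_GB_PER_S)        # 41.2 False
```

The average demand fits within the available bandwidth, but the combined peak demand does not.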
When allocating QoS values, assume that requests with higher values are serviced ahead of requests with lower values.
To achieve the lowest latency, the CPU clusters must issue requests with the highest priority. However, during periods of peak request rates, the CPUs might use all the available memory bandwidth. To prevent this, the CPUs must be regulated to a QoS value that is lower than that of the display processor.
The following table shows example QoS values for the system.

Table 2-10 Example system QoS values

Component Maximum QoS value Minimum QoS value
Cluster 1 14 8
Cluster 2 14 8
Display 12 12
GPU 7 7
You can set the bandwidth_allocation for the CPU clusters to a value that is higher than their average, for example 4.8GB/s. With both clusters at this allocation, the memory controller, which can provide 16GB/s, leaves 6.4GB/s of memory bandwidth for the display and GPU. The display has a read buffer of 32KB, which must not underrun. The CPUs are given a maximum QoS value that is two levels higher than that of the display processor. If you set the QoS regulators so that the CPU QoS value is modified at a rate of 4KB per QoS value, each CPU cluster is permitted 8KB of excess data before its QoS value drops to that of the display. The following figure shows the effective QoS assignments and ranges.
Figure 2-2 QoS value and display buffer underrun

If the two CPU clusters request data at a peak rate, the display can be starved. However, the maximum excess data the clusters can request at a priority higher than the display is 2 × 8KB, that is, 16KB. This excess is well below the 32KB display buffer size, so the buffer is unlikely to underrun, provided the available memory bandwidth does not drop below 4.8 + 4.8 + 2.8, that is, 12.4GB/s. Some contingency is built into this example, to allow for inaccuracies in assumptions.
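
The arithmetic behind this example can be reproduced in a short Python sketch. The variable names are hypothetical. The values are those used in the example, with KB taken to mean 1024 bytes.

```python
KB = 1024

excess_per_qv = 4 * KB            # regulator rate: 4KB per QoS value
cpu_max_qos, display_qos = 14, 12
levels_above_display = cpu_max_qos - display_qos           # 2 levels

excess_per_cluster = levels_above_display * excess_per_qv  # 8KB
total_excess = 2 * excess_per_cluster                      # two clusters: 16KB
display_buffer = 32 * KB

cpu_allocation = 4.8              # GB/s per cluster
display_average = 2.8             # GB/s
required = round(2 * cpu_allocation + display_average, 1)  # 12.4 GB/s
left_for_display_and_gpu = round(16.0 - 2 * cpu_allocation, 1)  # 6.4 GB/s

print(total_excess // KB, "KB excess vs", display_buffer // KB, "KB buffer")
print(required, "GB/s needed by the CPUs and display")
```

The margin between the 16KB of possible excess and the 32KB buffer is the contingency that the example refers to.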
QoS value regulation is only applied to requests with a QoS value of 0. You can tie off the ARQOS and AWQOS inputs LOW if you want the CCI-550 to always drive the QoS values. Alternatively, you can drive both these inputs LOW for traffic you want to regulate, and to a non-zero value for other traffic.

Regulation based on outstanding transactions

Each slave interface has a programmable mechanism for limiting the number of outstanding read and write transactions.

An Outstanding Transaction (OT) is a read request that has not yet received its last beat of read data, or a write request that has not yet received a response. You can use the OT regulation mechanism with QoS value mechanisms or when the system is not sensitive to the QoS value.
There is a combined OT count for read and write transactions, and this count includes all possible request types. Two-part DVM messages count as two outstanding transactions, and transactions that the CCI-550 splits into 64-byte granules count as multiple transactions.
The hardware implementation sets the upper limit that you can program into the Maximum OT register for a slave interface. This upper limit is the value of the register from reset. The minimum value for the OT register is SIx_W_MIN + 2. This minimum value is the number of tracker slots that are reserved for requests from each slave interface, to prevent deadlock. If you write a value outside these limits, the value is clamped to the nearest limit, and the clamped value is read back.
The OT limit sets a maximum bandwidth for the attached master, based on the average response latency from downstream. You can use the following approximation to allocate memory bandwidth resource among various masters in the system:
  • OT limit = maximum bandwidth * average latency / bytes per request
For example, if the average latency between arrival at the main CCI-550 tracking structures and downstream response is 128ns, the maximum required bandwidth is 8GB/s, and requests are 64 bytes in length, then the necessary OT limit for an ACE-Lite master, assuming a negligible hit rate, is:
  • max OT = 8GB/s * 128ns / 64 bytes = 16
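
The approximation can be written as a small Python helper. The function name is hypothetical, and rounding up to a whole number of transactions is an assumption added here for cases where the result is not an integer.

```python
import math


def ot_limit(max_bandwidth_gb_per_s, average_latency_ns, bytes_per_request):
    """Approximate the OT limit needed to sustain a target bandwidth.

    GB/s multiplied by ns gives bytes in flight (1e9 * 1e-9 = 1), so no
    unit conversion is needed. The result is rounded up (an assumption).
    """
    bytes_in_flight = max_bandwidth_gb_per_s * average_latency_ns
    return math.ceil(bytes_in_flight / bytes_per_request)


print(ot_limit(8, 128, 64))  # 16
```

A longer average latency or a higher target bandwidth raises the required limit proportionally.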

Note

For ACE masters, the time from the response to the RACK or WACK acknowledgement must be included in the response latency.
Non-Confidential ARM 100282_0100_00_en
Copyright © 2015, 2016 ARM. All rights reserved.