9.6.2 Configuring data prefetching

The purpose of data prefetch modeling is to make the contents of the data cache more closely resemble those on a system with a hardware prefetcher. A configurable default data prefetcher is supplied. It is not intended to match any specific processor.

To run the model with data prefetch modeling enabled, using the default data prefetcher with default parameters, use the following parameters:

-C cache_state_modelled=true --plugin "<<internal><DataPrefetch>>" -C cluster0.dcache-prefetch_enabled=1

When the model exits, it reports how many prefetches were issued and how many cache hits on recently-prefetched data were detected. The performance cost of data prefetch modeling is about 10% relative to running with only cache state modeling enabled.

By default, a data prefetch plug-in attaches to all processors and clusters in a system, and maintains independent internal state for each processor. To change this, for example to track a different number of streams on big and LITTLE cores, load the plug-in twice and pass a different .cluster parameter to each instance:

--plugin "DP_BIG=<<internal><DataPrefetch>>" --plugin "DP_LITTLE=<<internal><DataPrefetch>>" \
 -C DataPrefetch.DP_BIG.cluster=0 -C DataPrefetch.DP_LITTLE.cluster=1 \
 -C DataPrefetch.DP_BIG.lfb_entries=16 -C DataPrefetch.DP_LITTLE.lfb_entries=4

The names DP_BIG and DP_LITTLE are examples. They can be any names you choose.

The example prefetcher is a basic stride-detecting prefetcher that you can configure using the following parameters:

Table 9-1 Parameters for the example prefetcher

Parameter            Description
history_length       Length of history to maintain.
history_threshold    Number of misses to allow in history before issuing a prefetch.
lfb_entries          Number of access streams to track.
mbs_expire           Number of non-hitting loads to allow before the prefetcher stops tracking a potential access stream.
pf_count             Number of prefetch streams available.
pf_tracker_count     Number of prefetches tracked.
pf_initial_number    Initial number of prefetches to issue for a new stream.
prefetch_all_levels  Prefetch to all cache levels rather than just the lowest level.

An access stream is created whenever a load is made to an address which is not within three cache lines of a previously-observed load. This might overwrite a previously created access stream. When a consistent stride has been observed, that is, when addresses N, N+delta, N+2*delta are seen, a prefetch stream is allocated with stride delta and a lifetime of pf_initial_number.
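The stride check described above can be illustrated with a minimal Python sketch. It is not the plug-in's implementation: the AccessStream class and its observe method are hypothetical names, and stream allocation, overwriting, and the three-cache-line proximity test are left out. It only shows how seeing N, N+delta, N+2*delta confirms a stride.

```python
from typing import Optional


class AccessStream:
    """Hypothetical helper: remembers the last load and the last delta."""

    def __init__(self, address: int) -> None:
        self.last_address = address
        self.last_delta: Optional[int] = None

    def observe(self, address: int) -> Optional[int]:
        """Record a load and return the stride once it has repeated.

        Returns None until two consecutive, equal, non-zero deltas are seen,
        that is, until N, N+delta, N+2*delta have been observed.
        """
        delta = address - self.last_address
        confirmed = delta != 0 and delta == self.last_delta
        self.last_address = address
        self.last_delta = delta
        return delta if confirmed else None


stream = AccessStream(0x1000)        # load to N
first = stream.observe(0x1040)       # load to N+delta: stride not yet confirmed
second = stream.observe(0x1080)      # load to N+2*delta: stride confirmed
```

In this sketch, the point at which observe returns a stride corresponds to the moment a prefetch stream would be allocated with that stride and a lifetime of pf_initial_number.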

Prefetches are issued in a round-robin fashion from the active prefetch streams whenever there have been fewer than history_threshold cache misses among the last history_length loads. Each time a prefetch is issued, the lifetime of the issuing stream goes down by one. The rationale is that if lots of cache hits are occurring, there should be available bandwidth on the memory interface to be used by prefetching.
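The issue policy can be sketched as follows. This is an illustration under assumptions, not the plug-in's code: the Issuer and PrefetchStream classes are hypothetical, the history is modeled as a fixed-length window of hit and miss flags, and the address that a stream prefetches next is assumed to advance by one stride per issue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class PrefetchStream:
    next_address: int  # assumption: advances by one stride per issued prefetch
    stride: int
    lifetime: int      # starts at pf_initial_number in the description above


class Issuer:
    """Hypothetical model of the miss-history gate and round-robin issue."""

    def __init__(self, history_length: int, history_threshold: int) -> None:
        # True marks a cache miss; the deque keeps only the last N loads.
        self.history: Deque[bool] = deque(maxlen=history_length)
        self.history_threshold = history_threshold
        self.streams: List[PrefetchStream] = []
        self.cursor = 0  # index of the stream to consider first

    def record_load(self, was_miss: bool) -> None:
        self.history.append(was_miss)

    def issue(self) -> Optional[int]:
        """Return the next prefetch address, or None if none may be issued."""
        if sum(self.history) >= self.history_threshold:
            return None  # too many recent misses: do not add memory traffic
        for offset in range(len(self.streams)):
            index = (self.cursor + offset) % len(self.streams)
            stream = self.streams[index]
            if stream.lifetime > 0:
                address = stream.next_address
                stream.next_address += stream.stride
                stream.lifetime -= 1
                self.cursor = (index + 1) % len(self.streams)
                return address
        return None  # no active stream
```

With two active streams, successive calls to issue alternate between them until their lifetimes reach zero, and every call returns None while the recent miss count is at or above history_threshold.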

Issued prefetches are tracked in a circular list of size pf_tracker_count, and if the prefetcher sees a load to an address in this circular list, it increments the lifetime of the prefetch stream that issued the successful prefetch.
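A minimal sketch of this feedback loop is shown below. The PrefetchTracker class, the on_load function, and the use of integer stream identifiers are assumptions made for illustration. The sketch also assumes that a load matches a tracked prefetch only on an exact address, which the description does not specify.

```python
from typing import Dict, List, Optional, Tuple


class PrefetchTracker:
    """Hypothetical circular list of the last pf_tracker_count prefetches."""

    def __init__(self, pf_tracker_count: int) -> None:
        self.slots: List[Optional[Tuple[int, int]]] = [None] * pf_tracker_count
        self.head = 0  # next slot to overwrite

    def record(self, address: int, stream_id: int) -> None:
        """Remember an issued prefetch, overwriting the oldest entry."""
        self.slots[self.head] = (address, stream_id)
        self.head = (self.head + 1) % len(self.slots)

    def lookup(self, address: int) -> Optional[int]:
        """Return the stream that prefetched this address, if still tracked."""
        for entry in self.slots:
            if entry is not None and entry[0] == address:
                return entry[1]
        return None


def on_load(tracker: PrefetchTracker, lifetimes: Dict[int, int], address: int) -> None:
    """Reward the stream whose prefetch was used by extending its lifetime."""
    stream_id = tracker.lookup(address)
    if stream_id is not None:
        lifetimes[stream_id] += 1
```

Because the list is circular, a prefetch that is not used before pf_tracker_count later prefetches are issued is forgotten, and a subsequent load to its address no longer extends the lifetime of the stream.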


Prefetches are to physical addresses, and as a result, a prefetch stream expires when it reaches the end of a 4KB region.
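The region limit can be illustrated with a short sketch. It assumes that the 4KB regions are 4KB-aligned, and the function name next_prefetch_address is hypothetical.

```python
from typing import Optional

REGION_BYTES = 4096  # 4KB region, assumed to be 4KB-aligned


def next_prefetch_address(address: int, stride: int) -> Optional[int]:
    """Return the next address in the stream, or None if the stream expires.

    The stream expires when the next address would fall outside the 4KB
    region that contains the current address.
    """
    candidate = address + stride
    if candidate // REGION_BYTES != address // REGION_BYTES:
        return None
    return candidate
```

Stopping at the region boundary reflects that physically contiguous addresses beyond it are not guaranteed to belong to the same virtual mapping.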
Non-Confidential 100965_1105_00_en
Copyright © 2014–2018 Arm Limited or its affiliates. All rights reserved.