1.3.3 Bus traffic in Fast Models

PVBus can simulate the behavior of individual bus transactions passing through a hierarchy of bus fabric, but it employs several techniques to optimize this process.

  1. PVBus generally decodes the path between a bus master and a bus slave the first time a transaction is issued. All subsequent transactions to the same address are automatically sent to the same slave, without passing through the intervening fabric.
  2. For accesses to normal memory, the master can cache a pointer to the (host) storage that holds the data contents of the memory. The master can then read and write this memory directly, without generating bus transactions.
  3. For instruction fetch, and for operations such as repeated DMA from framebuffer memory, PVBus provides an optimization called “snooping” that informs the master if anyone else could have modified the contents of memory. If no changes have occurred, the master can avoid re-reading the memory contents.
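
The three optimizations above can be illustrated with a toy model. The following Python sketch is not the PVBus API: the classes `Memory`, `Fabric`, and `Master`, and every method and counter on them, are hypothetical names invented for this illustration. The sketch caches routes per address for simplicity, and its change tracking only sees writes made through `write()`, not writes made through the direct view.

```python
class Memory:
    """Hypothetical bus slave backed by host storage."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.storage = bytearray(size)   # host memory holding the contents
        self.version = 0                 # bumped by every write() call
        self.reads = 0                   # counts read() calls, for illustration

    def contains(self, addr):
        return self.base <= addr < self.base + self.size

    def read(self, addr):
        self.reads += 1
        return self.storage[addr - self.base]

    def write(self, addr, value):
        self.storage[addr - self.base] = value
        self.version += 1


class Fabric:
    """Hypothetical bus fabric that decodes an address to a slave."""

    def __init__(self, slaves):
        self.slaves = slaves
        self.decodes = 0                 # how often the slow decode path ran

    def decode(self, addr):
        self.decodes += 1
        for slave in self.slaves:
            if slave.contains(addr):
                return slave
        raise ValueError(f"no slave at address {addr:#x}")


class Master:
    """Hypothetical bus master using the three optimizations."""

    def __init__(self, fabric):
        self.fabric = fabric
        self.routes = {}                 # address -> slave (optimization 1)
        self.fetched = {}                # address -> (value, version seen)

    def _slave_for(self, addr):
        # Decode through the fabric only on the first access to an address.
        slave = self.routes.get(addr)
        if slave is None:
            slave = self.fabric.decode(addr)
            self.routes[addr] = slave
        return slave

    def read(self, addr):
        return self._slave_for(addr).read(addr)

    def write(self, addr, value):
        self._slave_for(addr).write(addr, value)

    def direct_view(self, addr):
        # Optimization 2: hand back the host storage itself, so later
        # accesses through the view involve no bus transaction at all.
        return memoryview(self._slave_for(addr).storage)

    def fetch(self, addr):
        # Optimization 3: re-read only if the memory may have changed
        # since the value was last fetched.
        slave = self._slave_for(addr)
        cached = self.fetched.get(addr)
        if cached is not None and cached[1] == slave.version:
            return cached[0]
        value = slave.read(addr)
        self.fetched[addr] = (value, slave.version)
        return value
```

In this sketch, two `fetch()` calls to the same address with no intervening write cause a single `read()` on the slave, and repeated accesses to one address cause a single `decode()` per master.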

If a piece of bus fabric wants to intercept and log all bus transactions, it can defeat these optimizations by claiming to be a slave device. It can then log all transactions and can reissue identical transactions on its own master port. However, doing this slows all bus transactions and significantly impacts simulation performance.
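
As a rough illustration of this interception pattern, the Python sketch below shows a component that presents itself as the slave, records each transaction, and then reissues it downstream. It is a toy under stated assumptions, not Fast Models code: `Ram`, `LoggingBridge`, and the `access()` signature are hypothetical.

```python
class Ram:
    """Hypothetical downstream slave."""

    def __init__(self, size):
        self.data = bytearray(size)

    def access(self, kind, addr, value=None):
        if kind == "write":
            self.data[addr] = value
            return None
        return self.data[addr]


class LoggingBridge:
    """Hypothetical fabric component that claims to be the slave.

    Because the master sees this object as the end of the route, every
    transaction has to pass through access(), which is what makes
    logging possible and also what makes it slow.
    """

    def __init__(self, downstream):
        self.downstream = downstream
        self.log = []

    def access(self, kind, addr, value=None):
        self.log.append((kind, addr, value))                # record it
        return self.downstream.access(kind, addr, value)    # reissue it


ram = Ram(8)
bridge = LoggingBridge(ram)
bridge.access("write", 3, 42)
result = bridge.access("read", 3)
```

Every access in this sketch pays for one extra call and one log append, which mirrors, in miniature, why interception affects every transaction rather than only the ones of interest.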

Note:

If direct accesses to memory by the code translation (CT) engine are intercepted by the fabric, the processor is forced to single-step. Execution is then much slower than normal operation with translated code.

The bus traffic generated by a processor is not representative of real traffic:

Timing differences
Re-ordering and buffering of memory accesses, out-of-order execution, speculative prefetch, and drain buffers can cause timing differences. Fast Models do not model these effects, because they are not visible to the programmer except in situations where a cluster program contains race conditions that violate serial-consistency expectations.
Bus contention
Fast Models do not model the time taken for a bus transaction, so they cannot model the effects of multiple transactions contending for bus availability.
Size of access
Fast Models do not attempt to generate the same types of burst transaction from the processor for accesses to multiple consecutive locations.
Instruction fetch
The behavior of the instruction prefetch unit of a processor is not modeled to match the hardware implementation.
Behavioral differences
In some software, the trace of instruction execution depends on timing effects. For example, if a loop polls a device while waiting for a 10 ms time-out, the number of iterations of the polling loop depends on the rate of instruction execution.
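
The polling-loop example can be made concrete with simple arithmetic. In the Python sketch below, the function name and all numbers (instruction rates and instructions per loop iteration) are made-up values chosen for illustration, not measurements of any model or hardware.

```python
def polling_iterations(timeout_ms, instructions_per_second,
                       instructions_per_iteration):
    """Estimate how many loop iterations fit into the time-out.

    Uses integer arithmetic only:
    iterations = (timeout in seconds * instruction rate) / loop length
    """
    return (timeout_ms * instructions_per_second) // (
        1000 * instructions_per_iteration)


# Same 10 ms time-out and same 5-instruction loop, two assumed rates.
slow = polling_iterations(10, 100_000_000, 5)   # 100 million instructions/s
fast = polling_iterations(10, 500_000_000, 5)   # 500 million instructions/s
```

With these assumed figures, `slow` is 200000 and `fast` is 1000000, so the same software produces instruction traces of different lengths purely because of the execution rate.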
Non-Confidential 100964_1161_00_en
Copyright © 2014–2019 Arm Limited or its affiliates. All rights reserved.