2.11.1. Parallel execution

The ETM must choose a particular stage in the processor pipeline from which to trace instructions. This is chosen to be as close as possible to the program order of the instructions, without tracing instructions that are fetched but not executed.

The two types of parallel execution are:

Parallel instruction execution

This is where more than one instruction is executed in a single cycle.

In ETMv2.x, parallel instruction execution is supported, provided that only one of the instructions executed in a cycle is capable of transferring data. The processor cannot execute multiple data transfer instructions in parallel. This restriction exists because the ETMv2.x pin protocol assumes that at most one data transfer instruction is executed per cycle.

From ETMv3, more general parallel instruction execution is supported.

Parallel data transfers

In processors with a 64-bit data bus, two 32-bit quantities might be transferred in a single cycle. This generally occurs only with Load/Store Multiple (LSM) instructions. See Definitions for more information about these instructions.

Rules for parallel execution

Note

In this subsection, item refers to an object that is traced, either an instruction or data. When both instructions and data are being traced, an instruction and its associated data are separate items.

For either type of parallel execution, a small amount of extra trace is possible, but effects on the long-term state must be minimized. The following rules apply:

  • In applying these rules, an instruction item must be considered as occurring before any associated data item. For example, if the trace stop control is a data address comparator, and the trace start/stop block is active before an instruction with data that matches this comparator, the trace start/stop block is active for the instruction. This is because the data comparison is considered after the instruction is traced.

  • The trace start/stop block must view the instructions and data transfers as executing in the order in which they would be traced, regardless of whether tracing is enabled. An example of this ordering is given in Example 2.2.

    For each cycle, there are three situations to consider:

    1. Only a start address matches. In this case, the start/stop block must be active on this cycle, and remains active until a stop address is encountered.

    2. Only a stop address matches, when the start/stop block is already active. In this case:

      • if the stop address is the first item to be executed in this cycle then the start/stop block must be inactive on this cycle

      • if the stop address is not the first item then the start/stop block must be active on this cycle, to trace the items before the stop address.

      The start/stop block is inactive at the end of this cycle, and remains inactive until a start address is encountered.

    3. Both the start address and the stop address match. In this case:

      • If the start address occurs before the stop address, the start/stop block must be active on this cycle, and inactive at the end of the cycle. It then remains inactive until a start address is encountered.

      • If the stop address occurs before the start address, the start/stop block must be active on this cycle, and remains active after the cycle until another stop address is encountered.

    This behavior means that a single address comparator must simultaneously perform a comparison for each instruction or data transfer, so that it can match or not match each item individually.

  • If instructions would have been traced if they had been executed sequentially then they must be traced when executed in parallel. This applies to the TraceEnable include/exclude regions and to the trace start/stop block. Other instructions executed in the same cycle might also be traced as a result, but no trace must be lost.

    For example, consider two instructions executed in parallel, one of which causes a selected single address comparator to match. If TraceEnable is in include mode, the matching instruction must be traced, and the other might be traced. However, if TraceEnable is in exclude mode, the non-matching instruction must be traced, and the other might be traced.

  • ViewData must trace each data transfer if it would have been traced had the instructions been executed sequentially. This applies to the include/exclude regions.

    Note

    This rule might be reviewed if the ETM specification is extended to processors capable of executing multiple data transfer instructions in parallel.

  • Any resource, when viewed as an event, must be active for the entire cycle if it matched for any instruction executed in that cycle. For example, if the trace start/stop resource is used as the enabling event of ViewData, but is logically active for only the first of two instructions executed in a cycle, the second instruction must have its data traced (assuming the include/exclude regions match).
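The three per-cycle situations described for the trace start/stop block can be illustrated with a small behavioral model. This is only a sketch, not part of the ETM architecture: the names Match and step_cycle are hypothetical, each item is assumed to match at most one of the start and stop comparators, and the item at a stop address is assumed to be outside the active region, as implied by the rule for a stop address that is the first item in a cycle.

```python
from enum import Enum
from typing import List, Tuple


class Match(Enum):
    """Comparator result for one traced item (hypothetical model type)."""
    NONE = 0
    START = 1
    STOP = 2


def step_cycle(active_before: bool, items: List[Match]) -> Tuple[bool, bool]:
    """Model one cycle of the start/stop block.

    items lists the comparator results for the items of this cycle, in the
    order in which the items would be traced.
    Returns (active_on_this_cycle, active_at_end_of_cycle).
    """
    if not items:
        # Assumption: a cycle with no items leaves the state unchanged.
        return active_before, active_before

    state = active_before
    active_on_cycle = False
    for match in items:
        if match is Match.START:
            state = True
        elif match is Match.STOP:
            state = False
        # The block is active for the whole cycle if it is logically
        # active for any item executed in that cycle.
        active_on_cycle = active_on_cycle or state
    return active_on_cycle, state


# Situation 2, stop address is not the first item: active on this cycle,
# inactive at the end of the cycle.
print(step_cycle(True, [Match.NONE, Match.STOP]))

# Situation 3, stop address before start address: active on this cycle,
# and still active after the cycle.
print(step_cycle(True, [Match.STOP, Match.START]))
```

Walking the items in trace order and taking the logical OR of the per-item state reproduces all three situations without special cases, which is why the model evaluates each item individually instead of once per cycle.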

Example 2.2. Trace Start/Stop block ordering of parallel instructions

Consider the case where these two instructions are executed in parallel:

        LDRD  r4,  [r1]
        LDR   r10, [r2]

The Trace Start/Stop block must behave as if the LDRD r4, [r1] is executed first, followed by LDR r10, [r2]. This means the block must behave as if the trace order is:

    Instruction(LDRD r4)
    Data loaded into r4, from address indicated by r1
    Data loaded into r5, from (address indicated by r1) + 4
    Instruction(LDR r10)
    Data loaded into r10, from address indicated by r2
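The ordering in this example can be reproduced with a short sketch that expands the instructions executed in one cycle into items, placing each instruction item before its associated data items. The Instruction type and the trace_order function are hypothetical names used only for illustration, and the transfer descriptions are plain strings, not a real trace format.

```python
from typing import List, NamedTuple, Tuple


class Instruction(NamedTuple):
    """One instruction executed in a cycle (hypothetical model type)."""
    text: str                   # for example "LDRD r4"
    transfers: Tuple[str, ...]  # descriptions of its data transfers, in order


def trace_order(instructions: List[Instruction]) -> List[str]:
    """Return the items of one cycle in the order the block must assume."""
    items: List[str] = []
    for insn in instructions:
        # An instruction item is considered to occur before its data items.
        items.append(f"Instruction({insn.text})")
        items.extend(f"Data {t}" for t in insn.transfers)
    return items


# The two instructions from this example, executed in the same cycle.
cycle = [
    Instruction("LDRD r4", ("loaded into r4, from [r1]",
                            "loaded into r5, from [r1] + 4")),
    Instruction("LDR r10", ("loaded into r10, from [r2]",)),
]

for item in trace_order(cycle):
    print(item)
```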

Copyright © 1999-2002, 2004-2009, 2011 ARM Limited. All rights reserved. ARM IHI 0014Q
Non-Confidential ID101211