4.13. Use of a return stack

The PFT architecture includes a return stack that you can use to reduce the number of branch address packets generated by the PTM. Branch address packets are a high proportion of the data in the trace stream, and using the return stack significantly reduces the amount of trace output.

You enable use of the return stack by setting the Return stack enable bit of the Main Control Register to 1. See Main Control Register, ETMCR. On a PTM reset, this bit is set to 0, disabling the use of the return stack.
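As a minimal sketch, the read-modify-write below sets or clears the Return stack enable bit in a value read from ETMCR. The bit position used here, bit 29, is an assumption for illustration; take the actual position from the Main Control Register, ETMCR description. The helper names are hypothetical, and the sketch only manipulates an integer. It does not model how the register is accessed or programmed.

```python
# Assumed position of the Return stack enable bit in ETMCR; confirm it
# against the ETMCR register description before relying on it.
RETURN_STACK_ENABLE_BIT = 29


def set_return_stack_enable(etmcr: int, enable: bool) -> int:
    """Return a copy of an ETMCR value with the Return stack enable bit set or cleared."""
    mask = 1 << RETURN_STACK_ENABLE_BIT
    return (etmcr | mask) if enable else (etmcr & ~mask)


def return_stack_enabled(etmcr: int) -> bool:
    """Report whether the Return stack enable bit is set in an ETMCR value."""
    return bool(etmcr & (1 << RETURN_STACK_ENABLE_BIT))
```

Because the bit resets to 0, a freshly reset value reports the return stack as disabled until software sets the bit.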

The PTM can use the return stack for tracing branch with link instructions. Table 4.12 lists these instructions.

Table 4.12. Branch with link instructions

                                                        Instruction set
Mnemonic      Description                               ARM    Thumb, 16-bit    Thumb, 32-bit
BL            Branch and link                           Yes    -                -
BL <immed>    Branch and link                           -      -                Yes
BLX <reg>     Branch with link and exchange             Yes    Yes              -
BLX <immed>   Branch immediate with link and exchange   Yes    -                Yes
HBL, HBLP     Handler branch with link                  -      Yes              -

When a branch with link instruction is executed, the PTM puts the return address of the branch instruction onto the return stack. This is the address that the instruction writes to the LR. The return stack entry includes the instruction set and security states that correspond to this return address. If the stack is full, the oldest entry is discarded to make room for the new entry.
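The push behavior can be modeled in a few lines. This is a hypothetical software model, not a description of PTM hardware: the names ReturnStackEntry and ReturnStack, and the string encoding of the instruction set state, are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReturnStackEntry:
    """One return stack entry: the return address plus its states."""
    address: int
    isa: str       # instruction set state, for example "ARM" or "Thumb"
    secure: bool   # security state that corresponds to the return address


class ReturnStack:
    """Bounded return stack; the oldest entry is discarded when it is full."""

    def __init__(self, depth: int) -> None:
        # A deque with maxlen drops items from the left end on overflow,
        # and the left end holds the oldest entry.
        self._entries: deque = deque(maxlen=depth)

    def push(self, entry: ReturnStackEntry) -> None:
        self._entries.append(entry)

    def top(self) -> Optional[ReturnStackEntry]:
        """Most recent entry, or None if the stack is empty."""
        return self._entries[-1] if self._entries else None

    def entries(self) -> list:
        """Entries from oldest to most recent."""
        return list(self._entries)

    def __len__(self) -> int:
        return len(self._entries)
```

With a depth of 2, pushing return addresses 0x1004, 0x2004 and 0x3004 leaves only 0x2004 and 0x3004 on the stack; the entry for 0x1004 is discarded.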

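When an indirect branch instruction is executed and passes its condition code check, and the return stack is not empty, the PTM compares the target of the branch with the most recent entry on the stack. If the address, instruction set state, and security state all match then:

  • the PTM removes the most recent entry from the return stack

  • the PTM traces the branch as an E atom, and does not generate a branch address packet.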

If the return stack is empty, or the indirect branch instruction does not match the most recent stack entry, then the PTM traces the indirect branch normally, by generating a branch address packet, and does not change the contents of the return stack.

If the most recent entry on the return stack matches, but the waypoint causes a change in security state, then the PTM always generates a branch address packet for the waypoint. The entry is not removed from the return stack.

Any periodic or nonperiodic I-sync packet clears the contents of the return stack.
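The rules for an indirect branch that passes its condition code check, together with the I-sync rule, can be sketched as a small decision function. This is a hypothetical model: Target, trace_indirect_branch, on_isync, and the changes_security flag (standing for "the waypoint causes a change in security state") are illustrative names, and the stack is a plain Python list with the most recent entry last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Address, instruction set state, and security state of a target or entry."""
    address: int
    isa: str
    secure: bool


def trace_indirect_branch(stack: list, target: Target, changes_security: bool) -> str:
    """Decide the trace for an indirect branch that passed its condition code check.

    Updates the stack in place and returns "E atom" or "branch address packet".
    """
    matches = bool(stack) and stack[-1] == target
    if matches and not changes_security:
        stack.pop()  # prediction confirmed: remove the most recent entry
        return "E atom"
    # Empty stack, a mismatch, or a matching waypoint that changes security
    # state: trace the address explicitly and leave the stack unchanged.
    return "branch address packet"


def on_isync(stack: list) -> None:
    """Any periodic or nonperiodic I-sync packet clears the return stack."""
    stack.clear()
```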

Table 4.13 summarizes how the PTM traces branch instructions when you have enabled use of the return stack.

Table 4.13. PTM branch tracing when using the return stack

                                     Return stack
Branch type   Condition code check   Matches [a]   Operation                  Trace generated
Indirect      Pass                   Yes           Remove most recent entry   E atom
Indirect      Pass                   No            No change                  Branch address packet
Indirect      Fail                   X             No change                  N atom
Direct        Pass                   X             No change                  E atom
Direct        Fail                   X             No change                  N atom

[a] Does the top entry on the stack match the address, instruction set state, and security state of the branch target address? If the return stack is empty, the processing is the same as for a failed match. X indicates that the result does not depend on whether the entry matches.

When an exception occurs, the return stack has no effect. On an exception, the PTM always outputs a branch address packet and leaves the return stack unchanged.
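Table 4.13 and the exception rule can be combined into a single lookup, sketched below. The function name branch_trace and its string values are assumptions for illustration. The function covers only the matching step: the push made by a branch with link instruction is a separate operation, and the security-state case for a matching waypoint is not part of the table, so it is not modeled here.

```python
def branch_trace(branch_type: str, condition_passed: bool, stack_matches: bool):
    """Return (return stack operation, trace generated) for one branch.

    branch_type is "indirect", "direct", or "exception". Pass
    stack_matches=False when the return stack is empty. Arguments that
    Table 4.13 marks X are ignored.
    """
    if branch_type == "exception":
        # Exceptions always produce a branch address packet and leave the
        # return stack unchanged.
        return ("No change", "Branch address packet")
    if branch_type not in ("indirect", "direct"):
        raise ValueError(f"unknown branch type: {branch_type!r}")
    if not condition_passed:
        return ("No change", "N atom")
    if branch_type == "direct":
        return ("No change", "E atom")
    if stack_matches:
        return ("Remove most recent entry", "E atom")
    return ("No change", "Branch address packet")
```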

The ordering of operations on the return stack is important. When tracing an indirect branch instruction, the PTM compares the branch target address with the most recent entry on the stack before performing any push operation associated with the instruction. For example, if at address 0x1000 there is an ARM state indirect BLX to 0x2000, the PTM operates as follows:

  1. Compare the destination address of the BLX instruction, 0x2000, with the most recent address on the return stack. If the address, instruction set state, and security state all match then remove the most recent entry from the stack.

  2. Regardless of the outcome of step 1, push the return address of the BLX instruction, 0x1004, onto the return stack.
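The two steps can be traced through in a small model. It assumes ARM state, so the BLX <reg> instruction is 4 bytes long and its return address is 0x1004, and it treats the calling code as Non-secure. Entries are hypothetical (address, instruction set, security) tuples, and trace_indirect_blx is an illustrative name.

```python
def trace_indirect_blx(stack: list, blx_address: int, target: tuple) -> str:
    """Trace an ARM state indirect BLX that passes its condition code check.

    target is the (address, isa, secure) tuple of the branch destination.
    Entries on stack use the same form, most recent last.
    """
    # Step 1: compare the destination with the most recent entry first.
    if stack and stack[-1] == target:
        stack.pop()
        trace = "E atom"
    else:
        trace = "branch address packet"
    # Step 2: always push the return address. An ARM state BLX <reg> is
    # 4 bytes, and the caller is assumed to be Non-secure ARM code.
    stack.append((blx_address + 4, "ARM", False))
    return trace
```

If the push happened first, the comparison would always be made against the instruction's own return address, 0x1004, so an entry for 0x2000 could never be consumed by this BLX.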

If the security state of an entry that the PTM might push onto the return stack is different from the last security state it output in the trace stream, then the PTM:

From PFTv1.1, the return stack must not match on:

This ensures an explicit branch packet is traced for each of these scenarios.


  • This cannot occur in code that uses only the methods recommended by ARM to change the security state. See the ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition.

  • A trace decompression tool does not have to know that the PTM has cleared the return stack.

The size of the return stack is implementation defined, from 0 to 15 entries. There is no mechanism for detecting the size of this stack, and a trace decompressor does not require this information. The Return stack implemented bit in the Configuration Code Extension Register is set to 1 if the return stack is implemented. See Configuration Code Extension Register, ETMCCER.
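Why a decompressor can work without knowing the stack size can be checked with a small simulation. In this hypothetical model, entries are plain return addresses (instruction set and security state are omitted), only returns are traced, and the decompressor keeps an unbounded stack. Pushes and pops happen on both sides in step, so the PTM's stack always holds the most recent entries of the decompressor's stack, and their top entries agree whenever the PTM reports a match.

```python
from collections import deque


def ptm_trace(events, depth):
    """PTM side: packets for a sequence of ("call", return_address) and
    ("ret", target) events, using a return stack of the given depth."""
    stack = deque(maxlen=depth)  # oldest entry discarded when full
    packets = []
    for kind, addr in events:
        if kind == "call":
            stack.append(addr)            # push the return address
        elif stack and stack[-1] == addr:
            stack.pop()
            packets.append(("E", None))   # return predicted correctly
        else:
            packets.append(("addr", addr))
    return packets


def decompress_returns(events, packets):
    """Decompressor side with an unbounded stack: recover each return target.

    Only the event kinds and the call return addresses (known from the
    program image) are used; return targets come from the packets.
    """
    stack, targets = [], []
    it = iter(packets)
    for kind, addr in events:
        if kind == "call":
            stack.append(addr)
        else:
            packet, value = next(it)
            targets.append(stack.pop() if packet == "E" else value)
    return targets
```

For the nested sequence of calls returning to 0x3004, 0x2004 and 0x1004, every depth from 0 to 15 recovers the same targets. Only the mix of E atoms and branch address packets changes: with a depth of 2, the return to 0x1004 is traced with a branch address packet because its entry was discarded.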


  • Depending on the application code, using the return stack reduces the amount of trace generated by 25-40%.

  • There are no packet formats or other PFT protocol elements associated with using the return stack. The effect of using the return stack is to replace some branch address packets with E atoms.

  • Use of the return stack is a prediction technique. The PTM predicts the target of a branch and if the prediction is:

    • correct, it outputs an E atom

    • incorrect, it outputs a branch address packet.

Copyright © 1999-2002, 2004-2008, 2011 ARM. All rights reserved. ARM IHI 0035B