2.12.3. Capturing profiling information

This example demonstrates how you can use RealView Debugger to capture profiling information for your application.

The dhrystone.axf image used in this example performs a benchmarking sample that is executed n number of times, where you are prompted to indicate n when running the image. By performing multiple runs, you can sum the profiling data. This example assumes that you want to analyze the execution times of all functions that are executed in the main for loop that is run repeatedly. It also assumes that you are using an ETM-based target.

This example is in two parts:

  1. Procedure for capturing profiling data

  2. Procedure for summing profiling data over multiple runs.

Procedure for capturing profiling data

To capture profiling information using the dhrystone example:

  1. Load the example image dhrystone.axf into the debugger. This file is located in the install_directory\RVDK\Examples\...\dhrystone\DebugRel directory. The tab for dhry_1.c is displayed in the File Editor pane.

  2. Select Edit → Advanced → Show Line Numbers from the Code window main menu to display the line numbers in dhry_1.c.

  3. Select View → Analysis Window from the Code window main menu to view the Analysis window, and position the window so that it does not cover the Code window.

  4. Configure the ETM for profiling:

    1. Select Edit → Configure Analyzer Properties... from the Analysis window main menu to display the Configure ETM dialog box (see Configuring the ETM for more details on this dialog box).

    2. To generate profiling information you must enable one of the settings that causes this to be captured. For RealView Trace, you can select either or both of the settings Enable Timestamping and Cycle accurate tracing.

      For this example, select Enable Timestamping.

      Note

      Be aware that selecting Cycle accurate tracing might give less accurate results than timestamping.

    3. Click OK to close the Configure ETM dialog box.

  5. Set a trace start point at the start of the program loop as follows:

    1. In the dhry_1.c source tab, scroll down the source file and right-click in the gray area to the left of the code listing in line 146 to display the context menu. This line represents the start of a for loop.

    2. Select Set/Toggle Tracepoint... from the context menu. A List Selection dialog box is displayed.

    3. Select Trace Start Point from the List Selection dialog box. An arrow is placed next to line 146 to indicate the start point you have set, and details of this control point are displayed in the Break/Tracepoints pane of the Code window, as shown in Figure 2.47.

    Figure 2.47. Setting a trace start point

    Setting a trace start point
  6. Set a trace end point after the end of the program loop as follows:

    1. Scroll down the source file and right-click in the gray area to the left of the code listing at line 204 to display the context menu.

      By placing the end point after the end of the loop, you ensure that RealView Debugger captures all the iterations of the loop, rather than a single iteration.

      In addition, the area bounded by the trace start and end points does not include any code that activates semihosting. The semihosting mechanism can affect the profiling results.

    2. Select Set/Toggle Tracepoint... from the context menu. A List Selection dialog box is displayed.

    3. Select Trace End Point from the List Selection dialog box. An arrow is placed next to line 204 to indicate the end point you have set, and details of this control point are displayed in the Break/Tracepoints pane of the Code window.

    Because you have set trace start and end points, and not a trace range, you are instructing RealView Debugger to capture and display all trace information between the start and end points, including any instructions that might be branched to between the points. For more details on these types of tracepoints, see Setting unconditional tracepoints.

  7. Select Debug → Run from the Code window main menu. The image is executed and a prompt is displayed in the Output pane at the bottom of the Code window.

  8. In the Output pane, enter the number of runs through the benchmark you want RealView Debugger to perform, in this case 5000, and press Enter. Trace capture begins for the area of execution you have specified using tracepoints, that is, for the for loop area of code. The Analysis window is updated with the results of the trace capture.

  9. In the Analysis window, click the Profile tab to display the profile data, as shown in Figure 2.48.

    Figure 2.48. Results displayed in the Profile tab

    Results displayed in the Profile tab

    In the Profile tab, as shown in Figure 2.48, the execution times for all functions accessed during the for loop are displayed, and a graphical representation of their respective execution times is shown in the Histogram column. For details on the types of information displayed in each column, see Columns.

  10. To view call-graph data for these results:

    1. Select Profiling Data → Parents of Function from the Analysis window main menu.

    2. Select Profiling Data → Children of Function from the Analysis window main menu.

    The Profile tab displays parents and children of each function, as shown in Figure 2.49.

    Figure 2.49. Call-graph data displayed in the Profile tab

    Call-graph data displayed in the Profile tab

    In addition to displaying the execution times for each function accessed during the for loop, the execution times of the parents and children of these functions are also displayed. Figure 2.50 shows the data for a single function, Func_2.

    Figure 2.50. Call-graph data for Func_2

    Call-graph data for Func_2

    The Exec% column shows that:

    • 6.92% of the total execution time was spent in code of the function Func_2 when called from the parent main.

    • 6.92% of the total execution time was spent in code of the function Func_2.

    • 1.41% of the total execution time was spent in code of the function Func_1 when called as a child from Func_2.

    • 14.22% of the total execution time was spent in code of the function strcmp when called as a child from Func_2.

    The B=>E% column shows that:

    • 22.56% of the total execution time was spent in calls to the function Func_2 and its children when called from the parent main.

    • 22.56% of the total execution time was spent in calls to the function Func_2 and its children.

    • 1.41% of the total execution time was spent in calls to the function Func_1 called as a child from Func_2.

    • 14.22% of the total execution time was spent in calls to the function strcmp called as a child from Func_2.

    The Count column shows that:

    • There were 5000 calls from the function main to the function Func_2.

    • There were 5000 calls to the function Func_2.

    • There were 5000 calls to the function Func_1 from the function Func_2.

    • There were 5000 calls to the function strcmp from the function Func_2.

  11. You can sum profiling data over additional runs of dhrystone as described in Procedure for summing profiling data over multiple runs.

Procedure for summing profiling data over multiple runs

To sum profiling data over multiple runs, do the following:

  1. If you have not yet done so, perform the procedure described in Procedure for capturing profiling data.

  2. Select Profiling Data → Sum Profiling Data from the Analysis window main menu.

  3. Select Debug → Set PC to Entry Point from the Code window main menu.

    Note

    If you reload the image, or load a new image, the trace details are cleared and Sum Profiling Data is disabled.

  4. Select Debug → Run from the Code window main menu.

  5. Enter 500 for the number of runs through the benchmark. The profiling data is updated as shown in Figure 2.51.

    Figure 2.51. Summing profiling data over multiple runs

    Summing profiling data over multiple runs

If you no longer want to sum profiling data, select Profiling Data → Sum Profiling Data from the Analysis window main menu to deselect it.

Observations on summing profiling data over multiple runs

Compare Figure 2.51 with Figure 2.48. The summing of profiling data reveals the following information:

Count column

The Count column reveals the following:

  • After the first run:

    • there were 15000 calls to functions Proc_7, Func_1

    • there were 5000 calls to all other functions.

  • After the second run:

    • there were 16500 calls to functions Proc_7, Func_1

    • there were 5500 calls to all other functions.

B=>E% column

The B=>E% column reveals the following:

  • The total execution time after two runs is equal to the total execution time of the first run, plus the total execution time of the second run.

  • The time taken to execute a particular function after two runs is equal to the time taken to execute it after the first run.

  • The percentage of the total execution time that is spent in the code of a particular function is the same as that after the first run. However, this might not always be the case.

Exec% column

The Exec% column reveals the following:

  • The total execution time after two runs is equal to the total execution time of the first run, plus the total execution time of the second run.

  • The time spent from entry to exit of a particular function after two runs is equal to the time spent from entry to exit of that function after the first run.

  • The percentage of the total execution time that is spent in calls to a particular function is the same as that after the first run. However, this might not always be the case.

Copyright © 2006 ARM Limited. All rights reserved.ARM DUI 0335A
Non-Confidential