10.9.2 Modeling Cycles Per Instruction (CPI)

This section demonstrates how to precisely model the simulated time per instruction by using the CPI timing annotation feature.

CPI parameters

You can specify a single CPI value for all instructions that execute within a cluster. This value is referred to as a fixed CPI value. Alternatively, you can use a custom CPI file to define individual CPI values for specific instructions. Use a fixed CPI value instead of a CPI file when precise per-instruction modeling is not required.

When running a simulation with either of these options, you can calculate the average CPI value using the formula that is shown in Calculating the average CPI value.

Note:

You can combine the CPI specification with other timing annotation features. Therefore, the average CPI value that you observe can be different from the fixed CPI value that you specify.

Specifying a fixed CPI value

You can specify a fixed CPI value by using the per-cluster model parameters cpi_mul and cpi_div.

These parameters are documented in the Fast Models Reference Manual. By default, a fixed CPI value of 1.00 is used. Both parameters must be integers. The fixed CPI value is their ratio, cpi_mul / cpi_div, so any rational CPI value can be expressed. The value applies to all instructions that execute within that cluster. The simulated time for each instruction is core_clock_period * fixed_cpi_value, rounded to the nearest picosecond.
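The rounding rule can be illustrated with a minimal Python sketch. It assumes that the fixed CPI value is the ratio cpi_mul / cpi_div, as in the fixed CPI run later in this section, and that one tick is 10000ps, as in the traces shown later. The function name is hypothetical and is not part of Fast Models. How the model rounds exact halves is not stated here, so the round-half-up choice is also an assumption.

```python
def time_per_instruction_ps(clock_period_ps, cpi_mul, cpi_div):
    """Return clock_period_ps * (cpi_mul / cpi_div), rounded to the nearest ps.

    Hypothetical helper for illustration only. Integer arithmetic is used so
    that no floating-point error is introduced before rounding.
    """
    if cpi_div <= 0 or cpi_mul < 0:
        raise ValueError("cpi_mul must be >= 0 and cpi_div must be > 0")
    numerator = clock_period_ps * cpi_mul
    # Round half up (assumption: tie handling in the model is not documented here).
    return (2 * numerator + cpi_div) // (2 * cpi_div)


CLOCK_PERIOD_PS = 10_000  # assumption: 1 tick = 10000ps

print(time_per_instruction_ps(CLOCK_PERIOD_PS, 1, 1))          # 10000
print(time_per_instruction_ps(CLOCK_PERIOD_PS, 3, 4))          # 7500
print(time_per_instruction_ps(CLOCK_PERIOD_PS, 36200, 47701))  # 7589
```

The last call uses the cpi_mul and cpi_div values from the fixed CPI example later in this section and gives the 7589ps per instruction that is reported there.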

Example CPI file

CPI files can be large because they have to cover multiple encodings for many of the instructions that are included. Various predefined encodings are provided under $PVLIB_HOME/etc/CPIPredefines/ that can help you to create CPI files. This tutorial does not use predefined encodings.

The following example defines CPI values for the instructions ADRP, ADR, ADD, CMP, ORR, LDP, STR, branches, exception generating instructions, and system instructions. It defines a default CPI value of 0.75 for all other instructions. It applies to the A64 instruction set, and does not restrict the values to a specific core.

Note:

These CPI values are an example only. They are arbitrary and are not representative of any ARM processor.
# -------------------
# Instruction classes
# -------------------
## PC-relative addressing
DefineClass ADRP                   Mask=0x9F000000 Value=0x90000000 ISet=A64
DefineClass ADR                    Mask=0x9F000000 Value=0x10000000 ISet=A64 
## Arithmetic
DefineClass ADD_ext_reg            Mask=0x7FE00000 Value=0x0B200000 ISet=A64
DefineClass ADD_sft_reg            Mask=0x7F200000 Value=0x0B000000 ISet=A64
DefineClass ADD_imm                Mask=0x7F000000 Value=0x11000000 ISet=A64
DefineClass CMP_ext_reg            Mask=0x7FE0001F Value=0x6B20001F ISet=A64
DefineClass CMP_sft_reg            Mask=0x7F20001F Value=0x6B00001F ISet=A64
DefineClass CMP_imm                Mask=0x7F00001F Value=0x7100001F ISet=A64 
## Logical
DefineClass ORR_sft_reg            Mask=0x7F200000 Value=0x2A000000 ISet=A64
DefineClass ORR_imm                Mask=0x7F800000 Value=0x32000000 ISet=A64 
## Branches, exception generating and system instructions
DefineClass B_gen_except_sys       Mask=0x1C000000 Value=0x14000000 ISet=A64 
## Load register pair
DefineClass LDP_post_idx           Mask=0x7FC00000 Value=0x28C00000 ISet=A64
DefineClass LDP_pre_idx            Mask=0x7FC00000 Value=0x29C00000 ISet=A64
DefineClass LDP_sgn_off            Mask=0x7FC00000 Value=0x29400000 ISet=A64 
## Store register
DefineClass STR_reg                Mask=0xBFE00C00 Value=0xB8200000 ISet=A64
DefineClass STR_imm_post_idx       Mask=0xBFE00C00 Value=0xB8000400 ISet=A64
DefineClass STR_imm_pre_idx        Mask=0xBFE00C00 Value=0xB8000C00 ISet=A64
DefineClass STR_imm_usg_off        Mask=0xBFC00000 Value=0xB9000000 ISet=A64 
# ------------------
# Instruction groups
# ------------------
DefineGroup PC_rel_addr_instr      Classes=ADRP,ADR                             ISet=A64
DefineGroup ADD_instr              Classes=ADD_ext_reg,ADD_sft_reg,ADD_imm      ISet=A64
DefineGroup CMP_instr              Classes=CMP_ext_reg,CMP_sft_reg,CMP_imm      ISet=A64
DefineGroup ORR_instr              Classes=ORR_sft_reg,ORR_imm                  ISet=A64
DefineGroup B_gen_except_sys_instr Classes=B_gen_except_sys                     ISet=A64
DefineGroup LDP_instr              Classes=LDP_post_idx,LDP_pre_idx,LDP_sgn_off ISet=A64
DefineGroup STR_instr              Classes=STR_reg,STR_imm_post_idx,STR_imm_pre_idx,STR_imm_usg_off ISet=A64
# ----------
# CPI values
# ----------
DefineCpi   PC_rel_addr_instr      ISet=A64 Cpi=0.25
DefineCpi   ADD_instr              ISet=A64 Cpi=0.50
DefineCpi   CMP_instr              ISet=A64 Cpi=0.75
DefineCpi   ORR_instr              ISet=A64 Cpi=0.50
DefineCpi   B_gen_except_sys_instr ISet=A64 Cpi=1.00
DefineCpi   LDP_instr              ISet=A64 Cpi=2.00
DefineCpi   STR_instr              ISet=A64 Cpi=1.00 
# --------
# Defaults
# --------
Defaults ISet=* Cpi=0.75

Defining CPI values in a CPI file

To define CPI values in a CPI file, use the following procedure for each instruction or set of instructions:

Procedure

  1. Create an instruction class for each encoding of an instruction or set of instructions by using the DefineClass keyword.
  2. Group instruction classes by using the DefineGroup keyword.
  3. Set a CPI value for each instruction class or group of classes by using the DefineCpi keyword.

The encodings for each instruction in the A64 instruction set are provided by the ARMv8-A Architecture Reference Manual, section C6.2. Also, groups of instructions that share encodings are described in chapter C4. You can use these encodings to define the Mask and Value fields in the CPI file.

The Mask field must cover all bits that are fixed in the encoding of an instruction. The Value field must specify the value of these bits. For example, section C4.2.6 of the ARMv8-A Architecture Reference Manual defines a set of instructions called PC-rel. addressing. In the example CPI file, the following statements specify a common CPI value for these instructions:

DefineClass ADRP Mask=0x9F000000 Value=0x90000000 ISet=A64
DefineClass ADR Mask=0x9F000000 Value=0x10000000  ISet=A64
DefineGroup PC_rel_addr_instr Classes=ADRP,ADR    ISet=A64
DefineCpi PC_rel_addr_instr ISet=A64 Cpi=0.25

For both instruction classes, the Mask value has bit[31] set to 0b1 and bits [28:24] set to 0b11111. As shown in the reference manual, a value of 0b10000 for bits [28:24] identifies the instruction as being ADR or ADRP. Therefore, both Value fields set bits [28:24] to 0b10000. Bit[31] distinguishes between ADR and ADRP, so bit[31] in the Value field for ADR is set to 0b0 and to 0b1 for ADRP.

This specification allows the model to apply a CPI value of 0.25 to the PC_rel_addr_instr group of instructions. The Mask and Value fields for the other instructions in the example CPI file were determined in the same way.
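The following Python sketch illustrates how a Mask and Value pair selects an opcode. It assumes that an opcode belongs to a class when (opcode AND Mask) equals Value, which is how the fields are described above. The table holds only a subset of the classes from the example CPI file, and the function name is hypothetical. The two opcodes are taken from the instruction trace shown later in this section. The sketch does not model how Fast Models resolves an opcode that matches more than one class.

```python
# Subset of the (Mask, Value) pairs from the example CPI file.
CLASSES = {
    "ADRP": (0x9F000000, 0x90000000),
    "ADR": (0x9F000000, 0x10000000),
    "ADD_imm": (0x7F000000, 0x11000000),
    "B_gen_except_sys": (0x1C000000, 0x14000000),
}


def matching_classes(opcode):
    """Return the names of all classes whose fixed bits match the opcode."""
    return [
        name
        for name, (mask, value) in CLASSES.items()
        if (opcode & mask) == value
    ]


# 0x90000020 is "ADRP x0,{pc}+0x4000" in the trace.
print(matching_classes(0x90000020))  # ['ADRP']
# 0x910003fd is "MOV x29,sp" in the trace, an alias of ADD (immediate).
print(matching_classes(0x910003FD))  # ['ADD_imm']
```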

Validating a CPI file

To validate CPI files, use the CPIValidator tool. You can find this tool with the Fast Models Tools under $MAXCORE_HOME/bin/. The tool can detect missing or incompatible instruction groups and classes, but cannot validate the encodings themselves.

For example, if you remove the DefineClass statement for the B_gen_except_sys instruction class, and validate the example CPI file by using the following command:

CPIValidator --input-file /path/to/custom_cpi.txt --output-file cpi_evaluation.txt

the tool produces the following output:

ERROR: Instruction Class 'B_gen_except_sys' has no definition, when Instruction Set is 'A64' and the CPU Type is 'Default ARM Core'.
ERROR: Processing error in file /path/to/custom_cpi.txt

Using the tool with the complete CPI file produces the following output:

Core Performance Profile: Default ARM Core
--------------------------------------------------------------------------------
Instruction Set: A32 Default Cpi:0.75
Instruction Set: A64 Default Cpi:0.75
    (0x1c000000|0x14000000) Cpi:1 Name:B_gen_except_sys
    (0x7f000000|0x11000000) Cpi:0.5 Name:ADD_imm
    (0x7f00001f|0x7100001f) Cpi:0.75 Name:CMP_imm
    (0x7f200000|0x0b000000) Cpi:0.5 Name:ADD_sft_reg
    (0x7f200000|0x2a000000) Cpi:0.5 Name:ORR_sft_reg
    (0x7f20001f|0x6b00001f) Cpi:0.75 Name:CMP_sft_reg
    (0x7f800000|0x32000000) Cpi:0.5 Name:ORR_imm
    (0x7fc00000|0x28c00000) Cpi:2 Name:LDP_post_idx
    (0x7fc00000|0x29400000) Cpi:2 Name:LDP_sgn_off
    (0x7fc00000|0x29c00000) Cpi:2 Name:LDP_pre_idx
    (0x7fe00000|0x0b200000) Cpi:0.5 Name:ADD_ext_reg
    (0x7fe0001f|0x6b20001f) Cpi:0.75 Name:CMP_ext_reg
    (0x9f000000|0x10000000) Cpi:0.25 Name:ADR
    (0x9f000000|0x90000000) Cpi:0.25 Name:ADRP
    (0xbfc00000|0xb9000000) Cpi:1 Name:STR_imm_usg_off
    (0xbfe00c00|0xb8000400) Cpi:1 Name:STR_imm_post_idx
    (0xbfe00c00|0xb8000c00) Cpi:1 Name:STR_imm_pre_idx
    (0xbfe00c00|0xb8200000) Cpi:1 Name:STR_reg
Instruction Set: Thumb Default Cpi:0.75
Instruction Set: T2EE Default Cpi:0.75

The example CPI file and the CPIValidator output are provided in $PVLIB_HOME/images/source/ta_cpi/ as custom_cpi.txt and cpi_evaluation.txt respectively.
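Because CPIValidator does not validate the encodings themselves, a small independent check can catch one simple class of mistake: a Value that sets a bit that its Mask does not cover. Assuming that a class matches when (opcode AND Mask) equals Value, such a class can never match. The following Python sketch is a hypothetical helper, not part of Fast Models. The regular expression handles only the DefineClass form used in the example CPI file, and the ADR_bad line is a deliberately broken entry added for illustration.

```python
import re

# Matches lines such as:
#   DefineClass ADRP Mask=0x9F000000 Value=0x90000000 ISet=A64
DEFINE_CLASS = re.compile(
    r"^DefineClass\s+(\w+)\s+Mask=(0x[0-9A-Fa-f]+)\s+Value=(0x[0-9A-Fa-f]+)"
)


def find_unmatchable_classes(cpi_text):
    """Return names of classes whose Value has bits set outside the Mask."""
    bad = []
    for line in cpi_text.splitlines():
        found = DEFINE_CLASS.match(line.strip())
        if not found:
            continue  # comments, DefineGroup, DefineCpi, Defaults, blank lines
        name = found.group(1)
        mask = int(found.group(2), 16)
        value = int(found.group(3), 16)
        if value & ~mask:
            bad.append(name)
    return bad


SAMPLE = """\
## PC-relative addressing
DefineClass ADRP    Mask=0x9F000000 Value=0x90000000 ISet=A64
DefineClass ADR_bad Mask=0x9F000000 Value=0x10000001 ISet=A64
"""

print(find_unmatchable_classes(SAMPLE))  # ['ADR_bad']
```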

CPI class example program

The example program is designed to show the effect of the CPI values that are specified in the example CPI file that was described previously.

It includes the following sequence of embedded assembly code that uses instructions for which specific CPI values were defined:

	.section asm_func, "ax"
	.global  asm_cpi
	.type    asm_cpi, "function"
asm_cpi:
	ldp  w1, w2, [x0]
	cmp  w1, w2
	b.gt skip
	orr  w1, w1, w2
	str  w1, [x0]
skip:
	ret

This sequence loads the two values of a two-element array that is pointed to by the address in x0, and compares them. If the first value is not greater than the second value, using a signed comparison, it performs a bitwise OR operation on the two values and stores the result as the new first value. Otherwise, it skips the OR and the store. The rest of this section examines this sequence by running the example on the EVS_Base_Cortex-A73x1 platform model with the following CPI configurations:

  • Using the default CPI value.
  • Using the custom CPI file that was described earlier in the tutorial.
  • Using a fixed CPI value.

The CPI class example is based on the DS-5 example startup_AEMv8-FVP_AArch64_AC6. The binary file is $PVLIB_HOME/images/ta_cpi.axf, and the source code is available under $PVLIB_HOME/images/source/ta_cpi/.

Running the example with the default CPI value

If you do not specify any CPI parameters, a default CPI value of 1.00 is used. This value establishes a baseline to compare the other CPI configurations with.

To use the default CPI value of 1.00, launch the model using the following command:

$PVLIB_HOME/examples/SystemCExport/EVS_Platforms/EVS_Base/Build_Cortex-A73x1/EVS_Base_Cortex-A73x1.x \
-C Base.bp.secure_memory=0 \
--plugin=$PVLIB_HOME/plugins/Linux64_GCC-4.8/GenericTrace.so \
-C TRACE.GenericTrace.trace-sources=INST \
-C TRACE.GenericTrace.trace-file=trace.txt \
-a $PVLIB_HOME/images/ta_cpi.axf \
--stat

In the trace file that the GenericTrace plugin produces, find the instruction at address 0x800005a4. The trace for this instruction and the one before it is as follows:

INST: PC=0x00000000800005a0 OPCODE=0x910003fd SIZE=0x04 MODE=EL1h ISET=AArch64 
PADDR=0x00000000800005a0 NSDESC=0x01 PADDR2=0x00000000800005a0 NSDESC2=0x01 NS=0x01 
ITSTATE=0x00 INST_COUNT=0x000000000000b7bc LOCAL_TIME=0x0000000000007530 
CURRENT_TIME=0x000000001c091fc0 CORE_NUM=0x00 DISASS="MOV      x29,sp"

INST: PC=0x00000000800005a4 OPCODE=0x90000020 SIZE=0x04 MODE=EL1h ISET=AArch64 
PADDR=0x00000000800005a4 NSDESC=0x01 PADDR2=0x00000000800005a4 NSDESC2=0x01 NS=0x01 
ITSTATE=0x00 INST_COUNT=0x000000000000b7bd LOCAL_TIME=0x0000000000009c40 
CURRENT_TIME=0x000000001c0946d0 CORE_NUM=0x00 DISASS="ADRP     x0,{pc}+0x4000 ; 0x800045a4"

The difference between the two CURRENT_TIME values is 0x2710, which is 10000ps or 1 tick. This shows that the instruction took one tick to complete, so the default CPI value of 1.00 is being used. You can verify that all other instructions also use the default CPI value by examining the trace.
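The time difference can also be computed from the trace with a short script. The following Python sketch is illustrative only. The two records are trimmed to the PC and CURRENT_TIME fields of the trace above, the function name is hypothetical, and the clock period of 10000ps per tick is taken from the text above.

```python
import re

# Trace records from above, trimmed to the fields that are used here.
RECORDS = [
    "INST: PC=0x00000000800005a0 CURRENT_TIME=0x000000001c091fc0",
    "INST: PC=0x00000000800005a4 CURRENT_TIME=0x000000001c0946d0",
]

CLOCK_PERIOD_PS = 10_000  # 1 tick = 10000ps


def current_time_ps(record):
    """Extract the CURRENT_TIME field of an INST record as an integer (ps)."""
    found = re.search(r"CURRENT_TIME=(0x[0-9a-fA-F]+)", record)
    if found is None:
        raise ValueError("record has no CURRENT_TIME field")
    return int(found.group(1), 16)


delta_ps = current_time_ps(RECORDS[1]) - current_time_ps(RECORDS[0])
print(delta_ps)                    # 10000
print(delta_ps / CLOCK_PERIOD_PS)  # 1.0
```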

Running the example with a custom CPI file

To use the custom CPI file, launch the model using the following command:

$PVLIB_HOME/examples/SystemCExport/EVS_Platforms/EVS_Base/Build_Cortex-A73x1/EVS_Base_Cortex-A73x1.x \
-C Base.bp.secure_memory=0 \
--plugin=$PVLIB_HOME/plugins/Linux64_GCC-4.8/GenericTrace.so \
-C TRACE.GenericTrace.trace-sources=INST \
-C TRACE.GenericTrace.trace-file=trace.txt \
-a $PVLIB_HOME/images/ta_cpi.axf \
--cpi-file $PVLIB_HOME/images/source/ta_cpi/custom_cpi.txt \
--stat

Using the trace output that the GenericTrace plugin produces for the 10 instructions starting at address 0x800005a4, and the --stat output, the following information can be obtained for the embedded assembly code sequence in the example program:

Table 10-3 CPI values for embedded assembly instructions

Address     Instruction          Simulated time (ps)  CPI value observed
0x800005a4  ADRP x0,{pc}+0x4000  2500                 0.25
0x800005a8  ADD x0,x0,#0x9f0     5000                 0.50
0x800005ac  ADD x1,x0,#4         5000                 0.50
0x800005b0  BL {pc}+0x4294       10000                1.00
0x80004844  LDP w1,w2,[x0,#0]    20000                2.00
0x80004848  CMP w1,w2            7500                 0.75
0x8000484c  B.GT {pc}+0xc        10000                1.00
0x80004850  ORR w1,w1,w2         5000                 0.50
0x80004854  STR w1,[x0,#0]       10000                1.00
0x80004858  RET                  10000                1.00

This table shows that the CPI values that are defined in the example CPI file have been applied to the appropriate instructions.
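The observed CPI column follows from dividing each simulated time by the clock period. The following Python sketch repeats that arithmetic for the rows of Table 10-3 and compares the result with the values in the example CPI file. It assumes a clock period of 10000ps per tick, as seen in the default CPI run. The mnemonic-to-CPI mapping is a simplification for illustration, with BL, B.GT, and RET all falling in the B_gen_except_sys_instr group.

```python
CLOCK_PERIOD_PS = 10_000  # assumption: 1 tick = 10000ps

# (mnemonic, simulated time in ps) for each row of Table 10-3.
OBSERVED = [
    ("ADRP", 2500), ("ADD", 5000), ("ADD", 5000), ("BL", 10000),
    ("LDP", 20000), ("CMP", 7500), ("B.GT", 10000), ("ORR", 5000),
    ("STR", 10000), ("RET", 10000),
]

# CPI values from the example CPI file, keyed by mnemonic for simplicity.
CPI_FILE = {
    "ADRP": 0.25, "ADD": 0.50, "CMP": 0.75, "ORR": 0.50,
    "BL": 1.00, "B.GT": 1.00, "RET": 1.00, "LDP": 2.00, "STR": 1.00,
}

observed_cpi = [(name, time_ps / CLOCK_PERIOD_PS) for name, time_ps in OBSERVED]

for name, cpi in observed_cpi:
    status = "ok" if cpi == CPI_FILE[name] else "MISMATCH"
    print(f"{name:5s} {cpi:.2f} {status}")
```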

The following information can be obtained for the simulation as a whole:

Table 10-4 Statistics for the whole simulation

Total number of instructions  Overall simulated time in seconds  Average CPI value
47701                         0.000362                           0.75889

Note:

The average CPI value being close to the default CPI value specified in the CPI file does not signify anything by itself. To draw any conclusions from it, further analysis on the distribution of instructions would be required.
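The average CPI value in Table 10-4 can be reproduced from the other two columns. The following Python sketch assumes that the average CPI value is the overall simulated time divided by the product of the clock period and the instruction count, with a clock period of 10000ps per tick. The formula in Calculating the average CPI value is not reproduced in this section, so treat this as an assumption that happens to agree with the table. The function name is hypothetical.

```python
from fractions import Fraction


def average_cpi(simulated_time_ps, clock_period_ps, instruction_count):
    """Return the average CPI as an exact fraction (ticks per instruction)."""
    ticks = Fraction(simulated_time_ps, clock_period_ps)
    return ticks / instruction_count


# 0.000362s is 362,000,000ps; 47701 instructions were executed.
avg = average_cpi(362_000_000, 10_000, 47701)

print(avg)                   # 36200/47701
print(round(float(avg), 5))  # 0.75889
```

Keeping the result as a fraction gives the numerator and denominator directly, which is convenient when an average is later applied through integer parameters.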

Running the example with a fixed CPI value

The average CPI value that was observed when running the example program with the custom CPI file is approximately 0.75889. Fractionally, the exact value is 36200/47701.

This fraction can be applied to the simulation by using the cpi_mul and cpi_div model parameters as follows:

$PVLIB_HOME/examples/SystemCExport/EVS_Platforms/EVS_Base/Build_Cortex-A73x1/EVS_Base_Cortex-A73x1.x \
-C Base.bp.secure_memory=0 \
--plugin=$PVLIB_HOME/plugins/Linux64_GCC-4.8/GenericTrace.so \
-C TRACE.GenericTrace.trace-sources=INST \
-C TRACE.GenericTrace.trace-file=trace.txt \
-C Base.cluster0.cpi_mul=36200 \
-C Base.cluster0.cpi_div=47701 \
-a $PVLIB_HOME/images/ta_cpi.axf \
--stat

For each instruction, a simulated time of 7589ps or 0.7589 ticks can be observed using the GenericTrace plugin. The --stat output is as follows and shows the same simulated time value as that obtained using the custom CPI file:

--- Base statistics: ----------------------------------------------------------
Simulated time                          : 0.000362s
User time                               : 0.171601s
System time                             : 0.015601s
Wall time                               : 0.196000s
Performance index                       : 0.00
Base.cluster0.cpu0                      : 0.25 MIPS (47701 Inst)

In this case, the overall simulated time is the same because the fixed CPI value was derived from the average CPI value that the same application produced with the custom CPI file. However, the average CPI value for one application is not necessarily an accurate approximation of the average CPI value for a different application.

For example, running the branch prediction example application, described in the next section, clearly shows this difference. Specifying a branch misprediction latency increases the overall simulated time, and therefore gives an average CPI value that differs from the fixed CPI value that was specified. Using the custom CPI file produces a more accurate average CPI value for the branch prediction example.

Table 10-5 CPI values for simulation with branch prediction latency

Branch prediction example CPI configuration                                      Overall simulated time in seconds  Average CPI value
Using the average CPI value that was observed in the CPI class example program.  0.001726                           1.00754
Using the custom CPI file.                                                       0.001945                           1.13538
Non-Confidential ARM 100965_1101_00_en
Copyright © 2014–2017 ARM Limited or its affiliates. All rights reserved.