2.4 Running a binary in an AEMv8-A Base Fixed Virtual Platform (FVP)

Describes how to compile a program with Arm® Compiler and then run the resulting binary using the AEMv8-A Base Fixed Virtual Platform (FVP). This demonstrates some basic features of the FVP and shows how increasing the SVE vector width reduces the number of instructions executed.

Running the FVP

The command to execute a compiled binary through the FVP is fairly complex, but there are only a few elements that can be edited.

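The following example shows a complete command-line invocation of the FVP. Most of the lines are required for correct program execution and do not need to be modified. The shell variables $FVP_BASE, $VECLEN, $CMDLINE, and $BINARY are the parameters that you can edit. They are described after the example, together with the --quiet and --stat options.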

$FVP_BASE/FVP_Base_AEMv8A-AEMv8A \
  --plugin $FVP_BASE/ScalableVectorExtension.so \
  -C SVE.ScalableVectorExtension.veclen=$VECLEN \
  --quiet \
  --stat \
  -C cluster0.NUM_CORES=1 \
  -C bp.secure_memory=0 \
  -C bp.refcounter.non_arch_start_at_default=1 \
  -C cluster0.cpu0.semihosting-use_stderr=1 \
  -C bp.vis.disable_visualisation=1 \
  -C cluster0.cpu0.semihosting-cmd_line="$CMDLINE" \
  -a cluster0.cpu0=$BINARY

Where:

$FVP_BASE

Specifies the path to the FVP.

$VECLEN

Defines the SVE vector width, in units of 64-bit (8-byte) blocks. The maximum value is 32, which corresponds to the architectural maximum SVE vector width of 2048 bits (256 bytes).

The SVE architecture only supports vector lengths in 128-bit (16-byte) increments, so all values of $VECLEN must be even. For example, a value of 8 signifies a 512-bit vector width.

--quiet

Specifies that the FVP emits reduced output. For example, if --quiet is omitted, the FVP prints the messages "Simulation is started" and "Simulation is terminating" to signify the start and end of program execution.

--stat

Specifies that the FVP writes a short summary of program execution to standard output following termination (even if --quiet is specified).

This output is of the form:

Total instructions executed: 10344
User time:    0.01 sec
Kernel time: 0.00 sec
CPU time:    0.01 sec
Elapsed clock: 0.00 sec

$CMDLINE

Specifies the command-line to pass to your program. This is typically of the form "./binary_name arg1 arg2".

$BINARY

Specifies the path to the compiled binary that will be loaded and executed by the FVP.
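Because $VECLEN counts 64-bit blocks, converting it to a vector width is a multiplication by 64, subject to the constraints described for $VECLEN above. The following C sketch illustrates the arithmetic. The function name veclen_to_bits is hypothetical, and the sketch assumes that the valid range is 2 to 32 (128 to 2048 bits, the architectural minimum and maximum SVE vector widths).

```c
#include <stdbool.h>

/* Hypothetical helper: returns the SVE vector width in bits for an FVP
 * veclen value (given in 64-bit units), or 0 if the value is not valid.
 * Assumed valid values: even (128-bit granularity) and from 2 to 32,
 * that is, 128 to 2048 bits. */
unsigned veclen_to_bits(unsigned veclen)
{
    bool in_range = veclen >= 2 && veclen <= 32;
    bool is_even = (veclen % 2) == 0;

    if (!in_range || !is_even)
        return 0;
    return veclen * 64u;
}
```

For example, veclen_to_bits(8) returns 512, and veclen_to_bits(7) returns 0 because 7 is odd.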

A sample application

The following sample application contains two vectorizable loops. The first fills the values array with floating-point values, and the second calculates their total. The application then calls printf to print the total, which produces output when the application is executed through the FVP.

#include <stdio.h>
#define ITERATIONS 8192
float values[ITERATIONS];
void fill()
{
  for (int i = 0; i < ITERATIONS; i++)
  {
    values[i] = (float)i;
  }
}
 
float reduce() {
  float result = 0.0;
  for (int i = 0; i < ITERATIONS; i++)
  {
    result += values[i];
  }
  return result;
}
 
int main(int argc, char* argv[]) {
  fill();
  printf("Result was %f\n", reduce());
}
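The value printed by this application is not necessarily the exact sum 0 + 1 + ... + 8191, which is 8191 × 8192 / 2 = 33550336. reduce() accumulates in single-precision float, and between 2^24 (16777216) and 2^25 a float can represent only even integers, so some additions are rounded once the running total passes 2^24. The following C sketch, with hypothetical function names, compares the exact integer result with a scalar, in-order float sum. It is not the code that the compiler generates for SVE, so its float result is not guaranteed to match the output of the FVP.

```c
#define ITERATIONS 8192

/* Exact sum of 0 + 1 + ... + (ITERATIONS - 1), using integer arithmetic. */
long long exact_sum(void)
{
    long long n = ITERATIONS;
    return n * (n - 1) / 2;
}

/* Scalar, in-order single-precision sum of the same values.
 * Between 2^24 and 2^25, float can only represent even integers,
 * so additions of odd values are rounded once the total passes 2^24. */
float sequential_float_sum(void)
{
    float result = 0.0f;
    for (int i = 0; i < ITERATIONS; i++)
        result += (float)i;
    return result;
}
```

Each rounded addition is off by at most 1, so the float total stays within a few thousand of the exact value.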

To compile this application, saved as sum.c, into an executable binary named sum:

armclang -O3 -Xlinker "--ro_base=0x80000000" --target=aarch64-arm-none-eabi \
   -march=armv8-a+sve -o sum sum.c
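To check that the -march=armv8-a+sve option took effect, and to observe the vector width that the FVP provides at run time, a program can use the Arm C Language Extensions (ACLE) for SVE. This sketch assumes an ACLE-conforming compiler that defines __ARM_FEATURE_SVE when SVE is enabled and provides arm_sve.h with the svcntw() intrinsic. The function name sve_float_lanes is hypothetical. When the sketch is built without SVE, for example for a host machine, it returns 0.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: returns the number of 32-bit (float) lanes in one
 * SVE vector, or 0 if this file was not compiled with SVE enabled.
 * ACLE-conforming compilers define __ARM_FEATURE_SVE when SVE code
 * generation is enabled, for example with -march=armv8-a+sve. */
uint64_t sve_float_lanes(void)
{
#if defined(__ARM_FEATURE_SVE)
    return svcntw();  /* 32-bit elements per vector, read at run time */
#else
    return 0;
#endif
}
```

With $VECLEN set to 2 (128 bits), one vector holds four 32-bit floats, so svcntw() would be expected to return 4. With $VECLEN set to 32 (2048 bits), it would be expected to return 64.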

Running the sample application on an FVP

To execute an application using an FVP, it is useful to construct a shell script as follows:

#!/bin/bash
# fvp-run.sh
# Usage: fvp-run.sh [veclen] [binary]
#    Executes the specified binary in the FVP, with no command-line
#    arguments.  The SVE register width will be [veclen] x 64 bits. Only
#    even values of veclen are valid.
#
#
# Set the FVP_BASE environment variable to point to the FVP directory.
#
# Set the ARMLMD_LICENSE_FILE environment variable to reference a license    
# file or license server with entitlement for the FVP.
 
if [ $# -ne 2 ]; then
   echo "Usage: fvp-run.sh [veclen] [binary]" >&2
   exit 1
fi
 
VECLEN=$1
BINARY=$2
# Pass only the binary path as the command line, so the program
# receives no arguments.
CMDLINE="$BINARY"
 
$FVP_BASE/FVP_Base_AEMv8A-AEMv8A \
   --plugin $FVP_BASE/ScalableVectorExtension.so \
   -C SVE.ScalableVectorExtension.veclen=$VECLEN \
   --quiet \
   --stat \
   -C cluster0.NUM_CORES=1 \
   -C bp.secure_memory=0 \
   -C bp.refcounter.non_arch_start_at_default=1 \
   -C cluster0.cpu0.semihosting-use_stderr=1 \
   -C bp.vis.disable_visualisation=1 \
   -C cluster0.cpu0.semihosting-cmd_line="$CMDLINE" \
   -a cluster0.cpu0="$BINARY"

This script loads and executes a compiled binary on the FVP, configured for the specified vector width. Make the script executable (chmod +x fvp-run.sh) before running it.

Running the compiled binary through the FVP generates output of the form:

$ ./fvp-run.sh 2 ./sum
terminal_3: Listening for serial connection on port 5000
terminal_2: Listening for serial connection on port 5001
terminal_1: Listening for serial connection on port 5002
terminal_0: Listening for serial connection on port 5003
Result was 33549136.000000
 
Total instructions executed:      62090
User time:   0.01 sec
Kernel time: 0.01 sec
CPU time:    0.02 sec
Elapsed clock: 0.00 sec

The first line is the command-line invocation, passing a $VECLEN value of 2 (a 128-bit vector width) and the application binary ./sum. The line starting "Result was" is generated by the application. The remainder of the output is the result of specifying the --stat option.
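Only the instruction count is needed to compare runs at different vector widths. The following C sketch extracts it from a --stat summary line, as an alternative to filtering the output with grep and cut. It assumes the "Total instructions executed:" line format shown above, and the function name parse_instruction_count is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the count from a --stat line such as
 * "Total instructions executed: 10344". Returns -1 if the line does not
 * have that form. In a scanf format, a space matches any amount of
 * whitespace, and %ld skips leading whitespace, so padding before the
 * number is accepted. */
long parse_instruction_count(const char *line)
{
    long count;

    if (sscanf(line, "Total instructions executed: %ld", &count) != 1)
        return -1;
    return count;
}
```

For example, parse_instruction_count("Total instructions executed:      62090") returns 62090, and a line such as "User time:    0.01 sec" returns -1.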

Varying the vector width

Varying the SVE vector width changes the total instruction count: the wider the SVE vector, the fewer instructions are needed to process the array. The following bash command line executes the binary with every valid $VECLEN value from 2 to 32, extracting and printing the instruction count reported by the FVP --stat option.

$ for x in {2..32..2};
  do echo -ne VL=$x\\t;
  ./fvp-run.sh $x ./sum | grep 'instructions' | cut -f2 -d:;
  done
VL=2          62090
VL=4          50826
VL=6          47075
VL=8          45194
VL=10         44072
VL=12         43324
VL=14         42785
VL=16         42378
VL=18         42070
VL=20         41817
VL=22         41619
VL=24         41443
VL=26         41300
VL=28         41179
VL=30         41179
VL=32         41179
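The shrinking counts follow from the number of loop iterations that each vector width needs. A vector of $VECLEN × 64 bits holds 2 × $VECLEN floats, so filling or summing the 8192-element array takes ceil(8192 / (2 × $VECLEN)) iterations: 2048 at VL=2 but only 128 at VL=32. The following C sketch models only that trip count, using hypothetical function names. It assumes that each loop is vectorized as a single predicated loop and ignores code that does not depend on the vector width, such as startup code and printf, which is likely one reason the measured totals level off instead of shrinking in proportion. It also computes the overall reduction between the VL=2 and VL=32 rows of the table.

```c
/* Hypothetical model: iterations of a predicated SVE loop over n floats,
 * for a vector of veclen x 64 bits. Each vector holds
 * veclen * 64 / 32 = 2 * veclen floats. Predication handles the final
 * partial vector, so the count is rounded up. */
unsigned vector_loop_iterations(unsigned veclen, unsigned n)
{
    unsigned lanes = veclen * 2u;
    return (n + lanes - 1u) / lanes;
}

/* Percentage reduction from one measured instruction count to another. */
double percent_reduction(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

For example, percent_reduction(62090, 41179) is about 33.7, so the widest vector executes roughly a third fewer instructions than the narrowest, even though the loop trip count falls by a factor of 16.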
Non-Confidential. PDF version 100891_0609_00_en
Copyright © 2016, 2017 Arm Limited (or its affiliates). All rights reserved.