2.2 Compiling C and C++ code for SVE-enabled targets

Arm® Compiler is an advanced auto-vectorizing compiler for the 64-bit Armv8‑A architecture, and supports the Scalable Vector Extension (SVE) architectural extension.

Generating SVE assembly code from C and C++ code

Arm Compiler can produce annotated assembly code. Inspecting this output is a good first step toward seeing how the auto-vectorizer generates SVE instructions.

The following C program subtracts corresponding elements in two arrays, writing the result to a third array. The three pointer parameters are declared using the restrict keyword, indicating to the compiler that the arrays they point to do not overlap in memory.

// example1.c
#define ARRAYSIZE 1024
int a[ARRAYSIZE];
int b[ARRAYSIZE];
int c[ARRAYSIZE];

void subtract_arrays(int *restrict a, int *restrict b, int *restrict c)
{
  for (int i = 0; i < ARRAYSIZE; i++)
    a[i] = b[i] - c[i];
}

int main()
{
  subtract_arrays(a, b, c);
}
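
To illustrate why the restrict qualifiers matter, the following minimal sketch uses the same loop body without them. The function name subtract_arrays_may_alias and the length parameter n are hypothetical and are not part of example1.c.

```c
#include <assert.h>
#include <stddef.h>

/* Same loop body as subtract_arrays, but without restrict, so the
   compiler must allow for a, b, and c overlapping in memory.
   The name and the n parameter are hypothetical. */
void subtract_arrays_may_alias(int *a, const int *b, const int *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; i++)
        a[i] = b[i] - c[i];   /* may read a value written by an earlier iteration */
}
```

If this function is called with a pointing one element past b, for example subtract_arrays_may_alias(buf + 1, buf, ones, 8) with buf[0] set to 10 and every element of ones set to 1, each iteration reads the value that the previous iteration wrote, and buf ends up holding 10, 9, 8, and so on down to 2. A vector loop that loads several elements of b before storing to a would not reproduce that result. Without restrict, a compiler typically has to prove that the arrays do not overlap, add a runtime overlap check, or keep the loop scalar.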

Compile example1.c as follows:

armclang -O3 -S --target=aarch64-arm-none-eabi -march=armv8-a+sve -o example1.s example1.c

The output assembly code is saved as example1.s. The section of the generated assembly language file containing the compiled subtract_arrays function appears as follows:

subtract_arrays:                        // @subtract_arrays
// BB#0:
        orr     w9, wzr, #0x400
        mov     x8, xzr
        whilelo p0.s, xzr, x9
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        ld1w    {z0.s}, p0/z, [x1, x8, lsl #2]
        ld1w    {z1.s}, p0/z, [x2, x8, lsl #2]
        sub     z0.s, z0.s, z1.s
        st1w    {z0.s}, p0, [x0, x8, lsl #2]
        incw    x8
        whilelo p0.s, x8, x9
        b.mi    .LBB0_1
// BB#2:

SVE instructions operate on the z and p register banks. In this example the inner loop is almost entirely composed of SVE instructions. The auto-vectorizer has converted the scalar loop from the original C source code into a vector loop that is independent of the width of SVE vector registers.
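
The structure of this vector-length-independent loop can be modelled in portable scalar C. The following is a rough sketch for illustration only, not compiler output: the function name subtract_arrays_model and the lanes parameter (standing in for the number of 32-bit elements in one SVE vector) are assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAYSIZE 1024

/* Scalar model of the predicated SVE loop in the listing above.
   `lanes` is a hypothetical stand-in for the number of 32-bit
   elements held by one SVE vector register. */
void subtract_arrays_model(int *restrict a, const int *restrict b,
                           const int *restrict c, size_t lanes)
{
    assert(lanes > 0);                  /* a zero-width vector would never advance */

    size_t i = 0;                       /* plays the role of x8 */
    do {
        for (size_t lane = 0; lane < lanes; lane++) {
            /* whilelo: a lane is active only while i + lane < ARRAYSIZE */
            if (i + lane < ARRAYSIZE)
                a[i + lane] = b[i + lane] - c[i + lane];  /* ld1w, ld1w, sub, st1w */
        }
        i += lanes;                     /* incw x8 */
    } while (i < ARRAYSIZE);            /* b.mi: repeat while the first lane is active */
}
```

The same function produces the same results for any positive lanes value, including values that do not divide ARRAYSIZE evenly (for example 12, corresponding to a 384-bit vector). In that case the predicate switches off the unused lanes in the final iteration, so no separate scalar tail loop is needed.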

Generating an executable binary from C and C++ code

To generate an executable binary, compile your program without the -S option:

armclang -O3 -Xlinker "--ro_base=0x80000000" --target=aarch64-arm-none-eabi -march=armv8-a+sve -o example1 example1.c

You can specify multiple source files on a single line. Each source file is compiled individually and then linked into a single executable binary:

armclang -O3 -Xlinker "--ro_base=0x80000000" --target=aarch64-arm-none-eabi -march=armv8-a+sve -o example2 example2a.c example2b.c


When compiling binaries to execute on the AEMv8-A Base Fixed Virtual Platform (FVP), use the -Xlinker "--ro_base=0x80000000" option to specify the address in memory at which the binary is loaded and run. The RAM base address for this FVP is 0x80000000.

These executable binaries are suitable for execution on an SVE-enabled AEMv8-A Base Fixed Virtual Platform (FVP). Binaries are automatically linked against the Arm C/C++ library, which is included as part of the Arm Compiler distribution.

The Arm C/C++ library provides many common C functions. The version used by the Arm Compiler is configured for semihosting. This allows a compiled binary to run on an FVP, and pass I/O operations to the host system, removing the need to run a full operating system within the FVP.
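
As an illustration, the following sketch extends example1.c so that it reports a result through printf. It assumes that, with the semihosting-configured library, standard output is passed to the host console. The run_example helper and the input values are hypothetical; in a complete program, main would call run_example.

```c
#include <assert.h>
#include <stdio.h>

#define ARRAYSIZE 1024

static int a[ARRAYSIZE], b[ARRAYSIZE], c[ARRAYSIZE];

void subtract_arrays(int *restrict a, int *restrict b, int *restrict c)
{
    for (int i = 0; i < ARRAYSIZE; i++)
        a[i] = b[i] - c[i];
}

/* Hypothetical helper: fills the inputs, runs the loop, and prints a
   checksum. Under semihosting, the printf output is expected to be
   forwarded to the host rather than to a device inside the model. */
long run_example(void)
{
    long sum = 0;

    for (int i = 0; i < ARRAYSIZE; i++) {
        b[i] = 2 * i;
        c[i] = i;
    }

    subtract_arrays(a, b, c);
    assert(a[ARRAYSIZE - 1] == ARRAYSIZE - 1);   /* a[i] == i for these inputs */

    for (int i = 0; i < ARRAYSIZE; i++)
        sum += a[i];

    printf("checksum = %ld\n", sum);
    return sum;
}
```

No I/O driver code appears in the sketch: the library is relied on to route the printf call to the host.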

Compiling and linking object files as separate steps

To compile each of your source files individually into an object file, specify the -c (compile-only) armclang option, and then pass the resulting object files into another invocation of armclang to link them into an executable binary. The -Xlinker argument is not required until the final invocation.

armclang -O3 --target=aarch64-arm-none-eabi -march=armv8-a+sve -c -o example2a.o example2a.c
armclang -O3 --target=aarch64-arm-none-eabi -march=armv8-a+sve -c -o example2b.o example2b.c
armclang -O3 -Xlinker "--ro_base=0x80000000" --target=aarch64-arm-none-eabi -march=armv8-a+sve -o example2 example2a.o example2b.o
Copyright © 2016, 2017 Arm Limited (or its affiliates). All rights reserved.