2.3.1. #pragma unroll [(n)]

This pragma instructs the compiler to unroll a loop by n iterations.


Both vectorized and non-vectorized loops can be unrolled using #pragma unroll [(n)]. That is, #pragma unroll [(n)] applies to both --vectorize and --no_vectorize.


#pragma unroll
#pragma unroll (n)



Where n is an optional value indicating the number of iterations to unroll.


If you do not specify a value for n, the compiler assumes #pragma unroll (4).


When compiling at -O3 -Otime, the compiler automatically unrolls loops where it is beneficial to do so. You can use this pragma to request that the compiler unroll a loop that has not been unrolled automatically.


Use this #pragma only when you have evidence, for example from --diag_warning=optimizations, that the compiler is not unrolling loops optimally by itself.


#pragma unroll [(n)] can be used only immediately before a for loop, a while loop, or a do ... while loop.
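
The placement rule can be sketched as follows, with the pragma on the line directly before each of the three loop forms. The function names sum_for, sum_while, and sum_do_while and the unroll factor of 2 are hypothetical choices for illustration. A compiler that does not recognize the pragma ignores it, so the functions return the same results either way.

```c
/* Each function places the pragma directly before a different loop form. */
unsigned int sum_for(const unsigned int *v, unsigned int n)
{
    unsigned int total = 0, i;

    #pragma unroll (2)
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

unsigned int sum_while(const unsigned int *v, unsigned int n)
{
    unsigned int total = 0, i = 0;

    #pragma unroll (2)
    while (i < n)
    {
        total += v[i];
        i++;
    }
    return total;
}

unsigned int sum_do_while(const unsigned int *v, unsigned int n)
{
    unsigned int total = 0, i = 0;

    /* A do ... while body always runs once, so guard the empty case. */
    if (n == 0)
        return 0;

    #pragma unroll (2)
    do
    {
        total += v[i];
        i++;
    } while (i < n);
    return total;
}
```

In each case nothing else, such as a declaration or another statement, sits between the pragma and the loop it applies to.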


void matrix_multiply(float ** __restrict dest, float ** __restrict src1,
    float ** __restrict src2, unsigned int n)
{
    unsigned int i, j, k;

    for (i = 0; i < n; i++)
    {
        for (k = 0; k < n; k++)
        {
            float sum = 0.0f;
            /* #pragma unroll */
            for(j = 0; j < n; j++)
                sum += src1[i][j] * src2[j][k];
            dest[i][k] = sum;
        }
    }
}

In this example, the compiler does not normally complete its loop analysis because src2 is indexed as src2[j][k] but the loops are nested in the opposite order, that is, with j inside k. When #pragma unroll is uncommented in the example, the compiler proceeds to unroll the loop four times.
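
To make "unroll the loop four times" concrete, the following hand-written sketch shows roughly what the transformation amounts to for a loop shaped like the inner j loop. The helper names dot_rolled and dot_unrolled4 are hypothetical, the sketch assumes the trip count is a multiple of four, and the code the compiler actually generates may differ.

```c
#include <assert.h>

/* Rolled form: one multiply-accumulate per iteration. */
float dot_rolled(const float *a, const float *b, unsigned int n)
{
    float sum = 0.0f;
    unsigned int j;

    for (j = 0; j < n; j++)
        sum += a[j] * b[j];
    return sum;
}

/* Hand-unrolled by four: four multiply-accumulates per iteration,
   so the loop test and branch run a quarter as often. */
float dot_unrolled4(const float *a, const float *b, unsigned int n)
{
    float sum = 0.0f;
    unsigned int j;

    assert(n % 4 == 0);   /* assumption of this sketch */
    for (j = 0; j < n; j += 4)
    {
        sum += a[j]     * b[j];
        sum += a[j + 1] * b[j + 1];
        sum += a[j + 2] * b[j + 2];
        sum += a[j + 3] * b[j + 3];
    }
    return sum;
}
```

Both functions accumulate the products in the same order, so they return the same value for the same inputs.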

If the intention is to multiply matrices whose size is not a multiple of four, for example an n * n matrix where n is not divisible by four, use #pragma unroll (m) instead, where m is some value such that n is an integral multiple of m.
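
As an illustration, the sketch below adapts the example for a size such as n = 6, which is not a multiple of four but is a multiple of three. The function name matrix_multiply_m3 and the choice of m = 3 are hypothetical, made only for this illustration.

```c
/* Variant of the example for sizes that are a multiple of 3, such as n = 6. */
void matrix_multiply_m3(float ** __restrict dest, float ** __restrict src1,
    float ** __restrict src2, unsigned int n)
{
    unsigned int i, j, k;

    for (i = 0; i < n; i++)
    {
        for (k = 0; k < n; k++)
        {
            float sum = 0.0f;
            /* Assumption: n is an integral multiple of 3. */
            #pragma unroll (3)
            for (j = 0; j < n; j++)
                sum += src1[i][j] * src2[j][k];
            dest[i][k] = sum;
        }
    }
}
```

For n = 6, the inner loop then corresponds to two passes through a body unrolled three times.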


Copyright © 2007 ARM Limited. All rights reserved. ARM DUI 0350A