ARM Technical Support Knowledge Articles

How do the ARM Compilers handle memcpy()?

Applies to: ARM Developer Suite (ADS), RealView Developer Kit (RVDK) for OKI, RealView Developer Kit (RVDK) for ST, RealView Development Suite (RVDS)

Answer

In many cases, when compiling calls to memcpy(), the ARM C compiler generates calls to specialized, optimized library functions instead. Since RVCT 2.1, these specialized functions are part of the ABI for the ARM Architecture (AEABI), and include:

  • __aeabi_memcpy
    This function is the same as ANSI C memcpy, except that the return value is void.
  • __aeabi_memcpy4 
    This function is the same as __aeabi_memcpy, but may assume the pointers are 4-byte aligned.
  • __aeabi_memcpy8 
    This function is the same as __aeabi_memcpy but may assume the pointers are 8-byte aligned.
    The linker selects the optimal versions of these functions depending on the target processor and build options. In many cases when unaligned accesses are permitted, the three variants may map to the same function. Because memcpy is typically heavily used and performance critical, versions of these functions built for the ARM instruction set are always selected, unless the target processor does not support the ARM instruction set (for example, the Cortex-M3 processor). The linker provides a state change as required (for example, an inline veneer or a BLX instruction as the function call).
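
    The relationship between the three variants can be pictured as an alignment check. The following is a minimal sketch in portable C, not the ARM library's code: pick_memcpy_variant is a hypothetical helper that reports which alignment guarantee (8, 4 or 1 bytes) a given pair of pointers would satisfy. The real toolchain makes this choice at build time from what it knows about the pointers, not at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Returns 8 if both pointers are 8-byte aligned (the precondition that
   __aeabi_memcpy8 may assume), 4 if both are 4-byte aligned
   (__aeabi_memcpy4), and 1 otherwise (__aeabi_memcpy). */
int pick_memcpy_variant(const void *dest, const void *src)
{
    /* OR the addresses together: a low bit set in either one
       shows up in the combined value. */
    uintptr_t bits = (uintptr_t)dest | (uintptr_t)src;

    if ((bits & 7u) == 0) {
        return 8;
    }
    if ((bits & 3u) == 0) {
        return 4;
    }
    return 1;
}
```

    Both pointers must meet the alignment for the aligned variant to be valid, which is why the sketch tests the two addresses together.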

    A further optimization may take place if the compiler can determine that you require a copy of a small number of bytes (typically <= 64) which is a multiple of four (e.g. 36 bytes). In this case, rather than calling a function the compiler will generate multiple LDM/STM instructions to perform the copy.
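
    As a rough picture of what the inlined copy does, the sketch below copies a block one 32-bit word at a time. It is an illustration under stated assumptions, not the code the compiler emits: copy_words is a hypothetical name, and real LDM/STM instructions transfer several registers per instruction rather than one word per loop iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of a word-wise copy.
   Both pointers must be word-aligned and n must be a multiple of four,
   mirroring the conditions under which the compiler can inline the copy. */
void copy_words(uint32_t *dest, const uint32_t *src, size_t n)
{
    size_t i;

    assert(n % 4 == 0);
    for (i = 0; i < n / 4; ++i) {
        dest[i] = src[i];   /* one word load and one word store */
    }
}
```

    Copying 36 bytes this way takes nine word transfers. If either pointer were not word-aligned, the word accesses would be invalid, which is the hazard the rest of this article describes.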

    Because of these optimizations, take care when copying data through unaligned pointers. The ARM compiler assumes that all pointers are naturally aligned (that is, int* is word-aligned, short* is halfword-aligned, and so on). Either tell the compiler explicitly that a pointer may be unaligned by using the __packed keyword (described in the compiler guide), or create a temporary char* pointer to access the address. For example:

    #include <string.h>
    
    unsigned int dest[8];   /* word-aligned destination, 32 bytes */
    
    void example (unsigned int * const unaligned_ptr)
    {
      /* __packed: the compiler makes no alignment assumption */
      __packed unsigned int * packed_ptr = unaligned_ptr;
      /* char *: only byte alignment is assumed */
      char * temp_ptr = (char *)unaligned_ptr;
      memcpy(dest, unaligned_ptr, 32);         /* Unsafe */
      memcpy(dest, (void *)packed_ptr, 32);    /* Safe   */
      memcpy(dest, temp_ptr, 32);              /* Safe   */
    }
    

    In both of the safe cases the compiler will generate code (or call functions) that work regardless of the pointer alignment. In the unsafe case the compiler is likely to perform the copy using LDM and STM instructions, which do not work if the pointers are unaligned, even on processors that support unaligned accesses.
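
    The char* technique also works in portable C that does not use the __packed keyword. The following minimal sketch reads a 32-bit value from an address with no alignment guarantee; load_u32_unaligned is a hypothetical helper name, not part of any ARM library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value from an address that may
   not be word-aligned. The source is typed as unsigned char *, so the
   compiler cannot assume more than byte alignment for the copy. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t value;

    memcpy(&value, p, sizeof value);
    return value;
}
```

    The destination here is an ordinary local variable, so it is naturally aligned; only the source needs the byte-pointer treatment.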

    In a similar way, calls to memmove() and memset() may result in calls to optimized versions that assume 4- or 8-byte alignment, or in instructions generated inline. Calls to memset() with zero as the initializing value result in a call to an optimized memclr() function (__aeabi_memclr and its aligned variants in the AEABI).
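
    The relationship between the set and clear helpers can be sketched as follows. This is an illustration only: the sketch_ prefix marks both functions as hypothetical stand-ins for __aeabi_memset and __aeabi_memclr, and they simply forward to the standard memset(). One detail worth noting is that the Run-time ABI declares __aeabi_memset with the length before the fill value, the reverse of ISO C memset(); check the specification for your toolchain version before relying on this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for __aeabi_memset.
   Argument order is (dest, n, c): length first, then fill value. */
void sketch_aeabi_memset(void *dest, size_t n, int c)
{
    memset(dest, c, n);
}

/* Hypothetical stand-in for __aeabi_memclr: a memset with zero as the
   fill value, so no fill argument is needed. */
void sketch_aeabi_memclr(void *dest, size_t n)
{
    sketch_aeabi_memset(dest, n, 0);
}
```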

    If you wish to provide your own implementations of these functions, you must also provide implementations of the __aeabi_* versions. These optimized functions are described further in the Run-time ABI for the ARM Architecture, part of the AEABI, which can be found at http://www.arm.com/products/DevTools/ABI.html.
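
    One simple way to meet this requirement is to route every variant through a single implementation. The sketch below assumes a user-supplied my_memcpy (a hypothetical name) and uses a my_ prefix on the wrappers so that it compiles on any host. In a real ARM build the wrappers would be named __aeabi_memcpy, __aeabi_memcpy4 and __aeabi_memcpy8.

```c
#include <stddef.h>

/* Hypothetical user-supplied replacement for memcpy(): a plain byte
   copy that works for any alignment. */
void *my_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n > 0) {
        *d++ = *s++;
        --n;
    }
    return dest;
}

/* Stand-ins for the AEABI entry points. They return void, unlike
   memcpy(). The aligned variants may assume 4- or 8-byte alignment,
   but a byte copy is valid for them too, just not the fastest choice. */
void my_aeabi_memcpy(void *dest, const void *src, size_t n)
{
    (void)my_memcpy(dest, src, n);
}

void my_aeabi_memcpy4(void *dest, const void *src, size_t n)
{
    (void)my_memcpy(dest, src, n);
}

void my_aeabi_memcpy8(void *dest, const void *src, size_t n)
{
    (void)my_memcpy(dest, src, n);
}
```

    Forwarding all three variants to an alignment-agnostic copy is always correct; specialized word or doubleword copies for the aligned variants are an optimization that can be added later.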

Article last edited on: 2011-02-04 10:33:55

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential