ARM Technical Support Knowledge Articles
Applies to: ARM Developer Suite (ADS), RealView Developer Kit (RVDK) for OKI, RealView Developer Kit (RVDK) for ST, RealView Developer Kit for XScale (RVXDK), RealView Development Suite (RVDS)
Introduction
Like other RISC architectures, ARM processors are designed to access 'aligned data' efficiently, that is, words at addresses that are multiples of 4 and halfwords at addresses that are multiples of 2. Such data is located on its natural size boundary.
ARM's compilers normally align global variables to these natural size boundaries so that these items can be accessed efficiently using the LDR/STR instructions.
This contrasts with most CISC architectures, which provide instructions that can access 'unaligned data' directly. Care is therefore needed when porting legacy code that performs such unaligned accesses from those architectures to the ARM.
Unaligned Pointers
The ARM compilers expect normal 'C' pointers to point to an aligned word in memory, as this allows the compiler to generate more efficient code.
For example, if an 'int *' pointer is used to read a word, the ARM compilers will use an LDR instruction in the generated code. This works as expected when the address is a multiple of 4 (that is, on a word boundary). However, if the address is not a multiple of 4, the LDR returns a rotated result rather than performing a true unaligned word load. The exact rotated result depends on the address offset and the endianness of the system.
For example, if a word were loaded through a pointer to address 0x8006, you might expect to get the contents of the bytes at 0x8006, 0x8007, 0x8008 and 0x8009. On the ARM, however, such an access actually loads the contents of the bytes at 0x8004, 0x8005, 0x8006 and 0x8007, rotated.
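The rotation can be modelled in plain C. The sketch below assumes a little-endian, pre-ARMv6 core (such as an ARM7TDMI or ARM920T) and a word-aligned buffer `mem` that holds the bytes starting at address `base`; the function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a legacy little-endian ARM LDR (pre-ARMv6) issued
 * to an address that may not be a multiple of 4. 'mem' holds the bytes
 * starting at the word-aligned address 'base'. */
static uint32_t legacy_ldr_le(const uint8_t *mem, uint32_t base, uint32_t addr)
{
    /* Address bits [1:0] are ignored: the whole aligned word containing
     * 'addr' is fetched. */
    const uint8_t *p = mem + ((addr & ~3u) - base);
    uint32_t word = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    /* The word is then rotated right by 8 * (addr & 3) bits, which puts
     * the addressed byte in bits [7:0]. */
    unsigned rot = 8u * (addr & 3u);
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}

/* For comparison: what a true unaligned little-endian load of the four
 * bytes addr..addr+3 would return. */
static uint32_t true_unaligned_le(const uint8_t *mem, uint32_t base, uint32_t addr)
{
    const uint8_t *p = mem + (addr - base);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With the bytes 0xA0, 0xA1, 0xA2, 0xA3, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88 placed at 0x8000 onwards, `legacy_ldr_le(mem, 0x8000, 0x8006)` yields 0x22114433 (the word at 0x8004 rotated by 16 bits), whereas `true_unaligned_le` yields 0x66554433.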
Thus if you wish to define a pointer to a word that can be at any address (i.e. that can be at a non-natural alignment) then you must specify this using the __packed qualifier when defining the pointer:
__packed int *pi; // pointer to unaligned int
The ARM compilers will then not use an LDR, but will instead generate code that accesses the value correctly regardless of the pointer's alignment. The generated code will be either a sequence of byte accesses or variable, alignment-dependent shifting and masking (depending on the compile options), and will therefore incur a performance and code size penalty.
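The byte-access strategy can be written out by hand, which is also a portable way to express the same access on compilers without `__packed`. This is a sketch, not armcc output; the helper names are hypothetical, the byte-wise helpers assume little-endian data, and the `memcpy` variant uses the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from any address, one byte at a time,
 * similar in spirit to the byte-access code described above. */
static uint32_t read_u32_le_bytes(const void *ptr)
{
    const uint8_t *b = (const uint8_t *)ptr;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Store counterpart: four byte stores, comparable to a STRB sequence. */
static void write_u32_le_bytes(void *ptr, uint32_t v)
{
    uint8_t *b = (uint8_t *)ptr;
    b[0] = (uint8_t)v;
    b[1] = (uint8_t)(v >> 8);
    b[2] = (uint8_t)(v >> 16);
    b[3] = (uint8_t)(v >> 24);
}

/* Portable alternative: copy into an aligned local. Compilers generally
 * choose a sequence that is safe for the target; host byte order. */
static uint32_t read_u32_memcpy(const void *ptr)
{
    uint32_t v;
    memcpy(&v, ptr, sizeof v);
    return v;
}
```

Either form costs more than a single LDR, which is the code size and performance penalty mentioned above.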
Note that you should not access memory-mapped peripheral registers using __packed, because the ARM compilers can use multiple memory accesses to retrieve the data, and may also access nearby locations which might correspond to other peripheral registers. When bitfields are used, the ARM compiler currently accesses the entire container, not just the field specified.
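For peripheral registers, the usual alternative is to declare each register as a naturally aligned, `volatile` 32-bit word and to manipulate bits with explicit masks instead of bitfields. The register block, its field names and the enable bit below are hypothetical; on real hardware the struct would be placed at a device-specific, word-aligned base address.

```c
#include <stdint.h>

/* Hypothetical register block. Each register is a naturally aligned
 * 32-bit word and is volatile, so every C access is actually performed;
 * on ARM an aligned uint32_t access normally compiles to one LDR/STR. */
typedef struct {
    volatile uint32_t ctrl;
    volatile uint32_t status;
} uart_regs_t;

/* Hypothetical enable bit in ctrl. A mask makes it explicit which word is
 * read and written, rather than relying on bitfield container access. */
#define UART_CTRL_ENABLE (1u << 0)

static void uart_enable(uart_regs_t *regs)
{
    uint32_t v = regs->ctrl;   /* one aligned 32-bit read  */
    v |= UART_CTRL_ENABLE;
    regs->ctrl = v;            /* one aligned 32-bit write */
}
```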
Unaligned fields in structures
Just as global variables will be located on their natural size boundary, so will the fields in a structure. This means that the compiler will often need to insert padding between fields to ensure that fields are aligned.
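The padding can be observed with `offsetof`. The sketch uses the field names from the example later in this article, without `__packed`, and assumes 4-byte `int` and 2-byte `short` with natural alignment, as on ARM; the type name `natural_struct` is hypothetical.

```c
#include <stddef.h>

/* Same fields as the article's mystruct, but without __packed. */
struct natural_struct {
    int   aligned_i;    /* offset 0 */
    short aligned_s;    /* offset 4, then 2 bytes of padding */
    int   unaligned_i;  /* offset 8: padding makes it word-aligned */
};

/* Number of padding bytes the compiler inserted after aligned_s. */
static size_t natural_struct_gap(void)
{
    return offsetof(struct natural_struct, unaligned_i)
         - (offsetof(struct natural_struct, aligned_s) + sizeof(short));
}
```

Under those assumptions the structure occupies 12 bytes, 2 of which are padding.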
Padding is not always what is required, however, and the __packed qualifier can also be used to create structures with no padding between fields. Such structures will then require unaligned accesses.
If the ARM compiler knows the alignment of a particular structure, it can work out whether the fields it is accessing are aligned or not within a packed structure. In such cases it will carry out the more efficient aligned word or halfword accesses where possible and otherwise it will use multiple aligned memory accesses (LDR/STR/LDM/STM) combined with fixed shifting and masking to access the correct bytes in memory.
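The "aligned loads plus fixed shifts" strategy can be sketched in C for a little-endian system. Here `words` is the packed structure viewed as an array of aligned 32-bit words (holding the structure's bytes in little-endian order) and `byte_off` is the field offset, which the compiler knows at compile time. The helper is hypothetical and is not compiler output.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract a 32-bit field at a known byte offset from word-aligned data. */
static uint32_t load_u32_known_offset(const uint32_t *words, size_t byte_off)
{
    size_t idx = byte_off / 4;             /* first word the field touches */
    unsigned shift = 8u * (byte_off % 4);  /* fixed at compile time        */

    if (shift == 0)
        return words[idx];                 /* field is aligned: one load    */

    /* Field straddles two words: two aligned loads (as with LDM) merged
     * with a right shift, a left shift and an OR. */
    return (words[idx] >> shift) | (words[idx + 1] << (32u - shift));
}
```

For a field at offset 6 this reads words 1 and 2 and combines them with shifts of 16, which corresponds to the LDMIB, LSR #16 and ORR ... LSL #16 sequence in the second listing below.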
Whether these accesses to unaligned elements are done inline or by calling a function is controlled using the -Ospace (default, will call a function) and -Otime (do unaligned access inline) compiler options. Consider the simple example:
__packed struct mystruct {
int aligned_i;
short aligned_s;
int unaligned_i;
};
struct mystruct S1;
int foo (int a, short b)
{
S1.aligned_i=a;
S1.aligned_s=b;
return S1.unaligned_i;
}
If this is compiled using armcc -c -Otime foo.c, the code produced will be:
MOV r2,r0
LDR r0,|L1.84|
MOV r12,r2,LSR #8
STRB r2,[r0,#0] ; S1
STRB r12,[r0,#1] ; S1
MOV r12,r2,LSR #16
STRB r12,[r0,#2] ; S1
MOV r12,r2,LSR #24
STRB r12,[r0,#3] ; S1
STRB r1,[r0,#4] ; S1
MOV r12,r1,LSR #8
STRB r12,[r0,#5]
ADD r0,r0,#6
BIC r3,r0,#3
AND r0,r0,#3
MOV r0,r0,LSL #3
LDMIA r3,{r3,r12}
MOV r3,r3,LSR r0
RSB r0,r0,#0x20
ORR r0,r3,r12,LSL r0
MOV pc,lr
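The final part of this listing (BIC, AND, LDMIA, variable shifts, ORR) reads a word whose alignment is not known until run time. A C model of that sequence is sketched below, assuming little-endian data in a buffer `mem` that starts at the word-aligned address `base` and extends at least 8 bytes past the aligned address; the names are hypothetical.

```c
#include <stdint.h>

/* Aligned little-endian word at address a. */
static uint32_t word_at(const uint8_t *mem, uint32_t base, uint32_t a)
{
    const uint8_t *p = mem + (a - base);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t load_u32_any_alignment(const uint8_t *mem, uint32_t base,
                                       uint32_t addr)
{
    uint32_t aligned = addr & ~3u;                 /* BIC r3,r0,#3          */
    unsigned shift = (addr & 3u) * 8u;             /* AND, MOV ... LSL #3   */
    uint32_t lo = word_at(mem, base, aligned);     /* LDMIA r3,{r3,r12}     */
    uint32_t hi = word_at(mem, base, aligned + 4);

    /* An ARM register-specified LSL by 32 yields 0, so the assembly needs
     * no special case; in C a shift by 32 is undefined, hence this test. */
    if (shift == 0)
        return lo;
    return (lo >> shift) | (hi << (32u - shift));  /* LSR, RSB, ORR ... LSL */
}
```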
It is possible, though, to give the compiler enough information to know which fields are aligned and which are not. To do this, declare the non-aligned fields as '__packed' and remove the __packed attribute from the struct itself. This is the recommended approach, and the only way of guaranteeing fast access to the naturally aligned members of the struct. It also makes it clearer to the programmer which fields are non-aligned, though care is needed when adding or deleting fields.
Thus if the definition of the structure is modified to:
struct mystruct {
int aligned_i;
short aligned_s;
__packed int unaligned_i;
};
compiling foo will generate the following much more efficient code:
MOV r2,r0
LDR r0,|L1.32|
STR r2,[r0,#0] ; S1
STRH r1,[r0,#4] ; S1
LDMIB r0,{r3,r12}
MOV r0,r3,LSR #16
ORR r0,r0,r12,LSL #16
MOV pc,lr
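When porting the same per-field approach to GCC or Clang, a comparable layout can be obtained with their `__attribute__((packed))` extension on the individual field. This is a swapped-in, compiler-specific technique rather than armcc syntax; the type, variable and function names below are hypothetical.

```c
#include <stddef.h>

/* Only unaligned_i gets 1-byte alignment, so it sits at offset 6 with no
 * padding before it; aligned_i and aligned_s keep natural alignment. */
struct mystruct_gnu {
    int   aligned_i;
    short aligned_s;
    int   unaligned_i __attribute__((packed));
};

static struct mystruct_gnu S2;

/* Same shape as foo(); the compiler chooses an alignment-safe sequence
 * for unaligned_i on targets that need one. */
static int foo_gnu(int a, short b)
{
    S2.aligned_i = a;
    S2.aligned_s = b;
    return S2.unaligned_i;
}
```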
The same principle applies to unions. Use the __packed attribute on the components of the union that will be unaligned in memory.
Note: Any __packed object accessed through a pointer has unknown alignment, even packed structures.
Unaligned LDR for accessing halfwords
In some circumstances the ARM compilers can intentionally generate unaligned LDR instructions, in particular to load halfwords from memory. By using an appropriate address, the required halfword can be loaded into the top half of a register and then shifted down to the bottom half. This requires only one memory access, whereas doing the same operation with LDRBs would require two memory accesses plus instructions to merge the two bytes. On ARM architecture v3 and earlier this will typically be done for all halfword loads. On architecture v4 and later it is done less often because dedicated halfword load instructions exist, but unaligned LDRs may still be generated, for instance to access an unaligned short within a packed structure.
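One way the trick can work on a little-endian core with legacy LDR rotation: for a halfword-aligned address `a`, an LDR from `a ^ 2` leaves the wanted halfword in bits [31:16], so one shift right by 16 extracts it (LSR for unsigned, ASR for signed). This sketch is illustrative only and makes no claim about exactly which instructions the compilers emit; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a legacy little-endian LDR (pre-ARMv6): fetch the
 * aligned word containing addr, rotate right by 8 * (addr & 3). */
static uint32_t legacy_ldr_le(const uint8_t *mem, uint32_t base, uint32_t addr)
{
    const uint8_t *p = mem + ((addr & ~3u) - base);
    uint32_t word = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    unsigned rot = 8u * (addr & 3u);
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}

/* Unsigned halfword load using a single word access: the halfword at 'a'
 * (a is halfword-aligned) ends up in the top half of the loaded word. */
static uint16_t load_u16_via_ldr(const uint8_t *mem, uint32_t base, uint32_t a)
{
    return (uint16_t)(legacy_ldr_le(mem, base, a ^ 2u) >> 16);
}
```

If `a` is word-aligned, `a ^ 2` is `a + 2` and the rotation by 16 moves the halfword at `a` to the top; if `a` is at offset 2, `a ^ 2` is `a - 2`, an aligned load whose top half is already the halfword at `a`.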
Note that such unaligned LDRs will only be generated by the ADS/RVCT/RVDS compilers if you enable them using the '-memaccess +L41' option.
[Note that for SDT 2.5x, the compiler will generate unaligned loads by default. This can be disabled using the '-za1' compiler option.]
Porting code and detecting unaligned accesses
Legacy C code for other architectures (e.g. x86 CISC) may perform accesses to unaligned data using pointers which will not work on the ARM. This is non-portable code - such accesses must be identified and corrected to work on RISC architectures which expect aligned data.
Identifying unaligned accesses can be difficult: a load or store to an unaligned address simply gives incorrect behavior rather than an error, and it can be hard to trace which part of the C source is causing the problem.
ARM processors with full MMUs (e.g. ARM920T) support optional alignment checking where the processor will check every access to ensure it is correctly aligned. The MMU will raise a data abort if an incorrectly aligned access occurs.
Some ARM partners using simple cores such as the ARM7TDMI have implemented alignment-checking for their ASIC/ASSP. This can be done with an additional hardware block external to the ARM core, which monitors the access size and the least significant bits of the address bus for every data access. The ASIC/ASSP can be configured to raise the ABORT signal in the case of an unaligned access. ARM recommends that such logic is included on ASIC/ASSP devices where code will be ported from other architectures.
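The check performed by such an external block amounts to comparing the access size with the low address bits. The sketch below models that decision in C purely for illustration; real logic would be implemented in hardware on the ASIC/ASSP, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the access should raise ABORT: a word access with
 * A[1:0] != 00, or a halfword access with A[0] != 0. */
static bool access_is_misaligned(uint32_t addr, unsigned size_bytes)
{
    switch (size_bytes) {
    case 4:  return (addr & 3u) != 0;  /* word */
    case 2:  return (addr & 1u) != 0;  /* halfword */
    default: return false;             /* byte accesses are always aligned */
    }
}
```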
If the system is configured to abort on unaligned accesses, a data abort exception handler should be installed. When an unaligned access occurs, the data abort handler is entered; it can identify the erroneous data access instruction, which is located at address (r14 - 8).
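A minimal sketch of the C side of such a handler, assuming an assembly veneer (not shown) that passes the abort-mode r14 value as an argument. The variable and function names are hypothetical; the recorded address can then be matched against a disassembly or linker map to find the offending C source.

```c
#include <stdint.h>

/* Address of the most recent faulting instruction, for a debugger or log. */
static volatile uint32_t last_unaligned_pc;

/* On a data abort, r14_abt holds the aborting instruction's address + 8. */
static uint32_t faulting_instruction_address(uint32_t r14_abt)
{
    return r14_abt - 8u;
}

/* Hypothetical entry point called from the abort veneer. */
void record_data_abort(uint32_t r14_abt)
{
    last_unaligned_pc = faulting_instruction_address(r14_abt);
}
```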
Once identified, the data access must be fixed by changes to the C source. These changes can be made conditional using the following technique:
#ifdef __arm
#define PACKED __packed
#else
#define PACKED
#endif
/* ... */
PACKED int *pi;
/* ... */
It is best to minimise accesses to unaligned data because of code size and performance overheads.
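One way to minimise such accesses is to decode unaligned data once into a naturally aligned structure and work only with that copy afterwards. The wire format below (a 1-byte tag, then a 4-byte and a 2-byte little-endian field with no padding) and all names are hypothetical.

```c
#include <stdint.h>

/* Naturally aligned in-memory form of a hypothetical packed record. */
struct record {
    uint8_t  tag;
    uint32_t length;
    uint16_t flags;
};

/* Decode the unaligned fields once, byte by byte. Later code uses only
 * aligned fields, so the unaligned-access cost is paid a single time. */
static void record_decode(struct record *out, const uint8_t *buf)
{
    out->tag    = buf[0];
    out->length = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
                  ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
    out->flags  = (uint16_t)(buf[5] | (buf[6] << 8));
}
```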
Unsupported “#pragma pack”
RVCT does not officially support “#pragma pack”. To achieve a similar result, we suggest using __packed instead.
There is a known issue with RVCT 3.0 and earlier when #pragma pack is used to pack all data within a block of code: padding may be inserted in the wrong place in a data section.
Article last edited on: 2008-09-09 15:47:27