|Non-Confidential||ARM DUI0472J|
Characters
Describes implementation-defined aspects of the ARM C compiler and C library relating to characters, as required by the ISO C standard.
The following points apply to the character sets expected by the compiler:
Calling setlocale(LC_CTYPE, "ISO8859-1") makes the isupper() and islower() functions behave as expected over the full 8-bit Latin-1 alphabet, rather than over the 7-bit ASCII subset. The locale must be selected at link time.
Source files are compiled according to the currently selected locale. You might have to select a different locale, with the --locale command-line option, if the source file contains non-ASCII characters.
The compiler supports multibyte character sets, such as Unicode.
Other properties of the source character set are host-specific.
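As a hedged illustration of the setlocale() point above, the following sketch requests the Latin-1 ctype locale and then classifies a single byte with islower(). The helper names select_latin1_ctype() and is_lower_byte() are hypothetical. The locale name "ISO8859-1" is the one quoted in this section; other C libraries might reject it, in which case setlocale() returns NULL and classification stays limited to the default "C" locale.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: request the Latin-1 ctype tables.
   Returns 1 if the library accepted the locale name, 0 otherwise. */
int select_latin1_ctype(void)
{
    return setlocale(LC_CTYPE, "ISO8859-1") != NULL;
}

/* Hypothetical helper: classify one byte as lowercase.
   The parameter is unsigned char so that values above 0x7F reach
   islower() as 0..255 and never as negative numbers. */
int is_lower_byte(unsigned char c)
{
    return islower(c) != 0;
}
```

With the Latin-1 locale in effect, is_lower_byte(0xE9) (the letter é in ISO 8859-1) is expected to report lowercase. In the default "C" locale only 'a' to 'z' do.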
The properties of the execution character set are target-specific. The ARM C and C++ libraries support the ISO 8859-1 (Latin-1 Alphabet) character set with the following consequences:
The execution character set is identical to the source character set.
There are eight bits in a character in the execution character set.
There are four characters (bytes) in an int. If the memory system is:
Little-endian: The bytes are ordered from least significant at the lowest address to most significant at the highest address.
Big-endian: The bytes are ordered from least significant at the highest address to most significant at the lowest address.
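The two byte orderings can be told apart at run time by looking at the byte stored at the lowest address of a known value. This is a minimal sketch, not part of the ARM documentation. The function names are hypothetical, and it assumes a 32-bit value with 8-bit bytes, as described above.

```c
#include <stdint.h>
#include <string.h>

/* Return the byte stored at the lowest address of a 32-bit value. */
unsigned char lowest_address_byte(uint32_t value)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);  /* copy the object representation */
    return bytes[0];
}

/* Return 1 if the least significant byte sits at the lowest address
   (little-endian ordering), 0 otherwise. */
int is_little_endian(void)
{
    /* In 0x11223344 the least significant byte is 0x44. */
    return lowest_address_byte(UINT32_C(0x11223344)) == 0x44;
}
```

On a big-endian memory system the same probe finds 0x11, the most significant byte, at the lowest address.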
In C all character constants have type int. In C++ a character constant containing one character has the type char and a character constant containing more than one character has the type int. Up to four characters of the constant are represented in the integer value. The last character in the constant occupies the lowest-order byte of the integer value. Up to three preceding characters are placed at higher-order bytes. Unused bytes are filled with the NULL (\0) character.
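The packing rule for multi-character constants can be modelled in ordinary C without relying on the compiler's own handling of such constants. The helper pack_char_constant() below is hypothetical and only mirrors the rule as stated here. It rejects more than four characters, because this section does not say which characters of a longer constant are kept.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the packing rule: the last character lands in
   the lowest-order byte, earlier characters go into successively
   higher-order bytes, and unused bytes stay zero.
   Returns 0 if count is greater than 4. */
uint32_t pack_char_constant(const char *chars, size_t count)
{
    uint32_t value = 0;
    if (count > 4) {
        return 0;  /* longer constants are outside this sketch */
    }
    for (size_t i = 0; i < count; ++i) {
        /* Shift earlier characters up and append the next one. */
        value = (value << 8) | (unsigned char)chars[i];
    }
    return value;
}
```

Under this model, and assuming ASCII character codes, the two characters a and b pack to 0x6162, with the two upper bytes left as zero.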
All integer character constants that contain a single character, or character escape sequence, are represented in both the source and execution character sets. The following table lists the supported character escape codes.
Table 15-1 Character escape codes
|Escape sequence||Char value||Description|
|\n||10||New line (line feed)|
|\xnn||0xnn||ASCII code in hexadecimal|
|\nnn||0nnn||ASCII code in octal|
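The new-line, hexadecimal, and octal escapes can be checked against their numeric values with a few comparisons. This is a small sketch that assumes an ASCII-compatible execution character set. The function name is hypothetical.

```c
/* Returns 1 if the escape sequences evaluate to the expected numeric
   values, assuming an ASCII-compatible execution character set. */
int escape_codes_match_table(void)
{
    return '\n' == 10        /* new line (line feed) */
        && '\x41' == 0x41    /* hexadecimal escape, the letter A */
        && '\101' == 0101    /* octal escape, also the letter A */
        && '\x41' == '\101'; /* both spellings name the same character */
}
```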
Characters of the source character set in string literals and character constants map identically into the execution character set.
Data items of type char are unsigned by default. They can be explicitly declared as signed char or unsigned char:
The --signed_chars option makes the char signed.
The --unsigned_chars option makes the char unsigned.
Care must be taken when mixing translation units that have been compiled with and without the --signed_chars and --unsigned_chars options, and that share interfaces or data structures.
The ARM ABI defines char as an unsigned byte, and this is the interpretation used by the C++ libraries supplied with the ARM compilation tools.
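One way to reduce the risk described above is to avoid plain char in shared interfaces and to check the signedness in effect where it matters. The following is a minimal sketch using only standard C. The function names and the sample_record structure are hypothetical and do not come from the ARM libraries.

```c
#include <limits.h>

/* Returns 1 if plain char is unsigned in this translation unit.
   CHAR_MIN is 0 exactly when plain char is unsigned. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Hypothetical shared record: explicit signedness keeps each field's
   meaning the same however a translation unit was compiled. */
struct sample_record {
    unsigned char flags;   /* always holds 0..255 */
    signed char   offset;  /* always holds at least -127..127 */
};

/* Widen a raw byte without depending on the signedness of plain char. */
int byte_to_int(unsigned char byte)
{
    return byte;
}
```

Because the fields and parameters name their signedness explicitly, a value such as 0xFF reads back as 255 through byte_to_int() whichever char option a translation unit was built with.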
Converting multibyte characters into the corresponding wide characters for a wide character constant does not use a locale. This is not relevant to the generic implementation.