The following points apply to the character sets and identifiers expected by the compilers:
An identifier can be of any length, but the compiler truncates it after 256 characters, all of which are significant.
Uppercase and lowercase characters are distinct
in all internal and external identifiers. An identifier may also
contain a dollar ($) character unless the -fussy compiler
option is specified.
Calling setlocale(LC_CTYPE, "ISO8859-1") makes
the isupper() and islower() functions
behave as expected over the full 8-bit Latin-1 alphabet, rather
than over the 7-bit ASCII subset.
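The locale call described above can be sketched as follows. This is a minimal illustration, not the library's own code: the helper names select_latin1_ctype and is_upper_code are hypothetical, and the sketch assumes only the standard setlocale() and isupper() functions. The locale name "ISO8859-1" is taken from the text; a C library other than the ARM one may not recognise it, so the return value is checked.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to select the Latin-1 classification locale.
   Returns 1 on success, 0 if the C library has no locale of that name. */
int select_latin1_ctype(void)
{
    return setlocale(LC_CTYPE, "ISO8859-1") != NULL;
}

/* Hypothetical helper: classify one 8-bit character code.
   The cast to unsigned char stops codes above 127 from reaching
   isupper() as negative values, which would be undefined behaviour. */
int is_upper_code(int code)
{
    return isupper((unsigned char)code) != 0;
}
```

If select_latin1_ctype() succeeds, is_upper_code(0xC9) (the Latin-1 code for É) would be expected to return nonzero, whereas in the default "C" locale only the ASCII letters A to Z are classified as uppercase.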
The characters in the source character set are assumed to be ISO 8859-1 (Latin-1 Alphabet), a superset of the ASCII character set. The printable characters are those in the range 32 to 126 and 160 to 255. Any printable character may appear in a string or character constant, and in a comment.
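The printable ranges given above can be expressed as a small check. This is only a sketch of the stated ranges; the function name latin1_is_printable is hypothetical and is not part of the ARM libraries.

```c
/* Hypothetical helper: returns 1 if code lies in one of the printable
   ranges stated in the text (32 to 126, or 160 to 255), else 0. */
int latin1_is_printable(unsigned int code)
{
    return (code >= 32u && code <= 126u) ||
           (code >= 160u && code <= 255u);
}
```

Codes 127 to 159 fall between the two ranges and are therefore reported as non-printable, as are the control codes 0 to 31.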
The ARM compilers do not support multibyte character sets.
Other properties of the source character set are host specific.
The properties of the execution character set are target-specific. The ARM C and C++ libraries support the ISO 8859-1 (Latin-1 Alphabet) character set, so the following points are valid:
The execution character set is identical to the source character set.
There are eight bits in a character in the execution character set.
There are four chars/bytes in an int. If the memory system is:
little-endian, the bytes are ordered from least significant at the lowest address to most significant at the highest address.
big-endian, the bytes are ordered from least significant at the highest address to most significant at the lowest address.
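The two byte orderings can be observed at run time by inspecting the byte stored at the lowest address of an integer. The following is a minimal sketch using only memcpy(); the function names lowest_addressed_byte and is_little_endian are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return the byte stored at the lowest address
   of an unsigned int. */
unsigned char lowest_addressed_byte(unsigned int value)
{
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);
    return bytes[0];
}

/* Hypothetical helper: returns 1 if the least significant byte is at
   the lowest address, 0 otherwise. */
int is_little_endian(void)
{
    return lowest_addressed_byte(1u) == 1;
}
```

With a four-byte int, lowest_addressed_byte(0x11223344u) yields 0x44 where the least significant byte is at the lowest address, and 0x11 where it is at the highest address.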
A character constant containing more than one character
has the type int. Up to four characters of the constant
are represented in the integer value. The first character in the
constant occupies the lowest-addressed byte of the integer value. Up
to three following characters are placed at ascending addresses.
Unused bytes are filled with the NUL (\0) character.
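The layout described above can be emulated in portable C, which is useful because other compilers may assign different values to multi-character constants. This is a sketch under stated assumptions, not the compiler's implementation: pack_char_constant is a hypothetical name, uint32_t stands in for the four-byte int, and keeping the first four characters of a longer constant is an assumption, because the text does not say which characters are retained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 32-bit value whose in-memory layout
   matches the description: first character at the lowest address,
   following characters at ascending addresses, unused bytes zero. */
uint32_t pack_char_constant(const char *chars, size_t count)
{
    unsigned char bytes[4] = {0, 0, 0, 0};
    uint32_t value;
    size_t i;

    if (count > 4) {
        count = 4;  /* assumption: keep only the first four characters */
    }
    for (i = 0; i < count; i++) {
        bytes[i] = (unsigned char)chars[i];
    }
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

For example, pack_char_constant("ab", 2) produces a value whose bytes in memory are 'a', 'b', 0, 0. The numeric value of that integer depends on the byte order of the memory system.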
All integer character constants that contain a single character, or character escape sequence, are represented in both the source and execution character sets (by an assumption that may be false in any given retargeting of the generic ARM C library).
Characters of the source character set in string literals and character constants map identically into the execution character set (by an assumption that may be false in any given retargeting of the generic ARM C library).
No locale is used to convert multibyte characters into the corresponding wide characters (codes) for a wide character constant. This is not relevant to the generic implementation.
The character escape codes are shown in Table 3.2.