Character sets and identifiers
The following points apply to the character sets and identifiers expected by the compiler:
Uppercase
and lowercase characters are distinct in all internal and external identifiers.
An identifier can also contain a dollar ($) character
unless the --strict compiler option is specified.
To permit dollar signs in identifiers when --strict is specified,
also use the --dollar command-line option.
Calling setlocale(LC_CTYPE, "ISO8859-1") makes
the isupper() and islower() functions
behave as expected over the full 8-bit Latin-1 alphabet, rather
than over the seven-bit ASCII subset. The locale must be selected
at link-time. (See Tailoring locale and CTYPE.)
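The setlocale() call above can be sketched as follows. This is a minimal illustration, not ARM library code: the helper names select_latin1 and count_upper are hypothetical. On a library that does not provide the requested locale, setlocale() returns NULL and the previous locale stays in effect, so the return value is worth checking.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: request the Latin-1 character-type locale.
   Returns 1 on success, 0 if the library does not provide that locale
   (setlocale() returns NULL and the current locale is left unchanged). */
int select_latin1(void)
{
    return setlocale(LC_CTYPE, "ISO8859-1") != NULL;
}

/* Hypothetical helper: count the bytes in s[0..n) that isupper() accepts
   under the LC_CTYPE locale currently in effect. Each byte is converted
   to unsigned char first, because passing a negative value other than
   EOF to isupper() is undefined. */
size_t count_upper(const char *s, size_t n)
{
    size_t i;
    size_t count = 0;

    assert(s != NULL);
    for (i = 0; i < n; i++) {
        if (isupper((unsigned char)s[i])) {
            count++;
        }
    }
    return count;
}
```

If select_latin1() succeeds, count_upper() is expected to treat accented Latin-1 capitals such as 0xC9 as uppercase; if it fails, only the ASCII letters A to Z are counted.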
The characters in the source character set are assumed to be ISO 8859-1 (Latin-1 Alphabet), a superset of the ASCII character set. The printable characters are those in the range 32 to 126 and 160 to 255. Any printable character can appear in a string or character constant, and in a comment.
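The printable ranges given above can be expressed as a small predicate. This is only a sketch; the function name is_latin1_printable is hypothetical and is not part of the ARM libraries.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if code is a printable Latin-1 character
   according to the ranges stated above (32 to 126 and 160 to 255),
   otherwise 0. The argument must be a single 8-bit character code. */
int is_latin1_printable(int code)
{
    assert(code >= 0 && code <= 255);
    return (code >= 32 && code <= 126) || (code >= 160 && code <= 255);
}
```

For example, codes 127 to 159 fall in the gap between the two ranges and are reported as not printable.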
The ARM compiler supports multibyte character sets, such as Unicode.
Other properties of the source character set are host-specific.
The properties of the execution character set are target-specific. The ARM C and C++ libraries support the ISO 8859-1 (Latin-1 Alphabet) character set with the following consequences:
The execution character set is identical to the source character set.
There are eight bits in a character in the execution character set.
There are four characters (bytes) in an int. The byte order depends on the memory system:
Little-endian: The bytes are ordered from least significant at the lowest address to most significant at the highest address.
Big-endian: The bytes are ordered from least significant at the highest address to most significant at the lowest address. (On this product, only little-endian is supported.)
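The byte layout of an int can be observed at run time. The following is a minimal sketch using only standard C; the helper names bits_per_int, is_little_endian, and byte_at are hypothetical. It copies the object representation with memcpy() rather than casting pointers.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: number of bits in an int on this implementation
   (characters per int multiplied by bits per character). */
unsigned int bits_per_int(void)
{
    return (unsigned int)(sizeof(int) * CHAR_BIT);
}

/* Hypothetical helper: returns 1 if the least significant byte of an
   unsigned int is stored at the lowest address, otherwise 0. */
int is_little_endian(void)
{
    unsigned int probe = 1u;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* byte at the lowest address */
    return first == 1;
}

/* Hypothetical helper: returns the byte stored at offset index from the
   start of value, where offset 0 is the lowest address. */
unsigned int byte_at(unsigned int value, size_t index)
{
    unsigned char bytes[sizeof value];

    assert(index < sizeof value);
    memcpy(bytes, &value, sizeof value);
    return bytes[index];
}
```

With a four-byte int and little-endian ordering, byte_at(0x11223344u, 0) yields 0x44 and byte_at(0x11223344u, 3) yields 0x11.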
In C all character constants have type int.
In C++ a character constant containing one character has the type char and
a character constant containing more than one character has the
type int. Up to four characters of the constant are
represented in the integer value. The last character in the constant
occupies the lowest-order byte of the integer value. Up to three
preceding characters are placed at higher-order bytes. Unused bytes
are filled with the NUL (\0) character.
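The packing rule described above can be modelled in portable C. This is a sketch of the stated rule only, under the assumption that each character occupies one 8-bit byte; the function names pack_char_constant and char_constant_size are hypothetical. The model avoids writing an actual multi-character constant, because its value is implementation-defined and differs between compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the rule above for a constant of n characters
   (1 <= n <= 4): the last character ends up in the lowest-order byte,
   earlier characters occupy successively higher-order bytes, and any
   unused high-order bytes remain zero. */
unsigned long pack_char_constant(const char *chars, size_t n)
{
    unsigned long value = 0;
    size_t i;

    assert(chars != NULL && n >= 1 && n <= 4);
    for (i = 0; i < n; i++) {
        /* Move the characters seen so far up one byte, then place the
           current character in the lowest-order byte. */
        value = (value << 8) | (unsigned char)chars[i];
    }
    return value;
}

/* In C a character constant has type int, so this returns sizeof(int).
   (Compiled as C++, the same expression would give sizeof(char).) */
size_t char_constant_size(void)
{
    return sizeof('a');
}
```

Under this model, and assuming ASCII codes, the two characters a and b pack to 0x6162, with b in the lowest-order byte.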
All integer character constants that contain a single character, or character escape sequence (see Table 3.2), are represented in both the source and execution character sets.
Characters of the source character set in string literals and character constants map identically into the execution character set.
Data items of type char are unsigned by default. They can be explicitly declared as signed char or unsigned char. To change the default for plain char:
the --signed_chars option makes plain char signed
the --unsigned_chars option makes plain char unsigned.
The --signed_chars option is not recommended
for general use and is not required for ISO-compatible source. Code
compiled with this option is not compliant with the ABI
for the ARM Architecture (base standard) [BSABI], and
incorrect use might result in a failure at runtime. This option
is not supported by the C++ libraries.
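Code that must behave the same whether plain char is signed or unsigned can avoid depending on the default. The following is a minimal sketch; the helper names plain_char_is_unsigned and byte_value are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is unsigned on this
   implementation. CHAR_MIN is 0 exactly when plain char is unsigned. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Hypothetical helper: returns the value of c as a number in the range
   0 to UCHAR_MAX, regardless of the signedness of plain char. Converting
   through unsigned char prevents sign extension of codes above 127. */
int byte_value(char c)
{
    int value = (unsigned char)c;

    assert(value >= 0 && value <= UCHAR_MAX);
    return value;
}
```

For example, byte_value() applied to the Latin-1 code 0xE9 returns 233 under either setting, whereas assigning the char directly to an int would give a negative value where plain char is signed.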
No locale is used to convert multibyte characters into the corresponding wide characters (codes) for a wide character constant. This is not relevant to the generic implementation.