1.9.1 Link time selection of the locale subsystem in the C library

The locale subsystem of the C library can be selected at link time or can be extended to be selectable at runtime.

The following list describes the use of locale categories by the library:

  • The default implementation of each locale category is for the C locale. The library also provides an alternative, ISO8859-1 (Latin-1 alphabet) implementation of each locale category that you can select at link time.

  • The C and ISO8859-1 default implementations each usually provide one locale per category that you can select at runtime.

  • You can replace each locale category individually.

  • You can include as many of your own locales in each category as you choose, and you can name your own locales as you choose.

  • Each locale category uses one word in the private static data of the library.

  • The locale category data is read-only and position independent.

  • scanf() forces the inclusion of the LC_CTYPE locale category, but in either of the default locales this adds only 260 bytes of read-only data to several kilobytes of code.
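Because each category is handled independently, it can help to inspect the categories one at a time at runtime. The following is a minimal sketch using only standard C calls (setlocale() with a null locale argument to query, and localeconv()). The helper names active_locale() and thousands_separator() are hypothetical and are not part of the library.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return the name of the locale currently active
   for one category (for example LC_CTYPE or LC_NUMERIC), or "" if the
   library cannot report it. Passing NULL to setlocale() only queries. */
const char *active_locale(int category)
{
    const char *name = setlocale(category, NULL);
    return (name != NULL) ? name : "";
}

/* Hypothetical helper: return the thousands separator of the current
   LC_NUMERIC locale. In the C locale this is the empty string. */
const char *thousands_separator(void)
{
    return localeconv()->thousands_sep;
}
```

At program startup the C locale is active, so thousands_separator() is expected to return an empty string until a different LC_NUMERIC locale is selected.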

ISO8859-1 implementation

The default implementation of each locale category is for the C locale. The library also provides an alternative ISO8859-1 (Latin-1 alphabet) implementation that you can select at link time for every category except LC_TIME.

The following table lists the symbols that select the ISO8859-1 (Latin-1 alphabet) locale categories.

Table 1-6 Default ISO8859-1 locales

Symbol Description
__use_iso8859_ctype Selects the ISO8859-1 (Latin-1) classification of characters. This is essentially 7-bit ASCII, except that the character codes 160-255 represent a selection of useful European punctuation characters, letters, and accented letters.
__use_iso8859_collate Selects the strcoll/strxfrm collation table appropriate to the Latin-1 alphabet. The default C locale does not require a collation table.
__use_iso8859_monetary Selects the Sterling monetary category using Latin-1 coding.
__use_iso8859_numeric Selects separation of thousands with commas in the printing of numeric values.
__use_iso8859_locale Selects all of the ISO8859-1 implementations described in this table.

There is no ISO8859-1 version of the LC_TIME category.
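As an illustration of how one of these symbols might be referenced from C, here is a minimal sketch. It assumes that the toolchain defines __CC_ARM and accepts #pragma import to create a reference to a symbol; both are assumptions that this section does not confirm, so the pragma is guarded and ignored by other compilers. The helpers select_ctype_locale() and is_alpha_byte() are hypothetical names used only for this example.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

#if defined(__CC_ARM)
/* Assumption: referencing this symbol makes the linker include the
   ISO8859-1 implementation of the LC_CTYPE category. */
#pragma import(__use_iso8859_ctype)
#endif

/* Hypothetical helper: try to activate a named LC_CTYPE locale.
   Returns 1 if the library accepted the name, 0 otherwise. */
int select_ctype_locale(const char *name)
{
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Hypothetical helper: classify one byte with isalpha(). Taking an
   unsigned char keeps values above 127 in the range isalpha() accepts. */
int is_alpha_byte(unsigned char byte)
{
    return isalpha(byte) != 0;
}
```

With a Latin-1 classification active, is_alpha_byte() would be expected to report accented letters in the range 160-255 as alphabetic. The locale name to pass to select_ctype_locale() depends on the library and is not specified here.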

Shift-JIS and UTF-8 implementation

The Shift-JIS and UTF-8 locales let you use Japanese and Unicode characters.

The following table lists the symbols that select the Shift-JIS (Japanese characters) or UTF-8 (Unicode characters) implementation of the LC_CTYPE locale category.

Table 1-7 Default Shift-JIS and UTF-8 locales

Symbol Description
__use_sjis_ctype Sets the character set to the Shift-JIS multibyte encoding of Japanese characters.
__use_utf8_ctype Sets the character set to the UTF-8 multibyte encoding of all Unicode characters.

The following list describes the effects of Shift-JIS and UTF-8 encoding:

  • The ordinary ctype functions behave correctly on any byte value that is a self-contained character in Shift-JIS. For example, half-width katakana characters that Shift-JIS encodes as single bytes between 0xA6 and 0xDF are treated as alphabetic by isalpha().

    In UTF-8, the only self-contained (single-byte) characters are those of the ASCII character set.

  • The multibyte conversion functions, such as mbrtowc(), mbsrtowcs(), and wcrtomb(), all convert between wide strings in Unicode and multibyte character strings in Shift-JIS or UTF-8.

  • printf("%ls") converts a Unicode wide string into Shift-JIS or UTF-8 output, and scanf("%ls") converts Shift-JIS or UTF-8 input into a Unicode wide string.
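The multibyte-to-wide conversion described above can be sketched with the standard mbrtowc() function. This is a minimal example, not the library's own implementation. The helper name to_wide() is hypothetical, and which multibyte encoding is decoded depends on the LC_CTYPE implementation that is active when it runs.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string src into wide
   characters in out, which has room for out_len elements including
   the terminator. Returns the number of wide characters written,
   excluding the terminator, or (size_t)-1 if the input is invalid
   or out is too small. */
size_t to_wide(const char *src, wchar_t *out, size_t out_len)
{
    mbstate_t state;
    size_t remaining = strlen(src);
    size_t count = 0;

    if (out_len == 0)
        return (size_t)-1;

    memset(&state, 0, sizeof state);   /* start in the initial state */

    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, src, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;         /* invalid or incomplete sequence */
        if (used == 0)
            break;                     /* null character reached */
        if (count + 1 >= out_len)
            return (size_t)-1;         /* no room left for the terminator */

        out[count++] = wc;
        src += used;
        remaining -= used;
    }

    out[count] = L'\0';
    return count;
}
```

ASCII input converts one byte per wide character in any of the locales discussed here. With a Shift-JIS or UTF-8 ctype selected, a multi-byte sequence would be expected to produce a single wide character.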

Non-Confidential ARM DUI0475M
Copyright © 2010-2016 ARM Limited or its affiliates. All rights reserved.