1.9.4 LC_CTYPE data block

The LC_CTYPE data block configures character classification and conversion.

To define an LC_CTYPE data block in the C library, use the following macros in this order:

  1. Call LC_CTYPE_begin with a symbol name and a locale name.

  2. Call LC_CTYPE_table repeatedly to specify 256 table entries. LC_CTYPE_table takes a single quoted argument, which must be a comma-separated list of table entries. Each table entry describes one of the 256 possible character values, and is either IL, to mark an illegal character, or the bitwise OR of one or more of the following flags:

    __S

    whitespace characters

    __P

    punctuation characters

    __B

    printable space characters

    __L

    lowercase letters

    __U

    uppercase letters

    __N

    decimal digits

    __C

    control characters

    __X

    hexadecimal digit letters A-F and a-f

    __A

    alphabetic but neither uppercase nor lowercase, such as Japanese katakana.

    Note:

    A printable space character is defined as any character where the result of both isprint() and isspace() is true.

    __A must not be specified for the same character as either __N or __X.

  3. If required, call one or both of the following optional macros:

    • LC_CTYPE_full_wctype. Calling this macro without arguments causes the C99 wide-character ctype functions (iswalpha(), iswupper(), ...) to return useful values across the full range of Unicode when this LC_CTYPE locale is active. If this macro is not specified, the wide ctype functions treat the first 256 wchar_t values as the same as the 256 char values, and the rest of the wchar_t range as containing illegal characters.

    • LC_CTYPE_multibyte defines this locale to be a multibyte character set. Call this macro with three arguments. The first two arguments are the names of functions that perform conversion between the multibyte character set and Unicode wide characters. The last argument is the value that must be taken by the C macro MB_CUR_MAX for the respective character set. The two function arguments have the following prototypes:

      size_t internal_mbrtowc(wchar_t *pwc, char c, mbstate_t *pstate);
      size_t internal_wcrtomb(char *s, wchar_t w, mbstate_t *pstate);
      

      internal_mbrtowc()

      takes one byte, c, as input, and updates the mbstate_t pointed to by pstate as a result of reading that byte. If the byte completes the encoding of a multibyte character, the function writes the corresponding wide character into the location pointed to by pwc, and returns 1 to indicate that it has done so. If the byte is valid but does not yet complete a character, the function returns -2 to indicate that the state in the mbstate_t has changed and that no character is output. If the encoded input is invalid, the function returns -1.

      internal_wcrtomb()

      takes one wide character, w, as input, and writes some number of bytes into the memory pointed to by s. It returns the number of bytes output, or -1 to indicate that the input character has no valid representation in the multibyte character set.

  4. Call LC_CTYPE_end, without arguments, to finish the locale block definition.
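
The rules in step 2 can be checked offline before assembling a locale block. The following C code is a minimal sketch of such a check, not part of the C library: check_entry() and count_entries() are hypothetical helper names, and the assumption that spaces around an entry can be ignored is taken only from the spacing used in the example in this section. It accepts the quoted arguments that would be passed to LC_CTYPE_table, rejects an entry that is not IL or an OR of the documented flags, rejects __A combined with __N or __X, and returns the number of entries so that the caller can compare it with 256.

```c
#include <stddef.h>
#include <string.h>

/* Flag names that may appear in an LC_CTYPE_table entry. */
static const char *const known_flags[] = {
    "__S", "__P", "__B", "__L", "__U", "__N", "__C", "__X", "__A"
};

/* Return 1 if the len characters at tok spell a documented flag name. */
int is_known_flag(const char *tok, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof known_flags / sizeof known_flags[0]; i++) {
        if (strlen(known_flags[i]) == len &&
            strncmp(known_flags[i], tok, len) == 0)
            return 1;
    }
    return 0;
}

/* Check one entry such as "__C|__S" or "IL". Return 1 if it is well formed. */
int check_entry(const char *s, size_t len)
{
    int has_a = 0, has_n_or_x = 0, has_il = 0, ntokens = 0;
    size_t pos = 0;

    while (pos <= len) {
        size_t start = pos, end;
        while (pos < len && s[pos] != '|')
            pos++;
        end = pos;
        pos++;                                /* step over '|' or past the end */
        while (start < end && s[start] == ' ')
            start++;                          /* trim leading spaces (assumption) */
        while (end > start && s[end - 1] == ' ')
            end--;                            /* trim trailing spaces (assumption) */

        if (end - start == 2 && strncmp(s + start, "IL", 2) == 0) {
            has_il = 1;
        } else if (is_known_flag(s + start, end - start)) {
            if (strncmp(s + start, "__A", 3) == 0)
                has_a = 1;
            if (strncmp(s + start, "__N", 3) == 0 ||
                strncmp(s + start, "__X", 3) == 0)
                has_n_or_x = 1;
        } else {
            return 0;                         /* empty or unknown token */
        }
        ntokens++;
    }
    if (has_il && ntokens > 1)
        return 0;                             /* IL is not a flag to OR with others */
    if (has_a && has_n_or_x)
        return 0;                             /* __A excludes __N and __X */
    return 1;
}

/* Count the entries in nargs LC_CTYPE_table argument strings.
 * Return the total, or -1 if any entry is malformed. */
int count_entries(const char *const *args, size_t nargs)
{
    int total = 0;
    size_t i;

    for (i = 0; i < nargs; i++) {
        const char *s = args[i];
        size_t len = strlen(s), pos = 0;
        while (pos <= len) {
            size_t start = pos;
            while (pos < len && s[pos] != ',')
                pos++;
            if (!check_entry(s + start, pos - start))
                return -1;
            total++;
            pos++;                            /* step over ',' or past the end */
        }
    }
    return total;
}
```

A complete data block is expected to make count_entries() return exactly 256 over all of its LC_CTYPE_table arguments.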

Example LC_CTYPE data block

        LC_CTYPE_begin utf8_ctype, "UTF-8"
        ;
        ; Single-byte characters in the low half of UTF-8 are exactly
        ; the same as in the normal "C" locale.
        LC_CTYPE_table "__C, __C, __C, __C, __C, __C, __C, __C, __C" ; 0x00-0x08
        LC_CTYPE_table "__C|__S, __C|__S, __C|__S, __C|__S, __C|__S"
                                                         ; 0x09-0x0D (HT,LF,VT,FF,CR)
        LC_CTYPE_table "__C, __C, __C, __C, __C, __C, __C, __C, __C" ; 0x0E-0x16
        LC_CTYPE_table "__C, __C, __C, __C, __C, __C, __C, __C, __C" ; 0x17-0x1F
        LC_CTYPE_table "__B|__S" ; space
        LC_CTYPE_table "__P, __P, __P, __P, __P, __P, __P, __P" ; !"#$%&'(
        LC_CTYPE_table "__P, __P, __P, __P, __P, __P, __P" ; )*+,-./
        LC_CTYPE_table "__N, __N, __N, __N, __N, __N, __N, __N, __N, __N" ; 0-9
        LC_CTYPE_table "__P, __P, __P, __P, __P, __P, __P" ; :;<=>?@
        LC_CTYPE_table "__U|__X, __U|__X, __U|__X, __U|__X, __U|__X, __U|__X" ; A-F
        LC_CTYPE_table "__U, __U, __U, __U, __U, __U, __U, __U, __U, __U" ; G-P
        LC_CTYPE_table "__U, __U, __U, __U, __U, __U, __U, __U, __U, __U" ; Q-Z
        LC_CTYPE_table "__P, __P, __P, __P, __P, __P" ; [\]^_`
        LC_CTYPE_table "__L|__X, __L|__X, __L|__X, __L|__X, __L|__X, __L|__X" ; a-f
        LC_CTYPE_table "__L, __L, __L, __L, __L, __L, __L, __L, __L, __L" ; g-p
        LC_CTYPE_table "__L, __L, __L, __L, __L, __L, __L, __L, __L, __L" ; q-z
        LC_CTYPE_table "__P, __P, __P, __P" ; {|}~
        LC_CTYPE_table "__C" ; 0x7F
        ;
        ; Nothing in the top half of UTF-8 is valid on its own as a
        ; single-byte character, so they are all illegal characters (IL).
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        LC_CTYPE_table "IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL,IL"
        ;
        ; The UTF-8 ctype locale wants the full version of wctype.
        LC_CTYPE_full_wctype
        ;
        ; UTF-8 is a multibyte locale, so we must specify some
        ; conversion functions. MB_CUR_MAX is 6 for UTF-8 (the lead
        ; bytes 0xFC and 0xFD are each followed by five continuation
        ; bytes).
        ;
        ; The implementations of the conversion functions are not
        ; provided in this example.
        ;
        IMPORT  utf8_mbrtowc
        IMPORT  utf8_wcrtomb
        LC_CTYPE_multibyte utf8_mbrtowc, utf8_wcrtomb, 6
        LC_CTYPE_end
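
The conversion functions imported by the example are not shown, so the following C code is a rough sketch of what a pair obeying the documented return-value contract might look like. It is an illustration under stated assumptions, not the library's implementation: the names sketch_utf8_mbrtowc and sketch_utf8_wcrtomb and the utf8_state type are hypothetical, utf8_state stands in for whatever the library keeps in mbstate_t, and uint32_t stands in for wchar_t so that the sketch does not depend on the width of wchar_t. It follows the six-byte scheme assumed by the MB_CUR_MAX value in the example, and it does not reject overlong encodings or surrogate values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state. A real implementation would keep the
 * equivalent information in the mbstate_t supplied by the library. */
typedef struct {
    uint32_t partial;    /* code point bits collected so far */
    int      remaining;  /* continuation bytes still expected */
} utf8_state;

/* Consume one byte. Returns 1 when a character is complete and stored
 * in *out, (size_t)-2 when more bytes are needed, and (size_t)-1 when
 * the input is invalid. */
size_t sketch_utf8_mbrtowc(uint32_t *out, char c, utf8_state *st)
{
    unsigned char b = (unsigned char)c;

    if (st->remaining == 0) {
        if (b < 0x80) {                       /* single-byte character */
            *out = b;
            return 1;
        }
        if      ((b & 0xE0) == 0xC0) { st->partial = b & 0x1F; st->remaining = 1; }
        else if ((b & 0xF0) == 0xE0) { st->partial = b & 0x0F; st->remaining = 2; }
        else if ((b & 0xF8) == 0xF0) { st->partial = b & 0x07; st->remaining = 3; }
        else if ((b & 0xFC) == 0xF8) { st->partial = b & 0x03; st->remaining = 4; }
        else if ((b & 0xFE) == 0xFC) { st->partial = b & 0x01; st->remaining = 5; }
        else return (size_t)-1;               /* stray continuation, 0xFE or 0xFF */
        return (size_t)-2;                    /* lead byte accepted, no output yet */
    }

    if ((b & 0xC0) != 0x80) {                 /* expected a continuation byte */
        st->remaining = 0;
        return (size_t)-1;
    }
    st->partial = (st->partial << 6) | (b & 0x3F);
    if (--st->remaining > 0)
        return (size_t)-2;
    *out = st->partial;
    return 1;
}

/* Encode one character into s, which must have room for 6 bytes.
 * Returns the number of bytes written, or (size_t)-1 if w cannot be
 * represented. UTF-8 encoding is stateless, so st is unused. */
size_t sketch_utf8_wcrtomb(char *s, uint32_t w, utf8_state *st)
{
    static const unsigned char lead[] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    unsigned char *p = (unsigned char *)s;
    size_t n, i;

    (void)st;
    if      (w < 0x80u)       n = 1;
    else if (w < 0x800u)      n = 2;
    else if (w < 0x10000u)    n = 3;
    else if (w < 0x200000u)   n = 4;
    else if (w < 0x4000000u)  n = 5;
    else if (w < 0x80000000u) n = 6;
    else return (size_t)-1;

    for (i = n - 1; i > 0; i--) {             /* continuation bytes, last first */
        p[i] = (unsigned char)(0x80u | (w & 0x3Fu));
        w >>= 6;
    }
    p[0] = (unsigned char)(lead[n] | w);      /* lead byte carries the top bits */
    return n;
}
```

Feeding the bytes 0xE2, 0x82, 0xAC to the decoder one at a time gives two "more input needed" results followed by a completed character, which mirrors the -2 and 1 return values described for internal_mbrtowc().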
Non-Confidential ARM DUI0475M
Copyright © 2010-2016 ARM Limited or its affiliates. All rights reserved.