Provided by: m17n-docs_1.6.2-2_all

NAME

       Charset - Charset objects and API for them.

   Defines
       #define MCHAR_INVALID_CODE
           Invalid code-point.

   Functions
       MSymbol mchar_define_charset (const char *name, MPlist *plist)
           Define a charset.
       MSymbol mchar_resolve_charset (MSymbol symbol)
           Resolve charset name.
       int mchar_list_charset (MSymbol **symbols)
           List symbols representing charsets.
       int mchar_decode (MSymbol charset_name, unsigned code)
           Decode a code-point.
       unsigned mchar_encode (MSymbol charset_name, int c)
           Encode a character code.
       int mchar_map_charset (MSymbol charset_name, void(*func)(int from, int to, void *arg),
           void *func_arg)
           Call a function for all the characters in a specified charset.
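       A callback for mchar_map_charset() must match the parameter type
       void (*func)(int from, int to, void *arg). The sketch below shows one such callback that
       totals the character codes it receives and records the lowest and highest of them. The
       struct range_stats and the helper names are hypothetical. The sketch also assumes that
       FROM and TO are both inclusive bounds of a run of character codes, which this page does
       not state.

```c
#include <limits.h>

/* Hypothetical accumulator passed through the opaque void *arg parameter. */
struct range_stats {
    long count;   /* total number of character codes seen */
    int lowest;   /* smallest FROM received so far */
    int highest;  /* largest TO received so far */
};

static void range_stats_init(struct range_stats *s)
{
    s->count = 0;
    s->lowest = INT_MAX;
    s->highest = INT_MIN;
}

/* Callback matching void (*func)(int from, int to, void *arg).
   ASSUMPTION: FROM and TO are both inclusive. */
static void collect_range(int from, int to, void *arg)
{
    struct range_stats *s = arg;

    s->count += (long) to - from + 1;
    if (from < s->lowest)
        s->lowest = from;
    if (to > s->highest)
        s->highest = to;
}
```

       With the library linked in, the callback would presumably be passed as
       mchar_map_charset (Mcharset_iso_8859_1, collect_range, &stats) after
       range_stats_init (&stats), so that the library, not the caller, supplies each range.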

   Variables
       MSymbol Mcharset
           The symbol Mcharset.

   Variables: Symbols representing a charset.
       Each of the following symbols represents a predefined charset.
       MSymbol Mcharset_ascii
           Symbol representing the charset ASCII.
       MSymbol Mcharset_iso_8859_1
           Symbol representing the charset ISO/IEC 8859/1.
       MSymbol Mcharset_unicode
           Symbol representing the charset Unicode.
       MSymbol Mcharset_m17n
           Symbol representing the largest charset.
       MSymbol Mcharset_binary
           Symbol representing the charset for ill-decoded characters.

   Variables: Parameter keys for mchar_define_charset().
       These are the predefined symbols to use as parameter keys for the function
       mchar_define_charset() (which see).
       MSymbol Mmethod
       MSymbol Mdimension
       MSymbol Mmin_range
       MSymbol Mmax_range
       MSymbol Mmin_code
       MSymbol Mmax_code
       MSymbol Mascii_compatible
       MSymbol Mfinal_byte
       MSymbol Mrevision
       MSymbol Mmin_char
       MSymbol Mmapfile
       MSymbol Mparents
       MSymbol Msubset_offset
       MSymbol Mdefine_coding
       MSymbol Maliases

   Variables: Symbols representing charset methods.
       These are the predefined symbols that can be used as the value of the Mmethod parameter
       in the property list given to the mchar_define_charset() function.

       A method specifies how code-points and character codes are converted. See the
       documentation of the mchar_define_charset() function for the details.
       MSymbol Moffset
           Symbol for the offset type method of charset.
       MSymbol Mmap
           Symbol for the map type method of charset.
       MSymbol Munify
           Symbol for the unify type method of charset.
       MSymbol Msubset
           Symbol for the subset type method of charset.
       MSymbol Msuperset
           Symbol for the superset type method of charset.

Detailed Description

       Charset objects and API for them.

       The m17n library uses charset objects to represent coded character sets (CCS). The m17n
       library supports many predefined coded character sets. In addition, application programs
       can add other charsets. A character can belong to multiple charsets.

       The m17n library distinguishes the following three concepts:

       • A code-point is a number assigned by the CCS to each character. Code-points may or may
         not be contiguous. The C type unsigned is used to represent a code-point. An invalid
         code-point is represented by the macro MCHAR_INVALID_CODE.
       • A character index is the canonical index of a character in a CCS. The character that has
         the character index N occupies position N (counting from 0) when all the characters in
         the current CCS are sorted by their code-points. Character indices in a CCS are
         contiguous and start with 0.
       • A character code is the internal representation in the m17n library of a character. A
         character code is a signed integer of 21 bits or longer.
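       To make the difference between a code-point and a character index concrete, the following
       sketch computes the character index of a code-point from a sorted table of all valid
       code-points of a hypothetical CCS. It only illustrates the definition above; it is not how
       the m17n library computes indices, and the function char_index is hypothetical.

```c
#include <stddef.h>

/* Character index of CODE in a CCS whose valid code-points are listed,
   sorted ascending and without duplicates, in CODES[0..N-1].
   Returns -1 if CODE is not a valid code-point of the set. */
static long char_index(const unsigned *codes, size_t n, unsigned code)
{
    size_t lo = 0, hi = n;  /* binary search over the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (codes[mid] < code)
            lo = mid + 1;
        else if (codes[mid] > code)
            hi = mid;
        else
            return (long) mid;  /* position in sorted order is the index */
    }
    return -1;
}
```

       For a set with the code-points 0x21, 0x22, 0x7E, 0x2121 and 0x2122, the indices are 0
       through 4 even though the code-points themselves have large gaps.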
       Each charset object defines how characters are converted between code-points and character
       codes. To decode means converting a code-point to a character code (mchar_decode), and to
       encode means converting a character code to a code-point (mchar_encode).

Define Documentation

   #define MCHAR_INVALID_CODE
       Invalid code-point. The macro MCHAR_INVALID_CODE gives the invalid code-point.

Variable Documentation

   MSymbol Mcharset_ascii
       Symbol representing the charset ASCII. The symbol Mcharset_ascii has the name 'ascii' and
       represents the charset ISO 646, USA Version X3.4-1968 (ISO-IR-6).
   MSymbol Mcharset_iso_8859_1
       Symbol representing the charset ISO/IEC 8859/1. The symbol Mcharset_iso_8859_1 has the
       name 'iso-8859-1' and represents the charset ISO/IEC 8859-1:1998.
   MSymbol Mcharset_unicode
       Symbol representing the charset Unicode. The symbol Mcharset_unicode has the name
       'unicode' and represents the charset Unicode.
   MSymbol Mcharset_m17n
       Symbol representing the largest charset. The symbol Mcharset_m17n has the name 'm17n' and
       represents the charset that contains all characters supported by the m17n library.
   MSymbol Mcharset_binary
       Symbol representing the charset for ill-decoded characters. The symbol Mcharset_binary has
       the name 'binary' and represents the fake charset that the decoding functions attach to an
       M-text as a text property when they encounter an invalid byte (or byte sequence).
       See Code Conversion for more details.
   MSymbol Mmethod
   MSymbol Mdimension
   MSymbol Mmin_range
   MSymbol Mmax_range
   MSymbol Mmin_code
   MSymbol Mmax_code
   MSymbol Mascii_compatible
   MSymbol Mfinal_byte
   MSymbol Mrevision
   MSymbol Mmin_char
   MSymbol Mmapfile
   MSymbol Mparents
   MSymbol Msubset_offset
   MSymbol Mdefine_coding
   MSymbol Maliases
   MSymbol Moffset
       Symbol for the offset type method of charset. The symbol Moffset has the name 'offset'
       and, when used as the value of the Mmethod parameter of a charset, it means that
       code-points and character codes of the charset are converted by this calculation:
       CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR
       where MIN-CODE is the value of the Mmin_code parameter of the charset, and MIN-CHAR is
       the value of the Mmin_char parameter.
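       A minimal sketch of the offset method in both directions, assuming a charset whose valid
       code-points are exactly MIN-CODE..MAX-CODE (it ignores any further restrictions that other
       parameters, such as Mmin_range and Mmax_range, may place on the code space). Decoding
       applies the formula above; encoding applies its inverse,
       CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE. The struct offset_charset and
       SKETCH_INVALID_CODE, a stand-in for MCHAR_INVALID_CODE, are hypothetical.

```c
#include <limits.h>

/* Hypothetical stand-in for MCHAR_INVALID_CODE. */
#define SKETCH_INVALID_CODE UINT_MAX

/* Parameters of an offset-type charset; the field names mirror
   Mmin_code, Mmax_code and Mmin_char, but the struct is hypothetical. */
struct offset_charset {
    unsigned min_code;
    unsigned max_code;
    int min_char;
};

/* Decode: code-point -> character code.
   CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR */
static int offset_decode(const struct offset_charset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    return (int) (code - cs->min_code) + cs->min_char;
}

/* Encode: character code -> code-point (the inverse formula).
   CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE */
static unsigned offset_encode(const struct offset_charset *cs, int c)
{
    /* highest character code covered by the charset */
    long long max_char = (long long) cs->min_char
                         + (long long) (cs->max_code - cs->min_code);

    if (c < cs->min_char || c > max_char)
        return SKETCH_INVALID_CODE;
    return (unsigned) (c - cs->min_char) + cs->min_code;
}
```

       For example, with MIN-CODE 0x21, MAX-CODE 0x7E and MIN-CHAR 0x10000, code-point 0x41
       decodes to 0x10020, and encoding 0x10020 gives back 0x41.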
   MSymbol Mmap
       Symbol for the map type method of charset. The symbol Mmap has the name 'map' and, when
       used as the value of the Mmethod parameter of a charset, it means that code-points and
       character codes of the charset are converted by looking up a map. The map must be given
       by the Mmapfile parameter.
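       A sketch of the map method, assuming the map has already been loaded into an array of
       code-point/character-code pairs sorted by code-point. How the file named by Mmapfile is
       located and parsed is not described on this page, so loading is left out. The names
       struct map_entry, map_decode, map_encode and SKETCH_INVALID_CODE are hypothetical, and
       the linear scan in map_encode trades speed for brevity.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for MCHAR_INVALID_CODE. */
#define SKETCH_INVALID_CODE UINT_MAX

/* One entry of a code-point <-> character-code map. */
struct map_entry {
    unsigned code;  /* code-point */
    int c;          /* character code */
};

/* bsearch comparator: KEY points to an unsigned code-point. */
static int cmp_code(const void *key, const void *elt)
{
    unsigned k = *(const unsigned *) key;
    unsigned e = ((const struct map_entry *) elt)->code;

    return (k > e) - (k < e);
}

/* Decode via binary search; MAP must be sorted by code-point. */
static int map_decode(const struct map_entry *map, size_t n, unsigned code)
{
    const struct map_entry *e = bsearch(&code, map, n, sizeof *map, cmp_code);

    return e ? e->c : -1;
}

/* Encode via a linear scan over the character codes. */
static unsigned map_encode(const struct map_entry *map, size_t n, int c)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].c == c)
            return map[i].code;
    return SKETCH_INVALID_CODE;
}
```

       A production implementation would more likely keep a second index sorted by character
       code so that encoding is as fast as decoding.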
   MSymbol Munify
       Symbol for the unify type method of charset. The symbol Munify has the name 'unify' and,
       when used as the value of the Mmethod parameter of a charset, it means that code-points
       and character codes of the charset are converted by a combination of map lookup and
       offsetting. The map must be given by the Mmapfile parameter. For this kind of charset, a
       unique contiguous character code space covering all of its characters is assigned.
       If the map has an entry for a code-point, the conversion is done by looking up the map.
       Otherwise, the conversion is done by this calculation:
       CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE
       where MIN-CODE is the value of the Mmin_code parameter of the charset, and
       LOWEST-CHAR-CODE is the lowest character code of the assigned code space.
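       The unify method combines the two previous ones: a map entry takes precedence, and every
       other code-point falls back to the offset formula above. The sketch below assumes the map
       is an in-memory array sorted by code-point and that the valid code-points are
       MIN-CODE..MAX-CODE; all names are hypothetical. When encoding, it rejects a result whose
       code-point has its own map entry, because such a code-point never decodes through the
       offset formula.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for MCHAR_INVALID_CODE. */
#define SKETCH_INVALID_CODE UINT_MAX

struct unify_entry {
    unsigned code;  /* code-point */
    int c;          /* character code */
};

struct unify_charset {
    const struct unify_entry *map;  /* sorted by code-point */
    size_t n;
    unsigned min_code, max_code;    /* assumed valid code-point range */
    int lowest_char;                /* LOWEST-CHAR-CODE of the assigned space */
};

/* Binary search for CODE in the map; NULL if it has no entry. */
static const struct unify_entry *find_code(const struct unify_charset *cs,
                                           unsigned code)
{
    size_t lo = 0, hi = cs->n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cs->map[mid].code < code)
            lo = mid + 1;
        else if (cs->map[mid].code > code)
            hi = mid;
        else
            return &cs->map[mid];
    }
    return NULL;
}

/* Decode: map entry first, else CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE. */
static int unify_decode(const struct unify_charset *cs, unsigned code)
{
    const struct unify_entry *e;

    if (code < cs->min_code || code > cs->max_code)
        return -1;
    e = find_code(cs, code);
    if (e)
        return e->c;
    return (int) (code - cs->min_code) + cs->lowest_char;
}

/* Encode: reverse map lookup first, else the inverse offset formula. */
static unsigned unify_encode(const struct unify_charset *cs, int c)
{
    long long off;
    unsigned code;

    for (size_t i = 0; i < cs->n; i++)
        if (cs->map[i].c == c)
            return cs->map[i].code;
    off = (long long) c - cs->lowest_char;
    if (off < 0 || off > (long long) (cs->max_code - cs->min_code))
        return SKETCH_INVALID_CODE;
    code = cs->min_code + (unsigned) off;
    /* a code-point with a map entry never decodes through the offset */
    return find_code(cs, code) ? SKETCH_INVALID_CODE : code;
}
```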
   MSymbol Msubset
       Symbol for the subset type method of charset. The symbol Msubset has the name 'subset'
       and, when used as the value of the Mmethod parameter of a charset, it means that the
       charset is a subset of a parent charset. The parent charset must be given by the Mparents
       parameter. Code-points and character codes of the charset are converted, conceptually, by
       this calculation:
       CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET
       where PARENT-CODE is a pseudo function that returns the character code of CODE-POINT in
       the parent charset, and SUBSET-OFFSET is the value given by the Msubset_offset parameter.
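       A sketch of the subset method, with the parent charset reduced to a pair of function
       pointers. It assumes, beyond what this page states, that the subset restricts the parent
       to its own MIN-CODE..MAX-CODE range. Encoding inverts the formula by subtracting
       SUBSET-OFFSET before asking the parent. The structs, the demonstration parent byte_parent
       and SKETCH_INVALID_CODE are all hypothetical.

```c
#include <limits.h>

/* Hypothetical stand-in for MCHAR_INVALID_CODE. */
#define SKETCH_INVALID_CODE UINT_MAX

/* A parent charset reduced to its two conversion functions. */
struct parent_charset {
    int (*decode)(unsigned code);  /* code-point -> character code, -1 if invalid */
    unsigned (*encode)(int c);     /* character code -> code-point */
};

struct subset_charset {
    const struct parent_charset *parent;
    unsigned min_code, max_code;   /* assumed: the subset's own code range */
    int subset_offset;             /* value of Msubset_offset */
};

/* Hypothetical demonstration parent: 0x00..0xFF decode to themselves. */
static int byte_decode(unsigned code)
{
    return code <= 0xFF ? (int) code : -1;
}

static unsigned byte_encode(int c)
{
    return (c >= 0 && c <= 0xFF) ? (unsigned) c : SKETCH_INVALID_CODE;
}

static const struct parent_charset byte_parent = { byte_decode, byte_encode };

/* CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET */
static int subset_decode(const struct subset_charset *cs, unsigned code)
{
    int c;

    if (code < cs->min_code || code > cs->max_code)
        return -1;
    c = cs->parent->decode(code);
    return c < 0 ? -1 : c + cs->subset_offset;
}

/* Inverse: remove SUBSET-OFFSET, encode in the parent, check the range. */
static unsigned subset_encode(const struct subset_charset *cs, int c)
{
    unsigned code = cs->parent->encode(c - cs->subset_offset);

    if (code == SKETCH_INVALID_CODE || code < cs->min_code || code > cs->max_code)
        return SKETCH_INVALID_CODE;
    return code;
}
```

       For example, with byte_parent, the code range 0xA0..0xFF and a SUBSET-OFFSET of 0x1000,
       code-point 0xA1 decodes to 0x10A1, while code-point 0x41 is rejected as outside the
       subset.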
   MSymbol Msuperset
       Symbol for the superset type method of charset. The symbol Msuperset has the name
       'superset' and, when used as the value of the Mmethod parameter of a charset, it means
       that the charset is a superset of parent charsets. The parent charsets must be given by
       the Mparents parameter.
   MSymbol Mcharset
       The symbol Mcharset. Any decoded M-text has a text property whose key is the predefined
       symbol Mcharset. The name of Mcharset is 'charset'.

Author

       Generated automatically by Doxygen for The m17n Library from the source code.

COPYRIGHT

       Copyright (C) 2001 Information-technology Promotion Agency (IPA)
       Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and Technology
       (AIST)
       Permission is granted to copy, distribute and/or modify this document under the terms of
       the GNU Free Documentation License <http://www.gnu.org/licenses/fdl.html>.