Provided by: m17n-docs_1.8.4-1_all

NAME

       m17nConv_-_Code_Conver - Coding system objects and API for them.

SYNOPSIS

   Data Structures
       struct MConverter
           Structure to be used in code conversion.
       struct MCodingInfoISO2022
           Structure for a coding system of type MCODING_TYPE_ISO_2022.
       struct MCodingInfoUTF
           Structure for extra information about a coding system of type MCODING_TYPE_UTF.

   Enumerations
       enum MConversionResult { MCONVERSION_RESULT_SUCCESS, MCONVERSION_RESULT_INVALID_BYTE,
           MCONVERSION_RESULT_INVALID_CHAR, MCONVERSION_RESULT_INSUFFICIENT_SRC,
           MCONVERSION_RESULT_INSUFFICIENT_DST, MCONVERSION_RESULT_IO_ERROR }
           Codes that represent the result of code conversion.
       enum MCodingType { MCODING_TYPE_CHARSET, MCODING_TYPE_UTF, MCODING_TYPE_ISO_2022,
           MCODING_TYPE_MISC }
           Types of coding system.
       enum MCodingFlagISO2022 { MCODING_ISO_RESET_AT_EOL = 0x1, MCODING_ISO_RESET_AT_CNTL = 0x2,
           MCODING_ISO_EIGHT_BIT = 0x4, MCODING_ISO_LONG_FORM = 0x8, MCODING_ISO_DESIGNATION_G0 =
           0x10, MCODING_ISO_DESIGNATION_G1 = 0x20, MCODING_ISO_DESIGNATION_CTEXT = 0x40,
           MCODING_ISO_DESIGNATION_CTEXT_EXT = 0x80, MCODING_ISO_LOCKING_SHIFT = 0x100,
           MCODING_ISO_SINGLE_SHIFT = 0x200, MCODING_ISO_SINGLE_SHIFT_7 = 0x400,
           MCODING_ISO_EUC_TW_SHIFT = 0x800, MCODING_ISO_ISO6429 = 0x1000,
           MCODING_ISO_REVISION_NUMBER = 0x2000, MCODING_ISO_FULL_SUPPORT = 0x3000,
           MCODING_ISO_FLAG_MAX }
           Bit-masks to specify the detail of coding system whose type is MCODING_TYPE_ISO_2022.

   Functions
       MSymbol mconv_define_coding (const char *name, MPlist *plist,
           int (*resetter)(MConverter *),
           int (*decoder)(const unsigned char *, int, MText *, MConverter *),
           int (*encoder)(MText *, int, int, unsigned char *, int, MConverter *),
           void *extra_info)
           Define a coding system.
       MSymbol mconv_resolve_coding (MSymbol symbol)
           Resolve coding system name.
       int mconv_list_codings (MSymbol **symbols)
           List symbols representing coding systems.
       MConverter * mconv_buffer_converter (MSymbol name, const unsigned char *buf, int n)
           Create a code converter bound to a buffer.
       MConverter * mconv_stream_converter (MSymbol name, FILE *fp)
           Create a code converter bound to a stream.
       int mconv_reset_converter (MConverter *converter)
           Reset a code converter.
       void mconv_free_converter (MConverter *converter)
           Free a code converter.
       MConverter * mconv_rebind_buffer (MConverter *converter, const unsigned char *buf, int n)
           Bind a buffer to a code converter.
       MConverter * mconv_rebind_stream (MConverter *converter, FILE *fp)
           Bind a stream to a code converter.
       MText * mconv_decode (MConverter *converter, MText *mt)
           Decode a byte sequence into an M-text.
       MText * mconv_decode_buffer (MSymbol name, const unsigned char *buf, int n)
           Decode a buffer area based on a coding system.
       MText * mconv_decode_stream (MSymbol name, FILE *fp)
           Decode a stream input based on a coding system.
       int mconv_encode (MConverter *converter, MText *mt)
           Encode an M-text into a byte sequence.
       int mconv_encode_range (MConverter *converter, MText *mt, int from, int to)
           Encode a part of an M-text.
       int mconv_encode_buffer (MSymbol name, MText *mt, unsigned char *buf, int n)
           Encode an M-text into a buffer area.
       int mconv_encode_stream (MSymbol name, MText *mt, FILE *fp)
           Encode an M-text to write to a stream.
       int mconv_getc (MConverter *converter)
           Read a character via a code converter.
       int mconv_ungetc (MConverter *converter, int c)
           Push a character back to a code converter.
       int mconv_putc (MConverter *converter, int c)
           Write a character via a code converter.
       MText * mconv_gets (MConverter *converter, MText *mt)
           Read a line using a code converter.

   Variables: Symbols representing coding systems
       MSymbol Mcoding_us_ascii
           Symbol for the coding system US-ASCII.
       MSymbol Mcoding_iso_8859_1
           Symbol for the coding system ISO-8859-1.
       MSymbol Mcoding_utf_8
           Symbol for the coding system UTF-8.
       MSymbol Mcoding_utf_8_full
           Symbol for the coding system UTF-8-FULL.
       MSymbol Mcoding_utf_16
           Symbol for the coding system UTF-16.
       MSymbol Mcoding_utf_16be
           Symbol for the coding system UTF-16BE.
       MSymbol Mcoding_utf_16le
           Symbol for the coding system UTF-16LE.
       MSymbol Mcoding_utf_32
           Symbol for the coding system UTF-32.
       MSymbol Mcoding_utf_32be
           Symbol for the coding system UTF-32BE.
       MSymbol Mcoding_utf_32le
           Symbol for the coding system UTF-32LE.
       MSymbol Mcoding_sjis
           Symbol for the coding system SJIS.

   Variables: Parameter keys for mconv_define_coding()
       MSymbol Mtype
       MSymbol Mcharsets
       MSymbol Mflags
       MSymbol Mdesignation
       MSymbol Minvocation
       MSymbol Mcode_unit
       MSymbol Mbom
       MSymbol Mlittle_endian

   Variables: Symbols representing coding system types
       MSymbol Mutf
       MSymbol Miso_2022

   Variables: Symbols appearing in the value of the Mflags parameter
       Symbols that can appear in the value of the Mflags parameter of a coding system passed to
       the mconv_define_coding() function (which see).

       MSymbol Mreset_at_eol
       MSymbol Mreset_at_cntl
       MSymbol Meight_bit
       MSymbol Mlong_form
       MSymbol Mdesignation_g0
       MSymbol Mdesignation_g1
       MSymbol Mdesignation_ctext
       MSymbol Mdesignation_ctext_ext
       MSymbol Mlocking_shift
       MSymbol Msingle_shift
       MSymbol Msingle_shift_7
       MSymbol Meuc_tw_shift
       MSymbol Miso_6429
       MSymbol Mrevision_number
       MSymbol Mfull_support

   Variables: Others
        Remaining variables.

       MSymbol Mmaybe
           Symbol whose name is 'maybe'.
       MSymbol Mcoding
           The symbol Mcoding.

Detailed Description

       Coding system objects and API for them.

       The m17n library represents a character encoding scheme (CES) of coded character sets
       (CCS) as an object called a coding system. Application programs can also define their own
       coding systems.

       To encode means converting character codes to code-points (a byte sequence), and to decode
       means converting code-points back to character codes.

       Application programs can decode a byte sequence with a specified coding system into an
       M-text, and inversely, can encode an M-text into a byte sequence.
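
       The following standalone C sketch illustrates the two directions using ISO-8859-1, where
       each byte value equals the character code. It is not the m17n implementation: the names
       latin1_decode, latin1_encode, and enum l1_result are hypothetical, and real programs would
       call functions such as mconv_decode_buffer() and mconv_encode_buffer() instead.

```c
#include <stddef.h>

/* Hypothetical result codes, loosely modelled on enum MConversionResult. */
enum l1_result { L1_SUCCESS, L1_INVALID_CHAR, L1_INSUFFICIENT_DST };

/* Decode: each ISO-8859-1 byte is directly the character code U+0000..U+00FF.
   Writes n character codes to chars and returns how many were produced. */
size_t latin1_decode(const unsigned char *src, size_t n, int *chars)
{
    for (size_t i = 0; i < n; i++)
        chars[i] = src[i];
    return n;
}

/* Encode: character codes above 0xFF cannot be represented, so stop at the
   first one (strict behavior). *nbytes receives the number of bytes written. */
enum l1_result latin1_encode(const int *chars, size_t nchars,
                             unsigned char *dst, size_t dst_size,
                             size_t *nbytes)
{
    *nbytes = 0;
    for (size_t i = 0; i < nchars; i++) {
        if (chars[i] < 0 || chars[i] > 0xFF)
            return L1_INVALID_CHAR;        /* not encodable in ISO-8859-1 */
        if (*nbytes >= dst_size)
            return L1_INSUFFICIENT_DST;    /* destination buffer is full */
        dst[(*nbytes)++] = (unsigned char) chars[i];
    }
    return L1_SUCCESS;
}
```

       Stopping at the first unencodable character mirrors the strict behavior described for
       MConverter::lenient below.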

Data Structure Documentation

   MConverter
       Structure to be used in code conversion.

       FIELD DOCUMENTATION:

       int MConverter::lenient Set the value to nonzero if the conversion should be lenient. By
       default, the conversion is strict (i.e. not lenient).

       If the conversion is strict, the converter stops at the first invalid byte (on decoding)
       or at the first character not supported by the coding system (on encoding). If this
       happens, MConverter->result is set to MCONVERSION_RESULT_INVALID_BYTE or
       MCONVERSION_RESULT_INVALID_CHAR accordingly.

       If the conversion is lenient, on decoding, an invalid byte is kept as is, and on encoding,
       an invalid character is replaced with '<U+XXXX>' (if the character is a Unicode character)
       or with '<M+XXXXXX>' (otherwise).
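
       A minimal sketch of how the lenient replacement text could be formatted. The helper
       lenient_replacement is hypothetical, and it assumes that "Unicode character" means a code
       up to U+10FFFF and that longer codes are printed with as many hex digits as needed.

```c
#include <stdio.h>

/* Hypothetical helper: write the lenient-mode replacement for character c
   into buf and return its length (as snprintf does). */
int lenient_replacement(int c, char *buf, size_t size)
{
    if (c >= 0 && c <= 0x10FFFF)                 /* assumed Unicode range */
        return snprintf(buf, size, "<U+%04X>", (unsigned) c);
    return snprintf(buf, size, "<M+%06X>", (unsigned) c);
}
```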

       int MConverter::last_block Set the value to nonzero before decoding or encoding the last
       block of the byte sequence or the character sequence respectively. The value influences
       the conversion as follows.

       On decoding, if the last few bytes are too short to form a valid byte sequence:

       If the value is nonzero, the conversion terminates with an error
       (MCONVERSION_RESULT_INVALID_BYTE) at the first byte of the sequence.

       If the value is zero, the conversion terminates successfully. Those bytes are stored in
       the converter as carryover and are prepended to the byte sequence of the next
       conversion.
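
       The carryover rule can be illustrated with UTF-8. This is a standalone sketch, and
       utf8_incomplete_tail is a hypothetical helper, not part of the m17n API. It counts the
       trailing bytes that start a sequence which is not yet complete: with last_block zero they
       would be carried over, and with last_block nonzero they would be reported as an invalid
       byte.

```c
#include <stddef.h>

/* Return how many trailing bytes of buf[0..n) begin a UTF-8 sequence that
   is not yet complete (0 if the buffer ends on a sequence boundary). */
size_t utf8_incomplete_tail(const unsigned char *buf, size_t n)
{
    /* A UTF-8 sequence is at most 4 bytes, so look back at most 3 bytes. */
    for (size_t back = 1; back <= 3 && back <= n; back++) {
        unsigned char b = buf[n - back];
        if ((b & 0xC0) == 0x80)
            continue;                   /* continuation byte: keep looking */
        size_t need = (b & 0xE0) == 0xC0 ? 2
                    : (b & 0xF0) == 0xE0 ? 3
                    : (b & 0xF8) == 0xF0 ? 4 : 1;
        return need > back ? back : 0;  /* lead byte found */
    }
    return 0;  /* stray continuation bytes are left for the decoder to reject */
}
```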

       On encoding, if the coding system is context-dependent:

       If the value is nonzero, the conversion may produce a byte sequence at the end to reset
       the context to the initial state, even if there are no source characters.

       If the value is zero, the conversion never produces such a byte sequence at the end.
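
       For a context-dependent coding system such as ISO-2022-JP, the reset sequence is the
       escape sequence that designates ASCII back to G0 (ESC ( B). The sketch below assumes a
       hypothetical encoder state and helper (struct demo_iso2022_state, demo_finish_block) and
       shows when such a trailer would be emitted.

```c
#include <stddef.h>

/* Hypothetical encoder state: g0_is_ascii is nonzero when G0 holds ASCII,
   which is the initial state of an ISO-2022-JP-like coding system. */
struct demo_iso2022_state { int g0_is_ascii; };

/* When encoding the last block and G0 does not hold ASCII, append
   ESC ( B to restore the initial state. Returns the bytes written to out
   (which needs room for 3 bytes). */
size_t demo_finish_block(struct demo_iso2022_state *st, int last_block,
                         unsigned char *out)
{
    if (!last_block || st->g0_is_ascii)
        return 0;              /* more text may follow, or nothing to reset */
    out[0] = 0x1B; out[1] = '('; out[2] = 'B';
    st->g0_is_ascii = 1;       /* back to the initial state */
    return 3;
}
```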

       unsigned MConverter::at_most If the value is nonzero, it specifies the maximum number of
       characters to convert.

       The following three members report the result of the conversion.

       int MConverter::nchars Number of characters most recently decoded or encoded.

       int MConverter::nbytes Number of bytes most recently decoded or encoded.

       enum MConversionResult MConverter::result Result code of the conversion.

       void* MConverter::ptr

       double MConverter::dbl

       char MConverter::c[256] These three members (ptr, dbl, and c) make up the status union
       described below.

       union { ... }  MConverter::status Various information about the status of code conversion.
       The contents depend on the type of coding system. The status member is guaranteed to be
       aligned so that casting it to any type is safe, and at least 256 bytes of memory are
       available.
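
       The following sketch shows how a custom coding system might keep its own state in the
       status area. It uses a local mirror of the union so that it compiles without m17n.h, and
       struct my_state, save_state, and load_state are hypothetical. memcpy is used for
       portability; the documented alignment guarantee would also allow a direct pointer cast.

```c
#include <string.h>

/* Local mirror of the documented MConverter::status union. */
union demo_status { void *ptr; double dbl; char c[256]; };

/* Hypothetical per-converter state of a custom coding system. */
struct my_state { int shift_mode; unsigned char pending[4]; int npending; };

/* The state must fit in the 256 bytes the library guarantees. */
_Static_assert(sizeof(struct my_state) <= 256, "state too large for status");

/* Copy the state into the status area. */
void save_state(union demo_status *st, const struct my_state *s)
{
    memcpy(st->c, s, sizeof *s);
}

/* Copy the state back out of the status area. */
void load_state(const union demo_status *st, struct my_state *s)
{
    memcpy(s, st->c, sizeof *s);
}
```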

       void* MConverter::internal_info This member is for internal use only. An application
       program should never touch it.

   MCodingInfoISO2022
       Structure for a coding system of type MCODING_TYPE_ISO_2022.

       FIELD DOCUMENTATION:

       int MCodingInfoISO2022::initial_invocation[2] For each graphic plane (Graphic Left and
       Graphic Right), the number of the ISO 2022 code extension element initially invoked to
       it. -1 means that no code extension element is invoked to that plane.

       char MCodingInfoISO2022::designations[32] Table of code extension elements. The Nth
       element corresponds to the Nth charset in the value of the Mcharsets parameter given to
       the mconv_define_coding() function.

       If an element value is 0..3, it specifies a graphic register number to designate the
       corresponding charset. In addition, the charset is initially designated to that graphic
       register.

       If the value is -4..-1, it specifies a graphic register number 0..3 respectively to
       designate the corresponding charset. Initially, the charset is not designated to any
       graphic register.
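
       The two ranges of designations values can be decoded as follows. This is a sketch, and
       decode_designation is a hypothetical helper written from the description above.

```c
/* Interpret one element of MCodingInfoISO2022::designations:
   0..3   -> graphic register 0..3, designated initially;
   -4..-1 -> graphic register 0..3, not designated initially.
   Returns the register number, or -1 for values outside -4..3. */
int decode_designation(int value, int *initially_designated)
{
    if (value >= 0 && value <= 3) {
        *initially_designated = 1;
        return value;
    }
    if (value >= -4 && value <= -1) {
        *initially_designated = 0;
        return value + 4;      /* -4 -> 0, ..., -1 -> 3 */
    }
    return -1;
}
```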

       unsigned MCodingInfoISO2022::flags Bitwise OR of enum MCodingFlagISO2022 values.
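
       Because flags is a bit set, individual features are tested with bitwise AND. The sketch
       below copies three enumerator values from the listing above under local names so that it
       compiles on its own; demo_emit_ss2 is hypothetical, and choosing between 0x8E and ESC N
       based on MCODING_ISO_EIGHT_BIT is an assumption.

```c
#include <stddef.h>

/* Values copied from enum MCodingFlagISO2022, renamed for this sketch. */
enum {
    DEMO_ISO_EIGHT_BIT      = 0x4,
    DEMO_ISO_SINGLE_SHIFT   = 0x200,
    DEMO_ISO_SINGLE_SHIFT_7 = 0x400
};

/* Write the single shift 2 (SS2) control allowed by flags into out.
   Returns the number of bytes written, or 0 if SS2 is not allowed. */
size_t demo_emit_ss2(unsigned flags, unsigned char out[2])
{
    if (flags & DEMO_ISO_SINGLE_SHIFT) {
        if (flags & DEMO_ISO_EIGHT_BIT) {     /* assumed: 8-bit C1 form */
            out[0] = 0x8E;
            return 1;
        }
        out[0] = 0x1B; out[1] = 'N';          /* 7-bit form ESC N */
        return 2;
    }
    if (flags & DEMO_ISO_SINGLE_SHIFT_7) {
        out[0] = 0x19;
        return 1;
    }
    return 0;
}
```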

   MCodingInfoUTF
       Structure for extra information about a coding system of type MCODING_TYPE_UTF.

       FIELD DOCUMENTATION:

       int MCodingInfoUTF::code_unit_bits Number of bits in a code unit. The value must be 8, 16,
       or 32.

       int MCodingInfoUTF::bom Specify how to handle the leading BOM (byte order mark). The value
       must be 0, 1, or 2. The meanings are as follows:

       0: On decoding, check the first two bytes. If they are a BOM, determine the byte order
       from them; if not, determine it from the member endian. On encoding, produce a byte
       sequence in the byte order given by endian, with a leading BOM.

       1: On decoding, do not treat the first two bytes as a BOM, and determine the byte order
       from endian. On encoding, produce a byte sequence in the byte order given by endian,
       without a BOM.

       2: On decoding, treat the first two bytes as a BOM and determine the byte order from
       them. On encoding, produce a byte sequence in the byte order given by endian, with a
       leading BOM.

       If code_unit_bits is 8, the value has no meaning.

       int MCodingInfoUTF::endian Specify the byte order. The value must be 0 or 1: 0 means
       little endian, and 1 means big endian.

       If code_unit_bits is 8, the value has no meaning.
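
       The three bom modes can be sketched for UTF-16 decoding as below. The function
       utf16_detect_endian is hypothetical, and reporting a missing BOM in mode 2 as -1 is an
       assumption, since the text above does not say what happens in that case.

```c
#include <stddef.h>

/* bom: 0 = use a BOM if present, 1 = never treat bytes as a BOM,
        2 = the first two bytes must be a BOM.
   endian: fallback byte order (0 = little endian, 1 = big endian).
   Returns the byte order to use, or -1 if a required BOM is missing;
   *skip receives the number of leading bytes consumed as a BOM. */
int utf16_detect_endian(const unsigned char *buf, size_t n,
                        int bom, int endian, size_t *skip)
{
    *skip = 0;
    if (bom != 1 && n >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF) { *skip = 2; return 1; }
        if (buf[0] == 0xFF && buf[1] == 0xFE) { *skip = 2; return 0; }
    }
    if (bom == 2)
        return -1;      /* assumed handling: required BOM not found */
    return endian;      /* no BOM used: fall back to the endian member */
}
```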

Enumeration Type Documentation

   enum MConversionResult
       Codes that represent the result of code conversion. One of these values is set in
       MConverter->result.

       Enumerator

       MCONVERSION_RESULT_SUCCESS
              Code conversion is successful.

       MCONVERSION_RESULT_INVALID_BYTE
              On decoding, the source contains an invalid byte.

       MCONVERSION_RESULT_INVALID_CHAR
              On encoding, the source contains a character that cannot be encoded by the
              specified coding system.

       MCONVERSION_RESULT_INSUFFICIENT_SRC
              On decoding, the source ends with an incomplete byte sequence.

       MCONVERSION_RESULT_INSUFFICIENT_DST
              On encoding, the destination is too short to store the result.

       MCONVERSION_RESULT_IO_ERROR
              An I/O error occurred in the conversion.
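
       A caller that converts data in chunks has to decide what to do after each call. The
       sketch below is one hypothetical policy, using a local mirror of the enumerators (with
       shortened names) and a hypothetical enum demo_action; it is not behavior prescribed by
       the library.

```c
/* Local mirror of enum MConversionResult; the real enumerators are
   named MCONVERSION_RESULT_*. */
enum demo_result { RES_SUCCESS, RES_INVALID_BYTE, RES_INVALID_CHAR,
                   RES_INSUFFICIENT_SRC, RES_INSUFFICIENT_DST, RES_IO_ERROR };

/* Hypothetical actions a streaming caller might take. */
enum demo_action { ACT_CONTINUE, ACT_READ_MORE, ACT_GROW_OUTPUT, ACT_FAIL };

/* An incomplete source is acceptable unless this was the last block;
   a full destination means retry with a larger buffer; invalid input
   and I/O errors are treated as fatal. */
enum demo_action next_action(enum demo_result r, int last_block)
{
    switch (r) {
    case RES_SUCCESS:          return ACT_CONTINUE;
    case RES_INSUFFICIENT_SRC: return last_block ? ACT_FAIL : ACT_READ_MORE;
    case RES_INSUFFICIENT_DST: return ACT_GROW_OUTPUT;
    default:                   return ACT_FAIL;
    }
}
```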

   enum MCodingType
       Types of coding system.

       Enumerator

       MCODING_TYPE_CHARSET
               A coding system of this type supports charsets directly. The dimension of each
               charset defines the number of bytes used to represent a single character of the
               charset, and a byte sequence directly represents the code-point of a character.
               The m17n library provides the default decoding and encoding routines of this type.

       MCODING_TYPE_UTF
               A coding system of this type supports byte sequences with a UTF-like structure
               (UTF-8, UTF-16, UTF-32). The m17n library provides the default decoding and
               encoding routines of this type.

       MCODING_TYPE_ISO_2022
               A coding system of this type supports byte sequences with an ISO-2022-like
               structure. The details of each structure are specified by MCodingInfoISO2022. The
               m17n library provides decoding and encoding routines of this type.

       MCODING_TYPE_MISC
              A coding system of this type is for byte sequences of miscellaneous structures. The
              m17n library does not provide decoding and encoding routines of this type. They
              must be provided by the application program.

   enum MCodingFlagISO2022
       Bit-masks to specify the detail of coding system whose type is MCODING_TYPE_ISO_2022.

       Enumerator

       MCODING_ISO_RESET_AT_EOL
               On encoding, reset the invocation and designation status to the initial state at
               the end of each line.

       MCODING_ISO_RESET_AT_CNTL
               On encoding, reset the invocation and designation status to the initial state
               before any control code.

       MCODING_ISO_EIGHT_BIT
              Use the right graphic plane.

       MCODING_ISO_LONG_FORM
               Use the non-standard 4-byte form of the designation sequence for the charsets
               JISX0208-1978, GB2312, and JISX0208-1983.

       MCODING_ISO_DESIGNATION_G0
              On encoding, unless explicitly specified, designate charsets to G0.

       MCODING_ISO_DESIGNATION_G1
              On encoding, unless explicitly specified, designate charsets except for ASCII to
              G1.

       MCODING_ISO_DESIGNATION_CTEXT
               On encoding, unless explicitly specified, designate 94-character charsets to G0
               and 96-character charsets to G1.

       MCODING_ISO_DESIGNATION_CTEXT_EXT
               On encoding, encode charsets that do not conform to ISO 2022 with ESC % / ...,
               and encode unsupported Unicode characters with ESC % G ... ESC % @. On decoding,
               handle those escape sequences.

       MCODING_ISO_LOCKING_SHIFT
              Use locking shift.

       MCODING_ISO_SINGLE_SHIFT
              Use single shift (SS2 (0x8E or ESC N), SS3 (0x8F or ESC O)).

       MCODING_ISO_SINGLE_SHIFT_7
              Use 7-bit single shift 2 (SS2 (0x19)).

       MCODING_ISO_EUC_TW_SHIFT
              Use EUC-TW like special shifting.

       MCODING_ISO_ISO6429
              Use ISO-6429 escape sequences to indicate direction. Not yet implemented.

       MCODING_ISO_REVISION_NUMBER
               On encoding, if a charset has a revision number, produce an escape sequence that
               specifies it.

       MCODING_ISO_FULL_SUPPORT
              Support all ISO-2022 charsets.

       MCODING_ISO_FLAG_MAX
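
       The effect of MCODING_ISO_LONG_FORM can be seen in the sequence that designates a 94x94
       charset to G0. ISO 2022 allows the 3-byte form ESC $ F only for the final bytes '@', 'A',
       and 'B' (JISX0208-1978, GB2312, and JISX0208-1983); other final bytes require ESC $ ( F.
       The helper designate_94x94_g0 is a hypothetical sketch.

```c
#include <stddef.h>

/* Build the G0 designation sequence for a 94x94 charset with final byte
   `final`. Uses the short form ESC $ F when allowed and long_form is zero,
   otherwise the 4-byte form ESC $ ( F. Returns the sequence length. */
size_t designate_94x94_g0(unsigned char final, int long_form,
                          unsigned char out[4])
{
    size_t n = 0;
    out[n++] = 0x1B;
    out[n++] = '$';
    if (long_form || (final != '@' && final != 'A' && final != 'B'))
        out[n++] = '(';
    out[n++] = final;
    return n;
}
```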

Variable Documentation

   MSymbol Mcoding_us_ascii
       Symbol for the coding system US-ASCII. The symbol Mcoding_us_ascii has name 'us-ascii' and
       represents a coding system for the CES US-ASCII.

   MSymbol Mcoding_iso_8859_1
       Symbol for the coding system ISO-8859-1. The symbol Mcoding_iso_8859_1 has name
       'iso-8859-1' and represents a coding system for the CES ISO-8859-1.

   MSymbol Mcoding_utf_8
       Symbol for the coding system UTF-8. The symbol Mcoding_utf_8 has name 'utf-8' and
       represents a coding system for the CES UTF-8.

   MSymbol Mcoding_utf_8_full
       Symbol for the coding system UTF-8-FULL. The symbol Mcoding_utf_8_full has name
       'utf-8-full' and represents a coding system that is an extension of UTF-8. This coding
       system uses the same encoding algorithm as UTF-8 but is not limited to Unicode
       characters. It can encode all characters supported by the m17n library.
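
       A sketch of the original multi-byte UTF-8 scheme (as in RFC 2279), which continues past
       4-byte sequences to 5- and 6-byte forms. That utf-8-full follows exactly this scheme for
       codes above U+10FFFF is an assumption, and utf8_full_encode is a hypothetical helper.

```c
#include <stddef.h>

/* Encode c with the original UTF-8 scheme (up to 6 bytes).
   Returns the byte count, or 0 if c is negative or above 0x7FFFFFFF. */
size_t utf8_full_encode(long c, unsigned char out[6])
{
    static const unsigned char lead[] = {0, 0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    size_t len;
    if (c < 0 || c > 0x7FFFFFFFL)
        return 0;
    if (c < 0x80) {
        out[0] = (unsigned char) c;
        return 1;
    }
    len = c < 0x800 ? 2 : c < 0x10000 ? 3 : c < 0x200000 ? 4
        : c < 0x4000000 ? 5 : 6;
    /* Fill continuation bytes from the end, 6 bits at a time. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char) (0x80 | (c & 0x3F));
        c >>= 6;
    }
    out[0] = (unsigned char) (lead[len] | c);   /* remaining high bits */
    return len;
}
```

       For example, U+3042 encodes to E3 81 82, and the code 0x200000 (outside Unicode) to the
       5-byte sequence F8 88 80 80 80.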

   MSymbol Mcoding_utf_16
       Symbol for the coding system UTF-16. The symbol Mcoding_utf_16 has name 'utf-16' and
       represents a coding system for the CES UTF-16 (RFC 2781).

   MSymbol Mcoding_utf_16be
       Symbol for the coding system UTF-16BE. The symbol Mcoding_utf_16be has name 'utf-16be' and
       represents a coding system for the CES UTF-16BE (RFC 2781).

   MSymbol Mcoding_utf_16le
       Symbol for the coding system UTF-16LE. The symbol Mcoding_utf_16le has name 'utf-16le' and
       represents a coding system for the CES UTF-16LE (RFC 2781).
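
       The byte-order variants differ only in how each 16-bit code unit is serialized. This
       standalone sketch (utf16_encode is a hypothetical helper) encodes one character as
       UTF-16BE or UTF-16LE without a BOM, splitting characters above U+FFFF into a surrogate
       pair.

```c
#include <stddef.h>

/* Encode one Unicode scalar value as UTF-16 in the given byte order
   (big_endian nonzero for UTF-16BE, zero for UTF-16LE), without a BOM.
   Returns the number of bytes written (2 or 4), or 0 for an invalid value. */
size_t utf16_encode(long c, int big_endian, unsigned char out[4])
{
    unsigned units[2];
    size_t nunits;
    if (c < 0 || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    if (c < 0x10000) {
        units[0] = (unsigned) c;
        nunits = 1;
    } else {                        /* split into a surrogate pair */
        c -= 0x10000;
        units[0] = 0xD800 | (unsigned) (c >> 10);
        units[1] = 0xDC00 | (unsigned) (c & 0x3FF);
        nunits = 2;
    }
    for (size_t i = 0; i < nunits; i++) {
        unsigned char hi = (unsigned char) (units[i] >> 8);
        unsigned char lo = (unsigned char) (units[i] & 0xFF);
        out[2 * i]     = big_endian ? hi : lo;
        out[2 * i + 1] = big_endian ? lo : hi;
    }
    return 2 * nunits;
}
```

       For example, U+1F600 becomes D8 3D DE 00 in UTF-16BE and 3D D8 00 DE in UTF-16LE.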

   MSymbol Mcoding_utf_32
       Symbol for the coding system UTF-32. The symbol Mcoding_utf_32 has name 'utf-32' and
       represents a coding system for the CES UTF-32.

   MSymbol Mcoding_utf_32be
       Symbol for the coding system UTF-32BE. The symbol Mcoding_utf_32be has name 'utf-32be' and
       represents a coding system for the CES UTF-32BE.

   MSymbol Mcoding_utf_32le
       Symbol for the coding system UTF-32LE. The symbol Mcoding_utf_32le has name 'utf-32le' and
       represents a coding system for the CES UTF-32LE.

   MSymbol Mcoding_sjis
       Symbol for the coding system SJIS. The symbol Mcoding_sjis has name 'sjis' and represents
       a coding system for the CES Shift-JIS.
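
       For the JIS X 0208 part of Shift-JIS, the two-byte code is derived arithmetically from
       the JIS row and cell bytes. The sketch below (jis_to_sjis is a hypothetical helper) shows
       the usual mapping; single-byte JIS X 0201 characters are not covered.

```c
/* Convert a JIS X 0208 code (bytes j1, j2, each 0x21..0x7E) to its
   two-byte Shift-JIS form. Returns 0 on success, -1 for invalid input. */
int jis_to_sjis(int j1, int j2, int *s1, int *s2)
{
    if (j1 < 0x21 || j1 > 0x7E || j2 < 0x21 || j2 > 0x7E)
        return -1;
    /* Two consecutive JIS rows share one Shift-JIS lead byte. */
    *s1 = ((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0);
    if (j1 & 1)   /* odd row: trail byte 0x40..0x9E, skipping 0x7F */
        *s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);
    else          /* even row: trail byte 0x9F..0xFC */
        *s2 = j2 + 0x7E;
    return 0;
}
```

       For example, JIS 0x2422 (HIRAGANA LETTER A) maps to Shift-JIS 0x82A0.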

   MSymbol Mtype
       Parameter key for mconv_define_coding() (which see).

   MSymbol Mcharsets
   MSymbol Mflags
   MSymbol Mdesignation
   MSymbol Minvocation
   MSymbol Mcode_unit
   MSymbol Mbom
   MSymbol Mlittle_endian
   MSymbol Mutf
       Symbol that can be a value of the Mtype parameter of a coding system used in an argument
       to the mconv_define_coding() function (which see).

   MSymbol Miso_2022
   MSymbol Mreset_at_eol
   MSymbol Mreset_at_cntl
   MSymbol Meight_bit
   MSymbol Mlong_form
   MSymbol Mdesignation_g0
   MSymbol Mdesignation_g1
   MSymbol Mdesignation_ctext
   MSymbol Mdesignation_ctext_ext
   MSymbol Mlocking_shift
   MSymbol Msingle_shift
   MSymbol Msingle_shift_7
   MSymbol Meuc_tw_shift
   MSymbol Miso_6429
   MSymbol Mrevision_number
   MSymbol Mfull_support
   MSymbol Mmaybe
       Symbol whose name is 'maybe'. The variable Mmaybe is a symbol named 'maybe'. It is used
       as a value of the Mbom parameter of the function mconv_define_coding() (which see).

   MSymbol Mcoding
       The symbol Mcoding. Any decoded M-text has a text property whose key is the predefined
       symbol Mcoding. The name of Mcoding is 'coding'.

Author

       Generated automatically by Doxygen for The m17n Library from the source code.

COPYRIGHT

       Copyright (C) 2001 Information-technology Promotion Agency (IPA)
       Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and Technology
       (AIST)
       Permission is granted to copy, distribute and/or modify this document under the terms of
       the GNU Free Documentation License <http://www.gnu.org/licenses/fdl.html>.