mule-ucs.texi
- Provided by: xemacs21-mulesupport (Version: 2009.02.17.dfsg.2-5)
- Source: xemacs21-packages
@c %**start of header
@setfilename mule-ucs.info
@settitle Mule-UCS Manual
@setchapternewpage odd
@c %**end of header
@c This is *so* much nicer :)
@footnotestyle end
@c Version values, for easy modification
@set VERSION $Revision: 1.6 $
@set UPDATED 25 January 2002
@c Entries for @command{install-info} to use
@direntry
* Mule-UCS: (mule-ucs).         Lisp-based Unicode support for Emacsen.
@end direntry
@c Copying permissions, et al
@ifinfo
This file documents the XEmacs package distribution of Mule-UCS, a package providing efficient Lisp-based coding support (specifically, Unicode) for Emacs and XEmacs.
Copyright @copyright{} 1997 MIYASHITA Hisashi@*
Copyright @copyright{} 2001, 2002 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
@ignore
Permission is granted to process this file through TeX and print the results, provided the printed document carries a copying permission notice identical to this one except for the removal of this paragraph (this paragraph not being relevant to the printed manual).
@end ignore
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled ``Copying'' and ``GNU General Public License'' are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation.
@end ifinfo
@tex
@titlepage
@title Mule-UCS User Manual
@subtitle Last updated @value{UPDATED}
@author by Stephen J. Turnbull
@author including documentation by MIYASHITA Hisashi
@page
@vskip 0pt plus 1filll
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled ``Copying'' and ``GNU General Public License'' are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation.
@end titlepage
@page
@end tex
@ifnottex
@node Top, Copying, (dir), (dir)
@top Mule-UCS User Manual
Mule-UCS is a character code translator. It provides functions to translate from any character set to any other, and to construct new coding systems easily. It requires the MUltiLingual extensions to Emacs (MULE), including extended CCL facilities. These facilities are provided by XEmacs (versions 21.2.36 and later), GNU Emacs (versions 20.3 and later), Emacs patched to use Mule 3.0, and Meadow.
Mule-UCS was designed and implemented by Miyashita Hisashi (HIMI), @email{himi@@bird.scphys.kyoto-u.ac.jp}.
This is version @value{VERSION} of the Mule-UCS manual, last updated on @value{UPDATED}. It documents the XEmacs package distribution of Mule-UCS. It should be applicable to other versions of Mule-UCS with slight changes. Please report errors and variations among platforms to @email{stephen@@xemacs.org,Stephen Turnbull}, for incorporation in future versions of this manual.
@c You can find the latest version of this document on the web at
@c @uref{http://www.xemacs.org/}.
IMPORTANT NOTE: Mule-UCS translates from Unicode to XEmacs' internal Mule encoding, and vice versa. This internal encoding does not have a mapping for every Unicode code point, so if you use any code point that is even remotely obscure, there is a good chance it will be mangled in translation, and you will lose data. Examples of such code points are U+263A WHITE SMILING FACE and U+201A SINGLE LOW-9 QUOTATION MARK, the latter commonly used in Central Europe.
@ifhtml
@c This manual is also available as @uref{mule-ucs_ja.html, a Japanese
@c translation}.
The latest release of Mule-UCS is available for @uref{ftp://ftp.xemacs.org/pub/xemacs/packages/, download}; see @ref{Obtaining Mule-UCS} for more details, including how to access the CVS repository.
@end ifhtml
Mule-UCS is discussed on the mailing lists for Mule at @samp{m17n.org}.
@end ifnottex
@c Yeah the menu is incomplete. Go right ahead and fix it!!
@menu
* Copying::                     Mule-UCS Copying conditions.
* Overview::                    What Mule-UCS can and cannot do.
For the end user:
* Obtaining Mule-UCS::          How to obtain Mule-UCS.
* History::                     History of Mule-UCS.
* Installation::                Installing Mule-UCS with your (X)Emacs.
* Configuration::               Configuring Mule-UCS for use.
* Design of Mule-UCS::          How it works.
@c * Usage::                    An overview of the operation of Mule-UCS.
@c * Bug Reports::              Reporting Bugs and Problems
@c * Frequently Asked Questions:: Questions and answers from the mailing list.
For the developer:
@c @detailmenu
@c --- The Detailed Node Listing ---
@c
@c Configuring Mule-UCS for use
@c
@c Using Mule-UCS
@c @end detailmenu
@end menu
@node Copying, Overview, Top, Top
@chapter Mule-UCS Copying conditions
Copyright (C) 1998, 1999, 2000 Free Software Foundation, Inc.
This file is part of the XEmacs distribution of Mule-UCS.
Mule-UCS is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
Mule-UCS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with XEmacs; see the file COPYING. If not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
@node Overview, Obtaining Mule-UCS, Copying, Top
@chapter An overview of Mule-UCS
After the installation of Mule-UCS into your Emacs, you will be able to access Unicode files transparently. All that is needed is to load the @file{un-define} library. Mule-UCS implements rather low-level functions, and once loaded, the user should never notice that coding systems implemented via Mule-UCS are any different from those implemented in C or CCL.
Mule-UCS contains large tables, and takes about 4 seconds to load on a 450MHz Pentium III notebook. Thus, if you use Unicode at all regularly, it is recommended that you load the Mule-UCS Unicode coding systems by including
@example
(require 'un-define)
@end example
@noindent
in your init file. Otherwise, you must load @file{un-define} by hand, using @code{load-library}. Also, by default XEmacs does not autodetect Unicode. For the most common case, UTF-8, include
@example
(set-coding-priority-list '(utf-8))
(set-coding-category-system 'utf-8 'utf-8)
@end example
@noindent
in your init file. UTF-8 has a very characteristic signature, so false negatives and false positives should be very rare.
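The reason false positives are rare can be seen from the structure of UTF-8: every non-ASCII character is a lead byte (@samp{110xxxxx}, @samp{1110xxxx} or @samp{11110xxx}) followed by exactly one, two or three continuation bytes of the form @samp{10xxxxxx}. The function below is a hypothetical illustration, not part of Mule-UCS or XEmacs. It takes a list of byte values and checks only this lead/continuation structure; it does not reject overlong forms or surrogates.

@example
;; Hypothetical illustration only; not a Mule-UCS or XEmacs function.
(defun my-utf-8-well-formed-p (bytes)
  "Return t if BYTES, a list of integers 0-255, has UTF-8 byte structure.
Only lead/continuation patterns are checked, not overlong forms."
  (let ((ok t)
        (need 0))                       ; continuation bytes still expected
    (while (and ok bytes)
      (let ((b (car bytes)))
        (cond ((> need 0)
               ;; A continuation byte must have the form 10xxxxxx.
               (if (= (logand b #xC0) #x80)
                   (setq need (1- need))
                 (setq ok nil)))
              ((< b #x80))                              ; 0xxxxxxx: ASCII
              ((= (logand b #xE0) #xC0) (setq need 1))  ; 110xxxxx
              ((= (logand b #xF0) #xE0) (setq need 2))  ; 1110xxxx
              ((= (logand b #xF8) #xF0) (setq need 3))  ; 11110xxx
              (t (setq ok nil))))       ; stray continuation or invalid byte
      (setq bytes (cdr bytes)))
    ;; Fail if the input ended in the middle of a multi-byte sequence.
    (and ok (= need 0))))
@end example

For example, @code{(my-utf-8-well-formed-p '(#xC3 #xA9))}, the UTF-8 encoding of @'e, returns @code{t}, whereas the Latin-1 bytes for ``@'et@'e'', @code{'(#xE9 #x74 #xE9)}, return @code{nil}: @code{#xE9} looks like the lead byte of a three-byte sequence, but @code{#x74} is not a continuation byte.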
Autodetecting 16-bit wide-char versions of Unicode is not currently implemented in XEmacs itself. Mule-UCS provides some utilities in the @file{un-tools} library, but these are of unknown reliability.
Since Mule-UCS uses regular Mule code internally, and does not create an internal Mule charset for UCS, your normal input methods, whether native (Wnn), Lisp + backend (new Tamago), all in Lisp (Quail), or XIM-based (kinput2), should work with Unicode files without any change in your setup or habits. Input methods supported by terminals (cxterm, localized keyboards) should also work (if they work on the native Chinese!) as long as the terminal coding system is set properly by @code{set-terminal-coding-system}.
Mule-UCS was written by a Japanese developer and thus gives priority to Japanese by default. This means that Unicode characters unified from various Asian character sets (e.g., the single horizontal stroke meaning ``one'', which is present in all of them) will be presented in the Mule buffer as Japanese characters, and displayed with a Japanese font. @emph{No information will be lost or corrupted} as long as you @emph{save back to Unicode}. (That is what ``unification'' means.)
However, if you wish to use Mule-UCS to translate Unicode to national subsets other than ASCII, Latin-1, and Japanese, you must change the priorities. Doing so also lets you satisfy cultural preferences for glyph styles by defaulting to an appropriate font. Use @code{un-define-change-charset-order}. For the common case of the Latin character sets, where, by international standard as well as common practice, characters shared by more than one character set are considered identical (not ``unified'' as the Han characters are in Unicode), the @file{latin-unity} package will probably be of use.
@c #### need examples of un-define-change-charset-order usage
(Mule-UCS does not understand Plane 14 language tags. Therefore, translating multilingual texts into non-Unicode encodings such as ISO 2022 will have to be done by hand.)
That is all that most users of Mule-UCS need to know---but make sure you've read the warning at the start of this document about losing data!
Mule-UCS is still under development, and any problems you encounter, trivial or major, should be reported to the Mule-UCS developers. Use the standard package bug address @email{mule-ucs-bugs@@xemacs.org}.
@c #### @xref{Bug Reports}.
@subsubheading Behind the scenes
This section tries to explain what goes on behind the scenes when you visit a file encoded in Unicode with Mule-UCS.
#### to be written
@c For the end user
@node Obtaining Mule-UCS, History, Overview, Top
@chapter Obtaining Mule-UCS
Mule-UCS is freely available on the Internet and the latest release may be downloaded from @uref{ftp://ftp.m17n.org/pub/mule/Mule-UCS/}. This release includes the full documentation and code for Mule-UCS, suitable for installation. The current version is 0.84 @samp{KOUGETSUDAI}, and is in the file @file{Mule-UCS-0.84.tar.gz}.
For the especially brave, Mule-UCS is available from CVS. The CVS version is the latest version of the code and may contain incomplete features or new issues. Use these versions at your own risk.
Follow the example session below:
@example
$ @kbd{cvs -d:pserver:anonymous@@cvs.meadowy.org:/cvsroot login}
(Logging in to anonymous@@cvs.meadowy.org)
CVS password: @key{RET}
@dots{}
$ @kbd{cvs -z3 -d:pserver:anonymous@@cvs.meadowy.org:/cvsroot co mule-ucs}
@end example
You should now have a directory @file{mule-ucs} containing the latest version of Mule-UCS. You can fetch the latest updates from the repository by issuing the command:
@example
$ @kbd{cd mule-ucs}
$ @kbd{cvs update -d}
@end example
@c #### Document XEmacs packages here.
Mule-UCS is also available as an XEmacs package. @xref{Packages,,,xemacs}.
@node History, Installation, Obtaining Mule-UCS, Top
@chapter History of Mule-UCS
Development started in late 1997. The earliest public releases on the net appeared in about July 1999.
@node Installation, Configuration, History, Top
@chapter Installing Mule-UCS into Emacs or XEmacs
Since Mule-UCS is purely an Emacs Lisp library, you need only byte-compile the @file{*.el} files and install them in a directory listed in @code{load-path}.
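To byte-compile, you can use @file{mucs-comp.el} in the top-level directory of the distribution. Enter the following command line, using @command{xemacs} instead of @command{emacs} if you use XEmacs:
@example
emacs -q --no-site-file -batch -l mucs-comp.el
@end example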
If you use Meadow, enter the following, using @command{MeadowNT} instead of @command{Meadow95} if that is the name of your Meadow executable:
@example
Meadow95 -q --no-site-file -batch -l mucs-comp.el
@end example
This produces byte-compiled Emacs Lisp files. Finally, install the files in the @file{lisp} directory into your @file{site-lisp} directory.
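If the directory you installed the files into is not already on @code{load-path}, you can add it in your init file before loading @file{un-define}. A minimal sketch, assuming a hypothetical install location of @file{~/site-lisp/mule-ucs}:

@example
;; Hypothetical location: replace with the directory holding the .elc files.
;; This must run before the require below, or un-define will not be found.
(add-to-list 'load-path (expand-file-name "~/site-lisp/mule-ucs"))
(require 'un-define)
@end example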
@c #### document build and install of big5conv and JIS X 0213 support.
@c #### document creation and formatting of Info docs.
@node Configuration, Design of Mule-UCS, Installation, Top
@chapter Configuring Mule-UCS for use
If you use Unicode at all regularly, it is recommended that you load the Mule-UCS Unicode coding systems by including
@example
(require 'un-define)
@end example
@noindent
in your init file. Otherwise, you must load @file{un-define} by hand, using @code{load-library}. Also, by default XEmacs does not autodetect Unicode. For the most common case, UTF-8, include
@example
(set-coding-priority-list '(utf-8))
(set-coding-category-system 'utf-8 'utf-8)
@end example
@noindent
in your init file. UTF-8 has a very characteristic signature, so false negatives and false positives should be very rare.
Autodetecting 16-bit wide-char versions of Unicode is not currently implemented in XEmacs itself. Mule-UCS provides some utilities in the @file{un-tools} library, but these are of unknown reliability.
That is all that most users of Mule-UCS need to know---but make sure you've read the warning at the start of this document about losing data!
@c The rest of this section documents various advanced features which allow
@c Mule-UCS to be tuned to resolve ambiguities (such as the unification of
@c the Han characters across several languages) more appropriately.
@c #### FIXME!
@c Well, it will once it's written. @code{:-P}
@node Design of Mule-UCS, , Configuration, Top
@chapter Design goals
MULE-UCS is a character code translation system. I set the following goals for this system.
@table @emph
@item Map character code points.
MULE-UCS has to map character code points quickly, and provide a flexible way to change the mapping policy.
@item Utilize character code tables.
MULE-UCS can handle multiple code point tables, and use them to reorganize many character sets.
@item Generate coding systems.
MULE-UCS can generate coding systems from your own translation rules, including a CCL program to convert font code points.
@end table
MULE-UCS has the following supplementary features.
@itemize @bullet
@item
A very biased (@code{:-P}) translator between MULE-INTERNAL and ISO-10646, and an ISO-10646 coding system.
@item
Converters from text representations of code tables to Emacs Lisp representations that MULE-UCS understands.
@end itemize
@subsubheading MULE-UCS overview
MULE-UCS consists mainly of the following modules:
@enumerate
@item
Association compiler.
@item
Table organizer.
@item
CCL generator.
@end enumerate
@table @emph
@item Association compiler.
In MULE-UCS, code point mapping rules are described by association lists (alists). The association compiler generates a set of tables for encoding and decoding from an association list, and also optimizes those tables.
@item Table organizer.
Table Organizer can
@end table
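As a rough illustration of what the association compiler does, here is a minimal sketch, assuming a mapping given as an alist of @code{(@var{mule-char} . @var{code-point})} pairs. The name @code{my-compile-alist}, the use of hash tables, and the rule that the first pair listed for a code point wins when decoding are all assumptions made for this example; the real association compiler builds its own table format and optimizes it, which is not attempted here.

@example
;; Hypothetical sketch; not the actual MULE-UCS association compiler.
(defun my-compile-alist (alist)
  "Compile ALIST of (MULE-CHAR . CODE-POINT) pairs into two hash tables.
Return (ENCODE . DECODE): ENCODE maps Mule characters to code points,
DECODE maps code points back to Mule characters."
  (let ((encode (make-hash-table :test 'eql))
        (decode (make-hash-table :test 'eql)))
    (dolist (pair alist)
      ;; Encoding: every Mule character gets its own entry.
      (puthash (car pair) (cdr pair) encode)
      ;; Decoding: keep the first character listed for each code point,
      ;; so the order of ALIST acts as a charset priority.
      (unless (gethash (cdr pair) decode)
        (puthash (cdr pair) (car pair) decode)))
    (cons encode decode)))
@end example

For instance, with @code{'((1 . #x4E00) (2 . #x4E00) (3 . #x41))}, the encoding table maps both 1 and 2 to @code{#x4E00}, while the decoding table maps @code{#x4E00} back to 1 only, so the order of the alist decides which character is chosen when several unify to one code point.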