Ubuntu Manpage: utf8trans - Transliterate UTF-8 characters according to a table

NAME

       utf8trans - Transliterate UTF-8 characters according to a table

SYNOPSIS

       utf8trans charmap [file]...

DESCRIPTION

       utf8trans transliterates characters in the specified files (or standard input, if no files are given)
       and writes the output to standard output. All input and output are in the UTF-8 encoding.

       This program is usually used to render characters in Unicode text files as markup escapes or ASCII
       transliterations. (It is not intended for general charset conversions.) It provides functionality
       similar to the character maps in XSLT 2.0 (XSL Transformations, version 2.0).

OPTIONS

       -m, --modify
              Modifies the given files in place, replacing their contents with the transliterated output
              instead of sending it to standard output.

              This option is useful for efficient transliteration of many files at once.

       --help Show brief usage information and exit.

       --version
              Show version and exit.

USAGE

       The translation is done according to the rules in the ‘character map’, which is read from the file
       named by the charmap argument. The character map has the following format:

       1.  Each  line  represents  a  translation entry, except for blank lines and comment lines, which are ig‐
           nored.

       2.  Any amount of whitespace (space or tab) may precede the start of an entry.

       3.  Comment lines begin with #.  Everything else on the same line is ignored.

       4.  Each entry consists of the Unicode codepoint of the character to translate, in hexadecimal, followed
           by one space or tab, followed by the translation string, up to the end of the line.

       5.  The translation string is taken literally, including any leading and trailing spaces (except the
           delimiter between the codepoint and the translation string), and all types of characters. The
           newline at the end is not included.
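
       The five rules above can be sketched as a small parser. This is only an illustration in Python, not
       the code utf8trans itself uses. The names parse_charmap and transliterate are hypothetical, and two
       behaviours are assumptions that the rules do not spell out: the codepoint is written as bare
       hexadecimal digits (no U+ or 0x prefix), and an entry with no translation string maps to the empty
       string.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_charmap(text):
    """Parse character-map text into a dict of character -> translation string."""
    table = {}
    # Split on "\n" only, so other characters stay part of the translation string.
    for number, line in enumerate(text.split("\n"), start=1):
        # Rule 2: any amount of space or tab may precede an entry.
        entry = line.lstrip(" \t")
        # Rules 1 and 3: blank lines and comment lines are ignored.
        if not entry or entry.startswith("#"):
            continue
        # Rule 4: a hexadecimal codepoint (assumed: bare digits, no prefix)...
        end = 0
        while end < len(entry) and entry[end] in HEX_DIGITS:
            end += 1
        # ...followed by exactly one space or tab as the delimiter.
        if end == 0 or (end < len(entry) and entry[end] not in " \t"):
            raise ValueError("line %d: malformed entry" % number)
        # Rule 5: everything after the single delimiter is taken literally.
        # Assumption: a missing translation string becomes the empty string.
        table[chr(int(entry[:end], 16))] = entry[end + 1:]
    return table


def transliterate(text, table):
    """Replace each mapped character; leave every other character unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


SAMPLE = "# sample map\n  00A0 &nbsp;\n2014\t--\n\n00E9  e\n"
```

       In SAMPLE, the entry for 00E9 has two spaces after the codepoint. Only the first is the delimiter,
       so the translation string is a space followed by e, as rule 5 describes.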

       The above format is intended to be restrictive, to keep utf8trans simple. But if an XML-based format is
       desired, the docbook2X distribution includes an xmlcharmap2utf8trans script that converts character
       maps in XSLT 2.0 format to the utf8trans format.
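
       To illustrate what such a conversion involves, here is a minimal Python sketch. It is not the
       xmlcharmap2utf8trans script, whose implementation is not described here. It assumes the input uses
       the standard XSLT 2.0 xsl:output-character element with its character and string attributes. The
       function name xslt_charmap_to_utf8trans is hypothetical, and padding the codepoint to four uppercase
       hexadecimal digits with a tab as the delimiter is a choice made for this example.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def xslt_charmap_to_utf8trans(xml_text):
    """Turn xsl:output-character elements into utf8trans character-map lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter("{%s}output-character" % XSL_NS):
        char = elem.get("character")
        string = elem.get("string", "")
        if char is None or len(char) != 1:
            raise ValueError("character attribute must be a single character")
        # The utf8trans format cannot hold a newline in the translation string.
        if "\n" in string:
            raise ValueError("newline cannot be expressed in a utf8trans map")
        # One entry per line: hexadecimal codepoint, one tab, literal string.
        lines.append("%04X\t%s" % (ord(char), string))
    return "".join(line + "\n" for line in lines)


SAMPLE_XSLT = """\
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:character-map name="sample">
    <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
    <xsl:output-character character="&#8212;" string="--"/>
  </xsl:character-map>
</xsl:stylesheet>
"""
```

       The sketch collects every xsl:output-character element in the document and does not follow
       use-character-maps references between maps.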

LIMITATIONS

       • utf8trans does not work with binary files, because malformed UTF-8 sequences in the input  are  substi‐
         tuted with U+FFFD characters. However, null characters in the input are handled correctly. This limita‐
         tion may be removed in the future.

       • There is no way to include a newline or null in the substitution string.
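
       The first limitation can be illustrated with Python's own UTF-8 decoder, used here as a stand-in
       for utf8trans. Decoding with errors="replace" also substitutes U+FFFD for malformed sequences, so
       bytes that are not valid UTF-8 do not survive a round trip, while null bytes do. The function names
       are hypothetical, and the exact number of U+FFFD characters utf8trans emits for a given malformed
       sequence may differ from Python's.

```python
def decode_with_replacement(data):
    """Decode bytes as UTF-8, substituting U+FFFD for malformed sequences."""
    return data.decode("utf-8", errors="replace")


def survives_round_trip(data):
    """Return True if decoding and re-encoding reproduces the original bytes."""
    return decode_with_replacement(data).encode("utf-8") == data


valid_text = "caf\u00e9".encode("utf-8")   # well-formed UTF-8
with_null = b"abc\x00def"                  # a null byte is valid UTF-8
binary = b"\xff\xfe\x00\x01"               # 0xFF and 0xFE never occur in UTF-8
```

       Here survives_round_trip returns True for valid_text and with_null but False for binary, which is
       why binary files are altered by this kind of processing.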

AUTHOR

       Steve Cheng <stevecheng@users.sourceforge.net>.

docbook2X 0.8.8                                   3 March 2007                                      utf8trans(1)