Ubuntu Manpage: utf8trans - Transliterate UTF-8 characters according to a table

NAME

       utf8trans - Transliterate UTF-8 characters according to a table

SYNOPSIS

       utf8trans charmap [file]...

DESCRIPTION

       utf8trans transliterates characters in the specified files (or standard input, if no files
       are specified) and writes the output to standard output. All input and output is in the
       UTF-8 encoding.

       This program is usually used to render characters in Unicode text files as markup escapes
       or ASCII transliterations. (It is not intended for general charset conversions.) It
       provides functionality similar to the character maps in XSLT 2.0 (Extensible Stylesheet
       Language Transformations, version 2.0).
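
       As a rough illustration of what such a transliteration amounts to, the Python sketch
       below applies a small table to a string. The table contents and the function name
       transliterate are hypothetical, chosen only for this example; this is not how utf8trans
       itself is implemented.

```python
# Hypothetical table for illustration; not a charmap shipped with utf8trans.
TABLE = {
    0x00E9: "e",        # LATIN SMALL LETTER E WITH ACUTE -> ASCII transliteration
    0x2014: "--",       # EM DASH -> two hyphens
    0x00A9: "&copy;",   # COPYRIGHT SIGN -> markup escape
}


def transliterate(text, table):
    """Replace each code point found in `table` with its translation string."""
    # str.translate accepts a dict keyed by code point (int); code points
    # that are absent from the dict pass through unchanged.
    return text.translate(table)


if __name__ == "__main__":
    print(transliterate("caf\u00e9 \u2014 \u00a9 2005", TABLE))
```

       Characters without an entry in the table are left as they are, which matches the
       role of a character map: only the listed code points are rewritten.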

OPTIONS

       -m, --modify
              Modifies the given files in-place with  their  transliterated  output,  instead  of
              sending it to standard output.

              This option is useful for efficient transliteration of many files at once.

       --help Show brief usage information and exit.

       --version
              Show version and exit.

USAGE

       The translation is done according to the rules in the ‘character map’ stored in the file
       charmap. It has the following format:

       1.  Each line represents a translation entry, except for blank lines  and  comment  lines,
           which are ignored.

       2.  Any amount of whitespace (space or tab) may precede the start of an entry.

       3.  Comment lines begin with #.  Everything else on the line is ignored.

       4.  Each entry consists of the Unicode codepoint of the character to translate, in
           hexadecimal, followed by one space or tab, followed by the translation string, up to
           the end of the line.

       5.  The translation string is taken literally, including any leading and trailing spaces
           (except the delimiter between the codepoint and the translation string), and all types
           of characters. The newline at the end is not included.
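
       The rules above can be sketched as a small parser. The following Python code is only
       an illustration under stated assumptions, not the parser used by utf8trans: the name
       parse_charmap is hypothetical, a line containing only spaces or tabs is treated as
       blank, comment lines may be indented, and an entry with no delimiter is taken to have
       an empty translation string.

```python
def parse_charmap(text):
    """Parse character-map text into a dict of {codepoint: translation}."""
    table = {}
    # Split on "\n" only, so that other characters in a translation string
    # are kept literally (rule 5).
    for line in text.split("\n"):
        entry = line.lstrip(" \t")              # rule 2: leading whitespace
        if entry == "" or entry.startswith("#"):
            continue                            # rules 1 and 3: blank or comment
        # Rule 4: the hexadecimal codepoint runs up to the first space or tab.
        i = 0
        while i < len(entry) and entry[i] not in " \t":
            i += 1
        codepoint = int(entry[:i], 16)          # raises ValueError if not hex
        # Rule 5: skip exactly one delimiter character and keep the rest
        # verbatim, including any further leading or trailing spaces.
        table[codepoint] = entry[i + 1:]
    return table


if __name__ == "__main__":
    sample = "# sample map\n\n  e9 e\n2014\t--\na0  \n"
    print(parse_charmap(sample))
```

       Note that in the sample, the entry for a0 has two spaces after the codepoint: the
       first is the delimiter and the second becomes the translation string.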

       The above format is intended to be restrictive, to keep utf8trans simple. If an XML-based
       format is desired, the docbook2X distribution includes an xmlcharmap2utf8trans script
       that converts character maps in XSLT 2.0 format to the utf8trans format.

LIMITATIONS

       • utf8trans does not work with binary files, because  malformed  UTF-8  sequences  in  the
         input  are substituted with U+FFFD characters. However, null characters in the input are
         handled correctly. This limitation may be removed in the future.

       • There is no way to include a newline or null in the substitution string.
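
       The first limitation can be illustrated in Python, whose UTF-8 decoder has a similar
       replacement mode. This is a sketch of the general effect only; the function name
       decode_lossy is hypothetical, and the exact number of U+FFFD characters utf8trans
       emits for a given malformed sequence may differ from Python's behavior.

```python
def decode_lossy(data):
    """Decode bytes as UTF-8, replacing malformed sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    original = b"ok\xff\x00end"        # 0xFF can never appear in valid UTF-8
    text = decode_lossy(original)
    # The null byte survives, but the 0xFF byte has become U+FFFD, so
    # re-encoding does not reproduce the original bytes.
    print(repr(text))
    print(text.encode("utf-8") == original)
```

       Because the substitution is not reversible, arbitrary binary data passed through such
       a decoder cannot be recovered afterwards.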

AUTHOR

       Steve Cheng <stevecheng@users.sourceforge.net>.