Ubuntu Manpage: Encode::Arabic::ArabTeX - Interpreter of the ArabTeX notation of Arabic

trusty (3) Encode::Arabic::ArabTeX.3pm.gz

Provided by: libencode-arabic-perl_1.9-1_all

NAME

       Encode::Arabic::ArabTeX - Interpreter of the ArabTeX notation of Arabic

REVISION

           $Revision: 717 $             $Date: 2008-10-03 00:28:12 +0200 (Fri, 03 Oct 2008) $

SYNOPSIS

           use Encode::Arabic::ArabTeX;        # imports just like 'use Encode' would, plus extended options

           while ($line = <>) {                # maps the ArabTeX notation for Arabic into the Arabic script

               print encode 'utf8', decode 'arabtex', $line;       # 'ArabTeX' alias 'Lagally' alias 'TeX'
           }

           # ArabTeX lower ASCII transliteration <--> Arabic script in Perl's internal format

           $string = decode 'ArabTeX', $octets;
           $octets = encode 'ArabTeX', $string;

           Encode::Arabic::ArabTeX->encoder('dump' => '!./encoder.code');  # dump the encoder engine to file
           Encode::Arabic::ArabTeX->decoder('load');   # load the decoder engine from module's extra sources

DESCRIPTION

       ArabTeX is an excellent extension to TeX/LaTeX designed for typesetting the right-to-left scripts of the
       Orient. It provides intuitive and comprehensible lower ASCII transliterations whose expressive power
       exceeds that of the scripts themselves.

       Encode::Arabic::ArabTeX implements the rules needed for proper interpretation of the ArabTeX notation of
       Arabic. The conversion itself is done by Encode::Mapper, and the user interface is built on the
       Encode::Encoding module.

   ENCODING BUSINESS
       Since the ArabTeX notation is not a simple mapping to the graphemes of the Arabic script, encoding the
       script into the notation is ambiguous. Two different strings in the notation may correspond to identical
       strings in the script. Heuristics must be engaged to decide which of the representations is more
       appropriate.

       Beyond this bottleneck, encoding may not be perfectly invertible by the decode operation, due to
       over-generation or approximations in the encoding algorithm.
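
       The ambiguity can be seen on a two-entry toy table. The sketch below is not the module's encoder: it
       only copies the two rules for "T" and "H" from the letter table in this manual, which both map to
       U+0629, and inverts them. The names %decode_map, %encode_candidates, %rank and pick_symbol are
       hypothetical, and the fixed preference order merely stands in for whatever heuristics a real encoder
       would apply.

```perl
use strict;
use warnings;

# Two notation symbols that the letter table of this manual maps to one code point.
my %decode_map = (
    'T' => "\x{0629}",    # ta' marbu.ta
    'H' => "\x{0629}",    # ta' marbu.ta silent
);

# Invert the table: collect every notation symbol that yields a given grapheme.
my %encode_candidates;
for my $symbol (sort keys %decode_map) {
    push @{ $encode_candidates{ $decode_map{$symbol} } }, $symbol;
}

# Hypothetical heuristic: a fixed preference order among the candidates.
my %rank = ( 'T' => 0, 'H' => 1 );

sub pick_symbol {
    my ($grapheme) = @_;
    my @sorted = sort { $rank{$a} <=> $rank{$b} } @{ $encode_candidates{$grapheme} || [] };
    return $sorted[0];
}

my @candidates = @{ $encode_candidates{"\x{0629}"} };
my $round_trip = pick_symbol( $decode_map{'H'} );    # decode 'H', then encode again

print "candidates for U+0629: @candidates\n";
print "decode then encode of H gives: $round_trip\n";
```

       Whatever order is chosen, one of the two inputs cannot survive the round trip, which is the
       non-invertibility mentioned above.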

       There are situations where conversion from the Arabic script to the ArabTeX notation is still convenient
       and useful. Imagine you need to edit the data, enhance it with vowels or other diacritical marks, produce
       phonetic transcripts, or trim the typography of the script ... Do it in the ArabTeX notation, with
       unrivalled control over what you do!

       Nonetheless, encoding is not the very purpose for this module's existence ;)

   DECODING BUSINESS
       The module decodes the ArabTeX notation as defined in the User Manual Version 4.00 of March 11, 2004,
       <ftp://ftp.informatik.uni-stuttgart.de/pub/arabtex/doc/arabdoc.pdf>. The implementation uses three levels
       of Encode::Mapper engines to solve the problem:

       Hamza writing
           Hamza carriers are determined from the context in accordance with the Arabic orthographical
           conventions.  The first level of mapping expands every "<'>" into the verbatim encoding of the
           relevant carrier.  This level of processing can become optional, if people ever need to encode the
           hamza carriers explicitly.

           Interpretation of geminated hamza "<''>" is correct here, as opposed to ArabTeX itself. In order to
           deduce the proper spelling rules, we resorted to <http://www.arabic-morphology.com/> and experimented
           with words like "<ra''asa>", "<ru''isa>", "<tara''usuN>", etc.

           On this level, word-internal occurrences of "<T>" get translated into "<t>", which is an extension to
           the notation that simplifies some requirements in modeling of the Arabic morphology.

       Grapheme generation
           The core level includes most of the rules needed, and converts the ArabTeX notation to Arabic
           graphemes in Unicode. The engine recognizes all the consonants of Modern Standard Arabic, plus the
           following letters:

                               [ "|",           ""         ],              # invisible consonant
                               [ "B",           "\x{0640}" ],              # consonantal ta.twil

                               [ "T",           "\x{0629}" ],              # ta' marbu.ta
                               [ "H",           "\x{0629}" ],              # ta' marbu.ta silent

                               [ "p",           "\x{067E}" ],              # pa'
                               [ "v",           "\x{06A4}" ],              # va'
                               [ "g",           "\x{06AF}" ],              # gaf

                               [ "c",           "\x{0681}" ],              # .ha with hamza
                               [ "^c",          "\x{0686}" ],              # gim with three dots
                               [ ",c",          "\x{0685}" ],              # _ha with three dots
                               [ "^z",          "\x{0698}" ],              # zay with three dots
                               [ "^n",          "\x{06AD}" ],              # kaf with three dots
                               [ "^l",          "\x{06B5}" ],              # lam with bow above
                               [ ".r",          "\x{0695}" ],              # ra' with bow below

           There are many nice features in the notation, like assimilation, gemination, hyphenation, all
           implemented here.  Defective and historical writings of vowels are supported, too! Try for yourself
           whether your fonts can handle these ;)
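
           The letter table above can be read as a set of rewrite rules in which longer symbols must win
           over shorter ones, so that "^c" is not taken for "^" followed by "c". The sketch below
           illustrates only that idea with a plain regular expression. It is not how Encode::Mapper works
           internally, it covers only the extra letters listed above, and the names @rules, %grapheme_for
           and map_letters are hypothetical.

```perl
use strict;
use warnings;

# Extra letters copied from the table above; the regular alphabet is omitted.
my @rules = (
    [ '|',  ''         ],    # invisible consonant
    [ 'B',  "\x{0640}" ],    # consonantal ta.twil
    [ 'T',  "\x{0629}" ],    # ta' marbu.ta
    [ 'H',  "\x{0629}" ],    # ta' marbu.ta silent
    [ 'p',  "\x{067E}" ],    # pa'
    [ 'v',  "\x{06A4}" ],    # va'
    [ 'g',  "\x{06AF}" ],    # gaf
    [ 'c',  "\x{0681}" ],    # .ha with hamza
    [ '^c', "\x{0686}" ],
    [ ',c', "\x{0685}" ],
    [ '^z', "\x{0698}" ],
    [ '^n', "\x{06AD}" ],
    [ '^l', "\x{06B5}" ],    # lam with bow above
    [ '.r', "\x{0695}" ],    # ra' with bow below
);

my %grapheme_for = map { ( $_->[0] => $_->[1] ) } @rules;

# Longest symbols first, so that "^c" is tried before "c".
my $token_re = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %grapheme_for;

# Replace every known symbol; anything else passes through unchanged.
sub map_letters {
    my ($notation) = @_;
    ( my $out = $notation ) =~ s/($token_re)/$grapheme_for{$1}/g;
    return $out;
}

for my $input ( '^c', 'c', ',c', '^cc' ) {
    my @points = map { sprintf( 'U+%04X', ord($_) ) } split //, map_letters($input);
    print "$input => @points\n";
}
```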

           Word-initial sequences like "<lV-all>", "<lV-al->", "<lV-al-CC>" and "<lV-aC-C>", where "V" stands
           for a short, possibly quoted or missing, vowel, and "C" represents a fixed consonant, are processed
           according to the requirements of the Arabic orthography. Thus, "<li-al-laylaTi>" reduces to
           "<li-llaylaTi>", "<li-al-rra^guli>" becomes "<lir-ra^guli>", and "<la-al-ma^gdu>" equals
           "<lal-ma^gdu>", while "<li-alla_dI>" turns into "<lilla_dI>".

       Wasla and ligatures
           Wasla is introduced if there is a preceding long or short vowel, and the blank space is one newline,
           one tabulator, or up to four single spaces. Optionally, diacritical marks between laam and 'alif
           are moved after the latter letter, since most current systems rendering the Arabic script do not
           produce the desired ligatures unless the two kinds of graphemes are immediately adjacent.
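
       The whitespace half of the wasla condition is easy to state as a pattern. The sketch below checks only
       the blank between two words, reading "up to four single spaces" as one to four; the test for a
       preceding vowel is left out, and the function name gap_allows_wasla is hypothetical rather than part of
       the module.

```perl
use strict;
use warnings;

# True if the blank between two words is exactly one newline, one tabulator,
# or one to four single spaces.
sub gap_allows_wasla {
    my ($gap) = @_;
    return $gap =~ /\A(?:\n|\t| {1,4})\z/ ? 1 : 0;
}

for my $gap ( " ", "    ", "     ", "\n", "\n\n", "\t" ) {
    ( my $shown = $gap ) =~ s/\n/\\n/g;    # make the gap visible in the output
    $shown =~ s/\t/\\t/g;
    printf "'%s' => %d\n", $shown, gap_allows_wasla($gap);
}
```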

       Some modes and options of ArabTeX are not yet handled by Encode::Arabic::ArabTeX. Still, mutual
       consistency of the systems is very high. This release supports vowel quoting and works in ArabTeX's
       "\vocalize" mode by default. The other conversion modes are implemented, too, as described below under
       the "enmode" and "demode" methods.

   EXPORTS, ENGINES & MODES
       The module exports as if "use Encode" also appeared in the package. Except for a leading subsequence of
       ":xml", ":simple" or ":describe", the "import" options are simply delegated to Encode, which performs
       the imports.

       If the first element in the list to "use" is ":xml", all XML markup, or rather any data enclosed in the
       well-paired and non-nested angle brackets "<" and ">", will be preserved. Properties of the
       Encode::Arabic::ArabTeX engines can be generally controlled through the Encode::Mapper API.
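
       The effect of ":xml" can be pictured as converting only the text that lies outside well-paired,
       non-nested angle brackets. The sketch below shows that behaviour in plain Perl and is not the module's
       implementation, which works through its Encode::Mapper engines. The function convert_outside_markup is
       hypothetical, and upper-casing stands in for a real conversion.

```perl
use strict;
use warnings;

# Apply $convert to text outside <...> spans; copy the spans through unchanged.
sub convert_outside_markup {
    my ( $text, $convert ) = @_;
    my $out = '';

    # A capturing group in split keeps the delimiters (the markup) in the list.
    for my $piece ( split /(<[^<>]*>)/, $text ) {
        $out .= $piece =~ /\A<[^<>]*>\z/ ? $piece : $convert->($piece);
    }
    return $out;
}

# Stand-in conversion for the demo: upper-casing instead of decoding ArabTeX.
my $result = convert_outside_markup( '<w id="1">kitAb</w> wa', sub { uc $_[0] } );
print "$result\n";    # prints: <w id="1">KITAB</w> WA
```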

       If the next, possibly the first, element in this list is ":simple", the rules in the engines are
       simplified so that quotes are mapped to empty strings and infrequent or experimental notations of vowels
       are not interpreted in the extra manner of ArabTeX. Using ":simple" is recommended for simple everyday
       tasks where these nuances have no impact and where full initialization would be a burden.

       The ":describe" option calls the Encode::Mapper's "describe" method on the module's engines right after
       their compilation.

       Initialization of the engines takes place the first time they are used, unless they have already been
       defined.  There are two explicit methods for it:

       encoder
           Initialize or redefine the encoder engine. If no parameters are given, rules in the module are
           compiled into a list of Encode::Mapper objects. Currently, the "--dump" and "--load" options have
           some experimental meaning.

       decoder
           See the description of "encoder".

       There are five conversion modes currently recognized in this module, and their aliases are mapped
       according to the module's %modemap hash. The appropriate mode is best selected through the
       "enmode" and "demode" functions of Encode::Arabic, or with a direct call of the namesake methods in
       Encode::Arabic::ArabTeX:

           our %Encode::Arabic::ArabTeX::modemap = (           # the module provides these definitions

                   'default'       => 3,                           'undef'         => 0,

                   'fullvocalize'  => 4,   'full'          => 4,

                   'vocalize'      => 3,   'nosukuun'      => 3,

                   'novocalize'    => 2,   'novowels'      => 2,   'none'          => 2,

                   'noshadda'      => 1,   'noneplus'      => 1,
               );

           # the function calls might be preferred as more comfortable

           Encode::Arabic::demode 'arabtex', 'full';           # like 'encode' and 'decode' of Encode

           Encode::Arabic::ArabTeX->demode('fullvocalize');    # like the Encode::Encoding interfaces

           # how modes can be set easily

           use Encode::Arabic ':modes';   enmode 'arabtex', 'undef';   demode 'arabtex', 'noneplus';

       enmode
           Currently in development. The mode is fixed to 'undef' internally.

       demode
           Enforces the proper version of the final, third level of the Encode::Mapper engines.
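
       How an alias ends up as one of the five mode numbers can be sketched with a plain hash lookup. The
       table below is a copy of %modemap as printed above. The function resolve_mode is hypothetical, and its
       handling of a missing argument, of bare numbers and of unknown names is an assumption made for the
       example, not documented behaviour of "demode".

```perl
use strict;
use warnings;

# Copy of the alias table printed above.
my %modemap = (
    'default'      => 3,  'undef'      => 0,
    'fullvocalize' => 4,  'full'       => 4,
    'vocalize'     => 3,  'nosukuun'   => 3,
    'novocalize'   => 2,  'novowels'   => 2,  'none' => 2,
    'noshadda'     => 1,  'noneplus'   => 1,
);

# Resolve a mode given as an alias or as a mode number.
# Assumptions: no argument means 'default'; unknown names are fatal.
sub resolve_mode {
    my ($mode) = @_;
    $mode = 'default' unless defined $mode;
    return $modemap{$mode} if exists $modemap{$mode};
    return $mode           if $mode =~ /\A[0-4]\z/;
    die "unknown mode '$mode'\n";
}

printf "%-12s => %d\n", $_, resolve_mode($_) for qw(full nosukuun none noneplus undef);
```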

AUTHOR

       Otakar Smrz, <http://ufal.mff.cuni.cz/~smrz/>

           eval { 'E<lt>' . ( join '.', qw 'otakar smrz' ) . "\x40" . ( join '.', qw 'mff cuni cz' ) . 'E<gt>' }

       Perl is also designed to make the easy jobs not that easy ;)

COPYRIGHT AND LICENSE

       Copyright 2003-2008 by Otakar Smrz

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself.
