Provided by: libencode-arabic-perl_1.9-1_all

NAME

       Encode::Arabic::Buckwalter - Tim Buckwalter's transliteration of Arabic

REVISION

           $Revision: 179 $        $Date: 2007-01-14 01:23:25 +0100 (Sun, 14 Jan 2007) $

SYNOPSIS

           use Encode::Arabic::Buckwalter;         # imports just like 'use Encode' would, plus more

           while ($line = <>) {                    # Tim Buckwalter's mapping into the Arabic script

               print encode 'utf8', decode 'buckwalter', $line;    # 'Buckwalter' alias 'Tim'
           }

           # shell filter of data, e.g. in *n*x systems instead of viewing the Arabic script proper

           % perl -MEncode::Arabic::Buckwalter -pe '$_ = encode "buckwalter", decode "utf8", $_'

           # employing the modes of conversion for filtering and trimming

           Encode::Arabic::enmode 'buckwalter', 'nosukuun', '>&< xml';
           Encode::Arabic::Buckwalter->demode(undef, undef, 'strip _');

           $decode = "Aiqora>o h`*aA {l_n~a_S~a bi___{notibaAhK.";
           $encode = encode 'buckwalter', decode 'buckwalter', $decode;

           # $encode eq "AiqraO h`*aA Aln~aS~a biAntibaAhK."

DESCRIPTION

       Tim Buckwalter's notation is a one-to-one transliteration of the Arabic script for Modern
       Standard Arabic, using lower ASCII characters to encode the graphemes of the original
       script. The system has been very popular in Natural Language Processing; however, its
       applicability is limited by the numerous non-alphabetic codes involved.

   IMPLEMENTATION
       The module takes care of the Encode::Encoding programming interface, while the effective
       code is Tim Buckwalter's "tr"ick:

           $encode =~ tr[\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}    # !! no break in true perl !!
                         \x{0670}\x{0671}\x{067E}\x{0686}\x{0698}\x{06A4}\x{06AF}\x{0660}-\x{0669}]
                        [,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJRVG0-9];

           $decode =~ tr[,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJRVG0-9]
                        [\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}    # !! no break in true perl !!
                         \x{0670}\x{0671}\x{067E}\x{0686}\x{0698}\x{06A4}\x{06AF}\x{0660}-\x{0669}];
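
       The listing above is wrapped for display only. As a minimal sketch, the same table can be
       written without the break and wrapped in two helper subs. The names buckwalter_to_arabic
       and arabic_to_buckwalter are hypothetical and are not part of this module, and the sketch
       uses core Perl only, bypassing the Encode::Encoding interface:

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Encode::Arabic::Buckwalter.
# The character lists are copied from the table above; each list must
# stay on a single line, because a newline inside tr[...] would itself
# be treated as a character of the list.

sub buckwalter_to_arabic {
    my ($text) = @_;
    $text =~ tr[,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJRVG0-9]
               [\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}\x{0670}\x{0671}\x{067E}\x{0686}\x{0698}\x{06A4}\x{06AF}\x{0660}-\x{0669}];
    return $text;
}

sub arabic_to_buckwalter {
    my ($text) = @_;
    $text =~ tr[\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}\x{0670}\x{0671}\x{067E}\x{0686}\x{0698}\x{06A4}\x{06AF}\x{0660}-\x{0669}]
               [,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJRVG0-9];
    return $text;
}

# Usage: transliterate a word and map it back again.
binmode STDOUT, ':encoding(UTF-8)';
my $arabic = buckwalter_to_arabic('kitAb');
print $arabic, "\n";
print arabic_to_buckwalter($arabic), "\n";
```

       Both lists have the same length, so every character covered by the table maps back to
       itself after a round trip, while characters outside the table pass through unchanged.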

   EXPORTS & MODES
       If the first element in the list passed to "use" is ":xml", an alternative mapping that
       suits XML conventions is used. This option exists only to replace the reserved characters
       ">&<" with "OWI" while keeping the notation one-to-one. No XML parsing is involved, and
       any markup would be distorted if passed through "decode"!

           $using_xml = eval q { use Encode::Arabic::Buckwalter ':xml'; decode 'buckwalter', 'OWI' };
           $classical = eval q { use Encode::Arabic::Buckwalter;        decode 'buckwalter', '>&<' };

           # $classical eq $using_xml and $classical eq "\x{0623}\x{0624}\x{0625}"
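
       On the level of the ASCII notation, the difference between the two mappings amounts to a
       three-character substitution. The following sketch assumes nothing about the module's
       internals. The sub names to_xml_notation and to_classical_notation are hypothetical, and
       the code only illustrates why the notation stays one-to-one: "O", "W" and "I" are not
       used elsewhere in the classical table.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Encode::Arabic::Buckwalter.

# Classical notation -> XML-friendly notation: > & <  become  O W I
sub to_xml_notation {
    my ($text) = @_;
    $text =~ tr/>&</OWI/;
    return $text;
}

# XML-friendly notation -> classical notation: O W I  become  > & <
sub to_classical_notation {
    my ($text) = @_;
    $text =~ tr/OWI/>&</;
    return $text;
}

# Usage: the word from the SYNOPSIS in both notations.
my $classical = 'Aiqora>o';
my $xml       = to_xml_notation($classical);
print "$classical <=> $xml\n";
```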

       The module exports as if "use Encode" also appeared in the package. Any other "import"
       options are simply delegated to Encode, which performs the imports as usual.

       The conversion modes of this module make it possible to override the setting of the
       ":xml" option, in addition to filtering out diacritical marks and stripping off kashida.
       The modes and their aliases relate as follows:

           our %Encode::Arabic::Buckwalter::modemap = (

                   'default'       => 0,   'undef'         => 0,

                   'fullvocalize'  => 0,   'full'          => 0,

                   'nowasla'       => 4,

                   'vocalize'      => 3,   'nosukuun'      => 3,

                   'novocalize'    => 2,   'novowels'      => 2,   'none'          => 2,

                   'noshadda'      => 1,   'noneplus'      => 1,
               );

       enmode ($obj, $mode, $xml, $kshd)
       demode ($obj, $mode, $xml, $kshd)
           These methods can be invoked directly or through the respective functions of
           Encode::Arabic. The meaning of the $mode, $xml and $kshd parameters follows from
           the usage examples in the SYNOPSIS.
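
       The combined effect of the settings used in the SYNOPSIS can be approximated directly on
       the ASCII notation. This is only a sketch under stated assumptions, not the module's
       implementation: the sub name apply_synopsis_modes is hypothetical, and the individual
       steps (dropping kashida, dropping sukuun, writing wasla as plain alif, switching to the
       XML-friendly characters) are inferred from the expected result shown in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Encode::Arabic::Buckwalter.
# It mimics, on Buckwalter notation only, what the SYNOPSIS obtains with
#   enmode 'buckwalter', 'nosukuun', '>&< xml'   and   demode ..., 'strip _'
sub apply_synopsis_modes {
    my ($text) = @_;
    $text =~ tr/_//d;        # 'strip _': remove kashida
    $text =~ tr/o//d;        # 'nosukuun': remove sukuun (assumed from the SYNOPSIS result)
    $text =~ tr/{/A/;        # wasla written as plain alif (assumed from the SYNOPSIS result)
    $text =~ tr/>&</OWI/;    # XML-friendly characters instead of > & <
    return $text;
}

# Usage: the input string from the SYNOPSIS.
my $input  = 'Aiqora>o h`*aA {l_n~a_S~a bi___{notibaAhK.';
my $output = apply_synopsis_modes($input);
print "$output\n";
```

       For the SYNOPSIS input this yields the same string as the SYNOPSIS comment. The module
       itself applies the stripping while decoding and the filtering while encoding, which is
       not reproduced here.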

SEE ALSO

       Encode::Arabic, Encode, Encode::Encoding

       Tim Buckwalter's Qamus  <http://www.qamus.org/>

       Buckwalter Arabic Morphological Analyzer
           <http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49>

       Xerox Arabic Home Page  <http://www.arabic-morphology.com/>

AUTHOR

       Otakar Smrz, <http://ufal.mff.cuni.cz/~smrz/>

           eval { 'E<lt>' . ( join '.', qw 'otakar smrz' ) . "\x40" . ( join '.', qw 'mff cuni cz' ) . 'E<gt>' }

       Perl is also designed to make the easy jobs not that easy ;)

COPYRIGHT AND LICENSE

       Copyright 2003-2007 by Otakar Smrz

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.