Ubuntu Manpage: Text::RewriteRules - A system to rewrite text using regexp-based rules

Provided by: libtext-rewriterules-perl_0.25-1_all

NAME

       Text::RewriteRules - A system to rewrite text using regexp-based rules

SYNOPSIS

           use Text::RewriteRules;

           RULES email
           \.==> DOT
           @==> AT
           ENDRULES

           print email("ambs@cpan.org"); # prints ambs AT cpan DOT org

           RULES/m inc
           (\d+)=e=> $1+1
           ENDRULES

           print inc("I saw 11 cats and 23 dogs"); # prints I saw 12 cats and 24 dogs

ABSTRACT

       This module uses a simplified syntax for regexp-based rules for rewriting text. You define
       a set of rules, and the system applies them until no rule can be applied any more.

       Two variants are provided:

       1.  traditional rewrite (RULES function):

            while a substitution is possible
            | apply the first substitution rule that matches

       2.  cursor based rewrite (RULES/m function):

            add a cursor at the beginning of the string
            while the end of the string has not been reached
            | apply a substitution just after the cursor and advance the cursor
            | or advance the cursor if no rule can be applied
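
       The two strategies can be sketched in plain Perl, without the module. This is only an
       illustration of the pseudocode above, not the code the module generates: the helper
       names "rewrite_traditional" and "rewrite_cursor" and the [regexp, callback] rule
       representation are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical rule representation: [ regexp, callback returning the replacement ].
# The callback receives the first capture group (undef if there is none).
my @email_rules = (
    [ qr/\./, sub { ' DOT ' } ],
    [ qr/@/,  sub { ' AT ' } ],
);
my @inc_rules = ( [ qr/(\d+)/, sub { $_[0] + 1 } ] );

# 1. Traditional rewrite: apply the first rule that matches anywhere in the
#    string, then start over, until no rule matches.
sub rewrite_traditional {
    my ($text, @rules) = @_;
    my $changed = 1;
    while ($changed) {
        $changed = 0;
        for my $rule (@rules) {
            my ($re, $rhs) = @$rule;
            if ($text =~ s/$re/$rhs->($1)/e) {
                $changed = 1;
                last;    # restart from the first rule
            }
        }
    }
    return $text;
}

# 2. Cursor-based rewrite: try the rules only at the cursor position (\G);
#    a replacement is copied to the output and never rewritten again.
#    Rules that can match the empty string are not handled by this sketch.
sub rewrite_cursor {
    my ($text, @rules) = @_;
    my $out = '';
    pos($text) = 0;
    POSITION: while (pos($text) < length $text) {
        for my $rule (@rules) {
            my ($re, $rhs) = @$rule;
            if ($text =~ /\G$re/gc) {
                $out .= $rhs->($1);
                next POSITION;
            }
        }
        # No rule applies here: copy one character and advance the cursor.
        $text =~ /\G(.)/gcs and $out .= $1;
    }
    return $out;
}

print rewrite_traditional('ambs@cpan.org', @email_rules), "\n";
# prints ambs AT cpan DOT org
print rewrite_cursor('I saw 11 cats and 23 dogs', @inc_rules), "\n";
# prints I saw 12 cats and 24 dogs
```

       Passing @inc_rules to the traditional loop would never terminate, because the result
       of each increment matches the rule again; the cursor variant skips over text it has
       already produced.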

DESCRIPTION

       A lot of computer science problems can be solved using rewriting rules.

       A rewriting rule consists mainly of two parts: a regexp (LHS: Left Hand Side) that is
       matched against the text, and the string that replaces the content matched by the
       regexp (RHS: Right Hand Side).

       Now, why not use a simple substitution? Because we want to define a set of rules and
       match them again and again, until none of the LHS regexps matches any more.

       A point of discussion is the syntax used to define such a system. A brief discussion
       showed that some users would prefer a function that receives a hash with the rules, while
       others prefer some syntactic sugar.

       The approach used is the latter: we use "Filter::Simple" so that we can add a specific
       non-Perl syntax inside the Perl script. This improves the legibility of big rewriting
       rule systems.

       This documentation is divided in two parts. First comes the reference of the module:
       what it does, with a brief explanation. It is followed by a tutorial, which will grow
       over time and releases.

SYNTAX REFERENCE

       Note: most of the examples are deliberately trivial, since that is the easiest way to
       explain the basic syntax.

       The basic syntax for the rewrite rules is a block, started by the keyword "RULES" and
       ended by the keyword "ENDRULES". Everything between them is handled by the module and
       interpreted as rules or comments.

       The "RULES" keyword can handle a set of flags (we will see that later), and requires a
       name for the rule-set. This name will be used to define a function for that rewriting
       system.

          RULES functioname
           ...
          ENDRULES

       The function is defined in the main namespace where the "RULES" block appears.

       In this block, each line can be a comment (Perl style), an empty line or a rule.

   Basic Rule
       A basic rule is a simple substitution:

         RULES foobar
         foo==>bar
         ENDRULES

       The arrow "==>" is used as delimiter. On its left is the regexp to match, on its right
       the substitution. So, the previous block defines a "foobar" function that replaces every
       "foo" with "bar".

       Although this may seem similar to a global substitution, it is not. A global substitution
       makes a single pass over the string, so it always terminates. Here the rules are applied
       again to their own output, so an endless loop is very easy to write: a rule such as
       "a==>aa" matches its own result forever.
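
       The difference can be seen with a plain-Perl sketch. It does not use the module; the
       "1 while" loop merely imitates what a one-rule block such as "\(\)==>" would do, under
       the assumption that the rule is re-applied until it no longer matches.

```perl
use strict;
use warnings;

my $once  = '((()))';
my $fixed = '((()))';

# A global substitution makes one pass: only the innermost pair is removed.
$once =~ s/\(\)//g;

# Re-applying the same rule until it stops matching removes every pair.
1 while $fixed =~ s/\(\)//;

print "global:    '$once'\n";    # prints global:    '(())'
print "rewriting: '$fixed'\n";   # prints rewriting: ''

# By contrast, a rule whose output matches its own pattern, for instance
#   1 while $text =~ s/a/aa/;
# never terminates, which is the endless loop mentioned above.
```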

       You can use the syntax of Perl both on the left and right hand side of the rule, including
       "$1...".

   Execution Rule
       Perl substitutions support code execution in the replacement, so rewrite rules support
       it as well. Here is an example:

         RULES foo
         (\d+)b=e=>'b' x $1
         (\d+)a=eval=>'a' x ($1*2)
         ENDRULES

       So, every number followed by a "b" is replaced by that number of "b"s, and every number
       followed by an "a" is replaced by twice that number of "a"s.

       Evaluation is requested by writing "e" or "eval" inside the arrow. Remember that you can
       mix all these kinds of rules in the same rewriting system.
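
       As a rough plain-Perl equivalent (a sketch, not the code the module generates), each
       execution rule behaves like a substitution with the /e modifier, retried until no rule
       matches. The input string "3b2a" is chosen only for illustration.

```perl
use strict;
use warnings;

my $text = '3b2a';

my $changed = 1;
while ($changed) {
    $changed = 0;
    # (\d+)b=e=>'b' x $1
    if ($text =~ s/(\d+)b/'b' x $1/e) {
        $changed = 1;
    }
    # (\d+)a=eval=>'a' x ($1*2)
    elsif ($text =~ s/(\d+)a/'a' x ($1*2)/e) {
        $changed = 1;
    }
}

print "$text\n";   # prints bbbaaaa
```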

   Conditional Rule
       In some cases we want to perform a substitution only if the pattern matches and a set of
       conditions (about that pattern or not) is true.

       For that, we use a three-part rule: the usual rule plus a condition part, separated from
       the rule by "!!". Conditions can be attached to both basic and execution rules.

         RULES translate
         ([[:alpha:]]+)=e=>$dic{$1}!! exists($dic{$1})
         ENDRULES

       The previous example translates every word that exists in the dictionary.
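
       The effect of the condition can be approximated in plain Perl with a guarded
       substitution. This is a single-pass sketch under assumed data: the contents of %dic and
       the helper name "translate_once" are hypothetical, and the module itself keeps
       re-applying rules, so a translation that is also a dictionary key would be translated
       again.

```perl
use strict;
use warnings;

# Hypothetical dictionary; the rule in the text calls it %dic.
my %dic = (cat => 'gato', dog => 'cao');

sub translate_once {
    my ($text) = @_;
    # The part after "!!" acts as a guard: a word is replaced only when the
    # condition holds, otherwise it is put back unchanged.
    $text =~ s/([[:alpha:]]+)/exists($dic{$1}) ? $dic{$1} : $1/ge;
    return $text;
}

print translate_once('the cat and the dog'), "\n";
# prints the gato and the cao
```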

   Begin Rule
       Sometimes it is useful to change something in the string before starting to apply the
       rules. For that, there is a special rule named "begin" (abbreviated "b") that has only a
       RHS. This RHS is Perl code, any Perl code. If you want to modify the string, use $_.

         RULES foo
         =b=> $_.=" END"
         ENDRULES

   Last Rule
       Just as you use "last" in Perl to skip the remaining code in a loop, you can use a "last"
       (or "l") rule to stop rewriting when a specific pattern matches.

       Whereas the "begin" rule has only a RHS, the "last" rule has only a LHS:

         RULES foo
         foobar=l=>
         ENDRULES

       This way, the rules iterate until the string matches "foobar".

       You can also supply a condition in a last rule:

         RULES bar
         f(o+)b(a+)r=l=> !! length($1) == 2 * length($2);
         ENDRULES

   Rules with /x mode
       It is possible to use the regular expression /x mode in the rewrite rules. In this case:

       1.  there must be an empty line between rules

       2.  you can insert spaces and line breaks into the regular expression:

            RULES/x f1
            (\d+)
            (\d{3})
            (000)
            ==>$1 milhao e $2 mil!! $1 == 1

            ENDRULES

POWER EXPRESSIONS

       To facilitate matching complex languages, Text::RewriteRules defines a set of regular
       expressions that you can use (without defining them).

   Parenthesis
       There are three usual kinds of parentheses: standard parentheses, brackets and curly
       braces. You can match a balanced string of parentheses using the power expressions
       "[[:PB:]]", "[[:BB:]]" and "[[:CBB:]]" for these three kinds, respectively.

       For instance, if you apply this rule:

          [[:BB:]]==>foo

       to this string

         something [ a [ b] c [d ]] and something more

       then, you will get

         something foo and something more

       Note that if you apply it to

         something [[ not ] balanced [ here

       then you will get

         something [foo balanced [ here
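
       For readers curious how such a pattern can be written by hand, here is a sketch using
       Perl's recursive regular expressions. The variable $bb and the helper "replace_bb" are
       hypothetical stand-ins for "[[:BB:]]"; the module's own definition may differ.

```perl
use 5.010;    # recursive patterns need Perl 5.10 or later
use strict;
use warnings;

# An opening bracket, then any mix of non-bracket characters and nested
# balanced groups (the (?-1) recursion), then a closing bracket.
my $bb = qr/(\[(?:[^\[\]]++|(?-1))*+\])/;

sub replace_bb {
    my ($text) = @_;
    $text =~ s/$bb/foo/;    # replace the leftmost balanced group
    return $text;
}

print replace_bb('something [ a [ b] c [d ]] and something more'), "\n";
# prints something foo and something more
print replace_bb('something [[ not ] balanced [ here'), "\n";
# prints something [foo balanced [ here
```

       In the second string the first "[" has no matching close, so the leftmost balanced
       group is "[ not ]", which is why only that part is replaced.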

   XML tags
       The power expression "[[:XML:]]" matches an XML tag (with or without child XML tags).
       Note that this expression matches only well-formed XML tags.

       As an example, the rule

         [[:XML:]]==>tag

       applied to the string

         <a><b></a></b> and <more><img src="foo"/></more>

       will result in

         <a><b></a></b> and tag

TUTORIAL

       At the moment, just a set of commented examples.

       Example1 -- from number to Portuguese words  (using traditional rewriting)

       Example2 -- Naif translator (using cursor-based rewriting)

Conversion between numbers and words

       Yes, you can use Lingua::PT::Nums2Words and similar modules (for other languages).
       However, before it existed we needed to write such a conversion tool.

       Here I present a subset of the rules (for numbers below 1000). The generated text is
       Portuguese but I think you can get the idea. I'll try to create a version for English very
       soon.

       You can check the full code in the samples directory (file "num2words").

         use Text::RewriteRules;

         RULES num2words
         100==>cem
         1(\d\d)==>cento e $1
         0(\d\d)==>$1
         200==>duzentos
         300==>trezentos
         400==>quatrocentos
         500==>quinhentos
         600==>seiscentos
         700==>setecentos
         800==>oitocentos
         900==>novecentos
         (\d)(\d\d)==>${1}00 e $2

         10==>dez
         11==>onze
         12==>doze
         13==>treze
         14==>catorze
         15==>quinze
         16==>dezasseis
         17==>dezassete
         18==>dezoito
         19==>dezanove
         20==>vinte
         30==>trinta
         40==>quarenta
         50==>cinquenta
         60==>sessenta
         70==>setenta
         80==>oitenta
         90==>noventa
         0(\d)==>$1
         (\d)(\d)==>${1}0 e $2

         1==>um
         2==>dois
         3==>tres
         4==>quatro
         5==>cinco
         6==>seis
         7==>sete
         8==>oito
         9==>nove
         0$==>zero
         0==>
           ==>
          ,==>,
         ENDRULES

         num2words(123); # returns "cento e vinte e tres"

   Naif translator (using cursor-based rewriting)
        use Text::RewriteRules;
        %dict=(driver=>"motorista",
               the=>"o",
               of=>"de",
               car=>"carro");

        $word='\b\w+\b';

        if( b(a("I see the Driver of the car")) eq "(I) (see) o Motorista do carro" )
             {print "ok\n"}
        else {print "ko\n"}

        RULES/m a
        ($word)==>$dict{$1}!!                  defined($dict{$1})
        ($word)=e=> ucfirst($dict{lc($1)}) !!  defined($dict{lc($1)})
        ($word)==>($1)
        ENDRULES

        RULES/m b
        \bde o\b==>do
        ENDRULES

AUTHOR

       Alberto Simões, "<ambs@cpan.org>"

       Jose João Almeida, "<jjoao@cpan.org>"

BUGS

       We know documentation is missing and you all want to use this module.  In fact we are
       using it a lot, which explains why we don't have the time to write documentation.

       Please report any bugs or feature requests to "bug-text-rewrite@rt.cpan.org", or through
       the web interface at <http://rt.cpan.org>.  I will be notified, and then you'll
       automatically be notified of progress on your bug as I make changes.

ACKNOWLEDGEMENTS

       Damian Conway for Filter::Simple

COPYRIGHT & LICENSE

       Copyright 2004-2012 Alberto Simões and Jose João Almeida, All Rights Reserved.

       This program is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.