Ubuntu Manpage: Marpa::R2::ASF - Marpa's abstract syntax forests (ASF's)

Provided by: libmarpa-r2-perl_2.086000~dfsg-5build1_amd64

Name

       Marpa::R2::ASF - Marpa's abstract syntax forests (ASF's)

Synopsis

       We want to "diagram" the following sentence.

         my $sentence = 'a panda eats shoots and leaves.';

       Here's the result we are looking for.  It is in Penntag form:

         (S (NP (DT a) (NN panda))
            (VP (VBZ eats) (NP (NNS shoots) (CC and) (NNS leaves)))
            (. .))
         (S (NP (DT a) (NN panda))
            (VP (VP (VBZ eats) (NP (NNS shoots))) (CC and) (VP (VBZ leaves)))
            (. .))
         (S (NP (DT a) (NN panda))
            (VP (VP (VBZ eats)) (VP (VBZ shoots)) (CC and) (VP (VBZ leaves)))
            (. .))

       Here is the grammar.

         :default ::= action => [ values ] bless => ::lhs
         lexeme default = action => [ value ] bless => ::name

         S   ::= NP  VP  period  bless => S

         NP  ::= NN              bless => NP
             |   NNS          bless => NP
             |   DT  NN          bless => NP
             |   NN  NNS         bless => NP
             |   NNS CC NNS  bless => NP

         VP  ::= VBZ NP          bless => VP
             | VP VBZ NNS        bless => VP
             | VP CC VP bless => VP
             | VP VP CC VP bless => VP
             | VBZ bless => VP

         period ~ '.'

         :discard ~ whitespace
         whitespace ~ [\s]+

         CC ~ 'and'
         DT  ~ 'a' | 'an'
         NN  ~ 'panda'
         NNS  ~ 'shoots' | 'leaves'
         VBZ ~ 'eats' | 'shoots' | 'leaves'

       Here's the code. It actually does two traversals, one that produces the full result as
       shown above, and another which "prunes" the forest down to a single tree.

         my $panda_grammar = Marpa::R2::Scanless::G->new(
             { source => \$dsl, bless_package => 'PennTags', } );
         my $panda_recce = Marpa::R2::Scanless::R->new( { grammar => $panda_grammar } );
         $panda_recce->read( \$sentence );
         my $asf = Marpa::R2::ASF->new( { slr=>$panda_recce } );
         my $full_result = $asf->traverse( {}, \&full_traverser );
         my $pruned_result = $asf->traverse( {}, \&pruning_traverser );

       The code for the full traverser is in an appendix.  The pruning code is simpler.  Here it
       is:

         sub penn_tag {
            my ($symbol_name) = @_;
            return q{.} if $symbol_name eq 'period';
            return $symbol_name;
         }

         sub pruning_traverser {

             # This routine converts the glade into a list of Penn-tagged elements.  It is called recursively.
             my ($glade, $scratch)     = @_;
             my $rule_id     = $glade->rule_id();
             my $symbol_id   = $glade->symbol_id();
             my $symbol_name = $panda_grammar->symbol_name($symbol_id);

             # A token is a single choice, and we know enough to fully Penn-tag it
             if ( not defined $rule_id ) {
             my $literal = $glade->literal();
             my $penn_tag = penn_tag($symbol_name);
             return "($penn_tag $literal)";
             }

             my $length = $glade->rh_length();
             my @return_value = map { $glade->rh_value($_) } 0 .. $length - 1;

             # Special case for the start rule
             return (join q{ }, @return_value) . "\n" if  $symbol_name eq '[:start]' ;

             my $join_ws = q{ };
             $join_ws = qq{\n   } if $symbol_name eq 'S';
             my $penn_tag = penn_tag($symbol_name);
             return "($penn_tag " . ( join $join_ws, @return_value ) . ')';

         }

       Here is the "pruned" output:

         (S (NP (DT a) (NN panda))
            (VP (VBZ eats) (NP (NNS shoots) (CC and) (NNS leaves)))
            (. .))

THIS INTERFACE is ALPHA and EXPERIMENTAL

       The interface described in this document is very much a work in progress.  It is alpha and
       experimental -- subject to radical change without notice.

About this document

       This document describes the abstract syntax forests (ASF's) of Marpa's SLIF interface.  An
       ASF is an efficient and practical way to represent multiple abstract syntax trees (AST's).

Constructor

   new()
         my $asf = Marpa::R2::ASF->new( { slr => $slr } );
         die 'No ASF' if not defined $asf;

       Creates a new ASF object.  Must be called with a list of one or more hashes of named
       arguments.  Currently only one named argument is allowed, the "slr" argument, and that
       argument is required.  The value of the "slr" argument must be a SLIF recognizer object.

       Returns the new ASF object, or "undef" if there was a problem.

Accessor

   grammar()
           my $grammar     = $asf->grammar();

       Returns the SLIF grammar associated with the ASF.  This can be convenient when using SLIF
       grammar methods while examining an ASF.  All failures are thrown as exceptions.

The traverser method

   traverse()
         my $full_result = $asf->traverse( {}, \&full_traverser );

       Performs a traversal of the ASF.  Returns the value of the traversal, which is computed as
       described below.  It requires two arguments.  The first is a per-traversal object, which
       must be a Perl reference.  The second argument must be a reference to a traverser
       function, Discussion of how to write a traverser follows.  The "traverse()" method may be
       called repeatedly for an ASF, with the same traverser, or with different ones.

How to write a traverser

The process of writing a traverser will be familiar if you have experience with traversing
trees. The traverser may be called at every node of the forest. (These nodes are called
glades.) The traverser must return a value, which may not be an "undef". The value
returned by the traverser becomes the value of the glade. The value of the topmost glade
(called the peak) becomes the value of the traversal, and will be the value returned by
the "traverse()" method.

The traverser is called at most once for each glade -- subsequent attempts to determine
the value of a glade will return a memoized value. The traverser is always invoked for
the peak, and for any glade whose value is required. It may or may not be invoked for
other glades.

The traverser is always invoked with two arguments. The first argument will be a glade
object. Methods of the glade object are used to find information about the glade, and to
move around in it.

The second of the two arguments to a traverser is the per-traversal object, which will be
shared by all calls in the traversal. It may be used as a "scratch pad" for information
which it is not convenient to pass via return values, as a means of avoiding the use of
globals.

"Moving around" in a glade means visiting its parse alternatives. (Parse alternatives are
usually called alternatives, when the meaning is clear.) If a glade has exactly one
alternative, it is called a trivial glade. When invoked, the traverser points at the
first alternative. Alternatives after the first may be visited using the the "next()"
glade method.

Parse alternatives may be either token alternatives or rule alternatives. Whether or not
the current alternative of the glade is a rule can be determined using the the "rule_id()"
glade method, which returns "undef" if and only if the glade is positioned at a token
alternative.

As a special case, a glade representing a nulled symbol is always a trivial glade,
containing only one token alternative. This means that a nulled symbol is always treated
as a token in this context, even when it actually is the LHS symbol of a nulled rule.

At all alternatives, the "span()" and the "literal()" glade methods are of use. The
"symbol_id()" glade method is also always of use although its meaning varies. At token
alteratives, the "symbol_id()" method returns the token symbol. At rule alteratives, the
"symbol_id()" method returns the ID of the LHS of the rule.

At rule alternatives, the "rh_length()" and the "rh_value()" glade methods are of use.
The "rh_length()" method returns the length of the RHS, and the "rh_value()" method
returns the value of one of the RHS children, as determined using its traverser.

At the peak of the ASF, the symbol will be named '"[:start]"'. This case often requires
special treatment. Note that it is entirely possible for the peak glade to be non-
trivial.

Glade methods

       These are methods of the glade object.  Glade objects are passed as arguments to the
       traversal routine, and are only valid within its scope.

   literal()
         my $literal = $glade->literal();

       Returns the glade literal, a string in the input which corresponds to this glade.  The
       glade literal remains constant inside a glade.  The "literal()" method accepts no
       arguments.

   span()
         my ( $start, $length ) = $glade->span();
         my $end = $start + $length - 1;

       Returns the glade span, two numbers which describe the location which corresponds to this
       glade.  The first number will be the start of the span, as an offset in the input stream.
       The second number will be its length.  The glade span remains constant within a glade.
       The "span()" method accepts no arguments.

       Then "end" character of the span, when defined, may be calculated as its start plus its
       length, minus one.  Applications should note that glades representing nulled symbols are
       special cases.  They will have a length of zero and, properly speaking, their literals are
       zero length and do not have defined first (start) and last (end) characters.

   symbol_id()
           my $symbol_id   = $glade->symbol_id();

       Returns the glade symbol, which remains constant inside a glade.  For a token alternative,
       the glade symbol is the token symbol.  For a rule alternative, the glade symbol is the LHS
       symbol of the rule.  The symbol ID remains constant within a glade.  The "symbol_id()"
       method accepts no arguments.

   rule_id()
         my $rule_id     = $glade->rule_id();

       Returns the ID of the rule for the current alternative.  The ID will be non-negative, but
       it may be zero.  Returns "undef" if and only if the current alternative is a token
       alternative.  The "rule_id()" method accepts no arguments.

   rh_length()
         my $length = $glade->rh_length();

       Returns the number of RHS children of the current rule.  On success, this will always be
       an integer greater than zero.  The "rh_length()" method accepts no arguments.  It is a
       fatal error to call "rh_length()" for a glade that currently points to a token
       alternative.

   rh_value()
         my $child_value = $glade->rh_value($rh_ix);

       Requires exactly one argument, $rh_ix, which must be the zero-based index of a RHS child
       of the current rule instance.  Returns the value of the $rh_ix'th child of the current
       rule instance.  For convenient iteration, returns "undef" if the value of the $rh_ix is
       greater than or equal to the RHS length.  It is a fatal error to call "rh_value()" for a
       glade that currently points to a token alternative.

   next()
         last CHOICE if not defined $glade->next();

       Points the glade at the next alternative.  If there is no next alternative, returns
       "undef".  On success, returns a defined value.  One of the values returned on success may
       be the integer zero, so applications checking for failure should be careful to check for a
       Perl defined value, and not for a Perl true value.

       In addition, because the "rule_id()" method remains constant only within a symch, and the
       "next()" method may change the current symch, "rule_id()" method must always be called to
       obtain the current rule ID in a "while" loop where "next()" method is used as the exit
       condition.

Details

This section contains additional explanations, not essential to understanding the topic of
this document. Often they are formal or mathematical. Some people find these helpful,
but others do not, which is why they are segregated here.

Symches and factorings
Symch and factoring are terms which are useful for some advanced applications. For the
purposes of this document, the reader can consider the term "factoring" as a synonym for
"parse alternative". A symch is either a rule symch or a token alternative. A rule symch
is a series of rule alternatives (factorings) which share the same rule ID and the same
glade. A glade's token alternative is a symch all by itself. The term symch is shorthand
for "symbolic choice".

For each glade accessor, its value can be classified as

• remaining constant inside a glade;

• remaining constant within a symch; or

• potentially varying with each factoring.

The values of the "literal()", "span()", and "symbol_id()" methods remain constant inside
each glade. The "rule_id()" method remains constant within a symch -- in fact, the rule
ID and the glade define a symch. (Recall that for this purpose, the token alternative's
"undef" is considered a rule ID.) The values of the "rh_length()" method and the values
of the "rh_value()" method method may vary with each alternative (factoring).

When moving through a glade using the "next()" method, alternatives within the same symch
are visited as a group. More precisely, let the "current rule ID" be defined as the rule
ID of the alternative at which the glade is currently pointing. The "next()" glade method
guarantees that, before any alternative with a rule ID different from the current rule ID
is visited, all of the so-far-unvisited alternatives that share the current rule ID will
be visited.

Appendix: full traverser code

         sub full_traverser {

             # This routine converts the glade into a list of Penn-tagged elements.  It is called recursively.
             my ($glade, $scratch)     = @_;
             my $rule_id     = $glade->rule_id();
             my $symbol_id   = $glade->symbol_id();
             my $symbol_name = $panda_grammar->symbol_name($symbol_id);

             # A token is a single choice, and we know enough to fully Penn-tag it
             if ( not defined $rule_id ) {
             my $literal = $glade->literal();
             my $penn_tag = penn_tag($symbol_name);
             return ["($penn_tag $literal)"];
             } ## end if ( not defined $rule_id )

             # Our result will be a list of choices
             my @return_value = ();

             CHOICE: while (1) {

             # The results at each position are a list of choices, so
             # to produce a new result list, we need to take a Cartesian
             # product of all the choices
             my $length = $glade->rh_length();
             my @results = ( [] );
             for my $rh_ix ( 0 .. $length - 1 ) {
                 my @new_results = ();
                 for my $old_result (@results) {
                 my $child_value = $glade->rh_value($rh_ix);
                 for my $new_value ( @{ $child_value } ) {
                     push @new_results, [ @{$old_result}, $new_value ];
                 }
                 }
                 @results = @new_results;
             } ## end for my $rh_ix ( 0 .. $length - 1 )

             # Special case for the start rule
             if ( $symbol_name eq '[:start]' ) {
                 return [ map { join q{}, @{$_} } @results ];
             }

             # Now we have a list of choices, as a list of lists.  Each sub list
             # is a list of Penn-tagged elements, which we need to join into
             # a single Penn-tagged element.  The result will be to collapse
             # one level of lists, and leave us with a list of Penn-tagged
             # elements
             my $join_ws = q{ };
             $join_ws = qq{\n   } if $symbol_name eq 'S';
             push @return_value,
                 map { '(' . penn_tag($symbol_name) . q{ } . ( join $join_ws, @{$_} ) . ')' }
                 @results;

             # Look at the next alternative in this glade, or end the
             # loop if there is none
             last CHOICE if not defined $glade->next();

             } ## end CHOICE: while (1)

             # Return the list of Penn-tagged elements for this glade
             return \@return_value;
         } ## end sub full_traverser

Copyright and License

         Copyright 2014 Jeffrey Kegler
         This file is part of Marpa::R2.  Marpa::R2 is free software: you can
         redistribute it and/or modify it under the terms of the GNU Lesser
         General Public License as published by the Free Software Foundation,
         either version 3 of the License, or (at your option) any later version.

         Marpa::R2 is distributed in the hope that it will be useful,
         but WITHOUT ANY WARRANTY; without even the implied warranty of
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
         Lesser General Public License for more details.

         You should have received a copy of the GNU Lesser
         General Public License along with Marpa::R2.  If not, see
         http://www.gnu.org/licenses/.