Provided by: libmarpa-r2-perl_2.086000~dfsg-6build2_amd64

NAME

       Marpa::R2::Semantics::Order - How the SLIF ranks ambiguous parses

Description

       Marpa allows ambiguous parses.  While an unambiguous parse can produce at most one parse
       tree and one parse result, an ambiguous parse will produce a parse series.  A parse series
       is a sequence of parse trees, each of which will have its own parse result.

       This document describes ways of controlling the order in which the SLIF recognizer's
       "value()" method evaluates the parse trees of an ambiguous parse.  It also describes ways
       to exclude selected parse trees from the parse series.

   Duplicate parses are eliminated
       When evaluating the parse trees in a parse series, Marpa never evaluates the same parse
       tree twice.  What this means probably matches the programmer's intuition of what it should
       mean.  Marpa considers two parse trees to be the same if they are semantic equivalents.

       Two parse trees are semantic equivalents if and only if a recursive, top-down evaluation
       of each applies the same rules in the same order at the same G1 locations.  If the
       semantics are deterministic, and if two parse trees are semantic equivalents according to
       this definition, the two parse trees will always produce the same parse result.

       The two parse trees are called semantic equivalents, because from the point of view of a
       deterministic semantics they are indistinguishable.  When the Marpa documentation refers
       to duplicate parses, unless otherwise stated, it means that the two are semantic
       equivalents.
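
       As a rough illustration of this definition, the following Python sketch models a parse
       tree as nested tuples of the form (rule, g1_start, g1_end, children).  The tuple layout
       and the names "evaluation_trace" and "unique_trees" are assumptions made for this
       example; this is not how Marpa represents or compares parse trees internally.  Two trees
       are treated as duplicates when their top-down evaluation traces are identical.

```python
def evaluation_trace(tree):
    """Return the rules applied, in top-down, left-to-right order,
    together with the G1 span at which each one is applied."""
    rule, start, end, children = tree
    trace = [(rule, start, end)]
    for child in children:
        trace.extend(evaluation_trace(child))
    return tuple(trace)


def unique_trees(trees):
    """Yield each tree once, skipping semantic duplicates."""
    seen = set()
    for tree in trees:
        key = evaluation_trace(tree)
        if key not in seen:
            seen.add(key)
            yield tree


# Two identical trees and one genuinely different tree (hypothetical rules).
tree_a = ("S ::= A B", 0, 2, (("A ::= 'a'", 0, 1, ()), ("B ::= 'b'", 1, 2, ())))
tree_b = ("S ::= A B", 0, 2, (("A ::= 'a'", 0, 1, ()), ("B ::= 'b'", 1, 2, ())))
tree_c = ("S ::= C", 0, 2, (("C ::= 'a' 'b'", 0, 2, ()),))

survivors = list(unique_trees([tree_a, tree_b, tree_c]))
```

       Here "tree_b" is dropped because it applies the same rules in the same order at the same
       locations as "tree_a", while "tree_c" survives.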

   Default parse order
       By calling the recognizer's "value()" method repeatedly, Marpa can produce all the parse
       results in the current parse series.  The default is for the parse results to be returned
       in an arbitrary parse order.  This corresponds to the "none" value of the recognizer's
       "ranking_method" named argument.

       Traversal of the parse trees in arbitrary parse order will always be well-behaved in
       the sense that no two parse trees will be semantic duplicates, and no unique (semantic
       non-duplicate) parse tree will be omitted.  No other property of arbitrary parse
       order is guaranteed.  For example, the order may change each time the parse series is
       traversed.

   Ranking methods
       SLIF recognizer objects have a "ranking_method" named argument, whose value can be the
       name of a ranking method ("rule" or "high_rule_only"), or "none", indicating that the
       default, arbitrary parse order is to be used.

   The "rule" ranking method
       The rule method ranks alternative parses according to their rule alternatives.  Every rule
       alternative has a numeric rank.  A rule's rank can be specified using the "rank"
       adverb argument for that RHS alternative.  Rule ranks must be integers.  They may be
       negative.  If no numeric rank is specified, the numeric rank is 0.

   The "high_rule_only" ranking method
       The "high_rule_only" ranking method is similar to the "rule" ranking method, except that,
       at every choice point, it discards all of the choices which have a rank lower than that of
       the highest ranked choice.

       The "high_rule_only" ranking method can reduce the ambiguity of a parse, but it does not
       necessarily do so.  This is because, at each choice point among the parse trees, it is
       possible that several of the choices, or all of them, will have the same rank as the
       highest ranked choice.
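
       The filtering at a single choice point can be sketched in Python as follows.  This is a
       simplified model under stated assumptions, not Marpa's implementation: each choice is a
       hypothetical (name, rank) pair, and "high_rule_only" here is an illustrative function,
       not a Marpa call.

```python
def high_rule_only(choices):
    """Keep only the choices whose rank equals the highest rank present."""
    if not choices:
        return []
    top = max(rank for _, rank in choices)
    return [(name, rank) for name, rank in choices if rank == top]


# One clear winner: the ambiguity at this choice point disappears.
resolved = high_rule_only([("alt_a", 2), ("alt_b", 0), ("alt_c", -1)])

# A tie for the highest rank: the ambiguity is reduced but not removed.
still_ambiguous = high_rule_only([("alt_a", 1), ("alt_b", 1), ("alt_c", 0)])
```

       In the second call two choices share the highest rank, so both survive and the parse
       remains ambiguous at that choice point.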

   Rule ranking
       A parse series is kept in a structure called a parse bocage.  The parse bocage is a tree-
       like structure, whose root node is the common root of all the parse trees of the parse
       series.  In an unambiguous parse, there will be only one parse tree, and the parse bocage
       will be equivalent to that parse tree.  In an ambiguous parse, there will be choice points
       in the parse bocage.  At the choice points, there will be two or more alternatives --
       choices which result in different parse trees.

       When ranking, the logic traverses the parse bocage, looking for choice points.  From the
       point of view of the individual parse trees, this traversal will be top-down and left-to-
       right.  At the choice points, the choices are ranked as follows:

       •   Different numeric ranks:

           If the two choices have different numeric ranks, they must also have different rule
           alternatives.  The choice whose rule alternative has the higher numeric rank will rank
           high.

       •   Same rule alternative:

           If the two choices have the same rule alternative, they rank as described under "Null
           variant ranking".

       •   Same numeric rank, different rule alternatives:

           Two different rule alternatives can have the same numeric rank.  If the two choices
           are for rule alternatives that are different, but that have the same numeric rank, the
           relative order of the two choices is arbitrary.

       Note that, in the above, the logic is the same regardless of the DSL rule to which the
       rule alternatives belong.  Different rule alternatives can, in the case of a prioritized
       rule, belong to the same DSL rule.  But two rule alternatives may also be different
       because they are from two different DSL rules.
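
       The ordering of choices by numeric rank can be sketched in Python as below.  The
       "Choice" class, its field names, and the sample rule alternatives are hypothetical, and
       the case of two choices with the same rule alternative (null variant ranking) is left
       out.  The sketch models the description above; it does not reproduce Marpa's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Choice:
    alternative: str  # hypothetical label for a rule alternative
    rank: int = 0     # an unspecified rank defaults to 0, as in the text


def order_choices(choices):
    """Order the choices at one choice point, highest numeric rank first.

    Python's sort is stable, so choices of equal rank keep their input
    order. That is just one of the arbitrary orders the text permits.
    """
    return sorted(choices, key=lambda c: c.rank, reverse=True)


choices = [
    Choice("E ::= E '+' E"),          # rank 0 by default
    Choice("E ::= E '*' E", rank=1),
    Choice("E ::= '-' E", rank=-1),   # ranks may be negative
    Choice("E ::= E '-' E"),          # ties with the first choice
]
ordered = [c.alternative for c in order_choices(choices)]
```

       An application should not rely on the relative position of the two rank-0 alternatives,
       since the order among equal ranks is arbitrary.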

   Null variant ranking
       Some rules have a RHS which contains proper nullables: symbols which may be nulled, but
       which are not nulling symbols.  (Nulling symbols are symbols which are always nulled.)

       When a rule alternative contains proper nullables, each instance of that rule creates a
       nulling variant.  A nulling variant is a specific pattern of null and non-null symbols in
       a rule instance's RHS.  In many cases, this creates an ambiguity -- different nulling
       variants can match the same substring in the input.  In ambiguous parsings of this kind,
       some applications may want to rank nulling variants that start with non-null symbols
       higher.  Other applications may want to do the opposite -- to rank nulling variants that
       start with null symbols higher.

       The "null-ranking" adverb for RHS alternatives specifies which nulling variants are ranked
       high or low.  If the "null-ranking" is "low", then the closer a nulling variant places
       its visible (non-null) symbols to the start of the rule instance, the higher it ranks.  A
       null ranking of "low" is the default.  If the "null-ranking" is "high", then the closer
       a nulling variant places its null symbols to the start of the rule instance, the higher it
       ranks.  In ranking nulling variants with more than one proper nullable, major-to-minor is
       left-to-right.
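
       One way to read the major-to-minor, left-to-right rule is as a lexicographic comparison.
       The Python sketch below takes that reading as an assumption.  It models a nulling
       variant as a tuple with one entry per proper nullable, and the function name and the
       tuple encoding are invented for this example.

```python
def rank_nulling_variants(variants, null_ranking="low"):
    """Order nulling variants from highest-ranked to lowest-ranked.

    Each variant is a tuple with one entry per proper nullable in the RHS,
    left to right: True if that symbol is visible (non-null), False if null.
    """
    if null_ranking not in ("low", "high"):
        raise ValueError("null_ranking must be 'low' or 'high'")
    # Tuples compare left to right, which gives the major-to-minor order.
    # With "low", visible (True) symbols near the start rank higher, so
    # sort descending; with "high", null (False) symbols do, so ascending.
    return sorted(variants, reverse=(null_ranking == "low"))


variants = [(False, True, False), (True, False, False), (False, False, True)]
low_order = rank_nulling_variants(variants, "low")
high_order = rank_nulling_variants(variants, "high")
```

       With "low", the variant whose first symbol is visible comes first; with "high", the
       variant that begins with the most null symbols comes first.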

   A general approach to sorting parses
       The most general way to sort Marpa parses is for the application to take control.  The
       application can set up the Marpa semantic actions so that the parse result of every parse
       tree is a "<rank, true_value>" duple.  The duples can then be sorted by "rank".  Once the
       results are sorted, the "rank" element of the duple can be discarded.  (Those familiar
       with the Schwartzian transform may note a resemblance.  In Perl, duples can be implemented
       as references to arrays of 2 elements.)

       The user needs to be careful.  In theory, ambiguity can cause an exponential explosion in
       the number of results.  In practice, ambiguity tends to get out of hand very easily.
       Producing and sorting all the parses can take a very long time.
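
       The duple approach, together with a guard against runaway ambiguity, might look like the
       following Python sketch.  The generator "fake_parse_series" is a hypothetical stand-in
       for repeated calls to the recognizer's "value()" method, and sorting highest rank first
       is an assumption; the application decides what its ranks mean.

```python
from itertools import islice


def sorted_values(parse_results, limit=None):
    """Sort (rank, true_value) duples by rank and return the true values.

    `limit` caps how many parse results are consumed, as a guard against
    highly ambiguous parses.
    """
    duples = list(islice(parse_results, limit))
    duples.sort(key=lambda duple: duple[0], reverse=True)  # highest rank first
    return [true_value for _rank, true_value in duples]


def fake_parse_series():
    """Hypothetical stand-in for repeated calls to the recognizer's value()."""
    yield (1, "reading A")
    yield (3, "reading B")
    yield (2, "reading C")


best_first = sorted_values(fake_parse_series())
first_two = sorted_values(fake_parse_series(), limit=2)
```

       Note the trade-off in using "limit": parse results beyond the cap are never produced, so
       they are never ranked, and the best parse may be among them.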

Formal definitions

       This section is a restatement of earlier material in more formal language.  It is recorded
       here for those who find it helpful.  Most readers will want to ignore this section.

       Call the set of parse trees "T".  Semantic equivalence is an equivalence relation on "T".
       Call this relation "~".  Let "E" be the quotient set of "T" by "~".  In this document, the
       term arbitrary parse order is used to mean an arbitrary choice among the relations which
       are strict total orders of "E".

Copyright and License

         Copyright 2014 Jeffrey Kegler
         This file is part of Marpa::R2.  Marpa::R2 is free software: you can
         redistribute it and/or modify it under the terms of the GNU Lesser
         General Public License as published by the Free Software Foundation,
         either version 3 of the License, or (at your option) any later version.

         Marpa::R2 is distributed in the hope that it will be useful,
         but WITHOUT ANY WARRANTY; without even the implied warranty of
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
         Lesser General Public License for more details.

         You should have received a copy of the GNU Lesser
         General Public License along with Marpa::R2.  If not, see
         http://www.gnu.org/licenses/.