Provided by: libmarpa-r2-perl_2.086000~dfsg-8_amd64 bug

NAME

       Marpa::R2 - Release 2 of Marpa

Synopsis

           use Marpa::R2;

           my $dsl = <<'END_OF_DSL';
           :default ::= action => [name,values]
           lexeme default = latm => 1

           Expression ::= Term action => ::first
           Term ::=
                 Factor action => ::first
               | Term '+' Term action => do_add
           Factor ::=
                 Number action => ::first
               | Factor '*' Factor action => do_multiply
           Number ~ digits
           digits ~ [\d]+
           :discard ~ whitespace
           whitespace ~ [\s]+
           END_OF_DSL

           my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
           my $recce = Marpa::R2::Scanless::R->new(
               { grammar => $grammar, semantics_package => 'My_Actions' } );
           my $input = '42 * 1 + 7';
           $recce->read( \$input );

           my $value_ref = $recce->value;
           my $value = $value_ref ? ${$value_ref} : 'No Parse';

           sub My_Actions::do_add {
               my ( undef, $t1, undef, $t2 ) = @_;
               return $t1 + $t2;
           }

           sub My_Actions::do_multiply {
               my ( undef, $t1, undef, $t2 ) = @_;
               return $t1 * $t2;
           }

Description

   Overview
       Marpa parses any language whose grammar can be written in BNF.  That includes recursive
       grammars, ambiguous grammars, infinitely ambiguous grammars and grammars with useless or
       empty productions.  Marpa does both left- and right-recursion in linear time -- in fact if
       a grammar is in any class currently in practical use, Marpa will parse it in linear time.

       This document centers around a short tutorial of the Scanless interface (SLIF).  This is
       the interface most suitable for beginners.  The SLIF is the most suitable interface for
       most advanced uses as well.

A simple calculator

       The synopsis shows the code for an extremely simple calculator.  It handles only addition
       and multiplication of integers.  The sections which follow explain, line by line, how it
       works.  The explanation will assume that the reader understands BNF and the basics of
       grammars -- what rules are, what symbols are, what the start symbol of a grammar is, etc.

   Marpa::R2::Scanless::G::new
           my $dsl = <<'END_OF_DSL';
           :default ::= action => [name,values]
           lexeme default = latm => 1

           Expression ::= Term action => ::first
           Term ::=
                 Factor action => ::first
               | Term '+' Term action => do_add
           Factor ::=
                 Number action => ::first
               | Factor '*' Factor action => do_multiply
           Number ~ digits
           digits ~ [\d]+
           :discard ~ whitespace
           whitespace ~ [\s]+
           END_OF_DSL

           my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

       The code first creates a new SLIF grammar.  SLIF grammars are "Marpa::R2::Scanless:G"
       objects.  They are created with the Marpa::R2::Scanless:G::new constructor.  The arguments
       to Marpa::R2::Scanless::G::new are references to hashes of named arguments.  In the
       key/value pairs of these hashes, the hash key is the name of the argument, and the hash
       value is the value of the named argument.

       In the example, there is only one named argument to the SLIF grammar constructor:
       "source".  The value of "source" must be a reference to a string in the SLIF's domain-
       specific language (DSL).  In this example, the DSL consists of several rules and pseudo-
       rules.

   The default pseudo-rule
           :default ::= action => [name,values]
           lexeme default = latm => 1

       These two lines set useful defaults.  The first sets a default semantics, one which is
       especially useful for development.  This is a finished script, so the default semantics is
       not used much.  We'll talk about this more when we discuss semantics at the end.

       The second line sets the longest acceptable tokens match (LATM) style of lexing, which is
       what you'll almost always want.  It is not the default for historical reasons, so your
       scripts will almost always start with this line.

   A G1 rule
       Next follows a G1, or structural rule.  Structural rules are the kinds of rules typically
       seen in BNF -- they describe the symbols which provide the structure of the grammar, but
       leave out details of whitespace.  The SLIF also handles the lexical details in this
       example, but it does it via L0 rules, which we will see shortly.

           Expression ::= Term action => ::first

       As is normal for BNF rules, this rule consists of a left hand side symbol
       (""Expression""), the BNF operator (""::="") and a series of right hand side (RHS)
       symbols.  There is always exactly one left hand side (LHS) symbol.  There may be any
       number of RHS symbols.  In the case of an empty rule, the number of RHS symbols would be
       zero.  In this rule, there is one RHS symbol, ""Term"".

       The BNF operator (""::="") is what makes this rule a G1 (structural) rule.  Later we will
       see lexical rules, which will use the match operator (""~"").

       After the rule is an adverb: "action => ::first".  We'll explain the purpose of the
       "action" adverbs when we discuss semantics

   More complicated G1 rules
           Term ::=
                 Factor action => ::first
               | Term '+' Term action => do_add

       This rule says that a "Term" may be one of two alternatives: either a "Factor" or two
       "Term"'s separated by an addition operator.  Immediately following is another G1 rule
       defining a "Factor".  It is very similar in form to the one for "Term".

           Factor ::=
                 Number action => ::first
               | Factor '*' Factor action => do_multiply

   L0 rules
       The structural rules define the high-level structure of the grammar, and ignore details of
       whitespace, comments, etc.  Now we look at how the low-level, lexical issues are handled.
       This very simple calculator language does not allow comments, but it does define
       whitespace.

                 :discard ~ whitespace
                 whitespace ~ [\s]+

       The ":discard" rule is a pseudo-rule, which tells Marpa to use whatever it matches to
       separate G1 symbols, but otherwise to ignore it -- to "discard" it.  "whitespace" is
       defined in the next rule as a sequence of one or more spaces.

       Note the match operator (""~"") in the rule defining whitespace.  It tells Marpa that this
       rule is lexical and should be interpreted exactly as written, character by character.

       The "whitespace" rule is a special kind of rule in two respects.  First, its RHS is
       followed by a quantifier (""+""), which makes it a sequence rule.  Aside from the
       quantifier, sequence rules may only have a single symbol or character class on their RHS.
       The plus quantifier (""+"") means a sequence of one or more items.  The star quantifier
       (""*"") is also allowed, and it indicates a sequence of zero or more items.

       The whitespace items are defined by a character class: "[\s]".  Marpa supports the same
       character classes, and the same character class syntax, as Perl does.

       The next pair of L0 rules define the "Number" symbol

                 Number ~ digits
                 digits ~ [\d]+

       The above two rules say that a "Number" is a sequence of one or more digits.  "Number" is
       a lexeme -- a G1 symbol which is defined and recognized at the lexical (L0) level.  In
       this example, there are three other lexemes: "whitespace", and the addition and
       multiplication operators.

       We've already looked at the "whitespace" lexeme, which will be discarded without being
       seen by G1.  The addition and multiplication operators were defined with single quoted
       strings in the G1 rules.  As a reminder, here's the rule for "Term" again:

           Term ::=
                 Factor action => ::first
               | Term '+' Term action => do_add

       In the above rule, the single-quoted string '+' implicitly defines a L0 lexeme.  Something
       similar happens with the '*' string in the rule for a "Factor".

       The SLIF's lexer mostly "does what you mean".  While the input is being read, it looks for
       all lexemes defined in the DSL.  Almost always, you'll want Marpa to look only for tokens
       that are actually acceptable to the parse.  Telling Marpa to do so was the purpose of this
       line:

           lexeme default = latm => 1

       LATM means "longest acceptable tokens match".  (LATM is not the default for historical
       reasons.)

       Among the acceptable tokens, Marpa looks for longest matches -- if multiple tokens match,
       the longest match is the winner.  Marpa tolerates ambiguity, so one feature special to
       Marpa is that LATM is a longest acceptable tokens match -- if more than one token is
       longest, all of them are considered in the parse.  The logic of SLIF lexing is described
       with more precision in the SLIF overview document.

   Marpa::R2::Scanless::R::new
           my $recce = Marpa::R2::Scanless::R->new(
               { grammar => $grammar, semantics_package => 'My_Actions' } );

       "Marpa::R2::Scanless::R::new" creates a new SLIF recognizer.  Its arguments are references
       to hashes of named arguments.  In this example the first named argument is the required
       argument: ""grammar"".  The value of the "grammar" named argument must be a Marpa::R2 SLIF
       grammar.

       The second argument is optional, but you will use it frequently.  The
       ""semantics_package"" named argument tells Marpa in which Perl package to look for the
       closures implementing the semantics for this grammar.  We will talk more about this below.

   Marpa::R2::Scanless::R::read
           my $input = '42 * 1 + 7';
           $recce->read( \$input );

       To parse a string, we use the "Marpa::R2::Scanless::R::read()" method.  In its simplest
       form, as here, the "Marpa::R2::Scanless::R::read()" takes a reference to a string
       containing the input stream as its argument.

   Marpa::R2::Scanless::R::value
           my $value_ref = $recce->value;
           my $value = $value_ref ? ${$value_ref} : 'No Parse';

       The "Marpa::R2::Scanless::R::value()" method returns a reference to the parse result's
       value, if there was a parse result.  If there was no parse result,
       "Marpa::R2::Scanless::R::value()" returns "undef".

       We have yet to describe how the Marpa SLIF computes the value of a parse.  In fact, up to
       this point, we have been skipping everything that had to do with semantics.  Now it is
       time to go back to those features.

   Semantics
       The value of the parse result, as returned via the "value()" method, is determined by the
       parse's semantics.  Marpa's semantics are the traditional ones: The input is seen as a
       tree which takes its structure from the G1 rules.  (This is why the G1 rules are called
       structural.)  The value of the parse results from repeatedly evaluating nodes of this
       tree, starting at the bottom, with the results of child nodes made available to their
       parent node when the parent node is evaluated.

       Parse trees are usually drawn upside-down with their root at the top, and their "leaves"
       at the bottom.  In Marpa::R2's SLIF, the "leaves" are the symbols that the G1 (structural)
       rules share with the L0 (lexical) rules.  The symbols shared by L0 and G1 are those
       lexemes which are not discarded.  In this example, the lexemes visible to G1 are "Number"
       and two operators which are specified with a quoted string: ""+"" and ""*"".

       Marpa assigns values to the nodes of the tree, starting with the leaves.  Marpa's "leaves"
       will always be L0 symbols, and their value by default is the literal value at their
       location in the input stream.  In the case of the two operators described by quoted
       string, the value is that quoted string.  That is, the value of '"+"' is '"+"', and the
       value of '"*"' is '"*"'.  The value of "Number" will be the portion of the input that
       matched the "[\d]+" pattern.

       Starting with the values for leaves, Marpa::R2 moves recursively "up" the tree to its
       root, assigning a value to each node of the tree based on the value of its child nodes.
       Each non-leaf node corresponds to a G1 rule, and the children of the non-leaf node
       correspond to the RHS symbols of the rule.  When the non-leaf node is valued, its value
       becomes the value of its LHS symbol, and this value will become the value of a RHS symbol
       of another node with one exception.

       The one exception, the node with a LHS symbol that does not become a RHS symbol, is the
       value of the top (or "root") node.  The value of the top node becomes the value of the
       parse, and this is the parse result value to which the "value()" method returns a
       reference.

           :default ::= action => [name,values]

       Each non-leaf node determines its value with an action.  The default pseudo-rule allows
       you to specify the default action.  (It is a pseudo-rule because its LHS, "":default"", is
       a pseudo-symbol, not a real one.)  Often actions are Perl functions, which in this context
       are called Perl semantic closures.

           my $recce = Marpa::R2::Scanless::R->new(
               { grammar => $grammar, semantics_package => 'My_Actions' } );

       Above we saw the "semantics_package" named argument used when constructing the SLIF
       recognizer.  As we noted, this specifies the package that is used to find the Perl
       semantic closures.

       In this example the default semantics, as specified by the "default_action" named
       argument, come from a "array descriptor" named ""[name,values]"".  This indicates that, by
       default, the value of a rule is to be a reference to an array consisting of the rule's
       name, followed by the values of its children.

       In this case, the semantics is not actually used, and you would usually change it to
       something more convenient for your application.  But ""[name,values]"" is an excellent
       starting point when you're first developing a DSL and, since this code is intended as a
       template, we've kept it.  For more about array descriptors, see the semantics document

       The other way we specify semantics in this example is by using an "action" adverb for a
       RHS alternative.  We've seen the "action" adverb several times, but skipped over it.  Now
       it is time to look at it.

           Term ::=
                 Factor action => ::first
               | Term '+' Term action => do_add
           Factor ::=
                 Number action => ::first
               | Factor '*' Factor action => do_multiply

       The ""::first"" action indicates that the value of a rule is to be the value of its first
       child, that is, the value corresponding to the first symbol of the rule's RHS.  (In the
       case of an empty rule, the value would be a Perl "undef").  (The initial double colon
       indicates a reserved action.)

       The action for the second RHS alternative defining "Term" is "do_add", and the action for
       the second RHS alternative defining "Factor" is "do_multiply".  To implement these
       actions, we need to "resolve" their names -- map the action names into the Perl closures
       which actually carry out the semantics.

       The "semantics_package" specified the package where we can find the actions:
       ""My_Actions"".  So, to resolve the "do_multiply" action, Marpa looks for a closure whose
       fully qualified name is "My_Actions::do_multiply", which it finds:

           sub My_Actions::do_multiply {
               my ( undef, $t1, undef, $t2 ) = @_;
               return $t1 * $t2;
           }

       The "do_add" action is resolved to a Perl semantic closure in much the same way:

           sub My_Actions::do_add {
               my ( undef, $t1, undef, $t2 ) = @_;
               return $t1 + $t2;
           }

       The Perl semantic closures are callbacks.  They are called as each node in a parse tree is
       evaluated.

       Each Perl semantic closure is called with one or more arguments.  The first argument to a
       value action is always a per-parse-tree object, which the callbacks can use as a
       scratchpad.  In this example, the per-parse-tree object is not used.  The remaining
       arguments will be the values of the node's "children" -- in other words, the values
       computed for each of its RHS symbols, in order.  If the action is for an empty rule, the
       per-parse-tree object will be its only argument.

       Every value action is expected to return a value.  With one exception, this value is
       passed up to a parent node as an argument.  The exception is the value for the start rule.
       The return value for the start rule becomes the parse result.

Tainted data

       Marpa::R2 exists to allow its input to alter execution in flexible and powerful ways.
       Marpa should not be used with untrusted input.  In Perl' s taint mode, it is a fatal error
       to use Marpa's SLIF interface with a tainted grammar, a tainted input string, or tainted
       token values.

Threads

       When used in a thread-safe Perl, Marpa::R2 should be thread-safe, with one important
       restriction: All Marpa objects that share the same grammar must be created and used within
       a single thread.

       This restriction may be lifted someday, but in practice it does not seem onerous.  Note
       that you can use the same grammar in different threads by creating grammars that are exact
       copies of each other, one grammar per thread.

The Marpa:: namespace

       The "Marpa::" top-level namespace is reserved.  For extensions to Marpa, one appropriate
       place is the "MarpaX::" namespace.  This practice helps avoid namespace collisions, and
       follows a CPAN standard, as exemplified by the "DBIx::" "LWPx::" and "MooseX::" which are
       for extensions of, respectively, DBI, LWP and Moose.

Other documents

       This document gives a semi-tutorial overview of Marpa's Scanless interface (SLIF).  For
       more details about the SLIF, there is an overview, and pages describing its DSL, its
       grammar methods, and its recognizer methods.

       Marpa has two other interfaces.  The thin interface provides direct access to the
       underlying Libmarpa C library.  Of the Perl interfaces to Marpa, the thin interface is the
       most low-level.  The thin interface offers efficient access to the full power of the Marpa
       parse engine, but it requires the application to do a lot of the work itself.

       Now discouraged, the named argument inteface (NAIF) was Marpa::R2's first interface.  It
       is a more traditional, middle level interface which uses Perl calls instead of a DSL.

       Marpa::R2::Vocabulary is intended as a quick refresher in parsing terminology, emphasizing
       how the standard terms are used in the Marpa context.  Marpa's standard semantics are
       fully described in the Marpa::R2::Semantics document.  Techniques for tracing and for
       debugging your Marpa grammars are described in the Marpa::R2::Tracing document and the
       Marpa::R2::Progress document.  For those with a theoretical bent, my sources, and other
       useful references, are described in Marpa::R2::Advanced::Bibliography.

Author

       Jeffrey Kegler

   Why is it called "Marpa"?
       Marpa is the name of the greatest of the Tibetan "translators".  In his time (the 11th
       century AD) Indian Buddhism was at its height.  Marpa's generation of scholars was devoted
       to producing Tibetan versions of Buddhism's Sanskrit scriptures.  Marpa became the
       greatest of them, and today is known as Marpa Lotsawa: "Marpa the Translator".

   Blatant plug
       Marpa is a character in my novel, The God Proof.  The God Proof centers around Kurt
       Goedel's proof of God's existence.  Yes, that Kurt Goedel, and yes, he really did work out
       a God Proof (it's in his Collected Works, Vol. 3, pp. 403-404).  The God Proof is
       available as a free download (<http://www.lulu.com/content/933192>).  It can be purchased
       in print form at Amazon.com:
       <http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355>.

Support

       Marpa::R2 comes without warranty.  Support is provided on a volunteer basis through the
       standard mechanisms for CPAN modules.  The Support document has details.

Copyright and License

         Copyright 2014 Jeffrey Kegler
         This file is part of Marpa::R2.  Marpa::R2 is free software: you can
         redistribute it and/or modify it under the terms of the GNU Lesser
         General Public License as published by the Free Software Foundation,
         either version 3 of the License, or (at your option) any later version.

         Marpa::R2 is distributed in the hope that it will be useful,
         but WITHOUT ANY WARRANTY; without even the implied warranty of
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
         Lesser General Public License for more details.

         You should have received a copy of the GNU Lesser
         General Public License along with Marpa::R2.  If not, see
         http://www.gnu.org/licenses/.