Provided by: tcllib_1.19-dfsg-2_all

NAME

       pt - Parser Tools Application

SYNOPSIS

       package require Tcl  8.5

       pt generate resultformat ?options...? resultfile inputformat inputfile

________________________________________________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that case please read the overview
       provided by the Introduction to Parser Tools. That document is the entry point to the whole system the
       current package is a part of.

       This document describes pt, the main application of the module, a parser generator. Its intended audience
       is people who wish to create a parser for some language of theirs. Should you wish to modify the
       application instead, please see the section about the application's Internals for the basic references.

       It resides in the User Application Layer of Parser Tools.

       IMAGE: arch_user_app

COMMAND LINE

       pt generate resultformat ?options...? resultfile inputformat inputfile
              This  sub-command  of the application reads the parsing expression grammar stored in the inputfile
              in the format inputformat, converts it to the resultformat under the  direction  of  the  (format-
              specific) set of options specified by the user and stores the result in the resultfile.

              The  inputfile  has  to  exist,  while the resultfile may be created, overwriting any pre-existing
              content of the file. Any missing directory in the path to the resultfile will be created as well.

              The exact form of the result for each known result format, and the set of options it supports,
              are explained in the upcoming sections of this document. The list below provides an index
              mapping each format name to its associated section. In alphabetical order:

              c      A resultformat. See section C Parser.

              container
                     A resultformat. See section Grammar Container.

              critcl A resultformat. See section C Parser Embedded In Tcl.

              json   An input- and resultformat. See section JSON Grammar Exchange.

              oo     A resultformat. See section TclOO Parser.

              peg    An input- and resultformat. See section PEG Specification Language.

              snit   A resultformat. See section Snit Parser.

       Of the seven possible results four are parsers outright  (c,  critcl,  oo,  and  snit),  one  (container)
       provides  code  which  can  be  used  in  conjunction  with  a  generic  parser  (also known as a grammar
       interpreter), and the last two (json and peg) are  doing  double-duty  as  input  formats,  allowing  the
       transformation of grammars for exchange, reformatting, and the like.
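
       As a rough sketch of how the formats relate in practice, the following Tcl script runs pt once per
       result format over the same grammar. It assumes that the pt application is installed and reachable
       through the PATH, and that a grammar file "calculator.peg" exists in the current directory. The
       result file names are made up for this illustration.

```tcl
# Sketch only: generate several results from one PEG grammar.
# Assumptions: 'pt' is on the PATH, calculator.peg exists here,
# and the result file names below are free to be overwritten.
foreach {format resultfile} {
    snit calculator_snit.tcl
    oo   calculator_oo.tcl
    json calculator.json
} {
    # Argument order follows the synopsis:
    #   pt generate resultformat ?options...? resultfile inputformat inputfile
    exec pt generate $format -name calculator $resultfile peg calculator.peg
}
```

       The json result can later serve as input again, for example with inputformat json, because json is
       one of the two formats doing double-duty.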

       The created parsers fall into three categories:

                             + --- C ---> critcl, c
                             |
          + --- specialized -+
          |                  |
       ---+                  + --- Tcl -> snit, oo
          |
          + --- interpreted (Tcl) ------> container

       Specialized parsers implemented in C
              The fastest parsers are created when using the result formats c and critcl. The first returns  the
              raw C code for the parser, while the latter wraps it into a Tcl package using CriTcl.

              This  makes  the  latter  much easier to use than the former. On the other hand, the former can be
              adapted to the users' requirements through a multitude of options, allowing for things like  usage
              of  the  parser outside of a Tcl environment, something the critcl format doesn't support. As such
              the c format is meant for more advanced users, or users with special needs.

              A disadvantage of all the parsers in this section is the need to run them through a C compiler  to
              make  them  actually  executable.  This is not something everyone has the necessary tools for. The
              parsers in the next section are for people under such restrictions.

       Specialized parsers implemented in Tcl
              As the parsers in this section are implemented in Tcl they are quite a bit slower than anything
              from the previous section. On the other hand this allows them to be used in pure-Tcl environments,
              or in environments which allow only a limited set of binary packages. In the latter case it will
              be advantageous to lobby for the inclusion of the C-based runtime support (see the notes below)
              into the environment, to reduce the impact of Tcl on the speed of these parsers.

              The relevant formats are snit and oo. Both place their result into  a  Tcl  package  containing  a
              snit::type, or TclOO class respectively.

              Of  the supporting runtime, which is the package pt::rde, the user has to know nothing but that it
              does exist and that the parsers are dependent on it. Knowledge of the API exported by the  runtime
              for the parsers' consumption is not required by the parsers' users.

       Interpreted parsing implemented in Tcl
              The last category is grammar interpretation. This means that an interpreter for parsing expression
              grammars takes the description of the grammar to parse input for, and uses it to guide the parsing
              process. This is the slowest of the available options, as the interpreter has to continually run
              through the configured grammar, whereas the specialized parsers of the previous sections have the
              relevant knowledge about the grammar baked into them.

              The only place where using interpretation makes sense is where the grammar for some input may be
              changed interactively by the user, as interpretation allows for quick turnaround after each
              change, whereas the previous methods require the generation of a whole new parser, which is not as
              fast. On the other hand, wherever the grammar to use is fixed, the previous methods are much more
              advantageous, as the time to generate the parser is minuscule compared to the time the parser code
              is in use.

              The relevant result format is container.  It (quickly) generates grammar descriptions (instead  of
              a  full  parser)  which match the API expected by ParserTools' grammar interpreter.  The latter is
              provided by the package pt::peg::interp.

       All the parsers generated by critcl, snit, and oo, and the grammar interpreter share  a  common  API  for
       access  to  the  actual  parsing  functionality, making them all plug-compatible.  It is described in the
       Parser API specification document.

PEG SPECIFICATION LANGUAGE

       peg, a language for the specification of parsing expression grammars, is meant to be human readable and
       writable, yet strict enough to allow its processing by machine, like any computer language. It
       was defined to make writing the specification of a grammar easy, something the other formats found in the
       Parser Tools do not lend themselves to.

       For  either  an  introduction  to or the formal specification of the language, please go and read the PEG
       Language Tutorial.

       When used as a result-format this format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -template string
              The  value  of  this option is a string into which to put the generated text and the values of the
              other options. The various  locations  for  user-data  are  expected  to  be  specified  with  the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant PEG.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated text.
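
       To make the effect of -template more concrete, the sketch below mimics the placeholder substitution
       with plain Tcl. It is an illustration only and not the code pt itself uses. The procedure name
       fillTemplate, the template text, and all the values are hypothetical.

```tcl
# Illustration only: mimics how the placeholders of a -template value
# are filled in. This is NOT the implementation used by pt.
proc fillTemplate {template values} {
    # values: dictionary mapping placeholder names (without the
    # surrounding @ characters) to their replacement text.
    set map {}
    dict for {key value} $values {
        lappend map @$key@ $value
    }
    return [string map $map $template]
}

# Hypothetical template: a header comment followed by the generated text.
set template "# Grammar @name@ (@format@), from @file@, for @user@\n@code@"

puts [fillTemplate $template {
    user   alice
    format PEG
    file   calculator.peg
    name   calculator
    code   "PEG calculator (Expression) ... END;"
}]
```

       With the default template "@code@" the result consists of the generated text alone, as no other
       placeholder occurs in it.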

JSON GRAMMAR EXCHANGE

       The  json  format for parsing expression grammars was written as a data exchange format not bound to Tcl.
       It was defined to allow the exchange of grammars with  PackRat/PEG  based  parser  generators  for  other
       languages.

       For  the  formal  specification  of the JSON grammar exchange format, please go and read The JSON Grammar
       Exchange Format.

       When used as a result-format this format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -indented boolean
              If  this  option  is  set  the  system  will  break  the generated JSON across lines and indent it
              according to its inner structure, with each key of a dictionary on a separate line.

              If the option is not set (the default), the whole JSON object will be written on  a  single  line,
              with minimum spacing between all elements.

       -aligned boolean
              If  this  option  is  set  the system will ensure that the values for the keys in a dictionary are
              vertically aligned with each other, for a nice table effect.  To make this work this also  implies
              that -indented is set.

              If the option is not set (the default), the output is formatted as per the value of -indented,
              without trying to align the values for dictionary keys.

C PARSER EMBEDDED IN TCL

       The critcl format is executable code, a parser for the grammar. It is  a  Tcl  package  with  the  actual
       parser implementation written in C and embedded in Tcl via the critcl package.

       This result-format supports the following options:

       -file string
              The  value of this option is the name of the file or other entity from which the grammar came, for
              which the command is run. The default value is unknown.

       -name string
              The value of this option is the name of the grammar we  are  processing.   The  default  value  is
              a_pe_grammar.

       -user string
              The  value  of this option is the name of the user for which the command is run. The default value
              is unknown.

       -class string
              The value of this option is the name of the  class  to  generate,  without  leading  colons.   The
              default value is CLASS.

              For  a  simple  value X without colons, like CLASS, the parser command will be X::X. Whereas for a
              namespaced value X::Y the parser command will be X::Y.

       -package string
              The value of this option is the name of the package to generate.  The default value is PACKAGE.

       -version string
              The value of this option is the version of the package to generate.  The default value is 1.

C PARSER

       The c format is executable code, a parser for the grammar. The parser implementation is written in C  and
       can be tweaked to the users' needs through a multitude of options.

       The  critcl  format, for example, is implemented as a canned configuration of these options on top of the
       generator for c.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -template string
              The  value  of  this  option  is  a  string  into  which  to  put the generated text and the other
              configuration settings. The various locations for user-data are expected to be specified with  the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant C/PARAM.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated C code.

              The following placeholders are special, in that they will occur within the generated code, and
              are replaced there as well.

              @statedecl@
                     To be replaced with the value of the option -state-decl.

              @stateref@
                     To be replaced with the value of the option -state-ref.

              @strings@
                     To be replaced with the value of the option -string-varname.

              @self@ To be replaced with the value of the option -self-command.

              @def@  To be replaced with the value of the option -fun-qualifier.

              @ns@   To be replaced with the value of the option -namespace.

              @main@ To be replaced with the value of the option -main.

              @prelude@
                     To be replaced with the value of the option -prelude.

       -state-decl string
              A C string representing the argument declaration to use in  the  generated  parsing  functions  to
              refer  to  the  parsing state. In essence type and argument name.  The default value is the string
              RDE_PARAM p.

       -state-ref string
              A C string representing the argument name used in the generated parsing functions to refer to the
              parsing state.  The default value is the string p.

       -self-command string
              A C string representing the reference needed to call the generated parser functions (methods, ...)
              from another parser function, per the chosen framework (template).  The default value is the empty
              string.

       -fun-qualifier string
              A  C  string  containing  the attributes to give to the generated functions (methods ...), per the
              chosen framework (template).  The default value is static.

       -namespace string
              The name of the C namespace the parser functions (methods, ...) shall  reside  in,  or  a  general
              prefix to add to the function names.  The default value is the empty string.

       -main string
              The  name  of  the  main function (method, ...) to be called by the chosen framework (template) to
              start parsing input.  The default value is __main.

       -string-varname string
              The name of the variable used for the table of strings used by the generated  parser,  i.e.  error
              messages, symbol names, etc.  The default value is p_string.

       -prelude string
              A  snippet  of  code  to  be inserted at the head of each generated parsing function.  The default
              value is the empty string.

       -indent integer
              The number of characters to indent each line of the generated code by.  The default value is 0.

       -comments boolean
              A flag controlling the generation of code comments containing the original  parsing  expression  a
              parsing function is for.  The default value is on.

SNIT PARSER

       The  snit  format is executable code, a parser for the grammar. It is a Tcl package holding a snit::type,
       i.e. a class, whose instances are parsers for the input grammar.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -class string
              The  value  of  this option is the name of the class to generate, without leading colons. Note, it
              serves double-duty as the name of  the  package  to  generate  too,  if  option  -package  is  not
              specified,  see below.  The default value is CLASS, applying if neither option -class nor -package
              were specified.

       -package string
              The value of this option is the name of the package to generate, without leading colons. Note,  it
              serves  double-duty  as  the name of the class to generate too, if option -class is not specified,
              see above.  The default value is PACKAGE, applying if neither  option  -package  nor  -class  were
              specified.

       -version string
              The value of this option is the version of the package to generate.  The default value is 1.

TCLOO PARSER

       The  oo  format  is executable code, a parser for the grammar. It is a Tcl package holding a TclOO class,
       whose instances are parsers for the input grammar.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -class string
              The  value  of  this option is the name of the class to generate, without leading colons. Note, it
              serves double-duty as the name of  the  package  to  generate  too,  if  option  -package  is  not
              specified,  see below.  The default value is CLASS, applying if neither option -class nor -package
              were specified.

       -package string
              The value of this option is the name of the package to generate, without leading colons. Note,  it
              serves  double-duty  as  the name of the class to generate too, if option -class is not specified,
              see above.  The default value is PACKAGE, applying if neither  option  -package  nor  -class  were
              specified.

       -version string
              The value of this option is the version of the package to generate.  The default value is 1.

GRAMMAR CONTAINER

       The container format is another form of describing parsing expression grammars. While data in this format
       is executable it does not constitute a parser for the grammar. It always has to be  used  in  conjunction
       with the package pt::peg::interp, a grammar interpreter.

       The  format  represents  grammars  by a snit::type, i.e. class, whose instances are API-compatible to the
       instances of the pt::peg::container package, and which are preloaded with the grammar in question.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -mode bulk|incremental
              The  value  of  this  option  controls  which  methods of pt::peg::container instances are used to
              specify the grammar, i.e. preload it into the container. There are two  legal  values,  as  listed
              below. The default is bulk.

              bulk   In  this mode the methods start, add, modes, and rules are used to specify the grammar in a
                     bulk manner, i.e. as a set of nonterminal symbols, and two dictionaries  mapping  from  the
                     symbols to their semantic modes and parsing expressions.

                     This mode is the default.

              incremental
                     In this mode the methods start, add, mode, and rule are used to specify the grammar
                     piecemeal, with each nonterminal having its own block of defining commands.

       -template string
              The value of this option is a  string  into  which  to  put  the  generated  code  and  the  other
              configuration  settings. The various locations for user-data are expected to be specified with the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant CONTAINER.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @mode@ To be replaced with the value of the option -mode.

              @code@ To be replaced with the generated code.

EXAMPLE

       In this section we work through a complete example, starting with a PEG grammar and ending with running
       the parser generated from it over some input, following the outline shown in the figure below:

       IMAGE: flow

       Our grammar, assumed to be stored in the file "calculator.peg", is

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       From this we create a snit-based parser via

              pt generate snit -class calculator -name calculator calculator.tcl peg calculator.peg

       which  leaves  us  with the parser package and class written to the file "calculator.tcl".  Assuming that
       this package is then properly installed in a place where Tcl can find it we can now use this class via  a
       script like

                  package require calculator

                  lassign $argv input
                  set channel [open $input r]

                  set parser [calculator]
                  set ast [$parser parse $channel]
                  $parser destroy
                  close $channel

                  # ... now process the returned abstract syntax tree ...

       where the abstract syntax tree stored in the variable ast will look like

              set ast {Expression 0 4
                  {Factor 0 4
                      {Term 0 2
                          {Number 0 2
                              {Digit 0 0}
                              {Digit 1 1}
                              {Digit 2 2}
                          }
                      }
                      {AddOp 3 3}
                      {Term 4 4
                          {Number 4 4
                              {Digit 4 4}
                          }
                      }
                  }
              }

       assuming that the input file and channel contained the text

              120+5

       A more graphical representation of the tree would be

                                                                  +- Digit 0 0 | 1
                                                                  |            |
                                      +- Term 0 2 --- Number 0 2 -+- Digit 1 1 | 2
                                      |                           |            |
                                      |                           +- Digit 2 2 | 0
                                      |                                        |
       Expression 0 4 --- Factor 0 4 -+----------------------------- AddOp 3 3 | +
                                      |                                        |
                                      +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5

       Regardless, at this point it is the user's responsibility to work with the tree to  reach  whatever  goal
       she  desires.  I.e.  analyze it, transform it, etc. The package pt::ast should be of help here, providing
       commands to walk such AST structures in various ways.

       One important thing to note is that the parsers used  here  return  a  data  structure  representing  the
       structure  of  the input per the grammar underlying the parser. There are no callbacks during the parsing
       process, i.e. no parsing actions, as most other parsers will have.

       Going back to the last snippet of code, the execution of the parser for some input, note how  the  parser
       instance follows the specified Parser API.
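
       As a sketch of what working with the returned tree could look like, the script below walks the AST
       shown above with plain Tcl, relying only on the node layout visible in that listing: a node type, the
       start and end offsets into the input, and then the child nodes. The procedure name describeAst is
       hypothetical, and the commands of the package pt::ast are deliberately not used here.

```tcl
# Sketch: describe every node of an AST together with the input text
# it covers. Node layout assumed: {type start end ?child ...?}
proc describeAst {ast input {depth 0}} {
    # lassign returns the list elements it did not assign: the children.
    set children [lassign $ast type start end]
    set lines [list [format "%s%s %d-%d '%s'" \
        [string repeat "  " $depth] $type $start $end \
        [string range $input $start $end]]]
    foreach child $children {
        lappend lines {*}[describeAst $child $input [expr {$depth + 1}]]
    }
    return $lines
}

# The input text and the tree as listed in this section.
set input "120+5"
set ast {Expression 0 4
    {Factor 0 4
        {Term 0 2
            {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}}
        {AddOp 3 3}
        {Term 4 4
            {Number 4 4 {Digit 4 4}}}}}

puts [join [describeAst $ast $input] \n]
```

       Each output line shows one node, indented by its depth in the tree, followed by its offsets and the
       part of the input it spans.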

INTERNALS

       This section is intended for users of the application who wish to modify or extend it. Users only
       interested in the generation of parsers can ignore it.

       The main functionality of the application is encapsulated in the package pt::pgen. Please read its
       documentation for more information.
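
       For orientation, a minimal sketch of calling the package directly from a Tcl script follows. It
       assumes that pt::pgen provides a command of the same name taking the input format, the grammar text,
       the result format, and then the format-specific options. Please check the documentation of pt::pgen
       before relying on this form. The file names are those of the EXAMPLE section.

```tcl
package require Tcl 8.5
package require pt::pgen   ;# part of tcllib

# Read the grammar text from the file used in the EXAMPLE section.
set chan [open calculator.peg r]
set grammar [read $chan]
close $chan

# Assumed call form: pt::pgen inputformat text resultformat ?options...?
set code [pt::pgen peg $grammar snit \
    -class calculator -name calculator -file calculator.peg]

# Store the generated parser package.
set chan [open calculator.tcl w]
puts -nonewline $chan $code
close $chan
```

       Under these assumptions the script is the in-process counterpart of the pt generate call shown in the
       EXAMPLE section.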

BUGS, IDEAS, FEEDBACK

       This  document,  and  the package it describes, will undoubtedly contain bugs and other problems.  Please
       report such in the category pt of the  Tcllib  Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please
       also report any ideas for enhancements you may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note  further  that  attachments  are strongly preferred over inlined patches. Attachments can be made by
       going to the Edit form of the ticket immediately after its creation, and then using the left-most  button
       in the secondary navigation bar.

KEYWORDS

       EBNF,   LL(k),  PEG,  TDPL,  context-free  languages,  expression,  grammar,  matching,  parser,  parsing
       expression, parsing expression grammar, push down automaton, recursive descent, state,  top-down  parsing
       languages, transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>