Provided by: peg_0.1.18-1_amd64 bug

NAME

       peg, leg - parser generators

SYNOPSIS

       peg [-hvV -ooutput] [filename ...]
       leg [-hvV -ooutput] [filename ...]

DESCRIPTION

       peg and leg are tools for generating recursive-descent parsers: programs that perform pattern matching on
       text.  They process a Parsing Expression Grammar (PEG) [Ford 2004] to produce a program  that  recognises
       legal sentences of that grammar.  peg processes PEGs written using the original syntax described by Ford;
       leg processes PEGs written using slightly different syntax and conventions that are intended to  make  it
       an  attractive  replacement  for parsers built with lex(1) and yacc(1).  Unlike lex and yacc, peg and leg
       support unlimited backtracking, provide ordered choice as a means for  disambiguation,  and  can  combine
       scanning (lexical analysis) and parsing (syntactic analysis) into a single activity.

       peg  reads the specified filenames, or standard input if no filenames are given, for a grammar describing
       the parser to generate.  peg then generates a C source file that defines a function  yyparse().   This  C
       source file can be included in, or compiled and then linked with, a client program.  Each time the client
       program calls yyparse() the parser consumes input text according to the parsing rules, starting from  the
       first  rule  in  the  grammar.   yyparse() returns non-zero if the input could be parsed according to the
       grammar; it returns zero if the input could not be parsed.

       The prefix 'yy' or 'YY' is prepended to all externally-visible symbols in the generated parser.  This  is
       intended  to  reduce  the  risk  of  namespace  pollution  in  client  programs.   (The choice of 'yy' is
       historical; see lex(1) and yacc(1), for example.)

OPTIONS

       peg and leg provide the following options:

       -h     prints a summary of available options and then exits.

       -ooutput
              writes the generated parser to the file output instead of the standard output.

       -P     suppresses #line directives in the output.

       -v     writes verbose information to standard error while working.

       -V     writes version information to standard error then exits.

A SIMPLE EXAMPLE

       The following peg input specifies a grammar with a single rule (called 'start') that  is  satisfied  when
       the input contains the string "username".

           start <- "username"

       (The  quotation  marks  are  not  part of the matched text; they serve to indicate a literal string to be
       matched.)  In other words, yyparse() in the generated C source will return  non-zero  only  if  the  next
       eight  characters  read  from  the input spell the word "username".  If the input contains anything else,
       yyparse() returns zero and no input will have been consumed.  (Subsequent calls to  yyparse()  will  also
       return  zero,  since  the  parser  is  effectively blocked looking for the string "username".)  To ensure
       progress we can add an alternative clause to the 'start' rule that will match  any  single  character  if
       "username" is not found.

           start <- "username"
                  / .

       yyparse()  now  always returns non-zero (except at the very end of the input).  To do something useful we
       can add actions to the rules.  These actions are performed after a complete match is found (starting from
       the  first  rule)  and  are  chosen according to the 'path' taken through the grammar to match the input.
       (Linguists would call this path a 'phrase marker'.)

           start <- "username"    { printf("%s\n", getlogin()); }
                  / < . >         { putchar(yytext[0]); }

       The first line instructs the parser to print the user's login name whenever it  sees  "username"  in  the
       input.  If that match fails, the second line tells the parser to echo the next character on the input the
       standard output.  Our parser is now performing useful work:  it  will  copy  the  input  to  the  output,
       replacing all occurrences of "username" with the user's account name.

       Note the angle brackets ('<' and '>') that were added to the second alternative.  These have no effect on
       the meaning of the rule, but serve to delimit the text made available to  the  following  action  in  the
       variable yytext.

       If the above grammar is placed in the file username.peg, running the command

           peg -o username.c username.peg

       will  save  the  corresponding  parser  in the file username.c.  To create a complete program this parser
       could be included by a C program as follows.

           #include <stdio.h>      /* printf(), putchar() */
           #include <unistd.h>     /* getlogin() */

           #include "username.c"   /* yyparse() */

           int main()
           {
             while (yyparse())     /* repeat until EOF */
               ;
             return 0;
           }

PEG GRAMMARS

       A grammar consists of a set of named rules.

           name <- pattern

       The pattern contains one or more of the following elements.

       name   The element stands for the entire pattern in the rule with the given name.

       "characters"
              A character or string enclosed in double quotes is matched literally.  The ANSI C escape sequences
              are recognised within the characters.

       'characters'
              A character or string enclosed in single quotes is matched literally, as above.

       [characters]
              A  set  of  characters enclosed in square brackets matches any single character from the set, with
              escape characters recognised as above.  If the set begins with an uparrow  (^)  then  the  set  is
              negated (the element matches any character not in the set).  Any pair of characters separated with
              a dash (-) represents the range of characters from the first to the second, inclusive.   A  single
              alphabetic character or underscore is matched by the following set.

                  [a-zA-Z_]

              Similarly, the following matches  any single non-digit character.

                  [^0-9]

       .      A  dot  matches  any  character.   Note that the only time this fails is at the end of file, where
              there is no character to match.

       ( pattern )
              Parentheses are used for grouping (modifying the precedence of the operators described below).

       { action }
              Curly braces surround actions.  The action is arbitrary C source code to be executed at the end of
              matching.   Any braces within the action must be properly nested.  Any input text that was matched
              before the action and delimited by angle brackets (see below) is made available within the  action
              as  the contents of the character array yytext.  The length of (number of characters in) yytext is
              available in the variable yyleng.  (These variable names are historical; see lex(1).)

       <      An opening angle bracket always matches (consuming no  input)  and  causes  the  parser  to  begin
              accumulating matched text.  This text will be made available to actions in the variable yytext.

       >      A  closing  angle  bracket  always  matches  (consuming  no  input)  and causes the parser to stop
              accumulating text for yytext.

       The above elements can be made optional and/or repeatable with the following suffixes:

       element ?
              The element is optional.  If present on the input, it is consumed and the match succeeds.  If  not
              present on the input, no text is consumed and the match succeeds anyway.

       element +
              The  element  is  repeatable.   If  present  on  the input, one or more occurrences of element are
              consumed and the match succeeds.  If no occurrences of element are present on the input, the match
              fails.

       element *
              The  element  is  optional  and  repeatable.   If present on the input, one or more occurrences of
              element are consumed and the match succeeds.  If no occurrences of  element  are  present  on  the
              input, the match succeeds anyway.

       The  above  elements  and  suffixes can be converted into predicates (that match arbitrary input text and
       subsequently succeed or fail without consuming that input) with the following prefixes:

       & element
              The predicate succeeds only if element can be matched.  Input text scanned while matching  element
              is not consumed from the input and remains available for subsequent matching.

       ! element
              The  predicate  succeeds  only  if  element  cannot be matched.  Input text scanned while matching
              element is not consumed from the input and remains available for subsequent matching.   A  popular
              idiom is

                  !.

              which matches the end of file, after the last character of the input has already been consumed.

       A special form of the '&' predicate is provided:

       &{ expression }
              In this predicate the simple C expression (not statement) is evaluated immediately when the parser
              reaches the predicate.  If the expression yields non-zero (true)  the  'match'  succeeds  and  the
              parser  continues with the next element in the pattern.  If the expression yields zero (false) the
              'match' fails and the parser backs up to look for an alternative parse of the input.

       Several elements (with or without prefixes and suffixes) can be combined into a sequence by writing  them
       one after the other.  The entire sequence matches only if each individual element within it matches, from
       left to right.

       Sequences can be separated into disjoint alternatives by the alternation operator '/'.

       sequence-1 / sequence-2 / ... / sequence-N
              Each sequence is tried in turn until one of them matches, at which time matching for  the  overall
              pattern succeeds.  If none of the sequences matches then the match of the overall pattern fails.

       Finally, the pound sign (#) introduces a comment (discarded) that continues until the end of the line.

       To  summarise  the above, the parser tries to match the input text against a pattern containing literals,
       names (representing other rules), and various operators (written as prefixes, suffixes, juxtaposition for
       sequencing  and  and  infix  alternation  operator)  that  modify how the elements within the pattern are
       matched.  Matches are made from left to right, 'descending' into named sub-rules as they are encountered.
       If  the  matching  process  fails,  the  parser 'back tracks' ('rewinding' the input appropriately in the
       process) to find the nearest alternative 'path' through the grammar.  In other words the parser  performs
       a  depth-first,  left-to-right  search  for  the  first successfully-matching path through the rules.  If
       found, the actions along the successful path are executed (in the order they were encountered).

       Note that predicates are evaluated immediately during the search  for  a  successful  match,  since  they
       contribute  to  the  success  or  failure  of  the  search.  Actions, however, are evaluated only after a
       successful match has been found.

PEG GRAMMAR FOR PEG GRAMMARS

       The grammar for peg grammars is  shown  below.   This  will  both  illustrate  and  formalise  the  above
       description.

           Grammar         <- Spacing Definition+ EndOfFile

           Definition      <- Identifier LEFTARROW Expression
           Expression      <- Sequence ( SLASH Sequence )*
           Sequence        <- Prefix*
           Prefix          <- AND Action
                            / ( AND | NOT )? Suffix
           Suffix          <- Primary ( QUERY / STAR / PLUS )?
           Primary         <- Identifier !LEFTARROW
                            / OPEN Expression CLOSE
                            / Literal
                            / Class
                            / DOT
                            / Action
                            / BEGIN
                            / END

           Identifier      <- < IdentStart IdentCont* > Spacing
           IdentStart      <- [a-zA-Z_]
           IdentCont       <- IdentStart / [0-9]
           Literal         <- ['] < ( !['] Char  )* > ['] Spacing
                            / ["] < ( !["] Char  )* > ["] Spacing
           Class           <- '[' < ( !']' Range )* > ']' Spacing
           Range           <- Char '-' Char / Char
           Char            <- '\\' [abefnrtv'"\[\]\\]
                            / '\\' [0-3][0-7][0-7]
                            / '\\' [0-7][0-7]?
                            / '\\' '-'
                            / !'\\' .
           LEFTARROW       <- '<-' Spacing
           SLASH           <- '/' Spacing
           AND             <- '&' Spacing
           NOT             <- '!' Spacing
           QUERY           <- '?' Spacing
           STAR            <- '*' Spacing
           PLUS            <- '+' Spacing
           OPEN            <- '(' Spacing
           CLOSE           <- ')' Spacing
           DOT             <- '.' Spacing
           Spacing         <- ( Space / Comment )*
           Comment         <- '#' ( !EndOfLine . )* EndOfLine
           Space           <- ' ' / '\t' / EndOfLine
           EndOfLine       <- '\r\n' / '\n' / '\r'
           EndOfFile       <- !.
           Action          <- '{' < [^}]* > '}' Spacing
           BEGIN           <- '<' Spacing
           END             <- '>' Spacing

LEG GRAMMARS

       leg  is  a  variant  of  peg  that  adds some features of lex(1) and yacc(1).  It differs from peg in the
       following ways.

       %{ text... %}
              A declaration section can appear anywhere that a rule definition is expected.   The  text  between
              the  delimiters  '%{'  and  '%}' is copied verbatim to the generated C parser code before the code
              that implements the parser itself.

       name = pattern
              The 'assignment' operator replaces the left arrow operator '<-'.

       rule-name
              Hyphens can appear as letters in the names of rules.  Each hyphen is converted into an  underscore
              in the generated C source code.  A single hyphen '-' is a legal rule name.

                  -       = [ \t\n\r]*
                  number  = [0-9]+                 -
                  name    = [a-zA-Z_][a-zA_Z_0-9]* -
                  l-paren = '('                    -
                  r-paren = ')'                    -

              This  example  shows  how  ignored  whitespace  can  be  obvious  when reading the grammar and yet
              unobtrusive when placed liberally at the end of every rule associated with a lexical element.

       seq-1 | seq-2
              The alternation operator is vertical bar '|' rather than forward slash '/'.  The peg rule

                  name <- sequence-1
                        / sequence-2
                        / sequence-3

              is therefore written

                  name = sequence-1
                       | sequence-2
                       | sequence-3
                       ;

              in leg (with the final semicolon being optional, as described next).

       @{ action }
              Actions prefixed with an 'at' symbol will be performed  during  parsing,  at  the  time  they  are
              encountered  while  matching  the  input  text  with  a rule.  Because of back-tracking in the PEG
              parsing algorithm, actions prefixed with '@' might be performed multiple times for the same  input
              text.   (The  usual  behviour of actions is that they are saved up until matching is complete, and
              then those that are part of the final derivation  are  performed  in  left-to-right  order.)   The
              variable yytext is available within these actions.

       exp ~ { action }
              A  postfix  operator  ~{ action } can be placed after any expression and will behave like a normal
              action (arbitrary C code) except that it is invoked only when exp fails.  It  binds  less  tightly
              than  any other operator except alternation and sequencing, and is intended to make error handling
              and recovery code easier to write.  Note that yytext and yyleng are  not  available  inside  these
              actions,  but  the  pointer  variable  yy is available to give the code access to any user-defined
              members of the parser state (see "CUSTOMISING THE PARSER" below).  Note also that exp is always  a
              single  expression;  to invoke an error action for any failure within a sequence, parentheses must
              be used to group the sequence into a single expression.

                  rule = e1 e2 e3 ~{ error("e[12] ok; e3 has failed"); }
                       | ...

                  rule = (e1 e2 e3) ~{ error("one of e[123] has failed"); }
                       | ...

       pattern ;
              A semicolon punctuator can optionally terminate a pattern.

       %% text...
              A double percent '%%' terminates the rules (and declarations) section of the  grammar.   All  text
              following  '%%'  is copied verbatim to the generated C parser code after the parser implementation
              code.

       $$ = value
              A sub-rule can return a semantic value from an action by assigning it to the pseudo-variable '$$'.
              All  semantic  values must have the same type (which defaults to 'int').  This type can be changed
              by defining YYSTYPE in a declaration section.

       identifier:name
              The semantic value returned (by assigning to '$$') from the sub-rule name is associated  with  the
              identifier and can be referred to in subsequent actions.

       The desk calculator example below illustrates the use of '$$' and ':'.

LEG EXAMPLE: A DESK CALCULATOR

       The  extensions  in  leg  described  above  allow  useful parsers and evaluators (including declarations,
       grammar rules, and supporting C functions such as 'main') to be kept within a  single  source  file.   To
       illustrate  this  we  show  a  simple desk calculator supporting the four common arithmetic operators and
       named variables.  The intermediate results of arithmetic evaluation will be accumulated  on  an  implicit
       stack by returning them as semantic values from sub-rules.

           %{
           #include <stdio.h>     /* printf() */
           #include <stdlib.h>    /* atoi() */
           int vars[26];
           %}

           Stmt    = - e:Expr EOL                  { printf("%d\n", e); }
                   | ( !EOL . )* EOL               { printf("error\n"); }

           Expr    = i:ID ASSIGN s:Sum             { $$ = vars[i] = s; }
                   | s:Sum                         { $$ = s; }

           Sum     = l:Product
                           ( PLUS  r:Product       { l += r; }
                           | MINUS r:Product       { l -= r; }
                           )*                      { $$ = l; }

           Product = l:Value
                           ( TIMES  r:Value        { l *= r; }
                           | DIVIDE r:Value        { l /= r; }
                           )*                      { $$ = l; }

           Value   = i:NUMBER                      { $$ = atoi(yytext); }
                   | i:ID !ASSIGN                  { $$ = vars[i]; }
                   | OPEN i:Expr CLOSE             { $$ = i; }

           NUMBER  = < [0-9]+ >    -               { $$ = atoi(yytext); }
           ID      = < [a-z]  >    -               { $$ = yytext[0] - 'a'; }
           ASSIGN  = '='           -
           PLUS    = '+'           -
           MINUS   = '-'           -
           TIMES   = '*'           -
           DIVIDE  = '/'           -
           OPEN    = '('           -
           CLOSE   = ')'           -

           -       = [ \t]*
           EOL     = '\n' | '\r\n' | '\r' | ';'

           %%

           int main()
           {
             while (yyparse())
               ;
             return 0;
           }

LEG GRAMMAR FOR LEG GRAMMARS

       The  grammar  for  leg  grammars  is  shown  below.   This  will  both illustrate and formalise the above
       description.

           grammar =       -
                           ( declaration | definition )+
                           trailer? end-of-file

           declaration =   '%{' < ( !'%}' . )* > RPERCENT

           trailer =       '%%' < .* >

           definition =    identifier EQUAL expression SEMICOLON?

           expression =    sequence ( BAR sequence )*

           sequence =      error+

           error =         prefix ( TILDE action )?

           prefix =        AND action
           |               ( AND | NOT )? suffix

           suffix =        primary ( QUERY | STAR | PLUS )?

           primary =       identifier COLON identifier !EQUAL
           |               identifier !EQUAL
           |               OPEN expression CLOSE
           |               literal
           |               class
           |               DOT
           |               action
           |               BEGIN
           |               END

           identifier =    < [-a-zA-Z_][-a-zA-Z_0-9]* > -

           literal =       ['] < ( !['] char )* > ['] -
           |               ["] < ( !["] char )* > ["] -

           class =         '[' < ( !']' range )* > ']' -

           range =         char '-' char | char

           char =          '\\' [abefnrtv'"\[\]\\]
           |               '\\' [0-3][0-7][0-7]
           |               '\\' [0-7][0-7]?
           |               !'\\' .

           action =        '{' < braces* > '}' -

           braces =        '{' braces* '}'
           |               !'}' .

           EQUAL =         '=' -
           COLON =         ':' -
           SEMICOLON =     ';' -
           BAR =           '|' -
           AND =           '&' -
           NOT =           '!' -
           QUERY =         '?' -
           STAR =          '*' -
           PLUS =          '+' -
           OPEN =          '(' -
           CLOSE =         ')' -
           DOT =           '.' -
           BEGIN =         '<' -
           END =           '>' -
           TILDE =         '~' -
           RPERCENT =      '%}' -

           - =             ( space | comment )*
           space =         ' ' | '\t' | end-of-line
           comment =       '#' ( !end-of-line . )* end-of-line
           end-of-line =   '\r\n' | '\n' | '\r'
           end-of-file =   !.

CUSTOMISING THE PARSER

       The following symbols can be redefined in declaration sections to modify the generated parser code.

       YYSTYPE
              The semantic value type.  The pseudo-variable '$$' and the identifiers  'bound'  to  rule  results
              with  the  colon  operator  ':' should all be considered as being declared to have this type.  The
              default value is 'int'.

       YYPARSE
              The name of the main entry point to the parser.  The default value is 'yyparse'.

       YYPARSEFROM
              The name of an alternative entry point to the parser.  This function  expects  one  argument:  the
              function corresponding to the rule from which the search for a match should begin.  The default is
              'yyparsefrom'.  Note that yyparse() is defined as

                  int yyparse() { return yyparsefrom(yy_foo); }

              where 'foo' is the name of the first rule in the grammar.

       YY_INPUT(buf, result, max_size)
              This macro is invoked by the parser to obtain more input text.  buf points to an  area  of  memory
              that  can  hold  at  most  max_size  characters.  The macro should copy input text to buf and then
              assign the integer variable result to indicate the number of characters copied.  If no more  input
              is  available,  the macro should assign 0 to result.  By default, the YY_INPUT macro is defined as
              follows.

                  #define YY_INPUT(buf, result, max_size)        \
                  {                                              \
                    int yyc= getchar();                          \
                    result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1); \
                  }

              Note that if YY_CTX_LOCAL is defined (see below) then an additional first argument, containing the
              parser context, is passed to YY_INPUT.

       YY_DEBUG
              If  this  symbols  is defined then additional code will be included in the parser that prints vast
              quantities of arcane information to the standard error while the parser is running.

       YY_BEGIN
              This macro is invoked to mark the start of input text that will be made available  in  actions  as
              'yytext'.   This  corresponds  to  occurrences  of  '<'  in the grammar.  These are converted into
              predicates that are expected to succeed.  The default definition

                  #define YY_BEGIN (yybegin= yypos, 1)

              therefore saves the current input position and returns 1 ('true') as the result of the predicate.

       YY_END This macros corresponds to '>' in the grammar.  Again, it is a predicate so the default definition
              saves the input position before 'succeeding'.

                  #define YY_END (yyend= yypos, 1)

       YY_PARSE(T)
              This  macro  declares  the  parser  entry  points  (yyparse and yyparsefrom) to be of type T.  The
              default definition

                  #define YY_PARSE(T) T

              leaves yyparse() and yyparsefrom() with global visibility.   If  they  should  not  be  externally
              visible in other source files, this macro can be redefined to declare them 'static'.

                  #define YY_PARSE(T) static T

       YY_CTX_LOCAL
              If  this  symbol is defined during compilation of a generated parser then global parser state will
              be kept in a structure of type 'yycontext' which can be declared as a local variable.  This allows
              multiple  instances  of  parsers to coexist and to be thread-safe.  The parsing function yyparse()
              will be declared to expect a first argument of type 'yycontext *', an instance  of  the  structure
              holding  the global state for the parser.  This instance must be allocated and initialised to zero
              by the client.  A trivial but complete example is as follows.

                  #include <stdio.h>

                  #define YY_CTX_LOCAL

                  #include "the-generated-parser.peg.c"

                  int main()
                  {
                    yycontext ctx;
                    memset(&ctx, 0, sizeof(yycontext));
                    while (yyparse(&ctx));
                    return 0;
                  }

              Note that if this symbol is undefined then the compiled parser will statically allocate its global
              state  and  will  be  neither  reentrant  nor  thread-safe.   Note  also that the parser yycontext
              structure is initialised automatically the first time yyparse() is  called;  this  structure  must
              therefore be properly initialised to zero before the first call to yyparse().

       YY_CTX_MEMBERS
              If  YY_CTX_LOCAL  is defined (see above) then the macro YY_CTX_MEMBERS can be defined to expand to
              any additional member field declarations that the client would like included in the declaration of
              the  'yycontext'  structure type.  These additional members are otherwise ignored by the generated
              parser.  The instance of 'yycontext' associated with  the  currently-active  parser  is  available
              within actions as the pointer variable yy.

       YY_BUFFER_SIZE
              The initial size of the text buffer, in bytes.  The default is 1024 and the buffer size is doubled
              whenever required to meet demand during parsing.  An application that typically parses much longer
              strings could increase this to avoid unnecessary buffer reallocation.

       YY_STACK_SIZE
              The initial size of the variable and action stacks.  The default is 128, which is doubled whenever
              required to meet demand during parsing.  Applications that have deep call stacks with  many  local
              variables,  or  that  perform many actions after a single successful match, could increase this to
              avoid unnecessary buffer reallocation.

       YY_MALLOC(YY, SIZE)
              The memory allocator for all parser-related storage.  The parameters  are  the  current  yycontext
              structure and the number of bytes to allocate.  The default definition is: malloc(SIZE)

       YY_REALLOC(YY, PTR, SIZE)
              The  memory  reallocator for dynamically-grown storage (such as text buffers and variable stacks).
              The parameters are the current yycontext structure,  the  previously-allocated  storage,  and  the
              number   of   bytes   to  which  that  storage  should  be  grown.   The  default  definition  is:
              realloc(PTR, SIZE)

       YY_FREE(YY, PTR)
              The memory deallocator.  The parameters are the current yycontext structure  and  the  storage  to
              deallocate.  The default definition is: free(PTR)

       YYRELEASE
              The  name  of the function that releases all resources held by a yycontext structure.  The default
              value is 'yyrelease'.

       The following variables can be referred to within actions.

       char *yybuf
              This variable points to the parser's input buffer used to store input text that has not  yet  been
              matched.

       int yypos
              This is the offset (in yybuf) of the next character to be matched and consumed.

       char *yytext
              The most recent matched text delimited by '<' and '>' is stored in this variable.

       int yyleng
              This variable indicates the number of characters in 'yytext'.

       yycontext *yy
              This variable points to the instance of 'yycontext' associated with the currently-active parser.

       Programs that wish to release all the resources associated with a parser can use the following function.

       yyrelease(yycontext*yy)
              Returns  all  parser-allocated  storage  associated  with  yy  to the system.  The storage will be
              reallocated on the next call to yyparse().

       Note that the storage for the yycontext structure itself is never allocated or reclaimed implicitly.  The
       application  must  allocate  these  structures in automatic storage, or use calloc() and free() to manage
       them explicitly.  The example in the following section demonstrates one approach to resource management.

LEG EXAMPLE: EXTENDING THE PARSER'S CONTEXT

       The yy variable passed to actions contains the state of the parser plus any additional fields defined  by
       YY_CTX_MEMBERS.   Theses fields can be used to store application-specific information that is global to a
       particular call of yyparse().  A trivial  but  complete  leg  example  follows  in  which  the  yycontext
       structure  is  extended  with  a  count of the number of newline characters seen in the input so far (the
       grammar otherwise consumes and ignores the entire input).  The caller of yyparse() uses  count  to  print
       the number of lines of input that were read.

           %{
           #define YY_CTX_LOCAL 1
           #define YY_CTX_MEMBERS \
             int count;
           %}

           Char    = ('\n' | '\r\n' | '\r')        { yy->count++ }
                   | .

           %%

           #include <stdio.h>
           #include <string.h>

           int main()
           {
               /* create a local parser context in automatic storage */
               yycontext yy;
               /* the context *must* be initialised to zero before first use*/
               memset(&yy, 0, sizeof(yy));

               while (yyparse(&yy))
                   ;
               printf("%d newlines\n", yy.count);

               /* release all resources associated with the context */
               yyrelease(&yy);

               return 0;
           }

DIAGNOSTICS

       peg and leg warn about the following conditions while converting a grammar into a parser.

       syntax error
              The  input grammar was malformed in some way.  The error message will include the text about to be
              matched (often backed up a huge amount from the actual location of the error) and the line  number
              of the most recently considered character (which is often the real location of the problem).

       rule 'foo' used but not defined
              The  grammar referred to a rule named 'foo' but no definition for it was given.  Attempting to use
              the generated parser will likely result in  errors  from  the  linker  due  to  undefined  symbols
              associated with the missing rule.

       rule 'foo' defined but not used
              The  grammar defined a rule named 'foo' and then ignored it.  The code associated with the rule is
              included in the generated parser which will in all other respects be healthy.

       possible infinite left recursion in rule 'foo'
              There exists at least one path through the grammar that leads from  the  rule  'foo'  back  to  (a
              recursive invocation of) the same rule without consuming any input.

       Left  recursion,  especially  that  found  in  standards documents, is often 'direct' and implies trivial
       repetition.

           # (6.7.6)
           direct-abstract-declarator =
               LPAREN abstract-declarator RPAREN
           |   direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
           |   direct-abstract-declarator? LBRACKET STAR RBRACKET
           |   direct-abstract-declarator? LPAREN param-type-list? RPAREN

       The recursion can easily be eliminated by converting the parts of the  pattern  following  the  recursion
       into a repeatable suffix.

           # (6.7.6)
           direct-abstract-declarator =
               direct-abstract-declarator-head?
               direct-abstract-declarator-tail*

           direct-abstract-declarator-head =
               LPAREN abstract-declarator RPAREN

           direct-abstract-declarator-tail =
               LBRACKET assign-expr? RBRACKET
           |   LBRACKET STAR RBRACKET
           |   LPAREN param-type-list? RPAREN

CAVEATS

       A parser that accepts empty input will always succeed.  Consider the following example, not atypical of a
       first attempt to write a PEG-based parser:

           Program = Expression*
           Expression = "whatever"
           %%
           int main() {
             while (yyparse())
               puts("success!");
             return 0;
           }

       This program loops forever, no matter what (if any) input is provided on stdin.  Many fixes are possible,
       the  easiest  being  to  insist that the parser always consumes some non-empty input.  Changing the first
       line to

           Program = Expression+

       accomplishes this.  If the parser is expected to consume the entire input, then explicitly requiring  the
       end-of-file is also highly recommended:

           Program = Expression+ !.

       This  works  because  the  parser  will  only  fail  to  match  ("!" predicate) any character at all ("."
       expression) when it attempts to read beyond the end of the input.

BUGS

       You have to type 'man peg' to read the manual page for leg(1).

       The 'yy' and 'YY' prefixes cannot be changed.

       Left recursion is detected in the input grammar but is not handled correctly in the generated parser.

       Diagnostics for errors in the input grammar are obscure and not particularly helpful.

       The operators ! and ~ should really be named the other way around.

       Several commonly-used lex(1) features (yywrap(), yyin, etc.) are completely absent.

       The generated parser does not contain '#line' directives to direct C compiler errors back to the  grammar
       description when appropriate.

SEE ALSO

       D. Val Schorre, META II, a syntax-oriented compiler writing language, 19th ACM National Conference, 1964,
       pp. 41.301--41.311.  Describes a  self-implementing  parser  generator  for  analytic  grammars  with  no
       backtracking.

       Alexander  Birman,  The  TMG  Recognition  Schema,  Ph.D.  dissertation, Princeton, 1970.  A mathematical
       treatment of the power and complexity of recursive-descent parsing with backtracking.

       Bryan Ford, Parsing Expression Grammars: A Recognition-Based Syntactic Foundation, ACM SIGPLAN  Symposium
       on Principles of Programming Languages, 2004.  Defines PEGs and analyses them in relation to context-free
       and regular grammars.  Introduces the syntax adopted in peg.

       The standard Unix utilities lex(1) and yacc(1) which influenced the syntax and features of leg.

       The source code for peg and leg whose grammar parsers are written using themselves.

       The latest version of this software and documentation:

           http://piumarta.com/software/peg

AUTHOR

       peg, leg and this manual page were written by Ian  Piumarta  (first-name  at  last-name  dot  com)  while
       investigating  the  viability  of regular and parsing-expression grammars for efficiently extracting type
       and signature information from C header files.

       Please send bug reports and suggestions for improvements to the author at the above address.