\input texinfo   @c -*-texinfo-*-
@c %**start of header
@setfilename bovine.info
@set TITLE Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
This manual documents Bovine parser development in Semantic.

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 Eric M. Ludlam
Copyright @copyright{} 2001, 2002, 2003, 2004 David Ponce
Copyright @copyright{} 2002, 2003 Richard Y. Kim

@quotation
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being list their titles, with the Front-Cover Texts being list, and with the Back-Cover Texts being list. A copy of the license is included in the section entitled ``GNU Free Documentation License''.
@end quotation
@end copying

@ifinfo
@dircategory Emacs
@direntry
* Semantic bovine parser development: (bovine).
@end direntry
@end ifinfo

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@ifinfo
This file documents parser development with the bovine parser generator for Semantic, the @emph{Infrastructure for parser based text analysis in Emacs}.

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@end ifinfo

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@vskip 0pt plus 1 fill
Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@c MACRO inclusion
@include semanticheader.texi

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an implementation of an @acronym{LL} parser. It is well suited to simple languages, and it offers many conveniences that make grammar writing easy. Those same conveniences make it less powerful than a Bison-like @acronym{LALR} parser. For more information, @inforef{top, the Wisent Parser Manual, wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by} extension. When compiled, their contents are converted into a file named @file{NAME-by.el}, which is in turn byte-compiled. @inforef{top, Grammar Framework Manual, grammar-fw}.

@menu
* Starting Rules::              The starting rules for the grammar.
* Bovine Grammar Rules::        Rules used to parse a language.
* Optional Lambda Expression::  Actions to take when a rule is matched.
* Bovine Examples::             Simple samples.
* GNU Free Documentation License::
* Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start'' symbol. In @semantic{}, one or more nonterminals can be designated as ``start'' symbols. They are listed after the @code{%start} keyword, separated by spaces. @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first nonterminal defined in the grammar is used. Internally, the first start nonterminal is targeted by the reserved symbol @code{bovine-toplevel}, so that the parser harness can find it.

To find locally defined variables, the local context handler needs to parse the body of functional code. The @code{%scopestart} declaration specifies the name of a nonterminal used as the goal when parsing a local context, @inforef{scopestart Decl, ,grammar-fw}. Internally, the scopestart nonterminal is targeted by the reserved symbol @code{bovine-inner-scope}, so that the parser harness can find it.

@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language file. Once the setup is done in the prologue, you can start writing rules. @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1}
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is, a symbol synthesized in your grammar. @var{components} is a list of elements that must be matched for @var{result} to be made. @var{optional-semantic-action} is an optional sequence of simplified Emacs Lisp expressions for concocting the parse tree.

In Bison, each time an element of @var{components} is found, it is @dfn{shifted} onto the parser stack (the stack of matched elements). When all of the @var{components}' elements have been matched, they are @dfn{reduced} to @var{result}. @xref{(bison)Algorithm}.

A particular @var{result} written into your grammar becomes the parser's goal. It is designated by a @code{%start} statement (@pxref{Starting Rules}). The value returned by the associated @var{optional-semantic-action} is the parser's result. It should be a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, , semantic-appdev}.

@var{components} is made up of symbols. A symbol such as @code{FOO} means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

This means that @code{FOO} is a reserved language keyword, matched by looking it up in a keyword table, @inforef{keyword Decl, ,grammar-fw}. The text @code{"foo"} is converted to @code{FOO} during lexical analysis, so @code{"foo"} can never reach the parser in any other form, such as an ordinary @code{symbol} token.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it won't do so at the lexical level, allowing use of the text @code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token. To match, a @code{symbol} must first be encountered, and then it must @code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more generally the @code{%token}'s match-value string, is a regular expression!
@end table
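The caution can be illustrated with a small Python sketch. It is a toy model, not Semantic's code: it assumes the match-value is applied as an unanchored regular-expression search, the way Emacs Lisp's @code{string-match} behaves, and uses Python's @code{re.search} in its place. The helper @code{token_matches} is hypothetical.

@example
import re

def token_matches(token, wanted_class, match_value):
    # token is a (class, text) pair, e.g. ("symbol", "foo").
    # Hypothetical model: the classes must be equal, then the text must
    # contain a match for the regexp (unanchored, like string-match).
    tok_class, text = token
    return tok_class == wanted_class and re.search(match_value, text) is not None

# %token <symbol> FOO "foo" also accepts symbols that merely contain "foo".
print(token_matches(("symbol", "foo"), "symbol", "foo"))          # True
print(token_matches(("symbol", "foobar"), "symbol", "foo"))       # True
# Anchoring the regexp restricts the match to the exact text.
print(token_matches(("symbol", "foobar"), "symbol", "^foo$"))     # False
# An unescaped "." matches any character; "[.]" matches only a period.
print(token_matches(("punctuation", ","), "punctuation", "."))    # True
print(token_matches(("punctuation", ","), "punctuation", "[.]"))  # False
@end example

Under that assumption, anchoring the expression (as in @code{"^foo$"}) is one way to insist on an exact match.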

Non-symbol tokens are also allowed. For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token types, based on the @dfn{syntax class}-character associations currently in effect.
@end table
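To make the @code{filename} rule concrete, here is a hypothetical Python model that checks a stream of @code{(class, text)} token pairs against the rule's components, with @code{PERIOD} written out as its class and match-value. The token representation and the function @code{match_components} are assumptions for illustration; the real bovinator works on Semantic lexical tokens.

@example
import re

# Hypothetical token stream for the text "foo.h": (class, text) pairs.
tokens = [("symbol", "foo"), ("punctuation", "."), ("symbol", "h")]

# Components of "filename : symbol PERIOD symbol", with PERIOD written
# out as its class and match-value.  None means "any text".
filename_rule = [("symbol", None), ("punctuation", "[.]"), ("symbol", None)]

def match_components(components, stream):
    """Return the matched texts, or None if the stream does not fit."""
    if len(stream) < len(components):
        return None
    matched = []
    for (wanted_class, regexp), (tok_class, text) in zip(components, stream):
        if tok_class != wanted_class:
            return None
        if regexp is not None and re.search(regexp, text) is None:
            return None
        matched.append(text)
    return matched

print(match_components(filename_rule, tokens))  # ['foo', '.', 'h']
bad = [("symbol", "foo"), ("punctuation", ","), ("symbol", "h")]
print(match_components(filename_rule, bad))     # None
@end example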

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}. When the grammar-to-lisp converter encounters a lexical token declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

it substitutes every occurrence of @var{token-name} in rules with its expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

This will generate the following pseudo-equivalent rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator point of view, the @var{components} part of a rule is made up of symbols and strings. A string in the mix means that the previous symbol must have the additional constraint of exactly matching it, as described in @ref{How Lexical Tokens Match}.

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to simplify implementation, though Bison's technique is more efficient.
@end table
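The substitution step can be sketched in Python as a lookup table from token names to @code{(type, match-value)} pairs. This is a hypothetical model of what the grammar-to-Lisp converter does, not its actual code; @code{TOKEN_DECLS} and @code{inline_tokens} are invented names.

@example
# Hypothetical table built from declarations such as
#   %token <symbol> MOOSE "moose"
# Each entry is (token-name, type, match-value).
TOKEN_DECLS = [("MOOSE", "symbol", "moose"), ("PERIOD", "punctuation", "[.]")]

def inline_tokens(components, decls):
    """Replace each declared token name by its type and match-value."""
    table = dict((name, (ttype, value)) for name, ttype, value in decls)
    expanded = []
    for comp in components:
        if comp in table:
            ttype, value = table[comp]
            expanded.extend([ttype, value])
        else:
            # Plain token types and nonterminals are kept as-is.
            expanded.append(comp)
    return expanded

print(inline_tokens(["MOOSE"], TOKEN_DECLS))  # ['symbol', 'moose']
print(inline_tokens(["symbol", "PERIOD", "symbol"], TOKEN_DECLS))
# ['symbol', 'punctuation', '[.]', 'symbol']
@end example

In the expanded list, each string follows the symbol it constrains, which is the shape the bovinator sees.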

@node Order of components in rules
@section Order of components in rules

If a rule has multiple alternatives, their order is important. For example,

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}. The bovine parser will first attempt to match the long form, and then the short form. If they were in reverse order, the short form would always match the leading symbol first, so the long form would never be tested.
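A simplified Python model of ordered alternatives shows why the long form must come first. It assumes, as described above, that alternatives are tried top to bottom and the first complete match wins; it compares token classes only and ignores match-values. The helper @code{first_match} is hypothetical.

@example
# Hypothetical model: each alternative is a list of token classes,
# and the first alternative that matches the front of the stream wins.
def first_match(alternatives, stream):
    for alt in alternatives:
        if [cls for cls, _ in stream[:len(alt)]] == alt:
            return alt, stream[len(alt):]  # matched form, leftover tokens
    return None, stream

stream = [("symbol", "foo"), ("punctuation", "."), ("symbol", "h")]
long_form = ["symbol", "punctuation", "symbol"]
short_form = ["symbol"]

# Long form first: all of "foo.h" is consumed.
print(first_match([long_form, short_form], stream)[1])  # []
# Short form first: only "foo" is consumed; the long form is never tried.
print(first_match([short_form, long_form], stream)[1])
# [('punctuation', '.'), ('symbol', 'h')]
@end example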

@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into a bovine lambda. This lambda has special short-cuts to simplify reading the semantic action definition. An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda whose return value consists entirely of the string or object matched by the first component of the rule (element zero of the match list). An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

calls @code{foo} on the first matched object and splices its return value into the return list, whereas:

@example
( (foo $1) )
@end example

calls @code{foo} and places its return value in the return list as a single element.
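The difference between placing a value in the return list and splicing it in can be modelled with Python lists, where placing is @code{append} and splicing is @code{extend}. This is only an analogy for what the generated bovine lambda does; @code{build_return} and the stand-in function @code{foo} are hypothetical.

@example
# Hypothetical model of how an OLE builds its return list.  Each entry is
# ("put", value), which adds value as one element, or ("splice", value),
# which adds each element of the list value.
def build_return(entries):
    result = []
    for mode, value in entries:
        if mode == "splice":
            result.extend(value)   # like ,(foo $1)
        else:
            result.append(value)   # like (foo $1) or $1
    return result

def foo(x):
    # Stand-in for a user function that returns a list.
    return [x, x.upper()]

match = ["a"]        # $1 is element zero of the match list
first = match[0]
print(build_return([("put", first)]))          # ( $1 )        -> ['a']
print(build_return([("splice", foo(first))]))  # ( ,(foo $1) ) -> ['a', 'A']
print(build_return([("put", foo(first))]))     # ( (foo $1) )  -> [['a', 'A']]
@end example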

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a nonterminal).

@item '$1
The first object matched, placed in a list, i.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to @code{foo}, whose result is placed in the return list.

@item ,(foo)
A function call to @code{foo}, whose result is spliced into the return list.

@item '(foo)
A function call to @code{foo}, whose result is wrapped in a list and placed in the return list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the token passed to it (represented by @samp{$1} above). The @dfn{semantic list} is a common token to expand, as there are often interesting things in the list. @var{nonterminal} is the symbol in your grammar at which the bovinator starts parsing; it is defined like any other nonterminal. @var{depth} should be at least @samp{1} when descending into a semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Like @code{EXPAND}, except that the parser iterates over @var{nonterminal} until there are no more matches, the same way it iterates over the starting rule (@pxref{Starting Rules}). This lets you write much simpler rules in this specific case, and also gives you positional information in the returned tokens, and error skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list. Each @var{symbol} is included in the list only if the associated @var{value} is non-@code{nil}. Although the items are listed flat, the created structure is an association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
A string that represents the tag in the language.

@item class
The kind of tag being created, such as @code{function} or @code{variable}, though any symbol will work.

@item attributes
An optional set of labeled values such as @w{@code{:constant-flag t :parent "parenttype"}}.
@end table

@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag named @var{name} whose class is, respectively, @code{variable}, @code{function}, @code{type}, @code{include}, @code{package}, or @code{code}. @inforef{Creating Tags, , semantic-appdev}, for the Lisp functions these translate into.
@end table
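The filtering rule of @code{ASSOC} can be sketched in Python, with @code{None} standing in for @code{nil} and a list of pairs standing in for the Lisp association list. The function @code{assoc} is a hypothetical model, not the Lisp code that @code{ASSOC} expands into, and the symbol names in the usage line are only placeholders.

@example
# Hypothetical model of (ASSOC symbol1 value1 symbol2 value2 ...):
# pairs whose value is nil (None here) are left out of the alist.
def assoc(*args):
    if len(args) % 2:
        raise ValueError("ASSOC needs symbol/value pairs")
    pairs = zip(args[0::2], args[1::2])
    return [(sym, val) for sym, val in pairs if val is not None]

print(assoc("const", True, "typemodifiers", None, "default", "0"))
# [('const', True), ('default', '0')]
@end example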

If the declaration @code{%quotemode backquote} is specified, then use @code{,@@} to splice a list in, and @code{,} to evaluate an expression. This lets you send @code{$1} as a symbol into a list instead of having it expanded inline.

@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"}, and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because @samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @asis{,} to splice lists in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi

@node Index
@unnumbered Index

@printindex cp

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c LocalWords: bovinator inlined