Provided by: tcllib_1.21+dfsg-1_all

NAME

       grammar::fa::op - Operations on finite automatons

SYNOPSIS

       package require Tcl  8.4

       package require snit

       package require struct::list

       package require struct::set

       package require grammar::fa::op  ?0.4.1?

       ::grammar::fa::op::constructor cmd

       ::grammar::fa::op::reverse fa

       ::grammar::fa::op::complete fa ?sink?

       ::grammar::fa::op::remove_eps fa

       ::grammar::fa::op::trim fa ?what?

       ::grammar::fa::op::determinize fa ?mapvar?

       ::grammar::fa::op::minimize fa ?mapvar?

       ::grammar::fa::op::complement fa

       ::grammar::fa::op::kleene fa

       ::grammar::fa::op::optional fa

       ::grammar::fa::op::union fa fb ?mapvar?

       ::grammar::fa::op::intersect fa fb ?mapvar?

       ::grammar::fa::op::difference fa fb ?mapvar?

       ::grammar::fa::op::concatenate fa fb ?mapvar?

       ::grammar::fa::op::fromRegex fa regex ?over?

       ::grammar::fa::op::toRegexp fa

       ::grammar::fa::op::toRegexp2 fa

       ::grammar::fa::op::toTclRegexp regexp symdict

       ::grammar::fa::op::simplifyRegexp regexp

_________________________________________________________________________________________________

DESCRIPTION

       This  package provides a number of complex operations on finite automatons (Short: FA), as
       provided by the package grammar::fa.  The package does not provide the ability  to  create
       and/or  manipulate such FAs, nor the ability to execute a FA for a stream of symbols.  Use
       the packages grammar::fa and grammar::fa::interpreter for that.  Another  package  related
       to  this  is  grammar::fa::compiler  which turns a FA into an executor class which has the
       definition of the FA hardwired into it.

       For more information about what a finite automaton is see  section  FINITE  AUTOMATONS  in
       package grammar::fa.

API

       The  package  exports  the  API described here.  All commands modify their first argument.
       I.e. whatever FA they compute is  stored  back  into  it.  Some  of  the  operations  will
       construct  an  automaton whose states are all new, but related to the states in the source
       automaton(s). These operations take variable names as optional arguments where  they  will
       store  mappings  which  describe  the  relationship(s).   The  operations  can  be loosely
       partitioned into structural and language operations. The latter are defined  in  terms  of
       the  language  the  automaton(s)  accept,  whereas  the former are defined in terms of the
       structural properties of the involved automaton(s). Some operations are both.

   Structure operations

       ::grammar::fa::op::constructor cmd
              This command has to be called by the user of the package before any other
              operation is performed, to establish a command which can be used to construct a FA
              container  object.  If  this  is  not done several operations will fail as they are
              unable to construct internal and transient containers to hold state and/or  partial
              results.

              Any  container  class  using this package for complex operations should set its own
              class command as the constructor. See package grammar::fa for an example.
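
              A minimal sketch of the registration, assuming the container class is
              ::grammar::fa itself and that the package grammar::fa is installed next to this
              one:

```tcl
package require grammar::fa
package require grammar::fa::op

# Register ::grammar::fa as the command the operations use to create
# their internal, transient containers.
::grammar::fa::op::constructor ::grammar::fa
```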

       ::grammar::fa::op::reverse fa
              Reverses the fa. This is done by reversing the direction of all transitions and
              swapping the sets of start and final states. The language of fa changes as well:
              the result accepts exactly the reversed words of the original language.
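
              The effect can be observed on a small chain. This is a sketch only: the state
              names and the automaton are made up, and the container methods used (state add,
              symbol add, start add, final add, next, startstates, finalstates) belong to the
              package grammar::fa, not to this package.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Hypothetical three-state chain: s0 -a-> s1 -b-> s2, accepting "ab".
set fa [::grammar::fa %AUTO%]
$fa state add s0 s1 s2
$fa symbol add a b
$fa start add s0
$fa final add s2
$fa next s0 a --> s1
$fa next s1 b --> s2

::grammar::fa::op::reverse $fa

puts "start: [$fa startstates]"   ;# the former final state, s2
puts "final: [$fa finalstates]"   ;# the former start state, s0
puts "s2 on b: [$fa next s2 b]"   ;# the transition s1 -b-> s2, reversed
```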

       ::grammar::fa::op::complete fa ?sink?
              Makes the fa complete, but nothing is done if the fa is already complete.  This
              implies that only the first in a series of multiple consecutive complete operations
              on fa will perform anything. The remainder will be null operations.

              The language of fa is unchanged by this operation.

              This is done by adding a single new state, the sink, and transitions from all other
              states  to  that sink for all symbols they have no transitions for. The sink itself
              is made complete by adding loop transitions for all symbols.

              Note: When a FA has epsilon-transitions, transitions over a symbol for a state S can
              be indirect, i.e. not attached directly to S, but to a state in the epsilon-closure
              of S. The symbols for such indirect transitions count when  computing  completeness
              of a state. In other words, these indirectly reached symbols are not missing.

              The argument sink provides the name for the new state and must not be present in
              the fa if specified. If the name is not specified the command will name  the  state
              "sinkn", where n is set so that there are no collisions with existing states.

              Note that the sink state is not useful by definition.  In other words, while the FA
              becomes complete, it is also not useful in the strict sense as it has a state  from
              which no final state can be reached.
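
              A small sketch of completion with an explicitly named sink. The automaton and the
              sink name "trap" are assumptions made for the example; the container methods come
              from the package grammar::fa.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Hypothetical automaton over {a, b} in which only s0 has a transition.
set fa [::grammar::fa %AUTO%]
$fa state add s0 s1
$fa symbol add a b
$fa start add s0
$fa final add s1
$fa next s0 a --> s1

puts [$fa is complete]                 ;# false: transitions are missing
::grammar::fa::op::complete $fa trap   ;# "trap" is our name for the sink
puts [$fa is complete]                 ;# true
puts [$fa next s0 b]                   ;# the missing symbol now leads to trap
::grammar::fa::op::complete $fa        ;# already complete, a null operation
puts [llength [$fa states]]            ;# s0, s1 and trap
```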

       ::grammar::fa::op::remove_eps fa
              Removes all epsilon-transitions from the fa in such a manner that the language of
              fa is unchanged. However nothing is done if the fa is already epsilon-free. This
              implies that only the first in a series of multiple consecutive remove_eps
              operations on fa will perform anything. The remainder will be null operations.

              Note: This operation may cause states to become unreachable or  not  useful.  These
              states  are  not  removed  by this operation.  Use ::grammar::fa::op::trim for that
              instead.
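
              A sketch of the removal followed by the suggested trim. The automaton is made up,
              and the example assumes that grammar::fa denotes an epsilon-transition by the
              empty string as symbol.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Hypothetical automaton: s0 -epsilon-> s1 -a-> s2.
# Assumption: the empty string as symbol marks an epsilon-transition.
set fa [::grammar::fa %AUTO%]
$fa state add s0 s1 s2
$fa symbol add a
$fa start add s0
$fa final add s2
$fa next s0 "" --> s1
$fa next s1 a --> s2

::grammar::fa::op::remove_eps $fa
puts [$fa is epsilon-free]        ;# true

# Drop whatever the removal left unreachable or not useful.
::grammar::fa::op::trim $fa
puts [lsort [$fa states]]
```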

       ::grammar::fa::op::trim fa ?what?
              Removes unwanted baggage from fa.  The legal values for what are listed below.  The
              command defaults to !reachable|!useful if no specific argument was given.

              !reachable
                     Removes all states which are not reachable from a start state.

              !useful
                     Removes all states which are unable to reach a final state.

              !reachable&!useful

              !(reachable|useful)
                     Removes all states which are not reachable from a start state and are unable
                     to reach a final state.

              !reachable|!useful

              !(reachable&useful)
                     Removes all states which are not reachable from a start state or are  unable
                     to reach a final state.
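
              The difference between the modes shows on an automaton with one unreachable and
              one non-useful state. A sketch with made-up state names; the container methods
              come from the package grammar::fa.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Hypothetical automaton: s0 -a-> s1 is the wanted part. "dead" cannot
# be reached from the start state, "stuck" cannot reach the final state.
set fa [::grammar::fa %AUTO%]
$fa state add s0 s1 dead stuck
$fa symbol add a b
$fa start add s0
$fa final add s1
$fa next s0   a --> s1
$fa next s0   b --> stuck
$fa next dead a --> s1

::grammar::fa::op::trim $fa !reachable
puts [lsort [$fa states]]         ;# s0 s1 stuck

::grammar::fa::op::trim $fa !useful
puts [lsort [$fa states]]         ;# s0 s1
```

              Calling the command once without the what argument would have removed both states
              in a single step.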

       ::grammar::fa::op::determinize fa ?mapvar?
              Makes  the  fa  deterministic  without  changing  the  language accepted by the fa.
              However nothing is done if the fa is already deterministic. This implies that only
              the first in a series of multiple consecutive determinize operations on fa will
              perform anything. The remainder will be null operations.

              The command will store a dictionary describing the  relationship  between  the  new
              states  of  the  resulting dfa and the states of the input nfa in mapvar, if it has
              been specified. Keys of the dictionary are  the  handles  for  the  states  of  the
              resulting dfa, values are sets of states from the input nfa.

              Note:  An  empty  dictionary  signals  that  the  command  was  able to make the fa
              deterministic without performing a  full  subset  construction,  just  by  removing
              states and shuffling transitions around (As part of making the FA epsilon-free).

              Note:  The  algorithm  fails to make the FA deterministic in the technical sense if
              the FA has no start state(s), because determinism requires the FA to have exactly
              one start state. In that situation we make a best effort; and the missing start
              state will be the  only  condition  preventing  the  generated  result  from  being
              deterministic.   It  should  also  be noted that in this case the possibilities for
              trimming states from the FA are also severely reduced as we cannot  declare  states
              unreachable.
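
              A sketch of the command together with mapvar. The automaton is made up, and the
              variable name map is an arbitrary choice; the handles of the new states are
              whatever the command generates.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Hypothetical NFA for "words over {a, b} ending in a": in s0 the
# symbol a has two successors, s0 and s1.
set fa [::grammar::fa %AUTO%]
$fa state add s0 s1
$fa symbol add a b
$fa start add s0
$fa final add s1
$fa next s0 a --> s0
$fa next s0 b --> s0
$fa next s0 a --> s1

::grammar::fa::op::determinize $fa map
puts [$fa is deterministic]       ;# true

# Each key is a state of the result, each value the set of input
# states it stands for. An empty map prints nothing.
foreach {new old} $map { puts "$new <- {$old}" }
```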

       ::grammar::fa::op::minimize fa ?mapvar?
              Creates  a  FA  which  accepts the same language as fa, but has a minimal number of
              states. Uses Brzozowski's method to accomplish this.

              The command will store a dictionary describing the  relationship  between  the  new
              states  of the resulting minimal fa and the states of the input fa in mapvar, if it
              has been specified. Keys of the dictionary are the handles for the  states  of  the
              resulting minimal fa, values are sets of states from the input fa.

              Note:  An  empty  dictionary  signals  that the command was able to minimize the fa
              without having to compute new states. This should happen if and only if  the  input
              FA was already minimal.

              Note:  If  the  algorithm has no start or final states to work with then the result
              might be technically minimal, but have a very unexpected structure.  It should also
              be  noted  that  in this case the possibilities for trimming states from the FA are
              also severely reduced as we cannot declare states unreachable.
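
              A sketch on an automaton with one redundant state. The automaton is made up; the
              container methods come from the package grammar::fa.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Hypothetical DFA for "one or more a": s1 and s2 behave identically,
# so one of them is redundant.
set fa [::grammar::fa %AUTO%]
$fa state add s0 s1 s2
$fa symbol add a
$fa start add s0
$fa final add s1 s2
$fa next s0 a --> s1
$fa next s1 a --> s2
$fa next s2 a --> s2

::grammar::fa::op::minimize $fa map
puts [llength [$fa states]]       ;# 2, down from 3
puts [$fa is deterministic]       ;# true
foreach {new old} $map { puts "$new <- {$old}" }
```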

   Language operations

       All operations in this section require that all input FAs have at least one start and
       at least one final state. Otherwise the language of the FAs will not be defined, making
       the operation senseless (as it operates on the languages of the FAs in a defined manner).

       ::grammar::fa::op::complement fa
              Complements  fa.  This is possible if and only if fa is complete and deterministic.
              The resulting FA accepts the complementary language of  fa.  In  other  words,  all
              inputs not accepted by the input are accepted by the result, and vice versa.

              The  result  will have all states and transitions of the input, and different final
              states.
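
              A sketch which guards the precondition before complementing. The automaton is
              made up; running determinize and complete first is one possible way to prepare an
              arbitrary input, not something the command does on its own.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Hypothetical DFA over {a}: accepts words with an even number of a's.
set fa [::grammar::fa %AUTO%]
$fa state add even odd
$fa symbol add a
$fa start add even
$fa final add even
$fa next even a --> odd
$fa next odd  a --> even

# complement needs a complete and deterministic automaton.
if {![$fa is deterministic]} { ::grammar::fa::op::determinize $fa }
if {![$fa is complete]}      { ::grammar::fa::op::complete $fa }

::grammar::fa::op::complement $fa
puts [$fa finalstates]            ;# odd: the former non-final state
```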

       ::grammar::fa::op::kleene fa
              Applies Kleene's closure to fa.  The resulting FA accepts all strings S  for  which
              we  can find a natural number n (0 inclusive) and strings A1 ... An in the language
              of fa such that S is the concatenation of A1 ... An.  In other words, the  language
              of  the  result  is  the  infinite union over finite length concatenations over the
              language of fa.

              The result will have all states and transitions of the input,  and  new  start  and
              final states.

       ::grammar::fa::op::optional fa
              Makes the fa optional. In other words it computes the FA which accepts the language
              of fa and the empty word (epsilon) as well.

              The result will have all states and transitions of the input,  and  new  start  and
              final states.

       ::grammar::fa::op::union fa fb ?mapvar?
              Combines  the  FAs  fa  and  fb such that the resulting FA accepts the union of the
              languages of the two FAs.

              The result will have all states and transitions of the two input FAs, and new start
              and  final  states. All states of fb which exist in fa as well will be renamed, and
              the mapvar will contain a mapping from the old states of fb to  the  new  ones,  if
              present.

              It  should  be  noted that the result will be non-deterministic, even if the inputs
              are deterministic.
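
              A sketch with two automata whose state names collide on purpose, so that mapvar
              has something to report. Both automata are made up, and the new names chosen for
              the states of fb are up to the command.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Two hypothetical automata sharing the state names s0 and s1.
set fa [::grammar::fa %AUTO%]     ;# accepts "a"
$fa state add s0 s1
$fa symbol add a
$fa start add s0
$fa final add s1
$fa next s0 a --> s1

set fb [::grammar::fa %AUTO%]     ;# accepts "b"
$fb state add s0 s1
$fb symbol add b
$fb start add s0
$fb final add s1
$fb next s0 b --> s1

::grammar::fa::op::union $fa $fb map
puts [lsort [$fa symbols]]        ;# a b
puts [$fa is deterministic]       ;# false, as noted above
puts $map                         ;# old name -> new name for renamed states of fb
```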

       ::grammar::fa::op::intersect fa fb ?mapvar?
              Combines the FAs fa and fb such that the resulting FA accepts the  intersection  of
              the  languages of the two FAs. In other words, the result will accept a word if and
              only if the word is accepted by both fa and fb. The result will be useful, but  not
              necessarily deterministic or minimal.

              The  command  will  store  a dictionary describing the relationship between the new
              states of the resulting fa and the pairs of states of the input FAs in  mapvar,  if
              it has been specified. Keys of the dictionary are the handles for the states of the
              resulting fa, values are pairs of states from the input FAs. Pairs are  represented
              by  lists. The first element in each pair will be a state in fa, the second element
              will be drawn from fb.
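
              A sketch which prints the state pairs behind an intersection. The helper makeFA
              and both automata exist only for this example; the handles of the new states are
              whatever the command generates.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# Helper for this example only: build an automaton from a flat list of
# from/symbol/to triples.
proc makeFA {symbols start finals transitions} {
    set fa [::grammar::fa %AUTO%]
    foreach sym $symbols { $fa symbol add $sym }
    foreach {from sym to} $transitions {
        foreach s [list $from $to] {
            if {![$fa state exists $s]} { $fa state add $s }
        }
        $fa next $from $sym --> $to
    }
    $fa start add $start
    foreach f $finals { $fa final add $f }
    return $fa
}

# fa: words over {a, b} containing at least one a.
set fa [makeFA {a b} p0 p1 {p0 b p0  p0 a p1  p1 a p1  p1 b p1}]
# fb: words over {a, b} ending in b.
set fb [makeFA {a b} q0 q1 {q0 a q0  q0 b q1  q1 a q0  q1 b q1}]

::grammar::fa::op::intersect $fa $fb map
puts [$fa is useful]

# Each value is a pair: a state of fa, then a state of fb.
foreach {state pair} $map {
    foreach {inA inB} $pair break
    puts "$state = ($inA, $inB)"
}
```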

       ::grammar::fa::op::difference fa fb ?mapvar?
              Combines the FAs fa and fb such that the resulting FA accepts the difference of the
              languages of the two FAs. In other words, the result will accept a word if and only
              if the word is accepted by fa, but not by fb. This can also  be  expressed  as  the
              intersection  of  fa  with the complement of fb. The result will be useful, but not
              necessarily deterministic or minimal.

              The command will store a dictionary describing the  relationship  between  the  new
              states  of  the resulting fa and the pairs of states of the input FAs in mapvar, if
              it has been specified. Keys of the dictionary are the handles for the states of the
              resulting  fa, values are pairs of states from the input FAs. Pairs are represented
              by lists. The first element in each pair will be a state in fa, the second  element
              will be drawn from fb.

       ::grammar::fa::op::concatenate fa fb ?mapvar?
              Combines the FAs fa and fb such that the resulting FA accepts the concatenation of
              the languages of the two FAs. I.e. a word W will be accepted by the result if there
              are two words A and B accepted by fa and fb respectively, and W is the
              concatenation of A and B.

              The result FA will be non-deterministic.

       ::grammar::fa::op::fromRegex fa regex ?over?
              Generates a non-deterministic FA which accepts the same  language  as  the  regular
              expression regex. If over is specified it is treated as the set of symbols the
              regular expression and the automaton are defined over. The command will compute the
              set  from  the  "S"  constructors in regex when over was not specified. This set is
              important if and only if the complement operator  "!"  is  used  in  regex  as  the
              complementary language of an FA is quite different for different sets of symbols.

              The  regular expression is represented by a nested list, which forms a syntax tree.
              The following structures are legal:

              {S x}  Atomic regular  expression.  Everything  else  is  constructed  from  these.
                     Accepts the Symbol "x".

              {. A1 A2 ...}
                     Concatenation operator. Accepts the concatenation of the regular expressions
                     A1, A2, etc.

                     Note that this operator accepts zero or more arguments. With zero  arguments
                     the represented language is epsilon, the empty word.

              {| A1 A2 ...}
                     Choice operator, also called "Alternative". Accepts all input accepted by at
                     least one of the regular expressions A1, A2, etc. In other words, the  union
                     of A1, A2.

                     Note  that this operator accepts zero or more arguments. With zero arguments
                     the represented language is the empty language, the language without words.

              {& A1 A2 ...}
                     Intersection operator, logical and. Accepts all input which is accepted by
                     all of the regular expressions A1, A2, etc. In other words, the
                     intersection of A1, A2.

              {? A}  Optionality operator. Accepts the empty word and anything from  the  regular
                     expression A.

              {* A}  Kleene closure. Accepts the empty word and any finite concatenation of words
                     accepted by the regular expression A.

              {+ A}  Positive Kleene closure. Accepts any finite concatenation of words  accepted
                     by the regular expression A, but not the empty word.

              {! A}  Complement operator. Accepts any word not accepted by the regular expression
                     A. Note that the complement depends on the set of symbols the result should
                     run over. See the discussion of the argument over before.
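
              A sketch of the nested list format in use. The two expressions are made up; the
              second one passes over because it contains the complement operator.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# "a" followed by any number of "b" or "c", i.e. a(b|c)* in the usual
# notation. The symbol set is taken from the S constructors.
set fa [::grammar::fa %AUTO%]
::grammar::fa::op::fromRegex $fa {. {S a} {* {| {S b} {S c}}}}
puts [lsort [$fa symbols]]        ;# a b c

# Everything except the word "a", taken over the symbols a and b.
# Without over only the symbol a would be known.
set fb [::grammar::fa %AUTO%]
::grammar::fa::op::fromRegex $fb {! {S a}} {a b}
puts [lsort [$fb symbols]]
```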

       ::grammar::fa::op::toRegexp fa
              This  command  generates  and  returns  a regular expression which accepts the same
              language as the finite automaton fa. The regular expression is  in  the  format  as
              described above, for ::grammar::fa::op::fromRegex.

       ::grammar::fa::op::toRegexp2 fa
              This  command has the same functionality as ::grammar::fa::op::toRegexp, but uses a
              different algorithm to simplify the generated regular expressions.

       ::grammar::fa::op::toTclRegexp regexp symdict
              This command generates and returns a regular  expression  in  Tcl  syntax  for  the
              regular  expression  regexp,  if  that is possible. regexp is in the same format as
              expected by ::grammar::fa::op::fromRegex.

              The command will fail and throw an error if regexp contains complementation or
              intersection operations.

              The  argument  symdict  is  a dictionary mapping symbol names to pairs of syntactic
              type and Tcl-regexp. If a symbol occurring in the regexp  is  not  listed  in  this
              dictionary  then  single-character  symbols  are considered to designate themselves
              whereas multiple-character symbols are considered to be a character class name.

       ::grammar::fa::op::simplifyRegexp regexp
              This command simplifies a regular expression by applying  the  following  algorithm
              first to the main expression and then recursively to all sub-expressions:

              [1]    Convert the expression into a finite automaton.

              [2]    Minimize the automaton.

              [3]    Convert the automaton back to a regular expression.

              [4]    Choose  the  shorter of original expression and expression from the previous
                     step.
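
              Steps 1 to 3 can be replayed by hand with the commands documented above. This is
              a sketch for the top-level expression only, not the code of the command itself;
              the input expression is made up and the exact form of the results is not
              specified here.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# "ab" given twice as alternatives, redundant on purpose.
set regex {| {. {S a} {S b}} {. {S a} {S b}}}

# Steps 1 to 3 by hand.
set fa [::grammar::fa %AUTO%]
::grammar::fa::op::fromRegex $fa $regex
::grammar::fa::op::minimize  $fa
set candidate [::grammar::fa::op::toRegexp $fa]
puts "by hand:        $candidate"

# The packaged command, which also handles the sub-expressions.
set simple [::grammar::fa::op::simplifyRegexp $regex]
puts "simplifyRegexp: $simple"
```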

EXAMPLES
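
       A short sketch which chains several of the operations: a regular expression is turned
       into an automaton, which is then made deterministic and complete. The expression is made
       up, and ::grammar::fa is assumed as container class.

```tcl
package require grammar::fa
package require grammar::fa::op
::grammar::fa::op::constructor ::grammar::fa

# (a|b)*a in the usual notation: words over {a, b} ending in a.
set fa [::grammar::fa %AUTO%]
::grammar::fa::op::fromRegex   $fa {. {* {| {S a} {S b}}} {S a}}
::grammar::fa::op::determinize $fa
::grammar::fa::op::complete    $fa

puts "deterministic: [$fa is deterministic]"
puts "complete:      [$fa is complete]"
puts "states:        [llength [$fa states]]"
```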

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes,  will  undoubtedly  contain  bugs  and  other
       problems.   Please  report  such  in  the  category  grammar_fa  of  the  Tcllib  Trackers
       [http://core.tcl.tk/tcllib/reportlist].  Please also report any ideas for enhancements you
       may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note further that attachments are strongly preferred over inlined patches. Attachments can
       be made by going to the Edit form of the ticket immediately after its creation,  and  then
       using the left-most button in the secondary navigation bar.

KEYWORDS

       automaton,  finite  automaton,  grammar,  parsing,  regular  expression,  regular grammar,
       regular languages, state, transducer

CATEGORY

       Grammars and finite automata

COPYRIGHT

       Copyright (c) 2004-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>