Provided by: tcllib_1.21+dfsg-1_all

NAME

       doctools::idx::parse - Parsing text in docidx format

SYNOPSIS

       package require doctools::idx::parse  ?0.1?

       package require Tcl  8.4

       package require doctools::idx::structure

       package require doctools::msgcat

       package require doctools::tcl::parse

       package require fileutil

       package require logger

       package require snit

       package require struct::list

       package require struct::stack

       ::doctools::idx::parse text text

       ::doctools::idx::parse file path

       ::doctools::idx::parse includes

       ::doctools::idx::parse include add path

       ::doctools::idx::parse include remove path

       ::doctools::idx::parse include clear

       ::doctools::idx::parse vars

       ::doctools::idx::parse var set name value

       ::doctools::idx::parse var unset name

       ::doctools::idx::parse var clear ?pattern?

_________________________________________________________________________________________________

DESCRIPTION

       This package provides commands to parse text written in the docidx markup language and
       convert it into the canonical serialization of the keyword index encoded in the text. See
       the section Keyword index serialization format for the specification of that format.

       This  is  an  internal  package of doctools, for use by the higher level packages handling
       docidx documents.

API

       ::doctools::idx::parse text text
              The command takes the string contained in text and parses it under  the  assumption
              that  it  contains a document written using the docidx markup language. An error is
              thrown if this assumption is found to be false.  The  format  of  these  errors  is
              described in section Parse errors.

              When  successful  the  command  returns  the canonical serialization of the keyword
              index which was encoded in the text.  See the section Keyword  index  serialization
              format for specification of that format.

       ::doctools::idx::parse file path
              The same as the text method, except that the text to parse is read from the file
              specified by path.
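
              The following is a minimal sketch of calling the text method. It assumes Tcl 8.5
              or later and an installed tcllib. The sample document, its label, title, and file
              names are illustrative only. Errors are caught with catch so that both the message
              and the error code (see section Parse errors) can be inspected.

```tcl
# Sketch: parse a docidx document held in a string and inspect the result.
# Assumes Tcl 8.5+ (for dict) and an installed tcllib.
package require doctools::idx::parse

# A small illustrative docidx document: one keyword with two references.
set doc {
[index_begin tcllib {Example Index}]
[key parser]
[manpage doctools/idx_parse doctools::idx::parse]
[url https://core.tcl.tk/tcllib {Tcllib home}]
[index_end]
}

if {[catch {::doctools::idx::parse text $doc} result opts]} {
    # Failure: result is the human-readable message, and the error code
    # is the machine-readable list described under PARSE ERRORS.
    puts stderr "parse failed: $result"
    puts stderr "error code: [dict get $opts -errorcode]"
} else {
    # Success: result is the canonical serialization of the index.
    set index [dict get $result doctools::idx]
    puts "title:    [dict get $index title]"
    puts "keywords: [dict keys [dict get $index keywords]]"
}

# Reading the same markup from a file instead (example.idx is hypothetical):
# set ser [::doctools::idx::parse file example.idx]
```

              The commented-out last line shows the equivalent call of the file method.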

       ::doctools::idx::parse includes
              This method returns the current list of search paths used when looking for  include
              files.

       ::doctools::idx::parse include add path
              This method adds the path to the list of paths searched when looking for an include
              file. The call is ignored if the path is already in the list of paths.  The  method
              returns the empty string as its result.

       ::doctools::idx::parse include remove path
              This  method  removes  the path from the list of paths searched when looking for an
              include file. The call is ignored if the path is  not  contained  in  the  list  of
              paths. The method returns the empty string as its result.

       ::doctools::idx::parse include clear
              This method clears the list of search paths for include files.
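
              Below is a short sketch of managing the include search paths. It assumes an
              installed tcllib. The current directory stands in for a real directory of include
              files. As described above, adding the same path twice and removing a path that was
              never added both leave the list unchanged.

```tcl
# Sketch: configure the include search paths (assumes an installed tcllib).
package require doctools::idx::parse

::doctools::idx::parse include clear
set dir [file normalize [pwd]]
::doctools::idx::parse include add $dir
::doctools::idx::parse include add $dir            ;# already present: ignored
::doctools::idx::parse include remove /no/such/dir ;# not present: ignored
puts [::doctools::idx::parse includes]             ;# a one-element list
```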

       ::doctools::idx::parse vars
              This method returns a dictionary containing the current set of predefined variables
              known to the vset markup command during processing.

       ::doctools::idx::parse var set name value
              This method adds the variable name to the set of predefined variables known to  the
              vset markup command during processing, and gives it the specified value. The method
              returns the empty string as its result.

       ::doctools::idx::parse var unset name
              This method removes the variable name from the set of predefined variables known to
              the  vset  markup command during processing. The method returns the empty string as
              its result.

       ::doctools::idx::parse var clear ?pattern?
              This method removes all variables matching the pattern from the set  of  predefined
              variables  known  to  the vset markup command during processing. The method returns
              the empty string as its result.

              Pattern matching is done with string match. When no pattern is specified, the
              default pattern * is used, which removes all predefined variables.
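
              The sketch below predefines variables for vset. It assumes an installed tcllib,
              and the variable names and values are illustrative. The final var clear uses the
              pattern v*. Under string match this pattern matches both version and vendor, so
              only project remains.

```tcl
# Sketch: predefine variables for the vset markup command (assumes tcllib).
package require doctools::idx::parse

::doctools::idx::parse var clear                 ;# default pattern *: remove all
::doctools::idx::parse var set project tcllib
::doctools::idx::parse var set version 1.21
::doctools::idx::parse var set vendor  example
::doctools::idx::parse var clear v*              ;# removes version and vendor
puts [::doctools::idx::parse vars]               ;# only project remains
```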

PARSE ERRORS

       The format of the parse error messages thrown when encountering violations of the docidx
       markup syntax is human-readable and not intended for processing by machines. As such it is
       not documented.

       However,  the  errorCode attached to the message is machine-readable and has the following
       format:

       [1]    The error code will be a list, each element describing a single error found in  the
              input. The list has at least one element, possibly more.

       [2]    Each  error  element  will  be a list containing six strings describing an error in
              detail. The strings will be

              [1]    The path of the file the error occurred in. This may be empty.

              [2]    The range of the token the error was found at. This range is  a  two-element
                     list  containing  the  offset  of the first and last character in the range,
                     counted from the beginning of the input (file).  Offsets  are  counted  from
                     zero.

              [3]    The line on which the first character after the error is located. Lines are
                     counted from one.

              [4]    The column at which the first character after the error is located. Columns
                     are counted from zero.

              [5]    The  message  code  of  the  error.  This  value  can be used as argument to
                     msgcat::mc  to  obtain  a  localized  error  message,  assuming   that   the
                     application  had a suitable call of doctools::msgcat::init to initialize the
                     necessary message catalogs (See package doctools::msgcat).

              [6]    A list of details for the error, such as the markup command involved. In the
                     case of message code docidx/include/syntax this value is the set of errors
                     found in the included file, using the format described here.
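
       The error code can be walked with a small recursive procedure. The sketch below is plain
       Tcl (8.5 or later, for lassign). describeParseErrors is a hypothetical helper and is not
       part of doctools. The sample error code is hand-made, and its message code
       docidx/example/code is a placeholder, not a real docidx message code. The procedure
       recurses into the details of docidx/include/syntax errors, because those details hold the
       errors of the included file in the same format.

```tcl
# Sketch: render a docidx parse error code as readable lines (Tcl 8.5+).
# describeParseErrors is a hypothetical helper, not part of doctools.
proc describeParseErrors {errorCode {depth 0}} {
    set lines {}
    set indent [string repeat "  " $depth]
    # Each element is a six-element list: path range line column code details.
    foreach err $errorCode {
        lassign $err path range line col msgcode details
        lassign $range first last
        if {$path eq ""} { set path "<input>" }
        lappend lines "${indent}${path}:${line}:${col}: $msgcode (chars ${first}-${last})"
        if {$msgcode eq "docidx/include/syntax"} {
            # The details are the errors of the included file, in the same format.
            lappend lines {*}[describeParseErrors $details [expr {$depth + 1}]]
        }
    }
    return $lines
}

# Hand-made sample: one include error wrapping one error in the included file.
set sample {
    {main.idx {10 20} 2 0 docidx/include/syntax {
        {part.idx {0 4} 1 5 docidx/example/code {}}
    }}
}
puts [join [describeParseErrors $sample] \n]

# With a real failure, the error code comes from the options of catch:
# catch {::doctools::idx::parse text $doc} msg opts
# describeParseErrors [dict get $opts -errorcode]
```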

[DOCIDX] NOTATION OF KEYWORD INDICES

       The docidx format for keyword indices, also called the docidx markup language, is too
       large to be covered in a single section. The interested reader should start with the
       document

       [1]    docidx language introduction

       and then proceed from there to the formal specifications, i.e. the documents

       [1]    docidx language syntax and

       [2]    docidx language command reference

       to get a thorough understanding of the language.

KEYWORD INDEX SERIALIZATION FORMAT

       Here we specify the format used by the doctools v2 packages to serialize  keyword  indices
       as immutable values for transport, comparison, etc.

       We distinguish between regular and canonical serializations. While a keyword index may
       have more than one regular serialization, exactly one of them is canonical.

       regular serialization

              [1]    An index serialization is a nested Tcl dictionary.

              [2]    This dictionary holds a single key, doctools::idx, and its value. This value
                     holds the contents of the index.

              [3]    The  contents  of  the  index  are a Tcl dictionary holding the title of the
                     index, a label, and the keywords and references. The relevant keys and their
                     values are

                     title  The value is a string containing the title of the index.

                     label  The value is a string containing a label for the index.

                     keywords
                            The  value is a Tcl dictionary, using the keywords known to the index
                            as keys. The associated values are lists containing  the  identifiers
                            of the references associated with that particular keyword.

                            Any reference identifier used in these lists has to exist as a key in
                            the references dictionary, see the next item for its definition.

                     references
                            The value  is  a  Tcl  dictionary,  using  the  identifiers  for  the
                            references  known  to  the  index  as keys. The associated values are
                            2-element lists containing the type and label of  the  reference,  in
                            this order.

                            Any  key  here  has  to be associated with at least one keyword, i.e.
                            occur in at least one of the reference lists which are the values  in
                            the keywords dictionary, see previous item for its definition.

              [4]    The type of a reference can be one of two values:

                     manpage
                            The identifier of the reference is interpreted as a symbolic file
                            name, referring to one of the documents the index was made for.

                     url    The identifier of the reference is interpreted as a URL, referring
                            to some external location, such as a website.
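
              The rules above can be checked mechanically. checkIdx below is a hypothetical
              helper written for illustration and is not part of doctools. It assumes Tcl 8.5 or
              later and treats title, label, keywords, and references as required keys.

```tcl
# Sketch: check a value against the regular serialization rules (Tcl 8.5+).
# checkIdx is a hypothetical helper, not part of doctools.
# Returns the empty string if valid, else a description of the first problem.
proc checkIdx {ser} {
    if {[dict size $ser] != 1 || ![dict exists $ser doctools::idx]} {
        return "expected a single key doctools::idx"
    }
    set idx [dict get $ser doctools::idx]
    foreach key {title label keywords references} {
        if {![dict exists $idx $key]} { return "missing key $key" }
    }
    set refs [dict get $idx references]
    set used [dict create]
    # Every reference listed under a keyword must be defined in references.
    dict for {kw ids} [dict get $idx keywords] {
        foreach id $ids {
            if {![dict exists $refs $id]} {
                return "keyword $kw uses undefined reference $id"
            }
            dict set used $id 1
        }
    }
    # Every reference must be a type/label pair and be used by some keyword.
    dict for {id ref} $refs {
        if {[llength $ref] != 2 || [lindex $ref 0] ni {manpage url}} {
            return "reference $id must be a manpage or url type/label pair"
        }
        if {![dict exists $used $id]} {
            return "reference $id is not used by any keyword"
        }
    }
    return ""
}

set ser {doctools::idx {
    title {Example Index}
    label tcllib
    keywords {parser doctools/idx_parse}
    references {doctools/idx_parse {manpage doctools::idx::parse}}
}}
set problem [checkIdx $ser]
if {$problem eq ""} { puts valid } else { puts $problem }
```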

       canonical serialization
              The canonical serialization of a keyword index has the format specified in the
              previous item and additionally satisfies the constraints below, which make it
              unique among all possible serializations of the keyword index.

              [1]    The  keys  found  in all the nested Tcl dictionaries are sorted in ascending
                     dictionary order, as generated by Tcl's builtin  command  lsort  -increasing
                     -dict.

              [2]    The  references  listed for each keyword of the index, if any, are listed in
                     ascending dictionary order of their labels, as generated  by  Tcl's  builtin
                     command lsort -increasing -dict.
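
              The following sketch produces the canonical form from any regular serialization,
              in plain Tcl 8.5 or later. canonicalIdx is a hypothetical helper, not the doctools
              implementation. It relies on Tcl dictionaries preserving insertion order, so
              rebuilding each dictionary from sorted keys yields sorted keys. It raises an error
              when a keyword lists an undefined reference. When two references of one keyword
              share a label, they keep their input order because lsort is stable. The rules
              above do not say how such ties are broken.

```tcl
# Sketch: rebuild a docidx serialization in canonical form (Tcl 8.5+).
# canonicalIdx is a hypothetical helper, not part of doctools.
proc canonicalIdx {ser} {
    set idx  [dict get $ser doctools::idx]
    set refs [dict get $idx references]

    # Rule [1] for the references dictionary: keys in dictionary order.
    set crefs [dict create]
    foreach id [lsort -increasing -dictionary [dict keys $refs]] {
        dict set crefs $id [dict get $refs $id]
    }

    # Rule [1] for keywords, rule [2] for the references of each keyword.
    set ckeys [dict create]
    set kws [dict get $idx keywords]
    foreach kw [lsort -increasing -dictionary [dict keys $kws]] {
        set pairs {}
        foreach id [dict get $kws $kw] {
            if {![dict exists $refs $id]} {
                error "keyword \"$kw\" uses undefined reference \"$id\""
            }
            # Pair each identifier with its label (second element of the value).
            lappend pairs [list [lindex [dict get $refs $id] 1] $id]
        }
        set ids {}
        foreach pair [lsort -increasing -dictionary -index 0 $pairs] {
            lappend ids [lindex $pair 1]
        }
        dict set ckeys $kw $ids
    }

    # Rule [1] for the index contents: keywords, label, references, title.
    set cidx [dict create]
    foreach key [lsort -increasing -dictionary [dict keys $idx]] {
        switch -- $key {
            keywords   { dict set cidx $key $ckeys }
            references { dict set cidx $key $crefs }
            default    { dict set cidx $key [dict get $idx $key] }
        }
    }
    return [dict create doctools::idx $cidx]
}

# A regular, non-canonical serialization with keys and references out of order.
set ser {doctools::idx {
    title {Example Index}
    references {
        https://core.tcl.tk/tcllib {url {Tcllib home}}
        m/page {manpage Alpha}
        a/page {manpage zeta}
    }
    label tcllib
    keywords {parser {a/page m/page https://core.tcl.tk/tcllib}}
}}
puts [canonicalIdx $ser]
```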

BUGS, IDEAS, FEEDBACK

       This  document,  and  the  package  it  describes, will undoubtedly contain bugs and other
       problems.   Please  report  such  in  the  category  doctools  of  the   Tcllib   Trackers
       [http://core.tcl.tk/tcllib/reportlist].  Please also report any ideas for enhancements you
       may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note further that attachments are strongly preferred over inlined patches. Attachments can
       be  made  by going to the Edit form of the ticket immediately after its creation, and then
       using the left-most button in the secondary navigation bar.

KEYWORDS

       docidx, doctools, lexer, parser

CATEGORY

       Documentation tools

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>