Provided by: libbtparse-dev_0.88-4_amd64

NAME

       bt_post_processing - post-processing of BibTeX strings, values, and entries

SYNOPSIS

          void bt_postprocess_string (char * s,
                                      btshort options);

          char * bt_postprocess_value (AST *   value,
                                       btshort  options,
                                       boolean replace);

          char * bt_postprocess_field (AST *   field,
                                       btshort  options,
                                       boolean replace);

          void bt_postprocess_entry (AST *  entry,
                                     btshort options);

DESCRIPTION

       When btparse parses a BibTeX entry, it initially stores the results in an abstract syntax
       tree (AST), in a form exactly mirroring the parsed data.  For example, the entry

          @Article{Jones:1997a,
            AuThOr = "Bob   Jones" # and # "Jim Smith ",
            TITLE = "Feeding Habits of
                     the Common Cockroach",
            JoUrNaL = j_ent,
            YEAR = 1997
          }

       would parse to an AST that could be represented as follows:

          (entry,"Article")
            (key,"Jones:1997a")
            (field,"AuThOr")
              (string,"Bob   Jones")
              (macro,"and")
              (string,"Jim Smith ")
            (field,"TITLE")
              (string,"Feeding Habits of               the Common Cockroach")
            (field,"JoUrNaL")
              (macro,"j_ent")
            (field,"YEAR")
              (number,"1997")

       The advantage of this form is that all the important information in the entry is readily
       available by traversing the tree using the functions described in bt_traversal.  The
       obvious problem is that the data is a little too raw to be immediately useful: entry types
       and field names are inconsistently capitalized, strings are full of unwanted whitespace,
       field values are not reduced to single strings, and so forth.

       All of these problems are addressed by btparse's post-processing functions, described
       here.  Normally, you won't have to call these functions---the library does the Right Thing
       for you after parsing each entry, and you can customize what exactly the Right Thing is
       for your application.  (For instance, you can tell it to expand macros, but not to
       concatenate substrings together.)  However, it's conceivable that you might wish to move
       the post-processing into your own code and out of the library's control.  More likely, you
       could have strings that come from something other than BibTeX files that you would like to
       have treated as BibTeX strings; for that situation, the post-processing functions are
       essential.  Finally, you might just be curious about what exactly happens to your data
       after it's parsed.  If so, you've come to the right place for excruciatingly detailed
       explanations.

FUNCTIONS

       btparse offers four points of entry to its post-processing code.  Of these, probably only
       the first and last---for processing individual strings and whole entries---will be
       commonly used.

   Post-processing entry points
       To understand why four entry points are offered, an explanation of the sample AST shown
       above will help.  First of all, the whole entry is represented by the "(entry,"Article")"
       node; this node has the entry key and all its field/value pairs as children.  Entry nodes
       are returned by "bt_parse_entry()" and "bt_parse_entry_s()" (see bt_input) as well as
       "bt_next_entry()" (which traverses a list of entries returned from "bt_parse_file()"---see
       bt_traversal).  Whole entries may be post-processed with "bt_postprocess_entry()".

       You may also need to post-process a single field, or just the value associated with it.
       (The difference is that processing the field can change the field name---e.g. to
       lowercase---in addition to the field value.)  The "(field,"AuThOr")" node above is an
       example of a field sub-AST, and "(string,"Bob   Jones")" is the first node in the list of
       simple values representing that field's value.  (Recall that a field value is, in general,
       a list of simple values.)  Field nodes are returned by "bt_next_field()", value nodes by
       "bt_next_value()".  The former may be passed to "bt_postprocess_field()" for post-
       processing, the latter to "bt_postprocess_value()".

       Finally, individual strings may wander into your program from many places other than a
       btparse AST.  For that reason, "bt_postprocess_string()" is available for post-processing
       arbitrary strings.

   Post-processing options
       All of the post-processing routines have an "options" parameter, which you can use to
       fine-tune the post-processing.  (This is just like the per-metatype string-processing
       options that you can set before parsing entries; see "bt_set_stringopts()" in bt_input.)
       As elsewhere in the library, "options" is a bitmap constructed by or'ing together
       various predefined constants.  These constants and their effects are documented in "String
       processing option macros" in btparse.

       bt_postprocess_string ()
              void bt_postprocess_string (char * s,
                                          btshort options);

           Post-processes an individual string, "s", which is modified in place.  The only post-
           processing option that makes sense on individual strings is whether to collapse
           whitespace according to the BibTeX rules; thus, if "options & BTO_COLLAPSE" is false,
           this function has no effect.  (It still makes a complete pass over the string,
           though; this leaves room for future expansion.)

           The exact rules for collapsing whitespace are simple: non-space whitespace characters
           (mainly tabs and newlines) are converted to spaces, any run of more than one space
           within the string is collapsed to a single space, and any leading or trailing spaces
           are deleted.  (Converting all whitespace to spaces is actually done by btparse's
           lexical scanner, so strings in btparse ASTs will never contain whitespace other than
           spaces.  Likewise, any strings passed to "bt_postprocess_string()" should not contain
           non-space whitespace characters.)
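
           The collapsing rules can be sketched in a few lines of standalone C.  This is only
           an illustration of the rules as stated above, not btparse's own code: the function
           name "collapse_whitespace_sketch()" is hypothetical, and unlike the library (where
           the lexical scanner handles tabs and newlines) the sketch treats every whitespace
           character as a space itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of btparse): collapse whitespace in place.
 * Runs of whitespace become one space; leading and trailing runs vanish. */
void collapse_whitespace_sketch(char *s)
{
    assert(s != NULL);

    char *in = s;           /* read position */
    char *out = s;          /* write position, never ahead of `in` */
    int pending_space = 0;  /* a whitespace run was seen after some text */

    for (; *in != '\0'; in++) {
        if (isspace((unsigned char) *in)) {
            /* Only remember the run if text was already written;
             * this is what drops leading whitespace. */
            if (out != s)
                pending_space = 1;
        } else {
            if (pending_space) {
                *out++ = ' ';   /* emit one space for the whole run */
                pending_space = 0;
            }
            *out++ = *in;
        }
    }

    /* A run still pending here was trailing whitespace: discard it. */
    *out = '\0';
}
```

           Because the result is never longer than the input, the string can safely be
           rewritten in place, which matches the in-place behaviour described for
           "bt_postprocess_string()".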

       bt_postprocess_value ()
              char * bt_postprocess_value (AST *   value,
                                           btshort  options,
                                           boolean replace);

           Post-processes a single field value, which is the head of a list of simple values as
           returned by "bt_next_value()".  All of the relevant string-processing options come
           into play here: conversion of numbers to strings ("BTO_CONVERT"), macro expansion
           ("BTO_EXPAND"), collapsing of whitespace ("BTO_COLLAPSE"), and string pasting
           ("BTO_PASTE").  Since pasting substrings together without first expanding macros and
           converting numbers would be nonsensical, attempting to do so is a fatal error.
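
           The constraint on pasting can be expressed as a simple bitmask test.  The sketch
           below uses made-up "DEMO_*" flags with arbitrarily chosen values so that it stands
           alone; real code should use the "BTO_*" constants from btparse.h, and the check
           shown here is an assumption about how one might validate options before calling
           the library, not the library's own logic.

```c
#include <assert.h>

/* Illustrative flags for this sketch only; their values are arbitrary and
 * are NOT claimed to match the BTO_* constants defined by btparse. */
enum {
    DEMO_CONVERT  = 1 << 0,  /* convert numbers to strings */
    DEMO_EXPAND   = 1 << 1,  /* expand macros */
    DEMO_PASTE    = 1 << 2,  /* paste substrings together */
    DEMO_COLLAPSE = 1 << 3   /* collapse whitespace */
};

/* Hypothetical pre-flight check: returns 1 if the option combination is
 * sensible, 0 if it asks for pasting without both expansion and conversion. */
int demo_options_are_valid(unsigned options)
{
    if (options & DEMO_PASTE) {
        unsigned required = DEMO_CONVERT | DEMO_EXPAND;
        return (options & required) == required;
    }
    return 1;  /* without pasting, any combination is acceptable */
}
```

           A caller could run such a check before invoking "bt_postprocess_value()" to avoid
           triggering the fatal error described above.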

           If "replace" is true, then the list headed by "value" will be replaced by a list
           representing the processed value.  That is, if string pasting is turned on ("options &
           BTO_PASTE" is true), then this list will be collapsed to a single node containing the
           single string that results from pasting together all the substrings.  If string
           pasting is not on, then each node in the list will be left intact, but will have its
           text replaced by processed text.

           If "replace" is false, then a new string will be built on the fly and returned by the
           function.  Note that if pasting is not on in this case, you will only get the last
           string in the list.  (It doesn't really make a lot of sense to post-process a value
           without pasting unless you're replacing it with the new value, though.)

           Returns the string that resulted from processing the whole value, which only makes
           sense if pasting was on or there was only one value in the list.  If a multiple-value
           list was processed without pasting, the last string in the list is returned (after
           processing).

           Consider what might be done to the value of the "author" field in the above example,
           which is the concatenation of a string, a macro, and another string.  Assume that the
           macro "and" expands to " and ", and that the variable "value" points to the sub-AST
           for this value.  The original sub-AST corresponding to this value is

              (string,"Bob   Jones")
              (macro,"and")
              (string,"Jim Smith ")

           To fully process this value in-place, you would call

              bt_postprocess_value (value, BTO_FULL, TRUE);

           ("BTO_FULL" is just the combination of all possible string-processing options:
           "BTO_CONVERT|BTO_EXPAND|BTO_PASTE|BTO_COLLAPSE".)  This would convert the value to a
           single-element list,

              (string,"Bob Jones and Jim Smith")

           and return the fully-processed string "Bob Jones and Jim Smith".  Note that the "and"
           macro has been expanded, interpolated between the two literal strings, everything
           pasted together, and finally whitespace collapsed.  (Collapsing whitespace before
           concatenating the strings would be a bad idea.)
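
           To see why the order matters, the following standalone sketch pastes the three
           pieces of the example value in both orders.  The helpers "demo_collapse()" and
           "demo_paste()" are hypothetical and written only for this illustration; they
           operate on plain C strings rather than on btparse AST nodes, and the macro is
           assumed to have been expanded to " and " already.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collapse whitespace runs to one space, in place,
 * and drop leading and trailing whitespace. */
void demo_collapse(char *s)
{
    char *in = s, *out = s;
    int pending_space = 0;

    for (; *in != '\0'; in++) {
        if (isspace((unsigned char) *in)) {
            if (out != s)
                pending_space = 1;
        } else {
            if (pending_space) {
                *out++ = ' ';
                pending_space = 0;
            }
            *out++ = *in;
        }
    }
    *out = '\0';
}

/* Hypothetical helper: paste `n` pieces into `out` (capacity `cap`).
 * If collapse_first is nonzero, each piece is collapsed on its own before
 * being joined; otherwise the joined string is collapsed once at the end.
 * Returns 0 on success, -1 if `out` is too small. */
int demo_paste(char *out, size_t cap, const char **parts, size_t n,
               int collapse_first)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        if (used + len + 1 > cap)
            return -1;
        memcpy(out + used, parts[i], len + 1);  /* includes the terminator */
        if (collapse_first)
            demo_collapse(out + used);          /* piece loses its edge spaces */
        used += strlen(out + used);
    }

    if (!collapse_first)
        demo_collapse(out);                     /* collapse the whole value once */
    return 0;
}
```

           With the pieces "Bob   Jones", " and ", and "Jim Smith ", pasting first and
           collapsing afterwards yields "Bob Jones and Jim Smith", whereas collapsing each
           piece first strips the spaces around "and" and yields "Bob JonesandJim Smith".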

           Let's say you'd rather preserve the list nature of the value, while expanding macros
           and converting any numbers to strings.  (This conversion is trivial: it just changes
           the type of the node from "BTAST_NUMBER" to "BTAST_STRING".  "Number" values are
           always stored as a string of digits, just as they appear in the file.)  This would be
           done with the call

              bt_postprocess_value
                 (value, BTO_CONVERT|BTO_EXPAND|BTO_COLLAPSE,TRUE);

           which would change the list to

              (string,"Bob Jones")
              (string,"and")
              (string,"Jim Smith")

           Note that whitespace is collapsed here before any concatenation can be done; this is
           probably a bad idea.  But you can do it if you wish.  (If you get any ideas about
           cooking up your own value post-processing scheme by doing it in little steps like
           this, take a look at the source to "bt_postprocess_value()"; it should dissuade you
           from such a venture.)

       bt_postprocess_field ()
              char * bt_postprocess_field (AST *   field,
                                           btshort  options,
                                           boolean replace);

           This is little more than a front-end to "bt_postprocess_value()"; the only difference
           is that you pass it a "field" AST node (e.g. the "(field,"AuThOr")" in the above
           example), and that it transforms the field name in addition to its value.  In
           particular, the field name is forced to lowercase; this behaviour is (currently) not
           optional.

           Returns the string returned by "bt_postprocess_value()".

       bt_postprocess_entry ()
              void bt_postprocess_entry (AST *  entry,
                                         btshort options);

           Post-processes all values in an entry.  If "entry" points to the AST for a "regular"
           or "macro definition" entry, then the values are just what you'd expect: everything on
           the right-hand side of a field or macro "assignment."  You can also post-process
           comment and preamble entries, though.  Comment entries are essentially one big string,
           so only whitespace collapsing makes sense on them.  Preambles may have multiple
           strings pasted together, so all the string-processing options apply to them.  (And
           there's nothing to prevent you from using macros in a preamble.)