Provided by: libbtparse-dev_0.89-1build3_amd64 

NAME
bt_post_processing - post-processing of BibTeX strings, values, and entries
SYNOPSIS
void bt_postprocess_string (char * s,
                            btshort options);
char * bt_postprocess_value (AST * value,
                             btshort options,
                             boolean replace);
char * bt_postprocess_field (AST * field,
                             btshort options,
                             boolean replace);
void bt_postprocess_entry (AST * entry,
                           btshort options);
DESCRIPTION
When btparse parses a BibTeX entry, it initially stores the results in an abstract syntax tree (AST), in
a form exactly mirroring the parsed data. For example, the entry
@Article{Jones:1997a,
AuThOr = "Bob Jones" # and # "Jim Smith ",
TITLE = "Feeding Habits of
the Common Cockroach",
JoUrNaL = j_ent,
YEAR = 1997
}
would parse to an AST that could be represented as follows:
    (entry,"Article")
      (key,"Jones:1997a")
      (field,"AuThOr")
        (string,"Bob Jones")
        (macro,"and")
        (string,"Jim Smith ")
      (field,"TITLE")
        (string,"Feeding Habits of the Common Cockroach")
      (field,"JoUrNaL")
        (macro,"j_ent")
      (field,"YEAR")
        (number,"1997")
The advantage of this form is that all the important information in the entry is readily available by
traversing the tree using the functions described in bt_traversal. The obvious problem is that the data
is a little too raw to be immediately useful: entry types and field names are inconsistently capitalized,
strings are full of unwanted whitespace, field values are not reduced to single strings, and so forth.
All of these problems are addressed by btparse's post-processing functions, described here. Normally,
you won't have to call these functions---the library does the Right Thing for you after parsing each
entry, and you can customize what exactly the Right Thing is for your application. (For instance, you
can tell it to expand macros, but not to concatenate substrings together.) However, it's conceivable
that you might wish to move the post-processing into your own code and out of the library's control.
More likely, you could have strings that come from something other than BibTeX files that you would like
to have treated as BibTeX strings; for that situation, the post-processing functions are essential.
Finally, you might just be curious about what exactly happens to your data after it's parsed. If so,
you've come to the right place for excruciatingly detailed explanations.
FUNCTIONS
btparse offers four points of entry to its post-processing code. Of these, probably only the first and
last---for processing individual strings and whole entries---will be commonly used.
Post-processing entry points
To understand why four entry points are offered, an explanation of the sample AST shown above will help.
First of all, the whole entry is represented by the "(entry,"Article")" node; this node has the entry key
and all its field/value pairs as children. Entry nodes are returned by bt_parse_entry() and
bt_parse_entry_s() (see bt_input) as well as bt_next_entry() (which traverses a list of entries returned
from bt_parse_file()---see bt_traversal). Whole entries may be post-processed with
bt_postprocess_entry().
You may also need to post-process a single field, or just the value associated with it. (The difference
is that processing the field can change the field name---e.g. to lowercase---in addition to the field
value.) The "(field,"AuThOr")" node above is an example of a field sub-AST, and "(string,"Bob Jones")"
is the first node in the list of simple values representing that field's value. (Recall that a field
value is, in general, a list of simple values.) Field nodes are returned by bt_next_field(), value nodes
by bt_next_value(). The former may be passed to bt_postprocess_field() for post-processing, the latter
to bt_postprocess_value().
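To make the field/value distinction concrete, the following self-contained C sketch models the "AuThOr" field from the sample AST as a first-child/next-sibling tree. The node type and the helpers field_value_head() and count_simple_values() are hypothetical, chosen only for illustration and simpler than btparse's real AST type; real code would obtain field and value nodes with bt_next_field() and bt_next_value().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration only; btparse's real AST
 * structure has more members than this. */
typedef enum { N_ENTRY, N_KEY, N_FIELD, N_STRING, N_MACRO, N_NUMBER } node_kind;

typedef struct node {
    node_kind kind;
    const char *text;       /* e.g. "AuThOr" or "Bob Jones" */
    struct node *down;      /* first child */
    struct node *right;     /* next sibling */
} node;

/* For a field node, the value is the list of its children; the head of
 * that list plays the role of the "value" node described above. */
static const node *field_value_head(const node *field)
{
    assert(field != NULL && field->kind == N_FIELD);
    return field->down;
}

/* Count the simple values (strings, macros, numbers) in a value list. */
static size_t count_simple_values(const node *head)
{
    size_t n = 0;
    for (const node *v = head; v != NULL; v = v->right)
        n++;
    return n;
}
```

For the "AuThOr" field, the value head is the "(string,"Bob Jones")" node and the list holds three simple values.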
Finally, individual strings may wander into your program from many places other than a btparse AST. For
that reason, bt_postprocess_string() is available for post-processing arbitrary strings.
Post-processing options
All of the post-processing routines have an "options" parameter, which you can use to fine-tune the
post-processing. (This is just like the per-metatype string-processing options that you can set before
parsing entries; see bt_set_stringopts() in bt_input.) As elsewhere in the library, "options" is a
bitmap constructed by or'ing together various predefined constants. These constants and their effects
are documented in "String processing option macros" in btparse.
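A minimal sketch of building and testing such a bitmap follows. The DEMO_* names and their numeric values are hypothetical stand-ins so that the example compiles without btparse; in real code you would or together the BTO_* constants from btparse.h instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for btparse's BTO_* constants; the names and
 * values are assumptions made only so this compiles on its own. */
enum {
    DEMO_CONVERT  = 1 << 0,   /* like BTO_CONVERT:  numbers become strings */
    DEMO_EXPAND   = 1 << 1,   /* like BTO_EXPAND:   macros are expanded */
    DEMO_PASTE    = 1 << 2,   /* like BTO_PASTE:    substrings are pasted */
    DEMO_COLLAPSE = 1 << 3,   /* like BTO_COLLAPSE: whitespace is collapsed */
    DEMO_FULL     = DEMO_CONVERT | DEMO_EXPAND | DEMO_PASTE | DEMO_COLLAPSE
};

/* True if every bit in `wanted` is also set in `options`. */
static bool has_options(unsigned options, unsigned wanted)
{
    assert(wanted != 0);
    return (options & wanted) == wanted;
}

/* "Expand macros, but don't concatenate substrings": or together the
 * bits you want and leave the paste bit out. */
static const unsigned expand_no_paste = DEMO_CONVERT | DEMO_EXPAND | DEMO_COLLAPSE;
```

Here expand_no_paste corresponds to the "expand macros, but not concatenate substrings" configuration mentioned in the DESCRIPTION.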
bt_postprocess_string ()
void bt_postprocess_string (char * s,
                            btshort options);
Post-processes an individual string, "s", which is modified in place. The only post-processing
option that makes sense on individual strings is whether to collapse whitespace according to the
BibTeX rules; thus, if "options & BTO_COLLAPSE" is false, this function has no effect. (It still
makes a complete pass over the string anyway; this is for future expansion.)
The exact rules for collapsing whitespace are simple: non-space whitespace characters (mainly tabs and
newlines) are converted to spaces, any run of two or more spaces within the string is collapsed to a
single space, and any leading or trailing spaces are deleted. (Converting all whitespace to spaces is
actually done by btparse's lexical scanner, so strings in btparse ASTs will never contain whitespace
other than the space character. Likewise, any strings passed to bt_postprocess_string() should not
contain non-space whitespace characters.)
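The collapsing rules can be sketched in C as follows. collapse_spaces() is a hypothetical helper, not part of btparse and not its implementation; following the advice above, it assumes the string contains no whitespace other than spaces.

```c
#include <assert.h>
#include <string.h>

/* Sketch of the collapsing rules, applied in place: runs of spaces become
 * one space, and leading and trailing spaces are removed. */
void collapse_spaces(char *s)
{
    assert(s != NULL);
    if (strchr(s, ' ') == NULL)
        return;                      /* no spaces: nothing to collapse */

    char *dst = s;
    int pending_space = 0;           /* a space was seen since the last copied char */
    for (const char *src = s; *src != '\0'; src++) {
        if (*src == ' ') {
            pending_space = 1;
            continue;
        }
        if (pending_space && dst != s)
            *dst++ = ' ';            /* one separator, never a leading one */
        pending_space = 0;
        *dst++ = *src;
    }
    *dst = '\0';                     /* trailing spaces are never copied */
}
```

Because the write pointer never runs ahead of the read pointer, the string can be rewritten in place without a second buffer; "  Bob   Jones  " becomes "Bob Jones".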
bt_postprocess_value ()
char * bt_postprocess_value (AST * value,
btshort options,
boolean replace);
Post-processes a single field value, which is the head of a list of simple values as returned by
bt_next_value(). All of the relevant string-processing options come into play here: conversion of
numbers to strings ("BTO_CONVERT"), macro expansion ("BTO_EXPAND"), collapsing of whitespace
("BTO_COLLAPSE"), and string pasting ("BTO_PASTE"). Since pasting substrings together without first
expanding macros and converting numbers would be nonsensical, attempting to do so is a fatal error.
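The rule about pasting can be expressed as a simple predicate on the options bitmap. This is a sketch only: the DEMO_* names and values are hypothetical stand-ins for the BTO_* constants in btparse.h, and where btparse reports a fatal error, this sketch simply aborts through assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BTO_CONVERT, BTO_EXPAND, BTO_PASTE and
 * BTO_COLLAPSE; the real constants and values come from btparse.h. */
enum { DEMO_CONVERT = 1, DEMO_EXPAND = 2, DEMO_PASTE = 4, DEMO_COLLAPSE = 8 };

/* Pasting is only allowed when macro expansion and number conversion
 * are switched on as well. */
static bool paste_options_valid(unsigned options)
{
    if (!(options & DEMO_PASTE))
        return true;                          /* not pasting: always fine */
    return (options & DEMO_EXPAND) != 0 && (options & DEMO_CONVERT) != 0;
}

/* Models the fatal error by aborting on an invalid combination. */
static void require_valid_paste_options(unsigned options)
{
    assert(paste_options_valid(options));
}
```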
If "replace" is true, then the list headed by "value" will be replaced by a list representing the
processed value. That is, if string pasting is turned on ("options & BTO_PASTE" is true), then this
list will be collapsed to a single node containing the single string that results from pasting
together all the substrings. If string pasting is not on, then each node in the list will be left
intact, but will have its text replaced by processed text.
If "replace" is false, then a new string will be built on the fly and returned by the function. Note
that if pasting is not on in this case, you will only get the last string in the list. (It doesn't
really make a lot of sense to post-process a value without pasting unless you're replacing it with
the new value, though.)
Returns the string that resulted from processing the whole value, which only makes sense if pasting
was on or there was only one value in the list. If a multiple-value list was processed without
pasting, the last string in the list is returned (after processing).
Consider what might be done to the value of the "author" field in the above example, which is the
concatenation of a string, a macro, and another string. Assume that the macro "and" expands to
" and " (with a space on each side), and that the variable "value" points to the sub-AST for this
value. The original sub-AST corresponding to this value is
    (string,"Bob Jones")
    (macro,"and")
    (string,"Jim Smith ")
To fully process this value in place, you would call
    bt_postprocess_value (value, BTO_FULL, TRUE);
("BTO_FULL" is just the combination of all possible string-processing options:
"BTO_CONVERT|BTO_EXPAND|BTO_PASTE|BTO_COLLAPSE".) This would convert the value to a single-element
list,
    (string,"Bob Jones and Jim Smith")
and return the fully-processed string "Bob Jones and Jim Smith". Note that the "and" macro has been
expanded and interpolated between the two literal strings, everything has been pasted together, and
finally whitespace has been collapsed. (Collapsing whitespace before concatenating the strings would be
a bad idea: the spaces around "and" would be stripped, and pasting would then give
"Bob JonesandJim Smith".)
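To make the order of operations concrete, here is a self-contained C model of the full processing of the "author" value. The value type, the one-entry macro table, and process_value_full() are all hypothetical; the sketch returns a newly allocated string, which is closer to the replace-is-false case than to the in-place replacement shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a field value: a linked list of simple values. */
typedef enum { V_STRING, V_MACRO, V_NUMBER } value_kind;

typedef struct value {
    value_kind kind;
    const char *text;            /* literal text, macro name, or digits */
    struct value *next;
} value;

/* Assumed macro table: only "and" -> " and ", as in the example above.
 * In this sketch an unknown macro expands to the empty string. */
static const char *expand_macro(const char *name)
{
    return strcmp(name, "and") == 0 ? " and " : "";
}

/* Text a node contributes once macros are expanded and numbers converted
 * (a number's text is already its digits, so conversion changes nothing). */
static const char *piece(const value *v)
{
    return v->kind == V_MACRO ? expand_macro(v->text) : v->text;
}

/* Collapse runs of spaces to one and trim both ends, in place. */
static void collapse(char *s)
{
    char *dst = s;
    int gap = 0;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src == ' ') { gap = 1; continue; }
        if (gap && dst != s) *dst++ = ' ';
        gap = 0;
        *dst++ = *src;
    }
    *dst = '\0';
}

/* Expand, convert, paste, then collapse; the caller frees the result.
 * Returns NULL if memory runs out. */
char *process_value_full(const value *head)
{
    assert(head != NULL);
    size_t len = 0;
    for (const value *v = head; v != NULL; v = v->next)
        len += strlen(piece(v));

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const value *v = head; v != NULL; v = v->next) {
        size_t n = strlen(piece(v));
        memcpy(p, piece(v), n);
        p += n;
    }
    *p = '\0';

    collapse(out);  /* after pasting, so the spaces in " and " survive */
    return out;
}
```

Applied to the three-node "author" list, this yields "Bob Jones and Jim Smith".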
Let's say you'd rather preserve the list nature of the value, while expanding macros and converting
any numbers to strings. (This conversion is trivial: it just changes the type of the node from
"BTAST_NUMBER" to "BTAST_STRING". "Number" values are always stored as a string of digits, just as
they appear in the file.) This would be done with the call
    bt_postprocess_value (value, BTO_CONVERT|BTO_EXPAND|BTO_COLLAPSE, TRUE);
which would change the list to
    (string,"Bob Jones")
    (string,"and")
    (string,"Jim Smith")
Note that whitespace is collapsed here before any concatenation can be done; this is probably a bad
idea. But you can do it if you wish. (If you get any ideas about cooking up your own value post-
processing scheme by doing it in little steps like this, take a look at the source to
bt_postprocess_value(); it should dissuade you from such a venture.)
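The danger of collapsing before pasting shows up clearly in a small C sketch that pastes the three expanded pieces of the "author" value both ways. paste_parts() and its fixed buffer sizes are hypothetical, for illustration only, and are not how btparse itself works.

```c
#include <assert.h>
#include <string.h>

/* Collapse runs of spaces to one and trim both ends, in place. */
static void collapse(char *s)
{
    char *dst = s;
    int gap = 0;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src == ' ') { gap = 1; continue; }
        if (gap && dst != s) *dst++ = ' ';
        gap = 0;
        *dst++ = *src;
    }
    *dst = '\0';
}

/* Paste `n` substrings into `out` (capacity `cap`). If `collapse_first`
 * is nonzero, each substring is collapsed before pasting; otherwise the
 * pasted result is collapsed afterwards. Returns 0 on success, -1 if a
 * buffer would overflow. */
static int paste_parts(const char *const parts[], size_t n, int collapse_first,
                       char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char piece[128];
        size_t len = strlen(parts[i]);
        if (len >= sizeof piece)
            return -1;
        memcpy(piece, parts[i], len + 1);
        if (collapse_first)
            collapse(piece);
        len = strlen(piece);
        if (used + len >= cap)
            return -1;
        memcpy(out + used, piece, len + 1);
        used += len;
    }
    if (!collapse_first)
        collapse(out);
    return 0;
}
```

With the pieces "Bob Jones", " and " and "Jim Smith ", pasting first and collapsing afterwards gives "Bob Jones and Jim Smith", while collapsing each piece first gives "Bob JonesandJim Smith".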
bt_postprocess_field ()
char * bt_postprocess_field (AST * field,
btshort options,
boolean replace);
This is little more than a front-end to bt_postprocess_value(); the only difference is that you pass
it a "field" AST node (e.g. the "(field,"AuThOr")" node in the above example), and that it transforms
the field name in addition to its value. In particular, the field name is forced to lowercase; this
behaviour is (currently) not optional.
Returns the string returned by bt_postprocess_value().
bt_postprocess_entry ()
void bt_postprocess_entry (AST * entry,
btshort options);
Post-processes all values in an entry. If "entry" points to the AST for a "regular" or "macro
definition" entry, then the values are just what you'd expect: everything on the right-hand side of a
field or macro "assignment." You can also post-process comment and preamble entries, though.
Comment entries are essentially one big string, so only whitespace collapsing makes sense on them.
Preambles may have multiple strings pasted together, so all the string-processing options apply to
them. (And there's nothing to prevent you from using macros in a preamble.)
btparse, version 0.89 2024-03-31 btparse::doc::bt_post_processing(3)