Provided by: bibclean_2.11.4.1-4.1_amd64

NAME

       bibclean - prettyprint and syntax check BibTeX and Scribe bibliography data base files

SYNOPSIS

       bibclean [ -author ] [ -error-log filename ] [ -help ] [ -? ] [ -init-file filename ]
                [ -long-field fieldname ] [ -max-width nnn ] [ -[no-]align-equals ]
                [ -[no-]check-values ] [ -[no-]delete-empty-values ] [ -[no-]file-position ]
                [ -[no-]fix-font-changes ] [ -[no-]fix-initials ] [ -[no-]fix-names ]
                [ -[no-]German-style ] [ -[no-]keep-linebreaks ] [ -[no-]keep-parbreaks ]
                [ -[no-]keep-preamble-spaces ] [ -[no-]keep-spaces ] [ -[no-]keep-string-spaces ]
                [ -[no-]parbreaks ] [ -[no-]prettyprint ] [ -[no-]print-patterns ]
                [ -[no-]read-init-files ] [ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ]
                [ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ]
                ( <infile | bibfile1 bibfile2 bibfile3 ... ) >outfile

       All options can be abbreviated to a unique leading prefix.

       An explicit file name of ``-'' represents standard input; it is assumed if no input  files
       are specified.

DESCRIPTION

       bibclean  prettyprints  input  BibTeX  files  to  stdout, and checks the brace balance and
       bibliography entry syntax as well.  It can be used to detect problems in BibTeX files that
       sometimes  confuse  even  BibTeX  itself,  and  importantly,  can be used to normalize the
       appearance of collections of BibTeX files.

       Here is a summary of the formatting actions:

       •  BibTeX items are formatted into a consistent structure with one field  =  "value"  pair
          per line, and the initial @ and trailing right brace in column 1.

       •  Tabs  are  expanded  into  blank strings; their use is discouraged because they inhibit
          portability, and can suffer corruption in electronic mail.

       •  Long string values are split at a blank and continued onto the next line  with  leading
          indentation.

       •  A single blank line separates adjacent bibliography entries.

       •  Text outside BibTeX entries is passed through verbatim.

       •  Outer parentheses around entries are converted to braces.

       •  Personal  names in author and editor field values are normalized to the form ``P. D. Q.
          Bach'', from ``P.D.Q. Bach'' and ``Bach, P.D.Q.''.

       •  Hyphen sequences in page numbers are converted to en-dashes.

       •  Month values are converted to standard BibTeX string abbreviations.

       •  In titles, sequences of upper-case characters at brace level zero are braced to protect
          them from being converted to lower-case letters by some bibliography styles.

       •  CODEN,  ISBN  (International  Standard  Book  Number)  and ISSN (International Standard
          Serial Number) entry values are examined to verify the checksums of each listed number,
          and correct ISBN hyphenation is automatically supplied.
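       The checksum tests mentioned in the last item can be illustrated with a short Python
       sketch.  It applies the standard modulo-11 weighting rules for 10-digit ISBNs and for
       ISSNs; it is not taken from the bibclean sources, the function names are hypothetical,
       and CODEN checking and ISBN hyphenation are not covered.

```python
def _mod11_is_valid(text: str, length: int) -> bool:
    """Check a weighted modulo-11 checksum; 'X' means 10 in the last position."""
    chars = [c for c in text.upper() if c not in "- "]
    if len(chars) != length:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            digit = int(ch)
        elif ch == "X" and position == length - 1:
            digit = 10
        else:
            return False
        # Weights run from `length` down to 1, left to right.
        total += (length - position) * digit
    return total % 11 == 0


def isbn10_is_valid(isbn: str) -> bool:
    """Hypothetical helper: validate a 10-digit ISBN, ignoring hyphens."""
    return _mod11_is_valid(isbn, 10)


def issn_is_valid(issn: str) -> bool:
    """Hypothetical helper: validate an 8-character ISSN, ignoring the hyphen."""
    return _mod11_is_valid(issn, 8)
```

       A value such as 0-306-40615-2 passes the ISBN test, while changing any single digit
       makes it fail.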

       The  standardized  format  of  the output of bibclean facilitates the later application of
       simple filters, such as bibcheck(1), bibdup(1),  bibextract(1),  bibindex(1),  bibjoin(1),
       biblabel(1), biblook(1), biborder(1), bibsort(1), citefind(1), and citetags(1), to process
       the text, and also is the one expected by the GNU Emacs BibTeX support functions.

OPTIONS

       Command-line switches may be abbreviated to a unique leading prefix, and  letter  case  is
       not  significant.  All options are parsed before any input bibliography files are read, no
       matter what their order on the command line.  Options that correspond to a yes/no  setting
       of  a  flag  have a form with a prefix "no-" to set the flag to no.  For such options, the
       last setting determines the flag value used.  This is significant when  options  are  also
       specified in initialization files (see the INITIALIZATION FILES manual section).

       The  leading  hyphen  that  distinguishes  an  option  from a filename may be doubled, for
       compatibility with GNU and POSIX conventions.  Thus, -author and --author are equivalent.

       To avoid confusion with options, if a filename begins with a hyphen, it must be  disguised
       by a leading absolute or relative directory path, e.g., /tmp/-foo.bib or ./-foo.bib.
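       The abbreviation rules above can be sketched in Python as follows.  The option names
       are taken from the SYNOPSIS; the function name, and the choice to raise an exception
       for an ambiguous or unknown prefix, are assumptions made for illustration and do not
       describe how bibclean itself reports such cases.

```python
PLAIN = ["author", "error-log", "help", "init-file", "long-field",
         "max-width", "version"]
FLAGS = ["align-equals", "check-values", "delete-empty-values",
         "file-position", "fix-font-changes", "fix-initials", "fix-names",
         "german-style", "keep-linebreaks", "keep-parbreaks",
         "keep-preamble-spaces", "keep-spaces", "keep-string-spaces",
         "parbreaks", "prettyprint", "print-patterns", "read-init-files",
         "remove-opt-prefixes", "scribe", "trace-file-opening", "warnings"]
# Every yes/no flag also has a "no-" form.
NAMES = PLAIN + FLAGS + ["no-" + flag for flag in FLAGS]


def resolve_option(argument: str) -> str:
    """Map an abbreviated option to its full lower-case name (hypothetical helper)."""
    if not argument.startswith("-"):
        raise ValueError("not an option: " + argument)
    # One or two leading hyphens are accepted; letter case is ignored.
    name = argument[2:] if argument.startswith("--") else argument[1:]
    name = name.lower()
    if name in NAMES:
        return name
    matches = [candidate for candidate in NAMES if candidate.startswith(name)]
    if len(matches) != 1:
        raise ValueError("unknown or ambiguous option: " + argument)
    return matches[0]
```

       With this table, -au and --AUTHOR both resolve to author, whereas -fix is ambiguous
       because it is a prefix of three different option names.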

       -author   Display  an author credit on the standard error unit, stderr, and then exit with
                 a success return code.  Sometimes an executable program is  separated  from  its
                 documentation and source code; this option provides a way to recover from that.

       -error-log filename
                 Redirect  stderr to the indicated file, which will then contain all of the error
                 and warning messages.  This option is  provided  for  those  systems  that  have
                 difficulty redirecting stderr.

       -help or -?
                 Display  a  help  message on stderr, giving a usage description, similar to this
                 section of the manual pages, and then exit with a success return code.

       -init-file filename
                 Provide an explicit value pattern initialization file.   It  will  be  processed
                 after  any system-wide and job-wide initialization files, and may override them.
                 It in turn may be overridden by a subsequent file-specific initialization  file.
                 For further details, see the INITIALIZATION FILES manual section.

       -long-field fieldname
                 Suppress warnings that the field named fieldname has a length exceeding the
                 standard BibTeX limits.  NB! This is a Debian-specific extension!

       -max-width nnn
                 bibclean normally limits output  line  widths  to  72  characters,  and  in  the
                 interests  of  consistency,  that  value  should  not be changed.  Occasionally,
                 special-purpose applications may require different maximum line widths, so  this
                 option  provides  that  capability.  The number following the option name can be
                 specified in decimal, octal (starting with 0),  or  hexadecimal  (starting  with
                 0x).  A zero or negative value is interpreted to mean unlimited, so -max-width 0
                 can be used to ensure that each field/value pair appears on a single line.

                 When -no-prettyprint requests bibclean to act as a lexical analyzer, the default
                 line width is unlimited, unless overridden by this option.

                 When  bibclean  is  prettyprinting,  line wrapping will be done only at a space.
                 Consequently, a long non-blank character  sequence  may  result  in  the  output
                 exceeding the requested line width.

                 When  bibclean is lexing, line wrapping is done by inserting a backslash-newline
                 pair when the specified maximum is reached, so no line length will  ever  exceed
                 the maximum.
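       The number syntax accepted by -max-width can be illustrated with a small Python
       sketch.  The function name is hypothetical, and returning None to stand for
       "unlimited" is a choice made for this example only.

```python
from typing import Optional


def parse_max_width(text: str) -> Optional[int]:
    """Parse a decimal, octal (leading 0), or hexadecimal (leading 0x) width.

    Returns None when the value is zero or negative, meaning unlimited.
    """
    body = text.strip()
    sign = 1
    if body.startswith("-"):
        sign, body = -1, body[1:]
    elif body.startswith("+"):
        body = body[1:]
    if body.lower().startswith("0x"):
        value = int(body[2:], 16)
    elif body.startswith("0") and len(body) > 1:
        value = int(body[1:], 8)
    else:
        value = int(body, 10)
    value *= sign
    return value if value > 0 else None
```

       Under these rules 72, 0110, and 0x48 all denote the default width of 72 characters.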

       -[no-]align-equals
                 With  the  positive  form, align the equals sign in key/value assignments at the
                 same column, separated by a single space from the value string.  Otherwise,  the
                 equals sign follows the key, separated by a single space.  Default: no.

       -[no-]check-values
                 With  the  positive  form,  apply  heuristic pattern matching to field values in
                 order to detect possible errors (e.g., ``year =  "192"''  instead  of  ``year  =
                 "1992"''), and issue warnings when unexpected patterns are found.

                 This  checking is usually beneficial, but if it produces too many bogus warnings
                 for a particular bibliography file, you can disable it with the negative form of
                 this option.  Default: yes.

       -[no-]delete-empty-values
                 With  the  positive form, remove all field/value pairs for which the value is an
                 empty string.  This is helpful in cleaning up bibliographies generated from text
                 editor  templates.  Compare this option with -[no-]remove-OPT-prefixes described
                 below.  Default: no.

       -[no-]file-position
                 With the positive form, give detailed file position information in  warning  and
                 error messages.  Default: no.

       -[no-]fix-font-changes
                 With  the positive form, supply an additional brace level around font changes in
                 titles to protect against downcasing by some BibTeX styles.  Font  changes  that
                 already have more than one level of braces are not modified.

                 For example, if a title contains the Latin phrase {\em Dictyostelium Discoideum}
                 or {\em {D}ictyostelium {D}iscoideum}, then downcasing will incorrectly  convert
                 the  phrase to lower-case letters.  Most BibTeX users are surprised that bracing
                 the initial letters does not prevent the downcase action.  The correct coding is
                 {{\em  Dictyostelium  Discoideum}}.   However,  there  are also legitimate cases
                 where an extra level of bracing wrongly protects from downcasing.  Consequently,
                 bibclean  will  normally  not supply an extra level of braces, but if you have a
                 bibliography where the extra braces are routinely  missing,  you  can  use  this
                 option to supply them.

                 If  you  think  that  you  need this option, it is strongly recommended that you
                 apply bibclean to your bibliography file  with  and  without  -fix-font-changes,
                 then  compare  the  two  output  files to ensure that extra braces are not being
                 supplied in titles where they should not be present.  You will  have  to  decide
                 which  of  the  two output files is the better choice, then repair the incorrect
                 title bracing by hand.

                 Since font changes in titles are uncommon, except for cases of  the  type  which
                 this  option is designed to correct, it should do more good than harm.  Default:
                 no.

       -[no-]fix-initials
                 With the positive form, insert a space after a period following author initials.
                 Default: yes.

       -[no-]fix-names
                 With the positive form, reorder author and editor name lists to remove commas at
                 brace level zero, placing first names or initials before last  names.   Default:
                 yes.
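       The combined effect of -fix-initials and -fix-names on simple names can be sketched in
       Python.  This is a simplified illustration with hypothetical function names: it
       reorders a name only when it contains exactly one comma at brace level zero, and it
       does not attempt the harder cases (such as "Jr." parts) that a real implementation
       must deal with.

```python
import re


def fix_initials(name: str) -> str:
    """Insert a space after a period that is directly followed by a letter."""
    return re.sub(r"\.(?=[A-Za-z])", ". ", name)


def fix_name(name: str) -> str:
    """Turn "Last, First" into "First Last" when there is one top-level comma."""
    depth = 0
    commas = []
    for index, ch in enumerate(name):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            commas.append(index)
    if len(commas) == 1:
        last = name[:commas[0]].strip()
        first = name[commas[0] + 1:].strip()
        name = first + " " + last
    return fix_initials(name)


def fix_name_list(value: str) -> str:
    """Apply fix_name to each name in an " and "-separated list."""
    return " and ".join(fix_name(part) for part in value.split(" and "))
```

       Both "P.D.Q. Bach" and "Bach, P.D.Q." come out as "P. D. Q. Bach", and a comma inside
       braces is left alone.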

       -[no-]German-style
                 With  the  positive  form,  interpret  quote  characters ["] inside braced value
                 strings at brace level 1 according to the conventions  of  the  TeX  style  file
                 german.sty, which overloads quote to simplify input and representation of German
                 umlaut  accents,  sharp-s  (es-zet),  ligature  separators,  invisible  hyphens,
                 raised/lowered quotes, French guillemets, and discretionary hyphens.  Recognized
                 character combinations will be braced to prevent BibTeX  from  interpreting  the
                 quote as a string delimiter.

                 Quoted  strings  receive  no special handling from this option, and since German
                 nouns in titles must anyway be protected from the downcasing operation  of  most
                 BibTeX  bibliography  styles, German value strings that use the overloaded quote
                 character can always be entered in the form "{...}", without the need to specify
                 this option at all.

                 Default: no.

       -[no-]keep-linebreaks
                 Normally, line breaks inside value strings are collapsed into a single space, so
                 that long value strings can later be  broken  to  provide  lines  of  reasonable
                 length.

                 With  the  positive  form,  linebreaks are preserved in value strings.  If
                 -max-width is set to zero, this preserves the original line breaks.  Spacing outside
                 value  strings  remains  under  bibclean's  control, and is not affected by this
                 option.

                 Default: no.

       -[no-]keep-parbreaks
                 With the positive form, preserve paragraph breaks (either  formfeeds,  or  lines
                 containing  only  spaces)  in  value  strings.   Normally,  paragraph breaks are
                 collapsed into a single space.  Spacing  outside  value  strings  remains  under
                 bibclean's control, and is not affected by this option.  Default: no.

       -[no-]keep-preamble-spaces
                 With  the  positive  form,  preserve  all  whitespace in @Preamble{...} entries.
                 Default: no.

       -[no-]keep-spaces
                 With the positive  form,  preserve  all  spaces  in  value  strings.   Normally,
                 multiple  spaces  are  collapsed  into  a single space.  This option can be used
                 together with -keep-linebreaks, -keep-parbreaks, and -max-width  0  to  preserve
                 the  form  of  value  strings  while  still providing syntax and value checking.
                 Spacing outside value strings remains  under  bibclean's  control,  and  is  not
                 affected by this option.  Default: no.

       -[no-]keep-string-spaces
                 With  the  positive  form,  preserve  all  whitespace  in  @String{...} entries.
                 Default: no.

       -[no-]parbreaks
                 With the negative form,  a  paragraph  break  (either  a  formfeed,  or  a  line
                 containing   only  spaces)  is  not  permitted  in  value  strings,  or  between
                 field/value pairs.  This may be useful to quickly trap runaway  strings  arising
                 from mismatched delimiters.  Default: yes.

       -[no-]prettyprint
                 Normally,  bibclean  functions  as  a prettyprinter.  However, with the negative
                 form of this option, it acts as a lexical analyzer instead, producing  a  stream
                 of lexical tokens.  See the LEXICAL ANALYSIS manual section for further details.
                 Default: yes.

       -[no-]print-patterns
                 With the positive form, print the value patterns read from initialization  files
                 as  they  are  added  to  internal tables.  Use this option to check newly-added
                 patterns, or to see what patterns are being used.

                 These patterns are the ones that will be used  in  checking  value  strings  for
                 valid syntax, and all of them are specified in initialization files, rather than
                 hard-coded into the program.  For further details, see the INITIALIZATION  FILES
                 manual section.  Default: no.

       -[no-]read-init-files
                 With  the  negative  form, suppress loading of system-, user-, and file-specific
                 initialization  files.   Initializations  will  come  only  from   those   files
                 explicitly given by -init-file filename options.  Default: yes.

       -[no-]remove-OPT-prefixes
                 With the positive form, remove the ``OPT'' prefix from each field name where the
                 corresponding value is not an empty string.  The prefix ``OPT'' must be entirely
                 in upper-case to be recognized.

                 This  option  is  for  bibliographies  generated  with the help of the GNU Emacs
                 BibTeX  editing  support,  which  generates  templates  with   optional   fields
                 identified  by  the ``OPT'' prefix.  Although the function M-x bibtex-remove-OPT
                 normally bound to the keystrokes C-c C-o does the job, users often forget,  with
                 the  result that BibTeX does not recognize the field name, and ignores the value
                 string.  Compare this option  with  -[no-]delete-empty-values  described  above.
                 Default: no.

       -[no-]scribe
                 With  the  positive  form, accept input syntax conforming to the Scribe document
                 system.  The output will be converted to conform  to  BibTeX  syntax.   See  the
                 SCRIBE BIBLIOGRAPHY FORMAT manual section for further details.  Default: no.

       -[no-]trace-file-opening
                 With  the  positive  form,  record  in the error log file the names of all files
                 which  bibclean  attempts  to  open.   Use  this  option   to   identify   where
                 initialization files are located.  Default: no.

       -[no-]warnings
                 With  the  positive  form, allow all warning messages.  The negative form is not
                 recommended since it may mask problems that should be repaired.  Default: yes.

       -version  Display the program version number on stderr,  and  then  exit  with  a  success
                 return  code.  This will also include an indication of who compiled the program,
                 the host name on which it was compiled, the time of compilation, and the type of
                 string-value  matching  code selected, when that information is available to the
                 compiler.

ERROR RECOVERY AND WARNINGS

       When bibclean detects an error, it issues an error message  to  both  stderr  and  stdout.
       That  way,  the  user  is  clearly notified, and the output bibliography also contains the
       message at the point of error.

       Error messages begin with a distinctive pair  of  queries,  ??,  beginning  in  column  1,
       followed  by  the  input  file  name  and  line  number.  If the -file-position option was
       specified, they also contain the input and output positions of the  current  file,  entry,
       and  value.   Each position includes the file byte number, the line number, and the column
       number.  In the event of a runaway string argument, the entry and value  positions  should
       precisely  pinpoint the erroneous bibliography entry, and the file positions will indicate
       where it was detected, which may be rather later in the files.

       Warning messages identify possible problems, and are therefore sent only  to  stderr,  and
       not  to  stdout,  so  they  never  appear  in  the  output file.  They are identified by a
       distinctive pair of percents, %%, beginning in column 1, and as with error  messages,  may
       be followed by file position messages if the -file-position option was specified.

       For  convenience,  the  first  line  of  each  error and warning message sent to stderr is
       formatted according to the expectations of the GNU  Emacs  next-error  command.   You  can
       invoke  bibclean  with  the  Emacs  M-x  compile<RET>bibclean  filename.bib  >filename.new
       command, then use the next-error command, normally bound to C-x  `  (that's  a  grave,  or
       back, accent), to move to the location of the error in the input file.

       If  error  messages  are  ignored,  and  left  in  the output bibliography file, they will
       precipitate an error when the bibliography is next processed with BibTeX.
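       Because error messages copied into the output begin with ?? in column 1, an output file
       can be scanned for leftover messages before it is handed to BibTeX.  The following
       Python sketch relies only on that prefix; the function name is hypothetical and the
       message text used to exercise it is made up.

```python
from typing import List


def leftover_error_lines(text: str) -> List[int]:
    """Return the 1-based numbers of lines that start with the ?? error marker."""
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if line.startswith("??")]
```

       An empty result means that no error message remains in the text that was examined.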

       After issuing an error message, bibclean then  resynchronizes  its  input  by  copying  it
       verbatim  to  stdout  until  a new bibliography entry is recognized on a line in which the
       first non-blank character is an at-sign (@).  This ensures that nothing is lost  from  the
       input  file(s),  allowing  corrections to be made in either the input or the output files.
       However, if bibclean detects an internal error in its data structures, it  will  terminate
       abruptly  without  further  input  or  output  processing; this kind of error should never
       happen, and if it does, it should be reported immediately to the author  of  the  program.
       Errors  in  initialization files, and running out of dynamic memory, will also immediately
       terminate bibclean.

INITIALIZATION FILES

       bibclean can be compiled with one of three different types of pattern matching; the choice
       is made by the installer at compile time:

              •  The original version uses explicit hand-coded tests of value-string syntax.

              •  The   second  version  uses  regular-expression  pattern-matching  host  library
                 routines together with  regular-expression  patterns  that  come  entirely  from
                 initialization files.

              •  The  third  version uses special patterns that come entirely from initialization
                 files.

       This Debianized version of bibclean uses the third version.  However, command-line options
       can also be specified in initialization files, no matter which pattern matching choice was
       selected.

       When bibclean starts, it searches  for  initialization  files,  using  the  first  one  of
       $HOME/.bibcleanrc,  /usr/share/bibcleanrc, and /etc/bibcleanrc that exists.  Afterwards,
       it reads the first .bibcleanrc found in the BIBINPUTS search path.  The  name  .bibcleanrc
       can  be changed at run time through a setting of the environment variable BIBCLEANINI.  If
       the name starts with a dot, it will be stripped when looking in /usr/share and /etc.

       Then, when command-line arguments are processed, any additional files specified by
       -init-file filename options are also processed.  Finally, immediately before each named
       bibliography file is processed, an attempt is made to process an initialization file with
       the same name, but with the extension changed to .ini.  The default extension can be
       changed by a setting of the environment variable BIBCLEANEXT.  This scheme permits
       system-wide, user-wide, session-wide, and file-specific initialization files to be supported.

       When input is taken from stdin, there is no file-specific initialization.
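       The search order described above can be summarized in a Python sketch that only builds
       the candidate file names, without touching the file system.  The function names are
       hypothetical, and two details are assumptions: that BIBINPUTS is a colon-separated
       directory list, and that the replacement extension is written with its leading dot.

```python
import posixpath
from typing import List


def system_candidates(home: str, name: str = ".bibcleanrc") -> List[str]:
    """Candidates tried in order; only the first existing one would be used."""
    # A leading dot is dropped for the /usr/share and /etc locations.
    undotted = name[1:] if name.startswith(".") else name
    return [posixpath.join(home, name),
            "/usr/share/" + undotted,
            "/etc/" + undotted]


def search_path_candidates(bibinputs: str, name: str = ".bibcleanrc") -> List[str]:
    """Candidates along the BIBINPUTS path (assumed colon-separated)."""
    return [posixpath.join(directory, name)
            for directory in bibinputs.split(":") if directory]


def file_specific_init(bibfile: str, extension: str = ".ini") -> str:
    """Name of the initialization file tried just before bibfile is processed."""
    root, _ = posixpath.splitext(bibfile)
    return root + extension
```

       For a bibliography file refs/tex.bib, the file-specific candidate is refs/tex.ini.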

       For precise control, the -no-read-init-files option suppresses all initialization files
       except those explicitly named by -init-file filename options, either on the command line,
       or in requested initialization files.

       Recursive  execution  of initialization files with nested -init-file options is permitted;
       if the recursion is circular, bibclean will finally get a  non-fatal  initialization  file
       open  failure  after  opening too many files.  This terminates further initialization file
       processing.  As the recursion unwinds, the files are all closed, then  execution  proceeds
       normally.

       An initialization file may contain empty lines, comments from percent to end of line (just
       like TeX),  option  switches,  and  field/pattern  or  field/pattern/message  assignments.
       Leading and trailing spaces are ignored.  This is best illustrated by a short example:

       % This is a small bibclean initialization file

       -init-file /u/math/bib/.bibcleanrc %% departmental patterns

       chapter = "\"D\""                 %% 23

       pages   = "\"D--D\""              %% 23--27

       volume  = "\"D \\an\\d D\""       %% 11 and 12

       year    = \
          "\"dddd, dddd, dddd\"" \
          "Multiple years specified."      %% 1989, 1990, 1991

       -no-fix-names   %% do not modify author/editor lists

       Long  logical  lines can be split into multiple physical lines by breaking at a backslash-
       newline pair; the backslash-newline pair is  discarded.   This  processing  happens  while
       characters are being read, before any further interpretation of the input stream.

       Each  logical  line  must contain a complete option (and its value, if any), or a complete
       field/pattern pair, or a field/pattern/message triple.

       Comments are stripped during the parsing of the field, pattern, and message  values.   The
       comment  start symbol is not recognized inside quoted strings, so it can be freely used in
       such strings.

       Comments on logical lines that were input as multiple physical lines  via  the  backslash-
       newline  convention  must  appear  on  the  last  physical  line; otherwise, the remaining
       physical lines will become part of the comment.
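       The two reading stages just described, joining backslash-newline continuations and then
       stripping comments outside quoted strings, can be sketched in Python.  The function
       names are hypothetical, and the sketch stops short of parsing options or field/pattern
       assignments.

```python
from typing import List


def join_continuations(text: str) -> str:
    """Discard every backslash-newline pair before any other interpretation."""
    return text.replace("\\\n", "")


def strip_comment(line: str) -> str:
    """Remove a %-comment, ignoring % inside quoted strings; trim outer spaces."""
    in_string = False
    index = 0
    while index < len(line):
        ch = line[index]
        if in_string:
            if ch == "\\":
                index += 2      # skip the escaped character, e.g. \" or \%
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "%":
            return line[:index].strip()
        index += 1
    return line.strip()


def logical_lines(text: str) -> List[str]:
    """Return the non-empty logical lines of an initialization file."""
    cleaned = (strip_comment(raw) for raw in join_continuations(text).split("\n"))
    return [line for line in cleaned if line]
```

       Note that a comment placed on an early physical line of a continued assignment would
       swallow the rest of the logical line, exactly as warned above.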

       Pattern strings must be enclosed in quotation marks;  within  such  strings,  a  backslash
       starts  an escape mechanism that is commonly used in UNIX software.  The recognized escape
       sequences are:

              \a     alarm bell (octal 007)

              \b     backspace (octal 010)

              \f     formfeed (octal 014)

              \n     newline (octal 012)

              \r     carriage return (octal 015)

              \t     horizontal tab (octal 011)

              \v     vertical tab (octal 013)

              \ooo   character number octal ooo (e.g., \012 is linefeed).  Up to 3 octal digits may
                     be used.

              \0xhh  character  number  hexadecimal  hh (e.g., \0x0a is linefeed).  xhh may be in
                     either letter case.  Any number of hexadecimal digits may be used.

       Backslash followed by any other character produces just that character.  Thus, \%  gets  a
       literal  percent into a string (preventing its interpretation as a comment), \" produces a
       quotation mark, and \\ produces a single backslash.

       An ASCII NUL (\0) in a string will terminate it; this is a feature of  the  C  programming
       language in which bibclean is implemented.
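       The escape rules listed above can be sketched as a small Python decoder for the text
       between the enclosing quotation marks.  The function name is hypothetical, and
       reducing numeric values modulo 256 is an assumption; the manual does not say how
       values outside the character range are treated.

```python
SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
          "r": "\r", "t": "\t", "v": "\v"}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(text: str) -> str:
    """Decode backslash escapes; an escaped NUL ends the string."""
    out = []
    index, size = 0, len(text)
    while index < size:
        ch = text[index]
        if ch != "\\" or index + 1 >= size:
            out.append(ch)
            index += 1
            continue
        nxt = text[index + 1]
        if (nxt == "0" and index + 3 < size and text[index + 2] in "xX"
                and text[index + 3] in HEX_DIGITS):
            # \0xhh: any number of hexadecimal digits.
            end = index + 3
            while end < size and text[end] in HEX_DIGITS:
                end += 1
            code = int(text[index + 3:end], 16) % 256
        elif nxt in OCTAL_DIGITS:
            # \ooo: up to three octal digits.
            end = index + 1
            while end < size and end < index + 4 and text[end] in OCTAL_DIGITS:
                end += 1
            code = int(text[index + 1:end], 8) % 256
        else:
            # Named escape, or any other character standing for itself.
            out.append(SIMPLE.get(nxt, nxt))
            index += 2
            continue
        if code == 0:
            break
        out.append(chr(code))
        index = end
    return "".join(out)
```

       Applied to the body of the volume pattern in the earlier example, the decoder turns
       \" into a quotation mark and \\ into a single backslash.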

       Field/pattern  pairs can be separated by arbitrary space, and optionally, either an equals
       sign or colon functioning as an assignment operator.  Thus, the following are equivalent:

       pages="\"D--D\""
       pages:"\"D--D\""
       pages "\"D--D\""
         pages = "\"D--D\""
         pages : "\"D--D\""
       pages   "\"D--D\""

       Each field name can have an arbitrary number of patterns associated with it; however, they
       must be specified in separate field/pattern assignments.

       An  empty  pattern  string  causes  previously-loaded  patterns  for that field name to be
       forgotten.  This feature permits an initialization file  to  completely  discard  patterns
       from earlier initialization files.

       Patterns for value strings are represented in a tiny special-purpose language that is both
       convenient and suitable for bibliography  value-string  syntax  checking.   While  not  as
       powerful  as  the  language  of  regular-expression  patterns, its parsing can be portably
       implemented in less than 3% of the code in a widely-used  regular-expression  parser  (the
       GNU regexp package).

       The patterns are represented by the following special characters:

              <space>  one or more spaces

              a        exactly one letter

              A        one or more letters

              d        exactly one digit

              D        one or more digits

              r        exactly one Roman numeral

              R        one or more Roman numerals (i.e., a Roman number)

              w        exactly one word (one or more letters and digits)

              W        one or more space-separated words, beginning and ending with a word

              .        one  `special' character, one of the characters <space>!#()*+,-./:;?[]~, a
                       subset of punctuation characters that are typically used in string values

              :        one or more `special' characters

              X        one or more `special'-separated words, beginning and ending with a word

              \x       exactly one x (x is any  character),  possibly  with  an  escape  sequence
                       interpretation given earlier

              x        exactly  the  character  x  (x  is  anything  but  one  of  these  pattern
                       characters: aAdDrRwW.:<space>\)

       The X pattern character is very powerful, but generally inadvisable, since it  will  match
       almost  anything  likely  to  be found in a BibTeX value string.  The reason for providing
       pattern matching on the value strings is to uncover possible errors, not mask them.
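       One way to experiment with the pattern characters is to translate a pattern into an
       ordinary regular expression.  The Python sketch below does that for a pattern whose
       string escapes have already been decoded.  It is an approximation under stated
       assumptions, not bibclean's own matcher: letters and digits are taken to be ASCII,
       Roman numerals are accepted in either letter case, and the regular-expression engine
       may backtrack where bibclean does not.  The function names are hypothetical.

```python
import re

SPECIAL = r"[ !#()*+,\-./:;?\[\]~]"
WORD = "[A-Za-z0-9]+"
ROMAN = "[IVXLCDMivxlcdm]"

TOKENS = {
    " ": " +",
    "a": "[A-Za-z]",
    "A": "[A-Za-z]+",
    "d": "[0-9]",
    "D": "[0-9]+",
    "r": ROMAN,
    "R": ROMAN + "+",
    "w": WORD,
    "W": WORD + "(?: +" + WORD + ")*",
    ".": SPECIAL,
    ":": SPECIAL + "+",
    "X": WORD + "(?:" + SPECIAL + "+" + WORD + ")*",
}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a decoded bibclean-style pattern into a compiled regex."""
    parts = []
    index = 0
    while index < len(pattern):
        ch = pattern[index]
        if ch == "\\" and index + 1 < len(pattern):
            # \x stands for exactly the character x.
            parts.append(re.escape(pattern[index + 1]))
            index += 2
        else:
            parts.append(TOKENS.get(ch, re.escape(ch)))
            index += 1
    return re.compile("".join(parts))


def value_matches(pattern: str, value: str) -> bool:
    """True when the pattern matches a leading portion of the value string."""
    return pattern_to_regex(pattern).match(value) is not None
```

       For instance, the decoded pattern "dddd" (including its quotation marks) accepts the
       value "1992" but rejects "192".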

       There is no provision for specifying ranges or repetitions of  characters,  but  this  can
       usually be done with separate patterns.  It is a good idea to accompany the pattern with a
       comment showing the kind of thing it is expected to  match.   Here  is  a  portion  of  an
       initialization file giving a few of the patterns used to match number value strings:

       number  =       "\"D\""         %% 23
       number  =       "\"A AD\""      %% PN LPS5001
       number  =       "\"A D(D)\""    %% RJ 34(49)
       number  =       "\"A D\""       %% XNSS 288811
       number  =       "\"A D\\.D\""   %% Version 3.20
       number  =       "\"A-A-D-D\""   %% UMIAC-TR-89-11
       number  =       "\"A-A-D\""     %% CS-TR-2189
       number  =       "\"A-A-D\\.D\"" %% CS-TR-21.7

       For  a  bibliography  that  contains  only  article  entries, this list should probably be
       reduced to just the first pattern, so that anything other than a digit  string  fails  the
       pattern-match  test.   This  is easily done by keeping bibliography-specific patterns in a
       corresponding file with extension .ini, since that file is read automatically.

       You should be sure to use empty pattern strings in this pattern file to  discard  patterns
       from earlier initialization files.

       The  value  strings  passed  to  the  pattern  matcher  contain surrounding quotes, so the
       patterns should also.  However, you could use a pattern specification like "\"D" to  match
       an  initial  digit  string  followed by anything else; the omission of the final quotation
       mark \" in the pattern allows  the  match  to  succeed  without  checking  that  the  next
       character in the value string is a quotation mark.

       Because  the  value  strings  are  intended  to  be processed by TeX, the pattern matching
       ignores braces, and TeX control sequences, together with any space following those control
       sequences.   Spaces  around  braces  are  preserved.   This  convention allows the pattern
       fragment A-AD-D  to  match  the  value  string  TN-K\slash 27-70,  because  the  value  is
       implicitly collapsed to TN-K27-70 during the matching operation.
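       The implicit collapsing of a value string before matching can be sketched in Python.
       The function name is hypothetical, and one detail is an assumption: spaces are dropped
       only after control words (a backslash followed by letters), while a control symbol
       such as \" is removed on its own.

```python
import re

# A control word with any following space, or a single-character control symbol.
CONTROL_SEQUENCE = re.compile(r"\\[A-Za-z]+\s*|\\.")


def collapse_for_matching(value: str) -> str:
    """Drop TeX control sequences and braces, as the pattern matcher sees them."""
    without_controls = CONTROL_SEQUENCE.sub("", value)
    return without_controls.replace("{", "").replace("}", "")
```

       With this helper, TN-K\slash 27-70 collapses to TN-K27-70, which the pattern fragment
       A-AD-D then matches.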

       bibclean's  normal  action  when  a  string  value fails to match any of the corresponding
       patterns is to issue a warning message something like this: "Unexpected value in ``year =
       "192"''".  In most cases, that is sufficient to alert the user to a problem.  In some
       cases, however, it may be desirable to associate a different  message  with  a  particular
       pattern.   This  can  be  done by supplying a message string following the pattern string.
       Format items %% (single percent), %e (entry name), %f (field name), %k (citation key), and
       %v  (string  value) are available to get current values expanded in the messages.  Here is
       an example:

       chapter = "\"D:D\"" "Colon found in ``%f = %v''" %% 23:2

       To be consistent with other messages output by bibclean, the message string should not end
       with punctuation.

       If  you  wish  to  make  the message an error, rather than just a warning, begin it with a
       query (?), like this:

       chapter = "\"D:D\"" "?Colon found in ``%f = %v''" %% 23:2

       The question mark is not included in the output message.
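
       The expansion of format items and the effect of a leading ? can be sketched in
       Python.  This is a minimal illustration, not bibclean's code: the function name
       expand_message and its argument order are hypothetical, and the sample value is
       assumed to carry its surrounding quotation marks.

```python
import re

def expand_message(template, entry, field, key, value):
    """Return (severity, text) for a bibclean-style message template."""
    severity = "warning"
    if template.startswith("?"):
        severity = "error"
        template = template[1:]  # the leading ? is not part of the message
    items = {"%": "%", "e": entry, "f": field, "k": key, "v": value}
    text = re.sub(r"%([%efkv])", lambda m: items[m.group(1)], template)
    return severity, text

severity, text = expand_message(
    "?Colon found in ``%f = %v''", "Book", "chapter", "Knuth:TB86", '"23:2"'
)
print(severity, text)  # error Colon found in ``chapter = "23:2"''
```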

       Escape sequences are supported in message strings, just as they are  in  pattern  strings.
       You can use this to advantage for fancy things, such as terminal display mode control.  If
       you rewrite the previous example as

       chapter = "\"D:D\"" \
                 "?\033[7mColon found in ``%f = %v''\033[0m" %% 23:2

       the error message will appear in inverse  video  on  display  screens  that  support  ANSI
       terminal  control sequences.  Such practice is not normally recommended, since it may have
       undesirable effects on some output devices.  Nevertheless, you  may  find  it  useful  for
       restricted applications.

       For  some  types  of  bibliography  fields,  bibclean  contains  special-purpose  code  to
       supplement or replace the pattern matching:

              •  CODEN, ISBN and ISSN field values are handled this way because their  validation
                 requires evaluation of checksums that cannot be expressed by simple patterns; no
                 patterns are even used in these three cases.

              •  chapter, number, pages, and volume values are checked only by pattern matching.

              •  month values are first checked against the standard BibTeX month  abbreviations,
                 and only if no match is found are patterns then used.

              •  year  values  are first checked against patterns, then if no match is found, the
                 year numbers are found and converted  to  integer  values  for  testing  against
                 reasonable bounds.
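
       To show why checksums cannot be expressed as patterns, here is a Python sketch of
       the standard modulo-11 check for 10-digit ISBNs and for ISSNs.  It illustrates the
       published check-digit rules only; it makes no claim about how bibclean's own
       routines are written, and it does not cover CODEN values.  The function names are
       hypothetical.

```python
def _mod11_ok(text, length):
    """Weighted modulo-11 check; a final X stands for the value 10."""
    chars = [c for c in text.upper() if c in "0123456789X"]
    if len(chars) != length or "X" in chars[:-1]:
        return False
    digits = [10 if c == "X" else int(c) for c in chars]
    weights = range(length, 0, -1)  # length, length-1, ..., 1
    return sum(w * d for w, d in zip(weights, digits)) % 11 == 0

def isbn10_ok(isbn):
    """Validate a 10-digit ISBN, ignoring hyphens and spaces."""
    return _mod11_ok(isbn, 10)

def issn_ok(issn):
    """Validate an 8-character ISSN, ignoring the hyphen."""
    return _mod11_ok(issn, 8)

print(isbn10_ok("0-201-13448-9"))  # True
print(isbn10_ok("0-201-13448-8"))  # False
print(issn_ok("0896-3207"))        # True
```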

       Values  for  other fields are checked only against patterns.  You can provide patterns for
       any field you like, even ones bibclean does not already know about.  New ones  are  simply
       added to an internal table that is searched for each string to be validated.

       The  special  field,  key,  represents  the  bibliographic  citation key.  It can be given
       patterns, like any other field.  Here is an initialization file  pattern  assignment  that
       will match an author name, a colon, an alphabetic string, and a two-digit year:

       key = "A:Add"                     %% Knuth:TB86

       Notice  that no quotation marks are included in the pattern, because the citation keys are
       not quoted.  You can use such patterns to help  enforce  uniform  naming  conventions  for
       citation keys, which is increasingly important as your bibliography data base grows.
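
       A similar check can be run over a list of keys outside bibclean.  The Python sketch
       below hand-translates the pattern A:Add into a regular expression, assuming A means
       one or more letters and d a single digit.  It uses a full match, which is stricter
       than the prefix-style matching described earlier for value patterns, so it may
       reject keys that bibclean accepts.  The names KEY_RE and nonconforming are
       hypothetical.

```python
import re

# Hand translation of A:Add: letters, a colon, letters, exactly two digits.
KEY_RE = re.compile(r"[A-Za-z]+:[A-Za-z]+[0-9]{2}")

def nonconforming(keys):
    """Return the keys that do not follow the Author:TagYY convention."""
    return [k for k in keys if KEY_RE.fullmatch(k) is None]

print(nonconforming(["Knuth:TB86", "Lamport:LaTeX94", "knuth86", "Reid:Scribe1980"]))
# ['knuth86', 'Reid:Scribe1980']
```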

LEXICAL ANALYSIS

       When  -no-prettyprint  is  specified,  bibclean  acts  as  a lexical analyzer instead of a
       prettyprinter, producing output in lines of the form

              <token-number><tab><token-name><tab>"<token-value>"

       Each output line contains a single complete token, identified by a  small  integer  number
       for  use by a computer program, a token type name for human readers, and a string value in
       quotes.

       Special characters in the token value string are  represented  with  ANSI/ISO  Standard  C
       escape  sequences,  so  all  characters  other  than NUL are representable, and multi-line
       values can be represented in a single line.

       Here are the token numbers and token type names that can appear in the output when
       -no-prettyprint is specified:

               0   UNKNOWN
               1   ABBREV
               2   AT
               3   COMMA
               4   COMMENT
               5   ENTRY
               6   EQUALS
               7   FIELD
               8   INCLUDE
               9   INLINE
              10   KEY
              11   LBRACE
              12   LITERAL
              13   NEWLINE
              14   PREAMBLE
              15   RBRACE
              16   SHARP
              17   SPACE
              18   STRING
              19   VALUE
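
       Programs that consume the token stream may find it convenient to mirror this table
       in code.  Here is one possible Python rendering; the class name TokenType is
       hypothetical, while the names and numbers are copied from the table above.

```python
from enum import IntEnum

# Names listed in table order; start=0 makes UNKNOWN = 0 and VALUE = 19.
TokenType = IntEnum(
    "TokenType",
    "UNKNOWN ABBREV AT COMMA COMMENT ENTRY EQUALS FIELD INCLUDE INLINE "
    "KEY LBRACE LITERAL NEWLINE PREAMBLE RBRACE SHARP SPACE STRING VALUE",
    start=0,
)

print(int(TokenType.KEY))  # 10
print(TokenType(7).name)   # FIELD
```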

       Programs  that  parse  such  output  should  also be prepared for lines beginning with the
       warning prefix, %%, or the error prefix, ??, and  for  ANSI/ISO  Standard  C  line  number
       directives of the form
              # line 273 "texbook1.bib"
       which record the line number and file name of the current input file.

       If  a  -max-width nnn command-line option was specified, long output lines will be wrapped
       at a backslash-newline pair, and consequently, software that processes the  lexical  token
       stream should be prepared to collapse such wrapped lines back into single lines.
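
       The points above can be combined into a small reader for the token stream.  The
       Python sketch below is an assumption-laden illustration, not a reference parser: the
       sample text is hand-written rather than real bibclean output, the function name
       parse_tokens is hypothetical, and C escape sequences in token values are left
       undecoded.

```python
import re

# <token-number><tab><token-name><tab>"<token-value>"
TOKEN_RE = re.compile(r'(\d+)\t([A-Z]+)\t"(.*)"$')

def parse_tokens(text):
    """Return (number, name, raw_value) tuples from lexical analyzer output."""
    # Undo -max-width wrapping by dropping every backslash-newline pair.
    text = text.replace("\\\n", "")
    tokens = []
    for line in text.split("\n"):
        if line.startswith(("%%", "??", "#")):
            continue  # warning, error, or line number directive
        m = TOKEN_RE.match(line)
        if m:
            tokens.append((int(m.group(1)), m.group(2), m.group(3)))
    return tokens

# Hand-written sample; the KEY line is wrapped at a backslash-newline pair.
SAMPLE = (
    '# line 1 "mylib.bib"\n'
    '2\tAT\t"@"\n'
    '5\tENTRY\t"Book"\n'
    '11\tLBRACE\t"{"\n'
    '10\tKEY\t"Knuth:\\\nTB86"\n'
    '%% hypothetical warning line\n'
)
keys = sorted(v for _, name, v in parse_tokens(SAMPLE) if name == "KEY")
print(keys)  # ['Knuth:TB86']
```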

       As an example of the use of -no-prettyprint, the UNIX command pipeline
              bibclean -no-prettyprint mylib.bib | \
                  awk '$2 == "KEY" {print $3}' | \
                  sed -e 's/"//g' | \
                  sort
       will extract a sorted list of all citation keys in the file mylib.bib.

       A  certain  amount  of  processing  will  have  been  done  on the tokens.  In particular,
       delimiters equivalent to braces will have been replaced by braces, and braced strings will
       have become quoted strings.

       The  LITERAL token type is used for arbitrary text that bibclean does not examine further,
       such as the contents of a @Preamble{...} or a @Comment{...}.

       The UNKNOWN token type should never appear in the output stream.  It is used internally to
       initialize token type variables.

SCRIBE BIBLIOGRAPHY FORMAT

       bibclean's  support  for the Scribe bibliography format is based on the syntax description
       in the Scribe Introductory User's Manual, 3rd Edition, May 1980.   Scribe  was  originally
       developed  by  Brian  Reid at Carnegie-Mellon University, and is now marketed by Unilogic,
       Ltd.

       The BibTeX bibliography format was strongly influenced by Scribe, and indeed,  with  care,
       it  is  possible to share bibliography files between the two systems.  Nevertheless, there
       are some differences, so here is a summary of features of  the  Scribe  bibliography  file
       format:

       (1)   Letter case is not significant in field names and entry names, but case is preserved
             in value strings.

       (2)   In field/value pairs, the  field  and  value  may  be  separated  by  one  of  three
             characters: =, /, or space.  Space may optionally surround these separators.

       (3)   Value delimiters are any of these seven pairs:
             { }   [ ]   ( )   < >   ' '   " "   ` `

       (4)   Value delimiters may not be nested, even though with the first four delimiter pairs,
             nested balanced delimiters would be unambiguous.

       (5)   Delimiters  can  be  omitted  around values that contain only letters, digits, sharp
             (#), ampersand (&), period (.), and percent (%).

       (6)   Outside of delimited values, a literal at-sign (@) is  represented  by  doubled  at-
             signs (@@).

       (7)   Bibliography  entries  begin  with @name, as for BibTeX, but any of the seven Scribe
             value delimiter pairs may be used to surround the values in field/value  pairs.   As
             in (4), nested delimiters are forbidden.

       (8)   Arbitrary space may separate entry names from the following delimiters.

       (9)   @Comment is a special command whose delimited value is discarded.  As in (4), nested
             delimiters are forbidden.

       (10)  The special form

             @Begin{comment}
              ...
             @End{comment}

             permits encapsulating arbitrary text containing any characters or delimiters,  other
             than  ``@End{comment}''.   Any  of  the seven delimiter pairs may be used around the
             word ``comment'' following the ``@Begin'' or ``@End''; the  delimiters  in  the  two
             cases  need not be the same, and consequently, ``@Begin{comment}''/``@End{comment}''
             pairs may not be nested.

       (11)  The key field is required in each bibliography entry.

       (12)  A backslashed quote in a string will be assumed to  be  a  TeX  accent,  and  braced
             appropriately.   While  such  accents do not conform to Scribe syntax, Scribe-format
             bibliographies have been found that appear to be intended for TeX processing.

       Because of this loose syntax,  bibclean's  normal  error  detection  heuristics  are  less
       effective,  and  consequently, Scribe mode input is not the default; it must be explicitly
       requested.
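
       Rules (1) through (5) can be illustrated with a Python sketch that parses a single
       field/value pair.  It is a simplified illustration, not bibclean's Scribe parser:
       the delimiter table is copied from item (3) as printed above, the field-name syntax
       in NAME is an assumption, and the function name parse_pair is hypothetical.

```python
import re

# Opening -> closing delimiter, copied from item (3) as printed above.
DELIMS = {"{": "}", "[": "]", "(": ")", "<": ">", "'": "'", '"': '"', "`": "`"}
NAME = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*)")  # assumed field-name syntax
SEP = re.compile(r"\s*[=/]\s*|\s+")              # item (2): =, /, or space
BARE = re.compile(r"[A-Za-z0-9#&.%]+")           # item (5): undelimited values

def parse_pair(text):
    """Parse one Scribe-style field/value pair into (field, value)."""
    m = NAME.match(text)
    if not m:
        raise ValueError("no field name")
    field, pos = m.group(1).lower(), m.end()     # item (1): case-insensitive
    m = SEP.match(text, pos)
    if not m:
        raise ValueError("no separator")
    pos = m.end()
    if pos >= len(text):
        raise ValueError("missing value")
    opener = text[pos]
    if opener in DELIMS:
        # Item (4): no nesting, so the first closing delimiter ends the value.
        close = text.find(DELIMS[opener], pos + 1)
        if close < 0:
            raise ValueError("unterminated value")
        return field, text[pos + 1:close]
    m = BARE.match(text, pos)
    if not m:
        raise ValueError("bad undelimited value")
    return field, m.group(0)

print(parse_pair("Author = <Brian Reid>"))  # ('author', 'Brian Reid')
print(parse_pair("Year 1980"))              # ('year', '1980')
```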

ENVIRONMENT VARIABLES

       BIBCLEANEXT  File extension of bibliography-specific initialization files.  Default: .ini.

       BIBCLEANINI  Name of bibclean initialization files.  Default: .bibcleanrc.

       BIBINPUTS    Search path for bibclean and BibTeX input files.  This is  a  colon-separated
                    list of directories that are searched in order from first to last.  It is not
                    an error for a specified directory to not exist.
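
       The BIBINPUTS search order can be pictured with a short Python sketch.  It only
       illustrates the rule stated above (first match wins, missing directories are
       skipped); whether bibclean consults other locations first, and how it treats empty
       path entries, is not claimed here.  The function name find_in_bibinputs is
       hypothetical.

```python
import os
import tempfile

def find_in_bibinputs(filename, bibinputs):
    """Return the first existing path for filename along a colon-separated list."""
    for directory in bibinputs.split(":"):
        if not directory or not os.path.isdir(directory):
            continue  # missing (or empty) entries are skipped, not errors
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None

# Usage: the first directory does not exist, the second holds the file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "mylib.bib"), "w").close()
    search_path = "/no/such/dir:" + tmp
    found = find_in_bibinputs("mylib.bib", search_path)
    print(found == os.path.join(tmp, "mylib.bib"))  # True
```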

FILES

       *.bib          BibTeX and Scribe bibliography data base files.

       *.ini          File-specific initialization files.

       /usr/share/bibcleanrc, /etc/bibcleanrc
                      System-wide initialization files.

       .bibcleanrc    User-specific initialization files.

SEE ALSO

       bibcheck(1), bibdup(1), bibextract(1), bibindex(1),  bibjoin(1),  biblabel(1),  biblex(1),
       biblook(1),  biborder(1),  bibparse(1),  bibsort(1),  bibtex(1), bibunlex(1), citefind(1),
       citesub(1), citetags(1), latex(1), scribe(1), tex(1).

AUTHOR

       Nelson H. F. Beebe
       Center for Scientific Computing
       University of Utah
       Department of Mathematics, 322 INSCC
       155 S 1400 E RM 233
       Salt Lake City, UT 84112-0090
       USA
       Tel: +1 801 581 5254
       FAX: +1 801 585 1640, +1 801 581 4148
       Email: beebe@math.utah.edu, beebe@acm.org, beebe@ieee.org (Internet)
       URL: http://www.math.utah.edu/~beebe

       This Debianization of bibclean was done  by  Henning  Makholm  <henning@makholm.net>,  and
       differs from the upstream source in where it looks for the system-wide initialization file
       (vanilla bibclean expects to find it in $PATH), and has also been patched  to  ignore  the
       built-in BibTeX field-length limit for abstract fields.