Provided by: libtext-bibtex-perl_0.89-1_amd64

NAME

       Text::BibTeX::Name - interface to BibTeX-style author names

SYNOPSIS

          use Text::BibTeX::Name;

          $name = Text::BibTeX::Name->new();
          $name->split('J. Random Hacker');
          # or:
          $name = Text::BibTeX::Name->new('J. Random Hacker');

          @firstname_tokens = $name->part ('first');
          $lastname = join (' ', $name->part ('last'));

          $format = Text::BibTeX::NameFormat->new();
          # ...customize $format...
          $formatted = $name->format ($format);

DESCRIPTION

       "Text::BibTeX::Name" provides an abstraction for BibTeX-style names and some basic
       operations on them.  A name, in the BibTeX world, consists of a list of tokens which are
       divided amongst four parts: `first', `von', `last', and `jr'.

       Tokens are separated by whitespace or commas at brace-level zero.  Thus the name

          van der Graaf, Horace Q.

       has five tokens, whereas the name

          {Foo, Bar, and Sons}

       consists of a single token.  Skip down to "EXAMPLES" for more examples, or read on if you
       want to know the exact details of how names are split into tokens and parts.
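
       The tokenizing rule can be sketched in a few lines of plain Perl.  This is only an
       illustration of the rule as described here, not the C code that Text::BibTeX actually
       uses; the helper name "tokenize_name" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::BibTeX: split a name into tokens
# at whitespace or commas that occur at brace-level zero.
sub tokenize_name {
    my ($name) = @_;
    my @tokens;
    my $current = '';
    my $depth   = 0;
    for my $char (split //, $name) {
        $depth++ if $char eq '{';
        $depth-- if $char eq '}' && $depth > 0;
        if ($depth == 0 && $char =~ /[\s,]/) {
            # Separator outside braces: close the token in progress, if any.
            push @tokens, $current if length $current;
            $current = '';
        }
        else {
            $current .= $char;
        }
    }
    push @tokens, $current if length $current;
    return @tokens;
}

my @a = tokenize_name('van der Graaf, Horace Q.');
my @b = tokenize_name('{Foo, Bar, and Sons}');
print scalar(@a), " tokens: @a\n";   # prints "5 tokens: van der Graaf Horace Q."
print scalar(@b), " token: @b\n";    # prints "1 token: {Foo, Bar, and Sons}"
```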

       How tokens are divided into parts depends on the form of the name.  If the name has no
       commas at brace-level zero (as in the second example), then it is assumed to be in either
       "first last" or "first von last" form.  If there are no tokens that start with a lower-
       case letter, then "first last" form is assumed: the final token is the last name, and all
       other tokens form the first name.  Otherwise, the earliest contiguous sequence of tokens
       with initial lower-case letters is taken as the `von' part; if this sequence includes the
       final token, then a warning is printed and the final token is forced to be the `last'
       part.

       If a name has a single comma, then it is assumed to be in "von last, first" form.  A
       leading sequence of tokens with initial lower-case letters, if any, forms the `von' part;
       tokens between the `von' and the comma form the `last' part; tokens following the comma
       form the `first' part.  Again, if there are no tokens following a leading sequence of
       lowercase tokens, a warning is printed and the token immediately preceding the comma is
       taken to be the `last' part.

       If a name has two commas, it is assumed to be in "von last, jr, first" form.  (This is
       the only way to represent a name with a `jr' part.)  The parsing of the name is the same
       as for a one-comma name, except that tokens between the two commas are taken to be the
       `jr' part.

       Finally, if a name has more than two commas, a warning is printed and the name is treated
       as though only the first two commas were present.
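
       The rules above can be sketched in plain Perl as follows.  This is a simplified
       illustration, not the library's implementation: the function names are hypothetical, a
       token counts as lower-case only if its first character is an ASCII lower-case letter
       (an assumption), no warnings are printed, and tokens after a third or later comma are
       simply folded into the `first' part.

```perl
use strict;
use warnings;

# Split a name into comma-separated segments of tokens, honouring braces.
sub segments {
    my ($name) = @_;
    my @segments = ([]);
    my ($current, $depth) = ('', 0);
    for my $char (split //, $name) {
        $depth++ if $char eq '{';
        $depth-- if $char eq '}' && $depth > 0;
        if ($depth == 0 && $char =~ /[\s,]/) {
            push @{ $segments[-1] }, $current if length $current;
            $current = '';
            push @segments, [] if $char eq ',';
        }
        else {
            $current .= $char;
        }
    }
    push @{ $segments[-1] }, $current if length $current;
    return @segments;
}

# Peel a leading run of lower-case tokens off @$tokens, always leaving at
# least one token behind to serve as the `last' part.
sub take_von {
    my ($tokens) = @_;
    my @von;
    push @von, shift @$tokens while @$tokens > 1 && $tokens->[0] =~ /^[a-z]/;
    return @von;
}

sub split_name {
    my ($name) = @_;
    my @seg  = segments($name);
    my %part = (first => [], von => [], last => [], jr => []);

    if (@seg == 1) {
        # No comma: "first last" or "first von last".
        my @tokens = @{ $seg[0] };
        push @{ $part{first} }, shift @tokens
            while @tokens > 1 && $tokens[0] !~ /^[a-z]/;
        $part{von}  = [ take_von(\@tokens) ];
        $part{last} = [ @tokens ];
    }
    else {
        # One comma: "von last, first".  Two commas: "von last, jr, first".
        my @head = @{ shift @seg };
        $part{von}   = [ take_von(\@head) ];
        $part{last}  = [ @head ];
        $part{jr}    = [ @{ shift @seg } ] if @seg > 1;
        $part{first} = [ map { @$_ } @seg ];   # any further commas are ignored
    }
    return \%part;
}

for my $name ('Ludwig van Beethoven', 'van Beethoven, Ludwig', 'Doe, Jr., John') {
    my $p = split_name($name);
    print "$name\n";
    print "  $_ => (@{ $p->{$_} })\n" for qw(first von last jr);
}
```

       The first two names should come out with identical parts, and the third should place
       "Jr." in the `jr' part.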

CAVEAT

       The C code that does the actual work of splitting up names takes a shortcut and makes a
       few assumptions about whitespace.  In particular, there must be no leading whitespace, no
       trailing whitespace, no consecutive whitespace characters in the string, and no whitespace
       characters other than space.  In other words, all whitespace must consist of lone internal
       spaces.
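
       One way to satisfy these requirements is to clean up a string before handing it to
       "new" or "split".  The helper below is a minimal sketch; its name is hypothetical and
       it is not part of Text::BibTeX.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce all whitespace to lone internal spaces.
sub normalize_whitespace {
    my ($name) = @_;
    $name =~ s/\s+/ /g;     # tabs, newlines, and runs of spaces become one space
    $name =~ s/^ | $//g;    # drop the leading and trailing space, if any
    return $name;
}

print '[', normalize_whitespace("  J.\tRandom \n Hacker "), "]\n";   # prints "[J. Random Hacker]"
```

       Note that this also collapses whitespace inside braced tokens, which is usually
       harmless for names destined for TeX.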

EXAMPLES

       The strings "John Smith" and "Smith, John" are different representations of the same name,
       so they split into parts and tokens the same way, namely as:

          first => ('John')
          von   => ()
          last  => ('Smith')
          jr    => ()

       Note that every part is a list of tokens, even if there is only one token in that part;
       empty parts get empty token lists.  Every token is just a string.  Writing this example in
       actual code is simple:

          $name = Text::BibTeX::Name->new("John Smith");  # or "Smith, John"
          $name->part ('first');       # returns list ("John")
          $name->part ('last');        # returns list ("Smith")
          $name->part ('von');         # returns list ()
          $name->part ('jr');          # returns list ()

       (We'll omit the empty parts in the rest of the examples: just assume that any unmentioned
       part is an empty list.)  If a name has more than two tokens and no comma, the extra
       tokens go to the first name: thus "John Q. Smith" splits into

          first => ("John", "Q.")
          last  => ("Smith")

       and "J. R. R. Tolkien" into

          first => ("J.", "R.", "R.")
          last  => ("Tolkien")

       The ambiguous name "Kevin Philips Bong" splits into

          first => ("Kevin", "Philips")
          last  => ("Bong")

       which may or may not be the right thing, depending on the particular person.  There's no
       way to know though, so if this fellow's last name is "Philips Bong" and not "Bong", the
       string representation of his name must disambiguate.  One possibility is "Philips Bong,
       Kevin" which splits into

          first => ("Kevin")
          last  => ("Philips", "Bong")

       Alternatively, "Kevin {Philips Bong}" takes advantage of the fact that tokens are only
       split on whitespace at brace-level zero, and becomes

          first => ("Kevin")
          last  => ("{Philips Bong}")

       which is fine if your names are destined to be processed by TeX, but might be problematic
       in other contexts.  Similarly, "St John-Mollusc, Oliver" becomes

          first => ("Oliver")
          last  => ("St", "John-Mollusc")

       which can also be written as "Oliver {St John-Mollusc}":

          first => ("Oliver")
          last  => ("{St John-Mollusc}")

       Since tokens are separated only by whitespace and commas, hyphenated names work either way:
       both "Nigel Incubator-Jones" and "Incubator-Jones, Nigel" come out as

          first => ("Nigel")
          last  => ("Incubator-Jones")

       Multi-token last names with lowercase components -- the "von part" -- work fine: both
       "Ludwig van Beethoven" and "van Beethoven, Ludwig" parse (correctly) into

          first => ("Ludwig")
          von   => ("van")
          last  => ("Beethoven")

       This allows these European aristocratic names to sort properly, i.e. van Beethoven under B
       rather than v.  Speaking of aristocratic European names, "Charles Louis Xavier Joseph de
       la Vall{\'e}e Poussin" is handled just fine, and splits into

          first => ("Charles", "Louis", "Xavier", "Joseph")
          von   => ("de", "la")
          last  => ("Vall{\'e}e", "Poussin")

       so could be sorted under V rather than d.  (Note that the sorting algorithm in
       Text::BibTeX::BibSort is a slavish imitation of BibTeX 0.99, and therefore does the wrong
       thing with these names: the sort key starts with the "von" part.)

       However, capitalized "von parts" don't work so well: "R. J. Van de Graaff" splits into

          first => ("R.", "J.", "Van")
          von   => ("de")
          last  => ("Graaff")

       which is clearly wrong.  This name should be represented as "Van de Graaff, R. J."

          first => ("R.", "J.")
          last  => ("Van", "de", "Graaff")

       which is probably right.  (This particular Van de Graaff was an American, so he probably
       belongs under V -- which is where my (British) dictionary puts him.  Other Van de Graaffs'
       mileage may vary.)

       Many names also include a suffix: "Jr.", "III", "fils", and so forth.  These are
       handled, but with some limitations.  If there's a comma before the suffix (the usual U.S.
       convention for "Jr."), then the name should be in last, jr, first form, e.g. "Doe, Jr.,
       John" comes out (correctly) as

          first => ("John")
          last  => ("Doe")
          jr    => ("Jr.")

       but "John Doe, Jr." is ambiguous and is parsed as

          first => ("Jr.")
          last  => ("John", "Doe")

       (so don't do it that way).  If there's no comma before the suffix -- the usual convention
       for Roman numerals, and occasionally seen with "Jr." -- then you're stuck and have to make
       the suffix part of the last name.  Thus, "Gates III, William H." comes out

          first => ("William", "H.")
          last  => ("Gates", "III")

       but "William H. Gates III" is ambiguous, and becomes

          first => ("William", "H.", "Gates")
          last  => ("III")

       -- not what you want.  Again, the curly-brace trick comes in handy, so "William H. {Gates
       III}" splits into

          first => ("William", "H.")
          last  => ("{Gates III}")

       There is no way to make a comma-less suffix the "jr" part.  (This is an unfortunate
       consequence of slavishly imitating BibTeX 0.99.)

       Finally, names that aren't really names of people but rather are organization or company
       names should be forced into a single token by wrapping them in curly braces.  For example,
       "Foo, Bar and Sons" should be written "{Foo, Bar and Sons}", which will split as

          last  => ("{Foo, Bar and Sons}")

       Of course, if this is one name in a BibTeX "author" or "editor" field, this name has to
       be wrapped in braces anyway (because of the " and "), but that's another story.
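
       If organization names arrive from another source, the wrapping can be automated.  The
       sketch below uses two hypothetical helpers (neither is part of Text::BibTeX) and adds
       the outer braces only when the whole string is not already a single braced token.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole string is one brace-delimited group,
# i.e. the brace opened at position 0 is closed by the final character.
sub is_single_braced {
    my ($name) = @_;
    return 0 unless $name =~ /^\{/;
    my $depth = 0;
    my @chars = split //, $name;
    for my $i (0 .. $#chars) {
        $depth++ if $chars[$i] eq '{';
        $depth-- if $chars[$i] eq '}';
        # The opening brace just closed: wrapped only if this is the last character.
        return $i == $#chars ? 1 : 0 if $depth == 0;
    }
    return 0;    # unbalanced braces
}

# Hypothetical helper: force a name into a single token.
sub protect_name {
    my ($name) = @_;
    return is_single_braced($name) ? $name : '{' . $name . '}';
}

print protect_name('Foo, Bar and Sons'), "\n";     # prints "{Foo, Bar and Sons}"
print protect_name('{Foo, Bar and Sons}'), "\n";   # prints "{Foo, Bar and Sons}"
```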

FORMATTING NAMES

       Putting a split-up name back together again in a flexible, customizable way is the job of
       another module: see Text::BibTeX::NameFormat.

METHODS

       new([ [OPTS,] NAME [, FILENAME, LINE, NAME_NUM]])
           Creates a new "Text::BibTeX::Name" object.  If NAME is supplied, it must be a string
           containing a single name, and it will be passed to the "split" method for further
           processing.  FILENAME, LINE, and NAME_NUM, if present, are all also passed to "split"
           to allow better error messages.

           If the first argument is a hash reference, it is used to define configuration values.
           At the moment the available values are:

           BINMODE
               Set the way Text::BibTeX deals with strings. By default it manages strings as
               bytes. You can set BINMODE to 'utf-8' to get NFC normalized UTF-8 strings and you
               can customise the normalization with the NORMALIZATION option.

                  Text::BibTeX::Name->new(
                     { binmode => 'utf-8', normalization => 'NFD' },
                     "Alberto Simões");

       split (NAME [, FILENAME, LINE, NAME_NUM])
           Splits NAME (a string containing a single name) into tokens and subsequently into the
           four parts of a BibTeX-style name (first, von, last, and jr).  (Each part is a list of
           tokens, and tokens are separated by whitespace or commas at brace-depth zero.  See
           above for full details on how a name is split into its component parts.)

           The token-lists that make up each part of the name are then stored in the
           "Text::BibTeX::Name" object for later retrieval or formatting with the "part" and
           "format" methods.

       part (PARTNAME)
           Returns the list of tokens in part PARTNAME of a name previously split with "split".
           For example, suppose a "Text::BibTeX::Name" object is created and initialized like
           this:

              $name = Text::BibTeX::Name->new();
              $name->split ('Charles Louis Xavier Joseph de la Vall{\'e}e Poussin');

           Then this code:

              $name->part ('von');

           would return the list "('de','la')".

       format (FORMAT)
           Formats a name according to the specifications encoded in FORMAT, which should be a
           "Text::BibTeX::NameFormat" (or descendant) object.  (In short, it must supply a method
           "apply" which takes a "Text::BibTeX::Name" object as its only argument.)
           Returns the formatted name as a string.

           See Text::BibTeX::NameFormat for full details on formatting names.

SEE ALSO

       Text::BibTeX::Entry, Text::BibTeX::NameFormat, bt_split_names.

AUTHOR

       Greg Ward <gward@python.net>

COPYRIGHT

       Copyright (c) 1997-2000 by Gregory P. Ward.  All rights reserved.  This file is part of
       the Text::BibTeX library.  This library is free software; you may redistribute it and/or
       modify it under the same terms as Perl itself.