Provided by: texlive-bibtex-extra_2024.20241115-1_all

NAME

       ltx2crossrefxml.pl - create XML files for submitting to crossref.org

SYNOPSIS

       ltx2crossrefxml [-c config_file]  [-o output_file] [-rpi-is-xml]
                       latex_file1 latex_file2 ...

OPTIONS

       -c config_file
           Configuration file.  If this file is absent, defaults are used.  See below for its
           format.

       -o output_file
           Output file.  If this option is not used, the XML is output to stdout.

       -rpi-is-xml
           Do not transform author and title input strings, assume they are valid XML.

       The usual "--help" and "--version" options are also supported. Options can begin with
       either "-" or "--", and can be given in any order.

DESCRIPTION

       For each given latex_file, this script reads ".rpi" and (if they exist) ".bbl" files and
       outputs corresponding XML that can be uploaded to Crossref (<https://crossref.org>). Any
       extension of latex_file is ignored, and latex_file itself is not read (and need not even
       exist).

       Each ".rpi" file specifies the metadata for a single article to be uploaded to Crossref (a
       "journal_article" element in their schema); an example is below. These files are output by
       the "resphilosophica" package (<https://ctan.org/pkg/resphilosophica>) and the TUGboat
       publication procedure (<https://tug.org/TUGboat/repository.html>), but (as always) can
       also be created by hand or by whatever other method you implement.

       Any ".bbl" files present are used for the citation information in the output XML. See the
       CITATIONS section below.

       Unless "--rpi-is-xml" is specified, for all text (authors, title, citations), standard TeX
       control sequences are replaced with plain text or UTF-8 or eliminated, as appropriate. The
       "LaTeX::ToUnicode::convert" routine is used for this
       (<https://ctan.org/pkg/bibtexperllibs>).  Tricky TeX control sequences will almost surely
       not be handled correctly.

       If "--rpi-is-xml" is given, the author and title strings from the ".rpi" files are output
       as-is, on the assumption that they are valid XML; no checking is done.

       Citation text from ".bbl" files is always converted from LaTeX to plain text.

       This script just writes an XML file. It's up to you to do the uploading to Crossref; for
       example, you can use their Java tool "crossref-upload-tool.jar"
       (<https://www.crossref.org/education/member-setup/direct-deposit-xml/https-post>).

       For the definition of the Crossref schema currently output by this script, see
       <https://data.crossref.org/reports/help/schema_doc/5.3.1/index.html> with additional links
       and information at
       <https://www.crossref.org/documentation/schema-library/metadata-deposit-schema-5-3-1/>.

CONFIGURATION FILE FORMAT

       The configuration file is read as Perl code. Thus, comment lines starting with "#" and
       blank lines are ignored. The other lines are typically assignments in the form (spaces are
       optional):

           $variable = value ;

       Usually the value is a "string" enclosed in ASCII double-quote or single-quote characters,
       per Perl syntax. The idea is to specify the user-specific and journal-specific values
       needed for the Crossref upload. The variables which are used are these:

           $depositorName = "Depositor Name";
           $depositorEmail = 'depositor@example.org';
           $registrant = 'Registrant';  # organization name
           $fullTitle = "FULL TITLE";   # journal name
           $issn = "1234-5678";         # required
           $abbrevTitle = "ABBR. TTL."; # optional
           $coden = "CODEN";            # optional

       For a given run, all ".rpi" data read is assumed to belong to the journal that is
       specified in the configuration file. More precisely, the configuration data is written as
       a "journal_metadata" element, with given "full_title", "issn", etc., and then each ".rpi"
       is written as "journal_issue" plus "journal_article" elements.

       The configuration file can also define one Perl function: "LaTeX_ToUnicode_convert_hook".
       If it is defined, it is called at the beginning of the procedure that converts LaTeX text
       to Unicode, which is done with the LaTeX::ToUnicode module, from the "bibtexperllibs"
       package (<https://ctan.org/pkg/bibtexperllibs>). The function must accept one string (the
       LaTeX text), and return one string (presumably the transformed string). The standard
       conversions are then applied to the returned string, so the configured function need only
       handle special cases, such as control sequences particular to the journal at hand.

RPI FILE FORMAT

       Here's the (relevant part of the) ".rpi" file corresponding to the "rpsample.tex" example
       in the "resphilosophica" package (<https://ctan.org/pkg/resphilosophica>):

         %authors=Boris Veytsman\and A. U. Th{\o }r\and C. O. R\"espondent
         %title=A Sample Paper:\\ \emph  {A Template}
         %year=2012
         %volume=90
         %issue=1--2
         %startpage=1
         %endpage=1
         %doi=10.11612/resphil.A31245
         %paperUrl=http://borisv.lk.net/paper12
         %publicationType=full_text

       Other lines, some not beginning with %, are ignored (and not shown).  For more details on
       processing, see the code.
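
       As an illustration of the format only, here is a small Python sketch (not the script's own
       Perl code) that collects the "%key=value" lines shown above into a dictionary and skips
       every other line. The name "parse_rpi" and the exact matching rule are assumptions made
       for this example.

```python
import re

# Hypothetical helper, not part of ltx2crossrefxml: matches "%key=value" lines.
RPI_LINE = re.compile(r"^%(\w+)=(.*)$")

def parse_rpi(text):
    """Return a dict of the %key=value fields found in the text of an .rpi file."""
    fields = {}
    for line in text.splitlines():
        match = RPI_LINE.match(line.strip())
        if match:  # lines of any other shape are ignored
            fields[match.group(1)] = match.group(2).strip()
    return fields

sample = r"""
\relax
%authors=Boris Veytsman\and A. U. Th{\o }r\and C. O. R\"espondent
%title=A Sample Paper:\\ \emph  {A Template}
%year=2012
%doi=10.11612/resphil.A31245
"""
fields = parse_rpi(sample)
print(fields["year"], fields["doi"])
```

       The values are kept as raw TeX here; converting them to plain text is a separate step.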

       The %paperUrl value is what will be associated with the given %doi (output as the
       "resource" element). Crossref strongly recommends that the url be for a so-called landing
       page, and not directly for a pdf
       (<https://www.crossref.org/education/member-setup/creating-a-landing-page/>).  Special
       case: if the url is not specified, and the journal is Res Philosophica, a special-purpose
       search url using pdcnet.org is used.  Every other journal must always specify %paperUrl.

       The %authors field is split at "\and" (ignoring whitespace before and after), and output
       as the "contributors" element, using "sequence="first"" for the first listed,
       "sequence="additional"" for the remainder. The authors are parsed using
       "BibTeX::Parser::Author" (<https://ctan.org/pkg/bibtexperllibs>).

       If the %publicationType is not specified, it defaults to "full_text", since that has
       historically been the case; "full_text" can also be given explicitly. The other values
       allowed by the Crossref schema are "abstract_only" and "bibliographic_record". Finally, if
       the value is "omit", the "publication_type" attribute is omitted entirely from the given
       "journal_article" element.

       Each ".rpi" must contain information for only one article, but multiple files can be read
       in a single run. It would not be difficult to support multiple articles in a single ".rpi"
       file, but it makes debugging and error correction easier to keep the input to one article
       per file.
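
       The %publicationType rule described above can be sketched in Python as follows. This is
       an illustration of the documented behavior, not the script's code; the function name and
       the "fields" dictionary are assumptions, and no validation of other values is attempted
       because none is described here.

```python
def publication_type_attribute(fields):
    """Return the publication_type attribute text for one article, or ""."""
    value = fields.get("publicationType", "full_text")  # documented default
    if value == "omit":
        return ""  # attribute is left out of journal_article entirely
    return f' publication_type="{value}"'

print(publication_type_attribute({}))
print(publication_type_attribute({"publicationType": "abstract_only"}))
print(repr(publication_type_attribute({"publicationType": "omit"})))
```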

   MORE ABOUT AUTHOR NAMES
       The three formats for names recognized are (not coincidentally) the same as BibTeX:

          First von Last
          von Last, First
          von Last, Jr., First

       The forms can be freely intermixed within a single %authors line, separated with "\and"
       (including the backslash). Commas as name separators are not supported, unlike BibTeX.

       In short, you may almost always use the first form; you shouldn't if either there's a Jr
       part, or the Last part has multiple tokens but there's no von part. See the "btxdoc"
       (``BibTeXing'' by Oren Patashnik) document for details. The authors are parsed using
       "BibTeX::Parser::Author" (<https://ctan.org/pkg/bibtexperllibs>).

       In the %authors line of a ".rpi" file, some secondary directives are recognized, indicated
       by "|" characters. Easiest to explain with an example:

         %authors=|organization|\LaTeX\ Project Team \and Alex Brown|orcid=123

       Thus: 1) if "|organization|" is specified, the author name will be output as an
       "organization" contributor, instead of the usual "person_name", as the Crossref schema
       requires.

       2) If "|orcid=value|" is specified, the value is output as an "ORCID" element for that
       "person_name".

       These two directives, "|organization|" and "|orcid=value|", are mutually exclusive,
       because that's how the Crossref schema defines them. The "=" sign after "orcid" is
       required, while all spaces after the "orcid" keyword are ignored. Other than that, the
       ORCID value is output literally. (E.g., the ORCID value of 123 above is clearly invalid,
       but it would be output anyway, with no warning.)

       Extra "|" characters, at the beginning or end of the entire %authors string, or doubled in
       the middle, are accepted and ignored. Whitespace is ignored around all "|" characters.
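
       To make the splitting rules concrete, here is a Python sketch of how an %authors value
       could be broken into contributors. It is an illustration under stated assumptions, not
       the script's implementation: the function name and the dictionary layout are invented for
       this example, and the name itself is left unparsed (the script uses
       "BibTeX::Parser::Author" for that).

```python
import re

def split_authors(authors):
    """Split an %authors value into contributors and pick out the |...| directives."""
    result = []
    # Split at the control sequence "\and", ignoring whitespace on both sides.
    chunks = re.split(r"\s*\\and(?![A-Za-z])\s*", authors.strip())
    for index, chunk in enumerate(chunks):
        # Empty parts come from extra or doubled "|" characters and are dropped.
        parts = [part.strip() for part in chunk.split("|") if part.strip()]
        entry = {
            "name": None,
            "organization": False,
            "orcid": None,
            "sequence": "first" if index == 0 else "additional",
        }
        for part in parts:
            orcid = re.match(r"orcid\s*=\s*(.*)$", part)
            if part == "organization":
                entry["organization"] = True
            elif orcid:
                entry["orcid"] = orcid.group(1)  # kept literally, no validation
            else:
                entry["name"] = part
        result.append(entry)
    return result

authors = split_authors(r"|organization|\LaTeX\ Project Team \and Alex Brown|orcid=123")
for author in authors:
    print(author)
```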

CITATIONS

       Each ".bbl" file corresponding to an input ".rpi" file is read and used to output a
       "citation_list" element for that "journal_article" in the output XML. If no ".bbl" file
       exists for a given ".rpi", no "citation_list" is output for that article.

       The ".bbl" processing is rudimentary: only so-called "unstructured_citation" references
       are produced for Crossref, that is, the contents of the citation (each paragraph in the
       ".bbl") is dumped as a single flat string without markup.

       Bibliography text is unconditionally converted from TeX to XML, via the method described
       above. It is not unusual for the conversion to be incomplete or incorrect.  It is up to
       you to check for this; e.g., if any backslashes remain in the output, it is most likely an
       error.
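
       One simple way to do that check is to list every line of the output that still contains a
       backslash. The Python sketch below is only a suggestion and is not part of the script; the
       function name is hypothetical.

```python
def lines_with_backslashes(xml_text):
    """Return (line number, line) pairs for output lines containing a backslash."""
    return [
        (number, line)
        for number, line in enumerate(xml_text.splitlines(), start=1)
        if "\\" in line
    ]

sample_xml = "\n".join([
    "<unstructured_citation>Donald E. Knuth. The TeXbook.</unstructured_citation>",
    r"<unstructured_citation>A. U. Thor. \foo{Untranslated}.</unstructured_citation>",
])
suspects = lines_with_backslashes(sample_xml)
for number, line in suspects:
    print(f"line {number}: {line}")
```

       For a real run, the text would be read from the file given to "-o", for example with
       open("result.xml", encoding="utf-8").read().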

       Furthermore, it is assumed that the ".bbl" file contains a sequence of references, each
       starting with "\bibitem{KEY}" (which itself must be at the beginning of a line, preceded
       only by whitespace), and the whole bibliography ending with "\end{thebibliography}"
       (similarly at the beginning of a line). A bibliography not following this format will not
       produce useful results. Bibliographies can be created by hand, or with BibTeX, or any
       other method.

       The "key" attribute for the "citation" element is taken as the KEY argument to the
       "\bibitem" command. The sequential number of the citation (1, 2, ...) is appended. The
       argument to "\bibitem" can be empty ("\bibitem{}"), in which case the sequence number is
       used on its own.  Although TeX will not handle empty "\bibitem" keys, this can be
       convenient when creating a ".bbl" purely for Crossref.
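
       The expected ".bbl" layout can be illustrated with a Python sketch that splits a
       bibliography into its references. This is not the script's code: the function name is
       hypothetical, only the plain "\bibitem{KEY}" form described above is recognized, and the
       way the script combines KEY and sequence number into the "key" attribute is deliberately
       not reproduced.

```python
import re

# Both markers must start a line, preceded only by spaces or tabs.
BIBITEM = re.compile(r"^[ \t]*\\bibitem\{([^}]*)\}", re.MULTILINE)
END_BIB = re.compile(r"^[ \t]*\\end\{thebibliography\}", re.MULTILINE)

def split_bbl(text):
    """Return (sequence number, KEY, raw citation text) for each reference."""
    end = END_BIB.search(text)
    body = text[: end.start()] if end else text
    starts = list(BIBITEM.finditer(body))
    items = []
    for number, match in enumerate(starts, start=1):
        # A reference runs up to the next \bibitem, or to the end of the body.
        stop = starts[number].start() if number < len(starts) else len(body)
        raw = " ".join(body[match.end():stop].split())  # one flat string
        items.append((number, match.group(1), raw))
    return items

sample_bbl = r"""\begin{thebibliography}{9}
\bibitem{knuth84}
Donald E. Knuth.
\newblock The TeXbook. Addison-Wesley, 1984.

  \bibitem{}
Leslie Lamport. LaTeX: A Document Preparation System.
\end{thebibliography}
"""
items = split_bbl(sample_bbl)
for number, key, raw in items:
    print(number, repr(key), raw)
```

       The raw text still contains TeX such as "\newblock"; in the script it is the conversion
       step that turns each reference into plain text.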

       The ".rpi" file is also checked for the bibliography information, in this same format.

       Feature request: if anyone is interested in figuring out how to generate structured
       citations
       (<https://data.crossref.org/reports/help/schema_doc/5.3.1/common5_3_1_xsd.html#citation>),
       that would be great. The schema does not support many useful fields, so we also want to
       keep the unstructured text output.

       Norman Gray's beastie program (<https://heptapod.host/nxg/beastie>) supports this, via
       "beastie extract-bib.scm -O crossref $(doc).aux", as invoked in the TUGboat "Common.mak"
       file. Work in progress.

       By the way, if for some reason we have to switch away from using beastie, the most viable
       approach is probably to change "tugboat.bst" to output no-op TeX commands like
       \tubibauthor, \tubibtitle, etc. (a la biblatex), and use those commands to discern the
       various crossref field values. We can't start from the .bib because then we'd have to
       reimplement Bib(La)TeX.

EXAMPLES

         ltx2crossrefxml.pl ../paper1/paper1.tex ../paper2/paper2.tex \
                             -o result.xml

         ltx2crossrefxml.pl -c myconfig.cfg paper.tex -o paper.xml

AUTHOR

       Boris Veytsman <https://github.com/borisveytsman/crossrefware>

COPYRIGHT AND LICENSE

       Copyright (C) 2012-2024  Boris Veytsman

       This is free software.  You may redistribute copies of it under the terms of the GNU
       General Public License (any version) <https://www.gnu.org/licenses/gpl.html>.  There is NO
       WARRANTY, to the extent permitted by law.

                                            2024-09-02                         ltx2crossrefxml(1)