Ubuntu Manpage: marc2ris - converts MARC bibliographic data to the RIS format

Provided by: refdb-clients_1.0.2-1_amd64

NAME

       marc2ris - converts MARC bibliographic data to the RIS format

SYNOPSIS


       marc2ris [-e log-destination] [-h] [-l log-level] [-L log-file] [-m] [-o outfile] [-O outfile]
                [-t input_type] [-u t|f] file

DESCRIPTION

       marc2ris attempts to extract the information useful to RefDB from MARC datasets. MARC (Machine-Readable
       Cataloging) is a standard originating from the 1960s that is widely used by libraries and
       bibliographic agencies. Most libraries that offer Z39.50 access can provide their records in at least one
       MARC format; as with most other "standards", there are several variants to choose from. Currently the
       following MARC dialects are supported:

       MARC21
           This is an attempt to consolidate existing MARC variants (mainly USMARC and CANMARC) and will most
           likely be the format supported by all libraries in the near future. The format is described on the
           Library of Congress MARC pages[1].

       UNIMARC
           This is the European equivalent of a standardization attempt. The specification can be found here[2].

       UKMARC
           This format is fairly close to the USMARC variant and is mainly used by libraries in the United
           Kingdom and in Ireland. Libraries supporting this format may switch to MARC21 in the future.
           Unfortunately there is no online description of this format, but this PDF document[3] describes the
           main differences between USMARC and UKMARC.

OPTIONS

       By default the script reads MARC21 data from stdin and sends RIS data to stdout.

       -e log-destination
           log-destination can have the values 0, 1, or 2, or the equivalent strings stderr, syslog, or file,
           respectively. This value specifies where the log information goes. 0 (zero) sends the messages to
           stderr. They are immediately visible on the screen, but they may interfere with command output. 1
           sends the messages to the syslog facility. Keep in mind that syslog must be configured to accept
           log messages from user programs; see the syslog(8) man page for further information. Depending on
           the syslog configuration, these messages are often saved in /var/log/user.log. 2 sends the messages
           to a custom log file, which can be specified with the -L option.

       -h
           Displays the help and usage screen, then exits.

       -l log-level
           Specify the priority up to which events are logged. This is either a number between 0 and 7 or one
           of the strings emerg, alert, crit, err, warning, notice, info, debug, respectively (see also Log
           level definitions). -1 disables logging completely. A low log level like 0 means that only the most
           critical messages are logged. A higher log level means that less critical events are logged as
           well. 7 includes debug messages, which can be verbose and abundant, so avoid this log level unless
           you need to track down problems.

       -L log-file
           Specify the full path to a log file that will receive the log messages. Typically this would be
           /var/log/refdba.

       -m
           Switch on additional MARC output. The output will be the RIS data interspersed with the source
           MARC data used to generate it. This is useful for fixing conversion errors manually.

       -o outfile
           Send output to outfile. If outfile exists, its contents will be overwritten.

       -O outfile
           Send output to outfile. If outfile exists, the output will be appended.

       -t input_type
           Specify the MARC input type. The default is MARC21. Other available types are UNIMARC and UKMARC.

       -u t|f
           Request Unicode output if set to "t" (this is the default). marc2ris then attempts to convert the
           input data into Unicode, unless the dataset explicitly states that it already uses Unicode. If the
           conversion does not seem to work, set this to "f", as some MARC variants do not state the character
           encoding explicitly.
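
       As a sketch of how these options combine, the following shell functions wrap a typical call. The names
       loglevel_number, convert_marc, and DRY_RUN are hypothetical and not part of RefDB; only the marc2ris
       options described in this section are used, and the level numbering assumes the order in which -l
       lists the level strings.

```shell
# Hypothetical helper: map a level name to the number accepted by -l,
# assuming the names are numbered 0..7 in the order listed for that option.
loglevel_number() {
    case "$1" in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       echo "unknown log level: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: convert one MARC file and append the RIS data to an
# output file (-O), logging to stderr (-e 0) at the given level (-l).
# Arguments: input type, input file, output file, level name.
# With DRY_RUN=1 the command line is printed instead of being executed.
# Plain POSIX sh: the variables below are not local to the function.
convert_marc() {
    input_type=$1 infile=$2 outfile=$3 level=$4
    levelnum=$(loglevel_number "$level") || return 1
    set -- marc2ris -t "$input_type" -e 0 -l "$levelnum" -O "$outfile" "$infile"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

       For example, DRY_RUN=1 convert_marc UNIMARC records.mrc records.ris warning prints the marc2ris command
       line without running it.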

CONFIGURATION

       marc2ris evaluates the file marc2risrc to initialize itself.

       Table 1. marc2risrc
       ┌───────────┬──────────────────────┬──────────────────────────────┐
       │ Variable  │ Default              │ Comment                      │
       ├───────────┼──────────────────────┼──────────────────────────────┤
       │ outfile   │ (none)               │ The default output file      │
       │           │                      │ name.                        │
       ├───────────┼──────────────────────┼──────────────────────────────┤
       │ outappend │ t                    │ Determines whether output is │
       │           │                      │ appended (t) to an existing  │
       │           │                      │ file or overwrites (f) an    │
       │           │                      │ existing file.               │
       ├───────────┼──────────────────────┼──────────────────────────────┤
       │ unmapped  │ t                    │ If set to t, unknown tags in │
       │           │                      │ the input data will be       │
       │           │                      │ output following a           │
       │           │                      │ <unmapped> tag; the          │
       │           │                      │ resulting data can be        │
       │           │                      │ inspected and then be sent   │
       │           │                      │ through sed to strip off     │
       │           │                      │ these additional lines. If   │
       │           │                      │ set to f, unknown tags will  │
       │           │                      │ be gracefully ignored.       │
       ├───────────┼──────────────────────┼──────────────────────────────┤
       │ logfile   │ /var/log/med2ris.log │ The full path of a custom    │
       │           │                      │ log file. This is used only  │
       │           │                      │ if logdest is set            │
       │           │                      │ appropriately.               │
       ├───────────┼──────────────────────┼──────────────────────────────┤
       │ logdest   │ 1                    │ The destination of the log   │
       │           │                      │ information. 0 = print to    │
       │           │                      │ stderr; 1 = use the syslog   │
       │           │                      │ facility; 2 = use a custom   │
       │           │                      │ logfile. The latter needs a  │
       │           │                      │ proper setting of logfile.   │
       ├───────────┼──────────────────────┼──────────────────────────────┤
       │ loglevel  │ 6                    │ The log level up to which    │
       │           │                      │ messages will be sent. A low │
       │           │                      │ setting (0) allows only the  │
       │           │                      │ most important messages, a   │
       │           │                      │ high setting (7) allows all  │
       │           │                      │ messages including debug     │
       │           │                      │ messages. -1 means nothing   │
       │           │                      │ will be logged.              │
       └───────────┴──────────────────────┴──────────────────────────────┘
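
       The comment on unmapped mentions stripping the additional lines with sed. A minimal sketch, assuming
       that each such line contains the literal string <unmapped> (the exact layout of these lines is not
       shown on this page); strip_unmapped is a hypothetical helper name:

```shell
# Hypothetical filter: read marc2ris output on stdin and drop every line
# that contains the literal string "<unmapped>".
strip_unmapped() {
    sed '/<unmapped>/d'
}
```

       It could be used as strip_unmapped < inspected.ris > clean.ris once the unmapped data has been reviewed.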

DATA PROCESSING

       The purpose of the MARC format is entirely different from the purpose of the RIS format, so you shouldn't
       be too surprised that the import of MARC data is somewhat rough around the edges. The filter appears to
       handle quite a lot of datasets well, but the following shortcomings are known (and more are likely to be
       discovered by the interested reader):

       •   Some fields, like 846, are currently ignored completely. This, of course, is bound to change.

       •   Author names specified in the natural order, i.e. something like First Middle Last, are not
           normalized due to the problems with multiple middle or last names. Author names in the inverse order,
           i.e. something like Last, First Middle, are normalized correctly in most cases. Handling of
           non-European names is a matter of trial and error.

       •   Character set handling is somewhat limited. Only the unaltered input character encoding or UTF-8 is
           available for the output data.

       That said, there is still some hope. The -m command line option switches on additional MARC output. That
       is, the generated output will contain interspersed lines that show the contents of the original MARC
       fields used to generate the following RIS line or lines. For example, the following output snippet shows
       how marc2ris generated the author lines from the MARC input:

           <marc>empty author field (100)
           <marc>:Author(Ind1): 1
           <marc>:Author($a): Ershov, A. P.
           <marc>:Author($b):
           <marc>:Author($c):
           <marc>:Author(Ind1): 1
           <marc>:Author($a): Knuth, Donald Ervin,
           <marc>:Author($b):
           <marc>:Author($c):
           AU  - Ershov,A.P.
           AU  - Knuth,Donald Ervin
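
       The effect on inverse-order names in this snippet can be approximated with sed. This is only an
       illustration that reproduces the two names above, not the algorithm marc2ris actually uses;
       normalize_author is a hypothetical name:

```shell
# Hypothetical filter: read one "Last, First Middle" name per line on stdin.
#  1. drop a trailing comma (and any whitespace after it)
#  2. remove the whitespace after the first comma
#  3. remove whitespace that follows a period, so "A. P." becomes "A.P."
normalize_author() {
    sed -e 's/,[[:space:]]*$//' \
        -e 's/,[[:space:]]*/,/' \
        -e 's/\.[[:space:]]\{1,\}/./g'
}
```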

       If you feel marc2ris does not translate your data appropriately, the easiest way might be to use the -m
       switch and redirect the output into a file. Then you can analyze the situation and fix the RIS lines as
       you see fit. Finally you can strip the MARC lines off with a command like:

           ~$ grep -v "<marc>" < withmarc.ris > womarc.ris
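
       After stripping the MARC lines, a quick check can list anything that still does not look like a RIS
       line. The sketch below assumes that RIS lines start with a two-character tag, two spaces, and a hyphen,
       as in the AU lines above; non_ris_lines is a hypothetical helper name:

```shell
# Hypothetical check: print every non-blank line of the given file that does
# not start with a two-character tag followed by two spaces and a hyphen.
# Prints nothing if all lines look like RIS lines.
non_ris_lines() {
    grep -v -E '^[[:upper:]][[:upper:][:digit:]]  -' "$1" |
        grep -v -E '^[[:space:]]*$'
}
```

       Running non_ris_lines womarc.ris should print nothing if no MARC lines or other stray text remain.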

FILES

       PREFIX/etc/refdb/marc2risrc
           The global configuration file of marc2ris.

       $HOME/.marc2risrc
           The user configuration file of marc2ris.
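
       A user configuration file could be created along the following lines. This is a sketch only: the layout
       of one whitespace-separated "variable value" pair per line is an assumption that is not stated on this
       page, the paths are placeholders, and write_sample_marc2risrc is a hypothetical helper name. The
       variable names and permitted values are those of Table 1.

```shell
# Hypothetical helper: write a sample configuration to the path given as $1.
# ASSUMPTION: each line holds a variable name and its value, separated by
# whitespace. The /tmp paths are placeholders.
write_sample_marc2risrc() {
    cat > "$1" <<'EOF'
outfile    /tmp/marc2ris-out.ris
outappend  f
unmapped   t
logdest    2
logfile    /tmp/marc2ris.log
loglevel   4
EOF
}
```

       For example, write_sample_marc2risrc "$HOME/.marc2risrc" would overwrite the user configuration file
       with these sample settings.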

AUTHOR

       marc2ris was written by Markus Hoenicka <markus@mhoenicka.de>.

NOTES

        1. Library of Congress MARC pages
           http://www.loc.gov/marc/

        2. here
           http://www.ifla.org/VI/3/p1996-1/sec-uni.htm

        3. PDF document
           http://www.bl.uk/services/bibliographic/marcchange.pdf

RefDB Manual                                       2005-10-16                                        MARC2RIS(1)