Provided by: libmarc-charset-perl_1.35-4_amd64

NAME

       MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS

           # import the marc8_to_utf8 function
           use MARC::Charset 'marc8_to_utf8';

           # prepare STDOUT for utf8
           binmode(STDOUT, ':utf8');

           # print out some marc8 as utf8
           print marc8_to_utf8($marc8_string);

DESCRIPTION

       MARC::Charset converts MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single-byte
       character encoding that predates Unicode and allows non-Roman scripts to be used in MARC
       bibliographic records. The specification is available from the Library of Congress:

           http://www.loc.gov/marc/specifications/spechome.html

EXPORTS

   ignore_errors()
       Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current
       setting.  This is helpful if you have records that contain both MARC-8 and Unicode
       characters.

           my $ignore = MARC::Charset->ignore_errors();

           MARC::Charset->ignore_errors(1); # ignore errors
           MARC::Charset->ignore_errors(0); # DO NOT ignore errors
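
       Because ignore_errors() changes a module-wide setting, it can be useful to restore the
       previous value after a lenient conversion. The following is a minimal sketch of that
       pattern; the helper name lenient_marc8_to_utf8 is hypothetical and not part of
       MARC::Charset.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper (not part of MARC::Charset): convert one string with
# errors ignored, then put the module-wide setting back the way it was.
sub lenient_marc8_to_utf8 {
    my ($marc8) = @_;

    # Calling ignore_errors() without an argument returns the current setting.
    my $previous = MARC::Charset->ignore_errors() ? 1 : 0;

    MARC::Charset->ignore_errors(1);
    my $utf8 = marc8_to_utf8($marc8);

    # Restore the caller's setting.
    MARC::Charset->ignore_errors($previous);

    return $utf8;
}
```

       A caller would use it in place of marc8_to_utf8() for records known to be messy, for
       example lenient_marc8_to_utf8($field_data), where $field_data is the caller's own
       variable.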

   assume_unicode()
       Tells MARC::Charset whether or not to assume Unicode when an error is encountered in
       ignore_errors mode, and returns the current setting.  This is helpful if you have records
       that contain both MARC-8 and Unicode characters.

           my $setting = MARC::Charset->assume_unicode();

           MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
           MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode

   assume_encoding()
       Tells MARC::Charset whether or not to assume a specific encoding when an error is
       encountered in ignore_errors mode, and returns the current setting.  This is helpful if
       you have records that contain both MARC-8 and characters in another encoding.

           my $setting = MARC::Charset->assume_encoding();

           MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
           MARC::Charset->assume_encoding(''); # DO NOT assume any encoding

   marc8_to_utf8()
       Converts a MARC-8 encoded string to UTF-8.

           my $utf8 = marc8_to_utf8($marc8);

       To ignore errors, pass a true value as the second parameter or call
       MARC::Charset->ignore_errors() with a true value:

           my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

         or

           MARC::Charset->ignore_errors(1);
           my $utf8 = marc8_to_utf8($marc8);
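
       As a worked illustration, the sketch below converts a short MARC-8 string containing a
       diacritic. The sample bytes are an assumption made for this example: in MARC-8 a
       combining mark precedes its base letter, and 0xE2 is taken to be the acute accent of the
       default extended Latin set. The defined() check and the NFC normalization are defensive
       choices for the example, not documented requirements.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use Unicode::Normalize 'NFC';

# Assumed sample input: acute accent (0xE2) placed before the letter "e",
# since MARC-8 puts combining marks in front of the base character.
my $marc8 = "caf\xE2e";

my $utf8 = marc8_to_utf8($marc8);
die "conversion failed\n" unless defined $utf8;

# Normalize to composed form so later comparisons do not depend on whether
# the converter returns composed or decomposed characters.
my $normalized = NFC($utf8);

# Prepare STDOUT for UTF-8 before printing the character string.
binmode(STDOUT, ':encoding(UTF-8)');
print "$normalized\n";
```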

   utf8_to_marc8()
       Attempts to translate a UTF-8 string into MARC-8.

           my $marc8 = utf8_to_marc8($utf8);

       To ignore errors, such as characters that cannot be converted to MARC-8, pass a true
       value as the second parameter or call MARC::Charset->ignore_errors() with a true value:

           my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

         or

           MARC::Charset->ignore_errors(1);
           my $marc8 = utf8_to_marc8($utf8);
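
       One way to check whether a string survives conversion to MARC-8 is to convert it back and
       compare. This is only a sketch: the sample string is an assumption, and both sides are
       NFC-normalized so the comparison does not depend on composed versus decomposed forms.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);
use Unicode::Normalize 'NFC';

# Assumed sample: a Perl character string containing U+00E9.
my $original = "caf\x{E9}";

# Convert to MARC-8, then back again.
my $marc8     = utf8_to_marc8($original);
my $roundtrip = defined $marc8 ? marc8_to_utf8($marc8) : undef;

# Compare normalized forms; an undefined result counts as a failed round trip.
my $same = defined $roundtrip && NFC($roundtrip) eq NFC($original);

print $same ? "round trip ok\n" : "round trip lossy\n";
```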

DEFAULT CHARACTER SETS

       If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0
       and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:

           use MARC::Charset::Constants qw(:all);
           $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
           $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
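
       Since these are package variables, Perl's local can confine the change to a single call
       so that other conversions keep the normal defaults. The sketch below assumes the
       variables behave as ordinary package globals that are read at the start of each
       conversion; the helper name marc8_arabic_to_utf8 is hypothetical.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper (not part of MARC::Charset): convert a string whose
# default character sets are Arabic, without changing the defaults globally.
sub marc8_arabic_to_utf8 {
    my ($marc8) = @_;

    # local restores the previous values automatically when the sub returns.
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    return marc8_to_utf8($marc8);
}
```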

SEE ALSO

       •   MARC::Charset::Constants

       •   MARC::Charset::Table

       •   MARC::Charset::Code

       •   MARC::Charset::Compiler

       •   MARC::Record

       •   MARC::XML

AUTHOR

       Ed Summers (ehs@pobox.com)