Provided by: libencoding-fixlatin-perl_1.04-4_all

NAME

       fix_latin - filters a data stream that is predominantly UTF8 and 'fixes' any Latin (i.e. non-ASCII
       8-bit) characters

SYNOPSIS

         fix_latin options <input_file >output_file

         Options:

          --use-xs <value> 'auto' | 'always' | 'never'
          --version        list version number
          --help           detailed help message

DESCRIPTION

       The script acts as a filter, taking source data which may contain a mix of ASCII, UTF8, ISO8859-1 and
       CP1252 characters, and producing output that is all ASCII/UTF8.

       Multi-byte UTF8 characters will be passed through unchanged (although over-long UTF8 byte sequences will
       be converted to the shortest normal form).  Single byte characters will be converted as follows:

         0x00 - 0x7F   ASCII - passed through unchanged
         0x80 - 0x9F   Converted to UTF8 using CP1252 mappings
         0xA0 - 0xFF   Converted to UTF8 using Latin-1 mappings
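
       The rules above can be illustrated with a minimal Python sketch.  It is not the Encoding::FixLatin
       implementation: the names fix_latin_sketch and _utf8_length are hypothetical, over-long UTF8 sequences
       are not normalised (their bytes are simply treated as single bytes), and the handling of the five
       bytes that CP1252 leaves undefined is this sketch's own choice.

```python
def _utf8_length(data: bytes, i: int) -> int:
    """Return the length of a well-formed UTF-8 sequence at data[i], or 0."""
    lead = data[i]
    if 0xC2 <= lead <= 0xDF:
        n = 2
    elif 0xE0 <= lead <= 0xEF:
        n = 3
    elif 0xF0 <= lead <= 0xF4:
        n = 4
    else:
        return 0
    try:
        # Strict decoding rejects truncated, over-long and surrogate sequences.
        data[i:i + n].decode("utf-8")
    except UnicodeDecodeError:
        return 0
    return n


def fix_latin_sketch(data: bytes) -> bytes:
    """Re-encode stray Latin-1/CP1252 bytes in mostly-UTF-8 data as UTF-8."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # 0x00 - 0x7F: ASCII, passed through unchanged.
            out.append(byte)
            i += 1
            continue
        n = _utf8_length(data, i)
        if n:
            # Well-formed multi-byte UTF-8, passed through unchanged.
            out += data[i:i + n]
            i += n
            continue
        if byte < 0xA0:
            # 0x80 - 0x9F: CP1252 mapping.
            try:
                char = bytes([byte]).decode("cp1252")
            except UnicodeDecodeError:
                # Assumption of this sketch only: bytes undefined in CP1252
                # (0x81, 0x8D, 0x8F, 0x90, 0x9D) fall back to Latin-1.
                char = chr(byte)
        else:
            # 0xA0 - 0xFF: Latin-1 maps each byte to the same code point.
            char = chr(byte)
        out += char.encode("utf-8")
        i += 1
    return bytes(out)
```

       For example, fix_latin_sketch(b"caf\xe9") yields the UTF8 bytes for "café", while input that is already
       valid UTF8 comes back unchanged.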

OPTIONS

       --use-xs 'auto' | 'always' | 'never'
           Override the default ('auto') behaviour of trying to use the XS module and falling back to the
           pure-Perl version if it is not available.  Set to 'never' to always use the Perl version, or to
           'always' to always use XS and die if it is not available.

       --version (alias -v)
           Display version number of underlying Encoding::FixLatin and XS modules.

       --help (alias -?)
           Display this documentation.

EXAMPLES

       This script was originally written to assist in converting a Postgres database from SQL-ASCII encoding to
       UNICODE UTF8 encoding.  The following examples illustrate its use in that context.

       If you have a SQL format dump file that you would normally restore by piping into 'psql', you can simply
       filter the dump file through this script:

         fix_latin < dump_file | psql -d database

       If you have a compressed dump file that you would normally restore using 'pg_restore', you can omit the
       '-d' option on pg_restore and pipe the resulting SQL through this script and into psql:

         pg_restore -O dump_file | fix_latin | psql -d database

       To take a look at non-ASCII lines in the dump file:

         perl -ne '/^COPY (\S+)/ and $t = $1; print "$t:$_" if /[^\x00-\x7F]/' dump_file
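
       The one-liner above can also be written as a short Python sketch, which may be easier to adapt.  The
       function name non_ascii_lines is hypothetical; the logic mirrors the Perl: remember the table named
       by the most recent COPY line and prefix it to every line that contains a non-ASCII byte.

```python
import re
from typing import Iterable, Iterator

COPY_RE = re.compile(rb"^COPY (\S+)")
NON_ASCII_RE = re.compile(rb"[^\x00-\x7F]")


def non_ascii_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield b'table:line' for each line containing a non-ASCII byte."""
    table = b""  # no COPY statement seen yet
    for line in lines:
        match = COPY_RE.match(line)
        if match:
            table = match.group(1)
        if NON_ASCII_RE.search(line):
            yield table + b":" + line
```

       Open the dump file in binary mode (for example open("dump_file", "rb")) and pass the file object to
       non_ascii_lines, so that bytes are inspected before any decoding takes place.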

COPYRIGHT & LICENSE

       Copyright 2009-2014 Grant McLean "<grantm@cpan.org>"

       This program is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself.

perl v5.38.2                                       2024-03-05                                      FIX_LATIN(1p)
