Ubuntu Manpage: EBook::Tools::IMP - Object class for manipulating the SoftBook/GEB/REB/eBookWise ".IMP" and ".RES" e-book

Provided by: libebook-tools-perl_0.4.9-1_amd64

NAME

       EBook::Tools::IMP - Object class for manipulating the SoftBook/GEB/REB/eBookWise ".IMP" and ".RES" e-book
       formats

SYNOPSIS

        use EBook::Tools::IMP qw(:all);
        my $imp = EBook::Tools::IMP->new();
        $imp->load('myfile.imp');

CONSTRUCTOR AND INITIALIZATION

   "new($filename)"
       Instantiates a new EBook::Tools::IMP object.  If $filename is specified, it will also immediately
       initialize itself via the "load" method.

   "load($filename)"
       Loads a .imp file, parsing it into the various object attributes.  Returns 1 on success, or undef on
       failure.

   "load_resdir($dirname)"
       Loads a ".RES" resource directory, parsing it into the object attributes.  Returns 1 on success, or undef
       on failure.

   "author()"
       Returns the full name of the author of the book.

       Author information can either be found entirely in the "$self->{firstname}" attribute or split up into
       "$self->{firstname}", "$self->{middlename}", and "$self->{lastname}".  If the last name is found
       separately, the full name is returned in the format "Last, First Middle".  Otherwise, the full name is
       returned in the format "First Middle".
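       The two output formats can be illustrated with a minimal standalone sketch.  The helper name
       "format_author" is hypothetical and takes a plain hash rather than an object; it is not the module's
       own code.

```perl
use strict;
use warnings;

# Hypothetical standalone helper illustrating the two documented name formats.
# It is not the module's own code and takes a plain hash instead of an object.
sub format_author {
    my (%name) = @_;

    # Keep only the given-name parts that are defined and non-empty.
    my $given = join ' ',
        grep { defined($_) && length($_) } @name{qw(firstname middlename)};

    # A separate last name switches the output to "Last, First Middle".
    return "$name{lastname}, $given"
        if defined $name{lastname} && length $name{lastname};
    return $given;
}

print format_author(firstname => 'John', middlename => 'Q.', lastname => 'Public'), "\n";
print format_author(firstname => 'John Q. Public'), "\n";
```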

   "bookproplength()"
       Returns the total length in bytes of the book properties data, including the trailing null used to pack
       the C-style strings, but excluding any ETI server data appended to the end of the standard book
       properties.

   "filecount()"
       Returns the number of resource files as stored in "$self->{filecount}".  Note that this does NOT
       recompute that value from the actual number of resources in "$self->{resources}".  To do that, use
       "create_toc_from_resources()".

   "find_image_type($id,@excluded)"
       Goes through all stored images searching for one with the specified id value, returning the first image
       type found or undef if there were no matches or if no image id was specified.  If the optional argument
       @excluded is specified, any types in the list will be skipped during the search.

       Expected types are 'png', 'jpg', 'gif', and 'pic', searched for in that order.

       This can be used to attempt to locate an alternate image for an undisplayable PICT image.

   "find_resource_by_name($name)"
       Takes as a single argument a resource name and if a resource with that name exists in
       "$self->{resources}" returns the resource type used as the hash key.

       Returns undef if no match was found or a name was not specified.

   "image($type,$id)"
       Returns the image data stored in the resource of the specified type (specifically, stored in
       "$self->{$type}->{$id}->{data}" as parsed from the JPEG resource) corresponding to the 16-bit identifier
       provided as $id.

       Valid values for $type are 'gif', 'jpg', and 'png'.

       Carps a warning and returns undef if $type is not provided or is not valid, or if $id is not provided.

   "image_hashref($type,$id)"
       Returns the raw object hashref used to store parsed image data for the specified type, as stored in
       "$self->{$type}".  Valid types are 'gif', 'jpg', and 'png'.

       Carps a warning and returns undef if $type is not provided or is not valid.

       If $id is not specified, the keys of the returned hash are the image IDs for the specified image type,
       and the values are hashrefs pointing to hashes containing the following keys:

       •   "unknown"

           A 16-bit integer only available on EBW 1150 resources.  Use with caution.  This key may be renamed if
           more information is found.

       •   "length"

           The length of the actual image data in bytes.

       •   "offset"

           The byte offset inside of the raw resource data in which the JPEG image data can be found.

       •   "const0"

           An unknown value, but it appears to always be zero.  Use with caution.  This key may be renamed if
           more information is found.

       If the optional argument $id is specified, only the hash for that specific ID is returned, rather than
       the entire hash of hashrefs.

   "image_ids($type)"
       Returns a list of the 16-bit integer IDs of the specified type of image data stored in the associated
       resource (specifically, stored in "$self->{$type}" as parsed from the JPEG resource).

       Valid types are 'gif', 'jpg', and 'png'.  The method will carp a warning and return undef if another type
       is specified, or no type is specified.

   "is_1150()"
       Returns 1 if "$self->{device} == 2", returns 0 if it is some other value, and undef if it is undefined.
       This matters because resources packed for an EBW 1150 or GEB 1150 are in a different format than
       resources packed for other IMP readers.

   "offsetelement($offset)"
       Returns the text of the element corresponding to the given text offset as stored in
       "$self->{offsetelements}", or undef if no such element exists.

   "pack_imp_book_properties()"
       Packs object attributes into the 7 null-terminated strings that constitute the book properties section of
       the header.  Returns that string.

       Note that this does NOT pack the ETI server data appended to this section in encrypted books downloaded
       directly from the ETI servers, even if that data was found when the .imp file was loaded.  This is
       because the extra data can confuse the GEBLibrarian application, and is not needed to read the book.  The
       "bookproplength()" and "pack_imp_header()" methods also assume that this data will not be present.

   "pack_imp_header()"
       Packs object attributes into the 48-byte string representing the IMP header.  Returns that string on
       success, carps a warning and returns undef if a required attribute did not contain valid data.

       Note that in the case of an encrypted e-book with ETI server data in it, this header will not be
       identical to the original -- the resdiroffset value is recalculated for the position with the ETI server
       data stripped.  See "bookproplength()" and "pack_imp_book_properties()".

   "pack_imp_resource(%args)"
       Packs the specified resource stored in "$self->{resources}" into a data string suitable for writing
       into a .imp file, with a header format determined by "$self->{version}".

       Returns a reference to that string if the resource was found, or undef if it was not.

       Arguments

       •   "name"

           Select the resource by resource name.

           If both this and "type" are specified, the type is checked first and the name is only used if the
           type lookup fails.

       •   "type"

           Select the resource by resource type.  This is faster than selecting by name (since resources are
           stored in a hash keyed by type) and is recommended for most use.

           If both this and "name" are specified, the type is checked first and the name is only used if the
           type lookup fails.

   "pack_imp_rsrc_inf()"
       Packs object attributes into the data string that would be the content of the RSRC.INF file.  Returns
       that string.

   "pack_imp_toc()"
       Packs the "$self->{toc}" object attribute into a data string suitable for writing into a .imp file.  The
       format is determined by "$self->{version}".

       Returns that string, or undef if valid version or TOC data is not found.

   "resdirbase()"
       In scalar context, this returns the basename of "$self->{resdirname}".  In list context, it actually
       returns the basename, directory, and extension as per "fileparse" from File::Basename.

   "resdirlength()"
       Returns the length of the .RES directory name as stored in "$self->{resdirlength}".  Note that this does
       NOT recompute the length from the actual name stored in "$self->{resdirname}" -- for that, use
       "set_resdirlength()".

   "resdirname()"
       Returns the .RES directory name stored in "$self->{resdirname}".

   "resource($type)"
       Returns a hashref containing the resource data for the specified resource type, as stored in
       "$self->{resources}->{$type}".

       Returns undef if $type is not specified, or if the specified type is not found.

   "resources()"
       Returns a hashref of hashrefs containing all of the resource data keyed by type, as stored in
       "$self->{resources}".

   "text()"
       Returns the uncompressed text originally stored in the DATA.FRK ('    ') resource.  This will only work
       if the text was unencrypted.

   "title()"
       Returns the book title as stored in "$self->{title}".

   "tocentry($index)"
       Takes as a single argument an integer index to the table of contents data stored in "$self->{toc}".
       Returns the hashref corresponding to that TOC entry, if it exists, or undef otherwise.

   "version()"
       Returns the version of the IMP format used to determine TOC and resource metadata size as stored in
       "$self->{version}".  Expected values are 1 (10-byte metadata) and 2 (20-byte metadata).

   "write_images(%args)"
       Writes the images, if any, to the specified output directory.  Filenames are in the format
       "JPEG_XXXX.jpg" or "PNG_XXXX.png" where "XXXX" is the image ID for that image type formatted as four
       hexadecimal characters.

       Arguments

       •   "dir"

           The output directory in which to write the file.  This will be created if it does not exist.
           Defaults to the basename of the stored resource directory (see also "resdirname()").
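
       The filename pattern can be sketched as follows.  The helper "image_filename" is hypothetical, and the
       use of uppercase hexadecimal digits is an assumption, since the description above does not state the
       case.

```perl
use strict;
use warnings;

# Filename prefixes taken from the description above.
my %prefix = (jpg => 'JPEG', png => 'PNG');

# Hypothetical helper: build an output filename for one image.
# Assumption: uppercase hex digits; the documentation does not state the case.
sub image_filename {
    my ($type, $id) = @_;
    return unless defined $type && defined $id && exists $prefix{$type};
    return sprintf '%s_%04X.%s', $prefix{$type}, $id, $type;
}

print image_filename('jpg', 0x001A), "\n";
print image_filename('png', 258), "\n";
```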

   "write_imp($filename)"
       Takes as a sole argument the name of a file to write to, and writes a .imp file to that filename using
       the object attribute data.

       Returns 1 on success, or undef if required data (including the filename) was invalid or missing, or the
       file could not be written.

   "write_resdir()"
       Writes a ".RES" resource directory from the object attribute data, using "$self->{resdirname}" as the
       directory name.

   "write_text(%args)"
       Writes the uncompressed text, if any, to the specified output directory and file.

       Arguments

       •   "dir"

           The output directory in which to write the file.  This will be created if it does not exist.
           Defaults to the basename of the stored resource directory (see also "resdirname()").

       •   "filename"

           The filename of the output file to write.  If not specified, a warning will be carped and the method
           will return undef.

   "create_toc_from_resources()"
       Creates appropriate table of contents data from the metadata in "$self->{resources}", in the format
       specified by "$self->{version}".  This will also set "$self->{filecount}" to match the actual number of
       resources.

       Returns the number of resources found.

   "parse_eti_server_data($data)"
       Parses ETI server data, as potentially found appended to the end of .imp book properties or a RSRC.INF
       resource file on encrypted books downloaded directly from ETI servers.

       Takes as a single argument a string containing just the extra appended data, and stores the parsed values
       in "$self->{etiserverdata}" as a hash.  Note that parsing requires knowledge of the length of the book
       properties at the time this data was inserted; if the book properties have not been properly parsed or
       have been modified, the resulting behaviour of this method is not defined.

       Returns the number of bytes handled, or zero if no data was provided.

       The data has the following format and keys:

       •   [0-3 bytes]: padding data to make sure the following data is 4-byte aligned, stored in key "pad".

       •   [4 bytes, big-endian unsigned long int]: unknown value, usually = 2, stored in key "unknown1"

       •   [4 bytes, big-endian unsigned long int]: issue number for periodicals (always 0xffffffff for books),
           stored in key "issuenumber".

       •   [variable-length null-terminated string]: content feed for periodicals, null string for books, stored
           in key "contentfeed".

       •   [variable-length null-terminated string]: source string in the format 'SOURCE_ID:SOURCE_TYPE:None',
           where "SOURCE_ID" is usually '3' and "SOURCE_TYPE" is usually 'B'.

       •   [4 bytes, big-endian unsigned long int]: unknown value, stored in key "unknown2".  This value may not
           be present at all.
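
       The layout above could be read with "unpack" roughly as follows.  This is a sketch under stated
       assumptions, not the module's parser: the function name "parse_eti_sketch" and the hash key "source"
       are hypothetical, and the padding length is assumed to follow from aligning to a 4-byte boundary
       counted from the start of the book properties.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's parser.
# Assumption: the padding aligns the data to a 4-byte boundary counted from
# the start of the book properties, so it can be derived from their length.
sub parse_eti_sketch {
    my ($data, $proplength) = @_;
    return {} unless defined $data && length $data;

    my $padlen = (4 - $proplength % 4) % 4;
    my %eti;
    # "source" is an illustrative key name; the documentation does not name one.
    @eti{qw(pad unknown1 issuenumber contentfeed source)} =
        unpack "a$padlen N N Z* Z*", $data;

    # Bytes consumed so far: padding, two longs, two strings plus their nulls.
    my $used = $padlen + 8
             + length($eti{contentfeed}) + 1
             + length($eti{source}) + 1;

    # The trailing long is optional, so read it only when 4 bytes remain.
    $eti{unknown2} = unpack 'N', substr($data, $used, 4)
        if length($data) - $used >= 4;
    return \%eti;
}

# Sample data for a book: properties of length 3 need 1 byte of padding.
my $sample = pack 'a1 N N Z* Z* N', "\0", 2, 0xFFFFFFFF, '', '3:B:None', 7;
my $eti    = parse_eti_sketch($sample, 3);
printf "issue=0x%08x source=%s\n", $eti->{issuenumber}, $eti->{source};
```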

   "parse_imp_book_properties($propdata)"
       Takes as a single argument a string containing the book properties data.  Sets the object variables from
       its contents, which should be seven null-terminated strings in the following order:

       •   Identifier

       •   Category

       •   Subcategory

       •   Title

       •   Last Name

       •   Middle Name

       •   First Name

       Note that the entire name is frequently placed into the "First Name" component, and the "Last Name" and
       "Middle Name" components are left blank.

       In addition, ETI server data may be appended to this data on encrypted books downloaded from ETI servers.
       If present, that data will be stored in the hash "$self->{etiserverdata}".  See
       "parse_eti_server_data($data)" for details.

       A warning will be carped if the length of the parsed properties (including the C null string terminators)
       is not equal to the length of the data passed.
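
       Seven consecutive null-terminated strings map directly onto the "(Z*)7" template of Perl's "pack" and
       "unpack".  The following is a minimal sketch of that idea with a hypothetical helper name; it ignores
       any appended ETI server data and is not the module's implementation.

```perl
use strict;
use warnings;

# Field order as listed above; the key names are illustrative.
my @fields = qw(identifier category subcategory title lastname middlename firstname);

# Hypothetical sketch: split the properties into a hash and report how many
# bytes the seven strings and their null terminators occupy.
sub split_book_properties {
    my ($propdata) = @_;
    my %prop;
    @prop{@fields} = unpack '(Z*)7', $propdata;
    my $length = 0;
    $length += length($_) + 1 for @prop{@fields};
    return (\%prop, $length);
}

my $packed = pack '(Z*)7',
    'ID-1', 'Fiction', '', 'My Best Book', '', '', 'John Q. Public';
my ($prop, $length) = split_book_properties($packed);

# A mismatch here would indicate extra data after the seven strings.
warn "extra data after the properties\n" if $length != length $packed;
print "$prop->{title} by $prop->{firstname}\n";
```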

   "parse_imp_header()"
       Parses the first 48 bytes of a .IMP file, setting object variables.  The method croaks if it receives any
       more or less than 48 bytes.

       Header Format

       •   Offset 0x00 [2 bytes, big-endian unsigned short int]

           Version.  Expected values are 1 or 2; the version affects the format of the table of contents header.
           If this isn't 1 or 2, the method carps a warning and returns undef.

       •   Offset 0x02 [8 bytes]

           Identifier.  This is always 'BOOKDOUG', and the method carps a warning and returns undef if it isn't.

       •   Offset 0x0A [8 bytes]

           Unknown data, stored in "$self->{unknown0x0a}".  Use with caution -- this value may be renamed if
           more information is obtained.

       •   Offset 0x12 [2 bytes, big-endian unsigned short int]

           Number of included files, stored in "$self->{filecount}".

       •   Offset 0x14 [2 bytes, big-endian unsigned short int]

           Length in bytes of the .RES directory name, stored in "$self->{resdirlength}".

       •   Offset 0x16 [2 bytes, big-endian unsigned short int]

           Offset from the point after this value to the .RES directory name, which also marks the end of the
           book properties, stored in "$self->{resdiroffset}".  Note that this is NOT the length of the book
           properties.  To get the length of the book properties, subtract 24 from this value (the number of
           bytes remaining in the header after this point).  It is also NOT the offset from the beginning of the
           file to the .RES directory name -- to find that, add 24 to this value (the number of bytes already
           parsed).

       •   Offset 0x18 [4 bytes, big-endian unsigned long int?]

           Unknown value, stored in "$self->{unknown0x18}".  Use with caution -- this value may be renamed if
           more information is obtained.

       •   Offset 0x1C [4 bytes, big-endian unsigned long int?]

           Unknown value, stored in "$self->{unknown0x1c}".  Use with caution -- this value may be renamed if
           more information is obtained.

       •   Offset 0x20 [4 bytes, big-endian unsigned long int]

           Compression type, stored in "$self->{compression}".  Expected values are 0 (no compression) and 1
           (LZSS compression).

       •   Offset 0x24 [4 bytes, big-endian unsigned long int]

           Encryption type, stored in "$self->{encryption}".  Expected values are 0 (no encryption) and 2 (DES
           encryption).

       •   Offset 0x28 [2 bytes, big-endian unsigned short int]

           Unknown value, stored in "$self->{unknown0x28}".  Use with caution -- this value may be renamed if
           more information is obtained.

       •   Offset 0x2A [1 byte]

           Unknown value, stored in "$self->{unknown0x2A}".  Use with caution -- this value may be renamed if
           more information is obtained.

       •   Offset 0x2B [2 nybbles (1 byte)]

           The upper nybble at this position is the IMP reader device for which the e-book was designed, stored
           in "$self->{device}".  Expected values are 0 (Softbook 200/250e), 1 (REB 1200/GEB 2150), and 2 (EBW
           1150/GEB1150).

           The lower nybble marks the possible zoom states, stored in "$self->{zoomstates}".  Expected values
           are 0 (both zooms), 1 (small zoom), and 2 (large zoom).

       •   Offset 0x2C [4 bytes, big-endian unsigned long int]

           Unknown value, stored in "$self->{unknown0x2c}".  Use with caution -- this value may be renamed if
           more information is obtained.
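
       The header layout above corresponds to a single "unpack" template.  The sketch below is illustrative
       only: the function name and the keys "devzoom", "bookproplength", and "resdirfileoffset" are
       hypothetical, and the last two simply apply the arithmetic described under offset 0x16.

```perl
use strict;
use warnings;

# Hypothetical sketch of the header layout above, not the module's own parser.
sub unpack_imp_header {
    my ($header) = @_;
    die "header must be exactly 48 bytes\n" unless length($header) == 48;

    my @keys = qw(version identifier unknown0x0a filecount resdirlength
                  resdiroffset unknown0x18 unknown0x1c compression encryption
                  unknown0x28 unknown0x2A devzoom unknown0x2c);
    my %h;
    @h{@keys} = unpack 'n a8 a8 n n n N N N N n C C N', $header;

    # Offset 0x2B holds two nybbles: device (upper) and zoom states (lower).
    my $devzoom = delete $h{devzoom};
    $h{device}     = $devzoom >> 4;
    $h{zoomstates} = $devzoom & 0x0F;

    # Derived values described under offset 0x16 (illustrative key names).
    $h{bookproplength}   = $h{resdiroffset} - 24;
    $h{resdirfileoffset} = $h{resdiroffset} + 24;
    return \%h;
}

# Build a sample version 2 header for an EBW 1150 (device 2) with small zoom.
my $sample_header = pack 'n a8 a8 n n n N N N N n C C N',
    2, 'BOOKDOUG', "\0" x 8, 5, 10, 68, 0, 0, 1, 0, 0, 0, 0x21, 0;
my $header = unpack_imp_header($sample_header);
print "version $header->{version}, device $header->{device}\n";
```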

   "parse_resource_cm()"
       Parses the "!!cm" resource loaded into "$self->{resources}", if present, extracting the LZSS
       uncompression parameters into "$self->{lzssoffsetbits}" and "$self->{lzsslengthbits}".

       Returns 1 on success, or undef if no "!!cm" resource has been loaded yet or the resource data is invalid.

   "parse_resource_images()"
       Parses the image data resources loaded into "$self->{resources}", if present, placing the image data and
       metadata of each image found into "$self->{jpg}" and "$self->{png}", keyed by 16-bit image resource ID.

       Returns the total number of images found and parsed.

       This method is called automatically by "load()" and "load_resdir()".

       See also accessor methods "image($type,$id)" and "image_hashref($type,$id)".

   "parse_resource_imrn()"
       Parses the index of text offsets to all images as stored in "$self->{resources}->{'ImRn'}", if present,
       storing them in "$self->{imrn}" as a hash of hashrefs indexed by its 32-bit integer offset to the 0x0F
       control code in the uncompressed text stored in the DATA.FRK resource.

       Returns the total number of offsets found and parsed.

       The hash keys of each offset hash are:

       •   "width"

           Image display width in pixels.

       •   "height"

           Image display height in pixels.

       •   "id"

           A 16-bit integer value used to uniquely identify the image inside a particular resource type.

       •   "restype"

           The four-letter resource type string.

       •   "constF1"

           A 32-bit value of unknown purpose which should always be 0xFFFFFFFF.

       •   "constF2"

           A second 32-bit value of unknown purpose which should always be 0xFFFFFFFF.

       •   "const0"

           A 32-bit integer value of unknown purpose which should always be 0x00000000.

       •   "constB"

           A 16-bit integer value of unknown purpose which could be 0xFFFA, 0xFFFB, 0xFFFC, or 0xFFFE.

       •   "unknown16"

           A 16-bit integer value of unknown purpose found only in 1150 resources.

       •   "unknown32"

           A 32-bit integer value of unknown purpose.

       This method is called automatically by "load()" and "load_resdir()".

   "parse_text()"
       Parses the '    ' (DATA.FRK) resource loaded into "$self->{resources}", if present, extracting the text
       into "$self->{text}", uncompressing it if necessary.  LZSS uncompression will use the
       "$self->{lzsslengthbits}" and "$self->{lzssoffsetbits}" attributes if present, and default to 3 length
       bits and 14 offset bits otherwise.

       HTML headers and footers are then applied, and control codes replaced with appropriate tags.

       Returns the length of the raw uncompressed text before any HTML modification was done, or undef if no
       text resource was found or the text was encrypted.

   "parse_imp_toc_v1($tocdata)"
       Takes as a single argument a string containing the table of contents data, and parses it into object
       attributes following the version 1 format (10 bytes per entry).

       Format

       •   Offset 0x00 [4 bytes, text]

           Resource name.  Stored in hash key "name".  In the case of the 'DATA.FRK' text resource, this will be
           four spaces ('    ').

       •   Offset 0x04 [2 bytes, big-endian unsigned short int]

           Unknown, but always zero or one.  Stored in hash key "unknown1".

       •   Offset 0x06 [4 bytes, big-endian unsigned long int]

           Size of the resource data in bytes.  Stored in hash key "size".

   "parse_imp_toc_v2($tocdata)"
       Takes as a single argument a string containing the table of contents data, and parses it into object
       attributes following the version 2 format (20 bytes per entry).

       Format

       •   Offset 0x00 [4 bytes, text]

           Resource name.  Stored in "name".  In the case of the 'DATA.FRK' text resource, this will be four
           spaces ('    ').

       •   Offset 0x04 [4 bytes, big-endian unsigned long int]

           Unknown, but always zero.  Stored in "unknown1".

       •   Offset 0x08 [4 bytes, big-endian unsigned long int]

           Size of the resource data in bytes.  Stored in "size".

       •   Offset 0x0C [4 bytes, text]

           Resource type.  Stored in "type", and used as the key for the stored resource hash.

       •   Offset 0x10 [4 bytes, big-endian unsigned long int]

           Unknown, but always either zero or one.  Stored in "unknown2".
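
       Both table of contents formats are fixed-size records, so they can be split in a loop with "unpack".
       The sketch below uses a hypothetical helper "split_toc" and assumes the three version 1 fields are
       packed back to back, which is what a 10-byte entry implies.

```perl
use strict;
use warnings;

# Entry size and unpack template per format version, from the layouts above.
my %toc_format = (
    1 => { size => 10, template => 'a4 n N',
           keys => [qw(name unknown1 size)] },
    2 => { size => 20, template => 'a4 N N a4 N',
           keys => [qw(name unknown1 size type unknown2)] },
);

# Hypothetical sketch: split TOC data into a list of entry hashrefs.
# Returns an empty list for an unknown version or a truncated table.
sub split_toc {
    my ($version, $tocdata) = @_;
    my $fmt = $toc_format{$version} or return;
    return if length($tocdata) % $fmt->{size};

    my @toc;
    for (my $pos = 0; $pos < length($tocdata); $pos += $fmt->{size}) {
        my %entry;
        @entry{ @{ $fmt->{keys} } } =
            unpack $fmt->{template}, substr($tocdata, $pos, $fmt->{size});
        push @toc, \%entry;
    }
    return @toc;
}

# Two sample version 2 entries: the text resource and a '!!cm' resource.
my $tocdata = pack('a4 N N a4 N', '    ', 0, 1234, '    ', 0)
            . pack('a4 N N a4 N', '!!cm', 0, 8,    '!!cm', 1);
my @toc = split_toc(2, $tocdata);
printf "%d entries, first size %d\n", scalar @toc, $toc[0]{size};
```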

   "set_book_properties(%args)"
       Sets the specified book properties.  Returns 1 on success, or undef if no properties were specified.

       Arguments

       •   "identifier"

           The book identifier, as might be provided as an OPF "<dc:identifier>" element.

       •   "category"

           The main book category, as might be provided as an OPF "<dc:subject>" element.

       •   "subcategory"

           The subcategory, generally a set of search arguments for the ETI website.

       •   "title"

           The book title, as might be provided as an OPF "<dc:title>" element.

       •   "lastname"

           The primary author's last name, but see the entry for "firstname" before deciding how to handle name
           storage.

       •   "middlename"

           The primary author's middle name, but see the entry for "firstname" before deciding how to handle
           name storage.

       •   "firstname"

           The primary author's first name, but this field is also used by a great many .imp books to store the
           entire name in "First Last" format.  If this field is to be used this way, "lastname" and
           "middlename" must be blank.

       Example

        $imp->set_book_properties(title => 'My Best Book',
                                  category => 'Fiction',
                                  firstname => 'John Q. Public');

PROCEDURES

       All procedures are exportable, but none are exported by default.

   "detect_resource_type(\$data)"
       Takes as a sole argument a reference to the data component of a resource.  Returns a 4-byte string
       containing the resource type if detected successfully, or undef otherwise.

       Detection will not work on the "DATA.FRK" ('    ') resource.  That one must be detected separately by
       name/type.

   "parse_imp_resource_v1()"
       Takes as a sole argument a string containing the data (including the 10-byte header) of a version 1 IMP
       resource.

       Returns a hashref containing that data separated into the following keys:

       •   "name"

           The four-letter name of the resource.

       •   "type"

           The four-letter type of the resource.  This is detected from the data, and is not part of the v1
           header.

       •   "unknown1"

           A 16-bit unsigned int of unknown purpose.  Expected values are 0 or 1.

           Use with caution.  This key may be renamed later if more information is found.

       •   "size"

           The expected size in bytes of the actual resource data.  A warning will be carped if this does not
           match the actual size of the data following the header.

       •   "data"

           The actual resource data.

   "parse_imp_resource_v2()"
       Takes as a sole argument a string containing the data (including the 20-byte header) of a version 2 IMP
       resource.

       Returns a hashref containing that data separated into the following keys:

       •   "name"

           The four-letter name of the resource.

       •   "unknown1"

           A 32-bit unsigned int of unknown purpose.  Expected values are 0 or 1.

           Use with caution.  This key may be renamed later if more information is found.

       •   "size"

           The expected size in bytes of the actual resource data.  A warning will be carped if this does not
           match the actual size of the data following the header.

       •   "type"

           The four-letter type of the resource.

       •   "unknown2"

           A 32-bit unsigned int of unknown purpose.  Expected values are 0 or 1.

           Use with caution.  This key may be renamed later if more information is found.

       •   "data"

           The actual resource data.
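
       Splitting a version 2 resource into header fields and data, with the size check described above, might
       look like the following sketch.  The function name "split_resource_v2" is hypothetical, and the header
       is assumed to use the same field order as a version 2 table of contents entry; this is not the
       exported procedure itself.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical sketch, not the exported procedure itself.
# Assumption: the 20-byte header uses the same field order as a v2 TOC entry.
sub split_resource_v2 {
    my ($resource) = @_;
    return unless defined $resource && length($resource) >= 20;

    my %res;
    @res{qw(name unknown1 size type unknown2 data)} =
        unpack 'a4 N N a4 N a*', $resource;

    # Warn, but still return the data, when the declared size is wrong.
    carp "expected $res{size} bytes of data, found " . length($res{data})
        if $res{size} != length($res{data});
    return \%res;
}

# Sample resource with a made-up name, type, and 4-byte payload.
my $resource = pack 'a4 N N a4 N a*', 'TEST', 0, 4, 'TEST', 0, 'abcd';
my $res      = split_resource_v2($resource);
print "$res->{name}: $res->{size} bytes\n";
```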

BUGS AND LIMITATIONS

       •   Not finished.  Do not try to use yet.

       •   MacPaint PICT images are not well-supported.  If present in the book, they will be saved, but a
           warning will be carped about invalid image data.

       •   Support for v1 files is completely untested and implemented with some guesswork.  Bug reports
           welcome.

AUTHOR

       Zed Pobre <zed@debian.org>

THANKS

       Thanks are due to Nick Rapallo <nrapallo@yahoo.ca> for invaluable assistance in understanding the .IMP
       format and testing this code.

       Thanks are also due to Jeffrey Kraus-yao <krausyaoj@ameritech.net> for his work reverse-engineering the
       .IMP format to begin with, and the documentation at <http://krausyaoj.tripod.com/reb1200.htm>.

LICENSE AND COPYRIGHT

       Copyright 2008 Zed Pobre

       Licensed to the public under the terms of the GNU GPL, version 2.