Provided by: libebook-tools-perl_0.5.4-1.3_amd64
NAME
EBook::Tools::IMP - Object class for manipulating the SoftBook/GEB/REB/eBookWise ".IMP" and ".RES" e-book formats
SYNOPSIS
    use EBook::Tools::IMP qw(:all);
    my $imp = EBook::Tools::IMP->new();
    $imp->load('myfile.imp');
CONSTRUCTOR AND INITIALIZATION
"new($filename)"
    Instantiates a new EBook::Tools::IMP object. If $filename is specified, the object also initializes itself immediately via the "load" method.

"load($filename)"
    Loads a .imp file, parsing it into the various object attributes. Returns 1 on success, or undef on failure.

"load_resdir($dirname)"
    Loads a ".RES" resource directory, parsing it into the object attributes. Returns 1 on success, or undef on failure.

"author()"
    Returns the full name of the book's author. Author information may be stored entirely in the "$self->{firstname}" attribute, or split across "$self->{firstname}", "$self->{middlename}", and "$self->{lastname}". If a separate last name is present, the full name is returned in the format "Last, First Middle"; otherwise, it is returned in the format "First Middle".

"bookproplength()"
    Returns the total length in bytes of the book properties data. This includes the terminating nulls of the packed C-style strings but excludes any ETI server data appended to the end of the standard book properties.

"filecount()"
    Returns the number of resource files as stored in "$self->{filecount}". Note that this does NOT recompute the value from the actual number of resources in "$self->{resources}"; to do that, use "create_toc_from_resources()".

"find_image_type($id,@excluded)"
    Searches all stored images for one with the specified ID and returns the first image type found. Returns undef if there were no matches or no image ID was specified. If the optional list @excluded is given, any types in it are skipped during the search. Expected types are 'png', 'jpg', 'gif', and 'pic', searched in that order. This can be used to try to locate an alternate image for an undisplayable PICT image.

"find_resource_by_name($name)"
    Takes a resource name as its single argument. If a resource with that name exists in "$self->{resources}", returns the resource type used as its hash key.
    Returns undef if no match was found or no name was specified.

"image($type,$id)"
    Returns the image data of the specified type that corresponds to the 16-bit identifier $id. The data is stored in "$self->{$type}->{$id}->{data}", as parsed from the corresponding image resource. Valid values for $type are 'gif', 'jpg', and 'png'. Carps a warning and returns undef if $type is missing or invalid, or if $id is not provided.

"image_hashref($type,$id)"
    Returns the raw object hashref used to store parsed image data for the specified type, as stored in "$self->{$type}". Valid types are 'gif', 'jpg', and 'png'. Carps a warning and returns undef if $type is missing or invalid.

    If $id is not specified, the keys of the returned hash are the image IDs for the specified image type. The values are hashrefs pointing to hashes containing the following keys:

    • "unknown"
      A 16-bit integer only available on EBW 1150 resources. Use with caution. This key may be renamed if more information is found.
    • "length"
      The length of the actual image data.
    • "offset"
      The byte offset inside the raw resource data at which the image data can be found.
    • "const0"
      An unknown value that appears always to be zero. Use with caution. This key may be renamed if more information is found.

    If the optional argument $id is specified, only the hash for that specific ID is returned, rather than the entire hash of hashrefs.

"image_ids($type)"
    Returns a list of the 16-bit integer IDs of the specified type of image data stored in the associated resource. The IDs are the keys of "$self->{$type}", as parsed from the corresponding image resource. Valid types are 'gif', 'jpg', and 'png'. The method carps a warning and returns undef if any other type is specified, or if no type is specified.

"is_1150()"
    Returns 1 if "$self->{device} == 2", 0 if it is some other value, and undef if it is undefined.
    This matters because resources packed for an EBW 1150 or GEB 1150 use a different format from resources packed for other IMP readers.

"offsetelement($offset)"
    Returns the text of the element corresponding to the given text offset as stored in "$self->{offsetelements}", or undef if no such element exists.

"pack_imp_book_properties()"
    Packs object attributes into the seven null-terminated strings that make up the book properties section of the header, and returns that string.

    Note that this does NOT pack the ETI server data that is appended to this section in encrypted books downloaded directly from the ETI servers. This holds even if that data was found when the .imp file was loaded. The extra data can confuse the GEBLibrarian application and is not needed to read the book. The "bookproplength()" and "pack_imp_header()" methods also assume that this data is not present.

"pack_imp_header()"
    Packs object attributes into the 48-byte string representing the IMP header. Returns that string on success, or carps a warning and returns undef if a required attribute does not contain valid data.

    Note that for an encrypted e-book containing ETI server data, this header will not be identical to the original: the resdiroffset value is recalculated for the position with the ETI server data stripped. See "bookproplength()" and "pack_imp_book_properties()".

"pack_imp_resource(%args)"
    Packs the specified resource stored in "$self->{resources}" into a data string suitable for writing into a .imp file. The header format is determined by "$self->{version}". Returns a reference to that string if the resource was found, or undef if it was not.

    Arguments

    • "name"
      Select the resource by resource name. If both this and "type" are specified, the type is checked first and the name is only used if the type lookup fails.
    • "type"
      Select the resource by resource type.
      This is faster than selecting by name (since resources are stored in a hash keyed by type) and is recommended for most uses. If both this and "name" are specified, the type is checked first and the name is only used if the type lookup fails.

"pack_imp_rsrc_inf()"
    Packs object attributes into the data string that would be the content of the RSRC.INF file, and returns that string.

"pack_imp_toc()"
    Packs the "$self->{toc}" object attribute into a data string suitable for writing into a .imp file. The format is determined by "$self->{version}". Returns that string, or undef if valid version or TOC data is not found.

"resdirbase()"
    In scalar context, returns the basename of "$self->{resdirname}". In list context, returns the basename, directory, and extension as per "fileparse" from File::Basename.

"resdirlength()"
    Returns the length of the .RES directory name as stored in "$self->{resdirlength}". Note that this does NOT recompute the length from the actual name stored in "$self->{resdirname}"; for that, use "set_resdirlength()".

"resdirname()"
    Returns the .RES directory name stored in "$self->{resdirname}".

"resource($type)"
    Returns a hashref containing the resource data for the specified resource type, as stored in "$self->{resources}->{$type}". Returns undef if $type is not specified, or if the specified type is not found.

"resources()"
    Returns a hashref of hashrefs containing all of the resource data keyed by type, as stored in "$self->{resources}".

"text()"
    Returns the uncompressed text originally stored in the DATA.FRK ('    ') resource. This only works if the text was unencrypted.

"title()"
    Returns the book title as stored in "$self->{title}".

"tocentry($index)"
    Takes as its single argument an integer index into the table of contents data stored in "$self->{toc}". Returns the hashref corresponding to that TOC entry if it exists, or undef otherwise.
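The book properties block behind "pack_imp_book_properties()" and "bookproplength()" is just seven null-terminated strings, so core "pack"/"unpack" with the 'Z*' template can build and split it. The naming rule of "author()" is similarly simple. The following is a minimal sketch under stated assumptions, not the module's own code. The helpers pack_book_properties, unpack_book_properties and format_author are hypothetical. The field order is taken from the list under "parse_imp_book_properties($propdata)", and the hash keys simply mirror the "set_book_properties()" argument names.

```perl
use strict;
use warnings;

# Order of the seven null-terminated strings, as listed for
# parse_imp_book_properties().  Key names mirror the
# set_book_properties() arguments and are illustrative only.
my @PROP_FIELDS = qw(identifier category subcategory title
                     lastname middlename firstname);

# Hypothetical helper: pack a hashref of properties into one block.
# 'Z*' writes each string followed by its terminating null.
sub pack_book_properties {
    my ($props) = @_;
    my @values = map { defined $props->{$_} ? $props->{$_} : '' } @PROP_FIELDS;
    return pack('Z*' x @PROP_FIELDS, @values);
}

# Hypothetical helper: the inverse. 'Z*' reads up to and including
# each null and returns the string without it.
sub unpack_book_properties {
    my ($data) = @_;
    my %props;
    @props{@PROP_FIELDS} = unpack('Z*' x @PROP_FIELDS, $data);
    return \%props;
}

# Hypothetical helper following the author() rule described above:
# "Last, First Middle" if a last name is present, else "First Middle".
sub format_author {
    my ($props) = @_;
    my $first_middle = join ' ',
        grep { defined $_ && length $_ } @{$props}{qw(firstname middlename)};
    my $last = $props->{lastname};
    return $first_middle unless defined $last && length $last;
    return "$last, $first_middle";
}
```

For the values used in the "set_book_properties()" example (title, category and a firstname holding the whole name), the packed block is 40 bytes. Each of the seven strings contributes its length plus one terminating null, and the unused fields contribute a lone null each.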
"version()"
    Returns the version of the IMP format, as stored in "$self->{version}". The version determines the size of the TOC and resource metadata. Expected values are 1 (10-byte metadata) and 2 (20-byte metadata).

"write_images(%args)"
    Writes the images, if any, to the specified output directory. Filenames are in the format "JPEG_XXXX.jpg" or "PNG_XXXX.png", where "XXXX" is the image ID for that image type formatted as four hexadecimal characters.

    Arguments

    • "dir"
      The output directory in which to write the files. This will be created if it does not exist. Defaults to the basename of the stored resource directory (see also "resdirname()").

"write_imp($filename)"
    Takes as its sole argument the name of a file to write to, and writes a .imp file to that filename using the object attribute data. Returns 1 on success. Returns undef if required data (including the filename) was invalid or missing, or if the file could not be written.

"write_resdir()"
    Writes a ".RES" resource directory from the object attribute data, using "$self->{resdirname}" as the directory name.

"write_text(%args)"
    Writes the uncompressed text, if any, to the specified output directory and file.

    Arguments

    • "dir"
      The output directory in which to write the file. This will be created if it does not exist. Defaults to the basename of the stored resource directory (see also "resdirname()").
    • "filename"
      The filename of the output file to write. If not specified, a warning will be carped and the method will return undef.

"create_toc_from_resources()"
    Creates appropriate table of contents data from the metadata in "$self->{resources}", in the format specified by "$self->{version}". This also sets "$self->{filecount}" to match the actual number of resources. Returns the number of resources found.

"parse_eti_server_data($data)"
    Parses ETI server data. This data may be appended to the end of the .imp book properties, or to a RSRC.INF resource file, in encrypted books downloaded directly from ETI servers.
    Takes as its single argument a string containing just the extra appended data, and stores the parsed values as a hash in "$self->{etiserverdata}". Note that parsing requires knowing the length of the book properties at the time this data was inserted. If the book properties have not been properly parsed or have been modified, the behaviour of this method is undefined. Returns the number of bytes handled, or zero if no data was provided.

    The data has the following format and keys:

    • [0-3 bytes]: padding to ensure that the following data is 4-byte aligned, stored in key "pad".
    • [4 bytes, big-endian unsigned long int]: unknown value, usually 2, stored in key "unknown1".
    • [4 bytes, big-endian unsigned long int]: issue number for periodicals (always 0xffffffff for books), stored in key "issuenumber".
    • [variable-length null-terminated string]: content feed for periodicals, null string for books, stored in key "contentfeed".
    • [variable-length null-terminated string]: source string in the format 'SOURCE_ID:SOURCE_TYPE:None', where "SOURCE_ID" is usually '3' and "SOURCE_TYPE" is usually 'B'.
    • [4 bytes, big-endian unsigned long int]: unknown value, stored in key "unknown2". This value may not be present at all.

"parse_imp_book_properties($propdata)"
    Takes as its single argument a string containing the book properties data. Sets the object variables from its contents, which should be seven null-terminated strings in the following order:

    • Identifier
    • Category
    • Subcategory
    • Title
    • Last Name
    • Middle Name
    • First Name

    Note that the entire name is frequently placed in the "First Name" component, with the "Last Name" and "Middle Name" components left blank.

    In addition, ETI server data may be appended to this data in encrypted books downloaded from ETI servers. If present, that data will be stored in the hash "$self->{etiserverdata}". See "parse_eti_server_data($data)" for details.
    A warning will be carped if the length of the parsed properties (including the C null string terminators) is not equal to the length of the data passed.

"parse_imp_header()"
    Parses the first 48 bytes of a .IMP file, setting object variables. The method croaks if it receives more or fewer than 48 bytes.

    Header Format

    • Offset 0x00 [2 bytes, big-endian unsigned short int]
      Version. Expected values are 1 or 2; the version affects the format of the table of contents header. If this isn't 1 or 2, the method carps a warning and returns undef.
    • Offset 0x02 [8 bytes]
      Identifier. This is always 'BOOKDOUG', and the method carps a warning and returns undef if it isn't.
    • Offset 0x0A [8 bytes]
      Unknown data, stored in "$self->{unknown0x0a}". Use with caution: this value may be renamed if more information is obtained.
    • Offset 0x12 [2 bytes, big-endian unsigned short int]
      Number of included files, stored in "$self->{filecount}".
    • Offset 0x14 [2 bytes, big-endian unsigned short int]
      Length in bytes of the .RES directory name, stored in "$self->{resdirlength}".
    • Offset 0x16 [2 bytes, big-endian unsigned short int]
      Offset from the point after this value to the .RES directory name, which also marks the end of the book properties. Stored in "$self->{resdiroffset}". Note that this is NOT the length of the book properties: to get that, subtract 24 (the number of bytes remaining in the header after this point) from this value. Nor is it the offset from the beginning of the file to the .RES directory name: to find that, add 24 (the number of bytes already parsed) to this value.
    • Offset 0x18 [4 bytes, big-endian unsigned long int?]
      Unknown value, stored in "$self->{unknown0x18}". Use with caution: this value may be renamed if more information is obtained.
    • Offset 0x1C [4 bytes, big-endian unsigned long int?]
      Unknown value, stored in "$self->{unknown0x1c}". Use with caution: this value may be renamed if more information is obtained.
    • Offset 0x20 [4 bytes, big-endian unsigned long int]
      Compression type, stored in "$self->{compression}". Expected values are 0 (no compression) and 1 (LZSS compression).
    • Offset 0x24 [4 bytes, big-endian unsigned long int]
      Encryption type, stored in "$self->{encryption}". Expected values are 0 (no encryption) and 2 (DES encryption).
    • Offset 0x28 [2 bytes, big-endian unsigned short int]
      Unknown value, stored in "$self->{unknown0x28}". Use with caution: this value may be renamed if more information is obtained.
    • Offset 0x2A [1 byte]
      Unknown value, stored in "$self->{unknown0x2A}". Use with caution: this value may be renamed if more information is obtained.
    • Offset 0x2B [2 nybbles (1 byte)]
      The upper nybble at this position is the IMP reader device for which the e-book was designed, stored in "$self->{device}". Expected values are 0 (Softbook 200/250e), 1 (REB 1200/GEB 2150), and 2 (EBW 1150/GEB 1150). The lower nybble marks the possible zoom states, stored in "$self->{zoomstates}". Expected values are 0 (both zooms), 1 (small zoom), and 2 (large zoom).
    • Offset 0x2C [4 bytes, big-endian unsigned long int]
      Unknown value, stored in "$self->{unknown0x2c}". Use with caution: this value may be renamed if more information is obtained.

"parse_resource_cm()"
    Parses the "!!cm" resource loaded into "$self->{resources}", if present, extracting the LZSS decompression parameters into "$self->{lzssoffsetbits}" and "$self->{lzsslengthbits}". Returns 1 on success, or undef if no "!!cm" resource has been loaded yet or the resource data is invalid.

"parse_resource_images()"
    Parses the image data resources loaded into "$self->{resources}", if present. The image data and metadata of each image found are placed into "$self->{jpg}" and "$self->{png}", keyed by 16-bit image resource ID. Returns the total number of images found and parsed. This method is called automatically by "load()" and "load_resdir()". See also the accessor methods "image($type,$id)" and "image_hashref($type,$id)".
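Every field in the 48-byte header has a fixed width, so the whole layout maps onto a single "unpack" template: 'n' for 16-bit big-endian, 'N' for 32-bit big-endian, 'a8' for raw bytes and 'C' for single bytes. The following is a minimal sketch in core Perl, not the module's "parse_imp_header()". The helper parse_header and the derived keys bookproplength and resdirpos are hypothetical. The sketch dies on a wrong length and returns undef on a bad version or identifier, rather than croaking and carping as the documented method does.

```perl
use strict;
use warnings;

# Template for the 48-byte header: n = 16-bit big-endian unsigned,
# N = 32-bit big-endian unsigned, a8 = 8 raw bytes, C = unsigned byte.
my $HEADER_TEMPLATE = 'n a8 a8 n n n N N N N n C C N';

# Hypothetical helper; returns a hashref, or undef for an invalid header.
sub parse_header {
    my ($data) = @_;
    die "header must be exactly 48 bytes\n" unless length($data) == 48;

    my (%h, $devbyte);
    (@h{qw(version identifier unknown0x0a filecount resdirlength
           resdiroffset unknown0x18 unknown0x1c compression encryption
           unknown0x28 unknown0x2A)}, $devbyte, $h{unknown0x2c})
        = unpack($HEADER_TEMPLATE, $data);

    return unless $h{version} == 1 || $h{version} == 2;
    return unless $h{identifier} eq 'BOOKDOUG';

    # Offset 0x2B: upper nybble = target device, lower nybble = zoom states.
    $h{device}     = $devbyte >> 4;
    $h{zoomstates} = $devbyte & 0x0F;

    # resdiroffset is counted from the byte after it (file offset 0x18 = 24),
    # and 24 header bytes remain after that point.
    $h{bookproplength} = $h{resdiroffset} - 24;   # hypothetical key
    $h{resdirpos}      = $h{resdiroffset} + 24;   # hypothetical key
    return \%h;
}
```

Packing the same template with "pack" is a convenient way to build synthetic headers for testing. For example, a device byte of (2 << 4) | 1 describes an EBW 1150 book restricted to the small zoom.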
"parse_resource_imrn()"
    Parses the index of text offsets to all images, as stored in "$self->{resources}->{'ImRn'}", if present. The offsets are stored in "$self->{imrn}" as a hash of hashrefs. Each key is the 32-bit integer offset of an image's 0x0F control code within the uncompressed text of the DATA.FRK resource. Returns the total number of offsets found and parsed.

    The keys of each offset hash are:

    • "width"
      Image display width in pixels.
    • "height"
      Image display height in pixels.
    • "id"
      A 16-bit integer value used to uniquely identify the image inside a particular resource type.
    • "restype"
      The four-letter resource type string.
    • "constF1"
      A 32-bit value of unknown purpose which should always be 0xFFFFFFFF.
    • "constF2"
      A second 32-bit value of unknown purpose which should always be 0xFFFFFFFF.
    • "const0"
      A 32-bit integer value of unknown purpose which should always be 0x00000000.
    • "constB"
      A 16-bit integer value of unknown purpose which could be 0xFFFA, 0xFFFB, 0xFFFC, or 0xFFFE.
    • "unknown16"
      A 16-bit integer value of unknown purpose found only in 1150 resources.
    • "unknown32"
      A 32-bit integer value of unknown purpose.

    This method is called automatically by "load()" and "load_resdir()".

"parse_text()"
    Parses the '    ' (DATA.FRK) resource loaded into "$self->{resources}", if present, extracting the text into "$self->{text}" and uncompressing it if necessary. LZSS decompression uses the "$self->{lzsslengthbits}" and "$self->{lzssoffsetbits}" attributes if present, and defaults to 3 length bits and 14 offset bits otherwise. HTML headers and footers are then applied, and control codes are replaced with appropriate tags. Returns the length of the raw uncompressed text before any HTML modification. Returns undef if no text resource was found or the text was encrypted.

"parse_imp_toc_v1($tocdata)"
    Takes as its single argument a string containing the table of contents data, and parses it into object attributes following the version 1 format (10 bytes per entry).

    Format

    • Offset 0x00 [4 bytes, text]
      Resource name. Stored in hash key "name". In the case of the 'DATA.FRK' text resource, this will be four spaces ('    ').
    • Offset 0x04 [2 bytes, big-endian unsigned short int]
      Unknown, but always zero or one. Stored in hash key "unknown1".
    • Offset 0x06 [4 bytes, big-endian unsigned long int]
      Size of the resource data in bytes. Stored in hash key "size".

"parse_imp_toc_v2($tocdata)"
    Takes as its single argument a string containing the table of contents data, and parses it into object attributes following the version 2 format (20 bytes per entry).

    Format

    • Offset 0x00 [4 bytes, text]
      Resource name. Stored in "name". In the case of the 'DATA.FRK' text resource, this will be four spaces ('    ').
    • Offset 0x04 [4 bytes, big-endian unsigned long int]
      Unknown, but always zero. Stored in "unknown1".
    • Offset 0x08 [4 bytes, big-endian unsigned long int]
      Size of the resource data in bytes. Stored in "size".
    • Offset 0x0C [4 bytes, text]
      Resource type. Stored in "type", and used as the key for the stored resource hash.
    • Offset 0x10 [4 bytes, big-endian unsigned long int]
      Unknown, but always either zero or one. Stored in "unknown2".

"set_book_properties(%args)"
    Sets the specified book properties. Returns 1 on success, or undef if no properties were specified.

    Arguments

    • "identifier"
      The book identifier, as might be provided as an OPF "<dc:identifier>" element.
    • "category"
      The main book category, as might be provided as an OPF "<dc:subject>" element.
    • "subcategory"
      The subcategory, generally a set of search arguments for the ETI website.
    • "title"
      The book title, as might be provided as an OPF "<dc:title>" element.
    • "lastname"
      The primary author's last name, but see the entry for "firstname" before deciding how to handle name storage.
    • "middlename"
      The primary author's middle name, but see the entry for "firstname" before deciding how to handle name storage.
    • "firstname"
      The primary author's first name. Many .imp books also use this field to store the entire name in "First Last" format; if it is used this way, "lastname" and "middlename" must be blank.

    Example

        $imp->set_book_properties(title     => 'My Best Book',
                                  category  => 'Fiction',
                                  firstname => 'John Q. Public');
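Both TOC layouts described for "parse_imp_toc_v1($tocdata)" and "parse_imp_toc_v2($tocdata)" are fixed-size records. Splitting a TOC therefore reduces to slicing the data into 10- or 20-byte chunks and unpacking each one. The following is a minimal sketch in core Perl; the helper parse_toc and the %TOC_LAYOUT table are hypothetical. The v1 template assumes that its three fields are packed back to back, as the 10-byte entry size implies. Entries are returned as a list rather than stored in "$self->{toc}".

```perl
use strict;
use warnings;

# Per-entry layouts (assumed to be packed with no gaps):
#   v1, 10 bytes: name(4) unknown1(2, BE) size(4, BE)
#   v2, 20 bytes: name(4) unknown1(4, BE) size(4, BE) type(4) unknown2(4, BE)
my %TOC_LAYOUT = (
    1 => { size => 10, template => 'a4 n N',
           keys => [qw(name unknown1 size)] },
    2 => { size => 20, template => 'a4 N N a4 N',
           keys => [qw(name unknown1 size type unknown2)] },
);

# Hypothetical helper: split raw TOC data into a list of entry hashrefs.
# Returns an empty list for an unknown version or a trailing partial entry.
sub parse_toc {
    my ($tocdata, $version) = @_;
    my $layout = $TOC_LAYOUT{$version} or return;
    my $n = length($tocdata) / $layout->{size};
    return if $n != int($n);

    my @entries;
    for my $i (0 .. $n - 1) {
        my %entry;
        @entry{ @{ $layout->{keys} } } =
            unpack($layout->{template},
                   substr($tocdata, $i * $layout->{size}, $layout->{size}));
        push @entries, \%entry;
    }
    return @entries;
}
```

In a v2 TOC, the "type" field of each entry is what the module uses as the key of the stored resource hash. A v1 entry has no such field.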
PROCEDURES
All procedures are exportable, but none are exported by default.

"detect_resource_type(\$data)"
    Takes as its sole argument a reference to the data component of a resource. Returns a 4-byte string containing the resource type if detection succeeds, or undef otherwise. Detection does not work on the "DATA.FRK" ('    ') resource; that one must be identified separately by name or type.

"parse_imp_resource_v1()"
    Takes as its sole argument a string containing the data (including the 10-byte header) of a version 1 IMP resource. Returns a hashref containing that data separated into the following keys:

    • "name"
      The four-letter name of the resource.
    • "type"
      The four-letter type of the resource. This is detected from the data, and is not part of the v1 header.
    • "unknown1"
      A 16-bit unsigned int of unknown purpose. Expected values are 0 or 1. Use with caution. This key may be renamed later if more information is found.
    • "size"
      The expected size in bytes of the actual resource data. A warning will be carped if this does not match the actual size of the data following the header.
    • "data"
      The actual resource data.

"parse_imp_resource_v2()"
    Takes as its sole argument a string containing the data (including the 20-byte header) of a version 2 IMP resource. Returns a hashref containing that data separated into the following keys:

    • "name"
      The four-letter name of the resource.
    • "unknown1"
      A 32-bit unsigned int of unknown purpose. Expected values are 0 or 1. Use with caution. This key may be renamed later if more information is found.
    • "size"
      The expected size in bytes of the actual resource data. A warning will be carped if this does not match the actual size of the data following the header.
    • "type"
      The four-letter type of the resource.
    • "unknown2"
      A 32-bit unsigned int of unknown purpose. Expected values are 0 or 1. Use with caution. This key may be renamed later if more information is found.
    • "data"
      The actual resource data.
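Splitting a packed resource into the keys listed for "parse_imp_resource_v1()" and "parse_imp_resource_v2()" follows the same pattern. The code unpacks the fixed-size header, treats the remainder as data, and warns when the declared size disagrees with the data actually present. The following is a hypothetical sketch, not the module's procedure: split_resource and %RES_HEADER are illustrative names. Unlike "parse_imp_resource_v1()", it does not detect the resource type of v1 data; that key is simply left unset.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Header layouts for the two resource versions; the v1 header has no
# type field (the documented procedure detects the type from the data).
my %RES_HEADER = (
    1 => { len => 10, template => 'a4 n N',
           keys => [qw(name unknown1 size)] },
    2 => { len => 20, template => 'a4 N N a4 N',
           keys => [qw(name unknown1 size type unknown2)] },
);

# Hypothetical helper: split a raw resource into header fields and data.
sub split_resource {
    my ($raw, $version) = @_;
    my $hdr = $RES_HEADER{$version} or return;
    return if length($raw) < $hdr->{len};

    my %res;
    @res{ @{ $hdr->{keys} } } =
        unpack($hdr->{template}, substr($raw, 0, $hdr->{len}));
    $res{data} = substr($raw, $hdr->{len});

    # Warn, but still return the resource, on a size mismatch.
    my $actual = length $res{data};
    carp "resource '$res{name}': header declares $res{size} bytes, found $actual"
        if $res{size} != $actual;
    return \%res;
}
```

Warning rather than dying mirrors the documented behaviour of carping on a size mismatch, so that a slightly damaged resource can still be inspected.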
BUGS AND LIMITATIONS
• Not finished. Do not try to use yet.
• MacPaint PICT images are not well supported. If present in the book, they will be saved, but a warning will be carped about invalid image data.
• Support for v1 files is completely untested and implemented with some guesswork. Bug reports welcome.
AUTHOR
Zed Pobre <zed@debian.org>
THANKS
Thanks are due to Nick Rapallo <nrapallo@yahoo.ca> for invaluable assistance in understanding the .IMP format and testing this code. Thanks are also due to Jeffrey Kraus-yao <krausyaoj@ameritech.net> for his work reverse-engineering the .IMP format to begin with, and the documentation at <http://krausyaoj.tripod.com/reb1200.htm>.
LICENSE AND COPYRIGHT
Copyright 2008 Zed Pobre

Licensed to the public under the terms of the GNU GPL, version 2.