Archive::Tar::Wrapper
API wrapper around the 'tar' utility
use Archive::Tar::Wrapper;
my $arch = Archive::Tar::Wrapper->new();
# Open a tarball, expand it into a temporary directory
$arch->read("archive.tgz");
# Iterate over all entries in the archive
$arch->list_reset(); # Reset Iterator
# Iterate through archive
while(my $entry = $arch->list_next()) {
    my($tar_path, $phys_path) = @$entry;
    print "$tar_path\n";
}
# Get a huge list with all entries
for my $entry (@{$arch->list_all()}) {
    my($tar_path, $real_path) = @$entry;
    print "Tarpath: $tar_path Tempfile: $real_path\n";
}
# Add a new entry
$arch->add($logic_path, $file_or_stringref);
# Remove an entry
$arch->remove($logic_path);
# Find the physical location of a temporary file
my($tmp_path) = $arch->locate($tar_path);
# Create a tarball
$arch->write($tarfile, $compress);
Archive::Tar::Wrapper is an API wrapper around the 'tar' command line utility. It never stores anything in memory, but works on temporary directory structures on disk instead. It provides a mapping between the logical paths in the tarball and the 'real' files in the temporary directory on disk.
It differs from Archive::Tar in two ways:
my $arch = Archive::Tar::Wrapper->new(tar => '/path/to/tar');
Since "Archive::Tar::Wrapper" creates temporary directories to store tar data, the location of the temporary directory can be specified:
my $arch = Archive::Tar::Wrapper->new(tmpdir => '/path/to/tmpdir');
Tremendous performance increases can be achieved if the temporary directory is located on a RAM disk. Check the "Using RAM Disks" section below for details.
Additional options can be passed to the "tar" command by using the "tar_read_options" and "tar_write_options" parameters. Example:
my $arch = Archive::Tar::Wrapper->new(
    tar_read_options => "p"
);
will use "tar xfp archive.tgz" to extract the tarball instead of just "tar xf archive.tgz". GNU tar supports even more options; these can be passed in via
my $arch = Archive::Tar::Wrapper->new(
    tar_gnu_read_options => ["--numeric-owner"],
);
Similarly, "tar_gnu_write_options" can be used to provide additional options for GNU tar implementations. For example, the tar object
my $tar = Archive::Tar::Wrapper->new(
    tar_gnu_write_options => ["--exclude=foo"],
);
will call the "tar" utility internally like
tar cf tarfile --exclude=foo ...
when the "write" method gets called.
By default, the "list_*()" functions will return only file entries. Directories will be suppressed. To have "list_*()" return directories as well, use
my $arch = Archive::Tar::Wrapper->new(
    dirs => 1
);
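To illustrate the filtering described above, the following sketch walks a directory tree with File::Find and collects [logical path, physical path] pairs, leaving out directories unless a flag is set. The helper name collect_entries() and its arguments are hypothetical and not part of Archive::Tar::Wrapper; the code only mimics the behavior of the "dirs" option and makes no claim about how the module implements it.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec;

# Hypothetical helper (not part of Archive::Tar::Wrapper): return a
# reference to a sorted list of [logical_path, physical_path] pairs
# found below $root. Directories are skipped unless $dirs is true.
sub collect_entries {
    my ($root, $dirs) = @_;
    my @entries;

    File::Find::find({
        no_chdir => 1,
        wanted   => sub {
            my $phys = $File::Find::name;
            return if $phys eq $root;         # skip the root itself
            return if -d $phys and !$dirs;    # suppress directories
            my $logical = File::Spec->abs2rel($phys, $root);
            push @entries, [$logical, $phys];
        },
    }, $root);

    return [ sort { $a->[0] cmp $b->[0] } @entries ];
}
```

Calling collect_entries($dir, 0) yields file entries only, while collect_entries($dir, 1) includes the directories as well.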
If more files are added to a tarball than the command line can handle, "Archive::Tar::Wrapper" will switch from using the command
tar cfv tarfile file1 file2 file3 ...
to
tar cfv tarfile -T filelist
where "filelist" is a file containing all files to be added. The default limit that triggers this switch is 512, but it can be changed by setting the parameter "max_cmd_line_args":
my $arch = Archive::Tar::Wrapper->new(
    max_cmd_line_args => 1024
);
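The switch between the two command forms can be sketched as follows. The helper build_create_cmd() is hypothetical and is not the module's own code; it assumes the switch happens once the number of files exceeds the limit, and it writes one file name per line to a temporary list file.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper (not the module's code): build the argument list
# for a tar "create" call. With more than $max file names, write them
# to a temporary list file and pass that file via -T instead.
sub build_create_cmd {
    my ($tar, $tarfile, $files, $max) = @_;
    $max = 512 unless defined $max;

    # Few enough files: pass them directly on the command line.
    return ([$tar, 'cfv', $tarfile, @$files], undef) if @$files <= $max;

    # Too many files: one name per line in a list file. The file is
    # removed when the returned File::Temp object goes out of scope.
    my $list = File::Temp->new(UNLINK => 1);
    print {$list} "$_\n" for @$files;
    close $list or die "Cannot close file list: $!";

    return ([$tar, 'cfv', $tarfile, '-T', $list->filename], $list);
}
```

The caller would keep the second return value alive until the command has run, for example with system(@$cmd). Because the list file holds one name per line, a scheme like this cannot represent file names that contain newlines.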
"read" handles both compressed and uncompressed files. To find out if a file is compressed or uncompressed, it tries to guess by extension, then by checking the first couple of bytes in the tarfile.
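A two-step guess of this kind might look like the sketch below. The helper guess_compression() is hypothetical, and the choice of gzip and bzip2 as the recognized formats is an assumption made for the example; the magic bytes used are the standard ones for those two formats. It is not necessarily what "read" does internally.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the module's actual code):
# return 'gzip', 'bzip2' or '' (uncompressed/unknown) for a file.
sub guess_compression {
    my ($file) = @_;

    # 1. Guess by extension.
    return 'gzip'  if $file =~ /\.t?gz$/i;
    return 'bzip2' if $file =~ /\.(?:bz2|tbz2?)$/i;

    # 2. Fall back to the magic bytes at the start of the file.
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    my $read = read($fh, my $magic, 3);
    close $fh;
    return '' unless defined $read && $read >= 2;

    return 'gzip'  if substr($magic, 0, 2) eq "\x1f\x8b";
    return 'bzip2' if $magic eq 'BZh';
    return '';
}
```

Checking the extension first avoids opening the file in the common case; the magic-byte check covers tarballs with missing or misleading extensions.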
If only a limited number of files is needed from a tarball, they can be specified after the tarball name:
$arch->read("archive.tgz", "path/file.dat", "path/sub/another.txt");
The file names are passed unmodified to the "tar" command, so make sure that the file paths match exactly what's in the tarball; otherwise "read()" will fail.
To iterate over the list, the following construct can be used:
# Get a huge list with all entries
for my $entry (@{$arch->list_all()}) {
    my($tar_path, $real_path) = @$entry;
    print "Tarpath: $tar_path Tempfile: $real_path\n";
}
If the list of items in the tarfile is big, use "list_reset()" and "list_next()" instead of "list_all".
If no additional parameters are given, permissions and user/group id settings of a file to be added are copied. If you want different settings, specify them in the options hash:
$arch->add($logic_path, $stringref,
           { perm => 0755, uid => 123, gid => 10 });
If $file_or_stringref is a reference to a Unicode string, the "binmode" option has to be set to make sure the string gets written as proper UTF-8 into the tarfile:
$arch->add($logic_path, $stringref, { binmode => ":utf8" });
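Why the layer matters can be seen with plain Perl I/O. The sketch below uses a hypothetical helper, write_string(), which is not part of Archive::Tar::Wrapper; it writes the string behind a reference to a file, once as raw bytes and once through the ":utf8" layer, and reports the resulting file size.

```perl
use strict;
use warnings;

# Hypothetical helper: write the string behind $stringref to $path,
# applying the given I/O layer (raw bytes if none is given), and
# return the size of the resulting file in bytes.
sub write_string {
    my ($path, $stringref, $binmode) = @_;

    open my $fh, '>', $path or die "Cannot open $path: $!";
    binmode $fh, defined $binmode ? $binmode : ':raw';
    print {$fh} $$stringref;
    close $fh or die "Cannot close $path: $!";

    return -s $path;
}
```

For the four-character string "caf\x{e9}", the raw variant produces a 4-byte file containing the single byte 0xE9, whereas the ":utf8" variant produces 5 bytes because the last character is encoded as the two-byte UTF-8 sequence 0xC3 0xA9.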
On Linux, it's quite easy to create a RAM disk and achieve tremendous speedups while untarring or modifying a tarball. You can either create the RAM disk by hand by running
# mkdir -p /mnt/myramdisk
# mount -t tmpfs -o size=20m tmpfs /mnt/myramdisk
and then feeding the ramdisk as a temporary directory to Archive::Tar::Wrapper, like
my $tar = Archive::Tar::Wrapper->new( tmpdir => '/mnt/myramdisk' );
or using Archive::Tar::Wrapper's built-in option 'ramdisk':
my $tar = Archive::Tar::Wrapper->new(
    ramdisk => {
        type => 'tmpfs',
        size => '20m', # 20 MB
    },
);
The only drawback with the latter option is that creating the RAM disk needs to be performed as root, which often isn't desirable for security reasons. For this reason, Archive::Tar::Wrapper offers a utility function that mounts the ramdisk and returns the temporary directory it's located in:
# Create new ramdisk (as root):
my $tmpdir = Archive::Tar::Wrapper->ramdisk_mount(
    type => 'tmpfs',
    size => '20m', # 20 MB
);
# Delete a ramdisk (as root):
Archive::Tar::Wrapper->ramdisk_unmount();
Optionally, the "ramdisk_mount()" command accepts a "tmpdir" parameter pointing to a temporary directory for the ramdisk if you wish to set it yourself instead of letting Archive::Tar::Wrapper create it automatically.
Archive::Tar::Wrapper's approach of staging files in a temporary directory has limitations when it comes to file permissions. If the file to be added belongs to a different user/group, Archive::Tar::Wrapper adjusts the uid/gid/permissions of the target file in the temporary directory to reflect the original file's settings, so that the system tar adds it to the tarball with those settings, just like a regular tar run on the original file would. This fails, of course, if the original file's uid differs from the current user's, unless the script is running with superuser rights. The tar program by itself (without Archive::Tar::Wrapper) works differently: it just makes a note of a file's uid/gid/permissions in the tarball (which it can do without superuser rights), and upon extraction it adjusts the permissions of newly generated files if the -p option is given (the default for the superuser).
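The limitation can be made concrete with a small sketch that copies mode and ownership from one file to another using Perl's built-in stat, chmod and chown. The helper copy_perms() is hypothetical and is not the module's own code; it only shows where the step that needs superuser rights occurs.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): copy permission bits
# and ownership from $src to $dst. Returns 1 if both steps succeeded,
# 0 if the ownership could not be changed.
sub copy_perms {
    my ($src, $dst) = @_;

    my @st = stat $src or die "Cannot stat $src: $!";
    my ($mode, $uid, $gid) = ($st[2] & 07777, $st[4], $st[5]);

    # Setting the mode works for any file the current user owns.
    chmod $mode, $dst or die "Cannot chmod $dst: $!";

    # Handing a file to a different user typically requires superuser
    # rights, so this is the step that fails for ordinary users.
    unless (chown $uid, $gid, $dst) {
        warn "Cannot chown $dst to $uid/$gid: $!\n";
        return 0;
    }
    return 1;
}
```

When the source file belongs to the current user, both steps normally succeed. When it belongs to someone else and the script is not running as root, the chown call is expected to fail and the copy keeps the current user's ownership.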
Archive::Tar::Wrapper doesn't currently handle filenames with embedded newlines.
Copyright 2005 by Mike Schilli, all rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
2005, Mike Schilli <cpan@perlmeister.com>