Provided by: pwget_2016.1019+git75c6e3e-1_amd64 bug


       pwget - Perl Web URL fetch program


           pwget [URL ...]
           pwget --config $HOME/config/pwget.conf --tag linux --tag emacs ..
           pwget --verbose --overwrite
           pwget --verbose --overwrite --Output ~/dir/
           pwget --new --overwrite


       Automate periodic downloads of files and packages.

       If you retrieve latest versions of certain program blocks periodically, this is the Perl
       script for you. Run from cron job or once a week to upload newest versions of files around
       the net. Note:

   Wget and this program
       At this point you may wonder, where would you need this perl program when wget(1)
       C-program has been the standard for ages. Well, 1) Perl is cross platform and more easily
       extendable 2) You can record file download criteria to a configuration file and use perl
       regular epxressions to select downloads 3) the program can anlyze web-pages and "search"
       for the download only links as instructed 4) last but not least, it can track newest
       packages whose name has changed since last downlaod. There are heuristics to determine the
       newest file or package according to file name skeleton defined in configuration.

       This program does not replace pwget(1) because it does not offer as many options as wget,
       like recursive downloads and date comparing. Use wget for ad hoc downloads and this
       utility for files that change (new releases of archives) or which you monitor

   Short introduction
       This small utility makes it possible to keep a list of URLs in a configuration file and
       periodically retrieve those pages or files with simple commands. This utility is best
       suited for small batch jobs to download e.g. most recent versions of software files. If
       you use an URL that is already on disk, be sure to supply option --overwrite to allow
       overwriting existing files.

       While you can run this program from command line to retrieve individual files, program has
       been designed to use separate configuration file via --config option. In the configuration
       file you can control the downloading with separate directives like "save:" which tells to
       save the file under different name. The simplest way to retrieve the latest version of
       apackage from a FTP site is:

           pwget --new --overwite --verbose \

       Do not worry about the filename "package-1.00.tar.gz". The latest version, say,
       "package-3.08.tar.gz" will be retrieved. The option --new instructs to find newer version
       than the provided URL.

       If the URL ends to slash, then directory list at the remote machine is stored to file:


       The content of this file can be either index.html or the directory listing depending on
       the used http or ftp protocol.


       -A, --regexp-content REGEXP
           Analyze the content of the file and match REGEXP. Only if the regexp matches the file
           content, then download file. This option will make downloads slow, because the file is
           read into memory as a single line and then a match is searched against the content.

           For example to download Emacs lisp file (.el) written by Mr. Foo in case insensitive

               pwget -v -r '\.el$' -A "(?i)Author: Mr. Foo" \

       -C, --create-paths
           Create paths that do not exist in "lcd:" directives.

           By default, any LCD directive to non-existing directory will interrupt program. With
           this option, local directories are created as needed making it possible to re-create
           the exact structure as it is in configuration file.

       -c, --config FILE
           This option can be given multiple times. All configurations are read.

           Read URLs from configuration file. If no configuration file is given, file pointed by
           environment variable is read. See ENVIRONMENT.

           The configuration file layout is envlained in section CONFIGURATION FILE

       --chdir DIRECTORY
           Do a chdir() to DIRECTORY before any URL download starts. This is like doing:

               cd DIRECTORY

       -d, --debug [LEVEL]
           Turn on debug with positive LEVEL number. Zero means no debug.  This option turns on
           --verbose too.

       -e, --extract
           Unpack any files after retrieving them. The command to unpack typical archive files
           are defined in a program. Make sure these programs are along path. Win32 users are
           encouraged to install the Cygwin utilities where these programs come standard. Refer
           to section SEE ALSO.

             .tar => tar
             .tgz => tar + gzip
             .gz  => gzip
             .bz2 => bzip2
             .xz  => xz
             .zip => unzip

       -F, --firewall FIREWALL
           Use FIREWALL when accessing files via ftp:// protocol.

       -h, --help
           Print help page in text.

           Print help page in HTML.

           Print help page in Unix manual page format. You want to feed this output to c<nroff
           -man> in order to read it.

           Print help page.

       -m, --mirror SITE
           If URL points to Sourcefoge download area, use mirror SITE for downloading.
           Alternatively the full full URL can include the mirror information. And example:

               --mirror kent

       -n, --new
           Get newest file. This applies to datafiles, which do not have extension .asp or .html.
           When new releases are announced, the version number in filename usually tells which is
           the current one so getting hardcoded file with:

               pwget -o -v

           is not usually practical from automation point of view. Adding --new option to the
           command line causes double pass: a) the whole is examined for
           all files and b) files matching approximately filename program-1.3.tar.gz are
           examined, heuristically sorted and file with latest version number is retrieved.

           Ignore "lcd:" directives in configuration file.

           In the configuration file, any "lcd:" directives are obeyed as they are seen. But if
           you do want to retrieve URL to your current directory, be sure to supply this option.
           Otherwise the file will end to the directory pointer by "lcd:".

           Ignore "save:" directives in configuration file. If the URLs have "save:" options,
           they are ignored during fetch. You usually want to combine --no-lcd with --no-save

           Ignore "x:" directives in configuration file.

       -O, --output DIR
           Before retrieving any files, chdir to DIR.

       -o, --overwrite
           Allow overwriting existing files when retrieving URLs.  Combine this with
           --skip-version if you periodically update files.

       --proxy PROXY
           Use PROXY server for HTTP. (See --Firewall for FTP.). The port number is optional in
           the call:


       -p, --prefix PREFIX
           Add PREFIX to all retrieved files.

       -P, --postfix POSTFIX
           Add POSTFIX to all retrieved files.

       -D, --prefix-date
           Add iso8601 ":YYYY-MM-DD" prefix to all retrieved files.  This is added before
           possible --prefix-www or --prefix.

       -W, --prefix-www
           Usually the files are stored with the same name as in the URL dir, but if you retrieve
           files that have identical names you can store each page separately so that the file
           name is prefixed by the site name.


       -r, --regexp REGEXP
           Retrieve file matching at the destination URL site. This is like "Connect to the URL
           and get all files matching REGEXP". Here all gzip compressed files are found form HTTP
           server directory:

               pwget -v -r "\.gz"

           Caveat: currently works only for http:// URLs.

       -R, --config-regexp REGEXP
           Retrieve URLs matching REGEXP from configuration file. This cancels --tag options in
           the command line.

       -s, --selftest
           Run some internal tests. For maintainer or developer only.

       --sleep SECONDS
           Sleep SECONDS before next URL request. When using regexp based downlaods that may
           return many hits, some sites disallow successive requests in within short period of
           time. This options makes program sleep for number of SECONDS between retrievals to
           overcome 'Service unavailable'.

           Retrieve URL and write to stdout.

           Do not download files that have version number and which already exists on disk.
           Suppose you have these files and you use option --skip-version:


           Only file.txt is retrieved, because file-1.1.tar.gz contains version number and the
           file has not changed since last retrieval. The idea is, that in every release the
           number in in distribution increases, but there may be distributions which do not
           contain version number. In regular intervals you may want to load those packages
           again, but skip versioned files. In short: This option does not make much sense
           without additional option --new

           If you want to reload versioned file again, add option --overwrite.

       -t, --test, --dry-run
           Run in test mode.

       -T, --tag NAME [NAME] ...
           Search tag NAME from the config file and download only entries defined under that tag.
           Refer to --config FILE option description. You can give Multiple --tag switches.
           Combining this option with --regexp does not make sense and the concequencies are

       -v, --verbose [NUMBER]
           Print verbose messages.

       -V, --version
           Print version information.


       Get files from site:

           pwget ..

       Display copyright file for package GNU make from Debian pages:

           pwget --stdout --regexp 'copyright$'

       Get all mailing list archive files that match "gz":

           pwget --regexp gz

       Read a directory and store it to filename YYYY-MM-DD::!dir!000root-file.

           pwget --prefix-date --overwrite --verbose

       To update newest version of the package, but only if there is none at disk already. The
       --new option instructs to find newer packages and the filename is only used as a skeleton
       for files to look for:

           pwget --overwrite --skip-version --new --verbose \

       To overwrite file and add a date prefix to the file name:

           pwget --prefix-date --overwrite --verbose \


       To add date and WWW site prefix to the filenames:

           pwget --prefix-date --prefix-www --overwrite --verbose \


       Get all updated files under cnfiguration file's tag updates:

           pwget --verbose --overwrite --skip-version --new --tag updates
           pwget -v -o -s -n -T updates

       Get files as they read in the configuration file to the current directory, ignoring any
       "lcd:" and "save:" directives:

           pwget --config $HOME/config/pwget.conf /
               --no-lcd --no-save --overwrite --verbose \

       To check configuration file, run the program with non-matching regexp and it parses the
       file and checks the "lcd:" directives on the way:

           pwget -v -r dummy-regexp


           pwget.DirectiveLcd: LCD [$EUSR/directory ...]
           is not a directory at /users/foo/bin/pwget line 889.


       The configuration file is NOT Perl code. Comments start with hash character (#).

       At this point, variable expansions happen only in lcd:. Do not try to use them anywhere
       else, like in URLs.

       Path variables for lcd: are defined using following notation, spaces are not allowed in
       VALUE part (no directory names with spaces). Variable names are case sensitive. Variables
       substitute environment variabales with the same name. Environment variables are
       immediately available.

           VARIABLE = /home/my/dir         # define variable
           VARIABLE = $dir/some/file       # Use previously defined variable
           FTP      = $HOME/ftp            # Use environment variable

       The right hand can refer to previously defined variables or existing environment
       variables. Repeat, this is not Perl code although it may look like one, but just an
       allowed syntax in the configuration file. Notice that there is dollar to the right hand>
       when variable is referred, but no dollar to the left hand side when variable is defined.
       Here is example of a possible configuration file contant. The tags are hierarchically
       ordered without a limit.

       Warning: remember to use different variables names in separate include files. All
       variables are global.

   Include files
       It is possible to include more configuration files with statement

           INCLUDE <path-to-file-name>

       Variable expansions are possible in the file name. There is no limit how many or how deep
       include structure is used. Every file is included only once, so it is safe to to have
       multiple includes to the same file.  Every include is read, so put the most importat
       override includes last:

           INCLUDE <etc/pwget.conf>             # Global
           INCLUDE <$HOME/config/pwget.conf>    # HOME overrides it

       A special "THIS" tag means relative path of the current include file, which makes it
       possible to include several files form the same directory where a initial include file

           # Start of config at /etc/pwget.conf

           # THIS = /etc, current location
           include <THIS/pwget-others.conf>

           # Refers to directory where current user is: the pwd
           include <pwget-others.conf>

           # end

   Configuraton file example
       The configuration file can contain many <directoves:>, where each directive end to a
       colon. The usage of each directory is best explained by examining the configuration file
       below and reading the commentary near each directive.

           #   $HOME/config/pwget.conf F- Perl pwget configuration file

           ROOT   = $HOME                      # define variables
           CONF   = $HOME/config
           UPDATE = $ROOT/updates
           DOWNL  = $ROOT/download

           #   Include more configuration files. It is possible to
           #   split a huge file in pieces and have "linux",
           #   "win32", "debian", "emacs" configurations in separate
           #   and manageable files.

           INCLUDE <$CONF/pwget-other.conf>
           INCLUDE <$CONF/pwget-more.conf>

           tag1: local-copies tag1: local      # multiple names to this category

               lcd:  $UPDATE                   # chdir directive

               #  This is show to user with option --verbose
               print: Notice, this site moved YYYY-MM-DD, update your bookmarks


           tag1: external

             lcd:  $DOWNL

             tag2: external-http


             tag2: external-ftp

      save:xx-file.txt.gz login:foo pass:passwd x:

               lcd: $HOME/download/package


             tag2: package-x

               lcd: $DOWNL/package-x

               #  Person announces new files in his homepage, download all
               #  announced files. Unpack everything (x:) and remove any
               #  existing directories (xopt:rm)

      pregexp:\.tar\.gz$ x: xopt:rm

           # End of configuration file pwget.conf


       All the directives must in the same line where the URL is. The programs scans lines and
       determines all options given in line for the URL.  Directives can be overridden by command
       line options.

           Currently only conv:text is available.

           Convert downloaded page to text. This option always needs either save: or rename:,
           because only those directives change filename. Here is an example:

      cnv:text save:file.txt
      pregexp:\.html cnv:text rename:s/html/txt/

           A text: shorthand directive can be used instead of cnv:text.

           Download file only if the content matches REGEXP. This is same as option
           --Regexp-content. In this example directory listing Emacs lisp packages (.el) are
           downloaded but only if their content indicates that the Author is Mr. Foo:

      cregexp:(?i)author:.*Foo pregexp:\.el$

           Set local download directory to DIRECTORY (chdir to it). Any environment variables are
           substituted in path name. If this tag is found, it replaces setting of --Output. If
           path is not a directory, terminate with error.  See also --Create-paths and --no-lcd.

           Ftp login name. Default value is "anonymous".

           This is relevant to Sourceforge only which does not allow direct downloads with links.
           Visit project's Sourceforge homepage and see which mirrors are available for

           An example:

    new: mirror:kent

           Get newest file. This variable is reset to the value of --new after the line has been
           processed. Newest means, that an "ls" command is run in the ftp, and something
           equivalent in HTTP "ftp directories", and any files that resemble the filename is
           examined, sorted and heurestically determined according to version number of file
           which one is the latest. For example files that have version information in YYYYMMDD
           format will most likely to be retrieved right.

           Time stamps of the files are not checked.

           The only requirement is that filename "must" follow the universal version numbering

               FILE-VERSION.extension      # de facto VERSION is defined as [\d.]+

               file-19990101.tar.gz        # ok
               file-1999.0101.tar.gz       # ok
               file-         # ok

               file1234.txt                # not recognized. Must have "-"
               file-0.23d.tar.gz           # warning, letters are problematic

           Files that have some alphabetic version indicator at the end of VERSION may not be
           handled correctly. Contact the developer and inform him about the de facto standard so
           that files can be retrieved more intelligently.

           NOTE: In order the new: directive to know what kind of files to look for, it needs a
           file tamplate. You can use a direct link to some filename. Here the location
           "" is examined and the filename template used is took
           as "file-1.1.tar.gz" to search for files that might be newer, like


           If the filename appeard in a named page, use directive file: for template. In this
           case the "download.html" page is examined for files looking like "file.*tar.gz" and
           the latest is searched:

    file:file-1.1.tar.gz new:

       overwrite: o:
           Same as turning on --overwrite

           Read web page and apply commands to it. An example: contact the root page and save it:

     page: save:foo-homepage.html

           In order to find the correct information from the page, other directives are usually
           supplied to guide the searching.

           1) Adding directive "pregexp:ARCHIVE-REGEXP" matches the A HREF links in the page.

           2) Adding directive new: instructs to find newer VERSIONS of the file.

           3) Adding directive "file:DOWNLOAD-FILE" tells what template to use to construct the
           downloadable file name. This is needed for the "new:" directive.

           4) A directive "vregexp:VERSION-REGEXP" matches the exact location in the page from
           where the version information is extracted. The default regexp looks for line that
           says "The latest version ... is ... N.N".  The regexp must return submatch 2 for the
           version number.

           AN EXAMPLE

           Search for newer files from a HTTP directory listing. Examine page
  for model "package-1.1.tar.gz" and find a newer
           file. E.g. "package-4.7.tar.gz" would be downloaded.


           AN EXAMPLE

           Search for newer files from the content of the page. The directive file: acts as a
           model for filenames to pay attention to.

      new: pregexp:tar.gz file:package-1.1.tar.gz

           AN EXAMPLE

           Use directive rename: to change the filename before soring it on disk. Here, the
           version number is attached to the actila filename:


           The directived needed would be as follows; entries have been broken to separate lines
           for legibility:


           This effectively reads: "See if there is new version of something that looks like
           file.el-1.1 and save it under name file.el by deleting the extra version number at the
           end of original filename".

           AN EXAMPLE

           Contact absolute page: at and search A HREF urls
           in the page that match pregexp:. In addition, do another scan and search the version
           number in the page from thw position that match vregexp: (submatch 2).

           After all the pieces have been found, use template file: to make the retrievable file
           using the version number found from vregexp:.  The actual download location is
           combination of page: and A HREF pregexp: location.

           The directived needed would be as follows; entries have been broken to separate lines
           for legibility:

               pregexp: package.tar.gz
               vregexp: ((?i)latest.*?version.*?\b([\d][\d.]+).*)
               file: package-1.3.tar.gz

           An example of web page where the above would apply:


               The latest version of package is <B>2.4.1</B> It can be
               downloaded in several forms:

                   <A HREF="download/files/package.tar.gz">Tar file</A>
                   <A HREF="download/files/">ZIP file


           For this example, assume that "package.tar.gz" is a symbolic link pointing to the
           latest release file "package-2.4.1.tar.gz". Thus the actual download location would
           have been "".

           Why not simply download "package.tar.gz"? Because then the program can't decide if the
           version at the page is newer than one stored on disk from the previous download. With
           version numbers in the file names, the comparison is possible.

           FIXME: This opton is obsolete. do not use.

           THIS IS FOR HTTP only. Use Use directive regexp: for FTP protocls.

           This is a more general instruction than the page: and vregexp: explained above.

           Instruct to download every URL on HTML page matching pregexp:RE. In typical situation
           the page maintainer lists his software in the development page. This example would
           download every tar.gz file in the page. Note, that the REGEXP is matched against the A
           HREF link content, not the actual text that is displayed on the page:

      page:find pregexp:\.tar.gz$

           You can also use additional regexp-no: directive if you want to exclude files after
           the pregexp: has matched a link.

      page:find pregexp:\.tar.gz$ regexp-no:desktop

           For FTP logins. Default value is "".

           Search A HREF links in page matching a regular expression. The regular expression must
           be a single word with no whitespace. This is incorrect:

               pregexp:(this regexp )

           It must be written as:


           Print associated message to user requesting matching tag name.  This directive must in
           separate line inside tag.

               tag1: linux

                 print: this download site moved 2002-02-02, check your bookmarks.

           The "print:" directive for tag is shown only if user turns on --verbose mode:

               pwget -v -T linux

           Rename each file using PERL-CODE. The PERL-CODE must be full perl program with no
           spaces anywhere. Following variables are available during the eval() of code:

               $ARG = current file name
               $url = complete url for the file
               The code must return $ARG which is used for file name

           For example, if page contains links to .html files that are in fact text files,
           following statement would change the file extensions:

      page:find pregexp:\.html rename:s/html/txt/

           You can also call function "MonthToNumber($string)" if the filename contains written
           month name, like <2005-February.mbox>.The function will convert the name into number.
           Many mailing list archives can be downloaded cleanly this way.

               #  This will download SA-Exim Mailing list archives:
      pregexp:\.txt$ rename:$ARG=MonthToNumber($ARG)

           Here is a more complicated example:

      pregexp:mbox.*\d$ rename:my($y,$m)=($url=~/year=(\d+).*month=(\d+)/);$ARG="$y-$m.mbox"

           Let's break that one apart. You may spend some time with this example since the
           possiblilities are limitless.

               1. Connect to page

               2. Search page for URLs matching regexp 'mbox.*\d$'. A
                  found link could match hrefs like this:

               3. The found link is put to $ARG (same as $_), which can be used
                  to extract suitable mailbox name with a perl code that is
                  evaluated. The resulting name must apear in $ARG. Thus the code
                  effectively extract two items from the link to form a mailbox

                   my ($y, $m) = ( $url =~ /year=(\d+).*month=(\d+)/ )
                   $ARG = "$y-$m.mbox"

                   => 2004-12.mbox

           Just remember, that the perl code that follows "rename:" directive must must not
           contain any spaces. It all must be readable as one string.

           Get all files in ftp directory matching regexp. Directive save: is ignored.

           After the "regexp:" directive has matched, exclude files that match directive regexp-

           This option is for interactive use. Retrieve all files from HTTP or FTP site which
           match REGEXP.

           Save file under this name to local disk.

           Downloads can be grouped under "tagN" so that e.g. option --tag1 would start
           downloading files from that point on until next "tag1" is found.  There are currently
           unlimited number of tag levels: tag1, tag2 and tag3, so that you can arrange your
           downlods hierarchially in the configuration file.  For example to download all Linux
           files rhat you monitor, you would give option --tag linux. To download only the NT
           Emacs latest binary, you would give option --tag emacs-nt. Notice that you do not give
           the "level" in the option, program will find it out from the configuration file after
           the tag name matches.

           The downloading stops at next tag of the "same level". That is, tag2 stops only at
           next tag2, or when upper level tag is found (tag1) or or until end of file.

               tag1: linux             # All Linux downlods under this category

                   tag2: sunsite    tag2: another-name-for-this-spot

                   #   List of files to download from here


                   #   List of files to download from here

               tag1: emacs-binary

                   tag2: emacs-nt

                   tag2: xemacs-nt

                   tag2: emacs

                   tag2: xemacs

       x:  Extract (unpack) file after download. See also option --unpack and --no-extract The
           archive file, say .tar.gz will be extracted the file in current download location.
           (see directive lcd:)

           The unpack procedure checks the contents of the archive to see if the package is
           correctly formed. The de facto archive format is


           In the archive, all files are supposed to be stored under the proper subdirectory with
           version information:


           "IMPORTANT:" If the archive does not have a subdirectory for all files, a subdirectory
           is created and all items are unpacked under it. The default subdirectory name in
           constructed from the archive name with currect date stamp in format:


           If the archive name contains something that looks like a version number, the created
           directory will be constructed from it, instead of current date.

               package-1.43.tar.gz    =>  package-1.43

       xx: Like directive x: but extract the archive "as is", without checking content of the
           archive. If you know that it is ok for the archive not to include any subdirectories,
           use this option to suppress creation of an artificial root package-YYYY.MMDD.

           This options tells to remove any previous unpack directory.

           Sometimes the files in the archive are all read-only and unpacking the archive second
           time, after some period of time, would display

               tar: package-3.9.5/.cvsignore: Could not create file:
               Permission denied

               tar: package-3.9.5/BUGS: Could not create file:
               Permission denied

           This is not a serious error, because the archive was already on disk and tar did not
           overwrite previous files. It might be good to inform the archive maintainer, that the
           files have wrong permissions. It is customary to expect that distributed packages have
           writable flag set for all files.


       Here is list of possible error messages and how to deal with them.  Turning on  --debug
       will help to understand how program has interpreted the configuration file or command line
       options. Pay close attention to the generated output, because it may reveal that a regexp
       for a site is too lose or too tight.

       ERROR {URL-HERE} Bad file descriptor
           This is "file not found error". You have written the filename incorrectly.  Double
           check the configuration file's line.


       "Sourceforge note": To download archive files from Sourceforge requires some trickery
       because of the redirections and load balancers the site uses. The Sourceforge page have
       also undergone many changes during their existence. Due to these changes there exists an
       ugly hack in the program to use wget(1) to get certain information from the site.  This
       could have been implemented in pure Perl, but as of now the developer hasn't had time to
       remove the wget(1) dependency. No doubt, this is an ironic situation to use wget(1). You
       you have Perl skills, go ahead and look at UrlHttGet(). UrlHttGetWget() and sen patches.

       The program was initially designed to read options from one line. It is unfortunately not
       possible to change the program to read configuration file directives from multiple lines,
       e.g. by using backslashes (\) to indicate contuatinued line.


       Variable "PWGET_CFG" can point to the root configuration file. The configuration file is
       read at startup if it exists.

           export PWGET_CFG=$HOME/conf/pwget.conf     # /bin/hash syntax
           setenv PWGET_CFG $HOME/conf/pwget.conf     # /bin/csh syntax


       Not defined.


       External utilities:

           wget(1)   only needed for downloads
                     see BUGS AND LIMITATIONS

       Non-core Perl modules from CPAN:


       The following modules are loaded in run-time only if directive cnv:text is used. Otherwise
       these modules are not loaded:


       This module is loaded in run-time only if HTTPS scheme is used:



       lwp-download(1) lwp-mirror(1) lwp-request(1) lwp-rget(1) wget(1)


       Jari Aalto


       Copyright (C) 1996-2016 Jari Aalto

       This program is free software; you can redistribute and/or modify program under the terms
       of GNU General Public license either version 2 of the License, or (at your option) any
       later version.