NAME

       webcheck - website link checker

SYNOPSIS

       webcheck [OPTION]...  URL

DESCRIPTION

       webcheck will check the document at the specified URL for links to other documents, follow
       these links recursively and generate an HTML report.

       -i,  --internal=PATTERN
               Mark URLs matching PATTERN (a Perl-style regular expression) as
               internal links.  Can be used multiple times.  Note that PATTERN
               is matched against the full URL.  URLs matching this PATTERN are
               considered internal even if they also match one of the
               --external PATTERNs.

       -x,  --external=PATTERN
               Mark URLs matching PATTERN (a Perl-style regular expression) as
               external links.  Can be used multiple times.  Note that PATTERN
               is matched against the full URL.
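               For example, to mark all links to a hypothetical partner site as
               external:
                   webcheck -x 'http://partner\.example\.org/' http://www.example.com/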

       -y, --yank=PATTERN
               Do not check URLs matching PATTERN (a Perl-style regular
               expression).  This is similar to the -x flag, except that -y
               causes webcheck not to check the matching URL at all, whereas -x
               checks the URL itself but not its children.  Can be used
               multiple times.  Note that PATTERN is matched against the full
               URL.
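               For example, to skip checking of printable-page links entirely
               (the /print/ path is illustrative):
                   webcheck -y '/print/' http://www.example.com/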

       -b, --base-only
              Consider  any  URL  not starting with the base URL to be external.  For example, if
              you run
                  webcheck -b http://www.example.com/foo
              then   http://www.example.com/foo/bar   will   be   considered   internal   whereas
              http://www.example.com/  will  be considered external.  By default all the pages on
              the site will be considered internal.

       -a, --avoid-external
              Avoid external links.  Normally if webcheck is examining an HTML page and it  finds
              a  link  that points to an external document, it will check to see if that external
              document exists.  This flag disables that action.
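               For example, to generate a report without contacting any
               external sites:
                   webcheck -a http://www.example.com/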

       --ignore-robots
              Do not retrieve and parse  robots.txt  files.   By  default  robots.txt  files  are
               retrieved and honored.  Use this option only if you are sure you
               want to override the webmaster's decision.
              For more information on robots.txt handling see the NOTES section below.
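               For example, to check a staging copy of a site whose robots.txt
               disallows all robots (host name illustrative):
                   webcheck --ignore-robots http://staging.example.com/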

       -q, --quiet, --silent
              Do not print out progress as webcheck traverses a site.

       -d, --debug
              Print debugging information while crawling the site.  This option is mainly  useful
              for developers.

       -o, --output=DIRECTORY
               Output directory.  Use this option to specify the directory
               where webcheck will write its reports.  The default is the
               current directory or the directory specified in config.py.  If
               the directory does not exist it will be created (if possible).
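               For example, to write the report into a directory served by a
               local web server (path illustrative):
                   webcheck -o /var/www/webcheck http://www.example.com/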

       -c, --continue
              Try  to continue from a previous run. When using this option webcheck will look for
              a webcheck.dat in the output directory.  This file is read  to  restore  the  state
              from  the  previous run.  This allows webcheck to continue a previously interrupted
              run.  When this option is used, the --internal, --external and --yank options  will
              be  ignored  as  well  as  any URL arguments.  The --base-only and --avoid-external
              options should be the same as the previous run.
               Note that this option is experimental and its semantics may
               change in future releases (especially in relation to other
               options).  Also note that the stored files are not guaranteed to
               be compatible between releases.
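               For example, to resume an interrupted run whose state was saved
               in the reports directory:
                   webcheck -c -o reports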

       -f, --force
              Overwrite files without asking.  This option is required for running webcheck  non-
              interactively.
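               For example, in a cron job or script where no prompting is
               possible:
                   webcheck -f -o reports http://www.example.com/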

       -r, --redirects=N
               Redirect depth.  The number of redirects webcheck should follow
               when following a link.  A value of 0 means all redirects are
               followed.
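               For example, to follow at most five redirects per link:
                   webcheck -r 5 http://www.example.com/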

       -u, --userpass=URL
              Specify  a  URL  with  username  and  password  information  to   use   for   basic
              authentication when visiting the site.
              e.g. http://test:secret@example.com/
              This option may be specified multiple times.

       -w, --wait=SECONDS
               Wait SECONDS between document retrievals.  Normally webcheck
               processes a URL and immediately moves on to the next.  On
               heavily loaded systems, however, it may be desirable to have
               webcheck pause between requests.  This option accepts any
               non-negative number.
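               For example, to pause two seconds between requests:
                   webcheck -w 2 http://www.example.com/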

       -v, --version
              Show version of program.

       -h, --help
              Show short summary of options.

URL CLASSES

       URLs are divided into two classes:

       Internal URLs are retrieved and the retrieved item  is  checked  for  syntax.   Also,  the
       retrieved  item  is  searched  for links to other items (of any class) and these links are
       followed.

       External URLs are only retrieved to test whether they are valid and to gather  some  basic
       information  from  them  (title,  size,  content-type,  etc).  The retrieved items are not
       inspected for links to other items.

       Apart from their class, URLs can also be considered yanked (as specified
       with the --yank or --avoid-external options).  Yanked URLs may be either
       internal or external and are not retrieved or checked at all.  URLs with
       unsupported schemes are also considered yanked.

EXAMPLES

       Check the site www.example.com but  consider  any  path  with  "/webcheck"  in  it  to  be
       external.
           webcheck http://www.example.com/ -x /webcheck
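        Check www.example.com without contacting external sites, waiting one
        second between requests, and write the report to the reports directory,
        overwriting any previous report (directory name illustrative):
            webcheck -a -w 1 -f -o reports http://www.example.com/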

NOTES

       When checking internal URLs webcheck honors the robots.txt file,
       identifying itself as user-agent webcheck.  Disallowed links are not
       checked at all, as if the -y option had been specified for that URL.  To
       allow webcheck to crawl parts of a site that other robots are disallowed
       from, use something like:
           User-agent: *
           Disallow: /foo

           User-agent: webcheck
           Allow: /foo

ENVIRONMENT

       <scheme>_proxy
               Proxy URL for <scheme>.
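               For example, to fetch http URLs through a proxy (proxy address
               illustrative):
                   http_proxy=http://proxy.example.com:8080/ webcheck http://www.example.com/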

REPORTING BUGS

       Bug reports should be sent to the mailing list
       <webcheck-users@lists.arthurdejong.org>.
       More information on reporting bugs can be found on the webcheck homepage:
       http://arthurdejong.org/webcheck/

COPYRIGHT

       Copyright © 1998, 1999 Albert Hopkins (marduk)
       Copyright © 2002 Mike W. Meyer
       Copyright © 2005, 2006, 2007, 2008, 2009, 2010 Arthur de Jong
       webcheck  is  free software; see the source for copying conditions.  There is NO warranty;
       not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       The files produced as output from  the  software  do  not  automatically  fall  under  the
       copyright of the software, unless explicitly stated otherwise.