Provided by: w3c-linkchecker_4.81-6_all

NAME

       checklink - check the validity of links in an HTML or XHTML document

SYNOPSIS

       checklink  [ options ] uri ...

DESCRIPTION

       This manual page briefly documents the checklink command, also known
       as the W3C Link Checker.

       checklink is a program that reads an HTML or XHTML document, extracts a
       list of anchors and links, and checks that no anchor is defined twice
       and that all the links are dereferenceable, including the fragments. It
       warns about HTTP redirects, including directory redirects, and can
       recursively check part of a web site.

       The program can be used either as a command line tool or as a CGI
       script.
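
       The anchor check described above can be illustrated with a minimal
       Python sketch.  It is not the checker's actual implementation (which
       is written in Perl); the class and function names are hypothetical,
       and it assumes anchors are defined by "id" attributes or by
       <a name="...">.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorLinkCollector(HTMLParser):
    """Collect anchor names (id, a/name) and link targets (href, src)."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # every anchor name, in document order
        self.links = []    # every href/src value, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        names = set()
        if attributes.get("id"):
            names.add(attributes["id"])
        # <a name="..."> is the legacy way to define an anchor.
        if tag == "a" and attributes.get("name"):
            names.add(attributes["name"])
        # An element carrying the same id and name counts only once.
        self.anchors.extend(sorted(names))
        for key in ("href", "src"):
            if attributes.get(key):
                self.links.append(attributes[key])


def duplicate_anchors(html_text):
    """Return the sorted anchor names defined more than once."""
    collector = AnchorLinkCollector()
    collector.feed(html_text)
    counts = Counter(collector.anchors)
    return sorted(name for name, n in counts.items() if n > 1)
```

       Dereferencing the collected links over the network is deliberately
       left out of this sketch.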

OPTIONS

       This program follows the usual GNU command line syntax, with long
       options starting with two dashes (`--'). A summary of options is
       included below.

       -?, -h, --help
            Show summary of options.

       -V, --version
            Output version information.

       -s, --summary
            Show result summary only.

       -b, --broken
            Show only the broken links, not the redirects.

       -e, --directory
            Hide directory redirects - e.g. <http://www.w3.org/TR> ->
            <http://www.w3.org/TR/>.
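
            As a rough illustration, a directory redirect in the sense of the
            example above can be recognised as follows.  This is a sketch
            only; the function name is hypothetical and the checker's real
            test may differ.

```python
def is_directory_redirect(source, target):
    """True when target is source with one trailing slash appended."""
    return not source.endswith("/") and target == source + "/"
```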

       -r, --recursive
            Check the documents linked from the first one.

       -D, --depth n
            Check the documents linked from the first one to depth n (implies
            --recursive).

       -l, --location uri
            Scope of the documents checked (implies --recursive).  Can be
            specified multiple times in order to specify multiple recursion
            bases.  If the URI of a candidate document is downwards relative
            to any of the bases, it is considered to be within the scope.  If
            not specified, the default is the base URI of the initial
            document, for example for
            <http://www.w3.org/TR/html4/Overview.html> it would be
            <http://www.w3.org/TR/html4/>.
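
            The scope rule can be sketched in Python as below.  The function
            names are hypothetical, and "downwards relative" is assumed here
            to mean a plain string-prefix match against the base URI; the
            checker itself may compare URIs more carefully.

```python
from urllib.parse import urlsplit, urlunsplit


def default_base(document_uri):
    """Drop the last path segment, keeping the trailing slash."""
    parts = urlsplit(document_uri)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def in_scope(candidate_uri, bases):
    """True if candidate_uri lies at or below any of the base URIs."""
    return any(candidate_uri.startswith(base) for base in bases)
```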

       -X, --exclude regexp
            Do not check links whose full, canonical URIs match regexp.  Note
            that this option limits recursion the same way as --exclude-docs
            with the same regular expression would.

       --exclude-docs regexp
            In recursive mode, do not check links in documents whose full,
            canonical URIs match regexp.  This option may be specified
            multiple times.
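
            A minimal sketch of the matching used by --exclude and
            --exclude-docs, assuming the expression may match anywhere in the
            canonical URI.  The function name is hypothetical, and Python's
            "re" syntax differs from Perl's in some details, so treat this as
            an approximation.

```python
import re


def is_excluded(canonical_uri, patterns):
    """True if any pattern matches somewhere in the canonical URI."""
    return any(re.search(pattern, canonical_uri) for pattern in patterns)
```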

       --suppress-redirect URI->URI
            Do not report a redirect from the first to the second URI.  The
            "->" is literal text.  This option may be specified multiple
            times.  Whitespace may be used instead of "->" to separate the
            URIs.
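
            The argument format can be illustrated with a small parser.  This
            is a sketch with a hypothetical function name; it assumes that
            neither URI itself contains "->" or whitespace.

```python
import re


def parse_redirect_pair(argument):
    """Split 'FROM->TO' or 'FROM TO' into a (from_uri, to_uri) tuple."""
    # Try the literal "->" separator first, then fall back to whitespace.
    parts = re.split(r"\s*->\s*|\s+", argument.strip(), maxsplit=1)
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected two URIs, got: {argument!r}")
    return parts[0], parts[1]
```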

       --suppress-redirect-prefix URI->URI
            Do not report a redirect from a child of the first URI to the same
            child of the second URI.  The "->" is literal text.  This option
            may be specified multiple times.  Whitespace may be used instead
            of "->" to separate the URIs.
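
            One way to read "the same child" is that the part of the URI
            following each prefix must be identical.  The sketch below makes
            that assumption; the function and parameter names are
            hypothetical.

```python
def is_suppressed_prefix_redirect(source, target, from_prefix, to_prefix):
    """True if source and target name the same child under the prefixes."""
    if not (source.startswith(from_prefix) and target.startswith(to_prefix)):
        return False
    # Compare what remains after stripping each prefix.
    return source[len(from_prefix):] == target[len(to_prefix):]
```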

       --suppress-temp-redirects
            Do not report warnings about temporary redirects.

       --suppress-broken CODE:URI
            Do not report a broken link with the given CODE.  CODE is the HTTP
            response code, or -1 for robots exclusion.  The ":" is literal
            text.  This option may be specified multiple times.  Whitespace
            may be used instead of ":" to separate the CODE and the URI.
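
            Because the URI itself contains ":", only the first separator
            after the numeric code splits the argument.  A sketch of such a
            parser, with a hypothetical function name:

```python
import re


def parse_suppress_broken(argument):
    """Split 'CODE:URI' or 'CODE URI' into an (int code, uri) tuple."""
    match = re.fullmatch(r"\s*(-?\d+)(?:\s*:\s*|\s+)(\S+)\s*", argument)
    if match is None:
        raise ValueError(f"expected CODE:URI, got: {argument!r}")
    return int(match.group(1)), match.group(2)
```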

       --suppress-fragment URI
            Do not report the given broken fragment URI.  A fragment URI
            contains "#".  This option may be specified multiple times.

       -L, --languages accept-language
            The "Accept-Language" HTTP header to send.  In command line mode,
            this header is not sent by default.  The special value "auto"
            causes a value to be detected from the "LANG" environment
            variable, and sent if found.  In CGI mode, the default is to send
            the value received from the client as is.

       -c, --cookies cookie-file
            Use cookies, loading and saving them in cookie-file.  The special value
            "tmp" causes non-persistent use of cookies, i.e. they are used but
            only stored in memory for the duration of this link checker run.

       -R, --no-referer
            Do not send the "Referer" HTTP header.

       -q, --quiet
            No output if no errors are found.  Implies --summary.

       -v, --verbose
            Verbose mode.

       -i, --indicator
            Show progress while parsing as percentage of lines processed.  No
            indicator is shown for documents containing no linefeeds.

       -u, --user username
            Specify a username for authentication.

       -p, --password password
            Specify a password for authentication.

       --hide-same-realm
            Hide 401 responses that are in the same realm as the document
            checked.

       -S, --sleep secs
            Sleep the specified number of seconds between requests to each
            server.  Defaults to 1 second, which is also the minimum allowed.
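
            The per-server delay can be modelled as below.  This is a sketch
            under stated assumptions: the class name is hypothetical,
            "server" is taken to mean the host and port part of the URI, and
            values below the 1 second minimum are raised to it.  The caller
            supplies the current time, so no real sleeping happens here.

```python
from urllib.parse import urlsplit


class PerHostDelay:
    """Track how long to wait before the next request to each host."""

    def __init__(self, seconds=1.0):
        self.seconds = max(1.0, seconds)  # 1 second is the minimum
        self.last_request = {}            # host -> time of last request

    def wait_time(self, uri, now):
        """Return the delay needed at time now and record the request."""
        host = urlsplit(uri).netloc
        previous = self.last_request.get(host)
        if previous is None:
            delay = 0.0
        else:
            delay = max(0.0, previous + self.seconds - now)
        self.last_request[host] = now + delay
        return delay
```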

       -t, --timeout secs
            Timeout for requests, in seconds.  The default is 30.

       -C, --connection-cache number
            Maximum number of cached connections.  Using this option overrides
            the "Connection_Cache_Size" configuration file parameter, see its
            documentation below for the default value and more information.

       -d, --domain domain
            Perl regular expression describing the domain to which the
            authentication information (if present) will be sent.  The default
            value can be specified in the configuration file.  See the
            "Trusted" entry in the configuration file description below for
            more information.

       --masquerade "real-prefix surrogate-prefix"
            Perform a simple string substitution: URIs which begin with the
            string "real-prefix" are rewritten using the "surrogate-prefix"
            before being dereferenced.  Useful for making a local directory
            masquerade as a remote one. For example:

              --masquerade "http://example.com/x/y/z/ file:///my/local/dir/"

            If the document being checked contains a link to
            http://example.com/x/y/z/foo.html, then the local file system will
            be checked for file:///my/local/dir/foo.html.

            --masquerade takes a single argument consisting of two URIs,
            separated by whitespace.  The quote marks are not part of the
            argument, but one usual way of providing a value with embedded
            whitespace is to enclose it in quotes.
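
            The substitution described above amounts to a prefix rewrite.  A
            minimal sketch, with hypothetical function names:

```python
def parse_masquerade(argument):
    """Split the single option argument into its two URI prefixes."""
    real_prefix, surrogate_prefix = argument.split()
    return real_prefix, surrogate_prefix


def masquerade(uri, real_prefix, surrogate_prefix):
    """Rewrite uri if it begins with real_prefix, else return it as is."""
    if uri.startswith(real_prefix):
        return surrogate_prefix + uri[len(real_prefix):]
    return uri
```

            Applied to the example above, the link
            http://example.com/x/y/z/foo.html is rewritten to
            file:///my/local/dir/foo.html, and URIs outside the real prefix
            are left untouched.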

       -H, --html
            HTML output.

FILES

       /etc/w3c/checklink.conf
            The main configuration file.  You can use the W3C_CHECKLINK_CFG
            environment variable to override the default location.

            "Trusted" specifies a regular expression for matching trusted
            domains (i.e. domains where HTTP basic authentication, if any, will
            be sent).  The regular expression will be matched case
            insensitively against host names.  The default behavior (when
            unset, that is) is to send the authentication information only to
            the host which requests it; usually you don't want to change this.
            For example, the following configures only the w3.org domain as
            trusted:

                Trusted = \.w3\.org$
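
            To see which host names such an expression accepts, the match can
            be approximated in Python.  The function name is hypothetical,
            and Perl and Python regular expressions differ in some details.

```python
import re


def is_trusted(host, trusted_pattern):
    """Case-insensitive search of the Trusted pattern in a host name."""
    return re.search(trusted_pattern, host, re.IGNORECASE) is not None
```

            With the pattern above, "www.w3.org" and "WWW.W3.ORG" match,
            while the bare host "w3.org" does not, because the expression
            requires a dot before "w3".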

            "Allow_Private_IPs" is a boolean flag indicating whether checking
            links on non-public IP addresses is allowed.  The default is true
            in command line mode and false when run as a CGI script.  For
            example, to disallow checking non-public IP addresses, regardless
            of the mode, use:

               Allow_Private_IPs = 0

            "Forbidden_Protocols" is a comma separated list of additional
            protocols/URI schemes that the link checker is not allowed to use.
            The "javascript" and "mailto" schemes are always forbidden, and so
            is the "file" scheme when running as a CGI script.

               Forbidden_Protocols = javascript,mailto

            "Markup_Validator_URI" and "CSS_Validator_URI" are formatted URIs
            to the respective validators.  The %s in these will be replaced
            with the full "URI encoded" URI to the document being checked, and
            shown in the link checker results view in the online/CGI version.
            The defaults are:

               Markup_Validator_URI =
                 http://validator.w3.org/check?uri=%s
               CSS_Validator_URI =
                 http://jigsaw.w3.org/css-validator/validator?uri=%s
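
            The %s substitution can be sketched as follows.  The function
            name is hypothetical, and the sketch assumes that every reserved
            character in the document URI is percent-encoded; the checker's
            exact encoding rules may differ.

```python
from urllib.parse import quote


def validator_link(template, document_uri):
    """Insert the percent-encoded document URI in place of %s."""
    return template.replace("%s", quote(document_uri, safe=""))
```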

            "Doc_URI" is a URI used for linking to the documentation, and CSS
            and JavaScript files in the dynamically generated content of the
            link checker.  The default is:

               Doc_URI = http://validator.w3.org/docs/checklink.html

            "Connection_Cache_Size" is an integer denoting the maximum number
            of connections the link checker will keep open at any given time.
            The default is:

               Connection_Cache_Size = 2

ENVIRONMENT

       checklink uses the libwww-perl library which has a number of
       environment variables affecting its behaviour.  See "SEE ALSO" for some
       pointers.

       W3C_CHECKLINK_CFG
            If set, overrides the path to the configuration file.
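
            The lookup order can be sketched as below.  The function name is
            hypothetical, and the sketch assumes an empty value is treated
            like an unset variable.

```python
import os

DEFAULT_CONFIG = "/etc/w3c/checklink.conf"


def config_path(environ=os.environ):
    """Return the configuration file path, honouring the override."""
    return environ.get("W3C_CHECKLINK_CFG") or DEFAULT_CONFIG
```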

SEE ALSO

       The documentation for this program is available on the web at
       <http://validator.w3.org/docs/checklink.html>.

       LWP, Net::FTP, Net::NNTP, Net::IP, perlre.

AUTHOR

       This program was originally written by Hugo Haas <hugo@w3.org>, based
       on Renaud Bruyeron's checklink.pl.  It has been enhanced by Ville
       Skyttae and many other volunteers since.  Use the
       <www-validator@w3.org> mailing list for feedback, and see
       <http://validator.w3.org/docs/checklink.html#csb> for more information.

       This manual page was originally written by Frederic Schuetz
       <schutz@mathgen.ch> for the Debian GNU/Linux system (but may be used by
       others).

COPYRIGHT

       This program is licensed under the W3C Software License,
       <http://www.w3.org/Consortium/Legal/copyright-software>.