Provided by: htdig_3.2.0b6-18build1_amd64

NAME

       htdig - retrieve HTML documents for ht://Dig search engine

SYNOPSIS

       htdig [options]

DESCRIPTION

       Htdig retrieves HTML documents using the HTTP protocol and gathers information from them that can
       later be used to search those documents. This program can be referred to as the search robot.

OPTIONS

       -      Get the list of URLs to start indexing from standard input. This overrides both the start_url
              parameter specified in the config file and the file supplied to the -m option.

       -a     Use  alternate work files. Tells htdig to append .work to database files, causing a second copy of
              the database to be built. This allows the original  files  to  be  used  by  htsearch  during  the
              indexing run.

       -c configfile
              Use the specified configfile instead of the default.

       -h maxhops
              Restrict the dig to documents that are at most maxhops links away from the starting document. This
              only works if option -i is also given.

       -i     Initial. Do not use any old databases. Old databases will be erased before running the program.

       -m filename
              Minimal run. Only index the URLs given in the file filename, ignoring all others. URLs in the file
              should be formatted one URL per line.

       -s     Print statistics about the dig after completion.

       -t     Create  an  ASCII  version  of  the  document  database. This database is easy to parse with other
              programs so that information can be extracted from it for purposes other than searching. One could
              gather some interesting statistics from this database.

              Fieldname                           Value
                  u       URL
                  t       Title
                  a       State
                          (0 normal, 1 not found, 2 not indexed, 3 obsolete)
                  m       Time of last modification reported by the server
                  s       Document Size in bytes
                  H       Excerpt of the document
                  h       Meta Description
                  l       Time of last retrieval
                  L       Number of links in the document (outgoing links)
                  b       Number of links to the document, also called
                          incoming links or backlinks
                  c       Hop count of this document
                  g       Signature of this document
                          (used to detect duplicates)
                  e       E-Mail address to use for a notification from htnotify
                  n       Date on which such notification is sent
                  S       Subject of the notification message
                  d       Text of incoming links pointing to this document
                          (e.g. <a href="docURL">description</a>)
                  A       Anchors in the document (i.e. <A NAME=...)
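
              The field letters above can be mapped to readable names when post-processing the ASCII
              database. The following Python sketch is illustrative only. It ASSUMES that each record is
              one line consisting of a document ID followed by tab-separated name:value fields; this page
              does not specify the layout, so check it against a real dump first. The names FIELD_NAMES,
              STATES and parse_record are hypothetical.

```python
# Field letters and state codes as listed in the table of the -t option.
FIELD_NAMES = {
    "u": "url", "t": "title", "a": "state", "m": "modified", "s": "size",
    "H": "excerpt", "h": "meta_description", "l": "last_retrieval",
    "L": "outgoing_links", "b": "backlinks", "c": "hop_count",
    "g": "signature", "e": "notify_email", "n": "notify_date",
    "S": "notify_subject", "d": "link_descriptions", "A": "anchors",
}
STATES = {0: "normal", 1: "not found", 2: "not indexed", 3: "obsolete"}


def parse_record(line):
    """Parse one dump line.

    ASSUMED layout (not taken from the manual page):
    DocID <TAB> u:URL <TAB> t:Title <TAB> a:State ...
    """
    doc_id, *fields = line.rstrip("\n").split("\t")
    record = {"doc_id": int(doc_id)}
    for field in fields:
        # Split on the first colon only, so URLs keep their own colons.
        key, sep, value = field.partition(":")
        if not sep:
            continue  # skip anything that is not name:value
        record[FIELD_NAMES.get(key, key)] = value
    return record


# Made-up sample record for illustration.
sample = "7\tu:http://www.example.com/\tt:Example\ta:0\ts:1234\tc:1"
rec = parse_record(sample)
print(rec["url"], STATES[int(rec["state"])], rec["size"])
```

              Unknown field letters are kept under their original one-letter key, so the sketch does not
              silently drop data if the dump contains fields not listed in the table.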

       -u username:password
              Tells htdig to send the supplied username and password with each HTTP request. The credentials
              are encoded using the "Basic" authentication method. A colon (:) is required between the
              username and password.
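
              For reference, the Basic method of HTTP authentication sends the string username:password
              base64-encoded in an Authorization header. The Python sketch below shows that encoding in
              general terms; it is not htdig's own code, and the function name basic_auth_header is
              hypothetical.

```python
import base64


def basic_auth_header(credentials):
    """Return the Authorization header value for a 'username:password' string."""
    username, sep, password = credentials.partition(":")
    if not sep:
        # Mirrors the requirement that a colon separates the two parts.
        raise ValueError("expected username:password with a colon separator")
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


# Example credentials taken from the HTTP authentication RFCs.
print(basic_auth_header("Aladdin:open sesame"))
# prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

              Because base64 is an encoding and not encryption, credentials passed this way are readable
              by anyone who can observe the request unless the connection itself is protected.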

       -v     Verbose mode. Each additional -v increases the verbosity of the program. Using more than two is
              probably only useful for debugging. The default verbose mode (a single -v) gives a progress report
              while digging; its format is described in the section below.
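
       The options above can be combined on one command line. The following Python sketch assembles such a
       command and checks two constraints stated in the option descriptions: -h only works together with -i,
       and -u needs a colon between username and password. The helper build_htdig_command and its parameter
       names are hypothetical; only the option letters come from this page.

```python
import shlex


def build_htdig_command(config=None, initial=False, alternate=False,
                        max_hops=None, credentials=None, verbosity=0,
                        stats=False):
    """Return an argv list for htdig built from the documented options."""
    if max_hops is not None and not initial:
        raise ValueError("-h only works if -i is also given")
    if credentials is not None and ":" not in credentials:
        raise ValueError("-u expects username:password")

    argv = ["htdig"]
    if initial:
        argv.append("-i")              # start from scratch, erase old databases
    if alternate:
        argv.append("-a")              # build .work copies of the databases
    if config:
        argv += ["-c", config]
    if max_hops is not None:
        argv += ["-h", str(max_hops)]
    if credentials:
        argv += ["-u", credentials]
    if verbosity:
        argv.append("-" + "v" * verbosity)   # e.g. -vvv
    if stats:
        argv.append("-s")
    return argv


cmd = build_htdig_command(config="/etc/htdig/htdig.conf", initial=True,
                          max_hops=2, verbosity=1, stats=True)
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: htdig -i -c /etc/htdig/htdig.conf -h 2 -v -s
```

       The returned list can be passed to subprocess.run() on a system where htdig is installed; the sketch
       itself only prints the command and does not execute it.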

       FORMAT OF THE PROGRESS REPORT GIVEN IN VERBOSE MODE
              A  line  is  shown for each URL, with 3 numbers before the URL and some symbols after the URL. The
              first number is the number of documents parsed so far, the second is the DocID for this  document,
              and  the  third  is  the  hop  count  of  the  document  (number of hops from one of the start_url
              documents). The symbols printed after the URL have the following meanings:

              "*" is printed for a link already visited

              "+" is printed for a new link just queued

              "-" is output for a link rejected for any of a number of reasons. To find out what  those  reasons
              are, you need to run htdig with at least 3 -v options, i.e. -vvv.

       If there are no "*", "+" or "-" symbols after the URL, it does not mean that the document was not parsed
       or was empty, only that no links to other documents were found within it. With more verbose output, these
       symbols are interspersed among several lines of debugging output.
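
       A progress report of this shape can be summarized with a short script. The Python sketch below ASSUMES
       that the three numbers and the URL are separated by colons and that the symbols follow a colon after the
       URL; the exact separators are not specified on this page, so the regular expression may need adjusting
       against real output. The sample lines are made up, and parse_progress_line is a hypothetical name.

```python
import re
from collections import Counter

# ASSUMED line shape: parsed:docid:hops:URL: symbols
LINE_RE = re.compile(r"^(\d+):(\d+):(\d+):(\S+):\s*([*+-]*)")
SYMBOLS = {"*": "already_visited", "+": "newly_queued", "-": "rejected"}


def parse_progress_line(line):
    """Return a dict for one progress line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    parsed, doc_id, hops, url, symbols = match.groups()
    counts = Counter(SYMBOLS[ch] for ch in symbols)
    return {
        "documents_parsed": int(parsed),
        "doc_id": int(doc_id),
        "hop_count": int(hops),
        "url": url,
        "links": dict(counts),
    }


result = parse_progress_line("3:5:1:http://www.example.com/docs/: ++*-+")
print(result["url"], result["hop_count"], result["links"])

# A document with no symbols was still parsed; it just contained no links.
leaf = parse_progress_line("4:6:2:http://www.example.com/leaf.html: ")
print(leaf["links"])
```

       Lines that do not match, such as the extra debugging output produced at higher verbosity, return None
       and can simply be skipped by the caller.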

FILES

       /etc/htdig/htdig.conf
              The default configuration file.

SEE ALSO

       Please refer to the HTML pages (in the htdig-doc package) /usr/share/doc/htdig-doc/html/index.html and
       the manual pages htdigconfig(8), htmerge(1), htnotify(1), htsearch(1) and rundig(1) for a detailed
       description of ht://Dig and its commands.

AUTHOR

       This  manual  page  was  written  by  Christian  Schwarz,  modified by Stijn de Bekker. It is updated and
       maintained by Robert Ribnitz and based on the HTML documentation of ht://Dig.

                                                  21 July 1997                                          htdig(1)