Provided by: visitors_0.7-9_amd64

NAME

       visitors - a fast web server log analyzer

SYNOPSIS

       visitors [options] <filename> [<filename> ...]

DESCRIPTION

       Visitors generates access statistics from specified web log files.

       The resulting reports contain the following information and statistics:

       · Requested pages

       · Requested images

       · Referers by number of visits and age

       · Unique visitors in each day

       · Page views per visit

       · Pages accessed by the Google crawler (and the date of Google's last access to each
         page)

       · Pages accessed by the AdSense crawler (and the date of AdSense's last access to each
         page)

       · Percentage of visits originated from Google searches for every day

       · User navigation patterns (web trails)

       · Keyphrases used in Google searches

       · Human languages used in Google searches

       · User agents

       · Weekday and hour distributions of accesses

       · Weekdays/Hours combined bidimensional map

       · Month/Day combined bidimensional map

       · Visual path analysis with Graphviz

       · Operating systems, browsers and domains popularity

       · Visitors' screen resolution and color depth

       · 404 errors

       The web log files don't need to follow a strict format, with these exceptions: the date
       MUST be enclosed between [ and ] characters, the client hostname MUST be the first field
       of the line, and referers and requests MUST be enclosed in double quotes. Apache log
       files work out of the box.
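       The Python sketch below shows one way to extract the fields these rules require. It is
       an illustration under stated assumptions, not Visitors' own parser: it assumes the
       request is the first double-quoted field and the referer, when present, is the second
       one, as in the Apache combined format. The names LINE_RE and parse_line are
       hypothetical.

```python
import re

# Hypothetical sketch of the minimal format rules; not Visitors' actual parser.
# Assumes the Apache combined layout: the request is the first quoted field
# and the referer (if any) is the second one.
LINE_RE = re.compile(
    r'^(?P<host>\S+)'                   # client hostname: first field of the line
    r'[^\[]*\[(?P<date>[^\]]+)\]'       # date: enclosed in [ and ]
    r'[^"]*"(?P<request>[^"]*)"'        # request: first double-quoted field
    r'(?:[^"]*"(?P<referer>[^"]*)")?'   # referer: next double-quoted field, optional
)


def parse_line(line):
    """Return a dict with host, date, request and referer, or None if invalid."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


sample = ('1.2.3.4 - - [10/Oct/2004:13:55:36 +0200] "GET /index.html HTTP/1.0" '
          '200 2326 "http://www.hping.org/" "Mozilla/4.0"')
print(parse_line(sample)["referer"])  # prints http://www.hping.org/
```

       A line without a bracketed date, such as "no date here", makes parse_line return None.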

       Visitors can also process IIS log files once they are converted with the iis2apache.pl
       utility distributed with Visitors (the same utility found at
       http://www.jammed.com/~jwa/hacks/, distributed under the GPL license).

       Note that a file name can be a single - character to read from standard input.

   Available options:
       -A --all
               Activate all the optional reports. This option is equivalent to  -GKUWRDOB.   Note
               that  --trails  is not implicitly included in this option because it also requires
               --prefix.  See the --trails option documentation for details.

       -T --trails
               Enable the Web Trails feature. The report shows the most frequent moves between
               pages of your site. This option requires the --prefix option to work.

       -G --google
               Activate two reports about pages accessed by the Google and AdSense web crawlers.
               Pages are ordered according to the last time the Google web crawler requested
               them: the most recently accessed page is shown first.

       -K --google-keyphrases
               Activate a report that shows common search keyphrases used to find your web site
               from Google.

       -Z --google-keyphrases-age
               Activate a report that shows the latest keyphrases used to find your site from
               Google.

       -H --google-human-language
               Activate a report that shows common human languages used to search from Google.
               This feature uses the 'hl' variable of the Google referer URL.
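               As an illustration of the -K and -H reports, the Python sketch below extracts the
               search keyphrase (the q variable) and the 'hl' variable from a Google referer URL,
               then counts the most common keyphrases. Treating any host containing "google." as
               a Google search page, and the function names, are assumptions of this sketch.
               They do not describe Visitors' code.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def google_search_info(referer):
    """Return (keyphrase, language) for a Google search referer, else None."""
    parts = urlsplit(referer)
    host = parts.hostname or ""
    # Assumption: any host containing "google." is a Google search page.
    if "google." not in host:
        return None
    params = parse_qs(parts.query)          # decodes %XX escapes and '+' as space
    keyphrase = params.get("q", [None])[0]
    if keyphrase is None:
        return None
    language = params.get("hl", [None])[0]  # None when 'hl' is absent
    return keyphrase, language


def top_keyphrases(referers, max_lines=10):
    """Count keyphrases over many referers, most common first."""
    counts = Counter()
    for referer in referers:
        info = google_search_info(referer)
        if info is not None:
            counts[info[0]] += 1
    return counts.most_common(max_lines)
```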

       -U --user-agents
               Show information about common user agents.

       -W --weekday-hour-map
               Activate the generation of a combined weekdays/hours bidimensional map that shows
               traffic in each of the 168 hours of a 7-day week. Brighter colors mean higher
               traffic. This is ideal for figuring out the best time of the week for a
               maintenance downtime, who the site's audience is, whether people access it from
               work or from home, and so on. The map is generated as pure HTML inside the
               report.
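               The 168 cells are simply 7 weekdays times 24 hours. The following minimal Python
               sketch shows the bucketing. It assumes timestamps are already parsed into datetime
               objects and that cell brightness scales linearly with the count. Visitors' exact
               color scale is not documented here.

```python
def weekday_hour_map(timestamps):
    """Count accesses in a 7 x 24 grid: grid[weekday][hour], Monday = 0."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


def brightness(grid):
    """Scale counts linearly to 0..255 so the busiest hour is the brightest cell."""
    peak = max(max(row) for row in grid) or 1   # avoid dividing by zero on empty logs
    return [[round(255 * count / peak) for count in row] for row in grid]
```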

       -M --month-day-map
               Activate the generation of a combined month/day bidimensional map that shows
               traffic in each of the 365 days of the year. Brighter colors mean higher traffic.
               This makes it easy to spot traffic trends, and days with particularly high or low
               traffic, at a glance. The map is generated as pure HTML inside the report.

       -R --referers-age
               Shows referers ordered by age. The 'age' of a referer is the date it first
               appeared. In the report, newer referers are on top. This report is useful to
               check for new external links.
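               The ordering can be sketched in Python as follows. It keeps the earliest date on
               which each referer was seen, then sorts newest first. The (day, referer) input
               shape and the function name are assumptions made for illustration.

```python
def referers_by_age(entries):
    """entries: (day, referer) pairs in any order.

    Returns (referer, first_seen) pairs, newest referers first.
    """
    first_seen = {}
    for day, referer in entries:
        # Keep the earliest date on which each referer appeared.
        if referer not in first_seen or day < first_seen[referer]:
            first_seen[referer] = day
    return sorted(first_seen.items(), key=lambda item: item[1], reverse=True)
```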

       -D --domains
               Activate the generation of information about Top Level Domains popularity. This
               information may be useful to estimate the amount of visits from different
               countries. Note that Visitors does not resolve numerical IP addresses itself: IP
               addresses not already resolved in the log file are all shown in this report under
               the entry Unresolved IP.
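               A Python sketch of this classification follows. A numeric address goes under
               Unresolved IP, and anything else is counted under its last dot-separated label.
               Lower-casing the label and the function names are assumptions of the sketch.

```python
import ipaddress
from collections import Counter


def tld_of(host):
    """Return the top level domain of a client host, or 'Unresolved IP'."""
    try:
        ipaddress.ip_address(host)       # succeeds only for numeric addresses
        return "Unresolved IP"
    except ValueError:
        return host.rstrip(".").rsplit(".", 1)[-1].lower()


def domains_report(hosts):
    """Count visits per top level domain, most popular first."""
    return Counter(tld_of(h) for h in hosts).most_common()
```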

       -O --operating-systems
               Activate the report about  Operating  Systems  popularity,  sorted  by  number  of
               accesses. All the common operating systems are listed in the report, while unknown
               operating systems will be summed in the unknown entry.

       -B --browsers
               Activate the report about Browsers popularity, sorted by number of  accesses.  All
               the  common  browsers  are  listed  in  the report, while unknown browsers will be
               summed in the unknown entry. Browsers are listed by family (for  example  Internet
               Explorer, Opera, and so on), and not by specific version.

       -X --error404
               Activate the generation of the missing documents (404 error) report. This report
               shows files that were requested but are missing, ordered by number of requests.
               The report is useful for discovering files missing from the web site by mistake,
               but it often also shows bizarre requests made by users, internet worms and
               security scanners.
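               In Python terms, the report is a count of 404 requests per path, most requested
               first and optionally truncated the way -x or -m limits the number of entries.
               The (path, status) input shape and the function name are illustrative
               assumptions.

```python
from collections import Counter


def error404_report(requests, max_lines=None):
    """requests: (path, status) pairs. Returns [(path, count)] for 404s only.

    max_lines plays the role of -x / -m; None means no limit.
    """
    counts = Counter(path for path, status in requests if status == 404)
    return counts.most_common(max_lines)
```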

       -Y --pageviews
               Activate the generation of a report that shows an approximation of the percentage
               of pages viewed per unique visit. The goal of this report is to understand the
               usage pattern of the site and the level of interest of the visitors. For example,
               on a site that provides many pages with interesting content, visitors who view a
               single page per visit were probably searching for something else.
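               This page does not say exactly how Visitors identifies a visit. The following
               Python sketch therefore assumes that a visit is one (host, user agent, day)
               combination. Treat that key as a hypothetical choice. The sketch counts page
               views per visit, then turns the histogram into percentages of visits.

```python
from collections import Counter


def pageviews_distribution(page_views):
    """page_views: one (host, user_agent, day) tuple per page view.

    Returns {pages_per_visit: percentage_of_visits}, sorted by pages_per_visit.
    """
    views_per_visit = Counter(page_views)             # visit key -> page views
    histogram = Counter(views_per_visit.values())      # page views -> visits
    total = len(views_per_visit)
    if total == 0:
        return {}
    return {n: 100.0 * visits / total for n, visits in sorted(histogram.items())}
```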

       -S --robots
               Activate the generation of a report that shows user agents of clients requesting
               the file robots.txt, with the exception of MSIE Crawler requests. The result is a
               list of web robots and spiders that accessed your web site, ordered by number of
               requests of robots.txt.

       --screen-info
               Activate the screen resolution and color depth reports. Note that for this report
               to work you have to insert into your HTML pages the JavaScript code found in the
               README file in the Visitors tarball.

       --stream
               Enable Stream Mode (see the STREAM MODE DETAILS section for more information). In
               short: in stream mode Visitors processes all the log files specified (possibly
               none, which is valid in this mode) as usual, producing the report. Then it enters
               stream mode and reads a continuous stream of web logs from standard input,
               updating the statistics incrementally as new data becomes available. A new report
               is produced periodically if new data arrived, according to the --update-every
               option (the default is to update the statistics every ten minutes). Visitors can
               also reset the statistics after some period of time using the --reset-every
               option. This makes it possible to get a snapshot of what happened in the last five
               minutes, hour, day or week. Note that --stream requires --output-file, because
               Visitors needs to overwrite the report on every update and so cannot write to
               standard output as usual. If you plan to use stream mode, also check the --tail
               option.

       --update-every seconds
               By  default  in  Stream  Mode statistics are updated every 10 minutes. This option
               specifies a different period in seconds.

       --reset-every seconds
               By default in Stream Mode statistics are never  reset,  but  continuously  updated
               incrementally. This option specifies to reset statistics after the given amount of
               time in seconds. This is useful to have a snapshot of the web site usage.
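               The interplay of --update-every and --reset-every can be modeled as a small timer
               in Python. This is a hypothetical model, not Visitors' implementation. In
               particular, when both actions are due on the same tick, this sketch writes the
               report before resetting. That order is an assumption, chosen so that the last
               report still covers the whole period.

```python
class StreamTimer:
    """Hypothetical model of when to rewrite the report and reset the statistics."""

    def __init__(self, update_every=600, reset_every=None, start=0.0):
        self.update_every = update_every    # seconds; 600 mirrors the 10-minute default
        self.reset_every = reset_every      # None: never reset (the default)
        self.last_update = start
        self.last_reset = start
        self.dirty = False                  # new log data since the last report?

    def on_data(self):
        self.dirty = True

    def tick(self, now):
        """Return the actions due at time `now`, in the order to perform them."""
        actions = []
        if self.dirty and now - self.last_update >= self.update_every:
            actions.append("report")        # only if new data arrived
            self.last_update = now
            self.dirty = False
        if self.reset_every is not None and now - self.last_reset >= self.reset_every:
            actions.append("reset")
            self.last_reset = now
        return actions
```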

       -f --output-file file
               Write output to file instead of stdout.

       -m --max-lines number
               Set the max number of entries that should be shown in reports like referers,
               keyphrases and so on. This option sets the max number of entries for all the
               reports at once.

       -r --max-referers number
               Set the max number of entries in the referer report.

       -p --max-pages number
               Set the max number of entries in the accessed pages report.

       -i --max-images number
               Set the max number of entries in the accessed images report.

       -x --max-error404 number
               Set the max number of entries in the missing documents report.

       -u --max-useragents number
               Set the max number of entries in the user agents report.

       -t --max-trails number
               Set the max number of entries in the web trails report.

       -g --max-googled number
               Set the max number of entries in the crawled pages report (google bot).

           --max-adsensed number
               Set the max number of entries in the crawled pages report (adsense bot).

       -k --max-google-keyphrases number
               Set the max number of entries in the Google keyphrases report.

       -a --max-referers-age number
               Set the max number of entries in the referers by date report.

       -d --max-domains number
               Set the max number of entries in the domains report.

       -P --prefix string
               Prefixes tell Visitors what a link must look like to be classified as internal to
               your site. This option is required for --trails, and it also has the nice effect
               of keeping internal links out of the referers report. If you are analyzing
               statistics for http://www.your.site.com/, just use: --prefix
               http://www.your.site.com

               If your site is reachable under more than one hostname, specify all of them, as in
               the following example:
               --prefix http://www.your.site.com --prefix http://your.site.com
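               The following Python sketch shows how prefixes could turn referers into internal
               page-to-page moves for the web trails report. Plain string-prefix matching and the
               helper names are assumptions. A referer that matches no prefix is treated as
               external and produces no move.

```python
from collections import Counter


def strip_prefix(url, prefixes):
    """Return the site-relative path if url starts with a prefix, else None."""
    for prefix in prefixes:
        if url.startswith(prefix):
            return url[len(prefix):] or "/"
    return None


def web_trails(requests, prefixes):
    """requests: (referer, page) pairs. Counts internal moves, most frequent first."""
    moves = Counter()
    for referer, page in requests:
        src = strip_prefix(referer, prefixes)
        if src is not None:             # internal referer: a move within the site
            moves[(src, page)] += 1
    return moves.most_common()
```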

       -o --output html|text
               Output module. You can use text or html. The default is html.

       -V --graphviz
               This  option  enables  the  Graphviz  mode: Visitors will analyze the log file and
               create a graph describing the access patterns of your web  site.  The  information
               used to create the graph is the same as the web trails report (that you can enable
               with --trails), but as a graph it can be more readable for non trivial  sites.  An
               example on how to use this feature:

               % visitors access.log --prefix http://www.hping.org \
                 --graphviz > graph.dot

               % dot graph.dot -Tpng > graph.png

               On Debian systems, the dot command is included in the graphviz package. The
               generated graph has edges of different colors, ranging from blue to red to
               indicate low to high popularity of a given move from one page of the web site to
               another. This option requires one or more --prefix options in order to work, just
               like the --trails option.

       --graphviz-ignorenode-google
               Don't put the Google node on the generated graph. Only useful with --graphviz.

       --graphviz-ignorenode-external
               Don't put the external referer node on the generated graph. Only useful with
               --graphviz.

       --graphviz-ignorenode-noreferer
               Don't put the node indicating requests without referer on the generated graph.
               Only useful with --graphviz.

       --tail  When this option is specified Visitors emulates the Unix command tail -f
               --max-unchanged-stats=1 -q. You can specify the names of the log files to monitor
               for changes; once new data is appended to any of the specified files, Visitors
               outputs the new data to standard output. This option is useful in conjunction with
               Stream Mode (--stream). Files can be log-rotated, because in Tail Mode Visitors
               always tries to reopen the file to check for changes.
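               The reopen-on-every-check idea can be sketched in Python as a poller that
               remembers its byte offset and the file's inode. It restarts from the beginning
               when the inode changes (rotation) or the file shrinks (truncation). The class
               name, the inode check and the rule of returning only complete lines are
               assumptions of this sketch, not details of Visitors.

```python
import os


class Tailer:
    """Hypothetical poller: returns lines appended since the last poll."""

    def __init__(self, path):
        self.path = path
        self.inode = None
        self.offset = 0
        try:
            st = os.stat(path)
            self.inode, self.offset = st.st_ino, st.st_size  # start at the end, like tail -f
        except FileNotFoundError:
            pass

    def poll(self):
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return []                   # rotated away and not yet recreated
        if st.st_ino != self.inode or st.st_size < self.offset:
            self.inode, self.offset = st.st_ino, 0  # new or truncated file: read from start
        if st.st_size <= self.offset:
            return []
        with open(self.path, "rb") as f:  # reopen the file on every poll
            f.seek(self.offset)
            data = f.read()
        end = data.rfind(b"\n") + 1       # consume only complete lines
        self.offset += end
        return [line.decode("utf-8", "replace")
                for line in data[:end].splitlines(keepends=True)]
```

               A caller would loop forever, writing each returned line to standard output and
               sleeping briefly between polls.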

       --time-delta delta
               If your web server is in a different timezone than most of your visitors or
               yourself, you will notice a shift in the reports regarding time and days of the
               week. By default, Visitors generates output using the host's local time. You can
               use the --time-delta option to adjust the output. Positive values shift times
               forward (toward the future) by the given number of hours, negative values shift
               them backward (toward the past). In the future this option may support specifying
               the output timezone directly.
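               In Python terms the shift is just an addition of whole hours. Moving 23:30 on a
               Sunday forward by 2 hours lands at 01:30 on Monday, which is how both the hour and
               the weekday reports change. The function names are hypothetical. The sketch also
               assumes English month abbreviations in the bracketed log date, and it keeps the
               log's own offset untouched.

```python
from datetime import datetime, timedelta


def parse_log_date(text):
    """Parse a bracketed Apache date such as '10/Oct/2004:23:30:00 +0200'."""
    # %b expects English month abbreviations, as in the default C locale.
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")


def apply_time_delta(ts, delta_hours):
    """Shift a timestamp: positive hours toward the future, negative toward the past."""
    return ts + timedelta(hours=delta_hours)
```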

       --filter-spam
               Filter referer spam using a keyword-based filter (see blacklist.h for more
               information on keywords). If you don't know what referer spam is, check this
               Wikipedia page: http://en.wikipedia.org/wiki/Referer_spam

       --ignore-404
               When this option is turned on, log lines with 404 errors are used only to
               generate the 404 errors report, and are ignored by all the other reports.

       --grep pattern
               Process only log lines matching the specified pattern. Patterns are matched using
               glob-style matching (the kind used by the Unix shell):

               *         Matches any sequence of characters in string, including a null string.

               ?         Matches any single character in string.

               [chars]   Matches any character in the set given by chars.  If a sequence  of  the
                         form  x-y  appears  in  chars,  then  any  character  between  x  and y,
                         inclusive, will match.

               \x        Matches the single character x.  This provides a  way  of  avoiding  the
                         special interpretation of the characters *?[]\ in pattern.
               By default matching is performed in a case sensitive way, but case insensitive
               matching may be forced by prefixing the pattern with the string cs:, so for
               example the pattern cs:firefox will match all the log lines containing the string
               firefox, FireFox, FIREFOX and so on.
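               The glob rules listed above (*, ?, [chars] with x-y ranges, and \x) can be
               sketched as a small backtracking matcher in Python. This sketch matches the whole
               string against the pattern. It does not model how Visitors anchors a pattern
               against a log line or how the cs: prefix is parsed. Case folding is a plain
               nocase parameter instead, and wrapping a pattern in * gives a "contains" test.
               All names are hypothetical.

```python
def glob_match(pattern, text, nocase=False):
    """Match the whole text against a glob pattern (*, ?, [chars], \\x)."""
    if nocase:
        pattern, text = pattern.lower(), text.lower()
    return _match(pattern, 0, text, 0)


def _in_set(chars, ch):
    """True if ch is in a [chars] set; 'x-y' denotes an inclusive range."""
    i = 0
    while i < len(chars):
        if i + 2 < len(chars) and chars[i + 1] == "-":
            lo, hi = sorted((chars[i], chars[i + 2]))
            if lo <= ch <= hi:
                return True
            i += 3
        else:
            if chars[i] == ch:
                return True
            i += 1
    return False


def _match(p, pi, s, si):
    while pi < len(p):
        c = p[pi]
        if c == "*":
            while pi < len(p) and p[pi] == "*":  # collapse runs of stars
                pi += 1
            if pi == len(p):
                return True                      # trailing * matches the rest
            # Try every possible amount of text for the star to consume.
            return any(_match(p, pi, s, k) for k in range(si, len(s) + 1))
        if si == len(s):
            return False                         # pattern left, text exhausted
        if c == "?":
            pi, si = pi + 1, si + 1
            continue
        if c == "[":
            end = p.find("]", pi + 1)
            if end == -1:
                return False                     # unterminated set: assumed no match
            if not _in_set(p[pi + 1:end], s[si]):
                return False
            pi, si = end + 1, si + 1
            continue
        if c == "\\" and pi + 1 < len(p):
            pi += 1                              # \x: take the next character literally
            c = p[pi]
        if c != s[si]:
            return False
        pi, si = pi + 1, si + 1
    return si == len(s)
```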

       --exclude pattern
               Works exactly like --grep, but only lines NOT matching the specified pattern are
               processed. Note that --grep and --exclude can be used multiple times, and are
               processed sequentially. For example, visitors --grep firefox --exclude download
               will process only lines that include the string firefox but not the string
               download.
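               The sequential processing of --grep and --exclude can be sketched in Python as a
               chain of filters applied in command-line order. As a stand-in matcher, this sketch
               uses Python's fnmatch.fnmatchcase, which supports *, ? and [chars] but not the \x
               escape. The patterns in the example are wrapped in * so that they test for a
               substring. Both choices are assumptions, as is the function name.

```python
from fnmatch import fnmatchcase


def line_selected(line, rules):
    """rules: ('grep' | 'exclude', pattern) pairs, applied in command-line order."""
    for kind, pattern in rules:
        matched = fnmatchcase(line, pattern)   # stand-in glob matcher, no \x escape
        if kind == "grep" and not matched:
            return False        # --grep: drop lines that do not match
        if kind == "exclude" and matched:
            return False        # --exclude: drop lines that do match
    return True
```

               For example, rules [("grep", "*firefox*"), ("exclude", "*download*")] keep a line
               requesting /index.html from firefox and drop one requesting /download.html.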

       --debug Show additional information on errors. For example, invalid lines are printed on
               standard error when found. Mainly useful for developers and error reporting.

       -h --help
               Show usage and copyright information.

       -v --version
               Show program version.

EXAMPLES

       The simplest usage, interactively checking a web log (for example over ssh on your web
       server), is just:

       % visitors -o text access.log | less

       That will produce human-readable, text-only output. To generate HTML web stats with much
       more information you may instead use this:

       % visitors -A -m 30 access.log -o html > report.html

       If you want information on the usage patterns of your site you must provide the URL
       prefix of your web site, and specify the --trails option. The next example produces an
       HTML report with usage patterns information.

       % visitors -A -m 30 access.log --trails \
         --prefix http://www.hping.org > report.html

       Note  that  it's  ok  to  specify  multiple  file names, or to provide the input using the
       standard input like in the following two examples:

       % visitors /var/log/apache/access.log.*
       % zcat access.log.*.gz | visitors -

STREAM MODE DETAILS

       The usual way to run Visitors is to specify some options to control the report
       generation, and the names of the log files. For example, to generate a report from two
       Apache access log files you can write:

       % visitors -A access.log.1 access.log.2 > report.html

       Visitors will analyze the log files, and will output the report.  Sometimes it can be more
       interesting  to have web statistics updated continuously, almost in real time, as new data
       is available. In order to provide this feature Visitors implements a  mode  called  Stream
       Mode  that  reads  a  stream  of logs from the standard input.  The following command line
       shows how to use it (but check the --stream option documentation for more information).

       % tail -f /var/log/apache/access.log | \
         visitors --stream -A --update-every 60 \
         --output-file /tmp/report.html

       Visitors will incrementally update the statistics as new logs become available and will
       update the HTML report every 60 seconds. As you can see, in this mode the report file name
       must be specified using the --output-file option, because Visitors needs to overwrite the
       report to update it. Note that instead of the tail command in the above example, you can
       use Visitors itself in Tail Mode (an emulation of the tail program):

       % visitors --tail /var/log/apache/access.log | \
         visitors --stream -A --update-every 60 \
         --output-file /tmp/report.html

       It's possible to generate real time statistics about the last N seconds of web traffic,
       where N is configurable and can range from a few seconds to one week or more, using the
       --reset-every option. The following example generates statistics updated every 30
       seconds about the last hour of traffic:

       % visitors --tail /var/log/apache/access.log | \
         visitors --stream -A --update-every 30 --reset-every 3600 \
         --output-file /tmp/report.html

AUTHORS

       Visitors was written by Salvatore Sanfilippo <antirez@invece.org>.

COPYING

       Copyright (C) 2004,2005 Salvatore Sanfilippo <antirez@invece.org>.

       Visitors is distributed under the GNU General Public License.

       This manual page was written (based on the original HTML documentation) by Romain
       Francoise <rfrancoise@debian.org> for the Debian GNU/Linux system, but may be used by
       others. Salvatore Sanfilippo has updated this man page since Visitors 0.5, and it is now
       part of the Visitors tarball.