Provided by: visitors_0.7-9_amd64

NAME

       visitors - a fast web server log analyzer

SYNOPSIS

       visitors [options] <filename> [<filename> ...]

DESCRIPTION

       Visitors generates access statistics from specified web log files.

       The resulting reports contain a wide range of useful information and statistics:

       • Requested pages

       • Requested images

       • Referers by number of visits and age

       • Unique visitors in each day

       • Page views per visit

       • Pages accessed by the Google crawler (and the date of Google's last access to every page)

       • Pages accessed by the AdSense crawler (and the date of AdSense's last access to every page)

       • Percentage of visits originated from Google searches for every day

       • User navigation patterns (web trails)

       • Keyphrases used in Google searches

       • Human languages used in Google searches

       • User agents

       • Weekday and hour distribution of accesses

       • Weekdays/Hours combined bidimensional map

       • Month/Day combined bidimensional map

       • Visual path analysis with Graphviz

       • Operating systems, browsers and domains popularity

       • Visitors' screen resolution and color depth

       • 404 errors

       The web log files don't need to follow a strict format, except that the date MUST be enclosed between [
       and ] characters, the client hostname MUST be the first field of each log line, and referers and requests
       MUST be enclosed in double quotes. Apache log files work out of the box.

       It's possible to use Visitors with IIS log files by converting them with the iis2apache.pl utility
       distributed with Visitors (the utility is the same one you can find at http://www.jammed.com/~jwa/hacks/
       and is distributed under the GPL license).

       Note that a filename of - (a single dash) makes Visitors read the log from standard input.

   Available options:
       -A --all
               Activate all the optional reports. This option is equivalent to -GKUWRDOB.  Note that --trails is
               not implicitly included in this option because it  also  requires  --prefix.   See  the  --trails
               option documentation for details.

       -T --trails
               Enable the Web Trails feature. The report will show the most frequent moves between pages of your
               site. This option requires the --prefix option to work.

       -G --google
               Activate two reports about pages accessed by the Google and AdSense web crawlers. Pages are shown
               ordered according to the last time the Google web crawler requested the page. The first page shown
               is the one accessed most recently.

       -K --google-keyphrases
               Activate a report that shows the common search keyphrases used to find your web site on Google.

       -Z --google-keyphrases-age
               Activate a report that shows the latest keyphrases used to find your site on Google.

       -H --google-human-language
               Activate a report that shows the common human languages used to search on Google. This feature
               uses the 'hl' variable of the Google referer URL.

       -U --user-agents
               Show information about common user agents.

       -W --weekday-hour-map
               Activate the generation of a combined weekdays/hours bidimensional map that shows traffic in each
               of the 168 hours of a 7-day week. Brighter colors mean higher traffic. This is ideal for figuring
               out the best time of the week for a maintenance downtime, who the site's audience is, whether people
               are accessing it from work or from home, and so on. The map is generated as pure HTML inside the
               report.
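               The map is essentially a 7 x 24 histogram. A minimal Python sketch, assuming timestamps are already
               parsed into datetime objects; the linear brightness scale is a hypothetical choice, not necessarily
               the color scheme Visitors uses.

```python
from datetime import datetime
from typing import Iterable, List

def weekday_hour_map(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Count accesses in a 7 x 24 grid indexed as grid[weekday][hour], Monday = 0."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid

def brightness(grid: List[List[int]]) -> List[List[int]]:
    """Hypothetical linear scale: the busiest cell gets 255, an empty cell gets 0."""
    peak = max(max(row) for row in grid) or 1
    return [[round(255 * cell / peak) for cell in row] for row in grid]
```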

       -M --month-day-map
               Activate the generation of a combined month/day bidimensional map that shows traffic in each of
               the 365 days of the year. Brighter colors mean higher traffic. This is useful for spotting, at a
               glance, traffic trends and days with particularly high or low traffic. The map is generated as pure
               HTML inside the report.

       -R --referers-age
               Shows referers ordered by age. The 'age' of a referer is the date on which it first appeared. In
               the report, newer referers are on top. This report is useful for spotting new external links.
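               A minimal Python sketch of the ordering described above, assuming each log line has already been
               reduced to a (day, referer) pair; referers_by_age is a hypothetical helper, not Visitors code.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

def referers_by_age(entries: Iterable[Tuple[date, str]]) -> List[Tuple[str, date]]:
    """Return (referer, first-seen day) pairs, newest referers first."""
    first_seen: Dict[str, date] = {}
    for day, referer in entries:
        # The age of a referer is the earliest day it appears, whatever order lines arrive in.
        if referer not in first_seen or day < first_seen[referer]:
            first_seen[referer] = day
    return sorted(first_seen.items(), key=lambda item: item[1], reverse=True)
```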

       -D --domains
               Activate the generation of information about Top Level Domains popularity. This information may
               be useful for estimating the amount of visits from different countries. Note that Visitors will not
               resolve numerical IP addresses that are not already resolved in the log file. All the unresolved IP
               addresses are shown in this report under the entry Unresolved IP.
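               A minimal Python sketch of this classification, assuming the client host field is either a resolved
               hostname or a numeric address; top_level_domain and domain_report are hypothetical helpers. Like
               Visitors, the sketch performs no DNS lookups.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple

def top_level_domain(host: str) -> str:
    """TLD of a resolved hostname, or 'Unresolved IP' for a numeric IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return "Unresolved IP"  # numeric address: never resolved here
    except ValueError:
        return host.rstrip(".").rsplit(".", 1)[-1].lower()

def domain_report(hosts: Iterable[str]) -> List[Tuple[str, int]]:
    """Count hosts per top level domain, most popular first."""
    return Counter(top_level_domain(h) for h in hosts).most_common()
```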

       -O --operating-systems
               Activate  the  report  about  Operating Systems popularity, sorted by number of accesses. All the
               common operating systems are listed in the report, while unknown operating systems will be summed
               in the unknown entry.

       -B --browsers
               Activate the report about Browsers popularity, sorted by  number  of  accesses.  All  the  common
               browsers  are  listed  in the report, while unknown browsers will be summed in the unknown entry.
               Browsers are listed by family (for example Internet Explorer, Opera,  and  so  on),  and  not  by
               specific version.

       -X --error404
               Activate the generation of the missing documents (404 error) report. This report shows files that
               were requested but are missing, ordered by number of requests. The report is useful for discovering
               whether some file is missing from the web site by mistake, but often you will also see bizarre
               requests made by users, internet worms and security scanners.
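               A minimal Python sketch of such a report, assuming Apache common or combined log lines in which the
               HTTP status code directly follows the quoted request; error404_report is a hypothetical helper, and
               max_entries plays the role of the -x option.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Assumption: '"METHOD PATH PROTOCOL" STATUS ...' as in Apache common/combined logs.
_REQ_STATUS = re.compile(r'"\S+ (\S+)[^"]*" (\d{3})\b')

def error404_report(lines: Iterable[str], max_entries: Optional[int] = None) -> List[Tuple[str, int]]:
    """Count requested-but-missing paths, most requested first."""
    misses: Counter = Counter()
    for line in lines:
        m = _REQ_STATUS.search(line)
        if m and m.group(2) == "404":
            misses[m.group(1)] += 1
    return misses.most_common(max_entries)  # None means "no limit"
```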

       -Y --pageviews
               Activate the generation of a report that shows (an approximation of) the percentage of pages
               viewed per unique visit. The goal of this report is to understand the usage pattern of the site and
               the level of interest of the visitors. For example, in a site that provides a number of pages with
               interesting contents, visitors performing a single page view per visit are probably searching for
               something else.
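               One way to approximate this distribution is sketched below in Python. The definition of a visit used
               here (all page views by one host on one day) is a hypothetical simplification, not necessarily how
               Visitors groups requests, and pageviews_per_visit is an illustrative name.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

def pageviews_per_visit(page_hits: Iterable[Tuple[str, Hashable]]) -> Dict[int, float]:
    """Map 'page views in a visit' -> percentage of visits with that many views.

    page_hits holds one (host, day) pair per page view; each distinct pair
    is treated as one visit (hypothetical simplification)."""
    views = Counter(page_hits)             # visit -> page views in that visit
    if not views:
        return {}
    histogram = Counter(views.values())    # page views -> number of visits
    total = len(views)
    return {n: 100.0 * visits / total for n, visits in sorted(histogram.items())}
```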

       -S --robots
               Activate the generation of a report that shows the user agents of clients requesting the file
               robots.txt, with the exception of MSIE Crawler requests. The result is a list of web robots and
               spiders that accessed your web site, ordered by the number of requests for robots.txt.
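               A minimal Python sketch of the counting rule, assuming requests are already split into (path,
               user agent) pairs and that MSIE Crawler can be recognized by the substring MSIECrawler in its user
               agent; robots_report is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def robots_report(requests: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """requests holds (path, user_agent) pairs; returns agents by robots.txt fetch count."""
    agents: Counter = Counter()
    for path, agent in requests:
        # Assumption: MSIE Crawler identifies itself with "MSIECrawler" in its user agent.
        if path == "/robots.txt" and "MSIECrawler" not in agent:
            agents[agent] += 1
    return agents.most_common()
```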

       --screen-info
               Activate the screen resolution and color depth reports. Note that for this report to work you have
               to insert into your HTML pages the JavaScript code found in the README file in the Visitors
               tarball.

       --stream
               Enable Stream Mode (see the STREAM MODE DETAILS section for more information). In short: in stream
               mode Visitors first processes all the log files specified (possibly none, which is valid in this
               mode) as usual, producing the report. Then stream mode is entered and Visitors starts reading a
               continuous stream of web logs from standard input, updating the statistics incrementally as new
               data becomes available. A new report is produced periodically, if new data arrived, according to the
               --update-every option (by default the statistics are updated every ten minutes). It's possible to
               ask Visitors to reset the statistics after some period of time using the --reset-every option. This
               makes it possible to get a snapshot of what happened in the last five minutes, hour, day or week.
               Note that --stream requires --output-file, because Visitors needs to overwrite the report on every
               update and so cannot write to standard output as usual. If you plan to use stream mode, also check
               the --tail option.
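               The interplay of --update-every and --reset-every can be modeled as a small scheduler. This Python
               sketch is hypothetical: StreamScheduler, the action names, and running a due reset before a due
               report are illustrative assumptions, not Visitors internals. The defaults mirror the text: a
               report every 600 seconds, and no reset.

```python
from typing import List, Optional

class StreamScheduler:
    """Hypothetical model of --update-every / --reset-every timing (times in seconds)."""

    def __init__(self, now: float, update_every: float = 600, reset_every: Optional[float] = None):
        self.update_every = update_every  # default: a new report every ten minutes
        self.reset_every = reset_every    # default: statistics are never reset
        self.last_update = now
        self.last_reset = now
        self.dirty = False                # has new data arrived since the last report?

    def on_data(self) -> None:
        self.dirty = True

    def tick(self, now: float) -> List[str]:
        """Return the actions due at time `now`, in the order they should run."""
        actions = []
        if self.reset_every is not None and now - self.last_reset >= self.reset_every:
            actions.append("reset")
            self.last_reset = now
        if self.dirty and now - self.last_update >= self.update_every:
            actions.append("write-report")
            self.last_update = now
            self.dirty = False
        return actions
```

               Note that, as described above, a report is only rewritten when new data has arrived since the
               previous one.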

       --update-every seconds
               By  default  in  Stream  Mode  statistics  are  updated every 10 minutes. This option specifies a
               different period in seconds.

       --reset-every seconds
               By default in Stream Mode statistics are never reset,  but  continuously  updated  incrementally.
               This  option  specifies  to  reset  statistics after the given amount of time in seconds. This is
               useful to have a snapshot of the web site usage.

       -f --output-file file
               Write output to file instead of stdout.

       -m --max-lines number
               Set the max number of entries that should be shown in reports like referers, keyphrases and so
               on. This option sets the max number of entries for all the reports at once.

       -r --max-referers number
               Set the max number of entries in the referer report.

       -p --max-pages number
               Set the max number of entries in the accessed pages report.

       -i --max-images number
               Set the max number of entries in the accessed images report.

       -x --max-error404 number
               Set the max number of entries in the missing documents report.

       -u --max-useragents number
               Set the max number of entries in the user agents report.

       -t --max-trails number
               Set the max number of entries in the web trails report.

       -g --max-googled number
               Set the max number of entries in the crawled pages report (google bot).

           --max-adsensed number
               Set the max number of entries in the crawled pages report (adsense bot).

       -k --max-google-keyphrases number
               Set the max number of entries in the Google keyphrases report.

       -a --max-referers-age number
               Set the max number of entries in the referers by date report.

       -d --max-domains number
               Set the max number of entries in the domains report.

       -P --prefix string
               Prefixes tell Visitors what a link must look like to be classified as internal to your site. This
               option is required for --trails, and it also has the nice effect of keeping internal links out of
               the referers report. If you are analyzing statistics for http://www.your.site.com/, just use:
               --prefix http://www.your.site.com

               If your site is reachable using multiple hostnames, you should specify all of them, as in the
               following example:
               --prefix http://www.your.site.com --prefix http://your.site.com
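               A minimal Python sketch of prefix classification, assuming plain string-prefix matching and that
               an empty referer is logged as -; is_internal and external_referers are hypothetical helpers.

```python
from typing import Iterable, List, Sequence

def is_internal(url: str, prefixes: Sequence[str]) -> bool:
    """A link counts as internal when it starts with any of the --prefix values."""
    return any(url.startswith(prefix) for prefix in prefixes)

def external_referers(referers: Iterable[str], prefixes: Sequence[str]) -> List[str]:
    """Drop empty referers ('-') and internal links, keeping only external ones."""
    return [r for r in referers if r != "-" and not is_internal(r, prefixes)]
```

               Plain prefix matching, as in this sketch, would also classify a host such as
               http://www.your.site.com.example.net/ as internal. A stricter variant could require the prefix to be
               followed by / or by the end of the URL.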

       -o --output html|text
               Output module. You can use text or html. The default is html.

       -V --graphviz
               This option enables Graphviz mode: Visitors will analyze the log file and create a graph describing
               the access patterns of your web site. The information used to create the graph is the same as for
               the web trails report (which you can enable with --trails), but as a graph it can be more readable
               for non-trivial sites. An example of how to use this feature:

               % visitors access.log --prefix http://www.hping.org \
                 --graphviz > graph.dot

               % dot graph.dot -Tpng > graph.png

               On Debian systems, the dot command is included in the graphviz package. The edges of the generated
               graph have different colors, ranging from blue to red to indicate low to high popularity of a given
               move from one page of the web site to another. This option requires one or more --prefix options
               in order to work, just like the --trails option.
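               A minimal Python sketch of how move counts could become a DOT graph with a blue-to-red edge scale.
               trails_to_dot is a hypothetical helper, the linear HSV hue scale is an assumption rather than the
               exact scheme Visitors uses, and page names containing double quotes are not escaped.

```python
from typing import Dict, Tuple

def trails_to_dot(transitions: Dict[Tuple[str, str], int]) -> str:
    """Render page-to-page move counts as a Graphviz digraph.

    Hypothetical color scale: HSV hue 0.66 (blue) for the least popular move,
    down to 0.0 (red) for the most popular one."""
    lines = ["digraph webtrails {"]
    if transitions:
        lo, hi = min(transitions.values()), max(transitions.values())
        span = (hi - lo) or 1  # avoid division by zero when all counts are equal
        for (src, dst), count in sorted(transitions.items()):
            hue = 0.66 * (1 - (count - lo) / span)
            lines.append(f'  "{src}" -> "{dst}" [color="{hue:.3f} 1.000 1.000", label="{count}"];')
    lines.append("}")
    return "\n".join(lines)
```

               The returned text can be rendered with dot -Tpng, as in the example above.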

       -V --graphviz-ignorenode-google
               Don't put the Google node on the generated graph. Only useful with --trails

       -V --graphviz-ignorenode-external
               Don't put the external referer node on the generated graph. Only useful with --trails

       -V --graphviz-ignorenode-noreferer
               Don't  put the node indicating requests without referer on the generated graph.  Only useful with
               --trails

       --tail  When this option is specified, Visitors will emulate the Unix command
               tail -f --max-unchanged-stats=1 -q. You can specify the log file names to monitor for changes; once
               new data is appended to any of the specified files, Visitors will output the new data to the
               standard output. This option is useful in conjunction with Stream Mode (--stream). Files can be
               log-rotated, because Visitors in Tail Mode will always try to reopen the file to check for changes.
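               The reopen-on-rotation behavior can be sketched in Python as follows. Follower is a hypothetical
               class that watches a single file; detecting rotation by a change of inode and starting at the end
               of the file on the first poll are assumptions of this sketch, not documented Visitors behavior.

```python
import os
from typing import BinaryIO, Optional

class Follower:
    """Hypothetical sketch of following one log file across rotations, like tail -F."""

    def __init__(self, path: str):
        self.path = path
        self.fh: Optional[BinaryIO] = None
        self.inode: Optional[int] = None

    def _reopen(self) -> None:
        if self.fh is not None:
            self.fh.close()
        self.fh = open(self.path, "rb")
        self.inode = os.fstat(self.fh.fileno()).st_ino

    def poll(self) -> bytes:
        """Return the bytes appended since the previous poll."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return b""                    # mid-rotation: try again on the next poll
        if self.fh is None:
            self._reopen()
            self.fh.seek(0, os.SEEK_END)  # assumption: skip data present before the first poll
            return b""
        if st.st_ino != self.inode:
            self._reopen()                # rotated: read the new file from its start
        elif st.st_size < self.fh.tell():
            self.fh.seek(0)               # truncated in place: start over
        return self.fh.read()
```

               In this sketch, lines written to the old file between the last poll and the rotation are lost; a
               more careful version would drain the old handle before switching.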

       --time-delta delta
               If your web server is in a different timezone than most of your visitors or yourself, you will
               notice a shift in the reports regarding time and days of the week. By default, Visitors generates
               output using the host's locale. You can use the --time-delta option to adjust the output: positive
               values shift times forward (toward the future) by the given number of hours, negative values shift
               them backward (toward the past). In the future this option may support specifying the output
               timezone directly.
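               The effect of a delta on the weekday and hour reports can be seen with a small Python sketch.
               parse_apache_date and apply_time_delta are hypothetical helpers, and ignoring the UTC offset in the
               log date is an assumption of the sketch.

```python
from datetime import datetime, timedelta

def parse_apache_date(text: str) -> datetime:
    """Parse '04/Apr/2005:23:30:00 +0200'; the UTC offset is ignored (assumption)."""
    return datetime.strptime(text.split()[0], "%d/%b/%Y:%H:%M:%S")

def apply_time_delta(ts: datetime, delta_hours: int) -> datetime:
    """Positive deltas shift toward the future, negative ones toward the past."""
    return ts + timedelta(hours=delta_hours)
```

               With a delta of 2, an access logged on a Monday at 23:30 is reported as a Tuesday access at 01:30,
               so it lands in a different cell of the weekday/hour map.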

       --filter-spam
               Filter referer spam using a keyword-based filter (see blacklist.h for more information on
               keywords). If you don't know what referer spam is, check this Wikipedia page:
               http://en.wikipedia.org/wiki/Referer_spam

       --ignore-404
               When this option is turned on, log lines with 404 errors are used only to generate the 404 errors
               report and are ignored by the other reports.

       --grep pattern
               Process only log lines matching the specified pattern.  Patterns are matched using the glob-style
               matching (the one used by the unix shell):

               *         Matches any sequence of characters in string, including a null string.

               ?         Matches any single character in string.

               [chars]   Matches any character in the set given by chars.  If a sequence of the form x-y appears
                         in chars, then any character between x and y, inclusive, will match.

               \x        Matches  the  single  character  x.   This  provides  a  way  of  avoiding  the special
                         interpretation of the characters *?[]\ in pattern.

               By default, matching is performed in a case-sensitive way, but case-insensitive matching may be
               forced by prefixing the pattern with the string cs:; for example, the pattern cs:firefox will match
               all the log lines containing the string firefox, FireFox, FIREFOX and so on.
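               The matching rules listed above can be sketched as a small recursive matcher in Python. glob_match
               and line_matches are hypothetical helpers, not Visitors code. Two further assumptions: the pattern
               may match anywhere in the line (as the firefox examples suggest), and an unterminated [ set never
               matches. The cs: handling follows this page's description.

```python
def glob_match(pattern: str, text: str) -> bool:
    """Match the whole of text: *, ?, [chars] with x-y ranges, and \\x escapes."""
    if not pattern:
        return not text
    c = pattern[0]
    if c == "*":
        # '*' matches any sequence, including the empty one.
        return any(glob_match(pattern[1:], text[i:]) for i in range(len(text) + 1))
    if not text:
        return False
    if c == "?":
        return glob_match(pattern[1:], text[1:])
    if c == "[":
        end = pattern.find("]", 1)
        if end == -1:
            return False  # hypothetical choice: an unterminated set never matches
        chars, ch = pattern[1:end], text[0]
        i, matched = 0, False
        while i < len(chars):
            if i + 2 < len(chars) and chars[i + 1] == "-":
                lo, hi = sorted((chars[i], chars[i + 2]))  # x-y range, inclusive
                matched |= lo <= ch <= hi
                i += 3
            else:
                matched |= chars[i] == ch
                i += 1
        return matched and glob_match(pattern[end + 1:], text[1:])
    if c == "\\" and len(pattern) > 1:
        c, pattern = pattern[1], pattern[1:]  # \x matches x literally
    return c == text[0] and glob_match(pattern[1:], text[1:])

def line_matches(pattern: str, line: str) -> bool:
    """Apply a --grep style pattern to a log line, as this page describes it."""
    if pattern.startswith("cs:"):
        # Per the text above, a cs: prefix selects case-insensitive matching.
        pattern, line = pattern[3:].lower(), line.lower()
    # Assumption: the pattern may match anywhere in the line.
    return glob_match("*" + pattern + "*", line)
```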

       --exclude pattern
               Works exactly like --grep, but only lines NOT matching the specified pattern are processed. Note
               that --grep and --exclude can be used multiple times, and are processed sequentially. For example,
               visitors --grep firefox --exclude download will process only lines that include the string firefox
               but not the string download.
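               The sequential filtering can be sketched in Python as a chain of keep/drop passes. select_lines is
               a hypothetical helper. It uses the standard fnmatch.fnmatchcase as a stand-in for Visitors' own
               matcher, although fnmatchcase does not support the \x escapes described under --grep. Wrapping the
               pattern in * to match anywhere in the line is an assumption.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple

def select_lines(lines: Sequence[str], filters: Sequence[Tuple[str, str]]) -> List[str]:
    """Apply ("grep" | "exclude", pattern) filters in the order given on the command line."""
    selected = list(lines)
    for kind, pattern in filters:
        keep = kind == "grep"  # grep keeps matches, exclude drops them
        selected = [line for line in selected if fnmatchcase(line, "*" + pattern + "*") == keep]
    return selected
```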

       --debug Show  additional  information  on  errors.  For example invalid lines are printed on the standard
               error if found. Mainly useful for developers and error reporting.

       -h --help
               Show usage and copyright information.

       -v --version
               Show program version.

EXAMPLES

       The simplest usage, to be used interactively when you have a web log to check (for example over ssh on
       your web server), is:

       % visitors -o text access.log | less

       That will produce human-readable, text-only output. To generate HTML web stats with much more
       information, you may use this instead:

       % visitors -A -m 30 access.log -o html > report.html

       If you want information on the usage patterns of your site, you must provide the URL prefix of your web
       site and specify the --trails option. The next example produces an HTML report with usage pattern
       information.

       % visitors -A -m 30 access.log --trails \
         --prefix http://www.hping.org > report.html

       Note that it's ok to specify multiple file names, or to provide the input using the standard  input  like
       in the following two examples:

       % visitors /var/log/apache/access.log.*
       % zcat access.log.*.gz | visitors -

STREAM MODE DETAILS

       The usual way to run Visitors is to specify some options to control the report generation, and the names
       of the log files. For example, to generate a report from two Apache access log files you can write:

       % visitors -A access.log.1 access.log.2 > report.html

       Visitors will analyze the log files, and will output the report.  Sometimes it can be more interesting to
       have  web  statistics  updated  continuously,  almost in real time, as new data is available. In order to
       provide this feature Visitors implements a mode called Stream Mode that reads a stream of logs  from  the
       standard  input.   The  following  command  line  shows  how  to  use  it  (but check the --stream option
       documentation for more information).

       % tail -f /var/log/apache/access.log | \
         visitors --stream -A --update-every 60 \
         --output-file /tmp/report.html

       Visitors will incrementally update the statistics as new logs become available and will update the HTML
       report every 60 seconds. As you can see, in this mode the report file name must be specified using the
       --output-file option, because Visitors needs to overwrite the report to update it. Note that instead of
       the tail command in the above example, it is possible to use Visitors itself in Tail Mode (an emulation
       of the tail program):

       % visitors --tail /var/log/apache/access.log | \
         visitors --stream -A --update-every 60 \
         --output-file /tmp/report.html

       It's possible to generate real-time statistics about the last N seconds of web traffic, where N is
       configurable and can range from a few seconds to one week or more, using the --reset-every option. The
       following example generates statistics, updated every 30 seconds, about the last hour of traffic:

       % visitors --tail /var/log/apache/access.log | \
         visitors --stream -A --update-every 30 --reset-every 3600 \
         --output-file /tmp/report.html

AUTHORS

       Visitors was written by Salvatore Sanfilippo <antirez@invece.org>.

COPYING

       Copyright (C) 2004,2005 Salvatore Sanfilippo <antirez@invece.org>.

       Visitors is distributed under the GNU General Public License.

       This manual page was written (based on the original HTML documentation) by Romain Francoise
       <rfrancoise@debian.org> for the Debian GNU/Linux system, but may be used by others. Salvatore Sanfilippo
       has updated this man page since Visitors 0.5, and it is now part of the Visitors tarball.

Visitors 0.7                                       April 2005                                        VISITORS(1)