NAME

       urlwatch - Watch web pages and arbitrary URLs for changes

SYNOPSIS

       urlwatch [options]

DESCRIPTION

       urlwatch  watches  a list of URLs for changes and prints out unified diffs of the changes. You can filter
       always-changing parts of websites by providing a "hooks.py" script.
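
       To illustrate what a unified diff of two page snapshots looks like, here is a minimal sketch built on
       Python's standard difflib module. The function name snapshot_diff and the sample texts are made up for
       this example, and it is not meant to describe how urlwatch itself produces its output.

```python
import difflib


def snapshot_diff(old_text, new_text, label="page"):
    """Return a unified diff between two text snapshots as one string.

    The name and the 'label' parameter are hypothetical, chosen for this sketch.
    """
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=label + " (old)",
        tofile=label + " (new)",
        lineterm="",  # input lines carry no newline, so do not add one to headers
    )
    return "\n".join(diff_lines)


# Made-up sample content standing in for two fetches of the same URL.
old = "Opening hours\nMon-Fri 9-17\n"
new = "Opening hours\nMon-Fri 9-18\n"
print(snapshot_diff(old, new))
```

       Removed lines are prefixed with "-" and added lines with "+"; identical snapshots yield an empty
       string.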

OPTIONS

       --version
              Show the program's version number and exit

       -h, --help
              Show the help message and exit

       -v, --verbose
              Show debug/log output

       --urls=FILE
              Read URLs from the specified file

       --hooks=FILE
              Use specified file as hooks.py module

       -e, --display-errors
              Include HTTP errors (404, etc.) in the output

ADVANCED FEATURES

       urlwatch includes some advanced features, which you activate by creating a hooks.py file that specifies
       the URLs each feature applies to. You can also use the hooks.py file to filter trivially varying
       elements of a web page.
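
       As a rough sketch of filtering a trivially varying element, a hooks.py filter could strip a timestamp
       before the page is compared. The URL and the "Generated at" pattern below are hypothetical, and the
       example assumes that data arrives as text (str):

```python
import re

# Hypothetical pattern: a "Generated at HH:MM:SS" stamp that changes on every fetch.
TIMESTAMP = re.compile(r"Generated at \d{2}:\d{2}:\d{2}")


def filter(url, data):
    # Hypothetical URL, used only for illustration.
    if url == "http://example.org/status.html":
        # Drop the stamp so it never shows up as a change.
        return TIMESTAMP.sub("", data)
    # Every other URL is passed through unchanged.
    return data
```

       Returning data unchanged for all other URLs keeps the hook from affecting pages it was not written
       for.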

   ICALENDAR FILE PARSING
       The ical2txt module parses .ics files in iCalendar format and produces a much simplified text-based
       representation for the diffs. Use it like this in your hooks.py file:

         from urlwatch import ical2txt

         def filter(url, data):
             if url.endswith('.ics'):
                 return ical2txt.ical2text(data).encode('utf-8') + data
             # ...you can add more hooks here...
             return data

   HTML TO TEXT CONVERSION
       There are three methods of converting HTML to text in the current version of urlwatch: "lynx" (default),
       "html2text" and "re". The first two call the command-line utilities of the same name, and the last one
       uses a simple regex-based tag stripping method that needs no extra tools. Here is an example of
       selecting a method in your hooks.py file:

         from urlwatch import html2txt

         def filter(url, data):
             if url.endswith('.html') or url.endswith('.htm'):
                 return html2txt.html2text(data, method='lynx')
             # ...you can add more hooks here...
             return data
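
       To give an idea of what regex-based tag stripping involves, here is a minimal sketch. It is an
       illustration of the general technique only, not the code behind the "re" method; the name strip_tags
       is made up, and a pattern this simple does not handle comments, scripts or entities.

```python
import re

# Matches anything that looks like a tag: "<", any run of non-">" characters, ">".
TAG = re.compile(r"<[^>]*>")


def strip_tags(html):
    """Remove tag-like spans, trim each line and drop lines left empty."""
    text = TAG.sub("", html)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


# Made-up sample markup.
print(strip_tags("<p>Hello <b>world</b></p>\n<br>\n<p>Bye</p>"))
```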

FILES

       ~/.urlwatch/urls.txt
              A list of HTTP/FTP URLs to watch (one URL per line)

       ~/.urlwatch/lib/hooks.py
              A Python module that can be used to filter contents

       ~/.urlwatch/cache/
              The state of web pages is saved in this folder
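
       The one-URL-per-line layout of urls.txt can be read with a few lines of Python. This is a sketch under
       assumptions: the helper name read_urls is hypothetical, and skipping blank lines is a choice made for
       this example rather than documented urlwatch behaviour.

```python
def read_urls(lines):
    """Collect one URL per line from an iterable of text lines."""
    urls = []
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption for this sketch: blank lines are ignored.
            continue
        urls.append(line)
    return urls


# Made-up file content standing in for ~/.urlwatch/urls.txt.
sample = "http://example.org/news.html\n\nftp://example.org/pub/README\n"
print(read_urls(sample.splitlines()))
```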

AUTHOR

       Thomas Perl <thp.io/about>

WEBSITE

       http://thp.io/2008/urlwatch/

urlwatch 1.15                                      August 2012                                       URLWATCH(1)