
NAME

       urlwatch - Watch web pages and arbitrary URLs for changes

SYNOPSIS

       urlwatch [options]

DESCRIPTION

       urlwatch  watches  a list of URLs for changes and prints out unified diffs of the changes.
       You can filter always-changing parts of websites by providing a "hooks.py" script.

OPTIONS

       --version
              Show the program's version number and exit

       -h, --help
              Show the help message and exit

       -v, --verbose
              Show debug/log output

       --urls=FILE
              Read URLs from the specified file

       --hooks=FILE
              Use specified file as hooks.py module

       -e, --display-errors
              Include HTTP errors (404, etc.) in the output

ADVANCED FEATURES

       urlwatch includes some advanced features that you activate by creating a hooks.py file
       that specifies which URLs each feature applies to. You can also use the hooks.py file
       to filter out trivially varying elements of a web page.
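
       As a minimal sketch of filtering a trivially varying element, the hook below blanks
       out a "Last updated: ..." line before the diff is computed. The URL prefix, the
       timestamp pattern and the replacement text are hypothetical, and the sketch assumes
       that data arrives as a text string (older Python 2 based versions pass a byte
       string instead):

```python
import re

# Hypothetical pattern: a "Last updated: ..." line that changes on every fetch.
TIMESTAMP_RE = re.compile(r'Last updated: [^\n<]*')

def filter(url, data):
    # Only touch the (hypothetical) site that carries the noisy timestamp.
    if url.startswith('http://example.org/'):
        return TIMESTAMP_RE.sub('Last updated: (removed)', data)
    # ...you can add more hooks here...
    return data
```

       Pages from other sites pass through unchanged, so the hook only hides the one
       element known to vary.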

   ICALENDAR FILE PARSING
       The ical2txt module parses .ics files in iCalendar format and produces a much
       simplified text-based representation for the diffs. Use it like this in your hooks.py
       file:

         from urlwatch import ical2txt

         def filter(url, data):
             if url.endswith('.ics'):
                 return ical2txt.ical2text(data).encode('utf-8') + data
             # ...you can add more hooks here...
             return data

   HTML TO TEXT CONVERSION
       There are three methods of converting HTML to text in the current version of urlwatch:
       "lynx" (default), "html2text" and "re". The first two use the command-line utilities of
       the same name to convert HTML to text, and the last one uses a simple regex-based tag
       stripping method (needs no extra tools). Here is an example of using the "lynx" method
       in your hooks.py file:

         from urlwatch import html2txt

         def filter(url, data):
             if url.endswith('.html') or url.endswith('.htm'):
                 return html2txt.html2text(data, method='lynx')
             # ...you can add more hooks here...
             return data
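
       For intuition about what regex-based tag stripping means, here is a rough
       illustration of the idea. It is not urlwatch's actual implementation; the function
       name strip_tags and both patterns are assumptions made for this sketch:

```python
import re

def strip_tags(html):
    # Illustration only: remove anything that looks like a tag...
    text = re.sub(r'<[^>]*>', '', html)
    # ...then collapse runs of blank lines into a single blank line.
    return re.sub(r'\n\s*\n+', '\n\n', text).strip()
```

       An approach like this needs no external tools, but it does not decode entities or
       handle scripts and comments, which is one reason to prefer "lynx" or "html2text"
       when they are available.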

FILES

       ~/.urlwatch/urls.txt
              A list of HTTP/FTP URLs to watch (one URL per line)

       ~/.urlwatch/lib/hooks.py
              A Python module that can be used to filter contents

       ~/.urlwatch/cache/
              The state of web pages is saved in this folder

AUTHOR

       Thomas Perl <thp.io/about>

WEBSITE

       http://thp.io/2008/urlwatch/