Provided by: urlwatch_2.28-3_all

NAME

       urlwatch-intro - Introduction to basic urlwatch usage

QUICK START

       1. Run urlwatch once to migrate your old data or start fresh

       2. Use urlwatch --edit to customize jobs and filters (urls.yaml)

       3. Use urlwatch --edit-config to customize settings and reporters (urlwatch.yaml)

       4. Add urlwatch to your crontab (crontab -e) to monitor webpages periodically

       The checking interval is defined by how often you run urlwatch. You can use, for example,
       crontab.guru <https://crontab.guru> to work out the schedule expression for the checking
       interval. We recommend running it no more often than every 30 minutes (this would be
       */30 * * * *). If you have never used cron before, check out the crontab command help
       <https://www.computerhope.com/unix/ucrontab.htm>.

       On   Windows,   cron  is  not  installed  by  default.  Use  the  Windows  Task  Scheduler
       <https://en.wikipedia.org/wiki/Windows_Task_Scheduler> instead, or see this  StackOverflow
       question <https://stackoverflow.com/q/132971/1047040> for alternatives.

HOW IT WORKS

       Every time you run urlwatch(1), it:

       • retrieves the output of each job and filters it

       • compares it with the version retrieved the previous time ("diffing")

       • if  it  finds any differences, it invokes enabled reporters (e.g.  text reporter, e-mail
         reporter, ...) to notify you of the changes

JOBS AND FILTERS

       Each website or shell command to be monitored constitutes a "job".

       The instructions for each such job are contained in a  config  file  in  the  YAML  format
       <https://yaml.org/spec/>.  If  you  have  more than one job, you separate them with a line
       containing only ---.
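
       As a minimal sketch of this layout, the following urls.yaml fragment defines two jobs
       separated by a --- line. The names and URLs are placeholders chosen for illustration:

          ```yaml
          # Job 1: watch a web page (placeholder URL)
          name: "Example home page"
          url: "https://example.com/"
          ---
          # Job 2: watch a second page (placeholder URL)
          name: "Example news page"
          url: "https://example.com/news"
          ```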

       You can edit the job and filter configuration file using:

          urlwatch --edit

       If you get an error, set your $EDITOR (or $VISUAL) environment variable in your shell, for
       example:

          export EDITOR=/bin/nano

       While  you  can  edit  the  YAML  file manually, using --edit will do sanity checks before
       activating the new configuration file.

   Kinds of Jobs
       Each job must have exactly one of the following keys, which also defines the kind of job:

       • url retrieves what is served by the web server (HTTP GET by default),

       • navigate uses a headless browser to load web pages requiring JavaScript, and

       • command runs a shell command.

       Each job can have an optional name key to define a user-visible name for the job.

       You can then use optional keys to fine-tune a job's parameters.
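
       To illustrate the three kinds of jobs side by side, here is a small sketch with one job
       of each kind. The names, URLs and the shell command are hypothetical examples, and the
       navigate job assumes that the headless browser support is installed:

          ```yaml
          # "url" job: fetches the page with a plain HTTP GET
          name: "Plain HTTP job"
          url: "https://example.com/"
          ---
          # "navigate" job: loads the page in a headless browser (for JavaScript)
          name: "JavaScript-heavy page"
          navigate: "https://example.com/app"
          ---
          # "command" job: monitors the output of a shell command
          name: "Kernel version"
          command: "uname -r"
          ```

       Each job carries exactly one of url, navigate or command, and the optional name key only
       affects how the job is labelled in reports.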

       See urlwatch-jobs(5) for detailed information on job configuration.

   Filters
       You may use the filter key to select one or more filters to apply to the data after it is
       retrieved, for example to:

       • select    HTML:   css,   xpath,   element-by-class,   element-by-id,   element-by-style,
         element-by-tag

       • make HTML more readable: html2text, beautify

       • make PDFs readable: pdf2text

       • make JSON more readable: format-json

       • make iCal more readable: ical2text

       • make binary readable: hexdump

       • just detect changes: sha1sum

       • edit text: grep, grepi, strip, sort, striplines

       These filters can be chained. As an example, after retrieving an HTML  document  by  using
       the  url key, you can extract a selection with the xpath filter, convert this to text with
       html2text, use grep to extract only lines matching a specific regular expression, and then
       sort them:

          name: "Sample urlwatch job definition"
          url: "https://example.dummy/"
          https_proxy: "http://dummy.proxy/"
          max_tries: 2
          filter:
            - xpath: '//section[@role="main"]'
            - html2text:
                method: pyhtml2text
                unicode_snob: true
                body_width: 0
                inline_links: false
                ignore_links: true
                ignore_images: true
                pad_tables: false
                single_line_break: true
            - grep: "lines I care about"
            - sort:
          ---
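
       Shorter chains are often enough. The sketch below assumes that filters without options
       can be listed by name alone, and uses hypothetical URLs: the first job converts a PDF to
       text and strips surrounding whitespace, the second reduces a binary download to a
       checksum so that only the fact of a change is reported:

          ```yaml
          # PDF job: needs the pdf2text filter's dependencies to be installed
          name: "PDF report (hypothetical URL)"
          url: "https://example.com/report.pdf"
          filter:
            - pdf2text
            - strip
          ---
          # Checksum job: the report shows only that the content changed
          name: "Binary download (hypothetical URL)"
          url: "https://example.com/file.bin"
          filter:
            - sha1sum
          ```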

       See urlwatch-filters(5) for detailed information on filter configuration.

REPORTERS

       urlwatch can be configured to do something with its report besides (or in addition to) the
       default of displaying it on the console.

       Reporters are configured in the global configuration file:

          urlwatch --edit-config

       Examples of reporters:

       • email (using SMTP)

       • email using mailgun

       • slack

       • discord

       • pushbullet

       • telegram

       • matrix

       • pushover

       • stdout

       • xmpp

       • shell
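
       As a rough sketch, the report section of urlwatch.yaml with the stdout and SMTP e-mail
       reporters enabled might look like the fragment below. The addresses and the SMTP host
       are placeholders, and the key names are assumed to match the default configuration, so
       they should be checked against urlwatch-reporters(5):

          ```yaml
          report:
            stdout:
              enabled: true        # keep printing the report to the console
              color: true
            email:
              enabled: true        # additionally send the report by e-mail
              from: "urlwatch@example.com"   # placeholder sender
              to: "you@example.com"          # placeholder recipient
              method: smtp
              smtp:
                host: localhost    # placeholder SMTP server
                port: 25
                starttls: true
          ```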

       See urlwatch-reporters(5) for reporter configuration options.

SEE ALSO

       urlwatch(1),       urlwatch-jobs(5),       urlwatch-filters(5),        urlwatch-config(5),
       urlwatch-reporters(5), cron(8)

COPYRIGHT

       2023 Thomas Perl

                                           May 03, 2023                         URLWATCH-INTRO(7)