
NAME

       urlwatch-jobs - Job types and configuration for urlwatch

SYNOPSIS

       urlwatch --edit

DESCRIPTION

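       Jobs are the units of work that urlwatch(1) monitors for changes: each job describes one
       resource to check, such as a web page or the output of a shell command.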

       The jobs to run are defined in the configuration file urls.yaml, which can be edited with
       the command urlwatch --edit. Jobs are separated from each other by a line containing only
       ---. The command urlwatch --list prints the name of each job, along with its index number
       (1, 2, 3, ...), which is assigned automatically according to the job's position in the
       configuration file.
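
       As an illustration, a urls.yaml containing two jobs (the URL and shell examples from this
       page, separated by ---) could look like the following sketch; urlwatch --list would number
       them 1 and 2 in this order:

          # Job 1: a web page
          name: "urlwatch homepage"
          url: "https://thp.io/2008/urlwatch/"
          ---
          # Job 2: the output of a shell command
          name: "What is in my Home Directory?"
          command: "ls -al ~"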

       While optional, it is recommended that each job starts with a name entry:

          name: "This is a human-readable name/label of the job"

       The following job types are available:

URL

       This is the main job type -- it retrieves a document from a web server:

          name: "urlwatch homepage"
          url: "https://thp.io/2008/urlwatch/"

       Required keys:

       • url: The URL of the document to watch for changes

       Job-specific optional keys:

       • cookies: Cookies to send with the request (see advanced_topics)

       • method: HTTP method to use (default: GET)

       • data: HTTP POST/PUT data

       • ssl_no_verify: Do not verify SSL certificates (true/false)

       • ignore_cached: Do not use cache control (ETag/Last-Modified) values (true/false)

       • http_proxy: Proxy server to use for HTTP requests

       • https_proxy: Proxy server to use for HTTPS requests

       • headers: HTTP headers to send along with the request

       • encoding: Override the character encoding from the server (see advanced_topics)

       • timeout: Override the default socket timeout (see advanced_topics)

       • ignore_connection_errors: Ignore (temporary) connection errors (see advanced_topics)

       • ignore_http_error_codes: List of HTTP error codes to ignore (see advanced_topics)

       • ignore_timeout_errors: Do not report errors when the timeout is hit

       • ignore_too_many_redirects: Ignore redirect loops (see advanced_topics)
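
       For instance, a job that sends a POST request with a request body, a custom User-Agent
       header and a cookie might look like the following sketch. The URL, form field, header
       value and cookie are made up for illustration, and headers and cookies are assumed to be
       given as name/value mappings (see advanced_topics for the authoritative format):

          name: "Search results (POST request)"
          url: "https://example.org/search"   # hypothetical endpoint
          method: POST
          # Request body sent with the POST (hypothetical form field)
          data: "q=urlwatch"
          headers:
            User-Agent: "Mozilla/5.0 (compatible; urlwatch)"
          cookies:
            session_id: "0123456789abcdef"    # hypothetical cookie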

       (Note: url implies kind: url)
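
       The error-handling keys can be combined for a server that is known to be unreliable. In
       this sketch the URL is hypothetical, the timeout value is assumed to be in seconds, and
       the comma-separated form of ignore_http_error_codes is an assumption; check
       advanced_topics for the exact syntax:

          name: "Status page of a flaky server"
          url: "https://example.org/status"   # hypothetical URL
          timeout: 120                        # assumed to be seconds
          # Do not report (temporary) connection problems or timeouts as errors
          ignore_connection_errors: true
          ignore_timeout_errors: true
          # Do not report these HTTP status codes as errors (syntax assumed)
          ignore_http_error_codes: 500, 502, 503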

BROWSER

       This job type is a resource-intensive variant of "URL" for web pages that require
       JavaScript to render the content to be monitored.

       The optional pyppeteer package must be installed to run "Browser" jobs (see dependencies).

       At the moment, the Chromium version used by pyppeteer only supports macOS (x86_64),
       Windows (both x86 and x64) and Linux (x86_64). See this issue
       <https://github.com/pyppeteer/pyppeteer/issues/155> in the Pyppeteer issue tracker for
       progress on support for ARM devices (e.g. Raspberry Pi).

       Because pyppeteer downloads a special version of Chromium (~ 100 MiB), the first execution
       of a browser job can take some time (and bandwidth). To avoid this delay, run
       pyppeteer-install to download Chromium in advance.

          name: "A page with JavaScript"
          navigate: "https://example.org/"

       Required keys:

       • navigate: URL to navigate to with the browser

       Job-specific optional keys:

       • wait_until: Either load, domcontentloaded, networkidle0, or networkidle2 (see
         advanced_topics)
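
       For example, the job above could be told to wait until network activity has settled
       before the page content is captured. networkidle0 is one of the four documented values;
       its precise meaning (roughly, no open network connections for a short period) comes from
       Pyppeteer's waitUntil option and is only summarized here as an assumption:

          name: "A page with JavaScript"
          navigate: "https://example.org/"
          # Consider the page loaded once network activity has stopped
          wait_until: networkidle0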

       As this job uses Pyppeteer <https://github.com/pyppeteer/pyppeteer> to render the page in
       a headless Chromium instance, it requires far more resources than a "URL" job. Use it only
       on pages where url does not give the right results.

       Hint: in many cases, instead of using a "Browser" job, you can use the much faster "URL"
       job type to monitor the output of an API that the site calls while the page loads, if that
       API returns the information you are after.
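
       A sketch of such a job, assuming a hypothetical JSON endpoint discovered with the
       browser's developer tools; user_visible_url (see OPTIONAL KEYS FOR ALL JOB TYPES) makes
       reports point to the human-readable page instead of the API:

          name: "Items on example.org (via its API)"
          # Hypothetical API endpoint called by the page while it loads
          url: "https://example.org/api/items.json"
          # Page shown in reports instead of the API URL
          user_visible_url: "https://example.org/items"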

       (Note: navigate implies kind: browser)

SHELL

       This job type watches the output of arbitrary shell commands. This is useful, for example,
       for monitoring an FTP upload folder or the output of scripts that query external devices
       (such as Raspberry Pi GPIO pins).

          name: "What is in my Home Directory?"
          command: "ls -al ~"

       Required keys:

       • command: The shell command to execute

       Job-specific optional keys:

       • stderr: Change how standard error output is handled (see below)

       (Note: command implies kind: shell)

   Configuring stderr behavior for shell jobs
       By default, urlwatch captures stderr so that it can be included in the error report when
       the shell job exits with a non-zero exit code, but ignores it when the job exits with exit
       code 0.

       This behavior can be customized using the stderr key:

       • ignore: Capture stderr, report on non-zero exit code, ignore otherwise (default)

       • urlwatch: stderr of the shell job is sent to stderr of the urlwatch process;  any  error
         message  on  stderr  will  not be visible in the error message from the reporter (legacy
         default behavior of urlwatch 2.24 and older)

       • fail: Treat the job as failed if there is any output on stderr, even with exit status 0

       • stdout: Merge stderr output into stdout, which means stderr output  is  also  considered
         for the change detection/diff part of urlwatch (this is similar to 2>&1 in a shell)

       For  example,  this  job  definition  will  make the job appear as failed, even though the
       script exits with exit code 0:

          command: |
            echo "Normal standard output."
            echo "Something goes to stderr, which makes this job fail." 1>&2
            exit 0
          stderr: fail

       On the other hand, if you want to diff both stdout and stderr of the job, use this:

          command: |
            echo "An important line on stdout."
            echo "Another important line on stderr." 1>&2
          stderr: stdout

OPTIONAL KEYS FOR ALL JOB TYPES

       • name: Human-readable name/label of the job

       • filter: Filters (if any) to apply to the output (can be tested with --test-filter)

       • max_tries: Number of times to retry fetching the resource

       • diff_tool: Command of a custom tool used to generate the diff text

       • diff_filter: Filters (if any) to apply to the diff result (can be tested with
         --test-diff-filter)

       • treat_new_as_changed: Treat jobs that don't have any historic data as CHANGED instead of
         NEW (and create a diff for new jobs)

       • compared_versions: Number of versions to compare for similarity

       • kind (redundant): Either url, shell or browser.  Automatically derived from  the  unique
         key (url, command or navigate) of the job type

       • user_visible_url:  Different URL to show in reports (e.g. when watched URL is a REST API
         URL, and you want to show a webpage)
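
       Several of these keys can be combined in one job. The following sketch converts the page
       to text, keeps only the hunk headers and the added and removed lines of the diff
       (assuming the default unified diff output), retries fetching up to three times and
       reports the job as CHANGED on its first run. The grep pattern and the numbers are only
       illustrative, and the list form of filter and diff_filter is an assumption; see the
       filter documentation for the exact syntax:

          name: "urlwatch homepage (text only)"
          url: "https://thp.io/2008/urlwatch/"
          # Convert the HTML document to plain text before diffing
          filter:
            - html2text
          # Keep only hunk headers and added/removed lines of the diff
          diff_filter:
            - grep: '^[@+-]'
          max_tries: 3                 # retry fetching up to 3 times
          treat_new_as_changed: true   # first run reports CHANGED with a diff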

SETTING KEYS FOR ALL JOBS AT ONCE

       The main configuration file has a job_defaults key that can be used to configure keys  for
       all jobs at once.

       See urlwatch-config(5) for how to configure job defaults.
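
       As a sketch, assuming the per-kind sub-keys (all, url, shell, browser) described in
       urlwatch-config(5), the following excerpt of the main configuration file would use wdiff
       (which must be installed separately) as the diff tool for every job and ignore connection
       errors for all URL jobs:

          job_defaults:
            # Applies to every job, whatever its kind
            all:
              diff_tool: wdiff
            # Applies only to URL jobs
            url:
              ignore_connection_errors: true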

EXAMPLES

       See urlwatch-cookbook(7) for example job configurations.

FILES

       $XDG_CONFIG_HOME/urlwatch/urls.yaml

COPYRIGHT

       2022 Thomas Perl