oracular (7) podget.7.gz

Provided by: podget_0.9.3-1_all

NAME

       Podget - Simple tool to automate downloading of podcasts.

SYNOPSIS

       podget <options>

DESCRIPTION

       Podget is a simple podcast aggregator/downloader optimized for scheduled background jobs
       (e.g. cron).

       It features support for:
       - Downloading podcasts from RSS and ATOM XML feeds.
       - Sorting the files into folders and categories.
       - Importing URLs from iTunes PCAST files and OPML lists.
       - Automatic M3U & ASX playlist creation.
       - Cleanup of old files.
       - Automatic UTF-16 conversion for feeds hosted on MS Windows Servers.

       Podget works by extracting the <enclosure> tags from the feed and then downloading the
       URL each one specifies.  The one exception is <enclosure> tags that appear within
       <podcast:liveItem> tags: Podget ignores these because it is an aggregator rather than a
       player and has not been optimized for live content.

OPTIONS

       -c <FILE> | --config <FILE>
              Name of configuration file.

       --create-config <FILE>
              Create configuration file and exit.

       -C | --cleanup
              Skip downloading and only run cleanup loop.

       --cleanup_days <NUMBER>
              Clean up files older than <NUMBER> days.

       --cleanup_simulate
              Simulate cleanup loop to see what files would be deleted.

       -d <DIRECTORY> | --dir_config <DIRECTORY>
              Directory that configuration files are stored in.

       --dir_session <DIRECTORY>
              Directory that session files are stored in.

       -f | --force
              Force download of items from each feed even if they've already been downloaded.

       -h | --help
              Display condensed help dialog.

       -l <DIRECTORY> | --library <DIRECTORY>
              Directory to store downloaded files in.

       -n | --no-playlist
              Do not create an M3U playlist of new items.

       -p | --playlist-asx
              In addition to M3U playlists, create ASX playlists.

       --playlist-per-podcast
              Create a playlist of new items for each podcast feed.

       -r <COUNT> | --recent <COUNT>
              Download only the <COUNT> newest items from each feed.

       --serverlist <FILE>
              Use <FILE> as the serverlist instead of the default.

       -s | --silent
              Run silently (for cron jobs).

       -v     Set verbosity to level 1.

       -vv    Set verbosity to level 2.

       -vvv   Set verbosity to level 3.

       -vvvv  Set verbosity to level 4.

       --verbosity <LEVEL>
              Set verbosity level (0-4).

       -V | --version
              Display version.

       OPML List Options:

              --import_opml <FILE or URL>
                     Import servers from OPML file or HTTP/FTP URL.

              --export_opml <FILE>
                     Export serverlist to OPML file.

       PCAST List Options:

              --import_pcast <FILE or URL>
                     Import server from iTunes PCAST file or HTTP/FTP URL.

CONFIGURATION FILES

       By default, Podget relies on two configuration files.

       podgetrc
              This file holds most of the options that control how Podget runs.

              If you need to run podget with different options for certain feeds, you can create
              additional configuration files and select them with the --config or -c option.
              When this option is given a filename that does not exist yet, the file is created
              with default options that can then be customized as necessary.

       serverlist
              This is a file of all the feeds that Podget should monitor and download from.

              If you need to separate your feeds into multiple lists, you can create additional
              files with the --serverlist option.  When this option is given a filename that
              does not exist yet, the file is created with a default list containing a single
              feed.  Whenever a new list is created, Podget downloads a single item from that
              default feed to verify that everything is working.

              For a description of the options available for  this  file,  please  refer  to  the
              SERVER LIST CONFIGURATION section of this document.

   USER CONFIGURATION DIRECTORY
       The  first  time  a  user  runs podget, it will create a configuration directory.  In this
       directory, it will install the default configuration files.

       Where this configuration directory is automatically placed is dependent upon  the  version
       of Podget that you used when you first ran it.

       For version 0.8.10 and before:
              $HOME/.podget

       For later versions:
              If $XDG_CONFIG_HOME is set, it will be placed in: $XDG_CONFIG_HOME/podget
              If it is unset, it will be placed in: $HOME/.config/podget

       A user who wants to tidy their $HOME directory can move an existing configuration
       directory to either of the new locations, but must remember to remove the leading period
       from its name so that it is no longer a hidden directory.
              Example:  mv $HOME/.podget $HOME/.config/podget

       These locations can be overridden by the use of the --dir_config or -d option when you run
       podget.

   WHICH CONFIGURATION DIRECTORY IS USED
       Because there are at least three possible locations for the configuration directory, it is
       necessary to know which one podget will use.  To keep things simple, Podget tests them in
       the following order and uses the first one it finds:

         1.  $HOME/.podget
         2.  $XDG_CONFIG_HOME/podget
         3.  $HOME/.config/podget

       This location testing is skipped when the --dir_config or -d option is used.
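       The lookup order above can be sketched as a small shell function.  This is only an
       illustration: the function name find_podget_config_dir is hypothetical and not taken
       from Podget's source, and the fallback for a first run simply reports the location
       described under USER CONFIGURATION DIRECTORY.

```shell
# Hypothetical helper (not part of Podget): print the configuration
# directory that the documented lookup order would select.
find_podget_config_dir() {
    local candidate
    # Optional argument: a directory given with -d / --dir_config,
    # which skips the location testing entirely.
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
        return 0
    fi
    # Test the three locations in order; the XDG entry is only
    # included when $XDG_CONFIG_HOME is set and non-empty.
    for candidate in "$HOME/.podget" \
                     ${XDG_CONFIG_HOME:+"$XDG_CONFIG_HOME/podget"} \
                     "$HOME/.config/podget"; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # None exists yet: report where a first run of a current
    # version would create it.
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/podget"
}
```

       For example, if only $HOME/.config/podget exists and $XDG_CONFIG_HOME is unset, calling
       find_podget_config_dir with no argument prints that path.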

   AUTOMATIC CLEANUP
       You can enable automatic cleanup with every run by configuring it in your  podgetrc  file.
       Simply set the following options:

         # Autocleanup.
         # 0 == disabled
         # 1 == delete any old content
         cleanup=1

         # Number of days to keep files.   Cleanup will remove anything
         # older than this.
         cleanup_days=7

       However, some people prefer to run cleanup as a separate cron session. To do that, set the
       options in podgetrc to:

         # Autocleanup.
         # 0 == disabled
         # 1 == delete any old content
         cleanup=0

         # Number of days to keep files.   Cleanup will remove anything
         # older than this.
         cleanup_days=7

       Then add something similar to this example to your crontab:

         # Once a week on Sunday at 04:07AM
         07 04 * * Sun /usr/bin/podget -C
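       To preview which files a given cleanup_days value might affect, a find(1) sketch like the
       one below can help.  It assumes that "older than N days" refers to each file's
       modification time and that every regular file under the library directory is eligible;
       the function name is hypothetical, and Podget's own cleanup (and --cleanup_simulate) may
       select files differently.

```shell
# Hypothetical helper (not part of Podget): list regular files under a
# library directory that were last modified more than DAYS days ago.
list_cleanup_candidates() {
    local library="$1" days="$2"
    # find -mtime +N matches files whose age, rounded down to whole
    # days, is greater than N (so +7 means at least 8 full days old).
    find "$library" -type f -mtime +"$days" -print | sort
}
```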

   MULTIPLE CONCURRENT SESSIONS
       When it starts, Podget checks whether any sessions using the same core configuration file
       are already running and exits if it finds one.  This ensures that long-running sessions
       are not interrupted by new ones.

       If you have feeds that require distinct configurations, then you can enable  them  to  run
       simultaneously  by  using  separate  configuration  files  for  each.   Then  if  you have
       sufficient bandwidth, you can call them all at the same time.

       Example Crontab configuration:

         00 02 * * * /usr/bin/podget -c podgetrc-group1
         00 02 * * * /usr/bin/podget -c podgetrc-group2

   SEQUENTIAL SESSIONS
       Sometimes you have feed lists that use the same configuration but that you wish to keep
       separate.  There are two ways to handle this.

       First, run them separately from crontab with sufficient time in between so they don't
       interfere with each other.

         00 02 * * * /usr/bin/podget --serverlist RSS-Feeds
         00 03 * * * /usr/bin/podget --serverlist ATOM-Feeds

       The second option is to place them in a shell script, so that they are called
       sequentially and do not interfere with each other, and then add that script to your
       crontab.

         #!/usr/bin/env bash
         /usr/bin/podget --serverlist RSS-Feeds
         /usr/bin/podget --serverlist ATOM-Feeds

   ENABLING DEBUG OUTPUT
       Debug output can be enabled in two ways.

       The first way is by uncommenting the DEBUG option in your podgetrc and setting it to '1'.
       However, this way does not enable DEBUG until podgetrc is finally read, after just over
       1400 lines of the script have already run.  This is sufficient for most issues.

       The second way is from the command-line and enables debug as early as possible.

       Simply execute podget like so:

         $ DEBUG=1 podget -vvvv

       You can enable other options as well if you need to, but for debugging purposes it is
       highly recommended that you enable as much verbosity as possible.

   SERVER LIST CONFIGURATION
       By default, Podget uses serverlist as the list of servers to contact.  However, you can
       change this name with the config_serverlist variable in your podgetrc file.

       Feeds are listed one per line in the serverlist file.

       Default format with category and name:
              <url> <category> <name>

       Alternate Formats:
       1. With a category but no name.
              <url> <category>
       2. With a name but no category (2 ways).
              <url> No_Category <name>
              <url> . <name>
       3. With neither a category nor a name.
              <url>
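       The formats above can be illustrated with a small parser that splits one line into its
       three fields.  This is a hypothetical sketch, not Podget's parser: it treats the first
       two whitespace-separated words as the URL and category and everything after them as the
       name, maps "." and No_Category to an empty category, and does not handle the trailing
       USER:/PASS:, OPT_ or ATOM_FILTER options described below.

```shell
# Hypothetical helper: split a serverlist line into URL, category and name.
parse_serverlist_line() {
    local url category name
    # read assigns the first word to url, the second to category and the
    # rest of the line (inner spaces included) to name.
    read -r url category name <<EOF
$1
EOF
    # A single period or No_Category both mean "no category".
    case "$category" in
        .|No_Category) category="" ;;
    esac
    printf 'url=%s\ncategory=%s\nname=%s\n' "$url" "$category" "$name"
}
```

       For example, parse_serverlist_line 'http://www.lugradio.org/episodes.rss Linux LUG Radio'
       prints url=http://www.lugradio.org/episodes.rss, category=Linux and name=LUG Radio on
       three lines.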

       1. URL Rules:
              A. Any spaces in the URL need to be converted to %20
       2. Category Rules:
              A. Must be one word without spaces.
              B. You may use underscores and dashes.
              C. You can insert date substitutions.
                     %YY%  ==  Year
                     %MM%  ==  Month
                     %DD%  ==  Day
              D. Category disabling:
                     -  With  a  name,  the  category  must  either  be  a  single  period (.) or
                     'No_Category'.
                     - If the name is blank, the category can also be blank.
       3. Name Rules:
              A. If you are creating ASX playlists, make sure the feed name does not contain any
              spaces and is not blank.
              B.  You  can  leave  the  feed  name blank, and files will be saved in the category
              directory.
              C. Names with spaces are only compatible with filesystems that allow for spaces  in
              filenames.   For  example, spaces in feed names are OK for feeds saved to Linux ext
              partitions but are not OK for those saved to Microsoft FAT partitions.
              D. Feed names can be disabled by leaving them blank.
       4. Disable the downloading of any feed by commenting it out with a leading #.
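       Several of these rules are mechanical and can be sketched in shell.  The helpers below
       are hypothetical illustrations, not code from Podget; in particular, expanding %YY% to
       a four-digit year is an assumption.

```shell
# Rule 1A: write any spaces in a URL as %20.
encode_url_spaces() {
    printf '%s\n' "$1" | sed 's/ /%20/g'
}

# Rule 2C: expand the date placeholders in a category name.
# Assumption: %YY% becomes a four-digit year (date +%Y).
expand_category_dates() {
    printf '%s\n' "$1" | sed -e "s/%YY%/$(date +%Y)/g" \
                             -e "s/%MM%/$(date +%m)/g" \
                             -e "s/%DD%/$(date +%d)/g"
}

# Rule 4: print only active feeds, skipping blank lines and lines
# commented out with a leading # (optionally preceded by whitespace).
active_feeds() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}
```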

       Example:
        http://www.lugradio.org/episodes.rss Linux LUG Radio

       Example with date substitution in the category and a blank feed name:
        http://downloads.bbc.co.uk/rmhttp/downloadtrial/worldservice/summary/rss.xml News-%YY%-%MM%-%DD%

       Example of two ways to do a feed with authentication:
        http://somesite.com/feed.rss CATEGORY Feed Name USER:username PASS:password
        http://username:password@somesite.com/feed.rss CATEGORY Feed Name

              NOTE:  The  second  method  will  fail  if  a  colon (:) is part of the username or
              password.  Both methods will fail if a space is part of the username or password.

       Common Options:

       OPT_CONTENT_DISPOSITION
              Attempt to get the filename from the Content-Disposition header that is part of the
              output of wget --server-response.
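              As an illustration of what this option looks for, the sketch below pulls a
              filename out of a Content-Disposition header in text shaped like wget's
              --server-response output.  It is a hypothetical helper rather than Podget's
              implementation, and it only handles the plain filename= form (quoted or
              unquoted), not the filename*= form.

```shell
# Hypothetical helper: read `wget --server-response` output on stdin and
# print the filename from the last Content-Disposition header.
filename_from_disposition() {
    # Keep only the last Content-Disposition header (e.g. after redirects),
    # then try filename="..." first and fall back to an unquoted filename=.
    grep -i '^[[:space:]]*Content-Disposition:' \
        | tail -n 1 \
        | sed -n -e 's/.*filename="\([^"]*\)".*/\1/Ip' \
                 -e 't' \
                 -e 's/.*filename=\([^;[:space:]]*\).*/\1/Ip'
}
```

              For example, with a hypothetical episode URL:
              wget --server-response --spider http://somesite.com/episode 2>&1 | filename_from_disposition
              (wget writes the server response to standard error, hence the 2>&1).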

       OPT_DISPOSITION_FAIL
              This  option works in conjunction with OPT_CONTENT_DISPOSITION by removing any URLs
              that fail to receive a filename from the COMPLETED log.  This  allows  them  to  be
              automatically  retried  the next time a session runs.  If this option is added to a
              feed that has already been downloaded then the user will need to  remove  the  URLs
              for the problematic files from the COMPLETED log manually.  On one feed, this option
              reduced filename problems from approximately 15% to under 2% over the course of 6
              sessions.  Those sessions can run sequentially on one day or as part of your
              established cron rotation.

       OPT_FEED_ORDER_ASCENDING
              By default, Podget assumes that items in a feed  will  be  listed  from  newest  to
              oldest  (descending  order).  This option will modify Podget's handling of the feed
              for those that are listed from oldest to newest.  This option  will  not  have  any
              noticeable effect for feeds where you want to download every item.  It will have an
              effect for new feeds when combined with the --recent <COUNT> option.
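              The effect of feed order on --recent can be sketched as follows.  The helper is
              hypothetical and assumes the enclosure URLs have already been extracted from the
              feed, one per line, in the order they appear.

```shell
# Hypothetical helper: given enclosure URLs in feed order on stdin, print
# the COUNT newest ones, newest first.
newest_items() {
    local count="$1" order="${2:-descending}"
    if [ "$order" = ascending ]; then
        # Oldest-to-newest feed: the newest items are at the end,
        # so take the last COUNT lines and reverse them (GNU tac).
        tail -n "$count" | tac
    else
        # Default newest-to-oldest feed: the newest items are at the top.
        head -n "$count"
    fi
}
```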

       OPT_FEED_PLAYLIST_NEWFIRST
              Most playlist options create lists of just the new items that are downloaded in the
              current  session.   This  option  creates  or updates a full playlist for all items
              available for a feed sorted  from  newest  to  oldest  based  on  the  modification
              date/time of the file.

       OPT_FEED_PLAYLIST_OLDFIRST
              Same  as  OPT_FEED_PLAYLIST_NEWFIRST  except  playlist  is  ordered  from oldest to
              newest.
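              A full playlist ordered by modification time, as these two options describe,
              could be approximated with GNU find and sort.  This is a hypothetical sketch: it
              assumes one directory per feed, skips existing .m3u files and writes one path per
              line; Podget's own playlist format and file selection may differ.

```shell
# Hypothetical helper: print every file in a feed directory as one
# playlist line, newest first (default) or oldest first ("oldfirst").
feed_playlist() {
    local dir="$1" order="${2:-newfirst}" sort_flags="-rn"
    if [ "$order" = oldfirst ]; then
        sort_flags="-n"
    fi
    # %T@ is the modification time in seconds since the epoch and %p the
    # path; sort numerically on that time, then drop the time column.
    find "$dir" -maxdepth 1 -type f ! -name '*.m3u' -printf '%T@ %p\n' \
        | sort $sort_flags \
        | cut -d' ' -f2-
}
```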

       OPT_FILENAME_LOCATION
              Some feeds do not list the detailed filename in the feed but instead rename the file
              on redirection.  This option addresses that issue by attempting to grab the filename
              from the last 'Location:' header in the output of 'wget --server-response'.
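              The idea can be sketched as a filter over wget's --server-response output that
              keeps the last Location: header and reduces it to its final path component.  This
              hypothetical helper is not Podget's code; it also strips any query string and the
              " [following]" note that wget prints after a redirect.

```shell
# Hypothetical helper: read `wget --server-response` output on stdin and
# print the filename from the last Location: header.
filename_from_location() {
    # sed steps: drop the header name, drop anything after the URL
    # (such as " [following]"), drop a query string or fragment, then
    # keep only the text after the last slash.
    grep -i '^[[:space:]]*Location:' \
        | tail -n 1 \
        | sed -e 's/^[[:space:]]*Location:[[:space:]]*//I' \
              -e 's/[[:space:]].*$//' \
              -e 's/[?#].*$//' \
              -e 's|.*/||'
}
```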

       OPT_FILENAME_RENAME_MDATE
              This option is for feeds that use the same filename for every item, with each
              item identified only by a long, somewhat incomprehensible string in the URL.  Such
              feeds were previously fixed with FILENAME_FORMATFIX4, which appended that string to
              the common filename to produce a unique filename for each item.  However, the
              resulting filenames were not very easy to understand.  This option offers another
              way to deal with these common filenames: it adds the date of the file's last change
              (modification date) as a prefix to the filename in the format
              YYYYMMDD_HHhMMm_<common-part>.  This makes the filenames sortable and gives the user
              something that makes a moderate amount of sense.  It does not work for all feeds:
              for some feeds, the last modification time of each file is the time of download.
              That may be acceptable in some situations but can cause confusion when more than
              one item is downloaded at a time from a feed.
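              The naming scheme can be sketched with GNU date, whose -r option reports a file's
              last modification time.  The helper below is hypothetical, only prints the new
              name rather than renaming anything, and assumes times are shown in the local time
              zone.

```shell
# Hypothetical helper: print FILE's name prefixed with its modification
# time in the YYYYMMDD_HHhMMm_ format described above.
mdate_prefixed_name() {
    local file="$1"
    # GNU date -r FILE formats FILE's last modification time.
    printf '%s%s\n' "$(date -r "$file" +%Y%m%d_%Hh%Mm_)" "$(basename "$file")"
}
```

              To actually rename a file, something like
              mv -- "$f" "$(dirname -- "$f")/$(mdate_prefixed_name "$f")" would work.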

       OPT_WGET_DEFUSERAGENT
              Configure Wget to use its default user-agent (normally formatted similar to
              "Wget/1.21.2") instead of either Podget's default user-agent ("Podget") or a custom
              agent set in WGET_BASEOPTS in podgetrc.

       OPT_NO_CERT_CHECK
              Disable wget SSL certificate verification.  This is commonly used for feeds that
              are served with self-signed certificates.

       OPT_PREFER_IPv4 or OPT_PREFER_IPv6
              Configure wget so that, when a DNS lookup returns several addresses, it connects to
              addresses of the specified family first.
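              For reference, these options correspond in spirit to real wget flags:
              --no-check-certificate and --prefer-family=IPv4 or --prefer-family=IPv6.  The
              mapping below is a hypothetical sketch of that correspondence; it is not read from
              Podget's source, and Podget may pass additional or different arguments.

```shell
# Hypothetical mapping from serverlist options to wget flags, one flag per
# line.  Options with no single equivalent wget flag are ignored here.
wget_flags_for_options() {
    local opt
    for opt in "$@"; do
        case "$opt" in
            OPT_NO_CERT_CHECK) printf '%s\n' --no-check-certificate ;;
            OPT_PREFER_IPv4)   printf '%s\n' --prefer-family=IPv4 ;;
            OPT_PREFER_IPv6)   printf '%s\n' --prefer-family=IPv6 ;;
        esac
    done
}
```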

       Examples:
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_PREFER_IPv4
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_PREFER_IPv6
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_WGET_DEFUSERAGENT
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_NO_CERT_CHECK
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_CONTENT_DISPOSITION
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_CONTENT_DISPOSITION OPT_DISPOSITION_FAIL
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_LOCATION
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_MDATE
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_LOCATION OPT_FILENAME_RENAME_MDATE
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FEED_ORDER_ASCENDING
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FEED_PLAYLIST_NEWFIRST
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FEED_PLAYLIST_OLDFIRST

       RSS Feed Options:
              There are three options for RSS Feeds that are not supported for ATOM feeds.

              The first two rename the downloaded files using the contents of the <TITLE> tag
              from the feed, and the third expands the set of tags that Podget gets content from.

       OPT_FILENAME_RENAME_TITLETAG
              This first version is for handling feeds that place  the  <TITLE>  tag  before  the
              <ENCLOSURE>  tag.   The  majority of tested feeds that use <TITLE> tags follow this
              order.

       OPT_FILENAME_RENAME_REVTITLETAG
              The second version is for handling  feeds  that  have  the  <ENCLOSURE>  tag  first
              followed by the <TITLE> tag.

       OPT_RSS_MEDIACONTENT
              This  third option will enable Podget to download content from <MEDIA:CONTENT> tags
              in addition to <ENCLOSURE> tags.

       Examples:
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_TITLETAG
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_TITLETAG OPT_FILENAME_RENAME_MDATE
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_REVTITLETAG
        http://somesite.com/feed.rss CATEGORY Feed Name OPT_RSS_MEDIACONTENT

       To  determine if the feed uses <TITLE> tags and in which order, run the following with the
       URL for the feed:

               wget -q -O - http://somesite.com/feed.rss | sed -n -e :a \
                   -e 's/.*<enclosure.*url\s*=\s*"\([^"]\+\)".*/URL \1/Ip' -e t \
                   -e "s/.*<enclosure.*url\s*=\s*'\([^']\+\)'.*/URL \1/Ip" -e t \
                   -e 's/.*<title>\(.*\)<[/]title>.*$/TITLE \1/Ip' -e t \
                   -e '/\(<enclosure\|<title>\).*/I{N;s/\n */ /;T;ba;}'

       This will produce a list of lines that start with either TITLE or URL.  The URL comes from
       the <ENCLOSURE> tag and the TITLE from the <TITLE> tag.  On many feeds, the first thing
       you will notice is a few uses of the <TITLE> tag before the first URL is specified.  In
       that case, Podget uses the last TITLE found, so the earlier ones are discarded.  The
       important part comes at the first URL: from there, we need to determine whether the title
       for that item came before or after the URL.  If it comes first, use
       OPT_FILENAME_RENAME_TITLETAG.  If the title comes second, use
       OPT_FILENAME_RENAME_REVTITLETAG.
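       To see what the sed program produces without fetching a real feed, it can be wrapped in
       a function and fed a small, made-up RSS snippet.  The function name and the sample feed
       are hypothetical illustrations; the sed expressions are the ones from the command above
       and rely on GNU sed extensions (\s, \+, \|, the I flag and the T command).

```shell
# Hypothetical wrapper around the sed program shown above; reads a feed on
# stdin and prints TITLE/URL lines in the order they appear.
list_titles_and_urls() {
    sed -n -e :a \
        -e 's/.*<enclosure.*url\s*=\s*"\([^"]\+\)".*/URL \1/Ip' -e t \
        -e "s/.*<enclosure.*url\s*=\s*'\([^']\+\)'.*/URL \1/Ip" -e t \
        -e 's/.*<title>\(.*\)<[/]title>.*$/TITLE \1/Ip' -e t \
        -e '/\(<enclosure\|<title>\).*/I{N;s/\n */ /;T;ba;}'
}

# Made-up feed in which the item's <title> comes before its <enclosure>,
# i.e. a candidate for OPT_FILENAME_RENAME_TITLETAG.
sample_feed='<rss><channel>
<title>Example Show</title>
<item>
<title>Episode 1</title>
<enclosure url="http://example.com/ep1.mp3" type="audio/mpeg"/>
</item>
</channel></rss>'

printf '%s\n' "$sample_feed" | list_titles_and_urls
```

       Here the channel-level "TITLE Example Show" is discarded, and "TITLE Episode 1" appears
       directly before its URL, which is the TITLETAG order.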

       On  some  feeds,  the downloaded filename will not have anything identifiable to determine
       which TITLE goes with it.  In those cases it may be necessary to download a few items  and
       listen to them to determine which order they use.

       On  some  feeds, it will be discovered that the downloaded filename and the TITLE are very
       similar.  In those cases, it is left to the user to determine which they prefer.

       On some feeds, the TITLE will have very little to specify when it was recorded and it  may
       be  useful  to use the OPT_FILENAME_RENAME_MDATE option to add a date tag to each filename
       as it is converted.

       And on some feeds, there will be a complete absence of TITLE lines.  Those  feeds  do  not
       use the tag so using either option will not produce any changes.

       Atom Feed Options:
              The following options are available for advanced handling of Atom feeds.

       ATOM_FILTER_SIMPLE
              This option will enable filtering for just audio or video files from a feed.

       ATOM_FILTER_TYPE="type"
              This option allows more detailed filtering of the variety of types available.  This
              can limit the files downloaded to one type (example:  "audio/mpeg")  or  to  a  few
              types (example: "(audio|video)/.*" for all audio and video types, OR "audio/.*" for
              all audio types).

       ATOM_FILTER_LANG="language"
              If an Atom feed supports multiple languages for enclosures, then you can use this
              option to filter to only those you desire.  You can limit to one language (example:
              "en" for just English) or combine several supported languages to get them all
              (example: "(en|es|fr)" to download files in English, Spanish and French).  How the
              languages are defined may vary from feed to feed.
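              Judging by the examples, the filter values read like extended regular
              expressions.  The sketch below shows how such patterns would select enclosures
              under that assumption; it is hypothetical, not Podget's code, and anchoring the
              pattern to the whole type or language is also an assumption.

```shell
# Hypothetical helper: succeed if an enclosure's MIME type and language
# match the given filter patterns.  Both patterns are treated as POSIX
# extended regular expressions anchored to the whole value; an omitted
# pattern matches anything.
passes_atom_filters() {
    local type="$1" lang="$2" type_re="${3:-.*}" lang_re="${4:-.*}"
    printf '%s\n' "$type" | grep -Eq "^($type_re)\$" &&
        printf '%s\n' "$lang" | grep -Eq "^($lang_re)\$"
}
```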

       Note:  If you do not enable any of  the  ATOM_FILTER  options  on  a  feed  with  multiple
       enclosures  per  item, when you run podget it will tell you the count per type or language
       to help you decide if you should enable the filters to reduce the number of  files  to  be
       downloaded.
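       A similar overview can be produced by hand with awk.  This is a hypothetical sketch: it
       assumes the enclosures have already been extracted to one "TYPE LANG URL" line each, and
       its output format is not the one Podget prints.

```shell
# Hypothetical helper: count enclosures per type (field 1, the default)
# or per language (field 2) from "TYPE LANG URL" lines on stdin.
count_enclosures_by() {
    local field="${1:-1}"
    # awk's for-in order is unspecified, so sort the result by key.
    awk -v f="$field" '{ n[$f]++ } END { for (k in n) print n[k], k }' | sort -k2
}
```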

       Examples:
        http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_SIMPLE
        http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_TYPE="audio/mpeg"
        http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_TYPE="(audio|video)/.*"
        http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_LANG="en"
        http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_LANG="(en|es|fr)"
        http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_TYPE="audio/mpeg" ATOM_FILTER_LANG="en"

   HANDLING UTF-16 FEEDS
       Some servers provide their feeds in UTF-16 format rather than the more common UTF-8.

       To automatically convert these files, create a secondary serverlist in your  configuration
       directory:

               serverlist.utf16

       If you changed the name of the serverlist with config_serverlist, remember to change the
       name of this file to match.
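       To inspect such a feed manually, it can be converted with iconv(1).  This is a
       hypothetical sketch, not a description of how Podget performs its own conversion; it
       relies on the feed starting with a byte-order mark so that iconv can tell which byte
       order is in use.

```shell
# Hypothetical helper: convert UTF-16 text (with a byte-order mark) from
# stdin, or from the named files, to UTF-8 on stdout.
utf16_feed_to_utf8() {
    iconv -f UTF-16 -t UTF-8 "$@"
}
```

       For example: wget -q -O - http://somesite.com/feed.rss | utf16_feed_to_utf8 | less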

EXAMPLE CRON JOB

       Once podget is running correctly, it's most useful if you run it from a cron job  so  that
       the  new  episodes are available to play or load onto a portable player and you don't have
       to wait for them to download.

       To edit your crontab, do:

         $ crontab -e

       Then add one line similar to this example:

         15 04 * * * /usr/bin/podget -s

       This will run podget at 4:15 AM every day.

       In some cases, you might need to add a few directories  to  your  PATH  variable  so  that
       Podget can find everything it needs.

       Then the job might look like:

         15 04 * * * PATH=/opt/local/bin:/usr/local/bin:$PATH /usr/bin/podget -s

AUTHORS

       Dave Vehrs

                                         10 February 2023                               podget(7)