Provided by: python-podcastparser-doc_0.6.10-2_all

NAME

       podcastparser - podcastparser Documentation

       podcastparser is a simple and fast podcast feed parser library in Python.  The two primary
       users of the library are the gPodder Podcast Client and the gpodder.net web service.

       The following feed types are supported:

       • Really Simple Syndication (RSS 2.0)

       • Atom Syndication Format (RFC 4287)

       The following specifications are supported:

       • Paged Feeds (RFC 5005)

       • Podlove Simple Chapters

       • Podcast Index Podcast Namespace

       These formats only specify the possible markup elements and attributes. We recommend  that
       you  also read the Podcast Feed Best Practice guide if you want to optimize your feeds for
       best display in podcast clients.

       Where times and durations are used, the values are expected  to  be  formatted  either  as
       seconds or as RFC 2326 Normal Play Time (NPT).

          import podcastparser
          import urllib.request

          feedurl = 'http://example.com/feed.xml'

          parsed = podcastparser.parse(feedurl, urllib.request.urlopen(feedurl))

          # parsed is a dict
          import pprint
          pprint.pprint(parsed)

       For both RSS and Atom feeds, only a subset of elements (those that are relevant to podcast
       client applications) is parsed. This section describes which elements and  attributes  are
       parsed and how the contents are interpreted/used.

RSS

       rss@xml:base
              Base URL for all relative links in the RSS file.

       rss/channel
              Podcast.

       rss/channel/title
              Podcast title (whitespace is squashed).

       rss/channel/link
              Podcast website.

       rss/channel/description
              Podcast description (whitespace is squashed).

       rss/channel/image/url
              Podcast cover art.

       rss/channel/itunes:image
              Podcast cover art (alternative).

       rss/channel/itunes:type
              Podcast type (whitespace is squashed).  One of ‘episodic’ or ‘serial’.

       rss/channel/itunes:keywords
              Podcast keywords (whitespace is squashed).

       rss/channel/atom:link@rel=payment
              Podcast payment URL (e.g. Flattr).

       rss/channel/generator
              A  string  indicating the program used to generate the channel. (e.g. MightyInHouse
              Content System v2.3).

       rss/channel/language
              Podcast language.

       rss/channel/itunes:author
              The group responsible for creating the show.

       rss/channel/itunes:owner
              The podcast owner contact information.  The <itunes:owner> tag information  is  for
              administrative  communication  about  the  podcast  and  isn’t  displayed  in Apple
              Podcasts.

       rss/channel/itunes:category
              The show category information.

       rss/channel/itunes:explicit
              Indicates whether podcast contains explicit material.

       rss/channel/itunes:new-feed-url
              The new podcast RSS Feed URL.

       rss/channel/podcast:locked
              Whether the podcast is currently locked against being transferred.

       rss/channel/podcast:funding
              Funding link for podcast.

       rss/redirect/newLocation
              The new podcast RSS Feed URL.

       rss/channel/item
              Episode.

       rss/channel/item/guid
              Episode unique identifier (GUID), mandatory.

       rss/channel/item/title
              Episode title (whitespace is squashed).

       rss/channel/item/link
              Episode website.

       rss/channel/item/description
              Episode description.  If it contains HTML, it is returned as description_html.
              Otherwise it is returned as description (whitespace is squashed).  See Mozilla’s
              article Why RSS Content Module is Popular.

       rss/channel/item/itunes:summary
              Episode description (whitespace is squashed).

       rss/channel/item/itunes:subtitle
              Episode subtitle / one-line description (whitespace is squashed).

       rss/channel/item/content:encoded
              Episode description in HTML.  Best source for description_html.

       rss/channel/item/itunes:duration
              Episode duration.

       rss/channel/item/pubDate
              Episode publication date.

       rss/channel/item/atom:link@rel=payment
              Episode payment URL (e.g. Flattr).

       rss/channel/item/atom:link@rel=enclosure
              File download URL (@href), size (@length) and mime type (@type).

       rss/channel/item/itunes:image
              Episode art URL.

       rss/channel/item/media:thumbnail
              Episode art URL.

       rss/channel/item/media:group/media:thumbnail
              Episode art URL.

       rss/channel/item/media:content
              File download URL (@url), size (@fileSize) and mime type (@type).

       rss/channel/item/media:group/media:content
              File download URL (@url), size (@fileSize) and mime type (@type).

       rss/channel/item/enclosure
              File download URL (@url), size (@length) and mime type (@type).

       rss/channel/item/psc:chapters
              Podlove Simple Chapters, version 1.1 and 1.2.

       rss/channel/item/psc:chapters/psc:chapter
              Chapter entry (@start, @title, @href and @image).

       rss/channel/item/itunes:explicit
              Indicates whether episode contains explicit material.

       rss/channel/item/itunes:author
              The group responsible for creating the episode.

       rss/channel/item/itunes:season
              The season number of the episode.

       rss/channel/item/itunes:episode
              An episode number.

       rss/channel/item/itunes:episodeType
              The episode type.  This flag is used if an episode is a trailer or bonus content.

       rss/channel/item/podcast:chapters
              The URL of a JSON file describing the chapters.  Only the URL is added to the data,
              as fetching an external URL would be unsafe.

       rss/channel/item/podcast:person
              A person involved in the episode, e.g. a host or guest.

       rss/channel/item/podcast:transcript
              The URL of the transcript file associated with this episode.

ATOM

       For Atom feeds, podcastparser will handle the following elements and attributes:

       atom:feed
              Podcast.

       atom:feed/atom:title
              Podcast title (whitespace is squashed).

       atom:feed/atom:subtitle
              Podcast description (whitespace is squashed).

       atom:feed/atom:icon
              Podcast cover art.

       atom:feed/atom:link@href
              Podcast website.

       atom:feed/atom:entry
              Episode.

       atom:feed/atom:entry/atom:id
              Episode unique identifier (GUID), mandatory.

       atom:feed/atom:entry/atom:title
              Episode title (whitespace is squashed).

       atom:feed/atom:entry/atom:link@rel=enclosure
              File download URL (@href), size (@length) and mime type (@type).

       atom:feed/atom:entry/atom:link@rel=(self|alternate)
              Episode website.

       atom:feed/atom:entry/atom:link@rel=payment
              Episode payment URL (e.g. Flattr).

       atom:feed/atom:entry/atom:content
              Episode description (in HTML or plaintext).

       atom:feed/atom:entry/atom:published
              Episode publication date.

       atom:feed/atom:entry/media:thumbnail
              Episode art URL.

       atom:feed/atom:entry/media:group/media:thumbnail
              Episode art URL.

       atom:feed/atom:entry/psc:chapters
              Podlove Simple Chapters, version 1.1 and 1.2.

       atom:feed/atom:entry/psc:chapters/psc:chapter
              Chapter entry (@start, @title, @href and @image).

       Simplified, fast RSS parser

       exception podcastparser.FeedParseError(msg, exception, locator)
              Exception raised when asked to parse an invalid feed

              This  exception  allows users of this library to catch exceptions without having to
              import the XML parsing library themselves.

       class podcastparser.PodcastHandler(url, max_episodes)

              characters(chars)
                     Receive notification of character data.

                     The Parser will call this method to report each chunk of character data. SAX
                     parsers  may return all contiguous character data in a single chunk, or they
                     may split it into several chunks; however, all  of  the  characters  in  any
                     single  event  must  come  from the same external entity so that the Locator
                     provides useful information.

              endElement(name)
                     Signals the end of an element in non-namespace mode.

                     The name parameter contains the name of the element type, just as  with  the
                     startElement event.

              startElement(name, attrs)
                     Signals the start of an element in non-namespace mode.

                     The  name  parameter  contains the raw XML 1.0 name of the element type as a
                     string and the attrs parameter holds an instance  of  the  Attributes  class
                     containing the attributes of the element.
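
              The three callbacks above follow the standard SAX ContentHandler interface. As a
              minimal sketch of how such a handler cooperates with a parser, the following uses
              only the standard library; the TitleCollector class and the FEED sample are
              hypothetical illustrations, not part of podcastparser:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Hypothetical handler: collects the text of each item/title element."""

    def __init__(self):
        super().__init__()
        self.path = []      # stack of currently open element names
        self.buffer = None  # list of text chunks while inside item/title
        self.titles = []

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path[-2:] == ['item', 'title']:
            self.buffer = []

    def characters(self, chars):
        # Text may arrive in several chunks, so accumulate instead of assigning.
        if self.buffer is not None:
            self.buffer.append(chars)

    def endElement(self, name):
        if self.buffer is not None and self.path[-2:] == ['item', 'title']:
            self.titles.append(''.join(self.buffer).strip())
            self.buffer = None
        self.path.pop()


# Made-up sample feed, used only for this illustration.
FEED = b"""<rss version="2.0"><channel><title>Show</title>
<item><title>Episode 1</title></item>
<item><title>Episode 2</title></item>
</channel></rss>"""

handler = TitleCollector()
xml.sax.parseString(FEED, handler)
print(handler.titles)
```

              The channel title is skipped because its parent element is channel, not item.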

       class podcastparser.RSSItemDescription
              RSS 2.0 almost encourages putting HTML content in item/description, but
              content:encoded is the better source of HTML content, and itunes:summary is known
              to contain the short textual description of the item.  A heuristic is therefore
              used to attribute text to either description or description_html, without
              overriding existing values.
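
              The heuristic described above can be sketched roughly as follows. This is an
              illustration only: the attribute_description helper, the episode dict and the
              regular expression are assumptions made for this example and do not reproduce the
              class's actual code.

```python
import re

# Assumed heuristic: text counts as HTML if it contains something shaped like an open tag.
_OPEN_TAG = re.compile(r'<[a-zA-Z][a-zA-Z0-9]*(\s[^>]*)?/?>')


def attribute_description(episode, text):
    """Store text under 'description' or 'description_html' without overriding."""
    key = 'description_html' if _OPEN_TAG.search(text) else 'description'
    if not episode.get(key):
        # Only fill the slot if no earlier (better) source has already set it.
        episode[key] = text
    return episode


episode = {}
attribute_description(episode, '<p>Show notes</p>')
attribute_description(episode, 'Plain summary')
attribute_description(episode, '<b>Later HTML</b>')  # ignored: slot already filled
print(episode)
```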

       podcastparser.file_basename_no_extension(filename)
              Returns filename without extension

              >>> file_basename_no_extension('/home/me/file.txt')
              'file'

              >>> file_basename_no_extension('file')
              'file'

       podcastparser.is_html(text)
              Heuristically tell if text is HTML

              By looking for an open tag (more or less):

              >>> is_html('<h1>HELLO</h1>')
              True

              >>> is_html('a < b < c')
              False
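
              One way to express an "open tag" test is a regular expression that requires a
              letter directly after the angle bracket. The sketch below is an assumption about
              how such a check could look; looks_like_html and its pattern are hypothetical and
              may differ from the pattern is_html actually uses.

```python
import re

# "<", a tag name starting with a letter, optional attributes, optional "/", then ">".
_OPEN_TAG = re.compile(r'<[a-zA-Z][a-zA-Z0-9]*(\s[^>]*)?/?>')


def looks_like_html(text):
    """Return True if text contains something shaped like an opening tag."""
    return bool(_OPEN_TAG.search(text))


print(looks_like_html('<h1>HELLO</h1>'))
print(looks_like_html('a < b < c'))
```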

       podcastparser.normalize_feed_url(url)
              Normalize  and  convert  a  URL. If the URL cannot be converted (invalid or unknown
              scheme), None is returned.

              This will also normalize feed:// and itpc:// to http://.

              >>> normalize_feed_url('itpc://example.org/podcast.rss')
              'http://example.org/podcast.rss'

              If no URL scheme is defined (e.g. “curry.com”), we  will  simply  assume  the  user
              intends to add a http:// feed.

              >>> normalize_feed_url('curry.com')
              'http://curry.com/'

              It  will  also  take  care  of converting the domain name to all-lowercase (because
              domains are not case sensitive):

              >>> normalize_feed_url('http://Example.COM/')
              'http://example.com/'

              Some other minimalistic changes are also taken care of, e.g.  a  ?  with  an  empty
              query is removed:

              >>> normalize_feed_url('http://example.org/test?')
              'http://example.org/test'

              Leading and trailing whitespace is removed:

              >>> normalize_feed_url(' http://example.com/podcast.rss ')
              'http://example.com/podcast.rss'

              Incomplete (too short) URLs are not accepted:

              >>> normalize_feed_url('http://') is None
              True

              Unknown protocols are not accepted:

              >>> normalize_feed_url('gopher://gopher.hprc.utoronto.ca/file.txt') is None
              True
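
              The rules listed above can be approximated with urllib.parse from the standard
              library. The following is a sketch under stated assumptions, not the library's
              implementation: normalize_feed_url_sketch is a hypothetical name, and the set of
              accepted schemes (http and https only) is an assumption made for this example.

```python
from urllib.parse import urlsplit, urlunsplit

ALLOWED_SCHEMES = ('http', 'https')  # assumption for this sketch


def normalize_feed_url_sketch(url):
    """Apply the normalization rules described above; return None if rejected."""
    url = url.strip()

    # Map the feed:// and itpc:// aliases to http://.
    for alias in ('feed://', 'itpc://'):
        if url.lower().startswith(alias):
            url = 'http://' + url[len(alias):]
            break

    # No scheme at all: assume the user meant an http:// feed.
    if '://' not in url:
        url = 'http://' + url

    scheme, netloc, path, query, fragment = urlsplit(url)
    if scheme not in ALLOWED_SCHEMES or not netloc:
        return None

    # Lowercase the network location, default the path to "/";
    # urlunsplit drops the "?" when the query is empty.
    return urlunsplit((scheme, netloc.lower(), path or '/', query, fragment))


print(normalize_feed_url_sketch('itpc://example.org/podcast.rss'))
print(normalize_feed_url_sketch('curry.com'))
```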

       podcastparser.parse(url, stream, max_episodes=0)
              Parse a podcast feed from the given URL and stream

              Parameters
                     • url – the URL of the feed. Will be used to resolve relative links

                     • stream – file-like object containing the feed content

                     • max_episodes  – maximum number of episodes to return. 0 (default) means no
                       limit

              Returns
                     a dict with the parsed contents of the feed

       podcastparser.parse_length(text)
              Parses a file length

              >>> parse_length(None)
              -1

              >>> parse_length('0')
              -1

              >>> parse_length('unknown')
              -1

              >>> parse_length('100')
              100

       podcastparser.parse_pubdate(text)
              Parse a date string into a Unix timestamp

              >>> parse_pubdate('Fri, 21 Nov 1997 09:55:06 -0600')
              880127706

              >>> parse_pubdate('2003-12-13T00:00:00+02:00')
              1071266400

              >>> parse_pubdate('2003-12-13T18:30:02Z')
              1071340202

              >>> parse_pubdate('Mon, 02 May 1960 09:05:01 +0100')
              -305049299

              >>> parse_pubdate('')
              0

              >>> parse_pubdate('unknown')
              0
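
              The examples above cover two date formats: RFC 2822 style dates (as used by RSS)
              and ISO 8601 style timestamps (as used by Atom). A comparable conversion can be
              sketched with the standard library; pubdate_to_timestamp is a hypothetical helper
              written for illustration, and treating dates without a time zone as UTC is an
              assumption of this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _parse_iso(text):
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    if text.endswith('Z'):
        text = text[:-1] + '+00:00'
    return datetime.fromisoformat(text)


def pubdate_to_timestamp(text):
    """Return a Unix timestamp, or 0 if the text cannot be parsed."""
    text = text.strip()
    if not text:
        return 0
    for parser in (parsedate_to_datetime, _parse_iso):
        try:
            parsed = parser(text)
        except (TypeError, ValueError):
            continue  # not in this format, try the next parser
        if parsed.tzinfo is None:
            # Assumption: dates without a zone are interpreted as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())
    return 0


print(pubdate_to_timestamp('Fri, 21 Nov 1997 09:55:06 -0600'))
print(pubdate_to_timestamp('2003-12-13T18:30:02Z'))
```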

       podcastparser.parse_time(value)
              Parse a time string into seconds

              See RFC2326, 3.6 “Normal Play Time” (HH:MM:SS.FRACT)

              >>> parse_time('0')
              0
              >>> parse_time('128')
              128
              >>> parse_time('00:00')
              0
              >>> parse_time('00:00:00')
              0
              >>> parse_time('00:20')
              20
              >>> parse_time('00:00:20')
              20
              >>> parse_time('01:00:00')
              3600
              >>> parse_time(' 03:02:01')
              10921
              >>> parse_time('61:08')
              3668
              >>> parse_time('25:03:30 ')
              90210
              >>> parse_time('25:3:30')
              90210
              >>> parse_time('61.08')
              61
              >>> parse_time('01:02:03.500')
              3723
              >>> parse_time(' ')
              0
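
              The behaviour shown in these examples (surrounding whitespace ignored, fractional
              seconds dropped, colon-separated fields weighted by powers of 60) can be sketched
              in a few lines. npt_to_seconds is a hypothetical name used for illustration; the
              code is a simplified stand-in and does not reproduce the error handling of
              parse_time.

```python
def npt_to_seconds(value):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS[.FRACT]' to whole seconds."""
    value = value.strip()
    if not value:
        return 0

    # Drop the fractional part, if any.
    value = value.split('.', 1)[0]

    seconds = 0
    for field in value.split(':'):
        # Each further field shifts the running total by a factor of 60.
        seconds = seconds * 60 + int(field)
    return seconds


print(npt_to_seconds('01:02:03.500'))
print(npt_to_seconds('61:08'))
```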

       podcastparser.parse_type(text)
              “normalize” a mime type

              >>> parse_type('text/plain')
              'text/plain'

              >>> parse_type('text')
              'application/octet-stream'

              >>> parse_type('')
              'application/octet-stream'

              >>> parse_type(None)
              'application/octet-stream'

       podcastparser.remove_html_tags(html)
              Remove HTML tags from a string and replace numeric  and  named  entities  with  the
              corresponding character, so the HTML text can be displayed in a simple text view.
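
              A rough equivalent can be built on html.parser from the standard library, which
              decodes numeric and named entities in text nodes by default. The strip_tags helper
              below is a hypothetical sketch of the idea, not the function's real
              implementation, and it makes no attempt to preserve paragraph or line breaks.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities in text data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    extractor = _TextExtractor()
    extractor.feed(html_text)
    extractor.close()
    return ''.join(extractor.parts)


print(strip_tags('<p>Tom &amp; Jerry &#8211; <b>live</b></p>'))
```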

       podcastparser.squash_whitespace(text)
              Combine multiple whitespaces into one, trim trailing/leading spaces

              >>> squash_whitespace(' some           text  with a    lot of   spaces ')
              'some text with a lot of spaces'

       podcastparser.squash_whitespace_not_nl(text)
              Like squash_whitespace, but don’t squash linefeeds and carriage returns

              >>> squash_whitespace_not_nl(' linefeeds\ncarriage\r  returns')
              'linefeeds\ncarriage\r returns'
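
              Both squashing variants can be approximated with regular expressions. The two
              helpers below are hypothetical sketches that reproduce the examples above; they are
              not taken from the library's source.

```python
import re


def squash_ws(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r'\s+', ' ', text.strip())


def squash_ws_keep_newlines(text):
    """Like squash_ws, but leave linefeeds and carriage returns untouched."""
    # [^\S\r\n] matches any whitespace character except \r and \n.
    return re.sub(r'[^\S\r\n]+', ' ', text.strip())


print(repr(squash_ws(' some           text  with a    lot of   spaces ')))
print(repr(squash_ws_keep_newlines(' linefeeds\ncarriage\r  returns')))
```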

       This  is  a  list  of  podcast-related  XML  namespaces  that  are  not  yet  supported by
       podcastparser, but might be in the future.

CHAPTER MARKS

       • rawvoice RSS: Rating, Frequency, Poster, WebM, MP4, Metamark (kind of chapter-like
         markers)

       • IGOR: Chapter Marks

OTHERS

       • libSYN RSS Extensions: contactPhone, contactEmail, contactTwitter, contactWebsite,
         wallpaper, pdf, background

       • Comment API: Comments to a given item (readable via RSS)

       • MVCB: Error Reports To Field (usually a mailto: link)

       • Syndication Module: Update period, frequency and base (for skipping updates)

       • Creative Commons RSS: Creative commons license for the content

       • Pheedo: Original link to website and original link to enclosure (without  going  through
         pheedo redirect)

       • WGS84: Geo-Coordinates per item

       • Conversations Network: Intro duration in milliseconds (for skipping the intro), ratings

       • purl  DC  Elements:  dc:creator  (author  / creator of the podcast, possibly with e-mail
         address)

       • Tristana: tristana:self (canonical URL to feed)

       • Blip: Show name, show page, picture, username, language, rating, thumbnail_src, license


AUTHOR

       gPodder Team

COPYRIGHT

       2023, gPodder Team