Provided by: liburi-fetch-perl_0.09-1_all bug

NAME

       URI::Fetch - Smart URI fetching/caching

SYNOPSIS

           use URI::Fetch;

           ## Simple fetch.
           my $res = URI::Fetch->fetch('http://example.com/atom.xml')
               or die URI::Fetch->errstr;

           ## Fetch using specified ETag and Last-Modified headers.
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   ETag => '123-ABC',
                   LastModified => time - 3600,
           )
               or die URI::Fetch->errstr;

           ## Fetch using an on-disk cache that URI::Fetch manages for you.
           my $cache = Cache::File->new( cache_root => '/tmp/cache' );
           $res = URI::Fetch->fetch('http://example.com/atom.xml',
                   Cache => $cache
           )
               or die URI::Fetch->errstr;

DESCRIPTION

       URI::Fetch is a smart client for fetching HTTP pages, notably syndication feeds (RSS,
       Atom, and others), in an intelligent, bandwidth- and time-saving way. That means:

       •   GZIP support

           If you have Compress::Zlib installed, URI::Fetch will automatically try to download a
           compressed version of the content, saving bandwidth (and time).

       •   Last-Modified and ETag support

           If you use a local cache (see the Cache parameter to fetch), URI::Fetch will keep
           track of the Last-Modified and ETag headers from the server, allowing you to only
           download pages that have been modified since the last time you checked.

       •   Proper understanding of HTTP error codes

           Certain HTTP error codes are special, particularly when fetching syndication feeds,
           and well-written clients should pay special attention to them.  URI::Fetch can only do
           so much for you in this regard, but it gives you the tools to be a well-written
           client.

           The response from fetch gives you the raw HTTP response code, along with special
           handling of 4 codes:

           •   200 (OK)

               Signals that the content of a page/feed was retrieved successfully.

           •   301 (Moved Permanently)

               Signals that a page/feed has moved permanently, and that your database of feeds
               should be updated to reflect the new URI.

           •   304 (Not Modified)

               Signals that a page/feed has not changed since it was last fetched.

           •   410 (Gone)

               Signals that a page/feed is gone and will never be coming back, so you should stop
               trying to fetch it.

USAGE

   URI::Fetch->fetch($uri, %param)
       Fetches a page identified by the URI $uri.

       On success, returns a URI::Fetch::Response object; on failure, returns "undef".

       %param can contain:

       •   LastModified

       •   ETag

           LastModified and ETag can be supplied to force the server to only return the full page
           if it's changed since the last request. If you're writing your own feed client, this
           is recommended practice, because it limits both your bandwidth use and the server's.

           If you'd rather not have to store the LastModified time and ETag yourself, see the
           Cache parameter below (and the SYNOPSIS above).

       •   Cache

           If you'd like URI::Fetch to cache responses between requests, provide the Cache
           parameter with an object supporting the Cache API (e.g.  Cache::File, Cache::Memory).
           Specifically, an object that supports "$cache->get($key)" and "$cache->set($key,
           $value, $expires)".

           If supplied, URI::Fetch will store the page content, ETag, and last-modified time of
           the response in the cache, and will pull the content from the cache on subsequent
           requests if the page returns a Not-Modified response.

       •   UserAgent

           Optional.  You may provide your own LWP::UserAgent instance.  Look into
           LWPx::ParanoidUserAgent if you're fetching URLs given to you by possibly malicious
           parties.

       •   NoNetwork

           Optional.  Controls the interaction between the cache and HTTP requests with
           If-Modified-Since/If-None-Match headers.  Possible behaviors are:

           false (default)
               If a page is in the cache, the origin HTTP server is always checked for a fresher
               copy with an If-Modified-Since and/or If-None-Match header.

           1   If set to 1, the origin HTTP is never contacted, regardless of the page being in
               cache or not.  If the page is missing from cache, the fetch method will return
               undef.  If the page is in cache, that page will be returned, no matter how old it
               is.  Note that setting this option means the URI::Fetch::Response object will
               never have the http_response member set.

           "N", where N > 1
               The origin HTTP server is not contacted if the page is in cache and the cached
               page was inserted in the last N seconds.  If the cached copy is older than N
               seconds, a normal HTTP request (full or cache check) is done.

       •   ContentAlterHook

           Optional.  A subref that gets called with a scalar reference to your content so you
           can modify the content before it's returned and before it's put in cache.

           For instance, you may want to only cache the <head> section of an HTML document, or
           you may want to take a feed URL and cache only a pre-parsed version of it.  If you
           modify the scalarref given to your hook and change it into a hashref, scalarref, or
           some blessed object, that same value will be returned to you later on not-modified
           responses.

       •   CacheEntryGrep

           Optional.  A subref that gets called with the URI::Fetch::Response object about to be
           cached (with the contents already possibly transformed by your "ContentAlterHook").
           If your subref returns true, the page goes into the cache.  If false, it doesn't.

       •   Freeze

       •   Thaw

           Optional. Subrefs that get called to serialize and deserialize, respectively, the data
           that will be cached. The cached data should be assumed to be an arbitrary Perl data
           structure, containing (potentially) references to arrays, hashes, etc.

           Freeze should serialize the structure into a scalar; Thaw should deserialize the
           scalar into a data structure.

           By default, Storable will be used for freezing and thawing the cached data structure.

       •   ForceResponse

           Optional. A boolean that indicates a URI::Fetch::Response should be returned
           regardless of the HTTP status. By default "undef" is returned when a response is not a
           "success" (200 codes) or one of the recognized HTTP status codes listed above. The
           HTTP status message can then be retreived using the "errstr" method on the class.

LICENSE

       URI::Fetch is free software; you may redistribute it and/or modify it under the same terms
       as Perl itself.

AUTHOR & COPYRIGHT

       Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin Trott,
       ben+cpan@stupidfool.org. All rights reserved.