Provided by: tcllib_1.20+dfsg-1_all

NAME

       uri - URI utilities

SYNOPSIS

       package require Tcl  8.2

       package require uri  ?1.2.7?

       uri::setQuirkOption option ?value?

       uri::split url ?defaultscheme?

       uri::join ?key value?...

       uri::resolve base url

       uri::isrelative url

       uri::geturl url ?options...?

       uri::canonicalize uri

       uri::register schemeList script

_________________________________________________________________________________________________

DESCRIPTION

       This package does two things.

       First,  it  provides  a  number  of  commands for manipulating URLs/URIs and fetching data
       specified by them. For fetching data this package analyses the requested URL/URI and  then
       dispatches it to the appropriate package (http, ftp, ...) for actual retrieval.  Currently
       these commands are defined for the schemes http, https, ftp, mailto, news, ldap, ldaps and
       file.  The package uri::urn adds scheme urn.

       Second,  it  provides  regular  expressions  for  a  number of registered URL/URI schemes.
       Registered schemes are currently ftp, ldap, ldaps,  file,  http,  https,  gopher,  mailto,
       news, wais and prospero.  The package uri::urn adds scheme urn.

       The commands of the package conform to RFC 3986
       (https://www.rfc-editor.org/rfc/rfc3986.txt), with the exception of a loophole arising
       from RFC 1630 and described in RFC 3986 Sections 5.2.2 and 5.4.2. The loophole allows a
       relative URI to include a scheme if it is the same as the scheme of the base URI against
       which it is resolved. RFC 3986 recommends avoiding this usage.

COMMANDS

       uri::setQuirkOption option ?value?
              uri::setQuirkOption  is  an  accessor command for a number of "quirk options".  The
              command has the same semantics as the command set: when called with one argument it
              reads  an existing value; with two arguments it writes a new value.  The value of a
              "quirk option" is boolean: the value false  requests  conformance  with  RFC  3986,
              while  true requests use of the quirk.  See section QUIRK OPTIONS for discussion of
              the different options and their purpose.
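
              A minimal sketch of the read and write forms, assuming package uri 1.2.7 is
              installed. The variable name saved is illustrative only.

```tcl
package require uri

# One argument: read the current value and remember it.
set saved [uri::setQuirkOption NoInitialSlash]

# Two arguments: write a new value (false requests RFC 3986 behaviour).
uri::setQuirkOption NoInitialSlash 0
puts "NoInitialSlash is now [uri::setQuirkOption NoInitialSlash]"

# Restore the previous value so that other code is unaffected.
uri::setQuirkOption NoInitialSlash $saved
```

              Quirk options apply to the whole interpreter, so code that changes one temporarily
              may want to restore it afterwards, as above.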

       uri::split url ?defaultscheme?
              uri::split takes a url, decodes it and then  returns  a  list  of  key/value  pairs
              suitable  for  array  set  containing the constituents of the url. If the scheme is
              missing from the url it defaults to the value of defaultscheme if it was specified,
              or to http otherwise. Currently the schemes http, https, ftp, mailto, news, ldap, ldaps and
              file are supported by the package itself.  See section EXTENDING on how  to  expand
              that range.

              The set of constituents of a URL (= the set of keys in the returned list) depends
              on the scheme of the URL. The only key that is always present is therefore scheme.
              For the following schemes the constituents and their keys are known:

              ftp    user, pwd, host, port, path, type, pbare.  The pbare is optional.

              http(s)
                     user, pwd, host, port, path, query, fragment, pbare.  The pbare is optional.

              file   path, host. The host is optional.

              mailto user, host. The host is optional.

              ldap(s)
                     host, port, dn, attrs, scope, filter, extensions

              news   Either message-id or newsgroup-name.

              For  discussion  of the boolean pbare see options NoInitialSlash and NoExtraKeys in
              QUIRK OPTIONS.

              The constituents are returned as slices of the argument  url,  without  removal  of
              percent-encoding  ("url-encoding")  or other adaptations.  Notably, on Windows® the
              path in scheme file  is  not  a  valid  local  filename.   See  EXAMPLES  for  more
              information.
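
              As an illustration, the following sketch loads the result of uri::split into an
              array and prints the constituents. The URL is a placeholder, and the exact set of
              keys shown depends on the scheme and on the quirk options in force.

```tcl
package require uri

# uri::split returns a flat key/value list; load it into an array.
array set parts [uri::split http://www.example.com:8080/a/b?x=1#top]

# The order of the keys is not guaranteed, so sort them for display.
foreach key [lsort [array names parts]] {
    puts [format "%-8s %s" $key $parts($key)]
}
```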

       uri::join ?key value?...
              uri::join  takes  a  list of key/value pairs (generated by uri::split, for example)
              and returns the canonical URL they represent. Currently the  schemes  http,  https,
              ftp,  mailto,  news,  ldap, ldaps and file are supported by the package itself. See
              section EXTENDING on how to expand that range.

              The arguments are expected to be slices  of  a  valid  URL,  with  percent-encoding
              ("url-encoding") and any other necessary adaptations.  Notably, on Windows the path
              in scheme file is not a valid local filename.  See EXAMPLES for more information.
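
              A short sketch of both typical uses: building a URL from explicit pairs, and
              splitting a URL, changing one constituent and joining it again. The host names are
              placeholders. The eval/linsert idiom is used instead of {*} only so that the
              example does not require Tcl 8.5.

```tcl
package require uri

# Build a URL from explicit key/value pairs.
set fileUrl [uri::join scheme file path /C:/Other%20Files/startup.txt]
puts $fileUrl

# Round trip: split a URL, change one constituent, and join it again.
array set parts [uri::split http://www.example.com/a/b]
set parts(host) www.example.org
puts [eval [linsert [array get parts] 0 uri::join]]
```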

       uri::resolve base url
              uri::resolve resolves the specified url relative to base, in conformance  with  RFC
              3986.  In  other  words:  a  non-relative  url is returned unchanged, whereas for a
              relative url the missing parts are taken from base and prepended to it. The  result
              of this operation is returned. For an empty url the result is base, without its URI
              fragment (if any).  The command is available for  schemes  http,  https,  ftp,  and
              file.
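
              The following sketch resolves a relative and an absolute reference against the
              same base. The URLs are placeholders. Under the rules of RFC 3986 the relative
              reference ../d.html replaces the last two path segments of the base.

```tcl
package require uri

set base http://www.example.com/a/b/c.html

# A relative reference is resolved against the base.
puts [uri::resolve $base ../d.html]

# A non-relative URL is returned unchanged.
set absolute [uri::resolve $base https://www.example.org/x]
puts $absolute
```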

       uri::isrelative url
              uri::isrelative  determines whether the specified url is absolute or relative.  The
              command is available for a url of any scheme.
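
              For example (a sketch with placeholder URLs), the result can be used directly as a
              boolean:

```tcl
package require uri

foreach url {http://www.example.com/a/b ../images/logo.png mailto:user@example.com} {
    if {[uri::isrelative $url]} {
        puts "$url is relative"
    } else {
        puts "$url is absolute"
    }
}
```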

       uri::geturl url ?options...?
              uri::geturl decodes the specified url  and  then  dispatches  the  request  to  the
              package  appropriate  for the scheme found in the URL. The command assumes that the
              package to handle the given scheme either has the same name as  the  scheme  itself
              (including  possible  capitalization)  followed  by  ::geturl,  or, in case of this
              failing,  has  the  same  name   as   the   scheme   itself   (including   possible
              capitalization).  It  further  assumes  that whatever package was loaded provides a
              geturl-command in the namespace of the  same  name  as  the  package  itself.  This
              command  is  called with the given url and all given options. Currently geturl does
              not handle any options itself.

              Note: file-URLs are an exception to the rule  described  above.  They  are  handled
              internally.

              The results of the command cannot be specified here. They depend on the geturl
              command of the package to which the request was dispatched.

       uri::canonicalize uri
              uri::canonicalize returns the canonical form of a URI.  The canonical form of a URI
              is  one  where relative path specifications, i.e. "." and "..", have been resolved.
              The command is available for all URI schemes that  have  uri::split  and  uri::join
              commands.  The  command  returns  a  canonicalized URI if the URI scheme has a path
              component (i.e. http, https, ftp, and file).  For schemes that have uri::split  and
              uri::join  commands but no path component (i.e. mailto, news, ldap, and ldaps), the
              command returns the uri unchanged.
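
              A brief sketch with placeholder URLs, assuming the default quirk options: the
              segments "." and ".." are resolved in the http URL, while the mailto URL has no
              path component and is returned as given.

```tcl
package require uri

# "." is dropped; "remove/.." cancels out.
set httpResult [uri::canonicalize http://www.example.com/path1/./remove/../path2/resource]
puts $httpResult

# A scheme without a path component is returned unchanged.
set mailResult [uri::canonicalize mailto:user@example.com]
puts $mailResult
```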

       uri::register schemeList script
              uri::register registers the first element of schemeList as a  new  scheme  and  the
              remaining  elements  as  aliases  for this scheme. It creates the namespace for the
              scheme and executes the script in the new namespace.  The  script  has  to  declare
              variables  containing  regular  expressions  relevant  to  the scheme. At least the
              variable schemepart has to be declared as that one is used to extend the  variables
              keeping track of the registered schemes.
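
              A minimal sketch of a registration. The scheme myproto, its alias mp and the very
              simple pattern are hypothetical, chosen only for illustration. The script declares
              the required variable schemepart, plus a variable url following the convention
              described in section SCHEMES.

```tcl
package require uri

# Register hypothetical scheme "myproto" with alias "mp".
uri::register {myproto mp} {
    # Scheme-specific part: "//" followed by a host-like name.
    variable schemepart {//[a-zA-Z0-9.-]+}
    # Full pattern for a URL of this scheme.
    variable url "myproto:${schemepart}"
}

# The new scheme is now listed, and its pattern can be used with regexp.
puts [lsearch -exact $::uri::schemes myproto]
puts [regexp -- "^${::uri::myproto::url}\$" myproto://host.example.com]
```

              Registering a scheme that is already known is an error, so this script can be run
              only once per interpreter.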

SCHEMES

       In addition to the commands mentioned above, this package provides regular expressions
       to recognize URLs for a number of URL schemes.

       For each supported scheme a namespace of the same name as the scheme  itself  is  provided
       inside  of  the  namespace  uri  containing  the variable url whose contents are a regular
       expression to recognize URLs of that scheme.  Additional  variables  may  contain  regular
       expressions for parts of URLs for that scheme.

       The  variable  uri::schemes contains a list of all registered schemes. Currently these are
       ftp, ldap, ldaps, file, http, https, gopher, mailto, news, wais and prospero.
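
       As a sketch of how these variables might be used, the following code searches a piece of
       text for an http URL and lists the registered schemes. The text and URL are placeholders.

```tcl
package require uri

set text "Mirror list: http://www.example.com/pub/index.html (updated daily)"

# uri::http::url holds the regular expression for URLs of scheme http.
set found [regexp -- $::uri::http::url $text match]
if {$found} {
    puts "found: $match"
}

# uri::schemes lists every registered scheme.
puts [lsort $::uri::schemes]
```

       The pattern is not anchored, so it finds a URL anywhere in the text. Add "^" and "$" to
       test whether a whole string is a URL of the scheme.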

EXTENDING

       Extending the range of schemes supported by uri::split and uri::join is easy because
       neither command handles the request itself: each dispatches it to another command in the
       uri namespace, selected by the scheme of the URL.

       uri::split and uri::join call Split[string  totitle  <scheme>]  and   Join[string  totitle
       <scheme>] respectively.
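
       A minimal sketch of such an extension follows. The scheme demo and its single key rest
       are hypothetical. The split command receives the URL with the leading "demo:" already
       removed, and the join command receives all key/value pairs given to uri::join.

```tcl
package require uri

# Hypothetical scheme "demo": everything after "demo:" is kept in one key, "rest".
proc ::uri::SplitDemo {url} {
    # uri::split has already removed the leading "demo:".
    return [list rest $url]
}

proc ::uri::JoinDemo {args} {
    # args holds every key/value pair passed to uri::join, including scheme.
    array set parts {rest {}}
    array set parts $args
    return "demo:$parts(rest)"
}

array set parts [uri::split demo:hello/world]
puts $parts(rest)
set joined [uri::join scheme demo rest hello/world]
puts $joined
```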

       The   provision  of  split  and  join  commands  is  sufficient  to  extend  the  commands
       uri::canonicalize and uri::geturl (the latter subject to the availability  of  a  suitable
       package  with a geturl command).  In contrast, to extend the command uri::resolve to a new
       scheme, the command itself must be modified.

       To extend the range of schemes for which pattern information is available, use the command
       uri::register.

       An  example  of  a  package  that provides both commands and pattern information for a new
       scheme is uri::urn, which adds scheme urn.

QUIRK OPTIONS

       The value of a "quirk option" is boolean: the value false requests  conformance  with  RFC
       3986, while true requests use of the quirk.  Use command uri::setQuirkOption to access the
       values of quirk options.

       Quirk options are  useful  both  for  allowing  backwards  compatibility  when  a  command
       specification  changes,  and  for  adding  useful  features  that  are not included in RFC
       specifications.  The following quirk options are currently defined:

       NoInitialSlash
              This quirk option concerns the leading character of  path  (if  non-empty)  in  the
              schemes http, https, and ftp.

              RFC  3986  defines path in an absolute URI to have an initial "/", unless the value
              of path is the empty string. For the scheme  file,  all  versions  of  package  uri
              follow this rule.  The quirk option NoInitialSlash does not apply to scheme file.

              For  the schemes http, https, and ftp, versions of uri before 1.2.7 define the path
              NOT to include an initial "/".  When the quirk option NoInitialSlash is  true  (the
              default),  this  behavior  is also used in version 1.2.7.  To use instead values of
              path as defined by RFC 3986, set this quirk option to false.

              This setting does not affect RFC 3986 conformance.  If NoInitialSlash is true, then
              the  value  of  path in the schemes http, https, or ftp, cannot distinguish between
              URIs in which the full "RFC 3986 path" is the empty string "" or a single slash "/"
              respectively.   The missing information is recorded in an additional uri::split key
              pbare.

              The boolean pbare is defined when quirk options NoInitialSlash and NoExtraKeys have
              values  true  and  false  respectively.   In this case, if the value of path is the
              empty string "", pbare is true if the full "RFC 3986 path"  is  "",  and  pbare  is
              false if the full "RFC 3986 path" is "/".

              Using this quirk option NoInitialSlash is a matter of preference.
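
              The effect on path can be seen with a sketch such as the following, which splits
              the same placeholder URL under both settings and then restores the original value.
              The names saved, setting and seen are illustrative only.

```tcl
package require uri

set saved [uri::setQuirkOption NoInitialSlash]

foreach setting {1 0} {
    uri::setQuirkOption NoInitialSlash $setting
    catch {unset parts}
    array set parts [uri::split http://www.example.com/a/b]
    set seen($setting) $parts(path)
    puts "NoInitialSlash=$setting  path=$parts(path)"
}

# Restore the previous setting.
uri::setQuirkOption NoInitialSlash $saved
```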

       NoExtraKeys
              This  quirk  option permits full backward compatibility with versions of uri before
              1.2.7, by omitting the uri::split key  pbare  described  above  (see  quirk  option
              NoInitialSlash).   The  outcome is greater backward compatibility of the uri::split
              command, but an inability to distinguish between URIs in which the full  "RFC  3986
              path" is the empty string "" or a single slash "/" respectively - i.e. a minor non-
              conformance with RFC 3986.

              If the quirk option NoExtraKeys is false (the default), command uri::split  returns
              an additional key pbare, and the commands comply with RFC 3986. If the quirk option
              NoExtraKeys is true, the key pbare is not defined and there is not full conformance
              with RFC 3986.

              Using  the  quirk  option NoExtraKeys is NOT recommended, because if set to true it
              will  reduce  conformance  with  RFC  3986.   The  option  is  included  only   for
              compatibility  with code, written for earlier versions of uri, that needs values of
              path without a leading "/", AND ALSO cannot tolerate unexpected keys in the results
              of uri::split.

       HostAsDriveLetter
              When handling the scheme file on the Windows platform, versions of uri before 1.2.7
              use the host field to represent a Windows drive letter and the colon  that  follows
              it,  and  the path field to represent the filename path after the colon.  Such URIs
              are  invalid,  and  are  not  recognized  by  any  RFC.  When  the   quirk   option
              HostAsDriveLetter  is  true,  this  behavior is also used in version 1.2.7.  To use
              file URIs on Windows that conform to RFC 3986, set this quirk option to false  (the
              default).

              Using  this  quirk is NOT recommended, because if set to true it will cause the uri
              commands to expect and produce invalid URIs.   The  option  is  included  only  for
              compatibility with legacy code.

       RemoveDoubleSlashes
              When a URI is canonicalized by uri::canonicalize, its path is normalized by removal
              of segments "." and "..".  RFC 3986 does not mandate the removal of empty  segments
              "" (i.e. the merger of double slashes, which is a feature of filename normalization
              but not of URI path normalization): it treats URIs with excess slashes as referring
              to  different  resources.   When  the quirk option RemoveDoubleSlashes is true (the
              default), empty segments will be  removed  from  path.   To  prevent  removal,  and
              thereby conform to RFC 3986, set this quirk option to false.

              Using  this quirk is a matter of preference.  A URI with double slashes in its path
              was most likely generated in error,  certainly  so  if  it  has  a  straightforward
              mapping to a file on a server.  In some cases it may be better to sanitize the URI;
              in others, to keep the URI and let the server handle the possible error.
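
              A sketch of the difference, using a placeholder URL with a double slash in its
              path. The names saved, setting and result are illustrative only.

```tcl
package require uri

set saved [uri::setQuirkOption RemoveDoubleSlashes]

foreach setting {1 0} {
    uri::setQuirkOption RemoveDoubleSlashes $setting
    set result($setting) [uri::canonicalize http://www.example.com/a//b]
    puts "RemoveDoubleSlashes=$setting  $result($setting)"
}

# Restore the previous setting.
uri::setQuirkOption RemoveDoubleSlashes $saved
```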

   BACKWARD COMPATIBILITY
       To behave as similarly as possible  to  versions  of  uri  earlier  than  1.2.7,  set  the
       following quirk options:

       •      uri::setQuirkOption NoInitialSlash 1

       •      uri::setQuirkOption NoExtraKeys 1

       •      uri::setQuirkOption HostAsDriveLetter 1

       •      uri::setQuirkOption RemoveDoubleSlashes 0

       In code that can tolerate the return by uri::split of an additional key pbare, set

       •      uri::setQuirkOption NoExtraKeys 0

       in order to achieve greater compliance with RFC 3986.

   NEW DESIGNS
       For new projects, the following settings are recommended:

       •      uri::setQuirkOption NoInitialSlash 0

       •      uri::setQuirkOption NoExtraKeys 0

       •      uri::setQuirkOption HostAsDriveLetter 0

       •      uri::setQuirkOption RemoveDoubleSlashes 0|1

   DEFAULT VALUES
       The  default  values for package uri version 1.2.7 are intended to be a compromise between
       backwards compatibility and improved features.  Different default values may be chosen  in
       future versions of package uri.

       •      uri::setQuirkOption NoInitialSlash 1

       •      uri::setQuirkOption NoExtraKeys 0

       •      uri::setQuirkOption HostAsDriveLetter 0

       •      uri::setQuirkOption RemoveDoubleSlashes 1

EXAMPLES

       A  Windows® local filename such as "C:\Other Files\startup.txt" is not suitable for use as
       the path element of a URI in the scheme file.

       The Tcl command file normalize will  convert  the  backslashes  to  forward  slashes.   To
       generate  a valid path for the scheme file, the normalized filename must be prepended with
       "/", and then any characters that do not match the regexp bracket expression

                  [a-zA-Z0-9$_.+!*'(,)?:@&=-]

       must be percent-encoded.

       The result in this example is "/C:/Other%20Files/startup.txt" which is a valid  value  for
       path.

              % uri::join path /C:/Other%20Files/startup.txt scheme file

              file:///C:/Other%20Files/startup.txt

              % uri::split file:///C:/Other%20Files/startup.txt

              path /C:/Other%20Files/startup.txt scheme file

       On  UNIX®  systems filenames begin with "/" which is also used as the directory separator.
       The only action needed to convert a filename to a valid path is percent-encoding.
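
       The conversion described above can be sketched as follows. The command name
       filenameToUriPath is hypothetical and is not part of package uri. The sketch assumes
       that its argument has already been processed by file normalize, encodes each path
       segment separately so that the "/" separators are kept, and works on UTF-8 bytes so that
       non-ASCII characters are encoded byte by byte.

```tcl
# Hypothetical helper: convert a normalized filename to a path for scheme file.
proc filenameToUriPath {normalized} {
    # A Windows name such as "C:/dir/file" lacks the leading slash.
    if {![string match /* $normalized]} {
        set normalized /$normalized
    }
    set segments {}
    foreach segment [split $normalized /] {
        set encoded {}
        foreach byte [split [encoding convertto utf-8 $segment] {}] {
            if {[regexp -- {^[a-zA-Z0-9$_.+!*'(,)?:@&=-]$} $byte]} {
                # Permitted character: copy it as it is.
                append encoded $byte
            } else {
                # Any other byte is percent-encoded as two hex digits.
                scan $byte %c code
                append encoded [format %%%02X $code]
            }
        }
        lappend segments $encoded
    }
    return [join $segments /]
}

puts [filenameToUriPath "C:/Other Files/startup.txt"]
```

       The result can then be passed to uri::join as the value of path, together with scheme
       file.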

CREDITS

       Original code (regular expressions) by Andreas Kupries.   Modularisation  by  Steve  Ball,
       also the split/join/resolve functionality. RFC 3986 conformance by Keith Nash.

BUGS, IDEAS, FEEDBACK

       This  document,  and  the  package  it  describes, will undoubtedly contain bugs and other
       problems.   Please  report  such  in   the   category   uri   of   the   Tcllib   Trackers
       [http://core.tcl.tk/tcllib/reportlist].  Please also report any ideas for enhancements you
       may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note further that attachments are strongly preferred over inlined patches. Attachments can
       be  made  by going to the Edit form of the ticket immediately after its creation, and then
       using the left-most button in the secondary navigation bar.

KEYWORDS

       fetching information, file, ftp, gopher, http, https, ldap, mailto,  news,  prospero,  rfc
       1630, rfc 2255, rfc 2396, rfc 3986, uri, url, wais, www

CATEGORY

       Networking