Provided by: tcllib_1.19-dfsg-2_all

NAME

       uri - URI utilities

SYNOPSIS

       package require Tcl  8.2

       package require uri  ?1.2.7?

       uri::setQuirkOption option ?value?

       uri::split url ?defaultscheme?

       uri::join ?key value?...

       uri::resolve base url

       uri::isrelative url

       uri::geturl url ?options...?

       uri::canonicalize uri

       uri::register schemeList script

________________________________________________________________________________________________________________

DESCRIPTION

       This package does two things.

       First,  it  provides a number of commands for manipulating URLs/URIs and fetching data specified by them.
       For fetching data this package analyses the requested URL/URI and then dispatches it to  the  appropriate
       package  (http,  ftp,  ...)  for  actual retrieval.  Currently these commands are defined for the schemes
       http, https, ftp, mailto, news, ldap, ldaps and file.  The package uri::urn adds scheme urn.

       Second, it provides regular expressions for a number of registered URL/URI  schemes.  Registered  schemes
       are  currently ftp, ldap, ldaps, file, http, https, gopher, mailto, news, wais and prospero.  The package
       uri::urn adds scheme urn.

       The commands of the package conform to RFC 3986  (https://www.rfc-editor.org/rfc/rfc3986.txt),  with  the
       exception  of  a  loophole  arising from RFC 1630 and described in RFC 3986 Sections 5.2.2 and 5.4.2. The
       loophole allows a relative URI to include a scheme if it is the same  as  the  scheme  of  the  base  URI
       against which it is resolved. RFC 3986 recommends avoiding this usage.

COMMANDS

       uri::setQuirkOption option ?value?
              uri::setQuirkOption  is  an accessor command for a number of "quirk options".  The command has the
              same semantics as the command set: when called with one argument it reads an existing value;  with
              two  arguments  it  writes a new value.  The value of a "quirk option" is boolean: the value false
              requests conformance with RFC 3986, while true requests use  of  the  quirk.   See  section  QUIRK
              OPTIONS for discussion of the different options and their purpose.
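
              The read and write forms can be sketched as follows. This is a minimal illustration that assumes
              version 1.2.7 of the package is installed; the variable name old is chosen only for this example.

```tcl
package require uri 1.2.7

# One argument: read the current value of a quirk option.
set old [uri::setQuirkOption NoInitialSlash]

# Two arguments: write a new value (false requests RFC 3986 behavior).
uri::setQuirkOption NoInitialSlash 0
puts "NoInitialSlash was $old, is now [uri::setQuirkOption NoInitialSlash]"

# Restore the earlier value so that other code is unaffected.
uri::setQuirkOption NoInitialSlash $old
```

              Saving and restoring the value is a design choice for this sketch: quirk options apply to every
              later call of the uri commands, so a temporary change is best undone.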

       uri::split url ?defaultscheme?
              uri::split  takes  a url, decodes it and then returns a list of key/value pairs suitable for array
              set containing the constituents of the url. If the scheme is missing from the url, it defaults to
              the value of defaultscheme if that was specified, and to http otherwise. Currently the schemes
              http, https, ftp, mailto, news, ldap, ldaps and file are supported by the package itself.  See
              section EXTENDING on how to expand that range.

              The  set  of  constituents of a URL (= the set of keys in the returned dictionary) is dependent on
              the scheme of the URL. The only key which is therefore always present is scheme. For the following
              schemes the constituents and their keys are known:

              ftp    user, pwd, host, port, path, type, pbare.  The pbare is optional.

              http(s)
                     user, pwd, host, port, path, query, fragment, pbare.  The pbare is optional.

              file   path, host. The host is optional.

              mailto user, host. The host is optional.

              ldap(s)
                     host, port, dn, attrs, scope, filter, extensions

              news   Either message-id or newsgroup-name.

              For discussion of the boolean pbare see options NoInitialSlash and NoExtraKeys in QUIRK OPTIONS.

              The  constituents  are returned as slices of the argument url, without removal of percent-encoding
              ("url-encoding") or other adaptations.  Notably, on Windows® the path in  scheme  file  is  not  a
              valid local filename.  See EXAMPLES for more information.
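
              Because the result is a flat key/value list, it can be loaded straight into an array. The sketch
              below assumes a hypothetical URL on www.example.com and simply lists whatever keys come back; only
              the key scheme is relied upon.

```tcl
package require uri

# Hypothetical URL, used only for illustration.
set url http://www.example.com/docs/index.html?lang=en

# Load the key/value list returned by uri::split into an array.
array set parts [uri::split $url]

# "scheme" is the only key present for every scheme; print all keys found.
foreach key [lsort [array names parts]] {
    puts [format "%-10s %s" $key $parts($key)]
}
```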

       uri::join ?key value?...
              uri::join  takes  a list of key/value pairs (generated by uri::split, for example) and returns the
              canonical URL they represent. Currently the schemes http, https, ftp, mailto,  news,  ldap,  ldaps
              and file are supported by the package itself. See section EXTENDING on how to expand that range.

              The arguments are expected to be slices of a valid URL, with percent-encoding ("url-encoding") and
              any other necessary adaptations.  Notably, on Windows the path in scheme file is not a valid local
              filename.  See EXAMPLES for more information.
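
              A common use is to split a URL, change one constituent and join the result again. The following
              is a sketch under the assumption of hypothetical hosts ftp.example.com and mirror.example.org; it
              uses eval with linsert to expand the list, which also works on Tcl versions without {*}.

```tcl
package require uri

# Hypothetical URL, used only for illustration.
set url ftp://ftp.example.com/pub/readme.txt

# Split into constituents and replace the host.
array set parts [uri::split $url]
set parts(host) mirror.example.org

# uri::join expects separate key/value arguments, so expand the list.
set rebuilt [eval [linsert [array get parts] 0 uri::join]]
puts $rebuilt
```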

       uri::resolve base url
              uri::resolve  resolves  the specified url relative to base, in conformance with RFC 3986. In other
              words: a non-relative url is returned unchanged, whereas for a relative url the missing parts  are
              taken  from  base  and prepended to it. The result of this operation is returned. For an empty url
              the result is base, without its URI fragment (if any).  The command is available for schemes http,
              https, ftp, and file.

       uri::isrelative url
              uri::isrelative  determines  whether  the  specified  url is absolute or relative.  The command is
              available for a url of any scheme.
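
              The two commands above are often used together. The sketch below classifies a few references and
              resolves each against a base; the base URL and the references are hypothetical values chosen for
              illustration, and the results are printed rather than assumed.

```tcl
package require uri

# Hypothetical base URL and references.
set base http://www.example.com/a/b/c.html

foreach ref {../d.html e.html http://other.example.org/x} {
    if {[uri::isrelative $ref]} {
        set kind relative
    } else {
        set kind absolute
    }
    puts "$ref ($kind) -> [uri::resolve $base $ref]"
}
```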

       uri::geturl url ?options...?
              uri::geturl decodes the specified url and then dispatches the request to the  package  appropriate
              for  the  scheme found in the URL. The command assumes that the package to handle the given scheme
              either has the same name as the scheme itself  (including  possible  capitalization)  followed  by
              ::geturl,  or, in case of this failing, has the same name as the scheme itself (including possible
              capitalization). It further assumes that whatever package was loaded provides a geturl-command  in
              the  namespace  of  the same name as the package itself. This command is called with the given url
              and all given options. Currently geturl does not handle any options itself.

              Note: file-URLs are an exception to the rule described above. They are handled internally.

              The result of the command cannot be specified here: it depends on the geturl command of the
              package to which the request was dispatched.
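
              For the scheme http the request is dispatched to the http package that ships with Tcl, whose
              http::geturl returns a token. The sketch below relies on that behavior of the http package, not
              on anything uri itself guarantees; fetchBody is a hypothetical helper name and the URL in the
              comment is a placeholder.

```tcl
package require uri
package require http   ;# uri::geturl would also load it on demand

# Hypothetical helper: fetch the body of an http URL through uri::geturl.
proc fetchBody {url} {
    # For http URLs the result is the token returned by http::geturl.
    set token [uri::geturl $url]
    set body  [http::data $token]
    http::cleanup $token
    return $body
}

# Example call (requires network access):
#   puts [fetchBody http://www.example.com/]
```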

       uri::canonicalize uri
              uri::canonicalize  returns  the canonical form of a URI.  The canonical form of a URI is one where
              relative path specifications, i.e. "." and "..", have been resolved.  The command is available for
              all  URI  schemes that have uri::split and uri::join commands. The command returns a canonicalized
              URI if the URI scheme has a path component (i.e. http, https, ftp, and file).   For  schemes  that
              have uri::split and uri::join commands but no path component (i.e. mailto, news, ldap, and ldaps),
              the command returns the uri unchanged.
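
              As an illustration, the sketch below canonicalizes one URI with a path component and one without.
              Both URIs are hypothetical examples; the results are printed rather than assumed.

```tcl
package require uri

foreach u {
    http://www.example.com/a/./b/../c.html
    mailto:user@example.com
} {
    # Schemes without a path component come back unchanged.
    puts "$u -> [uri::canonicalize $u]"
}
```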

       uri::register schemeList script
              uri::register registers the first element of schemeList as a new scheme and the remaining elements
              as aliases for this scheme. It creates the namespace for the scheme and executes the script in the
              new namespace. The script has to declare variables containing regular expressions relevant to  the
              scheme.  At  least  the  variable  schemepart has to be declared as that one is used to extend the
              variables keeping track of the registered schemes.
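
              A registration might look like the sketch below. The scheme myapp, its alias myapps and the
              patterns host and path are hypothetical and deliberately simplistic; only the variable names
              schemepart and url follow the conventions described in this manual.

```tcl
package require uri

# "myapp" is a hypothetical scheme; "myapps" is registered as its alias.
uri::register {myapp myapps} {
    variable host       {[a-zA-Z0-9.-]+}
    variable path       {[a-zA-Z0-9._/-]*}
    # schemepart is mandatory; it describes everything after "myapp:".
    variable schemepart "//${host}(/${path})?"
    # url recognizes complete URLs of the scheme.
    variable url        "myapp:${schemepart}"
}

# The new scheme is now listed, and its pattern can be used for matching.
puts [expr {[lsearch -exact $uri::schemes myapp] >= 0}]
puts [regexp -- "^(?:$uri::myapp::url)\$" myapp://server.example/inbox]
```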

SCHEMES

       In addition to the commands mentioned above this package provides regular expressions to recognize URLs
       for a number of URL schemes.

       For  each  supported  scheme  a namespace of the same name as the scheme itself is provided inside of the
       namespace uri containing the variable url whose contents are a regular expression to  recognize  URLs  of
       that scheme. Additional variables may contain regular expressions for parts of URLs for that scheme.

       The  variable  uri::schemes  contains  a  list  of all registered schemes. Currently these are ftp, ldap,
       ldaps, file, http, https, gopher, mailto, news, wais and prospero.
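
       The variables described above can be used directly with regexp. The following sketch lists the registered
       schemes and searches a piece of text for an http URL; the sample text is made up for illustration.

```tcl
package require uri

# List all schemes for which pattern information is registered.
puts "Registered schemes: $uri::schemes"

# Search free text for something that looks like an http URL.
set text "See http://www.example.com/index.html for details."
if {[regexp -- $uri::http::url $text match]} {
    puts "Found: $match"
}
```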

EXTENDING

       Extending the range of schemes supported by uri::split and uri::join is easy because neither command
       handles the request itself: each dispatches it to another command in the uri namespace, using the scheme
       of the URL as criterion.

       uri::split  and  uri::join  call  Split[string  totitle  <scheme>]  and   Join[string  totitle  <scheme>]
       respectively.

       The  provision  of  split  and  join  commands is sufficient to extend the commands uri::canonicalize and
       uri::geturl (the latter subject to the availability of a suitable package with  a  geturl  command).   In
       contrast, to extend the command uri::resolve to a new scheme, the command itself must be modified.

       To extend the range of schemes for which pattern information is available, use the command uri::register.

       An example of a package that provides both commands and pattern information for a new scheme is uri::urn,
       which adds scheme urn.
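
       A pair of such commands for a hypothetical scheme myapp could be sketched as follows. The calling
       conventions are assumptions modelled on the existing scheme commands and are not specified by this
       manual: the split command is assumed to receive the URL with the leading "myapp:" removed and to return
       key/value pairs, and the join command is assumed to receive the key/value pairs as separate arguments.

```tcl
package require uri

# Hypothetical scheme "myapp" with URLs of the form myapp://host/path.
# Assumption: uri::split passes the URL without the "myapp:" prefix.
proc ::uri::SplitMyapp {url} {
    array set parts {host {} path {}}
    regexp -- {^//([^/]*)(/.*)?$} $url -> parts(host) parts(path)
    return [array get parts]
}

# Assumption: uri::join passes the constituents as key/value arguments.
proc ::uri::JoinMyapp {args} {
    array set parts {host {} path {}}
    array set parts $args
    return "myapp://$parts(host)$parts(path)"
}

array set p [uri::split myapp://server.example/inbox/42]
puts "host=$p(host) path=$p(path)"
puts [uri::join scheme myapp host $p(host) path $p(path)]
```

       The command names follow the rule given above: string totitle myapp yields Myapp, hence SplitMyapp and
       JoinMyapp.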

QUIRK OPTIONS

       The value of a "quirk option" is boolean: the value false requests conformance with RFC 3986, while  true
       requests use of the quirk.  Use command uri::setQuirkOption to access the values of quirk options.

       Quirk  options are useful both for allowing backwards compatibility when a command specification changes,
       and for adding useful features that are not included in RFC specifications.  The following quirk  options
       are currently defined:

       NoInitialSlash
              This  quirk  option  concerns  the  leading  character of path (if non-empty) in the schemes http,
              https, and ftp.

              RFC 3986 defines path in an absolute URI to have an initial "/", unless the value of path  is  the
              empty string. For the scheme file, all versions of package uri follow this rule.  The quirk option
              NoInitialSlash does not apply to scheme file.

              For the schemes http, https, and ftp, versions of uri before 1.2.7 define the path NOT to  include
              an initial "/".  When the quirk option NoInitialSlash is true (the default), this behavior is also
              used in version 1.2.7.  To use instead values of path as defined  by  RFC  3986,  set  this  quirk
              option to false.

              This  setting  does not affect RFC 3986 conformance.  If NoInitialSlash is true, then the value of
              path in the schemes http, https, or ftp, cannot distinguish between URIs in which  the  full  "RFC
              3986  path" is the empty string "" or a single slash "/" respectively.  The missing information is
              recorded in an additional uri::split key pbare.

              The boolean pbare is defined when quirk options NoInitialSlash and NoExtraKeys  have  values  true
              and  false respectively.  In this case, if the value of path is the empty string "", pbare is true
              if the full "RFC 3986 path" is "", and pbare is false if the full "RFC 3986 path" is "/".

              Using the quirk option NoInitialSlash is a matter of preference.
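
              The effect on the value of path can be inspected with a short sketch such as the one below. It
              assumes version 1.2.7 and a hypothetical URL, and prints the values instead of assuming them.

```tcl
package require uri 1.2.7

# Hypothetical URL, used only for illustration.
set url http://www.example.com/a/b

foreach flag {1 0} {
    uri::setQuirkOption NoInitialSlash $flag
    catch {unset parts}
    array set parts [uri::split $url]
    puts "NoInitialSlash=$flag path=$parts(path)"
}

# Return to the documented default.
uri::setQuirkOption NoInitialSlash 1
```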

       NoExtraKeys
              This quirk option permits full backward compatibility  with  versions  of  uri  before  1.2.7,  by
              omitting  the uri::split key pbare described above (see quirk option NoInitialSlash).  The outcome
              is greater backward compatibility of the uri::split  command,  but  an  inability  to  distinguish
              between  URIs  in  which  the  full  "RFC  3986 path" is the empty string "" or a single slash "/"
              respectively - i.e. a minor non-conformance with RFC 3986.

              If the quirk option NoExtraKeys is false (the default), command uri::split returns  an  additional
              key pbare, and the commands comply with RFC 3986. If the quirk option NoExtraKeys is true, the key
              pbare is not defined and there is not full conformance with RFC 3986.

              Using the quirk option NoExtraKeys is NOT recommended, because if  set  to  true  it  will  reduce
              conformance  with  RFC 3986.  The option is included only for compatibility with code, written for
              earlier versions of uri, that needs values of path without a leading "/", AND ALSO cannot tolerate
              unexpected keys in the results of uri::split.

       HostAsDriveLetter
              When  handling  the scheme file on the Windows platform, versions of uri before 1.2.7 use the host
              field to represent a Windows drive letter and the colon that follows it, and  the  path  field  to
              represent the filename path after the colon.  Such URIs are invalid, and are not recognized by any
              RFC. When the quirk option HostAsDriveLetter is true, this behavior is also used in version 1.2.7.
              To  use  file  URIs  on  Windows  that  conform  to  RFC 3986, set this quirk option to false (the
              default).

              Using this quirk is NOT recommended, because if set to true it will  cause  the  uri  commands  to
              expect and produce invalid URIs.  The option is included only for compatibility with legacy code.

       RemoveDoubleSlashes
              When  a  URI  is canonicalized by uri::canonicalize, its path is normalized by removal of segments
              "." and "..".  RFC 3986 does not mandate the removal of empty segments  ""  (i.e.  the  merger  of
              double  slashes,  which is a feature of filename normalization but not of URI path normalization):
              it treats URIs with excess slashes as referring to different resources.   When  the  quirk  option
              RemoveDoubleSlashes  is  true (the default), empty segments will be removed from path.  To prevent
              removal, and thereby conform to RFC 3986, set this quirk option to false.

              Using this quirk is a matter of preference.  A URI with double slashes in its path was most likely
              generated  by  error,  certainly so if it has a straightforward mapping to a file on a server.  In
              some cases it may be better to sanitize the URI; in others, to keep the URI  and  let  the  server
              handle the possible error.
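
              The difference can be observed with the following sketch, which canonicalizes the same
              hypothetical URI under both settings and prints the results.

```tcl
package require uri 1.2.7

# Hypothetical URI with an empty path segment (double slash).
set url http://www.example.com/a//b/../c

foreach flag {1 0} {
    uri::setQuirkOption RemoveDoubleSlashes $flag
    puts "RemoveDoubleSlashes=$flag -> [uri::canonicalize $url]"
}

# Return to the documented default.
uri::setQuirkOption RemoveDoubleSlashes 1
```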

   BACKWARD COMPATIBILITY
       To  behave  as  similarly  as  possible  to  versions  of uri earlier than 1.2.7, set the following quirk
       options:

       •      uri::setQuirkOption NoInitialSlash 1

       •      uri::setQuirkOption NoExtraKeys 1

       •      uri::setQuirkOption HostAsDriveLetter 1

       •      uri::setQuirkOption RemoveDoubleSlashes 0

       In code that can tolerate the return by uri::split of an additional key pbare, set

       •      uri::setQuirkOption NoExtraKeys 0

       in order to achieve greater compliance with RFC 3986.

   NEW DESIGNS
       For new projects, the following settings are recommended:

       •      uri::setQuirkOption NoInitialSlash 0

       •      uri::setQuirkOption NoExtraKeys 0

       •      uri::setQuirkOption HostAsDriveLetter 0

       •      uri::setQuirkOption RemoveDoubleSlashes 0|1

   DEFAULT VALUES
       The default values for package uri version 1.2.7 are  intended  to  be  a  compromise  between  backwards
       compatibility  and  improved  features.   Different  default  values  may be chosen in future versions of
       package uri.

       •      uri::setQuirkOption NoInitialSlash 1

       •      uri::setQuirkOption NoExtraKeys 0

       •      uri::setQuirkOption HostAsDriveLetter 0

       •      uri::setQuirkOption RemoveDoubleSlashes 1
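
       Since defaults may change between versions, a program can print the values actually in effect. A minimal
       sketch, assuming version 1.2.7 and the four option names documented above:

```tcl
package require uri 1.2.7

# Read (one-argument form) and print every documented quirk option.
set options {NoInitialSlash NoExtraKeys HostAsDriveLetter RemoveDoubleSlashes}
foreach option $options {
    puts [format "%-20s %s" $option [uri::setQuirkOption $option]]
}
```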

EXAMPLES

       A Windows® local filename such as "C:\Other Files\startup.txt" is  not  suitable  for  use  as  the  path
       element of a URI in the scheme file.

       The Tcl command file normalize will convert the backslashes to forward slashes.  To generate a valid path
       for the scheme file, the normalized filename must be prepended with "/", and then any characters that  do
       not match the regexp bracket expression

                  [a-zA-Z0-9$_.+!*'(,)?:@&=-]

       must be percent-encoded.

       The result in this example is "/C:/Other%20Files/startup.txt" which is a valid value for path.

              % uri::join path /C:/Other%20Files/startup.txt scheme file

              file:///C:/Other%20Files/startup.txt

              % uri::split file:///C:/Other%20Files/startup.txt

              path /C:/Other%20Files/startup.txt scheme file

       On UNIX® systems filenames begin with "/" which is also used as the directory separator.  The only action
       needed to convert a filename to a valid path is percent-encoding.
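
       The percent-encoding step described above can be sketched in plain Tcl. The helper percentEncodePath is
       hypothetical and not part of the package. It assumes that its argument is already normalized to forward
       slashes and begins with "/", that "/" only separates segments, and that non-ASCII characters should be
       encoded as UTF-8 bytes.

```tcl
# Hypothetical helper: percent-encode each segment of a slash-separated path.
proc percentEncodePath {path} {
    set segments {}
    foreach segment [split $path /] {
        set encoded {}
        # Work on UTF-8 bytes so that every character code fits in two hex digits.
        foreach ch [split [encoding convertto utf-8 $segment] {}] {
            if {[regexp {^[a-zA-Z0-9$_.+!*'(,)?:@&=-]$} $ch]} {
                append encoded $ch
            } else {
                scan $ch %c code
                append encoded [format %%%02X $code]
            }
        }
        lappend segments $encoded
    }
    return [join $segments /]
}

set path [percentEncodePath "/C:/Other Files/startup.txt"]
puts $path
```

       The resulting value can then be passed to uri::join as the path of a URI in the scheme file.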

CREDITS

       Original code (regular  expressions)  by  Andreas  Kupries.   Modularisation  by  Steve  Ball,  also  the
       split/join/resolve functionality. RFC 3986 conformance by Keith Nash.

BUGS, IDEAS, FEEDBACK

       This  document,  and  the package it describes, will undoubtedly contain bugs and other problems.  Please
       report such in the category uri of the Tcllib  Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please
       also report any ideas for enhancements you may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note  further  that  attachments  are strongly preferred over inlined patches. Attachments can be made by
       going to the Edit form of the ticket immediately after its creation, and then using the left-most  button
       in the secondary navigation bar.

KEYWORDS

       fetching  information,  file, ftp, gopher, http, https, ldap, mailto, news, prospero, rfc 1630, rfc 2255,
       rfc 2396, rfc 3986, uri, url, wais, www

CATEGORY

       Networking